Virtual Memory
MIPS gives each program its own 32-bit address space
Programs can access any byte in their 32-bit address space (A: 2^32 bytes = 4GB. A 32-bit address space gives you (theoretically) 4GB of memory you can address. In practice the OS reserves some of it, so it is closer to 2GB of usable space.)
Addresses are always 32 bits in MIPS. The usable size depends on how much memory is installed (e.g., 1GB of RAM means a 30-bit physical address space, 1GB < 4GB). We crash if we try to access more than that 1GB.
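The arithmetic above can be checked directly; this is a small sketch assuming a 1GB-RAM machine as in the example:

```python
# Address-space arithmetic for a 32-bit MIPS machine (sketch).
ADDR_BITS = 32
address_space = 2 ** ADDR_BITS        # bytes addressable by a 32-bit pointer
print(address_space)                  # 4294967296 bytes = 4GB

# Suppose only 1GB of RAM is installed (30-bit physical addresses).
installed_ram = 2 ** 30
# Programs can form addresses the RAM can't back:
assert installed_ram < address_space
```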
Memory fragmentation: as processes are loaded and removed from memory, the free memory space is broken into little pieces. Over time, processes cannot be allocated into these free blocks because the blocks are too small, so the blocks remain unused.
Each program can access any 32-bit memory address
What if multiple programs access the same address? -- sw R2, 1024(R0) will write to address 1024 regardless of the program that's running
They can corrupt or crash each other: security and reliability
If all programs have access to the same 32-bit memory space:
Can crash if less than 4GB of RAM memory in the system
Can run out of space if we run multiple programs
Can corrupt other programs' data
How do we solve this?
Key to the problem:"same memory space"
Can we give each program its own virtual memory space?
if so, we can:
Separately map each program's memory space to the RAM memory(Mapping gives us flexibility in how we use the physical RAM memory) space
And even move it to disk if we run out of memory
Virtual memory is a layer of indirection
Any problem in computer science can be solved by adding indirection.
Virtual memory takes program addresses and maps them to RAM addresses
VM: mapping gives us flexibility in how we use the RAM
2.2.1 not enough memory
Map some of the program's address space to the disk
When we need it, we bring it into memory(Mapping lets us use our disk to give the illusion of unlimited memory)
2.2.2 holes in the address space
We can map a program's addresses to RAM addresses however we like
Each program has its own mapping.
Mapping lets us put our program data wherever we want in the RAM.
2.2.3 keeping programs secure
Program 1's and Program 2's addresses map to different RAM addresses
Because each program has its own address space, they can't access each other's data: security and reliability
Virtual memory: what the program sees
Physical memory: the physical RAM in the computer
What the program uses
In MIPS, this is the full 32-bit address space: 0 to 2(32)-1
What the hardware uses to talk to the RAM
Address space determined by how much RAM is installed
3.4.1 How does a program access memory?
Program executes a load with a virtual address(VA)
Computer translates the address to the physical address(PA) in memory
(If the physical address(PA) is not in memory, the operating system loads the page in from disk and stores it in RAM)
The computer then reads the RAM using the physical address(PA) and returns the data to the program
The map from Virtual Addresses(VA) to Physical Addresses(PA) is the Page Table.
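The translation steps above can be sketched as a toy Page Table; the 4KB page size and the VPN-to-frame mapping here are made-up illustration values:

```python
# Minimal sketch of virtual-to-physical translation.
# The page table maps virtual page numbers (VPNs) to physical frame numbers.
PAGE_SIZE = 4096                     # hypothetical 4KB pages

page_table = {0: 7, 1: 3, 2: 9}      # VPN -> physical frame (made-up mapping)

def translate(va):
    vpn, offset = divmod(va, PAGE_SIZE)
    if vpn not in page_table:
        # In a real system this is a page fault: the OS loads the page from disk.
        raise LookupError("page fault")
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1ABC)))        # VPN 1 -> frame 3, so 0x1ABC -> 0x3abc
```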
So far we have had one Page Table Entry (PTE) for every Virtual Address.
1 word == 4 bytes
We need to translate every possible address:
Our programs have 32-bit Virtual Address spaces
That's 2^30 words that need Page Table Entries (1 billion entries!): 1 entry per word. (If a word doesn't have a Page Table Entry, we can't access it, because we can't find its physical address.)
How can we make this more manageable?
What if we divided memory up into chunks(pages) instead of words?
Coarse-grained: pages instead of words
The Page Table manages larger chunks (pages) of data: each entry maps a chunk (e.g. 4KB) instead of a word
Fewer Page Table entries needed to cover the whole address space
But, less flexibility in how to use the RAM(have to move a page at a time)
Today:
Typically 4KB pages(1024 words per page): In total we have 1 billion words, so 1 billion / 1024 is 1 million Page Table entries. This is much more manageable.
Sometimes 2MB pages (524,288 words per page)
32 bit Virtual Addresses
28 bit Physical Addresses
4kB pages require 12 bits for the page offset
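Splitting a 32-bit virtual address into a virtual page number and a page offset is just bit arithmetic; a sketch with the 4kB numbers above:

```python
# Splitting a 32-bit virtual address with 4kB pages (sketch).
OFFSET_BITS = 12                      # 4kB = 2^12 bytes
VPN_BITS = 32 - OFFSET_BITS           # 20 bits of virtual page number

def split(va):
    vpn = va >> OFFSET_BITS                     # upper 20 bits
    offset = va & ((1 << OFFSET_BITS) - 1)      # lower 12 bits
    return vpn, offset

vpn, offset = split(0xDEADBEEF)
print(hex(vpn), hex(offset))          # 0xdeadb 0xeef

# 20 VPN bits means 2^20 = 1M page table entries per process.
assert 1 << VPN_BITS == 1_048_576
```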
Page Table Entry says the page is on disk
Hardware(CPU) generates a page fault exception
The hardware jumps to the OS page fault handler to clean up
The OS chooses a page to evict from RAM and write to disk
If the page is dirty(has been changed), it needs to be written back to disk first
The OS then reads the page from disk and puts it in RAM
The OS then changes the Page Table to map the new page
The OS jumps back to the instruction that caused the page fault.(This time it won't cause a page fault since the page has been loaded.)
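The evict/write-back/load/remap steps above can be sketched as a toy fault handler; the eviction policy, `Entry` type, and dict-based RAM/disk here are hypothetical simplifications:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    frame: int            # physical frame holding this page
    dirty: bool = False   # has the page been written since it was loaded?

def handle_page_fault(page_table, ram, disk, faulting_vpn):
    # 1. Choose a page to evict (simplest possible policy: first resident page).
    victim_vpn, entry = next(iter(page_table.items()))
    # 2. If the victim is dirty, write it back to disk first.
    if entry.dirty:
        disk[victim_vpn] = ram[entry.frame]
    del page_table[victim_vpn]
    # 3. Read the faulting page from disk into the freed frame.
    ram[entry.frame] = disk[faulting_vpn]
    # 4. Map the new page; the faulting instruction is then re-executed.
    page_table[faulting_vpn] = Entry(entry.frame)

# Tiny demo: one-frame RAM, page 0 resident and dirty, fault on page 1.
pt = {0: Entry(frame=0, dirty=True)}
ram = {0: b"old data"}
disk = {1: b"new data"}
handle_page_fault(pt, ram, disk, faulting_vpn=1)
print(pt, disk[0])        # page 1 now mapped; page 0 written back to disk
```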
In the time it takes to handle one page fault, you could execute 80 million cycles on a modern CPU.
Page faults are the SLOWEST possible thing that can happen to a computer(except for human interaction).
very slow: when you have to page
very good: you don't crash because you ran out of memory
Some systems do not page:
iOS: the OS kills your program if you use too much memory
OS X 10.9: the OS compresses your program first, then pages if it has to
Using virtual memory to prevent programs from corrupting each other
If each program has its own Page Table, then we can map each program's virtual addresses to unique physical addresses
Prevents programs from accessing each other's data
Each program has its own 32-bit virtual address space
Linux defines how that address space is used:
1GB reserved for the kernel. No read/write.(Keeps kernel data mapped)
Program static data at the bottom
heap grows up
stack grows down(and has a fixed maximum size)
the bottom 128MB around zero is unavailable (for I/O)
Random offsets to enhance security
You never know exactly where a certain piece of data/code will be.
Kernel space is shared by all.
Shared data is mapped to both.
Separate mappings allow us to keep programs isolated.
Mappings allow programs to share data.
each process needs its own Page Table
OS makes sure they only map to the same physical address when we want sharing
VM is great:
Unlimited programs/memory, protection, flexibility, etc.
But it comes at a high cost:
Every memory operation has to look up in the page table
Need to access (1) the page table and (2) the memory address: 2x memory accesses. (Programs average 1.33 memory accesses per instruction. This is going to hurt.)
How can we make a page table look up really fast?
software would be far too slow
Perhaps a hardware page table cache?
To make VM fast we add a special Page Table cache: the Translation Lookaside Buffer(TLB)
Fast: less than 1 cycle
Very similar to a cache
To be fast, TLBs must be small:
Separate TLBs for instructions(iTLB) and data(dTLB)
64 entries, 4-way(4kB pages)
32 entries, 4-way(2MB pages)
Good: Page in RAM
PTE in the TLB
excellent
<1 cycle to translate, then go to RAM(or cache)
PTE not in the TLB
poor
20-1000 cycles to load PTE from RAM, then go to RAM(With 1.33 memory accesses per instruction we can't afford 20-1000 cycles very often.)
Bad: Page not in RAM
PTE in the TLB(unlikely)
Horrible
1 cycle to know it's on disk
~80M cycles to get it from disk
PTE not in the TLB
(ever so slightly more) horrible
20-1000 cycles to know it's on disk
~80M cycles to get it from disk
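The hit/miss cases above come from putting a small cache (the TLB) in front of the page table; a sketch with made-up mappings and a trivial FIFO eviction policy:

```python
# Sketch of a TLB as a small cache in front of the page table.
PAGE_TABLE = {vpn: vpn + 100 for vpn in range(8)}   # made-up VPN -> frame map
tlb = {}                                            # tiny cache of recent PTEs
TLB_CAPACITY = 4                                    # real TLBs are ~64 entries

def lookup(vpn):
    if vpn in tlb:                  # TLB hit: <1 cycle
        return tlb[vpn], "hit"
    frame = PAGE_TABLE[vpn]         # TLB miss: 20-1000 cycles to walk RAM
    if len(tlb) >= TLB_CAPACITY:
        tlb.pop(next(iter(tlb)))    # evict the oldest entry (FIFO sketch)
    tlb[vpn] = frame
    return frame, "miss"

print(lookup(3))                    # first access: miss, PTE loaded into TLB
print(lookup(3))                    # second access: hit
```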
Make pages larger
Add a second TLB that is larger, but a bit slower
Have hardware fill the TLB automatically (this is called a "hardware page table walk": the hardware assumes the page table is in a special form in memory, so it can fetch entries on a TLB miss without having to go to the OS)
For 32-bit machine with 4kB pages we need:
1M Page Table Entries(32 bits - 12 bits for page offset= 20 bits, 2^20=1M)
Each PTE is about 4 bytes
4MB total
Not bad
except each program needs its own page table
If we have 100 programs running, we need 400MB of page tables
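The sizing above is straightforward to verify; a sketch using the section's numbers:

```python
# Page-table size arithmetic for 32-bit addresses and 4kB pages.
entries = 2 ** (32 - 12)          # 20 VPN bits -> 1M page table entries
pte_bytes = 4                     # each PTE is about 4 bytes
per_process = entries * pte_bytes
print(per_process // 2**20)       # 4 (MB per process)
print(100 * per_process // 2**20) # 400 (MB for 100 running programs)
```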
And here's the tough part:
We can't swap the page tables out to disk
If the page table is not in RAM, we have no way to access it to find it.
How can we fix this?
just add more indirection...
With multi-level page tables, what is the smallest amount of page table data we need to keep in memory for each 32-bit application?
4kB + 4kB:
We always need the 1st-level page table so we can find the 2nd-level ones.
But, the 1st-level page table only helps us find other page tables. It isn't enough by itself to translate any program addresses. So we need at least one 2nd-level page table to actually translate memory addresses:
One 1st-level page table:4kB
One 2nd-level page table: 4kB
4kB+4kB per application (much better than 4MB per application!)
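A two-level lookup splits the 20-bit VPN into two 10-bit indices, and 2nd-level tables only exist where the program actually uses memory. A sketch with a made-up mapping:

```python
# Sketch of a two-level page table lookup (4kB pages, 10+10 bit VPN split).
OFFSET_BITS, LEVEL_BITS = 12, 10

# L1 index -> 2nd-level table; L2 index -> physical frame (made-up values).
# Only one 2nd-level table is allocated here: 4kB + 4kB total, not 4MB.
level1 = {0: {0: 7, 1: 3}}

def translate(va):
    offset = va & ((1 << OFFSET_BITS) - 1)
    vpn = va >> OFFSET_BITS
    i1 = vpn >> LEVEL_BITS                  # top 10 bits index the 1st level
    i2 = vpn & ((1 << LEVEL_BITS) - 1)      # next 10 bits index the 2nd level
    level2 = level1[i1]                     # missing L1 entry => no L2 table
    return level2[i2] * (1 << OFFSET_BITS) + offset

print(hex(translate(0x1ABC)))               # VPN 1 -> i1=0, i2=1 -> frame 3
```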
VIPT
Idea:
Look up in the cache with a virtual address
verify that the data is right with a physical tag
VIPT: Virtually Indexed, Physically Tagged
Data in the cache is indexed by the virtual address
But tagged by the physical address
We only get a hit if the tag matches the physical address, but we can start looking with the virtual address
TLB translation and Cache lookup at the same time
Use virtual page bits to index the TLB
Use page offset bits to index the cache
TLB->Physical Page
Cache->Physical Tag
Physical Page=Physical tag->Cache hit!
Fast: look in the TLB at the same time as the cache
Safe: Cache hit only on PA match
But, can only use non-translated bits to index cache(limit on how large the cache can be)
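The parallel lookup above can be sketched in (sequential) code; the TLB contents, cache contents, and 16-byte line size are made-up illustration values:

```python
# Sketch of a VIPT lookup: index the cache with untranslated page-offset bits
# while the TLB translates the VPN, then hit only if physical tags match.
OFFSET_BITS = 12
tlb = {1: 3}                          # VPN -> physical frame (made up)
cache = {0xAB: (3, b"hello")}         # cache index -> (physical tag, data)

def vipt_load(va):
    offset = va & ((1 << OFFSET_BITS) - 1)
    index = offset >> 4               # 16-byte lines: offset bits pick the set
    frame = tlb[va >> OFFSET_BITS]    # in hardware this happens in parallel
    tag, data = cache.get(index, (None, None))
    return data if tag == frame else None   # hit only on physical tag match

print(vipt_load(0x1ABC))              # index 0xAB, frame 3 -> cache hit
```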
need to combine the TLB and the cache for good performance
physical caches: require translation first(slow)
Virtual caches: use virtual addresses(no translation: fast, but protection)
Virtually-Indexed, Physically-Tagged(VIPT) caches: use VA for index and PA for tag