Posted by: Sue Loh
This is really a generic topic, not something specific to Windows CE, but I’ve had to explain it to a few people so I thought it would be worth writing up. If you had asked me what virtual memory was several years ago, I’d have given you some hand-waving explanation about making it appear as if your computer has much more RAM than it actually does, which is only sort of correct and fairly imprecise. Sometimes people will get stuck on the idea of writing data to the file system, which is not quite correct either.
First, a thought experiment about a system with no virtual memory. Suppose you have an EXE running, it has loaded a couple of DLLs, has one thread and some heap. The physical memory might be portioned out between them like the picture on the left.
The EXE and DLLs consume an amount of RAM equal to the size of their files on the file system. (Actually, probably more, because executables are mapped into RAM less compactly than they’re stored on disk, but executable loading is a story for another day.) The EXE / DLL code is copied from the file system into RAM, and will stay there until the executables are unloaded.
The heap could be implemented to grow on demand into the unused area. More likely, the heap would be implemented as a large fixed size block of RAM, and if the heap must grow, a new equal sized block of RAM would be allocated. Smaller allocations would be made out of the inside of these large blocks. This type of strategy would cut down on RAM fragmentation.
The thread stack size is fixed. By definition a thread stack must occupy contiguous memory addresses, because the code refers to data on the stack by using offsets from the current frame pointer. It is not possible to chain new blocks onto the stack, because the already compiled code would then refer to the wrong addresses.
The main problem with a system like this is that you must fight endless battles between fragmentation and RAM consumption. If the program in this picture were to unload DLL 2 and load a new DLL that was slightly larger than DLL 2, it would have to load the new DLL in the unused area, leaving a large gap where DLL 2 used to be. Similarly, heap fragmentation can result in fragmentation of device RAM. Yet if you were to speculatively allocate a large heap block to avoid fragmentation, and the application did not use that much heap space after all, you’d end up wasting RAM on a large unused heap block.
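To make the DLL example concrete, here is a minimal Python sketch of a flat RAM with no virtual memory. The sizes, the names and the first-fit placement policy are all assumptions chosen for illustration; they are not taken from any real loader.

```python
# Hypothetical flat RAM of 100 units with no virtual memory: every allocation
# must occupy one contiguous run of physical addresses.
RAM_SIZE = 100

def first_fit(allocations, size):
    """Return the lowest start address with `size` contiguous free units, or None."""
    cursor = 0
    for start, length in sorted(allocations.values()):
        if start - cursor >= size:
            return cursor
        cursor = start + length
    return cursor if RAM_SIZE - cursor >= size else None

# name -> (start address, length)
allocations = {"EXE": (0, 20), "DLL 1": (20, 15), "DLL 2": (35, 10), "heap": (45, 30)}

del allocations["DLL 2"]              # unloading leaves a 10-unit hole at address 35
start = first_fit(allocations, 12)    # the new DLL is slightly larger than the hole
allocations["DLL 3"] = (start, 12)
print(start)                          # 75: placed in the unused area after the heap
```

The 10-unit hole at address 35 stays behind, usable only by something that fits in it, while the new DLL eats into the large unused area.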
Additionally, such a system consumes a lot of RAM for little benefit. For example, most EXEs and DLLs include setup code that runs when the executable is first loaded. Once that setup code has run, it will never run again. Yet this system is still consuming RAM to hold that code.
Even today, some very simple systems operate without virtual memory. But most use virtual memory to lessen the fragmentation problem. Imagine if you could take the single chunk of RAM on the computer and break it into pieces, with unused space in between which you could choose to use up later. That’s a virtual memory system. You could portion your RAM out as needed into a larger address “space.”
Virtual memory is a mapping from the addresses that running programs are using (virtual addresses) to the addresses of physical RAM (physical addresses). The two almost never have any correspondence to each other; memory at a particular virtual address could come from a completely different physical address.
Virtual memory is allocated in increments of “pages” – all of the bits inside one page of virtual memory are together in the same page of physical memory. (On all of today’s Windows CE devices, a page is 4KB.) If two pages are next to each other in virtual memory, meaning their addresses are contiguous (for example, the 4KB pages at 0x00001000 and 0x00002000), they are probably not next to each other in actual RAM. If you added a new page to the heap and one to the thread stack, and then went back to add a second new page to the heap, the two heap pages would have virtual addresses next to each other but they would not be next to each other in physical RAM.
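As a rough sketch of the arithmetic, the Python below splits a virtual address into a page number and an offset within the page, then translates it through a tiny mapping. The physical page numbers in `page_map` are made up for illustration.

```python
PAGE_SIZE = 4 * 1024  # 4KB pages, as on Windows CE devices

def split_address(virtual_address):
    """Split an address into (virtual page number, offset within the page)."""
    return virtual_address // PAGE_SIZE, virtual_address % PAGE_SIZE

# Hypothetical mapping: virtual page number -> physical page number.
page_map = {1: 0x2A7, 2: 0x013}

def translate(virtual_address):
    page, offset = split_address(virtual_address)
    return page_map[page] * PAGE_SIZE + offset

print(hex(translate(0x00001FFC)))  # 0x2a7ffc
print(hex(translate(0x00002000)))  # 0x13000
```

The two virtual addresses are only four bytes apart, but because they fall in different pages they land in unrelated places in physical RAM. The offset within the page is the only part that carries over unchanged.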
Virtual memory is implemented by using “page tables,” which list, for every virtual address, what physical page corresponds to that address and some properties of the virtual page. Think of the page table as a big array, with one entry for each page of virtual memory. When a running program touches a memory address, the CPU looks into the page tables to find out what physical page is being touched. If there is no physical page there, the CPU raises an exception. The page tables also contain properties like whether the page is read-only or read/write. If an application tries to write to a read-only page, the CPU will raise an exception.
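Here is a minimal Python model of that lookup, taking the "big array" description literally. The table contents and the exception names are assumptions for illustration; real page tables are hardware-defined structures and look nothing like a Python list.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """No physical page is mapped at the touched virtual address."""

class AccessViolation(Exception):
    """A write was attempted on a read-only page."""

# Hypothetical page table: index = virtual page number,
# entry = None (no physical page) or (physical page number, writable flag).
page_table = [None, (7, False), (3, True), None]

def access(virtual_address, write=False):
    """Return the physical address touched, raising as the CPU would."""
    entry = page_table[virtual_address // PAGE_SIZE]
    if entry is None:
        raise PageFault(hex(virtual_address))
    physical_page, writable = entry
    if write and not writable:
        raise AccessViolation(hex(virtual_address))
    return physical_page * PAGE_SIZE + virtual_address % PAGE_SIZE

print(hex(access(0x2010, write=True)))  # 0x3010
```

Reading `0x1000` succeeds, writing to it raises `AccessViolation`, and touching `0x0000` raises `PageFault`.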
Note the directional nature of my definition of virtual memory: it is a mapping from VIRTUAL addresses to PHYSICAL addresses. Not the other way around. In Windows CE at least, it is almost impossible to ask, “For a given physical page, where is that page stored in virtual memory?” First off, there isn’t a one-to-one mapping; the physical page may be in virtual memory more than once, or may not be in virtual memory at all. For example if two processes load the same DLL, we won’t copy the DLL code into memory twice. We’ll copy it once and map two different virtual addresses to the same physical memory. Also, we just don’t keep track of physical pages that way. The Windows CE kernel doesn’t have a big table of information about each page of physical memory. It does have a list of free physical pages that aren’t being used for anything. Beyond that, the page mappings are tracked only in the page tables. The kernel does keep a reference count per page, so that it knows when the last page table reference to the page is removed.
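The shared-DLL case can be sketched in a few lines of Python. This is a toy model with hypothetical names and page numbers, not the actual kernel data structures. It keeps only what the paragraph describes: per-process page tables, a free list, and a reference count per physical page.

```python
from collections import Counter

# Hypothetical per-process page tables: virtual page number -> physical page number.
process_a = {}
process_b = {}
ref_counts = Counter()       # physical page -> number of page table references
free_pages = [12, 11, 10]    # free physical pages (taken from the end)

def map_page(page_table, virtual_page, physical_page):
    page_table[virtual_page] = physical_page
    ref_counts[physical_page] += 1

def unmap_page(page_table, virtual_page):
    physical_page = page_table.pop(virtual_page)
    ref_counts[physical_page] -= 1
    if ref_counts[physical_page] == 0:    # last reference removed
        del ref_counts[physical_page]
        free_pages.append(physical_page)

dll_code = free_pages.pop()               # the DLL code is copied into RAM once
map_page(process_a, 0x400, dll_code)      # ...and mapped at a different virtual
map_page(process_b, 0x7F0, dll_code)      # address in each process

unmap_page(process_a, 0x400)
print(ref_counts[dll_code])               # 1: process B still uses the page
unmap_page(process_b, 0x7F0)
print(dll_code in free_pages)             # True: the page is back on the free list
```

Nothing in this model can answer "which virtual addresses point at physical page 10?" without walking every page table, which is the point of the paragraph above.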
32-bit addresses give you a 4GB address space (2^32 bytes = 4GB), and Windows CE is a 32-bit operating system, so Windows CE has 4GB of address space. Does that mean you need a 4GB RAM chip in your device? No! There will be large areas of virtual address space with no physical memory assigned to them.
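The numbers are easy to check. In the Python below, the 64MB device is an assumed figure picked only to have something to compare against.

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4 * 1024

address_space = 2 ** ADDRESS_BITS            # bytes of virtual address space
virtual_pages = address_space // PAGE_SIZE   # pages the address space can hold

# Hypothetical device with 64MB of RAM (an assumed figure, not from this post).
ram_pages = (64 * 1024 * 1024) // PAGE_SIZE

print(address_space // 2**30)   # 4 (GB)
print(virtual_pages)            # 1048576
print(ram_pages)                # 16384
```

On such a device, at most 16,384 of the roughly one million virtual pages could have RAM behind them at any moment. The rest is empty address space.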
You could stop right there and you’d have a system that uses virtual memory. However most systems today take it a step further, and use the virtual memory system to implement physical memory savings.
Once you have a virtual address space that is different from the physical address space underneath it, you can do more than just leave empty space between allocations. You can also decide not to keep all of the virtually allocated pages in physical RAM at once. For example, when the application creates a new thread, you could reserve virtual address space for the thread stack, to satisfy the previously mentioned requirement that it be contiguous, but not assign physical RAM pages to those addresses. If the executing thread touched a virtual page that had no physical page behind it, the CPU would raise an exception. The kernel would have to catch that exception, recognize that it was an exception on a stack page, allocate a new physical page for the stack and fix up the page tables to point at the new page. Then the kernel could tell the thread to resume execution at the same instruction. This time the thread would find real RAM at that virtual address, and access it successfully. The exception (a “page fault”) is completely handled by the kernel, so the application never knows that the stack memory was not already allocated.
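The sequence above can be modeled roughly as follows. This is a toy Python simulation: the `Kernel` class, its method names and the page numbers are all hypothetical, and a real kernel does this in its exception handler rather than in a function call.

```python
PAGE_SIZE = 4096

class Kernel:
    """Toy model: address space is reserved up front, RAM is committed on first touch."""

    def __init__(self, free_physical_pages):
        self.free_pages = list(free_physical_pages)
        self.page_table = {}     # virtual page -> physical page (committed pages only)
        self.reserved = set()    # virtual pages that belong to some allocation
        self.faults = 0

    def reserve(self, first_page, count):
        self.reserved.update(range(first_page, first_page + count))

    def touch(self, virtual_address):
        """Simulate one memory access and return the physical address."""
        page = virtual_address // PAGE_SIZE
        if page not in self.page_table:
            self.handle_page_fault(page)   # the CPU raised an exception
        # The access is retried and now finds real RAM behind the address.
        return self.page_table[page] * PAGE_SIZE + virtual_address % PAGE_SIZE

    def handle_page_fault(self, page):
        if page not in self.reserved:
            raise MemoryError("access to free (unreserved) address space")
        self.faults += 1
        self.page_table[page] = self.free_pages.pop()   # commit a physical page

kernel = Kernel(free_physical_pages=[5, 9])
kernel.reserve(first_page=0x100, count=16)     # 64KB of contiguous stack addresses
kernel.touch(0x10F000)                         # first touch: page fault, then commit
kernel.touch(0x10F008)                         # same page again: no fault
print(kernel.faults, len(kernel.page_table))   # 1 1
```

Sixteen pages of address space are set aside for the stack, but only the one page that was actually touched costs any RAM.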
I don’t know how widely used this terminology is – it may just be from Win32 – but virtual addresses are either “committed” (have physical memory assigned to them), “reserved” (assigned to an allocation, like a DLL or stack or heap, but don’t have physical memory assigned to them), or “free.”
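One way to picture the three states is as a small state machine. The Python below is a deliberately simplified sketch: the operation names are my own labels, and real Win32 allows shortcuts that this toy model rejects, such as reserving and committing in a single call.

```python
from enum import Enum

class PageState(Enum):
    FREE = "free"
    RESERVED = "reserved"
    COMMITTED = "committed"

# Allowed transitions in this toy model: operation -> (required state, new state).
TRANSITIONS = {
    "reserve":  (PageState.FREE, PageState.RESERVED),
    "commit":   (PageState.RESERVED, PageState.COMMITTED),
    "decommit": (PageState.COMMITTED, PageState.RESERVED),
    "release":  (PageState.RESERVED, PageState.FREE),
}

def apply(state, operation):
    """Return the new state, or raise if the operation is not valid from `state`."""
    required, new_state = TRANSITIONS[operation]
    if state is not required:
        raise ValueError(f"cannot {operation} a {state.value} page")
    return new_state

state = PageState.FREE
for operation in ("reserve", "commit", "decommit"):
    state = apply(state, operation)
print(state.value)   # reserved
```

Only a committed page consumes physical RAM. Reserved and free pages differ only in whether the addresses have been claimed by an allocation.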
Windows CE will “demand commit” pages, meaning that it will usually delay committing them until the last possible moment. There are also some cases in which Windows CE will take committed pages out of memory again, returning them to “reserved” status:
- Since DLL and EXE code can easily be re-loaded from the file system, it is often decommitted.
- Memory-mapped file pages also have well defined files to read from, so those are decommitted as well.
- Thread stacks can shrink; if the system is very low on memory we’ll scavenge the top pages from a stack if the thread is not currently using them.
- Heaps can shrink; if the system is very low on memory we’ll check whether there are heap pages without data in them that can be decommitted.
However that is where Windows CE stops. Other operating systems have a “page file” in which they will write other pages that don’t have corresponding files, notably:
- Allocations from VirtualAlloc()
- Memory-mapped files that don’t actually have files underneath them (CreateFileMapping with INVALID_HANDLE_VALUE)
- The writable data of executables (global variables)
Those operating systems have algorithms to decide that these pages have not been used in a while, and will write them to the page file and decommit the RAM. Windows CE does not have a page file. We’ll demand-commit to delay committing these pages as long as possible, but once they are committed, the operating system won’t page them out again.
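The difference between the two policies comes down to one question: can the page's contents be recovered after its RAM is taken away? Here is a rough Python sketch of that decision. The page kind labels are hypothetical, and the stack and heap shrinking cases are left out because they only reclaim pages that hold no data.

```python
from dataclasses import dataclass

@dataclass
class Page:
    kind: str              # hypothetical labels, see BACKED_BY_FILE below
    committed: bool = True

# Page kinds whose contents can be re-read from an existing file.
BACKED_BY_FILE = {"exe_or_dll_code", "memory_mapped_file"}

def can_page_out(page, has_page_file):
    """True if the page's RAM can be reclaimed without losing data."""
    if not page.committed:
        return False
    return page.kind in BACKED_BY_FILE or has_page_file

pages = [Page("exe_or_dll_code"), Page("memory_mapped_file"),
         Page("virtual_alloc"), Page("unbacked_mapping"), Page("exe_or_dll_data")]

without_page_file = [p.kind for p in pages if can_page_out(p, has_page_file=False)]
with_page_file = [p.kind for p in pages if can_page_out(p, has_page_file=True)]
print(without_page_file)     # ['exe_or_dll_code', 'memory_mapped_file']
print(len(with_page_file))   # 5
```

With `has_page_file=False`, which is the Windows CE case, the last three kinds stay in RAM once committed.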
So, as you see, virtual memory in its most minimal definition is just a mapping between virtual addresses and physical addresses. That mapping lets you lay out allocations in the address space efficiently and avoid wasting physical memory on unallocated address space. In more practical terms, we also use the virtual address space to implement paging, which avoids wasting physical memory on allocated addresses that are not actively being used.
And finally to get to the most pressing reason I wrote this post. People have asked, does the fact that Windows CE 6 changes from having 32MB of virtual memory per process to 2GB of virtual memory per process mean that CE 6 devices now require more RAM? And the answer is, absolutely not. Our address space just became sparser – there are more free gaps between allocations.