Who ate my memory?

The consumer versions of 32-bit Windows XP and Vista have a stated limit of 4 GB RAM, but a practical limit of about 3.1 GB. A lot of partial explanations have been floating around, so I thought I would try my hand at clearing up the issue. (Wish me luck!)

The design of the Intel 386 architecture supported access to up to 4 GB of physical memory (32-bit physical addresses) and unlimited virtual memory (4 GB at a time via 32-bit virtual addresses). 4 GB of physical memory seemed quite unthinkable at the time the chip was released, so the CPU itself did not have enough address pins to do this. Back then a 32-bit address space seemed extravagant for anything less than a supercomputer or mainframe. Nowadays, you can get 4 GB for under $400, and what was unthinkable in 1986 is within reach of anybody thinking about a new computer.

So at least I can access 4 GB, right? Nope.

The original IBM PC’s processor could access 1024 KB of physical address space, but you could only use 640 KB for RAM. The remaining 384 KB of address space was reserved for memory-mapped hardware and ROM. A similar situation exists with current systems: hardware reserves large chunks of the upper 1 GB of physical address space. Because of these reserved areas, a system with a 32-bit physical address space will be limited to somewhere around 3.1-3.5 GB of RAM.

To overcome the 32-bit limitation, recent x86 CPUs (Pentium Pro and later) have 36 address pins and can address 64 GB of RAM. The original design of the x86 32-bit protected mode only provided access to 32-bit physical addresses, so PAE (Physical Address Extension) mode was created to allow access to 36-bit addresses.
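If you want to check the arithmetic behind those numbers, here is a quick sketch. The helper name addressable_bytes is mine, purely for illustration.

```python
GIB = 1 << 30  # bytes in one GB as used in this article (2**30)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given number of address bits."""
    return 1 << address_bits


for bits in (32, 36):
    print(f"{bits} address bits -> {addressable_bytes(bits) // GIB} GB")
# 32 address bits -> 4 GB
# 36 address bits -> 64 GB
```

Each extra address bit doubles the reachable space, so the 4 extra bits multiply the 4 GB limit by 16.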

PAE mode changes the layout of the page tables. Page tables map virtual addresses to physical addresses. Without PAE, the 32-bit virtual addresses map through 2 levels of page tables (1 level for huge pages) and are translated to 32-bit physical addresses. With PAE, the 32-bit virtual addresses map through 3 levels of page tables (2 levels for huge pages), and each page table entry grows from 32 bits to 64 bits so that it can hold a physical address wider than 32 bits.
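To make the 2-level versus 3-level difference concrete, here is a rough sketch of how a 32-bit virtual address is carved into table indexes for ordinary 4 KB pages. The bit widths (10/10/12 without PAE, 2/9/9/12 with PAE) are the standard x86 layouts; the function names and dictionary keys are just labels I made up for this example.

```python
def split_non_pae(va: int) -> dict:
    """Classic 32-bit paging: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return {
        "directory": (va >> 22) & 0x3FF,
        "table": (va >> 12) & 0x3FF,
        "offset": va & 0xFFF,
    }


def split_pae(va: int) -> dict:
    """PAE paging: 2-bit pointer-table index, two 9-bit indexes, 12-bit offset."""
    return {
        "pdpt": (va >> 30) & 0x3,
        "directory": (va >> 21) & 0x1FF,
        "table": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


va = 0xC0401234
print(split_non_pae(va))  # {'directory': 769, 'table': 1, 'offset': 564}
print(split_pae(va))      # {'pdpt': 3, 'directory': 2, 'table': 1, 'offset': 564}
```

The indexes shrink from 10 bits to 9 because the entries double in size: a 4 KB table holds 1024 four-byte entries but only 512 eight-byte entries. The bits that no longer fit are what the extra top level (with just 4 entries) is for.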

PAE doesn’t do anything to the virtual memory limit. Pointers are still 32 bits, so a process can only access 4 GB of address space at a time. However, using PAE, two or more processes could each access a different 4 GB of physical memory. With proper operating system support (i.e. AWE, Address Windowing Extensions, on Windows operating systems) PAE also allows a process to allocate additional memory outside its normal address space, then swap portions of that additional memory into its address space as needed.

So PAE is the answer, right? Well, maybe…

One thing that can prevent access to more than 4 GB of RAM is motherboard design. PAE can only access 64 GB of memory if all 36 address pins are properly wired up on the motherboard. This is not always the case, since those extra 4 wires make the motherboard just a little bit more expensive to design and manufacture (and use just a little bit more power). Many motherboards (especially on laptops) only have 32 address pins connected. If that is the case, no OS will be able to access more than 4 GB of address space.

Another hardware limitation is the ability of the chipset to remap RAM. If you have 4 GB of RAM, and 600 MB of address space is used up by PCI/AGP reserved areas, the only way to access the top 600 MB of RAM is to remap it into the addresses above the 4 GB boundary. Not all chipsets are able to do this, so some systems will just waste any RAM that happens to be shadowed by a PCI/AGP reserved region.
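Here is a deliberately simplified model of that remapping arithmetic. It assumes the reserved region sits directly below the 4 GB boundary and that the chipset either relocates all of the shadowed RAM above 4 GB or none of it. The function and parameter names are hypothetical, not anything a real BIOS or OS exposes.

```python
MIB = 1 << 20
GIB = 1 << 30
FOUR_GIB = 4 * GIB


def usable_ram(installed: int, reserved: int,
               chipset_remaps: bool, os_addresses_above_4g: bool) -> int:
    """Bytes of RAM the OS can actually reach under this simplified model."""
    hole_start = FOUR_GIB - reserved          # first address claimed by hardware
    below_hole = min(installed, hole_start)   # RAM that is always reachable
    beyond_hole = installed - below_hole      # RAM shadowed by the hole (or past it)
    if chipset_remaps and os_addresses_above_4g:
        # Shadowed RAM is relocated above 4 GB, and the OS is willing to go there.
        return below_hole + beyond_hole
    return below_hole


installed, reserved = 4 * GIB, 600 * MIB
print(usable_ram(installed, reserved, False, False) // MIB)  # 3496
print(usable_ram(installed, reserved, True, True) // MIB)    # 4096
```

Note that remapping alone is not enough in this model: if the OS refuses to use addresses above 4 GB, the relocated RAM is just as unreachable as it was when it was hidden behind the reserved region.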

My BIOS reports 4 (or more) GB of RAM, I’ve enabled PAE, and I still only see 3.1 GB. What gives?

Unless you’re running one of the advanced server varieties of Windows, you won’t see more than 4 GB of physical memory. This is a limitation of Windows designed (I assume) to encourage people building expensive servers to pay more for Windows than those who are using it for normal day-to-day activities.

As for that last 0.9 GB, it all comes down to drivers and system stability. Not all drivers behave well in the presence of 64-bit physical addresses. Many driver authors assume that only the bottom 32 bits of the physical address are valid. Others don't properly handle the creation of bounce buffers when necessary (they’re needed when transferring data between a hardware device and a buffer that is above the 4 GB mark in physical memory).
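To illustrate the kind of bug involved, here is a toy sketch in Python rather than real driver code. It shows what happens when a 64-bit physical address gets squeezed into 32 bits, plus the check a careful driver would make before handing a buffer to a device that can only address 32 bits. Both function names are invented for this example.

```python
def buggy_driver_address(physical_address: int) -> int:
    """Mimics a driver that keeps only the low 32 bits of a physical address."""
    return physical_address & 0xFFFFFFFF


def needs_bounce_buffer(physical_address: int, length: int,
                        device_address_bits: int = 32) -> bool:
    """True if any byte of the buffer lies beyond what the device can address."""
    return physical_address + length > (1 << device_address_bits)


buffer_address = 0x1_0000_2000  # a buffer just above the 4 GB mark
print(hex(buggy_driver_address(buffer_address)))  # 0x2000
print(needs_bounce_buffer(buffer_address, 4096))  # True
print(needs_bounce_buffer(0x2000, 4096))          # False
```

The truncated address 0x2000 is a perfectly valid location that belongs to something else entirely, so the device ends up reading or overwriting unrelated memory. That is why this class of bug tends to show up as random crashes rather than a clean error.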

Windows XP originally supported a full 4 GB of RAM. You would be limited to 3.1-3.5 GB without PAE, but if you enabled PAE on a 4 GB system with proper chipset and motherboard support, you would have access to the full 4 GB. As more people began to take advantage of this feature using commodity (read: cheapest product with the features I want) hardware, Microsoft noticed a new source of crashes and blue screens. These were traced to drivers failing to correctly handle 64-bit physical addresses. A decision was made to improve system stability at a cost of possibly wasting memory. XP SP2 introduced a change such that only physical addresses that fit in 32 bits (that is, below the 4 GB mark) will ever be used, even if that means some memory will not be used. (This is also the case with 32-bit editions of Vista.) While this is annoying to those who want that little bit of extra oomph, and while I would have liked a way to re-enable the memory “at my own risk”, this is probably the right decision for 99.9% of the general population of Windows users (and probably saves Dell millions in support costs). See the relevant KB article and a TechNet article for details.

Some of the server operating systems still allow the use of larger amounts of memory. I’m guessing that this is done with the assumption that higher quality parts will be used and drivers will be more likely to have been tested in PAE mode with large addresses.

Side-note: PAE is also related to page execution protection, called "hardware DEP" (Microsoft term), "NX" (AMD term), and "XD" (Intel term). In 32-bit x86 processors, this can only be used in PAE mode. This is why you might see PAE mode used even on systems with less than 4 GB of memory.

Performance note: 3-level page table lookups are inherently slower than 2-level page table lookups. However, the processor has substantial dedicated circuitry that usually eliminates most of the performance impact.