Myth: Without /3GB a single program can’t allocate more than 2GB of virtual memory


Virtual memory is not virtual address space (part 2).

This myth is being perpetuated even as I write this series of articles.

The user-mode virtual address space is normally 2GB, but that doesn't limit you to 2GB of virtual memory. You can allocate memory without it being mapped into your virtual address space. (Those who grew up with Expanded Memory or other forms of bank-switched memory are well-familiar with this technique.)

HANDLE h = CreateFileMapping(INVALID_HANDLE_VALUE, 0,
                             PAGE_READWRITE, 1, 0, NULL);

Provided you have enough physical memory and/or swap file space, that 4GB memory allocation will succeed.
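The 1 and 0 in that call are the high and low 32-bit halves of a single 64-bit size, so together they mean 2^32 bytes, or 4GB. As a rough sketch of the split in portable C (the helper name split_size is made up for illustration and is not part of the Windows API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): splits a 64-bit byte count
   into the two 32-bit halves that CreateFileMapping takes. */
void split_size(uint64_t size, uint32_t *high, uint32_t *low)
{
    assert(high != NULL && low != NULL);
    *high = (uint32_t)(size >> 32);         /* would go in dwMaximumSizeHigh */
    *low  = (uint32_t)(size & 0xFFFFFFFFu); /* would go in dwMaximumSizeLow  */
}
```

Passing 4294967296 (4GB) yields a high part of 1 and a low part of 0, matching the call above.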

Of course, you can't map it all into memory at once on a 32-bit machine, but you can do it in pieces. Let's read a byte from this memory.

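BYTE ReadByte(HANDLE h, DWORD offset)
{
 SYSTEM_INFO si;
 GetSystemInfo(&si);
 // A view must start on a multiple of the allocation granularity,
 // so round the offset down and remember the remainder.
 DWORD chunkOffset = offset % si.dwAllocationGranularity;
 DWORD chunkStart = offset - chunkOffset;
 // Map just enough of the chunk to reach the requested byte.
 LPBYTE pb = (LPBYTE)MapViewOfFile(h, FILE_MAP_READ, 0,
      chunkStart, chunkOffset + sizeof(BYTE));
 BYTE b = pb[chunkOffset];
 UnmapViewOfFile(pb);
 return b;
}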

Of course, in a real program, you would have error checking and probably a caching layer in order to avoid spending all your time mapping and unmapping instead of actually doing work.
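To give a feel for what such a caching layer might look like, here is a minimal sketch that keeps one window mapped and remaps only when a read falls outside it. Everything named here is an assumption for illustration: the map and unmap callbacks stand in for MapViewOfFile and UnmapViewOfFile, and demo_store is a fake backing store (an ordinary array) so the logic can be exercised without Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for MapViewOfFile and
   UnmapViewOfFile, so the caching logic can be shown on its own. */
typedef unsigned char *(*map_fn)(void *ctx, uint64_t start, size_t length);
typedef void (*unmap_fn)(void *ctx, unsigned char *view);

typedef struct {
    map_fn   map;
    unmap_fn unmap;
    void    *ctx;         /* passed through to the callbacks */
    uint64_t total;       /* size of the whole allocation in bytes */
    uint64_t window;      /* view size; a multiple of the allocation granularity */
    uint64_t view_start;  /* offset at which the current view begins */
    unsigned char *view;  /* NULL while nothing is mapped */
} view_cache;

/* Reads one byte at 'offset', remapping only when the offset falls
   outside the currently mapped window. Returns 0 on success, -1 on error. */
int cache_read_byte(view_cache *c, uint64_t offset, unsigned char *out)
{
    assert(c != NULL && out != NULL && c->window != 0);
    if (offset >= c->total)
        return -1;

    uint64_t start = offset - offset % c->window;
    if (c->view == NULL || c->view_start != start) {
        if (c->view != NULL) {
            c->unmap(c->ctx, c->view);
            c->view = NULL;
        }
        uint64_t remaining = c->total - start;
        size_t length = (size_t)(remaining < c->window ? remaining : c->window);
        c->view = c->map(c->ctx, start, length);
        if (c->view == NULL)
            return -1;
        c->view_start = start;
    }
    *out = c->view[offset - start];
    return 0;
}

/* Stand-in backing store for demonstration: "maps" by pointing into an
   ordinary array and counts how many times a view was requested. */
typedef struct {
    unsigned char *bytes;
    int map_calls;
} demo_store;

unsigned char *demo_map(void *ctx, uint64_t start, size_t length)
{
    demo_store *s = (demo_store *)ctx;
    (void)length;
    s->map_calls++;
    return s->bytes + start;
}

void demo_unmap(void *ctx, unsigned char *view)
{
    (void)ctx;
    (void)view;
}
```

In a Windows program, the map callback would presumably split start into its high and low halves and pass them to MapViewOfFile. A one-entry cache is the simplest choice; a program with scattered access patterns would likely want several windows and an eviction policy.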

The point is that virtual address space is not virtual memory. As we have seen earlier, you can map the same memory to multiple addresses, so the one-to-one mapping between virtual memory and virtual address space has already been violated. Here we showed that just because you allocated memory doesn't mean that it has to occupy any space in your virtual address space at all.

[Updated: 10:37am, fix minor typos reported in comments.]

Comments (40)
  1. Anonymous says:

    Raymond, didn’t you mean:

    HANDLE h = CreateFileMapping(INVALID_HANDLE_VALUE, 0

    PAGE_READWRITE, -1, 0, NULL);

    ^^

  2. Anonymous says:

    Raymond is right – take a look into MSDN for a description of CreateFileMapping parameters.

  3. Anonymous says:

    Mm. If we’re going to be critical I’ll point out a missing comma before PAGE_READWRITE and that I’d normally use NULL, rather than 0, for lpAttributes (especially if you’re going to use NULL for lpName). It’s a style point, obviously.

  4. Anonymous says:

    Ooops, yep. Ahhh mornings.

  5. Anonymous says:

    Is this technique worth the trouble? Can it give reasonable performance?

    Assume for the moment that I’m an ignorant manager ;-)

    I have a CAD like application where users are building 3D models that won’t fit in 2GB and some day real soon won’t fit in 3GB. Is this a viable option or will performance suck?

    Could you use this in conjunction with some sort of ram disk?

    Or is all of this a waste of time and we should just plan on using a 64-bit proc when the OS is ready?

  6. Anonymous says:

    The mapping virtual memory to multiple addresses trick is very handy. I keep meaning to write an article for MSDN magazine on it; it’s very useful for high-throughput networking – and for high-speed MPEG decoders ;-)

    The only trick is that there’s no easy way of doing an all-in-one mapping of the memory to two adjacent blocks; you always run the risk that the memory you set aside is going to be gazumped before you can map the two virtual memory areas into adjacent sections.

    I keep meaning to find someone in the kernel team to ask if they’ll add this in as a feature.

  7. Anonymous says:

    You said you were going to write a byte, but you read it instead :)

  8. Anonymous says:

    Simon, out of curiosity, how is that mapping virtual memory trick handy? It seems that the biggest problem with that trick is that you never have fine control over where these pages are located.

    Now, if I could go and allocate a large set of pages, and only then subdivide them and point them to varying places, that would be cool.

  9. Anonymous says:

    All someone has to do now is write a virtual-virtual page manager library on top of this, and you can access terabytes? petabytes? of memory. Build your backing disk file, and put this vvpm on top of it.

    Performance? Well, then you’d need more than 4GB of RAM, in which case why didn’t you just go get a 64bit processor in the first place.

  10. Anonymous says:

    John:

    Allocate a memory mapped section – 64kb in size is the minimum due to Windows platform constraints.

    Reserve a block of memory 128kb in length in the memory map using VirtualAlloc.

    Now, release those blocks, and map the memory mapped section into the 128kb area twice – one after the other.

    You can now use the 128kb block as a persistent ring buffer that acts in an optimal fashion; when the head pointer moves into the second 64kb block, you adjust the head and tail back into the first 64kb block. You never run out of memory, never have to compact the buffer, and you can access it as fast as the processor can handle (unlike other ring buffer implementations which often force you to read data out one byte at a time).

  11. Anonymous says:

    Simon, neat trick, I’m not sure I fully understand this but if VirtualAlloc allocates memory in your process’ (virtual) address space, shouldn’t it be safe from being "gazumped" (unless you "gazump" it yourself)?

  12. Anonymous says:

    "one-to-one mapping between virtual memory and virtual address space"

    I think the wording is a bit confusing here. I think what Raymond’s explanation really amounts to is "one-to-one mapping between virtual memory and physical memory". And this mapping is indeed not one-to-one: multiple VAs can be mapped to the same PA, but one VA only maps to one PA in a single address space.

    Take his sample program, for demonstration purposes, especially the one referred to in his link "you can map the same memory to multiple addresses": you will be granted two virtual addresses (actually logical addresses, but on most modern ia32 OSes, logical address and virtual address are almost always interchangeable terms). They point to the same data in (now here is the secret) physical memory. This can be easily verified if you know how to use a kernel inspection tool. For example, in my case, I got two pointers, 0x8a0000 and 0x8b0000, on Windows 2000; the PTEs that map those two virtual addresses live at virtual addresses 0xc0002280 and 0xc00022c0. And if you go to both PTE addresses and display the contents, they are 0x095fnnnn and 0x095fmmmm, remember the PTE_MASK? :) So the real physical address for both virtual memory addresses is 0x095f0000.

    In conclusion, multiple VAs can be mapped to the same PA, but one VA only maps to one PA in a single address space.

    Raymond, I enjoyed your writing and keep up the good job!

  13. Anonymous says:

    "0x095fnnnn and 0x095fmmmm" are actually

    0x095ff0nn and 0x095ff0mm

    and btw, the PA changes every time you run the program, but the two VAs rarely change because after all they are virtual :)

    sorry about the typo.

    ps, would be nice to be able to edit comments

  14. Anonymous says:

    I stand by my original statement. They both map the same virtual memory, which, since it happens to be present, also maps to the same physical memory. But if it gets paged out, then it doesn’t exist in physical memory yet still exists in virtual memory. And that virtual memory is still shared.

  15. Anonymous says:

    Raymond, physical memory doesn’t necessarily mean RAM, it’s broader term referred to actual data storage, thus it can be pagefile. When the data is swapped out, the pte is marked as non-present, however, it’s a fake non-present, its bits actually point to the location of the data in the pagefile.

  16. Anonymous says:

    "They both map the same virtual memory" is really bad wording imo. They are really same physical memory, different virtual memory.

  17. Anonymous says:

    Andreas:

    The problem is that what I do is reserve the memory space using VirtualAlloc, and then I have to release my reservation on that space in order to set up the memory mapping there.

    Goes like this:

    1. Reserve 128kb span of virtual memory space

    2. Get pointer to 128kb span.

    3. Release 128kb span of virtual memory space.

    4. Map 64kb at pointer.

    5. Map 64kb at pointer + 64kb.

    That’s why I want a kernel call; there’s a chance that (especially if I’m using 3rd party COM objects) something else in another thread can allocate a new page in my virtual memory map for its own heap – and if it happens between 3 and 5, there’s a chance that my allocation will fail.

    Steps 1 through 3 are used in an effort to (hopefully!) make it more likely that steps 4 and 5 will succeed. But it’s not foolproof.

  18. Anonymous says:

    "Physical memory doesn’t necessarily mean RAM, it’s broader term referred to actual data storage, thus it can be pagefile." Perhaps that’s your definition of physical memory, but I think you’ll find that in most texts, physical memory means "memory chips". Certainly that’s what the x86 CPU manuals mean.

    I’ve made a note to myself to draw a diagram in a future entry to explain how virtual address space, virtual memory, and physical memory all fit together, since it seems there is a lot of confusion over it.

  19. Anonymous says:

    Dead mpz. No.

  20. Anonymous says:

    Oops, Freudian slip there.

  21. Anonymous says:

    mpz – no, we can’t. Some of these "hacks" are actually really damn efficient ways of using the system. Moving to AMD64 doesn’t have anything to do with them one way or another.

  22. Anonymous says:

    Simon:

    Yeah, I got that, but if you do this setup at a time when there’s only one thread (e.g. at startup) it should be safe, right?

  23. Anonymous says:

    Another variation on allocating and using more memory than your address space holds is the Address Windowing Extensions (AWE) API set, which allows user applications to use up to 64 GB of physical non-paged memory in a 32-bit virtual address space on 32-bit platforms, with windowed views to this physical memory from within the application’s virtual address space.

    It has been used for quite a few years on 32-bit Intel Xeon systems, which support 36-bit physical addresses.

  24. Anonymous says:

    Simon – I’m not sure that mapping the buffer twice helps speed that much. Okay, it means that you don’t have to access byte-by-byte the whole time, but in practice you wouldn’t do that anyway, and you still have to do the comparison to buffer-size in order to wrap once you have gone off the end.

    Having said all that, I haven’t actually tested it. Might do so in a bit. :)

    Regarding the interleaving of memory – with all the stuff Windows lets you do to its internals, isn’t there a call to temporarily lock the scheduler?

    (By the way – still waiting for Statues of Ice…)

  25. Anonymous says:

    Hmm, well, what it comes down to is letting the system do some buffer handling and caching for you (which I guess is what virtual memory and paging are basically about). Your suggestion, Raymond, doesn’t give you 4GB addressability; it just lets you look at a window into that data inside your 2GB or 3GB space, which you have to manage with some not-so-convenient code. I don’t know the physical addressing limit of current Intel chips, but given it’s a 32-bit architecture I guess it’s at most 4GB (well, there’s some real address extension, PAE?, used by Datacenter Server, I believe). Anyway, quite a bit of other stuff, like bits of the OS and other apps and data, has to fit into that (assuming you even have so much real storage), so you probably end up paging most of a 4GB buffer. While such functions as (virtual?) file mapping maybe make addressing a bit easier, and enable you to store stuff simply in the page file, you could also do it yourself with the file system. But then of course you might be double paging some of the data without knowing it. The fact is, you can’t just take a linear address space (or range) of say 4GB and address it directly, which is what would be nice, and make the programming much simpler. Some architectures let you do just that, regardless of how much real storage or virtual storage address range in a normal address space is available.

  26. Anonymous says:

    "Certainly that’s what the x86 CPU manuals mean." Show me where this manual is. In fact, the Intel ia32 manuals deliberately avoid a specific definition of what physical address (memory) really means. It’s not only my definition but the definition shared by many other Windows hackers such as Schreiber and the Sysinternals wizards.

    Looking forward to your diagram explaining the relation between "virtual address space" and "virtual memory".

  27. Anonymous says:

    The ia32 system manual does make a distinction between physical memory (RAM) and the disk pagefile. The Windows implementation (Linux does this too, I believe) uses non-present PTEs to locate pagefile data; therefore, the more exact term is physical address, which we use to refer to actual data storage in the system.

  28. Anonymous says:

    Dear God. Can’t we all just go AMD64 *right now* and forget these memory hacks ever existed?

  29. Anonymous says:

    "you can’t just take a linear address space (or range) of say 4GB and address it directly"

    Please elaborate on what you mean by "address it directly". You can do that today. It’s called a 32-bit pointer.

    By physical memory I mean things which are addressed via physical addresses.

  30. Anonymous says:

    Raymond, I would close the comments on this article now before everyone gets even more confused :)

  31. Anonymous says:

    Raymond –

    "Please elaborate on what you mean by "address it directly". You can do that today. It’s called a 32-bit pointer."

    Yeah, I’m just a dumb guy. But I’d have thought my meaning was actually pretty clear from everything else I wrote, I’m talking about addressing the 4GB you’ve mapped to what you’ve chosen to call "virtual memory". You can’t address that directly, only as you say yourself by mapping parts of it to a window in your 2GB address space.

    With the hardware mechanisms I’ve mentioned in a previous thread, data in a separate address space can have a full range of say 4GB which can be directly addressed by suitable hardware registers, for what it’s worth.

  32. Anonymous says:

    Alas that’s not how the i386 works; more on this is scheduled for a future entry.

  33. Anonymous says:

    Andreas wrote:

    > Yeah, I got that, but if you do this setup

    > at a time when there’s only one thread (e.g.

    > at startup) it should be safe, right?

    Because of load-time DLL injection, and depending on the version of the OS and which libraries it needs, there’s no real way of 100% ensuring that.

    Paul Walker wrote:

    > Simon – I’m not sure that mapping the buffer

    > twice helps speed that much. Okay, it means

    > that you don’t have to access byte-by-byte

    > the whole time, but in practice you wouldn’t

    > do that anyway, and you still have to do the

    > comparison to buffer-size in order to wrap

    > once you have gone off the end.

    You only have to do the comparison to buffer size when you add or remove data from the buffer – which is infrequently. And even then, you only need to adjust your head pointer when you remove data.

    Compare it with the other ring buffer implementations out there on CodeProject et al. Either you can make things really complicated, and deal with the buffer in two halves, or you access it a byte at a time.

    Also, another advantage of this is that if you use it with overlapped IO, you can set things up such that you can be feeding data into one end of the buffer, consuming it from the other, and whenever you try to do a Send or Write, you can always send the whole contents of the buffer in one go. If you’re using zero-copy writes/sends, this becomes quite important (eg. high-perf networking).

    > Regarding the interleaving of memory – with

    > all the stuff Windows lets you do to its

    > internals, isn’t there a call to temporarily

    > lock the scheduler?

    Nope. At least, not in user-mode code. That’d kind of defeat the point of the pre-emptiveness of the scheduler. You can temporarily boost yourself to realtime priority, but that (again) isn’t 100% foolproof – you could have other realtime threads to compete with. Another option is to suspend all other threads in your process, and then resume them, but you risk setting up race conditions and losing state that way.

  34. Anonymous says:

    Regarding all of this… Linux makes distinctions between various different types of address, such as:

    * user virtual addresses (seen by user-space code)

    * physical addresses (used between CPU and memory)

    * bus addresses (used between peripherals and mem)

    * kernel logical addresses

    * kernel virtual addresses

    I’d imagine Windows makes at least the same distinctions.

    For what it’s worth, I’ve never heard physical memory used to refer to anything except RAM chips. Certainly not swap.

  35. Anonymous says:

      As Evan already mentioned on his blog, Raymond Chen has a great series on /3GB switch on his blog. What is really cool is that Raymond takes on some myths about the /3GB switch and  the fact that he…

  36. Anonymous says:

    "Allocate means it is available in my address space."

    No! You missed the whole point of this entry.

    "Mapped" means it is available in your address space. "Allocate" means that the memory is committed to you.

    You can "actually" use the memory – my ReadByte function shows how. It’s not convenient, but it is nevertheless actual.

  37. Anonymous says:

    Something that you are failing to deal with in this series of articles, that makes all their information lost:

    explain the difference between allocated memory and mapped memory.

    If i want to allocate 2.5GB of memory, and i don’t have the /3GB switch, then i can’t use 2.5GB of memory.

    You try to talk about tricks of mapping parts of files and whatnot. But i want to map a 2.5GB file, i want to access it all; and i can’t. And that is because without the /3GB switch i cannot allocate more than 2GB.

    Dwell on the difference between *technically* somewhere there is more than 2GB of memory set aside for stuff, and *actually* not being able to use more than 2GB.

    Allocate means it is available in my address space.

  38. Anonymous says:

    I started programming on x86 machines during a period of large and rapid change in the memory management
