There is no /8TB flag on 64-bit Windows


A customer reported that their 64-bit application was crashing on Windows 8.1. They traced the problem back to the fact that the user-mode address space for 64-bit applications on Windows 8.1 is 128TB, whereas it was only 8TB on earlier versions of Windows for x64. They wanted to know whether there was a compatibility mode that they could set for their application to say "Give me only 8TB please."

No, there is no compatibility mode to shrink the user-mode address space down to 8TB.¹ You get it all, like it or not.

Allowing 64-bit applications to opt out of the 128TB address space has implications for the rest of the system, such as reducing the strength of ASLR.

As for how they ended up having a dependency on the address space being at most 8TB, they didn't say, but I have a guess: They are using the unused bits for tagging.

If you are going to use tagged pointers, you need to put your tag bits in the least significant bits, since those are bits you control. For example, if you align all your objects on 16-byte boundaries, then you have four available bits for tagging. If you're going to use upper bits for tagging, at least verify that those upper bits are available.
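To make the low-bits approach concrete, here is a minimal sketch in C. It assumes every object is 16-byte aligned, as in the example above. The helper names (`tag_pointer`, `untag_pointer`, `get_tag`, `upper_bits_are_free`) are made up for illustration and do not come from any Windows header. The last helper shows the kind of check you would want before borrowing upper bits, and it assumes 64-bit addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_BITS 4u
#define TAG_MASK (((uintptr_t)1 << TAG_BITS) - 1)   /* 0xF: the four low bits */

/* Pack a small tag into the low bits of a 16-byte-aligned pointer. */
static inline uintptr_t tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* alignment is what makes these bits ours */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Strip the tag to recover the original pointer. */
static inline void *untag_pointer(uintptr_t tagged)
{
    return (void *)(tagged & ~TAG_MASK);
}

/* Read the tag back out. */
static inline unsigned get_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}

/* Hypothetical sanity check for upper-bit tagging: returns nonzero only if
 * the top `bits_wanted` bits of a 64-bit address are all zero.
 * Valid for bits_wanted in the range 1..63. */
static inline int upper_bits_are_free(uint64_t address, unsigned bits_wanted)
{
    assert(bits_wanted >= 1 && bits_wanted <= 63);
    return (address >> (64 - bits_wanted)) == 0;
}
```

If you do insist on upper bits, run a check like the last one on every pointer before you tag it, and fail loudly when it does not hold. That way a larger address space produces a clear error instead of a corrupted pointer.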

¹ If anything, the user-mode address space will only grow over time. The original 8TB limit in earlier versions of Windows was due to the lack of a CMPXCHG16B instruction, but now that the instruction is available (and support for it deemed mandatory), the full 256TB address space is available to the operating system, and right now, it decides to split the address space evenly between user mode and kernel mode.

Comments (36)
  1. Brian_EE says:

    What about a /640K switch for my program? Because my program should never need more than 640KB.

  2. anonymouscommenter says:

    Having debugged a similar one, it's possible they had a similar mistake to mine: high bits of a pointer were being sheared off by some erroneous programming. Changes to 8.1 ASLR made the bug far easier to trigger. Oh and it was a Word plugin and our customers press send to Microsoft and the stack frame is hosed. The Word team must have had fun with that one.

  3. anonymouscommenter says:

    The irony of that quote is that Gates never said it, and there's never been any evidence for him having ever said it.

  4. anonymouscommenter says:

    Is there a reason people don't just use full-blown structs for this?  Because I'm pretty sure this is (at best) non-portable even if you do stick to the low bits.  You can't know that the architecture is aligned in any particular way, and x86 doesn't even require proper alignment.

  5. anonymouscommenter says:

    @Kevin: Pointer-sized objects can be written atomically in flat architectures.

  6. anonymouscommenter says:

    @Brian_EE: And then some unsporting person runs your program under a memory manager that borrows the VGA and MDA framebuffers to allow a whopping 736k of address space, and it doubtless fails in interesting ways. cf the "Packed file is corrupt" errors when DOS was loaded high.

    @Anon: The Windows 95 resource kit does say that 2G address space is enough for any desktop application, which will sound just as ridiculous one day.

  7. anonymouscommenter says:

    If the solution to your problem involves crippling the system, that should generally be a red flag that you're doing something wrong.

  8. Azarien says:

    Don't tag your pointers. Simple.

  9. Karellen says:

    @Kevin - space saving. If you've got a value which is a pointer, and you want to store only 2 or 3 bits extra with it, then using a struct for that on a 64-bit system will double your per-value storage space to 128 bits, wasting around 60 bits/value. If you've got a few hundred million of these values, the wasted space can really add up.

    One notable example of this is in weakly-typed (scripting) languages, where all variables contain a "value" which can be of any type. Getting an efficient storage mechanism for *every single value* used by all the programs written in that language is very important. Tagged pointer, or tagged float/double values, are popular ways of achieving this. See:

    wingolog.org/.../value-representation-in-javascript-implementations

    Rob Sayre's post "Mozilla’s New JavaScript Value Representation" is no longer at its original home, but an archive is about 1/3 of the way down:

    evilpie.github.io/.../cache.aspx.htm

    I also think the Linux kernel uses (used?) tagged pointers in some of the large, heavily used caches, like the directory entry/inode cache, or the page table tree.
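    The space argument is easy to check. This is only a sketch, and the type names `ptr_with_tag` and `tagged_ptr` are invented for the illustration. On typical 64-bit ABIs, padding rounds the struct up to two pointer widths, while the tagged value stays at one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The "full-blown struct" alternative: a pointer plus a separate tag byte.
 * Alignment padding typically rounds this up to two pointer widths. */
struct ptr_with_tag {
    void   *ptr;
    uint8_t tag;
};

/* The tagged-pointer alternative: the tag lives in spare bits of the
 * pointer itself, so the value stays one pointer wide. */
typedef uintptr_t tagged_ptr;

/* Bytes needed to store n values in each representation. */
static inline size_t bytes_as_structs(size_t n) { return n * sizeof(struct ptr_with_tag); }
static inline size_t bytes_as_tagged(size_t n)  { return n * sizeof(tagged_ptr); }
```

    The exact struct size is up to the compiler and ABI, so treat the factor of two as the common case and not a guarantee.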

  10. anonymouscommenter says:

    Before I modified our crash reporter to use a semi-documented technique to get the true Windows version number for crash reports, it reported NT 6.2 (Windows 8.0) for Windows 8.1.  To distinguish 8.1 from 8.0, I looked in crash reports for system DLLs in the 0x00007FFFxxxxxxxx range instead of the 0x000007FFxxxxxxxx range.

    Yesterday, I debugged a problem at first seemingly triggered by Windows 10.  It turned out that some old code wasn't ported to 64-bit correctly, and our program on Windows 10 for some reason ends up allocating the whole lower 4 GB of address space, so the program crashed when that object got allocated above 4 GB.

  11. waleri says:

    I am not sure whether my math is correct, but 256 TB sounds like 48 bits.

    Did they put their tagging bits in the *middle*?

    I would've used either least significant bits or most significant bits, either way I would've been in the clear.

  12. anonymouscommenter says:

    @waleri: "I would've used either least significant bits or most significant bits, either way I would've been in the clear."

    Unless you had more than 16 flags.

  13. anonymouscommenter says:

    The upper 16 bits of VAs are off limits anyway due to x86-64 pointer canonicalization...

    [And this is precisely why people want to use them as tag bits. Because no valid pointer will use them. -Raymond]

  14. anonymouscommenter says:

    Why does Windows use only 48 bits for address space on a 64-bit system?

  15. anonymouscommenter says:

    Ooops. I should have refreshed my browser before posting that.

  16. anonymouscommenter says:

    @Myria: to get the version number, extract the version info from kernel32.dll. Masking GetVersion/GetVersionEx was not very smart of MS.

  17. anonymouscommenter says:

    What happened to bending over backwards for backwards compatibility? (E.g. only using 8TB address space unless the application manifest specifies that it's Very-Large-Address-Aware)

    [Can the "backward compatibility for buggy apps is good" commenters and the "backward compatibility for buggy apps is bad" commenters get into a room, and then somebody tell me who wins? -Raymond]

  18. anonymouscommenter says:

    Anonymous Coward: they invented the NT kernel so that buggy applications don't take down the whole OS when they crash, and then it's not their problem anymore

  19. anonymouscommenter says:

    I read the linked article but I'm still not clear as to what that instruction has to do with the address space limit.

  20. anonymouscommenter says:

    Thanks, 'not an anon'. Google 'x86-64 pointer canonicalization'; the Wikipedia article has a good explanation of the issues and of why only 48 bits are used (for now).

  21. Darran Rowe says:

    @Neil:

    The memory manager needs it to do its magic with paging.

    Of course, this answer probably isn't satisfactory, but without knowing how protected mode works, more specifically memory paging, a longer answer isn't that useful.

    If you do know about how protected mode works, then try looking at the Intel or AMD platform documents. The Intel 64 and IA-32 Architectures Software Developer Manuals and the AMD64 Architecture Programmer's Manual would be the place to look.

  22. anonymouscommenter says:

    @John Elliott No sane Win 95 application will, at least.

  23. anonymouscommenter says:

    Neil: The basic problem is that Windows has a lock-free data structure (a singly-linked list) that requires an atomic compare-exchange operation. Although the data structure can be 128 bits, the early x64 processors could only atomically operate on 64 bits at a time. This means that all the bits of the structure that need to be changed atomically have to fit within those 64 bits.

    The elements that need to change simultaneously are the count of entries in the list, a change counter, and a pointer to the first entry. The count of entries is documented as having 16 bits, so that's essentially hard-coded. That leaves the change counter and first entry pointer to fight it out for the remaining 48 bits.

    Since the alignment of entries is guaranteed to be on 16-byte boundaries, the low 4 bits of the pointer will always be 0 and can be used for the change counter. However, the change counter has to be bigger than 4 bits to ensure the ability to detect changes on systems with a lot of CPUs.

    Apparently MS decided that 9 was the minimum number of bits required for change detection, so that leaves 39 bits for the pointer. 39 bits yields 2^39 (about 550 billion) unique locations, and at 16 bytes per location that gives you 8TB of space.

    So why does the linked-list data structure's limit of 8TB worth of pointers require that Windows itself have the same limit? Because these linked lists are ubiquitous throughout the OS, so being able to have something in memory that couldn't be put in one of the lists would not be very useful.
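    The arithmetic in this comment can be written out as a short sketch. The bit split (16 bits of depth, 9 bits of change counter, the rest for the pointer) is taken from the description above and has not been checked against the actual Windows headers. All of the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit budget for a 64-bit list header, per the explanation above. */
enum {
    HEADER_BITS   = 64,
    DEPTH_BITS    = 16,   /* count of entries in the list */
    SEQUENCE_BITS = 9,    /* change counter */
    POINTER_BITS  = HEADER_BITS - DEPTH_BITS - SEQUENCE_BITS   /* 39 */
};

/* Entries are 16-byte aligned, so each stored pointer value names a
 * 16-byte slot and not an individual byte. */
#define ENTRY_ALIGNMENT UINT64_C(16)

/* Bytes reachable through a POINTER_BITS-wide slot index. */
static inline uint64_t addressable_bytes(void)
{
    return (UINT64_C(1) << POINTER_BITS) * ENTRY_ALIGNMENT;
}
```

    With these numbers, `addressable_bytes()` comes out to 2^43 bytes, which is the 8TB figure.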

  24. anonymouscommenter says:

    Windows 11, which will be true 64-bit, will implement the /256TB flag, for apps using only 48-bit pointers.

  25. anonymouscommenter says:

    @640k Doc Brown just turned up in the DeLorean to pick up a new hoverboard and dropped off a copy of Windows 11. He told me that where we were going, we don't need pointers.

    In six years' time we're due 128-bit CPUs (80386=1985, Opteron=2003).

  26. Jan Ringoš says:

    If I were to guess, based on how typical RAM size increased in servers around me over time, I would say we might get x86-64 CPUs with a few more addressing bits around the time the unix epoch rolls over, but it will then still take several more years before they are actually used for something other than releasing address space pressure or ASLR-like things.

  27. cheong00 says:

    Too bad the boot options have switches to limit the CPU count and max memory that applications see, but no switch to limit the max address space they use.

    Btw, do we have a switch / registry value that blocks particular range(s) of address space from being used, similar to what we had in EMM386.exe?

  28. anonymouscommenter says:

    With 64 bits I can address 16.777.216 TB of memory. Microsoft can only address 256 TB. You should know that if you are going to use tagged pointers, you need to put your tag bits in the least significant bits, since those are bits you control.

  29. anonymouscommenter says:

    @ThomasX:

    Correction - current x86-64 processors can only address 256TB, as they only support a 48-bit virtual address space, even though addresses are stored in 64 bits. Canonical addresses have already been mentioned in the comments.

    You would face enormous obstacles trying to implement a 64-bit address space in any other OS running on the same processors.

  30. anonymouscommenter says:

    @ThomasX You can't have 16.777.216TB of memory in any x64 hardware currently available; Intel's new Skylake, for example, will support 64GB. Which is why 256TB of address space with 128TB for kernel and 128TB for user is a reasonable compromise. Even 48 bits of address space is only useful for address space layout randomization, not because mapping that amount of memory is practical. I don't know if Microsoft was involved in the x64 design, but they are "inconvenienced" the same as any OS when it comes to implementing address spaces (it's generally not a real inconvenience at all). While there are some algorithms where a full 64-bit address space would seem like a nice solution, they aren't practical, as you don't have full control over the address space anyway. Inconvenient things like program and memory-mapped I/O (GPU etc.) all pollute the address space.

  31. anonymouscommenter says:

    @boogaloo: more like 2036. The 8080 (8-bit) was 1974, the 8086 (16-bit) 1978, the 80386 1985, and 64-bit came in 2003. That gives a linear regression of 0.5174*Bitness+1969,5 with R² of 0,9971, or just short of 2036 for 128-bit (and 1971 for 4-bit, which actually was the year the 4004 was released). But of course the AMD64 architecture was not released first by Intel, so if we only use Intel chips for comparison it will be even later before "x128" is released :-)

    Of course that's the same "reasoning" Mark Twain used in the following quote: "In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago, next November, the Lower Mississippi River was upward of one million three hundred miles long, and stuck out over the Gulf of Mexico like a fishing-rod."

  32. anonymouscommenter says:

    256 TB ought to be enough for anybody.

  33. not an anon says:

    @Raymond -- while people may *want* to use them as tag bits, it doesn't mean the CPU will let 'em: trying to feed a noncanonical address to the CPU is documented in the AMD64 Architecture Programmer's Manual as causing a GPF (this is on print page 4 of volume 2, for those playing along at home).

    [Obviously, you remove the tag before dereferencing. This can often be done at no extra cost by folding it into the addressing mode. e.g. mov eax, [ebx-80000000h]. Now you don't even need to test the tag bit before dereferencing: If the tag bit is not set, you AV! -Raymond]

  34. Joshua says:

    [Can the "backward compatibility for buggy apps is good" commenters and the "backward compatibility for buggy apps is bad" commenters get into a room, and then somebody tell me who wins? -Raymond]

    Neither will win, because neither side is right. There's a big difference between retaining ABI compatibility when the contract changes and maintaining compatibility with those who abuse things. In this case, the number of significant bits in a pointer can be argued as part of the ABI (let someone else debate it--I will not).

    What this customer could (probably) do is reserve *all* of the address space above 8TB, then start another thread (to force its stack below 8TB) and have the first thread wait for it.

  35. anonymouscommenter says:

    I was only asking because MS has a long history of trying to maintain compatibility with abusive programs.

  36. anonymouscommenter says:

    @Anonymous Coward Mainly for important abusive programs.

Comments are closed.
