Why is my x64 process getting heap addresses above 4GB on Windows 8?


A customer noticed that when they ran their program on Windows 8, memory allocations were being returned above the 4GB boundary. They included a simple test program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    void *testbuffer = malloc(256);
    printf("Allocated address = %p\n", testbuffer);
    return 0;
}

When run on Windows 7, the program prints addresses like 0000000000179B00, but on Windows 8, it prints addresses like 00000086E60EA410.

The customer added that they care about this difference because pointers above 4GB will be corrupted when the value is truncated to a 32-bit value. As part of their experimentation, they found that they could force pointers above 4GB to occur even on Windows 7 by allocating very large chunks of memory, but on Windows 8, it's happening right off the bat.
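To make the corruption concrete, here is a minimal sketch that round-trips the two example addresses above through a 32-bit integer. It works on integer values rather than real pointers so the arithmetic is the same on any host, and the helper names truncate_to_32 and survives_truncation are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulate storing a 64-bit address in a 32-bit field and reading it back. */
static uint64_t truncate_to_32(uint64_t address)
{
    uint32_t stored = (uint32_t)address;  /* upper 32 bits are discarded */
    return (uint64_t)stored;              /* zero-extended on the way back */
}

/* True only if the address comes back unchanged after truncation. */
static bool survives_truncation(uint64_t address)
{
    return truncate_to_32(address) == address;
}
```

The Windows 7 address 0000000000179B00 survives the round trip, but 00000086E60EA410 comes back as 00000000E60EA410, which points somewhere else entirely.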

The memory management team explained that this is expected for applications linked with the /HIGHENTROPYVA flag, which the Visual Studio linker enables by default for 64-bit programs.

High-entropy virtual address space is more commonly known as Address Space Layout Randomization (ASLR). ASLR is a feature that makes addresses in your program less predictable, which significantly improves its resilience to many categories of security attacks. Windows 8 expands the scope of ASLR beyond just the code pages in your process so that it also randomizes where the heap goes.

The customer accepted that answer, and that was the end of the conversation, but there was something in this exchange that bothered me: The bit about truncating to a 32-bit value.

Why are they truncating 64-bit pointers to 32-bit values? That's the bug right there. And they even admit that they can trigger the bug by forcing the program to allocate a lot of memory. They need to stop truncating pointers! Once they do that, all the problems will go away, and it won't matter where the memory gets allocated.
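A minimal sketch of the fix, assuming the truncation happens when a pointer is stashed in an integer field: give that field the type uintptr_t, which C defines as able to hold any object pointer converted from void *. The struct record type and its helpers are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record that carries an opaque pointer-sized value. */
struct record {
    uintptr_t context;   /* an unsigned int or DWORD here would truncate on x64 */
};

static void record_set(struct record *r, void *p)
{
    r->context = (uintptr_t)p;    /* lossless: uintptr_t is pointer-sized */
}

static void *record_get(const struct record *r)
{
    return (void *)r->context;    /* recovers exactly the original pointer */
}
```

With the field widened, it no longer matters whether malloc returns an address below or above 4GB.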

If there is some fundamental reason that they have to truncate pointers to 32-bit values, then they should link with /LARGEADDRESSAWARE:NO so that the process will be given an address space of only 2GB, and then they can truncate their pointers all they want.
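If the truncation genuinely cannot be removed, a checked narrowing helper at least turns silent corruption into a detectable failure. A minimal sketch, assuming the 32-bit field only ever needs to hold addresses below 2GB, which is the range a 64-bit process linked with /LARGEADDRESSAWARE:NO is limited to; the function name try_narrow_address is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Narrow an address to 32 bits only if no information is lost.
   In a process restricted to a 2GB address space this check should never
   fail; in a large-address-aware process it reports the problem instead
   of silently corrupting the pointer. */
static bool try_narrow_address(uintptr_t address, uint32_t *out)
{
    if (address > (uintptr_t)0x7FFFFFFFu) {   /* outside the low 2GB */
        return false;
    }
    *out = (uint32_t)address;
    return true;
}
```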

(Of course, if you're going to do that, then you probably should just compile the program as a 32-bit program, since you're not really gaining much from being a 64-bit program any more.)

Comments (43)
  1. anonymouscommenter says:

    Even without being large-address aware you still get the extra set of registers and PC-relative addressing modes by compiling for x64.  That alone makes a big difference to the performance of some software.  Register spillage was the bane of x86.

  2. Kirillenseer says:

    Assuming pointers to be 32b is stupid. Plain stupid.

  3. Dan Bugglin says:

    The sad part is the customer probably disabled ASLR on their application as the "solution".

    @cuavas yeah with compiler optimizations it's my understanding some functions may fit all their variables in registers when they couldn't in 32-bit mode, which can really speed things up in the right cases.

  4. cSharpFanboy says:

    This is nice to know. It makes it easier to find accidental pointer truncation errors in 64-bit builds: just run on Windows 8, and the program should crash right away. One of our customers found a (7-year-old) pointer truncation for us about 6 months ago. We never noticed it during testing, as we go out of our way to use as little memory as possible.

  5. anonymouscommenter says:

    You should have answered with "Why wouldn't it?"

    Might just as well have asked why it returns addresses over 1MB. You have a 64bit address space, so your addresses need 64 bits. End of story.

  6. anonymouscommenter says:

    @Andre: Yup. My response would have been "Because it can."

    As for "why are they truncating?", my guess would be that they're round-tripping their pointers through an int, rather than a (u)intptr_t.

  7. anonymouscommenter says:

    > my guess would be that they're round-tripping their pointers through an int, rather than a (u)intptr_t.

    Or a long, which would work on most 64-bit platforms but not Windows.
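The difference these comments point at comes from the data model: 64-bit Windows uses LLP64 (int and long are 32 bits; long long and pointers are 64 bits), while most 64-bit Unix-like systems use LP64, where long is 64 bits as well. A small sketch that makes the distinction checkable; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* False on 64-bit Windows (LLP64), true on typical 64-bit Linux (LP64). */
static bool long_holds_pointer(void)
{
    return sizeof(long) >= sizeof(void *);
}

/* True on every platform that provides intptr_t. */
static bool intptr_holds_pointer(void)
{
    return sizeof(intptr_t) >= sizeof(void *);
}

/* Portable code can turn the assumption into a build error instead of a bug. */
_Static_assert(sizeof(uintptr_t) >= sizeof(void *), "uintptr_t must hold a pointer");
```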

  8. Dave Bacher says:

    If you're loading 64-bit components and DLLs, and you're compiled to a 2GB address space, isn't that likely to -- you know, crash horribly because they're expecting a larger address space?  I mean, they might work, but how many of them are routinely tested in a 64-bit process running in a 2GB address space?  Probably pretty close to zero -- they really need to fix the bug.

  9. anonymouscommenter says:

    > Or a long, which would work on most 64-bit platforms but not Windows.

    Indeed. Took me a while to find out why it was crashing. sizeof(char *) > sizeof(long) - ugh!

  10. Medinoc says:

    Note: I may be mistaken, but from my experience (ETA: And the very link on the article), simply omitting /LARGEADDRESSAWARE isn't enough for a 64-bit program because it's set by default: One has to explicitly clear it with /LARGEADDRESSAWARE:NO.

  11. anonymouscommenter says:

    Couldn't they just compile their program as a 32-bit program? It should solve all their problems!

  12. Zan Lynx' says:

    They might not be able to, Peter. Some 3rd-party libraries are now showing up with 64-bit-only builds, the opposite of the problem developers used to have.

    I feel absolutely no sympathy for them however. I love not needing to worry about getting a 1 GB memory map. Linux works, BSD works and 64 bit Windows works. 32 bit Windows can get stuffed now.

  13. anonymouscommenter says:

    > (Of course, if you're going to do that, then you probably should just compile the program as a 32-bit program, since you're not really gaining much from being a 64-bit program any more.)

    You will gain more registers without having to allocate extra space for pointers. I didn't know this switch existed; I had been using a base address and indexes to avoid pointer bloat. Note, however, that base+index can address far more than 2GB.

  14. anonymouscommenter says:

    @Evan: Or just use intptr_t and co - because those are actually guaranteed to work and while optional are incredibly widely implemented.

  15. anonymouscommenter says:

    I started out doing Assembler on S/360.  The very thought of truncating a pointer is analogous with hitting oneself in the face with a hammer and then wondering why in the world it hurts.  Are there really developers out there who do things like truncate pointers?

  16. Wanting to have some '32-bit compatible' pointers in a 64-bit process is not necessarily stupid.  For example the application may access legacy data structures (perhaps stored in existing files or the registry) that have space for only 32-bits.  If it inherently never needs to access more than 4 Gbytes of memory, leaving those structures unchanged may be far easier than any other solution.

  17. anonymouscommenter says:

    @Evan - Yes, that's probably it.

    Part of my brain knows that Windows uses LLP64, but for some reason it just never sinks in intuitively, and it keeps surprising me every time I "realise" the fact anew.

  18. anonymouscommenter says:

    Why was the first 4GB of LAA 64-bit app address space not reserved so people find this out very quickly?

    [It would be the /LARGEADDRESSAWARE problem in reverse. People will write code on the assumption that pointers are always greater than 4GB. In other words, if your problem is that people don't support blue widgets, and you solve it by changing the system so that all anybody gets are blue widgets, then nobody will support red widgets! -Raymond]

  19. anonymouscommenter says:

    @12BitSlab: Win32 encouraged casting between ptr and DWORD. Guess what happens on conversion to 64 bit.

  20. anonymouscommenter says:

    @ Joshua -- I do realize that.  Bad advice leads to bad code.  Most -- not all -- ASM programmers would have realized right away that the advice was wrong and done their own thing.

  21. anonymouscommenter says:

    @12bitslab

    I'd say truncating pointers and referencing the truncated pointer is more akin to cutting off your arm at the elbow and wondering why you can't pick anything up.

    @Richard T Russell

    As Raymond mentioned, if they want to intentionally not be LAA-aware, they should never have said they were LAA-aware in the first place.

    @Joshua

    In 1932, Chevrolet used 6v batteries on their pickup trucks. Guess what happens when you jump-start it with a 2015 Denali.

  22. anonymouscommenter says:

    @12BitSlab

    pointer<->DWORD wasn't bad advice, because a 32-bit pointer fits into a 32-bit DWORD.

    Since that advice was released, however, we've added pointer_64 and DWORD64.

  23. anonymouscommenter says:

    @Anon LAA is the default, but I'm sure if you asked them they'd say "but we ARE aware of large addresses, we just truncate them". I'd file this under "people who shouldn't be allowed to use pointers".

  24. alegr1 says:

    Warning for using DWORD instead of DWORD_PTR has been there for 10 years already. Get with the program, people.

    @Richard T Russel:

    Nobody stores virtual addresses in registry. That would be extremely stupid.

  25. anonymouscommenter says:

    @Richard Russell

    The code loading the data should be able to convert 32-bit pointers into 64-bit ones for a new version of a structure during the deserialization process.

    You can even reverse it when storing data into the registry for backwards compatibility.

    The pointer value is going to be pointless, anyway, since whatever it pointed to will likely change the next time the program runs.

    Just have the 64-bit version of the program parse the data structure and copy it to one with the pointers replaced with 64-bit ones.

    When saving, do the opposite: copy it all to a data structure with 32-bit pointers, with the pointers set to NULL.

    Not sure why you'd be serializing something that has pointer values, anyway.
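The load/save conversion described in this comment can be sketched like this. The node_disk and node_mem layouts are hypothetical, assuming a legacy file format with a 32-bit pointer field; the point is that the 64-bit build never trusts, or writes, a real address in that slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout inherited from the 32-bit version. */
struct node_disk {
    uint32_t value;
    uint32_t next;        /* stale pointer value from whoever wrote the file */
};

/* In-memory layout used by the 64-bit build. */
struct node_mem {
    uint32_t value;
    struct node_mem *next;
};

static void node_load(struct node_mem *dst, const struct node_disk *src)
{
    dst->value = src->value;
    dst->next = NULL;     /* the stored address is meaningless in this process */
}

static void node_save(struct node_disk *dst, const struct node_mem *src)
{
    dst->value = src->value;
    dst->next = 0;        /* never squeeze a live 64-bit address into 32 bits */
}
```

Any links the program actually needs would be rebuilt after loading, for example from indices stored alongside the data.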

  26. anonymouscommenter says:

    > In 1932, Chevrolet used 6v batteries on their pickup trucks. Guess what happens when you jump-start it with a 2015 Denali.

    It starts. Having a great uncle with such a truck and having heard his maintenance stories on it, I happen to know the circuits can take 12v for a couple of minutes.

  27. anonymouscommenter says:

    Correct.

    Every 64-bit "portable executable" has lots of 32-bit RVAs in its headers. :-P

  28. @12BitSlab says:

    The original S/360 "Load Address" instruction cleared the MSB of the result, i.e. worked modulo 2**24. The LA of S/370-XA cleared only the MSBit...

  29. @@: To whomever is setting their name field to @<someone who previously posted>... how are we supposed to address you in our replies?  @@12BitSlab?  @@?  I'm just going to call you At now.

  30. anonymouscommenter says:

    @Neil:

    > Why was the first 4GB of LAA 64-bit app address space not reserved so people find this out very quickly?

    Because a number of critical 64-bit DLLs also load into 32-bit processes, at the very least these would have to load at the same address in both 32- and 64-bit processes.

    And then the 64bit code needs to share data with the thunking layer between 32 and 64 bits so this would mean that 32 bit code would need to be able to get data from 64 bit addresses.

    This was probably the right design decision although it would have been nice to have a "pure64" variant of the OS from day 1 that didn't map the first 4gb but basically it would have run explorer.exe and notepad.exe, not that useful.

  31. @Chris Iverson:

    The specific (real) example I had in mind is an interpreted programming language in which all pointers are *defined* to be 32 bits.  This creates issues for a 64-bit interpreter of that language, and your suggestions don't help in this case.

  32. anonymouscommenter says:

    (32bit RVA) well who in their right mind would make a >2/4gb .exe file...me! Giant DVD bound conditional SFX. Thankfully my custom resource format is outside the image. No points for guessing why I don't sign such an .exe.

  33. anonymouscommenter says:

    @Richard T Russell Reserve 4gb of address space in your 64bit version and add your 32 bit pointers to the base pointer.

    msdn.microsoft.com/.../aa366887(v=vs.85).aspx

    p.s. I used BBCBASIC x86 for years :-)
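The base-plus-offset suggestion, keeping 32-bit values but treating them as offsets from a base address, can be sketched as follows. This is a hedged illustration rather than the commenter's code. The arena type and function names are hypothetical, and a small malloc'd buffer stands in for the 4GB reservation the comment describes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal "base + 32-bit offset" arena. Offset 0 is reserved to mean
   "null", so allocations start at offset 16. */
struct arena {
    unsigned char *base;
    uint32_t size;
    uint32_t used;
};

static int arena_init(struct arena *a, uint32_t size)
{
    if (size <= 16) return 0;
    a->base = malloc(size);
    a->size = size;
    a->used = 16;                       /* bytes 0..15 back the "null" offset */
    return a->base != NULL;
}

/* Returns a 32-bit offset, or 0 if the arena is out of space. */
static uint32_t arena_alloc(struct arena *a, uint32_t n)
{
    uint64_t rounded = ((uint64_t)n + 15u) & ~(uint64_t)15u;  /* 16-byte granularity */
    if (n == 0 || rounded > (uint64_t)(a->size - a->used)) return 0;
    uint32_t off = a->used;
    a->used += (uint32_t)rounded;
    return off;
}

/* Turn a stored offset back into a real pointer only at the point of use. */
static void *arena_ptr(const struct arena *a, uint32_t off)
{
    return off ? (void *)(a->base + off) : NULL;
}

static void arena_destroy(struct arena *a)
{
    free(a->base);
    a->base = NULL;
}
```

Only the 32-bit offsets go into the interpreter's data structures; the full 64-bit pointer exists just long enough to be dereferenced.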

  34. anonymouscommenter says:

    Once upon a time, I was porting some code to a new 32-bit target system. The new system routinely returned pointers greater than 2^31-1, even for small allocations.

    But there were a lot of places in the code that truncated 32-bit pointers to just 31 bits, and used the highest bit as an error flag. (by casting the pointer to signed int and checking for >0)
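A hypothetical reconstruction of the scheme this comment describes shows why it broke: testing the sign of the value as a signed int amounts to testing its top bit (apart from also rejecting zero), so any legitimate address at or above 2GB is misread as an error. The ERROR_FLAG name and helper are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 32-bit value is treated as a pointer unless its top bit is set,
   in which case it is treated as an error code. */
#define ERROR_FLAG 0x80000000u

static bool looks_like_error(uint32_t value)
{
    return (value & ERROR_FLAG) != 0;
}
```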

  35. anonymouscommenter says:

    Is Windows even reliable enough that this matters? I mean, the lack of quality and the whole spyware debacle that has plagued the Windows franchise (which the company hasn't solved) kind of makes technical questions irrelevant.

  36. anonymouscommenter says:

    @Mike No operating system can solve the problem of malicious software; you should either be more careful or use a minority OS (I assume you picked the latter). I'm sure Microsoft would love to get the easy ride that Linux etc. get. Boring troll is boring.

  37. anonymouscommenter says:

    Truncating pointers to 32 bits is actually pretty common. It used to be endemic throughout the Microsoft code bases, because for so many years sizeof(DWORD) == sizeof(DWORD*), which resulted in a lot of code that used the two interchangeably. It's bad form and not something you were supposed to be doing, but it happened a lot, and IIRC it at least initially resulted in some oddities in sizeof(T) for some types that you would have expected to be 64-bit but were 32-bit specifically because the practice was so common (although I admit I'm drawing a blank on which type it was exactly; maybe just 'int'? I'd have to check).

    tl;dr: ASLR is right, and it's really important in a lot of server-side processes. Linux et al. are woefully behind at present.

    The addresses above 4GB are there because that is really important to ASLR: services are long-running, handle multiple users, and often do so in the context of threads and lots and lots of virtual mappings/heaps, which means an attacker could essentially fixate memory allocations so that they need not guess an arbitrary address but could instead predict, 80% or more of the time, where a given allocation would occur. There are and were issues with allocation biases as well; for instance, in Windows 7, thread stack allocations would bias towards lower addresses and start filling in the gaps, and the TEB was generally at a predictable address. It's been a while since I've looked at it, so precisely what the biases were is probably slightly off. Microsoft was aware of these issues, along with issues relating to the system call interface, in early 2009 and elected to do nothing for years until it started being actively exploited.

    tl;dr: Windows ASLR is light years ahead of Linux; (also I sincerely hate everyone who compiles windows applications with gcc and not MSVC.)

    In re Linux: at present, if you can either leak or blindly operate on a single mmap() address and know a bit about the target application (specifically, you have had access to a compiled version of the same application, i.e. you've looked at the same RPM or .deb, etc.), then you can actually calculate the base addresses of all loaded modules using static offsets. Linux needs to introduce more entropy into allocations, akin to how the OpenBSD allocator uses arc4, and it needs to randomize the load order of modules. Furthermore, it needs to remove the bias where small mmaps always occur in order at the bottom of the address space and larger ones always start immediately after the first section of ld.so (IIRC) towards the top of the address space. In short, Linux ASLR sucks, and with a lot of common C++ constructs (vptrs) you should be able to essentially blindly bypass ASLR if you can control a few of the offsets used (you need to take the vptr, turn it back into the base address of the mapping, and then perform a single subtraction or addition whose value you control).

  38. anonymouscommenter says:

    Imagine what Linus Torvalds would say if somebody asked him, "Could you please limit the heap address space to 2GB because our large customer truncates pointers to 32 bits?"

  39. anonymouscommenter says:

    Probably along the lines of "please write your own malloc()".

  40. anonymouscommenter says:

    I wish that Windows had an "x32" target like Linux.  For those who don't know, Linux's "x32" is x86-64 except that the size of pointers is 32-bit.  This gets you some advantages of 64-bit, such as 64-bit registers and more of them, but without the penalty of doubling your pointer size.  It's a great target for many applications that simply don't need a 64-bit address space, yet don't need to be backward-compatible, either.

    In Windows, you can always link a 64-bit application with /LARGEADDRESSAWARE:NO to get a 2 GB address space in an x86-64 program, and thus could use x32 with a modified C compiler, but you'd be on your own when it came to the runtime library and Win32.  Win32 would expect all structures to be their full 64-bit form.  Also, you'd only get 2 GB, not 4 GB.

    Unfortunately, true x32 support in Windows would require a full set of runtime, Win32 and NT libraries, which are far larger in Windows than the equivalents in Linux.

  41. anonymouscommenter says:

    @Myria: If you want 64-bit registers but only 32-bit pointers, why not (in Linux anyways) just choose the appropriate -march setting (lots of games did something similar in DOS after the 386 came out)? It seems silly to say "I want to be a 64-bit program but restrict myself to the first 4GB of virtual memory", particularly when ASLR exists. You may have to be more careful about saving registers during calls, but at least you don't have to potentially have 3 copies of each system library installed!

  42. @Myria: The only strictly valid penalty for doubling your pointer size that I can think of is the increased memory overhead it brings to your application.  And considering that memory is cheap and there are a ton of workarounds for this, I don't think the cost justification really works out for making that kind of feature.  And as ErikF points out, without 64-bit pointers you get weak ASLR, which Microsoft definitely doesn't want.

    Now if only I could force all applications to use ASLR...

  43. jonwil says:

    In regards to storing pointers in disk files: I work on code enhancement mods for an older game title, and that game stores pointers in its disk files. It has some special logic to say "when this structure was written to disk, it was at address xyz; now it's at address xyz", and then some other logic to say "when this structure was written to disk, this field pointed to a structure at xyz; please replace it with the new address of that structure".
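The fix-up scheme described in this comment could look roughly like this: a table maps each structure's saved address to its new location, and every pointer field read from the file is translated through it. Everything here is a hypothetical sketch. The relocation type, the function name, and the assumption that saved addresses are 32-bit are not from the game.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per structure loaded from the file. */
struct relocation {
    uint32_t old_address;   /* where it lived when the file was written (assumed 32-bit) */
    void    *new_address;   /* where the loader placed it in this process */
};

/* Translate a pointer field read from disk into a live pointer.
   Returns NULL for a null field or an address with no matching entry. */
static void *resolve_saved_pointer(const struct relocation *table, size_t count,
                                   uint32_t saved)
{
    if (saved == 0) return NULL;
    for (size_t i = 0; i < count; i++) {
        if (table[i].old_address == saved) return table[i].new_address;
    }
    return NULL;
}
```

A linear scan keeps the sketch short; a loader handling many structures would more likely use a hash table or a sorted array.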

Comments are closed.
