Why is 0x00400000 the default base address for an executable?

The default base address for a DLL is 0x10000000, but the default base address for an EXE is 0x00400000. Why that particular value for EXEs? What's so special about 4 megabytes?

It has to do with the amount of address space mapped by a single page directory entry on an x86 and a design decision made in 1987.

The only technical requirement for the base address of an EXE is that it be a multiple of 64KB. But some choices for base address are better than others.
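To make the alignment rule concrete, here is a minimal sketch in Python. The helper name `is_valid_image_base` is invented for this illustration; it checks only the 64KB-multiple requirement and says nothing about whether the address is actually free.

```python
IMAGE_BASE_ALIGNMENT = 64 * 1024  # 64KB, which is 0x10000

def is_valid_image_base(address):
    """Return True if address is a multiple of 64KB (hypothetical helper)."""
    return address % IMAGE_BASE_ALIGNMENT == 0

# A few candidate base addresses; the last one is deliberately misaligned.
for candidate in (0x00010000, 0x00400000, 0x10000000, 0x00401000):
    print(hex(candidate), is_valid_image_base(candidate))
```

The first three candidates pass; 0x00401000 is only page-aligned, so it fails.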

The goal in choosing a base address is to minimize the likelihood that modules will have to be relocated. This means not colliding with things already in the address space (which will force you to relocate) as well as not colliding with things that may arrive in the address space later (forcing them to relocate). For an executable, the "not colliding with things that may arrive later" part means avoiding the region of the address space that tends to fill with DLLs. Since the operating system itself puts DLLs at high addresses and the default base address for non-operating system DLLs is 0x10000000, this means that the base address for the executable should be somewhere below 0x10000000, and the lower you go, the more room you have before you start colliding with DLLs. But how low can you go?
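The collision rule can be sketched as a simple interval-overlap test. Everything here is illustrative: the function names and the module sizes are made up, and a real loader does considerably more than this.

```python
def ranges_overlap(base_a, size_a, base_b, size_b):
    """True if [base_a, base_a + size_a) intersects [base_b, base_b + size_b)."""
    return base_a < base_b + size_b and base_b < base_a + size_a

def needs_relocation(preferred_base, size, occupied):
    """True if the preferred range collides with any already-occupied range."""
    return any(ranges_overlap(preferred_base, size, b, s) for b, s in occupied)

# Made-up layout: a single 2MB DLL sitting at the default DLL base.
occupied = [(0x10000000, 0x00200000)]

print(needs_relocation(0x00400000, 0x00100000, occupied))  # False
print(needs_relocation(0x10000000, 0x00100000, occupied))  # True
```

An image based at 0x00400000 stays clear of the DLL, while a second module that also asks for 0x10000000 would have to move.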

The first part means that you also want to avoid the things that are already there. Windows NT didn't have a lot of stuff at low addresses. The only thing that was already there was a PAGE_NOACCESS page mapped at zero in order to catch null pointer accesses. Therefore, on Windows NT, you could base your executable at 0x00010000, and many applications did just that.

But on Windows 95, there was a lot of stuff already there. The Windows 95 virtual machine manager permanently maps the first 64KB of physical memory to the first 64KB of virtual memory in order to avoid a CPU erratum. (Windows 95 had to work around a lot of CPU bugs and firmware bugs.) Furthermore, the entire first megabyte of virtual address space is mapped to the logical address space of the active virtual machine. (Nitpickers: actually a little more than a megabyte.) This mapping behavior is required by the x86 processor's virtual-8086 mode.

Windows 95, like its predecessor Windows 3.1, runs Windows in a special virtual machine (known as the System VM), and for compatibility it still routes all sorts of things through 16-bit code just to make sure the decoy quacks the right way. Therefore, even when the CPU is running a Windows application (as opposed to an MS-DOS-based application), it still keeps the virtual machine mapping active so it doesn't have to do page remapping (and the expensive TLB flush that comes with it) every time it needs to go to the MS-DOS compatibility layer.

Okay, so the first megabyte of address space is already off the table. What about the other three megabytes?

Now we come back to that little hint at the top of the article.

In order to make context switching fast, the Windows 3.1 virtual machine manager "rounds up" the per-VM context to 4MB. It does this so that a memory context switch can be performed by simply updating a single 32-bit value in the page directory. (Nitpickers: You also have to mark instance data pages, but that's just flipping a dozen or so bits.) This rounding causes us to lose three megabytes of address space, but given that there was four gigabytes of address space, a loss of less than one tenth of one percent was deemed a fair trade-off for the significant performance improvement. (Especially since no applications at the time came anywhere near beginning to scratch the surface of this limit. Your entire computer had only 2MB of RAM in the first place!)
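Here is the arithmetic behind those numbers as a small sketch. It assumes classic 32-bit x86 paging (4KB pages and 1024 entries per page table), which the article implies but does not spell out; the variable names are mine.

```python
PAGE_SIZE = 4 * 1024             # 4KB pages (assumed classic 32-bit paging)
ENTRIES_PER_PAGE_TABLE = 1024    # 32-bit entries in one 4KB page table
ADDRESS_SPACE = 4 * 1024 ** 3    # 4GB of virtual address space
ONE_MEGABYTE = 1024 ** 2

# One page directory entry points at one page table, so it covers this much:
span_per_pde = PAGE_SIZE * ENTRIES_PER_PAGE_TABLE

# Rounding the (roughly) 1MB per-VM context up to one full entry gives up the rest.
lost = span_per_pde - ONE_MEGABYTE

print(hex(span_per_pde))                 # 0x400000
print(lost // ONE_MEGABYTE, "MB lost")   # 3 MB lost
print(f"{lost / ADDRESS_SPACE:.4%}")     # 0.0732%
```

So a single page directory entry spans exactly 4MB, and the three megabytes given up come to about 0.07% of the address space.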

This memory map was carried forward into Windows 95, with some tweaks to handle separate address spaces for 32-bit Windows applications. Therefore, the lowest address at which an executable could be loaded on Windows 95 was 4MB, which is 0x00400000.

Geek trivia: To prevent Win32 applications from accessing the MS-DOS compatibility area, the flat data selector was actually an expand-down selector which stopped at the 4MB boundary. (Similarly, a null pointer in a 16-bit Windows application would result in an access violation because the null selector is invalid. It would not have accessed the interrupt vector table.)

The linker chooses a default base address for executables of 0x00400000 so that the resulting binary can load without relocation on both Windows NT and Windows 95. Nobody really cares much about targeting Windows 95 any more, so in principle, the linker folks could choose a different default base address now. But there's no real incentive for doing it aside from making diagrams look prettier, especially since ASLR makes the whole issue moot anyway. And besides, if they changed it, then people would be asking, "How come some executables have a base address of 0x00400000 and some executables have a base address of 0x00010000?"
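If you want to see which base address the linker recorded in a particular binary, the value lives in the ImageBase field of the PE optional header. The following is a minimal sketch of reading it, not a full PE parser; the function name `read_image_base` and the file name in the usage comment are placeholders of mine.

```python
import struct

def read_image_base(data):
    """Return the preferred base address stored in a PE image's optional header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew, at offset 0x3C of the DOS header, locates the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and the 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:  # PE32: ImageBase is 4 bytes at offset 28
        return struct.unpack_from("<I", data, opt + 28)[0]
    if magic == 0x20B:  # PE32+: ImageBase is 8 bytes at offset 24
        return struct.unpack_from("<Q", data, opt + 24)[0]
    raise ValueError("unknown optional header magic")

# Hypothetical usage; "app.exe" is a placeholder path:
# with open("app.exe", "rb") as f:
#     print(hex(read_image_base(f.read())))
```

Keep in mind that this is only the preferred address. With ASLR in play, the image may well end up somewhere else at run time.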

TL;DR: To make context switching fast.

Comments (31)
  1. Joshua says:

    So does this mean that an .EXE with a base address of 64k is allowed now? Not that I have any good reason right now.

  2. A New Record? says:

    I think this may be the Old New Thing article with the most hyperlinks yet. I'm not going to get a lot of work done today.

  3. kantos says:

    I suspect if you asked, the Linker team would give you a withering glare and tell you to enable ASLR and set the no-fixed flag. If you insisted, they would support you, but only because they had to.

  4. skSdnW says:

    Most .exe files did not have a relocation directory back then. As long as you stay away from ntdll and kernel32 (and all other shared "system dlls" on Win9x) you don't really have to worry about not getting loaded at your preferred base. This changed a bit when Vista added ASLR so at least internet facing apps/file readers should now make sure the linker generates it and sets the flags for ASLR and DEP.

  5. skSdnW says:

    …and Process Explorer and VMMap are probably broken in this regard since they will mark an image as ASLR compatible just by checking a PE flag when they really also should check if the image has relocations.

  6. Joshua says:

    @skSDnW: I have a compiler that can generate a relocatable EXE that requires no fixups.

  7. skSdnW says:

    @Joshua: Is it setting IMAGE_FILE_RELOCS_STRIPPED? Take a look at the description for PROCESS_MITIGATION_ASLR_POLICY.DisallowStrippedImages: "Images that have not been built with /DYNAMICBASE and do not have relocation information will fail to load if this flag and EnableForceRelocateImages are set." so there is a possibility that your .exe might not load in a locked-down environment…

  8. Antonio 'Grijan' says:

    Pfew! It was a nice hour or so reading this article (and linked articles, and sublinked articles… see http://xkcd.com/214/ ). But I really enjoyed it!

    Raymond, thanks again for all your information, especially the "Geek Trivia" part. For 20 years, I have assumed the VMM just allocated a simple selector of base 0 and size 4 GB, directly mapping segmented addresses to virtual ones. From that came my mistake of thinking the interrupt table could be easily accessed (and corrupted).

    When are you going to publish a book about Windows 95 history? And I say "publish" and not "write" because it could be as simple as a collection of the articles from this blog that talk about Windows 95, maybe sorted chronologically. You have enough material to make a good reading :-) .

  9. Cesar says:

    Reading blog posts like this, one can easily understand why Unix users were so smug in the 90's. Compare and contrast the massive ball of hacks which was the MS-DOS 7.x/Windows 9x combo, with the clean minimalist design of Unix variants of that era.

    [Of course, in order to get that clean minimalist design, you have to throw a lot of stuff away (hence "minimalist") such as "can use v86-mode to multitask MS-DOS applications on an 80386 chip with a B1 stepping" and "supports MS-DOS drivers originally written for MS-DOS 2.0 to control a hand-held scanner that somebody bought from an Egghead bargain bin." Which is great if none of your customers bought a hand-held scanner from an Egghead bargain bin. -Raymond]
  10. Adam Rosenfield says:

    Thank you again for this great history lesson, Raymond.

  11. mikeb says:

    I've said it before, but it's time to say it again: thanks for these history lessons.

  12. Mordachai says:

    I'm mildly confused: "This memory map was carried forward into Windows 95".  But we're talking about Win95 the whole time, no?  Or do you mean that scheme was carried forward from Windows 3.xx?  Or …?

    [Thanks for pointing this out. I've made some tweaks to clarify, let me know if they help. -Raymond]
  13. remis says:

    I remember a time 12-13 years ago I ran over all of our (C++) projects to change their base addresses so they do not clash. I was young and naive at that time.

    I wonder how that was changed in .NET?

  14. Schnikies says:

    @remis [I remember a time 12-13 years ago I ran over all of our (C++) projects to change their base addresses so they do not clash. I was young and naive at that time.]

    I wrote a program some years back to automate the rebasing of our numerous DLLs as part of our builds. Sorta like the old REBASE command, whose syntax I could never get a handle on. It did make a difference, at the time.

  15. DWalker says:

    When I see a question asking why the base address is 0x00400000, I think "well, it has to be somewhere".  No matter what the default is, someone will ask why that default was chosen.  The answer might be "it was a random choice".  But in this case, it appears that there are valid reasons for the default.

    Is the question "why 0x00400000 as opposed to 0x00000000", or "why 0x00400000 as opposed to 0x04000000"?  :-)

    And yes, clean minimalist designs of Unix in the 90s would not support random hardware, and people spent a lot of time finding and manually installing drivers, and tweaking configurations to make their hardware work.

  16. doynax says:

    I don't suppose you happen to know why the 0x400000 base address was unavailable on Win32s?

  17. Yuhong Bao says:

    @skSdnW: Trivia: I think link.exe used to default to 0x10000 as the base. Luckily the default was /FIXED:NO back then. Then the default was changed to 0x400000, and after that (I think in Visual C++ 5.0) the default was changed to /FIXED.

  18. Myria says:

    A file is defined as not having relocations by having the IMAGE_FILE_RELOCS_STRIPPED flag set, not the absence of a relocation table.  This is because, as Joshua said, there can exist images that are relocatable but do not have anything to patch to do so.  These are usually resource-only DLLs, but nothing stops someone from making a compiler that produces position-independent Windows code.  (Note that .NET executables still have exactly one relocation: a jmp dword ptr [__imp__CorExeMain@0] gets fixed up to support pre-XP systems.  ntdll.dll in XP and later detects .NET images and redirects execution to mscoree.dll before the .NET image stub code ever gets control.)

  19. user says:

    Do you guys have all this information documented at Microsoft? I wanna work there.

  20. user says:

    Great article! I'm now wondering why 0x140000000 is the default base address in x64…

  21. Mark says:

    skSdnW: "have not been built with /DYNAMICBASE *AND* do not have relocation information"

  22. AlexFru says:

    @Joshua: ImageBase of 64KB appears to work fine under Windows7 x64. I tried it while working on my compiler's linker.

  23. Chris Smith says:

    With respect to the comment that Cesar made, that is possibly the finest retort against "Unix smugness" I've seen and I'm a Unix guy. It's not all roses on this end either and I rather prefer working with NT to be honest.

  24. Cesar says:

    @Raymond: being more useful doesn't stop it from being uglier. And someone who is being smug about his operating system won't be caught buying "inferior" hardware or software (where inferior is defined as anything which would not work with his preferred operating system).

    @Chris: things changed after the 90's. The Windows NT line has a cleaner kernel than the MS-DOS 7.x/Windows 9x pair (even though most of its elegance gets hidden from applications by the Win32 layer it inherited from the Windows 9x line), and Unix variants got more complex (thus less minimalist).

  25. Yuhong Bao says:

    @doynax: If I remember correctly, Win32s used a single global address space instead of separate per-process address spaces.

  26. ErikF says:

    @Cesar: If you were a Unix user in the early '90s, you probably didn't buy the hardware (your university or employer did!) Unless, of course, you were independently wealthy; I for one did not have $15,000 for a Sun workstation kicking around! And good luck finding hardware that worked with it unless you got it from the vendor.

  27. cheong00 says:

    [And yes, clean minimalist designs of Unix in the 90s would not support random hardware, and people spent a lot of time finding and manually installing drivers, and tweaking configurations to make their hardware work.]

    That's not the age of internet. I don't think they can find much about what hardware does one system support before they bought the systems.

    The logical choice instead is to buy the system from a specialist vendor. (Take SPARC machines or IBM mainframes for example)

  28. Mordachai says:

    @Raymond – thanks, yes, much clearer now :)

  29. Adam Rosenfield says:

    @user (re: "Do you guys have all this information documented at Microsoft? I wanna work there."):

    IANAME (I am not a Microsoft employee), but I highly suspect that the answer is "no" and that Raymond's blog posts like this constitute the de-facto documentation.  Based on his other numerous history blog posts, I'm guessing he cobbles this information together from source control history and his infinite wisdom.

  30. cheong00 says:

    @Adam Rosenfield: Also known as "reconstructing the scene from fragments".

    Btw, I think he'd ask whoever he knows that understands what happened at the time to confirm the information too.

  31. Ken Hagan says:

    Regarding whether or not all this is documented, I'm pretty sure the default base address *is* documented and I'm struggling to imagine a legitimate program that depends on any of the other stuff.

Comments are closed.
