How did Windows 95 rebase DLLs?

Windows 95 handled DLL-rebasing very differently from Windows NT.

When Windows NT detects that a DLL needs to be loaded at an address different from its preferred load address, it maps the entire DLL as copy-on-write, fixes it up (causing all pages that contain fixups to be dumped into the page file), then restores the read-only/read-write state to the pages. (Larry Osterman went into greater detail on this subject earlier this year.)

Windows 95, on the other hand, rebases the DLL incrementally. This is another concession to Windows 95's very tight memory requirements. Remember, it had to run on a 4MB machine. If it fixed up DLLs the way Windows NT did, then loading a 4MB DLL and fixing it up would consume all the memory on the machine, pushing out all the memory that was actually worth keeping!

When a DLL needed to be rebased, Windows 95 would merely make a note of the DLL's new base address, but wouldn't do much else. The real work happened when the pages of the DLL ultimately got swapped in. The raw page was swapped off the disk, then the fix-ups were applied on the fly to the raw page, thereby relocating it. The fixed-up page was then mapped into the process's address space and the program was allowed to continue.

This method has the advantage that the cost of fixing up a page is not paid until the page is actually needed, which can be a significant savings for large DLLs of mostly-dead code. Furthermore, when a fixed-up page needed to be swapped out, it was merely discarded, because the fix-ups could just be applied to the raw page again.
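To make the idea concrete, here is a minimal sketch of a fault-time fix-up in Python. It is a toy model, not the Windows 95 code: `PAGE_SIZE`, `fault_in_page`, and the flat list of fix-up offsets are names invented for illustration, fix-ups are assumed to be 32-bit little-endian absolute addresses, and fix-ups that cross a page boundary are simply ignored here.

```python
import struct

PAGE_SIZE = 4096  # x86 page size


def fault_in_page(raw_image, fixup_offsets, delta, page_no):
    """Return a fixed-up copy of one page; the raw image is never modified."""
    start = page_no * PAGE_SIZE
    page = bytearray(raw_image[start:start + PAGE_SIZE])
    for off in fixup_offsets:
        # Toy simplification: only handle fix-ups that lie entirely in this page.
        if start <= off and off + 4 <= start + PAGE_SIZE:
            rel = off - start
            (value,) = struct.unpack_from("<I", page, rel)
            struct.pack_into("<I", page, rel, (value + delta) & 0xFFFFFFFF)
    return bytes(page)


# Two-page toy image with one absolute address stored on page 1.
raw = bytearray(2 * PAGE_SIZE)
struct.pack_into("<I", raw, PAGE_SIZE + 16, 0x10001234)
fixups = [PAGE_SIZE + 16]

# The DLL ended up 0x00200000 above its preferred base.
fixed = fault_in_page(raw, fixups, 0x00200000, 1)
relocated = struct.unpack_from("<I", fixed, 16)[0]
```

Because the raw image is left alone, a fixed-up page can be thrown away at any time and regenerated by calling `fault_in_page` again on the next fault.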

And there you have it, demand-paging rebased DLLs instead of fixing up the entire DLL at load time. What could possibly go wrong?

Hint: It's a problem that is peculiar to the x86.

The problem is fix-ups that straddle page boundaries. This happens only on the x86 because the x86 architecture is the weirdo, with variable-length instructions that can start at any address. If a page contains a fix-up that extends partially off the start of the page, you cannot apply it accurately until you know whether or not the part of the fix-up you can't see generated a carry. If it did, then you have to add one to your partial fix-up.
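Some arithmetic may help. On a little-endian machine, the bytes of a straddling fix-up that sit on the previous page are the low-order ones, so the part you can see is the high-order part. The sketch below works one case through; the function name and the sample values are made up for illustration, and the split of three hidden bytes plus one visible byte is just one possible layout.

```python
def add_with_hidden_low_bytes(value, delta, hidden_bytes):
    """Apply a 32-bit fix-up whose low `hidden_bytes` bytes are on the previous page.

    Returns (carry, visible_without_carry, visible_with_carry), where the
    "visible" part is the high-order portion stored on the current page.
    """
    bits = 8 * hidden_bytes
    low_mask = (1 << bits) - 1
    # Carry generated by the part of the addition we cannot see (0 or 1).
    carry = ((value & low_mask) + (delta & low_mask)) >> bits
    visible_mask = (1 << (32 - bits)) - 1
    visible = ((value >> bits) + (delta >> bits)) & visible_mask
    return carry, visible, (visible + carry) & visible_mask


# 0x10FF2000 relocated by 0x00010000 should become 0x11002000.
carry, without_carry, with_carry = add_with_hidden_low_bytes(
    0x10FF2000, 0x00010000, hidden_bytes=3
)
print(carry, hex(without_carry), hex(with_carry))  # 1 0x10 0x11
```

Looking only at the visible top byte, you would compute 0x10, which is wrong. The correct top byte is 0x11, and the only way to know that is to learn that the hidden low bytes carried.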

To record this information, the memory manager associates a flag with each page of a relocated DLL that indicates whether the page contained a carry off the end. This flag can have one of three states:

  • Yes, there is a carry off the end.
  • No, there is no carry off the end.
  • I don't know whether there is a carry off the end.

To fix up a page that contains a fix-up that extends partially off the start of the page, you check the flag for the previous page. If the flag says "Yes", then add one to your fix-up. If the flag says "No", then do not add one.

But what if the flag says "I don't know"?

If you don't know, then you have to go find out. Fault in the previous page and fix it up. As part of the computations for the fix-up, the flag will get set to indicate whether there is a carry out the end. Once the previous page has been fixed up, you can check the flag (which will no longer be a "Don't know" flag), and that will tell you whether or not to add one to the current page.

And there you have it, demand-paging rebased DLLs instead of fixing up the entire DLL at load time, even in the presence of fix-ups that straddle page boundaries. What could possibly go wrong?

Hint: What goes wrong with recursion?

The problem is that the previous page might itself have a fix-up that straddled a page boundary at its start, and the flag for the page two pages back might be in the "I don't know" state. Now you have to fault in and fix up a third page.

Fortunately, in practice this doesn't go beyond three fix-ups. Three pages of chained fix-ups was the record.
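The three-state flag and the chain of faults can be modelled in a few lines of Python. This is only a sketch under assumptions of my own choosing: the class `RebasedImage`, its fields, and the byte-at-a-time addition are invented for illustration and say nothing about how the Windows 95 memory manager was actually written.

```python
import struct

PAGE_SIZE = 4096
UNKNOWN, NO_CARRY, CARRY = None, 0, 1  # the three flag states


class RebasedImage:
    def __init__(self, raw, fixup_offsets, delta):
        self.raw = bytes(raw)
        self.fixups = sorted(fixup_offsets)
        self.delta_bytes = delta.to_bytes(4, "little")
        self.carry_out = [UNKNOWN] * (len(self.raw) // PAGE_SIZE)
        self.faults = []  # pages in the order their fix-ups completed

    def _carry_in(self, n):
        # "I don't know": go find out by fixing up the previous page.
        if self.carry_out[n - 1] is UNKNOWN:
            self.fix_up_page(n - 1)
        return self.carry_out[n - 1]

    def fix_up_page(self, n):
        start, end = n * PAGE_SIZE, (n + 1) * PAGE_SIZE
        page = bytearray(self.raw[start:end])
        flag = NO_CARRY
        for off in self.fixups:
            if off + 4 <= start or off >= end:
                continue  # fix-up does not touch this page
            # A fix-up that begins on the previous page needs that page's flag.
            carry = self._carry_in(n) if off < start else 0
            for i in range(4):
                addr = off + i
                if addr < start:
                    continue  # byte belongs to the previous page
                if addr >= end:
                    flag = carry  # the rest spills onto the next page
                    break
                total = page[addr - start] + self.delta_bytes[i] + carry
                page[addr - start] = total & 0xFF
                carry = total >> 8
        self.carry_out[n] = flag
        self.faults.append(n)
        return bytes(page)


# Three pages, with one fix-up straddling each internal page boundary.
raw = bytearray(3 * PAGE_SIZE)
struct.pack_into("<I", raw, PAGE_SIZE - 1, 0x10003000)
struct.pack_into("<I", raw, 2 * PAGE_SIZE - 3, 0x10FF2000)
image = RebasedImage(raw, [PAGE_SIZE - 1, 2 * PAGE_SIZE - 3], 0x00010000)

page2 = image.fix_up_page(2)
print(image.faults)  # [0, 1, 2]
```

Touching page 2 alone drags in page 1, whose own leading fix-up drags in page 0. That is the three-page chain described above, in miniature.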

(Of course, another way to stop the recursion is to do only a partial fix-up of the previous page, applying only the straddling fix-up to see whether there is a carry out and not attempting to fix up the rest. But Windows 95 went ahead and fixed up the rest of the page because it figured, hey, I paid for this page, I may as well use it.)
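For comparison, the partial approach might look something like the following. Again, this is a hypothetical sketch: `carry_off_end` is an invented name, and the image is modelled as a flat byte string with 32-bit little-endian fix-ups. It works without recursion because a four-byte fix-up that straddles the end of a page has all of its low-order bytes on that page.

```python
import struct

PAGE_SIZE = 4096


def carry_off_end(raw, fixup_offsets, delta, page_no):
    """Compute only the carry that leaves the end of a page, fixing up nothing else."""
    end = (page_no + 1) * PAGE_SIZE
    for off in fixup_offsets:
        if off < end < off + 4:  # this fix-up straddles the end of the page
            hidden = end - off   # count of low-order bytes on this page
            mask = (1 << (8 * hidden)) - 1
            low = int.from_bytes(raw[off:end], "little")
            return ((low + (delta & mask)) >> (8 * hidden)) & 1
    return 0  # nothing straddles the end, so nothing can carry


raw = bytearray(3 * PAGE_SIZE)
struct.pack_into("<I", raw, PAGE_SIZE - 1, 0x10003000)
struct.pack_into("<I", raw, 2 * PAGE_SIZE - 3, 0x10FF2000)
fixups = [PAGE_SIZE - 1, 2 * PAGE_SIZE - 3]

carry_from_page_1 = carry_off_end(raw, fixups, 0x00010000, 1)
```

Note that even this shortcut has to read the raw bytes of the previous page, so in a real system the disk read would still happen. Only the remaining fix-up work on that page is skipped.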

What was my point here? I don't think I have one. It was just a historical tidbit that I hoped somebody might find interesting.

Comments (23)
  1. Brad Shannon says:

    Definitely interesting, thanks

  2. Cooney says:

    I know this isn’t a general solution, but why not pad dll code so that instructions never cross page boundaries? This could be done at build time and costs, at most, a couple of bytes.

  3. Adrian says:

    People like Joel (of Joel on Software) have argued that code bloat isn’t that bad because the OS only loads the pages it needs as it needs them. Indeed, it sounds like Windows 95 did this. But, from your description, it sounds like Windows NT fixes up an entire DLL at once–if necessary. So, for modern versions of Windows, Joel’s argument is only true when all of your DLLs are properly based.

    As projects grow more complex, and we reuse more and more components from third parties, ensuring none of the DLLs overlap becomes a more and more difficult task. Given that, shouldn’t MS reconsider a fix-up on the fly solution more like what Windows 95 has (but perhaps saving fixed-up pages in the page file to avoid redoing fix-ups)?

    From my VMS days, I remember tricks like putting all your entry points in a jump table in one or two pages, so that fix-ups wouldn’t have to be applied throughout your library. (The jump table used relative addresses.) Since the tables themselves were hit repeatedly, they tended to stay in your working set, so the cost of the extra indirection was tiny.

  4. Denny says:

    This is a thing I used to Love with the AmigA OS and system:

    On the Amiga a Dll was called a .library

    and all code in a well formed library was "Address Relative" so you had calls to other libraries via LoadLibrary() which amounted to getting the address of a jump table of pointers and all code inside the library did branches as IP + offset or IP – Offset or LibraryBaseAddress + Offset

    so here is the 64 Giga Dollar question?

    can current intel / amd cpus generate this kind of code?

    if so then Microsoft and other compiler vendors should be making dll’s that use that form of addressing.

    then the load address of a dll is just an address and no relocs or fixups are needed.

    can this be done?

  5. Cooney says:

    Fat angus:

    That’s why I said it wasn’t a general solution. For it to be a general solution, windows would have to refuse to load a dll that has an instruction that crosses a page. The intent of this padding is to reduce the incidence of the edge case that raymond described.

  6. Raymond Chen says:

    The x86 does not have an address-relative addressing mode, but the x64 does. You still need fixups for things like initialized data containing pointers or jump tables. (The latter because I don’t think it has a "jump to IP + offset encoded in a register" instruction.)

    Cooney: This would require bi-directional communication between what are traditionally separate steps (compiling and linking). The linker would have to feed back to the compiler, "I’m about to put this function here, could you insert some NOPs for me and fix relative jumps to account for the extra NOPs?" And then the compiler would say "Okay, here’s the new function", and the linker would say "Oh, oops, that shifted this other function, could you move those NOPs you inserted backwards five bytes?" and the compiler would say "Okay, but I needed an extra NOP because that would have put me in the middle of a larger instruction." and so on.

  7. Raymond: In retrospect, do you think that the attempt to make Windows 95 run in 4 MB RAM was actually successful? I didn’t use it that much but friends who did seemed to find it somewhat painful to run in 8 MB. Obviously it would boot and run the included applets in 4 MB, but could it run "real" applications without swapping so much as to be impractical?

  8. ch says:

    You can run into a similar hassle on less "weirdo" processors, even on seemingly normal RISC processors.

    Back when I was at CMU, we ran into a terribly hard to find bug in Mach on the IBM ROMP that happened only when a branch and the branch’s delay slot straddled a page boundary and a fault happened on the second page.

  9. TOPS-20 (the os used on the decsystem20) used to have a bug that occurred when you set the "indirect" bit on the first byte of each page, then executed a "fetch memory" instruction referencing the first bit – it would attempt to fault the entire 18 bit address space into memory, which exceeded the 2M physical memory on those machines.

  10. Cooney: you can’t "pad dll code", because rebasing applies to *all* .dll’s – not only those shipping with Windows.

  11. Raymond Chen says:

    Heck, the machine that I used to *develop* Windows 95 was a 50MHz 486 with 8MB. In my opinion, I was running "real" applications in a practical way.

    You have to remember that your average computer in 1995 didn’t have that much memory or hard disk space. Personal computers at most large corporations were only 4MB, so that was an important target to hit. If you bumped the minimum memory requirement of Windows 95 to 16MB, adoption would have been pitiful.

  12. Phaeron says:

    It seems to me that some of the benefits of on-the-fly rebasing would be offset by the need to keep the relocations resident, whereas they could be dumped by NT after load. You’d take the massive hit on load but then swapping would be more efficient.

    On a side note, I once saw a multimedia filter DLL that was linked at 00400000….

  13. James Schend says:

    Heh. What’s funny to me is that despite knowing C and C++ inside and out, I’ve never in my life really dealt with shared libraries, so I’ve never learned stuff like this.

    Of course, I’ve never really done any Windows coding, either. I wrote a lot of code for MacOS Classic, which to my knowledge didn’t even have shared libraries until version 8.0. Classic had its disadvantages, but it sure was a simple API to code.

  14. Mike Hearn says:

    Has an ELF style global offset table ever been considered as a PE DLL extension? That way you could avoid text relocs.

  15. Norman Diamond says:

    > It was just a historical tidbit that I hoped

    > somebody might find interesting.

    Yes, the history channel is fine, and is interesting, even though Windows 95 (apparently not your parts thereof) are indefensible.

    > If you bumped the minimum memory requirement

    > of Windows 95 to 16MB, adoption would have

    > been pitiful.

    Yeah, corporations that had Windows 3.1 working would keep it working until they moved to Windows NT4 Workstation and then that would work too (before SP4). But retail customers would still be getting PCs with Windows 95 preinstalled and that would still be as pitiful as it was. I’ve retrieved an old PC running W95 (A but can do B), bought a used Adaptec 1460 whose driver is built into W95 (and have a few other SCSI cards), still have my 3 gigabyte SCSI hard drive (and others), and hope we can meet someday.

    Nonetheless the history channel is interesting, and you’d have a right to post it even if it weren’t, no problem there.

    12/17/2004 11:03 AM Ben Hutchings

    > Obviously it would boot and run the included

    > applets in 4 MB, but could it run "real"

    > applications without swapping so much as to

    > be impractical?

    A few years ago I retrieved an older PC and got W95 running on it in 10MB, more than twice the minimum. Solitaire and Minesweeper and Notepad were really snappy, none of the delay that is noticed when invoking them under Windows 2000 or XP. I vaguely recall putting Word 95 on it but that was slow, and I didn’t try adding Internet Explorer to it.

  16. Ben Hutchings says:

    Raymond: I do remember 1995, thank you – I was mostly using an Amiga then, and it also had 4 MB RAM. I wasn’t questioning whether 4 MB was a commercially important target spec, but whether that target was achieved in a meaningful way. I seem to recall people saying that Windows 95 was driving up demand for RAM as people decided it practically needed 8 MB.

  17. Raymond Chen says:

    You can also have fixups in the data segment, and you must admit that having data change depending on what order you read from it is pretty strange. (And what if the loader decides "I need to increment this value due to a carry" and the value had already been changed by the program? Now what do you do?)

  18. cola says:

    Why would they fix up the instruction when only half of it is present in memory? When it runs, the other page has to be faulted in and they can fix it then.

    Although that method would have the interesting side effect that *copying* functions out of a DLL would fail, since the first part would already be finished at the time of the fault. And double fixups would have to be avoided with a bit or something.

  19. Thomas Heller says:

    Why are dlls always rebased on 64-k boundaries?

  20. Phill says:

    I believe Windows 95 was successful at running in 4mb. It was faster with 16mb, but my car doesn’t go as fast as a ferrari either.

    While there are many companies that went from Win 3.1/3.11 to WinNT, there are other companies that went from Windows 95 to Windows 2000. The rollout policy of a large company is rarely based on any logical conclusions. Normally it’s down to when they last spent money & the resources they have available now. They might throw popular misconceptions around as the reason for what they do, but that just makes their lives easier.

    Coming from an Amiga background, I actually thought Windows 95 was the first version worth running, but I switched to WinNT4 when I had a machine capable of running it ( mainly because I had problems running windows 95 & windows 98 beta on the machine ).


  21. It is time again for something completely off-topic. In February I read this MSDN article regarding startup
