How the NT Loader works


My team maintained the NT loader (the component that loads DLLs) for about a year during Windows XP, while we were adding the isolated application features to it, so we got quite an interesting perspective on this lovely little piece of technology.  Warning to people who find themselves wanting to innovate in technology that has been left dormant and untouched for over a decade: be sure you have plenty of time to deal with the anthills you knock over!


We don’t own it any more (not sure if it’s a blessing or a curse…) but it sure was interesting and enlightening, especially in the tradeoffs between application compatibility, robustness and reliability.


You might notice that the docs for DllMain have grown a lot over the past few years.  I like to think that my team’s involvement here had a lot to do with it, because DLL load order etc. was always a poorly understood and arcane topic.  There were always vague warnings about not doing too much in DLL_PROCESS_ATTACH, but nobody could really describe the problem beyond a handful of anecdotes about load orders that had mysteriously changed and broken their code during either initialization or shutdown.


I’ll take a break from where I’m headed on the reliability front and walk through a summary of the issues which I recently sent to the internal win32 programming email alias.  Hopefully I’ll fix the incomplete sentances and bad grammar this time.


I’ll make a separate post with the beginning – a basic rundown of how things work today.  As usual, do not consider this in any way, shape or form a contract.  One of the reasons that this isn’t documented fully is that people have wanted to change/fix it for years and years now.  On the other hand, maintaining compatibility with the current behavior is going to constrain the implementation so much that either (a) it won’t change after all or (b) the change will have to be compatible with the effects of anything I’m describing here anyway.


You will see aspects of my reliability/robustness series come up here.  You’ll laugh, you’ll cry, you’ll see local innocuous bugs in DLL initialization or uninitialization affect the entire process’s reliability.


Comments (12)

  1. bao says:

    Has there been any consideration in defining a new executable format and new loader from scratch? Sometimes I wonder how fast and clean Windows would be if backwards compatibility was completely dropped, or at least new components added to gradually replace old components laden with compatibility code. It’d be interesting to hear on this.

  2. mgrier says:

    Hard to say. It’s possible but even the radical experimental new programming model (CLR) still needs a basic win32 operating environment set up underneath itself. Building something that doesn’t depend on win32 might get rid of a bunch of process initialization costs but (a) the NT level APIs are undocumented for a reason and (b) it’s not clear that we wouldn’t just be trading the devil we know for the devil we don’t know.

    I think we’re on a good path to get rid of the cruft that’s gotten into the initialization path over the next few releases and if we can get rid of most of the DLL initialization code in the world, DLL and process startups will be markedly faster.

  3. Yaytay says:

    Whilst the documentation for DllMain has grown a lot it still does not describe in detail the ramifications of the lock that is taken out (the process lock, I don’t know the internal name for it).

    The two comments that there are ("It must not call the LoadLibrary or LoadLibraryEx function" and "entry-point functions should not attempt to communicate with other threads or processes") are not sufficient.

    Please can someone document all the places in which the process lock is taken out, so that we have a definitive list of areas that will cause problems?

    This should be documented, because any changes to the places where it is taken out will affect applications.

  4. You will not see the details of the "loader lock" documented. However I do plan to discuss a number of the visible side effects of the current implementation.

  5. I don’t normally post on weekends, but I just noticed that Michael Grier’s finally started posting his…

  6. Yaytay says:

    Why not?

    I’m not after details of how it works (well, actually I’d love to see that, but I don’t consider that necessary) but the places where it is taken out affect the code that we can write.

    Wouldn’t it make more sense to just document all the function calls that might result in the lock being taken out and avoid the current approach of just highlighting certain of the things that will break?

    As it stands it forms part of the contract between you and us – but we don’t know what it is.

    I’m not trying to get aggressive about it, but I’ve said why I think it should be documented and I’d like to know why it won’t be.

  7. mgrier says:

    Re: why not document the loader lock:

    It’s an implementation detail that some folks believe that we will be able to eliminate eventually.

    My only goal in this series is to give you some idea of why running code in DllMain is a problem. If you can get rid of the code in DLL_PROCESS_ATTACH, you are now oblivious to whatever changes eventually do come along so why not take the early path to success?

    Clearly there’s a database of loader data and clearly multiple threads can concurrently try to access it so the fact that access is synchronized will always be true. We /may/ someday be able to remove the single central lock and retain the same overall behavior. The jury is still out on that.

    If that could happen, most of the restrictions that occur due to the fact that the DLL callouts (dllmain) occur with the loader lock held could be lifted.

    This is standard stuff – I’m walking a fine line here between explaining stuff and helping people vs. obligating us to maintaining an implementation indefinitely.
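One way to read the advice above about emptying DLL_PROCESS_ATTACH is "do the work on first use instead of at load time". The sketch below is only a language-neutral analogy in Python, not loader code: `LazyResource` and `expensive_setup` are hypothetical names invented for illustration, and the private lock it uses stands in for any lock other than the loader lock.

```python
import threading


class LazyResource:
    """Builds an expensive resource on first use instead of at load time."""

    def __init__(self, factory):
        self._factory = factory        # hypothetical callable doing the real setup
        self._lock = threading.Lock()  # private lock, unrelated to any loader lock
        self._value = None
        self._ready = False

    def get(self):
        # Fast path: already initialized, so no lock is needed.
        if self._ready:
            return self._value
        with self._lock:
            # Re-check under the lock so only one thread runs the factory.
            if not self._ready:
                self._value = self._factory()
                self._ready = True
        return self._value


calls = []


def expensive_setup():
    # Stand-in for work that would otherwise sit in an initialization callout.
    calls.append("setup")
    return {"handle": 42}


# Creating the wrapper is cheap; nothing expensive runs at "load" time.
resource = LazyResource(expensive_setup)
```

The first caller of `resource.get()` pays the setup cost, outside of any load-time callout, and later callers reuse the result.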

  8. Norman Diamond says:

    I was going to suggest that warnings be issued (in the application log at least) when cycles are detected, so that developers will know that something will break in the future and they might look into breaking the cycles now in a designed non-random manner.

    But then, just by accident, I came across these two cycles:

    hal.dll -> ntoskrnl.exe -> hal.dll

    bootvid.dll -> ntoskrnl.exe -> bootvid.dll

    What does it mean for a dll to be dependent on an .exe? Or is ntoskrnl.exe really a shareable (oops) that missed out on getting its filename changed?

    What happens if the NT loader loads these dlls and exe in a different order than it used to? (No not that NT loader, *that* NT loader. Or are there more?)

    > Re: why not document the loader lock:

    > It’s an implementation detail that some

    > folks believe that we will be able to

    > eliminate eventually.

    And why is it partially documented? And why did your team add more parts to the documentation? Because programmers have to know what to work around, right? Let the MSDN pages about elimination say "Preliminary information subject to change" and let MSDN pages about this decade’s systems be more concrete about what this decade’s programmers have to work around.
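The kind of cycle warning suggested in this comment could be prototyped with a depth-first search over an import graph. This is a minimal sketch under stated assumptions: `find_cycles` is a hypothetical helper, the graph is hand-written from the two cycles quoted above rather than read from real import tables, and the search reports one cycle per back edge rather than enumerating every possible cycle.

```python
def find_cycles(imports):
    """Return import cycles found by depth-first search, each as a list of names."""
    cycles = []
    state = {}  # module name -> "visiting" or "done"
    path = []   # modules on the current DFS path, in order

    def visit(module):
        state[module] = "visiting"
        path.append(module)
        for dep in imports.get(module, []):
            if state.get(dep) == "visiting":
                # dep is already on the current path, so the path from dep
                # back to dep forms a cycle.
                cycles.append(path[path.index(dep):] + [dep])
            elif dep not in state:
                visit(dep)
        path.pop()
        state[module] = "done"

    for module in imports:
        if module not in state:
            visit(module)
    return cycles


# Edges taken from the two cycles quoted in the comment above.
imports = {
    "hal.dll": ["ntoskrnl.exe"],
    "bootvid.dll": ["ntoskrnl.exe"],
    "ntoskrnl.exe": ["hal.dll", "bootvid.dll"],
}

for cycle in find_cycles(imports):
    print(" -> ".join(cycle))
```

A real tool would have to build the `imports` mapping from each image's import table, which this sketch does not attempt.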

  9. Drew says:

    Cool! I guess I don’t need to save that mail you sent after all.

    Suggest: spell checking.

    s/sentance/sentence/

    As always, this is great information and a fun read, too. Thanks.

  10. mgrier says:

    Re: HAL, kernel, etc.:

    I don’t know why ntoskrnl.exe is an exe. Geeks love to play these unified field theory games where if they generalize things enough, everything fits in.

    I also have very little perspective on the kernel mode loader. It’s an entirely different chunk of code from the user mode loader. I won’t postulate past this point. Note that at some point here, the boot loader also is important; I believe it has to load both the right hal and the kernel.

    Finally, on imports from PEs… this is an interesting topic. In the unified-field-theory of things, everything’s a PE and you can import anything from any PE that exports it.

    That said, EXEs are different. Their entry points do not follow the DllMain shape. By default the linker does not include relocation information for them, so they can’t be moved.

    Given this, the only usable way to use an export from an executable is if it was the executable that was used to launch the process (since then it didn’t have to be relocated and its initialization had to have been done before it got to your code).

    The DLL loader has prevented loading of executables (it’s a bit in the PE32 header) for ages. In XP, I added code so that importing exports from executables other than the base process executable failed. Again, since we had touched the code, anything that went wrong with process initialization or DLL loading/unloading was directed to us and we found a few cases where people were using static imports to try to load (new) executables into the process. It occasionally worked but often did not so we advised the appropriate folks to change their ways and just closed this down.

    The interesting cycles had more than 2 nodes.

    This topic is getting a lot of attention as we move towards a less … organic … process of growing Windows.

    See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnembedded/html/embedded04152003.asp for an interesting appetizer of the issues we’re working on in this area. The article isn’t directly relevant but the footprint issue is all about dependencies and especially cyclic dependencies.
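The header bit and the missing relocation information mentioned earlier in this comment can be inspected from outside the loader. The sketch below reads the Characteristics field of the COFF file header. It assumes that the "bit in the PE32 header" is the `IMAGE_FILE_DLL` flag (0x2000), which the comment does not name, and it also reports `IMAGE_FILE_RELOCS_STRIPPED` (0x0001). `pe_characteristics`, `describe` and `fake_image` are hypothetical helper names, and `fake_image` builds a synthetic header for demonstration only, not a loadable file.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # image carries no base relocations
IMAGE_FILE_DLL = 0x2000              # image is a DLL


def pe_characteristics(image):
    """Return the COFF Characteristics field from the bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its last field, at offset 18.
    (flags,) = struct.unpack_from("<H", image, e_lfanew + 4 + 18)
    return flags


def describe(image):
    flags = pe_characteristics(image)
    return {
        "is_dll": bool(flags & IMAGE_FILE_DLL),
        "relocs_stripped": bool(flags & IMAGE_FILE_RELOCS_STRIPPED),
    }


def fake_image(flags):
    """Build a minimal synthetic MZ/PE/COFF header carrying the given flags."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    coff = struct.pack("<HHIIIHH", 0x14C, 0, 0, 0, 0, 0, flags)
    return bytes(dos) + b"PE\0\0" + coff


print(describe(fake_image(IMAGE_FILE_DLL)))
print(describe(fake_image(IMAGE_FILE_RELOCS_STRIPPED)))
```

To try it on a real file, pass the first few hundred bytes read in binary mode to `describe`.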