The NT DLL loader: dynamic unloads

To recap our story from last time:

The NT DLL loader starts from some PE (either the main EXE or the DLL passed to the LoadLibrary() API) and walks the graph of static imports rooted at that first PE.  You can think of the loader as then building a linear, ordered list of DLLs to initialize, starting with the ones deepest from the root.  The order tends to be stable but depends on a number of factors which no one DLL can control.

It's not uncommon to find cycles.  Any time you try to apply some kind of topological sort to a graph with cycles you basically have to break the cycles; how you break them typically depends on where you first entered the cycle.  The result is that even if a cyclic graph of static imports exists, a linear initialization order is selected.  As imports are added and removed at various points in the graph, the first point of entry into the cycle can change.  So you can experiment and figure out that A is always initialized before B, and as long as A's initialization does not depend on B's, things work.  However, an innocuous change elsewhere in the graph can change the order; suddenly the initialization order is broken and your component fails.
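
To make the cycle problem concrete, here is a toy model in Python.  It is only a sketch of the idea, not the loader's actual code: a depth-first walk that initializes dependencies first and breaks a cycle at whichever DLL it is already in the middle of visiting.  The names app.exe, A, B and C are made up for illustration.

```python
def init_order(root, imports):
    """Return a linear initialization order: dependencies first, root last.

    `imports` maps a module name to the list of modules it statically
    imports.  A cycle is broken at whichever module is already being visited.
    """
    order = []
    visited = set()

    def visit(module):
        if module in visited:   # already handled, or a back edge into a cycle
            return
        visited.add(module)
        for dep in imports.get(module, []):
            visit(dep)
        order.append(module)    # post-order: deepest modules come first

    visit(root)
    return order


# A and B import each other; the EXE reaches the cycle through B.
before = {"app.exe": ["B"], "B": ["A"], "A": ["B"]}

# An unrelated DLL C, imported first by the EXE, gains an import of A.
# The cycle is now entered at A instead of B.
after = {"app.exe": ["C", "B"], "C": ["A"], "B": ["A"], "A": ["B"]}

print(init_order("app.exe", before))  # ['A', 'B', 'app.exe']
print(init_order("app.exe", after))   # ['B', 'A', 'C', 'app.exe']
```

Neither A nor B changed, yet their relative order flipped because the cycle was entered at a different point.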

For a mental model of dynamic loads, imagine that the closure of the dependencies of the loaded DLL is found, the DLLs already loaded into the process are removed from it, and initialization is run over what remains, in that order.

The global list of DLLs is maintained in multiple ordered lists, one of which is the in-initialization-order list.  When a dynamic load occurs, the new initialization list is appended to the end of the in-initialization-order list.
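
The mental model for dynamic loads can be sketched the same way.  Again this is a simplified Python model rather than real loader code; dynamic_load, closure_order and the module names are hypothetical.

```python
def closure_order(root, imports):
    """Dependencies-first ordering of everything reachable from `root`."""
    order, visited = [], set()

    def visit(module):
        if module in visited:
            return
        visited.add(module)
        for dep in imports.get(module, []):
            visit(dep)
        order.append(module)

    visit(root)
    return order


def dynamic_load(root, imports, init_list):
    """Model of a dynamic load: find the closure, drop what is already
    loaded, and append the rest to the in-initialization-order list."""
    new_modules = [m for m in closure_order(root, imports)
                   if m not in init_list]
    init_list.extend(new_modules)   # new DLLs go on the end of the list
    return new_modules              # these would be initialized, in order


init_list = ["kernel32", "app.exe"]   # already loaded and initialized
imports = {"plugin": ["helper", "kernel32"], "helper": ["kernel32"]}

print(dynamic_load("plugin", imports, init_list))  # ['helper', 'plugin']
print(init_list)  # ['kernel32', 'app.exe', 'helper', 'plugin']
```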

A few interesting notes.  The refcount of every DLL in the closure of the static imports of the main executable is "maxed out".  The refcounting code knows not to decrement load counts when the count is at the maximum value.  I call this "pinned" - the DLLs which are in the static closure of the executable are permanently pinned in the process.  In theory, if you loaded a DLL enough times, its refcount could reach the same value and then it could not be unloaded, but if your refcounts get that high you probably have a DLL refcount leak anyway.
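
Pinning amounts to a saturating counter.  Here is a minimal sketch; the value of PINNED is a stand-in I picked for illustration (all that matters is that it is the maximum the counter can hold), and add_ref / release are hypothetical names.

```python
PINNED = 0xFFFF  # assumed maximum value, chosen for illustration only


def add_ref(count):
    """Increment a load count, saturating at PINNED instead of overflowing."""
    return count if count == PINNED else count + 1


def release(count):
    """Decrement a load count, unless the DLL is pinned."""
    return count if count == PINNED else count - 1


print(release(add_ref(1)))           # 1: an ordinary load/free pair
print(release(PINNED) == PINNED)     # True: pinned DLLs never count down

# A leak that pushes the count all the way up pins the DLL for good,
# rather than wrapping around to zero.
print(release(add_ref(PINNED - 1)) == PINNED)  # True
```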

The other interesting note is that the refcount of a (non-pinned) DLL is the total number of dynamic loads that could have reached it.  That is, every DLL in the closure of the dependency graph of a dynamic load has its refcount incremented.  This doesn't really matter that much except that when I describe unloading, I'll reference it.

Unloading is fairly easy to explain: starting from the DLL handle that was passed in to FreeLibrary(), the refcount of every DLL in the closure of its dependencies is decremented.  Any DLLs whose refcounts reach zero are marked as to-be-unloaded.  The in-init-order list is consulted for these DLLs and they are uninitialized in the opposite order of their initialization.
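
Putting the pieces together, unloading can be modeled like this.  As before, it is a Python sketch under my simplified model, with hypothetical names (free_library, closure, plugin1, plugin2, shared) and an assumed PINNED value.

```python
PINNED = 0xFFFF  # assumed "maxed out" value, for illustration only


def closure(root, imports):
    """Set of all modules reachable from `root`, including `root` itself."""
    seen, stack = set(), [root]
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(imports.get(module, []))
    return seen


def free_library(root, imports, refcounts, init_list):
    """Model of an unload: decrement the closure, then uninitialize whatever
    reached zero in the reverse of the in-init-order list."""
    doomed = set()
    for module in closure(root, imports):
        if refcounts[module] == PINNED:
            continue                    # pinned DLLs are never decremented
        refcounts[module] -= 1
        if refcounts[module] == 0:
            doomed.add(module)          # marked as to-be-unloaded
    uninit_order = [m for m in reversed(init_list) if m in doomed]
    init_list[:] = [m for m in init_list if m not in doomed]
    for module in doomed:
        del refcounts[module]
    return uninit_order


imports = {"plugin1": ["shared"], "plugin2": ["shared"], "shared": ["kernel32"]}
refcounts = {"kernel32": PINNED, "shared": 2, "plugin1": 1, "plugin2": 1}
init_list = ["kernel32", "shared", "plugin1", "plugin2"]

print(free_library("plugin1", imports, refcounts, init_list))  # ['plugin1']
print(free_library("plugin2", imports, refcounts, init_list))  # ['plugin2', 'shared']
print(init_list)  # ['kernel32']
```

Note that shared survives the first call because plugin2's load also reached it, and that the uninitialization order comes from the init list, not from re-walking the graph.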

The reason this is important is that if a DLL managed to get initialized at the right time, it's presumably also going to be uninitialized at the right time.  If the uninitialization order were based solely on the graph being unloaded, the cycles could be entered at a different point than they were during loading, and the uninitialization order could be wrong.

Summary: cycles make initialization order hard to understand, but the algorithm picks an order.  The order can change when cycles are involved, since adding or removing static imports from any DLL in the graph may cause the cycles to be entered at different places.  The initialization order is inverted to get the uninitialization order in the case of unloads.  DLLs which are reachable from the executable are "pinned" (will never unload).  Clients that forget to call FreeLibrary() will eventually force the DLL to be permanently pinned rather than allowing the refcount to overflow back to zero.

Next time, the hazards of DllMain.

Comments (5)

  1. How does delayload enter into the picture? When an EXE with delay load imports finally does load the DLL, is the newly loaded DLL pinned, or is it treated like LoadLibrary?

  2. MGrier says:

    Delayload is coming. I’m trying to take a more organized approach to the brain dump I sent around internally. First I wanted to explain normal non-reentrant behavior and what quirks come out of it – specifically the ordering problems when cycles are in the graph.

    Next I’m going to explain the ExitProcess() hazard; we have all the right tools to understand why not to do anything during ExitProcess() now.

    Then I’m going to go into loader reentrancy, starting again with deadlocks and then going into the problems with reentrancy (including delayloads used during DLL_PROCESS_ATTACH / DLL_PROCESS_DETACH).

    Then I’ll finish up with the fact that sloppy error handling can tank the whole process when the code runs as part of DLL_PROCESS_ATTACH, which will be exacerbated by the use of explicit (LoadLibrary/GetProcAddress) or implicit (linker delayloads) delay loading techniques.

  3. MGrier says:

    After rereading Larry’s comment and questions, I’ll answer the pinning question.

    Delayloads are just wrappers around dynamic loads. The loader doesn’t know anything at all about them. The fact that they do have a standard form documenting them in the PE header is just to assist analysis tools like depends.exe.

    Thus the answer is that they don’t inherit any pinning. Because of the problem where you can’t know who your caller is, it’s not possible to implicitly pin delayloads from pinned DLLs.
