To recap our story from last time:
The NT DLL loader starts from some PE (either the main EXE or the DLL passed to the LoadLibrary() API) and walks the graph of static imports rooted at that first PE. You can think of the loader as then building a linear, ordered list of DLLs to initialize, starting with the ones farthest from the root. The order tends to be stable but depends on a number of factors which no one DLL can control.
It's not uncommon to find cycles. Any time you try to apply some kind of topological sort to a graph with cycles, you basically have to break the cycles; how you break them typically depends on where you first entered the cycle. The result is that even if a cyclic graph of static imports exists, a linear initialization order is selected. As imports are added and removed at various points in the graph, the first point of entry into the cycle can change. So you can experiment and figure out that A is always initialized before B, and as long as A's initialization does not depend on B's, things work. However, an innocuous change elsewhere in the graph can change the order, and suddenly the initialization order is broken and your component fails.
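To make the cycle-breaking point concrete, here is a toy sketch in Python. It assumes the order comes from a plain depth-first, post-order walk of the import graph, which is only a stand-in for whatever the real loader does; the names `init_order`, `A`, `B`, and `app.exe` are made up for illustration.

```python
def init_order(root, imports):
    """Post-order depth-first walk: dependencies first, root last."""
    order, visited = [], set()

    def visit(dll):
        if dll in visited:
            # Already seen. If we got here by following a cycle,
            # this is the point where the cycle is broken.
            return
        visited.add(dll)
        for dep in imports.get(dll, []):
            visit(dep)
        order.append(dll)

    visit(root)
    return order


# A and B import each other (a cycle); app.exe enters the cycle through B.
imports = {"app.exe": ["B"], "A": ["B"], "B": ["A"]}
before = init_order("app.exe", imports)

# An innocuous-looking change: app.exe now also imports A, listed first.
imports["app.exe"] = ["A", "B"]
after = init_order("app.exe", imports)

print(before)  # ['A', 'B', 'app.exe']
print(after)   # ['B', 'A', 'app.exe']
```

Neither A nor B changed, yet their relative initialization order flipped because the walk entered the cycle at a different node.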
For a mental model of dynamic loads, imagine that the closure of the dependencies of the loaded DLL is found, then the DLLs already loaded into the process are removed from it, and initialization is run over what remains, in that order.
The global list of DLLs is maintained in multiple ordered lists, one of which is the in-initialization-order list. When a dynamic load occurs, the newly computed initialization list is appended to the end of the in-initialization-order list.
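The mental model for dynamic loads can be sketched the same way. This is a simplified Python model, not the loader's actual data structures: `closure_order` and `dynamic_load` are hypothetical helpers, the DLL names are invented, and the in-initialization-order list is just a Python list.

```python
def closure_order(root, imports):
    """Dependencies-first ordering of everything reachable from root."""
    order, seen = [], set()

    def visit(dll):
        if dll in seen:
            return
        seen.add(dll)
        for dep in imports.get(dll, []):
            visit(dep)
        order.append(dll)

    visit(root)
    return order


def dynamic_load(root, imports, init_list):
    """Model of a dynamic load: closure, minus what is already loaded,
    appended to the end of the global in-initialization-order list."""
    already_loaded = set(init_list)
    new = [d for d in closure_order(root, imports) if d not in already_loaded]
    init_list.extend(new)
    return new


imports = {
    "app.exe": ["base"],
    "plugin": ["helper", "base"],
    "helper": ["base"],
}
init_list = closure_order("app.exe", imports)        # process startup
newly_initialized = dynamic_load("plugin", imports, init_list)

print(newly_initialized)  # ['helper', 'plugin']
print(init_list)          # ['base', 'app.exe', 'helper', 'plugin']
```

Note that `base` is in the closure of `plugin` but is not initialized a second time, since it was already in the process.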
A few interesting notes. The refcount of every DLL in the closure of the static imports of the main executable is "maxed out". The refcounting code knows not to decrement a load count when the count is at the maximum value. I call this "pinned" - the DLLs which are in the static closure of the executable are permanently pinned in the process. In theory, if you loaded a DLL enough times, its refcount could reach the same value and then it could never be unloaded, but if your refcounts get that high you probably have a DLL refcount leak anyway.
The other interesting note is that the refcount of a (non-pinned) DLL is the total number of dynamic loads that could have reached it. That is, each dynamic load increments the refcount of every DLL in the closure of its dependency graph. This doesn't matter much in itself, but I'll refer to it when I describe unloading.
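Here is a small Python model of that refcounting scheme. The value of `PINNED` is an assumption chosen for illustration (the post only says the count is "maxed out"), and `closure`, `pin_static_closure`, and `load_library` are hypothetical helpers, not loader APIs.

```python
PINNED = 0xFFFF  # assumed stand-in for "the maximum value"; illustrative only


def closure(root, imports):
    """Every DLL reachable from root through static imports, root included."""
    seen, stack = set(), [root]
    while stack:
        dll = stack.pop()
        if dll not in seen:
            seen.add(dll)
            stack.extend(imports.get(dll, []))
    return seen


def pin_static_closure(exe, imports, counts):
    """Everything the executable statically reaches is maxed out."""
    for dll in closure(exe, imports):
        counts[dll] = PINNED


def load_library(root, imports, counts):
    """A dynamic load bumps the count of every DLL in its closure."""
    for dll in closure(root, imports):
        if counts.get(dll, 0) != PINNED:  # pinned counts never move
            counts[dll] = counts.get(dll, 0) + 1


imports = {
    "app.exe": ["base"],
    "plugin1": ["helper", "base"],
    "plugin2": ["helper"],
    "helper": ["base"],
}
counts = {}
pin_static_closure("app.exe", imports, counts)
load_library("plugin1", imports, counts)
load_library("plugin2", imports, counts)

print(counts["helper"])          # 2: reached by two dynamic loads
print(counts["base"] == PINNED)  # True: in the executable's static closure
```

In this model a count that climbs all the way to `PINNED` through repeated loads also stops moving, which mirrors the "loaded enough times" case described above.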
Unloading is fairly easy to explain. Starting from the DLL handle that was passed in to FreeLibrary(), the refcount of every DLL in the closure of its dependencies is decremented. Any DLLs whose refcounts reach zero are marked as to-be-unloaded. The in-init-order list is consulted for these DLLs and they are uninitialized in the opposite order of their initialization.
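The unload steps can be modeled in a few lines of Python. As before this is a sketch under assumptions: `PINNED` is an illustrative value, `free_library` is a hypothetical model function (not the real FreeLibrary()), and the starting refcounts and init list are set up by hand for an invented process with two plugins sharing a helper DLL.

```python
PINNED = 0xFFFF  # assumed stand-in for the maximum refcount; illustrative only


def closure(root, imports):
    """Every DLL reachable from root through static imports, root included."""
    seen, stack = set(), [root]
    while stack:
        dll = stack.pop()
        if dll not in seen:
            seen.add(dll)
            stack.extend(imports.get(dll, []))
    return seen


def free_library(handle, imports, counts, init_list):
    """Decrement the closure, then uninitialize whatever hit zero in the
    reverse of the order recorded in the in-init-order list."""
    reached = closure(handle, imports)
    for dll in reached:
        if counts[dll] != PINNED:  # pinned DLLs are never decremented
            counts[dll] -= 1
    doomed = {dll for dll in reached if counts[dll] == 0}
    unload_order = [dll for dll in reversed(init_list) if dll in doomed]
    for dll in unload_order:
        init_list.remove(dll)
        del counts[dll]
    return unload_order


imports = {
    "app.exe": ["base"],
    "plugin1": ["helper", "base"],
    "plugin2": ["helper"],
    "helper": ["base"],
}
init_list = ["base", "app.exe", "helper", "plugin1", "plugin2"]
counts = {"base": PINNED, "app.exe": PINNED,
          "helper": 2, "plugin1": 1, "plugin2": 1}

first = free_library("plugin1", imports, counts, init_list)
second = free_library("plugin2", imports, counts, init_list)

print(first)      # ['plugin1']
print(second)     # ['plugin2', 'helper']
print(init_list)  # ['base', 'app.exe']
```

The unload order is read off the recorded init list rather than recomputed from the graph, so `helper` is torn down after `plugin2`, the reverse of how they came up.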
The reason this is important is that if a DLL managed to get initialized at the right time, it's presumably going to be uninitialized at the right time too. If the uninitialization order were instead recomputed solely from the graph being unloaded, a cycle could be entered at a different point than it was during initialization, and the uninitialization order could be wrong.
Summary: cycles make initialization order hard to understand but the algorithm picks an order. The order can change when cycles are involved since adding or removing static imports from any DLL in the graph may cause the cycles to be entered at different places. The initialization order is inverted to get the uninitialization order in the case of unloads. DLLs which are reachable from the executable are "pinned" (will never unload). Clients forgetting to call FreeLibrary() will eventually force the DLL to be permanently pinned rather than allowing the refcount to overflow back to zero.
Next time, the hazards of DllMain.