So what happens if you call back into the loader when you're inside a loader callout (DllMain) for DLL_PROCESS_ATTACH?
I'll be addressing teardown (DLL_PROCESS_DETACH) after completing the DLL_PROCESS_ATTACH series.
The first issue is: what about LoadLibrary()? I'll address GetProcAddress() and FreeLibrary() later.
We already know how LoadLibrary() works. It walks the dependency graph starting from the DLL that is being loaded and trims away any portions already in the loader's tables. It then effectively adds what remains to the initialization order list and (this is the part that varies) calls the initializers if you're not using the loader reentrantly.
If you are calling into the loader reentrantly, the list of DLLs to initialize is extended but (I should double check the code to make sure I'm not lying; it's been a couple of years...) only the outermost invocation of the loader will complete the initialization sequence.
Thus, you call LoadLibrary("a.dll"). In its DLL_PROCESS_ATTACH, a.dll calls LoadLibrary("b.dll"). b.dll's initializer is added to the init list, but b.dll isn't actually initialized at that time. Instead, the inner LoadLibrary() just returns, then a.dll's DLL_PROCESS_ATTACH returns (successfully, for the sake of argument), and only then does the original stack frame for the original LoadLibrary() initialize b.dll.
Clearly things are a little more complicated, since it might not be a.dll that actually calls LoadLibrary(); it could be a static import of a.dll that does so, and thus the init list may still have uninitialized entries on it at the time that the entries for b.dll (and its static imports!) are appended.
So far so good; assuming that the cycles don't break you, this was a fine solution to the problem which also avoids arbitrary recursion due to nested dynamic DLL loads. This was mostly harmless!