DllMain and life before birth

Preface

The OS loader has always intrigued me - probably because it works behind the scenes and
no-one normally bothers to understand what it is that it does exactly, until strange or
funny things start happening. And they do. And then we read through the documentation and
are forced to remember that there's more to loading a binary than just slapping it into
the process address space. In fact, there's a wonderful article by Matt Pietrek that
discusses those matters. I strongly encourage every person who deals with native code to
go and read it - it may be quite enlightening for you - I know it was for me. When you
know how things get loaded, you are less likely to forget to re-base your binary, consider
early binary binding, etc.

Every now and then another piece of information or a great summary on the subject comes up
and I find myself mystified by the whole loader topic all over again. This time it was a
very lengthy post in Chris Brumme's blog. As many people have mentioned, the post in
question is very long and very dense with technical information - well, what else did you
expect from Chris's blog? :) Anyway, in order to absorb the topic better, and in hopes of
getting the whole thing out of my system, I decided to write things down.

DllMain and OS loader

As we are all well aware by now, things are not as easy as they seem. In fact, are they
ever? DllMain, which used to be briefly discussed in most books on Win32 as a reasonably
innocent initialization routine, may now look like a vicious monster which obeys no rules
and causes nasty side-effects. But let's get to the source - the MSDN reference.

It all starts innocently enough. The article defines DllMain as an optional entry point
into a DLL, called by the system when the DLL gets attached to a process or a thread;
outlines the somewhat tricky but reasonable rules that govern the calls (for instance,
calls may be unmatched for a thread if it's the main thread of the process or if it was
already running when LoadLibrary was called); discusses abnormal termination; and then

...whoa...

Without missing a heart-beat, it carries on describing what you can do there. That is
pretty startling in and of itself - since when should you be limited in that regard? -
but as you keep reading, things just get worse. It turns out you can do pretty much
nothing at all. Calls to LoadLibrary/LoadLibraryEx are explicitly prohibited. Other calls
into kernel32 are OK. But you can't call into User32. And don't use CRT memory management
(unless you are linked statically) - use HeapAlloc instead. Oh, and of course don't call
anything that would do any such nasty things: that would be bad. One last thing - don't
read the registry either. Have a nice day.

The fact that none of this is written in big, bold, maybe even red print is truly
unfortunate - it really ought to be, because most people simply miss that part. So let's
say you have read it all. Now the question is: why?

The thing is, as far as your binary
is concerned, DllMain gets called at a truly unique moment. By that time OS loader
has found, mapped and bound the file from disk, but - depending on the circumstances
- in some sense your binary may not have been "fully born". Things can be tricky.

In a nutshell, when DllMain is called, the OS loader is in a rather fragile state. First
off, it has taken a lock on its structures to prevent internal corruption while inside
that call, and secondly, some of your dependencies may not be in a fully loaded state.
Before a binary gets loaded, the OS loader looks at its static dependencies. If those
require additional dependencies, it looks at them as well. As a result of this analysis,
it comes up with a sequence in which the DllMains of those binaries need to be called.
It's pretty smart about things, and in most cases you can even get away with not following
most of the rules described in MSDN - but not always.

The thing is, the loading order is unknown to you, but more importantly, it's built based
on the static import information. If some dynamic loading occurs in your DllMain during
DLL_PROCESS_ATTACH and you're making an outbound call, all bets are off. There is no
guarantee that the DllMain of that binary will have been called, and therefore if you then
GetProcAddress a function inside that binary and call it, the results are completely
unpredictable, as its global variables may not have been initialized. Most likely you will
get an access violation (AV).

Another scenario is when you start spinning up a new thread on DLL_THREAD_ATTACH and wait
for it to finish initialization via some synchronization technique. This blocks your
thread in DllMain while it is still holding the OS loader lock. The new thread may need
that same lock before it can make progress, so this can lead to deadlocks.

Overall, if anything - anything -
goes wrong in DllMain of one of the binaries, the whole process may be doomed.

The trouble is, the definition of "wrong" is very, very vague in this case. For instance,
developers using MC++ know that you shouldn't even dream of having DllMain in your
library. And if you do, you may be very, very sorry. I think the CLR folks want to fix
this for the "Whidbey" release.

Chris Brumme lists the following things that should never, ever be done in DllMain.

· Dynamic binds. That includes LoadLibrary/FreeLibrary calls or anything that may
implicitly call them.

· Locking of any kind. If you are trying to acquire a lock that is currently held by a
thread that needs the OS loader lock (which you may be holding), you'll deadlock.

· Cross-binary calls. As has been discussed, the binary you're calling into may not have
been initialized yet, or may have already been uninitialized.

· Starting new threads and then waiting for completion. As discussed, the thread in
question may need to acquire the OS loader lock that you are holding.

So, what does this tell us?

DllMain is that gun you can easily shoot yourself with

How many people do you know that did stupid things like calling CoInitialize() in DllMain?
I know of cases when that was done on DLL_THREAD_ATTACH, which not only means that we were
risking a deadlock, but also that every thread in that process will have COM initialized.
What's worse, it may be initialized with the wrong threading model. And then people will
be wondering how the heck they ended up with STA threads in thread pools. Or something
much more subtle, like calling a system function that starts a worker thread as part of
its execution? How many times did you do all those things?

Another problem with this is that all these horrors can present themselves under very
limited circumstances. In most cases things do work fine, but a race condition, a slightly
modified DLL load order or other factors may change everything. Which means you may not
even know it until you ship. This may be fine for a user application (well, things like
that are never fine, it's just that the damage may not be substantial), but this is always
bad for servers - especially if you are talking enterprise availability. I don't think
this can ever become a security threat - one you can fight anyway - but random crashes are
just not nice.

So let's get back to what we can do
in DllMain. According to MSDN, "The entry-point function should perform only simple
initialization or termination tasks."

These tasks can only include calls to Kernel32 (excluding LoadLibrary/LoadLibraryEx). If
you look at what this means for you, you will find that this is extremely limiting.
Further, CRT functions, including memory allocation, are not safe unless you are
statically linked. This means that seemingly innocent things like g_pMyGlobalObject = new
CMyGlobalObject() can theoretically cause all kinds of nasty stuff because they will use
malloc, which is dynamically linked from msvcr*.dll.

This leaves us with primitive types,
synchronization objects initialization ... that's about it. And definitely - definitely -
no managed code.
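
To make "simple initialization" concrete, here is a minimal sketch of a DllMain that stays
within those limits, assuming the DLL needs no per-thread setup. The names OnProcessAttach,
OnProcessDetach and g_attached are hypothetical, made up for this illustration;
DisableThreadLibraryCalls is the real Kernel32 call. The Windows-specific part is guarded
by _WIN32 only so that the plain-data helpers compile anywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical module state: plain data only, nothing that needs the CRT heap.
bool g_attached = false;

// Kept trivial on purpose: no locks taken, no allocation, no calls into other DLLs.
bool OnProcessAttach() { g_attached = true; return true; }
void OnProcessDetach() { g_attached = false; }

#ifdef _WIN32
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID /*lpvReserved*/)
{
    switch (fdwReason) {
    case DLL_PROCESS_ATTACH:
        // Assumption: no thread-level initialization is needed, so opt out of
        // DLL_THREAD_ATTACH / DLL_THREAD_DETACH notifications altogether.
        DisableThreadLibraryCalls(hinstDLL);
        return OnProcessAttach() ? TRUE : FALSE;
    case DLL_PROCESS_DETACH:
        OnProcessDetach();
        break;
    }
    return TRUE;
}
#endif
```

Anything heavier than this (loading other libraries, creating objects, starting threads)
would move into an explicit initialization function that the host calls after the DLL has
been loaded.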

So what am I saying? There aren't too many things that are legal there; it's extremely
easy to do illegal stuff - you always have to know what the thing you're calling really
does, which is extremely difficult if you use something defined elsewhere - a C/C++ LIB
for instance; the compiler won't tell you that you are doing the wrong thing; and the code
is likely to run fine in most cases... but not all of them.

What options does this leave us with?

- Just say no. Avoid the darn thing altogether and link with /noentry. Reconsider the way
you deal with globals. Do lazy TLS initialization.

- Be very careful. Sometimes you simply have to use it. It's just too ugly not to. Have a
full code review. See what's being done and what the OS does. Make sure that everyone
understands that DllMain is just different. Read and memorize horror stories about people
who didn't know better. One thing you can do here to minimize the damage is disabling
calls to your DllMain when new threads join/leave the process - this can be done with
DisableThreadLibraryCalls. This is generally a good idea in all cases where you don't need
thread-level initialization, because the OS loader doesn't need to call into your binary
every time a new thread is born.

- Be afraid. Be very afraid. Well, just leave things where they are. Things don't crash
right now and you have other things to do. Good plan.
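
One way to "reconsider the way you deal with globals" is to construct them on first use
instead of in DllMain. The sketch below assumes a C++11 compiler, which guarantees that a
function-local static is initialized exactly once even with several threads racing.
CMyGlobalObject echoes the example from earlier in this post; GetMyGlobalObject and the
s_constructed counter are hypothetical names added for illustration.

```cpp
// Hypothetical global object; s_constructed exists only to show that
// construction happens once.
class CMyGlobalObject {
public:
    CMyGlobalObject() { ++s_constructed; }
    static int s_constructed;
};
int CMyGlobalObject::s_constructed = 0;

// Creates the object on the first call. That first call should come from
// ordinary code (an exported function, say) and never from DllMain, because
// operator new goes through the CRT heap.
CMyGlobalObject& GetMyGlobalObject()
{
    // The object is deliberately never deleted, so no destructor has to run
    // while the DLL is being detached.
    static CMyGlobalObject* instance = new CMyGlobalObject();
    return *instance;
}
```

The trade-off is a small intentional leak at process exit in exchange for having nothing
to construct or tear down under the loader lock.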

Silver lining: DllMain and resource leak diagnostics

There's one piece of information that gets provided through DllMain which you can't
possibly get any other way. If you review the signature of DllMain, you'll notice that the
last argument passed in, despite being called lpvReserved, actually has some meaning:

If fdwReason is
DLL_PROCESS_ATTACH, lpvReserved is NULL for dynamic loads and non-NULL for
static loads.

If fdwReason is
DLL_PROCESS_DETACH, lpvReserved is NULL if DllMain has been called by
using FreeLibrary and non-NULL if DllMain has been called during process
termination.

As you see, lpvReserved does tell you something. I can't see why you would be interested
in knowing whether your DLL has been statically or dynamically loaded - there may be uses
there, I just don't see them - but knowing how you are being unloaded could be interesting.
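
The two rules quoted above boil down to a small decision table, sketched here as a helper
that DllMain could call with its own arguments. DllCallKind and ClassifyDllMainCall are
hypothetical names; the stand-in typedefs and constants in the #else branch exist only so
the sketch compiles off Windows, and mirror the values in the Windows headers.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins for non-Windows builds only.
typedef unsigned long DWORD;
typedef void* LPVOID;
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH  2
#define DLL_THREAD_DETACH  3
#endif

// Hypothetical labels for the situations DllMain can distinguish.
enum class DllCallKind {
    StaticLoad,          // process attach, lpvReserved != NULL
    DynamicLoad,         // process attach, lpvReserved == NULL
    FreeLibraryUnload,   // process detach, lpvReserved == NULL
    ProcessTermination,  // process detach, lpvReserved != NULL
    ThreadNotification,  // thread attach/detach; lpvReserved carries no such hint
    Unknown
};

DllCallKind ClassifyDllMainCall(DWORD fdwReason, LPVOID lpvReserved)
{
    switch (fdwReason) {
    case DLL_PROCESS_ATTACH:
        return lpvReserved ? DllCallKind::StaticLoad : DllCallKind::DynamicLoad;
    case DLL_PROCESS_DETACH:
        return lpvReserved ? DllCallKind::ProcessTermination
                           : DllCallKind::FreeLibraryUnload;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
        return DllCallKind::ThreadNotification;
    default:
        return DllCallKind::Unknown;
    }
}
```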

For one, if you're managing some kind of resource in DllMain which only lives within the
process context, you can possibly skip some clean-up if you know that the process is dying
anyway. This is not too valuable, because the very nature of DllMain does not make it a
very good entry point for resource management.

There are cases, however, when
you expect your DLL to be unloaded in a specific way and you can use DllMain to verify
that it is indeed being unloaded as you expect. For instance, if:

· your DLL is in fact a COM server (and has no other uses), and

· the COM host is well-behaved, and

· all of your COM objects have been properly released,

then you should expect that you will get lpvReserved=NULL - that is, you are unloaded via
FreeLibrary.

Here's what seems to be happening. Every well-behaved COM process should call
CoUninitialize() on each thread when it gets shut down. Internally that calls
DllCanUnloadNow on your binary, which returns S_OK if all outstanding references are
closed. If that's the case, COM will call FreeLibrary, which - unless there are other
LoadLibrary references outstanding - will unload your DLL. That will pass
lpvReserved=NULL. If any of these conditions is not satisfied, your DLL will reside in the
process until it terminates and you'll get lpvReserved!=NULL. (I'd like to thank Michael
Entin - who really ought to start blogging - for helping me to get all the pieces
together.)

So if - and that's a big if - your application is well-behaved, and no-one ever messed up
by loading your DLL with LoadLibrary and forgetting to unload it, then lpvReserved!=NULL
means that some of your COM objects have not been released. There's nothing your code can
do about that - except maybe asserting - and you will then have to look into that further.
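
A rough sketch of such a check follows, under the assumptions listed above (the DLL is
only ever a COM server and the host is well-behaved). The counter g_liveObjects, the
UnloadReport struct and CheckUnload are all hypothetical; the counter stands in for
whatever object count your DllCanUnloadNow already consults.

```cpp
#include <atomic>

// Hypothetical module-wide counter: each COM object would increment it in its
// constructor and decrement it in its destructor.
std::atomic<long> g_liveObjects{0};

struct UnloadReport {
    bool leakSuspected;  // true when the unload does not look clean
    long liveObjects;    // objects still alive at that moment
};

// Meant to be called from DllMain when fdwReason == DLL_PROCESS_DETACH.
// processTerminating corresponds to lpvReserved != NULL.
UnloadReport CheckUnload(bool processTerminating)
{
    // Reading a counter is safe under the loader lock: no locks are taken,
    // no memory is allocated, and no other binary is called.
    const long live = g_liveObjects.load();
    return UnloadReport{ processTerminating || live != 0, live };
}
```

In a debug build, DllMain could simply assert that leakSuspected is false; anything more
elaborate than an assert or a debugger break belongs outside DllMain.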

This approach is not limited to COM leaks - theoretically you should expect that when your
binary is leaving this world, it's not taking anything with it. You can look through the
list of globally-managed resources and see if they have been disposed of. Be very, very
careful there - you shouldn't be doing any stuff that may compromise the OS loader: see
the four bullets above.