Does creating a thread from DllMain deadlock or doesn’t it?

Let me get this out of the way up front: Creating a thread from DllMain is not recommended. The discussion here has to do with explaining the behavior you may observe if you violate this advice.

Commenter Pete points out that “according to Usenet” creating a thread in DllMain is supposed to deadlock, but that’s not what he saw. All he saw was that the thread entry procedure was not called.

I’m going to set aside what “according to Usenet” means.

Recall how a thread starts up. When you call CreateThread, a kernel thread object is created and scheduled. Once the thread gets a chance to run, the kernel calls all the DllMain functions with the DLL_THREAD_ATTACH code. Once that’s done, the thread’s entry point is called.

The issue with deadlocks is that all DllMain functions are serialized. At most one DllMain can be running at a time. Suppose a DllMain function is running and it creates a thread. As we noted above, a kernel thread object is created and scheduled, and the first thing the thread does is notify all the DLLs with DLL_THREAD_ATTACH. Since DllMain functions are serialized, the attempt to send out the DLL_THREAD_ATTACH notifications must wait until the current DllMain function returns.

That’s why you observe that the new thread’s entry point doesn’t get called until after you return from DllMain. The new thread hasn’t even made it that far; it’s still working on the DLL_THREAD_ATTACH notifications. On the other hand, there is no actual deadlock here. The new thread will get itself off the ground once everybody else has finished doing their DllMain work.
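
To make the ordering concrete, here is a toy model in portable C++. It is only a sketch and does not use the real loader: the `loader_lock` mutex is a hypothetical stand-in for the rule that at most one DllMain runs at a time, and `LoaderModel`, `new_thread_startup`, and `simulate_create_thread_in_dllmain` are names invented for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy model only: one mutex stands in for "at most one DllMain runs at a time".
struct LoaderModel {
    std::mutex loader_lock;        // hypothetical stand-in for DllMain serialization
    std::mutex log_lock;           // protects the log itself
    std::vector<std::string> log;  // order in which things happened

    void record(const std::string& what) {
        std::lock_guard<std::mutex> guard(log_lock);
        log.push_back(what);
    }
};

// What a newly created thread does once it gets a chance to run.
void new_thread_startup(LoaderModel& model) {
    {
        // The DLL_THREAD_ATTACH notifications have to wait their turn.
        std::lock_guard<std::mutex> attach(model.loader_lock);
        model.record("DLL_THREAD_ATTACH");
    }
    model.record("thread entry point");
}

// A DllMain that creates a thread and returns without waiting for it.
std::vector<std::string> simulate_create_thread_in_dllmain() {
    LoaderModel model;
    std::thread worker;
    {
        std::lock_guard<std::mutex> in_dllmain(model.loader_lock);
        model.record("DllMain begins");
        worker = std::thread(new_thread_startup, std::ref(model));
        assert(worker.joinable());  // the thread exists but cannot get past the lock yet
        model.record("DllMain returns");
    }  // lock released here: the new thread can finally make progress
    worker.join();
    return model.log;
}
```

In this model the log always reads "DllMain begins", "DllMain returns", "DLL_THREAD_ATTACH", "thread entry point", in that order. The thread is delayed, not deadlocked.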

So what is this deadlock that Usenet talks about? If you’ve been following along, you should spot it easily enough.

If your DllMain function creates a thread and then waits for the thread to do something (e.g., waits for the thread to signal an event that says that it has finished initializing), then you’ve created a deadlock. The DLL_PROCESS_ATTACH notification handler inside DllMain is waiting for the new thread to run, but the new thread can’t run until the DllMain function returns so that it can send a new DLL_THREAD_ATTACH notification.
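
The same kind of toy model shows the wait going wrong. Again this is a sketch under stated assumptions, not Windows code: `loader_lock` is a hypothetical stand-in for DllMain serialization, `dllmain_wait_succeeds` is an invented name, and the wait is given a timeout purely so that the demonstration terminates. A DllMain that waits without a timeout would simply hang.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns true if the worker signalled "initialized" while DllMain was still waiting.
bool dllmain_wait_succeeds(std::chrono::milliseconds timeout) {
    std::mutex loader_lock;  // hypothetical stand-in for DllMain serialization
    std::mutex state_lock;   // protects 'initialized'
    std::condition_variable cv;
    bool initialized = false;
    bool signalled_in_time = false;
    std::thread worker;
    {
        // "Inside DllMain": the DLL_PROCESS_ATTACH handler is running.
        std::lock_guard<std::mutex> in_dllmain(loader_lock);

        worker = std::thread([&] {
            // DLL_THREAD_ATTACH must wait until the running DllMain returns.
            { std::lock_guard<std::mutex> attach(loader_lock); }
            // Only now does the entry point run and signal the event.
            { std::lock_guard<std::mutex> guard(state_lock); initialized = true; }
            cv.notify_one();
        });

        // The mistake: waiting for the new thread while still inside DllMain.
        std::unique_lock<std::mutex> guard(state_lock);
        signalled_in_time = cv.wait_for(guard, timeout, [&] { return initialized; });
    }  // DllMain returns; both locks are released here
    worker.join();
    assert(initialized);  // the worker does run, but only after DllMain has returned
    return signalled_in_time;
}
```

In this model, calling `dllmain_wait_succeeds(std::chrono::milliseconds(50))` returns false no matter how generous the timeout, because the worker cannot set the flag until the waiting code has given up and left the block.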

This deadlock is much more commonly seen in DLL_PROCESS_DETACH, where a DLL wants to shut down its worker threads and wait for them to clean up before it unloads itself. You can’t wait for a thread inside DLL_PROCESS_DETACH because that thread needs to send out the DLL_THREAD_DETACH notifications before it exits, which it can’t do until your DLL_PROCESS_DETACH handler returns.

(It is for this thread cleanup case that the function FreeLibraryAndExitThread was created.)

Comments (25)
  1. John says:

    The Old New Thing: 4 out of 5 Usenet users recommend it!

  2. Joshua says:

    I feel smart… I actually found the deadlock problem before he told us! I think that’s a first for me.

  3. JS Bangs says:

    "… [A]ll DllMain functions are serialized."

    Is there a current architectural reason why this must be so? Or is it a holdover from the olden days, kept for compatibility reasons? In either case, it’s rather too bad, since as dual- and quad-core CPUs become common, startup could be sped up a lot by parallelization.

  4. Bob says:

    [Note: Wild speculation. Raymond, Larry, feel free to tell me I’m an idiot.]

    Certainly everything was serialized by definition in the uniproc 3.x days, but I’d think life is still easier with serialized DllMains. This way you don’t have to worry about reentrancy. DLLs with no significant startup cost return quickly enough from DllMain that it’s not a major issue.

    DLLs that do need to fire up other threads/libraries (such as winmm.dll or DirectX) could theoretically have their exports called in mid-startup by other cores. They’d have to block anyway to prevent uninitialized interfaces, etc., from being called, so why not just serialize everything and make people’s lives easier? (Not to mention that this avoids the case of multiple simultaneous loadings.)

    Plus I’m guessing that enough otherwise-correct code has been written under the serial DllMain assumption that we’d suffer a total meltdown if this was no longer the way of things in a future version of Windows. I’d almost expect the introduction of a DllMainEx–i.e., a separate entry point–in that case, and wouldn’t THAT be fun to explain?

  5. Wang-Lo says:

    Thank you so much, Raymond, for that tutorial on how to get a thread started from DllMain.

    I believe The Old New Thing has helped my professional development more than any other site on the Web.  I program mostly in platforms, like Delphi C/S and Paradox for Windows, that insulate me from both the power and the complexity of the Windows APIs.  Sometimes I am frustrated by the limitations of these platforms, and I am haunted by the feeling that by avoiding the native Windows environment, I am missing a rich and rewarding experience.

    Whenever that happens, I come here and read Raymond’s wonderfully detailed explanations of how the Windows architecture really works, and the feeling soon passes.


  6. Wang-Lo is right. I tell the devs I work with that they can learn more about Win32 development in a weekend of reading Raymond’s archives than in a year of reading Win32 books. Of course, now that part of those archives have been made into a book, maybe that’s not so true anymore.


  7. Anonymous says:

    @JSBangs: You may want to check out this link:

    Basically the result is that general purpose applications are not benefitting from parallelization.  Only specific applications like graphics rendering or scientific number crunching are benefitting.

    Of course, the author of the article does point out that general purpose applications are not designed with parallelization in mind.  That said, though, I think Bob is right.  General purpose tasks, especially initialization tasks, don’t lend themselves to parallelization.  Designing DllMain to be thread safe is probably more trouble than it’s worth.

  8. JS Bangs says:

    Anon: Actually I had just read that article before reading Raymond’s post, which is why it was on my mind. It seemed to be an example of a place where the OS might be serializing unnecessarily. But the objections raised here are good–the benefits of making DllMain thread-safe are small, and the costs enormous.

  9. Anony Moose says:

    Sounds like another case of "it seems to work for me" not being close to the same as "it actually will work perfectly for everyone". Just because something "seems" to work doesn’t mean it’s a good idea.

  10. Gabe says:

    Don’t forget that a large amount of initialization time is actually taken up with hard page faults. That isn’t the sort of thing that would be sped-up by executing multiple threads in parallel.

  11. Igor says:

    And if you call DisableThreadLibraryCalls() as the first thing in DllMain()?

    I believe that then it is possible to create threads and use TLS safely.

  12. Pete says:

    OK, I am a little weirded out here. I’m the Pete mentioned above. I haven’t thought about this topic in months, but today I got confused while debugging somebody else’s code, came back here in hopes of finding more information, and sure enough not only is it here but it was just posted today in direct response to an earlier question I posed! Talk about good service! Where is the tip jar?

  13. Daev says:

    Important note for everyone who thinks "I’m not writing system software, I just write plain C, so I don’t need to think about this."

    • atexit functions for DLLs are called from DllMain during a DLL_PROCESS_DETACH.

    That means two things to keep in mind when you’re writing C code that will go into a DLL:

    (1) You can’t signal and wait for other threads to shut down safely inside your atexit functions.

    (2) You can’t rely on other DLLs even being present when you’re running your atexit functions; for instance, you can’t call WinSock to close open sockets.

    If your atexit function is being called on the same thread that called exit(), then you have nothing to worry about, however.  exit explicitly runs through your atexit functions before it leaves.  It’s other threads’ atexit functions that are at risk: in particular, those of threads created by other DLLs or main EXEs, which might have their own C run-time libraries bound to them.

  14. Pete says:

    From the No Good Deed Goes Unpunished Department, I’ve now moved on to being confused about a slightly different matter.

    I’ve got a module with two functions; call them Start and Stop. Start creates a thread, which in turn creates a window and enters a message loop. Stop sends WM_CLOSE to the window and then waits for the thread to end. (There’s some extra cleverness in Stop to handle the case in which Start hasn’t yet created the window, but I don’t think that’s important right now.)

    Suppose I statically link this module into some program and call Start and Stop as the result of, say, the user pressing a button. It would seem that I can call Stop with impunity, and indeed when I do that everything works as expected.

    But now let’s suppose I call Stop from a DLL_PROCESS_DETACH handler. What would you expect? Given all this discussion, I’d expect you to say waiting for the thread to end will deadlock. And that was what I was expecting, too, and I was prepared to link this page in a comment and be done with the issue. But that’s not what happens.

    What happens instead is that SendMessage returns immediately without having delivered the message to the window. And then things get weirder: WaitForSingleObject returns immediately as if the thread were already signalled (ended). What the heck is going on here?

    [I think it’s kind of obvious. When would SendMessage return immediately? When does WaitForSingleObject return immediately? Why is process termination the scariest moment of a process’s lifetime? -Raymond]
  15. Pete says:

    After reading the "scariest moment" post, I have an explanation for the behavior I’m seeing. Developers must assume that, rather than tear down resources and DLLs in some orderly, document-able sequence, Windows does these things in no particular order. Consequently, DLL_PROCESS_DETACH is a good time to do a whole lotta nothin’ if that last parameter is non-NULL. However, please allow me to respectfully submit that this is not obvious from the specific symptoms I posted. It’s only obvious if you have more experience with these things than I do. And I’m glad not only that you do but also that you share it. Once again, I am looking for the tip jar.

  16. Igor says:

    Pete, you can buy Raymond’s book.

    When you are writing a DLL which allocates resources and creates threads ‘n’ stuff, usually you do all that alloc/create and free/destroy stuff in separate functions called YourLibNameInit() and YourLibNameExit() which are then called after the DLL gets loaded by its client, and not from DllMain().

    This not only speeds up application loading time (nothing is done inside of DLL until it is actually needed), but it also saves you from a lot of headache related to all-night debugging sessions.
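
A minimal sketch of the Init/Exit pattern described above, in portable C++ rather than Win32. The function names YourLibNameInit and YourLibNameExit come from the comment; everything else (`g_worker`, `g_stop`, `YourLibNameIsRunning`, the placeholder worker loop) is hypothetical, and a real DLL would export these functions and do its own work in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical library state; these names are made up for illustration.
static std::thread g_worker;
static std::atomic<bool> g_stop{false};

// Called by the client after the DLL is loaded, never from DllMain.
// Not safe to call from several threads at once; a real library would guard it.
bool YourLibNameInit() {
    if (g_worker.joinable()) return false;  // already initialized
    g_stop = false;
    g_worker = std::thread([] {
        while (!g_stop) std::this_thread::yield();  // placeholder for real work
    });
    return true;
}

// Called by the client before it unloads the DLL, never from DllMain.
// Waiting for the worker is fine here because the caller is not inside DllMain.
void YourLibNameExit() {
    if (!g_worker.joinable()) return;  // nothing to shut down
    g_stop = true;
    g_worker.join();
    assert(!g_worker.joinable());  // a joined thread is no longer joinable
}

// Helper for callers that want to check the state (hypothetical).
bool YourLibNameIsRunning() { return g_worker.joinable(); }
```

A client would call YourLibNameInit() once after loading the library and YourLibNameExit() before unloading it, which leaves DllMain with nothing to create and nothing to wait for.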

  17. Yuhong Bao says:

    How about creating a thread in DllMain as a way of waiting until the loader lock is unlocked? I mean, you can create a thread with its entry point pointing to an initialization procedure, and then after you return from DllMain, the procedure will be called when the loader lock is unlocked. And at the beginning of any exported function, check that the thread created during DllMain has been terminated. But what if the function calling the exported function is in the thread that was created during DllMain itself? Then maybe report an error. But what if also the DLL that called the exported function is the DLL itself? How about having the exported function be a stub that checks that the thread has been terminated and then calls another internal function. Then the DLL itself can call the internal function directly.

  18. Dave Harris says:


    It’ll take a while for general purpose applications to take advantage. My employer (we write shrink-wrap desktop apps) is only just starting to do this, because until recently the multi-core machines weren’t a significant part of our market. And because it’s hard, of course. In 5 or 10 years time going from 2 to 4 cores will bring a bigger benefit.

  19. Yuhong Bao says:

    "Just don’t forget to FreeLibrary() your DLL before your initialization thread exits. "

    BTW, that is what FreeLibraryAndExitThread() is for.

  20. Mike says:

    About creating a thread from DLLMain (and other code that should be run only after loader lock is released), wouldn’t it be possible to queue a user mode APC?

    It won’t solve either the shutdown or the "pulling the rug" problem, but perhaps this "trick" can help someone.

  21. Pete says:

    Igor, nobody questions the wisdom or lack thereof of putting significant logic into DllMain given a choice. (Personally, I come from a Mac background, and the same is true of the CFM and Mach-O equivalents.) Many of us, however, are stuck with figuring out the messes left behind by other programmers. This is one such case. My charter for this maintenance task does not include adding Start/Stop entry points to the DLL since clients which do not call these functions are presumed to exist already somewhere out there in the world.

  22. Pete says:

    Interestingly, I’ve been plumbing the depths of MFC’s DllMain lately. It’s a busy little beast! (imagine usual note from Raymond about how he is just the messenger and isn’t accountable for the actions of others within Microsoft)

  23. MadQ says:

    @Yuhong Bao: As the "not recommended" document mentions, if the DLL is unloaded before the thread you created in your DllMain has run to completion, your new thread will cause an access violation. This could easily happen if some other DLL causes your DLL to be loaded, but then returns FALSE for DLL_PROCESS_ATTACH.

    Heh. Reminds me of that trick of pulling the table cloth off the table after dinner is served. It’s awesome when it works, but has a tendency to leave catsup stains on the walls. [Great entertainment either way.]

    However, there’s a trick to avoid having your DLL being unloaded from under you on Windows XP and later: call GetModuleFileName() to get the filename of your DLL, followed by LoadLibraryEx() with the DONT_RESOLVE_DLL_REFERENCES flag. This has the effect of incrementing the DLL’s reference count without incrementing the loader lock when called from DllMain. Just don’t forget to FreeLibrary() your DLL before your initialization thread exits. Preferably, enable SEH and use a __try/__finally block to make sure this happens.

    I’ve not used this trick in any production code, but it has passed all the tests I’ve thrown at it. In particular, I’ve used it to host the CLR after having had a ‘gawd, this is sooooo much easier in C#!’ moment.

    For the ‘Never host the CLR in a DLL!’ crowd: This is my <expletive deleted> computer, and I’ll do any <expletive deleted> <expletive deleted> thing I <expletive deleted> well please to it. *snub*

  24. Triangle says:

    You know what might be cool, if the DllMain for each thread was run only the first time the thread calls into a Dll, and then some magic was done to make subsequent calls go through normally. Like a delay-load for DllMain. Could that be possible ?
