A process shutdown puzzle


In honor of National Puzzle Day, I leave you today with a puzzle based on an actual customer problem.

Part One: The customer explains the problem.

We have this DLL, and during its startup, it creates a thread with the following thread procedure:

DWORD CALLBACK ThreadFunction(void *)
{
  HANDLE HandleArray[2];
  HandleArray[0] = SetUpStuff();
  if (HandleArray[0]) {
    HandleArray[1] = ShutdownEvent;
    while (WaitForMultipleObjects(2, HandleArray,
                             FALSE, INFINITE) == WAIT_OBJECT_0) {
      ProcessStuff();
    }
    CleanUpStuff(HandleArray[0]);
  }
  SetEvent(ThreadCompleteEvent);
  FreeLibraryAndExitThread(ThisLibrary, 0);
}

During process shutdown, the following function is called as part of DLL_PROCESS_DETACH handling:

void StopWorkerThread()
{
  // tell the thread to stop
  SetEvent(ShutdownEvent);

  // wait for it to stop
  WaitForSingleObject(ThreadCompleteEvent, INFINITE);

  // Clean up
  CloseHandle(ShutdownEvent);
  ShutdownEvent = NULL;

  CloseHandle(ThreadCompleteEvent);
  ThreadCompleteEvent = NULL;
}

The above function is hanging at the call to WaitForSingleObject. If we break in, we see that the thread that is supposed to be running the ThreadFunction is gone. I verified that the thread was successfully created, but by the time we get around to waiting for it, it's already gone.

I checked, and nobody sets the ThreadCompleteEvent except the StopWorkerThread function. I stepped through SetUpStuff, and it succeeded. However, a breakpoint on CleanUpStuff was never hit. No exceptions were thrown either.

I am completely stumped as to how this thread disappeared.

You already know enough to explain how the thread disappeared.

Part Two: After providing your explanation, the customer came up with this solution.

Thank you for your explanation. We've made the following changes to fix the problem. Again, thank you for your help.

DWORD CALLBACK ThreadFunction(void *)
{
  HANDLE HandleArray[2];
  HandleArray[0] = SetUpStuff();
  if (HandleArray[0]) {
    HandleArray[1] = ShutdownEvent;
    while (WaitForMultipleObjects(2, HandleArray,
                             FALSE, INFINITE) == WAIT_OBJECT_0) {
      ProcessStuff();
    }
    CleanUpStuff(HandleArray[0]);
  }
  // SetEvent(ThreadCompleteEvent);
  FreeLibraryAndExitThread(ThisLibrary, 0);
}

void StopWorkerThread()
{
  // tell the thread to stop
  SetEvent(ShutdownEvent);

  // wait for the thread
  WaitForSingleObject(ThreadHandle, INFINITE);

  // Clean up
  CloseHandle(ShutdownEvent);
  ShutdownEvent = NULL;
}

Criticize this proposed solution.

Part Three: Even though the proposed solution is flawed, explain why it doesn't cause a problem in practice. (I.e., explain why the customer is always lucky.)

Comments (29)
  1. Juan says:

    Hi, I am pretyt new in Threads programming and all I want to do is try to learn. :)

    I think, (Please note that I said think) the main concern about this solution is to pass the Thread’s handle to WaitForSingleObject, since if the thread is gone it realeases the handle, so at least in theory, any process can reuse it.

    I think the customer is always lucky because the computer the customer is running the application in, has enough memory to not reuse handles and just continue to use higher handles. I think under low memory conditions this solution will run in troubles. Am I right?

  2. Alexander Grigoriev says:

    The DLL in question is never unloaded by FreeLibrary. If it is, it will hang on WaitFSO.

    Anyway, on process exit one should not be waiting for the threads. And to safely shut the threads down in a DLL it requires a second proxy DLL.

  3. Stewart says:

    The proposed solution suffers from the same problem, since waiting on the thread in DllMain will deadlock because ExitThread needs to take the loader lock.

    I’m totally stumped as to how they’re getting away with it though. The only way I can think of is if CleanupStuff closes the thread handle. It seems likely that CleanupStuff will run to completion before the WaitForSingleObject runs, since the SetEvent will probably cause a context switch to the worker thread. If the handle got closed in there, then WaitForSingleObject will fail with ERROR_INVALID_HANDLE, but since they don’t check the return code, they won’t know that. Either that or they closed the handle straight away after creating the thread. That would also make sense, since if they had the handle around, what did they need ThreadCompletedEvent for anyway.

    On a related note, they seem to be calling FreeLibraryAndExitThread without calling LoadLibrary first – unless they’re doing it somewhere else. That might be bad.

  4. Mark Steward says:
    1. They haven’t read the docs for ExitProcess.  It terminates all threads except the main one.  Pens down, no finishing the sentence you’re on. (http://blogs.msdn.com/oldnewthing/2383346.aspx)

    2. The thread no longer exists (see part 1).  If the handle were reused, we could end up waiting on something forever.

    3. Handles rarely get reused.

  5. peterchen says:

    Juan,

    The handle is valid until it is closed, i.e. it "survives" thread termination. This is intentional, so you can indeed wait on a thread handle, whcih get signaled when the thread terminates.

  6. Mark Steward says:

    Actually, the more likely reason they’re lucky is that they haven’t called CloseHandle, so the thread object and its handle still exist (even though the thread doesn’t).

  7. Stewart says:

    Mark,

    But if they haven’t closed the handle, so we assume ThreadHandle is valid, then the call to WaitForSingleObject, which is called with the loaderlock held, should deadlock with the thread termination in FreeLibraryAndExitThread which needs to acquire the loader lock – so they ought to see a deadlock here.

  8. Mark Steward says:

    Yes, that will happen if they’re unloaded using FreeLibrary (before process shutdown).  That’s another problem with this design.  As you said, they shouldn’t be calling FreeLibary from DllMain.

    However, on process shutdown, DllMains are called after the threads are terminated, so the handle will be in the permanently signalled state as peterchen said.

  9. Max says:
    1. ExitProcess terminates all threads except the calling one ( as Mark Steward has already pointed out ).

    2. It is not a good idea to wait in a thread in DLLMain

    3. The customer is probably lucky because during process termination ( according to one of the Raymond’s posts on this topic ), the kernel says "Open season on critical sections!" and that causes the loader lock not to block at all ?

  10. gedoe says:

    Max seems (almost)spot on

    Exitprocess terminates all threads

    their first solution can therefore never succeed as the event is never set (the thread is already gone)

    the second solution works, the thread is dead and the threadhandle knows that. But nothing they did helped it, they could just as easy have ignored exiting the thread.

    Extra luck is rewarded for the fact that they do seem to have their own dll fixed in memory due to a GetModuleHandle or something and thus should call Freelibrary (as they try). This ensures that the only reason for unloading the dll is terminating the application

    (unloading due to a refcount of 0 would have caused havoc, then the threads are not forcefully terminated and the loader lock would have prevented the process from recovering)

  11. Florian says:
    1. When the process exits, all threads are killed before DLL_PROCESS_DETACH is called. Thus StopWorkerThread() comes too late.

    2. I see several problems: First, StopWorkerThread() is still too late, so even though there is no deadlock, CleanUpStuff() is never called, which is clearly not the intended behaviour. Second, if the worker thread execution actually reaches FreeLibraryAndExitThread and the library is unloaded, DLL_PROCESS_DETACH will be sent to the thread, I think (the docs are unspecific about this), thus the thread ends up waiting for itself. Or maybe LoadLibrary is called before the thread is being created, but in this case the library is keeping a reference on itself and will never be freed before the process terminates.

    3. Maybe CleanUpStuff isn’t that important, and the second problem doesn’t occur because no one calls FreeLibrary, WaitForMultipleObjects never fails (seems unlikely) and no one else raises the shutdown event. OK, it’s not a good explanation.

  12. Stewart says:

    I’m surprised that all the threads get terminated before DLL_PROCESS_DETACH is called. I’m sure I’ve seen processes deadlock on shutdown waiting for threads in DllMain, but not sure enough to disagree – I don’t know how I’d ever have seen it otherwise – I’m pretty sure I haven’t written enough code that uses dynamically loaded (or more importantly unloaded) libraries to have seen it that way.

    I’m not sure enough to disagree though :)

  13. WaldoFinder says:

    Stewart,

    FreeLibraryAndExitThread() never gets called, so no dead lock.

  14. Bob says:

    The problem: "We have this DLL, and during its startup, it creates a thread"

    The real solution: "Don’t do that"

    Seriously. I read that much of the first sentence and already knew that they were never going to get far.

    And even though luck is with them for the proposed "solution", all they need to do is add an "unrelated" dependancy to a DLL that will cause just enough of a change to the DLL loading sequence that they’ll hit a deadlock. And because adding a dependancy wouldn’t even need to be related to the thread, they’ld probably have no idea why it caused a deadlock.

    If you look up DllMain on msdn, this is what you find:

    http://msdn.microsoft.com/en-us/library/ms682583.aspx

    Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result.

    And if you’re not retarded, you say "hmm, I’m having problems with my DLL, first let me check the documentation and see what it says".

    Threads and DllMain() – just say no!

  15. John says:

    "And if you’re not retarded…"

    Too late.

  16. Alexandre Grigoriev says:

    A clean way to have a thread in a DLL is to have a second proxy DLL which runs the thread. First (explicit) DLL loads the proxy and asks it to start the thread. The thread should hold an extra reference to the DLL by calling LoadLibrary(hProxy). When the first DLL gets unloaded, its DllMain should signal the thread to exit, and then call FreeLibrary(hProxy). The thread will call FreeLibraryAndExitThread(hProxy). All done. The proxy is now gone, too.

  17. A completly different mark says:

    It appears that most of the answer has been described, but not all in one post.

    1) … explain how the thread disappeared:

    The process is exiting, the DLL was not (un)loaded dynamically. All threads will have exited or have been killed by ExitProcess(). We can assume that the DLL in use is not loaded dynamically, since if it was, the thread would still exist, set the event, and call FreeLibrary on the DLL currently unloading, resulting in … i have no idea, but it cant be good.

    2) Criticize this proposed solution:

    Removing the SetEvent() call only fixes the symptom of the problem, not the underlying cause. Also calling FreeLibrary is totally wrong (who called LoadLibrary() and see above.)

    3) explain why the customer is always lucky:

    Since the thread is killed, it never exits the loop, and nothing below it gets called. However the new wait will exit this time, since the thread is gone, and windows will clean up the handle when the process is finished exiting. No possible deadlock since the ‘communication’ with another thread never happens.

  18. WaldoFinder says:

    Alexandre,

    How are you supposed to do this? You are not supposed to call LoadLibray() or anything that calls LoadLibray(), from DllMain(). Your solution will deadlock, as your DLL is waiting for your proxy DLL to load, while the proxy can’t load till DllMain returns.

    Read the MSDN description of DllMain(), if you need a thread in a DLL, don’t create it in DllMain().

  19. Alexander Grigoriev says:

    Waldo finder,

    1) dynamically link the second DLL to the first one, so it will be loaded automatically.

    2) You CAN call CreateThread in DllMain. The issue is only that its thread function won’t start running until your DllMain returns. You can’t synchronize with the thread’s initialization code though (that’s what either AfxBeginThread of beginthreadex used to do, but doesn’t anymore).

  20. Alexander Grigoriev says:

    WaldoFinder,

    And, by the way, the loader lock is recursive.

  21. Daev says:

    So ExitProcess, which is called any time a program exits, kills all the threads except for the one that invoked ExitProcess.  Then it has that thread call DllMain with DLL_PROCESS_DETACH.

    Yet we so often hear that the only way to kill a thread is with TerminateThread, and TerminateThread is so horrible that it should never be used, since it leaves your process-world in a dangerous, unstable state.

    So how is it that one can do anything in the DllMain?  Does the kernel shut down the threads in a way less likely to cause the horrible trouble that you would be in if you simply called TerminateThread in the middle of your program?

    [The historical context is given in this old article. And that’s why the guidance in MSDN is to do nothing if DLL_PROCESS_DETACH is called due to process termination. -Raymond]
  22. tb says:

    How much metadata is created with a HANDLE? and how much space does it occupy in memory? WinNT.h has "typedef void* HANDLE;", but windbg and Process Explorer (SysInternals) tell me all kinds of information about various handles. Where is all that stored?

    I began looking into a handle leak in a 3rd party application on a server and was sidetracked by these rather pedestrian questions whose answers elude my searches on Google and MSDN. Perhaps I don’t know where to look. Do I just need to buy the book by Russinovich and Solomon?

    Apologies for the slight non sequitur, but I can’t find the information that I’m looking for anywhere else, and comments are closed on a more relevant post. http://blogs.msdn.com/oldnewthing/archive/2007/07/18/3926581.aspx

  23. hw says:

    OT for tb:

    Start here. The book might be helpful too, but this is the basics (apologies if you’ve been through this, but everything you’re asking for is described in this "chapter"):

    http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx

  24. tb says:

    @hw: Not sure where that’s buried in the link, but I eventually found Handles and Objects ( http://msdn.microsoft.com/en-us/library/ms724457(VS.85).aspx )

    Oh, what different results you get searching ‘handle limits’ and ‘handle limitations’.

    "The total number of open handles in the system is limited only by the amount of memory available." That tells me that there’s no longer a hard limit (like 10000 or 32768) for all handles, but it doesn’t help me to predict when a system could hit its limit, or just generally satisfy my curiosity about the memory footprint of a handle to a registry key or a file.

  25. OK, my first try at one of Raymond’s Famous Problems:

    Part 1:

    When a process is exiting, DLL_PROCESS_DETACH is called after all threads have been terminated except one.

    Part 2:

    According to the ExitProcess documentation, thread handles do not become signalled until after all DLL_PROCESS_DETACH notifications return.

    Furthermore, the list of allowed actions for DLL_PROCESS_DETACH is quite small (and only semi-documented). It’s better to do nothing.

    This "solution" will totally fall apart if FreeLibrary is called instead of the process exiting (due to a deadlock over the loader lock).

    Part 3:

    Hmmm… this one is tough.

    I would guess that the ExitProcess documentation is a bit misleading, and that the thread handle is signalled before DLL_PROCESS_DETACH is called.

    This may be lucky results of a race condition, and I wouldn’t count on it happening always.

    General Recommendation:

    What is needed is for the customer to restructure their thinking: a thread DOES NOT belong to a dll. It belongs to the process. The process may explicitly ask the DLL to create or cleanup threads on its behalf (using exports), but the dll should not create or cleanup threads implicitly (using DllMain).

    -Steve
    
  26. DaveK says:

    @tb:  You need to look up a bit about how the Windows NT kernel Object Manager works.  HANDLEs are indices into a (sparse, multi-level tree-based) array of pointers; the pointers point to kernel and executive objects such as thread and process structures, mutexes, etc., etc.

    So basically each extra handle to an object uses up four bytes in the handle table.

    Browse the wikipedia entry, it looks reasonably accurate at a first glance:

    http://en.wikipedia.org/wiki/Object_Manager_(Windows)

  27. Tyler says:

    The system will terminate the thread during ExitProcess, probably while it is blocking on HandleArray.  As a result ThreadCompleteEvent never gets set.  So blocking on ThreadCompleteEvent will block indefinitely.

    On the other hand, blocking on ThreadHandle will always return WAIT_OBJECT_0, because the thread has already been terminated as part of the regular shutdown process.

  28. Justa Fanna DaOleNewTing says:

    Part 1: the thread was already removed by the time PROCESS_DETACH is handled, so ThreadCompleteEvent will never be set, so the WFSO in PROCESS_DETACH will never be satisified, and will hang.  

    Part 2: does not actually address the issue that the thread will be removed by the time PROCESS_DETACH is handled, so setting ShutdownEvent is pointless.  

    Part 3: WFSO is actually failing (and thus not in an INFINITE wait) because ThreadHandle is not actually a handle.  

Comments are closed.