My, those threads start up really fast nowadays


Here's a little puzzle inspired by an actual bug:

// global variable
DWORD g_WorkerThreadId;

bool IsRunningOnWorkerThread()
{
  return GetCurrentThreadId() == g_WorkerThreadId;
}

bool LaunchWorkerThread()
{
 HANDLE hThread = CreateThread(nullptr, 0,
                               WorkerThread,
                               nullptr, 0,
                               &g_WorkerThreadId);
 if (hThread != nullptr) {
   CloseHandle(hThread);
   return true;
 }
 return false;
}

DWORD CALLBACK WorkerThread(void *Proc)
{
  // Can this assertion ever fire?
  assert(IsRunningOnWorkerThread());

  return 0;
}

Can the assertion at the start of WorkerThread ever fire?

Naturally, the answer is Yes, otherwise it wouldn't be a very interesting article.

The assertion can fire if the worker thread starts running before the call the Create­Thread returns. In that case, the caller hasn't yet received the handle or ID of the newly-started thread. The new thread calls Is­Running­On­Worker­Thread, which returns false since g_Worker­Thread­Id hasn't been initialized yet.

The actual bug was something along the lines of this:

void DoSomething()
{
  if (IsRunningOnWorkerThread()) {
     .. do it one way ..
  } else {
     .. do it the other way ..
  }
}

void DoManyThings()
{
  DoSomething();
  DoSomethingElse();
  DoYetAnotherThing();
}

DWORD CALLBACK WorkerThread(void *Proc)
{
  ...
  DoManyThings();
  ...

  return 0;
}

If the new thread started up so quickly that the original thread doesn't get a chance to receive the new thread ID and put it into g_Worker­Thread­ID, then the Do­Something function called from the worker thread will accidentally do things the not-on-the-worker-thread way, and then things start go go awry.

One way to address is is to add suspenders to your belt:

DWORD CALLBACK WorkerThread(void *Proc)
{
  g_WorkerThreadId = GetCurrentThreadId();
  ...

By having both the original thread and the created thread set the g_WorkerThreadId variable, you cover both cases of the race. If the original thread runs faster, then the CreateThread function will set the g_WorkerThreadId variable to the ID of the worker thread, and the first line of Worker­Thread will be redundant. On the other hand, if the worker thread runs faster, then the assignment at the beginning of Worker­Thread sets the thread ID, and the assignment performed by the CreateThread function will be redundant.

Comments (52)
  1. Joshua says:

    CREATE_SUSPENDED. That is all. Filler text to bypass spam filter.

  2. pete.d says:

    Shouldn't the call be "CreateThread(nullptr, 0, WorkerThread, nullptr, 0, &g_WorkerThreadId);"?

    Setting "g_WorkerThreadId" in two different places seems like such a kludge. A bigger question might be why does the same function have to do two different things, depending on which thread it's running on? If it's a matter of a thread-affinitied object or something like that, then the real test isn't whether the code is running on the worker thread, but whether it's on the original, affinitied thread. In which case, the global variable (yuck!) could just as easily be set to the original thread, and the test by "IsRunningOnOriginalThread()" instead of "IsRunningOnWorkerThread()".

    (Using CREATE_SUSPENDED has a more elegant feel to it as well, but unfortunately that solution suffers from a relative lack of practicality, in the sense that the extra line of code is more complicated than the original solution presented here. :( ).

    [In this case, it was a global object that had the rule "The worker thread is the only thread that can do X to the object." So in a sense the worker thread is the original thread. -Raymond]
  3. AC says:

    Since the function doesn't return the thread Id but sets the variable by pointer, it would be nicer of the API to guarantee that the value is set before the thread gets the chance to be scheduled. (The same obviously couldn't work for the returned handle).

  4. xor88 says:

    AFAIK this is a data race which results in undefined behavior in recent C++ standards. The fact that the same value is written does not matter. It is undefined behavior.

  5. anon says:

    @xor88: But if you're using the Windows API function CreateThread(), you obviously aren't writing portable C++ code.  So you can take advantages of the additional memory model guarantees that are provided by Windows, or by the CPU architectures that Windows supports (x86, x86-64, and ARM).

    I would assume that the code Raymond gave is safe on all CPU architectures currently supported by Windows.  (I haven't checked – I'm relying on Raymond having done that).

  6. xor88 says:

    The compiler would have to make that guarantee. It doesn't. See http://www.hpl.hp.com/…/why_undef.html for why that is.

  7. Joshua says:

    @xor88: The memory model of the architecture makes that guarantee, so the compiler doesn't have to.

  8. jader3rd says:

    I'm with AC on this one.

  9. Mike.Shawaluk says:

    To fit in with Raymond's comment in the post, the macro should be named CREATE_SUSPENDERS.

  10. xix says:

    minor correction

    "One way to address is is to add suspenders to your belt", double 'is'.

    Or it's a synchronization joke.

  11. xor88 says:

    @Joshua, the compiler does not translate reads and writes 1:1 to CPU instructions. Therefore, both architecture memory model of the CPU as well as the compiler of the memory model need to make that guarantee. The compiler does not guarantee that. It can reorder or elide stuff at will. It is not hard to see that.

    See http://www.usenix.org/…/Boehm.pdf and lots of other stuff.

  12. anon says:

    @xor88: What makes you think the compiler doesn't make that guarantee?

    Again, you're using Windows API functions so you'll be compiling your code on a compiler that supports Windows and targets a Windows-supported CPU architecture.  So you're not restricted to the minimum level of guarantees found in the C++ standard, you can rely on Windows-specific or architecture-specific guarantees.

    And Windows does in fact provide the guarantee you need.  From msdn.microsoft.com/…/ms684122(v=vs.85).aspx

    "Simple reads and writes to properly-aligned 32-bit variables are atomic operations. In other words, you will not end up with only one portion of the variable updated; all bits are updated in an atomic fashion. However, access is not guaranteed to be synchronized. If two threads are reading and writing from the same variable, you cannot determine if one thread will perform its read operation before the other performs its write operation."

    And blogs.msdn.com/…/10097860.aspx says that by default, DWORDs will be properly-aligned.

    So any compiler that claims to support compiling Windows programs has to provide those guarantees.

    So, there are three scenarios. Either one thread runs then the other:

    – Thread A atomically writes variable

    – Thread A atomically reads variable

    – Thread B atomically writes variable

    – Thread B atomically reads variable

    Or both writes happen first:

    – Thread A atomically writes variable

    – Thread B atomically writes variable

    – Thread A atomically reads variable } in either order

    – Thread B atomically reads variable }

    Or the compiler optimizes away the reads from the variable (which it's allowed to because it's not volatile) and thread A "reads" the value it wrote, and thread B "reads" the value it wrote.

    And in all these cases, we get the desired behaviour.

  13. V-SHorn says:

    I'm with pete.d, myself. Store the thread id of the main (or non-worker, whatever) thread, which can be known well in advance; then you don't have to race. If you have 20 million lines of code that use IsRunningOnWorkerThread(), you can still leave it named as such, just change the implementation to return GetCurrentThreadId() != g_MainThreadId;

    [The object was affine to the worker thread, not the main thread. Therefore you really wanted to check if you are on the worker thread. (Because you might be on a third thread different from main and worker.) -Raymond]
  14. Maurits says:

    I see this pattern fairly often:

    LaunchWorkerThread() {

    HANDLE hStarted = CreateEvent(…);

    HANDLE hThread = CreateThread(…, WorkerThreadProc, hStarted, …);

    if (!hThread) { bail out }

    WaitForSingleObject(hStarted, INFINITE);

    … do some things…

    }

    WorkerThreadProc(void *pContext) {

    HANDLE hStarted = pContext;

    … do some things…

    SetEvent(hStarted);

    … do some more things…

    }

  15. George says:

    I don't get it. Why use the same variable in both places? If the thread sets the variable always, then CreateThread can use a dummy local variable and ignore the result. This will avoid another race condition if we want the thread to clear the global variable when it ends.

    [There was other code that ran on the main thread that checked the worker thread ID to see if the worker thread was started. If you set it only in the worker thread, then you would have the same race condition in reverse. -Raymond]
  16. Clipboarder Gadget says:

    The code may not contain undefined behavior on Windows, but it relies on undocumented behavior. The documentation of CreateThread doesn't state that it sets the ThreadID atomically.

  17. alegr1 says:

    The actual thread is created by some kernel call, which doesn't have to return the thread ID to the usermode location, because this would require exception handling and address validation. Then the tail of the syscall copies the ID back to the caller. You can clearly see that it's impossible to get the ID saved before the thread starts.

  18. Maurits says:

    @Myria that's good feedback, but note that waiting on a thread handle and GetExitCodeThread require different security privileges. Waiting requires SYNCHRONIZE access, and GetExitCodeThread requires THREAD_QUERY_INFORMATION or THREAD_QUERY_LIMITED_INFORMATION.

  19. Joker_vD says:

    Goodness, if there is one thing I hate more than unreasonably weak synchronisation guarantees, that's the fact that there are people defending them as being perfectly okay, "just do proper sync on your own if you need it" — I almost always need it, and I don't want to spend hours trying to prove manually that yes, I did it correctly, at least on this platform, and may the God have mercy on my soul if I ever need to port/rewrite it to something with a slighlty different memory model.

  20. @AC says:

    Except CreateThread isn't what creates the thread. Now all of a sudden the kernel has to be able to ensure that the thread ID is set before *it* actually does the heavy lifting and schedules the thread.

  21. xor88 says:

    anon, you are assuming a lot. Show me the page of documentation where it says that storing to the same location in a racy way produces a defined result. As long as you don't find that sentence, you are just assuming behavior.

    The C++ standard explicitly and without doubt says that this is undefined. The compiler can, if it wishes to do so, make it defined. I don't think you have understood the concept of undefined behavior yet. It means that the compiler can compile this as:

    if (race_detected())

    format("C:");

    It really can do that and be a conforming implementation. The architecture does not play a role in this (valid) transformation.

    I hear you saying "yes, but it wouldn't do that". Again, you are *assuming* stuff without reference for it.

  22. Karellen says:

    "The assertion can fire if the worker thread starts running before the call the Create­Thread returns."

    Surely it only fires if the worker thread starts running *after* the call the^H^Ho CreateThread returns.

    Or is the assertion backwards?

  23. Niclas Lindgren says:

    Doing it like described is not a very good idea, not only do you have the problem with nothing when the thread is created and spawned, but even when you know the thread isn't created and spawned after "g_WorkerThreadId" has been written to by the main thread, there is absolutely no guarantee it is published (memory model speak).

    Setting it in the thread though solves the issue, because it is an Atomic Update (not obvious that it is though) on a DWORD in a x86 arch. On other archs, like ARM, you might end up with a (for a split moment of time) half updated value, when the thread reads back the value.

    This is just like the double check problem for singletons. Without proper memory barriers in place in the above code it will just randomly work or not work depending which CPU you end up running the different threads on.

    If you have threads, do proper thread synch, never ever try to be clever, you will fail.

    [Since the value is written by both the main thread and the worker thread, the value is observable in both. A third thread, however, may not see the result immediately. -Raymond]
  24. pete.d says:

    @Raymond: 'it was a global object that had the rule "The worker thread is the only thread that can do X to the object." '

    Ah, okay.

    Still, it seems like this is a good example of why OOP and better decoupling of code is a good thing. There's not really any fundamental reason why a) the main thread needs this global variable to track that the thread is created and running, b) the worker thread needs this global variable to know that it's the worker thread, nor c) that if some function is called legally only by the worker thread, that there even should be a code path that leads to it being called from some other thread.

    I mean, no doubt in this particular implementation that's all the case. And I agree that writing the thread ID from the worker thread is a good solution. But the whole situation seems to call out for a better-compartmentalized design (based of course on the infinitessimal amount of information I actually have :) ).

  25. Myria says:

    I agree with Clipboarder: the code sample is taking for granted that CreateThread isn't, say, writing the thread ID a word at a time, since it isn't stated in CreateThread's documentation.  I would instead have the code put CreateThread's result into a local variable, then copy it to g_WorkerThreadId, which *would* be atomic because you know that you're writing an aligned DWORD directly on a 32-bit-or-higher architecture.  You'd also want to declare g_WorkerThreadId as volatile and put _WriteBarrier() after the writes.

    A minor reminder that's tangentially related: if you use GetExitCodeThread to get WorkerThread's return value, note the warning in MSDN about the STILL_ACTIVE return code.  To perfectly determine whether a thread has terminated, call WaitForSingleObject(hThread, 0).  I wish MSDN had a note saying what the correct solution to the GetExitCodeThread caveat is.

  26. Maurits says:

    @Karellen: Assertions in general fire if their condition evaluates to false. In this particular case, the assertion will fire if GetCurrentThreadId() != g_WorkerThreadId.

    The only way that can happen is if the assertion is evaluated *before* CreateThread updates g_WorkerThreadId; that is, if the thread function reaches the crucial point before CreateThread returns.

  27. 640k says:

    It's tough when you cannot even start a thread without running into race conditions.

    blogs.msdn.com/…/9634511.aspx

  28. Karellen says:

    @Maurits – Thanks. I'd somehow managed to get my understanding of "assert()" backwards. Something to do with using the verb "to assert" (e.g. "the program assert()ed") to mean that the assert fired, and therefore getting muddled around thinking that the program was meant to assert if the predicate was true. Exposure to BUG_ON() probably hasn't helped.

    Doh!

  29. smf says:

    >It's tough when you cannot even start a thread without running into race conditions.

    If you are starting a thread you will run into race conditions.

  30. Matt says:

    To everyone who thinks that CreateThread should definitely assign pThreadId before the thread starts, perhaps you should think for a second what that would mean for a *wrapper* of CreateThread (e.g. CreateThread2, or a future CreateThreadEx, or a pthread_create). How would you preserve that kind of semantic to the parent function? You can't!.

    Now consider that CreateThread IS the parent function of NtCreateThread, so the guarantee can't be made for anything more granular than NtCreateThread without implementing heavy communication between the thread's startup routine and the CreateThread API, which also has the effect that CreateThread will WAIT for the thread to start before returning (and the added complexity this would mean for CREATE_SUSPENDED).

    No. Instead, CreateThread says "I will start a thread for you, and you don't know when that thread will start. pThreadId is definitely assigned when CreateThread returns, but the thread might start before then. Deal with it."

  31. Daniel says:

    > but the thread might start before then. Deal with it.

    (And if you don't want to deal with it: Then use CREATE_SUSPENDED. Which is exactly intended for such reasons)

    IMO:

    In this case CREATE_SUSPENDED is a way better solution, as it fixes the actual PROBLEM and doesn't simply change its effects.

    From experience, applying fixes to EFFECTS of problems only leads to more undefined behavior later on

    Let's just assume that the main thread says: On no. Lets stop worker thread 1 again and start a different worker thread. If you're very unlucky, the initialization of the global variable in thread 1 will run AFTER the initialization of thread 2… (Of course this can be patched too… but maybe that patch has side-effects too?).

    –> so do yourself a favor and fix the race condition (which can easily be done) instead of "hotfixing" its effects.

  32. Niclas Lindgren says:

    @Raymond – Yes it is observable in both, but that does not mean  it is immediately observerable, meaning the value of the variable may or may not be cached in differently places, and the compiler might even keep it in register as there is nothing illegal doing such an optimisation when you have no locking semantics involved, in java you can for instance add the volatile keyword to the variable to say something about its synchronization, adding that i C++ will make sure the compilter doesn't optimize it into a register, but it won't add any locking semantics to it..

    You need a memory barrier, everything else is just a guess.

    Setting the variable in Thread A, does not make it immediately observable in Thread B, the access can be reordered, cached into a register, published whenever.

    Again, trying to do synchronization without any semantics is in general a very bad idea as it is incredibly hard to get right.

    [But certainly if thread A sets a value, it will observe that value. Imaging the craziness if that weren't true. You couldn't trust anything! In this case, both the main thread and the worker thread set the value, so the value is observable in both. -Raymond]
  33. smf says:

    >Setting the variable in Thread A, does not make it immediately observable in Thread B, the access can be reordered, cached into a register, published >whenever.

    The variable should probably be marked as volatile, so that caching into a register doesn't happen. But nobody uses volatile, even when they should.

  34. GregM says:

    "But nobody uses volatile, even when they should."

    Well, in C++11, it's very rare that you should.  It's not for thread synchronization, it's for telling the compiler that its value can change from effects outside the program, such as connections to hardware, or via shared/mapped memory.

  35. Joker_vD says:

    @GregM: I would assume that the thread scheduling and updates of the processors' caches are "effects outside the program".

  36. Mark says:

    Niclas: and what's wrong (in the two-thread case) with storing it in a register?

  37. Deduplicator says:

    Just to rehash Raymonds summary of the solution:

    The uninitialized variable is 0, so unequal any thread. Good enough, there's no worker thread yet. (or at least it never did anything, so how would you know the difference?)

    The variable is read as 0 or the workerthread-id: Good enough, you know you are not the worker-thread and you are not the creator of the workerthread.

    The variable is read as the workerthread-id: You have no idea if you read the creators or the workerthreads write (its atomic per platform guarantees). Both have equal effect anyway.

    So, there's no trouble at all, neither write nor read barriers needed.

  38. Mal DeMer says:

    @Joker_VD: I would assume that the thread scheduling and updates of the processors' caches are "effects outside the program".

    Sure they are. But volatile in C++11 doesn't tell the compiler that it needs to ensure that synchronization occurs, only that the value might change. You don't get atomic access for volatile data: you need to explicitly use synchronization operations to get that. stackoverflow.com/…/volatile-in-c11

  39. voo says:

    In my opinion this should clearly be stated by the documentation for CreateThread to avoid confusion. The behavior is certainly not obvious and expecting explicit and accurate information about visibility, ordering, etc. is useful in any case for a concurrent API.

  40. hankdane says:

    I once worked on a problem that did this one worse:

    HANDLE hThread = CreateThread(nullptr, 0,

                                  WorkerThreadProc,

                                  nullptr, 0,

                                  &g_WorkerThreadId);

    while (WAIT_TIMEOUT == WaitForSingleObject(hThread, interval))

    The code would literally check if the thread was running by examining hThread immediately after the call to CreateThread.  Since WaitForSingleObject returns something other than WAIT_TIMEOUT when the thread has not started yet, this lead to some mysterious behavior.

  41. GregM says:

    "I would assume that the thread scheduling and updates of the processors' caches are "effects outside the program"."

    Well, they are effects outside the program, but they don't change the value of a memory location.  I suppose you could say that updates of the processor cache would change the value that the processor sees, but that applies to all memory locations, not just ones marked as volatile.

  42. Marcel K. says:

    @Matt: "How would you preserve that kind of semantic to the parent function? You can't!"

    Why not? Just pass the pointer given to the wrapper function to CreateThread. This cuts through all the layers. I don't exactly endorse this type of coding, but it's not impossible to do.

  43. alegr1 says:

    @Henri Hein:

    The code as shown is completely valid. The thread handle is immediately valid, and you can wait on it immediately. It doesn't matter whether the thread ever started to run. It only matters whether the thread object is valid (it is) and whether it's signaled yet.

    If there was any problem, it should have been somewhere else.

  44. alegr1 says:

    @Marcel K:

    >Why not? Just pass the pointer given to the wrapper function to CreateThread.

    That's not always possible for kernel code.

  45. Marcel K. says:

    @alegr1

    Kernel code calling the user mode CreateThread function? I don't think this is a valid scenario.

    [No, but kernel code would have to pass the raw original pointer to the dwThreadId variable all the way down into the object manager so it can set the thread ID at the exact moment the thread is created. And now you have a potential user-mode attack against the object manager if the user-mode code, say, changes page permissions from another thread. It can get even worse if the thread object is created in a different memory context. (E.g., via an asynchronous call.) -Raymond]
  46. Marcel says:

    @Raymond: Yes, how this could be done in kernel code or whether it's sensible at all I don't know. I guess it should be easy enough to always create the thread suspended, set the threadID and only then unfreeze it. But I'm not a kernel level guy.

    All I wanted to say is that if CreateThread gives this guarantee, then so can every function wrapped around CreateThread and thus "it can't be wrapped" is just not a valid argument against this feature.

  47. Niclas Lindgren says:

    @Raymond "[But certainly if thread A sets a value, it will observe that value. Imaging the craziness if that weren't true. You couldn't trust anything! In this case, both the main thread and the worker thread set the value, so the value is observable in both. -Raymond]"

    You misunderstand me, I am not saying your "solution" is flawed, of course what you mentioned above is true, but you speak of a race between the main thread and the worker thread. There is no race, but you imply there is. If we speak of a sync issues between the threads than my reasoning follows, so it should be seen in the light of that we actually have a sync issue. In this case the logic is flawed. All you need to do is set the variable in the worker thread and keep it 0 else, no race at all (however half baked). Also spawning multiple threads will lead to more odd behaviors. A better approach, still flawed, would be to set the variable to the Main threads ID before spawning the worker thread and check if the Id of the current thread is different from the main thread, which means it is a worker thread. If you have 2 "main threads" spawning workers you still have a problem.

    What I am saying though, is, that because you now set the variable from 2 places, in theory, in the general sync 2 variables case, that is dangerous with no locking logic. In this particular case you set it to the same value in both places, so there is no "half" published value to be worried about, since even a half published value of the same two halves will be the correct value.

    In general though, as advice to solve locking problems (which this isn't), don't be clever. In the case above, the best approach is to use a local variable and pass that to the DoSomething(TRUE), and from the main thread pass it DoSomething(FALSE). No race, no sync issues, and especially no global variables that only work as long as only one worker thread is spawned at a time.

  48. GregM says:

    "All I wanted to say is that if CreateThread gives this guarantee, then so can every function wrapped around CreateThread and thus "it can't be wrapped" is just not a valid argument against this feature."

    That's not what was asserted.  What was asserted is that if CreateThread DOESN'T give this guarantee, then a wrapper of CreateThread CAN'T give this guarantee.  Thus, by induction, CreateThread CAN'T give this guarantee, because it is itself a wrapper of a NtCreateThread, and NtCreateThread doesn't give this guarantee.

  49. Marcel K. says:

    The point is, a wrapper can not only keep the promise the original function made, it can actually add it:

    HANDLE WINAPI CreateThreadWithPromise(LPSECURITY_ATTRIBUTES sa, SIZE_T ss, LPTHREAD_START_ROUTINE start, LPVOID p, DWORD flags, LPDWORD lpThreadId) {

    __HANDLE h = CreateThread(sa, ss, start, p, flags | CREATE_SUSPENDED, lpThreadId);

    __if ((flags & CREATE_SUSPENDED) == 0) {

    ____ResumeThread(h);

    __return h;

    }

  50. Marcel K. says:

    The point is, a wrapper can not only keep the promise the original function made, it can actually add it:

    HANDLE WINAPI CreateThreadWithPromise(LPSECURITY_ATTRIBUTES sa, SIZE_T ss, LPTHREAD_START_ROUTINE start, LPVOID p, DWORD flags, LPDWORD lpThreadId) {

    __HANDLE h = CreateThread(sa, ss, start, p, flags | CREATE_SUSPENDED, lpThreadId);

    __if ((flags & CREATE_SUSPENDED) == 0) {

    ____ResumeThread(h);

    __return h;

    }

  51. Niclas Lindgren says:

    @Mark There is nothing wrong as long as you do a set in the worker thread as Raymond shows, but in the general case there are so many things wrong with that code, and that should be pointed out. Also to double set the variable, makes the code work, but it is still not a good solution to the problem. That is all I am saying. The programmer of that code needs education in many levels, what he or she doesn't need is a solution to the problem, because with the knowledge displayed, there will soon be another problem, much worse problem, with multithreading or other structural issues.

    @GregM What do you mean it cannot be wrapped and create this guarantee, of course it can?

  52. Anil [MSFT] says:

    CreateProcessAsUser has this bit in the documentation:

    The calling thread can use the WaitForInputIdle function to wait until the new process has finished its initialization and is waiting for user input with no input pending. This can be useful for synchronization between parent and child processes, because CreateProcessAsUser returns without waiting for the new process to finish its initialization. For example, the creating process would use WaitForInputIdle before trying to find a window associated with the new process.

Comments are closed.