If you want to track whether the current thread owns a critical section, you can use the critical section itself to protect the tracking information


You may find yourself in the situation where you want to keep track of the owner of a critical section. This is usually for debugging or diagnostic purposes. For example, a particular function may have as a prerequisite that a particular critical section is held, and you want to assert this so that you can catch the problem when running the debug build.

class CriticalSectionWithOwner
{
public:

  CriticalSectionWithOwner() : m_Owner(0), m_EntryCount(0)
  {
    InitializeCriticalSection(&m_cs);
  }

  ~CriticalSectionWithOwner()
  {
    DeleteCriticalSection(&m_cs);
  }

  void Enter()
  {
    EnterCriticalSection(&m_cs);
#ifdef DEBUG
    m_Owner = GetCurrentThreadId();
    m_EntryCount++;
#endif
  }

  void Leave()
  {
#ifdef DEBUG
    if (--m_EntryCount == 0) {
      m_Owner = 0;
    }
#endif
    LeaveCriticalSection(&m_cs);
  }

#ifdef DEBUG
  bool IsHeldByCurrentThread()
  {
    return m_EntryCount &&
           m_Owner == GetCurrentThreadId();
  }
#endif

private:
  CRITICAL_SECTION m_cs;
#ifdef DEBUG
  DWORD m_Owner;
  int m_EntryCount;
#endif
};

After we successfully enter the critical section, we mark the current thread as the owner and increment the entry count. Before leaving the critical section, we see if this is the last exit, and if so, we clear the owner field.

Note that we update the owner and entry count while the critical section is held. We are using the critical section to protect its own diagnostic data.
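
For example, a function whose contract requires the caller to hold the critical section might assert the prerequisite along these lines. (This is only a sketch; the g_cs global and the UpdateSharedData function are illustrative names, not part of the code above.)

#include <assert.h>

CriticalSectionWithOwner g_cs; // illustrative: protects some shared data

void UpdateSharedData()
{
  // Contract: the caller must already have entered g_cs.
#ifdef DEBUG
  assert(g_cs.IsHeldByCurrentThread());
#endif
  // ... touch the data that g_cs protects ...
}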

The subtle part is the IsHeldByCurrentThread function. Let's look at the cases:

First, if the current thread is the owner of the critical section, then we know that the diagnostic data is safe to access because we own the critical section that protects it. That's not the subtle part.

The subtle part is the case where the current thread is not the owner of the critical section. A naïve analysis would say that the diagnostic data is off limits because you are trying to access it without owning the protective critical section. But what value can m_Owner have at this point?

  1. If the critical section is not held, then m_Owner will be zero, which will be unequal to the current thread ID.

  2. If the critical section is held, then m_Owner will be the owner of the critical section, which will also be unequal to the current thread ID.

But what if the value of m_Owner changes while we are looking at it? Well, since we are not the owner of the critical section, it can only change between the two states above (possibly from one state 2 to another state 2). In other words, it can only change from one value that is not equal to the current thread ID to another value that is still not equal to the current thread ID. Therefore, even if we race against another thread entering or leaving the critical section, the fact that the owner of the critical section is not us doesn't change.

Note that this analysis assumes that the m_Owner is a suitably-aligned value that can be updated atomically. (If not, then it's possible that a torn value will be read which coincidentally matches our thread ID.)

Since the CRITICAL_SECTION itself must already be suitably aligned, placing the DWORD up against it will also align the DWORD.
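
If you want to make that assumption explicit, a compile-time check is cheap. (A sketch, assuming a C++11 compiler; this static_assert is an addition, not part of the class above.)

// Make the alignment assumption explicit: a DWORD placed right after a
// CRITICAL_SECTION member stays naturally aligned under default layout.
static_assert(alignof(CRITICAL_SECTION) >= alignof(DWORD),
              "CRITICAL_SECTION alignment must cover DWORD alignment");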

Comments (42)
  1. xix says:

    (Aside due to RAMP's post no longer accepting comments):  If you missed Raymond's presentation, it's recorded here:

    http://www.ustream.tv/…/35704183

  2. Adam Rosenfield says:

    Is GetCurrentThreadId() guaranteed to never return 0?  I assume it is, but I don't see that listed in the documentation.

  3. Zarat says:

    That the Thread ID cannot be zero would be a property of the thread id, not a property of GetCurrentThreadId function. Therefore you have to look at the description of the properties of the Thread ID, and voila, there it is stated:

    A thread can use the GetCurrentThreadId function to get its own thread identifier. The identifiers are valid from the time the thread is created until the thread has been terminated. Note that no thread identifier will ever be 0.

    Quote from MSDN at msdn.microsoft.com/…/ms686746%28v=vs.85%29.aspx

  4. Adam Rosenfield says:

    …and of course if I'd bothered to Google, I would've found blogs.msdn.com/…/78395.aspx before posting that.  Yes, 0 is an invalid thread ID as one would expect, but that fact is only implied in the documentation of other semi-related functions.  The documentation for GetCurrentThreadId() and friends ought to be more explicit about such things.

  5. I suppose, while you're at it, you could add an ASSERT(m_EntryCount == 0) in the destructor.

  6. And in the beginning of Leave():

    ASSERT(m_EntryCount > 0 && m_Owner == GetCurrentThreadId())

  7. Zarat says:

    @Adam: I don't think it's the responsibility of each function receiving or returning a thread id to re-state all the properties of a thread id. That's like restating all the properties of an integer on every function working with an integer. If you want invariants about a type you'd look at the types documentation, not at functions working with the type.

  8. pete.d says:

    Why are m_Owner and m_EntryCount not required to be "volatile"? Does this example assume a) "DEBUG" always means no compiler optimizations, including those that would cache the values, and b) x86 architecture, where the memory model is essentially "everything's volatile" anyway?

    Or is there some other guarantee going on that allows those members to be used without "volatile" semantics, even on non-x86 architectures and in optimized builds?

  9. Adam Rosenfield says:

    @Zarat: Fair enough.  But it is interesting that GetThreadId() lists a return value of 0 as an error, whilst GetCurrentThreadId() makes no similar mention (presumably because it can't fail).

    I just thought of another interesting potential edge case — if a thread exits while holding the critical section, and that thread's ID gets reused for another thread, wouldn't that other thread mistakenly think it owned the critical section?  How long does it take for thread IDs to be reused?

  10. Joshua says:

    0 sounds like a value one would get back if GetCurrentThreadId failed. But how can that happen? Non-Win32 thread? If so, somebody really screwed up if it's got Win32 code running on it and EnterCriticalSection is probably dead meat.

    Incidentally, 0 is a valid process id, but it won't be a Win32 process (it's the System Idle Process). You can't inject Win32 code in that.

  11. > wouldn't that other thread mistakenly think it owned the critical section

    Yup.

    > How long does it take for thread IDs to be reused

    Developers should assume that thread IDs are reused at the worst possible time… that is, that the agent handing out thread IDs is malicious.

    If you want to know whether a given thread is still around, you need a thread handle. You can call WaitForSingleObject(h, 0) == WAIT_OBJECT_0 to know whether the thread has exited.

  12. Zarat says:

    Reusing the thread id cannot change it from "not my thread id" to "my thread id" (or reverse) because that would mean someone just got the same thread id assigned as the currently running thread.

    But I agree with the others who are being cautious; I don't trust myself enough to understand all the corner cases a compiler or non-x86 platform can introduce. I'd probably use a C++11 atomic variable to let it do the hard work of figuring out how to "get it right".

  13. Damien says:

    Is the m_EntryCount test in IsHeldByCurrentThread just an optimization to avoid calling GetCurrentThreadId()?

  14. Adam Rosenfield says:

    @pete.d: I don't think that the lack of volatile here is a problem.  If the calling thread is the owner of the critical section, then it's the same thread which is reading and writing m_Owner and m_EntryCount, so there are no threading/caching issues.

    Conversely, if one thread tries reading m_Owner and m_EntryCount while another thread is acquiring, holding, or releasing the critical section, it doesn't matter if the thread reads a stale, cached value or the most up-to-date value, and it also doesn't matter if the reads happen out-of-order — in any of those cases, the value it reads for m_Owner will either be 0 or another thread's ID, so it will never be the calling thread's ID.

  15. alegr1 says:

    >I just thought of another interesting potential edge case — if a thread exits while holding the critical section, and that thread's ID gets reused for another thread, wouldn't that other thread mistakenly think it owned the critical section?

    That's not an edge case, that's a bug which should not occur in a valid program. You just don't exit while holding a CS.

    @pete.d:

    EnterCS and LeaveCS provide a memory barrier. Because of that, those variables don't have to be volatile.

  16. Simon Buchan says:

    @pete.d: Enter/LeaveCriticalSection(), like all synchronization APIs, has to be defined as a (potentially one-directional) memory barrier in order to work, so the compiler is not permitted to move either reads or writes above enter or below exit, even without volatile, and it must emit sufficient instructions to ensure the CPU doesn't either. (Which can get expensive on ARM7, I hear.)

    You're probably thinking of volatile's semi-abuse in lock-free algorithms: volatile says the compiler can't do anything to accesses to this variable (coalesce writes, hoist reads out of guards, or other common codegen optimizations); essentially, it must treat them as I/O. It doesn't prevent the CPU from doing so, since the CPU knows if that address was mapped to a DMA range or whatever. It just happens that x86 doesn't (at least in most cases?), so a lot of technically unsafe uses of volatile work. std::atomic<> was introduced in C++11 as the right thing to use instead of volatile: it requires that the correct interlocked instructions are emitted. Win32 provided the Interlocked*() APIs to gain portable (to Windows-supported hardware platforms) access to those instructions beforehand.

  17. wqw says:

    m_Owner and m_EntryCount are declared in the debug build only, but the constructor references them in release too. Or am I missing something?

    [You're not missing anything. It's just a bug. -Raymond]
  18. @Adam Rosenfield:

    I would assume that the GetCurrentThreadId function would get the value from the TEB structure. If that gets into a state where it could fail, then it would be better if the thread were dead.

  19. Joshua says:

    @Adam Rosenfield, alegr1:

    Wouldn't EnterCriticalSection hang forever or return an ABANDONED_MUTEX error in that case?

  20. Adam Rosenfield says:

    @alegr1: Yes, any thread that exits while holding a critical section is clearly a very bad bug, so it's not worth trying to fix this code to handle that case correctly.  But I just thought that it's worth noting that thread ID reuse is something to think about.

    @Joshua: The behavior is undefined, according to the documentation on EnterCriticalSection().  So actually, it's not even (theoretically) possible to handle that case correctly, since the behavior becomes undefined as soon as the thread exits while holding the critical section, not the next time some other thread tries to enter the critical section.

    You're probably thinking of the WAIT_ABANDONED status which can be returned by the Wait* family of functions when given a mutex.  Since mutexes act across processes, other processes may conceivably need to handle the WAIT_ABANDONED status (such as by closing shared memory objects), so the behavior needs to remain defined when a process exits while holding a mutex.  But critical sections are local to a process and cannot be shared across processes.

  21. Mordachai says:

    Awesome little idea, and great discussion spawned.  Thanks :)

  22. pete.d says:

    "Enter/LeaveCriticalSection(), like all synchronization APIs has to be defined as a (potentially one directional) memory barrier in order to work"

    But that only affects the thread writing to the value, doesn't it? The thread reading, without any synchronization, doesn't necessarily benefit from that. Without its own memory barrier (which at least in some languages is provided by using "volatile"… I don't recall off the top of my head what C++'s memory model is, and I realize it might not be as formalized as e.g. .NET languages' models), the CPU may still reorder the read to occur too early to see a write from another thread.

  23. Joshua says:

    @alegr1: In NT4, CriticalSection is implemented in terms of a mutex (with the optimization that it only exists when somebody is actually waiting for it). They got fancy in newer versions of Windows but it boils down to basically the same thing.

  24. Mike Vine [MSFT] says:

    I do not agree with some of the points in this post. However much you argue about what possible values entry-count and owner have, this is *clearly* and *unequivocally* undefined behaviour according to the C++ standard – there's a clear data race here as defined in the latest C++11 standard. And once you have undefined behaviour, your whole program is undefined. If you make the 2 extra values std::atomic you lose a *tiny* bit of efficiency with the benefit of a well-defined program.

    Whether or not the MS compiler guarantees this to work, all of us as C++ professional programmers should hold ourselves to higher standards than this.

    Oh, and for those talking about using volatile for thread safety: if you've got access to a C++11 compiler please use std::atomic – volatile isn't really useful for thread safety. Even with the MS-specific extensions (which I'm hoping will eventually be deprecated; they're already off by default in VS for ARM) there are still situations where volatile just doesn't do enough. See http://www.drdobbs.com/…/212701484 for lots of juicy info on why you should migrate from using volatile in C++!

  25. pete.d says:

    Re: lack of volatile

    I read a couple more replies, and think I understand now. That is, it doesn't matter that the read value might be wrong, because the wrong value is still not going to be the current thread's value (i.e. the function will still return false, as needed).

    The only exception I see to this is the (probably rare) case where the thread ID is reused. That would be the scenario where the values could theoretically match, but be from an old, dead thread if volatile semantics aren't observed. But it's my understanding that there is at least one implicit memory barrier imposed when a thread starts up, obviating the need for explicitly imposing volatile.

    In other words, all good. :)

    Still, I have to agree with all the other commenters: this sounds like the kind of thing to let the author of some high-performance concurrency library worry about. It's a great topic for the blog, but it's not the kind of thing that ought to show up in most people's production code. :)

  26. hacksoncode says:

    @Mike Vine: It's clear from the context that this program is *not* using C++11, because if it were, it would also be using C++11 threads and, consequently, synchronization primitives (i.e. *not* critical sections). I'm not even sure there *is* a spec for what happens in this case in a C++11 program if you're not using C++11 threads, because the abstract execution model for multiprocessing isn't defined unless you are.

    So, yes, a programmer "should" refactor the entire program to use C++11 threads, and making use of atomic variables in this case would probably be the only way anyone would think of doing it.

    If you're using OS-level threads, however, this kind of meta-analysis about the effects is perfectly valid, because you're already constrained to that OS's thread, execution, and memory model. And on Windows, with the processors it runs on, the function in question has perfectly well-defined high-level behavior, because the only possible ambiguities don't matter.

  27. Mike says:

    This looks ok to me without volatile or atomic as long as you're on a single processor but I'm not convinced it's good with multiprocessors.  The sequence might be:

    Thread 1 on processor A has the lock

    Thread 1 calls IsHeldByCurrentThread and gets back true

    Thread 1 migrates to processor B and releases the lock

    Thread 2 on processor B acquires the lock

    Thread 1 migrates back to processor A and calls IsHeldByCurrentThread again

    Without an acquire/release on m_Owner and m_EntryCount I'm not sure that processor A is guaranteed to know that anything's changed and might still return true.  Enter/LeaveCriticalSection doesn't give that to you because they're operating on a completely different structure.

    [In other words, what if processor A has not yet observed the write to m_Owner that occurred as part of the Leave? Imagine if this were possible, that a thread can migrate to a processor which has not observed memory changes made by the thread prior to the migration. Then this code could assert: void foo() { int x = 1; assert(x==1); } I think you'd agree that pretty much every line of code would be broken if this were true. It would require all memory accesses to be performed with explicit barriers. -Raymond]
  28. Killer{R} says:

    /*Note that this analysis assumes that the m_Owner is a suitably-aligned value that can be updated atomically. (If not, then it's possible that a torn value will be read which coincidentally matches our thread ID.)*/

    Anyway, if that's not true, or if we're on some exotic hardware where a race can return some garbage instead of the old/previous value, this can be simply fixed with TryEnterCriticalSection(). At the cost of some overhead, of course.

  29. alegr1 says:

    @Joshua:

    >In NT4, CriticalSection is implemented in terms of a mutex

    Any implementation of a critical section that's trying to avoid an unnecessary kernel roundtrip cannot use a kernel mutex. An auto-reset event is required. That's because you need to be able to signal the event without having to acquire it beforehand, and you don't have to release it after having waited for it.

    If you want to use a kernel mutex, you cannot apply any user-mode optimization.

  30. Clovis says:

    The ctor needs an #ifdef DEBUG around the member initialisation too.

  31. Matt says:

    "Since the CRITICAL_SECTION itself must already be suitably aligned, placing the DWORD up against it will also align the DWORD."

    Not true. Alignment/Padding of fields is up to the compiler. (I can't for the life of me think why a compiler might deliberately MIS-align a field, but that doesn't mean it can't happen).

    [A compiler which intentionally misaligned types would not be Win32-compatible, and consequently this entire article does not apply. -Raymond]
  32. Matt says:

    // setup:
    CriticalSectionWithOwner cs;
    for(ULONG i = 0; i < (ULONG)UINT_MAX; i++)
      cs.Enter();

    // ZOMG an Integer Overflow :(
    cs.Enter();
    ASSERT(cs.IsHeldByCurrentThread());

    // cleanup
    cs.Leave();
    for(ULONG i = 0; i < (ULONG)UINT_MAX; i++)
      cs.Leave();

  33. Matt says:

    @alegr1, @Joshua:

    A CRITICAL_SECTION is a spin-lock and a lazily initialized kernel MUTANT object. EnterCriticalSection waits a certain number of spins on the spin lock before giving up and creating and waiting on the kernel mutex. This means that if your CRITICAL_SECTION is rarely contested (or protects a very fast section of code), then a CRITICAL_SECTION is basically a spin lock with no syscalls or kernel resources. But if the protected code is slow and occasionally contested, the threads waiting on it will call the kernel to wait on the mutex which will deschedule the thread, avoiding wasting the CPU spinning on a thread that is going to have to wait for a long time.

  34. alegr1 says:

    @Matt:

    Using a kernel mutant in a CS would require that EnterCriticalSection make a *mandatory* trip to the kernel even if there is no contention, because if a contender appears later and has to wait for the mutex, the mutex would not be owned otherwise. LeaveCriticalSection would also require a kernel call.

    So if you want to use a mutex, you might as well just use a mutex, not bothering with any CS, because you'll just have to acquire and release it every time.

    Whereas using an auto-reset event for CS makes everything a piece of cake. With an event, you can use the usermode optimization, because you only need to signal the event if there are contenders, and you only need to wait on the event if there is contention. And you can signal the event (release the wait) without ever having to wait on it.

  35. smf says:

    Can't you use the OwningThread member in the CRITICAL_SECTION structure?

    msdn.microsoft.com/…/cc164040.aspx

  36. Joshua says:

    @alegr1: When I took computing theory, mutexes didn't check if you were stupid enough to lock on one thread then unlock on another.

    @smf: If you do that, there's a random chance your code breaks on any service pack.

  37. alegr1 says:

    @Joshua:

    >When I took computing theory, mutexes didn't check if you were stupid enough to lock on one thread then unlock on another.

    In theory, practice is the same as theory, and Windows mutexes don't check if you release them from another thread.

    In practice, they're not the same, and the mutexes do check that. The very first remark in the ReleaseMutex documentation says so.

  38. alegr1 says:

    @Joshua:

    Also, a lazy CreateMutex performed when a thread needs to wait on a contended CS for the first time cannot create the mutex already owned by a different thread, which would be necessary for WaitForSingleObject to actually suspend the thread.

  39. alegr1 says:

    >When I took computing theory, mutexes didn't check if you were stupid enough to lock on one thread then unlock on another.

    Of course you can have a non-reentrant mutex that you can also release from any thread. In Windows it is called an auto-reset event.

  40. Gabe says:

    alegr1: I'm pretty sure that a non-reentrant mutex that you can also release from any thread was originally called a "semaphore".

  41. alegr1 says:

    @Gabe:

    >I'm pretty sure that a non-reentrant mutex that you can also release from any thread was originally called a "semaphore".

    The CS would require a semaphore with max count of 1 (to only allow releasing one waiter), which is equivalent to an auto-reset event.

  42. Martin Ba. _ says:

    Why not just assert on the members of the CRITICAL_SECTION structure itself? (LockCount, OwningThread, …) Seriously.

    [Please tell me you're joking. -Raymond]

Comments are closed.