The history of Win32 critical sections so far


The CRITICAL_SECTION structure has gone through a lot of changes since its introduction back oh so many decades ago. The amazing thing is that as long as you stick to the documented API, your code is completely unaffected.

Initially, the critical section object had an owner field to keep track of which thread entered the critical section, if any. It also had a lock count to keep track of how many times the owner thread entered the critical section, so that the critical section would be released when the matching number of Leave­Critical­Section calls was made. And there was an auto-reset event used to manage contention. We'll look more at that event later. (It's actually more complicated than this, but the details aren't important.)

If you've ever looked at the innards of a critical section (for entertainment purposes only), you may have noticed that the lock count was off by one: The lock count was the number of active calls to Enter­Critical­Section minus one. The bias was needed because the original version of the interlocked increment and decrement operations returned only the sign of the result, not the revised value. Biasing the result by 1 means that all three states could be detected: Unlocked (negative), locked exactly once (zero), reentrant lock (positive). (It's actually more complicated than this, but the details aren't important.)

If a thread tries to enter a critical section but can't because the critical section is owned by another thread, then it sits and waits on the contention event. When the owning thread releases all its claims on the critical section, it signals the event to say, "Okay, the door is unlocked. The next guy can come in."

The contention event is allocated only when contention occurs. (This is what older versions of MSDN meant when they said that the event is "allocated on demand.") Which leads to a nasty problem: What if contention occurs, but the attempt to create the contention event fails? Originally, the answer was "The kernel raises an out-of-memory exception."

Now you'd think that a clever program could catch this exception and try to recover from it, say, by unwinding everything that led up to the exception. Unfortunately, the weakest link in the chain is the critical section object itself: In the original version of the code, the out-of-memory exception was raised while the critical section was in an unstable state. Even if you managed to catch the exception and unwind everything you could, the critical section was itself irretrievably corrupted.

Another problem with the original design became apparent on multiprocessor systems: If a critical section was typically held for a very brief time, then by the time you called into kernel to wait on the contention event, the critical section was already freed!

void SetGuid(REFGUID guid)
{
 EnterCriticalSection(&g_csGuidUpdate);
 g_theGuid = guid;
 LeaveCriticalSection(&g_csGuidUpdate);
}

void GetGuid(GUID *pguid)
{
 EnterCriticalSection(&g_csGuidUpdate);
 *pguid = g_theGuid;
 LeaveCriticalSection(&g_csGuidUpdate);
}

This imaginary code uses a critical section to protect accesses to a GUID. The actual protected region is just nine instructions long. Setting up to wait on a kernel object is way, way more than nine instructions. If the second thread immediately waited on the critical section contention event, it would find that by the time the kernel got around to entering the wait state, the event would say, "Dude, what took you so long? I was signaleded, like, a bazillion cycles ago!"

Windows 2000 added the Initialize­Critical­Section­And­Spin­Count function, which lets you avoid the problem where waiting for a critical section costs more than the code the critical section was protecting. If you initialize with a spin count, then when a thread tries to enter the critical section and can't, it goes into a loop trying to enter it over and over again, in the hopes that it will be released.

"Are we there yet? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now?"

If the critical section is not released after the requested number of iterations, then the old slow wait code is executed.

Note that spinning on a critical section is helpful only on multiprocessor systems, and only in the case where you know that all the protected code segments are very short in duration. If the critical section is held for a long time, then spinning is wasteful since the odds that the critical section will become free during the spin cycle are very low, and you wasted a bunch of CPU.

Another feature added in Windows 2000 is the ability to preallocate the contention event. This avoids the dreaded "out of memory" exception in Enter­Critical­Section, but at a cost of a kernel event for every critical section, whether actual contention occurs or not.

Windows XP solved the problem of the dreaded "out of memory" exception by using a fallback algorithm that could be used if the contention event could not be allocated. The fallback algorithm is not as efficient, but at least it avoids the "out of memory" situation. (Which is a good thing, because nobody really expects Enter­Critical­Section to fail.) This also means that requests for the contention event to be preallocated are now ignored, since the reason for preallocating (avoiding the "out of memory" exception) no longer exists.

(And while they were there, the kernel folks also fixed Initialize­Critical­Section so that a failed initialization left the critical section object in a stable state so you could safely try again later.)

In Windows Vista, the internals of the critical section object were rejiggered once again, this time to add convoy resistance. The internal bookkeeping completely changed; the lock count is no longer a 1-biased count of the number of Enter­Critical­Section calls which are pending. As a special concession to backward compatibility with people who violated the API contract and looked directly at the internal data structures, the new algorithm goes to some extra effort to ensure that if a program breaks the rules and looks at a specific offset inside the critical section object, the value stored there is −1 if and only if the critical section is unlocked.

Often, people will remark that "your compatibility problems would go away if you just open-sourced the operating system." I think there is some confusion over what "go away" means. If you release the source code to the operating system, it makes it even easier for people to take undocumented dependencies on it, because they no longer have the barrier of "Well, I can't find any documentation, so maybe it's not documented." They can just read the source code and say, "Oh, I see that if the critical section is unlocked, the Lock­Count variable has the value −1." Boom, instant undocumented dependency. Compatibility is screwed. (Unless what people are saying "your compatibility problems would go away if you just open-sourced all applications, so that these problems can be identified and fixed as soon as they are discovered.")

Exercise: Why isn't it important that the fallback algorithm be highly efficient?

Comments (19)
  1. Martin Bonner says:

    Because, to a first approximation, it is never called.

    Also, if you can't allocate an auto-reset event, your system is very, very, overloaded – you just need to survive until the overload goes away.

  2. VinDuv says:

    “Exercise: Why isn't it important that the fallback algorithm be highly efficient? ”

    An Out-of-memory situation is not supposed to occur frequently, so it makes no sense to optimize for it. Anyway, the program will probably not continue running for long if it’s getting out-of-memory errors.

    I was wondering how the contention event is created. What if two threads try to enter a locked critical section with no contention event?

  3. alegr1 says:

    >What if two threads try to enter a locked critical section with no contention event?

    I expect both will create an event, then use InterlockedCompareExchangePointer to set it, and one of the events will be freed.

    But the whole lazy allocation sceme so reeks of NT 3.1 era. If the kernel is so out of nonpaged memory that it cannot allocate about 128 bytes, them it will not even be able to run any I/O (including paging), because typical IRP, MDL and SRB will require allocations of similar size. The kernel is pretty much constipated at that point and the application will simply fail one by one, even if they don't use CRITICAL_SECTION.

  4. Simon says:

    Very interesting stuff! Thanks Raymond.

    I like reading about kernel changes in Windows. Channel9 had some good videos about this topic years ago, as well as TechNet magazine, some MSFT blogs and the Windows Internals book. These days I can't seem to find descriptions of kernel improvements anymore. Can anyone recommend some resources covering Windows 8/8.1?

  5. Michael Grier [MSFT] says:

    Things were somewhat even worse before N. fixed this.  I thought he fixed it in XP but yeah it could be Vista.

    Prior to moving to keyed events (so that no kernel object had to be allocated at all), the contention object could be allocated by the thread releasing the critical section.  Which meant that LeaveCriticalSection could throw the exception and when it did, the CS was corrupt.  There was even verbiage in MSDN at the time advocating putting __try/__except around critical section entry and exit to catch the exceptions but the problem was that there was really nothing reasonable that you could do in light of the exception in the first place.  There's obviously contention (so you can't just hope the problem is theoretical rather than real) and the CS is not usable any further.

    These were dark times.  The world is a much better place now.

  6. Yuhong Bao says:

    "Windows 2000 added the Initialize­Critical­Section­And­Spin­Count function"

    NT4 SP3 did:

    http://ftp.microsoft.com/…/readme.htm

  7. Yuhong Bao says:

    "Windows 2000 added the Initialize­Critical­Section­And­Spin­Count function"

    NT4 SP3 did:

    http://ftp.microsoft.com/…/readme.htm

  8. alegr1 says:

    >the contention object could be allocated by the thread releasing the critical section

    I'm not sure how that would work at all. If there is a contention, a thread already needs an event handle to wait on.

    [This is in the case where the owner of the critical section exits the critical section before the waiting thread can allocate the event handle to wait on. -Raymond]
  9. alegr1 says:

    [This is in the case where the owner of the critical section exits the critical section before the waiting thread can allocate the event handle to wait on. -Raymond]

    I believe this case can be handled by having the contending thread (which successfully allocated and set the event) decrement the contention counter and go back and try to acquire the CS again.

    [Nope, there is still a race. Owner thread: Check for contention. Yes, there is contention. Is the event created? No: Don't set it. ⟨Pre-empted⟩ Contending thread: Create the event, decrement contention count, loop back. Try to enter critical section, cannot. Increment contention count. Wait on event. ⟨thread switch⟨ Owner thread: Decrement contention count, return. Result: Contending thread stuck waiting on event that is not signaled. -Raymond]
  10. Yukkuri says:

    Didn't you know that open source is the solution to every problem?

    Anyway the open source solution to the problem is 'change the implementation, screw the users of software that relied on internal details'.

    There is no hope for anyone that cannot recognize that this strategy is not tenable for Microsoft.

  11. @Yukkuri: that's if you're lucky.  If you're not, the sentence ends at "users".

  12. Neil says:

    Before reading the rest of the comments, my idea for solving the event contention problem would have been to wrap the creation of the event in its own critical section (a global one with a preallocated event; we don't want to contend over creating that critical section's event!) just in case.

  13. Yukkuri says:

    @Harry Johnston — You're not wrong, sadly :p

  14. Yuhong Bao says:

    "In Windows Vista, the internals of the critical section object were rejiggered once again, this time to add convoy resistance"

    I think it was Server 2003 SP1 that did that.

  15. alegr1 says:

    [Nope, there is still a race.-Raymond]

    The idea is to try enter the CS. Check if the CS is currently owned and an event is not allocated. In this case, allocate and set the event, and only then really proceed with incrementing the contention count and waiting on the event if necessary. This also has a nice side effect that raising an exception on CreateEvent failure still leaves the CS in the original consistent state.

  16. Anonymous Coward says:

    @Yukkuri: "willingness to make internal changes that break programs that do dumb stuff" is orthogonal to "amount of people the source code is visible to".

  17. win32s ftw says:

    @Yukkuri:

    Sell the source code to NSA is at least always an option for the greedy. Dunno if it counts a open though.

Comments are closed.

Skip to main content