You can’t leave a critical section you never entered

If you call LeaveCriticalSection on a critical section you never entered, the behavior is undefined.

Critical sections are an extremly high-traffic code path. Intense work goes into making them as fast as possible. Customers like to ask questions like "Exactly how many cycles does it take to enter a critical section? We're not going to use them if they're too slow." They don't say what they are going to do if critical sections are too slow for them, but the threat is real (even if the justification is bogus). I've seen programmers respond to code feedback of the form "You need a critical section here" with "No, I'm not going to bother. I'm afraid it'll slow down my code."

Given that critical sections are so heavily used, the algorithm gets tweaked regularly in order to improve performance. Sometimes the tweaks are minor; other times, the algorithm gets a major overhaul. Of course, the hope is that by changing only the insides, nobody will notice. On the other hand, people who relied on undefined behavior (like exiting a critical section they never entered and hoping that something meaningful would happen) are going to see changes in behavior.

I don't know the precise reasons why the internals of critical sections changed, but I suspect it had to do with mitigating the effect of lock convoy.

Comments (11)
  1. John says:

    Chuck Norris can leave a critical section he never entered, but he can also divide by zero so I guess it’s not THAT impressive.

  2. Ben Voigt [C++ MVP] says:

    For these kind of questions, it’s Jon Skeet who is given supernatural powers.

    For example, "Jon Skeet never leaves a critical section, the critical section goes wherever he is."

  3. Karellen says:

    Bruce Schneier can enter a critical section in 0 cycles. But as no-one has ever dared to enter into a race condition against him, it is unknown if he bothers to use them.

  4. tb says:


    Starting with Windows Server 2003 SP1

  5. Gabe says:

    OK, it sounds like the problem is that fair locking requires that the next thread waiting be given the lock, then wait to be scheduled. This can cause threads that try to acquire locks more frequently than a context switch can take place to end up in live-lock. The solution to this is to allow threads that ask for a lock after a waiting thread has been given it, but before it has had a chance to execute, to steal said lock. Right?

    And so the lesson is that solving problems like that could require changes to critical sections such that undefined behavior may be different than before. Right?

    However, solving such a problem is critical to Windows scaling to many-core CPUs, so we can’t rely on such undefined behavior to remain unchanged. Right?

  6. configurator says:

    When was this change made? I assume between versions of Windows; in that case, between which versions?

    I don’t suppose anyone relied on this behavior intentionally; there’s probably a function that does something and exits a critical section – and they called it even when not in a critical section due to laziness. The bug exists in the software, but is *completely undetectable*. It is understandable how that bug could be introduced by mistake; that is why, if you don’t allow something to happen, you should actually not allow it to happen in the first place.

  7. tb says:

    Perhaps my googling (or binging) skills need to be refined (or I just need to wait for the latest edition of Windows Internals to arrive in the post), but I couldn’t readily find the answer to my query… exactly how fair/unfair are mutexes, critical sections, et. al. in Vista and Server 2008 (and even Win7)? Is there any mechanism in place to prevent absolute starvation?

    I saw an interview done a short time ago with Mark Russinovich where he talked about dealing with hot locks and reducing contention in the kernel (for Win7 I believe) to make it scalable to something like 256 cores, but he didn’t go into implementation details.

    Was anything like the following ever considered?

    Take the existing Vista/Server 2003 sp1 algorithm (non-FIFO/unfair) and add a second queue, SHORTLIST. Threads are only added to SHORTLIST if a dequeued thread wakes up to find that its lock was taken by a new thread during the context switch. Then (designer’s choice) either:

    A) treat SHORTLIST as strictly FIFO (new threads can take the lock before threads in the primary queue ONLY if SHORTLIST is empty – essentially making SHORTLIST a single-item "on deck" position); or

    B) add a member to the QueueItem structure that is incremented each time a thread is queued onto SHORTLIST (or holds the total number of ticks the thread has been waiting), and if that value has reached a certain threshold when a thread is dequeued from SHORTLIST, then it basically goes into an "on deck" position where its place in line for the lock can no longer be pre-empted.

    My first thought was something like (A), but quickly realized that a convoy could still form due to the second wake-up time. Though, I don’t know if the additional overhead of (B) would have a significant enough performance cost such that the trade-off between a convoy vs. total starvation would be infeasible. i.e. the number of threads per time-frame required to hit a single critical section at a rate high enough to cause absolute or nearly-absolute starvation (using the existing unfair algorithm) either* suggests an additional synchronizer at a higher level in the application (like a threadpool with a max size appropriate to the workload), or describes a system that will not exist any time in the near future (ex: 1024-core Larrabee-cousin descendant).

    *assuming, of course, that the application/system is well written and doesn’t do silly things like Sleep() while holding the lock.

    Should I just wait for Windows Internals 5th Edition to show up?

    Am I answering my own questions? LOL

  8. Euro says:

    Coincidentally, This weekend I witnessed a real life case that bears some resemblance to this scenario (although not quite the same).

    At a popular burger vendor at the Nats’ ballpark, customers were buying their burgers and were given receipts with numbers — standard stuff. What’s interesting is that the burgers were completed in batches of about 6 to 8 and sent together in a tray to the front counter.

    With half a dozen people in front of her ready for their orders, the attendant was spending more time trying to find the "next" consecutive number among the burgers than actually handing the burgers out. In the interest of "fairness", the whole process took a lot longer than if she had simply picked up the orders randomly (or in "tray order") from the tray. I seriously doubt anybody would have been upset.

    The cause was not the same ("thread sorting" is not the same as "thread scheduling"), but the effect was similar: the process of setting up the "next fair one" took a substantial amount of time compared to just picking one that was "ready now".

  9. Alexandre Grigoriev says:


    My opinion is, that it doesn’t make sense to *guarantee* 100% fairness. There are always some corner cases that the guarantee may be very expensive. What makes sense is to make it fair most of the time (like wait implementation in Windows does). Anyway, as soon as you have a multithreaded (and even more, multiprocessor) system, the concept of strict ordering between threads doesn’t exist amymore. 100% fairness is then pointless.

  10. Euro says:


    Agree – no arguments from me. Did it sound like I was supporting ‘fairness at all costs’? Maybe I need to clarify this one sentence:

    "… I seriously doubt anybody would have been upset [if she had just handed out the burgers out of sequence, thereby making the overall process faster at the expense of ‘fairness’]."

  11. Ken Hagan says:

    ‘I’ve seen programmers respond to code feedback of the form "You need a critical section here" with "No, I’m not going to bother. I’m afraid it’ll slow down my code."’

    To which the obvious response is, "Well if you don’t care about actually working properly, then you can make it run *much* faster by omitting the rest of the code as well.". Still, I’ve got to admire their gall in so openly espousing such reckless disregard for their jobs.

Comments are closed.