A high waiter count on a free critical section may indicate a lock convoy


If you're debugging a performance problem in your application, you may run across a critical section in a very strange state: A lot of threads are waiting for it, but nobody owns it!

0:000> !critsec 0x10009C70
CritSec at 0x10009C70
LockCount          37
RecursionCount     0
OwningThread       0

This state means that the previous owner of the critical section has just exited it and signalled a waiting thread to take ownership, but that thread hasn't yet had a chance to run. This is normally a transient condition, but if you see it a lot, then you are very likely the victim of a lock convoy.

Others have written about lock convoys, so I'm just going to refer you to them to get the details.

Comments (16)
  1. Adrian says:

    Great tip.  Thanks!  As the processors go multi-core, we’re all going to have to get a lot more multi-threading savvy.

  2. Ricardo Montalban says:

    I never have this problem in C#

  3. Anony Moose says:

    Yes, of course, because multi-threading synchronisation is definitely a language (and maybe runtime) issue and never under any circumstances has anything to do with the application design.

    (I hope everyone’s sarcasm detectors are working today, though I’m kinda hoping that mine is broken and I’m just stating the blindingly obvious.)

    In the real world, it’s impossible to build a locking primitive that can never be misused,

    even if C# has some pretty neat lock classes.

  4. Cooney says:

    After reading Larry’s article, I came up with this model that doesn’t use completion ports (because I’m a unix-head):

    the threadpool is divided into two groups, working and idle (stolen shamelessly)

    when adding a job, lock the pool and allocate an idle thread if possible. Otherwise add it to the queue.

    When a job completes, lock the pool, check for another job in the queue. If available, allocate jobs to idle threads and kick them, else unlock and sleep.

    This should eliminate boxcarring, as most threads are sleeping instead of blocking, and the pool is only locked when adding a job to the queue or ending a job/setting up jobs. It’s a bit more complicated design, so more potential for screwups, but it looks more efficient.

  5. Ben Hutchings says:

    Cooney: On POSIX systems we can avoid some of these cases by using condition variables and pthread_cond_signal. (Windows appears to have a similar facility in PulseEvent, but unfortunately this sometimes fails to wake any waiters.)

  6. Cooney, that’s actually pretty close to the model that NT used before Cutler designed completion ports.  The big advantage completion ports have is that you can trigger them on I/O completion as well as work.  In addition, you still have the boxcar potential, it’s just less likely (you still have to lock and unlock the queue on the active threads, and if you’re queueing a lot of items to the queue you can still get lock convoys).

  7. Cooney says:

    True, but the boxcar problem should be less of a problem, since any threads that queue up will likely acquire the lock, see that they already have work (or that no work is available) and immediately drop the lock. It still trashes the L1 cache, but I would expect (assuming a fifo lock scheme) that it would flush rapidly, since the first one to get the lock doles out all the jobs it can.

    I’ll have to look at pthread_cond_notify and see how that works for this.

    [I think you missed the fundamental point of the lock convoy problem. It’s not how long you hold the lock, it’s how often you ask for it. See Sue Loh’s article. -Raymond]
  8. Cooney says:

    That’s what I was trying to address by making the default state sleep (or pthread_cond_wait on your own cond variable). I’ve still got a situation where the mutex gets hit a lot, but I might be able to eliminate a fair amount of that with a check of the job queue size against 0, or simply sleep if the mutex is busy.

  9. Tal says:

    Another scenario where owner=0 arises is when thread A calls lock, but thread B calls release. We observed this scenario in an old ReadWriteLock implementation.

    (See http://groups.google.com/groups/search?hl=en&q=RtlpNotOwnerCriticalSection)

  10. Ema says:

    Maybe use semaphore primitive to sync threads?

    Just 1 consumer thread should be able to execute push, after the pop has been executed from the producer thread?

  11. Ema says:

    Ehhmmm don’t get confused by my english errors.

    I meant:

    1) Producer(s) execute push after putting element in queue (use CS or mutexes for the list).

    2) Consumer(s) wait trying to pop on the semaphore (use CS or mutexes for the list).

  12. Joe Duffy says:

    Fwiw, Server 2003 SP1 and Vista eliminate fairness in the locks in an attempt to reduce convoys.  I jotted down a tiny essay at: http://www.bluebytesoftware.com/blog/PermaLink,guid,e40c2675-43a3-410f-8f85-616ef7b031aa.aspx.

    –joe

  13. Old Fart says:

    Grumble, this isn’t a "lock convoy" or "boxcar problem" (the latter from Larry Osterman’s blog), this is the thundering herd problem, which has been around for at least two decades (it predates all of Win32 by some years).  I wish people wouldn’t invent new, incompatible names for this… if you search for "thundering herd" you’ll find plentiful discussion of the problem and various solutions.

  14. GregM says:

    "The thundering herd problem is an issue that occurs when a number of processes waiting for an event scramble en masse when the event occurs." – http://en.wikipedia.org/wiki/Thundering_herd_problem

    Larry Osterman’s post describes something that sounds a lot like thundering herd, but Sue Loh describes a very different problem.  It may just be a difference in the way that Larry describes it, or Larry and Sue may be describing totally different issues.

  15. Gabe says:

    A thundering herd would be when a number of threads are waiting on an event, and when that events gets signalled they all get released and fight over the same resources.

    In this case a number of threads are waiting on a mutex, and only a single thread gets released at a time.

    What’s happening in Larry’s post is the thundering herd results in a lock convoy.

Comments are closed.
