When there is a long line of people waiting for a shared resource, you want to investigate the person who is hogging the resource, not the people waiting in line for it


If you see a long line of people waiting for a phone booth (note: this analogy assumes you remember how phone booths work), and you want to understand the reason for the long line, do you

  • Go to a person waiting in line and begin your investigation there?
  • Go to the phone booth (and the person inside) and begin your investigation there?

If there is a long line of people waiting for a single resource, a resource that there is not normally a long line for, you would probably look at the person who is using the resource to see if, for example, they are a chatterbox who will be on the phone for an hour, or if the phone is being repaired or is otherwise not working properly.

Similarly, if you find that in your 20-thread program, 17 of them are waiting for a single critical section, then you probably want to investigate the thread that owns the critical section to see whether (and why) it isn't releasing it.
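The same idea can be sketched outside a debugger in a few lines of Python. This is a hypothetical illustration and not how Windows critical sections work internally. Python's threading.Lock does not expose its owner, so the made-up OwnedLock wrapper records the holder's thread ID and counts the waiters. The made-up describe_owner helper then uses sys._current_frames() to dump the holder's stack, which amounts to walking over to the phone booth. Note that sys._current_frames() is a CPython-specific function, as its leading underscore suggests. The chatterbox, caller, and 17-thread setup are assumptions chosen to mirror the example above.

```python
import sys
import threading
import time
import traceback


class OwnedLock:
    """Hypothetical lock wrapper that remembers its holder and counts waiters.

    threading.Lock does not expose its owner, so this sketch tracks it by hand,
    roughly the information a debugger shows for a critical section.
    """

    def __init__(self):
        self._lock = threading.Lock()   # the shared resource ("the phone booth")
        self._meta = threading.Lock()   # guards the bookkeeping fields below
        self.owner = None               # thread ident of the current holder
        self.waiters = 0                # threads blocked inside acquire()

    def acquire(self):
        with self._meta:
            self.waiters += 1
        self._lock.acquire()            # may block behind the current owner
        with self._meta:
            self.waiters -= 1
            self.owner = threading.get_ident()

    def release(self):
        with self._meta:
            self.owner = None
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


def describe_owner(lock):
    """Return the current stack of the thread holding `lock` (the person in the booth)."""
    owner = lock.owner
    if owner is None:
        return "lock is free"
    frame = sys._current_frames().get(owner)
    if frame is None:
        return f"owner {owner} has no Python frame"
    return "".join(traceback.format_stack(frame))


def chatterbox(lock, hang_up):
    with lock:
        hang_up.wait()                  # hogs the lock until told to hang up


def caller(lock):
    with lock:
        pass                            # a quick call; slow only because of the queue


lock = OwnedLock()
hang_up = threading.Event()
booth = threading.Thread(target=chatterbox, args=(lock, hang_up))
booth.start()
while lock.owner is None:               # wait until the chatterbox is in the booth
    time.sleep(0.01)

waiting_line = [threading.Thread(target=caller, args=(lock,)) for _ in range(17)]
for t in waiting_line:
    t.start()
while lock.waiters < 17:                # wait until all 17 callers are queued
    time.sleep(0.01)

# Investigate the owner, not the waiters.
waiting = lock.waiters
owner_is_booth = lock.owner == booth.ident
report = describe_owner(lock)
print(f"{waiting} threads waiting; owner's stack:\n{report}")

hang_up.set()                           # let the chatterbox hang up so everyone can finish
for t in [booth, *waiting_line]:
    t.join()
```

Inspecting any one of the 17 callers would only show a thread parked in acquire(). The owner's stack, which includes the chatterbox frame blocked at hang_up.wait(), is where the explanation lives.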

When testing a program, I encountered a hang that occurred after doing X. There are a few threads stuck in LoadLibrary, and about 40 threads stuck here:

ntdll!KiFastSystemCallRet
ntdll!ZwWaitForSingleObject+0xc
ntdll!RtlpWaitForCriticalSection+0x132
ntdll!RtlEnterCriticalSection+0x46
ntdll!_LdrpInitialize+0xf0
ntdll!KiUserApcDispatcher+0x7

Here is one of the threads that is stuck in LoadLibrary [stack trace deleted]. You seem to be one of the people who work on the component that is trying to load the library. Can you investigate why the program is stuck?

This person picked one of the people waiting in line and decided that they were the ones responsible for the problem. But of course, that person waiting in line is just another victim of the person at the head of the line who is hogging the critical section. In this case, the critical section is the infamous loader lock. That it's the loader lock is obvious from the symptoms: What critical section does every thread require when it starts up? What critical section does LoadLibrary require?

You can use the !critsec debugger command to identify the current owner of the loader lock, and then start studying that thread to see what the hold-up is.

Note that I'm not saying that the thread that owns the resource is necessarily the culprit. The problem could be in the resource itself, or it could be in the pattern of usage associated with that resource. But starting your investigation with the owner of the resource is a good bet, because most of the time, the reason for the long wait queue is that the current owner of the resource isn't releasing it.

Comments (16)
  1. Drak says:

    (You misspelled symptoms, Rayomond, in “That it’s the loader lock is obvious from the symptons: “)

    I probably don’t know enough abou tthe subject matter to make any appropriate comments on it, but maybe the person who mailed you didn’t either and thought the thread he sent you might give an indication of why it was waiting for the resource, not giving it a second thought that the reason would (almost?) always be ‘because someone else is using it).

    [Fixed, thanks. -Raymond]
  2. benjamin says:

    You’ll often see this same phenomenon at the self-checkout at grocery stores.

    The chief culprit, I find, are the people that want to purchase vegetables. Instead of printing an easily scannable bar code, the patron needs to use the fallback method of paging through a large list of on-screen items to find the vegetable they’re trying to buy.

    I can never decide if that’s an example of a leaky abstraction (since the machine deals only in bar codes and weights, not ‘carrots’ or ‘oranges’) or a case of backward compatibility hampering advancement.

    Can you tell I spent a lot of time in self-checkout lines?

  3. Brian says:

    Can I execute the !critsec debugger command from within Visual Studio, or can I only do it from the command line debugger?

  4. Clovis says:

    "then you probably want to investigate the thread that owns the critical section" – no, I’d immediately write an ill thought out, bile-ridden blog entry explaining how broken Vista was and how this just wouldn’t happen if we used cloud computing for everything ever. And I bet it’s just as broken in Windows 7, whatever it is.

  5. Gabe says:

    benjamin: Is that because you have lots of spare change to get rid of?

  6. Owen S says:

    @benjamin: This is why the scales should be near the veg, rather than at the self checkout line. It saves time at the normal checkouts too, and scales can be hung over the veg in order to increase parallelism.

  7. Mike Dimmick says:

    @Owen S: there usually are such scales, but the customer doesn’t think to use them at the point of choosing the veg, only weighing at the end.

    The industry’s response is to define a more compact barcode symbology – now called DataBar, previously known as Reduced Space Symbology (RSS) – so that each piece of fruit or veg can have a small sticky label attached that indicates its product code. Unfortunately retailers don’t have to have implemented it for another five years.

    The UPC/EAN barcode format is 35 years old, so it’s stood up pretty well. It still works extremely well, even better when the barcodes are printed with the right dimensions, good contrast, and not likely to be damaged. My local supermarket at home has some pasta in shrink-wrap packaging where the seal of the package commonly obscures the barcode; at work, the sandwiches are barcoded but the label material is wrong for the laser printer they’re using and the toner doesn’t stick properly, so the bars fall off – they’re also too small so almost any damage leads to a no-decode.

    For more on DataBar, see http://gs1.org/barcodes/databar

  8. JamesW says:

    @Clovis

    Aiming for the stars?

    ☆☆☆☆☆

  9. Jared says:

    This is obvious…

    "You seem to be one of the people who work on the component that is trying to load the library. Can you investigate why the program is stuck?"

    It means that the guy wants someone else (i.e. someone else waiting in line) to do the dirty digging.

  10. Cory Foy says:

    @brian – No, it’s WinDBG only:

    http://msdn.microsoft.com/en-us/library/cc267146.aspx

    The tricky thing is that you need to have an idea of what you are looking for. Like most any command in WinDBG, if you poke around, you’ll find something to say aha to, but it probably isn’t your problem.

  11. Worf says:

    @Mike Dimmick: @benjamin:

    Actually, instead of pawing through the list of fruit and vegetables, on the databar sticker (or the sticker period, if you have fruit without databar), there’s a 4-digit number. Surprisingly, it’s the PLU code for that fruit or vegetable!

    (And the bottom barcode reader is also a scale.)

    Save yourself the lookup time and just use those 4 digits and you won’t go wrong. Unless you mix the fruits into one bag…

  12. Scott says:

    You guys need to drop your current jobs and start working on a piece of software that identifies fruits and vegetables. You’ve established that supermarkets would pay you millions for it and I can see you have the passion to succeed.

  13. porter says:

    Interesting, humans have been eating fruit for 4 million years, trading them for money for 4 thousand years and now with all our technology, it’s somehow a problem?

  14. Drak says:

    Gah, and I misspelled your name, and the word ‘the’ :(

  15. Chris J says:

    @Drak – it’s an immutable law of the internet: Anyone correcting another poster’s spelling or grammar error will inevitably make a spelling or grammar error within the correction.

  16. Alexandre Grigoriev says:

    @Drak, Chris J:

    It’s called Muphry’s Law
