The world’s slowest RET instruction


Occasionally, somebody will ask:

I'm debugging a hang, and I see that many threads are stuck at a RET instruction. When I try to trace one instruction from that thread, the trace breakpoint never fires. It's as if the RET instruction itself is wedged! I've found the world's slowest RET instruction.

(A common variation on this theme is that the thread in question is consuming 100% CPU... on a RET instruction?)

What you're seeing at that RET instruction is a thread that is executing in kernel mode. The kernel parked the user-mode side of the thread at a RET instruction, poised to execute once the kernel-mode side has returned. Which it hasn't yet.

In order to see what is really going on with that thread, you have to drop into the kernel debugger. You might be able to make some educated guesses (also known as "invoke psychic powers") based on what you can still see on the user-mode side. For example, the RET could be returning back to a WaitForSingleObject call, which tells you that whatever this thread is waiting for hasn't happened yet.
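One way to see why the trace breakpoint never fires: a thread blocked in an OS wait isn't executing any user-mode instructions at all, so there is nothing for a single-step to catch. The following is a minimal, portable sketch in Python, not the Windows kernel-debugging scenario itself. It assumes Python 3.7+ (for `time.thread_time`), and `measure_blocked_wait` is a hypothetical helper made up for this illustration. It blocks a worker thread on an event that nobody signals (with a timeout, so it does eventually return) and compares the wall-clock time that passes with the CPU time the thread consumes.

```python
import threading
import time
from typing import Tuple


def measure_blocked_wait(seconds: float) -> Tuple[float, float]:
    """Hypothetical helper: block a worker thread on an event nobody sets,
    then return (wall-clock seconds, CPU seconds) spent during the wait."""
    result = {}

    def worker() -> None:
        event = threading.Event()        # never set by anyone
        wall_start = time.monotonic()
        cpu_start = time.thread_time()   # CPU time (user + system) of this thread only
        event.wait(timeout=seconds)      # blocks in the OS until the timeout expires
        result["wall"] = time.monotonic() - wall_start
        result["cpu"] = time.thread_time() - cpu_start

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["wall"], result["cpu"]


if __name__ == "__main__":
    wall, cpu = measure_blocked_wait(0.5)
    print(f"waited {wall:.2f}s of wall-clock time, used {cpu:.4f}s of CPU")
```

Wall-clock time advances by about half a second while the thread's CPU time barely moves. Because `time.thread_time` counts both user and system time, the same measurement would climb steadily in the "100% CPU on a RET" variation: the kernel side of the thread is busy even though its user-mode side hasn't moved.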

[While Raymond was on vacation, the autopilot stopped working due to a power outage. This entry has been backdated.]

Comments (4)
  1. schwiet says:

    And to think of all the time I spent optimizing away the ‘ret’s by making my functions extra long and convoluted… doh!

  2. Neil says:

    Great. Now all I need to know is how to kernel debug a machine without a COM port which only hangs when display power saving is active…

  3. 1394 says:

    …… AAARRRRGGGHHHHH.

    I wanted to submit this as

    Name: 1394

    Comments:

    i.e., no COM. But your blog wouldn’t allow it. Comments are REQUIRED. Spoils the pun.

  4. BryanK says:

    Ran into this today: SQL Server 2000 SP4 logged error 17883 (UMS context <whatever> appears to be non-yielding, on scheduler <whatever>) while the server was having a bunch of other issues. (TCP connections were timing out, local file access was either really slow or failing altogether, etc., etc.; I now suspect the hardware, since this server is several years old.) When I first saw the error, I suspected SQL, but see below.

    When SQL 2000 logs this error, it also writes out a mini-dump file. I loaded this file into windbg, and it appeared that the OS thread that was running the SQL fiber that didn't yield was actually sitting at a "ret" instruction inside ZwReadFile in ntdll. I eventually remembered this posting and figured out that SQL probably wasn't the cause, just another symptom of whatever the problem was. That leaves hardware as the most likely remaining culprit, I think.

    The problem did clear up a couple minutes later, not that that’s any relief if it happens again… but oh well.

