Knowing just enough about debugging IRPs to chase the problem out of the I/O stack


One of my colleagues was running a tool that wound up stuck on Flush­File­Buffers. Since this was a hang in the I/O stack, a kernel dump is more useful.

I used the !irp debugger command to look at the I/O request that got stuck:

0: kd> !irp 0xffffab0c`fced9340 1

Irp is active with 2 stacks 2 is current (= 0xffffab0cfced9458)
 No Mdl: No System Buffer: Thread ffffab0d15731080:  Irp stack trace.
     cmd  flg cl Device   File     Completion-Context
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000

                        Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_FLUSH_BUFFERS(9), N/A(0)]
            0  1 ffffab0cdf855060 ffffab0ce2c6eef0 00000000-00000000    pending
               \FileSystem\Npfs
                        Args: 00000000 00000000 00000000 00000000

I don't know what any of this means, but somebody else did.

The file system is Npfs, which is the "named pipe" file system. This means that the code is trying to flush a named pipe, and the process on the other end of the pipe is not responding.

With the help of debugger documentation I dumped the file object:

0: kd> !fileobj ffffab0ce2c6eef0

\contoso44268

Device Object: 0xffffab0cdf855060   \FileSystem\Npfs
Vpb is NULL

Flags:  0x40082
        Synchronous IO
        Named Pipe
        Handle Created

File Object is currently busy and has 1 waiters.

FsContext: 0xffffe30b60eefe70   FsContext2: 0xffffe30b23b593d3
Private Cache Map: 0x00000001
CurrentByteOffset: 0

I don't know what any of this means either, but the name of the named pipe is apparently contoso44268.

We provided this information to the owner of the tool, and they recognized it as a named pipe they use to communicate between the tool and a helper process, and the helper process in turn satisfies the pipe request by contacting a Web service.

The owner of the tool requested some diagnostic logs to figure out why the named pipe got stuck. But that's not the point of the story today. The point here is just being able to chase the stuck I/O out of kernel mode back into an application so the forward progress can be made.

Bonus reading: More on debugging the I/O system:

Comments (7)

  1. Brian_EE says:

    “… by contacting a Web service.”
    Addendum: “And that process hung when the web service failed to respond, because the (forgotten) computer under someone’s desk got unplugged as part of an office redecorating project.” https://blogs.msdn.microsoft.com/oldnewthing/20090114-00/?p=19493/

  2. DWalker07 says:

    Hmmm.. A link to MJ? Interesting.

  3. IanBoyd says:

    There’s a lot of chewing-gum code out there for how to launch a process and capture its command-line output. You create a pipe, you supply the writable end of the pipe to the child process using the `StartupInfo.hStdOutput` parameter of **CreateProcess**. The chewing gum code fails when they try wait for the child process to end by waiting for the child process handle to become signaled (e.g. MsgWaitFor…).

    The problem is that if you wait, the child process will **never** signal:

    – because the child process can never close
    – because it’s stalled writing to the pipe
    – because the pipe buffer is full
    – because you’re not reading from your readable end of the pipe
    – because you’re waiting for the child process to close

    **Deadlock**

    The proper solution is to keep reading from the pipe in a loop. When the child process terminates, it’s writeable end of the pipe will close with it. That will cause **ReadFile** to return false, and **GetLastError** will return `ERROR_BROKEN_PIPE` *(“The pipe has been ended.”)*.

    Be sure to keep reading from the pipe in a loop until **ReadFile** fails:

    //Read will return when the buffer is full, or if the pipe on the other end has been broken
    while (ReadFile(stdOutRead, buffer, bufferSize, &bytesRead, null)
    runningText = runningText + Copy(buffer, 1, bytesRead);

    //ReadFile will either tell us that the pipe has closed, or give us an error
    DWORD le = GetLastError;
    if (le != ERROR_BROKEN_PIPE) //”The pipe has been ended.”
    RaiseLastOSError(le);

    And an important point is that once you hand your child process the writable end of the pipe, you must call **CloseHandle** on your reference to the writeable end of the pipe; that way it will signal when the child process terminates.

    **tl;dr:** Stop waiting for the child process to end before you read from the pipe.

    1. Zan Lynx' says:

      And the worst bit about what you describe is that some data will fit into the pipe buffer without being read. Which means that small test outputs will appear to work but longer outputs will hang.

    2. Joshua says:

      I got really annoyed that ReadFile can fail with ERROR_BROKEN_PIPE. My helper libraries change that to a successful read of length 0 now. (Hint: EOF) Of course I have to do something a little different if I want stdout and stderr separate (waaay too much code in any case).

      1. Harry Johnston says:

        What does the helper do if it actually gets a successful read of length zero? For a named pipe, that’s legal: it just means that the process at the other end of the pipe performed a null write.

  4. These are by far the most useful articles I read on your blog! I’ve used them to compile a cheat sheet for debugging hangs and difficult crashes. Sadly, even when I eventually track down the culprits, the owners of the code rarely bother, but it’s worth knowing anyway.

Skip to main content