What does this crash in TppRaiseHandleStatus mean?


A customer found that their program was crashing with this stack:

ntdll!TppRaiseHandleStatus
ntdll!TppSetupNextWait
ntdll!TppWaitCompletion
ntdll!TppWorkerThread
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

"Since none of our code is on the stack, it is unclear what may have gone wrong here."

A naming convention followed by many teams is to come up with a short prefix for a component and put that prefix on every function in that component. For example, all of the I/O manager functions begin with Io, then a capital letter. Furthermore, functions intended to be called from other components use the unadorned prefix, and functions that are internal to the component add a p (for private) to the prefix. For example, all of the functions internal to the I/O manager begin with Iop, then a capital letter.

(An older convention is to use the lowercase prefix to indicate internal functions. Under the older convention, the internal functions in the I/O manager would begin with io followed by a capital letter.)

Given that information about naming conventions, you can guess that the functions prefixed Tpp are functions that are internal to a component whose prefix is Tp.

And your guess would be correct. This code is from the thread pool.

One of the purposes of the thread pool is to consolidate waits. If one object wants to do something when handle A is signaled, and another object wants to do something when handle B is signaled, one way to do this is to have each object create a thread, and have that thread wait on the corresponding handle. Unfortunately, that results in two threads, and when you have a hundred objects, this results in a hundred threads, and that doesn't scale well. Better would be to put one thread in charge of all the waiting, so that it can use Wait­For­Multiple­Objects. That way, you need only ⌈n ∕ MAXIMUM_WAIT_OBJECTS⌉ threads to wait on n handles.

Okay, so let's see what this stack is telling us. I am not familiar with the thread pool's internals, so this is all educated guesswork. It's educated by the fact that code is not intentionally written to be impossible to understand. For example, if a function is called Tpp­Setup­Next­Wait, it probably sets up the next wait.

The thread pool started its own worker thread with Tpp­Worker­Thread; that makes sense. And then a wait completed, which was handled by Tpp­Wait­Completion. After handling that wait, the thread wants to go back to waiting until the next handle becomes signaled, which is done by the function Tpp­Setup­Next­Wait. But something went wrong, and it needs to raise a status, which is done by the Tpp­Raise­Handle­Status function. And that's where we crashed.

What could go wrong in Tpp­Setup­Next­Wait? The most likely reason in my opinion is that the list of handles it is being asked to wait on is no longer valid, probably because an application closed a handle while it was still registered with the thread pool, so when the thread pool tried to wait on it, it got an ERROR_INVALID_HANDLE error.

The documentation for the Set­Threadpool­Wait function says,

If this handle is closed while the wait is still pending, the function's behavior is undefined.

The customer wrote back that they have another crashing stack that points at their application code, so they have something else to work with to help pin down where the handle went bad.

Comments (6)
  1. Ben says:

    Sorry for the off topic post, but //Build2016 registrations start soon (01/19 IIRC), will you be there Raymond? (better question is there any place that you have on your blog that lists what conferences you’re going to?). On my bucket list is to meet my programming idol and get him to sign his book! I’m willing to go to Build JUST for that!

    1. I have not been invited to //Build. If that changes, I’ll post it.

  2. EduardoS says:

    Initially I tough Tpp were the component prefix…

  3. Joshua says:

    Does it ever disturb you that Microsoft has this knack of appearing blissfully aware of bugs that simple Google searches can find?

    Of course, trying to weed through stuff like this might be part of the cause.

    Yet I wonder what a publicly visible and community verified bug database would turn up.

  4. poizan42 says:

    The “code is not intentionally written to be impossible to understand.” link seems to be broken. Here is a working link: https://blogs.msdn.microsoft.com/oldnewthing/20140131-00/?p=1913

Comments are closed.

Skip to main content