Simple example of an API design flaw.

08/01/2005

Here’s a simple example of an API design flaw in ICorDebug.

(ICorDebug is the API that Visual Studio, MDbg, and other debuggers use to debug managed code.)
Here’s the background knowledge:
1) When a thread first executes actual IL (whether jitted or ngenned) code, ICorDebug issues an ICorDebugManagedCallback::CreateThread callback to the debugger. This callback provides the debugger with an ICorDebugThread object which the debugger can then use to inspect the new thread. In other words, when MDbg gets the CreateThread callback, it can walk a managed-only callstack and will see exactly 1 frame.
2) When a thread throws a managed exception, ICorDebug issues an ICorDebugManagedCallback::Exception callback. This callback includes a non-null ICorDebugThread object to tell the debugger which thread the exception occurred on. ICorDebug guarantees that the thread object a) is explicitly non-null and b) was provided by a previous CreateThread callback.

Some of you are probably already chuckling at how we painted ourselves into a corner here…
So now what happens when an exception occurs on a managed thread before any IL is run? ICorDebug would have to send the debugger an Exception callback for a thread that never got a CreateThread callback, and it can’t do that without violating the guarantees in #2 above.
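To make the corner concrete, here is a minimal toy model of the two guarantees, written from the debugger's side. It is only a sketch: `ToyDebugger`, `ContractViolation`, and `run_thread` are hypothetical names invented for illustration, and none of this mirrors the real ICorDebug signatures.

```python
class ContractViolation(Exception):
    """Raised when a callback breaks one of the guarantees in #2."""


class ToyDebugger:
    """Hypothetical debugger-side model; not the real ICorDebug interfaces."""

    def __init__(self):
        self.known_threads = set()

    def on_create_thread(self, thread_id):
        # Point #1: fired when the thread first executes IL.
        self.known_threads.add(thread_id)

    def on_exception(self, thread_id):
        # Point #2a: the thread must be non-null.
        if thread_id is None:
            raise ContractViolation("Exception callback with a null thread")
        # Point #2b: the thread must come from an earlier CreateThread.
        if thread_id not in self.known_threads:
            raise ContractViolation(
                f"Exception on unannounced thread {thread_id}"
            )
        return f"exception on thread {thread_id}"


def run_thread(debugger, thread_id, fails_before_il):
    """Simulate the callbacks for one thread that eventually throws."""
    if not fails_before_il:
        # Normal case: IL runs first, so CreateThread precedes the exception.
        debugger.on_create_thread(thread_id)
    # If the failure happens before any IL runs (say, a type load failure),
    # there has been no CreateThread yet when the exception is reported.
    return debugger.on_exception(thread_id)
```

With this model, `run_thread(ToyDebugger(), 1, fails_before_il=False)` succeeds, while passing `fails_before_il=True` raises `ContractViolation`: there is no thread object that satisfies both halves of #2.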

Some may ask “but how could that happen?” The real question is “how could that possibly not happen?” After all, there’s a lot that happens between when a native thread is created and when that native thread is actually executing jitted IL code, including:
- loading types (could throw a TypeLoadException)
- jitting the code (could fail, perhaps from running out of memory)
- reading a corrupt PE image (may throw some sort of InvalidProgram exception)
- security checks (may throw some security exception)

This turns out to be one of our most common Watson buckets.

To make matters worse, there’s no way to fix it without changing the interface. I think the ideal solution would be to provide some sort of PreCreateThread notification that fires when the thread is first created but well before it runs any managed code. This effectively weakens the guarantees in #2 and thus allows us to fulfill them.
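As a rough sketch of how a PreCreateThread-style notification would change the picture, the toy model below lets the Exception callback rely on the earlier notification instead of on CreateThread. Everything here is an assumption for illustration: `ToyDebugger`, `start_thread`, and the event tuples are hypothetical, and no such callback exists in the interface described above.

```python
class ToyDebugger:
    """Hypothetical model of the debugger's view; not the real ICorDebug API."""

    def __init__(self):
        self.announced = set()  # threads seen via the proposed PreCreateThread
        self.events = []        # callbacks in the order they arrived

    def on_pre_create_thread(self, tid):
        # Proposed: fires at thread creation, well before any managed code.
        self.announced.add(tid)
        self.events.append(("PreCreateThread", tid))

    def on_create_thread(self, tid):
        # Unchanged meaning: the thread is now executing IL.
        if tid not in self.announced:
            raise RuntimeError(f"CreateThread without PreCreateThread: {tid}")
        self.events.append(("CreateThread", tid))

    def on_exception(self, tid, what):
        # Weakened guarantee: only a prior PreCreateThread is required.
        if tid not in self.announced:
            raise RuntimeError(f"Exception on unannounced thread: {tid}")
        self.events.append(("Exception", tid, what))


def start_thread(debugger, tid, startup_error=None):
    """Simulate thread startup; startup_error models a failure before IL."""
    debugger.on_pre_create_thread(tid)
    if startup_error is not None:
        # The thread object already exists, so the exception can be reported.
        debugger.on_exception(tid, startup_error)
        return
    debugger.on_create_thread(tid)
```

In this model a thread that fails during type loading still produces a well-formed Exception event, because its thread object was handed out before anything could go wrong.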
Other non-breaking solutions include:
1) Fire a managed log message to at least notify the user. LogMessages require a thread object too, but we could actually just pick a random one. (This is the solution we ended up using for V2.0)
2) Fire an MDA (Managed Debugging Assistant; MDAs are basically log messages with rich data). MDAs explicitly don’t require an ICorDebugThread object to avoid this same sort of problem. This is a fancier version of the LogMessage solution.
3) Ignore the exception event completely. However, this means that the thread will disappear underneath the user, which could be very confusing.
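The LogMessage fallback in option 1 can be sketched as a small decision function. This is a toy illustration under stated assumptions, not the V2.0 implementation: `report_early_failure` and the tuple shapes are hypothetical, "pick a random one" is modeled as picking the lowest known thread id so the result is deterministic, and dropping the event when no thread is known at all is an assumption that simply falls back to option 3.

```python
def report_early_failure(known_threads, failing_thread, description):
    """Decide what to send the debugger for an exception on failing_thread.

    known_threads: ids of threads that already got a CreateThread callback.
    Returns a (callback_name, thread_id, payload) tuple, or None if dropped.
    """
    if failing_thread in known_threads:
        # Normal case: both guarantees in #2 hold, so send a real Exception.
        return ("Exception", failing_thread, description)
    if not known_threads:
        # Assumption: with no thread to attach to, the event is dropped.
        return None
    # Any already-announced thread can carry the message; min() stands in
    # for "pick a random one" so that the example is reproducible.
    carrier = min(known_threads)
    message = f"thread {failing_thread} failed before running IL: {description}"
    return ("LogMessage", carrier, message)
```

For example, `report_early_failure({1, 3}, 7, "TypeLoadException")` returns a `LogMessage` attached to thread 1 whose text names thread 7 as the one that actually failed.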
