Why you can’t debug mscorwks when interop-debugging

Interop (aka mixed-mode) debugging allows you to debug both the managed and unmanaged portions of your app. However, even when interop-debugging, you still can’t debug the core runtime itself (most of mscorwks.dll). I briefly mentioned before that the key danger is deadlocking. I want to explain more fully in this blog entry.

 

First, what does it mean to say you can’t debug mscorwks? A thread will never be stopped in the middle of the core portion of mscorwks. For example, any breakpoints in this region will be skipped automatically without notifying the user. A debugger could still show a callstack through mscorwks and inspect data in mscorwks (if it has the private symbols).

 

Recall that managed debugging requires a helper thread running in the debuggee. This thread services requests from the debugger for almost all managed debugging operations, ranging from callstacks to placing managed breakpoints. This means that in order for managed debugging to work, the helper thread must be able to run free and unblocked.

 

What if you did stop inside the runtime?

What if we did allow a debugger to stop at a native breakpoint inside the runtime, but just refused to execute the helper thread and thus temporarily disabled managed-debugging functionality? In theory, this would work. In practice, it would be a bad user experience because:

- You couldn’t get a managed callstack since the helper thread can’t run. Thus you wouldn’t be able to see what managed function called into the runtime.

- You’d be stranded at the native breakpoint. You could step within the native code, but without the helper running, you could never step back across the native → managed boundary.

 

In other words, you’d hit the native breakpoint in mscorwks, and then wouldn’t be able to see any managed frames and could only press F5 to continue.

 

So whenever the debuggee is stopped at any native breakpoint, we need to keep the helper thread running. This means:

1) The helper thread must not block.
2) The data structures the helper thread uses must be in a consistent state.

 

Background data:

ICorDebug can skip any native breakpoints if it needs to. This is because:

- When interop-debugging, ICorDebug filters all native debug events. Thus a debugger won’t ever see native debug events destined for the CLR (such as a breakpoint exception that’s backing a managed breakpoint).

- In v2.0, an interop-debugger must notify ICorDebug of any native breakpoints it places (via ICorDebugProcess2::SetUnmanagedBreakpoint).

 

 

Deadlock dangers

So what happens if you place a native breakpoint on a function that the helper thread hits, such as kernel32!WaitForSingleObject?

1) The helper thread would hit that breakpoint and then stop and be unavailable to service debugger requests. The helper thread would be blocked waiting for the debugger to continue the process and get it past the breakpoint.

2) But the debugger would be blocked waiting for the helper thread to service some request (such as a callstack).

So it’s a deadlock. To avoid this, ICorDebug will just skip all breakpoints on the helper thread.

 

But what if you place a native breakpoint somewhere else in mscorwks.dll in a region that holds a key lock that the helper thread tries to take? Naively, the helper thread would block and hit the same deadlock as if it hit a breakpoint.
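Both deadlocks above have the same shape: a cycle of "who is waiting on whom." Here is a toy model of that in Python. It is only a sketch to make the reasoning concrete, not how ICorDebug detects anything; the participant names and the `find_wait_cycle` helper are made up for illustration.

```python
# Toy wait-for graph: an entry A -> B means "A is blocked until B makes progress".
# All names here are illustrative; nothing in this sketch is a CLR or ICorDebug API.

def find_wait_cycle(waits_for, start):
    """Follow wait-for edges from `start`. Return the cycle as a list if we
    revisit a participant, or None if the chain ends at someone who is free."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)  # None means "not waiting on anyone"
    if node is None:
        return None
    return path[path.index(node):] + [node]

# Case 1: a native breakpoint is hit on the helper thread and reported as a stop.
breakpoint_on_helper = {
    "helper thread": "debugger",   # stopped until the debugger continues the process
    "debugger": "helper thread",   # waiting for the helper to service a callstack request
}

# Case 2: another thread is stopped in mscorwks while holding a lock the helper needs.
breakpoint_under_lock = {
    "worker thread": "debugger",       # stopped at the breakpoint, still holding the lock
    "helper thread": "worker thread",  # blocked trying to take that lock
    "debugger": "helper thread",       # waiting for the helper to service a request
}

# With the breakpoint skipped, the helper waits on nobody and the chain ends.
breakpoint_skipped = {"debugger": "helper thread"}

cycle_1 = find_wait_cycle(breakpoint_on_helper, "debugger")
cycle_2 = find_wait_cycle(breakpoint_under_lock, "debugger")
no_cycle = find_wait_cycle(breakpoint_skipped, "debugger")
```

In both breakpoint cases the chain of waits leads back to the debugger, so nobody can make progress. Once the breakpoint is skipped, the chain ends at a helper thread that is free to run.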

 

This means in order to keep the helper-thread running free, no thread can be blocked (e.g., stopped at a native breakpoint in mscorwks) while holding any resource that may directly or indirectly block the helper thread.  Also, no threads can be blocked in the middle of modifying a CLR internal data structure which the helper thread may need to do its work.

ICorDebug deals with this by pumping all threads out of any such unsafe region before trying to communicate with the helper thread. (This pumping is called “synchronizing” the debuggee.)

This means that native breakpoints inside that region will get automatically continued without ever notifying the user. That region roughly corresponds to the CLR internal implementation.
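To make the pumping idea concrete, here is a rough Python sketch. It is a toy model under loud assumptions, not the real synchronization algorithm: `UNSAFE_REGION`, `ToyThread`, and `synchronize` are hypothetical names, the address range is invented, and each thread is reduced to a list of addresses it will execute.

```python
from dataclasses import dataclass

# Hypothetical address range standing in for the unsafe part of mscorwks.dll.
UNSAFE_REGION = range(0x1000, 0x2000)

@dataclass
class ToyThread:
    name: str
    trace: list        # addresses the thread will execute, in order (made up)
    position: int = 0  # index of the address the thread is currently at

    @property
    def address(self):
        return self.trace[self.position]

def synchronize(threads, native_breakpoints):
    """Pump every thread forward until it is outside UNSAFE_REGION.

    Returns the (thread name, address) pairs for breakpoints that were passed
    over without a stop being reported. Assumes every trace eventually leaves
    the region; a trace that ends inside it would raise IndexError.
    """
    skipped = []
    for thread in threads:
        while thread.address in UNSAFE_REGION:
            if thread.address in native_breakpoints:
                skipped.append((thread.name, thread.address))
            thread.position += 1  # let the thread run one more step
    return skipped

worker = ToyThread("worker", [0x1800, 0x1804, 0x3000])
other = ToyThread("other", [0x4000])
native_breakpoints = {0x1804, 0x3000}

skipped = synchronize([worker, other], native_breakpoints)
```

In this model the breakpoint at `0x1804` sits inside the region, so the worker runs straight past it and ends up at `0x3000`, outside the region, where a breakpoint could be reported normally.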

 

It turns out there are scattered areas of mscorwks outside that region. Thus an interop-debugger could debug some portions of mscorwks.dll. (However, Visual Studio explicitly disallows this to avoid potential confusion.)

 

Random closing notes:

We believe the limitations here have a small impact on the overall .NET audience:

- We believe only a small set of people are interested in actually debugging the runtime itself (mainly CLR developers ;) ), and so having interop-debugging support debugging the runtime was definitely not a mainline scenario. You’d need private CLR symbols to have a meaningful experience here anyways.

- You can always debug the runtime under a native debugger. This is what most CLR devs do.