Why you can’t debug mscorwks when interop-debugging

Interop (aka mixed-mode) debugging allows you to debug both the managed and unmanaged portions of your app. However, even when interop-debugging, you still can’t debug the core runtime itself (most of mscorwks.dll). I briefly mentioned before that the key danger is deadlocking. I want to explain more fully in this blog entry.


First, what does it mean to say you can’t debug mscorwks? A thread will never be stopped in the middle of the core portion of mscorwks. For example, any breakpoints in this region will be skipped automatically without notifying the user. A debugger could still show a callstack through mscorwks and inspect data in mscorwks (if it has the private symbols).


Recall that managed debugging requires a helper thread running in the debuggee. This thread services requests from the debugger for almost all managed debugging operations, ranging from callstacks to placing managed breakpoints. This means that in order for managed debugging to work, the helper thread must be able to run free and unblocked.


What if you did stop inside the runtime?

What if we did allow a debugger to stop at a native breakpoint inside the runtime, but just refused to execute the helper thread and thus temporarily disabled managed-debugging functionality? In theory, this would work. In practice, it would be a bad user experience because:

         – You couldn’t get a managed callstack since the helper thread can’t run. Thus you wouldn’t be able to see what managed function called into the runtime.

         – You’d be stranded at the native breakpoint. You could step within the native code, but without the helper running, you could never step back across the native → managed boundary.


In other words, you’d hit the native breakpoint in mscorwks, and then wouldn’t be able to see any managed frames and could only press F5 to continue.


So whenever the debuggee is stopped at any native breakpoint, we need to keep the helper thread running. This means:

1) The helper thread must not block.
2) The data structures the helper thread uses must be in a consistent state.


Background data:

ICorDebug can skip any native breakpoints if it needs to. This is because:

         – When interop-debugging, ICorDebug filters all native debug events. Thus a debugger won’t ever see native debug events destined for the CLR (such as a breakpoint exception that’s backing a managed breakpoint).

         – In v2.0, an interop-debugger must notify ICorDebug of any native breakpoints it places (via ICorDebugProcess2::SetUnmanagedBreakpoint).



Deadlock dangers

So what happens if you place a native breakpoint on a function that the helper thread hits, such as kernel32!WaitForSingleObject?

1)      The helper thread would hit that breakpoint and then stop and be unavailable to service debugger requests. The helper thread would be blocked waiting for the debugger to continue the process and get it past the breakpoint.

2)      But the debugger would be blocked waiting for the helper thread to service some request (such as a callstack).

So it’s a deadlock. To avoid this, ICorDebug will just skip all breakpoints on the helper thread.


But what if you place a native breakpoint somewhere else in mscorwks.dll in a region that holds a key lock that the helper thread tries to take? Naively, the helper thread would block and hit the same deadlock as if it hit a breakpoint.


This means in order to keep the helper-thread running free, no thread can be blocked (e.g., stopped at a native breakpoint in mscorwks) while holding any resource that may directly or indirectly block the helper thread.  Also, no threads can be blocked in the middle of modifying a CLR internal data structure which the helper thread may need to do its work.
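Both scenarios above can be pictured as a cycle in a "who waits for whom" graph. The following is only a toy Python sketch to illustrate that reasoning: the thread names and the find_wait_cycle helper are made up for this example and have nothing to do with how ICorDebug is actually implemented.

```python
def find_wait_cycle(waits_for, start):
    """Follow "X waits for Y" edges from start; return the cycle, or None."""
    path = []
    node = start
    while node in waits_for:
        if node in path:
            # We came back to a party already on the path: that's a deadlock.
            return path[path.index(node):]
        path.append(node)
        node = waits_for[node]
    return None


# Scenario 1: the helper thread itself is stopped at a native breakpoint.
helper_at_breakpoint = {
    "debugger": "helper",   # debugger waits for the helper to service a request
    "helper": "debugger",   # helper waits for the debugger to continue the process
}

# Scenario 2: some other thread is stopped at a native breakpoint while
# holding a lock that the helper thread needs.
lock_holder_at_breakpoint = {
    "debugger": "helper",
    "helper": "lock holder",
    "lock holder": "debugger",
}

# If the breakpoint is skipped, nothing blocks the helper thread.
breakpoint_skipped = {"debugger": "helper"}

print(find_wait_cycle(helper_at_breakpoint, "debugger"))
print(find_wait_cycle(lock_holder_at_breakpoint, "debugger"))
print(find_wait_cycle(breakpoint_skipped, "debugger"))
```

The first two graphs contain a cycle that runs through the debugger, so neither side can make progress. The third has no cycle because the helper thread is never waiting on anyone.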

ICorDebug deals with this by pumping all threads out of any such unsafe region before trying to communicate with the helper thread. (This pumping is called “synchronizing” the debuggee.)

This means that native breakpoints inside that region will get automatically continued without ever notifying the user. That region roughly corresponds to the CLR internal implementation.
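From the user’s point of view, the net effect is a simple filter on native breakpoint hits. Here is a rough Python model of that decision. It is an assumption-laden sketch, not the real ICorDebug logic: the NativeBreakpointHit type, its fields, and the dispatch function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class NativeBreakpointHit:
    thread_name: str
    is_helper_thread: bool   # hypothetical: hit occurred on the helper thread
    in_unsafe_region: bool   # hypothetical: thread is inside the CLR's internals


def dispatch(hit):
    """Return "skip" to continue silently, or "notify" to stop and tell the user."""
    if hit.is_helper_thread:
        # Stopping the helper thread would deadlock against the debugger.
        return "skip"
    if hit.in_unsafe_region:
        # The thread may hold a lock or be mid-update on data the helper needs,
        # so it is pumped out of the region instead of being stopped there.
        return "skip"
    return "notify"


hits = [
    NativeBreakpointHit("helper", True, True),
    NativeBreakpointHit("worker inside runtime", False, True),
    NativeBreakpointHit("worker in user native code", False, False),
]
for hit in hits:
    print(hit.thread_name, "->", dispatch(hit))
```

Only the last hit, in ordinary native code outside the runtime’s unsafe region, would ever be surfaced to the user.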


It turns out there are random areas of mscorwks outside that region. Thus an interop-debugger could debug some portions of mscorwks.dll. (However, Visual Studio explicitly disallows this to avoid potential confusion).


Random closing notes:

We believe the limitations here have a small impact on the overall .NET audience:

         – We believe only a small set of people are interested in actually debugging the runtime itself (mainly CLR developers 😉 ), and so having interop-debugging support debugging the runtime was definitely not a mainline scenario. You’d need private CLR symbols to have a meaningful experience here anyways.

         – You can always debug the runtime under a native debugger. This is what most CLR devs do.



Comments (12)

  1. Fine post. I have two questions:

    1) As I’m developing a VS add-in, I debug devenv.exe in VS itself (in managed-only mode). I frequently encounter weird deadlocks involving the debugger and debuggee VS instances, and I have to kill them using Task Manager. Do those deadlocks relate somehow to the problems you describe, or is this something peculiar to VS?

    2) Will the VS debugger in Whidbey be improved to not freeze the UI when evaluating watches? Now this happens very often, and it is quite annoying. As far as I understand, evaluating a watch involves communication with the helper thread, so why not use timeouts or something else to prevent freezes? Sorry, I understand that this question should more likely be addressed to VS developers, not to you.



  2. Talking about debugging the CLR: I’m wondering if it is possible to inspect the CLR in read-only mode, i.e. from a native debugging tool, reading the process memory and retrieving information like loaded classes, threads, etc. Another application I can think of is a heap profiler.

    I know Java has similar tool.


  3. Mike Stall says:

    Dmitry –

    1) From the CLR’s perspective, VS is just a random application that happens to host the CLR and use various managed interfaces. So the CLR-internal restrictions here should have no impact on VS. Those deadlocks are probably a different issue than this. (Especially because you’re using managed-only and these issues are for interop-only). You should report this as a bug to the VS team.

    2) The func-eval thing is a great question (though also a totally different issue than the ones described here). You’re right about the helper thread. But there are indeed timeouts in place and even a func-eval abort mechanism.

    At the ICorDebug API level, there’s no reason that the UI needs to freeze during a func-eval.

    Questions to help me diagnose this more:

    – How long do you observe the UI to freeze for (a few seconds, or much longer)?

    – Do you know which function is being evaluated?

  4. Mike Stall says:

    Sameer – Yes, it’s possible. The CLR is just a user mode app and can be native-only debugged like any user mode app.

    – Now we don’t publish an API to do this, so tools can’t really take advantage of it.

    – And you’d need private symbols to make any sense of it yourself.

    – However, we do ship a windbg extension (Sos.dll) that has commands to help do this.

    – The profiling APIs (corprof.idl) also expose these inspection capabilities to CLR profilers, so it’s possible for profiling tools.

    FWIW, I very much hope to get rid of the helper thread in V3.0 of the CLR.

  5. Mike, thanks for your response.

    As for freezes during evaluations, they last for approximately ten seconds on average for me. I noticed that they almost always happen if I expand a large object in the Watches window that has many properties whose evaluation requires interop (there are plenty of such objects in the VS Extensibility API, which add-ins have to use). Moreover, these long evaluations are often unsuccessful for no obvious reason (instead of the value of a property, "Expr cannot be evaluated" is displayed).

  6. Mike Stall says:

    Dmitry – what version of VS are you using?

    It sounds like VS is evaluating a func-eval that hangs. I just posted about func-eval problems (http://blogs.msdn.com/jmstall/archive/2005/03/23/400794.aspx).

    The UI is hung until the funceval completes (similar to a modal dialog). VS then issues a "func-eval abort" (ICorDebugEval::Abort) after 5 secs. If the UI comes back, sounds like the abort was successful.

    The func-eval is probably some getter which wraps a com-object and does a cross thread call to evaluate. The thread pumping messages is likely suspended and so the eval hangs.

    You’ll likely need to disable property evaluation in VS to get around it.

  7. Mike, I was talking to Jason Shay about the SOS extension, but it looks like the SOS API is not publicly available and we can only load it inside NTSD, Windbg, and the other Debugging Tools family apps.

    I was actually hoping to P/Invoke SOS.dll and read the required information. SOS.dll being such an incredibly powerful tool, it must be available to the general public.

    any supporters!!! 🙂

  8. Mike Stall says:

    Sorry, you’re out of luck for V2.0. SOS is just an extension dll and not a general API or platform. The reason is that the cost for a general API is significantly higher than making an isolated tool. We couldn’t foot that bill in V2.0, but we figured it was better to at least release the extension dll than nothing at all.

    I know this is unfortunate. We hope to significantly improve this in V3.0.

  9. Interop-debugging (mixed-mode) is managed + native debugging combined. Well, sort of. Native and managed