Why is managed debugging different from native debugging?

People ask, “Why can’t a native debugger debug managed code?”

The reason is that the CLR provides a lot of cool services beyond what you get in a typical native C++ app, such as running on a virtual machine / JITing, dynamic class layout, the type system, garbage collection, Reflection.Emit, and more. Each of these imposes special challenges on a debugger. Put another way, a native app that did all these things would not be debuggable at all with conventional native debuggers.

I’ll explore the impact of these things at a 10,000-foot level. (I’m intentionally omitting tons of detail for brevity’s sake. Maybe I should fill in the details and just convert this to an article instead of a blog entry?)

1)      Native debugging can be abstracted at the hardware level, but managed debugging needs to be abstracted at the IL level. Managed code cannot simply be shoehorned into C/C++ native debugging paradigms. One reason is that doing so would restrict the CLR’s options for executing the IL. For example, although the CLR currently (as of v2.0) JITs IL, we’d like to leave the door open for things like interpreting the IL, pitching (discarding) rarely used jitted code, or even rejitting code. If ICorDebug used native code offsets for everything, it would be unable to debug interpreted IL.

2)      Managed debugging needs a lot of information that isn’t available until runtime. With managed code, the compilers only produce IL, and the real debugging information is not resolved until runtime. For example, the JIT compiles IL to native code at runtime, and the loader dynamically determines most class layout at runtime. The type system may even create new types at runtime (from Reflection.Emit or from System.Activator). For native code, all of this is determined at compile time. A managed debugger therefore needs some way to get this information at runtime. Some solutions include:

a.       Have the CLR create auxiliary PDBs at runtime as the information is determined. This could be a huge perf hit, and we’d hate to pay it when no debugger is attached. But if we skip it when no debugger is attached, the information won’t be available if a debugger attaches later.

b.      Have the managed debugger inspect the pertinent CLR data structures (either directly from out-of-process or via a “helper” thread running in-process). A big caveat here is ensuring the debugger doesn’t request such information when the CLR data structures are in an inconsistent state. The CLR currently uses a helper thread.

3)      A managed debugger needs to coordinate with the Garbage Collector (GC). The CLR has a mark-sweep-compact GC. This means the GC will move objects to defragment the heap and then update every reference to those objects throughout the process accordingly (including the “GC roots”). This impacts debugging in several ways:

a.       The debuggee is temporarily in an inconsistent state during the GC. The debugger must coordinate with the GC to ensure that it doesn’t inspect the debuggee during this window. 

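b.      Debuggers can let users change the values of variables. Such a write must be coordinated with the GC’s own updating of references; otherwise the two can race, and the GC could, for example, move an object without fixing up a reference the debugger just wrote.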

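c.       There’s no convenient object identity. In native code, the raw pointer value to an object uniquely identifies that object since objects don’t get moved around. In managed code, an object’s address can change at any GC, so a debugger can’t use the address alone to track an object over time.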
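Point 1 can be made concrete with a toy model. The sketch below is a hypothetical illustration, not the CLR’s or ICorDebug’s actual design: a breakpoint is named by a method plus an IL offset, and a made-up ToyRuntime decides only at resolution time whether that means patching jitted native code or stopping an interpreter. ILBreakpoint, ToyRuntime, jit, pitch, and resolve are all invented names for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ILBreakpoint:
    """A breakpoint named in IL terms: a method plus an IL offset."""
    method: str
    il_offset: int


class ToyRuntime:
    """Hypothetical runtime that may JIT, interpret, or pitch each method."""

    def __init__(self) -> None:
        # method name -> IL-offset-to-native-offset map, or None if not jitted
        self._code: Dict[str, Optional[Dict[int, int]]] = {}

    def jit(self, method: str, il_to_native: Dict[int, int]) -> None:
        self._code[method] = dict(il_to_native)

    def pitch(self, method: str) -> None:
        # Discard the jitted code; the method runs interpreted again.
        self._code[method] = None

    def resolve(self, bp: ILBreakpoint) -> str:
        """Turn an IL-level breakpoint into whatever the current strategy needs."""
        il_map = self._code.get(bp.method)
        if il_map is None:
            # No native code exists, but the IL offset is still meaningful.
            return f"interpreter stop at {bp.method}+IL_{bp.il_offset:04x}"
        # Toy assumption: the map has an exact entry for this IL offset.
        return f"native patch at {bp.method}+0x{il_map[bp.il_offset]:x}"
```

Because the debugger only ever holds the IL-level breakpoint, the same request keeps working whether the method is interpreted, jitted, or pitched and later rejitted at a different native offset.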
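Points 2b, 3a, and 3b all come down to one rule: don’t read or write runtime data structures while they are mid-update. Here is a minimal sketch of that rule, assuming a single lock that a hypothetical runtime holds for the whole window in which its object table is inconsistent (such as a compacting GC). ToyRuntime, gc_in_progress, helper_inspect, and debugger_set are invented names, and the CLR’s actual synchronization between its helper thread and the GC is not modeled here.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class ToyRuntime:
    """Hypothetical runtime whose object table is briefly inconsistent during a GC."""

    def __init__(self) -> None:
        # Held for as long as the object table is being rewritten.
        self._update_lock = threading.Lock()
        self.object_table: Dict[str, int] = {"obj1": 0x1000, "obj2": 0x2000}

    @contextmanager
    def gc_in_progress(self) -> Iterator[None]:
        """The window in which a compacting GC is relocating objects."""
        with self._update_lock:
            yield

    def compact(self, new_addresses: Dict[str, int]) -> None:
        with self.gc_in_progress():
            # Mid-loop, some entries hold new addresses and some hold stale ones.
            for name, addr in new_addresses.items():
                self.object_table[name] = addr

    def helper_inspect(self) -> Optional[Dict[str, int]]:
        """Points 2b/3a: serve a debugger read only when the table is consistent."""
        if not self._update_lock.acquire(blocking=False):
            return None  # busy; a real helper thread would wait for a safe point
        try:
            return dict(self.object_table)  # consistent snapshot
        finally:
            self._update_lock.release()

    def debugger_set(self, name: str, addr: int) -> bool:
        """Point 3b: a debugger write must not interleave with the GC's fix-ups."""
        if not self._update_lock.acquire(blocking=False):
            return False
        try:
            self.object_table[name] = addr
            return True
        finally:
            self._update_lock.release()
```

Refusing the request keeps the sketch short; a real helper thread would instead wait until the runtime reaches a safe point and then service it.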
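Point 3c can be sketched the same way. In this toy compacting heap (all names hypothetical, and far simpler than the CLR’s GC), compaction slides live objects down to close gaps and then fixes up a table of handles, the same way the GC fixes up every reference. A debugger that holds such a handle rather than a raw address can keep tracking an object across GCs.

```python
from typing import Dict, Tuple


class CompactingHeap:
    """Toy compacting heap: a GC slides live objects down to remove gaps."""

    def __init__(self) -> None:
        self.objects: Dict[int, Tuple[str, int]] = {}  # address -> (payload, size)
        self.handles: Dict[int, int] = {}  # handle id -> current address
        self._next_addr = 0
        self._next_handle = 1

    def alloc(self, payload: str, size: int) -> int:
        addr = self._next_addr
        self.objects[addr] = (payload, size)
        self._next_addr += size
        return addr

    def free(self, addr: int) -> None:
        # Toy assumption: a handle keeps its object alive (a "strong" handle).
        if addr in self.handles.values():
            raise ValueError("object is still referenced by a handle")
        del self.objects[addr]

    def make_handle(self, addr: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.handles[handle] = addr
        return handle

    def deref(self, handle: int) -> str:
        return self.objects[self.handles[handle]][0]

    def compact(self) -> None:
        relocated: Dict[int, Tuple[str, int]] = {}
        moved_to: Dict[int, int] = {}
        cursor = 0
        for old_addr in sorted(self.objects):
            payload, size = self.objects[old_addr]
            relocated[cursor] = (payload, size)
            moved_to[old_addr] = cursor
            cursor += size
        self.objects = relocated
        self._next_addr = cursor
        # Fix up every handle, just as the GC fixes up every reference.
        self.handles = {h: moved_to[a] for h, a in self.handles.items()}
```

For example, allocate A (size 16) at address 0 and B (size 32) at address 16, take a handle to B, free A, and compact. The raw address 16 no longer names any object (it now falls in the middle of B, which moved to address 0), but the handle still finds B.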

Comments (11)

  1. Pavel Lebedinsky says:

    Now if somebody could also explain why managed console debugger commands are so different from the native ones (cordbg vs cdb).

    In cordbg, the most commonly used cdb command (‘k’, which prints a stack trace) kills the process… that’s just cruel.

  2. Pavel, take a look at MDbg at http://blogs.msdn.com/jmstall/archive/2004/09/30/236281.aspx. There’s another blog entry where they discuss possibly replacing cordbg with MDbg and adding a cordbg compatibility mode.

  3. Mike Stall says:

    Pavel – there’s not a good reason. I think Cordbg’s original author (my former boss) had never used CDB and so was unaware of the CDB commands. Furthermore, Cordbg was originally written as a test shell to internally test the ICorDebug API. I don’t think we ever actually intended for anybody to use it for real purposes, so there wasn’t much emphasis on making the commands intelligent and consistent! By the time Cordbg did become important, backwards compat had already locked us in.

  4. Mike Stall says:

    Pavel – I also want to emphasize that there are 2 disjoint things here:

    1) A debugging API (ICorDebug for managed code; Win32 has one for native code).

    2) The debugger, which is just an application that consumes that API (Cordbg, Visual Studio, MDbg, Cdb / ntsd / Windbg, etc).

    This entry was just focused on the design constraints of a managed debugging API, and makes no comments about the actual debuggers.

    The names of the commands are entirely up to the debugger. You could write your own debugger (e.g., PavelDbg.exe) that consumes ICorDebug and calls the commands anything you want.

  5. Pavel Lebedinsky says:

    I would try MDbg if it was working on 1.1.

    By the way, I believe that WinDbg can debug 2.0 managed code directly (without extensions like sos.dll). Is it using ICorDebug to do that? If so, shouldn’t cdb be the proper replacement for cordbg?

  6. Mike Stall says:

    Eventually, I think we’d all like windbg / cdb / ntsd to be able to debug managed code directly. You’re totally right – when that day comes, Cordbg will truly be useless.

    At this point, I can’t yet comment about what Windbg’s actual managed debugging capabilities will be when we ship v2.0 CLR. (I’ll try to get back to this when things get declassified.)

    I can say that neither Windbg nor SOS use ICorDebug. Given the current architectures, this would be completely impossible.

    The issues I brought up in this blog entry will definitely affect any attempt to extend Windbg to include managed debugging. For example, SOS can take a stack trace because it reads CLR data structures from out of process. Now if those data structures were in an inconsistent state and you then tried to inspect the stack, things would likely fail. This could very easily happen in a multi-threaded debuggee: thread #1 hits a breakpoint while thread #2 is in the middle of updating the data structures. SOS does not handle this.

  7. "Maybe I should fill in the details and just convert this to an article instead of a blog entry?)"

    Or just divide them up into multiple blog entries…? 😉

  8. smidgeonsoft says:

    If a native debugger is able to inject a profiler into the managed application and then collect JIT notifications, the native debugger has full symbol support for managed code (as I am sure you know). This creates the possibility of adding breakpoints, etc. With the addition of knowledge of some internal structures in the CLR, one can also gain access to .NET GC objects. (I wish the specs for SOS would be made available; it would make things a lot easier.) This is how PEBrowse Interactive (a native debugger) is able to step from managed to unmanaged code with full symbol support.

  9. Mike Stall says:

    smidgeonsoft (Russell?) – using the profiling API to provide additional information is a very clever and interesting idea (I meant to reply to your earlier post where you alluded to this). It’s one way to get key information and get around some of the limitations of interop-debugging.

    For the record, we really didn’t intend it for this use, and it does have a few shortcomings compared to ICorDebug:

    – It still leaves many of the issues above open. For example, it’s possible that you’ll hit a breakpoint on one thread and then fail to get a callstack because another thread has key CLR structures in an intermediate state.

    – There is some very cool debugger functionality not available through the profiling API (like Edit-and-Continue, Just-my-code, Set-Next-Statement, and func-eval), though I can see that a low-level debugger wouldn’t care about this.

    – Profilers can’t yet attach to a process.

    – Profilers have a very different versioning bar than debuggers. A v2.0 debugger can debug both a v1.0 and a v2.0 app; a v2.0 profiler can only profile v2.0 apps (since it can’t be loaded into a v1.0 process).

    FWIW, I’m actually in the middle of drafting a blog entry comparing the debugging vs. profiling API.

  10. You can’t get a full mixed (both managed + native) stack of a thread within your own process. You can…