The CLR is a cross-language platform, so it follows that the CLR debugging services are also cross-language. This means any third party can write a managed debugger, and that debugger can debug any managed app produced by any third-party compiler. (This holds true provided we’re talking about debugger-only features and not things like Edit-and-Continue, which span the whole IDE.)
That’s some great flexibility! In this entry, I’ll explore the significance and caveats of that.
Debugging at the IL level.
The CLR debugging API (ICorDebug) is abstracted at the IL level, which means it can debug anything that compiles to IL (including dynamically generated code). This also means the debugger is abstracted from the details of how the IL is actually executing, such as whether it’s jitted or interpreted. (In fact, a third party could in theory leverage existing managed debuggers to debug their own platform by implementing their own ICorDebug, which I briefly discuss here.)
The key qualities of IL-debuggability are:
1) Source-level mappings: IL ranges can be annotated with sequence points which map back to source ranges (defined as a filename, starting line, starting column, ending line, and ending column). This mapping is stored in the pdb and accessed via the symbol store interfaces.
2) A traditional LIFO callstack for function calls. IL has specific call instructions, and ICorDebug can view the callstack of functions (see ICorDebugThread).
3) Each IL function has a list of local variables and parameters. Parameter names are stored in the metadata, while local variable names are stored in the pdbs. The IL evaluation stack itself is not available via the debugging API.
4) Raw object inspection. ICorDebug will let you see the raw layout of objects, such as the fields in a class. It has no knowledge of higher-level abstractions. For example, ICorDebug would show you the raw guts of a hash table (such as the buckets, chains, etc.) and then debuggers would have to implement additional logic (such as Visualizers) to provide a pretty enumeration.
Basically, a naïve managed debugger can only see what you see in ildasm (note that ildasm will load symbols if available).
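To make the sequence-point idea from (1) concrete, here’s a rough sketch in Python. It is only a toy model of the lookup a debugger performs; the SequencePoint class, the find_sequence_point helper, and the sample offsets are all made up for illustration and are not part of ICorDebug or the symbol store interfaces.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SequencePoint:
    """Hypothetical record: one IL offset mapped to a source range."""
    il_offset: int   # first IL offset covered by this point
    filename: str
    start_line: int
    start_col: int
    end_line: int
    end_col: int


def find_sequence_point(points: List[SequencePoint],
                        il_offset: int) -> Optional[SequencePoint]:
    """Return the sequence point whose IL range contains il_offset.

    Assumes points is sorted by il_offset and that each point covers
    offsets up to (but not including) the next point's il_offset.
    """
    offsets = [p.il_offset for p in points]
    index = bisect_right(offsets, il_offset) - 1
    return points[index] if index >= 0 else None


# Invented sample data standing in for what a pdb would store.
POINTS = [
    SequencePoint(0x00, "demo.cs", 10, 5, 10, 20),
    SequencePoint(0x07, "demo.cs", 11, 5, 11, 32),
    SequencePoint(0x15, "demo.cs", 12, 5, 12, 12),
]

hit = find_sequence_point(POINTS, 0x0A)
print(hit.filename, hit.start_line)
```

An IL offset that falls between two sequence points resolves to the earlier one, which is how a debugger can turn an arbitrary instruction pointer into a highlighted source range.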
Codegen + Debugging
This means the closer the language maps to the IL constructs above, the more debuggable it will be by default. For example, V1.0 C# directly translates to IL (in the same way that C translates to assembly), so it’s very debuggable even by naïve debuggers with no C#-specific knowledge (e.g., MDbg).
However, if the language uses constructs that don’t translate well, it may appear confusing under a debugger. For example, Jim Hugunin was explaining to me that Python allows you to programmatically access local variables. Now Reflection lets you access parameters and callstacks, but not local variables. So one way to codegen this is to put each function’s local variables in a special wrapper class (like what C# does for anonymous delegates), and then the Python codegen can crack this wrapper and use that to get the locals programmatically. This works great, except at the IL level, the only local is the single wrapper class and so a naïve debugger won’t know to expand it to get the “real” source-level locals.
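Here’s a small Python model of that mismatch. It is a sketch under assumptions: LocalsWrapper stands in for whatever holder class a compiler might generate, the "$locals" name is invented, and a plain dict plays the role of the IL-level locals of a frame.

```python
class LocalsWrapper:
    """Hypothetical compiler-generated holder for source-level locals."""

    def __init__(self, **values):
        self.__dict__.update(values)


def naive_locals(il_locals: dict) -> dict:
    """What a language-agnostic debugger sees: IL locals and their types."""
    return {name: type(value).__name__ for name, value in il_locals.items()}


def language_aware_locals(il_locals: dict) -> dict:
    """What a language-aware view could show: wrapper fields become locals."""
    expanded = {}
    for name, value in il_locals.items():
        if isinstance(value, LocalsWrapper):
            expanded.update(vars(value))  # promote each field to a local
        else:
            expanded[name] = value
    return expanded


# One IL-level local that actually holds two source-level locals.
frame = {"$locals": LocalsWrapper(count=3, name="spam")}

print(naive_locals(frame))
print(language_aware_locals(frame))
```

The naïve view reports a single local of type LocalsWrapper, while the language-aware view recovers count and name, which is what the person reading the source expects to see.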
A compiler may even want to consider debuggability when picking a particular code-gen pattern. To the compiler writers out there: If two different codegen patterns appear equally valid, pick the one that’s more debuggable. This may be as simple as picking intelligent names for compiler-generated functions (such as an anonymous delegate in C#). Or it may mean wildly different codegen under debug builds vs. release builds.
Specific debuggers may provide additional functionality (‘sugar’) on top of the raw CLR debugging services to smooth things over for an end-user. The pro here is that such debugger services can be extremely customizable and uninhibited by CLR ship cycles. The con is that unlike ICorDebug, any debugger-specific work likely can’t be shared across debuggers. For example, Visual Studio add-ins can’t be used in Windbg or MDbg. Cross-debugger reusability is an incentive to find a way to codegen things in a debugger-friendly fashion when possible.
Such debugger-specific functionality may include:
1) A “language service” which can abstract language-specific idioms and may provide:
o higher level abstractions on top of low-level debugging APIs. This is especially valuable for presenting concepts not directly represented in the raw IL. To reiterate the example above, a debugger may have special knowledge of how to display certain common collection classes (such as a hash table) in a pretty user-friendly form rather than the raw underlying data. Another example is that the RegEx class compiles regular expressions to IL, but a regular expression state machine doesn’t map nicely back to the IL constructs mentioned above. Thus a “regular expression language-service”
o language-specific insight for things that don’t compile to IL very well. For example, a language service for the python example above would know how to extract the source-level local variables from a frame.
o language-specific formatting and expression evaluation. For example, C++ formats things like “class::function”, whereas C# formats as “class.function”.
Cordbg / MDbg don’t have any language services.
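As a tiny illustration of the formatting bullet, a language service could be as simple as a per-language lookup. This Python sketch is hypothetical: the SEPARATORS table and format_member function are invented names, and falling back to “.” for unknown languages is my assumption, not something any particular debugger does.

```python
# Hypothetical table of member separators, keyed by source language.
SEPARATORS = {"C#": ".", "C++": "::"}


def format_member(language: str, class_name: str, function_name: str) -> str:
    """Format a function name the way the given source language would."""
    separator = SEPARATORS.get(language, ".")  # assumed default for unknowns
    return f"{class_name}{separator}{function_name}"


print(format_member("C++", "Widget", "Draw"))
print(format_member("C#", "Widget", "Draw"))
```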
2) A Debugger extensibility model. MDbg, Visual Studio, and Windbg (and likely every other major debugger out there) all have their own extensibility models. These extensions may be very primitive or very sophisticated, such as VS’s visualizers.
3) Pseudo-standard protocols. There are some informal ‘protocols’ on top of ICorDebug that are reasonable enough that multiple debuggers can implement them. For example:
o Performing function evaluation on ToString() or properties (though offer an option to disable this, since func-eval can be evil).
o Displaying the Message property from an exception object when an exception is thrown.
o Using the DebuggerNonUserCode attribute for Just-my-code debugging.
o A whole bunch of additional attributes for advising a debugger how to display an object in an inspection window.
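The first of these protocols can be sketched in a few lines of Python. This is a model only: display_value and its allow_func_eval switch are invented names, str() stands in for a func-eval of ToString(), and the raw fallback format is just one plausible choice.

```python
def display_value(obj, allow_func_eval: bool = True) -> str:
    """Return the text a watch window might show for obj."""
    if allow_func_eval:
        try:
            # Stand-in for func-eval of ToString(): this runs user code,
            # so it can have side effects or fail.
            return str(obj)
        except Exception as error:
            return f"<func-eval failed: {error}>"
    # Func-eval disabled: fall back to the raw fields, nothing is executed.
    raw = getattr(obj, "__dict__", {})
    fields = ", ".join(f"{key}={value!r}" for key, value in sorted(raw.items()))
    return f"{type(obj).__name__} {{{fields}}}"


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        return f"({self.x}, {self.y})"


print(display_value(Point(1, 2)))
print(display_value(Point(1, 2), allow_func_eval=False))
```

With func-eval on, the user sees the friendly string; with it off, the debugger shows only what raw inspection provides, which is the safe default when running code in the debuggee is too risky.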
As the CLR looks at debugging stranger languages that don’t map so nicely to IL, we’ll need to explore whether there are other pseudo-protocols that we should advertise. Leave a comment if you have any ideas / suggestions.
Debugging things that don’t compile to IL?
So it’s great that the CLR debugging services can work wonders debugging IL. But what if you want to debug something that’s not IL? For example, what if you want to debug an arbitrary table-based state machine? (Martin Platt brought up the example of debugging a yacc-like state machine here).
Steve Steiner and I talked about this and had the following conclusions:
1) Ideally, if you can somehow compile the machine to IL, then do it and you’re golden! However, I’d guess if you could do this at compile time, you wouldn’t be using a state machine in the first place! Also, if you don’t normally compile the state machine, you won’t want to do it when debugging since that’s a radical behavior change and debuggers shouldn’t change behavior. That aside, you could do it using dynamically generated code. Unfortunately, if you use light-weight codegen, you can’t debug it. And if you use traditional reflection-emit, you need to put it in its own appdomain to unload it when you’re done.
2) Else consider a debugger extension to provide a tool window. For example, a VS add-in could inspect the state machine in the debuggee, and then output high-level information in a tool window.
3) Create an auxiliary IL code snippet to just track sequence points. The state machine itself wouldn’t be compiled to IL, but it would create an auxiliary IL snippet which could have sequence points that map the state machine back to some pseudo-source (such as the yacc-input file). The state machine would then call into the IL snippets at stopping points. I’m still thinking about exactly what this would look like. If I can come up with something more specific, you can be sure that I’ll blog about it.
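To give a feel for the shape of option (3), here’s a speculative Python sketch. It models only the control flow: a table-driven machine that calls a hook at every stopping point, with a map from states back to pseudo-source lines. The tables, the grammar.y filename, and the on_stop callback are all invented, and the callback merely stands in for the auxiliary IL snippet that would carry the real sequence points.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pseudo-source map: state -> (file, line) in the grammar file.
SOURCE_MAP: Dict[str, Tuple[str, int]] = {
    "start": ("grammar.y", 3),
    "digits": ("grammar.y", 7),
    "done": ("grammar.y", 12),
}

# Transition table: (state, input class) -> next state.
TABLE: Dict[Tuple[str, str], str] = {
    ("start", "digit"): "digits",
    ("digits", "digit"): "digits",
    ("digits", "end"): "done",
}


def run(inputs: List[str],
        on_stop: Callable[[str, Tuple[str, int]], None]) -> str:
    """Interpret the table, calling on_stop at every stopping point."""
    state = "start"
    on_stop(state, SOURCE_MAP[state])
    for symbol in inputs:
        state = TABLE[(state, symbol)]
        on_stop(state, SOURCE_MAP[state])  # where a breakpoint could bind
    return state


# Record (state, line) pairs as a stand-in for a debugger stepping through.
trace = []
final_state = run(["digit", "digit", "end"],
                  lambda state, location: trace.append((state, location[1])))
print(final_state, trace)
```

The machine itself stays interpreted; only the hook needs debuggable code behind it, so stepping would appear to move through the grammar file rather than through the interpreter loop.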