Debugging IL

Managed applications are compiled to IL (Intermediate Language) and then our JIT (Just-In-Time compiler) can compile it to native code so that the CPU can execute it. (The alternative is interpreting the IL directly, which has horrible performance characteristics and is unsupported by the current CLR implementations).

People commonly ask, "Why can't I debug the IL?"..

The answer: The CPU is executing native code and not IL, and any loss of fidelity from IL-->Native will confuse the debugger. For example, suppose you have the source statement “x = y”. That may get compiled to the following IL:

  IL_0004: ldloc.1

  IL_0005: stloc.0

 

That may then get compiled to the following native code (x86 here, if x + y are enregistered).

mov eax, ebx

Thus the 2 IL instructions are represented by only a single native instruction. Thus you could never step over just IL_0004. We deal with this IL-->Native conversion by specifying sequence points, which are groups of IL instructions that the debugger effectively considers to be atomic. The compilers (C#, VB.Net, etc) either explicitly specify Sequence points or tell the JIT to infer sequence points based off certain patterns in the IL. Sequence Points also determine the granularity of the IL-->Native map. More sequence points provide better fidelity, but may restrict the jit's ability to optimize. The PDBs associate sequence points with source-lines, and a good mapping here ensures a sane source-level debugging experience. Thus for source-level debugging, the "sweet spot" is to have just enough sequence points to map the source-lines.

Now if that's not an issue, you can debug the IL ... at least from the ICorDebug perspective. In fact, ICorDebug is abstracted at the level of debugging IL. 

  1. Most ICorDebug functionality operates on the IL level. For example, breakpoints can be set at IL offsets and stackframes can report IL offsets.
  2. The ICorDebug API exposes the IL-codebytes for functions (via ICorDebugCode),
  3. Translating IL codebytes to text is trivial. It's much easier than disassembling x86. The IL encodings are public and I'm guessing that the source to ILDasm is available on rotor, and there's probably lots of IL disassemblers out there.
  4. Our API also exposes the IL <--> Native mapping (also via ICorDebugCode::GetILToNativeMapping). So although the CPU is really executing native code, you can translate back to the IL. In v2.0, these mappings are always available (regardless of ini / config files).

In fact, MDbg can do "IL-debugging" (at least with the right extensions), and internally, we find it very useful for our internal test purposes. Also, you could always use IL-dasm to get the IL from a high-level language (eg, C#) and then re-ILasm to get different sequence points granularity. 
[Update: 11/8/05] As proof of all this, we've added IL-debugging support into the MDbg gui.

Now all that said, Visual Studio does not expose "IL debugging". This is a great example of how the low level API (ICorDebug) has a different perspective than the high-level end-user tool (VS).  From the CLR perspective, the problem is solved - though I recognize that doesn't really help out VS's end users. I think they figure that already being able to debug on both the source-level and native-code level was sufficient; the IL-->Native problem mentioned above would make the feature problematic, and that there wasn't high enough demand / benefit for this feature to justify the costs. I'm curious: how important do you consider the ability to debug at the IL-level?