3rd-parties and Edit-and-Continue (Part 1: Editors + Compilers)

I’ve said before that any 3rd-party debugger can add Edit-and-Continue (EnC) support, and now I want to be very clear about exactly what that means.

I’ve found people are really asking one of two different questions here:

1)      Managed debugging is language neutral, so how can I have an existing managed debugger (like Visual Studio) add EnC support for my new language?  This is completely dependent on the particular debugger’s extensibility model.  My understanding is that VS has a great extensibility model that could allow this (for example, look at all the cool stuff Don Syme is doing with F# and VS). However, I’m mostly ignorant about advanced VS usage and can’t really comment on it here.

2)      How can I write my own EnC-aware IDE using the underlying APIs (like ICorDebug, metadata, etc.)? I’ll focus on the second question in this blog entry.

 

In short, the CLR has no private interfaces with Visual Studio, and so because VS can do EnC, any 3rd-party IDE (Integrated Development Environment = Editor + Debugger + Compiler) can do it too.

 

That said, there are two main follow-up points:

1)      Where is the documentation?  Although MSDN does not yet have formal EnC documentation, the idl files (CorDebug.idl and CorSym.idl) containing the EnC interfaces are public. I’d also expect the changes to the metadata to show up in the next release of the v2.0 metadata specifications (e.g., along with other changes like those for generics). The MDbg sample also demonstrates some EnC functionality from the debugger end. I also hope (but certainly can’t promise) that we will get out a sample demonstrating a compiler (like ilasm) emitting EnC info. Further supplementary information may also show up on Microsoft newsgroups and blogs (like this one).

2)      How much work is it? A lot. The big caveat is that in addition to debugger changes, you may need to write your own editor and compiler too. And then there are still lots of rules and restrictions to abide by. I’ll spend the rest of this entry commenting on that.

 

EnC is not just a debugger thing.

EnC is not just a debugger feature; it spans the whole IDE. It’s not enough to have an EnC-aware debugger; you need an EnC-aware compiler and an EnC-aware editor too. Although the EnC debugging interfaces are public, there are no standard interfaces for the editor and compiler.

Furthermore, Microsoft’s implementations of both the C# compiler (csc.exe) and the VB.Net compiler (vbc.exe) currently do not expose these new EnC interfaces. They are private interfaces between the compilers and the IDE.

That means you have to do one of the following:

-         either find a way to build those EnC-aware components on top of non-EnC-aware components (which may not even be possible, as I briefly explore below),

-         or write your own EnC-capable compiler for each language for which you want to support EnC.

 

In other words, adding the ability to do EnC on C# / VB.Net to your existing debugger is a very expensive feature.

 

In the following sections, I’ll briefly describe some of the new requirements EnC imposes on the editor and compiler (I’ll cover what the debugger has to do in a separate entry). This should both give a feel for how much work creating an EnC IDE is and illustrate why you need EnC-aware components instead of just the traditional ones.

 

EnC requirements on the Editor

In a pre-EnC world, a standalone debugger doesn’t even care about the editor. A debugger can provide lots of great information without source files, such as callstacks, function names, variables, etc.

EnC requires a new level of cooperation between the debugger and the editor. When editing an active function, the debugger needs to be able to remap a thread from the original version of the function to the new version of the function. This means the IDE needs to keep track of the source changes so that it can build a remap table of old IL offsets to new IL offsets. You’ll likely need information such as where the deltas are in relation to the current instruction pointer to construct the proper mapping. For example, suppose you change:

 

Old source                      Newly edited source
1 | Foo();                      1 | Bar();
2 | Foo();   <- current IP      2 | Foo();
3 | Bar();                      3 | Foo();

 

By just looking at the original and final source, it’s not clear which new line the old IP (at line 2) should map to: 2 or 3? It depends on how the edits were made.

If we do this series of edits, the old line 2 should map to new line 2.

Original                Intermediate            Final
1 | Foo();              (delete row 1)          1 | Bar();   <- insert
2 | Foo();   <- IP      1 | Foo();   <- IP      2 | Foo();   <- IP
3 | Bar();              (delete row 3)          3 | Foo();   <- insert

 

But if we do this series of edits, the old line 2 should map to new line 3.

Original                Intermediate            Final
1 | Foo();              1 | Foo();              1 | Bar();   <- insert
2 | Foo();   <- IP      2 | Foo();   <- IP      2 | Foo();
3 | Bar();              (delete row 3)          3 | Foo();   <- IP

 

I think there are enough diff utilities and flexible editors out there that one could somehow build an “EnC-aware editor” on top of a traditional editor with such utilities. An IDE could also greatly simplify its dependency on the editor by not allowing edits of any active functions, though that would greatly decrease EnC's usefulness.
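
To make the two edit sequences above concrete, here is a minimal Python sketch of the bookkeeping an editor could do. The edit format (a list of "delete" and "insert" operations on 1-based rows, applied in order) and the function name remap_line are my own assumptions for illustration, not part of any EnC interface. It only tracks source lines; a real IDE would still have to translate lines into old and new IL offsets.

```python
from typing import List, Optional, Tuple

# Each edit is ("delete", row) or ("insert", row), with 1-based rows, applied
# in order. This edit format is an assumption made for illustration only.
Edit = Tuple[str, int]


def remap_line(old_line: int, edits: List[Edit]) -> Optional[int]:
    """Follow one source line through a sequence of edits.

    Returns the line's final row, or None if the line itself was deleted.
    """
    line = old_line
    for kind, row in edits:
        if kind == "delete":
            if row == line:
                return None      # the tracked line no longer exists
            if row < line:
                line -= 1        # a line above was removed; shift up
        elif kind == "insert":
            if row <= line:
                line += 1        # a line was inserted at or above; shift down
        else:
            raise ValueError(f"unknown edit kind: {kind}")
    return line


# First sequence: delete both neighbours of the IP line, then insert
# Bar() above it and Foo() below it.
first = [("delete", 1), ("delete", 2), ("insert", 1), ("insert", 3)]

# Second sequence: delete the trailing Bar(), then insert Bar() at the top.
second = [("delete", 3), ("insert", 1)]
```

Both sequences end with the same three lines of text, but remap_line(2, first) and remap_line(2, second) give different answers, which is exactly why a plain before/after diff is not enough.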

 

EnC requirements on the Compiler

Compilers traditionally support a command line interface that takes in all the source files and produces a single executable. This works great in the pre-EnC world, but EnC requires additional interfaces from the compiler.

 

Some of the EnC functionality that compilers have to support includes:

1) Compiling the deltas

With EnC, the IDE needs to:

  • get the editor to collect source deltas,
  • pass the source deltas to the compiler, which will then produce delta IL and metadata blobs, and
  • pass those to the debugger (via ICorDebugModule2::ApplyChanges).

 

One delta may refer to a previous delta, so this also means that the compiler needs to keep track of all previous deltas.

There’s not a standard interface for how the IDE should request 3rd-party compilers to produce new delta IL and metadata blobs.
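
Since there is no standard interface, the following Python sketch is purely hypothetical. It only shows the shape of the round trip described above: the compiler remembers every delta it has produced, and the IDE hands the resulting blobs to the debugger. DeltaCompiler, Delta, and apply_edit are made-up names, the blobs are placeholder bytes, and the apply_changes callback merely stands in for the ICorDebugModule2::ApplyChanges call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Delta:
    """One EnC delta: the compiled blobs for a single round of edits."""
    generation: int
    il: bytes
    metadata: bytes


@dataclass
class DeltaCompiler:
    """Hypothetical stand-in for an EnC-aware compiler.

    It keeps every delta it has produced, because a later delta may refer
    to things introduced by an earlier one.
    """
    history: List[Delta] = field(default_factory=list)

    def compile_delta(self, source_delta: str) -> Delta:
        generation = len(self.history) + 1
        # A real compiler would emit IL and metadata here; these bytes are
        # placeholders so the sketch can run.
        delta = Delta(
            generation,
            il=f"IL for: {source_delta}".encode(),
            metadata=f"MD gen {generation}".encode(),
        )
        self.history.append(delta)
        return delta


def apply_edit(source_delta: str,
               compiler: DeltaCompiler,
               apply_changes: Callable[[bytes, bytes], None]) -> Delta:
    """Run one EnC round: compile the source delta, then hand the blobs
    to the debugger.

    `apply_changes` stands in for the debugger-side call; the Python
    signature used here (metadata, il) is an assumption of this sketch.
    """
    delta = compiler.compile_delta(source_delta)
    apply_changes(delta.metadata, delta.il)
    return delta
```

The point of the sketch is the state: the compiler object has to live as long as the debugging session, which is a very different model from a command-line compiler that runs once and exits.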

 

2) Detecting illegal edits.

The IDE needs to detect edits that don’t make sense and prevent them. The CLR has additional restrictions for EnC beyond just what compiles. For example, in v2.0 CLR, it’s illegal to change the base class of a type via EnC, even though such a change may be legal if the user recompiled the whole program. If the user tries to do such an illegal edit, the IDE needs to detect it and prevent it gracefully (likely by telling the user that they must restart their debugging session and recompile their app before they can use EnC again).

Some illegal edits also depend on program state. For example, some restrictions may only apply if a function is currently on the stack. 

This illegal edit detection may require additional interfaces from the compiler.
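
As a rough illustration, here is a Python sketch of what such a check could look like. The only real rule in it is the base-class restriction mentioned above. The state-dependent part is deliberately generic: MethodEdit, the kind labels, and the kinds_blocked_when_active set are hypothetical names that a caller would fill in with whatever rules actually apply.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class MethodEdit:
    method: str   # name of the edited method
    kind: str     # free-form label chosen by the IDE (hypothetical)


def find_illegal_edits(old_bases: Dict[str, str],
                       new_bases: Dict[str, str],
                       method_edits: List[MethodEdit],
                       active_methods: Set[str],
                       kinds_blocked_when_active: Set[str]) -> List[str]:
    """Return a human-readable message for every edit that must be rejected."""
    problems: List[str] = []

    # Static rule taken from the text: a type's base class cannot change
    # under EnC, even though a full recompile would accept it.
    for type_name, old_base in old_bases.items():
        new_base = new_bases.get(type_name)
        if new_base is not None and new_base != old_base:
            problems.append(
                f"{type_name}: base class changed from {old_base} to {new_base}")

    # State-dependent rules: only reject when the method is on the stack.
    # Which edit kinds belong in this set is left to the caller.
    for edit in method_edits:
        if (edit.method in active_methods
                and edit.kind in kinds_blocked_when_active):
            problems.append(
                f"{edit.method}: '{edit.kind}' is not allowed while the method is active")

    return problems
```

An IDE would run something like this before asking the compiler for a delta, and show the messages to the user instead of applying the edit.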

 

3) Alternate codegen for compiling deltas.

A compiler may need to produce different IL for an EnC delta versus what it would produce if compiling clean. For example, another CLR restriction is that you can’t remove local variables from a function. If a source delta removes a local variable, the compiled delta still needs to emit the local in the method’s local variable signature and just omit any reference to that local in the rest of the body (it becomes just a placeholder). Once you recompile clean, the local can be completely removed. An EnC-aware compiler needs to be aware of restrictions like this and be able to emit the altered codegen patterns as needed.
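
The placeholder idea can be sketched in a few lines of Python. This is only an illustration of the rule described above, not how any real compiler does it: merge_locals is a made-up name, locals are modelled as (name, type) pairs, and matching old and new locals by that pair is an assumption.

```python
from typing import List, Tuple

Local = Tuple[str, str]  # (name, type); a simplification for this sketch


def merge_locals(old_locals: List[Local], new_locals: List[Local]) -> List[Local]:
    """Build the list of locals for an EnC delta.

    Every old slot keeps its index, even if the edited source no longer
    uses it; genuinely new locals are appended after the old ones.
    """
    merged = list(old_locals)        # old slots stay put, used or not
    existing = set(old_locals)
    for local in new_locals:
        if local not in existing:
            merged.append(local)
            existing.add(local)
    return merged


# The user deleted 'tmp' and added 'count' while debugging.
old = [("i", "int32"), ("tmp", "string")]
new = [("i", "int32"), ("count", "int32")]
delta_locals = merge_locals(old, new)
```

Here delta_locals still contains the 'tmp' slot as a placeholder, whereas a clean compile of the edited source would simply use the new list.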

 

Building an EnC-aware compiler on top of a traditional compiler?

An open question is whether it’s possible to build an EnC-capable compiler, which addresses issues like the ones above, on top of a traditional compiler. We don’t have a current solution, but it’s something I’m casually exploring. The basic idea would be:

-         Use the traditional compiler on both the full original source (say T1.cs) and full final source (say T2.cs). This will produce two files (say T1.exe, T2.exe). This alone may have significant problems:

o       There may be performance disadvantages of recompiling the entire project instead of just the delta.

o       It relies on a completely deterministic compiler, including the token mappings for all functions.

-         Use an “IL-diff” utility on T1.exe and T2.exe to detect the deltas. A simple text diff of the ildasm results may almost be sufficient here. For example, if T2.cs just added a new method, it would show up here. We’ll call the diff ‘delta.il’.

-         Use an “ILasm tool with EnC extensions” utility to compile the delta.il file into compiled IL, metadata, and delta pdbs. This tool would have to deal with the alternative codegen issues described above and check for illegal edits. The current ilasm does have some very primitive EnC support for academic and testing purposes, but it is far from sufficient for this goal.

Even if such utilities existed, such a pipeline could be established, and it was performant, I’m concerned the result would still be very fragile.
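
For a feel of what the “IL-diff” step might look like, here is a small Python sketch that compares two text dumps method by method. It assumes a heavily simplified format in which each method starts with a single line beginning with “.method”; real ildasm output is richer (multi-line headers, nested classes, tokens), so treat split_methods and il_diff as hypothetical helpers, not a working tool.

```python
from typing import Dict, List, Optional


def split_methods(il_text: str) -> Dict[str, List[str]]:
    """Group the lines of a simplified IL dump by their '.method' header."""
    methods: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in il_text.splitlines():
        line = raw.strip()
        if line.startswith(".method"):
            current = line
            methods[current] = []
        elif current is not None and line:
            methods[current].append(line)
    return methods


def il_diff(old_il: str, new_il: str) -> Dict[str, List[str]]:
    """Report which methods were added, removed, or changed."""
    old, new = split_methods(old_il), split_methods(new_il)
    return {
        "added": [m for m in new if m not in old],
        "removed": [m for m in old if m not in new],
        "changed": [m for m in new if m in old and new[m] != old[m]],
    }
```

Even this toy version shows the fragility: it only works if the same method always produces the same header text and body, which is the determinism requirement mentioned above.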