Compiling a language to C#

Several people have written me to say that they're writing their own language (let's say 'X') and they compile their language to C# and then compile C# to IL. This is instead of directly compiling X to IL. This can be attractive because:
1) Using C# constructs may simplify the code-gen for X. For example, it's easier to emit an 'if (..) { ... } else {... }' than the raw IL instructions.
2) It may simplify semantic analysis by letting X piggyback on top of C#. X could generate C# code like "System.Console.WriteLine(o);" without necessarily determining the type of 'o'. I personally think this is cheating, but it can be a nice shortcut.

That's great, but it introduces some issues for debugging, since you want to debug at the 'X' source level and not the C# source level. Assuming language X is sane, you should not have to write your own debugger. If you get the PDB right, you should be able to leverage the full power of an existing debugger (such as Visual Studio) to get a reasonable debugging experience.

Here are some things to pay attention to:

1) Use #line for source-line mapping: The biggest thing is that you need to get the IL-to-source map (sequence points) correct. When you compile X --> C# --> IL, the C# compiler by default will emit sequence points that map the IL back to the C#. You can use C#'s #line directive to provide your own mapping, which lets you map the IL back to 'X' (or any other source file). This is also great for code generators. Note that each #line can specify its own source line and file, and thus a single function can be mapped back to source lines from multiple files.

This alone will solve many problems, including making managed breakpoints, stepping, and set-next-statement work.
These sequence points are stored in the PDB. I have a tool that converts a managed PDB into an XML file.
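
As a rough sketch of what the generated C# could look like, assume X's compiler is translating a function from a source file called program.x. The file names, line numbers, and the GeneratedFromX / AddOne names are all made up for illustration. The '#line hidden' form is a standard C# directive that tells the debugger to step over the lines that follow it, which is handy for glue code that has no counterpart in X.

```csharp
using System;

public static class GeneratedFromX
{
    // Bookkeeping invented by the code generator; it does not exist in X.
    public static int CallCount;

    public static int AddOne(int n)
    {
        // The statement after this directive is reported as line 3 of program.x.
#line 3 "program.x"
        int result = n + 1;
        // Glue code with no counterpart in X: hidden, so stepping skips it.
#line hidden
        CallCount++;
        // A different file: one function can map to lines from several sources.
#line 12 "helpers.x"
        return result;
        // Go back to the real C# file name and line numbers.
#line default
    }
}
```

Note that the explanatory comments sit before each directive rather than after it, since the line right after '#line N' is the one that gets number N. Calling GeneratedFromX.AddOne(41) behaves like ordinary C#; only the sequence points in the PDB change.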

 

2) What about locals? Local variable names are stored in the PDB. C# doesn't provide a way to let you override the names of locals. Thus you'll need to pick your local variable names in C# such that they map well to any local variables in 'X'. This may require clever codegen. Consider the following:

    2a) One problem is that sometimes the name of a local in 'X' may be a reserved keyword in C#. You can get around this by using the '@' prefix. E.g., you can say this in C#:
            int @int = 5;   // declares a local var named "int"
    2b) Adopt a coding convention to decorate all 'internal' locals, to make it clear to the end user that they don't map to locals from 'X'. (Note that double-underscore is reserved.) The C# compiler uses variables with names like 'CS$1$0000' for this purpose.
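
Both points can be sketched together. Assume an X function with locals named "int" and "total"; the GeneratedLocals class and the "x_tmp_" prefix are hypothetical choices for this example, not anything C# or the debugger treats specially.

```csharp
using System;

public static class GeneratedLocals
{
    // Hypothetical translation of an X function whose locals are "int" and "total".
    public static int Sum()
    {
        // "int" is a C# keyword; the '@' prefix makes it usable as an identifier.
        // The '@' is not part of the name, so the local is still called "int".
        int @int = 5;
        int total = 10;

        // Temporary introduced by the code generator, with no counterpart in X.
        // The "x_tmp_" prefix is an assumed convention that marks it as internal.
        int x_tmp_0 = @int * 2;

        total = total + x_tmp_0;
        return total;
    }
}
```

A user looking at the locals window would then see "int" and "total" as they wrote them in X, plus a temporary that is obviously compiler-generated.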

3) What about callstacks? As I've mentioned before, #line will affect the source-to-IL maps, but it won't affect the callstacks. That's because the callstack is based on metadata and not symbols (that's why the StackTrace class can work even without PDBs). Thus to have a reasonable callstack in the debugger for 'X', the X --> C# mapping must be intelligent. You can use '@' on function names too. If you're generating C# code for a function Foo() in X, try to generate it as a single C# function also called Foo(). If you need to generate multiple C# functions, consider calling them Foo_1() and Foo_2().
Technically, since the source mapping is arbitrary (as defined by the PDB), the function name in the callstack and the source mapping don't have to match up. But such a mismatch is bound to confuse end users!
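
Here is a small sketch of that behavior, again with made-up names (CallStackDemo, program.x, and the line numbers are assumptions for the example). The #line directives remap the source lines, but the name that StackTrace reports comes from the method's metadata. NoInlining is there only so the JIT can't fold Foo_1 into Foo and change which frame is on top.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class CallStackDemo
{
    // Hypothetical translation of X's function Foo: the C# method keeps the name.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string Foo()
    {
#line 20 "program.x"
        return Foo_1();
#line default
    }

    // A second C# method generated for the same X function.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string Foo_1()
    {
        // Frame 0 is the currently executing method; its name comes from
        // metadata, so the #line remapping below has no effect on it.
#line 21 "program.x"
        return new StackTrace().GetFrame(0).GetMethod().Name;
#line default
    }
}
```

CallStackDemo.Foo() should hand back "Foo_1", the C# method name, even though the PDB says that code lives in program.x. That's the reason to keep the generated names as close to X's names as you can.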

4) In general, use debugger-friendly code-gen.  Look at the code-gen for anonymous delegates for an example of how debugger-friendly code-gen can make a language construct more debuggable. We didn't provide any new debugging support for anonymous delegates, yet friendly codegen means end users still have a good experience.  Note that most compilers have a /debug switch for generating explicitly debuggable code. 
Sometimes this is as easy as picking good names. Sometimes it may be more complicated. Look at the code-gen for C# yield as a moderately complicated example.
 

These issues are related to issues I raised when I explained how to add debugging support for an arbitrary state machine.
If I can think of more techniques, I'll try to come back here and update this list.