Compiling a language to C#


Several people have written to tell me that they’re writing their own language (let’s call it ‘X’), and that they compile X to C# and then compile the C# to IL, rather than compiling X directly to IL. This can be attractive because:
1) Using C# constructs may simplify the code-gen for X. For example, it’s easier to emit ‘if (..) { … } else { … }’ than the raw IL instructions.
2) It may simplify semantic analysis by letting X piggyback on top of C#. X could generate C# code like “System.Console.WriteLine(o);” without necessarily determining the type of ‘o’, and let C# overload resolution pick the right WriteLine. I personally think this is cheating, but it can be a nice shortcut.
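As a rough illustration of both points, here is a hypothetical fragment of an X code generator written in C#. The names EmitIf and EmitPrint are made up for this sketch. It only builds C# source text, which would then be handed to csc.exe; it is not taken from any real X compiler.

```csharp
using System;
using System.Text;

// Hypothetical fragment of an X -> C# code generator (names are illustrative).
// Emitting structured C# is simpler than emitting IL: csc.exe takes care of the
// branch labels and evaluation-stack bookkeeping that raw IL would require.
string EmitIf(string condition, string thenStatement, string elseStatement)
{
    var sb = new StringBuilder();
    sb.AppendLine($"if ({condition})");
    sb.AppendLine("{");
    sb.AppendLine("    " + thenStatement);
    sb.AppendLine("}");
    sb.AppendLine("else");
    sb.AppendLine("{");
    sb.AppendLine("    " + elseStatement);
    sb.AppendLine("}");
    return sb.ToString();
}

// Lower an X 'print' statement without knowing the argument's static type;
// C# overload resolution later picks WriteLine(int), WriteLine(string), etc.
string EmitPrint(string expression) => $"System.Console.WriteLine({expression});";

string generated = EmitIf("x > 0", EmitPrint("x"), EmitPrint("\"not positive\""));
Console.Write(generated);
```

The X front end never had to decide which WriteLine overload applies to ‘x’; that work is deferred to the C# compiler, which is exactly the shortcut described in point 2.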


That’s great, but it introduces some debugging issues, since you want to debug at the ‘X’ source level, not the C# source level. Assuming language X is sane, you should not have to write your own debugger. If you get the PDB right, you should be able to leverage the full power of an existing debugger (such as Visual Studio) to get a reasonable debugging experience.


Here are some things to pay attention to:


1) Use #line for source-line mapping: The most important thing is to get the IL-to-source map (sequence points) right. When you compile X → C# → IL, the C# compiler by default emits sequence points that map the IL back to the C#. You can use C#’s #line directive to provide your own mapping, which lets you map the IL back to ‘X’ (or any other source file). This is also great for code generators. Note that each #line can specify its own source line and file, so a single function can be mapped back to source lines from multiple files.


This alone will solve many problems: managed breakpoints, stepping, and set-next-statement will all work in terms of the X source.
These sequence points are stored in the PDB. I have a tool that converts a managed PDB into an XML file, which is handy for checking what mapping actually got emitted.
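A minimal sketch of a generator that emits #line directives, assuming a hypothetical X program spread across hello.x and util.x. The emitted text is only a method-body fragment: GetItems and Log are placeholder helpers, and the ‘xgen_’ prefix is just an illustrative naming convention. Wrapped in a class and compiled with csc /debug, its sequence points should refer to the X files rather than to the generated .cs file. ‘#line hidden’ marks generator glue so the debugger steps over it, and ‘#line default’ restores normal numbering.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Sketch: emit C# statements preceded by #line directives so the sequence
// points csc.exe writes to the PDB point at the original X source instead of
// the generated .cs file. File names and statements are hypothetical.
// A null File marks generator glue with no X counterpart; '#line hidden'
// tells the debugger to step over it.
string EmitWithLineMapping(IEnumerable<(string? File, int Line, string Code)> statements)
{
    var sb = new StringBuilder();
    foreach (var (file, line, code) in statements)
    {
        sb.AppendLine(file is null ? "#line hidden" : $"#line {line} \"{file}\"");
        sb.AppendLine(code);
    }
    // Go back to ordinary C# line numbering for whatever follows.
    sb.AppendLine("#line default");
    return sb.ToString();
}

string body = EmitWithLineMapping(new (string?, int, string)[]
{
    ("hello.x", 3, "int total = 0;"),
    (null, 0, "var xgen_items = GetItems();"),   // glue the X user never wrote
    ("hello.x", 4, "total += xgen_items.Length;"),
    ("util.x", 12, "Log(total);"),                // same method, a second X file
});
Console.Write(body);
```

Each mapped statement comes out preceded by a line such as ‘#line 3 "hello.x"’, so one generated method can point into several X files, as described above.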


2) What about locals? Local variable names are stored in the PDB. C# doesn’t provide a way to override the names of locals, so you’ll need to pick your C# local variable names such that they map well to the local variables in ‘X’. This may require clever code-gen. Consider the following:


    2a) One problem is that a local in ‘X’ may be named with a reserved C# keyword. You can get around this with C#’s ‘@’ identifier prefix. E.g., you can say this in C#:
            int @int = 5;   // declares a local var named “int”
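    2b) Adopt a naming convention that decorates all compiler-introduced (‘internal’) locals, to make it clear to the end user that they don’t map to locals from ‘X’. (Note that identifiers containing a double underscore are reserved.) The C# compiler itself uses names like ‘CS$1$0000’ for this purpose; since ‘$’ isn’t legal in a C# identifier, those names can never collide with user code.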
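One way to apply 2a and 2b mechanically is to escape every X identifier on the way out and route every generator-invented local through a single naming function. This sketch assumes a hypothetical ‘xgen_’ prefix that X source can never produce; the keyword table is C#’s list of reserved keywords, and EscapeIdentifier and NewTempName are illustrative names.

```csharp
using System;
using System.Collections.Generic;

// C#'s reserved keywords. An X identifier spelled like one of these is emitted
// with a leading '@'; the '@' is not part of the name, so the PDB (and the
// debugger's Locals window) still shows the X name. Contextual keywords such as
// 'var' or 'yield' are omitted because they are legal identifiers in most places.
var reservedKeywords = new HashSet<string>
{
    "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char",
    "checked", "class", "const", "continue", "decimal", "default", "delegate",
    "do", "double", "else", "enum", "event", "explicit", "extern", "false",
    "finally", "fixed", "float", "for", "foreach", "goto", "if", "implicit",
    "in", "int", "interface", "internal", "is", "lock", "long", "namespace",
    "new", "null", "object", "operator", "out", "override", "params", "private",
    "protected", "public", "readonly", "ref", "return", "sbyte", "sealed",
    "short", "sizeof", "stackalloc", "static", "string", "struct", "switch",
    "this", "throw", "true", "try", "typeof", "uint", "ulong", "unchecked",
    "unsafe", "ushort", "using", "virtual", "void", "volatile", "while",
};

string EscapeIdentifier(string xName) =>
    reservedKeywords.Contains(xName) ? "@" + xName : xName;

// Locals the generator invents get a hypothetical 'xgen_' prefix (chosen to
// avoid the reserved double underscore) so users can tell at a glance that
// they have no counterpart in the X source. Assumes X never uses this prefix.
int tempCounter = 0;
string NewTempName() => $"xgen_tmp{tempCounter++}";

Console.WriteLine("int " + EscapeIdentifier("int") + " = 5;");    // int @int = 5;
Console.WriteLine("int " + EscapeIdentifier("count") + " = 0;");  // int count = 0;
Console.WriteLine("var " + NewTempName() + " = GetItems();");     // var xgen_tmp0 = GetItems();
```

Centralizing both decisions in one place means the convention can’t drift between different parts of the generator.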

3) What about callstacks? As noted above, #line affects the source-to-IL maps, but it won’t affect the callstacks. That’s because the callstack is based on metadata, not symbols (that’s why the StackTrace class works even without PDBs). Thus, to have a reasonable callstack in the debugger for ‘X’, the X → C# mapping must be chosen carefully. You can use ‘@’ on function names too. If you’re generating C# code for a function Foo() in X, try to generate it as a single C# function also called Foo(). If you need to generate multiple C# functions, consider calling them Foo_1() and Foo_2().
Technically, since the source mapping is arbitrary (it’s whatever the PDB says), the function name in the callstack and the source location don’t have to match up. But such a mismatch is bound to confuse end users!
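A small sketch of that naming policy, assuming a hypothetical generator that sometimes has to split one X function into several C# methods (DoWork is a placeholder). Only the names matter here, since they are what ends up in metadata and therefore in the callstack.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Sketch: pick C# method names that keep the callstack readable in X terms.
// The callstack comes from metadata (method names), not from the PDB, so
// #line cannot fix it; the generator has to choose the names itself.
// One X function -> one C# method with the same name; if the generator must
// split it, the pieces become Foo_1, Foo_2, ... (a convention, not a rule).
List<string> MethodNamesFor(string xFunction, int partCount)
{
    var names = new List<string>();
    if (partCount == 1)
    {
        names.Add(xFunction);
    }
    else
    {
        for (int i = 1; i <= partCount; i++)
            names.Add($"{xFunction}_{i}");
    }
    return names;
}

string EmitMethods(string xFunction, IReadOnlyList<string> bodies)
{
    var names = MethodNamesFor(xFunction, bodies.Count);
    var sb = new StringBuilder();
    for (int i = 0; i < bodies.Count; i++)
    {
        sb.AppendLine($"static void {names[i]}()");
        sb.AppendLine("{");
        sb.AppendLine("    " + bodies[i]);
        sb.AppendLine("}");
    }
    return sb.ToString();
}

// X function Foo split into two pieces; the first chains to the second, so a
// breakpoint in the second shows Foo_2 called from Foo_1 on the callstack.
string methods = EmitMethods("Foo", new[] { "Foo_2();", "DoWork();" });
Console.Write(methods);
```

Both frames still read as "Foo" to the user, even though the generator needed two C# methods.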


4) In general, use debugger-friendly code-gen. Look at the code-gen for anonymous delegates for an example of how debugger-friendly code-gen can make a language construct more debuggable. We didn’t add any new debugging support for anonymous delegates, yet friendly code-gen means end users still have a good experience. Note that most compilers have a /debug switch for generating explicitly debuggable code.
Sometimes this is as easy as picking good names. Sometimes it’s more complicated; the code-gen for C# yield (iterators) is a moderately complex example.


These issues are related to issues I raised when I explained how to add debugging support for an arbitrary state machine.
If I can think of more techniques, I’ll try to come back here and update this list.

Comments (12)

  1. badguy219@NOSPAM_yahoo.com says:

    Hello,

    If it is within your knowledge, I’d like to know how I may write and compile my own language with C#. I know you can do such with Assembly, but I’m in no way good with Assembly language.

    Thanks,

    badguy219@NOSPAM_yahoo.com

  2. jmstall says:

    bg219 – Can you clarify what you’re asking?

    E.g., are you asking:

    1) how do you write a compiler in C#?

    2) how do you write a compiler which translates a language into C#

  3. badguy219@NOSPAM_yahoo.com says:

    I mean I would like to know how to write my own language in C#. Even something really simple would be fine. Any ideas on making such a language in C#?

  4. jmstall says:

    You’re in luck. C# and the CLR are great for writing languages. Some links:

    1) I wrote a C# compiler in C#. Full source is available. See here for details: http://blogs.msdn.com/jmstall/archive/2005/02/06/368192.aspx

    2) IronPython is a Python compiler / interpreter written in C#. Full source is also available. See http://www.gotdotnet.com/workspaces/workspace.aspx?id=ad7acff7-ab1e-4bcb-99c0-57ac5a3a9742.

    3) Reflection.Emit is a set of class libraries that you can access from C# to generate IL. See http://blogs.msdn.com/jmstall/archive/2005/02/03/366429.aspx

  5. badguy219@NOSPAM_yahoo.com says:

    Yes, thank you very much Jmstall. However, this compiler…. Will it compile a language made in C# as well? Also, do you have a tutorial or article about making a language in C#? Some help would be appreciated.

    Thanks!

  6. jmstall says:

    bg219 – I’m confused again as to what you’re asking.

    Compilers have 3 qualities:

    1) the target input language. What language do they actually compile? E.g., CSC.exe (Microsoft’s C# compiler) takes in C#. cl.exe (Microsoft’s C++ compiler) takes in C++.

    2) the target output. What does it compile it to? Most compilers will target some standalone executable. CSC.exe produces a .NET exe (an exe containing IL that runs on the CLR). CL.exe produces a Win32 exe (runs without the CLR).

    This blog post points out that a compiler could actually produce C# (which may be easier to produce than targeting an .exe directly), and then use CSC.exe to convert that to an .exe.

    3) What language is the compiler itself implemented in? This is independent of the answers to #1 and #2! CSC.exe happens to compile (C# → .NET exe) and is implemented in C++. Blue (my compiler above) compiles (C# → .NET exe) but is implemented in C#.

    You could write a compiler in ML that compiles (C# → .NET exe).

    Given this background, I’m not sure how to interpret your question.

  7. badguy219@NOSPAM_yahoo.com says:

    Oh, I see. Well, I’ll speak more clearly this time. If I make a language in C#, will a compiler be able to compile this language? Also, I downloaded Blue, but I can’t seem to get it running. The command line reports that the code of source_core is incorrect, and that source files are missing. Any idea?

  8. jmstall says:

    bg219 – "If I make a language in C#, will a compiler be able to compile this language?"

    Of course. Why wouldn’t it?

    Specifically, let’s say you design your own language X.

    In order to actually execute programs written in X, you need some program (a compiler) that will translate X into something runnable (like an .exe).

    You can write that program in any language you wish, including C#. CSC.exe can then compile your compiler.

    Unfortunately, Blue only works on the v1.1 CLR (it wasn’t updated for v2.0). What version are you using?

  9. badguy219@yahoo.com says:

    I am using Visual C# Command Line Compiler version 7.10.6001.4. So I suppose that is why Blue has decided not to run. Okay, well thanks a lot for the information.

  10. Roman Osykin says:

    Sorry for the delay, I have to thank you for this article, it really helps.

  11. jmstall says:

    Roman – great! I’ll update it as people raise new issues.

  12. We need some customer feedback to determine if we fix a regression that was added in VS2008. Any language