Source for a C# compiler written in pure C#.

For anybody looking for the full source to a bootstrapping C# compiler, today’s your lucky day.

 

A while ago (back in 2001, before we shipped v1.0), I wrote a C# compiler called “Blue”. I know it’s 3.5 years after I wrote it, but I figured releasing it now was better late than never. Some fast facts:

- It’s written 100% in C#.

- It uses Reflection to import all references and Reflection.Emit to emit the IL.

- Everything (particularly the parser and lexer) was written by hand.

- It has the standard compiler pipeline as described in the dragon book.

- It produces verifiable IL (you can run PEVerify on the output and it passes).

 

Here’s the source: https://mikewinisp.members.winisp.net/blog/blue/blue.zip

There’s a ReadMe.html in there with more information. Unzip it to a directory, and then in a v1.0 or v1.1 command shell, run build_all.bat.

 

 

It’s a giant reflection-emit test.

Blue demonstrates a whole bunch of different parts of Reflection.Emit such as:

- Baking types.

- Creating nested types.

- Creating delegates - you’ll notice there’s no DelegateBuilder.

- Creating Enums – I recall having to dance around some issues with EnumBuilder.

- Emitting debugging information for all the code constructs (sequence points, local names, parameter names, etc.). I have given a brief example elsewhere of how Ref.Emit can generate debuggable code. Blue is an extreme example because it includes most C# language features and generates fully debuggable and verifiable code.
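To make the delegate bullet concrete, here is a minimal sketch of how a delegate type can be emitted with a plain TypeBuilder, since there is no DelegateBuilder. This is not taken from Blue's source; it follows the general CLR convention that a delegate is a sealed class deriving from System.MulticastDelegate whose constructor and Invoke method have no IL body and are marked as runtime-implemented. The names DelegateEmitSketch, BinaryOp, and Add are hypothetical. It uses the static AssemblyBuilder.DefineDynamicAssembly so that it runs on current runtimes; on v1.1 the equivalent entry point would be AppDomain.DefineDynamicAssembly.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class DelegateEmitSketch
{
    // Emits the equivalent of: public delegate int BinaryOp(int a, int b);
    // (BinaryOp is a hypothetical name used only for this illustration.)
    public static Type EmitBinaryOpDelegate()
    {
        AssemblyBuilder asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DelegateSketch"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = asm.DefineDynamicModule("DelegateSketch");

        // A delegate is just a sealed class deriving from MulticastDelegate.
        TypeBuilder tb = module.DefineType(
            "BinaryOp",
            TypeAttributes.Public | TypeAttributes.Sealed | TypeAttributes.Class,
            typeof(MulticastDelegate));

        // Constructor takes (object target, IntPtr method). It has no IL body;
        // the runtime supplies the implementation.
        ConstructorBuilder ctor = tb.DefineConstructor(
            MethodAttributes.Public | MethodAttributes.HideBySig |
            MethodAttributes.SpecialName | MethodAttributes.RTSpecialName,
            CallingConventions.Standard,
            new Type[] { typeof(object), typeof(IntPtr) });
        ctor.SetImplementationFlags(
            MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

        // Invoke carries the delegate's signature and is also runtime-implemented.
        MethodBuilder invoke = tb.DefineMethod(
            "Invoke",
            MethodAttributes.Public | MethodAttributes.HideBySig |
            MethodAttributes.NewSlot | MethodAttributes.Virtual,
            typeof(int),
            new Type[] { typeof(int), typeof(int) });
        invoke.SetImplementationFlags(
            MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

        // "Baking" the type: after CreateType the type is usable like any other.
        return tb.CreateType();
    }

    public static int Add(int a, int b) { return a + b; }

    public static void Main()
    {
        Type delegateType = EmitBinaryOpDelegate();
        MethodInfo add = typeof(DelegateEmitSketch).GetMethod("Add");
        Delegate d = Delegate.CreateDelegate(delegateType, add);
        Console.WriteLine(d.DynamicInvoke(2, 3));
    }
}
```

A real compiler would also emit BeginInvoke and EndInvoke on the delegate type; they are left out here to keep the sketch short.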

 

What’s missing?

I view Blue primarily as a sample of Reflection.Emit, and not actually as a viable production compiler. Although it’s complete enough to compile itself, it’s missing a bunch of features like:

- Custom attributes

- unsafe code

- checked/unchecked, lock keyword, using keyword

- floating point, decimal types.

- Multi dimensional arrays. (note that it can handle jagged arrays)

- Some operator overloading. (Allows binary operators like + to be overloaded, doesn't yet handle overloaded type casts, unary ops, or ++/--.)

- I didn’t quite get function overload resolution entirely correct. I didn’t follow the spec closely enough.

- Error handling: It still may assert when compiling an illegal program. (I would use csc.exe to verify inputs before sending them through blue)

I notice it also doesn’t work on Whidbey because it gets confused when importing generics from mscorlib. There also may be some issues with v2.0 Ref.Emit that we’re still looking at. So you need to do this demo with v1.1. I may explore trying to update the sample to at least run on v2.0.

 

But at least it dogfoods itself…

Despite the limitations, it’s still complete enough to compile itself (i.e., “dogfood”, “bootstrap”). One confession: when I was trying to have it dogfood itself, there were a few places where features that I had naturally used when writing blue weren’t yet supported in blue. In order to get it to dogfood, I made local changes to avoid these features. I marked these with “@dogfood” in the code.

 

The blue download includes a batch file, build_all.bat, which will first build blue.exe from csc, and then use blue to compile itself to produce dogfood.exe:

@echo ***** Building the Blue compiler

@rem Build the compiler first using C#. This will produce blue.exe

csc /out:blue.exe @blue_source_core

@echo ***** C# compiled the blue sources to produce Blue.exe

@echo *** Show that the newly compiled blue.exe (which came from csc.exe) is Verifiable IL

peverify blue.exe

@rem Now have the compiler build itself and produce a Dogfood.exe

blue.exe /out:Dogfood.exe @blue_source_core

@echo ***** Blue.exe compiled itself to produce Dogfood.exe

@echo *** Show that blue.exe produced verifiable code

peverify dogfood.exe

 

You’ll notice the peverify stages pass.

 

You can verify that blue.exe actually compiles itself correctly by cycling the compilation one more step:

dogfood.exe /out:Dogfood2.exe @blue_source_core

 

Now Dogfood.exe and Dogfood2.exe were both produced by the same algorithm (Blue), which was just compiled two different ways: once by csc and once by Blue itself. So we expect the two outputs to be the same. We can ildasm both dogfood.exe and dogfood2.exe and verify that (the only difference is the assembly name and MVID).
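If you would rather not eyeball the two ildasm dumps, the comparison can be roughly automated. The sketch below is not part of the Blue download. It assumes you have already dumped both binaries to text (for example with ildasm /out=dogfood.il dogfood.exe), that the MVID appears only in ildasm comment lines, and that the assembly names are Dogfood and Dogfood2. The IlDiffSketch class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class IlDiffSketch
{
    // Drops blank lines and whole-line comments (where ildasm reports the MVID),
    // and masks the assembly name so that it does not count as a difference.
    public static List<string> Normalize(IEnumerable<string> lines, string assemblyName)
    {
        var result = new List<string>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("//", StringComparison.Ordinal))
                continue;
            result.Add(line.Replace(assemblyName, "<asm>"));
        }
        return result;
    }

    // True when the two dumps match line for line after normalization.
    public static bool SameIl(IEnumerable<string> a, string nameA,
                              IEnumerable<string> b, string nameB)
    {
        return Normalize(a, nameA).SequenceEqual(Normalize(b, nameB));
    }

    // Usage: IlDiffSketch.exe dogfood.il Dogfood dogfood2.il Dogfood2
    public static int Main(string[] args)
    {
        if (args.Length != 4)
        {
            Console.Error.WriteLine("usage: <a.il> <nameA> <b.il> <nameB>");
            return 2;
        }
        bool same = SameIl(File.ReadAllLines(args[0]), args[1],
                           File.ReadAllLines(args[2]), args[3]);
        Console.WriteLine(same ? "IL matches" : "IL differs");
        return same ? 0 : 1;
    }
}
```

The name masking is a crude textual replace, so an identifier in the source that happens to contain the assembly name would show up as a false difference. For a one-off check that is probably acceptable.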

 

Now what?

This is just sample code I’m throwing out there. It was a pet project of mine a few years ago, so it’s not a supported product. I’m not very proud of the code quality (it was one of the first things I wrote in C#), and it’s definitely not a production-quality tool. All that aside, feel free to play around with it or try out various ideas. For example, you could:

- Play around with codegen. Perhaps go transform all the calls to be late bound and then compare perf.

- Add “inline IL”, similar to inline assembly in C/C++.

- Add any of the missing v1.1 features.

- Cannibalize it to get the ref-emit code.

- Add any of the new C# v2.0 features.

If anybody does end up doing something interesting with it, let me know!