Joe Duffy and I were really impressed with the number of people who showed up for our PDC session “Write a Dynamic Language compiler in an hour” last month. It confirmed my belief that customers care about the details of compiler technologies and the managed libraries that enable them. We promised a source download from commnet and our blogs, so here it is:
Thanks Dominic, for your help on the original presentation, much appreciated!
What follows are resources I found useful for getting bootstrapped into the world of compiler construction. If you have any other resources you'd like to see appear on this list, drop me an email.
Tools, Languages, Source and more:
GPPG (The Gardens Point Parser Generator): A Yacc/Bison-like parser generator that emits C#. It was released recently and looks pretty solid. If you follow the link, the QUT folk talk about this being built for a "Ruby .NET" project in the context of parsing the full Ruby grammar. There doesn't seem to be anything mentioned officially about the Ruby .NET project, but given their track record on delivery, I'm happy to get my hopes up. Can't wait to get my hands on it!
PEAPI and PERWAPI: Managed APIs for reading and writing managed executables. They offer a lower-level, Reflection.Emit-like interface that gives you more control over the metadata bits and bytes. I believe the Mono C# compiler uses PERWAPI as its backend. Fast too.
IronPython: A Python compiler for .NET, incubated in my team, the CLR, and fronted by Jim Hugunin. A full dynamic language compiler with source, released under a liberal Shared Source license. We're getting close to a 1.0 release on this one. I think this is a great starting point for anyone looking to write or port a dynamic language - Python has some interesting problems and solutions around interop (language, BCL, etc.), performance (late-bound binding and invocation), code loading, IDE integration and more. It has a very active community.
F#: A functional language with an ML-like flavor from Don Syme at MSR Cambridge. Very stable, with fantastic VS.NET integration (both 2003 and 2.0 Beta 2), and the F# to BCL interop stuff is awesome. Lately, Don has been expanding the breadth of F# language features and investing in developer/compiler interactivity (IDE, REPL, etc.) to make it play great in the .NET ecosystem. I'd love to see an F# book though - mapping OCaml to F# can be tedious if you've never approached either before.
Rotor (SSCLI): A subset of the source to the Common Language Runtime and the C# and JScript.NET compilers, shipped with an academic/hobbyist-friendly license. Runs on multiple platforms and architectures (Windows, FreeBSD, MacOS - x86, PPC). It's great having the source to the CLR, C# and JScript.NET handy. I recommend buying the Rotor book (see below) if you want to ramp up quickly on the source.
ECMA Specification: The submission of the CLI (Common Language Infrastructure) to the ECMA standards organization. More commonly known as "the docs to the product". Has all the execution semantics, rules and metadata bits of the CLR - an excellent resource for compiler writers.
List of .NET Languages: A pretty complete list of languages that run on top of the CLR/.NET platform. Has a lot of links to the respective project websites. Some of the languages come with source too.
DOTNET-LANGUAGE-DEVS: This is the place where commercial, academic and hobbyist compiler writers hang out. The list is light on traffic, but you can usually expect a rapid response to questions.
DOTNET-CLR: A mailing list for CLR geeks. Some of the language people hang out on this list. Great place to ask questions about compiler<-->VM problems you might face when targeting the CLR.
Lambda the Ultimate: A great programming languages weblog.
Reflection.Emit MSDN documentation: Documentation for the Reflection.Emit namespace. Useful if you intend to use Reflection.Emit as the backend for your compiler.
Hello World, Reflection.Emit style: Hello, World!
DynamicMethod (Lightweight Code Generation): MSDN documentation for LCG. LCG is a Whidbey (2.0) technology for lightweight, GC-reclaimable code generation. It uses the familiar Reflection.Emit APIs. Recommended in most scenarios where code generation is required but assembly and type generation are not needed.
Hello World, Lightweight Code Generation (LCG): Hello, World, this time using LCG as the code generation technology.
Reflection Performance article: An article I wrote a few months back on how to improve the performance of common Reflection scenarios. Has some stuff on LCG and binding.
Under the hood of Dynamic Languages: Part 1 of a blog post I did about how dynamic languages map to the runtime. A lot of what I talk about here can be found in source form in the IronPython compiler. Part 2 is coming soon. 😉
Whidbey Delegates: Describes the relaxed signature matching changes we've made for Whidbey in the delegate space. What isn't mentioned is the open instance/closed static delegates that we've added - the Whidbey delegate docs describe these changes, but I don't believe the latest docs are on the web yet. These small but powerful additions open up a whole new range of calling convention opportunities, especially in the dynamic languages space.
Late-bound invocation: Some notes I wrote up on late-bound invocation using Reflection (if you're going dynamic, then I hope this helps you out). Part 2 here.
Compiling for the .NET Common Language Runtime, John Gough, 0130622966
The bible for compiler writers targeting the CLR.
Inside Microsoft .NET IL Assembler, Serge Lidin, 0735615470
All you need to know about IL and the ILASM/ILDASM tools that ship with the SDK. If your compiler is going to target textual IL and call on the ILASM compiler to cook you up an exe, this book will be invaluable.
Compilers, Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, 0201100886
The bible. Also featured in the movie Hackers, as a l33t resource for those hacker people.
Modern Compiler Implementation in ML, Andrew W. Appel, 0521582741
I haven't finished reading this book yet, but so far so good. It'd be better if it was "Modern Compiler Implementation in F#", but I'm biased... 🙂 (also comes in C and Java flavors)
Programming Language Pragmatics, Michael L. Scott, 1558604421
I love the chapter on concurrent programming languages towards the end of the book. Lots of really great archaic info on languages like Ada, Fortran and Prolog. Great stuff.
Advanced Compiler Design and Implementation, Steven Muchnick, 1558603204
.NET Platform books (useful to understand what you're targeting):
Essential .NET, Volume 1: The Common Language Runtime, Don Box, 0201734117
Suitably high level to get a good understanding of the CLR. I recommend this one to all CLR newbies.
Shared Source CLI Essentials, David Stutz, Ted Neward, Geoff Shilling, 059600351X
Total ego booster when coupled with the Rotor source. Impress your friends and co-workers with your deep understanding of the CLR. Graduates of this book are prone to be hired by the CLR team. Recommended to compiler writers who want a sharp understanding of what they are targeting.
The Common Language Infrastructure Annotated Standard, Jim Miller, 0321154932
If you find the ECMA spec lacks a little oomph, try this book from CLR Architect (and part-time opera singer) Jim Miller. It annotates the ECMA spec with explanations of design decisions and other runtime facts. Also known to produce CLR new hires.
Applied Microsoft .NET Framework Programming, Jeffrey Richter, 0735614229
Jeff is the man. A broader platform book that goes a little deeper than Essential .NET. Useful to newbie compiler writers for two reasons: you learn the platform APIs you'll use to write your compiler, and it gives good insight into how the backend works.
Professional .NET Framework 2.0, Joe Duffy, 0764571354
Where's Joe's face?