A Java to x86 compiler written in F#

A little personal information about me, I’m currently working on a masters degree at the University of Washington. And a few weeks ago I completed taking a course in  Compiler Construction, where the course project was to build a simple Java compiler that generated x86 executables.


While most students chose to write their compilers in Java or C#, my partner and I wrote ours in F#. Though I won’t be publishing the source code - since it would be too tempting for other students using Appel’s Modern Compiler Implementation in Java – I would like to point out where F# makes the task of writing a compiler much easier than say, C#. F#’s superiority in this domain can largely be attributed to discriminated unions, as you will soon find out.


Abstract Syntax Tree, Parsers, and Lexer

The first phase of the compiler was perhaps the easiest to implement in F#. Rather than representing some complex object hierarchy and use polymorphism, the AST can be represented as a series of discriminated unions.


Ast.fs snippet

type Statement =

    | CompoundStatement of Statement list

    | If of Expression * Statement * Statement          // If ([expr]) [stmt1] else [stmt2]

    | While of Expression * Statement

    | PrintLine of Expression

    | DeclarationOrAssignment of VariableOrType * DeclarationOrAssignmentTail


type Ast =

    | MiniJavaProgram of Ast * Ast list                 // MainClass, ClassDeclaration list

    | MainClass of string * Ast                         // Name, main method

    | MainMethod of string * Type * Statement           // Name of parameter, return type, Statement

    | Class of string * Ast list                        // Name, VarDeclaration or Method

    | DerivedClass of string * string * Ast list        // Name, BaseClass, VarDeclaration or Method

    | VariableDeclaration of string * Type              // Type, name

    | Method of string * Type * Ast list * Ast          // Name, (return) Type, (parameter) Types, MethodBody

    | MethodBody of Statement list * Expression         // Statements, (return) Expression


Parser.fsy snippet

// These are the rules of the grammar along with the F# code of the

// actions executed as rules are reduced.  In this case the actions

// produce data using F# data construction terms.

start: MiniJavaProgram { $1 }



      | MainClass                   { MiniJavaProgram($1, []) }

      | MainClass ClassDeclarations { MiniJavaProgram($1, List.rev $2)}



      | CLASS ID LCURLY MainMethod RCURLY       { MainClass($2, $4) }




                                                { MainMethod($9, MainMethodType, $12) }  


      | ClassDeclaration                        { [$1] }

      | ClassDeclarations ClassDeclaration      { $2 :: $1 }



      | CLASS ID LCURLY RCURLY                                     { Class($2, []) }

      | CLASS ID EXTENDS ID LCURLY RCURLY                          { DerivedClass($2, $4, []) }

      | CLASS ID LCURLY ClassMakeupDeclarations RCURLY             { Class ($2, (List.rev $4)) }

      | CLASS ID EXTENDS ID LCURLY ClassMakeupDeclarations RCURLY  { DerivedClass ($2, $4, ($6))}


Type Checking and Symbol Table Generation

Once we were able to successfully lex and parse source code, we moved onto type checking and generating a symbol table. Like most compilers, we simply used the Visitor design pattern. Which if implemented in C# requires some funky notion of double dispatch. That is, the object being visited has a method called ‘visit’ or ‘accept’ which is given a parameter of type Visitor, then, the object being visited calls ‘visit’ on the visitor passing ‘this’ as parameter.


The problem with this is that it adds a lot of lame boiler plate code. For example, from the Wikipedia article each ‘part’ needs to implement the ‘accept’ method, which in turn calls the ‘visit’ method on the visitor. Via method overloading, the right visitor method will be called.

class Engine {

    public void accept(Visitor visitor) {





class Body {

    public void accept(Visitor visitor) {





In F# with discriminated unions we can write a single ‘walker’ for the discriminated union types, and keep all the visitation code in one place. (Rather than decorating each node.) This keeps the code much simpler.


Visitor.fs snippet

type Walker =


    // Walks the ast tree with the given visitor

    static member WalkTree (astRoot:Ast, visitor:Visitor) =

        let walker = new Walker(visitor)



    // ... snip ...


    // Methods which walk the AST node, calling our visitor methods as nessecary.

    member private this.WalkAst (astNode : Ast) =

        // Pre visit

        visitor.PreVisitAst astNode


        let walkAstList (astList : Ast list) = 

            astList |> List.iter (fun nast -> this.WalkAst nast)


        let walkStmtList (stmtList : Statement list) =

            stmtList |> List.iter (fun nstmt -> this.WalkStatement nstmt)


        // Recurse

        let _ = match astNode with

                | MiniJavaProgram (ast, astList)    -> this.WalkAst ast; walkAstList astList

                | MainClass (_, ast)                -> this.WalkAst ast

                | MainMethod (_, _, stmt)           -> this.WalkStatement stmt


                | Class (_, astList)                ->  walkAstList astList

                | DerivedClass (_, _, astList)      ->  walkAstList astList


                | VariableDeclaration (_, varType)  -> this.WalkType varType

                | Method (_, retType, astList, ast) -> this.WalkType retType; walkAstList astList; this.WalkAst ast

                | MethodBody (stmtList, expr)       -> this.WalkExpression expr; walkStmtList stmtList


        // Post visit

        visitor.PostVisitAst astNode


Executable Creation

Generating the assembly code is straightforward by walking your IR nodes and building up a list of ‘Instructions’, represented by the following DU:


type Instruction =

    | MovRR of Register * Register

    | MovRI of Register * int

    | MovRM of Register * Memory

    | MovMA of Memory * string

    | MovMR of Memory * Register

    | Comment of string
    | ...
    override this.ToString() =

        match this with

        | MovRR(dstReg, srcReg) -> "  mov " + dstReg.ToString() + ", " + srcReg.ToString()

        | MovRI(dstReg, i)      -> "  mov " + dstReg.ToString() + ", " + i.ToString()

        | MovRM(dstReg, m)      -> "  mov " + dstReg.ToString() + ", " + m.ToString()

        | MovMA(m, address)     -> "  mov " + m.ToString() + ", " + "OFFSET " + address

        | MovMR(m, srcReg)      -> "  mov " + m.ToString() + ", " + srcReg.ToString()

        | Comment note          -> "; " + note


Once the list of instructions has been generated, we then needed to build executables. To link them you could use ml.exe and cl.exe, Microsoft’s C/C++ compiler and linker. But since requiring a second step is lame, we embedded the files into .RESX files and exported them at runtime. While the F# compiler supports this easily, we actually did this part in C# so that we could take advantage of Visual Studio’s Managed Resource Designer.


            /// <summary>

            /// Takes the given assembly file and compiles it using the hard-coded boot loader code.

            /// </summary>

            public static string LinkAsmFile(string asmFile)




                        // First, dump ML and boot.obj to disk.

                        string tempPath = Path.GetTempPath();

                        string mlExeFile = Path.Combine(tempPath, "ml.exe"); ;


                        // Extract the binary resources embedded in this DLL and write to file.

                        WriteFile(MasmResources.MASMResources.ml, mlExeFile);


                        // Shell execute ml.exe on the target assembly file.

                        // This produces a .obj file.

                        ProcessStartInfo procStartInfo = new ProcessStartInfo(mlExeFile, "/c /Cx /coff /Zi \"" + asmFile + "\"");

                        procStartInfo.RedirectStandardOutput = true;

                        procStartInfo.UseShellExecute = false;


                        Process linkerProc = Process.Start(procStartInfo);


                        if (linkerProc.ExitCode != 0)

                              return linkerProc.StandardOutput.ReadToEnd();


                              return "";


                  catch (Exception ex)


                        return ex.Message;




Sounds good, but notice that to run our F# compiler you need F#MiniJava.exe (the program), MASMResources.dll (C/C++ linker), and FSharp.Core (F# libraries). That all sounds lame. Fortunately F# has two awesome command line flags that will essentially embed any required assembly references into your executable:



      Statically link the F# library and all transitively referenced DLLs

      that depend on the F# library into the assembly being generated.

      Disables the generation of the F# interface and optimization

      resources and renames all types beginning with Microsoft.FSharp,

      making them private to the target assembly. There must be no

      duplicate type names across any statically linked

      assemblies, no code may use reflection on any of the

      types in those assemblies, and no code may export F# library types

      as part of an API.

--static-link <string>:

      Statically the given assembly and all transitively referenced DLLs

      that depend on this assembly.  NOTE: use an assembly name e.g. mylib

      , not a DLL name, e.g. mylib.dll.


(Note: many bad things can happen if you do this, from security vulnerabilities to subtle codegen errors. These flags may be removed in a future release of F# and use at your own risk.)


Having taken an undergraduate course in compilers I found the course material to be challenging, but when it came to the project implementation F# made it a breeze.


Comments (9)
  1. namin says:

    Hi Chris,

    Regarding the –standalone flag, would you recommend using it on production code (so the client can be oblivious to F#) or just asking the client to download F# instead?

  2. ChrSmith says:

    The –standalone flag is percisely for just that, allowing you to simply deploy an executable and have people be oblivious to the fact it was written in F#.

    You may recall Visual J# required a seperate runtime component, which was a pain in the neck for a lot of people. Ideally, the F# core libraries (FSharp.Core.dll, FSharp.Compiler.dll) would be included in the next release of the .Net framework so everything ‘just works’.

    If you do apply the –standalone flag however, be aware that you are giving up the ability to service your application. For example, if there is a security hole in System.dll, you can simply GAC the new one and all your applicaions are now patched. However, by baking everything into your app you ‘carry around’ those security holes with you.

    But for simple apps you are sharing with your colleagues it does simplify things.

  3. Drigg says:

    Your story was featured in Drigg! Here is the link to vote it up and promote it: http://defun.ru/vsjakoedrugoe/Kompiljator_javy_na_F

  4. Now that we have covered the basics, in minutes 8 – 14 we will cover the foundational concepts and types

  5. Anders Cui says:


  6. Laszlo says:

    Hi Chris,

    Excellent post. I’ve been playing with F# for a month now. Currently I’m implementing a compiler too in order to get the feel of functional programming. I was wondering how did you managed annotate the AST with semantic information, since DUs do not support seamless extension,in other words adding a new union element breaks existing code. Although it may be possible to compute these kind of information (e.g. symbol table for the current scope) on demand they don’t persist.

    Maybe the decoration of the AST nodes is not a good idea at all, I’m just stuck with the common OO ideas.

    I’d highly appreciate if you could share your ideas on this matter.


  7. Chris Smith says:

    The AST did not contain any additional information, for example line/column spans for the code where the element was declared.

    In the implementation I went with I just had a seperate SymbolTable to lookup additional information. While I agree associating information with AST nodes directly is helpful, it erodes the simplicity that discriminated unions offer.



  8. 第一篇,从零开始编写我们的第一个F#程序。 什么是F#,我为何要学它? F#是一种.NET平台上的 函数式编程 语言。就像C#和VB.NET,F#可以利用.NET的核心类库,如 WPF , WCF ,

Comments are closed.

Skip to main content