A tryst with MSIL

11/30/2006

Any ASP.NET developer should know that when a .NET application is compiled, the high-level code written in C# or Visual Basic .NET is compiled into an intermediate language, MSIL. It is this MSIL that the Common Language Runtime (CLR) expects when the application is run. The CLR then converts the MSIL into platform-specific native code (Just-In-Time compilation) while the application runs.

MSIL (Microsoft Intermediate Language), as the name suggests, is a language in itself, so why not code your application directly in MSIL instead of a high-level language like C# or VB.NET? Of course, it is not an easy task; think of it as an abstract, high-level assembly language.

This article will give you an insight into coding using MSIL. I seriously don't think anyone will abandon the pleasures of C# or VB.NET and start coding in MSIL. But at least the next time you ILDASM a .NET exe or dll you will know what you are looking at.

Coding in MSIL revolves around the stack (this will sound familiar if you know assembly language). Any parameters that you need to pass to a function, or any operands for an instruction, first have to be pushed onto the stack. When the instruction or function executes, it pops the data it needs from the stack and pushes its result back onto the stack.
In short:
1. Push the operands / parameters on to the stack
2. Execute the command / function
3. Pop the result from the stack
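
As a minimal sketch of these three steps, the program below adds two constants and prints the sum. The assembly name StackDemo, the method name Main and the constants are made up for illustration; the directives around the instructions (.assembly, .method, .entrypoint, .maxstack) are the same ones used in the Hello World program that follows.

```il
// Illustrative sketch: StackDemo and Main are hypothetical names.
.assembly extern mscorlib {}
.assembly StackDemo {}

.method static void Main()
{
    .entrypoint
    .maxstack 2     // at most two values are on the stack at once

    ldc.i4.2        // step 1: push the first operand (int32 constant 2)
    ldc.i4.3        // step 1: push the second operand (int32 constant 3)
    add             // step 2: pop both operands, push their sum
    // step 3: WriteLine pops the sum as its argument and prints it
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
}
```

Assembled and run, this should print 5.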

So let's start with the most famous program in computer history... Hello World!

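// Reference the library that contains System.Console
.assembly extern mscorlib {}
.assembly HelloWorld {}
.method static void HelloWorld()
{
    .entrypoint
    .maxstack 1

    ldstr "Hello World!"    // push the string onto the stack
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}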

The first thing a C, C++ or even a C# coder would ask is: where is main()? Well, in MSIL the program entry point is not restricted to a function named main(). You can make any static method the entry point by adding the .entrypoint directive to it; that method becomes your main(). Only one method in the application can have this directive.

This piece of code prints Hello World! on the console by calling the WriteLine method. As discussed earlier, any parameter that has to be passed to a method must be pushed onto the stack before the call. Here we need to push one parameter, and that is the most stack space we are going to use. The .maxstack 1 directive declares the maximum number of stack slots the method will use; in our case it is just one.

The next instruction, ldstr, pushes the string that we need to print onto the stack. The call instruction then invokes the WriteLine() method, which prints the string. There is a significant difference between IL and other programming languages when it comes to calling a function. In IL, we have to specify the function completely, including its return type, the assembly it lives in, its namespace and class, and the data types of its parameters:
call <return type> [<assembly>]<namespace>.<class name>::<function name>(<parameter types>)

The last statement, ret, as you must have guessed, returns execution to the caller.
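
To see the fully qualified call syntax with more than one parameter and a return value, here is a small sketch that joins two strings with System.String::Concat and prints the result. The assembly name ConcatDemo, the method name Main and the string literals are invented for this example.

```il
// Illustrative sketch: ConcatDemo and Main are hypothetical names.
.assembly extern mscorlib {}
.assembly ConcatDemo {}

.method static void Main()
{
    .entrypoint
    .maxstack 2     // both strings sit on the stack before the call

    ldstr "Hello, "     // first parameter
    ldstr "MSIL!"       // second parameter
    // Pops both strings and pushes the concatenated string
    call string [mscorlib]System.String::Concat(string, string)
    // Pops the concatenated string and prints it
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```

Parameters are pushed in the order they appear in the signature, and the return value of Concat is left on the stack, ready to be consumed by WriteLine. This should print Hello, MSIL!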

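To convert this piece of code into an executable, use ILASM.EXE (shipped with the framework). For example, if the code is saved as HelloWorld.il, running ilasm HelloWorld.il produces HelloWorld.exe.

Now let's complicate things a little. We will try writing this C# code in MSIL: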

public static void Main()
{
    int a, b, c;
    a = Convert.ToInt32(Console.ReadLine());
    b = Convert.ToInt32(Console.ReadLine());
    try
    {
        c = a / b;
        Console.WriteLine(c);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}

Here is the corresponding MSIL code

.method public static void division()
{
    .entrypoint
    .maxstack 2
    .locals init (int32,
                  int32,
                  int32,
                  class [mscorlib]System.Exception)

    call string [mscorlib]System.Console::ReadLine()
    call int32 [mscorlib]System.Convert::ToInt32(string)
    stloc.0

    call string [mscorlib]System.Console::ReadLine()
    call int32 [mscorlib]System.Convert::ToInt32(string)
    stloc.1

    .try
    {
        ldloc.0
        ldloc.1
        div
        stloc.2
        ldloc.2
        call void [mscorlib]System.Console::WriteLine(int32)
        leave.s CONTINUE
    }
    catch [mscorlib]System.Exception
    {
        stloc.3
        ldloc.3
        callvirt instance string [mscorlib]System.Exception::get_Message()
        call void [mscorlib]System.Console::WriteLine(string)
        leave.s CONTINUE
    }
CONTINUE:
    ret
}

Something that immediately stands out in this piece of code is the .locals directive. Yes, it is used to declare local variables. Here I am declaring three int32 variables for a, b and c, and one System.Exception reference for ex, used by the exception handler. You will have noticed that the variables are not named (like a, b and c). This is because you can refer to the variables by their index, just like an array. You can also give the variables identifiers, for example .locals init (int32 a, int32 b, int32 c), and then refer to them by name, as in ldloc a.

Handling local variables also revolves around the stack. To push a local variable's value onto the stack, use the ldloc instruction. To pop a value off the stack and store it in a local variable, use the stloc instruction.
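
The sketch below shows ldloc and stloc working with named locals instead of indexes. The assembly name LocalsDemo, the method name Main, the variable names and the constants are all chosen for illustration only.

```il
// Illustrative sketch: LocalsDemo, Main and the local names are hypothetical.
.assembly extern mscorlib {}
.assembly LocalsDemo {}

.method static void Main()
{
    .entrypoint
    .maxstack 2
    .locals init (int32 a, int32 b, int32 sum)

    ldc.i4.s 40     // push the constant 40
    stloc a         // pop it into a
    ldc.i4.2        // push the constant 2
    stloc b         // pop it into b

    ldloc a         // push the value of a
    ldloc b         // push the value of b
    add             // pop both, push a + b
    stloc sum       // pop the result into sum

    ldloc sum       // push sum as the argument for WriteLine
    call void [mscorlib]System.Console::WriteLine(int32)
    ret
}
```

The assembler maps each name back to its slot, so stloc a here refers to the same slot as stloc.0. This should print 42.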

So now this is how the code translates:

C#:   int a, b, c;
MSIL: .locals init (int32, int32, int32)

C#:   a = Convert.ToInt32(Console.ReadLine());
MSIL: call string [mscorlib]System.Console::ReadLine()
      call int32 [mscorlib]System.Convert::ToInt32(string)
      stloc.0

C#:   c = a / b;
MSIL: ldloc.0
      ldloc.1
      div
      stloc.2

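Exception handling, as you can see, looks similar to C# but for one small difference: leave.s. This is a jump instruction that transfers control out of the try or catch block to the given label (remember the goto statement). The ordinary branch instructions cannot be used to exit a protected block, so each block ends with leave.s instead.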

By now you must have understood that coding in MSIL is all about breaking down expressions and statements into simple instructions that can be executed in sequence. Let us look at how the if statement can be implemented.

C#:

if (a == b)
{
    // Statements when true
}
// Statements

MSIL:

ldloc.0
ldloc.1
ceq
ldc.i4.0
ceq
brtrue.s CONTINUE
// Statements when true
CONTINUE:
// Statements

The ceq instruction pops two values off the stack, compares them, and pushes 1 onto the stack if they are equal or 0 if not. The brtrue.s instruction handles the conditional branching: it transfers control to the given target if the value on the stack is non-zero. Since we want to skip the statements when a and b are not equal, we compare the result of the first ceq with zero (pushed by ldc.i4.0), which inverts it. If you want to skip the second comparison, you can use brfalse.s, which branches when the value on the stack is zero.

And if you are thinking about the for statement, it gets even dirtier. In short, a for statement can be implemented as follows:
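
As a sketch of the shorter brfalse.s form, here is a complete program that compares two hard-coded values. The assembly name IfDemo, the method name Main, the variable names, the values and the messages are all invented for the example.

```il
// Illustrative sketch: IfDemo, Main and the local names are hypothetical.
.assembly extern mscorlib {}
.assembly IfDemo {}

.method static void Main()
{
    .entrypoint
    .maxstack 2
    .locals init (int32 a, int32 b)

    ldc.i4.5
    stloc a             // a = 5
    ldc.i4.5
    stloc b             // b = 5

    ldloc a
    ldloc b
    ceq                 // push 1 if a == b, otherwise 0
    brfalse.s CONTINUE  // skip the "true" statements when the result is 0

    ldstr "a equals b"  // statements when true
    call void [mscorlib]System.Console::WriteLine(string)

CONTINUE:
    ldstr "done"        // statements after the if
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
```

With both values set to 5 this should print a equals b followed by done; change one of the constants and only done is printed.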

Initialize Index
Jump To CONDITION
LOOP:
    Loop body statements
    Increment Index
CONDITION:
    If Index not equals N
        Jump to LOOP

You have to implement the if statement as discussed above and for the unconditional jumps you can use the br.s instruction.
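
Put together, the outline might look like the following sketch, which prints the index on each pass. The assembly name ForDemo, the method name Main, the variable name i and the limit of 3 are assumptions made for illustration; the LOOP and CONDITION labels follow the outline above.

```il
// Illustrative sketch: ForDemo, Main and the local name i are hypothetical.
.assembly extern mscorlib {}
.assembly ForDemo {}

.method static void Main()
{
    .entrypoint
    .maxstack 2
    .locals init (int32 i)

    ldc.i4.0
    stloc i             // initialize index: i = 0
    br.s CONDITION      // unconditional jump to the test

LOOP:
    ldloc i             // loop body: print the current index
    call void [mscorlib]System.Console::WriteLine(int32)

    ldloc i
    ldc.i4.1
    add
    stloc i             // increment index: i = i + 1

CONDITION:
    ldloc i
    ldc.i4.3
    ceq                 // push 1 if i == 3, otherwise 0
    brfalse.s LOOP      // jump back to LOOP while i is not 3
    ret
}
```

This is roughly the equivalent of for (int i = 0; i != 3; i++) Console.WriteLine(i); and should print 0, 1 and 2 on separate lines.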

To bring everything together, MSIL is all about breaking down expressions and statements into simple instructions that can be executed in sequence. The code may seem overly complicated, but knowing how things work at the lowest level does make it easier to see the big picture.
