Msil Parser

I just moved into a new apartment last Saturday and have had no internet access till today.  Life without internet is like being cut off from the world.  I can't check my email, I can't work from home, I have to call and talk to people to order food ;{  It has been horrible :(  Despite not having any internet last night I decided to get on my computer and look for something fun to do.  I am going to share that search with you ;)

I am quite a big user of reflector.  Whenever I want to know how something works or why something doesn't work I just load it up in reflector and look at the code.  How does reflector work though?  There is no api included in the .net framework that will disassemble msil for you (as far as I know).  Unfortunately the reflector code is super obfuscated and very hard to read (reflector can't reflect on reflector lol) so I decided to start from scratch.  I want to write something that can parse msil.

I noticed that .net 2.0 introduced a new class called MethodBody and a new method on MethodBase called GetMethodBody().  Instead of trying to parse PE headers and the .net metadata we will just use this class and only try to parse the msil.  The MethodBody class has a method called GetILAsByteArray.  We will parse these bytes and turn them into some objects that represent the msil.
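
As a quick sanity check of that idea, here is a minimal sketch that grabs the raw msil bytes of a method.  The IlBytesDemo class, its Add method and the GetIl helper are made-up names for illustration; only GetMethodBody and GetILAsByteArray come from the framework.  I am assuming GetMethodBody() can return null for methods that have no msil body (abstract or extern ones, for example), so the helper guards against that.

```csharp
using System;
using System.Reflection;

static class IlBytesDemo
{
    // Hypothetical sample method whose msil we want to look at.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw msil of a method, or an empty array when there is no body.
    public static byte[] GetIl(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        return body == null ? new byte[0] : body.GetILAsByteArray();
    }

    static void Main()
    {
        byte[] il = GetIl(typeof(IlBytesDemo).GetMethod("Add"));

        // Prints the byte count and the bytes as hex pairs, e.g. "NN-NN-NN".
        Console.WriteLine("{0} bytes: {1}", il.Length, BitConverter.ToString(il));
    }
}
```

The exact bytes depend on the compiler and on debug versus release settings, so treat the output as something to poke at rather than a fixed result.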

In the parser we will use a dictionary to look up the instructions.  The .net framework has a class OpCode used to represent an msil instruction when using the reflection emit apis.  The OpCodes class has each OpCode as a public static field.  We will use these fields to fill our dictionary, keyed by each OpCode's Value.

                Dictionary<short, OpCode> lookupTable = new Dictionary<short, OpCode>();
                FieldInfo[] fields = typeof(OpCodes).GetFields(BindingFlags.Static | BindingFlags.Public);
                foreach (FieldInfo field in fields)
                {
                    OpCode code = (OpCode) field.GetValue(null);
                    lookupTable.Add(code.Value, code);
                }
                return lookupTable;
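
To show how the table gets used, here is a small self-contained sketch that builds the same dictionary and looks up a couple of instruction values.  OpCodeLookupDemo and BuildLookup are hypothetical names I picked for this example; the reflection calls are the same ones used above.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

static class OpCodeLookupDemo
{
    // Builds a table from OpCode.Value to OpCode using the public static fields of OpCodes.
    public static Dictionary<short, OpCode> BuildLookup()
    {
        Dictionary<short, OpCode> lookup = new Dictionary<short, OpCode>();
        FieldInfo[] fields = typeof(OpCodes).GetFields(BindingFlags.Static | BindingFlags.Public);
        foreach (FieldInfo field in fields)
        {
            OpCode code = (OpCode)field.GetValue(null);
            lookup[code.Value] = code;
        }
        return lookup;
    }

    static void Main()
    {
        Dictionary<short, OpCode> lookup = BuildLookup();

        // Single-byte instructions are keyed by their byte value.
        Console.WriteLine(lookup[0x00].Name);
        Console.WriteLine(lookup[0x2A].Name);

        // Two-byte instructions are keyed by the full 16-bit OpCode.Value.
        Console.WriteLine(lookup[OpCodes.Ceq.Value].Name);
    }
}
```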

Cool.  After this it is just looping through the bytes: look up the instruction in our lookup table, figure out the size of the data for each instruction, and move to the next instruction.  One thing that got me was that msil instructions can be 1 or 2 bytes in length.  Since I was out of internet access I had no way of looking up the exact format.  I noticed that all the instructions that are 2 bytes start with 0xfe.  In the OpCodes class Prefix1 has this value.  After figuring this out everything worked great.  There are also Prefix2 through Prefix7 but I don't see any instructions that use these so I did not implement them.
           

                int instructionValue;
                if (_methodReader.BaseStream.Length - 1 == _methodReader.BaseStream.Position)
                {
                    // Only one byte left, so it must be a single-byte instruction.
                    instructionValue = _methodReader.ReadByte();
                }
                else
                {
                    // ReadUInt16 is little-endian: the first byte in the stream is the low byte.
                    instructionValue = _methodReader.ReadUInt16();
                    if ((instructionValue & 0xff) != OpCodes.Prefix1.Value)
                    {
                        // Single-byte instruction: keep the first byte and step back one.
                        instructionValue &= 0xff;
                        _methodReader.BaseStream.Position--;
                    }
                    else
                    {
                        // Two-byte instruction: swap the bytes to match OpCode.Value (0xFExx).
                        instructionValue = ((0xFF00 & instructionValue) >> 8) |
                            ((0xFF & instructionValue) << 8);
                    }
                }

                OpCode code;
                if (!_instructionLookup.TryGetValue((short)instructionValue, out code))
                {
                    throw new InvalidProgramException();
                }

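                int dataSize = GetSize(code.OperandType);
                if (code.OperandType == OperandType.InlineSwitch)
                {
                    // switch stores a 4-byte case count followed by that many 4-byte targets.
                    int caseCount = _methodReader.ReadInt32();
                    _methodReader.BaseStream.Position -= 4;
                    dataSize = 4 + 4 * caseCount;
                }
                byte[] data = new byte[dataSize];
                _methodReader.Read(data, 0, dataSize);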
                _current = new MsilInstruction(code, data);
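
The 0xfe observation can be double checked against the OpCodes class itself.  This is only a sketch: PrefixCheckDemo and its method are names I made up, and it relies on the OpCode.Size property to tell one-byte and two-byte instructions apart.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

static class PrefixCheckDemo
{
    // Returns true when at least one two-byte opcode exists and every
    // two-byte opcode has 0xFE (OpCodes.Prefix1) as its first byte.
    public static bool AllTwoByteOpCodesStartWithPrefix1()
    {
        int twoByteCount = 0;
        FieldInfo[] fields = typeof(OpCodes).GetFields(BindingFlags.Static | BindingFlags.Public);
        foreach (FieldInfo field in fields)
        {
            OpCode code = (OpCode)field.GetValue(null);
            if (code.Size != 2)
            {
                continue;
            }

            twoByteCount++;

            // The first byte of a two-byte opcode is the high byte of OpCode.Value.
            int firstByte = (code.Value >> 8) & 0xFF;
            if (firstByte != OpCodes.Prefix1.Value)
            {
                return false;
            }
        }
        return twoByteCount > 0;
    }

    static void Main()
    {
        Console.WriteLine(AllTwoByteOpCodesStartWithPrefix1());
    }
}
```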

Below is a complete sample console application.  Pass any MethodInfo into PrintMethod to see the msil.  We will continue the rest later.  Next time we will look at parsing the data byte array and resolving tokens ( method, type, field, etc ) that are in the msil.  The Module class has some methods that allow us to do this.  Then we will look at some cooler things like rewriting the msil that we read in.  Peace until next time :)
 

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;
using System.IO;
using System.Reflection.Emit;

namespace TestMsilReader
{

    class Program
    {

        static void Main(string[] args)
        {
            PrintMethod(typeof(Program).GetMethod("TestMethod", Type.EmptyTypes));
            PrintMethod(typeof(String).GetMethod("IsInterned", new Type[]{typeof(string)}));
            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }


        private static void PrintMethod(MethodInfo methodInfo)
        {
            Console.WriteLine("Method {0}.{1}", methodInfo.DeclaringType.Name,methodInfo.Name);
            Console.WriteLine("-------------------------------");
            MsilReader reader = new MsilReader(methodInfo);
            while (reader.Read())
            {
                Console.WriteLine(reader.Current);
            }
            Console.WriteLine("-------------------------------");

        }


        public static void TestMethod()
        {
            int i = 0;
            i += 1;
            i += 2303;
            int x = i / 2;
            Console.WriteLine(x);
            Console.WriteLine(i);
        }

        public class MsilReader
        {
            // Maps OpCode.Value to its OpCode; built once in the static constructor.
            private static readonly Dictionary<short, OpCode> _instructionLookup;

            private BinaryReader _methodReader;
            private MsilInstruction _current;
            private Module _module; // Needed later to resolve method, type, and field tokens

            // The runtime runs a static constructor once and in a thread-safe way,
            // so no explicit locking is needed here.
            static MsilReader()
            {
                _instructionLookup = GetLookupTable();
            }

            public MsilReader(MethodInfo method)
            {
                if (method == null)
                {
                    throw new ArgumentNullException("method");
                }

                // GetMethodBody() returns null for methods without msil (abstract, extern, etc).
                MethodBody body = method.GetMethodBody();
                if (body == null)
                {
                    throw new ArgumentException("Method has no msil body.", "method");
                }

                _module = method.Module;
                _methodReader = new BinaryReader(new MemoryStream(body.GetILAsByteArray()));
            }

            public MsilInstruction Current
            {
                get { return _current; }
            }

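            public bool Read()
            {
                if (_methodReader.BaseStream.Length == _methodReader.BaseStream.Position)
                {
                    return false;
                }

                int instructionValue;
                if (_methodReader.BaseStream.Length - 1 == _methodReader.BaseStream.Position)
                {
                    // Only one byte left, so it must be a single-byte instruction.
                    instructionValue = _methodReader.ReadByte();
                }
                else
                {
                    // ReadUInt16 is little-endian: the first byte in the stream is the low byte.
                    instructionValue = _methodReader.ReadUInt16();
                    if ((instructionValue & 0xff) != OpCodes.Prefix1.Value)
                    {
                        // Single-byte instruction: keep the first byte and step back one.
                        instructionValue &= 0xff;
                        _methodReader.BaseStream.Position--;
                    }
                    else
                    {
                        // Two-byte instruction: swap the bytes to match OpCode.Value (0xFExx).
                        instructionValue = ((0xFF00 & instructionValue) >> 8) |
                            ((0xFF & instructionValue) << 8);
                    }
                }

                OpCode code;
                if (!_instructionLookup.TryGetValue((short)instructionValue, out code))
                {
                    throw new InvalidProgramException();
                }

                int dataSize = GetSize(code.OperandType);
                if (code.OperandType == OperandType.InlineSwitch)
                {
                    // switch stores a 4-byte case count followed by that many 4-byte targets.
                    int caseCount = _methodReader.ReadInt32();
                    _methodReader.BaseStream.Position -= 4;
                    dataSize = 4 + 4 * caseCount;
                }

                byte[] data = new byte[dataSize];
                _methodReader.Read(data, 0, dataSize);

                _current = new MsilInstruction(code, data);
                return true;
            }
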
            // Size in bytes of the fixed part of the operand for each operand type.
            private static int GetSize(OperandType opType)
            {
                switch (opType)
                {
                    case OperandType.InlineNone:
                        return 0;

                    case OperandType.ShortInlineBrTarget:
                    case OperandType.ShortInlineI:
                    case OperandType.ShortInlineVar:
                        return 1;

                    case OperandType.InlineVar:
                        return 2;

                    case OperandType.InlineBrTarget:
                    case OperandType.InlineField:
                    case OperandType.InlineI:
                    case OperandType.InlineMethod:
                    case OperandType.InlineSig:
                    case OperandType.InlineString:
                    case OperandType.InlineSwitch: // size of the case count only
                    case OperandType.InlineTok:
                    case OperandType.InlineType:
                    case OperandType.ShortInlineR:
                        return 4;

                    case OperandType.InlineI8:
                    case OperandType.InlineR:
                        return 8;

                    default:
                        return 0;
                }
            }

            private static Dictionary<short, OpCode> GetLookupTable()
            {
                Dictionary<short, OpCode> lookupTable = new Dictionary<short, OpCode>();
                FieldInfo[] fields = typeof(OpCodes).GetFields(BindingFlags.Static | BindingFlags.Public);
                foreach (FieldInfo field in fields)
                {
                    OpCode code = (OpCode)field.GetValue(null);
                    lookupTable.Add(code.Value, code);
                }
                return lookupTable;
            }
        }

        public struct MsilInstruction
        {
            public MsilInstruction(OpCode code, byte[] data)
            {
                Instruction = code;
                Data = data;
            }

            public readonly OpCode Instruction;
            public readonly byte[] Data;

            // Prints the instruction name followed by the raw operand bytes in hex.
            public override string ToString()
            {
                StringBuilder builder = new StringBuilder();
                builder.Append(Instruction.Name + " ");
                if (Data != null && Data.Length > 0)
                {
                    builder.Append("0x");
                    foreach (byte b in Data)
                    {
                        builder.Append(b.ToString("x2"));
                    }
                }
                return builder.ToString();
            }
        }

    }
}
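
As a small preview of the token resolving mentioned earlier, here is a rough sketch of how a 4-byte operand could be turned back into a method with Module.ResolveMethod.  TokenResolveDemo, Target, ToOperandBytes and ResolveMethodOperand are all hypothetical names for this example, and instead of pulling the operand out of real msil it fakes one from a known method's MetadataToken.  I am assuming the operand is stored little-endian, which matches how the reader above treats the byte stream.

```csharp
using System;
using System.Reflection;

static class TokenResolveDemo
{
    // Hypothetical method that stands in for the target of a call instruction.
    public static void Target()
    {
    }

    // Encodes a metadata token as 4 little-endian bytes, like an msil operand.
    public static byte[] ToOperandBytes(int token)
    {
        return new byte[]
        {
            (byte)(token & 0xFF),
            (byte)((token >> 8) & 0xFF),
            (byte)((token >> 16) & 0xFF),
            (byte)((token >> 24) & 0xFF)
        };
    }

    // Decodes a 4-byte little-endian operand and resolves it as a method token.
    public static MethodBase ResolveMethodOperand(Module module, byte[] operand)
    {
        int token = operand[0] | (operand[1] << 8) | (operand[2] << 16) | (operand[3] << 24);
        return module.ResolveMethod(token);
    }

    static void Main()
    {
        MethodInfo target = typeof(TokenResolveDemo).GetMethod("Target");
        byte[] operand = ToOperandBytes(target.MetadataToken);

        MethodBase resolved = ResolveMethodOperand(target.Module, operand);
        Console.WriteLine("{0}.{1}", resolved.DeclaringType.Name, resolved.Name);
    }
}
```

The token only makes sense inside the module it came from, which is why the reader keeps the method's Module around.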
