System.Reflection-based ILReader

Compared to what I posted previously here (or what was used in the DynamicMethod visualizer), this new version introduces the Visitor pattern.

  • A do-nothing visitor, ILInstructionVisitor, is included; users can focus on their domain-specific logic by simply inheriting from it and overriding a few methods.

public abstract class ILInstructionVisitor {
    public virtual void VisitInlineBrTargetInstruction(InlineBrTargetInstruction inlineBrTargetInstruction) { }
    public virtual void VisitInlineFieldInstruction(InlineFieldInstruction inlineFieldInstruction) { }
    ...
}

  • The abstract class ILInstruction now has an abstract method "Accept"; each derived ILInstruction class overrides "Accept" to dispatch the call to the corresponding visit method of ILInstructionVisitor. This avoids a long switch statement wherever an ILInstruction is processed.
  • The class ILReader has a shortcut to traverse all IL instructions: the instance method "Accept".
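
To make the dispatch described in the bullets above concrete, here is a minimal sketch in plain Python (no IronPython or ILReader library required). Every class and method name in it is an illustrative stand-in rather than the ILReader API; it only shows how "Accept" on each instruction class selects the matching visit method, so that no switch statement is needed.

```python
# Illustrative stand-ins only; these are NOT the ILReader types.

class InstructionVisitor(object):
    """Do-nothing base visitor: subclasses override only what they need."""
    def visit_inline_method(self, instruction):
        pass

    def visit_inline_field(self, instruction):
        pass


class Instruction(object):
    def __init__(self, offset, operand):
        self.offset = offset
        self.operand = operand

    def accept(self, visitor):
        raise NotImplementedError


class InlineMethodInstruction(Instruction):
    def accept(self, visitor):
        # Each concrete instruction knows which visit method matches it.
        visitor.visit_inline_method(self)


class InlineFieldInstruction(Instruction):
    def accept(self, visitor):
        visitor.visit_inline_field(self)


class Reader(object):
    """Iterable of instructions with an accept() shortcut over all of them."""
    def __init__(self, instructions):
        self._instructions = list(instructions)

    def __iter__(self):
        return iter(self._instructions)

    def accept(self, visitor):
        for instruction in self:
            instruction.accept(visitor)


class MethodOperandCollector(InstructionVisitor):
    """Overrides a single visit method; field instructions are ignored."""
    def __init__(self):
        self.seen = []

    def visit_inline_method(self, instruction):
        self.seen.append(instruction.operand)


reader = Reader([
    InlineMethodInstruction(0, "String.Concat"),
    InlineFieldInstruction(5, "String.Empty"),
    InlineMethodInstruction(10, "Console.WriteLine"),
])
collector = MethodOperandCollector()
reader.accept(collector)
```

After the traversal, the collector holds only the two method operands, because the field instruction was routed to the inherited do-nothing method.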

MethodBase Hierarchy

Class ILReader has 2 constructor overloads. There are many mscorlib types derived from MethodBase (see the picture at the right side, from Reflector); only a RuntimeMethodInfo or RuntimeConstructorInfo instance is supposed to be passed to the first constructor. Internally it uses MethodBase.GetMethodBody and the token resolution APIs from System.Reflection.Module. DynamicMethod provides neither public APIs to access the method body nor the token resolution APIs. IILProvider and ITokenResolver are designed for easy plug-in of other IL streams. If you are familiar with the DynamicMethod visualizer source code, you already know that, for tool authoring purposes, we could use Reflection to read some private members (which depends heavily on implementation details and is likely to break in a future runtime version), and then come up with dynamic-method-related classes that implement both interfaces. We could also figure out how Reflection.Emit manages those tokens at bake time, but I have not had enough time to work on that.

public ILReader(MethodBase method);
public ILReader(IILProvider ilProvider, ITokenResolver tokenResolver);
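
The second constructor separates "where the IL bytes come from" from "how tokens are turned into members". The sketch below illustrates that split in plain Python. The names ListILProvider, DictTokenResolver, get_bytes, as_method and read_il are all hypothetical and do not mirror the real IILProvider/ITokenResolver members; the decoder handles only three opcodes (nop, call, ret), which is enough to show the idea.

```python
import struct

# Hypothetical stand-ins for the two plug-in points; not the ILReader interfaces.

class ListILProvider(object):
    """IL source: hands back the raw method body bytes."""
    def __init__(self, il_bytes):
        self._il = bytearray(il_bytes)

    def get_bytes(self):
        return self._il


class DictTokenResolver(object):
    """Token resolver: maps metadata tokens to readable member names."""
    def __init__(self, methods):
        self._methods = methods

    def as_method(self, token):
        return self._methods[token]


# Single-byte opcode values for the three instructions this sketch decodes.
NOP, CALL, RET = 0x00, 0x28, 0x2A


def read_il(il_provider, token_resolver):
    """Yield (offset, mnemonic, operand) tuples from the provider's bytes."""
    il = il_provider.get_bytes()
    pos = 0
    while pos < len(il):
        offset, opcode = pos, il[pos]
        pos += 1
        if opcode == CALL:
            # The operand is a 4-byte little-endian metadata token.
            (token,) = struct.unpack_from("<I", il, pos)
            pos += 4
            yield offset, "call", token_resolver.as_method(token)
        elif opcode == RET:
            yield offset, "ret", None
        elif opcode == NOP:
            yield offset, "nop", None
        else:
            raise ValueError("opcode 0x%02x not handled in this sketch" % opcode)


provider = ListILProvider([0x00, 0x28, 0x01, 0x00, 0x00, 0x0A, 0x2A])
resolver = DictTokenResolver({0x0A000001: "System.Console::WriteLine"})
decoded = list(read_il(provider, resolver))
```

Swapping in a different provider/resolver pair (for example, one backed by a dynamic method's internals) would leave read_il untouched, which is the point of the two interfaces.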

Let us check out how we can build something "useful" with the ILReader. Instead of demo'ing with C# code, I'd like to write some Python code here (well, to promote IronPython ...).

The first IronPython code examines which of System.String's public methods could throw ArgumentOutOfRangeException directly. For that, I can write an ILInstructionVisitor-derived class, ThrowExceptionHunter, with one override method, VisitInlineMethodInstruction, which checks whether the instruction's operand "Method" is one of ArgumentOutOfRangeException's constructors. In order to stop hunting after hitting an ArgumentOutOfRangeException constructor, I need to write a tailored loop instead of using ILReader.Accept. From the output, we see that 11 methods could throw ArgumentOutOfRangeException "directly", and they always call the constructor which takes 2 string arguments (i.e., paramName, message).

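import clr
import System
# ILReader and ILInstructionVisitor are assumed to be imported already
# from the attached ILReader library (see TryILReader.py).

class ThrowExceptionHunter(ILInstructionVisitor):
    def __init__(self, method, exceptionType):
        self.method = method
        self.ctors = exceptionType.GetConstructors()
        self.found = False
    def VisitInlineMethodInstruction(self, instruction):
        # A call/newobj whose target is one of the exception's constructors
        if instruction.Method in self.ctors:
            print instruction.Method, '|', self.method
            self.found = True

clrStringType = clr.GetClrType(System.String)
exceptionType = clr.GetClrType(System.ArgumentOutOfRangeException)
for method in clrStringType.GetMethods():
    h = ThrowExceptionHunter(method, exceptionType)
    # Tailored loop: stop at the first hit instead of calling ILReader.Accept
    for il in ILReader(method):
        il.Accept(h)
        if h.found: break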

Output:
Void .ctor(System.String, System.String) | System.String Join(System.String, System.String[], Int32, Int32)
...
Void .ctor(System.String, System.String) | System.String Remove(Int32)

The next visitor finds callees (methods called by the current method). Early-bound method calls appear as either InlineMethodInstruction or InlineSigInstruction. Assuming you are unlikely to write code using "calli", only one ILInstructionVisitor method needs to be overridden: it puts each called method into a Python set (so there are no duplicates). The results show that String.StartsWith(String, StringComparison) internally uses 10 different methods (including constructor calls and property accesses).

class CalleeFinderNoDup(ILInstructionVisitor):
    def __init__(self):
        self.callees = set()
    def VisitInlineMethodInstruction(self, instruction): 
        self.callees.add(instruction.Method)

method = clrStringType.GetMethod('StartsWith', System.Array[System.Type]([str, System.StringComparison]))
reader = ILReader(method)

v = CalleeFinderNoDup()
reader.Accept(v)
print 'How many:', len(v.callees)
for x in v.callees:
    print x, '/', x.DeclaringType

Output:
How many: 10
Void .ctor(System.String) / System.ArgumentNullException
...
Int32 CompareOrdinalIgnoreCaseEx(System.String, Int32, System.String, Int32, Int32) / System.Globalization.TextInfo

We can think of many scenarios related to IL code disassembly; the attached ILReader library provides 2 visitors for that purpose: ReadableILStringVisitor and RawILStringVisitor. Each visitor is constructed with an IILStringCollector, which receives the IL strings during the visit. ReadableILStringToTextWriter, which implements IILStringCollector, can be used to dump the readable IL strings to the specified TextWriter (shown below). We can write customized IILStringCollectors to collect those IL strings in other ways.

v = ReadableILStringVisitor(ReadableILStringToTextWriter(System.Console.Out))
reader.Accept(v)

Output:
IL_0000: ldarg.1
IL_0001: brtrue.s IL_000e
IL_0003: ldstr "value"
IL_0008: newobj Void .ctor(System.String)/System.ArgumentNullException
IL_000d: throw
...
IL_00f7: call System.String GetResourceString(System.String)/System.Environment
IL_00fc: ldstr "comparisonType"
IL_0101: newobj Void .ctor(System.String, System.String)/System.ArgumentException
IL_0106: throw
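
As a rough illustration of the "customized collector" idea, the plain-Python sketch below pairs one string-producing visitor with two interchangeable collectors: one writes to a text stream, the other keeps the lines in a list. The names ListCollector, TextWriterCollector, ReadableStringVisitor, process and visit are hypothetical and are not the members of IILStringCollector or the library's visitors.

```python
import io

# Hypothetical stand-ins; the real IILStringCollector members may differ.

class ListCollector(object):
    """Keeps each formatted IL line in memory."""
    def __init__(self):
        self.lines = []

    def process(self, offset, text):
        self.lines.append("IL_%04x: %s" % (offset, text))


class TextWriterCollector(object):
    """Writes each formatted IL line to a text stream."""
    def __init__(self, writer):
        self._writer = writer

    def process(self, offset, text):
        self._writer.write(u"IL_%04x: %s\n" % (offset, text))


class ReadableStringVisitor(object):
    """Builds the readable text and hands it to whichever collector it was given."""
    def __init__(self, collector):
        self._collector = collector

    def visit(self, offset, opcode, operand=None):
        text = opcode if operand is None else "%s %s" % (opcode, operand)
        self._collector.process(offset, text)


# A few instructions taken from the output listing above.
instructions = [
    (0x0, "ldarg.1", None),
    (0x1, "brtrue.s", "IL_000e"),
    (0x3, "ldstr", '"value"'),
]

list_collector = ListCollector()
buffer = io.StringIO()
for collector in (list_collector, TextWriterCollector(buffer)):
    visitor = ReadableStringVisitor(collector)
    for offset, opcode, operand in instructions:
        visitor.visit(offset, opcode, operand)
```

The visitor never needs to know where the strings end up; only the collector changes.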

By the way, you may think those method names inside the ILInstructionVisitor class are way too long, and that they could all simply be named "Visit". I agree; actually that was my first try. However, to be dynamic-language friendly, I renamed them by adding the matching instruction class name. When all methods are named "Visit", a static language compiler (like C#'s) can still decide which method to bind based on the parameter type, while IronPython has no way to tell the overloads apart in a class derived from such a visitor: if we wrote one customized "Visit" method for ThrowExceptionHunter, which base "Visit" method is it supposed to override? And if you need to override several "Visit" methods from ILInstructionVisitor, there is no way to do it in Python: defining any number of "Visit" methods keeps only the last one alive.
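
The "only the last one stays alive" behavior is easy to reproduce in plain Python, without IronPython or the ILReader library. In this small sketch the class names are made up for illustration: the second "visit" definition silently replaces the first, whereas distinct method names coexist.

```python
class SameNameVisitor(object):
    def visit(self, instruction):
        return "first definition"

    # Rebinding the same name in the class body replaces the method above.
    def visit(self, instruction):
        return "second definition"


class DistinctNameVisitor(object):
    def visit_inline_method(self, instruction):
        return "method"

    def visit_inline_field(self, instruction):
        return "field"


same_name_result = SameNameVisitor().visit(None)
visit_count = len([name for name in vars(SameNameVisitor) if name == "visit"])
distinct = DistinctNameVisitor()
```

Here same_name_result is "second definition" and the class holds a single "visit" attribute, while DistinctNameVisitor keeps both of its methods.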

TryILReader.py can be viewed here.

More tools (and discussion) related to this ILReader coming soon. Hope one day this could become System.Reflection.ILReader ...

Disclaimer: THE CODE IS PROVIDED "AS IS", WITH NO WARRANTIES INTENDED OR IMPLIED. USE AT YOUR OWN RISK

ILReader.zip