The world is a better place if you generate verifiable IL


If you are writing a compiler that targets IL or just emitting IL, you may find this an interesting read:

 

The JIT compiler will always try to generate code, even if the IL is bad. From the JIT’s point of view, IL code falls in 3 categories:

 

1)       Verifiable IL. Most code can be written as verifiable IL, and you can check it for bugs with offline tools such as PEVerify

2)       Non verifiable but correct IL

3)       Non verifiable and incorrect IL

 

It is difficult for us to tell the difference between 2) and 3). Whenever we see something that looks really wrong (bad EH regions, IL stack imbalances, etc.) we bail out by throwing an InvalidProgramException. The JIT could probably do a better job here by being stricter about things that are obviously wrong, but even if we did, some bad IL would still get through (we can’t prove it’s incorrect in all cases). Historically, it has been difficult to stay in camp 2) and stay in good shape: I’ve had to debug many more bugs coming from people who generate unverifiable code than from, say, the C# compiler, which generates verifiable code (unless you use the unsafe keyword).
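To make "IL stack imbalance" concrete, here is a minimal sketch in Python of the kind of bookkeeping that catches one. It is only an illustration, not how the JIT is implemented: the function name check_stack_balance and the (name, pops, pushes) encoding are made up for this example, and it only handles straight-line code with no branches or exception regions.

```python
# Simplified sketch: straight-line IL only (no branches, no exception regions).
# Each instruction is (name, values popped, values pushed). The counts are
# supplied by the caller because the effect of a call depends on its signature.

def check_stack_balance(instructions, max_stack):
    """Return None if the stack stays balanced, else a description of the problem."""
    depth = 0
    for number, (name, pops, pushes) in enumerate(instructions, start=1):
        if pops > depth:
            return f"instruction {number} ({name}): stack underflow"
        depth = depth - pops + pushes
        if depth > max_stack:
            return f"instruction {number} ({name}): exceeds .maxstack {max_stack}"
    if depth != 0:
        return f"{depth} value(s) left on the stack at the end of the method"
    return None


# Body of a void method that prints a string and returns.
balanced = [
    ("ldstr", 0, 1),
    ("call void WriteLine(string)", 1, 0),
    ("ret", 0, 0),
]

# Same method with a stray 'pop': there is nothing left on the stack to pop.
unbalanced = [
    ("ldstr", 0, 1),
    ("call void WriteLine(string)", 1, 0),
    ("pop", 1, 0),
    ("ret", 0, 0),
]

print(check_stack_balance(balanced, max_stack=1))
print(check_stack_balance(unbalanced, max_stack=1))
```

The first call reports no problem and the second one flags the stray pop. Mistakes of this kind are the easy ones to detect mechanically, which is why they get rejected up front; the dangerous cases are the ones where the stack depth is fine and only the types are wrong.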

 

The problem is that it is very easy to go from 2) to 3). What can happen when you are in 3)?

 

– If you get very lucky, you will crash immediately, with a clear indication of which method had a problem; you will inspect the IL and find the bug.

– If you are less lucky, you will break some CLR invariant and crash in a way that makes it hard to figure out what went wrong.

– If you are not lucky at all, you will have a bug that won’t reproduce deterministically, that will behave differently on different machines (especially on your customers’ machines), and that will need debugging by somebody with expertise in CLR internals. Trust me, you don’t want to be here.

 

What’s my recommendation? It’s very likely that 100% of your code could be in camp 1). If not, try to get close to 100%. Once you’re there, separate verifiable from non-verifiable code into different assemblies and make sure the code you think is verifiable passes verification. You can do this by saving your assembly to disk and running PEVerify on it, or by refusing the SkipVerification permission, in which case the JIT compiler will verify your methods as you go and will throw a nice, easy-to-debug exception when it encounters code that isn’t verifiable.

 

Here is an example of how easy, and how non-obvious, it is to get into really hard-to-debug problems. The CLR Garbage Collector performs all its work on a huge dynamic graph of objects. To put it simply, the GC follows pointers from one object to another in order to obtain the set of objects that are currently alive. For example, suppose you have a live object of the following class:

 

class Foo
{
            int i;
            Foo next;
}

 

To see what other live objects this one is keeping alive, the GC has to walk the object looking for pointers. To do that, the GC obtains the layout of the object and follows the managed pointers, in this case, ‘next’.

 

Now, what happens if ‘next’ is a bad pointer? Most likely you will crash. Why? To figure out which objects ‘next’ is keeping alive, we need to know what type ‘next’ is. The GC figures this out by looking at the method table of ‘next’, which is, let’s say, at *( (MethodTable**) next). If ‘next’ is a bad pointer, you are basically reading from a random location. If you get lucky you will crash right there; if not, more dangerous things can happen: heap corruption, type safety violations, etc. This class of bug is what we call here a ‘GC Hole’.
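If it helps to see the walk spelled out, here is a toy model of it in Python. Everything in it is an assumption made for illustration: the heap is a plain dictionary keyed by address, TYPE_LAYOUTS stands in for the method table, and mark is a made-up name. The one place where it is deliberately kinder than the real thing is the bad-pointer check, which a real collector does not have.

```python
# Toy model of the walk described above. TYPE_LAYOUTS, mark and the
# heap-as-dictionary representation are hypothetical, not CLR data structures.

# Stand-in for the method table: which fields of each type hold references.
TYPE_LAYOUTS = {
    "Foo": {"reference_fields": ["next"]},
}


def mark(heap, roots):
    """Return the set of addresses reachable from roots.

    heap maps an address to a record of the form
    {"type": <type name>, "fields": {<field name>: <value>}}.
    A reference field holds an address or None.
    """
    live = set()
    pending = list(roots)
    while pending:
        address = pending.pop()
        if address is None or address in live:
            continue
        if address not in heap:
            # A real GC has no such check: it would read a "method table"
            # from whatever bytes happen to be at this address.
            raise RuntimeError(f"bad pointer {address:#x}: no object here")
        live.add(address)
        record = heap[address]
        layout = TYPE_LAYOUTS[record["type"]]
        for field in layout["reference_fields"]:
            pending.append(record["fields"][field])
    return live


heap = {
    0x1000: {"type": "Foo", "fields": {"i": 1, "next": 0x2000}},
    0x2000: {"type": "Foo", "fields": {"i": 2, "next": None}},
    0x3000: {"type": "Foo", "fields": {"i": 3, "next": None}},  # unreachable
}

print(sorted(hex(address) for address in mark(heap, roots=[0x1000])))

# Corrupt 'next' so that it points at an address where there is no object.
heap[0x2000]["fields"]["next"] = 0xDEAD
try:
    mark(heap, roots=[0x1000])
except RuntimeError as error:
    print(error)
```

The first walk finds the objects at 0x1000 and 0x2000 and leaves 0x3000 to be collected. The second one stops with a clean error only because the model checks the address first; without that check the outcome depends on whatever happens to live at the bogus address.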

 

How can you get into this mess by emitting IL? Let’s write a silly example (to the right of each instruction I show what’s on the IL stack after the instruction executes). It will show you that not even inspecting the x86 assembly we generate from your IL is enough proof that your code is correct. (Obviously, for simplicity, this is not real code. Real code can be more subtle and more innocent-looking.) Let’s hope it convinces you to only generate verifiable code 😉

 

 

newobj     instance void Test::.ctor()  ( Test*)

castclass  Test                         ( Test*)

ret                                     ()

 

This would generate the following code:

 

IN0001: mov     ECX, 0x2cb0dd4            (GC regs: – )

IN0002: call    CORINFO_HELP_NEWSFAST     (GC regs: EAX)

IN0003: mov     EDX, EAX                  (GC regs: EDX)

IN0004: mov     ECX, 0x2cb0dd4            (GC regs: EDX)

IN0005: call    CORINFO_HELP_CHKCASTCLASS (GC regs: EAX)

 

This code is fine. Note the ‘GC regs’ on each line. The JIT emits GC information, which tells the GC which registers hold GC pointers at every instruction (again, we don’t always do this, but let’s say we do for this discussion). So, after instruction 2 EAX holds a GC reference (the Test object we just created), after instructions 3 and 4 EDX holds it, and after instruction 5 the result of the cast is in EAX.

 

Now, let’s introduce a small change (one that will make our code unverifiable):

 

newobj     instance void Test::.ctor()( Test*)

conv.i                                ( I )

castclass  Test                       ( Test*)

ret                                   ()

 

 

This yields the following code:

 

IN0001: mov     ECX, 0x2cb0dd4            (GC regs: – )

IN0002: call    CORINFO_HELP_NEWSFAST     (GC regs: EAX)

IN0003: mov     EDX, EAX                  (GC regs: None!)

IN0004: mov     ECX, 0x2cb0dd4            (GC regs: None!)

IN0005: call    CORINFO_HELP_CHKCASTCLASS (GC regs: EAX)

 

 

What is the difference? The generated code is exactly the same, but the GC info isn’t. In our second version, after the 3rd instruction the value in EDX is no longer reported as a GC pointer, because conv.i told us to treat it as a pointer-sized integer. What does this mean to us? It means that if a GC happens during instructions 3 and 4, the GC won’t think there is an object in EDX, which holds the only reference we have to our newly created object, so it will reclaim the object’s memory and possibly reuse it. The GC then returns control to our code and we call into CORINFO_HELP_CHKCASTCLASS with a bogus pointer. Oops! What’s worse, you will only see the bug if a GC happens in that 2-instruction window. Combine the above with multithreaded code (let’s say you are running in ASP.NET) and you have the perfect GC hole, which will cause unexplained crashes that you’ll only see on production machines or while demoing your product.
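The two listings can be compared mechanically. Below is a small Python sketch that encodes each one as data and asks after which instructions a GC would lose the object. The encoding is my own simplification for this post: VERIFIABLE, WITH_CONV_I and unsafe_gc_points are made-up names, and "holder" means the register the code will later read the object from.

```python
# Hypothetical encoding of the two listings above. Each entry is
# (instruction, holder, gc_regs): holder is the register the code relies on
# for the new object after the instruction (None if there is no object yet),
# and gc_regs is the set of registers reported to the GC at that point.
VERIFIABLE = [
    ("mov ECX, 0x2cb0dd4", None, set()),
    ("call CORINFO_HELP_NEWSFAST", "EAX", {"EAX"}),
    ("mov EDX, EAX", "EDX", {"EDX"}),
    ("mov ECX, 0x2cb0dd4", "EDX", {"EDX"}),
    ("call CORINFO_HELP_CHKCASTCLASS", "EAX", {"EAX"}),
]

# Same machine code, but conv.i made the JIT stop reporting EDX.
WITH_CONV_I = [
    ("mov ECX, 0x2cb0dd4", None, set()),
    ("call CORINFO_HELP_NEWSFAST", "EAX", {"EAX"}),
    ("mov EDX, EAX", "EDX", set()),
    ("mov ECX, 0x2cb0dd4", "EDX", set()),
    ("call CORINFO_HELP_CHKCASTCLASS", "EAX", {"EAX"}),
]


def unsafe_gc_points(listing):
    """Return the 1-based instruction numbers after which a GC would free the object."""
    return [
        number
        for number, (_, holder, gc_regs) in enumerate(listing, start=1)
        if holder is not None and holder not in gc_regs
    ]


print(unsafe_gc_points(VERIFIABLE))   # []
print(unsafe_gc_points(WITH_CONV_I))  # [3, 4]
```

The instruction column is identical in both tables, which is the whole point: the only thing that differs is the third column, and that column is invisible if all you look at is the disassembly.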

 

[Edit: Fixed fonts and a typo]

 


Comments (14)

  1. Sam says:

    Good stuff,

    Do you have any links to information on *how* to produce verifiable IL when using reflection emit?

    I would like to learn more…

  2. David Notario says:

    Sam, if your question is how to verify that the IL you generate is verifiable, 2 ways of doing it:

    1) Save the assembly you emit to disk, run peverify on it. This will make sure that all methods in your assembly are verifiable

    2) Refuse SkipVerification permission. You can add the following attribute to your assembly:

    [assembly:PermissionSetAttribute(SecurityAction::RequestRefuse,Name="SkipVerification")];

    The downside of this approach is that verification will be done at runtime (i.e., it verifies as you go, so you may not cover all your code if not all of it is used).

    Producing verifiable code is about following the IL verification rules as described in the ECMA spec. They’re mostly type safety rules, so I don’t think you will find much there that is surprising.

    Let me know if you have any other questions.

  3. Hi,

    I try to always generate verifiable code, but in some cases that isn’t as easy as you might think. For example, is the following code verifiable? Whidbey’s PEVerify doesn’t think so.

    .assembly extern mscorlib {}

    .assembly test {}

    .module test.exe

    .class private auto ansi beforefieldinit Test

    extends [mscorlib]System.Object

    {

    .method private hidebysig specialname rtspecialname

    instance void .ctor() cil managed

    {

    .maxstack 1

    .try

    {

    ldstr "try"

    call void [mscorlib]System.Console::WriteLine(string)

    leave.s label

    }

    finally

    {

    ldstr "finally"

    call void [mscorlib]System.Console::WriteLine(string)

    endfinally

    }

    label:

    ldarg.0

    call instance void [mscorlib]System.Object::.ctor()

    ret

    }

    .method private hidebysig static void Main() cil managed

    {

    .entrypoint

    .maxstack 1

    newobj instance void Test::.ctor()

    pop

    ret

    }

    }

  4. David Notario says:

    Jeroen, I believe the verifier wants the .ctor of Test to call its base constructor before doing anything else (to guarantee the state of the base class). If you just call Object::.ctor as the first thing in your .ctor you’ll become verifiable.

    I’ll confirm this with verifier guys.

  6. Thanks. Believe me, I do understand the issues involved. I’ve written a Java VM for .NET and this is kind of annoying (I could work around it by hoisting the try/finally code into a static method).

    BTW, the ECMA spec doesn’t say anything about this (apart from the fact that you must call the base class constructor before returning from the constructor).

    There’s also a related issue. I’d like to be able to do (also in a constructor):

    try

    {

    // constructor body

    // including call to base class ctor

    }

    finally

    {

    GC.KeepAlive(this);

    }

    This is needed to make sure the object isn’t GC’ed before the constructor ends (which is required by the JVM spec). However, this code isn’t verifiable because this may be uninitialized and the verifier doesn’t understand that GC.KeepAlive is a harmless method.

    (This same issue also applies to Monitor.Enter/Exit, the verifier should also allow these methods to be called with an uninitialized this).

  7. David Notario says:

    Jeroen, first, thanks for doing the right thing (being verifiable). I agree with you that the status quo is not the best it could be. I spoke with the verifier people. It turns out that some of our verifier rules are not in ECMA. Some of them are just implementation limitations we don’t want to standardize, and others will eventually get there.

    Let me know of other stuff you find, I’ll make sure we’re tracking it (I can start a post where I can add these as we find them, as a way of having them public until they get standardized)

    I assume you need the KeepAlive for the case where the instance .ctor gets inlined and you have code like this

    .ctor()

    {

    some_field = 0;

    // if .ctor was inlined at this point, with no other references it could be collected.

    Console.WriteLine("Hello");

    }

    Why do you need the GC.KeepAlive inside a finally? GC.KeepAlive should be treated as a side effect, so just adding the KeepAlive at the end should do what you want (assuming that in the case of an exception you don’t care, as the .ctor will never complete)

    If that doesn’t work:

    Translate

    java_equivalent_of_newobj

    to:

    newobj instance_ctor

    dup

    call GC.KeepAlive

    In both cases you also get rid of the EH overhead.

    Let me know if you need anything else.

  8. Thanks. I think the exception scenario may also be important. For some constructor parameter values, the JIT may be able to inline the constructor and eliminate code in such a way that it can detect that an exception will be thrown; this will affect the liveness of the this pointer. BTW, the reason this is important is not so much because of external side effects (like the Console.WriteLine example you give), but because the constructor might contain native method calls that use an unmanaged resource. If the finalizer runs before the constructor terminates, the unmanaged resource will be freed while the native code is running. This is obviously not good 😉

    I did consider your alternate suggestion (adding the KeepAlive after the newobj), but that doesn’t satisfy me either, because other .NET languages obviously won’t do that, so interop will suffer.

    One potential solution I thought of yesterday is to add a call to KeepAlive after every call that occurs in the constructor (once the this has been initialized). That could still theoretically (in some obscure scenarios) cause problems when the base class constructor throws an exception, but I think it might be the best workaround.

  9. Oh, and additionally, the try/finally before the base class ctor call is actually a real issue as well (well, it’s actually try/catch since the JVM doesn’t have a try/finally construct).

    Take the following Java code:

    class Base {

    Base(Class c) { … }

    }

    class Derived extends Base {

    Derived() {

    super(SomeClass.class);

    }

    }

    (The "SomeClass.class" construct is equivalent to C#’s typeof(SomeClass))

    One particular Java compiler compiles the Derived ctor as:

    Class c;

    try {

    c = Class.forName("SomeClass");

    } catch(ClassNotFoundException) {

    throw new NoClassDefFoundError();

    }

    super(c);

    My current (straightforward) translation of this construct is rejected by PEVerify as not verifiable (however, note that the CLR verifier doesn’t agree with PEVerify on this!)

    What I’d like to know is whether PEVerify is right, or the runtime is right.

  10. Speaking of dynamic IL generation …

    Before Whidbey, the framework supplied two ways of creating code…

  11. Managed C++ (MC++) code generation is a cool accomplishment. I think it’s

    another good testimony…
