Subtleties of C# IL codegen


It must be CLR week over at The Old New Thing because it’s been non-stop posts about C# lately. Raymond’s last two technical posts have been about null checks and no-op instructions generated by the jitter when translating IL into machine code.  

I’ll comment on both posts here, but I want to get the no-op discussion done first, because there are some subtleties to it. I believe that Raymond’s statement that the jitter does not generate no-ops when not debugging is not entirely correct. This is not a mere nitpick; as we’ll see, whether it does so or not actually has semantic relevance in rare stress cases.

Now, I’ll slap a disclaimer of my own on here: I know way more about the compiler than about the jitter/debugger interaction. This is my understanding of how it works. If someone who actually works on the jitter would like to confirm my and Raymond’s interpretation of what we see going on here, I’d welcome that.

Before I get into the details, let me point out that in the C# compiler, “debug info emitting on/off” and “IL optimizations on/off” are orthogonal settings. One controls whether debug info is emitted, the other controls what IL the code generator spits out. It is sensible to set them as opposites but you certainly do not have to.
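For instance, with the csc.exe command-line compiler the two settings are separate switches, and any combination is legal (a sketch; project systems typically pair them, but nothing forces you to):

```shell
rem /debug controls debug info emission; /optimize controls IL optimization.
rem They are independent; all four combinations are legal.

rem The usual "Debug" configuration:
csc /debug+ /optimize- Program.cs

rem The usual "Release" configuration:
csc /debug- /optimize+ Program.cs

rem Debug info on AND optimized IL:
csc /debug+ /optimize+ Program.cs
```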

With optimizations off, the C# compiler emits no-op IL instructions all over the place.  With debug info on and optimizations off, some of those no-ops will be there to be targets of breakpoints for statements or fragments of expressions which would otherwise be hard to put a breakpoint on.

The jitter then cheerfully turns IL no-ops into x86 no-ops. I suspect that it does so whether there is a debugger attached or not.

Furthermore, I have not heard that the jitter ever manufactures no-ops out of whole cloth for debugging purposes, as Raymond implies. I suspect — but I have not verified — that if you compile your C# program with debug info on AND optimizations on, then you’ll see a lot fewer no-ops in the jitted code (and your debugging experience will be correspondingly worse). The jitter may of course generate no-ops for other purposes — padding code out to word boundaries, etc.

Now we come to the important point: It is emphatically NOT the case that a no-op cannot affect the behaviour of a program, as many people incorrectly believe.

In C#, lock(expression) statement is a syntactic sugar for something like

temp = expression;
System.Threading.Monitor.Enter(temp);
try { statement } finally { System.Threading.Monitor.Exit(temp); }

The x86 jitter has the nice property that the code it generates guarantees that an exception is never thrown between the Enter and the try. This means that the finally always executes if the lock has been taken, which means that the locked resource is always unlocked.

That is, unless the C# compiler generates a no-op IL instruction between the Enter and the try! The jitter turns that into a no-op x86 instruction, and it is possible for another thread to cause a thread abort exception while the thread that just took the lock is in the no-op. This is a long-standing bug in C# which we will unfortunately not be fixing for C# 3.0.

If the scenario I’ve described happens then the finally will never be run, the lock will never be released and hey, now we’re just begging for a deadlock.
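In terms of the expanded code, the dangerous window looks like this (a sketch; `expression` and `statement` stand in for the user’s code, and the comment marks where the stray no-op lands):

```csharp
object temp = expression;
System.Threading.Monitor.Enter(temp);
// <-- a no-op emitted here lies OUTSIDE the protected region; a
//     ThreadAbortException delivered at exactly this point means the
//     finally below never runs and the lock is never released
try
{
    statement;
}
finally
{
    System.Threading.Monitor.Exit(temp);
}
```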

That’s the only situation I know of in which emitting a no-op can cause a serious semantic change in a program — turning a working program into a deadlocking one. And that sucks.

I’ve been talking with some of the CLR jitter and threading guys about ways we can fix this more robustly than merely removing the no-op. I’m hoping we’ll figure something out for some future version of the C# language.

As for the bit about emitting null checks: indeed, at the time of a call to an instance method, whether virtual or not, we guarantee that the object of the call is not null by throwing an exception if it is. The way this is implemented in IL is a little odd. There are two instructions we can emit: call, and callvirt. call does NOT do a null check and does a non-virtual call. callvirt does do a null check and does a virtual call if it is a virtual method, or a non-virtual call if it is not.

If you look at the IL generated for a non-virtual call on an instance method, you’ll see that sometimes we generate a call, sometimes we generate a callvirt. Why? We generate the callvirt when we want to force the jitter to generate a null check. We generate a call when we know that no null check is necessary, thereby allowing the jitter to skip the null check and generate slightly faster and smaller code.

When do we know that the null check can be skipped? If you have something like (new Foo()).FooNonVirtualMethod() we know that the allocator never returns null, so we can skip the check. It’s a nice, straightforward optimization, but the realization in the IL is a bit subtle.
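A minimal illustration (a sketch; `Foo` and the method names are invented for the example; what matters is that the method is a non-virtual instance method):

```csharp
using System;

class Foo
{
    public string NonVirtualMethod() { return "called"; }
}

static class Demo
{
    // The compiler emits callvirt for 'f.NonVirtualMethod()' even though
    // the method is not virtual, purely to force the null check; the
    // observable effect is the NullReferenceException caught below.
    public static string CallOnNull()
    {
        Foo f = null;
        try { return f.NonVirtualMethod(); }
        catch (NullReferenceException) { return "null check fired"; }
    }

    // Here the receiver comes straight from 'new', which never returns
    // null, so the compiler is free to emit a plain 'call' and skip
    // the check.
    public static string CallOnNew()
    {
        return new Foo().NonVirtualMethod();
    }

    static void Main()
    {
        Console.WriteLine(CallOnNull());   // prints: null check fired
        Console.WriteLine(CallOnNew());    // prints: called
    }
}
```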

Comments (23)

  1. Peter Ritchie says:

    The JIT does emit different code when a debugger is attached.  I don’t know if that specifically has an effect on no-ops; but it wouldn’t be surprising.  It would be fairly easy to see if the assembly was NGENed or attached to by the debugger after the code had been run outside the debugger.

  2. >>This means that the finally always executes if the lock has been taken,

    >>which means that the locked resource is always unlocked.

    Not always; if it’s in a background thread then the finally might not be called if the process ends (when all foreground threads have ended). I found that out the hard way when I had a finally that sometimes never completed in some code I wrote 🙂

    Great blog by the way!

    Regards

    Lee
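Lee’s scenario is easy to reproduce (a sketch; the Sleep durations and messages are invented for the demo):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        Thread worker = new Thread(delegate ()
        {
            try
            {
                Thread.Sleep(Timeout.Infinite);
            }
            finally
            {
                // Never runs: when the last foreground thread (Main)
                // exits, background threads are torn down without
                // executing their finally blocks.
                Console.WriteLine("finally ran");
            }
        });
        worker.IsBackground = true;   // background: does not keep the process alive
        worker.Start();
        Thread.Sleep(100);            // give the worker time to start
        Console.WriteLine("main exiting");
        // The process ends here; "finally ran" is never printed.
    }
}
```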

  3. mikey says:

    i’m sure there is a legitimate reason that the Monitor.Enter(temp) can’t go in the try. but i don’t know what it is. so … why not? it would seemingly make sense.

  4. Peter Ritchie says:

    Mikey: because if the Monitor.Enter were in the try block and it caused an exception, the finally block would always get executed.  You can only assume that if Monitor.Enter threw an exception that it didn’t lock.  If it didn’t lock then there’s no reason to enter the finally block to ensure that Monitor.Exit is called.  So, it’s outside the try block.

  5. mikey says:

    peter: true. oops.

    but surely there is a reasonably sensible way to handle that in the try? (i.e. wrap it in an inner try, perhaps, [not pretty i suppose]) or check if you received the lock before exiting.

  6. Peter Ritchie says:

    Mikey: sure, you can skip the whole C# lock keyword and do whatever you want with Monitor.Enter, Monitor.Exit, try and finally, adding any number of checks you want. But what would be the point?  Likely you’d want it so it’s debug-only; so you’ve got that complexity, and with all the other complexities added in, can you guarantee that all instances of that code will be fault free?

    Keep in mind, the scenario that Eric discusses will only occur on a debug build on a multi-processor computer, and two threads have to be executing the same instruction (essentially) at the same time.  That’s an extremely rare occurrence.  Yes, it might happen; but I wouldn’t suggest changing your code to compensate for it.  Debugging multithreaded code has many other problems.

  7. Eric Lippert says:

    > or check if you received the lock before exiting.

    As I said, we’re talking with the CLR guys to try to do something like that. For example, we could have a version of Enter which reports whether the lock was taken, and then put the Enter in the try. Then "lock(x) statement" would translate to

    bool mustExit = false;
    object temp = x;
    try {
        Enter(temp, out mustExit);
        statement;
    }
    finally {
        if (mustExit) Exit(temp);
    }

    Enter would have to set the out parameter atomically with taking out the lock.

    However, there are drawbacks to that approach as well.  I’ll probably write a blog article about that at some point.

  8. Derek Park says:

    Even if you patched up the block to guarantee that the finally block would be entered, isn’t there still the potential problem that the same thing could happen in the finally block?

    e.g.:  try { … } finally { NOP; cleanup; }

    Is it guaranteed that an exception cannot be thrown between the start of the finally block and the first statement of the finally block?  If not, it seems like that would need to be patched as well.

  9. Derek Park says:

    Peter: "Keep in mind, the scenario that Eric discusses will only occur on a debug built on a multi-processor computer and two threads have to be executing the same instruction (essentially) at the same time."

    That’s not how I read it.  It seemed that Eric was saying that the NOP can occur even when not in debug mode.  It also seems that the problem wasn’t from two threads executing the same code, but instead, that one thread is attempting to take the lock when another thread kills it.  This is probably (hopefully) still rare, but not as rare as what you’re describing.

  10. Peter Ritchie says:

    Derek: to be clear, yes you can get NOPs to be emitted in release mode; but you’d have to disable optimizations.  Optimizations by default are only disabled for debug mode.

    To correct myself: it’s not that the two threads would be executing the same instruction at the same time, it’s that one thread would need to be executing the NOP after Monitor.Enter (the try block begins at the instruction after that, which may also be a nop, i.e. the next instruction is not a "try" instruction) and the other thread would have to call that thread’s Abort method while the NOP instruction was being executed.  I would think that would be even more rare.

  11. Peter Ritchie says:

    Eric: could the .try directive simply not include the NOP following the Monitor.Enter to solve the problem?  

    This is what I’m seeing in IL:

       L_0011: call void [mscorlib]System.Threading.Monitor::Enter(object)
       L_0016: nop
       L_0017: nop
       L_0018: ldc.i4.1
       L_0019: stloc.1
       L_001a: nop
       L_001b: leave.s L_0025
       L_001d: ldloc.2
       L_001e: call void [mscorlib]System.Threading.Monitor::Exit(object)
       L_0023: nop
       L_0024: endfinally
       L_0025: nop

      .try L_0017 to L_001d finally handler L_001d to L_0025

    What would be the debugging consequences of changing it to:?

      .try L_0016 to L_001d finally handler L_001d to L_0025

  12. mikey says:

    peter:

    do you mean trying it to L_0018 instead of L_0016? otherwise you’re including two nops. also, wouldn’t the fact that you aren’t including the nops mean you can no longer put a break point on the start of the

    try {

    statement?

    i think a nop before a try is valid and fine; it just seems that the generated Monitor.Enter should be within the try, with a ‘achievedLock’ boolean result from .Enter.

  13. Peter Ritchie says:

    Mikey: Yes, L_0016 (which would include both NOPs in the protected region, aka the try block), not L_0018.  Yes, I suppose that would not allow you to put a break point at the start of a try block, but Visual C# 2005 doesn’t let me do that anyway.  I can put a break point on the open brace, not the try, which puts it at the second NOP, at the start of the protected region (I’m assuming the x86 code mirrors the IL here: two x86 NOP instructions with a one-to-one relationship).  If including both NOPs in the protected region gets around this issue, let’s do it, or just get rid of the first NOP.

    lock(someObject)

    { … }

    is a different issue.  I need to be able to break on the lock statement because I need to break before the call to Monitor.Enter (I can get that with the try because Monitor.Enter is on its own line).  Also with lock I can break on the open brace; and as with the try, the breakpoint on the open brace puts it at the second NOP, at the start of the protected region.

    Now, you might be saying, "well, having both NOPs allows me to fine-tune my breakpoints in the disassembly"; but it doesn’t: you already have a much finer-grained ability to break on disassembly instructions, and adding a couple of NOPs adds nothing.  In disassembly I can put a breakpoint on the call to Monitor.Enter, or on the first instruction in the protected region, just as I can with the C# debugger.  I see no need to put a breakpoint after the call to Enter but before the protected region (the first NOP), and no need to put a breakpoint before the first instruction in the protected region but within the protected region (the second NOP).

    Now, as far as I can tell (Eric can correct me or validate this), this is strictly a C# compiler thing (the two NOPs).  I believe he’s said he really doesn’t know why the NOPs are there, and his interpretation is that they exist to provide an instruction to put a breakpoint on.  That explains one NOP but not two, since the C# debugger only allows you to put a breakpoint on one of them, the one in the protected region; so the first NOP, the one causing all the fuss, isn’t even being used by the debugger.

  14. Tanveer Badar says:

    Eric perhaps you can help out with this? Any explanation?

    http://11011.net/archives/000714.html

  15. Continuing the theme of Thread.Sleep is a sign of a poorly designed program, I've been meaning to

  16. andreister says:

    Hi there!  Great, but I have two questions:

    "…The jitter may of course generate no-ops for other purposes — padding code out to word boundaries, etc.."

    Isn’t that true that a code is ALWAYS padded out to word boundaries by default?? Why do we need nops for that?

    "…I suspect — but I have not verified — that if you compile your C# program with debug info on AND optimizations on…"

    Maybe should read "debug info *off* AND optimizations on"?

    Please correct me if I’m wrong…

  17. Eric Lippert says:

    > Isn’t that true that a code is ALWAYS padded out to word boundaries by default??

    No, that is not true. (Hint: you need to think about all possible architectures, not just x86.)

    > Why do we need nops for that?

    The jump instruction is faster on a 64 bit machine if it jumps to an instruction aligned on an eight byte boundary. The jitter may therefore choose to introduce nops so that frequently targeted instructions — like loop beginnings — are aligned to an eight byte boundary.

    > Maybe should read "debug info *off* AND optimizations on"?

    No, I meant "on". This echoes my earlier point that debug info vs optimization is orthogonal. Having debug info on does not change the IL codegen.

  18. andreister says:

    Eric thanks for the answers!

    Another portion:

    "…it is possible for another thread to cause a thread abort exception while the thread that just took the lock is in the no-op…"

    1. Does this mean, say, thread A has acquired the lock and is standing on the nop – but thread B at this point causes thread A to abort? Or do you see any more complicated scenarios?

    2. Why simply including Monitor.Enter in the try block wouldn’t help?

  19. Eric Lippert says:

    1) That is the scenario I had in mind, yes.

    2) Because then you have the opposite problem. What if the thread abort happens _before_ the lock is taken out? The finally will then release a lock that was never taken, which is potentially as bad as never releasing a taken lock.

    What we need is Enter(object obj, out bool entered). If we had such a method then we could generate

    object temp = expr;
    bool entered = false;
    try {
        Enter(temp, out entered);
        statement
    } finally {
        if (entered) Exit(temp);
    }

    which would have none of these problems. I am hoping that in a future version of C#/CLR we have such a method available to us.

  20. Peter Ritchie says:

    Eric, were you hoping that the C# compiler would then use that new Enter method for the lock keyword?  What is the likelihood of red bits being changed to accommodate that?  If Thread.Abort were deprecated, would a new Enter still be needed?

    Seems like a better idea considering the "hack" required by the CLR to make what we have thread-safe on x86 leaving 64-bit dangling in the wind.

  21. Eric Lippert says:

    > were you hoping that the C# compiler would then use that new Enter method for the lock keyword?

    Yes.

    > What is the likelihood of red bits being changed to accommodate that?

    Low. But not zero. (The implementation of that functionality already exists in the red bits, it is just not publicly exposed.)

    > If Thread.Abort was deprecated, would a new Enter still be needed?

    If wishes were horses, would beggars ride?  

    I try not to reason from counterfactuals. It is unlikely that Thread.Abort will be deprecated, and even if it were, deprecated does not mean nonexistent.

    > Seems like a better idea

    That’s the idea, yes.

  22. andreister says:

    Eric, and what about another problem: that the C# lock is translated to Monitor.Enter and not to Monitor.TryEnter (although the latter could possibly help to avoid deadlocks)?

    Something like IanG (among others) was worrying about back in 2004 http://www.interact-sw.co.uk/iangblog/2004/03/23/locking

    Are there any trends in the compiler to address that in the future? Or do the compiler guys perhaps think that those who don’t like the neat beauty of lock could just use TryEnter and relax? 🙂
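The hand-rolled pattern the commenter alludes to looks something like this (a sketch; the class, the timeout value, and the messages are invented for the example; Monitor.TryEnter with a timeout is a real overload):

```csharp
using System;
using System.Threading;

static class Guarded
{
    static readonly object Sync = new object();

    public static string DoWork()
    {
        // Monitor.TryEnter with a timeout, instead of lock's unbounded
        // Monitor.Enter: fail loudly rather than deadlock silently.
        if (!Monitor.TryEnter(Sync, TimeSpan.FromSeconds(30)))
        {
            return "timed out; possible deadlock";
        }
        try
        {
            return "acquired";   // ...protected work goes here...
        }
        finally
        {
            Monitor.Exit(Sync);
        }
    }
}
```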

  23. A couple years ago I wrote a bit about how our codegen for the lock statement could sometimes lead to