Subtleties of C# IL codegen

It must be CLR week over at The Old New Thing because it's been non-stop posts about C# lately. Raymond's last two technical posts have been about null checks and no-op instructions generated by the jitter when translating IL into machine code.  

I'll comment on both posts here, but I want to get the no-op discussion done first, because there are some subtleties here: I believe that Raymond's statement that the jitter does not generate no-ops when not debugging is not entirely correct. This is not a mere nitpick -- as we'll see, whether it does so or not actually has semantic relevance in rare stress cases.

Now, I'll slap a disclaimer of my own on here: I know way more about the compiler than about the jitter/debugger interaction. This is my understanding of how it works. If someone who actually works on the jitter would like to confirm or correct my and Raymond's interpretations of what we see going on here, I'd welcome that.

Before I get into the details, let me point out that in the C# compiler, "debug info emitting on/off" and "IL optimizations on/off" are orthogonal settings. One controls whether debug info is emitted, the other controls what IL the code generator spits out. It is sensible to set them as opposites but you certainly do not have to.
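
For instance, with the command-line compiler the two settings are independent switches. A quick sketch (the typical project-system configurations happen to pair them, but nothing forces you to):

rem debug info on, IL optimizations off -- the typical "debug" pairing
csc /debug+ /optimize- program.cs

rem debug info on AND optimizations on -- perfectly legal, just less common
csc /debug+ /optimize+ program.cs

rem debug info off, optimizations on -- the typical "release" pairing
csc /debug- /optimize+ program.cs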

With optimizations off, the C# compiler emits no-op IL instructions all over the place. With debug info on and optimizations off, some of those no-ops are there to serve as targets for breakpoints on statements or fragments of expressions that would otherwise be hard to set a breakpoint on.
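
To see this for yourself, compile a trivial method with optimizations off and open the result in ILDASM. You'll get something roughly like this -- a hand-transcribed sketch, not exact compiler output:

// C# source
static int M(int x)
{
    int y = x + 1;
    return y;
}

// unoptimized IL, approximately
nop              // breakpoint target for the opening brace
ldarg.0
ldc.i4.1
add
stloc.0          // int y = x + 1;
ldloc.0
stloc.1          // return value parked in a temporary
br.s RESULT      // unoptimized codegen detours through a single exit point
RESULT: ldloc.1
ret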

The jitter then cheerfully turns IL no-ops into x86 no-ops. I suspect that it does so whether there is a debugger attached or not.

Furthermore, I have not heard that the jitter ever manufactures no-ops out of whole cloth for debugging purposes, as Raymond implies. I suspect -- but I have not verified -- that if you compile your C# program with debug info on AND optimizations on, then you'll see a lot fewer no-ops in the jitted code (and your debugging experience will be correspondingly worse). The jitter may of course generate no-ops for other purposes -- padding code out to word boundaries, etc.

Now we come to the important point: It is emphatically NOT the case that a no-op cannot affect the behaviour of a program, as many people incorrectly believe.

In C#, the lock(expression) statement is syntactic sugar for something like

temp = expression;
System.Threading.Monitor.Enter(temp);
try { statement } finally { System.Threading.Monitor.Exit(temp); }

The x86 jitter has the nice property that the code it generates guarantees that an exception is never thrown between the Enter and the try. This means that the finally always executes if the lock has been taken, which means that the locked resource is always unlocked.

That is, unless the C# compiler generates a no-op IL instruction between the Enter and the try! The jitter turns that into a no-op x86 instruction, and it is possible for another thread to cause a thread abort exception while the thread that just took the lock is in the no-op. This is a long-standing bug in C# which we will unfortunately not be fixing for C# 3.0.
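
In IL terms the dangerous shape looks something like this -- again a hand-written sketch rather than exact compiler output:

ldloc.0
call void [mscorlib]System.Threading.Monitor::Enter(object)
nop          // an unoptimized build puts a no-op here; a thread abort
             // delivered at this instruction means the lock was taken
             // but the protected region is never entered
.try
{
    // statement
    leave.s DONE
}
finally
{
    ldloc.0
    call void [mscorlib]System.Threading.Monitor::Exit(object)
    endfinally
}
DONE:

With no instruction between the call to Enter and the .try, there is no window in which the abort can land; the no-op opens one.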

If the scenario I've described happens, then the finally will never run, the lock will never be released, and hey, now we're just begging for a deadlock.

That's the only situation I know of in which emitting a no-op can cause a serious semantic change in a program -- turning a working program into a deadlocking one. And that sucks.

I've been talking with some of the CLR jitter and threading guys about ways we can fix this more robustly than merely removing the no-op. I'm hoping we'll figure something out for some future version of the C# language.

As for the bit about emitting null checks: indeed, at the time of a call to an instance method, whether virtual or not, we guarantee that the receiver of the call is not null by throwing an exception if it is. The way this is implemented in IL is a little odd. There are two instructions we can emit: call and callvirt. call does NOT do a null check and always does a non-virtual call. callvirt does do a null check, and does a virtual call if the method is virtual or a non-virtual call if it is not.

If you look at the IL generated for a non-virtual call on an instance method, you'll see that sometimes we generate a call, sometimes we generate a callvirt. Why? We generate the callvirt when we want to force the jitter to generate a null check. We generate a call when we know that no null check is necessary, thereby allowing the jitter to skip the null check and generate slightly faster and smaller code.

When do we know that the null check can be skipped? If you have something like (new Foo()).FooNonVirtualMethod(), we know that the allocator never returns null, so we can skip the check. It's a nice, straightforward optimization, but the realization in the IL is a bit subtle.
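
Putting the two cases side by side (the identifiers here are made up for illustration; the exact IL varies by compiler version):

class Foo
{
    public void FooNonVirtualMethod() { }
}

class Program
{
    static Foo GetFoo() { return new Foo(); }

    static void Main()
    {
        Foo f = GetFoo();

        // f might be null, so the compiler emits
        //   callvirt instance void Foo::FooNonVirtualMethod()
        // to force the jitter to generate the null check.
        f.FooNonVirtualMethod();

        // The allocator never returns null, so the compiler emits
        //   call instance void Foo::FooNonVirtualMethod()
        // and the null check is skipped.
        (new Foo()).FooNonVirtualMethod();
    }
}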