What are these strange cmp [ecx], ecx instructions doing in my C# code?


When you debug through some managed code at the assembly level, you'll find a whole lot of seemingly pointless instructions that perform a comparison but ignore the result. What's the point of comparing two values if you don't care what the result is?

In C++, invoking an instance method on a NULL pointer results in undefined behavior. In other words, if you do it, the compiler is allowed to do anything it wants. And what most compilers do is, um, nothing. They don't take any special steps if the this pointer is NULL; they just generate code on the assumption that it isn't. In practice, this often means that everything seems to run just fine until you access a member variable or call a virtual function, and then you crash.

The C# language, by comparison, is quite explicit about what happens if you invoke an instance method on a null object reference:

The value of E is checked to be valid. If the value of E is null, a System.NullReferenceException is thrown and no further steps are executed.

The null reference exception must be thrown before the method can be called. That's what the strange cmp [ecx], ecx comparison is for.¹ The compiler doesn't actually care what the result of the comparison is; it just wants to raise an exception if ecx is null. If ecx is null, the attempt to dereference it (in order to perform the comparison) will raise an access violation, which the runtime inspects and turns into a NullReferenceException.
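To make the ordering concrete, here is a minimal C++ sketch of the rule the C# specification describes: validate the receiver, and only then run the method. Everything in it is made up for illustration (the NullReferenceError type, the Widget class, the CheckedCall helper); it is not how the CLR implements the check. The sketch spells the check out as an explicit branch, whereas the JIT, as far as I can tell, gets the same effect from a load that faults.

```cpp
#include <stdexcept>

// Hypothetical stand-in for System.NullReferenceException.
struct NullReferenceError : std::runtime_error {
    NullReferenceError() : std::runtime_error("null reference") {}
};

// Hypothetical class with a non-virtual method that counts its calls.
struct Widget {
    int calls = 0;
    int Touch() { return ++calls; }
};

// Validate the receiver first, then invoke the method.
// If the receiver is null, the method body never runs.
template <typename T, typename R>
R CheckedCall(T* receiver, R (T::*method)()) {
    if (receiver == nullptr) {
        throw NullReferenceError();
    }
    return (receiver->*method)();
}
```

Calling CheckedCall(&w, &Widget::Touch) on a real Widget runs the method as usual; passing a null Widget* throws before Touch is ever entered. The explicit compare-and-branch here is the cost that a single faulting instruction avoids on the common path.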

The test is usually against the ecx register since the CLR happens to use² the fastcall calling convention, which for instance methods passes the this pointer in the ecx register. The pointer the compiler wants to test is going to wind up in the ecx register sooner or later,³ so it's not surprising that the test, when it happens, is made against the ecx register.

Nitpicker's Corner

¹Although this statement is written as if it were a fact, it is actually my interpretation based on observation and thinking about how language features are implemented. It is not an official position of the CLR team nor Microsoft Corporation, and that interpretation may ultimately prove incorrect.

²"Happens to use" means that this is an implementation detail, not a contractual guarantee.¹

³Unless the call is optimized. For example, the function might be inlined.

Comments (41)
  1. David Walker says:

    You say “if ecx is null”… But the assembly-language registers
    don’t actually contain “null” in this case, do they?  Do they
    contain zero?  I suppose that cmp ecx, ecx will raise an access
    violation if ecx is zero.

    Since I’m not an assembly language programmer, maybe there is some
    special value in a register that means the corresponding C# object is
    null.  (I’m not really trying to nitpick, honest.)  I
    couldn’t figure this out with a Google search.

    [“If ecx contains a value corresponding to a null
    reference”. I thought it was obvious that knowledge of assembly
    language was a prerequisite for this article, seeing as if you don’t
    know assembly language, then you’ll never see “cmp [ecx], ecx” in the
    first place, so you will never ask the question in the title. -Raymond
    ]
  2. Jeff Stong says:

    It’s comical (in a sad way) to see that even your footnotes require footnotes.  I find the blog enlightening and entertaining.  Please don’t let the need for the footnotes discourage you from continuing the blog — it would be sorely missed.

  3. Ulric says:

    strcmp’s post just kick-started my brain; it didn’t process the de-referencing brackets when I first read this earlier. (Is this truly the cheapest way to do this?)

  4. Mike Dimmick says:

    Ah, I see, deliberate access violation – the (virtual) address range between 0 and 64KB (0x0000’0000 to 0x0001’0000) is reserved in Windows and always causes an access violation when you access it. The CLR looks at the address that the offending instruction was trying to read from/write to and if it’s 0 (possibly up to 64KB) it’ll interpret that as a NullReferenceException, otherwise it’s an AccessViolationException.

    Getting the hardware to generate an access violation isn’t particularly quick, but it’s supposed to be the exceptional case. We’re worried about the performance of the regular case of the pointer not being null. Presumably this instruction was selected as being one that pipelines well on Intel and AMD processors without introducing too many other dependencies, while still being pretty small, only two bytes. As a side-effect it catches any other bad pointers, as well as just null ones. The value of [ecx] is likely to be needed anyway if making a virtual call – C++ compilers usually put the vtable pointer as the first member of the object, and I think this was carried over into the CLR.

  5. "Raymond’s current temper tantrum"

    I would call it "justifiable and increasing irritation with commenters being demanding, whiny, pedantic, clueless or argumentative". It is a PERSONAL blog, which he does for PERSONAL enjoyment. Thank you for lessening the latter another notch by using this term.

  6. EricLippert says:

    An additional data point:

    As Raymond correctly conjectures, the jitter turns a callvirt into machine code which ensures that the "this" reference is good before the call. This is why the C# compiler sometimes generates callvirt instructions even when we are making a call to a non-virtual instance method — we want to get the null check "for free".

    If the C# compiler detects that it is doing a call on a non-virtual instance method and we can deduce at compile time that the object of the call cannot possibly be null, then we sometimes do just an ordinary call without forcing the check. This allows the jitter to generate slightly more efficient code.

    For example, if you have

    GiveMeAFoo().FooNonVirtMethod()

    we will generate a virtual call to FooNonVirtMethod so that we get the null check.  But if you have

    (new Foo()).FooNonVirtMethod()

    then we know that the "new" has already thrown an exception if the allocation failed, so this cannot be null, therefore we just generate a call so that we skip the null check.

  7. Drew Hoskins says:

    The effect of this is interesting… aggregating fields by value is in general no faster on .NET than aggregating by reference.  In the former case, it will check *this, and in the latter case, it won’t bother since it’s already dereferencing the field reference, and will get an exception there.  In each case you have one memory load.  At least, that’s what I found to be true with C++/CLI on VS 2005.

  8. Phaeron says:

    I wonder why cmp [ecx], ecx was used instead of mov eax, [ecx]. Was it due to register pressure? The cmp form is slightly more expensive as it takes two uops (load + alu).

  9. Jules says:

    David: think of it this way… assembly language registers are effectively untyped[*].  In the instruction ‘cmp [ecx], ecx’, ecx is being used as a pointer, therefore it does make sense to say that it could contain null.

    * Obviously, I mean "untyped in as far as that’s possible with a very limited data storage capacity".

  10. @Phaeron: Yes of course. Your code requires the jitter to find (or make) a free register.

  11. strcmp says:

    David, don’t let Raymond’s current temper tantrum get you down.  Hopefully it’ll pass soon.  You can think of cmp [ecx],ecx as equivalent* to:

    if( *p == p );

    So naturally if p == 0 (NULL is just 0 **), you’ll throw an exception on *p ([ecx]) because p (ecx) is NULL.  If p is a valid memory reference, the instruction is essentially a no-op.

    * If you nitpick this statement, you live a sad, sad life.

    ** Same with this statement.  The majority of people really honestly don’t care about the more precise answer.

  12. xiaoguo ge says:

    Do this check and use have to be thread safe? If they do, then the cost would be huge. And how does the compiler ensure it is thread safe?

    [What could render a null pointer test thread-unsafe? -Raymond]
  13. Pax says:

    Phaeron said:

    =====

    I wonder why cmp [ecx], ecx was used instead of mov eax, [ecx]. Was it due to register pressure? The cmp form is slightly more expensive as it takes two uops (load + alu).

    =====

    Only if you define expense in terms of time :-).  What’s the expense of having your eax register shredded from underneath you (or having to push/pop around the call)?

    Why use an instruction that’s arguably more side-effective than the cmp?  Cmp changes the flags, mov changes a register which may be in use by the calling code.  Given that most instructions would affect the flags, it’s likely that a given sequence of code is more likely to have ‘changes flags’ than ‘changes a register’ as a side-effect.  My question was along the lines of "why use cmp instead of test?".  I don’t have the energy to go and analyse instruction cycles for them so I’ll defer to others.

    Xiaoguo Ge stated:

    =====

    Do this check and use have to be thread safe? If they do, then the cost would be huge. And how does the compiler ensure it is thread safe?

    =====

    Does Intel still only allow interrupts between instructions (the ‘186 was the last chip I coded in ASM so I’m unsure)?  If so, there’s no problem with thread-safety.  The check-and-use of cmp is irrelevant since we don’t care what the result of the comparison would be – we’re just using the memory-access part of that instruction to raise an exception if it’s NULL.

  14. Burble says:

    I like the footnote in the footnote.  In the future, you could add footnote 2 to footnote 1, and vice versa.  Then all the nitpickers would end up stuck in an infinite loop and have no further impact on your writing.

  15. Alpha Male says:

    > [What could render a null pointer test thread-unsafe? -Raymond]

    Are you assuming x86?

    [Assume any processor that is supported by the CLR. -Raymond]
  16. Xavi says:

    “so you will never ask the question in the title”

    Ok… this is it.

    Sorry to spoil your weekend Raymond, but you can delete me from your readers list. No, begging won’t help.

    The way I see it, your knowledge doesn’t make up for your attitude anymore. Your cynical and arrogant response to a polite (though in your view stupid) question just kicked me. Reading your blog gets me down. Seeing how you post to a broad audience and seeing how you can’t politely deal with the echo is a sad thing.

    To make a long story short, this is your blog and you can do what you want, now without me.

    [I don’t post to a broad audience, but a broad audience reads me. I’m trying to target advanced programmers. If I write an article about assembly language, then I’m going to assume you know assembly language. If you don’t, then you can skip that article. I’m not going to write an assembly language tutorial to “get everybody up to speed”; I assume that you’re already up to speed. -Raymond]
  17. @AlphaMale: given that the article is about the disassembly of an x86 machine code instruction, I think you can probably take it as a given that Raymond is assuming an x86 architecture.

  18. poochner says:

    I’m not seeing how this could be thread unsafe, any more than any other use.  It’s testing the this pointer for an object during the call to one of that object’s methods.  Say we’re calling a method on an object for which we have a member reference.  The thread that’s making the call has a working reference to the object.  If some other thread nulls out the member, that’s not going to change that the caller still has its reference.

    If the thread calling the referred object somehow gets a ref to itself back from the owner (via a getter, say), it could get a NullReferenceException, but that’s a normal thread-safety issue.

  19. Jeff Stong says:

    "In the future, you could add footnote 2 to footnote 1, and vice versa."

    This reminds me of a couple of entries in the glossary of a reference book I received (many years ago) with an Apple IIe.

    Infinite loop : See loop, infinite.

    and

    Loop, infinite : See infinite loop.

  20. It must be CLR week over at The Old New Thing because it’s been non-stop posts about C# lately. Raymond’s

  21. Mike Dimmick says:

    @Pax: why not use test? If you simply test the value of ecx against 0, you then need to add a branch to generate the exception (best case, JZ <exception-generator-code>, probably 6 bytes if you’re not going to keep a NullReferenceException generator somewhere near every single block of code that performs the check). If you’re going to use test [ecx], reg, that still requires a two-byte opcode and still incurs both a load and alu cycle.

    Adding the branch causes the branch predictor to do more work, and the processor to have to load more code, with the JIT having to allocate more memory for the code giving potentially more swapping. Remember, this is an operation we’re doing all the time!

    In terms of dependencies on other instructions, this only has a dependency on a previous write to ecx, and only generates downstream dependencies on the flags, and the code generated by the JIT shouldn’t be relying on the values of those anyway (since it will assume that either the pointer is good, or the code after it will not be reached because a hardware exception was raised). x86 doesn’t actually have that many conditional instructions – CMOV and the Jcc family are the main ones. Other architectures – ARM, Itanium, for example – have predicated instructions, but then ARM only sets the condition flags if asked to, the default is to leave the condition flags alone.

    This is another problem with the mov eax, [ecx] alternative – you make future instructions that use eax dependent on this one.

  22. Igor says:

    strcmp said : “If p is a valid memory reference, the instruction is essentially a no-op.”

    It isn’t.

    Actually if p is a valid memory reference, then CMP [ECX], ECX has a side effect of TLB priming and a prefetch.

    If there are two threads accessing the same object where one often writes to some member variable (say at [ECX + 8]), and another does this check each time before calling a method in a loop, you will have a false cache line sharing issue, which is, I believe, a more serious performance hit than the TEST ECX, ECX / JZ RaiseException.

    On second thought, perhaps you don’t need two threads at all. It would be enough for one thread to write the variable at [ECX + 8] and to perform CMP [ECX], ECX before a method call in a loop. That way the write to variable at [ECX + 8] would most likely always have to be written back to memory, creating a lot of unnecessary bus traffic.

    [It seems awful strange that a single processor would be forced to flush writes just because a read occurred on the same cache line. I’d think that was a very common coding pattern. -Raymond]
  23. Igor says:

    Let me just clarify that I am talking about x86 here and that RaiseException label should be at the end of the executable because static predictor predicts forward conditional jumps as not taken and it would be right 99% of the time. For that 1% when you get an exception it wouldn’t matter anyway.

  24. Igor says:

    “It seems awful strange that a single processor would be forced to flush writes just because a read occurred on the same cache line. I’d think that was a very common coding pattern. -Raymond”

    You are right, I got confused about that, but the problem stays with the threads and it seems that it can’t be avoided.

    [The question is, then, whether the JIT should optimize on the assumption that most objects are hot on multiple processors simultaneously, or whether it should optimize on the assumption that most objects are used by only one processor at a time. You appear to believe that the JIT should optimize for the former. I would guess that the latter is more reflective of the real world. -Raymond]
  25. EricLippert says:

    does judicious use of "sealed" prevent the generation of unnecessary callvirt calls?

    Sorry, I am not following your train of thought here. How would knowing that a class is sealed enable us to know that the object of a call to a non-virtual method is not null?

  26. Phaeron says:

    @KJK::Hyperion:

    It’s been a while, but I think the 16-bit DOS emulator might commit memory at 0. I guess writing managed NTVDM plugins isn’t a good idea. :)

  27. Igor says:

    “The question is, then, whether the JIT should optimize on the assumption that most objects are hot on multiple processors simultaneously, or whether it should optimize on the assumption that most objects are used by only one processor at a time.”

    How about going forward in step with the hardware for once? Multicore is almost a standard nowadays. Shouldn’t software follow and adapt?

    “I would guess that the latter is more reflective of the real world.”

    I am not into C# so correct me if I assume too much, but if someone for example writes a Queue class, wouldn’t it be necessary for multiple, not just two threads to access it? How would this CMP [ECX], ECX which is effectively hidden from a developer work to his advantage in that and any other similar case?

    [Most programs are not written with the design that all objects are highly multithreaded with lock-free algorithms. I bet yours aren’t. You can have all the multicore in the world it won’t make a difference. -Raymond]
  28. Igor says:

    And regarding this:

    “It seems awful strange that a single processor would be forced to flush writes just because a read occurred on the same cache line. I’d think that was a very common coding pattern.”

    After rethinking it I believe that there is a possibility for performance loss even with single-threaded code.

    If you have for example (in pseudo code):

    class Crap
    {
    public:
            Crap()
            {
            }

            ~Crap()
            {
            }

            double Calc()
            {
                    // does something smarter
                    // than this of course
                    return rand() * 0.01 + m_Var;
            }

            // return type added; the original omitted it
            void SetCrap(double Var)
            {
                    m_Var = Var;
            }

            double m_Var;
    };

    And then:

    double  a = 3.14, b;
    Crap    c;

    for (int i = 0; i < 1000; i++) {
            c.SetCrap(a);
            a += c.Calc();
    }

    If we assume that m_Var is at [ECX + 0] and if you perform CMP [ECX], ECX before Calc() method call you are probably blocking store to load forwarding because of a large store (double is 8 bytes) followed by a small load (CMP [ECX], ECX loads 4 bytes) thus effectively forcing the write to go through memory.

    But perhaps I am wrong and perhaps I don’t know nothing about code optimization. It is 3:46am here you know.

    [Step through some CLR assembly language to see what really happens. -Raymond]
  29. KJK::Hyperion says:

    Bonus disruption: you *can* allocate memory at virtual address 0 with VirtualAlloc and MapViewOfFile. Don’t do it even if you figure out how, though :-)

    EricLippert: does judicious use of "sealed" prevent the generation of unnecessary callvirt calls?

  30. Igor says:

    “Most programs are not written with the design that all objects are highly multithreaded with lock-free algorithms. I bet yours aren’t. You can have all the multicore in the world it won’t make a difference. -Raymond”

    But I wasn’t implying that queue should/must be lock-free. I just said that this invisible pointer test will hurt performance for any object which is accessed by multiple threads in a loop.

    “Step through some CLR assembly language to see what really happens. -Raymond”

    Great, now I have to install C#…

    [?? If there’s a lock then you’re going to have cache line contention anyway! I assume people are willing to do some basic research before asking questions. If you are not willing to do that, then I am equally unwilling to bother answering. -Raymond]
  31. KJK::Hyperion says:

    EricLippert: sorry, got confused. But "sealed" does turn virtual calls into normal calls, right? since you know the methods couldn’t possibly be overridden?

  32. Mike Dimmick says:

    @KJK::Hyperion: use of ‘sealed’ would indeed allow the compiler to generate call rather than callvirt for a virtual function call, if the static type of the variable is a sealed class. Eric’s discussion is about using callvirt to force a null check, as required by the language, even where a non-virtual call would have been possible.

  33. Igor says:

    “If there’s a lock then you’re going to have cache line contention anyway!”

    Of course but you can at least keep other often accessed data out of the cache line which holds the lock. I am talking about false sharing here, which may not be obvious to the developer.

    [What percentage of programs would benefit from this hyper-micro-optimization? Compare it to the cost to programs that don’t care. -Raymond]
  34. Igor says:

    "What percentage of programs would benefit from this hyper-micro-optimization?"

    Perhaps people from your SQL server team could tell you more about it. I remember them having performance issues with false sharing and blaming it all on the HyperThreading capable CPUs.

    "Compare it to the cost to programs that don’t care."

    Are you seriously implying that the cost of:

    TEST ECX, ECX

    JZ   NullPointerExceptionHandler

    Is higher than the cost of:

    CMP [ECX], ECX

    ???

    As far as I know from theory it goes like this:

    1. Cost of TEST/JZ is non-existent because forward conditional jump is predicted as not taken.
    2. Cost of CMP is much higher because it is a load from memory which can prime TLB, trigger a page fault, and it generates unnecessary bus traffic.

    Granted I haven’t constructed a test case because I have a busy schedule at the moment so I can’t claim the above is true for the real world code.

    I’ll see if I can find some time to check it out. Or I should perhaps trust Microsoft on this one because they have already compared the two methods on a variety of application types including threaded ones?!?

  35. EricLippert says:

    Re: using "call" on a sealed virtual method.

    Ah, I think I understand. I think your question is "if we are guaranteed that the object is not null AND we know the exact method that will be in the vtable because the class is sealed, then can we generate a call rather than a callvirt?"

    Yes, we could do that. We don’t.

  36. Dean Harding says:
    "Cost of CMP is much higher because it is a load from memory which can prime TLB, trigger a page fault, and it generates unnecessary bus traffic."

    I’m no assembly expert, but surely since [ECX] is your "this" pointer, and you’re only testing because you’re about to call a method on it anyway, then loading the value at [ECX] is offset by the fact that you’re about to use it anyway…?

  37. Igor says:

    "I’m no assembly expert, but surely since [ECX] is your "this" pointer, and you’re only testing because you’re about to call a method on it anyway, then loading the value at [ECX] is offset by the fact that you’re about to use it anyway…?"

    Yes but as I understand it (perhaps incorrectly?), you are always loading from [this + 0], not the actual pointer of the method you are going to call.

    I just gave my .02 when it comes to potential performance impact on the future code (especially threaded), perhaps I am wrong, time will tell.

  38. Pax says:

    @Igor stated: Are you seriously implying that the cost of:

    TEST ECX, ECX

    JZ   NullPointerExceptionHandler

    Is higher than the cost of:

    CMP [ECX], ECX

    ???

    As far as I know from theory it goes like this:

    1. Cost of TEST/JZ is non-existent because forward conditional jump is predicted as not taken.
    2. Cost of CMP is much higher because it is a load from memory which can prime TLB, trigger a page fault, and it generates unnecessary bus traffic.

    =====

    We’re still assuming cost is based on elapsed time?  Cost is a generic term which may be time, memory size, etc.  The ‘cmp ecx,0;jz offset’ method takes (I think) more bytes (3+2, best case assuming nullpointer label within short jump range) than ‘cmp [ecx],ecx’ (2), does it not?

    This increase in memory may not have much of an impact unless there’s LOTS of these tests compared to the rest of the code.

    But you also need to take into account that most ‘cmp [ecx],ecx’ instructions WON’T result in page faults as I would assume the code is debugged in production to the point where there’s no null pointers (TLB priming is beyond my expertise, thank $DEITY I code in high-level languages nowadays).

  39. David Walker says:

    "I thought it was obvious that knowledge of assembly language was a prerequisite for this article…"

    Well, I have actually written lots and lots of assembly language, but not on x86’s or compatible — it was on IBM and compatible mainframes.  And I have read my share of x86-compatible assembler code.

    I was truly confused by the concept of putting a null value into a register!

    Now I understand.  And Xavi, don’t leave because of Raymond’s response to me.  It wasn’t that bad.
