Drilling into .NET Runtime microbenchmarks: 'typeof' optimizations.

In my last blog entry I showed how to use a simple class called MultiSampleCodeTimer to measure the performance (time) of pretty much any fragment of CLR code quickly and easily. That entry had a solution packaged up as a zip file, TypePerfMeasurement.zip, that you could use to duplicate the experiment yourself. However, I did not have time to show you what is particularly interesting about those particular performance measurements (I picked them for a reason). Here I wish to complete the explanation.

When I unpack and run TypePerfMeasurement.zip (please do it yourself, detailed instructions at the end of my last post), I get the following output. 

 Data units of msec resolution = 0.279365 usec
10 typeof(string)                       : count: 10000    7.677 +- 3%    msec
10 typeof(string).TypeHandle            : count: 10000    0.000 +- >400% msec
10 anObj.GetType() == type              : count: 10000    7.393 +- 41%   msec
10 Type.GetTypeHandle(obj).Equals(tHnd) : count: 10000    4.554 +- 9%    msec
10 anObj.GetType() == typeof(string)    : count: 10000    0.103 +- 7%    msec
10 (anObj is string)                    : count: 10000    0.594 +- 12%   msec

Please take the time to run it yourself. You don't need Visual Studio to do so (see the readme.txt file; it is as easy as running 'buildit' and then running the resulting exe). Note that if you are running under Visual Studio and you get very different numbers depending on whether you run it under VS or outside VS, you did not set your VS settings as I describe in this post. Go and do that now. Your numbers are going to be different from mine, but the relative sizes between the various rows should be reasonably close.

What do these numbers mean? We need some background first. The .NET Runtime defines a type called System.Type, which is a representation of a type in the system. EVERY class that has been defined in the system has a corresponding System.Type. The System.Type object lets you explore at runtime all the data that the user typed when defining the type. You can get superclasses, methods, properties, fields, etc. by calling the appropriate operations on System.Type. This ability to inspect and invoke characteristics of the source code is called Reflection, and System.Type is your gateway into this capability.
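As a minimal sketch of what that gateway looks like in practice, the following uses only standard System.Type and System.Reflection calls. The class and helper names (ReflectionSketch, BaseTypeName, HasPublicProperty) are made up for illustration and are not part of TypePerfMeasurement.zip.

```csharp
using System;
using System.Reflection;

public static class ReflectionSketch
{
    // Returns the full name of the direct base class, or null if there is none
    // (for example System.Object or an interface type).
    public static string BaseTypeName(Type type)
    {
        return type.BaseType == null ? null : type.BaseType.FullName;
    }

    // True if 'type' exposes a public instance property with the given name.
    public static bool HasPublicProperty(Type type, string name)
    {
        return type.GetProperty(name, BindingFlags.Public | BindingFlags.Instance) != null;
    }

    public static void Main()
    {
        Type t = typeof(string);
        Console.WriteLine("Base type: " + BaseTypeName(t));
        Console.WriteLine("Has Length property: " + HasPublicProperty(t, "Length"));

        // List the public instance methods declared directly on the type.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo m in t.GetMethods(flags))
        {
            Console.WriteLine("  method: " + m.Name);
        }
    }
}
```

Every one of these calls goes through a System.Type object, which is exactly the cost the rest of this entry is concerned with.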

While the System.Type class is very powerful, that power comes at a cost. Most programs do NOT need this power and should NOT use it (I need to blog about that too) because of the perf cost. Thus the .NET runtime has tried to provide alternatives to using System.Type for some common operations that CAN be implemented efficiently.

One such operation is testing whether an object is of EXACTLY a certain type. The 'is' operator in C# can often be used for this (and is the preferred method); however, that mechanism will also match any subtype (e.g. ("foo" is object) returns true), and it only works for literal types (if the type were held in a variable of type System.Type, you could not use it). For example

     System.Type myType = typeof(string); // in real life not a literal
     // and then in other methods (and maybe in a loop)
     if (anObj.GetType() == myType) { /* some op */ }

In this example we use the 'GetType()' method on System.Object to see if 'anObj' is a string. Unfortunately, both of the operations used above (typeof and GetType()) are relatively expensive (in my data above, roughly 7 msec versus about 0.6 msec for the 'is' operator).
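The difference between the 'is' operator and an exact type test can be sketched as follows. The helper name IsExactly is hypothetical and exists only for this illustration; it is just the GetType() comparison from above wrapped in a method with a null guard.

```csharp
using System;

public static class ExactTypeSketch
{
    // Exact match only: returns false for null and for instances of subtypes.
    public static bool IsExactly(object obj, Type expected)
    {
        return obj != null && obj.GetType() == expected;
    }

    public static void Main()
    {
        object anObj = "foo";

        // 'is' matches the type itself and every base type or interface.
        Console.WriteLine(anObj is string);   // True
        Console.WriteLine(anObj is object);   // True

        // The System.Type comparison only matches the exact runtime type,
        // and the type to test against can be an ordinary variable.
        Type myType = typeof(object);
        Console.WriteLine(IsExactly(anObj, myType));          // False
        Console.WriteLine(IsExactly(anObj, typeof(string)));  // True
    }
}
```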

Type checking operations like the one above are common enough that the .NET runtime has added a special type called 'RuntimeTypeHandle' to make them fast. RuntimeTypeHandle is a less user-friendly but 'fast' alternative to System.Type. It is simply a managed wrapper around the pointer (to a structure called a MethodTable) that the runtime uses internally to represent a type. As mentioned, it is not very friendly, but it can handle the scenario above and is very fast. The code using RuntimeTypeHandles looks like

     RuntimeTypeHandle myType = typeof(string).TypeHandle;
     // and then in other methods (and maybe in a loop)
     if (Type.GetTypeHandle(anObj).Equals(myType)) { /* some op */ }

This code is significantly faster than the previous code because no System.Type object needs to be created, only RuntimeTypeHandles, each of which is simply a value type wrapping an unmanaged pointer and thus is very lean. (Sadly, the code above is slower than it should be because some JIT optimizations were not done in time for V2.0, but it is still significantly faster than using System.Type.) The first operation

     RuntimeTypeHandle myType = typeof(string).TypeHandle;

is blazingly fast, as shown by our measurements (it is so small that it is in the noise). This is surprising because this line does a 'typeof(string)', and we know that operation is relatively slow. Setting a breakpoint in this code shows why. That line of code compiled down to

     00000000  mov         dword ptr [ecx+10h],790FA3E0h 

Basically, the JIT compiler is smart enough to realize that to fetch the RuntimeTypeHandle, you don't need to fetch the System.Type and then fetch its TypeHandle; it can just look up the value at compile time and emit code that stores the literal. This is very fast.

Sadly the other part of the operation

     if (Type.GetTypeHandle(anObj).Equals(myType)) { /* some op */ }

should be just as fast (fetch the MethodTable pointer from the object and test for pointer equality), but it is not, due to some JIT optimizations that did not kick in appropriately (you will see that we make real calls for both GetTypeHandle and Equals). Even so, it is faster because we avoid creating a System.Type object.
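Putting the two halves of the RuntimeTypeHandle pattern together gives something like the sketch below. The class name and helper names are hypothetical; the calls themselves (Type.TypeHandle, Type.GetTypeHandle, RuntimeTypeHandle.Equals, Type.GetTypeFromHandle) are the standard framework ones. Note that Type.GetTypeHandle throws on a null argument, hence the guard.

```csharp
using System;

public static class TypeHandleSketch
{
    // Fetched once; per the disassembly above this is a compile-time literal store.
    private static readonly RuntimeTypeHandle StringHandle = typeof(string).TypeHandle;

    // Exact type test that never materializes a System.Type for 'obj'.
    public static bool IsExactlyString(object obj)
    {
        return obj != null && Type.GetTypeHandle(obj).Equals(StringHandle);
    }

    // A handle can be turned back into its System.Type when reflection is really needed.
    public static bool RoundTrips(Type type)
    {
        return Type.GetTypeFromHandle(type.TypeHandle) == type;
    }

    public static void Main()
    {
        Console.WriteLine(IsExactlyString("foo"));        // True
        Console.WriteLine(IsExactlyString(42));           // False
        Console.WriteLine(RoundTrips(typeof(string)));    // True
    }
}
```

Keeping the handle in a static field mirrors the "fetch once, compare in a loop" usage described above.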

As a data point for just how fast the code above could be (once the JIT is fixed), consider the final experiment in the code

     result = anObj.GetType() == typeof(string);

Given what we know already, we would expect this to be slow, and yet the measurements above show it to be FASTER than using the 'is' operator. How can that be? The answer is that the JIT recognizes this sequence and knows that, while it seems like two System.Type objects need to be created and compared, all you really want is a yes-no answer to a type question that can be answered using RuntimeTypeHandles. It thus substitutes the code

     0000000f  cmp         dword ptr [edx],790FA3E0h 
     00000015  sete        al   
     00000018  mov         byte ptr [ecx+0Ch],al 

EDX holds 'anObj', the method table pointer is the first field of any object, and 790FA3E0h is the method table pointer (RuntimeTypeHandle) for string. (See previous post on using !DumpMT to determine this.) Thus in one instruction we have tested whether 'anObj' is a string. The 'sete' instruction converts the processor condition flags set by 'cmp' into a boolean value in register AL, and the last instruction sets 'result' to this value. This is pretty lean and mean!
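If you want to poke at these comparisons without the original harness, a rough stand-in can be sketched with System.Diagnostics.Stopwatch. This is not MultiSampleCodeTimer and makes no attempt at its multi-sample statistics; the names MeasureMsec and Sink and the iteration count are assumptions made for this sketch, and newer JIT versions may well optimize these patterns differently from what is described above.

```csharp
using System;
using System.Diagnostics;

public static class TypeCheckTimingSketch
{
    // Results are written here so the loop bodies have an observable effect.
    public static bool Sink;

    // Runs 'body' once to force JIT compilation, then times a second run.
    public static double MeasureMsec(Action body)
    {
        body();
        Stopwatch sw = Stopwatch.StartNew();
        body();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        object anObj = "foo";
        Type type = typeof(string);
        RuntimeTypeHandle tHnd = typeof(string).TypeHandle;
        const int N = 10000000; // arbitrary; the loop lives inside each delegate to amortize the call

        Console.WriteLine("anObj.GetType() == type              : {0:F3} msec",
            MeasureMsec(() => { for (int i = 0; i < N; i++) Sink = anObj.GetType() == type; }));
        Console.WriteLine("Type.GetTypeHandle(obj).Equals(tHnd) : {0:F3} msec",
            MeasureMsec(() => { for (int i = 0; i < N; i++) Sink = Type.GetTypeHandle(anObj).Equals(tHnd); }));
        Console.WriteLine("anObj.GetType() == typeof(string)    : {0:F3} msec",
            MeasureMsec(() => { for (int i = 0; i < N; i++) Sink = anObj.GetType() == typeof(string); }));
        Console.WriteLine("(anObj is string)                    : {0:F3} msec",
            MeasureMsec(() => { for (int i = 0; i < N; i++) Sink = anObj is string; }));
    }
}
```

Build it optimized and run it outside the debugger; as with the numbers above, only the relative sizes between rows are meaningful.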

Summary:

In this entry, we have taken a look at some code generation for doing 'type reflection'. We measured its performance and got some anomalies (some operations were much faster than we expected). We looked into the disassembly for those operations and determined that the JIT compiler was doing some non-trivial optimizations that made certain operations very fast (in one case faster than the 'built in' 'is' operation).

We have learned that a System.Type object is relatively expensive compared to RuntimeTypeHandle, and we have used techniques from the last few perf blog entries to help dig into exactly why.

I again encourage you to experiment with the TypePerfMeasurement.zip example yourself and hone your skills in measuring and investigating .NET Runtime performance.  In my next blog entry I will be doing 'inventory' on the performance characteristics of the 'basic' operations of the runtime.