The Performance of Arrays

Stop me if you’ve heard this one, but here’s some information about how arrays perform, and a neat trick you can do to (possibly) get some performance back.

Some background

In .NET, arrays of reference types are covariant in their element type, but not safely. Eric, as always, has a post that goes into this more deeply if you want to refresh your memory. The upshot is that if you have a Derived[], you can convert it to a Base[] and use it that way. For instance,

class Base { }
class Derived : Base { }

class Program
{
    static void Main()
    {
        Derived[] derivedArray = new Derived[10];
        // This is the covariant conversion
        Base[] baseArray = derivedArray;

        for (int i = 0; i < baseArray.Length; ++i)
        {
            // Putting a Derived into our Base[] is ordinary polymorphism
            baseArray[i] = new Derived();
        }
    }
}

Allowing this conversion from Derived[] to Base[] is not safe because now the compiler can’t help you know if your types match up when you set array elements. In the preceding example, I can put the Derived in the array, but imagine if I had another unrelated derived class, OtherDerived. Putting those things in the array must fail, since the actual array can contain only Deriveds, and OtherDeriveds are not Deriveds. In Eric’s perhaps more folksy example, “you can’t put a Turtle into an array of Giraffes.”

class Base { }
class Derived : Base { }
class OtherDerived : Base { }

class Program
{
    static void Main()
    {
        Derived[] derivedArray = new Derived[10];
        // This is the covariant conversion
        Base[] baseArray = derivedArray;

        for (int i = 0; i < baseArray.Length; ++i)
        {
            // Putting an OtherDerived into our Base[] looks like
            // ordinary polymorphism. However, it causes a runtime
            // exception, because the actual array can't store
            // an OtherDerived.
            baseArray[i] = new OtherDerived();
            // Unhandled Exception: System.ArrayTypeMismatchException:
            //   Attempted to access an element as a type incompatible
            //   with the array.
        }
    }
}

Now to the point I want to address: where did that exception come from? It came from the runtime, which is in a unique position to know both the real type of the array object and the real type of the element. It needed to determine whether the assignment was allowable and to throw if not. If you think about this for a minute, you can see that the runtime has to perform that check for every assignment to an array element, right? The compiler simply can’t know whether a given assignment to a covariant array will ever work, so the runtime always has to be there to manage the potential for failure. Can we turn that off? Does it make sense to? What does the check cost you?
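Note that only stores pay for this. Reading through a covariant array view is always type-safe, since anything that’s actually in a Derived[] is certainly a Base. A quick throwaway method (reusing the Base and Derived types above) to make the asymmetry concrete:

static void ChecksAreOnStoresOnly()
{
    Base[] baseArray = new Derived[10];

    // A read never needs a check: whatever the element really is,
    // handing it back as a Base is always safe.
    Base element = baseArray[0];

    // A store is where the runtime must verify that the value's actual
    // type is compatible with the array's actual element type.
    baseArray[0] = new Derived();
}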

A workaround

Well, if it’s doing that check for every assignment to an array element, that’s probably going to cost you something. Let’s see if we can get rid of the check to determine what that cost is. Remember I said that arrays are covariant only for reference types? We’re going to use that information. If we have an array of value types, the runtime doesn’t have to perform the check at all, since the IL the compiler emitted can’t assign anything but that exact value type to that array.
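For instance, an int[] can only ever be an int[]; there is no covariant view to guard against, so a sketch like this compiles down to plain writes:

static void ValueTypeArrayStores()
{
    // No runtime type check happens on this store; the compiler's
    // static checking is the whole story for value-type arrays.
    int[] numbers = new int[10];
    numbers[0] = 42;
}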

So if I want to have an array of reference types, but get the runtime to treat it like an array of value types, what have I got to do? Just wrap the reference type in a value type. There you go. That’s the trick. Let’s see what that does.

I’m going to create a simple value type that merely holds a reference. I am even going to make it generic. And to make this as dead-simple as I can, that’s all I’ll do. You could imagine a potentially easier-to-use version of this code that has implicit conversions and nice constructors, but let’s not do any of that.

struct Reference<T> where T : class
{
    public T Value;
}
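(For the curious, a hypothetical sketch of that easier-to-use version might look like the following. None of it appears anywhere in the timings below.)

struct Reference<T> where T : class
{
    public T Value;

    public Reference(T value)
    {
        Value = value;
    }

    // Conveniences: wrap and unwrap without the initializer syntax.
    public static implicit operator Reference<T>(T value)
    {
        return new Reference<T>(value);
    }

    public static implicit operator T(Reference<T> reference)
    {
        return reference.Value;
    }
}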

Ok, so, wow, what is that bare little struct? It’s no bigger than the reference we started with. And if I want to make an array of them in order to get rid of covariance, I can. And here’s how I’d use it.

class Base { }
class Derived : Base { }

class Program
{
    static void Main()
    {
        Reference<Base>[] baseArray = new Reference<Base>[10];

        for (int i = 0; i < baseArray.Length; ++i)
        {
            baseArray[i] = new Reference<Base> { Value = new Derived() };
        }
    }
}

Now I have an array assignment that no longer needs a runtime check, because it’s impossible for “baseArray” to be anything except an array of the exact type I declared. The struct-ness of Reference<T> rules out the covariant conversion we performed in the previous examples. So I got rid of that check, right? But I added the initialization of a value type. Did I win or lose? Let’s do some timing to figure it out.
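If you want to see that the conversion really is gone, try to write it. The commented-out line in this sketch simply does not compile:

static void NoCovariantConversionHere()
{
    Reference<Derived>[] derivedArray = new Reference<Derived>[10];

    // The next line does not compile: arrays of value types are never
    // covariant, so there is no conversion from Reference<Derived>[]
    // to Reference<Base>[].
    //Reference<Base>[] baseArray = derivedArray;
}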

The experiment

Here’s my code. I’m going to time two different things: First, I’ll put a lot of Deriveds into a Base[], and then I’ll put a lot of Reference<Base>s that hold Deriveds into a Reference<Base>[]. I’m using huge arrays just because I want to prove to myself that the jitter isn’t going to lift some of these operations out of the loop. And I’m producing Release binaries that will run on the 32-bit framework on my 64-bit machine. I’ll time each example a few times to be sure. Here’s the code.

using System;
using System.Diagnostics;

class Base { }
class Derived : Base { }

struct Reference<T> where T : class
{
    public T Value;
}

class Program
{
    const int Count = 1 << 27;

    static void Main()
    {
        Stopwatch watch = new Stopwatch();

        TestReferenceArray(watch);
        TestReferenceArray(watch);
        TestReferenceArray(watch);

        TestValueArray(watch);
        TestValueArray(watch);
        TestValueArray(watch);
    }

    static Base[] TestReferenceArray(Stopwatch watch)
    {
        Base[] array = new Base[Count];
        var element = new Derived();

        watch.Restart();
        for (int i = 0; i < Count; ++i)
        {
            array[i] = element;
        }
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);

        // Return the array so the stores stay observable to the caller.
        return array;
    }

    static Reference<Base>[] TestValueArray(Stopwatch watch)
    {
        Reference<Base>[] array = new Reference<Base>[Count];
        var element = new Derived();

        watch.Restart();
        for (int i = 0; i < Count; ++i)
        {
            array[i] = new Reference<Base> { Value = element };
        }
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);

        return array;
    }
}

Have a seat. Here’s what it prints out.

2507
2471
2474
595
607
593

Wow! That’s a huge time difference, and a big win. It’s FOUR TIMES FASTER! I wouldn’t blame you for thinking you have to go to your code base right now and eliminate arrays of reference types from it; but hold on a second. That’s not the whole story.

Issue #1: Are the types exact?

In the example that I timed, I was putting a Derived into arrays with element type Base. The types weren’t the same, so the check the runtime performed had to be complicated by something like walking an inheritance hierarchy. But what if I were putting Bases into the Base array? What do the times look like then? (I won’t replicate the code here; just replace “new Derived()” with “new Base()” everywhere.)

805
775
771
593
593
599

Oh, that’s much more reasonable. The lesson here is that for my scenario, if I’m not making use of polymorphism, I have much less to worry about.

Issue #2: Which runtime exactly?

I said that I was doing tests on the 32-bit framework on my 64-bit machine. I ensured that by building my binaries “x86,” which is the default for VS2010. What if I instead build them “Any CPU” so that I can run the test on the 64-bit framework? Under those conditions, I am swapping out the whole stack that my tests run on for a whole different runtime. Let’s see what the numbers are then.
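(By the way, if you want to confirm which flavor of the runtime a given build actually lands on, a one-liner like this dropped into Main will tell you; a trivial sanity check, not part of the timed code.)

// Prints 32 or 64 depending on the process this build produced.
// IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.
Console.WriteLine("Running as a {0}-bit process", IntPtr.Size * 8);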

2102
2057
2055
789
784
784

Ok, still faster, but not by as big a factor. What about for the exact type case?

861
848
846
792
790
817

Nearly the same! So my example seems to be much less of a big deal for 64-bit processes. I am not sure why. There is another oddity here that I can’t explain: when I build Debug binaries, the Reference<T> trick makes my example about three times SLOWER. I have no idea why.

Issue #3: My example is not your scenario

I tried to make the thing I was timing as simple as possible. But, relatively speaking, I’m doing a lot of array assignments. Do you do that much? Probably not. I’m not doing any I/O and not waiting on any user interaction; are you? I also have a giant array, and I’m clearly blowing out the CPU cache in my for loops. Does your code do that? I’m assigning array elements in a tight loop with a known upper bound. Are you? Does the jitter lift the bounds checking out of the loop, and how would that check compare with the assignment-compatibility check?
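On that last question, one thing worth knowing: the jitter can typically eliminate the range check when a loop runs from 0 to the array’s own Length. My timed loops test against Count instead, so they may not get that break. A hypothetical helper showing the friendly pattern:

static void Fill(Base[] array, Base element)
{
    // Looping i from 0 to array.Length over the same array is the
    // pattern the jitter recognizes; it can prove every access is in
    // range and skip the per-iteration bounds check.
    for (int i = 0; i < array.Length; ++i)
    {
        array[i] = element;
    }
}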

My point is that as with any performance timing, there are a lot of variables and a lot of considerations. If you want to make your program perform better using this trick, you need to try it and then time it yourself. I am not saying that .NET arrays are slower than they need to be or unsuited for your application. But it’s a fun experiment to try; how many milliseconds faster can you make your scenario, and is it worth the cost? Were you using the covariance of arrays in your code, and therefore ineligible to even consider this trick?

Why?

Why does C# have these unsafe covariant arrays to begin with? I was not here at the time, but I can confidently say that the CLR needed them to support languages that have array covariance. One example of such a language (in fact, probably the interesting example) is Java. C# has them both because the CLR has them and because it was a clear goal that users porting existing code to C# be met with as few hurdles as possible. Also, there would have been complications in supporting both covariant arrays and invariant arrays in the same runtime; imagine the confusion users and compiler writers would face trying to interop between the two. The language design notes from 1999 bear out these claims.