A curious subtlety about how CLR does interface dispatch on array types

This post is about some internal CLR magic that usually makes accessing .NET arrays through an interface up to 100 times faster (if you're an optimist; or accessing arrays up to 100 times slower in the worst case, if you're a pessimist). I'm usually very curious about how this universe works under the hood - and finding out about the internals of the CLR for me is like discovering laws of physics :) So if you ever find yourself measuring the timing of array access through generic interfaces in performance critical code, this post is probably worth taking a look at.

Anyway, here's the story. A customer (Robert Kieboom from CityGIS) has reported an intermittent performance decrease when accessing arrays through the IList<T> interface. Specifically, for some weird reason, same codepath started executing 100 times slower after a call to some seemingly unrelated code. Huge thanks to Robert for providing this precise and exact repro (I hope he doesn't mind if I share it):

 class Program
{
    public static int AddValues(IList<int> list)
    {
        int add = 0;
        /*foreach ( int value in list )
          add += value;*/
        for (int i = 0; i < list.Count; i++)
            add += list[i];
        return add;
    }

    static void TestAddArray(int[] items)
    {
        Stopwatch timer = Stopwatch.StartNew();
        AddValues(items); 
        timer.Stop();
        Console.WriteLine("Array: {0} ms.", timer.ElapsedMilliseconds);
    }

    static void TestAddIList(int[] items)
    {
        IList<int> cl = new List<int>(items);

        Stopwatch timer = Stopwatch.StartNew();
        AddValues(cl); 
        timer.Stop();

        Console.WriteLine("IList: {0} ms.", timer.ElapsedMilliseconds);
    }

    static void Main(string[] args)
    {
        int[] items = new int[1000000];
        for (int i = 0; i < items.Length; i++)
            items[i] = i - (items.Length / 2);

        TestAddArray(items); // fast
        TestAddArray(items); // fast

        TestAddIList(items); // something weird happens??!

        TestAddArray(items); // slow
        TestAddArray(items); // slow
    }
}

Here's what this program does. First we call TestAddArray a couple of times that uses an array disguised as an IList<int> - both calls are very fast, each about 108ms on my machine.

Then we call TestAddIList that uses a List<int> disguised as an IList<int> - this call is even faster, 21 ms.

Now - watch this - both subsequent calls to TestAddArray take 3638ms and 3677ms respectively - about 33 times slower!

If you remove the call to TestAddIList(items), all four calls will be equally fast. What's going on? The same codepath becomes slower after some random call? We were puzzled. After Eric couldn't find any explanation for this from the C# compiler standpoint (we were producing correct IL), it was clear that the issue is deeper than IL and has to do with the execution engine itself. So I routed the bug to the CLR team - and pretty soon we had a reply from them.

It turns out, CLR has some optimizations on how they do virtual dispatch on a generic interface, if the runtime type is an array. As I'm afraid to talk nonsense outside of my area of expertise, I'll just quote the bug description from the CLR folks:

The problem is rooted in the fact that IList<T>::get_Item is an interface invocation on an array type (in the example an int[]).  Because arrays are very common, and interface dispatch through interfaces like IList<T> is uncommon, the runtime saves significant space by having special logic for dispatching through generic interfaces like these. 

Our interface dispatch logic has an important optimization where FOR EACH CALL SITE, it checks one particular target explicitly, and if that fails, does slower hash table lookup, and if that lookup fails, fall back to an  expensive lookup.  Thus for calls sites that tend to go to one destination it is very fast, and call sites that go to many targets, you get good, but not as great performance.

As it turns out, there is a problem in the special logic for dispatch logic for arrays (but not for ordinary interfaces) that cause the hash table lookup to fail. What this means is either interface dispatch FOR ARRAY TYPES is very fast (if the explict check works), or very slow (when it does not).  

In the fast case, the interface calls to 'get_Item and get_count' (used in the [] operator) call with at 'this' pointer of int[] first, which means this is the value that is used for the 'fast check'.   The second time we call it with List<int>, not int[]. However since there is no bug associated with normal (not array) dispatch, everything works well.

However when the benchmarks are reversed, List<int> is the first 'this' that dispatches through 'get_Item' and 'get_Count' call sites so they get the benefit of the fast check. When the first benchmark is run, it now dispatches using an int[] as the 'this' pointer, and because of the bug, the hash table lookup always fails and so only the very slow lookup mechanism is used.

Workarounds:

1) In performance critical code, ensure that if arrays are passes as IList<T> (or super Interfaces), that T[] is always called first.  There is a good chance that this approach is impractical, but if there are only a few such critical paths then it is possibly useful. 

2) Always use List<T> where you would have used T[]. By the way, if you can avoid using Collection<T> that is probably a good thing, as it is a wrapper that in most cases does not add much value.

3) In any performance critical paths, special case the array case with code like

             method(IList<T> aIList)

             {

T[] asArray = aList as T[]

                    if (asArray != null)

                    {

                              // logic on arrays

                    }

                    else

                    {

                              // logic on aList

                    }

Option 3 has the advantage that it is SIGNIFICANTLY faster than interface dispatch will ever be because the array check is done only once (and then lots of operations are done), rather than effectively being done on every interface invocation. I would not recommend any work-around that does not have some positive effect on your code base.  I could believe that options 2 and 3 MAY be a superior solution in the long term anyway, so I would suggest investigating them.

This explains why calling arrays through generic interfaces might under certain conditions become considerably slower (or, alternatively, why these calls are in many cases considerably faster than the general case). In any case, virtual dispatch is slower than calling array members directly, so if you have a chance, just work with arrays directly or use generic collections like List<T>.

Regarding the future fate of this particular CLR optimization failing in some cases, we are considering whether we shall fix it for the next release of the .NET framework. However, as always: no promises here folks! If you do hit this problem (which is pretty unlikely), just use the workarounds for now - thankfully they are easy and reliable.

I hope this was insightful and many folks have found this interesting - I'd like to give credit to everyone who reported, reproduced and investigated this issue (including, among others, Robert Kieboom, Arie Leeuwesteijn, Erik Meijer, Eric Lippert, Vance Morrison and Jan Kotas) - and I hope they don't mind me sharing this with the rest of the .NET world.