Performance Quiz #9 : IList<T> List and array speed


A short and sweet quiz with lots of juicy discussion possibilities:


           public int Sum(IList<ushort> indices)
           {
               int result = 0;
               for (int i = 0; i < indices.Count; i++) 
                   result += indices[i];
               return result;
           }


Considering only the time it takes to do the Sum (i.e. assuming we had already set up the array/list) which gives better performance and why?


           // #1
           ushort[] tmp = new ushort[500000]; // this doesn’t count
           Sum(tmp); // this is what we are timing


OR


           // #2
           List<ushort> tmp = new List<ushort>(500000); // this doesn’t count
           for (int i = 0; i < 500000; i++) tmp.Add(0); // this doesn’t count
           Sum(tmp); // this is what we are timing


What say you gentle readers?


(my solution is now posted here)

Comments (34)

  1. dzCepheus says:

    Mmm, rough.

    I want to say #2, because with #1 you’ve got an array of type ushort, while in the Sum method you’re working with an IList<ushort> – so the array has to be cast to an IList<ushort>.

    Just my intuition talking – I guess I’d need to time it myself to be sure.

  2. Wesner Moise says:

    I believe that #2 goes faster because the CLR has dynamically create for each type T, new IList<T> interface mappings to generic methods in SZArrayHelper. These generic methods forces an explicit cast (to T[]) operation to occur for accessing both the Count property and the get_Item indexer. (The need for the cast is probably because the array has deal with covariant IList implementations as well, because an array can be cast to its base.)

    If we changed the implementation a bit by foreach’ing over the enumerator instead of using a for loop, then the casting operation will no longer be performed, so I am betting the array’s enumerator would be faster than the List<T> enumerator.

  3. I believe #2 to be faster because List<T> implements a generic enumerator (IEnumerable<T>), whereas Array only implements IEnumerable. Thus, iterating over an array incurs an unboxing cost but iterating over List<T> does not.

    Or am I just making shit up . . . ?

  4. Crap – there’s no foreach. I am just making shit up.

  5. Igor says:

    Seems like they would perform the same. I expect that once built, List<ushort> is internally nothing but a ushort[].

  6. I believe #1 will be faster.

    With #1 .NET will optimise getting array values by removing any bounds checking. Because of the for loop it knows it will never access values beyond the length of the array.

    The Item property on the List<ushort> in #2 will however will still do bounds checking for each get.

  7. Actually, is bounds checking optimization done at runtime or during the compile? If it is during compiling then I don’t think it wouldn’t know that the IList<ushort> is an array.

  8. mak says:

    I think #1 is faster.

    Because both use the same sum jitted code for sum(), which means JIT doesn’t make an optimzie code for array, and List<short> is just a wrappar of short[].

    So #1 is faster for the overhead of List<>.

  9. mak says:

    On the second thought, maybe I was wrong…

    On #2, List<short> uses optimized code for array indexing. But #1 accesses an array via interface, which is slower than the optimized code.

    So, hmm, I don’t know!

  10. I would assume that #1 if faster, since it has no management overhead, while the List<> does.

    The cost of interface invocation (Count) is something that the List needs to handle as well, since it is handles an array internally.

  11. damien morton says:

    Well, firstly the test code runs slower than necessary – this small change speeds it up by 50% or so:

    public int Sum(IList<ushort> indices)

              {

                  int result = 0;

                  int n = indices.Count; // cache the Count rather than accessing it on every loop iteration

                  for (int i = 0; i < n; i++)

                      result += indices[i];

                  return result;

              }

    Now, Im finding that the summing an array as an IList<> is about 4 times slower than summing a List<> as an IList, and that summing up an array directly is 2.5 tims faster than summing up an IList<>. Hard to imagine what is gobbling up that factor 10 in performance.

    Is there some boxing/unboxing going on in there?

  12. mak says:

    I’ve got the same result as damien’s (#1 is 3-4x slower). I thought at least #1 was faster on debug build, but not.

    I think accessing an array via interface is VERY slow than accessing directly, or the optimizer is pretty good to deal with array directly.

  13. mak says:

    I mean, JIT makes a good code for accessing an

    array directly, even in debug build.

    I guess the following is what happen.

    #1: accessing an array via interface(in sum(), slow, no optimization for array)

    vs

    #2: accessing an array directly(in List<>.this[], very fast, optimized for array) + vtable dispatch to invoke List<>.this[](very fast)

  14. Mike Parsons says:

    In Rico’s Example, #2 is faster but this is actually twice as fast as #2 …

    ushort[] tmp = new ushort[500000]; // this doesn’t count

    int result = Sum(tmp);

    public int Sum(ushort[] indices)

           {

               int result = 0;

               int n = indices.Length-1; // cache the Count rather than accessing it on every loop iteration

               for (int i=n;i>-1;i–) // Iterating thru Backwards is faster than Forwards

                   result += indices[i];

               return result;

           }

    I have been experimenting with doing this type of iteration in JavaScript, where every optimization can mean exponential gains in performance. You can check out come of my tests here … http://geekswithblogs.net/mparsons/archive/2006/03/02/71175.aspx#FeedBack

    Check out the Test Page in my comment.

  15. Chris L says:

    The 2nd commenter was correct that #2 runs faster, and also that changing the for loop to a foreach loop makes #1 way faster than #2. I’m really not sure why this happens though. Maybe in certain cases the JIT compiler thinks it has to do bounds checking on the array when it really didn’t have to.

  16. kov says:

    Not to be a stick-in-the-mud, but I kinda think 500k virtual-calls do count.

  17. ricom says:

    Great comments so far!  I love this one, it’s so juicy 🙂

  18. Wilco Bauwer says:

    I’m with Wesner on this one. The (internal) SZArrayHelper class is a class that contains the method bodies that is generated for (1d) arrays. The reason why #1 is more time consuming seems evident when you compare SZArrayHelper.get_Item’s implementation to that of List<T>.get_Item. The former includes a cast (which incurs per item in your array), which is where I believe the overhead of time is spent.

  19. Rico’s posted another performance quiz on his blog, which means that I get to run the F1 profiler on…

  20. ianhu says:

    Hey Rico. I ran F1 on this scenario and I got some more data. But to be honest I’m still not 100% sure what these JIT_IsInstanceOfArray and ArrayIsInstanceOfNoGC function are doing in the array case. I have some guesses, but nothing that I’m sure of. Anyways, my run of the profiler is here:

    http://blogs.msdn.com/ianhu/archive/2006/03/10/548778.aspx

  21. KeithH says:

    Boxing is happening because the array’s get_Item() implementation returns object:

    L_0002: call instance –>object<– System.Array::GetValue(int32)

    Whereas the List<T> implementation of get_Item doesn’t box:

    L_0002: call instance !0 System.Collections.Generic.List`1<!T>::get_Item(int32)

  22. mak says:

    KeithH:

    but sum() accesses the array via IList<>.this[int], which doesn’t return object, so no boxing around there.

  23. Alois Kraus says:

    I have tested these two with a very simple profiler (Stopwatch) and got dramatic performance differences.

    Array/Generics = 852,33

    The Array version is over 800 times slower.

    I have used the following code:

          IList<ushort> gentmp = new List<ushort>(ArraySize);

           ushort[] arrtmp = new ushort[ArraySize];

           public void Run()

           {

               long GenTicks;

               long ArrayTicks;

               Stopwatch watch = Stopwatch.StartNew();

               Sum(gentmp); // this is what we are timing

               watch.Stop();

               GenTicks = watch.ElapsedTicks;

               List<ushort> ArrCopy = new List<ushort>(ArraySize);

               ArrCopy.AddRange(arrtmp);

               watch = Stopwatch.StartNew();

               Sum(arrtmp); // this is what we are timing

             

               watch.Stop();

               ArrayTicks = watch.ElapsedTicks;

               Console.WriteLine("Array/Generics = {0:F2}", (double)ArrayTicks /(double) GenTicks);

           }

    Perhaps I have an obvious error here which I did not see yet. But the number I get seem to be consistent. What is very interesting if I copy the array completely into a generic list and use that one I get

    List<ushort> ArrCopy = new List<ushort>(ArraySize);

    ArrCopy.AddRange(arrtmp);

    Sum(ArrCopy);

    Array/Generics = 1,70

    Nearly the same. I think there is massive overhead during the access of each element going on here. It is even cheaper to copy the whole thing and use that one if the array is big enough.

    Yours,

     Alois Kraus

  24. KeithH says:

    mak:

    Yep I see that now.  Thx.  The docs don’t even show array as implmenting IList<T> but then they do go on to say:

    In the .NET Framework version 2.0, the Array class implements the System.Collections.Generic.IList, System.Collections.Generic.ICollection, and System.Collections.Generic.IEnumerable generic interfaces. The implementations are provided to arrays at run time, and therefore are not visible to the documentation build tools. As a result, the generic interfaces do not appear in the declaration syntax for the Array class, and there are no reference topics for interface members that are accessible only by casting an array to the generic interface type (explicit interface implementations).

    So my guess is that there is something about this runtime magic that is causing the poor performance.

  25. M Knight says:

    One of the problems with this is if the assembly is marked as debug mode or a debugger attached during startup the JIT disables optimizations, and that will make a big difference.

    The optimizations Mike Parsons does are very similar to what the JIT can do when it is allowed to optimize.

  26. Alois Kraus says:

    To sum it up. Every Array supports IList<T> with its specified type at compile time. If you pass it via the IList<T> interface to another function the CLR has special run time checks embedded to check if is an array or a reall generic list and an additional type check to ensure type safety. This is all done at run time for each get call which explains this factor of several hundred times slowdown. Is this the correct interpretation of Ianhu’s profiling run?

    Yours,

    Alois Kraus

  27. Norman Diamond says:

    I’d think it just depends on what kinds of optimization are used (i.e. available and used, whereas not yet available would result in not currently being used).

    Meanwhile this confuses me:

    for (int i = 0; i < 500000; i++) tmp.Add(0); // this doesn’t count

    Even though it doesn’t count, even a minimal amount of optimization should make it even less count ^_^

  28. ricom says:

    The idea was that in all the cases the array had already been initialized in some sensible way.  We’re basically just talking about how to enumerate it.

    Another blog about list enumeration… who’da thunk it 🙂

  29. Norman Diamond says:

    Then I guess this line:

    for (int i = 0; i < 500000; i++) tmp.Add(0); // this doesn’t count

    was supposed to be this:

    for (int i = 0; i < 500000; i++) tmp[i].Add(0); // this doesn’t count

    ?  At least then the count that doesn’t count counts for something.

  30. ricom says:

    for (int i = 0; i < 500000; i++) tmp.Add(0); // this doesn’t count

    This line adds 500000 elements to the List (all zeros).  So now it’s the "same" as an array of 500000 zeros.

  31. Alexei says:

    If you happen to know the exact type of the container, it’s always useful to let JIT know it too.  It can produce more optimized code.  For comparison, I passed List<> into the following function Sum2, and it appeared to be much faster.  No function calls are made inside Sum2().  JIT inlined it.

    static public int Sum2(List<ushort> indices)

    {

    int result = 0;

    for (int i = 0; i < indices.Count; i++)

    result += indices[i];

    return result;

    }

    Timing:

    Sum(IList<ushort>)

    time: 404

    Sum2(List<ushort>)

    time: 57

    You can call it "abstraction penalty."  Sum() function is generic, it operates on any container implementing IList<>, whereas Sum2() is specific to one concrete type List<>.

    Interesting that the abstraction penalty cannot be avoided with a generic function:

    static public int Sum3<T>(T indices) where T : IList<ushort>

    The code generated here is no better than the one for the original Sum() function.

  32. Last week I posted Performance Quiz #9&amp;nbsp;for discussion. Well the comments have been just awesome…

  33. Last week I posted Performance Quiz #9 for discussion. Well the comments have been just awesome and many