Well, you'll recall that in the Performance Quiz #9 solution there was a surprising result: Test7 was actually faster than Test5, even though they appear to be doing essentially the same work. So my new challenge to you was to see if you could explain it.
So this is your last chance to go and look for yourself before I give away the answer... stop here if you want to work on the problem.
OK, if you're still reading, then you're just dying to hear what was going on. Well, there's a lot of serendipity here. My colleague Vance had just written a blog post about how to disassemble dispatch stubs. In what I can only categorize as a total, unmitigated fluke (believe it or not, we didn't discuss this with each other at all), his post, which I linked to in the original, has the answer right there on a silver platter.
As Ivan wrote in the solution comments:
"The difference is caused by different interface call stubs (as explained by Vance Morrison) in Test5 and Test7, since [the] foreach loop calls ([to] IEnumerator<ushort> MoveNext and Current) in SumSpecial are only on [the] List<ushort> instance, while [the] same calls in SumForeach are on both ushort[] (in Test4) and List<ushort> (in Test5). If we comment [out the] Test4 call, they would have similar speed (Test5 would be a little bit faster, as expected).
Very interesting, thanks Rico and Vance!"
Which is right on the mark! Full points to Ivan!
SumSpecial has two call sites for the enumerator methods. The first one is always invoked on an array instance, so it can use the monomorphic stub (which is faster). The second one is always invoked on a List<T> instance, so it too can use the monomorphic stub. SumForeach is handicapped because its call site is hit first with an array (in Test4) and then with a List<T> (in Test5), so when the second type shows up the runtime switches to a more flexible but slower stub. That makes Test5 slower.
As Ivan writes, you can make the difference go away simply by commenting out the call to Test4 in program.cs, at which point all the call sites in question are monomorphic. Equivalently, you can create a duplicate of SumForeach called SumForeach2 and use it in Test5; again, all the call sites are monomorphic.
Remember, as Vance writes, stub selection is per call site.
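To make the per-call-site idea concrete, here is a toy model. It is written in Python rather than C# and is emphatically not the CLR's real stub machinery; it is a sketch under simplifying assumptions. Each hypothetical CallSite object starts out unresolved, specializes for the first receiver type it sees (standing in for the fast monomorphic stub), and permanently falls back to a "polymorphic" state (standing in for the more flexible, slower stub) the first time a different type arrives. A tuple stands in for ushort[] and a list for List<ushort>; the names CallSite, enumerate_via, and make_sum are illustrative inventions, and the model checks the type once per loop rather than on every MoveNext/Current call.

```python
class CallSite:
    """Toy model of one interface call site and the stub it has settled on.

    Assumption: a simplified illustration of per-call-site stub selection,
    not the CLR's actual dispatch-stub algorithm.
    """

    def __init__(self, name):
        self.name = name
        self.cached_type = None   # receiver type the fast path is specialized for
        self.kind = "unresolved"  # "unresolved" -> "monomorphic" -> "polymorphic"

    def enumerate_via(self, receiver):
        """Models the enumerator interface calls made at this site."""
        receiver_type = type(receiver)
        if self.kind == "unresolved":
            # First call: specialize for this receiver type (fast stub).
            self.cached_type = receiver_type
            self.kind = "monomorphic"
        elif self.kind == "monomorphic" and receiver_type is not self.cached_type:
            # A second receiver type arrived: fall back to the general, slower stub.
            self.kind = "polymorphic"
        return iter(receiver)


def make_sum(site):
    """Build a foreach-style summing function whose enumeration goes through `site`."""
    def sum_foreach(values):
        total = 0
        for value in site.enumerate_via(values):
            total += value
        return total
    return sum_foreach


array_like = (1, 2, 3)  # stands in for ushort[]
list_like = [1, 2, 3]   # stands in for List<ushort>

# One shared function (like SumForeach): "Test4" passes an array, "Test5" a list.
shared_site = CallSite("SumForeach")
sum_foreach = make_sum(shared_site)
sum_foreach(array_like)  # "Test4"
sum_foreach(list_like)   # "Test5"
print(shared_site.name, shared_site.kind)  # SumForeach polymorphic

# Workaround: a duplicate (like SumForeach2) gives "Test5" its own call site.
site_1 = CallSite("SumForeach")
site_2 = CallSite("SumForeach2")
make_sum(site_1)(array_like)
make_sum(site_2)(list_like)
print(site_1.kind, site_2.kind)  # monomorphic monomorphic
```

The only thing the model is meant to show is the state each site ends up in: the shared site is pushed onto the slower path by the array before the list ever arrives, while the duplicated sites each keep their fast path. Timing this Python code would tell you nothing about the CLR's stubs, so the sketch deliberately measures nothing.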
Now that we have that little bit of data, we might want to rewrite program.cs to factor stub selection out of the reported times. But really that doesn't change the results a whole lot. The main thing going on was the special array helpers, as discussed.
Chalk up another one for powerful secondary effects in the performance biz!