Running the F1 profiler on Rico’s Perf Quiz #9

Article
03/10/2006

Rico’s posted another performance quiz on his blog, which means that I get to run the F1 profiler on it, puzzle over the numbers for a bit and then have Rico explain to me what’s really going on ;-). Just goes to show that performance results are best interpreted by someone who really knows performance and who knows the internals of the framework that is being used. By the way if you are interested in writing performance .NET application you really should have Rico’s blog in your feedreader, it’s got lots of good stuff.

Shown below is the program that I’m using to test Rico’s scenario. You’ll notice that I’m using the same function with two different names (Sum and SumB), this is because Rico was only looking at the performance of that specific part of the implementation not any of the initialization code. And, by default, the F1 profiler aggregates data by function name, so I had to add a second function with the same name to separate out my results.

class Program

{

static void Main(string[] args)

{

Program p = new Program();

ushort[] tmp = new ushort[500000]; // this doesn't count

p.Sum(tmp); // this is what we are timing

List<ushort> tmp2 = new List<ushort>(500000); // this doesn't count

for (int i = 0; i < 500000; i++) tmp2.Add(0); // this doesn't count

p.SumB(tmp2); // this is what we are timing

}

public int Sum(IList<ushort> indices)

{

int result = 0;

for (int i = 0; i < indices.Count; i++)

result += indices[i];

return result;

}

public int SumB(IList<ushort> indices)

{

int result = 0;

for (int i = 0; i < indices.Count; i++)

result += indices[i];

return result;

}

Since I’m assuming that a little app like this is CPU bound, I’m running the profiler in sampling mode. Also, since I was not getting enough samples at the default rate of one sample every 10 million clock cycles I moved the rate down to one sample every 500,000 clock cycles.

Making sure that I was running in release mode I fired up the profiler and let it run on the code snippet above. Using the top level function view I could pull the following numbers (the numbers are inclusive percent times, so 51% of execution time was spent in Sum vs 7% in SumB):

RicoQuiz.Program.Main(string[]) 65.297

RicoQuiz.Program.SumB(class System.Collections.Generic.IList`1<uint16>) 7.306

RicoQuiz.Program.Sum(class System.Collections.Generic.IList`1<uint16>) 51.598

So now we know that SumB is quite a bit faster then Sum, but can the profiler help to tell us why that is? Pictured below is a screen of the calltree view for this program (I’ve set main as the root function to hide all the loader stuff that happens before we even start executing our program, note that only 65% of all samples happened under the program main function):

So why is Sum so much slower then SumB? Well, I’m no CLR performance engineer (and that’s kinda the point of these exercises, after all, Rico doesn’t need a profiler to find the answers, but I do) but when we call getCount or getItem on the array we have to do some nasty checks to figure out exactly what type of array we are dealing with. Just look at all the time being eaten up in calls to JIT_IsInstanceOfArray, ArrayIsInstanceOfNoGC and in getting the element type of the array (check the blue marks that I added). I don’t have the time to dig too deep to figure out exactly what is going on here, but I’m sure Rico will fill us in soon enough.

Running the F1 profiler on Rico’s Perf Quiz #9

Additional resources