Benchmarking, C++, and C# Micro-optimizations

Two posts (1 2) on C# loop optimization got me thinking recently.

Thinking about what I did when I first joined Microsoft.

Way back in the spring of 1995 or so (yes, we did have computers back then, but the Internet of the time really *was* just a series of tubes), I was on the C++ compiler test team, and had just picked up the responsibility for running benchmark tests on various C++ compilers. I would run compilation speed and execution speed tests in controlled environments, so that we could always know where we were.

We used a series of “standard” benchmarks – such as Spec – and a few of our own.

Because execution speed was one of the few ways (other than boxes with lots of checkmarks) that you could differentiate your compiler from the other guy’s, all the compiler companies invested  resources at being faster at the benchmarks.

The starting point was to look at the benchmark source, the resultant IL, and the final machine code, and see if you could see any opportunity for improvement. Were you missing any optimization opportunities?

Sometimes, that wasn’t enough, so some compiler writers (*not* the ones I worked with) sometimes got creative.

You could, for example, identify the presence of a specific expression tree that just “happened to show up” in the hot part of of a benchmark, and bypass your usual code generation with a bit of hand-tuned assembly that did things a lot faster.

Or, with a little more work, you could identify the entire benchmark, and substitute another bit of hand-tuned assembly.

Or, perhaps that hand-tuned assembly doesn’t really do *all* the work it needed to, but took a few shortcuts but still managed to return the correct answer.

For some interesting accounts, please text “compiler benchmark cheating” to your preferred search engine.

As part of that work, I got involved a bit in the writing and evaluation of benchmarks, and I thought I’d share a few rules around writing and interpreting micro-benchmarks. I’ll speak a bit about the two posts – which are about looping optimizations in C# – along the way. Just be sure to listen closely, as I will be speaking softly (though not in the Rooseveltian sense…)

Rule 0: Don’t

There has always been a widespread assumption that the speed of individual language constructs matter. It doesn’t.

Okay, it does, but only in limited cases, and frankly people devote more time to it than it deserves.

The more productive thing is to follow the agile guideline and write the simplest thing that works. And note that “works” is a bit of a weasely word here – if you write scientific computing software, you may have foreknowledge about what operations need to be fast and can safely choose something more complicated, but for most development that is assuredly not true.

Rule 1: Do something useful

Consider the following:

void DoLoop()
{
    for (int x = 0; x < XMAX; x++)
    {
        for (int y = 0; y < YMAX; y++)
        {
        }
    }

}

void TimeLoop()
{
    // start timer
    for (int count = 0; count < 1000; count++)
    {
        DoLoop();
    }
    // stop timer
}

if XMAX is 1000, YMAX is 1000, and the total execution time is 0.01 seconds, what is the time spent per iteration?

Answer: Unknown.

The average C++ optimizer is smarter that this. That nested loop has no effect on the result of the program, so the compiler is free to optimize it out (the .NET JIT may not have time to do this).

So, you modify the loop to be something like:

void DoLoop()
{
    int sum;

    for (int x = 0; x < XMAX; x++)
    {
        for (int y = 0; y < YMAX; y++)
        {
            sum += y;
        }
    }
}

The loop now has some work done inside of it, so the loop can’t be eliminated.

Rule 2: No, really. Do something useful

However, the numbers won’t change. The call to DoLoop() has no side effects, so the entire call can be safely eliminated.

To make sure your loop is really a loop, there needs to be a side effect. The best bet is to have a value returned from the method and write it out to the console. This has the added benefit of giving you a way of checking whether things are working correct.

Rule 3: Benchmark != Real world

There are lurking effects that invalidate your results. Your benchmark is likely tiny and places very different memory demands on the system than your real program does.

Rule 4: Profile, don’t benchmark

 

C# loop optimization

If you are writing code that needs the utmost in speed, there is an improvement to be had using for rather than foreach. There is also improvement to be had using arrays rather than lists, and unsafe code and pointers rather than array indexing.

Whether this is worthwhile in a specific case depends exactly on what the code is doing. I don’t see a lot of point in spending time measuring loops when you could spend time measuring the actual code.