More on basic performance

C# and the .NET frameworks allow you to be incredibly productive and write elegant, maintainable code. However, some of these productivity features can just make you productive at writing slow code. “Sparkle” has an internal DOM for working with XAML. We have an XPath-like data structure that lets us make references in that DOM. This path data structure looks a little like a list: it is basically an array of steps, though the operations are quite different so it the OM looks nothing like a list. However, when doing performance analysis I discovered we were allocating and freeing huge numbers of these arrays of steps. I traced it to an implementation that looked something like this:

      class Path

      {

            private Step[] steps;

            public Path(params Step[] steps)

                  : this((ICollection)steps)

            {

            }

            public Path(ICollection steps)

            {

                  this.steps = new Step[steps.Count];

                  int i = 0;

                  foreach (Step step in steps)

                  {

                        this.steps[i++] = step;

                  }

            }

            public Path Append(Step step)

            {

                  List<Step> newSteps = new List<Step>(this.steps);

                  newSteps.Add(step);

                  return new Path(newSteps);

            }

      }

Very commonly we initialized a Path that contained only a single step and then appended new steps. Path is immutable, so you always allocate new ones, as the Append method demonstrates.

Seems like nice simple code. What could possible go wrong? Well consider an example like this:

                  Step step1 = new Step();

                  Step step2 = new Step();

                  step1.StepName = "FirstStep";

                  step2.StepName = "SecondStep";

                  Path path1 = new Path(step1);

                  Path path2 = path1.Append(step2);

Just how much memory do we allocate in this case?

It seems like we should be allocating 2 Steps, 2 Paths and 2 Step[]s, since that is what we want at the end.

But really we allocate 2 Steps, 2 Paths, 6 Step[]s, 2 ArraySZEnumerators, a List<Step> and a List<Step> enumerator. Most of these are short-lived, but in large scenarios they were adding up to megabytes, trashing the caches and making the garbage collector work overtime. There are three problems:

1) the params array is being automatically constructed for us by the compiler and then thrown away (need it be pointed out that a single item array is more expensive than the single item it contains?)

2) By casting to ICollection the compiler has lost the specific type, so it must call GetEnumerator in order to make the foreach work in the second ctor. Even if you’ve implemented your Enumerator correctly as a struct, since it is going through the ICollection interface that returns a general type, it must box it.

3) The Append was allocating an additional List, then the Path ctor was allocating and copying the contents.

All these “features” made the original code easy to write and understand, but slower in a non-obvious way.

Here’s the rewritten, more efficient Path:

      class Path

      {

            private Step[] steps;

            public Path(Step step)

            {

                  this.steps = new Step[1];

                  this.steps[0] = step;

            }

            public Path(Step step1, Step step2)

            {

                  this.steps = new Step[2];

                  this.steps[0] = step1;

                  this.steps[1] = step2;

            }

            private Path(Step[] steps)

            {

                  this.steps = steps;

            }

            public Path Append(Step step)

            {

                  Step[] newSteps = new Step[this.steps.Length + 1];

                  for (int i = 0; i < this.steps.Length; i++)

                  {

                        newSteps[i] = this.steps[i];

                  }

                  newSteps[newSteps.Length - 1] = step;

                  return new Path(newSteps);

            }

      }

Basically the first two constructors handle the most common use cases in our codebase. I additionally have a couple more constructors that handle conversion from other internal data structures we use a lot. Append now works by allocating an array, adding to it and handing what we just allocated over to the Path ctor instead of making another copy. The same example as before allocates less than half as much memory, makes fewer copies and is noticeably faster in our actual scenarios.

Don’t get my advice wrong. Don’t stop using the collection classes in .NET. My second version of Path is trickier and more likely to have bugs of the off-by-one type. However, if you find yourself allocating a lot of memory, take a look at things like these as potential culprits.