Recursive iterator performance

Just got back from a very successful week at WinDev in Boston.  Not only was the conference good, but it was great being back in Boston for the Red Sox victory.  Lots of folks at the conference asked me if I was a Yankees fan as I currently live and work in NYC, but my time in New York could never replace all those years growing up in New Hampshire rooting for the Red Sox, nor my college years in Cambridge (just outside of Boston) living with die-hard Sox fans; we even drove from Cambridge to Fort Myers for two spring breaks to see them play during spring training.  (For all you Bostonians interested in such a trip, the drive is approximately 27 hours each way, which we did only stopping for food and restroom breaks; well worth the effort.)

In any event, I think the conference this week was successful.  Spent a lot of time talking with the attendees, learning about the types of applications they’re building with .NET, what functionality in the Framework is most interesting and/or useful to them, what kind of content they’re interested in seeing more of on MSDN and in MSDN Magazine, what features they’re looking forward to in the .NET Framework 2.0, etc.

Overall, I also felt the speakers were very good, which certain helps a conference.  That said, every once in a while I’d hear a comment that I felt was misleading.  Not that the speaker said something wrong, just that what they said could be taken out of context in an improper fashion.  The rest of this post will address one of those comments.

I listened in on a session today covering iterators in C# 2.0.  The presenter did a good job, and had some cool examples for how this new compiler functionality can make development life much easier.  However, one of the attendees asked a question on performance, namely do recursive iterators perform well, and I feel the presenter’s response deserves some rebuttal.

The question was asked after the presenter showed how to perform recursive tree iteration (for example, an in-order traversal) using a C# iterator.   The code looked something like this (assuming a Tree<> class defined similar to what you'd expect):

    static IEnumerable<int> EnumerateTree (Tree<int> tree)
    {
        if (tree == null) yield break;
        foreach (int i in EnumerateTree (tree.Left)) yield return i;
        yield return tree.Value;
        foreach (int i in EnumerateTree (tree.Right)) yield return i;
    }

The response the presenter gave was that there’s little to no difference between using iterators and custom-written enumerators, and that specifically, he hasn’t found any performance problems using recursive iterators.

Leaving aside the part about recursion, it's true that there's little to no performance difference between using custom enumerators and iterators... in fact, the compiler does a very good job of generating the state machine for the iterator such that it'd be hard to get better performance from a custom enumerator, and custom enumerators are certainly harder to write.

As with any perceived performance issue, good design and measurement are key.  And, of course, the actual data set for the problem is important.  If I only need to sort 5 elements, a bubble sort is going to perform fine.  Not only would the performance difference between bubble sort and something "faster" like merge sort or quick sort be negligible, bubble sort’s O(N^2) run time might be faster than quick sort’s (N log N) run time due to the constants involved, the time to load the larger quick sort routine from disk, locality, etc.  So, I’ll make sure to backup my claims here with numbers (of course, since the data set does matter, take these numbers with a grain of salt).

Iterators are a really cool feature in C# 2.0 that I’m sure I’ll make use of frequently.  However, as with any feature, you have to understand what’s happening under the covers so that you can reason better about the performance impact.  As Grant has previously pointed out, the above code will produce O(N) iterator objects, where N is the  number of nodes in the tree.  Does that matter?  Maybe.  We’ll be creating approximately one new iterator node for every node in the tree.  Not a big deal in and of itself, except that if the tree is deep, all of the temporary nodes associated with nodes from the current node up to the root node will be kept alive during GC calls.  This means that there’s a much higher chance that these temporary nodes will be promoted to generations 1 and 2, which means more and more time will need to be spent in the garbage collector.  In addition, keep in mind that whenever the next value is yielded from an iterator, this value is returned up through all of the yield statements in each iterator (i.e. a node M deep will return through M calls to MoveNext on the various levels of iterators).  So whenever you need to make the next call to MoveNext from your client code, you’re one call to MoveNext will result in approximately O(log N) calls to MoveNext as your calls makes its way down to the node of the tree currently being iterator over.  So not only are we possibly making a ton of work for the garbage collector, but we’ve also changed our iteration algorithm from O(N), looking at each node once, to approximately O(N log N), where for each node in order to get to it we have to make a method call for all of the nodes above it.

The best way for me to prove that this is in fact a problem is to show that there’s a faster way.  Grant has already shown a better way to do it for simply tree enumeration, so I’ll refer you to his for that.  Instead, I thought I would demonstrate on a related problem, in fact the one the presenter today used for his discussion: Towers of Hanoi (if you don’t know about this problem, search on MSN or Google and you’ll find tons of references and with nice pictures and all that good stuff).

The C# iterator presented today for solving Towers of Hanoi looked like the following:

    public enum Peg { A, B, C }

    public struct Move
    {
        public Move(Peg from, Peg to) { From = from; To = to; }
        public readonly Peg From;
        public readonly Peg To;
    }

    public static IEnumerable<Move> GenerateMoves(int n, Peg from, Peg to, Peg scratch)
    {
        if (n == 1) yield return new Move(from, to);
        else
        {
            foreach (Move m in GenerateMoves(n - 1, from, scratch, to)) yield return m;
            yield return new Move(from, to);
            foreach (Move m in GenerateMoves(n - 1, scratch, to, from)) yield return m;
        }
    }

The solution is the standard recursive solution presented in the average CS 101 class, except making use of C# iterators.  It works great, and is certainly cool and a good demonstration of iterators, but at the same time it has the same performance problems as does the tree example discussed above.  In fact, the Towers of Hanoi problem is almost identical to the in-order tree traversal problem.

How can we do it without recursion?  We can use our own explicit stack (rather than the implicit one we get from making method calls, each of which gets its own stack frame).

    public enum Peg { A, B, C }

    public struct Move
    {
        public Move(Peg from, Peg to) { From = from; To = to; }
        public readonly Peg From;
        public readonly Peg To;
    }

    struct InternalMove
    {
        public InternalMove(int n, Peg f, Peg t, Peg s) { N = n; From = f; To = t; Scratch = s; }
        public readonly Peg From;
        public readonly Peg To;
        public readonly Peg Scratch;
        public readonly int N;
    }

    public static IEnumerable<Move> GenerateMoves(int n, Peg from, Peg to, Peg scratch)
    {
        Stack<InternalMove> stack = new Stack<InternalMove>(n);
        for (InternalMove cur = new InternalMove(n, from, to, scratch);
               cur.N > 0 || stack.Count > 0;
               cur = new InternalMove(cur.N-1, cur.Scratch, cur.To, cur.From))
        {
            for (; cur.N > 0; cur = new InternalMove(cur.N-1, cur.From, cur.Scratch, cur.To)) stack.Push(cur);
            cur = stack.Pop();
            yield return new Move(cur.From, cur.To);
        }
    }

Note that I’ve kept the Move structure and the Peg enumeration from the original version, and the signature of my iterator hasn’t changed.  The only thing I’ve done is add an InternalMove struct to maintain some state and have changed the actual implementation of GenerateMoves.  In fact, I’ve just adopted Grant’s solution (the standard iterative tree walk with an explicit stack) and replaced it with specifics for Towers of Hanoi.  Specifically, you can visualize Towers of Hanoi as a tree of moves.  Whereas in in-order tree traversal to move left I set "current" to current.Left, in Towers of Hanoi to move left I set current to new InternalMove(cur.N-1, cur.From, cur.Scratch, cur.To)... this comes exactly from the recursive version.  Similarly, instead of setting current to current.Right in order to traverse right, in Towers of Hanoi I set current to be new InternalMove(cur.N-1, cur.Scratch, cur.To, cur.From), again, directly from the recursive version.

Does it work?  Yup.  For example, here are outputs with the number of disks set to 4 (so there will be 15 moves).

Recursive Version:
    From A to C
    From A to B
    From C to B
    From A to C
    From B to A
    From B to C
    From A to C
    From A to B
    From C to B
    From C to A
    From B to A
    From C to B
    From A to C
    From A to B
    From C to B

Iterative Version:
    From A to C
    From A to B
    From C to B
    From A to C
    From B to A
    From B to C
    From A to C
    From A to B
    From C to B
    From C to A
    From B to A
    From C to B
    From A to C
    From A to B
    From C to B

This is the result of just iterating over each Move in a foreach using the iterators from above and printing out the from and to pegs: 

    public static void Main()
    {
        foreach(Move m in GenerateMoves(4, Peg.A, Peg.B, Peg.C))
        {
            Console.WriteLine("From {0} to {1}", m.From, m.To);
        }
    }

So... both produce the same output.  Is the second one really faster?  Yes.  For 4 disks, on my laptop the recursive version version took 0.0145 milliseconds while the iterative version took 0.0069 milliseconds.  A difference for sure, though not a showstopper.  But as with bubble sort versus quick sort, the magic lies with larger data sets.  What happens if we set the number of discs to 23, resulting in 8,388,607 moves?  The recursive version required 20.94 seconds to run (yes, seconds) versus 0.83 seconds for the iterative version.  Now that’s a difference!  Do it yourself to see, and while it's running open perfmon and look at the .NET Memory counters, specifically those measuring the percent of time spent doing garbage collections.

The moral of the story is this.  Iterators are cool.  Very cool.  And extremely practical.  In and of themselves, they add little overhead, and they make certain types of code incredibly easy to write.  However, that doesn’t mean you can forget about coding well or about thinking through the problem space.  You still need to think about what’s happening under the covers, or you could find yourself blaming iterators for bad performance when you should really be blaming, well, the guy next to you who wrote the code.