Recursive iterator performance, part 3

Last week and weekend I discussed performance of iterators in C# 2.0, specifically recursive iterators.
Recursive iterator performance
Recursive iterator performance, part 2

To briefly recap, the ability to write recursive iterators is, quite frankly, awesome.  But as with any powerful feature you need to be aware of the performance ramifications associated with writing code one way or another.  I post a few examples of rewriting recursive iterators as iterative iterators, including binary tree inorder, preorder, and postorder traversals, as well as a solution to the Towers of Hanoi (which, as I also mentioned, is basically an inorder traversal).

Here is another example, the classic permutations problem folks sometimes see in their intro to computer science classes (I've even heard of it being asked in Microsoft interviews).  The idea is, given an array, enumerate all of the permutations of that array.  A generic recursive iterator solution could be written as follows:

    private static void Swap<T>(T[] arr, int pos1, int pos2)
    {
        if (pos1 != pos2)
        {
            T temp = arr[pos1]; arr[pos1] = arr[pos2]; arr[pos2] = temp;
        }
    }

    public static IEnumerable<T[]> Permute<T>(T[] arr)
    {
        foreach (T[] permutedArr in Permute(arr, 0, arr.Length-1))
        {
            yield return permutedArr;
        }
    }

    private static IEnumerable<T[]> Permute<T>(T[] arr, int lowerBound, int upperBound)
    {
        if (lowerBound == upperBound) yield return arr;
        else
        {
            for (int i = lowerBound; i <= upperBound; i++)
            {
                Swap(arr, i, lowerBound);
                foreach (T[] permutedArr in Permute(arr, lowerBound+1, upperBound))
                {
                    yield return permutedArr;
                }
                Swap(arr, i, lowerBound);
            }
        }
    }

Basically, I start at the beginning of the array (where "beginning" is defined as the lower bound passed to the method, not necessarily 0) and walk along the array, swapping the element at the beginning of the array with the current element.  After a swap, I recursively enumerate all of the permutations of the rest of the array, starting at beginning+1.  And when I've completed that, I swap back the beginning element with the current element and continue walking along the array.  You really have to admire the elegance of recursion.

To turn this into an iterative iterator, one needs to examine exactly where work is being performed in relation to when recursive calls are being made and when values are being returned.  In this case, the termination condition is when the beginning of the array is the end of the array, and that's when we stop recurring and yield a value.  In all beginning positions, we swap the elements in two positions, recur, and swap them back.  Thus, in an iterative version, I need an explicit stack that keeps track of what swaps I've made and when I made them... this requires some information:
1. What's the current lower bound?  In the recursive version, we're passing that information to the recursive call.
2. What's the current element, i.e. what's the value of i as we walk from lower bound to upper bound?  In the recursive version, that's maintained as part of the stack frame for each recursive call.
3. What's the current upper bound?  In the recursive version, we're passing that information to the recursive call.  Note, however, that this value never changes; that'll come in handy when writing an iterative version.
4. Am I coming or going?  In other words, I push some information onto the stack about a swap that's been performed... how do I know when and how to undo that swap?  This is implicit in the recursive version.

These questions lead to the following state structure to maintain information about each "recursive call" (since the upper bound is constant, I don't need to include it):

    private struct State
    {
        public State(int lower, int current, bool isUndo) 
        {
            LowerBound = lower; CurrentSwap = current; IsUndo = isUndo;
        }
        public readonly int LowerBound;
        public readonly int CurrentSwap;
        public readonly bool IsUndo;
    }

Every time I would make a recursive call in the recursive version, in the iterative version I need to push one of these state structures onto my stack to indicate what operation I need to perform the next time through my loop.  In addition, I need another instance of this struct pushed onto the stack to indicate that the operation needs to be undone, and equally important, when.  Since a stack is a LIFO data structure, whatever gets pushed onto the stack last will be the first thing out.  So, since I need to perform my swap before I undo it, I need to first push the undo information onto the stack, and then push the regular swap information onto the stack.  With that in hand, we've basically solved the problem.  My solution (which could probably be cleaned up a bit), looks like this:

    public static IEnumerable<T[]> Permute<T>(T[] arr)
    {
        int upperBound = arr.Length-1;
        Stack<State> stack = new Stack<State>((upperBound*upperBound)+1);
        stack.Push(new State(-1, -1, false));

        while (stack.Count > 0)
        {
            State curState = stack.Pop();

            if (curState.IsUndo)
            {
                if (curState.CurrentSwap >= 0) Swap(arr, curState.LowerBound, curState.CurrentSwap);
            }
            else if (curState.LowerBound == upperBound) yield return arr;
            else
            {
                if (curState.CurrentSwap >= 0) Swap(arr, curState.LowerBound, curState.CurrentSwap);
                for (int i = upperBound; i > curState.LowerBound; i--)
                {
                    stack.Push(new State(curState.LowerBound + 1, i, true));
                    stack.Push(new State(curState.LowerBound + 1, i, false));
                }
            }
        }
    }

Now the question becomes a question of worth... does this version perform better than its recursive counterpart?  The answer is a definitive maybe.  For large arrays, it runs faster.  On my machine, iterating over all of the permutations with the recursive version on an array of 12 numbers took over four minutes to finish.  The iterative version in the same scenario took under two minutes.  An array of 13 numbers with the iterative version finished in approximately 15 minutes, and I didn't have the patience to wait for the recursive version to finish.  But for smaller arrays, the difference was fairly negligable, and up until an array of length 10, both versions finished computing all of the permutations in under a second.  There are other memory related issues to consider, too.  For example, in the recursive version the call stack grows and shrinks, but in the iterative version, the Stack<State> never shrinks.  Is that a problem?  Probably not.  But measuring is the only way to know for sure.

As always, it's good to have a choice, but whether the extra effort is worth it (and whether it will in fact make a positive difference) really depends on the situation.  So, profile the code, find your hotspots, and keep this kind of transformation in your bag of tricks in case you need to use it one day.