Recursive iterator performance, part 2

Article
10/31/2004

On Friday I post some observations on the performance of recursive iterators in C#. Since then I've received a few emails from folks stating that, while converting the recursive iterator to an iterative iterator with an explicit stack made sense for that particular problem, they didn't see how it could be generalized.

First, it should be noted that every recursive algorithm can be converted to an iterative algorithm. A recursive algorithm simply makes use of the implicit call stack, so an iterative solution can be created that makes use of an explicit stack. That said, just because an iterative algorithm exists doesn't mean it's easy to find. There do exists algorithms you can follow to convert recursive algorithms to iterative algorithms, but following them to the letter can yield fairly ugly code.

The emails I received in particular asked about preorder and postorder traversals of trees, since the example I showed was an inorder traversal. In terms of difficulty in converting to an iterative algorithm, preorder is by far the easiest, followed by inorder, followed by postorder, at least in my opinion.

The recursive iterator for preorder traversal would be written something like this:

    public static IEnumerable<T> PreorderTraversal<T>(Tree<T> node)
    {
        yield return node.Value;
        if (node.Left != null) foreach (T val in PreorderTraversal(node.Left)) yield return val;
        if (node.Right != null) foreach (T val in PreorderTraversal(node.Right)) yield return val;
    }

As mentioned on Friday, this has some serious performance issues if the tree is large/deep, namely that the temporary iterator objects created at each level could get promoted to higher GC generations, causing more and more time to spent performing collections, and that this converts the O(n) traversal into an O(n log n) traversal given that each call to MoveNext on the iterator calls the next level's MoveNext which calls the next level's MoveNext, etc. Preorder can easily be turned into an iterative iterator as follows:

    public static IEnumerable<T> PreorderTraversal<T>(Tree<T> node)
    {
        Stack<Tree<T>> stack = new Stack<Tree<T>>();
        stack.Push(node);
        while(stack.Count > 0)
        {
            node = stack.Pop();
            if (node.Right != null) stack.Push(node.Right);
            if (node.Left != null) stack.Push(node.Left);
            yield return node.Value;
        }
    }

As with most recursive --> iterative conversions, I'm maintaining my own explicit stack to maintain state. (Note that many of the recursive problems that wouldn't require a stack are tail recursive problems, meaning that the recursive call is the last thing that happens in the function). Since with a preorder traversal work is done to the current node first, this makes it incredibly straightforward to implement with an iterator, since every time through the loop all I need to do is get the next node from the stack, push its children onto the stack, and return its value. And as with its inorder cousin, this performs much better in most situations than its recursive counterpart.

Postorder is more difficult than inorder, and requires a bit of thought. For reference, here's the recursive iterator:

    public static IEnumerable<T> PostorderTraversal<T>(Tree<T> node)
    {
        if (node.Left != null) foreach (T val in PostorderTraversal(node.Left)) yield return val;
        if (node.Right != null) foreach (T val in PostorderTraversal(node.Right)) yield return val;
        yield return node.Value;
    }

It looks exactly like the inorder and preorder recursive iterators with the exception of the placement of the "yield return node.Value;" line. While probably not the only solution, my solution to turning this into an iterative iterator is as follows:

    struct PostorderData<T>
    {
        public PostorderData(Tree<T> node, bool traversedRight) { Node = node; TraversedRight = traversedRight; }
        public readonly Tree<T> Node;
        public readonly bool TraversedRight;
    }

    public static IEnumerable<T> PostorderTraversal<T>(Tree<T> tree)
    {
        Stack<PostorderData<T>> stack = new Stack<PostorderData<T>>();

        while(tree != null || stack.Count > 0)
        {
            for (; tree != null; tree = tree.Left) stack.Push(new PostorderData<T>(tree, false));

            PostorderData<T> data = stack.Pop();
            if (!data.TraversedRight)
            {
                stack.Push(new PostorderData<T>(data.Node, true));
                tree = data.Node.Right;
            }
            else
            {
                yield return data.Node.Value;
                tree = null;
            }
        }
    }

Starting at the root of the tree, I walk left until I can't walk left anymore, and I push each node along the way onto the stack. Note that I push more than just the node, but I also push a boolean value indicating whether I still need to traverse the right child of the node; since for now I'm only traversing the left child, the value is false, meaning the right child has not yet been traversed. After going as far left as possible, I pop a value from the stack. If I haven't traversed its right child, I push the node back onto the stack, but this time indicate that I have traversed its right child, and start the process again, starting from the right child instead of from the original tree root. If the popped node has had its right child traversed, then I'm done with that whole subtree and can yield the popped node's value.

Again, it takes a bit more thought than the recursive version, but it's a heck of a lot faster for most situations and is well worth the extra few minutes. Of course, situation really does matter. If the work being performed for each node (or alternatively if the work being performed to "find" the nodes, if we're not dealing with an in-memory tree structure) dominates, then it doesn't really matter from a time perspective whether we go with the recursive or iterative version. For example, Brad Abrams previously blogged a nice recursive iterator for enumerating files:

    public static IEnumerable<string> GetFiles(string path)
    {
        foreach (string s in Directory.GetFiles(path)) yield return s;

        foreach (string s in Directory.GetDirectories(path))
        {
            foreach (string s1 in GetFiles(s)) yield return s1;
        }
    }

You can easily turn this into an iterative iterator, something like the following (though either way you should probably check each item to ensure it isn't a reparse point):

    public static IEnumerable<string> GetFiles(string path)
    {
        Stack<string> stack = new Stack<string>();
        stack.Push(path);

        while (stack.Count > 0)
        {
            path = stack.Pop();
            foreach (string s in Directory.GetFiles(path)) yield return s;
            string[] dirs = Directory.GetDirectories(path);
            for(int i=dirs.Length-1; i>=0; i--) stack.Push(dirs[i]);
        }
    }

You can do it, but should you? Honestly, you can measure it, but it probably doesn't make much sense to do it iteratively. The I/O involved here is almost certainly going to dwarf any benefits gained from doing this iteratively with an implicit stack, and the recursive version is more intuitive.

Recursive iterator performance, part 2

Additional resources