Recursion and Concurrency

When teaching recursion in an introductory computer science course, one of the most common examples used involves a tree data structure.  Trees are useful in this regard as they are simple and recursive in nature, with a tree’s children also being trees, and allow for teaching different kinds of traversals (in-order, pre-order, post-order, level-order, and so forth).  But when these introductory concepts meet multithreading, the concepts are no longer so simple, at least not with the tools available in mainstream languages today like C#, C++, and Java. Consider a simple Tree data structure:

class Tree<T>
{
    public Tree<T> Left, Right; // children
    public T Data; // data for this node 
}

Let’s say we want to execute an Action<T> for each datum stored in the tree, and let’s assume I don’t care about order (introducing parallelism could be questionable if I did).  That’s straightforward to do sequentially and recursively:

public static void Process<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;

    // Process the current node, then the left, then the right
   
action(tree.Data);
    Process(tree.Left, action);
    Process(tree.Right, action);
}

I could also do so without recursion by maintaining an explicit stack (or queue, or some other data structure with different ordering guarantees):

public static void Process<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;
    var toExplore = new Stack<Tree<T>>();

    // Start with the root node
   
toExplore.Push(tree);
    while (toExplore.Count > 0)
    {
        // Grab the next node, process it, and push its children

        var current = toExplore.Pop();
        action(current.Data);
        if (current.Left != null)
           
toExplore.Push(current.Left);
        if (current.Right != null)
           
toExplore.Push(current.Right);
    }
}

Now, let’s assume the action we’re performing on each node of the tree is independent, relatively expensive and/or that the tree is relatively large, and as such we want to process the tree in parallel (we’re of course also assuming that the action delegate is thread-safe), meaning that we want multiple threads each running the action delegate on distinct tree nodes.  How do we do this with what we have in .NET today?

There are multiple approaches, some more valid than others.  The first thing someone might try is to follow the original recursive implementation but using the ThreadPool, which could look something like this:

public static void Process<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;

    // Use an event to prevent this method from
    // returning until its children have completed

    using (var mre = new ManualResetEvent(false))
    {
        // Process the left child asynchronously

        ThreadPool.QueueUserWorkItem(delegate
        {
            Process(tree.Left, action);
            mre.Set();
        });

        // Process current node and right child synchronously
        action(tree.Data);
        Process(tree.Right, action);

        // Wait for the left child

        mre.WaitOne();
    }
}

The idea behind this implementation is to, given a node, spin up a work item to process that node’s left child in parallel with the current node, and then process the current node’s data as well as its right child.  Of course, I could be losing out on some parallelism here as I delay processing of the right child until I’m done processing the current data.  So we modify it slightly:

public static void Process<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;

    // Use an event to wait for the children
    using (var mre = new ManualResetEvent(false))
    {
        int count = 2;

        // Process the left child asynchronously
        ThreadPool.QueueUserWorkItem(delegate
        {
            Process(tree.Left, action);
            if (Interlocked.Decrement(ref count) == 0)
                mre.Set();
        });

        // Process the right child asynchronously

        ThreadPool.QueueUserWorkItem(delegate
        {
            Process(tree.Right, action);
            if (Interlocked.Decrement(ref count) == 0)
                mre.Set();
        });

        // Process the current node synchronously
        action(tree.Data);

        // Wait for the children
        mre.WaitOne();
    }
}

I’ve now fixed that issue, such that both the left and the right children could potentially be processed in parallel with the current node, but that was by far not the worst problem.  For starters, I’m creating a ManualResetEvent for every node in the tree… that’s expensive.  ManualResetEvent is a thin wrapper around a Win32 kernel event primitive, so creating one of these things requires kernel transitions, as does setting and waiting on one.  Next, every time I process a node, I block waiting for its children to complete.  And as the processing of a node (all but the root) is happening on a thread from the ThreadPool, I’m blocking ThreadPool threads.  If a ThreadPool thread gets blocked, the ThreadPool will need to inject additional threads in order to process the remaining work items, and thus this implementation will require approximately one thread from the pool per node in the tree.  That’s a lot of threads!  And that carries with it some serious problems.  By default, a thread in .NET has a megabyte of stack space committed for it, so each thread burns a megabyte of (virtual) memory.  The ThreadPool also throttles the creation of additional threads, such that introducing a new thread (once the number of pool threads equals the number of processors) will take 500 ms.  For a tree of 250 nodes, that means its processing will take close to 2 minutes, purely for the overhead of creating threads, nevermind the actual processing of the nodes.  And worse, there is a maximum number of threads in the pool: in .NET, the ThreadPool has a limited number of threads, by default 25 per processor in .NET 1.x/2.0 and 250 per processor in .NET 2.0 SP1.  If the pool reaches the maximum, no new threads will be created, and thus this implementation could deadlock.  Parent nodes will be waiting for their child nodes to complete, but the child nodes can’t be processed until their parent nodes complete and relinquish a thread from the pool to execute.

Obviously, we need a better implementation. Next up, we can walk the tree sequentially, queuing up a work item in the pool for each node in the tree.  Here we take that approach based on the recursive implementation of a tree walk:

public static void Process<T>(Tree<T> tree, Action<T> action)
{
    if (tree == null) return;

    // Use an event to wait for all of the nodes to complete
    using (var mre = new ManualResetEvent(false))
    {
        int count = 1;
        
        // Recursive delegate to walk the tree
        Action<Tree<T>> processNode = null;
        processNode = node =>
        {
            if (node == null) return;

 
            // Asynchronously run the action on the current node

            Interlocked.Increment(ref count);
            ThreadPool.QueueUserWorkItem(delegate
            {
                action(node.Data);
                if (Interlocked.Decrement(ref count) == 0)
                    mre.Set();
            });

 
            // Process the children

            processNode(node.Left);
            processNode(node.Right);
        };

 
        // Start off with the root node