yield return and Continuation-Passing Style

Someone was recently porting some C# code to VB and had a question about how to convert the C# yield return iterator methods to VB (VB currently doesn’t support iterators).

There were a lot of replies like “use Reflector on a compiled binary and copy-paste the generated state machine code”. The problem with the Reflector approach is that you go one step down the abstraction ladder and lose the high-level intent expressed in the original code. Resulting code will be surely harder to read and maintain.

Surprisingly, no one mentioned CPS. But before applying continuation-passing style, let’s first look at the nature of yield return. It’s essentially a producer-consumer model where the producer is a state machine where transitions are triggered by the MoveNext calls and current state is saved in the Current property. On the consumer side there is eventually almost always a foreach loop with some logic in the body, and this logic only requests the next element (and triggers a state machine transition) after it’s done processing the current element.

It turns out, we can preserve the algorithm encoded in the iterator and avoid using yield return and thus having the compiler to generate the state machine code for us. To achieve this, we pass the logic that used to be in the consumer foreach loop (a continuation) directly into the iterator.

Here’s an example with yield return that we want to convert:

 using System;
using System.Collections.Generic;

class Node<T>
{
    public Node<T> Left { get; set; }
    public T Value { get; set; }
    public Node<T> Right { get; set; }

    public IEnumerable<T> Traverse()
    {
        if (Left != null)
        {
            foreach (var item in Left.Traverse())
            {
                yield return item;
            }
        }
        yield return Value;
        if (Right != null)
        {
            foreach (var item in Right.Traverse())
            {
                yield return item;
            }
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        Node<int> tree = new Node<int>()
        {
            Left = new Node<int>()
            {
                Value = 1,
                Right = new Node<int>()
                {
                    Value = 2
                }
            },
            Value = 3,
            Right = new Node<int>()
            {
                Value = 4
            }
        };

        foreach (var item in tree.Traverse())
        {
            Console.WriteLine(item);
        }
    }
}

Now, the trick is to pass the “continuation”, which is the code that processes the results, directly into the iterator method (using first-class functions AKA delegates):

 public IEnumerable<T> Traverse()
{
    List<T> result = new List<T>();
    TraverseInner(this, result.Add);
    return result;
}

void TraverseInner(Node<T> root, Action<T> collector)
{
    if (root == null) return;
    TraverseInner(root.Left, collector);
    collector(root.Value);
    TraverseInner(root.Right, collector);
}

Note how we created an internal helper that actually does the traversing and how the logic of the traversal is even shorter than in the original method. We don’t use yield return and still maintain a high level of abstraction. Where was yield return, now is a call to the helper method. Otherwise, the control flow is the same.

The downside of this approach though is that we lose laziness, which means, that once requested, we eagerly calculate all the results and return them at once. This is the price that we pay for losing the state machine that can store intermediate results for us.

If we remove the limitation of having to return IEnumerable<T>, we can directly consume the helper method without having to write a foreach loop:

 TraverseInner(tree, Console.WriteLine);

Here we’re passing the “continuation” (which is the Console.WriteLine method) directly inside the iterator. Note how the consumer side became shorter as well because we don’t have to write a foreach loop.

Note: a while ago I blogged about yield foreach which would allow to get rid of foreach statements in the iterator scenario as well.

Note 2: I’m guessing it’s possible to get rid of yield return and still keep the laziness, I just need to do more homework on Push LINQ and similar to find a nice solution to this.