Internal vs. External Iterators (revisted)

A while back I posted about my issues in C# when I was trying to write out an extensible collections API.

From the high level perspective I wanted to provide useful concrete implementations of many structures that would solve most peoples needs, and I also wanted to include good abstract implementations of most behaviors so that it would be pretty easy for someone to come in and implement one of the collections interfaces without too much burden.

From the low level perspective something I struggled with was whether or not to base the collections on an external iterator pattern or an internal iterator pattern.

Recall that an internal iterator behaves like this:

 
///returns false when you iteration to stop
delegate void Action(object a);

interface IIteratable {
     void Iterate(Action p);
}

A collection implementing the internal iterator is just passed a delegate which it will then apply to all of its internal elements in order.

An external iterator behaves like this:

 
interface IEnumerator {
     object Current { get; }
     bool MoveNext();
}

interface IEnumerable {
     IEnumerator GetEnumerator();
}

Why would I want two different iteration methods supplied in the API? Well, it comes down to a balance between ease of implementation vs ease of use. Let's use the following simple case: I want to provide a sorted set of elements and I do so by using a binary tree. Now I have two of these sets and I want to test them for equality. Since they are both sorted sets then I know that they have no duplicate elements. So to test them for equality I just need to check their sizes and then iterate over them verifying that each has the same elements in the same order.

Say my set has the following definition for the nodes it uses in the tree

 
class BinaryNode {
       object value;
       BinaryNode; left;
       BinaryNode right;
} 

If I try to write the IIterator for that. It's pretty simple with the following code:

 
      void Iterate(Action a) {
             if (left != null) {
                   left.Iterate(a);
             }

             a(value);

             if (right != null) {
                   right.Iterate(a);
             }
      }

If I wanted to implement the IEnumerable version though it would be far more complex. (Just try writing it!). However, say I have written the IEnumerable version and I want to test two sets for equality, then it's incredible simple by just enumerating over the two IEnumerators returned from each set. Testing equality with the IIterator would be far more complex in that case. (Just try writing it!) :-)

So (in general) IEnumerable puts a far greater burden on the programmer but more power in th users hand, while IIterator puts a far greater burden on the consumer of the API while simplifying things for the API developer. So wouldn't it be great if you could have a solution that made it easy for the developer while being powerful enough for the consumer. That's what I attempted to do. However, it seems that in C# it's only easy to go in one way. i.e. i could write:

 
public abstract class AbstractCollection : ICollection {
       public void Iterate(Action action) {
             foreach(A element in this) {
                    action(element);
             }
       }

       public abstract IEnumerator GetEnumerator();
}

However, what if I wanted the reverse. What if I wanted to implement GetEnumerator on top of Iterate. I struggled with this for a while and ended up giving up on it since it didn't seem possible to do without some nasty threading hacks. (see the feedback from that post to see Steve Steiner actually go through and implement that idea).

I had not thought about this in a while when suddenly i got a great email from Neil pointing me to this article on fibers and coroutines in .Net. In the original posting I pointed out that what I was trying to do was fairly trivial in some functional language through the use of call-with-current-continuation (or some hackery with exceptions). However, I couldn't see a way to do that in C# given the (seeming) lack of low level control necessary to accomplish that.

However, the article shows that that isn't the case and that it is actually quite possible to accomplish the very task I was trying to do. Using the information from the article it seems like it should be possible to write the code similar to the following:

 
     //currently in the AbstractCollection code

     public IEnumerator GetEnumerator() {
            return new CollectionEnumerator(this);
     }

     //note the subclassing of Fiber
     class CollectionEnumerator : Fiber, IEnumerator {
            ICollection c;
            object current;

             CollectionEnumerator(ICollection c) {
                     this.c = c;
                     c.Iterate(delegate (object o) {
                             Yield(o);  
                             //Yield is inherited from Fiber.  It means: "return control to the outer fiber passing back the value 'o'.
                             //The state of this fiber (including registers/stack) is saved and control to is can be resumed in the future.
                             //When this fiber finishes (after yielding the last element), control will be passed back to the main
                             //fiber with the value 'null' returned"
                     });
             }

             public object Current { get { return current; } }
 
             public bool MoveNext() {
                    current = Resume();
                    return current != null;

                    //Resume is inherited from Fiber.  It means: "continue the fiber where it last left off.  When the fiber yields a 
                    //value keep it around in current.  When the fiber finishes, null will be returned and we know that this enumerator
                    //has no more elements"
             }
      }     

Note: it will actually take a little more work, and it would be nicer to clean things up to both work with generics and also deal with the fact that the Iterator might actually produce null as an actual value in the collection, not just something to indicate that iteration completed. However, the core work is pretty much there.

Note: this isn't really that different from what AT, Steve and I were discussing earlier (since it does involve interesting threading ideas), however the pointers provided by the site mean that it can be done much more simply without any extra time/memory tradeoffs and without having to write out all the synchronization logic. It also means that all the fiber logic (as well as saving thread state) is hidden out of the way from you. Another thing to notice is that because it is based on fibers when you run it you always stay within the current thread of execution, so you aren't taking an hit for switching an entire thread in/out. The implementation in the paper requires that one subclass a specialized Fiber class. I'm not a big fan of forcing subclassing as it can interfere with your own natural hierarchies. When I come back to work later next week I'm going to see if I can refactor the code so that you end up with a mixin that you can use to get this without requiring subclassing.

Edit: The code samples here have not been tested. I came up with them after reading the article on MSDN. I can't vouch for whether or not they will actually work and I will post updated (hopefully correct) code when i do implement this later on.