Why iterators are better than collections

  • Introduction

In many languages, we are using collections to manage groups of objects. We are using collections as well in data access layers as in frameworks, components or UI layers. The .Net framework itself is using a hugh number of collections. Today we call a collection, a class that can store a list of elements while offering features like add/remove, sorting, filtering and change notifications.

However, an other solution is available for a long time: the iterators. .Net and Java are offering iterator support since their creation. In older generation technologies we were talking about unidirectional cursors.

  • Iterators

.Net is defining this feature with the IEnumerator interface. Here is its definition:

 

 // Summary:
    //     Supports a simple iteration over a nongeneric collection.
    public interface IEnumerator {
        // Summary:
        //     Gets the current element in the collection.
        //
        // Returns:
        //     The current element in the collection.
        //
        // Exceptions:
        //   System.InvalidOperationException:
        //     The enumerator is positioned before the first element of the collection or
        //     after the last element.
        object Current { get; }

        // Summary:
        //     Advances the enumerator to the next element of the collection.
        //
        // Returns:
        //     true if the enumerator was successfully advanced to the next element; false
        //     if the enumerator has passed the end of the collection.
        //
        // Exceptions:
        //   System.InvalidOperationException:
        //     The collection was modified after the enumerator was created.
        bool MoveNext();
        //
        // Summary:
        //     Sets the enumerator to its initial position, which is before the first element
        //     in the collection.
        //
        // Exceptions:
        //   System.InvalidOperationException:
        //     The collection was modified after the enumerator was created.
        void Reset();
    }

The main advantage about using an unidirectional cursor is to be lightweight. Using this interface, we can reset the cursor, go ahead and read the current value. You can notice that we can not know the element number before reaching the end. This important point also means that the one who is implementing this interface does not need to store the data like an array or a collection does.
The advantage is that a data browsing class having no idea of the length of the data that it is reading can implement an iterator. (like a DataReader or a network stream)

From a langage point of view, instead of using a classical while loop, we can use the foreach statement by implementing the IEnumerable interface.

Then the classical while loop:

 

 IEnumerator enumeration = ObjectImplementingIEnumerator; 

enumeration.Reset(); 

while (enumeration.MoveNext())
{ 
    object o = enumeration.Current; 
}

becomes

 IEnumerable enumeration = ObjectImplementingIEnumerable; 

foreach (string s in enumeration)
{ 
}

IEnumerable defines a single GetEnumerator() method which is returning an IEnumerator. We can consider IEnumerable to be a IEnumerator provider.
The foreach statement also has the advantage to include a implicit cast in its syntax.

  • Comparing collections and iterators in a sample

Here is the subject: in a winform application, let's try to create a method returning the complete list of the child controls (including all the children recursively) for a given control.
Let's remind that the Control class offers a Controls collection property containing all the direct children a control contains.

The following method brings a first solution based on collections. Each intermediate result of each recursive call returns the list of the child controls. Each of those list are added to the parent list: « controls.AddRange(GetAllControls(childControl))»

 

 private List<Control> GetAllControls(Control parentControl)
{ 
    List<Control> controls = new List<Control>(); 

    foreach (Control childControl in parentControl.Controls) 
    { 
        controls.Add(childControl); 
        controls.AddRange(GetAllControls(childControl)); 
    } 
    return controls; 
} 

This solution is completely correct but we will see that the second one based on iterators is more powerfull. Here it is:

 private IEnumerable<Control> EnumerateAllControls(Control parentControl)
{ 
    foreach (Control childControl in parentControl.Controls)
    { 
        yield return childControl; 
        foreach (Control c in EnumerateAllControls(childControl)) 
            yield return c; 
    } 
}

I will not explain the "yield return" keyword in this article, but it is a simplified iterators implementation. You can find a nice article about yield return here.

Why is this solution more powerful ? You can notice that the result is exactly the same.
In my sample (which is attached to this post), I have added a counter to trace how many collections are created using the first solution. So I can measure that 13 collections have been created as I will only use ! Of course, all these collections are taking cpu time and memory. Cpu time is because we navigate many times to same objects and memory is because a same objet belongs to many collections.

In the second solution, no collection is created and there is only one browsing to items !

 

Moreover, while using the enumeration, at anytime you can easily go back to the world of collection: "new List<Control>(enumeration);" or even simpler in C# 3.0: "enumeration.ToList();"

Conclusion

With this last solution, we have a lower and lightweight approach which is faster and does not consumes any collection allocation. As it is easy to recreate a collection at anytime, we keep all the advantages of the first solution.

For information, you can find this article in french here.

iteratorsvscollections.zip