Why iterators are better than collections



  • Introduction

In many languages, we are using collections to manage groups of objects. We are using collections as well in data access layers as in frameworks, components or UI layers. The .Net framework itself is using a hugh number of collections. Today we call a collection, a class that can store a list of elements while offering features like add/remove, sorting, filtering and change notifications.

However, an other solution is available for a long time: the iterators. .Net and Java are offering iterator support since their creation. In older generation technologies we were talking about unidirectional cursors.



  • Iterators

.Net is defining this feature with the IEnumerator interface. Here is its definition:


 


// Summary:
// Supports a simple iteration over a nongeneric collection.
public interface IEnumerator {
// Summary:
// Gets the current element in the collection.
//
// Returns:
// The current element in the collection.
//
// Exceptions:
// System.InvalidOperationException:
// The enumerator is positioned before the first element of the collection or
// after the last element.
object Current { get; }

// Summary:
// Advances the enumerator to the next element of the collection.
//
// Returns:
// true if the enumerator was successfully advanced to the next element; false
// if the enumerator has passed the end of the collection.
//
// Exceptions:
// System.InvalidOperationException:
// The collection was modified after the enumerator was created.
bool MoveNext();
//
// Summary:
// Sets the enumerator to its initial position, which is before the first element
// in the collection.
//
// Exceptions:
// System.InvalidOperationException:
// The collection was modified after the enumerator was created.
void Reset();
}


The main advantage about using an unidirectional cursor is to be lightweight. Using this interface, we can reset the cursor, go ahead and read the current value. You can notice that we can not know the element number before reaching the end. This important point also means that the one who is implementing this interface does not need to store the data like an array or a collection does.
The advantage is that a data browsing class having no idea of the length of the data that it is reading can implement an iterator. (like a DataReader or a network stream)

From a langage point of view, instead of using a classical while loop, we can use the foreach statement by implementing the IEnumerable interface.

Then the classical while loop:

 


IEnumerator enumeration = ObjectImplementingIEnumerator;

enumeration.Reset();

while (enumeration.MoveNext())
{
object o = enumeration.Current;
}


becomes


IEnumerable enumeration = ObjectImplementingIEnumerable;

foreach (string s in enumeration)
{
}


IEnumerable defines a single GetEnumerator() method which is returning an IEnumerator. We can consider IEnumerable to be a IEnumerator provider.
The foreach statement also has the advantage to include a implicit cast in its syntax.



  • Comparing collections and iterators in a sample

Here is the subject: in a winform application, let’s try to create a method returning the complete list of the child controls (including all the children recursively) for a given control.
Let’s remind that the Control class offers a Controls collection property containing all the direct children a control contains.

The following method brings a first solution based on collections. Each intermediate result of each recursive call returns the list of the child controls. Each of those list are added to the parent list: « controls.AddRange(GetAllControls(childControl))»

 


private List<Control> GetAllControls(Control parentControl)
{
List
<Control> controls = new List<Control>();

foreach (Control childControl in parentControl.Controls)
{
controls.Add(childControl);
controls.AddRange(GetAllControls(childControl));
}
return controls;
}


This solution is completely correct but we will see that the second one based on iterators is more powerfull. Here it is:


private IEnumerable<Control> EnumerateAllControls(Control parentControl)
{
foreach (Control childControl in parentControl.Controls)
{
yield return childControl;
foreach (Control c in EnumerateAllControls(childControl))
yield return c;
}
}

I will not explain the “yield return” keyword in this article, but it is a simplified iterators implementation. You can find a nice article about yield return here.

Why is this solution more powerful ? You can notice that the result is exactly the same.
In my sample (which is attached to this post), I have added a counter to trace how many collections are created using the first solution. So I can measure that 13 collections have been created as I will only use ! Of course, all these collections are taking cpu time and memory. Cpu time is because we navigate many times to same objects and memory is because a same objet belongs to many collections.


In the second solution, no collection is created and there is only one browsing to items !


 


Moreover, while using the enumeration, at anytime you can easily go back to the world of collection: “new List<Control>(enumeration);” or even simpler in C# 3.0: “enumeration.ToList();”


Conclusion


With this last solution, we have a lower and lightweight approach which is faster and does not consumes any collection allocation. As it is easy to recreate a collection at anytime, we keep all the advantages of the first solution.

For information, you can find this article in french here.

iteratorsvscollections.zip

Comments (6)

  1. foldip says:

    When using all the elements returned by the function the difference is not exactly that clear. Consider that when you use "yield" there are objects created in the background which holds the actual environment of the call. I think that although this, it’s a more powerful solution, but the real difference is when you don’t want to iterate over the entire collection. Because in the case using collections the entire collection is created even if you break the iteration for example at the second element.

  2. mitsu says:

    Thanks for your comment !

    I understand what you mean but the underlying compiler generated class that implements the iterator is still lighter than dealing with a collection which weight depends on the number of elements. Even if you don’t filter or break the iteration, which is a case where iterations advantage is huge (I totally agree with you), in my sample, the solution 2 nagivates only once over the elements. In solution 1, the code is navigating on the same element so many times !

  3. Last week the Microsoft MVP’s converged on Redmond from all corners of the globe. It was a great occaission

  4. jcteague says:

    I have a question regarding Iterators vs the Foreach(Predicate<T>) method in List<T>.

    Is there an advantage to creating an iterator rather than using the Foreach method.  I will often use this method when I’m finding a subset in a list using simple anonymous delegates.  Would it be for efficient to use an Iterator instead?

  5. mitsu says:

    List<T> does not contain Foreach(Predicate<T>) but Foreach(Action<T>). Using Foreach(Action<T>), you must implement a final action for one full navigation of the collection. Using iterators, you can create a sequence of foreachs if your filter (or order or any other action) returns an iterator again. Whatever the number of action nested in the sequence, the magic of the yield return keyword will create a single navigation over the items.