The Yield Contextual Keyword

The yield return statement is a more elegant way to implement the plumbing for iteration.  Yield was introduced in C# 2.0, but my informal polling indicates that many developers don't yet understand it.  It's not hard, but it deserves some explanation.

Use of this construct is vital for LINQ.  Yield return allows one enumerable function to be implemented in terms of another.  It allows us to write functions that return collections that exhibit lazy behavior.  This allows LINQ and LINQ to XML to delay execution of queries until the latest possible moment.  It also allows queries to be implemented in such a way that LINQ and LINQ to XML do not need to assemble massive intermediate results.  Without this avoidance of intermediate results, the system would rapidly become unwieldy and unworkable.
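To make the lazy behavior concrete, here is a minimal sketch.  The names LazyDemo, Numbers, Doubled, and Log are made up for this illustration and are not part of LINQ.  Doubled is implemented in terms of Numbers, and nothing in either method runs until the foreach starts pulling items.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records events so the order of execution can be inspected afterwards.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical source: yields 1..count, logging each item as it is produced.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Log.Add("produce " + i);
            yield return i;
        }
    }

    // One enumerable function implemented in terms of another.
    public static IEnumerable<int> Doubled(IEnumerable<int> source)
    {
        foreach (int n in source)
        {
            Log.Add("double " + n);
            yield return n * 2;
        }
    }

    public static void Main()
    {
        // Building the query runs none of the iterator code.
        IEnumerable<int> query = Doubled(Numbers(3));
        Console.WriteLine("log entries before enumeration: " + Log.Count);

        // Each item flows through both methods before the next one is produced.
        foreach (int n in query)
            Log.Add("consume " + n);

        foreach (string entry in Log)
            Console.WriteLine(entry);
    }
}
```

The log comes out interleaved (produce 1, double 1, consume 2, produce 2, and so on), which shows that no intermediate collection of all the numbers is ever built.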

The following two small programs demonstrate the difference between implementing a collection by hand via the IEnumerable interface and implementing it with yield return in an iterator block.

With this first example, you can see that there is a lot of plumbing that you have to write.  You have to write a class that implements IEnumerable, and another class that implements IEnumerator.  The GetEnumerator() method in MyListOfStrings returns an instance of the class that implements IEnumerator.  The end result is that you can iterate through the collection using foreach.

using System;
using System.Collections;

public class MyListOfStrings : IEnumerable
{
    private string[] _strings;

    public MyListOfStrings(string[] sArray)
    {
        // Copy the array so later changes by the caller do not affect the collection.
        _strings = new string[sArray.Length];

        for (int i = 0; i < sArray.Length; i++)
        {
            _strings[i] = sArray[i];
        }
    }

    public IEnumerator GetEnumerator()
    {
        return new StringEnum(_strings);
    }
}

public class StringEnum : IEnumerator
{
    private readonly string[] _strings;

    // Enumerators are positioned before the first element
    // until the first MoveNext() call.
    int position = -1;

    public StringEnum(string[] list)
    {
        _strings = list;
    }

    public bool MoveNext()
    {
        position++;
        return (position < _strings.Length);
    }

    public void Reset()
    {
        position = -1;
    }

    public object Current
    {
        get
        {
            try
            {
                Console.WriteLine("about to return {0}", _strings[position]);
                return _strings[position];
            }
            catch (IndexOutOfRangeException)
            {
                // Current was read before the first MoveNext() or after the last element.
                throw new InvalidOperationException();
            }
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        string[] sa = new[] {
            "aaa",
            "bbb",
            "ccc"
        };

        MyListOfStrings p = new MyListOfStrings(sa);

        foreach (string s in p)
            Console.WriteLine(s);
    }
}

Using yield return, the functional equivalent is as follows.  This code is attached to this page:

using System;
using System.Collections.Generic;

class Program
{
    // Iterator block: the compiler generates the enumerator class for us.
    public static IEnumerable<string> MyListOfStrings(string[] sa)
    {
        foreach (var s in sa)
        {
            Console.WriteLine("about to yield return");
            yield return s;
        }
    }

    static void Main(string[] args)
    {
        string[] sa = new[] {
            "aaa",
            "bbb",
            "ccc"
        };

        foreach (string s in MyListOfStrings(sa))
            Console.WriteLine(s);
    }
}

As you can see, this is significantly easier.

This isn't as magic as it looks.  When you use the yield contextual keyword, the compiler automatically generates an enumerator class that keeps the current state of the iteration.  This class has four potential states: before, running, suspended, and after.  It has Reset and MoveNext methods and a Current property, although calling Reset on a compiler-generated enumerator throws NotSupportedException.  When you iterate through a collection that is implemented using yield return, you are moving from item to item in the enumerator using the MoveNext method.  The implementation of iterator blocks is fairly involved.  A technical discussion of iterator blocks can be found in the C# language specification.
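To see the generated enumerator at work, you can drive it by hand instead of using foreach.  The following is a minimal sketch; the names ManualIteration, Letters, and Drain are made up for this illustration.  The loop in Drain is roughly what foreach expands to.

```csharp
using System;
using System.Collections.Generic;

public static class ManualIteration
{
    // Hypothetical iterator block with two suspension points.
    public static IEnumerable<string> Letters()
    {
        yield return "aaa";
        yield return "bbb";
    }

    // Roughly what foreach does: get the enumerator, call MoveNext()
    // until it returns false, and read Current after each successful call.
    public static List<string> Drain(IEnumerable<string> source)
    {
        var seen = new List<string>();
        using (IEnumerator<string> e = source.GetEnumerator())
        {
            while (e.MoveNext())
                seen.Add(e.Current);
        }
        return seen;
    }

    public static void Main()
    {
        foreach (string s in Drain(Letters()))
            Console.WriteLine(s);
    }
}
```

Each call to MoveNext resumes the iterator block where it last suspended and runs it to the next yield return, or to the end of the block, at which point MoveNext returns false.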

Yield return is very important when implementing our own query operators (which we will want to do sometimes).
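As a rough sketch of what such an operator can look like, here is an extension method written with yield return.  EveryOther and the MyOperators class are hypothetical names invented for this example; they are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MyOperators
{
    // Hypothetical operator: returns every second element, starting with the first.
    // Like the standard operators, it is lazy and composes with them.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take)
                yield return item;
            take = !take;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new[] { "aaa", "bbb", "ccc", "ddd", "eee" };

        // The custom operator chains with a standard LINQ operator.
        var query = words.EveryOther().Select(w => w.ToUpper());

        Console.WriteLine(string.Join(", ", query)); // prints "AAA, CCC, EEE"
    }
}
```

Because the operator is an iterator block, it never builds an intermediate list; items are pulled from the source one at a time as the caller asks for them.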

There is no counterpart to the yield keyword in Visual Basic 9.0, so if you are implementing a query operator in Visual Basic 9.0, you must use the approach where you implement IEnumerable and IEnumerator.

One of the important design philosophies behind LINQ and LINQ to XML is that they should not break existing programs.  Adding a new keyword breaks any existing program that happens to use that word as an identifier.  Therefore, some keywords are added to the language as contextual keywords.  This means that when the word is encountered at specific places in the program, it is interpreted as a keyword, whereas when it is encountered elsewhere, it may be interpreted as an identifier.  Yield is one of these keywords.  When it appears immediately before a return or break keyword, the compiler treats it as a keyword and applies the iterator semantics.  If a program was written in C# 1.0 or 1.1 and contains an identifier named yield, the identifier continues to be parsed correctly by the compiler, and the program is not made invalid by the language extensions.
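The following minimal sketch shows both readings of the word side by side.  The class and method names (ContextualYield, AddYield, UpTo) are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ContextualYield
{
    // Here "yield" is an ordinary identifier (a parameter name),
    // exactly as it could have been in a C# 1.0 program.
    public static int AddYield(int principal, int yield)
    {
        return principal + yield;
    }

    // Here "yield" appears immediately before return and break,
    // so the compiler treats it as a keyword and builds an iterator.
    public static IEnumerable<int> UpTo(int limit)
    {
        for (int i = 1; ; i++)
        {
            if (i > limit)
                yield break;   // ends the sequence
            yield return i;    // produces the next item
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddYield(100, 5)); // prints 105

        foreach (int n in UpTo(3))
            Console.WriteLine(n);            // prints 1, 2, 3 on separate lines
    }
}
```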

Yield.cs