Lazy Evaluation (and in contrast, Eager Evaluation)

[Blog Map]  [Table of Contents]  [Next Topic]

One of the most important concepts in LINQ is the notion of lazy evaluation.  Without this facility, LINQ would perform very poorly.

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOCQuery expressions operate on some type that implements IEnumerable<T>.  The variable of type IEnumerable<T> on which the query expression operates is the source of the query expression.  Further, query expressions often evaluate to IEnumerable<T>.  As you have also seen, you can declare a variable of type IEnumerable<T>, and assign a query expression to a variable.  We call those variables queries, because that is what they are.  They are not the result of the query – the variable itself is the query.  The gist of lazy evaluation is that until you iterate over the query, the source is not iterated.  This topic will demonstrate this notion.

This is a powerful concept.  It allows you to write expressive queries that also perform well.  However, you must be aware of when lazy evaluation takes place.  If you are not, then strange bugs can show up in your code.  One category of these strange bugs is that demonstrated by The Halloween Problem (so called because some researchers first observed it on October 31, and the bugs appeared to be spooky).

Lazy Evaluation Demonstrated

The easiest way to demonstrate lazy evaluation is to create an extension method that enumerates a collection and prints to the console as each member is enumerated.  The simplest way to demonstrate this is via a collection of strings.

For simplicities sake, this example is somewhat artificial.  Here is an extension method that converts every string in a collection to upper case.  The following code is attached to this page.

public static class MyExtension
{
public static IEnumerable<string> ToUpper(this IEnumerable<string> source)
{
foreach (string s in source)
{
Console.WriteLine("Yield returning: {0}", s);
yield return s.ToUpper();
}
}
}

Now, we can write code that:

·         Initializes an array of strings.

·         Assigns the array of strings to another variable, applying the ToUpper extension method to the collection.

·         Writes a line to the console indicating that the code is just about to iterate through the new collection.

·         Iterates through the new collection, printing each item in the collection.

The following code is attached to this page:

string[] sa = new[] {
"aaa",
"Bbb",
"CCc"
};

Console.WriteLine("Before using ToUpper()");
var sb = sa.ToUpper();

Console.WriteLine("After using ToUpper()");
Console.WriteLine("Before iterating the collection in sb");

foreach (string s in sb)
Console.WriteLine("Within iteration, s: {0}", s);

When you run this code, you will see:

Before using ToUpper()
After using ToUpper()
Before iterating the collection in sb
Yield returning: aaa
Within iteration, s: AAA
Yield returning: Bbb
Within iteration, s: BBB
Yield returning: CCc
Within iteration, s: CCC

What you will notice is that the extension method isn't called until the result collection is iterated.

Key to this is the yield return statement in the iteration.  When you use the yield return statement in a block, the block is called an iterator block.

To make this explicit, if you modify the code so that it iterates through the collection twice, like this:

string[] sa = new[] {
"aaa",
"Bbb",
"CCc"
};

Console.WriteLine("Before using ToUpper()");
var sb = sa.ToUpper();

Console.WriteLine("After using ToUpper()");
Console.WriteLine("Before iterating the collection in sb");

foreach (string s in sb)
Console.WriteLine("Within iteration, s: {0}", s);

foreach (string s in sb)
Console.WriteLine("Within second iteration, s: {0}", s);

You will see that all elements are returned twice from the iterator block in the extension method:

Before using ToUpper()
After using ToUpper()
Before iterating the collection in sb
Yield returning: aaa
Within iteration, s: AAA
Yield returning: Bbb
Within iteration, s: BBB
Yield returning: CCc
Within iteration, s: CCC
Yield returning: aaa
Within second iteration, s: AAA
Yield returning: Bbb
Within second iteration, s: BBB
Yield returning: CCc
Within second iteration, s: CCC

Materializing a Collection

One more point: you can force a collection to be iterated and a list created in memory using the ToList standard query operator, as follows:

string[] sa = new[] {
"aaa",
"Bbb",
"CCc"
};

Console.WriteLine("Before using ToUpper()");
var sb = sa.ToUpper().ToList();

Console.WriteLine("After using ToUpper()");
Console.WriteLine("Before iterating the collection in sb");

foreach (string s in sb)
Console.WriteLine("Within iteration, s: {0}", s);

When you use the ToList operator, it completely changes the behavior of the program.  This forces the collection to be completely iterated.  All items are collected and a List<T> is created.  The output of this code is:

Before using ToUpper()
Yield returning: aaa
Yield returning: Bbb
Yield returning: CCc
After using ToUpper()
Before iterating the collection in sb
Within iteration, s: AAA
Within iteration, s: BBB
Within iteration, s: CCC

When you materialize a collection, as above, you are no longer using lazy evaluation; you are using what is called eager evaluation.

Chaining Queries Together

As you write more and more interesting queries, they will become more complicated syntactically.  You will start to embed queries within other queries. Sometimes they will be nested 3 or 4 queries deep.  Sometimes you will be using multiple queries at any given level of nesting.  This becomes unreadable when taken too far.

One approach to cleaning up your code is to chain queries together.  Due to the lazy evaluation characteristics of LINQ (and LINQ to XML), you can chain queries together in such a way that the resulting operation is just as efficient as if you wrote both queries in a single query.

The following code uses several queries.  However, the code actually iterates over the source array only once, when it calls the Sum method.

The following code is attached to this page.

string[] sa = new[] {
"#33",
"#22",
"% this is text",
"#18"
};

var justNumbersAsStrings =
from s in sa
where s.StartsWith("#")
select s.Substring(1);

var numbersAsInts =
from j in justNumbersAsStrings
select Int32.Parse(j);

var sum =
numbersAsInts.Sum();

Console.WriteLine(sum);

What should be clear at this point is that even though we have chained together queries that yield collections, no intermediate collections are materialized.  Instead, each item is passed from one lazy method to the next.

[Blog Map]  [Table of Contents]  [Next Topic]

LazyEvaluation.cs