Performance of Chained Queries

 

Introduction

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOCOne of the most important aspects of LINQ (and LINQ to XML) is that performance of chained queries can be effectively the same as if you write one larger, more complicated query.

Overview

You can write a query that uses another query as its source. For example, in the following code (which is purposely very simple), query2 is written with query1 as its source:

[c#]

XElement root = new XElement("Root",

    new XElement("Child", 1),

    new XElement("Child", 2),

    new XElement("Child", 3),

    new XElement("Child", 4)

);

 

var query1 = from x in root.Elements("Child")

             where (int)x >= 3

             select x;

 

var query2 = from e in query1

             where (int)e % 2 == 0

      select e;

 

foreach (var i in query2)

    Console.WriteLine("{0}", (int)i);

 

This example produces the following output:

4

This example has the same order of magnitude of performance as iterating through a linked list. Here is why:

· The Elements axis has the same performance as iterating through a linked list. It is implemented as an iterator with deferred execution. This means that it does a certain amount of extra work, such as allocating the iterator object, and keeping track of a certain amount of state. This extra work can be divided into two categories: the work that is done with the iterator is set up, and the work that is done in each iteration. The setup work is a fixed amount of work and the work done during iteration is proportionate to the number of items in the source array. Neither amount of work is particularly onerous.

· In query1, the where clause causes the query to call the System.Linq.Enumerable.Where method. This method is also implemented as an iterator. The setup work consists of instantiating the delegate that will reference the lambda expression, and the normal setup for an iterator. With each iteration, the delegate is called to execute the predicate.

· In query1, the select clause causes the query to call the System.Linq.Enumerable.Select method. This method has the same performance profile as the System.Linq.Enumerable.Where method.

· In query2, both the where clause and the select clause have the same performance profile as in query1.

The iteration through query2 is therefore in linear time. A corresponding Visual Basic example would have the same performance profile.

For more information on iterators, see yield (C# Reference).

For a more detailed tutorial on chaining queries together, see Chaining Queries Together.