The LINQ Farm: Query Operators

11/11/2006

This article describes simple ways to query an in-memory collection or "table" using LINQ query expressions. The focus will be on a particular part of a query expression called a query operator. Query operators such as select, where, join and groupby are the primary engine driving LINQ queries. Hence the explanation of query operators found in this post provides you with the keys to the LINQ kingdom. Once you understand the basics of how to use query operators in query expressions, you will be ready to begin serious and useful work with LINQ.

NOTE: This is the third in a series of posts on LINQ. An index to this series is available in my blog. The code for this post is available for download.

Rather than directly access a database server, I will show how to use a new feature called collection initializers to quickly create an in-memory collection that will act just like a database table. By working with an in-memory "table" you can see how the syntax for querying a database works without having to connect directly to a database server. You will also begin to see how you can use the same syntax to query a database table or a different type of data structure such as a collection.

The "table" found in this post's example program will contain one row of data for each of the 48 query operators you can use in the May LINQ CTP. A CTP, or Community Technology Preview, is a kind of pre-beta, offering a sneak peek at upcoming technology.

In this post you will get a chance to use a few query operators, and to view the names of all the query operators. By the time you are done reading, you should have a sense of the important role that query operators play in the LINQ technology. Please remember that we are working with pre-release code. It is therefore possible that a few of the details of how LINQ works will change before the product ships.

The query operators are declared in a static class called System.Query.Sequence. They are stored in an assembly called System.Query.dll.
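In the released .NET Framework 3.5, the query operators ended up in the static class System.Linq.Enumerable in System.Core.dll rather than in System.Query.Sequence, so the sketches in this post target System.Linq and may differ in detail from the May CTP. As a minimal illustration of what "declared in a static class" means, the Generation operators (Range, Repeat and Empty) are ordinary static methods, while most other operators, such as Take, are extension methods that can be called either way. The SequenceOperatorsDemo class and its method names are illustrative, not part of LINQ.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SequenceOperatorsDemo
{
    // Generation operators: plain static methods on the static Enumerable class.
    public static List<int> MakeRange()
    {
        return Enumerable.Range(1, 5).ToList();          // 1, 2, 3, 4, 5
    }

    public static List<string> MakeRepeat()
    {
        return Enumerable.Repeat("LINQ", 3).ToList();    // "LINQ" three times
    }

    public static int CountEmpty()
    {
        return Enumerable.Empty<string>().Count();       // an empty sequence
    }

    // Take is an extension method, so the static-call form and the
    // instance-style form below invoke the same method.
    public static bool TakeBothWays()
    {
        IEnumerable<int> numbers = Enumerable.Range(1, 10);
        IEnumerable<int> viaStaticCall = Enumerable.Take(numbers, 3);
        IEnumerable<int> viaExtension = numbers.Take(3);
        return viaStaticCall.SequenceEqual(viaExtension);
    }
}
```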

Collection Initializers

Collection initializers provide a shorthand for creating and populating a collection such as a List<>. The example found in Listing One shows how to create a list of pre-initialized instances of the class called Operator. The custom Operator class is defined at the beginning of the listing.

Listing One: A Collection Initializer creates a collection from a set of literals.

    class Operator
    {
        public int OperatorID;
        public string OperatorName;
        public string OperatorType;
    }

    private List<Operator> OperatorList;

    private void CreateLists()
    {
        // Collection initializer: each element is created with an object initializer
        OperatorList =
            new List<Operator>
            {
                new Operator { OperatorID = 1, OperatorName = "Where",
                               OperatorType = "Restriction" },
                new Operator { OperatorID = 2, OperatorName = "Select",
                               OperatorType = "Projection" },
                new Operator { OperatorID = 3, OperatorName = "SelectMany",
                               OperatorType = "Projection" }
            };
    }

You can see that Operator has three fields called OperatorID, OperatorName, and OperatorType. All three fields are initialized in this example. The end result is a list containing three instances of the Operator class.
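A collection initializer is compiler shorthand: the compiler creates the list with its parameterless constructor and then calls Add once for each element. The minimal sketch below, written in the released C# 3.0 syntax, builds the same three-element list both ways. The InitializerDemo class is illustrative, and the second method approximates, rather than reproduces exactly, the code the compiler generates.

```csharp
using System.Collections.Generic;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class InitializerDemo
{
    // Collection initializer: one object initializer per element.
    public static List<Operator> WithInitializer()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" },
            new Operator { OperatorID = 3, OperatorName = "SelectMany", OperatorType = "Projection" }
        };
    }

    // Roughly equivalent long-hand version: construct, assign fields, call Add.
    public static List<Operator> WithAddCalls()
    {
        List<Operator> list = new List<Operator>();

        Operator first = new Operator();
        first.OperatorID = 1;
        first.OperatorName = "Where";
        first.OperatorType = "Restriction";
        list.Add(first);

        Operator second = new Operator();
        second.OperatorID = 2;
        second.OperatorName = "Select";
        second.OperatorType = "Projection";
        list.Add(second);

        Operator third = new Operator();
        third.OperatorID = 3;
        third.OperatorName = "SelectMany";
        third.OperatorType = "Projection";
        list.Add(third);

        return list;
    }
}
```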

Query Operators

The operators under examination here are the query operators that are part of the LINQ API. If you download the source for this example, you will see that there are actually 48 different query operators available in the May CTP. In Listing One, I only initialize three operators. I do this to keep the example simple and easy to read. The source, however, is considerably longer and shows not three operators, but 48.

The example we are building in this post will provide a means of querying this in-memory table, or collection. My general goal is to provide examples of how to query either in-memory objects, or a real database. One of the great benefits of LINQ is that it uses nearly identical code to query a real database and a collection.

Simple Queries

In this post I'm going to show you three simple queries, shown in Listing Two. The query expressions that form the heart of this code are found on lines 3 and 4, lines 14 and 15, and lines 25, 26 and 27.

Listing Two: Three simple ways to query the data in the OperatorList

     1:  public void ShowOperatorObjects(System.Collections.IList list)
     2:  {
     3:      var s = from p in OperatorList
     4:              select p;
     5:
     6:      foreach (var a in s)
     7:      {
     8:          list.Add(a);
     9:      }
    10:  }
    11:
    12:  public void ShowOperatorNames(System.Collections.IList list)
    13:  {
    14:      var s = from p in OperatorList
    15:              select p.OperatorName;
    16:
    17:      foreach (var a in s)
    18:      {
    19:          list.Add(a);
    20:      }
    21:  }
    22:
    23:  public void ShowOperatorGeneration(System.Collections.IList list)
    24:  {
    25:      var s = from p in OperatorList
    26:              where p.OperatorType.Equals("Generation")
    27:              select p.OperatorName;
    28:
    29:      foreach (var a in s)
    30:      {
    31:          list.Add(a);
    32:      }
    33:  }

The example program I am using is a Windows Forms application that uses a ListBox to display data, as shown in Figure One. Each of these methods takes a System.Collections.IList, so I can pass in the ListBox's Items collection; anything the method adds to that list is displayed in the ListBox:

    queryData.ShowOperatorObjects(listBox1.Items);

NOTE: I fully qualify IList with the System.Collections namespace to make clear that this is the non-generic System.Collections.IList interface implemented by the ListBox's Items collection, not the generic System.Collections.Generic.IList<T> interface. A using System.Collections directive at the top of the file would also let you write plain IList, at the cost of making that distinction less visible.
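Because these methods accept the non-generic System.Collections.IList interface, they are not tied to Windows Forms. As a sketch, a console program or a test can pass an ArrayList, which also implements System.Collections.IList, where the form passes listBox1.Items. The trimmed QueryData class below is a hypothetical stand-in for the one in the download; because it has a using System.Collections directive, the parameter type is written as plain IList.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

class QueryData
{
    class Operator
    {
        public int OperatorID;
        public string OperatorName;
        public string OperatorType;
    }

    private List<Operator> OperatorList = new List<Operator>
    {
        new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
        new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" }
    };

    // Any System.Collections.IList works here: ListBox.Items, ArrayList, and so on.
    public void ShowOperatorNames(IList list)
    {
        var s = from p in OperatorList
                select p.OperatorName;

        foreach (var a in s)
        {
            list.Add(a);
        }
    }
}
```

Usage: `var items = new ArrayList(); new QueryData().ShowOperatorNames(items);` leaves the strings "Where" and "Select" in items.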

Figure One: Some of the data returned by running the ShowOperatorObjects method found in Listing Two.

The ShowOperatorObjects query is found on lines 3 and 4 of Listing Two. It produces the output shown in Figure One.

This query expression asks the compiler to "select all the items from the OperatorList." In the discussion of collection initializers, we saw that in this program these items will be instances of the Operator class.

As you learned in the previous posts in this series, the code on lines 3 and 4 is interesting because it demonstrates how to use a simple, type-safe, native-to-C#, declarative, SQL-like syntax for querying data. In short, it shows how to use LINQ.

The data we are querying is stored not in a database, but in a collection of type List<>. Had the data been stored in a database, we could have used identical syntax to query the table.
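One way to try the "same syntax, different source" idea without a database is to wrap the list in an IQueryable<T> with the AsQueryable() operator from System.Linq. The query expression text does not change, but it now binds to the operators in System.Linq.Queryable, which receive expression trees; that is the mechanism database providers such as LINQ to SQL use to translate a query into SQL. This sketch involves no database, and the QueryableDemo names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class QueryableDemo
{
    static readonly List<Operator> OperatorList = new List<Operator>
    {
        new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
        new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" },
        new Operator { OperatorID = 3, OperatorName = "SelectMany", OperatorType = "Projection" }
    };

    // Query against an in-memory IEnumerable<Operator> (Enumerable operators).
    public static IEnumerable<string> FromList()
    {
        return from p in OperatorList
               where p.OperatorType == "Projection"
               select p.OperatorName;
    }

    // Identical query text against an IQueryable<Operator>: the compiler now
    // builds expression trees and calls Queryable.Where and Queryable.Select.
    public static IQueryable<string> FromQueryable()
    {
        IQueryable<Operator> source = OperatorList.AsQueryable();
        return from p in source
               where p.OperatorType == "Projection"
               select p.OperatorName;
    }
}
```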

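The string "QueryLister.QueryData+Operator" is the output from the ToString() method that the Operator class inherits from System.Object. Why does ToString() return this rather odd-looking string? Take a look at Listing Three, which is another view of the same class shown in Listing One. You can see that the Operator class is not a subclass but a nested class: it is declared inside a class called QueryData, which in turn is declared in a namespace called QueryLister. The inherited ToString() returns the full name of the object's type, and the + sign in that name separates the containing class from the nested class.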

Listing Three: This second view of the code excerpts shown in Listing One gives you a sense of the scoping of the Operator class.

    namespace QueryLister
    {
        class QueryData
        {
            class Operator
            {
                public int OperatorID;
                public string OperatorName;
                public string OperatorType;
            }

            // lots of code omitted here
        }
    }
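As Figure One shows, the ListBox displays each item using its ToString() method, and the version inherited from System.Object returns the full type name. A minimal sketch, assuming the same QueryLister and QueryData names as Listing Three; NamedOperator is a hypothetical variant that overrides ToString() so a ListBox would show the operator's name instead. The nested classes are made public here only so that code outside QueryData can create them.

```csharp
namespace QueryLister
{
    class QueryData
    {
        // Nested (not derived) class: its full type name is QueryLister.QueryData+Operator.
        public class Operator
        {
            public int OperatorID;
            public string OperatorName;
            public string OperatorType;
        }

        // Hypothetical variant whose ToString() returns the operator's name.
        public class NamedOperator
        {
            public int OperatorID;
            public string OperatorName;
            public string OperatorType;

            public override string ToString()
            {
                return OperatorName;
            }
        }
    }
}
```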

The var Keyword

The code shown on lines 1 through 10 of Listing Two could have been written like this:

Listing Four: Here both instances of the var keyword have been removed from the ShowOperatorObjects method.

     1:  public void ShowOperatorObjects(System.Collections.IList list)
     2:  {
     3:      IEnumerable<Operator> s =
     4:          from p in OperatorList
     5:          select p;
     6:
     7:      foreach (Operator a in s)
     8:      {
     9:          list.Add(a);
    10:      }
    11:  }

Either version of the ShowOperatorObjects method will compile, and both produce the same output. In fact, they both ask the compiler to do the same thing.

On line 3 of Listing Four you can see that I have replaced the declaration var s with IEnumerable<Operator> s. These are really two ways of saying the same thing: when you write var, the compiler infers from the query expression that s is an IEnumerable<Operator>. In LINQ, however, var is usually preferred, in part because it makes programming simpler, and in part because some queries return types that have no name you could write in a declaration.
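A minimal sketch of that equivalence, using an illustrative VarDemo class: the var-declared query variable can be assigned to an explicitly typed IEnumerable<Operator> variable with no cast, because the compiler infers exactly that type for it.

```csharp
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class VarDemo
{
    public static List<Operator> SampleList()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" }
        };
    }

    public static IEnumerable<Operator> AllOperators(List<Operator> operatorList)
    {
        // var: the compiler infers IEnumerable<Operator> for s.
        var s = from p in operatorList
                select p;

        // Explicit declaration of the same static type, so no cast is needed.
        IEnumerable<Operator> t = s;
        return t;
    }
}
```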

In later posts, you will see that LINQ programs often use a construct called an anonymous type, which I'll cover in more depth then. For now, you only need to know that an anonymous type is generated by the compiler and has no name that you can write in your code. If you can't name a type, then you can't use it to declare a variable. To avoid putting developers in this awkward situation, C# 3.0 provides the var keyword. Despite appearances, var is not a "typeless" type: it tells the compiler to infer the variable's type from the expression that initializes it, and the variable is still strongly typed. That is why var can stand for whatever a query expression returns, even an anonymous type that is never explicitly declared in your program!
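As a small preview, the sketch below projects each operator into an anonymous type with two properties. That type is generated by the compiler and has no name you can write, so both the query variable and the foreach variable must be declared with var; the property names are still checked at compile time. The AnonymousTypeDemo class is illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class AnonymousTypeDemo
{
    public static List<Operator> SampleList()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 6, OperatorName = "Range", OperatorType = "Generation" }
        };
    }

    public static List<string> Describe(List<Operator> operatorList)
    {
        // Each result is an instance of a compiler-generated anonymous type
        // with read-only OperatorName and OperatorType properties.
        var s = from p in operatorList
                select new { p.OperatorName, p.OperatorType };

        List<string> lines = new List<string>();
        foreach (var a in s)
        {
            // a.OperatorName and a.OperatorType are type checked by the compiler.
            lines.Add(a.OperatorName + " (" + a.OperatorType + ")");
        }
        return lines;
    }
}
```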

If all this business about anonymous types sounds confusing, then just ignore it. All you really need to know is that the var keyword makes LINQ programming simpler. The code shown in Listing Two is simpler than the code in Listing Four: it is easier to write var than it is to write IEnumerable<Operator>. In LINQ, you can almost always assign the result of a query expression to a variable declared with var, and you generally don't have to worry about the exact type that is being returned.

The var keyword is simple, clean, and easy to use. Don't worry, be happy. Rejoice! It makes your life simpler!

Slightly More Complex Query Expressions

The output from the code in the ShowOperatorNames method is shown in Figure Two. This latter method is just slightly more sophisticated than the code in the ShowOperatorObjects method.

Figure Two: The output from the ShowOperatorNames method gives you a complete list of all the operators found in the May LINQ CTP.

I will show you the ShowOperatorNames method once again. The code shown here is identical to the code in Listing Two, but I am repeating it so that you don't have to scroll back and forth in your browser:

    public void ShowOperatorNames(System.Collections.IList list)
    {
        var s = from p in OperatorList
                select p.OperatorName;

        foreach (var a in s)
        {
            list.Add(a);
        }
    }

This method is similar to the ShowOperatorObjects method. In this case, however, we qualify the select clause by asking specifically for the OperatorName field of each Operator object. It is this difference that makes the output in Figure Two so much more useful than that in Figure One. In Figure One, we see the output from the ToString method of each Operator object. In Figure Two, however, we see the actual OperatorName of each operator.
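Because select p.OperatorName projects each Operator onto one of its string fields, the element type of the query changes from Operator to string. A minimal sketch showing this with the three rows from Listing One, alongside the equivalent direct call to the Select operator; the ProjectionDemo class is illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class ProjectionDemo
{
    // The three sample rows from Listing One.
    public static List<Operator> SampleList()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" },
            new Operator { OperatorID = 3, OperatorName = "SelectMany", OperatorType = "Projection" }
        };
    }

    // p is an Operator, but the select clause projects each one to its
    // OperatorName, so the query produces a sequence of strings.
    public static IEnumerable<string> Names(List<Operator> operatorList)
    {
        IEnumerable<string> s = from p in operatorList
                                select p.OperatorName;
        return s;
    }

    // The same projection written as a direct call to the Select operator.
    public static IEnumerable<string> NamesWithSelect(List<Operator> operatorList)
    {
        return operatorList.Select(p => p.OperatorName);
    }
}
```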

As explained in previous posts, the variable p is called a range variable. It is introduced in the from clause, and its type is never explicitly declared. In this simple query we know that p is of type Operator because OperatorList contains instances of the Operator class. Furthermore, we know that the Operator class has OperatorName as one of its fields. The compiler is also privy to this information, so the reference to OperatorName is type checked.

Let's take a moment to consider the importance of what is happening here. When you wrote a SQL expression in the bad old days, you had to write a string literal such as "SELECT OperatorName FROM OperatorList". These string literals were not checked at compile time. If you accidentally typed OperatorsName instead of OperatorName, your query would fail, but you would not learn of the problem until you ran your program. With LINQ, errors like this are caught at compile time!
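The difference can be sketched without a database. Below, a hypothetical string-based lookup finds a field by name through reflection, so a misspelled name such as OperatorsName goes unnoticed until the code runs, where the lookup simply returns null. In the typed query, the same misspelling (shown in a comment) would not compile. This is only an analogy for embedding SQL in string literals, and the TypeCheckDemo names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class TypeCheckDemo
{
    // Hypothetical "stringly typed" access: the field name is just data,
    // so nothing checks it until this method runs.
    public static object GetFieldByName(Operator op, string fieldName)
    {
        FieldInfo field = typeof(Operator).GetField(fieldName);
        return field == null ? null : field.GetValue(op);
    }

    public static IEnumerable<string> TypedNames(List<Operator> operatorList)
    {
        // Misspelling the field here is a compile-time error:
        // var bad = from p in operatorList select p.OperatorsName;
        return from p in operatorList
               select p.OperatorName;
    }
}
```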

LINQ is giving you two big advantages you didn't have before:

A native C# query language that gives you compile-time type checking
The ability to use a single, unified query language whether you are querying databases, XML, or in-memory data structures such as the OperatorList in this example.

The Where Query Operator

Let's take a look at the Where operator found in the ShowOperatorGeneration method. This method produces the output shown in Figure Three.

Figure Three: The output from the ShowOperatorGeneration method gives you a list of all the operators of type Generation.

Here is the code from the ShowOperatorGeneration method:

    public void ShowOperatorGeneration(System.Collections.IList list)
    {
        var s = from p in OperatorList
                where p.OperatorType.Equals("Generation")
                select p.OperatorName;

        foreach (var a in s)
        {
            list.Add(a);
        }
    }

This code is similar to that in the ShowOperatorNames method, except we have added a where clause that uses the where operator.

The OperatorList collection is a table-like structure with rows that look like this:

OperatorID	OperatorName	OperatorType
1	Where	Restriction
2	Select	Projection
3	SelectMany	Projection
4	Take	Partitioning
5	Skip	Partitioning
6	Range	Generation
7	Repeat	Generation
8	Empty	Generation

There are 48 rows in the table, but here I show just 8 sample rows. As you can see, in this poorly normalized table the OperatorType sometimes repeats. This is because the OperatorType is used to categorize the various kinds of operators. For instance, the "Generation" type has three members called Range, Repeat and Empty.
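Because OperatorType repeats, it can be used to group the rows. As a sketch that goes slightly beyond the operators used in this post, the group ... by clause below counts the members of each category in the eight sample rows shown above; with this data it reports three Generation operators. The GroupingDemo names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class GroupingDemo
{
    // The eight sample rows from the table above.
    public static List<Operator> SampleRows()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" },
            new Operator { OperatorID = 3, OperatorName = "SelectMany", OperatorType = "Projection" },
            new Operator { OperatorID = 4, OperatorName = "Take", OperatorType = "Partitioning" },
            new Operator { OperatorID = 5, OperatorName = "Skip", OperatorType = "Partitioning" },
            new Operator { OperatorID = 6, OperatorName = "Range", OperatorType = "Generation" },
            new Operator { OperatorID = 7, OperatorName = "Repeat", OperatorType = "Generation" },
            new Operator { OperatorID = 8, OperatorName = "Empty", OperatorType = "Generation" }
        };
    }

    // Group the rows by category and count the members of each group.
    public static Dictionary<string, int> CountByType(List<Operator> rows)
    {
        var groups = from p in rows
                     group p by p.OperatorType into g
                     select new { Type = g.Key, Count = g.Count() };

        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (var item in groups)
        {
            counts[item.Type] = item.Count;
        }
        return counts;
    }
}
```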

I take advantage of the current simple "table" structure to show how to use the where operator to query this collection. In particular, the program asks to see "the OperatorName from all the instances of the Operator class in the collection that have their OperatorType set to the word 'Generation.'" The result is the data shown in Figure Three.
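The same filter can be written either as the query expression from ShowOperatorGeneration or as direct calls to the Where and Select operators. A minimal sketch over the eight sample rows from the table, using an illustrative WhereDemo class; both methods return Range, Repeat and Empty, in table order.

```csharp
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class WhereDemo
{
    // The eight sample rows from the table.
    public static List<Operator> SampleRows()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" },
            new Operator { OperatorID = 3, OperatorName = "SelectMany", OperatorType = "Projection" },
            new Operator { OperatorID = 4, OperatorName = "Take", OperatorType = "Partitioning" },
            new Operator { OperatorID = 5, OperatorName = "Skip", OperatorType = "Partitioning" },
            new Operator { OperatorID = 6, OperatorName = "Range", OperatorType = "Generation" },
            new Operator { OperatorID = 7, OperatorName = "Repeat", OperatorType = "Generation" },
            new Operator { OperatorID = 8, OperatorName = "Empty", OperatorType = "Generation" }
        };
    }

    // Query expression, as in ShowOperatorGeneration.
    public static List<string> GenerationOperators(List<Operator> rows)
    {
        var s = from p in rows
                where p.OperatorType.Equals("Generation")
                select p.OperatorName;
        return s.ToList();
    }

    // The same query as explicit calls: filter with Where, then project with Select.
    public static List<string> GenerationOperatorsWithMethods(List<Operator> rows)
    {
        return rows.Where(p => p.OperatorType.Equals("Generation"))
                   .Select(p => p.OperatorName)
                   .ToList();
    }
}
```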

Summary

In this post you were introduced to query operators. We saw a listing and some program output that revealed the names of a number of these operators. We also had a chance to use two of the operators, called select and where.

Future posts in this series will continue to explore these query operators. You will get a chance to see many of them in working code, and I will provide tables listing all of the operators. This exploration of query operators will be a key building block in our study of LINQ.

LINQQueryLister.zip