LINQ Farm: Focus on Grouping

This is the fifth post in a series of articles on LINQ and query expressions. LINQ will be part of Visual Studio Orcas. It provides a simple means of querying data. This article focuses on grouping information from a query into related sets of data.

The more I use LINQ query expressions, the simpler and more intuitive they seem to me. There is no question that many developers will need to overcome some conceptual hurdles when they first see this technology. We have faced hurdles like this before, however. At first, objects and interfaces seemed strange to us, but now they are the tools on which we base most of our programs. LINQ developers need to undergo a similar transformation in their thinking. With continued use, LINQ becomes increasingly familiar and increasingly easy to use.

Simple Grouping

As you may recall from previous posts, OperatorType is a string field whose values sometimes repeat. Each row in our table describes one operator, but several operators share the same OperatorType. In Table 1, for example, you can see that Take and Skip both have the OperatorType Partitioning.

Table 1: Some rows from the OperatorList table.

    Operator  OperatorName  OperatorType
    1         Where         Restriction
    2         Select        Projection
    3         SelectMany    Projection
    4         Take          Partitioning
    5         Skip          Partitioning
    6         Range         Generation
    7         Repeat        Generation
    8         Empty         Generation
    9         Distinct      Set
    10        Union         Set
    11        Intersect     Set
    12        Except        Set

Consider the code shown in Listing 1. This code picks out each unique value of OperatorType. For instance, if it were run against the contents of Table 1, the result would be Restriction, Projection, Partitioning, Generation, and Set.

Listing One: A simple example of grouping.

    1:  public void Group()
    2:  {
    3:   
    4:      var s = from p in operatorList
    5:              group p by p.OperatorType
    6:              into MyGroup
    7:              select MyGroup.Key;
    8:   
    9:      Utilities.Display(listBox, s);
   10:  }

Let's take the lines of code one at a time. Line 4 states that we want to look at each item in operatorList, using p as the range variable. On line 5 we ask to group p by its OperatorType. This is the key line, as it creates one group for each unique OperatorType. In line 6, the into keyword continues the query with a new range variable called MyGroup, which refers to each group in turn. Finally, in line 7, we select the key from each group, which gives us the list of unique OperatorTypes.
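The C# compiler translates a query expression into ordinary method calls, so it can help to see Listing 1 written both ways. The sketch below is hypothetical: the Operator class and the SampleOperators method stand in for the operatorList built in earlier posts and hold only the rows from Table 1. The query-syntax method mirrors Listing 1, and the method-syntax version shows the GroupBy and Select calls it corresponds to.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the items stored in operatorList.
public class Operator
{
    public string OperatorName { get; set; }
    public string OperatorType { get; set; }
}

public static class GroupingSketch
{
    // Builds Operator objects from the rows in Table 1.
    public static List<Operator> SampleOperators()
    {
        // (OperatorName, OperatorType) pairs copied from Table 1.
        var rows = new[]
        {
            ("Where", "Restriction"), ("Select", "Projection"),
            ("SelectMany", "Projection"), ("Take", "Partitioning"),
            ("Skip", "Partitioning"), ("Range", "Generation"),
            ("Repeat", "Generation"), ("Empty", "Generation"),
            ("Distinct", "Set"), ("Union", "Set"),
            ("Intersect", "Set"), ("Except", "Set")
        };

        return rows
            .Select(r => new Operator { OperatorName = r.Item1, OperatorType = r.Item2 })
            .ToList();
    }

    // The query from Listing 1, written as a query expression.
    public static List<string> KeysWithQuerySyntax(IEnumerable<Operator> operatorList)
    {
        var s = from p in operatorList
                group p by p.OperatorType
                into MyGroup
                select MyGroup.Key;
        return s.ToList();
    }

    // The same query written as explicit method calls.
    public static List<string> KeysWithMethodSyntax(IEnumerable<Operator> operatorList)
    {
        return operatorList
            .GroupBy(p => p.OperatorType)     // one group per distinct OperatorType
            .Select(MyGroup => MyGroup.Key)   // keep only each group's key
            .ToList();
    }
}
```

Both methods return Restriction, Projection, Partitioning, Generation, Set for the Table 1 rows, because LINQ to Objects yields groups in the order in which each key first appears in the source.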

Here is the code that displays the output:

    public static void Display(System.Collections.IList listBox,
        IEnumerable<string> s)
    {
        // Add each string produced by the query to the list box.
        foreach (var value in s)
        {
            listBox.Add(value);
        }
    }

When looking at this code, you will notice that we have to declare the actual type of the variable s from line 4 of Listing 1, IEnumerable<string>, when we pass it to this method. We do this because the var keyword can be used only to declare local variables; it cannot appear in a method's parameter list.
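One way to avoid spelling out the element type at every call site is to make the helper generic and let the compiler infer the type argument. The sketch below is hypothetical: GenericUtilities and DisplayAll are not part of the Utilities class used in this series. Like Display, it appends each item to any System.Collections.IList.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class GenericUtilities
{
    // Hypothetical generic variant of Utilities.Display. T is inferred
    // from the argument, so the caller never has to name the query's type.
    public static void DisplayAll<T>(IList listBox, IEnumerable<T> items)
    {
        foreach (T value in items)
        {
            listBox.Add(value);
        }
    }
}
```

Because T is inferred, the same call also works for queries that project anonymous types, whose type names cannot be written in C# at all.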

Ordering

If you want to see the output ordered alphabetically, you need to make only one slight change to the code shown in Listing One.

Listing Two: Ordering the output.

    1:  public void GroupOrdered()
    2:  {
    3:   
    4:      var s = from p in operatorList
    5:              group p by p.OperatorType
    6:              into MyGroup
    7:              orderby MyGroup.Key
    8:              select MyGroup.Key;
    9:   
   10:      Utilities.Display(listBox, s);
   11:  }

This code adds line 7, which orders the groups by their key. The output now lists the OperatorTypes alphabetically, as shown in Figure 1.

Figure 1: An Ordered list of Operator Types.
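In method syntax, the orderby clause in Listing 2 becomes a call to OrderBy placed between GroupBy and Select, so it is the groups themselves that get sorted, using each group's Key. This is a minimal sketch; the Operator class and the OrderedKeys method are hypothetical stand-ins for the code from earlier posts.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the items stored in operatorList.
public class Operator
{
    public string OperatorName { get; set; }
    public string OperatorType { get; set; }
}

public static class OrderedGroupingSketch
{
    // Method-syntax equivalent of Listing 2.
    public static List<string> OrderedKeys(IEnumerable<Operator> operatorList)
    {
        return operatorList
            .GroupBy(p => p.OperatorType)   // one group per OperatorType
            .OrderBy(g => g.Key)            // sort the groups by their key
            .Select(g => g.Key)             // keep only the key
            .ToList();
    }
}
```

Against the rows in Table 1, this returns Generation, Partitioning, Projection, Restriction, Set.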

Listing Group Members

The final step in this process would be to list each member of each group, as shown in Figure 2.

Figure 2: List all the OperatorTypes, and then the members of each OperatorType group.

The code that produces the result in Figure 2 is shown in Listing Three.

Listing Three: This code creates an ordered list of OperatorTypes and shows each member belonging to a particular OperatorType group.

    1:  public void Group2()
    2:  {
    3:      var s = from p in operatorList
    4:              group p by p.OperatorType                    
    5:              into MyGroup
    6:              orderby MyGroup.Key
    7:              select MyGroup;
    8:   
    9:      foreach (var n in s)
   10:      {
   11:          listBox.Add(n.Key);
   12:   
   13:          foreach (var a in n)
   14:          {
   15:              listBox.Add("    " + a.OperatorName);
   16:          }
   17:      }
   18:  }

This code is similar to that shown in Listings 1 and 2. This time, however, in line 7 we select the entire group rather than just its key.

On line 11, we add the key to the list of items to be displayed. In lines 13 through 16, the program iterates through each item that belongs to the current group. Because each group is itself a sequence of items, one loop is nested inside the other. This is not likely to be intuitively obvious to most developers, so I suggest that you study this code carefully and memorize the pattern. It recurs often in LINQ programming.
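Writing out the types explicitly can make the nesting easier to see. In the sketch below, the Operator class and the GroupLines method are hypothetical, and a List<string> stands in for the list box. The query produces a sequence of IGrouping<string, Operator> objects; each IGrouping carries a Key and is itself an IEnumerable<Operator>, which is why the inner foreach can iterate over it directly.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the items stored in operatorList.
public class Operator
{
    public string OperatorName { get; set; }
    public string OperatorType { get; set; }
}

public static class NestedGroupSketch
{
    // Same query as Listing 3, with the types spelled out.
    public static List<string> GroupLines(IEnumerable<Operator> operatorList)
    {
        // The outer sequence: one IGrouping per OperatorType, sorted by key.
        IEnumerable<IGrouping<string, Operator>> groups =
            from p in operatorList
            group p by p.OperatorType
            into MyGroup
            orderby MyGroup.Key
            select MyGroup;

        var lines = new List<string>();
        foreach (IGrouping<string, Operator> operatorGroup in groups)
        {
            lines.Add(operatorGroup.Key);              // header line for the group

            foreach (Operator op in operatorGroup)     // the group is itself a sequence
            {
                lines.Add("    " + op.OperatorName);
            }
        }
        return lines;
    }
}
```

Within each group, LINQ to Objects keeps the elements in their original source order, so Take is listed before Skip under Partitioning.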

Summary

In this post you were introduced to the group ... by clause, which the compiler translates into calls to the GroupBy operator. This is a very powerful and commonly used operator, and all LINQ developers will probably benefit from becoming familiar with it.

The code shown here compiles cleanly on both the May CTP and the most current versions of Orcas. The only difference is that in the May CTP you should write using System.Query, while in Orcas that line changes to using System.Linq.
