LINQ Farm: LINQ Sets

This is the sixth in a series of articles on LINQ. In this post the focus will be on the LINQ Set operators. Near the end of the post I include a short section on the importance of choosing the best operator for a particular task. Please see the links at the bottom of this post to retrieve the code.

Set Operators

LINQ provides users with four Set operations:

  1. Distinct
  2. Union
  3. Intersect
  4. Except

We have already worked with the Distinct operator. As a result, in this post I will focus on Union, Intersect, and Except.
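Before looking at each operator in detail, here is a minimal sketch of all four working on two small integer arrays. The arrays and the class name are illustrative assumptions and are not part of the sample program that accompanies this series.

```csharp
using System;
using System.Linq;

public static class SetOperatorOverview
{
    // Illustrative data only; not taken from the article's operator list.
    public static readonly int[] First = { 1, 2, 2, 3, 4 };
    public static readonly int[] Second = { 3, 4, 4, 5 };

    public static void Main()
    {
        // Distinct: removes duplicates from a single sequence.
        Console.WriteLine(string.Join(", ", First.Distinct()));        // 1, 2, 3, 4

        // Union: elements found in either sequence, without duplicates.
        Console.WriteLine(string.Join(", ", First.Union(Second)));     // 1, 2, 3, 4, 5

        // Intersect: elements found in both sequences.
        Console.WriteLine(string.Join(", ", First.Intersect(Second))); // 3, 4

        // Except: elements of the first sequence that are not in the second.
        Console.WriteLine(string.Join(", ", First.Except(Second)));    // 1, 2
    }
}
```

Note that every one of these operators returns each element at most once, even when the input contains duplicates.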

Unions

Listing 1 shows an example of working with the Union operator. This code queries for all the operators of type Aggregate and all the operators of type Conversion. It then combines the two results into a single union that lists every Aggregate and Conversion operator.

Listing 1: Performing a simple union with LINQ.

 

    1:  private IEnumerable<string> GetOpertorTypeMembers(string operatorType)
    2:  {
    3:      return from p in operatorList
    4:             where p.OperatorType == operatorType
    5:             select p.OperatorName;
    6:  }
    7:   
    8:  public void SimpleUnion()
    9:  {
   10:      var aggregateOperators = GetOpertorTypeMembers("Aggregate");
   11:      var conversionOperators = GetOpertorTypeMembers("Conversion");
   12:      var aggregatePlusConversion = aggregateOperators.Union(conversionOperators);
   13:   
   14:      foreach (var item in aggregatePlusConversion)
   15:      {
   16:          listBox.Add(item);
   17:      }
   18:  }

In Listing 1, your focus should be on the second method, called SimpleUnion. The first method is a simple helper function designed to retrieve data from the database with a simple query. In particular, it asks the database to "select the names of all the operators of a particular type." The type of operator to retrieve is passed in as a parameter. This type of query was discussed in more detail in earlier posts in this series.

On lines 10 and 11 you can see the code that retrieves all the operators of type Aggregate and of type Conversion. Line 12 has the simple LINQ code for performing a union between two sets.

Earlier in this series, in the post entitled "Using Distinct and Avoiding Lambdas," I explained that most LINQ operations should be performed with query expressions, such as the one found in the helper method at the top of Listing 1. I then went on to explain that query expressions help us avoid having to compose lambdas. There is nothing innately wrong with lambdas, but they can be hard to write, so most users will prefer query expressions. The set operators, however, do not require a lambda. As a result, we call them directly as methods, as shown on line 12.
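The following self-contained sketch shows the same pattern as Listing 1: a query expression produces each set, and Union is then called directly as a method. The Op class, the Operators list, and the NamesOfType helper are hypothetical stand-ins for the data used in the sample program, which is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionSketch
{
    // Hypothetical stand-in for one row of the article's operator list.
    public class Op
    {
        public string OperatorName { get; set; }
        public string OperatorType { get; set; }
    }

    // Hypothetical sample data; the real program reads a much longer list.
    public static readonly List<Op> Operators = new List<Op>
    {
        new Op { OperatorName = "Count",   OperatorType = "Aggregate" },
        new Op { OperatorName = "Sum",     OperatorType = "Aggregate" },
        new Op { OperatorName = "ToArray", OperatorType = "Conversion" },
        new Op { OperatorName = "ToList",  OperatorType = "Conversion" },
    };

    // A query expression: no explicit lambda is written here.
    public static IEnumerable<string> NamesOfType(string operatorType)
    {
        return from p in Operators
               where p.OperatorType == operatorType
               select p.OperatorName;
    }

    public static void Main()
    {
        // Union is called as an ordinary method on the first sequence.
        var combined = NamesOfType("Aggregate").Union(NamesOfType("Conversion"));

        foreach (var name in combined)
        {
            Console.WriteLine(name); // Count, Sum, ToArray, ToList (one per line)
        }
    }
}
```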

Intersect

The next Set operator that I want to cover is called Intersect. This operator retrieves the intersection of two sets: the elements that appear in both.

In Figure 2 you can see two sets. The first set shows all the operators that contain the letters "Wh", and the second set shows all the operators that contain the letter "k". The intersection of the two sets consists of the operators called TakeWhile and SkipWhile.

Figure 2: Here you can see two sets, and below the second dotted line you can see their intersection.

Listing 4: Finding the Intersection between two sets.

    1:  private IEnumerable<string> GetContains(string searchTerm)
    2:  {
    3:      return from p in operatorList
    4:             where p.OperatorName.Contains(searchTerm)
    5:             select p.OperatorName;
    6:  }
    7:   
    8:  public void GroupPatterns()
    9:  {
   10:      var containsWh = GetContains("Wh");
   11:      var containsK = GetContains("k");
   12:   
   13:      Utilities.Display(listBox, containsWh);
   14:      listBox.Add("===============");
   15:      Utilities.Display(listBox, containsK);
   16:      listBox.Add("===============");
   17:      var intersectData = containsK.Intersect(containsWh);
   18:   
   19:      foreach (var data in intersectData)
   20:      {
   21:          listBox.Add(data);
   22:      }
   23:  }

The code in Listing 4 produces the output shown in Figure 2. At the top of the listing is a helper method that filters operator names with the string Contains method. In this short code sample, the GetContains method is used on lines 10 and 11 to retrieve the operators that contain either the letters "Wh" or the letter "k."

On line 17 you can see the call to the Intersect operator. It retrieves the intersection between the set called containsK and the set called containsWh. It's all pretty simple and straightforward.
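If you want to try the intersection without the sample database, the sketch below runs on its own. The OperatorNames array is a hypothetical, hand-picked subset of operator names, chosen so that the result matches the two names described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectSketch
{
    // Hypothetical stand-in for the article's operator list.
    public static readonly string[] OperatorNames =
    {
        "Where", "Select", "Take", "TakeWhile", "Skip", "SkipWhile"
    };

    // Same shape as the GetContains helper: names containing the search term.
    // string.Contains is case-sensitive, so "k" does not match an uppercase "K".
    public static IEnumerable<string> GetContains(string searchTerm)
    {
        return from name in OperatorNames
               where name.Contains(searchTerm)
               select name;
    }

    public static void Main()
    {
        var containsWh = GetContains("Wh"); // Where, TakeWhile, SkipWhile
        var containsK = GetContains("k");   // Take, TakeWhile, Skip, SkipWhile

        // Only the names present in both sequences survive.
        foreach (var name in containsK.Intersect(containsWh))
        {
            Console.WriteLine(name); // TakeWhile, then SkipWhile
        }
    }
}
```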

The Except Operator

The Except operator is the mirror image of the Union operator. Instead of joining two sets together, the Except operator removes members from a set. More precisely, you can use it to subtract one set from another: the result contains the elements of the first set that do not appear in the second.

In Listing 2 the code creates the union of three sets, and then uses the Except operator to show how to remove elements belonging to one of the three sets.

Listing 2: A simple except statement.

    1:  public void SimpleExcept()
    2:  {
    3:      var aggregateOperators = GetOpertorTypeMembers("Aggregate");
    4:      var conversionOperators = GetOpertorTypeMembers("Conversion");
    5:      var setOperators = GetOpertorTypeMembers("Set");
    6:      var aggregatePlusConversionPlusSet = 
    7:          aggregateOperators.Union(conversionOperators).Union(setOperators);
    8:   
    9:      Utilities.Display(listBox, aggregatePlusConversionPlusSet);
   10:   
   11:      listBox.Add("===============");
   12:   
   13:      var exceptData = aggregatePlusConversionPlusSet.Except(setOperators);
   14:   
   15:      Utilities.Display(listBox, exceptData);
   16:  }

In Listing 2 notice how the Union operator is chained together in line 7 to create the union of three distinct sets. On line 13 the Except operator is used to return a set equal to the big union created in line 7 minus those items in the setOperators.

Figure 1 first shows the union of all three sets. Then, after the dotted line you can see what the set looks like after you subtract the set operators. In particular, notice that Distinct, Union, Intersect and Except are missing from the second group of operators.

Figure 1: The union of three sets is shown first. After the dotted line you see the same set minus the Set operators: Distinct, Union, Intersect and Except.
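Here is a small sketch of the same union-then-subtract idea that runs on its own. The three arrays are hypothetical, shortened stand-ins for the Aggregate, Conversion, and Set operator lists. The last lines also show that Except is not symmetric: the order of the two sets matters.

```csharp
using System;
using System.Linq;

public static class ExceptSketch
{
    // Hypothetical, shortened stand-ins for the three operator groups.
    public static readonly string[] Aggregate = { "Count", "Sum" };
    public static readonly string[] Conversion = { "ToArray", "ToList" };
    public static readonly string[] Set = { "Distinct", "Union", "Intersect", "Except" };

    public static void Main()
    {
        // Chain Union twice to combine all three groups.
        var all = Aggregate.Union(Conversion).Union(Set);

        // Subtract the Set group from the big union.
        var withoutSet = all.Except(Set);
        Console.WriteLine(string.Join(", ", withoutSet)); // Count, Sum, ToArray, ToList

        // Reversing the operands gives a different answer: every Set
        // operator is also in the union, so nothing is left over.
        var reversed = Set.Except(all);
        Console.WriteLine(reversed.Any()); // False
    }
}
```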

Choosing the Right Operator

I've now said all that I wanted to say about set operators. However, this post has been a little too simple, so why don't I close by talking a bit about the importance of choosing the right tools to help you compose the right query.

If you have an array of numbers from 1 to 9, then you might think it would be easy to use a mathematical formula in combination with the Except operator to remove the numbers that are even. You would then be left with the set of odd numbers between 1 and 9. You want to subtract one set from another set, so obviously you should use the Except operator. Right?

In practice things aren't always that simple. The LINQ Set operators work with sets. In particular, the Except operator expects to be passed a set. 

You have the set of numbers between 1 and 9, but to subtract the even numbers with the Except operator you have to first create the set of even numbers. To create the set of even numbers, you would typically write a query expression that looks like this:

    int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    var evens = from p in numbers
                where (p % 2 == 0)
                select p;

You could now use Except to subtract the even numbers from the original set of numbers:

    var odds = numbers.Except(evens);

But if you can create the set of even numbers, then you could just as easily create the set of odd numbers and just skip calling Except:

    var odds = from p in numbers
               where (p % 2 != 0)
               select p;

By changing the == operator to an != operator, we got the result we wanted directly without calling Except.
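The two approaches can be checked side by side. The sketch below is only an illustration; the class and method names are my own assumptions. It builds the odd numbers both ways and confirms that the results match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OddsTwoWays
{
    public static readonly int[] Numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    // The roundabout way: build the evens, then subtract them.
    public static IEnumerable<int> OddsViaExcept()
    {
        var evens = from p in Numbers
                    where p % 2 == 0
                    select p;
        return Numbers.Except(evens);
    }

    // The direct way: ask for the odds in the first place.
    public static IEnumerable<int> OddsDirect()
    {
        return from p in Numbers
               where p % 2 != 0
               select p;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", OddsViaExcept())); // 1, 3, 5, 7, 9
        Console.WriteLine(string.Join(", ", OddsDirect()));    // 1, 3, 5, 7, 9
        Console.WriteLine(OddsViaExcept().SequenceEqual(OddsDirect())); // True
    }
}
```

One caveat: the two versions agree here because the source contains no repeated values. Except also removes duplicates from its result, so on data with repeats the two queries could return different sequences.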

In this simple example, it is fairly obvious why calling the Except operator is unnecessary. But in more complex code, it is going to be easy for LINQ developers to start using the wrong operator to accomplish a particular task. By calling the "wrong" operator, we won't necessarily get the wrong answer, but we will be creating code that is both slower and more complex than necessary.

Consider the code in Listing 3. This combines the two chunks of code we were looking at earlier to get the set of odd numbers between 1 and 9.

Listing 3: The query expression embedded in this Except statement returns a set expressed as an IEnumerable<T>.

    var odds = numbers.Except(from p in numbers
                              where (p % 2 == 0)
                              select p);

In LINQ, a set is usually expressed as an IEnumerable<T>. In fact, the simplest form of the Except operator expects an IEnumerable<T> as its sole argument.

We know that query expressions produce an IEnumerable<T>.  Here I use a query expression to produce a set of even numbers that we could subtract from our existing set of numbers to create a set of odd numbers. This is what happens in the code seen in Listing 3.
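Because the argument is just an IEnumerable<T>, a query expression is only one of several things you can hand to Except. The sketch below, with names of my own choosing, passes an array, a List<int>, and a query expression, and gets the same odd numbers each time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptArguments
{
    public static readonly int[] Numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    // Accepts any IEnumerable<int> as the set to subtract.
    public static IEnumerable<int> Subtract(IEnumerable<int> toRemove)
    {
        return Numbers.Except(toRemove);
    }

    public static void Main()
    {
        int[] evenArray = { 2, 4, 6, 8 };
        List<int> evenList = new List<int> { 2, 4, 6, 8 };
        IEnumerable<int> evenQuery = from p in Numbers
                                     where p % 2 == 0
                                     select p;

        // All three arguments implement IEnumerable<int>.
        Console.WriteLine(string.Join(", ", Subtract(evenArray))); // 1, 3, 5, 7, 9
        Console.WriteLine(string.Join(", ", Subtract(evenList)));  // 1, 3, 5, 7, 9
        Console.WriteLine(string.Join(", ", Subtract(evenQuery))); // 1, 3, 5, 7, 9
    }
}
```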

In some ways, this code is fairly compelling. It is relatively concise, and produces the results that we want. However, it is not optimal. In fact, the query expression passed to the Except operator, if taken on its own, would be sufficient to produce our results. All we would need to do is change the == operator to !=.

The approximately 50 LINQ operators represent a language that can be used to query data. There are lots of ways to combine these operators to create the correct results. The best LINQ developers, however, will be the ones who can quickly understand which of these 50 operators they want to use, and how best to combine them.

Summary

This relatively simple post demonstrates how to use the Set operators to perform simple set operations on your data. The interesting thing about the Set operators is that they usually do not take lambdas. As a result, you call them directly as methods rather than through a query expression.

At the end of this post I took a little side trip which focused on the importance of selecting the right operators for a particular task. There is an art to writing LINQ queries, and the developers who are most adept at composing queries will be the ones who excel on this new frontier.

If you want to succeed you will almost certainly have to begin by becoming familiar with the LINQ query expression "language." To make the right choices, you need to know the various operators, and you need to have a fairly intuitive sense of how to optimally combine them. For most people, this is going to take a certain amount of practice. If you take the time to master query expressions, however, you will have gained a powerful skill that will significantly improve both the maintainability and the readability of your code.


LINQQueryLister04.zip