LINQ Farm: Using Distinct and Avoiding Lambdas


This is the fourth in a series of articles on LINQ. This article focuses on an important operator from the list of 49 LINQ operators available in the May CTP. This operator, called Distinct(), is different from the other operators we have seen because it is called directly as a method rather than through query expression syntax.

This post focuses on five related ideas:

  1. Most query operators such as Select(), Where() and GroupBy() take something called a lambda as a parameter.
  2. Lambdas are difficult to write.
  3. Query expressions were created in large part to allow developers to use LINQ without having to learn the complex syntax associated with lambdas.
  4. A few query operators, such as Distinct(), do not take lambdas as parameters. As a result, they are easy to call.
  5. No query expression syntax was therefore created for operators such as Distinct() that do not take lambdas.

This post will explain these ideas in depth.

Overview

In this post, I will continue to avoid tackling lambdas directly. The C# team created a syntax called query expressions that allows you to write both simple and advanced queries without using lambdas.

Many discussions of LINQ that I have seen on the web start by focusing on lambdas. I believe this approach will make LINQ hard for many developers to understand. As a result, I’m telling you about everything except lambdas.

You do not need to understand lambdas in order to harness the power of LINQ. Let’s start with what is easy to understand, and then later move on to more difficult subjects.

It is exciting to finally see a way to query data that is built directly into a computer language. When you fire up the compiler and start using the technology directly, you will see that it is fun to play with the language, and fun to see what it yields.

Unless you really love complex syntax, don’t get hung up on lambdas and expression trees. Just have fun with LINQ. It is a great way to write queries.

Groups of Operators

As you recall from the previous post, the code in our sample programs relies on an in-memory “table”, which is really just a collection or list of objects of type Operator:

    class Operator
    {
        public int OperatorID;
        public string OperatorName;
        public string OperatorType;
    }

There are 49 different operators in the May CTP. Therefore, the OperatorList “table” has 49 rows in it. Here is a look at 12 sample rows which will give you a sense of the structure of this “table”:

OperatorID  OperatorName  OperatorType
1           Where         Restriction
2           Select        Projection
3           SelectMany    Projection
4           Take          Partitioning
5           Skip          Partitioning
6           Range         Generation
7           Repeat        Generation
8           Empty         Generation
9           Distinct      Set
10          Union         Set
11          Intersect     Set
12          Except        Set
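
As a rough sketch of how such a “table” might be populated, the code below builds the 12 sample rows shown above. The body of Operators.GetOperatorList() is my own assumption for illustration; the version in the sample program returns all 49 rows and may be written quite differently.

```csharp
using System;
using System.Collections.Generic;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class Operators
{
    // Hypothetical stand-in for the GetOperatorList() helper used by the samples.
    // It returns only the 12 sample rows from the table above, not all 49.
    public static List<Operator> GetOperatorList()
    {
        string[,] rows =
        {
            { "Where", "Restriction" }, { "Select", "Projection" },
            { "SelectMany", "Projection" }, { "Take", "Partitioning" },
            { "Skip", "Partitioning" }, { "Range", "Generation" },
            { "Repeat", "Generation" }, { "Empty", "Generation" },
            { "Distinct", "Set" }, { "Union", "Set" },
            { "Intersect", "Set" }, { "Except", "Set" }
        };

        var list = new List<Operator>();
        for (int i = 0; i < rows.GetLength(0); i++)
        {
            list.Add(new Operator
            {
                OperatorID = i + 1,          // IDs start at 1, as in the table
                OperatorName = rows[i, 0],
                OperatorType = rows[i, 1]
            });
        }
        return list;
    }
}

static class Program
{
    static void Main()
    {
        // Print one line per row: ID, name, and type.
        foreach (var op in Operators.GetOperatorList())
        {
            Console.WriteLine("{0,2}  {1,-12}{2}", op.OperatorID, op.OperatorName, op.OperatorType);
        }
    }
}
```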

The OperatorType on the right illustrates a scheme for grouping the various types of operators. Trying to keep track of 49 different operators is a chore, so the team grouped related sets of operators together into 14 different categories, or operator types. In the table above, you can see that Range, Repeat, and Empty are part of the Generation category. In this post, we will frequently focus on the Distinct operator, which is part of the Set category.

Exploring the OperatorTypes

In Listing One, you can see a few of the different ways of exploring the OperatorTypes from our table. This post and the next will focus on the methods found in this listing.

Listing One: A few of the many different ways to begin exploring the OperatorTypes

 
   1:  using System;
   2:  using System.Collections.Generic;
   3:  using System.Text;
   4:  using System.Query;
   5:   
   6:  namespace QueryLister
   7:  {
   8:      class GroupsAndSets
   9:      {
  10:          private List<Operator> operatorList;
  11:          private System.Collections.IList list;
  12:   
  13:          public GroupsAndSets(System.Collections.IList list)
  14:          {
  15:              operatorList = Operators.GetOperatorList();
  16:              this.list = list;
  17:          }
  18:   
  19:          private void Display(IEnumerable<string> s)
  20:          {
  21:              foreach (var value in s)
  22:              {
  23:                  list.Add(value);
  24:              }
  25:          }
  26:   
  27:          public void DistinctWithLambda()
  28:          {               
  29:              var s = operatorList.Select(p => p.OperatorType).Distinct();
  30:   
  31:              Display(s);
  32:          }
  33:   
  34:          public void SimpleDistinct()
  35:          {
  36:              var s = (from p in operatorList                     
  37:                       select p.OperatorType).Distinct();
  38:   
  39:              Display(s);
  40:          }
  41:   
  42:          public void DistinctOrdered()
  43:          {
  44:              var s = (from p in operatorList
  45:                       orderby p.OperatorType
  46:                       select p.OperatorType).Distinct();
  47:   
  48:              Display(s);
  49:          }
  50:      }
  51:  }

I will focus on four methods, the first three of which appear in Listing One:

  • SimpleDistinct
  • DistinctOrdered
  • DistinctWithLambda
  • NotWhatYouExpectDistinct

In future posts, I will discuss the GroupBy operator. I will also “normalize” our data by placing the OperatorTypes in their own “table.”

The Distinct Operator

The SimpleDistinct method demonstrates the easiest way to get a look at the unique entries in the OperatorType field of our “table.” The output from the SimpleDistinct method is shown in Figure One.

Figure One: The SimpleDistinct method shows all the unique instances of the OperatorType field from the OperatorList “table”. The Distinct operator itself is part of the Set group.

The code on lines 36 and 37 asks the compiler to “find all the unique instances of the OperatorType from the OperatorList ‘table.’” Or, you could say “From the OperatorList select all the OperatorTypes that are distinct.”

If you go look again at lines 36 and 37, you will see that it is almost easier to read the code directly than it is to try to turn it into a proper English sentence. This simple syntax is the beauty of query expressions, and one of the key features of the LINQ technology.
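
If you want to try the query outside the sample program, here is a minimal, self-contained sketch. The DistinctDemo class and its SampleRows() helper are hypothetical names of my own, the data is limited to the 12 sample rows, and the code uses the System.Linq namespace of the released .NET Framework rather than the System.Query namespace of the May CTP.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class DistinctDemo
{
    // Hypothetical sample data: the 12 rows from the table, not all 49.
    public static List<Operator> SampleRows()
    {
        string[] names = { "Where", "Select", "SelectMany", "Take", "Skip", "Range",
                           "Repeat", "Empty", "Distinct", "Union", "Intersect", "Except" };
        string[] types = { "Restriction", "Projection", "Projection", "Partitioning",
                           "Partitioning", "Generation", "Generation", "Generation",
                           "Set", "Set", "Set", "Set" };

        var rows = new List<Operator>();
        for (int i = 0; i < names.Length; i++)
        {
            rows.Add(new Operator { OperatorID = i + 1, OperatorName = names[i], OperatorType = types[i] });
        }
        return rows;
    }

    // Same shape as SimpleDistinct: a query expression in parentheses,
    // followed by a direct call to Distinct().
    public static List<string> UniqueTypes()
    {
        var operatorList = SampleRows();
        var s = (from p in operatorList
                 select p.OperatorType).Distinct();
        return s.ToList();
    }

    static void Main()
    {
        // 12 rows go in; 5 unique OperatorType values come out.
        foreach (var type in UniqueTypes())
        {
            Console.WriteLine(type);
        }
    }
}
```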

Why is the Distinct Operator Called Directly?

Query operators are method calls. In other words, there are methods in the LINQ API called Select(), GroupBy(), Distinct(), etc. We don’t usually call these methods directly because most of them take lambdas as parameters, and many people find that lambdas are hard to understand. To help developers avoid the complex task of writing lambdas, the team invented query expressions, which are a form of “syntactic sugar” that sits on top of these method calls and their lambdas.

Look at line 29. There you can see a call to the Select operator that takes a funny looking parameter: Select(p => p.OperatorType). The parameter, p => p.OperatorType, is a lambda. You can tell it is a lambda because it includes that funny looking “goes to” operator: =>. To read it out loud, you would say “p goes to p dot OperatorType.”

Query expressions create lambdas behind the scenes, without forcing developers to compose them directly. We write query expressions that use terms like select, where, and group. Here is an example from an earlier post:

    from p in operatorList
    select p;

This is a query expression. There are no direct calls to the Select() operator in this code. Here is another example:

    from p in operatorList
    group p by p.OperatorType
    into MyGroup
    select MyGroup.Key;

Again, there are no direct calls to the GroupBy or Select operators. Behind the scenes, these query expressions are converted into calls to Select() and GroupBy(), each of which takes a lambda as a parameter.
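
To make that conversion concrete, the sketch below writes the grouping query both ways and checks that the two forms return the same keys. The TranslationDemo class and its five-row SampleRows() helper are hypothetical, and the method-call form is my hand-written approximation of what the compiler produces, not its literal output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class TranslationDemo
{
    // Hypothetical sample data: the first five rows of the table.
    public static List<Operator> SampleRows()
    {
        return new List<Operator>
        {
            new Operator { OperatorID = 1, OperatorName = "Where", OperatorType = "Restriction" },
            new Operator { OperatorID = 2, OperatorName = "Select", OperatorType = "Projection" },
            new Operator { OperatorID = 3, OperatorName = "SelectMany", OperatorType = "Projection" },
            new Operator { OperatorID = 4, OperatorName = "Take", OperatorType = "Partitioning" },
            new Operator { OperatorID = 5, OperatorName = "Skip", OperatorType = "Partitioning" }
        };
    }

    // The query expression from the text: no lambdas in sight.
    public static List<string> ViaQueryExpression(List<Operator> operatorList)
    {
        var s = from p in operatorList
                group p by p.OperatorType
                into MyGroup
                select MyGroup.Key;
        return s.ToList();
    }

    // Roughly what the query expression stands for: direct operator calls,
    // each taking a lambda.
    public static List<string> ViaOperatorCalls(List<Operator> operatorList)
    {
        var s = operatorList
            .GroupBy(p => p.OperatorType)
            .Select(MyGroup => MyGroup.Key);
        return s.ToList();
    }

    static void Main()
    {
        var rows = SampleRows();
        Console.WriteLine(string.Join(", ", ViaQueryExpression(rows)));
        Console.WriteLine(string.Join(", ", ViaOperatorCalls(rows)));
    }
}
```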

If you look at the code for the SimpleDistinct method you will see that it is a little bit different from the other query expressions we have looked at so far. In particular, parentheses are used to bind together the query expression proper, and then a method called Distinct() is called on the result of that query expression. In the code below, the query expression is the part inside the parentheses, and the direct call to the Distinct() operator follows the closing parenthesis:

(from p in operatorList
select p.OperatorType
).Distinct();

Look back at the methods from previous posts, or look at the Group and GroupOrdered methods from this post, and you will see that this pattern is atypical. Normally we hide direct calls to query operators behind query expressions.

To understand why this method is different you need to remember that query expressions were created to save you from having to type lambdas. A query expression syntax was created for almost all the cases where you would pass a lambda as a parameter to a query operator.

The Set operators do not take lambdas as a parameter. Distinct is one of the Set operators. The other set operators are Union(), Intersect() and Except().

Since they do not take lambdas as parameters, the C# team decided that the Set operators do not need to hide behind query expressions. In effect, they said, “Oh look. It’s easy to call a Set operator like Distinct, so we won’t bother to create a query expression for it.” In fact, it is easier to just directly call these methods than it would be to invent a query expression syntax for them.
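
Here is a small sketch of what calling the other Set operators directly looks like. The SetOperatorsDemo class and the two arrays are hypothetical; the names in them come from the OperatorName column of the sample table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SetOperatorsDemo
{
    // Hypothetical sample data: the first nine operator names from the table...
    public static readonly string[] FirstNine =
        { "Where", "Select", "SelectMany", "Take", "Skip", "Range", "Repeat", "Empty", "Distinct" };

    // ...and the four operators that belong to the Set group.
    public static readonly string[] SetGroup =
        { "Distinct", "Union", "Intersect", "Except" };

    // Names that appear in either array, each listed once.
    public static List<string> InEither()
    {
        return FirstNine.Union(SetGroup).ToList();
    }

    // Names that appear in both arrays.
    public static List<string> InBoth()
    {
        return FirstNine.Intersect(SetGroup).ToList();
    }

    // Names in the first array that are not in the Set group.
    public static List<string> OnlyInFirst()
    {
        return FirstNine.Except(SetGroup).ToList();
    }

    static void Main()
    {
        // None of these calls needs a lambda.
        Console.WriteLine(string.Join(", ", InEither()));
        Console.WriteLine(string.Join(", ", InBoth()));
        Console.WriteLine(string.Join(", ", OnlyInFirst()));
    }
}
```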

NOTE: Throughout this section, I have emphasized that lambdas are difficult to use. You’ve probably noticed, however, that the DistinctWithLambda method is not really that much more complex than the SimpleDistinct method. In fact, it takes 69 characters to write the SimpleDistinct method, and 66 characters to write the DistinctWithLambda method. In this particular case, the lambda we are dealing with is simple, and easy to write. However, lambdas can easily become complex and lengthy, while query expressions are typically shorter and easier to read.
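
For a slightly larger comparison, the sketch below writes one query with three clauses in both styles. The LambdaComparisonDemo class, its SampleRows() helper, and the particular query are hypothetical examples of my own; they are not taken from the sample program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class LambdaComparisonDemo
{
    // Hypothetical sample data: six rows from the table.
    public static List<Operator> SampleRows()
    {
        string[] names = { "Where", "Select", "Union", "Distinct", "Intersect", "Except" };
        string[] types = { "Restriction", "Projection", "Set", "Set", "Set", "Set" };

        var rows = new List<Operator>();
        for (int i = 0; i < names.Length; i++)
        {
            rows.Add(new Operator { OperatorID = i + 1, OperatorName = names[i], OperatorType = types[i] });
        }
        return rows;
    }

    // Query expression: three clauses, no lambdas.
    public static List<string> SetNamesViaQuery(List<Operator> operatorList)
    {
        var s = from p in operatorList
                where p.OperatorType == "Set"
                orderby p.OperatorName
                select p.OperatorName;
        return s.ToList();
    }

    // The same query as direct operator calls: one lambda per clause.
    public static List<string> SetNamesViaLambdas(List<Operator> operatorList)
    {
        var s = operatorList
            .Where(p => p.OperatorType == "Set")
            .OrderBy(p => p.OperatorName)
            .Select(p => p.OperatorName);
        return s.ToList();
    }

    static void Main()
    {
        var rows = SampleRows();
        Console.WriteLine(string.Join(", ", SetNamesViaQuery(rows)));
        Console.WriteLine(string.Join(", ", SetNamesViaLambdas(rows)));
    }
}
```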

Danger: Not What You Would Expect

I’ll end this post by examining the NotWhatYouExpectDistinct method:

    public void NotWhatYouExpectDistinct()
    {
        // No parentheses around the query expression, so Distinct()
        // is applied to each OperatorType string, not to the query result.
        var s = from p in operatorList
                select p.OperatorType.Distinct();

        foreach (var value in s)
        {
            list.Add(value);
        }
    }

This method looks similar to the SimpleDistinct method:

    public void SimpleDistinct()
    {
        var s = (from p in operatorList
                 select p.OperatorType).Distinct();

        Display(s);
    }

The difference between the two methods is that one uses parentheses to set off the query expression, and the other does not. If you don’t use the parentheses, then the Distinct() method gets bound to p.OperatorType, which is a string. Because a string is a sequence of characters, Distinct() returns the unique characters of each OperatorType rather than the unique OperatorTypes, and you end up with the output in Figure Two. This is probably not what you want. Be sure to use parentheses when working with Distinct()!

Figure Two: Oh my gosh, I forgot the parentheses!
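
The sketch below puts the two forms side by side so you can see the difference for yourself. The ParenthesesDemo class and its four-row Rows list are hypothetical; I convert each character sequence back into a string only so that it can be printed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class ParenthesesDemo
{
    // Hypothetical sample data: four rows covering two OperatorTypes.
    public static readonly List<Operator> Rows = new List<Operator>
    {
        new Operator { OperatorID = 6, OperatorName = "Range", OperatorType = "Generation" },
        new Operator { OperatorID = 7, OperatorName = "Repeat", OperatorType = "Generation" },
        new Operator { OperatorID = 9, OperatorName = "Distinct", OperatorType = "Set" },
        new Operator { OperatorID = 10, OperatorName = "Union", OperatorType = "Set" }
    };

    // No parentheses: Distinct() runs on each OperatorType string, so every
    // row yields the unique characters of that string.
    public static List<string> WithoutParentheses()
    {
        var s = from p in Rows
                select p.OperatorType.Distinct();

        // Rebuild a string from each character sequence so it can be printed.
        return s.Select(chars => new string(chars.ToArray())).ToList();
    }

    // Parentheses: Distinct() runs on the result of the whole query.
    public static List<string> WithParentheses()
    {
        var s = (from p in Rows
                 select p.OperatorType).Distinct();
        return s.ToList();
    }

    static void Main()
    {
        // Four entries, one per row, each with repeated letters removed.
        foreach (var value in WithoutParentheses()) Console.WriteLine(value);

        // Two entries, one per unique OperatorType.
        foreach (var value in WithParentheses()) Console.WriteLine(value);
    }
}
```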

Alphabetizing the Output from Distinct

The output shown in Figure One is not arranged alphabetically. The DistinctOrdered method eliminates this potential shortcoming by using the OrderBy operator:

   1:  public void DistinctOrdered()
   2:  {
   3:      var s = (from p in operatorList
   4:               orderby p.OperatorType
   5:               select p.OperatorType).Distinct();
   6:   
   7:      Display(s);
   8:  }

The majority of the code here is identical to that in SimpleDistinct. On line 4, however, you can see that a new operator has been introduced. This allows us to order the output, as shown in Figure Three.

Figure Three: The unique OperatorTypes arranged alphabetically.

The orderby operator shown on line 4 does exactly what you would expect it to do. It is intuitive and easy to use. It provides exactly the kind of simple syntax you would want to use to order a list like this one.
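
One caution: the documentation for Distinct() describes its result as an unordered sequence, so a LINQ provider is not obliged to keep the order established before the call. If you want to be safe, you can sort after removing the duplicates. The sketch below shows one way to do that while staying in query expression syntax; the OrderedDistinctDemo class and its SampleRows() helper are hypothetical names of my own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Operator
{
    public int OperatorID;
    public string OperatorName;
    public string OperatorType;
}

static class OrderedDistinctDemo
{
    // Hypothetical sample data: the 12 rows from the table, not all 49.
    public static List<Operator> SampleRows()
    {
        string[] names = { "Where", "Select", "SelectMany", "Take", "Skip", "Range",
                           "Repeat", "Empty", "Distinct", "Union", "Intersect", "Except" };
        string[] types = { "Restriction", "Projection", "Projection", "Partitioning",
                           "Partitioning", "Generation", "Generation", "Generation",
                           "Set", "Set", "Set", "Set" };

        var rows = new List<Operator>();
        for (int i = 0; i < names.Length; i++)
        {
            rows.Add(new Operator { OperatorID = i + 1, OperatorName = names[i], OperatorType = types[i] });
        }
        return rows;
    }

    // Remove duplicates first, then order what is left.
    public static List<string> DistinctThenOrdered(List<Operator> operatorList)
    {
        var s = from t in (from p in operatorList
                           select p.OperatorType).Distinct()
                orderby t
                select t;
        return s.ToList();
    }

    static void Main()
    {
        foreach (var type in DistinctThenOrdered(SampleRows()))
        {
            Console.WriteLine(type);
        }
    }
}
```

Sorting after Distinct() also means that only the unique values are sorted, rather than every row in the table.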

Summary

In this post you have been introduced to the Distinct operator. The syntax for using this operator is quite simple.

Much of this post, however, focuses on helping you understand the motivation for query expressions, and why there are times when you need to step past query expressions and call query operators directly. The Distinct operator, for instance, is called directly.

If you understand that query expressions are a means of hiding lambdas, and if you can see that some query operators do not take lambdas and can therefore be called directly, then you have assimilated the main point of this post.

Hopefully the process of working with simple query expressions has helped you begin to get a feeling for how LINQ is put together. Despite all the complex talk of lambda expressions and expression trees, LINQ is at heart a simple and easy-to-use syntax. We will continue to explore this fun tool in future posts. After a time, it should become easy for you to use in your own programs.

 


LINQQueryLister02.zip

Comments (22)

  1. Welcome to the twelfth Community Convergence . Please go here to post comments. This edition of Community

  2. Charlie Calvert , who some of you may know for the years he spent at Borland or for his books on Delphi,

  6. This is the sixth in a series of articles on LINQ. In this post the focus will be on the LINQ Set operators.

  7. Kyle Roche says:

    This is a great article on using DISTINCT operator with LINQ.

  8. Kyle Roche says:

    Hi, what if i want to select all columns using a distinct operator?

  12. Roberto says:

    This article is not so great, it flies over casual things.

    Why dont you talk about Distinct and the IEqualityComparer that raise an exception if the linq select manages a class object and not a basic type int, string…

    I’m talking about the <Unsupported overload used for query operator ‘Distinct’.>

    class MyClassComparer : IEqualityComparer<MyClass> …

    class MyClass …

    var result = (from e in DC.MyClass

                where …..

                select e).Distinct( new MyClassComparer )….

    This sample always generates the exception not supported.

    Great linq !!! Maybe, maybe not

  13. Nothing much to do at the moment …. 😛 but that doesn't mean I'm asking for more, boss (this is for my boss, if he reads this article) this

  14. Mukesh says:

    Short and sweet..

    Good one… Thank you…

  15. Chris says:

    Thanks a lot for this post, very helpful especially when you described why Distinct has to be called as opposed to being part of the sugar syntax.

  16. Usman says:

    although this article is fine. but it not give any information about Distinct() when using with the generic lists and we have to select the records from those lists not just one item. what we have to do then??? what linq provides us then because it supports only the primitive types like int,string,long etc,,

    could any one give me suggestion over this???

  17. axiohm says:

    Excellent article.. clear and concise… .. very useful

    Thank u!

  18. goldeneyes says:

    greate article thanks u …………….

  19. Jorge Guerreiro says:

    wow, you are great !  i just spent over 2 hours trying to get the distinct to work and you distilled it so simply.  Thank you Sir!

  20. Doru says:

    // LINQ2Entity: Distinct() after OrderBy() result is an unsorted list !!!

    ACMEEntities de = new ACMEEntities();

    // Distinct() change the order — not good

    var stateList = (from c in de.Customers

                           orderby c.State

                           select c.State).Distinct();

    // Call OrderBy() after Distinct() — Ok.

    var stateList = (from c in de.Customers

                           select c.State).Distinct().OrderBy(s => s);

    or

    // Also Ok.

    var stateList = de.Customers.Where(c => c.State != null).Select(c => c.State).Distinct().OrderBy(s => s);

  21. Yassine says:

    it's great i would like to thank you for sharing it's very use full

  22. Good article to understand when to avoid Lambdas