Taking advantage of the ordering guarantees of the LINQ GroupBy method


A customer wanted to group a set of data by one field, and within each group, sort the data by another field, and then sort the groups by that second field.

For example, given the following data set:

Name Time
Charles 11
Charles 21
Alice 20
Charles 23
Alice 29
Alice 13
Charles 17
Bob 20
Alice 13
Bob 12
Alice 26
Bob 18
Charles 18
Bob 28
Alice 23
Bob 13

We group by name:

Name Time
Alice 20
Alice 29
Alice 13
Alice 13
Alice 26
Alice 23
Bob 20
Bob 12
Bob 18
Bob 28
Bob 13
Charles 11
Charles 21
Charles 23
Charles 17
Charles 18

And then we sort each person's time, shortest first.

Name Time
Alice 13
Alice 13
Alice 20
Alice 23
Alice 26
Alice 29
Bob 12
Bob 13
Bob 18
Bob 20
Bob 28
Charles 11
Charles 17
Charles 18
Charles 21
Charles 23

And then we sort the people by their best time. Charles's best time is 11 seconds, which is best overall, so his times go first. Bob's best time is 12 seconds, so his group goes next. Alice's best time is 13 seconds, so her group is last.

Name Time
Charles 11
Charles 17
Charles 18
Charles 21
Charles 23
Bob 12
Bob 13
Bob 18
Bob 20
Bob 28
Alice 13
Alice 13
Alice 20
Alice 23
Alice 26
Alice 29

So we have a three-step LINQ query, where we group, and then sort each group, and then sort the groups.

var results =
    data.GroupBy(x => x.Name) // group by name
        .Select(g => g.OrderBy(x => x.Time)); // sort each group
        .OrderBy(g => g.First()) // sort the groups by best time
        .SelectMany(g => g);  // flatten the groups

The last step is to use SelectMany to convert the groups back into their individual members. This takes advantage of the fact that IGrouping<TKey, out TElement>, derives from IEnumerable<TElement>, so you can use the group as a collection.

But you can reduce this to a two-step operation: First sort globally by time, and then group them. The Group­By method is documented as reporting the groups in the order of first appearance, so this ensures that the fastest group comes first.

var results =
    data.OrderBy(x => x.Time) // sort globally by time
        .GroupBy(x => x.Name) // group by name (best time first)
        .SelectMany(g => g);  // flatten the groups

It does slightly more work than the three-step query because it sorts the entire collection, even though it needed only to sort each group. But it looks slicker, and might even be easier to understand. Provided you understand that the grouping is stable.

Comments (8)
  1. KennyBiel says:

    The first example, besides having a spurious semicolon after the Select, does not do what you expect it to do. I believe you need this:
    .OrderBy(g => g.First().Time) // sort the groups by best time

  2. Vilx- says:

    Seems like a delicate trick that could be easily missed by future maintainers of the code. I’d probably leave a comment there bringing their attention to this subtlety. Or write it in a more obvious way.

    1. KennyBiel says:

      I find that it’s the GroupBy returning IEnumerable<IGrouping> that is not obvious to people who do not use it much. Then you have the Select returning a IEnumerable<IOrderEnumerable>. I find the comments that Raymond has are sufficient though.

  3. Anonymous says:
    (The content was deleted per user request)
  4. cheong00 says:

    The second method works, but the first one throws exception when running:

    System.ArgumentException: At least one object must implement IComparable.
    at System.Collections.Comparer.Compare(Object a, Object b)
    at System.Collections.Generic.ObjectComparer`1.Compare(T x, T y)
    at System.Linq.EnumerableSorter`2.CompareKeys(Int32 index1, Int32 index2)
    at System.Linq.EnumerableSorter`1.QuickSort(Int32[] map, Int32 left, Int32 right)
    at System.Linq.EnumerableSorter`1.Sort(TElement[] elements, Int32 count)
    at System.Linq.OrderedEnumerable`1.d__1.MoveNext()
    at System.Linq.Enumerable.d__17`2.MoveNext()
    at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)

    1. Anonymous says:
      (The content was deleted per user request)
    2. cheong00 says:

      Have to change to the following to work: .OrderBy(g => g.First().Time)

      Just like KennyBiel said. (Why there’s no edit comment button?)

  5. ta.speot.is says:

    “The Group­By method is documented as reporting the groups in the order of first appearance, so this ensures that the fastest group comes first.”

    The documentation for LINQ’s Enumerable.GroupBy makes this guarantee, but not the documentation for LINQ’s Queryable.GroupBy. As far as I’m aware the common implementations of IQueryable (Entity Framework etc.) don’t apply any ordering as a result of Queryable.GroupBy. And they are not obliged to.

Comments are closed.

Skip to main content