Converting SQL to LINQ, Part 5: GROUP BY and HAVING (Bill Horst)

This post assumes you’ve read the previous posts in this series:

          Converting SQL to LINQ, Part 1: The Basics

          Converting SQL to LINQ, Part 2: FROM and SELECT

          Converting SQL to LINQ, Part 3: DISTINCT, WHERE, ORDER BY and Operators

          Converting SQL to LINQ, Part 4: Functions

 

This post will discuss the GROUP BY and HAVING clauses.

 

GROUP BY

 

A SQL GROUP BY clause allows you to group records by particular fields, so the entire group can be dealt with at once.  A LINQ statement can have a Group By clause as well, but with different syntax.  An informal (and incomplete) syntax expression could be:

 

VB Group By

Group [{optional list of fields}] By {list of fields} _

Into {list of aggregate expressions}

 

 

In this syntax, all the listed expressions can have aliases, and must if an identifier cannot be inferred.  Essentially, the clause is “Group By” followed by a list of fields by which to group the records, then “Into” followed by a list of aggregate expressions to calculate.  For example, the following query calculates the total and average cost of shipments to each zip code:

 

VB

From Shipment In OrderTable _

Group By Shipment.ShippingZip _

Into Total = Sum(Shipment.Cost), Average(Shipment.Cost)

 

This query expression returns three values, ShippingZip, Total and Average, for each zip code represented in the data.

 

Optionally, a list of fields can be provided between the “Group” and “By” keywords to narrow down the information included to specific fields.  This could be thought of as a built-in Select clause prior to the Into clause:

 

VB

From Shipment In OrderTable _

Group OrderCost = Shipment.Cost By Shipment.ShippingZip _

Into Total = Sum(OrderCost), Average(OrderCost)

 

In addition to aggregate expressions, the Into clause can also contain the keyword “Group”, which causes the individual records to be included for each group, as an array member.  For each pairing of zip code and order date represented in the data, the following query expression returns two values, Zip and OrderDate, plus an array representing all the records in the group.

 

VB

From Shipment In OrderTable _

Group By Zip = Shipment.ShippingZip, Shipment.OrderDate _

Into Group

 

When there are no fields specified between “Group” and “Order By”, as above, all fields are included in the records in the Group array.  Including specific fields narrows the information to the specified fields, as though the records were filtered through a Select clause.  For each zip code and order date pairing represented in the data, the following query expression returns Zip and OrderDate values, plus an array of ID and Cost fields for each record in the group.

 

VB

From Shipment In OrderTable _

Group ID = Shipment.OrderID, Shipment.Cost By _

    Zip = Shipment.ShippingZip, Shipment.OrderDate _

Into Group

 

I’ve given a lot of examples because the syntax is fairly complex, but now we start to see how to convert various SQL statements with GROUP BY clauses to VB.  Below is an example:

 

SQL

SELECT OrderDate Date_Of_Order, SUM(Cost) Daily_Total

FROM OrderTable

GROUP BY Date_Of_Order

 

 

VB

From Shipment In OrderTable _

Group By Date_Of_Order = Shipment.OrderDate _

Into Daily_Total = Sum(Shipment.Cost)

 

 

Having

 

Having is another SQL clause which can specify conditions for a group’s inclusion in the query results.  VB has no corresponding clause, so the best way to re-create this with VB LINQ is to use a Where clause after a Group By clause, as shown below.

 

SQL

SELECT OrderDate Date_Of_Order, SUM(Cost) Total_Cost

FROM OrderTable

GROUP BY Date_Of_Order

HAVING SUM(Cost) > 1000

 

 

VB

From Shipment In OrderTable _

Group By Date_Of_Order = Shipment.OrderDate _

Into Total_Cost = Sum(Shipment.Cost)

Where Total_Cost > 1000

 

 

In my next post, I plan to cover various kinds of joins.

- Bill Horst, VB IDE Test