Aggregation WITH ROLLUP


In this post, I’m going to discuss how aggregation WITH ROLLUP works.  The WITH ROLLUP clause permits us to execute multiple “levels” of aggregation in a single statement.  For example, suppose we have the following fictitious sales data.  (This is the same data that I used for my series of posts on the PIVOT operator.)



CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)


We can write a simple aggregation query to compute the total sales by year:



SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr


As expected, this query returns three rows – one for each year:

Yr          Sales
———– ———————
2005 27000.00
2006 44000.00
2007 49000.00

The query plan is a simple stream aggregate:


  |–Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |–Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1010]=COUNT_BIG([Sales].[Sales]), [Expr1011]=SUM([Sales].[Sales])))
            |–Sort(ORDER BY:([Sales].[Yr] ASC))
                 |–Table Scan(OBJECT:([Sales]))


Now suppose that we want to compute not just the sale by year but the total sales as well.  We could write a UNION ALL query:



SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr
UNION ALL
SELECT NULL, SUM(Sales) AS Sales
FROM Sales


This query works and does give the correct result:

Yr          Sales
———– ———————
2005 27000.00
2006 44000.00
2007 49000.00
NULL 120000.00

However, the query plan performs two scans and two aggregations (one to compute the sales by year and one to compute the total sales):


  |–Concatenation
       |–Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1023]=(0) THEN NULL ELSE [Expr1024] END))
       |    |–Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1023]=COUNT_BIG([Sales].[Sales]), [Expr1024]=SUM([Sales].[Sales])))
       |         |–Sort(ORDER BY:([Sales].[Yr] ASC))
       |              |–Table Scan(OBJECT:([Sales]))
       |–Compute Scalar(DEFINE:([Expr1010]=NULL))
            |–Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END))
                 |–Stream Aggregate(DEFINE:([Expr1025]=COUNT_BIG([Sales].[Sales]), [Expr1026]=SUM([Sales].[Sales])))
                      |–Table Scan(OBJECT:([Sales]))


We can do better by adding a WITH ROLLUP clause to the original query:



SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP


This query is simpler to write and uses a more efficient query plan with only a single scan:


  |–Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
       |–Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
            |–Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
                 |–Sort(ORDER BY:([Sales].[Yr] ASC))
                      |–Table Scan(OBJECT:([Sales]))


The bottom stream aggregate in this query plan is the same as the stream aggregate in the original non-ROLLUP query.  This aggregation is a normal aggregation and, as such, it can be implemented using a stream aggregate (as in this example) or a hash aggregate (try adding an OPTION (HASH GROUP) clause to the above query).  It can also be parallelized.


The top stream aggregate is a special aggregate that computes the ROLLUP.  (Unfortunately, in SQL Server 2005 there is no way to discern from the query plan that this aggregate implements a ROLLUP.  This issue will be fixed in SQL Server 2008 graphical and XML plans.)  A ROLLUP aggregate is always implemented using stream aggregate and cannot be parallelized.  In this simple example, the ROLLUP stream aggregate merely returns each pre-aggregated input row while maintaining a running total of the Sales column.  After outputting the final input row, the aggregate also returns one additional row with the final sum.  Since SQL lacks a concept of an ALL value, the Yr column is set to NULL for this final row.  If NULL is valid value for Yr, we can identify the ROLLUP row using the GROUPING(Yr) construct.



SELECT
      CASE WHEN GROUPING(Yr) = 0
            THEN CAST (Yr AS CHAR(5))
            ELSE ‘ALL’
      END AS Yr,
      SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP

Yr    Sales
—– ———————
2005 27000.00
2006 44000.00
2007 49000.00
ALL 120000.00

We can also compute multiple ROLLUP levels in a single query.  For example, suppose that we want to compute the sales first by employee and then for each employee by year:



SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

EmpId       Yr          Sales
———– ———– ———————
1 2005 12000.00
1 2006 18000.00
1 2007 25000.00
1 NULL 55000.00
2 2005 15000.00
2 2006 6000.00
2 NULL 21000.00
3 2006 20000.00
3 2007 24000.00
3 NULL 44000.00
NULL NULL 120000.00

There are a couple of points worth noting about this query.  First, since the combination of the EmpId and Yr columns is unique, in the absence of the WITH ROLLUP clause, this query would just return the original data.  However, with the WITH ROLLUP clause the query produces a useful result.  Second, the order of the columns in the GROUP BY clause is relevant with the WITH ROLLUP clause.  To see why simply try the same query but reverse the EmpId and Yr columns.  Instead of computing the sales first by employee it will compute the sales first by year.


The query plan for this query is identical to the query plan for the prior query except that it groups on both the EmpId and Yr columns instead of on just the EmpId column.  Like the prior query plan, this query plan includes two stream aggregates: the bottom one which is a normal stream aggregate and the top one which computes the ROLLUP.  This ROLLUP stream aggregate actually computes two running totals: one which computes the total sales for an employee for all years and one which compute the total sales for all employees and all years.  This table shows how the ROLLUP computation proceeds:













































































EmpId


Yr


SUM(Sales) BY EmpId, Yr


SUM(Sales) BY EmpId


SUM(Sales)


1


2005


12000.00


12000.00


12000.00


1


2006


18000.00


30000.00


30000.00


1


2007


25000.00


55000.00


55000.00


1


NULL


 


55000.00


55000.00


2


2005


15000.00


15000.00


70000.00


2


2006


6000.00


21000.00


76000.00


2


NULL


 


21000.00


76000.00


3


2006


20000.00


20000.00


96000.00


3


2007


24000.00


44000.00


120000.00


3


NULL


 


44000.00


120000.00


NULL


NULL


 


 


120000.00


In my next post, I’ll take a look at the WITH CUBE clause.  I’ll discuss how it differs from WITH ROLLUP both in terms of function and in terms of its implementation.

Comments (11)

  1. In my last post, I wrote about how aggregation WITH ROLLUP works. In this post, I will discuss how aggregation

  2. mknee says:

    Great post – I’m sure many people would consder this "the basics" but it was new to me & VERY useful.

    Thanks!

    Mike Knee

  3. In my last two posts, I gave examples of aggregation WITH ROLLUP and CUBE . SQL Server 2008 continues

  4. Rakesh Dewangan says:

    Very helpful, thanks for writing blog.

    gr8 work!!!

    Keep it up

  5. Sonal Bhosale says:

    Thanx a lot!!!!

  6. Deepak Ugale says:

    Post really helps to understand the basic use of ROLLUP

  7. Cool article on WITH ROLLUP ..Thanks 🙂

  8. Supriya says:

    Very well explained.  It cleared all my doubts. Thanks Craig 🙂

  9. gopal says:

    How can i display the final result as in the column "SUM(Sales) BY EmpId" of above table

  10. Craig Freedman says:

    Starting with SQL Server 2012, you can use a window aggregate function as follows:

    SELECT EmpId, Yr, SUM(Sales) OVER (PARTITION BY EmpId ORDER BY Yr) AS Sales

    FROM Sales

    HTH

    Craig

  11. Waldemar Sarmiento says:

    Actually the Aggregate Windows Function SUM(Sales) OVER (PARTITION BY EmpId)  

    exists since SQL 2005 but without the Order By

    Waldemar