Recursive CTEs continued ...

In this post, I will finish the discussion of recursive CTEs that I began in my last post.  I will continue to use the CTE examples from Books Online.  To run these examples, you'll need to install the Adventure Works Cycles OLTP sample database.

In my last post, I explained that all recursive queries follow the same pattern of one or more anchor sub-selects and one or more recursive sub-selects combined by a UNION ALL.  Similarly, all recursive query plans also follow the same pattern which looks like so:

  |--Index Spool(WITH STACK)
       |--Concatenation
            |--Compute Scalar(DEFINE:([Expr10XX]=(0)))
            |    |-- ... anchor sub-select plan(s) ...
            |--Assert(WHERE:(CASE WHEN [Expr10ZZ]>(100) THEN (0) ELSE NULL END))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr10YY], [Recr10XX], ...))
                      |--Compute Scalar(DEFINE:([Expr10ZZ]=[Expr10YY]+(1)))
                      |    |--Table Spool(WITH STACK)
                      |-- ... recursive sub-select plan(s) ...

Because of this basic plan shape, SQL Server cannot execute recursive queries using parallel plans.  There are two basic problems.  First, the stack spool does not support parallel execution.  Second, the nested loops join does not support parallel execution on its inner input.

Finally, let's look at how the placement of a WHERE clause can have a big impact on the performance of a recursive query.  In my last post, I dissected a recursive query that returns a list of all employees and their level within the organization.  Suppose that we want to run the same query but limit the results to those employees in the first two levels.  We could write the following query (which you'll also find in Books Online) with an extra WHERE clause to limit the results:

WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports
WHERE EmployeeLevel <= 2

This query computes uses essentially the same plan as the query without the extra WHERE clause.  The WHERE clause simply introduces a filter operator at the root of the plan:

 Rows   Executes
34     1        |--Filter(WHERE:([Recr1012]<=(2)))
290    1             |--Index Spool(WITH STACK)
290    1                  |--Concatenation
0      0                       |--Compute Scalar(DEFINE:([Expr1013]=(0)))
0      0                       |    |--Compute Scalar(DEFINE:([Expr1003]=(0)))
1      1                       |         |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=NULL) ...)
289    1                       |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
289    1                            |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
0      0                                 |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
290    1                                 |    |--Table Spool(WITH STACK)
0      0                                 |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
289    290                                    |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=[Recr1007]) ...)

As you can see from the row counts, this query computes the level of every employee in the organization before discarding the results that we do not want.  Now, suppose that we instead move the WHERE clause into the recursive sub-select:

WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID 
    WHERE EmployeeLevel < 2
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports

This query is semantically identical to the previous query, but if we look at the plan, we can see that the filter moves into the recursive portion of the plan:

 Rows   Executes
34     1        |--Index Spool(WITH STACK)
34     1             |--Concatenation
0      0                  |--Compute Scalar(DEFINE:([Expr1013]=(0)))
0      0                  |    |--Compute Scalar(DEFINE:([Expr1003]=(0)))
1      1                  |         |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=NULL) ...)
33     1                  |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
33     1                       |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
0      0                            |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
34     1                            |    |--Table Spool(WITH STACK)
0      0                            |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
33     34                                |--Nested Loops(Inner Join)
7      34                                     |--Filter(WHERE:(STARTUP EXPR([Recr1008]<(2))))
7      7                                      |    |--Constant Scan
33     7                                      |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=[Recr1007]) ...)

Notice from the row counts that the new plan computes only the portion of the result set that really interests us and then terminates the recursion.

You may also observe than the plan now has a startup filter.  An ordinary filter reads rows from its input tree and then evaluates a Boolean expression on each row to determine whether to return it.  A startup filter evaluates its expression only once per execution.  The expression does not depend on the input rows.  If the expression evaluates to true, the startup filter returns all of its input rows.  If the expression evaluates to false, the startup filter returns no rows.

In this example, if the startup filter expression evaluates to true, the constant scan returns one row to the nested loops join immediately above the filter and this join then executes the index seek on its inner input.  However, if the startup filter expression evaluates to false, the filter and then the nested loops join return no rows.

While there are many scenarios where a startup filter is genuinely useful, this plan is slightly more complex than is really necessary.  The plan could have used an ordinary filter:

  |--Index Spool(WITH STACK)
       |--Concatenation
            |--Compute Scalar(DEFINE:([Expr1013]=(0)))
            |    |--Compute Scalar(DEFINE:([Expr1003]=(0)))
            |         |--Index Seek(OBJECT:([HumanResources].[Employee].[IX_Employee_ManagerID]), SEEK:([HumanResources].[Employee].[ManagerID]=NULL) ORDERED FORWARD)
            |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
                      |--Filter(WHERE:([Recr1008]<(2)))
                      |    |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
                      |         |--Table Spool(WITH STACK)
                      |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
                           |--Index Seek(OBJECT:([HumanResources].[Employee].[IX_Employee_ManagerID] AS [e]), SEEK:([e].[ManagerID]=[Recr1007]) ORDERED FORWARD)

Note that this plan is artificial.  You cannot get SQL Server to generate it.