Spool operators in query plan...

I came across a question in the relationalserver.performance newsgroup where a customer was wondering about the spools seen in a recursive query execution plan. The query is shown below:

USE Northwind;
Go

WITH EmpChart AS
(
SELECT EmployeeId, ReportsTo, 1 AS treelevel
FROM Employees
WHERE (Employees.ReportsTo = 2)
UNION ALL
SELECT e.EmployeeId, e.ReportsTo, treelevel +1
FROM Employees e
JOIN EmpChart ec
ON e.ReportsTo=ec.EmployeeID
)
SELECT * FROM EmpChart;

The plan for the above query shows an index spool and a table spool. They are one and the same. The plan is shown below:

|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1013]=(0)))
| |--Compute Scalar(DEFINE:([Expr1003]=(1)))
| |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo]=(2)))
|--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
|--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
| |--Table Spool(WITH STACK)
|--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo] as [e].[ReportsTo]=[Recr1006]))

The index spool is also a lazy spool here meaning rows get inserted into the spool during execution of the recursive part also. Additionally, the rows from the index spool is read using a stack-like mechanism otherwise the recursive part may visit the same rows again. Here is how to read the plan with the recursive query:

1. Start with the anchor / top-most part

|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1013]=(0)))
| |--Compute Scalar(DEFINE:([Expr1003]=(1)))
| |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]),
WHERE:([Northwind].[dbo].[Employees].[ReportsTo]=(2)))

The anchor part of the recursive CTE first gets executed and the spool is created with index. Note the stack option also in the spool. This indicates that rows are read in a FIFO manner.

2. Next the recursive part of the query

|--Nested Loops(Inner Join, OUTER
REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
|--Compute
Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
| |--Table Spool(WITH STACK)
|--Compute
Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
|--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e]),
WHERE:([Northwind].[dbo].[Employees].[ReportsTo] as
[e].[ReportsTo]=[Recr1006]))

This is the nested loop join between the spool (created in step #1 for the anchor query) and the recursive part of the query. Since this is eager spool, rows will be populated into the spool also until the recursion is completed or the maximum level is reached. The maximum level check is done using the assert operator above the nested loop join:

|--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE
NULL END))

3. Now, the way to tell which spools are related or the same is to look at the properties of the spool operator in the query plan output. The index spool has a property called NodeId which will referenced by the table spool as PrimaryNodeId property in another part of the plan.

Lastly, SQL Server can also create a plan with an eager spool which can be seen below for the query. In case of eager spool, query execution can continue only after the eager spool has been fully created. This is different from the lazy spool.

select count(distinct ShipVia), count(distinct ShipCountry)
from Orders as o, Customers as c
where o.CustomerID = c.CustomerID;

1. To read, the plan we will start again from the top part which contains the eager spool population.

| |--Table Spool
| |--Hash Match(Inner Join,
HASH:([c].[CustomerID])=([o].[CustomerID]),
RESIDUAL:([Northwind].[dbo].[Customers].[CustomerID] as
[c].[CustomerID]=[Northwind].[dbo].[Orders].[CustomerID] as
[o].[CustomerID]))
| |--Index
Scan(OBJECT:([Northwind].[dbo].[Customers].[Region] AS [c]))
| |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]))

Here you can see the hash join between customers and orders table that populates the eager spool table.

2. The ShipVia distinct count is computed as follows by reading from the spool.

|--Compute
Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
| |--Stream
Aggregate(DEFINE:([Expr1010]=COUNT([Northwind].[dbo].[Orders].[ShipVia] as
[o].[ShipVia])))
| |--Hash Match(Aggregate, HASH:([o].[ShipVia]),
RESIDUAL:([Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia] =
[Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia]))
| |--Table Spool

3. Similarly, the ShipCountry distinct count is computed using the same spool. You can see this by looking at the NodeId and PrimaryNodeId properties of the spool operators in the query plan or showplan xml.

|--Compute
Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
|--Stream
Aggregate(DEFINE:([Expr1011]=COUNT([Northwind].[dbo].[Orders].[ShipCountry]
as [o].[ShipCountry])))
|--Hash Match(Aggregate, HASH:([o].[ShipCountry]),
RESIDUAL:([Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry] =
[Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry]))
|--Table Spool

4. Finally, since COUNT() aggregate always returns one row, we just do a nested loop join between the two parts of the tree above to return a row.

|--Nested Loops(Inner Join)
|--Compute
Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
....
|--Compute
Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))

Hope this helps you read execution plans that contain the various spool operators.

--

Umachandar Jayachandran