What are iterators?
SQL Server breaks queries down into a set of fundamental building blocks that we call operators or iterators. Each iterator implements a single basic operation such as scanning data from a table, updating data in a table, filtering or aggregating data, or joining two data sets. In all, there are a few dozen such primitive iterators. Iterators may have one, two, or more children and can be combined into trees which we call query plans. By building appropriate query plans, we can execute any SQL statement. In practice, there are frequently many valid query plans for a given statement. The query optimizer’s job is to find the best (e.g., the cheapest) query plan for a given statement.
How do iterator’s work?
An iterator reads input rows either from a data source such as a table or from its children (if it has any) and produces output rows which it returns to its parent. The output rows that an iterator produces depend on the operation that the iterator performs.
All iterators implement the same set of core methods. For example, the Open method tells an iterator to prepare to produce output rows while the GetRow method requests that an iterator produce a new output row. Because all iterators implement the same methods, iterators are independent of one another. That is, an iterator does not need specialized knowledge of its children (if any) or parent. Consequently, iterators can be easily combined in many different ways and into many different query plans.
When SQL Server executes a query plan, control flows down the tree. That is, we call the methods Open and GetRow on the iterator at the root of the query plan and these methods propagate down through the tree to the leaf iterators. Data flows or more accurately is pulled up the tree when one iterator calls another iterator’s GetRow method.
A simple example
Consider the following query:
select count(*) from t
The simplest way to execute this query is to scan each row in table [t] and count the rows. SQL Server uses two iterators to achieve this result: one to scan the rows in [t] and another to count them:
<-- count(*) <-- Scan [t]
To execute this query plan, we call Open on the count(*) iterator. The count(*) iterator performs the following tasks in Open:
- it calls Open on the scan iterator which readies the scan to produce rows;
- it calls GetRow repeatedly on the scan, counting the rows returned, and stopping only when GetRow returns that it has reached the end of the scan; and
- it calls Close on the scan iterator to indicate that we are done.
Thus, by the time the count(*) iterator returns from Open, it has already calculated the number of rows in [t]. To complete execution we call GetRow on the count(*) and it returns this result. (Technically, we call GetRow on the count(*) one more time since we do not know that count(*) produces only a single row until we try. The second GetRow call returns that we have reached the end of the result set.)
Note that the count(*) iterator does not care or need to know that it is counting rows from a scan; it will count rows from any sub-tree we put below it regardless of how simple or complex the sub-tree may be.
Finally, the scan is implemented using the aptly named table scan iterator while the count(*) operation is actually implemented using the stream aggregate iterator. I’ll delve into more detail about many of the different iterators supported by SQL Server in future posts.