Optimizing I/O Performance by Sorting – Part 1

In this post from last year, I discussed how random I/Os are slower than sequential I/Os (particularly for conventional rotating hard drives).  For this reason, SQL Server often favors query plans that perform sequential scans of an entire table over plans that perform random lookups of only a portion of a table.  (See the last example in this post for a simple demonstration.)  In other cases, instead of performing a sequential scan, SQL Server introduces a sort operator whose sole purpose is to convert random I/Os into sequential I/Os.

Let's look at an example of such a sort.  To measure the performance effects, we'll need a reasonably large table.  The following script creates a 25.6 million row table that consumes about 3 GBytes of storage.

CREATE DATABASE IOTest
    ON ( NAME = IOTest_Data, FILENAME = '...\IOTest_Data.mdf', SIZE = 4 GB )
    LOG ON ( NAME = IOTest_Log, FILENAME = '...\IOTest_Log.ldf', SIZE = 200 MB )
GO
ALTER DATABASE IOTest SET RECOVERY SIMPLE
GO
USE IOTest
GO
CREATE TABLE T (
    PK INT IDENTITY PRIMARY KEY,
    RandKey INT,
    Flags TINYINT,
    Data INT,
    Pad CHAR(100))
GO
SET NOCOUNT ON
DECLARE @I INT
SET @I = 0
WHILE @I < 100000
  BEGIN
    WITH
      X2 (R) AS ( SELECT RAND() UNION ALL SELECT RAND() ),
      X4 (R) AS ( SELECT R FROM X2 UNION ALL SELECT R FROM X2 ),
      X8 (R) AS ( SELECT R FROM X4 UNION ALL SELECT R FROM X4 ),
      X16 (R) AS ( SELECT R FROM X8 UNION ALL SELECT R FROM X8 ),
      X32 (R) AS ( SELECT R FROM X16 UNION ALL SELECT R FROM X16 ),
      X64 (R) AS ( SELECT R FROM X32 UNION ALL SELECT R FROM X32 ),
      X128 (R) AS ( SELECT R FROM X64 UNION ALL SELECT R FROM X64 ),
      X256 (R) AS ( SELECT R FROM X128 UNION ALL SELECT R FROM X128 )
    INSERT T (RandKey, Flags, Data, Pad)
        SELECT R * 1000000000, 0xFF, 1, '' FROM X256
    SET @I = @I + 1
  END
GO
CREATE INDEX IRandKey on T (RandKey, Flags)
GO

Due to the fixed width Pad column, each row of T consumes 113 bytes (plus overhead).  Roughly 65 rows fit on a single 8 Kbyte page.  (The Flags column is unused in this example, but I will make use of it in a subsequent post.)

The RandKey column, as the name suggests, contains random values.  Notice that we have a non-clustered index on this column.  Given a predicate on the RandKey column, SQL Server can use this index to fetch qualifying rows from the table.  However, because the values in this column are random, the selected rows will be scattered randomly throughout the clustered index.

If we select just a few rows from the table using a filter on RandKey, SQL Server will use the non-clustered index:

SELECT SUM(Data)
FROM T
WHERE RandKey < 1000

  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012] END))
       |--Stream Aggregate(DEFINE:([Expr1011]=COUNT_BIG([T].[Data]), [Expr1012]=SUM([T].[Data])))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1010]) OPTIMIZED WITH UNORDERED PREFETCH)
                 |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] < (1000)) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)

The non-clustered index seek selects a few rows (the use of random keys means that the exact number may vary each time the table is loaded) and looks them up in the clustered index to get the value of the Data column for the SUM aggregate.  The non-clustered index seek is very efficient - it likely touches only one page - but the clustered index seek generates a random I/O for each row.

If we select a large number of rows, SQL Server recognizes that the random I/Os are too expensive and switches to a clustered index scan:

SELECT SUM(Data)
FROM T
WHERE RandKey < 10000000

  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END))
       |--Stream Aggregate(DEFINE:([Expr1009]=COUNT_BIG([T].[Data]), [Expr1010]=SUM([T].[Data])))
            |--Clustered Index Scan(OBJECT:([T].[PK__T__...]), WHERE:([T].[RandKey]<(10000000)))

This query touches only 1% of the data.  Still, the query is going to touch more than half of the pages in the clustered index so it is faster to scan the entire clustered index than to perform on the order of 256,000 random I/Os.

Somewhere in between these two extremes things get a little more interesting:

SELECT SUM(Data)
FROM T
WHERE RandKey < 2500000

  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(DEFINE:([Expr1010]=COUNT_BIG([T].[Data]), [Expr1011]=SUM([T].[Data])))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1009]) WITH UNORDERED PREFETCH)
                 |--Sort(ORDER BY:([T].[PK] ASC))
                 |    |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] < (2500000)) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)

This query touches a mere 0.25% of the data.  The plan uses the non-clustered index to avoid unnecessarily touching many rows.  Yet, performing 64,000 random I/Os is still rather expensive so SQL Server adds a sort.  By sorting the rows on the clustered index key, SQL Server transforms the random I/Os into sequential I/Os.  Thus, we get the efficiency of the seek - touching only those rows that qualify - with the performance of the sequential scan.

It is worth pointing out that sorting on the clustered index key will yield rows that are in the logical index order.  Due to fragmentation or due simply to the multiple layers of abstraction between SQL Server and the actual hard drives, there is no guarantee that the physical order on disk matches the logical order.

In my next post, I'll run some of these queries and demonstrate the performance implications of the sort.