Optimizing I/O Performance by Sorting – Part 2

Article
03/04/2009

In my last post, I discussed how SQL Server can use sorts to transform random I/Os into sequential I/Os. In this post, I'll demonstrate directly how such a sort can impact performance. For the following experiments, I'll use the same 3 GByte database that I created last week.

The system I'm using to run this test has 8 GBytes of memory. To exaggerate the performance effects and simulate an even larger table that does not fit in main memory, I'm going to adjust the ‘MAX SERVER MEMORY' SP_CONFIGURE option to allow SQL Server to use just 1 GByte of memory. I'm going to use CHECKPOINT to ensure that the newly created database is completely flushed to disk before running any experiments. Finally, I'm going to run DBCC DROPCLEANBUFFERS before each test to ensure that none of the data is cached in the buffer pool between tests.

CHECKPOINT

EXEC SP_CONFIGURE 'SHOW ADVANCED OPTIONS', '1'
RECONFIGURE
EXEC SP_CONFIGURE 'MAX SERVER MEMORY', '1024'
RECONFIGURE

DBCC DROPCLEANBUFFERS

Note that you will NOT want to run these statements on a production server.

As I discussed last week, SQL Server can use one of three plans for the following query depending on the value of the constant:

SELECT SUM(Data)
FROM T
WHERE RandKey < constant

To recap, if the constant is small, SQL Server uses a non-clustered index seek and a bookmark lookup. If the constant is large, SQL Server uses a clustered index scan to avoid performing many random I/Os. Finally, if the constant is somewhere in the middle, SQL Server uses the non-clustered index seek but sorts the rows prior to performing the bookmark lookup to reduce the number of random I/Os. You can review last week's post to see examples of each of these plans. I'm going to focus on the third and final plan with the sort.

To demonstrate the benefit of the sort, I need to be able to run the same query with and without the sort. A simple way to make SQL Server remove the sort is to use the following UPDATE STATISTICS statement to trick SQL Server into believing that the table is really small. To ensure that I still get the plan with the non-clustered index seek and the bookmark lookup, I need to add an INDEX hint. I'm also adding a RECOMPILE query hint to ensure that SQL Server generates a new plan after I've altered the statistics.

UPDATE STATISTICS T WITH ROWCOUNT = 1, PAGECOUNT = 1

SELECT SUM(Data)
FROM T WITH (INDEX (IRandKey))
WHERE RandKey < constant
OPTION (RECOMPILE)

I can also reset the statistics using the following statement:

UPDATE STATISTICS T WITH ROWCOUNT = 25600000, PAGECOUNT = 389323

Here is an example of the default plan with the real statistics and with the sort:

|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(DEFINE:([Expr1010]=COUNT_BIG([T].[Data]), [Expr1011]=SUM([T].[Data])))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1009]) WITH UNORDERED PREFETCH)
                 |--Sort(ORDER BY:([T].[PK] ASC))
                 |    |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] < (2000000)) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)

Here is an example of the plan after running UPDATE STATISTICS and without the sort:

|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END))
       |--Stream Aggregate(DEFINE:([Expr1009]=COUNT_BIG([T].[Data]), [Expr1010]=SUM([T].[Data])))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK]))
                 |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] < (2000000)) ORDERED FORWARD)
                 |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)

Here are my results running this query with two values of the constant both with and without the sort. Keep in mind that these results depend greatly on the specific hardware. If you try this experiment, your results may vary.

		Execution Time		% Increase
		with Sort	without Sort	% Increase
Constant	2,000,000(0.2% of rows)	91 seconds	352 seconds	286%
Constant	4,000,000(0.4% of rows)	97 seconds	654 seconds	574%
% Increase	100%	6%	86%

There are a two points worth noting regarding these results. First, it should be very clear that the plan with the sort is significantly faster (up to 7 times faster) than the plan without the sort. This result clearly shows the benefit of sequential vs. random I/Os. Second, doubling the number of rows touched had hardly any effect on the execution time for the plan with the sort but nearly doubled the execution time for the plan without the sort. Adding additional I/Os to the plan with the sort adds only a small incremental cost since the I/Os are sequential and the disk head will pass over the required data exactly once either way. Adding additional I/Os to the plan without the sort adds additional disk seeks and increases the execution time proportionately to the increase in the number of rows. In fact, if the constant is increased further, the execution time of the plan with the sort will continue to increase only gradually with the execution time of the plan without the sort will continue to increase rapidly.

Optimizing I/O Performance by Sorting – Part 2

Additional resources