Scan Count meaning in SET STATISTICS IO output: Part 2

In my last blog post, I had explained what Scan Count means and how it is calculated for a query executed on a table. I had illustrated a few examples based on seeks/scans done on Clustered Indexes of a table. This time I shall expand this to clustered indexes while performing joins. We shall use the same thumb rules from last time to explain the Scan Count returned.

Thumb rules

1. Scan count is 0 if the index that you are using is a unique index or clustered index on a primary key and you are seeking for only one value. Eg. WHERE Primary_Key_Column = <value>

2. Scant count is 1 when you are searching for one value using a non-unique clustered index which is defined on a non-primary key column. This is done to check for duplicate values for the key value that you are searching for. Eg. WHERE Clustered_Index_Key_Column = <value>

3. Scan count is N when N is the number of different seek/scan started towards the left or right side at the leaf level after locating a key value using the index key.

This time I am going to use the MSDB database and illustrate the Scan Count output for Inner Joins. The query that I used was:

select * from msdb.dbo.backupfile a

inner join msdb.dbo.backupset b

on a.backup_set_id = b.backup_set_id

where a.backup_set_id = 1

The query plan snippet for the above is:

|--Clustered Index Seek(OBJECT:([msdb].[dbo].[backupset].[PK__backupse__21F79AAB0E391C95] AS [b]), SEEK:([b].[backup_set_id]=(1)) ORDERED FORWARD)

|--Clustered Index Seek(OBJECT:([msdb].[dbo].[backupfile].[PK__backupfi__57D1800A17C286CF] AS [a]), SEEK:([a].[backup_set_id]=(1)) ORDERED FORWARD)

The scan count output is as follows:

Table 'backupfile'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'backupset'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The scan count on the table “backupset” is Zero because the ID used to search is 1. The backup_set_id column has a primary key defined on it. So using thumb rule #1, we can explain this. The scan count on the table “backupfile” is 1 because the primary key used for the seek is defined on two columns: backup_set_id and file_number. In this case, SQL Server needs to go beyond the one value that it found to ensure that there are no other rows with the seek value i.e. backup_set_id = 1. This behavior can be explained with thumb rule #2.

Now, I shall remove the filter and see what kind of surprises we get for Scan Count numbers! J

The snippet of the query plan is shown below. The Index Scan/Seek operations feed into a Nested Loop operation as before but the difference here is that there is no SEEK filter applied on the table “backupset”.

Query:

select * from msdb.dbo.backupfile a

inner join msdb.dbo.backupset b

on a.backup_set_id = b.backup_set_id

|--Clustered Index Scan(OBJECT:([msdb].[dbo].[backupset].[PK__backupse__21F79AAB0E391C95] AS [b]))

|--Clustered Index Seek(OBJECT:([msdb].[dbo].[backupfile].[PK__backupfi__57D1800A17C286CF] AS [a]), SEEK:([a].[backup_set_id]=[msdb].[dbo].[backupset].[backup_set_id] as [b].[backup_set_id]) ORDERED FORWARD)

The Statistics IO output for the query is as follows:

Table 'backupfile'. Scan count 20, logical reads 42, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'backupset'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The Scan Count of One for the table backupset is self-explanatory because it started a scan from the first value stored in the clustered index page and went on scanning till the end to retrieve, in my case, 20 rows. This can very well be explained by thumb rule #2. Now the surprising part is the Scan Count of 20 for the table “backupfile”. If you look at the SEEK condition, you will see that the JOIN criteria is used as the SEEK condition here. This means that for each row returned from the Clustered Index Scan operation on the “backupset” table, it will use that value to Seek for a matching row in the “backupfile” table. Since, the table has a primary key defined on two columns and we are joining only one of the primary key columns, each seek on a value returned from the “backupfile” table would result in a Scan Count 1 as SQL Server would need to determine if there are any other rows with the same value. Since I have 20 rows returned and the join condition needs to be evaluated for all these rows, the scan count is 20*1=20.

If I change the query to the following:

select * from msdb.dbo.backupfile a

inner join msdb.dbo.backupset b

on a.backup_set_id = b.backup_set_id

where a.file_number = 1

The Statistics IO output changes to the following:

Table 'backupfile'. Scan count 0, logical reads 40, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'backupset'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The difference in the query plan now is that SEEK condition for the clustered index seek on the “backupfile” table changes to:

SEEK:([a].[backup_set_id]=[msdb].[dbo].[backupset].[backup_set_id] as [b].[backup_set_id] AND [a].[file_number]=(1.))

Even though we are evaluating the condition for all the 20 rows returned by the “backupset” table, we find that there is an additional filter in the SEEK condition i.e. the file_number column, for the data fetched from the “backupfile” table. This changes the scan count to Zero. The reason for this is that the SQL instance now knows that it has to fetch only one row for the value given in the filter as both columns present in the composite primary key are being used during the SEEK. Now SQL knows prior to the fetch that there cannot be duplicate rows, so no extra scan/seek is required.

This is a gotcha that we might overlook when a clustered index seek is being used! Since, we are dealing with a small set of rows, the plan changes may not be significant but for larger tables, such a difference in evaluation may have a significant impact on the query plan and ultimately the performance of the query.

 

 

Regards,
Amit Banerjee
SEE , Microsoft SQL support