How It Works: Statistics Sampling for BLOB data

I worked on an interesting issue today and the outcome deserves a post.  The issue was that when auto update statistics executed the query went from 6 seconds to 6+ minutes.

select count(*) from tblImageData where blobData is null

Turing on the SP-:StmtStarting and SP:StmtCompleted trace events the internal sample query can be retrieved.

SELECT StatMan([SC0]) FROM (

SELECT TOP 100 PERCENT SUBSTRING ([logo], 1, 100)++substring([blobData],

      case when datalength([blobData])<=200 then 101

            else datalength([blobData])-99 end,

            datalength([blobData]))

AS [SC0] FROM [dbo].[tblImageData]

WITH (READUNCOMMITTED,SAMPLE 8.330000e+001 PERCENT)

ORDER BY [SC0] ) AS _MS_UPDSTATS_TBL

When sampling statistics the SAMPLE clause is added to the query.   When FULL SCAN is forced or elected the SAMPLE clause does not appear in the query.    By default if the index being sampled has less than 1024 pages a FULL SCAN is forced.

For BLOBs the sampling percentage is determined by the number of data rows and pages and not the number of BLOB pages.   You can see the actual page values with: select * from sys.system_internals_allocation_units

Also, notice that for large data types the first 100 and last 100 bytes/chars are retrieved and used in the creation of the statistical information.

Because the number of BLOB pages is not directly considered the sample percentage remain a large value and the first and last 100 bytes are retrieved it can trigger a large BLOB scan operation.

You may consider building statistics on the BLOB column (Create statistics .... NORECOMPUTE) and add the NORECOMPUTE so you can control when the BLOB scanning occurs.

Bob Dorr
SQL Server Principal Escalation Engineer