SQL Server Statistics: Explained


Several DBA’s asked me these questions about SQL Server statistics at different customer places.

a)      Should we disable Auto update stats for the database and run maintenance plans instead?

b)      What sampling rate does Auto Stats use? Can we control it?

c)       Should we update statistics after rebuilding indexes during maintenance plans?

 

Here is an explanation for all these questions.

There are 3 types of statistics in a SQL Server database.

1)      Statistics created due to index creation. These statistics have the index name

2)      Statistics created by Optimizer(Column statistics). Starts with _WA_*

3)      User defined statistics which are created with CREATE STATISTICS command by the DBA

 Facts about statistics: 

  • Index statistics are always updated with fullscan when we rebuild the index (Only exception is in SQL 2012 partitioned index when the number of partitions >1000 it uses default sampling)
  • Column Statistics are not updated because of index rebuilds, they are only updated by either DB Auto update stats option or if we manually update statistics using UPDATE STATISTICS command.
  • If we do not specify the sampling rate while running the UPDATE STATISTICS command it will take the default sampling rate. See below how the default sampling rate algorithm works.
  • If you specify Update statistics command with just the table name it will update statistics for all the statistics including the index statistics on the table. So you may lose the full scan advantage which you get for index statistics if you update statistics with sampling rate after rebuilding indexes (Be very careful with this step during your maintenance windows)

 

Auto Update stats Algorithm: 

So the Auto Update stats will fire for every 500 + 20% change in table rows. Of course, we have an improved algorithm in SQL 2012 which is SQRT(1000 * Table rows) which is much better.

 

When it fires it will use the  default sampling rate and here is the algorithm how it calculates the sampling rate.

1)      If the table < 8MB then it updates the statistics with a fullscan.

2)      If the table > 8MB, it follows an algorithm. It reduces the sampling rate as the number of rows in the table are increased to make sure we are not scanning too much data. This is not a fixed value but is under the control of optimizer. It is not a linear algorithm either.

Example: if we have 1,000,000 rows it would use a sampling rate of 30% but when the number of rows increase to 8,000,000 it would reduce the sampling rate to 10%. These sampling rates are not under the DBAs control but optimizer decides it.

 

Question: So, what if we disable Autostats on the database and decide to update statistics manually by the DBA using a scheduled job?

Answer: It depends, on what the sampling rate was  used by the optimizer versus what we specify manually. The larger the sampling rate the better the results.

                Here is how you can calculate the sampling rate used by Autostats algorithm, Run DBCC SHOW_STATISTICS(Tablename, statistic name), This will come up with Num of rows and Sampled rows. So you do a math,  Sampling Rate = (sampled rows/Actual Rows)*100

 

In case you are running on SQL 2008 R2 SP2 or SQL 2012 SP1, there is a new DMV called as sys.dm_db_stats_properties where you can get the statistics sampling rate for all tables in the db.

 

You can check this for every table and see which method has the largest sampling rate and select it. If you think manually updating statistics is yielding higher sampling rate, then you can run Sp_autostats ‘OFF’ for that statistic and use your job to update it manually. In this way either the statistics are updated with Auto stats or updated manually yielding better execution plans for the optimizer.

 

Perils of updating statistics manually:

1)      Adds overhead for the DBA to write custom scripts for individual statistics

2)      If the job fails, we don’t have updated statistics until the next run.

3)      If SQL Agent fails, then your update statistics job cannot run.

 

Recommendation: We would recommend to leave the Auto Stats option for the database to ON, and if there is a particular case where your seeing bad query plans on certain tables, just disable Autostats for that table and enable the manual statistics job for that table.

References:

Internal employees

http://msdn.microsoft.com/en-us/library/dd535534.aspx

http://www.sqlskills.com/blogs/erin/post/New-Statistics-DMF-in-SQL-Server-2008R2-SP2.aspx

 


Comments (8)

  1. Kishore says:

    Nice article krishna.. Need more articles.. If possible can you post article about backup database architecture the way you explained here.

  2. sql_handle says:

    The SQL server 2012 stats update threshold of sqrt(1000*num rows) is one I've seen associated with trace flag 2371.  Does this mean that systems leveraging T2371 can retire it in SQL Server 2012 and beyond due to changes in default behavior?

    blogs.msdn.com/…/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371.aspx

  3. sql_handle says:

    blogs.msdn.com/…/sql-server-trace-flag-2371-for-dynamics-ax.aspx

    The link above from Michael DeVoe indicates that sqlserver 2008 r2 sp2 and 2012 sp1 saw changes making trace flag 2371 more precise in its behavior impact, but implies that default auto update threshold without T2371 remains unchanged.

  4. Sql_handle says:

    Aha! A little slow on the uptake here 🙂

    SQL Server 2008 R2 SP1 and 2012 SP1 were the debut of trace flag 2371, documented in KBa 2754171.  So the lower threshold is available for large tables as of the versions Michael DeVoe mentions, and after SP1 it's available in 2012 as you mentioned.  But it remains in any version only activated by trace flag 2371.

    support.microsoft.com/…/2754171

  5. Feroz says:

    Very succinct article indeed Krishna, but isn't that the auto update algorithm kicks in for the following criteria?

    For SQL 2000: 500 + 20% of data changes at row level

    For SQL 2005 & 2008 500+20% of data changes to the column that the statistics describe?

    Please clarify.

  6. CareySon says:

    Column Statistics are not updated because of index rebuilds, they are only updated by either DB Auto update stats option or if we manually update statistics using UPDATE STATISTICS command.

    i dont think that's correct,index build will cause corresponding statistics update as well

  7. AJ says:

    Great article,  here is a good guide on how to identify manual update-able stats  sqlturbo.com/finding-good-stats-candidates-for-manual-updating

  8. Ken Trock says:

    "Of course, we have an improved <Auto Update Stats> algorithm in SQL 2012 which is SQRT(1000 * Table rows) which is much better."

    Just to point out, several people on sqlservercentral.com have verified that this is NOT by default. You must turn on trace flag 2371 to get this behavior.