How Statistics in SQL Server Determine Performance

SQL Server collects statistical information about indexes and column data stored in the database. The SQL Server query optimizer uses these statistics to choose the most efficient plan for retrieving or updating data. When a large number of rows have been inserted, updated, or deleted on a table, the statistics may become out of date, which can cause the optimizer to choose a bad query plan.

 

  •  Statistics maintained on each table in SQL Server to aid the optimizer in cost-based decision making include:
    • Number of rows in the table.
    • Number of pages used by the table.
    • Number of modifications made to the keys of the table since the last update to the statistics.

 

  •  Additional information is stored for indexes, including (for each index; see the DBCC SHOW_STATISTICS sketch below):
    • An equi-height histogram on the first column.
    • Densities on all column prefixes.
    • Average key length.
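    You can inspect this per-index information yourself with DBCC SHOW_STATISTICS. A minimal sketch; the table and statistics names below are hypothetical placeholders:

        -- Show the statistics header (row counts, average key length),
        -- the density vector (densities for each column prefix), and
        -- the equi-height histogram steps for one statistics object.
        DBCC SHOW_STATISTICS ('dbo.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID')
            WITH STAT_HEADER, DENSITY_VECTOR, HISTOGRAM;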
  •  The basic algorithm for auto update statistics is (a worked example follows the list):
    • If the cardinality for a table is less than six and the table is in the tempdb database, auto update with every six modifications to the table.
    • If the cardinality for a table is greater than 6, but less than or equal to 500, update statistics every 500 modifications.
    • If the cardinality for a table is greater than 500, update statistics when (500 + 20 percent of the table) changes have occurred.
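    As a rough worked example of the last rule, a 100,000-row table needs about 500 + 0.20 * 100,000 = 20,500 modifications before auto update statistics fires. On builds that expose sys.dm_db_stats_properties (SQL Server 2008 R2 SP2 / 2012 SP1 and later), you can compare the modification counter against that classic threshold; a sketch, using a hypothetical table name:

        -- Compare per-statistic modification counts against 500 + 20% of rows.
        -- 'dbo.Orders' is a placeholder table name.
        SELECT  s.name                       AS stats_name,
                sp.rows,
                sp.modification_counter,
                500 + 0.20 * sp.rows         AS classic_threshold
        FROM    sys.stats AS s
        CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
        WHERE   s.object_id = OBJECT_ID('dbo.Orders');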

 

  • For table variables, cardinality changes do not trigger auto update statistics.

 

  • Statistics are automatically updated when the query optimizer determines that they are out of date. This basically means that enough data modification operations have occurred since the last time they were updated that their usefulness is reduced. The number of operations is tied to the size of the table and is usually something like 500 + 0.2 * (number of rows in the table). This means that the table must have at least 500 modification operations before statistics are updated during query optimization; for large tables, this threshold can be much larger. In this case, by "operation" I mean a single row modification: a single UPDATE statement that changes 1,000 rows counts as 1,000 modification operations. You can actually see the number of modification operations in the sysindexes table, in the rowmodctr column.
    The number of rows affected by every INSERT, UPDATE, or DELETE operation on a table is reflected in the rowmodctr column for the row representing the table itself.
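    A quick way to look at that counter (sysindexes is the legacy view discussed here; the table name is a placeholder):

        -- rowmodctr for the rows representing the table itself (indid 0 or 1)
        -- and its nonclustered indexes.
        SELECT  name, indid, rowcnt, rowmodctr
        FROM    sysindexes
        WHERE   id = OBJECT_ID('dbo.Orders');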

 

  • The newly introduced statistical maintenance functionality, AutoStat, may generate unwanted overhead on a production system by performing one of the following actions:
    • Initiating statistical updates during heavy production periods.
    • Initiating an excessively high number of UPDATE STATISTICS processes at a given point in time.

  • To keep the statistical information as up to date as possible, SQL Server introduces AutoStat, which monitors table modifications and automatically updates the statistics for a table when a certain change threshold has been reached. Additionally, SQL Server introduces auto create statistics, which causes the server to automatically generate all statistics required for accurate optimization of a specific query.
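    Both behaviors are controlled per database; a minimal sketch, using a placeholder database name:

        -- Enable automatic creation and automatic updating of statistics.
        ALTER DATABASE YourDatabase SET AUTO_CREATE_STATISTICS ON;
        ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;

        -- Verify the current settings (1 = ON, 0 = OFF).
        SELECT DATABASEPROPERTYEX('YourDatabase', 'IsAutoCreateStatistics') AS auto_create,
               DATABASEPROPERTYEX('YourDatabase', 'IsAutoUpdateStatistics') AS auto_update;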
  •  The server maintains two counters in sysindexes: rowModCtr and rowModCtrNCMax. RowModCtr tracks the number of update/insert/delete operations made to a table since the statistics were last updated. Specifically, updates are only tracked when there is a change to a column covered by a histogram on the table; the basic idea is that we maintain the rowModCtr value whenever we touch columns covered by a histogram. Whenever we update the statistics, we only update those statistics required by the current query. So that we don't lose track of the fact that the other statistics may be stale, we move the rowModCtr value from the parent table to each of the indexes/stats that were not maintained. We then summarize the maximum change count for all indexes in the base index row.
  •  The sysindexes.rowmodctr column maintains a running total of all modifications to a table that, over time, can adversely affect the query processor's decision-making process. This counter is updated each time any of the following events occurs:
    • A single-row insert is made.
    • A single-row delete is made.
    • An update to an indexed column is made.

NOTE: TRUNCATE TABLE does not update rowmodctr.

After table statistics have been updated, the rowmodctr value is reset to 0 and the table's statistics schema version is updated.
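    A short demonstration of the reset behavior (all object names are hypothetical):

        -- Modify some rows, observe a non-zero rowmodctr, update statistics,
        -- and observe the counter reset to 0.
        INSERT INTO dbo.Orders (OrderDate) VALUES (GETDATE());

        SELECT indid, rowmodctr
        FROM   sysindexes
        WHERE  id = OBJECT_ID('dbo.Orders');     -- rowmodctr > 0

        UPDATE STATISTICS dbo.Orders;

        SELECT indid, rowmodctr
        FROM   sysindexes
        WHERE  id = OBJECT_ID('dbo.Orders');     -- rowmodctr reset to 0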

  • In addition to cardinality, the selectivity of the predicate also affects AutoStats generation. This means that statistics may not be updated after every 500 modifications when cardinality is less than 500, or after every 20 percent of changes when cardinality is greater than 500. A scale-up factor (an integer value from 1 to 4, inclusive) is generated depending on the selectivity, and the product of this factor and the number of changes obtained from the algorithm above is the actual number of modifications required for AutoStats generation.
  •  To determine when AutoStats is being run, we can use trace flag 205.
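    A sketch of turning the flag on and off (this flag applies to the older releases where AutoStat was introduced, and its output goes to the SQL Server error log):

        -- Report statistics-dependent recompiles caused by AutoStat to the error log.
        DBCC TRACEON (205, -1);      -- -1 enables the flag server-wide
        -- ... reproduce the workload, then review the error log ...
        DBCC TRACEOFF (205, -1);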
  •   When AUTO_UPDATE_STATISTICS is set to ON, existing statistics are automatically updated when they become out of date because the data in the tables has changed. When it is set to OFF, existing statistics are not automatically updated; instead, statistics can be updated manually. SQL Server automatically updates this statistical information periodically as the data in the tables changes. The sampling is random across data pages, and is taken from the table or the smallest nonclustered index on the columns needed by the statistics. After a data page has been read from disk, all the rows on the data page are used to update the statistical information. The frequency at which the statistical information is updated is determined by the volume of data in the column or index and the amount of changing data. For example, the statistics for a table containing 10,000 rows may need updating when 1,000 index values have changed, because 1,000 values may represent a significant percentage of the table. However, for a table containing 10 million index entries, 1,000 changing index values is less significant, and so the statistics may not be automatically updated. SQL Server, however, always ensures that a minimum number of rows are sampled; tables that are smaller than 8 megabytes (MB) are always fully scanned to gather statistics.

 

  • If needed, selectively use FULLSCAN statistics.

    If you are using auto create and auto update statistics and you are getting a bad query plan because the statistics are not accurate or current, do the following:

    1. Leave auto create and auto update statistics on.

    2. Only for those statistics that are not accurate or current enough, use CREATE STATISTICS … WITH FULLSCAN, NORECOMPUTE and a batch job that periodically runs UPDATE STATISTICS … WITH FULLSCAN, NORECOMPUTE (see the sketch after this list).

    3. Use asynchronous statistics update if synchronous update causes undesired delay.

       If you have a large database and an OLTP workload, and if you enable AUTO_UPDATE_STATISTICS, some transactions that normally run in a fraction of a second may very infrequently take several seconds or more because they cause statistics to be updated.

       If you want to avoid the possibility of this noticeable delay, enable AUTO_UPDATE_STATISTICS_ASYNC. For workloads with long-running queries, getting the best plan is more important than an infrequent delay in compilation; in such cases, use synchronous rather than asynchronous auto update statistics.

    4. Determine when to update statistics. Consider updating statistics under the following conditions:

       • Query execution times are slow.
       • Insert operations occur on ascending or descending key columns.
       • After maintenance operations.
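    A minimal sketch of items 2 and 3 above (all object, column, and database names are placeholders):

        -- Item 2: create a FULLSCAN statistic that is excluded from automatic
        -- recomputation, and refresh it from a scheduled batch job.
        CREATE STATISTICS st_Orders_OrderDate
            ON dbo.Orders (OrderDate)
            WITH FULLSCAN, NORECOMPUTE;

        UPDATE STATISTICS dbo.Orders st_Orders_OrderDate
            WITH FULLSCAN, NORECOMPUTE;     -- run periodically from a job

        -- Item 3: let auto update statistics run asynchronously so queries are
        -- compiled with the existing statistics instead of waiting for the update.
        -- (AUTO_UPDATE_STATISTICS must also be ON for this to take effect.)
        ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;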

Rule of thumb!

  • Use UPDATE STATISTICS with full scan and disable auto stats when doing large batch updates with minimal to no ongoing updates (see the sp_autostats sketch below).
  • Use auto stats (the default) and don't run UPDATE STATISTICS with full scan when data is modified throughout the day.
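    For the batch-update scenario, one way to apply the rule of thumb per table is sp_autostats (the table name is a placeholder):

        -- Before a large batch load: stop auto update statistics on the table.
        EXEC sp_autostats 'dbo.Orders', 'OFF';

        -- ... run the batch load ...

        -- Afterwards: refresh with a full scan and re-enable auto update.
        UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
        EXEC sp_autostats 'dbo.Orders', 'ON';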

 

Happy Learning & Troubleshooting!!

 

Disclaimer: Everything here is my personal opinion and is not read or approved by Microsoft before it is posted. No warranties or other guarantees will be offered as to the quality of the opinions or anything else offered here.