Statistics on system tables and query performance

Recently I was helping a customer design a database with a very large number of database objects – hundreds of schemas, a few thousands of stored procedures, hundreds of partitioned tables, with most tables containing between two hundred and a thousand partitions. Once all these objects were created, I had to write a catalog metadata query that went something like this: “Find the partition of a table in schema A that has the largest number of pages of all partitions in that table, with the limitation that the table must have more than one partition and may not be referenced by a foreign key constraint, and has a view in schema B with the same name.” (As little sense as this probably makes, there was a perfectly good reason why this query was needed.)

With the plethora of catalog views and DMVs in SQL Server, writing this query wasn’t particularly hard. In this case, it involved sys.schemas, sys.tables, sys.views, sys.partitions, sys.indexes, sys.dm_db_partition_stats, and sys.data_spaces. However, while the query returned correct results, its performance was very poor: it took about twenty seconds to return the row I was after. Looking at the query plan, I noticed something that would normally be a direct clue to the cause of poor performance: while the actual number of rows passing through various iterators in the plan varied, the estimated number of rows was always 1. In other words, the estimates used by the optimizer were inaccurate, resulting in suboptimal plan and poor performance.

If the query used regular user tables, the next troubleshooting step would be to update statistics on these tables in the hopes of giving the optimizer better estimates. But in this case, the query used only catalog views and DMVs, and on first glance, there was no statistics in sight. However, looking at the query plan, I saw that the objects referenced in the iterators were actually hidden system tables, i.e. sys.sysclsobjs, sys.sysidxstats, sys.sysschobjs, sys.syssingleobjrefs, etc. While it is not possible to query these tables directly (unless using the DAC connection), their definition can still be viewed with the sp_help stored procedure. The result from sp_help includes the indexes defined on these tables, and for each index, corresponding statistics can also be viewed with DBCC SHOW_STATISTICS.

When I did that, the reason for inaccurate estimates was clear – even though most tables had a rather large number of rows corresponding to the multitude of objects in the database, in many cases the statistics were dated prior to the time when database objects were created, and showed completely inaccurate density and distribution. Once I ran UPDATE STATISTICS on each index of the system tables, the query returned in less than a second.

Leave a Reply Cancel reply