Column Statistics – No need to create them manually

Lately we stumbled over a few customers which showed us some funny list of indexes, when using ‘Single Table Analysis’. What looked like masses of single column indexes at the end figured out to be custom generated column statistics. Instead of showing up in the lower part of the screen which shows a list of column statistics which were generated by SQL Server automatically, all these statistics did show up in the list for indexes. Since this did happen not only once this year, but multiple times, we were wondering what kind of rumors were spread about the need to create SQL Server column statistics manually. Therefore this blog which will give a bit background about the column statistics and what the clear cut recommendations are with SAP.

What are our column statistics and how are they generated?  When you look at normal index statistics, you can roughly differentiate 3 different sections which are displayed very nicely in DBACockpit. Section 1 contains header data. Data about when the statistics were done. How many rows got sampled, etc. Section 2 contains what we call the density. It basically gives a statistical value on the distinct different values or value combinations between the values of the different index columns. The last and often the larger section show a histogram over the values in the FIRST column of the index. With a first index column which would have many different values, you would see 200 entries and data about how often the value in each of the buckets was found and how many different values were found between these ‘bucket’ values. This very detailed data is used by the SQL Server query optimizer to estimate how many rows would be returned by a query.

However one problem arose when we tested SQL Server 7.0 at the beginning of 1998 with SAP R/3 systems. Imagine many of the SAP created indexes, where the column like MANDT or CLIENT is the first column of an index with more than often one value only. With those indexes, SQL Server Optimizer wouldn’t have any idea about the distribution of values in columns like BUDAT, BEGDA, VBELN, POSSNR, etc. In order to supplement the histogram of the FIRST column of a multi-column index, we added functionality to SQL Server in 1998 to AUTOMATICALLY generate histograms on columns which played a vital role in the process of generating a query plan. This histograms generated automatically would then be persisted as column statistics. Those column statistics would then be refreshed with every statistics update involving the index or the columns.

How can we find them? In order to find a naming convention which wouldn’t collide with customers naming spaces, current SQL Server releases use the following naming conventions. The name is concatenated out of the following three parts:

1. Fix string ‘_WA_Sys_’

2. Eight digit hex value of the column_id within the table

3. Object_id in hex of the table the column and the column statistics belong to

As an example the column statistics of column MATNR of table MARA could be named like: ‘_WA_Sys_00000002_540421D7’. In the system tables one usually find them with a statement like this:

select * from sys.stats where auto_created=1;

How is SAP Netweaver based software dealing with column statistics? From early on in 1998 when automatically generated column statistics got introduced in SQL Server 7.0, SAP software took advantage of it or to a larger degree relied on the fact that customers would leave the database option enabled which allows to automatically create column statistics by SQL Server’s Query optimizer. In all those years, it did work great without ever uncovering reasons to manually supplement with additional manually created column statistics. DBACockpit and some other implementations also to a degree rely on the naming conventions SQL Server chose for the automatically created column statistics.

Why creating own column statistics? We see no reason to do so. As mentioned the clear and very crisp recommendation is to leave the database settings in regards to creating column statistics automatically by SQL Server at the default value, which means that the functionality is enabled. In the my more than 12 years, I worked with the newer generation of SQL Server releases (SQL Server 7.0 and later) with SAP customers and SAP applications, I never was forced to create additional column statistics manually to get a better execution plan for a query. Therefore our recommendation is very clear. Keep the hands off creating additional column statistics on top of the automatic created once. Also please follow the recommendations and leave the automatic generation of column statistics by SQL Server enabled. One of the customers who surprised us with that many manually created column statistics pointed me to the stored procedure sp_createstats delivered with SQL Server which would create column statistics on columns which were not the first column of an index. In all honesty, I didn’t know the stored procedure existed or that we delivered something like this in SQL Server. Also not that it is nicely documented in Books Online. Our joint recommendation of SAP and Microsoft folks who spent in sum decades supporting and deploying SAP systems on SQL Server is crisp and clear; Keep the hands off manually creating column statistics and leave it up to SQL Server to create those as needed.

Cheers Juergen