Considerations when tuning your queries with columnstore indexes on clone databases

As discussed in my previous blog post, one of the primary scenario for DBCC CLONEDATABASE is to assist dbas, developers and support teams in troubleshooting sub-optimal query plans by creating fast, minimally invasive and transaction ally consistent database clones of their production databases. The database clone created using DBCC CLONEDATABASE contains the copy of schema and statistics which allows the optimizer to generate same query plan as observed on the production database without the actual data. While this is true for queries involving traditional rowstore indexes, there are some special considerations for queries involving Columnstore indexes due to difference in the way statistics are generated for these indexes. In this blog post, I will explain you this behavior and in the end, provide you with the scripts required to handle this scenario to generate same query plan in database clone as observed on the production databases.

Unlike traditional B-tree indexes, when a columnstore index is created, there is no index statistics created on the columns of the columnstore indexes. However, there is an empty stats object created with the same name as columnstore index and an entry is added to sys.stats at the time of index creation. The stats object is populated on the fly when a query is executed against the columnstore index or when executing DBCC SHOW_STATISTICS against the columnstore index, but the columnstore index statistics aren't persisted in the storage. The index statistics is different from the auto created statistics on the individual columns of columnstore indexes which is generated on the fly and persisted in the statistics object. Since the index statistics is not persisted in storage, the clonedatabase will not contain those statistics leading to inaccurate stats and different query plans when same query has run against database clone as opposed to production database.

Let me illustrate this behavior with a sample script below

set nocount on
go
create database db1
go use db1
go create table t1(a varchar(8000)); go insert t1 values(replicate(newid(), 200)); go 1000
create clustered Columnstore index cci on t1
go

-- Initial stats object with name cci created at the time of index creation select * from sys.stats where object_id=OBJECT_ID('t1')
go

-- Adding more 3000 rows to the table  insert t1 values(replicate(newid(), 200));
go 3000

-- Query with predicate to generate auto created statistics on the column select a from t1 where a = 'aaaaaaaaaaaaaa' GO

-- Verify if auto created statistics got added to the table  -- Here you will see 2 statistics objects. 1 with the index name and other auto created statistics on column a due to the earlier query

select * from sys.stats where object_id=OBJECT_ID('t1') go

-- Turn on Actual Exec Plan and it accurately displays 4000 rows  -- before cloning select a from t1
GO

-- Run DBCC SHOW_STATISTICS against the index statistics of columnstore index and it accurately reflects 4000 rows & 6000 Data pages (generated on the fly)

dbcc show_statistics('t1','cci') with stats_stream go

dbcc clonedatabase('db1','db2') go

use db2
go

-- Both the stats are copied in clone  select * from sys.stats where object_id=OBJECT_ID('t1') go

/* If you turn on Actual Execution Plan only 1000 rows are displayed in statistics clone which is the same as the number of rows at the time of creation of cci but never updated.*/ select a from db2.dbo.t1
GO

-- Run DBCC SHOW_STATISTICS against the index statistics and it reflects 1000 rows & 0 Data pages which was at the time of index generation (not updated) dbcc show_statistics('t1','cci') with stats_stream
go

-- DROP DATABASE after tests

use master go DROP DATABASE db1
DROP DATABASE db2

This is by design behavior of Columnstore indexes in SQL Server. To handle this behavior and to be able to accurately capture the columnstore index statistics in the clone database, we have created and shared a script in our Tiger Github repository, which can be used to update the columnstore index statistics on the source database before running DBCC CLONEDATABASE.

The script usp_update_CI_stats_before_cloning.sql should be run on source production database which needs to be cloned before running DBCC CLONEDATABASE. The script basically performs the following

  1. Runs DBCC SHOW_STATISTICS WITH STATS_STREAM against all the columnstore indexes on the source database to capture the up to date stats blob generated on the fly.
  2. Updates the persisted stats object with the most recent stats blob captured in step 1.

Note: The above script is required to be run only if you would like to clone columnstore index statistics provided the query plan on the database clone is different from the query plan on source database. This script is not required to be run otherwise since database engine is designed to generate and handle the non-persisted columnstore index statistics to generate efficient query plans.

 

Parikshit Savjani
Senior PM, SQL Server Tiger Team Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam