Columnstore Index – How to Estimate Compression Savings

SQL product team has made significant improvements in columnstore index functionality, supportability and performance during SQL Server 2016 based on the feedback from customers. Please refer to  List of Blogs  for all blogs published by SQL Tiger Team on columnstore index.

Issue
When SQL Server team released ROW and PAGE compression in SQL Server 2008, customers could invoke sp_estimate_data_compression_savings  stored procedure to estimate the storage savings for ROW and PAGE compression. Note, the compression savings was just an estimate based on sampling a subset of rows from the source table and loading them into a temporary table and then measuring the size of this temporary table before/after the compression. For most cases, the compression savings estimate was good except when the data in the source table was skewed. Most customers found it useful as it was a convenient way to see the storage benefits. However, as some of you have found out, this stored procedure has not been extended to estimate storage savings from columnstore index. This is something we could consider for the future.

Workaround
For now, to estimate compression savings for columnstore index, we recommend the following steps

  1. Create a staging table with identical schema
  2. Load 2 million rows into the staging table. Note, I have chosen 2 million arbitrarily but it needs to be at least 1 million.
  3. Use sp_spaceused to find the size of the table
  4. Now create columnstore index on the table
  5. Measure the storage using sp_spaceused.
  6. Compare the numbers in (3) and (5)

The storage savings for the staging table as computed above will be a good estimate to work with.  Here is a simple example to show case this

use AdventureWorksDW2016CTP3
go

-- create a staging table
select * into ccitest_temp from dbo.FactResellerSalesXL where 1=2
go

-- load 2 million rows
insert into ccitest_temp select top 2000000 * from dbo.FactResellerSalesXL

-- find the spaceused. Note it down
sp_spaceused 'ccitest_temp'

-- Create clustered columnstore index on the staging table
create clustered columnstore index ccitest_temp_cci on ccitest_temp

-- find the spaceused in the compressed state and then compare
sp_spaceused 'ccitest_temp'

-- Now you can drop the staging table
drop table ccitest_temp

Nonclustered Columnstore Index (NCCI)
Creating an NCCI does not save storage; in fact, it takes additional storage. If you are interested in finding the size of NCCI, you can follow the same step as above but instead CCI, create an NCCI and the see the increased size.

Thanks,
Sunil Agarwal
SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam