Ask Learn
Preview
Ask Learn is an AI assistant that can answer questions, clarify concepts, and define terms using trusted Microsoft documentation.
Please sign in to use Ask Learn.
Sign inThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Most of us understand that clustered columnstore index can typically provide 10x data compression and can speed up query performance up to 100x. While this sounds all so good, the question is how do I know which tables in my database could potentially benefit from CCI? For a traditional DW scenario with star schema, the FACT table is an obvious choice to consider. However, many workloads including DW have grown organically and it is not trivial to identify tables that could benefit from CCI. So the question is how can I quickly identify a subset of tables suitable for CCI in my workload?
Interestingly, the answer lies in leveraging the DMVs that collect data access patterns in each of the tables. The following DMV query provides a first order approximation to identify list of tables suitable for CCI. It queries the HEAP or the rowstore Clustered index using DMV sys.dm_db_index_operational_stats to identify the access pattern on the base rowstore table to identify tables that meet the criteria listed in the comments below:
-------------------------------------------------------
-- The queries below need to be executed per database.
-- Also, please make sure that your workload has run for
-- couple of days or its full cycle including ETL etc
-- to capture the relevant operational stats
-------------------------------------------------------
-- picking the tables that qualify CCI
-- Key logic is
-- (a) Table does not have CCI
-- (b) At least one partition has > 1 million rows and does not have unsupported types for CCI
-- (c) Range queries account for > 50% of all operations
-- (d) DML Update/Delete operations < 10% of all operations
select table_id, table_name
from (select quotename(object_schema_name(dmv_ops_stats.object_id)) + N'.' + quotename(object_name (dmv_ops_stats.object_id)) as table_name,
dmv_ops_stats.object_id as table_id,
SUM (leaf_delete_count + leaf_ghost_count + leaf_update_count) as total_DelUpd_count,
SUM (leaf_delete_count + leaf_update_count + leaf_insert_count + leaf_ghost_count) as total_DML_count,
SUM (range_scan_count + singleton_lookup_count) as total_query_count,
SUM (range_scan_count) as range_scan_count
from sys.dm_db_index_operational_stats (db_id(),
null,
null, null) as dmv_ops_stats
where (index_id = 0 or index_id = 1)
AND dmv_ops_stats.object_id in (select distinct object_id
from sys.partitions p
where data_compression <= 2 and (index_id = 0 or index_id = 1)
AND rows >= 1048576
AND object_id in (select distinct object_id
from sys.partitions p, sysobjects o
where o.type = 'u' and p.object_id = o.id))
AND dmv_ops_stats.object_id not in ( select distinct object_id
from sys.columns
where user_type_id IN (34, 35, 241)
OR ((user_type_id = 165 OR user_type_id = 167) and max_length = -1))
group by dmv_ops_stats.object_id
) summary_table
where ((total_DelUpd_count * 100.0/(total_DML_count + 1) < 10.0))
AND ((range_scan_count * 100.0/(total_query_count + 1) > 50.0))
Example: Restore database AdventureWorksDW2016CTP3 and following the following steps
Select c.CalendarYear,b.SalesTerritoryRegion, FirstName + ' ' + LastName as FullName,
count(SalesOrderNumber) as NumSales,sum(SalesAmount) as TotalSalesAmt , Avg(SalesAmount) as AvgSalesAmt
,count(distinct SalesOrderNumber) as NumOrders, count(distinct ResellerKey) as NumResellers
From FactResellerSalesXL a
inner join DimSalesTerritory b on b.SalesTerritoryKey = a.SalesTerritoryKey
inner join DimEmployee d on d.Employeekey = a.EmployeeKey
inner join DimDate c on c.DateKey = a.OrderDateKey
Where c.FullDateAlternateKey between '1/1/2006' and '1/1/2010'
Group by b.SalesTerritoryRegion,d.EmployeeKey, d.FirstName,d.LastName,c.CalendarYear
Now, to find about data compression savings, unfortunately it is not yet supported in the popular stored procedure sp_estimate_data_compression_savings but you can use the following work around for approximate calculation.
Hope this helps you discover tables that could qualify for columnstore index. As always, please feel free to reach out to me for any questions on columnstore index
Thanks,
Sunil Agarwal
SQL Server Tiger Team
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam
Ask Learn is an AI assistant that can answer questions, clarify concepts, and define terms using trusted Microsoft documentation.
Please sign in to use Ask Learn.
Sign in