Columnstore Index: What is new in SQL Server 2016?

Background

The columnstore index was first introduced in SQL Server 2008R2 with PowerPivot (https://technet.microsoft.com/en-us/library/ff628113(v=sql.100).aspx). This technology was adopted into SQL Server 2012 engine with some changes and was surfaced as an non-clustered columnstore index (NCCI). The NCCI was a read-only index that targeted to speed up analytics on table/partitions that are read-only. There were two challenges with it:

  • First, it was a new index on an existing rowstore table. Even though the size of NCCI was typically 10x less than the rowstore table but it did not eliminate the rowstore. So customers did not see any reduction in storage.
  • Second, it was read-only which required customers to work around these restrictions like by dropping/recreating the NCCI. Still, the significant speed up in analytics was value enough for many customers to adopt this technology for their production workloads.

SQL Server 2014 addressed both of these challenges with the introduction of updateable clustered columnstore index (CCI). With CCI, customers to get rid of the rowstore and replace it by columnstore storage as the master copy of the data. Since it is the master copy of the data, it was termed as ‘clustered’. Using CCI, the customers can get the expected high data compression for significant storage benefits. Additionally, CCI is updateable that makes is easy to use. A typical scenario for Data Warehouse or Data Mart is that data is moved from transactional systems periodically thru ETL (Extract, Transform and Load) concurrently with analytics query workload. With CCI, customers can, for example, convert the FACT table to CCI and run analytics workload concurrently with ETL. Many customers have deployed CCI successfully in their production workloads. Here are couple of case studies:

SQL Server 2016

In-Memory technologies continue to be an area of investment for SQL Server team. With SQL Server 2016  there are many new enhancements to columnstore technology as described here:

  • Updateable nonclustered columnstore index (NCCI): Now you can create an NCCI on a rowstore table that is undergoing DML transactions. NCCI is a basic building block in enabling real-time operational analytics. Please refer to Real-Time Operational Analytics Using In-Memory Technology
  • Ability to create columnstore index on memory-optimized tables to deliver high performance real-time operational analytics that combines two key in-memory technologies in SQL Server. Please refer to Real-Time Operational Analytics: Memory-Optimized Tables and Columnstore Index
  • Ability to create regular btree indexes to on tables with CCI. This enables enforcement of PK/FK constraints, row-level locking, efficient for equality search and short-range queries
  • Many Performance improvements that together have provided around 40% better performance over the same queries running in SQL Server 2016.  Please refer to the picture below for TPC-H benchmark for the numbers released in March'2016. In the table below (1) and (3) were run on the same hardware and you can see that SQL Server 2016 gives around 40% better performance. Please refer to the details on the official website for TPC-H - Top Ten Performance Results - Non-Clustered Version 2. This is significant because your analytics queries will see a significant performance boost by just upgrading to SQL Server 2016 without requiring any changes.

tpch

  • Parallel data import from staging table.
  • Online defragmentation of columnstore indexes. Please refer to Columnstore Index Defragmentation using REORGANIZE Command
  • Ability to offload analytics on Readable Secondary on secondary replica in Always On configuration.
  • Ability to access NCCI and CCI using snapshot based isolation levels (RCSI and SI).
  • Significant improvements to make monitoring/troubleshooting easier.

I will cover some of the key improvements in the subsequent blogs.

Thanks

Sunil