I have had many calls with customers that essentially ended with your I/O subsystem cannot keep up with your workload. The obvious next question was how do we solve that? My answer was usually get a faster IO subsystem or do less IO! Now, SQL 2014 offers another option.
Let’s pretend for a moment that you cannot get faster IO and your problem is data warehouse style table and queries. The key here; not an OLTP workload. If this is the case Clustered Columnstore Index (CCI) in SQL 2014 may be worth investigating.
SQL Server 2012 introduced the non-clustered columnstore index which was met with great fanfare to everyone who tested and the few who were able to use it. A limitation of the non-clustered column store is it rendered the table read only. In SQL 2014 the Clustered Columnstore is updatable.
What you need to know,
- Clustered Columnstore is the only index on the table
- Data is stored in an columnar format versus a row format
- Columnstore is highly compressed, unlike non-clustered column store index, your entire table may be reduced in sized from 2x-20x.
- Each column is compressed into a Column Segment that has up to 1 million rows (1 column of 1 million rows, for simplicity i use 1 million rows for the column segment, but is is actually 1,048,576 rows, (1024*1024))
- A group of 1 million rows makes up a row group (the row group is made up from the columns, if your table is 10 columns, your row group would be made up of 10 columns by 1 million rows)
- Until you have inserted 1 million rows your data is kept in a deltastore
- If you bulk insert 102,400 they will go directly to the columnstore bypassing the deltastore
- CCI Archival_Compression you will gain even more compression, this is very good for older data that is accessed less often
- Only columns that are in the SELECT, WHERE predicate or/and JOIN and GROUP BY are read from disk thus allowing column segment elimination
- I/O for an entire row group is eliminated if the row segments are out of range of the WHERE predicate or JOIN
- CCI can be partitioned for easier management of your table
- If your query uses every column in the table, your performance may be worse, consider the queries accessing the table before blindly converting to a Clustered Column Store Index
- Once a column segment is created it is never physically changed, deletes are maintained using a delete bitmap
- Updates are a delete and insert, delete bitmap is updated to mark the old row and the new row will be inserted into the delta store
- A high number of updates may reduce overall query performance against the CCI table
- Batch Mode processing moves data between query execution operators in vectors (up to 1000 rows at a time) vs row mode processing which moves 1 row at a time
- Columnstore queries are highly parallel operations, they are not limited by k-groups, Columnstore operations do obey MAXDOP settings
- Alter Index Rebuild is used to physically delete rows marked for deletion in the delete bitmap
- Alter Index Reorganize will only move closed row groups in the deltastore into the columnstore
- SQL 2014 CCI can participate in an AlwaysOn availability group however it is not readable on the secondary
- If you columnstore index creation fails with out of memory error 8657, 8658, 701, 802, increase the resource governor default resource pool max memory per the SQL Server Columnstore Index FAQ for the creation of the index, do not forget to set it back when complete
To give you a sense of the data compression you may experience, the following is test telemetry data from a Microsoft application that came through the lab, and keep in mind the column data was relatively redundant so there was great opportunity for compression.
Looking at the SSMS table size report below, the extension on the table name indicates the level of compression based on size from a standard row store table to a Clustered Column Store Archive on a table with 1,111,997,989 rows and uncompressed size of 182GB.
*_RS = Row Store
*_RS_PC = Row Store Page Compressed
*_CCI = Clustered Column Store Index
*_CCI_Archive = Clustered Column Store Index with Archival_Compression
The following sample used a query test to get an idea of the potential performance for 1 execution of the query with a cold cache, and each query run on the same server with the same hardware configuration.
(For this test I used a 2 processor 16 core, 128 GB RAM with SQL allocated 112GB, 2 mirrored 10k RPM disks for data file)