Updateable Column Store Indexes in SQL Server 2014

Editor’s note: In partnership with Microsoft Press, now celebrating their 30th year, MVPs have been contributing to an ongoing guest series on their official team blog. Today’s article is from SQL Server MVP Sergio Govoni which is the 42nd in the series.  

Updateable Column Store Indexes in SQL Server 2014

Introduction

Column store indexes had been released with SQL Server 2012 to optimize data warehouse workloads that have specific patterns, data are loaded through T-SQL scripts or through SSIS packages, several times a day, in any case the frequency is not important, only that the available data is loaded in the same execution. At the end of ETL process, data is read with reporting tools. Usually data is written one time, then read multiple times.

In SQL Server 2012 there was the non-clustered column store index only; like a traditional B-Tree non-clustered index, it was a secondary index. However, it differs from a traditional B-Tree index because it is based on a columnar structure, though, the base table remains organized by row (in a structure called row-store and saved in 8K data pages).

The column store indexes are part of Microsoft In-Memory Technologies because they use xVelocity engine for data compression optimization and its implementation is based on a columnar structure such as PowerPivot and SSAS Tabular. Data in column store indexes are organized by column, each memory page stores data from a single column, so each column can be accessed independently. This means that SQL Server Storage Engine will be able to fetch the only columns it needs. In addition, data is highly compressed, so more data will fit in memory and the I/O operations can greatly decrease.

Column store indexes structure

Before talking about new feature of column store indexes in SQL Server 2014, it is important to introduce three keywords: Segment, Row Group and Compression. In a column store index, a segment contains values for one column of a particular set of rows called row group. As it is possible to see in the following picture, each red and gray portions are segments. When you create a column store index, the rows in the table will be divided in groups and each row group contains about 1 million rows (the exact number of rows in each row group is 1,048,576; in other word there are 220 rows in each row group). Column store transforms the internal index organization from row organization to columnar organization and there will be one segment for each column and for each row group. Column store indexes are part of Microsoft In-Memory technologies in which data is compressed and the compression plays a very important role, so each segment is compressed and stored in a separate LOB.

This article does not detail the algorithms to compress data in column store index. At any rate, keep in mind that each segment stores the same type of data, so in a segment, there will be homogeneous data and in this scenario, the compression algorithms will be more efficient than the one used to compress table rows because row usually contains different type of data. In general, data compression can be implemented using different techniques such as:

  • Dictionary Encoding
  • Run-length Encoding
  • Bit Packing
  • Archival Compression (only in SQL Server 2014)
    • It can reduce disk space up to 27%

The techniques used by SQL Server to compress data are undocumented. The following picture shows an example of row groups and segments in a column store index.  Continue reading full article here.

About the author

Since 1999 Sergio Govoni has been a software developer; in the 2000 he received a Degrees in Computer Science  from Italian University. He has worked for over 14 years in a software house that produces multi-company ERP on Win32 platform. Today, at the same company, he is Program Manager and he’s constantly involved on several team projects, where he takes care of the architecture and the mission-critical technical details. Since 7.0 version he has been working with SQL Server and he has a deep knowledge of Implementation and Maintenance of Relational Databases, Performance Tuning and Problem Solving skills. He also works for training people to SQL Server and related technologies. Since 2010 he is a Microsoft SQL Server MVP. You can meet him at conferences, SQL Saturday or Microsoft events.  Sergio blogs in both English and Italian.  Follow him on Twitter

About MVP Monday

The MVP Monday Series is created by Melissa Travers. In this series we work to provide readers with a guest post from an MVP every Monday. Melissa is a Community Program Manager, formerly known as MVP Lead, for Messaging and Collaboration (Exchange, Lync, Office 365 and SharePoint) and Microsoft Dynamics in the US. She began her career at Microsoft as an Exchange Support Engineer and has been working with the technical community in some capacity for almost a decade. In her spare time she enjoys going to the gym, shopping for handbags, watching period and fantasy dramas, and spending time with her children and miniature Dachshund. Melissa lives in North Carolina and works out of the Microsoft Charlotte office.