Real-Time Operational Analytics - Overview nonclustered columnstore index (NCCI)

The blog technology/  https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/12/09/columnstore-index-real-time-operational-analytics-using-in-memory-technology/ describes that customers are increasingly looking for real-time operational analytics for the ability to do analytics in real-time while minimizing or eliminating the cost of ETL and setting up of a separate DW. While the customer value is compelling but it was not possible earlier when data was organized as rows. With the introduction of columnstore technology since SQL Server 2012, customers are seeing dramatic (upto 100x) speed up in the performance of analytics queries. SQL Server 2016, with the introduction of updateable nonclustered columnstore index (NCCI), makes it possible to do real-time analytics on operational schema. More importantly, you can do so without requiring any changes to your OLTP or transactional application. All you need to do is to create an NCCI on one or more tables that are needed for analytics. SQL Server query optimizer automatically chooses NCCI for analytics queries while your OLTP workload continues to run using the same btree indexes as before. In many cases, if the existing OLTP workload already had indexes for speeding up analytics queries, you can drop those indexes and replace them by a single NCCI index. If you are replacing multiple btree indexes with one NCCI index, you may infact see higher throughtput transactional workload. The picture below shows a table with NCCI. Think of it as just another index that is maintained automatically as DML transactions are run.

 

ncci-basic

While this all sounds well and good, there are essentially two challenges to be addressed. First challenge is how we can get good performance for analytics queries when the database schema has been optimized for OLTP? Recall, that traditionally the Data Warehouse (DW) use Star or snowflake schema to give optimal performance for analytics queries. The schema for OLTP databases is highly normalized (i.e. with minimal data duplication) which when used for analytic queries could lead to poor performance primarily because of complexity of joins between larger number of tables. The significant query speed up made possible by columnstore index (i.e. NCCI) can overcome the complexity of query and still deliver most analytics queries in few seconds. At this point, it is important to emphasize the analytics query performance with real-time operational analytics will not be as fast as you can get with a dedicated DW but the key benefit is the ability to do analytics in real-time. Some businesses may choose to do real-time operational analytics while still maintaining a dedicated DW for extreme analytics. There are customers who have deployed NCCI in production and have eliminated the dedicated DW.

 

The second challenge is how to minimize or eliminate the impact of analytics on transactional workload. Many customers we talked to had indicated in very strong terms that transactional system tracks their business transactions and any slowdown in transactional workload will impact their bottom line and therefore unacceptable. SQL Server 2016 provides customers multiple options described below to eliminate or minimize the impact of analytics on operational workloads (to be covered in futures blog in detail)

  • Filtered Columnstore index: This gives customer the ability to create NCCI meeting a filtered condition. For example, if there is a natural filtered condition that can identify rows that are cold (i.e. not or rarely changed), you can minimize the impact of NCCI. Note, SQL Server optimizer will transparently pick both the cold rows using NCCI and the ‘hot’ rows using btree index. A strong recommendation or a best practice is to have a clustered btree index on the column(s) used in the filtered condition to identify the ‘hot’ rows efficiently.
  • Compression Delay: If there no natural way to express a filtered condition, the SQL Server 2016 provides you with an option ‘CompressionDelay’ which you can use to specify a ‘duration’ when the data row(s) is ‘hot’. After the specified duration, the row(s) are now marked for compression into compressed rowgroups.
  • Offloading analytics to AlwaysON Readable Secondary: If you run your application in High Availability configurations using AlwaysOn. Using this configuration, you can offload your analytics to one or more Readable Secondary as shown below. In this configuration, the only analytics overhead at primary replica is the additional NCCI.

RTOC-AlwaysON

The key takeaway I want you to have is (1) you can enable real-time operational analytics in your workload by just creating an NCCI with no changes to your transactional workload (2) You have multiple options available to minimize the impact of analytics on your transactional workload. Please stay tuned for additional blogs on this topic

 

Thanks

Sunil Agarwal