Compression Strategies

In the previous blogs, I had described the data compression in SQL Server 2008 and its implications on CPU and IO. One of the question that keeps coming up is ‘Should I compress my database? Or Why  does the SQL Server not provide an option to compress the whole database’. I am not suggesting that you may never want to compress the whole database, but here are some basic guidelines that you should keep in mind.

First, why compressing the whole database blindly is not such a good idea?  Well, to give you an extreme example, let us say that you have a heavily used table T with say 10 pages in a database with millions of pages, there is no benefit in compressing T in this case. Even if the SQL Server could compress 10 pages to 1 page, you hardly made a dent in the size of the database and it added some CPU overhead instead.  Remember, the data needs to be uncompressed every time it is accessed.  Granted that in real life workload, the choices are not this obvious but it does hi-light the point that you must evaluate the pros/cons before you compress the data. Our recommendation is that before you compress an object (e.g. table index or a partition), you must look at its size, usage and estimated compression savings using sp_estimate_data_compression_savings stored procedure.  Let us look at each of these in some detail

·         If the size of the object is much smaller than the overall size of the database, then it does NOT buy you much in space savings even if most of it can be compressed.


·         If the object is used heavily both for DML and SELECT operations, then you will incur additional CPU overhead that can impact your workload especially if it makes it CPU bound.  You can use sys.dm_db_index _operational_stats to find the usage pattern of objects to identify which tables, indexes and partitions are being hit the most. In this case, special care is needed to evaluate if to compress these objects or not.


·         The compression savings are schema/data dependent and in fact for some objects the size after compression may be larger than before or the space savings is very insignificant. Cearly, if the compression saving is not much relative to the size of the object and/or of the database, you may not want to compress it.


Some common strategies: As you look at the strategies below, keep in mind that the index/data pages in buffer pool have compressed data and the SQL Server needs to uncompress the row/column, not the whole page, each time column/row(s) needs to be retrieved. The suggestions below should be taken as a guideline. You must test it with your workload and then decide what compression is right for you to minimize any surprises.

·         How do you choose between ROW and PAGE compression? If you have lot of DML activity, typical of an OLTP workload, generally ROW compression will be a better option as the cost of DML operations with ROW compression is much lower compared to PAGE compression.  


·         If you have a table with multiple indexes, only compress indexes that are used occasionally.


·         If you have an index that is used mainly for singleton lookup, it needs careful consideration for compression as the cost of decompressing data while traversing the index tree will be expensive. At minimum, SQL Server will to need uncompress key values for each level in the index tree. And if this index is a nonclustered index on a table with the clustered index, the clustered index tree will need to be traversed as well. Interestingly, when you compression an index with PAGE compression option, all internal index pages are compressed with ROW and only the leaf pages are compressed with PAGE compression. The reasons are (1) the number of non-leaf pages are quite small < 2% compared to leaf pages for a fanout of > 50; (2) it reduces the cost of uncompression for non-leaf pages. Yes, one can argue that by compressing the non leaf pages with PAGE compression, we can increase the fanout thereby reducing the height of the index tree which indirectly reduces the number of key comparisons. For now, the nonleaf pages are only ROW compressed.


·         If you have a partitioned table where data in some partitions is accessed occasionally, you may want to compress those partitions and associated indexes with PAGE compression. This is a common scenario with partitioned tables where older partitions are referenced occasionally. One example can by sales data partitioned by quarters across many years. Commonly the queries are run on the current quarter while data from other quarters is not referenced as frequently. When the current quarter ends, you can change its compression setting.


In the next blog, I will describe the DDL to enable/disable compression