In part 1 of this series we discussed the costs and benefits of data compression, together with some best practices for successfully compressing your database. In this post we walk you through the planning of data compression, including prerequisites validation and three steps to identify the best candidates for compression:
1. Estimate the space saving per object
2. Find the best compression type per object
3. Join both outputs
Planning is the crucial part of the data compression process. It is recommended to carefully analyze the feasibility of compression in your environment and estimate its benefits, at least the expected space saving after compression.
PREREQUISITES VALIDATION
1. SQL Server Edition
Before going further and compressing your production database, you should validate that all your environments, including pre-production, user acceptance test, and development, are running the Enterprise or Developer edition. A database containing compressed tables or indexes cannot be restored to an instance of any other edition.
You can use a table like the following to inventory your environments and make sure they all meet the prerequisite:
|Environment|SQL Instance Name|SQL Server Version and Edition|Compression is possible|
|---|---|---|---|
|Production Primary| |SQL 2012 SP2 Enterprise|Yes|
|Production – Disaster Recovery| |SQL 2012 SP2 Enterprise|Yes|
|User Acceptance Test| |SQL 2012 SP2 Enterprise|Yes|
|Development| |SQL 2012 SP2 Developer|Yes|
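To fill in such an inventory, you can query each instance with the built-in SERVERPROPERTY function; a minimal sketch:

```sql
-- Run on each instance: version and edition, to validate the compression prerequisite.
-- EngineEdition = 3 means Enterprise (or Developer/Enterprise Evaluation) engine.
SELECT
    SERVERPROPERTY('ServerName')     AS InstanceName,
    SERVERPROPERTY('ProductVersion') AS ProductVersion,
    SERVERPROPERTY('Edition')        AS Edition,
    SERVERPROPERTY('EngineEdition')  AS EngineEdition;
```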
2. CPU resource requirement
Compression requires extra CPU resources on the database server to compress and decompress data on the fly as it is exchanged with applications. It is hard to estimate in advance how much CPU usage will increase. In theory, you can roughly expect 10% to 15% additional CPU consumption; in our experience, however, we have never observed a noticeable CPU increase after compressing a production database. It is nevertheless recommended to establish a baseline of CPU usage over a significant period to make sure you have room for the additional CPU required to compress and decompress data.
As an example, the following figure illustrates a CPU usage baseline of a database server over a period of one month. It is clear we have enough CPU margin to handle the compression overhead:
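If no monitoring tool is collecting such a long-term baseline, the scheduler-monitor ring buffer gives a quick look at recent SQL Server CPU utilization. Note it only covers the last few hours of one-minute samples, so it complements rather than replaces a month-long Performance Monitor baseline; a sketch:

```sql
-- Recent CPU utilization samples (approx. one per minute) from the
-- scheduler-monitor ring buffer.
SELECT TOP (256)
    DATEADD(ms, rb.[timestamp] - si.ms_ticks, GETDATE()) AS sample_time,
    x.rec.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
    x.rec.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')         AS system_idle_pct
FROM sys.dm_os_ring_buffers AS rb
CROSS JOIN sys.dm_os_sys_info AS si
CROSS APPLY (SELECT CAST(rb.record AS xml)) AS x(rec)
WHERE rb.ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
ORDER BY rb.[timestamp] DESC;
```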
SPACE SAVING ESTIMATION
Compression efficiency depends on the type and distribution of data in tables and indexes. It is important to identify which tables and indexes will yield the most space saving after compression; in general, the top 20% of tables/indexes represent about 80% of the space saving.
The space saving estimation is done using the stored procedure sp_estimate_data_compression_savings (see its documentation for details).
This stored procedure estimates the compression ratio of a single index passed as a parameter. To estimate the space saving for the entire database, you need to build a script that iterates through all tables and indexes and produces a list of them with their respective compression ratios, as in the example below:
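A minimal sketch of such an iteration, using a cursor over sys.indexes (the target temporary table and its shape are illustrative, not necessarily those of the sample script):

```sql
-- Sketch: estimate PAGE-compression savings for every index in the database.
-- #Estimation_page must be created beforehand with the result-set shape of
-- sp_estimate_data_compression_savings.
DECLARE @schema sysname, @table sysname, @index_id int;

DECLARE idx_cursor CURSOR FAST_FORWARD FOR
    SELECT s.name, t.name, i.index_id
    FROM sys.tables AS t
    JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    JOIN sys.indexes AS i ON i.object_id = t.object_id;

OPEN idx_cursor;
FETCH NEXT FROM idx_cursor INTO @schema, @table, @index_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO #Estimation_page
    EXEC sp_estimate_data_compression_savings
         @schema_name      = @schema,
         @object_name      = @table,
         @index_id         = @index_id,
         @partition_number = NULL,
         @data_compression = 'PAGE';  -- repeat with 'ROW' for row-level estimates
    FETCH NEXT FROM idx_cursor INTO @schema, @table, @index_id;
END
CLOSE idx_cursor;
DEALLOCATE idx_cursor;
```

Running the same loop once with 'PAGE' and once with 'ROW' gives both estimates for each index.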
That list will help you decide which indexes to compress from a space-saving perspective. For later use, it is better to save the output in a temporary table, as in step 2 of the sample script databaseCompressionPlanning.sql (the temporary tables need to be created beforehand, as per step 1). In this sample, the temporary table is named #Estimation_row_page.
APPLICATION WORKLOAD CONSIDERATION
As the space saving estimation from the previous step will show, ROW compression produces less space saving than PAGE compression, but its CPU overhead is minimal (less than 10%).
PAGE compression, on the other hand, produces a higher compression ratio, but at the price of a higher CPU overhead.
The best approach to deciding what to compress, and how, is to analyze the workload characteristics of each table and index, based on the following two metrics:
- U: The percentage of update operations on a specific table, index, or partition, relative to total operations on that object. The lower the value of U (that is, the table, index, or partition is infrequently updated), the better candidate it is for page compression.
- S: The percentage of scan operations on a table, index, or partition, relative to total operations on that object. The higher the value of S (that is, the table, index, or partition is mostly scanned), the better candidate it is for page compression.
You can use the DMV sys.dm_db_index_operational_stats to determine the operational statistics of the tables/indexes.
You can build a script that lists all objects (tables, indexes) with their percentage of updates and scans, as in the example below:
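A sketch of such a query, following the U and S definitions above (the output column names are illustrative):

```sql
-- Sketch: percentage of updates (U) and scans (S) per index,
-- accumulated since the last instance restart.
SELECT
    OBJECT_SCHEMA_NAME(ios.object_id) AS schema_name,
    OBJECT_NAME(ios.object_id)        AS table_name,
    i.name                            AS index_name,
    -- U: updates relative to all leaf-level operations on the object
    ios.leaf_update_count * 100.0
        / NULLIF(ios.range_scan_count + ios.leaf_insert_count
               + ios.leaf_delete_count + ios.leaf_update_count
               + ios.leaf_page_merge_count + ios.singleton_lookup_count, 0) AS update_pct,
    -- S: range scans relative to all leaf-level operations on the object
    ios.range_scan_count * 100.0
        / NULLIF(ios.range_scan_count + ios.leaf_insert_count
               + ios.leaf_delete_count + ios.leaf_update_count
               + ios.leaf_page_merge_count + ios.singleton_lookup_count, 0) AS scan_pct
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id  = ios.index_id
ORDER BY update_pct ASC;
```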
For further usage, it is better to save this output in a temporary table like in the step 3 of the sample script databaseCompressionPlanning.sql. In this sample, the temporary table is named #Update_scan_ratio.
Note 1: DMV data are reset every time the SQL Server instance is restarted. To get accurate operational statistics, you should query the DMV on a server with at least one month of uninterrupted activity.
Note 2: Append-only tables (rows inserted only at the end) that are rarely read do not incur much overhead from page compression on insert. Such tables may perform well with page compression even if S is low. Logging and audit tables, which are written once and rarely read, are typical examples and good candidates for page compression.
DECIDING WHICH OBJECTS TO COMPRESS AND WITH WHICH LEVEL OF COMPRESSION (PAGE vs. ROW)
Finally, we need to join the output of the space saving estimation with the output of the update-vs-scan analysis. If you have saved each output in a temporary table, it is easy to join them and generate a list showing, for each object (table or index): the PAGE compression gain, the ROW compression gain, the percentage of updates, and the percentage of scans. See step 4 of the sample script databaseCompressionPlanning.sql.
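Assuming the two temporary tables from the earlier steps, #Estimation_row_page and #Update_scan_ratio, with illustrative column names, the join can be sketched as:

```sql
-- Sketch: combine estimated savings with workload ratios per index.
-- Column names are illustrative, not necessarily those of the sample script.
SELECT
    e.schema_name, e.table_name, e.index_name,
    e.page_saving_pct,   -- estimated % saving with PAGE compression
    e.row_saving_pct,    -- estimated % saving with ROW compression
    u.update_pct,        -- U
    u.scan_pct           -- S
FROM #Estimation_row_page AS e
JOIN #Update_scan_ratio   AS u
  ON u.schema_name = e.schema_name
 AND u.table_name  = e.table_name
 AND u.index_name  = e.index_name
ORDER BY e.page_saving_pct DESC;
```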
With these statistics, you can now decide which object to compress and whether to use page or row level compression.
The general rule is to compress the objects estimated to yield the highest compression ratio. In our experience with Dynamics AX workloads, we compress all objects estimated to produce at least 25% space saving.
Regarding the choice of PAGE vs. ROW compression, the general rule is to compress at PAGE level all objects with a low percentage of updates (U) and a high percentage of scans (S), and at ROW level all other objects.
The strategy below reflects our experience in the Dynamics AX context:
- Compress at PAGE level all objects with an update ratio (U) below 20% and an estimated space saving above 25%.
- Compress at ROW level all objects with an update ratio above 20% and an estimated space saving above 25%.
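As an illustration, these thresholds can be turned into ALTER statements generated from the combined output (the #Combined table and its columns are hypothetical stand-ins for the join result of the previous step):

```sql
-- Sketch: generate compression commands from the combined output,
-- applying the 20% update / 25% saving thresholds.
SELECT
    CASE WHEN c.index_id <= 1  -- heap or clustered index: rebuild the table
         THEN 'ALTER TABLE ' + QUOTENAME(c.schema_name) + '.' + QUOTENAME(c.table_name)
         ELSE 'ALTER INDEX ' + QUOTENAME(c.index_name) + ' ON '
            + QUOTENAME(c.schema_name) + '.' + QUOTENAME(c.table_name)
    END
    + ' REBUILD WITH (DATA_COMPRESSION = '
    + CASE WHEN c.update_pct < 20 THEN 'PAGE' ELSE 'ROW' END
    + ');' AS compression_cmd
FROM #Combined AS c
WHERE (c.update_pct <  20 AND c.page_saving_pct >= 25)
   OR (c.update_pct >= 20 AND c.row_saving_pct  >= 25);
```

Review the generated statements before executing them; rebuilds are resource-intensive and are best scheduled in a maintenance window.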
With this strategy, we generally achieve up to 70% overall space saving. We regularly see 2 TB databases compressed to 700 GB, or 1 TB compressed to 300 GB. Remarkably, these compressions were achieved not only without any noticeable negative impact on CPU utilization, but with a noticeable IO reduction.
See part 3 of this blog series: How to compress Dynamics AX databases via SQL Server or Dynamics AX.
Premier Field Engineer Dynamics AX
Senior Premier Field Engineer SQL Server