Data Compression: Why Do we need it?

As announced in Tech-Ed 2007, data compression is a new and exciting feature targeted to be available in SQL Server 2008. This is a huge topic to be covered in one BLOG post, I have decided to break it into a series of posts, each building on the previous ones. If you are interested discussions on any specific topic on data compression, please send me a note and I will make sure that I include a discussion on it in my subsequent BLOGs.

 

This is the first in the series of BLOG entries for data compression feature in SQL Server 2008.

Starting with SQL Server 2008 release, SQL Server is targeted to support native data compression. It will allow you to compress a table, or an index or one or more of their partitions to save space on the disk. I will cover about the details on these compressions in later BLOG entries, but let us start with one fundamental question: Why Compression? This may very well be a non-question for our customers who have been waiting for this feature for a while. But still, it is useful to visit or re-visit the reasons why data compression is needed or useful.

One obvious reason is to save the cost of disk. While this may seem strange given that the disks are cheap. We all can buy 100s of GB disk for less than $100.00. So what all this fuss is about? First reason, for the uninitiated, is that the disks used for high-end systems are not cheap. Secondly, there is rarely a single copy of the production data. For example, if you are using high availability features like replication, log shipping or mirroring, you will have at least one more copy. Now what about test environment? There is one copy there as well. All these add up to the cost of hardware. Third, how about backups? If you have many backups of your database over time, just multiply the cost. Agreed that backup can be stored on less expensive media but still there is some cost associated with it.

Second reason is the cost of managing the data. Larger the database, it takes longer to do the backup, recovery. Similarly for running DBCC commands, rebuilding indexes, and bulk import/export. Clearly if these commands are IO bound, then if we could reduce the size of the data, they will run faster. Even better, they will have less of an impact on the concurrent workload in your system. One interesting point is that if your database is compressed, the backup will be automatically smaller. So how does this relate to backup compression, a new feature in SQL Server 2008. Well, as you will see in subsequent BLOG posts, backup compression is orthogonal to data compression and can be used together.

Third reason is memory. Don’t we all wish we had more memory on our servers? Well, if the data is compressed, you can fit more data in the same memory. So if you could compress the data 50%, then suddenly you have increased your memory 100% (i.e. you can fit double the size of the data). Is this not fantastic? Even if you have 64-bit machine with the capability for huge amount of addressable memory, the databases, for most customers, is many order of magnitude larger than the memory. So compression will benefit even for servers running on 64-bit architecture. Clearly your IO bound workloads are likely to see increase in throughput with data compression as there is less data to be read.

Hope this has created some excitement in you for the data compression feature. More in the future BLOGs.

Next Blog https://blogs.msdn.com/sqlserverstorageengine/archive/2007/09/30/data-compression-techniques-and-trade-offs.aspx