Why you shouldn't compress SQL Server data and log files

Most knowledgable SQL Server DBAs know you shouldn't compress a database's data and log files, but I'll wager that few know the details behind why that's such a bad idea.  The general reply might be "Performance" -- and that's correct -- but the assumption that the performance hit you take when you compress data/log files is due to compressing/decompressing data is only part of the story.  There's a lot more to it than that.

SQL Server abstracts all of the I/O for data and log files in its UMS component.  UMS (User Mode Scheduler) allows the engine to schedule work and carry out I/O without making direct Win32 API calls.  This design permits the engine to support fibers without having separate code lines for threads and fibers, and it allows the engine to run on versions of Windows that don't support asynchronous file I/O (Win9x, WinME) without having to have separate code lines for asynchronous I/O and synchronous I/O.  You'll recall that SQL Server leverages NT's ability to carry out I/O operations asynchronously and to perform asynchronous I/O in a scatter/gather fashion.  Windows 9x/ME doesn't support any of this.  By abstracting this away in UMS, the engine merely schedules the work it needs to do, and UMS carries it out in the most efficient way possible.

One nuance of Windows' asynchronous I/O facility is that, depending on the API you use, you may or may not actually get an async operation back from the OS.  IOW, although you request an operation to be carried out asynchronously, Windows may decide to run it synchronously and may hold up your API call until the operation completes.  The OS always makes the final decision on whether an async I/O request is honored.  What happens when Windows decides not to honor an async I/O request depends on the API.  For ReadFile/WriteFile, they simply block until the operation completes and return TRUE.  You have to check their return values and respond accordingly -- you can't write code that expects the operation to complete at some point in the future if the operation actually finished immediately.  For ReadFileEx/WriteFileEx, the API returns TRUE regardless of whether the action is carried out synchronously or asynchronously.  I'm not aware of an easy way to detect that an async I/O request has been carried out synchronously using ReadFileEx or WriteFileEx. 

On Win9x/ME, UMS automatically schedules all file I/O to run synchronously.  It knows that the version of Windows on which it's running does not support async I/O, and it schedules I/O operations accordingly.  For NT, it always attempts to run file I/O operations asynchronously, but, as I've said, Windows may have other ideas.

One circumstance in which Windows never honors an async I/O request is when the file it's reading or writing is compressed.  When calling ReadFile or WriteFile against a compressed file, Windows always runs the operation synchronously, regardless of whether the caller requested an async I/O operation.  That's right:  compressing a file disables an app's ability to read or write it asynchronously.

So, not only are you paying the obvious cost of compression/decompression when you compress database data and log files, you're also paying the cost of switching from async I/O to synchronous I/O -- a change that can make a huge difference with a high-speed RDBMS such as SQL Server.  That the operations work at all in this situation is a testament to SQL Server's robustness, but the performance hit you take is rarely worth it, and that doesn't even touch on how compression affects reliability and recoverability.  Hard drives are cheap; don't compress your data and log files unless you have no other choice.