How It Works: File Stream Compression with Backup/Restore

The question of file stream compression during backup and restore arose last week and makes for a good clarification topic.

Shown here is a conceptual layout of a database containing file stream containers.    On the left the folders represent the storage of the individual files used when varbinary(max) has the FILESTREAM attribute.   The FSAgent is the SQL Server component that presents the file(s) as a varbinary(max) to the rest of the engine.   Other parts of the SQL Server engine treat and act upon the data as a varbinary(max).

From my 1980's days on the Mac, I decided to use the Stuff-It icon to show the point of backup/restore compression.   The compression used by SQL Server 2008 is not Stuff-It and it is also not .ZIP or .RAR or .CAB so I wanted a more generic view.   Now that I have dated myself pay special attention to the compression location.

Compression/Expansion takes place as the data is streamed to and from the backup media.   The type of data does not matter to the backup facility.   SQL Server just seems a stream of bytes, compresses or expands them as required.

Compress

All data written to and from the SQL Server backup media can be compressed.   The only part of the media that is not compressed are the actual media headers.

The design is such that you can replace the folders in the picture with full text catalogs and the same data flow occurs for backup compression.

File Stream On Disk Storage

It is technically possible to mark the folders, used for file stream storage, compressed but not recommended by Microsoft.  You can review details of the WAL protocol and SQL Server in various Microsoft knowledge base articles.

Total Data Encryption and Database Compression

SQL Server 2008 introduces TDE and compression of the data.   This is done with a layer similar to the FSAgent in the SQL Server engine. When data is retrieved or store do a page a lightweight translation layer takes the appropriate steps to encrypt, decrypt, compress or expand the data for the rest of the SQL Server engine. 

Backup does not need to read individual rows on the page so when a compressed or encrypted page the entire 8K page remains compressed or encrypted and is sent to the backup stream.   The backup stream can compress the data stream but compression ratios are not as good when the data is already compressed.   When the data is encrypted the data patterns don't lend themselves to healthy compression rates, thus backup with compression is discouraged when TDE has been enabled.

Bob Dorr
SQL Server Principal Escalation Engineer