Composing Compression and Encryption

Compression and encryption work against each other if the two features aren't composed in the right order. Generally, you want to compress first and then encrypt. This is the order you get naturally when you compress at the encoding level and encrypt at the transport level. Encrypting first and then compressing tends to give poor results. That order can arise when you encrypt early, such as when you use message security with transport compression, or when you attempt to apply compression from outside the system after encryption has already taken place.

A typical lossless compression method works by exploiting repetition or other non-randomness in the uncompressed content. Completely random content tends to compress very poorly, possibly even growing in size, because there is no statistical redundancy to eliminate. Completely predictable content, on the other hand, tends to compress very well. Text and many kinds of binary content that are not already compressed tend to be at least somewhat predictable.
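
To make the difference concrete, here is a minimal sketch in Python using the standard zlib module. The helper name compressed_ratio and the sample inputs are made up for illustration; the exact ratios you see will vary with the compressor and the data.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical sample inputs, chosen only to illustrate the two extremes.
predictable = b"A" * 100_000                                   # one byte repeated
text_like = b"The quick brown fox jumps over the lazy dog. " * 2000
random_bytes = os.urandom(100_000)                             # no redundancy to remove

for name, data in [
    ("predictable", predictable),
    ("text-like", text_like),
    ("random", random_bytes),
]:
    print(f"{name:12s} ratio = {compressed_ratio(data):.3f}")
```

The predictable and text-like inputs should shrink to a small fraction of their original size, while the random input should come out at roughly its original size or slightly larger because of the format's framing overhead.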

Most secure encryption mechanisms transform content so that the result is very close to random. Any statistical tendencies in the encrypted output could offer a way to attack the encryption mechanism, possibly revealing the original content or even the secrets used for encryption, so a good mechanism avoids leaving them. The encrypted output is therefore much more random than the original content. If the original content was fairly predictable, encrypting it first causes a significant decline in compression effectiveness. This makes encrypted content a poor candidate for compression.
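
The effect of ordering can be sketched the same way. The example below is an illustration under stated assumptions, not how any real system encrypts: toy_encrypt is a hypothetical stand-in that XORs the data with a keystream derived from SHA-256 in a counter arrangement. It is not secure and is used only because its output looks random, which is the property that matters here. The message and key are also invented for the demonstration.

```python
import hashlib
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 based keystream.

    Illustrative stand-in for a real cipher; NOT secure. Because it is a
    plain XOR with a fixed keystream, applying it twice restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


# Hypothetical, highly repetitive message and a throwaway key.
message = b"status=ok;retry=false;payload=none\n" * 2000
key = b"demo key"

compress_then_encrypt = toy_encrypt(zlib.compress(message), key)
encrypt_then_compress = zlib.compress(toy_encrypt(message, key))

print("original:              ", len(message))
print("compress, then encrypt:", len(compress_then_encrypt))
print("encrypt, then compress:", len(encrypt_then_compress))
```

Compressing first should leave only a small fraction of the original size to encrypt, while compressing the already encrypted bytes should save essentially nothing.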

Next time: Getting Better Time Formats