Basics of Encryption and Hashing

In the Basics of Transport Security article I wrote a few weeks ago, I introduced three different kinds of security that people care about for their messages. Let's look at the concepts behind implementing two of those types of security.

Confidentiality means keeping the contents of the message secret from unintended listeners. This is usually what people think of when you talk about security. The easily recognized way of providing confidentiality is message encryption. An encryption algorithm takes the message, called the plaintext, and transforms that into ciphertext. The information that randomizes the encryption process is called the key. Ciphertext is essentially indistinguishable from random data, and the ciphertext of a particular message is essentially indistinguishable from the ciphertext of a random message. Decryption is the process of returning the plaintext from the ciphertext.

When the encryption and decryption processes use the same key, that key must be a shared secret between the sender and receiver of the message. This is called secret key or symmetric cryptography. An alternative is to use separate keys for encryption and decryption, allowing you to publish your encryption key for other people to use. This means that anyone can send you a message without sharing a secret with you, but nobody else can read the messages intended for you. This is called public key or asymmetric cryptography. Breaking message security is easier for a given key length when using asymmetric cryptography. Public keys need to be larger in size than a shared secret to provide an equivalent level of protection.

A good encryption algorithm is resistant to finding the plaintext if you don't know the decryption key. A good encryption algorithm is also resistant to finding the key or remaining plaintext even if you know some of the plaintext for a particular message.

Integrity means having confidence that the message you received is the same as the message that the sender sent. An encryption algorithm does not provide any guarantees of integrity. If you were to modify the ciphertext of a message, then the receiver could still decrypt that message into some plaintext. You don't know how that plaintext would relate to the original plaintext message, if it does at all. For many encryption algorithms, the resulting plaintext from making changes to an encrypted message results in gibberish. However, if the receiver does not know what the message was supposed to look like, they may not realize that is has been tampered with.

A hashing algorithm takes an arbitrary length message and produces a fixed-length output that is characteristic of the message. This is also called a message digest. A hashing algorithm should be irreversible, meaning that producing any portion of the message should be impossible given only the message digest.

There are two measures of the effectiveness of a hashing algorithm. The first is how difficult it is to take a message digest and produce a message that hashes to that digest value. It's still bad if you can find a different message that hashes to the digest value than was originally used to create the message digest. Many password schemes apply hashing to the input password so that they don't have to pass around the original. Since the hashing algorithm is hard to reverse, knowing the password digest does not allow an attacker to gain access to the protected resource. However, any password that hashes to the same digest value would let the attacker access the resource.

The second measure of effectiveness for a hashing algorithm is how difficult it is to find any two messages that have the same digest value. This is called the collision-resistance of the hashing algorithm. It turns out that finding collisions is often much easier than reversing a message digest. This means that the digest strength of the hashing algorithm is frequently chosen based on how bad it would be if someone could produce collisions. I'll talk next time about how easy it is in general to find collisions for a hashing algorithm.

Next time: Math Behind the Hashing Birthday Attack