Hi, Chetan Bhat here. I’m a developer with the Security Tools Team.
In this post I will talk about common mistakes developers make when when using hash functions. Any hash function is required to meet the following two requirements.
- It must be easy to calculate for any possible message.
- It must return a fixed-size bit string irrespective of the length of the message.
Broadly speaking there are two types of hash functions: non-cryptographic hash functions and cryptographic hash functions. A cryptographic hash function is expected to also meet the two additional requirements below:
- It must be practically impossible to find the message given the hash (pre-image).
- It must be practically impossible to find two different messages with the same hash (hash collisions).
Developers often confuse between these two kinds of hash functions which leads to following problems.
- Using non-cryptographic hashes in place of cryptographic hashes has serious security consequences.
- Using cryptographic hashes in place of non-cryptographic hashes has severe performance problems (either memory or speed).
The .Net BCL provides both cryptographic and non-cryptographic hash functions. Cryptographic hash functions such as MD5, SHA-1, SHA512 etc. that are available in the System.Security.Cryptography namespace are primary designed for security purposes (like digital signatures). On the other hand, non-cryptographic functions like Object.GetHashCode() is not meant for security considerations but can be used for performance reasons (like fast string comparisons and hash-tables).
The rest of the blog post will demonstrate the fact that Object.GetHashCode is not a cryptographic hash and how it (and other low bit count hashes) can be defeated.
Birthday attacks are used to defeat the “hash-collision” requirement for crypto-hashes with small bit lengths. It is based on the theory of the Birthday Paradox, commonly seen in Probability Theory textbooks. The “paradox’ comes from the fact that in a group of 41 randomly selected people, there is more than a 90% chance that at least two people share the same birthday. Also, In a group of 57 people, there is more than a 99% chance that at least two people share the same birthday.
The fact can be easily extended to hash functions too: For a hash algorithm with bit-size N with uniform distribution, a set of 2^(N/2) random hashes (hashes of random data) has a very good probability that at least two hashes in the set are the same.
In our case, the GetHashCode returns an Int32 which makes N = 32. So, effectively it takes just 2^(32/2) = 65536 hashes of random data to find a hash collision (which is BAD and NOT SECURE). However it still takes 2^32 = 4294967296 iterations or random trials for a successful pre-image attack.
Here’s a C# program which find collisions on the String.GetHashCode() function. Here’s how it works:
- The program randomly generates 100,000 strings of fixed length (in the order of 65,536 but improves our chances)
- Hash of each of these strings is calculated by calling String.GetHashCode() and is stored along with the string in a list.
- The list is then searched for duplicate hashes (collisions) and strings with colliding hashes are reported via the command line (String –> Hash format).
- If duplicates are not found, we try again (we are bound to find one or more collisions very soon).
Here’s a snapshot of a sample run (results would vary between runs as the program uses random numbers).
As a conclusion, I’d like to emphasize the fact that when dealing with real-world problems, developers need to carefully assess the requirements of security and performance and appropriately use crypto and non-crypto hashing functions. After all, security (or performance) of an application is only as strong (or as fast) as its weakest (or slowest) link.