How rainbow tables work

This year, I heard the term “rainbow tables” in regards to passwords and how hackers use them to break encrypted passwords.  I originally looked it up on Wikipedia but I couldn’t understand what I was reading (now that I know what they are, it makes a lot more sense).

I did a few Internet searches and the best one was this one at kestas.kuliukas.com (from which this post borrows heavily).  But I thought I’d toss my own hat into the mix and talk about how (I think) they work.

A rainbow table is a way of doing cryptanalysis very quickly and efficiently.  Suppose that you are a hacker and you have acquired a database of usernames and encrypted passwords. What are you to do? You can’t just go and logon to the user’s account because to do that, you need to know his password in clear text.

For you see, many web portals today do not store passwords in their backend databases in clear text (the ones built by amateurs still do).  Instead, they encode the password using a hash function, which is basically a way of condensing a given set of data into a condensed string.  For example, the MD5 algorithm encrypts password “MyPassword” as 48503dfd58720bd5ff35c102065a52d7 for the user terry@example.com.

If you, as a hacker, got the password above, you wouldn’t know what my password is just by looking at it. 

image

Instead, you’d refer to your handy-dandy rainbow table (you’re trying to do this under the radar and won’t look at this md5 reverser site).

You see, you could just take an MD5 hash of every word in the English language, waiting until you’d got a hit.  But that would take forever.  Besides, you’d also have to hash substitute letters like “MyP@ssword”.  And man, to do this for every password?

Or, you could do a little bit of work ahead of time and pre-compute every word in the English language and sort the results, and then store them in a database.  That way, the hash that you acquire you could simply look up in your table.  Since you’ve already pre-sorted the database, this lookup is pretty quick.  But that also takes a lot of processing power and takes up a lot of disk space.

image

Or, you could create a rainbow table.  A rainbow table works by efficiently compressing a table of pre-computed hashes.  You still have to do a lot of pre-computing of the hashes, but it doesn’t take nearly as much disk space to store everything.

Rainbow tables work by taking a hash of a string of text, and then “reducing” the hash to create a new string, and then reducing it again.  There are various ways to reduce a string of text, but let’s suppose you want the first 8 characters because that’s what many passwords are.

Okay. You want to create a rainbow table of hashed passwords and you know that the most common one is 12345678:

hashMD5(12345678) = 25d55ad283aa400af464c76d713c07ad

The first 8 characters of that are 25d55ad2.  This is called “reducing the hash.”  Then you take another one:

hashMD5(25d55ad2) = 5c41c6b3958e798662d8853ece970f70

The first 8 characters of that are 5c41c6b3:

hashMD5(5c41c6b3) = 4e3d15bea09e7455aa362d3cf89838fc

The first 8 characters of that are 4e3d15be:

hashMD5(4e3d15be) = f439b57135045b9d365dd867fca1e316

And so forth.  You pick a criteria to stop at (let’s say 100,000 hashes) at which point you have a chain of 100,000 MD5 hashes and the last one is 78ba340e23a492704f63e5239be65495.  But you don’t store 100,000 hashes.  Instead, you store the first plain text and the last (100,000th) hash.  This represents the first chain in your rainbow table.  It’s a chain that represents 100,000 hashes but only stores two values.  In this example, the first stored chain is:

12345678 –> f439b57135045b9d365dd867fca1e316

You would then begin a new starting point to begin the next chain.  It could be a reduction of the previous hash, or some other criteria.  Suppose for our example we had 25 chains.  This represents 25 x 100,000 = 2.5 million hashes.  It might look something like this:

Chain 1: 12345678 –> f439b57135045b9d365dd867fca1e316
Chain 2: fca1e316 –> 8c03e407bff63d06cd1b04d415678d51
Chain 3: 15678d51 –> 9673c8a82554a9baf8a1539a94b3c407

Chain 25: a9baf8a15 –> 38a44053779a4f40c6791e90376e207d

With this, you now have a pretty good table of hashes but you’re barely using any disk space – only 25 entries.  You could store it in a text file!  You’re now ready to start searching. 

Okay, you’ve got your hashed passwords that someone posted to PasteBin.  Below is a flowchart of the process.  You start off with the hashed text (the password) and start checking to see if it exists in your database.  If so, you go to the start of the chain and start hashing until you get a match.  And you will get a match because you’ve got it already in your database.

Of course, if the database isn’t big enough you won’t get a hit to begin with.  But if it is big enough, this is how you crack through it.

image

At worst, you only have to hash and reduce 100,000 MD5 hashes.  Because MD5 is a fast algorithm, that won’t take very long on modern processors to iterate through that chain.

That’s how rainbow tables work in a nutshell, but it doesn’t address the problem of hash collisions which is a pretty significant problem to overcome.  But I’ll get into those in a future post.