SYSK 89: Hashing Explained...

Article
03/23/2006

So, you hear all the time – Hashtable, hash value, etc. You’ve seen GetHashCode method in Object type. But what is hashing?

Hashing is a process of applying one of many hash algorithms. It’s frequently used to create/use hash tables, which significantly increase search efficiency making it a constant time O(1) (compare that binary tree search, which is a function of O(logn)). A hash table is simply an array of data (the actual table where the data to be searched is stored) and a mapping function, known as a hash function.

http://en.wikipedia.org/wiki/Hash_function site gives the following definition:

A hash function or hash algorithm is a function for examining the input data and producing an output of a fixed length, called a hash value. Two different inputs are unlikely to hash to the same hash value.

There are many hash functions. You might’ve heard of secure hash algorithms such as SHA, or other like MD2, MD4, MD5, etc. But to understand it, let’s look at two simple examples below (as Damien Morton correctly stated in the comment below, these are not good examples for "real-life" usage, but they do convey the point in an easy to understand way):

public int DivisionHashCode(string data)
{
// Division method hashing
int result = 0;

    foreach (char c in data.ToCharArray())

    {
result += c;
}

result %= 2053; // let's assume our max string is 2048 chars... find the next prime number

    return result;
}

public int MiddleSquareHashCode(string data)
{
// Middle square algorithm, where M = 128, k = 7, w = 1
int result = 0;

    foreach (char c in data.ToCharArray())
{
result += c;
result = (result*result)>>6; // Thanks to Damien Morton for the correction of a bug in the original post

    }

    return result;
}

The returned hash value is, in essence, the index into the array of data. So, instead of searching for a match by comparing each key (linear search), or by leveraging the powers of the binary tree search, we simply convert the key (string) to a hash value (integer), which gives us an index into the data table. Then, it’s a simple data[index] operation to get the value…

While simplified, this, in essence, is how hashing works…

*** Special thanks to Damien Morton for the comment below ***

SYSK 89: Hashing Explained...

Additional resources