When I first looked at encryption and hashing, the two seemed the same to me, which was confusing. So here we will discuss two topics: 1) the basic difference between encryption and hashing, and 2) how to implement encryption and hashing in SQL Server.
Recently I came across a scenario where we had to perform a one-way conversion of the data, and the result of the conversion needed to be the same every time the same data was converted to cipher text.
So the problem was something like this.
Consider that we have the name of a person, let’s say “John”.
And let’s say we have an algorithm called “x” for generating the cipher text.
So we need to apply the algorithm “x” to “John”, which would result in some cipher text. But if “x” is an encryption algorithm that mixes in a fresh random value on every call (as many encryption functions do), each time you apply “x” to “John” you would get a different result (cipher text). I am trying to depict this using a diagram.
Case: We are using encryption.
If you look at the output, applying the encryption algorithm to “John” gives the cipher text “XYZ”, but applying it to “John” again gives “ABC”. So the cipher text is different every time you apply the encryption algorithm to the same source text.
This is not the case when we use hashing. The output (more precisely called a hash value or digest) is the same every time you apply the hashing algorithm to the same input. Let’s consider the same example. It would look something like this:
Here Y is the hashing algorithm that we are using to convert “John” to the cipher text. If you look at the output of the hashing algorithm, you will see that it is the same both times. Isn’t that amazing?
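To make the "same output every time" property concrete, here is a minimal sketch in Python. It assumes SHA-256 from the standard `hashlib` module stands in for the algorithm “Y”; the article does not name a specific algorithm, and `hash_name` is a hypothetical helper used only for illustration.

```python
import hashlib


def hash_name(name: str) -> str:
    """Return the SHA-256 digest of `name` as a hex string.

    SHA-256 is an assumption standing in for the article's algorithm "Y".
    """
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


first = hash_name("John")
second = hash_name("John")

print(first == second)             # True: same input, same hash value
print(hash_name("john") == first)  # False: a different input gives a different hash here
```

Note that the comparison is exact: changing a single character of the input (here the capital “J”) produces a different hash value.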
Yes, it is amazing. But hashing has certain restrictions, or limitations, so let me tell you about them.
When we convert plain text to cipher text using an encryption algorithm, we can always convert the cipher text back to the plain text, provided we have the right key. In other words, we can get our original data back. Let’s have a look at the diagram which explains it.
Here X is a decryption algorithm.
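The round trip described above, together with the earlier point that encrypting “John” twice gives different cipher text, can be sketched as follows. This is a toy cipher built only from the Python standard library for illustration: `toy_encrypt`, `toy_decrypt` and `_keystream` are hypothetical names, the construction (a random nonce plus an HMAC-SHA256 keystream) is an assumption chosen to keep the example self-contained, and it should not be used to protect real data.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (HMAC-SHA256, counter style)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with a fresh random nonce, so repeated calls give different output."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body  # the nonce is stored alongside the cipher text


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Recover the plain text using the same key and the stored nonce."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = b"demo key, not a real secret"
c1 = toy_encrypt(key, b"John")
c2 = toy_encrypt(key, b"John")

print(c1 != c2)              # almost certainly True: each call uses a new random nonce
print(toy_decrypt(key, c1))  # b'John'
print(toy_decrypt(key, c2))  # b'John'
```

Both cipher texts decrypt to the same plain text, which is exactly what hashing cannot offer: there is no key that turns a hash value back into “John”.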
But with hashing, once we convert the plain text to cipher text, it cannot be converted back to the plain text or the original data. The figure describes the scenario.
So we cannot convert the cipher text back to the plain text when we are using hashing. There are some other problems with hashing algorithms, which I discuss below.
One of the problems we should be concerned about is “collision”. When choosing a hashing algorithm, we should pick one that makes collisions as unlikely as possible.
So what is a “collision”? When I first heard about it I was confused, so let me try to explain. A hash collision is a situation where two different inputs to the hash function result in the same output, or the same cipher text. Because a hash function maps an unlimited set of possible inputs to a fixed-size output, collisions can never be ruled out entirely; a good algorithm only makes them extremely hard to find. It would look something like this:
So this is one of the biggest problems with hash functions.
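A collision is easy to see with a deliberately weak hash. The sketch below is an illustration under stated assumptions: `tiny_hash` is a hypothetical function that keeps only the first 8 bits of SHA-256, so it has just 256 possible outputs and a collision must appear within 257 distinct inputs. No collisions are publicly known for full SHA-256; the truncation is what makes them trivial here.

```python
import hashlib
from itertools import count


def tiny_hash(text: str) -> str:
    """Deliberately weak hash: the first 2 hex characters (8 bits) of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:2]


def find_collision():
    """Try inputs name0, name1, ... until two of them share a tiny_hash value."""
    seen = {}  # maps hash value -> first input that produced it
    for i in count():
        candidate = f"name{i}"
        h = tiny_hash(candidate)
        if h in seen:
            return seen[h], candidate, h
        seen[h] = candidate


a, b, h = find_collision()
print(f"{a!r} and {b!r} both hash to {h!r}")
```

The more output bits an algorithm has, and the better it is designed, the longer a search like this takes, which is why the choice of hashing algorithm matters.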
SE, Microsoft Sql Server.
Technical Lead, Microsoft Sql Server.