Indexing encrypted data

Encrypted data and indexes


One thing I have been asked many times is how to create an index on top of encrypted data in SQL Server 2005.  In SQL Server 2005 the encryption functions are nondeterministic, which means that every time a function is called, the output will be different, even if exactly the same input was used; for example:





DECLARE @x varbinary(8000)

DECLARE @y varbinary(8000)



SET @x = EncryptByKey( key_guid( ‘key1’), ‘Test’ )

SET @y = EncryptByKey( key_guid( ‘key1’), ‘Test’ )

IF ( @x = @y )

    PRINT ‘ERROR: EncryptByKey returned the same output twice!!!!’


    PRINT ‘EncryptByKey returns different results every time it is called’






 As there is no way to predict the outcome of an encryption builtin call, there is no way to establish any relationship between two different values being encrypted. Therefore, creating an index on top of encrypted data will be of no use at all. It is possible to decrypt the data for the lookup operation, but it would be too expensive as it will result in a linear search. For example:


— …

— The following select will always return 0 rows,

— because EncryptByKey will never return the same value twice

SELECT CONVERT( nvarchar(20), DecryptByKey( SSN )) as SSN_decrypted,

        * FROM t_Customer

        WHERE SSN = EncryptByKey( key_guid( ‘key_Encryption’ ), N’111-11-1111′ )



— The following SELECT statement will work, but it

— will be very expensive, because it will result in a linear search

SELECT CONVERT( nvarchar(20), DecryptByKey( SSN )) as SSN_decrypted,

        * FROM t_customer

        WHERE DecryptByKey( SSN ) = N’111-11-1111′



  This raises a question: why did we choose to have a nondeterministic encryption function? In order to answer it, I want to start with a simplified explanation on how block ciphers work (i.e. DES or AES), which I hope will help explain this property better.


Block ciphers


  As the name implies, block ciphers work on blocks of plaintext/ciphertext, and every time the same plaintext is encrypted by the same key, the resultant ciphertext will always be the same. This raises some security considerations, such as how can we conceal patterns in the plaintext or how can we reuse a key to encrypt more than one plaintext.


  One mechanism that is used to address these issues is the use of the block cipher in different “cryptographic modes”, which usually consist on some feedback mechanism and simple operations such as XOR.


  The most simple of the cryptographic modes is called Electronic Codebook mode (ECB) and it consists of dividing the plaintext in blocks and then encrypting them independently. Sounds simple and it is deterministic, but unfortunately, this cryptographic mode allows an attacker to create a code book without any need to access the key. Imagine the following scenario: You are trying to encrypt a social security number of the form “xxx-xx-xxxx”, if we were using TRIPLE_DES, the block length would be 8 bytes, therefore the ciphertext would really be divided in two blocks:  Encrypt( key, “xxx-xx-x” ) + Encrypt( key, “xxx” ). An attacker can easily find out that any SSN that starts with the same numbers will always result in the ciphertext for the first block of ciphertext. It is not recommended to use this mode, especially if there are potential patterns to conceal or if the key is going to be used to encrypt more than one plaintext.


  The cryptographic mode we decided to use in SQL Server 2005 is Cipher Block Chaining mode (CBC). This mode uses the previous ciphertext block to feed the next one, by XORing the previous ciphertext block with the current plaintext block before encrypting it, and it repeats this for every block. But what should we do with the first block? As there is no previous ciphertext block, we need to initialize the system with something else; in this case, we will use an Initialization vector (IV). How do we choose an IV? If we select a fixed IV, then the first block will always encrypt to the same ciphertext and all subsequent blocks will be the same until we hit the first difference between 2 plaintexts. To avoid this, a randomly generated IV is used every time and it is included as the first block of the ciphertext in order to be able to decrypt it back.


How to perform lookups on encrypted data


     At this point, you are probably asking: “Good, thanks for the information, but I still need to create an index on encrypted data! Any suggestions?”. Well, actually, yes we have some ideas, and I really hope they can be of some help.


   First of all, we need to emphasize that in order to be able to index something we need to give away some information about it. How much information? The answer may be different for every situation, but here are some ideas as well as some pros and cons for each one of them. The basic idea behind the following suggestions is to create an additional column on which we will create the index.


   For the reminder of this article, we will assume the problem of indexing data that is used to uniquely identify entities in a table, for example, credit card numbers identifying customers or Social Security numbers identifying patients. For these scenarios, the data that we want to index is a primary key and is used as the identification method. Below we describe several suggestions for how to address this problem.


  Create a new identifier value


  It may be possible in some cases to use a column with an identity value completely unrelated to the data we are trying to protect, instead of using the protected data; for example, we could create a unique customer ID instead of using a social security number for identification.


  Unfortunately, because of the business logic and existing applications, this approach is not always possible.


   Index a hash of your data


   The idea is simple: create a new column to store the hash (for example, SHA1) of the plaintext data and then index this column.


   As the hash functions are deterministic, this sounds like a good approach, but its simplicity is also its worst enemy. An attacker can potentially create a dictionary with all possible values of plaintext offline and then she can just do a simple lookup of the indexing column and correlate that with her dictionary.


   Index a MAC of your data


   We can use a MAC (Message Authentication Code) of the plaintext to create a new indexing column. This approach is similar to using a hash, but it requires a secret key to calculate the MAC. This prevents an unauthorized user to use a general purpose dictionary of hashed values and it will also prevent her from creating a targeted dictionary without having access to the MAC key. Please note that an attacker who has access to the MAC key can generate a targeted dictionary.


  SQL Server 2005 doesn’t provide a function for computing a MAC, but it is possible to write a user-defined function that calculates a MAC using either SQL Server 2005 CLR or by reusing the existing builtin functions.


  Sample code for indexing a MAC


  I would like to include a short demo that hopefully will help to understand this better. I strongly recommend reading more about hashes and MACs – one good source you can use is Bruce Schneier’s Applied Cryptography (


  First, I create a certificate to protect my symmetric key and I protect the certificate itself using the DB master key (DBMK). This will allow me to control who can access the protected data by giving the proper permissions on the certificate and symmetric key without the need to memorize or hardcode any password.



*      This posting is provided “AS IS” with no warranties, and confers no rights.


* Authors:             Raul Garcia

*                      Laurentiu Cristofor

* Date:         11/02/2005

*                      09/01/2006

* Description:


*  Create an index based on a MAC (message authentication code)

* to be able to do SEEK operations on a table that contains

* encrypted data that is typically used as an index, such as SSN

* or credit card numbers.

*   It also demonstrates a few other topics such as EXECUTE AS.


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *    * Security note:  

* This demo is intended to demonstrate how to protect data

* at rest, while trying to keep it easily accessible to users while

* the SQL Server instance is running; the root for the key hierarchy

* in this demo is the service master key (SMK).


*     (c) 2005 Microsoft Corporation. All rights reserved.




— This certificate will be used to protect the data-encryption key

CREATE CERTIFICATE cert_ProtectEncryptionKeys WITH SUBJECT = ‘Data encryption key protection’



— This key will be used to protect our plaintext data




  I will generate a random value as a MAC key. Also, because we may want to use this mechanism for more than one table, we would want to be able to store a MAC for each column that we index this way. So I will create a table to store all the MAC keys, as well as a certificate and a symmetric key to protect them.  I will be using the table id as a primary key for the MAC keys. This will allow me to keep track of one MAC key for each table. Of course, if we would need to have several MAC keys per table, because we might want to index several encrypted columns, then we would need to use a column id in addition to the table id, but for the purpose of this demo, I am keeping it simple.


— This is the certificate that will protect our MAC key-encryption key

CREATE CERTIFICATE cert_ProtectIndexingKeys WITH SUBJECT = ‘Data indexing key protection’



— This key will be used to protect the MAC keys




— This table will store the encrypted MAC keys for all tables

CREATE TABLE t_MacIndexKeys( table_id int PRIMARY KEY, Mac_key varbinary(100) not null )



I will also define the MAC user-defined function. The function itself is quite simple, and I would strongly suggest you to review this code and my notes in it and modify it as needed. The MAC function that I am using is defined as SHA1( plaintext + key ). Also note that this function may return null in case of an error.


— This function is used to generate a MAC based on the plaintext.

— Parameters:

      @Message       Message we want to calculate the MAC for.

      @Table_id      The table id (object_id),

                     this value will be used to find the corresponding

                     MAC key from the t_MacIndexKeys table

      return value: a varbinary(24) MAC if succeeded, null otherwise

— NOTES:  This function will run under the DBO context to be able

— to access the required keys and calculate the MAC

 Only grant execute permission to authorized principals

— who need to access the protected data and/or use

— the module signature feature to grant access via an application.

 This function relies on the DBMK being available;

— if the DBMK is not protected by the SMK, the caller must

— open the DBMK explicitly before calling this function.

— SECURITY NOTE:   A potential attacker with access to the database

— and with execute permissions on this function while the system

— is online and the DBMK is opened can generate a dictionary with

— all the possible values for the protected data

— and its corresponding MAC based on the table id.

 By using a different key for every table, the attacker

— may be slowed down a little bit, depending on the protected data

— domain space. A good advice is to monitor irregular activity

— on this function.

CREATE FUNCTION MAC( @Message nvarchar(4000), @Table_id int )

RETURNS varbinary(24)




        declare @RetVal varbinary(24)

        declare @Key   varbinary(100)

        SET @RetVal = null

        SET @key    = null

        SELECT @Key = DecryptByKeyAutoCert( cert_id(‘cert_ProtectIndexingKeys’), null, Mac_key) FROM t_MacIndexKeys WHERE table_id = @Table_id

        if( @Key is not null )

               SELECT @RetVal = HashBytes( N‘SHA1’, convert(varbinary(8000), @Message) + @Key )

        RETURN @RetVal




The following stored procedure will be used to generate a new random varbinary value that we can use as a MAC key. As we don’t have any cryptographic random byte generator builtin available out of the box, we will use the Initialization Vector (IV) of the EncryptByKey builtin to generate entropy on our new varbinary MAC key. I also strongly suggest to review this code and to modify it as needed.


 This SP creates a new randomly generated MAC key

— for the table specified by table_id

— Parameters:

      @Table_id      The table id (object_id) related to the MAC key.

 NOTE:  This SP will run under the DBO’s context

— to be able to access the required keys and tables.

 It will not return any new or existing key blob back,

— but it is recommended to only grant execute permissions

— to trusted principals who need to be able to create new MAC keys.

CREATE PROC AddMacForTable @Table_id int



        declare @Key    varbinary(100)

        declare @KeyGuid uniqueidentifier

        SET @KeyGuid = key_guid(‘key_Indexing’)

        — Open the encryption key

        — Make sure the key is closed before doing any operation

— that may end the module, otherwise the key will

— remain opened after the store-procedure execution ends

        OPEN SYMMETRIC KEY key_Indexing DECRYPTION BY CERTIFICATE cert_ProtectIndexingKeys


        — The new MAC key is derived from an encryption

— of a newly created GUID. As the encryption function

— is not deterministic, the output is random

        — After getting this cipher, we calculate a SHA1 Hash for it.

        SELECT @Key = HashBytes( N‘SHA1’, ENCRYPTBYKEY( @KeyGuid, convert(varbinary(100), newid())) )


— Protect the new MAC key

        SET @KEY = ENCRYPTBYKEY( @KeyGuid, @Key )


        — Closing the encryption key

        CLOSE SYMMETRIC KEY key_Indexing

        — As we have closed the key we opened,

— it is safe to return from the SP at any time


        if @Key is null


               RAISERROR( ‘Failed to create new key.’, 16, 1)


        INSERT INTO t_MacIndexKeys VALUES( @Table_id, @Key )



And finally, we create a sample table with a Social Security Number column (SSN) that we want to use as the primary key. The table has been modified to have 2 columns for storing the SSN:

* SSN_cipher is used to store the SSN in an encrypted way, such that someone with the right key can recover all the SSNs from the table.

* SSN_index is used to store MACs based on the SSN and on this table’s MAC-key. This column will be used for identity lookups.


— Table encryption example:

CREATE TABLE t_RawCustomer(

        — MAC-based index (SSN)

        SSN_index      varbinary(20) PRIMARY KEY,   

        — ciphertext (SSN)

SSN_cipher     nvarchar(60),                        

        — Two extra columns with information.

name    nvarchar(max),

        ExtraData      nvarchar(100)




We also have to call the stored procedure that we created, to generate and store the new table MAC key.


— Create a new MAC key for this table

declare @objid int

SET @objid = object_id(‘t_RawCustomer’)

EXEC AddMacForTable @objid



Now I will create a few objects that will make it easier to work with the data table. These objects are not intended to increase the security of the system, only its usability. Let’s start with a view that will display the SSN in plaintext form when the viewer has proper access.


— Now, let’s create an easy to use view for the table

— NOTE: as we are using DecryptbyKeyAutoCert to decrypt the data,

— it is not necessary to explicitly open the key

      Also note that this means that in order to see

— the plaintext value for SSN, the caller requires permission

— to access both the certificate’s private key and the

— data symmetric key.

CREATE VIEW v_Customer




        — We want to leave the index in the view as

— it may be useful for SEEK operations

        SSN_index as SSN_Id,

        — Use the certificate to automatically open the encryption key.

        — Additionally use the index (MAC(k1, PT)) to verify the

— decryption and prevent data tampering such as

— copying encrypted values from one row to another

convert( nvarchar(15), DecryptbyKeyAutoCert(

cert_id(‘cert_ProtectEncryptionKeys’), null,

SSN_cipher, 1, SSN_index)) as SSN,

        — the rest of the data will remain the same

        Name, ExtraData FROM dbo.t_RawCustomer



Next, we create a trigger that will fire during an insert in the base table. This trigger expects that the SSN_cipher value will be a SSN plaintext and will perform the MAC calculation and encryption on it. The trigger will validate that the SSN is not null, as well as verify that the caller can generate both MAC and ciphertext values based on the keys being used.

Note that, for inserting data using our trigger, it is a prerequisite to open the key that is used to encrypt the data.


— Intercept the inserts and make sure the inserted data is properly generated

CREATE TRIGGER trig_ProtectSSN on t_RawCustomer



        declare @Index varbinary(24)

        declare @KeyGuid uniqueidentifier

        declare @Cipher nvarchar(60)


        if( select count(*) from inserted where SSN_cipher is null ) > 0

               RAISERROR( ‘Cannot store null as protected data. ‘, 16, 1)



               SET @KeyGuid = key_guid(‘key_Encryption’)

               SELECT @Index = dbo.MAC( SSN_cipher,

object_id(‘t_RawCustomer’) ) from inserted


               if( @Index is null

 OR @KeyGuid is null

 OR encryptbykey( key_guid(‘key_Encryption’), 0x00)

     is null )


                       RAISERROR( ‘Cannot Insert protected data. Either the encryption or indexing keys are not available or the indexing key is not valid for MAC generation.’, 16, 1)



                       INSERT INTO dbo.t_RawCustomer select


                         encryptbykey( key_guid(‘key_Encryption’),                                              SSN_cipher, 1, @Index ),

                         Name, Extradata

                       from inserted




In a similar way, we will create a trigger that will fire during updates and that will prevent arbitrary changes on the SSN fields (cipher or MAC).


— Intercept any attempt to modify the RawData table

— and prevent anyone from modifying the cipher values

CREATE TRIGGER trig_ProtectSSNUpdate on t_RawCustomer



        if( COLUMNS_UPDATED() & 3 ) > 0

               raiserror( ‘Cannot update protected columns. Drop the row and create a new one with the updated information.’, 16, 1 )



               UPDATE t_RawCustomer

SET Name = ins.Name, ExtraData = ins.ExtraData

FROM inserted ins, t_RawCustomer orig

WHERE ins.SSN_index = orig.SSN_index




Here is a usage example :


— test: Try to insert without opening the key; this should fail

insert into t_RawCustomer values ( null, N‘111-11-1110’, N‘Customer 0’, N‘New data’ )



— test: Open the symmetric key before we can use it

OPEN SYMMETRIC KEY key_Encryption DECRYPTION BY CERTIFICATE cert_ProtectEncryptionKeys



insert into t_RawCustomer values ( null, N‘111-11-1111’, N‘customer 1’, N‘New data 1’ )

insert into t_RawCustomer values ( null, N‘111-11-1112’, N‘customer 2’, N‘New data 2’ )






— Compare the raw table with the view

select * from t_RawCustomer

select * from v_Customer



— Select using the SSN in plaintext.

— Note that this will require a full table scan and is inefficient.

— We cannot make this query take advantage of an index.

SELECT * FROM v_Customer

        where SSN = N‘111-11-1112’



— Now select using the SSN_id column, which will use

— the index that we have set up for this column.

SELECT * FROM v_Customer

        where SSN_Id = dbo.MAC( N‘111-11-1112’, object_id(‘t_RawCustomer’) )



And finally, here is a Least Privileged User example. These are the minimum permissions that a low-privileged user will need for different operations on the table/view that we created. Note that this is not an exhaustive list of possible operations or permissions on our table; it is just an example of some new and interesting permissions you may want to get familiar with. I strongly suggest to review the SQL Server 2005 documentation for these objects to get a complete list of permissions. In this particular example, I want to emphasize the following operations and permissions:

  • Perform identity lookup queries (i.e. SSN = ‘xxx’ ):

    • GRANT SELECT ON v_Customer to LUA_principal

    • GRANT EXECUTE ON dbo.MAC to LUA_principal. The SP will run under elevated context, therefore no explicit permission on the symmetric key protecting the MAC-keys is needed.

    • GRANT VIEW DEFINITION ON dbo.t_RawCustomer to LUA_principal. This principal will need o access the base table index.

    • NOTE: With these set of permissions, the principal can select from the view v_Customer, but cannot recover the SSN plaintext.

  • Insert a new record

    • All of the Identity lookup queries permissions

    • GRANT INSERT ON t_RawCustomer TO LUA_principal

    • GRANT VIEW DEFINITION ON SYMMETRIC KEY::key_Encryption TO LUA_principal. Needed for the key_guid operation.

    • Grant some way to open the symmetric key. In the example we show 1 mechanism.


— Create a database-only principal for testing purposes




GRANT SELECT ON v_Customer to LUA_principal



— We want this principal to be able to perform indexed queries,

— so we need to grant him permissions on the MAC procedure

GRANT EXECUTE ON dbo.MAC to LUA_principal



— We also need to grant the principal view definition on the

— base table, so he can obtain its index

GRANT VIEW DEFINITION ON dbo.t_RawCustomer to LUA_principal



— Run a simple test as this new principal

— You will notice that the SSN column shows only NULLs.

— This is because this principal has no access to the encryption key

EXEC (‘select * from v_Customer’) as USER = ‘LUA_principal’

— Note the new principal has no permissions at all on the base table!

EXEC (‘select * from t_RawCustomer’) as USER = ‘LUA_principal’



— Now run a select with a WHERE clause

— As it is possible to execute the MAC function,

— and the caller knows exactly what SSN he is looking for,

— it is possible to run the query.

— Notice that the SSN column shows NULL.

EXECUTE AS USER = ‘LUA_principal’


SELECT * FROM v_Customer

        where SSN_Id = dbo.MAC( N‘111-11-1112’, object_id(‘t_RawCustomer’) )





— Now let’s give the LUA permission to update the data

GRANT UPDATE ON v_Customer(ExtraData) TO LUA_principal



EXECUTE AS USER = ‘LUA_principal’


update v_Customer set ExtraData=N‘data updated by LUA’

        where SSN_Id = dbo.MAC( N‘111-11-1112’, object_id(‘t_RawCustomer’) )


SELECT * FROM v_Customer

        where SSN_Id = dbo.MAC( N‘111-11-1112’, object_id(‘t_RawCustomer’) )





— Now give our LUA_principal permission to insert data into

— the table

GRANT INSERT ON t_RawCustomer TO LUA_principal



EXECUTE AS USER = ‘LUA_principal’


— NOTE: This one will fail as we don’t have access to the key

— for the data or the certificate protecting it!

insert into t_RawCustomer values ( null, N‘111-11-1113’,

     N‘customer 3’, N‘New data 3’ )


SELECT * FROM v_Customer





— To give access to the encryption key to the LUA principal,

— we have 2 options:

— Option 1: Grant direct permission to the key


—       TO LUA_principal

     GRANT CONTROL ON CERTIFICATE::cert_ProtectEncryptionKeys

—       TO LUA_principal

— Option 2:


—       TO LUA_principal

  Create a SP that opens the key for the calling context

 and grant the LUA principal access to it.

— For this demo we will use option 2.

— The main advantage is to limit the control

— permission on the certificate

— as it may be protecting more than 1 key


— We need it for key_guid(‘key_Encryption’),

— otherwise we cannot look for the GUID




CREATE PROC sp_GetCustomersKeyAccess



        OPEN SYMMETRIC KEY key_Encryption DECRYPTION BY CERTIFICATE cert_ProtectEncryptionKeys



GRANT EXECUTE ON sp_GetCustomersKeyAccess TO LUA_principal




— Test the new SP

EXECUTE AS USER = ‘LUA_principal’


— Empty

SELECT * FROM sys.openkeys

EXEC sp_GetCustomersKeyAccess

— Now we have the required key open

SELECT * FROM sys.openkeys


— and we can close it when it is not needed anymore



SELECT * FROM sys.openkeys






— Now try the failed insert once more…

EXECUTE AS USER = ‘LUA_principal’


EXEC sp_GetCustomersKeyAccess


insert into t_RawCustomer values ( null, N‘111-11-1113’, N‘customer 3’, N‘New data 3 inserted by ‘ + user_name() )

insert into t_RawCustomer values ( null, N‘111-11-1114’, N‘customer 4’, N‘New data 4 inserted by ‘ + user_name() )


— Note that because LUA_principal doesn’t have direct access

— to the certificate protecting the encryption key,

— the SSN column will still display NULLs.

— As we only wanted to grant this principal permission

— to insert new data and not to read arbitrary data,

— this should be sufficient

SELECT * FROM v_Customer


— Close the key after using it






— finally, if you want to allow the LUA principal to see

— all entries in the table.

— SECURITY NOTE; After granting control on the certificate,

— the principal will have full access to the PVK,

— this means thsi principal can decrypt

— and sign anything using this certificate.

 Make sure this permission is granted only to trusted principals

— and try to monitor activity on this certificate

GRANT CONTROL ON CERTIFICATE::cert_ProtectEncryptionKeys TO LUA_principal



EXEC (‘select * from v_Customer’) as USER = ‘LUA_principal’



Comments (10)

  1. In SQL 2005 encryption functions are nondeterministic, which means that every time a function is called,…

  2. jdyoumans says:

    Good article.  The more I read, the dumber I feel.  Thanks!

  3. Mike C# says:

    Good stuff, quick question.  It seems as if this MAC method is designed to provide "indexing" of encrypted data with protection against dictionary attacks, but it doesn’t look like it provides additional protection against frequency analysis.  Is this a fair assessment?  Thanks!

  4. raulga says:

     Thanks a lot for your comment Mike C#.

     The main objective of this article was to suggest alternative mechanisms to allow identity searches (i.e. SELECT … WHERE <column> =  <value>) in SQL Server 2005 without the need to do a linear search (i.e. decrypt every single row) and explain why the IV is used in SQL Server 2005 EncryptByKey builtin.

      The idea behind the HMAC solution I proposed was thought considering concealing patterns in the cipher text column that could be recovered (i.e. keep using the CBC cryptographic mode with a random IV generated, just like SQL Server does today) but creating a column based on the plaintext (in such a way that the application can still query using the known plaintext) and using well known cryptographic operations to solve the indexing problem our customers had. As you suggested, I decided to use an HMAC to help against dictionary attacks as opposed to using a simple hashing function.

     I am not a cryptographer or a cryptoanalyst myself, and to b honest I didn’t consider frequency analysis when I discussed about this idea with my peers mostly because I assumed, based on my understanding, that it shouldn’t be a problem. I don’t know how the HMAC column (or the alternative Id column I also suggested in the article) can be used in a frequency analysis attack, but as I understand this technique, using an HMAC shouldn’t help in a frequency analysis as the result of the operation (A cryptographic hash of the plaintext + some key) should not contain patterns of the plaintext that may leak information on the distribution of characters (or other information blocks) in it, but I may be misinterpreting or missing some information.

     I will appreciate if you or anyone would provide more information (i.e. a link or a white paper or book suggestion) on how frequency analysis can be used against an HMAC or other attacks against such solution. I will also try to look some information about this by myself, and I will be more than glad to update this article with the information I find as I am sure a lot of people will be interested.

     -Raul Garcia


     SQL Server Engine

  5. Mike C# says:

    Hi Raul,

    I guess the point I was trying to make is that the random IV generation is specifically designed to prevent frequency analysis on the data sets, so that duplicate data encrypted with the same key does not generate the same result.  But you circumvent the random IV generation with a hash or MAC code stored in separate columns (or tables) related back to the encrypted data.

    Just as a very simple example (extremely simple…), if the Title column of the HumanResources.Employee AdventureWorks table were encrypted, you would not be able to tell which employee performed which job without encrypting the data.  If you added a hash code column (or MAC column), you could look at the frequency of occurrence of certain hash codes (or MAC codes) as a starting point for determining that the items that occur the least are probably higher-level management positions (e.g., Finance Manager, Q.A. Manager, CEO, CFO) versus those that occur most (Product Technicians, Sales Reps, etc.)

    Would it help eliminate these patterns if you were to store multiple MAC keys for a single table?  For instance with three or four MAC keys you would cut down the number of occurrences of duplicate hash or MAC codes significantly.  With 10 MAC keys you could potentially reduce the recurrences of “Product Technician – WC60” from my example to the same levels as “Network Administrator” or “Database Administrator”.  And for the sample data, you could potentially reduce the occurrences of all data to about the same level with about 25 or so different MAC keys on the same table.  I don’t know if joining to another table to calculate the MACs would defeat the efficiency gains though?

    For your example of encrypting SSN’s though, unless there were a lot of duplicate SSN’s (unlikely), the frequency analysis should not be an issue.  The problem, I think, comes in when you are encrypting columns that could contain duplicate entries in different rows.

  6. raulga says:

      I see what you are saying… my understanding of frequency analysis is analyzing the patterns within the same ciphertext. I know the problem you mention as a correlation or statistical analysis, and you are correct.

     If you use the encrypted (i.e. encryption without the IV, HMAC-based keys, or any other variation) you are leaking statistical information that can be used to guess the nature of the plaintext even if the attacker cannot directly decrypt it.

      I have never seen this problem being solved by using n unique keys as you suggested per plaintext; I am guessing that the attacker can still analyze the information available by joining multiple tables, bt I am not an expert on this area myself.

     A solution I have read for this particular problem is to create buckets (i.e. use the last 4 digits of the SSN to generate the key on the table, or simply randomly assign the buckets), but the keys will not be unique anymore, and extra processing is necessary (i.e. decrypt all rows from the bucket until you find the unique row you are really looking for).

     -Raul garcia


      SQL Server Engine

  7. raulga says:

     K07 found a bug in the sample code ( The column storing the ciphertext is declared as nvarchar instead of varbinary. It works properly for the example because of the implicit conversion from varbinary to nvarchar, but I have to recommend against this practice as the ciphertext is not a proper Unicode string, and some other tools may have problems when accessing the raw data.

    — Table encryption example:

    CREATE TABLE t_RawCustomer(

    — MAC-based index (SSN)

    SSN_index varbinary(20) PRIMARY KEY,

    — ciphertext (SSN)

    SSN_cipher varbinary(120),  –nvarchar(60),  

    — Two extra columns with information.

    name nvarchar(max),

    ExtraData nvarchar(100)



     After running a couple of quick tests after the correction, I realized that the insert calls will require an explicit conversion from nvarchar to varbinary, for example:

    insert into t_RawCustomer values ( null, convert( varbinary(100), N’111-11-1111′), N’customer 1′, N’New data 1′ )

     It is a relatively small change, but I think it was worth mentioning it.

     Thanks a lot,


  8. kpantiga says:

    Does it makes sense just to store the hash of the plain text and not the encrypted column?


  9. raulga says:

     It really depends on what do you want to do with them. Hashes are one-way functions, and therefore if you only store the hashes it will be very unlikely to recover the original information, but if you know the message, you can always calculate its hash and match it.

     If you intent to store hashes as a mechanism to identify if somebody knows a secret (i.e. password table), I would recommend using a salt in order to avoid rainbow attacks and protect the table (i.e. grant SELECT permission only to privileged principals in your system).

  10. Introduction It seems to be the case that cryptography is often employed to solve problems for which