Indexing encrypted data

Article
03/12/2006

Encrypted data and indexes

One thing I have been asked many times is how to create an index on top of encrypted data in SQL Server 2005. In SQL Server 2005 the encryption functions are nondeterministic, which means that every time a function is called, the output will be different, even if exactly the same input was used; for example:

CREATE SYMMETRIC KEY key1 WITH ALGORITHM = AES_256 -- TRIPLE_DES

ENCRYPTION BY PASSWORD = 'My Password!'

DECLARE @x varbinary(8000)

DECLARE @y varbinary(8000)

OPEN SYMMETRIC KEY key1

DECRYPTION BY PASSWORD = 'My Password!'

SET @x = EncryptByKey( key_guid( 'key1'), 'Test' )

SET @y = EncryptByKey( key_guid( 'key1'), 'Test' )

IF ( @x = @y )

PRINT 'ERROR: EncryptByKey returned the same output twice!!!!'

ELSE

PRINT 'EncryptByKey returns different results every time it is called'

CLOSE SYMMETRIC KEY key1

DROP SYMMETRIC KEY key1

As there is no way to predict the outcome of an encryption builtin call, there is no way to establish any relationship between two different values being encrypted. Therefore, creating an index on top of encrypted data will be of no use at all. It is possible to decrypt the data for the lookup operation, but it would be too expensive as it will result in a linear search. For example:

-- ...

-- The following select will always return 0 rows,

-- because EncryptByKey will never return the same value twice

SELECT CONVERT( nvarchar(20), DecryptByKey( SSN )) as SSN_decrypted,

* FROM t_Customer

WHERE SSN = EncryptByKey( key_guid( 'key_Encryption' ), N'111-11-1111' )

-- The following SELECT statement will work, but it

-- will be very expensive, because it will result in a linear search

SELECT CONVERT( nvarchar(20), DecryptByKey( SSN )) as SSN_decrypted,

* FROM t_customer

WHERE DecryptByKey( SSN ) = N'111-11-1111'

This raises a question: why did we choose to have a nondeterministic encryption function? In order to answer it, I want to start with a simplified explanation on how block ciphers work (i.e. DES or AES), which I hope will help explain this property better.

Block ciphers

As the name implies, block ciphers work on blocks of plaintext/ciphertext, and every time the same plaintext is encrypted by the same key, the resultant ciphertext will always be the same. This raises some security considerations, such as how can we conceal patterns in the plaintext or how can we reuse a key to encrypt more than one plaintext.

One mechanism that is used to address these issues is the use of the block cipher in different “cryptographic modes”, which usually consist on some feedback mechanism and simple operations such as XOR.

The most simple of the cryptographic modes is called Electronic Codebook mode (ECB) and it consists of dividing the plaintext in blocks and then encrypting them independently. Sounds simple and it is deterministic, but unfortunately, this cryptographic mode allows an attacker to create a code book without any need to access the key. Imagine the following scenario: You are trying to encrypt a social security number of the form “xxx-xx-xxxx”, if we were using TRIPLE_DES, the block length would be 8 bytes, therefore the ciphertext would really be divided in two blocks: Encrypt( key, “xxx-xx-x” ) + Encrypt( key, “xxx” ). An attacker can easily find out that any SSN that starts with the same numbers will always result in the ciphertext for the first block of ciphertext. It is not recommended to use this mode, especially if there are potential patterns to conceal or if the key is going to be used to encrypt more than one plaintext.

The cryptographic mode we decided to use in SQL Server 2005 is Cipher Block Chaining mode (CBC). This mode uses the previous ciphertext block to feed the next one, by XORing the previous ciphertext block with the current plaintext block before encrypting it, and it repeats this for every block. But what should we do with the first block? As there is no previous ciphertext block, we need to initialize the system with something else; in this case, we will use an Initialization vector (IV). How do we choose an IV? If we select a fixed IV, then the first block will always encrypt to the same ciphertext and all subsequent blocks will be the same until we hit the first difference between 2 plaintexts. To avoid this, a randomly generated IV is used every time and it is included as the first block of the ciphertext in order to be able to decrypt it back.

How to perform lookups on encrypted data

At this point, you are probably asking: “Good, thanks for the information, but I still need to create an index on encrypted data! Any suggestions?”. Well, actually, yes we have some ideas, and I really hope they can be of some help.

First of all, we need to emphasize that in order to be able to index something we need to give away some information about it. How much information? The answer may be different for every situation, but here are some ideas as well as some pros and cons for each one of them. The basic idea behind the following suggestions is to create an additional column on which we will create the index.

For the reminder of this article, we will assume the problem of indexing data that is used to uniquely identify entities in a table, for example, credit card numbers identifying customers or Social Security numbers identifying patients. For these scenarios, the data that we want to index is a primary key and is used as the identification method. Below we describe several suggestions for how to address this problem.

Create a new identifier value

It may be possible in some cases to use a column with an identity value completely unrelated to the data we are trying to protect, instead of using the protected data; for example, we could create a unique customer ID instead of using a social security number for identification.

Unfortunately, because of the business logic and existing applications, this approach is not always possible.

Index a hash of your data

The idea is simple: create a new column to store the hash (for example, SHA1) of the plaintext data and then index this column.

As the hash functions are deterministic, this sounds like a good approach, but its simplicity is also its worst enemy. An attacker can potentially create a dictionary with all possible values of plaintext offline and then she can just do a simple lookup of the indexing column and correlate that with her dictionary.

Index a MAC of your data

We can use a MAC (Message Authentication Code) of the plaintext to create a new indexing column. This approach is similar to using a hash, but it requires a secret key to calculate the MAC. This prevents an unauthorized user to use a general purpose dictionary of hashed values and it will also prevent her from creating a targeted dictionary without having access to the MAC key. Please note that an attacker who has access to the MAC key can generate a targeted dictionary.

SQL Server 2005 doesn’t provide a function for computing a MAC, but it is possible to write a user-defined function that calculates a MAC using either SQL Server 2005 CLR or by reusing the existing builtin functions.

Sample code for indexing a MAC

I would like to include a short demo that hopefully will help to understand this better. I strongly recommend reading more about hashes and MACs - one good source you can use is Bruce Schneier’s Applied Cryptography (https://www.schneier.com/book-applied.html).

First, I create a certificate to protect my symmetric key and I protect the certificate itself using the DB master key (DBMK). This will allow me to control who can access the protected data by giving the proper permissions on the certificate and symmetric key without the need to memorize or hardcode any password.

/*********************************************************

* This posting is provided "AS IS" with no warranties, and confers no rights.

* Authors: Raul Garcia

* Laurentiu Cristofor

* Date: 11/02/2005

* 09/01/2006

* Description:

* Create an index based on a MAC (message authentication code)

* to be able to do SEEK operations on a table that contains

* encrypted data that is typically used as an index, such as SSN

* or credit card numbers.

* It also demonstrates a few other topics such as EXECUTE AS.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * Security note:

* This demo is intended to demonstrate how to protect data

* at rest, while trying to keep it easily accessible to users while

* the SQL Server instance is running; the root for the key hierarchy

* in this demo is the service master key (SMK).

*********************************************************/

-- This certificate will be used to protect the data-encryption key

CREATE CERTIFICATE cert_ProtectEncryptionKeys WITH SUBJECT = 'Data encryption key protection'

-- This key will be used to protect our plaintext data

CREATE SYMMETRIC KEY key_Encryption WITH ALGORITHM = AES_128 ENCRYPTION BY CERTIFICATE cert_ProtectEncryptionKeys

I will generate a random value as a MAC key. Also, because we may want to use this mechanism for more than one table, we would want to be able to store a MAC for each column that we index this way. So I will create a table to store all the MAC keys, as well as a certificate and a symmetric key to protect them. I will be using the table id as a primary key for the MAC keys. This will allow me to keep track of one MAC key for each table. Of course, if we would need to have several MAC keys per table, because we might want to index several encrypted columns, then we would need to use a column id in addition to the table id, but for the purpose of this demo, I am keeping it simple.

-- This is the certificate that will protect our MAC key-encryption key

CREATE CERTIFICATE cert_ProtectIndexingKeys WITH SUBJECT = 'Data indexing key protection'

-- This key will be used to protect the MAC keys

CREATE SYMMETRIC KEY key_Indexing WITH ALGORITHM = AES_128 ENCRYPTION BY CERTIFICATE cert_ProtectIndexingKeys

-- This table will store the encrypted MAC keys for all tables

CREATE TABLE t_MacIndexKeys( table_id int PRIMARY KEY, Mac_key varbinary(100) not null )

I will also define the MAC user-defined function. The function itself is quite simple, and I would strongly suggest you to review this code and my notes in it and modify it as needed. The MAC function that I am using is defined as SHA1( plaintext + key ). Also note that this function may return null in case of an error.

-- This function is used to generate a MAC based on the plaintext.

-- Parameters:

-- @Message Message we want to calculate the MAC for.

-- @Table_id The table id (object_id),

-- this value will be used to find the corresponding

-- MAC key from the t_MacIndexKeys table

-- return value: a varbinary(24) MAC if succeeded, null otherwise

-- NOTES: This function will run under the DBO context to be able

-- to access the required keys and calculate the MAC

-- Only grant execute permission to authorized principals

-- who need to access the protected data and/or use

-- the module signature feature to grant access via an application.

-- This function relies on the DBMK being available;

-- if the DBMK is not protected by the SMK, the caller must

-- open the DBMK explicitly before calling this function.

-- SECURITY NOTE: A potential attacker with access to the database

-- and with execute permissions on this function while the system

-- is online and the DBMK is opened can generate a dictionary with

-- all the possible values for the protected data

-- and its corresponding MAC based on the table id.

-- By using a different key for every table, the attacker

-- may be slowed down a little bit, depending on the protected data

-- domain space. A good advice is to monitor irregular activity

-- on this function.

CREATE FUNCTION MAC( @Message nvarchar(4000), @Table_id int )

RETURNS varbinary(24)

WITH EXECUTE AS 'dbo'

BEGIN

declare @RetVal varbinary(24)

declare @Key varbinary(100)

SET @RetVal = null

SET @key = null

SELECT @Key = DecryptByKeyAutoCert( cert_id('cert_ProtectIndexingKeys'), null, Mac_key) FROM t_MacIndexKeys WHERE table_id = @Table_id

if( @Key is not null )

SELECT @RetVal = HashBytes( N'SHA1', convert(varbinary(8000), @Message) + @Key )

RETURN @RetVal

END

The following stored procedure will be used to generate a new random varbinary value that we can use as a MAC key. As we don’t have any cryptographic random byte generator builtin available out of the box, we will use the Initialization Vector (IV) of the EncryptByKey builtin to generate entropy on our new varbinary MAC key. I also strongly suggest to review this code and to modify it as needed.

-- This SP creates a new randomly generated MAC key

-- for the table specified by table_id

-- Parameters:

-- @Table_id The table id (object_id) related to the MAC key.

-- NOTE: This SP will run under the DBO's context

-- to be able to access the required keys and tables.

-- It will not return any new or existing key blob back,

-- but it is recommended to only grant execute permissions

-- to trusted principals who need to be able to create new MAC keys.

CREATE PROC AddMacForTable @Table_id int

WITH EXECUTE AS 'dbo'

declare @Key varbinary(100)

declare @KeyGuid uniqueidentifier

SET @KeyGuid = key_guid('key_Indexing')

-- Open the encryption key

-- Make sure the key is closed before doing any operation

-- that may end the module, otherwise the key will

-- remain opened after the store-procedure execution ends

OPEN SYMMETRIC KEY key_Indexing DECRYPTION BY CERTIFICATE cert_ProtectIndexingKeys

-- The new MAC key is derived from an encryption

-- of a newly created GUID. As the encryption function

-- is not deterministic, the output is random

-- After getting this cipher, we calculate a SHA1 Hash for it.

SELECT @Key = HashBytes( N'SHA1', ENCRYPTBYKEY( @KeyGuid, convert(varbinary(100), newid())) )

-- Protect the new MAC key

SET @KEY = ENCRYPTBYKEY( @KeyGuid, @Key )

-- Closing the encryption key

CLOSE SYMMETRIC KEY key_Indexing

-- As we have closed the key we opened,

-- it is safe to return from the SP at any time

if @Key is null

BEGIN

RAISERROR( 'Failed to create new key.', 16, 1)

END

INSERT INTO t_MacIndexKeys VALUES( @Table_id, @Key )

And finally, we create a sample table with a Social Security Number column (SSN) that we want to use as the primary key. The table has been modified to have 2 columns for storing the SSN:

* SSN_cipher is used to store the SSN in an encrypted way, such that someone with the right key can recover all the SSNs from the table.

* SSN_index is used to store MACs based on the SSN and on this table’s MAC-key. This column will be used for identity lookups.

-- Table encryption example:

CREATE TABLE t_RawCustomer(

-- MAC-based index (SSN)

SSN_index varbinary(20) PRIMARY KEY,

-- ciphertext (SSN)

SSN_cipher nvarchar(60),

-- Two extra columns with information.

name nvarchar(max),

ExtraData nvarchar(100)

)

We also have to call the stored procedure that we created, to generate and store the new table MAC key.

--- Create a new MAC key for this table

declare @objid int

SET @objid = object_id('t_RawCustomer')

EXEC AddMacForTable @objid

Now I will create a few objects that will make it easier to work with the data table. These objects are not intended to increase the security of the system, only its usability. Let’s start with a view that will display the SSN in plaintext form when the viewer has proper access.

-- Now, let's create an easy to use view for the table

-- NOTE: as we are using DecryptbyKeyAutoCert to decrypt the data,

-- it is not necessary to explicitly open the key

-- Also note that this means that in order to see

-- the plaintext value for SSN, the caller requires permission

-- to access both the certificate's private key and the

-- data symmetric key.

CREATE VIEW v_Customer

WITH SCHEMABINDING

SELECT

-- We want to leave the index in the view as

-- it may be useful for SEEK operations

SSN_index as SSN_Id,

-- Use the certificate to automatically open the encryption key.

-- Additionally use the index (MAC(k1, PT)) to verify the

-- decryption and prevent data tampering such as

-- copying encrypted values from one row to another

convert( nvarchar(15), DecryptbyKeyAutoCert(

cert_id('cert_ProtectEncryptionKeys'), null,

SSN_cipher, 1, SSN_index)) as SSN,

-- the rest of the data will remain the same

Name, ExtraData FROM dbo.t_RawCustomer

Next, we create a trigger that will fire during an insert in the base table. This trigger expects that the SSN_cipher value will be a SSN plaintext and will perform the MAC calculation and encryption on it. The trigger will validate that the SSN is not null, as well as verify that the caller can generate both MAC and ciphertext values based on the keys being used.

Note that, for inserting data using our trigger, it is a prerequisite to open the key that is used to encrypt the data.

-- Intercept the inserts and make sure the inserted data is properly generated

CREATE TRIGGER trig_ProtectSSN on t_RawCustomer

INSTEAD OF INSERT

declare @Index varbinary(24)

declare @KeyGuid uniqueidentifier

declare @Cipher nvarchar(60)

if( select count(*) from inserted where SSN_cipher is null ) > 0

RAISERROR( 'Cannot store null as protected data. ', 16, 1)

ELSE

BEGIN

SET @KeyGuid = key_guid('key_Encryption')

SELECT @Index = dbo.MAC( SSN_cipher,

object_id('t_RawCustomer') ) from inserted

if( @Index is null

OR @KeyGuid is null

OR encryptbykey( key_guid('key_Encryption'), 0x00)

is null )

BEGIN

RAISERROR( 'Cannot Insert protected data. Either the encryption or indexing keys are not available or the indexing key is not valid for MAC generation.', 16, 1)

END

ELSE

INSERT INTO dbo.t_RawCustomer select

@Index,

encryptbykey( key_guid('key_Encryption'), SSN_cipher, 1, @Index ),

Name, Extradata

from inserted

END

In a similar way, we will create a trigger that will fire during updates and that will prevent arbitrary changes on the SSN fields (cipher or MAC).

-- Intercept any attempt to modify the RawData table

-- and prevent anyone from modifying the cipher values

CREATE TRIGGER trig_ProtectSSNUpdate on t_RawCustomer

INSTEAD OF UPDATE

if( COLUMNS_UPDATED() & 3 ) > 0

raiserror( 'Cannot update protected columns. Drop the row and create a new one with the updated information.', 16, 1 )

ELSE

BEGIN

UPDATE t_RawCustomer

SET Name = ins.Name, ExtraData = ins.ExtraData

FROM inserted ins, t_RawCustomer orig

WHERE ins.SSN_index = orig.SSN_index

END

Here is a usage example :

-- test: Try to insert without opening the key; this should fail

insert into t_RawCustomer values ( null, N'111-11-1110', N'Customer 0', N'New data' )

-- test: Open the symmetric key before we can use it

OPEN SYMMETRIC KEY key_Encryption DECRYPTION BY CERTIFICATE cert_ProtectEncryptionKeys

insert into t_RawCustomer values ( null, N'111-11-1111', N'customer 1', N'New data 1' )

insert into t_RawCustomer values ( null, N'111-11-1112', N'customer 2', N'New data 2' )

CLOSE SYMMETRIC KEY key_Encryption

-- Compare the raw table with the view

select * from t_RawCustomer

select * from v_Customer

-- Select using the SSN in plaintext.

-- Note that this will require a full table scan and is inefficient.

-- We cannot make this query take advantage of an index.

SELECT * FROM v_Customer

where SSN = N'111-11-1112'

-- Now select using the SSN_id column, which will use

-- the index that we have set up for this column.

SELECT * FROM v_Customer

where SSN_Id = dbo.MAC( N'111-11-1112', object_id('t_RawCustomer') )

And finally, here is a Least Privileged User example. These are the minimum permissions that a low-privileged user will need for different operations on the table/view that we created. Note that this is not an exhaustive list of possible operations or permissions on our table; it is just an example of some new and interesting permissions you may want to get familiar with. I strongly suggest to review the SQL Server 2005 documentation for these objects to get a complete list of permissions. In this particular example, I want to emphasize the following operations and permissions:

Perform identity lookup queries (i.e. SSN = ‘xxx’ ):
- GRANT SELECT ON v_Customer to LUA_principal
- GRANT EXECUTE ON dbo.MAC to LUA_principal. The SP will run under elevated context, therefore no explicit permission on the symmetric key protecting the MAC-keys is needed.
- GRANT VIEW DEFINITION ON dbo.t_RawCustomer to LUA_principal. This principal will need o access the base table index.
- NOTE: With these set of permissions, the principal can select from the view v_Customer, but cannot recover the SSN plaintext.
Insert a new record
- All of the Identity lookup queries permissions
- GRANT INSERT ON t_RawCustomer TO LUA_principal
- GRANT VIEW DEFINITION ON SYMMETRIC KEY::key_Encryption TO LUA_principal. Needed for the key_guid operation.
- Grant some way to open the symmetric key. In the example we show 1 mechanism.

-- Create a database-only principal for testing purposes

CREATE USER LUA_principal WITHOUT LOGIN

GRANT SELECT ON v_Customer to LUA_principal

-- We want this principal to be able to perform indexed queries,

-- so we need to grant him permissions on the MAC procedure

GRANT EXECUTE ON dbo.MAC to LUA_principal

-- We also need to grant the principal view definition on the

-- base table, so he can obtain its index

GRANT VIEW DEFINITION ON dbo.t_RawCustomer to LUA_principal

-- Run a simple test as this new principal

-- You will notice that the SSN column shows only NULLs.

-- This is because this principal has no access to the encryption key

EXEC ('select * from v_Customer') as USER = 'LUA_principal'

-- Note the new principal has no permissions at all on the base table!

EXEC ('select * from t_RawCustomer') as USER = 'LUA_principal'

-- Now run a select with a WHERE clause

-- As it is possible to execute the MAC function,

-- and the caller knows exactly what SSN he is looking for,

-- it is possible to run the query.

-- Notice that the SSN column shows NULL.

EXECUTE AS USER = 'LUA_principal'

SELECT * FROM v_Customer

where SSN_Id = dbo.MAC( N'111-11-1112', object_id('t_RawCustomer') )

REVERT

-- Now let's give the LUA permission to update the data

GRANT UPDATE ON v_Customer(ExtraData) TO LUA_principal

EXECUTE AS USER = 'LUA_principal'

update v_Customer set ExtraData=N'data updated by LUA'

where SSN_Id = dbo.MAC( N'111-11-1112', object_id('t_RawCustomer') )

SELECT * FROM v_Customer

where SSN_Id = dbo.MAC( N'111-11-1112', object_id('t_RawCustomer') )

REVERT

-- Now give our LUA_principal permission to insert data into

-- the table

GRANT INSERT ON t_RawCustomer TO LUA_principal

EXECUTE AS USER = 'LUA_principal'

-- NOTE: This one will fail as we don't have access to the key

-- for the data or the certificate protecting it!

insert into t_RawCustomer values ( null, N'111-11-1113',

N'customer 3', N'New data 3' )

SELECT * FROM v_Customer

REVERT

-- To give access to the encryption key to the LUA principal,

-- we have 2 options:

-- Option 1: Grant direct permission to the key

-- GRANT VIEW DEFINITION ON SYMMETRIC KEY::key_Encryption

-- TO LUA_principal

-- GRANT CONTROL ON CERTIFICATE::cert_ProtectEncryptionKeys

-- TO LUA_principal

-- Option 2:

-- GRANT VIEW DEFINITION ON SYMMETRIC KEY::key_Encryption

-- TO LUA_principal

-- Create a SP that opens the key for the calling context

-- and grant the LUA principal access to it.

-- For this demo we will use option 2.

-- The main advantage is to limit the control

-- permission on the certificate

-- as it may be protecting more than 1 key

-- We need it for key_guid('key_Encryption'),

-- otherwise we cannot look for the GUID

GRANT VIEW DEFINITION ON SYMMETRIC KEY::key_Encryption TO LUA_principal

CREATE PROC sp_GetCustomersKeyAccess

WITH EXECUTE AS 'dbo'

OPEN SYMMETRIC KEY key_Encryption DECRYPTION BY CERTIFICATE cert_ProtectEncryptionKeys

GRANT EXECUTE ON sp_GetCustomersKeyAccess TO LUA_principal

--- Test the new SP

EXECUTE AS USER = 'LUA_principal'

-- Empty

SELECT * FROM sys.openkeys

EXEC sp_GetCustomersKeyAccess

-- Now we have the required key open

SELECT * FROM sys.openkeys

-- and we can close it when it is not needed anymore

CLOSE SYMMETRIC KEY key_Encryption

SELECT * FROM sys.openkeys

REVERT

-- Now try the failed insert once more...

EXECUTE AS USER = 'LUA_principal'

EXEC sp_GetCustomersKeyAccess

insert into t_RawCustomer values ( null, N'111-11-1113', N'customer 3', N'New data 3 inserted by ' + user_name() )

insert into t_RawCustomer values ( null, N'111-11-1114', N'customer 4', N'New data 4 inserted by ' + user_name() )

-- Note that because LUA_principal doesn't have direct access

-- to the certificate protecting the encryption key,

-- the SSN column will still display NULLs.

-- As we only wanted to grant this principal permission

-- to insert new data and not to read arbitrary data,

-- this should be sufficient

SELECT * FROM v_Customer

-- Close the key after using it

CLOSE SYMMETRIC KEY key_Encryption

REVERT

-- finally, if you want to allow the LUA principal to see

-- all entries in the table.

-- SECURITY NOTE; After granting control on the certificate,

-- the principal will have full access to the PVK,

-- this means thsi principal can decrypt

-- and sign anything using this certificate.

-- Make sure this permission is granted only to trusted principals

-- and try to monitor activity on this certificate

GRANT CONTROL ON CERTIFICATE::cert_ProtectEncryptionKeys TO LUA_principal

EXEC ('select * from v_Customer') as USER = 'LUA_principal'

Indexing encrypted data

Additional resources