I’m a huge Sherlock Holmes fan (I’ve read all the books, watch Elementary on CBS every week, and loved the most recent season Four of Sherlock) so when I recently got a question about some unexplained behavior for SQL Server, I thought of the idea of posting some of these as I get and solve them in the form of a blog series titled SQL Server Mysteries (#sqlmystery). My goal is to resolve mysteries about SQL Server I encounter but do this without going straight the source code first. Rather I’ll use our source and other colleagues of mine at Microsoft to validate the answer.
The first case that helped me start this journey was asked by a MVP within the SQL Server community (his first name starts with Joey<g>).
The question went something like this…
“Quick question about a behavior we’re seeing at a customer and in testing. After enabling TDE, we are still seeing TempDB show up as encrypted, as expected, however, after disabling TDE, TempDB still shows as encrypted”.
First, let’s explain the mystery in more detail.
Transparent Data Encryption (TDE) is a feature that was introduced in SQL Server 2008 (and is also available for Azure SQL Database, Azure SQL Data Warehouse, and Parallel Data Warehouse) with the purpose of encrypting your data at rest. That is to ensure your database is encrypted at the file level. Which means that if someone was able to grab your SQL Server database and/or transaction log file, they could not see its contents simply by opening the file (Example. your laptop is stolen and the thief yanks out the hard drive and tries to inspect the files outside of using SQL Server).
The process to enable this for a database is described in our documentation. You effectively “turn on” encryption by using ALTER DATABSE SET ENCRYPTION ON. And we state in the documentation the following regarding tempdb:
The tempdb system database will be encrypted if any other database on the instance of SQL Server is encrypted by using TDE. This might have a performance effect for unencrypted databases on the same instance of SQL Server. For more information about the tempdb system database, see tempdb Database
What the documentation doesn’t say is what happens when you decrypt all user databases. I’ve seen many in the community say that once you encrypt a user database, tempdb will be permanently encrypted.
At first glance in SQL Server 2016, you can see if a database is encrypted for TDE by looking at sys.databases.is_encrypted. If I enable encryption for a user database using ALTER DATABSE, you will see results like this: (Note: that sys.databases.is_encrypted only shows 1 starting in SQL Server 2016).
And you can use this DMV to see more details about encryption:
select * from sys.dm_database_encryption_keys
The encryption_state column is documented as:
So you can see that both the user database and tempdb are encrypted. What does encrypted mean at this point? Well for a user database, when you run ALTER DATABASE SET ENCRYPTION ON, we actually create new SQL background tasks to encrypt all pages of the database file and current contents of the log file. Any future I/O for page or log will be encrypted “as we write”. For tempdb, we don’t encrypt the current contents of the database but any future I/O would be encrypted. A common I/O path for tempdb are sort spills and workfile I/O for hash joins (For gory details about tempdb internals see this PASS talk from 2011).
I’ve seen comments like this to indicate that when all user databases have encryption turned off, tempdb remains encrypted until a server restart. This connect item was marked by design because until the server is restarted some of your data that was encrypted from a user database could still exist in tempdb files. But what about after a server restart and comments about tempdb encryption being permanent? Knowing that tempdb is recreated after a server restart, my working theory is that it is not truly permanently encrypted. But how do I prove this?
We simply must find a way to trace the encryption of tempdb when it is enabled. Then we will use that same technique to observe it doesn’t happen when we believe it is disabled despite what sys.databases indicates.
The #1 tool I use at my disposal for tracing these days is Extended Events I searched through the available Extended Events that may have something to do with TDE and/or encryption and found these:
select * from sys.dm_xe_objects where (name like '%tde%' or name like '%encrypt%') and object_type = 'event'
The only one that looked usable is database_tde_encryption_scan_duration. But when I used that I found that is only used to show a scan of an existing user database when you enable encryption (e.g. scan the existing data and encrypt it). That is not fired for tempdb (even when encryption for tempdb is considered enabled).
Vowing not to go to the source yet, I decide to use the Windows Debugger and public symbols (so you to can do this at home). Since we are trying to trace encryption, there must be Windows APIs SQL Server uses to encrypt data. Turns out there are a few options namely: BCryptEncrypt() and CryptEncrypt(). According to the MSDN docs, BCryptEncrypt is the “nextgen” API so I’ll choose that one to try and “trace”. Next, I need a scenario that requires a page write for tempdb. The easiest candidate for this is a sort spill. These are seen when you see a Sort Warning (XEvent also has a sort warning). The basic concept is that a sort as part of a query plan is executed but cannot be completely done in memory. So we must spill part of the sort to disk to complete the sort operation. How do I create one of these sort spills? First you must build a scenario that requires a sort, which is easy with an ORDER BY on a column that is not indexed. Second, just make the sort large but limit the memory of SQL Server with ‘max server memory’.
Here is the repro. First set it up by running this from SQL Server Management Studio.
drop database yourdb
create database yourdb
drop table yourtable
create table yourtable (col1 int, col2 char(7000) not null)
set nocount on
declare @x int
set @x = 0
while (@x < 100000)
insert into yourtable values (@x, 'your row')
set @x = @x + 1
sp_configure 'show advanced', 1
sp_configure 'max server memory', 8192
Now let;’s run the query and use show statistics to see if a spill occurs
set statistics xml on
select * from yourtable
order by col2
Note: Be sure to set your ‘max server memory’ back to 0 when you are done with this fun exercise.
The resulting execution plan should have something like this with the Sort operator
Now that we have our repro, let’s use the debugger to see for these sort spills whether the BCryptEncrypt() is called. We can do this by setting up encryption and encrypting the user database we created called yourdb with these steps:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Your Strong Passsword'
CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate'
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
ALTER DATABASE yourdb
SET ENCRYPTION ON
I now have a spill repro and TDE enabled for a user database. I queried select * from sys.dm_database_encryption_keys to make sure the user db and tempdb are setup for TDE. I will now run the Windows Debugger, attach it to the running SQL Server process using public symbols, and set a breakpoint on BCryptEncrypt().
From my powershell command prompt I run:
windbg -y srv*c:\public_symbols*https://msdl.microsoft.com/download/symbols -pn sqlservr.exe
The –y parameter is the symbol path. c:\public_symbols is a symbol cache and a folder I created on my laptop. This means when symbols are loaded from the http location (the public symbol server) they will be downloaded into that local folder so the next time loading symbols is fast.
Up comes the Debugger and I use the bp command to set a breakpoint on bcrypt!BCryptEncrypt (the BCryptEncrypt API documentation says this function is implemented in bcrypt.dll)
After typing in g to “go”, I go back and run the query again that caused the sort spill.
in my debugger it “broke in” by hitting the breakpoint. I used the k command to dump out the call stack to see where in the code it hit this API call (not the full stack but enough to see the story)
The “top” of the stack is our breakpoint for BCryptEncrypt. Notice what is below it. You can see from function calls like QScanSortNew… this is for a sort. The functions in between are for allocating memory for sorts (the get_bob_buf(). No relation to the author. bob stands for big output buffer). The Bob::IssueWrite is effectively our “sort spill” and the Page::Encrypt() is the call to “encrypt the page” (before it is written to disk in tempdb).
OK. Now we need to disable TDE for our user database and restart SQL Server. Then go back and basically do the same thing. Make sure you get a sort spill from the query. Then use the debugger and breakpoint to see if you hit BCryptEncrypt() again. The steps here are to disable TDE for the user database with ALTER DATABASE, end the debugger session with the .detach and then q to exit the debugger. Then restart the SQL Server service. Now repeat the steps above with the debugger to set the breakpoint, ‘g’, and run the sort spill query.
In my test I did not hit the breakpoint. And furthermore, you will notice that when you query sys.dm_database_encryption_keys, there is no row for tempdb at all. So our debugger breakpoint proves that tempdb is not permanently encrypted. Instead, if ALL user databases have TDE disabled and you restart SQL Server, tempdb is no longer encrypted. So instead of using sys.databases, use sys.dm_database_encryption_keys to tell which databases are truly enabled for encryption. I then verified my findings in the source code. Basically, we only enable encryption for tempdb if 1) ALTER DATABASE enables any user db for TDE 2) When we startup a user database and have encryption enabled. I also verified the behavior with my colleagues in the Tiger Team (thank you Ravinder Vuppula). We will look at fixing the issue with sys.databases in the future (ironically as I said earlier it was never enabled for tempdb before SQL Server 2016).
I hope you enjoyed not only the details of the mystery but the techniques involved to solve it. Look for more SQL Mysteries and their solutions on this blog