Get started with Dynamic Data Masking in SQL Server 2016 and Azure SQL DB

Dynamic Data Masking (DDM) is a new security feature in Microsoft SQL Server 2016 and Azure SQL DB. The main documentation is here (also see link under Resources at end). This post is a quick how-to intro to DDM, including applying it in a database and managing which principals see masked or unmasked data. I'll also answer a few questions that commonly come up.

What is DDM?

Picture this scenario. You have a database table which stores sensitive data, such as social security numbers, in the clear (unencrypted). Anyone with appropriate access can run select * against this table and see all the sensitive data.

This becomes a concern in organizations where production data is periodically restored into development, test, and/or staging environments. Developers, testers, and other people need to work with the data, but have visibility to sensitive data. This is clearly concerning (and may be unlawful in some jurisdictions). How do we give these roles the data they need, while protecting sensitive data?

This is what DDM does. A database administrator applies a masking rule to a column. The result is that when developers, testers, and other non-administrators run queries against the table, they no longer see masked columns' data in the clear; instead, they see the data in obscured format.

This does not mean the underlying data is now stored encrypted. No - it's still stored in the clear, and any role with UNMASK privilege can still access the data in the clear. DDM is simply a way to obscure sensitive data from roles or users who should not see it in the clear.

DDM in action

Let's say you have a table with social security numbers. A select * returns this:

Sample data

We want to use DDM to obscure this data. How do we do that? Simple. We use familiar alter column syntax to designate a column as masked, and we specify how to mask it. Here's how we'd mask the SSN column in T-SQL: alter table data.Users alter column [SSN] add masked with (function = 'partial(0,"XXX-XX-",4)');

What's happening here? First, we're designating the [SSN] column as masked (add masked), then we're specifying how to mask it, using a masking function. Currently, four masking functions are available; I'm using partial(), which lets me specify how many characters to leave clear at the start, what string to substitute in the middle, and how many characters to leave clear at the end.

After applying this mask, the same select * returns this:

Sample data masked

Note how the SSN data is now masked. Per our masking function, no characters are clear at the start; the data is partly replaced by the mask we specified; and the last four characters are in the clear.

How do we remove masking from a column? Simple:

alter table data.Users alter column [SSN] drop masked;

Note that DDM works in T-SQL for both SQL Server 2016 and Azure SQL DB. However, for Azure SQL DB, DDM can also be configured using the Azure portal.

DDM Security

By default, all principals not in the db_owner role will see masked data; principals in db_owner are automatically granted UNMASK privilege and will see data unmasked (in the clear) regardless of any masking rules applied.

DDM does not prevent masked users from executing DML statements. For example, if they have DELETE privilege, they can delete records from a table with a masked column; similarly with UPDATE. This is expected.

DDM FAQ

Why would we use DDM if it doesn't even encrypt the underlying data?

Because in many scenarios, such as dev and test environments, masking is enough protection without the performance and administrative overhead of encryption. For example, daily restores from production to a staging database used for manual and/or automated tests where it's important to understand the underlying data.

In any case, SQL Server offers excellent encryption options. DDM is a great option when masking is enough.


"We have some people who need to see masked data in the clear. Does that mean we have to add them to db_owner? We'd prefer not to."

No, people do NOT need to be added to db_owner to see data in the clear. Specific roles or users can be granted UNMASK; currently, this is at a database level, i.e. not to a specific table or column.

Here's how to do this for an OpsMgrRole database role:
grant unmask to [OpsMgrRole];


What if someone selects masked data into a temp table, or backs up the database and restores it on another machine? Can they get around DDM on the source table this way?

Short answer: No.
Longer answer: When a user to whom masking applies selects into another table or backs up the database, DDM will write the masked data statically, i.e. in this instance, the written data is no longer "in the clear"; what is written is statically masked. Let's prove it with this T-SQL:

T-SQL showing that DDM cannot be circumvented using a temp table

What's happening here?

On line 4, I'm reverting to my logged-in user (which happens to be a db_owner member). This is to make sure I am not in the context of a less-privileged user.

On line 7, I'm impersonating a lesser-privileged user who is not a member of db_owner, and who has not been granted UNMASK.

On line 10, I'm creating a temp table. On line 13, I'm extracting data from the source table, which has a mask on the SSN column, into my unmasked temp table. By doing this, I'm trying to get around the mask on the source table.

On line 16, I'm preparing to see people's sensitive data... only to be disappointed - the data is still masked! In fact, the data was written to the temp table statically masked, i.e. it's no longer in the clear in the temp table - it was written with all the X mask characters.

On line 19, I'm reverting back to my logged-in user, who is a db_owner member.

On line 22, I'm selecting data from that temp table again, but (as expected) it's still masked. Remember - it was written to the temp table statically masked by a user subject to masking on the source table, so my db_owner membership and implicit UNMASK privilege do not help.

Last, on line 25 I query the source table and see the data unmasked, exactly as expected, which demonstrates that the source table is dynamically masked whereas the temp table, as expected, is statically masked.

The three result sets from lines 16, 22, and 25 in the above T-SQL:

img04

Note: the above T-SQL is a screenshot image. I used that for easier readability with color and line numbers. See Resources below to get all the code I used for this post.


Will DDM interfere with query execution plans?

No. Queries are executed first, then the results are masked. Nothing changes with regard to query execution plans.


We have a more complex situation that's not covered by the currently available masking functions. For example, we have variable-length fields that store people's comments and notes. Can we detect, and mask, social security numbers or credit card numbers within such text blocks, but not mask anything else?

So something more like a regular expression. That's not possible at this point with DDM, but the SQL product group has gotten this feedback.

Resources

DDM MSDN documentation: https://msdn.microsoft.com/library/mt130841.aspx

My code used for this post is in my public git repo (please first note disclaimers at the repo root - this is sample code, use at your own risk). The readme for these files contains details, but briefly, I provide you code to create a sample database and then implement and test DDM, including the DDM subvert fragment above. All you need is a SQL Server 2016 test instance.

Hope this helps you get started with Dynamic Data Masking.