Unintended Consequences of Scalar-Valued User Defined Functions

Author:             Chuck Heinzelman

Reviewers:      Kevin Cox, Dan Jones, Lara Rubbelke

 

During a customer engagement, we noticed a query that was taking a long time to execute.  We saw that the query had a scalar-valued User Defined Function (UDF) encapsulating functionality in the SELECT clause.  To understand why this could be a problem, you need to understand how SQL Server deals with functions in the SELECT clause.  Let’s take a quick dive into how this works in SQL Server.

Assume that I have a table with the following structure:

CREATE TABLE dbo.Numbers

(

   Number integer NOT NULL PRIMARY KEY CLUSTERED

);

Now assume that this table has 100,000 rows in it – numbered 1 to 100,000.  As an aside, I like to have a table like this with about 1,000,000 rows in it for occasions where I need to duplicate data.  You never know when you will need a table of numbers!

If I execute the following SQL Statement, what will I get?

SELECT Number, GetDate() AS CurDate

FROM dbo.Numbers

WHERE Number <= 1000;

If you answered 1000 rows with the numbers 1 - 1000 and the exact same value for the CurDate column in every row, you would be correct.  This is because SQL Server only executes the GetDate() function once for the entire statement.

Now assume that I create the following function:

CREATE FUNCTION dbo.GetDateFunction()

RETURNS DateTime

AS

BEGIN

   RETURN GetDate();

END

If I execute the following SQL Statement, what will I get?

SELECT Number, dbo.GetDateFunction() AS CurDate

FROM dbo.Numbers

WHERE Number <= 1000;

If you said that my results would be the same as in the last example, you would be incorrect.  Since we encapsulated the GetDate() function in our own user-defined function, SQL Server had to execute it for every row returned, not just once for the entire query.  This knowledge is fundamental to understanding the performance of the queries that you write as well as ensuring that you get the results that you expect.

Unfortunately, our problem was not this simple.  We were dealing with encryption and decryption functions.  Take the following query as an example:

SELECT RowID, DecryptByKey(EncryptedColumn)
FROM dbo.TableWithManyRows;

In this case, even though the DecryptByKey function is a built-in function, it will be called once per row (each individual value needs to be decrypted).  Given that, what would be the harm of having this function embedded in a user defined function like this:

CREATE FUNCTION dbo.DecryptByKeyFunction(@Data varchar(255))

RETURNS varchar(255)

AS

BEGIN

   RETURN DecryptByKey(@Data);

END

And using it like this:

SELECT RowID, dbo.DecryptByKeyFunction(EncryptedColumn)
FROM dbo.TableWithManyRows;

Implementing it this way would allow you to easily swap out the encryption functionality for your entire system by changing a single function rather than changing every query that uses this logic.

The problem is that while the inline DecryptByKey function is called for each row in the query, the key is only opened once for the entire statement.  If you put the DecryptByKey function in your own user defined function, the key will need to be opened for each function call rather than once for the entire query.  In our scenario, this caused a huge performance issue.

 

Summary

Knowing the behavior of built-in and User Defined Functions is extremely important to understanding the performance and output of the queries that you write.  I am not trying to tell you to never use UDFs – just be sure to test your solutions thoroughly and understand all of the “unintended consequences” of encapsulating logic inside a UDF before rolling your solution out – even to your developers.