Performance Benefits of using Expression over User Defined Functions

Summary

No matter how simple a UDF is, there's a large performance penalty paid when they're used. This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more. The penalty is incurred because the SQL Server database engine must create its own internal cursor like processing. It must invoke each UDF on each row. If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows. If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing. It's the row by row processing that slows SQL Server the most. In many cases an expression can replace the functionality that the UDF provides and offer a significant performance benefit over a UDF (see examples below).

More Information

SETUP FOR THE TEST

======================

--- Create Test Table

CREATE TABLE UDF_parameters (param1 int, param2 int, param3 int, param4 int)

-- Create Clustered Index

CREATE CLUSTERED INDEX CLU1 ON UDF_parameters(Param4)

--Code for the UDF.

CREATE FUNCTION dbo.divide_func(@numerator as int, @denominator as int, @default as float)

RETURNS float

BEGIN

                if @denominator = 0

                                RETURN @default

                if @numerator = 0

                                RETURN @default

                RETURN @numerator/@denominator

END

--- Insert records in the table ----

declare @count int

set @count =1 -----> This will insert 1 million records to the table

while @count <1000001

begin

                INSERT INTO UDF_parameters values(@count+3,@count+2,@count+1, @count)

                set @count = @count+1

end

Using UDF in Query SELECT list

SET STATISTICS TIME ON

GO

select PARAM1, DEVIDED_VALUE = dbo.divide_func(param2,param3,CAST(param4 as float)), param4 from UDF_parameters

GO

SET STATISTICS TIME OFF

GO

Rows Executes StmtText

------ -------- ---------------------------------------------------------------------------------------------------------------

500000 1 select PARAM1, DEVIDED_VALUE = dbo.divide_func(param2,param3,CAST(param4 as float)), param4 from UDF_parameters

500000 1 |--Compute Scalar(DEFINE:([Expr1004]=[TEST1].[dbo].[divide_func](parameter... )

500000 1 |--Clustered Index Scan(OBJECT:([TEST1].[dbo].[UDF_parameters].[CLU1]))

If you notice the plan, we would see that the Function call is being made, for all the rows being returned by the Clustered Index Scan.

If you take a profiler trace while running this statement, you would notice the multiple executions of the UDF. (Image at the end of the Page)

Query with UDF in Where Clause

SET STATISTICS TIME ON

GO

select PARAM1,param2,param3, param4 from UDF_parameters where dbo.divide_func(param2,param3,CAST(param4 as float)) = 1

GO

SET STATISTICS TIME OFF

GO

Rows Executes StmtText

------ -------- -------------------------------------------------------------------------------------------------------------------------------------------------------------500000 1 select PARAM1,param2,param3, param4 from UDF_parameters where dbo.divide_func(param2,param3,CAST(param4 as float)) = 1

500000 1 |--Filter(WHERE:([TEST1].[dbo].[divide_func]([TEST1].[dbo].[UDF_parameters].[param2],[TEST1].[dbo].[UDF_parameters].[param3],CONVERT(float(53),[TEST1].[dbo].[UDF_parameters].[param4],0))=(1.000000000000000e+000)))

500000 1 |--Clustered Index Scan(OBJECT:([TEST1].[dbo].[UDF_parameters].[CLU1]))

      Notice a similar behaviour in multiple executions of the UDF, when we have the UDF in the where clause. (Image at the end of the Page)

Using Expression In Query

SET STATISTICS TIME ON

GO

SELECT PARAM1,  

DEVIDED_VALUE =

                   CASE

                                WHEN PARAM3=0 THEN param4

                                WHEN param2=0 THEN param4

                                ELSE param2/param3

                   END ,

PARAM4

from UDF_parameters

GO

SET STATISTICS TIME OFF

GO

Rows Executes StmtText

------ -------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

500000 1 SELECT PARAM1,DEVIDED_VALUE = CASE WHEN PARAM3=0 THEN param4 WHEN param2=0 THEN param4 ELSE param2/param3 END ,PARAM4 from UDF_parameters

0 0 |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [TEST1].[dbo].[UDF_parameters].[param3]=(0) THEN [TEST1].[dbo].[UDF_parameters].[param4] ELSE CASE WHEN [TEST1].[dbo].[UDF_parameters].[param2]=(0) THEN [TEST1].[dbo].[UDF_parameters].[param4] ELSE [TEST1].[dbo].[UDF_parameters].[param2]/[TEST1].[dbo].[UDF_parameters].[param3] END END))

500000 1 |--Clustered Index Scan(OBJECT:([TEST1].[dbo].[UDF_parameters].[CLU1]))

Notice, how using a case expression, has only on execution.

Comparison of the time taken by the Different Statements.

Rows Returned

WITH UDF in Select List

UDF in Where Clause

With Expression

0.5 Million

26 seconds

26 seconds

4 seconds

1.0 Million

52 seconds

52 seconds

8 seconds

1.5 Million

86 seconds

87 seconds

12 seconds

Snapshots of Profiler Traces

==============================

As we can see, the query when using the function is slower than the one using the CASE expression.

In this case I was using very simple parameters, but if we have some complex parameters (like those involving mathematical operation) we would see the response time increase even more.

by
Sourabh Agarwal
Technical Lead, Microsoft SQL Server