Improving query plans with the SCHEMABINDING option on T-SQL UDFs

This blog describes how the SCHEMABINDING option specified during creation of T-SQL UDFs may affect query plans involving these UDFs in SQL Server 2005. Armed with this knowledge, you may find that you can dramatically improvement your query performance for free.

Before I jump into the details, let me briefly summarize the take-away: If you are using simple T-SQL UDFs that do not touch any tables (i.e. do not access data), make sure you specify the SCHEMABINDING option during creation of the UDFs. This will make the UDFs schema-bound and ensure that the query optimizer does not generate any unnecessary spool operators for query plans involving these UDFs.

The BizTalk folks ran into this (https://blogs.msdn.com/BizTalk_Core_Engine/) recently, so I thought I’d take a deeper look. Let’s dig into the details.

Take a look at the following example for creating a simple T-SQL UDF:

CREATE FUNCTION dbo.ComputeNum(@i int)

RETURNS int

BEGIN

  RETURN @i * 2 + 50

END

When creating the UDF, we have the option to bind the UDF to the schema of the underlying objects to which it refers. This is done using the SCHEMABINDING option. For UDFs that are schema-bound, any attempt to change the underlying objects’ schema will raise an error. Using this option ensures that the UDF will not inadvertently break due to changes of an underlying object’s schema.

In SQL Server 2005, there are a couple of derived properties associated with whether the UDF is schema-bound: SystemDataAccess indicates whether the function accesses system data (system catalogs or virtual system tables), and UserDataAccess indicates whether the function accesses user data.

These properties can be seen via:

SELECT OBJECTPROPERTYEX(OBJECT_id('<MyFunction>'), 'SYSTEMDATAACCESS')

And

SELECT OBJECTPROPERTYEX(OBJECT_id('<MyFunction>'), 'USERDATAACCESS')

In our example above, because we did not specify the SCHEMABINDING option, both of these properties get set to 1, even though the function itself does not do any data access. Why? Since the UDF is not schema-bound, there’s no way to ensure that the underlying schema (including the schema of any underlying UDFs or views that this UDF may call) did not change since its creation. This means that we would have to derive these properties at runtime during every execution of the UDF. To avoid this performance penalty, we simply mark the UDF as accessing data to be on the safe side and do not attempt to derive these properties at runtime.

So, why are these properties interesting? Because they affect query plans involving the UDF. In the above example, because these properties are set to 1, the query optimizer assumes that the function may access data and will add a spool operator in order to protect itself from any data changes (also known as “Halloween protection”). Consider following UPDATE statement that uses our UDF:

CREATE TABLE t(col int);

CREATE INDEX i_t ON t (col ASC);

UPDATE t SET col = col + 5 WHERE dbo.ComputeNum(col) > 50;

If you look at the query plan generated for the above UPDATE statement, you’ll see that we’re spooling the results, even though the UDF itself does not do any data access.

Making the UDF schema-bound, as in the following example, forces SQL Engine to analyze the body of the UDF and properly set these derived properties. You’ll avoid the unnecessary spooling and may see a very significant performance gain:

CREATE FUNCTION dbo.ComputeNum(@i int)

RETURNS int

WITH SCHEMABINDING

BEGIN

  RETURN @i * 2 + 50

END

In summary, specify the SCHEMABINDING option for T-SQL UDFs to ensure that plans using these UDFs do not do unnecessary spools.