A Computed Column Defined with a User-Defined Function Might Impact Query Performance

Author: Kun Cheng

Reviewers: Shaun Tinline-Jones, Silvano Coriani, Steve Howard, Thomas Kejser, Sanjay Mishra

A computed column is computed from an expression that can use other columns in the same table. The expression can be a noncomputed column name, constant, function, and any combination of these connected by one or more operators, but the expression cannot be a subquery. A simple example of a computed column is:







A benefit of using computed columns is that they can save developers from having to write calculation logic at the application layer; at the table level, the application can simply reference the computed column. In addition, performance can be improved by persisting a computed column and building an index on it (see “Creating Indexes on Computed Columns” at  http://msdn.microsoft.com/en-us/library/ms189292.aspx). In real-world scenarios (unlike the simple example above), a common way to define a computed column is by using a user-defined function (UDF) to encapsulate the calculation logic. For example, the script below uses an XQuery expression to extract a summary description as a computed column out of the XML data, which is stored in another column of the same table. Once the computed column is persisted to the physical storage of the table, we can build an index on top of it to speed up queries that access the computed column.

CREATE FUNCTION udf_compcol(@contents XML) 
RETURNS nvarchar(255)
RETURN @contents.value('(/item/description/text())[1]', 'nvarchar(255)')
ADD summary_compcol AS udf_compcol(Contents) PERSISTED
CREATE INDEX ix_Books_summary ON Books(summary_compcol)

In this example, we are promoting the “description” element out of the XML data to a relational column for performance gain, because querying against XML semi-structured data is slower than querying against structured data. However, the computed column might behave in an unexpected way that can impact query performance. One example of this unexpected behavior is discussed in the blog post “Query Plan for Persisted Computed Column Query” at  http://sqlcat.com/sqlcat/b/msdnmirror/archive/2011/09/01/quoted-identifier-causes-unexpected-query-plan-for-persisted-computed-column-query.aspx.

In this blog post, I will describe another unexpected behavior of computed column that occurs in a different scenario. On a multi-core server, which is typical for a SQL Server environment, the following XQuery generates a parallel plan as it scans the XML data.

SELECT title FROM Books WHERE Books.Contents.exist(N'/item/isbn[.=7646309]') = 1


With the addition of the computed column defined by the UDF udf_compcol, the same query generates a sequential plan, which is identical to the original plan except it is not parallel.

The reason that the query generates a sequential plan in this case is that when SQL Server compiles a query plan, it loads all referenced column expressions and determines if there is a UDF associated with any computed column(s). If there is, SQL Server disables the parallel plan and generates only a sequential plan. This is what happened with our query, which referenced a table containing a computed column defined with a UDF. Note the behavior is the same regardless of whether computed column is persisted or not.

In this case, we were able to tune the query by promoting the “/item/isbn” element out of the XML data to another relational column so the query didn’t need to scan the table in the first place. 

Here are some general recommendations:

  • Use computed columns wisely, and evaluate their performance impact with comprehensive testing against an established baseline.
  • If possible, use Transact-SQL instead of UDFs to define computed columns.
  • When promoting elements out of XML data to computed columns, a UDF is required. Consider tuning the impacted queries with XML indexes or promoting the referenced elements to relational column(s).