Statistics on system tables and query performance

Recently I was helping a customer design a database with a very large number of database objects – hundreds of schemas, a few thousand stored procedures, and hundreds of partitioned tables, with most tables containing between two hundred and a thousand partitions. Once all these objects were created, I had to write a catalog metadata query that went something like this: “Find the partition of a table in schema A that has the largest number of pages of all partitions in that table, with the limitation that the table must have more than one partition, may not be referenced by a foreign key constraint, and has a view in schema B with the same name.” (As little sense as this probably makes, there was a perfectly good reason why this query was needed.)

With the plethora of catalog views and DMVs in SQL Server, writing this query wasn’t particularly hard. In this case, it involved sys.schemas, sys.tables, sys.views, sys.partitions, sys.indexes, sys.dm_db_partition_stats, and sys.data_spaces. However, while the query returned correct results, its performance was very poor: it took about twenty seconds to return the row I was after. Looking at the query plan, I noticed something that would normally be a direct clue to the cause of poor performance: while the actual number of rows passing through the various iterators in the plan varied, the estimated number of rows was always 1. In other words, the estimates used by the optimizer were inaccurate, resulting in a suboptimal plan and poor performance.
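For reference, the query was shaped roughly like the following sketch. The schema names A and B are placeholders, and this is an approximation for illustration rather than the actual query:

SELECT TOP (1)
        t.name AS table_name,
        p.partition_number,
        ps.used_page_count
FROM    sys.tables AS t
INNER JOIN sys.schemas AS s
        ON t.schema_id = s.schema_id
INNER JOIN sys.partitions AS p
        ON p.object_id = t.object_id
        AND p.index_id IN (0, 1) -- heap or clustered index
INNER JOIN sys.dm_db_partition_stats AS ps
        ON ps.partition_id = p.partition_id
WHERE   s.name = 'A'
        -- a view in schema B with the same name must exist
        AND EXISTS  (
                    SELECT  1
                    FROM    sys.views AS v
                    INNER JOIN sys.schemas AS vs
                            ON v.schema_id = vs.schema_id
                    WHERE   vs.name = 'B'
                            AND v.name = t.name
                    )
        -- the table may not be referenced by a foreign key constraint
        AND NOT EXISTS  (
                        SELECT  1
                        FROM    sys.foreign_keys AS fk
                        WHERE   fk.referenced_object_id = t.object_id
                        )
        -- the table must have more than one partition
        AND (
            SELECT  COUNT(*)
            FROM    sys.partitions AS p2
            WHERE   p2.object_id = t.object_id
                    AND p2.index_id IN (0, 1)
            ) > 1
ORDER BY ps.used_page_count DESC;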

If the query used regular user tables, the next troubleshooting step would be to update statistics on those tables in the hope of giving the optimizer better estimates. But in this case, the query used only catalog views and DMVs, and at first glance, there were no statistics in sight. However, looking at the query plan, I saw that the objects referenced in the iterators were actually hidden system tables, such as sys.sysclsobjs, sys.sysidxstats, sys.sysschobjs, and sys.syssingleobjrefs. While it is not possible to query these tables directly (except over the DAC connection), their definitions can still be viewed with the sp_help stored procedure. The result from sp_help includes the indexes defined on these tables, and for each index, the corresponding statistics can be viewed with DBCC SHOW_STATISTICS.
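For example, to inspect one of these hidden tables and the statistics on one of its indexes (the index name below is illustrative; the actual names come from the sp_help output):

EXEC sys.sp_help 'sys.sysschobjs';

DBCC SHOW_STATISTICS ('sys.sysschobjs', clst);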

When I did that, the reason for the inaccurate estimates became clear – even though most of these system tables had a rather large number of rows, corresponding to the multitude of objects in the database, in many cases the statistics were dated prior to the time when the database objects were created, and showed completely inaccurate density and distribution. Once I ran UPDATE STATISTICS on each index of the system tables, the query returned in less than a second.
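For reference, a minimal sketch of that fix, with table names taken from the iterators in the query plan (the list below is illustrative, and updating statistics on system base tables requires sysadmin permissions):

UPDATE STATISTICS sys.sysschobjs;
UPDATE STATISTICS sys.sysidxstats;
UPDATE STATISTICS sys.sysclsobjs;
UPDATE STATISTICS sys.syssingleobjrefs;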


Query performance, scalar UDFs, and predicate pushdown

Recently I had to troubleshoot a query that performed much slower than expected. The solution seemed sufficiently interesting to warrant a write-up, and the root cause of the problem reinforced the well-known best practice of avoiding scalar user-defined functions in set-based queries.

The original query looked like this:

SELECT  'Literal1',
        u.Col2
FROM    dbo.Table1 AS u
INNER JOIN dbo.Table2 AS l
        ON u.Col2 = l.Col2
WHERE   l.Col3 = 'Literal2'
        AND
        NOT EXISTS  (
                    SELECT  1
                    FROM    dbo.Table2 AS l2
                    WHERE   u.Col2 = l2.Col2
                            AND
                            l2.Col3 = 'Literal1'
                    )
        AND
        dbo.UDF1(u.Col1) IS NOT NULL;

Table1 had about 3 million rows, and Table2 had about 7 million rows. In its original form, the query was running for more than one hour, while the requirement was that it would complete within minutes.

The scalar user-defined function (UDF) in the WHERE clause immediately stood out as a warning sign. Using scalar UDFs in set-based queries is generally not recommended, for two reasons:

  1. The function is invoked for each row in the set, and each invocation has a fixed and not insignificant cost, in addition to the cost of the statements in the function body (one way to observe these per-row invocations is shown in the sketch after this list).

  2. The optimizer cannot accurately estimate the cost of the function (which could vary greatly), and may therefore come up with less than optimal query plans.
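To make reason #1 concrete, on SQL Server 2016 and later the per-row invocations of a scalar UDF can be observed with the sys.dm_exec_function_stats DMV. A minimal sketch, assuming the UDF from the query below is named dbo.UDF1 in the current database:

SELECT  OBJECT_NAME(object_id) AS function_name,
        execution_count,
        total_worker_time
FROM    sys.dm_exec_function_stats
WHERE   database_id = DB_ID()
        AND OBJECT_NAME(object_id) = 'UDF1';

After the query runs, execution_count would show roughly one invocation per row that reached the UDF-based filter.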

For the query above, removing the UDF from the WHERE clause brought the execution time down to 12 seconds. Looking at the query plan, it was clear why it took so long to execute the original statement – a full clustered index scan of Table1 (all 3 million rows of it) was followed by a filter operator, where the UDF was applied to the Col1 column of Table1. In other words, rather than first reducing the set via joins with Table2 and then applying the UDF to that much smaller set, the optimizer chose to first filter millions of rows with the UDF. This is a fairly typical example of a suboptimal plan caused by an inaccurate estimate (reason #2 above), which was in turn caused by the use of an “opaque” scalar UDF.

The obvious workaround for this problem would be to remove the UDF-based filter from the WHERE clause, insert the resulting intermediate result set into a temporary table, and then select from that temporary table, filtering the much smaller intermediate result set with the UDF. However, I wanted to avoid a procedural approach, which would be more complex and less robust than a single set-based statement.
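For reference, that procedural workaround would look roughly like this (the temporary table name is illustrative):

SELECT  u.Col1,
        u.Col2
INTO    #Reduced
FROM    dbo.Table1 AS u
INNER JOIN dbo.Table2 AS l
        ON u.Col2 = l.Col2
WHERE   l.Col3 = 'Literal2'
        AND
        NOT EXISTS  (
                    SELECT  1
                    FROM    dbo.Table2 AS l2
                    WHERE   u.Col2 = l2.Col2
                            AND
                            l2.Col3 = 'Literal1'
                    );

-- The UDF now filters only the much smaller intermediate result set.
SELECT  'Literal1',
        Col2
FROM    #Reduced
WHERE   dbo.UDF1(Col1) IS NOT NULL;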

The first attempt to come up with a single statement that avoids the original problem looked like this:

SELECT  'Literal1',
        up.Col2
FROM    (
        SELECT  u.Col1,
                u.Col2
        FROM    dbo.Table1 AS u
        INNER JOIN dbo.Table2 AS l
                ON u.Col2 = l.Col2
        WHERE   l.Col3 = 'Literal2'
                AND
                NOT EXISTS  (
                            SELECT  1
                            FROM    dbo.Table2 AS l2
                            WHERE   u.Col2 = l2.Col2
                                    AND
                                    l2.Col3 = 'Literal1'
                            )
        ) AS up
WHERE   dbo.UDF1(up.Col1) IS NOT NULL;

Here we have a subquery in the FROM clause, and apply the filter with the UDF “on top” of the subquery, in the WHERE clause of the outer query. The intent is to evaluate the subquery first, before applying the filter. Unfortunately, this does not work – this query is logically equivalent to the original query, and SQL Server is at liberty to evaluate the predicates in the order that it deems optimal, which in this case still means filtering millions of rows in Table1 with the UDF-based filter, before joining with Table2.

Here is the query that did work (the changes from the previous attempt are the CROSS JOIN with the derived table e, and the expression passed to the UDF):

SELECT  'Literal1',
        up.Col2
FROM    (
        SELECT  u.Col1,
                u.Col2
        FROM    dbo.Table1 AS u
        INNER JOIN dbo.Table2 AS l
                ON u.Col2 = l.Col2
        WHERE   l.Col3 = 'Literal2'
                AND
                NOT EXISTS  (
                            SELECT  1
                            FROM    dbo.Table2 AS l2
                            WHERE   u.Col2 = l2.Col2
                                    AND
                                    l2.Col3 = 'Literal1'
                            )
        ) AS up
CROSS JOIN  (
            SELECT  '' AS EmptyString
            ) AS e
WHERE   dbo.UDF1(up.Col1 + e.EmptyString) IS NOT NULL;

The reason this succeeds in forcing the optimizer to evaluate the UDF filter after reducing the row set via joins is that we pass an expression, rather than a base column, to the function. The passed expression in this case is logically equivalent to the base column, but because it contains a reference to a column from a seemingly pointless outer query, the optimizer cannot push down the UDF-based filter to the inner subquery. In the plan generated for this query, the UDF executes only against the few hundred rows produced by the inner subquery, and the statement completes in 30 seconds – well within the required time. Note that it wouldn’t be sufficient to use a literal empty string as the second part of the expression – to prevent predicate pushdown, it has to be a column from an outer query.
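For contrast, here are the two variants of the final predicate side by side; only the second one blocks the pushdown:

-- Not sufficient: the argument references only a column of the inner
-- subquery, so the optimizer can still push the filter down.
WHERE   dbo.UDF1(up.Col1 + '') IS NOT NULL;

-- Works: the reference to e.EmptyString from the outer query prevents
-- pushing the filter into the inner subquery.
WHERE   dbo.UDF1(up.Col1 + e.EmptyString) IS NOT NULL;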

In conclusion, this can be a useful query tuning technique for the specific case when the optimizer pushes down a filtering predicate (not necessarily a UDF-based one), and the resulting plan is less optimal than one where the filter is evaluated later in the plan.
