Using Computed Columns to Fix Scalar Expression Estimation Errors


(Ok, let’s try something a bit more involved now.  Here’s a tip on how you can use computed columns to improve query plan quality in SQL Server 2005.  The optimizer supports building statistics on the results of expressions, and this can avoid debugging query plans later if you make sure that the system is doing this for complex expressions, user-defined expressions, or other places where the cardinality estimate isn’t matching what is actually happening at runtime.  Enjoy! Conor Cunningham):


 


Computed columns can be used to correct some cardinality estimation errors.  If the optimizer is incorrectly estimating a predicate (as examples, due to correlations or because the optimizer is guessing on the selectivity of a user-defined function), a computed column can give the optimizer a hook on which statistics can be stored.  Note that computed columns in SQL 2005 can be created without increasing the on-disk row width if the scalar expression meets specific requirements documented in Books Online (for example, it must be deterministic).


 


The following example demonstrates how computed columns can be used to affect cardinality estimates.  A few notes for the example:


 



  1. The plan is very simple in this example.  The cardinality estimate error does not impact the plan choice.  Please understand that this problem could negatively impact join order/algorithm in larger queries.

  2. User-defined functions need to be created WITH SCHEMABINDING or else the optimizer does not trust them at all (they could be changed without causing the plan to recompile, which is “not good”).

 


use tempdb


 


— create a table with 20000 rows in it


create table t1 (col1 int, col2 int)


declare @i int


set @i = 0


while @i < 20000


begin


insert into t1(col1, col2) values (@i, rand()*1000)


set @i = @i + 1


end


 


— drop function dbo.foo


— create a function.  The optimizer does not understand it, so


— it will cause guessing logic in plan generation.


create function foo(@a int) returns int


with schemabinding


as


BEGIN


set @a = 5;


return 155


END


 


— clean out the plan cache


dbcc freeproccache


 


set statistics profile on


— we under-guess on the UDF expression (assumes about 8% selectivity, actual is 100%)


select * from t1


where dbo.foo(t1.col1) = 155


 


(Abbreviated statistics profile output)


 


Rows                 Executes             StmtText                                                                   


——————– ——————– ———————————————–


20000                1                    select * from t1   where dbo.foo(t1.col1) = 155                            


20000                1                      |–Filter(WHERE:( foo(col1)=155))


20000                1                           |–Table Scan(OBJECT:(t1))                         


 


EstimateRows     


——————


1681.7928        


1681.7928        


20000.0          


 


 


— we over-guess on the UDF expression (assumes about 8% selectivity, actual is 0%)


dbcc freeproccache


select   * from t1


where dbo.foo(t1.col1) = 154


 


Rows                 Executes             StmtText                                                                  


——————– ——————– ———————————————–


0                    1                    select * from t1   where dbo.foo(t1.col1) = 154                         


0                    1                      |–Filter(WHERE:(foo( col1)=154))


20000                1                           |–Table Scan(OBJECT:(t1))                        


 


(estimates match previous example)


 


set statistics profile off


 


— add (non-persisted) computed column over expression.


— Note that the function has WITHSCHEMABINDING enabled (which causes the optimizer to trust the expression will not change over recompiles)


— Note2 – this could be persisted if the function were expensive.  Non-deterministic expressions are not persistable, though imprecise expressions are.


alter table t1 add c3 as dbo.foo(col1)


 


set statistics profile on


— exact same queries


— we correctly estimate 100% selectivity for this query


dbcc freeproccache


select * from t1


where dbo.foo(t1.col1) = 155


 


Rows                 Executes             StmtText                                                                                                         


——————– ——————– ———————————————–


20000                1                    select * from t1   where dbo.foo(t1.col1) = 155                                                                  


0                    0                      |–Compute Scalar(DEFINE:( c3=c3))                                  


20000                1                           |–Filter(WHERE:( c3=(155)))                                                          


0                    0                                |–CompSca(DEFINE:( c3=foo(col1)))


20000                1                                     |–Table Scan(t1))                                                     


 


EstimateRows           


————————


20000.0                


20000.0                


20000.0                


20000.0                


20000.0                


 


 


 


— we estimate 0% selectivity for this query (actually we have a floor to estimate at least one row)


dbcc freeproccache


select * from t1


where dbo.foo(t1.col1) = 154


 


Rows                 Executes             StmtText                                                                                                         


——————– ——————– ———————————————–


0                    1                    select * from t1   where dbo.foo(t1.col1) = 154                                                                  


0                    0                      |–Compute Scalar(DEFINE:( c3=c3))                                  


0                    1                           |–Filter(WHERE:( c3=(154)))                                                          


0                    0                                |–CompSca(DEFINE:( c3=foo(col1)))


20000                1                                     |–Table Scan(t1))                                                     


 


EstimateRows           


————————


1.0                


1.0                


1.0                


20000.0                


20000.0                


 


set statistics profile off


Comments (7)

  1. Conor Cunnigham kicks it off with:

    On behalf of the Query Optimization Team for Microsoft’s SQL Server…

  2. gauravbi says:

    Awesone … Good to see QO team blogging …. Hope to see good things in future.

  3. Of the questions that get asked on the forums many are wround query optimisation and trying to understand…

  4. Of the questions that get asked on the forums many are wround query optimisation and trying to understand…

  5. Of the questions that get asked on the forums many are wround query optimisation and trying to understand