Slow query performance because inaccurate cardinality estimate when using anti-semi join following upgrade from SQL Server 2000

 

We have a few customers who reported that some of their queries run slower following upgrade from SQL Server 2000 to SQL Sever 2005,  2008 and 2008 R2.   Specifically, queries experiencing the issue have anti-semi joins in the query plan and the join involves multiple columns as joining condition.

Anti-semi joins are results of query constructs like NOT EXISTS, NOT IN.   Here is an example of the query that would result in anti-semi join:

SELECT t1.*
FROM tst_TAB1 t1
WHERE NOT EXISTS( SELECT *   FROM tst_TAB2 t2     WHERE t1.c1 = t2.c1 AND t1.c2 = t2.c2 )

Note that you only experience this issue when multiple joining columns are involved in the join as the example above.

If you examine the query plan, you can spot the issue.  In this query execution plan output (re-arranged for ease of explanation),  the left anti semi join (merge join) returned 2808 rows but the EstimateRows only estimate 1 row.

image

Inaccurate estimate will impact overall query plan and potentially slow performance.

Solution:

This is a product regression and we have put out fixes for both SQL Server 2005, 2008 and 2008 R2.   Currently the fixes for SQL 2005 and 2008  are released.  Refer to KB https://support.microsoft.com/kb/2222998 for this fix.  SQL Server 2008 R2 fix is being planned and the same KB will be updated to reflect the fix once it becomes available.   Please note that you will need to enable trace flag 4199 to activate the performance fix.

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support