Using TSQL Variables: Subtle BUG when assigning values to Variables via a Query

I’m unsure if this is a BUG or “By Design” but for me it is unexpected & likely to cause your application to hang if you aren’t aware of it. So one more article in my “TSQL Best Practise” or “Good TSQL Coding Standards” posts.

Summary:

If you assign a value to a variable using a SELECT statement, the variable is not set to NULL if no rows are returned, instead it keeps the existing value. ie: This type of code is an accident waiting for somewhere to happen.

 SELECT @Test = RetVal FROM VarSource WHERE Lookup = 'Exists1'

Where I saw it recently was in the middle of a WHILE loop. Simply put they were recursively walking up a hierarchical tree, looking for the parent. eg: SET @Key = ManagerID … WHERE EmpID = @Key. When it found a manager with appropriate level of signing authority they set the @key = 0 which terminated the loop. Unfortunately when a mid-level manager reported to an overseas boss (not in the same legal entity so not in the DB), instead of setting ManagerID to zero they set it to his EmployeeID. Thus a change in data caused infinite recursion. As their loop code didn’t have any MAXRecursion checks & they hadn’t written this code using Common Table Expression (CTE), it crashed the app. Another tip would be to connect with a query timeout less than infinite & always set the Workstation & Application properties in the Connection, to aid with troubleshooting.

Details:

There are 2 reasons why the above is bad practice.

  1. The query may return multiple rows & you’ve not indicated what to do with the extra rows.
  2. The query may return zero rows & you are unlikely to realise you are working with the previous value.

So unless you can guarantee, via unique constraints etc that you will get precisely 1 row to match your query, its better to code this differently.

I suggest the following alternatives.

Alternative 0: Always reset the variable to a known state before you use it.

 SET @Test = NULL
SELECT @Test = RetVal FROM VarSource WHERE Lookup = 'NotExists1'
If @Test IS NOT NULL BEGIN ....

 This may work but it sucks, choose a better, more readable alternative. 

Alternative 1: Wrap the assignment in a SELECT.

 SELECT @Test = ( SELECT RetVal FROM VarSource WHERE Lookup = 'NotExists1')

This will protect you against the ZERO Rows returned by ensuring that the @Test IS NULL & not left at some previous value. It is also easy to modify your existing code & unlikely to require immense testing. But leaves you exposed to Multiple results.

 

Alternative 2: Wrap the assignment in a SELECT TOP(1).

 SELECT @Test = (SELECT TOP(1) RetVal FROM VarSource WHERE Lookup = 'NotExists1')

Same as above but now explicit in what to do if multiple rows returned.

 

Alternative 3: Use an aggregate function – MIN, MAX, AVG

 SELECT @Test = MAX(RetVal) FROM VarSource WHERE Lookup = 'NotExists1'

This is explicit about what to do with multiple rows, & returns NULL where no rows returned. But may be a pain if your syntax was also assigning other variables & you needed them to be non-aggregate values. Clearly MAX() & MIN() would be more predictable than AVG()

 

Alternative 4: Don’t use a variable at all. Use a Join

In many cases I see code that sets a variable in one query (like @StoreID) only to use it as a Key value in another.

 SELECT @StoreID = StoreID FROM dbo.Stores WHERE Name = 'City'

SELECT * FROM dbo.Employees WHERE StoreID = @StoreID

could be rewritten as

 SELECT * 
FROM dbo.Employees AS E 
JOIN dbo.Stores    AS S ON S.StoreID = E.StoreID
WHERE S.Name = 'City'
 or
 SELECT * FROM Employees 
WHERE StoreID IN (SELECT StoreID 
                  FROM Stores WHERE Name = 'City')

=========================================================

Another Common use passing a value from a calculation. 

 SELECT @TotalEmps = COUNT(ID) FROM dbo.Employees

SELECT 'PctEmp' = Headcount / @TotalEmps
FROM dbo.Stores

which could be rewritten as

 SELECT 'PctEmp' = Headcount / ( SELECT COUNT(ID) 
                                FROM dbo.Employees )
FROM Stores

OK. I realise that that would work fine & is not affected by the bug mentioned above. But I’m sure some people could have lookup values or similar that would break. I find it challenging to think up illogical ways to write TSQL, for that I look for inspiration in TSQL written by others. Just as professional C Programmers might look at the OO design of my C code & use it for crappy code in their refactoring blogs. I find them to be a constant source of “Wow I didn’t even think that would work, I’m not surprised perf sucks.”

If I hear you thinking “Yer but my queries are way more complex & the increased complexity of even more joins would kill me. I suggest you read up on the “WITH” statement. It allows you break up your joins into easy to understand parts & is way more efficient than creating TEMP tables. Not to be confused with the “WITH” clause that lets you put query hints in your joins (like the dreaded NOLOCK hint )

-- ================================================================

If you’d like to repro this, Sample Code included below.

 -- ==============================
-- Purpose: Demonstrate BUG/Function of 
--          Variable not changed / Set to NULL when no rows returned.
-- AUTHOR:  David Lean
-- Date:    21 Aug 2009
-- ==============================

USE tempdb
GO

CREATE TABLE dbo.VarSource (
     Lookup VARCHAR(20) NOT NULL UNIQUE
    ,RetVal INT NOT NULL  
)
GO
-- 2005 Compatible syntax
INSERT dbo.VarSource VALUES('Exists1', 1);
INSERT dbo.VarSource VALUES('Exists2', 2);
GO

DECLARE @Test INT

-- Ensure it has a value
SELECT @Test = 99

-- Test1: See that the row is changed when a row is returned.
SELECT 'Value Before Test 1' = @Test
SET @Test = NULL
SELECT @Test = RetVal FROM dbo.VarSource WHERE Lookup = 'Exists1'
SELECT 'Value After Test 1 worked correctly' = @Test

-- Test2: BUG? See that the row keeps previous vaule when No rows returned.
SELECT 'Value Before Test 2' = @Test
SELECT @Test = RetVal FROM dbo.VarSource WHERE Lookup = 'NotExists1'
SELECT 'Value After Test 2 – BUG It did not change to NULL' = @Test

-- Test3: See that the row is changed when No rows returned.
SELECT 'Value Before Test 3' = @Test
SELECT @Test = RetVal FROM dbo.VarSource WHERE Lookup = 'Exists2'
SELECT 'Value After Test 3 worked correctly' = @Test

-- Test4: SELECT Workaround See that the row is changed when No rows returned.
SELECT 'Value Before Test 4' = @Test
SELECT @Test = ( SELECT RetVal FROM dbo.VarSource WHERE Lookup = 'NotExists1')
SELECT 'Value After Test 4 = (SELECT ...)' = @Test

-- Test5: Aggregate Workaround See that the row is changed when No rows returned.
SET @Test = 99;
SELECT 'Value Before Test 5' = @Test
SELECT @Test = MAX(RetVal) FROM dbo.VarSource WHERE Lookup = 'NotExists1'
SELECT 'Value After Test 5, = MAX()' = @Test

-- Test6: TOP 1 Workaround See that the row is changed when No rows returned.
SET @Test = 99;
SELECT 'Value Before Test 6' = @Test
SELECT @Test = (SELECT TOP(1) RetVal FROM dbo.VarSource WHERE Lookup = 'NotExists1')
SELECT 'Value After Test 6, = (SELECT TOP(1)...)' = @Test

/*
-- NOTE: Setting Variables like this is a Poor Practice. 
-- If the query returns multiple rows only 1 will be used, the others ignored. 
-- It is better to either :-
a) be explicit on which row to use with aggregate: SELECT @Test = SELECT MAX(RetVal)
b) JOIN with the Lookup Table 
        WHERE key IN ( SELECT RetVal FROM VarSource WHERE Lookup = 'NotExists1')
*/
-- ================================================================

I hope you find this useful. As always I’m interested in your thoughts, comments &/or disagreement.

David Lean

Technorati Tags: TSQL,Best Practise,Variables,BUGs