Increase your SQL Server performance by replacing cursors with set operations




You have probably heard many times, from different sources, that as a best practice; avoid using TSQL cursors.


During a recent visit to a partner we ran into a common cursor case, which I wanted to use as an example to demonstrate why you should avoid TSQL cursors in most cases, and how to convert cursor logic to simple set join operations. Now there are certain scenarios where using a cursor makes sense. For example, a cursor is ideal for row by row processing that can’t be accomplished by set based operations. A cursor is flexible in that it provides a window, or subset, of data and that allows manipulation of the data in various ways. Study carefully what you want to achieve on case by case basis before using a cursor. Keep in mind SQL Server, as a modern RDBMS system, performs much better with set operations.


Here is simplified version of a real cursor that was used to update a big table with over 200 million rows.


DECLARE @EntityId Varchar(16)


DECLARE @PerfId Varchar(16)


DECLARE @BaseId Varchar(16)


DECLARE @UpdateStatus Int


 


DECLARE outerCursor CURSOR FOR


SELECT EntityId, BaseId


FROM outerTable


–Returns 204,000 rows


 


OPEN outerCursor


FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId


 


WHILE @@FETCH_STATUS = 0


BEGIN


            DECLARE innerCursor CURSOR FOR


            SELECT PRFMR_ID


            FROM innerTable


            WHERE ENTY_ID = @BaseId


                       


            OPEN innerCursor


            FETCH NEXT FROM innerCursor INTO @PerfId


            SET @UpdateStatus = @@FETCH_STATUS


           


            WHILE @UpdateStatus = 0


            BEGIN


                  UPDATE 200MilRowTable


                  SET ENTY_ID = @EntityId


                  WHERE PRFMR_ID = @PerfId


           


                  FETCH NEXT FROM innerCursor INTO @PerfId


                  SET @UpdateStatus = @@FETCH_STATUS


            END


           


            CLOSE innerCursor


            DEALLOCATE innerCursor –clean up inner cursor


                       


            FETCH NEXT FROM outerCursor INTO @EntityId, @BaseId


END


 


CLOSE outerCursor


DEALLOCATE outerCursor –cleanup outer cursor


 


 


You might notice that this is a nested cursor with 204,000 loops in total for outerCursor. The innerTable has 10 million rows but innerCursor varies in number of loops depending on @BaseId of outerCursor. When I arrived at the customer this cursor had been running for over a day. The developer was “hoping” that it would finish soon given another day or two. The problem was nobody knew for sure how much time this thing would need to complete. Well, we can find out how much progress it has made so far to make an educated guess:


SELECT execution_count, st.text


FROM sys.dm_exec_query_stats as qs


CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) as st


WHERE st.text like ‘%FETCH NEXT FROM outerCursor INTO%’


 


This would tell us how many times the outer cursor has looped already. It returned 107. That’s only around 0.5% (107/204,000) after 30 hours of running. If the trend were to continue, the cursor would need another 8+ months!!!


A rewrite of the batch to take advantage of set operations is as simple as this:


SELECT i.PRFMR_ID, o.EntityId INTO #tempTable


FROM innerTable i join outerTable o on i.ENTY_ID = o.BaseId


Go


UPDATE 200MilRowTable


SET m.ENTY_ID = t.EntityId


FROM 200MilRowTable m join #tempTable t on m.PRFMR_ID = t.PRFMR_ID


Go


–note this is only one of a few ways to rewrite.


In this particular case, “SELECT INTO” is minimally logged under simple recovery mode. The two statement approach makes it easier to understand the conversion logic.


This batch took approximately 17 hours to complete. Between the statement, I also put the database into simple recovery mode and added appropriate indexes to the temp table. I also dropped indexes from 200MilRowTable that touched “ENTY_ID” to speed this up. Adding indexes back took another 7 hours. The total time was approximately 24 hours, which is just a small fraction of the original cursor batch. I need to point out that the non-cursor batch uses more resources since the UPDATE now spawns multiple threads to process parallely. Remember our goal here is to make this finish faster not worrying about how much resources it consumes.


Note: this might not be a perfect example because the nested cursor is magnifying the slow performance. 


However, the bottom line is; aviod cursors if possible and use joins / set operations whenever you can.


Cross Posted from http://blogs.microsoft.com/mssqlisv


Comments (2)

  1. LenInNZ says:

    A further improvement may be to do chunks of N(100000) at a time by using SET ROWCOUNT or TOP and a WHILE @@RowCount>0 with m.ENTY_ID != t.EntityId

Skip to main content