Lessons learned updating 100 millions rows.
Microsoft SQL Server Escalation Services
Yes, post mostly Replication or High Availability topics, but I found this interesting enough to share on my ReplTalk blog. These are the troubleshooting and resolution we used to help a customer explore faster ways to update a 100 million row table.
Customer was running UPDATE query affecting 100 million rows. This query was the last step in an end-user application upgrade from SQL 2000 to SQL server 2008. The UPDATE was estimated to take DAYS not HOURS to complete
We were called to explore how to make this update run faster.
We learned the database was restored from SQL Server 2000 to SQL Server 2008. However, tables statistics were not updated following the RESTORE. Initially started running updated statistics for all table using:
EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'
However, this was taking time, instead of waiting, we decided to update statistics on the problem table from a different session. The UPDATE STATISTICS on our problem table took 3:36 hrs to complete. Next time we may have selected SAMPLE to reduce update stats time.
For testing, we decided to proceed with small batch UPDATE with EXPLICIT Transaction:
SET end_flag = 'Y'
WHERE end_flag = 'N' AND
queue_state ='END' AND
mod_datetime > '2001-05-01-00.00.000000' AND
mod_datetime < '2001-09-01-00.00.000000' AND
In our original test, this query took 15 minutes to complete. After running Update Stats the 1.5 million rows were updated in about 6 minutes. However, we still estimated the UPDATE will take about 10 hours.
In order to get more improvement, we ran UPDATE from multiple session in parallel as Batches.
What problem do you think we encountered?
Started Session 1 without any problems, however when we started session 2 it was blocked. Blocking analysis shows Session 1 had an EXCLUSIVE lock on the table. Why a table lock? The UPDATE was touching so many rows SQL escalated the row lock to a table lock.
- Smaller batches
- Disable lock escalation using Trace Flag
- Rowlock Table hint (BOL: Specifies that row locks
are taken when page or table locks are ordinarily taken. )
We decided to disable lock escalation using SQL Sever trace flag.
Consequences? Generate a large number of page locks, but was okay because server was running 64bit SQL and had lots memory to spare.
We're now able to run UPDATE from two different sessions without blocking. We updated about 25 million rows per hour. This reduce original UPDATE from days to hours and the “after UPDATE STATS” and from 10 hours to 4 hours by running multiple sessions with Lock Escalation disabled.