I realize the title is generic and that is because the problem is generic. On the surface it would not surprise me that an update takes longer than a select. (A little bit anyway.)
There is logging, updates to index rows, triggers, replication needs, AlwaysOn needs, perhaps page splits and even re-ordering when the key values are changed that can take place. All of this leads to additional work and elongates the processing.
However, I am taking about a situation where the update takes significantly longer than the select and it seems excessive when you start looking at the actual number of I/O operations and such that are taking place.
For example: Select query runs in 22 minutes but update takes 140 minutes. When I first looked at this and similar issues, I found it hard to believe the additional update-related processing added that much time.
The customer had uncovered that if they set the SQL Server's max server memory setting to 2GB the update ran in ~45 minutes instead of 145 minutes. - Keep this in mind.
I started looking at the plan differences between the update and the same select criteria. I didn't see anything that popped out as significantly different for obtaining data on the select side of the input and the update side of the plan looked reasonable.
I then looked at the plan differences between the 2GB server memory setting and a larger memory setting - No differences, the same plan was being used. - Hmmm - interesting.
I looked a the statistics time and I/O outputs to see if something significant could be uncovered. The I/O was about the same but there was a significant difference in the CPU usage between the update and the select.
Back to the plan for the update. I was looking to see if it was possible the update portion of the plan could drive CPU if we had fetched the pages into memory. Clearly I can come up with an update that touches a small number of pages, gets them locked into buffer pool memory and then updates the same rows many times; driving CPU and not physical I/O. - This was not the case for the scenario presented. I had to update millions of rows to reproduce the problem.
Started tracing the activity to see what other things were going on. What I saw was lock escalation taking place when the query ran faster, under the 2GB SQL Server max server memory setting.
Now I had a pretty good idea that locking played a role in all of this. I then enabled trace flag (-T1211 - use with caution) to disable lock escalation and I could cause the same issue on the SQL Server's max server memory setting to 2GB installation.
Using debugging tools I captured the execution of the scenario and looked at those code paths using the most CPU resources. What I found was a code path related to creation and destruction of a lock class (but unfortunately there are no XEvents or Trace events in this area.)
Note: I did file work items with the development team to expose this activity.
At a high level , a lock class is a container of locks for a given part of a plan. It is often used to protect a plan from changing data as the data passes from one portion of the plan to the next. (Think hash, sort, spool … type of operations).
Let's discuss the following example:
update t set t.strData = CAST(t.iID as varchar(10)) from tblTest t join tblTest t2 on t2.iID = t.iID
||--Table Update(OBJECT:([tempdb].[dbo].[tblTest] AS [t]), SET:([tempdb].[dbo].[tblTest].[strData] as [t].[strData] = [Expr1006]))|
| |--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1006]=CONVERT(varchar(10),[tempdb].[dbo].[tblTest].[iID] as [t].[iID],0)))
|--Nested Loops(Left Semi Join, WHERE:([tempdb].[dbo].[tblTest].[iID] as [t].[iID]=[tempdb].[dbo].[tblTest].[iID] as [t2].[iID]))
|--Table Scan(OBJECT:([tempdb].[dbo].[tblTest] AS [t]))
|--Table Scan(OBJECT:([tempdb].[dbo].[tblTest] AS [t2]))
I have highlighted the table spool because this is where a lock class can appear (which you can't see). What the SQL Server does is look at the locking strategy established by the session and the plan and if necessary upgrade the strategy to 'repeatable read' during portions of the plan.
In this case the rows for the update are being fed via a table spool. This means SQL Server does not want to release the lock on the rows flowing through the spool until it has completed the proper update(s). If the lock was not held through the table spool to the update level, the data could change as it is held in the spool.
The problem is not the lock class. Even if the isolation level is upgraded to 'repeatable read', during the window, resulted in reduced concurrency; that in and of itself won't lead to increased CPU usage.
The CPU is coming from the release activity associated with the lock class. Once the lock class is no longer needed the SQL Server releases it and in turn the references to the appropriate locks are released.
Clearly I can see that lock escalation will reduce the number of locks and reduce this work. One workaround but probably not what most folks want to do in a production environment.
My first thought was how many locks is the update requiring that would cause us to do a lot of work during release? In studying the lock acquired and released events for the update statement I found that it was only a handful. So again, why the large CPU burn? - The problem was the update was inside a cursor loop that executed millions of times and all of this was under a single transaction.
Another workaround I found was to use smaller transactions, but more of them. I found this odd as I am still doing the same amount of work just in smaller chunks. Smaller chunks would help avoid lock escalations and I thought it would make it worse based on previous testing.
What is happening is the lock class has a bug. It is not properly releasing just the locks it acquired. It is running the entire lock list for the transaction. Because the locks acquired before the update don't have a lock class association there is nothing to do for those locks. The locks are properly maintained, the SQL Server is just running the entire lock list instead of the portion associated with the lock class.
In this reproduction case there are 400,000 locks acquired before the cursor execution driving the updates. This means that each update will run ALL 400,000 lock structures and find nothing to do. As the next update occurs and SQL Server does it all over again, burning the unnecessary CPU. Based on this behavior; when I dropped the size of the transaction I reduced the number of locks and as such the number of CPU cycles.
Note: Bug filed with the SQL Server development team. Internal testing, with the bug fix, shows the query taking 140 minutes consistently runs in 32 minutes without any T-SQL code changes. This is scheduled to be released in an upcoming release (~ Nov 2012).
Is This My Problem? Am I a Candidate for the Fix?
There are some things you can do to see if your statements are encountering this problem.
The easiest way to test is to 'set transaction isolation level repeatable read' at the session level and then run the set of queries in question. Repeatable read may use more lock memory but it also acts as a broad 'lock class' for this test. I have also used snapshot isolation with near repeatable read results.
If the query has some of the before mentioned operations, runs significantly faster and uses far less CPU there is a good chance it is a candidate for the fix.
Without the fix you can use the transaction isolation levels, smaller transactions, or even locking hints to help control the behavior.
Bob Dorr - Principal SQL Server Escalation Engineer
assistance provide by Keith Elmore - - Principal SQL Server Escalation Engineer