Finding which queries were executing from a SQL Memory Dump – revisited

A message about the previous post on finding queries in a SQL Memory Dump file:

Back in October I did a post on finding queries in SQL Server Memory Dumps. It has been a popular topic, but some have had difficulties following it. After reviewing this with a reader, I realized there was an issue with this post for 64 bit dumps. Rather than make corrections, I decided I’d do another example here so we can learn more from the issue with that post and use this opportunity to view another dump.

The dump analysis detailed in that post worked as described in the post. However, in my attempt to convert the necessary steps taken from using internal symbols to something that can be applied using public symbols (dumping the right addresses with the right offsets), I mistakenly used the ‘dd’ command. The ‘dd’ command dumps what we call double words. Double words are 32 bits – or 32 bit addresses. The architecture of the system used for that post is 64 bit. Ironically this worked fine and went undetected in my previous example because the higher order digits were all zero – likely because my test box’s memory had not been stressed enough to use any higher range memory addresses. Therefore the lower 8 digits in the hex number (the right half) representing our memory addresses worked with ‘dd’ because the bottom half is 32 bits (0xFFFFFFFF) and the leading zeros in the higher 32 bits were not significant. Then when applying this method to a system that was under greater memory pressure, we begin to see those higher ordered digits on a 64 bit machine being used. The ‘dd’ doesn’t work for this. So, we’ll now use ‘dq’ to dump quad words. This will accurately display our 64 bit memory addresses and work more reliably on 64 architecture. The previous post would work fine for 32 bit architecture. Remember, we only need to use ‘dq’ with 64 bit addresses. Otherwise, ‘dd’ is appropriate for 32 bit addresses.

Finding your query in a dump file:

I’d like to thank the read mentioned above, Dylan, for allowing me to write this blog post using his SQL Memory dump as a sample (so we’ll have the higher addresses ranges in use).

But first the basics, we first setup our symbols. Once this is done, I see this dump is a “symptom” or “SPID” mini-dump in that it contains only the thread that encountered the problem:

0:000> ~
. 0 Id: 12a8.1870 Suspend: 0 Teb: 000007ff`fff0a000 Unfrozen

If we view the stack trace of that thread to see what was going on:

0:000> kc
Call Site
ntdll!ZwWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
sqlservr!CDmpDump::DumpInternal
sqlservr!CDmpDump::DumpFilter
sqlservr!CDmpDump::Dump
msvcr80!__C_specific_handler
ntdll!RtlpExecuteHandlerForException
ntdll!RtlDispatchException
ntdll!RtlRaiseException
KERNELBASE!RaiseException
sqlservr!CDmpDump::Dump
sqlservr!SQLDumperLibraryInvoke
sqlservr!CImageHelper::DoMiniDump
sqlservr!stackTrace
sqlservr!utassert_fail
sqlservr!FullXactImp::Rollback
sqlservr!CMsqlXactInternalBase::RollbackInternal
sqlservr!CMsqlXactInternalReadWrite::Rollback
sqlservr!CMsqlXactImp::RollbackInternal
sqlservr!CMsqlXactImp::Rollback
sqlservr!CMsqlXact::UnInstallUserXact
sqlservr!process_request
sqlservr!process_commands
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!_callthreadstartex
msvcr80!_threadstartex
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

We can see that we hit an assertion while SQL Server was attempting to roll back a transaction. Assertions are safety checks in code to verify that a condition that shouldn’t happen - isn’t happening. If that condition occurs (or the safety check fails), an error is generated like we see above. So we want to get two things out of this – what is the assertion and what is the query that was running? The latter is mostly for academic purposes here. In general, you cannot write TSQL statements that cause assertions or corruption in databases. TSQL just isn’t low-level enough to do that. However, the TSQL you run may lead to the scenario which causes us to come across an issue like this. Therefore, getting the query text is good to see if we can reproduce the issue. If we use some different parameters for the ‘k’ (stack trace) command, we get more information returned. When we run ‘kvnf’, we notice that we get the memory taken up by each frame on the stack (a frame is one line on the stack). If we look at the frame that called sqlservr!utassert_fail, we see that it’s memory usage is rather large compared to other frames. This is a good indication something is stored on the stack as opposed to the heap or in memory allocated via VirtualAlloc() calls. Here is that frame from the ‘kvnf’ command:

05 1550 00000000`55addca0 00000000`00fb23fd : 00000000`00000000 00000000`ffffffff 00000003`00000000 0000b688`75b7282f : sqlservr!utassert_fail+0x844

You’ll notice it uses 1550 bytes of memory – much more than the other frames. To investigate, we start dumping at the child base stack pointer – the Child-SP. This is where this frame starts on the stack in memory. I am going to dump Unicode data, so I’ll use the ‘du’ command:

0:000> du 00000000`55addca0
00000000`55addca0 "¸"
0:000> du
00000000`55addca4 ""

Type ‘du’ twice and then you can simply hit the ENTER key and it will increment automatically. Keep hitting ENTER and you’ll eventually find the assertion stored on the stack:

0:000>
00000000`55addcf0 "Location: "xact.cpp":4362.Expre"
00000000`55addd30 "ssion: !m_parNestedXactCnt.SPID"
00000000`55addd70 ": 84.Process ID: 4776.Descrip"
00000000`55adddb0 "tion: Trying to use the transac"
00000000`55adddf0 "tion while there are 1 parallel "
00000000`55adde30 "nested xacts outstanding"

Now that we have our assertion, we can search the online support knowledge base to see if there are any known issues (https://support.microsoft.com/).

However, we still have one thing left to get from the dump file – our query. Using the ‘dq’ command, we can follow the same general process as the previous post did. The first thing we notice is that this thread doesn’t show a call to sqlservr!CMSqlExecContext::ExecuteStmts like the previous post. That’s ok. What we are looking for is the address of a particular object that is passed to a few calls on the stack. In this dump, we’ll start with the call to sqlservr!process_request. From this call, we need to start with the first parameter – the address to the object we are interested in:

0:000> dq 00000007`5ae70fa0+0x20 l1
00000007`5ae70fc0 00000007`5ae70ea0

We dump a quad-word (‘dq’) with an offset of 0x20. The ‘l1’ tells the debugger we only want 1 quad word. Notice the higher order digits (the digits to the left of the apostrophe are not all zero – the issue with the previous post). 

Now, we need to take the address returned (the address of another object – which our first object points to) and dump that with an offset of 0x28 using ‘dq’.

0:000> dq 00000007`5ae70ea0+0x28 l1
00000007`5ae70ec8 00000009`91862160

We aren’t there yet. This address (another object in memory), has what we need at an offset of 0x200. So let’s get that:

0:000> dq 00000009`91862160+0x200 l1
00000009`91862360 00000003`408fdd30

In our chain of objects, this one has what we need at an offset of 0x20:

0:000> dq 00000003`408fdd30+0x20 l1
00000003`408fdd50 00000003`408fdd90

Finally, this address holds the pointer we want to get to:

0:000> dq 00000003`408fdd90 l1
00000003`408fdd90 00000003`408fde20

Which gives us the location of the buffer we’ve been after. So now we dump Unicode with ‘du’ to get the statement:

0:000> du 00000003`408fde20
00000003`408fde20 "(@P0 int)select count(*) as y0_ "
00000003`408fde60 "from dbo.V_PDM_CP this_ where th"
00000003`408fdea0 "is_.ResItNum= @P0 ??????????????"
00000003`408fdee0 "????????????????????????????????"
00000003`408fdf20 "????????????????????????????????"

From here we can see we have a parameterized SQL statement. This statement either went through auto-parameterization or was submitted as a parameterized statement via a call to sp_executesql. Now, we can move forward with researching and seeing if this is reproducible.

The approach is the same, but use ‘dq’ when dumping addresses in 64 bit SQL Memory Dumps.

-Jay