Basic Debugging Concepts and Setup

In my blog, from time to time I will rely on debugging to explain or prove out a concept. In this context, I am talking about debugging the SQL Server engine itself – not debugging any TSQL code written to run within SQL Server (although I am sure we’ll get into that at some point). To follow along with those posts, it will help to get familiar with the basics of debugging. This is what I want to go over here….

In Windows, you can debug in two different areas. The first is kernel mode debugging. Kernel mode debugging is done at the operating system level. This is where the operating system kernel and device drivers play. You will typically only encounter this if you write device drivers, are an SDE for the windows team, or you have a need to read kernel memory dump files (the ones that come from the blue screen of death). The other type of debugging is user mode debugging. This is what we are concerned with in SQL Server and in this blog. Applications typically run in “user mode”. SQL Server is a user mode application – there is no kernel level device driver code involved (from the SQL Product). SQL Server relies on Win32/Win64 API calls for all I/O and other services needed from the operating system. The primary distinction here - as it relates to our purposes - is that debugging in these two areas is not the same. Though some of the commands are the same, there are different rules and different data structures at play. User mode is a protected mode where memory is abstracted and one process cannot touch the memory space of another process. And it is because of this separation of roles (kernel/user) and protection of resources that user mode code does not directly crash another process and does not crash the system (aka cause a blue screen). If you are having blue screens on your SQL Server, there is something else causing it besides SQL Server – though SQL Server may be triggering a sequence of events that lead to this…

One other distinction to note is the difference between live debugging and post-mortem debugging or reading dump files. We’ll do both, but there are differences. In a live debug, we attach our debugger directly to the process – often halting it’s execution. A dump file is simply a file the debugger can read that contains some or all of the memory contents of the process it was created from. We can create full memory dump files which contain the entire committed address space (but will not contain paged out memory and have an extension of .dmp), or we can have just the thread stacks and loaded modules – called a mini-dump (often having an extension of .mdmp). The extensions are a matter of convention as with any file extension and can in fact be anything.

To start, we need the debugging tools. You can download the appropriate platform installer here:

https://www.microsoft.com/whdc/devtools/debugging/default.mspx

By default, the tools will install into “C:\Program Files\Debugging Tools For Windows” for 64 bit and “C:\Program Files (x86)\Debugging Tools For Windows” for 32 bit. You can do user mode debugging with Visual Studio of course, but we’ll stick to the Debugging Tools for Windows package here.

The package will install several debuggers including: windbg, kd, cdb, and ntsd. Some of these debuggers are console debuggers (kd, cdb, ntsd), and some (windbg) are GUI based. Some of them only do user mode debugging (cdb and ntsd), while some do kernel debugging (kd and windbg). Windbg is GUI based and does both kernel and user mode debugging. Windbg is my debugger of choice.

Before launching the debugger, we need to go over the basics of processes and threads. An application runs as a process. With SQL Server, the process we are interested is sqlservr.exe (the main executable for SQL Server). With user mode debugging, we are debugging an individual process – whether we attach to it with a debugger in a “live” debug, or read a dump file that contains the memory space of a process. In kernel debugging, you can view all processes and their memory space. (NOTE: in a memory dump, you will not have access to memory addresses that were paged out at the time the dump was created). The process is where the accounting of all resource usage by the application is performed. The actual execution of application code is performed by threads within the process. It is these threads that are scheduled and executed on the processor. When we think of a stack trace, this is a stack of executing code for a single thread. A thread will have a user mode and a kernel mode stack. However, in our user mode debugging, we will only see the user mode stack for each thread. This is generally all we’re concerned with when debugging a user mode application like SQL Server. 

The most common structure we look at when debugging is the stack trace. A stack trace, as we’ll see in an example below, is a list of all the outstanding function calls made on this thread. A stack is a LIFO structure – meaning the last thing to come in is the first thing to come back out. Commonly this is referred to as “pushing” and “popping” the stack. If we have 3 functions: Func01, Func02, and Func03, and if Func01 calls Func02 which returns and then Func01 calls Func03, we’d have the following stack trace:

MyModule!Func03 <—currently executing code for this thread
MyModule!Func01

The stack is read from the bottom up. The “call” on the top is the currently executing code. When we see real stack traces below, you’ll see a Child SP (Stack Pointer) and a RetAddr (Return Address) which is how the system tracks execution and returns to the calling code – and partly how the debugger constructs the stack trace for us.

As a final step in our setup, we must map symbols. If we open Windbg, we have to set our symbol file path – the place where the debugger can find symbols. Symbolic Debugging Files – or symbols – help the debugger to map memory addresses and offsets into memory locations to their respective function and class/variable names. If we looked at a stack trace on a thread without symbols, we’d only see memory addresses like below:

Child-SP RetAddr Call Site
00000000`09cbe9e8 00000000`777b2f60 ntdll!NtSignalAndWaitForSingleObject+0xa
00000000`09cbe9f0 00000000`00bdc99e kernel32!SignalObjectAndWait+0x110
00000000`09cbeaa0 00000000`00bc4575 sqlservr+0x1c99e
00000000`09cbed40 00000000`00bc3ea8 sqlservr+0x4575
00000000`09cbed80 00000000`00bdcfad sqlservr+0x3ea8
00000000`09cbf370 00000000`01139d9c sqlservr+0x1cfad
00000000`09cbf430 00000000`032b34c7 sqlservr+0x579d9c
00000000`09cbf650 00000000`00bd2abb sqlservr!TlsGetValueForMsxmlSQL+0x4706d7
00000000`09cbf6c0 00000000`00bd0fda sqlservr+0x12abb
00000000`09cbf7e0 00000000`00bd2665 sqlservr+0x10fda
00000000`09cbf870 00000000`0117abb0 sqlservr+0x12665
00000000`09cbf8e0 00000000`0117c4b0 sqlservr+0x5babb0
00000000`09cbf9a0 00000000`0117a060 sqlservr+0x5bc4b0
00000000`09cbf9d0 00000000`0117a9ef sqlservr+0x5ba060
00000000`09cbfa60 00000000`734937d7 sqlservr+0x5ba9ef
00000000`09cbfaf0 00000000`73493894 MSVCR80!endthreadex+0x47
00000000`09cbfb20 00000000`7775f56d MSVCR80!endthreadex+0x104
00000000`09cbfb50 00000000`77893281 kernel32!BaseThreadInitThunk+0xd
00000000`09cbfb80 00000000`00000000 ntdll!RtlUserThreadStart+0x21

This shows us our module, sqlservr.exe, is executing code at an offset into it’s address space. However, we don’t know what this function is – that is where symbols help us. With symbols mapped, we get something much more meaningful:

Child-SP RetAddr Call Site
00000000`09cbe9e8 00000000`777b2f60 ntdll!NtSignalAndWaitForSingleObject+0xa
00000000`09cbe9f0 00000000`00bdc99e kernel32!SignalObjectAndWait+0x110
00000000`09cbeaa0 00000000`00bc4575 sqlservr!SOS_Scheduler::SwitchContext+0x84e
00000000`09cbed40 00000000`00bc3ea8 sqlservr!SOS_Scheduler::SuspendNonPreemptive+0xc5
00000000`09cbed80 00000000`00bdcfad sqlservr!EventInternal<Spinlock<149,1,0> >::Wait+0x428
00000000`09cbf370 00000000`01139d9c sqlservr!ResQueueBase::Dequeue+0x19d
00000000`09cbf430 00000000`032b34c7 sqlservr!CheckpointLoop+0x1aa
00000000`09cbf650 00000000`00bd2abb sqlservr!ckptproc+0x47
00000000`09cbf6c0 00000000`00bd0fda sqlservr!SOS_Task::Param::Execute+0x11b
00000000`09cbf7e0 00000000`00bd2665 sqlservr!SOS_Scheduler::RunTask+0xca
00000000`09cbf870 00000000`0117abb0 sqlservr!SOS_Scheduler::ProcessTasks+0x95
00000000`09cbf8e0 00000000`0117c4b0 sqlservr!SchedulerManager::WorkerEntryPoint+0x110
00000000`09cbf9a0 00000000`0117a060 sqlservr!SystemThread::RunWorker+0x60
00000000`09cbf9d0 00000000`0117a9ef sqlservr!SystemThreadDispatcher::ProcessWorker+0x12c
00000000`09cbfa60 00000000`734937d7 sqlservr!SchedulerManager::ThreadEntryPoint+0x12f
00000000`09cbfaf0 00000000`73493894 MSVCR80!endthreadex+0x47
00000000`09cbfb20 00000000`7775f56d MSVCR80!endthreadex+0x104
00000000`09cbfb50 00000000`77893281 kernel32!BaseThreadInitThunk+0xd
00000000`09cbfb80 00000000`00000000 ntdll!RtlUserThreadStart+0x21

Now we have something more meaningful… From the function names we can pretty easily see that we have found the Checkpoint thread which you can usually see from the following DMV:

select * from sys.dm_exec_requests

It will be the worker thread waiting on CHECKPOINT_QUEUE with the CHECKPOINT command. As you may have already figured out, the calling convention on the stack trace is below:

<module_name>!<function call>

or

<module_name>!<class_name>::<method/function call>

Now we have to setup our symbol file path so that they can be downloaded as the debugger needs them (or downloaded ahead of time if you choose to do so). In the debugger (Windbg), you can set the symbol path from the file menu here:

image

Or, once you have opened a dump or connected to a live target, you can use the following command:

.sympath

The debuggers will also honor an environment variable named _NT_SYMBOL_PATH. Some tools, such as process explorer, have the ability to map symbols for troubleshooting purposes and will look for this path.

Applications can have both public and private symbols. Both serve the same purpose, but the private symbols have much more information for private members and source code location. Private symbols for SQL Server are only available internally within Microsoft for support personnel. Externally, you will use public symbols as will this blog.

The easiest thing to do here is to set your symbol path to the externally available public symbol store. You can add a local cache directory for these symbols with the following syntax:

srv*c:\symbols\public* https://msdl.microsoft.com/download/symbols

The underlined, bolded part sets the local cache directory. The https:// URL is the public symbol store that symbols will be downloaded from. To set this with the .sympath command, just type:

.sympath srv*c:\symbols\public*https://msdl.microsoft.com/download/symbols

You can add additional symbol file locations by separating paths with a semi-colon (;). Then you can use the .reload command to force the loading of all or a specific symbol. Here we force the reloading of the main SQL Server executable, sqlservr.exe:

.reload /f sqlservr.exe

The debugger is now setup and if you haven’t already attached to a SQL Server or opened a dump file, you can do so from the File menu seen above with the “Attach to a Process” or “Open Crash Dump” commands. 

In a future post, we’ll do some basic commands with the debugger to look around inside SQL Server.

-Jay