Getting to the Root Cause of SqlExceptions

It's pretty common for me to work cases where something bad happens when an ASP.NET application is accessing a database. The nature of my job dictates that troubleshooting on these applications is most often performed in a production environment via post mortem debugging. Fortunately, our toolset for doing that kind of work is really great.

If you aren't familiar with how to load up a dump in Windbg, set up symbols, etc., check out my post on that topic.

Finding out exactly what errors occurred when hitting a database is easy to do, but it's time consuming. Let's have a look at a particular SqlException that occurred when accessing a database.

0:000> !do 106fd670
Name: System.Data.SqlClient.SqlException
MethodTable 0x0343232c
EEClass 0x034ffef4
Size 68(0x44) bytes
GC Generation: 0
mdToken: 0x020001c4 (c:\windows\assembly\gac\system.data\1.0.5000.0__b77a5c561934e089\system.data.dll)
FieldDesc*: 0x03432228
MT Field Offset Type Attr Value Name
0x79b96824 0x400001d 0x4 CLASS instance 0x00000000 _className
0x79b96824 0x400001e 0x8 CLASS instance 0x00000000 _exceptionMethod
0x79b96824 0x400001f 0xc CLASS instance 0x00000000 _exceptionMethodString
0x79b96824 0x4000020 0x10 CLASS instance 0x1c2ef88c _message
0x79b96824 0x4000021 0x14 CLASS instance 0x00000000 _innerException
0x79b96824 0x4000022 0x18 CLASS instance 0x00000000 _helpURL
0x79b96824 0x4000023 0x1c CLASS instance 0x106fd8c8 _stackTrace
0x79b96824 0x4000024 0x20 CLASS instance 0x00000000 _stackTraceString
0x79b96824 0x4000025 0x24 CLASS instance 0x00000000 _remoteStackTraceString
0x79b96824 0x4000026 0x2c System.Int32 instance 0 _remoteStackIndex
0x79b96824 0x4000027 0x30 System.Int32 instance -2146232060 _HResult
0x79b96824 0x4000028 0x28 CLASS instance 0x00000000 _source
0x79b96824 0x4000029 0x34 System.Int32 instance 0 _xptrs
0x79b96824 0x400002a 0x38 System.Int32 instance -532459699 _xcode
0x0343232c 0x4000fb6 0x3c CLASS instance 0x106fd6b4 _errors

If you look at the _message field, you'll see this:

0:000> !do 0x1c2ef88c
Name: System.String
MethodTable 0x79b94638
EEClass 0x79b94984
Size 44(0x2c) bytes
GC Generation: 2
mdToken: 0x0200000f (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
String: System error.

This is a generic error that tells you absolutely nothing. Essentially, it's telling you that there was an error. The key to finding out what that error actually was is the _errors field in the exception. (It's the last field listed in the exception above.) The _errors field is a SqlErrorsCollection as shown here:

0:000> !do 0x106fd6b4
Name: System.Data.SqlClient.SqlErrorCollection
MethodTable 0x0343249c
EEClass 0x034fff58
Size 12(0xc) bytes
GC Generation: 0
mdToken: 0x020001c3 (c:\windows\assembly\gac\system.data\1.0.5000.0__b77a5c561934e089\system.data.dll)
FieldDesc*: 0x034323bc
MT Field Offset Type Attr Value Name
0x0343249c 0x4000fb5 0x4 CLASS instance 0x106fd6c0 errors

The errors field here is an ArrayList of SqlError objects. Let's have a look at that.

0:000> !do 0x106fd6c0
Name: System.Collections.ArrayList
MethodTable 0x79ba2ee4
EEClass 0x79ba3020
Size 24(0x18) bytes
GC Generation: 0
mdToken: 0x02000100 (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79ba3084
MT Field Offset Type Attr Value Name
0x79ba2ee4 0x4000362 0x4 CLASS instance 0x106fd6d8 _items
0x79ba2ee4 0x4000363 0xc System.Int32 instance 1 _size
0x79ba2ee4 0x4000364 0x10 System.Int32 instance 1 _version
0x79ba2ee4 0x4000365 0x8 CLASS instance 0x00000000 _syncRoot

Now, to get to the actual errors in that ArrayList, you need to dump the _items field, and to do that, you need to use the -v switch as follows:

0:000> !do -v 0x106fd6d8
Name: System.Object[]
MethodTable 0x020a209c
EEClass 0x020a2018
Size 80(0x50) bytes
GC Generation: 0
Array: Rank 1, Type CLASS
Element Type: System.Object
Content: 16 items
------ Will only dump out valid managed objects ----
Address MT Class Name
0x106fd780 0x0343263c System.Data.SqlClient.SqlError
----------

Alright, now we're getting somewhere. We have the specific SqlError instance that is associated with the SqlException that occurred. Let's dump that.

0:000> !do 0x106fd780
Name: System.Data.SqlClient.SqlError
MethodTable 0x0343263c
EEClass 0x03710010
Size 36(0x24) bytes
GC Generation: 0
mdToken: 0x020001c2 (c:\windows\assembly\gac\system.data\1.0.5000.0__b77a5c561934e089\system.data.dll)
FieldDesc*: 0x03432520
MT Field Offset Type Attr Value Name
0x0343263c 0x4000fad 0x4 CLASS instance 0x182f7d34 source
0x0343263c 0x4000fae 0x14 System.Int32 instance -2 number
0x0343263c 0x4000faf 0x1c System.Byte instance 0 state
0x0343263c 0x4000fb0 0x1d System.Byte instance 10 errorClass
0x0343263c 0x4000fb1 0x8 CLASS instance 0x1c2eb298 server
0x0343263c 0x4000fb2 0xc CLASS instance 0x14384218 message
0x0343263c 0x4000fb3 0x10 CLASS instance 0x106fd73c procedure
0x0343263c 0x4000fb4 0x18 System.Int32 instance 0 lineNumber

Now we have something we can use. We have the server, the error message, and the procedure. If we look at the error message, here's what we see:

0:000> !do 0x14384218
Name: System.String
MethodTable 0x79b94638
EEClass 0x79b94984
Size 248(0xf8) bytes
GC Generation: 2
mdToken: 0x0200000f (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
String: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

That's a heck of a lot more useful than "System error," but it was a lot of work! You can save yourself some effort by scripting this. Each dump of an object contains an offset for each field. The field exists at that offset. In the SqlError above, the message field is at an offset of 0xc.

0x0343263c 0x4000fb2 0xc CLASS instance 0x14384218 message

Therefore, you can get the message of this exception by dumping the address located at an offset of 0xc from the address of the SqlError itself. You can do that like this:

0:000> !do poi(0x106fd780+c)
Name: System.String
MethodTable 0x79b94638
EEClass 0x79b94984
Size 248(0xf8) bytes
GC Generation: 2
mdToken: 0x0200000f (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
String: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Look familiar? This is an important concept because it means that you can take the original SqlException object and follow the trail all the way to the actual error message.

What if you want to just dump out details of all SQL errors that are on the heap? That's easy to, and the easiest way to do it is to write a script that you can then run inside of Windbg. What appears below is just such a script:

.logopen ${$CurrentDumpPath}\SqlErrors.txt
.printf "Dumping all SqlError objects on the heap..."
.printf "\n\n"

.foreach (se {!dumpheap -short -mt 0x0343263c})
{

.echo "=============================================================="
.echo "SqlError : ${se}"
.printf "Server Instance: "
du poi(${se}+8)+c
.printf "Procedure: "
du poi(${se}+10)+c
.printf "Error Message: "
du poi(${se}+c)+c
.printf "\n\n"

}
.logclose

The first thing this script does is open a log file in the location where the dump file exists. That way, you'll automatically have a text file containing all of the errors. It then uses a .foreach token to loop through all of the System.Data.SqlClient.SqlError objects. (In other words, all objects in MethodTable 0x0343263c.) For each one of those suckers, it dumps out the different fields using the offsets we looked at before.

All of these fields are text values, and text in the Framework is represented using Unicode. That's why I use the du command (dump Unicode) to dump that out. You'll also notice that each one is dumping out the address+c. Unicode strings are at an offset of 12 from the address of the string, so you have to dump out the string starting at address+c.

If you save the above into a text file and then copy it into the same folder where the debuggers are installed, you can then run it from within Windbg like this:

$><dumpsqlerrors.txt

That will give you output similar to this:

0:000> $><dumpsqlerrors.txt
Opened log file 'C:\Users\jamesche\Desktop\SqlErrors.txt'
Dumping all SqlError objects on the heap...

==============================================================
SqlError : 0x106fd780
Server Instance: 1c2eb2a4 "DBASESVR"
Procedure: 106fd748 "ConnectionRead (recv())."
Error Message: 14384224 "Timeout expired. The timeout pe"
14384264 "riod elapsed prior to completion"
143842a4 " of the operation or the server "
143842e4 "is not responding."

==============================================================
SqlError : 0x10721d38
Server Instance: 1c2eb2a4 "DBASESVR"
Procedure: 10721d00 "ConnectionRead (recv())."
Error Message: 14384224 "Timeout expired. The timeout pe"
14384264 "riod elapsed prior to completion"
143842a4 " of the operation or the server "
143842e4 "is not responding."

==============================================================
SqlError : 0x185266a8
Server Instance: 1c2eb2a4 "DBASESVR"
Procedure: 18526670 "ConnectionRead (recv())."
Error Message: 14384224 "Timeout expired. The timeout pe"
14384264 "riod elapsed prior to completion"
143842a4 " of the operation or the server "
143842e4 "is not responding."

==============================================================
SqlError : 0x18549968
Server Instance: 18549948 "SQL-01"
Procedure: 1c2d0230 ""
Error Message: 185498d0 "Changed language setting to us_e"
18549910 "nglish."

==============================================================
SqlError : 0x1c42e894
Server Instance: 1c2eb2a4 "DBASESVR"
Procedure: 1c42e85c "ConnectionRead (recv())."
Error Message: 14384224 "Timeout expired. The timeout pe"
14384264 "riod elapsed prior to completion"
143842a4 " of the operation or the server "
143842e4 "is not responding."

==============================================================
SqlError : 0x1c47faa8
Server Instance: 1c2eb2a4 "DBASESVR"
Procedure: 1c47fa70 "ConnectionRead (recv())."
Error Message: 14384224 "Timeout expired. The timeout pe"
14384264 "riod elapsed prior to completion"
143842a4 " of the operation or the server "
143842e4 "is not responding."

Closing open log file C:\Users\jamesche\Desktop\DumpSqlErrors.txt

The layout of the SqlError object isn't static. I wrote this script for ASP.NET 1.1 SP1.