A Hang Scenario, Locks and Critical Sections



The situation here is that the server at times slows down considerably, or even hangs completely. Requests start timing out and no new requests seem to be getting through. The scenario below is a bit simplified but it shows the technique to find owners, waiters and locations for locks.

A quick look with ~* kb (listing the native stack for all threads) shows us that most threads are sitting in this type of stack…

  30  Id: d78.c68 Suspend: 1 Teb: 7ff5a000 Unfrozen
ChildEBP RetAddr Args to Child
0c6becb4 7c822114 77e6711b 00000001 0c6bed04 ntdll!KiFastSystemCallRet
0c6becb8 77e6711b 00000001 0c6bed04 00000000 ntdll!NtWaitForMultipleObjects+0xc
0c6bed60 791e0b3b 00000001 00134f7c 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0c6bed90 791e0bdd 00000001 00134f7c 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0c6bede4 791fccfe 00000001 00134f7c 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0c6bee68 791fcc17 01fcc738 ffffffff 00000000 mscorsvr!AwareLock::EnterEpilog+0x9d
0c6bee84 791fd43e 0c6bef7c 064aecf4 064ad210 mscorsvr!AwareLock::Enter+0x78
0c6bef18 0c1c0744 064ad210 064aecf4 0c6bef7c mscorsvr!JITutil_MonContention+0x124

WARNING: Frame IP not in any known module. Following frames may be wrong.
0c6bef40 0c1c0697 064aeca0 064ae094 064ae094 0xc1c0744
0c6bef84 0c525abd 064aeca0 064ad210 064ae094 0xc1c0697
0c6bf0cc 799a67a2 064ac400 064ac704 799b4168 0xc525abd
0c6bf0d8 799b4168 00000002 00000000 0210177c mscorlib_79990000+0x167a2
0c6bf160 79217188 00000000 00da9a60 061bc490 mscorlib_79990000+0x24168
0c6bf1d4 7923c206 01eb2f60 0c6bf2dc 0015d990 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
0c6bf214 791cc3c8 0c6bf284 791b3d64 0c6bf300 mscorsvr!ComCallMLStubCache::CompileMLStub+0x1af
0c6bf25c 7923c1c3 0015d990 7923c1ce 0c6bf284 mscorsvr!Thread::DoADCallBack+0x5c
0c6bf2c4 01afa0bd 01fcc738 0c6bf300 0c6bffdc mscorsvr!ComCallMLStubCache::CompileMLStub+0x2ba
0c6bf3b8 791cc310 000de250 00000000 791cc32b 0x1afa0bd
0c6bf41c 791b3e6f 791cc131 0010f14c 00000000 mscorsvr!Thread::SafeSetStaticData+0x63
0c6bf4f8 7c822124 77e6baa8 000002e4 00000000 mscorsvr!Frame::Pop+0x9

So from managed code at the bottom of the stack, we go into something called JITutil_MonContention which then goes into AwareLock::Enter. This means that the thread is sitting in some type of synchronization statement, likely a lock(){} waiting for that lock to be free so we can enter the critical section.

There are two questions to answer here


  1. Where are we locking?
  2. Who owns the lock?

For the answer to the second question, “who owns the lock?” we can turn to !syncblk. !syncblk without any parameters will list SyncBlocks corresponding to objects owned by active threads.

0:018> !syncblk
Index SyncBlock MonitorHeld Recursion Thread ThreadID Object Waiting
987 0x00134f68 19 1 0x10eb28 0xcc8 17 0x022841fc System.Object
Waiting threads: 18 19 23 24 25 26 27 29 30
—————————–
Total 1039
ComCallWrapper 4
ComPlusWrapper 1
ComClassFactory 0
Free 12

MonitorHeld shows 19. The owner has 1 and then each waiter has 2, so this means we have one owner and 9 waiters. We are synchronizing on a System.Object (0x022841fc) and the owning thread is thread 17.

The waiting threads are 18, 19, 23, 24…

So thread 18 for example… looks like this

0:018> !clrstack
Thread 18
ESP EIP
0x0c3ef490 0x7c82ed54 [FRAME: GCFrame]
0x0c3ef54c 0x7c82ed54 [FRAME: HelperMethodFrame]
0x0c3ef5a0 0x0c1c0744 [DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
0x0c3ef5c8 0x0c1c0697 [DEFAULT] [hasThis] Void SuperSlowWebApp.MyPage.btnUpdateCache_Click(Object,Class System.EventArgs)
0x0c3ef610 0x0c525abd [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.OnClick(Class System.EventArgs)
0x0c3ef624 0x0c52589a [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String)
0x0c3ef634 0x0c52584b [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Web.UI.IPostBackEventHandler,String)
0x0c3ef63c 0x0c5257aa [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Collections.Specialized.NameValueCollection)
0x0c3ef64c 0x0c4c6ead [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0c3ef690 0x0c4c5937 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0c3ef6cc 0x0c4c53a3 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0c3ef6d4 0x0c4c537c [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0c3ef6e4 0x0c1da288 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0c3ef72c 0x0c1d9322 [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0c3ef774 0x0c1d91eb [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0c3ef790 0x01eb6897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0c3ef7cc 0x01eb6448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x0c3ef7d8 0x01eb2fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x0c3ef8a0 0x79217188 [FRAME: ContextTransitionFrame]
0x0c3ef980 0x79217188 [FRAME: ComMethodFrame]

Waiting in CacheUpdater.Update(), and if we disassemble we can see this…

0:018> !u 0x0c1c0744
Will print ‘>>> ‘ at address: 0x0c1c0744
Normal JIT generated code
[DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
Begin 0x0c1c0718, size 0x68
0c1c0718 55 push ebp
0c1c0719 8bec mov ebp,esp
0c1c071b 83ec14 sub esp,0x14
0c1c071e 57 push edi
0c1c071f 56 push esi
0c1c0720 53 push ebx
0c1c0721 33c0 xor eax,eax
0c1c0723 8945ec mov [ebp-0x14],eax
0c1c0726 c745f800000000 mov dword ptr [ebp-0x8],0x0
0c1c072d c745ec00000000 mov dword ptr [ebp-0x14],0x0
0c1c0734 a12c66170a mov eax,[0a17662c] (Object: System.Object)
0c1c0739 8945ec mov [ebp-0x14],eax
0c1c073c 8b4dec mov ecx,[ebp-0x14]
0c1c073f e8e712ff6c call mscorsvr!JIT_MonEnter (791b1a2b)
>>> 0c1c0744 b988130000 mov ecx,0x1388
0c1c0749 e85da79f6d call mscorlib_79990000+0x22aeab (79bbaeab) (System.Threading.Thread.Sleep)
0c1c074e 90 nop
0c1c074f c745f400000000 mov dword ptr [ebp-0xc],0x0
0c1c0756 c745f8fc000000 mov dword ptr [ebp-0x8],0xfc
0c1c075d 686f071c0c push 0xc1c076f
0c1c0762 eb00 jmp 0c1c0764
0c1c0764 8b4dec mov ecx,[ebp-0x14]
0c1c0767 e8e114ff6c call mscorsvr!JIT_MonExit (791b1c4d)
0c1c076c 58 pop eax
0c1c076d ffe0 jmp eax
0c1c076f c745f800000000 mov dword ptr [ebp-0x8],0x0
0c1c0776 eb00 jmp 0c1c0778
0c1c0778 90 nop
0c1c0779 5b pop ebx
0c1c077a 5e pop esi
0c1c077b 5f pop edi
0c1c077c 8be5 mov esp,ebp
0c1c077e 5d pop ebp
0c1c077f c3 ret

So we are sitting just before a call to Sleep(0x1388) or Sleep(5000) in a JIT_MonEnter call

In fact, the function looks like this… and we are stuck on the bolded line

		public static void Update()
{
lock(_syncRoot)
{
System.Threading.Thread.Sleep(5000);
}
}

And thread 17 (the lock owner) looks like this

0:017> !clrstack
Thread 17
ESP EIP
0x009ff610 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x009ff620 0x0c1c074e [DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
0x009ff648 0x0c1c0697 [DEFAULT] [hasThis] Void SuperSlowWebApp.MyPage.btnUpdateCache_Click(Object,Class System.EventArgs)
0x009ff690 0x0c525abd [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.OnClick(Class System.EventArgs)
0x009ff6a4 0x0c52589a [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String)
0x009ff6b4 0x0c52584b [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Web.UI.IPostBackEventHandler,String)
0x009ff6bc 0x0c5257aa [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Collections.Specialized.NameValueCollection)
0x009ff6cc 0x0c4c6ead [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x009ff710 0x0c4c5937 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x009ff74c 0x0c4c53a3 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x009ff754 0x0c4c537c [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x009ff764 0x0c1da288 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x009ff7ac 0x0c1d9322 [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x009ff7f4 0x0c1d91eb [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x009ff810 0x01eb6897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x009ff84c 0x01eb6448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x009ff858 0x01eb2fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x009ff920 0x79217188 [FRAME: ContextTransitionFrame]
0x009ffa00 0x79217188 [FRAME: ComMethodFrame]

So in summary all the requests are waiting for thread 17 to come out of its sleep and release the lock

If we grab the System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6 from !dso (dumpstackobjects)

0:017> !dso
Thread 17
ESP/REG Object Name

0x009ff820 0x024961f4 System.Web.HttpContext
0x009ff824 0x02495ed8 System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
0x009ff828 0x061c3460 System.Web.HttpRuntime
0x009ff84c 0x061ab370 System.String c:\inetpub\wwwroot\SuperSlowWebApp\
0x009ff850 0x02495ff0 System.String c:\inetpub\wwwroot\SuperSlowWebApp\
0x009ff870 0x061bc490 System.Web.Hosting.ISAPIRuntime
0x009ff934 0x021a0770 System.Runtime.Remoting.Messaging.LogicalCallContext

And look at the _startTime …

0:017> !do 0x02495ed8
Name: System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
MethodTable 0x02061fec
EEClass 0x01ee8460
Size 200(0xc8) bytes
GC Generation: 0
mdToken: 0x02000115 (c:\windows\assembly\gac\system.web\1.0.5000.0__b03f5f7f11d50a3a\system.web.dll)
FieldDesc*: 0x02061e78
MT Field Offset Type Attr Value Name
0x01b28a80 0x4000798 0x4 VALUETYPE instance start at 02495edc _startTime
0x01b28a80 0x4000799 0 CLASS shared static s_HTTPStatusDescriptions
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c0e38 <<
0x01b28a80 0x400079a 0x4 CLASS shared static s_requestHeaderNames
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c0f80 <<
0x01b28a80 0x400079b 0x8 CLASS shared static s_responseHeaderNames
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c1030 <<
0x01b28a80 0x400079c 0xc CLASS shared static s_requestHeadersLoookupTable
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c10b8 <<
0x01b28a80 0x400079d 0x10 CLASS shared static s_responseHeadersLoookupTable
>> Domain:Value 0x000de250:NotInit 0x0010ee48:0x061c1194 <<
0x01b2938c 0x400096f 0x5c System.Int32 instance 14358560 _ecb
0x01b2938c 0x4000970 0x60 System.Int32 instance 0 _token

…we can use !convertvtdatetodate to find out when the request started …

0:017> !convertvtdatetodate 02495edc
01/09/2006 11:38:35

… and compare this to the current time given by .time

0:017> .time
Debug session time: Mon Jan 9 12:40:15.000 2006 (GMT+1)
System Uptime: 2 days 11:36:47.637
Process Uptime: 0 days 0:19:48.000
Kernel time: 0 days 0:00:01.000
User time: 0 days 0:00:00.000

(Note: the time for the request is one hour off since it is given in GMT time, and I am located in the GMT+1 timezone)

So we can see that this request has been executing for 1 minute and 40 seconds so far (11:38:35 -> 11:40:15), probably because it has been blocked waiting to get the lock in the first place.

You can use this method to find out how long any request has been executing in the same way, and if there is not an ISAPIWorkerRequestInProcForIIS6 on the stack, you can use the HttpContext._request._wr to get to it (by using !dumpobj)

As a sidenote, when you use .time in a live debug session it will give you the time now, even if you stopped the debugger a while back, so this technique for finding out how long a request has been executing works best in dumps.

A short note on Critical Sections

Similarly if you have threads stuck in a native synchronization block like ntdll!RtlEnterCriticalSection…

0:021> kb
ChildEBP RetAddr Args to Child
0210e7cc 7c822124 7c83970f 000009f8 00000000 ntdll!KiFastSystemCallRet
0210e7d0 7c83970f 000009f8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0210e80c 7c839620 00000000 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x19c
0210e82c 4bfa4c16 027c0714 0277a5b4 4bf727af ntdll!RtlEnterCriticalSection+0xa8
0210e838 4bf727af 0277a5b4 0277a3a0 0210e868 odbc32!CCriticalSection::Enter+0xf
0210e848 4bf79ae6 0277a5b4 021c2d60 0210e9e8 odbc32!ODBCEnterCriticalSection+0xf
0210e868 4bf923bb 0277a3a0 00000001 000003ee odbc32!FreeStmt+0xe4
0210e884 026f0a35 00000003 4bfa0000 0210e8dc odbc32!SQLFreeHandle+0x1f3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0210e928 791b33e4 00000000 0260aa80 0210e948 0x26f0a35
0210e938 03720601 0210e960 791b3208 0210eb14 mscorsvr!EEClass::GetFixedUpSlot+0x1d
0210e948 791f4512 0210e994 00000000 0210e96c 0x3720601
0210eb14 021824ee 0210eb20 10311208 00000001 mscorsvr!COMMember::InvokeMethod+0x95a
0210eb64 799dd5a2 00000001 00000000 00000000 0x21824ee
0210eba8 799dd3e0 00000001 00000000 1525e724 mscorlib_79990000+0x4d5a2
00000000 00000000 00000000 00000000 00000000 mscorlib_79990000+0x4d3e0

…you can use the first parameter to the RtlEnterCriticalSection function (0x027c0714) and pass it to the !critsec function to find out who owns the critical section

0:021> !critsec 027c0714 

CritSec +27c0714 at 027C0714
WaiterWoken No
LockCount 1
RecursionCount 1
OwningThread 1490
EntryCount 0
ContentionCount 4
*** Locked


In this case the owning thread is 1490 and you can find the logical thread number by using

0:021> ~
# 0 Id: 1748.558 Suspend: 1 Teb: bffdf000 Unfrozen
1 Id: 1748.15d4 Suspend: 1 Teb: bffdd000 Unfrozen
2 Id: 1748.1628 Suspend: 1 Teb: bffdc000 Unfrozen
3 Id: 1748.1524 Suspend: 1 Teb: bffd8000 Unfrozen
4 Id: 1748.17d0 Suspend: 1 Teb: bffd7000 Unfrozen
5 Id: 1748.12b4 Suspend: 1 Teb: bffd6000 Unfrozen
6 Id: 1748.17e8 Suspend: 1 Teb: bffd5000 Unfrozen
7 Id: 1748.1220 Suspend: 1 Teb: bffd4000 Unfrozen
8 Id: 1748.1148 Suspend: 1 Teb: bffaf000 Unfrozen
9 Id: 1748.16a8 Suspend: 1 Teb: bffae000 Unfrozen
10 Id: 1748.12e0 Suspend: 1 Teb: bffad000 Unfrozen
11 Id: 1748.1278 Suspend: 1 Teb: bffac000 Unfrozen
12 Id: 1748.1700 Suspend: 1 Teb: bffab000 Unfrozen
13 Id: 1748.9f0 Suspend: 1 Teb: bffa8000 Unfrozen
14 Id: 1748.1690 Suspend: 1 Teb: bffa7000 Unfrozen
15 Id: 1748.11f8 Suspend: 1 Teb: bffa6000 Unfrozen
16 Id: 1748.2c0 Suspend: 1 Teb: bffa5000 Unfrozen
17 Id: 1748.134c Suspend: 1 Teb: bffa4000 Unfrozen
18 Id: 1748.1680 Suspend: 1 Teb: bffa3000 Unfrozen
19 Id: 1748.a7c Suspend: 1 Teb: bffa2000 Unfrozen
20 Id: 1748.1490 Suspend: 1 Teb: bff9e000 Unfrozen
. 21 Id: 1748.ab0 Suspend: 1 Teb: bff9d000 Unfrozen
22 Id: 1748.c70 Suspend: 1 Teb: bff9c000 Unfrozen
23 Id: 1748.170c Suspend: 1 Teb: bff9b000 Unfrozen
24 Id: 1748.1538 Suspend: 1 Teb: bff4a000 Unfrozen
25 Id: 1748.1470 Suspend: 1 Teb: bff47000 Unfrozen
26 Id: 1748.3c8 Suspend: 1 Teb: bff43000 Unfrozen
27 Id: 1748.ce8 Suspend: 1 Teb: bffa9000 Unfrozen
28 Id: 1748.16c4 Suspend: 1 Teb: bff99000 Unfrozen
29 Id: 1748.11e4 Suspend: 1 Teb: bff9a000 Unfrozen
30 Id: 1748.131c Suspend: 1 Teb: bffaa000 Unfrozen
31 Id: 1748.12ac Suspend: 1 Teb: bffdb000 Unfrozen
32 Id: 1748.bfc Suspend: 1 Teb: bffda000 Unfrozen
33 Id: 1748.1238 Suspend: 1 Teb: bffde000 Unfrozen

Or go to it directly with ~~[1490]s

Normally this thread will be executing some code inside the critical section protected by this critsec, but in this particular case, thread 20/1490 is a regular workerthread waiting for work

0:020> kb 2000
ChildEBP RetAddr Args to Child
020bfaf8 7c822124 77e6baa8 00000250 00000000 ntdll!KiFastSystemCallRet
020bfafc 77e6baa8 00000250 00000000 020bfb40 ntdll!NtWaitForSingleObject+0xc
020bfb6c 77e6ba12 00000250 00009c40 00000000 kernel32!WaitForSingleObjectEx+0xac
020bfb80 791d401f 00000250 00009c40 00000000 kernel32!WaitForSingleObject+0x12
020bfba4 791fdacc 00000000 79172262 79172270 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x3a
020bffb8 77e66063 0012c938 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
020bffec 00000000 791fda8b 0012c938 00000000 kernel32!BaseThreadStart+0x34

Since there is nothing on this thread that should be using a critical section in odbc32, most likely the critical section is either a) transitioning/moving to a different thread or b) orphaned, which means that it is lost forever because the thread that owned it got an access violation or similar while owning the critical section.

In a case like this there are two avenues. The first is to take two dumps after eachother (with about 1-2 minutes in between) to see if the critical section is transitioning, if it is, we should see that the first thread is no longer waiting for the critical section in dump #2. The second avenue is to take crash dumps instead of hang dumps so that we record any access violations or similar that occur prior to the hang and can find out what caused us to orphan the critical section.

Just so you don’t lay sleepless over this:) I will tell you the cause of this particular orphaning… the dump itself was about 1.4 GB so the process had suffered a number of OutOfMemoryExceptions leaving threads in unstable states like this…

Happy hunting!  

 








Comments (19)

  1. Mahesh S says:

    Really cool. I did not know that locks could just get gobbled up like this (There is more to it then meets the eye). The resolution was also cool( who would have have figured that locks get transitioned due of out of memory exceptions) , I am a developer who forgets all about the app by the time it is in production :D.. usually the focus for me has been on things like patterns practices…architechture etc…I find this blog really helpful in knowing the inner workings of .net and web applications in general..its like a world that i was so far unaware of :D

  2. I have talked about a number of different hang/performance issues in my posts. This post is a generic

  3. Faxedhead says:

    Thanks a lot Tess, this is a great article.  You should take a look at http://dotnetdebug.blogspot.com/2006/02/syncblk-in-sos-for-net-framework-20.html for a much easier way of determining which object a thread is trying to acquire (in fact, you are mentioned in that article).  No disassembly required.  If you could validate that method it would be fantastic.  Basically it involves doing ~~[threadid]kv and looking for the first arg of JIT_MonEnterWorker_Portable.  

  4. Tess says:

    Hi Faxedhead,

    Yeah, Erans method works great, thanks for pointing it out

    Tess

  5. A local radio show has started giving out cookie points to their interviewees for personal development,

  6. nufsaidnow says:

    If you get a sc-win32-status 121 "The semaphore timeout period has expired." in IIS Logs, is it possible to catch that specific exception using adplus in hang mode? I get it quite often (once a minute at least).

  7. Tess says:

    I dont think adplus is the right tool in that case,  that error indicates some type of network congestion if i recall correctly, so network monitor would be a better tool, but dont ask me what to look for because i’m extremely bad at troubleshooting network issues…

    This article might help though…

    http://support.microsoft.com/kb/898060

  8. nufsaidnow says:

    Thank you Tess!

    Networking is like a black hole to me, so I’ll forward this to our network pro’s.

    /Anders

    PS

    I’ve gotten hooked on debugging now, so I’ll blame you when my employer wonders why there is no code produced… ;)

  9. Tess says:

    LOL:)  Thats awesome…

  10. Recently we have seen quite a few issues where the problem turned out to be an orphaned critical section.&#160;

  11. In December I blogged about a little tool that i wrote to analyze hangs in dumps , and i showed the following

  12. In December I blogged about a little tool that i wrote to analyze hangs in dumps , and i showed the following

  13. Keith L says:

    Tess, I wanted to thank you for this most excellent set of materials on debugging Deadlocks. Using the tricks you taught in these pages, I was able to debug a deadlock issue that had been plaguing my production website for months.

    It’s a "slow leak" that took days to repro and I was finally able to trace it down to a bug in the MySql.Data.DLL that I was using to access the Database. Turns out this was a hot issue that had been plaguing many other people and websites for months:

    http://bugs.mysql.com/bug.php?id=29289

    I wanted to say I would still be clueless if I hadn’t learned these nifty tricks from your pages.

    Thanks a bundle,

    - Keith

  14. Tess says:

    Excellent Keith,  I’m very happy that it helped:)

  15. Mark says:

    Have you seen Debug Inspector at http://www.debuginspector.com. It works on managed and unmanaged code and resolves deadlocks where it can.

  16. Kevin says:

    What does it mean if the owning thread is 00000000?

    Example:

    0:000> !syncblk

    Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner

      62 001ffa3c           36         0 00000000     none    0bc63970 System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.Reflection.RuntimeMethodInfo, mscorlib]]

    Waiting threads: 22 35 36 39 44 45 46 47 50 51 52 53 54 58 59 61 62 63

    —————————–

    Total           89

    CCW             2

    RCW             0

    ComClassFactory 0

    Free            22

  17. Tess says:

    Kevin,

    its a bit unusual to catch it like that but it means one of two things.

    1. the lock is in the middle of transitioning (i.e. someone unlocked and someone else is going to grab the lock) or…

    2. rather than using lock, someone is using Monitor.enter and managed to grab the lock without exiting it and the thread exited while the monitor was still held.

    Look at what the waiters are doing and see if there is another thread in the middle of leaving the lock

    /Tess

  18. Xavier says:

    I have my Page.ProcessRequest hang on me (randomly), but it is not because of a code lock on something that we are doing in code.  I am doing a trace and the call to the db and returning of data takes 1 second or so.  But for some people it hangs for like 2 minutes (consistently 1:55)

    Do you have a tip on how I can debug this?