ASP.NET Performance Case Study: Web Service calls taking forever

This issue has been pretty frequent since .net came out, and I am bringing it up for two reasons.

  1. It is something I think every web developer calling web services should be aware of... and
  2. It is a good example of how the product changes based on feedback from support incidents etc. to better suit our customers

Problem Description

We have an ASP.NET application acting as a GUI to our middle tier (Web Service) and at peak times the application is dead slow. Requests start timing out but some do go through so it doesn't appear that we are completely deadlocked.

After the peak in volume the application seems to work fine again.

During this time CPU utilization is reasonably low in the GUI process, and it is very low in the Web Service process. So low in fact that it looks like hardly anything is running in the web service application.

Gathering information

In a multi-tier application of any kind it is always hard to determine where to grab memory dumps, and often times the tiers are on different machines so it's not as easy as just setting the debugger up to grab a dump from all processes at once.

Let's call the GUI server "Server A", and the backend server (web services server) "Server B".

If we take a hang dump of Server A we temporarily stall Server A (even if just for a few seconds) and thus requests to Server B will be drained so we can't go in and take a dump of Server B and expect all requests to still be alive. If on the other hand we take a hang dump of Server B, stalling it momentarily, Server A's request might time out because we are debugging Server B. So we have to focus on one of them at a time.

There are a number of possible reasons we could be "hanging" here.

  1. The web service is slow executing the requests, or possibly even deadlocked.
  2. The GUI app is slow in parts not related to the web service
  3. The GUI app is slow getting requests through to the web service
  4. ...

If we focus on Server A (GUI) we can at least see if we are waiting for a response from Server B, in which case, next time we can take dumps from Server B, and find out what we are waiting for. So a hang dump of the process on Server A, or preferably two, taken about a minute or so apart so we can see if the threads are moving or completely stalled, would be good. Note: if you are taking multiple dumps, make sure that the first dump is ready before you take the next one so you don't get two identical dumps.

Debugging the issue

Drum roll please!... Introducing a command I haven't shown before:) !threadpool

0:065> !threadpool
CPU utilization 7%
Worker Thread: Total: 37 Running: 37 Idle: 1 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 0
Number of Timers: 0
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

CPU Utilization is 7%. This is the complete CPU utilization on the box, so not specifically for the Asp.Net process, either way it is pretty low, just like it was mentioned in the problem description.

Currently there are no work requests in the queue and no timers waiting.

We have a maximum of 40 worker threads to execute our requests, and a minimum of 2 alive at any time (this comes from our machine.config settings). And currently we are executing on 37 of these, but it doesn't appear that the thread pool is exhausted. This jives well with that there are no work requests in the queue.

So what are all these threads doing?

Running ~* kb it appears that most of the threads are in some kind of wait. More specifically in a CorWaitOneNative...

  61  Id: 1164.a44 Suspend: 1 Teb: 7ff3a000 Unfrozen
ChildEBP RetAddr Args to Child
0d58f430 7c822114 77e6711b 00000001 0d58f480 ntdll!KiFastSystemCallRet
0d58f434 77e6711b 00000001 0d58f480 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d58f4dc 791e0b3b 00000001 0d58f5bc 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d58f50c 791e0bdd 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d58f560 791d5ba9 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d58f5a8 024840c8 0d58f5b4 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f
WARNING: Frame IP not in any known module. Following frames may be wrong.
0d58f614 0c9a5cb7 00000000 06c60b10 06c60b10 0x24840c8
0d58f644 0c9a00c2 06c60b10 0677c0c0 00000000 0xc9a5cb7
0d58f688 0c870721 06c60f9c 06c5fe04 06c60b10 0xc9a00c2
0d58f6ac 0c870532 06c60af0 06c5fe04 067592f0 0xc870721
0d58f6d0 0c952ea4 067592f0 06c5fe04 06c5f13c 0xc870532
0d58f800 799a67a2 06c5f1a4 06c5f4a8 799b4168 0xc952ea4
0d58f80c 799b4168 00000002 00000000 024a1710 mscorlib_79990000+0x167a2
0d58f890 791b3208 0d58f9a8 791b3ad7 0d58f8e4 mscorlib_79990000+0x24168
0d58f898 791b3ad7 0d58f8e4 00000000 0d58f8bc mscorsvr!CallDescrWorker+0x30
0d58f9a8 791d3ef0 00bf71bb 79b7c000 00000004 mscorsvr!MethodDesc::CallDescr+0x1b8
0d58fa64 791d3fa4 79bf71bb 79b7c000 79bd46de mscorsvr!MethodDesc::CallDescr+0x4f
0d58fa8c 791ebc5a 0d58faa4 00101ae0 0254c470 mscorsvr!MethodDesc::Call+0x97
0d58fab4 791cc3c8 02554bb0 791b3d64 ffffffff mscorsvr!RegisterWaitForSingleObjectCallback_Worker+0x139
0d58fafc 791ebb31 00101ae0 791ebbcc 02554bb0 mscorsvr!Thread::DoADCallBack+0x5c
0d58fb70 791d4096 02554bb0 02537ae8 793ecec8 mscorsvr!QueueUserWorkItemCallback+0xe3
0d58fb84 791d4055 02537ae8 00000000 791d3fe5 mscorsvr!ThreadpoolMgr::ExecuteWorkRequest+0x19
0d58fba4 791fdacc 00000000 79172262 79172270 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129
0d58ffb8 77e66063 0254c108 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
0d58ffec 00000000 791fda8b 0254c108 00000000 kernel32!BaseThreadStart+0x34

...this means that someone from managed code used WaitHandle for synchronization, and calls WaitOne().

Checking the managed bit of this thread we can see that we are in WebForm1.Page_Load, calling into a function SlowStuff.seventoo.Service.DoSomethingSlow(), which in turn invokes a web service call, and now sits waiting in a function called GetRequestStream().

0:061> !clrstack
Thread 61
0x0d58f5d8 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0d58f5ec 0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0d58f620 0x0c9a5cb7 [DEFAULT] [hasThis] Class System.IO.Stream System.Net.HttpWebRequest.GetRequestStream()
0x0d58f64c 0x0c9a00c2 [DEFAULT] [hasThis] SZArray Object System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String,SZArray Object)
0x0d58f694 0x0c870721 [DEFAULT] [hasThis] String SlowStuff.seventoo.Service.DoSomethingSlow()
0x0d58f6b4 0x0c870532 [DEFAULT] [hasThis] Void SlowStuff.WebForm1.Page_Load(Object,Class System.EventArgs)
0x0d58f6dc 0x0c952ea4 [DEFAULT] [hasThis] Void System.Web.UI.Control.OnLoad(Class System.EventArgs)
0x0d58f6ec 0x0c952de4 [DEFAULT] [hasThis] Void System.Web.UI.Control.LoadRecursive()
0x0d58f700 0x0c952097 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0d58f744 0x0c950eef [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0d58f780 0x0c95095b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0d58f788 0x0c950934 [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0d58f798 0x0c76de50 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0d58f7e0 0x0c76d8ba [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0d58f828 0x0c76d783 [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0d58f844 0x02446897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0d58f880 0x0cb43cf4 [DEFAULT] [hasThis] Void System.Web.RequestQueue.WorkItemCallback(Object)
0x0d58fac0 0x791b3208 [FRAME: ContextTransitionFrame]

If we run ~* e !clrstack to get the managed stack for all the threads we find that 33 requests are waiting like this in GetRequestStream and 2 requests are waiting in GetResponse.

0:042> kb 2000
ChildEBP RetAddr Args to Child
0d0cebc4 7c822114 77e6711b 00000001 0d0cec14 ntdll!KiFastSystemCallRet
0d0cebc8 77e6711b 00000001 0d0cec14 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d0cec70 791e0b3b 00000001 0d0ced50 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d0ceca0 791e0bdd 00000001 0d0ced50 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d0cecf4 791d5ba9 00000001 0d0ced50 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d0ced3c 024840c8 0d0ced48 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f
WARNING: Frame IP not in any known module. Following frames may be wrong.
0d0ceda8 0ca29fd5 00000000 02970580 02970580 0x24840c8
0d0ceddc 0ca29e49 02970580 02970580 0677c0c0 0xca29fd5
0d0cee0c 0ca29e06 0c9a0110 02970580 0677c0c0 0xca29e49
0d0cee54 0c870721 02970a0c 0296f868 02970580 0xca29e06
0d0cee78 0c870532 02970560 0296f868 067592f0 0xc870721
0d0cee9c 0c952ea4 067592f0 0296f868 0296c800 0xc870532
0d0cefcc 799a67a2 0296c868 0296cb6c 799b4168 0xc952ea4
0d0cefd8 799b4168 00000002 00000000 024a14c4 mscorlib_79990000+0x167a2
0d0cf060 79217188 00000000 00dc9160 0672a508 mscorlib_79990000+0x24168
02442f60 24548bca 2c15ff10 8b020b95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02442f60 00000000 2c15ff10 8b020b95 8bcb8bd8 0x24548bca

Thread 42
0x0d0ced6c 0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0d0ced80 0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0d0cedb4 0x0ca29fd5 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Net.HttpWebRequest.GetResponse()
0x0d0cede4 0x0ca29e49 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(Class System.Net.WebRequest)
0x0d0cee14 0x0ca29e06 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(Class System.Net.WebRequest)
0x0d0cee18 0x0c9a0110 [DEFAULT] [hasThis] SZArray Object System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String,SZArray Object)
0x0d0cee60 0x0c870721 [DEFAULT] [hasThis] String SlowStuff.seventoo.Service.DoSomethingSlow()
0x0d0cee80 0x0c870532 [DEFAULT] [hasThis] Void SlowStuff.WebForm1.Page_Load(Object,Class System.EventArgs)
0x0d0ceea8 0x0c952ea4 [DEFAULT] [hasThis] Void System.Web.UI.Control.OnLoad(Class System.EventArgs)
0x0d0ceeb8 0x0c952de4 [DEFAULT] [hasThis] Void System.Web.UI.Control.LoadRecursive()
0x0d0ceecc 0x0c952097 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0d0cef10 0x0c950eef [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0d0cef4c 0x0c95095b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0d0cef54 0x0c950934 [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0d0cef64 0x0c76de50 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0d0cefac 0x0c76d8ba [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0d0ceff4 0x0c76d783 [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0d0cf010 0x02446897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0d0cf04c 0x02446448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x0d0cf058 0x02442fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x0d0cf120 0x79217188 [FRAME: ContextTransitionFrame]
0x0d0cf200 0x79217188 [FRAME: ComMethodFrame]

Hmm... so what are these waiting for?

Well... when we are waiting in GetResponse we have sent the request to the web service server and we are waiting for it to return, in which case we can either be blocked/waiting on the server or we could be waiting on a callback thread to process the results on. In our case we have not exhausted the thread pool so we would have had free threads to use as callback threads. So... we are likely waiting on results back from the server on the two threads that are in GetResponse().

Although these two threads are interesting I find it even more interesting that we have 33 threads waiting in GetRequestStream(). This function is just supposed to get us a stream that we can send the request on. OK! So that seems like something that would be rather quick, however in doing so it also needs to get a connection.

So... what would be limiting us from getting a connection?

System.Net.Connections are managed by a class called ServicePoint, that contains information about the URI it services, the maximum number of connections, whether it expects 100-continue etc.

From !dso (!dumpstackobjects) on thread 61 (which is waiting on GetRequestStream() we can get the HttpWebRequest to the Web Service.

0:061> !dso
Thread 61
ESP/REG Object Name
0xd58f5f0 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f5f4 0x6c8f23c System.Net.LazyAsyncResult
0xd58f5f8 0x6c95868 System.Threading.ManualResetEvent
0xd58f610 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f620 0x6c60b10 SlowStuff.seventoo.Service
0xd58f624 0x6c60b10 SlowStuff.seventoo.Service
0xd58f628 0x677c0c0 System.String DoSomethingSlow
0xd58f630 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f634 0x6c8c2e0 System.String
0xd58f638 0x677335c System.String
0xd58f64c 0x6c60b10 SlowStuff.seventoo.Service

And if we dump this out we find that it has a member variable called _ServicePoint, so that looks pretty interesting...

0:061> !do 0x6c8b0d0
Name: System.Net.HttpWebRequest
MethodTable 0x0c978b94
EEClass 0x0c96e798
Size 204(0xcc) bytes
GC Generation: 0
mdToken: 0x020001fa (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c977cf4
MT Field Offset Type Attr Value Name
0x79b9788c 0x40000b6 0x4 CLASS instance 0x00000000 __identity
0x0c9774d0 0x4000f0c 0 CLASS shared static s_PrefixList
>> Domain:Value 0x000d5330:NotInit 0x001024b8:0x0274a580 <<
0x0c978b94 0x4000f30 0x10 CLASS instance 0x06c8f23c _WriteAResult
0x0c978b94 0x4000f31 0x14 CLASS instance 0x00000000 _ReadAResult
0x0c978b94 0x4000f32 0x18 CLASS instance 0x00000000 _ContinueDelegate
0x0c978b94 0x4000f33 0x1c CLASS instance 0x0274e5c4 _ServicePoint
0x0c978b94 0x4000f34 0x20 CLASS instance 0x06c91c58 _WriteEvent
0x0c978b94 0x4000f35 0x24 CLASS instance 0x00000000 _WriteInProgressEvent
0x0c978b94 0x4000f36 0xb4 System.Boolean instance 0 _HaveResponse
0x0c978b94 0x4000f37 0xb5 System.Boolean instance 0 _OnceFailed
0x0c978b94 0x4000f38 0xb6 System.Boolean instance 0 _WriteNotifed
0x0c978b94 0x4000f39 0x28 CLASS instance 0x00000000 _HttpResponse
0x0c978b94 0x4000f3a 0x2c CLASS instance 0x00000000 _ResponseException
0x0c978b94 0x4000f3b 0x30 CLASS instance 0x00000000 _CoreResponse
0x0c978b94 0x4000f3e 0x34 CLASS instance 0x027447d0 _Verb
0x0c978b94 0x4000f3f 0x38 CLASS instance 0x027447d0 _OriginVerb
0x0c978b94 0x4000f40 0x3c CLASS instance 0x06c8b6b4 _HttpRequestHeaders
0x0c978b94 0x4000f41 0x40 CLASS instance 0x00000000 _WriteBuffer
0x0c978b94 0x4000f42 0x94 System.Int32 instance 2 _HttpWriteMode
0x0c978b94 0x4000f43 0xb7 System.Boolean instance 0 _AllowAutoRedirect
0x0c978b94 0x4000f44 0x44 CLASS instance 0x06c60b60 _Uri
0x0c978b94 0x4000f45 0x48 CLASS instance 0x06c60b60 _OriginUri
0x0c978b94 0x4000f46 0x4c CLASS instance 0x00000000 _ChallengedUri
0x0c978b94 0x4000f47 0x50 CLASS instance 0x00000000 _ChallengedSpn
0x0c978b94 0x4000f48 0x54 CLASS instance 0x00000000 _MediaType
0x0c978b94 0x4000f49 0x8 System.Int64 instance -1 _ContentLength
0x0c978b94 0x4000f4a 0x58 CLASS instance 0x0274c80c _Version

Btw, we can verify that we have picked the right one by dumping out the Uri

0:061> !do 0x06c60b60
Name: System.Uri
MethodTable 0x020b4eb0
EEClass 0x02472e10
Size 76(0x4c) bytes
GC Generation: 0
mdToken: 0x020001c4 (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x020b4804
MT Field Offset Type Attr Value Name
0x79b9788c 0x40000b6 0x4 CLASS instance 0x00000000 __identity
0x020b4eb0 0x4000e38 0x8 CLASS instance 0x06c60e44 m_AbsoluteUri
0x020b4eb0 0x4000e39 0x3c System.Boolean instance 0 m_AlreadyEscaped
0x020b4eb0 0x4000e3a 0xc CLASS instance 0x066f0230 m_Fragment
0x020b4eb0 0x4000e3b 0x3d System.Boolean instance 0 m_FragmentEscaped
0x020b4eb0 0x4000e3c 0x2c System.Int32 instance -41085911 m_Hash

0:061> !do 0x06c60e44
Name: System.String
MethodTable 0x79b94638
EEClass 0x79b94984
Size 108(0x6c) bytes
GC Generation: 0
mdToken: 0x0200000f (c:\windows\\framework\v1.1.4322\mscorlib.dll)
String: http://seventoo/myslowwebservice/service.asmx
FieldDesc*: 0x79b949e8
MT Field Offset Type Attr Value Name
0x79b94638 0x4000013 0x4 System.Int32 instance 46 m_arrayLength
0x79b94638 0x4000014 0x8 System.Int32 instance 45 m_stringLength
0x79b94638 0x4000015 0xc System.Char instance 0x68 m_firstChar
0x79b94638 0x4000016 0 CLASS shared static Empty
>> Domain:Value 0x000d5330:0x066f0230 0x001024b8:0x066f0230 <<
0x79b94638 0x4000017 0x4 CLASS shared static WhitespaceChars
>> Domain:Value 0x000d5330:0x066f0244 0x001024b8:0x027122b8 <<

Back to the service point

0:061> !do 0x0274e5c4
Name: System.Net.ServicePoint
MethodTable 0x0c9c5d08
EEClass 0x0c9b6840
Size 92(0x5c) bytes
GC Generation: 1
mdToken: 0x02000247 (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c5730
MT Field Offset Type Attr Value Name
0x0c9c5d08 0x400112d 0x48 System.Boolean instance 0 m_ProxyServicePoint
0x0c9c5d08 0x400112e 0x49 System.Boolean instance 0 m_UserChangedLimit
0x0c9c5d08 0x400112f 0x2c System.Int32 instance 2 m_ConnectionLimit
0x0c9c5d08 0x4001130 0x30 System.Int32 instance 2 m_ConnectionMode
0x0c9c5d08 0x4001131 0x4a System.Boolean instance 1 m_SupportsPipelining
0x0c9c5d08 0x4001132 0x4 CLASS instance 0x0274e620 m_ConnectionGroupList
0x0c9c5d08 0x4001133 0x8 CLASS instance 0x0677bf00 m_Address
0x0c9c5d08 0x4001134 0x34 System.Int32 instance 900000 m_MaxIdleTime
0x0c9c5d08 0x4001135 0x50 VALUETYPE instance start at 0x0274e614 m_ExpiresAt
0x0c9c5d08 0x4001136 0xc CLASS instance 0x0677bf4c m_ConnectionName
0x0c9c5d08 0x4001137 0x10 CLASS instance 0x0677e2e8 m_IPHostEntryInfos
0x0c9c5d08 0x4001138 0x38 System.Int32 instance 0 m_CurrentAddressInfoIndex
0x0c9c5d08 0x4001139 0x3c System.Int32 instance 2 m_ConnectionsSinceDns
0x0c9c5d08 0x400113a 0x40 System.Int32 instance 2 m_CurrentConnections
0x0c9c5d08 0x400113b 0x14 CLASS instance 0x00000000 m_ServerCertificate
0x0c9c5d08 0x400113c 0x18 CLASS instance 0x00000000 m_ClientCertificate
0x0c9c5d08 0x400113d 0x4b System.Boolean instance 1 m_UseNagleAlgorithm
0x0c9c5d08 0x400113e 0x4c System.Boolean instance 1 m_Expect100Continue
0x0c9c5d08 0x400113f 0x1c CLASS instance 0x0274df08 m_LookupString
0x0c9c5d08 0x4001140 0x20 CLASS instance 0x0677e248 m_CachedRemoteIPAddressInfo
0x0c9c5d08 0x4001141 0x24 CLASS instance 0x027543e4 m_Version
0x0c9c5d08 0x4001142 0x4d System.Boolean instance 1 m_Understands100Continue
0x0c9c5d08 0x4001143 0x28 CLASS instance 0x0274e72c DnsLock
0x0c9c5d08 0x4001144 0x44 System.Int32 instance 140 m_HashCode
0x0c9c5d08 0x4001145 0x4e System.Boolean instance 1 m_ComputedHashCode

Hmm, it has a connection limit of 2. That sorta rings a bell, we have two requests in GetResponse(), i.e. in the middle of some web service requests, and the rest waiting for a connection, so this makes total sense.

We could actually stop right here and conclude that this is the source of the problem. I.e. that we only have two available connections, but let's talk about that a little bit later and dig a bit further instead...

If we go back to thread 42 (GetResponse()) and run !dso there, we find a ConnectStream on the stack

0:042> !dso
Thread 42
ESP/REG Object Name
0xd0cee2c 0x6cef7b8 System.Net.ConnectStream

This connect stream has a connection (one of the two created by the service point)

0:042> !do 0x6cef7b8
Name: System.Net.ConnectStream
MethodTable 0x0c9c44c0
EEClass 0x0c9b6270
Size 104(0x68) bytes
GC Generation: 0
mdToken: 0x02000272 (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c3e5c
MT Field Offset Type Attr Value Name
0x79b9788c 0x40000b6 0x4 CLASS instance 0x00000000 __identity
0x79b9c140 0x4000b17 0 CLASS shared static Null
>> Domain:Value 0x000d5330:0x066f1eec 0x001024b8:0x0674a16c <<
0x0c9c44c0 0x400122f 0x3c System.Int32 instance 1 m_CallNesting
0x0c9c44c0 0x4001230 0x20 CLASS instance 0x06cf0c28 m_BufferedData
0x0c9c44c0 0x4001231 0x58 System.Boolean instance 1 m_WriteBufferEnable
0x0c9c44c0 0x4001232 0x59 System.Boolean instance 1 m_BufferOnly
0x0c9c44c0 0x4001233 0x8 System.Int64 instance 293 m_TotalBytesToWrite
0x0c9c44c0 0x4001234 0x10 System.Int64 instance 0 m_BytesLeftToWrite
0x0c9c44c0 0x4001235 0x24 CLASS instance 0x0274f03c m_Connection
0x0c9c44c0 0x4001236 0x5a System.Boolean instance 1 m_WriteStream
0x0c9c44c0 0x4001237 0x28 CLASS instance 0x00000000 m_ReadBuffer
0x0c9c44c0 0x4001238 0x40 System.Int32 instance 0 m_ReadOffset
0x0c9c44c0 0x4001239 0x44 System.Int32 instance 0 m_ReadBufferSize

0:061> !do 0x0274f03c
Name: System.Net.Connection
MethodTable 0x0c9c52f0
EEClass 0x0c9b6454
Size 152(0x98) bytes
GC Generation: 1
mdToken: 0x02000270 (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c4bb8
MT Field Offset Type Attr Value Name
0x0c9c52f0 0x40011f3 0x80 System.Boolean instance 1 m_CanPipeline
0x0c9c52f0 0x40011f4 0x81 System.Boolean instance 0 m_Pipelining
0x0c9c52f0 0x40011f5 0x82 System.Boolean instance 1 m_KeepAlive
0x0c9c52f0 0x40011f6 0x58 System.Int32 instance 0 m_Error
0x0c9c52f0 0x40011f7 0x4 CLASS instance 0x027be068 m_ReadBuffer
0x0c9c52f0 0x40011f8 0x5c System.Int32 instance 25 m_BytesRead
0x0c9c52f0 0x40011f9 0x60 System.Int32 instance 25 m_HeadersBytesUnparsed
0x0c9c52f0 0x40011fa 0x64 System.Int32 instance 25 m_BytesScanned
0x0c9c52f0 0x40011fb 0x68 System.Int32 instance 0 m_TotalResponseHeadersLength
0x0c9c52f0 0x40011fc 0x6c System.Int32 instance 65536 m_MaximumResponseHeadersLength
0x0c9c52f0 0x40011fd 0x8 CLASS instance 0x06d0f0bc m_ResponseData
0x0c9c52f0 0x40011fe 0x70 System.Int32 instance 1 m_ReadState
0x0c9c52f0 0x40011ff 0xc CLASS instance 0x027bf074 m_WaitList
0x0c9c52f0 0x4001200 0x10 CLASS instance 0x027bf0dc m_WriteList
0x0c9c52f0 0x4001201 0x74 System.Int32 instance 25 m_CurrentRequestIndex
0x0c9c52f0 0x4001202 0x14 CLASS instance 0x027bf1c0 m_StatusLineInts
0x0c9c52f0 0x4001203 0x18 CLASS instance 0x066f0230 m_StatusDescription
0x0c9c52f0 0x4001204 0x78 System.Int32 instance 0 m_StatusState

The connection in turn has a waitlist that contains all the requests that are waiting for this particular connection

0:061> !do 0x027bf074
Name: System.Collections.ArrayList
MethodTable 0x79ba2ee4
EEClass 0x79ba3020
Size 24(0x18) bytes
GC Generation: 0
mdToken: 0x02000100 (c:\windows\\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79ba3084
MT Field Offset Type Attr Value Name
0x79ba2ee4 0x4000362 0x4 CLASS instance 0x02a7bd44 _items
0x79ba2ee4 0x4000363 0xc System.Int32 instance 17 _size
0x79ba2ee4 0x4000364 0x10 System.Int32 instance 35 _version
0x79ba2ee4 0x4000365 0x8 CLASS instance 0x00000000 _syncRoot

In this case, 17 requests are waiting for this connection, WOW!!!

And if we would run

0:061> .foreach (req {!do -v 0x02a7bd44 -short}) {!gcroot ${req}}

To find out on what threads all these requests are rooted/used we would see them rooted on thread 24, 26, 28, 30, 32, 34, 36, 37, 38, 44, 46, 48, 50, 52, 54, 56, and 58

The roots look something like this:

Scan Thread 46 (0x1540)

So all these threads are waiting for our thread 42 to finish. In other words, the last thread in the waitlist will have to wait for (16*Web Service execution time) + time for currently executing request, before it can even start with its request. Now, that can be a pretty long time:)

When a new request comes in it gets scheduled in on one of the two connections and stands in line awaiting its turn, just like a busy day at the DMV.

A little background

Why 2 connections??? that seems like an awfully small number...

Let's take a look at RFC 2616, the RFC for Hypertext Transfer Protocol -- HTTP/1.1. This is what it says about HTTP connections.

"Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. A proxy SHOULD use up to 2*N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion."

So now you know where the number 2 likely came from. But as you can see, in our case it presents a bit of a problem.

Solutions please!!!

In the <> section of machine.config you can set the maxConnection value to indicate how many concurrent connections you want to allow. By default this is set to * and 2, which means that we will have a maximum of 2 simultaneous connections to any given address/port.

Each service point handles a particular URI, so in essence this means that any given service point will have a maximum of 2 connections.

<add address="*" maxconnection="2"/>

You can either change the general maxconnection setting or add one for the particular ipaddress / server name where your web service resides.

Along with setting this up, you should also look over the numbers for maxWorkerThreads/maxIOThreads, minFreeThreads and minLocalRequestFreeThreads when calling web services, per this kb article, but the biggest contention point is definitely maxconnections.

Better yet, in 2.0 a new machine config setting was introduced, and made the default...

<autoconfig = "true">

This automatically configures these values for you per the article above

Wrap up

I know that in this post in particular I might have made huge leaps of logic in some places, my college algebra teacher definitely wouldn't have been proud:) but I want to defend myself by saying that I am trying to show debugging in a step-by-step fashion, when really a lot of the time, when I am debugging, I use the "poke around until I find something interesting" technique that I mentioned earlier...

So far the posts have touched as distant topics as garbage collection, pinvokes, xmlserialization, intrinsics, and now connection management. It is impossible to know how everything is implemented which is why I am trying to post case studies on a lot of different areas so at least you will know bits and pieces of how things work and build upon that.

Have fun out there!



Comments (30)
  1. There are some blog posts with great technical information you just wouldn’t get from a plain knowledge…

  2. qdzx2008 says:

    .NET 点滴 ASP.NET 2.0

  3. bluesign [deniz at edincik dot com] says:

    i am really impressed about your article, but I am curious about performance issues, other than connection limit for using web services to connect to backend.

    also what is the definition of ‘peak volume’ in this article, (req/s) ?

    ps; i am happy to find such a cool blog about debugging

  4. Tess says:

    Thank you much:)

    Other performance issues would include

    a) the web service taking a long time for whatever reason, i.e. locks, slow database requests etc.

    b) dns issues (if its slow doing dns lookups)

    c) serialization and deserialization of parameters and return values. This can be pretty expensive both timewise and memorywise if for example you are passing large datasets back and forth.


  5. bluesign [deniz at edincik dot com] says:

    thanks 🙂  

    write more blog you should 😉

  6. Howard says:

    Hi Tess

    I’ve a question about the !threadpool command. The Completion Port Thread: line of the output from that command, as I understand it these threads are used to transport requests from IIS to ASP.NET (worker process), is that correct?

    I’ve a hang dump of a customers site which has the following output from the !threadpool command;

    0:000> !threadpool

    Worker Thread: Total: 4 Running: 4 Idle: 2 MaxLimit: 100 MinLimit: 4

    Work Request in Queue: 0


    Number of Timers: 0


    Completion Port Thread: Total: 0 Free: 0 MaxFree: 8 CurrentLimit: 0 MaxLimit: 1000 MinLimit: 4

    where the number of free completion port threads is 0. I would have expected the ‘free’ count to be the same as the MinLimit.

    At this point our app has ground to a hault, there’s no threads waiting in any of the stacks, but there’s a large amount of threadabortexceptions.

    My main question is how could the Free: count of completion threads ever be 0?


    Hope you’re over the flu 🙂

  7. Tess says:

    Thank you Howard, starting to feel better:)

    The completion port threads are usually used to receive results from async requests.  Requests from IIS to ASP.NET are brought in on native threads before they are dispatched as work requests to the queue, so in your case it seems like the requests are stuck either before they come in to or they are stuck on the way back. If it is on IIS 5, i would take a look at the inetinfo process.

    The minlimit is usually set to one per CPU unless you specify something else, and this is used to determine wether or not the threadpool should grow in case of a

    new request for a completion port thread. Contrary to what the name suggests it doesnt tell you the minimum number of IO/Completion port threads in the process.  

    Here is a pretty good explanation on how it works for worker threads (pretty much the same goes work io threads)

    and your MinLimit is shown as "CPU’s" in the ShouldGrowThreadPool diagram.

    (Quick note: The performance "problems" he mentions when doing a burst of requests can be fixed by configuring minWorkerThreads and minIOThreads as mentioned in, however the value should be left at the default (0) if your application does not have sudden bursts of requests)

    The threadpool is grown and shrunk based on a number of different factors, such as CPU usage, # of incomming and queued requests etc.

    Hope this helps clarify the output.


  8. Howard says:

    Hey Tess

    Thanks very much for the reply and the links, it does help a great deal. I’m still not sure I completely understand whats going on, but I’m sure it’ll eventually sink in.

    So the value of 4 makes sense for the minLimit because I’ve four heaps in that dump. In the same vien as your answer does the currentLimit really _mean_ the currentLimit? If so how could this ever be 0, if the currentLimit is 0 surely that would mean no requests would ever be serviced, or am I just being thick? Unless it means that the threadpool doesn’t need to grow anymore at the moment?. I know, so many questions. I guess I just need to dissect the code in the threadtests example on the codeproject.

  9. Tess says:

    To be perfectly honest i dont know what the currentlimit means, but i think (mayor guess here) that it could be the number of request that come in before we start looking at if we should grow the threadpool.  

    Either way, i know for sure, that it wont stop the threadpool from creating new such threads.

    I think the problem in your particular case is that requests get stuck either before they come in to the aspnet isapi or on their way back,  and its very hard to say why that would be.  It could be anything from a SSL accellerator thats acting up, to a bug of some sort to faulty content length causing requests to not be seen as completed.  

  10. Howard says:

    Hi Tess

    You were right in the fact that the requests were stuck outside of ASP.NET. It was a nasty deadlock situation involving Oracle & DCOM and an inappropriate critical section.

    I’m just posting this incase somebody gets a threadpool output like the one I posted above and might wonder where to start looking for the problem.

    There’s still many things to learn about the threadpool though and what the numbers actually mean.


  11. Tess says:

    Very nice to know,  btw, how did you find out?

    (sorry, just too curious for my own good some times:))

  12. Howard says:

    Yeah you are, thought you’d be sick of crashdumps by now 🙂

    We reproduced the problem on a couple of test systems and grabbed the dumps in the hang state. !clrstack showed that the methods at the top of the stacks were the same in each case so we knew we probably had captured the problem. Each of the methods at the top of the stacks needs to enter a critical section, running !locks showed 1 thread entering the lock 3 times, sounds a bit fishy.

    On one of the unmanaged stacks there was an outgoing RPC call preceded by a call to FinalRelease() on a COM interface, so something is being destructed and needs to make a DCOM call to clear up.

    Putting 2 and 2 together and cutting a long story short this was the code attempting to kill a db query. So why would it hang attempting to kill a db query? ah well thats because the original query had timed out the frontend but was still running in the database. So when the frontend destructed the object containing the db COM intefaces it also grabbed the lock, but it couldn’t destruct until the db interfaces released, but they couldn’t release because the query was still running. You needed the same lock to start any db operations, oh dear.

    As more db querys were started they all were waiting for the lock to be released, eventually the threadpool was exhausted hence the output above. This is all in a mixed mode C++ dll, is it managed? is it not? who knows 🙂

    Sounds so simple now 🙂

  13. Tess says:

    nice catch:)

  14. The following links to .NET resources have been collated over time with the assistance of colleagues.&amp;nbsp;…

  15. In my previous post on this topic,&amp;nbsp; I concluded with three key points:

    The Begin function does…

  16. I have talked about a number of different hang/performance issues in my posts. This post is a generic

  17. lb says:

    My question is less clever than everyone elses. I don’t even know what commandline/console window or application you’ve typed your command into:

    0:000> !threadpool

    is this the command window in vs 2005? is it a console app? it’s some special debugging app isn’t it… i wish i knew… i’ll be reading backlogs of your blog to try and find my bearings…

  18. lb says:

    hmmm… ntsd… i am learning… but my head hurts…

  19. says:


  20. 清炒白菜 says:



  21. I’m sorry I haven’t had a chance to write much lately because of a lot of different reasons. I moved

  22. Amudhan says:

    I have a doubt regarding the maximum number of connections that a webservice can accept simultaneously.

    How do we determine the optimum limit of the maximum connections for a webservice?

    Also please specify some tools to perform stress testing of my webservice.

  23. In a recent MS internal performance gig we encountered an interesting issue with the maxconnection setting

  24. I started this blog 2.5 years ago today, mostly because I felt that the same types of issues came up

  25. Alex says:

    Where did you learn how to use !threadpool and do dumps and understand them?  I’ve been doing ASP.Net for several years and that still looks like Greek to me.

  26. Jason says:

    Great post, thanks. This is way late, but we're running into an issue where our switch to a new server posed an interesting DCOM issue with threading, where we're needing to switch to COM+, so that the webservices can operate on a sufficient threading process model.  I'm working on this as I type (well, not literally as I type, but…).

Comments are closed.

Skip to main content