WCF host process hung: From winDBG perspective
One of my customers complaints of an unstable WCF service on his production environment. I heard them saying: "The service is frozen", "Getting unknown error from the client side UI", "The service is unresponsive". The real challenge of debugging a production issues is to reproduce it in some other kind of environment. And this is really a BIG task. But I was somewhat lucky; using my unit tests from a VSTS test client , I ran it for almost 10/12 hours until I reached a situation where all the complaints seem to be true. I took a memory dump of my WCF service process.
I tried to launch the client GUI. But unfortunately, after sometime I saw "Unknown Error" form the GUI. I checked the client side log and found something very interesting: 'The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:09:59.7289458'.
I tried to visualize the launching of the GUI application. It will contact almost 5/6 endpoints of the back end WCF service before successfully launching it. So, I suspect it can be because of at least one of the endpoint which is timing out.
This great blog: https://blogs.msdn.com/mahjayar/archive/2006/12/12/debugging-wcf-applications.aspx really helped me to debug.
I started with dumping the ServiceHost types.
0:025> !dumpheap -type System.ServiceModel.ServiceHost
Statistics:
MT Count TotalSize Class Name
50e822f4 11 220 System.ServiceModel.ExtensionCollection`1[[System.ServiceModel.ServiceHostBase, System.ServiceModel]]
50eaf9c8 11 264 System.Collections.Generic.List`1[[System.ServiceModel.IExtension`1[[System.ServiceModel.ServiceHostBase, System.ServiceModel]], System.ServiceModel]]
50e62118 11 352 System.ServiceModel.ServiceHost+ReflectedContractCollection
50e553e8 11 1496 System.ServiceModel.ServiceHost
I found 11 instances of System.ServiceModel.ServiceHost type. I checked the wcf servicehost config file and found there are 11 endpoints configured. So, it makes sense.
My particular interest was to see if any of the endpoints had reached it's throttle limit. To look into it I dumped System.ServiceModel.Dispatcher.FlowThrottle type, and found 33 objects.
0:025> !dumpheap -type System.ServiceModel.Dispatcher.FlowThrottle
Address MT Size
00fe56f8 50e49ffc 36
00fe57b4 50e49ffc 36
0103bda8 50e49ffc 36
0110cc74 50e49ffc 36
0110cce4 50e49ffc 36
0112ef04 50e49ffc 36
01287bd0 50e49ffc 36
01287c40 50e49ffc 36
0129b290 50e49ffc 36
012d6a94 50e49ffc 36
012d6b04 50e49ffc 36
012e57b0 50e49ffc 36
01305bd8 50e49ffc 36
01305c48 50e49ffc 36
013114f4 50e49ffc 36
0132886c 50e49ffc 36
013288dc 50e49ffc 36
01332840 50e49ffc 36
01337950 50e49ffc 36
013379c0 50e49ffc 36
01357ba0 50e49ffc 36
013e873c 50e49ffc 36
013e87ac 50e49ffc 36
013f4744 50e49ffc 36
0150ebd8 50e49ffc 36
0150ec48 50e49ffc 36
01517d00 50e49ffc 36
01523768 50e49ffc 36
015237d8 50e49ffc 36
01530948 50e49ffc 36
015452b4 50e49ffc 36
01545324 50e49ffc 36
0154d508 50e49ffc 36
total 33 objects
Statistics:
MT Count TotalSize Class Name
50e49ffc 33 1188 System.ServiceModel.Dispatcher.FlowThrottle
Total 33 objects
I was little curious to know why 33 objects are present for 11 endpoints. I checked the endpoint behavior at the server side :
<behavior name="HermesServiceBehavior">
<serviceMetadata />
<serviceDebug includeExceptionDetailInFaults="true" />
<serviceThrottling maxConcurrentCalls="64"
maxConcurrentSessions="64"
maxConcurrentInstances="2147483647" />
</behavior>
So, it answers my question; each endpoint has the three throttle behaviors. => 11*3= 33 FlowThrottles
My next step was to dump all these FlowThrottle and check for any saturation.
0.25> .foreach (name {!dumpheap -type System.ServiceModel.Dispatcher.FlowThrottle -short}) {.echo -------------; .echo name; !do name}
My formatted output looked like this:
....................................................................
....................................................................
-------------
01287bd0
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 50e49ffc
EEClass: 50e49f8c
Size: 36(0x24) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
MT Field Offset Type VT Attr Value Name
790fdb60 400327b 18 System.Int32 1 instance 64 capacity
790fdb60 400327c 1c System.Int32 1 instance 0 count
790f8a7c 400327d 4 System.Object 0 instance 01287bf4 mutex
791118a4 400327e 8 ...ding.WaitCallback 0 instance 01287bb0 release
00000000 400327f c 0 instance 01287c00 waiters
790f9244 4003280 10 System.String 0 instance 00fe565c propertyName
790f9244 4003281 14 System.String 0 instance 00fe56a0 configName
-------------
01287c40
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 50e49ffc
EEClass: 50e49f8c
Size: 36(0x24) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
MT Field Offset Type VT Attr Value Name
790fdb60 400327b 18 System.Int32 1 instance 64 capacity
790fdb60 400327c 1c System.Int32 1 instance 64 count
790f8a7c 400327d 4 System.Object 0 instance 01287c64 mutex
791118a4 400327e 8 ...ding.WaitCallback 0 instance 01287c20 release
00000000 400327f c 0 instance 01287c70 waiters
790f9244 4003280 10 System.String 0 instance 00fe5620 propertyName
790f9244 4003281 14 System.String 0 instance 00fe5758 configName
-------------
0129b290
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 50e49ffc
EEClass: 50e49f8c
Size: 36(0x24) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
MT Field Offset Type VT Attr Value Name
790fdb60 400327b 18 System.Int32 1 instance 2147483647 capacity
790fdb60 400327c 1c System.Int32 1 instance 0 count
790f8a7c 400327d 4 System.Object 0 instance 0129b2b4 mutex
791118a4 400327e 8 ...ding.WaitCallback 0 instance 0129b270 release
00000000 400327f c 0 instance 0129b2c0 waiters
790f9244 4003280 10 System.String 0 instance 0103bd48 propertyName
790f9244 4003281 14
..................................................................................
It was a great output, because of this text in RED. This proves that I was guessing on the right direction. There is actually one endpoint which has reached the throttle limit.
But which endpoint is this, we have 11?
Let's dump the related ServiceHost type:
0:025> !do 01287a34
01287a34
Name: System.ServiceModel.ServiceHost
MethodTable: 50e553e8
EEClass: 50e55368
Size: 136(0x88) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
MT Field Offset Type VT Attr Value Name
79103c00 4000915 28 System.Boolean 1 instance 0 aborted
79103c00 4000916 29 System.Boolean 1 instance 0 closeCalled
.................................................................................................................
.................................................................................................................
50e04530 4002be2 3c ...erviceDescription 0 instance 01287cd0 description
00000000 4002be3 40 0 instance 01287b2c extensions
00000000 4002be4 44 0 instance 012d6834 externalBaseAddresses
00000000 4002be5 48 0 instance 01293a80 implementedContracts
50e124c0 4002be6 4c ...nceContextManager 0 instance 01287b58 instances
7910b5d8 4002be7 6c System.TimeSpan 1 instance 01287aa0 openTimeout
50df93e0 4002be8 50 ...rformanceCounters 0 instance 01295ef4 servicePerformanceCounters
50e1272c 4002be9 54 ...r.ServiceThrottle 0 instance 01287b80 serviceThrottle
50e321ac 4002bea 58 ...erviceCredentials 0 instance 00000000 readOnlyCredentials
50e1f218 4002beb 5c ...orizationBehavior 0 instance 012d67ec readOnlyAuthorization
00000000 4002bec 60 0 instance 00000000 UnknownMessageReceived
7a751b00 4002bdd 74c System.Uri 0 static 00fe546c EmptyUri
790f8a7c 4003075 74 System.Object 0 instance 00000000 singletonInstance
790ffe7c 4003076 78 System.Type 0 instance 01057f28 serviceType
50e62118 4003077 7c ...ontractCollection 0 instance 01287e00 reflectedContracts
7910b450 4003078 80 System.IDisposable 0 instance 00000000 disposableInstance
The name of the endpoint can be found in this ServiceDescription
0:025> !do 01287cd0
Name: System.ServiceModel.Description.ServiceDescription
MethodTable: 50e04530
EEClass: 50e04478
Size: 32(0x20) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
MT Field Offset Type VT Attr Value Name
00000000 400306f 4 0 instance 01287cf0 behaviors
790f9244 4003070 8 System.String 0 instance 01287d9c configurationName
50dd0b68 4003071 c ...ndpointCollection 0 instance 01287d28 endpoints
790ffe7c 4003072 10 System.Type 0 instance 01057f28 serviceType
50dda3c4 4003073 14 ...scription.XmlName 0 instance 00000000 serviceName
790f9244 4003074 18 System.String 0 instance 00fe7a2c serviceNamespace
It should be inside configurationName.
0:025> !do 01287d9c
Name: System.String
MethodTable: 790f9244
EEClass: 790f91a4
Size: 100(0x64) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: **********. ********.ExternalService
Fields:
MT Field Offset Type VT Attr Value Name
790fdb60 4000096 4 System.Int32 1 instance 42 m_arrayLength
790fdb60 4000097 8 System.Int32 1 instance 41 m_stringLength
790fad38 4000098 c System.Char 1 instance 4d m_firstChar
790f9244 4000099 10 System.String 0 shared static Empty
>> Domain:Value 001475b8:790d57b4 <<
79122994 400009a 14 System.Char[] 0 shared static WhitespaceChars
>> Domain:Value 001475b8:00fc13f4 <<
Here we go. The endpoint which is saturated with the max throttle limit is ExternalService. When the GUI launches it actually contacts this endpoint to get some data.