WCF host process hung: From winDBG perspective

One of my customers complaints of an unstable WCF service on his production environment. I heard them saying: "The service is frozen", "Getting unknown error from the client side UI", "The service is unresponsive". The real challenge of debugging a production issues is to reproduce it in some other kind of environment. And this is really a BIG task. But I was somewhat lucky; using my unit tests from a VSTS test client , I ran it for almost 10/12 hours until I reached a situation where all the complaints seem to be true. I took a memory dump of my WCF service process.

I tried to launch the client GUI. But unfortunately, after sometime I saw "Unknown Error" form the GUI. I checked the client side log and found something very interesting: 'The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:09:59.7289458'.

I tried to visualize the launching of the GUI application. It will contact almost 5/6 endpoints of the back end WCF service before successfully launching it. So, I suspect it can be because of at least one of the endpoint which is timing out.

This great blog: https://blogs.msdn.com/mahjayar/archive/2006/12/12/debugging-wcf-applications.aspx really helped me to debug.

I started with dumping the ServiceHost types.

0:025> !dumpheap -type System.ServiceModel.ServiceHost

Statistics:
      MT    Count    TotalSize Class Name
50e822f4       11          220 System.ServiceModel.ExtensionCollection`1[[System.ServiceModel.ServiceHostBase, System.ServiceModel]]
50eaf9c8       11          264 System.Collections.Generic.List`1[[System.ServiceModel.IExtension`1[[System.ServiceModel.ServiceHostBase, System.ServiceModel]], System.ServiceModel]]
50e62118       11          352 System.ServiceModel.ServiceHost+ReflectedContractCollection
50e553e8 11 1496 System.ServiceModel.ServiceHost

I found 11 instances of System.ServiceModel.ServiceHost type. I checked the wcf servicehost config file and found there are 11 endpoints configured. So, it makes sense.

My particular interest was to see if any of the endpoints had reached it's throttle limit. To look into it I dumped  System.ServiceModel.Dispatcher.FlowThrottle type, and found 33 objects.

0:025> !dumpheap -type System.ServiceModel.Dispatcher.FlowThrottle
Address       MT     Size
00fe56f8 50e49ffc       36    
00fe57b4 50e49ffc       36    
0103bda8 50e49ffc       36    
0110cc74 50e49ffc       36    
0110cce4 50e49ffc       36    
0112ef04 50e49ffc       36    
01287bd0 50e49ffc       36    
01287c40 50e49ffc       36    
0129b290 50e49ffc       36    
012d6a94 50e49ffc       36    
012d6b04 50e49ffc       36    
012e57b0 50e49ffc       36    
01305bd8 50e49ffc       36    
01305c48 50e49ffc       36    
013114f4 50e49ffc       36    
0132886c 50e49ffc       36    
013288dc 50e49ffc       36    
01332840 50e49ffc       36    
01337950 50e49ffc       36    
013379c0 50e49ffc       36    
01357ba0 50e49ffc       36    
013e873c 50e49ffc       36    
013e87ac 50e49ffc       36    
013f4744 50e49ffc       36    
0150ebd8 50e49ffc       36    
0150ec48 50e49ffc       36    
01517d00 50e49ffc       36    
01523768 50e49ffc       36    
015237d8 50e49ffc       36    
01530948 50e49ffc       36    
015452b4 50e49ffc       36    
01545324 50e49ffc       36    
0154d508 50e49ffc       36    
total 33 objects
Statistics:
      MT    Count    TotalSize Class Name
50e49ffc       33         1188 System.ServiceModel.Dispatcher.FlowThrottle
Total 33 objects

I was little curious to know why 33 objects are present for 11 endpoints. I checked the endpoint behavior at the server side :

<behavior name="HermesServiceBehavior">
  <serviceMetadata />
  <serviceDebug includeExceptionDetailInFaults="true" />
  <serviceThrottling maxConcurrentCalls="64"
maxConcurrentSessions="64"
maxConcurrentInstances="2147483647" />

</behavior>

 

So, it answers my question; each endpoint has the three throttle behaviors. => 11*3= 33 FlowThrottles

 

My next step was to dump all these FlowThrottle  and check for any saturation.

0.25> .foreach (name {!dumpheap -type System.ServiceModel.Dispatcher.FlowThrottle -short}) {.echo -------------; .echo name; !do name}

My formatted output looked like this:

....................................................................

....................................................................

-------------
01287bd0
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 50e49ffc
EEClass: 50e49f8c
Size: 36(0x24) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fdb60  400327b       18         System.Int32  1 instance       64 capacity
790fdb60  400327c       1c         System.Int32  1 instance        0 count
790f8a7c  400327d        4        System.Object  0 instance 01287bf4 mutex
791118a4  400327e        8 ...ding.WaitCallback  0 instance 01287bb0 release
00000000  400327f        c                       0 instance 01287c00 waiters
790f9244  4003280       10        System.String  0 instance 00fe565c propertyName
790f9244  4003281       14        System.String  0 instance 00fe56a0 configName
-------------
01287c40
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 50e49ffc
EEClass: 50e49f8c
Size: 36(0x24) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fdb60 400327b 18 System.Int32 1 instance 64 capacity
790fdb60 400327c 1c System.Int32 1 instance 64 count

790f8a7c  400327d        4        System.Object  0 instance 01287c64 mutex
791118a4  400327e        8 ...ding.WaitCallback  0 instance 01287c20 release
00000000  400327f        c                       0 instance 01287c70 waiters
790f9244  4003280       10        System.String  0 instance 00fe5620 propertyName
790f9244  4003281       14        System.String  0 instance 00fe5758 configName
-------------
0129b290
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 50e49ffc
EEClass: 50e49f8c
Size: 36(0x24) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fdb60  400327b       18         System.Int32  1 instance 2147483647 capacity
790fdb60  400327c       1c         System.Int32  1 instance        0 count
790f8a7c  400327d        4        System.Object  0 instance 0129b2b4 mutex
791118a4  400327e        8 ...ding.WaitCallback  0 instance 0129b270 release
00000000  400327f        c                       0 instance 0129b2c0 waiters
790f9244  4003280       10        System.String  0 instance 0103bd48 propertyName
790f9244  4003281       14     

..................................................................................

 

It was a great output, because of this text in RED. This proves that I was guessing on the right direction. There is actually one endpoint which has reached the throttle limit.

 

But which endpoint is this, we have 11?

Let's dump the related  ServiceHost type:

0:025> !do 01287a34
01287a34
Name: System.ServiceModel.ServiceHost
MethodTable: 50e553e8
EEClass: 50e55368
Size: 136(0x88) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79103c00  4000915       28       System.Boolean  1 instance        0 aborted
79103c00  4000916       29       System.Boolean  1 instance        0 closeCalled
.................................................................................................................

.................................................................................................................
50e04530 4002be2 3c ...erviceDescription 0 instance 01287cd0 description
00000000  4002be3       40                       0 instance 01287b2c extensions
00000000  4002be4       44                       0 instance 012d6834 externalBaseAddresses
00000000  4002be5       48                       0 instance 01293a80 implementedContracts
50e124c0  4002be6       4c ...nceContextManager  0 instance 01287b58 instances
7910b5d8  4002be7       6c      System.TimeSpan  1 instance 01287aa0 openTimeout
50df93e0  4002be8       50 ...rformanceCounters  0 instance 01295ef4 servicePerformanceCounters
50e1272c  4002be9       54 ...r.ServiceThrottle  0 instance 01287b80 serviceThrottle
50e321ac  4002bea       58 ...erviceCredentials  0 instance 00000000 readOnlyCredentials
50e1f218  4002beb       5c ...orizationBehavior  0 instance 012d67ec readOnlyAuthorization
00000000  4002bec       60                       0 instance 00000000 UnknownMessageReceived
7a751b00  4002bdd      74c           System.Uri  0   static 00fe546c EmptyUri
790f8a7c  4003075       74        System.Object  0 instance 00000000 singletonInstance
790ffe7c  4003076       78          System.Type  0 instance 01057f28 serviceType
50e62118  4003077       7c ...ontractCollection  0 instance 01287e00 reflectedContracts
7910b450  4003078       80   System.IDisposable  0 instance 00000000 disposableInstance

The name of the endpoint can be found in this ServiceDescription

0:025> !do 01287cd0
Name: System.ServiceModel.Description.ServiceDescription
MethodTable: 50e04530
EEClass: 50e04478
Size: 32(0x20) bytes
(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
00000000  400306f        4                       0 instance 01287cf0 behaviors
790f9244 4003070 8 System.String 0 instance 01287d9c configurationName
50dd0b68  4003071        c ...ndpointCollection  0 instance 01287d28 endpoints
790ffe7c  4003072       10          System.Type  0 instance 01057f28 serviceType
50dda3c4  4003073       14 ...scription.XmlName  0 instance 00000000 serviceName
790f9244  4003074       18        System.String  0 instance 00fe7a2c serviceNamespace

It should be inside configurationName.

0:025> !do 01287d9c
Name: System.String
MethodTable: 790f9244
EEClass: 790f91a4
Size: 100(0x64) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: **********. ********.ExternalService
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fdb60  4000096        4         System.Int32  1 instance       42 m_arrayLength
790fdb60  4000097        8         System.Int32  1 instance       41 m_stringLength
790fad38  4000098        c          System.Char  1 instance       4d m_firstChar
790f9244  4000099       10        System.String  0   shared   static Empty
    >> Domain:Value  001475b8:790d57b4 <<
79122994  400009a       14        System.Char[]  0   shared   static WhitespaceChars
    >> Domain:Value  001475b8:00fc13f4 <<

Here we go. The endpoint which is saturated with the max throttle limit is ExternalService. When the GUI launches it actually contacts this endpoint to get some data.