Processor spikes on the web server: find the rogue process/thread

On one of my engagements, I was load testing a WCF application. I had a unit test for each scenario and mixed them together to create the required workloads; in my case there were six scenarios. The interesting observation was that the load test results showed the system couldn't support more than 3 concurrent users, because the web server's processor was 100% utilized. Task Manager showed HERMES.SERVICEHOST.EXE as the process causing the outage. I could have gone back to the app guys and told them, "Hey look, this application sucks!" Instead, I took a different approach: with the help of the much-respected WinDbg, I tried to analyze WHY the processor was so busy.

Let's investigate the "WHY"

My first step was to figure out which call or calls were heavy on the web server's processor. Running each unit test individually, I found that three of the six tests caused the problem, all showing the same pattern. I looked into the set of service calls common to these three tests, but there were many of them, and I needed the exact call. So I put a breakpoint on every service call in one unit test and watched the web server's processor utilization as I continued from one breakpoint to the next (F5 in Visual Studio) until I found the rogue call. It didn't take long to find the evil one: processor utilization reached around 80% and stayed there for almost two minutes.

In my unit test, the call looked like this:

return dealClient.GenerateCps (localeId, .....................);
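
For context, the overall shape of such a test was roughly like the sketch below. This is a minimal reconstruction: the DealClient proxy type, the test attributes, and the localeId value are illustrative assumptions, not the real Hermes signatures, and the remaining arguments stay elided as in the snippet above.

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class DealServiceTests
{
    [TestMethod]
    public void GenerateCps_SingleUserScenario()
    {
        // Hypothetical WCF client proxy generated from the Deal service contract
        var dealClient = new DealClient();
        try
        {
            int localeId = 1033; // illustrative value (en-US)

            // Breakpoint goes on this line; pressing F5 executes the service
            // call while we watch the web server's CPU in Task Manager
            var result = dealClient.GenerateCps(localeId /*, ... remaining arguments elided ... */);

            Assert.IsNotNull(result);
        }
        finally
        {
            dealClient.Close();
        }
    }
}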
           

I confirmed that the same call was responsible in the other tests as well. Now it was time to get under the processor's skin, and my tool of choice was WinDbg.

On the web server, I opened a command prompt in the Debugging Tools for Windows directory and typed this (without hitting ENTER yet):

adplus -hang -pn HERMES.SERVICEHOST.EXE -quiet
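
In hang mode, adplus attaches non-invasively and snapshots the process at the moment you run it, which is exactly why the command was typed in advance and ENTER held back. The -pn switch attaches by process name and -quiet suppresses the confirmation dialogs. By default the dump typically lands under the debugger's own directory; if you want it elsewhere, the -o switch takes an output directory (the path here is illustrative):

adplus -hang -pn HERMES.SERVICEHOST.EXE -o C:\dumps -quiet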

Then I went back to my unit test with the breakpoint set at "return dealClient.GenerateCps (localeId, .....................);", ran the test in debug mode, and when it reached this line I pressed F5 to execute the call. I checked that the web server's processor had almost hit the roof, waited a couple of seconds, and then hit ENTER in my earlier command prompt. Quietly, it created a dump of the process in hang mode.

 

Who let the dogs out?

Let's spill the beans with WinDbg. I used this set of commands in sequence.
           

0:000> .load sos.dll
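
One caveat: a bare .load sos.dll picks up whatever SOS is on the debugger's path, which may not match the CLR version in the dump. A safer variant (assuming the 2.0 CLR that this WCF service runs on, whose runtime DLL is mscorwks) loads SOS from the runtime's own directory:

0:000> .loadby sos mscorwks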

.time: to see the uptime of the system and of the process (HERMES.SERVICEHOST.EXE), along with the processor time the process has consumed in kernel and user mode.

0:000> .time
Debug session time: Tue Jul 29 16:25:29.000 2008 (GMT+6)
System Uptime: 9 days 3:37:40.437
Process Uptime: 0 days 5:04:45.000
  Kernel time: 0 days 1:40:55.000
  User time: 0 days 0:01:06.000

!runaway: to see which threads were consuming processor time in user mode.

0:000> !runaway
User Mode Time
  Thread       Time
   9:1ee8      0 days 0:23:17.765
   3:318       0 days 0:00:01.218
   2:1494      0 days 0:00:00.781
  16:1b24      0 days 0:00:00.062
   0:1214      0 days 0:00:00.046
  21:1f1c      0 days 0:00:00.031
   8:1060      0 days 0:00:00.031
  14:1dec      0 days 0:00:00.015
   5:1910      0 days 0:00:00.015
  20:29c       0 days 0:00:00.000
  19:1bc0      0 days 0:00:00.000
  18:112c      0 days 0:00:00.000
  17:1c84      0 days 0:00:00.000
  15:1ca0      0 days 0:00:00.000
  13:1c68      0 days 0:00:00.000
  12:1d60      0 days 0:00:00.000
  11:1c24      0 days 0:00:00.000
  10:185c      0 days 0:00:00.000
   7:1f30      0 days 0:00:00.000
   6:1f90      0 days 0:00:00.000
   4:1070      0 days 0:00:00.000
   1:1330      0 days 0:00:00.000

The output above shows that thread 9, at the top of the list, had consumed by far the most user-mode processor time: over 23 minutes.
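
By default !runaway reports user-mode time only; it also accepts a flag bitmask (1 = user time, 2 = kernel time, 4 = elapsed time since thread creation), so the following would have shown all three columns at once:

0:000> !runaway 7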

Now let's switch to thread 9 to see "WHY". After ~9s, thread 9 becomes the current thread, so the prompt changes from 0:000> to 0:009> and subsequent commands run in its context.

0:000> ~9s
eax=00000001 ebx=04d3ade0 ecx=0c851ec4 edx=0c851ee0 esi=0c851ec4 edi=00000004
eip=073d127f esp=040ec544 ebp=040ec568 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
073d127f 8bcf            mov     ecx,edi

 

!clrstack: to see what was on the managed execution stack of this long-running thread.

0:009> !clrstack
OS Thread Id: 0x1ee8 (9)
ESP       EIP    
040ec544 073d127f SoftArtisans.OfficeWriter.ExcelWriter.ᧀ.ᜄ(Int32)
040ec570 0eef988f SoftArtisans.OfficeWriter.ExcelWriter.᠑.ᜂ()
040ec594 0eef9669 SoftArtisans.OfficeWriter.ExcelWriter.ᩨ.ᜅ()
040ec5a0 0eef95a1 SoftArtisans.OfficeWriter.ExcelWriter.ᩨ.ᜀ(SoftArtisans.OfficeWriter.ExcelWriter.ᢦ)
040ec5cc 0eef954b SoftArtisans.OfficeWriter.ExcelWriter.ᩨ.ᜁ()
040ec5d0 0eef951c SoftArtisans.OfficeWriter.ExcelWriter.៬.ᜀ(SoftArtisans.OfficeWriter.ExcelWriter.៬)
040ec5dc 0eef92e6 SoftArtisans.OfficeWriter.ExcelWriter.៬.ᜀ()
040ec5e8 0eef913f SoftArtisans.OfficeWriter.ExcelWriter.៘.ᜀ()
040ec618 0eef90dd SoftArtisans.OfficeWriter.ExcelWriter.ᢥ.ᜁ(Int32)
040ec624 0eef8e86 SoftArtisans.OfficeWriter.ExcelWriter.ᤩ.ᜀ(SoftArtisans.OfficeWriter.ExcelWriter.ᣊ, Int32, SoftArtisans.OfficeWriter.ExcelWriter.᤾)
040ec66c 0eef8d27 SoftArtisans.OfficeWriter.ExcelWriter.ᤩ.ᜀ()
040ec69c 0eef8cdb SoftArtisans.OfficeWriter.ExcelWriter.ឋ.ᜀ()
040ec6a0 0eef8c59 SoftArtisans.OfficeWriter.ExcelWriter.ᨀ.ᜀ()

040ec6d4 0740bcf1 MS.IT.Ops.Hermes.DataAccessObjects.GenerateCpsDAO.AssignProductInformation(System.Data.DataView, SoftArtisans.OfficeWriter.ExcelWriter.Worksheet, Int32 ByRef, System.Decimal ByRef, System.Data.DataTable, SoftArtisans.OfficeWriter.ExcelWriter.Palette, System.String, System.Globalization.CultureInfo)
040ec7cc 0aeb6c65 MS.IT.Ops.Hermes.DataAccessObjects.GenerateCpsDAO.GenerateCps(Int32, Int32, System.String, System.String, System.String, MS.IT.Ops.Hermes.DataTransferObjects.CPSData, System.String, MS.IT.Ops.Hermes.DataTransferObjects.ProposalEntity, MS.IT.Ops.Hermes.DataTransferObjects.CPSDomainData, System.String, Boolean, Boolean, System.DateTime, Boolean, MS.IT.Ops.Hermes.DataTransferObjects.Footer, Boolean)
040eed34 08407fc5 MS.IT.Ops.Hermes.BusinessComponents.Deal.GenerateCps(Int32, Int32, System.String, System.String, System.String, MS.IT.Ops.Hermes.DataTransferObjects.CPSData, System.String, MS.IT.Ops.Hermes.DataTransferObjects.ProposalEntity, MS.IT.Ops.Hermes.DataTransferObjects.CPSDomainData, System.String, Boolean, Boolean, System.DateTime, Boolean, MS.IT.Ops.Hermes.DataTransferObjects.Footer, Boolean)
040eed88 0464b34e MS.IT.Ops.Hermes.BusinessFacade.DealFacade.GenerateCps(System.String, System.String, MS.IT.Ops.Hermes.DataTransferObjects.ProposalEntity, Boolean, System.String, Int32, Int32, System.String, Boolean, System.DateTime, Boolean, MS.IT.Ops.Hermes.DataTransferObjects.Footer, Boolean)
040eee88 04649e6e MS.IT.Ops.Hermes.Services.DealService.MS.IT.Ops.Hermes.Services.IDeal.GenerateCps(Int32, System.String, System.String, MS.IT.Ops.Hermes.DataTransferObjects.ProposalEntity, Boolean, Boolean, System.String, Boolean, Boolean)
040eef7c 0822900c DynamicClass.SyncInvokeGenerateCps(System.Object, System.Object[], System.Object[])
040eefb8 50b09ede System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(System.Object, System.Object[], System.Object[] ByRef)
040ef030 50aeaff3 System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(System.ServiceModel.Dispatcher.MessageRpc ByRef)
040ef07c 508ad430 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(System.ServiceModel.Dispatcher.MessageRpc ByRef)
040ef0bc 508ad286 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage4(System.ServiceModel.Dispatcher.MessageRpc ByRef)
040ef0e8 508ad1f3 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage3(System.ServiceModel.Dispatcher.MessageRpc ByRef)
040ef0f8 508ad039 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage2(System.ServiceModel.Dispatcher.MessageRpc ByRef)
040ef108 508ac281 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage1(System.ServiceModel.Dispatcher.MessageRpc ByRef)
040ef12c 508abfaf System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean)
040ef170 508ab8a4 System.ServiceModel.Dispatcher.ChannelHandler.DispatchAndReleasePump(System.ServiceModel.Channels.RequestContext, Boolean, System.ServiceModel.OperationContext)
040ef31c 508a9e22 System.ServiceModel.Dispatcher.ChannelHandler.HandleRequest(System.ServiceModel.Channels.RequestContext, System.ServiceModel.OperationContext)
040ef360 508a98ee System.ServiceModel.Dispatcher.ChannelHandler.AsyncMessagePump(System.IAsyncResult)
040ef374 508a9881 System.ServiceModel.Dispatcher.ChannelHandler.OnAsyncReceiveComplete(System.IAsyncResult)
040ef380 50212c98 System.ServiceModel.Diagnostics.Utility+AsyncThunk.UnhandledExceptionFrame(System.IAsyncResult)
040ef3ac 5089f211 System.ServiceModel.AsyncResult.Complete(Boolean)
040ef3e8 508a963c System.ServiceModel.Channels.FramingDuplexSessionChannel+TryReceiveAsyncResult.OnReceive(System.IAsyncResult)
040ef418 50212c98 System.ServiceModel.Diagnostics.Utility+AsyncThunk.UnhandledExceptionFrame(System.IAsyncResult)
040ef444 5089f211 System.ServiceModel.AsyncResult.Complete(Boolean)
040ef480 508a93e1 System.ServiceModel.Channels.SynchronizedMessageSource+ReceiveAsyncResult.OnReceiveComplete(System.Object)
040ef4ac 508a55fd System.ServiceModel.Channels.SessionConnectionReader.OnAsyncReadComplete(System.Object)
040ef4ec 508cffa2 System.ServiceModel.Channels.StreamConnection.OnRead(System.IAsyncResult)
040ef4f8 50212c98 System.ServiceModel.Diagnostics.Utility+AsyncThunk.UnhandledExceptionFrame(System.IAsyncResult)
040ef524 5089f211 System.ServiceModel.AsyncResult.Complete(Boolean)
040ef560 508cf689 System.ServiceModel.Channels.ConnectionStream+ReadAsyncResult.OnAsyncReadComplete(System.Object)
040ef58c 50901c29 System.ServiceModel.Channels.SocketConnection.FinishRead()
040ef594 50901afb System.ServiceModel.Channels.SocketConnection.AsyncReadCallback(Boolean, Int32, Int32)
040ef5d8 5089efe8 System.ServiceModel.Channels.OverlappedContext.CompleteCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
040ef5f0 50212d9f System.ServiceModel.Diagnostics.Utility+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
040ef624 793d6140 System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
040ef7c8 79e7be1b [GCFrame: 040ef7c8]

Reading the stack from the bottom up: WCF dispatches the GenerateCps operation, it flows through the DealService facade and business layers into GenerateCpsDAO.AssignProductInformation, and everything above that is SoftArtisans ExcelWriter (the obfuscated frames). And BINGO! The call was actually generating an Excel file with ExcelWriter, which is very expensive in terms of processor utilization. Later I talked with the application team, and they confirmed it.
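
Had I needed to dig further, SOS could have gone deeper from the same prompt; for example, !clrstack with the -p switch also dumps the parameters of each managed frame, which helps to see exactly which data the thread was chewing on:

0:009> !clrstack -p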