SP2010: 503s, HTTP Throttling, Threads Waiting


 

 

Symptoms

A web application in a SharePoint 2010 farm becomes unresponsive (hangs) from the client's perspective at random times, whether client load is high, medium, or even low.

Clients of the SharePoint sites see 503 errors and/or "server too busy" or "The server is busy now. Try again later." in their browsers.

Admins may see events in the Application event log that center on HTTP throttling. Examples:

Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date and Time]
Event ID:      8062
Task Category: Http Throttling
Level:         Critical
User:          FOO\SP_WebApps
Computer:      WFE002.win.foo.org
Description:
Http throttling on SharePoint - 443 stops because there is no heavy load detected now. 472 requests have been throttled during the throttling period.


Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation
Date:          [Date and Time]
Event ID:      8032
Task Category: Http Throttling
Level:         Critical
User:          FOO\SP_WebApps
Computer:      WFE002.win.foo.org
Description:
Http throttling starts because a heavy load was detected on SharePoint - 443. The excessive performance counters include: WFE002.win.foo.org
\ASP.NET\Requests Current: 634.04
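If you export a batch of these events and want the numbers without reading each one, something like the following may help. It is only a sketch: the function name `parse_throttle_event` is made up for illustration, and the regular expressions assume the description text is worded exactly as in the two sample events above.

```python
import re

def parse_throttle_event(description):
    """Pull the numbers out of an 8032/8062 event description (wording assumed
    to match the samples above; returns None for anything not found)."""
    counter = re.search(r"\\ASP\.NET\\Requests Current:\s*([\d.]+)", description)
    throttled = re.search(r"(\d+) requests have been throttled", description)
    return {
        "requests_current": float(counter.group(1)) if counter else None,
        "throttled_requests": int(throttled.group(1)) if throttled else None,
    }

# Description text copied from the two sample events above.
start_event = (
    "Http throttling starts because a heavy load was detected on SharePoint - 443. "
    "The excessive performance counters include: WFE002.win.foo.org\n"
    r"\ASP.NET\Requests Current: 634.04"
)
stop_event = (
    "Http throttling on SharePoint - 443 stops because there is no heavy load "
    "detected now. 472 requests have been throttled during the throttling period."
)

print(parse_throttle_event(start_event))
print(parse_throttle_event(stop_event))
```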

The HTTPERR logs (c:\windows\system32\logfiles\httperr\httperr*.log) show many "Connection_Dropped" entries followed by many "Connection_Abandoned_By_ReqQueue" entries for the application pool corresponding to the unresponsive web app. Examples:

[Date and Time] 172.30.122.196 58092 10.51.2.80 443 HTTP/1.1 GET /dist/5e/src/oruit/Team+Documents/Forms/AllItems.aspx - 1512381070 Connection_Dropped SharePoint+-+443
[Date and Time] 172.30.122.196 50060 10.51.2.80 443 HTTP/1.1 GET /nit/FOO/security/FSM/Documents/Forms/AllItems.aspx - 1512381070 Connection_Dropped SharePoint+-+443
[Date and Time] 10.110.77.146 64060 10.51.2.80 443 HTTP/1.1 HEAD /nit/FOO/OneNote/Open+Notebook.onetoc2 - 1512381070 Connection_Abandoned_By_ReqQueue SharePoint+-+443
[Date and Time] 172.30.122.196 59270 10.51.2.80 443 HTTP/1.1 GET /nit/FOO/ins/Lists/MXTraining/overview.aspx - 1512381070 Connection_Abandoned_By_ReqQueue SharePoint+-+443
[Date and Time] 172.30.122.196 61391 10.51.2.80 443 HTTP/1.1 HEAD /nit/FOO/Projects/Documents/BluePrint+Draft+2013.docx - 1512381070 Connection_Abandoned_By_ReqQueue SharePoint+-+443
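To see at a glance which application pool is dropping or abandoning connections, you can tally the last two fields of each HTTPERR line. This is a rough sketch, not a full parser: `tally_httperr` is a made-up name, it assumes the reason and the queue (app pool) name are the final two whitespace-separated fields as in the lines above, and the `DATE TIME` tokens and shortened URLs in the sample are placeholders.

```python
from collections import Counter

def tally_httperr(lines):
    """Count (queue name, reason) pairs, assuming they are the last two fields."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and '#' header lines
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        reason, queue = fields[-2], fields[-1]
        counts[(queue, reason)] += 1
    return counts

# Placeholder lines shaped like the HTTPERR entries above.
sample = [
    "DATE TIME 172.30.122.196 58092 10.51.2.80 443 HTTP/1.1 GET /a.aspx - 1512381070 Connection_Dropped SharePoint+-+443",
    "DATE TIME 172.30.122.196 50060 10.51.2.80 443 HTTP/1.1 GET /b.aspx - 1512381070 Connection_Dropped SharePoint+-+443",
    "DATE TIME 10.110.77.146 64060 10.51.2.80 443 HTTP/1.1 HEAD /c.onetoc2 - 1512381070 Connection_Abandoned_By_ReqQueue SharePoint+-+443",
]

for (queue, reason), n in tally_httperr(sample).most_common():
    print(queue, reason, n)
```

In practice you would pass it the lines of an open httperr*.log file instead of the sample list.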

A "hang dump" of the w3wp.exe process (made with either Task Manager's "Create user dump" option or with DebugDiag 1.2) shows a very high percentage of worker threads (example: 83.58% of threads blocked [499 threads]) waiting and "calling an ISAPI Extension OWSSVR." A DebugDiag 1.2 crash/hang analysis of such a hang dump may show, as an example, the following type of call stack:

 

The following threads in w3wp.exe__Application Pool (6.13.2011 6.01.13 PM)__PID__13912__Date__09_13_2013__Time_05_05_16PM__920__Manual Dump.dmp are calling an ISAPI Extension OWSSVR (C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\ISAPI\OWSSVR.DLL.)

( 15 16 17 18 19 20 21 22 39 45 47 55 56 57 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 207 208 209 210 212 213 )

77.21% of threads blocked (166 threads)

Entry point     msvcr90!endthreadex+64

Create time     [Date and Time]

Time spent in user mode     0 Days 00:00:06.140

Time spent in kernel mode     0 Days 00:00:01.484

This thread is calling an ISAPI Extension OWSSVR

Full Call Stack

Function

ntdll!ZwDelayExecution+a

KERNELBASE!SleepEx+ab

ONETUTIL!Vistream::error+191

ONETUTIL!VonetLocks::CReaderWriterLock3::_LockSpin+e3

ONETUTIL!CLKRLinearHashTable::_ReadOrWriteLock+55

ONETUTIL!CLKRLinearHashTable::_FindKey+36

ONETUTIL!CLKRHashTable::FindKey+88

STSWEL!VglobalAuditStore::ensureDatabaseQueue+d0

STSWEL!VglobalAuditStore::addAuditEntry+d8

STSWEL!auditSecurityScopeEvent+27d

STSWEL!VdocumentStore::httpGetDocument+36cf

STSWEL!VhttpManager::loadFileCore+7ef

STSWEL!VhttpManager::loadFile+f7

STSWEL!VhttpManager::handleNormalFetch+73f

STSWEL!VhttpManager::handleCommonFetch+37

STSWEL!VhttpManager::handlePOST+18

STSWEL!VhttpManager::dispatchHttpRequest+2d6

OWSSVR!MsoFAssertSzTagProcVar+6bdf1

OWSSVR!MsoFAssertSzTagProcVar+5957f

ONETUTIL!Vframework::doMain+a7

OWSSVR!MsoFAssertSzTagProcVar+856

ONETUTIL!COWSThreadWithHeap::WalkHeap+1dc

ONETUTIL!COWSThreadWithHeap::Uninitialize+66a

msvcr90!endthreadex+47

msvcr90!endthreadex+e8

kernel32!BaseThreadInitThunk+d

ntdll!RtlUserThreadStart+1d

 

All the other waiting ("blocked") threads will look identical to that one. Note especially the ONETUTIL!VonetLocks::CReaderWriterLock3::_LockSpin+e3 and STSWEL!VglobalAuditStore::addAuditEntry frames.
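If you have the call stacks as text, a quick check for those two frames could look like this. It is a minimal sketch: the names `THUMBPRINT`, `strip_offset` and `matches_thumbprint` are made up for illustration, and it assumes each frame is written as module!symbol+offset, the way DebugDiag prints it above.

```python
# The two frames called out above, without their +offset suffixes.
THUMBPRINT = (
    "ONETUTIL!VonetLocks::CReaderWriterLock3::_LockSpin",
    "STSWEL!VglobalAuditStore::addAuditEntry",
)

def strip_offset(frame):
    """'module!symbol+e3' -> 'module!symbol' (frames without '+' pass through)."""
    return frame.strip().rsplit("+", 1)[0]

def matches_thumbprint(stack_frames):
    """True if every thumbprint frame appears somewhere in the stack."""
    symbols = {strip_offset(f) for f in stack_frames if f.strip()}
    return all(sym in symbols for sym in THUMBPRINT)

# A few frames copied from the call stack above.
stack = [
    "ntdll!ZwDelayExecution+a",
    "KERNELBASE!SleepEx+ab",
    "ONETUTIL!VonetLocks::CReaderWriterLock3::_LockSpin+e3",
    "STSWEL!VglobalAuditStore::ensureDatabaseQueue+d0",
    "STSWEL!VglobalAuditStore::addAuditEntry+d8",
]

print(matches_thumbprint(stack))
```

A match on its own only says the stack looks like the one above; you would still want to see that most of the blocked threads share it.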

Third-party monitoring tools may flag "IIS: ASP Application restarts" and "IIS: ASP Request Execution Time."

 

Resolution

Consider applying SharePoint 2010's Service Pack 2 *AND* the August 2013 Cumulative Update (or higher). Link: http://technet.microsoft.com/en-us/sharepoint/ff800847.aspx

More Information

Raising the HTTP throttling threshold on the WFE with PowerShell (from the default of 500 to, say, 750) may be the first thing some admins try in order to minimize the symptoms, but it probably won't help in the end. Requests are queuing because the worker threads are all waiting, and a higher threshold does nothing to free those threads. The problem is not too many requests to process; it is the inability to process the requests.
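A back-of-the-envelope model makes the point. This is a toy sketch under stated assumptions, not a measurement: the function name `seconds_until_throttle` is made up, the arrival rate of 25 requests per second is hypothetical, and the completion rate is set to zero to stand in for "all worker threads are blocked."

```python
def seconds_until_throttle(threshold, arrivals_per_sec, completions_per_sec, current=0.0):
    """Seconds until the in-flight request count reaches the threshold,
    or None if requests complete at least as fast as they arrive."""
    net = arrivals_per_sec - completions_per_sec
    if net <= 0:
        return None
    return max(0.0, (threshold - current) / net)

# Hypothetical load: 25 requests/sec arriving, none completing (threads blocked).
for threshold in (500, 750):
    print(threshold, seconds_until_throttle(threshold, 25, 0))
# prints:
# 500 20.0
# 750 30.0
```

With those made-up numbers, going from 500 to 750 buys about ten more seconds before throttling starts. When requests complete as fast as they arrive, the function returns None, meaning the threshold is never reached at all.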


Comments (2)
  1. Ashley Steel says:

    Hi Christopher – are you able to provide any specific information as to why SP2 and Aug13 CU relieve these issues? We have a customer who is experiencing severe HTTP Throttling problems but we need to specifically prove what in the updates will fix it, as their patch and change management process is very vigorous.

  2. If they are doing auditing, seeing throttling and/or asp.net requests queuing, and are shy of the Aug 2013 cu for SP2010, they're encountering a "known issue" (pardon the euphemism) and that update is the most reasonable way to fix it.  If they are not doing out of the box auditing, then it may be a different issue and you may want to make a "hang dump" with debugdiag (per blogs.msdn.com/…/several-good-ways-to-trigger-a-hang-dump-of-an-unresponsive-process.aspx) and then have debugdiag's analysis script analyze it.    Or actually, try this if you need proof.  Wait for the throttling/hang to happen, create a hang dump of the process while it is hung, and then use debugdiag's analysis crash/hang analysis script to see if the thread call stacks match the "thumbprint" of what I left above.   Standard disclaimers for blog advice apply.
