Recently we have been getting quite a few similar hangs in ASP.NET. From the first look they really aren’t all that similar but when we get down to the nitty-gritty we see that they all hang or rather have a very high latency, in the exact same location.
On startup, the first request for an aspx page, no matter how simple it is, will take around 2 minutes. In fact, all subsequent requests will also take 2 minutes to execute, so the server is essentially out of order.
All the servers in the web cluster behave the same way, and have done so since the servers were installed.
Debugging the problem:
They took two dumps on the server, during the hang and the only thread that seemed mildly interesting was this one…
0:009> kp 2000
0130e41c 7d4e091f ntdll!NtQueryDirectoryFile+0x12
0130e498 79e7a0b3 kernel32!FindNextFileW+0xaa
0130eb38 79e7a23c aspnet_isapi!CPerfCounterServer::CleanupNames+0x103
0130eb3c 79e79e58 aspnet_isapi!CPerfCounterServer::InitPerfPipe+0x5
0130eb44 79e79de2 aspnet_isapi!CPerfCounterServer::Initialize+0x37
0130eb50 79e7642b aspnet_isapi!CPerfCounterServer::StaticInitialize+0x39
0130eb58 5a3233f8 aspnet_isapi!GetExtensionVersion+0x11
0130ec84 5a322dc9 w3isapi!ISAPI_DLL::Load+0x219
0130eca8 5a3228ca w3isapi!ISAPI_DLL_MANAGER::GetIsapi+0x1cd
0130ecd4 5a3968ff w3isapi!ProcessIsapiRequest+0x96
0130ed08 5a3967e0 w3core!W3_ISAPI_HANDLER::IsapiDoWork+0x3fd
0130ed28 5a396764 w3core!W3_ISAPI_HANDLER::DoWork+0xb0
0130ed48 5a3966f4 w3core!W3_HANDLER::MainDoWork+0x16e
0130ed58 5a3966ae w3core!W3_CONTEXT::ExecuteCurrentHandler+0x53
0130ed74 5a3c0a52 w3core!W3_CONTEXT::ExecuteHandler+0x51
0130f01c 5a3c41d9 w3core!W3_CONTEXT::ExecuteChildRequest+0x29d
0130fb74 5a3c427a w3core!W3_STATIC_FILE_HANDLER::HandleDefaultLoad+0x5cc
0130fba8 5a3c4cf2 w3core!W3_STATIC_FILE_HANDLER::DirectoryDoWork+0x55
0130fbcc 5a3c53f8 w3core!W3_STATIC_FILE_HANDLER::RealDoWork+0x113
0130fe78 5a396764 w3core!W3_STATIC_FILE_HANDLER::DoWork+0x368
0130fe98 5a3966f4 w3core!W3_HANDLER::MainDoWork+0x16e
0130fea8 5a3966ae w3core!W3_CONTEXT::ExecuteCurrentHandler+0x53
0130fec4 5a396648 w3core!W3_CONTEXT::ExecuteHandler+0x51
0130feec 5a392264 w3core!W3_STATE_HANDLE_REQUEST::DoWork+0x9a
0130ff10 5a3965ea w3core!W3_MAIN_CONTEXT::DoWork+0xa6
0130ff2c 5a36169f w3core!W3_MAIN_CONTEXT::OnNewRequest+0x55
0130ff38 5a361650 w3dt!UL_NATIVE_REQUEST::DoStateProcess+0x48
0130ff48 5a3616ca w3dt!UL_NATIVE_REQUEST::DoWork+0x7f
0130ff5c 5a3024de w3dt!OverlappedCompletionRoutine+0x1a
0130ff8c 5a3026bc w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x73
0130ffa0 5a301db9 w3tp!THREAD_POOL_DATA::ThreadPoolThread+0x24
0130ffb8 7d4dfff1 w3tp!THREAD_MANAGER::ThreadManagerThread+0x39
0130ffec 00000000 kernel32!BaseThreadStart+0x34
How did i figure out that this one was interesting? Check out “Things to ignore when debugging an ASP.NET Hang – Update for .NET 2.0”.
Ok, so interesting hmm… well depends on who you ask:)
This thread is processing an ISAPI request (for example an ASP.NET request since all ASP.NET requests go through the aspnet_isapi.dll), then it is loading an ISAPI dll, which just happens to be our friend aspnet_isapi.dll
For this one to load it apparently has to initialize some performance counters, and then it goes into some strange function “kernel32!FindNextFileW” and sticks around there…
It is in the FindNextFileW and NtQueryDirectoryFile the similarities exists between the similar hang issues we’ve had.
These functions make a transition from user mode (our process space) to kernel mode, and when you take a process dump with adplus or debug diag you will only get the user mode stacks and information, so the buck stops here…
Since we know that we are blocking in kernel mode we can make a few assumptions and work on those unless we want to go the whole way and take a kernel dump which is a bit of a bother, both to gather and to debug… We can assume that we are blocking on some kind of driver in kernel mode, and since we are in NtQueryDirectoryFile, the suspects are drivers for removable storage or antivirus software.
Before I continue I want to just say that before I would ever suggest doing anything to remove antivirus software, I would strongly advice to take backup measures to protect the machine from virus attacks. To test the assumption of an antivirus scanner intercepting the call, most people will just exclude the directory from virus scanning or disable the antivirus software. However… experience has shown that this is not always enough since there are still kernel drivers present, so a full removal of the antivirus software is needed in order to make sure that av software is not intercepting the calls.
We have had quite a few issues where av software has gotten stuck because of very outdated signatures, and in this particular case it happened to be antivirus software getting stuck because they were running a version of the antivirus software that was not designed for use with 64-bit operating systems. Once they installed the version of the av software that had a 64-bit engine it ran like a charm.
So the moral of the story is: keep your av software up to date and make sure you use the one that ia appropriate for your target operating system:)