Intrusion Detection as one cause for latency when browsing to a SharePoint site

Problem Details / Symptoms

I have been troubleshooting a SharePoint performance problem in which clients were waiting more than 75 seconds to retrieve and render an InfoPath browser form on a SharePoint site.

Interestingly, the first GET usually took the longest. That was not because the w3wp.exe process hadn't yet been spawned on the server, nor because ASP.NET hadn't yet finished compiling. After one successful GET-and-render of the page, subsequent attempts tended to be quicker.
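
One simple way to put numbers on the "first GET is slowest" pattern from a client is to time a few sequential requests. The Python sketch below is an illustration only: `time_gets` is a hypothetical helper, it times the HTTP transfer and not the browser's rendering of the InfoPath form, and it does not attempt Windows (NTLM/Kerberos) authentication. Against a site that requires authentication, each request therefore times the 401 challenge instead. That response still passes through the same client network stack, so it may be enough to expose a delay introduced on the client.

```python
import time
import urllib.error
import urllib.request


def time_gets(url, attempts=3, timeout=120):
    """Issue `attempts` sequential GETs to `url` and return the elapsed seconds for each."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()  # read the whole body so the transfer time is included
        except urllib.error.HTTPError as err:
            # An error status (e.g. a 401 authentication challenge) is still a
            # complete round trip through the client's network stack, so time it too.
            err.read()
            err.close()
        timings.append(time.perf_counter() - start)
    return timings
```

Running `time_gets(url, attempts=5)` once with the IPS component enabled and once with it disabled gives a crude side-by-side comparison. A first entry much larger than the rest would match the pattern described above.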

Verbose ULS logging showed that the SharePoint server seemed to think it was serving the pages pretty quickly.

The client was in one office building in one domain, and the SharePoint farm was in a datacenter that was part of another domain. A two-way, non-transitive trust was in place. So, of course, we took some network monitor traces to try to see where the latency really was. To my surprise, everything in the server-side traces (WFEs, DCs, SQL) looked good. All the traffic from the server to the client was flowing smoothly and quickly, without TCP resets or delays. The most interesting thing was in the client-side trace: a mysterious 50-second delay that appeared only on the client side, with no good reason for it visible in the trace.

This odd phenomenon reminded me to ask whether they were running any IDS/IPS (intrusion detection/intrusion prevention) application on their clients. Over the past ten years I have seen a few problems like this, in particular with BlackICE/ISS and with Symantec Endpoint Protection, but in the past four years I had seen none. These clients are all running Symantec Endpoint Protection (SEP).

IDS Policy Exclusions

The client didn't have the rights to disable Symantec Network Threat Protection (part of the Symantec Endpoint Protection [SEP] suite) on her Windows 7 workstation. But the latency problem improved as soon as the Security Administrator created a test group, added her workstation to the group, and pushed out to it an IPS exclusion policy that excluded the URL we were browsing to.

We knew that the URL we were browsing to was trustworthy, even though it pointed to a web server in a different (trusted) domain. The route between the two was also safe (no real risk of man-in-the-middle attacks, for example); it was all on a secure network. So they decided to add the host to the exclusion policy and push it to all the workstations. The latency problem improved drastically and immediately: with the exclusions, overall GET-and-render times went from 75 seconds to 12-20 seconds!

Disabling Network Threat Protection

But the 12-to-20-second range also proved unsatisfactory. The host exclusions obviously helped mitigate the problem for the clients, but the result still wasn't sufficiently "performant." One of the clients was able to disable Network Threat Protection (one part of the Symantec Endpoint Protection suite) on the client side, and the latency immediately improved from ~15 seconds to 4-to-6 seconds. Dramatic improvement again, and 4-6 seconds was acceptable. When Network Threat Protection was re-enabled, the latency returned. It seemed pretty clear that SEP was adding latency.

Just when we thought all was well, the plot thickened: the SharePoint servers were moved from one datacenter to another. The latency increased from ~5 seconds back to ~15 seconds, despite proper entries in the IDS exclusion policy. This time, disabling Network Threat Protection on test clients seemed to shave only one second off the render times.

Solutions and Conclusions

Ultimately, my recommendation to my customer was to work with Symantec support for further advice about DLL updates and optimization tweaks. They haven't done so yet, so I can't tell you how that went. Hopefully I will soon be able to update this blog post with the best answer.

I'm not trying here to blame SEP in order to promote Microsoft's SCEP (System Center Endpoint Protection - https://www.microsoft.com/en-us/server-cloud/system-center/endpoint-protection-2012.aspx). It's probably fair to say that SEP was doing truly valuable work while causing the delay; perhaps it was checking its rules and scanning for malware. SCEP might do the same thing; I don't know. I don't yet know exactly what SEP was doing, but I can give it the benefit of the doubt that it was delaying the traffic for what it considered a good reason.

More info:

Host Exclusion Policies

https://service1.symantec.com/SUPPORT/ent-security.nsf/2326c6a13572aeb788257363002b62aa/589bc3406761c16680257412003cd94a

Symantec Endpoint Protection Manager - Intrusion Prevention - Policies explained

Enable excluded hosts - Enables you to set up a list of hosts for which the client ignores all inbound and outbound traffic. The firewall and the IPS signatures do not scan these hosts for firewall rules, matching attack signatures, port scans, anti-MAC spoofing, or denial-of-service attacks. This option is disabled by default.
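
The quoted description boils down to a membership test: if the remote host is on the excluded list, the firewall and IPS signatures skip its traffic. The Python sketch below models only that decision and is not Symantec's implementation. It assumes the list is made of individual IP addresses and CIDR subnets, and the function names are hypothetical. Because it resolves a host name the way the client would, it can also serve as a quick sanity check whenever server addresses change.

```python
import ipaddress
import socket


def is_excluded(ip, excluded_entries):
    """Return True if `ip` falls inside any entry (a single address or a CIDR subnet)."""
    addr = ipaddress.ip_address(ip)
    for entry in excluded_entries:
        # strict=False lets "192.0.2.10" act as a single-host network and tolerates
        # host bits in entries such as "10.20.1.5/16". An IPv4 address never
        # matches an IPv6 network (and vice versa); the test just returns False.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


def addresses_not_excluded(hostname, excluded_entries):
    """Resolve `hostname` and list any of its addresses that the exclusion list misses."""
    resolved = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    return sorted(a for a in resolved if not is_excluded(a, excluded_entries))
```

With hypothetical values, `addresses_not_excluded("sharepoint.example.com", ["10.20.0.0/16"])` returns an empty list when every address the client resolves for that server is covered by the list.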

Removal of SEP

I pointed out to my customer that as long as the teefer.sys/teefer2.sys driver is loaded into kernel memory as a filter driver at boot, there is a chance that it could delay the HTTP traffic. I suggested that a purer test would be to uninstall SEP completely, reboot (if you don't reboot, the teefer driver remains in memory), and then test. They don't have that freedom, however.

So I suggested that they talk to Symantec support about alternatives to uninstalling the entire SEP suite. Perhaps, based on https://www.symantec.com/docs/TECH91038 ("How to manually uninstall the Symantec Endpoint Protection client from Windows Vista, Windows 7, and Windows 2008 R2 64-bit."), they could make some registry changes to ensure that the Symantec Teefer* driver(s) are not loaded in memory long enough to do some testing, and then easily reload the teefer driver. Or perhaps there is an even easier way to do it with FLTMC.exe. (Compare: How to use FLTMC.exe to load and unload the evfilter mini-filter driver for Enterprise Vault (EV) for File System Archiving (FSA) | Article:TECH54993 | Created: 2009-01-22 | Updated: 2011-05-10 | Article URL https://www.symantec.com/docs/TECH54993.) But I'm not comfortable giving advice on this here, so ask Symantec support.
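
Whatever approach Symantec support recommends, it helps to confirm from the client whether a teefer driver is actually running before and after each test. The Python sketch below only reads state and changes nothing. It runs the built-in Windows `driverquery /v /fo csv` command and filters its output. The column names "Module Name" and "State" and the value "Running" are assumptions based on English-locale output, the helper name is hypothetical, and the "teefer" prefix comes from the driver names above.

```python
import csv
import io
import subprocess


def running_drivers_matching(prefix, csv_text=None):
    """Return module names that start with `prefix` (case-insensitive) and report State 'Running'."""
    if csv_text is None:
        # Windows only: verbose driver list as CSV, one row per installed driver.
        csv_text = subprocess.run(
            ["driverquery", "/v", "/fo", "csv"],
            capture_output=True, text=True, check=True,
        ).stdout
    # strip() drops any leading blank line so the header row is read correctly.
    rows = csv.DictReader(io.StringIO(csv_text.strip()))
    return [
        row["Module Name"]
        for row in rows
        if row["Module Name"].lower().startswith(prefix.lower())
        and row.get("State") == "Running"
    ]
```

On a test client, `running_drivers_matching("teefer")` returning an empty list after the test reboot would suggest that the driver is actually out of the way for that round of testing.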