Internet Explorer 8 impacts OWA Load Balancing Scenarios

Recently we’ve had several issues reported regarding Internet Explorer 8 and Outlook Web Access load balancing scenarios.

Here are the scenarios we’ve identified so far:

  • DNS Round robin load balancing when DNS TTL expires
  • DNS Round robin load balancing when outbound HTTP Proxy servers are used
  • SSL Session-ID persistence with a hardware load balancer

Each of these scenarios occurs because of the new “LCIE” or “Tab Process” model in Internet Explorer 8.  Essentially, Internet Explorer 8 can create new “tab processes” for new browser tabs or browser windows that are opened.  It doesn’t do this for every tab or window, but uses an internal algorithm to determine whether it should spin up a new process.

For a detailed description of the new architecture, see the following two posts:
IE8 and Loosely-Coupled IE (LCIE)
Opening a New Tab may launch a New Process with Internet Explorer 8.0

Due to the new architecture, certain things such as kept-alive connections and SSL sessions are not shared between tab processes.  This plays an important role in the scenarios we have found.

In advance: Yes, I know it’s not called an “OWA Server” – I’m calling it that to make this work for Exchange 2003 or Exchange 2007.

The design of forms-based authentication in Outlook Web Access is another contributing factor.  Forms-based authentication works by the OWA server storing a cookie in the client browser for authentication.  This cookie contains the encrypted credentials of the user, which the OWA server can translate into basic authentication credentials for IIS with each new request.  However, this cookie is encrypted with a key that is maintained (and rotated) in memory only on the OWA server that set the cookie.  For instance, if a client logs into Server A and receives a cookie from Server A, it cannot use that same cookie to authenticate to Server B because Server B does not have the key to decrypt the credentials.  If you do try to present a cookie from Server A to Server B, your browser will be redirected to the login page on Server B to provide valid authentication.
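To make the cookie behavior concrete, here is a minimal sketch in Python.  It is not how OWA is implemented: the `OwaServer` class and its method names are hypothetical, and an HMAC tag stands in for the real credential encryption.  The only point it illustrates is that a cookie tied to a key held in one server's memory is useless on another server.

```python
import hashlib
import hmac
import os


class OwaServer:
    """Toy stand-in for an OWA server using forms-based authentication."""

    def __init__(self, name):
        self.name = name
        # The key exists only in this server's memory and is never shared.
        self._key = os.urandom(32)

    def _tag(self, user):
        # Assumption: an HMAC tag replaces the real credential encryption.
        return hmac.new(self._key, user.encode(), hashlib.sha256).hexdigest()

    def issue_cookie(self, user):
        """Return the cookie handed to the browser after a successful login."""
        return f"{user}:{self._tag(user)}"

    def handle_request(self, cookie):
        """Accept the cookie if this server's key produced it, else redirect."""
        user, _, tag = cookie.partition(":")
        if hmac.compare_digest(tag, self._tag(user)):
            return f"200 OK ({user} on {self.name})"
        return f"302 redirect to login page on {self.name}"


server_a = OwaServer("Server A")
server_b = OwaServer("Server B")

cookie = server_a.issue_cookie("alice")
print(server_a.handle_request(cookie))  # the issuing server accepts the cookie
print(server_b.handle_request(cookie))  # a different server cannot validate it
```

Because each server generates its own random key, the second request falls through to the redirect, which is the login page the user sees in the scenarios below.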

With that background, here is a brief description of the behavior you may see in each scenario.

DNS Round robin load balancing when DNS TTL expires

If you use DNS round robin load balancing for Outlook Web Access, you may experience confusing client behavior when the DNS time to live value expires on the record.  The scenario usually manifests itself like this:

You are logged into OWA (with forms authentication) when the DNS TTL expires.  If you continue to work within the main OWA window, you should be able to continue to work and not be logged out.  This is because IE uses its kept-alive connections that are already established to the original OWA server.  However, if you open a number of email messages in new windows (double-click mail messages), a new tab process for iexplore.exe will be created.  This new tab process has to create a new TCP connection and, when the name is resolved by Winsock, gets the IP address of a different OWA server in the DNS round robin.  When this new window opens, you will be redirected to the forms authentication login page.  If you provide your credentials to the “new” server, you will in turn get logged out of the “main” OWA window that still has a persistent connection to the “old” server.  And the next time you spin up a new tab process, the experience starts over again.

So needless to say, it’s a confusing user experience and the only way to recover is to close all browser windows, restart IE and log back in to OWA.
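The sequence above can be sketched as a small Python simulation.  Everything in it is an assumption made for illustration: the class names, the two addresses, and the simplification that every fresh name resolution returns the next address in the rotation (which models the moment after the TTL has expired).  It is not a model of Winsock or of IE internals.

```python
import itertools


class RoundRobinDns:
    """Hands out server addresses in rotation, like a round robin record."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)

    def resolve(self):
        return next(self._cycle)


class TabProcess:
    """Each tab process keeps its own kept-alive connection (not shared)."""

    def __init__(self, dns):
        self._dns = dns
        self._connected_to = None

    def request(self):
        if self._connected_to is None:
            # New process: new TCP connection, so the name is resolved again.
            self._connected_to = self._dns.resolve()
        return self._connected_to


dns = RoundRobinDns(["10.0.0.1", "10.0.0.2"])  # hypothetical OWA servers

main_window = TabProcess(dns)
print(main_window.request())  # first connection from the main window
print(main_window.request())  # reuses the kept-alive connection

new_window = TabProcess(dns)  # message opened in a new tab process
print(new_window.request())   # fresh resolution, so a different server
```

The main window stays pinned to its original server for as long as its connection lives, while the new process lands on the other server and presents a cookie that server cannot decrypt.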

DNS Round robin load balancing when outbound HTTP Proxy servers are used

If you use DNS round robin load balancing for Outlook Web Access, you may be logged out of Outlook Web Access seemingly at random.  This is really due to the same behavior as above, except that instead of the name resolution happening on the client, it happens on the proxy server.  Since proxy sessions are not maintained across tab processes, new outbound connections will also be established from the proxy server.  Since many users may be accessing the same DNS name through the same proxy server, the TTL expiration may seem completely random.

SSL Session-ID persistence with a hardware load balancer

If you are using SSL session-id based persistence on your hardware load balancer, you may experience the same behavior as above, except in a more severe form.  In this situation, the new connections that are established by the new tab process are routed to a different OWA server by the load balancer.  This will happen every time a new tab process is spawned, not just at some arbitrary interval, making this by far the worst of the experiences.
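As a rough illustration, the sketch below models a load balancer that keys persistence on the SSL session ID.  The `SessionIdBalancer` class, the session ID strings, and the choice to hand unknown session IDs to servers in rotation are all assumptions; real load balancers pick the server for a new session in various ways.

```python
class SessionIdBalancer:
    """Toy load balancer that pins each SSL session ID to one server."""

    def __init__(self, servers):
        self._servers = servers
        self._table = {}  # SSL session ID -> server
        self._next = 0

    def route(self, ssl_session_id):
        if ssl_session_id not in self._table:
            # Unknown session ID: treated as a brand new client.
            server = self._servers[self._next % len(self._servers)]
            self._table[ssl_session_id] = server
            self._next += 1
        return self._table[ssl_session_id]


lb = SessionIdBalancer(["Server A", "Server B"])

# The main window's tab process keeps presenting the same SSL session.
first = lb.route("session-from-tab-process-1")
again = lb.route("session-from-tab-process-1")

# A new tab process does not share that SSL session, so it negotiates a new one.
other = lb.route("session-from-tab-process-2")

print(first, again, other)
```

Since SSL sessions are not shared between tab processes, every new process looks like a new client to the load balancer, regardless of any DNS TTL.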

Workarounds and Solutions

We’ll start with the easy one: SSL session-id persistence.  Because of issues (just like this) where the SSL session-id changes, SSL session-id based persistence is not (and has never been) a supported persistence method for load balancing Outlook Web Access.  In fact, there aren’t really any Exchange services for which we support or recommend SSL session-id based persistence.  (Edit: Obviously, for services that don’t require persistence at all, such as ActiveSync and certain Outlook Anywhere configurations, you can use SSL session-id based persistence to keep the SSL handshakes to a minimum.)  The supported persistence methods for Outlook Web Access are client-IP based or cookie-based persistence.  In certain situations where many clients are coming from the same IP (maybe a proxy server), cookie-based persistence is a much better solution than client-IP based persistence to get the load spread evenly.  Be aware that Outlook Web Access is the only Exchange web service that can use cookie-based persistence.
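The difference between the two supported methods when many clients share one IP can be sketched as follows.  This is a simplified model, not any vendor's algorithm: the server names, the `lb_server` cookie name, the proxy address, and the toy hash on the client IP are all made up for illustration.

```python
import itertools
from collections import Counter

SERVERS = ["Server A", "Server B", "Server C"]  # hypothetical OWA servers


def route_by_client_ip(client_ip):
    """Client-IP persistence: the same source IP always maps to one server."""
    # Toy deterministic hash: sum of the octets modulo the server count.
    return SERVERS[sum(int(octet) for octet in client_ip.split(".")) % len(SERVERS)]


class CookieBalancer:
    """Cookie persistence: the balancer records its choice in a cookie."""

    def __init__(self):
        self._rotation = itertools.cycle(SERVERS)

    def route(self, cookies):
        if cookies.get("lb_server") not in SERVERS:
            # First request from this browser: pick a server and remember it.
            cookies["lb_server"] = next(self._rotation)
        return cookies["lb_server"]


# Thirty users who all reach the load balancer through one outbound proxy.
proxy_ip = "192.0.2.10"
by_ip = Counter(route_by_client_ip(proxy_ip) for _ in range(30))

lb = CookieBalancer()
browsers = [{} for _ in range(30)]  # one cookie jar per browser
by_cookie = Counter(lb.route(jar) for jar in browsers)

print(by_ip)      # every user behind the proxy lands on the same server
print(by_cookie)  # users are spread across all servers
```

Cookies are shared across tab processes of the same browser session, so a new tab process presents the same cookie and stays on the same server, which is what makes this method hold up under the IE8 process model.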

For DNS round robin, the answer is a bit more complicated.  We’ve never really recommended DNS round robin for load balancing, but it technically is supported.  Typically we just outline the caveats, like the reliance on the client-side DNS resolver cache and the fact that it gives you zero redundancy in the case of a service failure.  However, even with all this guidance, many folks still see this as the poor man’s load balancer and implement it anyway.  If you are experiencing this issue, I strongly recommend you move to a real load balancing solution, whether this is Windows NLB, the load balancing built into ISA Server 2006+, or a hardware-based load balancer.  If you can’t or won’t, this is something you’ll have to live with for now.  You can work around this issue by setting the TabProcGrowth registry value for the IE browser to 1 (it’s mentioned in the second of the IE blog posts linked earlier).  However, this isn’t feasible in most situations because you have little control over the client base.

We are currently pursuing a design change request with the Internet Explorer team to see if we can make any of these experiences more consistent with the IE7 behavior.  However, since changes in this area can be very risky, we may have to live with this new behavior.  After all, it’s not a bug, it’s just an effect of the change in design for IE8.

Thanks to everyone for reading, and special thanks to John Towler for reporting this.