HTTPS and Keep-Alive Connections

As we explore network performance on the “real-world web”, one bad pattern in particular keeps recurring, and it’s not something that our many IE9 Networking Performance Improvements alone will resolve.

The bad pattern is the use of Connection: close semantics for HTTPS connections: a website allows only a single request and response on each HTTPS connection before closing it.

Defeating HTTP/1.1’s default Keep-Alive behavior is a bad practice for regular HTTP connections, but it’s far worse for HTTPS connections because the initial setup costs of an HTTPS connection are far higher than those of a regular HTTP connection. Not only does the browser pay the performance penalty of setting up a new TCP/IP connection, including the handshake and initial congestion window sizing, but the request is also delayed by the time required to complete the HTTPS handshake used to secure the connection.

To do all of that work and then use it for only one HTTP request and response is a terrible waste of resources. It's like paving a highway, allowing a single car to drive down it, and then dynamiting the road after it passes. You then need to pave a new highway for the next car, only to blow it up in turn, and so on. This bad pattern can dramatically slow down the loading of the page and increase the load on your server.

Browsers have no choice but to close the connection when directed by the server; it would be a violation of the standard (and almost certainly wouldn’t work) to try to ignore the server’s directive to close the connection.
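To make the rule concrete, here is a minimal Python sketch of the decision a client makes after each response. It follows the HTTP/1.1 default (connections are persistent unless a "close" token appears in the Connection header) and the HTTP/1.0 convention (connections close unless "keep-alive" is requested). The function name and the dictionary-based header representation are my own illustrative choices, not any browser's actual code.

```python
def connection_is_reusable(http_version: str, headers: dict[str, str]) -> bool:
    """Return True if the client may send another request on this connection."""
    # Header names are case-insensitive, and the Connection header carries a
    # comma-separated list of tokens, so normalize both before checking.
    tokens = set()
    for name, value in headers.items():
        if name.lower() == "connection":
            tokens.update(token.strip().lower() for token in value.split(","))

    # An explicit "close" always wins: the connection ends after this response.
    if "close" in tokens:
        return False

    # HTTP/1.1 connections are persistent by default.
    if http_version == "HTTP/1.1":
        return True

    # HTTP/1.0 connections are only kept open when "keep-alive" is present.
    return "keep-alive" in tokens


if __name__ == "__main__":
    print(connection_is_reusable("HTTP/1.1", {}))
    print(connection_is_reusable("HTTP/1.1", {"Connection": "close"}))
    print(connection_is_reusable("HTTP/1.0", {"Connection": "Keep-Alive"}))
```

A server that sends Connection: close on every HTTPS response forces the second case every time, so every request pays the full connection setup cost.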

A while ago, I saw one site that was particularly bad in this regard. It was a shopping site that showed many product thumbnails on every page; the page included 200 thumbnail images, each delivered over HTTPS, and each from a server in Asia that closed the connection after every single response. Although browsers worked their hardest to load this page, opening multiple connections in parallel, each page on the site took several minutes to load. When I used Fiddler to simulate exactly the same site but with connection reuse allowed, the site’s pages loaded in about 15 seconds. I haven’t been back to that site since (they may not be in business any longer), but the problem can be seen even on “big” sites used by millions of people every day.

You can observe connection reuse with Fiddler’s Timeline tab. Right-click the tab, change the view to Server PortMap, and then look at how many connections are reused.

On a site that makes good use of Keep-Alive, you’ll see that many connections are reused for multiple requests.

[Image: Fiddler Timeline in Server PortMap view for a site that reuses connections]

This site has more than 6 lines (the connections-per-host limit) only because it “shards” its requests across a number of related domains.

In contrast, here’s a site which does not allow connection reuse:

[Image: Fiddler Timeline in Server PortMap view for a site that does not allow connection reuse]

The chart shows that each request is made on a new connection, and a small red X after each transfer indicates that the server is using the Connection: close pattern. Only a few connections are reused (near the bottom of the trace), as these are cross-domain requests to a server that is configured for better performance.

Overall, loading of the site is delayed due to the overhead in establishing HTTPS connections. The performance penalty will be even larger for clients that have longer round-trip times (e.g. on 3G connections, or more geographically distant users).
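To get a rough feel for why round-trip time matters so much here, the following Python sketch counts network round trips with and without connection reuse. It is a back-of-the-envelope model under assumptions of my own, not a measurement: one round trip for the TCP handshake, two for a full SSL/TLS handshake, one per request and response, and requests issued in "waves" across a fixed number of parallel connections. It ignores transfer time, congestion window growth, and server processing, so it will not reproduce the load times described above. The function name and default values are illustrative.

```python
import math


def estimated_round_trips(requests: int, parallel: int, reuse: bool,
                          tcp_rtts: int = 1, tls_rtts: int = 2) -> int:
    """Estimate round trips needed to fetch `requests` resources from one host.

    Assumes `parallel` connections run side by side, so requests complete in
    waves of `parallel` at a time, each request costing one round trip.
    """
    setup = tcp_rtts + tls_rtts              # cost of opening one HTTPS connection
    waves = math.ceil(requests / parallel)   # sequential batches of requests

    if reuse:
        # Each connection is set up once, then carries one request per wave.
        return setup + waves
    # Every request pays the full setup cost before it can be sent.
    return waves * (setup + 1)


if __name__ == "__main__":
    # 200 resources over 6 parallel connections, at two sample round-trip times.
    for rtt_seconds in (0.05, 0.3):
        for reuse in (True, False):
            trips = estimated_round_trips(200, 6, reuse)
            print(f"RTT {rtt_seconds:.2f}s, reuse={reuse}: "
                  f"{trips} round trips, about {trips * rtt_seconds:.1f}s")
```

Under these assumptions, 200 requests over 6 connections cost 37 round trips with reuse and 136 without, and that gap is multiplied directly by the client's round-trip time. Passing tls_rtts=1 approximates an abbreviated (resumed) handshake, which narrows the gap but does not close it.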

Our team did a bit of research into this bad pattern, and we found two common origins for it.

In the first, a thoughtful web developer or operations team reasons: “Hey, HTTPS connections are expensive to maintain on the server. Let’s be sure to tear those down as soon as possible to free up the server to accept new connections.” That, of course, completely misses the point that if the server wasn’t tearing down the connections, the server would be under significantly lighter load to begin with! Sites that were deliberately written with this bad pattern load slowly in all browsers.

We also found another root cause: ancient advice for the configuration of Apache+OpenSSL. Ancient, unpatched versions of IE prior to IE6 sometimes encountered connection failures when interacting with HTTPS servers when Keep-Alive was used. That problem was fixed nearly a decade ago, but outdated 1999-era configuration advice continues to harm performance for unaware server administrators:

https://www.modssl.org/docs/2.8/ssl_faq.html#ToC49
https://httpd.apache.org/docs/2.0/ssl/ssl_faq.html
SetEnvIf User-Agent ".*MSIE.*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0

https://www.faqs.org/docs/securing/chap29sec245.html
SSLOptions +ExportCertData +StrictRequire
SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
SetEnvIf Request_URI \.gif$ gif-image
CustomLog /var/log/httpd/ssl_request_log \
"%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b" env=!gif-image
</VirtualHost>

Four years ago, there was a public call to update the guidance to reflect the fact that users of more modern browsers were paying an unneeded performance penalty. Finally, in June 2010, the default guidance was changed in recognition of the fact that the problem never affected IE6 and later:

BrowserMatch ".*MSIE [1-5].*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0

Unfortunately, many major Apache installations still haven’t been updated with even this guidance. Also, alert readers will spot a very obvious problem with the “new” regular expression.

In the expression above, any IE version that starts with “1” will be treated as outdated and served slowly, without Keep-Alive. Internet Explorer 1.0 didn’t even support SSL at all (SSL was added in 2.0), but worse, this loosely-written regular expression will also match future MSIE 10.0, MSIE 11.0, MSIE 12.0 (etc.) user-agent strings. Hence, Apache hosts will one day find that the newest browsers are forced into the “slow” lane!

At the very least, Apache hosts should update their regular expression to this:

BrowserMatch ".*MSIE [2-5]\..*" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0

…but ultimately, they should probably remove this hack altogether. The ancient Internet Explorer 6’s marketshare is in decline, and there’s almost never any business reason to try to accommodate even older browsers.
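The difference between the two expressions above is easy to check with a quick sketch. The Python below uses the standard re module as a stand-in for Apache's matcher, which I'm assuming behaves the same way for these simple patterns (an unanchored, case-sensitive search of the User-Agent header). The user-agent strings are illustrative samples; the MSIE 10.0 one is a guess at what a future version might send.

```python
import re

# The June 2010 default guidance and the tightened version suggested above.
LOOSE_PATTERN = re.compile(r".*MSIE [1-5].*")
TIGHT_PATTERN = re.compile(r".*MSIE [2-5]\..*")

# Sample User-Agent strings (illustrative; MSIE 10.0 is hypothetical).
SAMPLE_AGENTS = {
    "IE 5.5": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "IE 6.0": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "IE 9.0": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "IE 10.0": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)",
}


def loses_keepalive(pattern: re.Pattern, user_agent: str) -> bool:
    """Return True if the pattern would apply nokeepalive to this browser."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for label, agent in SAMPLE_AGENTS.items():
        print(f"{label}: loose={loses_keepalive(LOOSE_PATTERN, agent)}, "
              f"tight={loses_keepalive(TIGHT_PATTERN, agent)}")
```

The loose pattern catches "MSIE 10.0" because "[1-5]" matches the leading "1"; requiring a literal dot right after a single digit in the 2-5 range is what keeps two-digit versions out of the slow lane.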

Thanks for your help in building a faster web!

-Eric Lawrence