The Hazards of Relying upon Browser Quirks

While many web developers find subtle browser behaviors baffling, often browser developers are bewildered by web content. Yesterday, we ran into an interesting site compatibility problem that occurs in the latest internal version of IE9.

The site in question is a popular one that uses a Flash applet as a major component. Upon attempting to log in, the user sees an error message in the applet. The problem does not reproduce in IE8 or any other browser; it appears only in internal IE9 builds produced after the beta was released. Upon further examination, we uncovered the root cause of the problem.

The server is sending a bunch of headers that say “no matter what you do, don’t write this file to the cache”:

POST https://site/folder/xmlrpc/?method=authenticate HTTP/1.1

HTTP/1.0 200 OK
Date: Tue, 21 Sep 2010 23:02:54 GMT
Content-Type: text/xml
Content-Length: 2837
Cache-Control: no-cache, no-store, no-transform, must-revalidate, max-age=-1
Pragma: no-cache, no-store
Expires: -1
Connection: close

In particular, the Cache-Control and Pragma headers clearly communicate that the server does not want the file cached. (The negative value for the max-age directive is illegal, but the rest of the directives are fine.)
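To make the point about the illegal max-age concrete, here is a minimal Python sketch that flags a max-age value that is not a non-negative integer. The function name has_invalid_max_age is made up for illustration, and the parsing is deliberately simplistic (it does not handle quoted values).

```python
def has_invalid_max_age(cache_control):
    """Return True if a max-age directive is present but is not a
    non-negative integer number of seconds (hypothetical helper)."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            # "-1" fails this check because of the leading minus sign.
            return not (value.isascii() and value.isdigit())
    return False  # no max-age directive at all


header = "no-cache, no-store, no-transform, must-revalidate, max-age=-1"
print(has_invalid_max_age(header))
```

Run against the Cache-Control value from the trace, the check reports the directive as invalid; a value such as max-age=3600 passes.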

The problem is that the site's Flash applet appears to fail in IE if this file isn't cached (apparently a common complaint for Flash developers).

So, the immediate question is why did this ever work in IE8 and what broke in IE9?

After some investigation, it became clear that the site is deliberately structuring its cache headers to make it appear that the file would not be cached, while actually allowing IE to create a cache file. A bold claim, to be sure, but let me explain.

First, you'd expect that the no-store and no-cache directives would prevent caching of this HTTPS content. However, note that these are HTTP/1.1 cache control directives, and the response was delivered as an HTTP/1.0 response. Prior to our very latest IE9 builds, IE would only respect the max-age (and deprecated pre-check/post-check) tokens in a response delivered as HTTP/1.0. So, in this case, the no-cache and no-store directives were ignored.

Next, you'd expect that the pre-HTTP/1.1 Pragma header would prevent caching. However, IE only respects this header for HTTPS and only if the value is exactly and only no-cache. In this case, because the site has added an additional token to the header, IE ignores the Pragma directive.

Our latest internal IE9 builds have begun respecting all of the Cache-Control tokens regardless of the response's HTTP version, which led to the site breakage.
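The difference between the two behaviors can be modeled roughly as follows. This is a simplified Python sketch of the rules as described in this post, not IE's actual code. The function names are invented for illustration, and the sketch assumes that max-age only affects freshness rather than whether a cache file is written.

```python
def parse_tokens(header_value):
    """Split a comma-separated header value into lowercase tokens."""
    return [t.strip().lower() for t in header_value.split(",") if t.strip()]


def pragma_blocks_cache(pragma, is_https):
    # As described above: honored only over HTTPS, and only when the
    # header value is exactly "no-cache" with no other tokens.
    return is_https and pragma.strip().lower() == "no-cache"


def legacy_ie_may_cache(http_version, cache_control, pragma, is_https):
    """Model of the behavior before the latest internal IE9 builds."""
    tokens = parse_tokens(cache_control)
    if http_version == "1.0":
        # Only max-age and pre-check/post-check are looked at for HTTP/1.0.
        honored = ("max-age", "pre-check", "post-check")
        tokens = [t for t in tokens if t.split("=")[0] in honored]
    if "no-store" in tokens or "no-cache" in tokens:
        return False
    return not pragma_blocks_cache(pragma, is_https)


def updated_ie9_may_cache(http_version, cache_control, pragma, is_https):
    """Model of the new behavior: tokens count regardless of HTTP version."""
    tokens = parse_tokens(cache_control)
    if "no-store" in tokens or "no-cache" in tokens:
        return False
    return not pragma_blocks_cache(pragma, is_https)


cc = "no-cache, no-store, no-transform, must-revalidate, max-age=-1"
pragma = "no-cache, no-store"
print(legacy_ie_may_cache("1.0", cc, pragma, is_https=True))
print(updated_ie9_may_cache("1.0", cc, pragma, is_https=True))
```

With the headers from the trace, the legacy model allows a cache file and the updated model does not, which matches the breakage we observed.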

Now, could this site have just blindly happened upon these quirks? Possibly, but for one additional clue: the site in question sniffs for the Internet Explorer User-Agent string, and only if it finds IE does it change the response to be HTTP/1.0. For all other browsers, it returns an HTTP/1.1 response. It's obvious that the site owners are relying upon the quirks in legacy IE code to allow caching, despite sending headers that would appear to prevent it.
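For illustration, server-side sniffing of this kind might look something like the Python sketch below. We have not seen the site's server code, so this is purely hypothetical: the function name choose_status_line and the check for the "MSIE" token are assumptions.

```python
def choose_status_line(user_agent):
    """Pick the response status line based on the User-Agent string.

    Hypothetical: assumes IE is detected by the "MSIE" token.
    """
    version = "HTTP/1.0" if "MSIE" in user_agent else "HTTP/1.1"
    return f"{version} 200 OK"


ie8 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
other = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"
print(choose_status_line(ie8))
print(choose_status_line(other))
```

A response whose protocol version depends on the User-Agent, while the rest of the headers stay the same, is hard to explain as an accident.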

Clever readers might protest: "perhaps the site is trying to prevent caching on an intermediary while allowing caching in Internet Explorer?" That would make sense if any caching intermediary could "see" this response, but because it's delivered over HTTPS, that will not happen.

Our outreach team has contacted the site in question to request that they follow best practices for caching.

-Eric