IE8 Lookahead Downloader Fixed

Background
Last year, I wrote about two bugs in IE8’s Lookahead Downloader that would cause IE8 to make spurious download requests for non-existent URLs. These spurious download requests generally went unnoticed by users, because the main parser would eventually retrieve the correct resource when it was needed. However, for a small number of sites (where requesting non-existent URLs has side-effects), significant user-experience problems occurred when spurious requests were issued. For instance, on some sites, ASP.NET ViewState Corruption exceptions would result in the sites defensively logging the user out.

In October, we fixed one of the two bugs, correcting the URLs requested by the Lookahead Downloader when the markup contained a BASE tag.

After that fix, one more type of bug remained: a timing-related problem whereby the Lookahead Downloader would sometimes request a malformed URL consisting of the part of a URL preceding the 4096th byte[1] of the markup, combined with whatever text follows the 8192nd byte[1], up to the next quotation mark (sometimes called "the 4kb bug"). Our investigation determined that there were two scenarios that could lead to the 4kb bug:

  1. Parser Restart (occurs when the CHARSET is specified in a META tag)
  2. Parser Suspension (occurs for multiple reasons; a common one is when the document contains an XML Namespace declaration)

While web developers could easily avoid Scenario #1 (by specifying the CHARSET in the HTTP Content-Type response header), critically, Scenario #2 didn't have any easy, comprehensive workarounds.

Yesterday’s Fix
Yesterday’s IE8 Cumulative Update (KB980182) resolves the timing problems such that IE8’s Lookahead Downloader will no longer issue spurious requests.The Update resolves problems in Scenario #2 outright-- parser suspensions will no longer lead to problematic behavior. However, the Update kills the bug in Scenario #1 by disabling the Lookahead Downloader when a restart is encountered. Hence, we continue to strongly recommend that web developers specify the CHARSET in the HTTP Content-Type response header, as this ensures that the performance benefit of the Lookahead Downloader is realized. Even if a future version of IE addresses Scenario #1 more elegantly, there are other performance and security benefits to specifying the CHARSET using the HTTP header[2] for pages targeting any browser.

I’ve built a Meddler Script which demonstrates the Restart-related timing issue, but keep in mind that it shouldn’t do anything interesting in IE8 after the 3/30/2010 IE Cumulative Update is applied.

-Eric

[1] actual values varied, but were typically a multiple of 4kb
[2] Technically, using a Unicode BOM at the top of the document would also prevent the restart, but it doesn't confer the same security benefit.