First, a bit of background.
Some especially cutting-edge sites have tried to “help” browsers’ Lookahead Downloaders by using hidden IMG tags at the very top of their response to reference resources that will be needed later in the same page. The idea is that, by doing so, the browser will get resource requests out on the wire earlier, such that when the browser’s parser reaches the SCRIPT or LINK tag that needs the resource, the request for that resource will already be well underway. Of course, the whole point of the Lookahead is to get resource requests out earlier, but the technique of using IMG tags at the very top ensures that those URLs are very early in the HTTP response, possibly long before the client receives the markup containing the SCRIPT or LINK tags.
I love a good mystery, so I was excited when a website owner emailed me a Fiddler capture that showed Internet Explorer was downloading a resource on a page twice, despite the fact that the second download kicked off after the first one ended, and the first download had a proper caching header that would allow the resource to be re-used. What was going on here?
Fortunately, the capture was made with the X-Download-Initiator header enabled, so I was able to see why each request was made. The first download of the script was kicked off when the Lookahead reached an IMG tag. The second was kicked off when the parser reached a SCRIPT tag with the same SRC value. The URLs were identical, so why was the second request sent?
A further look showed that the client had actually aborted the first request, which explains why the second request was needed. But why was the first request aborted?
When IE encounters an IMG tag, it creates an image object and assigns the download request to it. As data arrives from the image download, it’s fed into the browser's image decoders. The decoders will reject data as malformed if you feed them plaintext, which seems reasonable, since they can't possibly make use of such data. When the decoders reject the data as "Not possibly an image," the image object will abort its processing. As a part of that abort, if the download has not yet completed, it too is aborted.
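To make the sequence above concrete, here is a toy model of a download that feeds an image decoder. It is only a sketch: the names looks_like_image and download_as_image are hypothetical, the signature list is a small sample of well-known image magic numbers, and the 8-byte sniffing threshold is my assumption, not a description of IE's actual decoder logic.

```python
from typing import Iterable, Tuple

# A few well-known image file signatures (magic numbers).
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF
    b"GIF89a",             # GIF
    b"\xff\xd8\xff",       # JPEG
    b"BM",                 # BMP
)


def looks_like_image(prefix: bytes) -> bool:
    """Return True if the data starts with a known image signature."""
    return any(prefix.startswith(sig) for sig in IMAGE_SIGNATURES)


def download_as_image(chunks: Iterable[bytes], total_length: int) -> Tuple[bool, int]:
    """Feed network chunks to a pretend decoder.

    Returns (completed, bytes_received). If the decoder rejects the data
    before the whole body has arrived, the download is abandoned and
    completed is False.
    """
    received = bytearray()
    for chunk in chunks:
        received += chunk
        # Assumption: the decoder decides once 8 bytes are available.
        if len(received) >= 8 and not looks_like_image(bytes(received[:8])):
            # The decoder says "not possibly an image". The abort only
            # hurts if there is still data left to download.
            return len(received) >= total_length, len(received)
    return True, len(received)


script = b"function f(){ return 1; }" * 100

# Delivered in pieces: the rejection arrives mid-download.
streamed = [script[i:i + 1000] for i in range(0, len(script), 1000)]
print(download_as_image(streamed, len(script)))

# Delivered in one shot: the body is complete before the rejection.
print(download_as_image([script], len(script)))
```

The second call illustrates why timing matters: if the entire body shows up before the decoder gets a chance to object, there is nothing left to abort.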
Aborting a download can be very bad for performance. Firstly, the client won’t get the resource that it asked for; only the part of the resource downloaded before the abort will be cached, and even that portion is cached only if the response was served with an ETag and a Content-Length. If those headers are present, the browser may be able to later download the remainder of the file using an HTTP Range request. Secondly, establishing TCP/IP connections (and possibly HTTPS handshaking on top of that) can be quite expensive, so throwing away perfectly good connections can measurably increase the load time of your page.
Script loaded at 7:15:26. Ready for connections.
0: GET /;
1: GET /796-SCRIPTPretendingToBeAImg.js;
2: GET /796-SCRIPTNOTPretendingToBeAImg.js;
1: Error: An established connection was aborted by the software in your host machine
3: GET /UseTheScript.htm;
4: GET /796-SCRIPTPretendingToBeAImg.js; Range: bytes=4237-; If-Range: "796";
The error line for session #1 shows where the Script-fetched-by-IMG download was aborted, closing the connection. Later, session #4 shows that, when navigating to the second page of the repro, the browser sends an HTTP request asking for a partial download of the remainder of the script file.
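The request in session #4 can be reconstructed with a few lines of code. This is a minimal sketch of how a client might decide whether to resume a partially cached response; build_resume_headers is a hypothetical helper of mine, not a browser API, and the total length of 10000 bytes in the example is made up for illustration.

```python
from typing import Dict, Optional


def build_resume_headers(
    cached_bytes: int,
    etag: Optional[str],
    content_length: Optional[int],
) -> Dict[str, str]:
    """Return the headers for resuming a partial download.

    An empty dict means "no resume possible, fetch the whole resource".
    """
    # Without an ETag and a Content-Length the partial body was never
    # cached, so there is nothing to resume.
    if etag is None or content_length is None:
        return {}
    # Nothing cached yet, or the body is already complete.
    if cached_bytes <= 0 or cached_bytes >= content_length:
        return {}
    return {
        # Ask for everything from the first missing byte onward.
        "Range": f"bytes={cached_bytes}-",
        # Only honor the range if the resource is unchanged.
        "If-Range": etag,
    }


# Values taken from session #4; the total length is an assumed figure.
print(build_resume_headers(4237, '"796"', 10000))
```

The If-Range header is what keeps this safe: if the ETag no longer matches, the server is expected to ignore the Range and send the full, current resource instead of a mismatched tail.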
If you use Fiddler to monitor this scenario, you’ll see that it doesn’t repro unless you enable Streaming Mode. That’s because, by default, Fiddler fully buffers each response and delivers it to the client in one shot, which means that the image object won’t have the chance to abort until the download has already been completed and cached. This reiterates the fact that using IMG for pre-fetch can succeed—but only under the best of network conditions.
The same abort behavior exists in IE6 through IE10 and in Firefox 12.0. Opera 11.61, Chrome 18, and Safari 5.1.5 do not appear to abort when invalid image content is downloaded.
Interestingly, Firefox does not appear to cache even the partial file; when it re-downloads the script, the request does not contain a Range header. That behavior might be explained if Firefox uses a separate cache for IMG requests vs. other tags (an architecture that I believe Mozilla used at some point). There is further evidence pointing in this direction: if you update the MeddlerScript so that the first request completes so quickly that the client has no chance to abort, Firefox still re-downloads the script file on the second page of the repro.
Script loaded at 7:31:39. Ready for connections.
0: GET /;
1: GET /143-SCRIPTPretendingToBeAImg.js;
2: GET /143-SCRIPTNOTPretendingToBeAImg.js;
3: GET /usethescript.htm;
4: GET /143-SCRIPTPretendingToBeAImg.js;
When the Web Developer who encountered this problem asked me for alternatives, my first thought was to try using IE’s startDownload method, which I hoped would accommodate their scenario, even though the method is limited to Same-Origin requests. Unfortunately, it turns out that startDownload isn’t a suitable replacement, because downloads initiated by this method are conducted with a no-cache flag, such that the cache is bypassed when making the request, and the response isn’t committed to the cache.
HTML5 proposes an explicit prefetch Link relation to allow clients to recognize resources for which pre-fetching may be beneficial. Internet Explorer 9 and 10 use these LINKs to perform DNS-prefetching; resources are not downloaded.
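For readers who haven't seen the relation, the markup is a LINK element whose rel attribute contains the prefetch token. Here is a small sketch of how a tool might recognize such hints in a page using Python's standard html.parser module; PrefetchLinkFinder and find_prefetch_urls are hypothetical names, and the sample markup and URLs are invented for illustration.

```python
from html.parser import HTMLParser
from typing import List


class PrefetchLinkFinder(HTMLParser):
    """Collect the href of every <link> whose rel includes 'prefetch'."""

    def __init__(self) -> None:
        super().__init__()
        self.prefetch_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel is a space-separated list of tokens, compared
        # case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "prefetch" in rel_tokens and href:
            self.prefetch_urls.append(href)


def find_prefetch_urls(markup: str) -> List[str]:
    finder = PrefetchLinkFinder()
    finder.feed(markup)
    finder.close()
    return finder.prefetch_urls


sample = (
    '<html><head>'
    '<link rel="prefetch" href="/needed-later.js">'
    '<link rel="stylesheet" href="/site.css">'
    '</head><body></body></html>'
)
print(find_prefetch_urls(sample))
```

What a client does with the URLs it finds is up to the client; as noted above, recognizing the hint does not imply that the resource itself is downloaded.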
PS: Using IMG tags to pre-fetch images is, of course, entirely fine… so long as you’re not trying to pre-fetch images from HTTP on a page delivered by HTTPS. Doing that will cause a Mixed-Content problem and your page’s lock icon will disappear.