IE7 Networking improvements in content caching and decompression


Hello! I am Venkat Kudallur, development lead for Networking in Internet Explorer. We have made several networking improvements in Internet Explorer 7, and in this post I would like to introduce you to some of the improvements in content caching and decompression, two features that play a key role in speeding up the delivery of pages from a remote web server. If you're a webmaster, a developer using the IE networking APIs, or just curious about IE networking, I think you'll find these details interesting.

Content caching eliminates a round trip to the server (or reduces traffic through conditional GETs), and compression effectively increases throughput by shrinking the data on the wire. Compression (through standard algorithms such as gzip) plays a role in the premium 'speed up' services that ISPs such as MSN, AOL, and NetZero offer for dial-up and broadband connections. Most of these services use dedicated servers and a combination of standard and proprietary compression algorithms, and/or tune TCP/IP parameters on the machine to speed up data transfer. Compression is likely a key part of the perceived speed-up, because most web content compresses well: HTML (for example, from ASP pages) typically compresses about 2x (two-fold), JavaScript (.js) files 2-4x, and CSS style sheets 2-5x. Proprietary algorithms are typically used for other media content, which these IE changes don't impact.
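
As a rough, standalone illustration of those ratios (this is a sketch using the zlib library, not IE's implementation; the function name and buffer handling are mine), you can gzip-compress a text buffer and measure the result:

```cpp
// Sketch: measure the gzip compression ratio of a buffer with zlib.
// Illustrative only -- not IE's code.
#include <zlib.h>
#include <string>
#include <vector>

double GzipRatio(const std::string& input) {
    z_stream zs = {};
    // windowBits = 15 + 16 selects the gzip container instead of raw zlib.
    if (deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED, 15 + 16,
                     8, Z_DEFAULT_STRATEGY) != Z_OK)
        return 0.0;

    std::vector<unsigned char> out(deflateBound(&zs, (uLong)input.size()));
    zs.next_in   = (Bytef*)input.data();
    zs.avail_in  = (uInt)input.size();
    zs.next_out  = out.data();
    zs.avail_out = (uInt)out.size();
    deflate(&zs, Z_FINISH);              // single shot: whole input is in memory
    double ratio = (double)input.size() / (double)zs.total_out;
    deflateEnd(&zs);
    return ratio;                        // HTML typically lands around 2x
}
```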

A quick introduction to the key IE networking modules is called for:

  • WinInet.dll offers a Win32 API for HTTP, HTTPS, and FTP downloads, along with APIs for caching and parsing. It's a very popular binary: in addition to being part of the IE platform, it is widely used in Windows client applications for its networking services. 
  • UrlMon.dll is a utility layer that wraps and generalizes the WinInet API into a more generic, extensible pluggable-protocol layer. It provides a COM interface to the HTTP Win32 API offered by WinInet, and has COM-based support for incorporating other protocol implementations into the IE stack. Several third-party download managers use this mechanism to tap into IE's download path and pick off certain types of content (such as binaries) to be downloaded within the manager. 

The key takeaway is that the bulk of the HTTP implementation, including caching, lies within WinInet, while UrlMon provides a COM wrapper around it and allows extension and filtering. 
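
To make the division of labor concrete, here is a minimal sketch of the Win32 download path in WinInet that UrlMon wraps (error handling omitted for brevity; this is illustrative, not production code):

```cpp
// Sketch: the WinInet Win32 API surface that UrlMon layers COM over.
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")

void Fetch(const wchar_t* url) {
    HINTERNET hSession = InternetOpenW(L"Example/1.0",
        INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    // InternetOpenUrl drives the whole request, including the cache lookup
    // WinInet performs before deciding whether to touch the network.
    HINTERNET hUrl = InternetOpenUrlW(hSession, url, NULL, 0, 0, 0);

    char buf[4096];
    DWORD read = 0;
    while (InternetReadFile(hUrl, buf, sizeof(buf), &read) && read > 0) {
        // Consume 'read' bytes of body data here. By default, the WinInet
        // API hands back the bytes as delivered (possibly still compressed).
    }
    InternetCloseHandle(hUrl);
    InternetCloseHandle(hSession);
}
```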

Prior to IE7, decompression happened in the UrlMon layer as a pluggable component. IE's gzip and deflate decompression was exposed through COM and generically plugged in by the UrlMon implementation to operate on the compressed data stream exposed by the WinInet Win32 API. The model was attractive because any new decompression format could be plugged in as a COM implementation and registered with UrlMon for use on the compressed data stream. In practice, there were conditions under which this logical separation of decompression from the download complicated the model. For IE7, we have moved decompression to sit logically above the download implementation within WinInet (a conceptual sketch of this layering follows the list below). This approach gives us several benefits:

  • It reduces a round of file system read/writes.
  • It avoids double parsing of caching directives.
  • It centralizes and makes consistent caching decisions and timing considerations for compressed and decompressed content.
  • It removes the need for COM-related synchronization in the default compression scenarios.
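
Here is the conceptual sketch of the new layering promised above: decompression sitting directly over the arriving byte stream, ahead of any cache writes. It uses zlib's streaming inflate and is purely illustrative; the class and method names are mine, and this is not WinInet source.

```cpp
// Sketch: streaming decompression layered above a download read loop.
#include <zlib.h>

class InflateLayer {
    z_stream zs_ = {};
public:
    InflateLayer() {
        // windowBits = 15 + 32: auto-detect gzip or zlib-wrapped deflate.
        inflateInit2(&zs_, 15 + 32);
    }
    ~InflateLayer() { inflateEnd(&zs_); }

    // Decompress one downloaded chunk into 'out'; returns bytes produced.
    // Simplified: assumes 'out' is large enough for this chunk's output.
    size_t Push(const unsigned char* in, size_t inLen,
                unsigned char* out, size_t outLen) {
        zs_.next_in   = const_cast<Bytef*>(in);
        zs_.avail_in  = (uInt)inLen;
        zs_.next_out  = out;
        zs_.avail_out = (uInt)outLen;
        inflate(&zs_, Z_NO_FLUSH);     // Z_STREAM_END signals the final chunk
        return outLen - zs_.avail_out; // these bytes go on to the cache file
    }
};
```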

I expect these changes to fix a set of issues commonly seen in IE and IE-hosted applications when compression is used, particularly where there is a dependence on the cache file used to store the content on the browsing machine. Developers consuming the UrlMon and WinInet APIs need not be concerned about any change in API behavior resulting from this work in IE7: the UrlMon API continues to decompress compressed data transparently, and the WinInet API, by default, returns compressed data as in prior versions.

WinInet.dll is responsible for a cache that is loaded and synchronized across all the processes and services using it. In addition to serving as a cache for various types of content downloaded by WinInet, it's also exercised through the WinInet caching API, which provides a URL-based index for storage and retrieval. Its popularity, however, brings with it the downside that any instability (e.g. corruption of the index from a sudden reboot in the middle of a write-through operation) impacts all the processes that use it. We have significantly rewritten the WinInet cache index manager in IE7 to ensure that it can gracefully recover from corruption or from a failure to grow the memory mapping of the index file. In addition, we have improved the caching heuristics, extensively scrubbed the API for parameter validation, and now handle Internationalized Resource Identifiers (IRIs) more consistently in the API. I expect huge stability and functionality gains from the caching changes made in this release.
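
As a concrete example of that URL-based index, the public caching API lets any WinInet client look up the cache entry backing a URL. A minimal sketch (error handling trimmed; the function name is mine):

```cpp
// Sketch: query the WinInet cache index for a URL's entry.
#include <windows.h>
#include <wininet.h>
#include <cstdio>
#include <vector>

void DumpCacheEntry(const char* url) {
    DWORD cb = 0;
    // First call fails with ERROR_INSUFFICIENT_BUFFER and reports the size.
    GetUrlCacheEntryInfoA(url, NULL, &cb);
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        return;  // not in the cache index

    std::vector<BYTE> buf(cb);
    INTERNET_CACHE_ENTRY_INFOA* info =
        reinterpret_cast<INTERNET_CACHE_ENTRY_INFOA*>(buf.data());
    if (GetUrlCacheEntryInfoA(url, info, &cb))
        printf("cache file: %s (%lu bytes)\n",
               info->lpszLocalFileName, info->dwSizeLow);
}
```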

To read more on the impact of caching and compression on HTTP performance, check out this article by Eric Lawrence, IE Networking Program Manager.  I welcome your feedback and suggestions for IE Networking features or for topics you would like us to blog about.

 – Venkat

Comments (34)

  1. PatriotB says:

    Thanks for the excellent posting! It’s amazing all the under-the-hood stuff that is going on for IE7. Keep up the good work!

  2. Anonymous says:

    Hey Venkat, it’s great to hear this terrific news. IE7 sounds more and more promising with every post to this blog!

  3. Anonymous says:

    Hey you guys! You release a cool new product and "I", a Microsoft consultant in Los Angeles, need to blog about it first?

    http://spaces.msn.com/members/bhandler/Blog/cns!1pt1v0Q4vD8jSvNS4lqdAuug!507.entry

    Silly Developers 🙂

  4. Anonymous says:

    Blake: Thanks for the mention of the developer toolbar. We mentioned it was about to be released back in September (http://blogs.msdn.com/ie/archive/2005/09/13/465338.aspx) but we haven’t published the full post about the toolbar (still in beta) just yet. Stay tuned. 🙂

  5. Anonymous says:

    Oops. We actually did publish one post about it on 16 Sept: https://blogs.msdn.com/ie/archive/2005/09/16/469686.aspx

  6. Anonymous says:

    Does this mean that gzip encoding will actually work in IE7?

    That would be cool (and it's about time).

  7. Anonymous says:

    > I expect that these changes fix a set of issues commonly seen in IE

    Does this mean that the bug with compressed content not sending the If-None-Match header (see http://jpdeckers.blogspot.com/2005/05/ie-still-broken-with-gzipped-content.html) in the request has been fixed? That would be very nice!

  8. Anonymous says:

    Hey gang, if anybody is listening...

    So we use gzip compression all the time, especially when sending down XML data. One of the things I've noticed with IE is that it decompresses the file in the cache. What worries me about that, you say? Well, is mshtml or msxml reading the compressed bytes or the uncompressed bytes? Seems to me it should be reading the compressed bytes, as that would be much faster...? Can anyone tell me which is the case...

  9. Anonymous says:

    <<seems to me it should be reading the compressed bytes as that would be much faster>>

    Hello Sean,

    Whenever the HTTP data is decompressed into the IE cache, all consumers of the data in IE read the decompressed data. There are two good reasons for this approach.

    1. It is the fastest approach overall. Compressing the data enhances the effective download rate, but there is a cost to decompressing it. Decompressing the data once amortizes that cost across all consumers of the data (versus decompressing on demand each time).

    2. Another good reason is to leave the compression as a transport detail. Most consumers of the data (especially with the IMoniker model – http://msdn.microsoft.com/library/default.asp?url=/library/en-us/com/html/17f4c1df-7a9c-42ef-a888-70cd8d85f070.asp) don’t want to know or care where the data came from or how it was delivered – as far as they are concerned, it could just as easily be streamed from a file on disk or a stream in memory.

  10. Anonymous says:

    Hello Sean,

    What do you think about

    <link rel="prefetch" href="foo.html" title="foo" />

    I know it's not standard, but it's very effective on low-bandwidth connections.

  11. Anonymous says:

    <<Whenever the HTTP data is decompressed into the IE cache, all consumers of the data in IE read the decompressed data.>>

    Venkat,

    I think he’s suggesting you pipe data from the decompressor straight into MSXML rather than decompress to disk and then read back in. (Although disk caching might even that out.)

  12. Anonymous says:

    <<pipe data from the decompressor straight into MSXML rather than decompress to disk>>

    Thanks for clarifying the question, Rup.

    The benefits of disk caching (where caching semantics permit) typically outweigh the CPU-cycle cost and the small delay of disk I/O for the decompressed data. The biggest advantage of having the data on disk is to provide seekable access where required. Several components in the IE ecosystem require lazy and/or random access to the downloaded data. In those cases, it's most efficient if the data is available on disk rather than buffered in memory or re-downloaded.

  13. Anonymous says:

    <<Does this mean that the bug with compressed content not sending the If-None-Match header (see http://jpdeckers.blogspot.com/2005/05/ie-still-broken-with-gzipped-content.html) in the request has been fixed ?>>

    Yes, it is :).
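
    For context, the conditional revalidation in question looks like this on the wire (the ETag value here is made up); prior to the fix, IE omitted the If-None-Match request header whenever the cached copy had arrived compressed:

    ```
    GET /script.js HTTP/1.1
    Host: example.com
    Accept-Encoding: gzip, deflate
    If-None-Match: "abc123"

    HTTP/1.1 304 Not Modified
    ETag: "abc123"
    ```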

  14. Mike Dimmick says:

    The main issue I hope you’ve fixed is the cache indexes getting out of sync with the cache contents. This causes all manner of odd behaviour in IE – which can of course be cleaned up by emptying the cache, but that forfeits the benefits of caching in the first place.

    I recall someone (was it Jeff Davis?) posting a suggested fix for one of the numerous problems where content that had just been downloaded disappeared from the cache, causing either failed rendering (missing images, incorrect styles due to a missing stylesheet) or the inability to save an image in its original format using Save Picture As from the context menu. The fix was apparently to reduce the size of your cache to 60MB or less, because there was a limited number of cache index entries and these were exhausted long before the actual cache capacity was reached. Has this been fixed?

    I also hope that, should a power failure occur during a cache management operation, you will no longer invalidate and dump the entire cache! Basically I think I’m saying that the cache should be treated as valuable, since it represents saved bandwidth and time. Cookies are even more valuable (IMO, I know anti-spyware tools don’t agree) as they represent saved state data.

  15. Anonymous says:

    << there was a limited number of cache index entries and these were exhausted long before the actual cache capacity was reached. >>

    Mike– Yes, this is fixed in IE7 and in a monthly rollup for IE6 in (I believe) August. We actually ended up rewriting a huge amount of the caching code to make it much more robust and reliable.

    With regard to "Save Picture As"– we’ve actually identified several specific scenarios that can cause this feature to offer only "Bitmap" format. The August fix addressed one of those, and it was the only one that could be resolved by having the user clear his or her cache and refresh the page.

  16. Anonymous says:

    I’m not sure if this is the correct thread to ask in, so apologies if it is not, but could I ask if the image caching problem when using CSS pseudo-classes has been fixed?

    For example, when using the :hover pseudo-class to change an image background, IE will request the image every time a hover occurs. An example bit of code can simply be:

    a:hover {
        background: url(picture.jpg);
    }

    … and checking the access logs, you can see the picture being requested and downloaded for every hover that takes place.

    It can also cause flickering when this is used for web site navigation. For example, when the mouse moves over a link, the old image disappears instantly while the hover image is still waiting to download (again and again).

    If this has been fixed, then that’s fantastic, but could I also put forward the suggestion of pre-downloading images used within a CSS file, so that the initial flickering is less likely to be seen.

    Great post, and great blog by the way. I, along with so many others it seems, really do appreciate the openness and information being presented.

  17. Anonymous says:

    Do you have any plans to address the fact that IE has one idea about what "deflate" means while most other browsers have another opinion? As far as I understand, it is only a couple of header bytes that differ, but the net result is that no one can safely use "deflate".
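
    If it helps, my understanding is that HTTP/1.1 defines "deflate" as the zlib-wrapped format of RFC 1950, while some implementations send raw RFC 1951 deflate instead; in zlib terms, the difference comes down to the sign of windowBits (sketch only):

    ```cpp
    z_stream zs = {};
    // zlib-wrapped deflate (RFC 1950), which is what HTTP/1.1 specifies:
    inflateInit2(&zs, 15);
    // Raw deflate (RFC 1951), which some implementations send instead,
    // would instead need: inflateInit2(&zs, -15);
    ```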

  18. Anonymous says:

    <<about <link rel="prefetch" href="foo.html" title="foo" />>>

    Hello Jens,

    Assuming this question was intended for me:

    Where web pages find predictive download useful, I’ve seen them use script or the download behavior to fetch files ahead of time. Improvement in perceived performance depends primarily on the accuracy of the prediction, and on the caching headers sent by the server to ensure the content stays around until it’s needed.

    Prefetching through HTML markup or META tags did not make it into IE7, but we do have it on the radar for the subsequent release.

  19. Anonymous says:

    To the networking team: any ideas about whether you will finally implement HTTP/1.1 pipelining properly?

  20. Anonymous says:

    ChrisH– There’s a very specific timing issue in play with IE’s download scheduler which is likely causing problems for your scenario. Your best bet is to ensure that the image has proper caching headers (see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwebgen/html/ie_introfiddler2.asp?frame=true).
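
    For illustration, response headers along these lines (values are examples only) let IE reuse the image from its cache instead of hitting the server on every hover:

    ```
    HTTP/1.1 200 OK
    Content-Type: image/jpeg
    Cache-Control: max-age=86400
    Expires: Thu, 15 Dec 2005 12:00:00 GMT
    ```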

    If you’d like, you can trigger a prefetch of the CSS image by dynamically generating an IMG tag via script and setting its source to the image to be prefetched.

    jbg– I’ve not seen any indications that DEFLATE is unsafe; in fact, several popular HTTP compression hardware devices deliver deflated content that works in both IE and Firefox.

    I believe IE and IIS both match the RFC specification for deflate. I’d love to learn more about any failing scenarios. Do you have a URL that reproduces the problem?

    Ian– We have not implemented pipelining (either properly or improperly) in the IE7 WinINET HTTP stack.

    While pipelining can offer significant performance improvements when the end-to-end network path correctly implements pipelining, it fails when the server or intermediary proxies do not support pipelining. We expect to take another look in the IE8 timeframe.

  21. Anonymous says:

    With regard to pipelining, I’ve had it enabled in Firefox for a long time, and the last time I saw a problem related to pipelining was also a long time ago. Which is not really surprising, because even the laziest web host owners are at some point forced to upgrade their software due to customer demand and to avoid falling victim to hackers.

    I think the situation is similar to SSLv2, and browsers can by now safely start implementing (and using) pipelining. I’d say it’s quite an improvement to the ‘perceived speed up’.

    ~Grauw

  22. Anonymous says:

    So, um, the untitled.bmp bug is not yet fully resolved? I have seen it this month using an up-to-date IE6 on XP SP2 with certain images…

    I dare to call it the worst bug in Internet Explorer.

    I’ve linked to a few images that don’t seem to work (one of them appears to work now) and listed some other bugs here:

    http://www.livejournal.com/users/tmaster/32935.html

    Look for item D.

    And thanks for all the improvements, I’m waiting for IE7! Take your time, though 😉

  23. Anonymous says:

    Good list of bugs there, hope someone on the team looks at it. On my system (with all available updates) the second and third images seem to save as jpegs but the others do still come up as bitmaps.

  24. Anonymous says:

    Thanks, frandom.

    I think the second and third might have been fixed by the August updates. I think I’ll be replacing those links when I find more images. If I inserted all the images that appear to be broken, I could replace every word in the entry with a link.

  25. Anonymous says:

    Thanks for the reply re: pipelining, Eric Law.

    Quote:"With regard to pipelining, I’ve got it enabled in Firefox for a long time and the last time I saw a problem with regard to pipelining was also a long time ago."

    There are still problems lurking about; Opera has had to implement heuristics to enable/disable it, and has found that some recent load-balancing proxies don’t handle pipelining properly. Still, adoption by IE would certainly add pressure on proxies/servers etc. to finally work properly with HTTP/1.1 (all they have to do is return responses in the order the requests were made).

  26. Anonymous says:

    Any word on how this decompression will be exposed in WinInet? Any docs for this yet?

  27. Anonymous says:

    Documentation will be published on MSDN around December.

    Essentially, you’ll only need to InternetSetOption an additional flag on your hInternet, and WinINET will automatically decompress the body and remove the Content-Encoding header.
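
    Concretely, the flag in question is INTERNET_OPTION_HTTP_DECODING. A minimal sketch of the opt-in (the helper function and error handling are illustrative):

    ```cpp
    // Sketch: opt a request handle in to WinINET's automatic decompression.
    #include <windows.h>
    #include <wininet.h>

    void EnableAutoDecoding(HINTERNET hRequest) {
        BOOL enable = TRUE;
        InternetSetOptionW(hRequest, INTERNET_OPTION_HTTP_DECODING,
                           &enable, sizeof(enable));
        // The caller still advertises what it accepts; WinINET then inflates
        // the body and strips the Content-Encoding header on the way back.
        HttpAddRequestHeadersW(hRequest, L"Accept-Encoding: gzip, deflate\r\n",
                               (DWORD)-1L, HTTP_ADDREQ_FLAG_ADD);
    }
    ```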

  28. Anonymous says:

    Thanks Eric,

    Although, could I ask anybody here how I might go about setting the HTTP headers for images on a web server running Apache that I can’t control (as my web site is virtually hosted)?

    Thanks.

  29. Anonymous says:

    Hi Venkat, this was a nice article to read about what is coming up in IE7. As a side note, I am a test manager working for a company in Hyderabad. I am looking for some basic articles about how IE works – like how IE parses URLs, how it sends and receives DNS queries, and in general 100% basic stuff about IE. Can you point me to any publicly available material on this? I am learning, from a test perspective, how IE works internally so that I can better test web applications that run on IE. I would appreciate it if you could drop a mail to shrinik@igate.com or shrinik@gmail.com.

    BTW, I am an ex-Microsoftie who worked at this great company between 2003 and 2005 at GDCI Hyderabad.

    (check out my msdn blog at http://blogs.msdn.com/shrinik)

    Great work – keep it up

    Shrini

  30. Anonymous says:

    Chris H: The ability to set up cache control headers on images depends on your access level. You don’t need to be able to write to the server configuration, but the server needs to be configured to give you a certain level of control via .htaccess files. Some servers are, some are not.

    If your server permits you to use the appropriate options in .htaccess files, then a Google search for:

    htaccess "cache control"

    will tell you what you need to know.

    Some admin bribery may be required to get them to enable mod_expires and grant the appropriate control rights in .htaccess.

  31. Anonymous says:

    Craig Ringer, thank you for the heads up. It seems my host does support .htaccess and mod_expires, so I should be able to make use of the cache headers properly.

    Thanks again.