IE7 Networking improvements in content caching and decompression

Hello! I am Venkat Kudallur, development lead for Networking in Internet Explorer. We have made several Networking improvements in Internet Explorer, and in this post I would like to introduce some of the improvements in content caching and decompression, two features that play a key role in speeding up the delivery of pages from a remote web server. If you’re a webmaster, a developer using the IE Networking API, or just curious about IE Networking, I think you’ll find these details interesting.

Content caching eliminates a round-trip to the server (or reduces traffic with conditional GETs), and compression effectively increases throughput by reducing the number of bytes sent over the wire. Compression (through standard algorithms such as gzip) plays a role in the premium services that several ISPs, such as MSN, AOL, and NetZero, offer to ‘speed up’ dial-up or broadband. Most of these services use dedicated servers and a combination of standard and proprietary compression algorithms, and/or tune TCP/IP parameters on the machine to speed up data transfer. Compression is likely to be a key part of the perceived speedup because most web content compresses well: typically, HTML generated by ASP compresses about 2X (two-fold), JavaScript (JS) files by 2-4X, and CSS style sheets by 2-5X. Proprietary algorithms are typically used for other media content, which these IE changes don’t impact.
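To make the two mechanisms above concrete, here is a minimal Python sketch. It is not IE or WinInet code: the function names, the sample markup, and the ETag values are all hypothetical, chosen only for illustration. The first function measures how well gzip shrinks a payload, and the second shows the server-side decision behind a conditional GET. The sample markup is artificially repetitive, so its ratio will be higher than the typical figures quoted above.

```python
import gzip
from typing import Optional, Tuple


def compression_ratio(payload: bytes) -> float:
    """Return the original size divided by the gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))


def respond(if_none_match: Optional[str], current_etag: str,
            body: bytes) -> Tuple[int, bytes]:
    """Server side of a conditional GET (illustrative only).

    If the validator sent by the client still matches the resource,
    answer 304 with an empty body so the client reuses its cached copy.
    Otherwise send the full body with a 200.
    """
    if if_none_match is not None and if_none_match == current_etag:
        return 304, b""
    return 200, body


# Hypothetical sample: table rows like those a server-side page might emit.
sample_markup = (
    b"<tr><td class='item'>row</td><td class='price'>0.00</td></tr>\n" * 200
)
ratio = compression_ratio(sample_markup)

# First visit: no validator, so the full body is sent.
first_status, first_body = respond(None, '"v1"', sample_markup)
# Revisit: the cached validator still matches, so no body is sent.
revisit_status, revisit_body = respond('"v1"', '"v1"', sample_markup)
```

On the revisit, only the headers cross the wire, which is where the traffic reduction from conditional GETs comes from.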

A quick introduction to the key IE modules used in Networking is called for:

  • WinInet.dll offers a Win32 API for HTTP, HTTPS, and FTP downloads, along with additional APIs for caching and parsing. It’s a very popular binary and, in addition to being part of the IE platform, is widely used by Windows client applications for its Networking services.
  • UrlMon.dll is a utility layer that wraps and generalizes the WinInet API into a more generic and extensible pluggable protocol layer. It provides a COM interface to the HTTP Win32 API offered by WinInet, and has COM-based support for incorporating other protocol implementations into the IE stack. Several download managers available on the web use this mechanism to tap into IE’s download space and pick off certain types of content (such as binaries) to be downloaded within the manager.

The key takeaway is that the bulk of the HTTP implementation, including caching, lies within WinInet, while UrlMon provides a COM wrapper around it and allows extension and filtering.

Prior to IE7, decompression happened in the UrlMon layer as a pluggable layer. IE’s gzip decompression was exposed through COM, and generically plugged in by the UrlMon implementation to work on the compressed data stream exposed by the WinInet Win32 API. The model was appealing because any new decompression format could be implemented as a COM component and registered with UrlMon for use on the compressed data stream. In practice, there were conditions under which this logical separation of decompression from the download complicated the model. For IE7, we have moved the decompression to logically sit above the download implementation within WinInet. This approach gives us several benefits:

  • It reduces a round of file system read/writes.
  • It avoids double parsing of caching directives.
  • It centralizes caching decisions and timing considerations for compressed and decompressed content, and makes them consistent.
  • It removes the need for COM-related synchronization in the default compression scenarios.

I expect these changes to fix a set of issues commonly seen in IE and IE-hosted applications when compression is used, particularly when there is a dependence on the cache file used to store the content on the browsing machine. Developers consuming the UrlMon and WinInet APIs need not be concerned about changes in API behavior resulting from this change in IE7: the UrlMon API continues to decompress compressed data transparently, and the WinInet API, by default, returns compressed data as in prior versions.
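The difference between the two default behaviors can be pictured with a small Python sketch. This is only an analogy, not the WinInet or UrlMon API: `raw_download` and `decoded_download` are hypothetical names, and the headers are modeled as a plain dictionary. One caller receives the bytes exactly as they arrived on the wire, and the other receives content that has already been decoded according to the Content-Encoding header.

```python
import gzip
from typing import Dict


def raw_download(headers: Dict[str, str], body: bytes) -> bytes:
    """Analogy for the lower layer's default: hand back bytes as received."""
    return body


def decoded_download(headers: Dict[str, str], body: bytes) -> bytes:
    """Analogy for the wrapping layer: decode gzip bodies transparently.

    Bodies with no Content-Encoding, or with an encoding this sketch
    does not handle, are passed through unchanged.
    """
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


# Hypothetical response: a gzip-encoded page.
page = b"<html><body>Hello from the server</body></html>"
response_headers = {"Content-Encoding": "gzip"}
wire_bytes = gzip.compress(page)

as_received = raw_download(response_headers, wire_bytes)
as_decoded = decoded_download(response_headers, wire_bytes)
```

A caller of the first function has to inspect the headers and decompress the body itself; a caller of the second never sees the compressed form.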

WinInet.dll is responsible for a cache, which is loaded and synchronized across all the processes and services using it. In addition to serving as a cache for various types of content downloaded by WinInet, it’s also exercised through the WinInet caching API, which provides a URL-based index for storage and retrieval. Its popularity, however, has a downside: any instability (e.g. corruption of the index from a sudden reboot in the middle of a write-through operation) impacts all the processes that use it. We have significantly rewritten the WinInet cache index manager in IE7 to ensure that it can gracefully recover from corruption or from a failure to grow the memory mapping of the index file. In addition, we have improved the caching heuristics, extensively scrubbed the API for parameter validation, and now handle Internationalized Resource Identifiers (IRI) more consistently in the API. I expect huge stability and functionality gains from the caching changes made in this release.
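The idea of recovering gracefully from a corrupted index can be illustrated with a toy URL-keyed index in Python. This is not how the WinInet index is stored or repaired; the file layout (a CRC32 checksum followed by JSON), the function names, and the file name are all assumptions made for this sketch. It shows one common pattern: detect a damaged index on load and fall back to an empty one instead of failing.

```python
import json
import os
import tempfile
import zlib
from typing import Dict


def save_index(path: str, index: Dict[str, str]) -> None:
    """Write a 4-byte CRC32 followed by the JSON-encoded index."""
    payload = json.dumps(index, sort_keys=True).encode("utf-8")
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(checksum + payload)
    # Swap the new file into place in one step.
    os.replace(tmp_path, path)


def load_index(path: str) -> Dict[str, str]:
    """Load the index, returning an empty one if it is missing or damaged."""
    try:
        with open(path, "rb") as f:
            blob = f.read()
    except FileNotFoundError:
        return {}
    stored, payload = blob[:4], blob[4:]
    if len(stored) < 4 or int.from_bytes(stored, "big") != zlib.crc32(payload):
        return {}  # checksum mismatch: treat the index as corrupt
    try:
        return json.loads(payload.decode("utf-8"))
    except ValueError:
        return {}


with tempfile.TemporaryDirectory() as cache_dir:
    index_path = os.path.join(cache_dir, "toy-index.bin")
    save_index(index_path, {"http://example.com/": "entry-0001"})
    restored = load_index(index_path)

    # Simulate a write cut short by a sudden reboot.
    with open(index_path, "r+b") as f:
        f.truncate(10)
    recovered = load_index(index_path)
```

Starting over with an empty index loses the cached entries but keeps every process that shares the cache working, which is the kind of trade-off a recovery path has to make.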

To read more on the impact of caching and compression on HTTP performance, check out this article by Eric Lawrence, IE Networking Program Manager.  I welcome your feedback and suggestions for IE Networking features or for topics you would like us to blog about.

 - Venkat