Content-Encoding != Content-Type

RFC 2616, the HTTP 1.1 specification, describes how web servers must indicate encoding transformations using the Content-Encoding header. Although Content-Encoding (e.g., gzip, deflate, compress) and Content-Type (e.g., application/x-gzip) sound similar on the surface, they are in fact two distinct pieces of information. Servers use Content-Type to describe the data type of the entity body, which lets client applications open the content with the appropriate application; Content-Encoding is used solely to indicate any additional encoding the server applied to the content before transmitting it to the client. Although the HTTP RFC outlines these rules pretty clearly, some web sites respond with “gzip” as the Content-Encoding even though the server has not gzipped the content.

Our testing has shown this problem to be limited to some sites that serve Unix/Linux style “tarball” files. Tarballs are gzip-compressed archive files. By setting the Content-Encoding header to “gzip” on a tarball, the server is specifying that it has additionally gzipped the gzipped file. This, of course, is unlikely but not impossible or non-compliant.
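
To make the distinction concrete, here is a minimal sketch of serving a pre-gzipped tarball correctly, using Python's standard http.server; the file name and port are placeholders. Content-Type describes what the entity is, and Content-Encoding is omitted because the server applied no encoding of its own before transmission.

    # Minimal sketch: serve an already-gzipped tarball without claiming a
    # transfer-time encoding. File name and port are placeholders.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class TarballHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            with open("example.tar.gz", "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "application/x-gzip")  # what the body is
            # No Content-Encoding header: the server did not encode the body itself.
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), TarballHandler).serve_forever()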

Therein lies the problem. A server responding with a Content-Encoding such as “gzip” is specifying the mechanism the client needs in order to decode the content. If the server did not actually encode the content as specified, then the client’s decompression will fail.
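
Here is a small illustration, using Python's standard gzip module and a made-up payload, of what happens when a client dutifully decompresses bytes that were never actually gzipped:

    import gzip

    body = b"Plain text that the server claimed was gzip-encoded."
    try:
        gzip.decompress(body)   # what a client must attempt when Content-Encoding: gzip is present
    except OSError as exc:      # surfaces as gzip.BadGzipFile on recent Python versions
        print("Decompression failed:", exc)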

Here is a potentially over-simplified example:

  a. Windows Vista Networking Rocks!
  b. Jvaqbjf Ivfgn Argjbexvat Ebpxf!

If I mistakenly claim that string a) has been encoded using the simple ROT-13 obfuscation scheme when in actuality it has not, then the decoded message b) will be very different from the intended message.
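
The same mistake in miniature, using Python's built-in rot_13 codec: “decoding” string a), which was never encoded, produces the garbage in string b).

    import codecs

    original = "Windows Vista Networking Rocks!"   # string a), never actually encoded
    print(codecs.decode(original, "rot_13"))       # -> Jvaqbjf Ivfgn Argjbexvat Ebpxf!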

Since the AI engine for WinINet isn’t yet ready for production (joke), we try to work around these non-compliant server responses, but that isn’t the right long-term approach. The fix, and the ask, is for web server, extension, and application authors to test their servers to see whether they exhibit this behavior and, if so, to fix their implementations before we remove our client-side hacks.

To test your server for compliance, issue a simple HTTP 1.1 request for a .gz file with an “Accept-Encoding: gzip” header and inspect the response headers. If you see Content-Encoding: gzip (or x-gzip), then the server is either gzip-encoding the already gzipped file or misstating that it encoded the content before transmission, and either way it is perpetuating the need for client HTTP stacks such as WinINet to absorb and hide this bad server behavior.
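
A quick way to run that check, sketched with Python's standard urllib (the URL is a placeholder; point it at a .gz file on your own server):

    import urllib.request

    req = urllib.request.Request(
        "http://example.com/files/foo.tar.gz",      # placeholder .gz resource
        headers={"Accept-Encoding": "gzip"},
    )
    with urllib.request.urlopen(req) as resp:
        content_type = resp.headers.get("Content-Type", "<none>")
        content_encoding = resp.headers.get("Content-Encoding", "<none>")

    print("Content-Type:    ", content_type)
    print("Content-Encoding:", content_encoding)
    if content_encoding.lower() in ("gzip", "x-gzip"):
        print("Suspect: the server claims it gzip-encoded an already gzipped file.")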

-Billy Anders

Comments (6)

  1. Vince says:

    Doing some archaeology here, but:

    we just came across an issue, and after a case was opened with MS, the official answer was:

    "The Internet Explorer not decompresses only files received with Content-Encoding=GZIP AND with Content-Type like









    to not break the behavior to IE 6. This was the reason to hold this behavior.

    We try for consistency with HTTP 1.1 standard, but we can´t give any guaranty that this meet the HTTP 1.1 RFC.

    So it´s better to use a modified Content-type-header to bring the IE to decompress these files."

    Very nice, and not completely in line with the "philosophy" of the above post…

  2. Paul Warren says:

    “By setting the Content-Encoding header to “gzip” on a tarball, the server is specifying that it has additionally gzipped the gzipped file. This, of course, is unlikely but not impossible or non-compliant.”

    It’s not actually that unlikely, and whilst pointless, it is, as you say, compliant. The result of IE’s client-side hack is to break perfectly compliant web sites, forcing them to implement workarounds for IE’s broken “hack”.

    We now disable gzip compression for anything other than text/html if the browser identifies itself as IE, because it can’t be trusted to get it right on other document types.

    I’m sorry that some of you have to pay to deal with other websites and server software that don’t handle the protocol right, but considering that most non-text content is already compressed (images, video, XML-based office document formats), the server setting you made sounds like the right configuration. — Ari
  3. Stuart Rowan says:

    We’re only having to ‘pay’ because the Internet Explorer team implemented the standard incorrectly.

    Also I think your original post confuses a tarball with a gzipped tarball.

    foo.tar — plain old tape archive (tar)

    foo.tar.gz / foo.tgz — plain old tape archive that is then gzip compressed

    If you receive a document that is Content-Encoded using gzip, you must ungzip it before saving it to disk. The fact that IE does this for nearly all content, but not all of it, is quite bonkers in my opinion.

    An example: download a zip file from a standards-compliant web server.

    1. The browser sends a request for the zip file, including an Accept-Encoding: gzip, deflate header

    2. The server sends a response with various headers such as Content-Length, Content-Disposition and, crucially, Content-Encoding: gzip

    3. The browser receives a stream of bytes (a gzipped zip file in fact)

    4. The browser should gunzip the stream and save the resulting zip file to disk.
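
    For concreteness, a rough sketch of what step 4 amounts to, written with Python's standard urllib and gzip modules (the URL and file name are placeholders, not part of the original example):

        import gzip
        import urllib.request

        req = urllib.request.Request(
            "http://example.com/archive.zip",       # placeholder URL
            headers={"Accept-Encoding": "gzip, deflate"},
        )
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            if resp.headers.get("Content-Encoding") == "gzip":
                raw = gzip.decompress(raw)          # undo the transfer encoding only

        with open("archive.zip", "wb") as f:        # what lands on disk is a plain zip
            f.write(raw)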

    Instead of step 4 above, which Chrome, Safari, Opera and Firefox all manage, Internet Explorer saves the stream from step 3 straight to disk; in reality Internet Explorer has saved a still-gzipped zip file but given it the plain .zip name.

    This means Windows’ built-in compressed folder widget cannot open the zip file if it is downloaded using IE, but can if it is downloaded using any other popular browser.

    In light of the above, do you consider this behaviour of Internet Explorer a feature or a bug?

    As Billy mentions, we consider it a compatibility hack that we would like to get rid of as soon as possible. However, you might still want to consider the usefulness of compressing already compressed content. Also, yes, you are correct that we are talking about .tar.gz/.tgz files (but one of the content types we look for is application/x-tar). –Ari
  4. Stuart Rowan says:

    Ari — thank you for your response. The main reason people are hitting this bug in IE is that they want simple server configuration: enabling gzip content negotiation site-wide is easy and works out of the box with every other browser out there.

    Without inspecting a zip file one has no way of knowing whether entries in it have been compressed or simply ‘stored’ … the compression in a zip file is optional. So there’s no right answer as regards ‘compressing already compressed content’ in the context of application/zip.

    Furthermore, the overhead of gzipping already compressed data is negligible (on current hardware) so whether it’s useful to do so or not doesn’t really matter.

    Is the hack still present in IE8?

    Which specific group of people / applications benefit from the hack? This was never clear from Billy’s original blog post.

  5. This broken "feature" is still present in IE8. If you download a zip file (Content-Type: application/zip) from Apache and it is also gzip-encoded (Content-Encoding: gzip), then the file IE8 saves to disk will still be gzip-encoded, and Windows, WinZip, etc., will not be able to open it.