Why don’t ZIP files have the FILE_ATTRIBUTE_COMPRESSED attribute?

A customer reported that when they called GetFileAttributes on a ZIP file, the FILE_ATTRIBUTE_COMPRESSED attribute was not returned. But ZIP files are compressed. Why isn't the FILE_ATTRIBUTE_COMPRESSED attribute being set?

Because FILE_ATTRIBUTE_COMPRESSED tells you whether the file was compressed by the file system. It is not a flag which describes the semantics of the bytes stored in the file. After all, the file system doesn't know that this particular collection of bytes is a ZIP file and contains data that was compressed externally. Who knows, maybe it's just some uncompressed file that just happens to look superficially like a ZIP file (but isn't)?
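To make the "looks superficially like a ZIP file" point concrete, here is a minimal Python sketch. Deciding whether some bytes form a ZIP archive is a judgment made by application code that parses the contents; it is not something the file system records. The helper names looks_like_zip and make_zip are hypothetical, introduced only for this illustration; zipfile.is_zipfile is the standard library call that does the actual check.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Application-level guess: do these bytes parse as a ZIP archive?"""
    return zipfile.is_zipfile(io.BytesIO(data))


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a small ZIP archive in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


# A genuine archive with one compressed entry.
real_zip = make_zip("hello.txt", b"hello " * 100)

# Starts with the same "PK\x03\x04" signature, but is not an archive.
lookalike = b"PK\x03\x04" + b"just some ordinary bytes, nothing more"

if __name__ == "__main__":
    print(looks_like_zip(real_zip))   # a real archive
    print(looks_like_zip(lookalike))  # only resembles one
```

Both byte strings begin with the same four bytes, so telling them apart requires parsing the contents, and that is exactly the work the file system does not do when it stores a file.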

If a text file consists of the string "ADTUR ADKUH", is this a compressed file? Maybe it's somebody's product key, in which case it isn't compressed. Or maybe it is short for "Await instructions before taking further action. Acknowledge receipt of this telegram by wire." That's an example of a commercial code, used to save telegram transmission costs by compressing frequently-used business phrases into five-letter pseudo-words.

The file system doesn't try to figure out whether a particular sequence of bytes it has been asked to store was externally compressed. It just stores the bytes on disk, perhaps after performing its own internal compression, and if that internal compression was performed (even if it didn't actually result in any compression), the FILE_ATTRIBUTE_COMPRESSED attribute is set.

Similarly, the FILE_ATTRIBUTE_ENCRYPTED attribute is set if the file contents were encrypted by the file system. If encryption took place externally, then the attribute is not set because the file system doesn't know that the byte sequence it was asked to store represented encrypted data.
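As a rough illustration of what these two flags report, the Python sketch below decodes an attribute value into the transformations the file system itself applied. The function names fs_transformations and attributes_of are hypothetical. The stat.FILE_ATTRIBUTE_* constants come from Python's standard library, and st_file_attributes is populated only on Windows, so the sketch assumes a value of 0 on other platforms.

```python
import os
import stat

# Flags the file system sets for transformations it performed itself.
FS_FLAGS = {
    "compressed": stat.FILE_ATTRIBUTE_COMPRESSED,
    "encrypted": stat.FILE_ATTRIBUTE_ENCRYPTED,
}


def fs_transformations(attributes: int) -> list:
    """Name the file-system-level transformations present in the bit mask."""
    return [name for name, bit in FS_FLAGS.items() if attributes & bit]


def attributes_of(path: str) -> int:
    """Return the Windows file attributes of path.

    st_file_attributes exists only on Windows; this sketch assumes 0
    (no attributes) everywhere else.
    """
    return getattr(os.stat(path), "st_file_attributes", 0)


if __name__ == "__main__":
    # A ZIP file on an ordinary NTFS volume typically carries just the
    # archive bit, so neither transformation is reported.
    print(fs_transformations(stat.FILE_ATTRIBUTE_ARCHIVE))
    print(fs_transformations(attributes_of(os.curdir)))
```

Nothing in this decoding looks at the file's contents. An externally compressed or externally encrypted file produces an empty list unless the file system also compressed or encrypted it.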

(Note that many special-purpose file formats, such as DOCX, JAR, JPG, and PNG, are internally compressed, even though they are not advertised as such.)
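For ZIP-based containers such as DOCX and JAR, the record of whether an entry is compressed lives inside the file's own directory structure, where only a program that understands the format will find it. The Python sketch below builds a stand-in container in memory (it is not a valid DOCX, and the entry name and payload are made up for illustration) and reads back the sizes the container records for its entry. The function name entry_sizes is hypothetical.

```python
import io
import zipfile


def entry_sizes(method: int) -> tuple:
    """Write one entry with the given ZIP method; return (original, stored) sizes."""
    payload = b"<w:p>hello</w:p>" * 200  # made-up, highly repetitive content
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        container.writestr("word/document.xml", payload, compress_type=method)
    # Reopen the bytes the way a reader would and consult the entry's metadata.
    with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as container:
        info = container.getinfo("word/document.xml")
        return info.file_size, info.compress_size


if __name__ == "__main__":
    print(entry_sizes(zipfile.ZIP_DEFLATED))  # stored size is smaller
    print(entry_sizes(zipfile.ZIP_STORED))    # stored size equals original
```

The same container format can hold compressed and uncompressed entries side by side, which is one more reason a single per-file flag could not describe the contents even if the file system wanted to try.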

Comments (25)
  1. Martin says:

    Because a ZIP file is an archive, FILE_ATTRIBUTE_ARCHIVE should be set.

    [Nice one. -Raymond]
  2. Raphael says:

    Don't forget FILE_ATTRIBUTE_ENCRYPTED for TrueCrypt volumes. Plausible Deniability is for the weak!

    (Incidentally, I would be quite interested in this FILE_ATTRIBUTE_VIRTUAL attribute.)

  3. Adam Rosenfield says:

    DOCX, JAR, and recent FLA (Flash development) files are all zip files in disguise, with extra constraints regarding their semantics (i.e. not every zip file is a valid DOCX, JAR, or FLA).  Older FLA files were just binary blobs with their own special file format before Adobe moved to zip archives (in CS5 I think?).

    The neat thing about zip files is that the file format only requires the directory to be at the end of the file, not the beginning, so you can stuff arbitrary bytes at the front of a zip and it'll still be a valid zip.  This allows for tricks like this: gayhacker.wordpress.com/…/well-well-well .

  4. Martin, I need a new keyboard now. Mine's covered in coffee.

    Why did I read Raymond's reply in my head in a completely flat tone, accompanied with a slow golf clap?

  5. Henke37 says:

    Because a zip file is like a folder you should be able to open it like one. Oh, wait…

  6. Dan Bugglin says:

    "(Incidentally, I would be quite interested in this FILE_ATTRIBUTE_VIRTUAL attribute.)"

    Pretty sure those mark "reserved" files hidden in the filesystem root used to store filesystem metadata.

  7. DWalker says:

    Of course, you could store a Zip file (which is compressed) in an NTFS compressed directory (or just tell NTFS to compress the file).  It probably won't take fewer bytes on disk, but the compressed attribute will get set!

  8. Matt says:

    @Adam Rosenfield – As are *.xap (Silverlight application package) files.  

  9. alegr1 says:

    File it under "I can't believe I had to tell that"

  10. Jeffrey Bosboom says:

    ZIP files don't necessarily use compression, anyway.  One of the defined compression methods is 'store', which just puts the file's bytes in the archive without compression.

  11. Me says:

    @Adam Rosenfield:

    Very interesting. I was of the impression ZIP files started with the "PK" magic header. Seems I was wrong.

  12. Csaboka says:

    @Adam Rosenfield: ZIP is even trickier than that – it can contain arbitrary data at the end as well, as the last field of the central directory is a general purpose "comment" field. The only limitation is that the extra data cannot be bigger than 64K since the length of the comment field is stored on two bytes.

    This means that the only foolproof way to open a ZIP file is to start at the end, and keep scanning backwards for a 4-byte magic number in the central directory. You can only safely reject a file after scanning its last 64 kilobytes. I only know because I had to deal with these edge cases, and the default ZIP handling code in Java gets it wrong…

  13. xpclient says:

    There's an "Index" attribute that probably only the legacy Indexing Service looked at. But its text has changed in modern Windows to "Allow this file to have contents indexed in addition to file properties." Does this mean indexed by Windows Search or still the legacy indexer?

  14. Ben Voigt [Visual C++ MVP] says:

    Allowing arbitrary data appended to your file format is much appreciated by those who need to "tweak" one of your files to produce a hash collision.

  15. Csaboka says:

    @Ben Voigt: It's also useful for creating digitally signed self-extracting archives. The executable headers and the unpacker code will appear as an "arbitrary" prefix, and the digital signature will appear as an "arbitrary" suffix, but a well-written ZIP library can still extract files from it normally, without having to execute it.

  16. ErikF says:

    If Microsoft ever manages to get time travel working (come on already, Microsoft Research! :-) ), I wouldn't mind FILE_ATTRIBUTE_THE_ONE_YOU_ARE_LOOKING_FOR. It would simplify my programs ever so much, removing the need for those pesky Open and Save As dialog boxes!

  17. BC_Programmer says:

    Don't forget the FILE_ATTRIBUTE_SHARKS attribute, which is used to indicate that a file contains shark-related information. Later versions can add FILE_ATTRIBUTE_URCHIN of course, but since I would gain more from the SHARKS attribute clearly it should be added first.

  18. @ErikF

    And then you get the W32.Kenobi infect your system and start changing flags. FILE_ATTRIBUTE_NOT_THE_FILE_Y

    never mind.

  19. Neil says:

    Firefox's omni.ja(r) is a .zip file that has the central directory at the beginning.

  20. Mark Holland-Avery says:

    Clearly there would be no problem if Windows supported FILE_ATTRIBUTE_THIS_IS_A_ZIP_FILE. We also need to be able to flag important files with FILE_ATTRIBUTE_NOCORRUPT and frequently-used executables with FILE_ATTRIBUTE_GO_FASTER.

    Pro tip: If the search function can't find a file that you know is right there, and you are not content it got indexed, simply flag the file with FILE_ATTRIBUTE_NOT_CONTENT_INDEXED.

    Money-saving tip: You may be able to find that cordless mouse you used to have, as they often build a "human-interface device den" to live in. These can be found by searching for FILE_ATTRIBUTE_HIDDEN.

  21. DWalker says:

    xpclient has a valid complaint:  The attribute to "have this file's content indexed" is poorly documented.  As he says, "Indexed by what program?"  Indexed by Legacy Windows indexing or the newer Windows Search, or both, or what?

  22. 640k says:

    Index is a flag which suggests that indexing programs shouldn't index the file, just as the Archive flag suggests that backup programs shouldn't back up the file.

    What these programs actually do is up to the programs and their users. As a rule of thumb, MS's own programs usually don't obey MS's own rules.

  23. Gabe says:

    DWalker: The index attribute doesn't know who is going to index the file any more than the "archive" attribute knows who archives files. It's just an attribute to be used by any content indexing system that wants to know what files are(n't) worth indexing. No doubt it's used by both the old Windows Indexing and new Windows Search, along with other things like Google Desktop.

  24. Medinoc says:

    @640k: Doesn't the "archive" attribute suggest that backup program SHOULD backup the file actually?

  25. DWalker says:

    Gabe, you're probably right, but I suspect that Google Desktop Search (and probably other non-MS programs) don't use that flag at all.

Comments are closed.
