Windows file system compression had to be dumbed down


I noted some time ago that when you ask Windows to use file system compression, you get worse compression than you would get from WinZip or some other dedicated file compression program. There are various reasons for this, one of which is that file system compression operates under soft real-time constraints.
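
The usual way to ask is the file or folder Properties dialog (or compact.exe), but a program can ask, too. Here is a minimal sketch, not production code, that requests compression on an existing file by sending the documented FSCTL_SET_COMPRESSION control code; the path is hypothetical.

```c
// Minimal sketch, not production code: mark an existing file as
// compressed by sending the documented FSCTL_SET_COMPRESSION control
// code. The path is hypothetical; error handling is minimal.
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    HANDLE file = CreateFileW(L"C:\\temp\\example.dat",
                              GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, 0, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        printf("CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    USHORT format = COMPRESSION_FORMAT_DEFAULT; // let the file system choose
    DWORD bytesReturned;
    if (!DeviceIoControl(file, FSCTL_SET_COMPRESSION,
                         &format, sizeof(format),
                         NULL, 0, &bytesReturned, NULL)) {
        printf("FSCTL_SET_COMPRESSION failed: %lu\n", GetLastError());
    }

    CloseHandle(file);
    return 0;
}
```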

The soft real-time constraint means that the performance targets for file system compression included limits like "Degrades I/O throughput by at most M%" and "Consumes at most M% of CPU time," for some values of M and N I am not privy to.

But there's another constraint that is perhaps less obvious: The compression algorithm must be system-independent. In other words, you cannot change the compression algorithm depending on what machine you are running on. Well, okay, you can compress differently depending on the system, but every system has to be able to decompress every compression algorithm. You might say, "I'll use compression algorithm X if the system is slower than K megahertz, but algorithm Y if the system is faster," but that means that everybody needs to be able to decompress both algorithm X and algorithm Y, and to do so while still hitting the performance targets.
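
To make the constraint concrete, here is a hypothetical sketch (this is not the actual on-disk format): if each compressed chunk recorded which algorithm produced it, the table of decoders could only ever grow, and every entry in it would have to hit the performance targets on the slowest supported machine.

```c
// Hypothetical sketch, not the real NTFS on-disk format: each chunk
// records which algorithm compressed it, so every system must carry a
// decoder for every algorithm that has ever been allowed to write one.
#include <stddef.h>

typedef enum { ALGO_X, ALGO_Y, ALGO_COUNT } CompressionAlgorithm;

typedef size_t (*Decoder)(const unsigned char *in, size_t inSize,
                          unsigned char *out, size_t outCapacity);

/* Stub decoders standing in for the real algorithms. */
static size_t DecodeX(const unsigned char *in, size_t inSize,
                      unsigned char *out, size_t outCapacity)
{ (void)in; (void)inSize; (void)out; (void)outCapacity; return 0; }

static size_t DecodeY(const unsigned char *in, size_t inSize,
                      unsigned char *out, size_t outCapacity)
{ (void)in; (void)inSize; (void)out; (void)outCapacity; return 0; }

/* This table can only ever grow: a chunk written long ago with ALGO_Y
   must still decompress on the slowest machine that can mount the
   volume, and it must do so within the performance targets. */
static const Decoder g_decoders[ALGO_COUNT] = { DecodeX, DecodeY };

size_t DecompressChunk(CompressionAlgorithm algo,
                       const unsigned char *in, size_t inSize,
                       unsigned char *out, size_t outCapacity)
{
    return g_decoders[algo](in, inSize, out, outCapacity);
}
```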

The requirement that a file compressed on one system be readable by any other system allows a hard drive to be moved from one computer to another. Without that requirement, a hard drive might be usable only on the system that created it, which would create a major obstacle for data centers (not to mention data recovery).

And one of the limiting factors on how fancy the compression algorithm could be was the Alpha AXP.

One of my now-retired colleagues worked on real-time compression, and he told me that the Alpha AXP processor was very weak on bit-twiddling instructions. For the algorithm that was ultimately chosen, the smallest unit of encoding in the compressed stream was the nibble; anything smaller would slow things down by too much. This severely hampers your ability to get good compression ratios.
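
To see why the granularity matters, compare the inner loops. A nibble-aligned symbol comes out with one shift and one mask; an arbitrary-bit-width code has to be assembled bit by bit (or via an accumulator) across byte boundaries, which is exactly the kind of bit-twiddling that was reported to be expensive. A rough illustrative sketch, not the actual decoder:

```c
// Rough illustrative sketch, not the actual NTFS decoder: contrast a
// nibble-aligned read with a general bit-granular read.
#include <stddef.h>
#include <stdint.h>

// Nibble-aligned: symbol i is just the low or high half of byte i/2,
// so one shift and one mask per symbol.
unsigned ReadNibble(const uint8_t *buf, size_t nibbleIndex)
{
    return (buf[nibbleIndex >> 1] >> ((nibbleIndex & 1) * 4)) & 0xF;
}

// Arbitrary bit width: even this simple version needs a running bit
// position and per-bit shifting and masking across byte boundaries.
typedef struct {
    const uint8_t *buf;
    size_t bitPos;
} BitReader;

unsigned ReadBits(BitReader *reader, unsigned count) // count <= 16
{
    unsigned value = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned bit = (reader->buf[reader->bitPos >> 3] >> (reader->bitPos & 7)) & 1;
        value |= bit << i;
        reader->bitPos++;
    }
    return value;
}
```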

Now, Windows dropped support for the Alpha AXP quite a long time ago, so in theory the compression algorithm could be made fancier and still hit the performance targets. However, we also live in a world where you can buy a 5TB hard drive from Newegg for just $120. Not only that, but many (most?) popular file formats are already compressed, so file system compression wouldn't accomplish anything anyway.

We live in a post-file-system-compression world.

Comments (38)
  1. “We live in a post-file-system-compression world”
    For home use: definitely. But for servers (IIS/SharePoint log files) and for versions/editions of SQL Server that cannot compress backups, file-system compression can save a whole lot of expensive disk space.

    1. MarcK4096 says:

      For servers, you can use Windows deduplication. And some time in the future, ReFS block cloning will allow deduplication to be implemented without any of the rehydration penalties. NTFS file compression should go away. I tried it years ago on a Windows Server 2003 file server and it created a huge amount of fragmentation that decimated performance.

  2. DWalker07 says:

    Right; compressing a JPG or an XLSX or a DOCX file will give you a file that is the same size or only slightly smaller.

    1. Kai Schätzl says:

      It may even yield a slightly bigger file.

    2. Tom says:

      The docx is a zipfile anyway…

  3. Antonio Rodríguez says:

    Another scenario where file compression helps is when you have a slow hard drive or SSD. In that case, file compression can actually give you better throughput. In my case, I still have three (not one, not two, but three) Acer Aspire One 110 in working condition. You know, the ones that came with a slow SSD. I use the SSD just for the OS and the software, and put all the data in the second card reader (intended for storage expansion). That way, I can have the SSDs compressed so they are faster (even with the low-power Atom 260) and have more capacity. The storage cards, on the other hand, are uncompressed: they are formatted in FAT32, and most of their space is used by music, videos and compressed archives, so I wouldn’t gain much.

    But in the days of 500 MB/s transfer speeds, even the speed benefit of file compression goes away.

  4. Mike Dimmick says:

    You can’t fit any new storage into WIMBoot devices, however. My girlfriend’s laptop has 32GB of non-expandable internal storage – 7GB of which was lost to the boot/recovery partition. Upgrading to Windows 10 basically made this redundant but there was no way to recover the space. When it came to installing the Windows 10 Anniversary Update, it failed due to insufficient space, but there was genuinely nothing on there apart from Windows itself. I had to do a wipe and clean install to perform the update.

    Windows was already reporting that it was in the compressed state, but forcing compression did gain some extra. A little. Not enough to actually install the OS update!

    In sum – the compression algorithm is still important.

  5. Tramb says:

    It would be better if MS improved NTFS and allowed *customers* to decide which world they want to live in.
    Many people (me included) think ZFS (or btrfs) has useful features (I shan’t list them; you know what they are).

    1. AndyCadley says:

      Windows has supported installable file systems for some time now. If you want to use ZFS or Btrfs, you’re welcome to write one.

      1. Joshua says:

        And how do you boot Windows from one?

      2. Tramb says:

        The point is that I understand that NTFS doesn’t have every feature in the world, but don’t try to convince me I don’t need them.

        “If you want to use ZFS or Btrfs, you’re welcome to write one”
        I can accept “You can contribute” messages for OSes I don’t pay for, not for Windows.

        1. Klimax says:

          The only feature I know NTFS lacks is checksums on file content. ReFS fixes that. Are there any other features missing? And in your previous post you mention “world”; do you mean which compression should be used? That would be confusing, rarely used, and even more rarely understood. And for little benefit. Just extra testing of unused code.

          IIRC, if you want to augment it, you can write your own filter driver or other add-on, but I don’t see the rationale for a general solution to a mostly nonexistent problem.

          1. Kevin says:

            ZFS and btrfs are next-generation filesystems. If installed correctly (e.g. in a RAID configuration), they are quite capable of automagically healing file corruption, and quite a few other neat tricks like COW snapshots. Ars Technica did a nice feature on these things a while ago: http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/

          2. Tramb says:

            I was specifically thinking of ubiquitous checksumming and deduplication.
            Is ReFS a real possibility on the modern Windows desktop right now (bootable and selectable at install)?
            And compression has its use on top of user-space compression because the fs and vm can maintain caches of uncompressed blocks, avoiding redundant unpacking between processes. I was referring to “We live in a post-file-system-compression world.” I disagreed.

          3. Klimax says:

            By those definitions, NTFS is next gen already… And ReFS doesn’t need RAID to fix corruption at all. COW is, IIRC, covered under dedup, which is already supported in NTFS.

            So in short, there is nothing missing in NTFS (with the sole exception of checksums, and those were introduced with ReFS).
            Note: ReFS doesn’t appear to be bootable yet, but I can’t confirm that.

    2. That’s why they made ReFS. It would be nice if they expanded ReFS’s feature set so it could host Windows, but I’m guessing they’re still waiting for it to fully stabilize and be vetted before they take the next step, considering the implications it’d have for customers.

      1. Koro says:

        Isn’t ReFS a copy-paste of the NTFS code, tweaked by people who have no idea how the original code worked?

        1. GDwarf says:

          Not even remotely? It doesn’t have some features NTFS does (hard links, for one) but has several that NTFS doesn’t, and is based on a fundamentally different way of performing file operations (Copy-on-Write). You might as well call ZFS “just a FAT clone”.

  6. Florian S. says:

    I’ve never seen it as the file system’s job to compress files on disk, even back in the days when disk space really mattered. (I think it still matters, even in the TB era; all that extra free space led to was wasteful behavior. Same goes for RAM.) That’s what compressed container formats are for, and they are system-independent, too. And you can copy them without decompressing the data first. (I assume that when you copy file-system-compressed data from one disk to another, it needs to be decompressed on the source and recompressed on the target, doesn’t it? Or does Windows pass the compressed data directly?)

  7. Mc says:

    Does the compression just compress the file contents, or the actual file system itself?
    i.e., if you have 8K clusters and write a 4K file that gets compressed to 2K, is there still 6K of wasted, unused space in the cluster that can’t be reused for other files?

    1. Richard Wells says:

      IIRC, compression will combine files into a single allocation chunk and save on slack space that way. If the combined size of all the files being compressed is less than 64kB or greater than 50GB, NTFS compression may not be the correct solution. (The compression-unit arithmetic is sketched after the last comment.)

  8. Pilchard123 says:

    I think you might have made a typo in your second paragraph. You refer to values M and N that you are not privy to, by use ‘M%’ twice and never (presumably) ‘N%’. An interesting post, nonetheless.

    Nit-pickery, ho!

    1. Pilchard123 says:

      And, of course, I have typo’d myself; ‘by use’ should be ‘but use’. Muphry’s law strikes again…

  9. Roger says:

    The Windows 10 installer and Anniversary Update installer both refuse to install if the “drive” is compressed. I couldn’t quite work out what it was looking at; I think it only examined the top-level directories and the \WINDOWS tree. It sure was annoying having to decompress to install, and then recompress afterwards.

    So why do I have compression? On a laptop because it is shared with Linux and I wanted Windows as small as possible. On a desktop because it is running off a 2TB spinning drive, and a blazingly fast processor so the less I/O the better.

  10. My main question for these types of requirements is how they might be unit tested. I say this because unit testing for performance requirements (but also comparing the performance among several implementations) has been annoying for me, historically – partly due to the difficulty of taking accurate measurements, but also because of newer technologies like SpeedStepping.

    At the time, I suspect that all the CPUs ran at full speed (fixed clock speed)… so a simple timer and eyeball for CPU use was probably sufficient… but I’d love to hear how the testing would be performed in today’s landscape.

  11. John Vert says:

    Maybe true, but the Alpha AXP decompression was twice as fast as on all the other architectures because it could decompress 64 bits at a time. The other architectures (at the time) were limited to 32-bit chunks.

  12. Mihkel Soomere says:

    Actually, NTFS compression is still very useful, for example on small SSDs where any saved space is a big win.
    Windows 10 seems to have added some new compression algorithms (LZX, XPRESS4K, etc.) in compact.exe; however, there isn’t much documentation on these.
    Anything interesting to tell us about these new algorithms?

  13. alegr1 says:

    Step 1: Make LINK write the PDB into compressed files.
    Step 2: Be surprised that the PDB write now takes a lot of time compared with the rest of LINK.
    Step 3: Find some interns to move the PDB write to a separate thread.
    Step 4: ?
    Step 5: Nothing really changed. Your giant Windows build still takes just as much time.

  14. Ted M says:

    I forgot I had full write access to the entirety of the system drive in a Windows Vista install (it was during my “more admin access is better” idiot phase), and I managed to compress what I assume was the boot loader. That took a while to fix.

  15. Bruce Hoult says:

    I find this very hard to believe. The Alpha had all the normal boolean operations (if not more) and arbitrary arithmetic or logical shifts by literal or dynamic counts. It even had population count and count leading zeroes.

    As far as I’m aware, these were all fast operations, even on the 21064, not bit by bit or something. The manual even says to do bitfield extraction by a left shift followed by a right shift (to make all unwanted bits fall off the end), and sign extension by a left shift to align the value to the left of the register followed by an arithmetic right shift to replicate the sign bit. I don’t think they’d tell you to do this if it was slow.

    The first Alpha didn’t have BYTE loads and stores, only full word, but that doesn’t hamper compression (or other) algorithms when you have fast shifting and masking. Well, and competent programmers, of course. And that certainly would not somehow favour nibble-sized data over arbitrary bit sizes.

  16. Stephen Donaghy says:

    I’m surprised nobody in the comments thus far has mentioned data deduplication. I think a lot of the “reasons why I still use compression” are largely solved by switching data dedup on. Of course, it’s only really available in Windows Server, which is a shame.

    1. alegr1 says:

      Data deduplication only helps if you have a lot of files with identical blocks, for example, a lot of VM virtual disks with Windows images on them. In general client usage scenarios, it just doesn’t happen.

      1. smf says:

        Data deduplication gives you free “sparse files” without the application having to be coded specially.

        There are scenarios other than working with multiple VMs.

      2. MarcK4096 says:

        Windows deduplication does do compression. IMO, there’s no reason to choose NTFS file compression over it.

  17. Martin Ba. _ says:

    “We live in a post-file-system-compression world.” – Others already mentioned it, but at this moment I think we are very much *not*:
    Small form factor devices (the mentioned 32GB tablet) and still-limited SSD sizes, combined with a bazillion locations that do *not* store compressed data (looking at you, C:\Windows\Installer – 10GB vs. 8GB compressed – a 2GB save is a 2GB save on a few hunnerd GB SSD)

    So, the article was nice, but “We live in a post-file-system-compression world.” is just not true. It’s still useful.

  18. smf says:

    “The compression algorithm must be system-independent. In other words, you cannot change the compression algorithm depending on what machine you are running on. Well, okay, you can compress differently depending on the system, but every system has to be able to decompress every compression algorithm.”

    I disagree. Ideally the algorithm and parameters would be selectable by the user; if the user wants to select the option that degrades things so that the disk can be shared with an Alpha AXP, then that’s up to them. The ultimate situation would be that users could install their own compression engines. If you mount a disk on a system without that particular engine, then you’d get a suitable error when you try to access the files (although again, ideally you’d still be able to access the compressed version of the files).
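
Regarding the cluster-slack question in comment 7: NTFS compresses a file in compression units of 16 clusters (64KB with the default 4KB clusters), and each unit's compressed data is stored in a whole number of clusters, so space is reclaimed only a full cluster at a time; the slack inside a cluster is never shared with other files, compressed or not. A rough sketch of the arithmetic, illustrative rather than the file system's actual code:

```c
// Illustrative sketch of NTFS-style compression-unit accounting, not
// the file system's actual code. Assumes the documented layout: a file
// is compressed in units of 16 clusters, and each unit's compressed
// data occupies a whole number of clusters.
#include <stdio.h>

// Clusters a compression unit occupies on disk, given its uncompressed
// size and how small it compressed to.
static unsigned AllocatedClusters(unsigned unitBytes,
                                  unsigned compressedBytes,
                                  unsigned clusterSize)
{
    unsigned raw    = (unitBytes + clusterSize - 1) / clusterSize;
    unsigned packed = (compressedBytes + clusterSize - 1) / clusterSize;
    // The compressed form is kept only if it saves at least one whole
    // cluster; otherwise the unit is stored uncompressed.
    return packed < raw ? packed : raw;
}

int main(void)
{
    // Comment 7's example: 8K clusters, a 4K file compressed to 2K.
    // It occupies one full 8K cluster either way, so nothing is saved.
    printf("%u cluster(s)\n", AllocatedClusters(4 * 1024, 2 * 1024, 8 * 1024));

    // Where compression pays off: a full 64K unit (4K clusters) that
    // compresses to 20K drops from 16 clusters to 5.
    printf("%u cluster(s)\n", AllocatedClusters(64 * 1024, 20 * 1024, 4 * 1024));
    return 0;
}
```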

Comments are closed.
