How does Explorer calculate “Size on disk”?


When you ask to see the size of a file, you get two values. One is the nominal file size (the value that shows up in a directory listing). The other is something called "Size on disk". What is "Size on disk"?

The algorithm for "Size on disk" is as follows:

  • If the file is sparse, then report the number of non-sparse bytes.
  • If the file is compressed, then report the compressed size. The compressed size may be less than a full sector.
  • If the file is neither sparse nor compressed, then report the nominal file size, rounded up to the nearest cluster.
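
The three rules above can be sketched as a small function. This is only an illustration, not Explorer's actual code: the function name, its parameters, and the idea that the caller already knows the non-sparse or compressed byte count (passed as `allocated_bytes`) are all assumptions made for the example.

```python
def size_on_disk(nominal_size, cluster_size, is_sparse=False,
                 is_compressed=False, allocated_bytes=None):
    """Hypothetical sketch of the three rules; not Explorer's real code."""
    if is_sparse or is_compressed:
        # Rules 1 and 2: report the non-sparse or compressed byte count.
        # The caller supplies it here, and it is returned as given.
        if allocated_bytes is None:
            raise ValueError("allocated_bytes is required for sparse or compressed files")
        return allocated_bytes
    # Rule 3: round the nominal size up to a whole number of clusters.
    clusters = -(-nominal_size // cluster_size)  # ceiling division
    return clusters * cluster_size


print(size_on_disk(1, 4096))     # 4096
print(size_on_disk(4097, 4096))  # 8192
print(size_on_disk(1_000_000, 4096, is_compressed=True, allocated_bytes=300))  # 300
```

With an assumed 4096-byte cluster, a one-byte file is charged a full cluster, while the compressed example reports a value smaller than a cluster.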

Each of these values sort of makes sense in its own naïve way.

If a file is not compressed, then it occupies some integral number of clusters, so charge it for the clusters that it uses. This is accurate for FAT file systems, but it is naïve for NTFS, which has multiple stages of file growth. (There's even a stage for very small files, where the file contents aren't stored in a dedicated cluster at all.)

If a file is sparse, then Explorer reports the number of bytes of the file that are taking up space on disk. The spaces filled with virtual zeroes are not reported, since they aren't occupying any disk space. They only take up bookkeeping.
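
To make the sparse case concrete, here is a minimal sketch that adds up the allocated ranges of a file and ignores the holes. The range list is made up for the example and the function name is hypothetical. On Windows a real range list could come from something like `FSCTL_QUERY_ALLOCATED_RANGES`, but nothing here claims that Explorer computes its value this way.

```python
def non_sparse_bytes(allocated_ranges):
    """Sum the lengths of (offset, length) ranges that hold real data."""
    return sum(length for _offset, length in allocated_ranges)


# Hypothetical 1 GiB sparse file with only two 64 KiB regions written.
nominal_size = 1 << 30
ranges = [(0, 65536), (512 * 1024 * 1024, 65536)]

print(nominal_size)              # 1073741824
print(non_sparse_bytes(ranges))  # 131072
```

The gap between the two numbers is the run of virtual zeroes, which costs bookkeeping but no data clusters.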

Furthermore, the "Size on disk" is naïve because it doesn't take into account any metadata for the file: the space on disk used to store the file name, last-modified time, the file size, the security information, and where on the disk the file contents can be found. And then there are volume journal entries, volume snapshots, and other things which the file contributes to by its mere existence. None of those are captured in the "Size on disk".

The upshot is that the Size on disk value reported by Explorer tries to say something that makes sense, based on context.

Bonus chatter: Starting in Windows 8.1, the Size on disk calculation includes the sizes of alternate data streams and sort-of-kind-of tries to guess which streams could be stored in the MFT and not count them toward the size on disk. (Even though they really are on disk. I mean, if they're not on disk, then where are they?)
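
As a rough picture of that behavior, the sketch below sums per-stream sizes and skips any stream small enough that it is guessed to live in the MFT. Everything specific here is an assumption for illustration: the article does not say how Explorer makes its guess, so `resident_guess_limit`, the function names, the stream names, and the sizes are all hypothetical.

```python
def round_up_to_cluster(size, cluster_size):
    """Round size up to a whole number of clusters."""
    return -(-size // cluster_size) * cluster_size


def size_on_disk_with_streams(stream_sizes, cluster_size, resident_guess_limit):
    """Hypothetical sketch: count every stream except guessed-resident ones."""
    total = 0
    for size in stream_sizes.values():
        if size <= resident_guess_limit:
            # Guess: small enough to be stored in the MFT, so not counted.
            continue
        total += round_up_to_cluster(size, cluster_size)
    return total


# Made-up streams: the unnamed main stream plus two alternate data streams.
streams = {"": 10_000, ":Zone.Identifier": 26, ":thumbnail": 5_000}
print(size_on_disk_with_streams(streams, 4096, resident_guess_limit=700))  # 20480
```

The 26-byte stream is left out of the total even though, as noted above, it really is on disk.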

Comments (15)
  1. pc says:

    Clearly the solution is to have several more values in the Explorer properties, such as:
    • Number of bytes that the MFT dedicates to this file
    • Number of bytes that would be freed if this specific file were deleted
    • Number of bytes that would be freed if this file and all hard links pointing to the same file were all deleted
    • Number of bytes on disk that would need to be overwritten in order to “destroy” the data in the file.
    • How many bytes it would take to send over a network

    And I’m sure you could add more for maximum confusion.

    1. Erik F says:

      Alternatively, Explorer could just say “The selection uses between -infinity and infinity bytes”, just to cover all possible cases (no file system I know of allows for negative sizes, but remember: future-proofing!) :-)

      1. pc says:

        I could imagine a file system (though I don’t know if any exist in practice) where the amount of space freed by deleting a tiny (or zero-data-length) file is negative, because the transaction log or equivalent would need to grow in order to record the deletion, more so than the space freed by the file entry. Thus, deleting the file would cause less free space on the disk, which could be thought of as a negative size by some of these definitions.

        1. Cesar says:

          @pc: That can happen on some filesystems with copy-on-write snapshots, or some layered filesystem scenarios, where to erase a file in the top layer means actually adding a “pretend this file doesn’t exist” note. Of course, the file isn’t being actually deleted in these cases.

          And there’s also btrfs, which for a while was infamous for running out of space (ENOSPC) in surprising situations, like when deleting a file. (It got much better later, and is nowadays quite usable.)

    2. Dennis says:

      It won’t be a solution for hardlinks / symlinks and junctions?

      We all know hardlinks are evil.

  2. Wyatt says:

    Typo:

    Thre’s even

  3. Zan Lynx' says:

    Explorer gets this completely wrong for remote file systems too. It’s off by over 4 GB too much on 4,148 files.

    Of course, it would be a bit much to expect it to know about EXT4 or BTRFS over Samba.

    1. Joshua says:

      Ugh. There’s a system call for getting the number of blocks used. Use it.

  4. henke37 says:

    Personally I would have answered “It asks the file system.”. It’s funny to make people ask the correct question.

  5. DWalker says:

    “If the file is sparse, then report the number of non-sparse bytes.” Not rounded up to the nearest cluster size?

  6. alegr1 says:

    GetFileInformationByHandleEx:
    FILE_STANDARD_INFO::AllocationSize for non-compressed-non-sparse files.
    GetCompressedFileSize for compressed and sparse files.

    Note that AllocationSize may be the same as GetCompressedFileSize returns, in which case FILE_STANDARD_INFO is one stop shop.

  7. Antonio Rodríguez says:

    Wouldn’t it be simpler to just return the number of clusters used times the cluster size? It would cover the three mentioned cases and many more (like “how much space would I free if I delete this file?”, as mentioned by pc).

    Apple’s SOS, released in 1980, implemented a relatively advanced file system which allowed sparse files and alternate data streams (both ProDOS and GS/OS built on this functionality), but reported every block (SOS’ and ProDOS’ term for “cluster”) used by the file, including the index and superindex blocks. Thus, you always knew both the exact length of the file in bytes, and how much space it took on disk (both numbers were reported in a directory listing).

  8. MarcK4096 says:

    Size on Disk is also “influenced” by Windows Deduplication. In the current release, it’s a good marker of whether the file has been processed by the dedup engine. If it has been, then Size on Disk will be very obviously artificially low. Some of the files I am checking now are actually showing “0 bytes”.

    1. ender says:

      I’ve found one file manager that shows all files that have been deduped as “Symlink”. I don’t know what API it’s using to get that, and interestingly, the behaviour only happens in v2 of the program – v3 shows file sizes like you’d expect.

  9. James Curran says:

    Interesting that you posted about this, as just a couple of weeks earlier, I came upon this discussion:
    http://www.uglyhedgehog.com/t-380798-1.html

    Note that Ugly Hedgehog is a photography forum, so most members are either Mac users or Windows non-Power users.

    The problem the poster reports is a 4TB NAS with 900GB of files (probably mostly jpgs) which reports a “Size on Disk” of 24TB!

Comments are closed.
