On the various ways of creating large files in NTFS


For whatever reason, you may want to create a large file.

The most basic way of doing this is to use SetFilePointer to move the file pointer to a large position beyond the current end of the file (a position that doesn't exist yet), then use SetEndOfFile to extend the file to that size. The file has disk space assigned to it, but NTFS doesn't actually fill the bytes with zero yet; it does that lazily, on demand. If you intend to write to the file sequentially, then that lazy extension will not typically be noticeable, because it can be combined with the normal writing process (and possibly even optimized out). On the other hand, if you jump ahead and write to a point far past the previous high-water mark, you may find that your single-byte write takes a very long time, because NTFS first has to zero-fill everything between the previous high-water mark and the point you are writing to.
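
As a rough sketch of that first approach (error handling abbreviated, the ExtendFile helper name made up for illustration, and SetFilePointerEx used so the 64-bit offset fits in a single call), it might look something like this:

#include <windows.h>

// Sketch: extend an already-open, writable file to newSize bytes by
// moving the file pointer past the current end of the file and then
// declaring that position to be the new end of the file. NTFS will
// zero the new region lazily, on demand.
bool ExtendFile(HANDLE file, LONGLONG newSize)
{
    LARGE_INTEGER distance;
    distance.QuadPart = newSize;
    if (!SetFilePointerEx(file, distance, nullptr, FILE_BEGIN)) {
        return false;
    }
    return SetEndOfFile(file) != FALSE;
}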

Another option is to make the file sparse. I refer you to the remarks I made some time ago on the pros and cons of this technique. One thing to note is that when a file is sparse, the virtual-zero parts do not have physical disk space assigned to them. Consequently, it's possible for a WriteFile into a previously virtual-zero section of the file to fail with an ERROR_DISK_QUOTA_EXCEEDED error.
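
A minimal sketch of the sparse variation, assuming the handle was opened with write access (the helper name is invented; FSCTL_SET_SPARSE comes from winioctl.h):

#include <windows.h>
#include <winioctl.h>

// Sketch: mark an open file as sparse, then extend it. The extended
// region is "virtual zero" and has no physical disk space assigned to
// it, which is why a later WriteFile into that region can fail if the
// space cannot be allocated at write time.
bool MakeSparseAndExtend(HANDLE file, LONGLONG newSize)
{
    DWORD bytesReturned;
    if (!DeviceIoControl(file, FSCTL_SET_SPARSE, nullptr, 0,
                         nullptr, 0, &bytesReturned, nullptr)) {
        return false;
    }
    LARGE_INTEGER distance;
    distance.QuadPart = newSize;
    return SetFilePointerEx(file, distance, nullptr, FILE_BEGIN) &&
           SetEndOfFile(file);
}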

Yet another option is to use the SetFileValidData function. This tells NTFS to go grab some physical disk space, assign it to the file, and set the "I already zero-initialized all the bytes up to this point" value to the file size. This means that the bytes in the file will contain uninitialized garbage, and it also poses a security risk, because somebody can stumble across data that used to belong to another user. That's why SetFileValidData requires administrator privileges.
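
The sketch below shows one plausible way to wire this up, assuming the handle has GENERIC_WRITE access and the file has already been extended with SetEndOfFile. The helper names are invented, and error handling is abbreviated.

#include <windows.h>

// Sketch: enable the "Perform volume maintenance tasks" privilege
// (SE_MANAGE_VOLUME_NAME), which SetFileValidData requires and which
// is normally granted only to administrators.
bool EnableManageVolumePrivilege()
{
    HANDLE token;
    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES, &token)) {
        return false;
    }
    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    bool ok = LookupPrivilegeValue(nullptr, SE_MANAGE_VOLUME_NAME,
                                   &tp.Privileges[0].Luid) &&
              AdjustTokenPrivileges(token, FALSE, &tp, 0,
                                    nullptr, nullptr) &&
              GetLastError() == ERROR_SUCCESS;
    CloseHandle(token);
    return ok;
}

// Sketch: after the file has been extended to newSize, declare all
// bytes up to newSize as already valid so NTFS skips the lazy
// zero-fill. Whatever was previously on those disk sectors becomes
// readable through the file.
bool MarkValidTo(HANDLE file, LONGLONG newSize)
{
    return SetFileValidData(file, newSize) != FALSE;
}

Note that AdjustTokenPrivileges can only enable a privilege the token already holds, so this path still effectively requires running as an administrator (or as an account granted "Perform volume maintenance tasks").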

From the command line, you can use the fsutil file setvaliddata command to accomplish the same thing.
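
For example, something along these lines (the path and size are placeholders) creates a 4GB file from an elevated prompt and then marks its contents as valid:

fsutil file createnew C:\temp\bigfile.bin 4294967296
fsutil file setvaliddata C:\temp\bigfile.bin 4294967296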

Bonus chatter: The documentation for SetEndOfFile says, "If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined." But I just said that it will be filled with zero on demand. Who is right?

The formal definition of the SetEndOfFile function is that the extended content is undefined. However, NTFS will ensure that you never see anybody else's leftover data, for security reasons. (Assuming you're not intentionally bypassing the security by using SetFileValidData.)

Other file systems, however, may choose to behave differently.

For example, in Windows 95, the extended content is not zeroed out. You will get random uninitialized junk that happens to be whatever was lying around on the disk at the time.

If you know that the file system you are using is being hosted on a system running some version of Windows NT (and that the authors of the file system passed their Common Criteria security review), then you can assume that the extra bytes are zero. But if there's a chance that the file is on a computer running Windows for Workgroups or Windows 95, then you need to worry about those extra bytes. (And if the file system is hosted on a computer running a non-Windows operating system, then you'll have to check the documentation for that operating system to see whether it guarantees zeroes when files are extended.)

[Raymond is currently away; this message was pre-recorded.]

Comments (12)
  1. anonymouscommenter says:

    Would be amusing if it filled w/ ones on some future system.

  2. anonymouscommenter says:

    You both are. "Filled with zeros" is a valid state for "any state".

  3. Dan Bugglin says:

    Joshua: If we see a future hard drive or filesystem design where it is cheaper to initialize with 1s instead of 0s, maybe.

    But then they will probably just swap 1s and 0s so we still have 0s by default.

  4. anonymouscommenter says:

    @The MAZZTer: Welcome to the future.  NAND flash bits are erased to a 1 state, and SSDs are made out of NAND flash.  Though since an SSD does not present a raw view of the underlying flash, I suppose the controller would return zero'd pages for sectors that are erased, rather than returning the raw pages.

  5. Antonio 'Grijan' says:

    Statistically, 0x00 is a lot more frequent than 0xFF. Sections of files or filesystems filled by 0x00 are far more common than sections filled by 0xFF (or any other value or pattern). If in doubt, write a short program that counts the occurrences of each byte value in a given file, and run it on a set of randomly chosen files from your hard drive :-) . That's why a "zero fill" API or control call makes much more sense than a "fill with arbitrary value" one. The 65c02/65816 processors even had an STZ (STore Zero) instruction that let you zero a memory location in just one step.

  6. anonymouscommenter says:

    On Linux, there are two APIs to create/grow files (other than just writing lots of zeros). Both will return zeros for unwritten data (of course).

    'truncate' creates sparse files (that is, write() can fail with an out of space error).

    'fallocate' reserves disk space, but still doesn't physically write to the blocks. There is per-block metadata indicating whether to pretend it is all zeros on read. NTFS should get an API like this :).

  7. anonymouscommenter says:

    Before sparse files came about in NTFS (around since Win2k), you could still make files larger than the filesystem by using file compression (around since NT3.51). It's not nearly as flexible (it's a lot easier to end up with 0s still written to disk), but I was able to get 4GB of files on a 1GB disk to test what happens.

    It turns out that DIR had a very bad routine for attempting 64-bit math, while Explorer used 32-bit math for adding file sizes.

  8. cheong00 says:

    Are the "zero-fill" functionity exist on ReFS then? I can formulate good enquiry to search for this information.

    (I have to ask this because, while FAT may be a thing of the past for a lot of people, ReFS is not. And it has chosen to NOT support a number of NTFS features. So we need clarification here.)

  9. Kevin says:

    @cheong00:

    "FAT may be a thing of past for a lot of people"

    *laughs hysterically*

    If you want a filesystem that works on just about every Turing-complete I/O-capable system in the world, some variety of FAT is pretty much your only option.  It won't be dying any time soon.

  10. cheong00 says:

    @Kevin: Due to FAT32's cons of the file size limit and inefficient storage on large volumes, I think it will be replaced by exFAT at some point.

    If the storage has to be reformatted by a Windows-based system later, it had better support exFAT from the beginning.

  11. anonymouscommenter says:

    cheong: I believe licensing requirements will keep exFAT from ever being nearly as popular outside of Windows as FAT32 is.

  12. anonymouscommenter says:

    I have always thought that what was meant by describing the contents as not defined was that, in circumstances where security does not demand clearing the space, it might not be cleared. For example, if you reduce the file size and then increase it again (without closing it) you might see the old contents rather than zero bytes.

Comments are closed.
