Flushing your performance down the drain, that is


Some time ago, Larry Osterman discussed the severe performance consequences of flushing the registry, which is a specific case of the more general performance catch: Flushing anything will cost you dearly.

A while back, I discussed the high cost of the “commit” function, and all the flush-type operations turn into a commit at the end of the day. FlushViewOfFile, [see correction below] FlushFileBuffers, RegFlushKey, they all wait until the data has been confirmed written to the disk. If you perform one of these explicit flush operations, you aren’t letting the disk cache do its job. These types of operations are necessary only if you’re trying to maintain transactional integrity. If you’re just flushing the data because “Well, I’m finished so I want to make sure it gets written out,” then you’re just wasting your (and the user’s) time. The data will get written out, don’t worry. Only if there is a power failure in the next two seconds will the data fail to get written out, but that’s hardly a new problem for your program. If the power went out in the middle of the call to FlushFileBuffers (say, after it wrote out the data containing the new index but before it wrote out the data the index points to), you would’ve gotten partially-written data anyway. If you’re not doing transactional work, then your call to FlushFileBuffers didn’t actually fix anything. You still have a window during which inconsistency exists on the disk.
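
To make the anti-pattern concrete, here's a hypothetical sketch (the function and file names are invented for illustration); the commented-out flush is the line that turns a cheap cached write into a synchronous trip to the platters:

    #include <windows.h>

    // Hypothetical settings-save function. Letting the disk cache do its
    // job means just writing and closing; the data will get written out.
    BOOL SaveSettings(const void *data, DWORD cb)
    {
        HANDLE h = CreateFile(TEXT("settings.dat"), GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return FALSE;

        DWORD cbWritten;
        BOOL ok = WriteFile(h, data, cb, &cbWritten, NULL) && cbWritten == cb;

        // Don't do this "just to be sure": it blocks until the data is
        // confirmed on physical media, and buys you nothing unless you
        // are maintaining transactional integrity.
        // ok = ok && FlushFileBuffers(h);

        CloseHandle(h);
        return ok;
    }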

Conclusion: View any call to FlushViewOfFile, [see correction below] FlushFileBuffers, and RegFlushKey with great suspicion. They will kill your program’s performance, and even in the cases in which you actually would want to call it, there are better ways of doing it nowadays.

More remarks on that old TechNet article: The text for the Enable advanced performance check box has been changed in Windows 7 to something that more accurately describes what it does: Turn off Windows write-cache buffer flushing on the device. There’s even explanatory text that explains the conditions under which it would be appropriate to enable that setting:

To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure.

Hard drives nowadays are more than just platters of magnetic media. There’s also RAM on the hard drive circuit board, and this RAM is used by the hard drive firmware as yet another buffer. If the drive is told, “Write this data to the hard drive at this location,” the drive copies the data into its private RAM buffer and immediately returns a successful completion code to the operating system. The drive then goes about seeking the head, looking for the sector, and physically writing out the data.

When your program issues a write command to the file system (assuming that file system buffering is enabled), the write goes into the operating system disk cache, and periodically, the data from the operating system disk cache is flushed to the hard drive. As we saw above, the hard drive lies to the operating system and says “Yeah, I wrote it,” even though it hasn’t really done it yet. The data the operating system requested to be written is just sitting in a RAM buffer on the hard drive, which in turn gets flushed out to the physical medium by the hard drive firmware.

If you call one of the FlushBlahBlah functions, Windows flushes out its disk cache buffers to the hard drive, as you would expect. But as we saw above, this only pushes the data into the RAM buffer on the hard drive. Windows understands this and follows up with another command to the hard drive, “Hey, I know you’re one of those sneaky hard drives with an internal RAM buffer. Yes, I’m talking to you; don’t act all innocent like. So do me a favor, and flush out your internal RAM buffers too, and let me know when that’s done.” This extra “I know what you did last summer” step ensures that the data really is on physical storage, and the FlushBlahBlah call waits until the “Okay, I finished flushing my internal RAM buffer” signal from the hard drive before returning control to your program.
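
For the rare case in which you genuinely need that guarantee before proceeding (a transaction log, say), a minimal sketch looks like this (the function name is invented); the single FlushFileBuffers call is what triggers both the operating-system flush and the follow-up “flush your internal RAM buffer too” command:

    #include <windows.h>

    // Hypothetical transaction-log append: flush only at the commit point.
    BOOL AppendLogRecord(HANDLE hLog, const void *rec, DWORD cb)
    {
        DWORD cbWritten;
        if (!WriteFile(hLog, rec, cb, &cbWritten, NULL) || cbWritten != cb)
            return FALSE;

        // Blocks until Windows has emptied its disk cache for this file
        // *and* the drive has acknowledged flushing its internal buffer
        // (unless the drive lies, as several commenters note below).
        return FlushFileBuffers(hLog);
    }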

This extra “flush out your internal RAM buffer too” command is the right thing to do, but it can safely be skipped under very special circumstances: Consider a hard drive with a power supply separate from the computer which can keep the drive running long enough to flush out its internal RAM, even in the event of a sudden total loss of external power. For example, it might be an external drive with a separate power supply that is hooked up to a UPS. If you have this very special type of set-up, then Windows doesn’t need to issue the “please flush out your internal RAM buffers too” command, because you have a guarantee that the data will make it to the disk no matter what happens in the future. Even if a transformer box explodes, cutting off all power to your building, that hard drive has enough residual power to get the data from the internal RAM buffer onto the physical medium. Only if your hard drive has that type of set-up is it safe to turn on the Turn off Windows write-cache buffer flushing on the device check box.

(Note that a laptop computer battery does not count as a guarantee that the hard drive will have enough residual power to flush its RAM buffer to physical media. You might accidentally eject the battery from your laptop, or you might let your battery run down completely. In these cases, the hard drive will not have a chance to finish flushing its internal RAM buffer.)

Of course, if the integrity of your disks is not important, then go ahead and turn the setting on even though you don’t have a battery backup. One case where this may be applicable is if you have a dedicated hard drive you don’t care about losing if the power goes out. Many developers on the Windows team devote an entire hard drive to holding the files generated by a build of the operating system. Before starting a build, they reformat the drive. If the power goes out during a build, they’ll just reformat the drive and kick off another build. In this case, go ahead and check the box that says Enable advanced performance. But if you care about the files on the drive, you shouldn’t check the box unless you have that backup power supply.

Comments (30)
  1. Marquess says:

    That last link is disconcerting. That guy is a Microsoft employee, right? And he casually recommends turning on a potentially dangerous setting without explaining the risks?

    That explains a lot.

    [I think you're reading too much into the identity of the person's employer. It's not like he's providing official technical support. It's just a blog. There's a lot of bad information in blogs. Including this one. -Raymond]
  2. Maurits says:

    The transactional sword cuts both ways.  If you're writing a database engine, then you may want to be really-super-positive that the data hit the disk before you report success back up the chain, so you would want to flush as far down as you could.

    On the other hand, when I was setting up an SMTP anti-spam gateway, my permanent data store was the downstream SMTP server; so, far from flushing disk writes, I actually mounted my mail spool directory in RAM to increase performance.  Power loss at any point would just result in a dropped SMTP connection, with no consummation of the transaction; in this case transactionality allows me to flush less.

  3. Someone You Know says:

    I find myself wondering if there are hard drives out there that ignore that command to flush their internal RAM, and lie to the operating system that they've done it. I can't immediately think of why the drive would want to do that, but it seems like the sort of thing we hear about on this blog occasionally — like that video device that lied about which graphics capabilities it supported.

  4. Adrian says:

    I used to work on hard drive firmware.  Some hard drives lie even when you send the "and flush your RAM cache, too" message.  If you have one of these drives, and you're trying to be transactional, you're wasting your effort.

  5. Arthur Strutzenberg says:

    What would happen if the machine in question were a virtual machine (Hyper-V based) and the "drive" were a VHD file?

  6. Mystic says:

    What about improper shutdown due to the system hanging? This happens to every computer once a year or so. Should the setting be enabled then?

    [Not sure what you're asking about here. If the system hangs, then there's plenty of time for those buffers to get flushed out since the hard drive still has power. -Raymond]
  7. ERock says:

    @Adrian:

    Would you happen to be able to provide a source that names names and lists offenders? This disturbs me greatly… I would hope for the price overhead of enterprise hard drives that they follow the Operating System / RAID controller's orders.

  8. Aram Hăvărneanu says:

    @Someone You Know: "I find myself wondering if there are hard drives out there that ignore that command to flush their internal RAM, and lie to the operating system that they've done it."

    A lot of consumer-grade drives do this; it's one of the reasons Apple dropped ZFS support in Snow Leopard. ZFS requires the flush command to truly do what it claims in order to guarantee transactional consistency (okay, all file systems require this, but because of ZFS's architecture the problem was much more severe). A lot of consumer-grade hard drives and USB memory sticks fail to implement this command properly.

    For server-grade hardware this problem doesn't exist, therefore SUN/Oracle can safely use ZFS on their servers.

  9. Alexandre Grigoriev says:

    In the old Windows 98 days, there was a problem with IDE hard drives from a Three Letter Name company that even warranted a hotfix from Microsoft. The drives in question didn't flush their cache quickly enough before system powerdown. I suspect they ignored the flush command altogether.

  10. Anon says:

    "FlushViewOfFile, FlushFileBuffers, RegFlushKey, they all wait until the data has been confirmed written to the disk."

    The MSDN docs for FlushViewOfFile appear to explicitly state that it does _not_ wait:

    "Flushing a range of a mapped view initiates writing of dirty pages within that range to the disk. Dirty pages are those whose contents have changed since the file view was mapped. The FlushViewOfFile  function does not flush the file metadata, and it does not wait to return until the changes are flushed from the underlying hardware disk cache and physically written to disk. To flush all the dirty pages plus the metadata for the file and ensure that they are physically written to disk, call FlushViewOfFile and then call the FlushFileBuffers function."

    [Thanks for the correction. Fixed. -Raymond]
  11. Lars says:

    Does the Windows Logo device driver certification process ensure that hard drives don't lie about flushing their RAM cache?

  12. Retro says:

    They removed Raymond's wikipedia entry! :o

  13. James Day says:

    @Aram, you're too optimistic. A few years back I had the dubious pleasure of making Slashdot when two brands of battery-backed RAID controller, one SATA, one SCSI, both failed to turn hard drive write buffering off. One used to turn it off but removed the feature because it slowed them down. Expect vendors to cheat and lie about this, and test it yourself with real power cycling to prove the data is there.

    For data center people, even with a UPS and generator, it's usually going to be wrong to disable flushing. That's because someone, someday, is sure to hit the emergency power off button, and that's generally required by fire codes to cut power immediately, even UPS and generator power. Or a UPS service engineer working on your nice redundant UPS system will turn off the live feed instead of the one being serviced and you'll have an outage even with an otherwise working UPS. Or one of your power feeds will have a failure and you'll find your second one responds to the surge by failing as well. That's both of your redundant power supplies without power. Sad.

    Count on losing power at the wrong moment. But maybe I'm just an overly jaded database person…

    Retro, one person had already decided that it should be kept but a second person deleted it without further discussion, breaching Wikipedia policies (and hiding what he'd done from anyone who wasn't an administrator there) along the way. It may come back sometime, though I assume that Raymond still has a preference that it doesn't. Raymond is inconveniently close to the border between being sufficiently and insufficiently notable.

  14. Poochner says:

    @James Day, or the batteries in the UPS just go bad, and it drops even while mains power is on…

    It probably didn't help that I kept my coffee warmer on it, though.

  15. Martin Langhoff says:

    Raymond (or some other competent Windows programmer) — can you provide some info on "even in the cases in which you actually would want to call [fsync()], there are better ways of doing it nowadays".

    In modern Windows-land, how do you get cost-effective atomicity?

    In Linux-land, people who write something like a database use fsync() on their transaction log (hence you need to put that txlog on a separate disk for good performance). People wanting atomicity on file writes/updates write to a tmpfile (in the same fs as the destination file) and then rename/mv the file to its final name.

    Actually, according to POSIX you must write(); fsync(); close(); rename() for the promise to hold (fsync() needs the still-open descriptor), and if that mountpoint has lots of dirty buffers the fsync() will be costly. However, ext3 has forever made write(); close(); rename() atomic (so you'd get the old file, or the new file, but never a corrupt file) without incurring the heavy fsync() costs. This has turned out to be so efficient and practical that everyone uses it (often without realizing). Without being POSIX, it's become what a Linux FS is expected to do (see ext4, BtrFS).
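
    In code, the pattern I mean is roughly this (a sketch; the file names and buffer are placeholders):

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Write buf/len to "file" atomically: temp file on the same
           filesystem, fsync() the open descriptor, rename() into place
           so readers see the old file or the new one, never a torn one. */
        int save_atomically(const void *buf, size_t len)
        {
            int fd = open("file.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) return -1;
            if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
                close(fd);              /* fsync() is the expensive part */
                return -1;
            }
            close(fd);
            return rename("file.tmp", "file");
        }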

    (In case it isn't obvious, this is a Linux programmer asking about Windows low-cost-atomicity tricks. TONT is a great read for someone who last saw Windows more than 10 years ago but is curious as to whether there's intelligent life on that planet. Thanks Raymond!)

  16. Maurits says:

    "I'm just an overly jaded database person"

    It is not possible for a database person to be overly jaded.  It is, perhaps, theoretically possible, though I have never seen it, for a database person to one day become /sufficiently/ jaded.

  17. AndyC says:

    @Martin Langhoff, I'd presume Raymond is suggesting you probably want to use Transactional NTFS or, at the very least, some other service built on top of the kernel transaction manager.
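
    For the plain save-a-file case, a Win32 analogue of the rename trick Martin describes might look like this rough sketch (names invented, not official guidance; ReplaceFile assumes the destination already exists, otherwise MoveFileEx with MOVEFILE_REPLACE_EXISTING works):

        #include <windows.h>

        // Write to a temp file next to the destination, flush it, then
        // swap it into place; ReplaceFile keeps the original's attributes.
        BOOL SaveAtomically(LPCTSTR pszDest, const void *data, DWORD cb)
        {
            TCHAR szTemp[MAX_PATH];
            lstrcpyn(szTemp, pszDest, MAX_PATH - 5); // leave room for ".tmp"
            lstrcat(szTemp, TEXT(".tmp"));

            HANDLE h = CreateFile(szTemp, GENERIC_WRITE, 0, NULL,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
            if (h == INVALID_HANDLE_VALUE) return FALSE;

            DWORD cbWritten;
            BOOL ok = WriteFile(h, data, cb, &cbWritten, NULL)
                   && cbWritten == cb
                   && FlushFileBuffers(h); // the one flush you actually need

            CloseHandle(h);
            return ok && ReplaceFile(pszDest, szTemp, NULL, 0, NULL, NULL);
        }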

  18. MItaly says:

    [ In the old Windows98 days, there was a problem with IDE hard drives from Three Letter Name company, that even warranted a hotfix from Microsoft. The drives in question didn't flush their cache before system powerdown soon enough. I suspect they ignored the flush command altogether. ]

    Uhu, I remember it perfectly; the fix just waited a few seconds before actually powering off the machine.

  19. Doug says:

    @Aneurin Price

    As you mentioned, ZFS does things transactionally. If the disk lies about the data for transaction #9's commit record reaching the disk surface, ZFS might start writing metadata from transaction #10 before the commit record for transaction #9 is actually saved. If power fails, recovery will involve rolling back transaction #9 (since it isn't recorded as committed), but the data from transaction #10 is not rolled back because according to the transaction logs, transaction #10 never even started. The result is file system corruption. CRC checking won't help. The problem is in the transaction system itself, so that won't help either. Copy-on-write applies to the data, but the error occurred in the filesystem metadata, which is not copy-on-write. ZFS does some very advanced things, but these operations depend heavily on the accuracy of the transaction history data.

    NTFS suffers from the same problem. Out-of-order writes can cause the filesystem transaction history to be invalid, making it possible to corrupt filesystem metadata.

  20. Aneurin Price says:

    @Aram Hăvărneanu

    That sounds highly implausible. ZFS is actually *more* resilient to this kind of problem than most filesystems because it a) is a copy-on-write fs and b) checksums blocks. In the case that the drive lied, the checksum will be incorrect and it can roll back to the previous transaction whose checksum validates (which will probably be a few seconds earlier).

  21. Cheong says:

    @Doug: Really? Thanks for the information.

    I've always thought that the inode location data (which transaction #10 writes to) would only be updated with the transaction record, so such a thing shouldn't happen. An "insert"-"update inode map"-"mark as update completed" transaction record would have protected against failure in that scenario.

  22. DWalker59 says:

    I *do* care about the files on my server's drives, and the server is connected to a UPS.  With regular battery replacement, the UPS hasn't failed in 8 years.  We also have database backups.  The server never hangs either; it's mostly a file server for about 12 users, with a couple of database applications on it.

    So I feel comfortable turning on that "super-advanced performance" checkbox.  I don't like the description "Enabling buggy behavior in Windows Server 2003" at technet.microsoft.com/…/2007.04.windowsconfidential.aspx — I don't think it's quite fair.

  23. Mystic says:

    You mean there's time to flush out the buffers even if a BSOD or complete freeze due to some hardware failure or memory corruption occurs?

  24. ender says:

    Mystic: we're talking about the drive's internal cache here, so yes, there is time; even in the case of a BSOD, the drive is still powered on.

  25. Karl says:

    Lots of drives actually lie about this, even high-end SCSI drives. It doesn't matter if you use a RAID card with a battery backup — the drive will lie to the RAID card as well.

    Luckily, there's a tool to see if your disks are lying: brad.livejournal.com/2116715.html (see brad.livejournal.com/2094221.html for some background… this caused a major outage for LiveJournal after an EPO shutoff at Fisher Plaza).

  26. Steve Thresher says:

    Hi Raymond,

    Like Martin, I'd also love to hear about the 'better ways of doing it nowadays' as we currently use FlushFileBuffers in an attempt to guarantee disk writes.

  27. rs says:

    This is related to the previous comment. Isn't FlushFileBuffers the proper way of checking whether a file has been successfully written? If you do not call FlushFileBuffers before finally closing a file, how can you avoid the following scenario:

    (1) WriteFile places the data into the cache and returns a nonzero value.

    (2) The program closes the file. The data is still not yet written to the disk, as described in msdn.microsoft.com/…/aa364451(v=VS.85).aspx

    (3) The connection to the device is lost before the data is written. (Think of a network drive or a USB memory stick.)

    (4) The program has no chance of realizing this. When it exits, the data is lost.

    If, on the other hand, you call FlushFileBuffers before closing the file, and it returns zero, you know that something went wrong, and you can let the user specify another file for saving the data.
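
    Concretely, something like this sketch (the helper name and error strategy are hypothetical):

        #include <windows.h>

        // Flush before closing so a late failure (USB stick yanked,
        // network drive gone) can still be reported to the user.
        DWORD FinishSave(HANDLE hFile)
        {
            DWORD err = ERROR_SUCCESS;
            if (!FlushFileBuffers(hFile))
                err = GetLastError(); // caller can offer another file
            CloseHandle(hFile);
            return err;
        }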

  28. Knut says:

    A colleague of mine happened to have a story about a UPS.

    He was on site to do maintenance on a redundant system. They wanted to update the UPS at that site, so he shut down the backup server and told the electricians they could work on that server's UPS. Unfortunately, someone had replaced the labels on the UPSes and labeled them wrong. It was only a fifty-fifty chance, but the master server was taken down by cutting the power. The server was up and running again 15 minutes later with no important data loss.

    I learned from this not to rely on a UPS. *** happens!

  29. Worf says:

    One of my projects involved a Linux device providing mass storage over USB. We had performance problems, but noted that Windows performance roughly tripled if it didn't set the Force Unit Access bit (the "Optimize for quick removal" setting), which let Linux actually handle caching.

    Because there's no way to keep Windows from setting that bit automatically, we coded the mass storage driver on our device to ignore it. It was safe – our device had a battery, so there was plenty of time to flush the Linux caches (it did so every 5 seconds…).

  30. benjamin says:

    This whole discussion (keeping in mind Adrian's comment, especially) reminds me of Raymond's article about the video card that claimed to support every DirectX call ever made. We've got a function call that flushes to disk and a disk that says 'Okay, flushed.' So then we needed a function that said 'No really, flush your stuff' and disks that can potentially lie. So then we'll need a new function that says "Seriously though, flush your buffers."

    I'd suspect at that point the function would just be a fairly lengthy NOP loop.

    Maybe when super-capacitors become reliable enough and miniaturized, HD manufacturers can stick one on their drives and have a disk that keeps going for a minute or two after power's lost.

Comments are closed.