When should I use the FIND_FIRST_EX_LARGE_FETCH flag to FindFirstFileEx?

Windows 7 introduces a new flag to the FindFirstFileEx function called FIND_FIRST_EX_LARGE_FETCH. The documentation says that it "uses a larger buffer for directory queries, which can increase performance of the find operation." This is classic MSDN-style normative documentation: It provides "just the facts". Far be it from MSDN to tell you how to write your application; the job of function-level documentation is to document the function. If you want advice, go see a therapist.

If you're calling FindFirstFileEx to enumerate the entire directory and look at every entry, then a large buffer is a good thing because it reduces the number of round trips to the underlying medium. If the underlying medium is a network drive halfway around the world, the latency will be high, and reducing the number of calls reduces the overall cost of communication. Another high-latency case is enumerating from an optical drive, since those tend to be slow to cough up data, and once you get the medium spinning, you want to get all the information you can before the drive spins the medium back down. On the other hand, if your underlying medium has low latency, then there isn't much benefit to using a large buffer, and it can be a detriment if the channel is low bandwidth, because transferring that large buffer will take a long time, which can result in long pauses on your UI thread.
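To make the round-trip argument concrete, here is a back-of-the-envelope model in C. It is only a sketch: the function name, the entries-per-fetch figures, and the 150 ms latency are made-up illustration values, not measurements, and the model deliberately ignores transfer time, which is the very cost that hurts on a low-bandwidth channel.

```c
#include <assert.h>

/* Rough model: every fetch costs one round trip to the medium.
   Ignores bandwidth, caching, and per-entry processing. */
unsigned long estimated_wait_ms(unsigned long entries,
                                unsigned long entries_per_fetch,
                                unsigned long round_trip_ms)
{
    unsigned long round_trips;

    assert(entries_per_fetch > 0);
    /* Ceiling division: a partially filled buffer still costs a round trip. */
    round_trips = (entries + entries_per_fetch - 1) / entries_per_fetch;
    return round_trips * round_trip_ms;
}
```

With these hypothetical numbers, 10,000 entries at 50 entries per fetch and 150 ms per round trip come to 200 round trips, or 30 seconds of waiting. At 500 entries per fetch, the same directory takes 20 round trips, or 3 seconds.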

But what if you aren't enumerating with the purpose of reading the entire contents but rather are going to abandon the enumeration once you get the answer to your question? For example, maybe your function wants to enumerate the directory to see if it contains more than ten files. Once the tenth call to FindNextFile succeeds, you're going to abandon the enumeration. In this case, a large buffer means that the underlying medium is going to do work that you will end up throwing away.

Here's the above discussion summarized in a table, since people seem to like tables so much.

Scenario                                              Use FIND_FIRST_EX_LARGE_FETCH?
Enumerating entire directory, on UI thread            No¹
Enumerating entire directory, on background thread    Yes
Abandoning enumeration prematurely                    No

¹Actually, if you're on a UI thread, you should try to avoid any directory enumeration at all.
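The table can be turned into a small helper, sketched below in C together with the "more than ten files" style of early exit. The enum, the function names, and the return-value convention are hypothetical choices made for illustration; only FindFirstFileExW, FindNextFileW, FindClose, and the flag itself come from the Windows API. The flag's value is repeated from the SDK headers so that the selection helper also compiles with older SDKs or on other platforms.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

#ifndef FIND_FIRST_EX_LARGE_FETCH
/* Value from the Windows SDK headers, repeated as a fallback. */
#define FIND_FIRST_EX_LARGE_FETCH 0x00000002
#endif

/* Hypothetical names for the three rows of the table. */
typedef enum {
    ENUMERATE_ALL_ON_UI_THREAD,
    ENUMERATE_ALL_ON_BACKGROUND_THREAD,
    ABANDON_EARLY
} enum_scenario;

/* Returns the additional flags to pass to FindFirstFileEx. */
unsigned long extra_find_flags(enum_scenario scenario)
{
    return scenario == ENUMERATE_ALL_ON_BACKGROUND_THREAD
               ? FIND_FIRST_EX_LARGE_FETCH
               : 0;
}

#ifdef _WIN32
/* Returns 1 if the pattern matches more than `limit` entries, 0 if not,
   and -1 if the enumeration could not be started (which includes the
   case where nothing matches). It stops as soon as the answer is known,
   so it does not ask for the large buffer.
   Note: a pattern such as L"C:\\dir\\*" also matches "." and "..",
   and subdirectories are counted along with files. */
int matches_more_than(const wchar_t *pattern, int limit)
{
    WIN32_FIND_DATAW fd;
    HANDLE h;
    int seen = 0;

    assert(pattern != NULL);
    h = FindFirstFileExW(pattern, FindExInfoStandard, &fd,
                         FindExSearchNameMatch, NULL,
                         extra_find_flags(ABANDON_EARLY));
    if (h == INVALID_HANDLE_VALUE) return -1;
    do {
        if (++seen > limit) break;  /* answer known: abandon the rest */
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return seen > limit;
}
#endif
```

A full enumeration on a background thread would use the same loop without the early break and would pass extra_find_flags(ENUMERATE_ALL_ON_BACKGROUND_THREAD) instead.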

Comments (14)
  1. alegr1 says:

    You're suffering from premature optimization here. Unless the buffer is VERY BIG, and contains a thousand entries, it might be worth just always using LARGE_FETCH.

    And please, somebody fix the painfully slow Explorer directory loading in \tsclient. And also stop using ephemeral IDs in the back history in Explorer for \tsclient. All these kernel optimizations with large fetch ain't worth nothing when the application is negating the performance gains.

  2. Preemptive rant: AAH! M$DN SUX EVEN RAYMOND SAYS IT1!!!!11ONE1FRIST

    [Note that I didn't say it was good or bad. I just said that it was classic normative documentation. The classic MSDN style is for reference material to be normative and to put the non-normative content in Overviews. -Raymond]
  3. Joshua says:

    To fix \tsclient, stop opening files to determine their icons.

  4. Paul Z says:

    + massive bonus points for the Eliza link :D

  5. cheong00 says:

    For the record, Eliza can't give advice on when to use FIND_FIRST_EX_LARGE_FETCH.

    I suspect her log maintainer will see an unexpected increase in appearances of this keyword. :P

  6. Anonymous says:

    Does a larger buffer really mean fewer round trips? I always assumed this parameter was just to avoid syscall overhead. I don't know how SMB works but I could imagine a world in which the driver always queried the whole result, and this buffer mechanism is just the way to get it into user mode. (Fun fact: the buffer in question here is a parameter to NtQueryDirectoryFile. I am pretty sure this flag just gives that function a bigger buffer.)

    [The requested buffer size goes out to the I/O manager, which dumps the request on the wire. See MaxDataCount. -Raymond]
  7. MRDucks says:

    I've done a fair amount of testing (on Windows 8 dev machine) with FIND_FIRST_EX_LARGE_FETCH and simply cannot discern any performance improvement when using it. For example, a recursive directory walk against a wireless NAS seems to perform identically whether or not FIND_FIRST_EX_LARGE_FETCH is used. Same result for local file systems, remote NTFS shares, etc.

    Can't imagine what I might be missing here. Have tried rebooting between each test (to clear any potential caching being done), and completely isolating the network paths involved (e.g. there is no traffic contention, etc.) to improve consistency of the results.

    Nothing I try seems to result in a measurable performance difference. What am I missing?

    [Move your NAS to another continent. -Raymond]
  8. RaceProUK says:

    @MRDucks: This is pure speculation, but it's plausible Win8 can silently apply FIND_FIRST_EX_LARGE_FETCH when it's asked to query a NAS.

  9. [Move your NAS to another continent. -Raymond]

    These benchmarks are starting to get horribly expensive! :-/

  10. 640k says:

    Leaky abstractions at their best.

    "Although network file systems like NFS and SMB let one treat files on remote machines as if they were local, the connection to the remote machine may slow down or break, and the file stops acting as if it was local."

  11. Legolas says:

    Whenever I encounter a new flag to an existing call, I always wonder how I should treat the versions from before it was implemented. If I pass this flag on XP, what happens? Should I set up code to pass it only from Windows 7 on (that's what I end up doing out of precaution)?

  12. Csaboka says:

    The only safe solution is not to pass the flag for OS versions that don't know what it means. If the API function you are calling is written defensively, it will reject the unknown flag (i.e. return with an invalid parameter error). You cannot rely on how a given function happens to be implemented, as it may change between OS versions, or when a new patch is installed.

  13. mmarkwitzz says:

    I can confirm the flag does make a difference. I had an application which read the whole disk tree and read some props on each file. It runs much faster with a large buffer.

  14. MRDucks says:

    @mmarkwitzz Would love to hear more details about your tests. Was this a local or network file system? Did you reboot between tests to avoid caching? What time differences did you encounter?
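
The advice in comments 11 and 12 (pass the flag only to OS versions that know what it means) could be sketched in C as below. The helper name is hypothetical, and how the caller obtains the version numbers is left open. The sketch relies only on the flag having been introduced in Windows 7 and on Windows 7 being version 6.1.

```c
#include <assert.h>

#ifndef FIND_FIRST_EX_LARGE_FETCH
/* Value from the Windows SDK headers, repeated as a fallback. */
#define FIND_FIRST_EX_LARGE_FETCH 0x00000002
#endif

/* Drops FIND_FIRST_EX_LARGE_FETCH from `wanted` when the OS version
   (major.minor) predates Windows 7, which is version 6.1.
   All other flags are passed through unchanged. */
unsigned long flags_for_os(unsigned long wanted,
                           unsigned int os_major, unsigned int os_minor)
{
    int knows_large_fetch =
        os_major > 6 || (os_major == 6 && os_minor >= 1);

    return knows_large_fetch
               ? wanted
               : (wanted & ~(unsigned long)FIND_FIRST_EX_LARGE_FETCH);
}
```

On XP (version 5.1) the helper strips the flag, so the call behaves as it did before the flag existed. On Windows 7 and later the flag is passed through.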
