How do FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS affect how the operating system treats my file?


There are two flags you can pass to the CreateFile function to provide hints regarding your program's file access pattern. What happens if you pass either of them, or neither?

Note that the following description is not contractual. It's just an explanation of the current heuristics (where "current" means "Windows 7"). These heuristics have changed at each version of Windows, so consider this information as a tip to help you choose an appropriate access pattern flag in your program, not a guarantee that the cache manager will behave in a specific way if you do a specific thing.

If you pass the FILE_FLAG_SEQUENTIAL_SCAN flag, then the cache manager alters its behavior in two ways: First, the amount of prefetch is doubled compared to what it would have been if you hadn't passed the flag. Second, the cache manager marks as available for re-use those cache pages which lie entirely behind the current file pointer (assuming there are no other applications using the file). After all, by saying that you are accessing the file sequentially, you're promising that the file pointer will always move forward.

At the opposite extreme is FILE_FLAG_RANDOM_ACCESS. In the random access case, the cache manager performs no prefetching, and it does not aggressively evict pages that lie behind the file pointer. Those pages (as well as the pages that lie ahead of the file pointer which you already read from or wrote to) will age out of the cache according to the usual least-recently-used policy, which means that heavy random reads against a file will not pollute the cache (the new pages will replace the old ones).
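As a rough illustration of how a program might choose between the two explicit hints, here is a minimal C sketch. The access_pattern enum and the access_hint_flags and open_with_hint helpers are hypothetical names made up for this example. Only CreateFileW and the two flag constants come from the Windows SDK, and the fallback #defines exist solely so the selection logic compiles on other platforms.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Fallback definitions so the flag-selection logic compiles off Windows.
   The values match the Windows SDK headers. */
#define FILE_FLAG_SEQUENTIAL_SCAN 0x08000000u
#define FILE_FLAG_RANDOM_ACCESS   0x10000000u
#endif

/* Hypothetical enum: how the caller expects to read the file. */
typedef enum {
    ACCESS_UNKNOWN,     /* pass neither flag and let the system autodetect */
    ACCESS_SEQUENTIAL,  /* front to back in one pass */
    ACCESS_RANDOM       /* scattered reads where prefetch would not help */
} access_pattern;

/* Returns the hint bits to OR into CreateFile's dwFlagsAndAttributes.
   The two hints are never combined. */
static uint32_t access_hint_flags(access_pattern p)
{
    switch (p) {
    case ACCESS_SEQUENTIAL: return FILE_FLAG_SEQUENTIAL_SCAN;
    case ACCESS_RANDOM:     return FILE_FLAG_RANDOM_ACCESS;
    default:                return 0;
    }
}

#ifdef _WIN32
/* Opens an existing file for reading with the chosen hint.
   Returns INVALID_HANDLE_VALUE on failure (call GetLastError for details). */
static HANDLE open_with_hint(const wchar_t *path, access_pattern p)
{
    return CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL | access_hint_flags(p), NULL);
}
#endif
```

Since the flags are advisory, reads through the returned handle should work the same whichever hint is chosen. Only the caching behavior is expected to differ.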

In between is the case where you pass neither flag.

If you pass neither flag, then the cache manager tries to detect your program's file access pattern. This is where things get weird.

If you issue a read that begins where the previous read left off, then the cache manager performs some prefetching, but not as much as if you had passed FILE_FLAG_SEQUENTIAL_SCAN. If sequential access is detected, then pages behind the file pointer are also evicted from the cache. If you issue around six reads in a row, each of which begins where the previous one left off, then the cache manager switches to FILE_FLAG_SEQUENTIAL_SCAN behavior for your file, but once you issue a read that no longer begins where the previous read left off, the cache manager revokes your temporary FILE_FLAG_SEQUENTIAL_SCAN status.
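To make the promotion and revocation rules concrete, here is a toy model in C. It is not the cache manager's actual code. The seq_tracker type, the mode names, and the exact threshold of six are assumptions chosen to mirror the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "around six" continued reads promote the handle.
   The real threshold is an internal detail and may differ. */
#define PROMOTE_AFTER 6

typedef enum { MODE_NONE, MODE_SEQUENTIAL, MODE_VERY_SEQUENTIAL } seq_mode;

typedef struct {
    uint64_t next_expected;  /* offset where the previous read ended */
    int      run_length;     /* consecutive reads that continued the run */
    bool     has_previous;   /* false until the first read is seen */
} seq_tracker;

/* Feeds one read into the tracker and returns the mode the model is in
   after that read. */
static seq_mode seq_tracker_on_read(seq_tracker *t, uint64_t offset,
                                    uint32_t length)
{
    if (t->has_previous && offset == t->next_expected) {
        t->run_length++;     /* this read began where the last one ended */
    } else {
        t->run_length = 0;   /* pattern broken: temporary status is revoked */
    }
    t->has_previous = true;
    t->next_expected = offset + length;

    if (t->run_length >= PROMOTE_AFTER) return MODE_VERY_SEQUENTIAL;
    if (t->run_length >= 1)             return MODE_SEQUENTIAL;
    return MODE_NONE;
}
```

In this model a zero-initialized tracker starts in MODE_NONE, moves to MODE_SEQUENTIAL on the first read that continues the previous one, reaches MODE_VERY_SEQUENTIAL after six such continuations, and falls back to MODE_NONE on any read that starts somewhere else.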

If your reads are not sequential, but they still follow a pattern where the file offset changes by the same amount between each operation (for example, you seek to position 100,000 and read some data, then seek to position 150,000 and read some data, then seek to position 200,000 and read some data), then the cache manager will use that pattern to predict the next read. In the above example, the cache manager will predict that your next read will begin at position 250,000. (This prediction works for decreasing offsets, too!) As with auto-detected sequential scans, the prediction stops as soon as you break the pattern.
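The constant-stride prediction can be sketched in a few lines of C. This is only an illustration of the idea. The predict_next_offset function is a hypothetical name, and using exactly three prior reads to establish the pattern is an assumption, since the number of reads the cache manager actually needs is not stated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Given the starting offsets of the last three reads (oldest first),
   predicts the next starting offset if the reads are evenly spaced.
   Returns false when there is no constant nonzero stride or when the
   predicted offset would fall before the start of the file.
   Overflow at extreme offsets is ignored in this sketch. */
static bool predict_next_offset(int64_t a, int64_t b, int64_t c,
                                int64_t *next)
{
    int64_t stride = b - a;                 /* may be negative */
    if (stride == 0 || c - b != stride) {
        return false;                       /* pattern absent or broken */
    }
    if (c + stride < 0) {
        return false;                       /* would seek before offset 0 */
    }
    *next = c + stride;
    return true;
}
```

With the offsets from the example above (100,000, then 150,000, then 200,000) the function predicts 250,000. A decreasing series works the same way because the stride is simply negative.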

Since people like charts, here's a summary of the above in tabular form:

Access pattern                 Prefetch   Evict-behind
Explicit random                No         No
Explicit sequential            Yes (2×)   Yes
Autodetected sequential        Yes        Yes
Autodetected very sequential   Yes (2×)   Yes
Autodetected linear            Yes        ?
None                           No         ?

There are some question marks in the above table where I'm not sure exactly what the answer is.

Note: These cache hints apply only if you use ReadFile (or moral equivalents). Memory-mapped file access does not go through the cache manager, and consequently these cache hints have no effect.

Comments (39)
  1. Aaron.E says:

    So, if you've triggered the "Autodetected very sequential" condition and then break the pattern, presumably the size of the prefetch gets halved.  What happens to the cached pages that were already prefetched?  Are those immediately marked as 'available for re-use', or do they stick around and just get gradually aged-out?

  2. Patrick Huizinga says:

    So what happens if you pass both the random and the sequential flags?

  3. Nick Lowe says:

    The cache manager, presumably, will not cache for things that are impossible against a file handle, so it is another justification for always requesting only the access rights you need and nothing more. Many people request overly broad access rights, because it's just simpler to think about things that way. (This mentality led to many of the LUA bugs that we see today.)

    @Patrick Huizinga, semantically they are mutually exclusive, so I would have presumed the call to CreateFile will fail. The documentation does not explicitly state this, perhaps because it's just such an obvious contradiction.

    The two flags map to NtCreateFile's FILE_SEQUENTIAL_ONLY and FILE_RANDOM_ACCESS flags. The documentation there also does not state they are incompatible with each other, again, perhaps because it is just so obvious.

    It may be that there is some compatibility behaviour whereby, when both are specified, the system ignores both flags and falls back on its own heuristics. Why don't you try it and see?

  4. Nick Lowe says:

    @Joshua Yes, definitely, and that's implicit – but there is a huge difference between curiosity / discovering an edge case behaviour and then going and depending on it in a 'real' application.

    The only way you could find both flags being set in such an application, if it even works, would be where the programmer was under a profound cloud of confusion.

  5. Adam Rosenfield says:

    I just tried calling CreateFile with FILE_FLAG_SEQUENTIAL_SCAN|FILE_FLAG_RANDOM_ACCESS, and I was surprised to see that CreateFile succeeded, instead of failing with ERROR_INVALID_PARAMETER. So I think Patrick has a valid question in how the file data is cached/prefetched.

    [I assume you're asking purely out of curiosity? Because you can't rely on any of this stuff, as I highlighted at the top of the article. I suspect "something weird" happens in terms of caching and prefetch, since the cache manager probably doesn't really expect you to set both. (While the behavior may be suboptimal, it will still be correct, since these flags are purely advisory.) -Raymond]

  6. Nick Lowe says:

    I am certainly curious in that, prima facie, there appears to be poor validation of input here. A mutually exclusive set of flags is accepted without ERROR_INVALID_PARAMETER being returned – I agree that what happens behind the scene, considering they are accepted, is a private implementation detail – but it would be nice to know why the call does not fail and what happens currently.

  7. 640k says:

    How do these hints cope with alternate data streams?

  8. Sean McGeough says:

    @640k I would be very surprised if it's not cached independently on a per-data stream basis.

  9. Falcon says:

    I would actually not expect CreateFile to fail if both flags are passed. These flags are caching hints only; they do not affect the ability to perform operations on the file and they do not mandate any particular behaviour. Similarly, if you pass FILE_FLAG_SEQUENTIAL_SCAN and then break the sequential access pattern, the operation should not fail, even though the caching behaviour may be suboptimal. According to MSDN: "However, correct operation is still guaranteed."

    The documentation states that they should not be used together, not that it "must not" be done.

    I agree, though, that any experimental results regarding the effects of these flags should not be relied on. I'm sure there are better ways to deliberately slow down your I/O if that's what you really want!

  10. Sean McGeough says:

    @Falcon At the layer behind CreateFile, NtCreateFile, the flag is FILE_SEQUENTIAL_ONLY – you're telling the cache manager, as the documentation says, "All accesses to the file are sequential.".

    The other flag is FILE_RANDOM_ACCESS, you're telling the cache manager there that "Accesses to the file can be random, so no sequential read-ahead operations should be performed on the file by FSDs or the system.".

    While correct operation is guaranteed if the application tells the cache manager that it's going to use the handle for sequential I/O and then doesn't, that doesn't mean that a contradictory set of flags should be accepted in the call to CreateFile – it's impossible to satisfy both.

  11. Tim says:

    @640k: ADSs are distinct streams. You open each data stream individually with a call to CreateFile with a distinct set of parameters. What else is there to cope with?

  12. alegr1 says:

    Do these flags affect (formerly) dirty page eviction after they've been flushed, in case of a file open for writing?

  13. alegr1 says:

    Re: both flags specified

    Since these flags are advisory only, the only requirement is that if both are specified, it should not cause any pathological or unusual behavior of the cache manager. One possible valid behavior is that both would be just ignored. After all, it's a valid behavior to just ignore any of them at any time, even if only one is specified.

    Also, if such a flag is specified with FILE_FLAG_NO_BUFFERING, it would be ignored. It might be NOT ignored, if FILE_FLAG_NO_BUFFERING doesn't apply to the target device, but I don't know for sure.

  14. Anonymous Coward says:

    What I would like to see is some statistical data on which flags software actually pass and which flags would have yielded best performance.

  15. "What I would like to see is some statistical data on which flags software actually pass and which flags would have yielded best performance."

    Good idea! Please post the results of your test.

  16. hacksoncode says:

    Clearly, the ability to specify both flags is a forward looking design for when quantum computing becomes mainstream.

    Then the cache manager will both pre-fetch and not pre-fetch, as well as both evicting behind and not evicting behind (it might even evict ahead just in case :-).

  17. NT says:

    The use of the word "promising" near the end of the second paragraph is curious.  Surely the "not a contract" nature of these flags works both ways, and passing FILE_FLAG_SEQUENTIAL_SCAN need not actually be a promise as to the way you'll use the file.

    [You are correct. Poor wording on my part. "Indicating" would have been a better choice. -Raymond]

  18. Ian says:

    How do these flags play in with CopyFileEx and the COPY_FILE_NO_BUFFERING flag? I'm assuming CopyFileEx has internal logic around using FILE_FLAG_SEQUENTIAL_SCAN, but I've never really been sure on how the COPY_FILE_NO_BUFFERING flag affects things.

  19. Gabe says:

    NT: Raymond used the term "promise" in the politician's sense of the word. That is, an agreement which will only be kept when convenient.

  20. Joshua says:

    @Nick Lowe, Patrick Huizinga: "Why don't you try it and see?"

    I would recommend *not* depending on the results of that experiment.

  21. cheong00 says:

    On the other hand, the documentation of CreateFile2() in the Windows 8 developer preview msdn.microsoft.com/…/hh449422(v=vs.85).aspx explicitly tells you not to combine FILE_FLAG_RANDOM_ACCESS with FILE_FLAG_SEQUENTIAL_SCAN. So perhaps the documentation folks are aware of it after all.

  22. Nick Lowe says:

    @cheong00 – Good find!

    "Some of these flags should not be combined. For instance, combining FILE_FLAG_RANDOM_ACCESS with FILE_FLAG_SEQUENTIAL_SCAN is self-defeating."

    This implies that the system ignores both when they are combined as they nullify each other.

  23. Dave says:

    I wonder how the caching strategy could change in the future in the presence of SSDs?  Performing prefetching of larger blocks on the assumption that a small seek will end up in prefetched data is OK for mechanical storage, but when seeks are free there might be more optimal caching strategies.  Or does the current cache manager optimise for data reads and not bother too much about potential latency issues?

  24. Neil says:

    This reminds me of the read behaviour of WordPerfect, which apparently typically started at the end of the file and worked backwards. Hopefully this would be autodetected as linear with decreasing offsets.

  25. 640k says:

    @Dave: Ram disks have been around forever. On those seeks are even more free. On those I assume you should *not* hint at anything, because the cache manager is probably too stupid to take this into consideration. And maybe disable buffering also.

  26. AndyCadley says:

    @Neil: There was a Channel 9 video on the Cache Manager a few years back and I'm pretty sure they mentioned that backwards sequential access would also be detected and optimised for. Naturally though there's no guarantee that behaviour (or any other) still applies.

  27. Tony Mach says:

    And then I read the Note at the end of this post…

  28. Alex G says:

    By the way, CreateFile2 has a very important feature:

    "exclusive access to a file or directory is only granted if the Metro style app has write access to the file". This makes sure an unprivileged application is not able to block access to a system file. I'd love to have this extended to regular CreateFile.

  29. Nick Lowe says:

    Requiring the caller to have the ability to be granted the FILE_WRITE_DATA access right, or to have the SE_RESTORE_NAME privilege held, to be able to take an exclusive lock of some sort seems very sensible. It should not, obviously, require FILE_WRITE_DATA to be explicitly asked for in the desired access parameter though.

    Unless I am missing something, which is very possible!, I cannot see why it should not be applied to CreateFile et al.

  30. Nick Lowe says:

    Additionally, I would have thought to permit a lock for deletion / renaming, the caller should have the ability to be granted the DELETE standard access right or the call should fail.

  31. Engywuck says:

    If I interpret this article correctly, *not* passing any flags should generally yield the best results, because the "linear hopping" case is only available that way. So unless you are absolutely sure you have sequential (or random) access, not passing any flag costs only a slight performance penalty at the beginning, and it gives you the best results even when your program behaves differently than your expectations.

  32. Joseph Koss says:

    I do not believe there to be many cases where you wouldn't know for certain that a file will be sequentially accessed. Either you are processing the whole file in one pass, or you aren't.

  33. GregM says:

    Joseph, what if you open a file and then pass it to a third-party library to read from it?  You now have no idea whether the file is processed in one pass or not, unless that library tells you what it plans to do.

  34. dave says:

    If your program behaves differently than/from/to(*) your expectations with respect to whether it's reading a file sequentially or not, I suspect you have bigger problems to worry about.

    (*) probably a case of "two countries separated by a common language"

  35. Joseph Koss says:

    GregM, are you suggesting that you do that often enough for it to be called 'many cases'?

  36. GregM says:

    Are you suggesting that the cases where I do that aren't significant?  What exactly was the point of your comment?

  37. Joseph Koss says:

    GregM, I'm suggesting that you made a counter-point (that had already been made many times) to a point never made within my post. 'What ifs' are not 'many cases' but since you now even refuse to entertain that notion, have a nice day.

  38. GregM says:

    Joseph, you said "Either you are processing the whole file in one pass, or you aren't."  The counterpoint was that there is a third option, "you don't know".  Another possible option is "you are currently doing it, but it may change in the future."  If that wasn't the point of your comment, what was?  Obviously if you know you are doing sequential, you pass it.  If you know that you aren't, then you don't.  Were you trying to say something beyond that, because that point had already been made too?

  39. Stefan Kuhr says:

    Do these two flags only give hints to the cache manager of the local machine when calling CreateFile for a file on a local hard disk or do they also get transmitted to a remote machine when calling CreateFile for a file on a remote share so the cache manager of the remote box can optimize its behaviour for a remote client?
