Why does my synchronous overlapped ReadFile return FALSE when the end of the file is reached?


A customer reported that the behavior of Read­File was not what they were expecting.

We have a synchronous file handle (not created with FILE_FLAG_OVERLAPPED), but we issue reads against it with an OVERLAPPED structure. We find that when we read past the end of the file, the Read­File returns FALSE even though the documentation says it should return TRUE.

They were kind enough to include a simple program that demonstrates the problem.

#include <windows.h>

int __cdecl wmain(int, wchar_t **)
{
 // Create a zero-length file. This succeeds.
 HANDLE h = CreateFileW(L"test", GENERIC_READ | GENERIC_WRITE,
               0, nullptr, CREATE_ALWAYS,
               FILE_ATTRIBUTE_NORMAL, nullptr);

 // Read past EOF.
 char buffer[10];
 DWORD cb;
 OVERLAPPED o = { 0 };
 ReadFile(h, buffer, 10, &cb, &o); // returns FALSE
 GetLastError(); // returns ERROR_HANDLE_EOF

 CloseHandle(h);
 return 0;
}

The customer quoted this section from the documentation for ReadFile:

Considerations for working with synchronous file handles:

  • If lpOverlapped is NULL, the read operation starts at the current file position and ReadFile does not return until the operation is complete, and the system updates the file pointer before ReadFile returns.

  • If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and Read­File does not return until the read operation is complete. The system updates the OVERLAPPED offset before Read­File returns.

  • When a synchronous read operation reads the end of a file, Read­File returns TRUE and sets *lpNumberOfBytesRead to zero.

and then added

According to the third bullet point, the Read­File should return TRUE, but in practice it returns FALSE and the error code is ERROR_HANDLE_EOF.

The problem is that there are two concepts in play, and confusingly, both use the word synchronous.

  • A synchronous file handle is a handle opened without FILE_FLAG_OVERLAPPED. All I/O to a synchronous file handle is serialized and synchronous.

  • A synchronous I/O operation is an I/O issued with lpOverlapped == NULL.

The sample program issues an asynchronous read against a synchronous handle. The third bullet point applies only to synchronous reads.

The documentation would have been clearer if it hadn't switched terminology midstream. Phrasing all three bullet points in terms of lpOverlapped removes the ambiguity:

  • If lpOverlapped is NULL, the read operation starts at the current file position and ReadFile does not return until the operation is complete, and the system updates the file pointer before ReadFile returns.

  • If lpOverlapped is not NULL, the read operation starts at the offset that is specified in the OVERLAPPED structure and Read­File does not return until the read operation is complete. The system updates the OVERLAPPED offset before Read­File returns.

  • If lpOverlapped is NULL and the read operation reads the end of a file, Read­File returns TRUE and sets *lpNumberOfBytesRead to zero.
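Code that may issue reads either way can fold the two end-of-file reports into a single outcome. The following is only a minimal sketch, not anything taken from the customer's code: ReadOutcome, ClassifyRead, and ReadAndClassify are hypothetical names, and the sketch assumes a synchronous disk-file handle and a nonzero request size, so that a successful zero-byte read can only mean end of file. The value 38 is ERROR_HANDLE_EOF from winerror.h, repeated so that the classification logic compiles without the Windows headers.

```cpp
#include <cstdint>

// Value of ERROR_HANDLE_EOF from <winerror.h>.
constexpr std::uint32_t kErrorHandleEof = 38;

enum class ReadOutcome { Data, EndOfFile, Failure };

// Hypothetical helper: interpret the result of a ReadFile call.
//  - lpOverlapped == NULL at EOF: ReadFile returns TRUE with zero bytes read.
//  - lpOverlapped != NULL at EOF: ReadFile returns FALSE and
//    GetLastError() returns ERROR_HANDLE_EOF.
// Assumes the caller asked for at least one byte.
ReadOutcome ClassifyRead(bool succeeded, std::uint32_t lastError,
                         std::uint32_t bytesRead)
{
    if (succeeded) {
        return bytesRead == 0 ? ReadOutcome::EndOfFile : ReadOutcome::Data;
    }
    return lastError == kErrorHandleEof ? ReadOutcome::EndOfFile
                                        : ReadOutcome::Failure;
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical wrapper for a handle opened without FILE_FLAG_OVERLAPPED.
// Pass o == nullptr to read at the current file position, or a filled-in
// OVERLAPPED to read at an explicit offset.
ReadOutcome ReadAndClassify(HANDLE h, void* buffer, DWORD size,
                            OVERLAPPED* o, DWORD* bytesRead)
{
    *bytesRead = 0;
    BOOL ok = ReadFile(h, buffer, size, bytesRead, o);
    DWORD error = ok ? ERROR_SUCCESS : GetLastError();
    return ClassifyRead(ok != FALSE, error, *bytesRead);
}
#endif
```

With a helper like this, the customer's sample program would see ReadOutcome::EndOfFile whether or not it passed the OVERLAPPED structure.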

We asked what the customer was doing that caused them to trip over this confusion in the documentation.

The customer's original code opened a file (synchronously) and read from it (synchronously). The customer is parallelizing the computation in a way that will read that single file from multiple threads. A single file pointer is therefore not suitable, because different threads will want to read from different positions.

One idea would be to have each thread call Create­File so that each handle has its own file position. Unfortunately, this won't work for the customer because the sharing mode on the file handle denies read sharing.

The solution they came up with was to open the file synchronously (without FILE_FLAG_OVERLAPPED) but to read asynchronously (by using an OVERLAPPED structure). The OVERLAPPED structure lets you specify where you want to read from, so multiple threads can issue reads against the file position they want.

This solution works, but the customer is concerned because this hybrid model is not well-documented in MSDN. (They found a blog entry that discusses it, but even that blog entry does not discuss what happens in the multithreaded case.) In particular, they are seeing that the end-of-file behavior follows asynchronous rather than synchronous rules.

Any advice you have on how we can pursue this model would be appreciated. Another concern is that since we do not set the hEvent in the OVERLAPPED structure, the file handle itself is used as the signal that I/O has completed, and this will cause problems if multiple I/O's are active simultaneously.

The problem is that the customer confused the two senses of synchronous, one when applied to files and one when applied to I/O operations. Since they opened a synchronous file handle, all I/O operations are serialized and execute synchronously. Passing an OVERLAPPED structure issues an asynchronous I/O, but since the underlying handle is synchronous, the I/O is serialized and synchronous. The customer's code therefore is not actually performing I/O asynchronously; its request for asynchronous I/O is overridden by the fact that the underlying handle is synchronous.

The hybrid model doesn't actually realize any gains of asynchronous I/O. The use of the OVERLAPPED structure merely provides the convenience of combining the seek and read operations into a single call. Since the benefit is rather meager, the hybrid model is not commonly used, and consequently it is not covered in depth in the documentation. (The facts are still there, but there is relatively little discussion and elaboration.)

Based on this feedback, the customer considered switching to using an asynchronous file handle and setting the hEvent in the OVERLAPPED structure so that each thread can wait for its specific I/O to complete. In the end, however, they decided to stick with the hybrid model because switching to an asynchronous handle was too disruptive to their code base. They are satisfied with the OVERLAPPED technique that lets them perform the equivalent of an atomic Set­File­Pointer + Read­File (even if the I/O is synchronous and serialized).
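The combined seek-and-read that the customer settled on might be wrapped up roughly as follows. This is a sketch under stated assumptions rather than the customer's actual code: OffsetParts, SplitOffset, and ReadAt are hypothetical names, the handle is assumed to have been opened without FILE_FLAG_OVERLAPPED, and hEvent is left null on the assumption that ReadFile does not return until the read has completed on such a handle.

```cpp
#include <cstdint>

struct OffsetParts {
    std::uint32_t low;   // goes into OVERLAPPED::Offset
    std::uint32_t high;  // goes into OVERLAPPED::OffsetHigh
};

// Split a 64-bit file offset into the two 32-bit halves that the
// OVERLAPPED structure expects.
constexpr OffsetParts SplitOffset(std::uint64_t offset)
{
    return { static_cast<std::uint32_t>(offset & 0xFFFFFFFFu),
             static_cast<std::uint32_t>(offset >> 32) };
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: read up to `size` bytes starting at `offset`,
// without touching the shared file pointer.
// Returns true on success; reaching end of file counts as success
// with *bytesRead == 0.
bool ReadAt(HANDLE h, std::uint64_t offset, void* buffer,
            DWORD size, DWORD* bytesRead)
{
    OVERLAPPED o = {};  // hEvent stays null; the handle is synchronous
    const OffsetParts parts = SplitOffset(offset);
    o.Offset = parts.low;
    o.OffsetHigh = parts.high;

    *bytesRead = 0;
    if (ReadFile(h, buffer, size, bytesRead, &o)) {
        return true;
    }
    if (GetLastError() == ERROR_HANDLE_EOF) {
        *bytesRead = 0;  // reads with an OVERLAPPED report EOF as a failure
        return true;
    }
    return false;
}
#endif
```

Each worker thread keeps its own position and passes it to ReadAt, so no thread depends on the file pointer stored in the shared file object. The reads are still serialized, as described above; the helper buys atomicity of the seek and the read, not parallel I/O.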

Comments (16)
  1. Brian_EE says:

    You should contact the author of that other blog entry and let him know it caused some people some confusion.

  2. Andrew says:

    More post-mortems like this one please.

  3. anonymous1 says:

    Would duplicating the handle for each thread work better (each thread will have its own synchronous handle so no serialization between threads)?

    [Duplicating the handle doesn't help, since all the duplicates refer to the same file object and therefore share the same file position. -Raymond]
  4. Mordachai says:

    I would imagine separate file handles - one per thread - would work okay so long as all are readers, and all were opened in overlapped mode (async).  Otherwise the OS has to serialize everything (sync handle), and if you add a writer to the mix you've got sync/race conditions no matter how you slice it (what each thread will perceive is not predictable, depending on the actual order things resolve in at run time).

  5. Adam Rosenfield says:

    Why not just open separate handles in each thread with FILE_SHARE_READ?  Or did the customer intentionally want to deny read sharing to other processes?

    I've seen a lot of cases where programs open file handles with FILE_SHARE_NONE when they could have safely allowed at least FILE_SHARE_READ with no problems.  This is usually the result of opening files with either C's fopen() or C++'s [i]fstream, which by default use FILE_SHARE_NONE under the hood in the call to CreateFile; it takes extra effort from the developers, especially if it's a cross-platform code base, to call one of the Windows-specific variants that allows them to specify a sharing mode.

  6. Gabe says:

    I would put more emphasis on the use of this technique to avoid the race condition, whereby multiple threads seek and then read simultaneously. It's a race because the seek pointer is global to the file object.

    The only hint that this technique solves a real issue is the mention of "atomic" in the last sentence.

  7. Joshua says:

    Ref: Adam Rosenfield: Yeah that's been my big complaint too. Lots of programs have the wrong sharing because MS picked a really dumb default when writing the standard libraries.

    "r" should have been FILE_SHARE_READ | FILE_SHARE_DELETE

    "a", "a+" should have been FILE_SHARE_READ | FILE_SHARE_DELETE

    "w", "r+", "w+" should have been FILE_SHARE_DELETE

    Deleting a file open for writing is almost always equivalent to deleting it right after it was closed, especially on Windows where the name sticks around anyway.

    [You're assuming that the Research division has that time machine working. FILE_SHARE_DELETE did not exist when the standard libraries were written. -Raymond]
  8. Not only did FILE_SHARE_DELETE not exist when the standard libraries were written, versions of Windows prior to its introduction don't ignore it: the open fails!  So you mustn't set that bit if you are concerned about compatibility with older versions of Windows.

  9. Joshua says:

    [You're assuming that the Research division has that time machine working. FILE_SHARE_DELETE did not exist when the standard libraries were written. -Raymond]

    Which doesn't mean it couldn't have been added to the standard library as soon as FILE_SHARE_DELETE existed (with a windows version check). Even now it would fix more problems than cause.

    [So you are silently changing the behavior of apps written with the old version of the standard library? That seems risky. -Raymond]
  10. Cesar says:

    The problem is that portable programs (which are written using open/fopen/iostream) expect that files are opened for sharing (which is the behavior in other operating systems).

    Unfortunately, non-portable programs written for DOS (but also written using open/fopen/iostream/etc) expect that files are NOT opened for sharing. Windows inherited that behavior.

    For programmers coming from other operating systems, Windows' behavior is a pain; I have many times seen them write a "retry loop" waiting for the antivirus (or whatever the culprit is) to release the lock on the file so they can open/delete/rename it.

    Getting back on topic (sort of), the portable way to solve the seek+read/write problem is pread/pwrite; do these functions exist on Windows?

    [I find it interesting the mindset that lets you write the sentence "Portable programs expect behavior that is not portable (not required by the standard)." -Raymond]
  11. Is the simultaneous use of the same synchronous file handle in multiple threads actually supported?  I've always vaguely assumed that it wasn't, and don't recall ever seeing anything on the subject in MSDN.

  12. cheong00 says:

    s/oepration/operation/ :P

  13. Joshua says:

    [So you are silently changing the behavior of apps written with the old version of the standard library? That seems risky. -Raymond]

    I've already done a limited form of the experiment. The only program I broke was Adobe PDF and that turned out to be an artifact of the test (it spawned a worker process with a different security level). I could do it system-wide w/ an appinit DLL but appinit DLLs need to die.

    A minifilter would catch too much; however I wonder if that's actually safe all the same. I could prove it was safe if files w/ pending deletes could still be opened.

  14. Neil says:

    (Aren't these posts queued up months in advance? How long does it take to get the MSDN documentation updated to be less confusing? I think it's confusing that passing in an OVERLAPPED structure subtly changes that behaviour. Where's my time machine?)

  15. D-Coder says:

    @Neil "Where's my time machine?"

    I'm going to steal it a month ago, that's why you didn't see it next week.

  16. Ben Voigt says:

    I think it's important to observe that the hybrid mode needs to conform to the contract of async, because programs written for async (opening using FILE_FLAG_OVERLAPPED) may end up in hybrid mode without warning, depending on the file-like object being opened.  Some drivers always complete operations synchronously.

Comments are closed.
