Changing the conditions under which ReadFile produces fewer bytes than requested


In response to an article on hierarchical storage management, Karellen suggests that the problem could be ameliorated by having the hierarchical storage manager keep the first 4KB of the file online, thereby allowing programs that sniff the start of the file for metadata to continue operating without triggering a recall. "The way that file read operations tend to work (fread, read, and ReadFile), if an application opens a file and requests a large read, just returning the first 4KB is a valid response."

A premature short read may technically be a valid response, but it won't be the correct response.

When your program reads from a file, do you retry partial reads? Be honest.

Suppose you want to read a 32-bit value from a file. You probably write this:

 uint32_t value;

 DWORD bytesRead;
 if (ReadFile(file, &value, sizeof(value),
              &bytesRead, nullptr) &&
     bytesRead == sizeof(value)) {
   // Got the value - use it...
 }

You probably don't write this:

 uint32_t value;
 BYTE *nextRead = reinterpret_cast<BYTE*>(&value);
 DWORD bytesRemaining = sizeof(value);
 while (bytesRemaining) {
   DWORD bytesRead;
   if (!ReadFile(file, nextRead, bytesRemaining,
                 &bytesRead, nullptr)) return false;
   if (bytesRead == 0) break; // end of file - avoid infinite loop
   bytesRemaining -= bytesRead;
   nextRead += bytesRead;
 }

 if (bytesRemaining == 0) {
   // Got the value - use it...
 }
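
And if you do write it, you probably write it only once and wrap it in a helper. Here's a minimal sketch of such a helper (ReadExact is a made-up name for illustration, not a Win32 function):

 bool ReadExact(HANDLE file, void *buffer, DWORD size)
 {
   auto nextRead = static_cast<BYTE*>(buffer);
   DWORD bytesRemaining = size;
   while (bytesRemaining) {
     DWORD bytesRead;
     if (!ReadFile(file, nextRead, bytesRemaining,
                   &bytesRead, nullptr)) return false;
     if (bytesRead == 0) break; // end of file
     bytesRemaining -= bytesRead;
     nextRead += bytesRead;
   }
   // Succeed only if every requested byte arrived.
   return bytesRemaining == 0;
 }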

Most programs assume that a short read from a disk file indicates that the end of the file has been reached, or some other error has occurred. Consider, for example, this file parser:

struct CONTOSOFILEHEADER
{
  uint32_t magic;
  uint32_t version;
};

bool IsContosoFile(HANDLE file)
{
  CONTOSOFILEHEADER header;
  DWORD bytesRead;
  if (!ReadFile(file, &header, sizeof(header),
                &bytesRead, nullptr)) {
    // Couldn't read the file - assume not a Contoso file.
    return false;
  }

  if (bytesRead != sizeof(header)) {
    // File doesn't hold a header - not a Contoso file.
    return false;
  }

  if (header.magic != CONTOSO_MAGIC) {
    // Does not start with magic number - not a Contoso file.
    return false;
  }

  if (header.version != CONTOSO_VERSION_1 &&
      header.version != CONTOSO_VERSION_2) {
    // Unsupported version - not a Contoso file.
    return false;
  }

  // Passed basic tests.
  return true;
}

The problem is even worse if you use fread, because fread does not provide information on how to resume a partial read. It reports only the total number of items read in full; you get no information about how much progress was made in the items that were read only in part.

 // Read 10 32-bit integers.
 uint32_t flags[10];
 auto itemsRead = fread(flags, sizeof(uint32_t), 10, fp);
 if (itemsRead < 10) {
   if (!feof(fp) && !ferror(fp)) {
     // At this point, we have a short read.
     // We are now screwed.
   }
 }
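
A common workaround, sketched below, is to swap the size and count arguments so that each item is a single byte; the return value is then a byte count, which tells you exactly where to resume:

 // Sketch: read the same 10 integers as 40 one-byte items,
 // so a partial read is resumable.
 uint32_t flags[10];
 auto buffer = reinterpret_cast<unsigned char*>(flags);
 size_t total = 0;
 while (total < sizeof(flags)) {
   auto got = fread(buffer + total, 1, sizeof(flags) - total, fp);
   if (got == 0) break; // EOF or error - check feof(fp) / ferror(fp)
   total += got;
 }
 if (total == sizeof(flags)) {
   // Got all ten values - use them...
 }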

Since nobody is actually prepared for a short read to occur on disk files anywhere other than the end of the file, you shouldn't introduce a new failure mode that nobody can handle.

Because they won't handle it.

And recall that the original question was in the context of displaying a file in a folder. Even if you know that hierarchical storage management is not involved, you still have to deal with the cost of opening the file at all. If the folder is on a remote server where each I/O operation has 500ms of latency, then enumerating the contents of a directory with 1000 files will take over eight minutes (1000 × 500ms = 500 seconds). I suspect the user will have lost patience by then.

Comments (24)
  1. Joshua says:

    [When your program reads from a file, do you retry partial reads?]

    Yeah.

    [4 bytes ... Do you write ...]

    I wrote it once and now I write ReadBlock(h, buf, 0, buflen) which contains the retry-partial loop.

    [because fread does not provide information on how to resume a partial read.]

    fread() should do the retry-partial for you.

  2. Adam Rosenfield says:

    GNU's libc does retry partial reads inside fread().  The code is horrible to read and hard to follow, but check out the loop in _IO_default_xsgetn() github.com/.../genops.c .

  3. donx says:

    Why isn't the semantic of "short read means EOF" made part of the contract for ReadFile()? The function could internally implement the loop, and all users would get what they (incorrectly) expected anyway. I'm amazed that somebody actually thought at some point that allowing for the possibility of short reads was a good idea (for such a widely used function).

    [If the handle is not a file, then short reads mean something else. (A short read on a pipe or a console does not mean "end of file"; it just means "no more data available right now (but try again later)".) -Raymond]
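
    (To illustrate the distinction in Raymond's reply, here is a minimal sketch of draining a byte-mode pipe, where a short read is routine and only a broken pipe means the writer is done. The pipe handle and ProcessData are hypothetical.)

     BYTE buffer[4096];
     for (;;) {
       DWORD bytesRead;
       if (!ReadFile(pipe, buffer, sizeof(buffer),
                     &bytesRead, nullptr)) {
         if (GetLastError() == ERROR_BROKEN_PIPE) break; // writer closed
         return false; // genuine error
       }
       if (bytesRead == 0) break; // zero-byte message or writer closed
       ProcessData(buffer, bytesRead); // often fewer bytes than requested
     }
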
  4. Adam Rosenfield says:

    Err, actually, the real function in glibc which services fread for file streams is _IO_file_xsgetn(), not _IO_default_xsgetn(), but that too uses a loop which properly handles short reads: github.com/.../fileops.c .

  5. Josiah Worcester says:

    Perhaps more worthwhile is the C spec for fread, which clearly states that fread only does partial reads on errors or EOF.

  6. Adam M. says:

    I always handle partial reads, with a helper function if I need the full data right now. But you may be right; laziness is widespread. Partial reads are useful in the implementation of diverse stream types (network streams, decoding/translating streams, etc.), reducing blocking and often simplifying the code, but eliminating them would cause few problems, and result in a simpler API that's harder to misuse. And I think shifting the coding burden to the API implementor and away from the users is pretty much always the right approach.

    So I agree with how ReadFile /should/ work, but given how it actually works, I don't see a problem with returning only 4K on the first read. It may "educate" users about the possibility of partial reads, and prompt them to update the rest of their code to handle them.

  7. Yuri Khan says:

    I was bitten not long ago by a decompression filter that yielded short reads under some conditions. Had to change the condition from “cbRead < cbRequested” to “cbRead == 0”.
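
    (A sketch of that before/after, with hypothetical Decompress/Consume names; the consuming loop must stop only on a zero-byte read, not on the first short one:)

     DWORD cbRead;
     // Wrong: treats the first short read as end of stream.
     // while (Decompress(buf, cbRequested, &cbRead) && cbRead == cbRequested)
     //   Consume(buf, cbRead);
     // Right: only a zero-byte read means end of stream.
     while (Decompress(buf, cbRequested, &cbRead) && cbRead != 0)
       Consume(buf, cbRead);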

  8. alegr1 says:

    Offline files is such a red herring when Explorer just isn't prepared to handle less marginal cases well. You open a directory on \\tsclient\C, and instead of doing a quick read of the whole directory, showing it with a generic view (no extracted icons), and then extracting the icons in the background, you have to wait while it shows the files and directories one by one. You didn't care about those icons; maybe you just wanted to go one directory deeper, but you have to wait anyway.

  9. Karellen says:

    @Josiah: TIL that partial reads for fread() *only* occur on error or EOF. Thank you!

    All the documentation (i.e. man pages) I've seen before for fread() always stated that fread() could read "up to" nitems items, and that it would return a short count on error or EOF. However, I'd never seen anything before to say that fread() could not return a short count under other circumstances.

    Given that read(2) explicitly *can* return a short count on non-error, non-EOF conditions, and that fread(3) will use read(2) under the hood on many platforms, I never before felt safe assuming that fread(3) might make any more guarantees than read(2) did.

    I still don't have access to a (draft) C standard, so it's conceivable that the C standard is less strict than POSIX, but at least according to The Open Group: "fread() shall return the number of elements successfully read which is less than nitems *only if* a read error or end-of-file is encountered."[0] (emphasis mine)

    Now I can go and delete a load of unnecessarily-defensive code in a bunch of my projects! Negative LOC days for the win!

    [0] pubs.opengroup.org/.../fread.html

  10. kolbyjack says:

    There's an error in your retry loop: it uses &value for lpBuffer instead of nextRead.

  11. Buster says:

    I'm going to stick my neck out and say that it is contractual that a synchronous ReadFile on a non-pipe doesn't return until the number of bytes requested has been read unless an error occurs. The documentation says so, at the start of the Remarks section.

  12. Joshua says:

    @Buster: Unfortunately it is not true of network shares, so nobody should be depending on it. Don't ask me how I know this.

  13. Ancient_Hacker says:

    I once consulted with a big company that was getting read errors. I looked over their code, and everywhere they assumed that when they asked for a packet from a server, they got one packet back into their buffer, not a partial packet or multiple packets. I could not convince them that TCP/IP didn't guarantee the same rhythm and pacing at the receiving end as at the sending end. I suspect if you're reading a file through a network share, you might get the data in something other than disk-sized chunks under some conditions.

  14. clintp says:

    Back in the day I did QA for a 1990s Very Large Data Storage Company, and they did offline storage by doing pretty much this: keeping the first disk block (4k, 8k, whatever) on local storage, and the rest *potentially* offline. This let things like Unix's file(1) still do their job peeking at metadata.

    The trick was that everything was accessed via NFS. NFS v2 would do a hard, uninterruptible wait for pending data. fread() went somewhere off into kernel-space and just didn't come back until the server (the file storage unit) responded. Please don't kill the process either, because that's how you got zombies.

  15. trivee says:

    Karellen actually suggested two different things:

    - caching the first 4 kbytes of an archived file in the "stub", to enable small file-header parsing without triggering a recall, and

    - if the first read request is for more than 4 kbytes, returning just the cached 4 kbytes.

    The second behavior would certainly create problems, as discussed here. But this doesn't in any way invalidate the first idea. If the first few kilobytes of the file are available online, and an application requests less than that amount, the read request can be satisfied immediately and efficiently without any issues. Only longer reads would still require a recall. An application issuing a small read followed by a large read would experience a delay on the second one, but most applications should handle that, no?

  16. Alex Cohn says:

    > you still have to deal with the cost of opening the file at all. If the folder is on a remote server where each I/O operation has 500ms of latency, then enumerating the contents of a directory with 1000 files will take over eight minutes.

    I beg to differ. You only need an async read loop for them. Then the 500ms latency will only delay the first response; the callbacks delivering data for the next file will arrive at a pace limited by bandwidth, not by round-trip time. Further improvement could be achieved if the async read could be started as a wildcard, essentially "file \\externalshare\*" or "file \\externalshare\*.jpg".

  17. Freddie says:

    Speaking of partial reads it is interesting that on Mac OS X (and at least FreeBSD) the man page notes that:

    "Upon successful completion, read(), readv(), and pread() return the number of bytes actually read and placed in the buffer.  The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case."

  18. Josiah Worcester says:

    @Karellen: The POSIX and C specs use the same language here. I can also confirm that every C implementation I've dealt with has implemented that, FWIW. :)

  19. Cesar says:

    > When your program reads from a file, do you retry partial reads? Be honest.

    Yes, I do. And I even helped add a method to Rust's standard library to make it easier.

    But I think most commenters are missing one crucial detail: the sort of person who religiously reads Raymond's blog is also the sort of person who religiously retries short reads. A lot of programmers are not the sort of person who religiously reads Raymond's blog.

  20. Adam Rosenfield says:

    @Freddie: Good to know, I wasn't aware of that, though of course there are lots of types of file system paths which are not ordinary files (FIFOs, sockets, files from mounted NFS, AFS, or FUSE file systems, etc.).

    @Cesar: Haha, very true.

  21. Someone says:

    @Adam Rosenfield: "there are lots of types of file system paths which are not ordinary files (FIFOs, sockets, files from mounted NFS, AFS, or FUSE file systems, etc.)."

    Mounted files *are* normal files in this respect, because the application accesses them completely transparently, without any knowledge of the driver(s) used by the operating system. So this Mac OS X statement must cover *all* regular files, but excludes stream-like communication devices like pipes, sockets, serial ports, raw USB access, etc.

  22. Someone says:

    Joshua "Unfortunately it is not true of network shares, so nobody should be depending on it. Don't ask me how I know this."

    Can you explain a specific scenario where this can be observed?

  23. Joshua says:

    Here is a scenario fitting to this blog: The maximum size of a request to ReadFile is 32 megabytes. Windows 95 runs well on 16 MB of RAM. A 32 MB read (from, say, an XP box with 256 MB of RAM) from a share on the W95 box cannot be satisfied all at once, yielding a short read. Unless MS has gone back into the workstation SMB service and put the retry loop for short reads there, there are other scenarios like this.

  24. Someone says:

    @Joshua: Interesting. Strange bug, especially because I would expect the transfer to be broken down into 64 KB (or smaller) blocks anyway: technet.microsoft.com/.../cc938632.aspx

Comments are closed.
