Why do BackupRead and BackupWrite require synchronous file handles?


The Backup­Read and Backup­Write functions require that the handle you provide be synchronous. (In other words, it must not have been opened with FILE_FLAG_OVERLAPPED.)

A customer submitted the following question:

We have been using asynchronous file handles with Backup­Read. Every so often, the call to Backup­Read fails, but we discovered that as a workaround, we can just retry the operation, and it succeeds the second time. This solution has been working for years.

Lately, we've been seeing crashes when trying to back up files, and the stack traces in the crash dumps appear to be corrupted. The issue appears to happen only on certain networks, and the problem goes away if we switch to a synchronous handle.

Do you have any insight into this issue? Why were the Backup­Read and Backup­Write functions designed to require synchronous handles?

The Backup­Read and Backup­Write functions have historically issued I/O against the handles provided on the assumption that they are synchronous. As we saw a while ago, doing so against an asynchronous handle means that you're playing a risky game: If the I/O completes synchronously, then nobody gets hurt, but if the I/O goes asynchronous, then the temporary OVERLAPPED structure on the stack will be updated by the kernel when the I/O completes, which could very well be after the function that created it has already returned. The result: A stack smash. (Related: Looking at the world through kernel-colored glasses.)
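To make the failure mode concrete, here is a hypothetical sketch of the anti-pattern — not the actual Backup­Read implementation, just an illustration of issuing I/O synchronous-style with a stack-allocated OVERLAPPED:

```c
#include <windows.h>

// Hypothetical sketch: a helper that blindly assumes the handle
// is synchronous. Harmless only if the I/O completes synchronously.
BOOL ReadAssumingSynchronousHandle(HANDLE h, void *buffer,
                                   DWORD cb, DWORD *pcbRead)
{
    OVERLAPPED o = { 0 };  // temporary OVERLAPPED on the stack

    if (ReadFile(h, buffer, cb, pcbRead, &o)) {
        return TRUE;       // completed synchronously: nobody gets hurt
    }

    if (GetLastError() == ERROR_IO_PENDING) {
        // The handle was opened with FILE_FLAG_OVERLAPPED and the I/O
        // went asynchronous. If we return now, the kernel will later
        // write the completion status through &o -- which by then is
        // dead stack memory belonging to some other function.
        // That's the stack smash.
        return FALSE;
    }

    return FALSE;          // genuine failure
}
```

The only safe ways out of the ERROR_IO_PENDING branch are to wait for the I/O to complete before returning (e.g. GetOverlappedResult with bWait = TRUE) or to keep the OVERLAPPED alive somewhere other than the stack until completion.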

This oversight in the code (blindly assuming that the handle is a synchronous handle) was not detected until 10 years after the API was originally designed and implemented. During that time, backup applications managed to develop very tight dependencies on the undocumented behavior of the backup functions. The backup folks tried fixing the bug but found that it ended up introducing massive compatibility issues. On top of that, there was no real business case for extending the Backup­Read and Backup­Write functions to accept asynchronous handles.

As a result, there was no practical reason to change the functions' behavior. Instead, the requirement that the handle be synchronous was added to the documentation, along with additional text explaining that if you pass an asynchronous handle, you will get "subtle errors that are very difficult to debug."
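For reference, here is a minimal sketch of calling Backup­Read the way the documentation requires: the handle is opened without FILE_FLAG_OVERLAPPED, and with FILE_FLAG_BACKUP_SEMANTICS, which Backup­Read needs. The file path is a placeholder, and error handling is trimmed for brevity.

```c
#include <windows.h>

int main(void)
{
    // Synchronous handle: FILE_FLAG_OVERLAPPED deliberately absent.
    HANDLE h = CreateFileW(L"C:\\some\\file.txt", GENERIC_READ,
                           FILE_SHARE_READ, NULL, OPEN_EXISTING,
                           FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    BYTE buffer[4096];
    DWORD cbRead;
    LPVOID context = NULL;  // BackupRead keeps its state here

    while (BackupRead(h, buffer, sizeof(buffer), &cbRead,
                      FALSE, FALSE, &context) && cbRead > 0) {
        // process cbRead bytes of the backup stream
    }

    // Final call with bAbort = TRUE releases the context.
    BackupRead(h, NULL, 0, &cbRead, TRUE, FALSE, &context);
    CloseHandle(h);
    return 0;
}
```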

In other words, the requirement that the handles be synchronous exists for backward compatibility.

Comments (24)
  1. Mark says:

    People managed to write applications that depend on memory corruption to work right?  Impressive.

  2. Joshua says:

    @Mark: See "defeating debuggers by stack necromancy" for a dangerous sensitive technique that depends on such things doing exactly what they are defined to do.

  3. Maurits says:

    If I understand correctly:

    The successful return value when incorrectly passed an asynchronous handle exists for backward compatibility.

    The inability to behave correctly when passed an asynchronous handle exists because there is no good business reason to fix it.

  4. Cesar says:

    The best way to deal with this would be to return an error code if an attempt is made to call these functions with asynchronous file handles (fail-fast instead of undefined behavior). Unfortunately, I can guess there are programs which work by accident while passing asynchronous file handles, so this cannot be done since it would break these (already broken in theory, working most of the time in practice) programs.

  5. Gabe says:

    Cesar: How do you know that a file handle is asynchronous?

  6. Do backup handles still require SE_BACKUP_PRIVILEGE and SE_RESTORE_PRIVILEGE, even when working only on a user's own files?

  7. Can NtQueryInformationFile be called from user mode? It lets you get the mode of the handle.

  8. Joshua says:

    @alegrl: Yes. LoadLibrary("ntdll.dll"), then GetProcAddress(hNtDll, "NtQueryInformationFile").

    I can't find documentation for the particular call that would check though.

    Even if you do find it, you're walking on very thin documentation here.

  9. Killer{R} says:

    @Joshua: that's not a problem here. It's MS's code; they use undocumented functions more than documented ones ;)

    There's no problem detecting how a file handle was opened with NtQueryInformationFile(…FileModeInformation…) – it's even documented (for kernel mode… but who cares) – msdn.microsoft.com/…/ff545809(v=vs.85).aspx

    But I think checking this would normally be redundant. There are plenty of ways to shoot yourself in the foot, and this is only one of them. Still, it would be better to check all this when the app is being debugged (just like CloseHandle throws an exception on bad handles only when under a debugger) or when running on a checked Windows build.

    [Well, except that you need a time machine. FileModeInformation did not exist at the time BackupRead was written. -Raymond]
  10. Eric TF Bat says:

    Raymond, you could replace all your explanations of Windows oddities with "we did that for backwards compatibility" and you'd be 100% right, though a damn sight less entertaining.  

  11. Anonymous Coward says:

    Joshua, I have run into issues like that during legitimate debugging and it was very frustrating since it caused a bug that happened without debugger attached to go away completely when the application was debugged.

    I think a good debugging API should guarantee that the debuggee will behave as much as possible if it is not currently being debugged. Of course there are things that cannot be avoided because of hardware limitations: a breakpoint for example changes the value of a byte of program code, so if you don't want a debugger placing a breakpoint in a piece of code, just use it as an encryption key. But that shouldn't affect legitimate debugging.

  12. asdbsd says:

    This is bad. If even you guys at Microsoft, who are well known for testing everything through and through (no sarcasm here), managed to not notice the "what if it's asynchronous" case for ten years, how do you expect a common code monkey to mind that case? Hand-waving it with "you are going to get bugs" in the docs is just not enough. I don't know the specifics, but I think in this case the backup folks really should have tried to do everything in their power to support async handles.

  13. Do the Win64 APIs still have this bug? If so, why? There was no backwards-compatibility story for 64-bit: everyone had to recompile.

    But really, can't BackupRead and BackupWrite be modified to detect an async handle and scream as loudly as possible in an event/system log?

    In the cases of using BackupRead/Write on an async handle, can it lead to data loss? I'd figure fixing any data loss (or corruption) issues would be paramount.

    [Changing the behavior in Win64 would be the worst kind of breaking change: The silent breaking change not detected at compile time. -Raymond]
  14. Maurits says:

    Certainly introducing a breaking change in the backup world is the worst kind of pain. If backup is broken, that leads to all kinds of user hurt.

    #ifdef x86

    // synchronous file handles only, please

    BackupRead(…);

    BackupWrite(…);

    #endif

    // works with asynchronous file handles

    // (and synchronous ones too, of course)

    BackupRead2(…);

    BackupWrite2(…);

  15. ErikF says:

    @Rosyna: Win64 may be a new API, but it shares the same functions *and semantics* as Win32 (which had the same issue with Win16.) If you're talking about a brand-new set of functions, then sure, the sky's the limit. But for existing functions, you're stuck with what's already there. It's like a language standard: innovations are fine as long as you keep currently-existing programs running with as few changes as possible (that's probably the reason why gets() existed for so long in the standard even though it's an evil, evil function!)

    Regarding the event log, it's great for programs, but useless for APIs because all that would happen is the system/application log would get spammed with meaningless garbage for anyone but the programmer. The logs already have way too much of that as it is, IMHO; having more low-level entries that you can't turn off seems like a non-starter to me.

  16. John says:

    I'm a bit curious as to what kind of business case would validate fixing the APIs to work with asynchronous handles. Is it just a couple of popular programs that crash semi-regularly? I suppose the necessity is lessened by the nondeterministic nature of the problem (I guess it works correctly more often than not).

  17. First name, last name or both? says:

    I understand you wouldn't want the function to fail if you pass an asynchronous handle, due to backwards compatibility. But why not just make it work deterministically by always waiting for the write to finish?

    BackupWrite(filehandle, …)

    {

     WriteFile(filehandle, …);

     if (filehandle is async) WaitForWriteToEnd(…);

    }

    [I thought I explained that in the article. "The backup folks tried fixing the bug but found that it ended up introducing massive compatibility issues." -Raymond]
  18. DrPizz says:

    > [I thought I explained that in the article. "The backup folks tried fixing the bug but found that it ended up introducing massive compatibility issues." -Raymond]

    I think the problem is that we are having a hard time imagining just what those issues might be.

  19. QQ says:

    > I think the problem is that we are having a hard time imagining just what those issues might be.

    Maybe they used it like:

    // write stuff

    BackupWrite(some async handle);

    // wait for write to finish

    WaitForMAGIC();

    And MAGIC is something that only happened due to the erroneous use of BackupWrite.

    Am I the only one uncomfortable with the fact that the backup software I might trust my data with is written so poorly that it depends on obscure bugs?

  20. exiledbear says:

    Why can't you add a line to check if the handle is synchronous to begin with and return an error if it isn't? That would prevent people from being lazy and passing whatever they felt like passing, and reduce the number of derp calls you get.

  21. j b says:

    exiledbear,

    You suggest that programs that (usually) didn't fail previously should now fail consistently.

    That is not my idea of "backwards compatibility". :-)

  22. Joshua says:

    Hmmm. BackupRead and BackupWrite when called with an async handle sometimes finish sync and sometimes async, so any caller calling async who managed to deal with the stack corruption must be capable of receiving a sync complete.

    This makes for a solution.

    If the underlying call returns ERROR_IO_PENDING, immediately call WaitForSingleObject(handle, INFINITE), followed by GetOverlappedResult(), which has the effect of converting the async call to sync.

    [Unless the application was relying on the async behavior. For example, calling Wait­For­Single­Object on the handle blocks APCs, including the oplock release request which is preventing the Backup­Read from completing! -Raymond]
  23. Matt says:

    Can't you do something like:

    T RequiresSyncHandle(…, HANDLE hHandle, …)

    {

     if(HandleIsAsync(hHandle)) // defined out of NtQueryInformationFile or whatever

     {

       HANDLE hSyncEquivalent = OpenSameFileButThisTimeAsSync(hHandle); // implementation left as an exercise for the reader

       T result = RequiresSyncHandle(…, hSyncEquivalent, …);

       CloseHandle(hSyncEquivalent);

       return result;

     }

     Hooray();

    }

    [See the answer I gave some moments ago. -Raymond]
  24. Matt says:

    Also, the other way to fix the problem is to put the OVERLAPPED structure somewhere other than the stack in the case where you detect that the handle is asynchronous (e.g. malloc and then deliberately leak the OVERLAPPED structure so that when the kernel writes to it, nobody cares).

Comments are closed.