When you think you found a problem with a function, make sure you're actually calling the function


On an internal mailing list, a tester asked if there were any known problems with the Find­First­File­Ex function preventing a directory from being deleted and recreated.

Our code creates a test folder, then calls Find­First­File­Ex to look inside the test folder. When we're done, we call Find­Close, then delete the directory. When we run the test twice, the second run fails to create the test folder; we get ERROR_ACCESS_DENIED. But if we switch to Find­First­File instead of Find­First­File­Ex, then everything works as expected.

Here's our code, simplified.

// Assume all functions succeed except where indicated.

CreateDirectory(L"C:\\Test", NULL);

// This version works:
//
// WIN32_FIND_DATA data;
// HANDLE hFindFile = FindFirstFile(L"C:\\Test\\*", &data);

// This version doesn't:
//
WIN32_FIND_DATA data;
HANDLE hFindFile = FindFirstFileEx(L"C:\\Test\\*",
                                   FindExInfoBasic,
                                   &data,
                                   FindExSearchNameMatch,
                                   NULL,
                                   0);
FindClose(hFindFile);

RemoveDirectory(L"C:\\Test");

// If we used FindFirstFile, then this CreateDirectory succeeds.
// If we used FindFirstFileEx, then this CreateDirectory fails.
CreateDirectory(L"C:\\Test", NULL);

I suggested that they try running their test with anti-malware software disabled. Anti-malware software will frequently intrude on file operations, and it could be that the virus scanner is still checking the old C:\Test directory when you get around to creating the new one. Content indexers are another case where this can happen, but content indexers tend to wait until the machine is quiet rather than intruding on actions as they occur. (Now, well-written virus scanners and content indexers know to do things like abandon a file scan when a delete request is made, or use opportunistic locks to get out of the way when an application wants to do something with a file being scanned. But not all virus scanners and content indexers are as well-written as we might like.)

We later heard back that they figured out the problem, and it wasn't because of a virus scanner or content indexing service.

The problem was that their code was running inside a test harness, and that test harness had mocked the Find­First­File and Find­Close functions, but it did not mock the Find­First­File­Ex function. When the mock Find­Close function was given a handle created by the real Find­First­File­Ex function, it got confused and ended up leaking the directory handle. The Remove­Directory function succeeded, but the directory was not fully removed due to the outstanding handle, and the attempt to recreate the directory therefore failed.
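
You can reproduce the effect of the leaked handle without any test harness at all by simply forgetting to call FindClose. Here is a minimal sketch (not their actual test code, and with error handling omitted) that demonstrates the behavior described above, at least on the Windows versions under discussion:

#include <windows.h>
#include <stdio.h>

int main()
{
    CreateDirectoryW(L"C:\\Test", NULL);

    WIN32_FIND_DATAW data;
    HANDLE hFindFile = FindFirstFileExW(L"C:\\Test\\*",
                                        FindExInfoBasic,
                                        &data,
                                        FindExSearchNameMatch,
                                        NULL,
                                        0);

    // Deliberately no FindClose here - this is effectively what the
    // confused mock did. The find handle keeps the directory referenced.

    RemoveDirectoryW(L"C:\\Test");  // reports success, but the directory
                                    // merely becomes "delete pending"

    if (!CreateDirectoryW(L"C:\\Test", NULL)) {
        // Fails with ERROR_ACCESS_DENIED while the leaked handle is open.
        printf("CreateDirectory failed: %lu\n", GetLastError());
    }

    FindClose(hFindFile);           // now the deletion can finally complete
    return 0;
}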

The tester also confirmed that the problem did not exist when they ran the code outside the test environment.

When you think you found a problem with a function, make sure you're actually calling the function. In this case, the code was running under nonstandard conditions: The test harness had redirected a bunch of OS functions. As a result, when the code called Find­Close, it wasn't actually calling Find­Close but rather a mock function provided by the test harness.

To be fair, the tester was new to the team and was likely not even aware that the test harness was mocking file I/O functions in the first place.

If you are having trouble with a function, one thing to check is that you're actually calling the function.
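
One quick sanity check, if you suspect this kind of redirection, is to compare the function address your code calls through with the one kernel32 actually exports. (This is just a sketch; it assumes the harness redirects calls by patching the import address table, which is only one of several ways a harness might do it, so other redirection techniques won't be caught.)

#include <windows.h>
#include <stdio.h>

int main()
{
    // What kernel32 actually exports under the name "FindClose".
    FARPROC realExport = GetProcAddress(GetModuleHandleW(L"kernel32.dll"),
                                        "FindClose");

    // What this module actually calls when it says "FindClose".
    FARPROC whatWeCall = reinterpret_cast<FARPROC>(&FindClose);

    if (whatWeCall != realExport) {
        printf("FindClose appears to be redirected somewhere else.\n");
    } else {
        printf("FindClose resolves to the real kernel32 export.\n");
    }
    return 0;
}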

Comments (18)
  1. anonymouscommenter says:

    Ah yes the I/O test harness. Great for reliably causing all kinds of errors. Great for confounding newbies.

  2. anonymouscommenter says:

    Of course in this case, the user thought the problem was in FindFirstFileEx, and they were indeed calling that function. They didn't realize that the problem was in FindClose, which was not being called.

  3. John Ludlow says:

    I have frequently seen things like this while looking at installations: quickly installing an application, then uninstalling it and installing it again. This wasn't AV software, because the machine in question did not have AV software installed. It wasn't a test harness that mocks things, either. But the outward symptoms were very similar.

    Raymond will probably correct me, but my theory in those cases was that there was a lag between a directory or file being deleted and its permissions being updated. I think that if you're quick enough, you can get in that gap and recreate the directory before the filesystem removes its permissions.

    When that happens, if you open the file or folder properties in Explorer, you will see there's no owner and no permissions for anyone. A reboot (or maybe it was me running chkdsk since in these situations I normally run a chkdsk /f and then reboot) seems to fix this. I imagine there's some check either in the boot or in chkdsk which detects these things and resets the owner and permissions to be the same as the parent directory.

  4. IanBoyd says:

    I've seen the code for the technique of using opportunistic locks to release my locks on a file if someone else wants to use it. What is the technique for the other feature mentioned, "abandon a file scan when a delete request is made"? How do I detect that a delete request is made?

    Finally, is it possible to open a file for reading without taking any locks? For example:

       CreateFile(filename, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, ...);

    Will that not allow any other readers, writers, or deleters to read, write, or delete the file I have open?

    And if not, why not?

    [Run the linked program and then from another command prompt, delete the file. Oh look, the program closes the handle. -Raymond]

  5. MNGoldenEagle says:

    @IanBoyd: Regarding your CreateFile question, the only thing I can think of is that even if a user "deletes" a file that you've marked as FILE_SHARE_DELETE, Windows won't actually delete it until all of the open handles on it are closed, so you run into issues if a user attempts to create a new file with the same name as the one they deleted. It would be nice if Windows interpreted the FILE_SHARE_DELETE flag as saying "I don't care if this file goes away, you can ignore my handle in terms of deleting the file", but I'm guessing that support for that logic is somewhat non-trivial. Maybe they'll add it in an iteration of Windows 10 or something.
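
    For reference, the name collision I mean looks roughly like this (just a sketch with a throwaway file name, error handling omitted; behavior as of the Windows versions being discussed here):

       #include <windows.h>
       #include <stdio.h>

       int main()
       {
           // Keep a handle open with full sharing, including FILE_SHARE_DELETE.
           HANDLE h = CreateFileW(L"victim.txt", GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE |
                                  FILE_SHARE_DELETE,
                                  NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                  NULL);

           DeleteFileW(L"victim.txt"); // succeeds: the delete is now pending

           // Trying to reuse the name before the first handle closes fails.
           HANDLE h2 = CreateFileW(L"victim.txt", GENERIC_WRITE, 0, NULL,
                                   CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
           if (h2 == INVALID_HANDLE_VALUE) {
               printf("Recreate failed: %lu\n", GetLastError());
           }

           CloseHandle(h); // the delete completes; the name is free again
           return 0;
       }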

  6. anonymouscommenter says:

    @MNGoldenEagle: That would break back-compat.  I'm sure there are at least a few applications which open a file as FILE_SHARE_DELETE, cache the handle somewhere, then try to reuse it when asked for a file by the same name.  If it were possible to recreate the file while the handle was still open, those applications would be buggy.

  7. MNGoldenEagle says:

    @Kevin: Ah, didn't even think of that. Maybe some kind of soft/safe handle or something...

  8. anonymouscommenter says:

    @Kevin: Yeah I use that all the time. However I know it really doesn't offer any protection. Rename the file and then delete it. (I recommend .nfs$$$$$$$$ where $$$$$$$$ is nFileIndex because such a name is likely to be understood by other developers as reserved for such a purpose.)

  9. anonymouscommenter says:

    IanBoyd: You can certainly open a file in such a way that others can read, write, and delete it. However, you would have no way of knowing that it happened without the oplocks. In other words, your indexer or thumbnail generator probably does not want to operate on a file that is being written to or deleted.

  10. anonymouscommenter says:

    @John Ludlow: That is exactly a symptom of an outstanding handle on a deleted directory. You could use a tool such as Process Explorer to see which process still has a handle open to that directory. As soon as the last handle is closed, the directory is deleted. I don't think this has anything to do with permissions; there are no permissions and no owner simply because the directory is logically deleted.

  11. John Ludlow says:

    @Francis Gagne:

    Logically deleted.... and then re-created. And it's after the re-creation that I observed the symptoms described.

    My theory was that the sequence of operations looked like this:

    Delete file / folder

    Recreate file / folder

    Permissions are removed from the deleted item

    I suppose it's possible that there was a lingering handle which forced things into this order, since the permissions weren't updated until after the last handle died, which was after the folder was re-created. However, Process Explorer won't help there because the perpetrator has already left the scene of the crime.

  12. anonymouscommenter says:

    @John Ludlow: perhaps you're seeing some sort of file system tunneling? That is, the sequence of operations is like this:

    Permissions are removed from the deleted item

    Delete file / folder

    Recreate file / folder

    And since this last step happened less than some amount of time X after the deletion, file system tunneling copies the permissions from the recently deleted file/folder to the newly created file/folder, as some sort of attempt to preserve file permissions on behalf of old programs that were written before file permissions existed and which use a "delete and recreate" procedure to save a file.

    Raymond has written about file system tunneling before.

  13. anonymouscommenter says:

    Is there any simple way to trace the lifetimes of file and directory handles? I have a program which uses a deeply-nested working directory structure which it frequently deletes and recreates. Rarely, deletion of one of the emptied sub-directories fails. I would like to see who the involved parties are at that very transient moment when this happens. Manually deleting this directory always succeeds, so this is a bit hard to figure out.

  14. Pyjong says:

    See? How pleasantly Raymond helped a newbie? Prime example he is not acerbic :)

  15. anonymouscommenter says:

    @Timo Kinnunen

    I use procmon (technet.microsoft.com/.../bb896645.aspx) to debug situations like that. You can capture file operations, registry operations, socket operations, etc. made by any or all processes. The output is understandably verbose, and procmon provides a variety of ways to filter the data so you can see only what you are interested in.

  16. anonymouscommenter says:

    I don't think it's ok that antivirus programs are not (or no longer) treated as part of the I/O backend below the filesystem driver. I.e., whatever the antivirus does while scanning files should be treated as a delay in the disk I/O operations instead of causing filesystem access violations.

  17. Someone says:

    @Gabe: I'm completely with you. Virus scanners *must* perform their operations as a filter driver. This way, they would not create additional references to a file or directory, would never interfere with locks or permissions or timing. The only effect would be a slight slowdown of I/O at certain points. And most important, this is the only way to detect a virus *during the write*, before any program (or the OS) can load/execute the file, and also, to *prevent* any reading or execution of infected files.

Comments are closed.