Clarifying the documentation on wildcards accepted by FindFirstFile/FindFirstFileEx


A customer asked for clarification of the documentation for FindFirstFile and related functions, such as FindFirstFileEx and FindFirstFileTransacted.

Does FindFirstFileEx support multiple wildcards in the lpFileName parameter? The documentation is not clear. We are hoping to pass something like C:\Directory1\Directory2\*abc*\def*.txt. Note that there are two asterisks in the directory portion as well as an asterisk in the file name portion. Should we expect this to work? The documentation is not very explicit about this scenario. It just says

lpFileName: The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?).

I agree that the documentation is ambiguous here. One interpretation of the sentence is

The directory or path, and the file name, any of which can include wildcard characters, for example, an asterisk (*) or a question mark (?).

Or it could be interpreted as

The directory or path, and the file name. The file name can include wildcard characters, for example, an asterisk (*) or a question mark (?).

The second interpretation is the correct one. You can have multiple wildcards, but all of them must appear in the file name portion. The search pattern lets you apply a filter to a search within a single directory. It is not a SQL query.
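To make the rule concrete, here is a minimal sketch, written in Python rather than against the Win32 API. It splits a search pattern into a directory and a file name pattern, rejects wildcards in the directory portion, and filters the names of one directory. The helper names `split_search_pattern` and `filter_names` are hypothetical, and the matcher understands only `*` and `?`; it is an approximation and does not reproduce the legacy quirks of the real Windows matching rules.

```python
import ntpath
import re

WILDCARDS = ("*", "?")


def split_search_pattern(pattern):
    """Split a Windows-style search pattern into (directory, file name pattern).

    Raises ValueError if a wildcard appears in the directory portion,
    mirroring the rule that wildcards belong only in the file name.
    """
    directory, name_pattern = ntpath.split(pattern)
    if any(ch in directory for ch in WILDCARDS):
        raise ValueError("wildcards are allowed only in the file name: " + pattern)
    return directory, name_pattern


def filter_names(names, name_pattern):
    """Filter the entries of a single directory (assumption: only * and ? are special)."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in name_pattern
    )
    compiled = re.compile(regex, re.IGNORECASE | re.DOTALL)
    return [name for name in names if compiled.fullmatch(name)]
```

With this sketch, a pattern such as `C:\Directory1\Directory2\def*.txt` splits cleanly, while the customer's pattern with `*abc*` in the directory portion is rejected before any matching takes place.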

I submitted a documentation change request to clarify the sentence to the second version above:

The directory or path, and the file name. The file name can include wildcard characters, for example, an asterisk (*) or a question mark (?).
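A caller that really does want wildcards in the directory portion has to walk the levels itself, applying one single-directory filter per path component. The following Python sketch shows the idea under some assumptions: `expand` and `_matcher` are hypothetical names, not part of any Windows API, only `*` and `?` are treated as wildcards, and matching is case-insensitive.

```python
import os
import re


def _matcher(component):
    """Compile one path component containing * and ? into a predicate."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in component
    )
    compiled = re.compile(regex, re.IGNORECASE | re.DOTALL)
    return lambda name: compiled.fullmatch(name) is not None


def expand(root, components):
    """Yield paths under root that match the components, one directory level at a time."""
    if not components:
        yield root
        return
    matches = _matcher(components[0])
    try:
        names = sorted(os.listdir(root))  # one single-directory enumeration
    except OSError:
        return  # not a directory, missing, or inaccessible: nothing to match here
    for name in names:
        if matches(name):
            yield from expand(os.path.join(root, name), components[1:])
```

For the customer's scenario, the call would look something like `expand(r"C:\Directory1\Directory2", ["*abc*", "def*.txt"])`. Note that every directory matching `*abc*` costs an additional enumeration, which is the price of wildcards in the directory portion.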

Comments (32)
  1. Joshua says:

    The answer this deserves but MS can't give is "No, this is not a UNIX machine, so don't expect modern UNIX enhancements." To be fair, that hasn't made it reliably into glob() yet.

    [FindFirstFile is more akin to readdir. And readdir doesn't support wildcards at all! (FindFirstFile supports wildcards for CP/M compatibility. I suspect that if the designers of Win32 didn't have to be compatible with CP/M, they wouldn't have had any wildcard support in the kernel at all.) -Raymond]
  2. Adam Rosenfield says:

    If you put on your kernel-colored glasses, then it's clear that FindFirstFile[Ex] can't be doing anything too fancy.  The kernel should do the simplest possible thing it can do, which is to read the directory entries in the order they're in on disk.  The filtering for wildcards is an extra bonus which can easily be done in userspace (though I'm guessing in this case it's done in kernel space) but does not require any additional I/O overhead, just a bit of extra CPU time.

    Conversely, Unix's glob(3) might potentially have to read a large number of directories if you have a complicated pattern with lots of wildcards.  It can be a useful library function, no doubt, but it's not something that should be built into the KISS system call.

  3. Anonymousse au Chocolat says:

    "For example" is a bit strange here. Either asterisk and question mark are the complete list, then they aren't "examples". Or there are other wildcard characters, then where is the reference to the complete list?

  4. Jeffrey says:

    On a semi-related note, powershell has great wildcard support.  Want to find filetypes that have a shellnew handler?

    PS C:\> dir hklm:\software\classes\*\shellnew

  5. Joshua says:

    Did I strike a nerve by mistake? We both know the benefit of leaving it in kernel was to let the wildcard expansion happen on the remote side of the network when operating on a network directory.

  6. ErikF says:

    @Joshua: Why does every operating system have to be yet another UNIX?

  7. Antonio 'Grijan' says:

    @Joshua: if you agree that leaving it in kernel mode has benefits, then you should understand that it should be kept as simple as possible, as services provided by the kernel should be (for reliability and performance reasons). If you want to provide fancy filtering, sorting, or directory recursion, do it where Unix does: in a user mode library function (which is what glob() is). It's the KISS principle.

  8. fixing documentation is a good thing... but seems like this would fall under the "faster to check, rather than wait for someone to reply to my question" category.

    [But maybe it's supposed to work, but you're just doing it wrong? And if you find that it works, are you sure it's intended to behave that way, or are you relying on an implementation detail? -Raymond]
  9. Roger says:

    (In a past life I implemented a SMB/CIFS server.)  What is even more amusing is just how inconsistent this stuff gets.  Wildcards are also sent over the network.  Both the Microsoft client and server pieces then have various workarounds for the other party, so the answers differ based on who is asking, and/or who is being asked.  The Samba team eventually wrote a tool named masktest, which is essentially a fuzzer trying to find edge cases in wildcard processing.  You'd never think something as simple as listing files would be complicated, but throw in legacy decisions like case insensitivity, knowing file names are 8.3, letting clients not close directories even when they should, filesystem level wildcards, pagination of results, security considerations, client and server versioning, being lenient, backwards and forwards compatibility, and this gets complex.

    I am curious if Raymond can find out roughly how many lines of code are dedicated just to processing directory listings.  I had to rewrite our code **four** times because previous approaches became untenable as we discovered issues.  My memory is around 2,000 LOC in our implementation, whereas a naive simple implementation using opendir/readdir/closedir would be less than 50.

  10. Harry Johnston says:

    Quite apart from network servers, I'd have thought the overhead of one system call (and the corresponding transition to and from kernel mode) per file would become a performance bottleneck for directories with lots of files.

  11. Anon says:

    If we're comparing, then the Unix syscall interface is highly regrettable as well, due to the fact that a directory fd has to support telldir() and seekdir().  The whole notion of a current position in a directory is fairly meaningless since files can be added and removed at any time, and the filesystem is free to completely reorder its internal structures as it sees fit for maximum efficiency, but the idea of being able to save your place in a directory and then go back there assumes that the ordering is stable.  Go ask any unix filesystem developer how much time has been wasted having to come up with brutal hacks to support seeking in a directory and I'm sure you'll get a response that includes a lot of profanity.

  12. @Adam Rosenfield

    Yeah the filtering is done in kernel space. That's how rootkits are able to hide files. They just throw out the entries they don't want.

  13. Joshua Bowman says:

    @Harry, kernel transitions are pretty negligible when it's only a few dozen or hundred a second. It's not like a user-mode Ethernet. FindNextFile already uses a 4096 byte buffer to minimize kernel calls, which is enough to store at least 5 files with a MAX_PATH length (potentially many more); with the new Win7+ FIND_FIRST_EX_LARGE_FETCH flag you get a 65536 size buffer, enough for at least 88 files!

  14. Joshua says:

    @Anon: I just made seekdir return -ENOTSUPP.

  15. Harry Johnston says:

    @Joshua: even so, it isn't all that uncommon to have tens or hundreds of thousands of files in a directory.  Silly, but not all that uncommon. :-)

    stackoverflow.com/.../886887

    But I didn't know the user-mode function was buffering the information.  That would certainly make a difference.

  16. Anthony Wieser says:

    It's still not very clear, even with the change, as "an" implies only one, and the or may be exclusive.

    Making asterisk and question mark plural would fix that, if it's true that *.?a? is a legal pattern.

  17. cheong00 says:

    I thought that because this function has to support the POSIX subsystem for Windows, it would follow what section 3.13 of POSIX.2 says about processing wildcards in file names.

    Then again this is just what I thought.

  18. Medinoc says:

    Is the POSIX subsystem even up to POSIX.2? I thought it only supported POSIX.1...

  19. boogaloo says:

    IMO it would make sense for the FindFirstFileEx to support wildcards in all positions, because of high latency and low bandwidth situations like SMB and tape storage.

    If the kernel designers were "forced" to include the limited wildcard support because CP/M did, then they dodged a bullet.

    It would also be nice if you could find files using more than the name.

  20. Harry Johnston says:

    @cheong00, I don't think so.  Interix presumably does the same thing Linux does, implementing glob() in user-mode.

    Besides, the underlying kernel function (ZwQueryDirectoryFile) takes a directory handle, not a path name, so there's no way to even *try* to put a wildcard in the path. :-)

  21. Dave says:

    Just curious, when you submit a documentation change request is that by going to the MSDN page for the docs and submitting a comment, or is there a MS-internal way to do it?  I've submitted several change requests via the MSDN pages for things that are obvious errors (e.g. _Inout_ parameter documented as _In_ so you get a compiler warning every time you call that particular function), but never had any response, and the docs stay incorrect.

  22. cheong00 says:

    POSIX.2 governs how command interpreters interpret the input, and which standard utilities (i.e., commands) are to be provided in POSIX systems, like "cat" or "alias". The .2 is not a version number; it is more like a "chapter".

    The revision usually comes in form of "POSIX.2-1992", or in the standard form "IEEE Std.2-1992".

  23. cheong00 says:

    @Harry Johnston: There's no rule that one user-mode API has to be implemented with exactly one kernel API. If there are more requirements on wildcard handling, the path parsing and matching would probably be done in the outer function too.

  24. Anonymous Coward says:

    FindFirstFile is a user-mode function, so there's no reason the kernel has to do the wildcard expansion. (It's not like Linux where the system call interface is documented and intentionally exposed to applications)

  25. boogaloo says:

    Most of the wild card handling should be performed by the file system anyway, not the kernel (*). However to allow wildcards in directories then you'd need to be able to identify which file system you were talking to because of reparse points.

    (*) This is assuming you want to build a decent system that works optimally in all use cases and aren't just happy with knocking something up that "works for me".

  26. cheong00 says:

    s/IEEE Std.2-1992/IEEE Std 1003.2-1992/

    Not sure whether I forgot to copy and paste the 1003 part, or somehow "the dog ate that" ( en.wikipedia.org/.../The_dog_ate_my_homework ).

  27. Kevin says:

    @Joshua: You can't use -ENOTSUPP for telldir, and *certainly* not for seekdir().  Both are POSIX-specified to never error, and seekdir() is additionally a void function so the chances of anyone checking the error code are approximately nil.  Linux quite illegally adds EBADF to telldir() (because the alternative, I suppose, would be undefined behavior), but it does not allow for ENOTSUPP.

    pubs.opengroup.org/.../telldir.html

    pubs.opengroup.org/.../seekdir.html

  28. Daniel Neely says:

    The sad bit is that despite the IIRC multi-year queue of written articles that Raymond maintains, the current MSDN documentation still has the old ambiguous text in it.

    msdn.microsoft.com/.../aa364419.aspx

  29. Harry Johnston says:

    @cheong00, I'm not sure what you mean.  The user-mode function (FindFirstFile) is part of the Win32 API, so what Interix does isn't relevant to it AFAIK.  (Interix is not layered on top of Win32, they are peers.)  

    Only the kernel-mode function (ZwQueryDirectoryFile) might have been affected by an Interix requirement to support wildcards in paths, and we can tell that it wasn't since the function signature doesn't permit it.

  30. cheong00 says:

    If the information given on wiki ( en.wikipedia.org/.../Interix ) isn't wrong, it makes me think Interix intends to cover console utilities, which are covered by POSIX.2. It would be kind of strange to commit to conforming to only part of the standard.

  31. DWalker says:

    @Dave said:

    "Just curious, when you submit a documentation change request is that by going to the MSDN page for the docs and submitting a comment, or is there a MS-internal way to do it?  I've submitted several change requests via the MSDN pages for things that are obvious errors (e.g. _Inout_ parameter documented as _In_ so you get a compiler warning every time you call that particular function), but never had any response, and the docs stay incorrect."

    I would like to echo that.  Does anyone know the answer?  Do any of the MSFT people here????  Thanks.

  32. bzakharin says:

    I'm not sure how this stuff evolved, but I'm pretty sure that in DOS days anything after a *, but before the dot (if applicable) would get ignored in a DIR command, so "DIR def*.txt" would return any text file starting with def, but "DIR *def.txt" would return any text file. This seems to not be the case in CMD today, but what about FindFirstFileEx?
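The DOS-era behavior described in the last comment can be sketched as follows. This is a simplified model that assumes the old fixed-width 8.3 matching: names are padded to eight plus three characters, a `*` fills the rest of its field with `?`, and anything after it in that field is dropped. The function names `to_fcb` and `dos_match` are hypothetical, and the sketch says nothing about what FindFirstFileEx does today.

```python
def to_fcb(name):
    """Expand a name or pattern to an 11-character, space-padded 8.3 form."""
    base, _, ext = name.upper().partition(".")

    def field(text, width):
        out = ""
        for ch in text:
            if ch == "*":
                # '*' fills the rest of the field with '?'; later characters are ignored.
                out = out.ljust(width, "?")
                break
            out += ch
        return out[:width].ljust(width)

    return field(base, 8) + field(ext, 3)


def dos_match(pattern, name):
    """Compare position by position; '?' matches any character, including padding."""
    return all(p == "?" or p == c for p, c in zip(to_fcb(pattern), to_fcb(name)))
```

Under this model, `*def.txt` expands to eight question marks plus `TXT`, so it matches every text file, while `def*.txt` matches only names starting with `def`.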

Comments are closed.