FindFirstFile is not a SQL query


The FindFirstFile function is not a SQL query. It’s a very simple directory enumerator. There is a slightly fancier version called FindFirstFileEx, but even that function doesn’t add much at present beyond filtering for directories or devices.

You don’t get to pass it sort criteria like “return the files/directories sorted smallest file first” or “return the files/directories in reverse order of creation”. The only filtering you can provide are the wildcard pattern and (if you use FindFirstFileEx) a directory filter. The wildcard language is very limited as well; it can’t express queries like “files whose extension is either .exe or .dll” or “all files whose extension is .c plus all directories regardless of extension”. You also can’t ask it questions like, “Tell me the total size of the files in this directory”, for as we saw earlier, this question is underspecified.

If you want to do any of those advanced queries, you’ll have to code it up yourself. Or as Hippie Tim is fond of saying, “Start typing!”
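For the sake of illustration, here is roughly what “coding it up yourself” might look like for one such query: “files whose extension is either .exe or .dll, smallest first”. This is a minimal sketch, not a recommended implementation. The `Entry` struct and the helper names `HasExtension`, `ExeAndDllSmallestFirst`, and `Enumerate` are made up for this example. The extension test only folds case per character, and “size” here means the logical length reported in `WIN32_FIND_DATAW`, which is just one of several possible meanings of file size. The enumeration part is wrapped in `#ifdef _WIN32` so the filtering and sorting logic can be compiled on its own.

```cpp
#include <algorithm>
#include <cstdint>
#include <cwctype>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical record type; the field names are invented for this sketch.
struct Entry {
    std::wstring name;
    std::uint64_t size;   // logical length in bytes
    bool isDirectory;
};

// True if name ends with ext (for example L".dll"), ignoring case per character.
bool HasExtension(const std::wstring& name, const std::wstring& ext) {
    if (name.size() < ext.size()) return false;
    return std::equal(ext.begin(), ext.end(), name.end() - ext.size(),
        [](wchar_t a, wchar_t b) {
            return std::towlower(a) == std::towlower(b);
        });
}

// "Files whose extension is either .exe or .dll", smallest first.
std::vector<Entry> ExeAndDllSmallestFirst(std::vector<Entry> entries) {
    // Drop directories and anything that is neither .exe nor .dll.
    entries.erase(std::remove_if(entries.begin(), entries.end(),
        [](const Entry& e) {
            return e.isDirectory ||
                   !(HasExtension(e.name, L".exe") ||
                     HasExtension(e.name, L".dll"));
        }), entries.end());
    // Sort what is left by size; stable_sort keeps ties in enumeration order.
    std::stable_sort(entries.begin(), entries.end(),
        [](const Entry& a, const Entry& b) { return a.size < b.size; });
    return entries;
}

#ifdef _WIN32
// Collect everything matching pattern (for example L"C:\\some\\dir\\*").
std::vector<Entry> Enumerate(const std::wstring& pattern) {
    std::vector<Entry> result;
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW(pattern.c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE) return result;  // no match, or an error
    do {
        Entry e;
        e.name = fd.cFileName;
        e.size = (static_cast<std::uint64_t>(fd.nFileSizeHigh) << 32) |
                 fd.nFileSizeLow;
        e.isDirectory = (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
        result.push_back(e);
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return result;
}
#endif
```

On Windows you would call something like `ExeAndDllSmallestFirst(Enumerate(L"C:\\some\\dir\\*"))`. Note that the wildcard passed to the enumerator is just `*`; all the interesting filtering happens afterwards, in your own code.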

Comments (15)
  1. Scott says:

    It doesn’t look like hippie tim has followed his own advice on typing much. It looked like his blog would actually be interesting.

  2. Mr J says:

    I can’t believe that all the people asking to do funky stuff with FindFirstFile haven’t considered that it’s probably more powerful and quicker to just suck all the results from FindFirstFile into a std::vector and do whatever the hell you like with it afterwards.

    Is this a case of API dependency? Once you use an API to do a job, you start wanting it to do all the jobs.

  3. That’s exactly what we ended up doing in Xceed Zip ActiveX to support more advanced filtering and multiple masks in the same folder.

  4. A says:

    "The only filtering you can provide are the wildcard pattern and (if you use FindFirstFileEx) a directory filter"

    Hmm, the current Platform SDK documentation claims FindExSearchLimitToDirectories is "Reserved for future use".

    What exactly can be done with FindFirstFileEx that can’t be done with FindFirstFile? All of the new options appear to be flagged as "reserved" or "not supported":

    FindExSearchLimitToDirectories – "Reserved for future use"

    FindExSearchLimitToDevices – "This filtering type is not available"

    lpSearchFilter – "this pointer must be NULL"

    dwAdditionalFlags – "Reserved for future use"

  5. Miles Archer says:

    If there’s demand for SQL like querying on directories, why isn’t it added to the OS so everyone doesn’t have to write their own?

    No, I don’t want to wait for Longhorn, WinFS, or whatever.

  6. Arlie Davis says:

    The desire for SQL-like functionality is not feature creep. It is doing the right job at the right place.

    Consider access to file servers. The right thing to do is to send the file server the filesystem query, have it process it, and send the results over the wire. Slurping large directories to the client, just so the client can cherry-pick the handful of entries that are actually relevant, is wasteful.

    I think SQL, or a SQL-like language, is the right thing, even for traditional filesystems. How many different random queries do we have for interrogating filesystems? Just look at FileInformationClass in the DDK. I used to be an NT developer, so I know that the number of file information classes is in excess of 20. Same thing for directory queries. There are lots of little "one-off" collections of attributes.

    The OS API would be a lot simpler if apps could provide the list of file attributes (columns), the source context (directory or entire directory tree), and a restriction clause (name matching, size, etc.). And yeah, if you follow that to its natural conclusion you get SQL, which I see as a good thing. Even WITHOUT all of the DBFS / WinFS stuff, it’s a good thing just for normal, traditional filesystems.

    What makes this powerful is that the FS implementation has more information at hand to work with, and can therefore make better decisions when it is processing the query. For example, FILE_NETWORK_OPEN_INFORMATION contains CreationTime, LastAccessTime, LastWriteTime, ChangeTime, AllocationSize, EndOfFile, and FileAttributes. If all you need is EndOfFile and FileAttributes, there’s no need for the FS to grovel up the access times, or waste network time with them.

    Yes, I know these attributes are on-hand for most FS implementations. But there ARE reasonable collections of columns, representing sets of data that applications want, that are NOT all contained in a single file information class or API. Apps are then forced to do multiple FS queries, and merge the results.

    The OS should serve the needs of applications, not the other way around. The OS should abstract common implementation details, where it makes sense to do so. I want my SQL FS queries.

  7. Jerry Pisk says:

    And as we learned yesterday, FindFirstFile does not even support long filenames. If you ask it to find *.htm it will act like outdated Win16 code and return index.html as well.

  8. Ben says:

    >>

    If there’s demand for SQL like querying on directories, why isn’t it added to the OS so everyone doesn’t have to write their own?

    No, I don’t want to wait for Longhorn, WinFS, or whatever.

    <<

    These two statements seem to be in direct contradiction. You want it in the OS but want it now? It *IS* being added to the OS (or was…I haven’t followed the latest news about WinFS for a few months).

    Implementing "SQL" on a file system isn’t a trivial undertaking, and neither is adding a new API to the OS. Any reader of Raymond’s (or other) blogs will quickly realize this.

    Until MS does write one that works OS-wide, we’re stuck with either 3rd-party libraries or waiting for MS to get it right and working.

    Personally, I’d rather wait and have something that works well across the whole system.

  9. Arlie Davis says:

    I know about Index Server, and it isn’t good enough. It’s the right idea, but the implementation leaves a lot to be desired. Mostly, it’s *optional*, and it’s rarely installed or configured correctly. Also, it only applies to trees that the administrator has enabled for indexing.

    Tell me — is content indexing enabled on your desktop? On every directory? Is it running? Is it — most important of all — up to date? Is it guaranteed to be up to date immediately after I call CreateFile + CloseHandle?

    A SQL-like interface should be just as reliable and omnipresent as FindFirstFile.

  10. Dean Harding says:

    But we *are* getting the SQL-like features in WinFS. I don’t see how you can expect that new features should be automatically back-ported to old versions of the OS. It’s like asking Ford to install power steering, ABS and central locking into your 1974 Mustang… for free!

    If you need more powerful searching, there’s a billion regular expression libraries out there, just use one of those to do your own.

  11. Nick Lamb says:

    The "total size" thing has been complicated by Windows developers traditionally not being told about the difference between the SIZE of something and its LENGTH.

    Unix clearly differentiates between the questions "What SIZE is this file?" (ie how many disk blocks are needed to store it) and "How LONG is this file?" (ie if I open it, and read one byte at a time, how many bytes is that?).

    This still leaves some subtle things to worry about for the application developer, as always, but it does mean the common question from users "What’s using all the space on my 40GB disk?" is an easy question to answer. There’s no possibility that you missed 850MB of ACLs, or that you mistakenly blame a 4GB long sparse file that’s only 1% filled.

    The designers of BeOS made a rather strange decision here BTW. They wanted a more powerful query system, fair enough, but they integrated it into their BeFS. So all the parsing of the query, the special cases etc. are all handled deep in the BeFS code. On the one hand this gives a free hand to filesystem developers who want to add e.g. a query feature that matches prime numbers, but on the other hand every single 3rd party filesystem (e.g a RAMfs, a network file system) must either do without this functionality altogether, or include a fragile (and security critical) query parser of its own in kernelspace.

    One thing BeOS did get right here was integration of update notification with file queries. So it’s very easy to write a GUI file chooser that remains up-to-date while other applications create, update and remove files. Legacy APIs like FindFirstFile can’t easily be modified to support this way of working.

  12. Arlie Davis says:

    I’m aware we’re getting SQL-like queries in WinFS, and I’m elated, and no, I don’t expect everything to be back-ported for free.

    All I was emphasizing is that a decent query processor does not *require* that the filesystem become a database.

  13. Find* forces any complex search into a linear one, no matter how important or unimportant your design goals. You could have the most stunning gorgeous query engine in the universe and yet you still have to run through the Find* gauntlet. Let the disk thrashing continue.

    When I first read the whitepapers for WinFS I was actually fascinated by the hierarchical object model, wondering why I hadn’t thought of it. Until it sank in that all this stuff was being dumped upon tired NTFS (right?). What a clusterf___. I’m disappointed.

  14. msemack says:

    Why are you disappointed that WinFS is implemented on top of NTFS? What do you think is so wrong about it?
