A number of people have been asking about Windows Vista search and how to programmatically query the search engine. There's not a lot of documentation available yet (this will obviously change as we near the release of Windows Vista). So in the meantime, here's a quick primer for those of you who would like to get started today.
Windows Vista includes a built-in desktop indexing platform. You’ll see search integrated throughout Windows Vista, in all the Explorer windows and even in the Start menu. A number of Microsoft products are building on top of this infrastructure - like Outlook 2007 and OneNote 2007, just to name a couple.
Your applications can also plug into this same infrastructure and query the index by using the new OLE DB Provider for Windows Search. (There are also extensibility mechanisms by which you can expose data to the indexer, but I'll leave that for a future post.)
The Windows Vista search and indexing infrastructure will also be available as a download for Windows XP and Windows Server 2003. You can download the Windows Desktop Search 3.0 Beta Engine Preview here.
Using the OLE DB Provider for Windows Search
While there's a lot of information available on OLE DB, you won't get very far with that alone – you also need to know the specifics of the provider, like its connection string and its query syntax. (If you're using ADO.NET, just a reminder that you can use the types defined in the System.Data.OleDb namespace. If you don't know how to use an OLE DB provider, a good place to start would be the MSDN Data Access and Storage Developer Center.)
Here's the connection string for the OLE DB Provider for Windows Search:
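As a minimal sketch, here is the connection string as it appears in the Windows Search documentation, wrapped in a small Python helper. The helper name open_connection and the use of the pywin32 package to reach ADO are my assumptions for illustration, not something the provider requires; any OLE DB consumer should work. Note the straight single quotes around the Extended Properties value.

```python
# Connection string documented for the OLE DB Provider for Windows Search.
# The quotes around the Extended Properties value must be plain ASCII quotes.
CONNECTION_STRING = (
    "Provider=Search.CollatorDSO;"
    "Extended Properties='Application=Windows';"
)


def open_connection(connection_string=CONNECTION_STRING):
    """Open an ADO connection to the local index (Windows only).

    Assumption: the third-party pywin32 package is installed. It is imported
    lazily so this module still loads on machines that do not have it.
    """
    import win32com.client  # hypothetical choice of OLE DB consumer

    connection = win32com.client.Dispatch("ADODB.Connection")
    connection.Open(connection_string)
    return connection
```

If you're on ADO.NET instead, the same string would be passed to an OleDbConnection.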
The query syntax is what I've found to be the trickiest part, and that's what I'll focus on for the rest of this post.
The OLE DB Provider for Windows Search supports a single statement: the SELECT statement. The provider was designed solely for read-only operations, so there's no need for INSERT, UPDATE or DELETE statements.
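The structure of the SELECT statement looks like this (square brackets mark the optional parts):
SELECT <properties> FROM [machinename.]SYSTEMINDEX..SCOPE() [WHERE <predicates>]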
<properties> represents a list of one or more comma separated "column" names, where the columns correspond to properties defined in the new Windows Vista property system.
A note about the new Windows Vista property system: Windows Vista has a new schema-based property system which defines the metadata that can be stored within files. Hundreds of system-defined properties will ship with Windows Vista and the property system is also extensible by custom file format providers.
- The documentation for Property Description Schema in the SDK docs will give you a pretty good idea of how properties are defined.
- Documentation on the system-defined properties, while not yet available online, is available in the documentation installed with the Beta 2 release of the Windows SDK. Here's how to navigate to the docs for these properties: User Interface->Windows Shell->Shell Reference->Shell Properties
Here are a couple of details about the SELECT clause that may not be obvious:
- The convention used for property names is: Publisher.Application.Property
- Property names must be enclosed in double quotation marks (due to the period used in the naming convention)
- SELECT * is not supported, so you'll need to specify at least one property name
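The three rules above can be sketched as a small helper that builds the SELECT clause. This is only an illustration: select_clause is a hypothetical name, and System.Title and System.Author are used as example property names, so check the Shell Properties documentation for the names available in your build.

```python
def select_clause(properties):
    """Build a SELECT clause from a list of property names."""
    names = [name.strip() for name in properties]
    if not names:
        raise ValueError("at least one property name is required")
    if any(name in ("", "*") for name in names):
        raise ValueError("SELECT * is not supported; list properties explicitly")
    # Double quotation marks protect the periods inside each property name.
    return "SELECT " + ", ".join('"{}"'.format(name) for name in names)


# Example property names, chosen for illustration only.
print(select_clause(["System.Title", "System.Author"]))
```

Rejecting an empty list and a bare * up front means the mistake surfaces in your own code rather than as a provider error at query time.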
The FROM clause is pretty straightforward: since there's only a single index that can be queried, it always names SYSTEMINDEX..SCOPE(). There's only one possible variation: you can precede SYSTEMINDEX..SCOPE() with a machine name to execute a query against the local index of a remote machine. (The remote machine must be running Windows Vista or Longhorn Server, and it must be configured with the required settings and permissions before this will work.)
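Here is a rough sketch of the two forms of the FROM clause. The helper name from_clause and the machine name SERVER01 are hypothetical, and joining the machine name to the index name with a period is my assumption about the exact syntax, so verify it against the provider documentation before relying on it.

```python
INDEX_SOURCE = "SYSTEMINDEX..SCOPE()"


def from_clause(machine_name=None):
    """Build the FROM clause, optionally targeting a remote machine's index."""
    source = INDEX_SOURCE
    if machine_name:
        # Assumption: the machine name is prefixed with a period separator.
        source = machine_name + "." + source
    return "FROM " + source


print(from_clause())            # local index
print(from_clause("SERVER01"))  # hypothetical remote machine
```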
The optional WHERE clause supports a number of predicates.
- Simple predicates: literal value comparisons (<, >, =) and LIKE
- Full-text predicates: CONTAINS and FREETEXT
- Search depth predicates: SCOPE and DIRECTORY
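As a rough preview, the sketch below composes one predicate from each category into a WHERE clause. Treat it as a set of assumptions rather than a reference: the helper names are hypothetical, the file: URL form for SCOPE, combining predicates with AND, and escaping a single quote by doubling it are all based on my reading of the Windows Search SQL documentation, and System.FileName is just an example property.

```python
def sql_literal(text):
    """Quote a string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def like(property_name, pattern):
    """Simple predicate: pattern match on one property."""
    return '"{}" LIKE {}'.format(property_name, sql_literal(pattern))


def contains(terms):
    """Full-text predicate: match the given search terms."""
    return "CONTAINS({})".format(sql_literal(terms))


def scope(url):
    """Search depth predicate: this folder and everything below it."""
    return "SCOPE = " + sql_literal(url)


def where_clause(*predicates):
    """Join predicates with AND; return an empty string if there are none."""
    if not predicates:
        return ""
    return "WHERE " + " AND ".join(predicates)


print(where_clause(
    like("System.FileName", "%.doc"),
    contains("vista"),
    scope("file:C:/Users"),  # hypothetical folder
))
```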
In my next post, I'll cover some details about the predicates and I'll provide some specific examples of their usage.
Updated June 26, 2006: modified formatting (you should now be able to copy and paste the connection string into your code, without getting a runtime error from the "smart" quotes)