Jon Udell questions the value and direction of WinFS

Jon Udell at InfoWorld is doing a series of blog entries on Longhorn.  Feedster just discovered his first one, from Wednesday, on the justification for WinFS defining a new way to manage metadata.

It's a well-written entry, and it deserves a well-thought-out response.  I did want to get at least one quick response out into the blogosphere, though, because I think there's a misleading statement towards the end:

two powerful trends point to a brighter future for this scenario: the growing use of open XML file formats, and the steady advance of databases that can index and search XML content. WinFS embraces neither trend, and that looks to me like a looming headache. Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model.

Jon seems to have missed a few key entries in MSDN about WinFS's support for XML APIs, as well as the support for metadata handlers that copy metadata between WinFS and the filestream, precisely so that there is no walled garden.  If you're using Longhorn, you see WinFS properties; if you take the file somewhere else, you see EXIF headers or whatever other metadata format your file type supports.

XML formats with well-defined, licensed schemas are certainly a great step towards a world of open data interchange.  But XML files alone don't make it easier for users to find, relate and act on their information.

Jon's contention is that full text search over XML files is good enough, but is it really?  I did a series of blog entries on WinFS scenarios back in February, and I don't think Jon's full text search approach would really enable these things.  Take the simple media scenario, where I want to add background music to a movie by browsing through my media library.  As it happens, I've recently started using iTunes to manage my music, and iTunes stores its metadata in “iTunes Music Library.xml” on my hard drive.  So let's say I wanted to search for jazz music to add.  Here's a little snippet of what iTunes's XML format looks like for one of my jazz CDs:

<dict>
    <key>Composer</key>
    <string>Lee Morgan</string>
    <key>Album</key>
    <string>Cornbread</string>
    <key>Genre</key>
    <string>Jazz</string>
    <key>Kind</key>
    <string>MPEG audio file</string>
    <key>Location</key>
    <string>file://localhost/D:/files/Music/Lee%20Morgan/Cornbread/01%20Cornbread.mp3/</string>
</dict>

So what would full text search do for me over this file?  If I searched for “jazz”, it would certainly show me a result for “iTunes Music Library.xml”, since that file contains many instances of the string “jazz”.  It would also probably return other documents on my system that mention jazz, like emails, or papers I may have written in school.  How exactly does this help me find the right piece of music to add to my movie?

To help at all, the programmer who built the movie editing program would have to add in a bunch of smarts about how to index this particular iTunes file, understanding the key/string pairs, and also recognizing that Location has a particularly special meaning.  To make a really performant system, you'd probably need to make a smart indexer as well, so that if you change the genre of only one song in your collection of 4000, the indexer doesn't have to recrawl the entire XML file to update itself.
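To make that concrete, here is a minimal Python sketch of the kind of format-specific code I mean.  It assumes the snippet above is representative of the whole file, and the helper names (dict_to_record, tracks_by_genre, location_to_path) are made up for illustration; they are not part of iTunes, WinFS, or any real movie editor.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlparse

# The <dict> entry quoted in the post, used here as sample input.
SNIPPET = """<dict>
    <key>Composer</key><string>Lee Morgan</string>
    <key>Album</key><string>Cornbread</string>
    <key>Genre</key><string>Jazz</string>
    <key>Kind</key><string>MPEG audio file</string>
    <key>Location</key><string>file://localhost/D:/files/Music/Lee%20Morgan/Cornbread/01%20Cornbread.mp3/</string>
</dict>"""


def full_text_hit(raw_text, term):
    """What plain full text search knows: the term occurs somewhere in the file."""
    return term.lower() in raw_text.lower()


def dict_to_record(dict_elem):
    """Pair each <key> with the value element that follows it (format-specific knowledge)."""
    children = list(dict_elem)
    return {
        key.text: (value.text or "").strip()
        for key, value in zip(children[0::2], children[1::2])
    }


def tracks_by_genre(records, genre):
    """A typed query: match on the Genre property only, not on any text anywhere."""
    return [r for r in records if r.get("Genre", "").lower() == genre.lower()]


def location_to_path(location):
    """Location is special: it is a file URL that has to be decoded into a path."""
    path = unquote(urlparse(location).path)
    return path.strip("/")


record = dict_to_record(ET.fromstring(SNIPPET))
jazz_tracks = tracks_by_genre([record], "jazz")
jazz_path = location_to_path(jazz_tracks[0]["Location"])
```

Note how little full_text_hit can tell you compared with tracks_by_genre, and that every application wanting the second kind of answer would have to carry its own copy of this parsing logic for every format it cares about.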

Perhaps Jon's point is that the file format of the music itself should support XML, and we should replace .mp3 with a pretend new XML-media file format, .xm3.  Well, now you've still got to worry about the schema definition of .xm3, and you've got to also worry about what to do if your user happens to prefer media encoded with Windows Media Player or Ogg or whatever.  What you want is a common storage engine, and you want a shared schema with strongly typed metadata.  That's WinFS.

I could go on more here, but wow, this was supposed to be the simple scenario!  I also wrote up a more complicated event planner scenario.  The key value of an event planner app is that it relates together content that would otherwise be completely unrelated.  If I want to find the presentation on Longhorn that I gave to InfoWorld, full text search doesn't help at all, because “Longhorn” and “InfoWorld” likely never appear together in any document.  Perhaps they appear together in a calendar entry, and full text search might help me find that one entry.  But then how would I find the agenda for that InfoWorld meeting, or the notes I took from that meeting, or the presentation I gave?
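Here's a toy Python sketch of that difference.  The items, their text, and the links between them are all invented for illustration, and this is not the WinFS API; it only shows that a text match finds the calendar entry while following explicit relationships finds everything attached to it.

```python
# Hypothetical items: only the calendar entry mentions both terms.
items = {
    "cal1": {"type": "calendar entry", "text": "Longhorn presentation at InfoWorld"},
    "ppt1": {"type": "presentation", "text": "Longhorn overview slides"},
    "doc1": {"type": "agenda", "text": "Agenda: introductions, demo, questions"},
    "doc2": {"type": "notes", "text": "Notes: follow up on metadata handlers"},
}

# Hypothetical relationships recorded by an event planner app.
links = [("cal1", "ppt1"), ("cal1", "doc1"), ("cal1", "doc2")]


def full_text_search(items, *terms):
    """Return ids of items whose text contains every search term."""
    return sorted(
        item_id
        for item_id, item in items.items()
        if all(term.lower() in item["text"].lower() for term in terms)
    )


def related(links, item_id):
    """Return ids of items directly linked to item_id, in either direction."""
    found = set()
    for a, b in links:
        if a == item_id:
            found.add(b)
        elif b == item_id:
            found.add(a)
    return sorted(found)


text_hits = full_text_search(items, "Longhorn", "InfoWorld")
meeting_material = related(links, text_hits[0])
```

The text search can only ever return the one calendar entry; the agenda, notes and slides show up only because something stored the relationships.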

Anyways, like I said at the start, Jon's well-written entry deserves a well-written response, but this is what I came up with off the top of my head.  If we're doing a bad job of explaining the end-user benefits of WinFS, keep in mind that so far we've tried to really focus the message on developers, since it will be a while still before a home user needs to think about Longhorn.  If you're a developer and you're interested, there's plenty of WinFS info up on MSDN today.  Take a read through, and see if you reach the same conclusion Jon did -- “There's no question that Longhorn aims for lock-in”.