Jon Udell questions the value and direction of WinFS


Jon Udell at InfoWorld is doing a series of blog entries on Longhorn. Feedster just discovered his first one, from Wednesday, on the justification for WinFS defining a new way to manage metadata.


It’s a well-written entry, and deserves a well-thought-out response. I did want to get at least one quick response out into the blogosphere, though, because I think there’s a misleading statement towards the end:



two powerful trends point to a brighter future for this scenario: the growing use of open XML file formats, and the steady advance of databases that can index and search XML content. WinFS embraces neither trend, and that looks to me like a looming headache. Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model.


Jon seems to have missed a few key entries in MSDN about WinFS’s support for XML APIs, as well as the support for metadata handlers that copy metadata between WinFS and the filestream, precisely so that there is no walled garden. If you’re using Longhorn, you see WinFS properties; if you take the file somewhere else, you see EXIF headers or whatever other metadata format your file type supports.


XML formats with well-defined, licensed schemas are certainly a great step towards a world of open data interchange. But XML files alone don’t make it easier for users to find, relate and act on their information.


Jon’s contention is that full text search over XML files is good enough, but is it really? I did a series of blog entries on WinFS scenarios back in February, and I don’t think Jon’s full text search approach would really enable these things. Take the simple media scenario, where I want to add background music to a movie by browsing through my media library. As it happens, I’ve recently started using iTunes to manage my music, and iTunes stores its metadata in “iTunes Music Library.xml” on my hard drive. So let’s say I wanted to search for jazz music to add. Here’s a little snippet of what iTunes’s XML format looks like for one of my jazz CDs:



<dict>
  <key>Composer</key>
  <string>Lee Morgan</string>
  <key>Album</key>
  <string>Cornbread</string>
  <key>Genre</key>
  <string>Jazz</string>
  <key>Kind</key>
  <string>MPEG audio file</string>
  <key>Location</key>
  <string>file://localhost/D:/files/Music/Lee%20Morgan/Cornbread/01%20Cornbread.mp3/</string>
</dict>

So what would full text search do for me over this file? If I searched for “jazz”, it would certainly show me a result for “iTunes Music Library.xml”, since that file contains many instances of the string “jazz”. It would also probably return other documents on my system that mention jazz, like emails, or papers I may have written in school. How exactly does this help me find the right piece of music to add to my movie?


To help at all, the programmer who built the movie editing program would have to add in a bunch of smarts about how to index this particular iTunes file, understanding the key/string pairs, and also recognizing that Location has a particularly special meaning. To make a really performant system, you’d probably need to make a smart indexer as well, so that if you change the genre of only one song in your collection of 4000, the indexer doesn’t have to recrawl the entire XML file to update itself.
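To make that concrete, here is a minimal Python sketch of the format-specific smarts described above. It assumes the snippet is a fragment of a larger file in which every <key> is immediately followed by its value element. The names `parse_track` and `location_to_path` are made up for illustration; nothing here is an iTunes or WinFS API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlparse

# The track entry from the snippet above, copied verbatim.
TRACK_XML = """
<dict>
  <key>Composer</key>
  <string>Lee Morgan</string>
  <key>Album</key>
  <string>Cornbread</string>
  <key>Genre</key>
  <string>Jazz</string>
  <key>Kind</key>
  <string>MPEG audio file</string>
  <key>Location</key>
  <string>file://localhost/D:/files/Music/Lee%20Morgan/Cornbread/01%20Cornbread.mp3/</string>
</dict>
"""

def parse_track(dict_xml):
    """Pair each <key> element with the value element that follows it."""
    children = list(ET.fromstring(dict_xml))
    track = {}
    for key_el, value_el in zip(children[0::2], children[1::2]):
        track[key_el.text] = value_el.text
    return track

def location_to_path(location):
    """Turn the file:// URL stored under Location into a plain path."""
    path = unquote(urlparse(location).path)
    return path.strip("/")

# Plain text search only tells us that the file mentions "jazz" somewhere.
file_matches = "jazz" in TRACK_XML.lower()

# Knowing the format lets us ask about one track and find the actual media file.
track = parse_track(TRACK_XML)
is_jazz = track.get("Genre", "").lower() == "jazz"
path = location_to_path(track["Location"])
```

Every application that wants to answer “which of my songs are jazz, and where are they?” would need its own version of this code for every metadata format it cares about, which is the cost I’m pointing at.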


Perhaps Jon’s point is that the file format of the music itself should support XML, and we should replace .mp3 with a pretend new XML-media file format, .xm3.  Well, now you’ve still got to worry about the schema definition of .xm3, and you’ve got to also worry about what to do if your user happens to prefer media encoded with Windows Media Player or Ogg or whatever.  What you want is a common storage engine, and you want a shared schema with strongly typed metadata.  That’s WinFS.


I could go on more here, but wow, this was supposed to be the simple scenario! I also wrote up a more complicated event planner scenario. The key value of an event planner app is that it relates together content that would otherwise be completely unrelated. If I want to find the presentation on Longhorn that I gave to InfoWorld, full text search doesn’t help at all, because “Longhorn” and “InfoWorld” likely never appear together in any document. Perhaps they appear together in a calendar entry, and full text search might help me find that one entry. But then how would I find the agenda for that InfoWorld meeting, or the notes I took from that meeting, or the presentation I gave?
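A toy Python sketch of the difference, under loud assumptions: the item titles, the `relate` and `related` helpers, and the in-memory dictionary are all hypothetical stand-ins, not the WinFS schema or API. The point is only that explicit relationships answer a question that keyword matching cannot.

```python
from collections import defaultdict

# Hypothetical items: only the calendar entry mentions both keywords.
items = {
    "meeting": "Calendar entry: Longhorn briefing at InfoWorld",
    "agenda": "Agenda.doc",
    "notes": "Notes from Tuesday",
    "slides": "Platform overview.ppt",
}

# Undirected relationships between item ids.
links = defaultdict(set)

def relate(a, b):
    """Record that two items belong together."""
    links[a].add(b)
    links[b].add(a)

# The event planner relates the meeting to everything produced for it.
for other in ("agenda", "notes", "slides"):
    relate("meeting", other)

def text_search(*words):
    """Return ids of items whose title contains every word."""
    return sorted(
        item_id
        for item_id, title in items.items()
        if all(word.lower() in title.lower() for word in words)
    )

def related(item_id):
    """Return ids of items linked to the given item."""
    return sorted(links[item_id])

hits = text_search("Longhorn", "InfoWorld")  # finds only the calendar entry
found = related(hits[0])                     # follows the relationships from it
```

Text search gets me as far as the calendar entry; the agenda, notes and slides only show up once something has recorded how they relate to it.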


Anyways, like I said at the start, Jon’s well-written entry deserves a well-written response, but this is what I came up with off the top of my head. If we’re doing a bad job about explaining the end-user benefits of WinFS, keep in mind that so far we’ve tried to really focus the message on developers, since it will be a while still before a home user needs to think about Longhorn. If you’re a developer and you’re interested, there’s plenty of WinFS info up on MSDN today. Take a read through, and see if you reach the same conclusion Jon did: “There’s no question that Longhorn aims for lock-in.”

Comments (19)

  1. What I want to know is this:

    If I have a Vanilla Ice vs. The Strokes Remix/mash-up by Freelance Hellraiser called "Last Night Ice Ice Baby Saved My Life", will I be able to find it in a "stack" (or virtual folder) by looking for any of the 3 artists, by "rap", by "rock", and by "remix"? If not, then WinFS needs to be thought out a little more.

    Right now, I have to choose a single folder to put it in. I made a content management system a few years ago that worked with virtual folders (although the user never knew it) allowing a document to be placed in any number of folders at the same time. For instance, a photo of myself could be in both "Users> Shannon> Photos> Me" and "Assets> Photos> Staff> Shannon".

    I see WinFS as working somewhat similar to that but in a more automated fashion (instead of having to manually add the photo above to both folders when uploading to the system).

    Am I off base?

  2. Jeremy,

    You missed his point almost completely.

  3. Alex James says:

    Jeremy,

    I remember giving almost exactly the same example to you:

    <full text search doesn’t help at all, because “Longhorn“ and “Infoworld“ likely never appear together in any document>

    just with different keywords, back in September/October last year, when talking about ‘Save With’ instead of ‘Save As’ and still agree completely.

    I think the main problem facing Microsoft, and you in particular, is to provide powerful demonstrations of the power of WinFS, since it is a conceptual sell, not a standard benefits sell. Your scenarios are just the first step, I think…

    I decided to stop working on my Relational File System after seeing WinFS… because:

    1) WinFS looked like it was going to be significantly better technically.

    2) But more importantly, it was so hard to explain to customers. Even MS is going to struggle to explain the benefits to consumers, and you and MS have 2 years to do so.

    Good luck.

    Alex

  4. Mario Goebbels says:

    <<If I have a Vanilla Ice vs. The Strokes Remix/mash-up by Freelance Hellraiser called "Last Night Ice Ice Baby Saved My Life", will I be able to find it in a "stack" (or virtual folder) by looking for any of the 3 artists, by "rap", by "rock", and by "remix"? If not, then WinFS needs to be thought out a little more.>>

    Currently, the Genre field in the schema is only defined as a dummy, but if they add an array of enums (or implement it using Genre objects and relationships), then nothing speaks against your idea, which means finding the same file in three different stacks.

    Using an enum doesn’t seem that effective to me. As already suggested above, having the ability to create Genre objects and then create a relationship between the actual song and the genre object, offers a lot more flexibility, since I could define my own music genres, or as Shannon suggested, apply multiple genres to one song. Where can I submit this idea, Jeremy? 😉

  5. David says:

    Next time, try reading the article before you respond to it.

    Where does he contend that "full text search over XML files is good enough"? His example uses XPath to search for all documents with a given keyword. Given a standard XML schema for audio metadata, a similar XPath query could find all Jazz albums.

    As for WinFS’s support for XML APIs: exporting a file to XML is a far cry from native global XPath searching. That PowerPoint presentation you linked doesn’t mention any support for XML APIs (unless you count an "XML" rectangle in the API box with no accompanying explanation, or a line saying that WinFS will support XML import/export).

    You say that "What you want is a common storage engine, and you want a shared schema with strongly typed metadata." That’s what Udell wants too. But he wants it implemented using existing standards, and Microsoft wants to do it all their way.

  6. roger says:

    Today’s Windows file system cannot be trusted to remember where I put a file (location of icon in window) or even that I wanted the window to be displayed in the graphical icon mode. Why should I look forward to trusting a future Windows file system with even more information?

    (Yes, I know that is an available feature … What I do not know is how to fix a system after that feature breaks).

  7. Marc's Voice says:

    I’ve been waiting for this battle to ensue.

  8. Pete King says:

    Excellent rebuttal and well thought out. Some of the other commenters need to go back and read Jon’s blog entry as he does indeed assert that full text search is good enough.

  9. David says:

    Pete: where does he assert that? I reread the article and all I saw was a statement that "The power of pervasive free-text search, by the way, is something that Microsoft seems consistently to underestimate," which is not the same thing.

  10. Robb Beal says:

    Jeremy: "Perhaps Jon’s point is that the file format of the music itself should support XML, and we should replace .mp3 with a pretend new XML-media file format, .xm3."

    Think file-format independent XML metadata where the association between data (eg, the mp3 bits) and the metadata is made via a file-system directory,

    http://weblog.infoworld.com/udell/2003/03/05.html#a627

  11. Anonymous says:

    Danny Ayers on WinFS and RDF/semantic web

  12. ‘If we’re doing a bad job about explaining the end-user benefits of WinFS, keep in mind that so far we’ve tried to really focus the message on developers, since it will be a while still before a home user needs to think about Longhorn.’

    OK, let’s say you are coming from the business side (with deep technology knowledge, but not hard-core developer) and trying to understand Longhorn so you can decide if you want to put resources into Longhorn development. I do think end-user benefits would be important to understand the bigger business value picture.

  13. Michael Bartlett says:

    I’ve got hours of thoughts on this whole thing, so for the sanity of others I’ll limit this post to just one or two.

    Firstly, with regards to the Jazz Music example, free text search can be just fine if you limit the scope of the search to mp3s – or "My Music". You wouldn’t then come up with any Word document containing jazz, as it would be out of scope for the search. What I don’t understand is how it would be any different using WinFS, as the user’s experience of the search is "Jazz", so how would your (Jeremy) example be any better? Which leads me to believe that perhaps I don’t understand your example, because now you are talking about a developer of a movie editing package adding "smarts" relating to the iTunes package. Could you elaborate a little on your example, in particular on the user’s experience and not necessarily how easy it is to code?

    My second point is a general one on WinFS and, indeed Longhorn’s marketing messaging… could you please stop citing examples of pictures and music?!! I see very few examples of the benefit to corporate workgroups, especially when working with files on a mapped network drive. Is there any light you can shine on this for us (me) Jeremy?

