Did I misunderstand Udell’s argument against WinFS?

Several responses to my last post on Jon Udell’s entries on WinFS suggest that I didn’t get Jon's point at all. Dare wrote “You missed his point almost completely.” And David (no blog link provided) said “Next time, try reading the article before you respond to it. Where does he contend that "full text search over XML files is good enough"?” I bumped into Alex Hopmann (another active MS blogger, I see 😉) at the gym, and he said I missed the point too.

Well, David, I was just going by what Jon said his point was:

Here's the point of this installment. To the extent that our personal information stores contain information represented in XML, we have standard ways to search them.

Note also that the one example he gives (“There's no need to wait until 2007 to see what this would be like”) is all about searching over XML tags:

The same XML data will be open to the more powerful kinds of search available in the newer XML technologies now coming online: XPath 2.0, XQuery. Meanwhile, a growing number of databases are gearing up to do this kind of search efficiently, often in combination with both relational and free-text querying.
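To make that kind of tag-based search concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The `<contacts>` document, its element names, and the `names_at` helper are all made up for illustration, and ElementTree supports only a small subset of XPath 1.0, so this is nowhere near the XPath 2.0 or XQuery capabilities Jon mentions.

```python
import xml.etree.ElementTree as ET

# Hypothetical personal-information file; the element names are assumptions.
DOC = """
<contacts>
  <contact><name>Doug</name><org>Contoso</org></contact>
  <contact><name>Ann</name><org>Contoso</org></contact>
  <contact><name>Raj</name><org>Fabrikam</org></contact>
</contacts>
"""

def names_at(org):
    """Return the names of contacts whose <org> child equals `org`."""
    root = ET.fromstring(DOC)
    # [org='...'] is one of the few predicates ElementTree's XPath subset supports.
    matches = root.findall(f".//contact[org='{org}']")
    return [c.findtext("name") for c in matches]

print(names_at("Contoso"))
```

Any XML-aware tool could run an equivalent query against the same file, which is the openness Jon is pointing at.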

And here’s how Jon himself summed up that original entry, as a prelude to his second entry on WinFS:

I concluded that the compelling benefit of WinFS must lie in the realm of "organizing stuff" rather than just "finding stuff" -- else why not just leverage existing and well-understood relational, free-text, and XML search methods?

So let me try again. Here’s how I read Jon’s argument across his two WinFS entries:

  1. Everybody agrees that users need help finding and organizing their data.

  2. There are two basic ways to approach this problem:

    1. Simply full-text index all your content, to allow quick full-text search

    2. Allow for semantic relationships between data to enable a richer search/organization experience

  3. The first option, full-text, is pretty darn appealing: “The power of pervasive free-text search, by the way, is something that Microsoft seems consistently to underestimate”, “brute-force free-text search routinely trumps navigation and structured search”

  4. There is added value to the relationship approach: “it's easy to state the practical benefit. If my personal information store contains items of types Person, Organization, Project, and Document, and if it knows about relationship types like Employment and Authorship, then I can easily answer questions like "Which Project X documents were written by Doug?"”

  5. But we’re years away from a world like (4), with consistently rich relationship data: “in practice, I wonder if anybody […] can mandate such an approach given the chaotic messiness of reality”, and quoting Joshua Allen, “real-world information is chaotic”

  6. On the other hand, a world where full-text search is a reality is well supported, thanks to two trends: “the growing use of open XML file formats, and the steady advance of databases that can index and search XML content.”

  7. These trends in (6) foster the following philosophy: “Let's get schematized information out into the open, where any XML-aware tool can see it and touch it and work with it”

  8. WinFS, by contrast, “envisions a canonical set of schemas woven tightly into Longhorn”, and embraces the philosophy “Let's put schematized information into Windows, where any CLR-aware Windows application can see it and touch it and work with it”

  9. In conclusion, “Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model. And then they'll have to find ways to unify that approach with the XML-oriented model prevailing in the world at large -- and indeed, even on pre-Longhorn Windows systems.”
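The gap between points (3) and (4) is easier to see with a toy example. The sketch below is only an illustration under assumed names: the `Item` and `Relationship` classes, the sample data, and the `PartOf` relationship type are all hypothetical, and none of this is the WinFS API or schema. It asks Jon's question, "Which Project X documents were written by Doug?", once as a free-text search and once by following relationships.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    id: str
    type: str   # "Person", "Project" or "Document"
    text: str   # a name, or the document's content

@dataclass(frozen=True)
class Relationship:
    kind: str    # "Authorship" (from the quote) or "PartOf" (assumed here)
    source: str  # item id
    target: str  # item id

ITEMS = [
    Item("p1", "Person", "Doug"),
    Item("p2", "Person", "Ann"),
    Item("x", "Project", "Project X"),
    Item("d1", "Document", "Design notes for the storage layer"),
    Item("d2", "Document", "Doug asked for a status update on Project X"),
    Item("d3", "Document", "Project X schedule"),
]

RELATIONSHIPS = [
    Relationship("Authorship", "p1", "d1"),  # Doug wrote d1
    Relationship("Authorship", "p2", "d2"),  # Ann wrote d2
    Relationship("Authorship", "p1", "d3"),  # Doug wrote d3
    Relationship("PartOf", "d1", "x"),
    Relationship("PartOf", "d2", "x"),
    Relationship("PartOf", "d3", "x"),
]

def free_text_search(*terms):
    """Ids of documents whose text contains every term (case-insensitive)."""
    return sorted(
        i.id for i in ITEMS
        if i.type == "Document"
        and all(t.lower() in i.text.lower() for t in terms)
    )

def documents_by_author_in_project(author_name, project_name):
    """Ids of documents linked to both the author and the project."""
    ids = {(i.type, i.text): i.id for i in ITEMS}
    author = ids[("Person", author_name)]
    project = ids[("Project", project_name)]
    authored = {r.target for r in RELATIONSHIPS
                if r.kind == "Authorship" and r.source == author}
    in_project = {r.source for r in RELATIONSHIPS
                  if r.kind == "PartOf" and r.target == project}
    return sorted(authored & in_project)

print(free_text_search("Doug", "Project X"))                # ['d2']
print(documents_by_author_in_project("Doug", "Project X"))  # ['d1', 'd3']
```

Free-text search returns the one document that happens to mention both terms, which Ann wrote, while the relationship query returns the two documents Doug actually wrote. The catch, as point (5) says, is that somebody or something has to populate those relationships in the first place.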

I see that Dare restated Jon’s argument in far fewer words than I just did:

"If the software industry and significant parts of Microsoft such as Office and Indigo have decided on XML as the data interchange format, why is the next generation file system for Windows basically an object oriented database instead of an XML-centric database?"

But I don’t see that as what Jon was asking. He didn’t argue about a relational model vs. a hierarchical model, or XPath vs. T-SQL vs. ADO.NET or what have you. (What is an “XML-centric database” by the way? Is Yukon object oriented, relational or XML-centric?)

Jon’s point as I read it really boils down to “why are you taking this very complicated, rigidly structured approach, when we can get along just fine with simple search over transparent XML files?” You see it in his second post, where he's not arguing XML vs. relational, he's arguing the value of “RDF/SemWeb” vs. just plain simple XML files (and using Dare's words to do it).

Let me stop here with this entry. Did I get the point? If not, help me out by providing your own breakdown of Jon’s premises and conclusions. Once I'm sure I have the argument, I'll provide some commentary on each of the premises.

Comments (10)
  1. Paschal says:

    Jeremy, whatever system WinFS uses, the point is that users will have to enter more data to make their own data relevant. Where are the benefits in this? I want my computer to do the job of indexing my documents, not me. A relational model is a waste of time if nothing is done to simplify the indexing process. As developers, we enjoy developing complex stuff, but we forget too easily the world outside. My opinion: make it as simple as Google and yes, it will be a success. Otherwise WinFS will be used only by a bunch of elitist blokes.

  2. Alex James says:

    I think the point everyone seems to overlook is that WinFS will actually mean users need to enter less data, not more.

    A lot of valuable meta data will be automatically extracted by WinFS, using promotion and demotion.

    Creating relationships will be trivial, for example dragging onto piles, but it may not even be required. So people need only do this when they know they need to capture some information that isn’t already in the data.

    All this is great, but IMHO data entry will be minimized primarily by ensuring that schematized data is automatically integrated.

    For example imagine that a CRM application built on top of WinFS uses a Schema that defines People: People created/updated in the CRM application will be immediately available to any other application that uses the same schema (maybe HR). This happens automatically because it is the same data! There is no import or export or cumbersome synchronization.

    Forget about having to remember to update your contact details for someone in 2, 3 or even more apps; do it in one place and all apps see the changes! I don’t quite see how islands of unrelated XML can give you the same benefits, and IMHO this is the key to WinFS.

    Surely this is going to make people happy.

  3. jeremy says:

    I blogged on this topic a while back, http://blogs.msdn.com/jmazner/archive/2004/02/16/74595.aspx. I think there is plenty of useful meta-data out there that just isn’t being mined today.

  4. Paschal says:

    Alex, I don’t get it. Not everything in my data is structured, and I hope WinFS is not just there to make a better address book 🙂

    When I attended a WinFS presentation a couple of months ago, it was clearly shown that WinFS relies on a relational database model. So if you say database, you say data.

    And I don’t see anything in WinFS that supports automatic indexing and cross-referencing of data.

    Get back to the basics: not all users are high-level technicians, and if nothing is done to clearly explain the technology, they will consider WinFS just another techie gadget. I am excited to see Google coming to our desktops and to see how they approach the problem.

  5. Anonymous says:

    Jon responds…

  6. Anonymous says:

    Dare responds to Jon’s response

  7. Alex James says:

    Paschal, with regard to your comments on indexing etc.: there is a big difference between finding found things again, and finding things for the first time.

    Google is very good at the latter, not so good at the former. The number of times I have forgotten how I found something on Google is extraordinary, so it is lost forever, even though I know I found it once, and it is probably still there. However, if I could modify how it is found in the future by attaching metadata somehow, I could find it again without remembering how I found it the first time.

    Does that make any sense ?

  8. Alex James says:

    And… the above is a good example of why external relationships are required (not just XML)… to modify the XML you need edit rights, to modify how you find something you should only need view rights!

