Several responses to my last post on Jon Udell’s entries on WinFS suggest that I didn’t get Jon's point at all. Dare wrote “You missed his point almost completely.” And David (no blog link provided) said “Next time, try reading the article before you respond to it. Where does he contend that "full text search over XML files is good enough"?” I bumped into Alex Hopmann (another active MS blogger, I see 😉) at the gym, and he said I missed the point too.
Well, David, I was just going by what Jon said his point was:
Here's the point of this installment. To the extent that our personal information stores contain information represented in XML, we have standard ways to search them.
Note also that the one example he gives (“There's no need to wait until 2007 to see what this would be like”) is all about searching over XML tags.
The same XML data will be open to the more powerful kinds of search available in the newer XML technologies now coming online: XPath 2.0, XQuery. Meanwhile, a growing number of databases are gearing up to do this kind of search efficiently, often in combination with both relational and free-text querying.
I concluded that the compelling benefit of WinFS must lie in the realm of "organizing stuff" rather than just "finding stuff" -- else why not just leverage existing and well-understood relational, free-text, and XML search methods?
So let me try again. Here’s how I read Jon’s argument across his two WinFS entries:
1. Everybody agrees that users need help finding and organizing their data.
2. There are two basic ways to approach this problem:
   - Simply full-text index all your content, to allow quick full-text search
   - Allow for semantic relationships between data to enable a richer search/organization experience
3. The first option, full-text, is pretty darn appealing: “The power of pervasive free-text search, by the way, is something that Microsoft seems consistently to underestimate”, “brute-force free-text search routinely trumps navigation and structured search”
4. There is added value to the relationship approach: “it's easy to state the practical benefit. If my personal information store contains items of types Person, Organization, Project, and Document, and if it knows about relationship types like Employment and Authorship, then I can easily answer questions like "Which Project X documents were written by Doug?"”
5. But we’re years away from a world like (4), with consistently rich relationship data: “in practice, I wonder if anybody […] can mandate such an approach given the chaotic messiness of reality”, and quoting Joshua Allen, “real-world information is chaotic”
6. On the other hand, a world where full-text search is a reality is already arriving, thanks to two trends: “the growing use of open XML file formats, and the steady advance of databases that can index and search XML content.”
7. The trends in (6) foster the following philosophy: “Let's get schematized information out into the open, where any XML-aware tool can see it and touch it and work with it”
8. WinFS, by contrast, “envisions a canonical set of schemas woven tightly into Longhorn”, and embraces the philosophy “Let's put schematized information into Windows, where any CLR-aware Windows application can see it and touch it and work with it”
9. In conclusion, “Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model. And then they'll have to find ways to unify that approach with the XML-oriented model prevailing in the world at large -- and indeed, even on pre-Longhorn Windows systems.”
I see that Dare restated Jon’s argument in far fewer words than I just did:
"If the software industry and significant parts of Microsoft such as Office and Indigo have decided on XML as the data interchange format, why is the next generation file system for Windows basically an object oriented database instead of an XML-centric database?"
But I don’t see that as what Jon was asking. He didn’t argue about a relational model vs. a hierarchical model, or XPath vs. T-SQL vs. ADO.NET or what have you. (What is an “XML-centric database”, by the way? Is Yukon object oriented, relational or XML-centric?)
Jon’s point as I read it really boils down to “why are you taking this very complicated, rigid, structured approach, when we can get along just fine with simple search over transparent XML files?” You see it in his second post, where he's not arguing XML vs. relational, he's arguing the value of “RDF/SemWeb” vs. just plain simple XML files (and using Dare's words to do it).
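To make the two kinds of search concrete, here is a toy sketch of my own (not Jon's example, and nothing to do with how WinFS works). It assumes a made-up XML store whose element and attribute names (`store`, `document`, `project`, `author`) are invented purely for illustration, and it asks the question from Jon's quote, "Which Project X documents were written by Doug?", first with a naive free-text match and then with an attribute-aware query using the limited XPath subset in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical personal store. The schema is invented for this sketch.
STORE = """
<store>
  <document id="d1" project="X" author="doug">
    <title>Project X spec</title>
    <body>Initial design notes.</body>
  </document>
  <document id="d2" project="X" author="mary">
    <title>Project X schedule</title>
    <body>Doug asked for a June ship date.</body>
  </document>
  <document id="d3" project="Y" author="doug">
    <title>Project Y memo</title>
    <body>Budget review.</body>
  </document>
</store>
"""

root = ET.fromstring(STORE)


def free_text_search(root, *terms):
    """Return ids of documents whose text content mentions every term.

    This only looks at element text, not attributes, which is one
    simplistic way a free-text indexer might behave.
    """
    hits = []
    for doc in root.iter("document"):
        text = " ".join(doc.itertext()).lower()
        if all(term.lower() in text for term in terms):
            hits.append(doc.get("id"))
    return hits


def structured_search(root, project, author):
    """Return ids of documents whose project and author attributes match."""
    path = f".//document[@project='{project}'][@author='{author}']"
    return [doc.get("id") for doc in root.findall(path)]


print(free_text_search(root, "project x", "doug"))
print(structured_search(root, "X", "doug"))
```

On this toy data the free-text match finds only the schedule that merely mentions Doug, while the attribute query finds the spec he actually authored. Note that the second query is still just a standard query over a plain XML file, which is why I read Jon's question as being about how much structure we need, not about which query language wins.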
Let me stop here with this entry. Did I get the point? If not, help me out by providing your own breakdown of Jon’s premises and conclusions. Once I'm sure I have the argument, I'll provide some commentary on each of the premises.