Tackling the meta-crap challenge


What better motivation to get back into blogging than a challenge from fellow Microsoftie Dare:



The WinFS folks and Longhorn evangelists will probably keep focusing on what I have termed “bad scenarios” because they demo well but I suspect that there’ll be difficulty getting traction with them in the real world.


 


I’m willing to try my hand at coming up with some not-bad scenarios, or maybe even some good scenarios, but first I want to repeat some of the principles I covered in my older blog posts.  Or at least I hope I covered them there 😉


 


The flurry of WinFS “metacrap” posts seems to have started with Simon Fell asking “Why will tagging 100 photos with ‘Wedding’ make things magically better than having the photo’s in a ‘Wedding’ directory?”, which led to Scoble posting about his dream scenarios.  Dare responded to that post with several of his own: the first, quoted above, responding to Scoble’s scenarios, and then a followup asserting that “Effectively tagging the content so it can be categorized in a way you can do interesting things with it search-wise is unfeasible”.  He also linked out to Cory Doctorow’s write-up from 2001 of why meta-data is more often meta-crap.


 


I agree in principle with some of these criticisms of big huge dream scenarios.  We are a long way from the day when any file on any website is filled with valuable meta-data in a schema that every PC can understand.  It may well be that we never get there.


 


But that’s okay, because my dream scenarios for WinFS aren’t quite that grandiose.  I’ll be satisfied if WinFS helps me find, relate and act on my information, in a way that makes sense to me.  Bonus points if it helps me find, relate and act on information created by the people with whom I interact most closely: my family, friends and co-workers.


 


I think there are plenty of compelling benefits that open up with even a slight amount of metadata on the files I work with every day, and what’s more, I think that there are ways to mitigate the concerns Cory and others raise about people generally being lying, lazy, stupid self-deceivers (* applies to meta-data only, no promise about helping lying, lazy, stupid, self-deceiving politicians.)


 


My first assertion is that meta-data on a local, individual scale is interesting enough without asking all these questions about how it will scale to the entire Internet (although they’re good questions that we should address over time.)  If I can organize my own personal information in a better way than what I get today with the filesystem, that’s a win.


 


My second assertion is that in many cases, people actually are creating accurate meta-data today.  At Microsoft, for example, every slide deck from PDC is named something like “DATA201 Clark.ppt”, including the session title and speaker’s last name.  Not only that, they are all stored in a folder named, IIRC, “2003-10-27 PDC”.  There’s some interesting, accurate meta-data.  Another example: almost every feature specification at Microsoft uses some sort of template, at the top of which are a bunch of fields like Feature Name, Program Manager, Tester, Developer, Milestone, Review Status, etc.  These fields all tend to be filled out accurately (or at least they start out accurate, and then decay over the years during which the product is actually built.)  The decay is a problem to address, but hey, the meta-data is there.  Every photo on my hard drive has useful metadata built in, whether it’s the timestamp in the EXIF header, the filename (I use the XP photo wizard to get names like “Winter2003Holidays01.jpg”), or the folder path that leads to them (“\photos\family\Thanksgiving in LA”).  My Money2004 file is filled with great meta-data, most of which I didn’t even have to enter, about credit card charges and checks I’ve written.  My calendar in Outlook also is filled with accurate meta-data, including the time, location and subject of almost everything I do (and in many cases, it also has the list of other people who participated in the activity or meeting.)
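As a rough sketch of how much is already recoverable from names alone, the snippet below pulls fields out of a path shaped like the PDC example.  The pattern (a "YYYY-MM-DD EVENT" folder holding "CODE Speaker.ppt" files) is an assumption based on the single example above, the `C:\decks` root is invented, and `parse_deck_path` is a hypothetical helper, not part of WinFS or any Microsoft tool.

```python
import re
from datetime import date
from pathlib import PureWindowsPath

# Assumed conventions, inferred from the one example in the post:
#   folder: "2003-10-27 PDC"     -> ISO date, then event name
#   file:   "DATA201 Clark.ppt"  -> session code, then speaker's last name
FOLDER_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (.+)$")
FILE_RE = re.compile(r"^([A-Z]+\d+) (.+)$")


def parse_deck_path(path):
    """Return the metadata encoded in a deck's folder and file name, or None."""
    p = PureWindowsPath(path)
    folder = FOLDER_RE.match(p.parent.name)
    stem = FILE_RE.match(p.stem)
    if not folder or not stem:
        return None  # the path does not follow the assumed convention
    year, month, day, event = folder.groups()
    return {
        "event": event,
        "date": date(int(year), int(month), int(day)),
        "session": stem.group(1),
        "speaker": stem.group(2),
    }


# Hypothetical location; only the last two path components matter here.
info = parse_deck_path(r"C:\decks\2003-10-27 PDC\DATA201 Clark.ppt")
```

The fragile part is the convention, not the parsing: the moment someone saves a deck as "final v2.ppt" the function returns None, which is exactly the kind of decay a shared, discoverable schema is meant to avoid.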


 


See?  Plenty of handy meta-data, but today it’s pretty much inaccessible to any kind of centralized search and organization tool.  It’s mostly not in headers or OLE doc props; it’s encoded into the filesystem and file streams – because today, that’s pretty much all we’ve got, files and folders.  Sure, we’ve tried to come up with better alternatives, but they haven’t shown any real benefits.  In Eric Newton’s response to Dare’s metacrap posting, he notes that “people didnt use office’s meta data because frankly it wasmt on the beaten path. and frankly most people just simply arent organized and dont care to be organized, until they want to find something.”


 


My third assertion is that even where explicit meta-data isn’t just lying around in the filesystem, meta-data can in some cases be inferred.  In comments to Scoble’s post, Richard Talent writes “The real win of WinFS will be that there are multiple contributors of that metadata: for instance, my address book knows how to spell names of people I know, why should my photo software require me to duplicate that effort?”  Another commenter, Malach, says “your PIM has ‘22 December, Aspen, Skiiing’ and you take a lot of photos in that day, then whatever handles the meta data management side of things should be able to put one and one together and ask you if they equal two”.  I think this is the right direction.
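Malach’s suggestion can be sketched in a few lines.  This is only an illustration: the `CalendarEntry` and `Photo` types and the `suggest_tags` function are made up for the example, the year and file names are arbitrary, and matching purely on calendar date is a simplifying assumption.  Note that it returns suggestions for the user to confirm rather than writing tags itself, which is the “ask you if they equal two” part.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class CalendarEntry:
    day: date
    location: str
    subject: str


@dataclass
class Photo:
    filename: str
    taken: datetime  # e.g. read from the EXIF timestamp


def suggest_tags(photos, calendar):
    """Map each photo's filename to (location, subject) pairs from same-day entries."""
    by_day = {}
    for entry in calendar:
        by_day.setdefault(entry.day, []).append((entry.location, entry.subject))

    suggestions = {}
    for photo in photos:
        matches = by_day.get(photo.taken.date())
        if matches:  # photos with no same-day entry get no suggestion
            suggestions[photo.filename] = matches
    return suggestions


# Sample data is invented for illustration.
calendar = [CalendarEntry(date(2003, 12, 22), "Aspen", "Skiing")]
photos = [
    Photo("IMG_0001.jpg", datetime(2003, 12, 22, 10, 30)),
    Photo("IMG_0002.jpg", datetime(2003, 12, 25, 9, 0)),
]
suggestions = suggest_tags(photos, calendar)
```

Only the first photo picks up the Aspen entry; the second is left alone because nothing on the calendar matches its date.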


 


Okay, so we’re back to the same challenge from Simon and Dare.  Can we come up with scenarios that are so good that they motivate people to accurately capture meta-data into WinFS ahead of time?  And why would meta-data in WinFS be more useful than meta-data encoded into the file system (Simon’s example of the folder called Wedding)?


 


I’ll try to post a few compelling scenarios in the next week.  The foundation for them all, of course, is WinFS.  All my scenarios will take advantage of this new item storage functionality that defines a common place to store data and meta-data, a common, discoverable schema for meta-data, and a data-model that allows you to establish relationships between items.

Comments (21)

  1. The foundation may be WinFS but the answer is in the LH shell and how people will work. Check out http://dotnetjunkies.com/WebLog/mwherman2000/archive/2004/02/17/7372.aspx.

  2. Stephane Rodriguez says:

    "I’ll be satisfied if WinFS helps me find, relate and act on my information, in a way that makes sense to me."

    Usual trap you are falling into. It’s not about information, it’s about MESSAGE. Information is clueless, all you get is kilometers of it. Who freaking cares? What you want is messages, value. The right software/plumbing is about making sure that no information overload occurs. 100% of the internet-centric software falls into that trap.

  3. Stephane Rodriguez says:

    By any chance are you an evangelist?

  4. Only partly related, but I’d like to see the ability to export WinFS items into a sort of container file for storage outside of a WinFS store. Like this I could exchange items with my friends, that are completely tagged with metadata.

    While I don’t expect THEM to fill out the metadata on their items accurately, you can bet that I will. I’ve already enough chaos on my disks, if I can dynamically arrange the stuff, I take the burden to fill the metadata by hand if necessary.

  5. Longhorn and Metadata: The answer is in the shell and how people will work (aka metacrap, meta-creap – two nice searchable metadata values)

  6. theCoach says:

    WinFS creates an incentive for ISVs to include the data they collect in the general or extended schema. Previously there was no coordination between the different vendors. The hope would be that some vertical schema standards emerge that are supported by all vendors in that vertical.

  7. Stephane Rodriguez says:

    "Previously there was no coordination between the different vendors"

    What a lure! How do you coordinate when protocols and file formats are undocumented?

  8. Ray Schraff says:

    Re: Simon Fell’s ‘Wedding’ directory idea.

    Why put metadata on an image instead of merely storing them in a folder??

    How about the case of 10,000,000 check images all with a common docType keyword value of CHECK and each with unique Account Number/Check Number keyword values?

    1-10,000,000 unique sub-folders ???

    I don’t think so…..

  9. kip says:

    One of the many things to be careful about is background meta-data that people don’t realize is stored at all. MS just released a tool to STRIP data from documents, at the same time it is developing new tools to add perhaps un-intuitive and un-explicit data. Is this what we want?

  10. If I have understood how WinFS is supposed to work, I’ll have to say: Trees are not the way to store data. And afaik, WinFS solves this "problem".

    Along the years, we have stored data differently, and hierarchical trees is the most common one. This is the one we find in most file systems (folders and files), the way most websites are organized (also folders with subfolders and articles), and the way most people think.

    But this is not the best way to store data. Anymore, at least. To organize information efficiently, we need a flatter structure. What we see today is that most people just place most of their files on their desktops anyway (probably because this is the default visible folder in most download dialogs, and because they know where to find it. It’s the desktop, after all).

    Instead of having people placing everything on the desktop, making a flat and unstructured organization of things, it would be a lot neater to have them place everything inside a metadata storage, with the same flatness, only now with structure. I believe WinFS may provide this structure.

    Opera’s e-mail client, M2, has this kind of organization now. It makes a hell of a lot more sense than the hierarchical structure in all other e-mail clients. My e-mail is basically in a pool altogether, and is placed with pointers (shortcuts) in different views (compare them to database views, as there is no physical moving of files going on) that I can create. Some views are even created for me, based on the From: header and more.

    With enough metadata (which today is already there, in many formats), WinFS may provide a structure to people’s loads of files which they may benefit greatly from. How this is going to work and look in the GUI, I don’t know, but at least WinFS makes it possible. Yes, plumbing is essential, but WinFS can do a lot without any utility programs, just through Windows Explorer.

  11. Albert Ho [MSFT] says:

    I think Outlook is a perfect example of where searching meta-data becomes a real-world scenario. I, like many people, live out of Outlook. I store my contacts, birthdays, restaurants – I even use journal. I attach a lot of files and do a lot of work around people and dates. Having something like WinFS to be able to store just the relationships between when and whom I do something for would simplify questions like when did I send x to person y? Or how many people work around project x?

    If we could even store the information directly in WinFS then there would be huge potential for other applications outside of Outlook to search and even display other information. I totally like the idea of not having data siloed on a per app basis. It would be great if I could have email or contact information linked in with other office files or even source code 😉

  12. The direction for WinFS should be similar to the iFilter mechanism used by MS Index Server. This way, searching for and using meta data stored behind documents would be file type independent. Wouldn’t that be nice?
