you say ‘potato’, she says ‘vegetable’, I say ‘ingredient’


Metadata is a funny thing. It is one of those topics (subjects?) that gets people to froth at the mouth from time to time.  The reason it seems to wind people up so much (at least those in the business of making information navigable) is that we’re all coming at it from different perspectives and are trying to achieve different things.


On the one hand, the most powerful search engines out there require no attribution of content by authors to provide relevant results. They work very effectively for certain scenarios, namely ‘finding’. As far as these engines are concerned, the content is the metadata. That’s it. It solves a lot of problems. Some create categories on the fly (clustering), but no human is required either to attribute the content originally or to sit down and categorize the search results – it is all done algorithmically. It scales beautifully.


On the other hand,  we instinctively want to categorize stuff and it is very useful to do so. But there are ways and ways of doing this.


For example, what makes tag-based online services such as Flickr, Del.icio.us and Technorati increasingly popular is that there is no centrally managed taxonomy. We all make it up – individually and collectively – as we go along. This seemingly chaotic approach *works* because at one level the freedom allows us each to categorize the world as we see it – you say ‘potato’, she says ‘vegetable’, I say ‘ingredient’. We’re all right, of course, and no single classifying authority could ever account for the infinite array of contexts that exist (and will exist), nor allow powerful real-time emergent phenomena to occur (such as emergent tags – Joshua Porter describes ‘emergent tags’ as “those tags that become more popular over time”). With this bottom-up approach, order truly does come from chaos.
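
To make the bottom-up idea a little more concrete, here is a rough Python sketch of how a tag-based service might count what people call a thing, and how it might spot emergent tags in Porter's sense. Everything in it is made up for illustration: the event data, the `tags_for` and `emergent_tags` names, and the week-by-week comparison are my assumptions, not how Flickr, Del.icio.us or Technorati actually do it.

```python
from collections import Counter

# Hypothetical tagging events: (user, item, tag, week). All values are invented.
EVENTS = [
    ("you", "potato.jpg", "potato", 1),
    ("she", "potato.jpg", "vegetable", 1),
    ("me", "potato.jpg", "ingredient", 1),
    ("bob", "potato.jpg", "potato", 2),
    ("ann", "soup.jpg", "ingredient", 2),
    ("raj", "chips.jpg", "potato", 2),
    ("liz", "mash.jpg", "potato", 2),
]


def tags_for(item, events):
    """Count every tag applied to one item; no central taxonomy is consulted."""
    return Counter(tag for _, tagged_item, tag, _ in events if tagged_item == item)


def emergent_tags(events, week):
    """Return tags used more often in `week` than in the week before it."""
    now = Counter(tag for _, _, tag, w in events if w == week)
    before = Counter(tag for _, _, tag, w in events if w == week - 1)
    # A Counter returns 0 for tags it has never seen, so new tags also qualify.
    return sorted(tag for tag in now if now[tag] > before[tag])


if __name__ == "__main__":
    print(tags_for("potato.jpg", EVENTS))
    print(emergent_tags(EVENTS, week=2))
```

Nobody had to agree on a vocabulary up front: the same picture ends up filed under ‘potato’, ‘vegetable’ and ‘ingredient’ at once, and the popular tags simply fall out of the counts.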


Yet in my current job I need attribution, and lots of it. At MSDN and TechNet combined we have literally millions of pages of product documentation and technical articles; thousands of related events, webcasts, blogcasts, podcasts and screencasts (!) going on each month around the world; countless downloads; hundreds of thousands of pieces of user-created micro-content within the third-party community blogs and sites; and much, much more. Without the attribution of each item, without the discipline of taxonomy and structured data entry, it would be almost impossible to provide a traditional, useful navigation interface (read: browsable) to them all beyond a search interface, let alone dynamically rendered, contextually relevant IAs and personalized experiences (although there are ways around this without the metadata).


So the metadata question itself is all about context. Whether you need it, and how you do it, all depends on what you’re trying to do and the problem you’re trying to solve.


To sum up my metarant, I’ll quote Jon Udell who hit the proverbial nail on the metaphorical head:



“There’s no overarching schema for the metadata that flows through the service network, touching routers, registries, security gateways, databases, and end-user applications. And, in view of its many forms and uses, it’s not clear that convergence on a single standard is necessary or even desirable. What is necessary is that within each metadata domain we strike healthy balances between the constraints we apply to metadata vocabularies and the evolutionary freedom we allow them. Across domains, we’ll speak the lingua franca of data and metadata, namely XML.”


(tags:  Flickr del.icio.us Technorati metadata owl rdf semantic tagging web winfs xml)

Comments (4)

  1. AndrewSeven says:

    On a recent project I had an architect give me only one piece of guidance for the project:

    "No meta data"

    It’s quite startling how much duplication there is in order to not have meta data. :(

  2. Scott Quick says:

    Well presented Alex. See my post on tagging.

  3. TommyW says:

    But couldn’t the overpowering need you have to provide a workable browsable interface be a reflection on the (in)ability for search to work on MSDN?

    That’s been the greatest weakness of the site for a long time — that search doesn’t work any better.

    What if you had a fantastic search engine and you provided a way for users to tag the content themselves? And then a way for users to share their taxonomies with each other? Over time, "authorities" could emerge from the soup that have comprehensive tagging, or that users cluster around. So there would be a "potato" authority and an "ingredient" authority. And maybe this would be a way to approach the problem of different organizational needs for different users: beginners need a very different set of documents and organization of those documents than experts, Web-focused developers want different results than Windows Mobile developers, etc.

    You could seed the process with experts from inside MSDN and Microsoft.

    But get rid of the idea that there’s a single (or even a few) "correct" organizational structures *and* that a few Information Architects inside Microsoft can actually come up with those.

    To summarize:

    (1) Massively improve search on msdn.microsoft.com.

    (2) Build up some personalization infrastructure and allow users to create their own tag structures of the content, and to share those structures with others.

    (3) Seed the tagging process with initial input from experts at Microsoft and/or in the MVP community.

  4. MSDNArchive says:

    Tommy, I’m completely with you on a number of your points.

    re:

    1. Yup.

    2. Yup, although personally I’d rather see us consume and augment tagging services that do the job well rather than create our own, but there you go…

    3. Yup.

    I’d also add 4. open up the MSDN & TechNet content stores – provide APIs to the whole shebang including any functionality we’ve developed on the platform.

    I’d take issue with the idea that you don’t need an IA. I’d agree that there shouldn’t be a *single* one, but you do need ‘one’ at least. Browsing isn’t dead yet :-)