you say 'potato', she says 'vegetable', I say 'ingredient'

10/27/2005

Metadata is a funny thing. It is one of those topics (subjects?) that gets people to froth at the mouth from time to time. The reason it seems to wind people up so much (at least those in the business of making information navigable) is that we're all coming at it from different perspectives and are trying to achieve different things.

On the one hand, the most powerful search engines out there require no attribution of content by authors to provide relevant results. They work very effectively for certain scenarios, namely 'finding'. As far as these engines are concerned, the content is the metadata. That's it. It solves a lot of problems. Some create categories on the fly (clustering), but no human is required either to attribute the content originally or to sit down and do the categorization of the search results - it is all done algorithmically. It scales beautifully.

On the other hand, we instinctively want to categorize stuff and it is very useful to do so. But there are ways and ways of doing this.

For example, what makes tag-based online services such as Flickr, Del.icio.us and Technorati increasingly popular is that there is no centrally managed taxonomy. We all make it up - individually and collectively - as we go along. This seemingly chaotic approach *works* because at one level the freedom allows us each to categorize the world as we see it - you say 'potato', she says 'vegetable', I say 'ingredient'. We're all right of course, and no single classifying authority could ever account for the infinite array of contexts that exist (and will exist), nor allow powerful real-time emergent phenomena to occur (such as emergent tags - Joshua Porter describes 'emergent tags': "those tags that become more popular over time."). With this bottom-up approach, order truly does come from chaos.

Yet in my current job I need attribution and lots of it. At MSDN and TechNet combined we have literally millions of pages of product documentation content and technical articles, thousands of related events, webcasts, blogcasts, podcasts and screencasts (!) going on each month around the world, countless downloads, hundreds of thousands of pieces of user-created micro-content within the third-party community blogs and sites, and much, much more. Without the attribution of each item, without the discipline of taxonomy and the structured data entry of these things, it would be almost impossible to provide a traditional and useful navigation interface (read: browsable) to them all beyond a search interface, let alone provide dynamically rendered and contextually relevant IAs and personalized experiences (although there are ways around this without the metadata).

So the metadata question itself is all about context. Whether you need it, and how you do it, all depends on what you're trying to do and the problem you're trying to solve.

To sum up my metarant, I'll quote Jon Udell, who hit the proverbial nail on the metaphorical head:

"There's no overarching schema for the metadata that flows through the service network, touching routers, registries, security gateways, databases, and end-user applications. And, in view of its many forms and uses, it's not clear that convergence on a single standard is necessary or even desirable. What is necessary is that within each metadata domain we strike healthy balances between the constraints we apply to metadata vocabularies and the evolutionary freedom we allow them. Across domains, we'll speak the lingua franca of data and metadata, namely XML."

(tags: Flickr del.icio.us Technorati metadata owl rdf semantic tagging web winfs xml)
