Better Intranet Navigation: Statistically Improbable Phrases


On Amazon.com today, I noticed a feature that David Weinberger mentions in his book but that I don't think is available yet on Amazon's UK site.  Books for which they have electronic access to the text have been scanned and effectively auto-tagged.  These tags then become navigational aids that let you find books on similar topics.  Two sets of tags are listed:

  • Statistically Improbable Phrases are phrases that occur a large number of times in a particular book relative to all books scanned.
  • Capitalised Phrases are things such as names and places mentioned frequently in a book.
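
Amazon doesn't publish how it scores these phrases, so the following is only a rough sketch of the idea as described above: a phrase is "improbable" when it turns up far more often in one document than it does across everything scanned. The function names, the choice of two-word phrases, the `min_count` cut-off and the add-one smoothing are all assumptions made for illustration.

```python
import re
from collections import Counter


def bigrams(text):
    """Split text into lowercase two-word phrases."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def improbable_phrases(doc, corpus, top_n=5, min_count=2):
    """Rank phrases in `doc` by how over-represented they are versus `corpus`.

    `corpus` is assumed to be every scanned document, including `doc` itself.
    This is a simple frequency ratio, not Amazon's actual method.
    """
    doc_counts = Counter(bigrams(doc))
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(bigrams(text))

    doc_total = sum(doc_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1

    scores = {}
    for phrase, count in doc_counts.items():
        if count < min_count:
            continue  # ignore phrases that appear only in passing
        doc_rate = count / doc_total
        # Add-one smoothing keeps the ratio finite for unseen phrases.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = doc_rate / corpus_rate

    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Called with one document and the full list of documents, it returns the phrases that most set that document apart. A real implementation would probably look at longer phrases and need a much larger corpus before the ratios mean anything.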

Going one step up the value chain:

  • Books on Related Topics shows other related books based on their usage of similar Statistically Improbable Phrases.
  • Concordance shows a tag cloud of the 100 most popular words in a book.
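
Both of these can be approximated in a few lines. The sketch below assumes each document already has a set of Statistically Improbable Phrases, ranks other documents by how many phrases they share (using Jaccard overlap, which is my choice rather than anything Amazon has documented), and builds a concordance as a plain word count. The names `related_documents`, `concordance` and `sips_by_doc` are hypothetical.

```python
import re
from collections import Counter


def related_documents(title, sips_by_doc, top_n=3):
    """Return titles of documents sharing phrases with `title`, best match first."""
    target = sips_by_doc[title]
    scored = []
    for other, phrases in sips_by_doc.items():
        if other == title:
            continue
        union = target | phrases
        # Jaccard overlap: shared phrases divided by all phrases in either set.
        overlap = len(target & phrases) / len(union) if union else 0.0
        if overlap > 0:
            scored.append((overlap, other))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored[:top_n]]


def concordance(text, size=100):
    """Return the `size` most frequent words in `text` with their counts."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(size)
```

A real concordance would almost certainly filter out common words such as "the" and "and" first; this version counts everything.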

This got me wondering.  Is there a wide enough distribution of topics in the documents generated in a typical company for this sort of approach to work in the enterprise?  Let's say I navigate to a document, Acme Corp Widget 3000 Marketing Plan, and alongside it I see a list of SIPs.  Widget 3000 is an SIP.  Clicking the link takes me to a page listing other documents containing that term, sites about the topic, discussion lists and so on.  This could be a powerful way of navigating Intranet content without creating complex taxonomies and relying on authors to categorize things correctly.
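
The "click a phrase, see everything that mentions it" step is essentially a lookup from phrase to documents. Here is a minimal sketch, assuming the phrases for each document have already been extracted; `build_phrase_index`, `documents_for` and the sample titles are made up for illustration.

```python
from collections import defaultdict


def build_phrase_index(sips_by_doc):
    """Map each phrase (lowercased) to the set of document titles that carry it."""
    index = defaultdict(set)
    for title, phrases in sips_by_doc.items():
        for phrase in phrases:
            index[phrase.lower()].add(title)
    return index


def documents_for(phrase, index):
    """Return the titles tagged with `phrase`, sorted alphabetically."""
    return sorted(index.get(phrase.lower(), set()))


index = build_phrase_index({
    "Acme Corp Widget 3000 Marketing Plan": {"Widget 3000", "launch plan"},
    "Widget 3000 Pricing Sheet": {"widget 3000", "price tier"},
})
print(documents_for("Widget 3000", index))
```

No author has to pick a category here: the index falls out of whatever phrases the scoring step found.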

I'd love to see us try something like this in Office 14.  What do you think?

Comments (2)
  1. Tim Wragg says:

    Hey Mark,

    I was having this conversation with David Lemphers (MS-AUS) just the other day, mainly about search engine technology on the web and the differences when applying it to the enterprise. Basically, what is relevance to an enterprise? One of his points was to do with statistically improbable phrases, which I feel is fixable with different algorithms.

    I think he’s working on a blog post regarding it but not to steal his thunder, he pointed me to SHOE –

    http://www.cs.umd.edu/projects/plus/SHOE/search/

    a semantic search engine where a user helps set and build the context of search terms.

    Cheers,

    Tim

  2. Have a look at http://iNeedSomebody2tag.com/welcome/en. There is a web experiment on folksonomies and collaborative tagging systems.

    Maybe it is of interest to you.

    Regards,

    Tobias Kowatsch

Comments are closed.
