Syndicated content search – still broken

For a few months now, Randy Charles Morin has been tracking and grading the performance of search engines specializing in syndicated (RSS’d or Atomized) content.

These search tools can be invaluable when you are trying to keep track of what customers, partners, influencers and press are saying about your product.

Having customers use your own feedback systems is all very well, but however convenient those systems are, they don’t give you the whole picture. From a product development and design point of view, hearing what customers say about your existing and planned products means making the effort to reach out and listen to them where they are actually talking.

Increasingly, ‘those places’ are blogs: 35 million+ of them, depending on whose numbers you believe. Needles in haystacks, and all that…

This is where these tools come in, and why they can be so valuable.

From reading through Randy’s research, and from my own experience, I can only conclude that these syndicated content search services still have plenty of room for improvement. In fact, I’d say there is still room for a new player (or players) to enter and dominate this space.

The engines Randy has been tracking are:

  • Blogdigger
  • Bloglines
  • Blogpulse
  • Google Blogsearch
  • Feedster
  • Pubsub
  • Icerocket
  • Technorati

For the last few months I’ve been running the same query (“”) on these engines. The frustrating thing is that one week Icerocket might do better than Google Blogsearch, the next week Technorati seems best, and the week after that it’s Bloglines, without any of them making significant (apparent) improvements overall. This space is broken.

It means that to track what I want, I have to subscribe to the results from several engines to get the complete picture, which means lots of de-duping. And there seems to be no rhyme or reason to all this variation in performance.
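The merge-and-de-dupe chore can be sketched roughly as follows. This is a minimal illustration, not how any of these engines or feed readers actually work: it assumes each engine’s results have already been parsed out of the subscribed feeds into simple entries, and the `Entry` type, `normalize_link` and `merge_results` are hypothetical names. Two hits count as duplicates when their links match after light normalization (lower-cased scheme and host, no fragment, no trailing slash).

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Entry:
    """One search hit taken from an engine's results feed (hypothetical shape)."""
    link: str
    title: str
    published: datetime
    engine: str


def normalize_link(link: str) -> str:
    """Canonicalize a URL so trivially different copies compare equal.

    Lower-cases the scheme and host, drops the #fragment, and strips a
    trailing slash from the path. The query string is kept, since some
    blogs identify posts by it (e.g. ?p=123).
    """
    parts = urlsplit(link.strip())
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def merge_results(*feeds: list[Entry]) -> list[Entry]:
    """Merge several engines' result lists, dropping duplicate links.

    The first occurrence of each normalized link wins, so pass the feeds
    in order of preference. The merged list is returned newest first.
    """
    seen: dict[str, Entry] = {}
    for feed in feeds:
        for entry in feed:
            key = normalize_link(entry.link)
            if key not in seen:
                seen[key] = entry
    return sorted(seen.values(), key=lambda e: e.published, reverse=True)


if __name__ == "__main__":
    # Illustrative data only; not real search results.
    bloglines = [Entry("http://example.com/post/1/", "Post 1",
                       datetime(2006, 8, 1), "Bloglines")]
    technorati = [
        Entry("HTTP://EXAMPLE.COM/post/1#comments", "Post 1",
              datetime(2006, 8, 1), "Technorati"),
        Entry("http://example.org/other", "Other",
              datetime(2006, 8, 2), "Technorati"),
    ]
    for e in merge_results(bloglines, technorati):
        print(e.engine, e.link)
```

Link matching is deliberately crude here; the same post syndicated under different URLs (feed proxies, permalink vs. archive page) would still slip through, which is part of why this chore is so tedious by hand.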

This isn’t a short-term phenomenon. I’ve tried them all myself (some for over two years) in a number of contexts, and like Randy, I’ve found their results variable at best; sadly, most of them perform poorly most of the time.

So what is ‘performance’ in this space? The attributes I consider most important, in order of priority, are:

  • Completeness / size of index
  • Time from an item’s publication to its inclusion in the index (i.e. minutes, not days)
  • Consistency of service performance over time
  • Order results by content type (e.g. blogs, ‘news’, forums)
  • Order results by date and relevancy
  • Low spam pollution
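The first two attributes can be measured rather than just felt. A rough sketch, assuming you log yourself when each tracked item was published and when it first appeared in each engine’s results (the engines do not report this; `indexing_lag_summary` and the observation tuples are hypothetical). Items that never showed up are counted as misses, a crude proxy for index completeness.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Iterable, Optional


def indexing_lag_summary(
    observations: Iterable[tuple[str, datetime, Optional[datetime]]],
) -> dict[str, dict[str, object]]:
    """Summarize how quickly, and how completely, each engine indexes items.

    Each observation is (engine, published, first_seen); first_seen is None
    if the item never appeared in that engine's results.
    """
    lags: dict[str, list[timedelta]] = {}
    missed: dict[str, int] = {}
    for engine, published, first_seen in observations:
        lags.setdefault(engine, [])
        missed.setdefault(engine, 0)
        if first_seen is None:
            missed[engine] += 1
        else:
            lags[engine].append(first_seen - published)

    summary: dict[str, dict[str, object]] = {}
    for engine, engine_lags in lags.items():
        summary[engine] = {
            "indexed": len(engine_lags),
            "missed": missed[engine],
            # median() works on timedelta values; None if nothing was indexed.
            "median_lag": median(engine_lags) if engine_lags else None,
            "max_lag": max(engine_lags) if engine_lags else None,
        }
    return summary
```

Tracking the median alongside the worst case matters for the third attribute: an engine whose median is minutes but whose worst case is days is exactly the week-to-week inconsistency described above.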

You can judge for yourself. Here are the results for “”:

As you can see, most of the results are all over the shop. Track them for a few days and weeks and you’ll see the pattern – broken, broken, broken.

Today, it seems Bloglines provides the best results – today. Tomorrow? Who knows… Please, someone win here.

Comments (7)

  1. Eric Allam says:

    Alex, I completely agree. Whenever I am writing a new blog post about something, I want to know what everyone else is saying about it, so I do searches. Notice the plural there: Icerocket, Technorati, Google Blog Search, Feedster. I have to search all four before I am comfortable believing I have found everything I need, and even then I feel like something is missing. Technorati seems to be in the best position to dominate, but they need to work on their search algorithm as well as their results display. I also like how Icerocket shows you the day each entry was posted, which covers your fifth bullet point above. I wouldn’t be surprised if we saw a new player very soon. What is Microsoft doing to win?

  2. – not perfect, but we’re working on it. Our fetcher is built on .NET, BTW.

  3. BillyG says:

    Technorati sucks, period.

    My posts used to show up on Techno immediately.

    I just checked, and the most recent entry for my http://billy-girlardo/delicious/ daily blog posting is from 12 days ago.

    Maybe it has something to do with the McAfee SiteAdvisor button not evaluating my sites, although their website says I’m fine (but who the heck goes to their site?).

    Or maybe it’s because Google says its last crawl of my sites came away with Not Found errors (thanks Yahoo!), even though the same sitemap.xml has been in place for months with no prior problem!

    Sorry to rant; I’m just tired of the multiple unanswered emails to those damn techno people. They suck, and now their site is catering to the MySpace crowd, making it even worse.

    There has to be a better way (thank God I don’t rely on or have any advertising lol)!

  4. A few days ago I grumbled at the poor state of the search engines specializing in syndicated (RSS’d or…

  5. There’s one thing to keep in mind: the ‘.’ character is OFTEN used as a terminator / word-break character in search engines. When Microsoft named .NET and things like … I remember thinking to myself “Oh crap” for just this reason. This is a very common problem: marketers naming things something that just can’t be searched.

    Before condemning search engines in general, I’d try searching for a simple term that you know is indexed.

    One of the reasons I named my startup Ookles was because it was highly searchable.  It wasn’t the only reason but you can bet I considered it for every name we looked at.