Syndicated content search - still broken

For a few months now, Randy Charles Morin has been tracking and grading the performance of search engines specializing in syndicated (RSS'd or Atomized) content.

These search tools can be invaluable when you are trying to keep track of what customers, partners, influencers and press are saying about your product.

Having customers use your feedback systems is all very well, but however convenient those systems are, they don't give you the whole picture. From a product development / design point of view, hearing what customers say about your existing and planned products means making the effort to reach out and listen in the places where they are already talking.

Increasingly, 'those places' are blogs. 35m+ of them, depending on whose numbers you believe. Needles in haystacks, and all that...

This is where these tools come in, and why they can be so valuable.

From reading through Randy's research and from my own experience, I can only conclude that these syndicated content search services still have plenty of room for improvement. In fact, I'd say there is still room for a new player (or players) to enter and dominate this space.

The engines Randy has been tracking are:

  • Blogdigger
  • Bloglines
  • Blogpulse
  • Google Blogsearch
  • Feedster
  • Pubsub
  • Icerocket
  • Technorati

For the last few months I've been running the same query ("ado.net") on these engines. The frustrating thing is that one week Icerocket might do better than Google Blogsearch. Then the next week Technorati seems best. And the next week Bloglines. None of them makes any significant (apparent) improvement overall. This space is broken.

It means that to track what I want, I have to subscribe to search results from several engines to get the complete picture. Lots of de-duping. And yet there seems to be no rhyme or reason to all this variation in performance.
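To give an idea of what that merging involves, here is a minimal Python sketch. It assumes each engine's results have already been parsed into a list of items with a title and a link. The function names, the dictionary layout and the URL normalisation rules (ignoring the scheme, a trailing slash, the fragment and any utm_ tracking parameters) are assumptions made for illustration, not something any of these engines provide.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_link(url):
    """Reduce a URL to a comparison key (hypothetical rules, tune to taste)."""
    parts = urlsplit(url.strip())
    # Drop utm_* tracking parameters that may be appended to result links.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.lower().startswith("utm_")]
    # Ignore scheme, fragment and trailing slash so near-identical links match.
    path = parts.path.rstrip("/")
    return urlunsplit(("", parts.netloc.lower(), path, urlencode(query), ""))


def merge_results(results_by_engine):
    """Merge per-engine result lists, keeping one entry per distinct post.

    results_by_engine maps an engine name to a list of dicts, each with
    "title" and "link" keys (an assumed layout). Every merged entry records
    which engines returned that post.
    """
    merged = {}
    for engine, items in results_by_engine.items():
        for item in items:
            key = normalize_link(item["link"])
            entry = merged.setdefault(
                key,
                {"title": item["title"], "link": item["link"], "engines": []},
            )
            if engine not in entry["engines"]:
                entry["engines"].append(engine)
    return list(merged.values())
```

Calling merge_results with one list per engine returns a single list in which each post appears once, along with the engines that found it. That also makes it easy to see which posts only one engine picked up.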

This isn't a short-term phenomenon. I've tried them all myself (some for over two years) in a number of contexts and, like Randy, I've found their results variable at best. Sadly, most of them perform poorly most of the time.

So what is 'performance' in this space? The attributes I consider most important (listed in order of priority) are:

  • Completeness / size of index
  • Time taken for items (from publishing) to be included within index (i.e. minutes, not days)
  • Consistency of service performance over time
  • Ability to order results by content type (e.g. blogs, 'news', forums)
  • Ability to order results by date and relevancy
  • Low spam pollution

You can judge for yourself - here are the results for "ado.net":

As you can see, the results are all over the shop. Track them for a few days and weeks and you'll see the pattern - broken, broken, broken.

Today, it seems Bloglines provided the best results - today. Tomorrow? Who knows... Please, someone win here.