How RSS thickened my Long Tail


I did an experiment this morning. 
I analyzed my blog traffic/views using all the data since December
2003. For some reason I haven’t done this before.

I wanted to see how closely my page view traffic resembled the Long Tail.  I got more than I bargained for.

This is roughly what the classic Long Tail shape looks like:


Figure 1, The classic Long Tail

My Long Tail

I took the page view data of all 535
posts and sorted by page views (PVs – over 3.4 million total since
December 2003) in ascending order.  This is what I saw:

Figure 2, My classic Long Tail by PVs

And there it is, a Long Tail
if ever I’ve seen one – a small number of posts taking a
disproportionately large percentage of the total page views, curving
off dramatically with a gradual tailing off and the rest getting
fringe-type attention. The top 10% of the posts equate to 24% of total
PVs. The top 20% of the posts take up 36% of total PVs.
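
Share figures like these can be computed in a few lines of Python. This is a minimal sketch, assuming the per-post page view counts are available as a plain list; the function name `top_share` and the sample numbers are hypothetical and are not the blog's real data.

```python
def top_share(page_views, fraction):
    """Return the share of total views taken by the top `fraction` of posts."""
    if not page_views:
        raise ValueError("page_views must not be empty")
    ranked = sorted(page_views, reverse=True)  # most-viewed posts first
    # Number of posts in the top slice; always at least one post.
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical page view counts for ten posts (illustration only).
sample_pvs = [5000, 2400, 1200, 800, 600, 400, 300, 200, 60, 40]

print(f"Top 10% of posts: {top_share(sample_pvs, 0.10):.0%} of PVs")
print(f"Top 20% of posts: {top_share(sample_pvs, 0.20):.0%} of PVs")
```

Running the same function over the real list of 535 per-post totals would be one way to check the 24% and 36% figures quoted above.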

Quite uncanny, almost to the point of
obviousness in this morning’s prediction. But it could have been
different. In fact if I also look at the RSS views (see below) the Tail
could be thinner or thicker or longer or shorter.

PVs of Latency

The following chart sorts the posts by
date and shows PVs for each post, the earliest first (Dec 2003 on the
left), covering around 22 months of data at the time of writing. Now at
first sight it might appear that my blog is getting worse over time as
the PVs in the latter 7 months are averaging roughly half those of
the first 15 months.  In fact what we’re seeing is the effect time
has on post PVs.  Today I’m getting more PVs in the first week of
a post being live than I did a year ago, at least 6 times as many (I record my
monthly stats at the end of each month). So at least half the volume of
the PVs on the chart were served 15 months after they were posted.
Latency in action.

Figure 3, PVs over time, earliest post on left

Where are the PVs coming from? Search
engines, Google in the main.  At least 80% of the traffic I get to
posts after 3 months comes via the search engines that match niche
content with niche interests: the Filters of the web.
The rest of the traffic after 3 months comes from referrers on other
blogs and online resources (articles, guides, lists of useful links on
a subject, etc.).

The RSS Effect

Here is the same chart but with RSS
views overlaid (Figure 4).  RSS views are the number of times a
post (item) is viewed via RSS (obviously).  Some of the more
immediate PVs after posting will be click-thrus from the RSS feed for
about 2 weeks after the post has gone live, but I suspect there is some
double-counting here in Unique Users (UUs – not shown) as one user
might see both a post on the blog and in a reader/aggregator.

Total RSS views is 3.2 million (red
line), almost exactly matching the 3.4m PVs.  Notice the long term
RSS views trend/average (up, black lines) is inverted compared to what we
saw with PVs (down, but will go up over time).

RSS has effectively doubled my ‘traffic’.

Figure 4, PVs (blue) and RSS view (red line and black line average) over time, earliest post left.

In the first half of the chart (the
first 11 months), RSS views pretty much track PVs, but you can see the
delta on the second half of the chart: RSS is double the views of PVs.
It’ll be interesting to see how this trend develops over the next year,
but it seems clear that RSS is making its impact on my blog use at
least.

RSS searching and filtering services such as Technorati, Ice Rocket, and most recently MSN
will make a greater and greater impact on traffic patterns as their
usage increases.  As we’ve seen with traditional web search, the
traditional web UI is no longer the primary interface to much of the
content on the web. Users are finding content, not browsing for it.
They are beginning to subscribe, not ‘visit’.  Navigating the web
is becoming a less structured affair (of course, traditional interfaces
such as IAs, site maps and navigational systems will always be with us, for a while anyway).

My blog as it turns out has a Long Tail,
and that is not a bad thing – people are finding the content as they
need it.  But if I were in the business of ‘moneterizing’ my
content (my spell checker has just had kittens) I’d want to thicken and
shorten the tail, increase volume of PVs, increase distribution and get
better returns on the investments made for content development /
acquisition.

The following is a comparison of three
possible charts of my blog’s PVs (Figure 5). The bottom left ‘Actual’ is
the PV data we looked at above in Figure 2; this is a fairly classic
Long Tail shape…I’m rather proud of it :-).  Stating the obvious
here somewhat, but what I’d really want is for the curve to look like the
‘Web 2.0 ultra effect’ curve on the far right: squashed upwards with a
Short Tail and as thick as possible.  This in fact is a Long Tail
rotated 80 degrees anti-clockwise, then mirror-flipped and thickened with
volume. Very little wastage in terms of utilization.

Figure 5, The Long Tail and the Web 2.0 Effect

The middle chart above ‘Web 2.0 effect’
(Figure 5 still) is closer to what Web 2.0 can achieve, and has in
my case due to RSS.

Below is my Long Thick Tail charted as stacking RSS views on top of PVs (Figure 6):

Figure 6, The Long Thick Tail, thickened by RSS

This is no longer a classic Long Tail
–  it is thicker and RSS constitutes half the total volume of all
views.  This is the Long Tail with Web 2.0 effect or The Long
Thick Tail.  The content on my blog has been set free by RSS and
the feed search engines, unconstrained by the tedium of a navigation
system.  These Web 2.0 technologies are amplifying the value of my
content by getting more eyeballs to it.  They are literally
creating new value.
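
The stacking behind Figure 6 is simple arithmetic: each post's total is its PVs plus its RSS views. The following is a minimal sketch of that sum and of the RSS share of all views, using the rounded totals quoted earlier in this post. The function names `stack_views` and `rss_share` are hypothetical, chosen for illustration.

```python
def stack_views(pvs, rss_views):
    """Return per-post total views (PVs + RSS views), as in a stacked chart."""
    if len(pvs) != len(rss_views):
        raise ValueError("pvs and rss_views must cover the same posts")
    return [pv + rss for pv, rss in zip(pvs, rss_views)]


def rss_share(total_pvs, total_rss):
    """Return RSS views as a fraction of all views (PVs + RSS views)."""
    return total_rss / (total_pvs + total_rss)


# Rounded totals quoted in the post: about 3.4 million PVs, 3.2 million RSS views.
share = rss_share(3_400_000, 3_200_000)
print(f"RSS share of all views: {share:.1%}")
```

With those rounded totals the RSS share comes out a little under half, which is consistent with RSS making up roughly half of the total volume.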

On this subject, Joshua Porter has written a short but sweet post
(unlike this one!) that has propelled my own thinking on the connected
concepts of the Long Tail and Web 2.0. It is why I did the analysis, so
thank you Josh. He puts it this way:

“I see lots of similarities between
the Long Tail and Web 2.0. Both ideas are about improved access to
previously unavailable content, both are about showing the whole
catalog, and both are ultimately great at enabling user choice. They
seem to overlap a lot. If I had to make a marked distinction between
them I would say that Web 2.0 is about the access to information while
the Long Tail is about the economics of it all.”

Agreed.

Opening Up

The BBC get this connection, the relationship between the Long Tail and Web 2.0. At least they say they do: although they’ve not released any APIs yet, they understand what’s coming and are preparing for the remixed, mashed up, open and recombinant web enabled by web services, APIs, and RSS (buzzword bingo Alex, 10 out of 10).

Figure 7 is my attempt to visualize all
this.  Content gets created, stored and made available through
traditional UIs and then freed through RSS, web services/APIs for
others to amplify and create new value, who in turn create more UIs,
more RSS feeds, web services and APIs and value, and so on.

Figure 7, Web 2.0 enabled content via services/APIs & RSS creates amplification and new value

“Power in the Web 2.0 comes not from
controlling the whole system, but in controlling the connections in a
larger network of systems. It is the power of those who create not open
systems, but semi-open systems, the power of API writers, network builders and standards definers.”

I think William Blaze
is right here about how open a system can be without diminishing the
value it is there to create.  So while I partially agree with
Richard MacManus’ philosophy of ‘letting go of control and freeing your data’ (to paraphrase), it comes down to staying in touch with reality (not just capitalist reality; this applies to the gift and attention economies too)…any philosophy that fails to take account of reality becomes academic at best and not practicable.

I suggest a working philosophy is really ‘let go of enough control and free most of the data’.

Web 2.0, the Long Tail and Business Models

An example of this philosophy at work is
eBay.  There are certain parts of its business (process &
content/data) that it will need to keep proprietary and ‘own’ to
remain competitive. An obvious example is the customer data it has
acquired. This is an ultra-high value asset to eBay (of course the
privacy law constraints are another good reason for not opening this
data up).  This is the kind of data it wants to keep under
lock and key, not open.  The data it does want to and does share
are category listings, product details, pricing data, bidding counts,
etc., that make up the core of the value eBay has to offer
third parties.  Those third parties can then create new interfaces
(UIs and programmatic) and recombine eBay’s data with other data to create new value.

The user interface on eBay.com
is the primary customer interface for its 56 million customers, but is
one of many used to interact with the powerhouse.  These other
interfaces are distributed under its Affiliate program,
including the APIs and services (such as Paid Search).  This is the Web 2.0
enablement of eBay’s content, data and services, and it is helping
make their Long Tail thicker.  The Long Tail in this case is the 10,000 affiliates who participate in the distributed ecommerce program and are growing revenues along with eBay through a variety of business models.

Now, there’s nothing new here:
the pure-play does this instinctively. This type of distributed
business model is what it means to be a successful ecommerce business
and eBay, Amazon and others have been doing it for years.

When the Long Tail and Web 2.0 meet Flat World economics we get players emerging like AllofMP3.com,
a Russian online music service providing pretty much any file type you
like of pretty much any music out there.   The service
has already
caught the attention of the RIAA
and I’m not sure how long they might last.  At $1.50 a CD (you pay
by the meg, if you want high bit rate, you pay more, like 50c more) and
DRM-less you’re playing by Russia’s economics where CDs cost $3. User
imported digitally, cheap but legal.  As well as the web UI, there
is a download client with searchable local catalog database and with
integrated web UI for payments and account management
transactions.  I have no idea how much they are selling but I’m
pretty sure they are raking it in.  I’m actually spending more on
CDs than ever, I don’t think I’m alone in thickening up their Long Tail.

Again nothing new, been around a while.

What is new are the gazillion remix
opportunities that are emerging for everyone else, not just
the well funded players. Through insights provided by people like
Chris Anderson on the nature of the network economy, the Long Tail economic theory throws new light on old problems. Combined with the direction that the web is going in, known as Web 2.0, it means there are some very interesting and good times ahead.

(PS, Let me know if you’ve got Long Tails of your own to share, am especially interested in RSS-related Tails, thanks.)

(Trackbacks are knackered again. Here are some blogs pointing to this post).

Comments (16)

  1. Link: Alex Barnett blog : How RSS thickened my Long Tail. Great analysis from Alex.

  2. It’s quite likely that your pageviews follow a Zipf distribution with classic long tail usage, since most websites have worked this way since at least 1996 (the first time I analyzed such data).

    However, it would be easier to evaluate your data if you plotted the data on log-log diagrams (i.e., logarithmic scales for both x and y axes).

    See my essay on <a href="http://www.useit.com/alertbox/zipf.html">Zipf Curves and Website Popularity</a> for sample charts.

    Basically, if the data shows as a straight line on log-log plots, then you have the expected distribution. If the curve droops on either end, then something else is going on. (See example at the bottom of the above reference with a plot from a site that had 10,000 pages in 1996 and needed 200,000 to fully meet the long tail requirements. By now, I think this site is fully compliant, but I don’t have its recent data.)

    A second question: what do you mean by "RSS views"? Is this number of times a page was *seen* by an actual human or is it simply the number of times it was downloaded by RSS software? It needs to be user-activated clickthroughs to be comparable with traditional pageviews.

  3. MSDNArchive says:

    Thanks Jakob and Richard.

    Jakob – I’ll study your work on this as you suggest and will try and re-analyze – see what comes up.

    Alex.

    (btw Jakob – I’ll look up the essay and re-run the numbers as you suggest. On your question of the definition of ‘RSS views’, this is defined by the number of times each item was viewed (NOT how many times the RSS file has been pinged – this is a different number I did not analyse).

    You say "Is this number of times a page was *seen* by an actual human or is it simply the number of times it was downloaded by RSS software? It needs to be user-activated clickthroughs to be comparable with traditional pageviews". I respectfully disagree with you on this point: RSS clickthroughs would then also count in the PVs log, because the log would register each clickthrough as a PV. I’m interested in the number of times each item was viewed, so PVs + RSS views = total views per post/item. RSS-generated clickthroughs, although a good metric to measure, are beside the point of my post).

  4. Chris Anderson, editor-in-chief of Wired Magazine, who wrote the original The Long Tail article, has…

  5. James Dutton says:

    Great article – and very interesting.

    I know this is slightly off topic, but after reading it I decided that maybe this knowledge could be applied in a web analytics context. So I set out to see if the patterns seen in the long tail model (when using a double log scale) had any bearing on things we already knew about a website.

    I tested two sites: one with high customer satisfaction and one with known usability problems. The results were interesting, and although the approach was maybe not entirely scientific, it has got me thinking much more about this.

    http://slicecast.com/slices/2005/9/21/using-zipfs-law-to-forecast-website-usability.html

    I’m still relatively new to RSS, but am working with Web Analytics vendors to try and figure out best practices for defining clear RSS metrics.

  6. Just days after announcing its RSS news, now eBay has confirmed that it is dropping their…

  7. Mike Levin says:

    Brilliant article, Alex. Although it’s not specifically about using RSS for the purpose, my company built a tool that helps you thicken your long tail. In the spirit of building hits in the tail, it’s called <a href="http://www.hittail.com">HitTail</a>. And I like to think of the process as HitTailing.

  8. MSDNArchive says:

    thanks Mike. Will check out.

  9. Mike Levin says:

    Thanks, Alex. It looks like your system here auto-links, and I messed up the link. Anyway, it’s http://www.hittail.com I’m putting thought into how to let people similarly visualize their tail.