Blog Log Analysis

I’ve been keeping a blog for about two months now, and I thought it would be an interesting
exercise to analyse the logs. The blogging application that this site uses (BlogX)
records each blog’s daily hits in a tab-delimited file, so I used Data Transformation
Services to clean the data up a bit and import it into SQL Server, and then
used Analysis Services to create a multidimensional cube that I could manipulate with
Excel. The process worked very smoothly, and avoided the need to purchase a specialised
web reporting tool. I’ll document it more fully at a later stage, but the
information gleaned from the analysis was quite revealing about the current state
of the blogging world:

  • At the moment my blog averages around 40,000 hits per month. I’ve no idea how
    that compares to other blogs out there, but knowing that your blog is read is definitely
    a motivating factor when writing new entries! I suspect that most people stumble across
    this blog because it’s posted on the main GotDotNet
    page; I’m certainly under no illusions that it’s to do with any personal
    fame. Like any other website, one of the biggest challenges of a blog is capturing
    and maintaining traffic to the site. For bloggers without the inherent advantage of
    working for Microsoft, aggregation sites such as PDC
    are probably one of the best ways to spread the word.
  • I’m amused and amazed at how many people have wound up at the blog by means of a Google
    search. Unsurprisingly, searching for “Tim Sneath” brings the blog more or less to
    the top of the results, but I’ve had hits that have come from such bizarre search
    terms as “lossless wma”, “Sitar music that you can listen to on the net”, and “Frank
    Zappa AND Albanian Music”! Approximately 5% of browser hits to the site come via Google;
    other search engines might as well not exist for the traffic they bring.
  • There’s an astonishing variety of blog aggregators and browsing tools in use: I counted
    over 500 distinct user agent strings. Of the aggregators, SharpReader in its various
    versions is the most popular, with a 46% share; Newsgator comes
    next with 23%; NewzCrawler has a 5% share,
    and many others have a smaller share. (Incidentally, 8% of visitors have an empty
    user agent string, a surprisingly high number.) I’m a SharpReader user myself; although
    I’ve never done an exhaustive survey of aggregation tools, I’ve certainly heard good
    things about Newsgator. What’s NewzCrawler like (I’ve not come across it before)?
  • The most popular blog entries have been ADO.NET Tips and Tricks, Mind, and New
    C# Features in Whidbey. The last of the three can be explained by a link from
    Robert’s immensely popular blog, but the other two were a little more unexpected.
    I’ll write more on ADO.NET shortly.
  • Traffic drops by about 20% at the weekend. I was expecting that to be higher, but
    I guess many people leave their computers on permanently, so the aggregators continue
    to poll for new content.
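
For anyone without Analysis Services to hand, the kind of slicing behind the figures above (share by user agent, the weekend dip) can be roughed out in a few lines of Python. This is only a sketch and not the DTS and cube process described in this post: the BlogX log layout isn't shown here, so the four-column order (ISO date, user agent, referrer, hit count) and the function names `load_rows`, `agent_share` and `weekend_drop` are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from datetime import date


def load_rows(text):
    """Parse tab-delimited log text into (date, user_agent, referrer, hits) tuples.

    Assumed (hypothetical) column order: ISO date, user agent, referrer, hit count.
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        day, agent, referrer, hits = fields
        rows.append((date.fromisoformat(day), agent, referrer, int(hits)))
    return rows


def agent_share(rows):
    """Return each user agent's percentage share of total hits."""
    totals = Counter()
    for _, agent, _, hits in rows:
        totals[agent or "(empty)"] += hits
    grand_total = sum(totals.values())
    return {agent: 100.0 * n / grand_total for agent, n in totals.items()}


def weekend_drop(rows):
    """Return the percentage by which average daily hits fall at the weekend."""
    per_day = Counter()
    for day, _, _, hits in rows:
        per_day[day] += hits
    # date.weekday() returns 0 for Monday through 6 for Sunday
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    return 100.0 * (weekday_avg - weekend_avg) / weekday_avg
```

Reading a file with `load_rows(open("hits.log").read())` (the filename is again made up) and passing the result to either function gives the percentages directly. Averaging per day rather than summing matters for the weekend figure, since there are five weekdays to every two weekend days.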

Overall it’s been an intriguing experiment. I look forward to repeating it in a couple
of months to see whether there have been any noticeable changes of trend as weblogging
continues to mature.

Comments (3)

  1. Anonymous says:

    Hi! Is there a general place I could go to work up a solution like yours? I’m trying to analyse my blog, and I like your solution, but I know nothing about datacubes 🙂

  2. Anonymous says:

    Addendum: Have now written up the "HOWTO" for creating the Analysis Services cube at the following location:

  3. Anonymous says:

    Here’s another google search that put your blog on the 1st page – "most popular blogs". That should keep you writing. If you find yourself ever at a loss for something to comment on you can go to the Random Thoughts section of for a collection of rarely, if ever, voiced ideas.