Blog Log Analysis

I've been keeping a blog for about two months now, and I thought it would be an interesting
exercise to do some analysis of the logs. The blogging application that this site
uses (BlogX) records the daily hits each blog gets into a tab-delimited file, so I used
Data Transformation
Services to clean the data up a bit and import it into SQL Server, and then finally
used Analysis Services to create a multidimensional cube that I could manipulate with
Excel. This process worked very smoothly, and saved the need to purchase a specialised
web reporting tool. I'll document this process more fully at a later stage, but the
information gleaned from the analysis was quite revealing about the current status
of the blogging world:

  • At the moment my blog averages around 40,000 hits per month. I've no idea how that
    compares to other blogs out there, but knowing that your blog is read is definitely
    a motivating factor when writing new entries! I suspect that most people stumble
    across this blog because it's posted on the main GotDotNet blogs page; I'm certainly
    under no illusions that it's to do with any personal fame. Like any other website,
    one of the biggest challenges of a blog is capturing and maintaining traffic to the
    site. For bloggers without the inherent advantage of working for Microsoft, aggregation
    sites such as PDC Bloggers are probably one of the best ways to spread the word.

  • I'm amused and amazed at how many people have wound up at the blog by means of a Google
    search. Unsurprisingly, searching for "Tim Sneath" brings the blog more or less to
    the top of the results, but I've had hits that have come from such bizarre search
    terms as "lossless wma", "Sitar music that you can listen to on the net", and "Frank
    Zappa AND Albanian Music"! Approximately 5% of browser hits to the site come via Google;
    other search engines might as well not exist for the traffic they bring.
  • There's an astonishing variety of blog aggregators and browsing tools in use: I counted
    over 500 distinct user agent strings. Of the aggregators, variants of SharpReader are
    the most popular, with a 46% share; Newsgator comes
    next with 23%; NewzCrawler has a 5% share,
    and many others have a smaller share. (Incidentally, 8% of visitors have an empty
    user agent string, a surprisingly high number.) I'm a SharpReader user myself; although
    I've never done an exhaustive survey of aggregation tools, I've certainly heard good
    things about Newsgator. What's NewzCrawler like (I've not come across it before)?
  • The most popular blog entries have been ADO.NET Tips and Tricks, Mind Mapping
    and New C# Features in Whidbey. The last of the three can be explained by a link
    from Robert Scoble's immensely popular blog, but the other two were a little more
    unexpected. I'll write more on ADO.NET shortly.
  • Traffic drops by about 20% at the weekend. I was expecting a bigger drop, but
    I guess many people leave their computers on permanently, so the aggregators continue
    to poll for new content.
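
For readers without Analysis Services to hand, the same sort of figures (aggregator share and the weekend dip) can be roughed out in a few lines of Python. This is only a sketch and not the DTS and cube process described above: the three-column layout (date, entry path, user agent), the sample rows, and every function name here are assumptions made for illustration, not BlogX's actual log format.

```python
import csv
import io
from collections import Counter
from datetime import date

# Made-up sample rows in an ASSUMED layout: ISO date <TAB> entry path <TAB> user agent.
# 2003-11-03 is a Monday and 2003-11-08 is a Saturday.
SAMPLE_LOG = (
    "2003-11-03\t/adonet-tips\tSharpReader/0.9.3.1\n"
    "2003-11-03\t/adonet-tips\tSharpReader/0.9.2.0\n"
    "2003-11-03\t/mind-mapping\tNewsGator/1.3\n"
    "2003-11-03\t/whidbey\tMozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\n"
    "2003-11-03\t/whidbey\t\n"
    "2003-11-08\t/adonet-tips\tSharpReader/0.9.3.1\n"
    "2003-11-08\t/mind-mapping\tSharpReader/0.9.3.1\n"
    "2003-11-08\t/whidbey\tNewsGator/1.3\n"
    "2003-11-08\t/whidbey\tNewzCrawler/1.5\n"
)


def load_hits(stream):
    """Yield (date, path, user_agent) tuples from a tab-delimited stream."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines rather than guess
        day, path, agent = row
        yield date.fromisoformat(day), path, agent.strip()


def agent_family(agent):
    """Collapse version variants, e.g. 'SharpReader/0.9.3.1' -> 'SharpReader'."""
    if not agent:
        return "(empty)"
    return agent.split("/", 1)[0].split(" ", 1)[0]


def agent_shares(hits):
    """Return each user agent family's share of all hits, as a percentage."""
    counts = Counter(agent_family(agent) for _, _, agent in hits)
    total = sum(counts.values())
    return {family: 100.0 * n / total for family, n in counts.items()}


def weekend_drop(hits):
    """Percentage by which average daily weekend hits fall below weekdays."""
    per_day = Counter(day for day, _, _ in hits)
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    if not weekday or not weekend:
        return None  # not enough data to compare
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    return 100.0 * (1 - weekend_avg / weekday_avg)


if __name__ == "__main__":
    hits = list(load_hits(io.StringIO(SAMPLE_LOG)))
    for family, share in sorted(agent_shares(hits).items(), key=lambda kv: -kv[1]):
        print(f"{family:12s} {share:5.1f}%")
    print(f"Weekend drop: {weekend_drop(hits):.1f}%")
```

Grouping on the text before the first `/` or space is a crude way of folding version variants together; a real log with 500-odd distinct user agent strings would almost certainly need a hand-maintained mapping as well.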

Overall it's been an intriguing experiment. I look forward to repeating it in a couple
of months to see whether there have been any noticeable changes of trend as weblogging
continues to mature.