Tag Clouds

Last week I think I started to suffer from "Blog widget envy". Everywhere I looked there were location widgets, tag cloud widgets, "Today I'm feeling ...." widgets, "Listening to Genesis, Invisible Touch" widgets. Okay, so I was less envious of the latter ones but I liked the idea of a tag cloud to smarten up my blog. I searched around and found a few implementations (I even found an animated "see how your tags have developed over time" cloud). I'm sure there was one that did what I wanted. And I'm sure someone will tell me that in response to this blog entry...

Anyway, I decided to knock something up for a couple of reasons. (1) I don't have much (any) control over my blog host. I'm hosted on blogs.msdn.com so I can't modify or add pages. All I can do is play with the customisation Community Server offers its users (and that's pretty good, but it won't achieve what I need here). (2) I thought it might be a fun challenge...

About the only place I can plug anything in is the "News" area on the sidebar, so that was my target. My next thought was to write some JavaScript to do it, but that was going to be problematic for a couple of reasons: my JavaScript's not that good, and for security reasons the script would be restricted to calling services on the host domain, where I can't create an endpoint because I don't own the domain. So all round it was going to be a lot easier to create something in ASP.NET and take the hard work out of it. I planned to use the API exposed by Technorati (so I think I'll start tagging all my posts and move away from categories).

My first "cloud" was implemented as an aspx page which I just hosted in an <iframe> element on my blog page. A call to the blogposttags query yields a result in the following format:

<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0 /blogposttags" -->
<!DOCTYPE tapi PUBLIC 
    "-//Technorati, Inc.//DTD TAPI 0.02//EN" 
    "https://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
  <document>
    <result>
      <querycount>8</querycount>
    </result>
    <item>
      <tag>Format</tag>
      <posts>2</posts>
    </item>
    <item>
      <tag>Plugin</tag>
      <posts>2</posts>
    </item>
    <item>
      <tag>Source Code</tag>
      <posts>2</posts>
    </item>
  </document>
</tapi>

So it's pretty straightforward to parse this XML, get the querycount and then iterate through each tag getting the name and the "popularity". In order to render this as html I simply used the <em> tag to emphasize the more popular tags and came up with a "normalising" function that ensures that even if one tag has 1000 posts and the others have one or two, the scaling is kept within limits.
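A rough sketch of that parse-and-normalise step, written in Python rather than the ASP.NET code the page actually used. The function names, the log-based scaling and the `max_level` parameter are all assumptions for illustration; the original "normalising" function isn't shown, so this is just one way to keep a runaway tag within limits.

```python
import math
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response shown above.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<tapi version="1.0">
  <document>
    <result><querycount>8</querycount></result>
    <item><tag>Format</tag><posts>2</posts></item>
    <item><tag>Plugin</tag><posts>2</posts></item>
    <item><tag>Source Code</tag><posts>2</posts></item>
  </document>
</tapi>"""


def parse_tags(xml_bytes):
    """Return (querycount, [(tag, posts), ...]) from a blogposttags response."""
    root = ET.fromstring(xml_bytes)
    document = root.find("document")
    query_count = int(document.findtext("result/querycount"))
    tags = [(item.findtext("tag"), int(item.findtext("posts")))
            for item in document.findall("item")]
    return query_count, tags


def normalise(posts, max_posts, max_level=4):
    """Hypothetical scaling: map a post count to an emphasis level 0..max_level.

    The log scale compresses large gaps between counts, and the clamp
    keeps the result inside the allowed range.
    """
    if max_posts <= 1:
        return 0
    ratio = math.log(max(posts, 1)) / math.log(max_posts)
    return min(max_level, round(ratio * max_level))


query_count, tags = parse_tags(SAMPLE)
busiest = max(posts for _, posts in tags)
levels = {tag: normalise(posts, busiest) for tag, posts in tags}
```

With a log scale, a tag with 30 posts next to one with 1000 still lands mid-range instead of at the bottom, which is the sort of behaviour the "kept within limits" idea is after.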

Add a bit of CSS to match the background to the blog page and set the font-size of the <em> element to 110%, remove the border from the iframe element, and very quickly I had something that looked like a tag cloud. No links at this stage though.
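One way the <em> plus 110% combination could produce a range of sizes is by nesting: each extra <em> grows the text by a further 10%. That nesting is an assumption, as are the helper names and the `font-style` rule below; the sketch is in Python purely for illustration.

```python
from html import escape

# Assumed stylesheet: each nesting level grows the text by a further 10%.
CLOUD_CSS = "em { font-size: 110%; font-style: normal; }"


def render_tag(tag, level):
    """Wrap a tag name in `level` nested <em> elements."""
    return "<em>" * level + escape(tag) + "</em>" * level


def render_cloud(tags_with_levels):
    """Join the rendered tags into one block of html."""
    return " ".join(render_tag(tag, level) for tag, level in tags_with_levels)


html = render_cloud([("Format", 0), ("Plugin", 2), ("C# & XML", 1)])
```

Escaping the tag name matters because the tag text comes from an external API and ends up inside the page markup.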

To add links I simply built a URL for the search function of my blog, passing the <tag> string as the querystring. This lists all the blog entries containing the term (and of course that includes all the blog entries that have been tagged with that term).
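Building those links mostly comes down to URL-encoding the tag, since tags such as "Source Code" contain spaces. A minimal Python sketch follows; the search address and the `q` parameter name are hypothetical placeholders, not the blog's real search URL.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical search endpoint; the real blog's search URL is not given here.
SEARCH_URL = "https://example.com/blog/search.aspx"


def tag_link(tag):
    """Return an anchor element that searches the blog for `tag`."""
    query = urlencode({"q": tag})      # encodes spaces, '#', '&', ...
    href = f"{SEARCH_URL}?{query}"
    return f'<a href="{escape(href)}">{escape(tag)}</a>'


link = tag_link("Source Code")
```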

But I wasn't very happy with the whole iframe thing, so I decided to change tack and use an HttpHandler to return some JavaScript that would render the tag cloud in the page. All the logic of my ASP.NET page could be re-used. I just had to create a new HttpHandler and implement its ProcessRequest method so that the response was a piece of JavaScript: a function that writes a string to the html document, and a call to that function passing my tag cloud html as a parameter. Again, after wrestling with the JavaScript a little (it's hard enough without having to piece it together with lots of StringWriter.Write() calls!), I soon had a working cloud using the handler, which all round felt like a better approach.
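The shape of the script such a handler returns can be sketched like this. It's Python standing in for the ProcessRequest body, and the `writeTagCloud` name is made up for the example. Serialising the html with a JSON encoder is one way to avoid hand-escaping quotes when building the string literal.

```python
import json


def cloud_script(cloud_html):
    """Build the JavaScript a handler could return for a script include."""
    # json.dumps yields a quoted, escaped literal that is also valid JavaScript.
    js_string = json.dumps(cloud_html)
    return (
        "function writeTagCloud(html) { document.write(html); }\n"
        f"writeTagCloud({js_string});\n"
    )


script = cloud_script('<a href="/search?q=Plugin">Plugin</a>')
```

The page then only needs a script element whose src points at the handler, and the cloud is written in wherever that element sits.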

What I didn't have was any caching, which soon became apparent when Technorati stopped serving me any tags in response to my calls to their API. Fair enough, I thought. Caching the generated JavaScript response with an absolute expiration of one hour soon solved that one. It's not a fancy tag cloud. It doesn't do anything very clever and I'm pretty sure my "normalisation" algorithm will let me down sometime as, oh, a whole 2 minutes of thought went into it. But it works and it's pretty much what I was looking for (at least for now). I'm pleased with it.
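The one-hour absolute expiration mentioned above boils down to "rebuild the script at most once an hour, however many requests arrive". Here is a small Python sketch of that idea; the class and method names are invented for illustration, and in ASP.NET the built-in cache would normally do this job.

```python
import time


class AbsoluteExpiryCache:
    """Keep one generated value until a fixed time after it was stored."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get_or_create(self, factory):
        """Return the cached value, calling `factory` only when it has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = factory()  # e.g. call the API and build the script
            self._expires_at = now + self._ttl
        return self._value
```

Because the expiry is absolute rather than sliding, a busy page can't keep a stale cloud alive forever: the entry is rebuilt an hour after it was created no matter how often it is read.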

I'm sure there are easier / better / quicker ways to achieve the same result. Let me know...

tags: .net, tag cloud, technorati api