Quantifying Readers

I have many tools for measuring and analyzing reader habits. I'm not sure that using them is actually a good use of time. I don't think any of the tools give me particularly useful information that I can trust.

For example, MSDN gives me both continuous counts of the number of readers and a monthly sum. The two simply don't agree in any meaningful way. The continuous counts can show a huge increase throughout the month while the monthly total doesn't move significantly, and vice versa. I've seen the continuous count stay flat for several weeks while the summary number claimed that there was significant month-over-month growth.
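The kind of cross-check I'm describing can be sketched in a few lines of Python. This is only an illustration: it assumes the continuous counts can be exported as one number per day, and every name and number in it is a made-up placeholder rather than real MSDN data or a real MSDN API.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_sums(daily_counts):
    """Sum (date, count) pairs into a {(year, month): total} dict."""
    totals = defaultdict(int)
    for day, count in daily_counts:
        totals[(day.year, day.month)] += count
    return dict(totals)


def percent_change(previous, current):
    """Month-over-month change as a percentage of the previous value."""
    return (current - previous) * 100.0 / previous


def sign(value):
    """Return -1, 0, or 1 depending on the direction of movement."""
    return (value > 0) - (value < 0)


def compare_growth(daily_counts, reported_totals, prev_month, this_month):
    """Compare growth implied by the daily counts with the reported totals.

    Returns (growth from daily counts, growth from reported totals,
    whether the two agree on the direction of movement).
    """
    summed = monthly_sums(daily_counts)
    from_daily = percent_change(summed[prev_month], summed[this_month])
    from_reported = percent_change(
        reported_totals[prev_month], reported_totals[this_month]
    )
    return from_daily, from_reported, sign(from_daily) == sign(from_reported)


# Placeholder data: a perfectly flat daily count across two 31-day months,
# next to monthly totals that claim growth. None of these values are real.
flat_daily = [(date(2007, 7, 1) + timedelta(days=i), 100) for i in range(62)]
claimed_totals = {(2007, 7): 3000, (2007, 8): 3600}

daily_growth, reported_growth, agree = compare_growth(
    flat_daily, claimed_totals, (2007, 7), (2007, 8)
)
print(f"daily: {daily_growth:+.1f}%  reported: {reported_growth:+.1f}%  agree: {agree}")
```

If the two sources were measuring the same thing, the two percentages would at least share a sign. In the placeholder data they don't, which is the sort of disagreement I keep running into.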

I've tried external services like Google, Technorati, and Feedburner as well to see how they perform. Google has a very slick interface for analyzing data, but the data itself is just as suspect as what comes from MSDN. Each data source disagrees with all the others about both the magnitude and the percentage change. Occasionally they don't even agree on the direction of movement. Technorati and Feedburner give me different kinds of data, but inspecting that data in context makes me skeptical of its accuracy.

The sampling of demographic data is probably the most correct. When Google tells me the top countries and cities of origin, I think they can do a reasonably accurate job of guessing that information from the traffic data. It's not possible to be completely accurate because they have to infer the information from an IP address, and the networking technology involved introduces uncertainty. These inferences have gotten pretty good over the last few years, though, because there's a certain amount of locality to the data. Your IP address may not map perfectly to your hometown, but it most likely maps to a geographically nearby location. This bit of noise is irrelevant at a large scale. I care more about distinguishing San Francisco from Paris than San Francisco from Oakland. Of course, the numbers come from an unknown subset of readers, but it seems reasonable to assume that whether a reader gets sampled doesn't depend significantly on the reader's locale.

In the end, it turns out that the least detailed source of information is the one that I have the most confidence in. The monthly MSDN number gives a single value summing all readers across all pages for all days in the month. It doesn't tell me a thing about who or where those readers are. I have no clue if the magnitude is actually correct. However, the way the number moves over time comes closest to matching the reactions I get from talking to actual people. None of the tools at hand makes the picture any clearer.