In the next little while, my role as a spam analyst is going to shift from day-to-day analysis toward research. Some of my duties will involve researching trends.
I am a part-time stock speculator (very amateur… I am not making a fortune but I am treading water in 2006 despite it being a very tough year). I picked up a book recently entitled “Practical Speculation” by Victor Niederhoffer. One of the chapters is entitled “Avoid Spurious Correlations.” In the chapter, he goes into how to apply statistical methods to look for correlations between events that may influence the markets. In particular, he touts the use of scatter charts.
I had never thought of using scatter charts for the stock market, but I did a little bit of scripting on my own time. I then realized that scatter charts could have a tremendous practical application in spam fighting. For example, let’s assume we have five different methods of blocking spam (say, a Bayesian filter, blacklists and some other secret methods). Is there any relationship between the total amount of spam blocked and the sensitivity of the Bayesian filter? What is the relationship between the total volume of spam and the amount filtered by blacklists? Does increasing the sensitivity of different internal tools correlate with an increase in false positives? I’ve already done a little bit of work on this, and some of the things I have found have surprised me (such as finding zero correlation between two of our internal filtering tools, which suggests they catch largely different types of spam).
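To make the idea concrete, here is a minimal sketch of the kind of scripting I mean, using only the Python standard library. It computes the Pearson correlation coefficient between two series of daily counts. The series names and all of the numbers are made up for illustration; they are not real filter data, and this is not necessarily how our internal tooling does it.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily counts of messages blocked by two filters (invented data).
bayesian_blocked = [120, 135, 150, 160, 180, 210, 205]
blacklist_blocked = [300, 310, 290, 305, 295, 300, 310]

r = pearson(bayesian_blocked, blacklist_blocked)
print(f"correlation between the two filters: {r:.3f}")
```

A value near +1 or -1 means the two series move together (or in opposite directions), while a value near 0 means there is no linear relationship between them. Plotting the same pairs of numbers as a scatter chart is a good sanity check, since a single coefficient can hide outliers or a non-linear pattern.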
These are interesting ideas to consider (at least, they are interesting to me) and I look forward to researching them over the next few weeks and months.