Grr... Blog Comment Spam!

I hate spam... both the physical and the electronic form. But for the moment... I am talking about the electronic form. :-)

Ever since the blogs.msdn.com upgrade a month and a half ago, the volume and frequency of blog comment spam has spiked for me. I do not know whether it is just a coincidence... what changed in CS 2.0?

It really bugs me, because this is the single biggest source of spam in my inbox (hey there, I'm not calling your *other* email comments spam! ;-) ). I really have no idea what spammers hope to get out of posting comments on people's blogs. Is it just chasing eyeballs and hoping someone will click through? Or something else?

Anyway, I am concerned because, unlike most spam, which I easily ignore, I actually open every email sent from my blog's email subscription. That raises my eyebrows because it is an obvious vector for an email virus. Not that I am terribly worried, since I run as non-admin and such a virus cannot do much to my computer running as me. Just the same, I hate unscrupulous people taking advantage of my open invitation to help and listen to others. Oh, and I hate having to clean off those useless blog comments and trash their email notifications in my inbox, too.

So, for the moment, I am going to turn off getting email notifications of public comments on my blog entries and just check blog stats every once in a while for comments. This should not affect your ability to send private comments via email; it just affects my promptness when you make a public comment on a blog entry.

Now, I have been thinking of proposing a blog comment filter that simply rejects comments containing "href" links, IP addresses, or URLs. I think spammers ultimately want to direct eyeballs to some website and thus must send out a valid href, IP, or URL pointing to it. So, I want comments matching those criteria to be disapproved and ignored by default. I know, I know, some of you actually make comments that include links, but from observation, the ratio of spam containing links to legitimate comments containing links is so high that I am willing to make the occasional effort to re-approve your comments.
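To make the idea concrete, here is a minimal sketch of such a filter in Python. It assumes the blog engine can hand each new comment body to a hook as a plain string; the names `needs_moderation` and `triage` and the regular expressions are hypothetical illustrations, not part of any blog engine's API. Note that this sketch only catches URLs that start with a scheme or "www.", so a bare "example.com" slips through.

```python
import re

# Hypothetical patterns for "link-like" content; tune them for your own blog.
HREF_RE = re.compile(r"href\s*=", re.IGNORECASE)                        # HTML anchor attribute
URL_RE = re.compile(r"\b(?:https?://|ftp://|www\.)\S+", re.IGNORECASE)  # scheme or www. prefix
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")                     # dotted-quad IPv4 address


def needs_moderation(comment: str) -> bool:
    """Return True if the comment contains an href attribute, a URL, or an IPv4 address."""
    return any(pattern.search(comment) for pattern in (HREF_RE, URL_RE, IPV4_RE))


def triage(comment: str) -> str:
    # Deterministic rule: anything link-like is held for manual approval,
    # everything else is approved. No scoring, no probabilities.
    return "held" if needs_moderation(comment) else "approved"


if __name__ == "__main__":
    print(triage("Great post, thanks!"))
    print(triage('Buy now <a href="http://spam.example/">here</a>'))
```

The appeal of this design is that it is predictable: a legitimate commenter can know in advance exactly why a comment was held. The trade-off is that it will also hold harmless text that looks like an address (a version string such as "1.2.3.4", for instance), which is the occasional re-approval cost described above.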

Now, the astute reader will note that spammers may start mutating their comment strings to get around filters that look for href, IP, or URL... but I think they lose this time, because mutated strings will not be recognized as links by email clients and likely cannot be copied and pasted directly into a browser. So the filter simply raises the bar on getting the cheap eyeball click-through, hopefully enough to hurt the spammers' profit margin so that they go focus on something else.

In other words, in the spam-filtering arms race, I would rather go with something low-tech and deterministic than with fancy content analysis that has a non-deterministic false-positive and false-negative rate. I would also rather make it more expensive for spammers to spam, removing their leverage of the Internet's reach and economies of scale, than let them force me to solve the computational problem of "is it spam or not?"

Humans are amazingly adaptable creatures that work well with deterministic patterns... and are frustrated by non-deterministic ones.

//David