Grr… Blog Comment Spam!

I hate spam… both the physical and the electronic form. But for the moment… I am talking about the electronic form. 🙂

Ever since the upgrade a month and a half ago, the volume and frequency of blog comment spam have spiked for me. I do not know if it was just coincidence or not… what’s changed in CS 2.0?

However, it really bugs me because this is the single biggest source of spam in my inbox (hey there, I’m not calling your *other* email comments spam! 😉 ). I really have no idea what the spammers want to get out of sending comments onto people’s blogs. Is it just chasing after eyeballs and hoping someone will click through? Or something else?

Anyways, I am concerned because unlike most spam, which I easily ignore, I actually open all the emails sent from my blog email subscription. This raises my eyebrows because it is an obvious vector for an email virus. Not that I am 100% concerned, because I run as non-admin, so such a virus cannot do much to my computer as myself, but just the same, I hate unscrupulous people taking advantage of my open invitation to help and listen to others. Oh, and I hate having to clean off those useless blog comments along with trashing their email notifications in my inbox, too.

So, for the moment, I am going to turn off getting email notifications of public comments on my blog entries and just check blog stats every once in a while for comments. This should not affect your ability to send private comments via email; it just affects my promptness when you make a public comment on a blog entry.

Now, I have been thinking of proposing a blog comment filter which basically rejects comments that include “href” links, IP addresses, or URLs. I think that spammers ultimately want to direct eyeballs to some website and thus must send out a valid href, IP, or URL to get them there. So, I want comments matching those criteria to be disapproved and ignored by default. I know, I know, some of you actually make comments that include links, but from observation, the ratio of spam containing links to legitimate comments containing links is so high that I am willing to make the occasional effort to re-approve your comments.
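A minimal sketch of what such a filter could look like, assuming Python and regular expressions. The post does not specify an implementation, so the function names and the exact patterns here are hypothetical; the patterns are deliberately crude and only cover the three categories named above.

```python
import re

# Hypothetical patterns: the post names the categories (href, IP, URL)
# but not the exact matching rules, so these are illustrative guesses.
HREF_RE = re.compile(r"\bhref\s*=", re.IGNORECASE)
URL_RE = re.compile(r"\b(?:https?://|ftp://|www\.)\S+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def looks_like_ipv4(candidate: str) -> bool:
    """Return True if every dotted part is in the 0-255 range."""
    return all(int(part) <= 255 for part in candidate.split("."))


def should_hold_comment(comment: str) -> bool:
    """Return True if the comment should be disapproved by default.

    A held comment is not deleted; it waits for manual re-approval.
    """
    if HREF_RE.search(comment) or URL_RE.search(comment):
        return True
    return any(looks_like_ipv4(m.group()) for m in IPV4_RE.finditer(comment))
```

A comment such as `<a href="http://example.com">cheap pills</a>` would be held, while "Great post, thanks!" would go straight through. The check is deterministic: the same comment always gets the same verdict, and anything held by mistake can be re-approved by hand.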

Now, the astute reader notes that spammers may start mutating their comment strings to get around filters looking for href, IP, or URL… but I think they lose this time because these mutated strings will not be recognized as links by email clients and likely cannot be copy/pasted directly into a browser… so the filter simply raises the bar on getting the cheap eyeball click-thru link, hopefully enough to hurt the spammers’ profit margin so that they go focus on something else.

In other words, in the spam filtering arms race, I would rather go with low-tech and deterministic filtering than fancy content analysis that has a non-deterministic false-positive and false-negative rate. Also, I would rather aim to make it more expensive for spammers to spam and remove their leverage of the Internet’s reach and efficiencies of scale… than allow them to make me solve computational problems of “is it spam or not”.

Humans are amazingly adaptable creatures that work well with deterministic patterns… and non-deterministic patterns frustrate them.


Comments (8)

  1. BlakeHandler says:

    Hey you’re lucky — your blog doesn’t support trackbacks as well! Not only do MSN Spaces bloggers also get email and comment spam — we get trackback spam as well! (A new buzzword: pork back?)

  2. Kev says:

    How about MSDN blogs implement a hip-captcha type thing ?

    I use dasBlog and it has a similar device and it’s cut my comment spam right out.

  3. Geezz. I’ve got the same problem. In fact, just before this, I deleted over 20 junk comments. I don’t have email notification; I just go in on and off and look at the latest feedback. What I don’t get is that sometimes an anonymous comment gets published, even though I have configured ‘moderate anonymous comment’. 2 out of 10 will get through.. that puzzles me.

  4. David.Wang says:

    Bernard – yeah, I don’t know about comment moderation… I basically allow anyone at any time to make a comment, so I’d never notice.

    Now, I have noticed that only a few posts are attracting attention of the spam, but spam avoidance is probably not a valid strategy here…

    I have been experimenting with not having email notification and just periodically cleaning things up, and it seems to be ok right now. I’ll let it sit for a few more days before I decide one way or another.


  5. David.Wang says:

    Kev – I agree that HIP/CAPTCHA is cool and works right now. However, I suspect it will not be a long term solution.

    It just escalates the spam war by saying "it is computationally hard to translate images to text", a test which a determined spammer could pass through brute-force pattern matching (suppose they hire cheap human capital to do the translation and cache the results).

    Somehow, we have to make it expensive to send spam, not make it cheap to send spam and then spend productivity cycles dealing with its proliferation.


  6. David Wang says:

    Three months ago, I ranted in this blog entry about blog comment spam. Well, it appears that the arms…

  7. River sand says:

    Thank you for posting this post, we apperciate your post on some of you actually make comments that include links, but from observation, the ratio of spam containing links vs. legitimate comments containing links is so high that I am willing to make the occassional effort to re-approve your comments.