Can a computer fight spam better than a human, part II

04/12/2007

Leaving aside the question of email authentication for the time being, is it possible for a computer to recognize spam the way a human does?

Even humans verifying the contents of spam use heuristics to make quick judgements, and computers do the same thing. If we know that cousin Jim always has a get-rich-quick scheme, we'd probably never invest money with him because we wouldn't expect to make a decent return. Similarly, if we know certain IPs are reputed to always send spam, we simply refuse to accept email from them and presume that the odds of receiving legitimate mail from them are next to nothing.
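
To make the IP reputation heuristic concrete, here is a minimal sketch in Python. The network list and the name `should_reject` are made up for illustration, and the addresses come from ranges reserved for documentation; a real filter would consult a maintained reputation list rather than a hard-coded one.

```python
import ipaddress

# Hypothetical reputation list (illustrative only). These are
# documentation-reserved ranges, not real spam sources.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def should_reject(sender_ip: str) -> bool:
    """Return True when the sending IP falls inside a blocked network."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

Like the cousin Jim rule, this never looks at the message itself: the decision is made entirely on the sender's reputation.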

The problem with content filters is that they primarily block spam based on what they have seen before. If they have rules engines (regexes) they gain some flexibility, but the rules are still based on a corpus of known spam and on the rule writer's skill at foreseeing possible variations. In this regard, I give an edge to the human. I can draw from my own experience when I write a spam rule to foresee possible combinations of future spam without ever having actually seen that hypothetical spam.
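
As a rough illustration of a rule that anticipates variations, the sketch below builds a regex from a known spam phrase and allows for look-alike characters and separators between letters. The substitution table, the phrase "free money", and the helper name `build_rule` are my own assumptions for the example, not taken from any particular filter.

```python
import re

# Hypothetical look-alike characters a rule writer might anticipate.
SUBSTITUTIONS = {"a": "a@4", "e": "e3", "i": "i1!", "o": "o0", "s": "s5$"}


def build_rule(phrase: str) -> "re.Pattern[str]":
    """Build a regex that tolerates character swaps and letter separators."""
    words = []
    for word in phrase.lower().split():
        # One character class per letter, e.g. "e" becomes "[e3]".
        letters = ["[" + re.escape(SUBSTITUTIONS.get(ch, ch)) + "]" for ch in word]
        # Allow a single separator such as "." or "-" between letters.
        words.append(r"[.\-_*]?".join(letters))
    return re.compile(r"\s+".join(words), re.IGNORECASE)


rule = build_rule("free money")
```

The rule catches variants such as "FR33 M0NEY" and "f.r.e.e money", but a spammer who rewords the pitch entirely slips straight past it, which is exactly the limitation described above.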

On the other hand, my experience is exactly the point: I can foresee possible combinations of future spam because I have seen other spam in the past and can also relate it to things I have seen in real life. Being inundated with pop culture allows me to make guesses about the possible interpretation of new spam. In this regard, if a computer were allowed to digest a corpus and create filters from it (Bayesian, spam rules, distributed checksums, etc.), and somehow figured out a way to learn pop culture and incorporate that into a "guessing" algorithm, I think that would definitely give computers a leg up.
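
For the Bayesian case, digesting a corpus can be sketched in a few lines. This is a toy naive Bayes scorer with add-one smoothing, assuming whitespace tokenization and a four-message corpus that I invented for the example; production filters are considerably more elaborate.

```python
import math
from collections import Counter

# Hypothetical labeled corpus: (text, is_spam).
CORPUS = [
    ("hot stock tip buy now", True),
    ("buy cheap pills now", True),
    ("meeting notes for monday", False),
    ("lunch on monday", False),
]


def train(messages):
    """Count words and messages per class (True = spam, False = legitimate)."""
    word_counts = {True: Counter(), False: Counter()}
    message_counts = Counter()
    for text, is_spam in messages:
        message_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, message_counts


def spam_log_odds(text, word_counts, message_counts):
    """Return log odds of spam; positive means spam is the likelier class."""
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    spam_total = sum(word_counts[True].values()) + len(vocabulary)
    ham_total = sum(word_counts[False].values()) + len(vocabulary)
    score = math.log(message_counts[True] / message_counts[False])
    for word in text.lower().split():
        if word not in vocabulary:
            continue  # a word never seen in training carries no evidence
        p_spam = (word_counts[True][word] + 1) / spam_total
        p_ham = (word_counts[False][word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


WORD_COUNTS, MESSAGE_COUNTS = train(CORPUS)
```

Note what happens with words the corpus has never seen: they are skipped, so a message made entirely of new vocabulary gets no evidence either way. That is the gap a "guessing" algorithm informed by pop culture would have to fill.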

I can think of one application: content blocking of porn spam. This is less applicable now because porn spam is a much smaller piece of the spam pie, but the point remains. When spammers switch words around and come up with creative new subject lines to describe what people are doing to each other, that means very little to a computer. It's simply random bytes. A human can interpret what those subject lines mean based on familiarity with the slang words describing certain acts, even if they are heavily misspelled. We recognize these words because of our own experiences with pop culture. If a computer could learn from pop culture, it could pre-emptively determine (based on probability, I suppose) whether or not a message is spam. I would think this could also work for stock spam, when a spam filter sees an image with text beneath it and is able to determine that the text contains meaningless words.
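
A computer can approximate a small part of this today, at least for misspellings of words it already knows. The sketch below uses Python's standard `difflib` to map a mangled token back to a known term; the word list, the 0.8 cutoff, and the name `normalize` are assumptions chosen for the example.

```python
import difflib

# Hypothetical vocabulary of terms the filter already knows about.
KNOWN_TERMS = ["lottery", "mortgage", "pharmacy"]


def normalize(token, cutoff=0.8):
    """Return the closest known term for a possibly misspelled token, or None."""
    matches = difflib.get_close_matches(
        token.lower(), KNOWN_TERMS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

This only recovers spellings that stay close to a word already on the list. It has no way to recognize brand-new slang, which is where a human's familiarity with pop culture still does the work.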

The question remains on the plate: can a computer fight spam better than a human? A human is much better at assigning and interpreting the meaning of words. Eventually, I would expect that computers will "learn" to see words in a variety of contexts and behave a little smarter. Until then, I think humans still retain an edge.
