A bit more on the spam chief interview

Following on from my previous post on the interview with the spam chief at Yahoo, I thought I'd respond to a couple more things that Mark Risher said.

bartonas: What is the effect, if any, other than putting it back in my in-box, of me selecting "not spam" for an email in the spam folder?

Mark: We've got some incredibly sophisticated systems trying to analyze the messages our users mark as "spam" and "not spam." We're constantly analyzing the feedback from users like yourself to figure out how we can improve.  The effect of clicking "not spam" on a message is that it sends a powerful signal to our systems that we've made a mistake. That's one of the best ways we can learn, both to ensure that we don't block messages from that sender in the future, and that our systems shouldn't block similar messages next time.

Here in Exchange Hosted Services, we have a couple of ways to access your spam.  One is through the Spam Quarantine web interface: if you click "Not Spam," the message is released to your inbox and a copy goes to the spam team for analysis.  There is some automated filtering and sorting on the back end, but our techniques are not as sophisticated as Yahoo's, and we rely more on human analysis to make decisions.  We do this because, in my opinion, human analysis of false positives is more accurate.

Humans look at messages and adjust spam rules, but they also decide whether a message is spam in the first place.  Most submissions sent to the false positive alias are actually spam, so a great deal of pre-processing is required to separate the wheat from the chaff before analysts review the messages.  After that, the spam analyst decides whether to release the message and updates the spam rules or reputation filters accordingly.

opher: SMTP requires a confirmed IP address between the sending and receiving servers. That means spammers can spoof the NAME of the sending server, but not the IP address. Since Yahoo knows the IP address of all of their mail servers, why not validate the IP address and when it does not match, drop the spoofed email?

Mark: Yahoo! has been a pioneer in advancing e-mail authentication — the ability to conclusively identify that a message that says it comes from somebody really comes from that somebody — and was the inventor of the open source DomainKeys and DKIM technologies. As we see the adoption of these technologies continue to take off, we’re exploring ways to take action against messages that “spoof” a Yahoo! origin. You're right that IP address is one of the few, truly trustworthy parts of an inbound spam message, and it's a major factor in our determination of whether a message is spam.

What opher is asking is this: if Yahoo knows its own IP addresses, and an email claims to come from Yahoo but was not sent from any of those addresses, why not reject the message?  This is basically SPF/SenderID, something that Yahoo does not do (or if they do, you certainly couldn't tell, and they don't publish SPF records either).
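To make the idea concrete, here is a minimal sketch of the kind of check opher is describing.  It is not Yahoo's implementation and it is not a real SPF evaluator: real SPF looks the policy up in DNS and supports several mechanisms, whereas this hard-codes a hypothetical table of authorized networks.  The domain, the address ranges (taken from the reserved documentation ranges), and the function name `check_sender_ip` are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical policy table: sending domain -> networks allowed to send for it.
# These are reserved documentation ranges, not any real provider's addresses.
AUTHORIZED_NETWORKS = {
    "example.com": [
        ipaddress.ip_network("192.0.2.0/24"),
        ipaddress.ip_network("198.51.100.16/28"),
    ],
}


def check_sender_ip(claimed_domain, connecting_ip):
    """Return 'pass', 'fail', or 'none' for a claimed domain and connecting IP.

    'none' means the domain publishes no policy in our table, so the
    check cannot say anything either way.
    """
    networks = AUTHORIZED_NETWORKS.get(claimed_domain.lower())
    if networks is None:
        return "none"
    ip = ipaddress.ip_address(connecting_ip)
    # The connecting IP is the part a spammer cannot easily spoof.
    if any(ip in net for net in networks):
        return "pass"
    return "fail"


if __name__ == "__main__":
    print(check_sender_ip("example.com", "192.0.2.25"))
    print(check_sender_ip("example.com", "203.0.113.9"))
    print(check_sender_ip("example.org", "203.0.113.9"))
```

The important case is the middle one: the message claims a domain that has a published policy, but it arrives from an address outside that policy, so the receiver has an explicit basis for rejecting it.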

Far be it from me to criticize another antispam company, but I think that this is a flaw in Yahoo's spam filtering service.  SPF is a pretty basic way to filter spam.  DomainKeys and DKIM don't cut it because both only say what to do when an email is authenticated; they say nothing about what to do if a message fails a DomainKeys/DKIM check, and nothing about whether a message should even be signed.  In fact, the guidance is to treat a message that fails the check as unsigned mail (i.e., neither confirm nor deny).  I seriously doubt spammers would take the time to DKIM sign their mail, so using those two technologies to fight spam would have minimal impact unless you did a custom DomainKeys/DKIM implementation.
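The difference described above can be sketched as two small decision functions.  This is only an illustration of the argument in this post, not code from any DKIM or SPF library; the state names and the functions `dkim_only_verdict` and `ip_policy_verdict` are hypothetical.

```python
def dkim_only_verdict(signature_state):
    """Verdict when a receiver relies on DomainKeys/DKIM alone.

    signature_state is one of 'valid', 'invalid', or 'missing'
    (hypothetical labels for this sketch).
    """
    if signature_state == "valid":
        return "authenticated"
    # A failed signature is treated the same as no signature at all,
    # so the receiver learns nothing it can act on.
    return "unknown"


def ip_policy_verdict(ip_is_authorized):
    """Verdict under an SPF-style policy the domain has published."""
    return "authenticated" if ip_is_authorized else "reject"


if __name__ == "__main__":
    # A spoofed message: unsigned, and sent from an unauthorized IP.
    print(dkim_only_verdict("missing"))
    print(ip_policy_verdict(False))
```

For a spoofed, unsigned message from an unauthorized address, the first function can only return "unknown", while the second gives the receiver a reason to reject.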

In my view, Risher's comment is a diplomatic way of saying "Yes, we should use SPF but we don't."  It's better for fighting spam than DKIM at the moment.