My take on blacklists, part 2

07/01/2009

I'm going to attempt to summarize what a blocklist is without going to the article on Wikipedia. I'll be doing this straight off the top of my head.

Motivation

A blocklist is essentially a shortcut to spam filtering. Assume that you have a content filter that is doing all of the work of filtering, faithfully executing and flagging messages as spam. Everything is great except that the spam filter is doing a lot of work and occasionally, the odd spam message or two slips through. You can live with this if all you are filtering is 10,000 messages per day.

But imagine you are filtering 10 million messages per day. Suddenly bandwidth becomes an issue because most of it is being taken up by useless data (spam). In addition, if your filter is "only" 99% effective, 100,000 spams are still leaking through to end users. If your organization has 10,000 users (a good-sized company), then that's about 10 spams per day for each end user.
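To make the arithmetic above concrete, here is a quick sketch. It assumes, as the numbers in the paragraph imply, that every one of those 10 million messages is spam; the variable names are mine.

```python
# Figures taken from the paragraph above.
messages_per_day = 10_000_000
caught_percent = 99      # the filter is "only" 99% effective
users = 10_000

# Assumption: all of the traffic counted here is spam.
# Integer math avoids floating-point rounding on the 1% that leaks.
leaked_per_day = messages_per_day * (100 - caught_percent) // 100
leaked_per_user = leaked_per_day / users

print(leaked_per_day)   # spam messages reaching inboxes each day
print(leaked_per_user)  # spam messages per user each day
```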

You need a way to make this work better.

Methods

You sit down one day and start poring over the spam samples that your end users are submitting to you. "What's this?" you say out loud to no one in particular. You observe that while the content of the spams has no particular pattern, they seem to be coming from a narrow set of IPs. Let's say that out of 100 messages, you see the following pattern (I'm using hypothetical IPs):

IP	Spam Count
292.144.16.11	16
292.144.16.17	15
292.144.16.19	22
292.144.16.22	18
292.144.16.27	29
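If you wanted to build a table like this from your samples, a minimal sketch might look like the following. The function name and the way the sample list is built are hypothetical; in practice you would pull the connecting IP out of each submitted message. The IPs are kept as plain strings because they are made up (292 is not a valid octet).

```python
from collections import Counter

def tally_by_ip(connecting_ips):
    """Count how many spam samples arrived from each connecting IP."""
    return Counter(connecting_ips)

# Hypothetical sample of 100 messages, built to match the table above.
table = {
    "292.144.16.11": 16,
    "292.144.16.17": 15,
    "292.144.16.19": 22,
    "292.144.16.22": 18,
    "292.144.16.27": 29,
}
samples = [ip for ip, n in table.items() for _ in range(n)]

counts = tally_by_ip(samples)
for ip, n in counts.most_common():
    print(ip, n)  # heaviest senders first
```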

"That's odd," you say again. "There seems to be a lot of IPs in that range." You do a quick WHOIS lookup of that IP and you find that the IP space is owned by the organization Canadian Pharmaspammers. "Well," you exclaim, "if these guys own those IPs, I should flat out block them all! It is very unlikely that they will ever send out anything legitimate." How do you know this? Spammers never change their spots. If a spammer sends out this much spam from these IPs, at that level of volume (100 messages randomly sampled) then you can safely conclude that they will never send out anything else.

You decide to add all five of those IPs to your own blocklist. Anything that hits your network that comes from those IPs you will reject (how this works we'll get to in a future post). You've now saved your end-users from getting spam from these IPs.
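In its simplest form, that personal blocklist is just a set of addresses that you check the connecting IP against. This is only a sketch of the lookup, not of how the rejection itself happens; the names are hypothetical.

```python
# The five hypothetical IPs from the table, kept as plain strings.
BLOCKLIST = {
    "292.144.16.11",
    "292.144.16.17",
    "292.144.16.19",
    "292.144.16.22",
    "292.144.16.27",
}

def should_reject(connecting_ip, blocklist=BLOCKLIST):
    """Return True if the connecting IP is on the blocklist."""
    return connecting_ip in blocklist

print(should_reject("292.144.16.11"))  # a listed IP
print(should_reject("292.144.16.12"))  # a neighbor that is not listed
```

Note that an exact-match list like this says nothing about neighboring addresses.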

Refinements

You wipe your hands and assume the problem is solved. But it's not; users are still getting Canadian Pharmaspam! Once again, you grab the spam samples and look at the connecting IPs. The content is all different -- again -- but the IPs look familiar:

IP	Spam Count
292.144.16.12	19
292.144.16.14	17
292.144.16.18	18
292.144.16.21	20
292.144.16.26	27

Those IPs look similar to the IPs you previously blocklisted. You are no longer getting spam from the IPs you blocked, but you are getting plenty from their sister IPs. Once again, you do a WHOIS lookup on one of those IPs and notice something you didn't see before. It's registered to Canadian Pharmaspammers, but they also own the netblock 292.144.16.0/27 -- a netblock of 32 IPs. You decide to get pre-emptive; you go into your personal blocklist, remove the previous five IPs and insert 292.144.16.0/27 instead. You have now listed the entire range. You only have evidence from 10 different IPs, but you strongly suspect that spam is coming out of all 32. You list the range, cross your fingers and hope for the best.
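Listing a whole netblock instead of individual addresses can be sketched with Python's standard ipaddress module. Since my hypothetical 292.x addresses are not valid IPv4 and will not parse, this example assumes the documentation range 192.0.2.0/27 as a stand-in; the function and variable names are also just for illustration.

```python
import ipaddress

# Stand-in for 292.144.16.0/27, which is not a parseable IPv4 range.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/27")]

def is_blocked(connecting_ip, networks=BLOCKED_NETWORKS):
    """Return True if the connecting IP falls inside any listed netblock."""
    addr = ipaddress.ip_address(connecting_ip)
    return any(addr in net for net in networks)

print(BLOCKED_NETWORKS[0].num_addresses)  # size of a /27
print(is_blocked("192.0.2.12"))  # inside the range, even with no spam seen from it
print(is_blocked("192.0.2.32"))  # first address past the end of the range
```

One entry now covers every address in the range, including the ones you have no samples from yet.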

The next day you check your spam stats and notice something: instead of the content filter processing 10 million messages per day, your upstream IP filter has cut that down to 1 million per day! Gah! That's a reduction of 90%! Your content filter is flying! Furthermore, the number of spam complaints has gone down from 100 per day to 20 per day, a reduction of 80%. By adding these IPs to the blocklist, you have accomplished two things:

Users are seeing less spam in their inboxes because while your filters are good, there may be gaps. This blocklist fills in those gaps.
You have saved a good chunk of bandwidth and are spending precious resources on far less junk.

Those are the two basic uses of blocklists. A third would be spam filter automation and leveraging the work of others, but we'll get to that in a future post. But by and large, these impacts are immediately noticeable by everyone using the service and therefore, the use of blocklists eventually becomes indispensable if you want to run a filtering service.
