What does a spam fighter do all day? Part 1b

Further to my other posts: in addition to handling false positives and processing spam (usually abuse submissions, but not always), spam fighters also handle IP blocklist delisting requests.

Those of you who have ever run a blocklist will know that there is a definite art to managing it; blocklist management is non-trivial.  There are two components:

  1. Having a process for adding IPs
  2. Having a process for removing IPs

Back in the olden days, pre-July-2006, our blocklists were not a very big part of our filtering and they were managed manually.  If we came across a spammy IP, we'd manually research it using various web tools, run it through a script to make sure that it wasn't a duplicate, and then add it to a blocklist file, which was in turn replicated to our rbldns servers.

That was how we handled additions.  Removals started when customers wrote in to complain that IPs were getting blocked.  We would manually research the IP using various web tools, go into the blocklist file and comment out the entry, and then the list would get replicated to the rest of the network.

That was then, this is now.  Today, edge blocks (blocks at the edge of our network, applied without scanning the content) account for 85% of our traffic on a daily basis.  We use several different blocklists, including some of our own custom ones, and the management process is complicated.  To cope with the various types of lists we use, we have the following procedures:

  1. An automated process for scraping our logs, looking for IPs to block rather than content filter.
  2. A process for grabbing external lists (i.e., lists provided by either Microsoft or by third-party tools).
  3. A process for delisting IPs due to customer complaints.  Different lists have different delisting procedures and criteria.
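
To make the first procedure a little more concrete, here is a minimal sketch of scraping logs for IPs that are worth blocking at the edge.  Everything in it is an assumption made for illustration: the three-field log format ("timestamp ip verdict"), the function name candidate_ips, and the thresholds are hypothetical and are not our production setup.

```python
from collections import Counter


def candidate_ips(log_lines, min_messages=100, min_spam_ratio=0.95):
    """Return IPs whose observed traffic is almost entirely spam.

    Assumes a hypothetical log format of "timestamp ip verdict" per line,
    where verdict is "spam" or anything else for non-spam.
    """
    total = Counter()
    spam = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines rather than guess at them
        _timestamp, ip, verdict = parts
        total[ip] += 1
        if verdict == "spam":
            spam[ip] += 1
    # Require a minimum volume so that one stray message cannot list an IP.
    return sorted(
        ip
        for ip, count in total.items()
        if count >= min_messages and spam[ip] / count >= min_spam_ratio
    )
```

The volume floor is the design choice that matters most here: an IP that sent a single spam message has a 100% spam ratio, but that is far too little evidence to block it at the edge.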

The biggest problem in maintaining internal lists is handling customer escalations (the second biggest problem is knowing when to expire IPs off the blocklist so that it doesn't grow indefinitely).  How do we know whether the customer is requesting a legitimate delisting, or is infected with a virus and is spamming?  To answer that question, spam analysts analyze the traffic patterns for the IP.  These traffic patterns are stored in databases, so we need processes to scrape our logs and store the results in databases that are accessible to the spam team (i.e., store the stats on servers that do not touch production).
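
As an illustration of the expiry problem, here is a minimal sketch that ages entries off a list once no spam has been seen from them for a while.  The data structure (a dict mapping each IP to the time spam was last seen from it), the function name expire_entries, and the 30-day window are all assumptions for the example, not a description of our actual policy.

```python
from datetime import datetime, timedelta


def expire_entries(blocklist, now, max_age=timedelta(days=30)):
    """Split a blocklist into entries to keep and IPs to expire.

    blocklist maps IP -> datetime when spam was last seen from that IP
    (a hypothetical structure chosen for this sketch).
    """
    kept = {}
    expired = []
    for ip, last_spam_seen in blocklist.items():
        if now - last_spam_seen > max_age:
            expired.append(ip)  # quiet for too long; drop from the list
        else:
            kept[ip] = last_spam_seen
    return kept, sorted(expired)
```

Keying expiry on the last time spam was seen, rather than on the date the IP was first listed, means that an IP which keeps spamming never ages out, while one that has gone quiet eventually does.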

To handle escalations, we needed a process for client services to escalate to the spam team to review the delisting request.  The spam team needed a process to validate whether or not to delist an IP.  As support volume grew, we needed ways to automate the easy delisting requests so spam analysts could handle the difficult cases.
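
One way to picture "automating the easy requests" is a triage step that decides only the clear-cut cases and passes everything else to an analyst.  The sketch below is hypothetical: the IpStats fields, the triage function, and every threshold are invented for illustration and are not our real delisting criteria.

```python
from dataclasses import dataclass


@dataclass
class IpStats:
    """Recent traffic counts for one IP (hypothetical summary record)."""
    messages: int
    spam: int


def triage(stats, low=0.05, high=0.90, min_messages=50):
    """Return "delist", "keep_listed", or "escalate" for a delisting request."""
    if stats.messages < min_messages:
        return "escalate"  # too little traffic to judge automatically
    ratio = stats.spam / stats.messages
    if ratio <= low:
        return "delist"  # almost no spam: the easy approval
    if ratio >= high:
        return "keep_listed"  # almost all spam: the easy refusal
    return "escalate"  # mixed traffic: needs a spam analyst
```

For example, under these assumed thresholds triage(IpStats(messages=1000, spam=10)) falls in the "delist" branch, while an IP with mixed traffic or very little history goes to a human.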

In essence, there are a lot of moving parts and interacting components, and it has taken a while to get them all sorted, spec'ed, coded, tested and deployed.  The bad news is that it's a lot of work; the good news is that we're getting closer to being finished.  The general goal is to use technology to automate part of the listing/delisting process, and to speed up the parts that are not automated.