# Is customer feedback useful in determining spam effectiveness?

This past summer, when the image-spam run began, our overall spam volume started to rise.  With the rise in spam volume came an increase in overall customer complaints and spam submissions to our spam abuse inbox (i.e., when customers receive a spam message in their inbox, they forward it to our abuse inbox).

I used to colloquially say that we know how well our spam filter is doing by how much our customers report back to us regarding spam in their inbox.  This leads to an interesting question: do customer submissions to the abuse inbox improve our overall spam filtering performance, and does the number of abuse submissions reflect how well (or how poorly) our filters are performing?  Intuitively, we might expect that increased submissions help us improve our filter performance.

To determine this, I checked our historical spam filtering percentage (the number of messages marked as spam divided by total message volume) against our number of submissions to the abuse inbox.  I checked data for the last four weeks (N = 31) and this year's historical data (N = 240) to verify consistency.
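As a quick sketch, the filtering percentage described above is just marked-as-spam counts divided by total volume, day by day.  The daily counts here are made up for illustration:

```python
# Daily spam-filtering percentage: messages marked as spam divided by
# total message volume for that day.  All counts below are hypothetical.
marked_as_spam = [9120, 8740, 11350]   # hypothetical daily spam-marked counts
total_volume   = [10000, 9800, 13100]  # hypothetical daily total volumes

filtering_pct = [100.0 * s / v for s, v in zip(marked_as_spam, total_volume)]
print([round(p, 1) for p in filtering_pct])  # → [91.2, 89.2, 86.6]
```

Each day's percentage then gets paired with that day's abuse-inbox submission count for the correlation check below.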

I calculated the correlation coefficient between these two sets of data: the number of submissions and our total spam filtering percentage.  For the last 30 days' worth of data, the correlation was -0.317; multiplied by the number of data points, the result was 9.82.  By my rule of thumb, a result below 10 is not statistically significant, so this one falls just short (close, but not quite enough).  For the year's historical data, the correlation was -0.112 and the result was statistically significant, but note that the correlation is negative.
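The computation above can be sketched in a few lines of Python.  The daily numbers here are invented for illustration, and the `>= 10` cutoff is simply the rule of thumb used in this post, not a standard significance test:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily data: abuse-inbox submissions vs. filtering percentage.
submissions   = [120, 150, 90, 200, 170, 110, 130]
filtering_pct = [91.0, 89.5, 92.3, 87.8, 88.9, 91.5, 90.2]

r = pearson_r(submissions, filtering_pct)
n = len(submissions)
# Rule of thumb from the post: treat |r| * N >= 10 as significant.
significant = abs(r) * n >= 10
print(round(r, 3), significant)
```

With data like this, where submissions climb as the filtering percentage drops, `r` comes out negative, matching the pattern observed in both data sets.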

So, to answer my first question (do customer submissions to the abuse inbox improve our overall spam filtering?), the answer is no, they do not.  In fact, the relationship is slightly negative: more submissions mean less effective spam filtering.

However, this ties into the second question: does the number of abuse submissions reflect how well or how poorly our filters are performing?  The answer is yes, but just barely.  The question that arises from this is the following: are our filters performing worse because submissions are increasing, or are submissions increasing because our filters are performing worse?  I say it is the latter; when a spam run hits and we are not as effective, customer submissions go up.  However, these submissions do not help our overall filtering; typically we figure out a way to stop the spam on our own.  This does not mean that customers should stop submitting spam samples; far from it, we use them to determine the best course of action.  What it does mean is that more and more spam submissions do not result in better and better filtering.

Perhaps this illustrates the law of diminishing returns?
