Once again, I'm proven right about false positive lag time

I hate to brag (no, wait, I love to brag), but once again I have been proven right.

One of the problems with getting accurate statistics about false positives is that users quite regularly submit them late.  So, assume that for the week of Dec 3 - Dec 10 we report 100 false positives.  One week later, we report that the same week had 188 false positives.  This is a net change of 88 FPs!  What happened?

For the longest time, I intuitively knew this.  When I was processing FPs, I kept seeing new submissions for problems that I knew I had already fixed.  I began to get a good feel for how late people submit them and found that once we reach the 3-week mark, there is little chance that an FP that occurred 3 weeks ago will still be submitted.  Said another way, there is little chance that an FP that occurred on Nov 27 will be submitted to us on Dec 18.

I also found that while 1 week was enough to get a good feel for how many FPs would be submitted, it was not enough time for them all to come in.  After 2 weeks we could get a pretty good representation of what the numbers would eventually look like.  For example, suppose it's now March 10, 2008, and the final count for Dec 3 - Dec 10 is 197 FPs.  Well, back on Dec 17, the count for Dec 3 - Dec 10 would have said 180 FPs.  That's pretty close to what the final number turned out to be.
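To put a number on "pretty close", here is a minimal sketch of the arithmetic using the figures from the example above.  The function names are made up for illustration, and the idea of dividing a partial count by an assumed completeness ratio is just one simple way to project a final total; it is not a description of any production reporting code.

```python
def fraction_reported(count_so_far, final_count):
    """Share of the eventual FP total that had already arrived."""
    if final_count <= 0:
        raise ValueError("final_count must be positive")
    return count_so_far / final_count


def project_final(count_so_far, expected_fraction):
    """Estimate the eventual total from a partial count.

    expected_fraction is an assumed completeness ratio (0 < f <= 1),
    for instance one measured on an earlier week whose numbers have settled.
    """
    if not 0 < expected_fraction <= 1:
        raise ValueError("expected_fraction must be in (0, 1]")
    return round(count_so_far / expected_fraction)


# Figures from the example: 180 FPs seen on Dec 17, 197 FPs in the end.
ratio = fraction_reported(180, 197)
print(f"{ratio:.1%}")  # 91.4%
```

In other words, the Dec 17 snapshot already held a bit over 91% of the final count.  If that ratio held for other weeks (an assumption, not something measured here), a later week's early count could be scaled up with `project_final(count, ratio)` to guess where it will end up.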

As I was saying, these lag times were estimated by me based on experience and intuition.  A couple of weeks ago, I finally got around to actually writing some scripts and tracking the data in databases.  Here are the numbers with regard to false positives:

  • 11% are submitted on the same day they occur.
  • 20% are submitted the day after they occur.
  • 50% are submitted in less than 3 days after they occur.
  • The remaining 50% trickle in later, some as much as 12 weeks afterwards.
  • 99% are submitted within 3 weeks after they occur.
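
For anyone who wants to run the same kind of tally, here is a minimal sketch of how percentages like these could be computed.  It assumes each FP is recorded as a pair of dates (when it occurred, when it was submitted); the function name and the sample data are hypothetical and are not the actual scripts or numbers behind the list above.

```python
from datetime import date


def cumulative_share_by_lag(pairs, max_lag_days):
    """Map each lag d (in days) to the share of FPs submitted within d days.

    pairs is a list of (occurred, submitted) dates.
    """
    if not pairs:
        raise ValueError("need at least one (occurred, submitted) pair")
    lags = [(submitted - occurred).days for occurred, submitted in pairs]
    total = len(lags)
    return {
        d: sum(1 for lag in lags if lag <= d) / total
        for d in range(max_lag_days + 1)
    }


# Made-up sample: four FPs with lags of 0, 1, 2 and 20 days.
sample = [
    (date(2007, 12, 3), date(2007, 12, 3)),
    (date(2007, 12, 3), date(2007, 12, 4)),
    (date(2007, 12, 4), date(2007, 12, 6)),
    (date(2007, 12, 5), date(2007, 12, 25)),
]
shares = cumulative_share_by_lag(sample, max_lag_days=21)
for d in (0, 1, 2, 21):
    print(f"within {d} day(s): {shares[d]:.0%}")
```

Reading the share at 0, 1, 2 and 21 days gives the same kind of "same day", "day after", "less than 3 days" and "within 3 weeks" figures as in the list.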

So, the numbers back up my experience and intuition.  This illustrates a point I have been making for months - my intuition for dealing with spam is often correct and is rarely contradicted by actual data.