I hate to brag (no, wait, I love to brag), but once again I have been proven right.

One of the problems with getting accurate statistics about false positives (FPs) is that users quite regularly submit them late. Suppose that for the week of Dec 3 – Dec 10 we report 100 false positives. One week later, we report that the same week had 188 false positives. That is a net change of 88 FPs! What happened?

For the longest time, I knew this intuitively. When I was processing FPs, I always saw submissions for FPs that I knew I had already fixed. I began to get a good feel for how late people submit them, and found that once we reach the 3-week mark, there is little chance that an FP from that far back will still be submitted. Said another way, there is little chance that an FP that occurred on Nov 27 will be submitted to us on Dec 18.
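The three-week rule of thumb can be written as a simple check on whether a date's FP count has probably settled. This is a minimal sketch; the function name `is_settled` and the 21-day constant are illustrative choices, not part of any real reporting system, and "settled" only means late submissions are unlikely, not impossible.

```python
from datetime import date, timedelta

# Hypothetical constant encoding the post's 3-week rule of thumb.
SETTLE_DAYS = 21


def is_settled(occurred: date, today: date, settle_days: int = SETTLE_DAYS) -> bool:
    """Return True once an FP date is old enough that late submissions are unlikely."""
    return today - occurred >= timedelta(days=settle_days)


print(is_settled(date(2007, 11, 27), date(2007, 12, 18)))  # exactly 3 weeks old
print(is_settled(date(2007, 12, 10), date(2007, 12, 17)))  # only 1 week old
```

With this check, a weekly report would only treat a week as final once its last day passes the threshold.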

I also found that after 1 week we could get a good feel for how many FPs would be submitted, but that was not enough time for them all to come in. After 2 weeks, we had a pretty good representation of what the final numbers would look like. For example, suppose it’s now March 10, 2008, and the final count for Dec 3 – Dec 10 is 197 FPs. On Dec 17, the count for Dec 3 – Dec 10 would have said 180 FPs. That’s pretty close to the final number.
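The example above amounts to a completeness ratio: the Dec 17 snapshot held 180 of the eventual 197 FPs. A minimal sketch of using that ratio to project a newer week's final count follows. It assumes the ratio stays roughly stable from week to week, which is an assumption rather than something the numbers here establish, and the 150-FP week is hypothetical.

```python
def completeness(partial: int, final: int) -> float:
    """Fraction of the eventual FP total already reported at the partial snapshot."""
    return partial / final


def project_final(partial: int, ratio: float) -> int:
    """Scale a partial count up by a completeness ratio (assumes the ratio is stable)."""
    return round(partial / ratio)


# Figures from the Dec 3 - Dec 10 example: 180 reported on Dec 17, 197 in the end.
ratio = completeness(180, 197)
print(f"{ratio:.1%}")             # 91.4%
print(project_final(150, ratio))  # hypothetical week with 150 FPs reported so far
```

Dividing by the ratio, rather than adding a fixed number of late FPs, lets the correction scale with busier or quieter weeks.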

As I was saying, I estimated these lag times from experience and intuition. A couple of weeks ago, I finally got around to writing some scripts and tracking the data in databases. Here are the numbers for false positives:

- 11% are submitted on the same day they occur.
- 20% are submitted the day after they occur.
- 50% are submitted less than 3 days after they occur.
- The remaining 50% are submitted up to 12 weeks after they occur.

**99% are submitted within 3 weeks after they occur.**
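Figures like these can be computed from pairs of (date the FP occurred, date it was submitted). The sketch below is an illustration under assumptions: the records are invented toy data, not the data behind the percentages above, and the helper names are hypothetical. It computes the cumulative share submitted within N days; a share for a single day is the difference between two cumulative values.

```python
from datetime import date


def lag_days(occurred: date, submitted: date) -> int:
    """Days between an FP occurring and its being submitted."""
    return (submitted - occurred).days


def share_within(records: list[tuple[date, date]], max_lag: int) -> float:
    """Fraction of FP reports submitted no more than max_lag days after the FP occurred."""
    lags = [lag_days(o, s) for o, s in records]
    return sum(1 for lag in lags if lag <= max_lag) / len(lags)


# Toy records, invented for illustration only.
records = [
    (date(2007, 12, 3), date(2007, 12, 3)),    # same day
    (date(2007, 12, 3), date(2007, 12, 4)),    # next day
    (date(2007, 12, 4), date(2007, 12, 6)),    # 2 days later
    (date(2007, 12, 5), date(2007, 12, 12)),   # 1 week later
    (date(2007, 11, 27), date(2007, 12, 18)),  # 3 weeks later
]
for days in (0, 1, 2, 21):
    print(days, f"{share_within(records, days):.0%}")
```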

So, the numbers back up my experience and intuition. This illustrates a point I have been making for months – my intuition for dealing with spam is often correct and is rarely contradicted by actual data.


Luckily, spam has reached the point where it can be judged more easily than software projects.

Traditional question: Will this software project be a failure? How to answer: Yes. With this answer, you’ll be right 90% of the time, and you don’t have to waste money doing an analysis.

Modern question: Is this e-mail spam? How to answer: Yes. With this answer, you’ll be right 95% of the time, and you don’t have to waste money doing an analysis.

In fact, the way companies like to measure false positives, we can just answer yes to everything and the ratio of false positives will continue going down.