Estimating Hidden Bug Count -- Part 3/3

Part 4: Step By Step Guide

This is just a summary of the previous chapters as a flow chart (see the earlier parts for the derivation of the method):

The variables have the following meanings:

External bugs, or E – the count of active (not fixed) bugs reported against the product externally
Internal bugs, or I – the count of active bugs reported against the product internally
Shared bugs, or S – the count of active bugs reported by both internal and external sources
E_f, I_f, and S_f are the same categories as above but fixed (not active)
β is the ratio of how much faster Shared bugs are fixed compared to Internal bugs.
f is the fix rate per (small) unit of time relative to the number of bugs. For example, if out of 20 product bugs 2 are fixed per day, the fix rate is 2/20 = 0.1/day.
B₀ is the count of all bugs (known and hidden) in the product when versions were split
B is the current bug count (both known and unknown bugs)
t is the observation time
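
As a small illustration of the fix rate definition above, here is a minimal Python sketch that reproduces the 2/20 example. The function name fix_rate and its parameter names are hypothetical and chosen only for this illustration; it covers the definition of f alone, not the full method from the flow chart.

```python
def fix_rate(fixed_per_unit_time: float, total_bugs: float) -> float:
    """Return f: the fraction of the current bug population fixed per unit of time.

    Hypothetical helper for illustration; it only encodes the definition
    "bugs fixed per unit of time divided by the number of bugs".
    """
    if total_bugs <= 0:
        raise ValueError("total_bugs must be positive")
    if fixed_per_unit_time < 0:
        raise ValueError("fixed_per_unit_time must not be negative")
    return fixed_per_unit_time / total_bugs


# The example from the text: 2 of 20 product bugs are fixed per day.
f = fix_rate(fixed_per_unit_time=2, total_bugs=20)
print(f"f = {f:.2f} per day")  # f = 0.10 per day
```

Keeping the unit of time explicit (per day here) matters, because f and the observation time t have to be expressed in the same unit.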

That’s it – I hope you find it useful for assessing the hidden part of your product's security "iceberg".

Obviously, the method has its limitations, and some of them should be named:

It's not very precise. Statisticians can easily point out numerous reasons why. My experience suggests that its results should be treated as approximate with an error margin on the order of ±2x.
You need both internal and external bugs to be filed against the product, which is rather embarrassing. Yes, there are methods that don't require that (i.e., they can work with internal sources only), but those are more complex.
…and you need those bugs in statistically significant quantities. I'd say at least about 10 Internal and External bugs, and at least 3-4 Shared among them.
Detecting Shared bugs requires sound engineering practices and great bug tracking discipline in the product.
It is assumed that Internal and External bugs are not correlated. That means both sources work independently and don't know anything about each other. In practice, some correlation might exist, both positive ("look where the attackers are focusing") and negative ("search for known bugs before filing your own").
The method would have difficulty with continuously updated online services, since their time-to-fix window is so short that virtually no shared bugs have a chance to arise.
The results apply, technically, to only one of two product versions. To carry over the conclusions to both versions, codebases must remain similar.
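
Two of the limitations above are easy to turn into quick sanity checks. The following Python sketch is only an illustration under stated assumptions: the names data_warnings and error_band are hypothetical, "at least about 10 Internal and External bugs" is read as at least 10 of each, the Shared minimum is taken as 3, and "±2x" is read as a factor of two in either direction.

```python
from typing import List, Tuple


def data_warnings(internal: int, external: int, shared: int,
                  min_each: int = 10, min_shared: int = 3) -> List[str]:
    """List which bug counts fall below the suggested minimums.

    Assumption: the threshold of about 10 applies to Internal and External
    separately, and 3 is used as the lower end of "3-4 Shared".
    """
    warnings = []
    if internal < min_each:
        warnings.append(f"Internal count {internal} is below the suggested minimum of {min_each}")
    if external < min_each:
        warnings.append(f"External count {external} is below the suggested minimum of {min_each}")
    if shared < min_shared:
        warnings.append(f"Shared count {shared} is below the suggested minimum of {min_shared}")
    return warnings


def error_band(estimate: float, factor: float = 2.0) -> Tuple[float, float]:
    """Return (low, high) bounds, assuming a multiplicative error margin."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return estimate / factor, estimate * factor


# Hypothetical counts, not taken from any real product.
print(data_warnings(internal=12, external=9, shared=4))
low, high = error_band(100)
print(f"An estimate of 100 bugs should be read as roughly {low:.0f} to {high:.0f}")
```

With these made-up numbers, the first call flags only the External count, and an estimate of 100 hidden bugs is reported as a range of roughly 50 to 200.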

That's it. Thank you for reading and have a great day.