A/B testing with napkin math in OneNote

One of the email aliases to which I subscribe here at work centers on testing software (imagine that!) but keeps questions at a more general level. It stays away from discussions such as "How do I test the performance of a DAV stack implementation?" and centers on questions like "How can I tell when I can treat two test configurations as the same equivalence class?"

An email came through last week about A/B testing. While this is generally a marketing term (the link has some good details), it also applies to testing. For testing, this means that we need to verify that multiple ways of generating data all give the same expected output. There are plenty of examples of this - sorting items (bubble sorts, heap sorts, etc.), drawing circles, defragmenting a hard drive and so on - which should all produce the same results when done, but may be implemented differently for a variety of reasons. I want to focus on just one example.

Imagine you are tasked with testing a square root function in your code. There are many ways to implement this (Newton's, bisection and Euler's are a few) and they may each give different results depending on how long you let each run. Let's simplify the task to only verifying the accuracy of the result and ignore performance considerations. This is not as crazy as it may sound - on modern machines, getting 12-15 digits of precision takes very few milliseconds no matter which algorithm you use - so we can ignore that for right now.

The question then becomes "What do I expect to get if I ask the code to compute the square root of several different numbers, such as -1, 0, 1, 4 and 17?" We can easily tell by inspection what the first four of these should be but 17 is going to be a little trickier.

So how do we get the result for 17? If the code uses Euler's method, and we use Euler's method, we're just repeating the code we are trying to test. We can verify we get consistent results, but not accurate results. If we were testing compasses, as an analogy, we could verify that all the needles point the same direction by comparing them to each other, but we have no way of telling if that direction is north, which is what we really want.

We need to implement a second method (ideally a third or fourth as well) and then compare the results. That's actually what we did for the -1, 0, 1 and 4 cases already. For those, we used our brains to compute the results and did not need a computer at all. Testing that we get (either an error or i), 0, 1 and 2 respectively is exactly the right test.
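Here is a minimal sketch of those by-inspection checks in Python. The name sqrt_under_test is hypothetical and simply wraps math.sqrt as a stand-in for whatever function you are really testing; the helper accepts either an error or i for the -1 case, since either could be the right behavior for your application.

```python
import math


def sqrt_under_test(x):
    # Hypothetical stand-in for the square root function being tested.
    return math.sqrt(x)


def negative_input_is_handled(func):
    """Return True if func(-1) either raises ValueError or returns i."""
    try:
        result = func(-1)
    except ValueError:
        return True
    return result == 1j


# Expected values worked out by inspection, with no second algorithm needed.
assert sqrt_under_test(0) == 0
assert sqrt_under_test(1) == 1
assert sqrt_under_test(4) == 2
assert negative_input_is_handled(sqrt_under_test)
```

The expected values here come from our heads rather than from another run of the code, which is what makes them an independent check.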

For 17, we have some choices to make. If we know we are testing Euler's method, we can use Newton's method and/or the bisection method to compute the expected result. We may already have one of these at hand - Java, I think, uses Newton's method - or we can solve this by hand (most tutorials use the bisection method). In any case, once we have our expected answer computed using a method or methods other than Euler's, we can check whether the Euler result matches it.
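As a rough illustration, here is one way to compute the square root of 17 with two independent methods and compare them. The function names, iteration counts and the 1e-12 tolerance are my own assumptions for this sketch, not anything taken from a particular library.

```python
def newton_sqrt(x, iterations=50):
    """Approximate sqrt(x) with Newton's method on f(g) = g*g - x."""
    if x < 0:
        raise ValueError("square root of a negative number")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        # Average the guess with x / guess; this is the Newton update.
        guess = 0.5 * (guess + x / guess)
    return guess


def bisection_sqrt(x, iterations=200):
    """Approximate sqrt(x) by repeatedly halving an interval that contains it."""
    if x < 0:
        raise ValueError("square root of a negative number")
    low, high = 0.0, max(1.0, x)
    for _ in range(iterations):
        mid = (low + high) / 2
        if mid * mid < x:
            low = mid
        else:
            high = mid
    return (low + high) / 2


newton_result = newton_sqrt(17.0)
bisection_result = bisection_sqrt(17.0)

# The two methods share no code, so agreement is evidence of accuracy,
# not just consistency.
methods_agree = abs(newton_result - bisection_result) < 1e-12
print(newton_result, bisection_result, methods_agree)
```

If the two results disagreed by more than the tolerance, that would be the cue to bring in a third method or work the answer out by hand before trusting either one.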

Once we have the expected result, we can automate the test. And once we know that sqrt(17) = 4.123105625617661, we can hard code this expected answer in our automated test. At this point, we know that Euler's method and the other methods we used all agree on this answer, so if anyone ever changes how our application computes square roots, we can detect whether the change was for the worse.
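A hard-coded regression check along those lines might look like the sketch below. Again, sqrt_under_test is a hypothetical placeholder (here it just calls math.sqrt), and the relative tolerance of 1e-12 is an assumption you would tune to the precision your application promises.

```python
import math

# Expected value agreed on by the independent methods.
EXPECTED_SQRT_17 = 4.123105625617661


def sqrt_under_test(x):
    # Hypothetical stand-in; swap in the application's own square root.
    return math.sqrt(x)


def test_sqrt_17():
    actual = sqrt_under_test(17.0)
    # Compare with a tolerance rather than ==, since a different algorithm
    # may legitimately differ in the last digit or two.
    assert math.isclose(actual, EXPECTED_SQRT_17, rel_tol=1e-12), actual


test_sqrt_17()
```

Comparing with a tolerance instead of exact equality keeps the test from failing on harmless last-digit differences while still catching a change that makes the answer noticeably worse.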

I hope this makes sense.

Questions, comments, concerns and criticisms always welcome,

