# Murky Research

No computers today, but some interesting - and important - math. (And, happy Canada Day, Canadians!)

"Car Talk" is a popular weekly phone-in program that has been on National Public Radio for several decades now, in which Bostonian brothers Tom and Ray crack wise and diagnose car (and relationship) problems. On many programs they feature a "puzzler". Here's the puzzler from a couple of weeks ago, which I reprint in full:

Tim and Jethro were happy to have their jobs at the new self-serve gas station in town. And, since the Farmer's Almanac had predicted this to be the coldest winter since the last ice age, they were happy to be working indoors, while the customers pumped their own gas.

This station was so modern that it had a video camera for each of the pumps, and a TV monitor that would show the rear of everyone's vehicles as soon as they pulled up to the pumps.

When the boredom of their jobs finally set in, Tim and Jethro began playing a little game. The game involved trying to figure out which customers had pulled up to a pump with the fuel door on the wrong side-- that is, facing away from the pump.

Now, they couldn't see the cars pull in to the gas station. The video cameras were only aimed at the back of the vehicles. So, there was no time during which they could see the side of a vehicle where the fuel door was located. They could only see the vehicle after it was in position to refuel.

They had to make their bets before the driver shut off the key and exited the vehicle-- before he dope slapped himself for pulling in on the wrong side.

Jethro was correct 99 percent of the time. Tim was correct about 50 percent of the time, because he was just guessing.

What did Jethro know that enabled him to tell when a driver had pulled up to the pump with the fuel door facing the wrong way?

When I heard that puzzler, the answer was immediately obvious to me. Imagine my surprise when the answer that was immediately obvious to me was not the answer Ray gave this past Saturday! The answer given by Ray was that in almost all cars, the tailpipe is on the opposite side of the car from the fuel door. Jethro knows that if a car pulls up with its tailpipe on the same side as the pump, then it's on the wrong side. Jethro is 99% accurate because almost all cars you actually see on the road follow this pattern; very few cars have the tailpipe in the middle, two tailpipes, the fuel door in the middle, the tailpipe on the same side as the fuel door, or some other anomaly.

Now, I'm not willing to go as far as Tommy sometimes does and call BOOOOOOOOOOOGUS! on this one; I believe them that this is a 99% reliable heuristic for deducing the side the fuel door is on, and that if Jethro knew that, then Jethro could consistently defeat Tim. (I note that we are not given the parameters of how Tim is guessing, but we have reason to assume that Tim is guessing by some process akin to flipping a fair coin.) Clearly, when Jethro and Tim play this game, about half the games are going to be ties (because Tim is guessing right at random), but of the non-ties, Jethro is going to win most of the time. I'm not disputing that.

However, there is an extremely important aspect of this analysis which has been ignored. We know from experience that the percentage of drivers who pull up to the wrong side is low. The only times I have driven up to the wrong side of the pump and had to back out have been in a borrowed or rented car (and many cars now have an arrow on the fuel gauge telling you which side the fuel door is on). The vast majority of customers will pull up to the correct side. This additional fact, not given in the statement of the puzzler but reasonably assumed, changes the analysis.

Let's call a car which pulls up to the wrong side a "positive" (for reasons which will become apparent later) and a car which pulls up to the correct side a "negative". Jethro and Tim each make a prediction of whether a given car is going to be a positive or a negative. Let's call a car whose fuel door side can be correctly predicted by Jethro to be a "normal" car and one which cannot, an "unusual" car.

Let's assume that the percentage of "positive" drivers is 1%. Notice that this is the same as the percentage of cars that Jethro's heuristic predicts incorrectly. I've deliberately made the assumption that these percentages are about the same, but really all that matters is that they are both small. We'll do a bit of fiddling with the numbers later to see what happens when we change them around. But for now, just assume that it is a coincidence that the rate of boneheaded drivers is roughly the same as the rate of cars whose fuel door side Jethro cannot correctly predict.

Given that reasonable assumption, Jethro does not need to have his fancy heuristic in the first place! Suppose we replace Jethro with Bob, who always bets negative, regardless of the position of the tailpipe. This strategy is 99% accurate because 99% of the drivers are negatives!

That was the solution that was immediately obvious to me: deny the premise of the question. You can defeat Tim's coin-flipping random strategy by simply observing that negatives vastly outnumber positives; it is reasonable to always bet on negative.

Suppose Bob and Tim play this game a million times (it's a long winter in Boston) and Bob uses the "always negative" strategy, which, as we know, will be accurate 99% of the time solely on the basis of distribution of negatives vs positives. On average there will be 990000 negatives and 10000 positives. Bob will predict the negatives with 100% accuracy and Tim will predict them with 50% accuracy. So for those 990000 cases, Bob wins 495000 times, ties 495000 times, and loses never. Bob will predict the positives with 0% accuracy, Tim will predict them with 50% accuracy. So for those 10000 cases, Bob wins never, ties 5000 times, and loses 5000 times. Of the million games, Bob wins 495000 times, ties 500000 times, and loses 5000 times. As we'd naively expect, Bob wins 99% of the games which are not ties.
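The expected counts above can be double-checked with a few lines of arithmetic; a minimal sketch in Python, using the assumptions stated above (1% positive rate, Tim flipping a fair coin):

```python
# Expected outcomes of a million Bob-vs-Tim games, assuming a 1% rate of
# "positive" (wrong-side) drivers and a fair coin for Tim.
games = 1_000_000
negatives = games * 99 // 100   # 990000 cars on the correct side
positives = games - negatives   # 10000 cars on the wrong side

# Bob always says "negative"; Tim is right half the time on any car.
bob_wins = negatives // 2                # Bob right, Tim wrong
bob_losses = positives // 2              # Bob wrong, Tim right
ties = negatives // 2 + positives // 2   # both right, or both wrong

print(bob_wins, ties, bob_losses)          # 495000 500000 5000
print(bob_wins / (bob_wins + bob_losses))  # 0.99 of the non-ties
```

Of the 500000 games that are not ties, Bob wins 495000, which is where the 99% figure comes from.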

Now suppose Jethro and Tim play this game a million times and Jethro uses his fancy tailpipe strategy, which is 99% accurate on the basis not of distribution of negatives, but on ability to detect normals. The analysis appears to be just the same as before: There will be 990000 normals and 10000 unusuals. Jethro will predict the normals with 100% accuracy, the unusuals with 0% accuracy, blah blah blah, and will win 495000 times, tie 500000 times and lose 5000 times, same as Bob.

This demonstrates my point that Jethro doesn't need his fancy-pants strategy to beat Tim. Bob and Jethro both beat Tim exactly the same number of times, on average, with their respective strategies. But we can look deeper at Jethro's strategy:

Of those million trials, 99% of them will be normals. Of those normals, 99% of them will be negatives. Working out the percentages, on average we should see:

980100 negative normals - Jethro predicts negative, correctly
9900 positive normals - Jethro predicts positive, correctly
9900 negative unusuals - Jethro predicts positive, incorrectly
100 positive unusuals - Jethro predicts negative, incorrectly.

Holy smackers! Jethro predicts positive 19800 times and is wrong 50% of the time, even with an overall 99% accurate heuristic! Considering only those cases where Jethro says positive, he is on average no more accurate than Tim, who is flipping a coin.
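Those four counts, and the surprising 50% figure, fall out of a few lines of integer arithmetic; a sketch assuming, as above, that 1 car in 100 is a positive and, independently, 1 in 100 is unusual:

```python
# Expected breakdown of a million cars: 1 in 100 is a "positive"
# (wrong-side driver) and, independently, 1 in 100 is "unusual"
# (Jethro's tailpipe heuristic misreads it).
trials = 1_000_000

neg_normal  = trials * 99 * 99 // 10000  # Jethro predicts negative, correctly
pos_normal  = trials * 1  * 99 // 10000  # Jethro predicts positive, correctly
neg_unusual = trials * 99 * 1  // 10000  # Jethro predicts positive, incorrectly
pos_unusual = trials * 1  * 1  // 10000  # Jethro predicts negative, incorrectly

print(neg_normal, pos_normal, neg_unusual, pos_unusual)
# 980100 9900 9900 100

says_positive = pos_normal + neg_unusual   # 19800 "positive" calls
print(pos_normal / says_positive)          # 0.5 -- half of them are wrong
```
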

Now suppose instead of yokels trying to predict whether a driver is boneheaded, we have three doctors each trying to predict whether you have Tappet's Disease. Suppose further that 99% of the population is negative for Tappet's Disease: they do not have it. Dr. Jethro has a test for Tappet's Disease which is 99% accurate. (People for whom the test works are "normals", and 99% of people are normals.) Dr. Bob doesn't even bother to diagnose you, he just says "you're negative" every time. Dr. Tim flips a coin.

Suppose you go to all three doctors and they all say "you're negative". Remember, Dr. Bob and Dr. Jethro are both accurate 99% of the time, but Dr. Tim is only accurate 50% of the time. Clearly you have learned absolutely nothing from Dr. Bob, who didn't even look at you; the fact that he is 99% accurate tells you nothing about you. Clearly you have learned absolutely nothing from Dr. Tim; he flipped a coin right in front of you. But of every 980200 patients where Dr. Jethro says "negative", he is correct 980100 times and incorrect 100 times. Dr. Jethro's 99% accurate test is actually 99.99% accurate when he says negative.

Put another way: odds of Drs. Bob and Tim being wrong when they predict negative are both 1 in 100. Odds of Dr. Jethro being wrong if he predicts negative are 1 in 10000, a hundred times less likely. Dr. Jethro is taking advantage of the low incidence of positives and the accuracy of his test in a way that the other two are not. Dr. Jethro is way, way more reliable than either of the other two, provided that your result is negative.
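The "1 in 10000" figure is an instance of Bayes' Theorem; a small sketch of the same calculation, using the 1% rates assumed above:

```python
# P(actually positive | Jethro says negative), by Bayes' Theorem.
# Jethro says "negative" for normal negatives and for unusual positives.
p_pos, p_unusual = 0.01, 0.01

p_says_neg_given_pos = p_unusual       # only unusual positives get "negative"
p_says_neg_given_neg = 1 - p_unusual   # normal negatives get "negative"

p_says_neg = (p_says_neg_given_pos * p_pos
              + p_says_neg_given_neg * (1 - p_pos))
p_wrong = p_says_neg_given_pos * p_pos / p_says_neg  # Bayes' Theorem

print(round(1 / p_wrong))   # 9802 -- roughly 1 in 10000
```
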

But what if the result had been positive? (Obviously Dr. Bob would not say positive, so let's ignore him.) Of Dr. Jethro and Dr. Tim, which should you trust? If Dr. Tim says positive then you have a 99 in 100 chance that Dr. Tim is wrong. If Dr. Jethro says positive then you have a 50 in 100 chance that Dr. Jethro is wrong. Dr. Jethro is clearly still the winner here, but it is deeply counterintuitive to people that a test which is overall accurate 99% of the time can have a 50% false positive rate.

And of course the more rare the disease is, obviously the more likely it is that a negative result is correct; most people don't have the disease, so a negative result is likely. But the flip side is that the more rare a disease is, the more likely it is that a positive is a false positive: an artefact of flaws in the test.

Imagine if only one in ten thousand drivers pulled up to the wrong side. Jethro's 99% accurate heuristic would now be worse than Bob's "always guess negative" strategy because Jethro gets so many false positives.

This is a serious problem in medicine! The worst false outcome is a false negative - that is, the test says you do not have the condition when really you do. That's why Dr. Bob's strategy is completely unacceptable; all his false results are false negatives. But as we've seen, the mathematics of the situation means that given a Dr. Jethro with a reasonably accurate test for a condition with low incidence, false negatives are very rare.

But false positives cause unnecessary, potentially harmful or expensive treatment, not to mention unnecessary anxiety. The mathematics of the situation is that false positives are an extremely high percentage of positives when the inaccuracy of the test and the rarity of the disease are close to each other. Tests for rare conditions have to be incredibly accurate for the false positive rate to be low: the inaccuracy of the test has to be orders of magnitude less than the rarity of the condition.
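That last point can be made concrete. For a test whose sensitivity and specificity are both (1 - error rate), the fraction of positive results that are true positives works out, by Bayes' Theorem, to roughly prevalence / (prevalence + error rate) when both numbers are small; a sketch:

```python
def positive_predictive_value(prevalence, error_rate):
    """P(actually positive | test says positive) for a test whose
    sensitivity and specificity are both (1 - error_rate)."""
    true_pos = (1 - error_rate) * prevalence
    false_pos = error_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A 99%-accurate test, as the disease gets rarer:
for prevalence in (0.01, 0.001, 0.0001):
    print(prevalence, round(positive_predictive_value(prevalence, 0.01), 3))
# 1% prevalence   -> PPV 0.5   (half the positives are false)
# 0.1% prevalence -> PPV ~0.09 (most positives are false)
# 0.01%           -> PPV ~0.01 (almost all positives are false)
```

Only when the error rate drops well below the prevalence does the positive predictive value get close to 1.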

This sort of probability analysis is based on Bayes' Theorem, and it has many fascinating implications beyond just this quick sketch. It has implications in law and in spam detection; it comes up all over the place.

If Tom and Ray have any comments on this critique of their puzzler, I'd be happy to hear them.

1. ficedula says:

Interestingly this is another occasion when cultural biases play an important part – over here (in the UK) fuel pumps at major filling stations often tend to have long pipes specifically so you *can* pull up regardless of where your filler cap is, and the nozzle will reach around to the other side of the car – in order to make most efficient use of the pumps (you don't have to wait for a pump on "the right side"). Some stations display signs to encourage people to do this in order to prevent queues building up.

Still, all other things being equal people do prefer to pull up to a pump on "the right side" so I imagine your heuristic would still definitely beat a 50/50 guess and the general point about false positives is of course absolutely correct 😉

2. Mario says:

The consequences of false positives and false negatives are rarely the same. This may lead to a relatively less accurate test being preferred to a more accurate one, as long as the error is of the least undesirable kind.

For instance, a spam test should never have false positives (which would mean that legitimate mail gets marked as spam), while diagnostic medicine, as you correctly stated, usually considers a false positive the lesser evil (especially so when a disease is highly transmissible: in extreme cases it might be preferable to just pull a "reverse Bob" and treat everybody).

It's when the evaluation of the potential damage is subjective that things get really interesting…

3. Dean Harding says:

@ficedula: the same is true here (in Australia), but then the stereotypical American is driving a Hummer or Ford F450, and I doubt even the longest pipes would wrap around one of those :p~

4. Leo Bushkin says:

I think your analysis of the puzzler is spot on.

With regard to your comments on the subject of medical tests, I would say that doctors have a responsibility to educate their patients about what the results of a test *actually mean*. Statistics and Bayesian theory are not a subject well understood by most people (although wouldn't it be nice if they were). As such, doctors have an ethical responsibility to help their patients understand the true implications of a test's statistical error. Doctors should strive to *reduce the suffering of their patients* – that includes the mental anguish from learning the results of such tests.

There's also another consideration here as well – economics. The accuracy of a medical test needs to improve significantly (relative to its cost) to be a worthwhile alternative to an existing but cheaper test. Based on your own example, if the test was not 99% accurate but 99.3% accurate, its false positive rate is still 44% (as opposed to 50%). At 99.9%, however, the likelihood of a false positive is now <10% – for the same prior probability of disease in the population at large of 1%. If the slightly more accurate test is significantly more expensive (as they often are) – it's questionable if its broad application is the most effective use of scarce medical dollars.

Achieving an appropriate balance between the cost of medical tests and their effectiveness is fraught with difficulty. As individuals, we all want the best, most advanced, most accurate medical procedures applied to us when we are concerned for our health, or the health of someone we love. But as a society, always opting for such an approach results in escalating costs for both insurance and medical treatment – in effect, undermining the very thing we desire.

5. Denis says:

>The inaccuracy of the test has to be orders of magnitude less than the rarity of the condition.

It seems to me that in medicine, in particular, it's much easier to achieve than it sounds, really: how many people are tested for plague nowadays? Or, say, for leprosy? People only rush to do the tests when the condition becomes dangerously common (or is made to seem so, usually by sensationalists in the media). And most medical tests nowadays are reasonably accurate (although again, the only test I can imagine having 99.9% accuracy is testing whether the patient is dead or alive). So, if the condition is really rare, there won't be too many people running tests for it and, consequently, hurt by the inaccuracy of those tests; and if it's not so rare, then the statement about the inaccuracy of the test being orders of magnitude less than the rarity of the condition becomes true.

6. John B says:

@Dean Harding

You are correct about Australia, yet I'd say 99% of people are either unaware of this or unwilling to attempt it.

Also, some stations make it mighty hard to do, not having extendable hoses (I drive a Fairmont, so it's a bit of a stretch without an extendable hose of some sort).

I'm probably the only person I know that actually does this.

JB

7. Niall Connaughton says:

I think Denis raises an interesting point. All cars will come into a petrol station to refuel. But for diseases that are rare, it's certainly the case that not all people will come in for a test. People will have the test done because of some suspicion – family history, symptoms, etc. So if 1 in 10,000 have the disease, you will likely see a much more frequent rate of disease in the sample of people actually having tests.

Interesting post.

8. Gabe says:

This post reminds me of the lotto prediction service that I considered starting once. The idea was that for the low, low price of only \$1, I would predict whether your lotto ticket was going to hit the jackpot — 100% accuracy guaranteed or your money back!

9. Carsten says:

Interesting. Maybe you are interested in the following story I read in a German book last year (Denken Sie selbst, sonst tun es andere für Sie; in English, something like: Think for yourself, or someone else will do it for you). Say you have 100 women who take the birth control pill (effective at about 99.8%). All of them take a pregnancy test (accurate at the same rate, about 99.8%). If the test says yes, pregnant, the answer is correct in only 50% of the cases. I'm not able to explain it as nicely as Eric, but his story reminded me of this case, which should have the same mathematical background. Regards.
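Carsten's numbers do check out; a quick sketch, reading "effective at about 99.8%" as a 0.2% chance of pregnancy and the test as 99.8% accurate in both directions:

```python
# Carsten's pregnancy-test example: the error rate of the test equals
# the base rate of the condition, so half the positives are false.
p_pregnant = 0.002   # the pill fails about 0.2% of the time
test_error = 0.002   # the test is about 99.8% accurate both ways

true_pos = (1 - test_error) * p_pregnant
false_pos = test_error * (1 - p_pregnant)
print(true_pos / (true_pos + false_pos))   # 0.5
```
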

10. JP Bochi says:

I argue that many tests (especially medical ones) should not be completely reduced to binary logic. Their results are much more fuzzy-like (see Fuzzy Logic). They have confidence rates.

How many times have you seen a doctor ask for another test because the result of the last one was not conclusive?

Of course, for some automated test (like spam detection), a threshold has to be set, and a fuzzy value has to be reduced to a binary value. Nevertheless, I think there are some spam detection software that flags some email messages as spam-suspect.

11. AC says:

@Gabe I once had a similar idea.

I would guarantee a 50% rate of return on anything they would spend on lotto / poker / whatever. A far better average rate of return than a slot machine will give you, but people didn't find it quite as fun to play.

12. Wonderful intro to Bayesian reasoning here: yudkowsky.net/…/bayes

(Seems to have been given a facelift since I first read it.)

13. Matt Phillips says:

I always go to the pump with the shortest queue (shortest may be zero), irrespective of the side.

If two shortest queues are equal, and they offer a choice of sides, I will prefer a pump on the filler cap side.

But, all in all, there is no /wrong/ side.

14. Phil Koop says:

I briefly considered posting under some name such as "Captain Pedantic", but that would be cowardly. And really, in a post that is fundamentally about counting, you should get the arithmetic right.

"Bob wins 445000 times, ties 500000 times, and loses 5000 times."

The snag about this is that 445,000 + 500,000 + 5,000 = 950,000 < 1,000,000. The mistake crept in when you attempted to divide 990,000 by 2 and got 445,000 instead of 495,000. The analogous error is repeated for Jethro.

I notice that you have avoided the use of a separator character such as a comma or point in long numbers (trying not to offend or confuse your heterogeneous readership?) and I suggest that this may have promoted your mistake.

It was a typographical error. I've fixed it. – Eric

15. Robert Cooper says:

Ironically, Car Talk had a puzzler on this very same issue:

16. Mike Vargas says:

They could easily tweak the puzzle to make sense with their reasoning if they rephrased it to limit the results to the set of those in which the driver drove up on the wrong side, i.e. "In all the situations in which a driver pulled up on the wrong side, Tim called it 50% of the time and Jethro called it 99% of the time."  That should resolve the hangup.

17. Sid says:

If you like this sort of post, I recommend "The Drunkard's Walk" by Leonard Mlodinow.

It explains numerous statistical "gotcha's":

– Let's Make a Deal puzzle

– Lady with twins and one is X and Y

– The one from the post (I believe it's the prosecutor's fallacy)

– And the gambler's fallacy (it's gotta land on black, it's been red five times now… it's due!)

18. Donavon Wewers says:

I am a practicing internist and I think you did a great job of explaining the implications of Bayes' Theorem on the practice of medicine. I would like to share how this very logic affects my practice.

First of all, the decision to test someone for a disease is obviously not as simple as someone coming up to me and asking me to run a test (although this does happen). Usually, after taking a history and physical, I have a suspicion that someone might have a disease. Then I order a test that can confirm or refute my impression. So even if the disease in question is relatively rare in the general population, it is no longer unlikely in this particular patient. This of course depends on my ability to glean clues in my interview and exam that I feel increase the "pretest probability" of disease. If my impression of the pretest probability is sufficiently high, I will order the test and be relatively pleased with both the positive predictive value and the negative predictive value. Then I will make a firm diagnosis based on the result of the test.

Now for three different scenarios that really drive this home:

John is a 25 year old male with no risk factors for heart disease. He comes in complaining of sharp, burning chest pain after eating spicy Mexican food. He is very concerned that he is having a heart attack. It is my opinion that his chest pain is very unlikely to be cardiac. I do not order further testing. (I might recommend Zantac.) I know that if I order a stress test, even if it is positive, it is still extremely unlikely that the patient has heart disease. (A nuclear stress test is considered to be about 90% sensitive and specific.)

George is a 50 year old male whose dad died at age 70 of a heart attack. He smokes a pack per day, he is about 30 pounds overweight, and he doesn't like to take his blood pressure pills. He gets a funny feeling in his left chest when he carries his garbage to the curb. This is the most strenuous thing he does all week. Now my pretest probability is significantly higher, maybe even 20%. I order a stress test. He passes or fails. Either way I'm pretty confident in the result. If it's positive I send him to the cardiologist. If it's negative, I reassure him and try to get him to work on his risk factors.

Joe is a 70 year old male with diabetes, high blood pressure, and hypercholesterolemia. He also smokes. Last year he lost his leg due to peripheral arterial disease. He comes in with crushing chest pain radiating to his jaw, shortness of breath, nausea, and heavy diaphoresis. I don't order a stress test because now my pretest probability is so high that I wouldn't be satisfied with a negative result. His chances of a false negative are probably higher than of a true negative. Instead I would send him straight to the cath lab.

It's important to note that even John's chances of having heart disease are not zero; they're just small. However, if I ran stress tests on people like him, about 10% or so would come back positive. Now we have a population of people with positive stress tests who are worried that they might have heart disease. If we send all of those people for heart caths, about one in one thousand will have a serious complication like a stroke. So if his chances of having heart disease are less than one in ten thousand (and they are), then it would be much riskier to be tested than not tested.