# Bing gains, Google drops

The following is an excerpt from Investor's Business Daily:

Microsoft (MSFT), the software giant, increased its market share in U.S. Web searches to 8.23% in June from 7.81% in May, thanks to its new Bing search site, according to tracking firm StatCounter. Web search king Google (GOOG) lost share slightly, dipping to 78.48% from 78.72%.

Figures like these really annoy me.  Why?  Because they use statistics inaccurately.  Look at Google's "loss" of search share: a drop of 0.24 percentage points.  How could they possibly measure that?

In statistics, there is always a margin of error, which defines a confidence interval.  If you were to survey a group of users and 75% of them reported the same answer, you cannot simply extrapolate that to the rest of the population.  If you sampled ~1,000 people, then you can only say that 75%, +/- roughly 3%, of the population would give the same answer.  At a 95% confidence level, you would say that you are 95% confident that between 72% and 78% of the population would give that answer.
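The margin of error above comes from the standard textbook formula for a sample proportion. A quick sketch in Python, assuming simple random sampling and the normal approximation (z = 1.96 for 95% confidence):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p with n respondents
    (normal approximation; z = 1.96 gives a 95% confidence level)."""
    return z * math.sqrt(p * (1 - p) / n)

# 75% of 1,000 respondents gave the same answer:
moe = margin_of_error(0.75, 1000)
print(f"{moe:.1%}")  # roughly +/- 2.7%
```

With the conservative worst case (p = 0.5), the same formula gives about ±3.1% for 1,000 respondents, which is where the familiar "±3% for a poll of 1,000" rule of thumb comes from.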

Surveys work by random sampling.  Yet, in order to draw conclusions like the ones above, we have to make sure that the margin of error is smaller than the difference being claimed.  For example, suppose you asked 1,000 people what kind of widget they liked best and 67% of them said Widget A.  Next month, you ask 1,000 people the same question and 65% of them say Widget A.  Does that mean there was a drop of 2%?  No, because a 2-point drop is within the roughly 3% margin of error from the previous month.  You cannot be certain anything changed.
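The widget comparison can be checked directly: compare the observed drop against the sampling noise. A minimal sketch, under the same normal-approximation assumptions as above:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Month 1: 67% of 1,000 prefer Widget A.  Month 2: 65% of 1,000.
drop = 0.67 - 0.65
moe = margin_of_error(0.67, 1000)
print(drop < moe)  # the 2-point drop is smaller than the ~2.9% margin of error
```

Since the drop is smaller than the margin of error, the two months' figures are statistically indistinguishable.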

In order for Google to have experienced an actual loss of market share, the first number would have to be accurate to 78.72% +/- 0.11% and the second to 78.48% +/- 0.11% (0.11 is just under half the 0.24-point gap).  Why?  Because the margins of error must not overlap:

78.72 - 0.11 = 78.61%
78.48 + 0.11 = 78.59%

Those two intervals do not overlap, so we could be confident that Google really lost share.  So, how many people would the survey have to interview in order to get a margin of error that small?  Roughly 530,000 at Google's observed share, or nearly 800,000 if you conservatively assume a 50/50 split.  I somehow doubt that this surveying company actually asked that many people what their favorite search engine is (or however they did their sampling).  For Microsoft's 0.42-point gain to be meaningful (a margin of error of 0.21 points on each side), they would need to have sampled roughly 66,000 people, or about 218,000 with the conservative 50/50 assumption.  Sounds unlikely to me unless they have some automated way of culling out all of this data.
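Inverting the margin-of-error formula gives the sample size needed for a given precision. A sketch, assuming 95% confidence and the shares reported above:

```python
import math

def required_sample_size(p, margin, z=1.96):
    """Smallest n whose margin of error at proportion p is at most
    `margin` (normal approximation; z = 1.96 for 95% confidence)."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Google at 78.72% share, margin of 0.11 percentage points:
print(required_sample_size(0.7872, 0.0011))  # on the order of 530,000
# Conservative worst case (p = 0.5):
print(required_sample_size(0.5, 0.0011))     # nearly 800,000
# Microsoft at 8.23% share, margin of 0.21 percentage points:
print(required_sample_size(0.0823, 0.0021))  # roughly 66,000
```

Note that the required sample size grows with the square of the desired precision: tightening the margin of error by a factor of ten requires a hundred times as many respondents.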

People need to know how to use statistics properly.

1. James says:

I suspect that it’s both better and worse than that.

If they’re a tracking firm, they’re probably not asking anyone directly. They would have tracking bugs on a large number of websites, and are probably looking at HTTP Referer headers to see which search engines sent visitors to those sites. They can probably also use cookies to identify individual users and tell which search engines they use. So they probably do have enough data to reasonably give that level of precision.

But it’s not random sampling. There’s no way to know that they don’t have systemic accuracy issues.

One could quibble that their customers are not exactly randomly chosen, so what they see may not be representative. Also, one might expect that (say) Bing gives more information about links, so Bing users will visit fewer sites in response to a Bing query, and so StatCounter is less likely to see Bing users that don’t use search engines very often.

On the other hand, these systemic accuracy problems are unlikely to change much from week to week, so the trends probably are accurate.

2. James says:

Sorry: when I wrote "to identify individual users and tell which search engines they use", that should be "to identify individual users, and deduce which search engines a particular user uses from all the data against that user".