Software Testing Cage Match: Amazon vs. Microsoft

While I previously made some comparisons between Amazon’s and Microsoft’s different approaches to software testing in Building Services: The Death of Big Up-Front Testing (BUFT)?, I think now would be a fun and interesting time to do a deeper dive on this.

Before I joined Amazon in 2005 as an SDET, while I was interviewing for said position in fact, I was told about the “QA situation” there.  I was told “it’s improving”.  Improving from what? you may ask.  Well, the perception was that QA got short shrift, with the 1 to 10 (or 1 to 7, or 0 to infinity) Test to Dev ratio held up as proof.


“Improving, eh?”  Did I buy that?  Not necessarily, but I quickly came to a realization: I had previously used Amazon a lot, and had rarely noticed any problems…it seemed to work.  Even so, after I joined the QA Team there, it was still a frequent source of grousing that Amazon as a whole did not value the Quality Assurance profession, otherwise they would surely fund more of us.  I later shared this grouse with my former director of engineering from my previous company over a game of cards.  I expected sympathy, but instead he simply asked “And is this the right thing for Amazon?”  Between that eureka moment and years more of experience, I learned that it’s not about the ratio, but about what you expect from your software teams (Dev, QA, product specialists, and managers) and how you handle and mitigate risk.

Across the Lake

In 2009 I changed companies and moved to Microsoft (across Lake Washington from Amazon’s Seattle HQ).  Microsoft has a reputation as a place where testers are respected and software quality is given priority.  I was eager to see the “Microsoft way” of quality…the fabled 1 to 1 ratio.  Turns out that’s the Office and Windows way, but a nascent realization of how testing such “shrinkwrap” products differs from testing services was taking hold, and experiments in quality processes abounded.  However, I think there are still fundamental differences in how Amazon approaches software quality versus Microsoft.

Head to Head

I manage a Test Team at Microsoft with a 1.5 to 1 Dev to Test ratio.  At Amazon I had 1 SDET for every 7 or so Devs.  So my new job must be easier, right?  Nope.  Ratio is not an input into the equation, it’s an output.  You set your quality expectations and you employ processes to get you there.  One path takes 10 SDETs and another takes 1.  How can this be?  Well, let’s compare and answer the question:

How does Amazon get by with so few hours spent by its QA teams relative to Microsoft?

1. At Amazon, whole features, services, and code paths went untested.  Amazonians have to pick and choose where to apply their scarce resources.  At Microsoft, other than prototypes or “garage” projects, you can expect the complete “triad” of Development-Test-Product Management teams to be engaged at every step of the way.

Exclusion of code from testing cuts down your need for testers.  Maybe SDETs to lines of code tested is a more interesting ratio than Test to Dev?  If you exclude untested features at Amazon, then the test to dev hours ratio increases, moving closer to Microsoft standards.

2. Amazon has “Quality Assurance” teams while Microsoft has “Test” teams.  However QA at Amazon almost never got involved in anything but testing.   That is to say Microsoft and Amazon should swap the names they use for their QA teams since Amazon’s are much more “test” only teams while at Microsoft we seem to achieve more actual QA.

Saving time by not reviewing the design or designing for testability is not saving time at all.

3. Functional-only testing was common at Amazon; however, performance testing was either not done, done by developers, or given second class status.

Performance testing was often done by the dev teams, so these test hours were actually spent, just not by the test team.

4. A high operations cost was considered acceptable (developers carried pagers), so releasing a bug was OK because it was relatively quick to fix (a lower quality bar for release).  Also, Amazon had better tools and processes for build and deployment, which enabled rapid deployment of hot fixes.

Essentially a form of testing in production (TiP).  Again, tally up the hours and put them on Dev’s tab.

5. Better self-service analysis tools.  Any issue that was found in production was easier to analyze and turn-around quickly due to better tools for monitoring servers and services, and sending alerts.

Reducing cost through automation (and tools)… this is a real savings.

6. Cheap Manual testing.  I am of mixed mind listing this since I spent a great deal of energy encouraging the manual testers to automate their tests, but Amazon employs overseas teams to bang on the product via the black box interface and find problems before production users do.  This had a decent yield for finding defects. 

Hidden test hours.  When people talk about the test to dev ratio at Amazon they often do not count these off shore teams. 
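
To make the “tally up the hours” argument concrete, here is a minimal sketch of the bookkeeping.  Everything in it is an assumption for illustration: the `TeamHours` fields, the 40-hour week, and all of the numbers are made up, and the formula is just one rough way to count testing hours that do not show up in headcount.  It is not how either company measures anything.

```python
from dataclasses import dataclass


@dataclass
class TeamHours:
    """Weekly hours for one hypothetical team (all figures are illustrative)."""
    dev_headcount: int
    test_headcount: int
    hours_per_person: float = 40.0
    dev_hours_on_testing: float = 0.0   # e.g. perf testing or on-call fixes done by devs
    offshore_test_hours: float = 0.0    # manual testing not counted in headcount
    fraction_code_tested: float = 1.0   # share of the code base that gets tested at all


def headcount_ratio(team: TeamHours) -> float:
    """The usual Test to Dev ratio, by headcount only."""
    return team.test_headcount / team.dev_headcount


def effective_ratio(team: TeamHours) -> float:
    """Testing hours (including hidden ones) per dev hour spent on tested code."""
    test_hours = (
        team.test_headcount * team.hours_per_person
        + team.dev_hours_on_testing
        + team.offshore_test_hours
    )
    # Dev hours net of testing work, scaled to the part of the code that is tested.
    dev_hours = (
        team.dev_headcount * team.hours_per_person - team.dev_hours_on_testing
    ) * team.fraction_code_tested
    return test_hours / dev_hours


# A made-up team with 1 SDET for 7 devs, some hidden test hours, and 80% of code tested.
team = TeamHours(
    dev_headcount=7,
    test_headcount=1,
    dev_hours_on_testing=20.0,
    offshore_test_hours=20.0,
    fraction_code_tested=0.8,
)
print(round(headcount_ratio(team), 2), round(effective_ratio(team), 2))
```

With these invented inputs the headcount ratio comes out around 0.14 while the effective ratio is around 0.38, which is the point of the list above: the ratio moves a lot depending on which hours and which code you decide to count.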


A friend of mine who is a QA manager at Amazon recently lamented:

“The test to dev ratio [is] insanely stretched …. there’s soo much more we could do, but no we just rush and rush and cut things and get [blamed] when we miss something”

So maybe my “head to head” comparison does not explain away all the differences, but the message I would like to convey is that it is about expectations.  I originally wrote the above list in response to a Dev manager who asked me why we couldn’t be more like Amazon and “pay less” for QA.  Amazon has one expectation and Microsoft has another about quality and about risk… that’s why.


I’ve made a lot of generalizations about how things are done at Microsoft and Amazon, which means what I said is going to be simply wrong when applied to several teams.  Feel free to let me know in the comments how I screwed up in portraying your team.  But be aware I know it’s not one size fits all…hopefully I’ve captured the big picture.


And to close, I will say that other than the ratio, Amazon did improve while I was there.  I saw the QA community come together and start  interacting in positive ways.  Amazon’s first ever Engineering Excellence forum was organized by the QA community.  So that just leaves the final questions:  Does Amazon’s ratio need to be improved, and what does improved look like?  Do Microsoft’s expectations need to be changed, and what would those look like?



Comments (5)

  1. says:

    Seth, this is a good post. Understanding the cost of fixing bugs is essential to understanding the differences in the way Amazon and Microsoft approach QA.

    I like to say that the “pager culture” at Amazon is a big reason why Amazon doesn’t invest more in QA. Developers at Amazon know their pager may go off in the middle of the night and they’ll have to diagnose a problem and deploy a fix. There’s a cost to this, but it doesn’t compare to the cost of patching those boxed software products that Microsoft is famous for.

    With “software in a box”, the cost to fix bugs is very large. Fixing a bug means pushing updated bits out to all of the clients running the software. That’s why Microsoft spends so much on QA for Windows… and Windows Update still sends a bunch of hotfixes to my computer every month! When you consider how many computers in the world are running Windows, it’s easy to see how each one of those fixes costs Microsoft a lot of money. Hence, the choice of how much QA to employ comes down to a business decision of what costs less.

    With “software as a service”, instead of having each of your customers install your software on their computers, the bulk of your software stays with you and it’s much easier for you to control. To fix a bug, you simply patch your servers and you’re done. The cost is much less than sending a fix out to each client. The result is that at the end of the day, bugs are more tolerable in Amazon’s culture than they are in Microsoft’s culture. Ouch! That was painful to say, but it’s true.


  2. seliot says:

    Hey Rob, Thanks for your comments.

    I think as someone who has been an SDET at both Microsoft and Amazon your input carries a lot of weight here.

  3. nitinmehra20 says:

    I agree with what Seth has mentioned, our deployment setup is both a boon and a curse. This culture of roll-back-if-it-crashes has been partly fueled by the ease with which we can roll back, fix and redeploy code. Other than the operational cost, there is very little cost involved in a rollback. Thankfully, this is changing in some teams.

    This is especially true with regards to Kindle and the SDK teams. We now find ourselves in the same situation as Microsoft, as in we too now own "software in a box". Once released, it is difficult, not to mention prohibitively costly, to fix bugs on the Kindle and its associated software development kit.

    Though I feel we still need to go some distance to create QA teams which have enough strength in numbers to meet this paradigm shift head on. We need to staff for a "shippable" code base as opposed to a "deployable" one.

  4. ralphcase says:

    Hi Seth,

    I think there are other factors that are important in deciding the optimal level of test investment.

    You discussed how easy it is to fix a bug, and how that can be quite different depending on the type of product.

    Another important question is how easy it is to detect a bug.  If a bug causes a server to crash, that’s easily detected and reported by monitoring systems.  If a bug causes the wrong amount to be deducted from a bank balance, can you detect that in production, or do you need to develop specific test cases to detect such bugs before release?

    Yet another variable is how the system behaves when a bug hits.  If the effect is that a web page does not display correctly, but when the user refreshes the page it does, that bug is much easier to tolerate in production than a bug that corrupts the database.

    If a system is designed from the start to tolerate certain kinds of failures or bugs, that can have a big impact on the ultimate test budget to ensure the system is meeting its requirements.

    Ralph Case

  5. seliot says:

    Great point Ralph.

    I think it can be summarized as risk assessment of bug impact.

    Hard to detect bugs are higher risk.  Bugs without work-arounds are higher risk.

    Also systems like banking, aerospace, medical are less tolerant to defects in production, even if you take steps to limit the scope of these defects (such as by using exposure control when TiP).