TQA - Quality in the Cloud: The new Role of TestOps

Seth wrote an article about the new role of the tester in the services world, which he calls TestOps. For details, you can visit his blog. Here is the summary of the paper:

  "The term "TestOps" is a new one, but it encapsulates a new way of thinking about how we test and ensure quality of cloud services. It is a portmanteau of "Test" and "Ops", so let's start with "Test", a term I expect this readership to know all about but will briefly define here for clarity. When we test we apply a series of actions on the system and check for respective expected results, so as to assess the soundness of the system for use by our customers -- or looked at the other way, the risk of exposing the system to end users.

"Ops", which is short for "operations", generally owns the operation and maintenance of the servers, network, and support systems in the data center where your service is deployed and running. One of Ops' most powerful tools is monitoring - the observation of data such as server health, network load, and user traffic to assess overall data center health. And by health we mean soundness for customer use and insight into potential risks to end users, just like our testing.

In this way we as testers have the same goals as ops, so can we also make use of the data signal continuously being emitted by our systems in production to do our jobs? Recall that to test we apply a series of actions. The results of those actions are test results which have traditionally been our signal by which we assess quality. But instead of applying synthetic actions and trying to predict how real users and complex production deployments will act, why don't we instead use the signal from these real users and real deployments as part of our quality strategy? And rather than simple monitors like "server up" or "free disk space" we can measure complex use scenarios in real environments.

For example if we are interested in system performance, why attempt to replicate every combination of machine, OS, and browser in lab, only to shoot synthetic traffic at them? Instead be like Microsoft Hotmail and measure actual times it takes real users to send and receive emails. This can be collected for millions of users with no PII (personally identifiable information) but including key environment data like OS or browser type. With data like that systems can be tuned to better perform where trouble spots are identified with specific OS, browsers, or even specific locations around the world."
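To make the last point of the summary concrete, here is a minimal sketch in Python of how timing data from real users could be grouped by environment to find trouble spots. It is not taken from the paper: the record layout, the sample values, and the names `median_by_environment` and `trouble_spots` are all assumptions made for illustration, and a real service would read this data from its own telemetry pipeline.

```python
from collections import defaultdict
from statistics import median

# Hypothetical telemetry records: (os, browser, send_time_ms).
# The values are made up for illustration and contain no PII.
SAMPLES = [
    ("Windows", "IE", 420),
    ("Windows", "IE", 460),
    ("Windows", "Firefox", 300),
    ("Windows", "Firefox", 340),
    ("macOS", "Safari", 310),
    ("macOS", "Safari", 330),
    ("Linux", "Firefox", 900),
    ("Linux", "Firefox", 1100),
]


def median_by_environment(samples):
    """Group send times by (os, browser) and return the median of each group."""
    groups = defaultdict(list)
    for os_name, browser, elapsed_ms in samples:
        groups[(os_name, browser)].append(elapsed_ms)
    return {env: median(times) for env, times in groups.items()}


def trouble_spots(samples, threshold_ms):
    """Return (environment, median) pairs above threshold_ms, slowest first."""
    medians = median_by_environment(samples)
    slow = [(env, m) for env, m in medians.items() if m > threshold_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # The 500 ms threshold is an arbitrary assumption for this sketch.
    for env, m in trouble_spots(SAMPLES, threshold_ms=500):
        print(env, m)
```

The median is used here rather than the mean so that a few very slow requests do not dominate a group. With millions of users, percentiles such as the 95th would probably be more informative, but the grouping idea stays the same.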