Why we automate

I never really understood why so many people outside Microsoft seem to be against Microsoft's strategy of increasing the amount of automation we rely on to test our products. Test automation has become a sine qua non at Microsoft for many good reasons. Although there are some uninformed managers who propose the nonsensical notion of 100% automation, Microsoft as a company does not have some magical goal of 100% automation (whatever that means).

I have come to the conclusion that these (mostly external) naysayers ("people who tend to be negative, who have difficulty thinking outside of the box, who go around, consciously or unconsciously, squelching dreams, ideas and visions") of automation simply do not, cannot, or perhaps refuse to comprehend the complexity of the systems and the problem space we deal with at Microsoft, or how we design and develop our automated tests (in general, we do not write scripted tests). Or perhaps the anti-automation folks are SOS (stuck on stupid) and believe that any test (including an automated one) that does not find a bug is not a reasonable test, or that executing such a test does not provide reasonable value to an organization.

So, for those of you who are not afraid of being replaced by automation (if you are afraid a machine will take your job, then it probably will), and for those of you who have an open mind and know that real testing is not a simple choice between scripted and exploratory approaches but requires a myriad of techniques, methods, and approaches, read on and I will discuss some of the reasons why we automate at Microsoft.

  • Increasingly complex test matrices. (Whether you like it or not) the Windows operating system proliferates throughout the world, and even the variety of Windows operating systems in common use is remarkable. Although we no longer support Windows 9x (including ME), there is still support for Windows NT 4.0, Windows 2000, Windows XP, Windows 2003 Server, and Windows Vista. Then add in the various combinations of service packs and hotfixes released for each of these, test environments composed of upgrade scenarios (Windows XP upgraded to Windows Vista), approximately 30 language versions, and 16-bit, 32-bit, and 64-bit architectures, and the number of operating system platforms alone becomes mathematically staggering. But instead of writing several automated tests for each combination of environment parameters, our testers ideally design a single test that detects the OS platform and environment, and develop the test so it can decide how to achieve its objective based on platform profiling and other design techniques. (For example, a test for defrag has to take into account that the German version of the OS does not contain the defrag utility (long story), and a security test on the French version must consider that the French version does not contain 128-bit encryption.) One automated test instead of an army of button pushers doing the same thing on 5+ operating environments and 30 languages just makes sense!
  • Sustained maintenance. Microsoft supports many of its products for 7 to 10 years. So, the more test automation we can hand off to our sustained engineering teams, the less cost the company has to bear in the long term. Do these regression tests find a lot of defects? Perhaps not, but they do provide confidence that changes to the code base introduced by hotfixes and service packs do not adversely impact previous functionality. They also eliminate the burden of having to maintain an army of testers for the entire shelf life of the product.
  • Increased breadth of coverage. Automation is distributed across the network; it is not run on one or two desktops or on a few lab machines. Many product teams have extensive test harnesses that are capable of scheduling tests, configuring environments, and then distributing tests to hundreds of machines in a lab, or even to idle machines on the network. This way we run test automation not only on pre-defined environments but also on work machines and other systems. Test automation runs 24 hours a day, collecting and providing valuable information.
  • We don't rely on automation to find bugs; we use test automation to provide information. Sometimes automated tests can expose unexpected and/or unpredictable behavior, but we know that automated tests do not find a great number of defects after the design and development phase of those tests. However, many of our products have daily builds, and instead of manually rerunning a bunch of tests to ensure changes from a previous build have not affected previously tested functionality, an automated test is more efficient. The minefield analogy sometimes used as an argument against regression testing only really makes sense if the test is being executed on a static code base. But if you test in the real world, where you might get daily or even weekly builds of complex systems, then you probably realize that executing a set of tests after someone changes the minefield has some probability that following the same path will expose a defect, and even if it doesn't, it still provides valuable information.
  • Increased job satisfaction! This is a big side benefit. I know some people are afraid of automation, and some people may lack the skill to design and develop effective automation, but professional testers realize it is a huge challenge to design and develop effective test automation for complex systems. To quote one tester, "the hardest code I ever wrote was the code needed to test other code."
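The platform-adaptive design described in the first bullet (one test that profiles its environment rather than one scripted test per OS/language combination) can be sketched roughly as follows. This is a minimal illustration, not Microsoft's actual harness; the capability table and function names are hypothetical.

```python
# Hypothetical capability table: features known to be absent from
# particular (OS, locale) combinations, e.g. no defrag utility on
# the German SKU, no 128-bit encryption on the French SKU.
MISSING_FEATURES = {
    ("Windows", "de-DE"): {"defrag"},
    ("Windows", "fr-FR"): {"crypto128"},
}

def profile_environment(os_name, locale):
    """Return the set of features unavailable on this platform."""
    return MISSING_FEATURES.get((os_name, locale), set())

def run_defrag_test(os_name, locale):
    """One test that adapts to the platform instead of an army of
    near-identical scripted tests, one per environment."""
    if "defrag" in profile_environment(os_name, locale):
        return "SKIPPED: defrag utility not present on this SKU"
    # ... invoke the defrag utility and verify its results here ...
    return "PASSED"

print(run_defrag_test("Windows", "de-DE"))  # SKIPPED: defrag utility not present on this SKU
print(run_defrag_test("Windows", "en-US"))  # PASSED
```

The same test logic then runs unchanged across every language and architecture in the matrix, with the profiling step deciding what is verifiable on each platform.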

There are many more justifiable reasons why increased automation simply makes sense at Microsoft, but this should give you some idea of why without automation our jobs would be much more difficult than they already are.

Comments (2)

  1. Shrini says:

    Good Points BJ —

    The point on automated tests providing information instead of finding bugs is interesting.

    Can you elaborate on "information" aspect?

    You mentioned that you don't rely on automation to find bugs. The other day someone also mentioned that "Automated tests are not expected to find bugs (if they do, it is a bonus) but they are expected to check that the application has not regressed." Do you agree with this argument?

    IMHO, automated tests are good at very precisely checking differences between two versions of the program, a kind of static thing. Gathering information is a dynamic thing, more suited to a skilled manual testing effort.

    >>>However, many of our products have daily builds and instead of rerunning a bunch of tests to ensure changes from a previous build have not affected previously tested functionality an automated test is more efficient.

    I am not sure about the point that you are trying to make here. Are you comparing a "bunch of tests" (manual, I suppose) with "automated tests"? What do you actually do in the "daily build" scenario?

    >>> But, if you test in a real world where you might get daily or even weekly builds of complex systems then you probably realize that executing a set of tests after someone changes the minefield, has a probability that following the same path may expose a defect, and if it doesn’t then it still provides valuable information.

    Two points here. One: you seem to indicate that for complex systems with daily or weekly builds, the minefield analogy does not apply, as the chances of stepping on a bug are high even when following the same path. Agreed; variables here could be the nature of code changes, code review effectiveness, configuration management effectiveness, unit testing effectiveness, etc.

    Second: you are saying that even when a bug is not discovered, there is some valuable information discovered. Can you give me an example?


  2. I.M.Testy says:

    Hi Shrini,

    I should have said that the majority of automation does not expose a large number of defects. Automation will often expose certain categories or classes of defects more effectively and efficiently than manual testing (race conditions or memory leaks, for example). But in general, the large majority of defects found across an automated test's lifecycle are found during the design and development of that automated test.

    In the context of computer science, static implies something that has little or no change, or that produces the same or similar output or result over and over again. So, if you accept that connotation, then even a static test that runs repeatedly without error provides at least a baseline of information. (The value of that information depends on the purpose and design of the test, and on whether the management team considers that information important.) However, if the tests are constantly changing (as they do during exploratory testing), then you cannot establish a reliable baseline because the information keeps changing. Exploratory testing can be a useful testing method, but because of its very nature of (perceived) non-repetition, it provides little explicit, qualified information other than time spent in the activity and bugs found during that period.

    Yes, by bunch of tests I was suggesting manually executed tests versus automated tests.

    The Reader's Digest version of a build process generally involves recompiling all files, or the files that have been changed (new/modified features or functionality and/or bug fixes). Then the files are generally packaged as they would be for distribution, and that 'build' is released for testing.

    The testing of the build often starts by checking file attributes, checking for new/changed/removed files, checking to make sure all files that are supposed to be in a particular package or SKU are there, etc. All these processes are highly automated in most cases. This is often referred to as a build verification test, or BVT. The build acceptance test, or BAT, usually involves a series of critical functional tests run once the validity of the build has been established.

    I previously mentioned that I wrote the BVT for Far East language versions of Windows 95. The BVT consisted of approximately 500 functional tests that ran simultaneously on 4 language versions across 8 machines (upgrade and clean installs) every week and required approximately 30 minutes to complete.

    The information provided by this test suite allowed a tester to determine whether the build was rejected, released for test only (unstable but testable), or released for self hosting (or dog-fooding) based on established quality criteria (the baseline). I think you would agree this is pretty critical information that is much more effectively and efficiently gathered via automation.

    Sometimes the BVT suite ran without incident, and sometimes it exposed various errors. But even when it ran without incident, it provided valuable information to the team. My experience with BVT suites convinced me the minefield analogy doesn't take into account the iterative development lifecycle models that are commonly used in today's software projects. However, I will also say the 'minefield' doesn't change as much during sustained engineering, which should cause us to question how much regression testing is really necessary there (e.g., the regression test suite for sustained engineering is huge, but do we need to run all automated tests for every hotfix?).

    Another area where automation provides valuable information (if it is important to the decision makers) is performance testing (here I am specifically referring to tests that determine the average time necessary to complete a specific task or series of tasks). The majority of performance tests generally don't expose a (functional) defect, but the information is valuable and allows management to make important decisions regarding certain aspects of the product.
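    The kind of measurement described here, averaging the wall-clock time of a repeated task, can be sketched as below. The helper name, the run count, and the stand-in task are all placeholders, not any real harness's API.

```python
import time

def average_task_time(task, runs=5):
    """Run a task repeatedly and return the mean wall-clock time.
    This is the raw trend data a performance suite would collect
    and compare against a baseline from a previous build."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

# Example: time a trivial stand-in task.
avg = average_task_time(lambda: sum(range(100_000)))
print(f"average: {avg:.6f} s")
```

    The number itself exposes no functional defect; its value comes from comparing it build over build, which is exactly "information rather than bugs."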

    – Bj –
