To “Think” or “Not to Think”

Whether to include think time in a load test is one of my favorite topics to debate, and it also seems to stir up the most passion among testers who have an opinion. However, as with so much in testing strategy, there is NOT ONE CORRECT ANSWER.

What I hear all of the time…

“Driving load without introducing think time in your test harness is a great way [the best way] [the only way] [place your phrase here] to push the system and find the breaking point and/or capacity of an application”

Example 1 – Bad for the System

For this example, I was helping a customer test their application, with a goal of reaching a throughput of 24,000 transactions in an hour. They estimated that a single user could complete one transaction in about 15 seconds, which translates to 240 transactions per user per hour without think time. They therefore configured the test to run for one hour with 100 concurrent users. They got fairly close to the goal of 24,000 and seemed pretty happy. However, I had a nagging feeling about these results, so I suggested that we modify the load test to run on User Pacing instead of Number of Users. By switching, we were telling Visual Studio to drive load toward a predefined pace instead of “as fast as you can go.” I set the pace to 48 transactions/user/hour with a user load of 500. On paper, this should produce the same throughput:

  • 100 users X 240 transactions/user/hour = 24,000 transactions per hour
  • 500 users X 48 transactions/user/hour = 24,000 transactions per hour
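The arithmetic behind the two profiles can be written out as a small sketch. It assumes, as the customer did, that one transaction takes about 15 seconds when the system is not stressed; the `LoadProfile` class and its methods are hypothetical names for illustration, not part of Visual Studio. The number that differs between the profiles is the pacing interval: under pacing, each of the 500 users starts a transaction only every 75 seconds and sits idle for the rest of that interval.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
TX_DURATION_S = 15.0  # assumed unloaded duration of one transaction


@dataclass(frozen=True)
class LoadProfile:
    """Hypothetical description of a load profile (not a Visual Studio type)."""
    users: int
    tx_per_user_per_hour: float

    def hourly_throughput(self) -> float:
        # Total transactions the profile targets in one hour.
        return self.users * self.tx_per_user_per_hour

    def pacing_interval_s(self) -> float:
        # Seconds between the starts of consecutive transactions for one user.
        return SECONDS_PER_HOUR / self.tx_per_user_per_hour


# No think time: each user runs back to back, so its rate is set by the transaction duration.
no_think = LoadProfile(users=100, tx_per_user_per_hour=SECONDS_PER_HOUR / TX_DURATION_S)
# User pacing: 500 users, each held to 48 transactions per hour.
paced = LoadProfile(users=500, tx_per_user_per_hour=48)

for name, p in [("no think time", no_think), ("paced", paced)]:
    idle = p.pacing_interval_s() - TX_DURATION_S
    print(f"{name}: {p.hourly_throughput():,.0f} tx/hour, "
          f"one tx every {p.pacing_interval_s():.0f} s per user, "
          f"{idle:.0f} s idle per iteration")
```

Both profiles target 24,000 transactions per hour; the paced one simply spreads them across five times as many users.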

So naturally, we expected the same (or at least similar) results… We hit the “Execute” button and, over the next five minutes, watched the system fall apart completely. By the time I hit the abort button, we were on pace to complete fewer than 5,000 transactions in an hour. It turned out that a couple of components in their front-end server software built and managed their own thread pools. Every user hitting the system spun up 8 different threads to do work. When we switched to 500 users, the system was overloaded with blocking, deadlocks and resource starvation. We never would have caught that with only the initial testing.
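One way to see why two "equivalent" profiles behaved so differently is Little's law (L = λW). If each transaction still took about 15 seconds, both profiles would keep roughly 100 transactions in flight, yet the paced profile puts five times as many users on the system. The sketch below assumes, hypothetically, that the 8 threads per user described above live as long as each user's session, so thread count scales with users rather than with in-flight work; `in_flight` and `server_threads` are illustrative names only.

```python
SECONDS_PER_HOUR = 3600
TX_DURATION_S = 15.0   # assumed unloaded transaction time
THREADS_PER_USER = 8   # from the example: each user spun up 8 threads


def in_flight(tx_per_hour: float, duration_s: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    arrival_rate = tx_per_hour / SECONDS_PER_HOUR  # transactions per second
    return arrival_rate * duration_s


def server_threads(users: int, threads_per_user: int = THREADS_PER_USER) -> int:
    """Hypothetical model: threads are held for the life of each user's session."""
    return users * threads_per_user


for name, users in [("no think time", 100), ("paced", 500)]:
    print(f"{name}: ~{in_flight(24_000, TX_DURATION_S):.0f} tx in flight, "
          f"{server_threads(users):,} server threads")
```

Under these assumptions the in-flight work is identical, but the server goes from 800 to 4,000 threads, which is the kind of pressure a no-think-time profile with fewer users never exercises.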

Example 2 – Bad for the Rig

For this example, we were trying to max out a WCF connection module, measuring how many users could be authenticated in a given time frame. In my mind, this was a reasonable test to run without think time: each iteration simply created a connection and then released it. We ran it with a step load up to 5,000 users and no think time or pacing. We executed the test twice, and both times it aborted itself just before reaching the target load of 5,000 users because of an "Out of memory" exception on the test agent machine. The agent was sustaining 97-100% CPU, over 100,000 context switches/sec, and a Processor Queue Length averaging 137 and peaking above 500.

The agent was doing this extra work because the test was very short and fast (the average total test time was 0.006 seconds). The agent machine had to do more work per test than the system under test, so it could never keep up. We added a one-second pacing time to the load test, and the test rig immediately started behaving better: the average context switching rate dropped to 8,000/sec, the Processor Queue Length averaged 0 with a max of 4, and CPU on the agent fell to around 85%. We then added 4 more agents and were able to drive a sustained load of over 4,800 tests/sec with 5,000 users.
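The effect of pacing on the rig can be estimated from the numbers in this example. In the minimal sketch below, `paced_user` is a hypothetical virtual-user loop that illustrates what a pacing setting does (it is not how Visual Studio implements it), and it accepts injectable `clock`/`sleep` functions so it can be run deterministically. `offered_load` is an equally hypothetical upper bound on how many tests per second the rig is asked to start. Without pacing, a 0.006-second test means each user tries to start about 167 tests per second, so 5,000 users request roughly 833,000 tests per second; with one-second pacing the ceiling is 5,000 tests per second, which puts the observed 4,800 tests/sec at about 96% of it.

```python
import time
from typing import Callable


def paced_user(run_test: Callable[[], None], pacing_s: float, iterations: int,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical virtual-user loop: start at most one test every `pacing_s` seconds."""
    completed = 0
    next_start = clock()
    for _ in range(iterations):
        run_test()
        completed += 1
        next_start += pacing_s
        delay = next_start - clock()
        if delay > 0:
            # Idle for the rest of the pacing interval.
            sleep(delay)
        else:
            # Test overran the interval: start the next one now, without catching up.
            next_start = clock()
    return completed


def offered_load(users: int, test_time_s: float, pacing_s: float = 0.0) -> float:
    """Hypothetical upper bound on tests/sec requested (ignores agent CPU limits)."""
    per_user_interval = max(test_time_s, pacing_s)
    return users / per_user_interval


print(f"no pacing:  {offered_load(5_000, 0.006):,.0f} tests/sec requested")
print(f"1 s pacing: {offered_load(5_000, 0.006, pacing_s=1.0):,.0f} tests/sec requested")
```

The unpaced figure is far beyond what any agent can generate, which is consistent with the CPU, context-switching and queue-length counters the agent reported before pacing was added.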

Conclusion

When trying to test the capability of a certain part of a system, people often build and execute load tests with no think time and drive the load "as fast as they can." This approach does have value in certain instances, such as Red Zone/Green Zone testing of SharePoint Servers (https://technet.microsoft.com/en-us/library/ff758657.aspx) or probing failure points on systems where you want to find or reproduce blocking and race conditions. However, when doing this type of testing, you need to be aware of how the results can be used, how the profile can hide potential issues in the application being tested, and what effect the test has on the rig providing the load.