PART 1 in a series of posts on Load Test Planning
As part of my work inside Microsoft, I have to interface with several different groups of people, from sales to account management to the technical delivery people, as well as different product group teams. I have learned some very valuable lessons from these conversations. The biggest single lesson is not only to ensure that everyone is speaking the same language, but also to ensure that we all understand why we are speaking in the first place. In this blog series, I will attempt to show (at a high level) the thought process I use and some of the steps my former testing team and I follow when we work on customer engagements.
Please note that you will see me refer to several different types of resources throughout the series. This is a critical part of successful performance load testing since it is impossible for only one or two people to know enough to pull this off. My old team has a very simple slogan: “We test WITH you, not FOR you.” That phrase sums up one of the most important reasons the engagements are so successful. It takes a team of skilled people to effectively design and execute performance load tests. I will also tell you that the people I have the privilege of working with all have several years of experience in their respective fields, which is also a key factor in success.
The Right Questions
I find it critical to always ask about the final goal(s) of a testing effort before doing any other work. If this question cannot be answered accurately, or if the answer does not make sense, then I start drilling into the specifics to get more clarification.
I recently bought a video series from O’Reilly books called “Software Architecture Fundamentals”, delivered by Neal Ford and Mark Richards. In one of their segments, they talked about understanding the requirements of a project and they gave a great example of how you can change the entire dynamic of any engagement by asking the right question (Sorry Mark, but as you said in the video, I would rather re-use your example than make a new one). Please note I am paraphrasing here:
“The design team of the F-16 fighter plane was asked by the Air Force to design a plane that would fly at Mach 2-2.5. At the time this was very difficult to achieve. The team struggled a lot. Finally they went back to the Air Force and asked ‘Why does it need to go so fast?’ The Air Force said ‘Because we need to be able to escape when we get into bad combat situations.’ So the design team took that knowledge and offered an alternative. They designed a plane that had high acceleration and high maneuverability. The plane would not go Mach 2, but was still very successful because they realized that the acceleration and maneuverability were better than overall speed in those situations.”
My Starting Points
Here is the list of things I start with for every engagement. Sometimes it can take a few days to answer just these questions. Other times the customer can hand me the answers in a “ready to use” state:
What are the GOALS/OBJECTIVES of this testing effort?
A goal or objective is a “desired result or outcome” of an activity or endeavor. Defining this up front sets the stage for every other piece of work you will do. You need to make sure the goal(s) are realistic and that they are possible to obtain. They should also align with the team’s skill sets, the business partner’s needs, and the overall project. Here are a couple of goals to consider:
- Determine and document system load capacity.
- Benchmark the application against Expected Peak Load
- Obtain Client Timings
These goals all seem pretty good at first. However, I bet some of you are thinking that they are either too vague or that the first two are the same, or that there is some other problem.
The first one wants to figure out what load will cause the system to break. The second one wants to understand how the system will behave at a pre-defined load. The third one wants to find out how fast the client is. While they might all be easier to understand if we changed them a bit, you’ll see that I will clarify them in the next step.
What are the SUCCESS CRITERIA for this testing effort?
Now that we have goals, we want to know “What determines if we have successfully achieved the goal or not.” These are the success criteria. Continuing with the list of goals above, we can add the following:
- Determine and document system load capacity. This goal was for an ISV and they really just wanted to know how much load the system could handle before “failing.” This triggered the question: “What does it mean to fail?” That question introduces the concepts of PASS and FAIL, which could be rephrased as “Succeeded or Failed.” So let’s define them. To be considered as “passing,” the system must meet these criteria:
- Process at least 10,000 sales per hour with 3,000 of those coming from new users. (NOTE: you will see later that I am going to remove this as a Success Criteria).
- The users should never have to wait more than 2 seconds for any screen to redraw.
- The server resources should stay within reasonable industry resource utilization.
- Benchmark the application against Expected Peak Load. So here we just want to understand what the system’s resources and response capacity look like with a pre-defined load:
- How long to process a single sale?
- How long to register a new user?
- How much of the system’s resources are consumed?
- Obtain Client Timings. OK, is that with the server system under load or at rest? If it is at rest, then my job is usually done: you can get client timing info from single user tests and through profiling tools. If, however, the answer is “under load,” then we run the same client profiling or single user tests, but only while a test harness is executing a load similar to the previous tests.
What are the METRICS used to measure the success criteria?
This is the meat of the data collected. Metrics are concrete values obtained through specific tooling as part of the testing. The values usually come from the load test rig and/or from Performance Monitor collections or other monitoring tools. However, we still define them in the test plan in a generic fashion. Please note that I am only defining metrics for the FIRST goal above.
- Process 10,000 sales per hour with 3,000 of those coming from new users. We define some transactional timers and measure the throughput:
- CompleteTheSale execution count: 10,000 / 3600 (seconds in an hour) ≈ 2.8 T.P.S. (transactions per second)
- RegisterTheUser execution count: 3,000 / 3600 ≈ 0.8 T.P.S.
- The users should never have to wait more than 2 seconds for any screen to redraw.
- CompleteTheSale average response time < 2 seconds
- RegisterTheUser average response time < 2 seconds
- The server resources should stay within reasonable industry resource utilization. – Wow, this one is loaded. What defines “industry standard”? Well, I won’t answer that directly, but I’ll throw out a few numbers that I see all the time. Modify them as you see fit:
- Web Server CPU Average less than 70%
- Web Server Available Memory > 20% total memory
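To make the metrics above concrete, here is a minimal Python sketch that converts the hourly targets into TPS and checks one test run’s summary against the criteria. The RunResults structure, its field names, and the sample numbers are hypothetical; in a real engagement these values would come from the load test rig and Performance Monitor collections. Because the success criterion says users should “never” wait more than 2 seconds, the sketch reports the maximum response time alongside the average, since an average under 2 seconds can still hide individual waits above it.

```python
from dataclasses import dataclass
from statistics import mean

SECONDS_PER_HOUR = 3600

# Thresholds from the success criteria above.
MAX_RESPONSE_S = 2.0
MAX_WEB_CPU_PCT = 70.0
MIN_WEB_AVAILABLE_MEM_PCT = 20.0


def hourly_to_tps(per_hour: float) -> float:
    """Convert an hourly business volume into transactions per second."""
    return per_hour / SECONDS_PER_HOUR


# Target throughput per transaction timer (10,000 sales/hour, 3,000 new users/hour).
TARGET_TPS = {
    "CompleteTheSale": hourly_to_tps(10_000),
    "RegisterTheUser": hourly_to_tps(3_000),
}


@dataclass
class RunResults:
    """Hypothetical summary of one test run; field names are illustrative."""
    counts: dict[str, int]                    # executions per transaction timer
    duration_s: float                         # length of the measured interval
    response_times_s: dict[str, list[float]]  # individual response times
    web_cpu_avg_pct: float                    # average web server CPU
    web_available_mem_pct: float              # available memory, % of total


def evaluate(run: RunResults) -> dict[str, bool]:
    """Return a pass/fail flag for each metric."""
    results: dict[str, bool] = {}
    for name, target in TARGET_TPS.items():
        tps = run.counts.get(name, 0) / run.duration_s
        times = run.response_times_s.get(name, [])
        results[f"{name} throughput"] = tps >= target
        # A transaction with no recorded timings counts as a failure.
        results[f"{name} avg < 2s"] = bool(times) and mean(times) < MAX_RESPONSE_S
        results[f"{name} max <= 2s"] = bool(times) and max(times) <= MAX_RESPONSE_S
    results["web CPU avg < 70%"] = run.web_cpu_avg_pct < MAX_WEB_CPU_PCT
    results["web available memory > 20%"] = (
        run.web_available_mem_pct > MIN_WEB_AVAILABLE_MEM_PCT
    )
    return results


# Made-up numbers, only to show the shape of the check.
run = RunResults(
    counts={"CompleteTheSale": 10_200, "RegisterTheUser": 3_050},
    duration_s=3600,
    response_times_s={"CompleteTheSale": [0.8, 1.2, 2.4], "RegisterTheUser": [0.5, 0.9]},
    web_cpu_avg_pct=62.0,
    web_available_mem_pct=35.0,
)
for metric, ok in evaluate(run).items():
    print(f"{metric}: {'PASS' if ok else 'FAIL'}")
```

With these made-up numbers, CompleteTheSale passes on average response time (about 1.47 seconds) but fails the maximum check because of the single 2.4 second response, which is exactly the kind of gap an average-only metric would miss.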
What is wrong with the above data?
Looking at the above examples, you may notice that there is a big problem with one of my data points. Remember that I said I was going to remove the first Success Criteria for the first goal above? Here’s why. When I am trying to find the breaking point of a system, one of the things that I am always trying to find is “What throughput can the system handle?” Throughput is usually expressed in metrics like requests/second or transactions/second. If this is the number I am trying to determine, it should not be one of my success criteria. The success criteria in this case should be listed as “Obtain the maximum TPS for CompleteTheSale without the other criteria reaching a fail point.”
Think of it this way. The first goal is to define the acceptable behavior of the system and see how much load (throughput) it can handle. The second goal is to define the load (throughput) and see what the behavior is.
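The reworded criterion for the first goal, “Obtain the maximum TPS for CompleteTheSale without the other criteria reaching a fail point,” can be sketched as a simple step-load search: raise the offered load in fixed increments and stop at the first step where any criterion fails. This is a minimal illustration under stated assumptions, not a description of any particular load testing tool: run_at_load is a hypothetical stand-in for executing a load test at a given TPS and collecting its metrics, and toy_system is a made-up model used only so the example runs.

```python
from typing import Callable, Optional

Metrics = dict[str, float]


def find_max_passing_load(
    run_at_load: Callable[[float], Metrics],
    passes: Callable[[Metrics], bool],
    start_tps: float,
    step_tps: float,
    max_tps: float,
) -> Optional[float]:
    """Step the offered load up until the success criteria fail.

    Returns the highest load (TPS) at which every criterion still passed,
    or None if the starting load already failed.
    """
    last_passing: Optional[float] = None
    load = start_tps
    while load <= max_tps:
        metrics = run_at_load(load)  # in practice: one load test run at this level
        if not passes(metrics):
            break
        last_passing = load
        load += step_tps
    return last_passing


def toy_system(tps: float) -> Metrics:
    """Made-up model: response time and CPU both climb with offered load."""
    return {"avg_response_s": 0.5 + 0.25 * tps, "web_cpu_pct": 12.0 * tps}


def criteria(m: Metrics) -> bool:
    """The non-throughput success criteria from the test plan."""
    return m["avg_response_s"] < 2.0 and m["web_cpu_pct"] < 70.0


print(find_max_passing_load(toy_system, criteria, start_tps=1.0, step_tps=0.5, max_tps=20.0))
# prints 5.5 (CPU crosses 70% at the next step, 6.0 TPS)
```

Note that throughput is the output here, not a pass/fail input, which is the whole point of the first goal. The step size trades resolution against the number of test runs, and since real measurements are noisy, it is worth repeating the runs around the breaking point before reporting a number.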
The bottom line for all of these is that they should be well defined, they need to make sense, they need to be measurable, and the testing effort needs to be repeatable.
In a later post in this series, I will visit the considerations for designing a load pattern and how to model that load.