All About Load Test Results and the Results DB (Part 4 – What Are the Numbers Really Telling Me?)

06/10/2014

This question comes up all the time. Unfortunately, there is no single correct answer, but there are a number of points that will help lead you to an acceptable one. The most important thing is to be sure you have defined the proper goals, objectives and success criteria (see this post). I also want to point out that I am not covering how to use results to find issues or troubleshoot test runs. This post focuses solely on the mathematics of results and the expectations of the people receiving the results from test runs. Let’s look at a question and some of the responses I have seen. Some of the answers look very reasonable at first, but they come with caveats:

INITIAL QUESTION: In my performance report there are 3 columns, say Average Response Time, 95% Response Time and 99% Response Time. Which one should I use to report to my Business?

RESPONSES: Here are some responses I have seen, with my thoughts on each in brackets:

I don’t think average response time is a good idea to report since ½ your users will see something worse. [The average response time does not mean that half the values are better or worse than the average. Averages can be skewed by just one value, e.g. the average of (1, 1, 2, 2, 1, 20) is 4.5. In this example, all the values except one are below the average. The MEDIAN, or 50th percentile, is the middle value of a set of data: half of the values fall at or below it.]
We often report the 80th percentile since that represents a vast majority of users without worrying about the extreme outliers. [The 80th percentile only excludes values at the high end. It does not exclude outliers on the low end of the scale, and it does not mean that there are any “extreme” outliers at all. It just means that 80% of the data is less than or equal to that value.]
It’s helpful to chart the results so that you can see the distribution. [I like this answer]
In terms of what to report back, the business should be telling you what they want to see based on their needs and those of their customers. [I like this answer]
Sharing thoughts on percentile values when reporting (to see how the values are calculated, read the bottom of this post):

1. If the standard deviation is < 5% for the individual result set of transactions/pages, we could take the average response time.
2. If the standard deviation is > 5% and < 10% for the individual result set of transactions/pages, we could take the 75th percentile response time.
3. If the standard deviation is > 10% and < 20% for the individual result set of transactions/pages, we could take the 90th percentile response time.
4. If the standard deviation is > 20% for the individual result set of transactions/pages, we could take the 95th percentile response time.

[Keep in mind that if the variation is low (small std dev), you generally want to use the average because it does not exclude any data. When you switch to using percentiles, you are excluding some chosen percent of the data. If you want a measure of central tendency when there is higher variation, I would generally steer toward the median or some sort of truncated mean (i.e. remove the bottom and top x percent rather than just excluding the top x percent). If you want a measure against a criterion (such as an SLA), the chosen percentiles might make more sense because they say that x percent of calls are less than or equal to y. HOWEVER, when doing comparison reporting you have to be careful to always choose the exact same percentile value AND to exclude outliers using the same formula.]
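
To make the arithmetic in these responses concrete, here is a minimal Python sketch using the sample values from the first response. It is only an illustration under a few assumptions: the function names are hypothetical, the percentile uses the simple nearest-rank definition (a load test tool may calculate percentiles differently), the truncated mean drops the same fraction of samples from each end, and "standard deviation as a percent" is interpreted as the standard deviation divided by the mean.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Smallest sample v such that at least pct percent of samples are <= v.

    Assumption: nearest-rank definition; other tools may interpolate.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truncated_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the samples from EACH end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


def relative_std_dev_percent(values):
    """Population standard deviation as a percent of the mean (assumed meaning)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Sample values taken from the first response above.
samples = [1, 1, 2, 2, 1, 20]

mean = statistics.mean(samples)
median = statistics.median(samples)
p80 = nearest_rank_percentile(samples, 80)
trimmed = truncated_mean(samples, 0.2)
below_mean = sum(1 for s in samples if s < mean)

print(f"average        = {mean}")
print(f"median (50th)  = {median}")
print(f"80th pct       = {p80}")
print(f"truncated mean = {trimmed}")
print(f"samples below the average: {below_mean} of {len(samples)}")
print(f"relative std dev = {relative_std_dev_percent(samples):.0f}%")
```

With this data the average (4.5) is above five of the six samples, while the median, the 80th percentile and the truncated mean all stay close to the typical values. Whichever statistic is chosen, the same percentile and the same trimming rule need to be applied to every run being compared.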

I would love to hear some other thoughts on this subject as well.
