I thought I could shed some light on ASP.NET performance testing by presenting our most basic scenario: Hello, World!
Step 1: State the Objective
We want to ensure that the basic ASP.NET pipeline does not regress from one release to the next. We use server throughput (requests per second) as our performance metric. We want throughput to be on par with or better than the previous release (aka, the baseline).
Step 2: Create the Scenario
We have a basic IIS website with the following ASPX page:
<html> <%="Hello, World!" %> </html>
Step 3: Run the Scenario
In order to detect regressions we need to have reproducible results. Here are some of the ways that we reduce variability:
- Eliminate the variables. Since I am measuring ASP.NET performance, .NET should be my one variable. Other factors such as hardware, OS, IIS, wcat version and the scenario itself should remain constant.
- Use a private network. Eliminate unnecessary network traffic to the web server which could affect results.
- Minimize server processes. Again, we want to avoid competition for server resources. We usually disable Windows Firewall and run the wcat controller and clients from separate machines.
- Measure warm. Since we are testing throughput and not startup, we can reduce the variability by adding a warm-up period. This keeps most JIT compilation and cache population, which could cause more variance, out of the measured interval.
- Max the CPU. Server load and system resources factor into the equation as well. We try to keep this near-constant by applying a full load during testing. Our goal is to achieve at least 90% CPU usage during our runs.
We quantify our test variance (aka, noise) by running multiple iterations. We ignore the first iteration, which we’ve found to have greater variance, and calculate the average and standard deviation for the remaining three. Standard deviation is our noise indicator.
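As a rough illustration of that calculation, here is a minimal Python sketch. The function name `summarize_run` and the sample numbers are hypothetical, and two details are assumptions rather than something stated above: it uses the sample standard deviation, and it reports the deviation as a percentage of the average.

```python
from statistics import mean, stdev

def summarize_run(throughputs, discard=1):
    """Return (average, noise_percent) for one run.

    throughputs: requests/sec for each iteration, in run order.
    discard: number of leading iterations to ignore (we drop the first).
    """
    kept = throughputs[discard:]
    if len(kept) < 2:
        raise ValueError("need at least two iterations after discarding")
    avg = mean(kept)
    # Assumption: sample standard deviation, expressed relative to the average.
    noise_percent = stdev(kept) / avg * 100.0
    return avg, noise_percent

# Hypothetical numbers, for illustration only (four iterations, first ignored).
avg, noise = summarize_run([9000.0, 10000.0, 10100.0, 9900.0])
print(f"average={avg:.1f} req/sec, noise={noise:.2f}%")
```

Dropping the first iteration before computing either statistic keeps the warm-up effects out of both the average and the noise indicator.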
Step 4: Analyze the Results
After doing our baseline and test runs, we should have results like the following:
| Scenario | Baseline | Result | Diff | Baseline StdDev | Result StdDev | Pass/Fail |
These are the columns:
- Baseline, Result – Requests per second average for the last 3 iterations of the run
- Diff – Percentage difference between the Result and the Baseline
- Baseline StdDev, Result StdDev – Standard deviation across the run iterations, expressed as a percentage
- Pass/Fail – Whether the Diff shows a regression beyond our threshold
Based on our experience we’ve set our threshold at 5%. We define failures as runs that are more than 5% regressed. Runs that show >5% improvement should also be investigated in order to understand and validate the cause.
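The threshold rule can be sketched in a few lines of Python. This is only an illustration: the names `percent_diff` and `classify` are hypothetical, the sample values are made up, and the sign convention (negative Diff means lower throughput than the baseline) is an assumption.

```python
THRESHOLD = 5.0  # percent

def percent_diff(baseline, result):
    """Diff as a percentage of the baseline; negative means a regression (assumed convention)."""
    return (result - baseline) / baseline * 100.0

def classify(baseline, result, threshold=THRESHOLD):
    """Return (status, investigate) for one scenario.

    status: "fail" when throughput regressed by more than the threshold, else "pass".
    investigate: True for failures and for improvements larger than the threshold.
    """
    diff = percent_diff(baseline, result)
    if diff < -threshold:
        return "fail", True
    if diff > threshold:
        return "pass", True  # not a failure, but worth understanding
    return "pass", False

# Hypothetical averages in requests/sec.
print(classify(10000.0, 9400.0))   # regression of 6%
print(classify(10000.0, 9700.0))   # within the threshold
print(classify(10000.0, 10600.0))  # improvement of 6%
```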
We consider a run noisy if the standard deviation exceeds a threshold, which we also set at 5%. If a failed run is noisy, we throw out one more iteration to see if the Diff and StdDev improve. If they do, we may ignore the failure and wait for the next run. If the runs continue to be noisy, then the scenario should be investigated in order to further reduce the variance.
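Here is one way the re-check of a noisy failure might look in Python. Treat it as a sketch under assumptions: all names are hypothetical, the numbers are invented, noise is taken as the sample standard deviation relative to the average, and the choice of which iteration to throw out (the one furthest from the median) is my own guess for the example rather than a fixed rule.

```python
from statistics import mean, median, stdev

def noise_percent(samples):
    # Assumption: sample standard deviation relative to the average.
    return stdev(samples) / mean(samples) * 100.0

def percent_diff(baseline_avg, samples):
    return (mean(samples) - baseline_avg) / baseline_avg * 100.0

def drop_one_more(samples):
    """Assumption: throw out the iteration furthest from the median."""
    mid = median(samples)
    outlier = max(samples, key=lambda s: abs(s - mid))
    trimmed = list(samples)
    trimmed.remove(outlier)
    return trimmed

def recheck_noisy_failure(baseline_avg, samples):
    """samples: result iterations with the first one already ignored.

    Returns ((diff, noise) before, (diff, noise) after, improved).
    """
    if len(samples) < 3:
        raise ValueError("need at least three iterations to drop one more")
    trimmed = drop_one_more(samples)
    before = (percent_diff(baseline_avg, samples), noise_percent(samples))
    after = (percent_diff(baseline_avg, trimmed), noise_percent(trimmed))
    improved = abs(after[0]) < abs(before[0]) and after[1] < before[1]
    return before, after, improved

# Hypothetical failed, noisy run against a baseline average of 10000 req/sec.
before, after, improved = recheck_noisy_failure(10000.0, [8500.0, 9900.0, 10000.0])
print(before, after, improved)
```

The function only reports whether both numbers improved; deciding to ignore the failure and wait for the next run stays a judgment call.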
Step 5: Investigate Regressions
As the saying goes, “Measure Early, Measure Often”. The fewer changes there are between your runs, the easier it will be to track down the cause of your regressions. This is why the ASP.NET performance lab runs daily with builds from multiple branches. Often I’m able to quickly identify a regression simply by looking at the source control history.
Another way to quickly diagnose regressions is to enable performance counters or other non-invasive tracing which could help identify the cause. We always save the wcat logs with our performance counters and other useful information such as throughput, working set, percent CPU usage, and HTTP responses. The HTTP responses can rule out test failures, while the other diagnostics (combined with source control) can help narrow down the cause.
Finally, if a quick diagnosis is not possible, find a good profiler. Some of the tools we use are the Visual Studio (F1) profiler, CLR Profiler, and XPerf. I hope to demonstrate some of these in future posts.