ASP.NET Performance Lab: Hello, World!


I thought I could shed some light on ASP.NET performance testing by walking through our most basic scenario: Hello, World!

Step 1: State the Objective

We want to ensure that the basic ASP.NET pipeline does not regress from one release to the next.  We use server throughput (requests per second) as our performance metric.  We want throughput to be on par with or better than the previous release (the baseline).

Step 2: Create the Scenario

We have a basic IIS website with the following ASPX page:

<html>
    <%="Hello, World!" %>
</html>

Step 3: Run the Scenario

Most of our scenarios use the wcat (Web Capacity Analysis Tool) load tool to test the web server; wcat is available as a free download from Microsoft.
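
For illustration, a minimal wcat scenario file for this page might look like the following.  This is a sketch in the WCAT 6.3 style; the exact syntax depends on your wcat version, and the URL below assumes the page is saved as hello.aspx.

scenario
{
    name     = "HelloWorld";
    warmup   = 30;
    duration = 120;

    default
    {
        version    = HTTP11;
        statuscode = 200;
        close      = ka;
    }

    transaction
    {
        id     = "hello";
        weight = 1;

        request
        {
            url = "/hello.aspx";
        }
    }
}

The warmup setting ties into the "measure warm" guideline below: load is applied for 30 seconds before the measured 120-second interval begins.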

In order to detect regressions we need to have reproducible results.  Here are some of the ways that we reduce variability:

  • Control the variables.  Since I am measuring ASP.NET performance, .NET should be my one variable.  Other factors such as hardware, OS, IIS, the wcat version and the scenario itself should remain constant.
  • Use a private network.  Eliminate unnecessary network traffic to the web server which could affect results.
  • Minimize server processes.  Again, we want to avoid competition for server resources.  We usually disable Windows Firewall and run the wcat controller and clients from separate machines.
  • Measure warm.  Since we are testing steady-state throughput and not startup, we add a warm-up period so that JIT compilation and cache population happen before measurement begins, removing a source of variance.
  • Max the CPU.  Server load and system resources factor into the equation as well.  We try to keep this near-constant by applying a full load during testing.  Our goal is to achieve at least 90% CPU usage during our runs.

We quantify our test variance (our noise) by running four iterations.  We ignore the first iteration, which we've found to have greater variance, and calculate the average and standard deviation for the remaining three.  Standard deviation is our noise indicator.
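
As a concrete illustration, here is how that summary could be computed in C#.  This is a minimal sketch, not our actual harness; the RunSummary name and the choice to report the standard deviation as a percentage of the mean are assumptions for the example.

using System;
using System.Linq;

static class RunSummary
{
    // Summarize throughput (requests/sec) across iterations: drop the first
    // iteration, then report the mean of the rest along with their standard
    // deviation expressed as a percentage of the mean.
    public static void Summarize(double[] requestsPerSecond)
    {
        double[] kept = requestsPerSecond.Skip(1).ToArray();  // ignore iteration 1
        double mean = kept.Average();
        double variance = kept.Sum(x => (x - mean) * (x - mean)) / kept.Length;
        double stdDevPercent = 100.0 * Math.Sqrt(variance) / mean;

        Console.WriteLine("Avg: {0:F2} req/s  StdDev: {1:F2}%", mean, stdDevPercent);
    }
}

Calling RunSummary.Summarize with four iteration results averages the last three and prints the relative standard deviation that serves as the noise indicator.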

Step 4: Analyze the Results

After doing our baseline and test runs, we should have results like the following:

Scenario     Baseline    Result      Diff      Baseline StdDev   Result StdDev   Pass/Fail
HelloWorld   41549.67    40432.67    -2.69%    0.64%             0.35%           PASS

These are the columns:

  • Baseline, Result – Requests per second average for the last 3 iterations of the run
  • Diff – Percentage difference of the Result from the Baseline: (Result - Baseline) / Baseline
  • Baseline StdDev, Result StdDev – Standard deviation across the run iterations, expressed as a percentage of the average
  • Pass / Fail – Whether the Diff stays within our regression threshold

Based on our experience, we've set our threshold at 5%: we define failures as runs that regress by more than 5%.  Runs that show a >5% improvement should also be investigated in order to understand and validate the cause.

We consider a run noisy if the standard deviation exceeds a threshold, which we also set at 5%.  If a failed run is noisy, we throw out one more iteration to see if the Diff and StdDev improve.  If they do, we may ignore the failure and wait for the next run.  If the runs continue to be noisy, then the scenario should be investigated in order to further reduce the variance.
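
Putting the two thresholds together, the pass/fail decision can be sketched in C# as follows.  This is a hypothetical helper for illustration, not our actual reporting tool.

using System;

static class RegressionCheck
{
    const double RegressionThreshold = 5.0;  // max % regression before failing
    const double NoiseThreshold = 5.0;       // max % std dev before flagging noise

    public static void Evaluate(double baseline, double result,
                                double baselineStdDev, double resultStdDev)
    {
        // Diff: percentage difference of the result from the baseline.
        double diff = 100.0 * (result - baseline) / baseline;
        bool pass = diff >= -RegressionThreshold;
        bool noisy = baselineStdDev > NoiseThreshold || resultStdDev > NoiseThreshold;

        Console.WriteLine("Diff: {0:F2}%  {1}{2}",
            diff, pass ? "PASS" : "FAIL", noisy ? " (noisy)" : "");
    }
}

With the numbers from the table above, RegressionCheck.Evaluate(41549.67, 40432.67, 0.64, 0.35) prints "Diff: -2.69%  PASS", matching the HelloWorld row.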

Step 5: Investigate Regressions

As the saying goes, "Measure Early, Measure Often".  The fewer changes there are between your runs, the easier it will be to track down the cause of a regression.  This is why the ASP.NET performance lab runs daily with builds from multiple branches.  Oftentimes I'm able to quickly identify a regression simply by looking at the source control history.

Another way to quickly diagnose regressions is to enable performance counters or other non-invasive tracing that can help identify the cause.  We always save the wcat logs along with our performance counters and other useful information such as throughput, working set, percent CPU usage, and HTTP responses.  The HTTP responses can rule out test failures, while the other diagnostics (combined with source control history) can help narrow down the cause.
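
For example, the standard Windows performance counters can be sampled from C# with System.Diagnostics.  A minimal sketch, assuming the stock "ASP.NET Applications" and "Processor" counter categories are present on the server:

using System;
using System.Diagnostics;
using System.Threading;

class CounterSnapshot
{
    static void Main()
    {
        // Aggregate ASP.NET throughput and total CPU, sampled over one second.
        var rps = new PerformanceCounter(
            "ASP.NET Applications", "Requests/Sec", "__Total__");
        var cpu = new PerformanceCounter(
            "Processor", "% Processor Time", "_Total");

        rps.NextValue();   // the first read of a rate counter always returns 0
        cpu.NextValue();
        Thread.Sleep(1000);

        Console.WriteLine("Requests/Sec: {0:F0}  CPU: {1:F0}%",
            rps.NextValue(), cpu.NextValue());
    }
}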

Finally, if a quick diagnosis is not possible, find a good profiler.  Some of the tools we use are the Visual Studio (F1) profiler, CLR Profiler, and XPerf.  I hope to demonstrate some of these in future posts.


Comments (7)

  1. Phillip says:

    When testing from one release to the next, is the test run on identical hardware to the previous?

  2. Eric Swanson says:

    Would it matter if you used MVC? Would it matter if you used different view-engines e.g. Razor @("Hello World")  ?

  3. Phillip –

    Yes, I don't want the hardware to be a variable.  If I do update my hardware then I make sure to re-run my baseline.

  4. Eric –

    I'm assuming you're looking for a HelloWorld comparison across WebForms, MVC2 (WebForms View Engine) and MVC3 (Razor View Engine)?  I haven't done that comparison – but maybe I can look into it for a future post.

  5. Niels Kühnel says:

    I really love that you put so much effort into keeping ASP.NET as an "opt-in on features" framework!

    Hardware shouldn't really be an issue as you can always measure the baseline of the previously passed release on new hardware. However, why do you introduce network latency at all?

    If you're really interested in measuring "Request against .NET, ceteris paribus", why don't you run it all against localhost? Use a box with lots of cores, and run tests where you try, say,

    a) IIS locked to 1 core

    b) IIS locked to 2 cores

    c) IIS locked to 8 cores

    and then lock wcat to the rest of the cores for requests, but no more than needed to keep IIS safely under 100%. Then you will even test against different typical simple deployment scenarios.

    If those tests succeed, then you should probably also check d) whether actual network requests also succeed (there could be something hidden in there, why not?)

    If tests are distributed around a mean below baseline without any extraordinary positive skew, you're safe, so that should be what you test for.

    Since you have plenty of time and samples are "free" you should run plenty of times for a long time each and discard anything above even, say, 0.001.

  6. There's no network latency in the measurements.  This is requests per second on the server, not response times on the client.  That is another approach, and which one you take depends on your goals.

    Load is added from remote clients so that wcat and other unrelated processes are not competing for system resources with IIS and ASP.NET.  I have confidence that my web server performance — good or bad — is actually due to the request processing.

    There is a variety of hardware in our lab, just as I'm sure our customers have.  Unfortunately we don't have infinite time and resources – so picking good scenarios and deciding how often to run them is important.

  7. György Balássy says:

    Very useful, thanks for sharing!