Diagnosing OutOfMemoryExceptions that occur when running load tests

We’ve had a number of users report OutOfMemoryExceptions occurring in the QTAgent process (or the VSTestHost process when running locally). These exceptions can have several different causes, and this article attempts to help you diagnose and prevent them.

First, a bit of background: the QTAgent.exe and VSTestHost.exe processes are both 32-bit processes in VSTS 2008, even when running on 64-bit machines. This limits the virtual address space for the user-mode portion of the process (where all of the Web and load test runtime code resides) to 2GB. In almost every case where we have seen users get OutOfMemoryExceptions while running load tests, the cause was that the process’s virtual memory usage hit this 2GB limit, not that the machine ran out of memory. In fact, the machine may have plenty of physical memory available.

If you are getting OutOfMemoryExceptions, the first thing I would do is make sure that you have configured the Server garbage collector if you are running on a multi-processor or multi-core machine (any machine whose Task Manager Performance tab shows more than one logical processor in the CPU Usage History graphs). To configure this, see Sean Lumley’s blog post at: https://blogs.msdn.com/slumley/pages/improve-load-test-performance-on-multi-processor-machines.aspx.

Next, make sure that the %CPU of the load test agent (or of the VSTestHost process when running locally) is not too high, say over 85%. If there are not enough free CPU cycles, the .NET garbage collector may not be able to run often enough to free memory as soon as it should.

If you’ve done that and are still getting OutOfMemoryExceptions, the next step is to determine which of these two cases your problem falls into:

1. The OutOfMemoryException is due to a memory leak.

2. There is no memory leak, but the memory usage is too high because of the workload.

To determine whether there is a memory leak, I suggest the following experiment: run a load test either locally or with a single load test agent, using a Step load pattern with a maximum user load of between 100 and 200 users (with think time enabled so that CPU usage on the agent is not too high). Set up the step load parameters so that the maximum user load is reached within about 5 minutes, but run the load test for 30 to 60 minutes, or at least long enough to tell definitively whether memory usage is increasing. To monitor this, graph the performance counter Process\Virtual Bytes\QTAgent from the agent machine (or Process\Virtual Bytes\VSTestHost if running locally). Once the load test has reached the maximum user load, watch this counter to see whether it continues to go up for the rest of the test. If it does, there is probably a memory leak.
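If you have the Virtual Bytes values as a series of samples rather than just a graph, a least-squares slope over the samples taken after the maximum user load is reached gives a less subjective answer than eyeballing the line. The Python sketch below is illustrative only: the (minutes, bytes) sample format, the helper names, and the 1 MB-per-minute threshold are hypothetical choices, not anything VSTS provides. It also projects how long the process has before reaching the 2GB limit, assuming the growth stays linear.

```python
from statistics import mean

# 2GB user-mode virtual address space of a 32-bit QTAgent/VSTestHost process.
ADDRESS_SPACE_LIMIT = 2 * 1024 ** 3


def virtual_bytes_slope(samples):
    """Least-squares slope of Virtual Bytes over time, in bytes per minute.

    samples: (elapsed_minutes, virtual_bytes) pairs, sorted by time, taken
    after the maximum user load has been reached (hypothetical format).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [t for t, _ in samples]
    ys = [v for _, v in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def looks_like_leak(samples, threshold_bytes_per_min=1024 * 1024):
    """True if Virtual Bytes keeps rising faster than the (arbitrary) threshold."""
    return virtual_bytes_slope(samples) > threshold_bytes_per_min


def minutes_until_limit(samples):
    """Rough time left before hitting 2GB, assuming growth stays linear."""
    slope = virtual_bytes_slope(samples)
    if slope <= 0:
        return None  # flat or shrinking: no projected exhaustion
    _, latest_bytes = samples[-1]
    return (ADDRESS_SPACE_LIMIT - latest_bytes) / slope
```

A flat series reports no leak and no projected exhaustion; a series that climbs steadily after ramp-up is flagged. Treat the threshold as something to tune against your own runs, since normal garbage-collection sawtooth patterns add noise to short windows.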

The next steps to take depend on the result of the above experiment:

1. If there is a memory leak:

a. If your load test contains Web tests that target an HTTPS Web site, and especially ones that require your Web test to specify client certificates, you are probably running into a known product bug that causes a memory leak. There is currently no fix for this bug, but there is a workaround: in the load test’s run settings, change the “Web Test Connection Model” to “Connection Pool”. You should probably also increase the Connection Pool Size from its default of 50. You can probably set it as high as the maximum user load for the load test divided by the number of agents, since each agent gets its own connection pool of the specified size.

b. If your load test is not targeting an HTTPS Web site, we don’t know of any other product bugs that would cause a memory leak. It’s possible that the leak is caused by user-written code such as:

· unit tests

· code under test called by the unit test that runs in the same process as the unit test

· coded Web tests

· Web test plug-ins

· custom Web test validation or extraction rules

· a LoadTestPlugin

The most common programming practice in any of these pieces of user code that can cause a memory leak is a collection such as an ArrayList, List, Hashtable, or Dictionary that is declared static and whose item count grows as the load test runs. Because a static collection lives for the lifetime of the process, nothing it references can be garbage collected.

To identify the specific cause of a memory leak in user written code, I would recommend:

· If your load test contains many tests, run the load test with each of the tests enabled one at a time to attempt to isolate the specific test causing the memory leak.

· Talk to the person responsible for writing that code.

· Use a tool that analyzes memory usage in the .NET heap to debug the memory leak. The VSTS profiler can be used for this (see https://msdn.microsoft.com/en-us/library/ms182375.aspx), as can other memory analysis tools.

2. If there is NOT a memory leak, then the memory usage could be too high because:

a. The user load per load test agent is too high

b. The size of the Web test responses (or posted requests) is large

c. The number of rows of data in one or more data sources is very large (note that all of the rows and columns used in a data source are read into memory at the start of the load test and remain in memory until the load test completes).

d. A combination of the above

In any of these cases, the best solution, if possible, is to increase the number of load test agents, so that the number of virtual users per load test agent is smaller.
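For the HTTPS workaround in item 1a, the Connection Pool Size rule of thumb is simple arithmetic. The hypothetical Python helper below rounds the per-agent share up so that no agent is short a connection when the load does not divide evenly, and, as an assumption rather than product behavior, never suggests going below the default of 50.

```python
DEFAULT_CONNECTION_POOL_SIZE = 50  # default Connection Pool Size from item 1a


def suggested_pool_size(max_user_load, agent_count):
    """Connection Pool Size to try on each agent.

    Each agent gets its own pool of the configured size, so the maximum user
    load divided by the number of agents gives about one connection per
    virtual user on each agent. Rounds up so no agent is short when the load
    does not divide evenly, and (an assumption, not product behavior) never
    goes below the default.
    """
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    per_agent = -(-max_user_load // agent_count)  # ceiling division
    return max(per_agent, DEFAULT_CONNECTION_POOL_SIZE)
```

For example, a 1,000-user load test spread over 3 agents gives a pool size of 334 per agent, while 100 users on 4 agents stays at the default of 50.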
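To make the static-collection pattern from item 1b concrete, here is the same anti-pattern expressed in Python, where a class-level attribute behaves much like a .NET static field: it is shared by every instance and lives as long as the process. The class and method names are hypothetical stand-ins for a Web test plug-in, not the VSTS plug-in API. The bounded variant shows one possible fix (keeping only the most recent items); in your own .NET code, simply not making the collection static, or clearing it, may be the better fix.

```python
from collections import deque


class LeakyPlugin:
    # Class-level list: like a .NET static field, it is shared by every
    # instance and lives for the life of the process, so everything appended
    # to it stays reachable and can never be garbage collected.
    seen_responses = []

    def on_request_completed(self, body):
        # Grows by one item per request for the whole load test.
        LeakyPlugin.seen_responses.append(body)


class BoundedPlugin:
    # One possible fix: cap the collection so old items are discarded and
    # memory usage levels off instead of climbing.
    MAX_ITEMS = 100
    recent_responses = deque(maxlen=MAX_ITEMS)

    def on_request_completed(self, body):
        # deque(maxlen=...) drops the oldest item once the cap is reached.
        BoundedPlugin.recent_responses.append(body)
```

After 1,000 simulated requests, the leaky version holds all 1,000 response bodies while the bounded version holds only the last 100; under a real load test the first pattern is exactly what shows up as a steadily rising Virtual Bytes counter.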
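Sizing the number of agents for case 2 is also arithmetic once you know how many virtual users one agent can sustain without approaching the 2GB limit (for example, the highest user load in a run like the step-load experiment above at which Virtual Bytes stayed flat and well below 2GB). The Python sketch below assumes that per-agent ceiling is something you have measured yourself, and that users are spread roughly evenly across agents; both helper names are hypothetical.

```python
def agents_needed(total_user_load, max_users_per_agent):
    """Smallest number of agents that keeps each one at or below the ceiling.

    max_users_per_agent is a hypothetical, measured value: the largest load
    one agent handled without Virtual Bytes approaching the 2GB limit.
    """
    if max_users_per_agent < 1:
        raise ValueError("max_users_per_agent must be at least 1")
    return -(-total_user_load // max_users_per_agent)  # ceiling division


def users_per_agent(total_user_load, agent_count):
    """Load on the busiest agent if users are spread as evenly as possible."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    return -(-total_user_load // agent_count)  # ceiling division
```

With a measured ceiling of 300 users per agent, a 1,000-user test needs 4 agents, which puts about 250 users on each.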