Web App Performance Counters Compilation

To measure the performance of a common web application (a web frontend connected to a database server), which performance counters do we have to pay attention to? If we are trying to diagnose a performance problem, where do we start? What does each performance counter mean? When should I be worried about a specific counter?

Well, there are hundreds of performance counters available, many of them related to one another, and it can be a little tricky to select which ones to gather. The performance of any application is determined by the throughput of the following computing elements: CPU, memory, I/O and network. This also means that if an application is not performing well, the bottlenecks will be located in one or more of these elements. So these are the elements that, in general, we have to monitor on any machine where our application is deployed.

Beyond that, each of the machines involved in the deployment plays a different role, so different counters must be gathered on each one. Our sample web application uses a web frontend and a database server, so we need to pay attention to the counters specific to each of these roles.

Below, I have gathered from different sources a compilation of performance counters and their meanings that can be used as a starting point to measure the performance of your web application or to diagnose an issue in it. There are three blocks: General Counters, Web Server Counters and SQL Server Counters.

General Counters

Counters to be gathered in all servers.

Category

Object/Counter

Description

Recommendations

Processor

Processor : % Processor Time

 

 

 

Processor : % Total User Time

The value of this counter helps to determine the kind of  processing that is affecting the system. Of course the resulting value is the  total amount of non-idle time that was spent on User mode operations. This  generally means application code.

 

 

System : % Total Privileged Time

This is the amount of time the processor was busy with  Kernel mode operations. If the processor is very busy and this mode is high,  it is usually an indication of some type of NT service having difficulty,  although user mode programs can make calls to the Kernel mode NT components  to occasionally cause this type of performance issue.

 

 

System : Processor Queue Length

Oddly enough, this processor counter shows up under the System object, but not without good reason. There is only one queue for tasks that need to go to the processor, even if there is more than one CPU.

The resulting value is a measure of how many threads are in  the Ready state waiting to be processed. When dealing with queues, if the  value exceeds 2 for a sustained period, you are definitely having a problem  with the resource in question.

Memory

Memory\Pages/sec

The number of pages read from or written to disk to resolve  hard page faults. This counter serves as a primary indicator of the types of  faults that cause system-wide delays.

Although it is normal to have some spikes, this counter  generally remains at or close to zero

 

Memory : Page Faults/sec

This counter gives a general idea of how many times  information being requested is not where the application (and VMM) expects it  to be. The information must either be retrieved from another location in  memory or from the pagefile.

Recall that while a sustained value may indicate trouble  here, you should be more concerned with hard page faults that represent  actual reads or writes to the disk. Remember that the disk access is much  slower than RAM.

 

Memory : Page Reads/sec

This counter is probably the best indicator of a memory  shortage because it indicates how often the system is reading from disk  because of hard page faults. The system is always using the pagefile even if  there is enough RAM to support all of the applications. Thus, some number of  page reads will always be encountered

A sustained value over 5 Page Reads/sec is often a strong indicator of a memory shortage. You must be careful when viewing these counters to understand what they are telling you. This counter again indicates the number of reads from the disk that were done to satisfy page faults. The number of pages read each time the system went to the disk may indeed vary. This will be a function of the application and the proximity of the data on the hard drive. Regardless of these facts, a sustained value of over 5 is still a strong indicator of a memory problem. Remember the importance of "sustained." System operations often fluctuate, sometimes widely. So, just because the system has a Page Reads/sec of 24 for a couple of seconds does not mean you have a memory shortage.
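The difference between a spike and a sustained value can be made concrete with a small check. This is only a sketch: the function name `is_sustained`, the sample lists and the window of 10 consecutive samples are my own assumptions for illustration; only the threshold of 5 comes from the recommendation above.

```python
def is_sustained(samples, threshold, window):
    """Return True if `window` consecutive samples all exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False

# Hypothetical Page Reads/sec samples, taken once per second.
spike = [1, 0, 24, 24, 2, 0, 1, 0, 3, 1, 0, 2]        # 24 for a couple of seconds
shortage = [6, 8, 7, 9, 12, 7, 6, 8, 9, 10, 7, 6]     # above 5 the whole time

print(is_sustained(spike, threshold=5, window=10))
print(is_sustained(shortage, threshold=5, window=10))
```

With these sample lists, only the second series is flagged. How long "sustained" should be is a judgment call that depends on the sampling interval.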

 

Memory : Page Writes/sec

Much like the Page Reads/sec, this counter indicates how  many times the disk was written to in an effort to clear unused items out of  memory

The numbers of pages per read may change. Increasing values  in this counter often indicate a building tension in the battle for memory  resources

 

Memory : Available MB

This counter indicates the amount of memory that is left  after nonpaged pool allocations, paged pool allocations, process' working  sets, and the file system cache have all taken their piece

 

Disk

PhysicalDisk : Current Disk Queue Length

This counter provides a primary measure of disk congestion.  Just as the processor queue was an indication of waiting threads, the disk  queue is an indication of the number of transactions that are waiting to be  processed.

Recall that the queue is an important measure for services  that operate on a transaction basis. Just like the line at the supermarket,  the queue will be representative of not only the number of transactions, but  also the length and frequency of each transaction

 

PhysicalDisk : % Disk Time

Much like % Processor time, this counter is a general mark  of how busy the disk is. You will see many similarities between the disk and  processor since they are both transaction-based services

This counter indicates a disk problem, but must be observed  in conjunction with the Current Disk Queue Length counter to be truly  informative. Recall also that the disk could be a bottleneck prior to the %  Disk Time reaching 100%.

 

PhysicalDisk : Avg. Disk Queue Length

This counter is actually strongly related to the %Disk Time  counter. This counter converts the %Disk Time to a decimal value and displays  it. This counter will be needed in times when the disk configuration employs  multiple controllers for multiple physical disks. In these cases, the overall  performance of the disk I/O system, which consists of two controllers, could  exceed that of an individual disk.

 If you were looking  at the %Disk Time counter, you would only see a value of 100%, which wouldn't  represent the total potential of the entire system, but only that it had  reached the potential of a single disk on a single controller. The real value  may be 120% which the Avg. Disk Queue Length counter would display as 1.2.

 

The number of requests should not exceed two times the  number of spindles constituting the physical disk. If the number of requests  is too high, you can add additional disks or replace the existing disks with  faster disks.
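The arithmetic behind these two rules is simple enough to sketch. The helper names below are hypothetical, and the figures are the ones used in the description above (120% shown as 1.2) plus an assumed two-spindle disk.

```python
def queue_length_from_disk_time(disk_time_percent):
    """Express a % Disk Time figure as the equivalent decimal queue length."""
    return disk_time_percent / 100.0

def queue_is_too_long(avg_queue_length, spindles):
    """Rule of thumb from the text: no more than two requests per spindle."""
    return avg_queue_length > 2 * spindles

# 120% of a single disk corresponds to an average queue length of 1.2.
print(queue_length_from_disk_time(120))

# Assumed example: a physical disk made of 2 spindles, so the limit is 4.
print(queue_is_too_long(1.2, spindles=2))
print(queue_is_too_long(5.0, spindles=2))
```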

 

PhysicalDisk: Disk Transfers/sec

The rates of read and write operations on the disk. Define  a counter for each physical disk on the server.

 

Network

Network Interface\Bytes total/sec

Number of bytes traveling over the network interface per  second. This counter only reflects the local network connection.

If this value stays below 50 percent of your available  network bandwidth, the network adapter on the server running SQL Server 2000  is not likely to cause any performance bottlenecks.
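To compare the counter against the 50 percent mark, the byte rate has to be converted to bits and divided by the link speed. A minimal sketch, assuming a hypothetical 100 Mbit/s adapter and made-up traffic figures:

```python
def network_utilization(bytes_total_per_sec, link_speed_bits_per_sec):
    """Fraction of the link in use: counter is in bytes, link speed in bits."""
    return (bytes_total_per_sec * 8) / link_speed_bits_per_sec

LINK_SPEED = 100_000_000  # assumed 100 Mbit/s adapter

for rate in (5_000_000, 9_000_000):  # hypothetical Bytes Total/sec readings
    utilization = network_utilization(rate, LINK_SPEED)
    status = "ok" if utilization < 0.5 else "investigate"
    print(f"{rate} bytes/sec -> {utilization:.0%} ({status})")
```

Note that this treats the nominal link speed as the available bandwidth, which is itself a simplification.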

 

Web Server Counters

Counters to be gathered on the servers acting as web frontends, in addition to the general counters.

Category

Object/Counter

Description

Recommendations

Process

Process(aspnet_wp)\% Processor Time

 

 

 

Process(aspnet_wp)\ Private Bytes

 

 

 

Process(aspnet_wp)\ Virtual Bytes

 

 

 

Process(aspnet_wp)\ Handle Count

 

 

.NET

.NET CLR Exceptions\# Exceps thrown / sec

The total number of managed exceptions thrown per second

As this number increases, performance degrades. Exceptions  should not be thrown as part of normal processing. Note, however, that  Response.Redirect, Server.Transfer, and Response.End all cause a  ThreadAbortException to be thrown multiple times, and a site that relies  heavily upon these methods will incur a performance penalty. If you must use  Response.Redirect, call Response.Redirect(url, false), which does not call  Response.End, and hence does not throw. The downside is that the user code  that follows the call to Response.Redirect(url, false) will execute. It is  also possible to use a static HTML page to redirect. Microsoft Knowledge Base  Article 312629 provides further detail.

 

Threshold: 5% of RPS (requests per second). Values greater than this should be investigated, and a new threshold should be set as necessary.
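Because the threshold is relative to the request rate, the two counters have to be read together. The sketch below shows one way to apply the 5% rule; the function name and the sample figures are assumptions for illustration.

```python
def exception_rate_exceeded(exceptions_per_sec, requests_per_sec, ratio=0.05):
    """True if exceptions/sec is above `ratio` of the requests/sec."""
    if requests_per_sec <= 0:
        # No traffic: any exception at all is worth a look.
        return exceptions_per_sec > 0
    return exceptions_per_sec > ratio * requests_per_sec

# Hypothetical readings: 200 requests/sec gives a limit of 10 exceptions/sec.
print(exception_rate_exceeded(8, 200))
print(exception_rate_exceeded(15, 200))
```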

 

.NET CLR Security(_Global_)\% Time in RT checks

Displays the percentage of elapsed time spent performing  runtime code access security checks since the last sample. This counter is  updated at the end of a .NET Framework security check. It is not an average;  it represents the last observed value

 

 

.NET CLR Memory\% Time in GC

The percentage of time spent performing the last garbage  collection. An average value of 5% or less would be considered healthy, but  spikes larger than this are not uncommon. Note that all threads are suspended  during a garbage collection

Threshold: an average of 5% or less; short-lived spikes  larger than this are common. Average values greater than this should be  investigated. A new threshold should be set as necessary
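Since this threshold applies to the average and not to individual readings, a check should look at the mean of a series of samples. A small sketch with made-up % Time in GC readings (the function name is hypothetical):

```python
from statistics import mean

def gc_time_healthy(samples, average_limit=5.0):
    """True if the average % Time in GC is at or below the limit."""
    return mean(samples) <= average_limit

# Hypothetical readings: one spike of 18% but a low average.
with_spike = [2, 3, 1, 18, 2, 4, 3, 2]
# Hypothetical readings: no big spike, but consistently high.
consistently_high = [7, 8, 6, 9, 7, 8, 6, 7]

print(gc_time_healthy(with_spike))
print(gc_time_healthy(consistently_high))
```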

ASP.NET

ASP.NET\Application Restarts

The number of application restarts. Recreating the application domain and recompiling pages takes time, therefore unforeseen application restarts should be investigated. The application domain is unloaded when one of the following occurs:

    • Modification of machine.config, web.config, or global.asax.
    • Modification of the application's bin directory or its contents.
    • When the number of compilations (ASPX, ASCX, or ASAX) exceeds the limit specified by <compilation numRecompilesBeforeAppRestart=/>.
    • Modification of the physical path of a virtual directory.
    • Modification of the code-access security policy.
    • The Web service is restarted.

Threshold: 0. In a perfect world, the application domain  will survive for the life of the process. Excessive values should be  investigated, and a new threshold should be set as necessary.

 

 

ASP.NET\Requests Rejected

The number of requests rejected.

 

 

ASP.NET\Worker Process Restarts

The number of aspnet_wp process restarts.

Threshold: 1. Process restarts are expensive and  undesirable. Values are dependent upon the process model configuration  settings, as well as unforeseen access violations, memory leaks, and  deadlocks. If aspnet_wp restarts, an Application Event Log entry will  indicate why. Requests will be lost if an access violation or deadlock  occurs. If process model settings are used to preemptively recycle the  process, it will be necessary to set an appropriate threshold.

 

 

ASP.NET\Request Execution Time

The number of milliseconds taken to execute the last  request. In version 1.0 of the Framework, the execution time begins when the  worker process receives the request, and stops when the ASP.NET ISAPI sends  HSE_REQ_DONE_WITH_SESSION to IIS. For IIS version 5, this includes the time  taken to write the response to the client, but for IIS version 6, the  response buffers are sent asynchronously, and so the time taken to write the  response to the client is not included. Thus on IIS version 5, a client with  a slow network connection will increase the value of this counter  considerably.

 

 

ASP.NET\Requests Current

The number of requests currently handled by the ASP.NET  ISAPI. This includes those that are queued, executing, or waiting to be  written to the client

 

 

ASP.NET\Requests Queued

The number of requests currently queued

Threshold: 0. The value of this counter should be 0. Values  greater than this should be investigated

 

ASP.NET\Request Wait Time

The number of milliseconds that the most recent request  spent waiting in the queue, or named pipe, that exists between inetinfo and  aspnet_wp (see description of Requests Queued). This does not include any  time spent waiting in the application queues.

Threshold: 1000. The average request should spend 0  milliseconds waiting in the queue.

 

ASP.NET Applications(__Total__)\Requests Total

The number of requests since the application was started

 

 

ASP.NET Applications(__Total__)\Requests/Sec

The number of requests executed per second. I prefer  "Web Service\ISAPI Extension Requests/sec" because it is not  affected by application restarts

 

 

ASP.NET Applications(__Total__)\Errors Total

The sum of Errors During Preprocessing, Errors During  Compilation, and Errors During Execution

 

 

ASP.NET Applications(__Total__)\Errors Total/Sec

The total of Errors During Preprocessing, Errors During  Compilation, and Errors During Execution per second.

 

 

ASP.NET Applications(__Total__)\Cache API Entries

The number of entries currently in the user cache.

 

 

ASP.NET Applications(__Total__)\Cache API Hit Ratio

The total hit-to-miss ratio of User Cache requests.

 

 

ASP.NET Applications(__Total__)\Cache API Turnover Rate

The number of additions and removals to the user cache per  second. A high turnover rate indicates that items are being quickly added and  removed, which can be expensive.

 

 

ASP.NET Applications(__Total__)\Cache Total Entries

The current number of entries in the cache (both User and  Internal). Internally, ASP.NET uses the cache to store objects that are  expensive to create, including configuration objects, preserved assembly  entries, paths mapped by the MapPath method, and in-process session state  objects.

Note   The  "Cache Total" family of performance counters is useful for  diagnosing issues with in-process session state. Storing too many objects in  the cache is often the cause of memory leaks

 

ASP.NET Applications(__Total__)\Cache Total Hit Ratio

The total hit-to-miss ratio of all cache requests (both user and internal).

 

 

ASP.NET Applications(__Total__)\Cache Total Turnover Rate

The number of additions and removals to the cache per second (both user and internal). A high turnover rate indicates that items are being quickly added and removed, which can be expensive.

 

IIS

Web Service(_Total)\Current Connections

A threshold for this counter is dependent upon many  variables, such as the type of requests (ISAPI, CGI, static HTML, and so on),  CPU utilization, and so on. A threshold should be  developed through experience.

 

 

Web Service(_Total)\ISAPI Extension Requests/sec

Used primarily as a metric for diagnosing performance  issues. It can be interesting to compare this with "ASP.NET  Applications\Requests/sec" and "Web Service\Total Method  Requests/sec." Note that this includes requests to all ISAPI extensions,  not just aspnet_isapi.dll

 

 

SQL Server Counters

Counters to be gathered on the servers hosting the SQL Server service, in addition to the general counters.

Category

Object/Counter

Description

Recommendations

SQL Server

SQLServer:Memory Manager\Total Server Memory

The total memory in use by SQL Server.

Add memory to the server if this value is generally higher  than the amount of physical memory in the server.

 

SQLServer:Access Methods\Full Scans/sec

The number of unrestricted full scans. These can either be  base table or full index scans.

 

 

SQLServer:Buffer Manager\Buffer Cache Hit Ratio

The percentage of pages that were found in the buffer pool  without having to incur a read from disk.

When this percentage is high, your server is operating at  optimal disk I/O efficiency. If this value decreases over time, you might  consider adding physical memory to your server.

 

SQLServer:Databases\Log Growths

The total number of log growths for the selected database.

Run against your application database instance

 

SQLServer:Databases(Application Database)\Percent Log Used

The percent of space in the log that is in use.

Run against your application database instance

 

SQLServer:Databases(Application Database)\Transactions/sec

The number of transactions started for the database.

Run against your application database instance.

 

SQLServer:General Statistics\User Connections

The number of users connected to the system.

Research any dramatic shifts in this value

 

SQLServer:Latches\Average Latch Wait Time

The average latch wait time, in milliseconds, for latch  requests that had to wait.

If this number is high, your server might have resource  limitations.

 

SQLServer:Locks\Average Wait Time

The average amount of wait time, in milliseconds, for each  lock request that resulted in a wait.

 

 

SQLServer:Locks\Lock Waits/sec

The number of lock requests that could not be satisfied  immediately and required the caller to wait before the lock was granted.

 

 

SQLServer:Locks\Number of Deadlocks/sec

The number of lock requests that resulted in a deadlock.

 

 

SQLServer:Memory Manager\Memory Grants Pending

The current number of processes waiting for a workspace  memory grant.

 

 

SQLServer:SQL Statistics\Batch Requests/sec

This counter measures the number of batch requests that SQL Server receives per second, and generally tracks how busy your server’s CPUs are.

Generally speaking, over 1000 batch requests per second indicates a very busy SQL Server, and could mean that if you are not already experiencing a CPU bottleneck, you may very well experience one soon. Of course, this is a relative number, and the bigger your hardware, the more batch requests per second SQL Server can handle.

 

SQLServer:SQL Statistics\SQL Compilations/sec

How many compilations are performed by SQL Server per  second

If you find that your server is performing over 100  compilations per second, you should take the time to investigate if the cause  of this is something that you can control. Too many compilations will hurt  your SQL Server’s performance

 

SQLServer:SQL Statistics\SQL Re-Compilations/sec

Number of statement recompiles per second. Counts the  number of times statement recompiles are triggered.

Generally, you want the recompiles to be low
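One way to actually gather the counters listed in the three blocks is the Windows `typeperf` command-line tool, which can sample a list of counter paths at a fixed interval and write them to a CSV file. The sketch below is not a complete collector. It only builds the command line and parses CSV text of the shape typeperf writes. The counter selection, the machine name `WEB01`, the file name `counters.csv`, both helper names and the sample log content are assumptions made for illustration.

```python
import csv
import io

# Counter paths in typeperf syntax; the selection is only an example.
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\Memory\Pages/sec",
]

def build_typeperf_command(counters, interval_sec=5, samples=12, output="counters.csv"):
    """Return the typeperf argument list; nothing is executed here."""
    return ["typeperf", *counters,
            "-si", str(interval_sec),   # seconds between samples
            "-sc", str(samples),        # number of samples to collect
            "-f", "CSV", "-o", output]  # write CSV to the given file

def parse_typeperf_csv(text):
    """Parse CSV text into {counter_path: [float or None, ...]}."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    series = {name: [] for name in header[1:]}  # first column is the timestamp
    for row in data:
        for name, cell in zip(header[1:], row[1:]):
            try:
                series[name].append(float(cell.strip()))
            except ValueError:
                # Blank cells (for example the first sample of a rate counter).
                series[name].append(None)
    return series

# Hypothetical log content, shaped like typeperf CSV output.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0)","\\WEB01\System\Processor Queue Length","\\WEB01\Memory\Pages/sec"',
    r'"01/01/2024 10:00:00.000","1.000000"," "',
    r'"01/01/2024 10:00:05.000","3.000000","0.400000"',
])

print(build_typeperf_command(COUNTERS))
for name, values in parse_typeperf_csv(SAMPLE).items():
    print(name, values)
```

On a Windows machine the argument list could be handed to `subprocess.run`, and the resulting file read back with `parse_typeperf_csv`. The per-counter lists can then be checked against the thresholds in the tables above.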