Comparing Linux/Unix and Windows Performance Counters on Microsoft Azure

 

Wow… has it really been over two years since I wrote a blog post? Nobody out there is reaching out to me and keeping me honest here?  Oh my.
That’s okay though, all the internal work I have been doing has certainly kept me heads-down in many cases, and we have expanded the Azure platform exponentially over the past two years.  Just wait until my Azure peeps out there see what is coming. Stay tuned to those Build and Ignite announcements, and of course the Azure blog and perhaps this little corner of cyberspace.  :)

Well, I’m going to bring forth a post I was working on and never got around to actually publishing.  This should help those of you in the dual-OS world who are collecting diagnostics information from both Windows and Linux virtual machines on Microsoft Azure.  Let’s start off with the Linux/Unix side with sar, and then look at Windows performance counters. You will see that there is clearly a difference in the counters, and in many cases they don't map to each other 1:1.  This is because of differences in the kernels, the effect of running on a hypervisor-based cloud platform, and in some cases the way each OS accumulates its counters.  This is good information to be aware of.

SYSSTAT - SAR Returned System Metrics

sysstat is a popular system performance toolset for Linux and Unix administrators. The sysstat package contains the sar, mpstat and iostat commands for Linux. The sar command collects and reports system activity information. The iostat command reports CPU utilization and I/O statistics for disks. The mpstat command reports global and per-processor statistics. sar is mainly used to collect, report, or save system activity information: it writes the contents of selected cumulative activity counters in the operating system to standard output. If multiple samples and multiple reports are desired, it is convenient to specify an output file for the sar command. Collecting data this way is useful for characterizing system usage over a period of time and determining peak usage hours.
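Since sar writes its report to standard output, the numbers are easy to pick up from a script. Here is a minimal sketch in Python, assuming the usual single-report layout (a banner line, a column header, data rows, then an "Average:" row). The function name and the sample text are made up for illustration; the sample is not a real measurement.

```python
def parse_sar_average(report: str) -> dict:
    """Return {column_name: value} from the 'Average:' row of one sar report."""
    lines = [ln for ln in report.splitlines() if ln.strip()]
    # Assumed layout: banner line first, column header second, data rows after.
    header = lines[1].split()
    average = next(ln for ln in lines if ln.startswith("Average:")).split()
    values = {}
    # Align columns from the right, because the timestamp on the left may take
    # one token (24-hour clock) or two (time plus AM/PM).
    for name, raw in zip(reversed(header), reversed(average)):
        try:
            values[name] = float(raw)
        except ValueError:
            continue  # skip non-numeric cells such as "all" or "Average:"
    return values


# Illustrative sample text only, not a real measurement.
SAMPLE = """\
Linux 5.4.0 (examplehost)  01/01/2024  _x86_64_  (2 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:00:02 AM     all      1.00      0.00      0.50      0.00      0.00     98.50
Average:        all      1.00      0.00      0.50      0.00      0.00     98.50
"""

if __name__ == "__main__":
    print(parse_sar_average(SAMPLE))
```

In practice the text would come from running sar itself (for example through Python's subprocess module) rather than from a hard-coded string.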

sar -u 1 1
This gives the cumulative real-time CPU usage of all CPUs. The trailing “1 1” means: sample at a 1-second interval, 1 time in total. Most likely you’ll focus on the last field, “%idle”, to see the CPU load.
%user, %nice, %system, %iowait, %steal, %idle

§ %user  - Percentage of CPU utilization that occurred while executing at the user level (application). Note that this field includes time spent running virtual processors.

§ %nice  - Percentage of CPU utilization that occurred while executing at the user level with nice priority.

§ %system  - Percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this field includes time spent servicing hardware and software interrupts.

§ %iowait  - Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

§ %steal  - Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

§ %idle  - Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
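Going by the definitions above, the six fields fall into three groups: time spent executing, time spent waiting, and true idle time. A small sketch of that grouping follows; the function name and the sample values are hypothetical, and the grouping is my reading of the field descriptions, not something sar reports itself.

```python
def summarize_cpu(fields: dict) -> dict:
    """Group the sar -u fields into executing, waiting, and idle percentages."""
    executing = fields["%user"] + fields["%nice"] + fields["%system"]
    # %iowait is idle time with outstanding disk I/O; %steal is involuntary
    # wait while the hypervisor services another virtual processor.
    waiting = fields["%iowait"] + fields["%steal"]
    return {
        "executing": round(executing, 2),
        "waiting": round(waiting, 2),
        "idle": round(fields["%idle"], 2),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sample = {"%user": 12.0, "%nice": 0.5, "%system": 3.5,
              "%iowait": 2.0, "%steal": 1.0, "%idle": 81.0}
    print(summarize_cpu(sample))
```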

sar -u ALL 1 1
Same as “sar -u” but displays additional fields.
%usr, %nice, %sys, %iowait, %steal, %irq, %soft, %guest, %idle

§ %usr - Percentage of CPU utilization that occurred while executing at the user level (application). Note that this field does NOT include time spent running virtual processors.

§ %nice  - Percentage of CPU utilization that occurred while executing at the user level with nice priority.

§ %sys - Percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this field does NOT include time spent servicing hardware and software interrupts.

§ %iowait  - Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

§ %steal  - Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

§ %irq - Percentage of time spent by the CPU or CPUs to service hardware interrupts.

§ %soft – Percentage of time spent by the CPU or CPUs to service software interrupts.

§ %guest - Percentage of time spent by the CPU or CPUs to run a virtual processor.

§ %idle  - Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
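The field notes above imply how the two views relate: %user from plain “sar -u” includes virtual processor time, while %usr does not, and %system includes interrupt time, while %sys does not. A rough sketch of folding the detailed fields back into the short form, assuming only the columns listed in this post (some sysstat versions print extra columns such as %gnice, which this ignores):

```python
def collapse_all_fields(all_fields: dict) -> dict:
    """Approximate the plain 'sar -u' %user and %system from 'sar -u ALL' fields."""
    return {
        # %usr excludes virtual processor time, so add %guest back in.
        "%user": round(all_fields["%usr"] + all_fields["%guest"], 2),
        # %sys excludes interrupt servicing, so add %irq and %soft back in.
        "%system": round(
            all_fields["%sys"] + all_fields["%irq"] + all_fields["%soft"], 2
        ),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sample = {"%usr": 10.0, "%guest": 2.0, "%sys": 3.0, "%irq": 0.25, "%soft": 0.25}
    print(collapse_all_fields(sample))
```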

sar -w 1 1
Displays context switches per second: this reports task creation and system switching activity, i.e. the total number of tasks created per second and the total number of context switches per second. “1 1” samples at a 1-second interval, 1 time in total.
proc/s, cswch/s

§ proc/s - Total number of tasks created per second.

§ cswch/s  - Total number of context switches per second.

sar -r 1 1
Memory Free and Used: This reports the memory utilization statistics. “1 1” samples at a 1-second interval, 1 time in total. Most likely you’ll focus on “kbmemfree” and “kbmemused” for free and used memory.
kbmemfree, kbmemused, %memused, kbbuffers, kbcached, kbcommit, %commit

§ kbmemfree  - Amount of free memory available in kilobytes.

§ kbmemused - Amount of used memory in kilobytes. This does not take into account memory used by the kernel itself.

§ %memused  - Percentage of used memory.

§ kbbuffers  - Amount of memory used as buffers by the kernel in kilobytes.

§ kbcached  - Amount of memory used to cache data by the kernel in kilobytes.

§ kbcommit  - Amount of memory in kilobytes needed for the current workload. This is an estimate of how much RAM/swap is needed to guarantee that the system never runs out of memory.

§ %commit  - Percentage of memory needed for current workload in relation to the total amount of memory (RAM+swap). This number may be greater than 100% because the kernel usually overcommits memory.
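To make the %commit definition concrete, here is a small sketch of the arithmetic, assuming you already know the total RAM and swap sizes in kilobytes. The function name and the sample values are hypothetical.

```python
def commit_percent(kbcommit: float, ram_kb: float, swap_kb: float) -> float:
    """%commit: memory needed for the workload relative to RAM plus swap."""
    total_kb = ram_kb + swap_kb
    if total_kb <= 0:
        raise ValueError("RAM plus swap must be positive")
    return round(100.0 * kbcommit / total_kb, 2)


if __name__ == "__main__":
    # Hypothetical values: 6,000,000 KB committed on 4,000,000 KB of RAM plus
    # 1,000,000 KB of swap. The kernel overcommits, so this exceeds 100%.
    print(commit_percent(6_000_000, 4_000_000, 1_000_000))
```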

sar -b 1 1
This reports I/O statistics. “1 1” samples at a 1-second interval, 1 time in total.
tps, rtps, wtps, bread/s, bwrtn/s

§ tps – Transactions per second (this includes both read and write)

§ rtps – Read transactions per second

§ wtps – Write transactions per second

§ bread/s – Blocks read per second (a block is a 512-byte sector on 2.4 and newer kernels)

§ bwrtn/s – Blocks written per second (same 512-byte block size)
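The sysstat manual page defines bread/s and bwrtn/s in blocks, where a block is a 512-byte sector on 2.4 and newer kernels. To line these up against byte-based counters such as the Windows PhysicalDisk ones, a conversion along these lines is needed. This is only a sketch; the function name and the sample value are hypothetical.

```python
SECTOR_BYTES = 512  # block size per the sysstat manual page (kernel 2.4+)


def blocks_to_bytes_per_sec(blocks_per_sec: float) -> float:
    """Convert sar -b bread/s or bwrtn/s (blocks/s) to bytes per second."""
    return blocks_per_sec * SECTOR_BYTES


if __name__ == "__main__":
    # Hypothetical reading: 2048 blocks/s.
    print(blocks_to_bytes_per_sec(2048.0))
```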

sar -q 1 1
Reports run queue and load average: This reports the run queue size and the load average over the last 1 minute, 5 minutes, and 15 minutes. “1 1” samples at a 1-second interval, 1 time in total.
runq-sz, plist-sz, ldavg-1, ldavg-5, ldavg-15

§ runq-sz - Run queue length (number of tasks waiting for run time).

§ plist-sz  - Number of tasks in the process list.

§ ldavg-1  - System load average for the last minute. The load average is calculated as the average number of runnable or running tasks (R state), and the number of tasks in uninterruptible sleep (D state) over the specified interval.

§ ldavg-5  - System load average for the past 5 minutes.

§ ldavg-15  - System load average for the past 15 minutes.
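Load averages are system-wide, so they are usually read relative to the number of CPUs. A minimal sketch of that normalization follows; the function name is hypothetical, and the CPU count falls back to whatever the local machine reports when it is not supplied.

```python
import os


def load_per_cpu(ldavg: float, cpu_count=None) -> float:
    """Divide a load average by the CPU count to get a per-CPU figure."""
    cpus = cpu_count or os.cpu_count() or 1
    return round(ldavg / cpus, 2)


if __name__ == "__main__":
    # Hypothetical reading: ldavg-1 of 3.0 on a 4-CPU virtual machine.
    print(load_per_cpu(3.0, 4))
```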

sar -n SOCK 1 1
Report network statistics: This reports various network statistics, for example the number of packets received (transmitted) through the network card, statistics of packet failure, etc. SOCK – Displays sockets in use for IPv4. “1 1” samples at a 1-second interval, 1 time in total.
totsck, tcpsck, udpsck, rawsck, ip-frag, tcp-tw

§ totsck  - Total number of sockets used by the system.

§ tcpsck  - Number of TCP sockets currently in use.

§ udpsck  - Number of UDP sockets currently in use.

§ rawsck  - Number of RAW sockets currently in use.

§ ip-frag  - Number of IP fragments currently in queue.

§ tcp-tw - Number of TCP sockets in TIME_WAIT state.

sar -n DEV 1 1
Report network statistics: This reports various network statistics, for example the number of packets received (transmitted) through the network card, statistics of packet failure, etc. DEV – Displays network device vital statistics for eth0, eth1, etc. “1 1” samples at a 1-second interval, 1 time in total.
rxpck/s, txpck/s, rxkB/s, txkB/s, rxcmp/s, txcmp/s, rxmcst/s

§ rxpck/s - Total number of packets received per second.

§ txpck/s  - Total number of packets transmitted per second.

§ rxkB/s  - Total number of kilobytes received per second.

§ txkB/s  - Total number of kilobytes transmitted per second.

§ rxcmp/s  - Number of compressed packets received per second (for cslip etc.).

§ txcmp/s  - Number of compressed packets transmitted per second.

§ rxmcst/s  - Number of multicast packets received per second.
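sar reports network throughput in kilobytes per second, while the Windows network counters covered later in this post are in bytes per second. A small conversion sketch, assuming 1 kB here means 1024 bytes (an assumption on my part about how sysstat scales the value); the function name and sample value are hypothetical.

```python
BYTES_PER_KB = 1024  # assumption: sysstat scales bytes by 1024 for rxkB/s, txkB/s


def kb_to_bytes_per_sec(kb_per_sec: float) -> float:
    """Convert sar -n DEV rxkB/s or txkB/s to bytes per second."""
    return kb_per_sec * BYTES_PER_KB


if __name__ == "__main__":
    # Hypothetical reading: 250 kB/s received.
    print(kb_to_bytes_per_sec(250.0))
```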

sar -B 1 1
This reports the paging statistics, i.e. the number of KB paged in from (and out to) disk per second. “1 1” samples at a 1-second interval, 1 time in total.
pgpgin/s, pgpgout/s, fault/s, majflt/s, pgfree/s, pgscank/s, pgscand/s, pgsteal/s, %vmeff

§ pgpgin/s  - Total number of kilobytes the system paged in from disk per second.

§ pgpgout/s  - Total number of kilobytes the system paged out to disk per second.

§ fault/s  - Number of page faults (major + minor) made by the system per second. This is not a count of page faults that generate I/O, because some page faults can be resolved without I/O.

§ majflt/s  - Number of major faults the system has made per second, those which have required loading a memory page from disk.

§ pgfree/s - Number of pages placed on the free list by the system per second.

§ pgscank/s - Number of pages scanned by the kswapd daemon per second.

§ pgscand/s - Number of pages scanned directly per second.

§ pgsteal/s - Number of pages the system has reclaimed from cache (pagecache and swapcache) per second to satisfy its memory demands.

§ %vmeff - Calculated as pgsteal / pgscan, this is a metric of the efficiency of page reclaim. If it is near 100% then almost every page coming off the tail of the inactive list is being reaped. If it gets too low (e.g. less than 30%) then the virtual memory is having some difficulty. This field is displayed as zero if no pages have been scanned during the interval of time.   
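The %vmeff description above translates directly into a few lines of arithmetic. A sketch, taking "pgscan" to mean pgscank/s plus pgscand/s (that reading is my assumption); the function name and sample values are hypothetical.

```python
def vm_efficiency(pgsteal: float, pgscank: float, pgscand: float) -> float:
    """%vmeff: pages reclaimed as a percentage of pages scanned."""
    scanned = pgscank + pgscand
    if scanned == 0:
        return 0.0  # displayed as zero when nothing was scanned in the interval
    return round(100.0 * pgsteal / scanned, 2)


if __name__ == "__main__":
    # Hypothetical readings: 900 pages reclaimed out of 1000 scanned.
    print(vm_efficiency(900, 800, 200))
```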

References:
Sysstat Utilities Manual Page - https://sebastien.godard.pagesperso-orange.fr/man_sar.html

Windows Performance Monitor Performance Counters

Processor and Processor Information
The Processor performance object consists of counters that measure aspects of processor activity. The processor is the part of the computer that performs arithmetic and logical computations, initiates operations on peripherals, and runs the threads of processes.  A computer can have multiple processors.  The processor object represents each processor as an instance of the object.

The Processor Information performance counter set consists of counters that measure aspects of processor activity. The processor is the part of the computer that performs arithmetic and logical computations, initiates operations on peripherals, and runs the threads of processes. A computer can have multiple processors. On some computers, processors are organized in NUMA nodes that share hardware resources such as physical memory. The Processor Information counter set represents each processor as a pair of numbers, where the first number is the NUMA node number and the second number is the zero-based index of the processor within that NUMA node. If the computer does not use NUMA nodes, the first number is zero.

§   Processor\% User Time is the percentage of elapsed time the processor spends in the user mode. User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems.  The alternative, privileged mode, is designed for operating system components and allows direct access to hardware and all memory.  The operating system switches application threads to privileged mode to access operating system services. This counter displays the average busy time as a percentage of the sample time.

§ Processor\% Processor Time is the percentage of elapsed time that the processor spends to execute a non-Idle thread. It is calculated by measuring the percentage of time that the processor spends executing the idle thread and then subtracting that value from 100%. (Each processor has an idle thread that consumes cycles when no other threads are ready to run). This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. It should be noted that the accounting calculation of whether the processor is idle is performed at an internal sampling interval of the system clock (10ms). On today’s fast processors, % Processor Time can therefore underestimate the processor utilization as the processor may be spending a lot of time servicing threads between the system clock sampling intervals. Workload based timer applications are one example of applications which are more likely to be measured inaccurately as timers are signaled just after the sample is taken.

§ Processor\% Idle Time is the percentage of time the processor is idle during the sample interval.

§ Processor\% Interrupt Time is the time the processor spends receiving and servicing hardware interrupts during sample intervals. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended during interrupts. Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity.  This counter displays the average busy time as a percentage of the sample time.

§ Processor\% Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode.  When a Windows system service is called, the service will often run in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults or interrupts. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.

§ Processor Information\% Priority Time is the percentage of elapsed time that the processor spends executing threads that are not low priority. It is calculated by measuring the percentage of time that the processor spends executing low priority threads or the idle thread and then subtracting that value from 100%. (Each processor has an idle thread to which time is accumulated when no other threads are ready to run). This counter displays the average percentage of busy time observed during the sample interval excluding low priority background work. It should be noted that the accounting calculation of whether the processor is idle is performed at an internal sampling interval of the system clock tick. % Priority Time can therefore underestimate the processor utilization as the processor may be spending a lot of time servicing threads between the system clock sampling intervals. Workload based timer applications are one example of applications which are more likely to be measured inaccurately as timers are signaled just after the sample is taken.

§ Processor\Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts. It does not include deferred procedure calls (DPCs), which are counted separately. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards, and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended. The system clock typically interrupts the processor every 10 milliseconds, creating a background of interrupt activity. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.
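Two calculations described in the list above come up again and again: "100% minus idle" for % Processor Time, and "difference between the last two samples divided by the sample interval" for the per-second counters. Here is a rough sketch of both, working on raw numbers rather than calling any Windows API; the function names and sample values are hypothetical.

```python
def processor_time_percent(idle_percent: float) -> float:
    """% Processor Time as described: 100% minus the idle-thread percentage."""
    return round(100.0 - idle_percent, 2)


def rate_per_second(prev_value: float, prev_time: float,
                    curr_value: float, curr_time: float) -> float:
    """Rate counter: change between two samples divided by the elapsed seconds."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (curr_value - prev_value) / elapsed


if __name__ == "__main__":
    # Hypothetical samples: a raw interrupt count of 1000 at t=10s, 1600 at t=12s.
    print(rate_per_second(1000, 10.0, 1600, 12.0))
    print(processor_time_percent(81.0))
```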

Compared to sysstat, the following sar counters have no comparable Windows performance counter when running within Microsoft Azure:

§ %nice  - nice is a utility program on Unix/Linux systems that maps to a kernel call of the same name.  It runs a program at a higher or lower priority than the default for other processes on the system.
§ %steal  - Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
§ %guest - Percentage of time spent by the CPU or CPUs to run a virtual processor.

Memory
The Memory performance object consists of counters that describe the behavior of physical and virtual memory on the computer.  Physical memory is the amount of random access memory on the computer.  Virtual memory consists of the space in physical memory and on disk.  Many of the memory counters monitor paging, which is the movement of pages of code and data between disk and physical memory.  Excessive paging, a symptom of a memory shortage, can cause delays which interfere with all system processes.

§ Memory\Available KBytes is the amount of physical memory, in Kilobytes, immediately available for allocation to a process or for system use. It is equal to the sum of memory assigned to the standby (cached), free and zero page lists.

§ Memory\Committed Bytes is the amount of committed virtual memory, in bytes. Committed memory is the physical memory which has space reserved on the disk paging file(s). There can be one or more paging files on each physical drive. This counter displays the last observed value only; it is not an average.

§ Memory\System Code Total Bytes is the size, in bytes, of the pageable operating system code currently mapped into the system virtual address space. This value is calculated by summing the bytes in Ntoskrnl.exe, Hal.dll, the boot drivers, and file systems loaded by Ntldr/osloader.  This counter does not include code that must remain in physical memory and cannot be written to disk. This counter displays the last observed value only; it is not an average.

§ Memory\Pool Nonpaged Bytes is the size, in bytes, of the non-paged pool, an area of the system virtual memory that is used for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated.  Memory\Pool Nonpaged Bytes is calculated differently than Process\Pool Nonpaged Bytes, so it might not equal Process(_Total)\Pool Nonpaged Bytes.  This counter displays the last observed value only; it is not an average.

§ Memory\Cache Bytes is the size, in bytes, of the portion of the system file cache which is currently resident and active in physical memory. The Cache Bytes and Memory\System Cache Resident Bytes counters are equivalent.  This counter displays the last observed value only; it is not an average.

§ Memory\Commit Limit is the amount of virtual memory that can be committed without having to extend the paging file(s).  It is measured in bytes. Committed memory is the physical memory which has space reserved on the disk paging files. (There can be one paging file on each logical drive.) If the paging file(s) are expanded, this limit increases accordingly.  This counter displays the last observed value only; it is not an average.

§ Memory\% Committed Bytes In Use is the ratio of Memory\Committed Bytes to the Memory\Commit Limit. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file.  (If the paging file is enlarged, the commit limit increases, and the ratio is reduced.) This counter displays the current percentage value only; it is not an average.

§ Memory\Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays.  It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec.  It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.

§ Memory\Page Faults/sec is the average number of pages faulted per second. It is measured in number of pages faulted per second because only one page is faulted in each fault operation, hence this is also equal to the number of page fault operations. This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle large numbers of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant delays.

§ Memory\Page Reads/sec is the rate at which the disk was read to resolve hard page faults. It shows the number of read operations, without regard to the number of pages retrieved in each operation. Hard page faults occur when a process references a page in virtual memory that is not in the working set or elsewhere in physical memory, and must be retrieved from disk. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It includes read operations to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files. Compare the value of Memory\Page Reads/sec to the value of Memory\Pages Input/sec to determine the average number of pages read during each operation.

§ Memory\Page Writes/sec is the rate at which pages are written to disk to free up space in physical memory. Pages are written to disk only if they are changed while in physical memory, so they are likely to hold data, not code.  This counter shows write operations, without regard to the number of pages written in each operation.  This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.
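Two of the relationships in the memory list above are simple ratios: committed bytes against the commit limit, and pages input against page read operations. A sketch of both, operating on counter values you have already collected; the function names and sample values are hypothetical.

```python
def committed_in_use_percent(committed_bytes: float, commit_limit: float) -> float:
    """% Committed Bytes In Use: Committed Bytes relative to Commit Limit."""
    if commit_limit <= 0:
        raise ValueError("commit limit must be positive")
    return round(100.0 * committed_bytes / commit_limit, 2)


def pages_per_read(pages_input_per_sec: float, page_reads_per_sec: float) -> float:
    """Average pages fetched per read: Pages Input/sec over Page Reads/sec."""
    if page_reads_per_sec == 0:
        return 0.0  # no hard-fault reads happened in the interval
    return round(pages_input_per_sec / page_reads_per_sec, 2)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical values: 6 GiB committed against an 8 GiB commit limit.
    print(committed_in_use_percent(6 * GIB, 8 * GIB))
    # Hypothetical values: 120 pages/s brought in by 30 read operations/s.
    print(pages_per_read(120.0, 30.0))
```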

Physical Disk
The Physical Disk performance object consists of counters that monitor hard or fixed disk drives on a computer.  Disks are used to store file, program, and paging data and are read to retrieve these items, and written to record changes to them.  The values of physical disk counters are sums of the values of the logical disks (or partitions) into which they are divided.

§ PhysicalDisk\Disk Transfers/sec is the rate of read and write operations on the disk.

§ PhysicalDisk\Disk Reads/sec is the rate of read operations on the disk.

§ PhysicalDisk\Disk Writes/sec is the rate of write operations on the disk.

§ PhysicalDisk\Disk Bytes/sec is the rate at which bytes are transferred to or from the disk during write or read operations.

§ PhysicalDisk\Avg. Disk Bytes/Read is the average number of bytes transferred from the disk during read operations.

§ PhysicalDisk\Disk Write Bytes/sec is the rate at which bytes are transferred to the disk during write operations.

§ PhysicalDisk\Avg. Disk Bytes/Write is the average number of bytes transferred to the disk during write operations.

System
The System performance object consists of counters that apply to more than one instance of the component processors on the computer.

§ System\Processes is the number of processes in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval.  Each process represents the running of a program.

§ System\Processor Queue Length is the number of threads in the processor queue.  Unlike the disk counters, this counter shows ready threads only, not threads that are running.  There is a single queue for processor time even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of less than 10 threads per processor is normally acceptable, depending on the workload.

§ System\Threads is the number of threads in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval.  A thread is the basic executable entity that can execute instructions in a processor.

§ System\System Calls/sec is the combined rate of calls to operating system service routines by all processes running on the computer. These routines perform all of the basic scheduling and synchronization of activities on the computer, and provide access to non-graphic devices, memory management, and name space management. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.

§ System\System Up Time is the elapsed time (in seconds) that the computer has been running since it was last started.  This counter displays the difference between the start time and the current time.

§ System\File Write Operations/sec is the combined rate of the file system write requests to all devices on the computer, including requests to write to data in the file system cache.  It is measured in numbers of writes. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.

§ System\File Read Operations/sec is the combined rate of file system read requests to all devices on the computer, including requests to read from the file system cache.  It is measured in numbers of reads.  This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.
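The Processor Queue Length guidance above (divide by the number of processors, then compare against roughly 10 threads per processor) can be sketched like this. The function names are hypothetical, and the limit of 10 is the rule of thumb quoted above, not a hard threshold.

```python
def queue_per_processor(queue_length: float, processors: int) -> float:
    """Processor Queue Length divided by the processors servicing the workload."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return queue_length / processors


def queue_is_acceptable(queue_length: float, processors: int,
                        limit: float = 10.0) -> bool:
    """True when the sustained per-processor queue is below the rule-of-thumb limit."""
    return queue_per_processor(queue_length, processors) < limit


if __name__ == "__main__":
    # Hypothetical readings on a 4-processor virtual machine.
    print(queue_per_processor(16, 4), queue_is_acceptable(16, 4))
    print(queue_per_processor(48, 4), queue_is_acceptable(48, 4))
```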

Network Adapter
The Network Adapter performance object consists of counters that measure the rates at which bytes and packets are sent and received over a physical or virtual network connection.  It includes counters that monitor connection errors.

§ Network Adapter\Packets Received/sec is the rate at which packets are received on the network interface. 

§ Network Adapter\Packets Sent/sec is the rate at which packets are sent on the network interface. 

§ Network Adapter\Bytes Received/sec is the rate at which bytes are received over each network adapter, including framing characters. Network Adapter\Bytes Received/sec is a subset of Network Adapter\Bytes Total/sec.

§ Network Adapter\Bytes Sent/sec is the rate at which bytes are sent over each network adapter, including framing characters. Network Adapter\Bytes Sent/sec is a subset of Network Adapter\Bytes Total/sec.

§ Network Adapter\Packets Received Non-Unicast/sec is the rate at which non-unicast (subnet broadcast or subnet multicast) packets are delivered to a higher-layer protocol. 

§ Network Adapter\Packets Received Unicast/sec is the rate at which (subnet) unicast packets are delivered to a higher-layer protocol. 

§ Network Adapter\Current Bandwidth is an estimate of the current bandwidth of the network interface in bits per second (BPS).  For interfaces that do not vary in bandwidth or for those where no accurate estimation can be made, this value is the nominal bandwidth.   
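Because the byte counters are in bytes per second and Current Bandwidth is in bits per second, comparing them takes a unit conversion. The sketch below uses a common rule of thumb for utilization that is not part of the counter definitions above: it sums both directions, which may overstate utilization on full-duplex links. The function name and sample values are hypothetical.

```python
def utilization_percent(bytes_received_per_sec: float,
                        bytes_sent_per_sec: float,
                        current_bandwidth_bps: float) -> float:
    """Rough link utilization: total bits per second over Current Bandwidth."""
    if current_bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    bits_per_sec = (bytes_received_per_sec + bytes_sent_per_sec) * 8
    return round(100.0 * bits_per_sec / current_bandwidth_bps, 2)


if __name__ == "__main__":
    # Hypothetical readings on a nominal 1 Gbps interface.
    print(utilization_percent(50_000_000, 12_500_000, 1_000_000_000))
```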

Compared to sysstat, the following sar counters have no comparable Windows performance counter on Microsoft Azure:

§ rxcmp/s  - Number of compressed packets received per second (for cslip etc.). 
§ txcmp/s  - Number of compressed packets transmitted per second.
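If you collect from both operating systems, it helps to keep the "nearest counter" pairings in one place, including the ones with no counterpart. The lookup table below is only a sketch: the pairings are my own reading of the descriptions in this post, matched by description alone, and they are not 1:1 equivalents (units and collection methods differ). The table and function names are hypothetical.

```python
# None marks sar fields that this post lists as having no comparable counter.
NEAREST_WINDOWS_COUNTER = {
    "%idle": r"Processor\% Idle Time",
    "runq-sz": r"System\Processor Queue Length",
    "tps": r"PhysicalDisk\Disk Transfers/sec",
    "rtps": r"PhysicalDisk\Disk Reads/sec",
    "wtps": r"PhysicalDisk\Disk Writes/sec",
    "rxpck/s": r"Network Adapter\Packets Received/sec",
    "txpck/s": r"Network Adapter\Packets Sent/sec",
    "%nice": None,
    "%steal": None,
    "%guest": None,
    "rxcmp/s": None,
    "txcmp/s": None,
}


def nearest_windows_counter(sar_field: str):
    """Return the roughly matching counter path, or None if there is none."""
    # Raises KeyError for fields this sketch does not cover.
    return NEAREST_WINDOWS_COUNTER[sar_field]


if __name__ == "__main__":
    for field in ("tps", "%steal"):
        print(field, "->", nearest_windows_counter(field))
```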

Paging File

The Paging File performance object consists of counters that monitor the paging file(s) on the computer.  The paging file is a reserved space on disk that backs up committed physical memory on the computer. The Process performance object consists of counters that monitor running application programs and system processes.  All the threads in a process share the same address space and have access to the same data.

§ Paging File\% Usage is the amount of the Page File instance in use, in percent.

§ Paging File\% Usage Peak is the peak usage of the Page File instance, in percent.

§ Process\Page File Bytes is the current amount of virtual memory, in bytes, that this process has reserved for use in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and the lack of space in paging files can prevent other processes from allocating memory. If there is no paging file, this counter reflects the current amount of virtual memory that the process has reserved for use in physical memory.

§ Process\Page File Bytes Peak is the maximum amount of virtual memory, in bytes, that this process has reserved for use in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files.  Paging files are shared by all processes, and the lack of space in paging files can prevent other processes from allocating memory. If there is no paging file, this counter reflects the maximum amount of virtual memory that the process has reserved for use in physical memory.

§ Process\Pool Paged Bytes is the size, in bytes, of the paged pool, an area of the system virtual memory that is used for objects that can be written to disk when they are not being used.  Memory\Pool Paged Bytes is calculated differently than Process\Pool Paged Bytes, so it might not equal Process(_Total)\Pool Paged Bytes. This counter displays the last observed value only; it is not an average.

§ Process\Pool Nonpaged Bytes is the size, in bytes, of the nonpaged pool, an area of the system virtual memory that is used for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated.  Memory\Pool Nonpaged Bytes is calculated differently than Process\Pool Nonpaged Bytes, so it might not equal Process(_Total)\Pool Nonpaged Bytes.  This counter displays the last observed value only; it is not an average.

References:
Windows Performance Monitor: https://technet.microsoft.com/en-us/library/cc749249.aspx

The views and opinions stated in this blog are mine and do not necessarily reflect those of Microsoft.
Each posting on this blog is provided "AS IS" with no warranties, and confers no rights.