Hyper-V Performance Counters – Part four of many – “Hyper-V Hypervisor Virtual Processor” and “Hyper-V Hypervisor Root Virtual Processor” counter set

The “Hyper-V Hypervisor Virtual Processor” and “Hyper-V Hypervisor Root Virtual Processor” counter sets have the same counters. The only difference between the two is that “Hyper-V Hypervisor Root Virtual Processor” contains counters for only the root Virtual Processors (VPs), whereas “Hyper-V Hypervisor Virtual Processor” has counters for all other partitions.

The virtual processor counters are very useful because they help you understand how much the guest VMs are running and where they are running. Unfortunately, these counters suffer from a small amount of clock skew in WS08 Hyper-V, but this only slightly reduces their usefulness. We hope to remove the clock skew in future releases. The skew shows up in that some “%” counters may exceed 100%. I’ve seen some go as high as 110% depending on the system load. The problem is that this counter set uses the clock from the root rather than from the hypervisor as its basis of time. For more on clock skew see (https://blogs.msdn.com/tvoellm/archive/2008/03/20/hyper-v-clocks-lie.aspx).
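To make the over-100% effect concrete, here is a minimal sketch. It assumes a “%” counter is computed as run time divided by elapsed time, which is my simplification and not a description of the real implementation. The function names and the sample numbers are hypothetical.

```python
def percent_run_time(run_time: float, elapsed_root_clock: float) -> float:
    """Percent of the sampling interval spent running.

    Assumption: the elapsed time comes from the root's clock, which may
    disagree with the clock the hypervisor used to measure run_time.
    """
    if elapsed_root_clock <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * run_time / elapsed_root_clock


def clamp_percent(value: float) -> float:
    """Clamp a skewed reading into the 0-100 range, e.g. for charting."""
    return max(0.0, min(100.0, value))


# Hypothetical sample: the VP ran for the full 1.00 s interval according to
# the hypervisor, but the root's clock only measured 0.91 s for that interval.
skewed = percent_run_time(run_time=1.00, elapsed_root_clock=0.91)
print(round(skewed, 1))       # 109.9
print(clamp_percent(skewed))  # 100.0
```

Clamping is only a presentation choice. If you are post-processing logs, keeping the raw value lets you see how large the skew actually is.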

Virtual Processors (VPs) are the unit of execution for a partition, and each partition contains one guest virtual machine (VM). For each VP there is a set of counters. Perfmon.exe will let you view the counters separately or as an average for all VPs called “_Total”. VP counters are prefixed with the name of the partition, like “WS08 Guest 1:”, followed by the VP id, like “Hv VP 0”. This makes it easy to identify which VPs go with which partitions.
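If you export these counters and want to group them by partition, the naming convention above is easy to split apart. The following is a rough sketch that assumes instance names look exactly like the “WS08 Guest 1:Hv VP 0” example. The helper names `parse_vp_instance` and `group_by_partition` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class VpInstance(NamedTuple):
    partition: Optional[str]  # None for the "_Total" instance
    vp_id: Optional[int]      # None for the "_Total" instance


def parse_vp_instance(instance: str) -> VpInstance:
    """Split '<partition name>:Hv VP <n>' into its two parts."""
    if instance == "_Total":
        return VpInstance(partition=None, vp_id=None)
    # Split on the last colon so a colon inside the partition name survives.
    partition, sep, vp_label = instance.rpartition(":")
    parts = vp_label.split()
    if not sep or not parts or not parts[-1].isdigit():
        raise ValueError(f"unexpected instance name: {instance!r}")
    return VpInstance(partition=partition, vp_id=int(parts[-1]))


def group_by_partition(instances: Iterable[str]) -> Dict[str, List[int]]:
    """Map each partition name to the sorted list of its VP ids."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name in instances:
        parsed = parse_vp_instance(name)
        if parsed.partition is not None and parsed.vp_id is not None:
            groups[parsed.partition].append(parsed.vp_id)
    return {partition: sorted(ids) for partition, ids in groups.items()}


print(parse_vp_instance("WS08 Guest 1:Hv VP 0"))
```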

The VP counters have a lot of detail on what the virtual processors are doing so I have ordered them with the most useful counters at the top.

Hyper-V Hypervisor [Root] Virtual Processor counters

· %Guest Run Time – For guest VMs this is the percentage of time the guest VP is running in non-hypervisor code on a logical processor (LP) or, for _Total, the total across all guest VPs. For the root this is the percentage of time the root VP is running in non-hypervisor code on an LP or, for _Total, the total across all root VPs. If you sum the _Total for both the guest VPs and root VPs, this will equal the %Guest Run Time _Total of the Logical Processor counter set.

· %Hypervisor Run Time – For guest VMs this is the percentage of time the guest VP is running in hypervisor code on an LP or, for _Total, the total across all guest VPs. For the root this is the percentage of time the root VP is running in hypervisor code on an LP or, for _Total, the total across all root VPs. If you sum the _Total for both the guest VPs and root VPs, this will equal the %Hypervisor Run Time _Total of the Logical Processor counter set.

· %Total Run Time – This is just the sum of %Guest Run Time + %Hypervisor Run Time on a per-VP basis. If you sum the %Total Run Time across the Root Virtual Processor and Virtual Processor counter sets, it will equal the sum of %Total Run Time from all the Logical Processor counters.

· Total Intercepts/sec – Whenever a guest VP needs to exit its current mode of running for servicing in the hypervisor, this is called an intercept. Some common causes of intercepts are resolving Guest Physical Address (GPA) to System Physical Address (SPA) translations, privileged instructions like hlt / cpuid / in / out, and the end of the VP’s scheduled time slice.

· Total Intercepts Cost – This is a relative measure of cost of intercepts. The cost can vary based on the types of intercepts and the machine architecture.

· Hypercalls/sec – Hypercalls are one form of enlightenment. Guest OSes use enlightenments to use the system more efficiently via the hypervisor. TLB flush is an example hypercall. If this value is zero and stays zero, this is an indication that Integration Components are not installed. Newer OSes like WS08 can use hypercalls without enlightened drivers, so a non-zero value is only a prerequisite for, not a guarantee of, having Integration Components installed.

· Hypercalls Cost – This is a relative measure of cost of hypercalls. The cost can vary based on the types of calls and the machine architecture.

· HLT Instructions/sec – Number of CPU halts per second on the VP. A HLT will cause the hypervisor scheduler to de-schedule the current VP and move to the next VP in the runlist.

· HLT Instructions Cost – This is a relative measure of cost of a halt. The cost can vary based on the machine architecture.

· IO Instructions/sec – Number of CPU in / out instructions executed per second. Many older or low bandwidth devices use “programmed I/O” via in / out instructions.

· IO Instructions Cost – This is a relative measure of cost of the in / out instructions. The cost can vary based on the machine architecture.

· Page Fault Intercepts/sec – Whenever guest code accesses a page not in the CPU TLB a page fault will occur. This counter is the number of page faults per second. This counter is closely correlated with the Large Page TLB Fills/sec and Small Page TLB Fills/sec counters.

· Page Fault Intercepts Cost – This is a relative measure of cost of a page fault. The cost can vary based on the machine architecture.

· Large Page TLB Fills/sec – There are two types of TLB entries (and some processors have three): Small Page, which generally means a 4K page, and Large Page, which generally means 2MB. There are fewer Large Page TLB entries, on the order of 8 – 32. This counter is the number of Large Page TLB fills per second. A non-zero value indicates the guest OS is using large pages.

· Small Page TLB Fills/sec – There are two types of TLB entries (and some processors have three): Small Page, which generally means a 4K page, and Large Page, which generally means 2MB. There are more Small Page TLB entries, on the order of 64 – 1024+. This counter is the number of Small Page TLB fills per second.

· Emulated Instructions/sec – Some instructions require emulation to complete in the hypervisor. One such example is APIC access. This counter is the number of emulated instructions completed per second.

· Emulated Instructions Cost – This is a relative measure of cost of emulation. The cost can vary based on the machine architecture.

· CPUID Instructions/sec – The CPUID instruction is used to retrieve information on the local CPU’s capabilities. This counter is the number of CPUID instruction calls per second. Typically CPUID is only called when the OS / application first starts, so this value will most likely be 0 most of the time.

· CPUID Instructions Cost – This is a relative measure of cost of the CPUID instruction. The cost can vary based on the machine architecture.

· MSR Accesses/sec – Machine-specific register (MSR) instruction calls per second. There are many types of MSRs, such as C-state config, Synthetic Interrupt (SynIC) Timers, and control functions such as shutdown.

· MSR Accesses Cost – This is a relative measure of cost of the MSR instruction. The cost can vary based on the machine architecture.

· Control Register Accesses/sec – Number of CPU Control Register accesses per second. Control registers are used to set up address mapping, privilege mode, etc.

· Control Register Accesses Cost – This is a relative measure of cost of changing the control register. The cost can vary based on the machine architecture.

· MWAIT Instructions/sec – Number of MWAIT instructions per second. MWAIT is the monitored wait instruction, where the CPU waits for a memory location in a monitored range (between addresses a and b) to change.

· MWAIT Instructions Cost – This is a relative measure of cost of the MWAIT instruction. The cost can vary based on the machine architecture.
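The relationships between the run time counters at the top of this list can be checked against your own captured data. Below is a small sketch that assumes you have already collected the percentages into plain Python lists. The function names, the sample values, and the 10-point tolerance (to allow for the clock skew mentioned earlier) are all hypothetical.

```python
from typing import Iterable


def total_run_time(guest_pct: float, hypervisor_pct: float) -> float:
    """%Total Run Time for one VP = %Guest Run Time + %Hypervisor Run Time."""
    return guest_pct + hypervisor_pct


def run_time_sums_agree(
    guest_vp_totals: Iterable[float],
    root_vp_totals: Iterable[float],
    lp_totals: Iterable[float],
    tolerance_pct: float = 10.0,  # assumed allowance for clock skew
) -> bool:
    """Check that guest VP + root VP %Total Run Time matches the LP sum.

    Each argument holds one %Total Run Time value per VP (or per LP)
    taken from the same sampling interval.
    """
    vp_sum = sum(guest_vp_totals) + sum(root_vp_totals)
    lp_sum = sum(lp_totals)
    return abs(vp_sum - lp_sum) <= tolerance_pct


# Hypothetical sample: two guest VPs, two root VPs, two LPs.
guest_vps = [total_run_time(40.0, 5.0), total_run_time(28.0, 2.0)]  # 45, 30
root_vps = [total_run_time(8.0, 2.0), total_run_time(12.0, 3.0)]    # 10, 15
lps = [60.0, 40.0]

print(run_time_sums_agree(guest_vps, root_vps, lps))  # True
```

If the sums disagree by much more than the skew you normally see, the samples were probably not taken over the same interval.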

The following counters (and some above) likely have limited usefulness to end users of Hyper-V outside of OS / driver developers, so my plan is to continue documenting higher-value counters in other counter sets.

Check back later as I plan to flesh out these counters.

· Page Invalidations/sec

· Page Invalidations Cost

· Other Intercepts/sec

· Other Intercepts Cost

· External Interrupts/sec

· External Interrupts Cost

· Pending Interrupts/sec

· Pending Interrupts Cost

· Debug Register Accesses/sec

· Debug Register Accesses Cost

· Guest Page Table Maps/sec

· Reflected Guest Page Faults/sec

· APIC MMIO Accesses/sec

· IO Intercept Messages/sec

· Memory Intercept Messages/sec

· APIC EOI Accesses/sec

· Other Messages/sec

· Page Table Allocations/sec

· Logical Processor Migrations/sec

· Address Space Evictions/sec

· Address Space Switches/sec

· Address Domain Flushes/sec

· Address Space Flushes/sec

· Global GVA Range Flushes/sec

· Local Flushed GVA Ranges/sec

· Page Table Evictions/sec

· Page Table Reclamations/sec

· Page Table Resets/sec

· Page Table Validations/sec

· APIC TPR Accesses/sec

· Page Table Write Intercepts/sec

· Synthetic Interrupts/sec

· Virtual Interrupts/sec

· APIC IPIs Sent/sec

· APIC Self IPIs Sent/sec

· GPA Space Hypercalls/sec

· Logical Processor Hypercalls/sec

· Long Spin Wait Hypercalls/sec

· Other Hypercalls/sec

· Synthetic Interrupt Hypercalls/sec

· Virtual Interrupt Hypercalls/sec

· Virtual MMU Hypercalls/sec

· Virtual Processor Hypercalls/sec

· Total Messages/sec