Hyper-V Performance Counters – Part four of many – “Hyper-V Hypervisor Virtual Processor” and “Hyper-V Hypervisor Root Virtual Processor” counter sets


The “Hyper-V Hypervisor Virtual Processor” and “Hyper-V Hypervisor Root Virtual Processor” counter sets have the same counters.  The only difference between the two is that “Hyper-V Hypervisor Root Virtual Processor” contains counters for only the root virtual processors (VPs), whereas “Hyper-V Hypervisor Virtual Processor” has counters for all other partitions.


The virtual processor counters are very useful because they help you understand how much guest VMs are running and where they are running.  Unfortunately, these counters suffer from a small amount of clock skew in WS08 Hyper-V, but this only slightly reduces their usefulness.  We hope to remove the clock skew in future releases.  The skew shows up in that some “%” counters may exceed 100%.  I’ve seen some go as high as 110% depending on the system load.  The problem is that this counter set uses the clock from the root rather than from the hypervisor as its basis of time.  For more on clock skew see (http://blogs.msdn.com/tvoellm/archive/2008/03/20/hyper-v-clocks-lie.aspx).
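If you post-process logged samples, the skew described above can be handled with a small helper.  This is only a sketch: the function name flag_skewed, the 100% limit parameter, and the sample values are illustrative assumptions, not part of Perfmon or Hyper-V.

```python
def flag_skewed(samples, limit=100.0):
    """Clamp percentage samples to `limit` and report which ones exceeded it.

    `samples` is a list of "%" counter readings that were already collected
    (for example, exported from a Perfmon log). Hypothetical helper.
    """
    clamped = [min(s, limit) for s in samples]
    skewed = [i for i, s in enumerate(samples) if s > limit]
    return clamped, skewed


# Made-up readings; the last two exceed 100% because of clock skew.
readings = [42.0, 99.5, 104.2, 110.0]
clamped, skewed = flag_skewed(readings)
print(clamped)
print(skewed)
```

Clamping hides the skew in charts, while the returned indexes let you see how often it happened, which tends to grow with system load.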


Virtual processors (VPs) are the unit of execution for a partition, and each partition contains one guest virtual machine (VM).  For each VP there is a set of counters.  Perfmon.exe will let you view the counters separately or as an average for all VPs called “_Total”.  VP counters are prefixed with the name of the partition, like “WS08 Guest 1:”, followed by the VP ID, like “Hv VP 0”.  This makes it easy to identify which VPs go with which partitions.
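If you export these counters and want to group them by partition in a script, the instance name can be split on its last colon.  The sketch below assumes instance names look exactly like “WS08 Guest 1:Hv VP 0”; the function name parse_vp_instance and the second partition name are made up for illustration.

```python
from collections import defaultdict


def parse_vp_instance(instance):
    """Split an instance name like 'WS08 Guest 1:Hv VP 0' into (partition, vp_id).

    Returns None for the aggregate '_Total' instance. Hypothetical helper that
    assumes the '<partition>:Hv VP <n>' naming described in the text.
    """
    if instance == "_Total":
        return None
    partition, _, vp = instance.rpartition(":")
    vp_id = int(vp.rsplit(" ", 1)[-1])
    return partition, vp_id


# Group VP ids by partition for a few example instance names.
instances = ["WS08 Guest 1:Hv VP 0", "WS08 Guest 1:Hv VP 1", "WS08 Guest 2:Hv VP 0", "_Total"]
vps_by_partition = defaultdict(list)
for name in instances:
    parsed = parse_vp_instance(name)
    if parsed is not None:
        partition, vp_id = parsed
        vps_by_partition[partition].append(vp_id)

print(dict(vps_by_partition))
```

Splitting on the last colon rather than the first keeps the parse working even if a VM name itself contains a colon.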


The VP counters have a lot of detail on what the virtual processors are doing so I have ordered them with the most useful counters at the top.


Hyper-V Hypervisor [Root] Virtual Processor counters


·         %Guest Run Time – For guest VMs this is the percentage of time the guest VP is running in non-hypervisor code on a logical processor (LP), or for _Total the total across all guest VPs.  For the root this is the percentage of time the root VP is running in non-hypervisor code on an LP, or for _Total the total across all root VPs.  If you sum the _Total for both the guest VPs and root VPs, this will equal the %Guest Run Time _Total of the Logical Processor counter set.


·         %Hypervisor Run Time – For guest VMs this is the percentage of time the guest VP is running in hypervisor code on an LP, or for _Total the total across all guest VPs.  For the root this is the percentage of time the root VP is running in hypervisor code on an LP, or for _Total the total across all root VPs.  If you sum the _Total for both the guest VPs and root VPs, this will equal the %Hypervisor Run Time _Total of the Logical Processor counter set.


·         %Total Run Time – This is just the sum of %Guest Run Time + %Hypervisor Run Time on a per-VP basis.  If you sum the %Total Run Time across the Root Virtual Processor and Virtual Processor counter sets, it will equal the sum of %Total Run Time from all the Logical Processor counters.


·         Total Intercepts/sec – Whenever a guest VP needs to exit its current mode of running for servicing in the hypervisor, this is called an intercept.  Some common causes of intercepts are resolving Guest Physical Address (GPA) to System Physical Address (SPA) translations, privileged instructions like hlt / cpuid / in / out, and the end of the VP’s scheduled time slice.


·         Total Intercepts Cost – This is a relative measure of cost of intercepts.   The cost can vary based on the types of intercepts and the machine architecture.


·         Hypercalls/sec – Hypercalls are one form of enlightenment.  Guest OSes use enlightenments to use the system more efficiently via the hypervisor.  TLB flush is an example hypercall.  If this value is zero and stays zero, this is an indication that Integration Components are not installed.  Newer OSes like WS08 can use hypercalls without enlightened drivers, so a non-zero value is only a prerequisite for, not a guarantee of, having Integration Components installed.


·         Hypercalls Cost – This is a relative measure of cost of hypercalls.   The cost can vary based on the types of calls and the machine architecture.


·         HLT Instructions/sec – Number of CPU halts per second on the VP.  A HLT will cause the hypervisor scheduler to de-schedule the current VP and move to the next VP in the runlist.


·         HLT Instructions Cost – This is a relative measure of cost of halt.   The cost can vary based on the machine architecture.


·         IO Instructions/sec – Number of CPU in / out instructions executed per second.  Many older or low bandwidth devices use “programmed I/O” via in / out instructions.


·         IO Instructions Cost – This is a relative measure of cost of the in / out instructions.   The cost can vary based on the machine architecture.


·         Page Fault Intercepts/sec – Whenever guest code accesses a page not in the CPU TLB a page fault will occur.  This counter is the number of page faults per second.  This counter is closely correlated with the Large Page TLB Fills/sec and Small Page TLB Fills/sec counters.


·         Page Fault Intercepts Cost – This is a relative measure of cost of a page fault.   The cost can vary based on the machine architecture.


·         Large Page TLB Fills/sec – There are two types of TLB entries (and on some processors three): small page, which generally means a 4K page, and large page, which generally means 2MB.  There are fewer large page TLB entries, on the order of 8 – 32.  This counter is the number of large page TLB fills per second.  A non-zero value indicates the guest OS is using large pages.


·         Small Page TLB Fills/sec – There are two types of TLB entries (and on some processors three): small page, which generally means a 4K page, and large page, which generally means 2MB.  There are more small page TLB entries, on the order of 64 – 1024+.  This counter is the number of small page TLB fills per second.


·         Emulated Instructions/sec – Some instructions require emulation to complete in the hypervisor.  One such example is APIC access.  This counter is the number of emulated instructions completed per second.


·         Emulated Instructions Cost – This is a relative measure of cost of emulation.   The cost can vary based on the machine architecture.


·         CPUID Instructions/sec – The CPUID instruction is used to retrieve information on the local CPU’s capabilities.  This counter is the number of CPUID instruction calls per second.  Typically CPUID is only called when the OS or an application first starts, so this value will most likely be 0 most of the time.


·         CPUID Instructions Cost – This is a relative measure of cost of the CPUID instruction.   The cost can vary based on the machine architecture.


·         MSR Accesses/sec – Machine specific register instruction calls per second.  There are many types of MSRs such as C-state config, Synthetic Interrupt (Synic) Timers, and control functions such as shutdown.


·         MSR Accesses Cost  – This is a relative measure of cost of the MSR instruction.   The cost can vary based on the machine architecture.


·         Control Register Accesses/sec – Number of CPU Control Register accesses per second.  Control registers are used to set up address mapping, privilege mode, etc.


·         Control Register Accesses Cost – This is a relative measure of cost of changing the control register.   The cost can vary based on the machine architecture.


·         MWAIT Instructions/sec – Number of MWAIT instructions per second.  MWAIT is the monitored wait instruction, where the CPU waits for a write to a monitored address range (set up beforehand with the MONITOR instruction).


·         MWAIT Instructions Cost – This is a relative measure of cost of the MWAIT instruction.   The cost can vary based on the machine architecture.
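

To make the run time arithmetic in the list above concrete: %Total Run Time for a VP is its %Guest Run Time plus its %Hypervisor Run Time.  Here is a minimal sketch with made-up sample values; the function name total_run_time and the numbers are illustrative assumptions, and real readings may be slightly off because of the clock skew mentioned earlier.

```python
def total_run_time(guest_pct, hypervisor_pct):
    """%Total Run Time for one VP = %Guest Run Time + %Hypervisor Run Time."""
    return guest_pct + hypervisor_pct


# Made-up per-VP samples: instance name -> (%Guest Run Time, %Hypervisor Run Time)
vps = {
    "WS08 Guest 1:Hv VP 0": (30.0, 2.0),
    "WS08 Guest 1:Hv VP 1": (28.5, 1.5),
}

per_vp = {name: total_run_time(g, h) for name, (g, h) in vps.items()}
for name, pct in per_vp.items():
    print(f"{name}: {pct:.1f}% total run time")
```

Comparing a computed value like this with the %Total Run Time that Perfmon reports for the same VP is a quick sanity check that you are reading the right instances.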


The following counters (and some above) likely have limited usefulness to end users of Hyper-V outside of OS / driver developers, so my plan is to continue documenting higher value counters in other counter sets.


Check back later as I plan to flesh out these counters.


·         Page Invalidations/sec


·         Page Invalidations Cost


·         Other Intercepts/sec


·         Other Intercepts Cost


·         External Interrupts/sec


·         External Interrupts Cost


·         Pending Interrupts/sec


·         Pending Interrupts Cost


·         Debug Register Accesses/sec


·         Debug Register Accesses Cost


·         Guest Page Table Maps/sec


·         Reflected Guest Page Faults/sec


·         APIC MMIO Accesses/sec


·         IO Intercept Messages/sec


·         Memory Intercept Messages/sec


·         APIC EOI Accesses/sec


·         Other Messages/sec


·         Page Table Allocations/sec


·         Logical Processor Migrations/sec


·         Address Space Evictions/sec


·         Address Space Switches/sec


·         Address Domain Flushes/sec


·         Address Space Flushes/sec


·         Global GVA Range Flushes/sec


·         Local Flushed GVA Ranges/sec


·         Page Table Evictions/sec


·         Page Table Reclamations/sec


·         Page Table Resets/sec


·         Page Table Validations/sec


·         APIC TPR Accesses/sec


·         Page Table Write Intercepts/sec


·         Synthetic Interrupts/sec


·         Virtual Interrupts/sec


·         APIC IPIs Sent/sec


·         APIC Self IPIs Sent/sec


·         GPA Space Hypercalls/sec


·         Logical Processor Hypercalls/sec


·         Long Spin Wait Hypercalls/sec


·         Other Hypercalls/sec


·         Synthetic Interrupt Hypercalls/sec


·         Virtual Interrupt Hypercalls/sec


·         Virtual MMU Hypercalls/sec


·         Virtual Processor Hypercalls/sec


·         Total Messages/sec


               


 

Comments (12)

  1. Mihir Patel says:

    Hi Tony,

    I am having issues collecting Hypervisor Root VP counters from perfmon when I have multiple VMs defined.  If I have 1 VM, the Hypervisor Root VP counters work fine.  Is there any workaround for this?

    thanks,

    Mihir Patel

     

     —– Tony’s Reply —-

    Mihir, please be more specific.  I’m not sure what “issues collecting” means.  Does this mean they don’t show up in perfmon, they have incorrect values, … ???

     

      Thanks – Tony

  2. Mihir Patel says:

    Hi Tony,

    When I try to collect Hypervisor Virtual Processor counters from perfmon, they do not show up in perfmon when multiple VMs are running.  For example, let’s say I have 4 VMs running and each VM has 2 VPs defined.  If I collect Hypervisor Virtual Processor counters, they will not show up in perfmon.  If I have 1 VM with 1 VP or 2 VPs, it works fine.  I hope this helps.  I am running LH RTM + RC0 bits.

    Thanks,

    Mihir

    [Tony’s reply] Thanks we’ll look into it.

     

  3. GeetaGiri says:

    Hello Tony,

    In "%Guest Run Time" it is written that:  For guest VM’s this is the percentage of time the guest VP is running in "non-hypervisor code"

    and in "%Hypervisor Run Time" it is written that: For guest VM’s this is the percentage of time the guest VP is running in "hypervisor code" .

    Would you please help me to understand the difference between non-hypervisor code and hypervisor code?

    This will help me to choose the performance counters which will measure the processor usage of hyper-v.

  4. Deepak says:

    what is the HyperV processor overload

    [Tony’s reply]  Sorry, I don’t understand the question.

  5. kong says:

    Hi,

    1. In part 3, you said _Total is average but here in part 4 you said _Total is total.  Should both be total, or both be average, or one is average and one is total?

    [Tony’s reply] The _total changes meaning based on the counterset.

    2. Env: 4 LP’s. VM1: 2VP’s, VM2: 4VP’s.  VM1 is running a load.

    I found data inconsistency using perfmon against hyper-V RTM code:

    2.1. The sum of {Hyper-V Hypervisor Root Virtual Processor counters\%Guest Run Time\_Total} and {Hyper-V Hypervisor Virtual Processor counters\%Guest Run Time\_Total} does NOT equal {Hyper-V Hypervisor Logical Processor\%Guest Run Time\_Total}.

    2.2 Same problem with %Hypervisor Run Time.

     [Tony’s Reply]  There is slight difference in the totals because the Root Virtual and Logical Processor countersets use different time sources.  More load of the system will cause more skew.  We hope to make this better in future releases.

    Thanks.

    Kong

  6. Eduardo Claudio says:

    Hi!

    I have a little doubt concerning these counters.

    I have a machine with two quad core processors, so the HV ROOT Virtual processors reports 8 instances: Root VP 0 thru Root VP 7.

    Inside this machine I have 5 VMs: 3 using one (virtual) processor and 2 using two (virtual) processors, so the counter HV Virtual processor reports the following instances:

    Server1: HV VP 0

    Server2: HV VP 0

    Server3: HV VP 0

    Server3: HV VP 1

    Server4: HV VP 0

    Server5: HV VP 0

    Server5: HV VP 1

    Could you explain to me why are there only Hv VP 0 and Hv VP 1, and not Hv VP 0 thru Hv VP 7?

    Can I set (force) the VMs to use another Hv VP, like VP 2, 3, etc?

    Thanks in advance,

    Eduardo.

     ——

    [ Tony’s Reply ]

    The root virtual processors are shown in the “Hyper-V Hypervisor Root Virtual Processors” counter set and all the guest virtual processors are shown in the “Hyper-V Hypervisor Virtual Processors” counter set.  Note the addition of the “Root” to the counter set name.

  7. RobFord says:

    Hi

    Following on from Kong’s comment.

    I’m also getting issues with the counter sums not adding up.  I understand that clock skew will make a difference, but in some cases I’m getting a Virtual Processor% run time value that is twice the value of the Logical Processor% run time value.

    Can this be attributed to clock skew as well?

    Thanks

    Rob

  8. Geeta says:

    can we relate

    "Hyper-V Hypervisor Root Virtual Processor"-> "%Total Run Time" attribute with WMI Class attribute "Msvm_VirtualSystemManagementService" -> ProcessorLoad ?

    As name suggests both represent processor usage, but at the same time I see different value for both of these counters.

  9. Geeta says:

    Hello,

    Can we relate attribute "ProcessorLoad" of WMI Class "Msvm_VirtualSystemManagementService" with attribute "%Total Run Time" of Perfmon Class "Hyper_V_Hypervisor_Root_Virtual_Processor" ?

    I see as and when the load on physical processor increases both of these values increase proportionally. but which one of them represents the actual physical processor load? what is the difference between these two attributes?

  10. Geeta says:

    Hello,

    Can we relate attribute “ProcessorLoad” of WMI Class “Msvm_VirtualSystemManagementService” with attribute “%Total Run Time” of Perfmon Class “Hyper_V_Hypervisor_Root_Virtual_Processor” ?

    I see, as and when the load on physical processor increases both of these values increase proportionally. but which one of them represents the actual physical processor load?

    what is the difference between these two attributes?

    Thanks and Regards,

    Geeta

     —-

     Tony’s Reply – They are related.  The Hyper-V Virtual Processor counters represent the % of a physical processor (core / SMT).

  11. Now that Hyper-V has been in the market for over 9 months a common question that has come my way is “what

  12. Fabrizio Grossi says:

    Hi Tony,

    I have some doubts about processor measurement.

    I have a physical box with a dual core processor, so 2 LPs. On this box I have 4 VMs: VM01 and VM02 with 2 VPs assigned, VM03 and VM04 with 1 VP assigned, so 6 VPs.

    I start a CPU stress tool on all VMs.

    Here the results:

    – Hyper-V Hypervisor Logical Processor\%Guest Run Time\_Total is about 99% (correct as far as I understand)

    – Hyper-V Hypervisor Virtual Processor\%Guest Run Time\VM01 … VM04 is about 30%. Correct from my point of view: 6 VPs share 2 LPs, so the result is about 30% of CPU time for every VP

    The counters that cause me some problems are:

    – Hyper-V Hypervisor Virtual Processor\%Hypervisor Run Time\_Total is about 2 %

    – Hyper-V Hypervisor Virtual Processor\%Guest Run Time\_Total is about 30 %

    – Hyper-V Hypervisor Virtual Processor\%Total Run Time\_Total is about 32 %

    Why is this? As far as I understand, "Hyper-V Hypervisor Virtual Processor\%Guest Run Time\_Total" should be the sum of "Hyper-V Hypervisor Virtual Processor\%Guest Run Time\VM01 … VM04".

    Is it because the sum is > 100%?

    Thanks for your help

    best regards

    Fabrizio Grossi