Configuring Heartbeat parameters under Virtual Server

Virtual Server provides a service where a user can be notified if a virtual machine is not responding. This is called the virtual machine heartbeat. There are two situations in which a virtual machine might not send its heartbeat. One is that the virtual machine has crashed and no programs are running any longer. The other is that another program on the virtual machine is using all of the CPU resources and not leaving enough CPU time for our code to send a heartbeat message.

Because of these two scenarios we are cautious about telling the user that the virtual machine has stopped sending heartbeats.

By default we will send one heartbeat every 6 seconds. If a heartbeat is missed, we then wait 10 seconds between heartbeats. We will only declare the virtual machine ‘dead’ if we have not received a heartbeat in 120 seconds (12 missed heartbeats at 10 seconds each). Depending on your virtual machine configuration, you may wish to change these parameters to make us more or less sensitive to the state of the virtual machine heartbeat. You can do this by editing the virtual machine's .VMC file and finding the following section:

          <failure_attempts type="integer">12</failure_attempts>
          <failure_interval type="integer">10</failure_interval>
          <rate type="integer">10</rate>
          <time type="integer">60</time>

Failure_attempts specifies how many heartbeats should be missed before we fire the ‘heartbeat stopped’ event. Failure_interval specifies how long (in seconds) we should wait between heartbeats once an initial failure has been detected. Time specifies the standard interval (in seconds) in which to sample heartbeats, and Rate specifies how many heartbeats should be received in the interval defined by Time.
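
As a rough check on how these four values combine, here is a minimal Python sketch that reads them and works out the two timings described above: the normal heartbeat interval (Time divided by Rate) and how long heartbeats can be missing before the event fires (Failure_attempts multiplied by Failure_interval). The `<heartbeat>` wrapper element and the `heartbeat_timings` function are made up for illustration. In a real .VMC file these settings sit inside other parent elements that are not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element: the real .VMC file nests these four
# settings inside its own parent elements, which are not shown in this post.
FRAGMENT = """
<heartbeat>
    <failure_attempts type="integer">12</failure_attempts>
    <failure_interval type="integer">10</failure_interval>
    <rate type="integer">10</rate>
    <time type="integer">60</time>
</heartbeat>
"""


def heartbeat_timings(xml_text):
    """Return (normal interval, failure window) in seconds."""
    root = ET.fromstring(xml_text)
    # Collect each setting by element name, e.g. {"rate": 10, "time": 60, ...}
    values = {child.tag: int(child.text) for child in root}
    # Rate heartbeats are expected in every Time seconds.
    normal_interval = values["time"] / values["rate"]
    # After the first miss, wait failure_interval seconds per attempt.
    failure_window = values["failure_attempts"] * values["failure_interval"]
    return normal_interval, failure_window


normal_interval, failure_window = heartbeat_timings(FRAGMENT)
print(f"Normal heartbeat interval: {normal_interval:g} seconds")
print(f"Time before 'heartbeat stopped' fires: {failure_window} seconds")
```

With the default values this gives 6 seconds and 120 seconds, matching the defaults described above. Lowering Failure_attempts or Failure_interval shortens the second number, which makes the heartbeat check more sensitive.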