Windows Azure instance uptime and instance re-initialization

Windows Azure virtual machines go through a guest OS update at least once a month, and similarly through a host OS update. As a result, the virtual machine in which your application is running will take part in a guest or host OS update and will be down for a few minutes each month while the update runs. Besides these scheduled updates, your virtual machine may also go down for other reasons. In other words, virtual machine uptime can be interrupted for a number of reasons, and here we will look at what those potential reasons are.

 

 Instance uptime interruption: The uptime of a virtual machine instance is the amount of time the instance runs without interruption. It can be affected by the following two causes:

  • Host OS servicing: This is typically done once a month, mainly for compliance and to keep your virtual machine secure. Host OS servicing performs a systematic reboot, following user-defined update domains, across all machines owned by the Windows Azure fabric controller.
    • For Web/Worker roles, the guest agent components in the virtual machine are updated during the host OS update, which re-initializes the OS to insert the newest guest agent. The OS is still the original OS version you defined.
    • For the VM role, the instance is shut down and restarted after the update, without being re-initialized.
  • Potential hardware failure: A hardware failure is a rare event, but it is still a potential reason for a virtual machine instance to be re-allocated to another physical machine. This gives you a new virtual machine for your currently running instance, and it is considered a fresh virtual machine instance.
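
The update-domain behavior described for host OS servicing can be illustrated with a small simulation. This is only a sketch, not the fabric controller's actual logic: the instance names, the function names, and the assumption that update domains are serviced one at a time in ascending order are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def plan_rolling_reboot(instances):
    """Group instances by update domain and return the reboot batches.

    `instances` maps a (hypothetical) instance name to its update domain index.
    Assumption: one update domain is rebooted at a time, in ascending order.
    """
    by_domain = defaultdict(list)
    for name, domain in instances.items():
        by_domain[domain].append(name)
    return [sorted(by_domain[d]) for d in sorted(by_domain)]


def available_during(instances, batch):
    """Return the instances that stay up while `batch` is being rebooted."""
    return sorted(set(instances) - set(batch))


if __name__ == "__main__":
    # Hypothetical deployment: four instances spread over two update domains.
    roles = {"Web_IN_0": 0, "Web_IN_1": 1, "Web_IN_2": 0, "Web_IN_3": 1}
    for batch in plan_rolling_reboot(roles):
        print("rebooting:", batch, "| still up:", available_during(roles, batch))
```

With instances spread over at least two update domains, some instances remain available in every batch, which is why a single-instance deployment sees the full interruption.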

 

Instance re-initialization: Instance re-initialization is the process in which your instance is re-initialized with a clean VHD for the given VM role. For the VM role, instance re-initialization can happen in the following situations:

  • Hardware failure:
    • In this case the instance is re-allocated to another physical machine. This is non-deterministic for any specific instance. The instance starts with a fresh OS disk and a fresh local resource disk (in this case, the platform-supplied local storage).
  • Non-responsive instance:
    • In this case the instance is forcibly shut down, and when this occurs a fresh OS image is used to avoid any corruption that may have resulted from the forced shutdown. The instance still uses the existing local storage resource disk.
  • Virtual machine shutdown failure:
    • When a host OS update starts, a stopping event is generated. If the instance has not shut down successfully after ten minutes, the virtual machine is forced to shut down so that the host OS update can proceed. This results in a re-initialization of the OS disk, while the existing local storage resources are kept.
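
The three re-initialization situations above can be summarized as a lookup table of which disks come back fresh. The following is a minimal sketch for reasoning about what state survives; the type and function names are hypothetical and not part of any Windows Azure API, and treating a shutdown that takes exactly ten minutes as successful is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    HARDWARE_FAILURE = "hardware failure"
    NON_RESPONSIVE = "non-responsive instance"
    SHUTDOWN_FAILURE = "virtual machine shutdown failure"


@dataclass(frozen=True)
class Outcome:
    fresh_os_disk: bool      # OS disk is rebuilt from a clean image
    fresh_local_disk: bool   # local resource disk is replaced (contents lost)


# Outcomes as described in the list above.
OUTCOMES = {
    Cause.HARDWARE_FAILURE: Outcome(fresh_os_disk=True, fresh_local_disk=True),
    Cause.NON_RESPONSIVE: Outcome(fresh_os_disk=True, fresh_local_disk=False),
    Cause.SHUTDOWN_FAILURE: Outcome(fresh_os_disk=True, fresh_local_disk=False),
}

# Ten-minute window after the stopping event (value taken from the text).
SHUTDOWN_GRACE_SECONDS = 10 * 60


def shutdown_outcome(seconds_to_stop):
    """Return None for a clean shutdown, else the forced re-initialization outcome.

    Assumption: finishing in exactly ten minutes still counts as clean.
    """
    if seconds_to_stop <= SHUTDOWN_GRACE_SECONDS:
        return None
    return OUTCOMES[Cause.SHUTDOWN_FAILURE]


if __name__ == "__main__":
    for cause, outcome in OUTCOMES.items():
        print(cause.value, "->", outcome)
    print("stop took 45 s:", shutdown_outcome(45))
    print("stop took 900 s:", shutdown_outcome(900))
```

In every case the OS disk is reset, so anything written there after deployment should be treated as disposable; only a hardware failure also discards the local resource disk.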

 Thanks to the Windows Azure Team for providing this information.