Virtualization Clustering and Processor Compatibility

Generally speaking, I try to pretend that the rest of the Internet does not exist and just happily talk about my technology.  However, when I find members of my own team confused by statements made on other blogs and news sites, I feel compelled to set the record straight.

You may be wondering what I am talking about.  Recently, a blog post was published (and picked up by others) that stated:

"With the Xen based live migration and Microsoft Quick Migration they do not perform the check and so you can actually do the migration but your app and your OS may die as a result."

The "check" in question is whether the source and target processors are compatible, and the scenario being discussed is migrating virtual machines from Intel to AMD processors (and vice versa).

Before I get into the technical details of how this is handled with Virtual Server and Hyper-V clustering, I want to make two quick points:

  1. I am talking about planned fail-over with Virtual Server and Hyper-V clustering, not Live Migration.  We do not have a solution with that functionality available today, so there is little point in my discussing it.

  2. Moving a virtual machine with active processor state between physical computers with different processor capabilities is a real problem that every virtual machine migration solution needs to handle in one way or another.

Now on to the fun stuff:

With Virtual Server and Hyper-V, you can install Windows Server Fail-over Clustering alongside Virtual Server or Hyper-V to create highly available virtual machines.  Once you do this, there are two ways for virtual machines to move from one physical computer to another:

  1. Unplanned fail-over.

    In this case a physical server has failed, so all virtual machines that were running on it have stopped.  The cluster detects this and starts those virtual machines on other physical computers in the cluster.  Because no processor state is transferred, there are no processor compatibility issues.

  2. Planned fail-over.

    Here the virtual machines are placed into a saved state on the source physical computer and then restored on the target physical computer.  Because processor state is transferred, processor compatibility matters.  For this reason, we state that all computers in a virtualization cluster should have compatible processors.

So what happens if you try to configure a cluster with a mix of Intel and AMD processors?

Unfortunately, we are the only server product or role that cares about the processor type beyond "x86 or x64", so Windows Server Fail-over Clustering will happily let you create such a configuration.

If you then perform a planned fail-over of a virtual machine, it will be placed in a saved state on the source physical computer, but when we try to restore it on the target physical computer we detect that the processor is not compatible and fail the request.  The virtual machine can then be safely restored on a compatible system.

This works because the scenario has been known since Connectix Virtual PC 4.0, where a user could manually place a virtual machine into a saved state on one computer, move it to another computer, and attempt to restore it there.  Checking for processor compatibility is a standard part of our save / restore code, so there is no risk of the virtual machine starting and then becoming unstable as a result.
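To make the restore-time check concrete, here is a minimal Python sketch of the behavior described above. It is not the actual Virtual Server or Hyper-V save / restore code: `ProcessorInfo`, `SavedState`, `is_compatible`, `restore`, and `IncompatibleProcessorError` are hypothetical names, and the sketch assumes that "compatible" means the same processor vendor plus a target that offers every feature flag the guest could see when it was saved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorInfo:
    """Hypothetical description of a host processor."""
    vendor: str                        # e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset = frozenset()  # feature flags exposed to the guest


@dataclass
class SavedState:
    """Hypothetical saved-state image: the VM name plus the processor it was saved on."""
    vm_name: str
    saved_on: ProcessorInfo


class IncompatibleProcessorError(Exception):
    """Raised when a saved state cannot be resumed on the target processor."""


def is_compatible(saved_on: ProcessorInfo, target: ProcessorInfo) -> bool:
    # Assumption: same vendor, and the target offers every feature the guest
    # may already have been using at the moment it was saved.
    return saved_on.vendor == target.vendor and saved_on.features <= target.features


def restore(state: SavedState, target: ProcessorInfo) -> str:
    """Refuse to resume on an incompatible processor; the saved state is left untouched."""
    if not is_compatible(state.saved_on, target):
        raise IncompatibleProcessorError(
            f"{state.vm_name}: saved on {state.saved_on.vendor}, target is {target.vendor}"
        )
    return f"{state.vm_name} resumed"


if __name__ == "__main__":
    intel = ProcessorInfo("GenuineIntel", frozenset({"sse2", "sse3"}))
    amd = ProcessorInfo("AuthenticAMD", frozenset({"sse2", "sse3"}))
    state = SavedState("web01", intel)
    try:
        restore(state, amd)  # fails: the request is refused, nothing is resumed
    except IncompatibleProcessorError as err:
        print("restore refused:", err)
    print(restore(state, intel))  # the same saved state resumes on a compatible host
```

The key design point is that the check happens before any guest code runs, so a refused restore leaves the saved state intact and available for a compatible host rather than producing a half-resumed, unstable virtual machine.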

So what if you really want to have a cluster with incompatible processors? 

Well, this is actually possible.  You need to configure the virtual machine resources in the cluster to shut down, instead of saving state, when performing a planned fail-over.  With Virtual Server, edit the offline action in the generic resource script (HAVM.VBS) so that it shuts the virtual machine down instead of saving its state.  With Hyper-V, change the resource configuration for the virtual machine object in the cluster so that its offline action is shutdown instead of save state.  This way a planned fail-over takes as long as a shutdown and restart, but it is safe across processor types.
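The trade-off between the two offline actions can be modeled in a few lines. This Python sketch is illustrative only: `OfflineAction`, `Host`, and `planned_failover` are hypothetical names, not the HAVM.VBS script or the Hyper-V cluster resource settings, and processor compatibility is reduced to a simple vendor comparison for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class OfflineAction(Enum):
    SAVE_STATE = "save state"  # processor state travels with the virtual machine
    SHUTDOWN = "shutdown"      # guest is shut down; no processor state travels


@dataclass(frozen=True)
class Host:
    """Hypothetical cluster node."""
    name: str
    cpu_vendor: str


def planned_failover(vm: str, source: Host, target: Host, action: OfflineAction) -> str:
    """Hypothetical model of a planned fail-over; returns the resulting VM status."""
    if action is OfflineAction.SHUTDOWN:
        # Cold start on the target: like an unplanned fail-over, no state is transferred,
        # so the processor vendor does not matter (at the cost of a full shutdown and boot).
        return f"{vm} cold-booted on {target.name}"
    # Save state on the source, then try to restore it on the target.
    # Simplification: compatibility is modeled as "same vendor" only.
    if source.cpu_vendor != target.cpu_vendor:
        # Restore is refused; the VM stays in saved state for a compatible host.
        return f"{vm} restore refused on {target.name}; still saved"
    return f"{vm} restored on {target.name}"


if __name__ == "__main__":
    node1 = Host("node1", "GenuineIntel")
    node2 = Host("node2", "AuthenticAMD")
    for action in OfflineAction:
        print(action.value, "->", planned_failover("web01", node1, node2, action))
```

Switching the offline action to shutdown trades the speed of save / restore for independence from the processor type, which is exactly the choice described above for mixed Intel / AMD clusters.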

Cheers,
Ben