Adventures with Hyper-V and Backup

A while ago I talked about how I use Hyper-V in my house.  One of the problems that I identified with my current setup was that I had most of my virtual machines (except for the Windows Home Server) running on the same disk as the system disk for the management operating system.

Apart from being a bad practice in general – this has always concerned me as that disk represents a pretty large single point of failure in my server (if that disk fails I will lose my domain controller, FTP server, SCVMM server, SCOM server, MED-V server and WDS server).

Recently I also discovered that the disk in question is the oldest (and slowest) disk in the system – and this is causing performance issues for all of the virtual machines running off of it. 

Given all of this I decided to shuffle some disks out of other systems in my house and set up a higher-performance two-disk mirror for my system disk.  This would at least address the issues of performance and resiliency to disk failure.  The problem I faced was how to transfer my current system disk to a new RAID configuration.

After some failed attempts at using various cloning programs out there – it struck me that this was an ideal use of our backup technology.  I would just back up the current system disk – and restore it to the new physical disk.

As this was just going to be a “once off” backup – I did not want to spend the time to set up a full enterprise backup solution (like DPM) but just wanted to use Windows Server Backup.

I knew that Windows Server Backup does not support Hyper-V by default – so I went off to get the details of how to enable this from the appropriate KB article (https://support.microsoft.com/kb/958662) and was pleasantly surprised to find that a “Fix it” has been made for this issue – so I was able to complete this step without too much trouble.

Side note: You may wonder what happens if you do not enable this fix it. Simply put, by default Windows Server Backup will not engage our VSS backup components. This means that it will just copy the files of the virtual machines without doing anything to prepare them for backup. If your virtual machines are turned off – this is fine. If your virtual machines are running – this can result in your backup having corrupt data in the virtual machines (but it will not affect the currently running virtual machines).

Once you enable the fix it – there is nothing in the Windows Server Backup user interface that indicates that anything is different. But now when you back up a drive that contains virtual machines we will either use VSS inside the virtual machine in order to guarantee a valid backup is taken – or we will momentarily put the virtual machine into a saved state (if VSS is not supported by the guest operating system) and resume it after the backup is taken.
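For the curious, here is a minimal sketch of what the fix appears to boil down to: registering the Hyper-V VSS writer with Windows Server Backup through a single registry value. The key path, GUID and value name below are written from memory of the KB article and should be treated as assumptions; check them against KB 958662 before importing anything. The helper name `build_reg_file` is made up for this illustration, and the script only generates the text of a .reg file, it does not touch the registry.

```python
# Assumption: GUID, key path and value name as recalled from KB 958662.
# Verify against the article before using the generated file.
WRITER_GUID = "{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}"
KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion"
    r"\WindowsServerBackup\Application Support" + "\\" + WRITER_GUID
)


def build_reg_file(key_path=KEY_PATH):
    """Return the text of a .reg file that sets the one string value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key_path + "]",
        '"Application Identifier"="Hyper-V"',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Using the “Fix it” from the KB article is still the easier route; the sketch is only meant to show how small the change is.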

Most of my virtual machines support VSS, but I did fire up a Windows XP virtual machine just to watch the backup progress – otherwise there is no way for me to know that anything actually happened to the virtual machines :-)

I then fired up Windows Server Backup and requested to do a custom backup, and selected to only do a “Bare metal recovery” backup.  This meant that I was able to back up my system disk without backing up the (rather large) data disks used by my Windows Home Server virtual machine:

image
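The same kind of backup can probably be requested from the command line with wbadmin, where the `-allCritical` switch is the rough equivalent of selecting “Bare metal recovery” in the user interface. The sketch below only builds the argument list; the helper name `build_wbadmin_command` and the target `E:` are placeholders for illustration, not what I actually ran.

```python
def build_wbadmin_command(backup_target, quiet=True):
    """Build the argument list for a critical-volumes-only backup.

    backup_target is a drive letter such as "E:" or a UNC share path.
    """
    if not backup_target:
        raise ValueError("backup_target must not be empty")
    args = [
        "wbadmin", "start", "backup",
        "-backupTarget:" + backup_target,
        "-allCritical",  # only volumes needed for bare metal recovery
    ]
    if quiet:
        args.append("-quiet")  # do not prompt for confirmation
    return args


if __name__ == "__main__":
    # Placeholder target; pass the list to subprocess.run on the server
    # (from an elevated prompt) to actually start the backup.
    print(" ".join(build_wbadmin_command("E:")))
```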

But then things started to go sideways.

On my first attempt, the backup failed after 10 minutes with an error message that stated:

“(0x81000101) The creation of a shadow copy has timed out. Try this operation again.”

Searching on this error message revealed nothing of particular interest – and as I was backing up the system due to slow performance of the disk I was trying to back up – I figured this was not too surprising.  So I decided to do as the error message advised – and try again.

The second attempt got further – about 30 minutes in – when it failed with an I/O error message.  A bit of investigation quickly revealed that the USB disk that I was trying to back up the system to had chosen this particular point in time to die.  Hmmm… Ominous.

For the third attempt I tried to back up over the network to my main desktop computer (after having to shuffle a lot of virtual machines around to make space).  This time I received an error message that stated:

“(0x80042336) The writer experienced a partial failure.”

Sigh.  At least I knew about this error message.  Basically – VSS (the backup infrastructure in Windows) prefers to have applications either succeed or fail an entire backup process.  The problem that we have is that we can succeed on all but a single virtual machine – in which case we need to report failure back to the backup application, but we also need to indicate that a specific virtual machine caused the problem.
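To make the idea concrete, here is a toy model of that reporting problem. It is not the VSS API and not how Hyper-V is implemented; `VmResult` and `summarize_backup` are names invented for this sketch. It just shows how one overall status can be reported while still carrying the list of virtual machines that caused it.

```python
from dataclasses import dataclass


@dataclass
class VmResult:
    """Outcome of preparing one virtual machine for backup (toy model)."""
    name: str
    succeeded: bool
    detail: str = ""


def summarize_backup(results):
    """Collapse per-VM outcomes into one status plus the failing VMs."""
    failed = [r for r in results if not r.succeeded]
    if not failed:
        return "success", []
    if len(failed) == len(results):
        return "failure", failed
    # Some, but not all, virtual machines failed: the backup application
    # still sees a failure, but the culprits can be identified.
    return "partial failure", failed


if __name__ == "__main__":
    status, failed = summarize_backup([
        VmResult("domain controller", True),
        VmResult("FTP server", False, "VSS error inside the guest"),
        VmResult("WDS server", True),
    ])
    print(status, [r.name for r in failed])
```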

Seeing this error message I went to check the event log.  Looking in the Admin section of the Hyper-V-VMMS log showed me that it was my FTP server that had caused the problem:

image

From here I went to look in the event log inside my FTP server. 

At first I checked the System log – and saw a number of error messages from the VDS Basic Provider that stated:

“Unexpected failure. Error code: 490@01010004”

One of these occurred around the time of the failed backup – but there were a number of other instances that did not appear to correlate to any backup activity.  A quick web search turned up this KB article: https://support.microsoft.com/kb/979391 that explains that this is a benign error message that can be safely ignored.

Next I checked the Application log – and saw an error message at the right time that looked like the culprit:

“Volume Shadow Copy Service error: Unexpected error calling routine.  IVssBackupComponents::SetContextInternal.  hr = 0x80042301, A function call was made when the object was in an incorrect state.”

Unfortunately searching on this error message revealed nothing but random people struggling with random variations of the error message – and none of them related to Hyper-V.  After reading through a number of these I decided that the layman's interpretation of this error message was “something went wrong deep in the guts of VSS”.  With such insight in hand I decided that I would just give it another shot.
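As an aside, the hex codes in these messages can at least be pulled apart mechanically. The sketch below splits an HRESULT into its standard fields (failure bit, facility, code); the function name `decode_hresult` is made up for this example. Both VSS codes above land in facility 4 (FACILITY_ITF), which means the code is defined by the specific interface that returned it, so this tells you where the error came from but not why it happened.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into its failure bit, facility and code."""
    hr &= 0xFFFFFFFF  # treat the value as unsigned 32-bit
    return {
        "failure": bool(hr >> 31),       # bit 31: severity
        "facility": (hr >> 16) & 0x7FF,  # bits 16-26: facility
        "code": hr & 0xFFFF,             # bits 0-15: error code
    }


if __name__ == "__main__":
    for value in (0x81000101, 0x80042336, 0x80042301):
        fields = decode_hresult(value)
        print(hex(value), fields["failure"], fields["facility"],
              hex(fields["code"]))
```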

The fourth time the backup went through without a hitch.

I honestly did not expect this process to be so painful – but the nice thing is that (with the exception of my Windows XP virtual machine, which does not support VSS) through this whole process none of my running virtual machines were disturbed.  In fact – I was watching video streaming off of one of them for pretty much the entire time.

Unfortunately this story is yet to have a happy ending – as while I have been able to confirm that a valid and complete backup was taken (ironically by restoring the backup to a virtual machine on my Hyper-V server – which worked fine) I cannot get the darned thing to restore to my new disk configuration.

So for now my server continues to run a little slow on the old disk, and I am hunting down Windows Server Backup people to try and figure out why my restore is failing.  On the plus side – if I do have a hardware failure now I will have a valid backup to restore the system from (once I get that part working).

Cheers,
Ben

Update: This discussion is continued at https://blogs.msdn.com/virtual_pc_guy/archive/2010/03/10/adventures-in-backup-continued.aspx