A while ago I talked about how I use Hyper-V in my house. One of the problems that I identified with my current setup was that I had most of my virtual machines (except for the Windows Home Server) running on the same disk as the system disk for the management operating system.
Apart from being bad practice in general, this has always concerned me because that disk is a pretty large single point of failure in my server (if it fails I will lose my domain controller, FTP server, SCVMM server, SCOM server, MED-V server and WDS server).
Recently I also discovered that the disk in question is the oldest (and slowest) disk in the system – and this is causing performance issues for all of the virtual machines running off of it.
Given all of this, I decided to shuffle some disks out of other systems in my house and set up a higher-performance two-disk mirror for my system disk. This would at least address both performance and resiliency to disk failure. The problem I faced was how to transfer my current system disk to the new RAID configuration.
After some failed attempts at using various cloning programs, it struck me that this was an ideal use of our backup technology: I would just back up the current system disk and restore it to the new physical disk.
As this was just going to be a “once off” backup, I did not want to spend the time to set up a full enterprise backup solution (like DPM); I just wanted to use Windows Server Backup.
I knew that Windows Server Backup does not support Hyper-V by default, so I went to get the details of how to enable this from the appropriate KB article (http://support.microsoft.com/kb/958662) and was pleasantly surprised to find that a “Fix it” is available for this issue, so I was able to complete this step without too much trouble.
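For anyone curious about what the Fix it changes: as I understand KB 958662, it registers the Hyper-V VSS writer with Windows Server Backup through a single registry value. The Python sketch below only builds the text of a .reg file so you can see that value; the key path, writer ID and value are my reading of the KB and are assumptions to verify against the KB before you import anything.

```python
# Hedged sketch: build the text of a .reg file that registers the Hyper-V
# VSS writer with Windows Server Backup. The key path and value are my
# reading of KB 958662; verify them against the KB before importing.

# Writer ID of the "Microsoft Hyper-V VSS Writer".
HYPERV_WRITER_ID = "{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}"

KEY_PATH = "\\".join([
    "HKEY_LOCAL_MACHINE", "SOFTWARE", "Microsoft", "Windows NT",
    "CurrentVersion", "WindowsServerBackup", "Application Support",
    HYPERV_WRITER_ID,
])


def build_reg_file() -> str:
    """Return .reg text that adds the 'Application Identifier' string value."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        '"Application Identifier"="Hyper-V"',
        "",
    ]
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Review the output before saving and importing it with regedit on the Hyper-V host; running the Fix it from the KB should get you the same result with less room for typos.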
Side note: You may wonder what happens if you do not apply this Fix it. Simply put, by default Windows Server Backup will not engage our VSS backup components. This means that it will just copy the virtual machines' files without doing anything to prepare them for backup. If your virtual machines are turned off, this is fine. If they are running, your backup can end up with corrupt data in the virtual machines (though the running virtual machines themselves are not affected).
Once you apply the Fix it, nothing in the Windows Server Backup user interface indicates that anything is different. But now when you back up a drive that contains virtual machines, we either use VSS inside the virtual machine to guarantee a valid backup, or (if the guest operating system does not support VSS) momentarily put the virtual machine into a saved state and resume it after the backup is taken.
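To make the cases in the two paragraphs above concrete, here is a simplified Python model of which path a virtual machine takes. The `VM` fields and `Method` names are hypothetical, and the separate case for a powered-off VM is my own simplification; this restates the described behavior and is not how Windows Server Backup is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    RAW_COPY = "files copied as-is, no VSS coordination"
    OFFLINE_COPY = "VM is off, files copied while idle"
    GUEST_VSS = "VSS run inside the guest"
    SAVED_STATE = "VM briefly saved, then resumed"


@dataclass
class VM:
    name: str
    running: bool
    guest_supports_vss: bool  # hypothetical flag: can the guest take part in VSS?


def backup_method(vm: VM, hyperv_writer_enabled: bool) -> Method:
    """Simplified model of how a VM ends up in a Windows Server Backup backup."""
    if not hyperv_writer_enabled:
        return Method.RAW_COPY        # default: Hyper-V writer not engaged
    if not vm.running:
        return Method.OFFLINE_COPY    # nothing to quiesce
    if vm.guest_supports_vss:
        return Method.GUEST_VSS       # consistent backup via VSS in the guest
    return Method.SAVED_STATE         # e.g. a Windows XP guest


def may_be_corrupt(vm: VM, hyperv_writer_enabled: bool) -> bool:
    """Only a raw copy of a running VM risks an inconsistent backup."""
    return backup_method(vm, hyperv_writer_enabled) is Method.RAW_COPY and vm.running


if __name__ == "__main__":
    for vm in (VM("dc", True, True), VM("xp", True, False), VM("spare", False, True)):
        print(vm.name, "->", backup_method(vm, hyperv_writer_enabled=True).value)
```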
Most of my virtual machines support VSS, but I did fire up a Windows XP virtual machine just to watch the backup progress – otherwise there is no way for me to know that anything actually happened to the virtual machines 🙂
I then fired up Windows Server Backup, chose a custom backup, and selected only the “Bare metal recovery” option. This let me back up my system disk without backing up the (rather large) data disks used by my Windows Home Server virtual machine.
But then things started to go sideways.
On my first attempt, the backup failed after 10 minutes with an error message that stated:
“(0x81000101) The creation of a shadow copy has timed out. Try this operation again.”
Searching on this error message revealed nothing of particular interest, and since slow disk performance was the very reason I was backing up this disk, I figured this was not too surprising. So I decided to do as the error message advised and try again.
The second attempt got further – about 30 minutes in – when it failed with an I/O error message. A bit of investigation quickly revealed that the USB disk that I was trying to back up the system to had chosen this particular point in time to die. Hmmm… Ominous.
For the third attempt I tried to back up over the network to my main desktop computer (after having to shuffle a lot of virtual machines around to make space). This time I received an error message that stated:
“(0x80042336) The writer experienced a partial failure.”
Sigh. At least I knew this error message. Basically, VSS (the backup infrastructure in Windows) prefers applications to either succeed or fail an entire backup. The problem we have is that we can succeed on all but a single virtual machine, in which case we need to report failure back to the backup application while also indicating which specific virtual machine caused the problem.
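When a writer reports a partial failure, running `vssadmin list writers` from an elevated prompt shows each writer's state and last error. The Python sketch below pulls those fields out of the command's text output so a script can flag unhealthy writers. The sample text is abbreviated and modeled on that output from memory, so treat the exact line format as an assumption to check on your own system. Note that this only tells you which writer is unhappy; finding the specific virtual machine still takes the Hyper-V-VMMS event log.

```python
import re
from dataclasses import dataclass

# Abbreviated sample modeled on `vssadmin list writers` output (illustrative).
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Hyper-V VSS Writer'
   Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
   State: [8] Failed
   Last error: Non-retryable error
"""


@dataclass
class WriterStatus:
    name: str
    state: str = ""
    last_error: str = ""


def parse_writers(text: str) -> list[WriterStatus]:
    """Collect name, state and last error for each writer block."""
    writers: list[WriterStatus] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Writer name: '(.*)'$", line)
        if match:
            writers.append(WriterStatus(match.group(1)))
        elif writers and line.startswith("State:"):
            writers[-1].state = line.split(":", 1)[1].strip()
        elif writers and line.startswith("Last error:"):
            writers[-1].last_error = line.split(":", 1)[1].strip()
    return writers


def unhealthy(writers: list[WriterStatus]) -> list[WriterStatus]:
    """Writers whose last error is anything other than 'No error'."""
    return [w for w in writers if w.last_error != "No error"]


if __name__ == "__main__":
    for w in unhealthy(parse_writers(SAMPLE)):
        print(f"{w.name}: {w.state}, {w.last_error}")
```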
Seeing this error message, I went to check the event log. The Admin section of the Hyper-V-VMMS log showed me that my FTP server had caused the problem.
From here I went to look in the event log inside my FTP server.
At first I checked the System log – and saw a number of error messages from the VDS Basic Provider that stated:
“Unexpected failure. Error code: 490@01010004”
One of these occurred around the time of the failed backup, but a number of other instances did not appear to correlate with any backup activity. A quick web search turned up this KB article: http://support.microsoft.com/kb/979391, which explains that this is a benign error message that can be safely ignored.
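The triage in this paragraph (keep events close to the failure time, then drop ones a KB documents as benign) can be sketched in Python. The event tuples, timestamps and five-minute window are made up for illustration; in practice you would export the entries from Event Viewer. Only the `490@01010004` marker and the VSS message text come from this post.

```python
from datetime import datetime, timedelta

# Marker from the VDS Basic Provider message; KB 979391 says it is benign.
KNOWN_BENIGN = ("490@01010004",)


def suspects(events, failure_time, window=timedelta(minutes=5), benign=KNOWN_BENIGN):
    """Return (time, source, message) events near failure_time, minus known-benign ones."""
    hits = []
    for when, source, message in events:
        if abs(when - failure_time) > window:
            continue  # too far from the failed backup to be related
        if any(marker in message for marker in benign):
            continue  # documented as safe to ignore
        hits.append((when, source, message))
    return hits


if __name__ == "__main__":
    failed_at = datetime(2010, 3, 1, 22, 30)  # hypothetical failure time
    log = [
        (datetime(2010, 3, 1, 9, 12), "VDS Basic Provider",
         "Unexpected failure. Error code: 490@01010004"),
        (datetime(2010, 3, 1, 22, 29), "VDS Basic Provider",
         "Unexpected failure. Error code: 490@01010004"),
        (datetime(2010, 3, 1, 22, 31), "VSS",
         "Unexpected error calling routine. IVssBackupComponents::SetContextInternal. hr = 0x80042301"),
    ]
    for hit in suspects(log, failed_at):
        print(hit)
```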
Next I checked the Application log – and saw an error message at the right time that looked like the culprit:
“Volume Shadow Copy Service error: Unexpected error calling routine. IVssBackupComponents::SetContextInternal. hr = 0x80042301, A function call was made when the object was in an incorrect state.”
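All three error codes so far are HRESULTs, which pack a failure bit, an 11-bit facility and a 16-bit code into 32 bits. The Python sketch below splits them apart; the message table holds only the texts quoted in this post and is not an official lookup. The VSS codes in the 0x8004xxxx range decode to facility 4 (FACILITY_ITF), while the Windows Server Backup code 0x81000101 carries a different facility value.

```python
# Messages as quoted in this post; not an official or complete table.
KNOWN_MESSAGES = {
    0x81000101: "The creation of a shadow copy has timed out. Try this operation again.",
    0x80042336: "The writer experienced a partial failure.",
    0x80042301: "A function call was made when the object was in an incorrect state.",
}


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity bit, facility and code."""
    return {
        "failure": bool(hr & 0x80000000),  # bit 31: 1 means failure
        "facility": (hr >> 16) & 0x7FF,    # bits 16-26
        "code": hr & 0xFFFF,               # bits 0-15
        "message": KNOWN_MESSAGES.get(hr, "not quoted in this post"),
    }


if __name__ == "__main__":
    for hr in (0x81000101, 0x80042336, 0x80042301):
        d = decode_hresult(hr)
        print(f"0x{hr:08X}: facility {d['facility']}, code 0x{d['code']:04X}, {d['message']}")
```

The same decoding works for codes not in the table; it just reports the message as unknown.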
Unfortunately, searching on this error message revealed nothing but random people struggling with random variations of it, none of them related to Hyper-V. After reading through a number of these I decided that the layman’s interpretation of this error message was “something went wrong deep in the guts of VSS”. With such insight in hand I decided that I would just give it another shot.
The fourth time the backup went through without a hitch.
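Since the only remedy on offer for both the timeout and the partial failure was to try again, a scripted backup could at least automate that. The Python sketch below retries a caller-supplied backup function. `run_backup`, `BackupError` and the retryable set are hypothetical, and treating these two codes as retryable is an assumption drawn from this one experience, not documented guidance. A hardware failure like the dying USB disk should not be retried, so any code outside the set is raised immediately.

```python
import time
from typing import Callable


class BackupError(Exception):
    """Hypothetical error carrying the HRESULT a backup run failed with."""

    def __init__(self, hresult: int):
        super().__init__(f"backup failed with 0x{hresult:08X}")
        self.hresult = hresult


# Codes that went away on a plain retry in this post (an assumption, not guidance).
RETRYABLE = {0x81000101, 0x80042336}


def backup_with_retries(run_backup: Callable[[], None], attempts: int = 4,
                        delay_s: float = 60.0) -> int:
    """Call run_backup until it succeeds; return the attempt number that worked."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            run_backup()
            return attempt
        except BackupError as err:
            # Give up on other errors (e.g. a dying disk) or on the last attempt.
            if err.hresult not in RETRYABLE or attempt == attempts:
                raise
            time.sleep(delay_s)
    raise RuntimeError("not reached: the loop always returns or raises")
```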
I honestly did not expect this process to be so painful – but the nice thing is that (with the exception of my Windows XP virtual machine, which does not support VSS) through this whole process none of my running virtual machines were disturbed. In fact – I was watching video streaming off of one of them for pretty much the entire time.
Unfortunately this story is yet to have a happy ending – as while I have been able to confirm that a valid and complete backup was taken (ironically by restoring the backup to a virtual machine on my Hyper-V server – which worked fine) I cannot get the darned thing to restore to my new disk configuration.
So for now my server continues to run a little slow on the old disk, and I am hunting down Windows Server Backup people to try and figure out why my restore is failing. On the plus side – if I do have a hardware failure now I will have a valid backup to restore the system from (once I get that part working).
Cheers,
Ben
Update: This discussion is continued at http://blogs.msdn.com/virtual_pc_guy/archive/2010/03/10/adventures-in-backup-continued.aspx
Sounds like déjà vu. I use Backup Exec with the Hyper-V agent on 6 VM servers. Two of them just refuse to back up. I’m trying something new each night in the hope it will finally work. VSS is a blessing and a pain all at once. I am almost at the point of giving up and shutting the servers down during the backup.
And now you see why it’s so hard to trust this… I run Hyper-V in a production environment with four VMs, one of them running SQL 2008, and I never feel safe when the backup is taken.
There’s no guarantee that a backup is good, unless I test every single one. I can’t trust the platform as it is. Too many false signals, noise and error messages that are not documented anywhere else.
Hi Ben,
This is exactly why I use my PowerShell script to tell me whether the last backup was successful. I have had some weird crashes causing the backup service to fail:
http://mindre.net/post/Backing-up-Virtual-Machines-using-Windows-Server-Backup-in-Server-2008-R2.aspx
Yes. On one hand I still think this is really cool technology (I mean – my virtual machines never missed a beat through this whole process) but clearly this is something that needs to be more robust / reliable.
Cheers,
Ben
This is one area where VMware seemingly has the upper hand, sadly. I’ve tried Backup Exec, Windows Backup and DPM, and they all seem to randomly fail with VSS errors.
DPM was the most entertaining as it snapshots every 15 mins and you can imagine the amount of error logs you get in a day on 20 servers (physical and VM).
VSS seems to need _way_ more error checking internally and some sensible error messages would be nice too. Actually scratch that, if it "just worked" I wouldn’t care about error messages cos I’d never see them 🙂
Hey Ben,
Can you not just install W2K8 R2 on your new disk configuration and then restore your old W2K8 R2 backup over the top? Have you tried this? That may be a route forward. Failing that, I’d suggest it is likely that the restore isn’t correctly loading the necessary RAID controller drivers. The old way around that was to install the RAID controller drivers before taking the backup. Not sure if that still applies to W2K8 R2, as I haven’t done that for a while! :o)
Good luck
Janson
Hi Ben,
Sorry that this is off-topic but I’d appreciate your comments on this.
I read here (http://www.microsoft.com/windows/enterprise/products/mdop/med-v.aspx) that MED-V SP1 will support both 32 and 64-bit guest operating systems when it is released.
"MED-V 1.0 SP1 with support for Windows7 (32bit and 64bit) will be available in the first quarter of calendar year 2010.
MED-V 1.0 SP1 will rely on Virtual PC 2007 technology, and will not require hardware-assisted virtualization (e.g. Intel VT, AMD-V)."
Since this is based on Microsoft Virtual PC 2007 technology, this would imply that MS are working on making Virtual PC work with 64-bit guests.
Will there be a standalone version of Virtual PC 2007 that supports 64-bit guests in the future?
Are you able to comment on this at all ?
It is good to see I am in the same boat. The biggest downfall of Hyper-V is that reliable backups are nearly impossible in my experience. It does not matter what backup software you actually use because the problems always arise from VSS.
We have had to revert back to "within" backups of guests or a shutdown/suspend + robocopy. It is just not possible to rely on VSS in production.
Janson –
Yes, that idea has crossed my mind, but I would like to figure out why a bare metal restore is not working on my specific hardware, as this is my preferred route.
Paul Lynch –
The 64-bit reference here is support for 64-bit host operating systems – which MED-V does not support today.
Dave / Doug –
After this experience I have been thinking about setting up nightly backups of my system so that I can get a better feel for the sorts of issues encountered. The thing that really annoys me is that (apart from the busted USB disk) both of the errors I encountered had no corrective action other than "try it again".
Cheers,
Ben
By the way –
You may be wondering why I made a blog post that seems to shine a negative light on our backup functionality.
The main reason for this is:
– I blog what I see, good or bad
– Having spent a couple of hours getting this working I wanted to share my experience, so that others would know what to do in a similar situation (e.g. how to diagnose a 0x80042336 error)
I still think our backup functionality is pretty darned cool, and I remain impressed that even while I was seeing these random failures I suffered no downtime. But clearly we do need to make this system more reliable / predictable.
Cheers,
Ben
Hi Ben
If you haven’t researched further already, this link may help: http://social.technet.microsoft.com/Forums/en/windowsbackup/thread/268e5d38-fc99-41b0-9d79-31e6f5e98d96
My understanding reading this is that Bare Metal Recovery is only possible/supported on similar hardware, so the change of drive controller is likely causing the issue.
Hope that helps
Janson
Regarding no downtime – that is true, until the only way to resolve a VSS issue becomes "reboot the server." 🙂
Ben,
as an interim and additional safety measure for your "eggs in one disk" problem:
Grab a copy using disk2vhd; it should still let you watch TV while you run it, and then you will have hedged your bets with both a backup to restore and a VHD file to play with.
If you get time you could push for disk2vhd2disk.
Good luck
Ben,
Great time to post this. I was actually thinking about how I want to back up my Hyper-V R2 VMs. Since I run JBOD with the VMs spread across the disks (so I could have as many spindles as possible), I have the same single point of failure as you.
I’m thinking about getting a large external USB drive and just letting it back up there every other day or so. The only downside, after reading the KB article, is that you can only back up a whole drive, not individual VMs.
To offer some contrasting perspective to these comments, I would like to say that I am running a production Hyper-V environment with a fully virtualized corporate server infrastructure. Many of the VMs have undergone P2V migration, restoration from backup, or OS upgrades – and they continue to work as expected. I use DPM for more granular backups during the week and WSB for full images of the Hyper-V hosts + VMs on the weekend. And I have found this to be the most stable, complete, and efficient backup system I’ve ever encountered, though it took a while to work out the bugs. More to the point, I feel safer than ever that I will be able to revert or recover when necessary, and IT management is much easier. So thanks MS, and in particular, thank you Ben for all the great virtualization work you have done over the years – it is inspiring to me, and it keeps getting better with every release!
I like watching the Hyper-V manager when the backups run. In the status column it shows saving/restoring for VMs that hibernate, or “Creating VSS snapshot set…succeeded!” for those with full integration services enabled. This gives me a warm, secure feeling inside.
That being said, I do think the VSS infrastructure can be dramatically enhanced and improved to address the issues described in this post (I’ve dealt with many of them myself), but I have faith that the developers will get there in time, as more R&D investments are made in the technology. A big complaint I have is that MS initially did not support backing up certain products with WSB. Exchange 2007 didn’t get it until SP2. DPM 2007 can’t use WSB to make a full image of itself (system volume only, not the replicas). And as for Hyper-V, I would have thought the registry key enabler for WSB would have been added automatically in 2008 R2, and was surprised it wasn’t.
In short, if it is a MS product, WSB (or its future incarnation) should be able to back it up natively with VSS, right out of the box…period! Personally, I think lack of support for certain applications was a deliberate attempt to push DPM on enterprise customers. But now I use both technologies in tandem, and am pleased with the level of protection this combination offers. Full images are vital to have in a crunch, plus DPM incremental grabs for good measure.
I have to agree with Doug about creating reliable backups. In my experience it is also nearly impossible.
Hello Ben. It’s sad to hear you have such a poor setup. My Simple Beyond TV configuration has 4 drives with mixed RAID 0 and 1 drives.
Mirroring your two drives would do nothing for performance on your system; it would just help with the safety/reliability of your boot OS.
Standard practice is generally to put the server OS on one pair of mirrored drives and your data or VMs on further mirrored pairs, depending on the load. In larger environments you might have multiple pairs of drives. In your "home" situation a simple setup of 4 drives in a RAID 1+0 configuration would solve both the speed and the safety issue, but you obviously need a lot of drives. They’re so cheap, who doesn’t have quite a few?
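To put numbers on this suggestion, a small Python helper computes usable capacity and guaranteed fault tolerance for the layouts mentioned in this thread. It assumes identical disks and ignores controller and filesystem overhead; the helper name and the 1 TB disk size used below are illustrative, not from the comment.

```python
def raid_summary(level: str, disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, disk failures survived in the worst case)."""
    if level == "0":   # striping only: all capacity, no redundancy
        return disks * disk_tb, 0
    if level == "1":   # every disk holds a full copy
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb, disks - 1
    if level == "10":  # stripe across mirrored pairs
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        # Worst case: both disks of one mirrored pair fail, so only 1 failure is guaranteed.
        return disks // 2 * disk_tb, 1
    raise ValueError(f"unsupported level {level!r}")


if __name__ == "__main__":
    for level, n in (("1", 2), ("10", 4)):
        usable, survives = raid_summary(level, n, 1.0)
        print(f"RAID {level} with {n} x 1 TB: {usable} TB usable, survives {survives} failure(s)")
```

For four identical 1 TB disks, RAID 1+0 yields 2 TB usable and is only guaranteed to survive one failure (two, if they hit different mirrored pairs); a two-disk mirror of the same disks yields 1 TB.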
Also, unless you want the ultimate speed of something like a VelociRaptor, pairs of 2.5" laptop drives running at 7200 rpm work very well. They cost a bit more per GB, but they produce less heat and noise and are more durable.
Regarding your backup, I still use Ghost (these days from a USB key more than a floppy), and it’s extremely fast and would copy over your drives while moving to the larger capacity at the same time. I know you wanted to highlight the Fix it feature in this article; it’s just not what I do at my company. NTBACKUP also still works very well inside the VM, backing up to a network drive.
Is there ANY way to get VMM to run in a workgroup-only environment? I don’t have a domain, only workgroups, but I want to use the tools that VMM has. How can I do this?
Thanks,
Mike
I too had the same concern as you about a single point of failure. To solve this problem I just added a hard disk and enabled a software RAID-1 mirror for the system volumes. It’s not exactly what you were trying to do, but I thought it might be helpful for others to know that this is possible and requires no downtime or backups. You can even do it through the command line on Hyper-V Server using Diskpart. I plan to blog about my experience installing and configuring Hyper-V Server 2008 R2 in the next couple of days and this set of actions will be included.
Janson –
Thanks for the pointer!
Ronnie Isherwood –
disk2vhd uses VSS to create the snapshot – so it actually is pretty much the same thing under the covers.
Kurt –
Thanks for the feedback.
Mike –
No, VMM needs to have a domain controller.
Cheers,
Ben
[Written based off of reading the original article, not considering other comments]
It is the new disk configuration (and the different disk controller hardware that comes with it) that causes the restore to fail. Doing this with Windows Server Backup is really no different from imaging the whole thing with something like Ghost, applying the image to the new drive setup, and expecting it to boot.
You need to back up all the VMs and any other needed material, then load the OS cleanly from scratch on the new drive setup, and then restore all the VMs and the other material. This is basically the same as dealing with a failed machine when you are forced to restore onto dissimilar hardware (which is just another way of saying that you cannot restore). In fact, the drive controllers and drive arrangement are the primary things that cause hardware to be considered "dissimilar" or "incompatible".
phillip.windell@wandtv.com
phillip.windell@live.com
I use the free version from Altaro for my home setup: http://www.altaro.com/hyper-v-backup
It doesn’t have all the functionality you might need, but it is more than enough for home and my test lab.
There are a few backup/export scripts out there too that help; just search for them.
Dear Mr. Armstrong,
Please help me 🙂 We have an issue with Hyper-V and third-party backup (Veeam 9.0). The backup server runs on a Hyper-V host, and the other servers also run on Hyper-V. The backup job fails with this error: 20.02.2017 22:22:36 :: Failed to create snapshot (Microsoft Software Shadow Copy provider 1.0) (mode: Veeam application-aware processing) Details: Writer ‘Microsoft Hyper-V VSS Writer’ is failed at ‘VSS_WS_FAILED_AT_PREPARE_SNAPSHOT’.
Vss error: ‘0x800423f4’
–tr:Failed to verify writers state.
–tr:Failed to perform pre-backup tasks.
20.02.2017 22:23:04 :: Retrying snapshot creation attempt (Writer ‘Microsoft Hyper-V VSS Writer’ is failed at ‘VSS_WS_FAILED_AT_PREPARE_SNAPSHOT’.
Vss error: ‘0x800423f4’
–tr:Failed to verify writers state.
–tr:Failed to perform pre-backup tasks.)
20.02.2017 22:39:54 :: Unable to allocate processing resources. Error: Writer ‘Microsoft Hyper-V VSS Writer’ is failed at ‘VSS_WS_FAILED_AT_PREPARE_SNAPSHOT’.
Vss error: ‘0x800423f4’
–tr:Failed to verify writers state.
–tr:Failed to perform pre-backup tasks.
After this error, the VM has an .avhd file, and the virtual machine keeps running on this .avhd disk after the backup.
The differencing disk becomes the active disk file and I can’t change it manually or take a new snapshot; only after a shutdown and start is it OK again, when the changes are merged into the original VHDX and all is fine. Is there a way to force the changes to be merged into the VHDX while the VM is running? Any ideas?