Diagnosing hard disk failure with Hyper-V

Last week I had one of the hard disks in my Hyper-V server fail.  Having had to deal with this a couple of times over the last few years, I had little trouble working through the process of fixing my system, but I thought I would write up the details of my experience so that others can benefit.

It all started when my previously happy Windows Home Server started reporting a large number of file conflicts:

Error 1

Having seen this before, the first thing I did was open the Hyper-V management console and connect to the Windows Home Server virtual machine directly.  Logging in revealed that Windows was reporting a large number of delayed write failures.

Delayed write failures inside a virtual machine are almost always an indication of an underlying hardware problem with storage.  Unfortunately, the virtual machine has no insight into what is actually going wrong with the physical disk: all it knows is that it tried to write to the disk and the write failed.

From here I went to the management operating system and brought up the event log.  The Windows system event log contained exactly the information I was looking for:

Error 2
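If you would rather not scroll through the event log by hand, the same check can be roughed out in a few lines of Python.  This is only a sketch under some assumptions: that the storage errors are logged by the "disk" provider in the System log, and that their text names the device in the form \Device\HarddiskN\ (as the common "has a bad block" event does).  The function names are my own, and read_disk_events simply shells out to the built-in wevtutil tool, so it only works on Windows.

```python
import re
import subprocess
from collections import Counter

# Assumption: disk events name the device as \Device\HarddiskN\...
DEVICE_RE = re.compile(r"\\Device\\Harddisk(\d+)\\", re.IGNORECASE)


def count_disk_mentions(event_text):
    """Count how often each physical disk number appears in event text."""
    return Counter(int(number) for number in DEVICE_RE.findall(event_text))


def read_disk_events(max_events=50):
    """Return recent System log events from the 'disk' provider as text.

    Windows only: this runs the wevtutil command-line tool.
    """
    query = "*[System[Provider[@Name='disk']]]"
    command = [
        "wevtutil", "qe", "System",
        f"/q:{query}",        # only events logged by the disk provider
        f"/c:{max_events}",   # limit the number of events returned
        "/rd:true",           # newest events first
        "/f:text",            # plain text rather than XML
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling count_disk_mentions(read_disk_events()).most_common() in the management operating system lists the disk numbers that show up most often in recent errors.  A disk that dominates that list is the one to look at first.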

Clearly something was wrong with hard disk 2.  I shut down the Windows Home Server virtual machine (which was the only virtual machine on my system that was using this disk) and then used Disk Management in the management operating system to take hard disk 2 offline:

Error 4

I then updated the settings of the Windows Home Server virtual machine to remove the virtual hard disk that was stored on the failing disk.  When I started the virtual machine back up, it was happy again (except for complaining about a missing hard disk).  Thankfully most of my data was configured for replication inside Windows Home Server.  Unfortunately, I had disabled file replication on one of my Windows Home Server shares about a week earlier because I was running low on space, but that too was easy to address.

With the virtual machine up and running, I then brought the problematic disk back online in the management operating system.  I used Disk Management to attach the missing virtual hard disk to the management operating system and attempted to copy the missing files off the virtual hard disk and back onto the now running Windows Home Server.

This took a bit of work, as the hard disk was clearly on its last legs.  It would regularly fill my event log with error messages and cause the virtual hard disk to get disconnected from the management operating system.  But each time I was able to reconnect the virtual hard disk and start the copy again.  After three or four attempts I was able to get all of the data off the disk.
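I did the copying by hand, but the "reconnect and start again" loop is easy to script.  Here is a minimal sketch in Python of how that could look.  The function names, the retry count and the wait time are all my own choices for illustration.  Files that already exist at the destination with the same size are skipped, so each new attempt only deals with what earlier attempts could not read.

```python
import shutil
import time
from pathlib import Path


def copy_pass(src, dst):
    """Copy every file under src to dst once; return paths that failed."""
    failed = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        try:
            # Skip files that an earlier attempt already copied in full.
            if target.exists() and target.stat().st_size == path.stat().st_size:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # Copy to a temporary name so a half-read file never looks complete.
            partial = target.with_name(target.name + ".partial")
            shutil.copy2(path, partial)
            partial.replace(target)
        except OSError:
            failed.append(rel)
    return failed


def rescue_copy(src, dst, attempts=4, wait_seconds=30):
    """Repeat copy_pass until nothing fails or the attempts run out."""
    src, dst = Path(src), Path(dst)
    failed = []
    for attempt in range(1, attempts + 1):
        if not src.is_dir():
            # The virtual hard disk is not attached (or dropped out again).
            failed = [Path(".")]
        else:
            failed = copy_pass(src, dst)
        if not failed:
            break
        print(f"attempt {attempt}: {len(failed)} item(s) still to copy")
        if attempt < attempts:
            # Leave time to reattach the virtual hard disk before retrying.
            time.sleep(wait_seconds)
    return failed
```

The return value is the list of files that could not be copied, so an empty list means everything made it across.  Comparing sizes is a cheap check rather than a proof that the contents match; on a disk this unhealthy it would be worth spot-checking the important files afterwards.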

Finally I used the Intel Storage Management tool to figure out the serial number for the faulty hard disk:

Error 3

Then I shut down the physical computer and removed the disk in question.

At the end of the day I had the system back up and running happily with no data lost.  I cannot complain about that!

Cheers,
Ben