I use Windows Home Server at home to store *everything*… it’s really quite a fantastic product. It has a feature called Drive Extender, which Wikipedia describes nicely:
Windows Home Server Drive Extender is a file-based replication system that provides three key capabilities:
- Multi-disk redundancy so that if any given disk fails, data is not lost
- Arbitrary storage expansion by supporting any type of hard disk drive (Serial ATA, USB, FireWire etc.) in any mixture and capacity
- A single folder namespace (no drive letters)
Users (specifically those who configure a family’s home server) deal with storage at two levels: Shared Folders and Disks. The only concepts relevant regarding disks is whether they have been "added" to the home server’s storage pool or not and whether the disk appears healthy to the system or not.
Shared Folders have a name, a description, permissions, and a flag indicating whether duplication (redundancy) is on or off for that folder.
If duplication is on for a Shared Folder (which is the default on multi-disk Home Server systems and not applicable to single disk systems) then the files in that Shared Folder are duplicated and the effective storage capacity is halved. However, in situations where a user may not want data duplicated (e.g. TV shows that have been archived to a Windows Home Server from a system running Windows Media Center), Drive Extender provides the capability to not duplicate such files if the server is short on capacity or manually mark a complete content store as not for duplication.
Here at Microsoft, we have an internal mailing list for WHS, and every once in a while, someone asks one of the following questions:
Isn’t RAID better than Drive Extender?
Why should I use Drive Extender instead of RAID?
Which RAID card should I buy?
How good is software RAID5?
I try to ignore those threads, but when the responses start coming in about the merits of RAID vs simply using DE, I end up getting itchy, and chime in. The topic came up again, this last weekend, and I recycled an old response, and it started looking like a good blog post… so here’s the skinny.
First, the reason why you don’t want Software RAID 5
There’s a big gap between software RAID5 and hardware RAID5. Software RAID5 is slow. Damn slow. Faster than that… maybe pretty damn slow. Not a great solution. You won’t be happy at the end of the day (see the section below, “Why you don’t want RAID 5.”)
Hardware RAID5 is fast. Zippity fast. So is the speed at which you will lose your data.
Why you don’t want RAID 5
RAID 5 is not about data integrity… it’s about performance and availability.
If you want your data to be safe, replicate it. Back it up. Put it in more than one place at a time.
If you use RAID5, you still need to back up your data. RAID5 is designed so that your data survives a single drive failure and stays available (but slower) until you get another drive in place, at which point it rebuilds the missing volume.
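For the curious, here is a minimal sketch of the parity trick that makes a single-drive rebuild possible. It is a toy model, not how any particular controller is implemented: the block contents and the `xor_blocks` helper are made up for illustration, and real arrays rotate parity across drives and work on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks, each on its own (hypothetical) drive.
data = [b"fami", b"ly p", b"hoto"]
parity = xor_blocks(data)  # the parity block lives on a fourth drive

# The drive holding data[1] dies: XOR the survivors with parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])

# A second drive (the one holding data[2]) dies before the rebuild finishes:
# what is left no longer determines the missing blocks.
after_second_failure = xor_blocks([data[0], parity])
```

Note that the rebuild has to read every surviving drive end to end, which is exactly the extra load the next few paragraphs worry about.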
Here’s the kicker. What happens when a drive fails, and you are not there? If the system is in use, it’s going to get really really busy, and all of the drives in the array are going to get a lot of use.
When that hard drive fails (and you should be planning for WHEN, not IF), and the others pick up the slack, the chances of losing a second drive go through the roof. What will you lose if a second drive goes?
This is common, especially in a server/computer in a home environment, where the drives may not be busy most of the time.
One other contributing factor to multiple drive failures in RAID5 is that people tend to use the same brand of drives, often from the same batch (i.e., you bought them at the same time).
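To put a rough shape on that risk, here is a back-of-the-envelope sketch. The function name and every number in it are made up for illustration; they are not measured failure rates. It also assumes the drives fail independently, which is the optimistic case: drives from the same batch tend to fail together, so the real odds are likely worse.

```python
def chance_of_second_failure(surviving_drives, per_drive_risk):
    """Probability that at least one surviving drive fails during the
    rebuild window, assuming independent failures with equal risk."""
    return 1 - (1 - per_drive_risk) ** surviving_drives

# Hypothetical 4-drive array that has just lost one drive.
# Illustrative per-drive risk during the rebuild: 1% when the drives
# are idle most of the time, 5% when they are being hammered.
calm = chance_of_second_failure(3, 0.01)
stressed = chance_of_second_failure(3, 0.05)

print(f"idle drives:     {calm:.1%}")
print(f"stressed drives: {stressed:.1%}")
```

The point is not the specific percentages, it is that the risk multiplies across every surviving drive at the exact moment each of them is working hardest.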
My personal experience with RAID5
I had a server running RAID5 at home. It ran perfectly for over a year (actually, close to two). One night after I went to bed, a drive failed. Three minutes later, another failed. This was a 2 terabyte RAID array.
I came down in the morning to my worst nightmare. Every bit of ‘valuable data’ I had in the world was now gone. In desperation I scoured the internet, and finally found a piece of software that (for $40!) could recreate every file that I still had data for, if a little slowly. I rushed out and bought three 750 GB drives, and started to restore everything I had lost. The restore process took three and a half MONTHS, running full time, around the clock. The good news is that I was able to get one of the failed drives spinning again, and I lost a total of one file.
What did I learn?
RAID5 doesn’t back up my data. Sadly, I thought it was safer. Worse than that, it was actually less safe. A single drive failure would have meant nothing. Add another drive, and keep chugging. Potentially, it may have taken a few hours to rebuild the lost volume, but I could have been using it while it did.
A second drive failure would have meant I was offline for the time it took to restore, if I actually had a backup. Still, not bad, considering that would have been less than the 3.5 months.
But a two-drive failure (which is fairly likely) without a backup is a nightmare.
If you value your data, replicate. I now have a home server with six 250 GB drives and three 750 GB drives, and the data that I value is replicated. (And the really valuable data is FolderShare’d to a friend’s house, and vice versa, giving us offsite backups too.) Sure, it’s not as ‘space efficient’, but at least I can deal with a drive failure.
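If you want to see what ‘not as space efficient’ costs in practice, here is a quick sketch using the drive sizes above. The `usable_capacity_gb` helper is my own simplification: it assumes duplicated data simply takes two copies, and it ignores the system partition and any Drive Extender overhead.

```python
# Drive sizes in GB, from the pool described above.
drives_gb = [250] * 6 + [750] * 3

def usable_capacity_gb(drives, duplicated_fraction):
    """Approximate usable space when `duplicated_fraction` (0.0 to 1.0)
    of the stored data sits in folders with duplication turned on.
    Each duplicated byte costs two bytes of raw disk."""
    return sum(drives) / (1 + duplicated_fraction)

raw_gb = sum(drives_gb)
all_duplicated_gb = usable_capacity_gb(drives_gb, 1.0)
half_duplicated_gb = usable_capacity_gb(drives_gb, 0.5)

print(raw_gb, all_duplicated_gb, half_duplicated_gb)
```

Because duplication is a per-folder flag, the real number lands somewhere between the raw total and half of it, depending on how much of your data you decide is worth two copies.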
RAID 1 (mirroring) is the only RAID level where a failure doesn’t drastically increase drive activity. Well, reads for that pair all go to one drive now, but if you had 8 drives mirrored in 4 sets, typical access won’t make all the drives busier.
RAID 0, of course, is purely about speed. Half the safety at twice the speed. That’s what I use in my desktops, where I want it fast. (Of course, I back up anything that I’m not willing to lose to the server.)
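Why ‘half the safety’? A toy sketch of striping makes it obvious. The `stripe` function and the tiny chunk size are invented for illustration; real arrays stripe in much larger chunks, but the layout idea is the same.

```python
def stripe(data, drive_count, chunk_size):
    """Deal fixed-size chunks of `data` across drives in round-robin order."""
    drives = [[] for _ in range(drive_count)]
    for offset in range(0, len(data), chunk_size):
        target = (offset // chunk_size) % drive_count
        drives[target].append(data[offset:offset + chunk_size])
    return drives

# One small "file" striped across two hypothetical drives.
drives = stripe(b"ABCDEFGH", drive_count=2, chunk_size=2)
print(drives)

# Lose drive 1 and only every other chunk of the file is left.
remaining = b"".join(drives[0])
```

Both drives can serve chunks at the same time, which is where the speed comes from, but any file bigger than one chunk lives on both drives, so either drive failing takes the file with it.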
What is my Advice?
Know this: if you are using hard drives, one day, you will experience a drive failure. Not ‘might’, but ‘will’.
How you are affected depends on your choices.
Determine how valuable your data is.
Stop thinking about the price of the hard drives. Disk Space is very cheap. It got cheaper while you were reading this. It’s the stuff you store that’s not.
Are you planning for the inevitable, or playing the odds?
I can talk all day about why DE is better than RAID, or why one particular strategy is better than the other. At the end of the discussion, you’re still the one making your decision, and you’re probably pretty smart. (You’re reading my blog.) Ask yourself: why are you doing what you are doing?
I’ll leave the rest to your imagination.