Windows Server 2016 introduces Storage Spaces Direct (S2D), which enables building highly available storage systems that present virtual shared storage across servers using their local disks. This is a significant step forward in Microsoft’s Windows Server Software-defined Storage (SDS) story, as it simplifies the deployment and management of SDS systems and also unlocks the use of new classes of disk devices, such as SATA disk devices, that were previously unavailable to clustered Storage Spaces with shared disks. The following document has more details about the technology, functionality, and how to deploy on physical hardware.
That experience and install guide notes that to be reliable and perform well in production, you need specific hardware (see the document for details). However, we recognize that you may want to experiment and kick the tires a bit in a test environment before you go and purchase hardware, and using Virtual Machines is an easy way to do that. You can configure Storage Spaces Direct inside VMs on top of any cloud, be it Hyper-V, Azure, or your hypervisor of preference.
Assumptions for this Blog
- You have a working knowledge of how to configure and manage Virtual Machines (VMs)
- You have a basic knowledge of Windows Server Failover Clustering
- Windows Server 2012 R2 or Windows Server 2016 host with the Hyper-V Role installed and configured to host VMs
- Enough capacity to host two VMs with the configuration requirements noted below
- Hyper-V hosts can be part of a host failover cluster, or stand-alone
- VMs can be located on the same server, or distributed across servers (as long as the network connectivity allows traffic to be routed to all VMs with as much throughput and as little latency as possible).
Overview of Storage Spaces Direct
S2D uses disks that are exclusively connected to one node of a Windows Server 2016 Failover Cluster and allows Storage Spaces to create pools using those disks. Virtual Disks (Spaces) that are configured on the pool will have their redundant data (mirrors or parity) spread across the disks in different nodes of the cluster. Since copies of the data are distributed, data remains accessible even when a node fails or is shut down for maintenance. Documents which go into detail on Storage Spaces Direct can be found here: http://aka.ms/S2D
You can implement S2D in VMs, with each VM configured with two or more virtual disks connected to the VM’s SCSI Controller. Each node of the cluster running inside a VM will be able to connect only to its own disks, but S2D will allow all the disks to be used in Storage Pools that span the cluster nodes.
S2D uses SMB3 as the transport protocol to send the redundant data for mirror or parity spaces to the other nodes.
Effectively, this emulates the configuration in the following diagram:
- Network. Since the network between the VMs transports the replication of data, the bandwidth and latency of the network will be a significant factor in the performance of the system. Keep this in mind as you test configurations.
- VHDx location optimization. If you have a Storage Space that is configured for a three-way mirror, then the writes will be going to three separate disks (implemented as VHDx files on the hosts), each on a different node of the cluster. Distributing the VHDx files across disks on the Hyper-V hosts will provide better response to the I/Os. For instance, if you have four disks or CSV volumes available on the Hyper-V hosts, and four VMs, then put the VHDx files for each VM on a separate disk (VM1 using CSV Volume 1, VM2 using CSV Volume 2, etc.).
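As a rough illustration of the placement advice above, the following sketch creates the data VHDx files for four VMs, using one host volume per VM. The VM names, the C:\ClusterStorage\VolumeN paths, and the 100GB size are assumptions made for illustration only; substitute the disks or CSV volumes your hosts actually have.

```powershell
# Hypothetical VM names and volume paths: adjust to match your hosts.
$vmNames = 'S2D-VM1', 'S2D-VM2', 'S2D-VM3', 'S2D-VM4'

for ($i = 0; $i -lt $vmNames.Count; $i++) {
    $vm = $vmNames[$i]

    # VM1 -> Volume1, VM2 -> Volume2, and so on
    $folder = "C:\ClusterStorage\Volume$($i + 1)\$vm"
    New-Item -ItemType Directory -Path $folder -Force | Out-Null

    # Two dynamically expanding data disks per VM (size is an example)
    foreach ($n in 1..2) {
        New-VHD -Path "$folder\$vm-data$n.vhdx" -SizeBytes 100GB -Dynamic | Out-Null
    }
}
```

This only creates the files; they still have to be attached to each VM's SCSI Controller.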
Enabling Storage Spaces Direct in Virtual Machines:
Windows Server 2016 includes enhancements that automatically configure the storage pool and storage tiers in “Enable-ClusterStorageSpacesDirect”. It uses a combination of bus type and media type to determine which devices to use for caching and how to automatically configure the storage pool and storage tiers.
Below is an example of the steps to do this:
    #Create cluster
    New-Cluster -Name <ClusterName> -Node <node1>,<node2>,<node3> -NoStorage

    #Enable Storage Spaces Direct
    Enable-ClusterS2D

    #Create a volume
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName <friendlyname> -FileSystem CSVFS_ReFS -StorageTierFriendlyNames Performance, Capacity -StorageTierSizes <2GB>, <10GB>

Note: The values for the -StorageTierSizes parameter above are examples; you can specify the sizes you prefer. The -StorageTierFriendlyNames values of Performance and Capacity are the names of the default tiers created by the Enable-ClusterS2D cmdlet. In some cases only one of them may exist, or someone could have added more tier definitions to the system. Use Get-StorageTier to confirm what storage tiers exist on your system.
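Because the automatic configuration depends on the bus type and media type of each disk, it can help to look at what the nodes actually report before creating volumes. The following is a minimal sketch that lists the physical disks and any tier definitions; the selected columns are just a convenient choice, not a required set.

```powershell
# Run on one of the cluster nodes.
# Show the bus type and media type reported for each disk.
Get-PhysicalDisk |
    Sort-Object FriendlyName |
    Format-Table FriendlyName, BusType, MediaType, CanPool, Size -AutoSize

# List the storage tier definitions that currently exist.
Get-StorageTier |
    Format-Table FriendlyName, MediaType, ResiliencySettingName -AutoSize
```

If Get-StorageTier returns fewer tiers than expected, adjust the tier names passed to New-Volume to match what is listed.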
Configuration Option #1: Single Hyper-V Server (or Client) hosting VMs
The simplest configuration is one machine hosting all of the VMs used for the S2D system. In my case, that is a Windows Server 2016 system running on a desktop-class machine with 16GB of RAM and a modern 4-core processor.
The VMs are configured identically. I have one virtual switch that is connected to the host’s network and goes out to the world for clients to connect, and I created a second virtual switch that is set to Internal network, to provide another network path for S2D to use between the VMs.
The configuration looks like the following diagram:
Hyper-V Host Configuration
- Configure the virtual switches: Configure a virtual switch connected to the machine’s physical NIC, and another virtual switch configured for internal only.
Example: Two virtual switches. One configured to allow network traffic out to the world, which I labeled “Public”. The other is configured to only allow network traffic between VMs configured on the same host, which I labeled “InternalOnly”.
- Virtual Machines: Create two or more Virtual Machines
- The servers which are going to be nodes in the S2D cluster cannot be configured as Domain Controllers
- Memory: If using Dynamic Memory, the default of 1024 MB of Startup RAM will be sufficient. If using Fixed Memory, you should configure 4GB or more.
- Network: Configure each VM with two network adapters: one connected to the virtual switch with the external connection, the other connected to the virtual switch that is configured for internal only.
- It’s always recommended to have more than one network, each connected to separate virtual switches for resiliency so that if one stops flowing network traffic, the other(s) can be used and allow the cluster and Storage Spaces Direct system to remain running.
- Virtual Disks: Each VM needs a virtual disk that is used as a boot/system disk, and two or more virtual disks to be used for Storage Spaces Direct.
- Disks used for Storage Spaces Direct must be connected to the VM’s virtual SCSI Controller.
- Like all other systems, each boot/system disk needs to have a unique SID. That means it needs to be installed from an ISO or by another install method, or, if you are duplicating a VHDx, the image needs to be generalized (for example using Sysprep.exe) before the copy is made.
- VHDx type and size: You need a minimum of two VHDx data disks presented to each node, in addition to the OS VHDx disk. The data disks can be either “dynamically expanding” or “fixed size”.
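The host-side settings in the list above could be scripted roughly as follows. This is a sketch under assumptions, not the exact steps I used: the VM name, folder, physical adapter name ('Ethernet'), and disk sizes are placeholders, and it assumes a Gen2 VM booting from an already generalized OS VHDx.

```powershell
# All names, paths, and sizes below are placeholders for illustration.
$vmName   = 'S2D-VM1'
$vmFolder = 'D:\VMs\S2D-VM1'

# Create the two virtual switches (once per host):
# one bound to a physical NIC, one internal only.
New-VMSwitch -Name 'Public' -NetAdapterName 'Ethernet' -AllowManagementOS $true
New-VMSwitch -Name 'InternalOnly' -SwitchType Internal

# Create the VM from an existing, generalized OS disk.
New-VM -Name $vmName -Generation 2 -MemoryStartupBytes 4GB `
    -VHDPath "$vmFolder\$vmName-os.vhdx" -SwitchName 'Public'

# Use fixed memory of 4GB rather than Dynamic Memory.
Set-VMMemory -VMName $vmName -DynamicMemoryEnabled $false

# Add the second network adapter on the internal-only switch.
Add-VMNetworkAdapter -VMName $vmName -SwitchName 'InternalOnly'

# Create two data disks and attach them to the SCSI Controller.
foreach ($n in 1..2) {
    $path = "$vmFolder\$vmName-data$n.vhdx"
    New-VHD -Path $path -SizeBytes 100GB -Dynamic | Out-Null
    Add-VMHardDiskDrive -VMName $vmName -ControllerType SCSI -Path $path
}
```

Repeat the VM portion for each additional node, changing the name and folder.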
Example: The following is the Settings dialog for a VM that is configured to be part of an S2D system on one of my Hyper-V hosts. It’s booting from the Windows Server VHD that I downloaded from Microsoft’s external download site, and that is connected to the IDE Controller 0 (this had to be a Gen1 VM since the TP2 file that I downloaded is a VHD and not VHDx). I created two VHDx files to be used by S2D, and they are connected to the SCSI Controller. Also note the VM is connected to the Public and InternalOnly virtual switches.
Note: Do not enable the virtual machine’s Processor Compatibility setting. This setting disables certain processor capabilities that S2D requires inside the VM. This option is unchecked by default, and needs to stay that way. You can see this setting here:
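The same setting can also be checked from PowerShell on the host. A minimal sketch, with 'S2D-VM1' standing in as a placeholder VM name:

```powershell
# 'S2D-VM1' is a placeholder VM name.
# Report whether processor compatibility mode is enabled for the VM.
Get-VMProcessor -VMName 'S2D-VM1' |
    Select-Object VMName, CompatibilityForMigrationEnabled

# If it reports True, turn it off (do this while the VM is shut down).
Set-VMProcessor -VMName 'S2D-VM1' -CompatibilityForMigrationEnabled $false
```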
Guest Cluster Configuration
Once the VMs are configured, creating and managing the S2D system inside the VMs is almost identical to the steps for supported physical hardware:
- Start the VMs
- Configure the Storage Spaces Direct system, using the “Installation and Configuration” section of the guide linked here: Storage Spaces Direct Experience and Installation Guide
- Since this is running in VMs using only VHDx files as storage, there is no SSD or other faster media to allow tiers. Therefore, skip the steps that enable or configure tiers.
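For orientation, the guest-side preparation might look something like the sketch below. The linked guide remains the authoritative sequence; the node names, volume name, and size here are placeholders, and the volume is created without tiers in line with the note above. Creating the cluster and enabling S2D are left to the commands shown earlier in this post.

```powershell
# Node and volume names are placeholders for illustration.
$nodes = 'S2D-VM1', 'S2D-VM2', 'S2D-VM3'

# Install the clustering and file server features on every node.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name Failover-Clustering, FS-FileServer -IncludeManagementTools
}

# Validate the nodes, including the Storage Spaces Direct tests.
Test-Cluster -Node $nodes -Include 'Storage Spaces Direct', 'Inventory', 'Network', 'System Configuration'

# After the cluster exists and S2D is enabled, create a volume without tiers.
New-Volume -StoragePoolFriendlyName 'S2D*' -FriendlyName 'Volume1' `
    -FileSystem CSVFS_ReFS -Size 10GB
```

Review the validation report before continuing; warnings about hardware are expected in a VM-based test environment, but networking or configuration failures should be resolved first.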
Configuration Option #2: VMs Spread Across Two or more Hyper-V Servers
You may not have a single machine with enough resources to host all VMs, or you may wish to spread the VMs across hosts to have greater resiliency to host failures. Here is a diagram showing a configuration spread across two nodes, as an example:
This configuration is very similar to the single host configuration. The differences are:
Hyper-V Host Configuration
- Virtual Switches: We recommend that each host have a minimum of two virtual switches for the VMs to use. They need to be connected externally to different NICs on the systems. One can be on a network that is routed to the world for client access, and the other can be on a network that is not externally routed. Or, they can both be on externally routed networks. You can choose to use a single network, but then client traffic and S2D traffic will share the same bandwidth, and there is no redundant path to keep the S2D VMs connected if that single network goes down. However, since this is for testing and verification of S2D, you don’t have the network-loss resiliency requirements that we strongly suggest for production deployments.
Example: On this system I have an internal 10/100 Intel NIC and a dual-port Pro/1000 1 Gb card. All three NICs have virtual switches. I labeled one Public and connected it to the 10/100 NIC, since my connection to the rest of the world is through a 100 Mb infrastructure. I then have the two 1 Gb NICs connected to 1 Gb desktop switches (two different switches), and that provides my hosts two network paths between each other for S2D to use. As noted, three networks are not a requirement, but I have them available on my hosts so I use them all.
- Network: If you choose to have a single network, then each VM will only have one network adapter in its configuration.
Example: Below is a snip of a VM configuration on my two-host configuration. You will note the following:
- Memory: I have this configured with 4GB of RAM instead of dynamic memory. It was a choice since I have enough memory resources on my nodes to dedicate memory.
- Boot Disk: The boot disk is a VHDx, so I was able to use a Gen2 VM.
- Data Disks: I chose to configure four data disks per VM. The minimum is two, but I wanted to try four. All VHDx files are configured on the SCSI Controller (which is the only choice in Gen2 VMs).
- Network Adapters: I have three adapters, each connected to one of the three virtual switches on the host to utilize the available network bandwidth that my hosts provide.
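A configuration along these lines could be scripted on each host roughly as shown below. This is only a sketch: the switch names other than Public, the physical adapter names, the VM name, paths, and disk sizes are all hypothetical, and it assumes the VM already exists as a Gen2 VM with its first adapter connected to the Public switch.

```powershell
# Run on each Hyper-V host. All names, paths, and sizes are placeholders.
$switchToNic = [ordered]@{
    'Public'   = 'Onboard NIC'
    'Private1' = 'Gigabit Port 1'
    'Private2' = 'Gigabit Port 2'
}

# One external virtual switch per physical NIC.
foreach ($name in $switchToNic.Keys) {
    New-VMSwitch -Name $name -NetAdapterName $switchToNic[$name] -AllowManagementOS $true
}

$vmName   = 'S2D-VM1'
$vmFolder = 'D:\VMs\S2D-VM1'

# Fixed 4GB of memory instead of Dynamic Memory.
Set-VMMemory -VMName $vmName -DynamicMemoryEnabled $false -StartupBytes 4GB

# The VM is assumed to be on 'Public' already; add the other two networks.
foreach ($name in 'Private1', 'Private2') {
    Add-VMNetworkAdapter -VMName $vmName -SwitchName $name
}

# Four data disks on the SCSI Controller.
foreach ($n in 1..4) {
    $path = "$vmFolder\$vmName-data$n.vhdx"
    New-VHD -Path $path -SizeBytes 100GB -Dynamic | Out-Null
    Add-VMHardDiskDrive -VMName $vmName -ControllerType SCSI -Path $path
}
```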
How does this differ from what I can do in VMs with Shared VHDx?
Shared VHDx remains a valid and recommended solution to provide shared storage to a guest cluster (cluster running inside of VMs). It allows a VHDx to be accessed by multiple VMs at the same time in order to provide clustered shared storage. If any nodes (VMs) fail, the others have access to the VHDx and the clustered roles using the storage in the VMs can continue to access their data.
S2D allows clustered roles access to clustered storage spaces inside of the VMs without provisioning shared VHDx on the host. S2D is also useful in scenarios where the private / public cloud does not support shared storage, such as Azure IaaS VMs. See this blog for more information on configuring Guest Clusters on Azure IaaS VMs, including with S2D: https://blogs.msdn.microsoft.com/clustering/2017/02/14/deploying-an-iaas-vm-guest-clusters-in-microsoft-azure/