I have been on about four engagements in a row this year where we are looking at SQL performance on VMWare or Hyper-V.
Here is a list of common things that you can do with virtualisation that may adversely affect SQL performance. Most of them also apply to physical environments, for example if you are consolidating SQL onto multiple instances.
- Using a large Shared disk group for all virtual workloads
- Mounting the VHD to VMWare disk on a server file system (instead of pass though disks)
- Using a large disk pool when only one controller can own the disk group (some SAN’s are limited in this way and some are not)
- Overcommiting CPU
- Overcommiting Memory
- Not using 64k block size and allocation unit size
- Not using Volume alignment (on guest and host)
- Using dynamic disks (much better in Hyper-V R2, but still not generally recommended)
- Not ensuring Logs are on dedicated spindles
- Not using multiple HBA channels on larger workloads
- Sharing a switch between data, network and CSV
- Not using CPU affinity (some virtualisation platforms support affinity)
- Not using an “enlightened” operating system (Hyper-V)
- Running multiple VM’s on a single host slightly decreases throughout, but this is kinda the point of virtualisation so hard to avoid.
- Running lots of SQL Servers on one host and having too few HBA cards or a low queue depth
- Running 32 bit SQL Server guest on workloads that need lot of memory.
- Not pre-sizing TempDB
- not planning for database growth events
The top one item (use of a shared disk group) is a very common configuration for disks, especially when using the clustered shared disk volumes. But we know they will adversely affect performance, so what to do ? ban these configurations?
A more practical approach may be to just accept that your latency for SQL Server will be slightly worse than it could be and to invest in making the disk infrastructure perform better, so we work out at the same performance on virtual environment as we would on the physical.
For example if you know that a shared disk volume is going to add 1ms+, can we add more spindles to the disk pool to take away 1ms + from the latency.
If we do need to use a shared disk pool, can we at least separate OLTP and reporting/data warehouse workloads into two disk pools. One might be suitable for RAID 5/6 with spikes in sequential IOPS and one might lean towards RAID 10 with mainly random IOPS and a larger reliance on writes.
If we are overcommiting CPU, as we have lots of SQL Servers only using 5% of CPU, can we dedicate CPU’s for the one mission critical ERP database.
If we are sharing Logs, can we at least dedicate a disk group to shared Logs and allocate a dedicated log disk to the 1 or two critical workloads.
Cindy Gross has a great blog article on troubleshooting SQL Server on VMWare and associated white papers which is well worth a read.