This may sound obvious to many, and my apologies to that section of my audience. However, as I engage with more and more customers complaining about poor read/write IO performance on Azure Linux VMs, especially when running databases such as MongoDB, MySQL, Cassandra, CouchDB, or HBase, I see a trend.
Typically, the engineering group runs the initial performance tests. They spin up an Azure Linux VM, install the database software on it, point a load-testing client at it, and run tests. As this is meant to be a quick go/no-go assessment, they do not bother to add data disks and move the data location from the OS disk to a data disk.
The results, therefore, are misleading. When you install any of these databases using the standard package managers (apt-get or rpm), the data location stays on the OS disk by default (typically a directory under /var/lib, though the exact path depends on the database and the distribution).
On an Azure VM, the OS disk is not optimized for IO performance; it is optimized for fast boot times. That is why, before running any initial performance tests, we should add at least one data disk and move the data location to it. After all, that is what production is going to look like, so the initial tests will mimic production in that sense.
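Before starting a test run, it can help to confirm where the data directory actually lives. Here is a minimal Python sketch, not a definitive check: it assumes the root filesystem "/" sits on the OS disk, and the candidate paths are hypothetical defaults that you should replace with whatever your database's configuration file says. It compares filesystems rather than physical disks, so a separate partition on the OS disk would slip past it.

```python
import os


def on_same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same mounted filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def data_dir_on_os_disk(data_dir: str) -> bool:
    # Assumption: "/" is mounted from the OS disk, as on a typical Linux VM.
    return on_same_device(data_dir, "/")


if __name__ == "__main__":
    # Hypothetical default data directories; check your own configuration.
    candidates = ("/var/lib/mongodb", "/var/lib/mysql", "/var/lib/cassandra")
    for candidate in candidates:
        if not os.path.exists(candidate):
            continue
        if data_dir_on_os_disk(candidate):
            print(f"{candidate}: same filesystem as /, move it to a data disk")
        else:
            print(f"{candidate}: on a separate filesystem")
```

If a directory is reported as sharing a filesystem with "/", the test results will reflect the OS disk and not the disk you plan to use in production.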
That alone will improve the IO performance several times over. For even more performance, we should add multiple data disks and stripe them in a RAID 0 array. See this blog post.
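To see why striping helps, here is a small Python sketch of how a RAID 0 array spreads a logical volume across its member disks. It is an illustration of the idea only, not how any particular RAID tool is implemented; the function name `raid0_locate` and the 64 KiB chunk size are assumptions chosen for the example.

```python
def raid0_locate(offset: int, num_disks: int, chunk_size: int = 64 * 1024):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    chunk_size is an assumed stripe unit; real arrays make it configurable.
    """
    # Which chunk of the logical volume the offset falls in, and where inside it.
    chunk_index, within_chunk = divmod(offset, chunk_size)
    # Chunks are dealt out to the disks round-robin.
    disk = chunk_index % num_disks
    # Each full pass over all disks advances every disk by one chunk.
    disk_offset = (chunk_index // num_disks) * chunk_size + within_chunk
    return disk, disk_offset


if __name__ == "__main__":
    # Walk the first eight chunks of a hypothetical four-disk array.
    for chunk in range(8):
        disk, disk_offset = raid0_locate(chunk * 64 * 1024, num_disks=4)
        print(f"logical chunk {chunk} -> disk {disk}, offset {disk_offset}")
```

Because consecutive chunks land on different disks, a large sequential read or write is served by all the disks at once, which is where the extra throughput comes from. The flip side is that RAID 0 adds no redundancy of its own.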
Build better things on Azure!