Running a MySQL Galera cluster on Microsoft Azure


A few weeks ago, I was looking into running a MySQL Galera Cluster for a customer with a large Linux IaaS deployment on Azure.

Why is that necessary? There’s ClearDB, a Microsoft partner that offers MySQL on Azure as SaaS (Software as a Service), so you can just go to their site and pick your size. Or, if you want to run it on your own, you can pick an Ubuntu Linux gallery image, type “apt-get install mysql-server”, and that’s it, right? Well, not so fast…

ClearDB is a great offering for most customers that need a MySQL backend, but in this case, even the largest ClearDB offer was not sufficient.

So the customer took the second path: he created an IaaS VM (actually several VMs, each running an independent database server for different purposes) and configured his services to use these databases via the servers’ internal IP addresses. But there’s one problem with this approach: occasionally, Azure needs to deploy patches to the host systems running these VMs. And occasionally, the Linux VMs themselves need patches that require a restart of the database server or a reboot of the machine. Whenever this happened, the customer’s site would be down for a few minutes.

To avoid this occasional downtime, I teamed up with Oli Sennhauser, CTO at FromDual, and my colleague Christian Geuer-Pollmann to set up a MySQL Galera Cluster on Azure.

Such a cluster consists of three MySQL VMs. Database connections can be handled by all three machines, so the database (read) load is distributed as well. As long as two machines are up, the database service remains available. Galera achieves this by replicating database write transactions to all nodes. More information can be found in the Galera Cluster documentation.

So, here’s the tl;dr version of what we did:

- Set up three Ubuntu 14.04 LTS IaaS VMs with fixed internal IP addresses
- Set up an Azure internal load balancer so that database clients have a single IP they connect to
- Installed mysql-server-wsrep-5.6.16-25.5 and galera-25.3.5 plus a few dependencies
- Configured galera on these three machines
- Added a bit of iptables magic, courtesy of FromDual, to the VMs to block access to the MySQL port while a database server is recovering. The internal load balancer then moves the clients to the other servers of the cluster in case one is down.
- And to keep this all neat and clean, we used PowerShell to automate the Azure setup part.

0. Prerequisites

The fixed internal IP and the internal load balancer use features that were only added to Azure virtual networks quite recently. If you configured an Azure virtual network a while ago, chances are these functions are not available in it. So just configure a new virtual network for this.

Currently, some of these features can only be configured via PowerShell. So you need a Windows machine to run PowerShell on; if you don’t have one handy, just create a small (A1) Windows Server VM in the Azure portal and use RDP to connect to it. Then install Azure PowerShell, see here.

And you should plan ahead a bit for your new virtual network. It needs enough IP addresses to host all your database clients, the three servers of the cluster, and the additional input IP address of the load balancer. In this case, we used the default address range but placed all the database servers in their own subnet.

1. Creating the machines and the internal load balancer

As mentioned before, we scripted all of this in PowerShell. And to keep the configuration separate from the actual commands, we set a bunch of variables in the header of our script that contain the actual settings. So when you see $servicename in the examples below, it is set in this header.

The load balancer is created by this PowerShell command:

Add-AzureInternalLoadBalancer -ServiceName $servicename -InternalLoadBalancerName $loadBalancerName -SubnetName $subnetname -StaticVNetIPAddress $loadBalancerIP

When running this command, we found that the cloud service must already contain a deployment. To ensure this, we first created a small placeholder IaaS VM, then created the load balancer and the database VMs, and finally removed the placeholder VM again.

To configure a VM to use the internal load balancer, we add an endpoint to the VM configuration:

Add-AzureEndpoint `
            -Name mysql `
            -LocalPort 3306 `
            -PublicPort 3306 `
            -InternalLoadBalancerName $loadBalancerName `
            -Protocol tcp `
            -ProbePort 3306 `
            -ProbeProtocol "tcp" `
            -ProbeIntervalInSeconds 5 `
            -ProbeTimeoutInSeconds 11 `
            -LBSetName mysql
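The probe settings above tell the ILB to try a TCP connect on port 3306 every 5 seconds and to take the node out of rotation when the connect doesn’t succeed within the timeout. Roughly the same check can be sketched in bash using its built-in /dev/tcp (the host and port here are just examples, not part of the Azure setup):

```shell
# Sketch of the ILB's TCP health probe: a plain TCP connect attempt.
# tcp_probe HOST PORT -> exit status 0 if the port accepts connections.
tcp_probe() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example: check the local MySQL port the way the ILB would.
if tcp_probe 127.0.0.1 3306; then
  echo "mysql port is up"
else
  echo "mysql port is down"   # the ILB would stop routing traffic here
fi
```

Unlike the real probe, this is a one-shot check; the ILB repeats it every -ProbeIntervalInSeconds seconds.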

Since we have multiple Linux VMs in the same cloud service, we need to remove the standard SSH endpoint and create an individual SSH endpoint for each machine:

Remove-AzureEndpoint `
            -Name SSH `
            | `
Add-AzureEndpoint `
            -Name SSH `
            -LocalPort 22 `
            -PublicPort $externalSshPortNumber `
            -Protocol tcp

And we want to use a static internal IP for each machine, since we need to specify these IP addresses in the galera configuration:

Set-AzureSubnet -SubnetNames $subnetname `
            | `
Set-AzureStaticVNetIP -IPAddress $machineIpAddress

We wrapped all this into a configuration function called Get-CustomVM. So here’s the complete script:

 1: #
 2: # Set up three VMs for a Galera Cluster
 3: #
 6: # Azure Cmdlet Reference
 7: #
 9: $subscriptionId     = "<your subscription ID here>"
 10: $imageLabel         = "Ubuntu Server 14.04 LTS"       # One from Get-AzureVMImage | select Label
 11: $datacenter         = "West Europe" # change this to your preferred data center, your VNET and storage account have to be set up there as well
 12: $adminuser          = "<your linux user name here>"
 13: $adminpass          = "<a linux password>"
 14: $instanceSize       = "ExtraSmall" # ExtraSmall,Small,Medium,Large,ExtraLarge,A5,A6,A7,A8,A9,Basic_A0,Basic_A1,Basic_A2,Basic_A3,Basic_A4
 15: $storageAccountName = "<the storage account name for the vm harddisk files>"
 16: $vnetname           = "<the name of your vnet>"
 17: $subnetname         = "<the name of the subnet for the database servers>"
 19: $loadBalancerName   = "galera-ilb" # should be changed if there are multiple galera clusters  
 20: $loadBalancerIP     = ""
 22: $servicename        = "<your service name>" # all machines will be created in this service
 23: $availabilityset    = "galera-as" # should be changed if there are multiple galera clusters  
 25: #
 26: # Calculate a bunch of properties
 27: #
 28: $subscriptionName = (Get-AzureSubscription | `
 29:     select SubscriptionName, SubscriptionId | `
 30:     Where-Object SubscriptionId -eq $subscriptionId | `
 31:     Select-Object SubscriptionName)[0].SubscriptionName
 33: Select-AzureSubscription -SubscriptionName $subscriptionName -Current
 35: $imageName = (Get-AzureVMImage | Where Label -eq $imageLabel | Sort-Object -Descending PublishedDate)[0].ImageName
 37: $storageAccountKey = (Get-AzureStorageKey -StorageAccountName $storageAccountName).Primary
 39: $storageContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey
 41: #
 42: # Fix the local subscription object
 43: #
 44: Set-AzureSubscription -SubscriptionName $subscriptionName -CurrentStorageAccount $storageAccountName 
 47: #
 48: # This function encapsulates the configuration generation of a single new Galera VM
 49: #
 50: Function Get-CustomVM
 51: {
 52:     Param (
 53:         [string]$customVmName, 
 54:         [string]$machineIpAddress, 
 55:         [int]$externalSshPortNumber,
 56:         [string] $storageAccountName = $storageContext.StorageAccountName
 57:         )
 59:     # 
 60:     # configure the VM object
 61:     #
 62:     $vm = New-AzureVMConfig `
 63:             -Name $customVmName `
 64:             -InstanceSize $instanceSize `
 65:             -ImageName $imageName `
 66:             -AvailabilitySetName $availabilityset `
 67:             -MediaLocation "https://$$customVmName-OSDisk.vhd" `
 68:             -HostCaching "ReadOnly" `
 69:             | `
 70:         Add-AzureProvisioningConfig `
 71:             -Linux `
 72:             -LinuxUser $adminuser `
 73:             -Password $adminpass `
 74:             | `
 75:         Set-AzureSubnet -SubnetNames $subnetname `
 76:             | `
 77:         Set-AzureStaticVNetIP -IPAddress $machineIpAddress `
 78:             | `
 79:         Remove-AzureEndpoint `
 80:             -Name SSH `
 81:             | `
 82:         Add-AzureEndpoint `
 83:             -Name SSH `
 84:             -LocalPort 22 `
 85:             -PublicPort $externalSshPortNumber `
 86:             -Protocol tcp `
 87:             | `
 88:         Add-AzureEndpoint `
 89:             -Name mysql `
 90:             -LocalPort 3306 `
 91:             -PublicPort 3306 `
 92:             -InternalLoadBalancerName $loadBalancerName `
 93:             -Protocol tcp `
 94:             -ProbePort 3306 `
 95:             -ProbeProtocol "tcp" `
 96:             -ProbeIntervalInSeconds 5 `
 97:             -ProbeTimeoutInSeconds 11 `
 98:             -LBSetName mysql
 100:     $vm
 101: }
 103: #
 104: # 0. Create cloud service before instantiating internal load balancer
 105: #
 106: if ((Get-AzureService | where ServiceName -eq $servicename) -eq $null) {
 107:     Write-Host "Create cloud service"
 108:     New-AzureService -ServiceName $servicename -Location $datacenter
 109: }
 111: #
 112: # 1. Create a dummyVM with an external endpoint so that the internal load balancer (which is in preview) is willing to be created
 113: #
 114: $dummyVM = New-AzureVMConfig -Name "placeholder" -InstanceSize ExtraSmall -ImageName $imageName `
 115:     -MediaLocation "https://$" -HostCaching "ReadWrite" `
 116:     | Add-AzureProvisioningConfig -Linux -LinuxUser $adminuser -Password $adminpass `
 117:     | Set-AzureSubnet -SubnetNames $subnetname `
 118:     | Set-AzureStaticVNetIP -IPAddress "" 
 120: New-AzureVM -ServiceName $servicename -VNetName $vnetname -VMs $dummyVM 
 122: #
 123: # 2. Create the internal load balancer (no endpoints yet)
 124: #
 125: Add-AzureInternalLoadBalancer -ServiceName $servicename -InternalLoadBalancerName $loadBalancerName -SubnetName $subnetname -StaticVNetIPAddress $loadBalancerIP
 126: if ((Get-AzureInternalLoadBalancer -ServiceName $servicename) -ne $null) {
 127:     Write-Host "Created load balancer"
 128: }
 130: #
 131: # 3. Create the cluster machines and hook them up to the ILB (omitting "-Location $datacenter -VNetName $vnetname" because the $dummyVM pinned these already)
 132: #
 133: $vm1 = Get-CustomVM -customVmName "galera-a" -machineIpAddress "" -externalSshPortNumber 40011
 134: $vm2 = Get-CustomVM -customVmName "galera-b" -machineIpAddress "" -externalSshPortNumber 40012
 135: $vm3 = Get-CustomVM -customVmName "galera-c" -machineIpAddress "" -externalSshPortNumber 40013
 136: New-AzureVM -ServiceName $servicename -VMs $vm1,$vm2,$vm3
 138: #
 139: # 4. Delete the dummyVM
 140: #
 141: Remove-AzureVM -ServiceName $servicename -Name $dummyVM.RoleName -DeleteVHD

Now the load balancer and the three VMs are created.

2. Install and configure Galera on the three VMs

We took the Galera .deb packages from the upstream download sites.

In these packages, we found a few minor glitches that collided with the Ubuntu 14.04 LTS we installed them on:

- mysql-server-wsrep-5.6.16-25.5-amd64.deb declares a dependency on mysql-client. Ubuntu considers this satisfied by its default mysql-client-5.5 package, but that creates a version conflict. So I downloaded the .deb and modified its dependency to point to mysql-client-5.6.
- The default my.cnf contains the path /var/log/mysql/error.log, which does not exist on Ubuntu. This created the strange situation that the server process would not start and only left two cryptic entries in syslog. Running strace on the server process showed the path it was trying to access, and once I created it, everything worked fine.
- The package was missing an upstart script for mysql; instead it had just a classic /etc/init.d shell script, which confused upstart. So I took the one from a standard mysql-server-5.6 package and everything worked out well.

The steps to set up Galera were:

$ apt-get install mysql-client-5.6
$ apt-get install libssl0.9.8
$ dpkg -i galera-25.3.5-amd64.deb
$ dpkg --force-depends -i mysql-server-wsrep-5.6.16-25.5-amd64.modified.deb
$ mkdir /var/log/mysql
$ chown mysql /var/log/mysql

and put the standard upstart script from mysql-server-5.6 into the upstart config directory.
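The dependency modification mentioned above can be done by unpacking the .deb, editing its control file, and repacking it. A sketch of the edit itself, using a fabricated minimal control file (the real package’s metadata contains more fields):

```shell
# In the real case, unpack the package first:
#   dpkg-deb -R mysql-server-wsrep-5.6.16-25.5-amd64.deb pkg
# Here we fabricate a minimal control file just to show the edit.
mkdir -p pkg/DEBIAN
cat > pkg/DEBIAN/control <<'EOF'
Package: mysql-server-wsrep
Version: 5.6.16-25.5
Architecture: amd64
Depends: mysql-client
Description: MySQL server with wsrep patch
EOF

# Point the dependency at the 5.6 client instead of Ubuntu's default 5.5
sed -i 's/^Depends: mysql-client$/Depends: mysql-client-5.6/' pkg/DEBIAN/control
grep '^Depends:' pkg/DEBIAN/control

# Repack (requires dpkg-deb):
#   dpkg-deb -b pkg mysql-server-wsrep-5.6.16-25.5-amd64.modified.deb
```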

The next part was to configure the galera cluster function. As you can see in the script above, we have created three machines with fixed internal IP addresses. For the cluster, we need to set a few things in the default my.cnf:

wsrep_cluster_name="<your cluster name here>"

These settings are the same on all three machines. On each machine, we can also set a human-readable node name, e.g.

wsrep_node_name='Node A'

In the next step, we configured the actual clustering, i.e., we told each machine where to find its replication partners. (The addresses below are placeholders for the machines’ actual internal IP addresses.)

On the first machine, we set the following line in my.cnf:

wsrep_cluster_address="gcomm://"

This allows this database node to come up even if there is no replication partner.

Then we started the server on the first machine.

Then we set the following line in my.cnf on the second machine:

wsrep_cluster_address="gcomm://<ip-of-node-a>"

and started the server on the second machine.

Then we set the following line in my.cnf on the third machine:

wsrep_cluster_address="gcomm://<ip-of-node-a>,<ip-of-node-b>"

and started the server on the third machine.

Now we went back to the first machine and changed the line to:

wsrep_cluster_address="gcomm://<ip-of-node-b>,<ip-of-node-c>"

and restarted the server. Now the galera cluster was configured.
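Putting the pieces together, the wsrep-related part of my.cnf on one node ends up looking roughly like this. This is a sketch, not the customer’s actual file: the provider path, node name, and IP addresses are placeholders, and only the cluster name, cluster address, and node name correspond to the settings discussed above:

```ini
# my.cnf (excerpt) -- Galera/wsrep settings for one node (values are placeholders)
[mysqld]
binlog_format            = ROW    # Galera requires row-based replication
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2      # required for Galera's parallel applying
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name       = "<your cluster name here>"
wsrep_cluster_address    = "gcomm://<ip-of-node-b>,<ip-of-node-c>"
wsrep_node_name          = 'Node A'
```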

Instead of changing the configuration of the initial node twice, one can also start the server process directly and pass the setting on the command line, e.g. mysqld_safe wsrep_cluster_address="gcomm://". This is a good workaround if, for whatever reason, the cluster was fully shut down and needs to be brought up manually again.

Since the internal load balancer was already configured, the clients can now use the ILB’s IP address to connect to the cluster. With each new TCP connection, the load balancer picks one of the running nodes and connects the client to it.
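Clients should therefore be prepared to reconnect: if the node they are talking to goes away, simply opening a new connection to the ILB address lands them on a healthy node. A minimal client-side sketch (the ILB address and credentials are placeholders):

```shell
# retry CMD... -- run a command up to 3 times, pausing between attempts,
# so a failed connection is retried against whatever node the ILB picks next.
retry() {
  local n
  for n in 1 2 3; do
    "$@" && return 0
    sleep 2
  done
  return 1
}

# Example (placeholder ILB IP and credentials):
#   retry mysql -h 10.11.12.13 -u appuser -p"$APP_PASS" -e 'SELECT 1'
```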

There is one additional situation that may confuse clients. Imagine that one of the nodes has just failed and is starting up again. In this state, the database server can already be reached, but it has not yet replicated the data from the other nodes, so although clients can connect, all database commands will fail. If clients aren’t prepared to handle this, it shows up as database errors in applications. But there’s a solution: FromDual has implemented a small shell script that uses the Linux iptables firewall to deny access to the server while it is in this state. The load balancer then finds that it cannot reach the TCP port and routes requests to another running cluster node.

To run the script whenever a replication state change occurs, another line is added to my.cnf:

wsrep_notify_cmd = /usr/local/bin/

The script and the instructions for setting it up can be found on FromDual’s website. Don’t be alarmed by the fact that it talks about hardware load balancers; it works the same with the (software-based) Azure internal load balancer.
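FromDual’s actual script does more (and you should use theirs), but the core idea can be sketched as follows. The function names and the DROP rule are illustrative, and the iptables commands are only echoed here instead of executed:

```shell
#!/bin/bash
# Sketch of a wsrep_notify handler: mysqld invokes it with arguments such as
# "--status synced" on every replication state change.
MYSQL_PORT=3306

block_port()   { echo iptables -I INPUT -p tcp --dport "$MYSQL_PORT" -j DROP; }
unblock_port() { echo iptables -D INPUT -p tcp --dport "$MYSQL_PORT" -j DROP; }

handle_status() {
  case "$1" in
    synced) unblock_port ;;  # node has all data again: let the ILB probe succeed
    *)      block_port ;;    # joining/donor/...: fail the probe, clients go elsewhere
  esac
}

# Parse the --status argument mysqld passes in
while [ $# -gt 0 ]; do
  case "$1" in
    --status) handle_status "$2"; shift ;;
  esac
  shift
done
```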

Hope this helps,


Comments (11)

  1. Rami Sarieddine says:

    Thank you for sharing! this documentation is awesome!

  2. Sakari Keskitalo says:

    Thank you for another nice Galera story how it can be implemented and used!

  3. Oscar Valderrama says:

    Thank you,  I have a question

    68:   -HostCaching "Read" ` is "Readonly" or "ReadWrite"

  4. Dominic Taylor says:

    Here is a pastebin with all line numbers removed;

  5. Holger Kenn says:

    Hi! I just fixed a typo in the script, as Oscar pointed out, the caching option should not be "Read" but "ReadOnly". Dominic: Could you also update that in the pastebin?



  6. Dominic Taylor says:

    Updated pastebin with the ammends and 3 nodes;

    Also script with 5 nodes for more resilience;

  7. Dominic Taylor says:

    If you setup the cluster originally with 5 nodes then using the autoscaling feature built into azure you can set 3 nodes minimum running all the time, to 5 nodes max autoscale with demand. Then if the cluster gets higher demand then there is easy scope for auto scaling from the onset.

  8. Holger Kenn says:


    I've decided to put my scripts on github, that way it is easier for me to update them and they're all in one place.…/azurescripts


  9. Zil says:


    Good article, thanks.

    One our client have MySQL  Galera and we need to migrate it to Azure.

    Question is: is it possible to optimize resource usage using cloud service autoscaling in such a way:

    – create cloud service with autoscaling,

    – add all cluster nodes to cloud service.

    – let cloud service manage cluster nodes (shutdown / start)?

    1. Holger Kenn says:


      You can do that but I would recommend against this. The scaling of a database cluster is a complex operation and adding an additional node is essentially quite a burden to the remaining nodes since the database needs to be replicated to the new node. So while “scaling up” you actually reduce the database performance of your application until the new node is online. In addition, there is no simple performance value that the autoscaler could base its decision on. For instance, setting a CPU utilization threshold will probably not be enough because the typical database node is usually I/O bound before the CPU even feels a slight load. Essentially, the scaling of a database should be a conscious decision that I wouldn’t leave to the autoscaler.

      However, you can prepare a couple of additional database nodes beforehand (like you would do with the autoscaler) and then start them manually or on a fixed schedule if you know that the load will be high. For instance, you could run three nodes over night but many more nodes during the day for a business website. You could run many nodes before christmas but few nodes for the rest of the year for an e-commerce website. And you could start a few extra nodes in case you are a German soccer club and have an important game coming up. You can use the built-in scheduling function of the autoscaler to orchestrate this.

      One caveat though: Although horizontal scaling is great for read performance, write performance does not profit that much since in a cluster like Galera, every node gets all the writes replicated and has to write everything down into its own storage.


    2. lpanzano says:

      Hi Zil,

      We recently announced the availability of MariaDB Enterprise Cluster with Galera technology on the Azure Marketplace:

      This solution allows you to automatically deploy a full cluster in Azure with just a couple of clicks.
      MariaDB was founded by the creators of MySQL so you shouldn’t expect any major compatibility issues. Let me know if this is helpful for your scenario.
      Link to the Azure Marketplace:
