Geo-Load Balancing with the Azure Traffic Manager

One of the great new features of the Windows Azure platform is the Azure Traffic Manager, a geo load balancer and durability solution for your cloud solutions.  For any large website, managing traffic globally is critical to the architecture for both disaster recovery and load balancing.

When you deploy a typical web role in Azure, each instance is automatically load balanced at the datacenter level.   The Azure Fabric Controller manages upgrades and maintenance of those instances to ensure uptime.  But what about if you want to have a web solution closer to where your users are?  Or automatically direct traffic to a location in the event of an outage?   

This is where the Azure Traffic Manager comes in, and I have to say, it is so easy to set up – it boggles my mind that in today’s day and age, individuals can prop up large, redundant, durable, distributed applications in seconds that would rival the infrastructure of the largest websites. 

From within the Azure portal, the first step is to click the Virtual Network menu item.

image

On the Virtual Network page, we can set up a number of things, including the Traffic Manager.   Essentially the goal of the first step is to define what Azure deployments we’d like add to our policy, what type of load balancing we’ll use, and finally a DNS entry that we’ll use as a CNAME:

image

We can route traffic for performance (best response time based on where user is located), failover (traffic sent to primary and only to secondary/tertiary if primary is offline), and round robin (traffic is equally distributed).   In all cases, the traffic manager monitors endpoints and will not send traffic to endpoints that are offline.

I had someone ask me why you’d use round robin over routing based on performance – there’s one big case where that may be desirable:  if your users are very geography centric (or inclined to hit your site at a specific time) you’d likely see patterns here one deployment gets maxed out, while another does not.   To ease the traffic spikes to one deployment, round robin would be the way to go.  Of course, an even better solution is to combine traffic shaping based on performance with Azure scaling to meet demand.

In the above image, let’s say I want to create a failover for the Rock Paper Azure botlab (a fairly silly example, but it works).   I first added my main botlab (deployed to South Central) to the DNS names, and then added my instance deployed to North Central:

image 

From the bottom of the larger image above, you can see I’m picking a DNS name of botlab.ctp.trafficmgr.com as the public URL.  What I’d typically do at this point is go in to my DNS records, and add a CNAME, such as “www.rockpaperazure.com” –> “rps.ctp.trafficmgr.com”.

In my case, I want this to be a failover policy, so users only get sent to my North Central datacenter in the event the south central instance is offline.  To simulate that, I took my south central instance offline, and from the Traffic Manager policy report, you’d see something like this:

image

To test, we’ll fetch the main page in IE:

image

… and we’re served from North Central.  Of course, the user doesn’t know (short of a traceroute) where they are going, and that’s the general idea.  There’s nothing stopping you from deploying completely different instances except of course for the potential end-user confusion!

But what about database synchronization?   That’s a topic for another post …