Queue Scalability

I'm building an always-on service that gets its messages from a front-end queue. How do I design this service to be scalable?

There are two directions to go in when talking about scalability. There's scaling up, handling more messages while continuing to use a single machine, and then there's scaling out, handling more messages by using multiple machines. We'll talk about the two individually even though you may be using a combination of techniques to improve scalability.

Scaling up is the simpler of the two although it isn't as interesting to talk about here. The WCF model works very naturally and automatically with scaling up. No matter how powerful the machine gets, it rarely makes sense to replicate an always-on service on a single machine. Rather than duplicating processes, single machine scalability in WCF is achieved by adding more threads to a single process. In other words, you'll get automatic scaling up until the point that you start hitting service model quota limits. This won't take long as the default service model limits are designed for a single processor machine with 10 concurrent clients. However, you can easily in the binding start increasing the number of threads pumping messages and the number of threads processing calls until you hit the hardware limitations of the machine. This is a balancing act to reach a point where resources are never idle, but they're also not overcommitted and facing contention.

Scaling out involves replicating the server process to multiple machines. Typically the machines are tied together with network load balancing so that the distribution of load on the farm is controlled by the administrator rather than the user. Each of the machines needs their quotas tuned as above, although homogeneous hardware will let you get away with using a homogeneous binding on all of the machines. An asymmetric farm, with machines of varying capabilities, is really hard to tune.

Queues come in to the picture because they generally favor scaling up rather than scaling out. Having a local queue has a number of advantages, including better transaction support unless you happen to be running a Vista or later operating system. However, it is generally much cheaper to scale out than scale up.

A quick way to judge which way you need to scale next with your queue is to look at the bottlenecked resource. If you have exhausted the amount of network bandwidth you have moving messages out of the queue, then you should be thinking about moving the queues close to the service and scaling up. If you have exhausted the amount of computational power, either processing or disk, then you should be thinking about replicating the service to scale out and read from a remote queue. If clients are having trouble but you've maxed out neither your network nor computational power, then you probably need to go back and fix your quotas.

Next time: Amplified Flooding Attacks