Design for Failure, Growth and Distribution!

When designing any piece of software, there are always design tenets that you follow, based on the type of software being developed.

For service software that is designed for a global, business-critical audience, there are three key tenets in my opinion. In fact, they’re not so much tenets as they are assumptions.

So when you start designing your service(s), you need to design with the assumption that your service is going to do the following, and do it on a regular basis:

  • Fail
    • This does not only mean your service will fail during runtime execution, but also during upgrades, deployment of new functionality, bug fixes, and configuration changes to the environment.
    • Your operating environment will fail. Power will go out, network appliances will die, hard disks will crash, and CPU and memory will fault. Assume it will all happen, and probably all at once.
    • In a standard data center deployment, you need to work out what your minimum unit of failure is. Is it a rack? Is it a switch? The data center? Knowing this helps you work out how to distribute your service to counter failures.
  • Grow
    • Your application is going to need to serve lots and lots of requests, so think about the scale of your application. Think not only about scaling out (the ability to serve lots of incoming requests simultaneously across more machines) but also about scaling up (making sure your application can actually use extra capacity when a machine is given more of it).
    • Bandwidth and ingress/egress are also a concern. Pipes in and out of your service are constrained by physical limitations, so you’re going to have to think about how your application will deal with heavy load on your network during peak and off-peak periods.
    • At some point, your service is going to outgrow the place it’s located: there just won’t be any more machines to deploy your service to, and there won’t be any more space to deploy more machines. So design your service with the goal that you can deploy it across many servers in many data centers.
  • Distribute
    • Your customers will invariably want your service close to them, and they will almost never be located close to your primary deployment data center, so design your service so that it can serve your customers from the point that is closest to them. This reduces latency and improves service quality and experience.
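
To make the "minimum unit of failure" idea a bit more concrete, here is a minimal sketch in Python. It assumes the unit of failure is a rack and simply spreads replicas round-robin across racks, then checks whether losing any single rack still leaves enough replicas running. The function names and rack labels are made up for illustration; a real scheduler would also weigh capacity, network topology and so on.

```python
from collections import Counter
from itertools import cycle


def place_replicas(replica_count, failure_domains):
    """Assign replicas round-robin across failure domains (e.g. racks)."""
    if not failure_domains:
        raise ValueError("at least one failure domain is required")
    domains = cycle(failure_domains)
    return [next(domains) for _ in range(replica_count)]


def survives_single_domain_loss(placement, min_alive):
    """Return True if losing any one domain still leaves min_alive replicas."""
    per_domain = Counter(placement)
    # The worst case is losing the domain that hosts the most replicas.
    worst_loss = max(per_domain.values(), default=0)
    return len(placement) - worst_loss >= min_alive


# Hypothetical rack names, used only for this example.
placement = place_replicas(5, ["rack-a", "rack-b", "rack-c"])
print(placement)
print(survives_single_domain_loss(placement, min_alive=3))
```

The same check works if the unit of failure turns out to be a switch or a whole data center: only the list of domains changes.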

Also, designing a service involves thinking through the whole application lifecycle. Traditionally, we think about issues and bugs at the runtime level, but services are always-on, living, breathing systems, so you have to design them so that they can be changed (upgraded, configured differently, etc.) at any time without interrupting their availability (the ability for customers to connect to and use your service) or their consistency (the state maintained by the system must always be consistent and correct).
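
One common way to change a running service without taking it offline is a rolling upgrade: upgrade a few instances at a time while the rest keep serving. The sketch below is only an illustration of that idea, not a description of any particular deployment tool. The function name and instance labels are hypothetical, and it only addresses availability; keeping state consistent during the upgrade is a separate problem.

```python
def rolling_upgrade_batches(instances, min_available):
    """Split instances into upgrade batches that keep min_available serving."""
    # Only this many instances can be out of rotation at the same time.
    batch_size = len(instances) - min_available
    if batch_size < 1:
        raise ValueError("not enough instances to upgrade without downtime")
    return [
        instances[start:start + batch_size]
        for start in range(0, len(instances), batch_size)
    ]


# Hypothetical instance names, used only for this example.
for batch in rolling_upgrade_batches(["a", "b", "c", "d", "e"], min_available=3):
    print("upgrading", batch)
```

If every instance must stay up (min_available equals the instance count), the function refuses to produce a plan, which is a hint that the service needs more capacity before it can be upgraded safely.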

Now, this is by no means the Magna Carta of service design, but as you start thinking about designing your always available, highly scalable, geo-distributable service, the tenets above should be good food for thought.

Enjoy :)
