Stateful Services: A New Paradigm


This post was authored by Mani Ramaswamy, a Program Manager in the Azure Service Fabric Team.

When building apps, particularly cloud services, the key characteristics you have to focus on are scale, reliability, and data consistency. Today, reliability is solely the province of data store products such as Azure Storage and Azure DB, and you generally expect these stores to be consistent. This has pushed a design approach in which stateless middle-tier services keep their state entirely external, and each tier is scaled separately.

However, because of this separation of data from the code that implements the logic, you find that you have to partition the storage to lower latency and improve throughput. Furthermore, at scale, messages often get lost between service tiers, so queues that enable reliable communication get thrown into the mix. The problem is not limited to communication between tiers: communication between machines within a tier is not reliable either, so you need queues within a tier as well. And because the storage is external, you end up needing caches so that you can scale reads.

Finally, depending on the data store chosen, you may also have to ensure state consistency yourself, which means writing code to manage transactions and leader election. Thus, the final architecture with code/data separation looks like the figure below.

 

Getting all of the above right at scale is a hard task, but it is necessitated by the separation of code and data. Now consider an alternative approach in which this separation is removed and you bring the reliability of the storage tier into the middle tier (note the DB symbol next to the compute symbol in the figure below, showing that code and data are co-located). This is the essence of the Stateful Services paradigm that Service Fabric offers.

Now you get the reliability advantages of having code and data together, and you no longer need as many queues. Furthermore, you don’t need caches, since data is co-located with code; this improves both read and write latency.

Service Fabric has first-class partitioning support built into the system, enabling massive scale (as you saw from earlier posts, Azure SQL DBs and Bing Cortana are built on top of Service Fabric). The platform also provides built-in transaction semantics, making it easier for developers to build consistent systems such as banking and financial applications. An architecture using Service Fabric Stateful Services looks like the following figure.
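To make the two ideas above (partitioning and transactions over co-located state) a little more concrete, here is a minimal sketch in Python. It is not the Service Fabric API: the names `partition_for`, `Partition`, and `AccountService` are hypothetical, the hash-based key routing is an assumption chosen for illustration, and replication of state across machines is left out entirely. It only shows the shape of the idea: each partition holds its slice of the data next to the code that uses it, and a group of writes either all take effect or none do.

```python
import hashlib
from contextlib import contextmanager


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index with a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


class Partition:
    """One partition: its slice of the state lives in the same process as the code."""

    def __init__(self):
        self.state = {}

    @contextmanager
    def transaction(self):
        # Stage writes on a copy of the state. The copy is published only if
        # the `with` block finishes without raising; otherwise it is discarded.
        staged = dict(self.state)
        yield staged
        self.state = staged


class AccountService:
    """Hypothetical stateful service; all accounts of a customer share a partition."""

    def __init__(self, partition_count=4):
        self.partitions = [Partition() for _ in range(partition_count)]

    def _partition(self, customer):
        return self.partitions[partition_for(customer, len(self.partitions))]

    def balance(self, customer, account):
        # A read is a local dictionary lookup: no external store, no cache.
        return self._partition(customer).state.get((customer, account), 0)

    def deposit(self, customer, account, amount):
        with self._partition(customer).transaction() as tx:
            tx[(customer, account)] = tx.get((customer, account), 0) + amount

    def transfer(self, customer, src, dst, amount):
        with self._partition(customer).transaction() as tx:
            tx[(customer, src)] = tx.get((customer, src), 0) - amount
            tx[(customer, dst)] = tx.get((customer, dst), 0) + amount
            if tx[(customer, src)] < 0:
                # Raising here abandons both staged writes.
                raise ValueError("insufficient funds")
```

Calling `deposit("alice", "checking", 100)` and then `transfer("alice", "checking", "savings", 30)` on an `AccountService` moves the money within a single partition, while a transfer that would overdraw the source account raises `ValueError` and leaves both balances untouched. A real platform would also have to replicate each partition and handle failover, which this sketch does not attempt.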

 

You may still have a three-tier approach, since some services are fine as stateless services and you may choose to offload data into colder storage for offline processing or backups. But the ability to address scale, reliability, and data consistency at the app level greatly simplifies the design. In summary, many cloud applications could be made a lot simpler, while still getting reliability and performance at scale, by using the Service Fabric Stateful Services approach.

 

Comments (2)

  1. OmariO says:

    It is quite big that MS offer such platform for devs. I was missing it and was going to build fault tolerance, high availability and low latency.

    But you guys are almost invisible. Get to Channel 9: make one deep dive intro to the tech and many short how-tos. You will be popular!

  2. Richard Reukema says:

    Would this not make the Service Fabric a huge object oriented  database – where the object encapsulates the data as well as the functions that maintain the data? If so, would not the reporting of the data also get impacted – where the entire customer object would have to be retrieved to obtain the name?  I would like to see examples where stateful services are used, but also how this state can be extracted and used within a relational database.
