Over the past two decades I have worked with a wide range of teams, in a variety of industries, building highly available, large-scale systems, and the experience has been relatively consistent across them. Teams that work from strong academic concepts understand the nature of a highly distributed, highly available system but sometimes struggle when the practical world gets in the way. At the other end of the spectrum, teams that focus almost entirely on the practical aspects of the solution often lack a grounding in distributed computing, and they simplify concepts and reduce complexity in ways that later prove to be bad decisions.
I started my journey from the practical side, learning or brushing up on the relevant academic theories and concepts as needed. In my opinion, it is a healthy balance of these two almost conflicting viewpoints that makes for the best architectures.
That’s it, right? On paper we might think so, but then we get into the real world (it’s funny how that causes issues) and business drivers must be brought into it: cost of goods, supportability, the HR aspects of hiring, compliance and much more. Let’s face it, if we build the best system in the world but it cannot achieve the business objectives, have we succeeded?
This is the next element that needs to come into play: the business, and the constraints it puts on our ability to choose a direction, innovate and, in general, operate highly available systems at scale. If you think about the CAP theorem and the many other “choose any two” triangles, this feels an awful lot like that at times. Perhaps there is a need for a BAP (Business-Academic-Practical) triangle, because I have yet to work on a single project where all three were perfectly aligned, although there may be some out there.
This leads to the purpose of this post. I have decided to begin a series of blog posts, complete with sample code, open source repositories or other relevant artifacts, to simplify people’s building of these solutions. The goal is to build out several of the commodity elements associated with a typical highly available and scalable solution so that, as a community, we can reduce our time to delivery in the real world. To understand what is needed, we will focus on building an IoT solution using the Azure platform. Some of the key elements of the solution will include:
1. Security-first mindset
2. Occasionally connected devices that use Service Assisted Communication
3. Heterogeneous device and protocol environment
4. Command and Telemetry channels
5. Critical messaging
6. Alerting, alarming and incidents
7. I/O mapping that is done as early in the service as possible
8. Lambda architecture with hot and cold paths
9. Analytics and authoritative store
10. Population isolation
11. Scale units
12. Audit and logging
13. Operational data collection and dashboard
14. Disaster Recovery
It would be easy to do something like this if we never cared about scale beyond a handful of devices, but just to keep us honest we will target 100K messages/second fed from 100K devices, with payloads varying from 10 B to 2 KB.
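To get a feel for what that target implies, here is a rough back-of-the-envelope sketch in Python. The `LoadTarget` class and its method names are hypothetical, made up purely for illustration, and the numbers assume that 2 KB means 2,048 bytes and that only payload bytes count (no protocol headers, retries or acknowledgements).

```python
from dataclasses import dataclass
from typing import Tuple

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class LoadTarget:
    """Hypothetical helper describing the scale target stated above."""
    devices: int
    messages_per_second: int
    min_payload_bytes: int
    max_payload_bytes: int

    def per_device_rate(self) -> float:
        # Average messages per second that each device must send.
        return self.messages_per_second / self.devices

    def throughput_bytes_per_second(self) -> Tuple[int, int]:
        # Payload-only ingest bandwidth if every message is the
        # smallest size, and if every message is the largest size.
        return (
            self.messages_per_second * self.min_payload_bytes,
            self.messages_per_second * self.max_payload_bytes,
        )

    def messages_per_day(self) -> int:
        # Total messages ingested over 24 hours at the sustained rate.
        return self.messages_per_second * SECONDS_PER_DAY


# Assumption: "2 KB" is read as 2 * 1024 bytes.
TARGET = LoadTarget(
    devices=100_000,
    messages_per_second=100_000,
    min_payload_bytes=10,
    max_payload_bytes=2 * 1024,
)

if __name__ == "__main__":
    low, high = TARGET.throughput_bytes_per_second()
    print(f"per-device rate : {TARGET.per_device_rate():.1f} msg/s")
    print(f"ingest bandwidth: {low / 1e6:.1f} to {high / 1e6:.1f} MB/s")
    print(f"messages per day: {TARGET.messages_per_day():,}")
```

Under those assumptions each device averages one message per second, the payload alone ranges from about 1 MB/s to roughly 205 MB/s, and the service has to absorb 8.64 billion messages a day. Real traffic will sit somewhere in between and will carry protocol overhead on top.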
This should be a fun journey!