Using a public cloud such as Microsoft Azure comes with a general expectation of infinite capacity and scalability. While we all know there are always physical limits, the massive scale, ease of management, and self-service nature of cloud environments give the impression of a seemingly infinite pool of computing resources. In reality, all cloud resources have finite capacity, so when building cloud apps we should carefully design for scalability from the very beginning.
In a cloud environment it's all about scale-out, so we should design for it and be prepared to manage scale-out efficiently. Even if you start small, you should be able to easily scale out or scale back depending on demand. But beyond dynamic scaling, when the scale gets really big and you need, for example, thousands of cores, thousands of databases, or tens of petabytes of storage, managing the capacity becomes challenging.
If designed well, you should be able to operate easily at big scale too, without having to worry about running out of capacity or about overprovisioning.
What to Consider
Without defining how big "big" really is, let's discuss some key aspects to consider when designing for big scale.
First of all, hopefully your application is well designed for the cloud in general. There is a lot of guidance available for that (e.g. Azure prescriptive guidance, cloud design patterns, etc.), so I won't cover general cloud architecture aspects here.
Assuming the application is well designed for the cloud and highly scalable, let's look into capacity planning. As with on-premises environments, you should quantify the resource needs of your workload. Sometimes you don't know upfront how many end users or how much traffic you'll have to manage. But for scale-out scenarios, what you really need is to estimate or measure the resource needs of a "scale unit". The scale unit is defined individually for each app and will typically include a set of underlying cloud resources (e.g. a number of compute nodes, storage, and other services). The provisioning of a scale unit should be fully automated (you can use it for initial provisioning, scaling operations, or even for disaster recovery).

In addition, you would need to verify whether your application needs or depends on other centralized resources (outside of a scale unit), such as a front-end gateway or logging components, that might become a point of contention. If so, you'd need to make sure those can serve the required workload (and they might need to scale dynamically too, using their own scale units).
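To make the scale-unit idea concrete, here is a minimal sketch in Python. The resource figures and the `ScaleUnit` name are illustrative assumptions, not Azure values; the point is that once you have measured what one unit can serve, sizing for a given load becomes simple arithmetic:

```python
from dataclasses import dataclass
from math import ceil

@dataclass
class ScaleUnit:
    """Hypothetical scale unit: its resource footprint and measured capacity."""
    compute_cores: int        # cores consumed by one unit
    storage_accounts: int     # storage accounts used by one unit
    databases: int            # databases used by one unit
    users_served: int         # load one unit can serve (measured or estimated)

    def units_for(self, expected_users: int) -> int:
        """Number of scale units needed for the expected user load."""
        return ceil(expected_users / self.users_served)

# Example: a unit of 16 cores serving 5,000 users; 42,000 users -> 9 units.
unit = ScaleUnit(compute_cores=16, storage_accounts=1, databases=2, users_served=5000)
print(unit.units_for(42_000))  # 9
```

The same unit definition can then drive the automated provisioning script, so initial deployment, scale-out, and disaster recovery all use one code path.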
Once the capacity planning (incl. growth estimations) is complete, we might assume that all we need is to automate and manage the scaling operations (static or dynamic, proactive or reactive, based on estimates, monitoring data/events, etc.). And this is kind of true. But wait: what about the resource limits of the underlying cloud platform?
Yes, there are limits in the cloud world too. And those limits will impact your architecture. Some limits are driven by technical constraints, some are in place for administrative and manageability reasons. You can find the service limits for Azure documented in this article.
Note: The article describes the general quotas and limits for the main Azure services. Similar to compute nodes, all other resources typically come with some size and capacity (which often varies between SKUs). Depending on which resources your application uses, you'd need to check the specific capacity and quotas. Typically this information is provided in the product documentation or SKU pricing details for the service (a few examples for some common services: BizTalk Services: Editions Chart, Service Bus usage quotas, Azure Queues and Service Bus Queues - Compared and Contrasted, Azure Websites details).
The limits apply to the underlying building blocks of the platform that your application will use. So when you define your application scale units and scaling strategy, you'd need to consider the limits of the underlying resources. You'll see some resources are grouped into what I call "logical containers", e.g. storage account, database server, cloud service, subscription. Those containers have capacity and performance targets too. If you need more than the specified limits, you'll need to use additional containers, e.g. to scale across multiple storage accounts, database servers, or subscriptions. When you automate the provisioning and scale operations, you should make sure you can do this across logical containers. Ideally you should parameterize the containers and be flexible about where you deploy.
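A simple sketch of this "spill over into the next logical container" idea, assuming a hypothetical per-container limit (the container names and the limit value are illustrative, not Azure constants):

```python
# Assign scale units to logical containers (e.g. storage accounts or
# subscriptions), starting a new container once the per-container limit
# is reached. Names and limits here are illustrative assumptions.
def assign_to_containers(unit_ids, units_per_container):
    placement = {}
    for i, unit in enumerate(unit_ids):
        container = f"container{i // units_per_container:02d}"
        placement.setdefault(container, []).append(unit)
    return placement

# 7 units with at most 3 per container -> 3 containers.
layout = assign_to_containers(list(range(7)), units_per_container=3)
print(layout)
# {'container00': [0, 1, 2], 'container01': [3, 4, 5], 'container02': [6]}
```

In a real deployment script the container name would be a parameter fed into the provisioning templates, so the same automation works for the first storage account and the fiftieth.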
In addition to the documented limits here is a list of some other topics to consider:
Currently, no network bandwidth limits are listed for compute nodes/VMs, but as you can expect the bandwidth is finite too. For initial planning purposes you can assume the allocated bandwidth mentioned in this article. However, you should test the application workload and make sure you are not running into a network bottleneck.
Note: This is subject to change; for latest Azure service limits, see Azure Subscription and Service Limits, Quotas, and Constraints.
- You can have up to 2048 VMs per Virtual Network (VNET). However, if you need more VMs with direct connectivity you can use the VNET-to-VNET feature (see this article for more details).
- Each VNET supports a single virtual network gateway and for planning purposes you can assume a theoretical bandwidth limit of 100 Mbps. For higher throughput needs you can consider using ExpressRoute. A good overview and comparison with VNET VPN can be found here.
- For manageability reasons and a better experience during deployment, upgrades, or scale-out operations, I would recommend staying under 100 instances per cloud service role. If you need more than that, you might consider using multiple cloud service roles with identical application code deployed. Note that this results in additional endpoints, so you'll need a mechanism to distribute the traffic across them.
- Multiple smaller deployments are preferable to one very large deployment. Although there is no specific limit here, based on experience I would recommend a single deployment size of up to 100 cores. For example, if you need 1,000 cores, deploying 10x 100 cores is a better option than a single deployment with 1,000 cores. You can certainly use bigger deployments too, but keeping them small improves manageability.
- Because VM VHDs are stored in a storage account, there is a recommended number of VM disks per storage account in order to avoid the 20,000-requests limit and potential throttling: for Basic tier VMs, up to 66 heavily used VHDs per storage account (20,000 request rate limit / 300 8 KB IOPS per persistent disk); for Standard tier VMs, up to 40 heavily used VHDs per storage account (20,000 request rate limit / 500 8 KB IOPS per persistent disk).
Typically, multiple limits will impact your calculations and deployment architecture. Let's work through a real-life example for the number of VMs that can be deployed in a single subscription (before it becomes necessary to scale across subscriptions):

- 50 VMs per cloud service * 200 cloud services per subscription = 10,000 VMs per subscription
  → 10,000 VMs/subscription
- A limit of 10,000 cores per subscription applies, so it won't be possible to have 10,000 VMs with more than one core each. In this example 2-core VMs were used, which reduced the maximum to 5,000 VMs per subscription.
  → 5,000 VMs/subscription
- Standard tier VMs with one persistent disk were used, which gives us:
  50 storage accounts per subscription * 40 VMs per storage account = 2,000 VMs per subscription
  → 2,000 VMs/subscription
- After performing tests, the optimal number of VM VHDs per storage account for the given application workload was identified as 30:
  50 storage accounts per subscription * 30 VMs per storage account = 1,500 VMs per subscription
  → 1,500 VMs/subscription
- Keeping aside 5 storage accounts for other purposes resulted in the final number:
  45 storage accounts * 30 VMs per storage account = 1,350 VMs per subscription
  → 1,350 VMs/subscription
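The calculation above boils down to taking the minimum over all applicable limits. A short script makes the binding constraint explicit (the numbers are the ones used in this example and are subject to change):

```python
# Worked example from the post: the VMs-per-subscription ceiling is the
# minimum over all applicable limits. Figures match the example above.
vms_per_cloud_service   = 50
cloud_services_per_sub  = 200
cores_per_sub           = 10_000
cores_per_vm            = 2
storage_accounts_total  = 50
storage_accounts_kept   = 5    # reserved for other purposes
vms_per_storage_account = 30   # measured optimum for this workload

limits = {
    "cloud services": vms_per_cloud_service * cloud_services_per_sub,  # 10,000
    "core quota":     cores_per_sub // cores_per_vm,                   #  5,000
    "storage":        (storage_accounts_total - storage_accounts_kept)
                      * vms_per_storage_account,                       #  1,350
}
binding = min(limits, key=limits.get)
print(binding, limits[binding])  # storage 1350
```

Whenever one of the inputs changes (e.g. a quota increase granted by support), rerunning the calculation immediately shows which limit becomes binding next.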
As you can see, different aspects of and dependencies between services and limits impact the calculation. For this reason, I recommend a multi-phase approach. First, define the scale units from the application (logical) perspective and put the deployment architecture in place based on that. Second, go item by item through the documented limits and verify that the defined architecture stays within them. In many cases you'll find that some of those limits impact the architecture, so you'll need to modify it accordingly. Once adapted, work through the list again and re-verify the limits.
After you have finalized your deployment architecture, often you'll realize that you'll need to increase some of the default limits for your subscription(s). You will need to contact Azure customer support to request quota increases.
Once deployed, it is important to monitor and compare the actual workload with the initial estimates and make adjustments as necessary.
As described in this blog post, you should carefully consider service limits, quotas, and performance targets when designing your deployment architecture.
The Azure platform evolves very fast and limits are subject to change. Especially if you are running a large-scale installation in Azure, it is important to regularly check the Azure Subscription and Service Limits, Quotas, and Constraints article and service-specific product documentation for the latest updates.