Santiago Cánepa – Premier Field Engineer – Microsoft
Hugo Salcedo – Premier Field Engineer – Microsoft
Hugo and I presented a session a few months ago about Azure limits and unlimited scalability. The session went very well, and we had a blast doing it.
Azure promises unlimited scalability, but the documentation does not exactly tell you how to achieve it. Continuing in the spirit of our previous posts to provide guidance, we thought we’d shed some light on what it means to be unlimitedly scalable:
The word “limit” and its derivatives are a bit funny, since they appear in the text above, but also in the text below:
Turns out we offer unlimited servers and storage, with limits on servers and storage… Somewhat confusing, to say the least!
You see, these two excerpts from the Azure website are referring to different things, so they are really not contradictory.
The first one refers to the ability to create unlimitedly scalable applications, whereas the second refers to implementation details of Azure that you need to bear in mind to avoid scalability pitfalls that would hinder your scalability potential.
Although in our presentation we discussed the concept of scalability, scale units, and some of the most popular and important patterns of scalability, we wanted to focus this blog entry on the more practical topics, more specifically Azure limits, how they interact, and how to plan for large deployments.
Let’s take a look at the currently existing limits. These are publicly documented and are refreshed fairly often as changes are introduced. This is great, since when we started with Azure, there was no single place to look these limits up, leaving it to each individual to keep and maintain a list!
We are going to focus our analysis on IaaS virtual machines, but keep in mind that the same principles apply to any type of resource in Azure.
Soft Limits vs. (Almost) Hard Limits
You have probably noticed that when limits are published, there are two types of limits: Default and Maximum.
The default limits are those enforced when a subscription is created. They prevent excessive consumption of resources, which helps both the subscriber and Azure. From the subscriber’s point of view, they prevent accidental overconsumption and require explicitly extending the limits (by means of a support ticket) if more capacity is needed. From the Azure perspective, they help ensure that no single customer consumes too many resources (e.g. by trying to exhaust Azure resources on purpose).
The default limits are considered soft limits when there is a higher maximum limit.
Now notice that the maximum (hard) limits are numbers we have seen adjust upward as new capabilities and capacity are added to the datacenters. So, strictly speaking, these are not hard limits; but at any point in time we can treat them as hard limits, because we can’t request more resources than the maximum in effect at the time of the request.
When we create these virtual machines, we have to size them properly. Selecting a particular VM size determines not only the number of cores and the memory size, but also how many data disks you can attach, the size of the temporary storage (the D: drive), and the associated bandwidth. Notice that for PaaS VMs, the temporary storage is larger than for IaaS VMs of the same size. Another important aspect is whether VMs can be associated with load-balanced endpoints, which is determined by the service tier.
A subscription has an associated number of cores that it can allocate. This is an aggregate across all the VMs (PaaS and IaaS) in the subscription. Although shared cores are cheaper from a billing standpoint, each counts as one core toward this limit.
Considering that the current default limit is 20 cores per subscription, the largest machine combination we could create is one 16-core VM plus one 4-core VM. Alternatively, we could have 20 VMs with either shared or single cores.
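The default-limit arithmetic above can be sketched in a few lines. This is just an illustration of the math; the constant and function names are ours, not an Azure API:

```python
# Default cores per subscription, as discussed above.
DEFAULT_CORE_LIMIT = 20

def max_vms(cores_per_vm, core_limit=DEFAULT_CORE_LIMIT):
    """Number of identically sized VMs that fit under the core limit.
    Shared-core VMs count as one core each toward the limit."""
    return core_limit // cores_per_vm

print(max_vms(16))  # 1  (one 16-core VM leaves 4 cores, enough for a 4-core VM)
print(max_vms(1))   # 20 single-core (or shared-core) VMs
```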
If we increase the limit of cores for the subscription to its maximum of 10,000, things get a bit more interesting. Could we create 10,000 VMs, with one core each? It depends. Before we jump to create this many machines, we have to consider other details:
· Do the machines need to “talk” to each other via a private network connection (as opposed to accessing each other via a public endpoint)?
· Do the machines need any extra storage attached?
· How many endpoints must each have?
· What public names (and associated VIPs) are required?
Depending on the answers to these questions, each subscription will be able to host more or fewer resources.
Let’s take the case where we need to create a large number of VMs. Considering the connectivity (or isolation) requirements, the number of computers that a subscription can host will vary greatly.
Say we want all machines to connect to each other. We could leverage the implicit networking features of cloud services, which give us connectivity and even name resolution with very little effort.
Now, if we consider the limits on cloud services, we’ll see that we can create 50 VMs per cloud service. For PaaS, the concept of “deployments” comes into play: since a cloud service has two deployment slots, this effectively splits the number of VMs that can communicate with each other to 25 (still 50 per cloud service).
If you consider the default limit of 20 cloud services, you will be able to have 20 sets of 50 connected computers, totaling 1,000 VMs. Increase the cloud service limit to its maximum of 200, and you’ll get 200 sets of 50 computers, totaling 10,000 VMs. Now remember: if you get to this point, these can only be shared-core or single-core VMs. As you can see, there are “limit interactions” that you have to consider at all times.
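The cloud-service math works out like this (again a sketch, with figures taken from the limits above):

```python
# Each cloud service can host up to 50 IaaS VMs.
VMS_PER_CLOUD_SERVICE = 50

def total_vms(cloud_services):
    """Total VMs across isolated 50-VM cloud service sets."""
    return cloud_services * VMS_PER_CLOUD_SERVICE

print(total_vms(20))   # 1000  with the default of 20 cloud services
print(total_vms(200))  # 10000 with the maximum of 200
```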
Now, let’s imagine that all those computers must have connectivity with each other, across cloud services. For this, we would need to rely on virtual networking.
We can create a virtual network and try to provision all the VMs to that virtual network. However, there are limits at the virtual network level as well. If you looked closely at the limits, you might have noticed that virtual networks have an associated limit of 2,048 virtual machines. What this means is that although, technically, you can have 10,000 single-core VMs, you can’t network them all in one single virtual network.
So how can we create a set of more than 2,048 computers with connectivity among them? Well, in order to do that, you’ll need to create a set of virtual networks and link them with VPN connections.
Like everything else, virtual networks are not free of limits: you can only have 10 VPN connections per virtual network. So, say you create 10 virtual networks and connect each to all the others; you would have 10 virtual networks × 2,048 VMs, yielding a total of 20,480 VMs.
Not so fast!
At this point, you will have hit the 10,000-core-per-subscription limit, so you would not be able to implement the above scenario… at least not in a single subscription.
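The clash between the virtual-network math and the core limit is easy to check (a sketch using the figures above):

```python
VMS_PER_VNET = 2048   # VMs per virtual network
LINKED_VNETS = 10     # one VNet can hold 10 VPN connections
MAX_CORES = 10_000    # maximum cores per subscription

networked_capacity = LINKED_VNETS * VMS_PER_VNET   # 20,480 VMs on paper
achievable = min(networked_capacity, MAX_CORES)    # cores run out first
print(networked_capacity, achievable)  # 20480 10000
```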
The nice thing is that when you manage an Azure account, you can have an unlimited number of subscriptions associated with it. This opens things up to the promise of unlimited compute power. Of course, the physical capacity of the datacenter will be the next limit (or rather, the amount of money you would have to cough up to pay for all those resources).
With some careful networking and resource planning, you could have an arbitrarily big compute setup that could satisfy almost any need.
Compute and Networking are two Azure services that depend on each other. The third service that must be considered is Storage.
The reason this service is of special importance is that it is used as backing for the virtual hard disks of the VMs that are created.
Just like any other service, Storage has many capabilities and many associated limits. The first important limit is the size to which a single storage account can grow: the aggregate size of the disks associated with a VM (OS or data disks) cannot exceed the limit for the storage account.
Another important limit is the level of sustained performance that a single blob, and the storage account as a whole, can provide. These are expressed in IOPS and, at the time of writing, a single blob has a target of 500 IOPS, with 20,000 IOPS at the storage account level. A quick calculation shows that a single storage account can host up to 40 virtual disks that could burst simultaneously without being throttled. Notice that we can store many more virtual disks in a single storage account, but we use 40 as a best practice to ensure that every virtual hard disk in the account can achieve maximum performance.
Now, this adds another important piece of information: considering that we can have 100 storage accounts, each with 40 virtual hard disks (for maximum performance), we could have at most 4,000 single-disk VMs (and this number decreases linearly as the number of virtual hard disks per VM increases).
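The storage arithmetic above can be captured in a short sketch (targets as of the time of writing; names are ours):

```python
BLOB_IOPS_TARGET = 500        # per-blob (per-disk) IOPS target
ACCOUNT_IOPS_TARGET = 20_000  # per-storage-account IOPS target
STORAGE_ACCOUNTS = 100        # storage accounts per subscription

# Disks that can all burst at full speed within one account.
disks_per_account = ACCOUNT_IOPS_TARGET // BLOB_IOPS_TARGET

def max_performant_vms(disks_per_vm):
    """VMs per subscription while keeping every disk at its IOPS target."""
    return (STORAGE_ACCOUNTS * disks_per_account) // disks_per_vm

print(disks_per_account)      # 40
print(max_performant_vms(1))  # 4000 single-disk VMs
print(max_performant_vms(4))  # 1000 VMs with 4 disks each
```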
As described in the previous sections, there are interactions across the different limits that must be considered when designing an Azure implementation:
We could have 10 virtual networks with 2,048 VMs each, totaling 20,480 VMs in a subscription…
But we can only have 10,000 cores, i.e. at most 10,000 single-core computers, in a subscription (remember that shared cores count as one core toward resource accounting)…
And even if we could have more cores, we still can only have 200 cloud services hosting a maximum of 50 VMs each, totaling 10,000 VMs…
But for them to perform well, we can only have 100 storage accounts hosting 40 virtual disks each, totaling 4,000 single-disk VMs.
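Putting the interactions above side by side makes it obvious which limit binds first (a sketch; the labels are ours):

```python
# Per-subscription ceilings from the interactions discussed above.
limits = {
    "subscription cores (single-core VMs)": 10_000,
    "cloud services x VMs per service": 200 * 50,         # 10,000
    "linked virtual networks x VMs per VNet": 10 * 2048,  # 20,480
    "storage accounts x performant disks": 100 * 40,      #  4,000
}

binding = min(limits, key=limits.get)
print(binding, limits[binding])  # storage accounts x performant disks 4000
```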
Fortunately, we can have as many subscriptions as we want.
Remember that the limits mentioned above are not the only ones to consider. For instance, if you are deploying many machines to a virtual network, ensure that the subnets in which you place the VMs have enough free addresses for every computer.
Azure comprises many services across four big categories: Compute, Networking, Data, and Application.
All of these have different limits, and each has different ways to measure what it does, how well it does it, and how it is constrained by limits. Likewise, each service’s approach to scalability depends on the specific metaphor it presents. It is well beyond the scope of this post to discuss those, but if you are planning on using other services, you should become familiar with their specific scalability capabilities.
There are many things to consider when planning a large-scale architecture. We can’t possibly mention all the limits and their interactions, so we wanted to focus on a few to get you thinking about the topic. These are the ones we see customers hit most quickly and struggle the most to overcome. We hope this post helps you reach your limitless self.