Handling Azure Resource Manager Deployment Limits


Those who know me know that I am very much a proponent of the Azure Resource Manager (ARM) framework and all that it can provide. My customers certainly know it as well, as I have been helping them migrate to the framework for almost a year now.

However, one of my customers hit a deployment limit within one of their resource groups that caused them problems with their Continuous Integration/Continuous Deployment (CI/CD) process. My customers are extremely smart, so even before talking with the ARM product team, they had already created a workaround script, and it is this limit and workaround script that I want to talk about here.

Special thanks to Bruce Bell from Liquidity Services for both bringing this to our attention and for creating the workaround script.

Problem

Liquidity Services uses an ARM template as part of the continuous deployment process for their up-and-coming SaaS application. The application leverages a combination of IaaS and PaaS resources, and it gets deployed to a single Resource Group every time a build is processed through Visual Studio Team Services (VSTS). On average, the development team processes 5-10 builds a day, which means the number of “Deployments” to the same Resource Group grows pretty quickly.

With this, Liquidity found out very quickly that Azure caps the number of unique deployments stored in a Resource Group's deployment history at 800. If you try to run the 801st Deployment against that Resource Group, it will fail. Azure keeps this history so that a customer can roll back to a previous Deployment should a new one break what is running within the Resource Group.
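To see how close a Resource Group is to the limit, you can simply count the entries in its deployment history. A minimal sketch using the AzureRM cmdlets from this era (the Resource Group name 'MyResourceGroup' is a placeholder, not from the original scenario):

```powershell
# Count the deployments currently stored in a Resource Group's history.
# 'MyResourceGroup' is a placeholder - substitute your own Resource Group name.
$deploymentCount = (Get-AzureRmResourceGroupDeployment -ResourceGroupName 'MyResourceGroup').Count
Write-Output "$deploymentCount of 800 deployment history slots used"
```

Running this as part of a build step is one way to get early warning before the 801st deployment fails outright.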

Needless to say, this is a problem for Liquidity and I am sure that it is a problem for many other customers that use ARM templates as part of the CI/CD process.

Solution

There is no out-of-the-box way to solve this problem, but there is a pretty straightforward workaround, which I call the Resource Group Deployment Retention Policy. Through the use of a fairly simple script that can be deployed using Azure Automation, you can easily clear out all Resource Group Deployments that are older than a specified date.

# Authenticate to Azure using the Automation account's Run As connection
$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Add-AzureRmAccount -ServicePrincipal -Tenant $Conn.TenantID -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint

# Find every Resource Group covered by the retention policy
$ResourceGroups = Get-AzureRmResourceGroup | Where-Object { $_.ResourceGroupName -like '*LiquidityOne' }

foreach ($ResourceGroup in $ResourceGroups)
{
   # Collect the deployments older than the retention window (5 days here)
   $OldDeployments = Get-AzureRmResourceGroupDeployment -ResourceGroupName $ResourceGroup.ResourceGroupName |
      Where-Object { $_.Timestamp -lt (Get-Date).AddDays(-5) }

   foreach ($Deployment in $OldDeployments)
   {
      # Remove the deployment record from the Resource Group's history
      Remove-AzureRmResourceGroupDeployment -ResourceGroupName $ResourceGroup.ResourceGroupName -Name $Deployment.DeploymentName -Force
   }
}

Using the script above, you can very easily set up a scheduled Runbook within Azure Automation to run on a periodic basis and regularly clean out the stored Resource Group Deployments for a particular Resource Group. As a matter of fact, using an Azure Automation variable, you could even use the same script to process multiple Resource Groups and their corresponding Deployments. Provided the Runbook runs often enough relative to your deployment rate, you should never hit the Resource Group Deployment limit again.
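A sketch of that multi-Resource-Group variant, assuming an Azure Automation variable (the name 'RetentionResourceGroups' is hypothetical) that holds a comma-separated list of Resource Group names:

```powershell
# 'RetentionResourceGroups' is a hypothetical Automation variable holding a
# comma-separated list of Resource Group names, e.g. "RG-Dev,RG-Test,RG-Prod".
$groupNames = (Get-AutomationVariable -Name 'RetentionResourceGroups') -split ','

foreach ($name in $groupNames)
{
   # Find deployments older than the retention window (5 days here)
   $oldDeployments = Get-AzureRmResourceGroupDeployment -ResourceGroupName $name |
      Where-Object { $_.Timestamp -lt (Get-Date).AddDays(-5) }

   foreach ($deployment in $oldDeployments)
   {
      Remove-AzureRmResourceGroupDeployment -ResourceGroupName $name -Name $deployment.DeploymentName -Force
   }
}
```

Keeping the list in an Automation variable means you can bring new Resource Groups under the retention policy without editing the Runbook itself.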

Conclusion

This probably seemed like a pretty simple script and a simple problem to solve. However, some customers get nervous when they start running into our defined limits. Some of them can easily be increased on request, some are very easy to work around, and some are hard and fast. This one happened to fall into the second category, but there was no clear direction on how to work around the problem, so I thought it would be good to provide a documented solution.

For more information on all of the Azure limits, please check out the following: https://azure.microsoft.com/en-us/documentation/articles/azure-subscription-service-limits/

Comments (2)

  1. Very interesting problem and workaround. In the end, an Azure problem got resolved by another Azure offering :).
    Nice to know this, and I don't see anything wrong with it. Thanks for sharing!

  2. Eric Golpe says:

    So, curious as to what core quotas and capacity commitments you have made, and why a single RG deployment? I would imagine the 800 deployment limit likely keeps customers from shooting themselves in the foot by placing all deployments in a single region, or worse, a single datacenter within the region (although this is harder to tell without zone info and controlling fault domains well with availability sets, etc.). In the early days of Azure, limits were in the high thousands before customers realized they were creating their own pseudo nightmares with large provisioning times whilst the Azure Fabric Controller determined best placement throughout upgrade domains and host/guest availability. The Azure team rightfully backed the limit down in the latest service model, even with enhanced capacity and vm sizes, but having more info in these types of larger scale deployments is likely better for all Azure customers in the community IMHO. Thanks for the info.
