Handling "Service Unavailable" errors easily in Azure Logic Apps

As the on-call engineer for the week in Azure Logic Apps, I worked with our customer support team to help a Logic Apps customer who had been seeing intermittent ServiceUnavailable errors over the past few months. The customer was nesting one Logic App within another, and once in a while a call to the nested Logic App would fail with Status: 'Failed', Code: 'ServiceUnavailable'.

What was happening internally is that our attempt to checkpoint the new run in our internal storage account queue failed. This can happen: storage, like Logic Apps, has a 99.99% SLA rather than 100%, and once in a while, within the remaining 0.01%, there will be an oddball failure.

By default, Logic Apps retries a request when the response carries an HTTP status code that is considered retry-able. For instance, 400 Bad Request is documented in the HTTP RFC as a request that should not be repeated without modification, so Logic Apps doesn't retry it by default, but 500 Internal Server Error and 503 Service Unavailable do get retried by default. For advanced users, the Logic Apps service also provides a way to configure a custom retry policy on an app, which governs how the service retries actions when they fail. Either the default behavior or a custom retry policy is useful for counteracting such blips in service availability.
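The retry behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the service's actual implementation: the function names, the doubling backoff, and the exact retry-able set (408, 429, and any 5xx) are assumptions made for the example; the text above only confirms that 500 and 503 are retried and 400 is not.

```python
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """Assumed retry-able set: 408, 429, and any 5xx (so 500 and 503 qualify, 400 does not)."""
    return status in (408, 429) or 500 <= status <= 599


def call_with_retries(
    send: Callable[[], int],
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Call send() until it returns a non-retry-able status or retries run out.

    Returns (final_status, total_attempts). The doubling delay is a
    hypothetical backoff chosen for illustration only.
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        # Stop on success, on a non-retry-able failure, or once the
        # initial attempt plus max_retries retries have all been used.
        if not is_retryable(status) or attempts > max_retries:
            return status, attempts
        sleep(base_delay * 2 ** (attempts - 1))


# Usage: two transient 503s followed by a success.
responses = iter([503, 503, 200])
status, attempts = call_with_retries(lambda: next(responses), sleep=lambda s: None)
print(status, attempts)  # 200 3
```

With retries disabled (max_retries=0 in this sketch), the first 503 would be returned to the caller straight away, which is the situation the customer ran into.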

In this customer’s Logic App definition, the retry policy was specified as none. Hence, as per the customer's explicit configuration, the service did not retry when such transient errors happened. We advise using a retry policy. If you want the default policy provided by the service (a maximum of 4 retries within a minute or so), you can simply remove the section of the definition that sets the retry policy to none. If you want to configure something more custom, you can use the document linked here for guidance.
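As a rough illustration of the fix, the sketch below removes a "none" retry policy from a workflow definition so that the service default applies again. It assumes the definition is available as parsed JSON and that the policy sits under each action's "inputs" as "retryPolicy": {"type": "none"}; the helper name, the action name, and the placeholder workflow id are hypothetical, and the walk only follows nested "actions" collections, so treat it as a starting point rather than a complete tool.

```python
import copy
import json


def drop_none_retry_policy(definition: dict) -> dict:
    """Return a copy of the definition with every 'none' retry policy removed."""
    result = copy.deepcopy(definition)  # leave the caller's definition untouched

    def visit(actions: dict) -> None:
        for action in actions.values():
            inputs = action.get("inputs")
            if isinstance(inputs, dict):
                policy = inputs.get("retryPolicy")
                if isinstance(policy, dict) and str(policy.get("type", "")).lower() == "none":
                    # With no retryPolicy section, the service default applies.
                    del inputs["retryPolicy"]
            # Assumption: nested actions (for example inside a scope) live under "actions".
            visit(action.get("actions", {}))

    visit(result.get("actions", {}))
    return result


# Hypothetical fragment: a parent app calling a nested Logic App with retries disabled.
example = {
    "actions": {
        "Call_nested_app": {
            "type": "Workflow",
            "inputs": {
                "host": {
                    "triggerName": "manual",
                    "workflow": {"id": "<nested-workflow-resource-id>"},
                },
                "retryPolicy": {"type": "none"},
            },
        }
    }
}

fixed = drop_none_retry_policy(example)
print(json.dumps(fixed, indent=2))
```

For a custom policy, you would replace the section instead of deleting it, using the policy type and intervals described in the linked guidance.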