Azure API Management provides robust capabilities for usage throttling. Throttling is useful for defending against denial-of-service attacks and for protecting back-end services from a sudden influx of requests through your API layer. You can also use it to offer your customers tier-based access restrictions as a product feature.
To implement throttling, API Management lets you limit either the call rate or the overall usage quota. The available options are:
- Limit call rate by key: Limits the call rate per key, where the key is an arbitrary value you supply through a policy expression (for example, the caller's IP address).
- Limit call rate by subscription: Limits the call rate for a particular subscription. Applies to all the keys in that subscription.
- Set usage quota by subscription: Limits the overall usage (call volume and, optionally, bandwidth) for a subscription over its lifetime or over a renewal period, e.g. 1,000 calls per month.
- Set usage quota by key: Limits the overall usage (call volume and, optionally, bandwidth) per key, again an arbitrary policy-expression value, over its lifetime or over a renewal period.
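To illustrate how the by-key variants differ from the by-subscription ones, here is a minimal sketch of an inbound policy section that keys both limits on the caller's IP address through the `context.Request.IpAddress` policy expression. The limits (10 calls per 60 seconds, 10,000 calls per hour) are arbitrary values chosen for illustration, and the rest of the policy document is assumed to stay as it is.

```xml
<inbound>
    <base />
    <!-- Call rate by key: each caller IP address gets at most 10 calls per 60-second window -->
    <rate-limit-by-key calls="10"
                       renewal-period="60"
                       counter-key="@(context.Request.IpAddress)" />
    <!-- Usage quota by key: each caller IP address gets at most 10,000 calls per hour (3,600 seconds) -->
    <quota-by-key calls="10000"
                  renewal-period="3600"
                  counter-key="@(context.Request.IpAddress)" />
</inbound>
```

Any expression that evaluates to a string, such as the subscription ID, could serve as the counter key instead of the IP address.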
At first glance, call rates and usage quotas both seem to control the number of calls over a given period of time. For instance, the policies for limiting the call rate by subscription (rate-limit) and for setting the usage quota by subscription (quota), shown below in that order, both take a renewal period.
<rate-limit calls="number" renewal-period="seconds">
    <api name="name" calls="number" renewal-period="seconds">
        <operation name="name" calls="number" renewal-period="seconds" />
    </api>
</rate-limit>
<quota calls="number" bandwidth="kilobytes" renewal-period="seconds">
    <api name="name" calls="number" bandwidth="kilobytes">
        <operation name="name" calls="number" bandwidth="kilobytes" />
    </api>
</quota>
So when should you use a usage quota rather than a call rate limit?
As a guideline, call rate limits are used to protect against short, intense bursts of traffic. For instance, if you know your back-end service's database chokes under a high call volume, you can set a rate limit in API Management that rejects calls beyond a threshold, say no more than 100 calls per minute.
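A minimal sketch of that 100-calls-per-minute limit, assuming it is placed in the inbound section of the policy for the relevant product or API:

```xml
<inbound>
    <base />
    <!-- Allow at most 100 calls per subscription within a 60-second window;
         calls over the limit are rejected with 429 Too Many Requests -->
    <rate-limit calls="100" renewal-period="60" />
</inbound>
```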
Usage quotas, on the other hand, control call volume over a longer period of time, for instance the total number of calls allowed in a month. If you monetize your API, you can also set quotas per subscription tier: a Basic tier might allow no more than 10,000 calls a month, while a Premium tier allows up to 100,000,000 calls a month.
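One way to sketch the tiered scenario, assuming each tier is modeled as its own API Management product (the product names Basic and Premium are hypothetical), is to give each product's inbound policy its own quota. Because `renewal-period` is expressed in seconds, the sketch approximates a month as 30 days (30 × 24 × 3,600 = 2,592,000 seconds).

```xml
<!-- Hypothetical "Basic" product, inbound section: 10,000 calls per 30-day period -->
<inbound>
    <base />
    <quota calls="10000" renewal-period="2592000" />
</inbound>

<!-- Hypothetical "Premium" product, inbound section: 100,000,000 calls per 30-day period -->
<inbound>
    <base />
    <quota calls="100000000" renewal-period="2592000" />
</inbound>
```

A subscription that exhausts its quota is rejected (with 403 Forbidden) until the period renews, so upgrading the customer's product is what lifts the ceiling.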
From an implementation standpoint, API Management treats rate-limit information as short-lived (five minutes or less), so changes to it are propagated across the nodes faster in order to protect against spikes. Usage quota information, by contrast, is expected to apply over the longer term, so it is implemented differently.
In summary, call rate limits protect against short, intense bursts of traffic, while usage quotas serve longer-term access restrictions and tier-based monetization scenarios.
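The two mechanisms are not mutually exclusive. As a sketch, assuming a product-scope policy and illustrative limits, a single policy document can layer burst protection on top of a longer-term quota:

```xml
<policies>
    <inbound>
        <base />
        <!-- Burst protection: at most 100 calls per subscription per 60-second window -->
        <rate-limit calls="100" renewal-period="60" />
        <!-- Longer-term limit: at most 10,000 calls per 30-day period (2,592,000 seconds) -->
        <quota calls="10000" renewal-period="2592000" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```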
Hope this helps.