After a service is feature-complete and before it goes to production, it is typical to run performance tests to ensure that it will behave well under load. On-premises, this often meant bombarding a dedicated performance test environment from whichever client machines you could muster, so the test was constrained by the limits of client scale. In the cloud, scaling the client resources is no longer an issue, while the cost of extra environments and usage is more prominent, so the approach to performance testing should be adapted.
Scaling the client requests
Cloud computing clients can scale both up and out. Virtual machines or worker nodes can trivially be scaled using cloud management interfaces. For instance, with IaaS, Azure's VM scale sets let you "create thousands of identical virtual machines in minutes". High-end performance computing VMs can also be procured within minutes from the Azure VM H series. Each instance can generate a large number of parallel threads and requests, or even accumulate a large count of asynchronous requests without explicit multi-threading. Visual Studio makes it easy to run performance tests from either on-premises or cloud machines.
What services do
Services attempt to satisfy reasonable client requests. As request load increases, services also scale out and/or up. Because scaling is relatively easy and quick (minutes), cloud services no longer over-provision, unlike on-premises services where procuring new servers could take days or weeks. Hence cloud services tend to choose cost-effective provisioning strategies where the resources provisioned at a given point in time are only marginally larger than the need. In aggregate, client request load is reasonably easy to forecast, with sufficient buffer in the provisioned resource margin to scale ahead of the load increase.
Spikes in load are atypical but could temporarily exceed the service’s provisioned resources.
Service owners typically apply several performance optimizations to reduce the amount of resources required to satisfy a client's request. Of particular interest here is that services tend to cache information about recently active clients so that they can satisfy further requests from the same clients faster.
Services often define limits on client requests, e.g. Azure service-specific limits. This further protects the service from load spikes.
To enable such a response, a service accounts for the request count per client for fair throttling and (optionally) computes a meaningful Retry-After value.
An easy approach is to off-load the implementation of throttling to Azure API Management by providing it with the service's specific throttling criteria.
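If you implement throttling inside the service rather than off-loading it, the per-client accounting described above can be sketched as a fixed-window counter. This is a minimal illustration only; the limit, window size, and client identifiers are assumptions for the example, not values from any real Azure service:

```python
import time
from collections import defaultdict

class Throttle:
    """Hypothetical fixed-window throttle: counts requests per client per
    time window and answers 429 with a Retry-After once the limit is hit."""

    def __init__(self, limit=300, window=60, clock=time.time):
        self.limit = limit        # max requests per client per window (illustrative)
        self.window = window      # window length in seconds (illustrative)
        self.clock = clock        # injectable clock, handy for testing
        self.counts = defaultdict(int)   # (client_id, window_index) -> count

    def check(self, client_id):
        """Return (status_code, retry_after_seconds_or_None) for one request."""
        now = self.clock()
        window_index = int(now // self.window)
        self.counts[(client_id, window_index)] += 1
        if self.counts[(client_id, window_index)] > self.limit:
            # Retry-After: whole seconds until the current window resets.
            retry_after = int((window_index + 1) * self.window - now) + 1
            return 429, retry_after
        return 200, None
```

Note that the counting itself consumes CPU and memory per request, which is exactly why a flood of requests still costs the service resources even when every answer is a 429.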
Tipping the scale
Poorly behaving clients ignore the service's response details and immediately retry the request, send further requests, or both. Accounting per-client request counts for fair throttling and generating a 429 answer with a meaningful Retry-After value requires some computing resources, and each request-response holds a connection resource, however small, so a massive volume of client requests can tip the service over. Other clients then start to see degradation in the service's responses (e.g. time-outs or 5xx errors) even though their requests were fair. The massive number of requests from a single client then becomes a Denial of Service attack, commonly referred to as a DoS. With the cloud, it is trivial to scale out, and a DoS attack then becomes a DDoS attack (distributed DoS).
Azure API Management is itself a service and can be subject to DoS or DDoS attacks. It specializes in request routing and throttling, so its code is reasonably optimized to resist (but is not immune to) such attacks.
When a DoS/DDoS attack is intentional, the ISP and/or service owners will eventually blacklist the client and no longer attempt to answer the malicious requests with a 429 status or Retry-After response header. This topic is not the focus of this article.
What clients should do
Before developing a performance test, check the service's documentation to discover what limits apply to you. We have seen clients issue 60 thousand requests to a service whose per-user limit was 300 for that same time interval. When you are charged by the request, it is a pity to pay for 59,700 429 answers.
Honor service’s response
As mentioned earlier, when a client has exceeded its allocated limit, the service will start answering with 429 and potentially a Retry-After response header. Mind your test code and don't ignore the service's answer. Do retry after the provided delay, and do hold off further requests in the meantime.
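A well-behaved retry loop can be sketched as follows. The `send` callable is a hypothetical stand-in for whatever issues one request and surfaces the status code and Retry-After delay; the maximum attempt count and fallback delay are illustrative assumptions:

```python
import time

def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Issue a request via `send()` (hypothetical callable returning
    (status_code, retry_after_seconds)); on 429, wait the server-provided
    delay before retrying instead of hammering the service."""
    status, retry_after = None, None
    for _attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        # Honor the service's hint; fall back to a modest default if absent.
        sleep(retry_after if retry_after is not None else 1.0)
    return status
```

Injecting `sleep` keeps the sketch testable; in a real test harness you would also cap the total wait time and log each throttled attempt.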
Scale with purpose, no unbounded test
In production, your application is not going to send requests to down-stream services just for the sake of it; it will do so based on actual client requests. So set yourself a performance target you wish to achieve, say a number of requests per second and/or a throughput, and test for that. For example, if you target 10 TPS, author a test that issues up to but no more than 10 TPS.
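A bounded load generator paced to a target TPS can be sketched like this. The `send` callable, clock, and sleep function are injectable so the pacing logic is explicit; all parameter values are illustrative:

```python
import time

def run_capped_load(send, target_tps, duration_seconds,
                    clock=time.monotonic, sleep=time.sleep):
    """Issue at most `target_tps` requests per second for the given duration.
    `send` is a hypothetical callable that performs one request."""
    interval = 1.0 / target_tps
    start = clock()
    sent = 0
    while clock() - start < duration_seconds:
        send()
        sent += 1
        # Pace the next request to the schedule rather than
        # firing as fast as the client hardware allows.
        next_due = start + sent * interval
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

The key design point is the schedule (`start + sent * interval`): pacing against it keeps the rate capped even when individual requests finish quickly, which is what "up to but no more than 10 TPS" means in practice.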
Test from warm state
In production, your application will typically carry a sustained load with some variability. It is typically not going to jump from no load at all to a large spike. Reproduce production-like behavior in your performance tests: ensure that a low, non-zero volume of requests is issued by your client to the service before ramping up the load into the stress zone.
If you instead start your performance test from a cold state (a so-called cold start), you will defeat the service-side caching and may fail to measure the actual performance capability of the service under production load.
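The warm-start shape described above can be expressed as a simple load profile: hold a low non-zero baseline long enough for the service to warm its caches, then ramp linearly to the stress target. The profile below is a sketch; the baseline, peak, and timing values are assumptions for illustration:

```python
def ramp_profile(baseline_tps, peak_tps, warmup_seconds, ramp_seconds):
    """Return a function mapping elapsed seconds to the target TPS:
    a flat warm-up baseline followed by a linear ramp to the peak."""
    def tps_at(second):
        if second < warmup_seconds:
            return baseline_tps  # warm the service's caches at low volume
        progress = min(1.0, (second - warmup_seconds) / ramp_seconds)
        return baseline_tps + (peak_tps - baseline_tps) * progress
    return tps_at
```

Feeding this profile to your load generator, second by second, gives the sustained-then-ramped shape rather than a zero-to-spike jump.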
Data-driven performance test
The gold standard of testing is to instrument your application in production and collect actual field data on the typical and maximal load your application works under. You can then apply a reasonable growth projection and margin, and test to that specific target. This is the most meaningful test because you are testing based on your known needs. This option is only available to mature applications whose owners have instrumented them to collect data from the field.
Give a heads-up
As a service developer, you may be called upon to perform on-call duties or assist in the event of a live-site incident. With the globalization of a service's customers, service availability, and hence on-call duty, becomes a 24/7 job. And then there is that 3 a.m. incident which everyone loves. When you are planning your next performance stress test, have a thought for the owners of the service(s) you are going to stress and give them a heads-up. Most Azure services have open support forums on MSDN and sometimes e-mail aliases for you to reach out.
(Intentional) security tests
Services also need to be protected and tested against DoS or DDoS attacks. The purpose of this article is not to negate the existence of that security concern or the need to address it; it is simply not its focus.