HTTP Keep-Alive is not always your friend - but math is!

In HTTP/1.1, connections are reused by default. This means that if you make two HTTP requests one after the other, they can go over the same TCP connection, which saves you the overhead of setting up a new TCP connection. This is even more important if you're using HTTPS, since the SSL handshake that sets up the secure communication is relatively expensive for the server. That is why you typically want to reuse connections when you can. If you're using an HTTP client library (such as HttpClient in .Net), the client library will open a few connections as needed (this is configurable and the default is two) and then reuse them as long as new requests keep arriving soon enough. But this can also be a problem if your server is a cluster, such as a few instances in an Azure deployment. The problem occurs when you have few (relatively speaking) clients generating a lot of requests on your servers.
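To make the reuse concrete, here is a minimal sketch using Python's standard http.client (example.com and the request paths are just placeholders): both requests travel over the single TCP connection opened for the first one.

```python
import http.client

# Open one TCP (and TLS) connection and reuse it for two HTTP/1.1 requests.
conn = http.client.HTTPSConnection("example.com")

conn.request("GET", "/")
first = conn.getresponse()
first.read()  # the body must be fully read before the connection can be reused

conn.request("GET", "/about")  # reuses the same TCP connection
second = conn.getresponse()
second.read()

conn.close()
```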

This is easily illustrated with an example. Let's assume that you have two servers behind a VIP (i.e. for each new TCP connection one server is chosen using round-robin selection) and three clients who each use one TCP connection and make 10 requests per second over it. Client one opens a connection to server one, client two connects to server two and then client three connects to server one. Now we have an uneven load of 20 requests per second on server one and 10 requests per second on server two. This might not be too bad if we had 101 clients and two servers, given that each client is equal, but that is probably not the case. Consider for example the case where client two in the example above only generates one request per second and closes the connection between each request. Now server one will alternate between 20 and 21 requests per second while server two has zero or one request per second. If you have a mix of these short-lived and long-lived connections, and even the long-lived connections are closed once in a while, you will notice that some of your servers have much more load than others. This is probably not desirable, since random servers will have very high load at random times while a lot of servers will have less than average load. This problem is called connection affinity.
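To make the arithmetic concrete, here is a small sketch simulating one minute of the second scenario above; the round-robin VIP, the two servers and the request rates are taken from the example, the rest is just illustration.

```python
from collections import Counter
from itertools import cycle

# A round-robin VIP over two servers.
vip = cycle(["server one", "server two"])
load = Counter()  # total requests per server over one minute

# Clients one and three each keep a single long-lived connection
# and send 10 requests per second over it for the whole minute.
client_one = next(vip)    # -> server one
client_two = next(vip)    # -> server two (used for a single request, see below)
client_three = next(vip)  # -> server one again

load[client_one] += 10 * 60
load[client_three] += 10 * 60

# Client two sends 1 request per second but opens a new connection for
# every request, so each of its requests gets its own round-robin pick.
load[client_two] += 1
for _ in range(59):
    load[next(vip)] += 1

print(load)  # Counter({'server one': 1229, 'server two': 31})
```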

You might think that a smarter VIP, one that chooses where to route new connections based on capacity, would help, and yes, it will. A little. The problem is that once a TCP connection is established the VIP cannot do anything to reroute it. At least not your vanilla VIP. Maybe there is some super advanced HTTP-inspecting VIP out there, but you probably don't have one, so don't bother thinking about that too much.

What you want to do is to let a few requests be handled over each TCP connection to get the best of both worlds: reuse a connection for a while to reduce connection overhead, but once in a while force the client to connect again so that your VIP can load balance. You could do this by keeping track of every client currently connected to your web service and counting its requests, but that bookkeeping wastes memory and CPU cycles. Instead you can let math help you. If you want to reuse each connection N times on average, you can randomly choose to close the connection after each request with probability 1/N. This works great for a mix of long-lived and short-lived connections, since the long-lived connections will be closed on average every N requests (trust math!) while short-lived connections with just a few requests are unlikely to be closed prematurely.
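One way to do this is to have the server answer with a Connection: close header with probability 1/N, which tells the client to open a new connection (and get a new round-robin pick) for its next request. Here is a minimal sketch using Python's standard http.server; N = 100 and the port are just examples, and in a real service you would set the header in whatever framework sits behind your VIP.

```python
import random
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Reuse each connection N times on average before forcing a reconnect.
N = 100

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive between requests

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        if random.random() < 1 / N:
            # Each request independently has a 1/N chance of being the last
            # one on its connection, so no per-connection state is needed.
            # http.server will then close the socket once the response is sent.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```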

You might be tempted to just keep a global counter and close whatever connection served the request every time the counter hits a multiple of N. This does not achieve what you want. There is a famous problem called the coupon collector's problem that tells us that if you have N options that are equally probable, the number of picks you need to make before you can expect to have picked all N options is roughly N ln(N). That means that if you have C connections and close one every N requests, it will take roughly N C ln(C) requests before you can expect to have closed all connections, so you cannot count on every connection being recycled after about N requests; the unlucky ones will live much longer than that. Once you add a number of short-lived connections it gets even worse. Trusting randomness is much easier and more accurate!
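If you want to convince yourself of the coupon collector estimate, here is a quick sanity check that treats each close event as a uniform pick among C open connections; C = 50 and the number of trials are arbitrary.

```python
import math
import random

def picks_to_hit_all(c):
    """Pick uniformly among c options until every option has been picked once."""
    seen = set()
    picks = 0
    while len(seen) < c:
        seen.add(random.randrange(c))
        picks += 1
    return picks

C = 50
trials = 2000
average = sum(picks_to_hit_all(C) for _ in range(trials)) / trials

print(f"measured average picks: {average:.0f}")
print(f"C * ln(C):              {C * math.log(C):.0f}")
# The exact expectation is C times the harmonic number H_C, roughly
# C * (ln(C) + 0.577), so the measured value lands a bit above C * ln(C).
```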