TCP Keep Alive

Article
12/12/2007

How do I detect when the other side of a TCP connection has gone away? Does TCP keep-alive take care of this for me?

Although we take it for granted that change can be quickly detected for closely connected components, it turns out to be surprisingly difficult to detect change when two machines are isolated by more than a simple wire. Even a really big change to the system, like one of the machines disappearing, is hard to spot.

Detecting that the other side has disappeared is a common request because, on a server, knowing that the client has dropped the connection allows you to clean up resources much faster. The TCP transport sometimes gives you quick notifications by aborting the session that the connection has been dropped. However, there's no guarantee that the transport will be able to detect that the other side has gone away. That's because notification of a TCP connection reset has to travel just like any other piece of data and can be lost or redirected along the way, if it was sent at all. The only sure thing is that the next time you attempt an IO operation, you'll find out if the channel was still good or not.

If you're unhappy waiting for the next IO operation, then you can make IO operations happen faster. The basic concept is to have a cheap IO operation that does nothing but bounce between the two parties. This is sometimes called a heartbeat and is exactly what takes place when you talk about TCP keep-alive. However, the standard keep-alive interval for TCP is 120 minutes, which is probably worse than your current latency for detecting change. By default, a service gets bored waiting after about 10 minutes and gives up. The chance of a keep-alive happening between the time that a client disconnects and the service notices it is pretty small.

If you want something faster but don't want to change the timeouts, then you can take the basic concept into your own hands. One approach is to create a keep-alive method on your service contract that does nothing but let you trigger IO operations at a frequency you desire. Another approach is if you control both ends and don't want to change your service contract, then you can do the same thing in a protocol channel and swallow those messages so that the service never has to see them.

Next time: Collections without CollectionDataContract

TCP Keep Alive

Additional resources