Making things go fast on a network, part 3

Now I want to pop the stack up a bit and talk about messages.  At their heart, connection-oriented protocols are about establishing pipes between senders and receivers - they don't NEED to have semantics beyond that.

But applications are different.  They typically fall into a few stereotypical usage patterns, the most common of which is the client/server pattern.  A client sends data to the server, and the server responds.  The roles of sender and receiver alternate between the client and the server.

In the client/server pattern, the client sends a "message" to the server, and the server responds with its own "message".  A "message" may be formed from multiple packets (multiple send() calls), but it doesn't have to be.

Every "message" is self describing - there needs to be a mechanism that allows the server to know it's received all the data that the client sent.  That mechanism may be a length prepended to each message, it might be a magic "terminating sequence" (CR/LF is a very common magic terminator).  Often the semantics of a message are defined by the API being used to send the data - for example, the NetBIOS networking API includes a length of the data being sent, the receiver of the message is guaranteed to receive a block of the same length that was sent, regardless of fragmentation (RFC1001/RFC1002 define how NetBIOS API semantics are implemented over TCP/IP if anyone cares).  In other cases, the semantics of the message are defined by the protocol being implemented.  For example, POP3, IMAP4, NNTP and SMTP define their messages as CR/LF delimited strings, while LDAP uses ASN.1's semantics for defining messages.

But however a message is defined, there is still a request/response semantic associated with client/server protocols.

Conceptually, here's what happens on the wire when you have a client/server interaction at the application level:

Client Server
Send message request "A"  
  Send message with response "A"

But as we discussed earlier, that's not REALLY what happens.  Each of those messages travels in a packet, and each packet has to be acknowledged by the other side.

Client Server
Send message request "A"  
  Acknowledge receipt of request "A"
  Send message with response "A"
Acknowledge receipt of response "A"  

Now it gets REALLY interesting when you string lots of client/server request/response sequences together:

Client Server
Send message request "A"  
  Acknowledge receipt of request "A"
  Send message with response "A"
Acknowledge receipt of response "A"  
Send message request "B"  
  Acknowledge receipt of request "B"
  Send message with response "B"
Acknowledge receipt of response "B"  
Send message request "C"  
  Acknowledge receipt of request "C"
  Send message with response "C"
Acknowledge receipt of response "C"  
Etc.  

Remember that on local area networks, the time to send a given packet is the same, regardless of the payload size.  It would be unbelievably cool if there was a way of combining the acknowledgement with the response to an operation - that would effectively double your throughput, since you'd be sending half the number of packets.

And it turns out that several people independently came to the same solution.  In the NetBEUI protocol, the feature is called "piggy-back acks".  In TCP, the receiver-side delay is called "delayed acknowledgment", and it works hand-in-hand with the Nagle algorithm (named after John Nagle, who invented it), which holds small sends on the sender side.  When you turn on piggy-back acks (or nagling), the sequence above becomes:

Client Server
Send message request "A"  
  Acknowledge receipt of request "A" and send message with response "A"
Acknowledge receipt of response "A" and send message request "B"  
  Acknowledge receipt of request "B" and send message with response "B"
Acknowledge receipt of response "B" and send message request "C"  
  Acknowledge receipt of request "C" and send message with response "C"
Acknowledge receipt of response "C"  
Etc.  

It halves the number of frames.  But there's a tricky bit here - it only works if the application is going to send a response to the client; if not, the transport still needs to send the acknowledgement.  And that's where things get "interesting".  In order to give the server time to send the response to the client, the transport holds off on sending the ack for a short amount of time (somewhere around 200 ms, typically).  If the server responds to the client within 200 milliseconds, everything's copacetic.  If the server doesn't respond in time, the receiver sends a bare acknowledgement, and nothing is lost but a little time.

But what happens if you DON'T use the request/response pattern?  What happens if your protocol involves multiple messages from the client to the server? 

This isn't as silly an idea as you might think - for example, in CIFS, the client and server negotiate a common message size - the client is prohibited from sending a block larger than the server's buffer size, and the server is prohibited from sending a block larger than the client's buffer size.  If a CIFS client needs to send tons of data to the server, it would make sense to break the request up into server-buffer-sized chunks and shotgun them to the server - the client issues async sends for all the requests and waits on the transport to deliver them as best it can.

Client Server
Send message "A.1"  
Send message "A.2"  
Send message "A.3"  
  Respond to A.1..A.3
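The client side of that pattern can be sketched in a couple of lines.  The helper name and the idea of passing the negotiated size as a plain parameter are my own simplifications, not part of the CIFS specification:

```python
def chunk_requests(data: bytes, negotiated_block_size: int) -> list:
    # Split one large transfer into blocks no bigger than the size the
    # server advertised during negotiation; each chunk becomes one async
    # send, and the transport delivers them as best it can.
    return [data[i : i + negotiated_block_size]
            for i in range(0, len(data), negotiated_block_size)]
```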

On the surface, it seems like a great idea, except for the fact that (as I mentioned in my first article) the sender can't start sending message A.2 before it's gotten the acknowledgment for A.1 - with nagling, small sends are held back while earlier data is still unacknowledged.

Client Server
Send message request "A.1"  
  Acknowledge receipt of request "A.1"
Send message request "A.2"  
  Acknowledge receipt of request "A.2"
Send message request "A.3"  
  Acknowledge receipt of request "A.3"
  Send response to A.1..A.3
Acknowledge receipt of response to A.1..A.3  
Etc.  

But if nagling is involved, things get REALLY ugly:

Client Server
Send message request "A.1"  
  Wait 200 ms for the server to respond
  Acknowledge receipt of request "A.1"
Send message request "A.2"  
  Wait 200 ms for the server to respond
  Acknowledge receipt of request "A.2"
Send message request "A.3"  
  Acknowledge receipt of request "A.3" and send response to A.1..A.3
Acknowledge receipt of response to A.1..A.3  
Etc.  

All of a sudden, the performance "optimization" of nagling has totally killed your performance.

This leads to Larry's "Making things go fast on the network" rule number 2:

You can't design your application protocol in a vacuum. You need to understand how the layers below your application work before you deploy it.

In this case, the designers of TCP/IP realized that the Nagle algorithm could cause more harm than good in certain circumstances, so they built in an opt-out.  The opt-out has two forms: First, nagling only delays "small" sends - full-sized segments (at least the TCP maximum segment size) go out immediately.  Second, you can disable nagling on an individual socket basis by calling setsockopt(socket, IPPROTO_TCP, TCP_NODELAY, ...).  But your better choice is to understand how the underlying network works and design your application protocol to match.  Yes, that's a leaky abstraction, but this article is about making things faster - when you're trying to make things fast, you have to understand all the layers.
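In Python's wrapper over the Berkeley sockets API, the second opt-out looks like this (a minimal sketch - the helper name is mine):

```python
import socket

def make_nodelay_socket() -> socket.socket:
    # Create a TCP socket with the Nagle algorithm disabled.  Note that
    # the option lives at the TCP level (IPPROTO_TCP), not at SOL_SOCKET.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Disabling nagling trades extra small packets for lower latency, so it's worth reserving for protocols that genuinely don't fit the request/response pattern.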

The CIFS protocol has a couple of protocol elements that use the "send multiple messages" pattern mentioned above.  When we were doing NT 3.1, it became clear that nagling/piggy-back acks were killing our performance; I wrote about that two years ago here (I thought I'd written this stuff out before).