Making stuff go fast on a network, part 1

Article
05/09/2006

So yesterday, I commented on how I was glossing over lots of details about how to make a client/server protocol operate efficiently on a network. After I wrote it, I realized that while it's out of scope for a single post, how to make a client/server protocol work efficiently can be part of a series.

So what does it mean to make a program "go fast" on a network? Well, as I intimated the last time, it's "complicated". To explain why, I've got to explain how traffic flows on a network. First off, a caveat: The last time I did any serious work in network performance analysis, it was on 10base-T networks, and the performance characteristics of gigabit networks are likely to be somewhat different. On the other hand, as far as I know, all the principles I'm thinking about still apply.

Let's start with some networking basics. For the purposes of this discussion, I'm going to limit my discussion to connection oriented protocols (like TCP, SPX or NetBEUI). I'm not going to discuss datagram based protocols (like IP, UDP and IPX), because (in general) datagram based protocols don't have any of the "interesting" semantics I'm describing.

First, some definitions:

When you're transferring data on a connection oriented protocol, there are two principals involved, the sender and receiver (I'm not going to use the words "client" and "server" because they imply a set of semantics associated with a higher level protocol.
A "Frame" is a unit of data sent on the wire.
A "Packet" is comprised of one or more frames of data depending on the size of the data being sent.
A "Message" is comprised of one or more packets of data and typically corresponds to a send() call on the sender.

And some axioms:

Networks are unreliable. The connection oriented protocols attempt to represent this unreliable network as a reliable communication channel, but there are some "interesting" semantics that arise from this.
LAN Networks (Token Ring, Ethernet, etc) all transmit messages in "frames", on Ethernet the maximum frame size (or MSS, Maximum Segment Size) is 1500 bytes. Some of that frame is used to hold protocol overhead (around 50 bytes or so), so in general, you've got about 1400 bytes of user payload for in each frame available (the numbers vary depending on the protocol used and the networking options used, but 1400 is a reasonable number).
A connection oriented protocol provides certain guarantees:
1. Reliable delivery - A sender can guarantee one of two things occurred when sending data - the receiver received the data being sent or an error has occurred.
2. Data ordering - If the sender sends three packets in the order A-B-C, the
  receiver needs to receive the packets in the order A-B-C.

Now then, given axiom #1 (Networks are unreliable), how do you ensure reliable communication?

Well, the answer to that question really depends on the answer to a different question: "How does the sender know that the receiver has received the data?" Remember that the underlying network is unreliable - there are no guarantees that the data is received (although the network promises to do its best to guarantee that the data's been received). To ensure that the receiver has received the data, the receiver tells the sender that it's received the data with a special packet called an ACK (for acknowledgment).

If a sender doesn't receive an ack within a timeout period, it will retransmit the data. It's ok to lose an ACK, the cost of losing an ACK is a retransmission of the data - which is exactly the same as if the receiver hadn't received the ack in the first place. The receiver knows that the sender has received the ACK because it stops retransmitting the packet (each packet sent by the client has a tag (a sequence number or index of some form) that allows the receiver to identify which packet is being sent).

So the data flow is (more-or-less):

Sender	Receiver
Send Packet A
	Acknowledge "A"
Send Packet B
	Acknowledge "B"
Send Packet C
	Acknowledge "C"

You'll notice that to send 3 packets of user data on the wire, you end up sending 6 packets! Remember from my comment yesterday: It takes effectively the exact same amount of time to send 1 byte of data as it does 1400 bytes. So even though the ACK packet is very small, it means that you spend twice the time sending a packet - the data and the ACK.

Now then, what happens when you want to send more data than fits in a single packet? 1400ish bytes isn't very much, protocols often send far more than that data in a single send() call, and the transport needs to deal with that. Well, the sender breaks the request into frames and sends them in turn:

Sender	Receiver
Send Packet A, Frame 1
	Acknowledge "A.1"
Send Packet A, Frame 2
	Acknowledge "A.2"
Send Packet A.Frame 3
	Acknowledge "A.3"

There is an interesting consequence to that last guarantee: It turns out that the sender can't start transmitting B until after it knows that the receiver has received A. Why is that?

Well, remember our third axiom? Part B says that a connection oriented transport guarantees data ordering. This directly means that the sender can't start transmitting B until it knows A has been received. If that wasn't the case (the sender started transmitting B before A was acknowledged), then the receiver might receive B before A, which would violate the 3rd axiom. Now it's possible that the receiver might hold onto B and not hand it to the application until it has received A. But that means that some amount of the data buffers on the receiver will be held by the data for B. If A never arrives, you've just wasted that data. It's easier to simply never transmit B before A's been acknowledged and that way you guarantee ordering.

Ok, so far so good, that's the basics of data transmission. But why do you have to acknowledge each packet within the larger transmission? Surely there's a better way.

And there is, it's called a sliding window algorithm. With a sliding window algorithm, the transmission looks like:

Sender	Receiver
Send Packet A. Frame 1
Send Packet A. Frame 2
Send Packet A. Frame 3
	Acknowledge "A.1, A.2, and A.3"

Now the actual algorithm for the sliding window is way beyond the scope of this already too long article, but you get the idea - the receiver waits for a couple of packets to be received and acknowledges all of them.

Again, this is remarkably simplified, if you care about the details, consult your nearest protocol manual.

Tomorrow: Ok, that's cool, now we know how to send data. But what good is that?

Edit: Rearranged some text because I accidentally left a really important thought hanging.

Making stuff go fast on a network, part 1

Additional resources