Making things go fast on a network, part 2

Sorry about the delay - I got nailed by a nasty sinus infection on Tuesday night last week that took me out until today.


In my last post I started discussing some of the aspects of networking that need to be understood before you can make things "go fast" on a network.

As a quick recap, here are the definitions and axioms from that post (note: I've changed the definitions of packet and message because they make the subsequent articles easier):

First, some definitions:

  1. When you're transferring data on a connection oriented protocol, there are two principals involved, the sender and the receiver (I'm not going to use the words "client" and "server" because they imply a set of semantics associated with a higher level protocol).
  2. A "Frame" is a unit of data sent on the wire. 
  3. A "Packet" comprises one or more frames of data, depending on the size of the data being sent; it typically corresponds to a send() call on the sender.
  4. A "Message" comprises one or more packets of data and typically corresponds to a higher level protocol verb or response.

And some axioms:

  1. Networks are unreliable.  The connection oriented protocols attempt to represent this unreliable network as a reliable communication channel, but there are some "interesting" semantics that arise from this.
  2. LAN networks (Token Ring, Ethernet, etc.) all transmit messages in "frames"; on Ethernet the maximum frame size (the MTU, or Maximum Transmission Unit) is 1500 bytes.  Some of that frame is used to hold protocol overhead (around 50 bytes or so), so in general you've got about 1400 bytes of user payload available in each frame (the numbers vary depending on the protocol and the networking options used, but 1400 is a reasonable number).
  3. A connection oriented protocol provides certain guarantees:
    1. Reliable delivery - A sender can guarantee one of two things occurred when sending data - the receiver received the data being sent or an error has occurred.
    2. Data ordering - If the sender sends three packets in the order A-B-C, the receiver needs to receive the packets in the order A-B-C.

At the end of the last post, I introduced one consequence of these axioms:  When sending packets A, B, and C, the sender can't transmit packet B until the receiver has acknowledged receipt of packet A.  This follows from axiom 3.2.
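That stop-and-wait behavior can be sketched as a toy simulation (this is purely illustrative — real transports do this in the kernel, and the loss model and retry count here are made up):

```python
# Illustrative stop-and-wait sender: packet N+1 is never sent until
# packet N has been acknowledged.  The "network" is a function that
# randomly drops packets, modeling axiom 1 (networks are unreliable).
import random

_rng = random.Random(42)  # seeded so the sketch is deterministic

def unreliable_deliver(packet, loss_rate=0.3):
    """Deliver a packet, or lose it (returns None on loss)."""
    return packet if _rng.random() > loss_rate else None

def stop_and_wait_send(packets, max_retries=10):
    """Send each packet in order, retransmitting until it is ACKed."""
    delivered = []
    for seq, payload in enumerate(packets):
        for _attempt in range(max_retries):
            if unreliable_deliver((seq, payload)) is not None:
                delivered.append(payload)  # receiver ACKs; move to the next packet
                break
        else:
            raise IOError(f"packet {seq} never acknowledged")
    return delivered

print(stop_and_wait_send(["A", "B", "C"]))  # ['A', 'B', 'C'] -- in order, despite losses
```

Even though individual transmissions are lost, the receiver always sees A, B, C in order, because B is held back until A is acknowledged.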

There's one thing I forgot in my last post:

What happens when the receiver isn't ready to receive data from the client?

Well, it's not very pretty, and the answer depends on the semantics of the protocol, but in general, if the receiver doesn't have room for the packet, it sends a "NACK" to the client (NACK stands for Negative ACKnowledgement).  A NACK tells the client that there's no storage for the request; the client now needs to decide what to do.  Sometimes the NACK contains a hint as to the reason for the failure; for instance, the NetBEUI protocol's NACK response includes reasons like "no remote resources, unexpected request, out of sequence".  The client can use this information to determine if it should hang up the connection or retry (for "no remote resources", for instance, it should retry).

Sender                 Receiver
Send Packet A.1
                       Send ACK A.1
Send Packet A.2
                       Send NACK A.2 (No Memory)
Send Packet A.2
                       Send NACK A.2 (No Memory)
Send Packet A.2
                       Send ACK A.2
Send Packet A.3
                       Send ACK A.3
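The exchange above can be modeled in a few lines (a sketch only — the reason string and the retry policy here are made up, not any real protocol's wire format):

```python
# A toy model of the trace above: the receiver NACKs ("no memory")
# until its application finally makes buffer space available, and the
# sender blindly retransmits until it sees an ACK.

class SlowReceiver:
    """ACKs only after `delay` failed attempts (no buffer available yet)."""
    def __init__(self, delay):
        self.delay = delay
        self.attempts = 0

    def on_packet(self, packet):
        self.attempts += 1
        if self.attempts <= self.delay:
            return ("NACK", "no memory")
        return ("ACK", None)

def send_with_retry(receiver, packet, max_retries=5):
    """Retransmit `packet` until it is ACKed; returns the attempts used."""
    reason = None
    for attempt in range(1, max_retries + 1):
        status, reason = receiver.on_packet(packet)
        if status == "ACK":
            return attempt
    raise IOError(f"{packet} kept getting NACKed ({reason})")

# Matches the trace: A.2 is NACKed twice, then accepted on the third try.
receiver = SlowReceiver(delay=2)
print(send_with_retry(receiver, "A.2"))  # 3
```

The key point is that the sender's loop, not the application, absorbs all of those extra round trips.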

All of this retransmission goes on under the covers; applications don't typically need to know about it.  But there's a potential perf pitfall here.


If you're analyzing network traces, you often see this pattern:

Sender                 Receiver
Send Packet A.1
                       Send NACK A.1 (No Memory)
Send Packet A.1
                       Send NACK A.1 (No Memory)
Send Packet A.1
                       Send NACK A.1 (No Memory)
Send Packet A.2
                       Send ACK A.2
Send Packet A.3
                       Send ACK A.3

What happened here?  Well, most likely the receiver didn't have a receive buffer posted ("down") waiting for the sender's data, so the sender had to keep retransmitting until the receiver got around to being able to receive it.

So here's "Making things go fast on a network" perf rule number 1:

        Always make sure that you have a receive request down BEFORE someone tries to send you data.

In traditional client/server networking, this rule applies to clients as well as servers - if a client doesn't have a receive outstanding when the server sends the response to a request, the exchange will stall in the same way, waiting on the client to get its receive down.
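Here's the rule in spirit, sketched with a local socket pair standing in for a real connection (the request/response bytes are hypothetical; note that recv_into still involves the kernel, this just shows the application-level pattern):

```python
# Rule 1 in miniature: the receiving side allocates its buffer and is
# ready to read BEFORE the peer sends, so the reply can land directly
# in application-owned storage (recv_into) instead of being materialized
# in a temporary object and copied again.
import socket

def run_exchange(request=b"GET /thing", response=b"200 OK"):
    client, server = socket.socketpair()
    try:
        # Client sets up its receive buffer up front, then sends the request.
        reply_buf = bytearray(4096)
        client.sendall(request)

        # Server handles the request and responds.
        assert server.recv(4096) == request
        server.sendall(response)

        # The pre-allocated buffer is filled in place.
        n = client.recv_into(reply_buf)
        return bytes(reply_buf[:n])
    finally:
        client.close()
        server.close()

print(run_exchange())  # b'200 OK'
```

In a real high-performance app the receive would be posted asynchronously (overlapped I/O on Windows) so it is outstanding before the server even starts sending.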


Btw, a piece of silly trivia.  J Allard, of XBox fame used to have two iguanas in his office named ACK and NACK back when he was the PM for NT networking.


Comments (12)

  1. Sriram says:

    I think you meant "What happens when the receiver isn’t ready to receive data from the *sender*?"

    You went and used the 'c' word 🙂

  2. Norman Diamond says:

    > If you’re analyzing network traces, you often see this pattern

    I’m confused.

    Due to windowing I wouldn’t be surprised to see sends of A.1, A.2, and A.3 before the first NACK for A.1.  Depending on the protocol and the recipient’s buffering policy I wouldn’t be too surprised to see ACKs for A.2 and A.3 while waiting (prepared) for A.1 to be retransmitted.

    But if the sender repeats A.1 due to NACKs, then why did the sender ever proceed to A.2 in the first place?  If the sender is retransmitting A.1 until it gets an ACK then how does your pattern ever get seen?

  3. nick says:

    "Always make sure that you have a receive request down BEFORE someone tries to send you data."

    That's why you have per-connection receive buffers in OSes, unless you turn them off (set the buffer size to 0 for a connection). For instance, on Windows, Winsock sets up an initial buffer for both receives and sends… I believe the default buffer sizes are ~32 KB, but I might be mistaken (easy to check with getsockopt).

    So there's no desperate need to race to post a recv in a TCP client application, for example. Just make sure you always have a recv posted *before* the recv buffer fills up.

    As you said, servers and clients are symmetrical in this respect (i.e. once the initial handshake is done) so the above observations apply equally to them.

    From an MSDN article on writing Scalable apps


    "As just mentioned, AFD.SYS handles buffer management for applications that use Winsock to talk to the transport protocol drivers. This means that when an application calls the send or WSASend function to send data, the data gets copied by AFD.SYS to its internal buffers (up to the SO_SNDBUF setting) and the send or WSASend function returns immediately. The data is then sent by AFD.SYS behind the application’s back, so to speak. Of course, if the application wants to issue a send for a buffer larger than the SO_SNDBUF setting, the WSASend call blocks until all the data is sent.

    Similarly, on receiving data from the remote client, AFD.SYS will copy the data to its own buffers as long as there is no outstanding data to receive from the application, and as long as the SO_RCVBUF setting is not exceeded. When the application calls recv or WSARecv, the data is copied from AFD.SYS’s buffers to the application-provided buffer."


    "Turning off receive buffering in AFD.SYS by setting SO_RCVBUF to 0 offers no real performance gains. Setting the receive buffer to 0 forces received data to be buffered at a lower layer than Winsock. Again, this leads to buffer copying when you actually post a receive, which defeats your purpose in turning off AFD’s buffering."

    NOTE: You also can get major perf loss if you do this and then fail to post recvs in a timely fashion.
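As nick says, the defaults are easy to check with getsockopt; for instance (a quick sketch in Python, which exposes the same socket options — actual values vary by OS and version):

```python
# Query the per-socket default send/receive buffer sizes via getsockopt.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
s.close()
print(rcvbuf, sndbuf)  # platform-dependent, typically tens of KB
```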

  4. mjb says:

    maybe I'm being obtuse, but shouldn't the last Send Request of A.1 (of 3) have gotten an ACK?

  5. Tom H says:

    Here again, Larry, I think your presentation may be NetBEUI-centric (or some other protocol that I’m not familiar with). TCP handshaking establishes a receive window size; the sender can’t send until it knows there’s a nonzero receive window. Stevens has a basic discussion of this in Chapter 20 of Volume 1 of TCP/IP Illustrated and spends some time talking about bandwidth-delay product and determining the optimum window size.
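Tom's bandwidth-delay point, in numbers: the receive window has to cover the data that can be in flight on the path. A quick sketch (the link figures below are hypothetical):

```python
# Bandwidth-delay product: the amount of data "in flight" on a path,
# and therefore the receive window needed to keep the pipe full.
def bdp_bytes(bandwidth_bits_per_sec, rtt_seconds):
    return bandwidth_bits_per_sec * rtt_seconds / 8

# e.g. a 100 Mbit/s link with a 50 ms round trip:
window = bdp_bytes(100_000_000, 0.050)
print(int(window))  # 625000 bytes -- nearly 10x TCP's classic 64 KB window
```

This is exactly why the 16-bit TCP window needed the window scale extension for fast, long-haul links.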

  6. I don’t think there are many developers today that actually get this. They are using abstractions that hide all this. Really nice to get it mentioned.

  7. Dave Beaver says:

    Reply to nick about recv:

    Yes, it's not necessary to get the receive down before the data arrives at the node. But the focus of these articles is speed, and if you're interested in speed, you want to have that receive in place. Why? Because the OS will go out of its way to land the data into the buffer you provide as it's received. It does this for the obvious efficiency reasons; if it doesn't, a data copy is required to get the data from the transport buffer to the application buffer. And that copy is relatively expensive.

    Note that this is apparently even more true in Vista/Longhorn, from my (admittedly brief) looks at it so far.

  8. Dave’s absolutely right – you don’t have to have the receive down, unless you care about perf.  If you care about perf, then the additional data copy can kill your perf.  After all, what would you rather your CPU be doing – copying data from an in-kernel buffer or doing work in your application?  If you want the CPU to be doing your work, get the receive down before the sender sends it.

    Tom, you’re right, my knowledge IS NetBEUI-centric, but I think you’ll discover that the principles involved apply on TCP.  The details are absolutely different (TCP’s sliding windows are based on offsets, not packet sequence numbers, etc) but at a high enough level it doesn’t matter.

    In particular, the axioms, corollaries and rules I've mentioned apply pretty much universally.

  9. Tom H says:

    Larry, the axioms you’ve stated are all true, but then I guess I didn’t understand the rule:

    "Always make sure that you have a receive request down BEFORE someone tries to send you data."

    From what Dave’s said above, and your agreement with him, I interpret this as being that you should have let the OS know what buffer to put the data in before the network delivers the data. That makes perfect sense in a high-performance application, but I didn’t have a clue what "hav[ing] a receive request down" meant, and even after your explanation I wonder if that is NetBEUI terminology?

    I remember lots of research at UCSD in the early 1990s about reducing buffer copies and boundary crossings when you receive network packets. Good to know that some of it seems to have seen light. This kind of optimization sounds much easier to do with transport protocols that expose the frame explicitly, like UDP and (according to Google?) NetBEUI, but I suppose you could do it in TCP too, even though the application can’t see frame boundaries.

    But in TCP, the implementation is really expected to have buffer space in the OS, and the flow control protocol lets the client know how much buffer space there is, so you should never see the "No Memory" kind of response. The drawback is that the protocol had some trouble with high bandwidth-delay connections (standard flow control only allows 64kB of buffer space), but I think they’ve extended around that by now.

    (You refer to corollaries and rules, but I see only the one rule bolded or otherwise picked out between this and the previous post – am I missing something?)

  10. Oh, Larry, Larry, Larry…

    Articles 1 and 2 were great – really necessary reading to a lot of would-be…
