A Bit Of Service Bus NetTcpRelayBinding Latency Math

From: John Doe 
Sent: Thursday, May 12, 2011 3:10 AM
To: Clemens Vasters
Subject: What is the average network latency for the AppFabric Service Bus scenario?
Importance: High

Hi Clemens,

A rough ballpark range in milliseconds per call will do. This is a very important metric for us to understand performance overhead.

Thanks,
John


From: Clemens Vasters
Sent: Thursday, May 12, 2011 7:47 AM
To: John Doe
Subject: RE: What is the average network latency for the AppFabric Service Bus scenario?

Hi John,

Service Bus latency depends mostly on network latency. The better you handle your connections, the lower the latency will be.

Let’s assume you have a client and a server, both on-premises somewhere. The server is 100ms avg roundtrip packet latency from the chosen Azure datacenter and the client is 70ms avg roundtrip packet latency from the chosen datacenter. Packet loss also matters because it gates your throughput, which further impacts payload latency. Since we’re sitting on a ton of dependencies, it’s also worth noting that a ‘cold start’ (with JIT impact) behaves differently from a ‘warm start’.

With that, I’ll discuss NetTcpRelayBinding:

  1. There’s an existing listener on the service. The service has a persistent connection (control channel) into the relay that’s being kept alive under the covers.
  2. The client connects to the relay to create a connection. The initial connection handshake (2 roundtrips) and TLS handshake (3 roundtrips) take about 5 roundtrips in total, or 5*70ms = 350ms. With that you have a client socket.
  3. Service Bus then relays the client’s desire to connect to the service down the control channel. That’s one roundtrip, or 100ms in our example; add 50ms for our internal lookups and routing.
  4. The service then sets up a rendezvous socket with Service Bus on the machine where the client socket awaits connection. That’s just like step 2 and thus 5*100ms = 500ms in our case. Now you have an end-to-end socket.
  5. Having done that, we start pumping the .NET Framing protocol between the two sites. The client thus theoretically gets its first answer after a further 135ms.

So the handshake takes a total of 1135ms in the example above. That’s excluding all client- and service-side processing and is obviously a theoretical number based on the latencies I picked here. Your mileage can and will vary, and the numbers I have here are the floor rather than the ceiling of relay handshake latency.
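To put the arithmetic in one place, here is the budget from the walkthrough above as a small C# sketch; the numbers are purely the illustrative ones from this example, not measurements:

  // Illustrative latency budget for the relayed connection handshake (ms).
  class LatencyBudget
  {
      static void Main()
      {
          int clientRtt  = 70;    // client <-> datacenter roundtrip
          int serviceRtt = 100;   // service <-> datacenter roundtrip

          int clientConnect  = 5 * clientRtt;     // TCP + TLS handshakes to the relay: 350
          int controlChannel = serviceRtt + 50;   // relayed connect request + internal routing: 150
          int rendezvous     = 5 * serviceRtt;    // service-side rendezvous socket setup: 500
          int firstFrames    = 135;               // initial .NET Framing exchange to the first answer

          // Prints 1135
          System.Console.WriteLine(clientConnect + controlChannel + rendezvous + firstFrames);
      }
  }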

Important: Once you have a connection set up and are holding on to a channel, all subsequent messages are impacted almost exclusively by the composite roundtrip network latency of 170ms, with very little latency added by our pumps. So you want to create a channel and keep it alive as long as you can.
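To make the “keep the channel alive” advice concrete, here is a minimal client-side sketch. The IEcho contract, the namespace name, and the endpoint path are made-up placeholders, and the credential setup on the endpoint behavior is omitted because it depends on your SDK version; treat it as an illustration of the pattern, not a drop-in client:

  using System;
  using System.ServiceModel;
  using Microsoft.ServiceBus;   // NetTcpRelayBinding, ServiceBusEnvironment

  [ServiceContract]
  interface IEcho
  {
      [OperationContract]
      string Echo(string text);
  }

  interface IEchoChannel : IEcho, IClientChannel { }

  class RelayClient
  {
      static void Main()
      {
          // "myNamespace" and "echo" are placeholders for your own values.
          var factory = new ChannelFactory<IEchoChannel>(
              new NetTcpRelayBinding(),
              new EndpointAddress(
                  ServiceBusEnvironment.CreateServiceUri("sb", "myNamespace", "echo")));

          // Add your TransportClientEndpointBehavior with issuer credentials to
          // factory.Endpoint.Behaviors here (details vary by SDK version).

          // Pay the relay handshake (roughly a second in the example above) once...
          IEchoChannel channel = factory.CreateChannel();
          channel.Open();

          // ...then reuse the open channel, so each call is only subject to the
          // composite network roundtrip (~170ms in the example) plus processing.
          for (int i = 0; i < 10; i++)
          {
              channel.Echo("hello " + i);
          }

          channel.Close();
          factory.Close();
      }
  }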

If you use the Hybrid mode for NetTcpRelayBinding and the algorithm succeeds in establishing the direct socket, the roundtrip time for further traffic drops to the plain roundtrip latency between the two sites, since the relay gets out of the way completely. However, the session setup time will always be there, and the Hybrid handshake (which follows session establishment and happens in parallel) may very well take up to 10 seconds before the direct socket is available.
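Opting into Hybrid mode is just a property on the binding; a minimal sketch (Relayed is the default), which can be dropped into the factory construction from the earlier sketch in place of the default NetTcpRelayBinding():

  // Ask the relay to attempt a direct socket after the relayed session is up.
  var binding = new NetTcpRelayBinding
  {
      ConnectionMode = TcpRelayConnectionMode.Hybrid
  };
  // The session starts out relayed either way; if and when the Hybrid handshake
  // succeeds (possibly only after several seconds), traffic switches to the
  // direct socket and roundtrips drop to the plain client-to-service latency.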

For HTTP the story is similar: the client-side socket (not the individual request; we’re routing the keep-alive socket), with SSL/TLS overlaid, triggers the rendezvous handshake.

I hope that helps,
Clemens