RTC's VOIP delay

This blog explains the different components that contribute to the audio delay in a VOIP call, and then discusses RTC's VOIP delay numbers.

When a VOIP call is made between any 2 VOIP entities, the end-to-end delay for the call can be broken down as follows:

a. VOIP sender side delay

b. Propagation delay on the wire between the 2 VOIP entities

c. VOIP receiver side delay

Since our focus is delay characteristics of a device running RTC, let's ignore the propagation delay for now.

On the VOIP sender side, the following components generally contribute to the delay:

1. Audio Driver capture delay.

2. VOIP stack send delay (RTC media stack send delay)

3. (OS network + network driver) send delay

On the VOIP receiver side, the following components contribute to the delay:

4. (OS network + network driver) receive delay

5. VOIP stack receive delay (RTC media stack receive delay)

6. Audio Driver playback delay

As you can see, besides VOIP (RTC) stack delay, the end-to-end delay is also dependent on the network driver as well as the audio driver of the devices, which can vary from device to device.

Let us now look at RTC numbers with respect to the above categories.

For RTC, a peer-to-peer VOIP call was made between 2 CEPCs (x86 machines running CE) sitting side by side in a lab, using the G.711 codec. The total end-to-end delay measured was around 70 msec.

RTC’s send delay (i.e. the time an audio packet spends within the RTC media stack only, from when RTC receives it from waveIn until just before RTC submits it to the socket layer to transmit over the wire) was measured at around 2 to 3 msec.

RTC’s receive delay (i.e. the time an audio packet spends within the RTC media stack only, from when RTC receives it from the socket layer until just before RTC submits it to the wave API layer to play) is around 35 to 38 msec. The receive delay is higher because RTC has an audio healer algorithm that dejitters and heals the received audio data, and this requires holding packets for a short time before they are played.
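To illustrate why dejittering adds delay on the receive side, here is a minimal sketch of a fixed-delay playout buffer. This is not RTC's audio healer algorithm, whose internals are not described here; it is a generic textbook technique swapped in for illustration. The class name `FixedDelayJitterBuffer`, the 40 msec playout delay, and the 20 msec frame size are all assumptions made for the example.

```python
import heapq


class FixedDelayJitterBuffer:
    """Hypothetical fixed-delay playout buffer (not RTC's actual healer).

    Each packet is held until its scheduled playout time, which is the
    arrival time of the first packet plus a fixed playout delay plus one
    frame duration per sequence number. Packets that arrive late or out
    of order are released in sequence order as long as they arrive
    before their scheduled playout time.
    """

    def __init__(self, playout_delay_ms, frame_ms):
        self.playout_delay_ms = playout_delay_ms
        self.frame_ms = frame_ms
        self._heap = []          # min-heap of (seq, payload)
        self._base_time = None   # arrival time of the first packet
        self._base_seq = None    # sequence number of the first packet

    def push(self, seq, arrival_ms, payload):
        """Store a packet that arrived at arrival_ms."""
        if self._base_time is None:
            self._base_time = arrival_ms
            self._base_seq = seq
        heapq.heappush(self._heap, (seq, payload))

    def playout_time(self, seq):
        """Scheduled playout time (msec) for a given sequence number."""
        offset = (seq - self._base_seq) * self.frame_ms
        return self._base_time + self.playout_delay_ms + offset

    def pop_ready(self, now_ms):
        """Return, in sequence order, all packets due at or before now_ms."""
        ready = []
        while self._heap and self.playout_time(self._heap[0][0]) <= now_ms:
            ready.append(heapq.heappop(self._heap))
        return ready


if __name__ == "__main__":
    buf = FixedDelayJitterBuffer(playout_delay_ms=40, frame_ms=20)
    buf.push(0, 0, "frame-0")
    buf.push(2, 45, "frame-2")   # arrives before frame-1 (reordered)
    buf.push(1, 50, "frame-1")   # arrives 30 msec later than ideal
    for now in (40, 60, 80):
        print(now, buf.pop_ready(now))
```

In this sketch the reordered and late packets are still played in order, but only because every packet waits at least the playout delay. That waiting time is the kind of cost that shows up as a higher receive delay.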

So the total RTC delay (send + receive) is around 38 to 40 msec.

The rest of the time (70 - 40 = 30 msec) is spent in the OS network layer, the network driver, and the audio driver on the two devices.

Since the human ear can tolerate a delay of around 200 msec, there is plenty of room left for propagation delay.
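The arithmetic behind these numbers can be written out as a small delay budget. The sketch below uses only the figures quoted in this post. The variable names are made up for the example, and summing the low and high ends of the measured ranges is a simplification, so the result (37 to 41 msec) brackets the rounded 38 to 40 msec quoted above rather than matching it exactly.

```python
# Figures quoted in this post, in msec. Ranges are (low, high).
END_TO_END_MS = 70          # measured end-to-end delay in the lab
RTC_SEND_MS = (2, 3)        # RTC media stack send delay
RTC_RECEIVE_MS = (35, 38)   # RTC media stack receive delay
TOLERABLE_MS = 200          # delay the ear can tolerate, per this post


def add_ranges(a, b):
    """Add two (low, high) ranges end to end."""
    return (a[0] + b[0], a[1] + b[1])


# Total time spent inside the RTC media stack (send + receive).
rtc_total = add_ranges(RTC_SEND_MS, RTC_RECEIVE_MS)

# What is left of the measured end-to-end delay goes to the
# OS network layer, the network driver, and the audio driver.
driver_share = (END_TO_END_MS - rtc_total[1], END_TO_END_MS - rtc_total[0])

# Room left for propagation delay before the tolerance is exceeded.
propagation_headroom = TOLERABLE_MS - END_TO_END_MS

print("RTC stack total (msec):", rtc_total)
print("Network + audio driver share (msec):", driver_share)
print("Headroom for propagation delay (msec):", propagation_headroom)
```

With the lab figures, about 130 msec remains for propagation delay on a real network before the 200 msec tolerance is reached.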