First, a bit of terminology that will help keep things clear. For the purposes of this discussion I will use the term packet to mean an audio/video unit of data. Specifically, this is an ASF packet as defined by the ASF file specification, which covers WMA and WMV. When I refer to a frame, I’m talking about the individual network unit that we see in a network capture utility. A network frame is typically going to be a maximum of 1500 bytes, whereas the audio/video packet can be much larger.
Next, a bit of history about the fragmentation and bursting that you’re seeing. Over 10 years ago, when we created Windows Media Services and Windows Media Encoder, we created content in the way that was most efficient from a CPU standpoint. The encoder would create packets that were most efficient for the amount of data being encoded. This packet size is based on the audio bitrate and sampling rate, because we use audio as our timing track and audio has significantly more samples per second. The packet sizes for lower bitrates (back in that day that could have been 10 kbps of audio) would easily fit inside the Ethernet frame size of 1500 bytes. At larger bitrates we would simply allow the network stack to break up the packet to fit within the 1500-byte frame size. The network stack did this in kernel mode, which was more efficient anyway. So you would end up with one packet spread over multiple network frames.
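To make the fragmentation concrete, here is a quick back-of-the-envelope calculation (my own illustration: the MTU and header sizes are typical values, and the packet sizes are made up). It counts how many Ethernet frames a single ASF packet occupies, treating every frame as carrying full IP and UDP headers, which slightly overstates the count since real IP fragmentation carries the UDP header only in the first fragment:

```python
import math

# Typical Ethernet MTU of 1500 bytes, minus 20 bytes of IP header and
# 8 bytes of UDP header, leaves 1472 bytes of payload per frame.
MTU = 1500
IP_HEADER = 20
UDP_HEADER = 8
PAYLOAD_PER_FRAME = MTU - IP_HEADER - UDP_HEADER  # 1472 bytes

def frames_for_packet(packet_bytes: int) -> int:
    """How many network frames a single ASF packet is spread over."""
    return math.ceil(packet_bytes / PAYLOAD_PER_FRAME)

# A small low-bitrate audio packet easily fits in one frame...
print(frames_for_packet(1200))   # -> 1
# ...while a large unrestricted video packet spans several.
print(frames_for_packet(8000))   # -> 6
```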
Because the encoder-server connection is over HTTP, and HTTP is a TCP-based protocol, you get TCP’s built-in flow control. The encoder can send two frames of data to the server and then must wait for an acknowledgment from the server before sending more data. Once the server receives the TCP frames, it reassembles the original packet that was spread over multiple frames. Next, the server buffers the data in memory and sends it out based on the presentation time of the packet.
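The buffer-then-send-by-presentation-time behavior can be sketched as follows (a minimal illustration of the idea, not WMS internals; the function name and the injectable clock/sleep parameters are my own):

```python
import time

def send_by_presentation_time(packets, send,
                              clock=time.monotonic, sleep=time.sleep):
    """packets: (presentation_time_secs, payload) pairs, sorted by time.

    Waits until each packet's presentation time is due, then hands it
    to the send callback, smoothing delivery out along the timeline.
    """
    start = clock()
    for pts, payload in packets:
        delay = pts - (clock() - start)
        if delay > 0:
            sleep(delay)  # wait until this packet is due
        send(payload)
```

Injecting clock and sleep keeps the pacing logic testable without real waiting.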
For unicast UDP or multicast (UDP) streams you will typically see two problems. First, when WMS sends a large packet out through UDP, the packet is fragmented into multiple network frames. However, since there is no flow control with UDP, the entire packet and the resulting fragmented frames are sent out on the network at the same time. Remember, with TCP essentially the same thing happens, but the encoder can only send two frames before having to wait for an acknowledgment. The second issue is that WMS will buffer approximately 500 milliseconds (half a second) worth of data. It does this for efficiency, since it’s more efficient to do one buffered write than lots of small writes. This can actually be a positive if you have hundreds or thousands of unicast client connections. With a multicast, this means that not only are you sending out multiple fragments of a packet at the same time, but you also see multiple packets (usually only 2 or 3) being sent out as a burst.
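To see how big such a burst can get, here is a rough calculation (my own illustration: the 1.5 Mbps bitrate is a made-up example, and 1472 bytes is a typical UDP payload for a 1500-byte frame):

```python
import math

def burst_frames(bitrate_kbps: float, buffer_ms: int = 500,
                 frame_payload: int = 1472) -> int:
    """Roughly how many frames go out back-to-back when buffer_ms
    worth of stream data is flushed in a single write."""
    bytes_buffered = bitrate_kbps * 1000 / 8 * buffer_ms / 1000
    return math.ceil(bytes_buffered / frame_payload)

# A hypothetical 1.5 Mbps stream with the ~500 ms WMS buffer:
print(burst_frames(1500))  # -> 64 frames in one burst
```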
For small networks this usually isn’t a problem. I say ‘usually’ because some routers just really don’t like fragmented UDP frames, despite fragmentation being allowed by the RFC. But I’ve found that in large enterprise environments, particularly ones with lots of VLANs, this becomes a problem. If you look at the bitrate coming out of WMS and average it over a full second, you will see the bitrate at which you encoded the video. However, if you look at the millisecond level, you will see that there may be as many as 50 frames of video with timestamps only a few milliseconds apart. This instantaneous measure of bitrate can easily be 20 Mbps. Of course this can vary a good bit up or down depending on factors like the original encoded bitrate. By itself, 20 Mbps really shouldn’t be too bad, but when that traffic must be copied to multiple VLANs on a switch in a very short amount of time, the switch may drop some of the frames. You get an amplification effect at the switch: for each VLAN that the switch must copy the multicast to, that much more bandwidth is consumed. So with our 20 Mbps instantaneous spike copied to 8 VLANs, you get a spike of 160 Mbps. Many switches won’t be able to handle this.
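The arithmetic behind that spike looks like this (the 50-frame burst over roughly 30 ms is the ballpark from the paragraph above; treat the numbers as an illustration rather than a measurement):

```python
def instantaneous_mbps(frames: int, frame_bytes: int, span_ms: float) -> float:
    """Bitrate measured over just the burst window, not a full second."""
    return frames * frame_bytes * 8 / (span_ms / 1000) / 1e6

def switch_load_mbps(burst_mbps: float, vlan_count: int) -> float:
    """The switch replicates the burst onto every VLAN the multicast reaches."""
    return burst_mbps * vlan_count

burst = instantaneous_mbps(50, 1500, 30)    # 50 full-size frames over ~30 ms
print(round(burst))                          # -> 20 Mbps
print(round(switch_load_mbps(burst, 8)))     # -> 160 Mbps across 8 VLANs
```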
There are actually two KB articles that discuss what to do. First, you should reduce the packet size at the encoder. If you have prerecorded content that was encoded with an unrestricted packet size, then you may need to reencode. The article includes a table that describes the optimum maximum packet sizes:
The video occasionally is rebuffered, or frames are dropped when you try to stream a video through a multicast from Windows Media Services to Windows Media Player clients
Note that for Expression Encoder 4 the maximum packet size for a live stream is set on the Encode tab. The Encode tab has a triangle pointing down just under the audio output format.
To change the bursty behavior with WMS, you need to adjust the TCPBurstLimitMS and UDPBurstLimitMS settings as described in the following article.
Windows Media Services 9 Series network send behavior may lead to an unwanted client experience
To have the best experience you need to change both settings: the maximum packet size to prevent the packets from fragmenting into multiple frames, and the burst limits to prevent multiple packets from being buffered and written together. If you are unable to change the maximum packet size, you *may* still realize some benefit by changing the burst limits.
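To get a feel for the combined effect, here is a rough model (entirely my own sketch: the bitrate, burst-limit, and packet-size values are hypothetical examples, not the values from the KB articles):

```python
import math

FRAME_PAYLOAD = 1472  # typical UDP payload of a 1500-byte Ethernet frame

def frames_per_burst(bitrate_kbps: float, burst_limit_ms: int,
                     max_packet_bytes: int) -> int:
    """Approximate frames emitted per write burst for a given tuning."""
    bytes_per_burst = bitrate_kbps * 1000 / 8 * burst_limit_ms / 1000
    packets = math.ceil(bytes_per_burst / max_packet_bytes)
    frames_per_packet = math.ceil(max_packet_bytes / FRAME_PAYLOAD)
    return packets * frames_per_packet

# Untuned: ~500 ms of buffering and large unrestricted packets.
print(frames_per_burst(1500, 500, 8000))   # -> 72 frames per burst
# Tuned: packets that fit in one frame, plus a shorter burst window.
print(frames_per_burst(1500, 100, 1400))   # -> 14 frames per burst
```

Capping the packet size removes the fragmentation, and shrinking the burst limit spreads the remaining writes out over time, so each spike the switch must replicate is far smaller.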