Buffering in HTTP.SYS

My name is Chun Ye. I am a Software Design Engineer in the Microsoft Windows Networking Transports & Connectivity group. I'm here to describe the scenarios under which an application using HTTPAPI.DLL should set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag. This flag applies to both HttpSendHttpResponse and HttpSendResponseEntityBody APIs.

This is the section in http.h that describes this flag:

//
// HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA - Specifies that a caller wants the
// response to complete as soon as possible at the cost of buffering partial
// or the entire response.
//

#define HTTP_SEND_RESPONSE_FLAG_DISCONNECT 0x00000001
#define HTTP_SEND_RESPONSE_FLAG_MORE_DATA 0x00000002
#define HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA 0x00000004

First, some background on why the buffer-data flag was introduced. Before Windows Server 2003, IIS 5.0 was implemented on top of Winsock, which enables buffering by default. When IIS 6.0 on Windows Server 2003 moved to HTTP.SYS, users encountered performance issues when sending responses or entity bodies. There are two underlying causes for this performance issue.

The first cause is related to TCP delayed ACK. The Microsoft Windows TCP stack turns on the delayed ACK algorithm by default: the TCP receiver sends an ACK after every second segment, or when a delayed ACK timer expires (the Windows TCP stack sets the timer to 200 milliseconds). When sending a response or entity bodies results in an odd number of TCP segments, a delayed ACK is introduced, and the application's IO is blocked until the ACK is received.

An application can also encounter send performance issues when the underlying network has long latencies (round trip times). In this case, the ACKs take a long time to come back, so the IO is blocked for a long time until all the ACKs are received.

An application that issues only a single send at a time can be hit by both issues mentioned above. The result is network under-utilization, because the application is not keeping the network pipe full and is serializing every send behind an RTT.

To solve these problems, starting with Windows Server 2003 SP1, HTTP.SYS introduced the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag for the send APIs. When the flag is set, HTTP.SYS applies Winsock-like buffering: it copies the user data into its own internal buffer and sends it to the transport layer. The original user-mode IO is completed as soon as the data reaches the transport layer, without waiting for all the ACKs. By buffering the data in the kernel, we are trading CPU and memory for network utilization. This allows a "one send at a time" application to queue sends in parallel.

Here are the recommendations for three types of applications, in terms of their IO models:

  1. An application that does synchronous IO. It is recommended that such applications always set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag for their send IOs.
  2. An application that does asynchronous IO but keeps at most one outstanding send IO on each connection. Here it is recommended that the application set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag for all the intermediate send IOs (those that also set the HTTP_SEND_RESPONSE_FLAG_MORE_DATA flag).
  3. An application that keeps multiple outstanding send IOs on each connection. For instance, such an application sends out the next entity body without waiting for the previous send to complete. In this case, there is no need to set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag, since such an application is not blocked in any way. New, truly asynchronous applications should use this IO model.

While setting the buffering flag solves the performance issue for most applications that fall under categories 1 and 2 above, there are downsides the user should be aware of:

  1. Setting the flag increases CPU and memory usage. Buffering requires extra memory to store the user's response or entity bodies, which means extra CPU cycles and memory usage. Compared to a truly asynchronous application (category 3 above), an application that sets the buffering flag can expect to service somewhat fewer clients. A trade-off has to be made between application simplicity and system scalability.
  2. When the buffering flag is passed, the send API completes as soon as the data has been copied and passed to the transport layer. Thus, the completion status of the send API is not indicative of the completion status of the actual send.

- Chun Ye