HTTP.SYS, IIS, and the 100 continue


Question:


Hi David,


“My Company” is a leading middleware provider for mobile multiplayer games. Customers include Disney, THQ, etc.


The backend is built on .NET. We went live in the US with W2K3/IIS6 which is great.


But we have a major IIS6 issue. The handsets connect through HTTP/POST and sometimes the Server answers with “100 continue” and this leads to crashes on certain phones.


In IIS5 we wrote an ISAPI filter – which works well.


Question:


1) Is IIS6 sending the “100 continue”? We assume yes … (due to file must exist)


2) Will an ISAPI Extension be able to fulfill this?


I would really appreciate your help.


thanks,


Answer:


<soapbox>


Actually, I would frame it differently – this is probably NOT a “major IIS6 issue”.


Clients which advertise to be HTTP/1.1 compliant and then crash on “100 continue” are the real problem (they are not following public specifications), and servers that allow such broken clients are also a part of the problem.


Technically, these phones should just keep crashing until the consumer gets sick of it and switches to another phone that works correctly. This is the way to get the phone manufacturers to write/use properly implemented networking protocol stacks – when their customers hit them in the pocketbook. As a middleware provider, it should not matter to you whether the consumer uses one phone or another to run your games… as long as the consumer has *A* phone that runs your games.


On the other hand, if the server keeps hacking to work around client-side bugs, the phone manufacturers never get wind of their problems and have ZERO incentive to fix their phones, leading to accumulation of server-side hacks over time that increase server maintenance costs for you.


Thus, it is in your best interest to notify and get phone manufacturers to fix their buggy software.


I understand that business pressures can force you to compromise and otherwise work-around such issues in the spirit of “making things work”, but I just want to remind you of the implications of your actions on your long-term interests.


</soapbox>


On Windows Server 2003, it is a cooperation between IIS6 in user mode and HTTP.SYS in kernel mode that drives HTTP request/response serving. Logically speaking:



  • HTTP.SYS picks up data off the network, parses it into HTTP requests, identifies which Application Pool each request belongs to, and places it into that pool's queue.
  • The IIS6 user mode worker process picks up requests from the queue of its Application Pool, processes each by running either a user-supplied ISAPI/CGI or the built-in IIS Static File Handler, and hands the response back to HTTP.SYS to send over the wire.

A “100 continue”, like a “400 Bad Request” or a Kernel Response Cache Hit, is special in that HTTP.SYS transparently handles it in kernel mode without notifying user mode of anything. In addition, ISAPI Extensions cannot interact with response output once it is generated – they can only generate response output, not observe what is actually sent. Thus, an ISAPI Extension can never see, and therefore never suppress, a “100 continue” response, nor can it influence the request handling that triggers one.


On IIS6, the only way to inject user mode processing into this transparent request handling by HTTP.SYS is to run in IIS5 Compatibility Mode and use a ReadRawData/SendRawData ISAPI Filter. ReadRawData forces HTTP.SYS to hand the raw data off the network into user mode for filtering PRIOR to parsing the filtered user mode output into HTTP requests to place into queues.
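
Conceptually, the work such a raw-data filter does on the outgoing byte stream is small. The Python sketch below only illustrates that logic and is not ISAPI code: the function name is hypothetical, it assumes a complete interim response is present in one buffer, and a real filter would be written in C/C++ against the SF_NOTIFY_SEND_RAW_DATA notification.

```python
import re

# A "100" status line, any header lines, then the blank line that ends the
# interim response. Anchored at the start of the buffer.
_INTERIM_100 = re.compile(
    rb"\AHTTP/1\.[01] 100 [^\r\n]*\r\n(?:[^\r\n]+\r\n)*\r\n"
)


def strip_100_continue(raw: bytes) -> bytes:
    """Remove leading '100 Continue' interim responses from a raw buffer.

    Hypothetical helper for illustration; anything that is not a complete
    interim 100 response at the start of the buffer is passed through as-is.
    """
    while True:
        match = _INTERIM_100.match(raw)
        if match is None:
            return raw
        raw = raw[match.end():]


final = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
print(strip_100_continue(b"HTTP/1.1 100 Continue\r\n\r\n" + final) == final)
```

A production filter would also have to cope with an interim response split across several send notifications, which this sketch deliberately ignores.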


Of course, this method completely defeats the purpose of running IIS6 with Application Pools and process isolation (a single failure in this filtering user mode process halts the entire server)… but such is the server-side compromise when the client is buggy…


FYI: This approach will not work on Vista Server/IIS7. HTTP.SYS will no longer hand raw data off the network into user mode for filtering prior to parsing, so it will be impossible for user mode code to know that a request which triggers the automatic “100 continue” happened.


//David

Comments (23)

  1. David Barlow says:

    Doesn’t your explanation above conflict with this KB article http://support.microsoft.com/default.aspx?scid=kb;en-us;898708&sd=rss&spid=2097 , where Microsoft acknowledges that this is a bug in IIS6 on Windows 2003 SP1? My company has received the hotfix for this bug, and has not seen any "100-continues" since. We have not yet confirmed whether this has been fixed in Windows 2003 R2.

  2. Phylyp says:

    > Technically, these phones should just keep crashing until the consumer gets sick of it and switches to another phone that works correctly.

    You’ll drive Raymond Chen out of a job if you do this!! 🙂

  3. David.Wang says:

    David – They are actually not in conflict.

    The KB references a real bug where from the client perspective, the 100 continue incorrectly came *after* a 200 OK.

    On the other hand, this question is more about whether a 100 continue is acceptable to the client at all (i.e. the reference to an ISAPI Filter on IIS5 to get rid of the "100 continue" since phones crashed on it).

    Meanwhile, all of the technical information is still correct:

    – clients that advertise HTTP/1.1 must handle a "100 continue" response

    – "100 continue" is transparently sent by HTTP.SYS (whether it is correctly sent or not is up for debate with the bug[s])

    – ISAPI Extension cannot be involved with anything related to "100 continue"

    – ISAPI Filter in IIS5 Compatibility Mode with ReadRawData+SendRawData can be involved with "100 continue"

    – This will not work on Vista Server/IIS7

    //David
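
To make the first point in the list above concrete, here is a minimal Python sketch of what tolerating an interim response means for a client. It is an illustration under assumptions, not any handset's actual stack: the function name is made up, the stream is any file-like object opened in binary mode, and only status 100 is treated as interim.

```python
import io


def read_final_status(stream) -> int:
    """Return the first non-100 status code, skipping '100 Continue' responses.

    Hypothetical helper: reads a status line, consumes the header block that
    follows it, and repeats while the status code is 100.
    """
    while True:
        status_line = stream.readline()
        if not status_line:
            raise ValueError("connection closed before a final response")
        code = int(status_line.split(None, 2)[1])
        # Consume header lines up to and including the terminating blank line.
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
        if code != 100:
            return code


wire = io.BytesIO(
    b"HTTP/1.1 100 Continue\r\n\r\n"
    b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
)
print(read_final_status(wire))
```

A client that instead treats the first status line it sees as the final answer is the kind of client that breaks when the server sends "100 continue".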

  4. David.Wang says:

    Phylyp – hehe. This is the reverse of Raymond’s recent blog entry on the SAMBA bug and the Vista resolution… except for the middleware vendor, it really doesn’t matter which way to go. 😉 It so happens that the easiest way to go also cooperates with the spec.

    //David

  5. Christoph says:

    I really think you have to push the phone vendors to fix that bug. Another problem is that too few people report such bugs to the vendor; they just look for a workaround on their own side (e.g. the server side) so that things work, instead of writing an email or calling them. If you are a software vendor that works for such big media companies, then you can bring that into the discussion: tell them they got it wrong, and that you will identify such handsets and tell the customer that their phone is buggy… and then see how fast they change that.

    Yes, in most cases we on the server side need a workaround. But if we do that silently, nothing will change.

  6. David.Wang says:

    Christoph – Exactly!

    Server-side "compatibility hacks" cannot be made silently or else the clients will keep abusing it.

    //David

  7. PlatformAgnostic says:

    For extremely specialized cases like this (non-standard webservers), is there a kernel-mode interface to filter http.sys incoming and outbound requests?  I can’t imagine it would be fun to support, but the barrier to entry into kernel land is so high that only the few who really need it will go to those lengths.

  8. Harri J says:

    New aspect on HTTP/1.1 100 Continue and IIS6.

    This question is not about HTTP 100 continue handling in mobile devices, nor about raw interface inside IIS to remove HTTP 100 Continue at low level.

    Instead, would IIS/HTTP.sys respond to the TCP-level packets below (which have unusually many PSH flags set) with an HTTP 100 Continue response? Based on your explanation of IIS6 and HTTP.sys versus ISAPI filters, that is my suspicion.

    Landscape: IE clients -> ssl accelerator/reverse proxy -> IIS6+custom ISAPI filter

    Connection: Keep-Alive is always used and ethereal trace confirms it.

    Scenario at TCP packet level between ssl/revProxy <-> IIS6:

    1. -> PSH,ACK, payload: POST header part 1

    2. -> PSH,ACK, payload: POST header part 2

    3. <- ACK, no payload

    4. -> PSH,ACK, payload: POST header part 1

    5. -> PSH,ACK, payload: POST header part 2

    6. <- ACK, no payload

    7. -> PSH,ACK, payload: POST end of header+0d0a0d0a

    8. <- PSH, ACK, payload: HTTP/1.1 100 Continue

    9. -> RST, no payload

    10. -> PSH, ACK, payload POST partially its body

    11. -> RST, no payload

    Does IIS6/HTTP.sys dislike the PSH flags in all the packets of the POST request and respond with HTTP/1.1 100 Continue in packet 8?

    We traced the same IE against the same IIS6 while bypassing the ssl/reverseProxy, and there the PSH flags are set only in the packets shown below. In this setup IIS6 does not respond with HTTP/1.1 100 Continue. Thus we suspect IIS6/HTTP.sys does not like the PSH flag being repeated. Below is the TCP traffic when bypassing the proxy. Here the PSH flag is set only to mark the end of the HTTP request header or HTTP response header, and IIS6 never responds with HTTP/1.1 100 Continue.

    1. -> ACK, payload: POST header part 1

    2. -> ACK, payload: POST header part 2

    3. <- ACK, no payload

    4. -> ACK, payload: POST header part 1

    5. -> ACK, payload: POST header part 2

    6. <- ACK, no payload

    7. -> PSH,ACK, payload: POST end of header+0d0a0d0a

    8. -> PSH, ACK, payload POST whole body

    9. <- ACK, no payload

    10. <- PSH, ACK, payload: HTTP/1.1 200 OK and whole response header

    11. <- ACK, payload response body

    Would IIS6/HTTP.sys really be bothered by the PSH-flagged TCP packets? Or do you see any other reason for the HTTP/1.1 100 Continue response?

    br

    Harri J

  9. David.Wang says:

    Harri – What is the time elapsed between the client sending:

    1. the POST end of header + 0d0a0d0a packet

    2. the entity body packet

    In both cases, they are separate packets, so I want to know how much time elapsed between the two packets in both the situation with and without 100-continue response.

    //David

  10. Harri J says:

    David: The time differences are as follows:

    scenario1 with http 100:

    time in s

    0.000000 1. -> PSH,ACK, payload: POST header part 1

    0.000115 2. -> PSH,ACK, payload: POST header part 2

    0.000317 3. <- ACK, no payload

    0.002403 4. -> PSH,ACK, payload: POST header part 1

    0.002533 5. -> PSH,ACK, payload: POST header part 2

    0.002553 6. <- ACK, no payload

    0.002589 7. -> PSH,ACK, payload: POST end of header+0d0a0d0a

    0.008605 8. <- PSH, ACK, payload: HTTP/1.1 100 Continue

    3.015474 9. -> RST, no payload

    3.559465 10. -> PSH, ACK, payload POST partially its body

    3.559496 11. -> RST, no payload

    scenario 2 without http100:

    0.000000 1. -> ACK, payload: POST header part 1

    0.000117 2. -> ACK, payload: POST header part 2

    0.000130 3. <- ACK, no payload

    0.000245 4. -> ACK, payload: POST header part 1

    0.000366 5. -> ACK, payload: POST header part 2

    0.000394 6. <- ACK, no payload

    0.000434 7. -> PSH,ACK, payload: POST end of header+0d0a0d0a

    0.000452 8. -> PSH, ACK, payload POST whole body

    0.000462 9. <- ACK, no payload

    0.253168 10. <- PSH, ACK, payload: HTTP/1.1 200 OK and whole response header

    0.261283 11. <- ACK, payload response body

  11. David.Wang says:

    Harri J – I believe it is the longer timing between sending POST header and POST entity that triggers the 100-continue.

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3

    Handling of 100-continue is pretty twisted (look at the RFC’s complexity). Read the section on "Requirements for HTTP/1.1 origin servers".

    What it boils down to, from a practical perspective, is that if the client waits too long between sending the completed request headers and the request entity body, the server will send 100-continue after waiting a "short" amount of time.

    //David
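
The timestamps Harri posted make the difference easy to quantify. A minimal Python sketch follows; the dictionary keys and the function name are made up for illustration, and the numbers are copied from the two traces above.

```python
# Timestamps in seconds, taken from the two traces in comment 10.
scenario_with_100 = {
    "end_of_headers": 0.002589,   # packet 7: end of header + 0d0a0d0a
    "interim_100": 0.008605,      # packet 8: HTTP/1.1 100 Continue
    "first_body": 3.559465,       # packet 10: part of the POST body
}
scenario_without_100 = {
    "end_of_headers": 0.000434,   # packet 7: end of header + 0d0a0d0a
    "first_body": 0.000452,       # packet 8: whole POST body
}


def header_to_body_gap(trace: dict) -> float:
    """Seconds between the end of the request headers and the first body bytes."""
    return trace["first_body"] - trace["end_of_headers"]


for name, trace in (("with 100", scenario_with_100),
                    ("without 100", scenario_without_100)):
    print(f"{name}: body follows headers after {header_to_body_gap(trace):.6f} s")

wait_before_100 = scenario_with_100["interim_100"] - scenario_with_100["end_of_headers"]
print(f"server waited {wait_before_100:.6f} s before sending 100 Continue")
```

Through the proxy, the 100 Continue goes out about 6 ms after the headers end and the body follows only about 3.56 s later; without the proxy the body follows within roughly 18 µs. That is consistent with the timing explanation, though it does not reveal the exact threshold HTTP.SYS uses.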

  12. Nick Sivo says:

    I work at a company where we’ve run into this issue.  Complaining to Motorola will do us no good because consumers we need to support are using phones that won’t be affected if new firmware is released.  It’s not as simple as saying "the user should buy a new phone".

  13. David.Wang says:

    Nick – I understand your plight and how you are really stuck in the middle, but banking on the user to switch to a better phone is probably the best solution that utilizes market forces.

    Silently absorbing the problem at server-side simply ignores the root of the problem of buggy clients while increasing your server-side costs. If you want to support the costs, feel free to do so.

    Relative to computers and browsers, people break/switch cell phones far more frequently. I think it is viable to solve the actual problem by hampering the bad phone; phone manufacturers see this hit their bottom line and should adjust.

    //David

  14. Nick says:

    So, we actually solved the bug on the client side. If instead of requesting "https://contoso.com/page.ashx" we request "https://contoso.com/page.ashx HTTP/1.0\nJunk:", the client sticks its own HTTP/1.1 declaration into the Junk header. Yay!
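
To see why Nick's trick works, it helps to look at what ends up on the wire. The Python sketch below assumes a client framework that builds the request head by plain string concatenation and that the injected separator is a line break; the function name is hypothetical and the handset's real framework is not known from this thread.

```python
def naive_request_head(method: str, target: str, host: str) -> str:
    """Build a request head the way a non-validating client might.

    Hypothetical stand-in for the handset framework: the target is pasted
    into the request line unchecked and " HTTP/1.1" is appended after it.
    """
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


# The request target carries its own version token plus the start of a
# throwaway header, so the framework's " HTTP/1.1" lands in that header.
head = naive_request_head("POST", "/page.ashx HTTP/1.0\nJunk:", "contoso.com")
lines = head.splitlines()
print(lines[0])  # request line as the server parses it
print(lines[1])  # where the framework's version token ended up
```

The server then sees an HTTP/1.0 request line followed by a harmless `Junk` header. Whether a given server accepts the bare line feed after the injected request line is an assumption here; Nick reports that it worked against IIS.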

  15. David.Wang says:

    Nick – Awesome. Thanks for sharing the tip. It is a pretty clever work-around that should keep working until the client-side framework validates – and since the client is not updating, as you’ve said, this should be a good work-around that "deals" with the issue at the right place.

    //David

  16. Jimmy says:

    Is there a way for IIS to treat everything as HTTP/1.0?

    Apache has an option called "downgrade-1.0" that forces the request to be treated as an HTTP/1.0 request even if it was in a later dialect.

    I am hoping this would be a way to suppress 100 continues..?

  17. David.Wang says:

    Jimmy – there is no way to configure IIS to act like that, and for many good reasons. The Apache option you mention is just a Bad Idea (TM).

    The ramifications of changing a web server to act like HTTP/1.0 are not as simple as just returning HTTP/1.0 and suppressing 100 continues.

    For example, a web server acting as HTTP/1.0 would also have to prevent many HTTP VERBS, status codes, and request headers from being processed or generated. Yes, 100 continue is one of them, but there are others, such as:

    – PUT, DELETE

    – HTTP status 205, 206, 305, 307, 405, 406…

    – reject use of Transfer-Encoding: chunked header

    – alter default treatment of Connection: header

    – etc…

    Now, I am pretty sure you want some of those HTTP/1.1 benefits (for example, dynamic compression of responses from scripts like ASP, ASP.Net, PHP, etc). You just want the server to not return 100 continue and keep everything else. So what then? Add more options such as "allow-transfer-encoding", "nokeepalive", "suppress-100-continue", "allow-HTTP-status-405", etc. so that you get as many of the HTTP/1.1 benefits as possible without returning 100-continue?

    But how does one define "to be treated as HTTP/1.0 request even if it was in a later dialect"? It sounds like a selectively proprietary extension that is neither sanctioned nor standardized by W3C, in the name of "getting things to work".

    In other words, downgrade-1.0 is just a hack useful for someone’s specific situation but is not a good solution in general. It’s a hack that begets more hacks, thus it is not a solution.

    Now, suppressing 100-continue may be a pragmatic solution in specific situations (for example, you’re in a factory whose automation system does not understand 100-continue yet sends HTTP/1.1 requests, and it would cost millions to replace the machines), so the best way is a one-off solution on IIS6 to suppress 100-continue (which is possible, as I mentioned in the blog entry). However, this "hack" will not become a part of IIS nor be a configurable option because it is a dangerous and costly hack.

    When clients lie about their capabilities and servers can be configured to ignore the clients’ claims, then what good is a specification like HTTP? The clients start implementing whatever they want, bugs included, and the server starts having hard-coded patches for each broken client, implementing different behaviors that are a mash of existing standards. When does it end?

    Sure, one can say "I just want to suppress 100 continues, and Apache’s downgrade-1.0 allows that, so IIS needs to improve", but that statement is dangerous to the future of the entire World Wide Web for the benefit of someone’s short-term gain.

    //David

  18. Kristofor says:

    Thanks for the thread!  It helped me track down a problem I ran into while troubleshooting an HTTP package for a lightweight (i.e., embedded) client (the API wasn’t properly handling 100/Continue).

    It’s aggravating, though, that in section 8.2.3 of RFC2616, it indicates that a server "Should not" return 100/Continue for HTTP 1.1 clients that haven’t specified the "Expect: 100-continue" header (assuming the server isn’t waiting on the socket datastream).  I’ve tested against several HTTP servers, and none do this except for IIS.

    While I can live with the idea that clients probably need to deal with the 100/Continue due to unknown conditions in the network (latency, etc.), it still looks like IIS isn’t conforming to the revised (RFC2616) HTTP 1.1 spec.
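
As a reading aid for section 8.2.3 of RFC 2616, the origin-server rules Kristofor refers to can be condensed into a small decision function. This is a simplified Python sketch: the function name and return strings are illustrative, it covers only the cases discussed in this thread, and it is not a description of what HTTP.SYS actually implements.

```python
def interim_100_policy(version, method, headers, body_already_received=False):
    """Summarise RFC 2616 section 8.2.3 for an origin server (simplified).

    version is a tuple such as (1, 1); headers is a dict of request headers.
    Hypothetical helper for illustration only.
    """
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    expects_100 = "100-continue" in lowered.get("expect", "")

    if version < (1, 1):
        return "MUST NOT send 100"
    if expects_100:
        if body_already_received:
            return "MAY omit 100"
        return "MUST send 100 or a final status"
    if method.upper() in ("PUT", "POST"):
        # Exception kept for compatibility with RFC 2068 clients.
        return "MAY send 100 (RFC 2068 compatibility)"
    return "SHOULD NOT send 100"


print(interim_100_policy((1, 1), "POST", {}))
print(interim_100_policy((1, 0), "POST", {}))
```

The first call is the case the phones hit: an HTTP/1.1 POST without an Expect header, where the RFC leaves room for a server to send 100 anyway. The second call is why presenting the request as HTTP/1.0, as in comment 14, avoids the interim response.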

  19. David.Wang says:

    Kristofor – the problem is that not all clients which expect 100 continue send Expect, either. In any case, non-compliance with the specification causes issues for either client or server, so HTTP.SYS opted to be forward looking instead of backwards looking when it has to "guess". This tends to aggravate the embedded and mobile clients, which tend to be incomplete in implementation, but it will improve over time as the clients improve.

    //David

  20. Dickson says:

    Thanks guys for the discussion. It has shed light.

    Bottom line?

    Get to the phone manufacturers and have them fix the bugs and comply with standards.

    I am also a mobile phone application developer and IIS is letting me down.

    I’ll try running Apache and see if I get similar issues.

    Keep on.

  21. Martin says:

    So far we have not found any web server software that can

    – act as reverse proxy

    – properly handle Expect: 100-continue headers.

    We did not try any hardware solutions.

    And so far we have not found any REAL need/excuse to use 100 headers.

    We use reverse proxies to set up portals, to pre-filter application garbage, to do the dirty work of authentication and authorization, and so on. The ONLY problem we have run into is some Symbian clients trying to access Windows Mobile.