Unicode and ISAPI Filters


Question:

How can one use the GetHeader API of PHTTP_FILTER_PREPROC_HEADERS to retrieve fields, such as “url” and others, in Unicode? I could not find any documentation on this topic.

Thanks a lot,

Answer:

It is not possible to retrieve a value in Unicode using GetHeader.

ISAPI Filter is an ANSI API – all values you can get/set using the API must be ANSI. Yes, I know this is shocking; after all, it is 2006 and everything nowadays is in Unicode… but remember that this API originated more than a decade ago when barely anything was 32-bit, much less Unicode. Also, remember that the HTTP protocol which ISAPI directly manipulates is in ANSI and not Unicode.

Now, one should note that GetServerVariable() of ISAPI Filter on IIS6, like ISAPI Extension on IIS6, is able to retrieve server variable values in Unicode when the UNICODE_ prefix is added in front of the server variable name (for example, UNICODE_URL). Prior versions of IIS cannot retrieve any values in Unicode through ISAPI.

However, this does NOT apply to request headers retrieved via the HEADER_ or HTTP_ prefix (e.g. you cannot use UNICODE_HTTP_ACCEPT_ENCODING to retrieve a Unicode value for the Accept-Encoding HTTP request header). It would not make sense anyway, because HTTP headers are not transported in Unicode. If you want the value in Unicode, you might as well make the MultiByteToWideChar() call yourself.
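
As a rough illustration of making that conversion yourself, the sketch below widens an ANSI header value by hand. The name `widen_header_value` is hypothetical and is not part of ISAPI. On Windows it calls MultiByteToWideChar() with CP_ACP, which assumes the value is in the server's ANSI code page; elsewhere it falls back to the C library's mbstowcs() purely so the snippet builds off Windows.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not an ISAPI function): widen an ANSI header value
 * into out, which holds out_cch wide characters including the terminator.
 * Returns 1 on success, 0 if the value does not fit or cannot be converted. */
int widen_header_value(const char *ansi, wchar_t *out, size_t out_cch)
{
    assert(ansi != NULL && out != NULL);
    if (out_cch == 0 || out_cch > INT_MAX)
        return 0;
#ifdef _WIN32
    /* Assumption: the value is in the system ANSI code page (CP_ACP).
     * Passing -1 converts up to and including the NUL terminator. */
    return MultiByteToWideChar(CP_ACP, 0, ansi, -1, out, (int)out_cch) > 0;
#else
    /* Portable stand-in so the sketch builds off Windows; it uses the
     * current C locale rather than a Windows code page. */
    size_t n = mbstowcs(out, ansi, out_cch);
    return n != (size_t)-1 && n < out_cch;
#endif
}
```

A filter would first fetch the value into a char buffer with GetHeader and then pass that buffer to a helper like this one.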

The usual caveat with GetServerVariable() and ISAPI Filters is that not every server variable is valid in every ISAPI Filter event (e.g. you cannot retrieve UNICODE_URL in SF_NOTIFY_PREPROC_HEADERS because it is not yet valid, but you can retrieve it in a later event like SF_NOTIFY_AUTHENTICATION).

In other words, for fields like URL that you would retrieve with GetHeader, it is simply not possible for you to retrieve them in Unicode during the SF_NOTIFY_PREPROC_HEADERS filter event.

//David

Comments (17)

  1. Kirit says:

    The filter API, though, doesn’t get the URL decoding correct in all circumstances when the URL is UTF-8 encoded. If I install an extension to handle 404 errors, the URLs it receives are correct.

    For example, the filter will get this URL correct:

    http://www.kirit.com/Niccol%C3%B2%20Machiavelli/The%20Prince/Dedication

    But a URL like this one will be correct in the 404 handler (which is why it manages to find the page), but the filter cannot find it as the URL was incorrectly decoded.

    http://www.kirit.com/Categories:/Microsoft%20Windows%E2%84%A2

    Is there a way to get the actual URL that was requested without it being decoded? Then it would also be able to use a custom decoding scheme (for example treating underscores as spaces to make URLs easier to read than having %20s in them).

    The level the filter is running on at this site is SF_NOTIFY_AUTH_COMPLETE.

  2. David.Wang says:

    Kirit – Filter API is ANSI. A decoded UTF8 character may or may not be in the current server’s code page, so behavior of UTF8 URLs (and the corresponding Unicode filesystem namespace) will always be hit-or-miss when get/set with the Filter API.

    I am not certain what is correct/incorrect in your ISAPI Filter examples, 404 handlers, file not found, etc., nor what API call you are making to retrieve the values, so I cannot comment further.

    FYI: Custom mapping of underscore to spaces does not require URL manipulation.

    This general problem space is addressed by GetServerVariable( "UNICODE_URL" ) as well as HSE_REQ_EXEC_UNICODE_URL in ISAPI Extension.

    //David

  3. Kirit says:

    David, thanks for your quick reply (and on a Sunday to boot).

    I’m using GetServerVariable with UNICODE_URL. What I really want to be able to get a hold of is what the UA actually sends to the server. With that I can decode it myself (like I already do with the query encodings if I choose to do those in a way IIS doesn’t understand).

    For the underscores, I wouldn’t need to get the URL as sent by the UA unless I wanted to encode the underscore as %5F which would be a pretty natural encoding. If I leave that to IIS I won’t know which underscores are which.

    There is a similar issue with the URL redirect mechanism IIS uses for a custom 404 handler. It builds a URL to pass in as a query string to the handler which is all very well, but it only provides the decoded URL (which it decodes to give a different file specification than is available in the ISAPI filter). So, for example, a question mark in the URL encoded as %3F is then indistinguishable from one that starts the query string (although the query string is left encoded so that can be reliably decoded), see http://www.kirit.com/Errors%20in%20IIS‘s%20custom%20404%20error%20handling.

    There is yet another example of an encoding/decoding problem in the Response.Redirect() method too. It expects an un-encoded URL (or at least that the file specification part is not encoded) which in effect makes assumptions about the format of a valid URL. Even the W3C isn’t immune to these problems (see http://www.kirit.com/W3C‘s%20CSS%20validation%20service).

    So, is there no way to find what the UA actually sent to the server with the GET or POST (or whatever)? If so this is a real shame and a seemingly pointless oversight.
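
The %3F ambiguity described in the comment above can be shown with a small sketch. It assumes you somehow have the request target exactly as the client sent it, which the thread establishes is not generally available to a filter on IIS6. `split_request_target` is a hypothetical helper, not an IIS API. The point is only that the path/query split must happen before any %-decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a still-encoded request target at the first
 * literal '?'. Returns the length of the path part and points *query at
 * the text after the '?', or sets it to NULL when there is no query. */
size_t split_request_target(const char *raw, const char **query)
{
    assert(raw != NULL && query != NULL);
    const char *q = strchr(raw, '?');
    if (q == NULL) {
        *query = NULL;
        return strlen(raw);
    }
    *query = q + 1;
    return (size_t)(q - raw);
}
```

Given "/a%3Fb?x=1" the path part is "/a%3Fb" and the query is "x=1". If the same target has already been decoded to "/a?b?x=1", the split lands after "/a" and the original meaning cannot be recovered.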

  4. Kirit says:

    On re-reading that I realise that I’m not making an awful lot of sense.

    The basic thing is that if the filter API is going to decode the URL given by the browser based on its own idea of which code page to use, then the obvious answer is to allow access to the string given by the browser in an unmodified form.

    I can’t show what the filter gives for the second example URL on the original comment because to do that will break the site. The filter correctly decodes the sequence %C3%B2 to a "ò", but in the second example the sequence %E2%84%A2 is decoded in such a way that it isn’t possible to interpret. If the 8 bit string is treated as UTF8 then the first URL is broken because the "ò" is not valid at that location in a UTF8 string. I can’t remember if the second URL as exposed to the filter could be decoded as a UTF8 sequence, I don’t think so.

    The reason why both pages work on the site is that the first is served by the ISAPI filter and the second is served by an ISAPI extension that is the custom handler for 404 errors. You can see this because the filter knows where the query starts so you can append a query string to the first URL and it will still serve the page. With the second if you append a query then the extension can’t work out if the question mark is part of the path specification in the URL or not.

    Between the two of them this allows most pages to work reasonably well (although the problems with the way that the 404 URL is passed to the extension makes it hard to decode any query strings that may be used).

    What is most strange is that the way that the path specification in the URL is decoded by IIS when requesting UNICODE_URL is different to the way that the URL is decoded when IIS builds a URL to pass to the 404 handler (which is then fetched using GetServerVariable on the EXTENSION_CONTROL_BLOCK using UNICODE_QUERY_STRING).

    I spent quite some time trying to work out what encoding the filter was assuming (I tried all the mbcs system calls etc), but I couldn’t find how it was doing it at all.
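
To make the byte-level behaviour in the comment above concrete, here is a minimal UTF-8 decoder sketch. `utf8_decode` is a hypothetical name, and the claim that "ò" becomes the single byte 0xF2 assumes the server's ANSI code page is Windows-1252. The bytes C3 B2 decode to U+00F2 and E2 84 A2 decode to U+2122, while a lone F2 byte followed by ASCII is rejected, which is why an ANSI-decoded path cannot be re-read as UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (len bytes).
 * Stores the code point in *cp and returns the number of bytes used,
 * or 0 if the bytes are not a valid UTF-8 sequence. */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    unsigned long c, min;
    size_t need, i;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;
    c = s[0];
    if (c < 0x80) { *cp = c; return 1; }            /* plain ASCII */
    else if ((c & 0xE0) == 0xC0) { need = 1; min = 0x80;    c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { need = 2; min = 0x800;   c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { need = 3; min = 0x10000; c &= 0x07; }
    else return 0;                  /* stray continuation or bad lead byte */

    if (len < need + 1)
        return 0;                   /* sequence is truncated */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return need + 1;
}
```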

  5. David.Wang says:

    Kirit – I see… you are doing your homework on URL encoding and yearn to have the raw values so that you can do everything yourself. Unfortunately, most of your fellow developers have NOT done their homework and have abused this privilege in the past to the tune of producing security vulnerabilities… so the raw values and access you want are simply not available on IIS6. It isn’t exactly an oversight on our part; we have been watching what people have been doing…

    I’m not certain what you are trying to corroborate nor when you want to obtain the values to do what, so I cannot comment further on the most effective way to do what you want. I’m just going to answer your questions.

    The reason Response.Redirect() and related functions which send browser-actionable data require unencoded URL is because the function will apply a %-encoding automatically. Why did we do this? Because it was the fastest server-side way to address the majority of cross-site scripting (XSS) security vulnerabilities that users write all the time. Yes, we all know that XSS is really a client-side problem to bad server-side code, but customers saw this as a server problem and demanded a server-side solution because they can’t make their users fix/upgrade their browsers nor did they want to fix their server-side code to not be vulnerable.

    As for 404 Custom Error handler, it gets decoded original URL (but encoded querystring) because that is exactly the safest and most correct thing that IIS can provide. Why? Well, it would be bad for IIS to hand server-side code a raw URL so late in request execution, nor should IIS ever decode the querystring because it cannot do so correctly.

    Finally, on IIS6 it is impossible for ISAPI Filter to get the real raw values. HTTP.SYS actually handles and parses the request in kernel-mode, so IIS in user-mode only gets digested, Unicode values. For app-compat reasons, IIS6 actually converts the Unicode values back to ANSI for ISAPI Filters. For example, notice that the ALL_RAW server variable value does not necessarily match the raw request on IIS6, but it does on IIS5.

    I know that you want raw values to do your own thing, but unfortunately in the grand scheme of things, more harm results from IIS providing raw values to developers that abuse it than the feature parity of providing them.

    Regarding your question on path specification differences between UNICODE_URL and URL passed to the 404 Handler, can you give concrete example and values for:

    1. Raw URL sent by client over the wire (and encoding of URL)

    2. UNICODE_URL by ISAPI prior to CustomError invocation

    3. UNICODE_QUERY_STRING inside of CustomError Handler

    //David

  6. Kirit says:

    David, thank you for your candid reply. I’ve seen more than enough bad code and sites to know that the default behaviour of an API needs to be such that users (read developers) don’t get themselves into too much trouble. I’m also sure that my library designs will have holes in them that will come to light in due course (even though I believe I’ve plugged the most likely holes – only time will tell though). Belief in your own engineering often isn’t worth the thoughts that try to express them :-). I just noticed my failure to encode apostrophes and the blog software’s failure to realise that they’re a legitimate part of a URL.

    It seems as time goes by I use the IIS framework for less and less of the core HTTP delivery mechanisms (I’m even sending status codes from my filters), but I totally understand the advantages that I get from the worker architecture in IIS 6 even though I don’t take advantage of much of the hand holding in the HTTP protocols themselves. IIS 6 has shown up race conditions and other multithreading issues that were hidden on IIS 5, although, I have to say, not all of them turned out to be in my code base.

    Although I have pretty effective multithreading capabilities and TCP/IP streams in our framework I’m also pretty certain that the implementations in IIS 6 beats anything my small team could come up with. This alone makes me reluctant to just throw away IIS and use our own HTTP server implementation, quite apart from any marketing considerations…

    I will get back to you with the answers to the questions that you ask about what I get from requests types, but right now I’m stuck in trying to upgrade the content system’s support for Mediawiki syntax (which is by no means simple – the syntax looks simple but the validation rules around many constructs, especially lists, doesn’t allow for the simple modular approach that we may expect). It’ll be a week or two before I can reconfigure a development server and go back through the string decoding routines.

    This exchange is certain to have an effect on my road map for what I will be using and designing. For the way forward I’m not sure as I will need time to think this through. I could insert a question mark at the beginning of each path specification and that would force IIS to give me the encoded string. This I have to think about as it will be a big change in how sites are structured… What I do know is that I can’t deliver a system to a customer and say that I don’t know what URL to issue to a browser to retrieve a particular resource because I can’t be sure how IIS will decode it. That strikes me as not good enough. Hand holding I can live with but that just seems capricious.

    However, having said all of that, I’m sure that any security improvements achieved through locking doors that only a tiny fraction of web sites need will benefit the internet as whole.

    Now, I must finish that parser and then I’ll get on with working out what IIS really does so I can work out how I should be designing my framework.

    Kirit

    PS- I will be writing (finishing) a page on how to do Response.Redirect if you already have an encoded URL (like one fetched from a refer(r)er header). The article will deal with all the 30x responses and what to do about document bodies. Give me another couple of days to finish it though 🙂

  7. Suyog says:

    I am using the GetHeader and SetHeader functions in "SF_NOTIFY_AUTH_COMPLETE".

    The signature of the function is:

    BOOL (WINAPI * SetHeader) (
         struct _HTTP_FILTER_CONTEXT * pfc,
         LPSTR lpszName,
         LPSTR lpszValue
         );

    As mentioned in MSDN LPSTR is a "Pointer to a null-terminated string of 8-bit Windows (ANSI) characters."

    So my question is: is there a function I can use to manipulate the headers with wide-character strings?

    I am not aware of how certificates work but I assume that they provide data which could be in UNICODE and then I could be in trouble.

    Can someone help me sort this out?

    Thanks in advance,

  8. David.Wang says:

    Suyog – ISAPI Filter API provides no mechanism to manipulate anything using WideChar. Neither does ISAPI Extension API except for very specific functions and even then, IIS only accepts URL and FilePath in Unicode. Remember, HTTP is not in WideChar.

    If you need to transport Unicode values through a non-Unicode transport, then you should do what Browsers do – transport Unicode as %-encoded UTF8, which transports as ANSI but can be decoded back to Unicode.

    //David
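
The "%-encoded UTF8" transport suggested in the reply above might look like the following sketch. `percent_encode` is a hypothetical helper, not an ISAPI call. It assumes the input is already UTF-8 and escapes every byte outside the RFC 3986 unreserved set, so the result is pure ASCII and passes through an ANSI API unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 "unreserved" characters may be sent without escaping. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: %-encode the UTF-8 string utf8 into out (cap bytes).
 * Writes a NUL-terminated ASCII string and returns its length, or
 * (size_t)-1 if out is too small. */
size_t percent_encode(const char *utf8, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len, i, n = 0;

    assert(utf8 != NULL && out != NULL);
    len = strlen(utf8);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)utf8[i];
        size_t need = is_unreserved(c) ? 1 : 3;
        if (n + need >= cap)
            return (size_t)-1;      /* keep room for the terminator */
        if (need == 1) {
            out[n++] = (char)c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    if (n >= cap)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For example, the UTF-8 bytes of "Niccolò Machiavelli" come out as "Niccol%C3%B2%20Machiavelli", the same form used in the first URL earlier in the thread.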

  9. Kirit says:

    I’ve been back through the code that I’ve been using. Although I wasn’t able to recreate the problem (which is good I guess, but I would have preferred to understand why it was happening) I have written up everything that I did to track through it.

    There’s a full write up on fetching the Unicode values and how IIS expects path specifications to be encoded. I haven’t found any official documentation on how IIS interprets this information or how it falls back to other encodings (or even which it uses when it does fall back). There are a number of examples in the article below and it looks like IIS (at least on my systems which may of course not be typical) prefers UTF-8 to any other encoding.

    http://www.kirit.com/Getting%20the%20correct%20Unicode%20path%20within%20an%20ISAPI%20filter

    Anyway, thanks for your attention.

    Kirit

  10. rmani says:

    I am writing an ISAPI filter that is to be used to measure some performance metrics in IIS 5.1 and 6.0 (on a per request basis)

    As part of correlating these metrics with other processes that handle the requests, the response is expected to have a header which I am interested in retrieving in my filter.

    I have used two functions so far and have not been successful:

       GetServerVariable

       GetHeader

    I have used them in the SF_NOTIFY_SEND_RESPONSE  and SF_NOTIFY_LOG event handlers in my filter.

    The return value is false and the GetLastError returns -1

    Any help is appreciated..

  11. David.Wang says:

    rmani – please read:
    http://blogs.msdn.com/david.wang/archive/2006/04/20/HOWTO_Retrieve_Request_Headers_using_ISAPI_ASP_and_ASP_Net.aspx

    You said the expected header is in the response.
    1. By definition, GetServerVariable only retrieves REQUEST headers, so it will not work to retrieve a RESPONSE header, and this is by-design.
    2. If GetHeader in SF_NOTIFY_SEND_RESPONSE does not work, then either
    a. you did not use the right syntax – use GetHeader("special-header_name:"), including the trailing colon
    b. or the response header was NOT sent by the ISAPI/CGI on IIS as a “structured” header – in which case the only way to retrieve the header is for your ISAPI Filter to listen on SF_NOTIFY_SEND_RAW_DATA, buffer the response, re-implement the HTTP response parsing stack, parse all response output, and locate the response header that you want. As soon as you do this, the kernel response cache is turned off for all requests that the filter applies to.

    //David
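
For option 2b in the reply above, the core of "locate the response header" could be sketched as below. This is only the lookup step over an already-buffered response head; the buffering in SF_NOTIFY_SEND_RAW_DATA is not shown. `find_raw_header` and the X-Correlation-Id header used in the note are hypothetical, and the sketch ignores obsolete line folding and trailing whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find header `name` (given without the colon) in a
 * buffered raw response. Returns a pointer to the value inside raw and sets
 * *value_len, or returns NULL if the header is absent. The search stops at
 * the blank line that ends the header block, so body text is never matched. */
const char *find_raw_header(const char *raw, size_t raw_len,
                            const char *name, size_t *value_len)
{
    size_t name_len, pos = 0;

    assert(raw != NULL && name != NULL && value_len != NULL);
    name_len = strlen(name);
    while (pos < raw_len) {
        size_t eol = pos, line_len, i;

        while (eol < raw_len && raw[eol] != '\n')
            eol++;                          /* find the end of this line */
        line_len = eol - pos;
        if (line_len > 0 && raw[eol - 1] == '\r')
            line_len--;                     /* drop the CR of a CRLF pair */
        if (line_len == 0)
            break;                          /* blank line: headers are over */

        if (line_len > name_len && raw[pos + name_len] == ':') {
            /* Header names compare case-insensitively. */
            for (i = 0; i < name_len; i++)
                if (tolower((unsigned char)raw[pos + i]) !=
                    tolower((unsigned char)name[i]))
                    break;
            if (i == name_len) {
                size_t v = pos + name_len + 1;
                size_t end = pos + line_len;
                while (v < end && (raw[v] == ' ' || raw[v] == '\t'))
                    v++;                    /* skip leading whitespace */
                *value_len = end - v;
                return raw + v;
            }
        }
        pos = eol + 1;
    }
    return NULL;
}
```

Looking up "x-correlation-id" in a response whose head contains "X-Correlation-Id: abc123" yields a pointer to "abc123" with a length of 6.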

  12. minder says:

    Hi, I don’t know ISAPI very well and I have a problem with ASP programming. When IIS executes the 404 handler, ??? appears instead of the characters because IIS doesn’t support Unicode. I want IIS to support Unicode and show the proper characters. As far as I understand, you refer to this problem. As I say, I am not very familiar with ISAPI.

    I would appreciate it very much if you could give me the full source code to accomplish this and tell me how to install it. C++ is installed on my computer and I can compile your sample code. Thank you very much in advance.

    Best regards

    Sinan MERT

  13. Uday Shanbhag says:

    Hi David,

    I wanted to know whether it is possible to write wide characters to the client through ISAPI.

    I am working on an application which was ANSI earlier and am now converting it to support Unicode.

    English characters which are converted to wide characters are displayed properly, but characters of other languages are not.

    So I wanted to know whether it is possible to write characters of other languages using ISAPI.

    please provide any small ISAPI application or any web link that might be helpful.

    Thanks in advance,

    Uday

    udaymshanbhag@gmail.com

  14. muhammadasimsajajd says:

    Hi All,

    I have developed an ISAPI Filter which saves user name in the cookie in ANSI format but the same cookie is also used by my web application which uses UTF-8 format to save and retrieve values in that cookie.

    Actually I am developing a Swedish application which will run on Windows Server 2003 (Swedish version) and the user name contains Swedish characters (for example ‘Administratör’). The filter stores this user name in the cookie in ANSI format, and when my Web application retrieves that user name from the cookie, it reads it as ‘Administratr’ (without the ö).

    Is there any way that my ISAPI Filter can store User name in UTF-8 format in the cookie or we can set the HTTP page header to use UTF-8 format (encoding)?

    Following is the code I have written to write cookie

    sprintf(szCookie, "Set-Cookie: UserID=%s; expires=%s; path=/\r\n", "Administratör", CurrentDate);

    pfc->AddResponseHeaders(pfc, szCookie, 0);

    Regards

    Asim

  15. David.Wang says:

    Asim – I suggest you properly encode and decode your values so that they pass through transparently.

    HTTP 1.1 header is defined by RFC 2616 to use OCTETS (any 8-bit character), which means that while you can put any 8 bits, including the "ö", into it, the recipient also interprets it as OCTET. There is no way to "set the HTTP page header to use UTF8 format" since it’s defined already.

    In your case, you put in the "ö" (a character above 127) as a single ANSI byte, but the page interpreted it as UTF8, in which a lone byte above 127 is not a valid sequence.

    If you %-encode UTF8 versions within the ISAPI (and corresponding %-decode in the application), the character should transport correctly.

    //David
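
The "corresponding %-decode in the application" from the reply above could be sketched like this. `percent_decode` is a hypothetical helper. It turns %XX escapes back into raw bytes, so a cookie value written by the filter as "Administrat%C3%B6r" yields the UTF-8 bytes for "Administratör" again. Rejecting %00 is a choice made for this sketch because the result is used as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decode %XX escapes from src into out (cap bytes).
 * Writes a NUL-terminated byte string and returns its length, or
 * (size_t)-1 on a malformed escape, an embedded %00, or a short buffer. */
size_t percent_decode(const char *src, char *out, size_t cap)
{
    size_t len, i = 0, n = 0;

    assert(src != NULL && out != NULL);
    len = strlen(src);
    while (i < len) {
        if (n + 1 >= cap)
            return (size_t)-1;          /* keep room for the terminator */
        if (src[i] == '%') {
            int hi, lo;
            if (len - i < 3)
                return (size_t)-1;      /* escape is cut short */
            hi = hex_value(src[i + 1]);
            lo = hex_value(src[i + 2]);
            if (hi < 0 || lo < 0 || (hi == 0 && lo == 0))
                return (size_t)-1;      /* bad digits or embedded NUL */
            out[n++] = (char)(hi * 16 + lo);
            i += 3;
        } else {
            out[n++] = src[i++];
        }
    }
    if (n >= cap)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```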

  16. rob says:

    Hello David,

    I’m having some problems with decoding URLs on IIS6 even if I use %xx UTF-8 encoding!!

    When I click on a UTF-8 encoded URL, the resulting querystring has every non-English character replaced with a question mark. Why? I use UTF-8 encoding in my pages!!! Thank you

  17. David.Wang says:

    rob – %-encoded UTF8 value as URL works. ISAPI Filter API is ANSI, so you must use %-encoded UTF8 (and decode it yourself) to pass values around. That is the way to pass Unicode values through ANSI APIs.

    The encoding "in your pages" has no effect on the interpretation of the URL or querystring.

    //David
