Why URLScan ignores the querystring for DenyUrlSequences

Users frequently want to use URLScan's DenyUrlSequences feature to scan the incoming URL, including the querystring, and reject requests based on various criteria, such as potential SQL injection characters, characters banned in directory/file names, etc.

However, one quickly finds that URLScan only applies its checks, including DenyUrlSequences, against the URI-stem portion but NOT the querystring. So, as the following user asks, why? Is this just a bug in URLScan, or is there another reason...

Question:

Does urlscan ignore the rest of the querystring after "?" and if so, is there a way to get it to process the entire querystring? I can include invalid characters in the querystring after the ? and urlscan allows them - sql injection becomes quite a problem then : )

Answer:

This behavior is actually by design. URLScan does not apply its checks against the querystring because of one fundamental uncertainty: it is not clear how to decode and normalize the querystring in order to make the logical character comparisons that the user configured.

Why Normalize?

Now, some users question why URLScan needs to normalize a string for logical character comparison. After all, isn't a simple strstr() of the raw data good enough for both the URI-stem and the querystring? Well... not exactly.

Consider the simple case of detecting the .. (directory traversal) sequence. According to URL encoding rules, web servers must treat each of the following raw data sequences sent by clients as logically equivalent to .. :

  • ..
  • %2E%2E
  • .%2E
  • %2E.

Thus, if you are trying to deny directory traversal while making character comparisons against raw data, you need to declare every permutation of possible encodings which logically maps to .. (because the bad guy can try any combination). Clearly, this quickly gets out of hand, since every logical character has many possible encodings, and for security you must cover all possible permutations. This is why logical character comparisons MUST be made against normalized strings, not raw data.
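To make this concrete, here is a minimal C sketch (illustrative only, NOT URLScan's actual code; percent_decode is a hypothetical helper) showing that a naive strstr() against the raw data catches only the first permutation above, while the same strstr() against the decoded string catches all four:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: one percent-decoding pass over src into dst
       (dst needs at least strlen(src) + 1 bytes). This is the kind of
       normalization that must happen before logical comparisons. */
    static void percent_decode(const char *src, char *dst)
    {
        while (*src) {
            if (src[0] == '%' && isxdigit((unsigned char)src[1])
                              && isxdigit((unsigned char)src[2])) {
                char hex[3] = { src[1], src[2], '\0' };
                *dst++ = (char)strtol(hex, NULL, 16);
                src += 3;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }

    int main(void)
    {
        /* The four raw sequences that are all logically ".." */
        const char *raw[] = { "..", "%2E%2E", ".%2E", "%2E." };
        char decoded[16];

        for (int i = 0; i < 4; i++) {
            percent_decode(raw[i], decoded);
            printf("%-8s  raw scan: %-4s  normalized scan: %s\n",
                   raw[i],
                   strstr(raw[i], "..")  ? "HIT" : "miss",
                   strstr(decoded, "..") ? "HIT" : "miss");
        }
        return 0;
    }

The raw scan reports a hit only for the literal .. ; after normalization, all four permutations are caught with a single comparison.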

The Case Against Querystring

Now, HTTP clearly defines possible encoding semantics for the URI-stem, so URLScan can confidently decode and normalize the URI-stem to make logical character comparisons and take action. But what about the querystring?

RFC 2396 defines "query" in section 3.4 as follows (I have excerpted the relevant BNF definitions of the terms in question):

   The query component is a string of information to be interpreted by
   the resource.

      query         = *uric

   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.

      uric          = reserved | unreserved | escaped

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

      unreserved  = alphanum | mark

      alphanum = alpha | digit

      alpha    = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"

      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

      escaped     = "%" hex hex
      hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                            "a" | "b" | "c" | "d" | "e" | "f"

Notice that the definition of "query" states that it is a string of information to be interpreted by THE RESOURCE (emphasis mine).

In other words, the encoding and meaning of the querystring are completely arbitrary and determined by the targeted resource itself. Intermediaries like URLScan have no idea how the querystring is encoded or interpreted.

This means that it is impossible for automated tools like URLScan to generically decode and normalize the querystring with 100% accuracy and then apply a logical character scan (such as for SQL injection character sequences). How can URLScan figure out how your web application decodes the querystring, let alone whether it falls victim to SQL injection?

Quite simply, it cannot. This is the fundamental reason why generic character-sequence scanning of the querystring is not 100% reliable.
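To illustrate, here is a minimal sketch (with a hypothetical raw querystring and an illustrative decode_once helper, not URLScan code) of three applications that each interpret the same raw bytes differently: one decodes %XX once and keeps + literal, one applies form-urlencoding semantics (+ means space), and one decodes twice. A scanner looking for the SQL quote character has to guess which convention the resource uses:

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative helper: one percent-decoding pass over src into dst,
       optionally treating '+' as a space (form-urlencoding semantics). */
    static void decode_once(const char *src, char *dst, int plus_as_space)
    {
        while (*src) {
            if (src[0] == '%' && isxdigit((unsigned char)src[1])
                              && isxdigit((unsigned char)src[2])) {
                char hex[3] = { src[1], src[2], '\0' };
                *dst++ = (char)strtol(hex, NULL, 16);
                src += 3;
            } else if (plus_as_space && *src == '+') {
                *dst++ = ' ';
                src++;
            } else {
                *dst++ = *src++;
            }
        }
        *dst = '\0';
    }

    int main(void)
    {
        /* Hypothetical raw querystring as sent on the wire. */
        const char *raw = "id=%2527+OR+1%253D1";
        char a[64], b[64], tmp[64], c[64];

        decode_once(raw, a, 0);   /* app A: one pass, '+' kept literal */
        decode_once(raw, b, 1);   /* app B: one pass, '+' means space  */
        decode_once(raw, tmp, 1);
        decode_once(tmp, c, 1);   /* app C: decodes the value twice    */

        printf("raw:   %s\n", raw);   /* id=%2527+OR+1%253D1 */
        printf("app A: %s\n", a);     /* id=%27+OR+1%3D1     */
        printf("app B: %s\n", b);     /* id=%27 OR 1%3D1     */
        printf("app C: %s\n", c);     /* id=' OR 1=1         */
        return 0;
    }

The dangerous ' character only materializes under app C's double-decoding interpretation. A generic tool that decodes once misses it, while a tool that always double-decodes falsely flags requests to apps A and B. Either way, it is guessing.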

But, but, but...

Now, some users scoff at URLScan's harsh stance - why not have the ability to apply some relaxed querystring scanning only to certain URL extensions and do the obvious thing - no decoding whatsoever, since that is what applications tend to do? I mean, other IIS security tools offer this sort of security "feature", so why doesn't URLScan?

Well, yes, if you constrain the problem, you can find localized solutions that require tweaking, but URLScan is a general-purpose tool. Anyone can write a custom ISAPI Filter which works only for their particular situation, but that does not make it a redistributable solution for the masses.

Besides, if you knew which URLs to apply the scan against and you knew how the application interprets the querystring, why not just fix the code itself? And if you did NOT know the URL nor how the querystring is interpreted, then how do you suppose an external tool like URLScan can figure it out?

Thus, I suggest that you approach solving security issues by fixing the vulnerable source code itself. Generic tools that scan and reject requests simply cannot work reliably 100% of the time.

//David