XSS Filter Tech: Later is Better?

Arcane design decisions can have subtle but important effects on the characteristics of a security mitigation.  Consider how client-side XSS filtering might examine a given HTTP response for evidence of a reflected attack.  Is it more sensible to examine the response before or after that response is processed in the browser?

An easy answer might be that it’s better to examine the response after processing, as this is when the true meaning of the response is most apparent.  While this makes sense intuitively, later matching turns out to have real drawbacks.  Let’s explore why.

Transformations Everywhere

As suggested above, modern client-side XSS filtering techniques attempt to identify request data that has been reflected into the response.  Any transformation applied to this data has the potential to defeat the filter’s match.  Some transformations occur at the server; since the XSS filter has no way of preventing these, it must compensate for them, and in practice filters do.
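
To illustrate why such compensation is needed, here is a toy sketch in Python (the quote-stripping server behavior is hypothetical, chosen only to demonstrate the principle):

import re

def server_reflects(param: str) -> str:
    # Hypothetical server-side transformation: the application strips
    # double quotes before reflecting the parameter into the page.
    return "<p>You searched for: " + param.replace('"', "") + "</p>"

attack = '<img src="x" onerror=alert(1)>'
response = server_reflects(attack)

print(attack in response)   # False: an exact match is defeated

# The filter compensates by making the stripped characters optional.
pattern = re.escape(attack).replace('"', '"?')
print(bool(re.search(pattern, response)))   # True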

But transformations also occur on the client side, as a response is parsed and makes its way out to the HTML DOM and/or script engine.  For a filter that delays matching until after a response is processed by the browser, these transformations apply prior to matching and can inhibit a successful match.  This manifests as a filter bypass scenario, or false negative.

Worse, the transformations that occur in various places within the browser codebase are not generally regarded as security related, so they may be introduced, changed, or removed over time without any warning.

Example

Consider the following filter bypass scenario that affected a very old version of Chrome (4.0.249.89):

Benchmark:
foo?x=<IFRAME%20src='javascript:alert(1)'>

Bypass: 
foo?x=<IFRAME%20src='javascript:alert%26%23x25;281)'>

This issue was reported several years ago and subsequently resolved.

Observe how the open-parenthesis is double-encoded: it is first URL-encoded as %28, and the percent sign of that sequence is then replaced with its HTML encoding, &#x25; (which appears in the query string URL-encoded as %26%23x25;).  Because browsers automatically HTML-decode attributes, the &#x25;28 in the src attribute becomes %28, which the javascript: protocol handler then URL-decodes to an open-parenthesis.  But given a post-parsing matching process, the &#x25;28 present in the URL will no longer match the open-parenthesis present in the actual script!
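
To make this decoding chain concrete, here is a small Python sketch using only the standard library (the string slicing at the end is a simplification of how a browser extracts the javascript: URL):

from urllib.parse import unquote
import html

payload = "<IFRAME%20src='javascript:alert%26%23x25;281)'>"

step1 = unquote(payload)       # URL decoding of the query string
print(step1)                   # <IFRAME src='javascript:alert&#x25;281)'>

step2 = html.unescape(step1)   # HTML decoding of the attribute value
print(step2)                   # <IFRAME src='javascript:alert%281)'>

# The javascript: protocol handler URL-decodes once more.
script = unquote(step2.split("javascript:")[1].rstrip("'>"))
print(script)                  # alert(1)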

Conclusions

It is not simply a matter of later matching being inherently “better.”  As the example above shows, there is a real tradeoff: while a late-matching technique may be able to target an attack more precisely for disablement, it loses some ability to accurately identify an attack in the first place.

In any manifestation of this problem, there are several approaches to mitigating the threat:

  1. Move the matching process to occur before parsing.  This may be difficult if matching is entirely context-dependent; e.g., if there are no regular expression heuristics to identify what an attack actually looks like, reflection detected before parsing may be too generic to flag as a potential attack.
     
  2. Simulate the browser’s HTML decoding behavior within the matching process (see the sketch following this list).
     
  3. Change the browser to remove automatic HTML decoding in attributes.  Unfortunately, this is a non-starter: automatic HTML decoding is consistent cross-browser, and removing it would trigger application compatibility issues.
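
To illustrate the second approach, here is a minimal sketch (not any shipping filter’s logic) in which the matcher applies the same decodings the browser will apply, making pre-parse data comparable to post-parse data:

import html
from urllib.parse import unquote

def normalize(fragment: str) -> str:
    # URL-decode the raw query string, HTML-decode as the parser does for
    # attribute values, then URL-decode once more as the javascript:
    # protocol handler does.
    return unquote(html.unescape(unquote(fragment)))

url_fragment = "javascript:alert%26%23x25;281)"   # as seen in the URL
parsed_script = "javascript:alert(1)"             # as seen post-parsing

print(normalize(url_fragment) == parsed_script)   # True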

Taking an early-matching approach, the Internet Explorer XSS Filter must still account for the behavior of the HTML parser in order to properly identify attacks.  It does this through a mechanism designed into its core architecture: a flexible regular expression heuristic.
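
The filter’s actual heuristics are more involved, but their flavor can be sketched as a pattern that tolerates encoded variants of a character the attack requires.  The pattern below is illustrative only, not a real Internet Explorer signature:

import re

# One alternation covers the literal open-parenthesis plus its URL- and
# HTML-encoded forms, so the same regex matches whether or not the
# response has been decoded yet.
OPEN_PAREN = r"(?:\(|%28|&#x0*28;?|&#0*40;?)"
HEURISTIC = re.compile(r"javascript:[^'\"]*?alert[^'\"]*?" + OPEN_PAREN,
                       re.IGNORECASE)

for candidate in ("javascript:alert(1)",        # plain
                  "javascript:alert%281)",      # URL-encoded parenthesis
                  "javascript:alert&#x28;1)"):  # HTML-encoded parenthesis
    print(bool(HEURISTIC.search(candidate)))    # True for all three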

Refining a regular expression has a number of advantages over alternative approaches for addressing bypass scenarios like the one described above.  Specifically:

  • Straightforward implementation and testing
  • Consistency across fixes
  • Easier reasoning about the overall fix approach
  • No code churn outside the core filter logic
  • Lower likelihood of a performance penalty

Finally, the “later is better” argument fails to recognize that matching can be decoupled from blocking.  Matching should occur as early as possible, so as to avoid the transformations a response undergoes as it proceeds through the browser’s internals.  Blocking, for the purposes of accuracy, can then be performed later in the process, as necessary, once the browser has determined the semantics of any suspect response fragment.
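
As a closing sketch of that decoupling (a hypothetical structure, not any browser’s actual implementation), the filter records where reflected data sits in the raw response before parsing, and the block decision is deferred until the parser reports that executable script was derived from those bytes:

def match_phase(url: str, raw_response: str) -> list[range]:
    # Early: before parsing, find request data reflected verbatim in the
    # raw response and remember each hit as a byte range.
    suspects = []
    for pair in url.split("?", 1)[-1].split("&"):
        value = pair.split("=", 1)[-1]
        start = raw_response.find(value)
        if start != -1:
            suspects.append(range(start, start + len(value)))
    return suspects

def block_phase(suspect: range, script_origin: range) -> bool:
    # Late: block only if the parser built executable script from bytes
    # inside a suspect range, i.e. the reflection had real semantics.
    return suspect.start < script_origin.stop and script_origin.start < suspect.stop

url = "foo?x=<IFRAME src='javascript:alert(1)'>"   # simplified, unencoded
body = "<html><IFRAME src='javascript:alert(1)'></html>"
suspects = match_phase(url, body)
# Suppose the parser reports it derived script from response bytes 6..40:
print(any(block_phase(s, range(6, 40)) for s in suspects))   # True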