URL Fragments and Redirects

I’ve worked on the Internet Explorer team for six+ years, and on web sites for a decade longer, so I’m understandably excited when I come across a browser behavior I can’t explain. Last week, I encountered such a mystery, and it took me quite a while to figure out what was going on.

Background

Facebook tends to use URL Fragments in their URLs. For instance, a car dealer’s website includes a link to their Facebook page thusly:

https://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929

The Fragment component of the URL is the end of the URL from the hash symbol (#) onward; in the link above, that includes the text that looks like a query string. URL Fragments are never sent to the server in the HTTP request; only JavaScript running in the page can see them. So, when your browser loads the URL above, the server sees only “https://www.facebook.com” in the request, and it’s the responsibility of JavaScript in the returned page to examine the URL to find the extra information in the Fragment.
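To make the split concrete, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `split_for_request` is my own invention for illustration; it separates what goes out on the wire (host and request target) from the Fragment that stays in the browser.

```python
from urllib.parse import urlsplit


def split_for_request(url: str) -> tuple:
    """Return (host, request_target, fragment) for a URL.

    Only host and request_target appear in the HTTP request;
    the fragment never leaves the client.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target, parts.fragment


host, target, fragment = split_for_request(
    "https://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929"
)
print(host)      # www.facebook.com
print(target)    # /
print(fragment)  # !/MBofWhitePlains?sk=app_192229990808929
```

Note that the `?sk=...` text ends up in the fragment, not the query, because it appears after the `#`.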

Clicking on the link will go to the specified URL:

…and then script on the page will redirect you to a final page which contains the “MBofWhitePlains” identifier in the URL path, clearing out the URL Fragment.
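I don't know exactly what Facebook's script does, but the general idea of turning a "#!" Fragment into a real path can be sketched like this. The function `hashbang_to_path` is hypothetical, and keeping the query-like text as a real query string is my assumption, not something I verified against Facebook's code.

```python
from urllib.parse import urlsplit, urlunsplit


def hashbang_to_path(url: str) -> str:
    """Move a '#!/path?query' fragment into the URL's path and query.

    Hypothetical sketch of what a client-side router might do;
    URLs without a '#!/' fragment are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!/"):
        return url
    # Drop the leading '!' and split the rest into path and query.
    path, _, query = parts.fragment[1:].partition("?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


print(hashbang_to_path(
    "https://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929"
))
# https://www.facebook.com/MBofWhitePlains?sk=app_192229990808929
```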

Now, you may have heard that Facebook offers an opt-in setting to always use HTTPS when loading Facebook.

If you set this option, Facebook will immediately return an HTTP/302 redirect to the HTTPS page if your browser ever requests a page using HTTP.

That’s a problem for this scenario: because the URL Fragment is never sent to the server, the server sends your browser a redirect to https://www.facebook.com, with no URL Fragment specified. Hence, when the redirected page is loaded, the URL Fragment is blank, and you’re left on the Facebook homepage.
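Here is a rough sketch of why the server can't do any better. The function `https_redirect_location` is hypothetical (it is not Facebook's code), and I'm assuming the link was requested over plain HTTP. The point is that the server builds the Location value from the Host header and the request target, and neither contains the Fragment.

```python
from urllib.parse import urlsplit


def https_redirect_location(host: str, request_target: str) -> str:
    """Build the Location value for an HTTP-to-HTTPS upgrade redirect.

    The server only has the Host header and the request target to
    work with; the fragment was never transmitted.
    """
    return "https://" + host + request_target


# Assumed: the HTTP form of the dealer's link, as entered in the browser.
typed = "http://www.facebook.com/#!/MBofWhitePlains?sk=app_192229990808929"
parts = urlsplit(typed)
request_target = (parts.path or "/") + ("?" + parts.query if parts.query else "")

location = https_redirect_location(parts.netloc, request_target)
print(location)  # https://www.facebook.com/
```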

Now, this made perfect sense to me—a simple limitation of the way Facebook is using URLs.

Except for one thing…

While Safari and Internet Explorer both behave as expected, Firefox, Chrome, and Opera were somehow landing on the HTTPS version of the car dealership’s Facebook page—not the homepage. This was a truly surprising outcome, and I spent a ton of time ensuring that the different behavior wasn’t related to Facebook performing User-Agent sniffing and returning different responses, or anything of the sort. It turns out that the code was the same, but the browser behavior was very different.

Peeking behind the curtain

After much debugging, I realized that Firefox, Chrome, and Opera will re-attach a URL Fragment after an HTTP/3xx redirection has taken place, even though that fragment was not present in the URL specified by the Location header on the redirection response. So:

In Chrome/Opera/Firefox

Loading https://foo/#SomeInfo -> HTTP/302 to Location: https://bar => final URL of https://bar/#SomeInfo

In Internet Explorer and Safari

Loading https://foo/#SomeInfo -> HTTP/302 to Location: https://bar => final URL of https://bar/
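The two behaviors can be modeled in a few lines. This is only a sketch of the observable outcome, not any browser's actual implementation; `url_after_redirect` and its `reattach` flag are names I made up. I've written the Location with a trailing slash because Python, unlike a browser, won't normalize an empty path to "/".

```python
from urllib.parse import urljoin, urlsplit


def url_after_redirect(original: str, location: str, reattach: bool) -> str:
    """Return the final URL after one redirect.

    reattach=True models Chrome/Opera/Firefox: the original fragment is
    carried over when Location has none. reattach=False models the
    Internet Explorer and Safari behavior described above.
    """
    target = urlsplit(urljoin(original, location))
    original_fragment = urlsplit(original).fragment
    if reattach and not target.fragment and original_fragment:
        target = target._replace(fragment=original_fragment)
    return target.geturl()


print(url_after_redirect("https://foo/#SomeInfo", "https://bar/", reattach=True))
# https://bar/#SomeInfo
print(url_after_redirect("https://foo/#SomeInfo", "https://bar/", reattach=False))
# https://bar/
```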

Update: Internet Explorer 10 now preserves the fragment when loading a redirected resource, matching other browsers and the updated standards documents.

Here’s a simple test page demonstrating this behavior: https://www.fiddler2.com/test/redir/fragment/

Interestingly, Chrome, Firefox, and Opera reattach the fragment information even in a cross-domain redirect, and even when redirecting from HTTPS to HTTP.

I wasn’t able to find anything in the HTML5 specification calling for this behavior.

The HTTP specification (RFC2616 and the active HTTPBIS revision) doesn’t specify proper behavior either, noting only that the behavior when the Location header itself contains a URL Fragment is not defined:

Note: This specification does not define precedence rules for the case where the original URI, as navigated to by the user agent, and the Location header field value both contain fragment identifiers. Thus be aware that including fragment identifiers might inconvenience anyone relying on the semantics of the original URI's fragment identifier.

…although almost all browsers appear to respect a URL Fragment specified on the redirect response. Specifically, if both the original URI and the redirect Location specify a fragment, Internet Explorer, Chrome, Firefox, and Safari will use the Fragment component from the Location header. Opera 11.01 instead keeps the Fragment component from the original URL; it uses the Fragment component from the Location header only if the original URL didn't contain a fragment at all. Opera 11.11 changed that behavior to match Chrome and Firefox.
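The two precedence rules are easy to compare side by side. Again, this is a sketch of the observed results rather than real browser code; `pick_fragment` and the `original_wins` flag are hypothetical names, and `#A` and `#B` are placeholder fragments.

```python
from urllib.parse import urljoin, urlsplit


def pick_fragment(original: str, location: str, original_wins: bool = False) -> str:
    """Return the final URL when the original URL and Location may both have fragments.

    original_wins=False: the Location fragment takes precedence
    (the majority behavior described above).
    original_wins=True: the original fragment takes precedence
    (the Opera 11.01 behavior described above).
    In both modes, an original fragment fills in when Location has none.
    """
    target = urlsplit(urljoin(original, location))
    original_fragment = urlsplit(original).fragment
    if original_fragment and (original_wins or not target.fragment):
        target = target._replace(fragment=original_fragment)
    return target.geturl()


print(pick_fragment("https://foo/#A", "https://bar/#B"))                      # https://bar/#B
print(pick_fragment("https://foo/#A", "https://bar/#B", original_wins=True))  # https://bar/#A
print(pick_fragment("https://foo/", "https://bar/#B", original_wins=True))    # https://bar/#B
```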

Interesting stuff.

-Eric

Update-to-the-Update: Internet Explorer 10 and IE11 behave differently than other browsers when there's no fragment on the first URL, there is on the first 302, and there's none on a second 302. (Test case)
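For reference, here is what that chain looks like if each hop simply applies the inheritance rule that was later written into RFC 7231 (section 7.1.2): a Location without a fragment inherits the fragment of the URL that was just requested. This sketch only models that rule; it does not reproduce what IE10 or IE11 actually do. The host names and `follow_chain` are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def follow_chain(start: str, locations: list) -> str:
    """Follow a list of redirect Location values, one per hop.

    At each hop, a Location with no fragment inherits the fragment
    of the URL being redirected from.
    """
    current = start
    for location in locations:
        nxt = urlsplit(urljoin(current, location))
        inherited = urlsplit(current).fragment
        if not nxt.fragment and inherited:
            nxt = nxt._replace(fragment=inherited)
        current = nxt.geturl()
    return current


# No fragment on the first URL, one on the first 302, none on the second 302.
print(follow_chain("https://foo/", ["https://bar/#FromFirst302", "https://baz/"]))
# https://baz/#FromFirst302
```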

Comments

  • Anonymous
    May 16, 2011
    Unless I'm mistaken, the Location field, as specified by section 14.30 of RFC 2616, is always an absolute URI. RFC3986 specifies that an absolute URI can contain a query string, but not a fragment. IE and Safari's implementation is the correct one, according to these two RFCs.

  • Anonymous
    May 16, 2011
    @DanielKi: Yeah, it's interesting. For what it's worth, HTTPBIS is updating RFC2616 to allow for relative URIs, since all browsers support these and many major sites use them. trac.tools.ietf.org/.../185 tools.ietf.org/.../draft-ietf-httpbis-p2-semantics-14

  • Anonymous
    May 16, 2011
    @Zoompf: Nah, there's nothing special about #! when it comes to their behavior. I added another link to the test case to demonstrate this.

  • Anonymous
    May 17, 2011
    Ha!  I love it when the "standards compliant" browsers bend the rules to get behavior which is probably more desirable in many circumstances.  Seems to me this is the exact kind of thing that those same people railed against Microsoft on during the IE6 days.... ;) Like I've said before -- show me someone who claims to have a standards compliant browser, and I'll show you a liar :)

  • Anonymous
    May 17, 2011
    Well, it is clear from the HTTPbis comment that they expect fragments from either URI to be used, since they explicitly state (unfortunately) that no precedence is defined when both URIs contain fragments. There would be no need for this comment if the Location header URI fragment overrides the original URI fragment (even with an empty URI). But they should have stated it, and they should define the precedence as implementations clearly diverge.

  • Anonymous
    May 17, 2011
    Nick: I would not go that far, but IE6 itself was called standard-compliant back in 2001.

  • Anonymous
    May 17, 2011
    @Gustaf: Having talked to a few of the people working on that section of HTTPBIS, I can say that the expected behavior is not "clear" at all. But I suspect an update may be forthcoming.

  • Anonymous
    May 18, 2011
    http://www.w3.org/TR/cuap#uri point 4.1

  • Anonymous
    June 29, 2011
    Great information. I also found out about this having the opposite problem, wanting to clear the URL fragment after a redirect. I thought it would take me a long time to figure it out, so thanks for the solution!

  • Anonymous
    July 25, 2011
    This has been fixed in the latest Editor's Draft of HTML5.  The diff is at <http://html5.org/r/6322>, and the updated navigation algorithm is at <dev.w3.org/.../history.html>.  There's a typo in the new text, with a bug filed at <www.w3.org/.../show_bug.cgi>, but the basic idea is there.  The revised algorithm requires the fragment to be propagated in cases like this. In the future, the spec could probably be fixed faster and more reliably if this sort of issue was reported in the W3C bug tracker instead of just mentioned in a blog post.  Adrian Bateman seems to be the one who reports most HTML5 bugs for Microsoft, so if you find further errors in HTML5, maybe you could ask him to file a bug if you don't want to create a Bugzilla account and so on.

  • Anonymous
    July 25, 2011
    @Aryeh: It's not entirely clear why HTML5 believes that this behavior is in their "jurisdiction", so to speak. The HTTPBIS guys have it tracked as an issue against their update to RFC2616 (and I'm generally inclined to think it more appropriately belongs there). But the jurisdictional issues here are not anything I'm experienced with or interested in. As to the "speed" question: the problem has been known to the web standards community for 12 years (see the expired draft I linked) but interoperable behavior was never specified in a draft that made it through standardization. Roughly two months after I "just mentioned" the issue in a post, a proposal appears in a Standards-track draft spec in Last Call.
