Namespaces in HTML: Too much trouble to bother with

If you've been following the progress of the Windows Live Contacts Control since its launch this past summer, you may recall that in the second beta release we shifted from instantiating the control with a programmatic constructor call to declaring it on the web page as an HTML element with attributes.  On page load, our JavaScript library scans the page for custom HTML tags that it knows about, such as "contactscontrol", and kicks off instantiation of the corresponding JavaScript object for each one found.
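
The load-time scan might look roughly like the following.  This is a minimal sketch, not the actual Windows Live library code: the function name instantiateKnownTags and the registry of factory functions are hypothetical, and the only DOM call it relies on is getElementsByTagName.

```javascript
// Hypothetical sketch: scan a document for known custom tags and build an
// object for each one found. "registry" maps a tag name to a factory function.
function instantiateKnownTags(doc, registry) {
  var created = [];
  for (var tagName in registry) {
    if (!Object.prototype.hasOwnProperty.call(registry, tagName)) {
      continue;
    }
    // getElementsByTagName returns a live list, so copy it before calling
    // factories that might change the page.
    var live = doc.getElementsByTagName(tagName);
    var snapshot = [];
    for (var i = 0; i < live.length; i++) {
      snapshot.push(live[i]);
    }
    for (var j = 0; j < snapshot.length; j++) {
      created.push(registry[tagName](snapshot[j]));
    }
  }
  return created;
}
```

In a page, something like this would be called from the load event with the real document and a registry such as { contactscontrol: someFactory }, where someFactory is again a made-up name.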

Whenever you introduce a new name into an existing system, you have to worry about name collisions.  What if someone else is already using this custom tag in their web page?  It doesn't matter how obscure and unusably complicated you make your custom tag, the risk of name collision is always non-zero.

But we have a solution for that, right?  Namespaces!  Namespaces have seeped into a variety of programming and markup languages, either by design, as with C# and XML, by evolution, as with Delphi, or by convention, as with JavaScript.  As long as the namespace itself can be defined unambiguously, you can prefix your custom name with your namespace to effectively reduce to zero the chance of name collision with someone else.  (Name collision risk is not actually zero, but when namespaces are used and defined correctly, collisions can always be disambiguated.)

So, to mitigate the risk that someone else is already using (or may someday start using) "contactscontrol" as a custom HTML element tag for their own purposes, in conflict with our use of that tag, we should use a namespace to disambiguate our use from anybody else's.  HTML pages can define a namespace identifier like devlive and associate it with a URI in a domain that we control, dev.live.com.  Namespaces can be defined in the HTML tag using the xmlns attribute, like this:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:devlive="http://dev.live.com/contactscontrol">

With that definition in place, we can fully qualify our custom HTML tag like this:

<devlive:contactscontrol
    class="ContactsControl"
    devlive:privacyStatementURL="privacyPolicy.html"
    devlive:dataDesired="name,email">
...
</devlive:contactscontrol>

Notice that we can also use the namespace to prefix the attributes within the element, to eliminate the chance that our "dataDesired" attribute collides with another provided by the DOM implementer/browser or other unknown source.

Also, "devlive" is completely arbitrary.  What makes the namespace identifier uniquely ours is the URI associated with the identifier.  We may use "devlive" for the identifier, but somebody else may use "WL" instead on their HTML page.  As long as the identifier is defined using the URI of record, and the identifier is used consistently within the document, it will allow our JavaScript library code to locate "contactscontrol" tags that the HTML author intended to be associated with our Windows Live behaviors and ignore anything else.
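
To make the "URI of record" idea concrete, here is a rough sketch of how library code could discover which identifier a page author picked.  It assumes the page is parsed as ordinary HTML and that the browser exposes xmlns:* declarations as plain attributes on the root element, which is an assumption rather than something every browser guarantees.  The function name findPrefixesForUri is made up for illustration.

```javascript
// Hypothetical sketch: return every prefix that the root element binds to
// the given namespace URI through an xmlns:prefix="uri" attribute.
function findPrefixesForUri(doc, uri) {
  var attrs = doc.documentElement.attributes;
  var prefixes = [];
  for (var i = 0; i < attrs.length; i++) {
    // HTML attribute names are case-insensitive, so compare in lower case.
    var name = String(attrs[i].name).toLowerCase();
    if (name.indexOf("xmlns:") === 0 && attrs[i].value === uri) {
      prefixes.push(name.substring("xmlns:".length));
    }
  }
  return prefixes;
}
```

For each prefix returned, the library would then look for tags named prefix + ":contactscontrol" and ignore everything else, so a page that says "WL" and a page that says "devlive" both work as long as they point at the same URI.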

That's what we would like to use namespaces for in HTML.  However, as far as Gallo and I have been able to determine, it doesn't actually work.  (If I've overlooked something major, please let me know!)

Everything looks fine at the HTML level.  None of the browsers complain too much about the custom tag, with or without the namespace qualifier.

Namespace support was added in the W3C DOM level 2 spec, along with several new methods that add a namespace param alongside tagname.  For example, element.getElementsByTagNameNS(ns, tagname) is the sibling of the DOM level 1 element.getElementsByTagName(tagname).  The old methods still work in DOM level 2, but since they don't recognize namespaces they may return different results than the newer namespace functions.  If you're writing code to support namespaces, use only the namespace functions.  Don't mix calls to NS and non-NS functions.
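
As a small illustration of the two call shapes, the sketch below runs both lookups against the same node and reports how many elements each finds.  The helper name compareTagLookups is hypothetical, and the code makes no claim about what any particular browser will return; it only shows that the NS variant takes the namespace URI as its first argument and that the two calls can disagree.

```javascript
// Sketch: run the DOM level 1 and DOM level 2 lookups side by side.
// namespacedCount is null when the node has no getElementsByTagNameNS method.
function compareTagLookups(root, namespaceUri, localName) {
  var plain = root.getElementsByTagName(localName);
  var hasNS = typeof root.getElementsByTagNameNS === "function";
  var namespaced = hasNS
    ? root.getElementsByTagNameNS(namespaceUri, localName)
    : null;
  return {
    plainCount: plain.length,
    namespacedCount: namespaced ? namespaced.length : null
  };
}
```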

Things start to unravel when you start working with the DOM in code. 

First off, IE doesn't support DOM level 2 at all (neither IE6 nor IE7).  That's not such a huge problem, since I can write my own rudimentary NS functions to backfill in IE.
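
A rudimentary backfill along those lines could be as simple as walking every element and comparing names by prefix.  The sketch below is one guess at what that might look like, not the code we actually shipped: getElementsByPrefixedName is a made-up name, it matches on the prefix rather than the namespace URI, and the scopeName check assumes IE's proprietary scopeName property reports the prefix separately from the tag name.

```javascript
// Hypothetical backfill: find elements written as <prefix:localName> using
// only DOM level 1 calls. Matches by prefix, not by namespace URI.
function getElementsByPrefixedName(root, prefix, localName) {
  var wanted = (prefix + ":" + localName).toLowerCase();
  var all = root.getElementsByTagName("*");
  var matches = [];
  for (var i = 0; i < all.length; i++) {
    var el = all[i];
    var name = String(el.nodeName).toLowerCase();
    // Case 1: the parser kept the colon inside the node name.
    var direct = name === wanted;
    // Case 2 (assumed IE behavior): prefix is exposed through scopeName.
    var scoped = typeof el.scopeName === "string" &&
      (el.scopeName + ":" + name).toLowerCase() === wanted;
    if (direct || scoped) {
      matches.push(el);
    }
  }
  return matches;
}
```

Because it matches on the prefix, a helper like this only behaves like a real namespace lookup if the prefix has first been resolved from the URI declared on the page.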

Next, Firefox claims support for the DOM level 2 NS functions, but when you give them a spin in your HTML page, they don't work.  The functions are acknowledged to exist, but they don't recognize your namespace.  The sample provided in the Mozilla online documentation for element.getElementsByTagNameNS fails to locate any of the <P> elements if you copy the sample into an .html file on your own http server.

A clue to where things are breaking down is in the section title of the W3C DOM level 2 spec cited above: "1.1.8 XML Namespaces".  Namespaces are an artifact of looking at HTML through an XML lens.  Firefox effectively disables its DOM level 2 namespace functions unless the document is an XML document.  And the icing on the cake is that there is no way for a document to self-declare that it is a valid XML document.  The only way to tell the browser to parse a page as XML is to modify the web server to send the page with a different Content-Type in the http header.

Ok, so to test this idea, copy the .html file on the web server to .xml.  Just as all web servers have a preconfigured MIME type mapping the .html file extension to a Content-Type of text/html, many also have a MIME type mapping .xml to text/xml.

This almost works, except for the small detail that CSS and JavaScript content blocks use characters that confuse XML.  If you go as far as wrapping your JavaScript and CSS blocks in CDATA sections, the NS functions in Firefox will suddenly start working.

In order to "do the right thing" in HTML, you have to leave HTML for the spartan life of XML, and disavow all knowledge of JavaScript while lying through your teeth?  Well that sucks. 

It's not quite that bad, though.  Schillmania tipped me off to another Content-Type option.  Firefox will also kick into XML parsing mode when the web server returns a page with a Content-Type of application/xhtml+xml.  You can do that by defining a MIME type map on your web server to map, say, .xhtml file extensions to application/xhtml+xml content types.  XHTML is slightly more forgiving than XML, so it at least removes the CDATA comment requirement on your JavaScript and CSS code.  It would be nice if the Mozilla docs for the DOM level 2 NS functions included this critical little bit of information with the sample code that requires it.
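
The MIME type map itself is nothing more than a lookup from file extension to Content-Type.  How you configure it depends entirely on your web server, so the sketch below just expresses the mapping as a plain function.  The name contentTypeFor and the fallback value are assumptions for illustration; the three content types are the ones discussed above.

```javascript
// Sketch of an extension-to-Content-Type map like the one a web server keeps.
var CONTENT_TYPES = {
  ".html": "text/html",
  ".xml": "text/xml",
  ".xhtml": "application/xhtml+xml"
};

// Return the Content-Type for a file path, based on its last extension.
// Unknown extensions fall back to a generic binary type (an assumption).
function contentTypeFor(path) {
  var dot = path.lastIndexOf(".");
  var ext = dot === -1 ? "" : path.substring(dot).toLowerCase();
  return CONTENT_TYPES[ext] || "application/octet-stream";
}
```

The same page content saved as page.html and page.xhtml would therefore go out with two different headers, and only the second one puts Firefox into XML parsing mode.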

I had the pleasure of sharing more than a few lunches with Ian "hixie" Hickson during my stint at Google and talking about absolutely everything except work.  Every time I try to put a label on what Ian does I get it wrong, so let's just say he's deeply involved with the evolution and formulation of "web standards".  Who he works for, I don't know (He sat in our group but didn't report to our manager).  What standards bodies, I don't know.  What standards, I don't know.  But he's really good at it!   

After Gallo and I figured out what was going on with Firefox and the NS functions (that it was an HTML vs XHTML/XML document distinction), I was suddenly aware that one little tiny brain cell had been jumping up and down the whole time, trying to tell me to quit farting around with all this empirical crap and just go ask hixie.  Or at least, hixie's web site.

Sure enough, hixie has the topic covered in spades.  All the reasons you shouldn't let XML or XHTML loose on the web as HTML content, why you shouldn't mix XHTML and HTML, and why we're all basically screwed until DOM level 2 browsers reach ubiquity.

Hixie also makes a strong argument that XHTML is not HTML, and that numerous subtle differences exist in the semantics of each.  Even if we could return content-type = application/xhtml+xml only for browsers that support it, we would still have to support non-namespaced HTML for the browsers that don't.  Oh, and don't forget that the web server that has to return application/xhtml+xml is the third party host page's server, not the Windows Live server.  All pain, no gain.

The net of all this is that I'm now advocating internally that we step back from using namespaces as the recommended practice for HTML markup for third parties using our control.  We'll just bob around in the HTML tag soup for the time being.  Namespaces are the right thing to do, but the requirements are too high and the coverage too spotty to build upon it in a broad-reach platform right now.