&apos; is in XML, in HTML use &#39;


I just got hit by a very confusing "by design" behavior, and it took me a while to figure out what was going on.

Here is the line of code:

    text = System.Security.SecurityElement.Escape(text);

This method replaces the characters that are invalid in XML (<, >, &, " and ') with their XML entity equivalents.

The problem I had was that when I escaped some VB code using this method and then pasted it into Windows Live Writer, the VB comment marker ' became &apos;.

Well, it turns out that XML supports &apos; to denote the apostrophe symbol '. However, HTML doesn’t officially support &apos;, and hence Live Writer "HTML-escaped" my already "XML-escaped" string.

Solution:

    text = System.Security.SecurityElement.Escape(text);
    // HTML doesn't support XML's &apos;
    // need to use &#39; instead
    // http://www.w3.org/TR/html4/sgml/entities.html
    // http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2005-October/004973.html
    // http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
    // http://fishbowl.pastiche.org/2003/07/01/the_curse_of_apos/
    // http://nedbatchelder.com/blog/200703/random_html_factoid_no_apos.html
    text = text.Replace("&apos;", "&#39;");
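
To keep the two steps from drifting apart, the fix can be wrapped in a small helper. This is only a sketch: the class and method names (HtmlEscaping, EscapeForHtml) are made up for illustration and are not part of the .NET Framework, and it assumes the escaped text is destined for HTML rather than XML.

```csharp
using System;
using System.Security;

// Hypothetical helper, named here for illustration only.
static class HtmlEscaping
{
    // XML-escapes the text, then swaps the XML-only &apos; entity
    // for the numeric reference &#39;, which HTML understands.
    public static string EscapeForHtml(string text)
    {
        // SecurityElement.Escape returns null for null input.
        string escaped = SecurityElement.Escape(text);
        if (escaped == null)
        {
            return null;
        }

        return escaped.Replace("&apos;", "&#39;");
    }
}
```

With this in place, a VB comment such as "' TODO" comes out as "&#39; TODO", while <, >, & and " are still escaped by SecurityElement.Escape as before.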