Whitespace significance. To standard or not to standard or ‘don’t change my HTML again!’


Have a look at this bug. We are not munging your HTML yet again; we are trying to preserve whitespace so that the page renders correctly. Why? Let’s look at the standard first.

B.3.1 Line breaks

SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.

The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P> 
<P> Thomas is watching TV. </P> 

So must the following two examples:

<A>My favorite Website</A> 
<A>
My favorite Website
</A>

As you see, a CRLF after an opening tag and before a closing tag is not significant. Well, that’s not quite what happens in the real world… I created the following HTML:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Strict//EN">
<a href="http://www.microsoft.com">
Foo
</a>
<br>
<a href="http://www.microsoft.com">
Foo</a>
<br>
<a href="http://www.microsoft.com">Foo</a>
Here is how IE6 renders the fragment:
 
Here is how Firefox does it:

 
Interestingly, Firefox renders the first line differently (note the longer underline). Opera does the same. Which raises the question: should we follow the letter of the standard and treat
 
<a>foo</a>
 
and
 
<a>
foo
</a>
 
as the same, or should we follow real-world rendering and treat them differently? The problem is that when you switch from Design to Source view, some HTML formatting may change, which may be perceived as that dreaded ‘oh, no, my HTML is changing AGAIN!’.
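
To make the two options concrete, here is a minimal Python sketch. It is only an illustration of the question above, not how Visual Studio or any browser is actually implemented. The function names are made up for this example, and the tag matching is a naive regular expression that assumes no '>' appears inside attribute values. The first function applies the B.3.1 rule literally; the second mimics the rendering seen in Firefox and Opera, where the line break survives as a single space inside the link.

```python
import re

# A line break directly after a start tag, or directly before an end tag.
# Naive tag matching (assumption: no '>' inside attribute values).
_AFTER_START_TAG = re.compile(r"(<[A-Za-z][^<>]*>)(?:\r\n|\r|\n)")
_BEFORE_END_TAG = re.compile(r"(?:\r\n|\r|\n)(</[A-Za-z][^<>]*>)")


def strip_insignificant_line_breaks(html: str) -> str:
    """Letter of the standard: drop a line break that immediately follows
    a start tag or immediately precedes an end tag."""
    html = _AFTER_START_TAG.sub(r"\1", html)
    return _BEFORE_END_TAG.sub(r"\1", html)


def collapse_whitespace(html: str) -> str:
    """Real-world rendering: every run of whitespace, line breaks included,
    is kept as a single space."""
    return re.sub(r"\s+", " ", html)


multi_line = "<a>\nfoo\n</a>"
single_line = "<a>foo</a>"

# Following the standard, both spellings are the same markup.
print(strip_insignificant_line_breaks(multi_line) == single_line)  # True

# Following real-world rendering, the link text gains spaces,
# which is where the longer underline comes from.
print(collapse_whitespace(multi_line))  # <a> foo </a>
```

An editor that follows the first function can move text onto its own line freely; one that follows the second has to treat that line break as significant and leave it alone.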


Comments (13)

  1. Dean Harding says:

    So the reason it’s being moved currently is to ensure it renders correctly in Firefox and Opera, is that right?

    Personally, I’d say screw that – stick to the standard, treat them as the same and keep the formatting intact.

    I’d say it’s more important to keep the formatting intact than it is to make sure the page renders properly in every browser. After all, it’s the browser developer’s problem to ensure the browser is standards-compliant (and failing that, the website developer’s to ensure everything looks the same in each browser).

    Just my $0.02..

  2. Søren Lund says:

    I say stick with the standard even more so because I actually expect it to behave the way IE renders it.

  3. Julian Harse says:

    It is just a bug, and it will eventually get fixed. You are better off just sticking to the standard…

    Interestingly, with an XHTML DOCTYPE, Firefox renders each "Foo" the same.

  4. Text formatting and rendering are different topics. Please keep text formatting as typed!

  5. Jeff Parker says:

    Please Please Please stick with standard.

    BTW, please work with the ASP.NET controls team. Most of the code from the controls simply will not validate with the W3C validator (as XHTML, that is). I have never understood why people hold so strictly to the XML standard but never follow the HTML standards. Google reports 4,285,199,774 web pages that it currently searches. Now imagine if the XML standards were treated as willy-nilly as the HTML standard is. Every XML document would be basically worthless. There would be no real common way to get data in. The Microsoft XML namespace would be worthless, since it would only talk to other Microsoft XML documents. It is time to start fixing abused HTML. It can be fixed, and it has been clearly defined in XHTML as far as syntax, quotation, and case go. But MS, being the biggest software company right now, can only help by enforcing it as well.

    Also, out of those 4,285,199,774 web pages, I would love to know how many have needed multiple versions because of browsers not following the standards.

    And one more thing: call me crazy, but I remember back in the days when IE 4 came to life. It was originally the geeks, the developers, who started the revolution to IE. Why? Because IE 4 followed simple standards better than Netscape. There is a big shift now to Firefox. Why? It supports HTML standards much better and much more strictly. I know that when I finally saw IE 4, I personally told several people to upgrade to it. Before that, all they used was Netscape. There really wasn’t another competing browser; IE 3 was terrible. Win the developers by not deviating from standards, so we do not have to do more work just to get something simple like HTML to work properly, and you are going to come out on top smiling.

  6. b.gr says:

    <Comment mode="Standard zealot">This HTML fragment is not valid.</Comment>

    Anyway, adding the required tags makes no difference (nor the XHTML doctype Julian Harse mentioned) so it is a bug in the other browsers – doesn’t happen often, eh? So, keep with the standard when possible but leave the code as typed 🙂

  7. Stick to the standard and assume that Firefox will correct any rendering bugs it has. And hope that IE will do the same. You should not assume that the standards will change to conform with a browser bug, you should assume the bugs will be corrected to conform with the standard(s).

  8. c.a. says:

    Leave the code as the developer wrote it. Please don’t mess with our code in a way where you think you are helping us out. If you want to include XHTML validators and such in the IDE, that would be great, but whether a developer writes a conforming page or not is up to the developer ( although I highly stress sticking to standards when writing markup ).

    Nothing is more frustrating, and almost insulting, than when the code you just wrote gets changed around because the IDE thinks it’s helping out.

  9. Mikhail Arkhipov (MSFT) says:

    Let me clarify this a bit. Don’t worry, existing code formatting WILL NOT change. However, new (or heavily changed) markup may not be formatted the way you expect, since we may be too careful with whitespace significance.

  10. bigor says:

    it’s a MOZILLA BUG, Duh! 🙂

  11. Suggestion says:

    Why don’t you let the user decide if they want their code to be reformatted? VS.net gives you the option, and then ignores your selection. I find it hard to believe that you can’t understand why people call that a bug.

    BTW — You could always add a CodeSweeper function, like in HomeSite, and let people manually decide if they want formatting & error checking help.

    $.02

  12. bg says:

    I apologize if this is slightly off topic but I was trying to write an addin to solve the html rewrite issue when switching between design view and html view. it’s no problem to hook the edit > advanced > format document option using the guid:

    {1496a755-94de-11d0-8c3f-00c04fc2aae2}

    and the command id: 319. However this is not the proper hook to get the behavior to stop when the design tab is clicked from the html tab. I was wondering if you could provide me with the guid and command id for that event as I could not find a list of guids and cmdIDs for VS2003. Thanks.

  13. Mikhail Arkhipov (MSFT) says:

    Your HTML is reformatted not because of some formatting code that you cannot turn off. Have a look here:

    http://weblogs.asp.net/mikhailarkhipov/archive/2004/05/16/132886.aspx

    It also provides a link to an article that describes how to format HTML on view switch if you want to fix it. But you cannot make VS stop modifying it.