Why VS 2003 keeps changing your HTML and what you can (and cannot) do about it.


Why does VS 2003 keep changing my HTML when I switch between Design and HTML views? Is it a bug? Can it be fixed in a Service Pack?


We get bugs opened by PSS folks on customer complaints pretty much every other week. What’s the problem? Here is a bit of history. Back in Visual InterDev 6.0 (it was version 2 of VID, actually), we started using the Internet Explorer core engine (MSHTML.DLL, aka Trident) in editing mode for our Design view. That was IE4 at the time. Why did we do that? Well, HTML rendering is not a simple piece of code, especially when you add styles. Writing our own editor compatible with IE rendering would have been prohibitively expensive. The FrontPage editor did not match IE4 at that time, and we needed a design surface that would be WYSIWYG with IE. It had to support everything that IE did.


Now let’s think about how a browser works. It obviously has a parser. Most parsers are based on a tokenizer (lexer) and some sort of grammar analyzer (yacc). What is the first thing pretty much every tokenizer does? It discards whitespace, indentation and carriage returns, since they are irrelevant to the language syntax. Even when the lexer does keep whitespace (such as in text content), tokens typically do not carry information about where they were located in the original document. The stream of tokens and the resulting element tree keep no relation to the original source file. Hence, when you switch views and MSHTML.DLL persists the HTML back to the text document, it cannot keep the original formatting, since knowledge of it is long gone. It simply rewrites the document. Capitalization changes, and your formatting and indentation are gone. You can observe the same effect in Web Matrix, which also uses MSHTML.DLL in its raw form.
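To make the loss concrete, here is a minimal Python sketch of a parse-then-serialize round trip. It is a toy built on the standard `html.parser` module, not how MSHTML is implemented. The names `Node`, `TreeBuilder`, `serialize` and `round_trip` are made up for illustration, and writing tags back in upper case is an assumption that only mimics the behavior described in this post.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a small illustrative subset).
VOID = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    """Element tree node: tag, attributes, children. No source positions."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = []  # Node instances or plain text strings


class TreeBuilder(HTMLParser):
    """Builds a tree and, like most tokenizers, drops layout whitespace."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # whitespace-only runs (indentation, newlines) are discarded
            self.stack[-1].children.append(text)


def serialize(node):
    """Write the tree back out; the original layout is not available here."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    if node.tag == "#root":
        return inner
    attrs = "".join(
        f" {name}" if value is None else f' {name}="{value}"'
        for name, value in node.attrs
    )
    tag = node.tag.upper()
    if node.tag in VOID:
        return f"<{tag}{attrs}>"
    return f"<{tag}{attrs}>{inner}</{tag}>"


def round_trip(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return serialize(builder.root)
```

Feeding an indented `<table>` through `round_trip` yields a single line with upper-case tags: the tree never stored the indentation, line breaks or original case, so the serializer has nothing to restore them from.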


VS 2003 actually has a piece of code that attempts to match the new HTML to the old one, and in many cases it does a relatively good job. There are many cases, though, when it doesn’t. The same piece of code exists in VID6 and VS 2002. It was improved between VID6 and VS 2002, but remains pretty much unchanged in VS 2003. VS 2002/2003 mitigates the issue a bit by applying pretty formatting each time you switch views. The problem is that the formatting has very few options that you can customize. You can switch formatting off in Tools | Options, but that will not solve the underlying issue; it will only switch off pretty formatting. You can ease your pain a bit by installing HTML Tidy or any other third-party HTML code formatter that is highly customizable and tweaking it to your taste. You can then hook it up to VS so the HTML editor will use your formatter instead of the internal one. Have a look at this article. Todd works on my team, btw. Even though VS will continue reformatting your code on every view switch, it will be doing it to your taste, so you may perceive that the formatting does not actually change.


We deliberately decided to stop tweaking the old code, so we changed pretty much nothing in VS 2003. Instead, we abandoned the old approach and invested our time and resources into developing a completely different piece of code that should be able to preserve user formatting all the time, no matter what. The result is what you see in Whidbey. We hope you like it better. The basic idea was to stop trying to restore formatting in the new HTML and instead detect incremental changes. In Whidbey we never directly use the HTML that MSHTML outputs. Instead, we transfer changes from it to your document. Therefore, if you only changed one attribute, only its new value will be applied to the original file; everything else will be left alone. You can now probably guess why this cannot be retrofitted into VS 2003: it would require significant changes in the code base, well beyond what is typically acceptable for service packs.
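The "transfer only the change" idea can be illustrated with a small Python sketch. This is not the Whidbey algorithm; it only shows the last step, applying one already-detected attribute change to the original text by splicing in the new value. The function name `set_attribute` and the regex-based lookup are assumptions made for this example.

```python
import re


def set_attribute(source, tag, attr, new_value):
    """Replace the value of `attr` on the first <tag> in `source`.

    Only the characters of the old value are replaced; the indentation,
    letter case, quoting style and line breaks around it stay untouched.
    """
    tag_re = re.compile("<" + re.escape(tag) + r"\b[^>]*>", re.IGNORECASE)
    tag_match = tag_re.search(source)
    if tag_match is None:
        raise ValueError(f"no <{tag}> element found")

    # Value may be double-quoted, single-quoted or unquoted.
    attr_re = re.compile(
        r"(?<![\w-])" + re.escape(attr)
        + r"""\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
        re.IGNORECASE,
    )
    attr_match = attr_re.search(source, tag_match.start(), tag_match.end())
    if attr_match is None:
        raise ValueError(f"<{tag}> has no {attr} attribute")

    # Exactly one of the three value groups took part in the match.
    group = next(i for i in (1, 2, 3) if attr_match.group(i) is not None)
    start, end = attr_match.span(group)
    return source[:start] + new_value + source[end:]
```

Because the edit is expressed as a character range in the original text, a change such as `set_attribute(page, "table", "border", "1")` leaves every other character of `page` exactly as the user typed it, which is the property that reserializing the whole tree cannot give you.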


Now the last question you might have: why couldn’t the IE team fix the issue? They could, but it would hurt browsing performance and make the HTML element tree much larger, since it would need to store all the whitespace information. There are multiple issues with whitespace, such as figuring out where it belongs, whether it should be removed when an element is deleted, and whether it should move with the element when it moves in the tree. Since download size and the speed of opening pages were very important, the idea did not get through.


Comments (59)

  1. Guess what, rendering depends on the way html source is formatted. Quick example:

    xx.html:

    <table cellpadding="0" cellspacing="0" border="0">

    <tr><td bgcolor=blue>

    <img width=40 border=0 height=40 src="image.gif">

    </td></tr>

    </table>

    yy.html:

    <table cellpadding="0" cellspacing="0" border="0">

    <tr><td bgcolor=blue><img width=40 border=0 height=40 src="image.gif"></td></tr>

    </table>

  2. I mean, in case it is not clearly readable in the comment – the second example has its tr element written in one row, and the first does not (seems extra spaces – .text’s fault).

  3. Mikhail Arkhipov (MSFT) says:

    That’s why Whidbey code formatting options respect whitespace rendering rules and ignore settings that will affect rendering. TD is one good example.

    However, there are cases when people don’t care about couple of additional spaces even in TDs so we are still debating if we should add a switch. Something like ‘ignore whitespace significance’.

  4. There are tabs or spaces in td and img in my post, they are not supposed to be. The only difference between examples is carriage return in the 1st source. In the second example tr is written in one row.

  5. Scott says:

    My bug doesn’t have anything to do with the layout. When you switch back from design view to markup view, the editor will attempt to fill in any missing elements it finds. For example closing tags. When it fills in the missing tags, they are always in upper-case, even if you have explicitly set "lower case tags" in the HTML options for VS.

    Filling in the missing elements isn’t a big deal, well it is if you are creating user controls containing HTML fragments (say the top 1/2 of an HTML table) but the fact that it ignores the user preferences is troubling. What other options is it ignoring? Why is it ignoring the options at all? Is it a bug in the code that reads the options or does the HTML formatter just forget to check the options?

    Changing the formatting is annoying, changing the case can break your HTML if you’re trying to write XHTML or using a doctype like strict-4.0.

  6. Mikhail Arkhipov (MSFT) says:

    May be a bug or, as I said, it might be one of the cases which is not covered in the VS 2003 and earlier formatting preservation. Strictly speaking, if one has to write a bunch of new code to fix an issue, it is difficult to qualify it as a bug 🙁

    VS 2003 is not XHTML compliant anyway, it will ‘fix’ <BR/> and make it <BR>.

    I would recommend trying to hook up HTML Tidy.

  7. Saurabh Nandu says:

    great open post !

  8. Snorrk says:

    Excellent post.

    I’ve been wondering about this for a long time and already heard the legacy-code-issue reason. Getting the full scoop like this is why I read the Weblogs.

    And yes – this has been fixed in VS.NET 2005? Right? 🙂

    >S

  9. Mikhail Arkhipov (MSFT) says:

    Right. It should be much, much better now. Please try Whidbey and tell us if that is not so. My team owns the issue, we really want to fix this and there is still time.

  10. travis says:

    I stopped using design view altogether because of this issue. Now the issue I run into is when VS.NET 2003 alters my code on copy/paste.

    #1 It always lowercases the <!DOCTYPE> tag for some reason, no matter what case the original tag was.

    #2 When I copy/paste <table> and <form> tags it automatically inserts an ID into it.

    #3 When I copy/paste code with IDs that are in the current document, it totally overwrites them, instead of squiggly-red underlining them for me to change.

    It’d be nice if VS.Net could read the <!DOCTYPE> and validate according to that instead of using whatever arbitrary way it validates stuff now. And it could use the current <!DOCTYPE> to show the correct attributes for that version (and show all when in quirks mode).

  11. Mikhail Arkhipov (MSFT) says:

    #1 is fixed in Whidbey

    #2 is also fixed. In previous version we used to auto ID everything that is scriptable. Now we only autoid elements that already have ID attribute.

    #3 is still there (see #2). I will file a change request, but there is no guarantee it will get into the product. However, there are two ways we can provide customization: Tools | Options and registry keys. If tools/options won’t have the option in UI, will you agree to tweak a registry key?

  12. travis says:

    As far as #1 and #2, nice work.

    #3 Yeah, I’m fine with the registry hacking, just as long as I can find it easily 😉

  13. Hi

    VS 2003 cannot color .shtml files (as html). In order to make it work I need to patch the registry 🙁

  14. Mikhail Arkhipov (MSFT) says:

    Anatoly, this is fixed in Whidbey.

  15. The doctype problem annoys me as well, but what gets me more is the lack of understanding for any XHTML doctype and the way VS destroys code aimed at XHTML (changing <img /> to <img>, dropping at random </li> and so on).

    So if that stops, great, but are we going to see better XHTML support, or even better a plugable validation engine so as standards get refined we can actually plonk the raw DTD somewhere in the VS directory structure and we’ll have validation at design and compile time?

  16. Bob Brinker says:

    #2 & #3 drive me crazy. if you can fix those, i know my team would be begging for the upgrade.

    another ‘issue’ i have run into is when i am editing a code behind file and use the horizontal divider, often times a copy/paste operation will change the location of the editor in the ‘other’ window section. for example, i open home.aspx.cs and split the page with the horizontal divider. i scroll to a section of code in the top 1/2 of the window and to a different section in the bottom 1/2 of the window. when i do copy/paste operations, often times the split window i am **not** working in shifts when i ctrl-v to paste. not a BIG deal, but mildly annoying.

    btw – great blog.

  17. KevMar says:

    Talking about the way IE handles the HTML. If I create an XHTML page and, via client side code, output the body.innerHTML, it is no longer the same. All attributes that don’t have special characters lose their quotation marks and all tags are capitalized.

  18. Eric Newton says:

    Another annoyance is the schizophrenic Page/Register directives that VS simply LOVES to flip flop… causing a lot of UNNECESSARY VSS check out requests

  19. Todd Brooks says:

    Does Whidbey bring back the functionality in VS.NET 7.0 where you could view ATL Server stencil files (SRFs) in Design View? That was removed in VS.NET 7.1 due to some unspecified bugs with rendering…I never ran into it but I REALLY miss being able to view my HTML in design view in the SRF files.

  20. Mikhail Arkhipov (MSFT) says:

    To Barry:

    Whidbey is XHTML compatible, so <img /> stays as such. Actually, we generate XHTML by default, so if you drop a button from toolbox, you’ll get <input type="button" />. Speaking about validation, we do provide XHTML 1.1 Strict and XHTML 1.0 Transitional schemas. However, another option is to open XHTML file in the new XML editor that is based on System.XML and is able to validate against DTD so you can directly use W3C DTDs if you wish.

    To Eric: I believe this is not an issue anymore (in Whidbey)

    To Todd: no, it doesn’t. Unfortunately, ATL Server team has chosen {{ }} syntax that standard HTML parser such as one employed by IE does not recognize. At best you’ll see {{ }}, at worst you may lose them. Can you elaborate a bit more how do you expect Design view to render {{ }} expressions?

  21. Todd Brooks says:

    I’m not asking for it to render what is in the tag handler {{ }}, what I’m asking is for the editor to allow me to view my SRF in Design View. With VS.NET 2003 (7.1), the ATL Server team REMOVED the ability to switch from code and design view for SRF files, even though they are standard HTML markup. Before, with VS.NET 2002 (7.0), you could view your SRFs in Design View, and they would just put the tag handler inline with the HTML. But this functionality was removed with 2003. You can’t switch to Design view at all anymore, it says that the editor won’t allow SRF viewing. Which means I have to create/edit my SRFs outside of VS.NET, which is ridiculous.

  22. Daniel Stolt says:

    I’ve been following with great interest the conversations about the HTML formatting functionality of VS.NET 2003 vs. Whidbey. You said you wanted us to let you know if this stuff isn’t working properly in Whidbey, and that there’s still time to fix remaining issues, so here goes.

    Maybe this has been corrected after the March Community Preview drop, but the formatter doesn’t seem to respect the line break settings.

    For example, the "title" element has the default line break setting "before opening, within and after closing". According to the preview, this should make the tag look like this:

    <title>

    Content

    </title>

    However, after applying formatting, it ends up like this:

    <title>

    Content</title>

    The same goes for "div" elements and probably a whole bunch of other elements with this setting too.

    Another thing that worries me is that the editor inserts "div" elements by default instead of "p" elements, when you edit a page in designer mode. Is there a setting to revert this behaviour? If not, I think there should be. Some of us still appreciate what you can do with traditional paragraph-based page layouts.

    By the way, I think it’s really super great that you guys have implemented tag specific formatting settings in Whidbey, à la FrontPage 2003. Let’s just hope it works a little more predictably than the FrontPage version. 🙂

  23. Mikhail Arkhipov (MSFT) says:

    To Todd: I found the VS 2003 bug that caused us to disable Design view for ATL Server files. Actually, it was my team (and me personally) who disabled it. Here is what the bug says:

    —————————————————-

    Repro:

    1. open a .srf page in the editor with srf tags that include such things as parameters that include a long database connection string that cannot have newlines inserted into it

    (opens in design view by default)

    2. switch to HTML view. Notice that the file contents have changed

    3. deploy the .srf file and try to access the page from a browser

    Actual:

    not very useful HTTP 500 error message

    Expected:

    Autoformatting not to break .srf page functionality or to be turned off by default.

    —————————————————-

    However, it seems the issue may not apply anymore since a) we do not autoformat on view switch and b) new formatter is better than the old one. I’ll see if we can enable Design view back.

    BTW, VS 2003 workaround is to change SRF extension to HTM or copy/paste content between temporary scratch HTML page and the SRF file.

  24. Todd Brooks says:

    Yes, the workaround is similar to what I’m doing now (viewing and editing via UltraEdit in this case). Not as nice as with VS.NET, since I lose the IntelliSense. I would really like to have it enabled again in VS.NET.

    Any chance of there being a quick reg fix or something I might be able to do with VS.NET 2003 to get this functionality back, even if it is not supported by Microsoft?

  25. Todd Brooks says:

    I hope that you don’t mean I will have to wait until Whidbey ships….by that timeframe, I might have moved some of my presentation code to .NET 2.0 and ASP.NET 2.0, depending on performance.

    Right now, I use ATLS for perf reasons, plus I’m extremely familiar with it, am a C++ programmer at heart and the original design specs were for straight HTML (no ActiveX or Java components). Now that .NET 2.0 has some legs and is becoming more of a standard (along with the runtime shipping on current operating system releases), moving to .NET isn’t the headache it was before for my software and target market. With ASP.NET 2.0 I will get some nice feature-rich front-ends, I just don’t know about performance compared to ATLS.

    Regardless, all of my current development is with ATLS, so not having to copy and paste to an outside browser, or rename all my files to HTM or use temp files in VS.NET would REALLY help me out. I certainly miss not being able to design my SRF in VS.NET.

    Thanks again for the info and like I said, if there is anything that I can do to get that functionality back, even if it isn’t officially supported by MS, such as a use at your own risk solution, I’m game for that. It would make my development much easier and me more productive.

  26. Lorenzo says:

    Can’t you make it possible to hook up the Mozilla project’s HTML parser if they provide you an interface between the calls you do in your code and theirs?

    I mean writing HTML for Mozilla brings fewer problems for IE than the other way around.

    Very interesting post anyway. I’ll try the HTMLTidy option in the meanwhile.

  27. Dave McMurray says:

    Interesting reading one comment about how VS fills missing elements, I have experienced this but also the opposite. Coming into work this morning to continue tidying up some old projects, I discovered that VS has kindly removed all the "/li" tags from the ordered/unordered lists. Most unpleasant. I don’t use the design view at all, unless I need to add controls that require more backend code than a simple line of definition.

    From the sounds of some comments, Whidbey is no better. It appears it still changes code, where it should respect the programmers’ work and leave it well alone (even if it means ticking an extra option in the preferences). The "New Age" programmers may need stabilizers, so much for the next generation, but traditional coders like myself rarely need the hindering "help". My guess is the creators didn’t even run VS after it was developed, or they saw the mess it created and thought "Well people these days don’t need to understand what the code does, so why should it be readable?". A lot of the code I’ve seen has had even single tags split in half over 2 lines!

    Of course this is at least where MS is consistent, perpetuating the future of lazy development, I mean how did MS start again? ;). Maintaining focus on buzz words for business employees to throw around at meetings, instead of concentrating on developing quality software and no, quality is not a flashy interface or some patronising animated cartoon animal.

    For how long will the next gen be rocked to sleep by each days growing memory and GHz ratings? "ooh ooh we can put extra layers between calling a function to say helloworld and the function that actually displays it, because it’s just not simple enough a task yet and we have clock cycles that desperately need burning"

    I have to say that Notepad is still the best web/windows application development tool for the windows platform, backed up with the odd compiler/linker for which ever language you choose.

    Sorry about the rant, but I think most problems like the issue of VS reformatting your HTML run a little deeper than a buggy or untested bit of code. The answer would be to solve the problem, not cope with it.

  28. I have no problem with rants, you should see MY rants that I send to other teams 🙂

    Please have a look at Whidbey yourself. I think your opinion may improve 🙂 As for Notepad, do you really use it? I mean, there are free editors that at least are able to show line numbers and colorize markup…

  29. Lorenzo says:

    BTW the tidy HTML solution offered cannot apply to VS.NET 2003 Standard C# Edition (which I own) because it misses the possibility to create Extensibility Projects. Is there any workaround for that?

  30. David McMurray says:

    I will certainly be taking a look at Whidbey, since my work dictates that I use VS2003 and I will have to insist we upgrade 😉

    I used Notepad exclusively for HTML since 1994 for my own work, but yes, recently I have moved on to the Crimson editor which provides both line numbers and colourised markup. Although editing multiple documents still throws me as I continually try to alt-tab between documents, but old habits die hard.

  31. Pooja says:

    Good entry.

    Answers one very often asked question.

    It’s a lot better to see the same problem after you know the reasons than otherwise 🙂

  32. peterhcy says:

    I think I see what you are saying, and it is useful for me

  33. paull says:

    The problem where you paste a piece of HTML code and the IDE will reset the ID and NAME tags is a real problem. It’s bad enough having the VB VS2003 IDE messing up the indenting on all your VB code, but to actually take a piece of code and change it during paste with no option to stop it doing that is ridiculous. We are taking a direct hit on code quality and productivity. There are lots of circumstances where you need two controls with the same ID and you might just be pasting from an old block to a new block (intending to delete the old block later). The last thing you want is your ID and NAME tags reset, especially to something as useless as "Text1"

  34. paull says:

    What about a "leave my code alone" option ? Where the IDE does not change any code under any circumstances – guaranteed. Thats all we want.

  35. Mikhail Arkhipov says:

    Paul, you can switch off VB indenting and pretty formatting in Tools | Options | Text Editor | Basic | VB Specific.

    Autoid is different in Whidbey: it only changes the id when the element already has one, it never adds IDs. If you want it completely off, please submit feedback on the MSDN product feedback site so it gets filed as a bug or work item.

  36. tjturner says:

    I’ve been very frustrated with my html being changed.

    Vs removes from .aspx my style tags where:

    width='70px' style='width: 70px;'> 'for Netscape, etc.

    After I compile the style is gone?

    width='70px'>

    Frustrating.

  37. paull says:

    >> you can switch off VB indenting and pretty formatting in Tools | Options | Text Editor | Basic | VB Specific.

    Actually some of us want to have capitalisation and white spacing as in vb6. The 2003 auto indenting doesnt work properly e.g. in select case statements it will tab out on an if and not tab in again on an endif (I’ve had it completely trash the indenting on a 10,000 line .vb module and it couldn’t be fixed except by segmenting into 10 smaller modules and then pressing tab with the code highlighted).

    The tools options thing should give separation of each feature, not a single option covering all.

    I think you should guarantee to all programmers that all your development tools do not change code. We just don’t want it.

    What vb6 did was OK, we all got used to it and it worked well. It wouldn’t be so bad if the .net 2003 ide was properly thought out and tested like vb6 – it wasnt.

    Taking out html element and changing ID and NAME element values when you copy and paste is a load of rubbish – who would ever want that ?

    I will put some stuff in the feedback.

  38. Mikhail Arkhipov (MSFT) says:

    I will forward your notes to the VB team, but please file the request on MSDN Feedback site since then it will be entered as official bug. Thanks!

  39. There have been many online discussions about how Visual Studio messes up the formatting of HTML source code, I must admit I have been involved in a few of these, including Mikhail Arkhipov’s Weblog for instance. Which explains that the…

  40. Bryan says:

    This problem is a real pain. Rumor has it that it’s been fixed in VS 2005, but I don’t want to have to upgrade just for a bug fix.

    There must be something easier that can be done…

  41. Here is patent that we filed on the method, which we invented in order to solve that old known problem…

  42. anon says:

    Just fix it – please. We don’t want to upgrade. The reformatting kills productivity; there is no way this should have been released. An otherwise great product is definitely marred by this problem.

    Remember: coding for the web is all about editing html.

  43. mike says:

    I like the visual effect of Visual Studio 2003 on my HTML code. However, I am upset with the fact that VS2003 does not keep my original formatting. I try to program every web page in compliance with XHTML, but VS keeps removing my original formatting.

    For example, VS is taking out my closing slash for tags with no end tags (i.e. it changes <br /> to <br>, and <img /> to <img >).

    I think I’m going to switch to Crimson Editor or even Notepad.

  44. Joe says:

    As nuts as it may sound, I loved the fact that VS "standardized" HTML formatting. If you are fortunate enough to work in a small group where you are both the web designer and the server-side programmer and you design the web pages yourself, I can see how letting VS handle the formatting could be frustrating. But for those of us who work in commercial software companies (by that I mean mid-sized companies whose revenues are derived solely from packaged software they sell), the web designer and the developer are two different people. The designer often puts something together in Dreamweaver and gives it to you so you can wire it up to the database and add the server side stuff. Problem is, no two designers format their HTML exactly alike, which in itself is an issue. But imagine DesignerJoe creates a significant portion of the site, which I implement in ASP.Net, then leaves the company. DesignerBob comes in and reworks the design by snagging the HTML off the site itself. He runs it through Dreamweaver and gives it back to me to "wire it up". I now have to deal with HTML that looks completely different and have to re-adjust my eyes from being used to DesignerJoe’s formatting to DesignerBob’s particular style.

    BAH! Major pain. VS.Net takes care of all that for me. It might not be the prettiest HTML in the world but at least it is standard and consistent.

    Some of you might be saying "with well designed controls you should have cleaner separation than that"… but the reality is that some of us are still working with large amounts of classic ASP that have been ported to .Net, and/or we must write code in such a way that it is not 100% dependent on MS technologies (per management orders).

  45. Jibran says:

    Completely by accident I found, one day, that switching to design view threw me an error: "Could not open in design view. Quote values differently inside a ‘<%…"value"…%>’ block." I was ecstatic. DESIGN VIEW FAILED TO OPEN! YES! YES! YES! YES!

    I had incorrectly formatted a C# server side script block. This is what I did:

    <input type="hidden" id="price" value="<%# MyFunction() %>"/>

    Notice it’s just a regular html hidden input. The value attribute is dynamically populated and wrapped with double quotes ("). VS .NET design view doesn’t like the double quotes around a script block but at run-time it doesn’t matter. Single quotes work just fine here, however design view works in this case.

    So if any of you are like me and NEVER use design view this is a sure fire way to avoid crappy formatting of your markup.

    Respectfully,

    Jibran

    jibran@thefro.com

  46. Alexander says:

    I need your help. I am using IE 6.0 with Windows XP Home Edition. Trying to use HTML Editor under IE to modify a webpage created by another program. I am able to move object on this html page except to animated objects. (I believe they were created with JAVA). Is there a Tools/Options/Advanced feature that needs to be turn on to enable the editing of animated objects? How can I make it work?

    Please email a copy of your posted reply to aasinc@hotmail.com

    Thank you.

  47. Vishal says:

    Help!

    I’m facing the same problem as Jibran. In HTML view I changed the code to look like below and now I am not able to go to design view without reverting my changes:

    <link rel="stylesheet" href="<%= (string)Application["CSSBasepath"] + (string)Session["SelectedCss"] %>" type="text/css">

    I tried changing the double-quotes to single quotes but it did not help!

    Please help,

    thanks,

    -Vishal

  48. Inner quotes must be different from outer quotes: either " outside and ' inside or the other way around:

    <link rel="stylesheet" href='<%= (string)Application["CSSBasepath"] + (string)Session["SelectedCss"] %>' type="text/css">