Why VS 2003 keeps changing your HTML and what you can (and cannot) do about it.

Why does VS 2003 keep changing my HTML when I switch between Design and HTML views? Is it a bug? Can it be fixed in a Service Pack?

We get bugs opened by PSS folks based on customer complaints pretty much every other week. What's the problem? Here is a bit of history. Back in Visual InterDev 6.0 (it was actually version 2 of VID), we started using the Internet Explorer core engine (MSHTML.DLL, aka Trident) in editing mode for our Design View. That was IE4 at the time. Why did we do that? Well, HTML rendering is not a simple piece of code, especially once you add styles. Writing our own editor compatible with IE rendering would have been prohibitively expensive. The FrontPage Editor did not match IE4 at that time, and we needed a design surface that would be WYSIWYG with IE. It had to support everything that IE did.

Now let's think about how a browser works. It obviously has a parser. Most parsers are based on a tokenizer (lexer) and some sort of grammar analyzer (think yacc). What is the first thing pretty much every tokenizer does? It discards all the whitespace, indents and carriage returns, since they are irrelevant to the language syntax. Even when the lexer does keep whitespace (such as in textual content), tokens typically do not carry information about where they were located in the original document. Neither the stream of tokens nor the resulting element tree retains any link to the original source file. Hence, when you switch views and MSHTML.DLL persists HTML back to the text document, it cannot keep the original formatting, since knowledge of it is long gone. It simply rewrites the document: capitalization changes, and your formatting and indentation are gone. You can observe the same effect in Web Matrix, which also uses MSHTML.DLL, in its raw form.
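To make the point concrete, here is a toy Python sketch of a tokenizer that, like the one described above, drops whitespace between tags and records no source offsets. It is not MSHTML's parser: the regular expression, the choice to lowercase tag names, and the names `tokenize` and `serialize` are all illustrative assumptions. Once the source has been reduced to tokens, the only way back to text is to write out a fresh document, so the original layout cannot be recovered.

```python
import re

# Toy grammar (an assumption, not MSHTML's): a token is either a tag or a run of text.
TOKEN_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)([^>]*)>|([^<]+)")


def tokenize(html: str) -> list[tuple[str, str, str]]:
    """Split markup into (kind, value, attributes) tokens.

    Whitespace-only runs between tags are dropped, runs of whitespace inside
    text and tags are collapsed, and no token records its source offset.
    """
    tokens = []
    for m in TOKEN_RE.finditer(html):
        if m.group(2):
            name = m.group(1) + m.group(2).lower()  # "/td" for a closing tag
            attrs = " ".join(m.group(3).split())    # spacing inside the tag is lost
            tokens.append(("tag", name, attrs))
        else:
            text = " ".join(m.group(4).split())     # collapse whitespace in text
            if text:                                # whitespace-only runs vanish entirely
                tokens.append(("text", text, ""))
    return tokens


def serialize(tokens: list[tuple[str, str, str]]) -> str:
    """Write tokens back out; the only layout available is a canonical one."""
    out = []
    for kind, value, attrs in tokens:
        if kind == "tag":
            out.append(f"<{value} {attrs}>" if attrs else f"<{value}>")
        else:
            out.append(value)
    return "".join(out)


source = """<TABLE>
    <TR>
        <TD  Width="100">Hello   world</TD>
    </TR>
</TABLE>"""

print(serialize(tokenize(source)))
# <table><tr><td Width="100">Hello world</td></tr></table>
```

The indented table comes back as a single line with lowercased tags. Tokenizing that output again yields exactly the same tokens, which is the whole problem: the tokens carry nothing that distinguishes the original layout from the rewritten one.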

VS 2003 actually has a piece of code that attempts to match the new HTML to the old one, and in many cases it does a relatively good job. There are many cases, though, when it doesn't. The same piece of code exists in VID6 and VS 2002; it was improved between VID6 and VS 2002 but remains pretty much unchanged in VS 2003. VS 2002/2003 mitigate the issue somewhat by applying pretty formatting each time you switch views. The problem is that the formatter has very few options that you can customize. You can switch formatting off in Tools | Options, but that will not solve the underlying issue; it will only switch off pretty formatting. You can ease your pain a bit by installing HTML Tidy or any other highly customizable third-party HTML code formatter and tweaking it to your taste. You can then hook it up to VS so that the HTML Editor uses your formatter instead of the internal one. Have a look at this article. Todd works in my team, btw. Even though VS will continue reformatting your code on every view switch, it will be doing so to your taste, so you may perceive that the formatting does not actually change.
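Why a well-tuned external formatter makes the churn invisible: a formatter with fixed settings is idempotent, so running it on text it already produced changes nothing. Below is a toy Python pretty-printer that demonstrates the property. It is not HTML Tidy and not the VS formatter; `pretty`, `INDENT` and the nesting rule are hypothetical, and it ignores void elements such as `<br>`, comments and `<!DOCTYPE>`, which a real formatter has to handle.

```python
import re

INDENT = "  "  # hypothetical user preference: two-space indent
PIECE_RE = re.compile(r"<[^>]+>|[^<]+")  # a tag, or a run of text between tags


def pretty(html: str) -> str:
    """Put each tag or text run on its own line, indented by nesting depth."""
    lines, depth = [], 0
    for piece in PIECE_RE.findall(html):
        piece = piece.strip()
        if not piece:
            continue  # whitespace between tags is regenerated, not preserved
        if piece.startswith("</"):
            depth = max(depth - 1, 0)
        lines.append(INDENT * depth + piece)
        if piece.startswith("<") and not piece.startswith("</") and not piece.endswith("/>"):
            depth += 1  # toy rule: every other opening tag nests
    return "\n".join(lines)


once = pretty("<table><tr><td>Hi</td></tr></table>")
print(once)
assert pretty(once) == once  # formatting already-formatted text is a no-op
```

If your pages are already written in the style your formatter produces, each view switch regenerates text identical to what was there before, even though the file is technically rewritten every time.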

We deliberately decided to stop tweaking the old code, so we changed pretty much nothing in VS 2003. Instead, we abandoned the old approach and invested our time and resources into developing a completely different piece of code that should be able to preserve user formatting all the time, no matter what. The result is what you see in Whidbey. We hope you like it better. The basic idea was to stop trying to restore formatting in the new HTML and instead detect incremental changes. In Whidbey we never directly use the HTML that MSHTML outputs. Instead, we transfer changes from it to your document. Therefore, if you changed only one attribute, only its new value will be applied to the original file, and everything else will be left alone. You can now probably guess that this cannot be retrofitted into VS 2003, since it would require significant changes in the code base, well beyond what is typically acceptable for service packs.
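A minimal Python sketch of the incremental idea, under heavy assumptions: the before and after attribute values arrive as plain dictionaries (standing in for the element trees the editor produces), only double-quoted attributes on a single element are handled, and added or removed attributes are ignored. `value_spans` and `transfer_changes` are hypothetical names. This is not the Whidbey code, only an illustration of patching the user's text in place instead of replacing it.

```python
import re

# Simplified, hypothetical grammar: name="value" pairs with double quotes only.
ATTR_RE = re.compile(r'([A-Za-z_:][-\w:.]*)\s*=\s*"([^"]*)"')


def value_spans(source: str) -> dict[str, tuple[int, int]]:
    """Record where each attribute value sits in the user's original text."""
    return {m.group(1).lower(): (m.start(2), m.end(2)) for m in ATTR_RE.finditer(source)}


def transfer_changes(source: str, old: dict[str, str], new: dict[str, str]) -> str:
    """Copy only the attribute values that changed between old and new into source."""
    spans = value_spans(source)
    edits = [(spans[name], value) for name, value in new.items()
             if name in old and old[name] != value and name in spans]
    # Patch from the end of the text backwards so earlier offsets stay valid.
    for (start, end), value in sorted(edits, reverse=True):
        source = source[:start] + value + source[end:]
    return source


original = '<TD   Width="100"\n    Class="cell">'
updated = transfer_changes(original,
                           {"width": "100", "class": "cell"},
                           {"width": "200", "class": "cell"})
print(updated)
```

Only the characters of the changed value are replaced. The uppercase tag name, the extra spaces and the line break survive because the original text is never regenerated from the tree.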

Now the last question you might have: why couldn't the IE team fix the issue? They could, but it would hurt browsing performance and make the HTML element tree much larger, since it would need to store all the whitespace information. There are also multiple issues with whitespace, such as figuring out where it belongs, whether it should be removed when an element is deleted, and whether it should move with the element when the element moves in the tree. Since download size and page-opening speed were very important, the idea did not get through.