HTML editor based on Gecko

This weekend I downloaded NVU, an HTML editor based on the Gecko engine. I played with it a bit and quickly figured out that it exhibits the same problems as editors based on MSHTML: it does not preserve user formatting, and its output is not XHTML-compliant. This confirms my earlier statement that it is difficult to build an HTML editor on top of a browser engine without extensive additional measures, since browsers don’t care how the user formatted the code.

The default editing mode of a browser engine typically targets e-mail clients. E-mail does not need to be XHTML-compliant, and neither does it need to preserve the user's HTML formatting, since e-mail authors rarely want to edit the markup. For that very reason, browser-based editors have no natural means of editing external stylesheets: HTML e-mail should not contain them. In fact, even inline styles may not be rendered correctly, so it is safer to output HTML 4.0 with formatting tags than XHTML with external stylesheets.
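The parsing side of this can be illustrated with a minimal Python sketch. It uses the standard library's html.parser as a stand-in for a browser engine (it is not Gecko or MSHTML) and records the events a DOM builder would receive. Two hypothetical sources that differ only in tag case, attribute quoting, spacing inside the tag and the &quot; character reference produce identical events, so an editor that regenerates markup from its document tree has no record of which form the author typed.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Records the parse events a DOM builder would see; source formatting is not kept."""

    def __init__(self):
        # convert_charrefs=True decodes &quot;, &#97; etc. into plain text.
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; the quoting style is gone.
        self.events.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def parse_events(source):
    parser = EventRecorder()
    parser.feed(source)
    parser.close()
    return parser.events


# Hypothetical markup: the same content written two different ways.
hand_written = '<P   CLASS=note>Click &quot;Done&quot;</P>'
tool_written = '<p class="note">Click "Done"</p>'

print(parse_events(hand_written) == parse_events(tool_written))  # True
```

Whatever serializer writes the tree back out has to pick one of these spellings, which is why formatting preservation has to be built on top of the engine rather than expected from it.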

Comments (15)

  1. orka says:

    Yes, there is no XHTML output, but the HTML 4.0 output is correct HTML. You can validate it at … Furthermore, Internet Explorer 6.0 can’t parse correct XHTML code (e.g. <object> and java:). As long as there is no new, spec-conformant version of Internet Explorer, you have to use HTML 4.0 or another browser.

  2. Visual Web Developer 2005 has HTML, XHTML, XML & CSS 1, 2, 2.1 validation built in.

  3. Of course it does; my team owns that part of it 😉

  4. Ovidiu says:

    I have two questions:

    1. Why do you have to build a designer based on a browser? I really don’t think it would have been impossible for you to make ASPX pages _pure_ XML. Yes, with CDATA sections for script and everything. This would have made everybody happy – developers who don’t want their code screwed up, developers who want their pages to be automatically validated against a DTD or an XML Schema, developers who just want to drop the schema in some folder and automagically be able to use it, developers who want XHTML pages (or even standards-compliant HTML pages).

    2. OK, this is *BY DESIGN* (and Won’t Fix, I assume) for VS 2003. I read your previous posts and I got the reasons; all clear. But even so, how come VS 2003 even screws up &quot; (and other escapes) and transforms them into " and such in the HTML source? Doesn’t this make either VS or MSHTML a bit… buggy?
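Ovidiu's first suggestion, ASPX pages as pure XML with script inside CDATA sections, can be sketched with Python's xml.etree.ElementTree. The page below is hypothetical: urn:example:asp is a made-up namespace URI for the asp: prefix, not one ASP.NET defines. The check covers XML well-formedness only; validating against a DTD or XML Schema, as the comment proposes, would need a separate validator.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(markup):
    """Return True if markup parses as XML (well-formedness only, no DTD/schema check)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical page: "urn:example:asp" is an illustrative namespace URI.
page = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:asp="urn:example:asp">
  <body>
    <script type="text/javascript"><![CDATA[
      if (count < 10 && ready) { go(); }
    ]]></script>
    <asp:Button id="Done" Text="Done" />
  </body>
</html>"""

root = ET.fromstring(page)
# Element names are namespace-qualified as {uri}localname.
script = root.find(f"{{{XHTML_NS}}}body/{{{XHTML_NS}}}script")

print("count < 10" in script.text)             # True
print(is_well_formed("<p>Click <b>Done</p>"))  # False: mismatched tag
```

The CDATA section lets the script contain < and && unescaped, while a mismatched tag is rejected outright, which is the kind of guarantee the comment is asking for.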

  5. The answer to #1 is that customers want WYSIWYG editor and that means the editor has to render the page the same way browser does. Alternative is to reimplement rendering engine, that is what FrontPage and DreamWeaver do. This path is quite expensive. As for ASP.NET pages as pure XML, I don’t necessarily disagree, but we have to consider the migration path and the time people have spent learning classic ASP.

    As for #2, bug is a relative term :-). In the example you have given, page rendering does not change, so technically it is not a bug. Also, if you use UTF-8, you may not have to escape special characters at all; you can just type them. VS 2005 will not necessarily preserve escaped symbols either, sorry. So far we haven’t seen any complaints on the MSDN Product Feedback site, but we are still gathering feedback. If it is important to you, please report the issue.
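Mikhail's point about UTF-8 can be shown with Python's string encoding. The sample text and the choice of ASCII as the alternative encoding are assumptions for illustration; the xmlcharrefreplace error handler stands in for an editor that has to escape characters its output encoding cannot hold.

```python
import html

text = "Café “Done”"

# Saved as UTF-8, every character is stored directly; nothing needs escaping.
utf8_bytes = text.encode("utf-8")

# Saved as plain ASCII, characters outside the encoding must be written
# as numeric character references instead.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))  # Caf&#233; &#8220;Done&#8221;

# Either way, decoding the references yields the same text.
print(html.unescape(ascii_bytes.decode("ascii")) == utf8_bytes.decode("utf-8"))  # True
```

Both files display identically, which is also why an editor that normalizes one form into the other does not change what the reader sees.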

  6. Ovidiu says:

    Thanks for the reply, Mikhail. However, I disagree with you on both matters:

    1. As far as I remember, FrontPage has three views – a designer view, implemented with its own rendering, a source view and a "what you get" view. Of course, the "what you get" view is easy to implement in FrontPage since it deals with static HTML.

    Honestly, I would have preferred a similar approach in Visual Studio – the designer view would merely be a canvas for me to drop controls on and to change their properties. No fancy rendering, just a basic idea of how the server-side control properties would affect the look and feel. That _can_ be done by hand (for example, don’t tell me that MSHTML interprets <asp:stuff> elements on its own – you render those controls anyway).

    The "what-you-get" view could be based on MSHTML and would be obtained by rendering the server-side controls to HTML and then sending the results to MSHTML. More importantly, this view would be read-only, in the sense that the developer wouldn’t be allowed to change anything, just preview; also, MSHTML wouldn’t be allowed to make changes on its own.

    Of course, this is just my fantasy world, since the issue seems to be dead and buried for Microsoft.

    2. If you want to get " in the rendered page, you’re supposed to have &quot; in the source. It’s called an escape sequence. Just because IE allows " to appear in the source HTML where &quot; was expected doesn’t mean that the source HTML is valid. Initially, when I saw that VS kept changing &quot; to ", I thought it might be applying HTMLEscape at the very end of the rendering process, but checking the HTML the browser received proved otherwise.

    And honestly, I think people don’t complain about it because they ignore this issue and similar ones (browsers usually deal quietly with malformed HTML) or because they have workarounds, not because they’re happy with the feature.

  7. Mikhail Arkhipov (MSFT) says:

    I am not sure what you mean by ‘dead issue’ 🙂 What you have described is close to how the designer actually works. ASP.NET controls do render to design-time HTML, and MSHTML does not change it… As for developers, many things are too tedious to edit in markup (data-bound controls and templates come to mind), so I doubt the average developer would support a read-only view 😉 The new Whidbey ‘chrome’ (i.e. the tasks menu) on server controls in Design view is a very popular feature.

    As for escapes, we in fact preserve them 99% of the time (I just edited a simple page in VS 2005 with &quot; in the text, and it stayed escaped). What I meant was *exact, 100%* preservation. Some cases are too ambiguous. Example:


    Switch to Design view, select the first two characters, then copy and paste them elsewhere. What you’ll get in the source is ‘aa’, not ‘a&#97;’.
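The ambiguity can be reproduced with Python's html module, used here as a stand-in rather than the VS designer itself. Once a&#97; and aa are decoded they are the same string, and when text is written back a serializer applies one fixed escaping policy, shown below for the quote case from the earlier comments.

```python
import html

# Two different source spellings of the same text.
sources = ["a&#97;", "aa"]

# After parsing, the editor only holds the decoded characters.
decoded = {html.unescape(s) for s in sources}
print(decoded)  # {'aa'}

# Writing text back applies a single policy; it cannot recover
# which spelling the author originally typed.
text = 'Click "Done"'
print(html.escape(text, quote=True))   # Click &quot;Done&quot;
print(html.escape(text, quote=False))  # Click "Done"
```

Either policy produces markup that renders the same text; neither can round-trip every original spelling, which is the "exact, 100%" preservation problem.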

  8. Ovidiu says:

    By "dead issue", I mean a problem that is alive and kicking but seems to be ignored, with no hope of a real fix 🙂

    I have the &quot; problem constantly in my projects. When I write <quote>Click "Done" when finished</quote>, the source view should contain <sourcequote>Click &quot;Done&quot; when finished</sourcequote>. However, it keeps getting changed to <sourcequote>Click "Done" when finished</sourcequote>.

    I’ll investigate whether it’s a configuration-specific issue. I’ll also test it on the Whidbey beta. If I’m able to reproduce it consistently, I’ll submit it as a bug.

    Thanks for your time.

  9. Mikhail Arkhipov (MSFT) says:

    I think I got lost. Which problem? That MSHTML does not preserve user formatting? But, in fact, it shouldn’t. The blog post is about the fact that Gecko does not do it either, which means there are good reasons why browsers don’t (performance is one of them). It is convenient, but difficult, to build designers based on browsers. I don’t think browsers should be fixed, though :-).

  10. Ovidiu says:

    The problem is that Visual Studio is unable to emit not just XHTML, but even valid HTML, and using MSHTML in the designer is a part of that (not the whole story, but an important part).

    It’s quite obvious that a browser rendering engine shouldn’t care about whitespace and pretty code formatting, but then don’t use it in the designer.

    Which brings us back to where we’ve been before: please build a good designer. From here on we’re in an infinite loop.

  11. Darrel says:

    <em>The answer to #1 is that customers want WYSIWYG editor and that means the editor has to render the page the same way browser does.</em>

    Hmm… DW seems to be a WYSIWYG editor AND it doesn’t touch any of my own markup.

    <em>Alternative is to reimplement rendering engine, that is what FrontPage and DreamWeaver do. This path is quite expensive. </em>

    Don’t you have access to the FrontPage code? As for being very expensive, MS seems to have a bit more cash on hand than Macromedia. You’d think Bill could find some loose change in his couch to pay a few developers to handle this, don’t ya think?

    I love the concept and back-end functioning of .NET. My single biggest complaint with it is that I simply can NOT produce a web site that uses standards-compliant, valid, semantically correct, accessible markup UNLESS I write completely custom replacement controls for every built-in web control in… which doesn’t seem terribly practical.

    I wanted to make a two-column DataList. If I have an odd number of records, I end up with a row containing only one TD. This is just one example of the simple and obvious markup errors .NET produces. Very frustrating.
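One way a renderer could avoid the short last row is to pad it with empty cells so every row has the same number of TD elements. The render_grid helper below is hypothetical, a sketch of the idea in Python, not the ASP.NET DataList implementation.

```python
from html import escape


def render_grid(items, columns=2):
    """Hypothetical helper: render items into a table with `columns` cells per row."""
    rows = []
    for start in range(0, len(items), columns):
        cells = [f"<td>{escape(str(item))}</td>" for item in items[start:start + columns]]
        # Pad a short final row with empty cells so every row has the same width.
        cells += ["<td></td>"] * (columns - len(cells))
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


print(render_grid(["Ann", "Bob", "Cy"]))
# <table><tr><td>Ann</td><td>Bob</td></tr><tr><td>Cy</td><td></td></tr></table>
```

Padding is one choice; giving the last cell a colspan would be another. Either keeps the column count consistent across rows.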

  12. Mikhail Arkhipov (MSFT) says:

    Unfortunately, there is no such thing as ‘loose change’. Company divisions do not have access to an unlimited supply of money; there is a budget for every one of them. After all, Microsoft is a public company and has to be responsible to its shareholders.

    In my personal opinion, having money is not a reason to start spending it left and right. At least I personally don’t rush to buy something just because I can suddenly afford it 🙂

    Even with access to another application’s code, it is not necessarily easy to reuse it. Sometimes it may be as hard as writing your own version, since the architectures of large applications may be vastly different, which yields incompatible code.

    As for controls generating non-standards-compliant HTML, a significant amount of work was invested in Whidbey. If you still see issues in VS 2005, please file them on the MSDN Product Feedback site.

  13. Derek Williams says:

    Well, I have Whidbey, but since it is not in production yet, I am not permitted to use it for any of our production code. Silly management.