Word Blog HTML Quality

Every post you make as a blogger can be a huge learning experience. In Friday's post on the new Word 2007 blog post authoring feature, I made a fairly modest claim that the HTML emitted by the feature would be better than the standard HTML from previous versions of Word. Well, it is, but I should have looked at the source myself before I told everyone else to do so. There were a few problems:

  • Blog service vagaries

A number of issues were introduced by Community Server (the blog system on which blogs.msdn.com is built). The uppercase tags are the main example. Readers also pointed out several issues with the site's template. The template is one of several standard CS templates and, of course, has nothing to do with the HTML emitted by the blog feature; the ID attributes with strange values, for example, come from the template.

  • My stupid HTML mistakes

As I mentioned, I hand-coded the image tags, and I made a couple of stupid HTML mistakes. Luckily, I will not be shipping in the box with the feature, and our developers will make sure it outputs those tags correctly.

  • Real problems we need to address

Several people also made great suggestions for improvements that we want to look into, such as using <del> for strikethrough and making sure tag content follows the proper flow.

Goals

Most importantly, I'd like to lay out the goals for the HTML output.

  • We will hand off valid XHTML for each post

We can't be held responsible for what Blogger, Spaces or anyone else does to the XHTML after we give it to them, but we'll send it to them as valid XHTML.

  • Clean HTML is more important than visual fidelity

This is a huge change for Word. Our focus has always been ensuring (as much as possible) that the HTML we output would allow a full round trip of all the content and formatting in your document. The blog feature is all about representing what we can in a clean way, without requiring any special action or decision from the post author.
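The first goal above, valid XHTML, is something a blogger can partly check before publishing. Below is a minimal sketch, not Word's actual code, assuming Python 3.9+ and the standard library's xml.etree.ElementTree; the function name check_xhtml_fragment is hypothetical. It only tests well-formedness and lowercase element and attribute names (XHTML requires both, which is exactly what uppercase tags and unclosed hand-coded <img> tags break). It does not validate against the XHTML DTD, and because no DTD is loaded, named HTML entities such as &nbsp; will be reported as parse errors.

```python
import xml.etree.ElementTree as ET


def check_xhtml_fragment(fragment: str) -> list[str]:
    """Return a list of problems found in an XHTML post-body fragment.

    Hypothetical helper: checks only XML well-formedness and lowercase
    element/attribute names. This is NOT full validation against an XHTML DTD.
    """
    problems = []
    # Wrap in a single root element so a fragment with several
    # top-level elements (e.g. two <p> paragraphs) still parses.
    try:
        root = ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError as err:
        # Unclosed tags such as <img src="a.png"> end up here.
        return [f"not well-formed: {err}"]
    for elem in root.iter():
        # XHTML element names must be lowercase.
        if elem.tag != elem.tag.lower():
            problems.append(f"uppercase element name: <{elem.tag}>")
        # XHTML attribute names must be lowercase too.
        for name in elem.attrib:
            if name != name.lower():
                problems.append(f"uppercase attribute name: {name} on <{elem.tag}>")
    return problems


if __name__ == "__main__":
    print(check_xhtml_fragment("<p>Hello <del>old</del> new</p>"))
    print(check_xhtml_fragment("<P>Hello</P>"))
```

An empty list means the fragment passed these two checks only; a real validator would also catch things like disallowed nesting.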

Suggestions welcome

With these two goals in mind, I would like to announce that we will post the details of our XHTML output for public comment. The manner in which we do this (blog, discussion list, wiki, or something else) will be announced early next week. We can't promise that we will respond to every suggestion, but we will seriously consider them all.