XHTML in Word 2007's blogging tool

Today we have a guest writer to discuss the HTML output of the new blogging functionality in Word 2007. His name is Zeyad Rajabi and he's a program manager on the Word team. Zeyad works on file-format-related issues, including the HTML support in Word. All of Zeyad's posts will be under the "Word HTML" category if you are interested in tracking those separately.

As some of you may know from Joe Friend’s blog, Word 2007 will allow users to author blogs straight from Word. I want to follow up on Joe’s blog by giving you guys more details concerning our XHTML output for the blogging feature. I hope to use this blog as an opportunity for you to comment on our blogging XHTML output and to make any suggestions.

Goals

Before I get into details about our XHTML output, I want to outline the goals for our blogging feature. The design goals behind the XHTML output from the blog tool are significantly different from what we’ve done in the past:

  • Output XHTML-compliant code for each post (we are following the W3C spec)
  • Output clean and readable XHTML

Instead of concentrating on supporting 100% of Word’s features (as we did in the past), the blog feature will support a much smaller set of features and focus on outputting clean and readable XHTML. The blog feature will output only the XHTML needed to represent the document. No more redundant HTML or CSS. No more Microsoft Office-specific CSS properties. We will output just clean and easy-to-read XHTML.
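As a rough illustration of the "no Office-specific CSS" goal, here is a minimal Python sketch that scans a fragment's style attributes for such properties. It assumes that Office-specific properties are the ones prefixed with `mso-` (as in Word's older HTML export); the function name and the sample fragments are made up for this example and are not output from the blogging tool.

```python
import xml.etree.ElementTree as ET


def office_specific_properties(fragment):
    """Return CSS property names starting with 'mso-' found in style attributes."""
    # Wrap the fragment so it parses as one well-formed XML document.
    root = ET.fromstring(f"<div>{fragment}</div>")
    found = []
    for element in root.iter():
        style = element.get("style", "")
        for declaration in style.split(";"):
            # The property name is everything before the first colon.
            name = declaration.split(":", 1)[0].strip().lower()
            if name.startswith("mso-"):
                found.append(name)
    return found


# Hypothetical fragments, for illustration only.
legacy = '<p style="mso-margin-top-alt:auto;color:red">text</p>'
clean = '<p><span style="color:red">text</span></p>'

print(office_specific_properties(legacy))  # ['mso-margin-top-alt']
print(office_specific_properties(clean))   # []
```

Because the sketch uses a plain XML parser, it only works on well-formed fragments without HTML named entities such as `&nbsp;`.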

Known Beta 2 Issues

There are still some known bugs in the XHTML output for Beta 2. I wanted to point them out so that you aren’t surprised:

  • Strikethrough - We are outputting CSS property text-decoration for strikethrough instead of <del>
  • Divs around lists - We are outputting div tags for every list item. We do not need to output these extra elements
  • Block level elements within inline elements - We are not XHTML compliant in some cases because we are not following proper tag content flow. We are outputting block-level elements inside inline elements.
  • Multi-level lists - Our output for multi-level lists is not XHTML compliant: we are closing parent lists before their sub lists are closed
  • Table bloat - Our XHTML output for tables is too heavy and contains too much redundancy
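Two of the issues above (block-level elements inside inline elements, and lists closed before their sub lists) can be checked mechanically. The following Python sketch is one way to do that, under some assumptions: the `INLINE` and `BLOCK` sets are partial and chosen only for this example, the sample fragments are hypothetical rather than actual Beta 2 output, and well-formedness is a necessary but not sufficient condition for valid XHTML.

```python
import xml.etree.ElementTree as ET

# Partial element sets, assumed for illustration only.
INLINE = {"span", "a", "em", "strong", "u", "del"}
BLOCK = {"p", "div", "ul", "ol", "li", "blockquote", "table"}


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


def block_inside_inline(fragment):
    """Return (inline_tag, block_tag) pairs where a block element has an inline ancestor."""
    root = ET.fromstring(f"<div>{fragment}</div>")
    problems = []

    def walk(element, inline_ancestor):
        for child in element:
            if inline_ancestor and child.tag in BLOCK:
                problems.append((inline_ancestor, child.tag))
            # Remember the nearest inline ancestor while descending.
            walk(child, child.tag if child.tag in INLINE else inline_ancestor)

    walk(root, None)
    return problems


# The sub list is closed inside its parent <li> before that item closes.
good_list = "<ul><li>one<ul><li>sub</li></ul></li><li>two</li></ul>"
# Hypothetical mis-nesting: the parent item is closed before the sub list.
bad_list = "<ul><li>one<ul><li>sub</li></li></ul></ul>"

print(is_well_formed(good_list))                         # True
print(is_well_formed(bad_list))                          # False
print(block_inside_inline("<span><p>text</p></span>"))   # [('span', 'p')]
print(block_inside_inline("<p><span>text</span></p>"))   # []
```

A full check would validate against the XHTML DTD; this sketch only catches the two patterns described in the list.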

I am sure there are more bugs to be found and I’m sure you guys will help me add to the list! As you play with the blogging feature, please feel free to send me any questions or suggestions you have. I want to make this feature great for all of us.

XHTML Output

There is too much to discuss in this first post, so I think I’ll break down the XHTML output into multiple categories: formatting, styles, lists, images and tables. I’ll have a separate post for each category so we can have some more targeted discussions. Another thing that I was thinking about doing was pulling all of this together as a public spec that I can post. Again, I would love for you to send me suggestions on any or all of the categories.

For those interested, take a look at the source code the blogging tool generated for this post (note that it's only the contents that would go inside the <body> element).

Formatting

Today let’s look at some details around the XHTML we output for formatting features:

Feature          XHTML
-------          -----
Hyperlinks       <a href="https://www.foo.com" target="_blank" title="Tip">hyperlink</a>
Font             <span style="font-family:XXXXXX;">text</span>
Font Size        <span style="font-size:28pt">text</span>
Font Color       <span style="color:XXXXXX">Colored text</span>
Bold             <strong>text</strong>
Italic           <em>text</em>
Underline        <u>text</u>
Strikethrough    <del>text</del>
Highlighter      <span style="background-color:XXXXXX">text</span>
Alignment        <p align="left">text</p>
                 <p align="right">text</p>
                 <p align="center">text</p>
Indent           <blockquote>text</blockquote>
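To make the feature-to-markup mapping above concrete, here is a small Python sketch that renders a few of the listed features from templates and confirms each result is well-formed. It is only an illustration, not how Word produces its output: the `TEMPLATES` dictionary, the `render` function, and the `{text}` and `{value}` placeholders are all introduced for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Templates copied from the feature list; the placeholders are assumptions of this sketch.
TEMPLATES = {
    "bold": "<strong>{text}</strong>",
    "italic": "<em>{text}</em>",
    "underline": "<u>{text}</u>",
    "strikethrough": "<del>{text}</del>",
    "font_size": '<span style="font-size:{value}">{text}</span>',
    "font_color": '<span style="color:{value}">{text}</span>',
    "highlight": '<span style="background-color:{value}">{text}</span>',
    "indent": "<blockquote>{text}</blockquote>",
}


def render(feature, text, value=""):
    """Fill in the template for one feature and check that the result is well-formed."""
    markup = TEMPLATES[feature].format(
        text=escape(text),                      # escapes &, < and >
        value=escape(value, {'"': "&quot;"}),   # also escapes quotes inside the attribute
    )
    ET.fromstring(markup)  # raises xml.etree.ElementTree.ParseError if not well-formed
    return markup


print(render("bold", "R&D"))                # <strong>R&amp;D</strong>
print(render("font_size", "text", "28pt"))  # <span style="font-size:28pt">text</span>
```

Because `render` escapes its text argument, this sketch does not support nesting one feature inside another; a real generator would build a tree rather than format strings.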

Suggestions are Welcome

I know there are a couple of different approaches for each of these. If you disagree with ours, let me know. I’ve read a lot of differing opinions on some of these (especially indentation), so while we probably won’t get everyone to agree 100%, hopefully we can find the best approach.

Anything missing? Is there a better way of representing a feature in XHTML?