Word XHTML - Compliance and Styles

This is the second post in a series by Zeyad Rajabi, a program manager working on the XHTML output used in Word's new blogging feature.

My first post gave a brief introduction to our XHTML output for the blogging feature in Word 2007. This post outlines the styles we output.

Goals

In my last post I said we wanted to be XHTML compliant by the time we ship this blogging feature. Today I want to be a little clearer about what I mean by that: Strict vs. Transitional.

Working on Word, I have come to understand Word’s HTML and CSS capabilities. Word supports only a subset of the standard HTML 4.0 specification and, similarly, only a subset of the standard CSS 1.0 specification. Yes, you read that correctly: CSS 1.0. For the most part, the feature set we offer within the blogging tool allows us to output CSS properties that Word supports and can render correctly. However, there are a few cases where we cannot output the CSS properties that XHTML Strict compliance would require, because Word would not be able to read them back in. Word basically ignores any HTML and CSS it does not recognize, and one of our goals was that blog posts could still be edited in Word after they are published.

What does that mean for our output?

At a minimum, our goal is for our output to always validate as XHTML 1.0 Transitional. A basic blog post will validate as XHTML 1.0 Strict. For posts that use features where we cannot output CSS that Word supports, our aim is to be XHTML 1.0 Transitional compliant.

Word can certainly output any HTML or CSS, but the issue then is roundtripping: the ability to generate HTML or CSS that Word can read back in correctly. An obvious question is why Word can't just add the functionality to read those additional properties back in. That would be great, but we are on a limited budget, and it would have meant taking away other features that we have prioritized higher. Because of this, we have to perform a fine balancing act: roundtripping vs. XHTML output.

XHTML Style Output

Feature          XHTML CSS Property              HTML Element
Font             color                           span
Font             font-family                     span
Font             font-size                       span
Font             text-decoration:line-through    span
Font             text-decoration:underline*      span
Block            text-align*                     p
Block            text-indent*                    p
Background       background-color                span
Box              margin-left*                    p
Table Padding    padding-top                     td
Table Padding    padding-left                    td
Table Padding    padding-bottom                  td
Table Padding    padding-right                   td
Table Borders    border-collapse:collapse        table
Table Borders    border-top                      td
Table Borders    border-left                     td
Table Borders    border-bottom                   td
Table Borders    border-right                    td
Position         width                           col

CSS properties marked with * are ones we will start outputting after Beta 2.
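To make the list above more concrete, here is a rough sketch of what markup using these properties could look like. It assumes the styles are written as inline style attributes, which this post does not actually state, and every value (colors, font name, sizes, border widths, cell text) is made up for illustration. It is not captured Word output, and it only uses properties that are not marked with *.

```html
<!-- Hypothetical example; all values are illustrative, not real Word output -->
<p>
  <!-- Font properties go on span -->
  <span style="color: #336699; font-family: Verdana; font-size: 10pt">Styled text</span>
  <!-- Background color also goes on span -->
  <span style="background-color: #ffff00">highlighted text</span>
</p>

<!-- border-collapse goes on table, width on col, padding and borders on each td -->
<table style="border-collapse: collapse">
  <col style="width: 120pt" />
  <tr>
    <td style="padding-top: 0pt; padding-left: 5pt; padding-bottom: 0pt; padding-right: 5pt; border-top: 1pt solid #000000; border-left: 1pt solid #000000; border-bottom: 1pt solid #000000; border-right: 1pt solid #000000">Cell text</td>
  </tr>
</table>
```

Because every cell carries its own four padding and four border declarations, even a one-cell table gets long quickly, which gives a feel for how the table output can grow.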

An interesting property that is missing is float. Unfortunately, Word does not understand that CSS property. Instead, we will use the HTML attribute align, which is allowed in XHTML Transitional but not in Strict, so posts with that type of content will be XHTML Transitional compliant. We could output float, but if the post were ever read back into Word, that property would be ignored and the image would no longer float.
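As a small illustration of that trade-off, the two fragments below show the same floated image written both ways. The file name and alt text are placeholders, and the fragments are a sketch of the two options rather than actual Word output.

```html
<!-- What we output: the align attribute. Valid in XHTML 1.0 Transitional,
     not in Strict, and Word can read it back in. -->
<img src="photo.jpg" alt="Sample photo" align="left" />

<!-- The Strict-friendly alternative: the CSS float property. Word would
     ignore it on the way back in, so the image would stop floating. -->
<img src="photo.jpg" alt="Sample photo" style="float: left" />
```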

Another interesting thing to point out is the styles we output for tables. As you can tell, our HTML output for tables in Beta 2 is quite bloated. The list of supported styles above will certainly be updated as we get closer to release. I will post the complete spec of Word's XHTML support at a later date.

Suggestions are welcome

Anything missing?