Open XML WordprocessingML Style Inheritance (Post #4)

When working with WordprocessingML, nearly all of the information that we need to render paragraphs, tables, and numbered items is contained in styles, which are stored in the WordprocessingML Style Definitions part.  Styles are somewhat complicated because they support inheritance: one style can be based on another style.  Rendering text that has a derived style therefore depends on the derived style, its base style, that base style's base style, and so on.  The Open XML specification refers to this list of styles that are derived from other styles as the 'style chain', which accurately describes the abstraction.

This is one in a series of posts on transforming Open XML WordprocessingML to XHtml.  You can find the complete list of posts here.

When determining the set of properties for rendering a paragraph or table, the first job is to 'roll up' all styles in the style chain, creating a single set of properties that we can apply to the paragraph or table.  This process of 'rolling up' styles is made somewhat more complicated because there are four variations of semantics that we must apply to elements in the rolling-up process.

However, it's not too complicated, and after carefully defining the semantics of 'rolling up' styles in the style chain, we can write a small bit of generalized code to do this, probably fewer than 100 lines of code.
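
Before any properties can be rolled up, the style chain itself has to be collected by following each style's w:basedOn element.  Here is a minimal sketch of that step in Python with the standard xml.etree.ElementTree module.  The helper names q and style_chain are hypothetical, the sample w:styles fragment is trimmed down for illustration, and the loop guard against circular w:basedOn references is my own assumption rather than something this post specifies.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def q(local):
    """Return the Clark-notation name ElementTree uses for a w: name."""
    return f"{{{W}}}{local}"

def style_chain(styles_root, style_id):
    """Return the styles in the chain, ordered from most-base to most-derived."""
    by_id = {s.get(q("styleId")): s for s in styles_root.iter(q("style"))}
    chain, seen = [], set()
    current = style_id
    # Stop at a missing style, at a style with no w:basedOn, or on a cycle.
    while current in by_id and current not in seen:
        seen.add(current)
        style = by_id[current]
        chain.append(style)
        based_on = style.find(q("basedOn"))
        current = based_on.get(q("val")) if based_on is not None else None
    chain.reverse()
    return chain

# Trimmed-down sample; a real styles part carries many more elements.
SAMPLE = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Normal"/>
  <w:style w:type="paragraph" w:styleId="SpaceBefore">
    <w:basedOn w:val="Normal"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="SpaceBeforeAndAfter">
    <w:basedOn w:val="SpaceBefore"/>
  </w:style>
</w:styles>
""".strip()

root = ET.fromstring(SAMPLE)
chain_ids = [s.get(q("styleId")) for s in style_chain(root, "SpaceBeforeAndAfter")]
print(chain_ids)  # ['Normal', 'SpaceBefore', 'SpaceBeforeAndAfter']
```

Returning the chain base-first means a roll-up can simply fold over the list, applying each more-derived style on top of the result so far.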

You'll notice something about the semantics of style inheritance: by far the most common operation when rolling up styles is to replace an element in a base style with the corresponding element in a derived style.  In the code that I'm going to write to roll up styles, element replacement will be the default behavior whenever the inheritance semantics are anything other than merging attributes or merging child elements.  This will make the code as small and robust as possible.

This post probably isn't of very much interest to most people, but to the folks who are interested, it will be very important.  I'm in the process of writing a fairly compact conversion of Open XML to XHtml, and needed to work out the exact behavior of style inheritance.  After working it out, it made good sense to blog it to make life easier for others who need to work with rendering issues of WordprocessingML.

Merging Attributes

In some cases, we must iterate through the attributes of a particular element, and if the element in the derived style has an attribute, we must apply that attribute, overriding the attribute in the base style.  In many cases, the base style does not define that particular attribute, and we simply add the attribute to the element in the rolled-up style.  For example, we may have a style, SpaceBefore, which specifies space before the paragraph but no space after:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="SpaceBefore">
  <w:name w:val="SpaceBefore"/>
  <w:basedOn w:val="Normal"/>
  <w:qFormat/>
  <w:rsid w:val="00A670C6"/>
  <w:pPr>
    <w:spacing w:before="200"
               w:after="0"/>
  </w:pPr>
</w:style>

We may then have a style, SpaceBeforeAndAfter, which is based on SpaceBefore and defines the w:spacing element with only a w:after attribute, like this:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="SpaceBeforeAndAfter">
  <w:name w:val="SpaceBeforeAndAfter"/>
  <w:basedOn w:val="SpaceBefore"/>
  <w:qFormat/>
  <w:rsid w:val="00A670C6"/>
  <w:pPr>
    <w:spacing w:after="200"/>
  </w:pPr>
</w:style>

After 'rolling-up' the style chain, the style that we must apply to a paragraph that has the SpaceBeforeAndAfter style would look like this:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="SpaceBeforeAndAfter">
  <w:name w:val="SpaceBeforeAndAfter"/>
  <w:basedOn w:val="SpaceBefore"/>
  <w:qFormat/>
  <w:rsid w:val="00A670C6"/>
  <w:pPr>
    <w:spacing w:before="200"
               w:after="200"/>
  </w:pPr>
</w:style>
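
The attribute merge for w:spacing can be sketched in a few lines of Python.  This is only an illustration of the semantics described above, not the code from this series; merge_attributes is a hypothetical helper, and the two w:spacing elements are lifted from the SpaceBefore and SpaceBeforeAndAfter styles.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def merge_attributes(base_elem, derived_elem):
    """Return a new element: base attributes, overridden or extended by derived ones."""
    merged = ET.Element(derived_elem.tag, dict(base_elem.attrib))
    merged.attrib.update(derived_elem.attrib)  # derived values win
    return merged

# w:spacing from SpaceBefore (base) and SpaceBeforeAndAfter (derived).
base = ET.fromstring(f'<w:spacing xmlns:w="{W}" w:before="200" w:after="0"/>')
derived = ET.fromstring(f'<w:spacing xmlns:w="{W}" w:after="200"/>')

rolled = merge_attributes(base, derived)
# Strip the namespace part of each attribute name for readability.
spacing = {name.split("}")[1]: value for name, value in rolled.attrib.items()}
print(spacing)  # {'before': '200', 'after': '200'}
```

The helper builds a new element instead of modifying the base, so the original style definitions stay intact for any other style that derives from them.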

Merging Child Elements

In some cases, we must merge child elements.  We must iterate through all child elements of an element in the derived style, and if the base style doesn't contain a particular element, we must add that element to the 'rolled-up' style.  If the base style does contain the element of interest, then we must either merge attributes or replace the child elements, based on the semantics defined for that child element.  The w:pPr and w:rPr elements are examples of elements that require this type of inheritance.

Consider the style NotIndented, which defines paragraph properties (w:pPr) as follows:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="NotIndented">
  <w:name w:val="NotIndented"/>
  <w:basedOn w:val="Normal"/>
  <w:qFormat/>
  <w:rsid w:val="00082E03"/>
  <w:pPr>
    <w:spacing w:after="0"/>
  </w:pPr>
</w:style>

The following style, Indented, derives from NotIndented:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="Indented">
  <w:name w:val="Indented"/>
  <w:basedOn w:val="NotIndented"/>
  <w:qFormat/>
  <w:rsid w:val="00082E03"/>
  <w:pPr>
    <w:ind w:left="720"/>
  </w:pPr>
</w:style>

After rolling up all styles in the style chain, the style that we should apply to text styled as Indented would be defined as follows:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="Indented">
  <w:name w:val="Indented"/>
  <w:basedOn w:val="NotIndented"/>
  <w:qFormat/>
  <w:rsid w:val="00082E03"/>
  <w:pPr>
    <w:spacing w:after="0"/>
    <w:ind w:left="720"/>
  </w:pPr>
</w:style>

Note that both the w:spacing and w:ind elements require that their attributes be merged.  In most cases, per the list below, elements are replaced (as opposed to merging of attributes).
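
The child-element merge for w:pPr can be sketched along the same lines.  In this Python illustration, merge_children and the MERGE_ATTRIBUTES set are hypothetical names, and the set lists only the two elements mentioned in this section; a full implementation would need the complete list of per-element semantics.  Anything not in the set falls back to wholesale replacement, matching the default described earlier.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def q(local):
    """Return the Clark-notation name ElementTree uses for a w: name."""
    return f"{{{W}}}{local}"

# Assumption for this sketch: only these elements merge attributes;
# every other element present in both styles is replaced wholesale.
MERGE_ATTRIBUTES = {q("spacing"), q("ind")}

def merge_children(base_parent, derived_parent):
    """Roll the children of derived_parent into a copy of base_parent."""
    merged = copy.deepcopy(base_parent)
    for child in derived_parent:
        existing = merged.find(child.tag)
        if existing is None:
            # The base style lacks this element: add it.
            merged.append(copy.deepcopy(child))
        elif child.tag in MERGE_ATTRIBUTES:
            # Both styles define it: derived attributes override base ones.
            existing.attrib.update(child.attrib)
        else:
            # Default semantics: replace the base element wholesale.
            position = list(merged).index(existing)
            merged.remove(existing)
            merged.insert(position, copy.deepcopy(child))
    return merged

# w:pPr from NotIndented (base) and Indented (derived).
not_indented = ET.fromstring(
    f'<w:pPr xmlns:w="{W}"><w:spacing w:after="0"/></w:pPr>')
indented = ET.fromstring(
    f'<w:pPr xmlns:w="{W}"><w:ind w:left="720"/></w:pPr>')

rolled = merge_children(not_indented, indented)
child_names = [child.tag.split("}")[1] for child in rolled]
print(child_names)  # ['spacing', 'ind']
```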

Replacing Elements

In some cases, while rolling up styles, we must replace an element and its attributes wholesale.  We don't iterate through the attributes, replacing them individually.  The w:top (Paragraph Border Above Identical Paragraphs) element has these semantics.  Consider the following style, which defines the top border as a single line with a size of 4 eighths of a point and a color of red (FF0000 in hex):

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="TopBorder1">
  <w:name w:val="TopBorder1"/>
  <w:basedOn w:val="Normal"/>
  <w:qFormat/>
  <w:rsid w:val="007850D3"/>
  <w:pPr>
    <w:pBdr>
      <w:top w:val="single"
             w:sz="4"
             w:space="1"
             w:color="FF0000"/>
    </w:pBdr>
  </w:pPr>
</w:style>

Here is a derived style, TopBorder2, which defines a top border with a size of 18 eighths of a point and no color:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="TopBorder2">
  <w:name w:val="TopBorder2"/>
  <w:basedOn w:val="TopBorder1"/>
  <w:qFormat/>
  <w:rsid w:val="00315108"/>
  <w:pPr>
    <w:pBdr>
      <w:top w:val="single"
             w:sz="18"
             w:space="1"/>
    </w:pBdr>
  </w:pPr>
</w:style>

After rolling up the styles in the style chain, the style that should be applied to a paragraph styled TopBorder2 looks like this:

<w:style w:type="paragraph"
         w:customStyle="1"
         w:styleId="TopBorder2">
  <w:name w:val="TopBorder2"/>
  <w:basedOn w:val="TopBorder1"/>
  <w:qFormat/>
  <w:rsid w:val="00315108"/>
  <w:pPr>
    <w:pBdr>
      <w:top w:val="single"
             w:sz="18"
             w:space="1"/>
    </w:pBdr>
  </w:pPr>
</w:style>

Notice that the w:color attribute was not inherited from TopBorder1.  The w:top element, along with its attributes, was replaced wholesale.
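
Replacement is the simplest of the semantics to express in code.  The following Python sketch applies it to the two w:top elements from TopBorder1 and TopBorder2; replace_element is a hypothetical helper name, and the point of the example is only that no attribute from the base element survives.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def replace_element(base_elem, derived_elem):
    """Return a copy of the derived element; the base element is ignored."""
    return copy.deepcopy(derived_elem)

# w:top from TopBorder1 (base) and TopBorder2 (derived).
top_border_1 = ET.fromstring(
    f'<w:top xmlns:w="{W}" w:val="single" w:sz="4" w:space="1" w:color="FF0000"/>')
top_border_2 = ET.fromstring(
    f'<w:top xmlns:w="{W}" w:val="single" w:sz="18" w:space="1"/>')

rolled = replace_element(top_border_1, top_border_2)
# Strip the namespace part of each attribute name for readability.
attrs = {name.split("}")[1]: value for name, value in rolled.attrib.items()}
print("color" in attrs)  # False: w:color was not inherited
```

Had w:top used attribute-merging semantics instead, the rolled-up border would have kept w:color="FF0000" from TopBorder1, which is exactly the difference between the two rules.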

(Update December 13, 2009 - I've written a bit of code to show how to implement XML inheritance.)

Style Conditional Table Formatting Properties

There is one special case where merging semantics are slightly more complicated.  Table styles have a very powerful feature called conditional table formatting.  This feature allows us to specify a special set of formatting properties for the top row, the first column, the bottom row, banded columns, banded rows, cells at the top left, top right, etc.  Conditional table formatting is defined in the w:tblStylePr element.  The fol