Assembling Paragraph and Run Properties for Cells in an Open XML WordprocessingML Table

When we want to render a paragraph and its runs inside of a cell, we need to assemble the paragraph and run properties from a number of places.  In a previous post, I explained how style inheritance works, and how you 'roll-up' styles from the style chain.  That is only part of the story.  This post details how we assemble styling information from:

  • Table styles
  • The formatting directly applied to tables, paragraphs, and runs
  • The global default paragraph and run properties.

This is one in a series of posts on transforming Open XML WordprocessingML to XHtml.  You can find the complete list of posts here.

In the process of assembling paragraph and run properties, we also need to correctly handle something called 'Toggle Properties'.

Note: next, I'm going to tackle the semantics of numbering styles, and then I believe I'm ready to start coding in earnest.  I've made a decision in this project to first implement a transform to XHtml without styling information.  The resulting XHtml will contain just the content of the document.  I decided to do this because it is useful in its own right, and we need it for another project.  Having code to extract the contents of a document in the most succinct form possible has a lot of uses.  Of course, this XHtml still can be rendered in a browser, and in some cases, it will be useful to do this.  Then, after publishing that code, I'll start implementing styling behavior.

Table Styles

A very powerful and cool feature of Word is that when you apply a table style, you can pick and choose which aspects of the style you want to apply.  For consistency, you can apply the same style to all tables in your document, even though some tables may have a total row and other tables may not.

You can apply the same style to both kinds of table, then pick and choose which aspects of the table style to apply.  When you are applying a table style in Word 2007, the Table Style Options section of the Ribbon presents these choices as check boxes: Header Row, Total Row, Banded Rows, First Column, Last Column, and Banded Columns.

This has ramifications for us when we are assembling styling information for a cell.  We know the table style for the table, and we also know the values of those check boxes, so we have to apply the various aspects of the table style according to the user's choices in those check boxes.

Before we dive into table styles in depth, we need to cover toggle properties, which play a part in how table styles work.

Toggle Properties

Toggle properties are a set of run properties whose semantics have a little twist when we assemble formatting information in preparation for rendering paragraphs.  The w:b element (which styles a run as bold) is a good example of a toggle property.

Here's how toggle properties work:

Toggle properties only have their toggle behavior when associated with table styles, paragraph styles, and character styles.  If a run has been made bold per the table style, and the user applies a paragraph style that also has the w:b element, the net result is that the originally bold text is no longer bold.  And if some portion of that paragraph has a bold character style applied to it, that portion becomes bold again.

This makes sense.  The table style designer designated that a cell be bold.  The paragraph style designer intended to make text in the paragraph stand out.  But the text is already bold, so making it bold again would not satisfy that intent; instead, to make the text stand out, we reverse its boldness.  The same reasoning applies to a character style that has the w:b element.

It's just these three types of styles (table, paragraph, and character) that we need to process in this fashion.  If the user subsequently selects that text and presses the bold button on the toolbar, setting the property on the run itself (not in a style), we honor his or her intention, regardless of the boldness of the table, paragraph, or character styles.  Also, the global default run properties completely override the toggling behavior (but not directly applied formatting).  If the w:b element is set in the global default run properties, effectively making the entire document bold, the entire document remains bold unless formatting is set directly on a run.

The set of toggle properties is: §2.3.2.1 (Bold), §2.3.2.2 (Complex Script Bold), §2.3.2.4 (Display All Characters as Capital Letters), §2.3.2.11 (Embossing), §2.3.2.14 (Italics), §2.3.2.15 (Complex Script Italics), §2.3.2.16 (Imprinting), §2.3.2.21 (Display Character Outline), §2.3.2.29 (Shadow), §2.3.2.31 (Small Caps), §2.3.2.35 (Single Strikethrough), and §2.3.2.39 (Hidden Text).  The section numbers are for the first edition of Ecma-376.
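
To make these rules concrete, here is a minimal Python sketch that resolves a single toggle property such as w:b for one run.  It assumes each source has already been reduced to True, False, or None (meaning "not specified").  The function name resolve_toggle and its parameters are hypothetical.  The sketch follows the behavior described above and is not code from any library.

```python
from typing import Optional, Sequence


def resolve_toggle(style_values: Sequence[Optional[bool]],
                   global_value: Optional[bool],
                   direct_value: Optional[bool]) -> bool:
    """Resolve one toggle property (for example w:b) for a single run.

    style_values: the property's value in the table style, the paragraph
        style, and the character style, in that order (None = not specified).
    global_value: the value in the global default run properties, or None.
    direct_value: the value set directly on the run, or None.
    """
    # Formatting applied directly to the run always wins.
    if direct_value is not None:
        return direct_value
    # A toggle property in the global defaults overrides style toggling.
    if global_value is not None:
        return global_value
    # Otherwise each style that turns the property on flips the current state.
    result = False
    for value in style_values:
        if value:
            result = not result
    return result


# Bold table style plus bold paragraph style: the text is not bold.
print(resolve_toggle([True, True, None], None, None))   # False
# A bold character style on part of that paragraph makes it bold again.
print(resolve_toggle([True, True, True], None, None))   # True
# Pressing the bold button sets the run directly, which overrides toggling.
print(resolve_toggle([True, True, None], None, True))   # True
```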

Assembling Styling Properties for Cells in a Table

Due to the richness of table styles, as shown above, table, row, cell, paragraph and run properties can be stored in multiple places in a table style.  Determining the properties for a table style involves rolling up those styles, in the exact same fashion as I described for rolling up style properties in the previous blog post.  While rolling up that information, we need to either merge attributes, merge child elements, or replace elements.

Shading of the table cells comes from the table cell properties (w:tcPr).  Formatting of the text in table cells comes from the paragraph properties (w:pPr) and run properties (w:rPr).  Other necessary properties for rendering come from the table properties (w:tblPr) and table row properties (w:trPr).  The process for assembling the correct table styling information for a cell is the same for each of these.  In the following section, I describe the process of assembling styling information for runs in a table per the table style, but the same approach applies to assembling styling information for the other aspects of a table style (table, row, cell, and paragraph properties).  When I write code to do this, of course, I'm going to write only one set of methods to do this assembling of styling information, and parameterize those methods so that I can use them for assembling all aspects of conditional table formatting properties.
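
A parameterized roll-up can be sketched in Python with xml.etree.ElementTree.  This is a simplified sketch, not the implementation the post describes.  It assumes that every child element is merged by overlaying its attributes, with later sources winning, and is appended when it is not yet present.  The real rules replace some elements outright and merge the child elements of others.  The helper names w and roll_up are hypothetical.  The prop_tag parameter is what lets one function serve w:rPr, w:pPr, w:tcPr, w:trPr, and w:tblPr.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(local_name: str) -> str:
    """Return the ElementTree-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{local_name}"


def roll_up(sources, prop_tag: str = "rPr") -> ET.Element:
    """Roll up property elements (earliest first) into one new element.

    Simplification: children with the same tag are merged by overlaying
    attributes; nested children of a merged element are not merged.
    """
    result = ET.Element(w(prop_tag))
    for source in sources:
        if source is None:
            continue  # a missing source (e.g. an absent w:tblStylePr) is skipped
        for child in source:
            existing = result.find(child.tag)
            if existing is None:
                result.append(copy.deepcopy(child))  # never mutate the inputs
            else:
                existing.attrib.update(child.attrib)
    return result


whole_table = ET.fromstring(
    f'<w:rPr xmlns:w="{W_NS}"><w:sz w:val="20"/><w:color w:val="000000"/></w:rPr>')
first_row = ET.fromstring(
    f'<w:rPr xmlns:w="{W_NS}"><w:b/><w:color w:val="FFFFFF"/></w:rPr>')
merged = roll_up([whole_table, None, first_row])
print([(child.tag.split("}")[1], child.get(w("val"))) for child in merged])
# [('sz', '20'), ('color', 'FFFFFF'), ('b', None)]
```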

To determine the run properties from a style for a cell in a table, we do the following, in order:

  • We first roll-up all table styles in the table style chain, per my post, Open XML WordprocessingML Style Inheritance.
  • We retrieve the value of the w:tblLook element from the table that we're rendering, which indicates which of the conditional table formatting properties we will apply to the table.
  • We create an empty list of the run style properties (the w:rPr element).  In the following steps, we will be adding run style properties to this list, based on the circumstances, and after assembling all the items in the list, we will roll them up to give us the appropriate styling information for the cell.  Note that in the following steps, if the w:tblStylePr element does not exist, it is not an error.  It just means that we don't need to do anything for that particular step.
  • We add to the list:
    • The run style property for the whole table style from w:tblStylePr[@w:type = 'wholeTable'].
    • If we should apply column banding, per the w:tblLook element
      • If the cell is an odd banded column cell, then add the run style property from w:tblStylePr[@w:type = 'band1Vert']
      • If the cell is an even banded column cell, then add the run style property from w:tblStylePr[@w:type = 'band2Vert']
    • If we should apply row banding, per the w:tblLook element
      • If the cell is an odd banded row cell, then add the run style property from w:tblStylePr[@w:type = 'band1Horz']
      • If the cell is an even banded row cell, then add the run style property from w:tblStylePr[@w:type = 'band2Horz']
    • If we should apply the first row formatting, per the w:tblLook element
      • If the cell is in the first row, then add the run style property from w:tblStylePr[@w:type = 'firstRow']
      • In addition, if the cell is in a row with the w:tblHeader element, then add the run style property from w:tblStylePr[@w:type = 'firstRow']
    • If we should apply the last row formatting, per the w:tblLook element
      • If the cell is in the last row, then add the run style property from w:tblStylePr[@w:type = 'lastRow']
    • If we should apply the first column formatting, per the w:tblLook element
      • If the cell is in the first column, then add the run style property from w:tblStylePr[@w:type = 'firstCol']
    • If we should apply the last column formatting, per the w:tblLook element
      • If the cell is in the last column, then add the run style property from w:tblStylePr[@w:type = 'lastCol']
    • If the cell is the top left cell, then add the run style property from w:tblStylePr[@w:type = 'nwCell']
    • If the cell is the top right cell, then add the run style property from w:tblStylePr[@w:type = 'neCell']
    • If the cell is the bottom left cell, then add the run style property from w:tblStylePr[@w:type = 'swCell']
    • If the cell is the bottom right cell, then add the run style property from w:tblStylePr[@w:type = 'seCell']
  • Now that we have a list of run properties, we roll them up.  We now have a set of style run properties that we can apply to the cell.
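
The selection steps above can be sketched in Python as a function that returns, in order, the w:tblStylePr types whose w:rPr should go into the list.  Several parts of this sketch are assumptions.  The bit values are assumed to be those the first edition of Ecma-376 defines for w:tblLook/@w:val; for example, Word 2007's default 04A0 means first row, first column, and banded rows.  Later editions use separate attributes, which this sketch does not read.  Banding is simplified to count from row or column 0, one row or column per band.  The names CellPosition and conditional_types are hypothetical.

```python
from dataclasses import dataclass

# Assumed bit values of w:tblLook/@w:val (first edition of Ecma-376).
FIRST_ROW = 0x0020
LAST_ROW = 0x0040
FIRST_COLUMN = 0x0080
LAST_COLUMN = 0x0100
NO_H_BAND = 0x0200  # set = do not apply row banding
NO_V_BAND = 0x0400  # set = do not apply column banding


@dataclass
class CellPosition:
    row: int                      # zero-based row index
    col: int                      # zero-based column index
    row_count: int
    col_count: int
    is_header_row: bool = False   # the row contains w:tblHeader


def conditional_types(tbl_look: str, cell: CellPosition) -> list[str]:
    """List the w:tblStylePr/@w:type values to roll up for a cell, in order."""
    look = int(tbl_look, 16)
    top, bottom = cell.row == 0, cell.row == cell.row_count - 1
    left, right = cell.col == 0, cell.col == cell.col_count - 1
    types = ["wholeTable"]
    if not look & NO_V_BAND:
        # Simplification: bands count from column 0, one column per band.
        types.append("band1Vert" if cell.col % 2 == 0 else "band2Vert")
    if not look & NO_H_BAND:
        types.append("band1Horz" if cell.row % 2 == 0 else "band2Horz")
    if look & FIRST_ROW and (top or cell.is_header_row):
        types.append("firstRow")
    if look & LAST_ROW and bottom:
        types.append("lastRow")
    if look & FIRST_COLUMN and left:
        types.append("firstCol")
    if look & LAST_COLUMN and right:
        types.append("lastCol")
    # Corner cells, per the steps above, are not gated by w:tblLook.
    for is_corner, name in ((top and left, "nwCell"), (top and right, "neCell"),
                            (bottom and left, "swCell"), (bottom and right, "seCell")):
        if is_corner:
            types.append(name)
    return types


print(conditional_types("04A0", CellPosition(row=0, col=0, row_count=3, col_count=3)))
# ['wholeTable', 'band1Horz', 'firstRow', 'firstCol', 'nwCell']
```

To get the cell's table-style run properties, you would take the w:rPr of each matching w:tblStylePr in the rolled-up table style, skip any that are absent, and roll the list up.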

Note that this only gets the run properties for a table style.  Once we have rolled up the run properties for the table style, we assemble the following, in order:

  • The run properties for the table style (per the above procedure)
  • The run properties for the paragraph style for the paragraph that contains the run
  • The run properties for the run style applied to the run

We then roll these three up, implementing the toggling behavior for toggle properties that I described earlier.  Once we have done this process, we assemble the following, in order:

  • The global default run properties.
  • The rolled up run properties from the table styles.
  • The rolled up run properties from a directly applied run style.
  • The global defaults, with all properties except toggle properties removed.  (This will provide the behavior that global properties trump style toggle properties.)
  • The run properties that are applied directly to a run.

We roll these up, and we finally have the run properties that we can apply to the run.
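
The final ordering can be sketched with plain dictionaries standing in for w:rPr elements, mapping each property name to its value.  Rolling up simply lets later layers overwrite earlier ones.  This is a simplified model: the style layers are assumed to be already rolled up with toggling applied, and the function names are hypothetical.  The sketch shows the effect of the fourth layer.  Re-applying only the toggle properties from the global defaults after the styles is what lets them win over style toggling, while directly applied run formatting still comes last.

```python
# Local names of the toggle properties listed earlier (w:b, w:bCs, ...).
TOGGLE_PROPERTIES = {"b", "bCs", "caps", "emboss", "i", "iCs", "imprint",
                     "outline", "shadow", "smallCaps", "strike", "vanish"}


def roll_up(layers):
    """Later layers override earlier ones, property by property."""
    result = {}
    for layer in layers:
        result.update(layer)
    return result


def assemble_run_properties(global_defaults, style_layers, direct):
    """Combine run properties in the order described above.

    global_defaults: the global default run properties.
    style_layers: rolled-up style run properties, in order (toggling applied).
    direct: run properties applied directly to the run.
    """
    global_toggles = {name: value for name, value in global_defaults.items()
                      if name in TOGGLE_PROPERTIES}
    return roll_up([global_defaults, *style_layers, global_toggles, direct])


defaults = {"sz": 22, "b": True}                 # whole document is bold
styles = [{"b": False, "color": "1F497D"}]       # style toggling turned bold off
direct = {"color": "FF0000"}
print(assemble_run_properties(defaults, styles, direct))
# {'sz': 22, 'b': True, 'color': 'FF0000'}
```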

When we're assembling the paragraph properties for a table style, we follow a similar procedure.  Once we have those rolled-up paragraph properties, we need to assemble a new list of paragraph properties, in the following order:

  • The global default paragraph properties
  • The table style paragraph properties (per the above procedure)
  • The paragraph properties applied directly to a paragraph

We then roll up these three sets of paragraph properties, and we have the paragraph properties that we can apply to the paragraph in the cell.

This seems harder than it actually is.  While this is a bit involved, this is what enables the very cool table styling capabilities that we see in Word.  I just have to say, this is one of those cases where I really appreciate LINQ to XML.  I personally really would not want to write old-style imperative code to do this.

One more point about this: in an earlier post, I described an approach of adding paragraph and run properties, with ordering applied, to every paragraph and run in the document.  I still think that this approach will work best.  It means that I can assemble the style paragraph properties for a cell, then add them to every paragraph in the cell.  I can assemble the style run properties for a cell, then add them to every run.  This means that I'll only need to compute the style paragraph properties for a particular cell once, not for every paragraph in the cell.  The same holds true for runs.
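
The compute-once-per-cell idea can be sketched in Python.  This sketch does not guess at the exact representation used in the earlier post.  Instead, the hypothetical helper style_props_per_paragraph pairs each paragraph with the style properties computed for its cell.  The compute_cell_props callback is an assumed stand-in for the table-style assembly procedure above, and it is called exactly once per cell.  The sketch looks only at paragraphs that are direct children of w:tc, so nested tables are left alone.  It uses the w:tc position as the column index and ignores w:gridSpan.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(local_name: str) -> str:
    """Return the ElementTree-qualified name for a w: element."""
    return f"{{{W_NS}}}{local_name}"


def style_props_per_paragraph(tbl: ET.Element, compute_cell_props):
    """Pair every paragraph in a table with its cell's style properties.

    compute_cell_props(row_index, col_index) is called once per cell, and
    its result is shared by all paragraphs in that cell.
    """
    pairs = []
    for row_index, tr in enumerate(tbl.findall(w("tr"))):
        for col_index, tc in enumerate(tr.findall(w("tc"))):
            cell_props = compute_cell_props(row_index, col_index)
            for p in tc.findall(w("p")):
                pairs.append((p, cell_props))
    return pairs


table = ET.fromstring(
    f'<w:tbl xmlns:w="{W_NS}">'
    '<w:tr><w:tc><w:p/><w:p/></w:tc><w:tc><w:p/></w:tc></w:tr>'
    '<w:tr><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc></w:tr>'
    '</w:tbl>')
calls = []


def example_compute(row, col):
    """Hypothetical stand-in that records each call."""
    calls.append((row, col))
    return {"cell": (row, col)}


pairs = style_props_per_paragraph(table, example_compute)
print(len(calls), len(pairs))  # 4 5
```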