Comparison of Html/CSS Tables to WordprocessingML Tables (Post #5)

Html tables and WordprocessingML tables have a lot in common.  Both can present complex tables with horizontally and vertically merged cells, and both have a rich set of capabilities for formatting.  But there are differences in their models and capabilities.  This blog post presents those differences, specifically around three areas:

  • Table Layout
  • Formatting
  • Differences in capabilities at the table, row, and cell level

This is one in a series of posts on transforming Open XML WordprocessingML to XHtml.  You can find the complete list of posts here.


I'm currently in the process of coding a pure functional transform from WordprocessingML to XHtml.  Understanding the exact differences between the two types of tables enables writing this transform as accurately as possible.  In addition, if you understand CSS and Html tables, this blog post provides an easy way to learn about WordprocessingML tables.  (If you're a CSS expert, and see something I'm doing incorrectly, please correct me. :)

Note: In a previous post, I talked about a plan to transform WordprocessingML styles to CSS classes.  I've decided to not use CSS classes to represent WordprocessingML styles.  Instead, I'm going to generate a style attribute for each object (p, table, tr, td, etc.) that contains all necessary formatting for that object.  My rationale for this decision is detailed in this post, in the "Differences in Formatting" section below.  This isn't a decision that I'm taking lightly, but I believe it is the correct one.  But we'll see…

Differences in Table Layout

On the surface, the layout of WordprocessingML and Html tables looks very similar.  Of course, both can present a simple table that contains data:

Both can contain horizontally and vertically merged cells:

Both can represent an irregular layout:

However, WordprocessingML and XHtml tables use a somewhat different model for layout.

In WordprocessingML, you first establish a grid with some number of grid columns.  Left and right edges of cells will always be on a grid column.  The mechanism for horizontal cell spanning is that you specify the number of grid columns that a cell spans.  You can specify that the first cell in a row starts after skipping a certain number of grid columns.

In contrast, in XHtml, there is no underlying grid on which you lay out cells.  Instead, the cells themselves form the grid.

To make this difference clear, let's look at a simple example.  Consider the following table with four cells, but the vertical rule between the top two cells isn't aligned with the vertical rule between the bottom two cells:

Here is the WordprocessingML that describes this table.  Notice the w:tblGrid, which describes the grid, and the w:gridSpan elements on the top left and bottom right cells.  While the grid describes three grid columns, there are only two cells per row.

<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="04A0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="1368"/>
    <w:gridCol w:w="450"/>
    <w:gridCol w:w="1350"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1818" w:type="dxa"/>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1350" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1368" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1800" w:type="dxa"/>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
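
The grid arithmetic in this listing can be checked with a few lines of Python.  This is only a sketch: the function name cell_extents and its parameters are my own invention for illustration, not part of any Open XML library.  It walks a row's w:gridSpan values across the grid columns and reports which grid columns each cell covers and how wide it is.

```python
# Grid column widths in dxa, copied from the w:tblGrid in the listing.
GRID = [1368, 450, 1350]

def cell_extents(grid, spans, grid_before=0):
    """Return (first_column, last_column, width_dxa) for each cell in a row.

    spans holds each cell's w:gridSpan value (1 when the element is absent).
    grid_before is the number of grid columns skipped before the first cell.
    """
    extents = []
    col = grid_before
    for span in spans:
        width = sum(grid[col:col + span])
        extents.append((col, col + span - 1, width))
        col += span
    return extents

top_row = cell_extents(GRID, [2, 1])
bottom_row = cell_extents(GRID, [1, 2])
print(top_row)     # [(0, 1, 1818), (2, 2, 1350)]
print(bottom_row)  # [(0, 0, 1368), (1, 2, 1800)]
```

The computed widths match the w:tcW values in the markup (1818 and 1350 in the first row, 1368 and 1800 in the second).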

Following is markup for a similar table in XHtml.  Each row spans three columns instead of two.  The first two rows (the only ones we see) each contain a cell with a colspan attribute, merging two columns into one cell.  The third row, with no border and a height of zero pixels, defines three cells.  This is a trick based on the semantics of XHtml tables.  When determining the widths of cells, the browser looks at all rows of the table, and then calculates the column width, taking widths of all cells of that column into consideration.  Using this approach, we need to specify column widths only once, in the last invisible row of the table.

<table style='border-collapse:collapse;border:none'>
<tr>
<td colspan="2"
style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Top Left</p>
</td>
<td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Top Right</p>
</td>
</tr>
<tr>
<td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Bottom Left</p>
</td>
<td colspan="2"
style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Bottom Right</p>
</td>
</tr>
<tr style="max-height:0px">
<td style='width:68.4pt;border:none'></td>
<td style='width:22.5pt;border:none'></td>
<td style='width:67.5pt;border:none'></td>
</tr>
</table>
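
The invisible sizing row can be generated directly from the grid column widths.  The following Python is a minimal sketch under my own naming (dxa_to_pt and sizing_row are hypothetical helpers); it relies only on the fact that a dxa is one twentieth of a point.

```python
def dxa_to_pt(dxa):
    """Convert dxa (twentieths of a point) to points."""
    return dxa / 20

def sizing_row(grid_dxa):
    """Build the zero-height last row that fixes the column widths."""
    cells = "".join(
        "<td style='width:{:g}pt;border:none'></td>".format(dxa_to_pt(w))
        for w in grid_dxa
    )
    return '<tr style="max-height:0px">' + cells + "</tr>"

# Widths taken from the three w:gridCol elements of the example table.
print(sizing_row([1368, 450, 1350]))
```

For the example grid this produces cells of 68.4pt, 22.5pt, and 67.5pt, the same widths used in the hand-written markup.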

The differences in the model become even clearer when we specify that a grid column is skipped before placing the first cell.  The following table shows a row that contains one cell that is shifted to the right:

The WordprocessingML that describes this table follows.  The w:gridBefore element specifies that the one cell in the second row is to be placed in the second grid column.

<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="04A0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="2000"/>
    <w:gridCol w:w="2000"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:trPr>
      <w:gridBefore w:val="1"/>
    </w:trPr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>

Here is how we would form this table in XHtml:

<table style='border-collapse:collapse;border:none'>
<tr>
<td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Top Left</p>
</td>
<td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Top Right</p>
</td>
</tr>
<tr>
<td style="border:none;padding:0in 5.4pt 0in 5.4pt">
<p>&#160;</p>
</td>
<td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
<p>Bottom Right</p>
</td>
</tr>
<tr style="max-height:0px">
<td width="100" style='border:none'></td>
<td width="100" style='border:none'></td>
</tr>
</table>

In XHtml, we have no choice but to place a cell in the location where there is no cell visible.  We place a non-breaking space in that cell, as some browsers may collapse the cell if it contains no data.  We also specify padding.  The table then renders as desired.

There is a simple strategy that we can take when converting the WordprocessingML to XHtml, which is to generate XHtml cells based on the grid, not on cells.  We then specify appropriate colspan and style attributes to make the table render as we wish.
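
Here is a rough Python sketch of that strategy, using only the standard library's ElementTree.  The helper names (int_val, row_to_grid_cells) and the tuple layout are hypothetical, and representing the skipped grid columns as one borderless placeholder cell with a colspan is my assumption about how the output could be shaped, not a finished design.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def int_val(parent, path, default):
    """Read the w:val attribute of a descendant element, or return default."""
    el = parent.find(path, NS)
    return int(el.get("{%s}val" % W)) if el is not None else default

def row_to_grid_cells(tr):
    """Return (colspan, text, visible) tuples describing one w:tr."""
    cells = []
    before = int_val(tr, "w:trPr/w:gridBefore", 0)
    if before:
        # Skipped grid columns become one borderless placeholder cell.
        cells.append((before, "", False))
    for tc in tr.findall("w:tc", NS):
        span = int_val(tc, "w:tcPr/w:gridSpan", 1)
        text = "".join(t.text or "" for t in tc.iter("{%s}t" % W))
        cells.append((span, text, True))
    return cells

SAMPLE = """<w:tbl xmlns:w="%s">
  <w:tr>
    <w:tc><w:p><w:r><w:t>Top Left</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>Top Right</w:t></w:r></w:p></w:tc>
  </w:tr>
  <w:tr>
    <w:trPr><w:gridBefore w:val="1"/></w:trPr>
    <w:tc><w:p><w:r><w:t>Bottom Right</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>""" % W

rows = [row_to_grid_cells(tr) for tr in ET.fromstring(SAMPLE).findall("w:tr", NS)]
for row in rows:
    print(row)
```

Each tuple maps to one td: the first value becomes the colspan attribute, and the visible flag decides whether the cell gets a border or is rendered as an empty placeholder.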

This subtle difference in abstraction is one of the most important differences between tables in WordprocessingML and XHtml.  By taking this difference into account, it is easy to craft an algorithm that will produce tables that will render as we wish in XHtml.  In addition to this difference in abstraction, there are a number of differences in formatting and capabilities.  I don't believe that I've isolated all of the differences, but I think I've found most of the important ones.  In some of the conversions, I didn't yet spend the time to find the correct CSS approach, so I am still using an Html attribute approach.

Differences in Formatting

There are a number of analogous capabilities in formatting between tables in WordprocessingML and XHtml/CSS, but one of the key differences is that in WordprocessingML, there is a rich infrastructure of style inheritance.  Table styles can inherit from other table styles.  Paragraph styles can inherit from other paragraph styles.  Run styles can inherit from other run styles.  In contrast, in CSS, we can define classes, but we can't define that one class inherits from another class.  However, when specifying the class for an element such as a table, paragraph, or span, we can specify more than one class, and the rules for every listed class are applied (when two rules conflict, CSS picks the winner by specificity and by order in the style sheet, not by the order of the names in the class attribute).  This is analogous to style inheritance, but the mechanisms are completely different.

It might seem that we could use the ability to specify multiple classes for an XHtml object to implement a form of style inheritance, but there is one important aspect of the semantics of WordprocessingML styles that makes it impossible to use CSS classes to implement style inheritance.  Table styles in WordprocessingML have the capability to define what are called conditional table formatting properties.  These are properties that are applied in a specific order to a) the entire table, b) banded columns, c) banded rows, d) first and last row, e) first and last column, f) specific cells at the corners.  And, of course, conditional table formatting properties inherit from the same conditional formatting properties of the base style of a table style.  In theory, we could define styles for each of these conditional table formatting properties, and apply these styles in order of precedence to each cell in the table.  But let's say that we have one table style with a number of conditional formatting properties that derives from another style that also contains a number of conditional formatting properties.  When specifying the classes for a paragraph, it would look something like this:

<p class="BaseStyle BaseStyle_EntireTable BaseStyle_BandedColumns BaseStyle_BandedRows (etc.)
DerivedStyle DerivedStyle_EntireTable DerivedStyle_BandedColumns (etc.)">Some text.</p>

If we had a string of derived table styles, we could end up applying 30 or 40 (or many more!) classes to a single paragraph or run.  But even so, it won't work.  Suppose BaseStyle defines some property P, and one of BaseStyle's conditional formatting properties overrides P.  Suppose also that DerivedStyle overrides P at the table level, but DerivedStyle's corresponding conditional formatting does not define P.  For a cell that matches the condition, the value that should apply is the one defined in the conditional formatting for BaseStyle, not the one defined in DerivedStyle.  It simply won't work.  We could start playing around with the order in which the classes are applied, but I would hate to debug this.
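
To make the precedence concrete, here is a small Python sketch of the roll-up order described in this paragraph.  It is a deliberately simplified model of my own: styles are plain dictionaries, only one conditional format is considered at a time, and the names roll_up, "table", and "conditional" are hypothetical.

```python
def roll_up(style_chain, condition):
    """Resolve the properties for a cell that matches one conditional format.

    style_chain lists styles from the base style to the most derived style.
    Each style is a dict with a "table" dict (whole-table properties) and a
    "conditional" dict keyed by condition name.
    """
    resolved = {}
    # First pass: whole-table properties, base first so derived styles win.
    for style in style_chain:
        resolved.update(style.get("table", {}))
    # Second pass: conditional properties override any whole-table value,
    # even one that was set by a more derived style.
    for style in style_chain:
        resolved.update(style.get("conditional", {}).get(condition, {}))
    return resolved

base = {
    "table": {"color": "black"},
    "conditional": {"firstRow": {"color": "white"}},
}
derived = {
    "table": {"color": "navy"},
    "conditional": {"firstRow": {"font-weight": "bold"}},
}

print(roll_up([base, derived], "firstRow"))  # {'color': 'white', 'font-weight': 'bold'}
print(roll_up([base, derived], "lastRow"))   # {'color': 'navy'}
```

In the first call the color comes from the base style's conditional formatting, not from the derived style's table-level override, which is the case that a simple base-then-derived chain of classes gets wrong.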

We could go through the effort of defining classes for each uniquely styled cell in each table.  This would involve rolling up all inherited styles, and implementing the appropriate semantics for overriding properties at the table, paragraph, and run level, keeping a list of uniquely styled paragraphs and runs, then generating a CSS class for each unique combination of properties.  This does have the advantages (and disadvantages) of moving styling information away from the paragraphs and runs into the internal style sheet.  These classes would have a computer-generated, non-descriptive name, so they wouldn't be helpful to a person who is reading the XHtml.  In addition, it is highly unlikely that these classes could be re-used.  It's not worth the effort, I believe.

One approach would be to define a certain set of CSS classes, then override those classes with locally applied styling information in the style attribute.  But that defeats the whole purpose of having CSS classes in the first place.  With that approach, we still don't have separation of content and presentation, and as you can see, attempting to use CSS classes to represent styles is very complex and prone to bugs.

The approach that I've decided to take is to properly roll up styling information from the WordprocessingML and store that styling information in the style attribute for each object, optimizing that styling information so that if a property is defined at a higher level, it isn't redefined.  For instance, if the paragraph specifies that a particular font is used, then the run doesn't also specify it.  This optimization can be done after assembling all formatting information for each paragraph and run.  This has the advantage that this conversion really is strictly a conversion of WordprocessingML to its presentation.  By not using CSS classes, it makes the conversion more straightforward.  It will be easier to debug.  I think it is useful for this conversion to simply be a transform of WordprocessingML to its presentation, without involving the complexities that CSS classes bring.  In effect, we're using XHtml and CSS at the object level purely as a presentation engine.
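
The optimization step might look something like the following Python sketch.  The helper names are hypothetical, and the INHERITED set is an assumption that I've kept deliberately short: a property can only be dropped from a run if CSS actually inherits it from the enclosing paragraph.

```python
# Assumption: a partial list of CSS properties that child elements inherit.
INHERITED = {"font-family", "font-size", "font-weight", "color"}

def prune_inherited(parent, child):
    """Drop child properties that merely repeat an inherited parent value."""
    return {
        name: value
        for name, value in child.items()
        if not (name in INHERITED and parent.get(name) == value)
    }

def to_style_attr(props):
    """Serialize a property dict as the value of a style attribute."""
    return ";".join("{}:{}".format(name, value) for name, value in props.items())

paragraph = {"font-family": "Calibri", "font-size": "11pt", "margin-bottom": "10pt"}
run = {"font-family": "Calibri", "font-size": "11pt", "font-weight": "bold"}

print(to_style_attr(prune_inherited(paragraph, run)))  # font-weight:bold
```

Properties that CSS does not inherit, such as margins and borders, are left on the child even when the values happen to match.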

Table Capabilities

Following is a partial list of features of WordprocessingML tables, and how they map to XHtml table features:

  • Both support visually right-to-left tables for languages such as Hebrew and Arabic.  The w:bidiVisual element translates to the dir attribute of the table element.
  • Both support alignment of the table with respect to the margins of the containing section or object.  To translate table alignment (the w:jc element in w:tblPr), create a div element with the align attribute set to some value (right, left, center).  The w:tblInd element separately specifies an indent from the leading margin.
  • Both support background shading.  However, with WordprocessingML, you can specify a pattern for background shading.  It could be possible to generate images, but this isn't a key scenario.  For phase one, the conversion will convert shading with patterns to a solid color.
  • WordprocessingML contains the abstraction of themes.  In certain places, the conversion needs to retrieve font and color information from a theme.
  • Both support table and cell borders.  However, WordprocessingML contains two features not supported in XHtml.  WordprocessingML supports a large number of cell borders, including many 'clip art' varieties, such as "apples", "babyRattle", and "bats".  All of the clip art varieties will be converted to a single line border.  Commonly used styles such as solid, dotted, double lines, etc. will convert to the corresponding style in XHtml/CSS.  In addition, WordprocessingML supports diagonal borders.  These aren't commonly used, and I'm going to delay supporting them.
  • Cell margin (w:tblCellMar) maps to the CSS padding property.  Cell margin is the space between the extent of the cell contents and the cell border.  Cell margin is typically expressed in terms of dxa, or 1/20 of a point (1/1440 of an inch).  The CSS padding property can be expressed in inches, points, or other units of measure.
  • Cell spacing (w:tblCellSpacing) maps to the cellspacing attribute of the table object.  Cell spacing is the space between cell borders, but within the table.  Cell spacing is merged between adjacent cells.  Cell spacing in WordprocessingML is typically expressed in terms of dxa, or 1/20 of a point (1/1440 of an inch).  The XHtml cellspacing attribute is in terms of pixels.
  • Both models support flowing text around a table.  In WordprocessingML, it is supported via floating tables (w:tblpPr, with w:tblOverlap controlling whether floating tables may overlap).  In XHtml and CSS, set the align attribute of table to left, and specify appropriate margins so that the table renders properly with the correct space between the table and surrounding text.
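
The unit conversions for cell margin and cell spacing are simple enough to sketch in Python.  The function names are hypothetical, and the 96 pixels per inch used for cellspacing is an assumption (it is the CSS reference value, but a particular display may differ).

```python
def dxa_to_pt(dxa):
    """There are 20 dxa in a point."""
    return dxa / 20

def dxa_to_px(dxa, dpi=96):
    """There are 1440 dxa in an inch; dpi=96 is an assumed default."""
    return round(dxa * dpi / 1440)

def padding_css(top, right, bottom, left):
    """Turn four cell margins given in dxa into a CSS padding declaration."""
    parts = ("{:g}pt".format(dxa_to_pt(v)) for v in (top, right, bottom, left))
    return "padding:" + " ".join(parts)

print(padding_css(0, 108, 0, 108))  # padding:0pt 5.4pt 0pt 5.4pt
print(dxa_to_px(30))                # 2
```

A left and right cell margin of 108 dxa yields the 5.4pt padding that appears in the XHtml examples earlier in this post.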

Row Capabilities

Following is a partial list of features of WordprocessingML rows, and how they map to XHtml row features:

  • In WordprocessingML, rows have the ability to be hidden.  Given that my primary goal is simply to render the table properly, the proper conversion is to remove hidden rows from the converted XHtml.
  • In WordprocessingML, rows can be centered, aligned left, or aligned right.  There is no corresponding capability in XHtml.  For phase one, the conversion will disregard row alignment.
  • In WordprocessingML, you can specify that a particular row is a header row (w:tblHeader), and should be repeated on each printed page.  Headers in XHtml tables provide the ability to format them separately.  They take on a bold appearance by default.  These capabilities are really not analogous, so for phase one, the conversion will not translate one to the other.
  • Table row height can be converted.  w:trHeight converts to the CSS height property of a row.
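
The first and last items in this list could be handled along the following lines.  This Python sketch uses the standard library's ElementTree; the function names are hypothetical, and it ignores the w:hRule attribute of w:trHeight (exact versus at-least heights), which a real conversion would need to consider.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = "{%s}val" % W

def is_on(element):
    """An on/off element counts as on when present, unless w:val turns it off."""
    return element is not None and element.get(VAL, "true") not in ("0", "false", "off")

def convert_rows(tbl):
    """Return one style string (possibly empty) per visible row of a w:tbl."""
    styles = []
    for tr in tbl.findall("w:tr", NS):
        if is_on(tr.find("w:trPr/w:hidden", NS)):
            continue  # hidden rows are removed from the output
        height = tr.find("w:trPr/w:trHeight", NS)
        value = height.get(VAL) if height is not None else None
        if value is None:
            styles.append("")
        else:
            styles.append("height:{:g}pt".format(int(value) / 20))
    return styles

SAMPLE = """<w:tbl xmlns:w="%s">
  <w:tr><w:trPr><w:trHeight w:val="360"/></w:trPr></w:tr>
  <w:tr><w:trPr><w:hidden/></w:trPr></w:tr>
  <w:tr/>
</w:tbl>""" % W

print(convert_rows(ET.fromstring(SAMPLE)))  # ['height:18pt', '']
```

The hidden second row produces no output at all, and the third row, which has no w:trHeight, gets no height property.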

Cell Capabilities

Following is a partial list of features of WordprocessingML cells, and how they map to XHtml cell features:

  • The w:noWrap element translates to the nowrap attribute of the td element.
  • Background shading of cells can be converted.  The same issues apply as with table background shading.
  • Cell borders can be converted.  The same issues apply as with table borders.
  • WordprocessingML has the capability to alter character spacing so that the text fits exactly in a cell (the w:tcFitText element).  There is no standard CSS property that corresponds directly to this, so the conversion can at best approximate it or ignore it.
  • WordprocessingML supports setting the text flow direction.  This isn't supported in XHtml tables.
  • Horizontal and vertical alignment is supported in both models.
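
The border conversion described for tables and cells could be sketched as a lookup table in Python.  The mapping is partial and the function name is hypothetical.  Two things here are my assumptions: rendering the "auto" color as black, and giving every clip art border a fixed 0.5pt line (the w:sz of an art border is not a line width in eighths of a point, so it can't be reused directly).

```python
# Partial mapping from w:val on a border element to a CSS border-style.
BORDER_STYLES = {
    "single": "solid",
    "dotted": "dotted",
    "dashed": "dashed",
    "double": "double",
    "nil": "none",
    "none": "none",
}

def border_css(val, sz, color):
    """Build a CSS border declaration from w:val, w:sz, and w:color."""
    css_color = "black" if color == "auto" else "#" + color  # assumption
    if val not in BORDER_STYLES:
        # Clip art borders ("apples", "bats", ...) become a single line.
        return "border:solid 0.5pt " + css_color
    style = BORDER_STYLES[val]
    if style == "none":
        return "border:none"
    # For line borders, w:sz is expressed in eighths of a point.
    return "border:{} {:g}pt {}".format(style, sz / 8, css_color)

print(border_css("single", 4, "auto"))     # border:solid 0.5pt black
print(border_css("double", 8, "FF0000"))   # border:double 1pt #FF0000
print(border_css("apples", 20, "auto"))    # border:solid 0.5pt black
```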

With this post, I've detailed much of what I think I need to know to transform Open XML WordprocessingML tables to XHtml tables using CSS for formatting.  I've also outlined the strategy that I think I'll follow given the slightly different layout model of tables in WordprocessingML and tables in XHtml.  As I code the transform, I'll revise this post so that I can remember the details of the transform of WordprocessingML tables to XHtml tables.