Specifying the document settings

Earlier today I got an e-mail from Max asking if I could help clarify the section of the specification that deals with compatibility settings. Max was reading a blog post from IBM, and was hoping that I could respond with my point of view. Here's the original message from Max:

Hi,

I read this entry from Rob Weir: https://www.robweir.com/blog/2006/01/how-to-hire-guillaume-portes.html about specific properties inside Open XML, e.g. "useWord97LineBreakRules".

He makes the point how this can be an open format, when the format is documented, but certain properties are not, so in this example, nobody besides MS knows how these specific line breaking rules look like.

It would be great if you could comment your point of view on these issues as an response or on your blog.

Best regards

Max

This is a great question, and one that on the surface seems to have an obvious answer. But, as with the spreadsheet date issues that we discussed towards the end of last year, the approach to compatibility settings was chosen to allow for the most interoperable format possible without negatively impacting the average end user. Let's explore this a bit more.

Leaving things out of the spec isn't the solution

If you look at section 2.15 in part 4 of the spec, you'll see that there are almost 400 pages of documentation covering 206 elements. I still remember when we first started the documentation of that section. We knew it would be tedious, but the more annoying part was that a good portion of it was for legacy functionality that we would much rather have left out of the spec. In fact, in Word 2007, when you take a .doc file and upgrade it to .docx, we ask you if we can do a full upgrade so that all of those legacy settings are removed.

Unfortunately for us, there was a legacy base of billions of documents out there, and many of them had one or more of these settings. For the same reason that we had to give the user the option of doing a full upgrade to remove the legacy settings, rather than just doing it automatically, we had to include those settings in the file format. So rather than trying to sweep them under the carpet, we set about investigating and understanding as much as we could about what each of those legacy settings meant.

No requirement for conformance

For the most part, no one building a solution that reads from or writes to the format will care to deal with that part of the spec. It doesn't have a significant impact on the behavior of the documents, and as I already mentioned, we actually try to get the user to upgrade and turn these settings off whenever we can.

If you look at the first part of the compat settings section (2.15.3) you'll see that we tried to make it as clear as possible that implementation of these settings is completely optional. If your customers really care about it and want you to implement it, then you'll need to do so, and the spec defines how you should represent that setting. Here's the blurb from the spec that tries to make this very clear:

It is important to note that all compatibility settings are optional in nature - applications may freely ignore all behaviors described within this section and these settings should not be added unless compatibility is specifically needed in one or more cases. The compatibility settings are provided for backward compatibility with documents created in legacy applications. As such, a number of the settings reference specific applications and specific versions of those applications. This is solely for backward compatibility reasons, and any of those settings are ignorable.
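To make "ignorable" concrete, here is a minimal Python sketch of a consumer that reads a settings part and simply skips whatever sits under the compat element. The settings fragment is hand-written for illustration and the function names are hypothetical; the only things taken from the format itself are the WordprocessingML namespace, the compat container, and the useWord97LineBreakRules setting mentioned in Max's mail.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace as published in ECMA-376.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written illustration, not taken from a real document.
SETTINGS_XML = f"""
<w:settings xmlns:w="{W_NS}">
  <w:zoom w:percent="100"/>
  <w:compat>
    <w:useWord97LineBreakRules/>
  </w:compat>
</w:settings>
""".strip()


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def compat_setting_names(settings_xml):
    """Return the names of the settings found under w:compat, if any."""
    root = ET.fromstring(settings_xml)
    compat = root.find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    return [local_name(child.tag) for child in compat]


def non_compat_settings(settings_xml):
    """Return the top-level settings a consumer would still look at
    if it chose to ignore the whole compat block."""
    root = ET.fromstring(settings_xml)
    compat_tag = f"{{{W_NS}}}compat"
    return [local_name(child.tag) for child in root if child.tag != compat_tag]


ignored = compat_setting_names(SETTINGS_XML)
handled = non_compat_settings(SETTINGS_XML)
print("ignored compat settings:", ignored)
print("settings still handled:", handled)
```

An application whose customers do care about a particular legacy behavior would branch on the names in the first list instead of discarding them.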

"Use OpenOffice.org 1.1 line spacing"

Since the folks pointing out this issue are extreme ODF proponents (which clearly shapes their agenda), it's worthwhile looking into whether or not there is a similar issue with the ODF spec. It's actually a great example of the difference between the two specs and the approaches the two groups took when defining the formats. ODF took the approach of just leaving it all unspecified. A similar approach has been taken for a number of other things, such as spreadsheet functions.

For example, in OpenOffice there is a compatibility option called "Use OpenOffice.org 1.1 line spacing" that, when saved out into the format, is written as follows:

<config:config-item config:name="UseFormerLineSpacing" config:type="boolean">false</config:config-item>

That's a bit cryptic, isn't it? Unfortunately there is nothing in the ODF spec that explains what it means. It's left undefined. That's probably not as big of a deal for a setting like this one, since it refers to the layout of a specific application, but none of the other settings are defined in the spec either. Other settings that OpenOffice outputs, that are a bit more important to the actual display of a document, and that aren't defined at all in the spec include:

  • PrintTables
  • LoadReadonly
  • OutlineLevelYieldsNumbering
  • TableRowKeep
  • CharacterCompressionType

Those are just a few I noticed in a blank document I saved out today.

Will there eventually be a new standard within the standard?

This is the approach that was taken for all configuration settings. There is no mention in the standard of how to name these things or how two applications should interoperate. In the blank document I saved out I got a settings file that looks something like this:

<office:document-settings xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" xmlns:ooo="http://openoffice.org/2004/office" office:version="1.0">
  <office:settings>
    <config:config-item-set config:name="ooo:view-settings">
      <config:config-item config:name="ViewAreaTop" config:type="int">0</config:config-item>
      <config:config-item config:name="ViewAreaLeft" config:type="int">0</config:config-item>
      <config:config-item config:name="ViewAreaWidth" config:type="int">30032</config:config-item>
      <config:config-item config:name="ViewAreaHeight" config:type="int">17570</config:config-item>
      <config:config-item config:name="ShowRedlineChanges" config:type="boolean">true</config:config-item>
      <config:config-item config:name="InBrowseMode" config:type="boolean">false</config:config-item>
      <config:config-item-map-indexed config:name="Views">
        <config:config-item-map-entry>
          <config:config-item config:name="ViewId" config:type="string">view2</config:config-item>
          <config:config-item config:name="ViewLeft" config:type="int">3708</config:config-item>
          <config:config-item config:name="ViewTop" config:type="int">3002</config:config-item>
          <config:config-item config:name="VisibleLeft" config:type="int">0</config:config-item>
          <config:config-item config:name="VisibleTop" config:type="int">0</config:config-item>
          <config:config-item config:name="VisibleRight" config:type="int">30030</config:config-item>
          <config:config-item config:name="VisibleBottom" config:type="int">17568</config:config-item>
          <config:config-item config:name="ZoomType" config:type="short">0</config:config-item>
          <config:config-item config:name="ZoomFactor" config:type="short">100</config:config-item>
          <config:config-item config:name="IsSelectedFrame" config:type="boolean">false</config:config-item>
        </config:config-item-map-entry>
      </config:config-item-map-indexed>
    </config:config-item-set>
    <config:config-item-set config:name="ooo:configuration-settings">
      <config:config-item config:name="AddParaTableSpacing" config:type="boolean">true</config:config-item>
      <config:config-item config:name="PrintReversed" config:type="boolean">false</config:config-item>
      <config:config-item config:name="OutlineLevelYieldsNumbering" config:type="boolean">false</config:config-item>
      <config:config-item config:name="LinkUpdateMode" config:type="short">1</config:config-item>
      <config:config-item config:name="PrintEmptyPages" config:type="boolean">true</config:config-item>
      <config:config-item config:name="IgnoreFirstLineIndentInNumbering" config:type="boolean">false</config:config-item>
      <config:config-item config:name="CharacterCompressionType" config:type="short">0</config:config-item>
      <config:config-item config:name="PrintSingleJobs" config:type="boolean">false</config:config-item>
      <config:config-item config:name="UpdateFromTemplate" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintPaperFromSetup" config:type="boolean">false</config:config-item>
      <config:config-item config:name="AddFrameOffsets" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintLeftPages" config:type="boolean">true</config:config-item>
      <config:config-item config:name="RedlineProtectionKey" config:type="base64Binary"/>
      <config:config-item config:name="PrintTables" config:type="boolean">true</config:config-item>
      <config:config-item config:name="ChartAutoUpdate" config:type="boolean">true</config:config-item>
      <config:config-item config:name="PrintControls" config:type="boolean">true</config:config-item>
      <config:config-item config:name="PrinterSetup" config:type="base64Binary"/>
      <config:config-item config:name="IgnoreTabsAndBlanksForLineCalculation" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintAnnotationMode" config:type="short">0</config:config-item>
      <config:config-item config:name="LoadReadonly" config:type="boolean">false</config:config-item>
      <config:config-item config:name="AddParaSpacingToTableCells" config:type="boolean">true</config:config-item>
      <config:config-item config:name="AddExternalLeading" config:type="boolean">true</config:config-item>
      <config:config-item config:name="ApplyUserData" config:type="boolean">true</config:config-item>
      <config:config-item config:name="FieldAutoUpdate" config:type="boolean">true</config:config-item>
      <config:config-item config:name="SaveVersionOnClose" config:type="boolean">false</config:config-item>
      <config:config-item config:name="SaveGlobalDocumentLinks" config:type="boolean">false</config:config-item>
      <config:config-item config:name="IsKernAsianPunctuation" config:type="boolean">false</config:config-item>
      <config:config-item config:name="AlignTabStopPosition" config:type="boolean">true</config:config-item>
      <config:config-item config:name="ClipAsCharacterAnchoredWriterFlyFrames" config:type="boolean">false</config:config-item>
      <config:config-item config:name="CurrentDatabaseDataSource" config:type="string"/>
      <config:config-item config:name="DoNotCaptureDrawObjsOnPage" config:type="boolean">false</config:config-item>
      <config:config-item config:name="TableRowKeep" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrinterName" config:type="string"/>
      <config:config-item config:name="PrintFaxName" config:type="string"/>
      <config:config-item config:name="ConsiderTextWrapOnObjPos" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintRightPages" config:type="boolean">true</config:config-item>
      <config:config-item config:name="IsLabelDocument" config:type="boolean">false</config:config-item>
      <config:config-item config:name="UseFormerLineSpacing" config:type="boolean">false</config:config-item>
      <config:config-item config:name="AddParaTableSpacingAtStart" config:type="boolean">true</config:config-item>
      <config:config-item config:name="UseFormerTextWrapping" config:type="boolean">false</config:config-item>
      <config:config-item config:name="DoNotResetParaAttrsForNumFont" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintProspect" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintGraphics" config:type="boolean">true</config:config-item>
      <config:config-item config:name="AllowPrintJobCancel" config:type="boolean">true</config:config-item>
      <config:config-item config:name="CurrentDatabaseCommandType" config:type="int">0</config:config-item>
      <config:config-item config:name="DoNotJustifyLinesWithManualBreak" config:type="boolean">false</config:config-item>
      <config:config-item config:name="UseFormerObjectPositioning" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrinterIndependentLayout" config:type="string">high-resolution</config:config-item>
      <config:config-item config:name="UseOldNumbering" config:type="boolean">false</config:config-item>
      <config:config-item config:name="PrintPageBackground" config:type="boolean">true</config:config-item>
      <config:config-item config:name="CurrentDatabaseCommand" config:type="string"/>
      <config:config-item config:name="PrintDrawings" config:type="boolean">true</config:config-item>
      <config:config-item config:name="PrintBlackFonts" config:type="boolean">false</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>

So what part of this is defined in the ODF spec? Well, all of the elements are defined, but the problem is that only 4 config elements are used (config-item-set, config-item, config-item-map-indexed and config-item-map-entry). The meaning of the data can't be determined from the name of the element; it depends on the value of the attribute "config:name".

None of those values are defined in the specification though, so there is no way for two applications that follow the spec to share any of these properties without the two applications working together to define a completely new standard within the standard.
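A short Python sketch shows the same thing from a consuming application's point of view. It parses a trimmed copy of the settings file above (three items kept, the rest omitted) and reports which element names appear and what the config-item entries carry. The helper names are my own for illustration; nothing here is an API defined by ODF.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"

# Trimmed copy of the settings file shown earlier; only three items kept.
SETTINGS_XML = f"""
<office:document-settings xmlns:office="{OFFICE_NS}" xmlns:config="{CONFIG_NS}" office:version="1.0">
  <office:settings>
    <config:config-item-set config:name="ooo:configuration-settings">
      <config:config-item config:name="PrintTables" config:type="boolean">true</config:config-item>
      <config:config-item config:name="UseFormerLineSpacing" config:type="boolean">false</config:config-item>
      <config:config-item config:name="LinkUpdateMode" config:type="short">1</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>
""".strip()


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def element_counts(root):
    """Count how often each element name occurs in the whole tree."""
    return Counter(local_name(el.tag) for el in root.iter())


def config_items(root):
    """Map each config:name value to its (config:type, text) pair."""
    name_attr = f"{{{CONFIG_NS}}}name"
    type_attr = f"{{{CONFIG_NS}}}type"
    return {
        el.get(name_attr): (el.get(type_attr), el.text)
        for el in root.iter(f"{{{CONFIG_NS}}}config-item")
    }


root = ET.fromstring(SETTINGS_XML)
counts = element_counts(root)
items = config_items(root)

# The schema can validate the element names and the declared types...
print(sorted(counts))
# ...but the meaning of each setting lives only in the config:name string.
for name, (kind, value) in items.items():
    print(name, kind, value)
```

The parser can tell that UseFormerLineSpacing is a boolean set to false, and that is all the spec gives it; what the application should do differently when the value is true has to come from somewhere else.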

Do you want to remove them from the spec?

I guess the question is whether or not folks would rather see section 2.15 removed from the spec in favor of an approach similar to ODF's. I don't see how that would help interoperability in any way, though. Sure, it makes the spec smaller, but does that really help? Does it make it easier to move files from one application to another?

-Brian