Office’s Support for ISO/IEC 29500 Strict

There has been some interest expressed lately regarding how soon Microsoft Office will offer full read/write support for the Strict conformance class of ISO/IEC 29500. I can certainly understand the interest in this topic from those involved in the standards process, as well as from our customers and other implementers. That’s why we’ve been looking into the issues and options for Strict support for quite some time. Many of you have observed our movement in this direction. Indeed, a member of WG 4 blogged about our progress toward Strict recently.

We generally don’t publicly discuss features this early in the product development lifecycle, but given the broad interest I’m going to share some of our thinking on Strict support here. We’re doing this to assure everyone involved that we understand – at all levels within Office – the importance of Strict support going forward. In short, we will support Strict no later than Office “15.” I’ll outline our general plans below, and ask you to stay tuned for more details as we get further into the Office 15 wave.

Conformance: Background and Jargon

Before I cover the topic at hand, I think it’s worthwhile to take a quick look at how conformance is defined in the standard itself. There are two versions of the Open XML standard that have been approved by standards bodies:

  • ECMA-376 was approved in 2006 by Ecma International. Ecma is a consortium standards body like OASIS, with members including implementers, vendors, public and private organizations, and individuals.
  • ISO/IEC 29500 was approved in 2008 by the member bodies of JTC 1, the joint technical committee of ISO and IEC responsible for development and maintenance of information technology standards. ISO and IEC are international standards bodies whose members are mostly countries. (To be precise, ISO/IEC members are the national standards organizations of various countries, such as ANSI in the US, BSI in the UK, or AFNOR in France.)

The ECMA-376 standard was submitted to JTC 1 as a DIS (Draft International Standard) in 2007. Many countries (“member bodies”) participated in the standards process, and the version that was approved as ISO/IEC 29500 in 2008 included many changes that were suggested by the member bodies and then approved at the BRM (Ballot Resolution Meeting) in February 2008. For purposes of this post, the key changes to note are those in the conformance clause, which describes how to determine conformance to the standard.

In ECMA-376, two types of conformance were described in Section 2 of Part 1 of the standard: document conformance (Section 2.4) and application conformance (Section 2.5). These were just what you’d expect from their names: document conformance was about how to determine whether a document conforms to the standard, and application conformance was about how to determine whether an application conforms.

In ISO/IEC 29500, assessing conformance is more complicated because of several changes agreed to at the BRM that made conformance more granular than in ECMA-376.

The key change, which Alex Brown covers in his blog post, was the introduction of the concept of Strict and Transitional conformance classes. Transitional is intended to preserve the fidelity of existing binary documents being migrated to ISO/IEC 29500, and includes many legacy features for compatibility with existing documents. Strict is a subset of Transitional that does not include legacy features – this makes it theoretically easier for a new implementer to support (since it has a smaller technical footprint, so to speak), but also makes it less able to preserve the fidelity of existing documents.

Another conformance-related change at the BRM was the creation of separate conformance classes for word processing, spreadsheet and presentation documents and applications within both Strict and Transitional. So, for example, an application can be a conforming WML (WordprocessingML) Transitional application, or a document can be a conforming SML (SpreadsheetML) Strict document.

Yet another expansion of ECMA-376’s relatively simple approach to conformance was the addition of application descriptions, as covered in Section 2.6 of Part 1. An application may conform to either the Base Application Description (meaning that it supports at least one feature of its conformance class) or the Full Application Description (meaning that it supports every feature within its conformance class). That’s a pretty coarse distinction, but the standard anticipates refinement of application descriptions in Section 2.6.3, which states that “It is expected that additional application descriptions will be defined within the maintenance process for ISO/IEC 29500.” Indeed, SC 34/WG 4 (the working group tasked with maintenance of the standard) discussed this concept just two weeks ago during the meetings in Stockholm, where Mohamed Zergaoui (representing France’s AFNOR) presented some thoughts on this topic. I expect that WG 4 will work to clarify and refine the conformance language of the standard going forward, and we look forward to participating in that process.
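To make the jargon above a little more concrete, here is a minimal Python sketch of how the conformance classes and the two application descriptions fit together. It is only an illustration of the definitions as summarized in this post: the enum and function names and the feature sets are hypothetical, and nothing here comes from the standard's text.

```python
from enum import Enum
from itertools import product


class Markup(Enum):
    WML = "WordprocessingML"
    SML = "SpreadsheetML"
    PML = "PresentationML"


class Level(Enum):
    STRICT = "Strict"
    TRANSITIONAL = "Transitional"


def conformance_classes():
    """Return every (markup, level) pair, e.g. (Markup.SML, Level.STRICT)."""
    return list(product(Markup, Level))


def application_description(supported, class_features):
    """Return the strongest description an application meets for one class.

    supported      -- hypothetical set of feature names the application implements
    class_features -- hypothetical set of all feature names in the conformance class
    """
    class_features = set(class_features)
    implemented = set(supported) & class_features
    if not implemented:
        return None  # supports no feature of this class at all
    if implemented == class_features:
        return "Full"  # every feature of the class is supported
    return "Base"  # at least one feature, but not all of them
```

For example, `application_description({"tables"}, {"tables", "footnotes"})` returns `"Base"`, which shows how coarse the distinction is: anything between one feature and all but one lands in the same bucket.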

Office’s Approach to Open XML Conformance

Office 2007 was the first version of Office that supported the Open XML formats, with support for reading and writing of documents that conform to the ECMA-376 standard. To help improve interoperability between our implementation and others, we also published comprehensive implementer notes that transparently document the details of Office 2007’s implementation of ECMA-376.

After we shipped Office 2007, we got to work on the next version of Office, which was code-named “Office 14” but is now widely known as Office 2010, the version that we’ll be releasing very soon. For each new version of Office, we start with intensive research and planning to determine what new features will appear in the next release, and that process was ongoing during the DIS 29500 process. By the time of the BRM (in early 2008), we had our plans locked down and were working hard to deliver on Office 14, and meanwhile the standards community was working to make changes to the proposed DIS 29500 standard.

After approval and publication of final ISO/IEC 29500 text in 2008, the Word, Excel, PowerPoint and Graphics teams looked at how we could change our plans for Office 14 to accommodate the ISO/IEC version of the standard. As Shawn covered in a blog post one year after publication of the standard, we made the changes necessary to support ISO/IEC 29500 Transitional in Office 2010.

The decision to start with Transitional was a relatively simple one at that time, because our primary consideration was the needs of our customers. Our customers place a very high value on compatibility and interoperability, because they often need to allow people to collaborate across multiple versions of Office (due to varying upgrade schedules among trading partners, across supply chains, or between the departments of a large organization, for example). ISO/IEC 29500 Transitional is designed for high-fidelity interoperability with the binary formats and ECMA-376, so it’s the logical choice for these sorts of scenarios.

In addition to the work we did to move from ECMA-376 to Transitional, we also started doing the work to move toward Strict support as soon as the final text of ISO/IEC 29500 was locked down. For example, we invested resources in migration from VML to DrawingML for many features, we moved ink annotations to the new content part added at the BRM, and we added support for reading Strict files.

All of that work has moved us much closer to full Strict support, and I’d like to state clearly and unequivocally at this time that we will support reading and writing of ISO/IEC 29500 Strict no later than the next major release of Office, code-named Office “15.”

I emphasized “and writing” there because we have already built read-only support for Strict into Office 2010, and Strict read-only support will also be available for Office 2007 SP2 through a downloadable filter. We’ve taken those steps to assure interoperability between Office 2007/2010 and other implementations of the Strict conformance class, including Office 15 in the future.

There’s one technical change that has come up during the maintenance process which I feel is worth pointing out, because of its large impact on the move to Strict support. There was a defect report submitted to WG 4 by the Swiss technical committee last year that proposed changing the namespaces of ISO/IEC 29500, so that implementers could have a simple and reliable mechanism for distinguishing ECMA-376 documents from ISO/IEC 29500 documents. WG 4 started discussing and debating various ways to address that proposal over a year ago, and last summer reached consensus on changing the Strict namespaces, but not the Transitional namespaces. This resulted in a large number of changes to the text of the standard – for those interested in a good overview of the magnitude of those changes, check out Orcmid’s latest blog post.

Implementers, including Microsoft Office, will need to think carefully about how to handle the namespace changes in a way that gives customers the best possible experience. This is yet another challenge in planning support for Strict, and something the product teams are currently looking into as we start planning for Office 15.
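As a rough illustration of what the namespace change makes possible, the following Python sketch classifies a main document part by the namespace of its root element. The namespace URIs in the table are not given in this post; they are assumptions for illustration and should be checked against the published schemas. The function name is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Assumed main-namespace URIs (not from this post; verify against the schemas).
MAIN_NAMESPACES = {
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main": ("WML", "Transitional"),
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main": ("SML", "Transitional"),
    "http://schemas.openxmlformats.org/presentationml/2006/main": ("PML", "Transitional"),
    "http://purl.oclc.org/ooxml/wordprocessingml/main": ("WML", "Strict"),
    "http://purl.oclc.org/ooxml/spreadsheetml/main": ("SML", "Strict"),
    "http://purl.oclc.org/ooxml/presentationml/main": ("PML", "Strict"),
}


def classify_main_part(xml_bytes):
    """Return (markup, level) for a main part's root namespace, or None."""
    root = ET.fromstring(xml_bytes)
    # ElementTree spells qualified names as "{namespace-uri}localname".
    if not root.tag.startswith("{"):
        return None  # root element is not in any namespace
    namespace = root.tag[1:].split("}", 1)[0]
    return MAIN_NAMESPACES.get(namespace)
```

Because only the Strict namespaces changed, a check like this can separate Strict from everything else, but it cannot tell an ECMA-376 document from an ISO/IEC 29500 Transitional one. An implementation would still need a policy for what to do on save, such as whether to preserve the conformance class it opened.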

Maintenance of IS 29500

Another topic that Alex raised in his blog post was the ongoing maintenance activity in WG 4, including progress to date, prioritization of the work, and other considerations. I’d like to briefly respond to his thoughts here, while acknowledging that WG 4 itself is the proper place for in-depth discussion and planning of the maintenance process. Any person from any SC 34 member body can participate in WG 4, so if you have thoughts on maintenance of ISO/IEC 29500, I’d encourage you to get involved.

WG 4 has existed for about 18 months now, and we have worked through a very large number of defect reports in that time. Although I’ve not participated in other JTC 1 working groups before, I’ve heard that the pace of WG 4’s work, with conference calls of up to two hours every two weeks, ongoing email on the public WG 4 reflector, and face-to-face meetings every three months, has been exceptional. Over 340 defect reports have been submitted to date, and WG 4 has processed and closed 242 of those, with 36 others in “last call” status (meaning that a defined solution is pending final approval by WG 4), and fewer than 70 awaiting further consideration.

Japan, the UK, and Ecma have been the largest submitters of defect reports to date, and defect reports have also been submitted by Denmark, Switzerland, Czech Republic, Ireland, and others. The maintenance process is proceeding smoothly, and we’ve handled changes ranging from simple editorial corrections to major proposals such as the namespace change mentioned above. Through it all, I feel that the WG 4 team has really gelled, and we’ve established a productive results-oriented working style that is well-suited to both the participants and the work at hand. Could we improve the process in various ways? Of course we can, and we will. But I think it’s worth noting that WG 4 has been very productive to date, with the first batch of corrigenda already prepared, reviewed and approved, the first set of amendments in the pipeline, and work already underway on the next sets of corrigenda and amendments.

I’d like to keep the discussion of WG 4 procedures within WG 4 itself, since those are the people who will ultimately be doing the work. But as I said above, if you have thoughts on how to tackle maintenance, please get involved. Contact your National Standards Body for information about how to participate from your country.

Validating IS 29500 Conformance

What’s the best way to assess conformance to a large complex document format standard? This is a question that challenges the best and brightest minds in all of the standards organizations responsible for such formats, including SC 34 as well as OASIS, Ecma, and others.

As Jesper Lund Stocholm recently noted in a blog post about his new validator project, schema validation is the easy part. It can be automated, and there’s no ambiguity regarding whether a specific XML instance is valid against a specific set of schemas. The bigger challenges come when you try to validate the semantic and syntactic constraints that are embodied in the normative text of the standard.
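To show the kind of check that a schema alone cannot express, here is a small Python sketch that looks across parts of a package: it reports root relationships whose internal targets do not name any part in the ZIP container. The rule is chosen purely as an illustration of a cross-part constraint and is not quoted from the standard, and the function name is hypothetical.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def dangling_root_relationships(package_bytes):
    """Return internal targets in _rels/.rels that name no part in the package.

    Simplified on purpose: part names are compared case-sensitively and
    percent-encoding is ignored.
    """
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        names = set(pkg.namelist())
        root = ET.fromstring(pkg.read("_rels/.rels"))
    missing = []
    for rel in root.iter(REL_NS + "Relationship"):
        if rel.get("TargetMode") == "External":
            continue  # external targets live outside the package
        # Root relationships resolve against the package root.
        target = posixpath.normpath(rel.get("Target", "").lstrip("/"))
        if target not in names:
            missing.append(target)
    return missing
```

Each relationships part here could be perfectly schema-valid while the package as a whole is still broken, which is why validators need a list of constraints drawn from the normative text in addition to the schemas.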

Many people are working on how to best tackle these challenges in the world of ISO/IEC 29500, including Jesper’s and Alex’s validator projects, as well as the work being done by Fraunhofer and others. Here at Microsoft, we’re excited to see so much talent being applied to this area, and we’re looking forward to working with fellow WG 4 members and others to assess conformance in a way that the community agrees is best.

This post is already quite long, so I’ll not go into a lot of detail here except to note that there are two main areas where we expect to see useful results soon that will help raise validation testing to a new level of rigor and repeatability:

  • Identification of the semantic constraints contained in the text of the standard, so that all validators can work against a known-complete set of such constraints. Fraunhofer has done some interesting work in this area, to extract potential semantic constraints from the normative text, and we’ll be working with them to find a way to provide those constraints to writers of ISO/IEC 29500 validators.
  • Availability of a community-driven document test library, which implementers can use to test interoperability across conformant implementations of the standard. Fraunhofer has started this work, and there is much more to be done. Microsoft has contributed to this activity, and we’ll be staying closely involved.

Regarding the specific details of conformance, Alex noted that in addition to conformance issues caused by bugs in implementations, there can be issues caused by contradictory provisions within the text of the standard. Such contradictions can and do occur within various standards (both ISO/IEC 26300 and ISO/IEC 29500 have at least one of them, for example), and one of the goals of standards maintenance is to identify and correct such errors. The common pattern for such contradictions is that some portion of the standard will state that implementers shall do X, but there is text elsewhere in the standard (often text that was added later) which states that implementers may do Y in the same situation.

A strict reading of the text would lead one to conclude that such errors make conformance impossible. As a practical matter, however, implementers need to do something – they need to make a judgment call regarding the most reasonable interpretation of the intent of the standard in these areas. In the case of our IS 29500 implementation, we have done exactly that, and we’ve documented such interpretations within our published ISO/IEC 29500 implementer notes, so that everyone can see how we’ve interpreted the standard.

Separately, such errors need to be corrected in the standard. We are also contributing to that work. As one recent example, I wrote up a defect report myself while WG 4 was in Stockholm, to address an internal inconsistency regarding relationship types that Alex’s Office-o-tron validator had identified. We will work with the community to proactively identify more of these sorts of errors and get them corrected, and as part of my job I’m thinking through how we can best do that going forward.

One other detail that Alex mentioned was the use of the phrase “new documents” in the conformance clause for ISO/IEC 29500 Transitional. He noted that this term is not defined in the standard, and we agree that the intent of that term needs to be clarified. Here’s relevant text from the conformance clause:

“The intent […] is to enable a transitional period during which existing binary documents being migrated to DIS 29500 can make use of legacy features to preserve their fidelity, while noting that new documents should not use them. […]”

One thing to note there is the word should, which is a well-defined term. RFC 2119 covers the use of key words like should/shall/must/may within the normative text of standards, and here’s how should is defined:

3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

In the case of Office 2010’s use of Transitional, we have decided to prioritize compatibility and interoperability with existing implementations, because we believe this is in the best interest of our customers. So although the conformance clause says that Transitional “should not” be used for new documents, we have decided that the needs of customers, combined with the realities of the current document format ecosystem (most existing implementations are Transitional, and the Strict namespaces have recently undergone major changes), make Transitional the right choice. We will continue to update our plans in response to feedback from customers, other implementers, and the standards community going forward.

Summary

In closing, here’s where we stand:

  • In Office 2010, we’re providing read/write for Transitional and read-only support for Strict.
  • We will include write support for Strict no later than the initial release of Office 15. (More details will be forthcoming after we complete our planning.)
  • We are committed to continuing to work closely with the community on validation techniques, and we are actively using the available ISO/IEC 29500 validators to improve the quality of our implementation.

We’ve learned a lot from the IS 29500 standards process, and we continue to learn from the open and respectful exchange of ideas within SC 34 and the broader standards community. None of us have all of the answers, and many of the challenges that we collectively face are complex, but I’m confident that we can work through them and find solutions that address the needs of customers, implementers, standards workers, and other stakeholders.

And once again, I’d like to reiterate that if you have opinions about ISO/IEC 29500 maintenance, please get involved. I’m humbled by the level of expertise that WG 4 members bring to the table, and also by the commitment of those who volunteer large amounts of their own time to work toward improving the standard. I know I speak for every member of WG 4 in saying that we’d love to have even more participants involved.