Working with ODF in Word 2007 SP2


For those of us on the Office Interoperability team, as well as our colleagues throughout Office, today is a big day.  We’ve released SP2 (Service Pack 2 for Office 2007), which includes a bunch of updated features.  Gray Knowlton has a roundup of what’s new in SP2, but I think the feature of most interest to readers here is probably the built-in support for ODF 1.1.


I first mentioned our plans for ODF support in a blog post last year, and I’ve also blogged in the past about the guiding principles that we followed in our ODF implementation.  Our decision to support ODF is just one aspect of Office’s broad commitment to choice and interoperability, as covered by Tom Robertson today on the Microsoft on the Issues blog.


For today’s post, I thought I’d put together a hands-on example of a typical user experience when working with ODF and Office 2007 SP2.  I’m going to focus on a typical document creation and editing scenario in Word.  Specifically, I’ll go through these steps:



  • Create a typical document in Word 2007 SP2, and save it as ODF.

  • Open that document in OpenOffice 3.0.

  • Back in Word, add some fancy styling and other typical enhancements to the document, then save the fancier version in ODF.

  • Open that fancier version in OpenOffice.

The starting point.  As a first step, I’ll create a document we can use as a starting point to try out some things.  So I select File/New in Word, add some text, insert a few of the things we all use regularly in documents (a title, headings of various levels, a numbered list, and a table), and do some simple formatting.  Here’s how it looks:


image


The next step is to save this as an ODT document.  That’s pretty simple: just click the Office Button, move your mouse to “Save As”, and then select “OpenDocument Text” from the menu.  Before I go any further, it’s worth noting a couple of things about this step:



  • You can make ODF the default document format if you’d like, and then you won’t need to select it from the dropdown list each time

  • I’ll get a message warning me that my document may contain features that aren’t compatible with this format, because ODF can’t represent 100% of the things we can do in Word
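If you’re curious about what ends up on disk after that save, an ODT file is an ordinary ZIP package, and its media type is stored in a plain-text entry named `mimetype`. Here’s a minimal Python sketch, not part of Office or any official tool, that lists the parts of a package and checks the declared media type. The function names are hypothetical and chosen only for this illustration.

```python
import zipfile

# Media type that the ODF specification assigns to text documents.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def describe_odf_package(source):
    """Return (declared mimetype, sorted part names) for an ODF package.

    `source` may be a file path or a seekable binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        names = sorted(package.namelist())
        mimetype = None
        # ODF packages declare their media type in a "mimetype" entry.
        if "mimetype" in names:
            mimetype = package.read("mimetype").decode("ascii").strip()
    return mimetype, names


def looks_like_odt(source):
    """True if the package declares the ODT media type and has content.xml."""
    mimetype, names = describe_odf_package(source)
    return mimetype == ODT_MIMETYPE and "content.xml" in names
```

Calling `looks_like_odt` with the path of a saved document is a quick sanity check before digging into parts such as `content.xml` or `styles.xml`. It says nothing about whether the XML inside is valid ODF.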

Now I’ll open this document in OpenOffice version 3.0.1.  In a future post I’ll look at differences between various existing ODF implementations, but for today’s post I’m just going to stick to OpenOffice 3.0.1 and Office 2007 SP2.


When I open my ODT document in OpenOffice Writer, here’s what it looks like:


image


As you can see, the document looks essentially the same in both applications.  The page break is the only obvious difference – it occurs at a different point in the document due to differences between the default line-spacing values used in Word and OpenOffice.  Other than that detail, the document looks the same in both applications, with the same fonts, formatting, headings and content.


The line-spacing variation is something you can see in other ODT documents and other ODF implementations as well.  For example, if you open the latest draft of the ODF 1.2 specification (OpenDocument-v1.2-cd01-rev06.odt) in IBM Lotus Symphony 1.2.0, it is 931 pages long, but if you open the same document in OpenOffice Writer 3.0.1, it’s 875 pages long.  These types of variations demonstrate a fundamental difference between a fixed-layout format (such as PDF or XPS) and a flow-oriented layout like ODF or Open XML.  Flow-oriented formats work well for dynamic editing activities, whereas fixed-layout formats rigidly pin down the layout of a document so that it will be rendered exactly the same on different devices.  For these reasons, most people prefer to use a flow-oriented format during document authoring and editing, and a fixed-layout format for published documents that are no longer being edited.
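One way to see where application defaults can creep in is to look for paragraph styles that don’t state a line height of their own. The sketch below is an illustration under a few assumptions: it reads the XML of a `styles.xml` or `content.xml` part, looks only at `style:style` elements in the `paragraph` family, and reports the `fo:line-height` attribute when one is present. It does not resolve inheritance from parent or default styles, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the ODF specification.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def paragraph_line_heights(styles_xml):
    """Map each paragraph style name to its explicit fo:line-height, or None.

    `styles_xml` is the XML of an ODF part, as str or bytes.
    """
    root = ET.fromstring(styles_xml)
    result = {}
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        # Skip text, table, and other non-paragraph style families.
        if style.get(f"{{{STYLE_NS}}}family") != "paragraph":
            continue
        name = style.get(f"{{{STYLE_NS}}}name")
        props = style.find(f"{{{STYLE_NS}}}paragraph-properties")
        height = None
        if props is not None:
            height = props.get(f"{{{FO_NS}}}line-height")
        result[name] = height
    return result
```

Styles that map to `None` here are candidates for picking up a value from a parent style or from the consuming application, which is one plausible place for pagination to drift between implementations.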


Getting Fancier.  Now let’s move on to some fancier formatting and see how that works.  I’m going to open this document in Word and make a variety of changes:



  • I’ll switch to a different styleset, which will alter all of the styles in the document; I’ll choose the “Modern” styleset from Word’s built-in options

  • I’ll insert an image into the body of the document, with square text-wrapping around it

  • I’ll apply a table style to the table; I’ll use one with header-row and first-column formatting turned on, as well as row and column banding

  • I’ll insert a header and a footer, using Word’s “Annual” style for headers and footers

  • I’ll insert a table of contents, using the default settings

As a result of these changes, my document now looks like this in Word:


image


And if I save that version as an ODT file and open it in OpenOffice, I see this:


image


You’ll notice that many things are identical in both Word and OpenOffice, and a few things look a little different in each application.  Here are some things that are the same in both applications:



  • All of the content is the same – nothing is missing in either application

  • All of the title/header/text styling is the same

  • The table styling is the same

  • The header and footer look the same

  • If you were to try clicking on the links in the table of contents, you’d find that these work the same in both applications (i.e., clicking on an entry takes you to that part of the document)

And here are some things that appear differently in the two applications:



  • The formatting of the hyperlinks in the Table of Contents is different, due to differences in Word and OpenOffice’s default styling for hyperlinks

  • The document is a little longer in OpenOffice than in Word, due to factors like the default line-spacing difference mentioned above

  • The text-wrap margins around the inserted image also differ slightly, again due to differences in application defaults

If you’d like to test these sample documents yourself, they’re in a ZIP file attached to this blog post (below).
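If you want to check the “nothing is missing” claim for yourself without comparing screenshots, one rough approach is to pull the paragraph and heading text out of `content.xml` and compare the resulting lists for two files. This is a simplified sketch rather than a full ODF reader: it ignores whitespace elements such as `text:s` and `text:tab`, and paragraphs nested inside other paragraphs (for example in text boxes) would be counted more than once. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

# Text namespace defined by the ODF specification.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def paragraph_texts(content_xml):
    """Return the non-empty text of each text:p and text:h, in document order."""
    root = ET.fromstring(content_xml)
    texts = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            # itertext() also collects text inside child spans and links.
            text = "".join(element.itertext()).strip()
            if text:
                texts.append(text)
    return texts


def odt_paragraphs(source):
    """Read content.xml from an ODT package (path or file object)."""
    with zipfile.ZipFile(source) as package:
        return paragraph_texts(package.read("content.xml"))
```

Comparing `odt_paragraphs(...)` for a file saved by one application and re-saved by another shows whether the text survived the round trip. It deliberately says nothing about styling or layout.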


Getting more information. This demonstration was just a simple example, for those who are curious about how the new built-in ODF support works in Office.  You can find more detailed information about SP2’s support for ODF 1.1, including which features are supported by Word, Excel and PowerPoint, at these links:



Going forward, I’ll be doing some blog posts that get down into more of the technical details, to help explain some of the engineering decisions that we made in our implementation.  For example, tracked changes functionality is of interest to many users, so I’m working on a post to cover why we decided to not implement tracked changes in ODF.


What else would you like to understand about our implementation of ODF?  Share your questions and thoughts in the comment thread, or email me (dmahugh at microsoft dot com) if you have suggestions for topics you’d like to see covered here.  I’m very proud of the work my colleagues on the Word, Excel and PowerPoint teams have done to add ODF support, and I’m looking forward to discussing the details now that SP2 has been released.

SampleDocs.zip


Comments (21)

  1. From Microsoft: Today Microsoft is releasing Service Pack 2 for the 2007 Microsoft Office system. This

  2. orcmid says:

    Wow, great news.  Now we can talk about this as a released implementation of ODF.  Congratulations.

  3. Great news. I’m quite interested in how form fields are handled (haven’t checked that one out, but it’s tax season so all I can think of right now is filling out those forms 😉)

    Keep up the good work.

  4. Peter Junge says:

    Congratulations to this great step towards interoperability.

  5. Office 2007 SP2 includes major performance enhancements for Office applications and servers, most notably

  6. It can be downloaded from Microsoft Update. The most important goodies in this SP, in my opinion

  7. Dave says:

    I’m assuming that the line spacing can be addressed directly: If the page or paragraph style has an explicit line height and leading, will that result in consistent text flow across page breaks, etc.?

  8. Darren Bell says:

    Can you test using the test documents that are available from Oasis?  This would show up any holes in the produced XML.

    Maybe then we’ll see it as a complete implementation.

    Also, what can Office do that ODF cannot store?

    Thanks.

  9. Ian Easson says:

    I did my own quick test.  I have a 525 page book I am nearly finished writing in Office 2007.  It has a lot of complexity, so I thought it would be a good test.

    With SP2 installed, I saved in ODF.  There were the line spacing issues you mentioned.  What I also found was that indents were incorrect (e.g., for a bulleted list, using the built-in List Bullet style).  I would have thought that the indent would have gone over OK, but apparently not.

    I did a second test.  Using the latest OpenOffice, I directly opened the .docx file.  The results were notably worse than using Office 2007 SP2.  Footnotes did not appear numbered, but instead showed as field codes.  The worst was the title page.  It had random sentences from throughout the text superimposed over the picture on the dust jacket.

    My conclusion is that it looks like Office SP2 .docx to .odt is the best route, rather than .docx directly into OpenOffice.

  10. Doug Mahugh says:

    Dennis/Bart/Peter: thanks.

    Dave: yes, if you use only styles that explicitly specify the line spacing, indents, margins, etc., you can get a much more consistent appearance and reduce the difference in vertical spacing that you see in my examples.

    Darren: we did test our implementation with the test suite available from http://testsuite.opendocumentfellowship.com/ to make sure that we can correctly read and write all of those documents. Are those the documents you are referring to? If not, can you provide a link?

    Ian: interesting test.

  11. Jörg Wartenberg says:

    Thank you for this great feature! I hope Office will stay compatible with future versions of ODF too!

  12. Mitch 74 says:

    Congrats on the filter – for Word. However, since SP2 implements only ODF 1.1 (since ODF 1.2 is still only an advanced draft format), how are formulas stored in spreadsheets? I hear there’s also a problem with tables in slideshows (which is strange, since obviously Word can do ODF tables; why can’t Powerpoint?)

    I also wonder about page styles: I’d like to see how a document that alternates page formats, filler blank pages and such work in both. How are master documents handled?

    @Ian: .docx is a proprietary XML-based format that has a single implementation. OOo developers are having trouble developing an import filter for the following reasons:

    – actual file format doesn’t always conform to the published specification (encryption had to be reverse engineered, for example).

    – there are several redundant features: tables in Word, Excel and Powerpoint are different objects that share 95% of their properties (tables in OOo/ODF are the same, as no difference is made between one document and another) which all require a different import filter method to create a single object: a table

    – some features don’t match with OOo’s internal structure (geometrical shapes and text: OOo has 2 renderers, a simple one and Writer. The simple renderer is used for these shapes, but OOXML requires a richer one)

    – Office 2007/2008 is the only Office generation using .docx; Office 14 should use OXML, and OOo developers think that their time would be better spent developing an import filter that can manage most XML formats at once (better for support, reduces code redundancy)

    Upcoming version 3.1 will solve several problems here, and there are already further improvements planned/started for 3.2.

  13. Warden says:

    Doug, I really wish you could issue a response to the slashdot article. it brings up very interesting points worth answering.

  14. Doug Mahugh says:

    PHPPowerPoint 0.1.0 was released last week, as an open-source PHP API for generating PPTX files, much

  15. Doug Mahugh says:

    Mitch, there is no “filter” involved – it’s built-in support.  What made you think that it’s a filter?  (I can’t find any place I’ve ever used that word regarding our ODF support, but would be glad to correct it if I have.)

    Also, your claim that docx is a proprietary format is hard for me to understand.  There are many implementers who have written code to generate DOCX files by working directly with the ECMA376 spec – in what sense are the resulting DOCX files proprietary?

    Regarding your questions:

    – as we read the specification, tables in presentations are not allowed in ODF 1.1 – that was added in ODF 1.2, which is not yet an approved or published standard

    – the tables issue was pretty thoroughly debated a couple years ago during the DIS29500 process; Open XML has three table models, each optimized for a particular document type, and ODF uses a single table model across all document types

    – we store formulas in our own namespace; this is the only option available in any of the published versions of ODF.  I will be writing about this in more detail in another post later this week.

    – the encryption approach used by our implementation of Open XML is documented at http://msdn.microsoft.com/en-us/library/cc313071.aspx, and code samples are available at http://offcrypto.codeplex.com/

    Your other comments seem to be more about OO’s plans than Office’s implementation, so I can’t add to those.

  16. Doug Mahugh says:

    Rob Weir posted on his blog a couple of days ago an Update on ODF Spreadsheet Interoperability . 

  17. Mitch 74 says:

    @dmahugh: reference to Office 97 installer: "additional file format filters"

    if you think of a better shortcut term, please tell me 🙂

    ECMA376 relies upon but doesn’t describe formats such as VML although they are declared ‘deprecated’, relies upon but doesn’t describe paper sizes internal to MS Office (non-compliant with ISO paper sizes), relies upon non-standard leap years and date formats. Were it really open, it would have been accepted as-is by ISO, instead of being strongly edited (1,000 modifications required for 6,000 pages, published 8 months after it ‘became a standard’ instead of 6 weeks). And stop me if I’m wrong, but currently no Office version complies with ISO 29500:2008.

    About the rest of my comment, it was directed at Ian.

    About tables: yes, it was debated for DIS29500. However, I fail to see how ODF 1.1 can’t accept tables in a presentation, as there are no differences between ODT, ODS, and ODP apart from the last letter: their XML manifests and contents are identical, so like a text document can include a table, so does a presentation – Impress couldn’t add a table to an ODF 1.1 presentation but Kpresenter could, and so can OO.o 3, even when setting the ODF compliance to 1.0/1.1.

    ODF 1.1 thus supports tables in presentation documents.

    There is NO reason Powerpoint would scrap a table added to an ODF presentation, since the currently standardized format accepts it, except to artificially limit the export filter. It’s not because an (now outdated) application didn’t support that particular feature that it can’t be done.

  18. Razvan says:

    Doug, I 100% agree with Rob that SP2 is a step BACKWARD on the road of interoperability. That’s because it simply destroys the de facto interoperability that exists between all other suites – it creates spreadsheets that can be manipulated with MS Office ONLY.

    Of course, OpenFormula is not an official standard yet, but why has Microsoft chosen to break this de facto alignment? Hopefully, OpenFormula WILL BE an ISO standard in a few months. What then?

    How will you explain that to European countries that are now fully committed to ODF in their public administrations?

    On a short term, it will be good for Microsoft business & monopoly. But for medium & long term, it’s suicidal. People – even non-technical – ARE INFORMED on your moves.

    We keep an eye on Microsoft…

  19. Doug Mahugh says:

    When I blogged about the release of SP2 with ODF support two weeks ago, I mentioned that I was planning
