Open XML Converters for Mac Office


There’s been a bit of flak about the Office Open XML file format converters for Mac Office.  Sheridan posted an update on MacMojo, and Schwieb weighed in regarding some of the comments that people have made.  There’s quite a bit of speculation going on, and not a whole lot of information, so I’m going to try to dispel some of the fog.

This discussion centers on Word, because I’m a Word developer, but the general ideas hold for all three of the affected Office applications.  The most significant difference between Word and the rest of the suite is that Word has a converter API.  There’s a WinWord converter SDK that’s downloadable from the Microsoft support web site.  While there are some subtle differences (FSRefs instead of file paths, for example), the overall API is the same for Mac Word.  Of particular importance is the fact that the lingua franca for converting Word file formats is RTF.

So, in order to write a converter for Word, you need two things: 1) a component that reads and writes the external file format; and 2) a component that generates and parses RTF.  Also, because of the hierarchical structure of XML, you need to have some form of intermediate representation of the file.
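To make those pieces concrete, here is a minimal sketch of the pipeline: one component that reads the external format, an intermediate representation, and one component that generates RTF. It assumes a toy subset of WordprocessingML (paragraphs, runs, text, bold only) and ASCII text. The class and function names are hypothetical and are not taken from the converter SDK or from Word's code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# WordprocessingML namespace used in the main document part of a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class Run:
    """Intermediate representation: a span of text with uniform formatting."""
    text: str
    bold: bool = False


@dataclass
class Paragraph:
    """Intermediate representation: an ordered list of runs."""
    runs: List[Run] = field(default_factory=list)


def read_open_xml(xml_text: str) -> List[Paragraph]:
    """Component 1 (toy): parse Open XML markup into the intermediate representation."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter(W + "p"):
        para = Paragraph()
        for r in p.iter(W + "r"):
            text = "".join(t.text or "" for t in r.iter(W + "t"))
            bold = r.find(W + "rPr/" + W + "b") is not None
            para.runs.append(Run(text, bold))
        paragraphs.append(para)
    return paragraphs


def escape_rtf(s: str) -> str:
    """Escape RTF control characters (ASCII text only in this sketch)."""
    return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def write_rtf(paragraphs: List[Paragraph]) -> str:
    """Component 2 (toy): generate RTF from the intermediate representation."""
    parts = ["{\\rtf1\\ansi "]
    for para in paragraphs:
        for run in para.runs:
            text = escape_rtf(run.text)
            parts.append("{\\b " + text + "}" if run.bold else text)
        parts.append("\\par ")
    parts.append("}")
    return "".join(parts)


if __name__ == "__main__":
    sample = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
        "<w:r><w:t> world</w:t></w:r></w:p></w:body></w:document>"
    )
    print(write_rtf(read_open_xml(sample)))  # {\rtf1\ansi {\b Hello} world\par }
```

The intermediate layer is what decouples the two sides. The XML reader never needs to know about RTF, and the RTF writer never needs to know about XML nesting. A real converter also needs the reverse direction (RTF parser and Open XML writer), which is left out here.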

Let’s put our Win Word hat on for a second, go back in time about two years, and think about how we might do this.  Well, by the time Win Office ships, we’ll have a software component that satisfies all of those needs: Word itself, or the new version of Word, to be precise.  So, one very efficient way to implement that converter is to refactor the UI out of Word 12, repackage the result with any other necessary components, and write a wrapper around all of it that exposes the API that the older version of Word expects converters to implement.
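The wrapper idea is essentially the adapter pattern. In the sketch below, the names `LegacyConverterAPI`, `NewWordEngine`, and `OpenXmlConverter` are all hypothetical, and the method names are illustrative rather than the SDK's actual entry points. The engine is a toy in which a "document" is plain text and the Open XML bytes are a stand-in, not a real .docx package.

```python
from typing import Protocol


class LegacyConverterAPI(Protocol):
    """Hypothetical shape of the entry points an older Word calls on a converter."""
    def is_format_correct(self, data: bytes) -> bool: ...
    def foreign_to_rtf(self, data: bytes) -> str: ...
    def rtf_to_foreign(self, rtf: str) -> bytes: ...


class NewWordEngine:
    """Hypothetical UI-free core of the new Word (toy: a document is plain text)."""
    ZIP_MAGIC = b"PK\x03\x04"  # real .docx files are ZIP packages that start with this

    def open_xml_to_model(self, data: bytes) -> str:
        return data[len(self.ZIP_MAGIC):].decode("utf-8")

    def model_to_open_xml(self, text: str) -> bytes:
        return self.ZIP_MAGIC + text.encode("utf-8")

    def model_to_rtf(self, text: str) -> str:
        return "{\\rtf1\\ansi " + text + "}"  # toy: no escaping of RTF control chars

    def rtf_to_model(self, rtf: str) -> str:
        prefix = "{\\rtf1\\ansi "
        if not (rtf.startswith(prefix) and rtf.endswith("}")):
            raise ValueError("toy parser only accepts its own RTF output")
        return rtf[len(prefix):-1]


class OpenXmlConverter:
    """The wrapper: exposes the legacy API and only delegates to the engine."""
    def __init__(self, engine: NewWordEngine) -> None:
        self.engine = engine

    def is_format_correct(self, data: bytes) -> bool:
        return data.startswith(NewWordEngine.ZIP_MAGIC)

    def foreign_to_rtf(self, data: bytes) -> str:
        return self.engine.model_to_rtf(self.engine.open_xml_to_model(data))

    def rtf_to_foreign(self, rtf: str) -> bytes:
        return self.engine.model_to_open_xml(self.engine.rtf_to_model(rtf))


if __name__ == "__main__":
    converter: LegacyConverterAPI = OpenXmlConverter(NewWordEngine())
    data = converter.rtf_to_foreign("{\\rtf1\\ansi Hello}")
    print(converter.foreign_to_rtf(data))
```

The wrapper contains no conversion logic of its own. That is what makes the testing argument work: the engine's reading and writing paths have already been exercised by testing the application itself.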

Do that, and you can ship the converters the same time you ship Office.  The big upside of this idea is that you can really narrow down the scope of testing you do on the converter itself, because you’ve already tested both the RTF and the Open XML components by testing Word itself.  So, you gain leverage from both a development and a testing perspective.

Now, let’s put our Mac Word hat back on, and think of what our options are given the reasoning I’ve stated above.  You can’t really ask the Win Office team to toss their idea in the trash just so you can work on the converters in tandem.  Well, you can, but one would have to be very optimistic to expect more than a polite, “Sorry.” I can’t think of a clearer example of the tail trying to wag the dog.

Instead of following in Win Word’s footsteps, how about we spin off a separate development team to work on the converters separately from Word itself?  I’ve read suggestions made by some that writing converters from scratch could have been done in a relatively (in some cases ridiculously) short amount of time.  So, let’s test that idea by doing some back-of-the-envelope calculations.  You can check these numbers for yourself by downloading the reference XML schemas and performing some searches through the .xsd files.

First, when could we have realistically started working on this?  Well, not before Office 2004 shipped in April of 2004, so, ignoring the availability of specifications for the new format, let’s assume that we began work on this roughly two years ago.  The final draft of the spec wasn’t submitted to the ECMA until this past October, so in terms of actually having a spec to write to, 24 months is extremely optimistic for the time period available.

How big is the task? Word, alone, has more than 1100 individual XML elements that need to be processed.  We do this processing by writing something called a “handler”, and each one of these elements needs a handler.

Now, some of these elements are more complex than others.  A single, user-defined document property isn’t very complex.  A paragraph, or a document section, can be very complex.  For some of these handlers, one developer can whip out two or three a day.  Some of the other handlers will take a single developer up to an entire month to complete.  Trying to get more than one developer working on the same handler at the same time ends up being very counter-productive.  So, one handler per developer, and, on average, it’s fair to assume productivity of one handler per dev per day.
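One way to picture "one handler per element" is a registry keyed by element name, with a dispatcher that walks the XML tree. This is a minimal sketch under that assumption; the `handles` decorator, the handler signature, and the `missing` set are hypothetical illustrations, not how Word's handlers are actually structured. Elements without a handler are recorded in `missing`, which is a rough picture of the work still left out of the roughly 1,100 elements.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Set

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical handler signature: (element, output buffer, set of unhandled names).
Handler = Callable[[ET.Element, List[str], Set[str]], None]
HANDLERS: Dict[str, Handler] = {}


def local_name(tag: str) -> str:
    """Strip the WordprocessingML namespace from a tag, if present."""
    return tag[len(W):] if tag.startswith(W) else tag


def handles(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers one handler for one element."""
    def register(fn: Handler) -> Handler:
        HANDLERS[W + name] = fn
        return fn
    return register


def walk(elem: ET.Element, out: List[str], missing: Set[str]) -> None:
    """Dispatch each element to its handler; record elements that have none yet."""
    handler = HANDLERS.get(elem.tag)
    if handler is not None:
        handler(elem, out, missing)
        return
    missing.add(local_name(elem.tag))
    for child in elem:  # keep descending so handled children are still reached
        walk(child, out, missing)


@handles("p")
def handle_paragraph(elem: ET.Element, out: List[str], missing: Set[str]) -> None:
    # A "complex" handler: it must coordinate everything nested inside it.
    for child in elem:
        walk(child, out, missing)
    out.append("\\par ")


@handles("t")
def handle_text(elem: ET.Element, out: List[str], missing: Set[str]) -> None:
    # A "simple" handler: a few lines, the kind one can write several of per day.
    out.append(elem.text or "")
```

Handlers are independent entries in the registry, so different developers can own different elements. Two developers sharing a single complex handler such as the paragraph, on the other hand, would be editing the same function.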

At that rate, a team of 5 developers will implement 25 handlers a week (one handler per developer per working day), which means that we’d have all the XML handlers written in 44 weeks.  Well, a little more than that, because I’ve rounded the number of elements down to the nearest 100.  Nevertheless, we’ve taken a little less than a year just to get the converters reading the new file format.  We still aren’t writing the new file format; we still have the RTF side of things to worry about, which is actually more complex than the XML side; and I’ve completely left out all of the design and coding for the intermediate representation of the file.  The intermediate representation, by itself, is at least 6 to 8 months’ worth of work.
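The arithmetic above can be checked directly. This sketch assumes a 5-day working week and 52 weeks per year. It uses the post's figures of 1,100 elements, 5 developers, one handler per developer per day, and a 24-month window.

```python
WEEKS_PER_YEAR = 52  # assumption for converting the 24-month window to weeks


def weeks_to_write_handlers(elements: int, devs: int,
                            handlers_per_dev_day: float = 1.0,
                            workdays_per_week: int = 5) -> float:
    """Weeks needed if each developer finishes handlers_per_dev_day handlers per workday."""
    handlers_per_week = devs * handlers_per_dev_day * workdays_per_week
    return elements / handlers_per_week


reading_weeks = weeks_to_write_handlers(1100, 5)  # 1100 elements / 25 handlers per week
schedule_weeks = 2 * WEEKS_PER_YEAR               # the roughly 24-month window
print(reading_weeks)                              # 44.0
print(round(reading_weeks / schedule_weeks, 2))   # 0.42
```

That is, the XML reading side alone uses a bit over 40% of the available calendar. This is before any writing, RTF, or intermediate-representation work.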

In other words, we’re almost halfway through the schedule, with less than a quarter of the development work done.  You want more developers?  I don’t have more developers.  This is just for Word.  We need additional teams for Excel and PowerPoint.  People want Universal Binaries of Mac Office in their hands, they’re adding new features to Win Office 12 that Mac Office 2004 won’t understand, Apple has a new HIView architecture that requires some re-architecting of parts of Mac Office, and none of this work adds a single new feature to Mac Office.

More importantly, we’ve also run out of time to test the converters.  Had we started writing converters from scratch, getting something fully tested and ready for public consumption would have taken us longer than the route we’ve chosen, in no small part because that route lets us leverage almost all of the development work of the Win Office team.

The only reasonable choice for Mac Word has been to follow in Win Word’s footsteps.  For those of you who attended the last Mac BU customer council meeting in Redmond and were wondering what I was doing while sitting in the back corner, now you know.  I was busy refactoring Mac Word so that Mac Word 12 could, eventually, become the converter for the new file formats.

The big win for this strategy is that we get to do all of the things that customers are asking us to do with the next version of Mac Office: Universal Binaries, support for most of the new data types in Win Office 12, re-architecting the UI to take advantage of composited HIViews and add some compelling new features.

Lastly, can we port the Win Word converter?  Well, actually, in a way, porting the Win Word converter is exactly what we have been doing, but we’re still faced with having to wait until Win Word ships before we have the final source code to merge into what we’ve already ported.  Once that merge is done, then we still have to go through several months’ worth of testing and bug fixing before they’re ready for public use.

And that is precisely why there’s a delta between Win Office 2007 shipping and the full availability of converters for Mac Office.

Update:  I’d like to clear up some things about what I said earlier.  My back-of-the-envelope estimates included a lot more work than just supporting Open XML in Mac Office.  Open XML is the easy part.  The estimates also included the work required to generate RTF in both directions and to implement tools for developers.

If we had to add support for Open XML to Mac Word 12 without being able to port code from Win Word, the read/write estimate shrinks to about 8.5 man-years (44 weeks x 5 devs x 2 for read+write).  As I recall, this is about half of what it took to add HTML support to Word: 10 or so devs over a release cycle of 2 years.  Doing the work for PPT and Excel isn’t strictly a multiple of Word, because about 30% of the XML elements are shared between the three apps.  So, for all of Mac Office, I’d estimate it would take a total of about 5 devs over the release cycle to add full Open XML support starting from scratch, as part of the larger project.
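The man-year figure follows from the earlier estimate. This sketch assumes 52 weeks per year with no time off. It doubles the 44-week reading estimate to cover writing as well, and compares the result with the roughly 20 man-years (10 devs for 2 years) quoted for HTML support.

```python
WEEKS_PER_YEAR = 52  # assumption: calendar weeks, no holidays subtracted


def man_years(weeks: float, devs: int, directions: int = 2) -> float:
    """Effort in man-years for `devs` developers working `weeks` weeks per direction."""
    return weeks * devs * directions / WEEKS_PER_YEAR


word_open_xml = man_years(44, 5)  # 44 weeks x 5 devs x 2 (read + write) = 440 dev-weeks
word_html = 10 * 2                # "10 or so devs over a release cycle of 2 years"
print(round(word_open_xml, 1))              # 8.5
print(round(word_open_xml / word_html, 2))  # 0.42
```

The ratio comes out a little under one half, which is in line with the rough "about half" comparison to the HTML work.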

 

Rick

Currently playing in iTunes: Time Loves a Hero by Little Feat

Comments (14)

  1. raul says:

    Despite all the kerfuffle on the internet about the office converter, the truth is anyone who has been around for a while knows there is almost always a lag before a format is adopted. I expect it will be years before most MS Office installations get upgraded and that the adoption curve will be a long one.  So waiting a couple of months will be no big deal for most people/companies.

    It would be nice if document formats for all programs were fully documented and if converter APIs were always published. This is a way to assure your customers that someone will always be able to crack open your documents, and it allows third parties to read and write documents in your format, promoting adoption.

    My big worry is never with new formats but with old ones. I have Mac Word 1.0 and 3.0 files, MacWrite files, and FullWrite files, all of which are now very difficult to open because formats have moved on….

  2. bynkii.com says:

    One of my biggest bugaboos is communications, or more specifically, a lack thereof from the people and companies I do business with. Well, personally too. One of the biggest signs that I had to marry Melissa and (relatively) fast, was that when I told

  3. Sounds like a good explanation, but I reserve the right to continue to be grumpy about the situation 🙁

  4. Hub says:

    So you are saying that porting the work done for Office on Windows is not possible? I thought you were champions of cross-platform development?

    After all there is nothing magic in reading and parsing XML to convert to something else.

  5. Rick Schaut says:

    Hub,

    "After all there is nothing magic in reading and parsing XML to convert to something else."

    No.  However, there, apparently, was some magic involved in how you read what I wrote.  I did, in fact, say, "Well, actually, in a way, porting the Win Word converter is exactly what we have been doing."

  6. DD says:

    I agree. This whole docx thing is a tempest in a teapot. It’s more important for users of Mac Office to get full Outlook compatibility and to come up with a reasonable solution to the macro problem.

    DD

  7. R. Mansfield says:

    A concern of mine is older files as well. I have Word files as old as WinWord 2.0 (from back in the days before I switched to the Mac).

    I wouldn’t mind converting all my files to newer formats. I know you guys have enough to do as it is, but it might be nifty to create a utility that searches a user’s hard drive and automatically updates all Word documents to current versions. Or maybe there’s something like that out there?

  8. Is Microsoft’s Open XML document standard "so complex and so geared towards compatibility with legacy Office compatibility that it could never be implemented as a fully functional file format by any competing personal productivity applications (PPAs)

  9. Not Tellin! says:

    I won’t debate with you about XML and DTDs or well-formed anything; in the end all I am left to wonder STILL is why so long! Reading above you make it seem to be a huge undertaking, and if it truly takes you 3-4 months to create what should be a SIMPLE XML document converter, then all the stuff I’ve been told about XML from my team, and all the promise of your OPEN XML standard submission, is just BS! The promise of XML in general and Open XML in particular is that it makes it easier for applications to share. Seems to me it was easier the old proprietary way!  

    The real issue here, if again I may be so bold as to put on my PM hat, was that you allowed the converters to be written serially in the first place!  The converters (Mac/Win) should share a common code base and be built together. There is no reason for a linear development cycle.  NONE.  That was a pure management decision. I would have had a single Mac engineer checking out the Win source, working on the Win team, and converting and recompiling to Xcode for a simultaneous, or damn close, release. Our product is far FAR FAR more complex than MS Word and we still release concurrently, and that’s the whole product, not just a file converter.

    I can’t believe you didn’t know this was the default format so I can’t believe you didn’t see this uproar coming, and I can’t believe you didn’t determine that you needed to invest resources to get it done. I would have fought hard for this, I would have demanded the resources, your users deserve better, and the team doesn’t need this aggravation.

    So with that in mind I’ll stop posting so you can get back to work on what I still think is the best office suite out there, I just would have made different decisions, to show it also had the best management team.

  10. Rick Schaut says:

    NT,

    See my update regarding XML parsing.

    As for serial vs. parallel, what we’ve actually done is best thought of as a hybrid approach between those two extremes. And, I still think that represents the shortest possible solution to this particular problem.

    With that, I’m getting back to working my butt off to get converters out to you folks.

  11. ADAXL says:

    People are worried because Microsoft has a nasty reputation for making products that do not play nice with *anything*, not even other Microsoft products. Mac Office always had problems with Windows Office, Outlook and others. Recently, Microsoft produced its so far greatest feat of un-niceness with the Zune, which does not work with Vista (OK, that can be fixed before Vista launches, but it still makes people nervous) and does not work with Microsoft’s older "playsforsure" initiative (this is a real problem for the "playsforsure"-based music stores and their customers).

  12. Alex Kac says:

    Not Tellin! – I don’t think you get it. I’m a dev and this is not XML parsing only. It’s a CONVERSION from XML to a proprietary format, and not only that, it’s a conversion of a newer file format that contains data that the old file format does not support and vice versa. This is not converting a CSV to an XML rep or XHTML to HTML or whatnot where it’s just some format. It’s more like a dynamic conversion between Java and C#. It’s actually processing the data into a new bit of data.

    And as much as what you say should be true – you are right, it is a management decision. A management decision on the Windows side to not write cross-platform code so the Mac side has to do major work to port.

  13. Luke A. Johnson says:

    If only for validation: Rick, I feel your pain! Working in an IT department I go through this all the time. Nothing you do will ever be right. It’s frustrating! I feel the Mac BU has, in most cases, taken care of us the best way they can. I don’t doubt that they will on this as well. So give ’em a break, will ya!

  14. The dealers-of-lightning over at Parallels put out a new beta (build 3094) of their must-have Desktop For Mac product last week. Holy smokes, it’s cool. First, a bit of history. Parallels Desktop for Mac ("Parallels" for the rest of this article) is a