Highlighting in a document

I’ve had a lot of folks ask me to provide more information on what features are missing from ODF and why we decided to create our own XML format (Open XML). I didn’t want to get too involved in pulling together a full detailed list, but it’s probably worthwhile pointing things out every once in a while. Most of you know that ODF wasn’t even around when we first started working on our XML formats, so that’s really one of the big reasons. Another reason is that we needed to make sure that we created an XML format that all of our customers could (and would) use. We want our customers to move all their existing documents into this new format and we need them to be willing to use it as the default format. ODF just wouldn’t have allowed us to achieve that (both because of a lack of functionality and because of different optimizations that sacrifice things like performance).

An area I just came across today that really surprised me was highlighting. I’m sure most folks are familiar with highlighting in a Word document. You can use highlighting to call attention to different areas in a document, either for yourself or to point things out to others. The key thing about highlighting is that it does not affect any other formatting. Character shading (aka background-color in ODF), for instance, will still be preserved when you highlight some text. I’ve seen some implementations out there that try to use shading as a substitute for highlighting, but that doesn’t really work because people may also want to apply shading in addition to highlighting. For example, you may have a range of text shaded with light gray (i.e. the background-color is light gray), and then you want to highlight some of the text in that range. Then, once folks have reviewed the document, you want to remove the highlighting without removing the gray shading. In the ODF spec I saw support for shading on text, but not for highlighting, which we view as a separate feature (I only saw mention of highlighting on tables).
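
To make the distinction concrete, here is a minimal Python sketch of how the two properties can sit side by side on a single run of text. The element names (w:highlight and w:shd inside w:rPr) and the namespace URI are taken from WordprocessingML as later published, so treat them as assumptions relative to the draft discussed here. The helper names make_run and remove_highlight are made up for illustration and are not part of any Office API.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the published WordprocessingML schema.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Return the namespace-qualified name for a WordprocessingML tag."""
    return "{%s}%s" % (W_NS, tag)


def make_run(text, shading_fill=None, highlight=None):
    """Build a run whose shading and highlight are separate properties."""
    run = ET.Element(w("r"))
    rpr = ET.SubElement(run, w("rPr"))
    if highlight is not None:
        # Highlight is stored as a named color, e.g. "yellow".
        ET.SubElement(rpr, w("highlight"), {w("val"): highlight})
    if shading_fill is not None:
        # Shading is stored as its own element with a hex fill color.
        ET.SubElement(rpr, w("shd"), {w("val"): "clear", w("fill"): shading_fill})
    ET.SubElement(run, w("t")).text = text
    return run


def remove_highlight(run):
    """Drop only the highlight; every other run property is left alone."""
    rpr = run.find(w("rPr"))
    if rpr is None:
        return
    for node in rpr.findall(w("highlight")):
        rpr.remove(node)


# Gray-shaded text that a reviewer has also highlighted in yellow.
run = make_run("needs review", shading_fill="D9D9D9", highlight="yellow")

# Once the review is done, clear the highlight. The gray shading survives.
remove_highlight(run)
```

Because the highlight lives in its own element, clearing it never has to guess what the shading was beforehand. A format that only has a single background color would lose that information as soon as the highlight was applied.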

I came across this the other day while I was looking through the ODF spec and comparing it to the Ecma draft, trying to get a better handle on why the ODF spec was so much lighter (700 pages compared to 4000). I wanted to see if there were things we could do to reduce the number of pages in the Open XML spec without losing any of the necessary information. It looks like there are some things that can be done for minor size reductions, but we just have a lot more functionality, and there is no way we could get it anywhere close to that small while still fully covering WordprocessingML, SpreadsheetML, and PresentationML. There are three reasons that we have so much more content. The first is that we are just representing a much richer set of features (since we have to XMLize all the existing Microsoft Office binary documents), so there is just a lot more to document. The second reason is that the ODF spec points off to other specs for the details of certain things. The third reason is that the Ecma Open XML spec is just a lot more detailed as to how things work. The WordprocessingML sections are the furthest along in the latest draft, and if you read through the paragraphs and rich formatting section for instance (Section 19), you’ll see what I’m talking about. The ODF spec on the other hand is very light and vague on a number of issues (like the numbering format issue I pointed out earlier).


Comments (10)

  1. orcmid says:

    I can tell you another difference.  You are reading and learning from their spec.  Mostly what we see from detractors of OOX is complaints about something I don’t think they are reading.  And I don’t think ODF is getting much critical reading.  Thanks for your care and even-tempered approach here.

    By the way, I just remembered that Doug Mahugh has a different blog for his tech work and finally subscribed to it.  He gives a great breakdown of what is to be found in TC45 working draft 1.3. It’s at http://blogs.msdn.com/dmahugh/archive/2006/05/25/DraftSpecTour13.aspx

    I suggested in a recent comment elsewhere that it might be useful to split the TC45 document into two, since it tends to fall into two parts, with the conceptual information in the first, um, 700 pages or so, and basically reference material in the rest.  I also think tables and other arrangements can cut down on the space taken, improving the density of some of the reference parts.  I love that the PDF takes full advantage of cross-referencing and linking.  The ODF spec is much harder to handle on my computer in that respect (and I am a lover of fine-grained section numbering in specifications too — makes it much easier to submit comments and suggestions).

  2. Gilberto says:

    – 35,000 more computers to OpenOffice.org

    – No problems with the migration

    – More to come

    35000 * whatever exorbitant amount you charge for Office = HAHAHAHA

  3. BrianJones says:

    Thanks Gilberto. That isn’t really related to this discussion about file formats, but I’m glad you shared it with us. 🙂


  4. Alex says:

    Dennis; you don’t think ODF is getting much critical reading?

    It got plenty of comments back from the last ISO round, there are a number of developers building support for it into their applications as the _default_ format, and lots of third-party developers using it.

    I know ODF doesn’t have all the features that OXML has. I also know that the way OXML is being developed, it will not have all the features of ODF – Microsoft are essentially treating Office as being "feature complete". If that’s what they think, great, but ODF is a specification which will be continuously developed.

    ODF is where the real innovation in office file formats is happening.

  5. BrianJones says:

    Alex, we aren’t treating Office as being "feature complete" at all. In fact if you look at Office 2007 there is a ton of innovation. Look at the support for custom defined schema and content controls in Word for example. That’s where developers are really getting excited (we have hundreds of thousands of 3rd party developers already building solutions on top of the XML support from Office 2003).

    My point with the ODF comparison is that there already exist *billions* of Microsoft Office documents today and our spec absolutely has to support those documents. That’s not innovation, that’s just matching the world today. The spec will then continue to grow and evolve over the years in Ecma as we innovate and build.


  6. Thomas Lee says:

    I had dinner last week in Brussels, where I addressed a group of MEPs and others. My talk was around standards, open standards, and a need for EU patent reform. Over dinner, we discussed a variety of things – not least was document standards and Microsoft.

    Now I’m a techie, and while certainly no XML expert, I’ve read bits of both specs. And as an aside, the work you’ve published so far is impressive.

    From what I can see and read, OpenXML is richer, far richer, than ODF. You make this point in your post, and I can see why you’d prefer folks to go for it.

    I’m using B2 in anger, but for my two main uses of Word, I have to use .RTF and .DOC formats (compatibility reasons). I suspect that interoperability is going to be a bit of an issue for corporate clients – we’re watching this carefully to see how much of a real problem it’ll be.  But I’m certainly going to need to save as lower quality formats for the foreseeable future.

    But as I think about the need to save in these lower fidelity formats, I can’t work out why you just don’t bite the bullet and do OpenDoc too. I totally agree that OXML is better, but it’s also better than the binary formats, and .RTF, which you do support with good effect.

    As I told some of the Office folks this week at the London Reviewers Workshop, if you add OpenDoc, a whole set of arguments just go away, and to people who you need to like you, you do something they agree with.  It’d be an easy thing to do (you could buy the company making the snapins for a song, surely).

    For the same reason having to do a separate download for PDF support in Office 2007 is lame, so is having to do a separate download for ODF.

    My .02€ worth


  7. As we move forward with the standardization of the Office Open XML formats, it’s interesting to look…

  8. BrianJones says:

    Hey Thomas, I hear you, and I wish it were that easy to build features into Office. Unfortunately, that just isn’t the case (it’s actually a big investment across all three disciplines: program management, development, and testing).

    ODF in Office is really one of those features that I expect to see a number of add-ins come along and solve. We’re working hard enough just to finish up the features we currently have in the product and ship it on time. It would have been really hard (actually pretty much impossible) to justify investing the resources that would be necessary to support ODF given all the other things we’re trying to do this release.

    I know that a lot of folks think it’s the case that adding ODF support is no big deal and that we’ve been stubborn to not add it. That view though usually comes from a lack of experience in working on large scale software applications like Office. It’s taken a huge (and I mean *huge*) amount of work to build these new Open XML formats to a level where they are ready to be the default formats. There is no way we could have afforded to also do the ODF format at the same level of quality without also cutting a number of other features, which we weren’t prepared to do. Even today, any requests for ODF support are primarily political in nature, and we actually don’t have a lot of real customers asking for the support. If we do get to the point where there is a significant customer demand, then I’m sure we’d look into it. The Open XML format was absolutely necessary if we really wanted to move people to using XML as the default format (ODF just doesn’t cut it), and it didn’t seem like there was nearly enough demand to justify working on two new formats.

    The folks that do want that support will most likely get it through add-ins like the one the OpenDocument organization announced a few weeks ago.


  9. Oscar says:

    Maybe it’s because ODF is based on existing standards whilst Microsoft has decided to start from scratch?

  10. OK, forgive the random Sneaker Pimps reference and I promise we will move off this topic of ODF politics…