Numbering formats in ODF

05/26/2006

I was wondering if someone more familiar with the ODF spec could help me out. I had made the following reply yesterday to Alex's assertion that ODF was as feature rich as Open XML and I want to make sure that I'm not misstating things:

I think you might want to dig a bit deeper into the formats. ODF does build on existing industry standards, but at times it only partially implements them, and it still leaves out a lot. For instance, Open XML actually uses more of the Dublin Core metadata schema than ODF does.

Another easy example would be to look at the different types of numbering for a wordprocessing file. In Microsoft Office you can say that the numbered list should be "first", "second" and "third" instead of "1.", "2." and "3.". ODF doesn't support that.

That's just the beginning though. If you are from another country like Japan or China, there is absolutely *zero* mention of how your numbering types are defined. The spec only specifies:

- Numeric: 1, 2, 3, ...
- Alphabetic: a, b, c, ... or A, B, C, ...
- Roman: i, ii, iii, iv, ... or I, II, III, IV,...

No mention at all about what you do for any other language. If you use OpenOffice, they actually do support other languages, and they even save out those other numbering formats into the ODF style:num-format attribute. The problem though is that this behavior isn't defined in the spec, so how does anyone else who wants to read that document figure out what OpenOffice's extension means? Maybe I'm just missing something, as the ODF spec is really vague in a lot of areas, but I looked around for a while and couldn't find anything.

Even if you don't pay attention to the things that are just flat-out missing from the format, the documentation for the things it does support is pretty minimal. In the latest Ecma draft, we have about 200 pages discussing the syntax of formulas for spreadsheets; ODF has a few lines. That gives me the impression that no one who does accounting or works on Wall Street was involved in the standard, because I can't really imagine them allowing it to go through without specifying how formulas should be represented. It's no wonder the few applications referenced as being "full implementations" of ODF aren't even capable of full interoperability (link).

Alex then replied with the following:

OpenDocument is well known to support variety of languages, and the Japanese ISO member pointed out a couple of problems with the spec. (mostly to do with international URIs). I think they would have noticed if numbering was a problem. The guys in the middle-east were looking at it too.

You're absolutely right about formulas; OpenDocument does not specify a syntax, and that is something the TC is working on. There is a wider problem here, though: formula syntax is something users know directly. Should OpenDocument do something new, or just what Lotus 1-2-3/Excel did/do? OXML has the luxury of only caring about compatibility with Office file formats; OpenDocument is designed to be widely compatible with all.

I may have jumped the gun when I stated that there was *zero* documentation, but I'm curious to know where in the ODF spec these things are specified. When I looked at the numbering sections (4.3, 12.2.2, 14.10.2), they were pretty light and only called out the three styles I mentioned above. Section 12.2.2 refers to the approach used in XSLT for the format attribute, but it only says the attribute works in the same way, not that it accepts the same set of formats. The spec then states that it supports only a specific set, not all the different types the XSLT approach allows: the number styles supported are "1", "a", "A", "i", and "I". Let's assume, though, that the spec was just worded improperly and it does in fact use the XSLT format approach to the full extent. Then why does OpenOffice output Japanese numbering format like this:

The XSLT spec says that you only put the first character of the list in the format attribute (or at least that's how I interpret it). I didn't see any mention of the approach of putting the first three characters followed by an ellipsis.

That was using Kanji numbering. The XSLT spec actually does call out directly how to do Katakana numbering, and OpenOffice doesn't do that properly either (the XSLT spec says it should be format="ア"). Instead, OpenOffice does this:

Now, those familiar with Japanese numbers (and actually a whole host of other number styles) know that it isn't always possible to represent a numbering style with just a single character. There are a couple of different Kanji numbering styles that start with the same character (the difference is what you do once you get to 10). I assume that's why OpenOffice is going the route that it is.
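To make that ambiguity concrete, here is a minimal Python sketch of two Kanji numbering styles that share the same first character and only diverge at 10. The function names are made up for illustration, the sketch only handles small numbers, and nothing here is taken from the ODF spec, the XSLT spec, or OpenOffice itself.

```python
# Kanji digits indexed by value: 〇=0, 一=1, ... 九=9.
KANJI_DIGITS = "〇一二三四五六七八九"


def kanji_positional(n: int) -> str:
    """Write each decimal digit as a Kanji digit (10 -> 一〇)."""
    if n < 1:
        raise ValueError("expected a positive list number")
    return "".join(KANJI_DIGITS[int(d)] for d in str(n))


def kanji_traditional(n: int) -> str:
    """Use the character 十 for ten (10 -> 十, 21 -> 二十一). Covers 1..99 only."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    tens, ones = divmod(n, 10)
    out = ""
    if tens:
        # A single ten is written 十, not 一十.
        out += ("" if tens == 1 else KANJI_DIGITS[tens]) + "十"
    if ones:
        out += KANJI_DIGITS[ones]
    return out


if __name__ == "__main__":
    for n in (1, 2, 9, 10, 11, 20):
        print(n, kanji_traditional(n), kanji_positional(n))
```

Both functions return 一 for the first item, so a format value consisting of only that one character cannot tell a reader which of the two sequences the author intended.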

Where is this approach documented though? Maybe I'm just misreading things here, and there is another portion of the ODF spec or the XSLT spec that allows for that approach? Or does this mean that if you are writing a Japanese document and use numbers with OpenOffice you aren't creating a valid ODF file? This newsgroup post implies that OpenOffice isn't yet fully supporting ODF, so maybe that's the case? I suppose the response could just be that the format is extensible and you can place anything you want in that attribute, but how does that lead to interoperability? As far as I can tell, there's nothing to tell other applications how they should interpret that value (again, I could be missing something obvious).
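As a rough illustration of the interoperability concern, here is a sketch of what a reader that knows only the five format values the spec lists ("1", "a", "A", "i", "I") could do with a style:num-format value. The function names are hypothetical, the alphabetic sequence after "z" is assumed to continue "aa", "ab" as the XSLT spec describes, and the unknown value in the usage example is only illustrative of the "first three characters plus an ellipsis" shape described above.

```python
# Value/symbol pairs for building Roman numerals, largest first.
ROMAN = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n: int) -> str:
    """Lowercase Roman numeral for 1..3999."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals here cover 1..3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def to_alpha(n: int) -> str:
    """a, b, ..., z, aa, ab, ... (assumed XSLT-style sequence)."""
    if n < 1:
        raise ValueError("expected a positive list number")
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def format_number(n: int, num_format: str) -> str:
    """Format n using only the five values the spec text lists."""
    if num_format == "1":
        return str(n)
    if num_format == "a":
        return to_alpha(n)
    if num_format == "A":
        return to_alpha(n).upper()
    if num_format == "i":
        return to_roman(n)
    if num_format == "I":
        return to_roman(n).upper()
    # Anything else is undefined for this reader: it can only guess or give up.
    raise ValueError(f"no defined meaning for num-format {num_format!r}")


if __name__ == "__main__":
    print(format_number(4, "i"))
    print(format_number(27, "a"))
    try:
        format_number(3, "一, 二, 三, ...")  # illustrative value, not from the spec
    except ValueError as err:
        print(err)
```

A reader written this way handles the documented values and has no basis for choosing a sequence for anything else, which is the gap being described.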

Almost every site I visit to find more information focuses almost completely on the marketing or political side of ODF. There are discussions around conformance, logo compliance, getting governments to support it, and so on. I'm having a really hard time finding any good blogs or sites that discuss how to actually use it. I did come across the OASIS public mailing list archives, which had some useful content, but I wasn't able to find anything about this issue.

-Brian
