Open XML numbering options

What kind of numbering options do people need in word-processing documents? That's a pretty open-ended question, and the answer depends on what you mean by "people." If you're just talking about people in the United States or other Western countries, there's a short list of options that will usually get the job done: decimal (1, 2, 3), alphabetical (a, b, c or A, B, C), perhaps Roman numerals (I, II, III), and a few others.

But if you look at that question from a more global perspective, there are many more options that you will need to include, because there are many different alphabets and therefore many different "alphabetical" lists.

Russian numbering options

Consider the Russian alphabet, for instance. It has 33 letters, and any alphabetical list in Russian needs to assign its numbering prefixes in Russian alphabetical order.

That Russian-style numbering is a requirement in many documents, such as the Russian constitution. Check out Статья 71 (Article 71) of the Russian constitution, which lists the matters under the jurisdiction of the Russian Federation. Here are the first few items in that article (translated into English, with the original Russian letter prefixes):

а) adoption and amendment of the Constitution of the Russian Federation and of federal laws, and monitoring of compliance with them;
б) the federal structure and territory of the Russian Federation;
в) regulation and protection of human and civil rights and freedoms; citizenship in the Russian Federation; regulation and protection of the rights of national minorities;
г) establishment of the system of federal legislative, executive, and judicial bodies, and of the procedures for their organization and activities; formation of federal bodies of state power;
д) federal state property and its management;
е) establishment of the foundations of federal policy, and federal programs in the area of state, economic, environmental, social, cultural, and national development of the Russian Federation;
ж) establishment of the legal foundations of a single market; financial, currency, credit, and customs regulation, money issue, and the foundations of pricing policy; federal economic services, including federal banks;
з) the federal budget; federal taxes and levies; federal funds for regional development;

Since this list uses the Russian alphabet, a document format can't simply take the English letters a, b, c and swap in Russian counterparts: the two alphabets differ in size and order, so some Russian letters would be left out. Instead, you need a numbering option that is truly "Russian alphabetical," so that it makes sense to Russian readers. (By the way, if you happen to read Russian, check out my colleague Alexei Federov's recent article on Open XML in the Russian computer press.)
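To see why a position-by-position mapping falls short, here is a small Python sketch that builds the 33-letter Russian alphabet from Unicode code points and compares it with the 26-letter English alphabet. The variable names are illustrative only; the facts it relies on are that а through я occupy U+0430 to U+044F and that ё is U+0451, which comes right after е in the alphabet.

```python
# а..я occupy U+0430..U+044F (32 letters); ё is U+0451 and follows е.
basic = [chr(cp) for cp in range(0x0430, 0x0450)]
RUSSIAN_LOWER = basic[:6] + ["ё"] + basic[6:]
ENGLISH_LOWER = [chr(cp) for cp in range(ord("a"), ord("z") + 1)]

# A position-by-position mapping covers only as many letters as English has.
mapping = dict(zip(ENGLISH_LOWER, RUSSIAN_LOWER))
unmapped = [ch for ch in RUSSIAN_LOWER if ch not in mapping.values()]

print(len(RUSSIAN_LOWER))  # 33
print(len(ENGLISH_LOWER))  # 26
print("".join(unmapped))   # щъыьэюя
```

The seven leftover letters, щ through я, are the ones such a mapping would simply drop.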

Open XML, which supports all of the numbering options that Word has picked up over the years, includes options for "russianUpper" and "russianLower" lists. So the numbering definition for a list like the one above is pretty simple. The lvl element just needs to have these three child elements:

<w:lvl w:ilvl="0">
  <w:start w:val="1" />
  <w:numFmt w:val="russianLower" />
  <w:lvlText w:val="%1)" />
</w:lvl>

Those three elements say, in essence: start at the first value in this numbering format, use the lowercase Russian alphabet, and put a closing parenthesis after each prefix (%1 is the placeholder for the number at the first list level). It doesn't get much simpler or more obvious than that.
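As an illustration of what a consuming application does with these elements, here is a minimal Python sketch that parses the lvl markup with the standard library's ElementTree and turns it into item prefixes. The helper names (russian_lower, val, prefixes) are hypothetical, and the letter sequence is an assumption: it uses а through я without ё, which reproduces the eight prefixes in the constitution excerpt, but Word's actual russianLower sequence may skip other letters as well.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

LVL_XML = f"""
<w:lvl xmlns:w="{W}" w:ilvl="0">
  <w:start w:val="1" />
  <w:numFmt w:val="russianLower" />
  <w:lvlText w:val="%1)" />
</w:lvl>
"""


def russian_lower(n: int) -> str:
    # Assumption: а..я without ё, one letter per item, no wrap-around.
    alphabet = [chr(cp) for cp in range(0x0430, 0x0450)]
    if not 1 <= n <= len(alphabet):
        raise ValueError(f"no single-letter prefix for item {n}")
    return alphabet[n - 1]


# Map numFmt values to formatting functions (only two shown here).
FORMATTERS = {"decimal": str, "russianLower": russian_lower}


def val(lvl: ET.Element, tag: str) -> str:
    # Read the w:val attribute of a w:<tag> child, using Clark notation.
    return lvl.find(f"{{{W}}}{tag}").get(f"{{{W}}}val")


def prefixes(lvl: ET.Element, count: int) -> list[str]:
    start = int(val(lvl, "start"))
    fmt = FORMATTERS[val(lvl, "numFmt")]
    template = val(lvl, "lvlText")
    # %1 stands for the current number at the first list level.
    return [template.replace("%1", fmt(n)) for n in range(start, start + count)]


lvl = ET.fromstring(LVL_XML.strip())
print(" ".join(prefixes(lvl, 8)))  # а) б) в) г) д) е) ж) з)
```

Swapping the numFmt value to decimal in LVL_XML would yield 1) 2) 3) and so on, with no other change to the markup.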

Other numbering options

Open XML also has about 50 other numbering options to choose from. Here are a few examples of what's available, as defined in Section 2.9 of Part 4 of the spec:

  • decimal = 1, 2, 3, etc.
  • cardinalText = one, two, three, etc.
  • ordinal = 1st, 2nd, 3rd, etc.
  • ordinalText = first, second, third, etc.
  • upperLetter = A, B, C, etc.
  • lowerLetter = a, b, c, etc.
  • upperRoman = I, II, III, etc.
  • lowerRoman = i, ii, iii, etc.
  • hex = 1, 2, 3, ..., 9, A, B, C, D, E, F, 10, 11, etc. (hexadecimal)
  • chicago = *, †, ‡ (as defined in the Chicago Manual of Style for footnotes)
  • bullet = bulleted lists, with any character from any font (or a glyph) as the bullet prefix
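Several of the simpler formats in this list can be sketched in a few lines of Python. The function names below are hypothetical, this is not how Word implements them, and two conventions are assumptions: letters past z repeat (aa, bb, ...), and Roman numerals are limited to 1 through 3999.

```python
def to_roman(n: int) -> str:
    # Standard subtractive Roman numerals.
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals here cover 1..3999 only")
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letters(n: int) -> str:
    # Assumption: after Z comes AA, BB, ... (letter repeated), not AA, AB.
    letter = chr(ord("A") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def to_ordinal(n: int) -> str:
    # English suffixes: 11th-13th are exceptions to the st/nd/rd rule.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Keys mirror numFmt values from the list above.
FORMATS = {
    "decimal": str,
    "upperLetter": to_letters,
    "lowerLetter": lambda n: to_letters(n).lower(),
    "upperRoman": to_roman,
    "lowerRoman": lambda n: to_roman(n).lower(),
    "ordinal": to_ordinal,
    "hex": lambda n: format(n, "X"),
}


def render(num_fmt: str, count: int, start: int = 1) -> list[str]:
    return [FORMATS[num_fmt](v) for v in range(start, start + count)]


for name in FORMATS:
    print(f"{name:12} {', '.join(render(name, 4))}")
```

The point of the table-driven design is the same as in the markup: the list items never change, only the key used to look up the formatter.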

Most of those options use the Latin alphabet and Arabic numerals, so they look familiar to speakers of Western languages. But there are also many numbering options in Open XML that are based on other alphabets and numbering systems. We already saw a Russian example; other values include hebrew1, arabicAlpha, hindiNumbers, thaiLetters, japaneseCounting, koreanCounting, and vietnameseCounting.

For each of those numbering formats, only one thing changes in the underlying XML markup: the numFmt value. The resulting lists are dynamic lists that behave the way users would expect. For example, a Vietnamese user can insert an item above hai (2) in a vietnameseCounting list, and the new item will be numbered hai (2), with the item that was hai becoming ba (3), and so on down the list.

The alternative, which you'd have to use in certain other document formats, is to hard-code these prefixes for each item, and then regenerate them every time you add or remove items from the list. With the Open XML approach, you just declare that this is a vietnameseCounting list, and the values are enumerated as Vietnamese numbers automatically.
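Here is a minimal Python sketch of that difference, assuming a hypothetical lookup table that covers vietnameseCounting only for 1 through 10 (một, hai, ba, ...). The dynamic list stores only the item text and computes prefixes from positions when it is rendered, so an insertion renumbers everything after it; the hard-coded list keeps its stale prefixes.

```python
# Hypothetical lookup table: vietnameseCounting words for 1..10 only.
VIETNAMESE = ["một", "hai", "ba", "bốn", "năm",
              "sáu", "bảy", "tám", "chín", "mười"]


def vietnamese_counting(n: int) -> str:
    if not 1 <= n <= len(VIETNAMESE):
        raise ValueError("this sketch only covers 1..10")
    return VIETNAMESE[n - 1]


def render_dynamic(items: list[str]) -> list[str]:
    # Prefixes are derived from each item's position when the list is shown.
    return [f"{vietnamese_counting(i)}. {text}"
            for i, text in enumerate(items, start=1)]


# Dynamic list: store only the item text.
items = ["first item", "second item", "third item"]
items.insert(1, "new item")  # insert above the item numbered "hai"

# Hard-coded list: prefixes are baked into the text.
hard_coded = ["một. first item", "hai. second item", "ba. third item"]
hard_coded.insert(1, "hai. new item")

print("\n".join(render_dynamic(items)))
print("\n".join(hard_coded))  # two items now claim "hai"
```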

That darn spec is so big!

I've seen a few people lately complaining about the size of the Open XML spec, and some have even gone so far as to print out the whole thing and haul it around. These aren't typically people who are learning Open XML, in my experience; people studying the spec tend to print out Part 1 (Fundamentals) or Part 3 (Primer). My spiral-bound copy of Part 3, for example, runs about 3/4" thick and covers all the basic concepts in 474 pages.

The lion's share of the spec is in Part 4, the Markup Language Reference. Part 4 includes definitions for all the elements and attributes in the 89 schemas that make up the Open XML specification, and it's roughly 5000 of the 6000 pages of the spec. As one of many examples, all the numbering options mentioned above are covered in Part 4.

It seems to me that the logical question to ask people who complain about the size of the spec is: what would you like to see removed? For example, some document formats don't define how spreadsheet formulas are stored, but Open XML specifies them in full detail to ensure future interoperability. Should that information be removed? Some document formats have nothing like the Open Packaging Conventions, which enable custom schema support, embedded macros, and other types of document extensions that are very popular with users. Should that information be removed?

Some document formats don't include the multi-cultural numbering options mentioned above -- should those be removed? That might make sense from a narrow Euro-American perspective, but if you need to number the items in a Vietnamese document, conform to the legal-document numbering requirements in Korea or Japan, or store the Russian constitution in a document, then some of those other options become pretty important. I'm glad the Open XML spec didn't leave them out.