Trip report: WG4 Prague

Just back from another WG4 meeting, so I thought I’d write up a few thoughts. My trip was slightly longer than usual this time, mainly because I was with Doug at the XML Prague conference the preceding weekend. Doug wrote a trip report about XML Prague, so I won’t talk about it here.

We had the traditional three days of meetings for WG4 and I’d say these were very productive. Aside from munching through defect reports, there were a few interesting topics that looked worthy of calling out. All the photos in here are Doug’s, unless I say otherwise.

Working out how to maintain Strict and Transitional

The text of IS 29500 specific to Office document formats is currently split into two parts – Part 1 (Strict) and Part 4 (Transitional). Part 4 is styled as an addendum to Part 1 because, originally, Transitional was a superset of Strict, so it made a lot of sense to lay it out that way. However, we’ve recently changed a few areas so that Strict now has features that Transitional lacks – a good example being Amendment 1 (formerly Amendment 2), which deals with the treatment of ISO-8601 dates in spreadsheet cells. It’s difficult to describe restrictions in an addendum, and it’s even harder to describe paragraphs that are simply modified. Going forward, this problem is likely to worsen, as WG4 does not intend to enlarge the feature set of Transitional with future enhancements. Should we reorganise the standard and, if so, how? We discussed a few options in the meeting, and a task force was assigned to report back with the pros and cons of the various approaches. We agreed the following day that the best solution was probably to split Parts 1 and 4 into entirely distinct texts, but we also agreed that this shouldn’t happen until a Revision of Part 1 is initiated, to avoid duplicating the revision work. I suspect we’ll discuss this more in future meetings, as it’s likely something we’ll have to solve both for implementers and for ourselves as editors.

Changing process around maintenance of schemas

On another maintenance topic, we continued a discussion that’s been ongoing on the WG4 mailing list regarding how best to maintain the schema files. At the moment we have four sets of schemas, as we have full RNG and XSD schemas for both Parts 1 and 4. In total there’s some 2.5 MB of raw schemas, so it’s not an insignificant bundle. We’re obliged to distribute both printed and digital copies, and until last week we maintained these separately, giving us some eight different schema sets to keep track of. We’ve actually found the current system to be pretty reliable, but there is widespread agreement that a better one could be found, and so we discussed this in some detail and came up with a plan.

From now on, we’ll use the electronic schemas as a master copy. Our electronic schemas are stored in an SVN repository (hosted on Assembla) and WG4 members are able to check in/out. By tagging schema changes with the relevant DR numbers, we will be able to easily derive the schema changes which applied to a given DR, and we’ll also be able to consolidate the changes using standard source control tools. Whenever we are working on defect reports which require schema changes, publishing Amendments or altering the core text, we’ll integrate the necessary schema changes from the electronic copy. I was tasked with building some sort of tool to effect that, a task which I’m secretly rather looking forward to. We also agreed that schema changes would be okayed before DRs could be moved to “Last Call” by WG4, and so we have a new DR status to represent “waiting for schema”. I’m reminded of how valuable the face-to-face meetings are when we can discuss a topic at length, and finish off by agreeing on a new process and putting it in place immediately. Somehow this is often a little trickier on conference calls – perhaps this experience will all change with the onset of products like Cisco’s Telepresence, but we’re not quite there yet.
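To make the DR tagging idea a little more concrete, here’s a minimal sketch of how commits could be grouped by DR number once the tags are in place. It is not the tool I was asked to build, and it assumes a convention where each commit message mentions its DRs in the form “DR 09-0295”. The function name, the (revision, message) input shape and the sample log are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumed convention: a commit message mentions each DR as "DR NN-NNNN".
DR_PATTERN = re.compile(r"\bDR\s*(\d{2}-\d{4})\b")


def group_by_dr(commits):
    """Map each DR number to the revisions whose message mentions it.

    `commits` is an iterable of (revision, message) pairs, for example
    extracted from a repository log by whatever means is convenient.
    """
    by_dr = defaultdict(list)
    for revision, message in commits:
        # A single commit may touch several DRs; count each DR once.
        for dr in sorted(set(DR_PATTERN.findall(message))):
            by_dr[dr].append(revision)
    return dict(by_dr)


# Hypothetical log entries, purely for illustration.
sample_log = [
    (101, "DR 09-0295: tighten gridCol width type"),
    (102, "DR 11-0001 and DR 09-0295: add unit unions"),
    (103, "Tidy whitespace in RNG schemas"),
]

changes = group_by_dr(sample_log)
```

With the sample log above, `changes["09-0295"]` lists revisions 101 and 102, and the untagged tidy-up commit doesn’t appear under any DR.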

Measurement units in Strict

For some time, we have had a long discussion about a topic covered by two defect reports - DR 09-0295 [WML: gridCols measurement units] and DR 11-0001 [SML: Failures to specify measurement units explicitly]. As you can see from the DR itself there’s a long history on DR 09-0295, which runs to some sixteen pages, and it took us some time in the meeting even to fully grasp what that history was. However, I’ll attempt to summarise…

In ECMA-376, there were many attributes whose values were simply numbers, but which actually represented distances (e.g. marTop, the top margin for an HTML DIV element, which is a distance in twips). At the BRM meeting back in Geneva, it was determined that implementers should be able to specify units on these instead of relying on the default unit types – so instead of using the value “5” to mean 5 twips, they could use “0.003472 in” or “0.25 pt”. For a large number of such attributes, a union of types was created to allow both numeric values and strings with a two-letter unit specifier appended to them.
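The arithmetic behind those examples is simple enough to sketch. A twip is a twentieth of a point and there are 72 points to the inch, so 1 pt = 20 twips and 1 in = 1440 twips. The snippet below is only an illustration of the “number, optionally followed by a unit” idea: the function name is made up, it only knows about the two units mentioned above, and it tolerates whitespace before the unit because that’s how I’ve written the values here, which may not match what the schema actually permits.

```python
import re

# Twips per unit: 1 pt = 20 twips, 1 in = 72 pt = 1440 twips.
# Only the two units mentioned in the text are covered here.
TWIPS_PER_UNIT = {"in": 1440.0, "pt": 20.0}

# A number, optionally followed by a two-letter unit specifier.
# Allowing whitespace before the unit is an assumption of this sketch.
_MEASURE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([a-z]{2})?\s*$")


def to_twips(value: str) -> float:
    """Convert a value such as "5", "0.25 pt" or "0.003472 in" to twips."""
    match = _MEASURE.match(value)
    if match is None:
        raise ValueError(f"not a recognisable measurement: {value!r}")
    number, unit = match.groups()
    if unit is None:
        # No unit specifier: fall back to the default unit (twips here).
        return float(number)
    if unit not in TWIPS_PER_UNIT:
        raise ValueError(f"unit not handled in this sketch: {unit!r}")
    return float(number) * TWIPS_PER_UNIT[unit]
```

So `to_twips("5")` and `to_twips("0.25 pt")` both give 5 twips, and `to_twips("0.003472 in")` comes out a hair under 5 because the inch value is rounded.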

Now, things become a bit more complex. Firstly, the unit specifiers were made optional in both Strict and Transitional. In Beijing we agreed that they should be mandatory in Strict and optional in Transitional, and I wrote up the changes necessary for that. However, in Prague, we just re-agreed that they should be optional in both. Well, hey, it’s the right of a committee to change its mind. You might think that this means we can just leave the standard as it is right now! Well, you’d be wrong. DR 11-0001 points out a large number of places that appear not to have felt the benefit of the BRM changes, and it’s quite possible that there are more. Additionally, there are some places where the default units (specifically EMUs) are not actually available in the list of unit specifiers, and that seems inconsistent as well.

Murata-san, Jirka and I were tasked with establishing the possible scope of missing changes from the BRM, and once we have that data we’ll reconvene and discuss in WG4. We’re going to determine this by each attacking the standard from the angle we feel most comfortable with – handily, for Murata-san and Jirka it’s the schema, and for yours truly it’s the prose. VBA, here we come.

Extra-curricular

Prague is a beautiful city, and so it seemed rude not to take a look at the local sights while we were there. Doug and I walked over Charles Bridge a few times to see if we could take some fine original shots. Given the number of giant cameras on the bridge at any one time, I think this might be the world’s most photographed landmark. If you don’t get the photo you wanted on Charles Bridge, console yourself with the fact that you can just hunt through Google image search and dig up the exact photo you had in mind, probably taken on the same day. I did take one half-decent photo (below) but some practice required, evidently.

Overall, I think these meetings went very well and I enjoyed my first time at XML Prague. We’ll be meeting again in Berlin in June – this will be more of a flying visit for me because I’ll be taking part in the 24 Hours of Lemons on the following weekend, and I’ll have to dash out at the end. Perhaps I’ll do some sort of combo trip report for the two events…