The Open XML specification is one of the most scrutinized specs ever to go through a standards process, and that’s great news for developers because there is a lot of useful new content in the proposed dispositions of various comments. A good example is on Brian Jones’s blog: the proposed disposition covering hashing algorithms. The disposition offers a lot of detailed new information about hashing options and the markup to support them, and it includes a variety of hashing algorithms to allow for various levels of security, including support of various ISO standards in this area.
In this case, the passwords are not used to encrypt the file and protect its contents, but to control who is able to modify content. For instance, you can restrict formatting or editing of portions of the document, and a conforming application will restrict behavior appropriately. These passwords are stored in the document, but good security practice requires that the application store a hash of each password rather than the plain text. After all, a user might unknowingly specify the same password for controlling access to a file as they use for their bank account, or for the admin account on their computer.
If you read the disposition, you can see a whole bunch of interesting detail, including a listing of the available hashing algorithms, the referenced ISO/IEC spec, the ability to specify the salt value and spin count of the hashing algorithm, and even notes on algorithms to be avoided because of publicly known breaks. It then lists an XSD schema fragment that defines the contents of the document protection element.
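To make the salt and spin count concrete, here is a minimal Python sketch of salted, iterated password hashing. It illustrates the general idea only and is not the algorithm defined in the disposition. The function names, the default of SHA-256, the 16-byte salt, the spin count of 50,000, and the way the round counter is mixed in are all my own assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes, spin_count: int,
                  algorithm: str = "sha256") -> bytes:
    """Salt the password, hash it, then re-hash the digest spin_count times."""
    # Assumption: salt is prepended to the UTF-8 password bytes.
    digest = hashlib.new(algorithm, salt + password.encode("utf-8")).digest()
    for i in range(spin_count):
        # Assumption: mix the round counter in so each round's input differs.
        digest = hashlib.new(algorithm, digest + i.to_bytes(4, "little")).digest()
    return digest


def verify_password(candidate: str, salt: bytes, spin_count: int,
                    stored: bytes, algorithm: str = "sha256") -> bool:
    """Recompute the hash for a candidate and compare in constant time."""
    computed = hash_password(candidate, salt, spin_count, algorithm)
    return hmac.compare_digest(computed, stored)


# What an application would keep in the document: the salt, the spin count,
# the algorithm name, and the resulting hash, but never the password itself.
salt = os.urandom(16)
spin_count = 50_000
stored = hash_password("s3cret", salt, spin_count)
```

The salt means two users with the same password end up with different stored values, and the spin count makes each guess more expensive for anyone trying to brute-force the hash. Both values have to be stored next to the hash, which is why a format needs markup for them.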
Much has been made of the size difference between the Open XML and ODF specifications. It’s interesting to take a look at where they differ in the level of detail or specificity; for example, here’s what the ODF specification (http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf) has to say about hashing:
A user can use the user interface to reset the protection flag, unless the section is further protected by a password. In this case, the user must know the password in order to reset the protection flag. The text:protection-key attribute specifies the password that protects the section. To avoid saving the password directly into the XML file, only a hash value of the password is stored.
I kept thinking that I was missing something, so I searched the spec from beginning to end for “password” and “hash”, wading through a lot of irrelevant stuff. If you search for “hash” you find three locations (4.4.1, 8.1.1, and 8.5.1). If you search for “password”, in addition to the above sections, you find a section on sources for pilot tables (8.8.3), a section on password controls (11.3.3), and information on zip file encryption (17.3). Well, anyway, I can’t find any detail on which hashing algorithm to use or how the hash is computed. If someone else knows where this is in the spec, please let me know.
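Here is a small Python sketch of why that missing detail matters for interoperability. The two "applications" are hypothetical, and their choices of SHA-1 and SHA-256 are my assumptions, picked only to show what can happen when a format says "store a hash" without saying which one.

```python
import hashlib

password = "s3cret"

# Hypothetical application A picks SHA-1; hypothetical application B picks
# SHA-256. Both "store only a hash value of the password".
stored_by_app_a = hashlib.sha1(password.encode("utf-8")).hexdigest()
stored_by_app_b = hashlib.sha256(password.encode("utf-8")).hexdigest()


def app_b_accepts(candidate: str, stored_hex: str) -> bool:
    """Application B checks a candidate password using its own hash choice."""
    return hashlib.sha256(candidate.encode("utf-8")).hexdigest() == stored_hex


# B rejects the correct password when the document was protected by A.
opened_own_document = app_b_accepts(password, stored_by_app_b)
opened_other_document = app_b_accepts(password, stored_by_app_a)
```

With the correct password in hand, application B can unlock its own document but not the one protected by application A, because nothing in the stored value tells it how the hash was produced.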
Interestingly, if you read the section on passwords for sources of pilot tables, you see:
The table:password attribute specifies the password required to access the source. It is passed to the service implementation. Its value is application and service specific.
In this case, no mention of hashing at all. According to the spec, is this password stored as plain text? What’s up with that?
It goes beyond the missing formulas in spreadsheets – if ODF is missing this much detail, it is no wonder that some of those involved often brag about the specification being so small compared to the Open XML specification. Actually, if I were them, I wouldn’t be bragging about this.
I’m going to rant a little bit more.
I’ve designed and built a whole bunch of software systems. I’ve written a pile of specs, and I’ve read many more piles of them. I look at it from a common sense point of view: if I were a system architect, how big would I expect an XML specification to be if it defined ALL of the information necessary to implement the document format for a fully functional word processor, spreadsheet, and presentation program, with enough features that no portion of users is left wanting? Well, I’d expect it to be about the size of the Open XML specification. There is a lot of necessary detail in it. Typical developers don’t need to read it from beginning to end. They treat it more like an encyclopedia. When you need to know about implementing access control, you read those portions of the spec.
Of course, for *good* architects and system designers, a spec this size doesn’t present any obstacle. In fact, it gives them a warm and fuzzy feeling, knowing that the spec is as complete as possible.
The level of detail being added to the Open XML spec is great for developers. It helps them to implement Open XML support more quickly and reliably. It’s also great for users because it protects the investment they’ve already made in those wordprocessing documents, spreadsheets and presentations, and assures long-term access to their own content.