Texas looks at the interoperability of file formats

For those of you interested in the policy/politics side of file formats, I've seen a couple of folks point out this bill currently proposed in Texas (https://www.capitol.state.tx.us/tlodocs/80R/billtext/html/SB00446I.htm).

As all of you know by now, I think it's very cool to see this attention being paid to file formats, and the importance they play in all of our lives. I've been working on this stuff for years, and it's always fun to see other folks talking about your work. Here are the traits they'd like to see in a file format in Texas:

Each electronic document created, exchanged, or maintained by a state agency must be created, exchanged, or maintained in an open, Extensible Markup Language based file format, specified by the department, that is:

  1. interoperable among diverse internal and external platforms and applications;
  2. published without restrictions or royalties;
  3. fully and independently implemented by multiple software providers on multiple platforms without any intellectual property reservations for necessary technology; and
  4. controlled by an open industry organization with a well-defined inclusive process for evolution of the standard.

It's great to look at things like this and think about the scenarios folks have in mind. Rather than talk about motivations in terms of "levels of openness", I think it's easier to discuss it in terms of scenarios or use cases. Most policies around file formats are there to ensure the following:

  1. Long-term availability – You want to know that 100 years from now, you'll still be able to access your data. This is a complex problem, as it can affect everything from the software you use to the hardware you run that software on. The key in terms of file formats is that everything in the file format is fully documented, and the stewardship for that documentation belongs to an independent standards body. ISO, Ecma, OASIS, and the W3C are all examples of organizations people feel comfortable trusting with the stewardship of that documentation.
  2. Freely available – You want to make sure that you don't need to worry about someone else holding rights over your documents. If there is IP behind the format technology, for instance, you want to make sure there is some type of license available that will work for you. Not only that, but you want to make sure this will work for anyone else who you want to have access to your documents. All formats out there take slightly different approaches here (PDF, OpenXML, ODF, HTML, etc.), so it's important to pay attention to this.
  3. Fully interoperable and accessible – You want to know that people on other systems can still work with your files. This means that the format needs to be fully documented, and that there is nothing in the format that would prevent it from working on a different system. A great indicator here is to look at the number of applications that support the format, and what systems those applications run on. HTML is a great example of an interoperable format. OpenXML and ODF are also both fully interoperable, but are much younger. So while you don't see as many applications supporting OpenXML and ODF as you do HTML, you'll clearly start to see more and more pop up as time goes by.
  4. Meets customer and end user scenarios/needs – This is really the key. Without this, the formats won't be used. Plain text meets the three goals above, but obviously wouldn't work for most folks' documents. You need to make sure the end user doesn't see any ill effects when you try to meet the other three objectives.

There are a lot of other factors that can help you achieve these four goals, but those are all implementation decisions, and don't necessarily prevent you from achieving your goals. For example, using existing technologies like ZIP and XML helps you achieve #3 because there are already tools out there that support them (they aren't necessary for success though). You could go invent your own technology as well, and still achieve #3 assuming you fully document that new technology, but it's often easier to leverage what's already there, and doing so can help you achieve a more rapid level of adoption in the community.
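To make the ZIP and XML point a bit more concrete, here is a minimal Python sketch that uses only the standard library. The package layout is an assumption made up for illustration: a single part named `content.xml` with `<document>` and `<para>` elements. It is not the actual layout of OpenXML or ODF; the point is only that off-the-shelf ZIP and XML tools are enough to get at the content.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_package():
    """Build a tiny ZIP package in memory with one hypothetical XML part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # "content.xml" and its element names are invented for this sketch.
        package.writestr(
            "content.xml",
            "<document>"
            "<para>Hello from Texas</para>"
            "<para>Second paragraph</para>"
            "</document>",
        )
    return buffer.getvalue()


def read_paragraphs(package_bytes):
    """Open the package with generic ZIP and XML tools and return its text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return [para.text or "" for para in root.iter("para")]


paragraphs = read_paragraphs(build_sample_package())
print(paragraphs)
```

Nothing here depends on the application that wrote the file, which is the practical benefit of building on technologies that already have broad tool support.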

If you look at the bill in Texas, you can see that they have these goals in mind, and have set four criteria to help them meet the goals:

  1. interoperable among diverse internal and external platforms and applications – It's super important to be both interoperable and accessible, as I discussed above.
  2. published without restrictions or royalties – This is really around meeting the first and second goals I identified above. You need to make sure that you will always have the ability to open these files, and you don't want to be forced to pay for that access. This is extremely important, especially when it comes to the core document content. You also want to have the ability to easily scan the files to see if the end user has decided to embed some content that is restricted.
  3. fully and independently implemented by multiple software providers on multiple platforms without any intellectual property reservations for necessary technology – This really helps to show that you aren't going to be tied into a specific application for accessing the content. The reason this is important to ask for is that you want to ensure that the files can be accessed by as many folks as possible.
  4. controlled by an open industry organization with a well-defined inclusive process for evolution of the standard – This is more of a forward-looking goal. It's not about accessing the documents of today, but about ensuring that if new ideas come along, they can be added to the format. It's a bit harder to create concrete scenarios here, because obviously you don't want to allow random changes that don't undergo some extensive review. You need to make sure, for instance, that future changes to the spec don't break existing implementations. Very important topic.
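The idea in the second point of scanning files for embedded content that may be restricted can be sketched roughly as follows, assuming the document is a ZIP-based package. The allow-list of suffixes and the part names are hypothetical, chosen only for this example; a real policy would more likely be driven by declared content types than by file extensions.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical allow-list of part types considered openly documented.
OPEN_PART_SUFFIXES = {".xml", ".rels", ".png", ".jpeg"}


def find_unrecognized_parts(package_bytes):
    """Return the names of parts whose suffix is not on the allow-list."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            if not name.endswith("/")  # skip directory entries
            and PurePosixPath(name).suffix.lower() not in OPEN_PART_SUFFIXES
        )


def build_sample():
    """Build an in-memory package with one opaque embedded part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<document/>")
        package.writestr("media/logo.png", b"not a real image")
        package.writestr("embeddings/clip.bin", b"opaque payload")
    return buffer.getvalue()


flagged = find_unrecognized_parts(build_sample())
print(flagged)
```

A check like this only tells you that something unfamiliar is in the package; deciding whether that content is actually restricted is still a policy question.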

As I said at the beginning, it's fun seeing so much attention being paid to file formats. It's always important to remove the more "religious" aspects from the debate, and really drill into the scenarios. What are you trying to do with the documents, and what do you want to see put in place to help you succeed?

-Brian