Article on Document Security

I just read this post on document security:

Obviously there are many different issues here, some easier to solve than others. Some of the problems are approachable with transparent file formats, as well as support for customer-defined XML. One problem that can be solved with the new Office XML formats is that of unknown metadata being sent out with a document. Since the formats are now open, you can easily check whether a file contains hidden edits, comments, or sensitive metadata. You no longer need to worry about things being hidden in your files, because it’s all represented in a well-documented format that people can build solutions on top of. I had a post a couple of weeks ago showing how you could apply a simple XSLT to remove comments and tracked changes from a document. You can imagine this being an automated process applied to all documents going out via e-mail or being posted on external websites. Of course, this doesn’t stop users from posting documents that they shouldn’t, but it does help in those cases where people unknowingly post data that was hidden from them in their editing environment.
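
As a rough illustration of that kind of check, here is a minimal Python sketch that opens a .docx package and counts comments and tracked changes. It assumes the WordprocessingML namespace and part names of the published Open XML format (`word/document.xml`, `word/comments.xml`). The function name `find_hidden_content` is made up for this example, and the set of elements it looks for is not exhaustive.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML namespace of the published Open XML format.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Elements that mark tracked insertions/deletions and comment anchors.
REVISION_TAGS = {f"{{{W}}}ins", f"{{{W}}}del"}
COMMENT_TAGS = {f"{{{W}}}commentRangeStart", f"{{{W}}}commentReference"}


def find_hidden_content(docx):
    """Count tracked changes and comments in a .docx (path or file-like object)."""
    report = {"tracked_changes": 0, "comment_anchors": 0, "comments": 0}
    with zipfile.ZipFile(docx) as package:
        names = set(package.namelist())
        if "word/document.xml" in names:
            root = ET.fromstring(package.read("word/document.xml"))
            for element in root.iter():
                if element.tag in REVISION_TAGS:
                    report["tracked_changes"] += 1
                elif element.tag in COMMENT_TAGS:
                    report["comment_anchors"] += 1
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            # Each direct w:comment child is one stored comment.
            report["comments"] = len(comments.findall(f"{{{W}}}comment"))
    return report
```

A gateway that screens outgoing attachments could call something like this and hold back any document whose report contains a non-zero count.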

Additionally, you could use the support for customer-defined XML to mark the documents up with your own XML, and use that data to decide whether certain content should be removed before the document is posted externally. I’ve seen people do this with Word 2003’s XML support by creating XML elements to specify “security” levels. If you apply your XML to document templates, it becomes easier to identify the types of documents traveling out of your organization. This kind of metadata can be a double-edged sword: it helps you identify the type of document, but it is also something you don’t want to expose outside your organization (although, as I said earlier, you can remove this data easily enough). It still doesn’t stop the malicious user, though, since they can just remove the tags or copy the content and paste it as plain text into an e-mail. That’s where the problem becomes more complex.
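
To make the idea concrete, here is a small Python sketch of such a check. Everything specific in it is hypothetical: the namespace `urn:example:security`, the `section` element, its `level` attribute, and the three level names are invented for illustration, and a real organization would define its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and level names; a real schema would define its own.
SEC_NS = "urn:example:security"
LEVELS = ["public", "internal", "confidential"]  # ordered from low to high


def highest_security_level(xml_text):
    """Return the most restrictive level marked anywhere in the XML, or None."""
    root = ET.fromstring(xml_text)
    found = [
        element.get("level")
        for element in root.iter(f"{{{SEC_NS}}}section")
        if element.get("level") in LEVELS
    ]
    return max(found, key=LEVELS.index) if found else None


def may_post_externally(xml_text, allowed="public"):
    """Allow posting only if nothing is marked above the allowed level."""
    level = highest_security_level(xml_text)
    return level is None or LEVELS.index(level) <= LEVELS.index(allowed)
```

Note that a document with no tags at all passes this check, which is exactly the weakness described above: the markup catches accidents, not someone who strips the tags first.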

This is an interesting topic, and I’ll try to get into it a bit more, including how it relates to the new Office file formats. There are also some relevant tools out there today that I’ll try to dig up pointers to.


Comments (6)

  1. Gene Myers says:

    Hi Brian- I’m glad to see you found Joe Fantuzzi’s article interesting.

    Your response reminded me of a question I and others had at TechEd, with regard to a mechanism to secure the XML so that direct changes (outside of Word) to the XML can be noted or prohibited. You mentioned this may be a role of DRM. Have you (MS) given any thought to other options since TechEd?

  2. Anon says:

    Why don’t you turn these tracking "features" off by default? That way most users who don’t care for them are not caught off guard. Those who want them will know where to turn them on.

  3. BrianJones says:

    Hey Gene, let me try to explain that a bit more, as I realize I wasn’t as clear as I could have been at TechEd. Let’s first look at what the situation is today. Today, you save a .doc file from Word, and it goes in some location that you specify (local, network, etc.). If you haven’t used IRM or encryption, then anyone who can get access to that file can make changes through any number of methods. They can use Word, Works, WordPad, OpenOffice or any other piece of software that supports the .doc format (there are a number out there). The .doc format is pretty complex, so there aren’t a lot of tools that can manipulate it, but anyone with the desire to could figure it out (they just have to spend enough time). If you want to protect that document from being modified, you have a few options:

    1) Block access to the file itself. This has nothing to do with the format. It’s just general file and folder level permissions.

    2) Encrypt the file so it can’t be opened or modified. This can be done using IRM or the older encryption functionality. The integrity level is then tied to the effectiveness of the encryption.

    3) Sign the document. You can apply a signature so that you can validate if the file has been modified without the signature being reapplied.

    That’s the story with today’s formats. There are of course more granular options, but we don’t need to dig into those. Now, let’s look at the story for Office 12. The only thing that has really changed is that with .docx, the format is better documented and easier to work with. Often, when people tried to build tools for working with the .doc format, the result was a corrupt document. That problem should go away with the new formats. The methods for controlling access to the files, though, are the same. If someone can modify the .docx file outside of Word, they could have done the same with the .doc file. If you want to lock down access to the file, you need to use the same approach you would have used with .doc. Signatures will work just as effectively, as will encryption. The most foolproof approach, of course, is to block access to the file altogether.

    Make sense?


  4. BrianJones says:

    Anon, most of those features are off by default. Comments and edit history (tracked changes) are not turned on without an explicit user action. In Word 2003 we actually did some additional work to help users see if tracked changes had been turned on in their documents. When you open a document with revisions, we automatically show them, unlike in the past where they may have been hidden based on the view selected.


  5. spaceship789 says:

    I notice that in WPF there is a signing API that works with packages.

    Will there be any tools *within* Office that allow you to apply these signatures?

    Or will we rely on external tools to do these things?