Why do Office ".xml" files behave differently from other ".xml" files?

Some of you who have worked with Office 2003 XML files may have noticed that while we use the ".xml" extension, the files still show unique icons and the original application is launched when you double-click them. The files are completely valid XML files that follow the W3C XML 1.0 spec. The reason they behave differently is that we put a PI (Processing Instruction) at the top of the XML file that identifies which application created it. Open any of the Word XML files with a text editor, and you'll see the following:

<?mso-application progid="Word.Document"?>

That declaration is what lets us know that it's a Word XML file. We do the same thing with InfoPath and Excel XML files. There is a component that we call the msoxev that sniffs files with the .xml extension and looks for that PI. When it sees the PI, it does a lookup in the registry to see if there is an application associated with the progid attribute. If so, it uses that application for opening and editing the file.
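To make the sniffing step concrete, here is a rough Python sketch of the idea. It is not how the actual component is implemented: the `HANDLERS` dictionary is a made-up stand-in for the registry lookup, the function names are hypothetical, and the `<doc/>` root element is just a placeholder.

```python
import re
from xml.dom import minidom

# Hypothetical stand-in for the registry lookup described above.
HANDLERS = {"Word.Document": "Word"}

def find_progid(xml_text):
    """Return the progid from a top-level mso-application PI, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "mso-application":
            # PI data is free text, so pull out the pseudo-attribute by hand.
            match = re.search(r'progid\s*=\s*(["\'])(.*?)\1', node.data)
            if match:
                return match.group(2)
    return None

def choose_handler(xml_text):
    """Pick an application for the file, falling back to plain XML handling."""
    return HANDLERS.get(find_progid(xml_text), "default XML handler")

SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<?mso-application progid="Word.Document"?>\n'
    '<doc/>'
)
```

Calling `choose_handler(SAMPLE)` picks the Word entry, while a file without the PI falls through to the default handler.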

We also run this in IE, so if you open one of the XML files in IE, it automatically gets handed off to the proper application. This is great if you are just following a hyperlink and want to view the file with the application that generated it. If you are debugging the files or want to view the XML directly in IE, though, it's a bit of a pain. If you want to open the file in IE and not get redirected, you have a couple of options.

One-time adjustment: If you want to change the behavior just for a specific document, you can open the file in a text editor and delete the PI. It will then behave just like a regular XML file.
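If you have more than a handful of files to do this for, the same edit can be scripted. The following is a minimal sketch that assumes the file contents are already loaded into a string; `strip_mso_pi` is a hypothetical helper name, and it only removes the first mso-application PI it finds.

```python
import re

# Matches the mso-application PI plus the line break that follows it, if any.
PI_PATTERN = re.compile(r'<\?mso-application\b.*?\?>[ \t]*\r?\n?', re.DOTALL)

def strip_mso_pi(xml_text):
    """Return xml_text with the first mso-application PI removed."""
    return PI_PATTERN.sub("", xml_text, count=1)
```

Text that has no such PI comes back unchanged, so it is safe to run over ordinary XML files as well.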

Permanent adjustment: This is a behavior you can easily modify if you want. The XEV mechanism just checks the registry to see what the content type for that file is. Go to "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml" and you'll see a collection of string entries. The name of each string matches the progid attribute in the PI, and the value of that string is the content type for the file. If you don't want the current behavior, you can delete or rename the string for that progid, and files with that PI will then behave like any other XML file.
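Before you delete or rename anything, it can help to see what is registered. Here is a small read-only sketch using Python's standard `winreg` module; it only runs on Windows, `list_xml_filters` is a hypothetical helper name, and it assumes the key exists at the path given above (on some machines it may not, in which case `OpenKey` raises an error).

```python
# Key path from the post, relative to HKEY_LOCAL_MACHINE.
FILTER_KEY = r"SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml"

def list_xml_filters():
    """Return {progid: content type} for each string under the filter key."""
    import winreg  # Windows-only, so imported here rather than at module level

    entries = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FILTER_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values on the key
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            entries[name] = data
    return entries
```

This only reads the entries. Making the actual change is still easiest to do by hand in the registry editor.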

You can also customize this registry key to register your own applications that want to use the .xml extension.

We won't have this issue with the Office 12 XML files, because we actually use unique extensions. It was something we had discussed doing with the Office 2003 XML files but eventually decided against. The new default formats will still be XML, but they will be wrapped in a ZIP container, and we decided that using unique extensions (.docx, .pptx, .xlsx) was the best way to go.

-Brian