Open XML diff tools


When learning about Open XML or developing Open XML solutions, it’s very common to find yourself wondering “what’s the difference between these two documents?” For example, you may see something in a document that you’d like to recreate programmatically, so you want to know what markup would be required. Or perhaps you’ve modified a document manually (using Word, say) and you want to know what markup changes were caused by your edits.


In those situations, a diff utility can save a lot of time. I’ll cover two good options for comparing Open XML documents below: Eric White’s free command-line tool OpenXmlDiff, which comes with source code and can be useful in automated workflows, and Altova’s commercial GUI tool DiffDog, which offers a variety of interactive capabilities for analyzing the differences between Open XML documents.


Eric White’s OpenXmlDiff


Eric White recently needed an Open XML diff utility, so he built one from scratch. The result was OpenXmlDiff, a simple and straightforward command-line tool that generates a report of all the differences between two Open XML documents. The diff report is written to console output, so you can easily redirect it to a text file or another program. Eric has put together a screencast that provides a concise 3-minute overview of how to download and use OpenXmlDiff.


OpenXmlDiff uses the XML Diff and Patch Utility (a free download on MSDN) to analyze the differences between corresponding XML parts in two Open XML documents. That utility identifies the specific changes that would be needed to transform one XML document (i.e., an OPC part) into the other. OpenXmlDiff handles the details of the OPC package and generates a well-organized report that first summarizes differences at the package level and then shows the specific details for each part that differs.
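To make that two-level report structure concrete, here is a minimal Python sketch of the same idea. It is not OpenXmlDiff's code (OpenXmlDiff is a .NET tool built on XML Diff and Patch). This sketch assumes that a plain line-based `difflib` diff of pretty-printed XML is an acceptable stand-in for a real XML-aware diff, and the function names `read_parts`, `pretty_lines`, and `diff_packages` are hypothetical. It opens each package as an ordinary ZIP file, lists the parts found in only one package or changed between them, and then shows details for each changed part.

```python
import difflib
import sys
import zipfile
from xml.dom import minidom


def read_parts(path):
    """Return {part name: raw bytes} for every entry in an OPC package (a ZIP file)."""
    with zipfile.ZipFile(path) as zf:
        return {name: zf.read(name) for name in zf.namelist() if not name.endswith("/")}


def pretty_lines(data):
    """Pretty-print an XML part so a line-based diff has one element per line.

    Assumption: a line diff of pretty-printed XML stands in for a true
    XML-aware diff such as the one XML Diff and Patch performs.
    """
    return minidom.parseString(data).toprettyxml(indent="  ").splitlines()


def diff_packages(path_a, path_b):
    """Build a report: package-level summary first, then details per changed part."""
    a, b = read_parts(path_a), read_parts(path_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(n for n in a.keys() & b.keys() if a[n] != b[n])

    # Package-level summary.
    report = [f"Only in {path_a}: {n}" for n in only_a]
    report += [f"Only in {path_b}: {n}" for n in only_b]
    report += [f"Differs: {n}" for n in changed]

    # Part-level details.
    for name in changed:
        if name.endswith((".xml", ".rels")):
            report += difflib.unified_diff(
                pretty_lines(a[name]), pretty_lines(b[name]),
                fromfile=f"{path_a}:{name}", tofile=f"{path_b}:{name}",
                lineterm="")
        else:
            # Images and other binary parts: just note that they differ.
            report.append(f"(binary part {name} differs)")
    return report


if __name__ == "__main__":
    # Like OpenXmlDiff, write to standard output so the report can be redirected.
    print("\n".join(diff_packages(sys.argv[1], sys.argv[2])))
```

Saved under a hypothetical name such as `opcdiff.py`, it could be run as `python opcdiff.py before.docx after.docx > report.txt`. Comparing raw bytes first keeps unchanged parts out of the detailed section, which mirrors the "summary first, details second" layout described above.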


OpenXmlDiff is a good option if you want to study source code or extend a tool on your own, and it’s also free. For those who want more of a slick GUI tool for comparing Open XML documents, there’s another good option …


Altova’s DiffDog


I had the pleasure of meeting Alexander Falk in person at TechEd two weeks ago, and we had lunch and talked about our mutual interests, including XML standards, Open XML tools, and, most of all, photography. Ironically, we got so busy talking about photography that I forgot to take a picture of Alex, but I did snap a couple of photos of Altova's booth, where a variety of Altova employees (including Tara and Erin, pictured) were on hand to answer questions and do demos.


Altova’s suite of XML tools has been evolving rapidly, and one of the areas where they’ve added quite a bit of functionality lately is Open XML support. For example, Alex blogged recently about how to use Altova’s MapForce to auto-generate C# code that creates an Open XML spreadsheet, and their XMLSpy and StyleVision products also provide built-in support for the Open XML formats.


Another Altova tool that can be very useful to Open XML developers is DiffDog, a full-featured general-purpose diff/merge utility that supports any type of text file and also offers XML-aware differencing and support for Open XML documents (i.e., OPC packages) and ZIP files.


DiffDog’s “XML-aware” approach means that it’s smart about how to organize differences in XML documents for various visualizations (text view, grid view), and it also provides options for how to handle whitespace, CDATA, ordering of attributes (semantically meaningless, but sometimes important to a developer) and many other XML-specific details. And with full support for parts in ZIP packages, you can easily use DiffDog on Open XML documents. Download the free 30-day trial version and check it out.
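As a rough illustration of what "XML-aware" comparison buys you, the Python sketch below uses the standard library's C14N 2.0 canonicalization (`xml.etree.ElementTree.canonicalize`, Python 3.8 and later) so that attribute order and CDATA-versus-escaped-text differences disappear before two XML strings are compared. This is not how DiffDog works internally, and the function name `same_xml` is hypothetical; it only shows the kind of normalization such options perform.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(a: str, b: str, ignore_whitespace: bool = True) -> bool:
    """Return True if two XML strings are equal after C14N 2.0 canonicalization.

    Canonicalization sorts attributes and turns CDATA sections into ordinary
    escaped text, so neither difference is reported. With ignore_whitespace,
    strip_text=True also trims leading/trailing whitespace in text content,
    which drops indentation between elements. (Hypothetical helper, not a
    DiffDog API.)
    """
    ca = canonicalize(a, strip_text=ignore_whitespace)
    cb = canonicalize(b, strip_text=ignore_whitespace)
    return ca == cb


# Attribute order is semantically meaningless, so these compare equal.
print(same_xml('<w:p xmlns:w="urn:x" w:a="1" w:b="2"/>',
               '<w:p xmlns:w="urn:x" w:b="2" w:a="1"/>'))
```

Whitespace stripping is optional because it is lossy: in a WordprocessingML `w:t` element, leading or trailing spaces can be real content, which is one reason a tool like DiffDog exposes whitespace handling as a setting rather than always applying it.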


Comments (5)

  1. Ewa says:

    Hi!

    You are right. The support for Open XML in several tools from Altova is a nice thing!

    Although not directly related to Open XML: If you like XMLSpy you also might like its extensibility. You can learn more about it at xml-tools.com

  2. Jason Harrop says:

    Eric White’s tool will be very handy if you have programmatically created a broken document, and would like to see how Word fixed it.

    Another scenario I have dealt with recently is working out the difference between two versions of a paragraph. Our Java library, docx4j, uses diffx (open source, created by Topologi) to calculate the diff, then XSLT to convert the differences to w:ins and w:del, which Word displays as tracked changes (as does docx4all).

    I’d like to do the same thing in my Word add-in, and considered using Microsoft’s Diff and Patch tool, which is actually open source, in that the source is available.  However, it is subject to an unattractive license, so my plan is to use diffx in the Word add-in as well.  I think there is an opportunity for Microsoft to properly open source the Diff and Patch code.

  3. Christian says:

    When will you add support for ODF to these tools? You paid for and had built all these nifty little tools, and now Microsoft says goodbye to OOXML. OOXML seems to be the next J++. Why do you try to sell technology that your company is going to abandon? Do you want to fool the ecosystem?

  4. Doug Mahugh says:

    > When will you add support for odf to these tools?

    Well, there are two tools I’ve covered above: a free utility from Eric White and a tool developed and published by Altova.

    Eric posted his tool because he’s our Open XML technical evangelist, and he’s responsible for helping developers take full advantage of the Open XML formats.  ODF has nothing to do with that work, so I’m pretty doubtful you’ll see Eric adding ODF to OpenXmlDiff, but you could ask him on his blog.

    For Altova you’d have to ask them whether they have any ODF plans.  Coincidentally, I asked them that question myself recently, and the answer was that they’ve not heard any requests for ODF support from their customers.

    I’m not sure why you say that we’re saying goodbye to Open XML or that we’re “abandoning” Open XML. Quite the opposite: we remain firmly committed to Open XML. The fact that we’re supporting other formats doesn’t change the fact that Open XML is our default format and the most powerful document format we support. (As I covered recently, with SP2 you’ll be able to set any other format as your default too, if you’d like.)

    Open XML provides a level of custom schema support, through OPC, custom markup, and custom XML parts, that is not matched by any other format we support, and we take full advantage of that capability in many ways. (Check out how the Compatibility Pack can round-trip SmartArt through older versions of Office, for instance.) Open XML also offers unparalleled compatibility with existing Office binary documents. No other format even comes close on that measure, and this gives millions of our customers the peace of mind of knowing they can migrate to an XML-based format without the need to manually convert and verify every single document.

    As you’ve observed, we’re doing a lot these days to create better tool support for Open XML, both in terms of enhancements to our own tools such as the SDK, and in helping to raise awareness of Open XML tools from other vendors.  We’ll continue to make those types of investments, and we have good things planned for Open XML developers in the months and years ahead, with the Open XML SDK roadmap that we recently announced being a great example of that.  Our commitment to Open XML hasn’t wavered at all.