OCX Persistent Strings May Be Altered When Saved in Office 2007 Office Open XML Formats

Summary

An ActiveX control that implements IPersistPropertyBag for property persistence may find that its string data is altered when the control is saved in Office 2007 using one of the new Office Open XML file formats (such as .docx, .xlsx, or .pptx). If a string contains certain control characters, such as TAB or CR/LF, these characters may be accidentally removed when the string is persisted. The data is then incomplete or inaccurate when it is restored to the control the next time the file is opened.

Root Cause

String data persisted through the property bag is written into the document's XML, and the XML encoder can remove or normalize extra spaces, tabs, and carriage returns that are not encoded in a form the XML schema preserves. Office does not verify that a string is safe to persist as XML, and it does not encode the string first to preserve it in its entirety. Office does not use IPersistPropertyBag for control persistence when saving to the older binary file formats, so the problem may appear to be new in Office 2007. It occurs only when the control supports that interface and supplies strings containing characters that XML/HTML output does not preserve or does not allow.
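One standard mechanism by which tabs and line breaks can disappear is XML attribute-value normalization. XML 1.0 requires a conforming parser to turn CR LF (and a lone CR) into LF, and then to replace each literal TAB, LF, CR, or space in an attribute value with a single space; only character references such as &#9; survive. The sketch below simulates those two rules for a CDATA attribute. It assumes, for illustration only, that the property value ends up as a literal attribute value; this article does not document exactly how Office writes it. The function name normalize_attribute_value is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: simulates what an XML 1.0 parser does to a literal
// (unescaped) CDATA attribute value. Character references are not modeled.
std::string normalize_attribute_value(const std::string& raw) {
    // Step 1 (end-of-line handling): CR LF and a lone CR both become LF.
    std::string eol;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            eol += '\n';
            if (i + 1 < raw.size() && raw[i + 1] == '\n') ++i;  // skip LF of CR LF
        } else {
            eol += raw[i];
        }
    }
    // Step 2 (attribute-value normalization): each TAB, LF, CR or space
    // becomes exactly one space.
    std::string out;
    for (char c : eol) {
        out += (c == '\t' || c == '\n' || c == '\r' || c == ' ') ? ' ' : c;
    }
    return out;
}
```

Under these rules "Name\tValue" and "Name Value" both load back as "Name Value", so the control cannot tell on load which one was saved. That is why the string has to be protected before it is written.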

Workaround

The control maker can encode the string, before persisting it in the document, in a format that is safe inside an XML (or HTML) tag, and decode it again when the control loads. Alternatively, the control can withhold support for the IPersistPropertyBag interface when it is loaded in Office. This forces Office to fall back to the binary persistence it uses for non-XML files (IPersistStream). The binary blob is then saved as a linked part in the document instead of as tags in the XML document itself.
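A minimal sketch of the encoding workaround, assuming the control is free to choose its own reversible escaping scheme. The function names encode_for_bag and decode_from_bag and the escape format are hypothetical, not part of MFC, ATL, or Office. std::wstring stands in for the BSTR the control would actually store. Because the article notes that extra spaces can also be removed, the sketch escapes spaces too, so the persisted text contains no whitespace at all.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoder (scheme chosen for this sketch only):
//   backslash -> \\    TAB -> \t    CR -> \r    LF -> \n
//   space, other C0 control characters, DEL -> \xHH (two uppercase hex digits)
std::wstring encode_for_bag(const std::wstring& in) {
    static const wchar_t hex[] = L"0123456789ABCDEF";
    std::wstring out;
    for (wchar_t c : in) {
        switch (c) {
            case L'\\': out += L"\\\\"; break;
            case L'\t': out += L"\\t";  break;
            case L'\r': out += L"\\r";  break;
            case L'\n': out += L"\\n";  break;
            default:
                if ((c >= 0 && c <= 0x20) || c == 0x7F) {
                    out += L"\\x";
                    out += hex[(c >> 4) & 0xF];
                    out += hex[c & 0xF];
                } else {
                    out += c;  // printable characters pass through unchanged
                }
        }
    }
    return out;
}

// Returns the value of a hex digit, or -1 if c is not one.
static int hex_digit_value(wchar_t c) {
    if (c >= L'0' && c <= L'9') return c - L'0';
    if (c >= L'A' && c <= L'F') return c - L'A' + 10;
    if (c >= L'a' && c <= L'f') return c - L'a' + 10;
    return -1;
}

// Hypothetical decoder: reverses encode_for_bag after the value is read back.
// Malformed or unknown escapes are copied through unchanged.
std::wstring decode_from_bag(const std::wstring& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != L'\\' || i + 1 == in.size()) { out += in[i]; continue; }
        wchar_t n = in[i + 1];
        if (n == L'\\')     { out += L'\\'; ++i; }
        else if (n == L't') { out += L'\t'; ++i; }
        else if (n == L'r') { out += L'\r'; ++i; }
        else if (n == L'n') { out += L'\n'; ++i; }
        else if (n == L'x' && i + 3 < in.size() &&
                 hex_digit_value(in[i + 2]) >= 0 && hex_digit_value(in[i + 3]) >= 0) {
            out += static_cast<wchar_t>(hex_digit_value(in[i + 2]) * 16 +
                                        hex_digit_value(in[i + 3]));
            i += 3;
        } else {
            out += in[i];  // keep a lone backslash as-is
        }
    }
    return out;
}
```

For example, L"Col1\tCol2\r\nC:\\Temp x" is persisted as the whitespace-free text Col1\tCol2\r\nC:\\Temp\x20x and decodes back to the original. Percent-encoding or Base64 would serve equally well. Whatever scheme is used, a control that already has documents saved without encoding needs some way to tell old values from encoded ones, because decoding an old value that contains a backslash would alter it.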

More Information

The problem is typically seen with MFC controls. MFC uses property bags for communication between the control and its property page, and the MFC code does not take into account that the bag contents may be written out inside an XML or HTML file. ATL, on the other hand, uses property bags primarily for Web controls, and the developers of those controls tend to be more careful about string encoding, because storing strings in HTML has the same limitations as saving them in XML. As a result, ATL controls generally see this problem less often. Control developers are urged to test their controls fully before recommending them for use in Office files.