MathML on the Windows Clipboard

People sometimes ask how the Windows clipboard works and whether it supports MathML, JPEG, RTF, and other formats in addition to built-in formats such as CF_BITMAP and CF_UNICODETEXT, which are defined in winuser.h. The answer to the second question is that Windows supports any format you want to define, including private formats. This post lists popular clipboard formats, including those for MathML, and describes how they work.

The built-in formats are listed in the following table:

CF_BITMAP          CF_PALETTE
CF_DIB             CF_PENDATA
CF_DIBV5           CF_RIFF
CF_DIF             CF_SYLK
CF_ENHMETAFILE     CF_TEXT
CF_HDROP           CF_TIFF
CF_LOCALE          CF_UNICODETEXT
CF_METAFILEPICT    CF_WAVE
CF_OEMTEXT

Notably missing is the “Rich Text Format” (RTF), which was defined way back in 1988. But that’s partly because it’s so easy to define what you might name CF_RTF. You just call RegisterClipboardFormat(“Rich Text Format”) and you get back a unique 16-bit ID in the range 0xC000 to 0xFFFF. Any other application running on the same machine gets the same 16-bit number by making the same call. In fact, you can define a format using any valid Unicode string and copy/paste it between applications, provided they all register the same string and understand the format. Consequently, registered clipboard formats are almost as easy to use as the built-in formats.
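As a minimal sketch of the registration step described above, the following Python code calls RegisterClipboardFormatW through ctypes. The helper names register_format and is_registered_format_id are made up for this illustration, and the Win32 call is only attempted on Windows; the range check is ordinary arithmetic and runs anywhere.

```python
import ctypes
import sys

# Registered clipboard formats get IDs in this range (as stated in the text above).
REGISTERED_MIN = 0xC000
REGISTERED_MAX = 0xFFFF


def is_registered_format_id(format_id: int) -> bool:
    """Return True if the ID lies in the range used for registered formats."""
    return REGISTERED_MIN <= format_id <= REGISTERED_MAX


def register_format(name: str) -> int:
    """Register (or look up) a clipboard format by name. Windows only.

    Wraps user32!RegisterClipboardFormatW, which returns 0 on failure.
    """
    if sys.platform != "win32":
        raise OSError("RegisterClipboardFormatW is only available on Windows")
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.RegisterClipboardFormatW.argtypes = [ctypes.c_wchar_p]
    user32.RegisterClipboardFormatW.restype = ctypes.c_uint
    format_id = user32.RegisterClipboardFormatW(name)
    if format_id == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return format_id


if __name__ == "__main__" and sys.platform == "win32":
    # The returned ID plays the role of the hypothetical "CF_RTF" constant.
    cf_rtf = register_format("Rich Text Format")
    print(hex(cf_rtf), is_registered_format_id(cf_rtf))
```

Calling register_format with the same string from a second process should yield the same ID, which is what lets two applications agree on a format without sharing a header file.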

For MathML, the standard strings are “MathML”, “MathML Presentation”, and “MathML Content”. Microsoft Office applications support “MathML” and “MathML Presentation” and interpret “MathML” as Presentation MathML. Microsoft Word has an option to copy MathML in the plain-text slot (CF_TEXT) instead of the linear format. You can choose between the two using the Equation Options dialog (on the math ribbon, click the bottom-right icon of the Tools block). This plain-text slot option was a popular way to exchange MathML before the Windows MathML clipboard strings were standardized. Note that the MathML clipboard formats are available only if the selected text is completely contained within a math zone, because the MathML formats cannot represent math-zone text together with text outside a math zone. To copy such combinations, you need a format like RTF or HTML.
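To check whether a copy actually produced any of the MathML formats, a program can enumerate the registered format names on the clipboard and filter for the three strings above. The following is a sketch under stated assumptions: the helper names mathml_formats_in and clipboard_format_names are made up here, the Win32 calls (OpenClipboard, EnumClipboardFormats, GetClipboardFormatNameW, CloseClipboard) are made through ctypes and only on Windows, and the 256-character name buffer is an arbitrary choice.

```python
import ctypes
import sys

# Standard MathML clipboard format names given in the text above.
MATHML_FORMAT_NAMES = ("MathML", "MathML Presentation", "MathML Content")


def mathml_formats_in(names):
    """Return the MathML format names present in `names`, in the order offered."""
    return [name for name in names if name in MATHML_FORMAT_NAMES]


def clipboard_format_names():
    """List the names of registered formats currently on the clipboard. Windows only.

    Built-in formats such as CF_UNICODETEXT have no registered name, so
    GetClipboardFormatNameW returns 0 for them and they are skipped.
    """
    if sys.platform != "win32":
        raise OSError("the Win32 clipboard API is only available on Windows")
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.OpenClipboard.argtypes = [ctypes.c_void_p]
    user32.OpenClipboard.restype = ctypes.c_int
    user32.EnumClipboardFormats.argtypes = [ctypes.c_uint]
    user32.EnumClipboardFormats.restype = ctypes.c_uint
    user32.GetClipboardFormatNameW.argtypes = [
        ctypes.c_uint, ctypes.c_wchar_p, ctypes.c_int]
    user32.GetClipboardFormatNameW.restype = ctypes.c_int
    if not user32.OpenClipboard(None):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        names = []
        buffer = ctypes.create_unicode_buffer(256)
        format_id = user32.EnumClipboardFormats(0)  # 0 starts the enumeration
        while format_id:
            if user32.GetClipboardFormatNameW(format_id, buffer, len(buffer)):
                names.append(buffer.value)
            format_id = user32.EnumClipboardFormats(format_id)
        return names
    finally:
        user32.CloseClipboard()


if __name__ == "__main__" and sys.platform == "win32":
    print(mathml_formats_in(clipboard_format_names()))
```

If the selection extended outside a math zone when it was copied, the filtered list would be expected to come back empty, in line with the restriction noted above.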

In HTML, Office apps copy math zones as images with comments that contain “OMML”, the Office Math Markup Language. Similarly, the docx and pptx formats represent math zones using OMML. However, Office apps do use MathML for math zones when converting to/from OpenDocument formats like odt.

Some common image clipboard formats in addition to those in the table above are “JFIF”, “PNG”, and “GIF”. Here “JFIF” stands for “JPEG File Interchange Format”. The CF_DIB and CF_DIBV5 formats are device-independent bitmap formats that Windows understands but the Windows Imaging Component (WIC) does not. In case you need to use WIC for these, the previous post explains how to convert them to the CF_BITMAP format, which WIC understands.

A very powerful clipboard format is “DataObject”, which gives an IDataObject interface. This interface has methods to query what clipboard formats are available and to get the data for these formats. The clipboard always offers this format. Unless a specific format is requested, the format that is pasted is the first one offered by the IDataObject that the paste target understands. The target can enumerate the offered formats in order with IDataObject::EnumFormatEtc() and check for a particular format with IDataObject::QueryGetData(). In desktop applications, the paste target gets the clipboard’s IDataObject by calling OleGetClipboard(). The IDataObject interface provides a general way to interchange data between applications and can be used independently of the clipboard. The source application implements IDataObject and offers it to target applications via the source application’s object model. For example, ITextRange::Copy() has the option to return an IDataObject. By using IDataObject directly, applications exchange data without changing what’s on the clipboard.
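The paste rule in the paragraph above (take the first offered format that the target understands) can be modeled in a few lines. This is only an illustration of the selection logic, not a COM implementation: it makes no IDataObject calls, and the function name pick_paste_format and the sample format lists are hypothetical.

```python
def pick_paste_format(offered, understood):
    """Return the first format in `offered` that the target understands, else None.

    `offered` is the source's format list in the order it offers them;
    `understood` is the collection of formats the paste target can read.
    """
    understood = set(understood)
    for name in offered:
        if name in understood:
            return name
    return None


# Hypothetical example: the source offers its richest format first.
offered = ["MathML", "Rich Text Format", "HTML Format", "CF_UNICODETEXT"]
print(pick_paste_format(offered, {"CF_UNICODETEXT", "Rich Text Format"}))
# prints: Rich Text Format
```

Because the source's ordering decides the result, a source that lists its richest formats first lets capable targets get the best representation while simpler targets fall back to plain text.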

The Windows RT “Immersive” clipboard works somewhat differently and is used by Windows Store applications. It works with an object called a data package, which has functionality similar to that of IDataObject. Accordingly, the immersive clipboard’s capabilities are similar to those of the Windows desktop clipboard. In particular, you can interchange data between desktop applications and Windows Store applications. You can define new data formats, such as the MathML clipboard trio cited above. Each application refers to a particular format by the format’s string name, rather than by the 16-bit ID returned by registering a string. A few clipboard formats are built in, namely bitmap, HTML, RTF, Unicode Text, and URI. I haven’t tried copying MathML in Windows Store applications yet, but hope to do so soon.