Copying HTML on the clipboard

Setting plain text on the clipboard is easy. Call Clipboard.SetText("Hello!"), and it works great. But what if you want to set HTML? It's tempting to think you can just call Clipboard.SetText("<b>Hello!</b>", TextDataFormat.Html). But that doesn't work, because HTML on the clipboard (in CF_HTML format) requires a header in front of the actual HTML. (I first hit this when I wanted my RTF 2 HTML converter to copy the HTML to the clipboard instead of writing it to a file.)

You'll also notice this if you call Clipboard.GetText(TextDataFormat.Html) after copying HTML to the clipboard. You don't just get the HTML string back; there's also a giant header in front of it. This raises two problems:

1. How do you get the HTML from the clipboard? (strip the header)
2. How do you copy raw HTML to the clipboard? (generate the header)

I have some sample code to do this below, but there are a few points I want to hit first.

Example of a bad example:
<rant> The example code for Clipboard.SetText is very "clever". It manages to call the API, but in a way that's completely meaningless, and completely avoids mentioning this crucial header. 

// Demonstrates SetText, ContainsText, and GetText.
public String SwapClipboardHtmlText(String replacementHtmlText)
{
    String returnHtmlText = null;
    if (Clipboard.ContainsText(TextDataFormat.Html))
    {
        returnHtmlText = Clipboard.GetText(TextDataFormat.Html);
        Clipboard.SetText(replacementHtmlText, TextDataFormat.Html);
    }
    return returnHtmlText;
}

My guess is that the sample writer originally tried to do something straightforward and useful, found it didn't work (for the exact reasons I'm writing this blog post), and then came up with this more obscure, meaningless excuse of an example. </rant>

So what's this header?

Clipboard.SetText(..., TextDataFormat.Html) is just shorthand for Clipboard.SetData("HTML Format", ...), which is just a wrapper around the raw Win32 APIs and the CF_HTML format, which requires this text header. (In my experience, WinForms is usually great about not just being raw P/Invokes to Win32, but actually smoothing over the Win32 APIs and exposing a layer that's fundamentally easy to use. I think this is a case that just fell through the cracks.)

The header is a text string that prefixes the actual string you set to the clipboard. The format is described here.

So you can't just say Clipboard.SetText("<b>Hello!</b>", TextDataFormat.Html).

You end up having to pass in a text string like this:

Version:1.0
StartHTML:000125
EndHTML:000260
StartFragment:000209
EndFragment:000222
SourceURL:file:///C:/temp/test.htm
<HTML>
<head>
<title>HTML clipboard</title>
</head>
<body>
<!--StartFragment--><b>Hello!</b><!--EndFragment-->
</body>
</html>

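The header is the first six lines (Version through SourceURL). The actual fragment is the text between the <!--StartFragment--> and <!--EndFragment--> comments.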
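To make the header concrete, here is a minimal sketch of generating one. It is not the HtmlFragment class from this post; the CfHtmlSketch name, the Build method, and the bare html/body wrapper are hypothetical and chosen for illustration. It assumes the four numeric fields are UTF-8 byte offsets from the start of the string (which matches the numbers in the example above when lines end in CRLF), and it pads them to 8 digits so the header length doesn't change once the real values are filled in. The padding width is a choice, not a requirement; the example above uses 6 digits.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class CfHtmlSketch
{
    const string Prefix = "<html>\r\n<body>\r\n<!--StartFragment-->";
    const string Suffix = "<!--EndFragment-->\r\n</body>\r\n</html>\r\n";

    // Formats the header with fixed-width offsets, so its length is the
    // same no matter which values are written into it.
    static string Header(int startHtml, int endHtml, int startFrag, int endFrag, string sourceUrl)
    {
        var sb = new StringBuilder();
        sb.Append("Version:1.0\r\n");
        sb.Append("StartHTML:").Append(startHtml.ToString("D8")).Append("\r\n");
        sb.Append("EndHTML:").Append(endHtml.ToString("D8")).Append("\r\n");
        sb.Append("StartFragment:").Append(startFrag.ToString("D8")).Append("\r\n");
        sb.Append("EndFragment:").Append(endFrag.ToString("D8")).Append("\r\n");
        if (sourceUrl.Length > 0)
            sb.Append("SourceURL:").Append(sourceUrl).Append("\r\n");
        return sb.ToString();
    }

    // Wraps an HTML fragment in a CF_HTML-style header plus minimal context.
    // Pass an empty sourceUrl to leave the SourceURL line out.
    public static string Build(string fragment, string sourceUrl = "")
    {
        Encoding utf8 = Encoding.UTF8;

        // Format once with zeros just to learn the header's byte length.
        int startHtml = utf8.GetByteCount(Header(0, 0, 0, 0, sourceUrl));
        int startFrag = startHtml + utf8.GetByteCount(Prefix);
        int endFrag = startFrag + utf8.GetByteCount(fragment);
        int endHtml = endFrag + utf8.GetByteCount(Suffix);

        return Header(startHtml, endHtml, startFrag, endFrag, sourceUrl)
            + Prefix + fragment + Suffix;
    }
}
```

The string returned by CfHtmlSketch.Build("<b>Hello!</b>") is the kind of value you would then hand to Clipboard.SetText(..., TextDataFormat.Html) from an STA thread. Counting bytes rather than characters only matters once the fragment contains non-ASCII text.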

There's a method to the madness. It provides benefits like:

  1. context to the fragment, such as any enclosing tags the fragment is in. For example, if the text you copied is inside a bold tag, the context can capture that.
  2. a source URL, so you can resolve relative links.
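Going the other way (stripping the header) and using the source URL can be sketched like this. Again, this is an illustration under assumptions, not the HtmlFragment class from this post: CfHtmlReaderSketch, HeaderValue, ExtractFragment, and ResolveLink are hypothetical names, the code assumes the header lines all come before the first line that starts with "<", and it assumes the offsets are UTF-8 byte counts, so the string is re-encoded before slicing.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class CfHtmlReaderSketch
{
    // Returns the value of a "Name:value" header line, or "" if it is absent.
    public static string HeaderValue(string cfHtml, string name)
    {
        foreach (string line in cfHtml.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
        {
            // Assumption: the header ends where the markup begins.
            if (line.StartsWith("<", StringComparison.Ordinal))
                break;
            if (line.StartsWith(name + ":", StringComparison.OrdinalIgnoreCase))
                return line.Substring(name.Length + 1);
        }
        return "";
    }

    // Cuts out the bytes between StartFragment and EndFragment.
    // Throws FormatException if either header line is missing.
    public static string ExtractFragment(string cfHtml)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(cfHtml);
        int start = int.Parse(HeaderValue(cfHtml, "StartFragment"));
        int end = int.Parse(HeaderValue(cfHtml, "EndFragment"));
        return Encoding.UTF8.GetString(bytes, start, end - start);
    }

    // Resolves a relative link found in the fragment against the SourceURL.
    public static string ResolveLink(string sourceUrl, string relativeLink)
    {
        return new Uri(new Uri(sourceUrl), relativeLink).AbsoluteUri;
    }
}
```

Fed the example string above (with CRLF line endings), ExtractFragment reads the offsets 209 and 222 and returns <b>Hello!</b>, and HeaderValue(..., "SourceURL") returns file:///C:/temp/test.htm. With a SourceURL of http://example.com/docs/page.htm, ResolveLink turns images/logo.png into http://example.com/docs/images/logo.png.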

Sample code:

I wrote a class to handle the copying + pasting of HTML snippets to the clipboard. Here's a little sample code demonstrating the class in use:

 
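using System;
using System.Diagnostics;

class Foo
{
    [STAThread] // clipboard access requires a single-threaded apartment
    static void Main()
    {
        string html = "<b>Hello!</b>";
        // Adds the CF_HTML header and puts the result on the clipboard.
        HtmlFragment.CopyToClipboard(html);

        // Reads the clipboard back; Fragment is the HTML without the header.
        HtmlFragment html2 = HtmlFragment.FromClipboard();
        Debug.Assert(html2.Fragment == html);
    }
}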

The sample code is here. I tested it with IE7 and FrontPage. Since the header spec wasn't very precise, there are no general guarantees. Use at your own risk, etc., etc.

It worked well enough that I hooked it up to my Rtf/Html converter and used that to paste the code snippets here.

Before, I'd save the HTML to a file (out.html), and then load that in IE and copy from there:

        TextWriter tw = new StreamWriter("out.html");
        Format(tw, data);
        tw.Close();
 

Now I can copy the HTML to the clipboard:

 
        StringWriter tw = new StringWriter();
        Format(tw, data);
        string s = tw.ToString();

        HtmlFragment.CopyToClipboard(s);