Improved HTML Copy in the latest Productivity Power Tools

Executive summary: HTML Copy is an extension for Visual Studio 2010 (part of Productivity Power Tools) that copies source code from the Visual Studio editor to the clipboard in HTML format (in addition to the existing TEXT and RTF formats), preserving font and color information. It does so automatically every time you press Ctrl+C or otherwise invoke the Edit.Copy command, so there is no explicit way to invoke the feature (it has no UI of its own).

In the latest release: all font and background color information is now correctly written to HTML, so dark schemes such as those from https://studiostyl.es are now supported. The format of the output markup is now customizable via a Tools Options page.

Hint: if you don’t want to read all the details below but are still interested or have a question, there’s an FAQ at the end of this post. Don’t miss it!

Also see my previous posts about HTML Copy.

Configuring the output via the Tools Options page

I guess the biggest improvement in this version is that I finally added a Tools Options page that lets you customize the HTML markup that ends up on the clipboard:

[Screenshot: the HTML Copy Tools Options page]

You can choose how the snippet is prefixed. This can be arbitrary markup, such as custom JavaScript. By default, I use <pre style="...">, because after looking at various blogs I found this to be the most common way to format code. The prefix can also contain macros in curly braces, which are replaced with the actual values for the font, foreground color and background color. If you don’t want the values embedded, just remove the macro in curly braces and hardcode your own value.
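To make the macro idea concrete, here is a minimal Python sketch of how a prefix template like this could be expanded. The macro names ({font}, {size}, {foreground}, {background}) and the settings dictionary are hypothetical stand-ins, not the actual names used on the Tools Options page. Unknown macros are left in place, so hardcoded text passes through unchanged.

```python
import re

# Hypothetical macro names; the real Tools Options page may use different ones.
DEFAULT_PREFIX = (
    '<pre style="font-family:{font};font-size:{size}pt;'
    'color:{foreground};background:{background};">'
)

def expand_prefix(template, values):
    """Replace each {name} macro with its value; leave unknown macros untouched."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return re.sub(r"\{(\w+)\}", substitute, template)

# Hypothetical editor settings standing in for the current font and color scheme.
editor_settings = {"font": "Consolas", "size": 10, "foreground": "#000000", "background": "#ffffff"}
print(expand_prefix(DEFAULT_PREFIX, editor_settings))
```

To hardcode a value, you would replace a macro such as {background} in the template with a literal such as #1e1e1e.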

Format for the colorized spans – use CSS or embed styles directly

Colorized spans in code are encoded using the <span> tag – you can’t change that.

However, you can choose whether to emit CSS classes (EmitSpanClass). If you enable this option, each highlighted span from the editor gets its actual classification type as its class in the resulting HTML markup:

 <span class="keyword">public</span>

The classification type is what the language service actually classifies the token as, so the full fidelity information from the compiler is preserved in your output HTML. For instance, DayOfWeek.Friday will be output as:

 <span class="User Types(Enums)">DayOfWeek</span><span class="operator">.</span><span class="identifier">Friday</span>

Then you just have to define your own CSS for the various classification types that the compiler uses, and your code will be correctly colorized in the browser.

If you don’t want to use a separate stylesheet, however, you can embed the font and color information directly in your HTML. Set EmitSpanStyle to true (and EmitSpanClass to false) to produce standalone HTML that doesn’t need any CSS:

 <span style="color:blue;">public</span>
 <span style="color:#2b91af;">DayOfWeek</span>.Friday
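The two span modes can be sketched as follows. This is an illustrative Python sketch, not HTML Copy’s actual implementation. Each span is assumed to be a (text, classification type) pair, and the COLORS dictionary is a hypothetical stand-in for the colors the editor would supply for each classification type. With these assumptions it reproduces the public and DayOfWeek.Friday examples above.

```python
from html import escape

# Hypothetical color table standing in for the editor's classification format map.
COLORS = {"keyword": "blue", "User Types(Enums)": "#2b91af"}

def render(spans, emit_class=False, emit_style=True):
    """Render (text, classification) pairs as HTML spans.

    classification is None for text the editor did not classify.
    """
    out = []
    for text, classification in spans:
        body = escape(text, quote=False)  # keep <, > and & in code from breaking the markup
        attrs = []
        if classification is not None:
            if emit_class:
                attrs.append(f'class="{escape(classification)}"')
            if emit_style and classification in COLORS:
                attrs.append(f'style="color:{COLORS[classification]};"')
        # Spans without any attribute are emitted as plain text.
        out.append(f"<span {' '.join(attrs)}>{body}</span>" if attrs else body)
    return "".join(out)

spans = [("DayOfWeek", "User Types(Enums)"), (".", "operator"), ("Friday", "identifier")]
print(render(spans, emit_class=True, emit_style=False))  # EmitSpanClass mode
print(render(spans))                                     # EmitSpanStyle mode
```

In style mode, tokens whose classification has no special color (the operator and the identifier here) come out as plain text, which keeps the markup small.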

Preserving full language fidelity from the Visual Studio editor

HTML Copy has an advantage over other code formatting tools that colorize code for the web using their own simple rules (often regex-based): it colorizes code exactly the way the Visual Studio editor displays it. This works just as well for other languages, such as HTML, VB, F# etc. As long as the VS editor has classified the text, HTML Copy grabs that information and writes it to the clipboard, preserving the exact classification types and classification format definitions.

The controversy around the <br> tag

You have the option to replace line breaks with the <br> tag. Why? I did some research on how different RSS readers present formatted code, and it turns out that the Outlook RSS Feed Reader has a bug: it ignores line break characters inside <pre> tags and lumps all the code onto one long, word-wrapped line. I’ve let the Office team know about the bug, but in the meantime I decided to provide an option to use the <br> tag instead of a line break in HTML. The downside is that your code looks like one long line in the HTML source, but the good news is that the RSS reader in Outlook then displays the text correctly. Other readers also display <br> tags inside <pre> tags correctly, so we should be good here, no?

No.

Internet Explorer doesn’t copy a <br> as a line break when you select the code on a page and copy it to the clipboard. Suppose you’re reading a blog and would like to copy the code from the web page. You select it, paste it into Notepad, and get one long line! That’s why, if you want your readers to be able to copy your code correctly from a web page in IE, don’t use the <br> tag; use a line break in the HTML instead. I’ve let the IE team know about the bug (preserve line breaks when copying <br> inside <pre> to the clipboard), but they didn’t fix it in time for IE9.
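The <br> option comes down to choosing the separator between lines inside the <pre> block. Here is a minimal Python sketch. The function name is hypothetical, and the sketch also normalizes Windows CRLF line endings before replacing them:

```python
import re

def apply_line_break_option(html_body, use_br):
    """Return the <pre> body with either raw line breaks or <br> tags between lines.

    use_br=True turns the snippet into one physical line of HTML (works around
    the Outlook RSS reader); use_br=False keeps real line breaks, so copying
    from Internet Explorer preserves them.
    """
    if not use_br:
        return html_body
    # Replace CRLF first (as one unit), then any bare CR or LF.
    return re.sub(r"\r\n|\r|\n", "<br>", html_body)
```

Given the trade-off described above, a default of use_br=False favors readers who copy code out of the browser.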

The controversy around the space and &nbsp;

A similar “surprise” was awaiting me when I installed Windows Live Writer 2011 to test HTML Copy with their latest version. Live Writer used to be the tool that I LOVED, because it was simple, did the right thing and was a pleasure to use.

Enter Windows Live Writer 2011.

When you paste HTML from the clipboard using Ctrl+Alt+V, DOWN, ENTER, Live Writer apparently thinks you pasted from Word, and a special algorithm kicks in that is supposed to “fix” the tangled Word formatting. In reality, however, it takes the clean, well-formed HTML that I put on the clipboard and mangles it beyond recognition. In particular, if you use spaces for indentation in your code, it inserts &nbsp; entities seemingly at random all over the place. I’ve found that if I replace all my spaces with &nbsp; myself, Live Writer at least shows the markup some mercy: the result is still mangled, but it displays more or less correctly in the browser, so I included this option. Naturally, I’ve done due diligence and informed the Live Writer team about this unfortunate regression. Let’s see if they fix it as promptly as the Internet Explorer team did (ha-ha, my bitter sarcasm!).
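The &nbsp; option has one subtlety: only spaces in text content may be replaced. A space inside a tag, such as the one in class="User Types(Enums)", must stay a space or the markup breaks. Here is a minimal Python sketch of the replacement. It assumes the text content has already been HTML-escaped, so every literal < starts a tag:

```python
import re

def spaces_to_nbsp(html):
    """Replace spaces in text content with &nbsp;, leaving tags untouched."""
    # With a capturing group, re.split keeps the tags in the result:
    # text runs land at even indices and tags at odd indices.
    parts = re.split(r"(<[^>]*>)", html)
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(" ", "&nbsp;")
    return "".join(parts)
```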

Bug fixes

This release has a few bug fixes that I’ve accumulated from customer reports (thanks, everyone, for your feedback!). The good news is that I now correctly preserve the background color and font information, so the tool should now work correctly for folks with a dark scheme (sorry for the white-on-white stupidity; I should have known better!).

Copy from Remote Desktop and other Windows Clipboard oddities

One major thing that Ameen from the editor team helped me fix is that Copy sometimes didn’t work inside Remote Desktop sessions. If you have HTML Copy installed, press Ctrl+C inside a Remote Desktop session, and the code doesn’t get copied to the clipboard, please just try copying it again a few times.

The story (Raymond Chen style): the Windows clipboard obviously needs to be available to all processes, which naturally run concurrently. I would have thought it had some sort of reader-writer lock, a monitor, a semaphore, anything. Well, no. If a process tries to open the clipboard while it is already open, the call simply fails with CLIPBRD_E_CANT_OPEN. A good description of this can be found here or here, for example.

In brief, when you copy something to the clipboard in Remote Desktop, Remote Desktop opens the clipboard again shortly afterwards to synchronize it back to the host machine. At about the same time, however, I come in and try to open the clipboard too (because I want to add the new HTML format to the TEXT and RTF formats already there). Obviously, we both fail, and what do we both do? We both go to sleep for 100 ms. Yes, that’s how the WinForms and WPF teams worked around the lack of proper synchronization in the Windows clipboard: they retry 10 times and sleep for 100 milliseconds in between. The problem is that everyone then wakes up and tries again at the same time. A better solution would be to at least sleep for a random number of milliseconds (modulo timer granularity), but alas, that is not the case here.
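Here is a sketch of the randomized retry suggested above. try_open is a hypothetical callable standing in for whatever actually opens the clipboard; it returns True on success. The 10 attempts and 100 ms base delay mirror the WinForms/WPF numbers. The extra 0 to 50 ms of random jitter is an assumption, chosen only to spread out competing processes so they stop waking up in lockstep. The sleep function is a parameter, so the logic can be tested without actually waiting.

```python
import random
import time

def open_with_retry(try_open, attempts=10, base_delay_ms=100, jitter_ms=50, sleep=time.sleep):
    """Call try_open() until it returns True, sleeping a randomized interval between tries.

    Returns True on success, or False once all attempts have failed.
    """
    for attempt in range(attempts):
        if try_open():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last failure
            # Random jitter keeps competing processes from retrying in lockstep.
            delay_ms = base_delay_ms + random.uniform(0, jitter_ms)
            sleep(delay_ms / 1000.0)
    return False
```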

Originally, HTML Copy listened to the Edit.Cut and Edit.Copy VS commands. After VS had put TEXT and RTF on the clipboard, HTML Copy added the HTML format in a second clipboard transaction. Now I completely suppress Visual Studio’s handling of the Cut and Copy commands and copy all the formats (TEXT, RTF and HTML) in one fell swoop. This should hopefully alleviate the problems with copying over Remote Desktop.
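For the HTML part of that single transaction, Windows expects the registered "HTML Format" clipboard format (CF_HTML). It consists of a small ASCII header with byte offsets, followed by the markup, with the copied snippet marked by <!--StartFragment--> and <!--EndFragment--> comments. The following Python sketch builds such a payload from a fragment. It follows the documented layout but is not taken from HTML Copy’s source, and it leaves actually placing the data on the clipboard (alongside TEXT and RTF) to the platform APIs.

```python
HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{0:010d}\r\n"
    "EndHTML:{1:010d}\r\n"
    "StartFragment:{2:010d}\r\n"
    "EndFragment:{3:010d}\r\n"
)

def build_cf_html(fragment):
    """Wrap an HTML fragment in the CF_HTML layout; all offsets are UTF-8 byte offsets."""
    prefix = "<html><body>\r\n<!--StartFragment-->"
    suffix = "<!--EndFragment-->\r\n</body></html>"
    # Every offset is zero-padded to 10 digits, so the header length is fixed.
    header_len = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("utf-8"))
    start_html = header_len
    start_fragment = start_html + len(prefix.encode("utf-8"))
    end_fragment = start_fragment + len(fragment.encode("utf-8"))
    end_html = end_fragment + len(suffix.encode("utf-8"))
    header = HEADER_TEMPLATE.format(start_html, end_html, start_fragment, end_fragment)
    return (header + prefix + fragment + suffix).encode("utf-8")
```

Offsets are measured in bytes rather than characters, so non-ASCII text in a comment or string literal still produces a correct header.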

The next most requested feature (not there yet)

The request I’ve been hearing most often is for a command that, instead of putting HTML on the clipboard as a separate format alongside text and RTF, puts the HTML tags on the clipboard directly as plain text. You could then paste the markup yourself into a simple text box, such as the one in WordPress or other web-based tools.

Obviously I can’t copy HTML markup to plain text clipboard during normal operation, because this would break copying code around in VS during programming (on second thought – hmm, maybe not such a bad idea after all??!)

So I guess I need to mentally prepare myself for adding an actual Visual Studio menu item to explicitly copy HTML markup to clipboard in plaintext format. Let’s hope I get around to it before the next update of the Productivity Power Tools.

FAQ

How do I use HTML Copy?

Provided you have installed Visual Studio 2010 and Productivity Power Tools, there are only two simple sides to it: Produce HTML and Consume HTML. You Produce HTML (that is, put it on the clipboard) by copying code in the editor as you normally would (most people prefer Ctrl+C). You Consume HTML by pasting it into a program that has a “Paste Special” feature, such as Word, Outlook, OneNote or Windows Live Writer, and selecting the HTML format instead of plain text.

How do I paste code in Windows Live Writer?

Ctrl+Alt+V (or Ctrl+Shift+V in older versions) pops up the Paste Special dialog. Then select Keep Formatting (DOWN ARROW) and press ENTER (if you don’t see Keep Formatting there, HTML Copy didn’t work). You can then switch to the Source view and examine the HTML. Caveat: Windows Live Writer 2011 mangles the HTML the moment you paste it, so please don’t blame me; I produce clean, well-formed HTML without redundant tags. I also don’t use <font> tags, so if you see a <font> tag, it’s not me!

How do I get the source for the HTML markup?

You need a tool that lets you examine the clipboard, such as Clipview by Peter Büttner:

[Screenshot: Clipview]

Don’t forget to push the View button in the top-left corner – the tool doesn’t refresh automatically! If you know a better way, please let me know!
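If you would rather inspect the markup programmatically, you can parse the raw "HTML Format" data using the StartFragment and EndFragment byte offsets in its header. Here is a minimal Python sketch. It assumes you have already obtained the raw clipboard bytes by some platform-specific means (not shown):

```python
import re

def extract_fragment(cf_html):
    """Return the copied snippet from raw "HTML Format" (CF_HTML) clipboard bytes."""
    # The header is plain "Key:Value" lines that end where the markup starts.
    header = cf_html.split(b"<", 1)[0].decode("ascii", errors="replace")
    start = re.search(r"StartFragment:(\d+)", header)
    end = re.search(r"EndFragment:(\d+)", header)
    if start is None or end is None:
        raise ValueError("not CF_HTML data: fragment offsets missing")
    # Offsets are byte positions, so slice the bytes before decoding.
    return cf_html[int(start.group(1)):int(end.group(1))].decode("utf-8")
```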

How does this compare to other tools?

There are plenty of existing tools out there that let you format code on your blog. Here are some of them:

SyntaxHighlighter by Alex Gorbatchev is very nice and seems to be popular. One minor issue is that it requires JavaScript, which not all readers support (code will still display decently, though, so it’s not a big deal). Another downside of tools like this is that they do their own colorization and don’t preserve the compiler’s exact classification, so they’re less precise. But again, this is very minor; it works reasonably well in the general case.

CopySourceAsHtml seems to be very nice (HTML Copy is essentially the same thing, but Colin was first!). He did a fantastic job with the APIs we exposed at the time. We both colorize code correctly (the way VS does), but neither of us offers SyntaxHighlighter’s extra functionality, such as copying or printing the code, which requires JavaScript. HTML Copy also doesn’t support line numbers.

Paste from VisualStudio by Douglas Stockwell is a Windows Live Writer plug-in that takes RTF from clipboard and converts it to HTML. An advantage is that it preserves correct coloring from the editor, downside is that it’s only for Live Writer.

Why did you decide to write yet another tool like this, with plenty of good tools already out there?

Architecture. The brand new editor in Visual Studio 2010 has such a nice, pluggable design that extensions can easily add any format to the clipboard. However, while text and RTF were implemented in the box, HTML wasn’t. I really wanted a tool like this in the box, so I looked around. Unfortunately, the implementation of CopySourceAsHtml (the best tool I found) didn’t take advantage of the new editor APIs, so I decided to write my own. Productivity Power Tools seemed like a good ship vehicle for giving my implementation proper testing, so that’s where I shipped it. Maybe one day, if the tool is good enough, we’ll put it in the box.

What are the known bugs and feature requests?

  1. Again, the biggest one is a way to copy the raw HTML markup as plain text.
  2. Then I have a request from Tomáš Skála to normalize indentation. It should be something along the lines (no pun intended!) of: while (lines.All(line => line.StartsWithSpace())) lines = lines.Select(line => line.TrimFirstSpace());
  3. Then I have a bug report from csrowell: Ctrl+L on two partially selected lines doesn’t fully delete both lines.
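The indentation request can be sketched by measuring the common leading whitespace once, instead of trimming one space at a time. Unlike the one-line pseudocode, this Python sketch ignores blank lines when measuring the indent (otherwise a single empty line would prevent any trimming). It also handles an empty selection: in C#, All returns true for an empty sequence, so the loop as written would never terminate. Tabs and spaces are compared literally, much like Python’s standard textwrap.dedent, which does a similar job for a single string.

```python
import os

def normalize_indentation(lines):
    """Strip the whitespace prefix shared by all non-blank lines.

    Blank (whitespace-only) lines are ignored when measuring the indent and
    come back as empty strings. Lines are assumed to have no trailing newline.
    """
    def leading_whitespace(line):
        return line[: len(line) - len(line.lstrip(" \t"))]

    non_blank = [line for line in lines if line.strip()]
    # os.path.commonprefix compares plain strings character by character
    # and returns "" for an empty list.
    common = os.path.commonprefix([leading_whitespace(line) for line in non_blank])
    return [line[len(common):] if line.strip() else "" for line in lines]
```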

As always, please let me know your questions and comments if any. Thanks!