International Mailto URIs in IE7


Introduction

New to IE7 is more reliable and standards-compliant support for international mailto URIs. This post will describe how users, application developers, and web developers can use this new feature of IE7.

The following is a simple example of a mailto URI which, when clicked, will launch your default email client to send a new message:

mailto:name@example.com

In IE6, mailto URIs containing characters not found in your system codepage may or may not appear in your mail client correctly, depending on the codepage of the document containing the mailto URI. But in IE7, mailto URIs with characters not found in your system codepage are handled in a standards-compliant manner, which will work regardless of the codepage of the document containing the mailto URI.

How to Use International Mailto URIs in IE7

For those of you who aren’t interested in how this feature works, I’ll describe how to use it first. It requires two setup steps. First, you must have a mail program that correctly handles these mailto URIs, such as Outlook 2007 or Mozilla Thunderbird. Second, you must ensure that the ‘Use UTF-8 for mailto links’ checkbox on the ‘Advanced’ tab of your Internet Options is checked. Note that you will have to restart IE after changing this checkbox for the change to take effect.

[Image: Mailto URIs Settings]

Once you have a mail program that correctly interprets international mailto URIs and you have checked the checkbox, you can reliably use mailto URIs that contain non US-ASCII characters.

Mailto URI Syntax Quick Overview

The mailto URI scheme allows you to specify various parameters associated with the creation of an email, including, among other things, the recipients, the subject, and the body. For example, the following mailto URI specifies an email that is sent to name@example.com, with the subject ‘mailto URIs’ and the body ‘I read your mailto blog post’:

mailto:name@example.com?subject=mailto%20URIs&body=I%20read%20your%20mailto%20blog%20post

This is just a simple example of the syntax. For more information on mailto URIs, see the mailto URI scheme standard. What that standard won’t cover is how to include characters outside of US-ASCII in your addresses or body content. The IRI standard covers including non US-ASCII characters in URIs in general and that was the basis for this feature in IE7. Additionally, there’s a draft status document of a new mailto URI scheme standard that will obsolete the current mailto URI scheme standard and it includes specific information about non US-ASCII characters in mailto URIs.
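As a rough illustration (this is not how IE builds anything internally), the example URI in this section can be assembled with Python's standard library. The helper name build_mailto is hypothetical; it percent-encodes each parameter value and leaves the address untouched.

```python
from urllib.parse import quote, urlencode


def build_mailto(address, **params):
    """Hypothetical helper: build a mailto URI with percent-encoded parameters."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return f"mailto:{address}?{query}" if query else f"mailto:{address}"


uri = build_mailto(
    "name@example.com",
    subject="mailto URIs",
    body="I read your mailto blog post",
)
print(uri)
```

The sketch assumes the address itself contains only US-ASCII characters, as in the example above.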

How It Works In IE7

When IE7 is configured as described above, if you click on a mailto URI containing non US-ASCII characters, those characters will be converted to UTF-8 and then percent-encoded and the newly encoded URI will be passed to the mail client.

For example, consider the following mailto URI that contains a ‘®’ character in the subject line:

mailto:name@example.com?subject=Microsoft®

If you click on this link, IE will start your mail client with the following mailto URI:

mailto:name@example.com?subject=Microsoft%C2%AE

This is because the character ‘®’ is represented in UTF-8 by the byte sequence {C2, AE} which after being percent-encoded is the text ‘%C2%AE’.
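The conversion just described (encode to UTF-8, then percent-encode each byte) can be reproduced in a few lines of Python. This is only a sketch of the transformation, not IE7's actual code, and the function name percent_encode_utf8 is made up for the example.

```python
from urllib.parse import quote


def percent_encode_utf8(text):
    # quote() encodes the text as UTF-8 by default, then percent-encodes
    # every byte that is not an unreserved US-ASCII character.
    return quote(text, safe="")


subject = "Microsoft\u00ae"  # 'Microsoft' followed by the registered sign
utf8_bytes = "\u00ae".encode("utf-8")
encoded_uri = "mailto:name@example.com?subject=" + percent_encode_utf8(subject)

print(utf8_bytes.hex(" "))
print(encoded_uri)
```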

No special consideration is made for IDN host names. If the mailto URI specifies an address that has the ASCII version of an IDN host, the address will be passed through without conversion to or from IDN. For example the following mailto URI will be passed unchanged to the mail client:

mailto:name@xn--lba.example.com

If an address contains the Unicode version of an IDN host, it will be converted to percent-encoded UTF-8 as in the first example and not to or from IDN. For example, IE7 will convert the following mailto URI from:

mailto:name@®.example.com

To the following when passing it to a mail client:

mailto:name@%C2%AE.example.com
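To make the distinction concrete, the sketch below contrasts the two possible representations of the host label ‘®’: the percent-encoded UTF-8 form that IE7 passes along, and the ASCII (Punycode) form that IE7 does not produce. It uses Python's built-in punycode codec and prepends the ‘xn--’ prefix by hand; the variable names are purely illustrative.

```python
from urllib.parse import quote

host_label = "\u00ae"  # the registered sign used as a host label

# What IE7 does: percent-encoded UTF-8, with no IDN conversion.
percent_encoded = quote(host_label, safe="")

# What IE7 does not do: convert the label to its ASCII (Punycode) form.
ascii_form = "xn--" + host_label.encode("punycode").decode("ascii")

print("mailto:name@" + percent_encoded + ".example.com")
print("mailto:name@" + ascii_form + ".example.com")
```

A mail client that wants the ASCII form would have to decode the percent-encoded UTF-8 and perform the IDN conversion itself.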

If you are an application developer and would like to handle mailto URIs from IE7 appropriately you can check whether IE7 will be sending you these new style international mailto URIs by checking the following registry key:

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Protocols\Mailto] "UTF8Encoding"=dword:00000001

This registry key changes with the checkbox described in the ‘How to Use International Mailto URIs in IE7’ section. When this registry key is set to 1, IE7 will send new style mailto URIs as described in this section. Otherwise, if the value is 0 or missing, IE will send old style mailto URIs as described in the next section.
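A mail client could check this setting along the following lines. This is a minimal sketch using Python's winreg module, and it assumes the key path is HKEY_CURRENT_USER followed by Software\Microsoft\Windows\CurrentVersion\Internet Settings\Protocols\Mailto with backslash separators. The function names are hypothetical, and on non-Windows systems the reader simply reports the value as missing.

```python
import sys

# Assumed key path under HKEY_CURRENT_USER (backslash-separated).
KEY_PATH = (
    r"Software\Microsoft\Windows\CurrentVersion"
    r"\Internet Settings\Protocols\Mailto"
)


def read_utf8_encoding_flag():
    """Return the UTF8Encoding value, or None if it is missing or unreadable."""
    if sys.platform != "win32":
        return None  # no registry to read on this platform
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, "UTF8Encoding")
            return value
    except OSError:
        return None  # key or value does not exist


def ie7_sends_utf8_mailto(flag):
    # Only a value of 1 means new style URIs; 0 or missing means old style.
    return flag == 1


print(ie7_sends_utf8_mailto(read_utf8_encoding_flag()))
```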

Legacy Mode

If you use an earlier version of IE or do not configure IE7 to use standards-compliant mailto URIs, the behavior involving non US-ASCII characters in mailto URIs is more complicated.

When you click on a mailto URI with non US-ASCII characters in it, the URI is passed to the mail client encoded in the codepage corresponding to the character encoding of the document in which the URI was found. If a document doesn’t explicitly specify its character encoding then one is picked based on the content in the document and the current system codepage. You can view and change the character encoding of the current document by going to the IE menu item ‘View’ then ‘Encoding’. For more information on character encodings check out the excellent document W3C I18N Tutorial: Character sets & encodings in XHTML, HTML and CSS.

For example, the following HTML snippet has a mailto URI with a non US-ASCII character that I’ve picked arbitrarily (Unicode character U+3113 named ‘Bopomofo Letter ZH’).

<a href="mailto:name@example.com?subject=&#x3113;">example</a>

Suppose the same HTML document specifies its character encoding to be the Big5 encoding in the following fashion:

<meta http-equiv="Content-Type" content="text/html; charset=big5"/>

The Unicode character U+3113 is represented in Big5 by the byte sequence {A3, A4}. When sent to the mail client the mailto URI will be converted to Big5 with the non US-ASCII character converted to the byte sequence {A3, A4}.

Suppose instead that the HTML document specified its character encoding to be the GB2312 encoding like so:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312"/>

The Unicode character U+3113 is represented in GB2312 by the byte sequence {A8, D3}. When sent to the mail client, the mailto URI will be converted to GB2312 with the non US-ASCII character converted to the byte sequence {A8, D3}.

This is a problem for the mail client handling the mailto URI, since it has no guarantee of how the non US-ASCII characters have been encoded. Versions of Outlook prior to 2007 assume that the system codepage was used to encode the URI. As a result, this scenario works with older versions of Outlook only if the document you’re viewing has the same character encoding as your current system codepage.
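The ambiguity is easy to reproduce outside the browser. The following sketch uses Python's built-in big5 and gb2312 codecs to show that the same character produces different bytes in each encoding, and that a receiver guessing the wrong codepage gets a different character back. It only illustrates the encoding mismatch and does not model IE or Outlook.

```python
char = "\u3113"  # Bopomofo Letter ZH

# The same character yields different bytes in each document encoding.
big5_bytes = char.encode("big5")
gb2312_bytes = char.encode("gb2312")
print(big5_bytes.hex(" "), gb2312_bytes.hex(" "))

# A receiver that assumes GB2312 for bytes that were written as Big5
# does not recover the original character.
misread = big5_bytes.decode("gb2312", errors="replace")
print(misread == char)
```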

Authoring International Mailto URIs

When you put an international mailto URI in a document, represent the non US-ASCII characters using the encoding of the document rather than percent-encoding the non US-ASCII characters in UTF-8. This will allow the browser to handle the mailto URI in the manner the user expects.

For example, if you wanted to include the previous example mailto URI in an HTML document, you should use HTML encoding to represent the non US-ASCII character as ‘&#x3113;’.

<a href="mailto:name@example.com?subject=&#x3113;">example</a>

This will work in both IE7 and Legacy Mode under the conditions described in the previous sections.

You shouldn’t use percent-encoded UTF-8 to represent the non US-ASCII character as ‘%E3%84%93’ as it will not work in Legacy Mode:

<a href="mailto:name@example.com?subject=%E3%84%93">example</a>

If you use percent-encoded UTF-8 for the non US-ASCII characters, the browser will not modify the mailto URI and it will be passed straight to the mail client. If that mail client is an older version of Outlook then the non US-ASCII characters won’t be interpreted correctly.

The point is that when the non US-ASCII character is represented directly in the document’s encoding, the browser is given a chance to convert it into something the mail client understands. So the end user is more likely to be able to use that URI even if they’re using a previous version of IE or Outlook.
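To see why the first form leaves the browser room to act, here is a small Python sketch of the two steps involved: resolving the HTML character reference to the actual character, and then producing the percent-encoded UTF-8 form that IE7 hands to the mail client when the UTF-8 option is enabled. The string handling is deliberately simplified and assumes a single subject parameter.

```python
import html
from urllib.parse import quote

# The href as it is written in the HTML source.
href = "mailto:name@example.com?subject=&#x3113;"

# Step 1: the HTML parser resolves the character reference.
decoded = html.unescape(href)

# Step 2: with the UTF-8 option on, the non US-ASCII character is
# converted to UTF-8 and percent-encoded before reaching the mail client.
prefix, subject = decoded.split("subject=", 1)
ie7_style = prefix + "subject=" + quote(subject, safe="")

print(ie7_style)
```

If the href already contains ‘%E3%84%93’, there is nothing left for step 2 to convert, which is why that form cannot be adapted for a Legacy Mode mail client.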

Conclusion

In this post, I’ve described how international mailto URIs are handled by IE6 and the improvements we’ve made for IE7 when a standards-compliant mail client is installed.

If you have any questions or comments on this topic, please leave me a note in the comments section.

Dave Risney
Software Design Engineer

Edit: link adjustment

Comments (14)

  1. Morten says:

    The use of mailto: should be banned. We can thank the spammers for that.

  2. Aedrin says:

    "The use of mailto: should be banned. We can thank the spammers for that."

    Yes. Let us also ban cars, because people drive drunk and kill others!

  3. Internet Explorer 7 cannot render YouTube RSS feeds:

    see – http://blogs.x2line.com/al/archive/2007/02/11/2839.aspx

  4. Will says:

    The article above should be retitled "YouTube does not generate valid RSS, which IE7 rightly refuses to render."

  5. You can also use the Quero IDN plug-in for IE to make IDN mailto links compatible with older mail clients such as Outlook Express. Quero automatically converts IDN email addresses to their ASCII representation. To write an email to somebody with an IDN address enter "mailto:name@®.example.com" in the Quero address box.

    http://www.quero.at/

  6. Hello Dave, Thanks for this useful post. It’s great to see that you’ve followed the IRI spec for this.  I have a couple of points.

    [1] I think the article seems to imply that you should use NCRs (eg. &#x3113;) to represent non-ASCII characters, whereas i think it is best to use the characters themselves whenever possible.  You say "For example, if you wanted to include the previous example mailto URI in an HTML document you should use HTML encoding to represent the non US-ASCII character as ‘#x3113’."  I would argue that you should say instead "For example, if you wanted to include the previous example mailto URI in an HTML document you should use Chinese characters to represent the non US-ASCII character, as in "?subject=ㄓ" (you can also use the escape ‘&#x3113’ if you have to)."  

    (On the other hand, (a) a bopomofo character is a bit of an odd choice, since bopomofo is rarely seen in Chinese text (just in the IME or ruby), and (b) with that character there may be a font issue for readers. I think you might have more success with an example such as the earlier used registration mark or an accented Latin character. For example, subject=Olá – where Olá is Portuguese for ‘hello’.  Just a suggestion.)

    [2] I think it would be worthwhile to mention that for legacy mode it is dangerous to insert characters in the mailto that are not available in the encoding of the page – even if the NCR approach allows you to do so (eg. if you have a euro character in the mailto URI, but the encoding of the page is ISO 8859-1).  Actually, i’m also curious to know what IE does in that situation.

    [3] If mailto:name@xn--lba.example.com is passed through to the mail client, is it expected that the mail client will resolve the punycode?

  7. Dave Risney [MSFT] says:

    @Richard Ishida

    Thanks for your comments.  Responses to your points:

    (3) If a hostname is passed through to the mail client, whether it’s punycode encoded or not, it’s up to the mail client to display and resolve the hostname in the manner that the mail client chooses.

    (2) What exactly do you mean by ‘dangerous’ here?  In legacy mode if a character in the mailto URI has no representation in the destination codepage it will be represented with a question mark.

    (1) I used NCRs in my examples in order to avoid font issues, and so I could easily convey the meaning of the text without explaining the char. encoding of the HTML snippet and its byte representation.  I didn’t mean to suggest that people should necessarily use NCRs.  They should only use NCRs if their document’s encoding can’t directly represent that character.

  8. Thanks for the clarifications.

    Re (2), the danger being that some data is lost, ie. the euro sign becomes a question mark.  I find that people aren’t often clear about that.  They may think that somehow the escape means that the character will somehow magically appear in the mail client, regardless of the encoding of their document. (One way to avoid this issue, of course, is to use UTF-8 everywhere.)

  9. Dave Risney [MSFT] says:

    @Richard Ishida

    (2) Just to be clear, what you wrote is true for legacy mode only.  In non-legacy mode IE7 a mailto URI may contain characters that are not directly representable in the encoding of the document containing the mailto URI and these characters will make it to the mail client.

    Thanks again for your comments.

  10. tihomir says:

    In legacy mode if a character in the mailto URI has no representation in the destination codepage it will be represented with a question mark.

    http://vistahelp.blogspot.com

  11. The mail I got the other day from Wes Miller (yes, that Wes Miller!) forwarding someone else’s question

  13. The mailto tag used in html content is used to load the default mapi client (email client) on the user’s
