International Domain Names in IE7


Hi, I am Vishu Gupta, a developer on the IE team. For the past year, I have been working primarily on CURI and International Domain Names (IDN) support. Browser support for navigating to URLs written in users’ native languages is critical for making the Internet truly international. IDN relies upon a standardized mechanism known as “Punycode” for encoding Unicode domain names using only the ASCII characters that are permitted by the DNS system.

After Windows XP SP2 was released, I was asked to study and evaluate what it would take to implement IDN support in Internet Explorer. We determined that the work items involved in implementing IDN support in IE were:

  1. Converting the Unicode domain names to Punycode before sending them over the wire.
  2. Maintaining consistency within IE for handling domain names which enter IE in Punycode, and treating them as equivalent to their Unicode counterparts.
  3. Handling compatibility for existing scenarios.
  4. Providing security against homograph-spoofing attacks without giving a bad user-experience for IDN URLs.

Conversion to Punycode

This is accomplished by using the APIs provided by the recently released “Microsoft Internationalized Domain Names (IDN) Mitigation APIs 1.0”; these APIs will ship with Windows Vista and IE7 and are also available for download here. You can learn more about these APIs by reading the MSDN documentation.
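As a rough illustration of what the conversion produces (this is not IE's code and does not use the Windows IDN APIs), here is a minimal sketch using Python's built-in `idna` codec, which implements RFC 3490 with Nameprep. The helper name `to_ace` is an assumption made for this example.

```python
def to_ace(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form, label by label."""
    # Python's "idna" codec applies Nameprep and adds the "xn--" prefix
    # to every label that contains non-ASCII characters.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_ace("ŧēśŧ.example.com"))  # xn--hea8l8ac.example.com
    print(to_ace("www.example.com"))   # www.example.com (ASCII is unchanged)
```

Only the labels that need it are encoded; the ASCII labels "example" and "com" pass through untouched.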

Maintaining consistency within IE

Many websites work around the limitation that IE6 does not support IDN by linking to the Punycoded URL. To improve user experience with those websites and to ensure that IE behaves consistently for equivalent Punycode and Unicode domain names, IE7 handles the URL as Nameprep Unicode internally (as suggested by RFC 3490). IE converts Unicode domain names to Punycode just before the domain name is resolved or sent to the proxy. This ensures, for example, that if the user added ŧēśŧ.example.com to the Restricted Sites zone, http://xn--hea8l8ac.example.com is also treated as a restricted site.
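The idea of treating both spellings as one site can be sketched as follows. This is a simplified Python illustration, not IE's implementation: `canonical_host`, `RESTRICTED_SITES` and `is_restricted` are hypothetical names, and Python's built-in `idna` codec stands in for the Windows APIs.

```python
def canonical_host(hostname: str) -> str:
    """Return the Unicode (Nameprep) form of a hostname given in either form."""
    # Encoding to ASCII first normalizes Unicode input; decoding then turns
    # any "xn--" label back into Unicode, so both spellings meet in one form.
    return hostname.lower().encode("idna").decode("idna")


# Hypothetical stand-in for a security zone list, stored in canonical form.
RESTRICTED_SITES = {canonical_host("ŧēśŧ.example.com")}


def is_restricted(hostname: str) -> bool:
    """Check zone membership on the canonical form, not the raw string."""
    return canonical_host(hostname) in RESTRICTED_SITES


if __name__ == "__main__":
    print(is_restricted("xn--hea8l8ac.example.com"))  # True
    print(is_restricted("www.example.com"))           # False
```

Comparing canonical forms rather than raw strings is what lets a rule entered in Unicode also apply to a link that arrives in Punycode.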

Maintaining compatibility

Using Punycode for name resolution is the default behavior for IE7. A new “International” section in the Internet Control Panel permits disabling IDN when sending the domain name either to the proxy or to the DNS resolver. Disabling both options will revert IE7 to IE6 behavior when handling Unicode domain names.

Blocking IDN spoofing

Lookalike attacks (sometimes called “homograph” attacks) are possible within the ASCII character set (the usual examples are www.example.com vs. www.examp1e.com). But, with IDN, the character repertoire expands from a few dozen characters to many thousands of characters from all of the world’s languages, thereby increasing the attack surface for spoofing attacks immensely.
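To make the lookalike problem concrete, here is a small Python sketch that lists the Unicode character names behind two labels that can render identically in many fonts. The `describe` helper and the sample labels are assumptions chosen for illustration.

```python
import unicodedata


def describe(label):
    """Pair each character of a label with its Unicode character name."""
    return [(ch, unicodedata.name(ch, "UNNAMED")) for ch in label]


genuine = "paypal"
spoof = "p\u0430ypal"  # second character is CYRILLIC SMALL LETTER A

if __name__ == "__main__":
    print(genuine == spoof)  # False, although the two may look the same
    for ch, name in describe(spoof):
        print(f"U+{ord(ch):04X} {name}")
```

The strings differ in a single code point, which is invisible to the eye but obvious once the character names are printed.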

There is little doubt that showing the Punycode form leaves no ground for spoofing using the full range of Unicode characters; however, showing Punycode isn’t very user-friendly. The design of our anti-spoofing mitigation for IDN aims to:

  1. Reduce attack surface
  2. Treat Unicode domain names fairly
  3. Offer a good user-experience for users worldwide
  4. Offer simple, logical options to enable the user to fine-tune the IDN-experience

Given these considerations, IE7 imposes restrictions on the scripts allowed to be displayed inside the address bar. These restrictions are based on the user’s configured browser language settings. Using APIs from idndl.dll (part of the IDN Mitigation APIs mentioned above), IE will detect what scripts (character sets) are used by the current domain name. If the domain name contains characters outside of the user’s chosen languages, it is displayed in Punycode form to help prevent spoofing.

A domain name is displayed in Punycode if any of the following are true:

  • The domain name contains characters which are not a part of any language (e.g. www.▯.com)
  • Any one of its labels* contains a mix of scripts that do not appear together within a single language. For instance, Greek characters cannot mix with Cyrillic within a single label.
  • Any of its labels* contains characters that appear only in languages other than the user’s list of chosen languages. Note that ASCII-only labels are always permitted for compatibility with existing sites.

(* A label is a segment of a domain name, delimited by dots. www.microsoft.com contains three labels, “www”, “microsoft” and “com”.)

If none of the above conditions apply, the domain name is displayed in Unicode. Note that different languages are allowed to appear in different labels, so long as all of the languages are in the list chosen by the user. This is to support domain names like name.example.com where “example” and “name” are composed of different languages.
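The rules above can be sketched in Python as follows. This is only an approximation under stated assumptions: the `LANGUAGE_SCRIPTS` table is invented for illustration (the real language-to-script data lives in Windows and IE), and a character's script is guessed from the first word of its Unicode name because the standard library exposes no script property. All function names are hypothetical.

```python
import unicodedata

# Hypothetical language -> scripts table, for illustration only.
LANGUAGE_SCRIPTS = {
    "en": {"LATIN"},
    "el": {"GREEK"},
    "ru": {"CYRILLIC"},
    "ja": {"HIRAGANA", "KATAKANA", "CJK"},
}
KNOWN_SCRIPTS = set().union(*LANGUAGE_SCRIPTS.values())


def script_of(ch):
    """Approximate a character's script from the first word of its Unicode name."""
    if ch in "0123456789-":
        return None  # digits and hyphen are shared by all scripts
    word = unicodedata.name(ch, "").split(" ")[0]
    return word if word in KNOWN_SCRIPTS else "UNKNOWN"


def label_displayable(label, allowed_languages):
    """Decide whether a single label may be shown in Unicode."""
    if label.isascii():
        return True  # ASCII-only labels are always permitted
    scripts = {script_of(ch) for ch in label} - {None}
    if "UNKNOWN" in scripts:
        return False  # a character that is not part of any known language
    # Every script in the label must belong to one allowed language. A mix
    # that no single language uses can never satisfy this test.
    return any(scripts <= LANGUAGE_SCRIPTS[lang] for lang in allowed_languages)


def display_form(hostname, allowed_languages):
    """Return the hostname as the address bar would show it in this sketch."""
    labels = hostname.split(".")
    if all(label_displayable(label, allowed_languages) for label in labels):
        return hostname
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    greek = "ελλάδα.example.com"
    print(display_form(greek, ["en"]))        # shown in Punycode
    print(display_form(greek, ["en", "el"]))  # shown in Unicode
```

Because each label is checked on its own, a name whose labels use different allowed languages still displays in Unicode, while a single label mixing Latin and Cyrillic falls back to Punycode no matter which languages are allowed.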

We do not describe “other language” URLs as “suspicious” because such URLs are completely harmless when displayed in Punycode form. Whenever IE7 has prevented an IDN domain name from displaying in Unicode, an Information Bar notifies the user that the domain name contains characters IE is not configured to display. It is easy to add additional languages to the Allow List using the IDN Information Bar. By default, the user’s list of languages will usually only contain the currently-configured Windows language.

Attack Surface Reduction

Our language-aware mitigation does two things:

  1. It disallows non-standard combinations of scripts from being displayed inside a label. This takes care of attacks like http://bạnk.example.com. That domain name will always be displayed as http://xn--bnk-sgz.example.com, because two scripts (Cyrillic and Latin) are mixed inside a label. This reduces the attack-surface to “single-language attacks”.
  2. It further reduces the attack surface for single-language attacks to only those users who have chosen to permit the target language.

Defense-in-Depth

Users who allow Greek in their language settings are as susceptible to Greek-only spoofs as the population using English is susceptible to pure-ASCII spoofs. That’s where IE7’s Phishing Filter kicks in for both Unicode and ASCII URLs. If the user has opted into the Phishing Filter, a real-time check is performed during navigation to see if the target domain name is a reported phishing site. If so, navigation is blocked. For additional defense-in-depth, the Phishing Filter’s web service can apply additional heuristics to determine if the domain name is visually ambiguous. If so, the Phishing Filter will warn the user via the indicator in the IE address bar.

Whenever the user views a site addressed by an International Domain Name, an indicator will appear in the IE address bar to notify the user that IDN is in use. The user can click on the IDN indicator to view more information about the current domain name.

Users who do not wish to see Unicode addresses may set an Internet Control Panel option to “Always show encoded addresses”.

Call to Action

Internet Explorer 7 Beta 2 will include IDN support in nearly-final form and we would greatly appreciate feedback on the design. If you see a scenario not working properly (for example, if adding native language URLs to favorites was broken), please let us know.

 – Vishu Gupta

Update: Changed one example to not point to an actual domain name!

Comments (49)

  1. Anonymous says:

    I would not have thought that http://www.☺.com really exists, but it does! Was that intended when writing this post?

  2. Anonymous says:

    Yes, http://www.☺.com and http://www.☺.net really exist; both are also domain names of my private homepage http://www.frueh.net.

    I would appreciate it if IE7 fully supported these domains. Opera and Firefox support these domains, but also display them in Punycode.

    P.S. You actually removed your example (e.g. http://www.☺.com vs. http://www.♧.com). Was it not good enough?

  3. ieblog says:

    Dani,

    We get a lot of traffic and are often profiled on Slashdot. We wouldn’t want to strain someone’s web server or do a DOS on your site (or anyone else’s) so I changed the example.

    – Al Billings [MSFT]

  4. Anonymous says:

    In my opinion, you can leave this example; I permit it. It is much funnier and more interesting than a nonexistent domain.

    The requests and traffic are absolutely harmless. IE6 does not support these domains either way.

  5. Anonymous says:

    > Dani,

    > We get a lot of traffic and are often profiled on Slashdot. We wouldn’t want to strain someone’s web server or do a DOS on your site (or anyone else’s) so I changed the example.

    > – Al Billings [MSFT]

    How long can a Microsoft webpage withstand a thorough Slashdotting?

  6. Anonymous says:

    try €u.com

  7. Anonymous says:

    I’m no expert on Japanese, but won’t this fail for some reasonable Japanese label strings? A Japanese text can have characters in several different scripts (including Latin)–I’d be surprised if no one in Japan tried for a domain name that used both Japanese and Latin characters in the same label.

  8. Anonymous says:

    Intrope, this will allow native display of a URL if the set of scripts used in the label is a subset of the scripts used for the Japanese language. One of these scripts may or may not be Latin. Of course, Japanese needs to be in the user’s list of allowed languages.

  9. Anonymous says:

    A suggestion, when a domain is reached that contains a script in a language that’s not in the language settings, you should show that yellow strip to be able to add the language.

  10. Anonymous says:

    Vishu: sounds like y’all are way ahead of me, then. Good! I’m looking forward to IE7…

  11. Rosyna says:

    How does IE7 handle things like http://www.sailor月.com (which is real)?

  12. Anonymous says:

    Perhaps off topic a bit, but will the phishing filter be IDN aware? Meaning, if the URL I enter is http://bạnk.example.com will it result in the same categorization as if I had gone to http://xn--bnk-sgz.example.com (assuming that site is a phishing site)? E.g., does the phishing filter treat the two URLs as equivalent?

  13. Dean Harding says:

    @AC

    That is mentioned in the post:

    > Whenever IE7 has prevented an IDN domain name from displaying in

    > Unicode, an Information Bar notifies the user that the domain name

    > contains characters IE is not configured to display. It is easy

    > to add additional languages to the Allow List using the IDN Information Bar.

  14. Anonymous says:

    Have Microsoft patented any of these anti-spoofing algorithms? Or are they free to implement in other browsers?

  15. Anonymous says:

    When will IE 7 beta 2 be available for download in MSDN?

  16. Anonymous says:

    Rosyna: your example URL contains two scripts. If there is a language that contains characters from both scripts and is present in the user’s language settings, then the URL will be displayed as is; otherwise http://www.xn--sailor-183m.com/ will be displayed and the Information Bar will be shown.

  17. Anonymous says:

    I wonder how IE7 or future release will resolve the domain name followed by .com in native language, when entered in the address bar.

    Example: if someone directly wants to navigate to खोज.com, they may enter खोज.कॉम

  18. Anonymous says:

    Ravi– To date, I do not believe that there are any ICANN approved native-language TLDs, although they are expected in the future.

    IE7 will perform Punycodization on all labels in the hostname, including the TLD. Hence, the address you’ve specified is treated as: xn--21bm4l.xn--11b4c3d

  19. Anonymous says:

    I’ll test that function in IE, but only test, no more :), because it is risky to browse the web with such a browser. And why does Microsoft always "create" "features" last? In Firefox this feature has worked for a long time.

  20. Anonymous says:

    Well, so much for http://www.אנקF.co.il and http://www.אנקP.co.il. Since the Hebrew letter פ can represent both F and P sound, some publications use F and P to differentiate between Funk and Punk. I guess they won’t be able to do that with IDNs, which is just as well (I’ve only seen it used by העיר, in the "trendier-than-thou" section).

    -Jonathan

  21. Rosyna says:

    So IE7 displays http://www.sailor月.com incorrectly then? There is no logical reason for it to ever be displayed as Punycode, as 月 is not what I’d call a spoofable character.

  22. Anonymous says:

    Native-language TLDs are provided by minc.org / i-dns.net using their proprietary IE plugin, since it’s not an ICANN-approved standard. I have been waiting for this feature in IE. Thanks.

  23. Anonymous says:

    To echo the comment of ‘sean’ on 20/12, can you please let us know when IE7 beta 2 will be available for download on MSDN…

    Thank you

  25. Anonymous says:

    Richard and Sean,

    Take a look at the post from Dean at http://blogs.msdn.com/ie/archive/2005/12/06/500599.aspx

    We’ll obviously let everyone know when that time comes.

    Thanks

    -Dave

  26. Anonymous says:

    With all the fixes going into IE7 now, is it feasible that there will come a day when web developers won’t need to do something special to get IE to work with CSS and XHTML correctly (things like the [IF IE] conditional statements are a good example of what I am getting at)?

    Also, with regard to all the confusion surrounding the fix for the * html hack (Holly Hack), does this planned * html fix mean that IE7 will ignore the * universal selector altogether, or only when it is paired with html tag?

  27. Anonymous says:

    The problem with IDN is that it doesn’t work automatically with existing websites using non-ASCII chars. People have added the UTF-8 version of the hostname into their nameservers because that’s what IE is using today, and it works fine. So please tell me once again why we need IDN in IE?

  28. Anonymous says:

    Jorgen– You are correct in noting that IE has long supported international domain names using UTF-8.

    However, despite this preexisting feature, the Internet community has standardized around Punycode/IDN as the mechanism of choice for dealing with international domain names. While ~some~ systems have been updated to handle UTF-8, not all systems and devices can correctly operate on UTF-8 end-to-end.

    There are some minor advantages to Punycode over UTF-8 (primarily related to the fact that Punycode complies with longtime standards for DNS and hence by definition must work end-to-end with existing systems).

  29. Anonymous says:

    Hi, I am Viktor Krammer, a researcher in the field of web browser technologies. I have worked for almost one year on understanding and implementing the IDN standards in a free plug-in for IE6 (www.quero.at).

    The main advantages of the RFC-defined IDN standard over UTF-8 URL encoding are:

    1 Compatibility with existing DNS

    IDNA is a client-side extension which works with existing DNS technology.

    2 Security

    Yes, the IDN standard is all about protecting the integrity of DNS and avoiding encoding ambiguities.

    3 Compression

    UTF8 is very inefficient for encoding non-Latin domain names. Punycode uses a compression algorithm to store domain names.

    I wish the IE team merry xmas and a happy new year!

    Viktor

  30. Anonymous says:

    I have a comment about process of Attack Surface Reduction

    It disallows non-standard combinations of scripts from being displayed inside a label. This takes care of attacks like http://bạnk.example.com. That domain name will always be displayed as http://xn--bnk-sgz.example.com, because two scripts (Cyrillic and Latin) are mixed inside a label. This reduces the attack-surface to “single-language attacks”.

    This is a list of characters which are used in our VietNamese domain name.

    http://www.vnnic.net.vn/tenmientv/bangma.htm

    In order to write a Vietnamese word, one label will contain Latin characters (A-Z, a-z) and Vietnamese characters like ạ (which has code 1EA1 in the Unicode table).

    For example

    The word "bạnk" has the character "ạ" (which has code 1EA1 in the Unicode table), and this word is a Vietnamese word.

    So if IE7 process like this, I afraid that Vietnamese domain name can not be used with IE 7

    Thank you very much

    Happy new year

    Best wishes for IE Team

    Viet Anh

  31. Anonymous says:

    <<I afraid that Vietnamese domain name can not be used with IE 7>>

    Viet Anh– good question, but nothing to worry about.

    These URLs will work for Vietnamese users, because these two scripts appear together within a single language (Vietnamese), so this is permitted.

    As noted, we only block when a label "contains a mix of scripts that do not appear together within a single language."

  32. Anonymous says:

    Thank you for your answer

    You said :

    > As noted, we only block when a label "contains a mix of scripts that do not appear together within a single language."

    Which standard do you use to specify which set of characters belongs to one language? For example, how do you specify the set of Vietnamese characters?

    I wonder if I can send you a table of Vietnamese characters by e-mail. Could you please compare our table with your table and see if there are differences between them?

    Thank you very much

    VietAnh

    My e-mail : vietanh@vnnic.net.vn

  33. Anonymous says:

    While I must say I am not too fond of the idea of IDN myself, for reasons of compatibility and since I believe that it has many opportunities to be misused where it will disturb people, I nonetheless believe that given the issues considered and IDNs used after all, there can be many legitimate uses for mixing scripts (of different languages) in a label of a domain name:

    E.g. names which are indeed made of more than one language, or invented labels – just think how many "invented" names in English contain Greek or Hebrew characters, many of which are currently spelled with transliteration or by just naming the characters, like "Omega-force" or "Aleph-null".

    Therefore I believe that showing just the Punycode for IDNs which mix scripts of different languages in a single label will not be a good idea.

    In order to mitigate/avoid spoofing, how about "color-coding" the different scripts in the URL? That is, using colors to make the usage of different scripts clearly visible to the user?

    Internet Explorer 7 could, in the URL entry box, for example (that’s my obvious notation, there may be better ones) underline the URL, and if there are different scripts used in the URL, simply display the underline under characters in those scripts with a different color.

    For example, a label spelled "<Gamma>toys4boys<Aleph>but<Eta><Sigma><Lambda>" could be shown with an underline, which is BLACK under the parts "toys", "boys", and "but", GREEN under the <Gamma> and <Eta><Sigma><Lambda>, RED under the "4" (if you consider digits a separate script – that is to be decided), and BLUE under the <Aleph>. In this example there are 4 different scripts and therefore IE7 chose some 4 colors for it.

    Another example, showing the effect on spoofing: Consider the text "www.p<cyrillic a>yp<cyrillic a>l.com". "www.p", "yp" and "l.com" will have a BLACK underline, while the two cyrillic a’s will be underlined in, say, RED. The user will immediately see that this is not just simply "www.paypal.com".

  34. Anonymous says:

    Tom Alsberg wrote about colorcoding URL..

    At first I thought "Yeah, that would be nice", but then the thought of colorblind and other impaired people hit me. Colors would be nice, but how would you "display" this on, for example, a braille reader?

    Happy New Year

    -Anders

  35. Anonymous says:

    >>Colors would be nice, but how would you "display" this on for example a braille reader?<<

    You don’t really have to, at least not on a braille reader. The problem with IDNs is that they might look alike or very similar to a seeing person. E.g. µ = u or paypal.com and paypal.com (with Cyrillic characters). But a braille reader will not be fooled by that.

    But yes, for colorblind people this might be a problem. At least if there is no additional information but only color coding.

  36. Anonymous says:

    Yes, and maybe they could make bells and sirens go off also. A large flashing red light and a voice that announces "DANGER DANGER DANGER"

    But then we wouldn’t want to be excluding people that can’t see and hear so maybe they could release a noxious smell that permeates through the screen.

    But then what about the people that can’t smell maybe they could send a series of electric shocks through their fingertips on the keyboard in morse code. "Dot dot dash."

    Then what about the people that don’t have any senses. Maybe then they could just release a deadly nerve agent through the bass port on their speaker. At least then they will make sure that no one ever goes to a bad site again.

    Or you could just switch to another browser. With all the press releases about them and not one answer from Microsoft they obviously wouldn’t mind.

    Just more smoke and mirrors folks with technology that’s already out there and other companies have first.

  37. Anonymous says:

    Dear MS IE Team,

    why will there be a notification _whenever_ an IDN character is included in a domain name?

    A "normal" IDN should be treated like any other domain name. With your notification solution you will automatically worsen the reputation of IDNs. There always is an effect if you differ two things. The user will ask himself, why…

    Jean Pascal

  38. Anonymous says:

    six people were sitting on a bench discussing IDN’s,

    – the guy from the competing place said: ‘we have already solved it, look’

    – the manager said: ‘we can’t use their solution, we’d have to pay for licensing, or at least it’s their solution, let’s come up with something of our own (but it’s not going to be an easy solution because all the easy/obvious/non-complicated ways have already been patented by the lightfooted USPTO), just hope someone follows our standard, let’s organize another conference where we can push our solution’

    – the programmer said: ‘uh, I don’t care, just give me something complicated to program (I like to program)’

    – the GUI specialist said: ‘wait, I know, let’s use colors, we could warn people if there’s danger, we could use… RED, I wonder if red means the same in all countries (whatupp trafficlights)’

    – the security specialist, forced to speak/act by superiors said: ‘let’s make it secure, hmm.. We could warn people if they use IDNs?’

    – the commenter/onlooker said: ‘why not pick the best solution?’

  39. Anonymous says:

    >why will there be a notification _whenever_ an IDN character is included in a domain name?

    Excellent question. See my above post. Or look at the definition to "xenophobia" http://www.answers.com/xenophobia&r=67

  40. IEBlog says:

    Domain names are not limited to ASCII any longer, and as the web is growing more and more domain names…

  41. IEBlog says:

    In addition to the more prominent work we’ve done to enable international scenarios (like adding support