What a drag: Dragging a Uniform Resource Locator (URL)


Last time, we dragged some text around and found that the text would be interpreted as a URL if you dropped it onto Firefox, but Internet Explorer was not as willing to accept it. Today, we'll make the data object work for Internet Explorer.

The only change is that we have to provide the URL in the form of a CFSTR_SHELLURL clipboard format rather than as CF_TEXT. Take the program from last time and make two changes. First, use the handy-dandy search-and-replace function to change DATA_TEXT to DATA_URL throughout. (This step isn't technically necessary, but it's nice to have the name match its usage.) The real work happens in this change to the constructor:

CTinyDataObject::CTinyDataObject() : m_cRef(1)
{
  SetFORMATETC(&m_rgfe[DATA_URL],
               RegisterClipboardFormat(CFSTR_SHELLURL));
}

That's all; just change the clipboard format from CF_TEXT to CFSTR_SHELLURL. It is important to note that CFSTR_SHELLURL represents an ANSI string. Since "URLs are written only with the graphic printable characters of the US-ASCII coded character set," there is no loss of expressiveness by restricting to ANSI.
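To make the ANSI point concrete, here is a minimal sketch of a check you could run on a string before offering it as CFSTR_SHELLURL. The helper name IsAsciiUrl is hypothetical and is not part of the program from last time; it assumes that "graphic printable characters of US-ASCII" means the bytes 0x21 through 0x7E.

```cpp
#include <string>

// Hypothetical helper (not part of the original sample program).
// Returns true if the string is non-empty and every byte is a graphic
// printable US-ASCII character (0x21 through 0x7E). A string that passes
// can be stored unchanged as the ANSI payload of a CFSTR_SHELLURL object.
bool IsAsciiUrl(const std::string& url)
{
    if (url.empty()) {
        return false;
    }
    for (unsigned char ch : url) {
        // Reject control characters, space, DEL, and anything above 0x7E
        // (for example, the bytes of a UTF-8 encoded character).
        if (ch < 0x21 || ch > 0x7E) {
            return false;
        }
    }
    return true;
}
```

A plain URL such as "http://www.microsoft.com/" (used here only as an illustration) passes the check, whereas a string containing a space or a non-ASCII character does not and would have to be encoded some other way before being handed to the data object.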

Run this new program and now you can click in the client area and drag/drop the (invisible) object onto Internet Explorer, where it will navigate to Microsoft's home page. (If your system supports Active Desktop, you can also drag/drop the invisible object to the desktop and create an Active Desktop component.)

Okay, so we have one version of the program that can drag a URL to Internet Explorer, and another version that can drag a URL to Firefox. Next time, we'll combine them to have a single data object that can drop to both. It's quite embarrassingly simple (because I planned it that way).

Comments (27)
  1. Rosyna says:

    Would this work for URLs like http://sailor月.com/imgs/blingedout.jpg or do those have to be handled specially for drag and drop?

  2. Bob King says:

    It’s funny that the url that’s really there is:

    http://xn--sailer-183m.com/imgs/blingedout.jpg

    At least according to firefox’s rollover.

  3. mvadu says:

    @Rosyna

    http://sailor月.com/imgs/blingedout.jpg

    I am not able to open this URL through IE.

    I could see a rectangular box in place of 月, and the title bar says "Invalid syntax error".

    That means IE does not accept UNICODE, and as per Raymond (or the link he mentioned) that is correct, so your URL is really invalid.

  4. acd says:

    As far as I see, the url from Rosyna has the unicode character 6708 from "CJK Unified Ideographs" before the dot.

  5. BryanK says:

    You have to use IDN to encode a hostname containing characters outside the 7-bit-ASCII range, when you look up that name in DNS.  So no, you can’t drop those characters as a CFSTR_SHELLURL onto IE, either: you have to drop the IDN-encoded version instead.

    (The IDN-encoded version is what FF shows in its status bar: xn--sailer-183m.com is the host name.)

    mvadu: I bet if you typed in the IDN-encoded host name, it would work in IE as well.  Not that users should be required to do that, of course, but allowing IDN encodings causes other grief too, so it’s not all bad that IE doesn’t work with it.  (How do you tell paypal.com apart from xn--pypal-4ve.com when your browser renders the IDN version using its native Cyrillic, but your font shows the same pixels for the ASCII ‘a’ and the Cyrillic ‘a’?  (The characters are homographs.)  That was the basis for one of the patches to Firefox a few years ago.)

  6. Mo says:

    Oh right, so the shell on a fully Unicode-enabled OS doesn’t have the foresight to support IRIs (like, for example, when IE happens to have full support for them)?

    That’s… er, great :

  7. Mike Dimmick says:

    mvadu and Mo are still running IE6. IDIs were added in IE7.

    (Run, don’t walk, to Windows Update.)

  8. mvadu says:

    @Mike Dimmick

    mvadu and Mo are still running IE6. IDIs were added in IE7.

    (Run, don’t walk, to Windows Update.)

    Yeah.. I am in a corporate network, so had to use IE6. But I don’t think Windows Update will install IE7. IE6 is still supported by MS and IE7 is just an optional component.

  9. Andrew Cook says:

    The linked RFC is obsolete.

    The updated RFC (ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt) maintains the restriction to US-ASCII, although not so explicitly defined, in sections 2 and 2.5.

  10. Rick C says:

    Unless it’s being blocked at the corporate level, Windows Update will cheerfully nag you constantly about upgrading to IE7.

  11. poochner says:

    Yes, and IE 7 will nag you to upgrade to IE 7.  IE 6 on Win2K will nag you about upgrading to IE 7.  If only there were some way the IE 7 upgrade site could detect which browser I was already using…

  12. mvadu says:

    http://www.microsoft.com/windows/downloads/ie/getitnow.mspx

    is where you get IE7, but Windows Update will not say anything about IE6 as long as it’s not obsolete.

  13. mvadu says:

    @Andrew Cook

    Even the RFC you mentioned does not say that you can use the UNICODE charset in URL, but they do say you need to encode them with %hexcode format if it can’t be represented in ASCII.

  14. mvadu says:

    I think the link mentioned by Ray is not working.

    If you really need the actual text from the URL that Ray mentioned try http://www.google.com/search?q=cache:aESSsrENPO8J:www.rfc.net/rfc1738.html+http://rfc.net/rfc1738.html%23s2.2.&hl=en&ct=clnk&cd=1&gl=us

  15. Leo Davidson says:

    I looked into Unicode URLs a couple of weeks ago and there seem to be at least three ways of encoding them, all incompatible with each other (and some impossible to tell apart from each other without some additional knowledge). It’s a mess.

    Even more so since the URL may be understood by the browser but then handed off to an ActiveX control (or whatever) which assumes a different Unicode URL format, or doesn’t support non-ASCII URLs at all.

  16. Rosyna says:

    The homographs issue was solved on the domain registrar’s side and on the browser side. domain registrars now refuse to accept most homographs and the browsers all warn if the URL contains homographs.

    Since 月 is not a homograph to another character in another encoding (sadly, 1 and l and I are not considered homographs for domains), there is no warning.

  17. Dean Harding says:

    "domain registrars now refuse to accept most homographs"

    Do you trust domain registrars to do the right thing, when they allow thousands of domains like "my-citi-bank.com" or "login-paypal.com" to be registered all the time?

  18. Rosyna says:

    No, I definitely don’t trust them to do the right thing. It’s why I am glad some browsers (Safari, IE7 after I complained about the above example not working) do the right thing about homographs.

    A comment here makes me think Firefox does the wrong thing in some situations.

    But wow, this has gotten off-topic.

  19. Dean Harding says:

    "But wow, this has gotten off-topic."

    Heh, see what happens when the teacher steps out of the room :-)

  20. CornedBee says:

    Firefox displays IDNs in their real format only for specific TLDs, where the registrars have put active mechanisms for the avoidance of homographs in place. This includes, for example, the .at TLD, but not .com.

    If I copy the above URL and paste it in the address field, Firefox will indeed convert it to its Punycode form http://xn--sailor-183m.com/imgs/blingedout.jpg .

  21. Rosyna says:

    CornedBee, then I’d consider that a huge bug in Firefox’s implementation. Since 月 is not a homograph with another encoding.

  22. Mo says:

    @Mike:

    Actually, I run IE 6, IE 7 and the IE 8 b1, but I hadn’t actually checked what the status of full support for IRIs was in the respective versions, so I left my options open in the comment :)

    My point really was that a big deal was made about NT being Unicode-capable, and yet the shell forces you to encode URLs as ANSI strings—it was readily conceivable even back in 1995 (prior to IE and Windows being integrated) that Internet resources could, at some point, be addressed using characters outside of the ASCII character set (perhaps not for domain names, but resource paths, queries and anchors), and so it seems to be an artificial restriction—it should have been IE’s job to say “actually, I can’t navigate to this location as it’s not a supported URL”, just as it would if you’d typed it into the address bar, rather than the shell’s job to say “Ha! Don’t even think about trying to encode *that*”.

  23. BryanK says:

    > … Since 月 is not a homograph with another encoding.

    It’s not that it checks whether certain *characters* are homographs.  (That depends on the font that’s in use.)  It checks the TLD instead, and only allows certain supposedly "known-good" TLDs.

    (Now, that raises the question of how the registrars determine what characters might be homographs, so they can disallow a registration.  I don’t know; I’m not sure it’s possible in general.  Sure, there are Unicode code points that have a high likelihood of looking similar to Latin letters, and you can disallow those.  But there are lots of other character pairs that may or may not look the same as each other, depending on the font, and it’s not limited to just Latin letters either.)

  24. nwourms says:

    CornedBee, then I’d consider that a huge bug in Firefox’s implementation. Since 月 is not a homograph with another encoding.

    Why do we even need unicode urls? I’m sorry, but too bad if you are using an obsolete language. Internet was invented and pioneered by the west, thus Latin1 should remain the de-facto standard for urls. Sorry, if this seems mean, but seriously, as if we didn’t already have enough to deal with than to remember Kanji or Big5 encoded urls… Time to enter the 21st century far-east, ideographic languages are dead. We don’t want your millions of useless glyphs polluting uri namespace!

  25. Dean Harding says:

    nwourms: That was the most arrogant, irresponsible comment I’ve ever read. Remind me never to buy any of your software.
