Why does pasting a string containing an illegal filename character into a rename edit box delete the characters from the clipboard, too?


Ane asks why, if you have a string with an illegal filename character on the clipboard, and you paste that string into a rename edit box, do the illegal characters get deleted not just from the edit box but also the clipboard?

Basically, it's a bug, the result of a poor choice of default in an internal helper class.

There is an internal helper class for "monitoring an edit control" with options to do things like remove illegal characters. This helper class was written back in 1998, presumably with the intention of being used somewhere, but it never did get hooked up. Maybe the feature it was originally written for got cancelled; I can't quite tell. At any rate, this helper class had many options, one of which was "When pasting text containing illegal characters, should I filter the illegal characters from the clipboard, too?", and for some reason it defaulted to Yes. (I can see why the default was Yes from a coding standpoint. It was actually less work to filter the characters from the clipboard than it was to preserve them, but it's a bad default from an API design standpoint.)

Anyway, this helper class sat unused for a few years, but in 2000, Explorer decided to use this helper class so it could filter illegal characters out of file names when you used the Rename command. The code that uses this helper class chose which options it wanted, and probably due to oversight, the "preserve clipboard contents when pasting" flag was not specified.
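
The failure mode is easy to reproduce in miniature. The sketch below is not the actual helper class, whose source isn't shown here; `EditFilterFlags`, `kPreserveClipboard`, `FakeClipboard`, and `paste_into_rename_box` are hypothetical names, and the clipboard is simulated with a plain string. It only illustrates how an opt-in "preserve" flag turns a forgotten option into a destructive default.

```cpp
#include <cassert>
#include <string>

// Hypothetical option flags for an edit-control monitor (illustrative only).
enum EditFilterFlags : unsigned {
    kFilterDefault     = 0,        // what a caller gets by specifying nothing
    kPreserveClipboard = 1u << 0,  // opt-in: leave the clipboard untouched
};

// Stand-in for the system clipboard, which every program shares.
struct FakeClipboard {
    std::string text;
};

// The printable characters that Windows rejects in file names.
inline bool is_illegal_filename_char(char c) {
    return std::string("\\/:*?\"<>|").find(c) != std::string::npos;
}

inline std::string strip_illegal(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (!is_illegal_filename_char(c)) out += c;
    }
    return out;
}

// Returns the text that ends up in the rename box.
inline std::string paste_into_rename_box(FakeClipboard& clip,
                                         unsigned flags = kFilterDefault) {
    std::string filtered = strip_illegal(clip.text);
    if (!(flags & kPreserveClipboard)) {
        clip.text = filtered;  // the surprising default: clipboard rewritten too
    }
    return filtered;
}

// Usage: a caller that forgets the flag loses the original clipboard text.
inline void example_forgotten_flag() {
    FakeClipboard clip{"a:b?c"};
    std::string shown = paste_into_rename_box(clip);
    assert(shown == "abc");
    assert(clip.text == "abc");
}
```

Had the flag been inverted, so that callers had to ask for the clipboard to be modified, forgetting it would have been harmless.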

So yeah, it's just a bug. But then again, it's a bug that's been around for over a decade, so who knows if there's somebody out there that relies on it.

Comments (37)
  1. Joe Dietz says:

    Someday (when I'm laid off or something), and feeling particularly eVil, I'm going to write some sort of super-cool application that all my friends are going to want, using only undocumented APIs and semantics.

  2. Dan Bugglin says:

    @Joe Hype a program you're making for about a month and then release it on April 1st:

    void main() {

     return *0;

    }

    Don't release the source, of course. When the bug reports start coming in, claim it runs fine for you and you can't repro the crash, and they're on their own.

    For bonus points adjust the compile options to get the smallest size EXE to hint to the observant that it probably doesn't do much.

  3. Dan Bugglin says:

    And obviously that should be int, not void.  Plus I'm not sure if you can dereference a literal like that without casting it to a specific pointer type first.

  4. Simon Robinson says:

    But – being pedantic – does using int main() as the program entry point break the rule of using only 'undocumented APIs and semantics'

  5. Pierre B. says:

    @Simon: just document your program as being written in Cobol. That way, main() is undocumented.

    (Disclaimer: I don't know Cobol and have no idea whether it has a main() function. I also can't be bothered to check just for a joke reply.)

    @Raymond: Seriously, is that the level of backward compatibility Windows is aiming at? My mind keeps oscillating between finding this fact wonderful and awful.

  6. ThomasW says:

    If it's a bug then why not simply fix it? I am sure writing this blog entry took longer than actually fixing the bug.

    [Fixing the bug is the easy part. Verifying that your fix didn't break anything is hard. (It's not like every Windows user uploads their personal workflow to a central location so we can run through all of them and verify that we didn't break anybody. And even if they did, imagine how long it would take to run through that massive database and repeat the workflow of every Windows user on the planet.) -Raymond]
  7. JohnL says:

    @Raymond re @ThomasW's comment, they could probably have fixed it around the Vista era – they broke a lot of stuff, so this probably wouldn't have caused too many extra issues.

    [This assumes the problem was known in the Vista era. -Raymond]
  8. John Ludlow says:

    @Raymond re @ThomasW's comment, they could probably have fixed it around the Vista era – they broke a lot of stuff, so this probably wouldn't have caused too many extra issues.

  9. Dylan says:

    @John

    It has, in fact, been fixed by win7.

  10. BC_Programmer says:

    >I am sure writing this blog entry took longer than actually fixing the bug.

    But at the same time, had he not taken the time to write this blog post you would have never gotten the self-satisfaction of making a snarky comment.

  11. Clipboarder says:

    The bug is still in Win 7. You can easily reproduce it.

    The question is whether more people are annoyed by the bug than there are who rely on it.

    [Interesting metric you're using there. So, for example, it's okay to add a feature that breaks an app if the number of people who use the feature is greater than the number of people who use the now-broken app? (So much for the long tail.) -Raymond]
  12. Clipboarder says:

    Well, if breaking the app can't be avoided, I'd do it that way.

    Of course the average time-saving and "anger-reducement" of that new feature (or fixed bug) should be greater than the anger caused by breaking an app. That's often difficult to achieve by adding new features but easy by fixing bugs.

  13. Klimax says:

    I am optimistic and assume that a sufficient number of people found it so useful that they are using the bug for filtering…

    (for added evil uses WM_COMMAND or something like that… )

  14. Eric TF Bat says:

    That's an intriguing implication, Raymond.  Using your psychic debugging skillz, can you speculate on how this particular bug could be useful or necessary?  I can see a lot of upside to even the silliest bugs, and your blog is a terrific source for the weirdnesses of backward compatibility, but this one has me beaten.  I'd love to read your wild (informed) guesses on what sort of procedures or workflow could rely on something as obscure as this.

    [You have a string on the clipboard that you want to use as a file name, but it has illegal characters. How do you clean it up? Easy: Click a file on the desktop (doesn't matter which one), hit F2, Ctrl+V, Esc. Bingo: Clipboard cleaned up. I can see this entering some people's muscle memory, possibly even showing up on some "Windows Tips and Tricks" site and becoming part of some company's internal workflow. -Raymond]
  15. Joshua says:

    Easy. Due to the bug, the rename box trims trailing spaces that are in the clipboard. I can imagine lots of people depending on that by accident. You don't want a file named "welcome to eight  .doc" now do you?

  16. ulric says:

    "(So much for the long tail.)"

    In fact, Clipboarder is asking the very same question ("whether more people are annoyed by the bug than there are who rely on it.") Raymond asked about Scraps!  http://www.dotnetrocks.com/…/index8.html

  17. Nick says:

    I see where you're coming from, Raymond, but I'm not sure I understand why a bug such as this garners so much support for backwards compatibility.  How many people really rely on this odd behavior?  Maybe 1000?  I'd have to guess even less.

    I have to compare something like this with other changes made to Windows in the last few versions.  Compare this bug to the removal of the File Types dialog (I know you've blogged about this, and even wrote a TechNet column about it).  The number of people impacted by the complete removal of that feature (flawed as it was) is immensely greater than those relying on this odd little clipboard bug.

    This isn't a complaint, just an observation.  Why do bugs like this persist and yet other, much more drastic changes get pushed through?  You've talked a lot about how features get added to Windows (starting at -200 points, or whatever), but how do "features" like these get slated for removal?

    [Features become candidates for removal when the maintenance costs significantly exceed the benefit + the cost of removing it. (I hope this statement was obvious to most people.) The cost of removing the feature is definitely nonzero; a lot of research would have to be done first. The cost of maintaining this feature is currently zero. It's hard to beat zero. -Raymond]
  18. Joshua says:

    In my experience (not particular to Windows or even to Microsoft) weird little things like this most likely never get cleaned up.

  19. Wishful thinking says:

    I just want to be able to use the colon characters (:) in a file name. I know never going to happen.

  20. Cheong says:

    > Easy: Click a file on the desktop (doesn't matter which one), hit F2, Ctrl+V, Esc. Bingo: Clipboard cleaned up.

    But… even if the behaviour is fixed, all I have to do is add a Ctrl+C to the sequence. And if I were an internal documentation writer, I wouldn't rely on behaviour that would probably be fixed in the next version of Windows. (Note that I write on the presumption that an organization that requires an operation guide would prefer the documentation be written in a safer way, and have someone with more experience check the documents… which is not always the case…)

    [Companies complain when we change *anything* that invalidates their training materials, no matter how small the change. And adding Ctrl+Shift+Home, Ctrl+C to the steps is definitely not simple. (A three-key chord is a training nightmare.) -Raymond]
  21. Karellen says:

    Did any of these sorts of bugs get cleaned up for Win64? There were no ABI or back-compat issues there; it is, by definition, an ABI change. All apps must be compiled anew, and re-tested by their developers. For apps that no longer have source or developers, the old binaries would still run as before in the backwards compatible 32-bit WoW environment. But for new stuff? Oh, the kludges that could be ironed out…

    [There was a massive back-compat issue for Win64: You have hundreds of millions of lines of 32-bit source code that need to keep working when you recompiled as 64-bit. If you've never ported code, you don't understand how big a deal this is. -Raymond]
  22. dave says:

    I just want to be able to use the colon characters (:) in a file name. I know never going to happen.

    Right. It's now the same as saying "I just want to be able to use the backslash character (\) in a file name". You cannot use it IN a name, because it has been defined as the separator BETWEEN name components.

    In the case of colon, it separates file name from stream name.

    c:\temp>echo hello stream >foo:bar

    c:\temp>more <foo:bar

    hello stream
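
A minimal sketch of the split described above, assuming a single path component with no drive letter or directory part. `split_stream` is a hypothetical helper, not a Windows API, and it ignores the optional stream-type suffix of the full syntax.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits one path component at the first colon into {file name, stream name}.
// The stream name is empty when the component contains no colon.
inline std::pair<std::string, std::string> split_stream(const std::string& component) {
    std::string::size_type colon = component.find(':');
    if (colon == std::string::npos) {
        return std::pair<std::string, std::string>(component, std::string());
    }
    return std::pair<std::string, std::string>(component.substr(0, colon),
                                               component.substr(colon + 1));
}

// Usage: "foo:bar" names the stream "bar" of the file "foo".
inline void example_split_stream() {
    std::pair<std::string, std::string> parts = split_stream("foo:bar");
    assert(parts.first == "foo");
    assert(parts.second == "bar");
}
```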

  23. Random832 says:

    @Dylan "It has, in fact, been fixed by win7."

    No, but I can see why you might think this. It only affects the CF_UNICODETEXT clipboard data, not CF_TEXT, and it doesn't take ownership of the OLE clipboard. This gives some insight, incidentally, into just how the bug is happening and why it could be easier to modify the clipboard than to not do so.

    @Joshua: "Easy. Due to the bug, the rename box trims trailing spaces that are in the clipboard. I can imagine lots of people depending on that by accident. You don't want a file named "welcome to eight  .doc" now do you?" Well, for one thing, if the bug were fixed, it would still do everything it does now, to the pasted text – the bug is the fact that it actually removes them from the clipboard so if you then go and paste the same clipboard into notepad the characters it didn't like are gone. Also, this function doesn't remove trailing spaces – trailing spaces are removed from filenames [after you hit enter on the rename in explorer, not while you paste] by core win32 functionality that is not a bug, and this does not affect the clipboard.

  24. Ivo says:

    I don't buy it. This is a bug, plain and simple. An application should not alter the contents of the clipboard without an explicit cut or copy action from the user (I think you blogged about this in the past). What about all the people who depend on that correct behavior?

    By this logic no bugs should ever be fixed. For any bug or feature (or even lack of feature) you can argue that somebody can depend on it. Maybe somebody depends on the name of the OS being Windows 7. Maybe my code grabs the 9th character from the OS name and divides by it, instead of simply using 7. Now you are stuck – any future versions of Windows must be named Windows 7. Or another example – there is a buffer overflow bug in Windows Explorer. Some program (a virus) exploits it. You should never fix that because you will break compatibility with the virus.

    [You'd be surprised how many programs crash if the OS version is not the exact number they are expecting. This is why compatibility is hard. (And as we've already noted, security trumps compatibility.) -Raymond]
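
The version-check crashes mentioned in the reply typically come from testing for an exact version rather than a minimum. A minimal sketch, assuming the version numbers are passed in rather than queried from the system; `OsVersion` and the three check functions are hypothetical names, and 6.1 is used as the example because it is the version number Windows 7 reports.

```cpp
#include <cassert>
#include <tuple>

struct OsVersion {
    int major;
    int minor;
};

// Brittle: passes only on the exact version the author tested against.
inline bool brittle_check(OsVersion v) {
    return v.major == 6 && v.minor == 1;
}

// Subtly brittle: the fields are compared independently, so 7.0 is rejected
// because its minor number is less than 1.
inline bool field_wise_check(OsVersion v) {
    return v.major >= 6 && v.minor >= 1;
}

// More robust: a lexicographic "at least 6.1" comparison.
inline bool robust_check(OsVersion v) {
    return std::tie(v.major, v.minor) >= std::make_tuple(6, 1);
}

// Usage: a later version passes only the lexicographic check.
inline void example_version_checks() {
    OsVersion later{7, 0};
    assert(!brittle_check(later));
    assert(!field_wise_check(later));
    assert(robust_check(later));
}
```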
  25. Ivo says:

    My point was that if the criteria of fixing a bug, adding a feature, or removing a feature is "somebody may depend on the current behavior" then you can't ever do any change beyond version 1.0. Potentially any change can break something. So we would still be using Windows 1.0. (although since security trumps compatibility, we'll have a very secure Windows 1.0 by now).

    Also the formula "Features become candidates for removal when the maintenance costs significantly exceed the benefit + the cost of removing it." is incomplete. The correct one is "Features become candidates for removal when the maintenance costs + THE BENEFIT OF REMOVING IT significantly exceed the benefit (OF KEEPING IT) + the cost of removing it." It is true that the benefit of removing a feature is almost zero in most cases. However when you apply the formula to a bug (which you do in this case) then the benefit of removing a buggy feature is definitely not zero. Of course you can argue that "the benefit" in your formula is the difference between "benefit of keeping the feature" and "benefit of removing the feature". But then in this case it will be a negative value, so it is no longer true that "it is hard to beat zero". It's not hard when you start from a negative value.

  26. K says:

    Not removing a bug because "someone might depend on it" is good in the short term, but horrible in the long term. It leads to abominations such as Swing in Java, or the seventy billion String types in C++. But it is pretty typical for current economic practices, where one sacrifices long-term stability to be able to buy some more "AAA"-stocks with their inflated bonus.

    I do not care about this bug much, but about the practise of keeping bugs because they *might* be useful for *someone*.

  27. James Schend says:

    I can't speak for Unix; I come from the world of Classic Mac OS where you can use any character you can type except the colon. Slashes were fine, both types. (Which still doesn't fix the problem of not being able to use the colon… but anyway.)

    From my background, the allowed characters in Windows (and Unix, and Linux, and OS X) seem extremely restrictive. And I constantly, constantly, get miffed by the "you can't use that character" dialog when naming files, because I don't have the list in my brain like people raised on Windows do.

  28. DWalker says:

    I'm often surprised that filenames can contain parentheses.  As a programmer, using parentheses as expression separators, it doesn't match my expectations that they would be legal in filenames.  Oh well.

  29. Kelden says:

    You can use similar unicode characters

    http://www.fileformat.info/…/search.htm

    FULLWIDTH COLON     U+FF1A     FULLWIDTH COLON     ：

    or better

    MODIFIER LETTER RAISED COLON     U+02F8     MODIFIER LETTER RAISED COLON     ˸

    http://www.fileformat.info/…/search.htm&preview=entity

    SMALL REVERSE SOLIDUS     U+FE68     SMALL REVERSE SOLIDUS     ﹨

    FULLWIDTH REVERSE SOLIDUS     U+FF3C     FULLWIDTH REVERSE SOLIDUS     ＼

  30. Gabe says:

    Ivo: Changing version numbers is a big problem. Have you ever seen a web browser's User Agent string? For the past 10+ years almost every major browser has had to put "Mozilla/4.0" or "Mozilla/5.0" in the beginning of their user agent strings. Do you know why?

    It's because back in 1994 when Mosaic Communications started up, the codename for their beta web browser was Mozilla. In order for web sites to use all the fancy features it had, web sites had to sniff the user agent string to look for "Mozilla" and give the fancy pages to the fancy browsers and regular pages to regular browsers. By the time the product was given the name Netscape Navigator, it was too late to change the user agent string because that would cause people who downloaded the newer browser to not get the fancy pages. Not only was Netscape unable to change the internal name of their own product once it was released, every *other* browser was doomed to have to use the same name — and even the same version numbering system!

    Anybody who wanted to make a browser with things like tables and animated GIFs had to put "Mozilla" at the beginning of their user agent strings too, so that web servers would know to serve them the fancy pages. Every so often Netscape would come out with a new browser with fancy new features and increment the version number. This caused every other browser developer to upgrade to the new version number once they implemented whatever features Netscape had in that version.

    Apparently they stopped incrementing the version at 5.0 (a version that was never actually released), and there it has stayed since the late '90s. Thus, "Mozilla" was carved in stone over 16 years ago by a beta release and "5.0" was set over 10 years ago by a product that never even saw the light of day. Compatibility may be hard on Windows, but that's nothing compared to how hard it is for web browsers.

  31. James Schend says:

    Raymond, you must be getting an influx of new readers or something… it seems like the backwards compatibility conversation happens every month.

    @dave: We understand that there's a reason you can't use : in a path, but that doesn't change the fact that we *want* to use : in a path. For example, to put a time in a filename!

    [What do unix people do when they want to use a slash in a file name? -Raymond]
  32. Evan says:

    @Wishful thinking: Just use Services for Unix (or SUA or whatever it's called now). Of course you won't be able to open that file from any program you care about, but that's a small price to pay for being able to name it whatever you want.

    @DWalker59: It's not like Unix prohibits you from using () in your file names, you just have to escape it in your shell. (Unix filesystems typically allow any character except / and NUL.)

    @Kelden: Hard to type and confusion inducing. A perfect suggestion.

  33. John says:

    Raymond, I'm curious how modifying the clipboard is less work than leaving it alone. As I see it, leaving it alone would be 1) read clipboard, 2) manipulate string; and modifying it would be 1) read clipboard, 2) manipulate string, and 3) write clipboard.

    Could you please explain real quick?

  34. Random832 says:

    @John: It has to do with how the clipboard works. Each clipboard format is a GlobalAlloc(GMEM_MOVEABLE) buffer allocated by the clipboard. GetClipboardData simply returns the handle. Normally, the application is supposed to lock the handle, copy the data to its own buffer, and unlock it, but there's nothing to actually stop the application from modifying it in place while it's got the handle locked. Once this 'clicked' for me, I verified that this is indeed what happens by doing a test to show that CF_TEXT [and other formats] is not modified, only CF_UNICODETEXT.

    Which of course means that if this is part of someone's workflow, then "it doesn't work if I'm pasting into [some non-Unicode application]" is a bug from their point of view.
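
A portable sketch of that mechanism, with the clipboard simulated by a buffer that hands out a pointer to its own storage (standing in for locking the handle that GetClipboardData returns). `FakeClipboard`, `lock`, `filter_in_place`, and `filter_copy` are hypothetical names; the real helper's code isn't shown anywhere in this thread, so this only illustrates why filtering in place is the shorter path.

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <vector>

// Simulated clipboard entry: lock() exposes the clipboard's own buffer.
struct FakeClipboard {
    std::vector<wchar_t> buffer;  // NUL-terminated, owned by the "clipboard"

    explicit FakeClipboard(const std::wstring& s) : buffer(s.begin(), s.end()) {
        buffer.push_back(L'\0');
    }
    wchar_t* lock() { return buffer.data(); }
    std::wstring text() const { return std::wstring(buffer.data()); }
};

inline bool is_illegal(wchar_t c) {
    return c != L'\0' && std::wcschr(L"\\/:*?\"<>|", c) != nullptr;
}

// Less work: compact the legal characters inside the locked buffer itself.
// Side effect: the clipboard now holds the filtered text.
inline std::wstring filter_in_place(wchar_t* locked) {
    wchar_t* out = locked;
    for (const wchar_t* in = locked; *in != L'\0'; ++in) {
        if (!is_illegal(*in)) *out++ = *in;
    }
    *out = L'\0';
    return std::wstring(locked);
}

// More work: build a private filtered copy. The clipboard is untouched.
inline std::wstring filter_copy(const wchar_t* locked) {
    std::wstring result;
    for (const wchar_t* in = locked; *in != L'\0'; ++in) {
        if (!is_illegal(*in)) result += *in;
    }
    return result;
}

// Usage: only the in-place variant changes what other programs will paste.
inline void example_in_place_vs_copy() {
    FakeClipboard clip(L"re:port?.txt");
    assert(filter_copy(clip.lock()) == L"report.txt");
    assert(clip.text() == L"re:port?.txt");
    assert(filter_in_place(clip.lock()) == L"report.txt");
    assert(clip.text() == L"report.txt");
}
```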

  35. Marcos Marado says:

    I understand the reasons for not having : on filenames, but regarding "What do unix people do when they want to use a slash in a file name?", the answer is "they just use it".

    (On GNU/Linux:)

    >echo Hello >slash∕file

    >cat slash∕file

    Hello

    >

    [That's not the same as a / though. It's a look-alike character. Most people type "and/or" and not "and∕or". -Raymond]
  36. Nick says:

    [Features become candidates for removal when the maintenance costs significantly exceed the benefit + the cost of removing it. (I hope this statement was obvious to most people.) …]

    I appreciate your response, but that seems a little disingenuous.  Surely there's more involved than simple maintenance costs — you talk a lot about keeping features in place for compatibility, and I'm sure that the number of people using a feature must be considered as well.

    My point was just that features with large numbers of active users have been removed from Windows (for better or worse), and yet odd little bugs like this remain.  Maybe the problem is my equating "features" and "bugs"; perhaps you have a different view depending on which category we're dealing with (features are transient, but bugs are forever?).

  37. Marcos Marado says:

    Regarding the "we won't fix this bug because people might depend on it"… come on, if you would do that, you wouldn't fix *any* bug. And I'm not even suggesting, like some would, that you never fix any bugs anyway…
