Smart quotes: The hidden scourge of text meant for computer consumption


Smart quotes—you know, those fancy quotation marks that curl “like this” ‘and this’ instead of standing up straight "like this" 'and this'—are great for text meant for humans to read. Like serifs and other typographic details, they act as subtle cues that aid in reading.

But don't let a compiler or interpreter see them.

In most programming languages, quotation marks have very specific meanings. They might be used to enclose the text of a string, they might be used to introduce a comment, they might even be a scope resolution operator. But in all cases, the language specification indicates that the role is played by the quotation mark U+0022 or apostrophe U+0027. From the language's point of view, the visually similar characters U+2018, U+2019, and U+02BC (among others) are completely unrelated.
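
To see how unrelated these look-alikes are from the machine's point of view, here is a minimal Python sketch that prints the code point and official Unicode name of each one. Python is merely a convenient choice for illustration, and the helper name describe is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The two straight quotes first, then some curly look-alikes.
for ch in ['"', "'", "\u2018", "\u2019", "\u201C", "\u201D", "\u02BC"]:
    print(describe(ch))

# They may look alike on screen, but they never compare equal.
print("\u201Cabc\u201D" == '"abc"')
```

A compiler or shell does essentially the same comparison, which is why the curly versions are never recognized as string delimiters.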

I see this often on Web sites, where somebody decided to "edit" the content to make it "look better" by curlifying the quotation marks, turning what used to be working code into a big pile of syntax errors.

I even see it in email. Somebody encounters a crash in a component under development, connects a debugger, and sends mail to the component team describing the problem and including instructions for connecting to the debugger, like this:

WinDbg –remote npipe:server=abc,pipe=def

Or maybe like this:

Remote.exe “abc” “def”

And you, as a member of the team responsible for that component, copy the text out of the email (to ensure there are no transcription errors) and paste it into a command line.

C:\> Remote.exe "abc" "def"

and you get the error

Unable to connect to server ôabcö

What happened? You got screwed over by smart quotes. The person who sent the email had smart quotes turned on in their email editor, and it turned "abc" into “abc”. You then got lulled into a false sense of security by the best fit behavior of WideCharToMultiByte, which says I can't represent “ and ” in the console code page, but I can map them to " which is a close visual approximation, so I'll use that instead. As a result, the value you see on the command line shows straight quotes, but that's just a façade behind which the ugly smart quotes are lurking.
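
The ôabcö in the error message is consistent with one plausible chain of events, sketched below in Python. The assumptions are mine and are not stated above: that the program received the smart quotes as Windows-1252 bytes, and that the console displayed those bytes using code page 437. The sketch also shows that a strict conversion to code page 437 simply fails, which is the gap that best-fit mapping papers over. It does not call WideCharToMultiByte itself.

```python
smart = "\u201Cabc\u201D"   # the smartly-quoted text from the email

# Assumption: the program received the quotes as Windows-1252 bytes.
raw = smart.encode("cp1252")
print(raw)

# Assumption: the console rendered those bytes using code page 437.
print(raw.decode("cp437"))

# A strict conversion to code page 437 has no slot for curly quotes.
try:
    smart.encode("cp437")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)
```

Under those assumptions, the two curly quotes become bytes 0x93 and 0x94, which code page 437 displays as ô and ö.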

I've even seen people hoist by their own smartly-quoted petard.

I can't seem to access a file called aaa bbb.txt. The command

type “aaa bbb.txt”

results in the strange error message

The system cannot find the file specified.
Error occurred while processing: "a.
The system cannot find the file specified.
Error occurred while processing: x.txt".

Why can't I access this file?

Somehow they managed to type smart quotes into their own command line.

So watch out for those smart quotes. When you're sending email containing code or command lines, make sure your editor didn't "make it pretty" and in the process destroy it.
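
If you want a second pair of eyes before pasting, a rough check along the following lines can help. This is only a sketch: the function name find_suspects is hypothetical, and it bluntly flags every non-ASCII character, so it will also complain about perfectly legitimate accented file names.

```python
import unicodedata

def find_suspects(command):
    """Yield (index, char, name) for every non-ASCII character in command."""
    for index, ch in enumerate(command):
        if ord(ch) > 0x7F:
            yield index, ch, unicodedata.name(ch, "UNNAMED")

# The command from the customer report, typed with smart quotes.
for index, ch, name in find_suspects("type \u201Caaa bbb.txt\u201D"):
    print(f"column {index}: U+{ord(ch):04X} {name}")
```

Anything it reports is worth a second look before you press Enter.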

Exercise: What is wrong with the WinDbg command line above?

Bonus chatter: PowerShell is a notable exception to this principle, for it treats all flavors of smart quotes and smart dashes as if they were dumb quotes and dumb dashes. From what I can tell, you have this guy to thank (or blame).
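
A program that wants to be forgiving in a similar spirit could fold the smart characters down to their dumb equivalents before parsing. The following Python sketch shows the idea. The mapping table is my own guess at a reasonable set and is not PowerShell's actual list, and the name dumb_down is invented for the example.

```python
# Hypothetical mapping chosen for illustration; not PowerShell's table.
DUMB_DOWN = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201A": "'", "\u201B": "'",
    "\u201C": '"', "\u201D": '"', "\u201E": '"',
    "\u2013": "-", "\u2014": "-", "\u2015": "-",
})

def dumb_down(text):
    """Replace smart quotes and smart dashes with ASCII equivalents."""
    return text.translate(DUMB_DOWN)

print(dumb_down("WinDbg \u2013remote npipe:server=abc,pipe=def"))
print(dumb_down("Remote.exe \u201Cabc\u201D \u201Cdef\u201D"))
```

Note that these characters are legal in file names, so blanket replacement can change the meaning of a command line that really did intend a curly quote.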

Comments (43)
  1. Anonymous says:

    "What is wrong with the WinDbg command line above?"

    It uses unicode char 0x2013 (EN DASH) where it should be using 0x002D (HYPHEN-MINUS).

  2. Anonymous says:

    The "-" is not the normal dash character. The dash is character 0x2D, while the one present here is 0x96.

    This is obvious if you see the text in the hex view of Visual Studio, the character doesn’t even appear as a dash in there.

  3. Anonymous says:

    I totally agree with Karellen.  Sometimes software tries to be a little too helpful.  A certain office productivity suite written by a software corporation in the northwest United States comes to mind.

    It looks like you’re writing a letter!!!

  4. Anonymous says:

    "PowerShell is a notable exception to this principle, for it treats all flavors of smart quotes and smart dashes as if they were dumb quotes and dumb dashes."

    I hope every shell is eventually updated to do this. Or even better, uses smart quotes to make it easier to detect mismatched quotes like "dfdfdf "Dfdfdfd "fdfdfd"

  5. Anonymous says:

    Everyone should turn off smart quotes until they get smarter.  I’d much rather see "dribble" quotes and apostrophes than the plethora of backwards apostrophes that have plagued us in the past few years.

    Even some of the big "McCain ’08" signs at the Republican convention had the friggin’ apostrophe backwards.  Lately, I’ve seen quite a few of these wrong in recently-released novels.

    "Turn `em off," I say! ;-)

  6. Anonymous says:

    Too helpful?  Smart characters should be on by default in document suites that are designed for churning out human-readable stuff.  If people can’t turn them off now, they certainly would never turn them on in the other case.  And when producing good looking documents, you want the correct typographical characters.

    That sounds like just the right amount of helpful.

    Now since I regularly paste code snippets into design docs that I write in Word, I personally have turned off many auto-correct settings and I get along just fine.  But it wouldn’t make sense to have all these features off by default.

  7. quotemstr says:

    What about smart quotes in comments? They seem relatively harmless there.

    [Iöm not so sure. -Raymond]
  8. Anonymous says:

    I don’t know if there are really people who go and actively change the quotes to pretty-quotes.  

    But I’ve had that changed on me many times by software. I’m trying to remember when this happens.  I think the WordPress editor does it, and I think if you use MS Word as your editor, it changes it for you as well.

  9. If people can’t turn them off now, they certainly would never turn them on in the other case.

    I disagree.  It just takes one person to turn it on and send email with the pretty curly quotes, and then people will ask them how to do it.

    apostrophe backwards

    I think this behavior started out because people don’t know how to defeat Word’s clever "oh, this is an opening single-quote" AI.

    But the backwards leading apostrophe is reaching a level of penetration bordering on acceptable behavior.

  10. Anonymous says:

    "I don’t know if there are really people who go and actively change the quotes to pretty-quotes."

    I sometimes do, ’cause it looks better. In many of the web pages I maintain, I will insert the html entities for the curly quotes instead of ". For a while I had the Unicode code points memorized. (Also for the en- and em- dash.)

    The problem is that it screws up searching; if you "properly" typeset, for instance, "doesn’t", the user won’t be able to actually use the browser’s find feature to search for it since the ‘ won’t match.

    I think more programs should take this into account so that I can do the right thing. I agree with Yuhong Bao.

    (Though, I’m somewhat of a typographical snob; I’m a bit spoiled by Latex and the easy curly quotes, dashes, ligatures, etc.)

  11. Anonymous says:

    Yuhong Bao,

    Using smart quotes in the command prompt has a nasty problem though – these characters are legal in filenames. Would you like to have filenames which you can’t access from the command prompt? There are bound to be enterprising users out there who figured out that they can’t have quotes in their filenames, unless they copy them from word or outlook, when they seem to be allowed, so let’s always do that. Then when you open the file from explorer, and the app or CreateProcess or whoever it is deals with quotes in command lines (I think it may even be the CRT) parses the command line and helpfully sorts out the smart quotes, then bam – the user can’t open their file.

    I’ve never used powershell, so I don’t know how it deals with that.

  12. Anonymous says:

    Handy tip: Changing the font used in the console from the default bitmap one to Lucida Console allows for smart quotes to appear correctly.

    This is especially helpful when you consider that a file containing a smart double quote in its name is actually legal (it appears as a straight quote with the bitmap font)!

  13. Anonymous says:

    "Using smart quotes in the command prompt has a nasty problem though – these characters are legal in filenames."

    People figured out how to deal with this problem four decades ago.

    cmd.exe already has a method of escaping quotes: put two in a row. E.g. if ‘foo’ prints argv[1], then

     foo hello""world

    will print ‘hello"world’. (Or something like that; maybe you have to put quotes around it.)

    There’s no reason that you couldn’t do the same to escape curly quotes. Or, MS could ship a new shell that works more like the Unix shells, in that they actually allow real escape sequences; then you could put " in file names too. (PowerShell may meet this criterion; I’m  not sure.)

    (Those of you who actually know how the command line is parsed will know I’m sort of lying there, because programs can actually see the literal command line. But if you use the CRT to parse the command line into the standard argc/argv, then you will get what I said.)

  14. Anonymous says:

    "People figured out how to deal with this problem four decades ago."

    Sure they did, but unfortunately they didn’t bake this into the CRT so there are loads of old apps that won’t know how to do it. Where is that time machine again?

  15. Anonymous says:

    "Everyone should turn of smart quotes until they get smarter."

    The problem there (at least for me) is that when I set up a machine for development, I have roughly 10 or so MAJOR tasks, taking upwards of 8 hours to do (install Windows, update Windows, lock Windows down, install Office, install VS.Net, add MSDN, install AJAX, install XNA, update .Net, set up project locations, install TFS, set up source control workspace(s), etc.). I have roughly twice as many "minor" tasks (on par with the quote setting), and it is REAL easy to forget any # of those.

    Just a matter of priorities crossed with time.

  16. Anonymous says:

    If it can’t be expressed in ASCII, it’s not worth expressing.  You can use LaTeX or whatever for things like mathematical formulas.  For everything else, we just need to convert the entire World to U.S. English and everything will be fine.  In return, we will convert to the metric system.  I think that’s a fair trade.

  17. Anonymous says:

    Office isn’t horrible.  Misuse of Office is horrible.

    See also: Guns don’t kill people.

  18. Anonymous says:

    "See also: Guns don’t kill people."

    No, usually it’s the bullet that gets them. Though, I suppose if you were low on ammunition you might bludgeon someone to death with the firearm itself.

    Versatile objects, those guns.

  19. Anonymous says:

    Word is very aggressive by default in converting quotes to smart quotes and double dashes to em-dashes. Since it’s the default editor for email when composing in HTML or RTF, it’s a constant issue for me. I’ll extrapolate from the sample size of 1 and claim that it’s the cause of most of these kinds of problems.

  20. Anonymous says:

    "From what I can tell, you have this guy to thank (or blame)."

    I choose… blame.  What an awful idea, let’s hope this is a trend that dies a quick and painful death.  

    For starters, mismatching quotes in code even in the most reasonable font will be a nightmare to debug.  It’s a slippery slope to making compilers smart about closing the quotes for you, adding in semicolons, and correcting your spelling in literals.

  21. Anonymous says:

    It gets worse! Not only does Microsoft Word replace straight quotes with smart quotes, Microsoft Outlook 2007 does as well, because by default Word is used as its text editor.

    And not only does Word replace these characters, it does so using an extended ASCII range that makes it completely incompatible with UTF-8, the default character encoding for both HTML and XML.

    http://www.garfieldtech.com/blog/stupid-quotes

    So it’s not just the command line that gets messed up. It’s the entire Internet.

  22. Anonymous says:

    "For starters, mismatching quotes in code even in the most reasonable font will be a nightmare to debug."

    PowerShell doesn’t distinguish between “ and ”; in other words, ‘foo “abc“’ will work the same as ‘foo "abc"’ and ‘foo “abc”’. So there’s no mismatching quotes any more than there is with straight ASCII quotes.

    "It’s a slippery slope to making compilers smart about closing the quotes for you, adding in semicolons, and correcting your spelling in literals."

    What? Alarmist much?

  23. Anonymous says:

    “somebody decided to “edit” the content to make it “look better” by curlifying the quotation marks”

    “The person who sent the email had smart quotes turned on in their email editor”

    You frame it like people do this on purpose.

    I can’t remember a time when anyone intentionally replaced normal quotes with smart quotes in anything even resembling code, or turned smart quotes on in their (email) editing program.

    As far as I can tell, this is always due to editing programs making changes themselves, and having smart quote substitution turned on by default. Users generally have *failed to turn off* an arcane configuration option in order to get their editor to accept the keys they’re actually pressing on their keyboard.

    Yup, two possible options. One is safe. The other breaks some things, but has the minor aesthetic advantages of making some text appear slightly prettier. Guess which is the default.

    [I’ve had to fix slide decks where somebody decided to helpfully curlify my coded quotation marks. -Raymond]
  24. Anonymous says:

    Down with auto-‘smart’ even in apps like Word! It’s impossible to always choose the ‘correct’ quote/apostrophe character without higher-level understanding of the desired meaning of the text.

    Instead, give us default keyboards that can type the smart quotes and other Unicode typographical niceties explicitly. The layout I’m using at the moment puts “‘…’” on AltGr+{[…]} (along with others) and is perfectly convenient to generate them when I need them, without butting in when I don’t.

  25. Anonymous says:

    "The solution here is not to cripple word processing software, it’s to not use word processing software to write code."

    — That idea might seem reasonable, but it lasts about 10 seconds until someone sends a code snippet by e-mail, posts it to their blog, prints it in their PDF whitepaper, or the myriad of other places where auto-correcting editors come into play. Certainly no one sane would type their source code in a word processor, but the reality is that code gets copy-and-pasted from e-mails, blogs and magazines thousands of times a day.

    Many times programmers won’t even notice what they did, especially when it’s a code snippet from their co-worker via e-mail. Many people don’t think of e-mail composition as a word processor yet most applications treat it as such.

    I’m not suggesting crippling word processors as the answer but it’s equally unreasonable to suggest that allowing code near a word processor is wholly avoidable, either. The only real solution is to be more aware of the problem. (Or perhaps to make the source code editors more aware of the problem and highlight the potential error!)

  26. Anonymous says:

    I know how you feel bobince. I often turn auto-correct off and type my quotes with the ALT-codes instead.

    Most people don’t even know you can do that though, much less have the punctuation codes memorized. I remember the first time my son saw me typing ALT-codes. He was so amazed, he immediately memorized the ASCII chart so he could type with the numeric keypad only without using the rest of the keyboard at all. Went to school and freaked all his teachers out because none of them knew you could do that, either. It was really funny.

  27. Asztal says:

    WordPress seems to change things like 0x54385894 into 0×54385894. I don’t really see the point in that one.

  28. Anonymous says:

    For a programmatic example of how smart quotes (arguably not "true" smart quotes) are used, check out m4: http://en.wikipedia.org/wiki/M4_(computer_language).

  29. Anonymous says:

    @Asztal

    "Wordpress seems to change things like 0x54385894 into 0×54385894. I don’t really see the point in that one."

    I think it assumes by ‘x’ you mean the multiplication sign ‘×’.

    What’s wrong with just using " or ‘ ? It’s not harder to read, and before all this code page stuff came along we didn’t have any other quotes on computers. Didn’t harm us then, why would it harm us now?

  30. Anonymous says:

    So many editors–including e-mail programs–do this automatically these days. I’ve gotten burned by it, and I’m well aware of the problem.

    The worst part is most editors don’t convert smart quotes correctly even in a typographical context! Word, for example, converts *all* "straight" quotation marks to "curly" quotes regardless of context. This is incorrect. Straight quotes are actually numerical markers such as for inches and feet. Editors that encounter mid-sentence unmatched quotes following a number should leave them straight. There are other similar rules with em, en dashes and other punctuation that most editors screw up, too. As someone who spent many years hand-setting text this really aggravates me.

    Auto-correcting editors haven’t really made homespun typography any better, they’ve just made it wrong in a different way. Aargh.

  31. Anonymous says:

    I got so fed up with this sort of thing happening, I added code to my applications to look for these oddities and parse them as if they were the expected thing.

  32. Anonymous says:

    "It’s a slippery slope to making compilers smart about closing the quotes for you, adding in semicolons, and correcting your spelling in literals."

    Oh, Please please please, can I have the compiler spellcheck my literals? My variable names too? Well, actually the IDE should do it rather than the compiler, and it would need some kind of smarts for CamelCase and underscores_as_spaces, but I would love this feature. No more embarrassing code review comments asking what a massageParser is.

  33. Anonymous says:

    Somewhat similar irritation – people who refuse to email me a plain bitmap. They insist on pasting it into a Word document and sending that instead. This is of course extremely helpful because otherwise they wouldn’t be able to introduce me to their collection of macro viruses.

  34. Anonymous says:

    It’s like seeing the insides of a television and being amazed at the little intricacies especially the ones that shock you when you touch them.

  35. Anonymous says:

    At least it’s easy enough to keep yourself from making that mistake.  A while back I made a really simple program that just reloads whatever is in the clipboard as plaintext, and it corrects for all those curved quotes, em-dashes, etc., as well as stripping out all the formatting nonsense.  Bind it to a global key, like Ctrl-Shift-V and just do that before pasting.  I use it all the time now; way easier than looking for paste-special menu items or remembering how each application deals with rich text.

    If anyone’s interested:

    http://www.kavendek.com/stuff/SimplifyClipboard.cpp

    http://www.kavendek.com/stuff/SimplifyClipboard.exe

  36. Anonymous says:

    I once got sent code for a SQL select. Very simple because I’m not a programmer.

    It was only finally when I sent the .mdb with the data stripped out that we realized why it worked for him but not for me. Yup, he had Word as his email editor.

    I’ve lost count of the times I’ve read about people complaining some forum software doesn’t show quotes. I tell them to cut and paste into the text box from Notepad not Word, but they still say it’s the newspaper/forum software that is wrong.

    (Another common complaint is how the discussion forum software has a lousy American spell check. The forum software doesn’t have a spell check at all of course, it’s the default spell check that comes with Firefox (which incidentally has stopped working for the last two or three updates).)

  37. Anonymous says:

    There is a solution: use the demoronizer!

  38. Anonymous says:

    Daniel Earwicker: In fact Raymond devoted an entire article to exactly this:

    http://blogs.msdn.com/oldnewthing/archive/2008/08/19/8877486.aspx

  39. Anonymous says:

    Not going to discuss the pros and cons of smart-quotes, but I am confused by Dean’s comment that they use "an extended ASCII range that makes it completely incompatible with UTF-8."

    Unless I’m mistaken, we’re talking about characters in the U+2000 range, but UTF-8 can easily encode most of U+0000 to U+10FFFF, even restricted as it is by RFC 3629.

  40. Anonymous says:

    Nevermind, maybe. I think I figured it out.

    By "extended ASCII range", he’s referring to the upper half of a codepage like 1252, and stating that it is "completely incompatible with UTF-8" because you have to go through the effort of actually /mapping/ the codepoints.

  41. Anonymous says:

    There’s no reason that you couldn’t do the same to escape curly quotes.

    Actually, it’s easier to do it for curly quotes than for normal double quotes, because existing programs are unaware of curly quotes, and so, accept them as normal command line input.

    Usually, double quotes can be used to include space characters in a command line parameter.

    some_command "hello world"

    However, I’m aware of at least TWO 32-bit hexadecimal editors not supporting this feature, making it IMPOSSIBLE to open files with space characters in their path, through the command line.

    On the topic of smart quotes:

    They’re bad on the Web, because of shared responsibilities involving buggy Web browsers, Web servers and Web scripts.

    Typically:

    A normal person copies text in Word and pastes it in a text form on a Web page, including smart quotes.

    The page is encoded in ISO-8859-1, and so, the Web browser figures that it must send a POST request with charset=ISO-8859-1.

    Since the buggy browser thinks that ISO-8859-1 = Windows-1252, it sends code points used for control characters in ISO-8859-1 (between 0x80 and 0xA0).

    The buggy Web server doesn’t validate input and stores this buggy data in its database, believing it’s valid ISO-8859-1.

    Then, if the POST data appears on a Web page (e.g. a blog comment), it serves the page as ISO-8859-1 or UTF-8. At best, with ISO-8859-1, it serves Windows-1252 and claims it to be ISO-8859-1, which is ok with buggy Web browsers believing that ISO-8859-1 = Windows-1252. For UTF-8, it might mess up the thing, either deleting the characters, or not translating them (sending sort of Windows-UTF-8), or, if we’re lucky, properly translating them to UTF-8.

    Things are so messy that, HTML5 actually REQUIRES Web browsers to assume that ISO-8859-1 = Windows-1252.

    <http://www.whatwg.org/specs/web-apps/current-work/#character-encodings-0>

    Whether the WHATWG can, cannot, should or shouldn’t require HTML clients to violate HTTP and ISO-8859 standards is another topic.

  42. DWalker59 says:

    @ulric and others:  I was pasting a short piece of SQL code into an e-mail message using Outlook 2007, today, and since my message format is HTML by default, and uses Word as the editor, I had a heck of a time getting the quotes in the SQL to stay as straight quotes.

    I could probably change the default for Word when it’s used as the e-mail editor and not when I’m creating a regular Word document, but I don’t happen to know how to change one setting without the other.

    I occasionally need to underline something or make it RED and BOLD in an e-mail or risk having it overlooked; otherwise, I would use plain text e-mails.
