How a bullet turns into a beep


Here’s a minor mystery:

echo •

That last character is U+2022. Select that line with the mouse, right-click, and select Copy to copy it to the clipboard. Now go to a command prompt and paste it and hit Enter.

You’d expect a • to be printed, but instead you get a beep. What happened?

Here’s another clue. Run this program.

class Mystery {
 public static void Main() {
  System.Console.WriteLine("\x2022");
 }
}

Hm, there’s that beep again. How about this program:

#include <stdio.h>
#include <windows.h>

int __cdecl main(int argc, char **argv)
{
 char ch;
 if (WideCharToMultiByte(CP_OEMCP, 0, L"\x2022", 1,
                         &ch,  1, NULL, NULL) == 1) {
  printf("%d\n", ch);
 }
 return 0;
}

Run this program and it prints “7”.

By now you should have figured out what’s going on. In the OEM code page, the bullet character is being converted to a beep. But why is that?

What you’re seeing is MB_USEGLYPHCHARS in reverse. Michael Kaplan discussed MB_USEGLYPHCHARS a while ago. It determines whether certain characters should be treated as control characters or as printable characters when converting to Unicode. For example, it controls whether the ASCII bell character 0x07 should be converted to the Unicode bell character U+0007 or to the Unicode bullet U+2022. You need the MB_USEGLYPHCHARS flag to decide which way to go when converting to Unicode, but there is no corresponding ambiguity when converting from Unicode. When converting from Unicode, both U+0007 and U+2022 map to the ASCII bell character.
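To make the asymmetry concrete, here is a minimal sketch in portable C that models the two directions for just three code points. It is not how Windows implements the conversion: the table, the function names model_oem_to_unicode and model_unicode_to_oem, and the use_glyph_chars parameter (standing in for MB_USEGLYPHCHARS) are all hypothetical, and the glyph values are the code page 437 ones.

```c
#include <stddef.h>

/* Hypothetical model, not the Windows implementation: a few OEM
   (code page 437) bytes below 0x20 and the glyphs they display as. */
struct glyph_entry {
    unsigned char oem;    /* OEM byte, which is also the control-character value */
    unsigned int  glyph;  /* Unicode code point of the glyph */
};

static const struct glyph_entry glyph_table[] = {
    { 0x01, 0x263A },  /* white smiling face */
    { 0x04, 0x2666 },  /* black diamond suit */
    { 0x07, 0x2022 },  /* bullet */
};

static const size_t glyph_count = sizeof(glyph_table) / sizeof(glyph_table[0]);

/* OEM -> Unicode: the byte is ambiguous, so the caller has to choose.
   Bytes the model does not know are passed through unchanged, which is
   only right for plain ASCII. */
unsigned int model_oem_to_unicode(unsigned char oem, int use_glyph_chars)
{
    if (use_glyph_chars) {
        for (size_t i = 0; i < glyph_count; i++) {
            if (glyph_table[i].oem == oem) {
                return glyph_table[i].glyph;
            }
        }
    }
    return oem;  /* control-character reading, e.g. 0x07 -> U+0007 */
}

/* Unicode -> OEM: there is nothing to choose, because the control
   character and the glyph both land on the same byte.
   Returns -1 for code points outside this small model. */
int model_unicode_to_oem(unsigned int code_point)
{
    if (code_point < 0x20) {
        return (int)code_point;
    }
    for (size_t i = 0; i < glyph_count; i++) {
        if (glyph_table[i].glyph == code_point) {
            return glyph_table[i].oem;
        }
    }
    return -1;
}
```

In this model, model_oem_to_unicode(0x07, 0) gives 0x0007 and model_oem_to_unicode(0x07, 1) gives 0x2022, while model_unicode_to_oem returns 7 for both 0x0007 and 0x2022.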

“But converting a bullet to 0x07 is clearly wrong. I mean, who expects a printable character to turn into a control character?”

Well, you’re assuming that the code that does the conversion is going to interpret it as a control character. The code might treat it as a glyph character, like this:

// starting with the scratch program

void
PaintContent(HWND hwnd, PAINTSTRUCT *pps)
{
 HFONT hfPrev = SelectFont(pps->hdc, GetStockFont(OEM_FIXED_FONT));
 TextOut(pps->hdc, 0, 0, "\x07", 1);
 SelectFont(pps->hdc, hfPrev);
}

Run this program and you get a happy bullet in the corner of the window. The TextOut function does not interpret control characters as control characters; it interprets them as glyphs.

The WideCharToMultiByte function doesn’t know what you’re going to do with the string it produces. It converts the character and leaves you to decide what to do next. There doesn’t appear to be a WC_DONTUSEGLYPHCHARS flag, so you’re going to get glyph characters whether you like it or not.
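If the converted string is headed somewhere that acts on control characters, such as a console, one option is to inspect it yourself before passing it along. The following is a minimal sketch of that idea in portable C. The function name scrub_control_bytes and the policy of keeping tab, line feed, and carriage return are assumptions made for illustration, not something Windows provides.

```c
#include <stddef.h>

/* Hypothetical helper: replaces every byte below 0x20 in buf, other than
   tab, line feed, and carriage return, with the given replacement byte.
   Returns the number of bytes that were replaced. */
size_t scrub_control_bytes(char *buf, size_t len, char replacement)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = (unsigned char)buf[i];
        if (b < 0x20 && b != '\t' && b != '\n' && b != '\r') {
            buf[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

Run over the one-byte result of the conversion program above, this would turn the 0x07 into whatever replacement you passed, so printing it no longer beeps. The cost is that you lose the bullet entirely, which is exactly the decision the conversion function leaves to you.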

(Postscript: You can see this happening in reverse from the command prompt. Then again, since this problem is itself a reversal, I guess you could say the behavior is happening in the forward direction now… Type echo ^A, pressing Ctrl+A where I wrote ^A. The result: A smiling face, ☺ U+263A.)

Comments (20)
  1. cocobello says:

    Ctrl+G=BEL

  2. kiwiblue says:

    Can’t reproduce on XPSP2. I’m copying it into CMD.EXE console window, and echo actually copies the bullet character.

  3. kiwiblue says:

    Just tried "echo Ctrl+G", and it beeps nicely, so the internal speaker is OK.

  4. Johannes Rössel says:

    Whether this issue is reproducible depends on the settings for the command line window. Since I set my default cmd to Lucida Console some time ago it can actually display Unicode characters directly on the console. If, however, the window is set to use a raster font it only has the OEM codepage and will convert the bullet character into BEL which, on echo, will beep.

  5. Nish says:

    > Type echo ^A where you actually type Ctrl+A where I wrote ^A. The result: A smiling face, ☺ U+263A.)

    Raymond,

    Isn’t that behavior due to the fact that ASCII 1 is ☺ ?

  6. jim says:

    Interesting. A little experiment with a command prompt (XPSP2 again) reveals the following behaviour:

    Most control characters display as ^char and echo as glyphs. ^C aborts. ^G "echos" as a beep only. ^H and ^I are interpreted by the line editor as backspace and tab, respectively. ^J seems to be ignored completely. ^K and ^L echo as glyphs only amongst other input; on their own they just give the "ECHO is on." message. ^M is interpreted as ENTER. ^S suppresses the next character. ^Z is the end-of-input character; anything after it on a line appears in the line editor but is not echoed.

    Bored? Moi?

  7. Carlos says:

    jim missed out ^@, which seems to act as an end-of-input-on-the-current-line character, then prompts for more input on the next line.  e.g.:

    C:\>echo hello^@world

    More? dolly

    hellodolly

  8. andy says:

    jim’s comment about ^Z reminded me of the article "Using the echo command to remember what you were doing." (http://blogs.msdn.com/oldnewthing/archive/2004/04/29/123012.aspx).

    Instead of pressing the "home"-button and then typing "echo " you can just press the "home"-button and type ^Z (i.e. press CTRL+Z) and get the same effect. 4 keys less to type :)

  9. Mike Dimmick says:

    Nish: ASCII 1 is not ☺, it is Start Of Heading (SOH). It may be ☺ in IBM codepage 437, but that’s a different thing entirely. Whether the ‘C0 Controls’ displayed as those symbols or performed their control function depended on which API you were using to display text. If you use the raw ‘display a character’ BIOS API or write directly into the display buffer, you get the display character; if you use the ‘display a string’ API the character is interpreted.

    I wrote a library to help port from Symbol Series 3000 (DOS-based with an extended largely-IBM-compatible BIOS) to Windows CE. My implementation of the ‘display a string’ API currently doesn’t emulate a teletype, so character code 7 produces a bullet rather than a beep. The C0 controls weren’t actually used much in Series 3000 programs so we tend to fix the program rather than add teletype support to the library.

  10. Mike Dimmick beat me to explaining ☺, but to Carlos: CTRL-@ is equal to character code 0 (zero), which is ASCII NUL. The use of zero in C as a string terminator might have something to do with the behaviour you’re seeing… or it might not :-)

    If you look at an ASCII table, you can see that @ is character 64, and the action of the CTRL key is (nominally) to reset the sixth bit, so that CTRL-@ -> 0 == NUL, CTRL-A -> 1 == SOH, CTRL-G -> 7 == BEL  and so forth:

    http://www.asciitable.com/

  11. Ben Cooke says:

    Somehow I had completely forgotten that there were glyphs in those control characters in the olden days. You had me scratching my head for a few minutes thinking "why would a BEL be a bullet?!".

    I’m sure lots of people of a suitable age remember making silly little demos/games involving those smiley face characters, the card symbols and the musical note. Those arrow characters were quite useful for scroll bars, too. I guess it would have been a waste not to use those characters for glyphs too, since those <32 values could be written to video memory just fine.

  12. SM says:

    (jim) >>"^J seems to be ignored completely …^M is interpreted as ENTER."

    That is interesting.  The ^J should be a Linefeed, and ^M should be a Carriage Return. Of course, with windows files, a line ending is noted with the CR LF combination.  I guess the enter key in cmd.exe only sends a CR?  

  13. Nick says:

    So ‘echo ^D’ prints a diamond. D for diamond, it all makes sense now!

    Regarding andy’s comment, if I want to save what I’ve typed at a prompt, I usually don’t press HOME at all.  What I do (that works with Bash and CMD) is just press CTRL-C. This cancels and gives me a new prompt, but leaves whatever I had on the previous line intact. Very handy.

    [But it doesn’t go into the command history, which is mighty inconvenient. -Raymond]
  14. Rick C says:

    On the other hand, if I try, also on XPSP2, it copies the bullet *and* beeps.

  15. Nick says:

    [But it doesn’t go into the command history, which is mighty inconvenient. -Raymond]

    Ah, true enough. I didn’t think about that.

    An alternative that does require use of HOME is to type a colon at the start of the line. It gets treated the same as a REM and does stay in the command history.

  16. Norman Diamond says:

    > GetStockFont(OEM_FIXED_FONT)

    I think you need to go into Control Panel and set your system’s default language for non-Unicode programs.  In fact even if your program IS Unicode I think you have to do that setting.  I should read and experiment to see if AppLocale will take care of it.  Anyway just getting the default code page changed doesn’t get the default font changed.

    > Type echo ^A where you actually type Ctrl+A

    > where I wrote ^A. The result:

    The result is a quotation mark.  I think in a command prompt window the command “mode con cp select=” some number will adjust the font together with the code page.

    [Sigh. “On a Windows XP machine in the default configuration for a US-English system.” I assume people are smart enough to figure that out. Are you nitpicking or were you genuinely confused? I can never tell with you. -Raymond]
  17. svark says:

    > Can’t reproduce on XPSP2. I’m copying it into CMD.EXE console window, and echo actually copies the bullet character.

    In the command window properties if the Font is chosen as Lucida it would print a bullet, otherwise if raster fonts is chosen it would sound a bell as Raymond noted.

  18. kiwiblue says:

    > Can’t reproduce on XPSP2. I’m copying it into CMD.EXE console window, and echo actually copies the bullet character.

    In the command window properties if the Font is chosen as Lucida it would print a bullet, otherwise if raster fonts is chosen it would sound a bell as Raymond noted.

    As pointed by Johannes Rössel in 5th reply.

  19. Buck Hodges says:

    Way back near the beginning of development of TFS version control, which was called Hatteras back then,

  20. Igor says:

    Happy New Year to Raymond. Perhaps associating bullet to beep is intentional ;)
