Why can I type a lowercase s with caron with the numeric keypad, but not a lowercase r with caron?


For concreteness, let's assume that you are using 437 as your OEM code page (which as we all know is not actually provided by the OEM) and 1252 as your ANSI code page (which as we all know is not actually the product of the American National Standards Institute).

You can use Alt+0154 to type a Latin small letter s with caron because position 154 in code page 1252 is the Latin small letter s with caron. On the other hand, lowercase r with caron does not exist in code page 1252, nor does it exist in code page 437, so if you want to type that character, you're out of luck. The Alt+nnn sequence lets you type characters from the OEM code page, and Alt+0nnn lets you type characters from the ANSI code page, but if the character you want is in neither of them, then those sequences aren't going to help you.
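To make the rule concrete, here is a small, simplified model of it in Python. It is a sketch, not how Windows actually implements keypad entry. The helper names `alt_code_to_char` and `reachable_by_alt_code` are hypothetical. The sketch assumes cp437 as the OEM code page and cp1252 as the ANSI code page, as above, and only handles values 0 through 255. It shows that Alt+0154 lands on š (U+0161), that the same digits without the leading zero land on a different character from the OEM code page, and that ř (U+0159) cannot be encoded in either code page.

```python
# Simplified, hypothetical model of the keypad rule described above:
# Alt+nnn picks a byte from the OEM code page, Alt+0nnn from the ANSI code page.
# Assumes OEM = cp437 and ANSI = cp1252; only values 0-255 are modeled.

OEM_CODEPAGE = "cp437"
ANSI_CODEPAGE = "cp1252"


def alt_code_to_char(digits: str) -> str:
    """Map the digits typed after Alt to a character (values 0-255 only)."""
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only models values 0-255")
    # A leading zero selects the ANSI code page; otherwise use the OEM one.
    codepage = ANSI_CODEPAGE if digits.startswith("0") else OEM_CODEPAGE
    # Unassigned bytes (e.g. 0x81 in cp1252) raise UnicodeDecodeError.
    return bytes([value]).decode(codepage)


def reachable_by_alt_code(ch: str) -> bool:
    """True if ch exists in either the OEM or the ANSI code page."""
    for codepage in (OEM_CODEPAGE, ANSI_CODEPAGE):
        try:
            ch.encode(codepage)
            return True
        except UnicodeEncodeError:
            pass
    return False


# Print code points rather than the characters themselves, so the output
# does not depend on the console's own encoding.
print(f"U+{ord(alt_code_to_char('0154')):04X}")  # U+0161, s with caron (cp1252)
print(f"U+{ord(alt_code_to_char('154')):04X}")   # U+00DC, U with diaeresis (cp437)
print(reachable_by_alt_code("\u0161"))           # True: s with caron is in cp1252
print(reachable_by_alt_code("\u0159"))           # False: r with caron is in neither
```

Choosing the code page from the leading zero mirrors the distinction between Alt+nnn and Alt+0nnn in the paragraph above. The encode check is simply "is this character in either table?", which is the whole reason ř is out of reach.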

(As an experiment, I didn't write any motivating discussion. It's actually easier for me because coming up with a narrative to accompany a dry technical article is hard work. If I don't have to do it, so much the better for me.)

[Raymond is currently away; this message was pre-recorded.]

Comments (23)
  1. TJ says:

    I had VS2008 open when I read this, and out of pure stupidity I typed Alt+0154 as a variable name, and it accepted it. I suppose it should, since it’s a valid character, but I didn’t expect that.

    I could really be mean to other programmers now.

  2. pcooper says:

    If this is an experiment, how will you determine its success or failure?

  3. Matteo Italia says:

    By the number of random "Vista sucks" posts; the fewer, the better.

  4. Marquess says:

    So that’s where the leading zero comes from. Of course, this raises the question: How do I enter arbitrary Unicode characters? (BMP would suffice for now)

    In unrelated news: Most modern language standards allow Unicode characters in identifiers. Ada 2005 defines the constant π in one of its libraries, for example.

    pcooper is right. Where’s your control group?

  5. LongtimeListenerFirstTimeCaller says:

    What the heck is going on and why should I care?

  6. Leo Davidson says:

    Apparently there is a registry key which enables Unicode values to be entered using Alt sequences, although I could not get it to work in Windows 7 (perhaps things have changed, or maybe it needs a restart which I did not try):

    http://www.fileformat.info/tip/microsoft/enter_unicode.htm

    Found via here:

    http://en.wikipedia.org/wiki/Unicode_input#Hex_input

    Both pages list some other methods, too.

  7. Gabe says:

    For those like me who are wondering how to enter the lowercase r with caron if we have the right code page, here’s how: http://www.fileformat.info/info/unicode/char/0159/codepage_support.htm

  8. I miss the motivating discussion already; it’s a big part of what I enjoy about this blog (I may be a pure Linux user and developer, but I still like Raymond’s writing style, and you have to be a foolish zealot to dismiss Microsoft completely).

  9. Marquess says:

    One reboot later (crazy Windows Update …), I can confirm the hex input method in Windows 6.1. Nice.

  10. Alexandre Grigoriev says:

    "(which as we all know is not actually provided by the OEM) and 1252 as your ANSI code page (which as we all know is not actually the product of the American National Standards Institute)."

    I used to see a lot of cases where Microsoft docs dutifully expanded those abbreviations. It seems they don’t do it much anymore, which is good. It looks so silly when you see that.

  11. Bahbar says:

    @TJ: I don’t know which language you used, but it’s not valid standard C/C++. The only characters officially allowed are a subset of ASCII (and yes, Visual C++ lets you use a wider set).

  12. Guillaume says:

    "What the heck is going on and why should I care?"

    Have a good trip, Raymond !  ;)

  13. laonianren says:

    @Bahbar: you seem to be confusing "standard" and "portable".

    The source character set for C++ is implementation defined.  But if you want portability you should only use characters from the "basic source character set", a list of most of the characters that can be encoded with ASCII.

    Note that the basic source character set is not a character encoding; it’s just a list of characters.  So it’s not a subset of ASCII, which is an encoding.

    </pedantry>

  14. Josh Smeaton says:

    re: Experiment (and focusing on the ‘side discussion’ not the technical one) –

    The difference between having a story or motivating discussion to accompany a dry technical article just happens to be the difference between a blog post and an MSDN documentation page. The reason I read these posts is for the fusion, though I totally get the apprehension of doing so.

  15. Steve D says:

    Experiment feedback: I didn’t mind the lack of a setup, but perhaps not for every entry. So you could consider it a nice-to-have but not obligatory.

  16. Cheong says:

    Supplementary note: If you want to type characters in other ranges, you may install the "Chinese (Unicode)" input method and use it instead. (Not sure if such an input method is available in other languages…)

  17. Worf says:

    Hrm… what would be really mean is to use one of the many lookalike characters from Unicode in a variable name… people will never figure out why their code won’t compile. Copy-and-paste works, but no one knows why typing it out doesn’t.

  18. Anonymous Coward says:

    I think the set up for this story is completely unbelievable.

    As for variable names with Czech accents, this is perfectly possible in VB. In C too, it should be doable even when the compiler borks, simply by letting your editor read and write variable names in a special encoding, or by piping your source through a similar filter before compiling.

  19. MatusHorvath says:

    Hmm, actually you said you deleted the motivational part, so I assume you did write it. I would enjoy reading it, as I enjoy everything you write, maybe a bit more since I am almost Czech :).

  20. tb says:

    Bummer. :(

    The motivating discussions are part of what I really enjoy about your articles. The meat and potatoes (core tech details) are good and essential, but the spice (flavor text) makes it that much more enjoyable.

  21. Neil says:

    Interestingly Character Map’s "Characters to copy" field allows you to type Alt+0345 for ř.

  22. Random832 says:

    @Bahbar – Standard C allows certain ranges of "universal character names" (i.e. Unicode characters) to appear in identifiers, and there is a paragraph that strongly implies that a definite mapping must exist from characters present in the source character set to universal character names. A brief skim of the C++ standard shows it seems to have the same language [actually, it’s more explicit on the point that a mapping must exist]. See Annex E for which characters are allowed.

    @Neil "Interestingly Character Map’s "Characters to copy" field allows you to type Alt+0345 for ř." – yeah, that seems to be a RichEdit feature. Wordpad does the same thing.

  23. Ian Boyd says:

    "Why can I type a lowercase s with caron with the numeric keypad, but not a lowercase r with caron?" sounded like enough of a motivating discussion to me.

    I can’t think of anything more that could be said, but then again I’m not very entertaining.
