Comments (12)
  1. Ben Hutchings says:

    Why do Microsoft employees call the system’s single-byte or multi-byte character encoding "ANSI" when it is never (AFAIK) an ANSI standard encoding?

  2. Steve Sheppard [MSFT] says:

    I always call ’em ASCII….

  3. Steve Sheppard [MSFT] says:

    Sorry, I misread your post. I thought you were just referring to single byte.

  4.

    > Why do Microsoft employees call the system’s

    > single-byte or multi-byte character

    > encoding "ANSI" when it is never (AFAIK) an

    > ANSI standard encoding?

    Because it’s ISO-8859-1, which was created around 1987 by ISO/ANSI.

  5. J. Edward Sanchez says:

    Actually, Windows Latin 1 (a.k.a. Windows-1252, or CP1252) is different from ISO-8859-1; it contains 27 printable glyphs in the 80h-to-9Fh range, where ISO-8859-1 contains nonprintable control codes.

    The characters defined in ISO-8859-1 correspond exactly to the first 256 Unicode code points, while the Windows Latin 1 characters in the 80h-to-9Fh range correspond to Unicode code points scattered all over the place.
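    The difference described above can be checked with a minimal sketch. It assumes Python's built-in `cp1252` and `latin-1` codecs are acceptable stand-ins for Windows-1252 and ISO-8859-1; the helper name `decode_or_none` is hypothetical.

```python
# Sketch: compare how Python's "cp1252" and "latin-1" codecs treat
# the byte range 80h-to-9Fh. Python's codecs are used here as an
# approximation of the Windows and ISO tables, not as the tables themselves.

def decode_or_none(byte_value, encoding):
    """Decode a single byte, returning None if the codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None

# Bytes 0x80..0x9F: printable in Windows-1252, control codes in ISO-8859-1.
high_range = range(0x80, 0xA0)
cp1252_chars = {b: decode_or_none(b, "cp1252") for b in high_range}
defined = {b: ch for b, ch in cp1252_chars.items() if ch is not None}

# ISO-8859-1 maps every byte value to the Unicode code point of the same number.
latin1_identity = all(
    ord(bytes([b]).decode("latin-1")) == b for b in range(256)
)

print(len(defined))                 # how many of the 32 bytes cp1252 defines
print(latin1_identity)              # whether latin-1 is an identity mapping
print(hex(ord(defined[0x80])))      # where cp1252 sends byte 0x80
```

    In this codec, byte 80h decodes to the euro sign at U+20AC, far outside the first 256 code points, which is the "scattered all over the place" behaviour in question.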

  6. Actually we typically call them ANSI because the actual interpretation of CHAR * strings is subject to CP_ACP ("the ANSI code page").

    Why is it called the ANSI code page? I dunno. I feel fortunate to have avoided the whole 16-bit era myself except for a few questions when I interviewed.

    ASCII is a 7-bit character set; and most code assumes that codes in the range 0-127 in a MBCS environment are the ASCII equivalents. I think that this assumption is so widely distributed that this is probably why we don’t have very good MBCS support for encodings where this assumption is not true.

  7. Yuri Khan says:

    > Why is it called the ANSI code page?

    Better to ask why *they* are called the ANSI code pages, because there isn’t one fixed “ANSI” code page; there are lots, and which one applies depends on the locale.

  8. Norman Diamond says:

    Base note:

    > If your 32-bit DLL contains strings longer

    > than 255 characters, then 16-bit programs

    > would be unable to read those strings.

    You mean 255 bytes. Depending on the actual characters, anywhere from 128 to 255 of them might be too many for a 16-bit program (when using these APIs). Microsoft still confuses "character" with "byte" too often. Now wait right there, you’re not getting off that easily.

    3/19/2004 12:45 PM Steve Sheppard [MSFT]:

    > I always call ’em ASCII….

    Only one ANSI code page is ASCII. The other ANSI code pages are not ASCII. I hope Mr. Chen gives you a stern lecture as soon as he finishes giving himself one.
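    The bytes-versus-characters point in this comment can be illustrated with a small sketch. It assumes Python's `cp932` codec as a stand-in for a double-byte ANSI code page; the 255-byte limit comes from the quoted base note, while `LIMIT_BYTES` and `fits` are hypothetical names.

```python
# Sketch: a 255-byte limit is not the same as a 255-character limit
# once the code page uses two bytes for some characters.

LIMIT_BYTES = 255  # limit quoted from the base note, counted in bytes

def fits(text, encoding):
    """Return True if the encoded form of text is within the byte limit."""
    return len(text.encode(encoding)) <= LIMIT_BYTES

ascii_text = "a" * 255        # 255 single-byte characters
kana_text = "\u3042" * 128    # 128 hiragana "a", two bytes each in cp932

print(len(ascii_text), len(ascii_text.encode("cp932")), fits(ascii_text, "cp932"))
print(len(kana_text), len(kana_text.encode("cp932")), fits(kana_text, "cp932"))
```

    With all-double-byte text, 128 characters already exceed the limit, while 255 single-byte characters still fit, which matches the "anywhere from 128 to 255" range given above.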

  9. J. Edward Sanchez says:

    The term "ANSI" is commonly used to refer specifically to the Windows Latin 1 code page, which is also known as Windows-1252 and CP1252. It is not to be confused with ISO-8859-1, which, as I mentioned in an earlier post, contains more control codes but fewer printable characters.

    It should also be noted that many Windows code pages are indeed ASCII — or, to be more precise, supersets of ASCII. ASCII is a 7-bit character set, and many Windows code pages (including Latin 1 "ANSI") simply supplement ASCII by adding an eighth bit and up to 128 additional characters.
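    The "superset of ASCII" claim can be tested mechanically. The following is a rough sketch that assumes Python's bundled codecs approximate the corresponding code pages; `is_ascii_superset` is a hypothetical helper, and `cp037` (an EBCDIC code page, not a Windows ANSI code page) is included only as a contrasting case.

```python
# Sketch: check whether the first 128 byte values of a codec decode
# to the same characters as ASCII.

def is_ascii_superset(encoding):
    """Return True if bytes 0..127 decode exactly as they do in ASCII."""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode(encoding) == ascii_bytes.decode("ascii")
    except UnicodeDecodeError:
        # A byte below 128 is undefined in this codec.
        return False

for name in ("cp1252", "cp1251", "cp437", "cp850", "cp037"):
    print(name, is_ascii_superset(name))
```

    The check only covers the low half of each table; it says nothing about what the extra 128 values contain.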

  10. Norman Diamond says:

    3/21/2004 6:12 PM J. Edward Sanchez:

    > The term "ANSI" is commonly used to refer

    > specifically to the Windows Latin 1 code page

    All through MSDN, the term "ANSI code pages" refers to all ANSI code pages.

    > It should also be noted that many Windows

    > code pages are indeed ASCII — or, to be

    > more precise, supersets of ASCII.

    Yup. Many are. Also, many aren’t.

    Of those that aren’t, many come close. Here’s one example: Among all the one-byte and two-byte characters of ANSI code page 932, 126 of the values are officially the same as ASCII values, and one more value is practically the same (no one minds that it displays as a tilde even though officially it’s an overline). For practical purposes only one of the values below 127, and all of the values in several ranges between 128 and 65535, are wildly different from ASCII.

    Meanwhile, "ANSI" doesn’t mean "ANSI code page 437" or "ANSI code page 850" or whichever you had in mind; ANSI code pages still mean all ANSI code pages.

  11. Ben Hutchings says:

    Norman: Code pages 437 and 850 are IBM code pages and can be the "OEM code page" on some machines. If I understand correctly, the "OEM code page" is the one that the BIOS uses and that DOS and NT consoles use by default.

    So far as I know ISO 8859-1 has nothing to do with ANSI – it is based on the DEC Multinational Character Set and Roman Czyborra says it was originally standardised by ECMA.

    ASCII was of course an ANSI (or ASA as it was back then) standard, and Windows "ANSI" code pages are based on ASCII, but then so are the OEM code pages, so that doesn’t explain it either.
