Why is the OEM code page often called ANSI?


It has been pointed out that the documentation for the cmd.exe program says

/A Causes the output of internal commands to a pipe or file to be ANSI

even though the output is actually in the OEM code page. Why do errors such as this persist?

Because ANSI sometimes means OEM.

The "A" versions of the console functions accept characters in the OEM code page despite the "A" suffix that would suggest ANSI. What's more, if you call the SetFileAPIsToOEM function, then "A" functions that accept file names will also interpret the filenames in the OEM code page rather than the ANSI code page.
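
To see why it matters which 8-bit code page an "A" function assumes, here is a minimal sketch in Python. It assumes a US-English system, where the OEM code page is 437 and the ANSI code page is 1252; other locales use other pairs. It calls no Windows API and only shows how the same text and the same bytes differ between the two code pages.

```python
# Assumption: US-English defaults (OEM = 437, ANSI = 1252).
# Other language versions of Windows use different code page pairs.
OEM = "cp437"
ANSI = "cp1252"

text = "café"

# The same string produces different bytes in each code page.
oem_bytes = text.encode(OEM)    # what a console "A" function would expect
ansi_bytes = text.encode(ANSI)  # what most other "A" functions would expect

print("OEM :", oem_bytes.hex(" "))
print("ANSI:", ansi_bytes.hex(" "))

# Reading ANSI bytes as if they were OEM does not fail.
# It quietly yields a different character instead.
misread = ansi_bytes.decode(OEM)
print(misread)
```

Only the accented character differs between the two encodings, which is why this class of bug tends to go unnoticed in plain ASCII test data.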

"There are two types of people in the world: Those who believe that the world can be divided into two types of people, and those who do not."

There are those who mentally divide the world of characters into two groups: Unicode and 8-bit. And as you can see, many of them were involved in the original design of Win32. There are "W" functions (Unicode) and "A" functions (ANSI). There are no "O" functions (OEM). Instead, the OEM folks got lumped in with the ANSI folks.

There are also those who realize the distinction, but out of laziness or convenience often use "ANSI" as an abbreviation for "an appropriate 8-bit character set, determined from context". In the context of console programming, the appropriate 8-bit character set is the OEM character set.

The person who wrote the online help for cmd.exe clearly meant ANSI to mean "That thing that isn't Unicode."

/A Causes the output of internal commands to a pipe or file to be ANSI
/U Causes the output of internal commands to a pipe or file to be Unicode
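
For a program that reads that redirected output, the practical consequence might look like the sketch below. It assumes that /U output is UTF-16LE and that the OEM code page is 437 (true on US-English systems, not in general). The helper decode_cmd_output is hypothetical, written for illustration; it is not something cmd.exe or Windows provides.

```python
def decode_cmd_output(data: bytes, used_u_switch: bool,
                      oem_codec: str = "cp437") -> str:
    """Decode bytes captured from cmd.exe internal commands.

    Hypothetical helper. Assumes /U means UTF-16LE and that /A
    (the "ANSI" switch) really means the OEM code page.
    """
    codec = "utf-16-le" if used_u_switch else oem_codec
    return data.decode(codec)


name = "résumé.txt"

# Simulated captures; no cmd.exe is actually run here.
captured_a = name.encode("cp437")      # what /A would plausibly produce
captured_u = name.encode("utf-16-le")  # what /U would plausibly produce

print(decode_cmd_output(captured_a, used_u_switch=False))
print(decode_cmd_output(captured_u, used_u_switch=True))

# Taking the help text literally and decoding /A output as the
# ANSI code page (1252 here) garbles the accented characters.
literal_reading = captured_a.decode("cp1252")
print(literal_reading)
```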

I'll leave you to decide whether this author belongs to the "Everything is either Unicode or ANSI" camp or the "just being casual" camp.

Related: Keep your eye on the code page.

Comments (15)
  1. DrPizza says:

    You’re all wrong.

    "Everything is Unicode."

  2. jeff robertson says:

    Maybe the author was of the "OEM is called ANSI because the OEM shading and box-drawing characters are frequently used in the ANSI ART scene" camp.

  3. Chris Becke says:

    nonono. it was the ASCII character art scene.

  4. Mihai says:

    I think the help is at least confusing.

    See also http://blogs.msdn.com/michkap/archive/2005/06/29/433669.aspx

    What I think would be the right thing to do: change the help to say OEM and add an extra switch to produce "real" ANSI.

    But let’s hope Monad will come and fix everything :-)

  5. Ben Hutchings says:

    It’s not as if "ANSI" or "OEM" mean what they say, anyway. It would perhaps be more accurate to say that the "ANSI" APIs deal with byte strings. Then call the "ANSI" and "OEM" code pages the "default" and "DOS-compatible" code pages respectively, or something. Though I realise this is not going to happen, because when it comes to the details, backward compatibility always trumps usability.

  6. There are 10 types of people : Those who understand binary and those who don’t.

    Sorry, couldn’t resist ;-)

  7. foxyshadis says:

    ANSI art meant those trippy animated color demos you could get by loading ansi.sys (text-mode screen manipulation routines) and a few batch files.

  8. Nick Lamb says:

    "Everything is Unicode."

    I hear that. ASCII, ISO 8859-1 and codepage 1252 are just poor encodings of Unicode that, like UCS-2, are restricted to a subset of the full range. Maybe Microsoft can bend the "no changes to existing code pages" rule one more time to squeeze in a Byte Order Mark and make it official.

  9. Chris Becke says:

    The next question is, what IS the ansi code page?

    Articles like this imply there is just one true ANSI code page, which I thought was the case until string conversion problems on Far East systems led me to discover GetACP and its implications.

  10. Norman Diamond says:

    > The person who wrote the online help for cmd.exe clearly meant ANSI to mean "That thing that isn’t Unicode."

    Which means that the person who wrote the online help didn’t know what he/she meant, since there are two[*] of those things that aren’t Unicode, and how can we know which of those two is "that thing".

    Meanwhile the posted evidence makes it pretty obvious that the help does need fixing.

    [* Though in Microsoft’s Japanese code pages and maybe some others, there’s only one.]

  11. dave says:

    Since everything is Unicode (c’mon people, it’s been around for 15 years now), I propose that the terminal ‘A’ in function names be redefined to mean ‘Archaic character representation’.

  12. dos says:

    CP850 4-ever!

  13. Norman Diamond says:

    Friday, October 28, 2005 3:30 AM by Chris Becke

    > The next question is, what IS the ansi code page?

    Depending on your needs at the moment, it could be the one specified in a particular function call, it could be the one that was most recently set for the active thread, it could be the one that the user has set as their default, it could be the one that the administrator has set as the machine’s default, or it could be the one that Microsoft set as the default for whichever language version of Windows they put in the package.

    > Articles like this imply there is just one true ANSI code page

    There’s just one of interest at the moment when a program is calling for a conversion to or from Unicode, or writing to a DOS window, etc.

    There are two of interest at a moment when a program is calling for a conversion from one code page to another, if that’s possible.

    There’s a list of interest at a moment when a program is presenting a list for the user to choose from.

  14. Sam says:

    "There are two kinds of people, those who finish what they start and so on." – Robert Byrne
