Why is the default console codepage called “OEM”?


Last year, we learned that the ANSI code page isn’t actually ANSI. Indeed, the OEM code page isn’t actually OEM either.

Back in the days of MS-DOS, there was only one code page, namely, the code page that was provided by the original equipment manufacturer in the form of glyphs embedded in the character generator on the video card. When Windows came along, the so-called ANSI code page was introduced and the name “OEM” was used to refer to the MS-DOS code page. Michael Kaplan went into more detail earlier this year on the ANSI/OEM split.

Over the years, Windows has relied less and less on the character generator embedded in the video card, to the point where the term “OEM character set” no longer has anything to do with the original equipment manufacturer. It is just a convenient term to refer to “the character set used by MS-DOS and console programs.” Indeed, if you take a machine running US-English Windows (OEM code page 437) and install, say, Japanese Windows, then when you boot into Japanese Windows, you’ll find that you now have an OEM code page of 932.
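
To make the difference concrete, here is a minimal sketch in Python (not a Windows API call) that decodes the same two bytes under each of the code pages mentioned above. The codec names "cp437", "cp1252", and "cp932" are Python's names for these code pages, and the sample bytes 0x82 0xA0 are an arbitrary choice for illustration.

```python
# The same two bytes, interpreted under three different code pages.
raw = bytes([0x82, 0xA0])

as_oem_us = raw.decode("cp437")    # US-English OEM code page
as_ansi_us = raw.decode("cp1252")  # US-English "ANSI" code page
as_oem_jp = raw.decode("cp932")    # Japanese OEM code page

for name, text in [("cp437", as_oem_us), ("cp1252", as_ansi_us), ("cp932", as_oem_jp)]:
    # Print code points rather than glyphs so the output does not
    # depend on what the console itself is able to display.
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

Under code page 437 the bytes come out as two accented Latin letters, under 1252 as a low quotation mark followed by a no-break space, and under 932 as a single hiragana character, which is why a console program has to know which OEM code page is in effect.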

Comments (7)
  1. Great! I’ve always wondered why Microsoft uses the terms "ANSI" and "OEM", seemingly ignoring the fine registry at http://www.iana.org/assignments/character-sets

    Could you also please explain where the term "code page" came from? I find it a bit confusing, and prefer "character set" and "character encoding" as defined by the W3C at http://www.w3.org/International/resource-index

  2. LarryOsterman says:

    Code Page is a term that IBM adopted for PC-DOS 3.3 when they ported their NLS system from their mainframes to PC-DOS (IBM did all the development work for PC-DOS 3.3 and PC-DOS 4.0).

    I’m not sure where IBM got the phrase from.

  3. Mihai says:

    For Christoffer:

    >> prefer "character set" and "character encoding"

    "Code page" does not mean the same thing as "character set" or "character encoding" (which are also not the same as each other).

    Code page matches "coded character set" in the UNIX world.

    You can talk about the "Latin script" or the "Cyrillic character set". That is just a collection of characters, with no numbers associated. Mac and Windows can use different char-to-number mappings (code pages/coded character sets) for the same character set (charset).

    This is why for a font you select the charset.

    Once you map characters to numeric values, it becomes a "coded character set" (or code page).

    Then you might have several ways to represent the same code page. These are "encodings". UTF-7, UTF-8, UTF-16, UTF-32, "Java escaped (\u3213)", MIME, and Base-64 are different encodings of the same coded character set, just as decimal, hex, binary, and octal are different representations of the same number.

    For a long time the Unicode Consortium confused "coded character set" with "encoding"; now they are starting to fix it.

  4. stu says:

    "Back in the days of MS-DOS, there was only one code page, namely, the code page that was provided by the original equipment manufacturer in the form of glyphs embedded in the character generator on the video card."

    Maybe in the really early days of MS-DOS, but by DOS 5 at least and probably before, there were loadable code pages.

  5. Cheong says:

    "Maybe in the really early days of MS-DOS, but by DOS 5 at least and probably before, there were loadable code pages."

    Yes. It seems that "country.sys" and the "nlsfunc" command are available only from MS-DOS 3.3 onward.

  6. Matthew Lock says:

    Does anyone know why Japanese DOS uses the yen sign as the path separator rather than the backslash?
