What is the deal with the ES_OEMCONVERT flag?


The ES_OEMCONVERT edit control style is a holdover from 16-bit Windows. This ancient MSDN article from the Windows 3.1 SDK describes the flag thus:

ES_OEMCONVERT causes text entered into the edit control to be converted from ANSI to OEM and then back to ANSI. This ensures proper character conversion when the application calls the AnsiToOem function to convert a Windows string in the edit control to OEM characters. ES_OEMCONVERT is most useful for edit controls that contain filenames.

Set the wayback machine to, well, January 31, 1992, the date of the article.

At this time, the predominant Windows platform was Windows 3.0. Windows 3.1 was still a few months away from release, and Windows NT 3.1 was over a year away. The predominant file system was 16-bit FAT, and the relevant feature of FAT of this era for the purpose of this discussion is that file names were stored on disk in the OEM character set. (We discussed the history behind the schism between the OEM and ANSI code pages in an earlier article.)

Since GUI programs used the ANSI character set, but file names were stored in the OEM character set, the only characters that could be used in file names from GUI programs were those that exist in both character sets. If a character existed in the ANSI character set but not the OEM character set, then there would be no way of using it in a file name; and if a character existed in the OEM character set but not the ANSI character set, the GUI program couldn't manipulate it.

The ES_OEMCONVERT flag on an edit control ensures that only characters that exist in both the ANSI and OEM character sets are used, hence the remark "ES_OEMCONVERT is most useful for edit controls that contain filenames".
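
To make the round trip concrete, here is a minimal Python sketch. It assumes code page 1252 as the ANSI code page and code page 437 as the OEM code page (a common pairing on US-English systems), and the helper names are made up for illustration. It is not how the edit control is implemented. As far as I know, the real conversion substitutes a close match for a character the OEM code page lacks, whereas this sketch merely reports which characters would not come back unchanged.

```python
ANSI_CODE_PAGE = "cp1252"  # assumption: Western European ANSI code page
OEM_CODE_PAGE = "cp437"    # assumption: US OEM code page


def survives_round_trip(ch):
    """Return True if ch is unchanged by ANSI -> OEM -> ANSI."""
    try:
        ansi_bytes = ch.encode(ANSI_CODE_PAGE)
        # ANSI -> OEM: fails if the OEM code page lacks the character.
        oem_bytes = ansi_bytes.decode(ANSI_CODE_PAGE).encode(OEM_CODE_PAGE)
        # OEM -> ANSI: compare against the bytes we started with.
        return oem_bytes.decode(OEM_CODE_PAGE).encode(ANSI_CODE_PAGE) == ansi_bytes
    except UnicodeError:
        # The character is missing from at least one of the two code pages.
        return False


def lossy_characters(text):
    """List the characters in text that would not survive the round trip."""
    return [ch for ch in text if not survives_round_trip(ch)]
```

Under those assumptions, an accented "é" survives because it exists in both code pages, while the copyright sign "©" does not, because it exists in code page 1252 but not in code page 437.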

Fast-forward to today.

All the popular Windows file systems support Unicode file names and have for ten years. There is no longer a data loss converting from the ANSI character set to the character set used by the file system. Therefore, there is no need to filter out any characters to forestall the user typing a character that will be lost during the conversion to a file name. In other words, the ES_OEMCONVERT flag is pointless today. It's a leftover from the days before Unicode.

Indeed, if you use this flag, you make your program worse, not better, because it unnecessarily restricts the set of characters that the user will be allowed to use in file names. A user running the US-English version of Windows would not be allowed to enter Chinese characters as a file name, for example, even though the file system is perfectly capable of creating files whose names contain those characters.
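
Both halves of that argument can be checked with a short Python sketch. It again assumes code page 1252 as the ANSI code page and code page 437 as the OEM code page, uses UTF-16 as a stand-in for the file system's Unicode representation, and the function names are hypothetical. It shows that every assigned ANSI byte converts to Unicode and back without loss, and that a Chinese file name is representable in Unicode but in neither of the two legacy code pages.

```python
def ansi_to_unicode_is_lossless(ansi="cp1252"):
    """Check that every assigned byte of the code page round-trips via UTF-16."""
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(ansi)
        except UnicodeDecodeError:
            continue  # byte value is unassigned in this code page
        if text.encode("utf-16-le").decode("utf-16-le").encode(ansi) != raw:
            return False
    return True


def representable(name, codec):
    """Return True if every character of name can be encoded in codec."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False
```

For example, `representable("文件.txt", "utf-16-le")` is true, while the same call with `"cp1252"` or `"cp437"` is false, which is exactly the restriction the flag would impose on the user.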

Comments (14)
  1. michkap says:

    Though there are people who speak fluent question mark, a dialect created by non-Unicode applications.

  2. Rosyna says:

    So I wonder, why isn’t this flag just ignored on Windows 5.x? If it can only do bad things and is pointless, I mean?

  3. Matt Green says:

    There may be legacy applications that are expecting filenames in the OEM format, and simply disabling the flag could break them.

  4. binaryc says:

    So I guess the next question is why doesn’t MSDN have "Do not use this flag" in bright red text?

  5. There are still valid uses for the flag, as the current documentation points out: This style is most useful for edit controls that contain file names that will be used on file systems that do not support Unicode.

  6. I can easily imagine a situation in which this flag would still be useful.

    If your application created a boot floppy, where the operating system on the floppy used FAT16 (perhaps because it has no concerns about Y2K), and you wanted to make sure files added to the boot floppy did not have invalid file names, you might want to use such a flag.

    Although, with the ubiquity of CD-ROM burners, one would hope floppies are on their way out.

  7. Miles Archer says:

    Ok. So will this flag be supported forever? Does that make sense to you? It doesn’t to me. There should be some way to drop a feature that has long since outlived its usefulness.

    I understand and agree with the logic not to break legacy programs, but if there is no way to drop useless items, eventually there will be nothing left but useless items.

  8. Mongo says:

    The only way to drop a feature that’s no longer useful is to, first of all, drop everything that still uses that feature even if it offends you.

    There are still old applications that will break if that flag suddenly starts getting ignored.

    On the other hand, a simple #ifdef IGNORE_DEPRECATED_FEATURES in the platform SDK header files around some of these things would be a good way to encourage new code to do things a better way.

    (Probably not exactly a trivial change to make, but it seems like a good idea anyway.)

  9. Cheong says:

    One "maybe" related program.

    We have a VB6 program that "shell executes" the WGET program to synchronize a remote directory via FTP (with the CHT charset on the local WinXP machine and CHS on the remote Win2003 SBS machine). We found that the filenames downloaded with the -m option become garbage characters.

    Perhaps it would be better if we restricted the filenames to the English charset. (I know that we can do it inside the program, anyway.)

  10. Norman Diamond says:

    > All the popular Windows file systems support

    > Unicode file names and have for ten years.

    True.

    > There is no longer a data loss converting

    > from the ANSI character set to the character

    > set used by the file system.

    False. There are cases where IE, running under Windows 2000, viewed a page on Microsoft’s site (or others) and saved the page to a disk file (in VFAT16 or FAT32), after which applications in Windows 2000 (IE or others) could open the file but applications in Windows 98 couldn’t. This is even when Windows 2000 and Windows 98 were the same language version.

    It is also rather trivial to connect an external hard drive to a machine running Windows 98 or 2000 or ME or XP and create a file, then connect the drive to a machine running a different language version of Windows 98 or ME and be unable to access the file. So we could almost say that Windows systems for the past 5 years instead of 10 avoid the problem. Almost, except for ME.

  11. binaryc says:

    Does ME really count as an operating system?

  12. nikos says:

    > There is no longer a data loss converting

    > from the ANSI character set to the character

    > set used by the file system

    that’s only for unicode executables

    take any ansi file manager browsing folder with non-english filenames (e.g. greek) and they will appear filled with question marks. Something that easily escapes notice unless you are supplying code to non-english locales!

  13. Nikos: A valid but irrelevant point. The data loss you’re seeing is not in the conversion from ANSI to Unicode, the file system character set. Rather, it’s the conversion in the opposite direction that is lossy.

  14. Bryan says:

    Mongo:

    > On the other hand, a simple #ifdef

    > IGNORE_DEPRECATED_FEATURES in the platform SDK

    > header files around some of these things would

    > be a good way to encourage new code to do

    > things a better way.

    FWIW, Gtk+ and some of its dependent libs (atk, gdk, perhaps a few of the others) have something exactly like this. Some releases deprecate some widgets (for example, the GtkTreeView has obsoleted several old widgets, like the GtkCList/GtkCTree), and if you #define GTK_DISABLE_DEPRECATED before you #include the main Gtk+ header file (or you use -DGTK_DISABLE_DEPRECATED on the gcc command line), then you won’t get the definitions for those widgets’ structs, or prototypes for the supporting functions.

    The functions and widgets are still available in the library so it doesn’t break binary compatibility, but programs that define <whatever>_DISABLE_DEPRECATED won’t be able to use them.
