Here is one more –
For those of you using Windows, do the following:
1.) Open an empty notepad file
2.) Type “Bush hid the facts” (without the quotes)
3.) Save it as whatever you want.
4.) Close it, and re-open it. Instead of the text you typed, you will see unreadable characters.
This one is kind of clever, but I’ve read it so many times that now it fails to impress me any longer. Here is why this happens:
Notepad actually looks for a header at the start of the file, generally known as the BOM (byte order mark), which tags the file’s encoding and byte order.
For example, here is the string “Hello” in Unicode (little-endian):
FF FE 48 00 65 00 6C 00 6C 00 6F 00
This is the Unicode (little-endian) encoding with a BOM. The BOM (FF FE) serves two purposes: first, it tags the file as a Unicode document, and second, the order in which the two bytes appear indicates that the file is little-endian.
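To make the byte layout above concrete, here is a small Python sketch that builds the same bytes and then checks for a BOM. The helper names (encode_utf16le_with_bom, detect_bom) are made up for illustration, and this is not how Notepad itself is implemented; it only shows the idea of looking at the first few bytes.

```python
import codecs

def encode_utf16le_with_bom(text):
    """Return text as UTF-16 little-endian bytes, prefixed with the FF FE BOM."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def detect_bom(data):
    """Return an encoding name if data starts with a known BOM, else None.

    Hypothetical helper: only UTF-16 (LE/BE) and UTF-8 BOMs are checked here.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8"
    return None

raw = encode_utf16le_with_bom("Hello")
hex_dump = " ".join(f"{b:02X}" for b in raw)
print(hex_dump)               # FF FE 48 00 65 00 6C 00 6C 00 6F 00
print(detect_bom(raw))        # utf-16-le
print(detect_bom(b"Hello"))   # None: plain ASCII bytes carry no BOM
```

A file saved as plain ANSI text has no BOM at all, which is exactly the case where a program has to start guessing.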
If a BOM is found then we are good, but if it’s not there, Notepad falls back on an API called IsTextUnicode. This API uses various statistical and deterministic methods to make its determination, under the control of flags passed in the lpi parameter. When the function returns, the results of those tests are reported through the same parameter. The IS_TEXT_UNICODE_STATISTICS and IS_TEXT_UNICODE_REVERSE_STATISTICS tests use statistical analysis, and these tests are not foolproof, so sometimes the guess is wrong. I agree it’s not an ideal scenario, but apparently we don’t have too many choices. Raymond explains it in great detail in “Some files come up strange in Notepad” and “The Notepad file encoding problem, redux”.
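To see what a wrong guess looks like, here is a short Python sketch that takes the BOM-less bytes of “Bush hid the facts” and reads them as if they were little-endian Unicode. It does not call or reimplement IsTextUnicode; it only shows the result of reading the same bytes under the wrong encoding. The variable names are illustrative.

```python
# The text as Notepad would save it in plain ANSI: 18 bytes, no BOM.
ansi_bytes = "Bush hid the facts".encode("ascii")

# Read the same bytes as UTF-16 little-endian: each pair of bytes
# becomes one 16-bit code unit, so 18 bytes turn into 9 characters.
misread = ansi_bytes.decode("utf-16-le")

for ch in misread:
    print(f"U+{ord(ch):04X}")

# Every resulting code point lands in the CJK Unified Ideographs
# block (U+4E00 to U+9FFF), so the text shows up as Chinese characters.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)
print(len(ansi_bytes), len(misread), all_cjk)
```

The even byte count matters here: with an odd number of bytes the data could not be split cleanly into 16-bit units.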
Also, have a look at the following: