We have some “best-fit” behavior that we generally consider “bad”. Any loss of data is generally a bad thing, so we recommend storing data in Unicode (so you don’t lose anything). Assuming you can’t use Unicode, why is it so bad to just make everything ASCII-like? Maybe you work with a publishing house or direct marketing firm that can’t handle Unicode, so you’ll just get rid of those annoying decorations.
In American English, diacritics are effectively quaint decorations. Many people naïvely assume that when Word auto-corrects naive to naïve, this is just a prettiness factor. When they resume spell checking their résumé, the diacritics become more important. In English it’s fair to spell résumé as resume, but it seems cooler to add the accents. Since we stole (borrowed is more politically correct) the word from French, we have a French-like pronunciation of résumé and aren’t likely to confuse it with resume.
In most other languages diacritics aren’t optional. You wouldn’t swap a z for an s in English just because they look similar: “a real singer” is a lot different from “a real zinger”.
Recently I encountered the following example: a user wanted to get around those pesky diacritics by mapping to ASCII.
The suggested input was:
último año de carrera
The desired output was:
ultimo ano de carrera
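To show how little it takes to throw this information away, here is a minimal Python sketch of one common do-it-yourself approach: decompose the text with Unicode NFD normalization and drop the combining marks. This is an assumed technique for illustration only; it is not the Windows best-fit mapping, and `strip_diacritics` is a hypothetical helper name. It produces exactly the requested output, and it also shows the loss: once the accents are gone, año and ano (or résumé and resume) can no longer be told apart.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: a lossy "ASCII-like" mapping.

    Assumed approach (NFD + drop combining marks), not Windows best-fit.
    """
    # NFD splits "ú" into "u" + U+0301 COMBINING ACUTE ACCENT
    # and "ñ" into "n" + U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters whose canonical combining class is zero,
    # i.e. discard the accents and tildes that NFD split off.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("último año de carrera"))  # ultimo ano de carrera
    # Distinct words collapse to the same string; the difference is unrecoverable.
    print(strip_diacritics("año") == strip_diacritics("ano"))  # True
```

Nothing in the output records that an accent was ever there, so there is no way to map the result back to the original.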
My Spanish is nearly non-existent; however, Word’s spell checker tells me these are all legitimate Spanish words, even without the accents. The meaning goes from something like “the last year of the race” to “I completed the anus of the race.”
Now imagine that you’re trying to reach a new market and you do that to your customers’ or potential customers’ names. How long will they remain your customers?