One of the reasons I always suggest "Use Unicode!" is that converting between code pages creates security problems. In short, if data is converted between code pages after some sort of security validation has been done, the conversion can negate that validation. This is true of many data transformations, but it seems to surprise people a lot when it applies to code page conversions.
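As a minimal illustration (Python is used here only for brevity), consider a filter that inspects bytes as ASCII before a later component reinterprets the same bytes as UTF-7, which Windows exposes as code page 65000. The filter function and the payload are hypothetical; only the "ascii" and "utf-7" codecs are Python's standard ones.

```python
def looks_safe(text: str) -> bool:
    # Hypothetical naive filter: reject anything that could open or close an HTML tag.
    return "<" not in text and ">" not in text

# Attacker-supplied bytes. Every byte is plain ASCII.
raw = b"+ADw-script+AD4-"

# Stage 1: the filter inspects the data as ASCII and finds nothing wrong.
checked = raw.decode("ascii")
print(looks_safe(checked))    # True

# Stage 2: a later component decodes the same bytes as UTF-7,
# and the angle brackets appear only now, after the check has already passed.
converted = raw.decode("utf-7")
print(converted)              # <script>
print(looks_safe(converted))  # False
```

The check itself was not wrong for the data it saw; it was run against a different representation than the one that was eventually used.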
There are lots of reasons code page conversions cause this kind of problem, but some are:
A related problem is the IDN and code page parsing that browsers sometimes do. &-prefixed named and numeric entities in HTML (for example &lt; and &#60;, both of which decode to "<") can end up with a different appearance once decoded. % escaping is common in URLs, and IDN xn-- encoding is used in domain names. An application may decode any of these, sometimes at unexpected times, and cause problems if other code assumed the data was still in its earlier, undecoded state.
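Each of the decoding steps described above can turn text that passes a check into text that would fail it. The sketch below uses Python's standard html.unescape, urllib.parse.unquote, and "idna" codec; the filter and the sample strings are hypothetical, and the double-unquote case stands in for an application that decodes "at unexpected times".

```python
import html
from urllib.parse import unquote

def looks_safe(text: str) -> bool:
    # Hypothetical naive filter: reject angle brackets.
    return "<" not in text and ">" not in text

# (encoded form, decoding step some later component might apply)
cases = [
    ("&lt;b&gt;", html.unescape),                    # named HTML entities
    ("&#60;b&#62;", html.unescape),                  # numeric HTML entities
    ("%3Cb%3E", unquote),                            # URL % escaping
    ("%253Cb%253E", lambda s: unquote(unquote(s))),  # % escaping decoded twice
]

for encoded, decode in cases:
    decoded = decode(encoded)
    # The filter accepts the encoded form but would reject the decoded form.
    print(f"{encoded!r}: safe before={looks_safe(encoded)}, "
          f"safe after={looks_safe(decoded)}, decoded={decoded!r}")

# IDN: the ASCII "xn--" label and the Unicode label name the same host,
# so a comparison done on one form says nothing about the other.
print(b"xn--bcher-kva".decode("idna"))  # bücher
```

Note that the doubly escaped %253C only becomes "<" on the second decode, so a check placed between the two decodes sees a harmless %3C.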
So the moral is: do any security tests after all conversions have been done. If you have to retransmit the data, try to use a Unicode encoding such as UTF-8, which has fewer edge-case behaviors that could trip you up. If the data has to be decoded after transmission, revalidate it if possible.
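The advice above can be sketched as a small pipeline: decode everything first, validate the fully converted text, retransmit as UTF-8, and check again on receipt. This is a minimal sketch under stated assumptions, not a complete defense. The function names are hypothetical, looks_safe is a placeholder for real validation rules, and decoding repeatedly until the text stops changing is one possible design choice; it can also alter legitimate data such as a literal "%25".

```python
import html
from urllib.parse import unquote

def looks_safe(text: str) -> bool:
    # Placeholder check; a real application would apply its own rules here.
    return "<" not in text and ">" not in text

def canonicalize(raw: bytes, charset: str) -> str:
    # Hypothetical pipeline: decode from the declared charset first, then
    # undo %-escaping and HTML entities until the text stops changing, so
    # the check sees what an eager downstream decoder would eventually see.
    text = raw.decode(charset)  # strict by default: undecodable bytes raise
    while True:
        decoded = html.unescape(unquote(text))
        if decoded == text:
            return text
        text = decoded

def accept(raw: bytes, charset: str) -> str:
    # Validate only after every conversion has been applied.
    text = canonicalize(raw, charset)
    if not looks_safe(text):
        raise ValueError("rejected after conversion")
    return text

def send(text: str) -> bytes:
    # Retransmit as UTF-8; strict by default, so unencodable input raises.
    return text.encode("utf-8")

def receive(payload: bytes) -> str:
    # Strict UTF-8 decoding rejects malformed and overlong sequences
    # (for example b"\xc0\xbc", an overlong "<"), then the data is checked again.
    text = payload.decode("utf-8")
    if not looks_safe(text):
        raise ValueError("rejected after transmission")
    return text

if __name__ == "__main__":
    print(accept(b"hello", "ascii"))
    print(receive(send("bücher")))
    try:
        accept(b"+ADw-b+AD4-", "utf-7")
    except ValueError as err:
        print(err)
```

Because UnicodeDecodeError is a subclass of ValueError in Python, a caller that catches ValueError treats undecodable input and failed validation the same way, which is usually the safer default.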