Dealing with control/junk characters in the message body using code

03/28/2018

Emails, meeting invitations, and NDR messages sometimes contain control characters. Those characters were most likely added by a sending application or API that was not written properly (because of a bug or an encoding setting issue). It's also possible that something in the transport path of the message altered the body and added such characters. Add-ins on the sending side of an email may also alter the body of a message and introduce issues; an example is an add-in that adds message footer text. Also, please keep in mind that some controls are sensitive to control characters: they may display characters incorrectly, truncate text at a NULL or other control character, or throw an error.

In contrast, if you are seeing more than just a few control characters and the whole message body is scrambled (it looks like gibberish or fake Chinese), then something else may be the culprit. In that case there may be an issue with the language encoding settings in the email or with whatever is being used to view the body.
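As a rough illustration of how an encoding mismatch can produce the "fake Chinese" effect, the sketch below decodes UTF-8 bytes as if they were UTF-16. The sample text and the choice of charsets are assumptions made for the demonstration; they are not taken from a real message.

```python
# Hypothetical sample text: 12 ASCII characters, so 12 bytes in UTF-8.
original = "Hello world!"
raw = original.encode("utf-8")

# Wrong charset: UTF-16-LE treats every pair of bytes as one code unit,
# so ordinary Latin text turns into unrelated (often CJK-looking) characters.
garbled = raw.decode("utf-16-le")

# Right charset: the same bytes decode back to the original text.
recovered = raw.decode("utf-8")

print(repr(garbled))
print(recovered)
```

If the body becomes readable again once the bytes are decoded with a different charset, the problem is the declared or assumed encoding rather than stray control characters.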

Generally, Exchange stores the body it receives as-is: it does not scan for or remove control (i.e. junk) characters. Code that reads bodies should therefore always expect control characters and handle them (usually the action is to remove them). Outlook is a very mature email client that is built to handle many types of bad data, and from what I've seen it usually does a very good job of removing the problem characters when a message is read or forwarded.

The way to make your code reliable is to read the full text and remove the problem characters. You should also consider following up on what added those characters and asking the application's owner to fix their code.
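A minimal sketch of that cleanup step is shown below, assuming the body has already been retrieved as a text string. The function name `strip_control_chars` and the set of characters to keep are illustrative choices, not part of any Exchange or Outlook API. The sketch drops every character in the Unicode "Cc" (control) category except tab, line feed, and carriage return, which are normally legitimate in a message body.

```python
import unicodedata

# Control characters that are normally legitimate in a message body
# (an assumption for this sketch; adjust to your own needs).
ALLOWED = {"\t", "\n", "\r"}


def strip_control_chars(body: str) -> str:
    """Return body without control characters, keeping tab, LF and CR."""
    return "".join(
        ch for ch in body
        # Unicode category "Cc" covers U+0000-U+001F, U+007F and U+0080-U+009F.
        if ch in ALLOWED or unicodedata.category(ch) != "Cc"
    )


# Hypothetical body containing NULL, BEL and ESC characters.
dirty = "Hello\x00 World\x07!\r\nSecond line\x1b"
cleaned = strip_control_chars(dirty)
print(repr(cleaned))
```

Filtering on the Unicode category rather than a hand-written list of code points keeps the check short. If your scenario needs other characters preserved or removed (for example, form feed), change the `ALLOWED` set accordingly.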
