Support non-UTF payloads in Logic App with a conversion Azure Function

02/27/2018

Many cloud services assume that a text payload uses some form of UTF (Unicode) encoding, and some even assume it is UTF-8. As a result, when your text payload uses a different encoding, such as a code page based encoding, characters outside the Basic Latin range get mangled. This is particularly common with flat files, because they integrate with legacy systems that were often not written with Unicode support.
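The following C# sketch shows the mangling concretely. It encodes text with windows-1252 and then decodes those bytes as UTF-8, which is roughly what a UTF-8-assuming service does with a code page payload. It assumes .NET Core 3.0 or later, where code pages such as windows-1252 are only available after you register `CodePagesEncodingProvider`. The `MangleDemo` class and its method name are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper: reproduces what a service that assumes UTF-8 does to
// text that was actually encoded with a legacy code page.
public static class MangleDemo
{
    public static string ReadLegacyBytesAsUtf8(string text, string legacyEncodingName)
    {
        // Needed on .NET Core / .NET 5+ for code page encodings such as windows-1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding legacy = Encoding.GetEncoding(legacyEncodingName);

        // Bytes as a legacy system would produce them: in windows-1252, "é" is the single byte 0xE9.
        byte[] legacyBytes = legacy.GetBytes(text);

        // 0xE9 followed by an ASCII byte is not valid UTF-8, so the decoder substitutes U+FFFD.
        return Encoding.UTF8.GetString(legacyBytes);
    }
}
```

For example, `MangleDemo.ReadLegacyBytesAsUtf8("Rue Beau-Séjour", "windows-1252")` returns a string in which é has been replaced by U+FFFD, the Unicode replacement character.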

To avoid this issue, base 64 encode the non-Unicode text payload as soon as it enters your cloud services. The helper Azure Function below then converts the decoded bytes from any .NET-supported encoding to any other .NET-supported encoding, such as UTF-8 (which I recommend for Azure services). Do the opposite when you need to send a non-Unicode text payload. Keep it as UTF-8 for as long as possible. Then, at the edge of your system, use the same function to convert the base 64 encoded UTF-8 to the required encoding, which the function returns base 64 encoded again. Finally, base 64 decode the function's output and send the resulting bytes out.
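Here is a minimal sketch of the two edge operations. It assumes you already hold the raw bytes read from, or destined for, the legacy system. `EdgeEncoding`, `ToTransportText`, and `FromTransportText` are hypothetical names and are not part of Logic Apps or Azure Functions. Neither method interprets the bytes as text, so no encoding assumption can alter them.

```csharp
using System;

// Hypothetical helpers for the edges of the system. They treat the payload as
// opaque bytes, so nothing in between can reinterpret it as text.
public static class EdgeEncoding
{
    // Inbound edge: wrap the raw legacy bytes as base 64 as soon as they arrive.
    public static string ToTransportText(byte[] rawPayload) =>
        Convert.ToBase64String(rawPayload);

    // Outbound edge: unwrap the base 64 text returned by the conversion function
    // into the exact bytes the partner system expects.
    public static byte[] FromTransportText(string base64Payload) =>
        Convert.FromBase64String(base64Payload);
}
```

For example, the windows-1252 bytes for "Séjour" (0x53 0xE9 0x6A 0x6F 0x75 0x72) travel as `U+lqb3Vy`, and `FromTransportText` returns exactly those six bytes.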

using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Newtonsoft.Json;

namespace ConvertEncoding
{
    public static class Function1
    {
        private const string SupportedEncodingsUrl =
            "https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx";

        [FunctionName("Function1")]
        public static async Task<object> Run([HttpTrigger(WebHookType = "genericJson")]HttpRequestMessage req, TraceWriter log)
        {
            log.Info("Webhook was triggered!");

            // Expected body: { "text": "<base 64 bytes>", "encodingInput": "<name>", "encodingOutput": "<name>" }
            string jsonContent = await req.Content.ReadAsStringAsync();
            dynamic data = JsonConvert.DeserializeObject(jsonContent);

            if (data == null || data.text == null || data.encodingInput == null || data.encodingOutput == null)
            {
                return req.CreateResponse(HttpStatusCode.BadRequest, new
                {
                    error = "Please pass text/encodingInput/encodingOutput properties in the input Json object."
                });
            }

            // This targets Azure Functions v1 (.NET Framework). On .NET Core based runtimes, code pages
            // such as windows-1252 also need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            Encoding inputEncoding;
            try
            {
                string inputEncodingName = data.encodingInput;
                inputEncoding = Encoding.GetEncoding(inputEncodingName);
            }
            catch (ArgumentException)
            {
                return req.CreateResponse(HttpStatusCode.BadRequest, new
                {
                    error = "Input char set value '" + data.encodingInput + "' is not supported. Supported values are listed at " + SupportedEncodingsUrl + "."
                });
            }

            Encoding outputEncoding;
            try
            {
                string outputEncodingName = data.encodingOutput;
                outputEncoding = Encoding.GetEncoding(outputEncodingName);
            }
            catch (ArgumentException)
            {
                return req.CreateResponse(HttpStatusCode.BadRequest, new
                {
                    error = "Output char set value '" + data.encodingOutput + "' is not supported. Supported values are listed at " + SupportedEncodingsUrl + "."
                });
            }

            // The payload travels as base 64 so that no intermediate service reinterprets the bytes.
            byte[] inputBytes;
            try
            {
                string input = data.text;
                inputBytes = Convert.FromBase64String(input);
            }
            catch (FormatException)
            {
                return req.CreateResponse(HttpStatusCode.BadRequest, new
                {
                    error = "The text property must contain a base 64 encoded string."
                });
            }

            // Characters the output encoding cannot represent are replaced by its fallback (for example '?').
            byte[] outputBytes = Encoding.Convert(inputEncoding, outputEncoding, inputBytes);

            // The JSON envelope only contains base 64 (ASCII) text, so it is always sent as UTF-8.
            var response = req.CreateResponse(HttpStatusCode.OK);
            response.Content = new StringContent(
                JsonConvert.SerializeObject(new { text = Convert.ToBase64String(outputBytes) }),
                Encoding.UTF8,
                "application/json");

            return response;
        }
    }
}
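`Encoding.Convert` does not fail when a character has no equivalent in the output encoding. Instead, the encoding's fallback quietly substitutes a replacement character (or, for some code pages, a best-fit one), which can hide data loss. If you would rather reject such payloads, one option is to request both encodings with exception fallbacks. You can do this with the standard `Encoding.GetEncoding(string, EncoderFallback, DecoderFallback)` overload. The sketch below separates that core logic from the HTTP plumbing so you can unit test it. `StrictConverter.ConvertBase64` is a hypothetical name. The `CodePagesEncodingProvider` registration assumes a .NET Core based runtime.

```csharp
using System;
using System.Text;

// Hypothetical strict variant of the conversion performed by Function1.
// Instead of substituting a fallback character, it throws when the input bytes
// are invalid in the source encoding or a character cannot be represented in
// the target encoding.
public static class StrictConverter
{
    public static string ConvertBase64(string base64Text, string inputEncodingName, string outputEncodingName)
    {
        // Required on .NET Core / .NET 5+ for code pages such as windows-1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding input = Encoding.GetEncoding(
            inputEncodingName, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding output = Encoding.GetEncoding(
            outputEncodingName, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        // Throws DecoderFallbackException or EncoderFallbackException instead of losing data.
        byte[] converted = Encoding.Convert(input, output, Convert.FromBase64String(base64Text));
        return Convert.ToBase64String(converted);
    }
}
```

Both `EncoderFallbackException` and `DecoderFallbackException` derive from `ArgumentException`. If you call this method from the function, wrap the call in a catch that returns `HttpStatusCode.BadRequest`.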

Sample input (the text is "Rue Beau-Séjour", a Belgian street name in French with an accented character, encoded here as UTF-8):

{
    "text": "UnVlIEJlYXUtU8Opam91cg==",
    "encodingInput": "utf-8",
    "encodingOutput": "windows-1252"
}

Output (the same text, now encoded as windows-1252):

{
    "text": "UnVlIEJlYXUtU+lqb3Vy"
}
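To check the sample, decode each payload with the encoding it is declared in. The helper below is a hypothetical sketch that assumes .NET Core 3.0 or later, where windows-1252 requires registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

// Hypothetical check: decode a base 64 payload with the encoding it claims to use.
public static class SampleCheck
{
    public static string DecodePayload(string base64Text, string encodingName)
    {
        // Required on .NET Core / .NET 5+ for windows-1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] bytes = Convert.FromBase64String(base64Text);
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }
}
```

Both `DecodePayload("UnVlIEJlYXUtU8Opam91cg==", "utf-8")` and `DecodePayload("UnVlIEJlYXUtU+lqb3Vy", "windows-1252")` return "Rue Beau-Séjour". The UTF-8 payload is 16 bytes because é takes two bytes (0xC3 0xA9). The windows-1252 payload is 15 bytes because é is the single byte 0xE9.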

If you swap the encodingInput and encodingOutput values and pass the output text back in, you get the original UTF-8 payload back.
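You can reproduce the round trip locally with the same `Encoding.Convert` call the function makes. This hypothetical `LocalConverter` sketch assumes .NET Core 3.0 or later, with `CodePagesEncodingProvider` registered for windows-1252.

```csharp
using System;
using System.Text;

// Hypothetical local equivalent of the function's core call, without HTTP.
public static class LocalConverter
{
    public static string ConvertBase64(string base64Text, string fromEncodingName, string toEncodingName)
    {
        // Required on .NET Core / .NET 5+ for code pages such as windows-1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] converted = Encoding.Convert(
            Encoding.GetEncoding(fromEncodingName),
            Encoding.GetEncoding(toEncodingName),
            Convert.FromBase64String(base64Text));
        return Convert.ToBase64String(converted);
    }
}
```

Converting `UnVlIEJlYXUtU+lqb3Vy` from windows-1252 to utf-8 returns the original `UnVlIEJlYXUtU8Opam91cg==`. The round trip is only lossless when every character of the text exists in both encodings. The first leg has already replaced anything the narrower encoding cannot represent.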

Base 64 encoding ensures that nothing on the wire assumes a UTF encoding. Once you have converted a non-UTF flat file (such as windows-1252) to UTF-8, you can safely base 64 decode it and process it with flat file decoding.
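When you base 64 decode the converted payload, you can also make the decoder strict. A file that skipped the conversion step then fails loudly instead of reaching flat file decoding with U+FFFD characters in it. The following is a hypothetical sketch using `UTF8Encoding` with `throwOnInvalidBytes` set to `true`.

```csharp
using System;
using System.Text;

// Hypothetical helper: turn the base 64 UTF-8 text returned by the
// conversion function into a string ready for flat file processing.
public static class Utf8Payload
{
    // throwOnInvalidBytes: true makes decoding throw DecoderFallbackException on
    // bytes that are not valid UTF-8, instead of silently inserting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string Decode(string base64Utf8Text) =>
        StrictUtf8.GetString(Convert.FromBase64String(base64Utf8Text));
}
```

`Utf8Payload.Decode("UnVlIEJlYXUtU8Opam91cg==")` returns "Rue Beau-Séjour". Passing the windows-1252 sample output instead throws `DecoderFallbackException`.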

Update 05/16/2018: This solution also applies when you pass a non-UTF-8 payload to the AS2 connector for encoding. If you do not convert the encoding before passing such content to AS2, the partners may see a MIC hash mismatch. The non-UTF-8 payload is not interpreted correctly, in the same way that characters get mangled or garbled.
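The simplified sketch below shows why a mismatch appears: it hashes the same text encoded two ways. It is only a stand-in. SHA-256 replaces whichever MIC algorithm the partners agreed on, and a real AS2 MIC is computed over the MIME entity rather than over the bare text bytes. `MicIllustration` is a hypothetical name, and the sketch assumes .NET Core 3.0 or later with `CodePagesEncodingProvider`.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Simplified, hypothetical illustration: hashes only the encoded text bytes,
// not the MIME entity an actual AS2 MIC covers.
public static class MicIllustration
{
    public static string Sha256Base64(string text, string encodingName)
    {
        // Required on .NET Core / .NET 5+ for windows-1252.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] payload = Encoding.GetEncoding(encodingName).GetBytes(text);

        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(payload));
        }
    }
}
```

For "Rue Beau-Séjour", the UTF-8 and windows-1252 hashes differ. For a pure ASCII string such as "Rue Beau-Sejour", they are identical, because both encodings produce the same bytes for ASCII. This is why the mismatch tends to appear only once accented characters show up in the payload.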
