Converting a text file from one encoding to another

Buck Hodges

May 18th, 2004

The .NET framework handles file encodings very nicely. Not too long ago, I needed to convert files from one encoding to another for a library that didn’t handle the encoding of the original file. Since someone on an internal alias asked about doing this a couple of weeks ago, I thought it would be useful to post it here.

The .NET runtime represents all strings internally as Unicode (UTF-16). The StreamReader and StreamWriter classes in System.IO accept an Encoding as a constructor parameter. So, to convert from one encoding to another, we just need to read the file contents into a string using the original encoding and then write the string out in the desired encoding.
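
To make the decode-then-encode idea concrete, here is a minimal in-memory sketch. The EncodingSketch class and its Reencode method are names I made up for this illustration; they are not part of the converter shown later, which works on files rather than byte arrays.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingSketch
{
    // Decodes the bytes with the source encoding, then re-encodes the
    // resulting string with the destination encoding.
    public static byte[] Reencode(byte[] input, Encoding source, Encoding dest)
    {
        string text = source.GetString(input); // bytes -> .NET (UTF-16) string
        return dest.GetBytes(text);            // string -> bytes, no byte order mark
    }
}
```

For example, passing the two UTF-16LE bytes E9 00 (the letter é) with Encoding.Unicode as the source and Encoding.UTF8 as the destination should give back the two UTF-8 bytes C3 A9.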

The Path class, also in System.IO, provides us with an easy way to create temporary files in the Windows temporary directory. We can write the results to a temporary file so that if anything goes wrong, the destination file is not overwritten. Also, it allows the conversion to work when the source and destination are the same file.
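
The temporary-file pattern can be pulled out on its own. The following is only a sketch: TempFileSketch, ReplaceViaTempFile, and the writeTo callback are hypothetical names for this illustration, and it assumes the destination's directory already exists.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class TempFileSketch
{
    // Lets the caller write to a temporary file and replaces destPath only
    // after the write has completed without throwing.
    public static void ReplaceViaTempFile(string destPath, Action<string> writeTo)
    {
        // Creates an empty, uniquely named file in the temporary directory.
        string tempName = Path.GetTempFileName();
        try
        {
            writeTo(tempName);
            File.Delete(destPath);          // does nothing if destPath is absent
            File.Move(tempName, destPath);
        }
        finally
        {
            // Cleans up after a failure; does nothing after a successful move.
            File.Delete(tempName);
        }
    }
}
```

If the callback throws, the destination is never touched and the temporary file is removed. Note that the delete-then-move step is not atomic, so a failure between those two calls would still lose the old destination.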

StreamReader allows us to read the source file in blocks, so there is no size limit on the files that we need to convert.
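
The block-reading loop works against any TextReader and TextWriter, which makes it easy to try out in memory. This is a sketch under my own naming: BlockCopySketch and CopyInBlocks are hypothetical, and the buffer size is a parameter here only so that a tiny buffer can be used to exercise the loop.

```csharp
using System.IO;

// Hypothetical helper, for illustration only.
public static class BlockCopySketch
{
    // Copies all characters from reader to writer through a fixed-size
    // buffer, so memory use depends on bufferSize rather than input length.
    // Returns the number of characters copied.
    public static long CopyInBlocks(TextReader reader, TextWriter writer, int bufferSize)
    {
        char[] buffer = new char[bufferSize];
        long total = 0;
        int charsRead;
        // ReadBlock returns 0 once the end of the input has been reached.
        while ((charsRead = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            writer.Write(buffer, 0, charsRead);
            total += charsRead;
        }
        return total;
    }
}
```

Copying a StringReader over "hello world" into a StringWriter with a buffer of 4 characters takes three passes through the loop and reproduces the full 11-character string.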

The Main() method below is just a trivial wrapper that calls ConvertFileEncoding(), since the code wasn’t originally a standalone app.

// Example: convert test.cs test-conv.cs ascii utf-8

using System;
using System.IO;
using System.Text;
public class Convert
{
    public static void Main(String[] args)
    {
        // Print a simple usage statement if the number of arguments is incorrect.
        if (args.Length != 4)
        {
            Console.WriteLine("Usage: {0} inputFile outputFile inputEncoding outputEncoding",
                              Path.GetFileName(Environment.GetCommandLineArgs()[0]));
            Environment.Exit(1);
        }
        ConvertFileEncoding(args[0], args[1], Encoding.GetEncoding(args[2]),
                            Encoding.GetEncoding(args[3]));
    }
    /// <summary>
    /// Converts a file from one encoding to another.
    /// </summary>
    /// <param name="sourcePath">the file to convert</param>
    /// <param name="destPath">the destination for the converted file</param>
    /// <param name="sourceEncoding">the original file encoding</param>
    /// <param name="destEncoding">the encoding to which the contents should be converted</param>
    public static void ConvertFileEncoding(String sourcePath, String destPath,
                                           Encoding sourceEncoding, Encoding destEncoding)
    {
        // If the destination's parent doesn't exist, create it.
        String parent = Path.GetDirectoryName(Path.GetFullPath(destPath));
        if (!Directory.Exists(parent))
        {
            Directory.CreateDirectory(parent);
        }
        // If the source and destination encodings are the same, just copy the file.
        if (sourceEncoding.Equals(destEncoding))
        {
            File.Copy(sourcePath, destPath, true);
            return;
        }
        // Convert the file.
        String tempName = null;
        try
        {
            tempName = Path.GetTempFileName();
            using (StreamReader sr = new StreamReader(sourcePath, sourceEncoding, false))
            {
                using (StreamWriter sw = new StreamWriter(tempName, false, destEncoding))
                {
                    int charsRead;
                    char[] buffer = new char[128 * 1024];
                    while ((charsRead = sr.ReadBlock(buffer, 0, buffer.Length)) > 0)
                    {
                        sw.Write(buffer, 0, charsRead);
                    }
                }
            }
            File.Delete(destPath);
            File.Move(tempName, destPath);
        }
        finally
        {
            // tempName is still null if GetTempFileName() itself failed.
            if (tempName != null)
            {
                File.Delete(tempName);
            }
        }
    }
}