Don't Roundtrip Ciphertext Via a String Encoding

One common mistake people make when using the managed encryption classes is trying to store the result of an encryption operation in a string by using one of the Encoding classes.  That seems to make sense, right?  After all, Encoding.GetString() takes a byte[] and converts it to a string, which is exactly what they were looking for.  The code might look something like this:

    public static string Encrypt(string data, string password)
    {
        if(String.IsNullOrEmpty(data))
            throw new ArgumentException("No data given");
        if(String.IsNullOrEmpty(password))
            throw new ArgumentException("No password given");

        // setup the encryption algorithm
        Rfc2898DeriveBytes keyGenerator = new Rfc2898DeriveBytes(password, 8);
        Rijndael aes = Rijndael.Create();
        aes.IV = keyGenerator.GetBytes(aes.BlockSize / 8);
        aes.Key = keyGenerator.GetBytes(aes.KeySize / 8);

        // encrypt the data
        byte[] rawData = Encoding.Unicode.GetBytes(data);
        using(MemoryStream memoryStream = new MemoryStream())
        using(CryptoStream cryptoStream = new CryptoStream(memoryStream, aes.CreateEncryptor(), CryptoStreamMode.Write))
        {
            memoryStream.Write(keyGenerator.Salt, 0, keyGenerator.Salt.Length);
            cryptoStream.Write(rawData, 0, rawData.Length);
            cryptoStream.Close();

            byte[] encrypted = memoryStream.ToArray();
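            // PROBLEM: the ciphertext is arbitrary binary data, not valid UTF-16 text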
            return Encoding.Unicode.GetString(encrypted);
        }
    }

    public static string Decrypt(string data, string password)
    {
        if(String.IsNullOrEmpty(data))
            throw new ArgumentException("No data given");
        if(String.IsNullOrEmpty(password))
            throw new ArgumentException("No password given");

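        // PROBLEM: this does not give back the original ciphertext bytes (see below)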
        byte[] rawData = Encoding.Unicode.GetBytes(data);
        if(rawData.Length < 8)
            throw new ArgumentException("Invalid input data");
        
        // setup the decryption algorithm
        byte[] salt = new byte[8];
        for(int i = 0; i < salt.Length; i++)
            salt[i] = rawData[i];

        Rfc2898DeriveBytes keyGenerator = new Rfc2898DeriveBytes(password, salt);
        Rijndael aes = Rijndael.Create();
        aes.IV = keyGenerator.GetBytes(aes.BlockSize / 8);
        aes.Key = keyGenerator.GetBytes(aes.KeySize / 8);

        // decrypt the data
        using(MemoryStream memoryStream = new MemoryStream())
        using(CryptoStream cryptoStream = new CryptoStream(memoryStream, aes.CreateDecryptor(), CryptoStreamMode.Write))
        {
            cryptoStream.Write(rawData, 8, rawData.Length - 8);
            cryptoStream.Close();

            byte[] decrypted = memoryStream.ToArray();
            return Encoding.Unicode.GetString(decrypted);
        }
    }

The first mistake some people make is to use the ASCII encoding.  This will nearly always fail, since ASCII is a seven-bit encoding: any data stored in the most significant bit of each byte is lost.  If your ciphertext can be guaranteed to contain only bytes with values less than 128, then it's time to find a new encryption algorithm :-)
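To see that loss in isolation, here's a quick sketch (not part of the sample above) that round trips a few bytes through the ASCII encoding:

    // Round trip a few arbitrary bytes through Encoding.ASCII
    byte[] original = new byte[] { 0x41, 0xC8, 0xFF };
    string text = Encoding.ASCII.GetString(original);
    byte[] roundTripped = Encoding.ASCII.GetBytes(text);
    // roundTripped is { 0x41, 0x3F, 0x3F } -- both bytes above 0x7F came back as '?'

The original values are simply gone; nothing on the decryption side can recover them.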

So if we don't use ASCII, can we use UTF8 or Unicode instead?  Those both use all eight bits of a byte, and in fact this approach tended to work with v1.x of the CLR.  However, a problem still remains: just because these encodings use all eight bits of a byte doesn't mean that every arbitrary sequence of bytes represents a valid character in them.  For v2.0 of the framework, the Encoding classes were updated so that they explicitly reject illegal input sequences (as the other Shawn-with-a-w discusses here).  This causes bad code that used to work (because v1.1 was not very strict) to start failing on v2.0 with exceptions along the lines of:

Unhandled Exception: System.Security.Cryptography.CryptographicException: Length of the data to decrypt is invalid.
   at System.Security.Cryptography.RijndaelManagedTransform.TransformFinalBlock(Byte[] inputBuffer, Int32 inputOffset, Int32 inputCount)
   at System.Security.Cryptography.CryptoStream.FlushFinalBlock()
   at System.Security.Cryptography.CryptoStream.Dispose(Boolean disposing)
   at System.IO.Stream.Close()

At first glance it looks like the CryptoStream is broken.  Take a closer look at what's actually going on, though.  If we check the encrypted data before converting it to a string in Encrypt, and compare that to the raw data after converting back from a string in Decrypt, we see something along the lines of:

encrypted=array [72] { 111, 49, 30, 0, 29 .... }
rawData=array [68] { 111, 49, 30, 0, 8, ... }

So round tripping through the Unicode encoding corrupted our data.  The fact that the decryption then failed with an incomplete final block is actually a blessing: the worst case scenario is ciphertext that gets corrupted in a way that still decrypts, just to the wrong plaintext, which leaves your code silently working with corrupt data.
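If you want to watch the corruption happen directly, here's a minimal sketch (again, not part of the original sample, and assuming the same System.Text and System.Security.Cryptography usings as above) that simply pushes random bytes, standing in for ciphertext, through Encoding.Unicode and back:

    // Round trip random "ciphertext" bytes through Encoding.Unicode
    byte[] original = new byte[64];
    new RNGCryptoServiceProvider().GetBytes(original);
    string text = Encoding.Unicode.GetString(original);
    byte[] roundTripped = Encoding.Unicode.GetBytes(text);
    // Run this a few times and roundTripped will quickly stop matching original,
    // because byte pairs that decode to unpaired surrogates aren't valid UTF-16
    // and get replaced rather than preserved.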

You won't necessarily see this error every time, either; sometimes you'll get lucky and end up with ciphertext that happens to be valid in the target encoding.  Eventually you will hit it, though, so you should never use the Encoding classes for this purpose.  Instead, if you want to convert the ciphertext into a string, use base 64 encoding, replacing the two conversion lines with:

    byte[] encrypted = memoryStream.ToArray();
    return Convert.ToBase64String(encrypted);

    byte[] rawData = Convert.FromBase64String(data);

This results in code that works every time, since base 64 encoding is guaranteed to represent any input byte sequence exactly.
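As a quick sanity check, here's what using the corrected methods looks like (the plaintext and password here are just placeholder values):

    string cipherText = Encrypt("some secret data", "a password");
    string plainText = Decrypt(cipherText, "a password");
    // cipherText is a base 64 string that is safe to store or transmit as text,
    // and plainText comes back as "some secret data" every time.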