Sender authentication part 30: The canonicalization process

Canonicalization is the process of preparing a message for signing.  This process is necessary because mail servers do not all treat a message the same way in transit.  For example, some mail relays preserve white space and line wraps just fine, while others strip them or insert their own.  All email was once 7-bit ASCII, and now much of it carries 8-bit content.  What happens if the message is forwarded through a relay of one kind and then the other?

The intent of canonicalization is to make a minimal transformation of the message for the purpose of signing it (the actual message itself is not changed).  To that end, DomainKeys specifies two methods of canonicalizing the message (man, there are a lot of red, wavy underlines in my LiveWriter... let's hit Ignore All... ah, that's better, maybe I should stop inventing words as I go along).

From RFC 4870:

The "simple" Canonicalization Algorithm

  • Each line of the email is presented to the signing algorithm in the order it occurs in the complete email, from the first line of the headers to the last line of the body.
  • If the "h" tag is used, only those header lines (and their continuation lines if any) added to the "h" tag list are included.
  • The "h" tag only constrains header lines. It has no bearing on body lines, which are always included.
  • Remove any local line terminator.
  • Append CRLF to the resulting line.
  • All trailing empty lines are ignored. An empty line is a line of zero length after removal of the local line terminator.

If the body consists entirely of empty lines, then the header/body line is similarly ignored. For those of you who don't understand geek-speak, the header/body line is the blank line that separates the headers from the body, and CRLF means "carriage return, line feed", which is the equivalent of hitting "enter" on your keyboard to start a new line.
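The steps of the "simple" algorithm can be sketched in a few lines of Python. This is only an illustration under some assumptions, not a reference implementation: the function name canonicalize_simple is made up for this post, the message is handled as a str rather than raw bytes, and the "h" tag is passed in as a plain list of header names instead of being parsed out of the signature header.

```python
import re

def canonicalize_simple(message, h=None):
    """Sketch of the "simple" algorithm (hypothetical helper, not from the RFC)."""
    # Remove any local line terminator (CRLF, bare LF, or bare CR).
    lines = re.split(r"\r\n|\n|\r", message)

    # The first empty line is the header/body line.
    sep = lines.index("") if "" in lines else len(lines)
    headers, body = lines[:sep], lines[sep + 1:]

    # If an "h" list is given, keep only those headers and their
    # continuation lines. Body lines are never filtered.
    if h is not None:
        wanted = {name.lower() for name in h}
        kept, keep = [], False
        for line in headers:
            if line[:1] not in (" ", "\t"):
                # A new header starts here; decide whether it is signed.
                keep = line.split(":", 1)[0].strip().lower() in wanted
            if keep:
                kept.append(line)
        headers = kept

    # Trailing empty lines are ignored.
    while body and body[-1] == "":
        body.pop()

    # If the body was nothing but empty lines, the header/body line goes too.
    out = headers + ([""] + body if body else [])

    # Append CRLF to every resulting line.
    return "".join(line + "\r\n" for line in out)


if __name__ == "__main__":
    msg = "From: a@example.com\nSubject: hi\n  there\n\nHello\n\n\n"
    print(repr(canonicalize_simple(msg)))
    print(repr(canonicalize_simple(msg, h=["Subject"])))
```

Notice that the folded Subject header keeps its line break and leading spaces. The "simple" algorithm only normalizes line endings and trailing empty lines, so a relay that re-wraps that header would change what gets signed.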

The "nofws" Canonicalization Algorithm

The "No Folding Whitespace" algorithm (nofws) is more complicated than the "simple" algorithm for two reasons: folding whitespace is removed from all lines, and header continuation lines are unwrapped.

  • Each line of the email is presented to the signing algorithm in the order it occurs in the complete email, from the first line of the headers to the last line of the body.
  • Header continuation lines are unwrapped so that header lines are processed as a single line.
  • If the "h" tag is used, only those header lines (and their continuation lines if any) added to the "h" tag list are included.
  • The "h" tag only constrains header lines. It has no bearing on body lines, which are always included.
  • For each line in the email, remove all folding whitespace characters. Folding whitespace is defined in RFC 2822 as being the decimal ASCII values 9 (HTAB), 10 (NL), 13 (CR), and 32 (SP).
  • Append CRLF to the resulting line.
  • Trailing empty lines are ignored. An empty line is a line of zero length after removal of the local line terminator. Note that the test for an empty line occurs after removing all folding whitespace characters.

If the body consists entirely of empty lines, then the header/body line is similarly ignored.
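Here is a similar rough sketch of "nofws", again in Python and with the same caveats: canonicalize_nofws is a hypothetical name, the message is treated as a str instead of bytes, and the "h" list is handed in directly rather than read from the signature header.

```python
import re

# The four folding whitespace characters: HTAB, NL, CR, SP.
FWS = re.compile(r"[\t\n\r ]")

def canonicalize_nofws(message, h=None):
    """Sketch of the "nofws" algorithm (hypothetical helper, not from the RFC)."""
    lines = re.split(r"\r\n|\n|\r", message)

    # The first empty line is the header/body line.
    sep = lines.index("") if "" in lines else len(lines)

    # Unwrap continuation lines so each header is processed as one line.
    headers = []
    for line in lines[:sep]:
        if line[:1] in (" ", "\t") and headers:
            headers[-1] += line
        else:
            headers.append(line)

    # If an "h" list is given, keep only those headers.
    if h is not None:
        wanted = {name.lower() for name in h}
        headers = [x for x in headers
                   if x.split(":", 1)[0].strip().lower() in wanted]

    # Remove all folding whitespace from every header and body line.
    headers = [FWS.sub("", x) for x in headers]
    body = [FWS.sub("", x) for x in lines[sep + 1:]]

    # The empty-line test happens after whitespace removal.
    while body and body[-1] == "":
        body.pop()

    # Drop the header/body line if no body lines are left.
    out = headers + ([""] + body if body else [])
    return "".join(x + "\r\n" for x in out)


if __name__ == "__main__":
    as_sent = "Subject: hi\r\n  there\r\n\r\nHello  world \r\n \t\r\n\r\n"
    as_relayed = "Subject: hi there\n\nHello world\n"
    print(canonicalize_nofws(as_sent) == canonicalize_nofws(as_relayed))
```

The two sample messages differ only in header wrapping, spacing, and trailing blank lines, which is exactly the sort of damage a relay might do. Under "nofws" they canonicalize to the same text, so a signature computed over one should still check out against the other.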

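So, we see that the process of canonicalization is arranging the headers and body of the message into a standard form before signing.  The receiver runs the same process on the message it gets and checks the result against the signature, which verifies that the contents after receiving it are the same as they were before sending it.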