Theorem 3 is a corollary to Theorem 2; the two go hand in hand.
One of the things a spam filter must do is catch as much spam as possible. This would be fairly easy were it not for the fact that a great deal of spam contains content routinely found in legitimate messages, or is written to resemble a legitimate message. Social-engineering spam that advertises products is notorious for this. Consider the following example:
Hi, how’s it going with your weight loss program? I wasn’t doing too well with it myself, but I thought I would give it a whirl. You should try going to the following site.
Talk to you later.
On the surface, this message appears perfectly legitimate. Each of its sentences is reasonable in and of itself, and any of them might turn up in a casual conversation between friends. To decide whether this is spam, a filter would need to interpret the context of the message as a whole. Even a person unfamiliar with this type of spam might be fooled into clicking on the link above (tsk, tsk).
Another point of contention is examining the content of email and classifying as spam any message that contains poor grammar. Some time ago, one of my managers jokingly remarked that we ought to classify all email with butchered grammar as spam, but then he realized that some of his own messages would be classified as spam as well. The comment was made in jest, but he hit the nail on the head: in everyday conversation, people can and do use exceedingly poor grammar. Emoticons and internet-speak (abbreviations like LOL and LMAO) are only the tip of the iceberg. People will often use poor grammar when referring to each other, “forget” to include punctuation, call each other names, and so forth. Email that looks butchered to one reader is perfectly intelligible to another. Thus, poor message composition alone is not sufficient grounds for classifying a message as spam: spam can contain that kind of content, but so can legitimate mail.
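To make the grammar problem concrete, here is a minimal sketch of a naive "sloppiness" heuristic in Python. The scoring rules, the `INTERNET_SPEAK` token list, and the `threshold` value are all hypothetical, invented purely for illustration; no real filter is implied. Applied to a casual but legitimate note and to the weight-loss spam above, it gets both wrong: the friendly note is flagged, while the carefully written spam sails through.

```python
import re

# Hypothetical list of "internet-speak" tokens; illustrative only.
INTERNET_SPEAK = {"lol", "lmao", "brb", "omg", "u", "ur", "thx"}

def sloppiness_score(message: str) -> int:
    """Count crude 'poor composition' signals in a message (hypothetical heuristic)."""
    score = 0
    words = re.findall(r"[A-Za-z']+", message.lower())
    # One point per internet-speak token.
    score += sum(1 for w in words if w in INTERNET_SPEAK)
    # One point if the message does not end with terminal punctuation.
    if not message.rstrip().endswith((".", "!", "?")):
        score += 1
    # One point per sentence that starts with a lowercase letter.
    sentences = [s.strip() for s in re.split(r"[.!?]+", message) if s.strip()]
    score += sum(1 for s in sentences if s[0].islower())
    return score

def naive_is_spam(message: str, threshold: int = 2) -> bool:
    """Flag a message as spam when its sloppiness score reaches the (hypothetical) threshold."""
    return sloppiness_score(message) >= threshold

legit = "lol ur joke at lunch was great. see u at 5 thx"
spam = ("Hi, how's it going with your weight loss program? I wasn't doing too well "
        "with it myself, but I thought I would give it a whirl. You should try going "
        "to the following site. Talk to you later.")

print(naive_is_spam(legit))  # True: a legitimate message is flagged
print(naive_is_spam(spam))   # False: the spam gets through
```

The point is not the particular rules: any check that keys on composition alone will misfire in both directions, because well-written spam and sloppy legitimate mail both exist.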
Theorems 2 and 3 are two sides of the same coin. A spam filter cannot simply be made more aggressive across the board, because doing so would flag legitimate mail that uses the same patterns as spam.
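The tension between catching spam and sparing legitimate mail can be sketched as a threshold tradeoff. The example below assumes a hypothetical filter that assigns each message a spam-likeness score between 0 and 1; the `SAMPLES` scores and labels are invented for illustration, not measured data. Lowering the threshold (a more aggressive filter) catches more spam but also flags more legitimate messages.

```python
from typing import List, Tuple

# Hypothetical (score, is_spam) pairs; higher score means "more spam-like".
# These numbers are invented for illustration, not measured data.
SAMPLES: List[Tuple[float, bool]] = [
    (0.95, True), (0.80, True), (0.55, True), (0.30, True),
    (0.70, False), (0.40, False), (0.20, False), (0.05, False),
]

def evaluate(threshold: float, samples: List[Tuple[float, bool]] = SAMPLES) -> Tuple[int, int]:
    """Return (spam caught, legitimate messages wrongly flagged) at a given threshold."""
    caught = sum(1 for score, is_spam in samples if is_spam and score >= threshold)
    false_positives = sum(1 for score, is_spam in samples if not is_spam and score >= threshold)
    return caught, false_positives

# Sweep from conservative to aggressive thresholds.
for t in (0.9, 0.6, 0.35, 0.1):
    caught, fp = evaluate(t)
    print(f"threshold={t:.2f}: caught {caught}/4 spam, flagged {fp}/4 legitimate")
```

On this toy data, moving from a threshold of 0.9 to 0.1 raises the spam caught from 1 to 4, but the legitimate messages wrongly flagged rise from 0 to 3: every gain on one side is paid for on the other.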