Microsoft Dynamics CRM, Email correlation and smart matching

What is correlation and why is it required?

One of the important scenarios in email management within CRM is associating an incoming email with the correct object it relates to (its regarding object). Consider the scenario where you have created an email related to a case and sent it to a customer. The customer responds to the email. The incoming email is tracked in CRM and should now be automatically associated with the same case as the email it responds to.

We take a two-step approach to finding the correct regarding object for an incoming email. The first step is to find the correlated outgoing email to which the customer has responded. The next step is to get the regarding object from the correlated email and set it on the incoming email.

How was correlation done in CRM 3.0?

In CRM 3.0 every outgoing email from CRM was suffixed with a CRM token in its subject. The CRM token was in the format CRM:0001001 and was configurable via the system settings. When an incoming email was tracked in CRM, the email was checked for the presence of a CRM token. If one was found, the system then looked for the most recent email with the same token to correlate the two. Once correlation was done, the regarding object of the correlated email, if found, was set on the incoming email.
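The token-based lookup described above can be sketched roughly as follows. This is only an illustration, not the CRM implementation: the `SentEmail` type, the function names and the token pattern `CRM:\d+` are assumptions based on the example token CRM:0001001 (the real format is configurable).

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Assumed token shape, based on the example "CRM:0001001"; the real prefix
# and digit count are configurable in system settings.
TOKEN_RE = re.compile(r"CRM:\d+")


@dataclass
class SentEmail:
    """Hypothetical record of an email sent from CRM."""
    subject: str
    sent_on: datetime
    regarding: Optional[str]  # e.g. a case identifier; None if not set


def extract_token(subject: str) -> Optional[str]:
    """Return the first CRM token found in the subject, or None."""
    match = TOKEN_RE.search(subject)
    return match.group(0) if match else None


def find_regarding(incoming_subject: str, sent: List[SentEmail]) -> Optional[str]:
    """Correlate by token and return the regarding object of the newest match."""
    token = extract_token(incoming_subject)
    if token is None:
        return None
    candidates = [e for e in sent if extract_token(e.subject) == token]
    if not candidates:
        return None
    newest = max(candidates, key=lambda e: e.sent_on)
    return newest.regarding
```

With this sketch, a reply whose subject still carries the token picks up the regarding object of the most recent sent email with that token, and a reply without a token is left uncorrelated.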

How is correlation done in CRM 4.0?

Most of our customers did not want to have a fancy looking token suffixed to the subject line of every email sent out of CRM. So in CRM 4.0 we introduced a new concept, smart matching, that is used to correlate emails. The use of the email token is optional and can be configured through system settings. The following blog article talks about it.

But there is a subtle difference in how the email token is used in CRM 3.0 and CRM 4.0. In CRM 3.0 the presence of the token was the only way to identify and correlate emails. In CRM 4.0 the presence of the token only increases the accuracy of the correlation but does not determine it. Thus it’s possible that an incoming email having an email token does not get correlated to the outgoing email with the same email token. This is especially true if the customer has updated the subject of the email but retained the token, thinking it would be ok.

How does smart matching work?

Smart matching relies completely on the existence of similarity between emails. The subject and the recipient list (from, to, cc and bcc) are the two important components that are considered when checking for similarity.

When an email is sent from CRM, there are two sets of hashes generated for it and stored in the database.

a. Subject hashes:

To generate subject hashes, the subject of the email, which may include the CRM token if its usage is enabled in system settings, is first checked for noise words like RE: FW: etc. The noise words are stripped from the subject, which is then tokenized. All the non-empty tokens (words) are then hashed to generate subject hashes.

b. Recipient hashes:

To generate the recipient hashes, the recipient list (from, to, cc, bcc) is analyzed for unique email addresses. For each unique email address an address hash is generated.

Next, when an incoming email arrives and is tracked in CRM, the same method is followed to create its subject and recipient hashes.

To find the correlation between the incoming email and the outgoing email, the stored subject and recipient hashes are searched for matching values. Two emails are correlated if they have the same count of subject hashes and at least two matching recipient hashes.
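The hashing and matching steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual CRM code: SHA-1 stands in for the unspecified hash function, lowercasing before hashing is an assumption, and the function names are made up for this example. It also assumes one reading of the rule, namely that the subject hash counts are equal and every subject hash matches.

```python
import hashlib
import re
from typing import FrozenSet, Iterable

# Default noise filter: leading "RE:", "FW:", ... style prefixes.
NOISE_RE = re.compile(r"^[\s]*([\w]+\s?:[\s]*)+")


def _hash(value: str) -> str:
    # SHA-1 is a placeholder; the hash function CRM uses is not stated here.
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def subject_hashes(subject: str) -> FrozenSet[str]:
    """Strip noise words, split on whitespace, hash each non-empty token."""
    cleaned = NOISE_RE.sub("", subject)
    return frozenset(_hash(token.lower()) for token in cleaned.split())


def recipient_hashes(addresses: Iterable[str]) -> FrozenSet[str]:
    """Hash each unique address taken from the from/to/cc/bcc lists."""
    return frozenset(_hash(a.strip().lower()) for a in addresses if a.strip())


def is_correlated(out_subject: FrozenSet[str], out_recipients: FrozenSet[str],
                  in_subject: FrozenSet[str], in_recipients: FrozenSet[str]) -> bool:
    """Default rule: same subject hash count and at least two shared addresses."""
    same_count = len(out_subject) == len(in_subject)
    all_match = out_subject <= in_subject  # assumption: hashes must also match
    shared_addresses = len(out_recipients & in_recipients)
    return same_count and all_match and shared_addresses >= 2
```

Under these assumptions, a reply with the subject "RE: Printer issue" correlates with an outgoing "Printer issue" email when both share at least two addresses, while "RE: Printer issue resolved" does not, because the subject hash counts differ.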

How can smart matching be configured?

One size never fits all, so the constraint for correlation described above, which is the default behavior of out-of-the-box CRM, can be configured to suit individual needs.

There are four registry keys that allow you to manipulate the smart matching behavior. These registry keys need to be added under the CRM server registry hive only, i.e. HKLM\Software\Microsoft\MSCRM

1) HashFilterKeywords

a. Description: This is a regular expression that is used to cancel out the noise in the subject line. All matching instances of the regular expression present in the subject line are replaced with empty strings before generating the subject hashes.

b. Default value: ^[\s]*([\w]+\s?:[\s]*)+

In other words, by default we ignore any word (or run of such words) at the start of the subject line that ends with a “:”. For example:

#   Subject        Ignored words
1   Test           None
2   RE: Test       RE:
3   FW: RE: Test   FW: RE:

Note: By default we do not ignore leading phrases in the subject line like “Out of office:”, because the first word, “Out”, is not followed by a “:”. To ignore this phrase you can update the regular expression in the registry to “^[\s]*([\w]+\s?:[\s]*)+|Out of office:”. Do not include the double quotes that surround the string in this example. The text in the registry should only be the regular expression you want to use for ignoring words in the subject line.
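The effect of the default and the extended expression can be checked with a small script. This sketch uses Python's `re` module, whereas CRM evaluates the expression on the server, so regex options such as case sensitivity may differ; the `strip_noise` helper is made up for this illustration.

```python
import re

DEFAULT_FILTER = r"^[\s]*([\w]+\s?:[\s]*)+"
EXTENDED_FILTER = r"^[\s]*([\w]+\s?:[\s]*)+|Out of office:"


def strip_noise(subject: str, pattern: str = DEFAULT_FILTER) -> str:
    """Replace every match of the filter expression with an empty string."""
    return re.sub(pattern, "", subject)


for subject in ["Test", "RE: Test", "FW: RE: Test", "Out of office: Test"]:
    print(repr(subject), "->", repr(strip_noise(subject)),
          "| extended:", repr(strip_noise(subject, EXTENDED_FILTER)))
```

With the default expression the first three subjects all reduce to "Test", while "Out of office: Test" is left unchanged. With the extended expression it becomes " Test"; the leading space disappears once the subject is split into words.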

2) HashMaxCount

a. Description: This is the maximum number of hashes that will be generated for any subject or recipient list. For example, if the subject after noise cancellation contains more than 20 words, only the first 20 words are considered.

b. Default value: 20

3) HashDeltaSubjectCount

a. Description: This is the maximum delta allowed between subject hash counts of the emails to be correlated.

b. Default value: 0

4) HashMinAddressCount

a. Description: This is the minimum hash count matches required on the recipients list for the emails to be correlated.

b. Default value: 2
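To see how the last three settings interact, here is a rough sketch of a correlation check parameterized by them. The `SmartMatchSettings` class and both functions are hypothetical names for this illustration, plain strings stand in for hashes, and the subset test on subject hashes is an assumption about how a non-zero delta is applied; CRM's actual logic may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SmartMatchSettings:
    """Hypothetical container mirroring the registry defaults listed above."""
    hash_max_count: int = 20           # HashMaxCount
    hash_delta_subject_count: int = 0  # HashDeltaSubjectCount
    hash_min_address_count: int = 2    # HashMinAddressCount


def cap_hashes(hashes: List, settings: SmartMatchSettings) -> List:
    """Keep only the first HashMaxCount entries."""
    return hashes[:settings.hash_max_count]


def correlates(out_subject: Set[str], in_subject: Set[str],
               out_recipients: Set[str], in_recipients: Set[str],
               settings: Optional[SmartMatchSettings] = None) -> bool:
    settings = settings or SmartMatchSettings()
    # Difference between the subject hash counts of the two emails.
    delta = abs(len(out_subject) - len(in_subject))
    # Assumption: the shorter subject's hashes must all appear in the longer one.
    smaller, larger = sorted((out_subject, in_subject), key=len)
    subject_ok = delta <= settings.hash_delta_subject_count and smaller <= larger
    shared = len(out_recipients & in_recipients)
    return subject_ok and shared >= settings.hash_min_address_count
```

For example, with the defaults a reply whose subject gained one extra word is not correlated, but it is once HashDeltaSubjectCount is raised to 1. Lowering HashMinAddressCount to 1 similarly lets emails that share only one address correlate, at the cost of more false matches.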

Limitations:

The email hashes are generated when the emails are sent out. If you change HashFilterKeywords or HashMaxCount via the registry, only new outgoing and incoming emails will be affected. The existing email hashes are not recalculated, and CRM does not provide any out-of-the-box functionality to recalculate them.

Also, smart matching currently does not have a time limit on how old the correlated email can be. In CRM 5.0 we plan to address this along with other improvements to smart matching.