SharePoint 2013 Search: Data Normalization using Query Rules and Variables

SharePoint 2013 Search administrators have asked me several times, "How can I match data indexed as XX-YYYY-ZZ when my users enter XXYYYYZZ or XX YYYYZZ?" One option is to add all the alternate forms to the index, either in the original source or through Content Enrichment. Another option, and perhaps a more manageable one, is to match the alternate forms with a query rule and rewrite the query.
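To get a feel for what the first option involves, here is a small Python sketch that lists the spellings a single XX-YYYY-ZZ value could take. It assumes that each of the two gaps between the parts may hold nothing, a single space, or a hyphen. That separator set, the SEPARATORS list, and the alternate_forms helper are hypothetical and chosen only for illustration:

```python
from itertools import product

# Hypothetical separator options for each gap between the parts:
# nothing, a single space, or a hyphen.
SEPARATORS = ["", " ", "-"]


def alternate_forms(first: str, middle: str, last: str) -> list[str]:
    """Enumerate every spelling built from the three parts and any
    combination of separators in the two gaps."""
    return [
        f"{first}{sep1}{middle}{sep2}{last}"
        for sep1, sep2 in product(SEPARATORS, repeat=2)
    ]


forms = alternate_forms("XX", "YYYY", "ZZ")
print(len(forms))  # 9
for form in forms:
    print(form)
```

That is nine spellings for every value, before counting differences in case, and each one would have to be added to the index. A query rule can handle the same variation with a single pattern at query time.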

Below are steps for creating an Advanced Query Text Match query rule. The rule uses a regular expression to match possible alternate forms of a query term. It then rewrites the query using named groups and Query Variables, so that the new query matches the normalized form stored in the index. The examples in the steps match alternate forms of a Mail Stop address. A Mail Stop begins with the prefix MB, followed by 4 digits and a quadrant designation (e.g., NW). The query rule then rewrites the query using the normalized form MB XXXX NW.

From the Search Administration, Site Collection or Site Administration pages, select Query Rules and create a new Query Rule for the desired Result Source.

1. Create an Advanced Query Text Match rule.

2. Add a regular expression that uses .NET named-group syntax (https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx). For example, the group names in this regex are terms1, terms2, mb, num1, and quad:

(?i:(?<terms1>.*)(?<mb>MB)(\s?|-)(?<num1>\d{4})(\s?|-)(?<quad>NE|NW|SE|SW)(?<terms2>.*))

This regex uses terms1 and terms2 as catch-all groups for anything before or after the Mail Stop. The other groups (mb, num1, and quad) capture the parts of the Mail Stop by themselves. They exclude any separators, such as whitespace or a hyphen, that users may or may not type in the search box.

3. Create a “Change ranked results by changing the query” action. When you click on the Keyword filter, you should see your group names from the regex added to the list. Configure the Query Text as follows:

{terms1?} MailStop="{mb} {num1} {quad}" {terms2?}

The query uses Query Variables to rebuild the user's query, with the Mail Stop written in the normalized form in which it is indexed. In this example, the new query searches the MailStop managed property specifically. MailStop is not a default managed property, so don’t expect it to exist in your environment. I’m using it only for illustration. Because the regex match is so precise, the new query text can safely target the correct managed property.

The ‘?’ after each group name removes that group from the new query text when the group’s value is empty. Without the ‘?’, you’d get the name of the group in its place.

4. Test your Query Rule. In the test results, note the captured groups under Conditions and the New Query Text under Actions.
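You can also check the pattern and the Query Text offline before testing in SharePoint. Here is a rough Python approximation of what the rule does to a query. It is only a sketch. SharePoint evaluates the rule itself, and Python spells .NET's (?<name>...) groups as (?P<name>...). The rewrite function, TEMPLATE, VARIABLE, and the whitespace handling are hypothetical stand-ins for the engine's behavior. The sketch also assumes, based on the description of ‘?’, that an empty group without ‘?’ leaves its name in the query:

```python
import re

# Step 2's pattern, using Python's (?P<name>...) named-group syntax.
# The scoped case-insensitive flag (?i:...) works as-is in Python 3.6+.
MAIL_STOP = re.compile(
    r"(?i:(?P<terms1>.*)(?P<mb>MB)(\s?|-)(?P<num1>\d{4})(\s?|-)"
    r"(?P<quad>NE|NW|SE|SW)(?P<terms2>.*))"
)

# Step 3's Query Text. A trailing '?' marks a variable as optional.
TEMPLATE = '{terms1?} MailStop="{mb} {num1} {quad}" {terms2?}'
VARIABLE = re.compile(r"\{(\w+)(\?)?\}")


def rewrite(query: str) -> str:
    """Approximate the query rule: return the query unchanged if the
    pattern does not match, otherwise expand TEMPLATE from the groups."""
    match = MAIL_STOP.search(query)
    if match is None:
        return query  # the rule does not fire
    groups = {name: (value or "").strip()
              for name, value in match.groupdict().items()}

    def expand(var: re.Match) -> str:
        name, optional = var.group(1), var.group(2)
        value = groups.get(name, "")
        if not value and not optional:
            # Assumption: without '?', an empty group leaves its name
            # in the query text.
            return name
        return value

    # Collapse the extra spaces left where optional variables vanished.
    return " ".join(VARIABLE.sub(expand, TEMPLATE).split())


for query in ["MB1234NW", "MB 1234 NW", "MB-1234-NW",
              "budget MB 1234NW memo", "MB123NW"]:
    print(f"{query!r:28} -> {rewrite(query)}")
```

The first three queries all become MailStop="MB 1234 NW". The fourth becomes budget MailStop="MB 1234 NW" memo. MB123NW passes through unchanged because it has only three digits. The groups keep the casing the user typed, so mb1234nw becomes MailStop="mb 1234 nw".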