Manage and Generate the SharePoint Thesaurus files with a SharePoint List (Part 1 of 2)

I have seen a few implementations of SharePoint Thesaurus generators that use various means to manage the thesaurus data and then generate the file.  Most of them use Excel, so why not use a SharePoint list instead?  With a native SharePoint list you can expose the data to various users and let them collaborate on it, which is exactly what a SharePoint list is meant for.  Once completed, the list can be created on any site in a site collection, including Central Administration if you prefer to have administrators manage the information there.

More information on the thesaurus can be found here.  In general, the SharePoint thesaurus is made up of two kinds of entries: expansion sets, which add synonyms to a search, and replacement sets, which substitute replacement values for a particular keyword.  There are several ways to implement this, but I chose a SharePoint list so that users can enter their thesaurus entries and take advantage of workflows, approval, and so on while creating them.  The list is based on a content type that defines the fields users work with when managing entries.  When the user is ready, a custom action generates the file, which can then be sent to an administrator to apply to the farm.  This solution targets SharePoint 2010, but the same approach applies to 2007, which is where I first created it.
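For reference, the output this list is meant to produce is a standard SharePoint search thesaurus file, such as the language-neutral tsneu.xml.  The sketch below uses illustrative words only: an expansion set lists interchangeable terms as sub elements, while a replacement set lists the terms to match as pat elements and their replacement as a sub element.

    <XML ID="Microsoft Search Thesaurus">
        <thesaurus xmlns="x-schema:tsSchema.xml">
            <diacritics_sensitive>0</diacritics_sensitive>
            <!-- Expansion set: a query for any of these terms also matches the others -->
            <expansion>
                <sub>Internet Explorer</sub>
                <sub>IE</sub>
                <sub>IE5</sub>
            </expansion>
            <!-- Replacement set: queries for NT5 or W2K are replaced with Windows 2000 -->
            <replacement>
                <pat>NT5</pat>
                <pat>W2K</pat>
                <sub>Windows 2000</sub>
            </replacement>
        </thesaurus>
    </XML>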

The first step is to create a schema for our list and content type.  I created fields for ‘Entry Type’, ‘Match’ and ‘Substitutions’.  ‘Entry Type’ is a choice field with the values Expansion and Replacement.  The ‘Match’ field identifies the word you are targeting for replacement in a replacement set, or the word you are generating synonyms for in an expansion set.  Although expansion sets don’t strictly need a target word, I think having one helps users think intuitively about what they are doing.  Again, your implementation may vary.  Finally, the ‘Substitutions’ field holds the synonyms or replacement values as a comma-delimited list.  The content type also removes the inherited Title field so that users only see the thesaurus fields.  You could add further fields, such as the target language, if you were generating multiple thesaurus files from your list.  When all is said and done, we have a content type and the corresponding fields as defined below.

    <?xml version="1.0" encoding="utf-8"?>
    <Elements xmlns="http://schemas.microsoft.com/sharepoint/">
        <!-- Choice field: whether the entry becomes an expansion or a replacement set -->
        <Field ID="{A2124DA1-5194-4f17-AE5B-23D118C52FF0}"
               Name="ThesaurusEntryType"
               DisplayName="Entry Type"
               Description="Type of Thesaurus Output to create"
               Group="Blog Fields"
               Type="Choice"
               Required="TRUE"
               Format="Dropdown"
               Overwrite="TRUE">
            <CHOICES>
                <CHOICE>Expansion</CHOICE>
                <CHOICE>Replacement</CHOICE>
            </CHOICES>
        </Field>
        <!-- The word being replaced, or the word synonyms are generated for -->
        <Field ID="{F77E9210-2D52-413d-BB21-6C0788DF2803}"
               Name="ThesaurusWord"
               DisplayName="Match"
               Description="Word to match in the thesaurus"
               Group="Blog Fields"
               Type="Text"
               Required="TRUE"
               Overwrite="TRUE">
        </Field>
        <!-- Comma-delimited synonyms or replacement values -->
        <Field ID="{C7271283-D681-4d56-A502-F2AD78A86412}"
               Name="ThesaurusSubs"
               DisplayName="Substitutions"
               Description="Comma delimited list of words to substitute in the thesaurus"
               Group="Blog Fields"
               Type="Text"
               Required="TRUE"
               Overwrite="TRUE">
        </Field>
        <!-- Parent ContentType: Item (0x01) -->
        <ContentType ID="0x01004aa858bfc0e14b2ba3b4beb23b2cdb48"
                     Name="Thesaurus"
                     Group="Blog Content Types"
                     Description="Content type for managing the thesaurus entries for search"
                     Inherits="TRUE"
                     Version="0">
            <FieldRefs>
                <!-- Remove the Title field inherited from Item -->
                <RemoveFieldRef ID="{fa564e0f-0c70-4ab9-b863-0177e6ddd247}" Name="Title" />
                <FieldRef ID="{A2124DA1-5194-4f17-AE5B-23D118C52FF0}" Name="ThesaurusEntryType" DisplayName="Entry Type" Required="TRUE" />
                <FieldRef ID="{F77E9210-2D52-413d-BB21-6C0788DF2803}" Name="ThesaurusWord" DisplayName="Match" Required="TRUE" />
                <FieldRef ID="{C7271283-D681-4d56-A502-F2AD78A86412}" Name="ThesaurusSubs" DisplayName="Substitutions" Required="TRUE" />
            </FieldRefs>
        </ContentType>
    </Elements>
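If you do want one thesaurus file per language, one option is an extra choice field on the content type.  The definition below is a hypothetical sketch: the ThesaurusLanguage name, the choice values (modeled on the tsneu.xml and tsenu.xml file names) and the GUID are all illustrative, so generate your own ID before using it.  The generator would also need to filter on this field, which is not part of the original solution.

    <!-- Hypothetical field: Name, choices and ID are illustrative only. -->
    <!-- Replace the ID with a newly generated GUID. -->
    <Field ID="{3B1E6C52-8F4A-4D0B-9C7E-2A5D1F6E8B90}"
           Name="ThesaurusLanguage"
           DisplayName="Language"
           Description="Thesaurus file this entry belongs to"
           Group="Blog Fields"
           Type="Choice"
           Required="TRUE"
           Format="Dropdown"
           Overwrite="TRUE">
        <CHOICES>
            <CHOICE>Neutral (tsneu.xml)</CHOICE>
            <CHOICE>English US (tsenu.xml)</CHOICE>
        </CHOICES>
        <Default>Neutral (tsneu.xml)</Default>
    </Field>
    <!-- Then reference the new field inside the content type's FieldRefs element: -->
    <FieldRef ID="{3B1E6C52-8F4A-4D0B-9C7E-2A5D1F6E8B90}" Name="ThesaurusLanguage" DisplayName="Language" Required="TRUE" />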

The next step is to build a list definition from this content type.  Visual Studio 2010 makes this very easy: right-click the project, select Add New Item, choose List Definition from Content Type as the type, and click OK.  The wizard walks you through a few steps, such as selecting the content type, and you end up with a list definition elements file similar to the one shown below.  I made a few changes to clean it up: I changed the Name attribute (as the generated comment warns, it must match the folder name of the project item) and changed the Type attribute from 10000 to 10101 just because I could.  Whatever value you choose for Type, the custom actions later in this post reference it through their RegistrationId, so the two must match.

    <?xml version="1.0" encoding="utf-8"?>
    <Elements xmlns="http://schemas.microsoft.com/sharepoint/">
        <!-- Do not change the value of the Name attribute below. If it does not match the folder name of the List Definition project item, an error will occur when the project is run. -->
        <!-- Type is the template ID that the custom actions reference via RegistrationId. -->
        <ListTemplate
            Name="Thesaurus List"
            Type="10101"
            BaseType="0"
            OnQuickLaunch="TRUE"
            SecurityBits="11"
            Sequence="410"
            DisplayName="Thesaurus List"
            Description="My List Definition"
            Image="/_layouts/images/itgen.png"/>
    </Elements>
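The template on its own only makes the list type available for creation.  If you would rather have a list provisioned automatically when the feature is activated, a ListInstance element along these lines could be added; the Title, Url and Description values are illustrative assumptions.  If the instance lives in a different feature than the template, ListInstance also accepts a FeatureId attribute identifying the feature that defines the template.

    <?xml version="1.0" encoding="utf-8"?>
    <Elements xmlns="http://schemas.microsoft.com/sharepoint/">
        <!-- TemplateType must match the Type attribute of the ListTemplate (10101). -->
        <!-- Title, Url and Description are illustrative values. -->
        <ListInstance Title="Thesaurus"
                      OnQuickLaunch="TRUE"
                      TemplateType="10101"
                      Url="Lists/Thesaurus"
                      Description="Thesaurus entries for SharePoint search" />
    </Elements>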

The next change is to add a custom action to our list so that users have a button to click to generate the thesaurus file.  For a 2007 version of the custom button, you could paste the snippet shown below after the ListTemplate element, inside the same Elements element.  In 2007 this renders in the list's Actions menu; in 2010 it shows up under Custom Commands on the ribbon when the list is being viewed.  The UrlAction sends the browser to the thesaurus.aspx application page in the blog folder under the _layouts directory, with the ~site token resolving to the current site.  Notice also that we pass the {ListId} token in the query string so the page knows which list to read.

    <!-- RegistrationId must match the ListTemplate Type (10101) -->
    <CustomAction Id="Blog.CustomActions.ThesaurusGenerator"
                  RegistrationType="List"
                  RegistrationId="10101"
                  GroupId="ActionsMenu"
                  Location="Microsoft.SharePoint.StandardMenu"
                  Sequence="1001"
                  Description="Generates a thesaurus file for SharePoint"
                  Title="Create Thesaurus File">
        <UrlAction Url="~site/_layouts/blog/thesaurus.aspx?List={ListId}"/>
    </CustomAction>
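Assembled, the list definition elements file with the 2007-style action would look roughly like the sketch below, assuming you keep both in one elements file.  The point it illustrates is placement: the CustomAction sits next to ListTemplate as a child of Elements, and its RegistrationId equals the template's Type so the action only appears on lists created from this definition.

    <?xml version="1.0" encoding="utf-8"?>
    <Elements xmlns="http://schemas.microsoft.com/sharepoint/">
        <ListTemplate
            Name="Thesaurus List"
            Type="10101"
            BaseType="0"
            OnQuickLaunch="TRUE"
            SecurityBits="11"
            Sequence="410"
            DisplayName="Thesaurus List"
            Description="My List Definition"
            Image="/_layouts/images/itgen.png"/>
        <!-- Sibling of ListTemplate; RegistrationId matches the template Type above -->
        <CustomAction Id="Blog.CustomActions.ThesaurusGenerator"
                      RegistrationType="List"
                      RegistrationId="10101"
                      GroupId="ActionsMenu"
                      Location="Microsoft.SharePoint.StandardMenu"
                      Sequence="1001"
                      Description="Generates a thesaurus file for SharePoint"
                      Title="Create Thesaurus File">
            <UrlAction Url="~site/_layouts/blog/thesaurus.aspx?List={ListId}"/>
        </CustomAction>
    </Elements>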

A 2010 version of the same action may look like what is shown below.  In 2010 the Location attribute of the CommandUIDefinition indicates where the control should go; it follows the pattern Ribbon.[Tab].[Group].Controls._children.  In this case it renders a button on the List tab of the ribbon in the Actions group (labeled Connect & Export in the UI), again allowing a user to export the list to a thesaurus-compatible XML file.  The CommandUIHandler ties the button's Command to the application page URL, passing {ListId} just as the 2007 version does.

    <CustomAction Id="ThesaurusCustomRibbonButton"
                  RegistrationId="10101"
                  RegistrationType="List"
                  Location="CommandUI.Ribbon"
                  Sequence="5"
                  Title="Thesaurus Ribbon Customization">
        <CommandUIExtension>
            <CommandUIDefinitions>
                <!-- Pattern: Ribbon.[Tab].[Group].Controls._children -->
                <CommandUIDefinition Location="Ribbon.List.Actions.Controls._children">
                    <Button
                        Id="Ribbon.List.Actions.CreateThesaurusFile"
                        Sequence="5"
                        Description="Create Thesaurus File"
                        LabelText="Create Thesaurus File"
                        Alt="Create Thesaurus File"
                        Command="CreateThesaurusFile"
                        Image32by32="/_layouts/1033/images/formatmap32x32.png"
                        Image16by16="/_layouts/1033/images/formatmap16x16.png"
                        TemplateAlias="o1" />
                </CommandUIDefinition>
            </CommandUIDefinitions>
            <CommandUIHandlers>
                <!-- {SiteUrl} makes the URL absolute instead of relative to the current page -->
                <CommandUIHandler
                    Command="CreateThesaurusFile"
                    CommandAction="{SiteUrl}/_layouts/blog/thesaurus.aspx?List={ListId}" />
            </CommandUIHandlers>
        </CommandUIExtension>
    </CustomAction>
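To illustrate the Location pattern, the fragment below is a hypothetical variation that places the same button in the Actions group of the Items tab (Ribbon.ListItem.Actions) rather than the List tab.  Only the CommandUIDefinition changes; the button Id is illustrative, and the existing CreateThesaurusFile handler can stay as it is.  The TemplateAlias values available depend on the group's template, so o1 may need adjusting.

    <!-- Hypothetical variation: same command, placed on the Items tab -->
    <CommandUIDefinition Location="Ribbon.ListItem.Actions.Controls._children">
        <Button
            Id="Ribbon.ListItem.Actions.CreateThesaurusFile"
            Sequence="5"
            Description="Create Thesaurus File"
            LabelText="Create Thesaurus File"
            Alt="Create Thesaurus File"
            Command="CreateThesaurusFile"
            Image32by32="/_layouts/1033/images/formatmap32x32.png"
            Image16by16="/_layouts/1033/images/formatmap16x16.png"
            TemplateAlias="o1" />
    </CommandUIDefinition>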

Lastly, we need to edit the schema.xml file that the wizard generated for us.  By default it defines two views, each displaying a different set of columns.  In each view, replace the <ViewFields /> element with the one shown below so that the list's views display only the fields we are interested in for this project.

    <ViewFields>
        <FieldRef Name="ThesaurusEntryType" />
        <FieldRef Name="ThesaurusWord" />
        <FieldRef Name="ThesaurusSubs" />
    </ViewFields>
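Optionally, each view can also sort the entries so that expansions and replacements are listed together.  This sketch is not part of the original solution: it would replace whatever Query element the wizard generated in that view, and sorting by entry type and then by the Match column is just one reasonable choice.

    <!-- Optional: group entries by type, then alphabetically by the Match column -->
    <Query>
        <OrderBy>
            <FieldRef Name="ThesaurusEntryType" Ascending="TRUE" />
            <FieldRef Name="ThesaurusWord" Ascending="TRUE" />
        </OrderBy>
    </Query>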

That gives us the content type, the list definition and schema, and a custom action for generating the thesaurus on demand.  In the next post we will pick up from here with the custom application page that generates the thesaurus file and the special handling required to do it correctly.