Creating a custom XML indexing connector for FAST Search Server 2010 for SharePoint

XML files are often used as input to search engines. The article Custom XML Item Processing describes one way of indexing XML content for FAST Search Server 2010. Another option is to create a custom XML indexing connector. This option:

  • Selects XML elements and maps them to crawled properties outside of document processing, so you don’t have to enable optional document processing or configure the XML mapper.
  • Supports indexing multiple XML formats in the same search solution.
  • Indexes XML files that contain multiple XML elements, each of which should be indexed as a separate item or document. Note: Unlike the method described in Custom XML Item Processing, this option does not require each item to be stored in its own XML file.
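The third point is the main difference from Custom XML Item Processing. As a language-neutral illustration only (the sample itself is a C# assembly, and this is not its code), the following Python sketch splits one XML file that holds several item elements into separate items, and treats each child element name as a candidate crawled property. The <products>/<product> layout and the split_xml_items helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one file that holds several <product> items.
SAMPLE = """<products>
  <product><id>1</id><name>Widget</name><price>9.99</price></product>
  <product><id>2</id><name>Gadget</name><price>19.50</price></product>
</products>"""


def split_xml_items(xml_text, item_tag):
    """Hypothetical helper: return one dict per <item_tag> element.

    Each dict maps a child element's tag (a candidate crawled-property
    name) to that element's text, so every matching element becomes a
    separate item instead of the whole file becoming one document.
    """
    root = ET.fromstring(xml_text)
    items = []
    for element in root.iter(item_tag):
        items.append({child.tag: (child.text or "").strip() for child in element})
    return items


for item in split_xml_items(SAMPLE, "product"):
    print(item)
# {'id': '1', 'name': 'Widget', 'price': '9.99'}
# {'id': '2', 'name': 'Gadget', 'price': '19.50'}
```

Each dictionary corresponds to one indexed item. A real connector would also need a unique ID per item; here the id child element is assumed to play that role.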

Example of a custom XML indexing connector for FAST Search Server 2010 for SharePoint

The custom XML indexing connector example described in this article is an extension of the custom indexing connector example described in Code Sample: MyFileConnector Custom Indexing Connector.

To install and enable the custom XML indexing connector and parsing sample

1. Download the Create custom XML indexing connector for FAST Search for SharePoint (.zip), which includes the Microsoft Visual Studio project and the MyFileModel.xml Business Connectivity Services (BCS) model file.

2. Extract the contents to a folder on your computer.

3. In Visual Studio, open the MyFileConnector project.

4. In Solution Explorer, expand the References folder, and then restore any missing project references. The sample includes references to the following SharePoint Server 2010 assemblies:

  • Microsoft.BusinessData
  • Microsoft.SharePoint
  • Microsoft.Office.Server.Search.Connector

5. Build the project. Then, on the application server, add the resulting assembly (MyFileConnector.dll) to the global assembly cache. For more information, see How to: Install an Assembly into the Global Assembly Cache.

6. Copy MyFileModel.xml to the application server.

7. Open the SharePoint Management Shell. For information about using this tool, see Administering Service Applications Using the SharePoint 2010 Management Shell.

8. At the command prompt, type the following command, and then run it.

$searchapp = Get-SPEnterpriseSearchServiceApplication -Identity <name of FAST Content SSA>

9. At the command prompt, type the following command, and then run it.

New-SPEnterpriseSearchCrawlCustomConnector -SearchApplication $searchapp -protocol myfile -ModelFilePath "\\ServerName\FolderName\MyFileModel.xml" -Name myfile

10. Add the following registry entry to the server, and then set its value to OSearch14.ConnectorProtocolHandler.1:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ProtocolHandlers\myfile

11. At the command prompt, type the following command, and then run it.

net stop osearch14 

12. At the command prompt, type the following command, and then run it.

net start osearch14

 

Indexing and searching your XML content

After you enable your custom XML indexing connector, you need to create a content source for the file repository, and then start a full crawl. 

To create a content source and start a full crawl

1. In Central Administration, in the Application Management section, click Manage service applications.

2. Click your FAST Search Content Search Service Application.

3. In the Crawling section, click Content Sources, and then click New Content Source.

4. Enter a name for the content source.

5. In the Content Source Type section, click Custom Repository.

6. In the Type of Repository section, click myfile.

7. In the Start Addresses section, enter myfile://FileServerName/FileShareName/.
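The start address uses the myfile protocol registered earlier. Assuming the connector resolves the host and path of this address to a Windows file share (an assumption; check the sample's source for its actual behavior), that mapping can be sketched in Python as follows. start_address_to_unc is a hypothetical helper, not part of the sample.

```python
from urllib.parse import unquote, urlparse


def start_address_to_unc(start_address):
    """Hypothetical helper: convert a myfile:// start address to a UNC path."""
    parsed = urlparse(start_address)
    if parsed.scheme != "myfile":
        raise ValueError(f"unexpected protocol: {parsed.scheme!r}")
    # The host becomes the server name; the path segments become the
    # share and any subfolders, with percent-encoding decoded.
    path = unquote(parsed.path).replace("/", "\\")
    return "\\\\" + parsed.netloc + path


print(start_address_to_unc("myfile://FileServerName/FileShareName/"))
# prints \\FileServerName\FileShareName\
```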

8. Start a full crawl of this content source.

In the first crawl, the custom XML indexing connector discovers and extracts new crawled properties from your XML file. To make the crawled properties searchable and enable functionality such as sorting and navigation, create corresponding managed properties and map the crawled properties to the new managed properties.

You can create and map managed properties through the UI or by using Windows PowerShell. The following commands run in the Microsoft FAST Search Server 2010 for SharePoint shell.

To create a new managed property

1. At the command prompt, type the following.

$mp = New-FASTSearchMetadataManagedProperty -Name <name> -Type <type> -Description <description>

Where:

· <name> is the name of the new managed property

· <type> is an integer representing the data type of the new managed property. Valid values are:
-- 1 (Text)
-- 2 (Integer)
-- 3 (Boolean)
-- 4 (Float)
-- 5 (Decimal)
-- 6 (Datetime)

· <description> is a text description of the new managed property

      

To map a crawled property to a managed property

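1. At the command prompt, type the following, where cpname is the name of the crawled property (for example, price) and $mp is the managed property that you created in the previous procedure:

$cp = Get-FASTSearchMetadataCrawledProperty -Name "cpname"

New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp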

After you create the managed property and map the crawled property, perform a new full crawl.

The XML elements are now searchable. The following search result for prodprice shows the new managed property prodprice, which is mapped to the crawled property price from the XML input file.

[Figure: search result for prodprice showing the prodprice managed property]

Suggested extension: setting up an incremental crawl for the custom XML indexing connector

The sample in this article shows only how to create a custom XML indexing connector; it does not implement incremental crawls. An incremental crawl identifies changed items by their last-modified time. You can use the following steps to set up incremental crawls.

Note: This is only a suggested procedure and has not been tested.

To set up an incremental crawl for the custom XML indexing connector

1. Define a DeletedCountField and a LastModifiedTimeStampField on your container entity.

2. On the association navigator between the folder and item entities, add a filter that has a property named CrawlStartTime, and associate the filter with an input parameter as follows:

<Method Name="GetAllFiles" LobName="GetAllFiles">
  <FilterDescriptors>
    <FilterDescriptor Name="LastModifiedSince" Type="Input">
      <Properties>
        <Property Name="CrawlStartTime" Type="System.String">x</Property>
      </Properties>
    </FilterDescriptor>
  </FilterDescriptors>
  <Parameters>
    <Parameter Name="modifiedSince" Direction="In">
      <TypeDescriptor Name="modifiedSince" TypeName="System.DateTime" AssociatedFilter="LastModifiedSince" />
    </Parameter>
    …

3. Modify the GetAllFiles method to take a DateTime parameter (DateTime modifiedSince), and have the method return only items that have changed since the supplied date and time.
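Step 3 can be sketched as follows. This Python sketch only mirrors the filtering logic; the real GetAllFiles method belongs to the sample's C# connector. The get_all_files name, the restriction to .xml files, and the use of the file system's last-modified time are assumptions. Like the procedure above, it has not been tested against FAST Search Server 2010 for SharePoint.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone


def get_all_files(folder, modified_since):
    """Hypothetical Python analogue of GetAllFiles(DateTime modifiedSince).

    Returns the paths of .xml files in `folder` whose last-modified time
    is later than `modified_since` (a timezone-aware datetime).
    """
    changed = []
    for entry in os.scandir(folder):
        if entry.is_file() and entry.name.lower().endswith(".xml"):
            mtime = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
            if mtime > modified_since:
                changed.append(entry.path)
    return sorted(changed)


# Demo: one file older than the crawl start time, one newer.
crawl_start = datetime(2011, 1, 1, tzinfo=timezone.utc)
with tempfile.TemporaryDirectory() as folder:
    for name, when in (("old.xml", crawl_start - timedelta(days=1)),
                       ("new.xml", crawl_start + timedelta(days=1))):
        path = os.path.join(folder, name)
        with open(path, "w") as f:
            f.write("<products/>")
        # Set both access and modification time to `when`.
        os.utime(path, (when.timestamp(), when.timestamp()))
    changed_files = [os.path.basename(p) for p in get_all_files(folder, crawl_start)]

print(changed_files)  # ['new.xml']
```

In the procedure above, the cutoff value reaches modifiedSince through the filter associated with CrawlStartTime. This sketch does not detect deleted files.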