Searching External Systems using SharePoint 2010 Business Connectivity Services (BCS) within throttling limits
Summary: Learn how to use SharePoint Server 2010 Business Connectivity Services (BCS) feature to search content in external system when the number of records in system are more than throttling limits defined for the system.
Searching external systems with large number of records is a complex problem because on one end the BCS is trying to avoid bringing down the external system/SharePoint by limiting number of records which can be fetched but on other end it will not be able to crawl the system due to throttling. In this walkthrough, you will see you how properties can be configured on metadatamodel to chunk the calls for crawling the external system. The article will also show the changes in requirements for implementing search for External Systems.
Applies to: Microsoft SharePoint Server 2010
Search In BCS
Setup Server Environment
Setup BCS in Farm
Create Metadata Model
In this scenario, AdventureWorks database hosted in SQL Server 2008 is used as an External System containing the records to be crawled. A metadata model file is created with IdEnumerator and a SpecificFinder stereotypes implemented to retrieve the data for an External Content Type(ECT). Properties are configured in model to help BCS get the ids for external items to be crawled in batches while staying under the throttling limits.
Search In BCS
Changes have been made in the search functionality in BCS and IdEnumerator is no a requirement for implementing the crawl. BCS uses one of the following stereotypes for crawling the external system:
- Finder Method: If this stereotype is implemented and it has RootFinder property applied to it, then search will use it to crawl the external source.
<Property Name="RootFinder" Type="System.String"></Property></Properties>
- IdEnumerator:If no Finder method is found with RootFinder property applied to it, IdEnumerator is used to crawl the system.
In addition to above, SpecificFinder stereotype is used if the columns returned by it are more than what's returned by Finder/IdEnumerator.
There are other properties shown in Table 1 which affect the behavior of search on UI and how data gets crawled
This property is applied on Model Metadata object. This specifies that a LobSystemInstance in the model file should be displayed in the search user interface. For e.g. dispalyed on page for defining the Line of Business search content source
Specifies the Finder method that will be used to enumerate the items to crawl
If set on the Finder or Idenumerator method instance, will cache the values returned and not call the SpecificFinder unless there is a cache miss.
Table 1-Metadata model properties for search
Set up Server Environment
The list of software required to run this scenario is as follows:
- 1. Microsoft SQL Server 2008 x64
- 2. Microsoft SharePoint Server 2010
Create AdventureWorks Database
AdventureWorks sample database available on Codeplex is used in this scenario to represent the external system.
Setup BCS in Farm
First step in this scenario is to setup the BCS infrastructure in the farm. It inlcudes setting up BCS Service Application and Profile pages for the ECT.
Follow the following steps to setup the infrastructure:
- 1. Start the Business Data Connectivity Service in the farm
- 2. Create a Business Data Connectivity Service Application
- 3. Create a Web Application http://bcs.contoso.com and create a site collection on root site
- 4. Create a subsite with url http://bcs.contoso.com/profile. This will be used for hosting the profile pages for all ECTs.
- 5. Give rights on the AdventureWorks database and table Sales.SalesOrderHeader to the account used for accessing ECT
Create Metadata Model
SharePoint Designer 2010 (SPD) provides functionality to create ECTs and save or export them in a metadata model file but the IdEnumerator stereotypes cannot be modelled using SPD. Therefore this stereotype has to be modelled manually. To make things easier and get the basic metadata ready, it's recommended to create the initial metadata model using SPD. This can be further edited using any xml editor/notepad to add the IdEnumerator or other methods types (which cannot be created using SPD).
In this scenario we will be using IdEnumerator to crawl the system. After getting the list of Id's identifying the external items, search will call SpecificFinder on each Id returned by the enumerator to index the content as the fields in SpecificFinder are more than what gets returned by IdEnuemrator.
LastIdSeen filter type is used to batch the calls to external system. LastIdSeen filter is used by SharePoint Search to pass the identifier of the last external item seen as part of call to IdEnumerator method. It passes this as the filter value while calling the IdEnumerator in a loop. The external system should implement the logic to return the records under the throttling limits by using the filter value as the point from where next set of items needs to be fetched. In this scenario, we are using the value 43658 as default value for filter. This gets used when IdEnumerator is called for first time and Search has yet not seen any Id(LastSeenId).
Snippet below shows method definition for a LastIdSeen Filter.
<FilterDescriptor Type="LastId" Name="LastIdSeen">
<Property Name="UsedForDisambiguation" Type="System.Boolean">false</Property>
<Property Name="IsDefault" Type="System.Boolean">false</Property>
<Property Name="CaseSensitive" Type="System.Boolean">false</Property>
<Parameter Direction="In" Name="@SalesOrderId">
<TypeDescriptor TypeName="System.Int32" IdentifierName="SalesOrderID" AssociatedFilter="LastIdSeen" Name="SalesOrderID">
<DefaultValue MethodInstanceName="SalesOrderHeaderReadIds" Type="System.Int32">43658</DefaultValue>
Metadata snippet below shows IdEnumerator method. If no default value is passed, this gets the top 1000 rows else gets the records starting from the LastIdSeen parameter(SalesOrderId) passed to this method.
<Method IsStatic="false" Name="SalesOrderHeaderReadListIds">
<Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=220.127.116.11, Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property>
<Property Name="RdbCommandText" Type="System.String">IF (@SalesOrderId < 43658) BEGIN
SELECT TOP 1000 SalesOrderId, ModifiedDate FROM [Sales].[SalesOrderHeader] ORDER BY SalesOrderId ASC
SELECT SalesOrderId, ModifiedDate FROM [Sales].[SalesOrderHeader] WHERE SalesOrderId BETWEEN @SalesOrderId AND (@SalesOrderId+1000) ORDER BY [SalesOrderId] ASC
File containing the metadata is available at the end of article as an attachment.
Once ECT has been defined, external data can be crawled and the items can be indexed by SharePoint search. Pre-requisites to enable crawling are:
- ECT created with Finder or IdEnumerator defined
- A SpecificFinder defined if the fields to be crawled are not available as part of Finder/Enumerator
- SharePoint Search Service started and Service Application created for search
- Content Source of type Line of Business Data defined with AdventureWorksForSearch model as external data source to be crawled
- Profile page created for SalesOrder ECT at http://bcs.contoso.com/profile
- Ensure that the crawl account has rights to connect to AdventureWorks database and read permissions for content of site http://bcs.contoso.com/profile
Once all of above is done, start a full crawl of the content source and ensure that content is crawled succesfully by verifying into the crawl log.
Search for any SalesOrder which is in present in the system document and verify if the results are returned for that. In our example, search for term "SO43659" and you will get one result from external system as shown. The result item url is the profile page used to show the ECT profile. On selecting the link in search result, system navigates you to the profile page and show details of the external item.
BCS Search has been simplified and now items can be easily crawled by using SpecificFinder stereotypes. In addition using throttling features, governance can be imposed to ensure that external system and SharePoint doesn't get overhauled by users requests.