SP2013 - How to better balance your Distributed File System (DFS) crawl

Hi Search Enthusiasts

 

I came across a problem when crawling a Distributed File System (DFS) whose content is split across multiple physical locations but grouped under one DFS share.

The main DFS share was \\contoso.com, with the content underneath it split per region/location.

For instance:

\\contoso.com\Europe

\\contoso.com\Asia

\\contoso.com\Americas

 

When you set your content source to crawl the root directory \\contoso.com, you potentially crawl remote locations with significant network latency, and the crawl can end up with serious performance problems.

A single slow location can hold back the entire crawl, because all locations are coupled under the same content source (CS).

Some locations may be more responsive than others (as was the case here), but since we crawl only one site (one host), we cannot tune the crawl per location (we can't have multiple content sources pointing to the same host).

 

In this case, the best workaround is to simulate multiple hosts and configure crawler impact rules to handle slow locations.

 

Quick Workaround

1. Create a virtual host name to identify each content location

192.168.0.1 emea.contoso.com

192.168.0.1 asia.contoso.com

192.168.0.1 americas.contoso.com

You can do that locally in the hosts file (C:\Windows\System32\drivers\etc\hosts) or globally in your DNS.

DNS is preferred, though, since those virtual hosts will be shown out of the box in the search results, so users need to be able to resolve them.

 

2. Define one content source per content location.
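To make steps 1 and 2 a bit more concrete, here is a minimal Python sketch that builds the hosts entries and one start address per location. The alias-to-folder mapping and the resulting \\alias\folder start addresses are my assumptions for illustration (the helper names are hypothetical too); adapt them to how your DFS namespace is really laid out.

```python
from typing import Dict, List


def hosts_lines(ip: str, aliases: List[str]) -> List[str]:
    """Build one hosts-file line per virtual host, all pointing at the same IP."""
    return [f"{ip} {alias}" for alias in aliases]


def start_addresses(alias_to_folder: Dict[str, str]) -> Dict[str, str]:
    """Map each virtual host to an assumed UNC start address for its own content source."""
    return {
        alias: f"\\\\{alias}\\{folder}"
        for alias, folder in alias_to_folder.items()
    }


# Hypothetical mapping between virtual hosts and the regional DFS folders.
LOCATIONS = {
    "emea.contoso.com": "Europe",
    "asia.contoso.com": "Asia",
    "americas.contoso.com": "Americas",
}

if __name__ == "__main__":
    for line in hosts_lines("192.168.0.1", list(LOCATIONS)):
        print(line)
    for alias, address in start_addresses(LOCATIONS).items():
        print(f"Content source for {alias}: {address}")
```

Each start address then goes into its own content source, so every location is seen by the crawler as a separate host.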

 

 

3. Identify the performance of each location (mainly network latency).

4. Create Crawler Impact Rules that reflect the performance of each content location.

 

 In this example, the rules were set as follows:

  1. emea gets the full crawl potential (very responsive).
  2. asia, being the slowest location, gets a slow-paced crawl (high network latency).
  3. americas is set to half the potential of emea (responsive).
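
If you want a rough number for step 3 before choosing the values in step 4, a small probe can help. The Python sketch below times a directory listing per location and scales the number of simultaneous requests down relative to the fastest one. The scaling rule and the max_requests ceiling are purely my assumptions for illustration, not something SharePoint computes for you.

```python
import os
import statistics
import time
from typing import Callable, Dict


def probe_listing(path: str) -> float:
    """Return the seconds needed to list one directory (a rough latency proxy)."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        for _ in entries:
            pass
    return time.perf_counter() - start


def median_latency(path: str,
                   probe: Callable[[str], float] = probe_listing,
                   samples: int = 5) -> float:
    """Probe the same path several times and keep the median to smooth out spikes."""
    return statistics.median(probe(path) for _ in range(samples))


def suggest_requests(latencies: Dict[str, float],
                     max_requests: int = 8) -> Dict[str, int]:
    """Hypothetical rule: the fastest host gets max_requests, the others get
    a share proportional to how much slower they are (never below 1)."""
    if any(latency <= 0 for latency in latencies.values()):
        raise ValueError("latencies must be positive")
    fastest = min(latencies.values())
    return {
        host: max(1, round(max_requests * fastest / latency))
        for host, latency in latencies.items()
    }
```

For instance, calling median_latency(r"\\emea.contoso.com\Europe") for each virtual host and passing the results to suggest_requests gives a starting point that you can then enter by hand in the Crawler Impact Rules.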

 

Since we have one CS per location, we can also schedule full or incremental crawls independently for each location.

 

As mentioned in step 1, out of the box (OOB) the display URL in the search results will show the virtual host URL, so some customization is needed in your search front end. Also note that this trick may not be suitable when dealing with a large number of locations.

One last thing: I strongly recommend reading Brian's blog post on Host Distribution Rules in SP2013:

https://blogs.msdn.com/b/sharepoint_strategery/archive/2013/06/30/why-host-distribution-rules-dont-apply-to-sharepoint-2013.aspx

 

Et Voila !

 

Stay Tuned.