SP2013 - Crawler is unable to crawl some file share directories.

Hi Search Enthusiasts!

Today I would like to share an SP2013 crawl issue that one of my customers faced a while back.

A File Share content source was set up in the Search Service Application, but every time a full crawl completed, the crawl log showed that no documents had been successfully indexed.

How to troubleshoot this crawl behavior?

One of the best tools for validating crawl behavior against file shares is ProcMon (https://technet.microsoft.com/en-us/library/bb896645.aspx).

To investigate the crawl, here's what you can do:

  • Start ProcMon on one or more crawl components. Since SP2013, you can no longer tell in advance which crawl component will crawl your content source, so capture on every crawl component to be sure.
  • Set a filter on the process name mssdmn.exe.
  • You may want to enable Filter > Drop Filtered Events to avoid huge memory consumption (by default, ProcMon backs its event capture with virtual memory).
  • Start your capture.
  • Run a full crawl against your file share content source.
  • When the crawl completes, switch to the ProcMon instance on the crawl component that handled it (you might have one instance per crawl component).
  • Open Tools > File Summary.
  • Sort by Path.
  • Pick a file share directory path (for instance "\\fileshare1\dir1") and double-click it.
  • Back in the main ProcMon event window, look for QueryBasicInformationFile operations, such as:

5/7/2014 9:56:35 AM 9:56:35.9568217 AM mssdmn.exe 7636 QueryBasicInformationFile \\fileshare1\dir1 0.0000062 SUCCESS CreationTime: 4/2/2011 7:00:55 AM, LastAccessTime: 5/7/2014 9:56:34 AM, LastWriteTime: 4/2/2011 8:23:22 AM, ChangeTime: 5/7/2014 12:24:49 AM, FileAttributes: DO
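When the capture covers many directories, reading these lines one at a time gets tedious. The following is a minimal Python sketch that assumes the capture was saved from ProcMon as CSV (File > Save, CSV format) with the default columns, in particular "Process Name", "Operation", "Path" and "Detail"; adjust the column names if your export differs. It keeps the FileAttributes codes reported for each path that mssdmn.exe queried. The function names are illustrative, not part of any tool.

```python
import csv
import re
from typing import Dict, Iterable

# Matches the "FileAttributes: <codes>" part of ProcMon's Detail column.
_ATTR_RE = re.compile(r"FileAttributes:\s*([A-Z]+)")


def directory_attributes(rows: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Map each path queried by mssdmn.exe to the last FileAttributes codes seen for it."""
    result: Dict[str, str] = {}
    for row in rows:
        # Keep only the crawler daemon's QueryBasicInformationFile events.
        if row.get("Process Name") != "mssdmn.exe":
            continue
        if row.get("Operation") != "QueryBasicInformationFile":
            continue
        match = _ATTR_RE.search(row.get("Detail", ""))
        if match:
            result[row.get("Path", "")] = match.group(1)
    return result


def load_procmon_csv(csv_path: str) -> Dict[str, str]:
    """Read a ProcMon CSV export (assumed default columns) and extract the attributes."""
    # utf-8-sig drops a byte-order mark if the export starts with one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return directory_attributes(csv.DictReader(handle))
```

For example, `load_procmon_csv(r"C:\temp\crawl.csv")` would return a dictionary such as `{"\\fileshare1\dir1": "DO"}` for the event above, which you can then sort or filter.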

Check the FileAttributes value reported for the directory. In our case it was DO:

D stands for Directory
O stands for Offline

For reference: https://hiddencodes.wordpress.com/2013/10/08/decode-attributesfileattributes-value-in-procmon-output/
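To relate these letter codes to the underlying Win32 attribute bits, a small lookup is enough. This sketch only knows the two codes discussed here (D and O) and maps them to Python's stat.FILE_ATTRIBUTE_DIRECTORY (0x10) and stat.FILE_ATTRIBUTE_OFFLINE (0x1000) constants. Any other letter is reported as unknown rather than guessed, and it decodes letter by letter, so codes made of several letters would need the full table from the reference above. The helper names are hypothetical.

```python
import stat
from typing import List

# ProcMon letter -> (attribute name, Win32 FILE_ATTRIBUTE_* bit).
# Only D and O come from this post; other codes are intentionally left out.
PROCMON_CODES = {
    "D": ("Directory", stat.FILE_ATTRIBUTE_DIRECTORY),  # 0x10
    "O": ("Offline", stat.FILE_ATTRIBUTE_OFFLINE),      # 0x1000
}


def decode(codes: str) -> List[str]:
    """Translate a FileAttributes string such as 'DO' into names, one letter at a time."""
    return [PROCMON_CODES[c][0] if c in PROCMON_CODES else f"unknown({c})" for c in codes]


def to_mask(codes: str) -> int:
    """Combine the known letters into a FILE_ATTRIBUTE_* bit mask."""
    mask = 0
    for c in codes:
        if c in PROCMON_CODES:
            mask |= PROCMON_CODES[c][1]
    return mask


def is_offline(codes: str) -> bool:
    """True when the offline bit is among the decoded attributes."""
    return bool(to_mask(codes) & stat.FILE_ATTRIBUTE_OFFLINE)


print(decode("DO"))        # ['Directory', 'Offline']
print(hex(to_mask("DO")))  # 0x1010
```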

By default, the crawler doesn't crawl offline files, which is why nothing under these directories was indexed!
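It can also help to check the share side directly and list which items carry the offline attribute, since those are the ones the crawler skips by default. A minimal sketch, assuming Python 3 runs on a Windows machine that can reach the share: os.stat exposes st_file_attributes only on Windows, so on other platforms this finds nothing. The function names are illustrative.

```python
import os
import stat
from typing import Iterator


def has_offline_bit(attributes: int) -> bool:
    """True when FILE_ATTRIBUTE_OFFLINE (0x1000) is set in a Win32 attribute mask."""
    return bool(attributes & stat.FILE_ATTRIBUTE_OFFLINE)


def _attributes(path: str) -> int:
    try:
        # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
        return getattr(os.stat(path), "st_file_attributes", 0)
    except OSError:
        # Unreachable or access-denied items are simply skipped.
        return 0


def find_offline(root: str) -> Iterator[str]:
    """Yield root and every directory or file below it whose offline bit is set."""
    if has_offline_bit(_attributes(root)):
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if has_offline_bit(_attributes(full)):
                yield full
```

For example, `for path in find_offline(r"\\fileshare1\dir1"): print(path)` would print "\\fileshare1\dir1" itself if it is marked offline, as in the capture above.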

How to change this default crawl behavior?

To crawl offline files, do the following on each crawl component:

  1. In Regedit, open the following key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Global\Gathering Manager
  2. Add a new DWORD value named CrawlOfflineFiles.
  3. Set its value to 1.
  4. Restart the osearch15 service on all crawl components.
  5. Start a full crawl.
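If you have several crawl servers, steps 2 to 4 can be scripted. A minimal sketch using Python's Windows-only winreg module, assuming it runs elevated on each crawl component. KEY_WOW64_64KEY makes a 32-bit Python still write to the 64-bit registry view that SharePoint 2013 uses. The key path, value name and service name are the ones given above; the helper names are hypothetical. Starting the full crawl (step 5) is left to Central Administration.

```python
import subprocess

# Key and value as given in the steps above.
KEY_PATH = r"SOFTWARE\Microsoft\Office Server\15.0\Search\Global\Gathering Manager"
VALUE_NAME = "CrawlOfflineFiles"
SERVICE_NAME = "osearch15"


def set_crawl_offline_files(enabled: bool = True) -> int:
    """Create or update the CrawlOfflineFiles DWORD and return the value read back."""
    import winreg  # standard module, available on Windows only

    access = winreg.KEY_SET_VALUE | winreg.KEY_QUERY_VALUE | winreg.KEY_WOW64_64KEY
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, access) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1 if enabled else 0)
        value, _ = winreg.QueryValueEx(key, VALUE_NAME)
    return value


def restart_search_service() -> None:
    """Stop and start the search service with the built-in 'net' command (needs admin rights)."""
    subprocess.run(["net", "stop", SERVICE_NAME], check=True)
    subprocess.run(["net", "start", SERVICE_NAME], check=True)
```

On each crawl component you would call `set_crawl_offline_files()` (expecting it to return 1) and then `restart_search_service()`.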

Et voilà! Your offline files should now be crawled and searchable.