SharePoint 2010 cannot crawl PDF files


Background

  • Environment: Windows 2008 SP2, SharePoint 2010 October CU, SQL Server 2008 SP2
  • PDF files were hosted within SharePoint
  • The Adobe PDF iFilter was installed correctly

Requirement

SharePoint search should be able to search within PDF content

Issue

After numerous checks and cross-checks (by multiple people), Search was still unable to crawl PDF content.

There are multiple documents on the internet about installing the Adobe PDF iFilter, but I primarily followed http://support.microsoft.com/kb/2293357

The odd thing was that everything worked per the documentation in the development (single-server) environment, but it would not work in the QA (multi-server) environment.

After a painstaking review of the ULS logs (with verbose logging turned on for SharePoint Search), I found the entry below:

05/15/2012 12:58:50.58        mssdmn.exe (0x13C0)        0x063C        SharePoint Server Search        Exceptions        1hjo        Medium        Exception thrown: 0x80070005 (d:\office\source\search\native\ytrip\tripoli\filtereg\filtreghelper.cxx:788 ip 0x000007FEE19E2D59)

The above error was logged every time an attempt was made to request a PDF file.

For example:

 CHttpAccessorHelper::InitRequestInternal - successful request for 'http://servername/Shared%20Documents/PDF%20Search%20Text%20Sample.pdf'.  [httpacchelper.cxx:613]  d:\office\source\search\native\gather\protocols\http\httpacchelper.cxx
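Finding that one entry by eye in a verbose ULS log is tedious, so a small script can help. The following is a minimal sketch, not part of the original troubleshooting: it scans log lines for "Exception thrown" entries written by mssdmn.exe and collects the error codes. The function name `find_exceptions` and the sample list are illustrative assumptions; the sample lines are the two entries quoted above.

```python
import re

# Matches the 8-digit hex code that follows "Exception thrown:" in a ULS message.
EXCEPTION_RE = re.compile(r"Exception thrown: (0x[0-9A-Fa-f]{8})")


def find_exceptions(lines, process="mssdmn.exe"):
    """Return (line_number, error_code) pairs for exceptions logged by `process`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if process not in line:
            continue  # entry was written by a different process
        match = EXCEPTION_RE.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# The two entries quoted in this post, used here as sample input.
sample = [
    r"05/15/2012 12:58:50.58  mssdmn.exe (0x13C0)  0x063C  SharePoint Server Search  Exceptions  1hjo  Medium  Exception thrown: 0x80070005 (d:\office\source\search\native\ytrip\tripoli\filtereg\filtreghelper.cxx:788 ip 0x000007FEE19E2D59)",
    "CHttpAccessorHelper::InitRequestInternal - successful request for 'http://servername/Shared%20Documents/PDF%20Search%20Text%20Sample.pdf'.  [httpacchelper.cxx:613]",
]

for number, code in find_exceptions(sample):
    print(f"line {number}: {code}")
```

To run it against a real log, pass the open file object instead of `sample`; the path to the ULS log directory depends on your farm configuration.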

Resolution

Exception 0x80070005 means Access denied, which pointed to the fact that the service account running SharePoint Search (e.g. Domain\svc_SPS_Search) did not have access to the PDF filter files installed by Adobe.
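To see why 0x80070005 reads as Access denied, the code can be split into the standard HRESULT fields: a failure bit, a facility, and a 16-bit code. The sketch below is illustrative only (the function name `decode_hresult` is an assumption); facility 7 is FACILITY_WIN32, and Win32 error 5 is ERROR_ACCESS_DENIED.

```python
FACILITY_WIN32 = 7        # HRESULT wraps a plain Win32 error code
ERROR_ACCESS_DENIED = 5   # Win32 error code for "Access is denied"


def decode_hresult(hresult):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    is_failure = bool((hresult >> 31) & 0x1)  # bit 31: severity
    facility = (hresult >> 16) & 0x7FF        # bits 16-26: facility
    code = hresult & 0xFFFF                   # bits 0-15: error code
    return is_failure, facility, code


is_failure, facility, code = decode_hresult(0x80070005)
if is_failure and facility == FACILITY_WIN32 and code == ERROR_ACCESS_DENIED:
    print("0x80070005 is a Win32 'Access is denied' failure")
```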

In the dev environment it worked because everything was running under the farm account.

After granting appropriate READ permissions on the Adobe Filter binary folder (and restarting the Search service), we were able to crawl PDF files hosted within SharePoint.
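The post does not record exactly how the permission was granted, so the following is only a sketch of one way to do it with the Windows `icacls` tool, driven from Python. The folder path is an assumption (a typical install location for the 64-bit Adobe PDF iFilter 9; check where the filter is installed on your servers), the account is the example name used above, and the helper names are hypothetical. By default the script only prints the command; set `dry_run=False` on a crawl server, from an elevated prompt, to actually run it.

```python
import subprocess

# Assumed install folder; verify the real location on each crawl server.
IFILTER_FOLDER = r"C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin"
# Example search service account from this post.
SEARCH_ACCOUNT = r"Domain\svc_SPS_Search"


def build_grant_read_command(folder, account):
    """Build an icacls command granting read and execute on a folder tree."""
    # (OI)(CI) lets files and subfolders inherit the entry; RX is read and execute.
    return ["icacls", folder, "/grant", f"{account}:(OI)(CI)RX"]


def grant_read(folder, account, dry_run=True):
    """Return the command, running it only when dry_run is False."""
    command = build_grant_read_command(folder, account)
    if not dry_run:
        subprocess.run(command, check=True)  # raises if icacls reports an error
    return command


print(" ".join(grant_read(IFILTER_FOLDER, SEARCH_ACCOUNT)))
```

As described above, the Search service still needs to be restarted afterwards for the change to take effect.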

Note: This issue may also occur with the Foxit PDF iFilter in a multi-server environment (where the Search service account is different from the farm account).

Comments (3)
  1. PDF content is not crawling in multifarn says:

    I have done the same steps as described at support.microsoft.com/…/2293357 and it is working fine on my test server (single farm), but it is not working yet on the production server (multi-farm).

    My Production Servers are

    2 Front End Server (Server Farm 1 & Server Farm 2) running under NBL

    2 Storage (Clustered)

    I have checked the crawl log (PDF content is not being crawled).

    Is there anything left to do?

    Any Help?

  2. SharePoint 2010 Search - PDF content search configuration in multi-farm Environment says:

    Finally, I came up with the solution after considering all other possibilities.

    Final Solution given at social.msdn.microsoft.com/…/2d4470ef-fd0a-4e12-8886-68837865dfba

    Find Details Step by Step from my Blog Post

       SharePoint 2010 Search – PDF content search configuration in multi-farm Environment

  3. COnfused says:

    What/where is the Adobe Filter binary folder, and how did you go about giving it permissions?

