Index PDF Files in SharePoint

One of the most common questions I face every day is how to index PDF files in SharePoint. This usually comes up at companies that want to move to a paperless office. Many companies, such as those in the insurance sector, want to scan their documents, store them digitally, and later search through them. An iFilter makes this possible.

One word of caution: if the PDF file contains images instead of text (that is, you cannot select the text in Acrobat Viewer), SharePoint will not be able to index it, even with the iFilter for Adobe configured. The workarounds are:

1. Convert the image files to TIFF format and configure an iFilter for TIFF files in SharePoint so that the TIFF files are indexed.

2. Use Acrobat Writer to convert the image to text (OCR), and then use the iFilter for Adobe to index the PDF files in SharePoint.
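Before picking a workaround, it helps to know which of your PDFs are image-only. The Python sketch below is an illustrative heuristic, not something the iFilter or SharePoint provides. It assumes that a PDF with no font resource anywhere (in its raw bytes or in any Flate-compressed stream) has no selectable text, which is typical of a scan that has not been through OCR. It can be fooled, for example by encrypted files, so treat its answer as a hint rather than a verdict.

```python
import re
import sys
import zlib

# Body of a PDF stream object: "stream", end-of-line, data, end-of-line, "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def _chunks_to_scan(data: bytes):
    """Yield the raw file bytes, then every stream that inflates with zlib."""
    yield data
    for match in _STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate-encoded (e.g. a JPEG page scan) or damaged: skip it.
            continue
        yield inflated


def looks_searchable(data: bytes) -> bool:
    """Heuristic: a PDF with no /Font resource anywhere has no selectable text."""
    return any(b"/Font" in chunk for chunk in _chunks_to_scan(data))


if __name__ == "__main__":
    # Pass one or more PDF paths on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            verdict = "text" if looks_searchable(fh.read()) else "image-only?"
        print(f"{verdict:12} {path}")
```

Files reported as "image-only?" are candidates for one of the two workarounds above. Checking for a font resource rather than trying to extract text keeps the sketch free of third-party PDF libraries; a PDF that has been through OCR normally carries a font for its text layer, so it will be reported as "text".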

Now let's dig into the configuration.

First, download the Adobe iFilter from https://www.adobe.com/support/downloads/detail.jsp?ftpID=2611 (this is the current version at the time of writing).

See https://support.microsoft.com/default.aspx?scid=kb;en-us;555209 for complete guidelines on how to set up the iFilter. In my experience, it's pretty simple.

Next, add the file extension to your index server: go to your site settings and open the search and indexing configuration. Select the "Include file types" option, choose "New File Type", and enter "pdf" as the file extension to index.
You will need to re-index your entire site to pick up this change.
After making both of these changes (installing the iFilter and adding the file type), run IISRESET on the SPS server.

PDF files that were uploaded before the iFilter was installed will not be indexed by default. To index them, rebuild the full-text catalog in SQL Server.


This posting is provided "AS IS" with no warranties, and confers no rights.