Recently one of my customers had 5 million invoices that he wanted to scan and store in SharePoint. He wanted to know whether SharePoint (out of the box) would be able to index the data within these scanned documents. He was not ready to purchase OCR software to make these files searchable.
In such scenarios it's better to get the scanned documents in .tif (TIFF) format. The reason is that SharePoint 2010 can index TIFF files without separate OCR software, and for that matter you don't even need to buy an IFilter for it.
The steps below will help you achieve it. (This assumes that the TIFF files are already uploaded to a SharePoint site.)
1. The first step is to enable the IFilter. It is a Windows Server feature, and it is enabled through Server Manager. Click on Features in the tree and then Add Features on the right.
2. Choose Windows TIFF IFilter.
3. After this, we need to make changes in the Group Policy console. To get to the Group Policy console, click on Start -> Run and type gpedit.msc.
4. In the Group Policy console, go to Administrative Templates -> OCR.
5. The first setting forces OCR for all pages in a TIFF. By default the TIFF IFilter attempts to optimize performance by skipping blank pages or pages that have non-textual content such as pictures. In my scenario almost every TIFF was an invoice, so I enabled this setting to ensure that no pages were missed during the OCR process.
6. The next setting is the OCR language that you wish to check for. By default this is the server's system language; however, if you expect several different languages, you can enable them here as long as they are part of the same code page. We need to enable this option as well (same process as the previous step).
7. Now you need to do an IISRESET and also restart the SharePoint Timer service.
8. Now that you have the TIFF IFilter installed, a full crawl will need to be run to OCR the documents.
9. Once the crawl has finished, go to a Search Center site and search for some text that should be in one of your TIFFs to see if everything worked.
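If you have to repeat step 7 on several servers, the restarts can be scripted. Below is a minimal Python sketch of that idea, not a tested deployment script. It assumes that Python is available on the server, that the script runs from an elevated prompt, and that the SharePoint 2010 Timer service is registered under the service name SPTimerV4. The function names restart_commands and run_restart are just illustrative names I made up. By default it only prints the commands (dry run) so you can review them first.

```python
import subprocess

# Assumption: SPTimerV4 is the Windows service name of the
# SharePoint 2010 Timer service on this server.
TIMER_SERVICE = "SPTimerV4"


def restart_commands(timer_service=TIMER_SERVICE):
    """Return the commands for step 7, in the order they should run."""
    return [
        ["iisreset"],                     # restart IIS
        ["net", "stop", timer_service],   # stop the SharePoint Timer service
        ["net", "start", timer_service],  # start it again
    ]


def run_restart(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    planned = []
    for cmd in restart_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a command fails
            subprocess.run(cmd, check=True)
        planned.append(cmd)
    return planned
```

Calling run_restart() only lists the three commands. Calling run_restart(dry_run=False) on the SharePoint server actually runs them, after which you can start the full crawl from step 8.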
Hope this blog post was helpful!