Behavior of Incremental crawl when the Web Front End for crawling is not available


Hi, this is Ajitha with the SharePoint Technical Support team.

Recently, while working with a customer, I was asked how crawling behaves if the Web Front End (WFE) dedicated to crawling becomes unavailable. To answer this question, I read through some interesting information about crawling that I thought I should share. Some of you may already be familiar with it.

Some situations in which the Web Front End can become unavailable: IIS on the WFE is hung, the application pool for the site is stopped, a network or hardware failure occurs on the server, the system reboots unexpectedly, the server crashes, and so on.

What is the crawl behavior when "the item has been deleted from the site or the crawl account's access to the item has been revoked"?


During a full crawl, such an item is marked for deletion in the content index, and at the end of the crawl the marked items are deleted from the content index. Two separate stored procedures perform these jobs.

An incremental crawl behaves in a similar manner.
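To picture the two phases (mark during the crawl, delete at the end), here is a minimal Python sketch. It is a simplified, hypothetical model, not SharePoint's implementation or its stored procedures: the `ContentIndex` class, its methods, and the `reachable` parameter (the set of items the crawler managed to retrieve) are illustrative names only.

```python
# Hypothetical, simplified model of the two-phase deletion described above.
# NOT SharePoint code: class, method, and parameter names are illustrative.

class ContentIndex:
    def __init__(self, items):
        self.items = set(items)            # URLs currently in the content index
        self.marked_for_deletion = set()

    def crawl(self, reachable):
        """Phase 1: during the crawl, mark every indexed item the crawler could
        not retrieve (deleted from the site, or crawl account lost access)."""
        for url in self.items:
            if url not in reachable:
                self.marked_for_deletion.add(url)

    def finish_crawl(self):
        """Phase 2: at the end of the crawl, delete every marked item."""
        self.items -= self.marked_for_deletion
        self.marked_for_deletion.clear()


index = ContentIndex({"/a.aspx", "/b.aspx", "/c.aspx"})
index.crawl(reachable={"/a.aspx", "/c.aspx"})  # /b.aspx was deleted from the site
print(sorted(index.items))  # ['/a.aspx', '/b.aspx', '/c.aspx'] (only marked so far)
index.finish_crawl()
print(sorted(index.items))  # ['/a.aspx', '/c.aspx']
```

The point of the split is that an item is not removed the moment it fails; it is only removed once the crawl completes.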


For more details on crawling, refer to the links below:


Now, going back to the original question: "Incremental crawl behavior when the dedicated server is unavailable".

What I noticed is that if the site is unavailable for crawling, the indexer tries to crawl the site and fails. However, the outcome is not the same for full and incremental crawls.


If you perform a full crawl when the site is not available, it deletes all the items from the content index. Oops! So if the site becomes unavailable just before a scheduled full crawl, we are in trouble, because we end up with an empty content index.

On the other hand, consecutive incremental crawls keep the items in the content index. However, if the incremental crawl fails one hundred consecutive times, the index server removes the affected content from the index. Refer to the topic "Reasons to perform a full crawl" at http://technet.microsoft.com/en-us/library/cc160651.aspx.
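To make the difference between the two crawl types concrete, here is a minimal Python sketch of the behavior described above. It is a hypothetical model, not SharePoint code: the `SiteIndex` class and its methods are made-up names, the threshold of 100 comes from the TechNet topic, and the assumption that a successful incremental crawl resets the failure count is my reading of "consecutive".

```python
# Hypothetical model of how the index reacts when the crawl target is down,
# based on the behavior described in this post. NOT SharePoint code.

FAILURE_THRESHOLD = 100  # consecutive failed incremental crawls (per TechNet)


class SiteIndex:
    def __init__(self, items):
        self.items = set(items)
        self.consecutive_failures = 0  # failed incremental crawls in a row

    def full_crawl(self, site_available: bool) -> None:
        if not site_available:
            # Nothing can be retrieved, so every item is marked for deletion
            # and removed at the end of the crawl: the index ends up empty.
            self.items.clear()

    def incremental_crawl(self, site_available: bool) -> None:
        if site_available:
            self.consecutive_failures = 0  # assumption: a success ends the streak
            return
        self.consecutive_failures += 1
        # Items survive failed incremental crawls until the threshold is hit.
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.items.clear()


docs = {"/pages/default.aspx", "/pages/news.aspx"}

full = SiteIndex(docs)
full.full_crawl(site_available=False)
print(len(full.items))  # 0

incr = SiteIndex(docs)
for _ in range(99):
    incr.incremental_crawl(site_available=False)
print(len(incr.items))  # 2
incr.incremental_crawl(site_available=False)  # 100th consecutive failure
print(len(incr.items))  # 0
```

In short, a full crawl treats unreachable content as gone on its first attempt, while an incremental crawl gives it a grace period of one hundred consecutive failures.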


I tested this behavior with the following steps:


  1. Created a web application with a publishing site.

  2. Created an SSP (Shared Services Provider) to crawl only this site.

  3. Performed a full crawl.

  4. Stopped the application pool for the web application and performed a full crawl.

  5. Started the application pool and performed a full crawl again.

  6. Scheduled an incremental crawl to run every minute.

  7. Stopped the application pool for the web application.

Below are the screenshots from the test above:


FULL CRAWL

I) Site


II) Items in index after successful full crawl


III) Site unavailable after the application pool for the site is stopped.


IV) Manually initiated full crawl (here, the full crawl is not scheduled)


V) Index status after the full crawl


VI) Error in the crawl log - "The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly."




INCREMENTAL CRAWL


I) Site


II) Items in index after successful full crawl




III) Site unavailable after the application pool for the site is stopped.





IV) Incremental crawl schedule



V) Error in the crawl log - "The crawler could not communicate with the server. Check that the server is available and that the firewall access is configured correctly."




VI) Content still present in the index after consecutive failed incremental crawls




VII) Content cleared after 100 consecutive failed incremental crawls






I hope this answers the question about crawl behavior when the WFE set up for crawling becomes unavailable.
