Introducing the ExportCrawlLog STSADM Command Extension

In versions of SharePoint prior to MOSS 2007, each time a crawl was executed, a new group of log messages was stored in the database.  The name of the log itself has also changed in the documentation and the user interface: formerly known as the Gatherer Log, it is now called the Crawl Log.

When troubleshooting problems with the crawl of a particular content source, it was (and still is) sometimes useful to compare and contrast the messages logged between one crawl and the next.  In MOSS 2007, storage of crawl log messages has been optimized to minimize space: only the most recent message for a given URL is stored in the database.  As a consequence, the results from a prior crawl are overwritten by results from subsequent crawls.  In other words, you can only ever see the most recent log message for a given URL.

This is where the STSADM command extension “ExportCrawlLog” comes in. The motivation for this tool is to provide a way to take a “snapshot” of the Crawl Log at a point in time, to facilitate post-mortem analysis of crawl problems.  As a bonus, in addition to extracting crawl log detail, it also provides some summary reporting features.  The goal of the tool is to provide a means of gathering data by which you can track and manage the health of your index over time.  For instance, you could set up a scheduled task to run this command once a day and generate summary reports that provide data for trend monitoring.
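As a sketch of the scheduled-task idea above: custom STSADM extensions are invoked with stsadm -o <operationname>, so assuming the extension registers an operation named "exportcrawllog" (the actual operation name and any parameters are assumptions here; check the project documentation for the real syntax), a daily snapshot might be scheduled on the index server like this:

```shell
REM Run the export once interactively to verify it works on the index server.
REM NOTE: the operation name "exportcrawllog" is an assumption; consult the
REM ExportCrawlLog documentation for the actual operation name and parameters.
stsadm -o exportcrawllog

REM Schedule a daily run at 2:00 AM so snapshots accumulate for trend analysis.
REM stsadm.exe lives in the MOSS 2007 "12 hive" BIN directory.
schtasks /create /tn "Daily Crawl Log Export" ^
    /tr "\"%CommonProgramFiles%\Microsoft Shared\web server extensions\12\BIN\stsadm.exe\" -o exportcrawllog" ^
    /sc daily /st 02:00 /ru SYSTEM
```

Running the task as SYSTEM avoids storing credentials, but any account used must have sufficient rights to query the search administration object model on the index server.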

ExportCrawlLog uses only the published APIs of the SharePoint Object Model and must be run on the index server of your SharePoint farm. ExportCrawlLog is available as source code on CodePlex at https://www.codeplex.com/ExportCrawlLog and is part of the Search Community Toolkit.

Please use the Discussion and Issue tracking features of CodePlex to offer your feedback.

Larry Kuhn
Architect
Microsoft Consulting Services.