SQL File groups and Search

This article has been a long time coming, but it is finally here.  In the post below I will cover how to configure the Search database to span multiple filegroups.  First I’ll cover a little about the benefits of doing so:

General references on what SQL file groups are:

The method that we have chosen to implement filegroups on the Search database is one of segregation.  We have identified all of the tables and indexes within the database that are solely used for crawling and not used at all to satisfy end-user queries.  The remaining tables and indexes are used for end-user queries.  However, the nature of the Search and indexing problem still dictates that the "query" tables are written to during a crawl.  The crawl only tables and indexes are isolated into their own filegroup.  With the crawl and query centric filegroups identified you can now ensure that the IO intensive process of crawling has a reduced impact on the IO subsystem that is hosting the query filegroup by ensuring that these filegroups are on separate spindles.

The whole goal of using filegroups is to improve the performance of the system.  This is done by providing an additional file.  This file must be placed on a different set of spindles to see any kind of performance enhancement.  If your SQL machine is not IO bound for the Search database then implementing filegroups will not provide you with any benefits. 

To make the migration process easier we did not actually create a query filegroup.  We simply created a new filegroup called "CrawlFileGroup" and moved the crawl tables out of the PRIMARY filegroup.  Such that PRIMARY effectively becomes the query filegroup.  This migration process is one that can be quite expensive to complete and could take hours to finish.  Keep this in mind when scheduling this on your production servers.  Because the move involves dropping and recreating numerous clustered indexes you should assume that the DB is offline during this move as many long running locks will be taken to recreate the index.  

Issues and concerns with using filegroups:

Back-up and Restore

One concern that you will need to be aware of in you planning for deploying filegroups on the Search database is that your restore process will be slightly impacted.  Out of the box Search restore is unaware of the filegroup that will exist within the backup image.  Because of this there is no way to indicate where this file should be restored to.  As a result the restore process is going to try and place the crawl filegroup file onto the same drive that it existed  on when you ran the back-up.  Once you enable filegroups you will be committed to making sure that all future machines that you restore your back-up to have a drive with the same drive letter that you initially created the filegroup on.   

Future upgrades, Service packs and Hot fixes

Each Hotfix, Service Pack and update that you apply to the server has the potential to modify the index that was moved into the CrawlFileGroup or add an new index to one of the tables moved to the filegroup.  When/if this happens the index will be moved back or created in the primary filegroup.  Updates will also clean out any non-product sproc.  Because of the risk of index modification with updates applied you will need to reinstall the stored proc and run the scripts again after each update applied.

The risk of a new index being added or modified quite low at this time.  We have confirmed that this does not occur if upgrading from RTM to SP1.  But, it does happen when upgrading from SP1  to the Infrastructure Update.  Future Updates are less like to modify the set of indexes.

However, the risk still exists and you will want to re-run the scripts below after each update that you apply to your system.  In the case when you apply an update and the index did not change running the script is a no-op and nothing gets moved.  So it is very cheap to run the script on a system that already has the indexes moved. 

SQL 2005 and greater

The script that is moving the indexes is utilizing new features that were released in SQL 2005.  As such you cannot perform this optimization with SQL 2000. 

Step- by-Step instructions for applying filegroups to your environment.

To deploy this you will need to manually create a file group on the Search database.  To do this execute the following steps:

a. Go to the Filegroups section of the Search database properties within SQL Server Management Studio.

b. From the Filegroups section click add and fill in the name "CrawlFileGroup." The scripts are written assume the filegroup has this name, failure to use this name will result in early failures  in the script


c. Once you have a new filegroup with the name CrawlFileGroup you need add a file into this group.  To do this select the Files section of the database properties dialog and add a new file into the CrawlFileGroup.  Be sure that you place this file onto a separate drive with isolated spindles.


d. Next you need to install the stored proc that will move the indexes and tables to the new filegroup.  Open the script named  MoveTableToFileGroup.sql within Management Studio and execute it; ensuring that you are working with the Search database  This will create a stored proc named proc_MoveTableToFileGroup.  Confirm that this sproc does indeed exist within the Search database.

e. Open and execute the second script named   MoveCrawlTablesToFileGroup.sql, this is the script that does all of the work by calling proc_MoceTableToFileGroup for each table that is dedicated for crawling. 

That is all there is to it.  You have now moved you crawl tables on to a separate set of spindles. 

Thank you for your time and as always I welcome any feedback or questions

Dan Blood
Senior Test  Engineer
Microsoft Corp

Comments (4)

  1. David says:

    Why not use extra data file (ndf) spread over multiple spindles instead of additionnal file groups?

  2. (このポストでは、書籍「 VSTO と SharePoint Server 2007 による開発技術 」(8章 : エンタープライズ検索と BDC) の補足情報を記載しています。基本的な内容 / 全体は、書籍を参照してください。)

  3. Using MOSS Search in scenarios where low latency is needed, requires special attentions to get the best