Search is one of the key features provided by Microsoft Office SharePoint Server (MOSS), and there are a lot of improvements in this area in SharePoint 2010. In this post I will focus on the scalability improvements in this release compared to MOSS.
The major bottlenecks for search in MOSS are:
1. There is only one Search database, which is part of the SSP (Shared Services Provider), and sharing it between crawling and querying limits the system. As the number of items in the index grows, this hurts both crawl speed and query latency.
2. The single flat-file index on the query servers does not scale.
3. The indexer is a single point of failure for the search subsystem.
4. The load on SQL Server is high because both the crawl and query tables live in the SSP Search database.
In SharePoint 2010, the search system can be split into multiple independently scalable components:
Crawl Components (Indexer)
If the crawl process is the bottleneck, add more crawler machines.
Crawl history databases (SQL) and Metadata databases (SQL)
If the SQL database is the bottleneck, add more databases.
Index Partitions (Query server)
If the flat-file index is the bottleneck, split it into multiple flat files.
Admin Component (not scaled-out)
Includes the associated search admin database, which stores configuration information.
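To make the index partition idea more concrete, here is a minimal Python sketch of splitting one index into several smaller ones and querying across them. This is only an illustration of the general technique, not SharePoint's actual implementation: the names partition_for and PartitionedIndex are hypothetical, and assigning documents to partitions with a CRC32 hash of the document ID is an assumption made for the example.

```python
import zlib
from collections import defaultdict


def partition_for(doc_id: str, num_partitions: int) -> int:
    """Map a document ID to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


class PartitionedIndex:
    """A toy inverted index split across several partitions (hypothetical)."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        # One small index per partition: term -> set of document IDs.
        self.partitions = [defaultdict(set) for _ in range(num_partitions)]

    def add(self, doc_id: str, text: str) -> None:
        # Each document lands in exactly one partition.
        p = partition_for(doc_id, self.num_partitions)
        for term in text.lower().split():
            self.partitions[p][term].add(doc_id)

    def query(self, term: str) -> set:
        # A query fans out to every partition and merges the results.
        results = set()
        for part in self.partitions:
            results |= part.get(term.lower(), set())
        return results


index = PartitionedIndex(num_partitions=3)
index.add("doc-1", "SharePoint search scales out")
index.add("doc-2", "Search index partitions")
index.add("doc-3", "Admin component configuration")
matches = index.query("search")
```

Because each partition holds only a subset of the documents, each one can be placed on a different query server, and adding partitions shrinks the amount of index any single server has to hold.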
One interesting point to note is that the crawler machine is a stateless worker, meaning it does not store any index on its hard drive. It completes indexing and then propagates the content to the query servers.
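The stateless crawler idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how SharePoint implements propagation: QueryServer and crawl_batch are hypothetical names, and for simplicity every query server receives the full batch, whereas in a partitioned setup each server would receive only its own share.

```python
class QueryServer:
    """Hypothetical query server that owns the persistent index."""

    def __init__(self, name: str):
        self.name = name
        self.index = {}  # term -> set of document IDs

    def receive(self, entries: dict) -> None:
        # Merge propagated index entries into this server's own index.
        for term, doc_ids in entries.items():
            self.index.setdefault(term, set()).update(doc_ids)


def crawl_batch(documents: dict, query_servers: list) -> int:
    """Index a batch in memory, push it to the query servers, keep nothing."""
    entries = {}
    for doc_id, text in documents.items():
        for term in text.lower().split():
            entries.setdefault(term, set()).add(doc_id)
    # Propagate the result; the crawler holds no index once this returns.
    for server in query_servers:
        server.receive(entries)
    return len(documents)


servers = [QueryServer("query-1"), QueryServer("query-2")]
indexed = crawl_batch({"doc-1": "Search scales", "doc-2": "search index"}, servers)
```

Since the crawler keeps no index of its own in this sketch, losing or replacing a crawler machine does not lose indexed content; the query servers hold everything that was propagated.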