Search indexing for entities (Code, Work Item, WIKI) works in 2 phases:
- Bulk Indexing (BI), in which all code and work item artifacts in every project/repository under a Collection are indexed. This is a time-consuming operation whose duration depends on the size of the artifacts under the collection.
- Continuous Indexing (CI), which handles all incremental updates to the artifacts (adds, updates and deletes) and indexes them. This is a notification-based model: the indexer listens to TFS events and acts on those event notifications. CI handles almost all update operations, including CRUD operations at the Project/Repository/Collection level (such as repository renames and project adds/deletes). The duration of a CI operation again depends on the size of the incremental update. BI always precedes CI, i.e., CI never executes on a project/repository until BI has completed for it.
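To make the BI-before-CI rule concrete, here is a minimal Python sketch of the gating logic. It is not the Search indexer's code: `IndexingUnit`, `on_change_notification` and `complete_bulk_indexing` are hypothetical names, and what happens to notifications that arrive while BI is still running is an assumption (this sketch simply holds them and replays them once BI finishes).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Phase(Enum):
    NOT_STARTED = "not started"
    BULK_INDEXING = "bulk indexing"
    BI_COMPLETE = "bulk indexing complete"


@dataclass
class IndexingUnit:
    """Hypothetical stand-in for a project or repository being indexed."""
    name: str
    phase: Phase = Phase.NOT_STARTED
    pending_events: List[str] = field(default_factory=list)  # held until BI completes
    applied_events: List[str] = field(default_factory=list)  # already indexed by CI


def on_change_notification(unit: IndexingUnit, event: str) -> None:
    """CI entry point: index a change only once BI has completed for this unit."""
    if unit.phase is Phase.BI_COMPLETE:
        unit.applied_events.append(event)
    else:
        # BI always precedes CI, so the change is held instead of indexed.
        # (Holding and replaying is an assumption of this sketch.)
        unit.pending_events.append(event)


def complete_bulk_indexing(unit: IndexingUnit) -> None:
    """Mark BI as done, then replay the changes that arrived in the meantime."""
    unit.phase = Phase.BI_COMPLETE
    unit.applied_events.extend(unit.pending_events)
    unit.pending_events.clear()


if __name__ == "__main__":
    repo = IndexingUnit("SampleRepo")
    repo.phase = Phase.BULK_INDEXING
    on_change_notification(repo, "push 1")   # held: BI still running
    complete_bulk_indexing(repo)
    on_change_notification(repo, "push 2")   # indexed immediately
    print(repo.applied_events)               # ['push 1', 'push 2']
```

A push that arrives while a repository is still bulk indexing stays in `pending_events` until `complete_bulk_indexing` runs; after that, every notification is indexed as it arrives.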
In addition to BI and CI, the Search indexer has built-in resilience measures that detect missed notifications, or any other case where the index is not up to date with the actual state of artifacts in TFS, and run patch operations to correct them. These patch operations run at fixed intervals (typically every 12 hours) on each Collection.
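Conceptually, a patch operation compares what the index holds against the actual artifacts and fixes the differences. The sketch below assumes each artifact can be keyed (for example by file path) and given a version marker (for example a commit ID); `compute_patch` and `is_patch_due` are illustrative names, and the 12-hour default is the typical interval quoted above, not a value read from the indexer's configuration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# "Typically every 12 hours", per the paragraph above.
PATCH_INTERVAL = timedelta(hours=12)


def is_patch_due(last_run: datetime, now: datetime,
                 interval: timedelta = PATCH_INTERVAL) -> bool:
    """True once at least `interval` has passed since the last patch run."""
    return now - last_run >= interval


def compute_patch(actual: Dict[str, str],
                  indexed: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (to_add, to_update, to_delete) that bring the index in line with TFS.

    Both maps go from an artifact key (e.g. a file path) to a version marker
    (e.g. a commit ID); keys are returned sorted for a stable order.
    """
    to_add = sorted(k for k in actual if k not in indexed)
    to_update = sorted(k for k in actual if k in indexed and actual[k] != indexed[k])
    to_delete = sorted(k for k in indexed if k not in actual)
    return to_add, to_update, to_delete


if __name__ == "__main__":
    actual = {"src/a.cs": "c2", "src/b.cs": "c1", "src/new.cs": "c2"}
    indexed = {"src/a.cs": "c1", "src/b.cs": "c1", "src/gone.cs": "c1"}
    print(compute_patch(actual, indexed))
    # (['src/new.cs'], ['src/a.cs'], ['src/gone.cs'])
```

A diff like this is why a missed notification usually heals on its own: the next patch run picks up whatever CI did not.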
With this background in mind, in most cases you should never need to run a BI operation (i.e., a re-index) on a Collection. However, there are certain planned or unplanned situations that do require re-indexing, for example:
- Index shard corruption that the indexer does not recover from automatically. Any such index-level error requires a clean re-index.
- The entire index data folder (or specific index folders within it) was deleted, manually or accidentally. The indexer does not recover from this automatically.
- The index is being moved to another machine, for example as part of an upgrade. In this case the configuration upgrade step takes care of re-indexing internally, but the point is that re-indexing does happen.
- Search queries return stale results, either because of a persistent error or because BI simply aborted or crashed, and a full re-index of that collection is now required.
Now to the primary focus of this post: how do you perform a clean re-index? The guidance applies to Code as well as to other entities such as Work Item and WIKI.
Clean-up Index Data and Re-index
This applies to the scenarios listed above where there are index-level errors and the index needs to be rebuilt.
- Pause indexing for all collections. Run the following script against the TFS Configuration DB.
- Log in to the machine where Elasticsearch (ES) is running.
- Stop the ES service.
- Delete the entire Search index folder (for example, C:\TfsData\Search\IndexStore, or wherever you configured it).
- Restart the TFS Job Agent service(s) on the Application Tier (AT) machines.
- Get the list of associated job IDs from EVERY collection DB by running the following query in each collection:
SELECT [AssociatedJobId] FROM [Search].[tbl_IndexingUnit]
WHERE [AssociatedJobId] IS NOT NULL
(Save these job IDs; they are used by the delete command on tbl_JobQueue in a later step. The sequence of steps is very important, so do not reorder any of them.)
- Delete all rows from the following tables in each collection database:
DELETE FROM [Search].[tbl_IndexingUnit]
DELETE FROM [Search].[tbl_IndexingUnitChangeEvent]
DELETE FROM [Search].[tbl_IndexingUnitChangeEventArchive]
DELETE FROM [Search].[tbl_JobYield]
DELETE FROM [Search].[tbl_TreeStore]
DELETE FROM [Search].[tbl_DisabledFiles]
DELETE FROM [Search].[tbl_ItemLevelFailures]
DELETE FROM [Search].[tbl_ResourceLockTable]
- Delete the indexing jobs from the job queue using the command below, with the job IDs collected by the query above. Note: this must be run against the Configuration database.
DELETE FROM [Tfs_Configuration].[dbo].[tbl_JobQueue]
WHERE JobSource = '<CollectionHostId from tbl_ServiceHost>' AND
JobId IN (<list of AssociatedJobIds from tbl_IndexingUnit>) AND AgentId IS NULL
- [This step applies ONLY to TFS 2018 Update 2 and earlier]
- Open "%ProgramFiles%\Microsoft Team Foundation Server 2018\Search\ES\%ESVersionFolder%\config\elasticsearch.yml"
- Insert the line: action.auto_create_index: false
- Save the elasticsearch.yml file.
- Restart the ES service
- Run the ResumeIndexing.ps1 script against the TFS Configuration DB.
- Run this script (pick it from the correct TFS release folder) on each collection: https://github.com/Microsoft/Code-Search/blob/master/TFS_2018Update3/MissingIndexFolderTriggerCollectionIndexing.ps1
Try the last script on a smaller collection (one with fewer repositories) first, so that you can verify that indexing completed correctly and the results are queryable.
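Because job IDs have to be copied from one query into another, it can help to generate the clean-up statements rather than hand-edit them. The Python sketch below only builds the T-SQL text from the steps above; it does not connect to any database. The function names are hypothetical, and it assumes the collection host ID and the AssociatedJobIds are GUIDs, so it parses them with `uuid.UUID` and rejects anything malformed. Review the generated statements before running them, and keep the step order.

```python
import uuid
from typing import Iterable, List

# Tables cleared in every collection database, in the order given above.
SEARCH_TABLES = [
    "tbl_IndexingUnit",
    "tbl_IndexingUnitChangeEvent",
    "tbl_IndexingUnitChangeEventArchive",
    "tbl_JobYield",
    "tbl_TreeStore",
    "tbl_DisabledFiles",
    "tbl_ItemLevelFailures",
    "tbl_ResourceLockTable",
]


def collection_cleanup_sql() -> str:
    """T-SQL for EACH collection DB; run only after saving the AssociatedJobIds."""
    return "\n".join(f"DELETE FROM [Search].[{table}]" for table in SEARCH_TABLES)


def job_queue_delete_sql(collection_host_id: str, job_ids: Iterable[str]) -> str:
    """T-SQL for the Configuration DB, for ONE collection (hypothetical helper).

    Every value is parsed as a GUID first, so a stray character pasted from a
    query result raises ValueError instead of producing a malformed statement.
    """
    host = str(uuid.UUID(collection_host_id))
    ids: List[str] = [str(uuid.UUID(j)) for j in job_ids]
    if not ids:
        raise ValueError("no AssociatedJobIds supplied for this collection")
    id_list = ", ".join(f"'{j}'" for j in ids)
    return (
        "DELETE FROM [Tfs_Configuration].[dbo].[tbl_JobQueue]\n"
        f"WHERE JobSource = '{host}'\n"
        f"  AND JobId IN ({id_list})\n"
        "  AND AgentId IS NULL"
    )


if __name__ == "__main__":
    print(collection_cleanup_sql())
    print(job_queue_delete_sql(
        "00000000-0000-0000-0000-000000000001",      # placeholder host ID
        ["00000000-0000-0000-0000-0000000000aa"],    # placeholder job ID
    ))
```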
Re-index at Collection level
This applies to the scenario where the index configuration and health are good, but search results are not as expected and you need to refresh the index for a specific collection (along the lines of the stale-results scenario above).
There are two approaches to re-indexing here:
(A) Extension Uninstall and Install
- Uninstall the extension cleanly (Refer to the detailed guidance in the post here)
- Install the specific entity extension for the collection from the Local Gallery (http://<Server>/tfs/_gallery)
- [This step applies ONLY to TFS 2017 Update 3 and beyond]
- Verify that the Account Fault-In job triggered by the entity extension install is not continuously re-queuing itself for an extended period of time (say, more than 15 minutes):
SELECT JobHistory.[StartTime], JobHistory.[Result], JobHistory.[ResultMessage]
FROM [Tfs_Configuration].[dbo].[tbl_JobHistory] AS JobHistory
INNER JOIN [Tfs_Configuration].[dbo].[tbl_ServiceHost] AS ServiceHost
ON JobHistory.JobSource = ServiceHost.HostId
WHERE JobHistory.JobId = 'Entity-AccountFaultInJobId'
-- replace 'Entity-AccountFaultInJobId' with the job ID for your entity:
-- for Code = '02F271F3-0D40-4FA0-9328-C77EBCA59B6F'
-- for WorkItem = '03CEE4B8-ECC1-4E57-95CE-FA430FE0DBFB'
-- for WIKI = '27B11FD5-1DA5-48B4-A732-761CE99F5A5F'
AND JobHistory.ResultMessage LIKE '%Requeue the Account Fault-In job since Extension Uninstall sequence is still in progress%'
ORDER BY JobHistory.StartTime DESC
If you keep seeing a ResultMessage such as "Requeue the Account Fault-In job since Extension Uninstall sequence is still in progress", the entry #\Service\ALMSearch\Settings\IsExtensionOperationInProgress\%EntityType%\Uninstalled was not reset correctly (where EntityType is Code, WorkItem or WIKI, depending on the extension uninstalled in the step above). Refer to the uninstall extension post here for how to clean this up.
- Depending on the code/work item volume in the collection, re-indexing can take a while. To monitor indexing progress, check the blog post here.
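The Fault-In check above can be scripted against the query results. This Python sketch takes `(StartTime, ResultMessage)` rows, fetched with whatever SQL Server client you use, and reports whether requeue messages have persisted for more than 15 minutes since the extension was installed. The function names and the `since` parameter (the install time, so older requeues from an earlier uninstall are ignored) are assumptions for illustration; the job IDs, message text and entry path are the ones quoted above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Account Fault-In job IDs per entity, as listed in the query above.
FAULT_IN_JOB_IDS = {
    "Code": "02F271F3-0D40-4FA0-9328-C77EBCA59B6F",
    "WorkItem": "03CEE4B8-ECC1-4E57-95CE-FA430FE0DBFB",
    "WIKI": "27B11FD5-1DA5-48B4-A732-761CE99F5A5F",
}

REQUEUE_MESSAGE = ("Requeue the Account Fault-In job since Extension Uninstall "
                   "sequence is still in progress")


def uninstall_flag_path(entity: str) -> str:
    """Entry that should have been reset after the uninstall (hypothetical helper)."""
    if entity not in FAULT_IN_JOB_IDS:
        raise ValueError(f"unknown entity type: {entity!r}")
    parts = ["#", "Service", "ALMSearch", "Settings",
             "IsExtensionOperationInProgress", entity, "Uninstalled"]
    return "\\".join(parts)


def is_stuck_requeuing(rows: Iterable[Tuple[datetime, Optional[str]]],
                       since: datetime,
                       threshold: timedelta = timedelta(minutes=15)) -> bool:
    """True if requeue messages logged at or after `since` span more than `threshold`.

    `rows` are (StartTime, ResultMessage) pairs from the job-history query;
    ResultMessage may be NULL, which arrives here as None.
    """
    times = [start for start, message in rows
             if start >= since and REQUEUE_MESSAGE in (message or "")]
    if not times:
        return False
    return max(times) - min(times) > threshold
```

If `is_stuck_requeuing` returns True, the entry returned by `uninstall_flag_path` for that entity is the one to clean up, following the uninstall post.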
(B) Collection re-indexing through a script
- Run this script (pick from the correct TFS release folder): https://github.com/Microsoft/Code-Search/blob/master/TFS_2018Update3/TriggerCollectionIndexing.ps1
- To monitor the indexing progress, check the blog post here.
Re-index at Repository level
This applies to the scenario where the index configuration and health are good, but search results are not as expected for a specific repository and you need to refresh the index for that repository (along the lines of the stale-results scenario above). Currently this applies to Code Search only.
- Run this script (pick from the correct TFS release folder): https://github.com/Microsoft/Code-Search/blob/master/TFS_2018Update3/Re-IndexingCodeRepository.ps1
- To monitor the indexing progress, check the blog post here.
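Since every script has to come from the folder that matches your TFS release, a small helper can keep that choice explicit. This hypothetical Python sketch only composes GitHub URLs following the pattern of the links in this post; `TFS_2018Update3` is the only folder name taken from the post, other folder names are yours to look up, and the sketch does not check that a given script exists in the folder you pass.

```python
REPO_BASE = "https://github.com/Microsoft/Code-Search/blob/master"

# Script names as they appear in the links in this post.
SCRIPTS = {
    "clean-reindex": "MissingIndexFolderTriggerCollectionIndexing.ps1",
    "collection": "TriggerCollectionIndexing.ps1",
    "repository": "Re-IndexingCodeRepository.ps1",
}


def reindex_script_url(release_folder: str, scope: str, entity: str = "Code") -> str:
    """Compose the GitHub URL of the re-indexing script for a TFS release folder."""
    if scope not in SCRIPTS:
        raise ValueError(f"unknown scope {scope!r}; expected one of {sorted(SCRIPTS)}")
    if scope == "repository" and entity != "Code":
        # Repository-level re-indexing currently applies to Code Search only.
        raise ValueError("repository-level re-indexing is only available for Code")
    if not release_folder or "/" in release_folder:
        raise ValueError("pass a single release folder name, e.g. 'TFS_2018Update3'")
    return f"{REPO_BASE}/{release_folder}/{SCRIPTS[scope]}"


if __name__ == "__main__":
    print(reindex_script_url("TFS_2018Update3", "repository"))
    # https://github.com/Microsoft/Code-Search/blob/master/TFS_2018Update3/Re-IndexingCodeRepository.ps1
```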
A couple of important points about re-indexing a collection or repository:
- Bulk indexing is a costly operation. Depending on the volume of code/work item data in the collection, it can take anywhere from a few minutes to a few days to complete. So if search queries return stale data, it's advisable to wait 12-24 hours for the indexer's scheduled patch operation to run and patch the index automatically.
- For all scripts, make sure you pick the correct version from the appropriate TFS release folder on GitHub.
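To summarize which procedure fits which situation, here is a hypothetical Python decision helper that encodes the guidance in this post: clean-up for index-level errors or deleted index folders, waiting for the scheduled patch first when results are stale, and repository-level re-indexing only for Code. The 24-hour threshold is an assumption (the upper end of the 12-24 hour window above), and the move-to-another-machine case is left out because the configuration upgrade handles it.

```python
from enum import Enum


class Symptom(Enum):
    INDEX_LEVEL_ERROR = "index shard corruption or other index-level error"
    INDEX_FOLDER_DELETED = "index data folder (or folders within it) deleted"
    STALE_RESULTS_COLLECTION = "stale results across a collection"
    STALE_RESULTS_REPOSITORY = "stale results for one repository"


# Upper end of the 12-24 hour window suggested for the patch operation.
PATCH_WAIT_HOURS = 24


def recommend(symptom: Symptom, hours_since_stale: float = 0.0, entity: str = "Code") -> str:
    """Map a symptom to the procedure this post suggests (hypothetical helper)."""
    if symptom in (Symptom.INDEX_LEVEL_ERROR, Symptom.INDEX_FOLDER_DELETED):
        # The indexer does not auto-recover from these; rebuild from scratch.
        return "Clean-up Index Data and Re-index"
    if hours_since_stale < PATCH_WAIT_HOURS:
        # BI is costly; give the scheduled patch operation a chance first.
        return "Wait for the scheduled patch operation"
    if symptom is Symptom.STALE_RESULTS_REPOSITORY and entity == "Code":
        return "Re-index at Repository level"
    # Repository-level re-indexing is Code-only, so fall back to the collection.
    return "Re-index at Collection level"
```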