Test Management Warehouse Adapter – General issues and their resolutions

The TFS warehouse is a critical component of the on-premises TFS reporting stack: it enables reporting based on common dimensions across different stages of the Team Project lifecycle. Data flows into the TFS warehouse from the collection databases through a set of adapters.

In TFS 2015 Update 3, we fixed key performance issues in the Test Management Warehouse Adapter to make it faster and more reliable. However, SQL performance depends heavily on the shape and size of the data, so you could still face issues when the warehouse is updated. Such issues generally surface during a warehouse rebuild, or when the adapter hasn’t run for a long period and a lot of data has to be processed into the warehouse.

This blog post focuses on diagnosing general timeout issues with the Test Management Warehouse adapter and on resolving them.

Note that while the resolutions in this blog post will help fix adapter timeout issues, it is strongly recommended to avoid the problem in the first place by removing unwanted or old test management data with the test results data retention feature, which is available in TFS 2015 RTM and newer versions.

Note: The Test Management Warehouse adapter works on SQL Server rowversion logic. If a test management record in a collection database has changed since the last sync, the adapter syncs that change to the warehouse. In TFS 2017 RTM, the schema of multiple test management tables was revamped to optimize storage space and performance. During an upgrade to TFS 2017 RTM, the existing test management data is migrated to the new schema, which changes the rowversion associated with almost all test management records in the collection databases. The test management adapter sees these as new changes and tries to sync all the records to the warehouse again, which can be time consuming. At present, no warning about this behavior is shown to the user during the pre-upgrade steps; this will be fixed in newer versions of TFS. To avoid the problem, it is recommended to sync the warehouse fully before starting the upgrade. This scenario has been handled and will not cause migrated rows to sync to the warehouse again.
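To make the rowversion behavior described in the note concrete, here is a minimal Python sketch of watermark-based incremental sync. It is a simplified model and not the adapter's actual implementation: the `Row` class, the integer rowversions, and the `next_batch` and `sync_all` helpers are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    row_id: int
    rowversion: int  # stand-in for SQL Server's rowversion (assumption: a plain int)


def next_batch(rows, watermark, limit):
    """Return up to `limit` rows changed since `watermark`, plus the new watermark."""
    changed = sorted(
        (r for r in rows if r.rowversion > watermark),
        key=lambda r: r.rowversion,
    )
    batch = changed[:limit]
    new_watermark = batch[-1].rowversion if batch else watermark
    return batch, new_watermark


def sync_all(rows, watermark, limit):
    """Drain all pending changes; return (rows synced, iterations, final watermark)."""
    synced = 0
    iterations = 0
    while True:
        batch, watermark = next_batch(rows, watermark, limit)
        if not batch:
            return synced, iterations, watermark
        synced += len(batch)
        iterations += 1


# Five rows that have already been synced up to rowversion 5.
rows = [Row(i, i) for i in range(1, 6)]
print(sync_all(rows, watermark=5, limit=2))  # (0, 0, 5): nothing to do

# One record is edited: only that row is picked up.
rows[2].rowversion = 6
print(sync_all(rows, watermark=5, limit=2))  # (1, 1, 6)

# A schema migration rewrites every row, so every rowversion moves past the watermark.
for offset, row in enumerate(rows):
    row.rowversion = 100 + offset
print(sync_all(rows, watermark=6, limit=2))  # (5, 3, 104): everything syncs again
```

The last call shows why a migration that touches every row looks, from the adapter's point of view, like a full resync.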

Common timeout issues with test management adapter & their resolutions

Test artifacts are synced to the warehouse in batches. The batch size for each test artifact is configurable and can be increased or decreased to process more or less data in a single iteration. A larger batch reduces the number of iterations needed to sync all the data, which should in principle improve the adapter’s performance. In practice, however, a larger batch also increases the SQL transaction size, which consumes more CPU and memory and can lead to thrashing and degraded overall system performance. A very small batch size, on the other hand, increases the number of iterations needed to process all the data, which can be very time consuming. The batch size should therefore be set to a value that keeps the number of iterations low while keeping memory and CPU requirements in check.

For the test management adapter, a timeout generally occurs when the batch of data currently being processed leads to a large SQL transaction. The workaround is to reduce the appropriate batch size temporarily until the adapter has processed the data fully once. After that, the batch size can be reset to its default value.

Batch sizes for the different test artifacts are stored as key-value pairs in the warehouse database, with a different batch size key for each test artifact. Which key needs to be changed depends on the artifact that was being processed when the timeout occurred.

Choosing the right value for a batch size key is an iterative process. It is recommended to reduce the value by a factor of 10 at a time and retry the test management adapter processing until it succeeds, e.g. 1000, 100, 10, 1. Note that the test management adapter batch size settings do not affect other adapters in any way.
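The iterative reduction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `try_sync` is a hypothetical callback standing in for the manual steps of setting the batch size, re-running the adapter, and checking whether it still times out; nothing here calls TFS.

```python
def candidate_batch_sizes(default_size):
    """Yield batch sizes to try, dividing by 10 each time and never going below 1."""
    size = default_size // 10
    while size >= 1:
        yield size
        size //= 10


def find_working_batch_size(default_size, try_sync):
    """Return the first reduced size for which try_sync(size) succeeds, else None.

    try_sync is a hypothetical callback: it should return True when the adapter
    completed without a timeout at the given batch size.
    """
    for size in candidate_batch_sizes(default_size):
        if try_sync(size):
            return size
    return None


print(list(candidate_batch_sizes(10000)))  # [1000, 100, 10, 1]
print(list(candidate_batch_sizes(2000)))   # [200, 20, 2]

# Pretend the adapter only succeeds once the batch is 100 rows or fewer.
print(find_working_batch_size(10000, lambda size: size <= 100))  # 100
```

Starting from the largest reduced value keeps the number of iterations as low as the data allows, which matches the trade-off between transaction size and iteration count discussed earlier.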

Resolving a test management adapter timeout is therefore a two-step process:

  1. Select the appropriate batch size key to reduce. It can be determined from the stack trace; see the section ‘Test Management adapter timeout scenarios’ for details.
  2. Set the batch size value. See the section ‘SQL queries to set test management adapter batch size value’ for details.

Note: When the adapter times out, the following error message is received:

Exception Message: TF246018: The database operation exceeded the timeout limit and has been cancelled. Verify that the parameters of the operation are correct. (type DatabaseOperationTimeoutException)

Steps to obtain the error message and stack trace are available in the attached file:

DiagnosticInformation.zip

Test Management adapter timeout scenarios

1. Timeout during test results processing

Typical stack trace:

at Microsoft.TeamFoundation.TestManagement.Warehouse.WarehouseResultDatabase.QueryTestResults(SqlBinary watermark, Int32 limit, ProcessRowCallback resetCallback, ProcessMappingDataCallback dataCallback, ResolveIdentities resolveIdentitiesCallBack) …

at Microsoft.TeamFoundation.TestManagement.Warehouse.TeamTestWarehouseAdapter.QueryForResults(WarehouseResultDatabase wrd, SqlBinary waterMark, Int32 limit)

Resolution: Reduce test results processing batch size (default = 2000)

Batch Size Key: /Adapter/Limit/TestManagement/FactTestResult

2. Timeout during test point processing

Typical stack trace:

at Microsoft.TeamFoundation.TestManagement.Warehouse.WarehouseResultDatabase.QueryTestPointData(SqlBinary watermark, SqlBinary endWatermark, int limit, IEnumerable<IdToAreaIteration> areaIterationMap, ProcessRowCallback deletedCallback, ProcessMappingDataCallback addedCallback, ResolveIdentities resolveIdentitiesCallBack)

at Microsoft.TeamFoundation.TestManagement.Warehouse.TeamTestWarehouseAdapter.QueryForTestPoints(WarehouseResultDatabase wrd, SqlBinary waterMark, Int32 limit)

Resolution: Reduce test point processing batch size (default = 10000)

Batch Size Key: /Adapter/Limit/TestManagement/FactTestPoint

3. Timeout while processing deletes

Typical stack trace:

... Microsoft.TeamFoundation.Warehouse.WarehouseDataAccessComponent.DestroyResults(String projectId, String data, ResultDeletionFormat format, Int32 limit)

at Microsoft.TeamFoundation.TestManagement.Warehouse.TeamTestWarehouseAdapter.ProcessRunDeletes(IWarehouseDataAccessComponent dac, ObjectTypes objectType)

at Microsoft.TeamFoundation.TestManagement.Warehouse.TeamTestWarehouseAdapter.DeleteTcmObject(IWarehouseDataAccessComponent dac, ObjectTypes objectType, Boolean deleteResults)

Resolution: Reduce test run deletion batch size (default = 10)

Batch Size Key: Adapter/Config/TestManagement/RunDeleteBatchSize
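As a rough aid for step 1, the three scenarios above can be turned into a small lookup that scans a captured stack trace for the method names listed in this post. This is a sketch, not a Microsoft tool: the `SCENARIOS` table and `batch_size_key_for` function are hypothetical names, and the keys and defaults are copied exactly as they are written above (including the third key, which appears without a leading slash).

```python
# Each entry: (method names seen in the stack trace, batch size key, default value).
# Keys and defaults are copied verbatim from the scenarios in this post.
SCENARIOS = [
    (("QueryTestResults", "QueryForResults"),
     "/Adapter/Limit/TestManagement/FactTestResult", 2000),
    (("QueryTestPointData", "QueryForTestPoints"),
     "/Adapter/Limit/TestManagement/FactTestPoint", 10000),
    (("DestroyResults", "ProcessRunDeletes"),
     "Adapter/Config/TestManagement/RunDeleteBatchSize", 10),
]


def batch_size_key_for(stack_trace):
    """Return (batch size key, default) for the first matching scenario, else None."""
    for methods, key, default in SCENARIOS:
        # Match ".MethodName(" so that only actual stack frames are recognized.
        if any(f".{name}(" in stack_trace for name in methods):
            return key, default
    return None


trace = (
    "at Microsoft.TeamFoundation.TestManagement.Warehouse.TeamTestWarehouseAdapter"
    ".QueryForTestPoints(WarehouseResultDatabase wrd, SqlBinary waterMark, Int32 limit)"
)
print(batch_size_key_for(trace))
# ('/Adapter/Limit/TestManagement/FactTestPoint', 10000)
```

A result of `None` means the trace does not match any of the scenarios covered here, in which case the guidance at the end of this post about contacting support applies.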

SQL queries to set test management adapter batch size value

The following queries achieve this. Execute them against the warehouse database.

Note: These settings do not affect any test management warehouse job that is already in progress. They take effect from the next invocation of the adapter.

DECLARE @PropertyScope NVARCHAR(256) = NULL
-- NULL implies that the property applies to all collections.
-- SET it to the collection GUID to target a particular collection.
-- To obtain the collection GUID, run the following query against the TFS Configuration database:
-- SELECT HostId AS CollectionGuid, Name AS CollectionName FROM tbl_ServiceHost

DECLARE @BatchSizeKey NVARCHAR(256) = N'<batch size key>'
-- Replace <batch size key> with the appropriate key.
-- See the section 'Test Management adapter timeout scenarios' for details.

DECLARE @BatchSize INT = 0

-- Dump the current setting as a backup (save it so that you can revert later).
-- If the query returns 0 rows, it means the default value is in use. Hence at the end
-- this key can be deleted to restore the default setting.
EXEC [dbo].[prc_PropertyBag_Get] @Property_Scope = @PropertyScope, @Property_Key = @BatchSizeKey, @Property_Value = @BatchSize OUTPUT

SELECT @BatchSize

-- Set the batch size to an appropriate value lower than the default.
-- It is recommended to reduce this by a factor of 10 at a time and retry completing the
-- test management adapter processing. E.g.: 1000, 100, 10, 1
SET @BatchSize = 100

-- Update the batch size for the job step.
EXEC [dbo].[prc_PropertyBag_Set] @Property_Scope = @PropertyScope, @Property_Key = @BatchSizeKey, @Property_Value = @BatchSize

-- DO THIS AFTER RESOLVING THE ISSUE:
-- After the full warehouse processing is complete, subsequent processing is incremental
-- only. We can now revert the batch size to factory settings (run this if all is well).

-- Run this query on the warehouse to see the batch size configuration potentially set
-- as a workaround.
EXEC [dbo].[prc_PropertyBag_Get] @Property_Scope = @PropertyScope, @Property_Key = @BatchSizeKey, @Property_Value = @BatchSize OUTPUT

SELECT @BatchSize

-- For the keys identified by the last query, use the query below to delete them and
-- reset to factory settings.
EXEC [dbo].[prc_PropertyBag_Delete] @Property_Scope = @PropertyScope, @Property_Key = @BatchSizeKey
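If the same change has to be prepared for several keys or collections, the "set" part of the script can be generated rather than edited by hand. The Python sketch below only builds the T-SQL text and validates its inputs; it does not connect to any database. The function name `build_set_batch_size_sql` is hypothetical, the key list is copied from this post, and the canonical lowercase GUID formatting produced by `uuid.UUID` is an assumption that should be checked against how scopes are stored in your warehouse.

```python
import uuid

# Batch size keys copied verbatim from the timeout scenarios in this post.
KNOWN_KEYS = {
    "/Adapter/Limit/TestManagement/FactTestResult",
    "/Adapter/Limit/TestManagement/FactTestPoint",
    "Adapter/Config/TestManagement/RunDeleteBatchSize",
}


def build_set_batch_size_sql(key, batch_size, collection_guid=None):
    """Build the T-SQL text that sets one batch size key (no database access)."""
    if key not in KNOWN_KEYS:
        raise ValueError(f"unknown batch size key: {key!r}")
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if collection_guid is None:
        scope = "NULL"  # applies to all collections
    else:
        # uuid.UUID raises ValueError for malformed input, which also keeps
        # quote characters out of the generated script.
        scope = f"N'{uuid.UUID(collection_guid)}'"
    return "\n".join([
        f"DECLARE @PropertyScope NVARCHAR(256) = {scope}",
        f"DECLARE @BatchSizeKey NVARCHAR(256) = N'{key}'",
        f"DECLARE @BatchSize INT = {batch_size}",
        "EXEC [dbo].[prc_PropertyBag_Set] @Property_Scope = @PropertyScope, "
        "@Property_Key = @BatchSizeKey, @Property_Value = @BatchSize",
    ])


print(build_set_batch_size_sql("/Adapter/Limit/TestManagement/FactTestResult", 200))
```

Review the generated text before running it, and record the current value first as described in the backup step of the script above.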

Tip: Supported only in TFS 2015 Update 3 and newer versions

If there is no requirement to report on code coverage data, code coverage processing can be turned off completely in the test adapter (it is ON by default) by executing the following query against the warehouse database:

INSERT INTO _PropertyBag

VALUES(NULL, '/Adapter/Config/TestManagement/CodeCoverageProcessingEnabled', 'false') -- 'true' to enable

The above query disables code coverage processing for all collections. To disable it for specific collections, run the following query for each collection, replacing <Collection GUID> with that collection’s GUID:

INSERT INTO _PropertyBag

VALUES('<Collection GUID>', '/Adapter/Config/TestManagement/CodeCoverageProcessingEnabled', 'false') -- 'true' to enable

The collection GUID can be obtained by running the following query against the TFS Configuration database:

SELECT * FROM tbl_ServiceHost

The value of the ‘HostId’ column is the collection GUID for a collection.

Note that running the above queries does not affect code coverage data in the collection databases. It simply stops code coverage data from being synced to the warehouse. It can be turned on at any time by executing the above queries with the value ‘true’.

Also note that these settings do not affect any test management warehouse job that is already in progress. They take effect from the next invocation of the adapter.
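When code coverage processing has to be switched for several collections, the per-collection statements can be generated instead of typed one by one. The following Python sketch mirrors the positional INSERT form shown above and only produces text. The function name `code_coverage_toggle_sql` is hypothetical, the canonical GUID formatting is an assumption, and the sketch does not check whether a row for the key already exists.

```python
import uuid

CODE_COVERAGE_KEY = "/Adapter/Config/TestManagement/CodeCoverageProcessingEnabled"


def code_coverage_toggle_sql(enabled, collection_guids=None):
    """Return one INSERT statement per scope; no GUIDs means all collections (NULL)."""
    value = "true" if enabled else "false"
    if not collection_guids:
        scopes = ["NULL"]
    else:
        # uuid.UUID rejects malformed GUIDs with ValueError.
        scopes = [f"'{uuid.UUID(guid)}'" for guid in collection_guids]
    return [
        f"INSERT INTO _PropertyBag VALUES({scope}, '{CODE_COVERAGE_KEY}', '{value}')"
        for scope in scopes
    ]


for statement in code_coverage_toggle_sql(False):
    print(statement)
# INSERT INTO _PropertyBag VALUES(NULL, '/Adapter/Config/TestManagement/CodeCoverageProcessingEnabled', 'false')
```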

For any other issues with the test management warehouse adapter, capture diagnostic data as described in the attached file DiagnosticInformation.zip and contact the Microsoft support team.

Important points regarding warehouse rebuild

  1. In several warehouse issues reported by users, we found that when a warehouse job fails, they trigger a warehouse rebuild in the hope that it will resolve the problem. While this might work, it is not a recommended solution: it can take considerable time to sync all the data, and the job can fail again with the same problem. In such cases, if it is a timeout issue with the test management warehouse adapter, follow the resolution described in this blog post; otherwise, reach out to the Microsoft support team.
  2. There is often confusion about whether the warehouse needs to be rebuilt during a TFS upgrade, whether to an RTM version or to an update. Note that if a rebuild is required, it is triggered automatically during the upgrade; the user doesn’t need to initiate it explicitly.

Written and Reviewed by: Shyam Gupta