A case study on TFS identity replication


A couple of days ago I worked on a case about TFS identity/group membership problems. This single case involved two separate problems.

Problem 1: I tried to upload a customized work item type definition to a team project, but it failed with the error message: "TF201072: A user or group could not be found. Verify that the users and groups used in your work item type definition have been added to the Team Foundation Server".

As you can imagine, we checked the team project settings->group membership first, and everything looked fine: the groups were there with the expected members. The work item type definition was fine too. It contained nothing special and could be uploaded successfully to another TFS server. So what’s wrong?

I have seen this error message many times. It commonly appears in the log of a failed project creation, where it occurs when TFS validates a work item type. When a customized template is uploaded, TFS also validates it first. The error message actually means that the user or group exists in the TFS main identity storage (TFSIntegration), but not in the identity storage of the work item tracking database (TFSWorkitemtracking).

To verify this point, we ran two SQL queries against the TFS databases:

select max(sequence_id) as GSSMaxIdentitySeqID, max(last_update) as GSSLastIdentityUpdateTime from TfsIntegration..tbl_security_identity_cache

select max(seqid) as WITMaxIdentitySeqID, max(LastSyncUTC) as WITLastIdentityUpdateTime from TfsWorkItemTracking..ADObjects

The results showed that TfsIntegration..tbl_security_identity_cache was up to date, while the timestamp returned by the second query (TfsWorkItemTracking..ADObjects) was from a long time ago. This exposed the problem: the work item tracking identity cache was not synchronized with the main identity store.
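The comparison itself is simple enough to script. Here is a minimal sketch in Python, assuming the two "last update" values returned by the queries have already been read into datetime objects in the same time zone. The function names, the one-hour tolerance and the sample timestamps are my own illustrations, not anything defined by TFS.

```python
from datetime import datetime, timedelta


def wit_cache_lag(gss_last_update: datetime, wit_last_sync: datetime) -> timedelta:
    """Return how far the work item tracking cache trails the main identity store."""
    return gss_last_update - wit_last_sync


def is_wit_cache_stale(gss_last_update: datetime,
                       wit_last_sync: datetime,
                       tolerance: timedelta = timedelta(hours=1)) -> bool:
    """True when the lag exceeds the tolerance (1 hour is an arbitrary assumption)."""
    return wit_cache_lag(gss_last_update, wit_last_sync) > tolerance


if __name__ == "__main__":
    # Hypothetical values standing in for GSSLastIdentityUpdateTime
    # and WITLastIdentityUpdateTime from the two queries.
    gss = datetime(2009, 3, 1, 10, 0)
    wit = datetime(2008, 11, 2, 8, 30)
    print("lag:", wit_cache_lag(gss, wit))
    print("stale:", is_wit_cache_stale(gss, wit))
```

A lag of months, as in this case, points to the synchronization not running at all rather than to a short delay.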

For historical reasons, work item tracking has its own identity storage. It replicates the user and group data from the main identity storage, but when and how does this happen? When the main identity storage in TFSIntegration changes, TFS raises a “DataChangedEvent”, and work item tracking subscribes to this event. When the event is raised, TFS calls a work item tracking web service to do the synchronization.

The subscription is recorded in the table TFSIntegration..tblsubscription. Check this table: if your TFS is working properly, you should find at least 4 records like the one shown below. These 4 records are created when TFS is installed and are used by TFS itself.

 

[Screenshot: tblsubscription]

Now back to the case. The two identity storages were not synchronized. The synchronization might have failed, or it might not have happened at all. If it had failed, there would be corresponding error records in the application event log. I had already checked the event log and found no errors, so the synchronization had not happened at all.

With this in mind, we opened the table TFSIntegration..tblsubscription and found that all 4 built-in records were missing for some unknown reason. We recreated them manually, and this solved the problem. The complete steps are:

1. Manually add the 4 missing records to TFSIntegration..tblsubscription. The subscriber should be the SID of the TFS service account. Use a web browser to confirm that the subscription addresses are actually accessible;

2. Reset IIS;

3. Make an innocuous identity change (e.g. create and then delete a TFS group). This changes the main identity store, raises the “DataChangedEvent”, and triggers the synchronization to the work item tracking database via the event subscription.

4. Run the two SQL queries again. This time, the two timestamps should be very close to each other. If they are not, check the application event log to see whether the synchronization failed for some reason.

After the 4 steps, we were able to upload the customized work item template successfully.

At that point, however, the identity issues were not completely solved.

Problem 2: Two months ago I created a group in AD and added my team members to it. Then I made this AD group a contributor of a team project. This worked fine. Last month two developers joined my team. I added their accounts to the AD group, but since then we have been unable to assign any work item to the new members: they are not in the "Assigned To" dropdown list.

I opened the services management console on the TFS application tier (AT) server and found that the "Microsoft Team Foundation Server Task Scheduler" service was not running. I started the service manually, waited for some time, and checked work item assignment again. This time, the new members appeared in the dropdown list.
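The same check can be done from a script instead of the console. The sketch below runs the Windows `sc query` command and reads the STATE line from its output. The short service name TFSServerScheduler is my assumption (above I only give the display name), so confirm it in the service's properties before relying on it. The helper names are illustrative as well.

```python
import re
import subprocess
from typing import Optional

# Assumption: short name of the "Microsoft Team Foundation Server Task Scheduler"
# service. Verify it in the service's properties on your AT server.
SERVICE_NAME = "TFSServerScheduler"


def parse_sc_state(sc_output: str) -> Optional[str]:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def service_is_running(name: str = SERVICE_NAME) -> bool:
    """Run `sc query <name>` (Windows only) and report whether the service is running."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout) == "RUNNING"


if __name__ == "__main__":
    print("Task Scheduler running:", service_is_running())
```

The parsing is kept separate from the command call so it can be tried on saved output from the AT server without running anything there.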

The MSDN article at https://msdn.microsoft.com/en-us/library/ms252473.aspx explains, under “Synchronization of Group Identities Between Active Directory and Team Foundation Server”, when this synchronization takes place: in deployments where Team Foundation Server is running in an Active Directory domain, group and identity information is synchronized when any of the following events occur:

· The application-tier server for Team Foundation starts.

· An Active Directory group is added to a group in Team Foundation Server.

· The amount of time specified in the web.config file elapses. (The default is 1 hour.)

The "Microsoft Team Foundation Server Task Scheduler" service takes care of the third point. It triggers the synchronization periodically. On a TFS application tier server, this service should be set to start automatically and should be kept running at all times. Otherwise AD changes might not be reflected in TFS as expected.

 

Summary:  

This single case shows how AD user and group information is replicated first to the TFS main identity store (TFSIntegration) and then to the work item tracking database. A troubleshooting hint: when the information is found to be out of sync, first look in the application event log on the AT server for synchronization failures. If nothing is there, the next step is to check whether the synchronization happened at all.
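The order of checks can be written down as a small decision function. This is only a sketch of how I worked through this particular case, not an official TFS procedure. The function, its parameters and the returned strings are illustrative, and each input is the answer to a check you perform yourself.

```python
def next_troubleshooting_step(sync_errors_in_event_log: bool,
                              subscriptions_present: bool,
                              scheduler_running: bool) -> str:
    """Suggest the next check, in the order used in this case study."""
    if sync_errors_in_event_log:
        # The synchronization ran but failed: the event log says why.
        return "investigate the synchronization errors in the application event log"
    if not subscriptions_present:
        # No errors logged and no subscriptions: the sync never ran (problem 1).
        return "restore the built-in records in TFSIntegration..tblsubscription"
    if not scheduler_running:
        # The periodic AD synchronization is not being triggered (problem 2).
        return "start the Microsoft Team Foundation Server Task Scheduler service"
    return "make an innocuous identity change and compare the two timestamps again"


if __name__ == "__main__":
    # Problem 1 in this case: clean event log, subscriptions missing.
    print(next_troubleshooting_step(False, False, True))
```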

 ==========

I got the SQL queries used in problem 1 from an MSDN forum thread about another main identity storage->WIT identity cache synchronization issue.