Writing Version Control Migration Tools - Handling Namespace Cycles

Last time I discussed the issues surrounding namespace conflicts when designing a version control migration tool. The basic lesson was that there are times when operations must be executed in a specific order for them to migrate properly. But what if there are namespace collisions for which there is no order that resolves the collision?

Then we must be talking about operational cycles.

Consider the following scenario.

There exist two items in the VC system:

$/Project/Item1.txt

$/Project/Item2.txt

 

On these two items the following operations occurred:

 

> tf rename $/Project/Item1.txt $/Project/temp.txt

> tf rename $/Project/Item2.txt $/Project/Item1.txt

> tf rename $/Project/temp.txt $/Project/Item2.txt

> tf checkin /i

 

When this is checked in there are two pending changes and the end result is:

 

rename $/Project/Item1.txt (was $/Project/Item2.txt)

rename $/Project/Item2.txt (was $/Project/Item1.txt)

 

Think back on the algorithm we looked at in the previous post. We look at each change item and attempt to migrate it in place. In this case we would attempt to pend a rename from item1 to item2 but that would fail because item2 already exists.

So what do we do?

We need to identify and resolve the cycle.

Identifying The Cycle

Identifying a cycle means you need to look at every item’s source and target name (i.e. Item1.txt and Item2.txt). For simple actions like add and rename they will be the same – but if the action is complex such as a rename or merge they will differ.

So for each item you need to compare if its target name is the same as some other actions source item. If it is then the action whose source name conflicts needs to happen before the action whose target name conflicts.

So using the example from the previous post:

rename foo bar

add foo

 

“foo” is the target action of the add and the source action of the rename. So the rename must happen first. Since the add has the same source and target name it cannot conflict and is therefore the end of the conflict chain.

So identifying a cycle means that you need to figure out which items are involved in the cycle and which ones can break it. If none can break it then you need to add intermediate actions to do the work for you.

 

This picture represents the 2 item cycle of renames described earlier (Item1 and Item2):

 

2 Item Cycle

 

Identifying this cycle is pretty straight forward. First we look at Item1 and see that its source name is “Item1” and its target name is “Item2”. Then we look at Item2 and see that its source name is “Item2” and its target name is “Item1”. So we have identified a two item cycle.

Breaking the Cycle

Resolving a 2 item cycle is straight forward. You simply inject a third operation between the two that collide.

The following image demonstrates this:

 

Breaking a 2 Item Cycle

 

It also demonstrates my decidedly poor artistic talent (and my ability to (ab)use Power Point).

But don’t go thinking that every cycle is a two item cycle. Any cycle can be N items long – here is an example of a 4 item cycle

 

Example of a 4 Item Cycle

 

As I used up the extent of my artistic talent on the first three pictures I don’t have an image showing the breaking of a four item cycle. But it’s sufficient to say that breaking the four item cycle is just a matter of identifying multiple two item cycles and breaking each one.

So what’s the bottom line here?

 

1) Designing to handle cycles must happen early in the migration design process as it is fundamental to success.

2) If you can identify and break cycle conflicts you can identify and resolve non-cyclic conflicts using the same algorithm.

3) Cycles can have N nodes long so be prepared for the worst (but rarely are cycles more than 2 nodes long).

4) Breaking an N node cycle is just the repeated process of identifying and breaking 2 node cycles.

5) Breaking cycles means you must be able to pend actions on the target system that aren’t in the formal record of the source system (i.e. two recorded rename operations become three executed rename operations on the target system).