Writing Version Control Migration Tools - Handling Namespace Conflicts

Migrating from one version control system to another is tough. I don’t care what the internet forums are saying or what Joe from down the hall told you. It’s hard. Very hard. Deceptively hard.

The obvious algorithm looks trivial:

    FOR EACH Changeset CS in History DO
        FOR EACH Change C in Changeset CS DO
            SWITCH C.Action
            CASE ADD:
                DownloadFromSource(C)
                PendAddOnTarget(C)
                BREAK
            CASE EDIT:
                DownloadFromSource(C)
                PendEditOnTarget(C)
                BREAK
            ... and so on ...
        NEXT
    NEXT

Piece of cake. I’ll have it ready by noon.

The problem is that this algorithm falls over pretty quickly when presented with even relatively trivial changesets.

Over the next few posts I’ll present a few examples of changes that cause that algorithm to fall over and what some of the options are. These are not pathological cases that never happen in the real world. These are real examples that I see “in the wild” very frequently.

We’ll start with a simple change that has a namespace collision (i.e. the same namespace, “foo” in this case, is involved in two different operations in a single changeset). Foo is renamed to “bar” and then a new item named “foo” is added.

    rename foo bar
    add foo

In the algorithm presented above, there is no guarantee that the rename of foo to bar occurs before the add of foo. If the add is processed first, it will fail because an item already occupies that namespace.
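One way to make that implied order explicit is to treat it as a dependency problem: a change that needs a name must wait for the change that frees it. The Python sketch below is only an illustration under assumptions. The `Change` record, the `claims`/`releases` rules, and `order_changes` are hypothetical names, not part of any real migration tool. It relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and it makes no attempt to handle cycles, where `static_order()` raises `CycleError`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical change record; field names are assumptions for this sketch."""
    action: str                     # "add", "edit", "delete", or "rename"
    path: str                       # item name (the source name for a rename)
    new_path: Optional[str] = None  # destination name, renames only


def releases(change: Change) -> Optional[str]:
    """Name that becomes free once this change has been applied."""
    return change.path if change.action in ("rename", "delete") else None


def claims(change: Change) -> Optional[str]:
    """Name that must be free before this change can be applied."""
    if change.action == "add":
        return change.path
    if change.action == "rename":
        return change.new_path
    return None


def order_changes(changes: List[Change]) -> List[Change]:
    """Order changes so every name is released before it is claimed again."""
    sorter = TopologicalSorter()
    for i, change in enumerate(changes):
        sorter.add(i)  # register the node even if it has no dependencies
        needed = claims(change)
        if needed is None:
            continue
        for j, other in enumerate(changes):
            if i != j and releases(other) == needed:
                sorter.add(i, j)  # change j must be applied before change i
    return [changes[i] for i in sorter.static_order()]


if __name__ == "__main__":
    # The changeset from the example, deliberately listed in the "wrong" order.
    changeset = [Change("add", "foo"), Change("rename", "foo", "bar")]
    for change in order_changes(changeset):
        print(change.action, change.path, change.new_path or "")
```

With the example changeset, the rename of foo comes out ahead of the add of foo regardless of the order in which the source reported them. This only settles the ordering; whether the target system can pend both operations in one changeset is a separate question.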

Further, the target system needs to support this. Imagine you were mirroring TFS to Perforce and had that sequence of operations. The Perforce equivalent would be:

    integrate foo bar
    delete foo
    add foo

See the problem? Perforce doesn’t allow pending an add where you already have a delete. So you need to start making decisions. Do you do something like:

    integrate foo bar
    edit foo

Well, that’s nice, only now you have an integration relationship between two otherwise unrelated items. How about this:

    add bar
    edit foo

OK, that works. Only now you’ve lost the history that says that bar used to be foo. Maybe the answer is multiple changesets:

    integrate foo bar
    delete foo
    submit
    add foo
    submit

Alright, but now you’ve split a single source operation into two target operations. Is that acceptable? If it is, what if the process crashes after the first submit? Will the migration tool have enough context when it is restarted to finish the second change? What if some time passes between the two submits and another foo is added in the meantime? What should the checkin comments for the first and second changes be? Should the first indicate that there will be a second? Should the second indicate that it is part of a multiple-checkin sequence? What if there are hundreds of files like this? Does each get its own checkin? Can you group them?
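To make the restart question concrete, here is one possible shape for the bookkeeping, sketched in Python. Everything in it is an assumption for illustration: the `MigrationJournal` class, the JSON file layout, the `submit` callback, and the "part N of M" comment format are hypothetical, not something any particular tool does. The idea is simply to record, per source changeset, how many target submits have completed, so a restarted run can pick up at the next one.

```python
import json
import os


class MigrationJournal:
    """Hypothetical on-disk record of how many parts of each split changeset are done."""

    def __init__(self, path):
        self.path = path
        self.done = {}  # source changeset id (as a string) -> parts submitted so far
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.done = json.load(f)

    def parts_done(self, changeset_id):
        return self.done.get(str(changeset_id), 0)

    def mark_done(self, changeset_id, part_count):
        self.done[str(changeset_id)] = part_count
        # Write to a temporary file and swap it in, so a crash during the write
        # cannot leave a half-written journal behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.done, f)
        os.replace(tmp_path, self.path)


def migrate_split(journal, changeset_id, parts, submit):
    """Submit each part in order, skipping parts an earlier run already finished.

    `submit(part, comment)` is a caller-supplied function that performs the
    actual checkin on the target system (an assumption of this sketch).
    """
    total = len(parts)
    for index in range(journal.parts_done(changeset_id), total):
        comment = f"Migrated source changeset {changeset_id} (part {index + 1} of {total})"
        submit(parts[index], comment)
        journal.mark_done(changeset_id, index + 1)
```

This leaves a window open: if the process dies after `submit` returns but before `mark_done` runs, the restarted tool will try that part again. It also does nothing about another foo appearing between the two submits. Both still need a decision.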

So what is the bottom line here? Two things:

1) When migrating there may be an implied order to the operations that must be satisfied for the change to work.

2) Gaps in feature parity between the source and target systems make migrating much more complex.

Next time: cycles – what they are, how to identify them and what you can do about them.