Stop cherry-picking, start merging, Part 5: Exploiting the three-way merge


Last time, we answered some questions based on what we know about the recursive merge algorithm. Today, we'll answer questions based on what we know about the three-way merge algorithm.

After choosing a merge base (possibly by manufacturing one via the recursive merge algorithm), the three-way merge algorithm takes the three versions identified by the merge base, the source head commit, and the destination head commit. It then identifies the changes in the two head commits relative to the merge base and tries to reconcile them.

The important detail here is what doesn't participate in the merge: Everything else.

In particular, any commits leading up to the head commits have no effect. And you can take advantage of this when answering the next few questions.

What if I already made the fix in my feature branch by committing directly to it, rather than creating a patch branch? Can I create a patch branch retroactively?

Yes, you can create a patch branch retroactively. Suppose you are in this situation:

    apple
    M1   master
apple ↙︎
A
  ↖︎
    F1 F1a   feature
    apple   berry

Starting from a common commit A, you fork off a feature branch and commit a change F1. Meanwhile, the master branch commits a change M1. You then discover a terrible problem in the feature branch and apply an emergency fix F1a to the feature branch. Further investigation reveals that this terrible problem also exists in the master branch. How do you get the fix into the master branch without running the risk of a cherry-pick disaster?

Go ahead and create your patch branch like before, and merge it into both the master and feature branches.

    apple       berry
    M1 ← ← ← M2   master
apple ↙︎     berry ↙︎
A ← ← ← P       patch
  ↖︎       ↖︎
    F1 F1a F2   feature
    apple   berry   berry

We created a new branch called patch based on the common ancestor commit A, and cherry-picked our fix F1a to the patch branch as commit P. We then merged commit P into the master branch, and also into the feature branch, producing commits M2 and F2, respectively. The merge into the master branch as M2 propagates the fix to the master branch, and the merge into the feature branch as F2 has no code effect because the fix is already in the feature branch. However, the merge into the feature branch is a crucial step, because it establishes commit P as the new common ancestor.
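The sequence above can be sketched as a script against a throwaway repository. This is only an illustration under stated assumptions: the file names (fruit.txt, feature.txt, master.txt) and the commit messages are made up for the demo, and the merges are done locally rather than through pull requests.

```shell
#!/bin/sh
# Sketch: retroactively create a patch branch for a fix that was
# committed directly to the feature branch. File names are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from the article
git config user.name "Example"
git config user.email "example@example.com"

# Commit A: the common ancestor. The line under discussion says "apple".
echo apple > fruit.txt
git add fruit.txt
git commit -q -m "A"

# Feature branch: unrelated commit F1, then the emergency fix F1a.
git checkout -q -b feature
echo f1 > feature.txt
git add feature.txt
git commit -q -m "F1"
echo berry > fruit.txt
git commit -q -am "F1a: fix apple -> berry"
fix=$(git rev-parse HEAD)

# Master branch: unrelated commit M1.
git checkout -q master
echo m1 > master.txt
git add master.txt
git commit -q -m "M1"

# Patch branch: start at the common ancestor A and cherry-pick the fix as P.
git checkout -q -b patch "$(git merge-base master feature)"
git cherry-pick "$fix" >/dev/null

# M2: merging the patch into master carries the fix.
git checkout -q master
git merge -q --no-edit patch

# F2: merging the patch into feature changes no code; it is bookkeeping.
git checkout -q feature
git merge -q --no-edit patch

# Both lines print the same commit: P is now the merge base.
git merge-base master feature
git rev-parse patch
```

The last two commands are the point of the exercise: once both merges exist, git merge-base reports the patch commit rather than the original commit A.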

Observe that as far as the three commits involved in the merge are concerned, everything looks the same as if you had made the fix in the patch branch originally. The fix is in the patch branch and in the heads of the master and feature branches. The feature branch can continue making changes, possibly to the same file, and that will be correctly detected as a change in the feature branch.

From a merge-theoretical point of view, you can use your thumb and cover up commit F1a, because that commit doesn't participate in the three-way merge:

    apple       berry
    M1 ← ← ← M2   master
apple ↙︎     berry ↙︎
A ← ← ← P       patch
  ↖︎       ↖︎
    F1 ← ← ← F2   feature
    apple       berry

And then you see that this diagram is the same as the diagram we had when the change originated in the patch branch.

How can I verify that a merge carried no code change?

If you have committed the merge locally, then you can run local git commands to get your answer. If you just want a yes/no answer as to whether the most recent commit carried no code change, you can see whether the trees are the same.

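git diff-tree HEAD~ HEAD

If there is no output, then the trees are the same, which means that there was no code change relative to the first parent. (Name both commits explicitly: when given only a merge commit, git diff-tree prints nothing unless you also pass -m or -c, so an empty result would tell you nothing.)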

If you don't trust git diff-tree, you can compare the trees manually:

git rev-parse HEAD^{tree} HEAD~^{tree}

(If you are using cmd.exe, then you'll have to double the ^ characters because ^ is the command prompt's escape character.)

If you want to see the differences, you can use git diff HEAD~ HEAD to view them.
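If you want the yes/no answer in a script, here is a minimal sketch that wraps the tree comparison in a helper. The function name is_no_code_change and the throwaway repository are assumptions made for the demo; the only git features used are git rev-parse with the ^{tree} suffix and git merge -s ours to manufacture a no-code-change merge to test against.

```shell
#!/bin/sh
# Sketch: a yes/no check for "this commit has the same tree as its first parent".
set -e

# Hypothetical helper: succeeds if COMMIT and its first parent share a tree.
is_no_code_change() {
    [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$1~^{tree}")" ]
}

# Throwaway repository to try it on.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo apple > fruit.txt
git add fruit.txt
git commit -q -m "A"

git checkout -q -b side
echo berry > fruit.txt
git commit -q -am "S: change on a side branch"

git checkout -q master
echo m1 > master.txt
git add master.txt
git commit -q -m "M1"

# A bookkeeping-only merge: records "side" as merged but keeps master's tree.
git merge -q -s ours --no-edit side

if is_no_code_change HEAD; then
    echo "HEAD carried no code change"
fi
if ! is_no_code_change HEAD~; then
    echo "HEAD~ carried a code change"
fi
```

An equivalent check that avoids comparing hashes yourself is git diff --quiet HEAD~ HEAD, which exits with status zero when there are no differences.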

If you use an online service to manage pull requests, then you'll have to consult your online service's documentation to see if there's a way to preview the merge commit and diff it against the parent. (We'll pick up this topic in a future installment.)

What if I already made the fix in my feature branch by committing directly to it, and then I cherry-picked the change into the master branch? Can I create a patch branch retroactively?

Yes, you can still create a patch branch retroactively. This is just an extension of the case where you want to retroactively pull the commit back from the feature branch, except this time you're retroactively pulling the commit back from both branches:

    apple   berry   berry
    M1 M1a M2   master
apple ↙︎     berry ↙︎
A ← ← ← P       patch
  ↖︎       ↖︎
    F1 F1a F2   feature
    apple   berry   berry

The analysis is the same: The only commits that participate in the three-way merge are the common merge base P and the heads of the master and feature branches.

What if I already made the fix in my feature branch by committing directly to it, and then I cherry-picked the change into the master branch, and I already made further changes in both branches, including a conflicting change in my feature branch? Can I create a patch branch retroactively?

Yes, you can still create the patch branch retroactively, but you have to be a bit careful because you want the merge into the feature branch to contain no code changes; the merge is for bookkeeping purposes only.

    apple   berry   berry   berry
    M1 M1a M2 M3   master
apple ↙︎         berry ↙︎
A ← ← ← ← ← P       patch
  ↖︎           ↖︎
    F1 F1a F2 F3   feature
    apple   berry   cherry   cherry

From the initial common commit A, the feature branch makes an unrelated commit F1, then makes the fix F1a, and then makes a second commit F2 that alters the fix from berry to cherry. Meanwhile, the master branch makes an unrelated commit M1, then cherry-picks the fix M1a, and then makes another unrelated commit M2.

How do you connect the fix in the feature branch with its cherry-picked doppelgänger?

As before, create a patch branch from the common commit A and cherry-pick F1a into it. This is the fix that you want to be considered as existing in both the master and feature branches. Merge this branch into the master and feature branches, as usual. The merge into the master branch will go cleanly because the master branch hasn't made any changes that conflict with the fix. However, the merge into the feature branch will encounter a merge conflict because the feature branch continued and made a subsequent conflicting change F2.

When you get that merge conflict, specify that you want to keep the changes in the feature branch and ignore the changes in the patch branch. In other words, you want this to be a no-code-change merge. You can use the -s ours option to git merge to indicate that you want no code changes from the merge; you are doing this only for bookkeeping purposes.
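Here is a sketch of that scenario from start to finish, assuming a throwaway repository. The file names and the commit_file helper are invented for the demo. The final merge of feature into master is included only to show the payoff: with P as the merge base, the feature branch's change from berry to cherry is seen as a change on one side only.

```shell
#!/bin/sh
# Sketch: connect a fix and its cherry-pick after the feature branch
# has made a conflicting follow-up change. Names are hypothetical.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write CONTENT to FILE and commit it with MESSAGE.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file fruit.txt apple "A"

git checkout -q -b feature
commit_file feature.txt f1 "F1"
commit_file fruit.txt berry "F1a: the fix"
fix=$(git rev-parse HEAD)
commit_file fruit.txt cherry "F2: alters the fix"

git checkout -q master
commit_file master.txt m1 "M1"
git cherry-pick "$fix" >/dev/null        # M1a: the cherry-picked fix
commit_file master.txt m2 "M2"

# Patch branch from the common ancestor A, holding the fix as commit P.
git checkout -q -b patch "$(git merge-base master feature)"
git cherry-pick "$fix" >/dev/null

# M3: merges cleanly, because master already contains the same change.
git checkout -q master
git merge -q --no-edit patch

# F3: a plain merge would conflict (cherry vs. berry), so record the
# merge with the "ours" strategy and keep the feature branch's tree.
git checkout -q feature
git merge -q -s ours --no-edit patch

# P is now the merge base of the two branches.
base=$(git merge-base master feature)

# The eventual merge of feature into master sees berry -> cherry as a
# change made only in the feature branch, so it applies without conflict.
git checkout -q master
git merge -q --no-edit feature
cat fruit.txt
```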

I use an online service to manage pull requests. How can I force the online service to use the -s ours merge algorithm?

This is really a question for your online service. But let's suppose that your online service doesn't let you customize the merge algorithm. How can you force it anyway?

You can do it by pre-merging the result in your pull request. Note that this means that you will need two patch branches, one for each of the merge destinations.

    apple   berry   berry   berry
    M1 M1a M2 M3       master
apple ↙︎         berry ↙︎
A ← ← ← ← P           patch-master
  ↖︎           ↖︎ apple
  ↖︎             ~P       patch-feature
  ↖︎               ↖︎
    F1 F1a F2 ← ← ← F3   feature
    apple   berry   cherry       cherry

As is customary, we start with a common ancestor commit A. The feature branch makes an unrelated commit F1, and then applies an important bug fix as commit F1a. The master branch makes an unrelated change M1, and then cherry-picks the fix as commit M1a. Both branches make additional changes: In the master branch, an unrelated commit M2, and in the feature branch, a conflicting commit F2.

Now you want to retroactively connect the commit F1a with its cherry-pick commit M1a so that when the master and feature branches merge, you don't get a conflict (or worse, a silent revert).

We start as before and create a patch branch from the common ancestor commit A, and create a commit P that describes the commit that got cherry-picked. This branch merges cleanly into the master branch with the cherry-picked version M1a. However, this branch doesn't merge cleanly into the feature branch because the feature branch made a conflicting commit F2, and your online service rejects the pull request due to the conflict.

To fix this, you need to make sure that the branch submitted to your online service has all the conflicts pre-resolved. Create a new patch-feature branch from the patch branch you used for the master branch, and in that patch-feature branch, revert commit P, producing commit ~P, so that the patch-feature branch shows no net code change relative to the common ancestor commit A.¹

Now that the patch-feature branch has no net change, it should merge cleanly into the feature branch. There was no code change in the payload, but the reason for the merge wasn't to pick up a code change; it was to connect the master and feature branches via the shared commit P, which becomes the new common ancestor for the future merge of the master and feature branches.
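The pre-merged approach can be sketched as follows. The local git merge commands are stand-ins for what the online service would do when it completes each pull request, which is an assumption; the file names and the commit_file helper are likewise invented for the demo.

```shell
#!/bin/sh
# Sketch: pre-resolve the conflict by reverting P on a second patch
# branch, so that an ordinary merge into feature succeeds.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write CONTENT to FILE and commit it with MESSAGE.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file fruit.txt apple "A"

git checkout -q -b feature
commit_file feature.txt f1 "F1"
commit_file fruit.txt berry "F1a: the fix"
fix=$(git rev-parse HEAD)
commit_file fruit.txt cherry "F2: conflicting change"

git checkout -q master
commit_file master.txt m1 "M1"
git cherry-pick "$fix" >/dev/null        # M1a
commit_file master.txt m2 "M2"

# patch-master: commit P, the fix applied on top of A.
git checkout -q -b patch-master "$(git merge-base master feature)"
git cherry-pick "$fix" >/dev/null

# patch-feature: P followed by its revert ~P, so there is no net
# change relative to A.
git checkout -q -b patch-feature patch-master
git revert --no-edit HEAD >/dev/null

# Stand-ins for the two pull requests, using the default merge strategy.
git checkout -q master
git merge -q --no-edit patch-master      # M3
git checkout -q feature
git merge -q --no-edit patch-feature     # F3: merges cleanly, no code change

# Both lines print the same commit: P is the new merge base.
git merge-base master feature
git rev-parse patch-master
cat fruit.txt
```

Note that neither merge needed a custom strategy, which is what lets this work on a service that offers only the default merge.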

Conclusion

Okay, we saw the sorts of problems that cherry-picks can create, from merge conflicts (sometimes in unrelated branches) to silent reverts. In practice, people cherry-pick only because they don't have a better choice available. They would rather perform a partial merge but git doesn't support partial merges, so people feel that they have to cherry-pick. But I showed that partial merges are possible after all! You just have to think about the graph the right way: Instead of merging directly between branches, you create a helper branch that contains the partial content and merge the helper branch into the desired destinations.

As we saw when we explored the recursive merge algorithm, if you expect that your change will need to be cherry-picked to many other branches, you can stage a helper branch that is based on a commit far back enough in time that everybody who would be interested in cherry-picking the change will also have the commit your branch is based on. (In practice, this means going back to the commit that introduced the change that you are trying to patch.) If everybody merges from that helper branch rather than cherry-picking, then when all the branches merge together, the helper branch will contribute to the merge base, and that avoids the conflicts and other bad things.

My team applied the techniques in this series, and following the guidance herein reduced the number of conflicts in a single merge from over 1500 files to only 20. This changed an unmanageable merge to one that could be handled by contacting the person responsible for each conflict and asking them to resolve it.

(Note: This series is only half-over, even though I wrote a Conclusion. So don't worry: There's plenty of agony still to come.)

Footnote

¹ Another way to do this is to create a new branch named patch-feature from commit F2, and then perform a git merge -s ours patch-master to create a no-code-change merge from the patch-master branch. This results in a line from P2 to F2, which is harmless:

    apple   berry   berry   berry
    M1 M1a M2 M3       master
apple ↙︎         berry ↙︎
A ← ← ← ← P           patch-master
  ↖︎           ↖︎ cherry
  ↖︎             P2       patch-feature
  ↖︎           ↙︎   ↖︎
    F1 F1a F2 ← ← ← F3   feature
    apple   berry   cherry       cherry

If you want to get rid of the superfluous line, you could use the --squash option, but I would leave it because it makes it clearer what happened. (Otherwise, it will look like the patch branch made a huge commit.)

Personally, I would use git commit-tree to construct commit P2. I'll talk about the magical powers of git commit-tree at some unspecified future point.

However you created the patch-feature branch, you can then create a pull request from the patch-feature branch to the feature branch.
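A sketch of the footnote's alternative, using git merge -s ours rather than git commit-tree. The final merge uses --no-ff on the assumption that the online service records a merge commit when it completes a pull request; without it, a local git merge would simply fast-forward the feature branch to P2. File names and the commit_file helper are invented for the demo.

```shell
#!/bin/sh
# Sketch: build patch-feature from F2 with a no-code-change merge of
# patch-master, then merge it into feature.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write CONTENT to FILE and commit it with MESSAGE.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file fruit.txt apple "A"

git checkout -q -b feature
commit_file feature.txt f1 "F1"
commit_file fruit.txt berry "F1a: the fix"
fix=$(git rev-parse HEAD)
commit_file fruit.txt cherry "F2: conflicting change"

git checkout -q master
commit_file master.txt m1 "M1"
git cherry-pick "$fix" >/dev/null        # M1a
commit_file master.txt m2 "M2"

# patch-master: commit P, the fix applied on top of A.
git checkout -q -b patch-master "$(git merge-base master feature)"
git cherry-pick "$fix" >/dev/null

# patch-feature: branch from F2 and record patch-master as merged
# while keeping F2's tree. This merge commit is P2.
git checkout -q -b patch-feature feature
git merge -q -s ours --no-edit patch-master

# Stand-ins for the two pull requests.
git checkout -q master
git merge -q --no-edit patch-master              # M3
git checkout -q feature
git merge -q --no-ff --no-edit patch-feature     # F3

# Both lines print the same commit: P is the new merge base.
git merge-base master feature
git rev-parse patch-master
```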

Comments (20)
  1. Yuri Khan says:

    To think through what contortions people are willing to go just to avoid rewriting history.

    My solution to the situation “while working on a feature F, I found a critical bug that is also in master” would be to stop working on F, branch off current master, fix the bug, submit the fix. Rebase F onto the fix, continue working. When ready to submit F, rebase it onto the newest master that hopefully includes the fix by that point.

    1. Aged .Net Guy says:

      That word “hopefully” doesn’t sound like robust engineering practice. In a scaled-up environment that means your rebasing proposed feature becomes event driven; awaiting the fix appearing in the tip of master. Assuming you can recognize it with all the other changes going on around it.

      I also wonder how scalable your approach is once there are not one feature branch but 25. Several of which have the same dependencies and may be growing new patches of their own.

      1. Joshua says:

        You didn’t think it all the way through. He rebased onto the pull request branch and continued developing. This only depends on Yuri’s bugfix being merged into master before his feature change.

        1. Rebasing the feature branch has its own problems. (1) It breaks bisect, since commits will lay down a working tree that was never tested; (2) it makes a big mess if the feature branch had taken merges. (3) It breaks all the PRs in your online service. (4) It throws your team into disarray because they all have to rebase their changes on top of your rebase. (5) It breaks data retention requirements because you no longer have an accurate copy of what was in the product at any particular point in time. For example, if you release a build out of that feature branch, and then you rebase the feature branch, you lose the source code that went into that build. Lawyers don’t like it when you are unable to prove things like this.

          1. Yuri Khan says:

            (1) Depends on your testing strategy. If you rebase all feature branches in the release, then test the whole release, all is good.
            (2) Yes. So you don’t merge into feature branches. You rebase onto whatever other code you depend on to develop, and again when that gets accepted.
            (3) There specifically exist plugins for web-based Git frontends to rebase PRs when the big green button gets pushed. (They also check that the resulting code builds and passes tests before actually merging the rebased branch.)
            (4) Small price to pay for a history that is reasonably easy to read. It’s nice to have an SVG rendering of a small part of the graph with symbolic commit names such as F2; in reality, what you have is “git log --graph --oneline --color --decorate --all” output which is not nearly as neat.
            (5) Rule of feature branch releases #1: Do not release from feature branches. Rule #2: if you do release from a feature branch, put a tag on it and push it.

            I do recognize that there are various trade-offs between a merge workflow and a rebase workflow.

          2. Suppose you rebase the feature branch, and you later discover a problem that it introduced when it merged into master. You can’t bisect the feature branch because its history doesn’t match reality. Disallowing merges into feature branches means that a feature that takes three months to develop will be developed against a three-month-old copy of master, which seems wrong. You might think throwing a team of 100 people into disarray is a small price to pay. I think those 100 people may disagree with you. And suppose you find a problem in your feature branch that wasn’t there yesterday. “What changes were made since yesterday? I can’t tell because we rebased this morning. Git log says the entire branch changed.”

          3. Joshua says:

            “What changes were made since yesterday? I can’t tell because we rebased this morning. Git log says the entire branch changed.”

            When there’s a merge conflict that I can’t trivially resolve I back out of the rebase. Therefore (rebase wasn’t backed out case), it’s a problem with master’s pull and we track it down. Every case I’ve seen of this could be reproduced in master, thus permitting fixing it.

          4. Then I guess you’re lucky. On any given day, you either pull from master or make local commits. You never have days where you, say, pull from master and make 200 local commits. And you always discover problems within 24 hours.

          5. Joshua says:

            I think it’s the no nontrivial rebase rule.

      2. Yuri Khan says:

        If the fix is not in master by the time the feature branch is ready to be accepted and the feature branch depends on it, then the fix is submitted as part of the feature branch.

        If the feature branch can be applied without the fix, okay, so be it. That bug was probably not so critical after all.

        How scalable? If it’s good enough for the Linux kernel, it’s probably good enough for me.

        1. Cesar says:

          > If it’s good enough for the Linux kernel, it’s probably good enough for me.

          The Linux kernel uses a merge workflow, not a rebase workflow. Linus himself said you should not rebase published branches “[…] This means: if you’re still in the “git rebase” phase, you don’t push it out. If it’s not ready, you send patches around, or use private git trees (just as a “patch series replacement”) that you don’t tell the public at large about.” (https://lwn.net/Articles/328436/)

  2. nathan_works says:

    Entirely unrelated to branching, but I do miss Raymond’s “unscientific NCAA bracket predictions” from years past.

    1. What do you mean unscientific? They were HIGHLY SCIENTIFIC.

    2. Brian says:

      They were wonderfully “highly scientific”, and I too miss them. Wait till next year, I guess.

      1. Al Go says:

        He’ll post predictions after the tourney is over. His scientific “predictions” will be 100% accurate.

  3. Peter Doubleday says:

    I cannot emphasise enough the empirical evidence here (which is presented almost bashfully, or if you prefer, cmdfully). Reducing merge conflicts in a large project from 1500 to 20 is a stupendous gain. In fact, it’s almost noticeable enough that a given set of management would see it, and mandate a day for the team(s) to go through these articles (and the following set) in order to remodel source control practices and workflow. Which almost all teams should do. I mean, honestly. Imagine both the productivity gains and the quality gains!

    On a parallel note, I’ve been contemplating a transference of these techniques to Perforce/Source Depot. They’re a centralised database, and not a distributed CVS, so the workflow and the history is a little different, but obviously the graph theory remains the same.

    I know it’s a lot to ask, but could you add a summary article (at your leisure, of course) to discuss the best practices for a couple of popular non-DVCS products when handling cherry-picking and selecting the optimum root ancestor for the patch branches?

    Even “Kindly Uncle Raymond” advice would probably help thousands of IT shops out there.

    1. Brian says:

      I’d like to add TFS source control to that list/request.

    2. osexpert says:

      “reduced the number of conflicts in a single merge from over 1500 files to only 20”
      Not 1500 conflicts, 1500 files. He did not say how many conflicts there was originally.

      1. Counting conflicts is imprecise. Suppose there are two conflict blocks. Are they really two separate conflicts or one large conflict? (Because one of the “unchanged” lines is a blank line, say.)

  4. Richard says:

    This is what I believed git’s “Cherry pick” actually did.

    Do you have any idea why it doesn’t?

Comments are closed.
