Stop cherry-picking, start merging, Part 1: The merge conflict

Cherry-picking is a common operation in git, and it's not a good idea. Sometimes it's a neutral idea, but I haven't yet found a case where it's actually good.

This is the start of a series that will begin by explaining why cherry-picking is bad, continue by explaining why cherry-picking is worse, then try to talk you down from the ledge by showing how to get the same effect as a cherry-pick but with merging, showing how to apply that technique to the case where you need to do a retroactive merge, and wrap up by showing how to apply that technique to the case where you already made the mistake of cherry-picking and want to fix it before something bad or worse happens.

It's a tall order, but I've been meaning to write this up for a while, and what's gotta get done gotta get done.

In order to cherry-pick, you need two branches, one to be the donor and one to be the recipient. Let's call them the master branch and the feature branch. And for simplicity's sake, let's say that the commit being cherry-picked is a one-line change to a single file. Each commit will be annotated with the contents of that one line.

apple   apple      berry
A ← ← ← M1 ← ← ← ← M2        master
  ↖              ⋰
    F1 ← ← ← F2              feature
    apple    berry

For the purpose of illustration, I'm using a dotted line to denote cherry-picks. This dotted line doesn't really exist in the repo, but I'm drawing it to help express the chronology. (Eventually, I'll stop drawing dotted lines, too.)

You have some common ancestor A, and in that commit, the line in question is the word apple. From that common ancestor, the two branches diverge: Commit F1 happens on the feature branch, and commit M1 happens on the master branch. These changes don't affect the line in question, so it still says apple. You then make some commit F2 in the feature branch that changes the line in question from apple to berry, and you cherry-pick commit F2 into the master branch as M2.

So far, nothing exciting is happening.

Time passes, more commits occur, and your commit graph looks like this:

apple   apple      berry    berry
A ← ← ← M1 ← ← ← ← M2 ← ← ← M3        master
  ↖              ⋰
    F1 ← ← ← F2 ← ← ← ← ← ← F3        feature
    apple    berry          berry

You made another commit M3 to the master branch and another commit F3 to the feature branch. Neither of these commits affected the line in question, so the line is still the word berry.

It's time to merge back, and since the line in question is the same in both branches, the merge is trivial, and the final merged result is berry.

apple   apple      berry    berry    berry
A ← ← ← M1 ← ← ← ← M2 ← ← ← M3 ← ← ← M4     master
  ↖              ⋰               ↙
    F1 ← ← ← F2 ← ← ← ← ← ← F3              feature
    apple    berry          berry
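
Here's a toy sketch of why that merge is trivial. It models a three-way merge of just the one line. The function name `merge_line` is made up for this illustration, and the body is a simplification of the rule a three-way merge applies, not git's actual code.

```python
def merge_line(base, ours, theirs):
    """Toy three-way merge of a single line (an illustration, not git's code).

    Returns the merged value, or None to signal a conflict.
    """
    if ours == theirs:
        return ours      # both sides agree, so there is nothing to decide
    if ours == base:
        return theirs    # only their side changed the line
    if theirs == base:
        return ours      # only our side changed the line
    return None          # both sides changed it in different ways


# The merge base A has apple; M3 (master) and F3 (feature) both have berry.
print(merge_line(base="apple", ours="berry", theirs="berry"))  # berry
```

Both sides made the same edit, so the merge has nothing to decide, and it never needs to know that one edit was a cherry-pick of the other.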

This is the ideal case.

It is also relatively uncommon in an active code base.

Consider this alternate timeline: After the cherry-pick, additional commits M3 to the master branch and F3 to the feature branch are made, but this time commit F3 changes the line in question to cherry. This could be because the person who made the original commit F2 found an improvement (cherries are on sale right now), or maybe they made a larger change that happened to require switching from berries to cherries.

Whatever the reason, the commit graph now looks like this:

apple   apple      berry    berry
A ← ← ← M1 ← ← ← ← M2 ← ← ← M3 ← ← ← 💥     master
  ↖              ⋰               ↙
    F1 ← ← ← F2 ← ← ← ← ← ← F3              feature
    apple    berry          cherry

This time, when it's time to merge the feature branch back into the master branch, there is a merge conflict. The base of the three-way merge contains apple, the incoming feature branch has cherry and the existing master branch has berry.

<<<<<<< HEAD (master)
berry
||||||| merged common ancestors
apple
=======
cherry
>>>>>>> feature

The conflict occurred because the cherry-picked changes were subsequently changed again by one of the branches. We've been using dotted lines in our diagrams to emphasize that the cherry-pick relationship is all in our heads, and not actually recorded anywhere in the commit graph.
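
To see how little the repo knows about a cherry-pick, here's a small model of the conflicting graph. The `Commit` class and the `ancestors` and `cherry_pick` helpers are hypothetical names invented for this sketch, and the cherry-pick is simplified to copying the line's contents rather than replaying a diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    line: str              # contents of the one line we care about
    parents: tuple = ()    # the only links the commit graph records


def ancestors(commit):
    """Return the names of the commit and everything reachable via parents."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c.name not in seen:
            seen.add(c.name)
            stack.extend(c.parents)
    return seen


def cherry_pick(picked, onto, name):
    """Simplified cherry-pick: a new commit with the picked commit's line.

    Its only parent is `onto`; nothing points back at `picked`.
    """
    return Commit(name, picked.line, (onto,))


a = Commit("A", "apple")
m1 = Commit("M1", "apple", (a,))
f1 = Commit("F1", "apple", (a,))
f2 = Commit("F2", "berry", (f1,))
m2 = cherry_pick(f2, onto=m1, name="M2")
m3 = Commit("M3", "berry", (m2,))
f3 = Commit("F3", "cherry", (f2,))

print("F2" in ancestors(m3))                  # False
print(sorted(ancestors(m3) & ancestors(f3)))  # ['A']
```

The only commit the two branches share is A, so the three-way merge uses apple as its base, and nothing in the graph says that M2 and F2 are the same change.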

You have to hope that whoever resolves this merge conflict remembers the history of this line, or can access the team's knowledge of this line of code to understand that the correct resolution is to accept the changes in the feature branch rather than the one in the master branch.

In this case, there haven't been many changes, and there are only two branches involved, and hopefully there aren't too many other conflicts in the merge (so that the person resolving the merge hasn't gotten tired and burnt out), so the chances of a correct resolution are pretty good. But consider this three-branch scenario:

apple   apple    berry    berry
A ← ← ← M1 ← ← ← M2 ← ← ← M3                   master
  ↖ apple    apple    apple    cherry ↖
    V1 ← ← ← V2 ← ← ← V3 ← ← ← V4 ← ← ← 💥     victim
      ↖                      ↙
        F1 ← ← ← F2 ← ← ← F3                   feature
        apple    berry    cherry

Start with a commit A, where the line in question is apple. We create a branch based on commit A, ominously named victim, and add a commit called V1, which doesn't affect the line in question, so it still is apple. From the victim branch we create our feature branch from commit V1, and then the story is the same: To the feature branch, we add the same commit F1 from before, which doesn't affect the line in question, so it continues to be apple. Meanwhile, the master branch added a commit M1 which doesn't affect the line in question.

We continue as before: The feature branch adds a commit F2 which changes the line in question to berry, and the master branch cherry-picks that commit as M2. The feature branch makes another change F3 which happens to update the line in question from berry to cherry, while the master branch adds a commit M3 that doesn't change the line in question, so it remains berry.

All through this, the victim branch is blithely unaware of the cherry-picking disaster being created by the feature and master branches. It commits changes V2 and V3 which have nothing to do with the line in question, so the line is still apple.

Eventually, the feature branch merges its changes back into the victim branch, producing commit V4, where the line in question is now cherry, thanks to the changes that were made in the feature branch.

The time bomb has now moved into the victim branch.

The victim branch decides to take a merge from the master branch, and that is where the conflict is detected, because this is the first time the original change F2 encounters its cherry-picked doppelgänger M2. The poor person stuck with this merge conflict has no idea of the deal with the devil struck by the feature and master branches behind his back. Furthermore, the person stuck with this merge conflict may be exhausted from dealing with all the other (valid) conflicts caused by the merge from the master branch and may not have the mental energy to reverse-engineer how the two branches ended up the way they did and figure out which side is right.
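
The two merges into the victim branch can be traced with a toy single-line model of a three-way merge. The helper `merge_line` is a hypothetical name for this sketch, and it is a simplification, not git's real merge code. The merge bases (V1 for the first merge, A for the second) are read off the diagram above.

```python
def merge_line(base, ours, theirs):
    """Toy three-way merge of one line; None signals a conflict."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs    # only their side changed the line
    if theirs == base:
        return ours      # only our side changed the line
    return None          # both sides changed it in different ways


# Feature merges into victim. Base V1: apple, victim V3: apple, feature F3: cherry.
v4 = merge_line(base="apple", ours="apple", theirs="cherry")
print(v4)       # cherry

# Victim takes a merge from master. Base A: apple, victim V4: cherry, master M3: berry.
result = merge_line(base="apple", ours=v4, theirs="berry")
print(result)   # None
```

The first merge is clean, which is how the changed line slips quietly into the victim branch. The conflict surfaces only in the second merge, on a branch that never took part in the cherry-pick.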

Basically, when you cherry-pick a commit, you now have two copies of the commit sitting in the graph. Any lines of code affected by that commit must remain untouched in both branches until the two copies of the commit finally merge. If either branch modifies any line touched by the cherry-pick, then that creates a powderkeg that can sit quietly indefinitely. It is at the time somebody tries to merge the two commits together that the explosion occurs, and that point could be in a faraway place not immediately related to the branches involved in the cherry-pick. This means that the person trying to resolve the merge was never part of the cherry-pick madness and may not know who to talk to in order to figure out what happened.

Okay, that was a long story, and you probably knew most of it already, but believe it or not, as bad as this is, it could get even worse: The explosion might not happen.

Wait, why is it worse that an explosion doesn't happen? We'll pick this up next time.

Comments (39)

  1. kantos says:

    I’ve always seen the need to cherry pick as a symptom of bad planning or branch discipline. If you have a F1 that is the parent of F1.1 and F1.2 it should have its own branch. That way in case F1.1 isn’t ready to ship but F1 and F1.2 are, then you merge F1 and F1.2 and call it done. That said you don’t always know that’s the case when you start, but again that seems like someone should have spent a bit more time on a whiteboard and less time in an IDE in that case.

    1. Joshua says:

      We use cherry picking to get bugfixes from master to release branches. (Or sometimes from the release branches into master). A cherry-pick from a feature branch to master is ridiculously rare.

      1. Merging a spot fix from the master into the feature branch means that you will also pick up changes you may not be ready for yet. When the master branch takes commits to thousands of files daily, that’s a heavy payload for what should be a one-line fix. “Why is everything broken?” “Oh, I merged ten thousand files from master in order to pick up a one-line fix.”

        1. Guillaume Davion says:

          When you cherry pick, you only take one specific commit, not the whole history leading to it.

          So if there is one specific commit for a bug fix, you will only take the changes in it, so not 10k files, just the ones affected by the bugfix.
          If the developers do commits affecting the whole repository for each bugfix, that’s another story, obviously.

          1. Sorry, my reply was to the wrong comment. I meant to reply to EvilKiru who said “You should always merge from the master branch to the feature branch, never cherry-pick.”

  2. EvilKiru says:

    Or you could prevent this scenario by only allowing merges into the feature branch. The rule where I work is that if you find a bug in the feature branch, you go fix it in the master branch, then merge the fix into the feature branch.

    1. Marvy says:

      Raymond merged his comment into the wrong thread, have to pick it out:

  3. Tim says:

    I agree that cherry picking is a bad idea for *copying* commits but do see it as useful for *moving* commits. There’s been a couple times I’ve been bouncing between branches for merging or reviews then forgotten to switch back to my feature branch when I resume work. In that scenario I’ve cherry picked to get the commit on the right branch then rebased to delete the commit.

    1. pete says:

      No-one is suggesting that cherry-picking is a bad feature for branch manipulation. It’s just best done in private, before pushing.

  4. Jason says:

    I have nothing to add, but I do have a question: Why are all your arrows pointing backwards, into the past?

    1. Because git commits are a DAG which point to one or more previous commits.

      1. That’s answering the question without solving the problem. Basically, the answer is “Tradition.” It’s traditional in git to draw the commit graph rather than a timeline. Yes, it takes getting used to.

        1. pmbAustin says:

          I don’t think it will ever not look wrong to me. Or make any sense at all to me.

        2. Isn’t that the difference between diagramming the user flow versus the data flow? User flow would have the arrows point towards HEAD, but data flow would have them point towards initial commit. Also I might be making up terms at this point.

  5. Antonio Rodríguez says:

    Worse than an explosion? Worse than failure, as in “The Daily WTF”. I’d bet that is when GIT silently resolves the merge without raising any conflict (and thus, leaving you without the ability to solve it). This case could be so bad that it involves a build break whose cause is “in a galaxy far far away”. And those are fun to solve :-( .

  6. BZ says:

    Our team has a dedicated dev and release branch for our project. Any new code goes into the dev branch (well actually it is merged into the dev branch from a ticket branch). Once that issue is QA’d we (each developer) cherry pick those changes to the release branch *assuming it’s approved for release*. This last assumption means that the dev and release branches are rarely if ever in sync.

    That branch is regression tested and merged into master for release. Then master is back-merged into the dev and release branches. This is done by one individual who probably wasn’t handling any of the cherry picks.

    Other teams have their own similar setups sharing the same master branch and release schedule. Usually teams don’t touch each other’s code very often, but it does happen. Also each branch has config files that are not to be merged or cherry-picked ever (because they reference binary versions of other components undergoing the same process in their own repositories).

    Somehow this has worked reasonably well for us, but I’m waiting to see how it could be improved.

  7. pmbAustin says:

    I’m not a git user… can someone explain to me why the arrows are going the ‘wrong’ way? And why pushing changes up to the main branch is called a ‘pull request’? I’m having a really hard time with git because everything seems completely BACKWARDS to me, and seeing the diagrams above just emphasizes it, because I can’t reconcile in my brain what you’re saying with what the picture is showing… the picture is completely backwards. Change the arrows to point the other way and it makes sense. A common ancestor A is branched, so the arrow should go TOWARDS THE BRANCH, not backwards back to the ancestor. I’m so confused by git and its bizarre terminology…

    1. Cesar says:

      > can someone explain to me why the arrows are going the ‘wrong’ way?

      This better reflects git’s internal data structures: every commit contains the hash of its parent(s). Most explanations show them pointing the other way, to reflect the temporal flow of the changes, but Raymond is probably going into some low-level explanation of the data structures later in the series.

      > And why pushing changes up to the main branch is called a ‘pull request’?

      Because you aren’t pushing changes up to the main branch. Instead, you’re recording a request on some system for whoever manages the main branch to pull your changes into the main branch. Traditionally, this was/is done by sending an email to the maintainer of the main branch, with a link to your branch; the maintainer then pulls (fetch+merge) your branch into their branch.

      1. pmbAustin says:

        Thanks, that helps. But I still don’t think I’ll ever wrap my brain around Git terminology… everything just seems backwards and odd and off. My mental model of source control just doesn’t map to it at all, and it’s like I’m constantly grinding my mental clutch every time I try.

        1. d-coder says:

          You’re not the only one.

  8. pete says:

    Thank you Raymond. Please make as much noise about this as you can! I don’t understand people who maintain a beautiful class hierarchy with nicely factored code, then litter their development history with cherry-picks like socks on a teenager’s bedroom floor. Put it in the right place the first time around! If you get it wrong, cherry-pick your way out of your mistake and be ashamed – don’t make it part of your process.

  9. Zarat says:

    Can’t wait to see what you are going to do with merging to replace cherry picking. I’ve been using TFS for a long time and the ability to merge individual commits instead of whole branches is something I really miss when having to work with git.

    1. Richard says:

      Merging a single commit is cherry-picking.

      That’s what it means.
      By definition it partially breaks the history of the repo, because it causes some files to have a different path through history than others.

      1. Merging a single commit is not the same as cherry-picking. Just draw the graph.

  10. Marvy says:

    If you’re doing a series, you may want some tags beyond “other”.

    1. I have only four categories, and I don’t usually tag series.

  11. Kevin says:

    > It is at the time somebody tries to merge the two commits together that the explosion occurs, and that point could be in a faraway place not immediately related to the branches involved in the cherry-pick. This means that the person trying to resolve the merge was never part of the cherry-pick madness and may not know who to talk to in order to figure out what happened.

    So I’ve been thinking through this for a while now, and I think I see where our difference of opinion is arising.

    At Microsoft, we know[1] they develop primarily on feature branches, and merge stuff inwards into the trunk as it stabilizes. But many places do the exact opposite: they develop primarily on the trunk and merge stuff outwards into release branches as it stabilizes. There is a critical difference between these approaches: Merging inwards results in a lot of code originating from different branches landing in the same place, while merging outwards results in the same code landing in lots of different places.

    Proposition: Cherry-picking is evil when you cherry-pick inwards (onto a branch that accepts incoming merges from many different branches, or a branch where active development occurs), but not when you cherry-pick outwards (onto a branch that only accepts merges from a single other branch, the same branch that you cherry-picked from, and which is otherwise inert). Discuss? Counterexamples?


    1. Git doesn’t understand “directions”. All branches are equivalent in git’s eyes. Just reverse the labels “master” and “feature” in the diagram, and you have the same situation. A change is made in the master branch, it is cherry-picked into the feature branch, and then a subsequent change is made in the master branch, and then the two branches merge. Conflict.

      1. Kevin says:

        Yes, I understand that. My point is that, if the branch you cherry-pick onto is only ever merged into from one other branch, and that’s the branch you’re cherry-picking from, then I don’t see how problems can arise.

        You can criticize that as unrealistic, but then as I explained, not everyone uses feature branches the way Microsoft does. For that matter, not everyone uses Git.

        1. Even if you cherry-pick only from branches you intend to merge from, you still have this problem. Observe that in the first explosion diagram, the master branch cherry-picked from the feature branch, and then it later merged from that same branch.

          1. Kevin says:

            I don’t want to get into a lengthy internet argument with you, but at my workplace, M1, M3, and perhaps even the final merge would be prohibited by policy (because you don’t make commits on that branch, you make them on the origin branch and cherry-pick them in, and eventually the whole branch is quietly abandoned and replaced). The destination branch is nothing more than a “what went into this release” tracker, and for that purpose I see nothing wrong with an occasional cherry-pick.

            Again, I agree that A) this is a very idiosyncratic workflow, B) for Microsoft’s branching strategy it doesn’t make sense, and C) you can only get away with cherry-picking if your branching strategy very rigidly avoids the kinds of issues you are blogging about, which D) are far too often ignored by people making careless cherry-picks on random Git projects.

          2. I’m assuming that you want the two branches to merge eventually; you just need to get one commit into the other branch faster than the others. If your design is that the branches will never merge, then merge conflicts are nonexistent and the issue is moot.

  12. cheong00 says:

    We did have 2 branch of HR systems in my ex-company. The retail branch is a copy of master branch.

    We did erratic change to the retail branch to speed up the payroll calculations, but the change we made is too huge to port back to master branch (there are table structure change, and even change in meaning of constant values) so our boss made the decision that these two branches will never merge back.

    That’s one way why an explosion doesn’t happen. (And I’m sure that it isn’t what you mean :P )

    1. Joshua says:

      That’s when you cherry-pick everything into retail, delete master and rename retail master.

      1. cheong00 says:

        Too bad we can’t do that, because the master branch is already in use by a major client which is accountable for over 50% of our department’s maintenance fee income. And we cannot move their system to retail branch because their payroll calculation is significantly more complex than any other companies. No one in the company dares to rewrite that part.

  13. Pierre B. says:

    As others have said, I only ever saw cherry-picking in git as a way to bring a particular bug-fix into a release-branch to do a point-release or bug-fix release. For example from v12.0 to v12.1 or v12.0.1. Those release branches don’t see any further development except these feature or bug-fix merges.

    Of course, it only works if you plan your initial commits correctly, meaning don’t mix multiple unrelated fixes or dev into a single commit. Unfortunately, the temptation to slip in a small fix with a feature is very strong. (But that problem is unrelated to cherry-pick vs merge.)

  14. Jonathan Wilson says:

    I use cherry-picking all the time to take specific fixes from the main development trunk of a project I work on and back-port them to a different branch. Most of the development work that gets done on the main trunk will never get pushed to that particular branch (it’s a more stable build for a specific purpose where the main development trunk can’t be used) but sometimes fixes get made that do apply to this particular branch, hence the cherry picking.

  15. hli says:

    I use cherry-picking to do more extensive history-rewriting (e.g. rebasing a branch on top of a branch that was itself rebased). My rule here is: when I have cherry-picked, the donor branch needs to go away. Do not have duplicate commits.

    1. Joe says:

      I’ve long found that carrying a “feature” branch forward is full of problems. Once I merge/cherry pick to master, I “delete” that feature branch. Another advantage this has is that on the short term, you can see exactly the purpose of each [well named] branch.
