Supercharging the Git Commit Graph
June 25, 2018
Have you ever run gitk and waited a few seconds before the window appears? Have you struggled to view your commit history as a sane order of contributions instead of a stream of parallel work? Have you ever run a force-push and waited seconds for Git to give any output? You may be having performance issues due to the number of commits in your repository.
If you have a large repository, then you may notice that git log --graph takes a few seconds to write any output, while Visual Studio Team Services (VSTS) returns these results very quickly.

This is due to some really cool algorithms we built and tested server-side. We recently took the first steps toward bringing those algorithms to the whole open source Git community by submitting the code to the core Git project.
This week marks the release of Git 2.18.0 and Git for Windows 2.18.0. There are a lot of cool features and performance enhancements in this one, so I hope you upgrade and enjoy! One new feature in 2.18 is a serialized commit-graph. I think a lot of users will benefit from this feature, especially if you are working in a large repository with tens of thousands of commits (or more). The feature is optional, so right now you’ll need to enable it manually.
How to Enable the Commit-Graph Feature
Currently, the commit-graph feature requires a bit of self-maintenance, but we hope to improve this experience in future versions.
This is an experimental feature! Please use with caution. You can always turn off the feature using git config core.commitGraph false. There are a few Git features that don’t work well with the commit-graph, such as shallow clones, replace-objects, and commit grafts. If you never use any of those features, then you should have no problems!
To enable the commit-graph feature in your repository, run git config core.commitGraph true. Then, you can update your commit-graph file by running
git show-ref -s | git commit-graph write --stdin-commits
You are good to go! That last command created a file at .git/objects/info/commit-graph relative to your repository root. This file contains a compact description of your commit history that is faster to parse than unzipping your packfiles and loose objects.
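To build some intuition for why a precomputed file is faster to read, here is a toy sketch in Python. It is not the real commit-graph file format (the next article in this series covers that); it only assumes the general idea of storing each commit as a fixed-width row whose parents are integer positions, so looking up a commit's parents is offset arithmetic rather than decompressing and parsing a commit object. The names `serialize`, `parents_of`, `position_of`, `NO_PARENT`, and the row layout are all hypothetical.

```python
import struct
from bisect import bisect_left

NO_PARENT = 0xFFFFFFFF       # hypothetical sentinel meaning "no parent in this slot"
ROW = struct.Struct(">IIQ")  # hypothetical row: parent1 position, parent2 position, commit date


def serialize(history):
    """history: commit id -> (commit_date, [parent ids]), at most two parents each.

    Returns (ids, rows): ids are sorted so a commit's position can be found by
    binary search, and rows are fixed width so row i starts at byte i * ROW.size.
    """
    ids = sorted(history)
    position = {cid: i for i, cid in enumerate(ids)}
    rows = bytearray()
    for cid in ids:
        date, parents = history[cid]
        if len(parents) > 2:
            raise ValueError("this sketch only handles up to two parents")
        slots = [position[p] for p in parents] + [NO_PARENT] * (2 - len(parents))
        rows += ROW.pack(slots[0], slots[1], date)
    return ids, bytes(rows)


def position_of(ids, cid):
    """Binary-search the sorted id list for a commit's row number."""
    i = bisect_left(ids, cid)
    if i == len(ids) or ids[i] != cid:
        raise KeyError(cid)
    return i


def parents_of(ids, rows, index):
    """Read row `index` by offset arithmetic; no decompression or text parsing."""
    p1, p2, _date = ROW.unpack_from(rows, index * ROW.size)
    return [ids[p] for p in (p1, p2) if p != NO_PARENT]


# Hypothetical history: "m" merges "b" and "c", which both build on root "a".
history = {"a": (100, []), "b": (200, ["a"]), "c": (300, ["a"]), "m": (400, ["b", "c"])}
ids, rows = serialize(history)
print(parents_of(ids, rows, position_of(ids, "m")))  # ['b', 'c']
```

Every row is 16 bytes here, so reading a commit's parents costs the same no matter how large the history is.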
Go and test your favorite commands and see how long they take. You can compare commands before and after the commit-graph feature using something like the following:
time git -c core.commitGraph=false log --graph --oneline -10
time git -c core.commitGraph=true log --graph --oneline -10
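A single `time` call can be noisy. If you want a slightly more careful comparison, a small Python helper along these lines runs each configuration several times and reports the median. The helper names are my own, and the sketch assumes `git` is on your `PATH` and that you run it from inside the repository.

```python
import statistics
import subprocess
import time


def git_argv(commit_graph: bool, *args: str) -> list[str]:
    """Build e.g. ['git', '-c', 'core.commitGraph=true', 'log', ...]."""
    value = "true" if commit_graph else "false"
    return ["git", "-c", f"core.commitGraph={value}", *args]


def median_seconds(argv: list[str], runs: int = 5) -> float:
    """Run argv `runs` times, discarding stdout; return the median wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, compare `median_seconds(git_argv(False, "log", "--graph", "--oneline", "-10"))` with the same call using `True`. Because stdout is discarded rather than sent to a terminal, Git does not start a pager, so the numbers mostly reflect the work of walking commits.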
I’d love to hear from you if you’ve had success with certain commands because of this feature!
Performance Numbers
If you want proof of the benefit before testing this yourself, here are some performance numbers for a few important repos: Linux, Git, and Windows. For Linux and Git, I include the exact commits I used so you can reproduce a similar experiment.
Linux
The Linux kernel repository is the gold standard for Git performance. It has a good number of files, and many commits (over 750,000), and is publicly available for everyone to clone and test themselves.
| Command | Before (s) | After (s) | Change |
|---|---|---|---|
| git merge-base master topic | 0.52 | 0.06 | -88% |
| git branch --contains | 76.20 | 0.04 | -99% |
| git tag --contains | 5.30 | 0.03 | -99% |
| git tag --merged | 6.30 | 1.50 | -76% |
| git log --graph -10 | 5.90 | 0.74 | -87% |
For this test, I had the following branch values:
master: 032b4cc8ff84490c4bc7c4ef8c91e6d83a637538
topic: 62d18ecfa64137349fac9c5817784fbd48b54f48
This version of master can reach 722,849 commits and is 30,986 commits behind topic.
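What does `git merge-base` actually compute? Here is a deliberately naive Python sketch on a toy graph where each commit maps to the list of its parents (the graph shape and function names are hypothetical). It collects everything reachable from both tips and keeps the common ancestors that are not themselves ancestors of another common ancestor. Git's real implementation is smarter about where it stops walking, but any version has to load parent information for many commits, which is exactly the work the commit-graph file makes cheaper.

```python
from collections import deque


def ancestors(graph, tip):
    """All commits reachable from tip (including tip) by following parent links."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        for parent in graph[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(graph, a, b):
    """Best common ancestors: common ancestors that are not a proper ancestor of another."""
    common = ancestors(graph, a) & ancestors(graph, b)
    redundant = set()
    for commit in common:
        # Everything strictly below a common ancestor is also common, so it is not "best".
        redundant |= ancestors(graph, commit) - {commit}
    return common - redundant


# Hypothetical history: master is at C, topic is at E, and both build on B.
GRAPH = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
print(merge_bases(GRAPH, "C", "E"))  # {'B'}
```

The function returns a set because a criss-cross history can have more than one best common ancestor.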
Git
The Git repository is also publicly available, but is much smaller than the Linux repository. However, it is large enough to see benefits with the commit-graph feature.
| Command | Before (s) | After (s) | Change |
|---|---|---|---|
| git merge-base master topic | 0.10 | 0.04 | -60% |
| git branch --contains | 0.76 | 0.03 | -96% |
| git tag --contains | 0.70 | 0.03 | -96% |
| git tag --merged | 0.74 | 0.12 | -84% |
| git log --graph -10 | 0.44 | 0.05 | -89% |
For this test, I had the following branch values:
master: b50d82b00a8fc9d24e41ae7dc30185555f8fb0a0
topic: e144d126d74f5d2702870ca9423743102eec6fcd
This version of master can reach 49,361 commits and is 2,032 commits behind topic.
The Windows Repository
The developers making Microsoft Windows use Git, enhanced by the Git Virtual File System (GVFS). We deployed the commit-graph feature to the Windows developers with a recent (private) release of GVFS. In that version, GVFS handles the maintenance of the commit-graph file, so it is updated with every fetch.
| Command | Before (s) | After (s) | Change |
|---|---|---|---|
| git status --ahead-behind | 14.30 | 4.70 | -67% |
| git merge-base A B | 11.40 | 1.80 | -84% |
| git branch --contains | 9.40 | 1.60 | -83% |
| git log --graph -10 | 24.30 | 5.30 | -78% |
My local version of master has 2,214,796 reachable commits. git status improves because my local version of master is 81,776 commits behind origin/master, and git status walks commits to compute this count. With 4,000+ developers working in the repo, the branches move very quickly, so this is a realistic difference between a local and remote branch.
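The ahead/behind count is a set difference over reachable commits. Here is a naive Python sketch of the idea, again on a hypothetical parent-list graph with made-up function names:

```python
from collections import deque


def ancestors(graph, tip):
    """All commits reachable from tip (including tip) by following parent links."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        for parent in graph[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def ahead_behind(graph, local, upstream):
    """(commits only reachable from local, commits only reachable from upstream)."""
    mine = ancestors(graph, local)
    theirs = ancestors(graph, upstream)
    return len(mine - theirs), len(theirs - mine)


# Hypothetical history: local branch L forked at B; upstream has moved on to D.
GRAPH = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "L": ["B"]}
print(ahead_behind(GRAPH, "L", "D"))  # (1, 2)
```

This version walks both full histories. Git is smarter about where it stops, but to report that a branch is 81,776 commits behind, it still has to visit each of those commits, so faster commit parsing pays off directly.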
The above performance numbers are nice, but they are also isolated tests that I ran on my machine. It’s much better to have real-life examples of this helping users in their actual workflows.
For example, one user complained that a force-push command was slow. We found that the amount of data being sent to the server was not the problem. Instead, we found that the logic for deciding if a force-push is necessary walks the entire commit history from the new ref location. This meant that Git was walking over two million commits! The improved parse speed of the commit-graph feature was enough to improve the force-push time in this example from 90 seconds to 30 seconds. We are working to modify this logic so it doesn’t require walking all of those commits.
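Here is a simplified model of why that check can be so expensive (it is not Git's actual push code): walk parents from the new tip looking for the old tip. If the old tip is found quickly, the walk is short; if the old tip is not an ancestor, the walk only ends after visiting every reachable commit. The helper name `reaches` and the toy history are hypothetical.

```python
from collections import deque


def reaches(graph, start, target):
    """Walk parent links breadth-first from start looking for target.

    Returns (found, commits_visited). If target is not an ancestor of start,
    the loop only ends once every commit reachable from start has been visited.
    """
    seen = {start}
    queue = deque([start])
    visited = 0
    while queue:
        commit = queue.popleft()
        visited += 1
        if commit == target:
            return True, visited
        for parent in graph[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, visited


# Hypothetical linear history c0 <- c1 <- ... <- c999, plus a rewritten commit "x" on c500.
graph = {"c0": []}
for i in range(1, 1000):
    graph[f"c{i}"] = [f"c{i - 1}"]
graph["x"] = ["c500"]

print(reaches(graph, "c999", "c998"))  # old tip is the new tip's parent: short walk
print(reaches(graph, "c999", "x"))     # old tip was rewritten away: visits all 1000 commits
```

Scale the second case up to two million commits and the cost of parsing each commit dominates, which is where the commit-graph file helps.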
My History with the Git Commit Graph
Before I joined Microsoft, I was a mathematician working in computational graph theory. I spent years thinking about graphs every day, so it was a habit that was hard to break. Good thing Git stores its data as a directed acyclic graph, so everything we do in Git involves graphs in one way or another.
A few years ago, I left academia and joined the VSTS Git server team. My first year was spent mainly on implementing a commit-graph feature that accelerated commit walks on the service. While my contributions were only on the back-end server code, a fantastic team created a way to visualize the commit history as a graph on the web. This means that whenever you view the history of your repo, you'll see the same output as if you ran git log --graph, complete with a visualization of commit parents. Also, Matt talked a bit about the commit-graph in a performance blog post.


[Figures: the VSTS commit history page for the GitForWindows repository, and the output of git log --topo-order]
The above pictures show the commit history page on VSTS for the GitForWindows repository and a related git log --topo-order call. The --topo-order flag tells Git to order the commits the same as a git log --graph call, but doesn’t render the commit-to-parent edges. In this case, there are so many merges that the git log --graph output becomes a huge mess. VSTS uses the same graph rendering as Team Explorer in Visual Studio.
One problem with launching this feature was that the corresponding Git command was slow. For the command above, git log --topo-order took 2.8 seconds. It takes even longer for larger repositories that have millions of commits! Today, the web request in VSTS takes around 0.22 seconds including a round trip to the server. Running similar commands on the Linux kernel (750K commits) or the Windows repository (2 million commits) becomes quite painful on the command line, but the web view stays around 200-400 milliseconds for most queries.
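Why is topological order so much more expensive than a plain git log? A simplified model helps. A plain walk can pop the newest commit seen so far (by committer date) and print it immediately, with no lookahead. That is fast, but when a commit is dated earlier than its own parent (for example, because of a skewed clock), the parent can be printed before its child. --topo-order forbids that, so Git has to do extra work before it can print anything. The toy history and function names below are hypothetical, and real Git walks differ in detail.

```python
import heapq

# Hypothetical toy history: commit -> (committer_date, parents).
# "Z" has a skewed clock: it is dated *before* its own parent "X".
HISTORY = {
    "T": (10, ["Y", "Z"]),
    "Y": (9, ["X"]),
    "Z": (2, ["X"]),
    "X": (5, []),
}


def date_order_walk(history, tip):
    """Simplified default walk: always emit the newest commit seen so far."""
    heap = [(-history[tip][0], tip)]  # negate dates so heapq pops the newest first
    seen = {tip}
    order = []
    while heap:
        _, commit = heapq.heappop(heap)
        order.append(commit)
        for parent in history[commit][1]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-history[parent][0], parent))
    return order


def violates_topo_order(history, order):
    """True if some parent is emitted before one of its children."""
    position = {commit: i for i, commit in enumerate(order)}
    return any(position[parent] < position[child]
               for child in order for parent in history[child][1])


order = date_order_walk(HISTORY, "T")
print(order, violates_topo_order(HISTORY, order))  # ['T', 'Y', 'X', 'Z'] True
```

Here X is printed before its child Z. Guaranteeing that never happens requires knowing how many children of each commit are still unprinted, which a walk that only follows parent links cannot know without looking ahead.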
After being on the Git server team for VSTS, I chose to switch to the client team that works on Git, GVFS, and other version control clients. The primary reason I wanted to switch was so I could bring the same performance benefits we implemented on our servers to the Git community. The commit-graph feature in Git 2.18 is a major step in this direction. The current state of the commit-graph feature is almost exactly as I described in a talk at Git Merge 2018.
You can continue reading the next article in this series, Part II: File Format. In the coming weeks, I’ll post more articles that give more details about the commit-graph feature in Git 2.18, some powerful algorithms we have in VSTS, and how we are bringing those algorithms to Git soon.
I’ve been using this for a while now.
git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)%Creset' --abbrev-commit --"
I have a similar alias for `git lg` to give me pretty formats for `git log --graph`. Have you tried yours before and after using the commit-graph feature?
The format above is a little broken. Should be: `git log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)%Creset' --abbrev-commit --`
this includes the commit author: `git log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit`
No, I haven't, but what benefit would it add to the alias?
It makes it faster!
Ahaha, nice. I will do some timing profiling when I have time and post results if you're interested, although I imagine you've done the same 🙂
What speed gains did you get in % or ms compared to the old alias?
Do I have to run `git commit-graph write` after every commit? What happens if I don’t do that: just a performance hit or wrong results?
You don’t need to run it every time. The feature works in “mixed” mode, where some commits are not in the commit-graph file. We are working to make this maintenance automatic in Git 2.19.
Should we be running `git show-ref -s | git commit-graph write --stdin-commits` every so often to keep it updated?
It’s a good idea, but you don’t need to do it too frequently. Perhaps after a big fetch. We are working to make this automatic in the future.
That is such a great step in the right direction.
A feature question: will git keep the “commit-graph” file up to date after its initial creation, or do I need to re-run “git show-ref -s […]” after each commit / pull /…?
We are working to make the maintenance automatic, it just didn’t make the cutoff for 2.18. See the work in progress here: https://public-inbox.org/git/20180608135548.216405-1-dstolee@microsoft.com/T/#m47efe814116e2f3884a76ca40c2d87aad6b2d967
As I understand it, `--*-order` is so slow because Git ensures no child follows a parent. For that, Git has to know that all children of the commit it outputs next have already been output. Unfortunately, commits only know their parents. So, Git has to load the entire commit graph before it can output any commit.
Are you serializing the children of each commit?
I think the default order (no arguments) is much faster because Git just outputs the newest of all open commit graph nodes without any lookahead. This will produce the same order as `--date-order` if and only if the committer dates are reliable and no child is older than its parents. Is that about right?
You're absolutely right. Requiring topological constraints is what makes it expensive.
We do serialize the commit-parent relationship, but not the reverse. Part of this is because the commit-parent relationship never changes for a given commit, but a commit-child relationship does change. Also, when doing topo-sorting using Kahn’s algorithm, we need to compute the “in-degree” of a commit, but only for commits in the set of commits we are walking. If we have a child that is not part of the set, then we don’t want to consider that relationship.
I'll talk more about the file format and how we can speed up the `--*-order` options even more in a future blog post. Stay tuned!
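The topo-sort described in the reply above can be sketched with Kahn's algorithm in Python. The in-degree of a commit is the number of its children inside the walked set, so a child outside the set (U below) never holds its parent back. This is a sketch under simplifying assumptions (a hypothetical parent-list graph and a FIFO queue of ready commits); Git's actual topo-sort differs in details such as tie-breaking.

```python
from collections import deque


def topo_order(parents, commits):
    """Kahn's algorithm: emit each commit only after all of its children in `commits`.

    parents: commit id -> list of parent ids (may mention commits outside the walk).
    commits: the commits being walked, each listed once.
    """
    # In-degree = number of children *inside the walked set*; outside children are ignored.
    indegree = {commit: 0 for commit in commits}
    for child in commits:
        for parent in parents[child]:
            if parent in indegree:
                indegree[parent] += 1

    # Commits with no children in the set are ready to print first.
    ready = deque(commit for commit in commits if indegree[commit] == 0)
    order = []
    while ready:
        commit = ready.popleft()
        order.append(commit)
        for parent in parents[commit]:
            if parent in indegree:
                indegree[parent] -= 1
                if indegree[parent] == 0:
                    ready.append(parent)
    return order


# Hypothetical history: U is a child of Y but is not part of this walk.
PARENTS = {
    "U": ["Y"],
    "T": ["Y", "Z"],
    "Y": ["X"],
    "Z": ["X"],
    "X": ["W"],
    "W": [],
}
print(topo_order(PARENTS, ["T", "Y", "Z", "X"]))  # ['T', 'Y', 'Z', 'X']
```

Y becomes ready as soon as T is printed, even though U is also one of its children, because U is outside the walked set.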
Thanks! Can’t wait for the maintenance to get automated. I’ve run into having to use the annoyingly slow ordering before (https://bugzilla.mozilla.org/show_bug.cgi?id=1466948) so this is a killer feature for me.
Using the command line and repo from that bug:
git log -1: 2 ms
git log -1 --date-order: 4251 ms
git -c core.commitGraph=true log -1 --date-order: 436 ms
I’m glad this helps speed up your `git log` calls. Thanks for the perf numbers!
We are working on a way to get a much better than 10x performance boost in the future. I’ll blog about it in a couple weeks, but the Git version of the feature requires a significant refactoring of the revision-walking code. It’s on our list, but we want to land the automated maintenance before embarking on that big effort.
what's the right format? `log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)%Creset' --abbrev-commit --`
how do u automate this? git commit-graph write
To enable the commit-graph feature in your repository, run git config core.commitGraph true. Then, you can update your commit-graph file by running
`git show-ref -s | git commit-graph write --stdin-commits`
how do you automatically update this?