What’s new in Git for Windows 2.10?

It has been a busy time since my last post. There have been nine public releases of Git for Windows in the meantime. And a lot has happened.

Most importantly, Git for Windows v2.10.0 has been released. Download it here. Or look at its homepage.

Let me take this opportunity to mention a couple of highlights:

The interactive rebase is now much faster

One of Git’s most powerful commands lets the user reorder commits, edit commit messages and split/join commits. It is called the interactive rebase, or rebase -i.

Originally intended as a simple side project to help myself contribute changes to the Git project itself, it evolved into a very powerful tool that helps with refining topic branches until they are ready to be merged. Essentially, it generates a list of commits called the “edit script” or “todo script”, lets the user edit that script, and then re-applies the code changes accordingly.
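To make the idea of a todo script more concrete, here is a minimal Python sketch that parses such a script into a list of entries. The commands in the sample (pick, fixup, reword) are ones that rebase -i offers for commits; the parse_todo helper, the TodoEntry type and the commit names are invented for illustration, and the sketch only handles lines of the form "command commit subject". It is not how Git itself processes the script.

```python
from typing import List, NamedTuple


class TodoEntry(NamedTuple):
    command: str  # e.g. "pick", "reword", "squash", "fixup"
    commit: str   # abbreviated commit name
    subject: str  # commit subject, informational only


def parse_todo(script: str) -> List[TodoEntry]:
    """Hypothetical helper: turn a todo script into a list of entries."""
    entries = []
    for line in script.splitlines():
        line = line.strip()
        # Blank lines and comment lines carry no instructions.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 2:
            raise ValueError(f"malformed todo line: {line!r}")
        subject = parts[2] if len(parts) > 2 else ""
        entries.append(TodoEntry(parts[0], parts[1], subject))
    return entries


# The commit names and subjects below are made up.
SAMPLE = """\
pick 1234abc Add the parser
fixup 5678def Fix typo in the parser
reword 9abcdef Add tests

# Lines starting with '#' are comments
"""

for entry in parse_todo(SAMPLE):
    print(entry.command, entry.commit, entry.subject)
```

Reordering the lines of the script reorders the commits, and changing a command changes what happens to that commit when the script is re-applied.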

The initial version was a very simple shell script, and here lies the rub: shell scripting is good for quick-and-dirty prototyping, but it lacks direct access to Git’s data structures, proper error handling and, most importantly, speed. This matters especially on Windows, where spawning processes is more expensive than, say, on Linux. Also, Git for Windows has to resort to a POSIX emulation layer to run shell scripts, which carries an enormous performance cost.

Being a power user of the interactive rebase myself, I hence set out to address this problem. As rebase -i had turned into a monster of a shell script in the meantime, my idea was to go for an incremental route: re-implement some parts of the interactive rebase in C and switch the shell script over to use those parts, one by one.

The end result is an interactive rebase that, according to a benchmark included in Git’s source code, runs ~5x faster on Windows, ~4x faster on MacOSX and still ~3x faster on Linux.

Git for Windows v2.10.0 includes this new and improved code, and MacOSX and Linux will soon benefit, too.

Technical details, for the curious

The initial step of this incremental route was still a very, very big step: to process the “todo script” in a “builtin”, i.e. in a Git command implemented in C and using Git’s internal API.

This idea was not new: there was a plan to back the interactive rebase by a “sequencer”, a to-be-written low-level command that could also be used by other applications directly. The sequencer was introduced alright, and it was similar in design to the interactive rebase, but it was only used to implement git cherry-pick <commit-range>. It still took quite a bit of work to modify that sequencer to enable it to run interactive rebases.

In addition to this work, I had to touch other parts of the code, too, such as fixing some regression tests, introducing a performance benchmark for rebase -i, modifying many code paths not to exit uncleanly, and bits and pieces that could be cleaned up “while at it”. To make things more manageable, I decided to split up the 100 or so patches into several patch series. These are the patch series that have already been accepted into “upstream” Git (i.e. the Git project of which Git for Windows is a friendly fork):

  • b232439 (Merge branch ‘js/t3404-typofix’, 2016-05-17)
  • 7b02771 (Merge branch ‘js/perf-rebase-i’, 2016-05-23)
  • 3437017 (Merge branch ‘js/perf-on-apple’, 2016-07-06)
  • 62e5e83 (Merge branch ‘js/find-commit-subject-ignore-leading-blanks’, 2016-07-11)
  • c510926 (Merge branch ‘js/sign-empty-commit-fix’, 2016-07-13)
  • 6c35952 (Merge branch ‘js/t3404-grammo-fix’, 2016-07-13)
  • 63641fb (Merge branch ‘js/log-to-diffopt-file’, 2016-07-19)
  • 3d55eea (Merge branch ‘js/am-call-theirs-theirs-in-fallback-3way’, 2016-07-19)
  • c97268c (Merge branch ‘js/rebase-i-tests’, 2016-07-28)
  • 1a5f1a3 (Merge branch ‘js/am-3-merge-recursive-direct’, 2016-08-10)

(Each hex commit name identifies the merge commit, and the date indicates when the Git maintainer accepted the respective patch series into Git’s source code.)

There were a few bits and pieces missing, of course, mostly the part where the sequencer actually learns to perform the grunt work of interactive rebases. Those bits and pieces have been contributed to the Git project over the last two weeks, and I will work on them until they get accepted:

  • the require-clean-work-tree branch that refactors out useful code from the git pull command,
  • the libify-sequencer branch to allow the sequencer to handle errors other than simply exiting,
  • the prepare-sequencer branch to rearrange the sequencer code and make it easier to extend,
  • the sequencer-i branch that teaches the sequencer to understand interactive rebase’s edit scripts,
  • the rebase--helper branch to add a new low-level command to actually call the sequencer in rebase -i mode, and
  • the rebase-i-extra branch that re-implements complex processing of the edit scripts in C.

These patches have been developed since early February, and they finally get to benefit the users!

Bonus track: cross-validating the interactive rebases

Made it this far without falling asleep? Congratulations. Now for the fun part: how can I be so certain that this code is ready for prime time?

The answer: I verified it. Inspired by GitHub’s blog post on their Scientist library, I taught my personal Git version to cross-validate each and every interactive rebase that I have performed since the middle of May. That is, each and every interactive rebase I ran was first performed using the original shell script, then using the git rebase--helper, and then the results were confirmed to be identical (modulo time stamps).
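The pattern itself is simple, and a minimal Python sketch may help to illustrate it: run the old and the new implementation on the same input, normalize away the parts that are expected to differ (here: time stamps), and complain loudly if anything else differs. All names below (cross_validate, old_rebase, new_rebase, strip_timestamps) are hypothetical, and the two toy "rebase" functions are mere stand-ins; the real cross-validation compared the results of actual rebases.

```python
from typing import Callable, List, Tuple

Result = List[Tuple[str, int]]  # (commit subject, time stamp) pairs


def cross_validate(
    old_impl: Callable[[List[str]], Result],
    new_impl: Callable[[List[str]], Result],
    normalize: Callable[[Result], object],
    arg: List[str],
) -> Result:
    """Run both implementations and insist that they agree.

    The result of the old (trusted) implementation is returned.
    """
    expected = old_impl(arg)
    actual = new_impl(arg)
    if normalize(expected) != normalize(actual):
        raise AssertionError(f"results differ: {expected!r} vs {actual!r}")
    return expected


# Toy stand-ins: same subjects, different time stamps.
def old_rebase(subjects: List[str]) -> Result:
    return [(subject, 1000 + i) for i, subject in enumerate(subjects)]


def new_rebase(subjects: List[str]) -> Result:
    return [(subject, 2000 + i) for i, subject in enumerate(subjects)]


def strip_timestamps(result: Result) -> List[str]:
    # Time stamps are expected to differ between runs, so ignore them.
    return [subject for subject, _timestamp in result]


result = cross_validate(old_rebase, new_rebase, strip_timestamps, ["a", "b"])
print(result)
```

Returning the old implementation's result means that a bug in the new code can be detected without ever affecting the outcome, at the price of running everything twice.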

Of course, that means that I did not benefit from the speed improvements until this past week, when I finally turned off the cross-validation. But it added enormously to the confidence in the correctness of the new code.

Full disclosure: the cross-validation did find three regressions that were not caught by the regression test suite (which I have subsequently adjusted to test for those issues, of course). So it was worth the effort.

MinGit: Git for Windows applications

Another big new feature since Git for Windows 2.8 is MinGit: with every new Git for Windows version, we now offer .zip archives of “Git for Windows applications”.

Let’s look at the motivation for this new feature first: Visual Studio’s Team Explorer, as well as GitHub for Windows and many other applications working on Git repositories and work trees, accesses Git functionality by calling git.exe of Git for Windows, providing input and processing output. This requires Git for Windows to be installed separately, as it is a separate software package, and results in the common dependency problems: how to tell whether the available Git version provides all the functionality required by the application. An alternative to using the installed Git for Windows would be to bundle a complete, known-good Git for Windows version, but that would require an additional ~200MB on disk.
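To illustrate the dependency problem, here is a minimal Python sketch of the kind of check an application would have to perform against an installed Git: parse the output of git version and compare it with the minimum version the application needs. The helper names (parse_git_version, is_new_enough) are hypothetical, and the sketch assumes that the output has already been captured as a string.

```python
import re
from typing import Tuple


def parse_git_version(output: str) -> Tuple[int, ...]:
    """Extract the leading numeric part of `git version` output.

    For example, 'git version 2.10.0.windows.1' yields (2, 10, 0).
    """
    match = re.match(r"git version (\d+(?:\.\d+)*)", output.strip())
    if match is None:
        raise ValueError(f"unexpected output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(output: str, minimum: Tuple[int, ...]) -> bool:
    # Tuples compare element by element, so (2, 8, 1) < (2, 10, 0).
    return parse_git_version(output) >= minimum


print(is_new_enough("git version 2.10.0.windows.1", (2, 10, 0)))
print(is_new_enough("git version 2.8.1.windows.1", (2, 10, 0)))
```

Even with such a check in place, the application still has to decide what to do when the installed version is too old, which is exactly the problem a bundled, known-good Git avoids.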

Enter MinGit.

The idea is to provide a version of Git for Windows that does not provide an interactive user interface, that does not provide localisations, that does not provide GUIs and that omits git svn (which would require an entire Perl infrastructure). Essentially, it is a Git for Windows that was stripped down as much as possible without sacrificing the functionality in which 3rd-party software may be interested.

It currently requires only ~45MB on disk.

In the same spirit, my excellent colleague Jeff Hostetler worked on a new, enhanced low-level mode of the git status command that gives applications a quick, complete picture of a Git work tree’s state, using a single git.exe invocation. This feature has been contributed to the Git project and is already available as an experimental option in Git for Windows.

Other highlights in v2.10.0

  • In particular when rebasing onto fast-moving branches, git rebase is much, much faster now. This feature has been contributed by my colleague Kevin Willford.
  • The browser to use to display help pages can now be configured via the help.browser setting; this used to be disabled on Windows for years. This fix was already available in Git for Windows and is now also part of the Git project’s source code.
  • The git mv dir non-existing-dir/ command now works as expected in Bash on Ubuntu on Windows (previously, it relied on a Linux-specific deviation from the POSIX specs). While not strictly a Git for Windows issue, this came up in my testing of Git in Bash on Ubuntu on Windows.
  • To help develop cross-platform projects, files can be marked as executable on Windows via the git add --chmod=+x option.
  • The initial phase of a git fetch is much faster now when the remote repository contains a lot of branches and/or tags.
  • The git grep command already knew to ignore the case via the -i option. This mode now respects non-ASCII locales, too.
  • When merging text files in a complicated commit history, Git no longer gets confused by line endings.
  • Git no longer passes on open handles of temporary files to child processes. This could previously result in locking problems, where the child processes prevented the parent process from deleting the temporary files.
  • Git’s build process now ensures that no files are added to Git’s source code whose names are illegal on Windows.

These improvements are part of the “upstream” Git v2.10.0 (i.e. not only for Windows). See the full release notes here.
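As an illustration of the last item in the list above, here is a rough Python sketch of what a check for file names that are illegal on Windows could look like. It follows Windows’ documented naming rules (reserved characters, reserved device names, no trailing dot or space); the illegal_on_windows helper is hypothetical, and the checks that Git’s build process actually performs may well differ.

```python
import re

# Characters that Windows does not allow in file names.
ILLEGAL_CHARS = set('<>:"\\|?*')

# Device names reserved on Windows, with or without an extension.
RESERVED = re.compile(
    r"^(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?$", re.IGNORECASE
)


def illegal_on_windows(path: str) -> bool:
    """Return True if any component of a '/'-separated path is illegal."""
    for component in path.split("/"):
        if component in ("", ".", ".."):
            continue
        # Reserved punctuation and control characters are not allowed.
        if any(ch in ILLEGAL_CHARS or ord(ch) < 32 for ch in component):
            return True
        # Names must not end in a dot or a space.
        if component[-1] in ". ":
            return True
        if RESERVED.match(component):
            return True
    return False


for candidate in ["Documentation/git.txt", "t/aux.c", "notes/a:b", "src/console.c"]:
    print(candidate, illegal_on_windows(candidate))
```

A check along these lines could be run over the list of tracked files, failing the build as soon as one offending path shows up.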

Fun facts: The making of Git for Windows v2.10.0

Every Git for Windows release starts with rebasing the Windows-specific patches of Git for Windows’ master branch onto the released version of upstream Git. It is not a simple rebase, though! It retains the branch structure of currently 48 topic branches, using the special-purpose tool called “Git garden shears”.

After that, I run Git’s entire regression test suite. If it does not pass, I investigate and fix bugs before going on with the release. Happily, this was not necessary this time, partly because I had performed a Git garden shears run three days earlier, in preparation for v2.10.0. Running the regression test suite on my (rather beefy) work laptop took 47m6.638s.

Once the regression tests all pass, the “real” release engineering begins, in a dedicated virtual machine. This entails the very same steps, each and every time, so I automated them. This not only avoids mistakes in the release engineering process, it also saves tons of my time. Essentially, I run a series of steps by calling please.sh <step> [<options>], in order, doing other things while the computer does the hard work:

  1. Update the packages (such as bash, curl, gcc, etc) in the 32-bit and the 64-bit Git for Windows SDK: 0m25.141s
  2. Finalize the release notes (verifying the date, the upstream Git version, etc): 0m34.739s
  3. Tag v2.10.0.windows.1 (again verifying a couple of things and rendering the release notes into the tag): 0m8.335s
  4. Build the Git packages (32-bit and 64-bit, including HTML and man help pages): 40m25.803s
  5. Install the Git packages into the SDKs: 1m55.946s
  6. Upload the Git packages to Git for Windows’ Pacman repository: 6m51.918s
  7. Build the installers, portable Git, MinGit, NuPkg: 18m56.284s
  8. Upload the installers, portable Git, etc: 23m16.976s

That means that the computer worked for 2 hours 19 minutes and 41.78 seconds (counting the regression test run as well as the eight release steps) to prepare Git for Windows v2.10.0. During this time, I wrote this blog post.
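For those who like to double-check the arithmetic: the total is the 47m6.638s of the regression test run plus the eight release engineering steps. The following short Python sketch adds up the timings quoted above; the to_seconds helper is just an illustration, and Decimal is used to avoid floating-point rounding noise.

```python
import re
from decimal import Decimal


def to_seconds(duration: str) -> Decimal:
    """Convert a duration such as '47m6.638s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration: {duration!r}")
    return int(match.group(1)) * 60 + Decimal(match.group(2))


REGRESSION_TESTS = "47m6.638s"
RELEASE_STEPS = [
    "0m25.141s",   # 1. update the SDK packages
    "0m34.739s",   # 2. finalize the release notes
    "0m8.335s",    # 3. tag v2.10.0.windows.1
    "40m25.803s",  # 4. build the Git packages
    "1m55.946s",   # 5. install the Git packages into the SDKs
    "6m51.918s",   # 6. upload to the Pacman repository
    "18m56.284s",  # 7. build installers, portable Git, MinGit, NuPkg
    "23m16.976s",  # 8. upload the installers etc
]

total = to_seconds(REGRESSION_TESTS) + sum(to_seconds(d) for d in RELEASE_STEPS)
hours, rest = divmod(total, 3600)
minutes, seconds = divmod(rest, 60)
print(f"{hours}h {minutes}m {seconds}s")
```

The release steps alone add up to a little over an hour and a half; the regression test run accounts for the rest.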