Announcing GVFS (Git Virtual File System)

Here at Microsoft we have teams of all shapes and sizes, and many of them are already using Git or are moving that way. For the most part, the Git client and Team Services Git repos work great for them. However, we also have a handful of teams with repos of unusual size! For example, the Windows codebase has over 3.5 million files and is over 270 GB in size. The Git client was never designed to work with repos with that many files or that much content. The strain is easy to see: running “git checkout” can take up to 3 hours, and even a simple “git status” takes almost 10 minutes. That’s assuming you can get past “git clone”, which takes 12+ hours.

Even so, we are fans of Git, and we were not deterred. That’s why we’ve been working hard on a solution that allows the Git client to scale to repos of any size. Today, we’re introducing GVFS (Git Virtual File System), which virtualizes the file system beneath your repo and makes it appear as though all the files in your repo are present, but in reality only downloads a file the first time it is opened. GVFS also actively manages how much of the repo Git has to consider in operations like checkout and status, since any file that has not been hydrated (downloaded and materialized locally) can be safely ignored. And because we do all of this at the file system level, your IDEs and build tools don’t need to change at all!
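To make the idea of hydration more concrete, here is a toy Python model of lazy hydration. It is a hypothetical sketch, not how GVFS itself is built (GVFS works through a file system driver, and the class name, method names, and the fetch callback below are invented for illustration). The full file listing is known up front, a file’s contents are fetched only the first time it is read, and a status-like scan only has to look at files that have been hydrated or written.

```python
import bisect
from typing import Callable, Dict, Iterable, List, Set


class VirtualWorkingTree:
    """Hypothetical toy model of lazy hydration; not GVFS's real implementation."""

    def __init__(self, paths: Iterable[str], fetch: Callable[[str], bytes]) -> None:
        self._paths: List[str] = sorted(paths)    # full listing from the commit's tree
        self._known: Set[str] = set(self._paths)  # fast membership checks
        self._fetch = fetch                       # assumed callback that downloads one file
        self._hydrated: Dict[str, bytes] = {}     # path -> locally materialized contents

    def list_files(self) -> List[str]:
        # Every file appears present; listing the tree triggers no downloads.
        return list(self._paths)

    def read(self, path: str) -> bytes:
        if path not in self._known:
            raise FileNotFoundError(path)
        if path not in self._hydrated:
            # First open hydrates the file: download it once, then serve it locally.
            self._hydrated[path] = self._fetch(path)
        return self._hydrated[path]

    def write(self, path: str, data: bytes) -> None:
        # A written file is local by definition, so status must consider it.
        if path not in self._known:
            self._known.add(path)
            bisect.insort(self._paths, path)
        self._hydrated[path] = data

    def paths_for_status(self) -> List[str]:
        # Files that were never hydrated still match the commit, so a
        # status-like scan only needs to examine hydrated or written files.
        return sorted(self._hydrated)
```

In this model, a tree with millions of paths in which a developer only reads a small slice downloads just that slice, and the status scan grows with the number of hydrated files rather than with the size of the repo.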

In a repo this large, no developer builds the entire source tree. Instead, developers typically download the build outputs from the most recent official build and only build the small portion of the sources related to the area they are modifying. Therefore, even though there are over 3.5 million files in the repo, a typical developer will only need to download and use about 50-100K of those files.

With GVFS, these developers now get a much more manageable Git experience: clone takes a few minutes instead of 12+ hours, checkout takes 30 seconds instead of 2-3 hours, and status takes 4-5 seconds instead of 10 minutes. And we’re working on making those numbers even better. (Of course, the tradeoff is that their first build takes a little longer because it has to download each of the files it is building, but subsequent builds are no slower than normal.)

While GVFS is still a work in progress, we’re excited to announce that we are open sourcing the client code at https://github.com/Microsoft/gvfs. Feel free to give it a try, but please be aware that it still relies on a pre-release file system driver. The driver binaries are also available for preview as a NuGet package. Your best bet is to experiment with GVFS in a VM and not in any production environment.

In addition to the GVFS sources, we’ve also made some changes to Git to allow it to work well on a GVFS-backed repo, and those sources are available at https://github.com/Microsoft/git. And lastly, GVFS relies on a protocol extension that any service can implement; the protocol is available at https://github.com/Microsoft/gvfs/blob/master/Protocol.md.

Edited February 6: Brian Harry discusses GVFS over on his blog, too.
