Announcing GVFS (Git Virtual File System)
February 3, 2017
Here at Microsoft we have teams of all shapes and sizes, and many of them are already using Git or are moving that way. For the most part, the Git client and Team Services Git repos work great for them. However, we also have a handful of teams with repos of unusual size! For example, the Windows codebase has over 3.5 million files and is over 270 GB in size. The Git client was never designed to work with repos with that many files or that much content. You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.
Even so, we are fans of Git, and we were not deterred. That’s why we’ve been working hard on a solution that allows the Git client to scale to repos of any size. Today, we’re introducing GVFS (Git Virtual File System), which virtualizes the file system beneath your repo and makes it appear as though all the files in your repo are present, but in reality only downloads a file the first time it is opened. GVFS also actively manages how much of the repo Git has to consider in operations like checkout and status, since any file that has not been hydrated can be safely ignored. And because we do this all at the file system level, your IDEs and build tools don’t need to change at all!
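To make the idea concrete, here is a minimal sketch of the lazy-hydration concept in Python. It is purely illustrative of the "download a file the first time it is opened" behavior described above, not the actual GVFS implementation (which is a native file system driver), and every name in it is made up for the example.

```python
# Illustrative sketch only: GVFS itself is a native file system driver, not
# Python, and the names below are made up for the example. The point is the
# "download on first open" behavior: enumeration only needs cheap metadata,
# and a file's contents are fetched the first time it is read.

class LazyRepo:
    def __init__(self, file_listing, fetch_blob):
        # file_listing: {path: blob_id} known up front (cheap metadata).
        # fetch_blob:   callable that downloads a single blob on demand.
        self.listing = file_listing
        self.fetch_blob = fetch_blob
        self.hydrated = {}  # path -> bytes, filled in lazily

    def list_files(self):
        # Enumerating the tree never touches the network.
        return sorted(self.listing)

    def read(self, path):
        # First access downloads the blob; later accesses hit the local copy.
        if path not in self.hydrated:
            self.hydrated[path] = self.fetch_blob(self.listing[path])
        return self.hydrated[path]


if __name__ == "__main__":
    fake_server = {"blob1": b"int main() { return 0; }\n"}
    repo = LazyRepo({"src/main.c": "blob1"}, fake_server.__getitem__)
    print(repo.list_files())        # no download yet
    print(repo.read("src/main.c"))  # triggers the one-time download
```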
In a repo that is this large, no developer builds the entire source tree. Instead, they typically download the build outputs from the most recent official build, and only build a small portion of the sources related to the area they are modifying. Therefore, even though there are over 3 million files in the repo, a typical developer will only need to download and use about 50-100K of those files.
With GVFS, this means that they now have a Git experience that is much more manageable: clone now takes a few minutes instead of 12+ hours, checkout takes 30 seconds instead of 2-3 hours, and status takes 4-5 seconds instead of 10 minutes. And we’re working on making those numbers even better. (Of course, the tradeoff is that their first build takes a little longer because it has to download each of the files that it is building, but subsequent builds are no slower than normal.)
While GVFS is still in progress, we’re excited to announce that we are open sourcing the client code at https://github.com/Microsoft/gvfs. Feel free to give it a try, but please be aware that it still relies on a pre-release file system driver. The driver binaries are also available for preview as a NuGet package, and your best bet is to play with GVFS in a VM and not in any production environment.
In addition to the GVFS sources, we’ve also made some changes to Git to allow it to work well on a GVFS-backed repo, and those sources are available at https://github.com/Microsoft/git. And lastly, GVFS relies on a protocol extension that any service can implement; the protocol is available at https://github.com/Microsoft/gvfs/blob/master/Protocol.md.
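For a feel of what such a protocol extension enables, here is a purely hypothetical sketch of a client asking a server for individual objects on demand. The endpoint path and JSON fields below are invented for illustration; the real endpoints and payloads are defined in the Protocol.md linked above.

```python
# Purely hypothetical illustration: the endpoint path and JSON fields here are
# invented for the example; the real endpoints and payloads are defined in
# Protocol.md. The general shape is a client asking the server for specific
# objects on demand instead of fetching everything up front.
import json
import urllib.request

def fetch_objects(base_url, object_ids):
    body = json.dumps({"objectIds": object_ids}).encode()
    req = urllib.request.Request(
        base_url + "/objects",  # hypothetical route
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # e.g. a pack file containing just the requested blobs
        return resp.read()
```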
Edited February 6: Brian Harry discusses GVFS over on his blog, too.
IMHO, the problems you described in the first paragraph of this article are caused, for the most part, not by Git itself but by the NTFS file system used in Windows. The Mac’s file system (NFS+) handles a very large number of small files (common to Git, and also to npm repositories, etc.) without any issues (or at least much, much better than NTFS).
And probably, instead of inventing yet more weird stuff like GVFS, Microsoft would do better to invest its time and money in fixing the root issue, that is, the NTFS file system…
I can definitely see how improving NTFS could help developers download 270 GIGABYTES of data on an initial clone faster. The limit definitely isn’t the speed of the link between them and the git server.
(sarcasm, for the Americans amongst you)
Jesus Christ, what a dick.
Rudeness aside, the article says:
> You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run
Those things do not require network traffic, and in my experience even with small git repos, the Windows git client is cripplingly slow compared to other systems. I’ve always assumed the problem is NTFS, but whatever the cause, it definitely exists
How often have you tried to have 3.5M/270GB of files on your NFS+ partition? If the answer is never, you might not be in any position to actually claim anything. Do show your biggest repos and benchmarks on how much faster they are on NFS+ compared to NTFS. It will be interesting to see how much performance would increase on a different FS. Though you could’ve included them in your original message, since you clearly have run them.
Also, I’m sure macOS can magically fix the problem of transferring millions of files, as well as doing other operations on them. Do tell how this magic works; my macOS doesn’t seem to have it. Have I not turned some option on?
Not to mention GVFS is not some “weird stuff”, except maybe for macOS users, where you don’t get to use anything other than what’s given to you, so you must always claim that’s the best there is and nothing can beat it. But maybe if you opened your eyes a bit and thought about it you might understand why this is actually usable for many different situations.
Thanks James for expressing my thoughts on paper!
Jane you ignorant slt. You both are warrior-idiots. Get a life.
Nothing in this response is relevant to anything that was said.
You both miss the point.
First of all, Windows must be a massively broken project if the repo is that big. I can clone Linux in around 10 minutes.
The full Linux history is about 5 million objects, but in terms of size it is much smaller: around 2.5 GB.
Secondly, Git can make a shallow clone. You can pull only the branches you are interested in, and you can cherry-pick as well (see the sketch below).
It is still more efficient to have the work on your file system, because search, code auto-completion, refactoring, and Git itself will all work better.
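For instance, the shallow, single-branch workflow described above looks roughly like this. It is a sketch driving git from Python; the repository URL and branch name are placeholders, but the git flags themselves are standard.

```python
# Sketch of the shallow, single-branch workflow described above, driven from
# Python for illustration. The repository URL and branch name are placeholders;
# the git flags themselves (--depth, --single-branch, --branch, --deepen) are
# standard.
import subprocess

REPO = "https://example.com/some/repo.git"  # placeholder URL

# Clone only the tip commit of one branch instead of the full history.
subprocess.run(
    ["git", "clone", "--depth", "1", "--single-branch",
     "--branch", "master", REPO, "repo"],
    check=True,
)

# Later, deepen the history if it turns out to be needed.
subprocess.run(["git", "fetch", "--deepen=100"], cwd="repo", check=True)
```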
Linux is a kernel, not a full-fledged desktop operating system.
To compare properly, try downloading all the source files from one repo containing GNOME/KDE, the Linux kernel, and all the tools and so on.
So says the guy who uses an OS that ships with “Candy Soda Crush” and has to trick users into installing “updates”.
Who needs to open their eyes here?
I have never had “Candy Soda Crush” or any other WinStore game on my system. And the tricking of users into installing updates is because people don’t like change even if it is better for them; hell, people would still be on XP if they could.
Both OSs have advantages and disadvantages. I use macOS for development and Windows for gaming. You both need to open your eyes and realize not everything is for everybody and the reason we have multiple options is because people prefer different things.
Have you ever worked with a codebase as large as Windows?
Neither NFSv4 nor HFS+ will be of much help here.
300 gigabytes of source code will take some time to download; what is done here is to download only the metadata and the files you actually need locally.
I have not worked with repos that are 280 gigabytes myself, but tens of gigabytes are a pain to work with too, most of the time.
NTFS and HFS+ are both old file systems, but they are stable. It will take time to replace them both. But no, I don’t think you understand the issue at hand here.
I think a more interesting discussion is about git checkout/status taking so long on (presumably) entirely downloaded git repositories. Is a one-time download of 300G over a corp network really that much of a problem, comparatively?
Looking at why local operations take a long time is definitely worthwhile. Apart from not being optimized around 300G/3.5M-file repositories, I think git was originally written with a strong assumption of a linuxy file system, so it might make sense that it doesn’t work quite as fast on NTFS, the inherent merits of NTFS aside. There’s definitely work that can be done here (and some work has been done here!) to improve git’s purely local performance on Windows systems, ignoring network speeds.
I see Facebook doing this sort of thing with Mercurial (why…? because it’s easier to extend, probably). I think MS doing this with git is natural and very helpful for people working in gigantic repositories that aren’t able to be broken up easily, perhaps because of legacy code and connections stretching back, say, 20 years…
IMHO, if you’d used words like HFS+ or APFS as a comparison against NTFS then there might be some value to your statements, if backed up with a performance comparison report showing HFS+ with journaling enabled against NTFS. But NFS is a network file system. NTFS is a disk file system.
The kind of performance gains this brings likely outweighs any OS-level file system differences. File systems are tricky to compare, as they have different features that impact performance.
Most people don’t deal with a repo this big… for those that do, this looks to be a valuable timesaving tool.
Yes, I meant HFS+, of course. “NFS+” was a typo…
Maybe I was wrong to call this GVFS tech “weird stuff”, as it will help improve Git performance by downloading only the files that are needed, but my point that NTFS is way slower than other file systems (e.g. HFS+) when dealing with a very large number of small files is still valid. I’m too lazy to do benchmarks etc., but maybe I will. The speed difference is noticeable to the naked eye, though.
It isn’t valid. You just made it up!
Maybe some of those dirty Mac-port NTFS drivers perform badly, but rest assured it always performs fast on native Windows. All those aged and weak arguments compare apples to oranges; most originate from a time when HFS did not have journaling.
Feed trolls with arguments:
https://github.com/Microsoft/GVFS/issues/4
Or, god forbid, EXT4…
I don’t suppose the problem could be the 3.5 million files and 270 GB of bloatware that is the Windows codebase?
Shots fired!
But I may just like the new OSS flavor of Microsoft.
This really doesn’t sound that crazy to me. Consider that, according to the blog, a single branch of Windows is roughly 50k files (which actually sounds pretty small to me; I work regularly on a product of similar size which is much smaller than Windows). Then consider that you have 30 years of branches, versions, etc., and 3.5 million files doesn’t sound particularly huge. That would mean each file has been modified just 70 times on average over the years.
The codebase for Windows is 3.5M files, not 3.5M files spread across all revisions. Also, I thought they scrapped most of the history and started over from scratch fairly recently, so it’s not 30 years of history. It’s likely just the history of Windows 10 in that repo. Each developer, working on a specific feature, may only need to access 50-100k files for that specific part of Windows. 3.5M files is a HUGE project.
All Windows developers have full read access to Windows. In addition, they have write access to other parts of Windows on request.
I believe the oldnewthing blog stated that everything before Windows 2000 has been archived in some mountain and has to be requested if you need access but I cannot find the exact blog post.
Found it: “All history prior to Windows 2000 has been archived into a salt mine in Montana somewhere. I could probably get access if I had a valid business justification. -Raymond” https://blogs.msdn.microsoft.com/oldnewthing/20140227-00/?p=1643/
Windows is still pretty svelte compared to the typical linux distro. A fair criticism would be to say that it needs to be modularized into independently developed packages so that this huge single repo issue goes away.
Joeri,
That was my first thought, too! Perhaps the problem is a lack of modularization. I can’t imagine most developers have to work on every distribution of every part of Windows at once. That leads me to believe that this is either a contrived case to demonstrate why a virtualized git loader is necessary, or highlights a serious flaw in the partitioning of duties in the Windows codebase. Surely, as a developer, I should be able to work on a few hundred thousand files and still be covering a fairly large part of the Windows domain for a particular distribution target?
That size is actually the entire OS repo. It includes Windows OneCore, Desktop, Mobile, HoloLens, Xbox, IOT, etc. Plus tools, and other code we ingest from feeds and store in our tree. It’s the full enchilada, not just Desktop.
Yikes! GVFS might be a good stopgap solution, but partitioning the repo out into different modules might be the better solution in the long run.
What he said.
When I read this: “In a repo that is this large, no developer builds the entire source tree.” I couldn’t decide whether to laugh or cry.
You wrote down the problem: you’ve put all your projects into a single repository and you can’t check just one project out.
But it seems that you decided that instead of reorganizing and partitioning the repository into a repository per project (i.e. trying to work the way git was designed), you’d take the cool new tool and try to make it work the way your old tool did.
And this highlights what I consider to be a major flaw in git, the inability to work with just part of a repository. Git provides no way to work with a unit of code any smaller than a branch. Partitioning a repository almost always ends up either losing part of your valuable history, or leaving you with a large history consisting of the very files you want to avoid downloading in the first place. Not to mention the philosophical complaint that partitioning solely to overcome limitations of your VCS is a serious case of the tail wagging the dog.
None of the other major version control systems have this problem, just git. For all the things I really like about git, this one flaw is huge. In the past couple of years I’ve worked on two large projects using git, both of which evolved to where they needed to reorganize and repartition the repos. Both ended up missing major chunks of project history in the restructured repos. You have to go back to the legacy repos to see most changes made before the restructure. This was two completely separate projects with different staff and different repo managers, yet they both ended up in the same boat, all because of something that would have been a total non-issue for any other version control system. I chose NOT to use git on my latest project, primarily because of this one flaw. Am I sad to lose local commits and a host of other features? Sure. But for large projects with multiple components whose boundaries may change, git just isn’t worth it.
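To illustrate the trade-off being described: the usual recipe for carving a subdirectory out into its own repo looks something like the sketch below (placeholder paths and URLs). The rewritten repo keeps only that subdirectory's history, and everything else stays behind in the legacy repo, which is exactly the kind of history loss the comment is about.

```python
# Sketch of the usual repo-splitting recipe being described, with placeholder
# names. It keeps only one subdirectory's history in the new repo; everything
# else stays behind in the legacy repo, which is the history loss the comment
# describes.
import subprocess

def split_subdirectory(src_clone, subdir, new_remote):
    # Rewrite history so that <subdir> becomes the new project root; all
    # commits that never touched it are dropped.
    subprocess.run(
        ["git", "filter-branch", "--prune-empty",
         "--subdirectory-filter", subdir, "--", "--all"],
        cwd=src_clone, check=True,
    )
    # Point the clone at the new, smaller repository and push the rewritten
    # branches there.
    subprocess.run(["git", "remote", "set-url", "origin", new_remote],
                   cwd=src_clone, check=True)
    subprocess.run(["git", "push", "--all", "origin"],
                   cwd=src_clone, check=True)

# Example (placeholder paths):
# split_subdirectory("/tmp/big-repo-clone", "components/foo",
#                    "https://example.com/foo.git")
```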
Having everything in a single repo is a huge administrative simplification. Multiple repos soon become lots of multiple repos. Consider, then, the important “big project” represented by the branch big_project. Well, you need such a branch on all repos. Or do we need it only on the repos we’re touching for this project? Or what if we find we need repo XYZ after all? We’ll need to create branches as and when we need them on each repo.
But later: “what did solution ABC look like when we started this project?” Well, that depends on which repo it’s in, because they all have their own big_project branch, and it was created on Jan 23 in repo_XYZ but on Feb 19 in repo JKL, because we only realized we needed that repo in February, and anyway….
No: avoid multiple repos unless there’s some good reason, and repo size should not influence project and development practices.
Is that 270 GB of compressed files? What the heck are they putting in their code base?
What issue did you have using submodules? The scenario where a developer only needs to build a little bit of the tree seems like a perfect use case for them.
Intermodule dependencies. Windows has compatibility-related tight couplings all over the place.
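For readers who haven't used them, the submodule layout being discussed looks roughly like this sketch (placeholder URLs and paths; the git commands are standard). It also shows where tight coupling hurts: each submodule is pinned at a specific commit, so every cross-component change means updating those pins in lockstep.

```python
# Sketch of the submodule layout being discussed, with placeholder URLs and
# paths; the git commands themselves are standard. Each submodule is pinned
# at a specific commit inside the superproject, so tightly coupled components
# mean updating pins in lockstep for every cross-component change.
import subprocess

SUPERPROJECT = "superproject"  # placeholder path to an existing git repo

# Each component lives in its own repo and is referenced from the superproject.
subprocess.run(["git", "submodule", "add",
                "https://example.com/kernel.git", "src/kernel"],
               cwd=SUPERPROJECT, check=True)
subprocess.run(["git", "submodule", "add",
                "https://example.com/shell.git", "src/shell"],
               cwd=SUPERPROJECT, check=True)

# A developer who only needs one component initializes just that submodule.
subprocess.run(["git", "submodule", "update", "--init", "src/shell"],
               cwd=SUPERPROJECT, check=True)
```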
This so begs for the answer I’ve been waiting to use forever: Use Wine.
Work with the Wine folks to support whatever legacy you need to.
It’s not like you _want_ to support developing for win32 forever… right?
I’m being facetious.
“any file that has not been hydrated can be safely ignored” Hydrated?
Will you guys please stop inventing weird words? It’s an epidemic lately.
A collection of tools is not a “workload.” A toolbox is a collection of tools. Building three sheds is a workload.
OneDrive does not provide a “sync experience.” It syncs.
“When you first install Visual Studio 2017 RC, you’ll see the new experience.” No, you don’t see an experience. You have one. And with Visual Studio 2015 or 2017, the experience is bad.
There’s jargon, there’s marketing, and then there’s this new breed of nonsense. Knock it off.
Just because you have never seen that word used in this context before doesn’t mean it was invented right there and then. You would know the word and the context if you had used Redux.
You should read Raymond Chen’s occasional “Microspeak” blog posts for more beautiful examples of Microsoft-internal jargon.
This made my day 😀
Hahaha. Hard to argue with that. You make good points.
I will do my best to try not to destroy and/or re-invent the English language in future posts.
Glad to know someone out there cares about this stuff. 🙂
I heard the term hydrated near on 20 years ago and used it as recently as a couple of weeks ago. I think you just learned a new word, rather than the word itself just being invented.
I’ve heard the term used for this kind of thing many times. Not a new word, not a new meaning, just new to you.
Amazing work guys! Love the out of box thinking that was involved here. Don’t let the negative comments here deter your efforts, you guys are doing great things. Thank you!
True. I don’t want to take away from the awesome direction Microsoft is taking by contributing open source tools and sharing them with the community. I think I, like others, am just a bit confused about the (possibly contrived) case of developers working on 3.5 million files in one go. I can’t imagine looking at my dev directory, seeing a folder with 270GB of source for just one project I’m working on, and thinking “…this is fine.” 🙂
That said, absolutely want to echo the sentiment Z. Great work and awesome to see new ideas coming out!
Git is designed to work with monolithic-kernel-sized projects. Did you ever think of splitting the Windows kernel and userspace into separate repositories?
A long time ago somebody decided it would be good to have the GUI in the WinNT kernel…
I will be happy once Windows decouples the GUI and the kernel one day, makes Windows modular and extensible with packages, and ships a package manager.
Would an APPX/NuGet pair not work for the Windows kernel?
Check out Windows Server Nano.
They’re heading that way. There’s no GUI. You can’t RDP into it. The console is recovery-only, basically just telling you what IP address it grabbed. The only way you can interact with the OS is using PowerShell via WinRM.
Yes, but I am not talking only about servers here… modular means a more agile way of development, faster updates, fewer codebases, etc.
Thread from an ex Microsoft engineer about their monorepo decision:
https://twitter.com/xjoeduffyx/status/827633982116212736
I’m a sysadmin, but I’d love a built-in Git client. Could it be that the “new” feature coming to OneDrive, which gives you a virtual structure and only downloads files when they are accessed, works with the same code/method?
Anyway, nice to hear some insights, and I like that you made it open source.
Funny thing, OneDrive on Windows 8 already _had_ that kind of structure, where files would only exist locally once you actually accessed them. All the metadata was grabbed, but the files themselves were just pointers into the cloud via (IIRC) NTFS extended attributes. It’s the #1 thing I miss about OneDrive on Windows 10, and I can’t wait for it to come back.
If I’m not mistaken, Microsoft has done similar things with accessing non-local files on NTFS before, as well. Pretty sure the chapter on NTFS in Windows Internals includes some information about that.
I thought the Windows team used Source Depot for revision control; when (and why) did this change? Are there any blog posts or presentations about the transition? That must have been a fairly massive project!
Funny you should ask, this just popped up as well. 🙂
https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
You lost me in the first paragraph. I understand it’s a problem that you are trying to solve, but if you zoom out you’ll see a monolithic architecture that consists of 3.5 million files!!! That is your real problem. 🙂
Anyways, this may have other use-cases that are more interesting than the initial use-case.
Wow, all this negativity? All these experts? Sheesh. I work on a system that’s a mere 270 MB repo, and being in NZ and using a US-based cloud-hosted server, that takes an annoyingly long time to clone. Using something like this means that the initial clone won’t download 270 MB, as it will only fetch whatever’s required for the version I check out. I no longer have to consider rewriting history to remove old files to shrink the repo down (and all the attendant issues among my team that force-pushing rewritten history would cause, and goodness knows how long git filter-branch would take to run; days, probably).
And perhaps my system could be suffering from a monolithic architecture, but frankly that’s irrelevant. It is what it is, hindsight is 20/20, my project started YEARS ago, and most damningly, our previous source control system (TFVC) handled it with ease. (We only moved to Git because everyone else is, we can connect with a variety of tools, and it was vastly easier to migrate the history since we wanted to move to cloud-based hosting.)
I’m hoping that this becomes (when ready) built into Visual Studio Team Services (the online services) and Visual Studio!
“The first big debate was – how many repos do you have – one for the whole company at one extreme or one for each small component? A big spectrum. Git is proven to work extremely well for a very large number of modest repos so we spent a bunch of time exploring what it would take to factor our large codebases into lots of tenable repos. Hmm. Ever worked in a huge code base for 20 years? Ever tried to go back afterwards and decompose it into small repos? You can guess what we discovered. The code is very hard to decompose. The cost would be very high. The risk from that level of churn would be enormous. And, we really do have scenarios where a single engineer needs to make sweeping changes across a very large swath of code. Trying to coordinate that across hundreds of repos would be very problematic.”
https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
Seriously good stuff, but… a better name, please!!!
VFS has a different meaning in the UNIX world, similar to mux in window.
Some names would be ..
“GitCache”
“GitLazyFetch”
-Faraz
Nice names.
Some other suggestions:
LazyGitClone / GitLazyClone
EmptyGitClone / GitEmptyClone
GitShellClone
Yeah, it would be nice if some consideration were given to existing projects outside of Microsoft, to prevent clashes for anyone searching for whatever name MS has “appropriated.”
So teams are moving to Git from your own Team Foundation Server?
No, people are using Git *hosted* inside Team Foundation Server, which is a fully fledged Git server.
Troll harder, fool.
He probably meant Team Foundation Version Control.
I also wonder, why did they prefer Git to TFVC.
Check out this companion post from bharry for more info: https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
One of the arguments for using a DVCS like Git over a more conventional server-based VCS is the cost of branching vs. the cost of shelvesets, and the ability to store those incremental diffs of work locally rather than only remotely. The “plane scenario” is far more difficult with a traditional VCS. While I have not been privy to the start of the 1ES conversations, my guess is that if people were going to have to move off Source Depot (which is remarkably similar to a very old version of Perforce), in part to resolve all the custom Microsoft toolchain issues, then why not get more “bang” out of it and use what more people are using and getting productivity dividends from? I believe that roughly before this time, or around it, teams like ASP.NET and Project Orleans had been working in OSS style, and as the various OSS projects at Microsoft were introduced they all made a shift from CodePlex to GitHub so that they could be closer to where more developers typically are and feel more like part of their community. Having Git used on both externally facing OSS projects and internal-only ones (Office, Windows, Xbox, and their services, and everything else) makes it a more valuable skill, and allows workflow tooling to be shared in VS across both types of projects.
lmao, now I’m thinking of Mr T.
Bravo! This is fantastic stuff!
And it opens up all sorts of opportunities for us to develop tooling to offer a light-weight view of the Git Repo. That could be especially useful for the use-case where someone wants to simply browse the code in a large repo without having to pull the whole lot down, or without using a Web UI.
The ability to modify a single file and then commit that back to the repo – without cloning the entire repo – is GOLD!
Thank you very much!
Brilliant tech… I can see this used in many different ways beyond just virtualized source control.
As some have alluded to, I would love OneDrive to jump onto this and quite possibly help enable this in its cloud…
What does “hydrated” mean in this context? A moderate amount of Googling has left me no less confused.
Am I correct in thinking that this is available only on Windows?
Think of it in terms of “hydrating” a record or a data set or something. If it’s “dehydrated”, it’s not filled up with data. In some contexts this might just mean that the thing is stale (but the same size overall), but in this context I take it to mean that the file is just a lightweight representation that looks like the real file to Git, but didn’t need to be fully copied. That way it can work with Git and is only actually copied at the time it is needed.
I’m using git on Cygwin because Windows doesn’t feature a portable command-line interface, and the Linux-on-Windows stuff is too far from production-ready. Unfortunately, git and lots of other tools (particularly bash scripts) suffer horribly from the lack of copy-on-write support when doing a fork(). A lot of us would badly like to see that fixed.
My only real performant, cross-development alternative right now is to run my dev environment out of a docker container in Windows, but this means I have to orchestrate coordination through the volume interface or a socket or something to run Windows native commands, or to run them under Wine inside the container, both of which seem a little silly and really just reduce Windows to an embedded target system from my point of view.
GVFS sounds clever, but also rather against one of the key features of git, which is that every local repo is a full, redundant copy every time you pull. If most of the data isn’t really local until accessed, then you no longer have redundant copies. It’s really nice that you don’t technically need backups.
Having the whole repo local is also a great way to stay productive even if you’ve lost connectivity to the net, but if you stall out on the first access to a piece of data that isn’t in your cache, then you are still dependent on the net.
If you’re working with a really big set of repos, you need really big pipes and drives to tame it. That just seems like an intrinsic given with git. I agree with others here that it sounds like bloat, which is probably the underlying issue that needs addressing.
It seems like a good application for siccing machine learning on your continuous integration solution to figure out what the apps are actually trying to do from a low-level dataflow point of view (instrument all your memory accesses at a low level, using page faults and compiler options to insert instrumentation around the actual mov instructions, which will make everything run very slowly but yield pivotal dataflow information) so that it can rewrite the legacy code, reusing the identifiers it can harvest from the C++, in a more functional language less prone to bugs than C++. There are probably amazingly redundant amounts of code and lots of old legacy systems that could be reduced to much cleaner code through this kind of dataflow analysis.
That’s a lot moonshottier than GVFS, but GVFS just feels like it’s missing some of the important functions of git in its eagerness to squirm out of doing the real (hard, awful) work that needs doing.
The normal Git for Windows package uses MinGW/MSYS2, so it performs a lot better. It also comes with Bash and a basic shell environment. It works well enough for my needs, and avoids a lot of the issues you run into if you don’t keep the Cygwin environment separated from native Windows.
I don’t get all the vitriol in these comments.
How about we start by thanking the MSFT team for releasing such an awesome project and putting in so much work, rather than criticizing them because you don’t like Windows or whatever.
Regardless of what your OS and software preferences are, there are very few people using Git who have to work with a repo that is 270GB+ on a regular basis. So rather than bitching, just be grateful that when you eventually do have to tackle this issue, someone before you made a dent towards fixing it.
Thanks guys!
Cool! What a wonderful comment.
Why not build a file system for this in Windows, like AUFS, so it can be used outside of Git? Or, I believe Windows Containers have similar technology to do copy-on-write; why not repurpose that?
Why in the hell would you pick a name that is already “taken”? That only serves to confuse people, especially when using Google to search for information regarding GVFS. Which GVFS, you say? I don’t know. You tell me…
https://en.wikipedia.org/wiki/GVfs
So aren’t you taking away Git’s ability to work offline with the entire history local?
Read Brian Harry’s blog for the background story of this solution: https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story. When it was presented at Git Merge last Friday I had my doubts, but after reading about the different solutions that were tried before taking this route, I think they did the right thing.
Google has all of their code (except Android stuff) in a single repository. They use a custom VCS and custom client to access and use it:
http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
Great thing for Microsoft to apply the same idea to git!
This has to be called “Lazy Git”.
I like it. This also opens the door to plenty of mind-blowing alternatives, like this: Why sit around waiting to hydrate your entire repo when you have “Git Z Boy”?!
This sounds like an awesome project though – looking forward to hearing updates!
Related blog post. Talks about Scaling Git and some of the reasoning why they chose one report for Windows core.
https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
*repo (not report)
I’m curious, what comprises the 270GB? Is it all text-based code? If so, that is impressive. If the codebase has a significant amount of binaries or large resource files, has Microsoft looked into Git LFS (https://git-lfs.github.com/) ?
It’s all the Windows source including resources, test code, test data, etc.
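For readers unfamiliar with the Git LFS suggestion above, routing large binaries through LFS looks roughly like this sketch (the repo path and file pattern are placeholders; the git-lfs commands are standard). Whether it would help here depends on how much of the 270 GB is actually large binary content rather than source.

```python
# Sketch of the Git LFS setup suggested above, with a placeholder repo path
# and file pattern; the git-lfs commands themselves are standard. Whether it
# helps depends on how much of the repo is actually large binary content.
import subprocess

REPO_DIR = "my-repo"  # placeholder path to an existing clone

# One-time setup of the LFS hooks for this clone.
subprocess.run(["git", "lfs", "install"], cwd=REPO_DIR, check=True)

# Route matching files through LFS instead of storing them as regular blobs.
subprocess.run(["git", "lfs", "track", "*.iso"], cwd=REPO_DIR, check=True)

# The tracking rules live in .gitattributes and are committed like any file.
subprocess.run(["git", "add", ".gitattributes"], cwd=REPO_DIR, check=True)
subprocess.run(["git", "commit", "-m", "Track large binaries with Git LFS"],
               cwd=REPO_DIR, check=True)
```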
Does this mean folks will go back to putting all their binaries into source control instead of using NuGet?
https://research.google.com/pubs/pub45424.html “The repository contains 86 TB of data”
So how do they do it? :D
They don’t use git: http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
I would love to use this tech as a replacement for OneDrive for Business. Since you removed the “placeholder files”, or whatever you called them, it just isn’t suitable for low-storage machines, which is surely one of the plus sides of using cloud storage? Plus, I trust Git more than whatever aging tech OneDrive for Business uses. WebDAV?
As I understand it, what Microsoft is actually developing is a FUSE-equivalent driver for Windows (finally!).
And then a cache for Git, because they need it themselves. Nobody else does. However huge Windows is, 2.5 million files is still silly.
Wow, i am amazed. What a contribution.
every problem in computer science can be solved by adding a layer of indirection
Congratulations on inventing rsync
But what about https://en.wikipedia.org/wiki/GVfs
And this, dear children, is why Microsoft developers should stop taking drugs!
You don’t want to make it too fast, though; you wouldn’t want someone to clone the whole Windows source code onto a removable device in like a minute and then get away with it.