Hosting Git on the Cloud with Windows Azure Virtual Machines (Part 1)

With the June release, we announced a new feature for Windows Azure: persistent Virtual Machines. We already had a VM role, but it was not persistent: at any time, the Windows Azure Fabric Controller could decide to reimage your Virtual Machine, and everything you had installed on the VM after building the image would be lost. The new persistent VM is essentially an “Infrastructure as a Service” VM, as its state is durable. It is exactly as if it were a physical machine under your desk (or in your datacenter), except it has replication, making the data stored on it very safe.

This opens a number of interesting scenarios. In this series, we will see how to leverage Windows Azure Virtual Machines to host a Git repository. Git is a source control system that has gained a lot of popularity in the last few years. It is different from the “traditional” source control systems (Microsoft’s TFS, Subversion, etc.) in that it is distributed. Each contributor has a “clone” of the repository on their own machine, commits changes locally, and then pushes those changes to other clones of the repository. Usually, though, it is practical to have an “authoritative clone”, which is the clone to which all contributors push their changes.

This authoritative clone can take many forms. The best solution today is probably to host it on GitHub. It is free for any open source project. However, if you don’t want your source code to be public, you will have to pay a subscription fee (from $7 to $200 per month, depending on the plan). You can also host your central Git repository on a file share on your local network, or on a file sharing service such as Dropbox or SkyDrive.

If you host your repository on Windows Azure, the biggest benefit is the highly durable storage we provide. If you enable “Geo-replication” on your Virtual Hard Disk, the disk will be replicated not only three times in the primary datacenter of your choice, but also in a distant datacenter, at least 400 miles away. If a disaster happens in the primary datacenter (earthquake, plane crash, etc.), you know your source code is still safe. This is not a guarantee you get if you host your code on a free generic file storage service, and even less so if you host it on your own file share.

If you choose to host it on Windows Azure, with the current pricing, it will cost you $3.75 per month for storage (with a 30 GB VHD) and $14.40 per month for compute (for an extra-small VM), for a total of $18.15 per month. This is slightly more than the basic GitHub plan, but you can have an unlimited number of repositories on your VM, and extra space is extremely cheap ($0.02 per month per gigabyte). And since you will have a server up and running, you can also use it to host various tools at no extra cost (web interface for Git, project tracking tool, issue tracking, databases, etc.).
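
To make the arithmetic above easy to rerun with your own numbers, here is a minimal sketch in Python. It only adds up the monthly figures quoted in this post; the constant and function names are made up for illustration, and the prices are the ones current at the time of writing, so treat them as assumptions rather than as a pricing reference.

```python
from decimal import Decimal

# Monthly prices as quoted in this post (USD). These are assumptions
# taken from the text, not values fetched from any pricing service.
STORAGE_30GB_VHD = Decimal("3.75")      # storage for a 30 GB VHD
COMPUTE_EXTRA_SMALL = Decimal("14.40")  # one extra-small VM
EXTRA_STORAGE_PER_GB = Decimal("0.02")  # extra space, per gigabyte


def monthly_cost(extra_gb: int = 0) -> Decimal:
    """Return the estimated monthly cost, with optional extra storage."""
    return STORAGE_30GB_VHD + COMPUTE_EXTRA_SMALL + EXTRA_STORAGE_PER_GB * extra_gb


print(monthly_cost())    # base setup: 18.15
print(monthly_cost(50))  # base setup plus 50 GB of extra space: 19.15
```

Decimal is used instead of floating point so that the sums come out to exact cents.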

Hosting a remote Git repository can be done on either Windows or Linux. Since it is slightly simpler on Linux (Git was initially created to host the Linux kernel code), and the new Windows Azure Virtual Machines can host Linux seamlessly, I will take this opportunity to demonstrate how to do it on Linux.

In the second post of the series, we will see how to provision a persistent Linux Virtual Machine with Windows Azure.