A Brief Tutorial on a Shared Git Repository

Sunday 23 January 2011 by Bradley M. Kuhn

A while ago, I set up Git for a group privately sharing the same central repository. Specifically, this is a tutorial for those who want a Git setup that is a little bit like an SVN repository: a central repository that has all the branches that matter published there in one place. I found this file today floating in a directory of “things I should publish at some point”, so I decided just to put it up. Every time I came across this file, it reminded me that I should publish it, and it's really morally wrong (IMO) to keep generally useful technical information private, even when only laziness is causing it.

Before you read this, note that most developers don't use Git this way, particularly with the advent of shared hosting facilities like Gitorious, as systems like Gitorious solve the odd problems that this tutorial addresses. When I originally wrote this (more than a year ago), the only well-known project that I found using a system like this was Samba; I haven't seen a lot of other projects that do this. Indeed, this process is not really what Git is designed to do, but sometimes groups that are used to SVN expect there to be a “canonical repository” that has all the contents of the shared work under one proverbial roof, and set up a “one true Git repository” for the project from which everyone clones.

Thus, this tutorial is primarily targeted at a user who is mostly familiar with an SVN workflow and who has ssh access to host.example.org, which hosts a writable (usually by multiple people) Git repository living in the directory /git/REPOSITORY.git/.

Ultimately, the stuff that I've documented herein is basically to fill in the gaps that I found when reading the following tutorials:

So, here's my tutorial, FWIW. (I apologize that I commit the mortal sin of tutorial writing: I drift wildly between second-person-singular, first-person-plural, and passive-voice third-person. If someone sends me a patch to the HTML file that fixes this, I'll apply it. :)

Initial Setup

Before you start using git, you should run these commands to let it know who you are so your info appears correctly in commit logs:

             $ git config --global user.email Your.Email@example.com
             $ git config --global user.name "Your Real Name"
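
If you use different identities for different projects, the same settings can be made per repository by leaving out --global. Here is a minimal sketch, run in a throwaway directory so that it never touches your real configuration (the sandbox path and the "demo" directory name are only illustrative):

```shell
set -eu
# Throwaway repository; nothing here touches your real configuration.
sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"

# Without --global, the setting is written to this repository's .git/config only.
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"

# Read the values back to confirm what commits in this repository will carry.
configured_email=$(git config --get user.email)
configured_name=$(git config --get user.name)
echo "$configured_name <$configured_email>"
```

A value set inside a repository takes precedence over the --global one for commits made in that repository.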
            

Examining Your First Clone

To get started, first we clone the repository:

              $ git clone ssh://host.example.org/git/REPOSITORY.git/
            

Now, note that Git almost always operates in terms of branches. Unlike Subversion, Git's branches are first-class citizens and most operations in Git operate around a branch. The default branch is often called “master”, although I tend to avoid using the master branch for much, mainly because everyone who uses git has a different perception of what the master branch should embody. Therefore, giving all your branches more descriptive names is helpful. But when you first import something into git (for example, from existing Subversion trees), everything from Subversion's trunk is thrown on the master branch.

So, we take a look at the result of that clone command. We have a new directory, called REPOSITORY, that contains a “working checkout” of the repository, and under that there is one special directory, REPOSITORY/.git/, which is a full copy of the repository. Note that this is not like Subversion, where what you have on your local machine is merely one view of the repository. With Git, you have a full copy of everything. However, an interesting thing has been done on your copy with the branches. You can take a look with these commands:

              $ git branch
              * master
              $ git branch -r
              origin/HEAD
              origin/master
            

The first list of branches is the set of branches that are personal and local to you. (By default, git branch shows you only “local” branches; -r means “remote” branches. You can also use -a to see all of them.) Unless you take action to publish your local branches in some way, they will be your private area to work in and live only on your computer. (And be aware: they are not backed up unless you back them up!) The remote ones, which all start with “origin/”, track the progress on the shared repository.

(Note the term “origin” is a standard way of referring to “the repository from whence you cloned”, and origin/BRANCH refers to “BRANCH as it looks in the repository from whence you cloned”. However, there is nothing magical about the name “origin”. It's set up to DTRT in your WORKING-DIRECTORY/.git/config file, and the clone command set it all up for you, which is why you have them now.)
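To see exactly what the clone command recorded for “origin”, you can ask Git for the relevant config entries. The sketch below uses a local bare repository as a stand-in for the shared one on host.example.org; the sandbox paths are assumptions made only so the example can run anywhere:

```shell
set -eu
sandbox=$(mktemp -d)
# Stand-in for the shared repository (a local bare repository here).
git init -q --bare "$sandbox/REPOSITORY.git"
# Cloning an empty repository prints a harmless warning, silenced here.
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/REPOSITORY" 2>/dev/null
cd "$sandbox/REPOSITORY"

# The URL that "origin" is shorthand for.
origin_url=$(git config --get remote.origin.url)
# The rule that maps each branch on the server to a local origin/BRANCH name.
origin_fetch=$(git config --get remote.origin.fetch)
echo "origin -> $origin_url"
echo "fetch rule: $origin_fetch"
```

The fetch rule is what makes every branch on the shared repository appear locally under origin/.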

Get to Work

The canonical way to “get moving” with a new task in Git is to somehow create a branch for it. Branches are designed to be cheap and quick to create so that users will not be shy about creating a new one. Naming conventions are your own, but generally I like to call a branch USERNAME/TASK when I'm still not sure exactly what I'll be doing with it (i.e., who I will publish it to, etc.) You can always merge it back into another branch, or copy it to another branch (perhaps using a more formal name) later.

Where do you Start Your Branch From?

Once a repository exists, each branch in the repository comes from somewhere — it has a parent. These relationships help Git know how to easily merge branches together. So, the most typical procedure of starting a new branch of your own is to begin with an existing branch. The git checkout command is the easiest to use to start this:

               $ git checkout -b USERNAME/feature origin/master
            

In this example, we've created our own local branch, called USERNAME/feature, and it's started from the current state of origin/master. When you are getting started, you will usually want to base your new branches off of ones that exist on the origin. This isn't a rule; it's just less confusing for a newbie if all your branches have a parent revision that lives on the server.

Now, it's important to note here that no branch stands still. It's best to think about a branch as a “moving pointer” to a linked list of some set of revisions in the repository.

Every revision stored in git, local or remote, has a SHA1 which is computed based on the revisions before it plus the new patch that the revision applies.

Meanwhile, the only two substantive differences between one of these SHA1 identifiers and an actual branch are that (a) Git keeps changing what identifier the branch refers to as new commits come in (aka it moves the branch's HEAD), and (b) Git keeps track of the history of identifiers the branch previously referred to.

So, above, when we asked git checkout to create a new branch called USERNAME/feature based on origin/master, the two important things to realize are that (a) your new branch has its HEAD pointing at the same head that is currently the HEAD of origin/master, and (b) you got a new list to start adding revisions in the new branch.

We didn't have to use a branch as the starting point for that. We could have simply started our branch from any old SHA1 of any revision. We happened to want to declare a relationship with the master branch on the server in this case, but we could have easily picked any SHA1 from our git log and used that one.
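The “moving pointer” idea is easy to watch in a scratch repository. This is only a sketch: the file name FILE, the branch name USERNAME/from-older, and the sandbox directory are made up for illustration.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"
# Name the first branch "master" regardless of the local default.
git symbolic-ref HEAD refs/heads/master
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"

echo one > FILE
git add FILE
git commit -q -m "first revision"
first=$(git rev-parse master)      # identifier master points at now

echo two >> FILE
git commit -q -a -m "second revision"
second=$(git rev-parse master)     # master has moved to the new revision

# Start a branch from the older revision's SHA1 rather than from a branch name.
git checkout -q -b USERNAME/from-older "$first"
started_at=$(git rev-parse HEAD)
echo "master moved from $first to $second"
echo "new branch starts at $started_at"
```

The branch name master stayed the same while the identifier behind it changed, and the new branch began at an identifier picked by hand.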

Do Not Fear the checkout

Every time you run a git checkout SOMETHING command, your entire working directory changes. This normally scares Subversion users; it certainly scared me the first time I used git checkout SOMETHING. But, the only reason it is scary is because svn switch, which is the roughly analogous command in the Subversion world, so often doesn't do something sane with your working copy. By contrast, switching branches and changing your whole working directory is a common occurrence with git.

Note, however, that Git will refuse a git checkout that would overwrite uncommitted changes in your directory (which, BTW, also makes it safer than svn switch). However, don't be too Subversion-user-like and therefore afraid to commit things. Remember, with Git (and unlike with Subversion), committing and publishing are two different operations. You can commit to your heart's content on local branches and merge or push into public branches later. (There are even commands to squash many commits into one before putting it on a public branch, in case you don't want people to see all the intermediate goofiness you might have done. This is why, BTW, many Git users commit as often as an SVN user would save in their editors.)

However, if you must switch checkouts but really do fear making commits, there is a tool for you: look into git stash.
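For the curious, a stash round trip looks roughly like this. It is a sketch in a scratch repository; the branch name “other” and the file name FILE are placeholders I made up for the example.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/master
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"
echo committed > FILE
git add FILE
git commit -q -m "first revision"
git branch other

echo "work in progress" >> FILE   # an uncommitted change
git stash -q                      # shelve it; the working directory is clean again
git checkout -q other             # switch away to look at something else
git checkout -q master            # come back
git stash pop -q                  # reapply the shelved change

last_line=$(tail -n 1 FILE)
remaining_stashes=$(git stash list)
echo "restored: $last_line"
```

After the pop, the uncommitted change is back in the working directory and the stash list is empty again.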

Share with the Group

Once you've been doing some work, you'll end up with some useful work finished on a USERNAME/feature branch. As noted before, this is your own private branch. You probably want to use the shared repository to make your work available to others.

When using a shared Git repository, there are two ways to share your branches with your colleagues. The first procedure is when you simply want to publish directly on an existing branch. The second is when you wish to create your own branch.

Publishing to Existing Branch

You may choose to merge your work directly into a known branch on the remote repository. That's a viable option, certainly, but often you want to make it available on a separate branch for others to examine, even before you merge it into something like the master branch. We discuss the slightly more complicated new branch publication next, but for the moment, we can consider the quicker process of publishing to an existing branch.

Let's consider when we have work on USERNAME/feature and we would like to make it available on the master branch. Make sure your USERNAME/feature branch is clean (i.e., all your changes are committed).

The first thing you should verify is that you have what I call a “local tracking branch” (this is my own term that I made up, I think; you won't likely see it in other documentation) that is tied directly with the same name to the origin. This is not completely necessary, but it makes it much more convenient to keep track of what you are doing. To check, do a:

               $ git branch -a
               * USERNAME/feature
                 master
                 origin/master
            

In the list, you should see both master and origin/master. If you don't have that, you should create it with:

               $ git checkout -b master origin/master
            

So, either way, you want to be on the master branch. To get there if it already existed, you can run:

               $ git checkout master
            

And you should be able to verify that you are now on master with:

               $ git branch
               * master
               ...
            

Now, we're ready to merge in our changes:

               $ git merge USERNAME/feature
               Updating ded2fb3..9b1c0c9
               Fast forward
               FILE ...
               N files changed, X insertions(+), Y deletions(-)
            

If you don't get any message about conflicts, everything is fine. Your changes from USERNAME/feature are now on master. Next, we publish it to the shared repository:

              $ git push
              Counting objects: N, done.
              Compressing objects: 100% (A/A), done.
              Writing objects: 100% (A/A), XXX bytes, done.
              Total G (delta T), reused 0 (delta 0)
              refs/heads/master: IDENTIFIER_X -> IDENTIFIER_Y
              To ssh://host.example.org/git/REPOSITORY.git
               X..Y  master -> master
            

Your changes can now be seen by others when they git pull (See below for details).
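If you want to rehearse the whole sequence without touching the real server, it can be replayed against a local bare repository. In this sketch the bare repository under the sandbox directory stands in for ssh://host.example.org/git/REPOSITORY.git, and the file name and commit messages are invented; the push names origin and master explicitly so that it does not depend on your push defaults.

```shell
set -eu
sandbox=$(mktemp -d)
# Local bare repository standing in for the shared one.
git init -q --bare "$sandbox/REPOSITORY.git"
git --git-dir="$sandbox/REPOSITORY.git" symbolic-ref HEAD refs/heads/master

git clone -q "$sandbox/REPOSITORY.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"
echo base > FILE
git add FILE
git commit -q -m "initial import"
git push -q origin master               # give the shared repository a master

# Private work on a branch started from origin/master.
git checkout -q -b USERNAME/feature origin/master
echo feature >> FILE
git commit -q -a -m "add the feature"

# Publish: merge into the local master, then push it.
git checkout -q master
git merge -q USERNAME/feature           # a fast-forward, since master has not moved
git push -q origin master

local_master=$(git rev-parse master)
shared_master=$(git --git-dir="$sandbox/REPOSITORY.git" rev-parse master)
echo "local master:  $local_master"
echo "shared master: $shared_master"
```

When the two identifiers match, the shared repository has exactly what your local master has.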

Publishing to a New Branch

Suppose that, instead of immediately putting the feature on the master branch, you wanted simply to mirror your personal feature branch to the rest of your colleagues so they can try it out before it officially becomes part of master. To do that, you first need to tell Git that you want to make a new branch on the shared repository. In this case, you use the git push command as well. (It is a catch-all command for any operations you want to do to the remote repository without actually logging into the server where the shared Git repository is hosted. Thus, not surprisingly, nearly any git push command you can think of will require you to be connected to the network.)

So, first let's create a local branch that has the actual name we want to use publicly. To do this, we'll just use the checkout command, because it's the most convenient and quick way to create a local branch from an already existing local branch:

              $ git branch -l
              * USERNAME/feature
                master
                ...
              $ git checkout -b proposed-feature USERNAME/feature
              Switched to a new branch "proposed-feature"
              $ git branch -l
              * proposed-feature
                USERNAME/feature
                master
                ...
            

Now, again, we've only created this branch locally. We need an equivalent branch on the server, too. This is where git push comes in:

              $ git push origin proposed-feature:refs/heads/proposed-feature
            

Let's break that command down. The first argument for push is always “the place you are pushing to”. That can be any sort of git URL, including ssh://, http://, or git://. However, remember that the original clone operation set up this shorthand “origin” to refer to the place from whence we cloned. We'll use that shorthand here so we don't have to type out that big long URL.

The second argument is a colon-separated item. The left hand side is the local branch we're pushing from on our local repository, and the right hand side is the branch we are pushing to on the remote repository.

(BTW, I have no idea why refs/heads/ is necessary. It seems you should be able to say proposed-feature:proposed-feature and git would figure out what you mean. But, in the setups I've worked with, it doesn't usually work if you don't put in refs/heads/.)

That operation will take a bit to run, but when it is done we see something like:

              Counting objects: 35, done.
              Compressing objects: 100% (31/31), done.
              Writing objects: 100% (33/33), 9.44 MiB | 262 KiB/s, done.
              Total 33 (delta 1), reused 27 (delta 0)
              refs/heads/proposed-feature: 0000000000000000000000000000000000000000
                                             -> CURRENT_HEAD_SHA1_SUM
              To ssh://host.example.org/git/REPOSITORY.git/
               * [new branch]      proposed-feature -> proposed-feature
            

In older Git clients, you may not see that last line, and you won't get the origin/proposed-feature branch until you do a subsequent pull. I believe newer git clients do the pull automatically for you.

Reconfiguring Your Client to see the New Remote Branch

Annoyingly, as the creator of the branch, we have some extra config work to do to officially tell our repository copy that these two branches should be linked. Git didn't know from our single git push command that our repository's relationship with that remote branch was going to be a long-term thing. To marry origin/proposed-feature to our local branch, we must use the commands:

              $ git config branch.proposed-feature.remote origin
              $ git config branch.proposed-feature.merge refs/heads/proposed-feature
            

We can see that this branch now exists because we find:

              $ git branch -a
              * proposed-feature
                USERNAME/feature
                master
                origin/HEAD
                origin/proposed-feature
                origin/master
             

After this is done, the remote repository has a proposed-feature branch and, locally, we have a proposed-feature branch that is a “local tracking branch” of origin/proposed-feature. Note that our USERNAME/feature, where all this stuff started from, is still around too, but can be deleted with:

              $ git branch -d USERNAME/feature
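
In Git 1.7.0 and later, the push and the two git config commands from this section can be collapsed into one step with the -u (--set-upstream) option to git push. The sketch below tries it against a local bare repository standing in for the shared one; the sandbox paths and file name are assumptions made for the example.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/REPOSITORY.git"
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"
echo base > FILE
git add FILE
git commit -q -m "initial import"
git checkout -q -b proposed-feature

# -u pushes the branch and records the pairing in .git/config in one go.
git push -q -u origin proposed-feature

paired_remote=$(git config --get branch.proposed-feature.remote)
paired_merge=$(git config --get branch.proposed-feature.merge)
echo "remote: $paired_remote"
echo "merge:  $paired_merge"
```

The two values read back are the same ones the manual git config commands would have written.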
            

Finding It Elsewhere

Meanwhile, someone else who has separately cloned the repository before we did this won't see these changes automatically, but a simple git pull command can get it:

              $ git pull
              remote: Generating pack...
              remote: Done counting 35 objects.
              remote: Result has 33 objects.
              remote: Deltifying 33 objects...
              remote:  100% (33/33) done
              remote: Total 33 (delta 1), reused 27 (delta 0)
              Unpacking objects: 100% (33/33), done.
              From ssh://host.example.org/git/REPOSITORY.git
               * [new branch]      proposed-feature -> origin/proposed-feature
              Already up-to-date.
              $ git branch -a
              * master
                origin/HEAD
                origin/proposed-feature
                origin/master
            

However, their checkout directory won't be updated to show the changes until they make a local “mirror” branch to show them the changes. Usually, this would be done with:

              $ git checkout -b proposed-feature origin/proposed-feature
            

Then they'll have a working copy with all the data and a local branch to work on.
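The colleague's side can also be rehearsed locally. In this sketch, two clones of a local bare repository play the creator and the colleague; all paths, file names, and commit messages are invented for the example.

```shell
set -eu
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/REPOSITORY.git"
git --git-dir="$sandbox/REPOSITORY.git" symbolic-ref HEAD refs/heads/master

# The creator publishes master first.
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/creator" 2>/dev/null
cd "$sandbox/creator"
git symbolic-ref HEAD refs/heads/master
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"
echo base > FILE
git add FILE
git commit -q -m "initial import"
git push -q origin master

# The colleague clones before the new branch exists.
git clone -q "$sandbox/REPOSITORY.git" "$sandbox/colleague"

# The creator now publishes proposed-feature.
git checkout -q -b proposed-feature
echo feature >> FILE
git commit -q -a -m "add the feature"
git push -q origin proposed-feature:refs/heads/proposed-feature
creator_head=$(git rev-parse proposed-feature)

# The colleague picks it up and makes a local branch paired with it.
cd "$sandbox/colleague"
git pull -q
git checkout -q -b proposed-feature origin/proposed-feature
colleague_head=$(git rev-parse HEAD)
colleague_merge=$(git config --get branch.proposed-feature.merge)
echo "colleague is at $colleague_head, paired with $colleague_merge"
```

Because the colleague's branch was started from origin/proposed-feature, Git recorded the pairing without any extra git config commands.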

BTW, if you want to try this yourself just to see how it works, you can always make another clone in some other directory just to play with, by doing something like:

              $ git clone ssh://host.example.org/git/SOME-REPOSITORY.git/ \
                extra-clone-for-git-didactic-purposes
            

Now on this secondary checkout (which makes you just like the user who is not the creator of the new branch), work can be pushed and pulled on that branch easily. Namely, anything you merge into or commit on your local proposed-feature branch will automatically be pushed to origin/proposed-feature on the server when you git push. And, anything that shows up from other users on the origin/proposed-feature branch will show up when you do a git pull. These two branches were paired together from the start.

Irrational Rebased Fears

When using a shared repository like this, git rebase usually screws something up. When Git is used in the “normal way”, rebase is one of the amazing things about Git. The rebase idea is: you unwind the entire work you've done on one of your local branches, bring in changes that other people have made in the meantime, and then reapply your changes on top of them.

It works out great when you use Git the way the Linux Project does. However, if you use a single, shared repository in a work group, rebase can be dangerous.
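One concrete reason, as far as I understand it: rebase replaces your commits with new ones that have different identifiers, so a branch that others have already pulled no longer matches what they have. A sketch in a scratch repository (file and branch names are made up) shows the identifier changing:

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/master
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"
echo base > FILE
git add FILE
git commit -q -m "first revision"

git checkout -q -b USERNAME/feature
echo feature > FEATURE
git add FEATURE
git commit -q -m "work on the feature"
before=$(git rev-parse USERNAME/feature)

# Meanwhile, master moves on.
git checkout -q master
echo other > OTHER
git add OTHER
git commit -q -m "someone else's change"

# Replay the feature work on top of the new master.
git checkout -q USERNAME/feature
git rebase -q master
after=$(git rev-parse USERNAME/feature)
echo "before rebase: $before"
echo "after rebase:  $after"
```

The change itself is the same, but anyone holding the old identifier now has a commit that is no longer on the branch.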

Generally speaking, though, with a shared repository, you can use git merge and won't need rebasing. My usual work flow is that I get started on a feature with:

              $ git checkout -b bkuhn/new-feature starting-branch
            

I work work work away on it. Then, when it's ready, I generate a patch with the following command and send it around to a mailing list:

              $ git diff $(git merge-base starting-branch bkuhn/new-feature) bkuhn/new-feature
            

Note that the thing in the $() returns a single identifier for a version, namely, the version of the fork point between starting-branch and bkuhn/new-feature. Therefore, the diff output is just the stuff I've actually changed: all the differences between the place where I forked and my current work.
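Git also has a shorthand for this: with git diff, the three-dot form A...B means “from the merge base of A and B to B”. The sketch below, in a scratch repository with invented file names, checks that both spellings give the same patch:

```shell
set -eu
sandbox=$(mktemp -d)
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/starting-branch
git config user.email "Your.Email@example.com"
git config user.name "Your Real Name"
echo base > FILE
git add FILE
git commit -q -m "first revision"

git checkout -q -b bkuhn/new-feature
echo "my change" >> FILE
git commit -q -a -m "work on the feature"

# Meanwhile, starting-branch gets an unrelated change.
git checkout -q starting-branch
echo other > OTHER
git add OTHER
git commit -q -m "someone else's change"

fork_point=$(git merge-base starting-branch bkuhn/new-feature)
explicit=$(git diff "$fork_point" bkuhn/new-feature)
# Three dots: diff from the merge base to the right-hand side.
shorthand=$(git diff starting-branch...bkuhn/new-feature)
printf '%s\n' "$shorthand"
```

Either way, the unrelated change on starting-branch stays out of the patch.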

Once I have discussed and decided with my co-developers that we like what I've done, I do this:

              $ git checkout starting-branch
              $ git merge bkuhn/new-feature
            

If all went well, this should automatically commit my feature into starting-branch. Usually, there is also an origin/starting-branch, which I've probably set up for automatic push/pull with my local starting-branch, so I then can make the change officially by running:

              $ git push
            

The fact that I avoid rebase is probably merely FUD, and if I learned more, I could use it safely in cases with a shared repository. But I have no advice on how to make it work. In particular, this Git FAQ entry shows quite clearly that my work sequence ceases to work all that well when you do a rebase; namely, doing a git push becomes more complicated.

I am sure a rebase would easily become very necessary if I lived on bkuhn/new-feature for a long time and there had been tons of changes underneath me, but I generally try not to dive too deep into a fork, although many people love DVCS because they can do just that. YMMV, etc.

Posted on Sunday 23 January 2011 at 14:45 by Bradley M. Kuhn.

This website and all documents on it are licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

