Learning my ways with git

During my work, I sometimes come into contact with new technology. This usually involves a long time spent searching for tutorials and best practices. Even more annoying, I keep searching my browser history to recover pieces of information from a tutorial I read a few days before. In this article I gather all the information I regularly need to work with git and GitHub. It might be useful for others, but it is primarily aimed at myself.

Some Motivation

Well, I actually do write source code. I feel that research in software engineering really benefits from coding. In most of my research projects I write some code that allows me to handle data in software projects in a different and new way. Most often, it seems like I could have reached my goal much more easily by using some scripts and existing tools (e.g. R, NetMiner). But during my research, I feel kind of limited by anything weaker than a general-purpose programming language. It is very probable that this limitation exists only in my mind. However, research is partly a creative process and being creative is a state of mind. I find expressing my thoughts in Java very inspiring.

That being said, I am of course also concerned with managing my source code. GitHub is great for this, because it allows me to share the mechanisms that I used to do my research. In a perfect world, others would use my code to verify my results and extend them with new data points or related studies. In reality, my research prototypes include too many quick hacks to be really useful to others. Nevertheless, I try.

In doing so, I need to define my own workflows. They should be simple (because most often, I am working alone), conform to standards (because sometimes I want others to help me and I do not want to burden them with learning new things), and easy to remember. I found a beautiful post describing a workflow around GitHub that makes perfect sense to me: if the guys at GitHub can work with this workflow, it should be great for me…

The only issue: as a git newbie I always need to look up the commands. So… here they are (correct, this is a selfish post as a reminder to myself, but please feel free to use it for your own purposes or to suggest improvements 🙂). I gathered these commands from the diaspora workflow page, the /tmp/labs git cheat sheet, and Geoff’s blog post on the matter, especially Oliver’s update in the comments.

The Workflow

On a new machine

Sometimes I want to start fresh from the repository – either on a new machine or in a different workspace. Here is how you do it:

git clone git@github.com:<you>/<project>.git
cd <project>

Now, we are ready to start.

Working on a new feature

When starting to develop a new feature, it is time to create a branch. This is great, especially for me: I can start developing something new, sometimes many things in parallel. Meanwhile, the main branch remains functional. If I feel like I found a dead end, I can simply delete the branch. Of course, having worked with CVS and SVN, I am a little bit afraid that merging branches might be a problem. But I have been told that git can deal with this.

git checkout -b <new_feature>

This creates a local branch. <new_feature> should be a descriptive name; I think it is good practice to have it start with the issue number. A local branch alone is not enough, because we want to store our changes on the server.

git push origin <new_feature>

Now the server knows the branch. If we do a git push, it will go to the branch. At the moment, I get an error message here: Apparently, the push to the branch works, but git also tries to push to master. And that fails:

! [rejected]        master -> master (non-fast-forward)

Easy to ignore, because that is what I want. Still, we also want a plain git pull to fetch changes from our branch.

git branch --set-upstream-to=origin/<new_feature> <new_feature>

This does the trick and we are set to work.
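
On reasonably recent git versions, publishing the branch and recording its upstream can apparently be done in one step with git push -u. The following is only a minimal sketch that tries this out in a throwaway sandbox. The local bare repository standing in for GitHub, the branch name 42-new-feature, and the user details are assumptions made up for illustration.

```shell
set -e  # stop at the first failing command

# Throwaway sandbox: a local bare repository plays the role of GitHub.
sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$sandbox/origin.git"
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$sandbox/origin.git"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master

# Create the feature branch and publish it. The -u flag records
# origin/42-new-feature as its upstream in the same step.
git checkout -q -b 42-new-feature
git push -q -u origin 42-new-feature

# Ask git which remote branch a plain "git push" or "git pull" will use.
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")
echo "upstream: $upstream"
```

If the last line names the remote branch, a plain git push and git pull on the feature branch should talk to that branch without further arguments.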

Working in the branch

No surprises here. Do the changes and then:

git commit -a
git push

This uploads the changes to the branch on GitHub.
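
One detail that is easy to miss: git commit -a only stages files that git already tracks, so brand-new files still need a git add. Here is a small sketch of that behaviour in a throwaway repository. The file names and the user details are made up for illustration.

```shell
set -e  # stop at the first failing command

sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" > tracked.txt       # change a file git already knows
echo "new" > untracked.txt    # create a file git has never seen

# commit -a picks up the change to tracked.txt, but not untracked.txt.
git commit -q -a -m "Update tracked.txt"
leftover=$(git status --porcelain)
echo "still uncommitted: $leftover"

# New files have to be added explicitly before they can be committed.
git add untracked.txt
git commit -q -m "Add untracked.txt"
remaining=$(git status --porcelain)
```

So when a feature involves new files, a git status before the commit helps to spot the ones that still need to be added.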

Synchronization with master branch

Case 1: Getting changes from master in your branch

It is not unusual that the master branch changes, even if you are the only developer, e.g. you might do a bug fix. Here is what you need to do to get those changes into your branch:

git pull
git merge origin/master

Do not forget to pull first! The pull also fetches the latest state of origin/master; without it, git will not find anything new to merge. After the merge, you have a number of commits that you can push to your branch on the server to bring it up to date.
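
To convince myself of how this plays out, here is a minimal sketch in a throwaway sandbox. Note that it uses git fetch instead of git pull to refresh origin/master, which is a substitution on my side and not the exact command sequence from above. The local bare repository standing in for GitHub, the second clone that simulates a bug fix from elsewhere, and all names are assumptions made up for illustration.

```shell
set -e  # stop at the first failing command

# Throwaway sandbox: a local bare repository plays the role of GitHub.
sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$sandbox/origin.git"
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$sandbox/origin.git"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master

# Start a feature branch with one commit of its own.
git checkout -q -b 42-new-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Work on the feature"
git push -q -u origin 42-new-feature

# Simulate a bug fix that reaches master on the server from elsewhere.
git clone -q "$sandbox/origin.git" "$sandbox/other"
(
  cd "$sandbox/other"
  git config user.name "Example User"
  git config user.email "user@example.com"
  echo "fix" > bugfix.txt
  git add bugfix.txt
  git commit -q -m "Fix a bug on master"
  git push -q origin master
)

# Back on the feature branch: refresh origin/master, then merge it.
git fetch -q origin
git merge -q --no-edit origin/master
git push -q

ls   # the feature branch now holds the bug fix next to the feature
```

The --no-edit flag only keeps git from opening an editor for the merge message, which is convenient in a script.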

Case 2: Finishing your work and pushing your changes into the master branch

At some point the new feature is developed and tested and should be merged into the main branch. First, push all your changes to your branch on the server.

git commit -a
git push

Then, switch to the master branch.

git checkout master

Now, do a merge:

git merge <new_feature>
git push
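
The whole sequence can be tried out safely in a sandbox first. This is a minimal sketch under a few assumptions: a local bare repository stands in for GitHub, the branch name 42-new-feature and the user details are made up, and an extra git pull --ff-only on master is added to make sure the local master matches the server before merging.

```shell
set -e  # stop at the first failing command

# Throwaway sandbox: a local bare repository plays the role of GitHub.
sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$sandbox/origin.git"
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$sandbox/origin.git"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master

# The finished feature lives on its own branch, already pushed.
git checkout -q -b 42-new-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Finish the feature"
git push -q -u origin 42-new-feature

# Switch to master, bring it up to date, merge the feature, and publish.
git checkout -q master
git pull -q --ff-only
git merge -q --no-edit 42-new-feature
git push -q

git log --oneline --graph --decorate
```

Because master did not change in the meantime, this merge is a simple fast-forward and no merge commit is created.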

Understanding the state of things

Well, you will surely want to look at your GitHub page regularly. But of course there are also some commands that come in handy.

git status
git branch -r
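
A few more variants seem worth knowing. The following sketch runs them in a throwaway sandbox with one commit that has not been pushed yet, so there is something to see. The local bare repository standing in for GitHub, the branch name, and the user details are made up for illustration.

```shell
set -e  # stop at the first failing command

# Throwaway sandbox: a local bare repository plays the role of GitHub.
sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$sandbox/origin.git"
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$sandbox/origin.git"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master
git checkout -q -b 42-new-feature
git push -q -u origin 42-new-feature

# One local commit that the server does not know about yet.
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Work on the feature"

git status -sb     # short status, first line shows branch and upstream
git branch -vv     # local branches with their upstream and latest commit
git branch -r      # branches known on the server
git log --oneline --graph --decorate --all   # compact history of all branches

# Keep two of the outputs around for a closer look.
status_line=$(git status -sb | head -n 1)
remote_branches=$(git branch -r)
```

The first line of git status -sb is the part I find most useful, since it reports how far the local branch is ahead of or behind its upstream.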

Cleaning up

If you want to delete your branch online, you can do the following (I did not test this):

git push origin :<new_feature>
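
Since the command above is untested, here is a minimal sketch that tries the clean-up in a throwaway sandbox. It uses git push origin --delete, which as far as I know is the more readable spelling of the same operation, and it also removes the local branch with git branch -d. The local bare repository standing in for GitHub, the branch name, and the user details are assumptions made up for illustration.

```shell
set -e  # stop at the first failing command

# Throwaway sandbox: a local bare repository plays the role of GitHub.
sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$sandbox/origin.git"
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git remote add origin "$sandbox/origin.git"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master

# A feature branch that has been pushed and merged into master.
git checkout -q -b 42-new-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Finish the feature"
git push -q -u origin 42-new-feature
git checkout -q master
git merge -q --no-edit 42-new-feature
git push -q

# Remove the local branch (-d refuses if it is not merged yet) ...
git branch -q -d 42-new-feature
# ... and then the branch on the server.
git push -q origin --delete 42-new-feature

# Both lookups should now come back empty.
remote_left=$(git ls-remote --heads origin 42-new-feature)
local_left=$(git branch --list 42-new-feature)
echo "remote: '$remote_left' local: '$local_left'"
```

The lowercase -d is the cautious choice here, because git refuses to delete a branch whose commits are not merged anywhere.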

Unused commands

I did not use some of the commands that I encountered. The only one that comes to my mind now is

git rebase master

Apparently, some people prefer this over git merge. I will write more on the matter once I have my own opinion about this. The above seems to work for me.
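
For the record, this is roughly what the command does, as a minimal sketch in a throwaway repository. The branch is purely local and never pushed, which matters because a rebase rewrites the commits it moves. The branch name, file names, and user details are made up for illustration.

```shell
set -e  # stop at the first failing command

sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# One commit on a local feature branch ...
git checkout -q -b 42-new-feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Work on the feature"

# ... and one commit on master in the meantime.
git checkout -q master
echo "fix" > bugfix.txt
git add bugfix.txt
git commit -q -m "Fix a bug on master"

# Replay the feature commit on top of the current master.
git checkout -q 42-new-feature
git rebase -q master

git log --oneline --graph   # a straight line, without a merge commit
```

Compared to git merge master, the history stays linear, but the feature commit gets a new hash because it is recreated on top of master.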


7 thoughts on “Learning my ways with git”

    • Yes, I was very reluctant to start this, because I remember the problems we had in this software company to merge branches. But, if you are coding mostly alone, the effort is less (almost nothing).

      Without branching, I had the following problem: I hacked on a new feature, when suddenly another researcher was visiting. Now I wanted to show my cool prototype, but it would not run anymore. Most often in that case, the svn was inconsistent, too – I felt reluctant to have a week’s work only on my local machine. Now, with branches, this risk seems to be mitigated.

  1. The error I mentioned above does not show anymore. I guess I had defined more than one upstream at that time. After some testing, I am completely happy. Let’s see what more experience yields.

  2. Good post. I would really recommend not using rebase unless you have to, as anyone who has used your branch will suffer a horrible, horrible death. Another command that is very useful is git cherry-pick; it can be used to grab certain commits from branch to branch. One more thing I noticed is that you are doing a lot of ‘git pull’, ‘git merge’, which to me seems a little redundant, as a git pull is by definition git fetch + git merge, unless you are doing it to different branches. I’ve been using only git for the last 3 years on all my projects and I still have things to learn 🙂

    • Thanks, Braden – this is great additional information. Let’s see. I was not aware of “rebasing” causing problems, thank you for the warning. Could you elaborate on what happens to other users?

      Also, thank you for highlighting “git cherry-pick”. This is really handy, e.g. when doing some small improvements that are not related to the current task, or when bug fixes in the master branch become available.

      About “git pull” and “git merge”: actually, I only use this to get changes from origin/master into my current branch. As you point out, “git pull” alone is not sufficient, because I need to merge two branches. It is required, though, because otherwise the information about the new commits in master is not available locally. I think I could substitute it with “git fetch” (but still need to try this). But it might be a bad idea, because this could cause the merge to be applied to an old version of my branch.

      Three years? Isn’t that amazing? I think git is usable after only a very short time of learning. But then, there is so much else to learn. What a great and rewarding learning curve!

    • About the cherry-picking again. I was wondering how to get to know the commits I want to pick and found this beautiful command:

      git log --graph --pretty=format':%C(yellow)%h%Cblue%d%Creset %s %C(white) %an, %ar%Creset'

      • Yeah, one of the great commands that I use all the time is similar to that (I actually put it in my .gitconfig and aliased it). Here is an example of what I mean:

        [alias]
        lol = log --graph --decorate --pretty=oneline --abbrev-commit
        lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
        s = status
        b = branch
        [color]
        branch = auto
        diff = auto
        interactive = auto
        status = auto

        So now I can type git lol (or git lola), git s for status, etc., which saves tons of time.

        With regard to the rebasing problems I was hinting at, it’s important to remember that once commits are pushed to a remote, you shouldn’t rebase them afterwards. The reason is that people could pull your commits and base their work on them; if you then decide to rebase (change the history) and push, other people will have messed-up histories and will have to rebase again themselves to get the code working. This gets worse and worse the more people you have using your code. Basically, a good rule I’ve seen around the internet is that unless you’re absolutely sure no one is using your code, rebasing pushed commits is a no-no.
