Simple version control with Git in 10 minutes

Disclaimer: I am new to Git. I am definitely not an expert. I am a doctoral candidate just learning, and I am posting this as a reference for myself and, hopefully, as a helpful guide for other beginners. But this is far from exhaustive, and I may even inadvertently introduce some bad practices.
In fact, recent studies have shown that you will get marmalade on your shirt just before leaving for an important interview if you follow my advice too uncritically.

As the amount, length and complexity of the code I am writing has started to grow, the trusty old “copy the entire work folder and continue working on the new copy every time you make a major improvement, so you don’t mess things up too badly” approach seems more and more uncool. It is clumsy, risky, and takes up insane amounts of space when there are also data files etc. to keep track of. So I decided to see what that version control hype was all about. Maaan, I so should have done that earlier.

Part I: Simple version control/change tracking on a workstation

My use case

I am a Ph.D. student in astronomy, generally writing code to analyze my data. I work on a university desktop running Ubuntu (but this HowTo should also work just fine on a Mac). I also have a laptop running Ubuntu that I occasionally use to work on my stuff from home. A remote SSH connection is usable but often slow and frustrating, and it’s vulnerable to the network outages that sometimes happen on my wireless network.

I want to keep track of changes in my working directory (one for each project I am working on that I want to keep track of independently), I want to be able to perform crazy experiments with my code without having to worry about messing up my stable working version, and I want to be able to synchronize the versions on my home and office computer.

There may very well be other grand and righteous versioning systems out there, but I chose Git for the following reasons:

  • It seems to be very popular, also among grand-scale projects such as the Linux kernel.
  • It is scalable: while it can manage huge and complex projects, my typical small and simple project is quick and easy to set up.
  • It is flexible: I don’t need a central server to set it up. Any folder on my machine can act as a Git repository. I can synchronize a folder between my office desktop and my private laptop. And should I change my mind and want to set up a full-fledged shared repository on an online server, it is easy to migrate.

Installation and setup

Install

On Linux, Git is very likely found in your package manager. On Ubuntu, install the package from the Software Center or through the Terminal:

$ sudo apt-get install git

…that should be it. (On older Ubuntu releases the package was called git-core.) On a Mac, you can install it from here, or through MacPorts, where it is as easy as:

$ sudo port install git

– provided you have MacPorts set up already. I’m sure Windows users can install it too, but I have no idea how as I have not been using Windows for work purposes for a very long time.

Personalize

To tell Git about your name and e-mail, run:

$ git config --global user.name "Your Name"
$ git config --global user.email "your_email@server.com"

This step is not strictly necessary now, but it will be if you are going to collaborate on a project at a later stage, and it’s easy, so you might as well get it over with.

And hey! You can set your name for the current repo only by omitting the ‘--global’ option. That way, your info will be stored only in the repository you are currently working in.
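
As a minimal sketch of the per-repository setting, assuming a throwaway folder created with mktemp and placeholder name and e-mail values (both made up for illustration), the following stores the info in one repository and reads it back:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, the values go into this repository's .git/config only.
git config user.name "Project Name"            # placeholder value
git config user.email "project@example.com"    # placeholder value

# --local restricts the lookup to this repository's own settings.
local_name=$(git config --local user.name)
echo "$local_name"
```

Settings stored this way take precedence over the global ones, but only inside that repository.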

Setting up and maintaining a local repository

Setting up any local directory – that is, any folder on your computer – as a Git repository is easy. It does not provide synchronization (yet), but it allows you to track changes to your work, revert to any earlier stage if you want to, and switch between stable and experimental branches. If you want to, it is easy to add more functionality later.

Creating repository from scratch

Let’s start by creating a new folder for our fictional project. Open the terminal and go to the folder where you want to create your project. Now we’ll create a folder, navigate into it, initialize an empty repository and tell Git to track everything in the current folder (the folder is still empty, so this last step only does something once there are files in it):

 
$ mkdir Gittest 
$ cd Gittest 
$ git init 
$ git add . 

Adding files & folders

Git needs to be told specifically which files to track. It is fully possible to have a bunch of untracked files in your repository. So let us create and add a couple of files, e.g. an empty text file and an empty Python script:

 
$ touch newfile.txt 
$ touch myscript.py 
$ git add newfile.txt myscript.py 

Git is now tracking the files newfile.txt and myscript.py.
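
To check which files Git is actually tracking, something like the following sketch might help. It assumes a throwaway folder made with mktemp, and notes.txt is a hypothetical extra file that is deliberately left untracked:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# notes.txt is a hypothetical extra file that we never add.
touch newfile.txt myscript.py notes.txt
git add newfile.txt myscript.py

# Files Git is tracking (i.e. files that have been added).
tracked=$(git ls-files)
# Files that exist in the folder but are not tracked.
untracked=$(git ls-files --others --exclude-standard)

echo "tracked:"
echo "$tracked"
echo "untracked:"
echo "$untracked"
```

The untracked file is left alone by Git until it is added explicitly.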

Turning existing project into Git repository

This is very easy, thanks to the fact that Git automatically adds all contents of a folder if you add the folder. So suppose you have a folder where you already have a bunch of subfolders and files, and you want to set it up as a Git repository. Just navigate to it and do the same as before:

$ git init
$ git add .

Git is now tracking all files in the folder and any subfolders. Remember, though, that if you add any folders and files later, you still need to specifically add these before they are tracked.

First snapshot

After you have run the above steps, it is time to make the first snapshot of the new repository. This is done by running:

$ git commit 

This opens a text editor (Vim, in my case) and prompts you to write a commit message describing the changes made since the last commit. Write the message, save the file and exit the editor.

The current state of your tracked files is now saved under a unique, auto-generated ID (a long hexadecimal hash). Snapshots can be taken as often as you want.
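
Here is a small sketch of a first snapshot and how to look up its ID afterwards. It assumes a throwaway folder made with mktemp and placeholder name and e-mail values, and it passes the message with -m so no editor opens:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

touch myscript.py
git add myscript.py
git commit -q -m "First snapshot"

# Each commit is identified by a hash; --oneline shows an abbreviated form.
git log --oneline

# The full ID of the latest commit, and the number of commits so far.
commit_id=$(git rev-parse HEAD)
commit_count=$(git rev-list --count HEAD)
echo "$commit_id ($commit_count commit so far)"
```

The ID differs from run to run, since it depends on the content, the author and the time of the commit.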

Staging and committing

Git keeps track of your files, but it does not automatically include every change you make in the next commit (this lets you commit changes to one file while holding back changes to another). Every time you make a commit, you first need to stage your changes, that is, specify which files and changes to include, using ‘git add’ as described above. A shortcut that commits all changes you’ve made to already tracked files, and avoids opening the editor, is:

$ git commit -a -m "Commit message"

If you want to see which changes are staged and which are not, you can run

 
$ git status
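
The following sketch shows what staging buys you: two files are changed, but only one of them goes into the commit. The folder is a throwaway made with mktemp, and the file names analysis.py and plots.py are hypothetical:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

# Hypothetical file names, committed in a first version.
echo "version 1" > analysis.py
echo "version 1" > plots.py
git add .
git commit -q -m "Initial version"

# Change both files, but stage only one of them.
echo "version 2 with a fix" > analysis.py
echo "version 2, unfinished" > plots.py
git add analysis.py

# Short status: first column = staged, second column = not staged.
git status --short

git commit -q -m "Update analysis only"

# plots.py still carries its uncommitted change.
pending=$(git status --short)
echo "$pending"
```

After the commit, only plots.py is still listed as modified; the change to analysis.py is safely in the history.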

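The command

$ git reset 

unstages everything you have staged since the last commit, but leaves the files themselves untouched. To actually throw away all changes to tracked files, staged or not, since the last commit, use

$ git reset --hard

Be careful with the latter: the discarded changes are gone for good.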
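
To see the difference between a plain reset and a hard reset, here is a minimal sketch in a throwaway folder (made with mktemp, with placeholder name and e-mail values):

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

echo "stable" > myscript.py
git add myscript.py
git commit -q -m "Stable version"

# Make a change and stage it.
echo "experiment" > myscript.py
git add myscript.py

# Plain reset: the change is unstaged, but the file on disk is untouched.
git reset -q
after_reset=$(cat myscript.py)

# Hard reset: the file is restored to its state in the last commit.
git reset -q --hard
after_hard_reset=$(cat myscript.py)

echo "after reset: $after_reset"
echo "after reset --hard: $after_hard_reset"
```

Only the hard reset brings the file content back to the committed version.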

In practice, I hardly ever worry about staged and unstaged changes, as mostly I just commit with the -a option. But there are times when it comes in handy.

Branching and merging

Suppose you have a piece of code that works and is stable and you need to use it on a regular basis, but on the other hand you really want to add this awesome experimental feature to it. The answer is to create a new branch that we can call e.g. ‘experimental’:

$ git branch experimental

This new branch is basically a safe playground to conduct all the crazy experiments you could think of without ruining anything. To move to the new experimental branch, do:

 
$ git checkout experimental 
$ git branch

The latter command, ‘git branch’ with no argument, shows you which branches exist, and marks the current branch with an asterisk (‘*’). All changes made now will be tracked in the experimental branch while leaving the ‘master’ branch alone. If you want to quickly fix a bug in your master branch, you run ‘git checkout master’, modify your code, commit the changes and run ‘git checkout experimental’. This way, you can track changes to two different versions of your code.
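
As a sketch of how the two branches stay separate, the following commits a change on ‘experimental’ and then compares the file on both branches. It assumes a throwaway folder made with mktemp; since the default branch may be called ‘master’ or ‘main’ depending on your Git setup, the name is looked up rather than hard-coded:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

echo "stable code" > myscript.py
git add myscript.py
git commit -q -m "Stable version"

# Name of the branch we started on ('master' or 'main', depending on setup).
main_branch=$(git symbolic-ref --short HEAD)

# Create the playground branch, switch to it and commit an experiment.
git branch experimental
git checkout -q experimental
echo "crazy idea" >> myscript.py
git commit -q -a -m "Try a crazy idea"

# Switching back restores the file as committed on the original branch.
git checkout -q "$main_branch"
on_main=$(cat myscript.py)

git checkout -q experimental
on_experimental=$(cat myscript.py)

echo "--- on $main_branch:"
echo "$on_main"
echo "--- on experimental:"
echo "$on_experimental"
```

The extra line only shows up while the experimental branch is checked out.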

I’ll get back to that… (Stashing)

Suppose you are hacking away on your experimental branch. You have made changes to a couple of files, maybe only staged one for commit. It’s all one big, happy creative mess, absolutely not ready for a commit just about now – when you find out you need to fix a simple but important bug in the stable master branch. For situations like this, git has the ‘stash’ command:

$ git stash 

This will save the status of your files and changes and “put them aside” for later while reverting the current branch to what it was after the last commit. So say you have added three lines to a file, staged it and then stashed it; you open the file and the three lines are nowhere to be seen. But run

$ git stash apply 

and the changes will be back (and can be stashed again etc.). So once you have stashed your changes, you can checkout the master branch, squash that bug, go back to the experimental branch and run ‘git stash apply’, and your creative mess is back where it was. Very neat!
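
A minimal sketch of the stash round trip, again in a throwaway folder made with mktemp and with placeholder name and e-mail values:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

echo "stable code" > myscript.py
git add myscript.py
git commit -q -m "Stable version"

# Uncommitted work in progress.
echo "half-finished idea" >> myscript.py

# Put the work aside; the file goes back to the last committed state.
git stash -q
while_stashed=$(cat myscript.py)

# Bring the work back.
git stash apply -q
after_apply=$(cat myscript.py)

echo "--- while stashed:"
echo "$while_stashed"
echo "--- after apply:"
echo "$after_apply"
```

Note that ‘git stash apply’ leaves the stash entry in place, so it can be applied again; ‘git stash list’ shows what is currently stashed.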

Merging

Once that experimental feature is ready for prime time, it is time to merge the ‘experimental’ branch back into ‘master’. With everything staged and committed in the experimental branch, checkout the master branch and run:

$ git merge experimental

Your new awesome feature has now been merged into the master branch, but any bugs you have fixed in master in the meantime of course stay fixed. Notice, though, that while Git is very good at tracking and merging changes, conflicting changes to the same line are marked in the file and will have to be resolved manually.
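
The whole cycle can be sketched like this: a feature is committed on ‘experimental’, a bug is fixed on the original branch in the meantime, and the merge combines both. The folder is a throwaway made with mktemp, feature.py is a hypothetical file name, and the two changes are kept in separate files so that no conflict arises:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

echo "stable code" > myscript.py
git add myscript.py
git commit -q -m "Stable version"
main_branch=$(git symbolic-ref --short HEAD)   # 'master' or 'main'

# New feature on the experimental branch, in its own (hypothetical) file.
git branch experimental
git checkout -q experimental
echo "awesome feature" > feature.py
git add feature.py
git commit -q -m "Add awesome feature"

# Meanwhile, a bug fix on the original branch.
git checkout -q "$main_branch"
echo "bug fixed" >> myscript.py
git commit -q -a -m "Fix bug"

# Merge; --no-edit accepts the default merge message without an editor.
git merge -q --no-edit experimental

# The history now shows both lines of development joined together.
git log --graph --oneline
```

After the merge, the original branch contains both feature.py and the bug fix in myscript.py.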

If your experimental branch is not needed anymore, you can delete it by

$ git branch -d experimental 

This requires that there are no changes in the branch that have not been merged. If there are unmerged changes but you still want to delete the branch, force it by using a capital D:

$ git branch -D experimental

Viewing log and restoring old version

To keep track of which changes have been made, you can run

$ git log --graph 

The ‘--graph’ option is not necessary, but it draws a nice illustration of the branching and merging history along the left side of the output.

Undo specific change

Any uncommitted changes to tracked files made since the last commit can be discarded with

$ git reset --hard

(a plain ‘git reset’ only unstages them and leaves the files as they are).

If you want to undo the effect of a given commit, you find the ID of the commit by running git log and then call:

$ git revert <commit id> 

If there are no conflicts, Git can even revert a commit made in the master branch before merging and still keep results of the merging.
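
Here is a small sketch of undoing one specific commit with revert, assuming a throwaway folder made with mktemp and placeholder name and e-mail values. The commit ID is looked up with ‘git rev-parse’ here instead of being copied by hand from the log:

```shell
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Test User"           # placeholder identity
git config user.email "test@example.com"   # placeholder identity

echo "good line" > myscript.py
git add myscript.py
git commit -q -m "Good change"

echo "bad line" >> myscript.py
git commit -q -a -m "Bad change"
bad_commit=$(git rev-parse HEAD)

# revert adds a new commit that undoes the given one; the history is kept.
# --no-edit accepts the default message without opening an editor.
git revert --no-edit "$bad_commit" > /dev/null

git log --oneline
after_revert=$(cat myscript.py)
echo "$after_revert"
```

The bad commit is still visible in the log, followed by a new commit that cancels it out.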

Learn more

If this has made you curious to learn more, there are of course lots of places on the internet that give much more in-depth information about Git. But in fact, Git also has excellent built-in documentation. Try entering

$ git help

in the terminal to see a list of available commands with a short description (there are many more than listed here!), and

$ git help [COMMAND]

to get in-depth information and examples for each of the commands.


In the next post (soon to come), I will write how to keep a synchronized copy of your repository (a clone) on a remote machine. Stay tuned!

Testing code syntax

I’m going to try to post some code here in the near future (or at least I hope to), so here is a test of the alleged code highlighting feature in Posterous:

import numpy as np

# A few example names to loop over.
names = ['Peter', 'Lars', 'Rebecca', 'Josephine']

# Append each name's index to the name, e.g. 'Peter-0'.
for i in np.arange(len(names)):
    namenum = names[i] + '-' + str(i)
    print(namenum)

print('All done!')

So, how does Posterous handle this code?
Nicely, I hope.

 

[Edit: But WordPress chokes on Posterous’ code tags. Well, well.]