Do Open Source with Git and Github

This article originally appeared in the May 2012 php|architect magazine.

I often meet perfectly competent programmers who aren't involved in open source, either because they don't know how to approach a project or because they aren't sure how the process works. In this article we'll look at one example, the conference feedback site joind.in, and how you can use GitHub to start contributing code to it. Since so many projects are hosted on GitHub, this will help you get started with other projects, too.

The tl;dr Version for the Impatient

  1. Fork the main repo so you have your own github repo for the project
  2. Clone your repo onto your development machine
  3. Create a branch
  4. Make changes, commit them
  5. Push your new branch to your github repository
  6. Open a pull request

This article goes through this process in more detail, so you will be able to work with git and github projects as you please.

Start with Source Control

Source control is at the centre of the development process. Whether you're completely new to source control or have used other systems, git can take some getting used to, but this article includes everything you need for simple changes, and there are some resources for further reading at the end.

If you've worked with centralised systems such as Subversion, you're already used to source control showing who changed what and when. Sometimes you'll get a meaningful comment to go with that, sometimes you won't! Git gives us exactly the same thing; here are some recent changes on the joind.in repo on GitHub:

Recent commits to joind.in

When you move to a distributed system such as git (or Mercurial), there are many differences, but two are really key to understanding how it works:

there are many repositories for each project

a commit is a self-contained change identified by a hash, not a numbered revision of a central tree

To explain further: when you work with git, you will typically have one repository on a server somewhere and another on your local development machine. When you want to work with code, you clone the repository rather than checking code out from it. When you clone, you get the whole repository, history and all; you can't check out just a subdirectory. This means that whereas a centralised system might have a single repository with many projects inside it, a git-based setup typically has many top-level repositories, and you clone as many as you need for your work.
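To see that a clone really is the whole repository, here is a minimal sketch that runs entirely on your own machine. It assumes git is installed; the directory names (server-repo, my-clone) and the author identity are placeholders, and a local repository stands in for the one "on a server somewhere".

```shell
#!/bin/sh
# Sketch: cloning copies the whole repository, history included.
# Runs in a throwaway directory; all names here are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Placeholder identity so commits work even without a global git config
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# A local repository plays the part of the one on the server
git init -q server-repo
cd server-repo
echo "first" > README
git add README
git commit -q -m "first commit"
echo "second" >> README
git commit -q -am "second commit"
cd ..

# The clone receives every commit, not just the latest files
git clone -q server-repo my-clone
git -C my-clone log --oneline
```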

Commits in git are quite different from those in Subversion, for a few reasons. The most striking thing for a Subversion user is that revisions aren't numbered; instead, each one is identified by a unique hash. For example, in my log I see this:

commit aa502ec9d564e145b1e219743d06fb73ab816bc6
Author: Lorna Mitchell 
Date:   Sun Mar 4 23:13:38 2012 +0000

    straightening out a problem with API metadata not showing up when there is only one result in a list

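Commits are not numbered sequentially because there is no central server handing out numbers: commits are created independently in many repositories and combined later, so each one needs an identifier that is unique everywhere. Under the hood, each commit stores a snapshot of the project plus a link to its parent commit, and git presents it to you as a patch against that parent. So rather than comparing two revision numbers, you usually examine the contents of each commit. We'll see more examples of this later when we get the code checked out and start making our own changes.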
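A small local experiment makes the hash-based naming concrete. This sketch (assuming git is installed; the file name and identity are placeholders) creates two commits, prints the hash of the latest one, shows the raw commit object with its tree snapshot and parent, and then uses git show to display the commit as a patch.

```shell
#!/bin/sh
# Sketch: commits are named by hashes; git show presents a commit as a patch.
# Runs in a throwaway directory; file names and identity are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q demo
cd demo
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"
echo "hello, world" > greeting.txt
git commit -q -am "expand greeting"

# The full hash, and the abbreviated form most commands also accept
git rev-parse HEAD
git rev-parse --short HEAD

# The raw commit object: a "tree" line (the snapshot), a "parent" line, author, message
git cat-file -p HEAD

# git show compares the commit with its parent and prints the difference as a patch
git show HEAD
```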

Get a Github Repo

As we've already mentioned, there will be several repositories in play when you work with git. The very first thing you need to do is create an account on GitHub if you don't have one already, and log in. Go to the main repository of the project you want to work on (for us this is joind.in: https://github.com/joindin/joind.in) and look for the "Fork" button in the top right-hand corner.

Fork the main repo by clicking the "Fork" button at the top right

The term "fork" is confusing here, because GitHub uses it in a way that is quite different from what we usually mean when we discuss project forks. In this case, you are not forming a rogue offshoot project; you're just making a repository of your own that you can push changes to!

A Repository of Your Own

At this point, we have two repositories: the joind.in repository that belongs to the joindin organisation, and yours, which will be under https://github.com//joind.in. Next we will make a third repository, your development copy of the code. The situation we're aiming for looks something like the diagram below. You have already dealt with the GitHub end; now we'll clone your GitHub repo onto your local machine, or wherever you usually do development.

Project repo, your github repo, your local repo

To clone the repo, you will need to have git installed on the machine you're cloning to. Then, go to your github repo and take a look in the bar near the top:

The git URL of your github repo

Copy the contents of the box to your clipboard; this is the URL we need in order to clone the repository:

git clone [url] [directory]

So, to clone it onto my own machine, I adapt that command to use the address of my own GitHub repository:

lorna@taygete:~/joind.in$ git clone git@github.com:lornajane/joind.in.git dev
Cloning into dev...
Permission denied (publickey).                                                                                     
fatal: The remote end hung up unexpectedly

If you haven't used GitHub before, you will probably see an error message like this one, saying that your keys aren't set up. We need to generate some SSH keys (unless you have them already), place them where your SSH program can find them, and give your public key to GitHub so it can verify you when you connect over SSH. GitHub has some very approachable instructions on this and many other aspects of setup, but in a nutshell, you need to do the following:

  1. Create a public/private key pair using ssh-keygen or an equivalent method. These will be saved on your local machine, but you can copy them onto other machines if you need to use them from different places
  2. Give your public key to github. Click on your name at the top, then "Edit your Profile" and you'll see an "SSH Keys" entry on the left hand navigation. You add your public key here by copying and pasting the contents of the .pub file generated in step one. If you're not sure, there are links directly to the help pages you need on this page.
  3. Check that your connection works by ssh-ing to github with the command ssh -T git@github.com
  4. You should be all set!
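Step 1 can be done from the command line. This sketch assumes OpenSSH's ssh-keygen is installed; it writes the key pair into a temporary directory, with an empty passphrase, purely so the demo doesn't touch your real ~/.ssh. For real use you would normally accept the default location and set a passphrase. The ed25519 key type and the file name id_github are my choices for the example; RSA keys also work with GitHub.

```shell
#!/bin/sh
# Sketch: generate an SSH key pair and print the public half for GitHub.
# Writes to a temporary directory; the key name id_github is hypothetical.
set -e
keydir=$(mktemp -d)
trap 'rm -rf "$keydir"' EXIT

# -t key type, -C comment (often your email), -f output file,
# -N "" empty passphrase (demo only), -q quiet
ssh-keygen -q -t ed25519 -C "you@example.com" -N "" -f "$keydir/id_github"

# id_github is the private key: keep it to yourself.
# id_github.pub is the public key: paste its contents into GitHub's SSH Keys page.
cat "$keydir/id_github.pub"

# Once the key is on GitHub, step 3 checks the connection (needs network access):
# ssh -T git@github.com
```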

As an aside, I use different SSH keys for different servers, so mine don't have the default names that ssh expects. To tell ssh which key to use for GitHub, I have an SSH alias set up: when I connect to the host GitHub (rather than github.com), it uses a particular key. This is very handy if you want to keep keys separate for separate concerns, such as keeping your GitHub keys apart from the server keys you use for work access.
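An alias like this lives in your SSH client configuration, normally ~/.ssh/config. Here is a sketch of such an entry, assuming OpenSSH; the key file name id_github is hypothetical, and this is one way to set it up rather than my exact config. The script writes the entry to a temporary file and uses ssh -G, which prints the settings ssh would use for a host without connecting, so you can check that the alias resolves as expected.

```shell
#!/bin/sh
# Sketch: an SSH alias "GitHub" that points at github.com with a specific key.
# The key path ~/.ssh/id_github is a hypothetical example.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# In real use, this stanza would go in ~/.ssh/config
cat > "$tmp/ssh_config" <<'EOF'
Host GitHub
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_github
    IdentitiesOnly yes
EOF
chmod 600 "$tmp/ssh_config"   # ssh rejects config files others can write to

# Show what ssh would use when asked to connect to "GitHub" (no connection is made)
ssh -F "$tmp/ssh_config" -G GitHub | grep -E '^(hostname|user|identityfile) '
```

With an entry like this in place, a URL such as git@GitHub:lornajane/joind.in.git reaches github.com using that key.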

If all goes well, your successful clone will output something more like this:

lorna@taygete:~/joind.in$ git clone git@GitHub:lornajane/joind.in.git dev                                  
Cloning into dev...                                                                                               
remote: Counting objects: 16743, done.
remote: Compressing objects: 100% (4698/4698), done.
remote: Total 16743 (delta 12989), reused 15540 (delta 11881)
Receiving objects: 100% (16743/16743), 4.16 MiB | 429 KiB/s, done.
Resolving deltas: 100% (12989/12989), done.

You now have the full repository cloned into the directory you named; if you check the contents of this directory, you'll see the project files are there. The cloned repository is your working copy: you will make changes here, and when you want to share them, you push them back to GitHub.

Remotes and Aliases

Before we get too carried away with hacking, there are a couple of other things to set up which will help us a lot later on. Again, there is excellent documentation from GitHub itself detailing this process (http://help.github.com/fork-a-repo/), but it's very important, so we'll walk through it here too.

When you clone a repository, you automatically get an alias called origin which points to the github repository you were cloning. We'll use this word, origin, to mean your github repo.

We want to add another remote for the main repo that you forked. We'll call this one upstream, since it's a common term and one that GitHub themselves use. We probably don't have permission to push changes to upstream, but adding it as a remote makes it very easy to keep our code in sync with changes happening in the main project.

To add the upstream remote, run this command:

lorna@taygete:~/joind.in/dev$ git remote add upstream git@github.com:joindin/joind.in.git
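To see what this does without touching GitHub, here is a sketch in which local bare repositories stand in for the two GitHub repos: project.git plays joindin/joind.in and fork.git plays your fork. It assumes git is installed, and all names and paths are hypothetical; the git commands are the same ones you'd run against real URLs.

```shell
#!/bin/sh
# Sketch: "origin" comes from the clone, "upstream" is added by hand.
# Local bare repos stand in for GitHub; all names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Seed a tiny project and publish it as the "main" repository
git init -q seed
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed project.git      # stand-in for joindin/joind.in
git clone -q --bare project.git fork.git  # stand-in for your fork on GitHub

# Clone your fork: git creates the "origin" remote automatically
git clone -q fork.git dev
cd dev

# Add the main project as a second remote called "upstream"
git remote add upstream "$tmp/project.git"
git remote -v
```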

You can add any other remotes and name them as you please – I have lots of remotes of other joind.in contributors where I've added their repos to try out their changes. This lets me test things out, even things that aren't ready for production yet, safely, in a branch (more about branches in one moment).

Staying in Sync

Even without making any major changes in your project, you still need to be able to pull in all the changes that have been added to the main repository. Since we added the upstream alias, we can do this in a couple of steps quite easily.
First, we bring the changes that have been made in the main repo into our local one, the process looks something like this:

Pull changes from upstream into your development repo

To actually make this happen, we'll pull from the master branch of the upstream repository. I've included some sample output, but this will look different depending on which changes have happened upstream since you last updated:

lorna@taygete:~/joind.in/dev$ git pull upstream master
From GitHub:joindin/joind.in
 * branch            master     -> FETCH_HEAD
Updating aa502ec..dc57771
Fast-forward
 CHANGELOG |    6 ------
 testfile  |    1 -
 2 files changed, 0 insertions(+), 7 deletions(-)
 delete mode 100644 CHANGELOG
 delete mode 100644 testfile

Now that we have these changes in our local repo, we can push them to the repository on GitHub that belongs to us. When we do this, we're moving the same changes that we received from upstream onto our origin:

Push changes into your github repo

The command is literally push – we push to the origin repo, which is the one on github that belongs to our user:

lorna@taygete:~/joind.in/dev$ git push
Total 0 (delta 0), reused 0 (delta 0)
To git@GitHub:lornajane/joind.in.git
   aa502ec..dc57771  master -> master
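The whole sync round trip can be rehearsed locally in the same way. In this sketch (assuming git is installed, with local bare repositories standing in for the GitHub repos and hypothetical names throughout), another contributor pushes a change to the main project, and we pull it from upstream and push it on to origin.

```shell
#!/bin/sh
# Sketch: pull from upstream, push to origin. Local bare repos stand in
# for the GitHub repositories; all names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Set up: main project, your fork, and your local clone with an upstream remote
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # use "master", as in the article
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed project.git
git clone -q --bare project.git fork.git
git clone -q fork.git dev
git -C dev remote add upstream "$tmp/project.git"

# Someone else's change lands in the main project
git clone -q project.git other
echo "v2" > other/app.txt
git -C other commit -q -am "upstream change"
git -C other push -q origin master

# Step 1: pull upstream's master into your local repo
cd dev
git pull -q upstream master
# Step 2: push the same commits on to your fork
git push -q origin master
```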

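You may notice here that I didn't provide any additional arguments to my git push command. When this article was written, git's default (push.default set to matching) pushed every local branch that had a branch of the same name on the remote you cloned from; since git 2.0 the default is simple, which pushes only the current branch to its upstream branch. Either way, you can provide repository and branch names as optional arguments, for example: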

git push joe-bloggs feature-x

This command would push my feature-x branch to a remote called joe-bloggs that I had added to my repository, but only if I had write access to that remote repo. Since with GitHub we usually just push to our own repositories and then ask repository owners to pull our changes, you will almost always be pushing to your origin. We will see examples of pushing our own branches later on; first we need to create a branch and make some code changes.

Branching in Git

There are two main things you need to know about branching in git:

  1. Your working copy can switch itself between branches; there is no need to copy code
  2. Your branches are private until you share them

The point about how the working copy behaves is really important, because it's completely different from how Subversion works. In Subversion, when we "branch", we create a copy of the project, and the two copies are quite independent of one another. In git, when we switch branches, the files in our working copy change to represent the contents of that branch. You don't create any new directories, update any virtual hosts, or need to do anything else; just check out the branch you want.
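Here is a minimal local sketch of that behaviour, assuming git is installed; the branch and file names are made up. A file committed on one branch appears and disappears in the same directory as you check branches out.

```shell
#!/bin/sh
# Sketch: one working directory, different contents per branch.
# Runs in a throwaway directory; names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
echo "base" > README
git add README
git commit -q -m "base"

# Create a branch (-b) and commit a new file on it
git checkout -q -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "add feature file"
ls        # README and feature.txt

# Same directory, different branch: feature.txt is gone from the working copy
git checkout -q master
ls        # README only
```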

When you create a branch, you usually do so on your local repository, then you can share it by pushing it to github. If you don't want to share it, or don't want to share it yet, then you don't have to – but you can still commit. Whether you think people should be developing in isolation on their own machines without a backup of their code or not is a debate for another day, but it's still quite a neat feature; as developers, we always need safe places to try out new ideas.

By default, the "main" branch in git is called "master" (many newer repositories, including new ones on GitHub, call it "main" instead). However, for any feature that you are working on, you will want to branch. Exactly when you branch and what you call your branch depends a bit on the process that your project adopts. Most projects I've used have a process either based on, or similar to, the guidelines used by Zend Framework 2, and joind.in uses guidelines based on that too.

If you've been burned by branching in subversion, try to set those bad experiences aside – branching and merging in git is quite different and much more approachable! When done properly, branching in a sensible manner makes your changes easy to use and forms the basis of a quality development process.

Create a Branch

To create a branch, you checkout a branch that doesn't exist, with the -b switch to tell git to create it. As an example, here's a branch I was about to create anyway, so we can walk through the process:

lorna@taygete:~/joindin/dev$ git checkout -b impact-banner
M       src/.htaccess

We now have a second branch in our repository. You can see which branches you have and which one you're currently on by using the branch command:

lorna@taygete:~/joind.in/dev$ git branch
* impact-banner
  master

This shows all the branches in this repository, with an asterisk marking the one that is currently checked out. Look back at the output of the checkout command above: did you notice the modified file listed there? That's a file I have changed but not added to a commit, so it stays in my working copy and comes with me whenever I switch between branches (as long as the branch I switch to has the same committed version of that file; otherwise git stops and asks me to deal with the change first). In this case, it's because I have joind.in's debug environment variable enabled in .htaccess, and we don't want to commit that!
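The modified-file behaviour is easy to reproduce. In this sketch (assuming git is installed; the .htaccess contents are a hypothetical stand-in for a local debug setting), an uncommitted edit stays in the working copy when a new branch is created and checked out.

```shell
#!/bin/sh
# Sketch: uncommitted changes travel with you when you switch branches.
# Runs in a throwaway directory; file contents are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
echo "RewriteEngine On" > .htaccess
git add .htaccess
git commit -q -m "initial"

# A local tweak we do not intend to commit
echo "SetEnv DEBUG 1" >> .htaccess

# Create and switch to a new branch; git lists the modified file as it switches
# (on git 2.23 or later, "git switch -c impact-banner" does the same)
git checkout -b impact-banner

git branch           # the asterisk marks impact-banner
git status --short   # .htaccess is still shown as modified
```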

Now that we have a new branch, we can go ahead and develop our excellent contribution and commit to it. What I love about working with branches in git is how easy it is to switch between versions. I am often working on more than one thing at a time on joind.in, so I can have a few branches on the go, one for each feature, particularly when some small bug gets reported while I'm working on a big API change or something along those lines.

You might wonder why I don't make the small change in my master branch; we would do it this way in a centralised system. The reason is that it makes the next step easier – we can make whatever changes we like to our repo, but we need to keep an eye on how we package these so that they can be passed back to the original project. We'll talk about pull requests in a moment, but for now bear in mind that with this process every change, small or far-reaching, gets its own branch.

Making Changes

Let's make a small change in this branch. Joind.in is nominated for a PHP Architect impact award, so I'm going to add a link to that right at the top of the site. When I've made my change to the relevant view template, I can see the change by looking at the output of git status.

lorna@taygete:~/joind.in/dev$ git status
# On branch impact-banner
# Changes not staged for commit:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#       modified:   src/.htaccess
#       modified:   src/system/application/views/main/index.php
#
no changes added to commit (use "git add" and/or "git commit -a")

To check exactly what has changed, use git diff: this shows the changes in your working copy that haven't been staged yet. Once you've staged changes with git add, git diff --staged shows what will go into the next commit.

If we want to commit those changes, we first need to indicate which ones will form part of our commit, by using the add command:

lorna@taygete:~/joind.in/dev$ git add src/system/application/views/main/index.php
lorna@taygete:~/joind.in/dev$ git status
# On branch impact-banner
# Changes to be committed:
#   (use "git reset HEAD ..." to unstage)
#
#       modified:   src/system/application/views/main/index.php
#
# Changes not staged for commit:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#       modified:   src/.htaccess
#

Now when we commit, the entries under "Changes to be committed:" will be included in our commit.

lorna@taygete:~/joind.in/dev$ git commit -m "adding a notice about the impact awards to the homepage"
[impact-banner f150a3a] adding a notice about the impact awards to the homepage
 1 files changed, 3 insertions(+), 0 deletions(-)

Here I used the -m switch to allow me to supply a message on the same line as my commit command. You could alternatively just use git commit on its own, and git will launch your default editor for you to add your commit message to the file. When you save the file, your commit will be created. Our changes are now recorded, but only on our local repo. We will take some time to look at them, and then move on to see how we can share them with the wider world.
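Here's a sketch of that staging workflow in a throwaway repository, assuming git is installed; the file names mirror the article's, but the contents are invented. It stages only the template change, checks what will be committed with git diff --staged, and leaves the .htaccess tweak uncommitted.

```shell
#!/bin/sh
# Sketch: stage and commit one change while leaving another uncommitted.
# Runs in a throwaway directory; file contents are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q demo
cd demo
echo "RewriteEngine On" > .htaccess
echo "<h1>joind.in</h1>" > index.php
git add .htaccess index.php
git commit -q -m "initial"

# Two edits: one we want to commit, one we don't
echo "SetEnv DEBUG 1" >> .htaccess
echo "<p>Vote for joind.in in the impact awards</p>" >> index.php

git diff --stat            # unstaged changes: both files
git add index.php          # stage only the template change
git diff --staged --stat   # what the next commit will contain: index.php
git commit -q -m "adding a notice about the impact awards to the homepage"

git status --short         # .htaccess is still modified but uncommitted
```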

Inspecting Changes

The entire purpose of a version control system is that you can see which changes were made, when and by whom. We can look at what's been happening lately on this branch, by using git log:

Output of git log command

Here you can see everything that has happened until now for this particular version of the code. What you don't see here is which changes are on which branch, but git can show us this too, using the --graph option to git log. Personally, I find it much more useful to see all the information on one screen, so I also add the --oneline switch; combined, these switches produce output like this:

Git Log with --graph and --oneline

We see a very short summary of each commit, a few characters of the hash (but enough to identify each one uniquely – you can use the first 6 or so characters of a git commit hash instead of a whole one in most places) plus the first part of the commit message. Each branch is shown in a different colour, with the commits showing which branch they were added to and when they joined this branch. Since our branch is created from master, we see everything in the past merging in to master, and then our own commits added along the same line.
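To try --graph and --oneline on something small, this sketch (assuming git is installed; branch names and messages are invented) builds a short history in which a feature branch is merged back into master, prints it, and then uses an abbreviated hash in place of a full one.

```shell
#!/bin/sh
# Sketch: a small branched history viewed with git log --graph --oneline.
# Runs in a throwaway directory; names and messages are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo "feature work" >> file.txt
git commit -q -am "feature work"

git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "work on master"
# --no-ff keeps a merge commit so the branch shows up in the graph
git merge -q --no-ff -m "merge feature" feature

git log --graph --oneline

# Abbreviated hashes work wherever a commit is expected
short=$(git rev-parse --short HEAD)
git show -s --format=%s "$short"   # prints: merge feature
```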

We can drill in and see more detail on a particular change by using git show with the hash of the commit we want. For example, you can see that about halfway down, I removed a duplicate label. I can take a closer look at what actually happened there using git show:

lorna@taygete:~/joind.in/dev$ git show 221d8b7
commit 221d8b77ad7f80c4c4471eee9483a3fbade67f9a
Author: Lorna Mitchell
Date:   Sun Feb 5 15:00:20 2012 +0000

    removing duplicate label on login form

diff --git a/src/system/application/views/template2.php b/src/system/application/views/template2.php
index 41b30c8..2f365d4 100644
--- a/src/system/application/views/template2.php
+++ b/src/system/application/views/template2.php
@@ -131,7 +131,6 @@ $title[] = $this->config->item('site_name');

- Or login via these services:<br /> <a href="/facebook/request_token"><img src="/inc/img/signin_facebook.png" alt="Sign in with Facebook" title="Sign in with Facebook" /></a> </p> <p>

This is great for seeing what the commit was and its diff, all in one place. The diff output is the same as git diff produces and looks similar to other diff tools you've used in the past.

By default, git pipes the output of any command that could produce more than fits on your screen through a pager such as less, so you'll usually see the information in bite-sized, or should I say screen-sized, chunks.
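If you'd rather skip the pager, for instance when scripting or copying output, git's --no-pager option turns it off for a single command. This sketch (assuming git is installed; the file and messages are invented, loosely echoing the duplicate-label fix) makes a commit that removes a line and then shows it by abbreviated hash without paging.

```shell
#!/bin/sh
# Sketch: inspect one commit by abbreviated hash, bypassing the pager.
# Runs in a throwaway directory; file contents are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q demo
cd demo
printf 'Username\nPassword\nPassword\n' > login-form.txt
git add login-form.txt
git commit -q -m "add login form"
printf 'Username\nPassword\n' > login-form.txt
git commit -q -am "removing duplicate label on login form"

short=$(git rev-parse --short HEAD)
git --no-pager show "$short"   # commit details plus the diff, printed straight out
# Alternatively, GIT_PAGER=cat or "git config core.pager cat" change the pager itself
```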

Sharing Changes

There's no point using a collaborative version control system if you don't collaborate, so in this next section we'll share our new changes with the world. Earlier on, we saw how to pull in changes from the upstream repository, now we'll be passing our changes in the other direction.

As things stand, we have our change in a branch on our local repository, something like this:

You developed a great feature

To make our splendid new feature visible to others, we need to push it onto our GitHub repository. Earlier we saw that a plain git push works with "tracking" branches. A tracking branch is one that has a link to a branch in a remote repository. In git, our branches can be entirely private: we can create a branch in our local repository, commit to it as much as we like, change the commits that are on that branch, and merge some or all of them into another branch, or not. None of this has to be publicly visible if we don't want it to be.

In this case, however, we do want to share, so we will ask git to push this branch to the origin repository. To make it a tracking branch as well, so that later pushes and pulls on this branch know where to go, add the -u (--set-upstream) option, as in git push -u origin impact-banner; a plain push like the one below creates the branch on GitHub but doesn't record that link. To push this specific branch we do:

lorna@taygete:~/joind.in/dev$ git push origin impact-banner 
Counting objects: 15, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 832 bytes, done.
Total 8 (delta 6), reused 0 (delta 0)
To git@GitHub:lornajane/joind.in.git
 * [new branch]      impact-banner -> impact-banner
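To see the difference -u makes, here's a local sketch in which a bare repository stands in for your GitHub fork (assuming git is installed; names are hypothetical). After git push -u, git records origin/impact-banner as the branch's upstream, which is what lets a later plain git push or git pull on that branch work without extra arguments.

```shell
#!/bin/sh
# Sketch: push a new branch and record it as a tracking branch with -u.
# A local bare repo stands in for your GitHub fork; names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q seed
echo "v1" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed fork.git   # stand-in for your fork on GitHub
git clone -q fork.git dev
cd dev

git checkout -q -b impact-banner
echo "Vote for joind.in!" > banner.txt
git add banner.txt
git commit -q -m "adding a notice about the impact awards to the homepage"

# -u (--set-upstream) pushes the branch and records it as the tracking branch
git push -q -u origin impact-banner

# Show the upstream that was recorded: origin/impact-banner
git rev-parse --abbrev-ref --symbolic-full-name "@{u}"
```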

If we go back to GitHub now, we can find this new branch in our GitHub repository under the "branches" dropdown on the left. If your branch is listed there, you have pushed it successfully:

Push git branch

Do you see that "Pull Request" button at the top? Once you have selected the branch containing your changes, you can ask the owner of the main repository (joindin/joind.in, for example) to pull in your changes by clicking the button. You will be prompted to give a reason for the changes: if you're fixing a bug, give its reference number; if you're creating a feature, give a clear explanation of what you have changed and why. As a project lead, I don't accept changes when I can't understand why I need them, so making yourself clear is really useful here, for both of us!

Pull request

The pull request will be listed on the main project's pull requests list (http://github.com/joindin/joind.in/pulls) until it is either accepted or closed by someone who has write access to that repository. Once your pull request is accepted, you will be listed as a contributor to the project. That's publicly available information, so bragging rights are absolutely yours! Take a look at the joind.in contributors list (https://github.com/joindin/joind.in/contributors), and if you know or meet any of these people, please high five them :)

"Git" Involved

Hopefully we've given you enough here to get started with GitHub projects, whether that's an open source project you're keen to get involved with or a project of your own whose code you would like to share. Getting involved with an open source project is about so much more than just code. As a project lead, I am grateful for feedback, for people reporting and replicating bugs, and sometimes just for questions about the project. You certainly don't need to be a code wizard when you start looking at what's inside an open source project, although you may end up as one by the time you've "just had a quick look" at enough of it. You have been warned!

Resources

A few links for further reading, should you wish to:
Github's own help pages: http://help.github.com
A site that helped me understand many concepts: http://think-like-a-git.net/
Some certified-friendly open source projects: http://www.phpwomen.org/wordpress/os-project-opportunities
