New open source project on work time – git-history-data

Announcing a little open source project that I built at work and have been allowed to publish freely.

git-history-data analyses a Git source code repository and dumps out data in a form that is easy to analyse.

I wrote an article demonstrating how to use it to find out some interesting information about the codebase of Git itself and got it published on IBM DeveloperWorks Open: Learning about the Git codebase using git-history-data.

Difficult merges in Git – don’t panic!

A video in which I try to explain what merging and rebasing really are, to help you understand what is going on when Git presents you with scary-looking conflict messages. I also explain why you shouldn’t panic because it’s hard to lose your work, and how to get your work back if you really mess up:

Slides here: Difficult Merges in Git.

A commit represents the state of the world (and the history leading up to that state). A commit is not a diff.
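To make the "not a diff" point concrete, here is a minimal sketch in Python. It is a toy model rather than how Git stores anything: the `Commit` class, the `snapshot` and `diff` helpers, and the file names are all hypothetical. Each commit holds the whole tree plus its parents, and a diff is something you work out afterwards by comparing two snapshots.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a Git commit: a full snapshot plus parent links."""
    files: Tuple[Tuple[str, str], ...]   # (path, content) pairs: the whole tree
    parents: Tuple["Commit", ...] = ()   # the history leading up to this state


def snapshot(commit: Commit) -> Dict[str, str]:
    """Return the complete state of the world recorded by this commit."""
    return dict(commit.files)


def diff(old: Commit, new: Commit) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Derive a diff by comparing two snapshots; no diff is stored in a commit."""
    before, after = snapshot(old), snapshot(new)
    changed = {}
    for path in sorted(set(before) | set(after)):
        if before.get(path) != after.get(path):
            changed[path] = (before.get(path), after.get(path))
    return changed


first = Commit(files=(("README", "hello"), ("main.py", "print(1)")))
second = Commit(
    files=(("README", "hello"), ("main.py", "print(2)")),
    parents=(first,),
)
```

Note that `second` still carries the unchanged `README`: the commit is the state, and `diff(first, second)` is computed on demand.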

Merging means making a new commit with two (or more) “parents” (previous commits) that represents the result of merging the changes from two different threads of development that happened separately. None of the existing commits is modified – you just get a new commit on top. History is more complicated, but true.
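A rough sketch of that idea in Python, assuming a toy `Commit` class and an `ancestors` helper that I have made up for illustration (this is not Git's data model or API). The merge is simply one more commit whose parents are the tips of the two threads, and the earlier commits are left exactly as they were.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit: a name plus links to its parent commits."""
    name: str
    parents: Tuple["Commit", ...] = ()


def ancestors(commit: Commit) -> Set[str]:
    """Return the names of every commit reachable from `commit`, itself included."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.add(current.name)
            stack.extend(current.parents)
    return seen


base = Commit("base")
ours = Commit("ours", (base,))        # one thread of development
theirs = Commit("theirs", (base,))    # another thread, made separately

# The merge is a new commit on top; `ours` and `theirs` are untouched.
merged = Commit("merge", (ours, theirs))
```

Walking back from `merged` reaches both threads, which is why the history is more complicated but true.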

Rebasing means modifying the history of one thread of development so it looks like it happened after the other one. This involves replacing every commit in that thread with a rewritten copy. There is no extra merge commit, so you lose the history of the merge that happened. History is simple, but it’s a lie, and if you mess up the rebasing process, you can’t get back to where you were once your old commits have been garbage-collected (until then, they can still be found through the reflog).
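Here is a simplified Python sketch of why rebasing changes every commit in the thread. The `Commit` class and `rebase` function are hypothetical, and the id is just a hash of the parent id and a change description (real Git hashes more than that, including the tree, author and dates). The point it tries to show is that an id depends on the parent, so moving a thread onto a new base produces new commits, while the old ones stay around until they are cleaned up.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit whose id depends on its change and its parent's id."""
    change: str
    parent: Optional["Commit"] = None

    @property
    def id(self) -> str:
        parent_id = self.parent.id if self.parent else ""
        digest = hashlib.sha1(f"{parent_id}:{self.change}".encode())
        return digest.hexdigest()[:8]


def rebase(branch: List[Commit], new_base: Commit) -> List[Commit]:
    """Replay each change in `branch` (oldest first) on top of `new_base`."""
    rewritten = []
    parent = new_base
    for old in branch:
        parent = Commit(old.change, parent)  # a new commit, not an edited one
        rewritten.append(parent)
    return rewritten


base = Commit("initial")
main_tip = Commit("fix typo", base)

# A feature thread that started from `base`, before `main_tip` existed.
feature = [Commit("add parser", base)]
feature.append(Commit("add tests", feature[0]))

rebased = rebase(feature, main_tip)
```

After this, `rebased` contains the same changes with different ids, sitting on top of `main_tip`, and the original `feature` commits still point at `base`. In this toy model they survive as long as something refers to them, which loosely mirrors old commits lingering until garbage collection.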

Options for code reviews with Git

We’re thinking about switching to Git for my work, and I want to be confident we can still support good code reviews if we make the switch.

I am a big fan of in-person reviews, and for those git difftool is enough, but sometimes you need to review asynchronously, and then you need a tool or a process or something.

Here are the options as I see them so far (please comment if you know others I should consider):

  1. Emailing patches. Git has git format-patch and git bundle that allow creating a file containing changes that can be sent by email or message. These can be reviewed as patches or applied to the working tree and reviewed in context.
  2. Feature branch and pull request. Devs push their changes to a branch in a shared repo and send an email or message asking a colleague to pull the branch. The reviewer looks at the changes in the repo or pulls them, then either sends back comments, or merges the branch into their own and delivers to the master branch.
  3. Tools. There are several extra tools that sit in front of Git and deliver changes when they are reviewed. These include: Gerrit, Critic, Review Board.


What git server should I use?

At work we are considering whether we can use Git for our source control. I am a big fan of Git, so I’d like to see this happen.

We only need to work against a central repository most of the time, so I’m looking at what servers might work for us.

Update: This StackExchange question may help: Self-hosted replacement for Github.

Update: Added software from the StackExchange answers to the list.

Features we will need:

  • User management
  • Repository management
  • Browsing code and diffs via the web
  • Hosted in-house

Features we might want:

  • External user authentication e.g. via LDAP
  • Code review
  • Integrating with an issue tracker

Most of my use of Git so far has been against large servers like GitHub (which I really like) and SourceForge, but recently I set up a test Git server using gitolite and gitweb, which gives me my 4 “needs” above but not my 3 “wants”. It also requires command-line use of git to administer and SSH keys from users, so might not suit our system administrators or all our developers.

So, lazyweb, what server should I recommend?

Here is my research so far:

Free git server software

GitLab – looks a lot like GitHub, and appears to satisfy all 4 of my needs and all 3 of my wants. Might be a bit decentralised (ironically) for our usage e.g. the docs talk about using merge requests for code review whereas I’d expect we’d want a commit-gating style which is what I believe Gerrit provides.

SCM-Manager – looks very corporate. Likely it could satisfy my needs and my wants.

Gerrit + Gitblit – lots of code review features, used by major projects including the Android open source project. Weird that Gerrit doesn’t include a code browser and you have to add something like Gitblit. I think this will give me all 4 of my needs and all 3 of my wants.

Gitolite + gitweb – this is what I am using at the moment, and it works well, satisfying the needs above, but not the wants. Gitolite configuration is done by editing config files and pushing them into a special git repository on the server. Adding users means adding a user’s SSH key to the config repository, so requires tech-savvy users and admins. gitweb is fast and clear. My only complaint is that you don’t seem to be able to control the amount of context you see in a diff (often I want to see the full files).

Gitorious – the software behind a mature public site that some people really like. Doesn’t appear to do LDAP authentication, and may not integrate with issue tracking.

tuleap – project planning, chat, issue tracking, builds, document management, discussion board, news all in one product. Includes Gerrit for code reviews, Jenkins for build management. Supports LDAP and OpenID authentication.

Phabricator – code hosting and review, issue tracking, wiki, alerts, message boards, blogs, Q&A, polls all in one product. Supports LDAP and OAuth authentication.

GitPrep – explicitly a clone of GitHub. Seems to look nice, but a young project and not talked about much on the Internet yet.

GitBucket – explicitly a clone of GitHub. Couldn’t find much information beyond that.

GitList – only a repository viewer, but could possibly be used with gitolite instead of plain gitweb – may have more features such as full-file diffs, but I’m not sure. Looks pretty, but doesn’t have much documentation.

CGit – only a repository viewer, but definitely allows specifying the amount of context in diffs (and if you edit the URL directly you can ask for as much as you want). Very plain interface, and minimal documentation. Claims to be fast.

Cydra – may turn out to be good but no web site at the moment, so probably not mature enough to consider.

Gitosis appears to be a dead project.

Paid (in-house) git server software

Very unlikely that we will pay for anything, but here are the options I have found so far:

Atlassian Stash – one-time payment e.g. $6,000 for 100 users.

GitHub Enterprise – apparently they do in-house installations but I couldn’t find any information. GitHub has an excellent interface and features.

Microsoft Team Foundation Server – a larger system that offers Git integration as a feature.

RhodeCode – 50 users for $199/month, supports LDAP and Active Directory authentication, and code review. Claims to be highly secure.