Deleting commits from the git history

Today I wanted to fix a Git repo that contained some bad commits (i.e. git fsck complained about them). I wanted to do this because GitLab would not let me push the bad commits.

I wanted the code to look exactly as it did before, but with a different history: the bad commits should disappear, and (presumably) the work done in them should look as if it was done in the commits that follow them.

Here’s what I ran:

git filter-branch -f --commit-filter '
    if [ "${GIT_COMMIT}" = "abdcef012345abcdef012345etcetcetc" ]; then
        echo "Skipping GIT_COMMIT=${GIT_COMMIT}" >&2;
        skip_commit "$@";
    else
        git commit-tree "$@";
    fi
' --tag-name-filter cat -- --all

(Where abdcef012345abcdef012345etcetcetc was the ID of the commit I wanted to delete.)

Of course, you can make this cleverer to exclude multiple commits at a time, or run this several times, putting in the right commit ID each time.
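One way the "several commits at a time" idea might look is sketched below. The function name drop_commits and the variable BAD_COMMITS are made up for this example; the sketch assumes you pass full commit IDs and run it from the top of a clean working tree. Recent versions of Git also print a warning that recommends git filter-repo before filter-branch starts.

```shell
# drop_commits is a hypothetical helper: pass it the full IDs of the
# commits you want to remove from the history.
drop_commits() {
    # Build a space-padded list and export it, because the commit filter
    # runs in a child shell and only sees exported variables.
    BAD_COMMITS=" $* "
    export BAD_COMMITS

    git filter-branch -f --commit-filter '
        case "${BAD_COMMITS}" in
            *" ${GIT_COMMIT} "*)
                # This commit is on the list: leave it out of the new history.
                echo "Skipping GIT_COMMIT=${GIT_COMMIT}" >&2
                skip_commit "$@"
                ;;
            *)
                # Any other commit is recreated as normal.
                git commit-tree "$@"
                ;;
        esac
    ' --tag-name-filter cat -- --all
}
```

You would call it as drop_commits followed by each commit ID, separated by spaces. As with the single-commit version, the files in the following commits are untouched, so the work from a skipped commit shows up in the commit after it.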

New open source project on work time – git-history-data

Announcing a little open source project that I have built at work and been allowed to publish freely.

git-history-data analyses a Git source code repository and dumps out data in a form that is easy to analyse.

I wrote an article demonstrating how to use it to find out some interesting information about the codebase of Git itself and got it published on IBM DeveloperWorks Open: Learning about the Git codebase using git-history-data.

Difficult merges in Git – don’t panic!

A video in which I try to explain what merging and rebasing really are, to help you understand what is going on when Git presents you with scary-looking conflict messages. I also explain why you shouldn’t panic because it’s hard to lose your work, and how to get your work back if you really mess up:

Slides here: Difficult Merges in Git.

A commit represents the state of the world (and the history leading up to that state). A commit is not a diff.
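If you want to see this for yourself, here is a small sketch that builds a throwaway repo (the file names and commit messages are made up) and looks inside a commit. The commit object points at a tree, which is a full listing of the files, and at its parent, which is where the history comes from.

```shell
# Build a throwaway repo with two commits.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "First"

# The second commit only changes a.txt.
echo "changed" > a.txt
git commit -q -am "Second"

# The commit object names a whole tree and a parent commit, not a patch.
git cat-file -p HEAD

# The tree lists every file, including b.txt, which this commit did not touch.
git ls-tree --name-only HEAD
```

The diffs you see in git show or git log -p are worked out on the fly by comparing a commit's tree with its parent's tree.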

Merging means making a new commit with two (or more) “parents” (previous commits) that represents the result of merging the changes from two different threads of development that happened separately. None of the already-committed commits are modified – you just get a new commit on top. History is more complicated, but true.
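A rough sketch of this, again in a throwaway repo with made-up branch and file names: two threads of development are merged, and the new commit on top has two parents, while the existing commit on the feature branch keeps the same ID.

```shell
# Build a throwaway repo with two threads of development.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

# One thread of development on a branch called feature...
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
feature_before=$(git rev-parse feature)

# ...and another on main.
git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "Main work"

# Merge: this adds one new commit on top of main.
git merge -q --no-edit feature

# Prints the merge commit's ID followed by the IDs of its two parents.
git rev-list --parents -n 1 HEAD

# The feature commit itself has not been modified.
echo "feature before merge: $feature_before"
echo "feature after merge:  $(git rev-parse feature)"

git log --graph --oneline
```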

Rebasing means modifying the history of one thread of development so it looks like it happened after the other one. This involves modifying all the commits in that thread. There is no extra merge commit, so you lose the history of the merge that happened. History is simple, but it’s a lie, and if you mess up the rebasing process, you can’t get back to where you were (once your old commits have been garbage-collected).
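Here is a sketch of the same situation resolved with a rebase instead, once more in a throwaway repo with made-up names. It shows that the rebased commit gets a new ID, and that the old one can still be found through the branch's reflog as long as it has not been garbage-collected.

```shell
# Build a throwaway repo with two threads of development.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
old_tip=$(git rev-parse feature)

git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "Main work"

# Rebase feature so it looks as if it was written after the work on main.
git checkout -q feature
git rebase -q main
new_tip=$(git rev-parse feature)

# Same change, different commit: the ID is new because the parent changed.
echo "before rebase: $old_tip"
echo "after rebase:  $new_tip"

# The reflog still remembers where the branch used to point.
git reflog show feature

# Undo the rebase by moving the branch back to its previous position.
git reset -q --hard "feature@{1}"
```

After the reset, feature points at the original commit again and main.txt is gone from the working tree, exactly as before the rebase.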