This is the English translation of Das Git-Buch (The Git Book) by Valentin Haenel and Julius Plenz, 2nd Ed. 2014, released under the CC BY-NC-SA 4.0 license. Translated from German by Alexander Bolli and Tristano Ajmone in 2020.

Book Status

This document is still in Beta version, but fully translated; so enjoy reading it and leave us some feedback on how we might improve it.

We’re currently proofreading and polishing the entire text and fixing some styling and formatting issues. Any help with proofreading is much appreciated; if you wish to contribute, submit your changes via pull request on the beta-dev branch of the project repository.

Preface

Git was developed in early 2005 by Linus Torvalds, the creator and current maintainer of the Linux kernel. For the management of the kernel sources, the development team had initially decided to use the commercial version control system BitKeeper. Problems arose when the company behind BitKeeper, which provided the tool to the project free of charge, accused a developer of revealing the mechanisms of the software by reverse engineering. As a result, Torvalds decided to write a new version control system.

Simply switching to another system was not an option: the alternatives had a centralized architecture and did not scale well enough. The demands the kernel project places on a version control system are indeed huge: a single minor version jump (e.g. 2.6.35 to 2.6.36) comprises over 500,000 changed lines in almost 1000 files, contributed by over 1000 individuals.

So what were the design goals of the new program? Two characteristics crystallized quickly: speed (or performance) and verifiable integrity of the managed data.

After only a few weeks of work, a first version of Git was able to manage its own source code. Implemented as a small shell script collection with performance-critical parts in C, this version was still far from being a “full-fledged” version control system.

Since version 1.5 (February 2007), Git offers a new and tidier user interface and extensive documentation, allowing people not directly involved in Git development to use it.

The basic concepts have remained the same up to current versions: first and foremost the object model and the index, key features that distinguish Git from other VCS. The Unix philosophy of “one tool, one job” is also consistently applied here; the subcommands of Git are each independent, executable programs or scripts. Even in version 2.0 there are still (as at the beginning of the development) some subcommands implemented as shell scripts (e.g. git pull).

Linus Torvalds himself does hardly any programming on Git these days; a few months after the first release, Junio C. Hamano took over as maintainer.

Not only Git’s revolutionary approach, but also the fact that the entire kernel development was migrated to Git quickly and successfully, has fueled its steep rise. Many projects, some of them very large, now use Git and benefit from the added flexibility.

Who Is This Book Intended For?

The book is aimed at both professional software developers and users who want to work on small scripts, web pages or other documents or who want to get actively involved in an (open source) project. It teaches basic version control techniques, introduces the basics of Git, and explains all the major use cases.

Work that you don’t manage with a version control system is work that you might have to do again—​whether it’s because you accidentally delete a file or consider parts obsolete that you need later. For any form of productive text and development work, you need a tool that can record and manage changes to files. Git is flexible, fast, and equally suited for small projects by individuals or large projects involving hundreds of developers, such as the Linux kernel.

Developers who already use a different version control system can benefit from switching to Git. Git allows a much more flexible way of working and is in many respects not as restrictive as comparable systems. It supports true merging and guarantees the integrity of managed data.

Git also benefits open source projects, because each developer has his or her own repository, which prevents disputes over commit privileges. Git also makes it much easier for newcomers to get started.

Although most of the examples and techniques presented refer to source code, there is no fundamental difference to managing documents written in LaTeX, HTML, AsciiDoc or related formats.

How to Read the Book?

Ch. 1, Introduction and First Steps gives a brief overview: How do you initialize a git repository and manage files in it? It also covers the most important configuration settings.

Ch. 2, The Basics covers two key concepts of Git: the index and the object model. Along with other important commands that are introduced there, understanding these two concepts is essential to the safe use of Git.

Ch. 3, Practical Version Control discusses practical aspects of version control. In particular, it covers the branches and merges that are so central to Git. It also discusses how to resolve merge conflicts in detail.

Ch. 4, Advanced Concepts discusses advanced concepts, with a special focus on the Rebase command, an essential tool for any git professional. Other important commands follow, including Blame, Stash, and Bisect.

Only Ch. 5, Distributed Git looks at the distributed aspects of Git: how to share changes between repositories, how developers can collaborate. Then Ch. 6, Workflows provides an overview of strategies for coordinating development work in a project.

We recommend that you read at least the first five chapters in a row. They describe all the important concepts and techniques for using Git safely in large projects. You can read the following chapters in any order, depending on your interests and needs.

Ch. 7, Git Servers covers installation and maintenance of Git services: two web-based repository browsers and access management for hosted repositories using Gitolite.

Ch. 8, Git Automation summarizes various aspects of automation: How to write hooks and custom Git commands, and how to rewrite the complete version history if necessary.

Finally, Ch. 9, Interacting with Other Version Control Systems discusses migration from other systems to Git. The focus here is on converting existing Subversion repositories, and on the ability to talk to Subversion from within Git.

The appendices deal with the installation of Git and its integration into the shell. An outlook on the hosting service GitHub and a detailed description of the structure and maintenance mechanisms of a Git repository provide further background information.

Conventions

The examples are only executed on the shell. Even though some editors and IDEs now offer quite a good Git integration, and even though there are a lot of graphical front-ends for Git, you should first learn the basics with the real Git commands.

The shell prompt is a single dollar sign ($); keyboard input is printed in semi-bold, like this:

$ git status

To find your way around the shell faster and better, we strongly recommend adding Git functionality to the shell, such as displaying the branch in the prompt (see Ch. 10, Shell Integration).

Unless otherwise noted, we refer to Git version 2.0. The examples all run with English local settings.

Newly introduced terms are written in italics.

Installation and “The Git-Repository”

The installation of Git is described in detail in App. A, Installation. Some examples use the Git source repository, the repository where Git is actively developed. This repository is also called Git-via-Git or git.git.

After you have installed Git, you can download the repository with the following command

$ git clone git://git.kernel.org/pub/scm/git/git.git

The process takes a few minutes, depending on the connection speed and server load.

Documentation and Help

Comprehensive documentation for Git is available in the form of pre-installed man pages. Almost every subcommand has its own man page, which you can call up in three equivalent ways, shown here for the git status command:

$ git help status
$ git status --help
$ man git-status

On the Git website⁠[1] you can also find links to the official tutorial and other free documentation.

A large, vibrant community has formed around Git. The Git mailing list⁠[2] is the lynchpin of the development: patches are sent in, new features are discussed, and questions about using Git are answered. However, the list, with sometimes more than 100 emails a day, some of them very technical, is only suitable for beginners to a limited extent.

The Git Wiki⁠[3] contains documentation as well as an extensive link collection of tools based on Git⁠[4] and FAQs⁠[5].

Alternatively, the #git IRC channel on the Freenode network is a place to ask questions not already answered in the FAQs or documentation.

For those switching from the Subversion environment, the Git-SVN Crash Course[6] is recommended, a comparison of Git and Subversion commands that will help you transfer your Subversion knowledge to the Git world.

Also worth mentioning is Stack Overflow[7], a platform by programmers for programmers, on which technical issues, including Git, are discussed.

Downloads and Contacts

The sample repositories of the first two chapters and a collection of all longer scripts are available for download at http://gitbu.ch/.

If you have any comments, please contact us by e-mail at one of the following addresses: kontakt@gitbu.ch, valentin@gitbu.ch or julius@gitbu.ch.

Acknowledgements

First of all, we’d like to thank all the developers and maintainers of the Git project as well as the mailing list and the IRC channel.

Many thanks to Sebastian Pipping and Frank Terbeck for comments and tips. Special thanks to Holger Weiß for his review of the manuscript and helpful ideas. We thank the entire Open Source Press Team for the good and efficient cooperation.

Our thanks go especially to our parents, who have always supported and encouraged us.

Valentin Haenel and Julius Plenz — Berlin, June 2011

Preface to the 2nd Edition

In the 2nd edition, we have limited ourselves to carefully recording the changes in the use of Git that were introduced up to version 2.0 — in fact, many commands and error messages are now more consistent, so that in some places this represents a significant simplification of the text. Inspired by questions from Git training courses and our own experience, new hints on problems, solutions, and interesting features are included.

We thank all those who sent in corrections to the first edition: Philipp Hahn, Ralf Krüdewagen, Michael Prokop, Johannes Reinhold, Heiko Schlichting, Markus Weber.

Valentin Haenel and Julius Plenz — Berlin, September 2014

Preface to the Creative Commons Edition

The publisher Open Source Press, who initially convinced us to write this book at all and published it over the past few years, has ceased operations as of 31.12.2015 and has transferred all rights to the published texts back to the authors. We especially thank Markus Wirtz for the always good and productive collaboration that has connected us over many years.

Because feedback on this text has been predominantly very positive, we decided to make it freely available under a Creative Commons license.

Valentin Haenel and Julius Plenz — Berlin/Sydney, January 2016

1. Introduction and First Steps

The following chapter provides a concise introduction to the basic concepts and configuration settings of Git. A small sample project shows how to put a file under version control with Git, and the commands you use to perform the most important tasks.

1.1. Basic Terminology

Some important technical terms will be used repeatedly in the following and therefore require a brief explanation. If you have experience with another version control system, you will be familiar with some of the concepts involved, though perhaps under a different name.

Version Control System (VCS)

A system for managing and versioning software or other digital information. Prominent examples are Git, Subversion, CVS, Mercurial (hg), Darcs and Bazaar. Synonyms are Software Configuration Management (SCM) and Revision Control System.

We distinguish between centralized and distributed systems. In a centralized system, such as Subversion, there must be a central server where the history of the project is stored. All developers must connect to this server to view the version history or make changes. In a distributed system like Git, there are many equivalent instances of the repository, so each developer has their own repository. The exchange of changes is more flexible, and does not necessarily take place through a central server.

Repository

The repository is a database where Git stores the different states of each file in a project over time. In particular, every change is packaged and saved as a commit.

Working Tree

The directory in which you work on the files managed by Git (sometimes called sandbox or checkout in other systems). This is where you make all modifications to the source code. It is also often called the working directory.

Commit

Changes to the working tree, such as modified or new files, are stored in the repository as commits. A commit contains both these changes and metadata, such as the author of the changes, the date and time, and a commit message that describes the changes. A commit always references the status of all managed files at a particular point in time. The various Git commands are used to create, manipulate, view, or change the relationships between commits.

HEAD

A symbolic reference to the newest commit in the current branch. This reference determines which files you find in the working tree for editing. It is therefore the “head” or tip of a development branch (not to be confused with HEAD in systems like CVS or SVN).

SHA-1

The Secure Hash Algorithm creates a unique 160 bit checksum (40 hexadecimal characters) for any digital information. All commits in Git are named after their SHA-1 sum (commit ID), which is calculated from the contents and metadata of the commit. It is, so to speak, a content-dependent version number, such as f785b8f9ba1a1f5b707a2c83145301c807a7d661.

Object model

A git repository can be modeled as a graph of commits, manipulated by git commands. This modeling makes it very easy to describe how Git works in detail. For a detailed description of the object model, see Sec. 2.2, “The Object Model”.

Index

The index is an intermediate level between the working tree and the repository, where you prepare a commit. The index therefore indexes which changes to which files you want to package as commits. This concept is unique to Git and often causes difficulties for beginners and people switching to Git. We discuss the index in detail in Sec. 2.1.1, “Index”.

Clone

When you download a Git repository from the Internet, you create a clone of that repository. The clone contains all the information contained in the source repository, especially the entire version history including all commits.

Branch

A branch in the development. Branches are used in practice, for example, to develop new features, prepare releases, or to provide old versions with bug fixes. Branches are — just like the merging of branches (Merge) — extremely easy to handle in Git and an outstanding feature of the system.

master

Because you need at least one branch to work with Git, the branch master is created when you initialize a new repository. The name is a convention (similar to trunk in other systems); you can rename or delete this branch as you wish, as long as at least one other branch is available. The master branch is technically no different from other branches.

Tag

Tags are symbolic names for hard-to-remember SHA-1 sums. You can use tags to mark important commits, such as releases. A tag can simply be an identifier, such as v1.6.2, or it can contain additional metadata such as author, description, and GPG signature.
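The SHA-1 entry above describes IDs as content-dependent. The following minimal sketch illustrates that idea with git hash-object, which computes the ID Git would assign to a piece of file content (Git names file contents by the same principle as commits). The sample content is taken from the “Hello World!” example used later in this chapter; the variable names are ours. Nothing is stored in any repository here.

```shell
# Hash the same content twice and a slightly different content once.
# "git hash-object --stdin" only computes the ID; it does not store anything.
id_a=$(printf 'print "Hello World!\\n";\n' | git hash-object --stdin)
id_b=$(printf 'print "Hello World!\\n";\n' | git hash-object --stdin)
id_c=$(printf 'print "Hello, World!\\n";\n' | git hash-object --stdin)

echo "same content:      $id_a"
echo "same content:      $id_b"
echo "different content: $id_c"
```

Identical content should always yield the identical ID, while even a single added comma produces a completely different one.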

1.2. First Steps with Git

To get you started, we’ll use a small example to illustrate the workflow with Git. We create a repository and develop a one-liner, a “Hello, World!” program in Perl.

In order for Git to assign a commit to an author, you need to enter your name and email address:

$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"

Note that a subcommand is specified when Git is called, in this case config. Git provides all operations through such subcommands. It is also important that no equal sign is used when calling git config. The following call is therefore incorrect:

$ git config --global user.name = "John Doe"

This is a pitfall, especially for beginners, because Git does not output an error message but takes the equals sign as the value to set.
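If you would like to see this pitfall for yourself without touching your real configuration, the following sketch reproduces it in a throwaway repository. It deliberately omits --global, so only the scratch repository’s own configuration file is written; the directory and variable names are placeholders of our choosing.

```shell
# Work in a throwaway repository so that no real configuration is touched.
scratch=$(mktemp -d)
git init -q "$scratch/demo"

# Incorrect call: Git reports no error, but stores "=" as the value.
git -C "$scratch/demo" config user.name = "John Doe"
wrong=$(git -C "$scratch/demo" config --local user.name)

# Correct call: the value follows the option name directly.
git -C "$scratch/demo" config user.name "John Doe"
right=$(git -C "$scratch/demo" config --local user.name)

echo "with equals sign: $wrong"
echo "without:          $right"
```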

1.2.1. Our First Repository

Before we use Git to manage files, we need to create a repository for the sample project. The repository will be created locally, so it will only be on the file system of the machine you are working on.

It’s generally recommended that you practice using Git locally first, and only later dive into the decentralized features and functions of Git.

$ git init example
Initialized empty Git repository in /home/esc/example/.git/

First, Git creates the directory example/ if it doesn’t already exist. Git then initializes an empty repository in this directory and creates a subdirectory .git/ for it, which is used to manage internal data. If the example/ directory already exists, Git creates a new Git repository in it. If both the directory and a repository already exist, Git merely reinitializes the repository and leaves the existing history and data untouched. We change to the directory and look at the current state with git status:

$ cd example
$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

Git tells us that we’re about to commit (Initial commit), but hasn’t found anything to commit (nothing to commit). Instead, it gives a hint as to what the next steps should be (most Git commands do that, by the way): “Create or copy files, and use git add to manage them with Git.”

1.2.2. Our First Commit

Now let’s give Git a first file to manage, which is a “Hello World!” program in Perl. Of course, you can write any program in the programming language of your choice instead.

We’ll first create the hello.pl file with the following content

print "Hello World!\n";

and execute the script once:

$ perl hello.pl
Hello World!

That means we’re ready to manage the file with Git. But let’s take a look at the output of git status first:

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

      hello.pl

nothing added to commit but untracked files present (use "git add" to track)

While the first commit is still pending, Git registers that there are already files in that directory, but the system is unaware of them — Git calls them untracked. This is, of course, our little Perl program. To manage it with Git, we use the command git add <file>:

$ git add hello.pl

The add generally stands for “add changes” — so you will need it whenever you have edited files, not just when you first add them!

Git doesn’t provide output for this command. Use git status to check if the call was successful:

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

      new file:   hello.pl

Git will apply the changes — our new file — at the next commit. However, this commit is not yet complete — we’ve only prepared it so far.

To be precise, we’ve added the file to the Index, an intermediate stage where you collect changes that will be included in the next commit. For further explanation of this concept, see Sec. 2.1.1, “Index”.

With git status, under Changes to be committed, you can always see which files are in the Index, i.e., will be included in the next commit.

Everything is ready for the first commit with the git commit command. We also pass the -m option on the command line with a commit message describing the commit:

$ git commit -m "First version"
[master (root-commit) 07cc103] First version
 1 file changed, 1 insertion(+)
 create mode 100644 hello.pl

Git will confirm that the process has been successfully completed and the file will be managed from now on. The somewhat cryptic output means Git has created the initial commit (root-commit) with the appropriate message. A line has been added to a file, and the file has been created with Unix permissions 0644.⁠[8]

As you’ve no doubt noticed by now, git status is an indispensable command in your daily work — we’ll use it again here:

$ git status
On branch master
nothing to commit, working directory clean

Our sample repository is now “clean”, because there are no changes in the Working Tree or Index, nor are there any files that are not managed with Git (untracked files).
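The steps of this section can be replayed as one short script. This is only a sketch under a few assumptions: it works in a temporary directory, passes the author identity with -c for the single commit instead of relying on your global settings, and uses git status --short for compact output. The variable names are ours.

```shell
# Create a scratch repository and the one-line Perl program.
repo=$(mktemp -d)/example
git init -q "$repo"
printf 'print "Hello World!\\n";\n' > "$repo/hello.pl"

git -C "$repo" status --short      # hello.pl is listed as untracked
git -C "$repo" add hello.pl        # put the new file into the index
git -C "$repo" status --short      # hello.pl is now listed as added

# Commit what is in the index; the identity applies to this call only.
git -C "$repo" -c user.name="John Doe" -c user.email="john.doe@example.com" \
    commit -q -m "First version"

# A clean repository: no pending changes, exactly one commit.
remaining=$(git -C "$repo" status --porcelain)
commits=$(git -C "$repo" rev-list --count HEAD)
echo "pending changes: '${remaining}', commits: ${commits}"
```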

1.2.3. Viewing Commits

To conclude this brief introduction, we’ll introduce you to two very useful commands that you’ll often use to examine the version history of projects.

First, git show allows you to examine a single commit; with no arguments, it shows the most recent one:

$ git show
commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100

    First version

diff --git a/hello.pl b/hello.pl
new file mode 100644
index 0000000..fa5a091
--- /dev/null
+++ b/hello.pl
@@ -0,0 +1 @@
+print "Hello World!\n";

You see all relevant information about the commit: the commit ID, the author, the date and time of the commit, the commit message, and a summary of the changes in Unified-Diff format.

By default, git show always prints the HEAD (a symbolic name for the most recent commit), but you could also specify, for example, the commit ID, which is the SHA-1 checksum of the commit, a unique prefix to it, or the branch (master in this case). Thus, the following commands are equivalent in this example:

$ git show
$ git show HEAD
$ git show master
$ git show 07cc103
$ git show 07cc103feb393a93616842921a7bec285178fd56
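To check that such names really refer to the same commit, you can resolve each of them with git rev-parse. The sketch below builds its own scratch repository with a single (empty) commit, so the IDs will differ from the ones printed above; it also asks Git for the current branch name rather than assuming master, since newer installations may be configured with a different default.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name="John Doe" -c user.email="john.doe@example.com" \
    commit -q --allow-empty -m "First version"

branch=$(git -C "$repo" symbolic-ref --short HEAD)   # name of the current branch
full=$(git -C "$repo" rev-parse HEAD)                # complete commit ID
prefix=$(git -C "$repo" rev-parse --short HEAD)      # unique abbreviation

# Every spelling should resolve to the same commit ID.
for name in HEAD "$branch" "$prefix" "$full"; do
    printf '%s -> %s\n' "$name" \
        "$(git -C "$repo" rev-parse --verify "$name^{commit}")"
done
```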

If you want to view more than one commit, git log is recommended. More commits are needed to demonstrate the command in a meaningful way; otherwise, the output would be very similar to git show, since the sample repository currently contains only a single commit. So let’s add the following comment line to the “Hello World!” program:

# Hello World! in Perl

For the sake of the exercise, let’s take another look at the current status with git status:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working
   directory)

      modified:   hello.pl

no changes added to commit (use "git add" and/or "git commit -a")

After that, as already described in the output of the command, use git add to add the changes to the index. As mentioned earlier, git add is used both to add new files and to add changes to files already managed.

$ git add hello.pl

Then create a commit:

$ git commit -m "Comment line"
[master 8788e46] Comment line
 1 file changed, 1 insertion(+)

Now git log shows you the two commits:

$ git log
commit 8788e46167aec2f6be92c94c905df3b430f6ecd6
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Fri May 27 12:52:58 2011 +0200

    Comment line

commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100

    First version

1.3. Configuring Git

Like most text-based programs, Git offers a wealth of configuration options. So now’s the time to do some basic configuration. These include color settings, which are turned on by default in newer versions and make the output of Git commands easier to take in, and small aliases (abbreviations) for frequently needed commands.

You configure Git with the git config command. The configuration is saved in a format similar to an INI file. Without specifying further parameters, the configuration applies only to the current repository (.git/config). With the --global option, it is stored in the .gitconfig file in the user’s home directory, and is then valid for all repositories.⁠[9]

Important settings that you should always configure are the user name and e-mail address:

$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"

Note that you must protect spaces in the setting value (using quotation marks or backslashes). Also, the value follows the name of the option directly — an equal sign is not necessary here either. The result of the command can be found in the file ~/.gitconfig:

$ less ~/.gitconfig
[user]
    name = John Doe
    email = john.doe@example.com

The settings are now “global”, meaning they apply to all repositories you edit under that user name. If you want to specify an e-mail address other than your globally defined one for a particular project, simply change the setting there (this time, of course, without adding --global):

$ git config user.email maintainer@project.example.com

When querying an option, Git will first use the setting in the current repository if it exists, otherwise the one from the global .gitconfig; if this does not exist either, it will fall back to the default value.⁠[10] The default values of all options are documented in the git-config man page. You can get a list of all the settings you have set using git config -l.
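The lookup order can be tried out safely with the following sketch. To avoid modifying your real ~/.gitconfig, it uses a small wrapper function g (a helper of our own, not part of Git) that runs git with HOME and XDG_CONFIG_HOME pointing into a temporary directory, so that --global writes to a scratch file instead.

```shell
scratch=$(mktemp -d)

# Hypothetical helper: run git with a throwaway "home" directory.
g() { env HOME="$scratch" XDG_CONFIG_HOME="$scratch/.config" git "$@"; }

g init -q "$scratch/repo"

# Only the (scratch) global setting exists: it is the effective value.
g config --global user.email "john.doe@example.com"
before=$(g -C "$scratch/repo" config user.email)

# A setting in the repository takes precedence over the global one.
g -C "$scratch/repo" config user.email "maintainer@project.example.com"
after=$(g -C "$scratch/repo" config user.email)

echo "global only:         $before"
echo "with local override: $after"
```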

You can also edit the .gitconfig file (or the repository .git/config) by hand. This is especially useful for deleting a setting — although git config also offers a --unset option, it is easier to delete the corresponding line in an editor.

The commands git config -e or git config --global -e launch the editor configured for Git on the local or global configuration file.

Note, however, that when you set options with an appropriate command, Git automatically protects problematic characters in the option’s value so that no bad configuration files are created.

1.3.1. Git Aliases

Git offers you the possibility to abbreviate single commands and even whole command sequences via Aliases. The syntax is:

$ git config alias.<alias-name> <command>

To set st as an alias for status:

$ git config --global alias.st status
$ git st
On branch master
...

You can also include options in an alias, for example:

$ git config --global alias.gconfig 'config --global'

You will find more useful aliases later in the book; how to create more complex aliases is described in Sec. 8.3.8, “Extended Aliases”. But first, some useful abbreviations:

[alias]
    st = status
    ci = commit
    br = branch
    co = checkout
    df = diff
    he = help
    cl = clone

1.3.2. Adjusting Colours

Very helpful is the color.ui option, which controls whether Git should color the output of various commands. Thus, deleted files and lines appear red, new files and lines appear green, commit IDs appear yellow, etc. In newer Git versions (1.8.4 and later) this setting is already set automatically, so you don’t need to do anything.

The color.ui option should be set to auto: if the output from Git goes to a terminal, colors are used. If the output is written to a file instead, or piped to another program, Git will not output color sequences, as this could interfere with automatic processing.

$ git config --global color.ui auto
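The difference between auto and always can be observed by capturing Git’s output instead of printing it to the terminal. The sketch below looks for the escape character that starts every color sequence; it assumes that your configuration does not contain a more specific color setting (such as color.diff) that overrides color.ui, and it passes the option with -c so that nothing is written to any configuration file.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name="John Doe" -c user.email="john.doe@example.com" \
    commit -q --allow-empty -m "First version"

esc=$(printf '\033')   # every color sequence starts with this character

# Captured output is not a terminal, so "auto" should emit no color codes.
auto=$(git -C "$repo" -c color.ui=auto log --oneline)
# "always" colors the output regardless of where it goes.
always=$(git -C "$repo" -c color.ui=always log --oneline)

case $auto   in *"$esc"*) auto_colored=yes ;;   *) auto_colored=no ;;   esac
case $always in *"$esc"*) always_colored=yes ;; *) always_colored=no ;; esac
echo "auto: colored=$auto_colored, always: colored=$always_colored"
```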

1.3.3. Configuring Character Sets

Unless set otherwise, Git assumes UTF-8 as the character encoding for all text, especially author names and the commit message. If you want a different encoding, you should configure it explicitly:⁠[11]

$ git config i18n.commitEncoding ISO-8859-1

Similarly, the setting i18n.logOutputEncoding determines the character set Git converts names and commit messages to before outputting them.

The encoding of the files managed by Git is not important here and is not affected by these settings — files are only bit streams that Git does not interpret.

If you have to handle files encoded according to ISO-8859-1 in a UTF-8 environment, you should adjust the setting of your pager (see below) accordingly. The following setting is recommended for authors:

$ git config core.pager 'env LESSCHARSET=iso8859 less'

1.3.4. Line End Settings

Since Git runs on Windows systems like it does on unixoid systems, it has to solve the problem of different line-end conventions. (This only affects text files — binaries that Git recognizes as such are excluded from this treatment).

The core.eol setting, which can take one of the values lf, crlf or native, is mainly relevant for this. The default setting native lets Git use the system default, which is Line Feed (lf) only on Unix and Carriage Return & Line Feed (crlf) on Windows. When a file is committed, its line endings are automatically converted to line feeds only; when it is checked out, it gets CRLF line endings if necessary.

Git can convert between the two types when you check out the file, but it’s important not to mix the two. For this, the core.safecrlf option provides a mechanism to warn the user (value warn) or even disallow the commit (value true).

A safe setting, which also works with older Git versions on Windows systems, is to set core.autocrlf to input: This will automatically replace CRLF with LF when reading files from the filesystem. Your editor must then be able to handle LF line endings accordingly.
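The effect of core.autocrlf=input can be made visible by comparing file sizes. In this sketch a two-line file with CRLF line endings is added to a scratch repository; the option is passed with -c for this one call, and core.safecrlf is switched off for the call as well, purely to keep the demonstration free of warnings. File and variable names are ours.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Two lines with Windows-style CRLF endings: 2 x 10 bytes in the working tree.
printf 'line one\r\nline two\r\n' > "$repo/crlf.txt"

# CRLF is replaced with LF while the file is read into the index.
git -C "$repo" -c core.autocrlf=input -c core.safecrlf=false add crlf.txt

on_disk=$(wc -c < "$repo/crlf.txt")
in_index=$(git -C "$repo" cat-file -s :crlf.txt)
echo "working tree: $on_disk bytes, index: $in_index bytes"
```

The version stored by Git is two bytes shorter, one carriage return per line, while the file in the working tree remains unchanged.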

You can also specify these settings explicitly per file or subdirectory, so that the format is the same across all platforms (see Sec. 8.1, “Git Attributes — Treating Files Separately”).

1.3.5. Editor, Pager and Browser Settings

Git automatically starts an editor, pager, or browser for certain actions. Usually reasonable defaults are used, but if not, you can configure your preferred program with the following options:

  • core.editor

  • core.pager

  • web.browser

A word about the pager: By default, Git uses the less program, which is installed on most systems. It is started whenever a Git command produces output on a terminal. However, less is automatically configured by an environment variable to quit immediately if the output fits completely on the terminal. So less only comes to the foreground if a command produces a lot of output, and remains invisible otherwise.

If core.pager is set to cat, Git will not use a pager. The same behavior can also be achieved for a single command using the --no-pager parameter. In addition, you can use git config pager.diff false to ensure that the output of the diff command is never sent to the pager.

1.3.6. Configuration via Environment Variables

Some options can also be overridden by environment variables. In this way, options can be set in a shell script or alias for a single command only.

GIT_EDITOR

the editor that Git starts, for example, to create the commit message. Alternatively, Git uses the EDITOR variable.

GIT_PAGER

the pager to be used. The value cat switches the pager off.

GIT_AUTHOR_EMAIL, GIT_COMMITTER_EMAIL

uses the appropriate email address for the author or committer field when creating a commit.

GIT_AUTHOR_NAME, GIT_COMMITTER_NAME

the same as above, but for the name.

GIT_DIR

Directory in which the Git repository is located; only makes sense if a repository is explicitly stored under a directory other than .git.

The latter variable is useful, for example, if you want to access the version history of another repository within a project without changing directory:

$ GIT_DIR="$HOME/proj/example/.git" git log

Alternatively, you can use the -c option before the subcommand to override a setting for this call only. For example, you could tell Git to disable the core.trustctime option for the upcoming call:

$ git -c core.trustctime=false status
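The following sketch combines several of these variables in a scratch repository: the identity for a single commit is supplied through the environment, and the history is then read via GIT_DIR without changing into the repository. All paths and variable names are placeholders of our choosing.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Identity supplied via environment variables, for this one command only.
GIT_AUTHOR_NAME="John Doe" GIT_AUTHOR_EMAIL="john.doe@example.com" \
GIT_COMMITTER_NAME="John Doe" GIT_COMMITTER_EMAIL="john.doe@example.com" \
    git -C "$repo" commit -q --allow-empty -m "First version"

# Read the history from anywhere by pointing GIT_DIR at the repository.
author=$(GIT_DIR="$repo/.git" git log -1 --format=%an)

# GIT_PAGER=cat keeps the output out of the pager.
GIT_DIR="$repo/.git" GIT_PAGER=cat git log --oneline
echo "author: $author"
```

Because the variables are written in front of the command, they apply to that call only and do not remain set in your shell.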

1.3.7. Automatic Error Correction

The value of the help.autocorrect option determines what Git should do if it can’t find the subcommand you entered, for example if you accidentally type git statsu instead of git status.

If the option is set to a number n greater than zero and Git finds exactly one subcommand similar to the typed command, this command is executed after n tenths of a second. A value of -1 executes the command immediately. If the option is unset or has the value 0, Git only lists the possible candidates.

So to correct a typo after one second, set:

$ git config --global help.autocorrect 10
$ git statsu
WARNING: You called a Git command named 'statsu', which does not exist.
Continuing under the assumption that you meant 'status'
in 1.0 seconds automatically...
[...]

You can of course cancel the command during this time with Ctrl+C.

2. The Basics

In this chapter, we’ll introduce you to the most important Git commands that you can use to manage your project files in Git. Understanding the Git object model is essential for advanced usage; we’ll cover this important concept in the second section of the chapter. While these explanations may seem overly theoretical at first, we encourage you to read them carefully. All further actions will be much easier for you with the knowledge of this background.

2.1. Git Commands

The commands you learned to get started (especially add and commit) work on the index. In the following, we will take a closer look at the index and the extended use of these commands.

2.1.1. Index

The content of files for Git resides on three levels: the working tree, the index, and the Git repository. The working tree corresponds to the files as they reside on your workstation’s file system — so if you edit files with an editor, search in them with grep, etc., you always operate on the working tree.

The repository is the store for commits, that is, changes together with their author, date, and description. Together, the commits make up the version history.

Unlike many other version control systems, Git introduces an additional level, the index. It’s a somewhat elusive intermediate level between the working tree and the repository. Its purpose is to prepare commits. This means that you don’t have to commit all the changes you have made to a file at once.

The Git commands add and reset act (in their basic form) on the index, adding changes to the index and removing them again; only the commit command transfers the file to the repository as it is held in the index (Figure 1, “Commands add, reset and commit”).

Figure 1. Commands add, reset and commit

In the initial state, i.e. when git status outputs the message nothing to commit, the working tree and index are synchronized with HEAD. The index is therefore not “empty”, but contains the files in the same state as they are in the working tree.

Usually, the workflow is then as follows: First, you make a change to the working tree using an editor. This change is transferred to the index by add and finally saved in the repository by commit.

You can display the differences between these three levels using the diff command. A simple git diff shows the differences between the working tree and the index — the differences between the (actual) files on your working system and the files as they would be checked in if you called git commit.

The git diff --staged command, on the other hand, shows the differences between the index (also called the staging area) and the repository, that is, the differences that a commit would record in the repository. In the initial state, when the working tree and index are in sync with HEAD, neither git diff nor git diff --staged produces output.
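The interplay of the three levels can be imitated with a small toy model. The following Python sketch is not how Git is implemented; the class ToyRepo and its methods are names made up for this illustration, file contents are plain strings, and new or deleted files are ignored for simplicity.

```python
class ToyRepo:
    """Toy model of Git's three levels (hypothetical, for illustration only)."""

    def __init__(self, files):
        # In the initial state, all three levels hold the same content.
        self.head = dict(files)      # state of the latest commit
        self.index = dict(files)     # staging area
        self.worktree = dict(files)  # files as they are on disk

    def add(self, path):
        # Working tree -> index
        self.index[path] = self.worktree[path]

    def reset(self, path):
        # Repository (HEAD) -> index
        self.index[path] = self.head[path]

    def commit(self):
        # Index -> repository
        self.head = dict(self.index)

    def diff(self):
        # Like "git diff": working tree compared with the index
        return sorted(p for p in self.worktree
                      if self.worktree[p] != self.index.get(p))

    def diff_staged(self):
        # Like "git diff --staged": index compared with HEAD
        return sorted(p for p in self.index
                      if self.index[p] != self.head.get(p))


repo = ToyRepo({"hello.pl": "v1"})
repo.worktree["hello.pl"] = "v2"          # edit the file
print(repo.diff(), repo.diff_staged())    # ['hello.pl'] []
repo.add("hello.pl")
print(repo.diff(), repo.diff_staged())    # [] ['hello.pl']
repo.commit()
print(repo.diff(), repo.diff_staged())    # [] []
```

After the final commit all three levels agree again, which corresponds to the state in which git status reports nothing to commit.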

If you want to commit all changes to all tracked files, there are two shortcuts: First, the -u or --update option of git add. This transfers all changes to files already known to Git to the index, but does not yet create a commit. You can abbreviate further with the -a or --all option of git commit. This is a combination of git add -u and git commit, which puts all these changes into one commit without a separate staging step. Avoid getting into the habit of using these options: they may be handy as shortcuts on occasion, but they reduce flexibility.

2.1.1.1. Word-Based Diff

An alternative output format for git diff is the so-called Word-Diff, which is available via the --word-diff option. Instead of the removed and added lines, the output of git diff shows the added (green) and removed (red) words with an appropriate syntax and color-coded.⁠[12] This is useful when you are only changing single words in a file, for example when correcting AsciiDoc or LaTeX documents, because a diff is difficult to read if added and removed lines differ by only one word:

$ git diff
...
-   die Option `--color-words` zur Verfgung steht. Statt der entfernten
+   die Option `--color-words` zur Verfügung steht. Statt der entfernten
...

However, if you use the --word-diff option, only the changed words are displayed, marked accordingly. In addition, line breaks are ignored, which is very practical because merely re-wrapping the text does not show up as a change in the diff output:

$ git diff --word-diff
...
--color-words zur [-Verfgung-]{+Verfügung+} steht.
...

If you work a lot with continuous text, it is a good idea to set up an alias to abbreviate this command, so that you only have to type git dw, for example:

$ git config --global alias.dw "diff --word-diff"

2.1.2. Creating Commits Step by Step

But why create commits step-by-step — don’t you always want to check in all changes?

Yes, of course, you usually want to commit your changes completely. However, it can be useful to check them in step by step, for example, to better reflect the development history.

An example: You have worked intensively on your software project for the past three hours, but because it was so exciting, you forgot to pack the four new features into handy commits. In addition, the features are scattered over various files.

Ideally, you want to be selective, that is, you don’t want to commit all changes from one file, but only certain lines (functions, definitions, tests, …), and from different files.

Git’s index provides the flexibility you need for this. You collect some changes in the index and pack them into a commit — but all other changes are still preserved in the files.

We’ll illustrate this using the “Hello World!” example from the previous chapter. As a reminder, these are the contents of the hello.pl file:

# Hello World! in Perl
print "Hello World!\n";

Now we prepare the file so that it has several independent changes that we don’t want to combine into a single commit. First, we add a shebang line at the beginning.⁠[13] We also add a line naming the author, and the Perl statement use strict, which tells the Perl interpreter to be as strict as possible in its syntax analysis. It is important for our example that the file has been changed in several places:

#!/usr/bin/perl
# Hello World! in Perl
# Author: Valentin Haenel
use strict;
print "Hello World!\n";

With a simple git add hello.pl all new lines would be added to the index, so the state of the file in the index would be the same as in the working tree. Instead, we use the --patch option, or -p for short.⁠[14] As a result, Git asks us interactively which changes we want to add to the index. Git offers us each change one by one, and we can decide case by case how to handle it:

$ git add -p
diff --git a/hello.pl b/hello.pl
index c6f28d5..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,5 @@
+#!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
Stage this hunk [y,n,q,a,d,/,s,e,?]?

Here Git shows all changes in one piece, since they’re very close together in the code. If the changes are far apart or spread across different files, they’re offered separately. The term hunk refers to such a block of neighboring changes in the source code. At this point, the prompt offers us the following options:

Stage this hunk [y,n,q,a,d,/,s,e,?]?

The options are each only one letter long and hard to remember. Entering ? always shows a short reminder. We have summarized the most important options below.

y (yes)

Transfer the current hunk to the index.

n (no)

Don’t pick up the current hunk.

q (quit)

Do not pick up the current hunk or any of the following ones.

a (all)

Pick up the current hunk and all those that follow (in the current file).

s (split)

Try to split the current hunk.

e (edit)

Edit the current hunk.⁠[15]

In the example we split the current hunk and enter s for split.

Stage this hunk [y,n,q,a,d,/,s,e,?]? [s]
Split into 2 hunks.
@@ -1 +1,2 @@
+#!/usr/bin/perl
 # Hello World! in Perl

Git confirms that the hunk was successfully split, and now offers us a diff that contains only the shebang line.⁠[16] We specify y for yes and q for quit on the next hunk. To check if everything worked, we use git diff with the --staged option, which shows the difference between index and HEAD (the latest commit):

$ git diff --staged
diff --git a/hello.pl b/hello.pl
index c6f28d5..d2cc6dc 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,3 @@
+#!/usr/bin/perl
 # Hello World! in Perl
 print "Hello World!\n";

To see which changes are not yet in the index, a simple call to git diff is enough to show us that — as expected — there are still two lines in the working tree:

$ git diff
diff --git a/hello.pl b/hello.pl
index d2cc6dc..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,3 +1,5 @@
 #!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";

At this point we could create a commit, but for demonstration purposes we want to start from scratch. So we use git reset HEAD to reset the index.

$ git reset HEAD
Unstaged changes after reset:
M   hello.pl

Git confirms and names the files that have changes in them; in this case, it’s just the one.

The git reset command is in a sense the counterpart of git add: Instead of transferring differences from the working tree to the index, reset transfers differences from the repository to the index. Transferring changes from the repository into the working tree is potentially destructive, as your uncommitted changes may be lost. Therefore, this is only possible with the --hard option, which we discuss in Sec. 3.2.3, “Reset and the Index”.

If you frequently use git add -p, it is only a matter of time before you accidentally select a hunk you didn’t want. If the index held no other changes, this is not a problem, since you can reset it and start over. It only becomes a problem if you have already recorded many changes in the index and don’t want to lose them, i.e. you want to remove a particular hunk from the index without touching the other hunks.

Analogous to git add -p there is the command git reset -p, which removes single hunks from the index. To demonstrate this, let’s first apply all changes with git add hello.pl and then run git reset -p.

$ git reset -p
diff --git a/hello.pl b/hello.pl
index c6f28d5..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,5 @@
+#!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
Unstage this hunk [y,n,q,a,d,/,s,e,?]?

As in the example with git add -p, Git offers hunks one by one, but this time all the hunks in the index. Accordingly, the question is: Unstage this hunk [y,n,q,a,d,/,s,e,?]?, i.e. whether we want to remove the hunk from the index again. As before, by entering the question mark we get an extended description of the available options. At this point we press s once for split, n once for no and y once for yes. Now only the shebang line should be in the index:

$ git diff --staged
diff --git a/hello.pl b/hello.pl
index c6f28d5..d2cc6dc 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,3 @@
+#!/usr/bin/perl
 # Hello World! in Perl
 print "Hello World!\n";

In the interactive modes of git add and git reset, you must press the Enter key after entering an option. The following configuration setting will save you this extra keystroke.

$ git config --global interactive.singlekey true

A word of warning: git add -p may tempt you to check in versions of a file that do not run or are not syntactically correct (e.g. because you forgot an essential line). So don’t rely on your commit being correct just because make (which works on the working tree files!) runs successfully. Even if a later commit fixes the problem, the broken commit remains a problem, for example for automated debugging via bisect (see Sec. 4.8, “Finding Regressions — Git Bisect”).

2.1.3. Creating Commits

You now know how to exchange changes between working tree, index, and repository. Let’s turn to the git commit command, which you use to “commit” changes to the repository.

A commit records the state of all the files in your project at a given point in time, and also contains meta-information:⁠[17]

  • Name and e-mail address of the author

  • Name and e-mail address of the committer

  • Creation date

  • Commit date

In fact, the name of the author does not have to be the name of the committer (who commits). Often, commits are integrated or edited by maintainers (for example, by rebase, which also adjusts the committer information, see Sec. 4.1, “Moving commits — Rebase”). The committer information is usually of secondary importance, though — most programs only show the author and the date the commit was made.

When you create a commit, Git uses the user.name and user.email settings configured in the previous section to fill in the author and committer information.

If you call git commit without any additional arguments, Git will combine all changes in the index into one commit, and open an editor to create a commit message. The message template already contains instructions, commented out with hash marks (#), as well as information about which files are changed by the commit. If you call git commit -v, you will additionally get a diff of the changes you are about to check in, below the instructions. This is especially useful for keeping track of the changes, and for using the auto-complete feature of your editor.

Once you exit the editor, Git creates the commit. If you don’t specify a commit message or delete the entire contents of the file, Git will abort and not create a commit.

If you only want to write one line, you can use the --message option, or short -m, which allows you to specify the message directly on the command line, thus bypassing the editor:

$ git commit -m "This is the commit message"

2.1.3.1. Improving a Commit

If you committed too hastily and want to improve the commit slightly, the --amend (“correct”) option helps. The option causes Git to “add” the changes in the index to the commit you just made.⁠[18] You can also adjust the commit message. Note that the SHA-1 sum of the commit will change in any case.

The git commit --amend call only changes the current commit on a branch. Sec. 4.1.9, “Improving a Commit” describes how to improve past commits.

Calling git commit --amend automatically starts an editor, so you can edit the commit message as well. Often, however, you will only want to make a small correction to a file without adjusting the message. For this situation, the authors have found a fixup alias useful:

$ git config --global alias.fixup "commit --amend --no-edit"

2.1.3.2. Good Commit Messages

What should a commit message look like? The outer form leaves little room for variation: The commit message must consist of at least one line, and this first line should preferably be no longer than 50 characters. This makes lists of commits easier to read. If you want to add a more detailed description (which is highly recommended!), separate it from the first line with a blank line. No line should be longer than 76 characters, as is usual for email.

Commit messages often follow the habits or specifics of a project. There may be conventions, such as references to the bug tracking or issue system, or a link to the appropriate API documentation.

Note the following points when writing a commit description:

  • Never create empty commit messages. Commit messages such as Update, Fix, Improvement, etc. say as little as an empty message, so you might as well write nothing at all.

  • Very important: Describe why something was changed and what the implications are. What has been changed is always obvious from the diff!

  • Be critical and note if you think there is room for improvement or the commit may introduce bugs elsewhere.

  • The first line should not be longer than 50 characters, so the output of the version history always remains well formatted and readable.

  • If the message becomes longer, a short summary (with the important keywords) should be in the first line. After a blank line follows an extensive description.
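The formal rules above (non-empty message, at most 50 characters in the first line, a blank line before the description, no line over 76 characters) are easy to check mechanically. The following Python sketch shows one way to do so; the function name, the wording of the warnings, and the small list of vague subjects are our own choices for illustration and not part of Git.

```python
# Subjects the text above calls as meaningless as an empty message.
VAGUE_SUBJECTS = {"update", "fix", "improvement"}


def check_commit_message(message):
    """Return a list of formal problems found in a commit message."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["commit message is empty"]

    problems = []
    subject = lines[0]
    if subject.strip().lower() in VAGUE_SUBJECTS:
        problems.append("subject says nothing about the change")
    if len(subject) > 50:
        problems.append("first line is longer than 50 characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 76:
            problems.append(f"line {number} is longer than 76 characters")
    return problems


print(check_commit_message("Fix"))
# ['subject says nothing about the change']
print(check_commit_message("Add shebang line\n\nMakes the script executable."))
# []
```

Such a check only covers the form. Whether a message explains why something was changed still has to be judged by a human reader.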

We can’t stress enough how important a good commit description is. When committing, a developer remembers the changes well, but after a few days, the motivation behind them is often forgotten. Your colleagues or project members will thank you, too, because they can understand changes much faster.

Writing a good commit message also helps to briefly reflect on what has been done and what is still to come. You may find that you’ve forgotten one important detail as you write it.

You can also argue in terms of time: Writing a good commit message takes a minute or two. But how much less time will the bug-finding process take if each commit is well documented? How much time will you save others (and yourself) if you provide a good description of a diff that may be hard to understand? Also, the blame tool, which annotates each line of a file with the commit that last changed it, becomes an indispensable aid when commit descriptions are detailed (see Sec. 4.3, “Who Made These Changes? — Git Blame”).

If you are not used to writing detailed commit messages, start today. Practice makes perfect, and once you get used to it, the work will go quickly — you and others will benefit.

The Git repository is a prime example of good commit messaging. Without knowing the details of Git, you’ll quickly know who changed what and why. You can also see how many hands a commit goes through before it’s integrated.

Unfortunately, the commit messages in most projects are still very spartan, so don’t be disappointed if your peers are lazy about writing, but rather set a good example and provide detailed descriptions.

2.1.4. Moving and Deleting Files

If you want to delete or move files managed by Git, use git rm or git mv. They act like the regular Unix commands, but they also modify the index so that the action is included in the next commit.⁠[19]

Like the standard Unix commands, git rm also accepts the -r and -f options to recursively delete or force deletion. git mv also offers an option -f (force) if the new filename already exists and should be overwritten. Both commands accept the option -n or --dry-run, which simulates the process and does not modify files.

To delete a file from the index only, use git rm --cached. It then remains in the working tree.

You will often forget to move a file via git mv or delete it via git rm, and use the standard Unix commands instead. In this case, simply mark the file (already deleted by rm) as deleted in the index, too, using git rm <file>.

To rename the file, proceed as follows: First mark the old file name as deleted using git rm <old-name>. Then add the new file: git add <new-name>. Then check via git status whether the file is marked as “renamed”.

Internally, it makes no difference to Git whether you use git mv, or move a file regularly via mv and then run git add <new-name> and git rm <old-name>. In either case, only the reference to a blob object is changed (see Sec. 2.2, “The Object Model”).

However, Git comes with a so-called Rename Detection: If a blob is the same and is only referenced by a different file name, Git interprets this as a rename. If you want to examine the history of a file and follow it if it is renamed, use the following command:

$ git log --follow -- <file>
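The principle behind this can be sketched in a few lines of Python. The example below models two project states as dictionaries that map file names to blob IDs and reports a rename whenever an identical blob disappears under one name and appears under another. It covers only this exact-match case; the function name and the sample names and IDs are made up for illustration.

```python
def detect_renames(old_tree, new_tree):
    """Pair deleted and added file names that reference the same blob."""
    deleted = {p: blob for p, blob in old_tree.items() if p not in new_tree}
    added = {p: blob for p, blob in new_tree.items() if p not in old_tree}

    renames = []
    for new_path, blob in sorted(added.items()):
        for old_path, old_blob in sorted(deleted.items()):
            if old_blob == blob:
                renames.append((old_path, new_path))
                del deleted[old_path]  # each old name is used only once
                break
    return renames


# Hypothetical states: file name -> (abbreviated) blob ID
before = {"hello.pl": "c6f28d5", "README": "1a2b3c4"}
after = {"hello-world.pl": "c6f28d5", "README": "1a2b3c4"}
print(detect_renames(before, after))
# [('hello.pl', 'hello-world.pl')]
```

Note that no rename is stored anywhere: the conclusion is drawn purely by comparing the two states.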

2.1.5. Using Grep on a Repository

If you want to search for an expression in all files of your project, you can usually use grep -R <expression> . (the trailing dot stands for the current directory).

However, Git offers its own grep command, which you can call using git grep <expression>. By default, this command searches for the expression in all files managed by Git. If you want to examine only some of the files instead, you can specify a path pattern explicitly. With the following command you can find all occurrences of border-color in all CSS files:

$ git grep border-color -- '*.css'

The grep implementation of Git supports all common flags that are also present in GNU Grep. However, calling git grep is usually an order of magnitude faster, since Git has significant performance advantages due to the object database and the multithreaded design of the command.

The popular grep alternative ack is characterized mainly by the fact that it combines the lines of a file matching the search pattern under a corresponding “heading”, and uses striking colors. You can emulate the output of ack with git grep by using the following alias:

$ git config alias.ack '!git -c color.grep.filename="green bold" \
  -c color.grep.match="black yellow" -c color.grep.linenumber="yellow bold" \
  grep -n --break --heading --color=always --untracked'

2.1.6. Examining the Project History

Use git log to examine the project’s version history. The options of this command (most of which also work for git show) are very extensive, and we will introduce the most important ones below.

Without any arguments, git log will output the author, date, commit ID, and the full commit message for each commit. This is handy when you need a quick overview of who did what and when. However, the list is a bit cumbersome when you’re looking at a lot of commits.

If you only want to look at recently created commits, limit git log’s output to n commits with the -<n> option. For example, the last four commits are shown with:

$ git log -4

To display a single commit, enter:

$ git log -1 <commit>

The <commit> argument is a valid name for a single commit, such as the commit ID or SHA-1 sum. However, if you do not specify anything, Git automatically uses HEAD. Apart from single commits, the command also understands so-called commit ranges (series of commits), see Sec. 2.1.7, “Commit-Ranges”.

The -p (--patch) option appends the full patch in Unified-Diff format below the description. Thus, the output of git show <commit> is equivalent to that of git log -1 -p <commit>.

If you want to display the commits in compressed form, we recommend the --oneline option: It summarizes each commit with its abbreviated SHA-1 sum and the first line of the commit message. It is therefore important that you include as much useful information as possible in this line! For example, this would look like this:⁠[20]

$ git log --oneline
25f3af3 Correctly report corrupted objects
786dabe tests: compress the setup tests
91c031d tests: cosmetic improvements to the repo-setup test
b312b41 exec_cmd: remove unused extern

The --oneline option is shorthand for --pretty=oneline --abbrev-commit. There are other ways to customize the output of git log. The possible values for the --pretty option are:

oneline

Commit-ID and first line of the description.

short

Commit ID, first line of the description and author of the commit; output is four lines.

medium

Default; output of commit ID, author, date and complete description.

full

Commit ID, author’s name, name of the committer and full description — no date.

fuller

Like medium, but additionally date and name of the committer.

email

Formats the information from medium so that it looks like an e-mail.

format:⁠<string>

Any format can be adapted by placeholders; for details see the man page git-log(1), section “Pretty Formats”.

Independently of this, you can display more information about the changes made by the commit below the commit message. Consider the following examples, which clearly show which files were changed in how many places:

$ git log -1 --oneline 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir

$ git log -1 --oneline --name-status 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
M       setup.c
M       t/t1510-repo-setup.sh

$ git log -1 --oneline --stat 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
 setup.c               |   19
 t/t1510-repo-setup.sh |  210 +++++++++++++++++------------------
 2 files changed, 134 insertions(+), 95 deletions(-)

$ git log -1 --oneline --shortstat 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
 2 files changed, 134 insertions(+), 95 deletions(-)

2.1.6.1. Time Constraints

You can restrict the commits to be displayed by time using the --after or --since and the --until or --before options. --after and --since are synonymous, as are --until and --before, so each pair gives the same results.

You can specify absolute dates in any common format, or relative dates. Here are some examples:

$ git log --after='Tue Feb 1st, 2011'
$ git log --since='2011-01-01'
$ git log --since='two weeks ago' --before='one week ago'
$ git log --since='yesterday'

2.1.6.2. File-Level Restrictions

If you specify one or more file or directory names after a git log call, Git will only display the commits that affect at least one of the specified files. Provided a project is well structured, this narrows down the output considerably, and a particular change can be found quickly.

Since filenames may collide with branches or tags, you should be sure to specify the filenames after a -- which means that only file arguments follow.

$ git log -- main.c
$ git log -- *.h
$ git log -- Documentation/

These calls only output the commits in which changes were made to the main.c file, an .h file, or a file under Documentation/.

2.1.6.3. Grep for Commits

You can also search for commits in the style of grep, where the --author, --committer, and --grep options are available.

The first two options filter commits by author or committer name or address, as expected. For example, list all commits that Linus Torvalds has made since early 2010:

$ git log --since='2010-01-01' --author='Linus Torvalds'

You can also enter only part of the name or e-mail address here, so searching for 'Linus' would produce the same result.

For example, you can use --grep to search for keywords or phrases in the commit message, such as all commits that contain the word “fix” (not case-sensitive):

$ git log -i --grep=fix

The -i (or --regexp-ignore-case) option causes git log to ignore case when matching the pattern (this also works with --author and --committer).

All three options treat the values as regular expressions, just like grep (see the regex(7) man page). The -E and -F options change the behaviour of the options in the same way as egrep and fgrep: to use extended regular expressions or to search for the literal search term (whose special characters lose their meaning).

To search for changes, use the so-called Pickaxe tool. This will help you find commits whose diffs contain a certain regular expression (“grep for diffs”):

$ git log -p -G<regex>

The <regex> must be specified directly, i.e. without spaces, after the -G pickaxe option. The --pickaxe-all option causes all changes to the commit to be listed, not just those containing the change you are looking for.

Note that in earlier versions of Git, this operation was performed by the -S option. It differs from -G in that it only finds commits that change the number of occurrences of the search term. In particular, code that is merely moved, i.e. removed in one place and added elsewhere in the same file, is not found.
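The difference between the two pickaxe variants can be illustrated with a simplified Python model that compares the old and new version of a single file. The functions pickaxe_S and pickaxe_G are our own names, and difflib stands in for Git's diff machinery, so this is a sketch of the principle rather than of Git's implementation.

```python
import difflib
import re


def pickaxe_S(old, new, needle):
    """-S style: does the number of occurrences of needle change?"""
    return old.count(needle) != new.count(needle)


def pickaxe_G(old, new, pattern):
    """-G style: does any added or removed line match the pattern?"""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    changed = [line[2:] for line in diff if line.startswith(("+ ", "- "))]
    return any(re.search(pattern, line) for line in changed)


# The call foo() is moved from the top of the file to the bottom.
old = "foo()\nbar()\nbaz()\n"
new = "bar()\nbaz()\nfoo()\n"
print(pickaxe_S(old, new, "foo"))   # False
print(pickaxe_G(old, new, "foo"))   # True
```

In the example, the moved line appears in the diff once as removed and once as added, so the -G style check reports the commit, while the occurrence count stays at one and the -S style check does not.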

Equipped with these tools, you can now tame masses of commits yourself. Just specify as many criteria as you need to reduce the number of commits.

2.1.7. Commit-Ranges

So far, we’ve only looked at commands that require only a single commit as an argument, explicitly identified by its commit ID, or implicitly by the symbolic name HEAD, which references the most recent commit.

The git show command displays information about a commit, while the git log command starts at a commit, and then goes back in the version history until the beginning of the repository (called the root commit) is reached.

An important tool for specifying a series of commits is the commit range, written in the form <commit1>..<commit2>. Since we have not yet worked with multiple branches, this is simply a section of the history, from <commit1> exclusive to <commit2> inclusive. If you omit one of the two boundaries, Git uses HEAD in its place.
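For a history without branches, the meaning of a range can be written down in a few lines of Python. In this sketch the history is simply a list ordered from oldest to newest commit; the function name and the commit names A to D are hypothetical placeholders, not real commit IDs.

```python
def commit_range(history, start, end):
    """Commits after start, up to and including end (linear history only)."""
    i = history.index(start)
    j = history.index(end)
    return history[i + 1 : j + 1]


history = ["A", "B", "C", "D"]          # oldest first, HEAD is "D"
print(commit_range(history, "B", "D"))  # ['C', 'D']
print(commit_range(history, "D", "B"))  # []
```

The second call is empty because every commit up to B is already contained in the history of D. Note that git log would list the same commits in the opposite order, newest first.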

2.1.8. Differences between Commits

So far, git show and git log -p have only shown the difference between a commit and its predecessor. If you want to see the differences between several commits, use the git diff command.

The diff command performs several tasks. As already seen, you can examine the differences between the working tree and the index without specifying any commits, or the differences between index and HEAD with the --staged option.

However, if you pass two commits or a commit range to the command, the difference between these commits is displayed instead.

2.2. The Object Model

Git is based on a simple but extremely powerful object model. It is used to map the typical elements of a repository (files, directories, commits) and the development over time. Understanding this model is very important, and it helps to abstract from typical Git steps to better understand them.

In the following, we will again use a “Hello World!” program as an example, this time in the Python programming language.⁠[21]

Figure 2. “Hello World!” Program in Python

The project consists of the file hello.py as well as a README file and a directory test. If you run the program with the command python hello.py, you will get the output: Hello World!. In the directory test is a simple shell script, test.sh, which displays an error message if the Python program does not output the string Hello World! as expected.

The repository for this project consists of the following four commits:

$ git log --oneline
e2c67eb Comment was missing
8e2f5f9 Test file
308aea1 README file
b0400b0 First version

2.2.1. SHA-1 — The Secure Hash Algorithm

SHA-1 is a secure hash algorithm that calculates a checksum of digital information: the SHA-1 sum. The algorithm was introduced in 1995 by the American National Institute of Standards and Technology (NIST) and the National Security Agency (NSA). SHA-1 was developed for cryptographic purposes and is used for checking the integrity of messages and as a basis for digital signatures. Figure 3, “SHA-1 Algorithm” shows how it works, where we calculate the checksum of hello.py.

The algorithm is a mathematical one-way function that maps a bit sequence of at most 2^64-1 bits (about 2 exbibytes) to a checksum of 160 bits (20 bytes). The checksum is usually represented as a hexadecimal string of 40 characters. A checksum of this length can take 2^160 (approx. 1.5 · 10^48) different values, and therefore it is very, very unlikely that two bit sequences have the same checksum. This property is called collision resistance.

Figure 3. SHA-1 Algorithm
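The figures mentioned above can be checked with Python's standard hashlib module. The input bytes in this sketch are an arbitrary stand-in chosen for illustration, not the actual content of hello.py.

```python
import hashlib

data = b'print("Hello World!")\n'   # stand-in input, chosen for illustration
checksum = hashlib.sha1(data).hexdigest()

bits = hashlib.sha1().digest_size * 8
print(len(checksum), bits)          # 40 160

# Number of possible checksums, in scientific notation
print(f"{2**bits:.2e}")             # 1.46e+48

# Maximum input length of 2^64-1 bits, expressed in exbibytes (2^60 bytes)
print((2**64 - 1) / 8 / 2**60)      # 2.0
```

The hexadecimal representation has 40 characters because each character encodes 4 of the 160 bits.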

Despite all efforts of cryptologists, several years ago various theoretical attacks on SHA-1 became known, which are supposed to make the generation of collisions possible with a considerable computing effort.⁠[22] For this reason, NIST today recommends the use of the successors of SHA-1: SHA-256, SHA-384 and SHA-512, which have longer checksums and thus make the generation of collisions more difficult. On the Git mailing list there was a debate about switching to one of these alternatives, but this step was not considered necessary.⁠[23]

This is because, although there is a theoretical attack vector on the SHA-1 algorithm, this does not compromise the security of Git. In fact, the integrity of a repository is not primarily protected by the collision resistance of an algorithm, but by the fact that many developers have identical copies of the repository.

The SHA-1 algorithm plays a central role in Git because it is used to build checksums of the data stored in the Git repository, the Git objects. This makes them easy to reference as SHA-1 sums of their contents. In your daily work with Git, you will usually only use SHA-1 sums of commits, known as commit IDs. This reference can be passed to many Git commands, such as git show and git diff. Depending on the repository, you often only need to specify the first few characters of an SHA-1 sum, since in practice a prefix is sufficient to uniquely identify a commit.
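Why a prefix is usually enough can be seen in a short Python sketch. It looks up a prefix among a set of known commit IDs and insists on exactly one match. The function name is our own, and the abbreviated IDs from the example log above stand in for full 40-character sums; the fifth ID is invented to provoke an ambiguity.

```python
def resolve(prefix, commit_ids):
    """Return the one commit ID that starts with prefix."""
    matches = [c for c in commit_ids if c.startswith(prefix)]
    if not matches:
        raise KeyError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        raise KeyError(f"ambiguous prefix: {prefix}")
    return matches[0]


ids = ["e2c67eb", "8e2f5f9", "308aea1", "b0400b0"]
print(resolve("8e", ids))        # 8e2f5f9

# With a further (invented) ID, the same prefix is no longer unique.
try:
    resolve("8e", ids + ["8e77aa0"])
except KeyError as error:
    print(error)                 # 'ambiguous prefix: 8e'
```

The larger a repository grows, the more characters are needed before a prefix becomes unique, which is why the required length depends on the repository.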

2.2.2. The Git Objects

All data stored in a Git repository is available as Git objects. There are four types:⁠[24]

Table 1. Git Objects

Object | Saves…          | References other objects        | Correspondence
Blob   | File content    | No                              | File
Tree   | Blobs and Trees | Yes                             | Directory
Commit | Project state   | Yes, a tree and further commits | Snapshot/Archive at a time
Tag    | Tag information | Yes, an object                  | Naming important snapshots or blobs

Figure 4, “Git Objects” shows three objects from the example project — a blob, a tree, and a commit.⁠[25] The representation of each object includes the object type, the size in bytes, the SHA-1 sum, and the contents. The blob contains the content of the file hello.py (but not the file name). The tree contains references to one blob for each file in the project, i.e. one for hello.py and one for README, plus one tree per subdirectory, i.e. in this case only one for test. The files in the subdirectories are referenced separately in the respective trees that map these subdirectories.

Figure 4. Git Objects

So the commit object contains exactly one reference to a tree, and that reference is to the tree of the project content — this is a snapshot of the state of the project. The commit object also contains a reference to its direct ancestors, along with the metadata “author” and “committer” and the commit message.

Many Git commands expect a tree as an argument. Because a commit, for example, references a tree, such an argument is called tree-ish: any object that can ultimately be resolved to a tree. This category also includes tags (see Sec. 3.1.3, “Tags — Marking Important Versions”). Similarly, a commit-ish is an argument that can be resolved to a commit.

File contents are always stored in blobs. Trees only contain references to blobs and other trees in the form of the SHA-1 sums of these objects. A commit in turn references a tree.

2.2.3. The Object Database

All Git objects are stored in the object database and are identifiable by their unique SHA-1 sum, i.e. you can find an object in the database by its SHA-1 sum once it has been stored. Thus, the object database basically functions like a large hash table, where the SHA-1 sums serve as keys for the stored contents:⁠[26]

e2c67eb ⟶ commit
8e2f5f9 ⟶ commit
308aea1 ⟶ commit
b0400b0 ⟶ commit
a26b00a ⟶ tree
6cf9be8 ⟶ blob  (README)
52ea6d6 ⟶ blob  (hello.py)
c37fd6f ⟶ tree  (test)
e92bf15 ⟶ blob  (test/test.sh)
5b4b58b ⟶ tree
dcc027b ⟶ blob  (hello.py)
e4dc644 ⟶ tree
a347f5e ⟶ tree

You will first see the four commits that make up the Git repository, including the e2c67eb commit shown in Figure 4, “Git Objects”. This is followed by trees and blobs, each with file or directory correspondence. So-called top-level trees have no directory name: They refer to the top level of a project. A commit always references a top-level tree, so there are four of them.

The hierarchical relationship of the objects listed above is shown in Figure 5, “Hierarchical Relationship of Git Objects”. On the left-hand side, you can see the four commits that are already in the repository, and on the right-hand side, the referenced contents of the most recent commit (C4). As described above, each commit contains a reference to its direct predecessor (the resulting graph of commits is discussed below). This relationship is illustrated by the arrows pointing from one commit to the next.

Figure 5. Hierarchical Relationship of Git Objects

Each commit references the top-level tree — including the C4 commit in the example. The top-level tree in turn references the files hello.py and README in the form of blobs, and the subdirectory test in the form of another tree. Because of this hierarchical structure and the relationship of the individual objects to one another, Git is able to map the contents of a hierarchical file system as Git objects and store them in the object database.

2.2.4. Examining the Object Database

In a short digression we will go into how to examine the object database of Git. To do this, Git provides so-called plumbing commands, a group of low-level tools for Git, as opposed to the porcelain commands you usually work with. These commands are therefore not important for Git beginners, but are simply intended to give you a different approach to the concept of the object database. For more information, see Sec. 8.3, “Writing Your Own Git Commands”.

Let’s first look at the current commit. We’ll use the git show command with the --format=raw option to output the commit in raw format, so that everything this commit contains is displayed.

$ git show --format=raw e2c67eb
commit e2c67ebb6d2db2aab831f477306baa44036af635
tree a26b00aaef1492c697fd2f5a0593663ce07006bf
parent 8e2f5f996373b900bd4e54c3aefc08ae44d0aac2
author Valentin Haenel <valentin.haenel@gmx.de> 1294515058 +0100
committer Valentin Haenel <valentin.haenel@gmx.de> 1294516312 +0100

    Kommentar fehlte
...

As you can see, all the information from Figure 4, “Git Objects” is output: the SHA-1 sums of the commit, the tree, and the direct ancestor, plus the author and committer (including the date as a Unix timestamp) and the commit message. The command also prints the diff against the previous commit, but strictly speaking this is not part of the commit and is therefore omitted here.

Next, let’s take a look at the tree referenced by this commit, using git ls-tree, a plumbing command to list the contents stored in a tree. It’s similar to ls -l, except that it is in the object database. With --abbrev=7 we shorten the output SHA-1 sums to seven characters.

$ git ls-tree --abbrev=7 a26b00a
100644 blob 6cf9be8  README
100644 blob 52ea6d6  hello.py
040000 tree c37fd6f  test

As in Figure 4, “Git Objects”, the tree referenced by the commit contains one blob for each of the two files, and one tree (also called a subtree) for the test directory. Since we now know the SHA-1 sum of this subtree, we can look at its contents with ls-tree as well. As expected, the test tree references exactly one blob, the blob for the file test.sh.

$ git ls-tree --abbrev=7 c37fd6f
100755 blob e92bf15  test.sh

Finally, we make sure that the blob for hello.py really contains our “Hello World!” program and that the SHA-1 sum is correct. The git show command displays objects of any type. If we pass the SHA-1 sum of a blob, its contents are output. To check the SHA-1 sum we use the plumbing command git hash-object.

$ git show 52ea6d6
#! /usr/bin/env python

""" Hello World! """

print 'Hello World!'
$ git hash-object hello.py
52ea6d6f53b2990f5d6167553f43c98dc8788e81

A note for curious readers: git hash-object hello.py does not produce the same output as the Unix command sha1sum hello.py. This is because not only the file content is stored in a blob. Instead, the object type, in this case blob, and the size, in this case 67 bytes, are stored in a header at the beginning of the blob. The hash-object command therefore does not calculate the checksum of the file content, but of the blob object.
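The difference between the two checksums can be reproduced in Python. The sketch below assumes the header layout Git uses for loose objects: the object type, a space, the content size in bytes as a decimal number, and a NUL byte, followed by the content. The function names git_blob_hash and plain_sha1 are made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 sum of a blob object holding the given content."""
    # Header: "blob", a space, the size in bytes, and a terminating NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def plain_sha1(content: bytes) -> str:
    """Return the SHA-1 sum of the bare content, as sha1sum would."""
    return hashlib.sha1(content).hexdigest()


sample = b"test content\n"          # arbitrary sample, not hello.py
print(git_blob_hash(sample))        # what git hash-object would report
print(plain_sha1(sample))           # what sha1sum would report
```

To compare against a real file, read it in binary mode, e.g. git_blob_hash(open("hello.py", "rb").read()), and compare the result with the output of git hash-object hello.py.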

2.2.5. Deduplication

The four commits that make up the sample repository are shown again in Figure 6, “Repository Content”, but in a different way: tree and blob objects with dashed borders are unchanged, while all others were added or changed in the corresponding commit. The reading direction is from bottom to top: at the bottom is C1, which contains only the file hello.py.

Since trees only contain references to blobs and other trees, each commit stores the status of all files, but not their contents. Normally, only a few files change during a commit. New blob objects (and therefore new tree objects) are now created for the new files or those to which changes have been made. However, the references to the unchanged files remain the same.

Figure 6. Repository Content

What is more, a file that exists twice in the project exists only once in the object database. The contents of this file are stored as a single blob, which is referenced from two places. This effect is known as deduplication: duplicates are not merely cleaned up, they cannot arise in the first place. Deduplication is an essential feature of content-addressable file systems, i.e. file systems that know files only by their contents (as Git does by giving each object the SHA-1 sum of itself as its “name”).

Consequently, a repository in which the same 1 MB file exists 1000 times takes up only slightly more than 1 MB. Git essentially has to manage the blob, plus a commit and a tree with 1000 blob entries (20 bytes each plus the length of the filename). A checkout of this repository, on the other hand, consumes about 1 GB of space on the filesystem, because the deduplication applies only to the object database and every copy is written out in full to the working tree.⁠[27]
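The effect can be imitated with a toy content-addressable store in Python. This is a deliberately simplified sketch, not Git's storage format: the class ObjectStore, the file names, and the 1 KB payload standing in for the 1 MB file are all assumptions made for the illustration.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Identical content always maps to the same key, so storing it
        # again does not create a second entry.
        self.objects[key] = content
        return key


store = ObjectStore()
payload = b"x" * 1024               # stands in for the 1 MB file

# A "tree" with 1000 entries that all refer to the same content.
tree = {f"copy-{i}.bin": store.put(payload) for i in range(1000)}

stored_bytes = sum(len(content) for content in store.objects.values())
print(len(tree), len(store.objects), stored_bytes)
```

The tree holds 1000 names, but they all carry the same key, so the store keeps the payload only once.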

The git checkout and git reset commands restore a previous state (see also Sec. 3.2, “Restoring Versions”): You specify the reference of the corresponding commit, and Git searches for it in the object database. The reference is then used to find the tree object of this commit from the object database. Finally, Git uses the references contained in the tree object to find all other tree and blob objects in the object database and replicates them as directories and files on the file system. This allows you to restore exactly the project state that was saved with the commit at the time.

2.2.6. The Graph Structure

Because each commit stores its direct ancestors, a graph structure is created. More precisely, the arrangement of the commits forms a Directed Acyclic Graph (DAG). A graph consists of two core elements: the nodes and the edges connecting these nodes. In a directed graph, each edge also has a direction, which means that when you traverse the graph, you can move from one node to the next only along edges that point in the appropriate direction. The acyclic property means that no route through the graph leads back to the node you started from. So you cannot move in a circle.⁠[28]

Most Git commands are used to manipulate the graph: to add or remove nodes, or to change the relation of the nodes to each other. You’ll know you’ve reached an advanced level of Git competency when you’ve internalized this rather abstract concept and, in your daily work with branches, always think of the graph behind them. Understanding Git at this level is the first and only real hurdle to using Git confidently in everyday work.

The graph structure is derived from the object model, because each commit knows its direct ancestor (possibly several in the case of a merge commit). The commits form the nodes of this graph — the references to ancestors form the edges.

An example graph is shown in Figure 7, “A Commit Graph”. It consists of several commits, which are colored to show which development branch they belong to. First, the commits A, B, C, and D were made. They form the main development branch. Commits E and F contain feature development, which was merged into the main development branch with commit H. Commit G is a single commit that has not yet been integrated into the main development branch.

Figure 7. A Commit Graph

One result of the graph structure is the cryptographically secured integrity of a repository. Git uses the SHA-1 sum of a commit to reference not only the contents of the project files at a given point in time, but also all commits executed up to that point, and their relationship to each other, i.e. the complete version history.

The object model makes this possible: each commit stores a reference to its ancestors. These references are then used to calculate the SHA-1 sum of the commit itself. So you get a different commit if you reference another ancestor.

Since the predecessor in turn references predecessors, and its SHA-1 sum depends on the predecessors, and so on, this means that the complete version history is implicitly encoded in the commit ID. Implicit here means: If even one bit of a commit changes anywhere in the version history, then the SHA-1 sum of subsequent commits, especially the topmost one, is no longer the same. The SHA-1 sum doesn’t say anything detailed about the version history, though; it’s just a checksum of it.
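How a change deep in the history propagates to the newest commit ID can be sketched in Python. The commit format below is a strong simplification (a real commit also contains author, committer, and timestamps), and the names commit_id and build_history as well as the placeholder tree name are hypothetical.

```python
import hashlib


def commit_id(tree, parents, message):
    """Toy commit ID: a checksum over the tree, the parent IDs and the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += ["", message]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def build_history(messages):
    """Create a linear history and return the commit IDs, oldest first."""
    ids = []
    for message in messages:
        parents = ids[-1:]          # empty list for the root commit
        ids.append(commit_id("same-tree", parents, message))
    return ids


original = build_history(["C1", "C2", "C3", "C4"])
tampered = build_history(["C1", "C2 (altered)", "C3", "C4"])

for old, new in zip(original, tampered):
    print(old[:7], new[:7], "same" if old == new else "different")
```

Only the second commit was altered, yet the third and fourth IDs differ as well, because each of them includes the ID of its predecessor in its own checksum.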

2.2.6.1. References: Branches and Tags

However, there is not much you can do with a pure commit graph. To reference (i.e., work with) a node, you need to know its name, which is the SHA-1 sum of the commit. In everyday use, however, you rarely use the SHA-1 sum of a commit directly, but instead use symbolic names, called references, which Git can resolve to the SHA-1 sum.

Git basically offers two types of references: branches and tags. These are pointers into the commit graph that mark specific nodes. Branches have a “moving” character, meaning that they move along as new commits are added to the branch. Tags, on the other hand, are static and mark important points in the commit graph, such as releases.

Figure 8, “Example of a Commit Graph with Branches and Tags” shows the same commit graph with the master, HEAD, feature, and bugfix branches and the v0.1 and v0.2 tags.

Figure 8. Example of a Commit Graph with Branches and Tags

3. Practical Version Control

The following chapter introduces all the essential techniques you’ll use in your daily work with Git. In addition to a more detailed description of the index and how to restore old versions, the focus is on working effectively with branches.

3.1. References: Branches and Tags

In the CVS/SVN world, “branch” and “merge” are often a closed book to newcomers and a regular cause of hair-pulling for experts. In Git, branching and merging are commonplace, simple, transparent, and fast. It’s common for a developer to create multiple branches and perform multiple merges in one day.

The tool Gitk helps you keep track of several branches. With gitk --all you display all branches. The tool visualizes the commit graph explained in the previous section. Each commit is represented by one line. Branches are displayed as green labels, tags as yellow pointers. For more information, see Sec. 3.6.2, “Gitk”.

Figure 9. The sample repository from Ch. 2, The Basics. For illustration purposes, the second commit has been tagged v0.1.

Because branches in Git are “cheap” and merges are easy, you can afford to use branches liberally. Want to try something, prepare a small bug fix, or start with an experimental feature? You can create a new branch for each of these. You want to test if one branch is compatible with the other? Merge them together, test everything, then discard the merge and continue developing. This is common practice among developers using Git.

First, let’s look at references in general. References are nothing more than symbolic names for the hard to remember SHA-1 sums of commits.

These references are stored in .git/refs/. The name of a reference is determined by the file name, and the target is determined by the contents of the file. For example, the master branch you have been working on all along looks like this:

$ cat .git/refs/heads/master
89062b72afccda5b9e8ed77bf82c38577e603251

If Git needs to manage a lot of references, they are not necessarily stored as individual files under .git/refs/. Instead, Git creates a container of packed references (packed refs): one line per reference with name and SHA-1 sum. This makes sequential resolution of many references faster. Git commands search for branches and tags in the .git/packed-refs file if the corresponding .git/refs/<name> file does not exist.
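The lookup described above can be sketched in Python. The sketch assumes that each entry in .git/packed-refs has the form "SHA-1 sum, space, reference name", that lines starting with # are header comments, and that lines starting with ^ carry extra information about the preceding annotated tag. The sample text and the function names parse_packed_refs and lookup are made up for this illustration.

```python
SAMPLE_PACKED_REFS = """\
# pack-refs with: peeled fully-peeled sorted
89062b72afccda5b9e8ed77bf82c38577e603251 refs/heads/master
e2c67ebb6d2db2aab831f477306baa44036af635 refs/tags/v0.1
"""


def parse_packed_refs(text):
    """Map reference names to SHA-1 sums from packed-refs style text."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, header comments, and "^" continuation lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def lookup(name, loose_refs, packed_refs):
    """Prefer a loose reference file; fall back to the packed entry."""
    if name in loose_refs:
        return loose_refs[name]
    return packed_refs.get(name)


packed = parse_packed_refs(SAMPLE_PACKED_REFS)
print(lookup("refs/tags/v0.1", {}, packed))
```

Here loose_refs stands in for the files under .git/refs/; a real implementation would read them from disk instead of a dictionary.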

Under .git/refs/ there are several directories that represent the “type” of reference. There is no fundamental difference between these references, only when and how they are used. The references you will use most often are branches. They are stored under .git/refs/heads/. Heads refers to what is sometimes called a “tip” in other systems: The latest commit on a development branch.⁠[29] Branches move up when you make commits on a branch, so they remain at the top of the version history.

Figure 10. A branch always references the most recent commit

Branches in other developers' repositories (e.g. the master branch of the official repository), so-called remote tracking branches, are stored under .git/refs/remotes/ (see Sec. 5.2.2, “Remote-Tracking-Branches”). Tags, static references, which are mostly used for versioning, are stored under .git/refs/tags/ (see Sec. 3.1.3, “Tags — Marking Important Versions”).

3.1.1. HEAD and Other Symbolic References

One reference that you rarely use explicitly, but always implicitly, is HEAD. It usually refers to the branch you just checked out, in this case master:

$ cat .git/HEAD
ref: refs/heads/master

HEAD can also point directly to a commit if you type git checkout <commit-id>. However, you are then in so-called detached-head mode, in which commits may get lost, see also Sec. 3.2.1, “Detached HEAD”.

HEAD determines which files are found in the working tree, which commit becomes the predecessor when a new one is created, which commit is displayed by git show, and so on. When we speak of “the current branch”, what we technically mean is HEAD.

The simple commands log, show, and diff use HEAD as their argument when called without any arguments. The output of git log is the same as the output of git log HEAD, and so on; this applies to most commands that operate on a commit if you don’t specify one explicitly. HEAD is thus similar to the shell variable PWD, which specifies “where you are”.

When we talk about a commit, a command usually doesn’t care whether you specify the commit ID in full or in abbreviated form, or whether you access the commit by reference, such as a tag or branch. However, such a reference may not always be unique. What happens if there is a branch master and a tag with the same name? Git checks if the following references exist:

  • .git/<name> (mostly only useful for HEAD or similar)

  • .git/refs/<name>

  • .git/refs/tags/<name>

  • .git/refs/heads/<name>

  • .git/refs/remotes/<name>

  • .git/refs/remotes/<name>/HEAD

Git takes the first matching reference it finds. You should therefore give tags a distinct naming scheme so that they cannot be confused with branches. This way you can address branches directly by name instead of as heads/<name>.
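The search order listed above can be expressed as a small Python function. This is a sketch of the rule only: the set refs is a made-up stand-in for the files below .git/, and resolve_name is a hypothetical helper, not a Git API.

```python
# Candidate locations below .git/, in the order in which they are tried.
SEARCH_ORDER = [
    "{name}",
    "refs/{name}",
    "refs/tags/{name}",
    "refs/heads/{name}",
    "refs/remotes/{name}",
    "refs/remotes/{name}/HEAD",
]


def resolve_name(name, existing_refs):
    """Return the first candidate that exists, or None if nothing matches."""
    for pattern in SEARCH_ORDER:
        candidate = pattern.format(name=name)
        if candidate in existing_refs:
            return candidate
    return None


# Hypothetical repository with a branch and a tag that are both called master.
refs = {
    "HEAD",
    "refs/heads/master",
    "refs/tags/master",
    "refs/remotes/origin/HEAD",
}

print(resolve_name("master", refs))
print(resolve_name("heads/master", refs))
print(resolve_name("origin", refs))
```

With this data the bare name master resolves to the tag, because refs/tags/ is searched before refs/heads/, while heads/master reaches the branch unambiguously.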

Especially important are the suffixes ^ and ~<n>. The syntax <ref>^ indicates the direct ancestor of <ref>. This is not necessarily unique: if two or more branches were merged, the merge commit has several direct ancestors. <ref>^ or <ref>^1 then denotes the first direct ancestor, <ref>^2 the second, and so on.⁠[30] The syntax HEAD^^ thus means “the direct ancestor of the direct ancestor of the current commit”, i.e. its grandparent. Note that ^ may have a special meaning in your shell, so you may need to protect it with quotes or a backslash.

Figure 11. Relative References, ^ and ~<n>

The syntax <ref>~<n> is equivalent to repeating ^ n times: HEAD~10 thus denotes the tenth direct predecessor of the current commit. Note that this does not mean that there are only eleven commits between HEAD and HEAD~10: since ^ follows only the first line of ancestry at any merge, the two references span those eleven commits plus all further commits that were integrated by a merge along the way. The syntax is documented in the git-rev-parse(1) man page in the “Specifying Revisions” section.
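The two suffixes can be modeled on a small parent table in Python. The graph below is invented for the illustration (M is a merge of C and F), and nth_parent and first_parent_ancestor are hypothetical helper names.

```python
# Each commit maps to the list of its direct ancestors, first parent first.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "E": ["A"],
    "F": ["E"],
    "M": ["C", "F"],    # merge commit: first parent C, second parent F
}


def nth_parent(commit, n=1):
    """Model of <commit>^<n>: the n-th direct ancestor, counted from 1."""
    return PARENTS[commit][n - 1]


def first_parent_ancestor(commit, n):
    """Model of <commit>~<n>: follow the first parent n times."""
    for _ in range(n):
        commit = nth_parent(commit, 1)
    return commit


print(nth_parent("M"))                  # M^
print(nth_parent("M", 2))               # M^2
print(first_parent_ancestor("M", 2))    # M~2
print(nth_parent(nth_parent("M", 2)))   # M^2^
```

In this graph M~3 is A, yet six commits lie between the two, because E and F came in through the merge and are never visited when only first parents are followed.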

3.1.2. Managing Branches

A branch is created in Git in no time. All Git needs to do is identify the currently checked out commit and store the SHA-1 sum in the .git/refs/heads/<branch-name> file.

$ time git branch neuer-branch
git branch neuer-branch  0.00s user 0.00s system 100% cpu 0.008 total

The command is so fast because (unlike other systems) no files need to be copied and no additional metadata needs to be stored. Information about the structure of the version history can always be derived from the commit that a branch references and its ancestors.

Here is an overview of the most important options:

git branch [-v]

Lists local branches. The currently checked-out branch is marked with an asterisk. You can also use -v to display the commit IDs to which the branches point and the first line of the description of the corresponding commits.

$ git branch -v
  maint  65f13f2 Start 1.7.5.1 maintenance track
* master 791a765 Update draft release notes to 1.7.6
  next   b503560 Merge branch 'master' into next
  pu     d7a491c Merge branch 'js/info-man-path' into pu

git branch <branch> [<ref>]

Creates a new branch <branch> pointing to commit <ref> (<ref> can be the SHA-1 sum of a commit, another branch, etc.). If you do not specify a reference, HEAD, the current branch, is used.

git branch -m <new-name>

git branch -m <old-name> <new-name>

In the first form the current branch is renamed to <new-name>. In the second form <old-name> is renamed to <new-name>. The command fails if this would overwrite another branch.

$ git branch -m master
fatal: A branch named 'master' already exists.

If you rename a branch, Git will not display a message. So you can check afterwards to make sure the renaming was successful:

$ git branch
* master
  test
$ git branch -m test pu/feature
$ git branch
* master
  pu/feature

git branch -M …

Like -m, except that a branch is also renamed if it overwrites another branch. Attention: Commits of the overwritten branch may be lost!

git branch -d <branch>

Deletes <branch>. You can specify several branches at once. Git refuses to delete a branch if it is not yet fully integrated into its upstream branch or, if no upstream branch exists, into HEAD, the current branch. (For more on upstream branches, see Sec. 5.3.2, “git pull”.)

git branch -D …​

Deletes a branch, even if it contains commits that have not yet been integrated into the upstream or current branch. Note: These commits may be lost unless they are referenced differently.

3.1.2.1. Changing Branches: Checkout

You can change branches with git checkout <branch>. If you create a branch and want to switch to it right away, use git checkout -b <branch>. The command is equivalent to git branch <branch> && git checkout <branch>.

What happens during a checkout? Each branch references a commit, which in turn references a tree, that is, the image of a directory structure. A git checkout <branch> now resolves the reference <branch> to a commit and replicates the commit’s tree to the index and to the working tree (i.e., the filesystem).

Since Git knows which version of files are currently in the index and working tree, only the files that differ on the current and new branches need to be checked out.

Git makes it hard for users to lose information. A checkout will therefore fail rather than overwrite unsaved changes in a file. This happens in the following two cases:

  • The checkout would overwrite a file in the working tree that contains changes. Git will display the following error message: error: Your local changes to the following files would be overwritten by checkout: file.

  • The checkout would overwrite an untracked file, i.e. a file that is not managed by Git. Git then aborts with the error message: error: The following untracked working tree files would be overwritten by checkout: file.

If, however, the working tree or index contains changes that are compatible with both branches, a checkout carries these changes over. This would look like this, for example:

$ git checkout master
A   neue-datei.txt
Switched to branch 'master'

This means that the file neue-datei.txt was added and does not exist on either branch. Since no information can be lost here, the file is simply carried over. The message A neue-datei.txt reminds you which files you still need to take care of. A stands for added, D for deleted, and M for modified.

If you’re sure you don’t need your changes anymore, you can use git checkout -f to ignore the error messages and run the checkout anyway.

If you want to keep the changes and change the branch (e.g., interrupt your work and fix a bug on another branch), git stash will help (Sec. 4.5, “Outsourcing Changes — Git Stash”).

3.1.2.2. Branch Naming Conventions

In principle, you can name branches almost arbitrarily. Exceptions are spaces, some special characters with special meaning for Git (e.g. *, ^, :, ~), as well as two consecutive dots (..) or a dot at the beginning of the name.⁠[31]

It makes sense to always write branch names entirely in lower case. Since Git manages branch names as files under .git/refs/heads/, upper and lower case are significant.

You can group branches into “namespaces” by using a / as a separator. Branches that are related to the translation of a software can then be named e.g. i18n/german, i18n/english etc. If several developers share a repository, you can also create “private” branches under <username>/<topic>. These namespaces are represented by a directory structure, so that a directory <username>/ with the branch file <topic> is created under .git/refs/heads/.

The main development branch of your project should always be called master. Bug fixes are often managed on a branch maint (short for “maintenance”). The next release is usually prepared on next. Features that are still in an experimental state should be developed in pu (for “proposed updates”) or in pu/<feature>. For a more detailed description of how to use branches to structure development and organize release cycles, see Ch. 6, Workflows.

3.1.2.3. Deleted Branches and “Lost” Commits

Commits each have one or more predecessors. Therefore, you can walk through the commit graph “directed”, that is, from newer to older commits, until you reach a root commit.

The reverse is not possible: if a commit knew its successor, this information would have to be stored in the commit. This would change the SHA-1 sum of the commit, and the successor would have to reference the new commit, which would give the successor a new SHA-1 sum, so the predecessor would have to be changed again, and so on. So Git can only go through the commits from a named reference (such as a branch or HEAD) in the direction of earlier commits.

Therefore, if the “top” of a branch is deleted, the topmost commit is no longer referenced (in Git jargon: unreachable). As a result, the predecessor is no longer referenced, and so on, until the next commit comes along that is referenced in some way (either by a branch, or by having a successor that is itself referenced by a branch).

So when you delete a branch, the commits on that branch are not deleted, they are just “lost”. Git simply doesn’t find them anymore.
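Reachability can be illustrated with a short graph traversal in Python. The commit names, the parent table, and the branch names below are invented for this sketch; reachable is a hypothetical helper that walks from the references toward older commits, which is the only direction the stored parent links allow.

```python
# Each commit maps to the list of its direct ancestors.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],     # A, B, C: commits on master
    "X": ["B"],
    "Y": ["X"],     # X, Y: commits that exist only on branch test
}


def reachable(refs):
    """Collect every commit that can be reached from the given references."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


refs = {"master": "C", "test": "Y"}
before = reachable(refs)

del refs["test"]                    # comparable to: git branch -D test
after = reachable(refs)
lost = set(PARENTS) - after         # still stored, but no longer reachable

refs["test"] = "Y"                  # comparable to: git branch test <commit-id>
restored = reachable(refs)

print(sorted(before), sorted(after), sorted(lost), sorted(restored))
```

The objects X and Y never leave PARENTS in this model; they merely drop out of the reachable set until a reference points at Y again.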

However, they will still be present in the object database for a while.⁠[32] So you can easily restore a branch by explicitly specifying the previous (and supposedly deleted) commit as a reference:

$ git branch -D test
Deleted branch test (was e32bf29).
$ git branch test e32bf29

Another way to retrieve deleted commits is the reflog (see Sec. 3.7, “Reflog”).

3.1.3. Tags — Marking Important Versions

SHA-1 sums are a very elegant solution to describe versions decentrally, but they are semantically poor and unwieldy for humans. Unlike linear revision numbers, commit IDs alone tell us nothing about the order of versions.

During the development of software projects, different “important” versions need to be marked so that they can be easily found in the repository. The most important ones are usually those that are released, called releases. Release candidates are also often marked in this way, i.e. versions that form the basis for the next version and are checked for critical bugs in the course of quality assurance without adding new features. Depending on the project and development model, there are different conventions for marking releases and procedures for preparing and publishing them.

In the open source area, two versioning schemes have become established: the classic major/minor/micro versioning scheme and, more recently, date-based versioning. With major/minor/micro versioning, which is used e.g. with the Linux kernel and also Git, a version is identified by three (often four) numbers: 2.6.39 or 1.7.1. With date-based versioning, on the other hand, the designation is derived from the time of the release, e.g.: 2011.05 or 2011-05-19. This has the great advantage that the age of a version is easily identifiable.⁠[33]

Git offers tags (“labels”) that can be used to mark any Git object, usually commits, to highlight prominent states in the development history. Like branches, tags are implemented as references to objects. Unlike branches, however, tags are static: they are not moved when new commits are added and always point to the same object.

There are two types of tags: annotated and lightweight. Annotated tags carry metadata such as author, description, or GPG signature. Lightweight tags, on the other hand, “simply” point to a specific Git object. For both types of tags, Git creates references under .git/refs/tags/ or in .git/packed-refs. The difference is that for each annotated tag, Git creates a special Git object, a tag object, in the object database to store the metadata and the SHA-1 sum of the selected object, while a lightweight tag points directly to the selected object. Figure 12, “The Tag Object” shows the contents of a tag object; compare also the other Git objects in Figure 4, “Git Objects”.

Figure 12. The Tag Object

The tag object shown has both a size (158 bytes) and a SHA-1 sum. It contains the name (0.1), the object type and the SHA-1 sum of the referenced object as well as the name and e-mail of the author, which is called tagger in Git jargon. In addition, the tag contains a tag message that describes the version, for example, and optionally a GPG signature. In the Git project, for example, a tag message consists of the current version designation and the signature of the maintainer.

In the following, let’s first look at how you manage tags locally. Sec. 5.8, “Exchanging Tags” describes how you exchange tags between repositories.

3.1.3.1. Managing Tags

You can manage tags with the command git tag. Without arguments it shows all existing tags. Depending on the size of the project, it is worth limiting the output with the -l option and a corresponding pattern. With the following command you display all variants of version 1.7.1 of the git project, i.e. both the release candidates with the addition -rc* and the (four-digit) maintenance releases:

$ git tag -l 'v1.7.1*'
v1.7.1
v1.7.1-rc0
v1.7.1-rc1
v1.7.1-rc2
v1.7.1.1
v1.7.1.2
v1.7.1.3
v1.7.1.4

The content of a tag is provided by git show:

$ git show 0.1 | head
tag 0.1
Tagger: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Wed Mar 23 16:52:03 2011 +0100

Erste Veröffentlichung

commit e2c67ebb6d2db2aab831f477306baa44036af635
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Sat Jan 8 20:30:58 2011 +0100

Gitk presents tags as yellow, arrow-like boxes that are clearly distinguishable from the green, rectangular branches:

Figure 13. Tags in Gitk

3.1.3.2. Lightweight Tags

To add a lightweight tag to HEAD, pass the desired name to the command (in this example, to mark an important commit):

$ git tag api-aenderung
$ git tag
api-aenderung

You can also tag any other commit or object by specifying it as an additional argument (in this example, the commit HEAD~23):

$ git tag pre-regression HEAD~23
$ git tag
api-aenderung
pre-regression

Tags are unique — if you try to recreate a tag, Git will abort with an error message:

$ git tag pre-regression
fatal: tag 'pre-regression' already exists

3.1.3.3. Annotated Tags

Annotated tags are created with the -a option. As with git commit, an editor will open and allow you to write the tag message. Or you can pass the tag message with the option -m — in which case the option -a is redundant:

$ git tag -m "Second release" 0.2

3.1.3.4. Signed Tags

To verify a signed tag, use the -v (verify) option:

$ git tag -v v1.7.1
object d599e0484f8ebac8cc50e9557a4c3d246826843d
type commit
tag v1.7.1
tagger Junio C Hamano <gitster@pobox.com> 1272072587 -0700

Git 1.7.1
gpg: Signature made Sat Apr 24 03:29:47 2010 CEST using DSA key ID F3119B9A
gpg: Good signature from "Junio C Hamano <junkio@cox.net>"
...

Of course, this assumes that you have both GnuPG installed and that you have already imported the signer’s key.

In order to sign tags yourself, you must first set the preferred key:

$ git config --global user.signingkey <GPG-Key-ID>

Now you can create signed tags with the -s (sign) option:

$ git tag -s -m "Third release" 3.0

3.1.3.5. Deleting and Overwriting Tags

Use the -d and -f options to delete or overwrite tags:

$ git tag -d 0.2
Deleted tag '0.2' (was 4773c73)

These options should be used with caution, especially if you use the tags not only locally but also publish them. Under certain circumstances, tags may then point to different commits: version 1.0 in repository X points to a different commit than version 1.0 in repository Y. See also Sec. 5.8, “Exchanging Tags”.

3.1.3.6. Lightweight vs. Annotated Tags

For public versioning of software, annotated tags are generally more useful. Unlike lightweight tags, they contain meta-information that shows who created a tag and when, so the contact person is unambiguous. Users of the software can also find out who has approved a particular version. For example, it’s clear that Junio C. Hamano has tagged Git version 1.7.1, so it has his “seal of approval”. The cryptographic signature additionally confirms this statement, of course. Lightweight tags, on the other hand, are particularly suitable for applying local markers, for example to identify certain commits relevant to the current task. However, make sure not to upload such tags to a public repository (see Sec. 5.8, “Exchanging Tags”), as they might spread. If you only use the tags locally, you can also delete them once they have served their purpose (see above).
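The difference between the two kinds of tag can be made visible with git cat-file. The following is a minimal sketch in a throwaway repository; the tag names local-marker and v0.1, the user identity and the commit message are purely illustrative assumptions:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false
git config tag.gpgSign false
git commit -q --allow-empty -m "Initial commit"

git tag local-marker                   # lightweight: only a reference to the commit
git tag -a -m "First release" v0.1     # annotated: creates a tag object of its own

# Ask Git for the type of the object each tag name resolves to.
light_type=$(git cat-file -t local-marker)
annot_type=$(git cat-file -t v0.1)
echo "local-marker -> $light_type"
echo "v0.1         -> $annot_type"
```

The lightweight tag resolves directly to a commit, whereas the annotated tag resolves to a tag object that carries the tagger, date and message.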

3.1.3.7. Non-Commit Tags

With tags you can mark any Git object, not only commits, but also trees, blobs and even tag objects themselves! The classic example is to put the GPG public key used by the maintainer of a project to sign tags in a blob.

For example, the tag junio-gpg-pub in the Git repository of Git points to the key of Junio C. Hamano:

$ git show junio-gpg-pub | head -5
tag junio-gpg-pub
Tagger: Junio C Hamano <junkio@cox.net>
Date:   Tue Dec 13 16:33:29 2005 -0800

GPG key to sign git.git archive.

Because this blob object is not referenced by any tree, the file is virtually separate from the actual code, but still exists in the repository. In addition, a tag on a “lonely” blob is necessary so that it is not considered unreachable and is deleted during repository maintenance.⁠[34]

To use the key, proceed as follows:

$ git cat-file blob junio-gpg-pub | gpg --import
gpg: key F3119B9A: public key "Junio C Hamano <junkio@cox.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1

You can then verify all tags in Git’s own repository, as described above.
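The same technique can be tried out in a scratch repository. In this sketch the blob content, the tag name example-key and the user identity are made up for illustration; no real key is involved:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config tag.gpgSign false

# Write some content into the object database as a blob.
# No tree references it, so only the tag keeps it reachable.
blob=$(printf 'example public key data\n' | git hash-object -w --stdin)
git tag -a -m "Example key blob" example-key "$blob"

# Peel the tag to see what kind of object it marks, then read it back.
target_type=$(git cat-file -t "example-key^{}")
content=$(git cat-file blob example-key)
echo "$target_type: $content"
```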

3.1.3.8. Describing Commits

Tags are very useful for describing any commit “better”. The git describe command gives a description consisting of the most recent tag and its relative position in the commit graph. Here’s an example from the git project: we describe a commit with the SHA-1 prefix 28ba96a, which is located in the commit graph seven commits after version 1.7.1:

describe screenshot
Figure 14. The commit to be described highlighted in gray

$ git describe --tags
v1.7.1-7-g28ba96a

The output of git describe is formatted as follows:

<tag>-<position>-g<SHA-1>

The tag is v1.7.1; the position indicates that there are seven new commits between the tag and the described commit.⁠[35] The g before the ID indicates that the description is derived from a Git repository, which is useful in environments with multiple version control systems. By default, git describe only searches for annotated tags, but the --tags option extends the search to include lightweight tags.
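To see how the three parts come about, you can reproduce the situation in a scratch repository. This is only a sketch: the tag v1.0 and the commit messages are invented, and the splitting of the output assumes a tag name without hyphens:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false
git config tag.gpgSign false

git commit -q --allow-empty -m "Release commit"
git tag -a -m "Version 1.0" v1.0       # annotated, so plain git describe finds it
git commit -q --allow-empty -m "Work after release 1"
git commit -q --allow-empty -m "Work after release 2"

desc=$(git describe)                   # has the form <tag>-<position>-g<SHA-1>
tag=${desc%%-*}                        # part before the first hyphen
rest=${desc#*-}
count=${rest%%-*}                      # number of commits since the tag
short=${desc##*-g}                     # abbreviated commit ID
echo "$desc: tag=$tag position=$count id=$short"
```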

The command is very useful because it translates a content-based identifier into something useful for humans: v1.7.1-7-g28ba96a is much closer to v1.7.1 than v1.7.1-213-g3183286. This allows you to compile the output directly into the software in a way that makes sense, just like in the Git project:

$ git describe
v1.7.5-rc2-8-g0e73bb4
$ make
GIT_VERSION = 1.7.5.rc2.8.g0e73bb
...
$ ./git --version
git version 1.7.5.rc2.8.g0e73bb

This way users know roughly which version they have and can trace which commit the version was compiled from.

3.2. Restoring Versions

The goal of version control software is not just to examine changes between commits. Above all, it is also important to restore older versions of a file or entire directory trees, or to undo changes. In Git, the commands checkout, reset, and revert are particularly useful for this.

The Git command checkout can not only switch branches, but also restore files from previous commits. The general syntax is:

git checkout [-f] <reference> -- <pattern>

checkout resolves the given reference (or HEAD if none is given) to a commit and extracts all files matching <pattern> to the working tree. If <pattern> is a directory, it refers to all files and subdirectories in it. If you do not specify a pattern, all files are checked out; in that case local changes to a file are not simply overwritten unless you specify the -f option (see above), and HEAD is also set to the corresponding commit (or branch).

However, if you specify a pattern, checkout overwrites the matching file(s) without prompting. So to discard all changes to <file>, enter git checkout -- <file>: Git then replaces <file> with the version from the index, which matches the latest commit on the current branch unless you have staged changes to the file. In the same way, you can also restore an older state of a file:

$ git checkout ce66692 -- <file>

The double dash separates the patterns from the options or arguments. It is not strictly necessary, however: if there is no branch or other reference with that name, Git will look for a file with that name. The separator thus just makes it unambiguous that you want to restore the file(s) in question.

To view the contents of a file from a particular commit without checking it out, use the following command:

$ git show ce66692:<file>
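The commands from this section can be tried out together as follows. This is a sketch in a throwaway repository; the file name notes.txt and its contents are assumptions chosen for illustration:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
old=$(git rev-parse HEAD)              # remember the older commit
echo "version 2" > notes.txt
git commit -q -am "Update notes"

# Look at the old content without touching the working tree.
peek=$(git show "$old:notes.txt")

# Discard an uncommitted edit.
echo "uncommitted edit" >> notes.txt
git checkout -- notes.txt
after_discard=$(cat notes.txt)

# Restore the state of the file from the older commit.
git checkout "$old" -- notes.txt
after_restore=$(cat notes.txt)
echo "$peek / $after_discard / $after_restore"
```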

Use --patch or -p to call git checkout in interactive mode. The procedure is the same as for git add -p (see Sec. 2.1.2, “Creating Commits Step by Step”), but here you can reset hunks of a file step-by-step.

3.2.1. Detached HEAD

If you check out a commit that is not referenced by a branch, you are in detached-HEAD mode:

$ git checkout 3329661
Note: checking out '3329661'.

You are in 'detached HEAD' state. You can look around, make
experimental changes and commit them, and you can discard any
commits you make in this state without impacting any branches
by performing another checkout.

If you want to create a new branch to retain commits you create,
you may do so (now or later) by using -b with the checkout command
again. Example:

  git checkout -b new_branch_name

HEAD is now at 3329661... Add LICENSE file

As the explanation already warns you (you can hide it by setting the option advice.detachedHead to false), commits you make now can easily get lost: since HEAD is then the only direct reference to them, they are not referenced by any branch (they are unreachable, see above).

So working in detached HEAD mode is especially useful if you want to try something quickly: Has the bug actually already appeared in commit 3329661? Was there actually a README file at the time of 3329661?

If you want to do more than just look around from the checked-out commit (for example, to see whether your software already had a particular bug at the time) and intend to make commits, you should create a branch:

$ git checkout -b <temp-branch>

Then you can make commits as usual without fear of losing them.
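The following sketch walks through this sequence in a scratch repository: detach HEAD, make an experimental commit, and then secure it with a branch. The branch name temp-branch and the commit messages are illustrative assumptions:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

git commit -q --allow-empty -m "First"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "Second"

git checkout -q "$first"               # check out a commit directly: detached HEAD
# symbolic-ref fails when HEAD does not point to a branch.
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi

git commit -q --allow-empty -m "Experiment"
experiment=$(git rev-parse HEAD)       # so far referenced only by HEAD

git checkout -q -b temp-branch         # a branch now keeps the commit reachable
branch=$(git symbolic-ref --short HEAD)
echo "$state, now on $branch"
```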

3.2.2. Rolling Back Commits

If you want to undo all the changes a commit makes, the revert command helps. However, it does not delete a commit, but creates a new one whose changes are exactly the opposite of the other commit: Deleted lines become added lines, and vice versa.

Suppose you have a commit that creates a LICENSE file. The patch of the corresponding commit looks like this:

--- /dev/null
+++ b/LICENSE
@@ -0,0 +1 @@
+This software is released under the GNU GPL version 3 or newer.

Now you can undo the changes:

$ git revert 3329661
Finished one revert.
[master a68ad2d] Revert "Add LICENSE file"
 1 files changed, 0 insertions(+), 1 deletions(-)
 delete mode 100644 LICENSE

Git creates a new commit on the current branch — unless you specify otherwise — with the description Revert "<Old commit message>". This commit looks like this:

$ git show
commit a68ad2d41e9219383449d703521573477ee7da48
Author: Julius Plenz <feh@mali>
Date:   Mon Mar 7 05:28:47 2011 +0100

    Revert "Add LICENSE file"

    This reverts commit 3329661775af3c52e6b2ad7e9e7e7d789ba62712.

diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 3fd9c20..0000000
--- a/LICENSE
+++ /dev/null
@@ -1 +0,0 @@
-This software is released under the GNU GPL version 3 or newer.

Note that from now on, both the commit and the revert will appear in the version history of a project. You therefore only undo the changes, but do not delete any information from the version history.

You should therefore only use revert if you need to undo a change that has already been published. However, if you are developing locally in a separate branch, it makes more sense to delete these commits completely (see the following section on reset and the topic Rebase, Sec. 4.1, “Moving commits — Rebase”).

If you want to revert not all the changes of a commit but only those made to a single file, you can use this procedure:

$ git show -R 3329661 -- LICENSE | git apply --index
$ git commit -m 'Revert change to LICENSE from 3329661'

The git show command prints the changes from commit 3329661 that apply to the LICENSE file. The -R option causes the unified-diff format to be displayed “the other way around” (reverse). The output is passed to git apply to make the changes to the file and index. The changes are then checked in.
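A scratch repository shows that only the selected file is affected. In this sketch, the files README and LICENSE, their contents and the commit messages are assumptions made for illustration:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

echo "base" > README
git add README
git commit -q -m "Add README"

# One commit that touches two files.
echo "example license text" > LICENSE
echo "more" >> README
git add LICENSE README
git commit -q -m "Add LICENSE and extend README"
target=$(git rev-parse HEAD)

# Reverse only the part of that commit that concerns LICENSE.
git show -R "$target" -- LICENSE | git apply --index
git commit -q -m "Revert change to LICENSE"

commits=$(git rev-list --count HEAD)
echo "commits: $commits"
```

Afterwards LICENSE is gone again, while the README change from the same commit is still in place.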

Another way to undo a change is to check out a file from a previous commit, add it to the index, and check it in again:

$ git checkout 3329661 -- <file>
$ git add <file>
$ git commit -m 'Reverting <file> to resemble 3329661'

3.2.3. Reset and the Index

If you want to delete a commit completely rather than just undo it, use git reset. The reset command sets HEAD (and thus the current branch), and optionally the index and working tree, to a particular commit. The syntax is git reset [<option>] [<commit>].

The most important types of resets are the following:

--soft

Resets only HEAD; index and working tree remain unaffected.

--mixed

Default setting if you do not specify an option. Sets HEAD and index to the specified commit, but the files in the working tree are not affected.

--hard

Synchronizes HEAD, index and working tree and sets them to the same commit. Changes in the working tree may be lost!

If you call git reset without any options, this is equivalent to git reset --mixed HEAD. We’ve already seen this command: Git sets the current HEAD to HEAD (so it doesn’t change it) and the index to HEAD. In this case, the changes you previously added to the index are unstaged again; they remain in the working tree.
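The effect of the three reset types on index and working tree can be observed step by step. The following sketch uses a scratch repository with a single, hypothetical file file.txt:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

echo one > file.txt;   git add file.txt; git commit -q -m "One"
echo two > file.txt;   git commit -q -am "Two"
echo three > file.txt; git commit -q -am "Three"

# --soft: HEAD moves back to "Two"; index and working tree still hold "three".
git reset -q --soft HEAD^
soft_staged=$(git diff --cached --name-only)

# --mixed: the index is set to HEAD as well; the working tree keeps "three".
git reset -q --mixed HEAD
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

# --hard: the working tree is reset too; the "three" change is gone.
git reset -q --hard HEAD
hard_content=$(cat file.txt)
echo "soft: [$soft_staged] mixed: [$mixed_staged|$mixed_unstaged] hard: $hard_content"
```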

The possible uses of this command are many and varied and will reappear in the various command sequences. Therefore it is important to understand the functionality, even if there are sometimes alternative commands that have the same effect.

Suppose you have made two commits to master that you actually want to move to a new branch to work on further. The following command sequence creates a new branch <new-feature> pointing to HEAD, then moves HEAD and the current branch master back by two commits, and finally checks out the new branch <new-feature>.

$ git branch <new-feature>
$ git reset --hard HEAD^^
$ git checkout <new-feature>

Alternatively, the following sequence has the same effect: you create a branch <new-feature> that points to the current commit and check it out. Then you delete master and re-create it so that it points to the second predecessor of the current commit.

$ git checkout -b <new-feature>
$ git branch -D master
$ git branch master HEAD^^

3.2.3.1. Using Reset

With reset you do not delete any commits, but only move references. As a result, commits that are no longer referenced become unreachable and are thus effectively deleted. So you can use reset to delete only the topmost commits on a branch, not arbitrary commits “somewhere in the middle,” as this would destroy the commit graph. (For the somewhat more complicated deletion of commits “in the middle,” see rebase, Sec. 4.1, “Moving commits — Rebase”).

Before a reset, Git stores the previous HEAD under ORIG_HEAD. So if you have performed a reset by mistake, use git reset --hard ORIG_HEAD to undo it (even if the commit was supposedly deleted). However, this does not bring back changes in the working tree that you had not yet checked in: they are deleted irrevocably.

The result from above (moving two commits to a new branch) can also be achieved this way:

$ git reset --hard HEAD^^
$ git checkout -b <new-feature> ORIG_HEAD
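That ORIG_HEAD really remembers the position before the reset can be checked in a scratch repository. This is a minimal sketch; the commit messages are placeholders:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"
git commit -q --allow-empty -m "Third"
tip=$(git rev-parse HEAD)

git reset -q --hard HEAD^^             # "accidentally" drop the two topmost commits
after_reset=$(git rev-parse HEAD)
saved=$(git rev-parse ORIG_HEAD)       # where HEAD was before the reset

git reset -q --hard ORIG_HEAD          # undo the reset
restored=$(git rev-parse HEAD)
echo "tip restored: $restored"
```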

A common use of reset is to discard changes on a test basis. You want to try a patch? Add some debugging output? Change a few constants? If you don’t like the result, a git reset --hard deletes all changes to the working tree.

You can also use reset to “make your version history nice.” For example, if you have a few commits on a branch <feature> based on master, but they are not well structured (or much too large), you can create a branch <reorder-feature> and pack all changes into new commits:

$ git checkout -b <reorder-feature> <feature>
$ git reset master
$ git add -p
$ git commit
$ ...

The command git reset master sets index and HEAD to the state of master. However, your changes in the working tree are preserved, i.e. all changes that distinguish the branch <feature> from master are now only contained in the files in the working tree. Now you can add the changes incrementally using git add -p and package them into (several) handy commits.⁠[36]

Suppose you are working on a change and want to check it in temporarily (to continue working on it later). You can then use the following commands:

$ git commit -m 'feature (still unfinished)'
(later)
$ git reset --soft HEAD^
(continue working)

The command git reset --soft HEAD^ moves HEAD back by one commit, but leaves the index and the working tree untouched. So all changes from your temporary commit are still in the index and working tree, but the commit itself is gone. You can now make further changes and create a new commit later. Similar functionality is provided by the --amend option for git commit, as well as the git stash command, which is explained in Sec. 4.5, “Outsourcing Changes — Git Stash”.

3.3. Merging Branches

Combining branches is called merging in Git; the commit that joins two or more branches is called a merge commit.

Git provides the merge subcommand, which allows you to merge one branch into another. This means that all changes made on that branch are incorporated into the current one.

Note that the command integrates the specified branch into the currently checked-out branch (i.e., HEAD). The command therefore only needs one argument:

$ git merge <branch-name>

If you handle your branches carefully, there should be no problems with merging. If there are, then this section also presents strategies for resolving merge conflicts.

First, we will look at how a merge proceeds at the object level.

3.3.1. Two-Branches Merge

The two branches, topic and master, that you want to merge, each reference the most recent commit in a chain of commits (F and D), and these two commits in turn reference a tree (corresponding to the top-level directory of your project).

First, Git calculates a so-called merge base, that is, a commit that both of the commits to be merged have as a common ancestor. Usually there are several such commits (in the diagram below, A and B), and the most recent one (which has the others as ancestors) is used.[37] In simple terms, this is the commit where the branches diverged (i.e., B).

Now, if you want to merge two commits (D and F to M), then the trees referenced by the commits must be merged.

merge base commit
Figure 15. Merge base and merge commit
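You can ask Git for the merge base yourself with git merge-base. The following sketch rebuilds a history shaped like the one in the figure in a scratch repository; the commit messages A to F only mimic the labels of the diagram and are otherwise arbitrary:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
fork=$(git rev-parse HEAD)             # the point where the branches will diverge

git checkout -q -b topic
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

git checkout -q master
git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"

base=$(git merge-base master topic)    # best common ancestor of both branches
git log -1 --format="merge base: %s" "$base"
```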

Git does this as follows:⁠[38] If a tree entry (another tree or a blob) is the same in both commits, then that very tree entry will be taken over in the merge commit. This happens in two cases:

  1. A file has not been changed by either commit, or a subdirectory does not contain a changed file: In the first case, the SHA-1 sum of the file’s blob is the same in both commits. In the second case, the same tree object is referenced by both commits. The referenced blob or tree is therefore the same as the one referenced in the merge base.

  2. A file was changed on both sides and equivalently (same blobs). This happens, for example, if all changes to a file were copied from one branch using git cherry-pick (see Sec. 3.5, “Taking over Individual Commits: Cherry Picking”). The referenced blob is then not the same as in the merge base.

If a tree entry disappears in one of the commits, but is still present in the other, and is the same as in the merge base, then it is not taken over. This is equivalent to deleting a file or directory if no changes have been made to the file on the other side. Similarly, if a commit brings a new tree entry, it is copied to the merge tree.

Now what happens if a file has different blobs in the two commits, that is, the file has been changed on at least one side? If one of the blobs is the same as in the merge base, the file has been changed on only one side, so Git can simply adopt those changes.

However, if both blobs are different from the merge base, you might run into problems. First, Git tries to apply the changes on both sides.

A 3-way merge algorithm is usually employed for this purpose. Unlike the classic 2-way merge algorithm, which is used when you have two different versions A and B of a file and want to merge them, the 3-way algorithm involves a third version C of the file, taken from the merge base described above. Because a common ancestor of the file is known, the algorithm can in many cases make a better decision (that is, not only based on line numbers or context) about how to merge changes. In practice, many trivial merge conflicts are thus resolved automatically without user intervention.
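A 3-way merge of a single file can be tried out in isolation with git merge-file, which takes our version, the common base and the other version. The sketch below uses made-up file names and contents; it first merges two non-overlapping changes and then provokes a conflict by changing the same line on both sides:

```shell
set -e
dir=$(mktemp -d)                       # hypothetical scratch directory
cd "$dir"

# Common ancestor (version C in the text) and two derived versions.
printf '%s\n' alpha beta gamma delta epsilon zeta eta > base.txt
printf '%s\n' ALPHA beta gamma delta epsilon zeta eta > ours.txt     # first line changed
printf '%s\n' alpha beta gamma delta epsilon zeta ETA > theirs.txt   # last line changed

# Changes in different places: both are taken over automatically.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
expected=$(printf '%s\n' ALPHA beta gamma delta epsilon zeta ETA)

# Both sides change the same line differently: this cannot be decided.
printf '%s\n' OMEGA beta gamma delta epsilon zeta eta > theirs_conflict.txt
if git merge-file -p ours.txt base.txt theirs_conflict.txt > conflict.txt; then
    outcome=clean
else
    outcome=conflict                   # non-zero exit status signals conflicts
fi
echo "second merge: $outcome"
```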

However, there are conflicts that no merge algorithm, no matter how good, can resolve. This happens, for example, if the context in version A of the file was changed right next to a change in version B, or, worse still, versions A, B and C each have a different version of a line.

Such a case is called a merge conflict. Git merges all the files as best it can, and then presents the conflicting changes to the user so they can manually merge them (and thus resolve the conflict) (see Sec. 3.4, “Resolving Merge Conflicts”).

Although it is in principle possible to generate a syntactically correct resolution with an algorithm specially designed for the respective programming language, an algorithm cannot grasp the semantics of the code, i.e., its meaning. Therefore, a resolution generated in this way would usually not make sense.

3.3.2. Fast Forward Merges: Fast Forwarding One Branch

The git merge command does not always create a merge commit. A trivial case, but one that occurs frequently, is the so-called fast-forward merge, in which the branch is simply moved forward.

A fast-forward merge occurs when one branch, for example topic, is a descendant of a second branch, master:

ff before
Figure 16. Before the fast forward merge

A simple git merge topic on branch master now causes master to simply be moved forward; no merge commit is created.

ff after
Figure 17. After the fast forward merge — no merge commit was created

Of course, such a behavior only works if the two branches have not diverged, i.e. if the merge base of both branches is one of the two branches itself, in this case master.

This behavior is often desirable:

  1. You want to integrate upstream changes, that is, changes from another Git repository. You typically use a command like git merge origin/master to do this. A git pull will also perform a merge. To learn how to merge changes between git repositories, see Ch. 5, Distributed Git.

  2. You want to integrate an experimental branch. Because it’s quick and easy to create branches in Git, it’s a good idea to start a new branch for each feature. If you’ve tried something experimental on a branch and want to integrate it without the point of integration being visible in the history, you can do so by fast-forwarding.

With the options --ff-only and --no-ff you can adjust the merge behavior. If you use the first option and the branches cannot be merged using fast-forward, Git will abort with an error message. The second option forces Git to create a merge commit even though fast forward would have been possible.
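The difference between the two options shows up in the number of parents of the resulting commit. The following sketch uses a scratch repository; the branch names topic and topic2 and the commit messages are assumptions for illustration:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

git commit -q --allow-empty -m "Base"
git checkout -q -b topic
git commit -q --allow-empty -m "Topic work"
git checkout -q master

# Fast-forward: master is simply moved to the tip of topic.
git merge -q --ff-only topic
ff_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

git checkout -q -b topic2
git commit -q --allow-empty -m "More topic work"
git checkout -q master

# --no-ff: a merge commit is created although fast-forward was possible.
git merge -q --no-ff -m "Merge topic2" topic2
noff_parents=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents after ff: $ff_parents, after no-ff: $noff_parents"
```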

There are different opinions on whether changes should always be integrated via fast-forward or whether it is better to create a merge commit even when this is not strictly necessary. The result is the same in both cases: changes from one branch are integrated into another.

However, when you create a merge commit, the integration of a feature becomes clearly visible. Consider the following two excerpts from the version history of a project:

ff no ff comparison
Figure 18. Integration of a feature with and without fast forward

In the upper version, you cannot easily see which commits were previously developed on branch sha1-caching, that is, which ones belong to a specific feature of the software.

In the lower version, however, you can see at first glance that there were exactly four commits on that branch, and that it was then integrated. Since nothing was developed in parallel, the merge commit would in principle be unnecessary, but it does make the integration of the feature clear.

So instead of relying on the magic of git merge, it makes sense to create two aliases (see Sec. 1.3.1, “Git Aliases”) that force or forbid fast forward merge:

nfm = merge --no-ff     # no-ff-merge
ffm = merge --ff-only   #    ff-merge

An explicit merge commit is also helpful because you can undo it with a single command. This is useful, for example, if you have integrated a branch that turns out to have bugs: if the code is running in production, it is often desirable to back out the entire change until the bug is fixed. To do so, use:

git revert -m 1 <merge-commit>

Git then produces a new commit that reverses all changes introduced by the merge. The -m 1 option here specifies which “side” of the merge should be considered the mainline, or stable line of development: its changes are preserved. In the above example, -m 1 would cause the changes made by the four commits from branch sha1-caching, the second parent of the merge, to be undone.
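The following sketch reverts a merge in a scratch repository. The branch name feature, the files app.txt and feature.txt and the commit messages are illustrative assumptions:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

echo "base" > app.txt
git add app.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
git merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)

# Undo the merge; parent 1 (master before the merge) is the mainline.
git revert --no-edit -m 1 "$merge" >/dev/null
subject=$(git log -1 --format=%s)
echo "$subject"
```

After the revert, feature.txt is removed again from master, while the merge commit itself stays in the history.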

3.3.3. Merge Strategies

Git has five different merge strategies, some of which can be further adjusted by strategy options. You select the strategy with -s, so a merge call looks as follows:

git merge -s <strategy> <branch>

Some of these strategies can only merge two branches, others any number.

resolve

The resolve strategy can merge two branches using a 3-way merge technique. The newest (best) of all possible bases is used as the merge base. This strategy is fast and generally produces good results.

recursive

This is the standard strategy that Git uses to merge two branches. A 3-way merge algorithm is also used here. However, this strategy is smarter than resolve: If several merge bases exist, all of which have “equal rights,”[39] then Git first merges these bases together, and then uses the result as the merge base for the 3-way merge algorithm. In addition to the fact that merges with file renames can be processed more easily as a result, a test run on the version history of the Linux kernel has shown that this strategy results in fewer merge conflicts than the resolve strategy. The strategy can be adapted by various options (see below).

octopus

Standard strategy when three or more branches are merged. In contrast to the two strategies mentioned above, the octopus strategy can only perform merges if no error occurs, i.e. if no manual conflict resolution is necessary. The strategy is especially designed to integrate many topic branches that are known to be compatible with the mainline (main development strand).

ours

Can merge any number of branches, but does not use a merge algorithm. Instead, the blobs or trees of the current branch (that is, the branch from which you entered git merge) are always used. This strategy is mainly used when you want to overwrite old developments with the current state of affairs.

subtree

Works like recursive, but the strategy does not compare the trees “on equal footing,” but tries to find the tree of one side as a subtree of the other side and only then merge them. This strategy is useful, for example, if you manage the Documentation/ subdirectory of your project in a separate repository. Then you can merge the changes from that repository into the master repository by using git pull -s subtree <documentation-repo> to apply the subtree strategy, which recognizes the contents of <documentation-repo> as a subdirectory of the master repository and applies the merge process only to that subdirectory. This topic is discussed in more detail in Sec. 5.11, “Managing Subprojects”.

3.3.4. Options for the Recursive Strategy

The default strategy recursive has several options that adjust its behavior, especially with regard to conflict resolution. You specify them with the option -X; the syntax is:

git merge -s recursive -X <option> <branch>

If you only merge two branches, you do not need to explicitly specify the recursive strategy by -s recursive.

Since the strategy can only merge two branches, it is possible to speak of our version and theirs: our version is the checked-out branch in the merge process, while their version references the branch you want to integrate.

ours

If a merge conflict occurs that would normally need to be resolved manually, our version is used instead. This strategy option differs from the ours strategy, however, which ignores all changes made by the other side(s). The ours option, by contrast, takes all changes made by our side and by the other side, and gives priority to our side only in the event of a conflict, and only at the points of conflict.

theirs

Like ours, except that the opposite is true: in case of conflicts, their version is preferred.

ignore-space-change, ignore-all-space, ignore-space-at-eol

Since whitespace does not play a syntactic role in most languages, these options allow you to tell Git to try to resolve a merge conflict automatically if whitespace is not important. A common use case is when an editor or IDE has automatically reformatted source code.

The option ignore-space-at-eol ignores whitespace at the end of the line, which is especially helpful if both sides use different line-end conventions (LF/CRLF). If you specify ignore-space-change, whitespace is also treated as a pure separator: Thus, when comparing a line, it is irrelevant how many spaces or tabs are in one place — indented lines remain indented, and separated words remain separated. The option ignore-all-space ignores any whitespace.

This is the general strategy: If their version brings in only whitespace changes covered by the specified option, they are ignored and our version is used; if they bring in further changes and our version has only whitespace changes, their version is used. However, if both sides have not only whitespace changes, there is still a merge conflict.

In general, after a merge that you could only solve by using one of these options, it is recommended to normalize the corresponding files again, i.e. to make the line endings and indentations uniform.

subtree=<tree>

Similar to the subtree strategy, but an explicit path is specified here. Similar to the above example, you would use:

git pull -Xsubtree=Documentation <documentation-repo>
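The difference between the ours option (-X ours) and the ours strategy (-s ours) can be made visible in a scratch repository. In this sketch the branch name other, the files f.txt and extra.txt and all contents are assumptions; the other branch both conflicts with our change and adds a new file:

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"    # illustrative identity
git config user.email "user@example.com"
git config commit.gpgSign false

printf 'one\ntwo\nthree\n' > f.txt
git add f.txt
git commit -q -m "Base"

git checkout -q -b other
printf 'one (theirs)\ntwo\nthree\n' > f.txt   # conflicts with our change below
echo "new file" > extra.txt                   # does not conflict with anything
git add f.txt extra.txt
git commit -q -m "Their changes"

git checkout -q master
printf 'one (ours)\ntwo\nthree\n' > f.txt
git commit -q -am "Our change"

# Option -X ours: conflicts go to our side, everything else is merged.
git merge -q -X ours -m "Merge with -X ours" other
x_first=$(head -n 1 f.txt)
x_extra=$( [ -e extra.txt ] && echo present || echo absent )

# Strategy -s ours: the other side is ignored entirely.
git reset -q --hard HEAD^
git merge -q -s ours -m "Merge with -s ours" other
s_first=$(head -n 1 f.txt)
s_extra=$( [ -e extra.txt ] && echo present || echo absent )
echo "-X ours: $x_first, extra.txt $x_extra"
echo "-s ours: $s_first, extra.txt $s_extra"
```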

3.4. Resolving Merge Conflicts

As already described, some conflicts cannot be resolved by algorithms — in this case manual rework is necessary. Good team coordination and fast integration cycles can minimize major merge conflicts. But especially in early development, when possibly the internals of a software are changed instead of adding new features, conflicts can occur.

If you are working in a larger team, the developer who has done most of the work on the conflicted code is usually responsible for finding a solution. However, such a conflict resolution is usually not difficult if the developer has a good overview of the software in general and of their piece of code and its interaction with other parts in particular.

We will go through the solution of a merge conflict using a simple example in C. Take a look at the following output.c file:

int i;

for(i = 0; i < nr_of_lines(); i++)
    output_line(i);

print_stats();

The piece of code goes through all lines of an output and prints them one after the other. Finally, it prints some brief statistics.

Now two developers change something in this code. The first one, Axel, writes a function that wraps the lines before they are output and replaces output_line in the above piece of code with his improved version output_wrapped_line:

int i;
int tw = 72;

for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);

print_stats();

The second developer, Beatrice, modifies the code so that her newly introduced configuration setting max_output_lines is honored and not too many lines are output:

int i;

for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}

print_stats();

So Beatrice uses the “obsolete” version output_line, and Axel does not yet have the construct that checks the configuration setting.

Now Beatrice tries to integrate her changes from branch B into the branch master, where Axel has already integrated his changes:

$ git checkout master
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Automatic merge failed; fix conflicts and then commit the result.

In the output.c file, Git now places conflict markers to indicate where the changes overlap. There are two sides: the first is HEAD, i.e. the branch into which Beatrice wants to integrate the changes, in this case master. The other side is the branch to be integrated, B. The two sides are separated by a row of equals signs:

int i;
int tw = 72;

<<<<<<< HEAD
for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);
=======
for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}
>>>>>>> B

print_stats();

It should be noted here that Git flags only the changes that actually conflict. Axel’s definition of tw above is taken over without any problems, although it is not yet present in Beatrice’s version.

Beatrice must now resolve the conflict. This is done by first editing the file directly, modifying the code as it should be, and then removing the conflict markers. If Axel has documented in detail in his commit message⁠[40] how his new function works, this should be done quickly:

int i;
int tw = 72;

for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_wrapped_line(i, tw);
}

print_stats();

Beatrice must then add the changes using git add. If no conflict markers remain in the file, this signals to Git that the conflict has been resolved. Finally, the result has to be checked in:

$ git add output.c
$ git commit

The commit message should definitely state how this conflict was resolved. It should also mention possible side effects on other parts of the program.

Normally, merge commits are “empty”, i.e., there is no diff output in git show (because the changes were caused by other commits). This is different in the case of a merge commit that resolves a conflict:

$ git show
commit 6e6c55810c884356402c078f30e45a997047058e
Merge: f894659 256329f
Author: Beatrice <beatrice@gitbu.ch>
Date:   Mon Feb 28 05:59:36 2011 +0100

    Merge branch 'B'

    * B:
      honor max_output_lines config option

    Conflicts:
        output.c

diff --cc output.c
index a2bd8ed,f4c8bec..e39e39d
--- a/output.c
+++ b/output.c
@@@ -1,7 -1,9 +1,10 @@@
  int i;
 +int tw = 72;

- for(i = 0; i < nr_of_lines(); i++)
+ for(i = 0; i < nr_of_lines(); i++) {
+     if(i > config_get("max_output_lines"))
+         break;
 -    output_line(i);
 +    output_wrapped_line(i, tw);
+ }

  print_stats();

This combined diff output differs from the usual unified diff format: There is not only one column with the markers for added (+), removed (-) and context or unchanged ( ), but two. So Git compares the result with both ancestors. The lines changed in the second column are exactly the same as Axel’s commit; the (semi-bold) changes in the first column are Beatrice’s commit including conflict resolution.

The default way, as seen above, is the following:

  1. Open conflicting file

  2. Resolve conflict, remove markers

  3. Mark file as “resolved” via git add

  4. Repeat steps one to three for all files where conflicts occurred

  5. Check in conflict solutions via git commit
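
The steps above can be sketched as a small script. This is only a minimal sketch in a throwaway repository: the file notes.txt, the identity, and the "resolved" content are assumptions made for illustration, and a real resolution would of course be done in an editor.

```shell
# Throwaway repository; file, branch and user names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

echo "original line" > notes.txt
git add notes.txt && git commit -q -m "initial commit"

git checkout -q -b B
echo "change on B" > notes.txt && git commit -q -am "change on B"

git checkout -q master
echo "change on master" > notes.txt && git commit -q -am "change on master"

# The merge stops with a conflict (non-zero exit status).
git merge B || true

# Steps 1-4: visit every file that is still unmerged.
for f in $(git diff --name-only --diff-filter=U); do
    # Stand-in for editing: write the desired content,
    # which also removes the conflict markers.
    echo "change on master and B" > "$f"
    git add "$f"
done

# Step 5: check in the conflict resolution.
git commit -q -m "Merge branch 'B', keeping both changes"
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "merge commit has $parents parents"
```

The loop relies on git diff --name-only --diff-filter=U to list the files that are still unmerged; file names containing whitespace would need more careful handling than this sketch provides.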

If you don’t know how to resolve the conflict on the spot (for example, if you want to ask the original developer to produce a conflict-free version of the code), you can use git merge --abort to abort the merge process, that is, to restore your working tree to the state it was in before you initiated the merge. This command also aborts a merge that you have already partially resolved. Attention: All changes that have not been checked in will be lost.
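
As a rough illustration of aborting, the following sketch provokes a conflict in a throwaway repository and then backs out of the merge. The file notes.txt and the variable names are assumptions chosen for this example.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

echo "original line" > notes.txt
git add notes.txt && git commit -q -m "initial commit"
git checkout -q -b B
echo "change on B" > notes.txt && git commit -q -am "change on B"
git checkout -q master
echo "change on master" > notes.txt && git commit -q -am "change on master"

before=$(git rev-parse HEAD)
git merge B || true                # stops with a conflict

git merge --abort                  # give up and restore the pre-merge state

after=$(git rev-parse HEAD)
dirty=$(git status --porcelain)    # empty when the working tree is clean
echo "HEAD before: $before"
echo "HEAD after:  $after"
```

After the abort, HEAD points to the same commit as before and the working tree no longer contains conflict markers.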

To get an overview of which commits caused changes to your file relevant to the merge conflict, you can use the command

git log --merge -p -- <file>

Git then lists the diffs of commits that have made changes to <file> since the merge base.

During a merge conflict, a file with conflicts is stored in the index in three stages: Stage one contains the version of the file in the merge base (that is, the common original version of the file), stage two contains the version from HEAD (that is, the version from the branch into which you are merging). Finally, stage three contains the version from the branch you are merging in (this has the symbolic reference MERGE_HEAD). The working tree contains the combination of these three stages with conflict markers. You can display the individual versions with git show :<n>:<file>:

$ git show :1:output.c
$ git show :2:output.c
$ git show :3:output.c
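
To see the three stages side by side, a conflict can be reproduced in a throwaway repository. The sketch below assumes a single-line file notes.txt so that each stage is easy to recognize; the names are made up for this example.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

echo "original line" > notes.txt
git add notes.txt && git commit -q -m "initial commit"
git checkout -q -b B
echo "change on B" > notes.txt && git commit -q -am "change on B"
git checkout -q master
echo "change on master" > notes.txt && git commit -q -am "change on master"

git merge B || true                # stops with a conflict

git ls-files -u                    # one index entry per stage (1, 2, 3)
base=$(git show :1:notes.txt)      # merge base
ours=$(git show :2:notes.txt)      # HEAD, here master
theirs=$(git show :3:notes.txt)    # MERGE_HEAD, here B
echo "stage 1: $base"
echo "stage 2: $ours"
echo "stage 3: $theirs"
```

git ls-files -u lists the unmerged index entries together with their stage numbers, which is a quick way to check which files are still in conflict.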

With a program specially developed for 3-way merges, however, it is much easier for you to keep an overview. The program looks at the three stages of a file, visualizes them accordingly and offers you options to move changes back and forth.

3.4.1. Help with Merging: Mergetool

In the case of non-trivial merge conflicts, a merge tool is recommended that visualizes the three stages of a file accordingly, thereby facilitating the resolution of the conflict.

Common IDEs and editors such as Vim and Emacs offer such a mode. There are also external tools such as KDiff3[41] and Meld.⁠[42] The latter visualizes particularly well how a file has changed between commits.

meld example
Figure 19. The example merge conflict, visualized in the merge tool “Meld”

You launch such a merge tool via git mergetool. Git will go through all the files that contain conflicts and display each one (when you press enter) in a merge tool. By default this is Vimdiff.⁠[43]

Such a program will usually display the three versions of a file (our side, their side, and the file merged as far as possible, including conflict markers) in three columns side by side, the latter sensibly in the middle. It is always essential that you make the change (conflict resolution) in the middle file, i.e. in the working copy. The other files are temporary and are deleted again when the merge tool is finished.

In principle, you can use any other tool. The mergetool script simply stores the three stages of the file with the corresponding file name and starts the diff tool on these three files. If it quits again, Git checks to see if there are any conflict markers left in the file — if not, Git will assume that the conflict was resolved successfully and automatically add the file to the index using git add. Finally, when you have finished processing all the files, you only need to make one commit call to seal the conflict resolution.

The merge.tool option determines which tool Git starts on the file. The following commands are already preconfigured, meaning that Git already knows in which order the program expects the arguments and which additional options need to be specified:

araxis bc3 codecompare deltawalker diffmerge diffuse
ecmerge emerge gvimdiff gvimdiff2 gvimdiff3 kdiff3
meld opendiff p4merge tkdiff tortoisemerge
vimdiff vimdiff2 vimdiff3 xxdiff

To use your own merge tool, you must set merge.tool to a suitable name, for example mymerge, and then at least specify the mergetool.mymerge.cmd option. The shell evaluates the expression stored in it, with the variables BASE, LOCAL, REMOTE, and MERGED set to the corresponding files: BASE, LOCAL and REMOTE are temporary copies of the three stages, and MERGED is the file that contains the conflict markers. You can further configure the properties of your merge command, see the mergetool configuration section of the git-config(1) man page.

If you temporarily (not permanently) decide to use another merge program, specify it with the -t <tool> option. So to try Meld, during a merge conflict, simply type git mergetool -t meld — of course Meld must be installed for this to work.
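
A configuration for a custom tool might look like the following sketch. The program mymerge and its -o option are purely hypothetical placeholders; substitute the actual command line of your tool. The settings are written to a throwaway repository here so that the example does not touch your global configuration.

```shell
# Throwaway repository so that only local configuration is changed.
repo=$(mktemp -d) && cd "$repo"
git init -q

# "mymerge" and its "-o <output>" option are hypothetical.
git config merge.tool mymerge
git config mergetool.mymerge.cmd 'mymerge "$BASE" "$LOCAL" "$REMOTE" -o "$MERGED"'
# Let the tool's exit status decide whether the merge succeeded.
git config mergetool.mymerge.trustExitCode true

tool=$(git config --get merge.tool)
cmd=$(git config --get mergetool.mymerge.cmd)
echo "$tool: $cmd"
```

The single quotes matter: the variables must reach Git unexpanded, because they are only filled in when git mergetool runs the command.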

3.4.2. Rerere: Reuse Recorded Resolution

Git has a relatively unknown (and poorly documented), but very helpful feature: Rerere, short for Reuse Recorded Resolution. You need to set the rerere.enabled option to true to have the command called automatically (note the d at the end of enabled).

The idea behind Rerere is simple but effective: Whenever a merge conflict occurs, Rerere automatically records a pre-image, an image of the conflict file including markers. In the case of the example above, it would look like this:

$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Recorded preimage for 'output.c'
Automatic merge failed; fix conflicts and then commit the result.

If the conflict is resolved as above and the solution is checked in, Rerere saves the conflict resolution:

$ vim output.c
$ git add output.c
$ git commit
Recorded resolution for 'output.c'.
[master 681acc2] Merge branch 'B'

So far Rerere has not really helped. But now we can delete the merge commit completely (and are back to the situation before the merge). Then we execute the merge again:

$ git reset --hard HEAD^
HEAD is now at f894659 wrap output at 72 chars
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Resolved 'output.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.

Rerere notices that the conflict is known and that a solution has already been found.⁠[44] So Rerere calculates a 3-way-merge between the saved pre-image, the saved solution and the version of the file in the working tree. This way Rerere can resolve not only the same conflicts, but also similar ones (if in the meantime further lines outside the conflict area have been changed).

The result is not directly added to the index. The solution is simply written to the file in the working tree. You can then use git diff to check whether the solution looks sensible, run tests if necessary, etc. If everything looks good, you accept the automatic solution via git add as usual.
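
The whole Rerere cycle can be replayed in a throwaway repository. The following is a minimal sketch under the assumption of a one-line file notes.txt; the names and the resolved content are made up for illustration.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"
git config rerere.enabled true

echo "original line" > notes.txt
git add notes.txt && git commit -q -m "initial commit"
git checkout -q -b B
echo "change on B" > notes.txt && git commit -q -am "change on B"
git checkout -q master
echo "change on master" > notes.txt && git commit -q -am "change on master"

git merge B || true                         # Rerere records the pre-image
echo "change on master and B" > notes.txt   # resolve by hand
git add notes.txt
git commit -q -m "Merge branch 'B'"         # Rerere records the resolution

git reset -q --hard HEAD^                   # throw the merge commit away
git merge B || true                         # same conflict, resolved from the cache

content=$(cat notes.txt)
unmerged=$(git ls-files -u | wc -l | tr -d ' ')
echo "working tree: $content"
echo "unmerged index entries: $unmerged"
```

After the second merge the file in the working tree already holds the recorded resolution, while the index still lists the file as unmerged until you run git add.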

3.4.2.1. Why Rerere Makes Sense

One might object: Who voluntarily takes the risk of deleting an already (possibly costly) resolved merge conflict in order to want to repeat it at some point?

However, the procedure is desirable: First of all, it doesn’t make sense to simply periodically and out of habit merge the mainline — i.e. the main development thread, e.g. master — into the topic branch (we will come back to this later). But if you have a long-lived topic branch and want to test it occasionally to see if it is compatible with the mainline, you don’t want to resolve the conflicts by hand every time — once resolved, Rerere will resolve conflicts automatically. This way you can successively develop your feature, knowing that it is in conflict with the mainline. But at the time of the integration of the feature the conflicts are all automatically resolvable (because you have occasionally saved conflict solutions with Rerere).

In addition, Rerere is also called automatically in conflict cases that arise in a rebase process (see Sec. 4.1, “Moving commits — Rebase”). Again, once conflicts have been resolved, they can be automatically resolved again. Once you have merged a branch into the mainline for test purposes and resolved a conflict, this solution is automatically applied when you rebuild this branch on the mainline via rebase.

3.4.2.2. Using Rerere

In order for the Rerere functionality to be used, you must set the rerere.enabled option to true, as mentioned above. Rerere will then be called automatically when a merge conflict occurs (to capture the pre-image, possibly to resolve the conflict) and when a conflict resolution is checked in (to save the resolution).

Rerere stores information such as pre-image and resolution in .git/rr-cache/, uniquely identified by a SHA-1 sum. You almost never need to call the git rerere subcommand, as it is already handled by merge and commit. You can also use git rerere gc to delete very old solutions.

What happens if a wrong conflict resolution was checked in? Then you should delete the recorded resolution, otherwise Rerere will reapply it when you repeat the conflicted merge. To do this, there is the command git rerere forget <file>: directly after Rerere has applied a wrong solution, you can delete the wrong solution in this way and restore the original state of the file (i.e. with conflict markers). If you only want to do the latter, git checkout -m <file> will also help.

3.4.3. Avoiding Conflicts

Decentralized version control systems generally manage merges much better than central ones. This is mainly due to the fact that it is common practice in decentralized systems to check in many small changes locally first. This avoids “monster commits”, which offer much more potential for conflict. This fine-grained development history and the fact that merges are usually recorded as data in the version history (as opposed to simply copying the lines of code) mean that decentralized systems do not have to rely solely on the mere contents of files when merging.

Prevention is the best way to minimize merge conflicts. Make small commits! Combine your changes so that the resulting commit makes sense as a unit. Always build topic branches on the latest release. Merge from topic branches into “collection branches” or directly into master, not the other way around.[45] Using Rerere prevents conflicts that have already been resolved from constantly reoccurring.

Obviously, good communication among developers is also important for prevention: If several developers implement different and mutually influencing changes to the same function, this will certainly lead to conflicts sooner or later.

Another factor that unfortunately often leads to unnecessary(!) conflicts is autogenerated content. Suppose you write the documentation of a piece of software in AsciiDoc[46] or work on a LaTeX project with several contributors: Never add the compiled man pages or the compiled DVI/PS/PDF to the repository! In the autogenerated formats, small changes to the plaintext (i.e. in the AsciiDoc or LaTeX version) can cause large (and unpredictable) changes to the compiled formats that Git will not resolve adequately. Instead, it makes sense to provide appropriate Makefile targets or scripts to generate the files, and possibly keep the compiled version on a separate branch.[47]

3.5. Taking over Individual Commits: Cherry Picking

It will happen that you don’t want to integrate an entire branch directly, but rather parts, i.e. individual commits, first. The cherry-pick (“pick the good cherries”) git command is responsible for this.

The command expects one or more commits to be copied to the current branch. For example:

$ git cherry-pick d0c915d
$ git cherry-pick topic~5 topic~1
$ git cherry-pick topic~5..topic~1

The middle command copies two explicitly specified commits; the last command, on the other hand, copies all commits belonging to the specified commit range.

Unlike a merge, however, only the changes are integrated, not the commit itself. To do this, it would have to reference its predecessor, so that the predecessor would also have to be integrated, and so on, which is equivalent to a merge. So when you take over commits with cherry-pick, new commits are created with a new commit ID. Git can’t know that these commits are actually the same.
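
The difference between "the same change" and "the same commit" can be made visible with a short experiment. The sketch below uses a throwaway repository with made-up file names; it compares the commit IDs and, as one possible way to compare the changes themselves, the output of git patch-id.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

echo "base" > base.txt
git add base.txt && git commit -q -m "initial commit"

git checkout -q -b topic
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "add feature"
orig=$(git rev-parse topic)

git checkout -q master
echo "other" > other.txt
git add other.txt && git commit -q -m "unrelated work on master"

git cherry-pick topic > /dev/null
copy=$(git rev-parse HEAD)

# Different commit IDs ...
echo "original: $orig"
echo "copy:     $copy"

# ... but the same patch ID, i.e. the same change.
orig_patch=$(git show "$orig" | git patch-id | cut -d' ' -f1)
copy_patch=$(git show "$copy" | git patch-id | cut -d' ' -f1)
echo "patch IDs: $orig_patch / $copy_patch"
```

The copy has a different predecessor than the original, so its commit ID necessarily differs even though the diff it introduces is identical.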

So if you are merging two branches that you have cherry-picked changes between, conflicts can occur.⁠[48] These are usually trivial to resolve, and the strategy options ours and theirs might be helpful (see Sec. 3.3.4, “Options for the Recursive Strategy”). The rebase command, on the other hand, recognizes such commit duplications,⁠[49] and omits the duplicated commits. This allows you to take some commits “from the middle” and then rebuild the branch the commits came from.

The cherry-pick command also understands these merge strategy options itself: If you want to copy a commit to the current branch and let the copied commit win in case of conflict, use:

git cherry-pick -Xtheirs <commit>

The -n or --no-commit option tells Git to apply the changes from a commit to the index, but not to make a commit yet. This allows you to “aggregate” several small commits into the index first, and then package them as one commit:

$ git cherry-pick -n 785aa39 512f3e9 4e4a063
Finished one cherry-pick.
Finished one cherry-pick.
Finished one cherry-pick.
$ git commit -m "Various small changes"
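
The aggregation can be tried out in a throwaway repository. In this sketch the three "small fixes" are made-up commits that each add one file, so the result is easy to check.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

echo "base" > base.txt
git add base.txt && git commit -q -m "initial commit"

git checkout -q -b topic
for n in 1 2 3; do
    echo "fix $n" > "fix$n.txt"
    git add "fix$n.txt" && git commit -q -m "small fix $n"
done

git checkout -q master
# Apply all three changes to the index without committing.
git cherry-pick -n topic~2 topic~1 topic
staged=$(git diff --cached --name-only | wc -l | tr -d ' ')

# Package them as a single commit.
git commit -q -m "Various small changes"
count=$(git rev-list --count HEAD)
echo "$staged files staged, $count commits on master"
```

master ends up with only two commits (the initial one and the aggregate), although three commits were picked.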

3.6. Visualizing Repositories

Once you have created and merged a few branches, you will have noticed how easy it is to lose track.

The arrangement of commits and their relationships to each other is called the topology of a repository. In the following, we will introduce the graphical program gitk, among other things, to examine these topologies.

For small repositories, first call gitk --all, which displays the entire repository as a graph. Clicking on the individual commits displays the meta-information as well as the generated patch.

3.6.1. Revision Parameters

Since the listing of multiple commits is hard to keep track of, we examine a small sample repository with several branches merged together:

revision list commit graph gitk
Figure 20. The graph of commits as displayed in gitk

We recognize four branches (A-D) and one tag release. We can also display this tree on the console with the appropriate command line options using the log command (branch and tag names are printed in semi-bold for better distinction):

$ git log --decorate --pretty=oneline --abbrev-commit --graph --all
* c937566 (HEAD, D) commit on branch D
| *   b0b30ef (release, A) Merge branch 'C' into A
| |\
| | * 807db47 (C) commit on branch C
| | * 996a53b commit on branch C
| |/
|/|
| * 83f6bf3 commit on branch A
| *   5b2c291 Merge branch 'B' into A
| |\
| | * 2417cf7 (B) commit on branch B
| |/
|/|
| * 0bf1433 commit on branch A
|/
* 4783886 initial commit

The output of the log command is equivalent to the view in Gitk. However, git log is much faster than Gitk and does not require another program window.

So for a quick overview, it’s much more convenient to set up an alias that automatically adds the many long options. The authors use the alias tree for this, which you can define as follows:

$ git config --global alias.tree \
    'log --decorate --pretty=oneline --abbrev-commit --graph'

By using git tree --all you get an ASCII version of the graph of the Git repository. In the following, we use this alias to represent the topology.

Now we change the above command: instead of the --all option, which puts all commits in the tree, we now specify B (the name of the branch):

$ git tree B
* 2417cf7 (B) commit on branch B
* 4783886 initial commit

We receive all commits that are accessible from B. A commit only knows its predecessor(s) (several if branches are merged). “All commits reachable from B” thus refers to the list of commits from B onwards, up to a commit that has no predecessor (called a root commit).

Instead of one, the command can also accept multiple references. So to get the same output as with the --all option, you must specify references A, B, and D. C can be omitted because the commit is already “collected” on the way from A to the root commit.

Of course, you can also specify an SHA-1 sum directly instead of symbolic references:

$ git tree 5b2c291
*   5b2c291 Merge branch 'B' into A
|\
| * 2417cf7 (B) commit on branch B
* | 0bf1433 commit on branch A
|/
* 4783886 initial commit

If a reference is preceded by a caret (^), this negates the meaning.[50] So the notation ^A means: not the commits that are accessible from A. However, this switch only excludes these commits; it does not select any others. So the above log command with only the argument ^A will not output anything, because Git only knows which commits should not be displayed. So again, we add --all to list all commits, minus those that are accessible from A:

$ git tree --all ^A
* c937566 (HEAD, D) commit on branch D

An alternative notation is available with --not: Instead of ^A you can also write --not A.

Such commands are especially useful for examining the difference between two branches: Which commits are in branch D that are not in A? The following command returns the answer:

$ git tree D ^A
* c937566 (HEAD, D) commit on branch D

Because this question is often asked, there is another, more intuitive notation for it: A..D is equivalent to D ^A:

$ git tree A..D
* c937566 (HEAD, D) commit on branch D

Of course the order is important here: “D without A” is a different set of commits than “A without D”! (Compare also the complete graph.)

In our example there is a tag release. To check which commits from branch D (which could stand for “Development”) are not yet included in the current release, simply specify release..D.

The syntax A..B can be remembered as the idiom “from A to B”. However, this “difference” is not symmetrical, i.e. A..B are usually not the same commits as B..A.

Alternatively, Git provides the symmetrical difference A...B (with three dots). It is equivalent to the argument A B --not $(git merge-base A B), so it includes all the commits that can be reached from A or B, but not both.
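
The different range notations can be compared by simply counting the commits they select. The sketch below builds a small throwaway repository (not the sample repository from the figure) with two commits on A and one on D after a shared initial commit; the helper function commit is an assumption of this example.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

# Helper for this example: create an empty commit with the given message.
commit() {
    git commit -q --allow-empty -m "$1"
}

commit "initial commit"
git checkout -q -b A
commit "commit on branch A"
commit "second commit on branch A"
git checkout -q -b D master
commit "commit on branch D"

only_d=$(git rev-list --count A..D)     # in D but not in A
also_d=$(git rev-list --count D ^A)     # the same set, written out
only_a=$(git rev-list --count D..A)     # in A but not in D
either=$(git rev-list --count A...D)    # in A or in D, but not in both
echo "A..D: $only_d   D ^A: $also_d   D..A: $only_a   A...D: $either"
```

git rev-list accepts the same revision parameters as git log, but prints only commit IDs (or, with --count, their number), which makes it convenient for scripts.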

3.6.1.1. Reference vs. List of References

In the example, A always refers to all commits that are accessible from A. But actually a branch is just a reference to a single commit. So why does log always list all commits reachable from A, while the git command show with the argument A only shows this one commit?

The difference is what the commands expect as an argument: show expects an object, that is, a reference to a single object, which is then displayed.[51] Many other commands expect one (or more) commits instead, and these commands convert the arguments into a list of commits (traversing the graph down to the root commit).

3.6.2. Gitk

Gitk is a graphical program implemented in Tcl, which is usually packaged by distributors along with the actual Git commands — so you can be sure to find it on almost any system.

It represents individual commits or the entire repository in a three-part view: at the top is the tree structure with two additional columns for author and date, below is a list of changes in unified diff format, and a list of files to restrict the changes displayed.

The graph view is intuitive: Different colors help to distinguish the different lines of development. Commits are always blue dots, with two exceptions: The HEAD is highlighted in yellow, and a commit that is not a root commit, but whose predecessor is not displayed, is shown in white.

Branches with an arrowhead indicate that further commits have been made on the branch. However, Gitk hides the branch due to the time distance between commits. A click on the arrowhead will take you to the continuation of the branch.

Branches appear as green labels, the currently checked out branch additionally bold. Tags are shown as yellow arrows.

You can delete or check out a branch with a right click on it. Right-clicking on commits opens a menu in which you can perform actions on the selected commit. The only thing that might be easier to do with Gitk than from the command line is cherry picking, i.e. transferring individual commits to another branch (see also Sec. 3.5, “Taking over Individual Commits: Cherry Picking”).

gitk
Figure 21. Complex topology in Gitk

Gitk accepts essentially the same options as git log. Some examples:

$ gitk --since=yesterday -- doc/
$ gitk e13404a..48effd3
$ gitk --all -n 100

The first command shows all commits since yesterday that have made changes to a file under the doc/ directory. The second command limits the commits to a specific range, while the third command shows the 100 most recent commits from all branches.

Experience shows that beginners are often confused because gitk by default only shows the current branch. This is probably because gitk is often called to get an overview of all branches. Therefore the following shell alias is useful: alias gik='gitk --all'.

Many users leave gitk open during work. Then it’s important to update the display from time to time so that more recent commits appear. With F5 (Update) you load all new commits and refresh the display of the references. Sometimes, however, if you delete a branch, for example, this is not enough. Although the branch is no longer displayed, there may still be unreachable commits in the GUI as artifacts. The key combination Ctrl+F5 (Reload) completely reloads the repository, which solves the problem.

As an alternative to gitk, you can use the GTK-based gitg or Qt-based qgit on UNIX systems; on an OS X system, for example, you can use GitX; for Windows, you can use GitExtensions. Some IDEs now also have corresponding visualizations (e.g. the Eclipse plugin EGit). Furthermore, you can use full-fledged Git clients like Atlassian SourceTree (OS X, Windows; free of charge), Tower (OS X; commercial) as well as SmartGit (Linux, OS X and Windows; free for non-commercial use).

3.7. Reflog

The reference log (reflog) consists of log files that Git creates for each branch and for HEAD. They store when a reference was moved from where to where. This happens especially with the checkout, reset, merge and rebase commands.

These log files are stored under .git/logs/ and are named after the reference. The reflog for the master branch can be found under .git/logs/refs/heads/master. There is also the command git reflog show <reference> to list the reflog:

$ git reflog show master
48effd3 master@{0}: HEAD^: updating HEAD
ef51665 master@{1}: rebase -i (finish): refs/heads/master onto 69b9e27
231d0a3 master@{2}: merge @{u}: Fast-forward
...

The reflog command is rarely used directly and is just an alias for git log -g --oneline. The -g option causes the command not to follow the predecessors in the commit graph, but to walk through the commits in the order recorded in the reflog.

You can easily try this: Create a test commit, then delete it again with git reset --hard HEAD^. The command git log -g will now first show the HEAD, then the deleted commit, and then the HEAD again.

The reflog thus also references commits that are otherwise no longer referenced, i.e. are “lost” (see Sec. 3.1.2, “Managing Branches”). The reflog might help you if you have deleted a branch that you would have needed after all. Admittedly, git branch -D also deletes the branch’s reflog. However, you had to check out the branch to commit to it, so use git log -g HEAD to find the last time you checked out the branch you were looking for. Then create a branch that points to this (seemingly lost) commit ID, and your lost commits should be back.[52]
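
The recovery of a deleted branch can be rehearsed safely in a throwaway repository. In this minimal sketch the branch names topic and recovered are made up, and the position HEAD@{1} is only correct because of the exact sequence of commands used here; in practice you would look up the right entry in the output of git log -g HEAD.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

git commit -q --allow-empty -m "initial commit"
git checkout -q -b topic
git commit -q --allow-empty -m "valuable work"
lost=$(git rev-parse HEAD)

git checkout -q master
git branch -D topic                # the branch and its own reflog are gone

git log -g --oneline HEAD          # the reflog of HEAD still lists the commit

# HEAD@{0} is the checkout of master, HEAD@{1} the commit made on topic.
git branch recovered 'HEAD@{1}'
found=$(git rev-parse recovered)
echo "lost:      $lost"
echo "recovered: $found"
```

The quotation marks around HEAD@{1} keep the shell from interpreting the braces.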

Commands that expect one or more references can also implicitly use the reflog. In addition to the syntax already found in the output of git log -g (e.g. HEAD@{1} for the previous position of the HEAD), Git also understands <ref>@{<when>}. Git interprets the time <when> as an absolute or relative date and then consults the reflog of the corresponding reference to find out which log entry is closest in time. This is then referenced.

Two examples:

$ git log 'master@{two weeks ago}..'
$ git show '@{1st of April, 2011}'

The first command lists all commits between HEAD and the commit the master branch pointed to two weeks ago (note the suffix .. which means a commit range up to HEAD). This doesn’t necessarily have to be a commit that is two weeks old: if, as a test, you moved the branch to the very first commit in the repository two weeks ago using git reset --hard <initial-commit>, then that very commit will be referenced.[53]

The second line shows the commit to which the currently checked out branch (due to missing explicit reference before the @) pointed on April 1, 2011. In both commands, the argument with a Reflog attachment must be enclosed in quotation marks to make sure Git gets the argument completely.

Note that the reflog is only available locally and therefore does not belong to the repository. If you send a commit ID or tag name to another developer, it references the same commit, but a master@{yesterday} can reference different commits depending on the developer.

If you don’t specify a branch and time, Git will assume HEAD. This allows you to use @ as the short form for HEAD in commands. Furthermore, many commands understand the argument - as @{-1}, which stands for the branch that was checked out before the current one:

$ git checkout feature   # previously on "master"
$ git commit ...         # make changes and commits
$ git checkout -         # back to "master"
$ git merge -            # merge "feature"

4. Advanced Concepts

The following chapter covers selected advanced concepts. The focus is on the Rebase command with its many applications. We find out who changed a line in the source code (Blame) and when, and how to tell Git to ignore files and directories. We’ll also look at how to stash changes to the working tree and annotate commits (Notes). Finally, we show you how to quickly and automatically find commits that introduce a bug (Bisect).

4.1. Moving commits — Rebase

In the section on Git’s internals, we mentioned earlier that you can move and modify commits in a Git repository (graphically speaking) at will. In practice, this is made possible primarily by the git command rebase. This command is very powerful and important, but sometimes a bit more demanding to use.

Rebase is a coined word which means “to put something on a new basis”. What it means is that a group of commits is moved around within the commit graph, building commit after commit based on another node. The following graphics illustrate how this works:

rebase before
Figure 22. Before the rebase
rebase after
Figure 23. …​and after that

In its simplest form the command is git rebase <reference> (in the above diagram: git rebase master). This means that Git first marks all commits <reference>..HEAD, i.e. the commits that can be reached from HEAD (the current branch) minus the commits that can be reached from <reference> - in other words, everything that is in the current branch but not in <reference>. In the diagram, these are E and F.

The list of these commits is stored temporarily. Git then checks out the commit <reference> and copies the individual cached commits in the original order as new commits to the branch.

There are a few points to consider:

  • Because the first node of the topic branch (E) now has a new predecessor (D), its metadata and thus its SHA-1 sum changes (it becomes E_). The second commit (F) then also has a different predecessor (E_ instead of E), its SHA-1 sum changes (it becomes F_) and so on - this is also called the ripple effect. Overall, all copied commits will have new SHA-1 sums - so they’re the same (in terms of changes), but not identical.

  • Such an action, just like a merge operation, can result in conflicting changes. Git can partially resolve them automatically, but aborts with an error message if the conflicts are not trivial. The rebase process can then either be “repaired” and continued, or aborted (see below).

  • If no other reference points to node F, it will be lost, because reference HEAD (and the corresponding branch, if applicable) will be shifted to node F_ in case of a successful rebase. So if F is no longer referenced (neither by a reference nor by a successor commit), Git can no longer find the node, and the tree “disappears”. If you’re not sure whether you need the original tree again, you can simply reference it with the tag command, for example. In that case, the commits will be preserved even after a rebase (but then in duplicate at different places in the commit graph).
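
The points above can be observed in a throwaway repository. The sketch below mirrors the diagram loosely: topic carries the commits E and F, master has moved on to D, and the tag old-topic (a name made up for this example) keeps the original F reachable.

```shell
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Beatrice"
git config user.email "beatrice@example.com"

echo "base" > base.txt
git add base.txt && git commit -q -m "C"

git checkout -q -b topic
echo "e" > e.txt && git add e.txt && git commit -q -m "E"
echo "f" > f.txt && git add f.txt && git commit -q -m "F"
git tag old-topic                  # keep a reference to the original F

git checkout -q master
echo "d" > d.txt && git add d.txt && git commit -q -m "D"

git checkout -q topic
git rebase -q master               # copy E and F on top of D

old=$(git rev-parse old-topic)     # original F
new=$(git rev-parse topic)         # copied F_
base=$(git merge-base topic master)
tip=$(git rev-parse master)
copied=$(git rev-list --count master..topic)
echo "F:  $old"
echo "F_: $new"
echo "$copied commits now sit on top of master"
```

The copies have new SHA-1 sums, the branch now starts at the tip of master, and the original commits remain available through the tag.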

4.1.1. An Example

Consider the following situation: The sqlite-support branch branches off from the “fixed a bug…​” commit. But the master branch has already moved on, and a new 1.4.2 release has been made.

screenshot rebase vorher
Figure 24. Before the rebase

Now sqlite-support is checked out and rebuilt on top of master:

$ git checkout sqlite-support
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: include sqlite header files, prototypes
Applying: generalize queries
Applying: modify Makefile to support sqlite

Rebase applies the three changes introduced by commits from the sqlite-support branch to the master branch. After that, the repository looks like this in Gitk:

screenshot rebase nachher
Figure 25. After rebase

4.1.2. Extended Syntax and Conflicts

Normally git rebase always rebuilds the branch you are currently working on onto a new base. However, there is a shortcut: If you want to base topic on master, but you are on a completely different branch, you can do this via

$ git rebase master topic

Git does the following internally:

$ git checkout topic
$ git rebase master

Please note the (unfortunately not very intuitive) order:

git rebase <on which> <what>

A rebase can lead to conflicts. The process then stops with the following error message:

$ git rebase master
...
CONFLICT (content): Merge conflict in <file>
Failed to merge in the changes.
Patch failed at ...
The copy of the patch that failed is found in:
   .../.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

You proceed as with a regular merge conflict (see Sec. 3.4, “Resolving Merge Conflicts”) - git mergetool is very helpful here. Then simply add the changed file via git add and let the process continue via git rebase --continue.⁠[54]

Alternatively, the problematic commit can be skipped using the git rebase --skip command. The commit is then lost unless it is referenced in another branch somewhere else! So you should only perform this action if you are certain that the commit is obsolete.

If none of this helps (e.g. if you can’t resolve the conflict at that point, or if you realize that you are rebuilding the wrong branch), pull the emergency brake: git rebase --abort. This will discard all changes to the repository (including successfully copied commits), so that the state afterwards is exactly the same as it was when the rebase process was started. The command also helps if at some point you forget to finish a rebase process, and other commands complain that they can’t do their job because a rebase is in progress.
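
The following sketch illustrates the emergency brake in a made-up repository. It assumes that an unfinished rebase can be recognized by the state directory Git keeps under .git (rebase-apply, as in the message above, or rebase-merge in newer versions), and it checks that the branch is back at its old commit after the abort.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "v1" > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b topic
echo "topic" > file.txt
git commit -q -am "topic change"
git checkout -q master
echo "master" > file.txt
git commit -q -am "master change"
git checkout -q topic

before=$(git rev-parse HEAD)
git rebase master >/dev/null 2>&1 || true     # stops with a conflict

in_progress=no
if [ -d .git/rebase-merge ] || [ -d .git/rebase-apply ]; then
    in_progress=yes                           # a rebase is waiting to be finished
fi

git rebase --abort                            # back to the state before the rebase
after=$(git rev-parse HEAD)
branch=$(git rev-parse --abbrev-ref HEAD)
echo "rebase in progress: $in_progress, back on $branch"
```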

4.1.3. Why Rebasing Makes Sense

Rebase is primarily useful for keeping the commit history of a project simple and easy to understand. For example, a developer might be working on a feature, but then have something else to do for a few weeks. Meanwhile, development on the project has progressed, there has been a new release, and so on. Only now does the developer get to finish the feature. (Rebase also helps to avoid conflicts if you want to send patches via email, see Sec. 5.9, “Patches via E-mail”.)

For the version history, it is much more logical if the feature is not “dragged along” unfinished for a long period of time alongside the actual development, but instead branches off from the last stable release.

Rebase is good for exactly this change in history: the developer can simply enter the command git rebase v1.4.2 on the branch where the feature was developed, to rebuild the feature branch on the commit with the release tag v1.4.2. This makes it much easier to see what differences the feature really brings to the software.
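
A minimal sketch of this scenario, assuming a throwaway repository in which the branch name feature, the commits and the file names are made up (only the tag name v1.4.2 comes from the text). After the rebase, comparing the tag with the branch shows only what the feature itself adds.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.invalid"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b feature
commit feature-work                           # the feature that was left unfinished
git checkout -q master
commit release
git tag v1.4.2                                # the last stable release
commit later-development                      # master has already moved on

git checkout -q feature
git rebase -q v1.4.2                          # rebuild the feature on the release

base=$(git merge-base feature v1.4.2)
changes=$(git diff --name-only v1.4.2 feature)   # what the feature adds to the release
echo "files changed by the feature: $changes"
```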

In the heat of the moment, every developer occasionally puts commits on the wrong branch. You stumble upon a bug and quickly fix it with a commit; then a test has to be written right away so that the bug does not return (another commit), and the fix has to be noted in the documentation. After the actual work is done, you can use rebase to “transplant” those commits to another location in the commit graph.

Rebase can also be useful if a branch requires a feature that has only recently been incorporated into the software. A merge of the master branch does not make sense semantically, because then these and other changes are inseparably merged with the feature branch. Instead, you rebase the branch on a new commit that already contains the required feature, and then use that in further development.

4.1.4. When Rebasing Is Not Useful — Rebase vs. Merge

The concept of rebase is initially a little difficult to understand. But once you have understood what is possible with it, the question arises: What is the point of a simple merge if you can edit everything with rebase?

When git rebase is used rarely or not at all, the project history often becomes hard to follow, because merges have to be performed constantly, each for just a few commits.

If, on the other hand, rebase is used too much, there is a danger that the entire project will be senselessly linearized: the flexible branching of Git is used for development, but the branches are then integrated into the publishing branch one after the other (!) like a zipper via rebase. This presents us with two main problems:

  • Logically related commits are no longer recognizable as such. Since all commits are linear, the development of multiple features is inextricably intertwined.

  • The integration of a branch can no longer be easily undone, because identifying those commits that once belonged to a feature branch is only possible manually.

This squanders the advantages of Git’s flexible branching. The conclusion is that rebase should be used neither too much nor too little; both extremes make the project history confusing, each in its own way.

In general, you will do well with the following rules of thumb:

  1. A feature is integrated by merge when it is finished. It is best to avoid creating a fast-forward merge so that the merge commit is preserved as a record of the time of integration.

  2. While you are developing, you should use rebase frequently (especially interactive rebase, see below).

  3. Logically separate units should be developed on separate branches; logically related ones possibly on several, which are then combined by rebase (if that makes sense). The integration of logically separate units is then done by merge.
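
The first rule can be sketched as follows, assuming a throwaway repository with made-up commits. Here master has not moved since topic branched off, so a plain git merge would fast-forward; the --no-ff option forces a merge commit whose second parent is the tip of the feature branch.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.invalid"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b topic
commit feature                                # the finished feature
git checkout -q master

git merge -q --no-ff -m "Merge branch 'topic'" topic   # always create a merge commit

second_parent=$(git rev-parse HEAD^2)         # second parent of the merge commit
git --no-pager log --graph --oneline
```

Because the merge commit exists, the integration stays visible in the graph and can later be identified as a single unit.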

4.1.5. A Word of Warning

As mentioned earlier, a rebase inevitably changes the SHA-1 sums of all commits that are “rebuilt”. If these changes have not yet been published, that is, if a developer only has them in a private repository, this is not a problem.

But if a branch (e.g. master) is published[55] and later rewritten via rebase, this has unpleasant consequences for everyone involved: all branches based on master now reference the old copy of the rewritten master branch. So each of these branches must be rebased onto the new master (which in turn changes all of its commit IDs). This effect propagates and can be very time-consuming to fix, depending on when such a rebase happens and how many developers are involved in the project, especially if you’re new to Git.

Therefore you should always remember the following rule:

Only edit unpublished commits with the rebase command!

Exceptions are conventions like personal branches or pu. The latter is an abbreviation for Proposed Updates and is usually a branch where new, experimental features are tested for compatibility. No one builds their own work on this branch, so it can be rewritten without problems and without prior notice.

Another possibility is offered by private branches, i.e. those that start with <user>/ for example. If you make an agreement that developers will do their own development on these branches, but always base their features on “official” branches, then the developers may rewrite their branches as they wish.
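
Before rebasing, you can check which commits have already been published. The following is a minimal sketch, assuming a local bare repository that stands in for the public server and a remote named origin; the commits are made up. Note that this only reflects what the local repository knows from its last push or fetch.

```shell
set -e
remote=$(mktemp -d)
git init -q --bare "$remote"                  # stands in for a public server
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.invalid"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit published
git remote add origin "$remote"
git push -q origin master                     # this commit is now public
commit local-only                             # this one exists only locally

# Commits on master that origin/master does not have: safe to rewrite.
unpublished=$(git rev-list --count origin/master..master)
# Remote-tracking branches that contain a given commit: empty means unpublished.
published_in=$(git branch -r --contains HEAD~1)
local_in=$(git branch -r --contains HEAD)
echo "unpublished commits on master: $unpublished"
```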

4.1.6. Avoiding Code Duplication

If a feature is developed over a long period of time, and parts of it have already made their way into the main development line (e.g. via cherry-pick), the rebase command will detect these commits and omit them when rebuilding the branch, because their changes are already contained in the base branch.

Thus, after a rebase, the branch consists only of the commits that have not yet been incorporated into the base branch. This way, commits do not appear twice in the version history of a project. If the branch had simply been merged, the same changes would be present as separate commits with different SHA-1 sums in different places in the commit graph.
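
This behavior can be tried out with a small sketch. It assumes a throwaway repository with a made-up feature branch whose first commit (part1) is cherry-picked to master before the rebase. git cherry previews the situation, and after the rebase only part2 should remain on top of master.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.invalid"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b feature
commit part1; commit part2
git checkout -q master
commit upstream
git cherry-pick feature~1 >/dev/null          # part1 reaches master ahead of time

# Lines starting with "-" mark commits whose change is already in master.
git cherry -v master feature

git checkout -q feature
git rebase master >/dev/null 2>&1             # part1 is detected and left out

remaining=$(git log --format=%s master..feature)
echo "commits left on feature: $remaining"
```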

4.1.7. Managing Patch Stacks

There are situations where there is a vanilla version (“simplest version”) of a piece of software and also a certain number of patches that are applied to it before the software is shipped. For example, your company builds software, but before each delivery to the customer, some adjustments have to be made (depending on the customer). Or you have open source software in use, but have adapted it a bit to your needs: every time a new, official version of the software is released, you have to reapply your changes and then rebuild the software.[56]

To manage patch stacks, there are some programs that build on top of Git, but give you the convenience of not having to work directly with the rebase command. For example, TopGit[57] allows you to define dependencies between branches: if something changes in a branch and other branches depend on it, TopGit will rebuild them on demand. An alternative to TopGit is Stacked Git[58].

4.1.8. Restricting Rebase via --onto

Now, you may have wondered: git rebase <reference> always copies all commits that are between <reference> and HEAD. But what if you only want to move part of a branch, to “transplant” it, so to speak? Consider the following situation:

Figure 26. Before the rebase --onto

You were developing a feature on the branch topic when you noticed a bug; you created a branch bugfix and found another bug. Semantically speaking, your branch bugfix has nothing to do with the topic branch. Therefore, it makes sense for it to branch off from the master branch instead.

But if you now rebuild the branch bugfix using git rebase master, the following happens: all nodes that are in bugfix but not in master are copied onto the master branch in order, that is, nodes D, E, F, and G. However, D and E are not part of the bugfix at all.
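
Which commits a rebase would copy can be previewed with a range expression. The following sketch rebuilds the situation under an assumed layout (A and B on master, D and E on topic, F and G on bugfix, then C on master); the exact shape of the figure may differ. It lists the commits reachable from bugfix but not from master, and for comparison those reachable from bugfix but not from topic.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master       # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.invalid"
# Hypothetical helper: one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b topic
commit D; commit E
git checkout -q -b bugfix                     # bugfix starts on top of topic
commit F; commit G
git checkout -q master
commit C

# What "git rebase master" would copy when run on bugfix:
all=$(git log --reverse --format=%s master..bugfix | paste -sd' ' -)
# The commits that actually belong to the bugfix:
wanted=$(git log --reverse --format=%s topic..bugfix | paste -sd' ' -)
echo "master..bugfix: $all"
echo "topic..bugfix:  $wanted"
```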

This is where the --onto option comes into play: It allows you to specify a start and end point for the list of commits to be copied. The general syntax is

git rebase --onto <on which> <start> <end>

In this example, we only want to rebuild the commits F and G (that is, the commits from topic to bugfix) on top of master. Therefore the command is

$ git rebase --onto master topic bugfix

The result looks as expected: