Git: Distributed Version Control for Code and Documents

This is the English translation of Das Git-Buch (The Git Book) by Valentin Haenel and Julius Plenz, 2nd edition (2014), released under the CC BY-NC-SA 4.0 license. Translated from German by Alexander Bolli and Tristano Ajmone in 2020.

Book Status

This document is still in beta but fully translated, so enjoy reading it and leave us some feedback on how we might improve it.

We’re currently proofreading and polishing the entire text and fixing some styling and formatting issues. Any help with proofreading is much appreciated; if you wish to contribute, submit your changes via pull request on the beta-dev branch of the project repository:

https://fossy-cats.github.io/Git-Buch_EN

Preface

Git was developed in early 2005 by Linus Torvalds, the creator and current maintainer of the Linux kernel. For the management of the kernel sources, the development team had initially decided to use the commercial version control system BitKeeper. Problems arose when the company behind BitKeeper, which provided the tool to the project free of charge, accused a developer of revealing the mechanisms of the software by reverse engineering. As a result, Torvalds decided to write a new version control system.

Simply switching to another system was not an option: the alternatives had a centralized architecture and did not scale well enough. The demands the kernel project places on a version control system are huge: a single minor version step (e.g. 2.6.35 to 2.6.36) comprises over 500,000 changed lines in almost 1000 files, contributed by over 1000 individuals.

So what were the design goals of the new program? Two characteristics quickly crystallized: speed (performance) and verifiable integrity of the managed data.

After only a few weeks of work, a first version of Git was able to manage its own source code. Implemented as a small shell script collection with performance-critical parts in C, this version was still far from being a “full-fledged” version control system.

Since version 1.5 (February 2007), Git offers a new and tidier user interface and extensive documentation, allowing people not directly involved in Git development to use it.

The basic concepts have remained the same up to current versions: first and foremost the object model and the index, key features that distinguish Git from other VCS. The Unix philosophy of “one tool, one job” is also consistently applied here; the subcommands of Git are each independent, executable programs or scripts. Even in version 2.0 there are still (as at the beginning of development) some subcommands implemented as shell scripts (e.g. git pull).

Linus Torvalds himself hardly does any programming on Git these days; a few months after the first release, Junio C. Hamano took over as maintainer.

Not only Git’s revolutionary approach, but also the fact that the entire kernel development was migrated to Git quickly and successfully, has contributed to its steep rise. Many projects, some of them very large, now use Git and benefit from the flexibility gained as a result.

Who Is This Book Intended For?

The book is aimed at both professional software developers and users who want to work on small scripts, web pages or other documents or who want to get actively involved in an (open source) project. It teaches basic version control techniques, introduces the basics of Git, and explains all the major use cases.

Work that you don’t manage with a version control system is work that you might have to do again—whether it’s because you accidentally delete a file or consider parts obsolete that you need later. For any form of productive text and development work, you need a tool that can record and manage changes to files. Git is flexible, fast, and equally suited for small projects by individuals or large projects involving hundreds of developers, such as the Linux kernel.

Developers who already use a different version control system can benefit from switching to Git. Git allows a much more flexible way of working and is in many respects not as restrictive as comparable systems. It supports true merging and guarantees the integrity of managed data.

Git also benefits open source projects, because each developer has his or her own repository, which prevents disputes over commit privileges. Git also makes it much easier for newcomers to get started.

Although most of the examples and techniques presented refer to source code, there is no fundamental difference to managing documents written in LaTeX, HTML, AsciiDoc or related formats.

How to Read the Book?

Ch. 1, Introduction and First Steps gives a brief overview: How do you initialize a Git repository and manage files in it? It also covers the most important configuration settings.

Ch. 2, The Basics covers two key concepts of Git: the index and the object model. Along with other important commands that are introduced there, understanding these two concepts is essential to the safe use of Git.

Ch. 3, Practical Version Control discusses practical aspects of version control. In particular, it covers the branches and merges that are so central to Git. It also discusses how to resolve merge conflicts in detail.

Ch. 4, Advanced Concepts discusses advanced concepts, with a special focus on the rebase command, an essential tool for any Git professional. Other important commands follow, including blame, stash, and bisect.

Not until Ch. 5, Distributed Git do we look at the distributed aspects of Git: how to share changes between repositories and how developers can collaborate. Then Ch. 6, Workflows provides an overview of strategies for coordinating development work in a project.

We recommend that you read at least the first five chapters in a row. They describe all the important concepts and techniques for using Git safely in large projects. You can read the following chapters in any order, depending on your interests and needs.

Ch. 7, Git Servers covers installation and maintenance of Git services: two web-based repository browsers and access management for hosted repositories using Gitolite.

Ch. 8, Git Automation summarizes various aspects of automation: How to write hooks and custom Git commands, and how to rewrite the complete version history if necessary.

Finally, Ch. 9, Interacting with Other Version Control Systems discusses migration from other systems to Git. The focus here is on converting existing Subversion repositories, and on the ability to talk to Subversion from within Git.

The appendices deal with the installation and integration of Git into the shell. An outlook on the hosting service GitHub and a detailed description of the structure and maintenance mechanisms of a Git repository provide further background information.

Conventions

All examples are run in the shell. Even though some editors and IDEs now offer quite good Git integration, and even though there are a lot of graphical front-ends for Git, you should first learn the basics with the real Git commands.

The shell prompt is a single dollar sign ($); keyboard input is printed in semi-bold, like this:

$ git status

To find your way around the shell faster and better, we strongly recommend adding Git functionality to the shell, such as displaying the branch in the prompt (see Ch. 10, Shell Integration).

Unless otherwise noted, we refer to Git version 2.0. The examples all run with English locale settings.

Newly introduced terms are written in italics.

Installation and “The Git-Repository”

The installation of Git is described in detail in App. A, Installation. Some examples use the Git source repository, the repository where Git is actively developed. This repository is also called Git-via-Git or git.git.

After you have installed Git, you can download the repository with the following command

$ git clone git://git.kernel.org/pub/scm/git/git.git

The process takes a few minutes, depending on the connection speed and server load.

Documentation and Help

Comprehensive documentation for Git is available in the form of pre-installed man pages. Almost every subcommand has its own man page, which you can call in three equivalent ways, shown here for the git status command:

$ git help status
$ git status --help
$ man git-status

On the Git website⁠^[1] you can also find links to the official tutorial and other free documentation.

A large, vibrant community has formed around Git. The Git mailing list⁠^[2] is the lynchpin of the development: patches are sent in, new features are discussed, and questions about using Git are answered. However, the list, with sometimes more than 100 emails a day, some of them very technical, is only suitable for beginners to a limited extent.

The Git Wiki⁠^[3] contains documentation as well as an extensive link collection of tools based on Git⁠^[4] and FAQs⁠^[5].

Alternatively, the #git IRC channel on the Freenode network provides a place to ask questions that are not already answered in the FAQs or documentation.

For those switching from the Subversion environment, the Git-SVN Crash Course⁠^[6] is recommended, a comparison of Git and Subversion commands that will help you transfer your Subversion knowledge to the Git world.

Also worth mentioning is Stack Overflow⁠^[7], a platform by programmers for programmers, on which technical issues, including Git, are discussed.

Downloads and Contacts

The sample repositories of the first two chapters and a collection of all longer scripts are available for download at http://gitbu.ch/.

If you have any comments, please contact us by e-mail at one of the following addresses: kontakt@gitbu.ch, valentin@gitbu.ch or julius@gitbu.ch.

Acknowledgements

First of all, we’d like to thank all the developers and maintainers of the Git project as well as the mailing list and the IRC channel.

Many thanks to Sebastian Pipping and Frank Terbeck for comments and tips. Special thanks to Holger Weiß for his review of the manuscript and helpful ideas. We thank the entire Open Source Press Team for the good and efficient cooperation.

Our thanks go especially to our parents, who have always supported and encouraged us.

Valentin Haenel and Julius Plenz — Berlin, June 2011

Preface to the 2nd Edition

In the 2nd edition, we have limited ourselves to carefully recording the changes in the use of Git that were introduced up to version 2.0 — in fact, many commands and error messages are now more consistent, so that in some places this represents a significant simplification of the text. Inspired by questions from Git training courses and our own experience, new hints on problems, solutions, and interesting features are included.

We thank all those who sent in corrections to the first edition: Philipp Hahn, Ralf Krüdewagen, Michael Prokop, Johannes Reinhold, Heiko Schlichting, Markus Weber.

Valentin Haenel and Julius Plenz — Berlin, September 2014

Preface to the Creative Commons Edition

The publisher Open Source Press, who initially convinced us to write this book at all and published it over the past few years, has ceased operations as of 31.12.2015 and has transferred all rights to the published texts back to the authors. We especially thank Markus Wirtz for the always good and productive collaboration that has connected us over many years.

Due to the mainly very positive feedback on this text, we decided to make it freely available under a Creative Commons license.

Valentin Haenel and Julius Plenz — Berlin/Sydney, January 2016

1. Introduction and First Steps

The following chapter provides a concise introduction to the basic concepts and configuration settings of Git. A small sample project shows how to put a file under version control with Git, and the commands you use to perform the most important tasks.

1.1. Basic Terminology

Some important technical terms will be used repeatedly in the following and therefore require a brief explanation. If you have experience with another version control system, you will be familiar with some of the concepts involved, though perhaps under a different name.

Version Control System (VCS): A system for managing and versioning software or other digital information. Prominent examples are Git, Subversion, CVS, Mercurial (hg), Darcs and Bazaar. Synonyms are Software Configuration Management (SCM) and Revision Control System.

We distinguish between centralized and distributed systems. In a centralized system, such as Subversion, there must be a central server where the history of the project is stored. All developers must connect to this server to view the version history or make changes. In a distributed system like Git, there are many equivalent instances of the repository, so each developer has their own repository. The exchange of changes is more flexible, and does not necessarily take place through a central server.

Repository: The repository is a database where Git stores the different states of each file in a project over time. In particular, every change is packaged and saved as a commit.

Working Tree: The working directory of Git (sometimes called sandbox or checkout in other systems). This is where you make all modifications to the source code. It’s often called the Working Directory.

Commit: Changes to the working tree, such as modified or new files, are stored in the repository as commits. A commit contains both these changes and metadata, such as the author of the changes, the date and time, and a commit message that describes the changes. A commit always references the status of all managed files at a particular point in time. The various Git commands are used to create, manipulate, view, or change the relationships between commits.

HEAD: A symbolic reference to the newest commit in the current branch. This reference determines which files you find in the working tree for editing. It is therefore the “head” or tip of a development branch (not to be confused with HEAD in systems like CVS or SVN).

SHA-1: The Secure Hash Algorithm creates a unique 160 bit checksum (40 hexadecimal characters) for any digital information. All commits in Git are named after their SHA-1 sum (commit ID), which is calculated from the contents and metadata of the commit. It is, so to speak, a content-dependent version number, such as f785b8f9ba1a1f5b707a2c83145301c807a7d661.

Object model: A Git repository can be modeled as a graph of commits, manipulated by Git commands. This modeling makes it very easy to describe how Git works in detail. For a detailed description of the object model, see Sec. 2.2, “The Object Model”.

Index: The index is an intermediate level between the working tree and the repository, where you prepare a commit. The index therefore indexes which changes to which files you want to package as commits. This concept is unique to Git and often causes difficulties for beginners and people switching to Git. We discuss the index in detail in Sec. 2.1.1, “Index”.

Clone: When you download a Git repository from the Internet, you create a clone of that repository. The clone contains all the information contained in the source repository, especially the entire version history including all commits.

Branch: A branch in the development. Branches are used in practice, for example, to develop new features, prepare releases, or to provide old versions with bug fixes. Branches are — just like the merging of branches (Merge) — extremely easy to handle in Git and an outstanding feature of the system.

master: Because you need at least one branch to work with Git, the branch master is created when you initialize a new repository. The name is a convention (similar to trunk in other systems); you can rename or delete this branch as you wish, as long as at least one other branch is available. The master branch is technically no different from other branches.

Tag: Tags are symbolic names for hard-to-remember SHA-1 sums. You can use tags to mark important commits, such as releases. A tag can simply be an identifier, such as v1.6.2, or it can contain additional metadata such as author, description, and GPG signature.
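The content-dependent naming described under SHA-1 above can be observed with git hash-object, which prints the ID Git would assign to a piece of content. The following is only a minimal sketch: the sample strings are made up, and it hashes plain file contents rather than commits, but the principle is the same.

```shell
set -eu
# Hash the same content twice and a different content once.
# The input strings are arbitrary examples, not taken from the book.
id_a1=$(printf 'Hello World!\n' | git hash-object --stdin)
id_a2=$(printf 'Hello World!\n' | git hash-object --stdin)
id_b=$(printf 'Hello Git!\n' | git hash-object --stdin)

echo "$id_a1"   # same content ...
echo "$id_a2"   # ... same ID
echo "$id_b"    # different content, different ID
```

Identical input always yields the same ID, which is what allows Git to verify the integrity of the data it manages.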

1.2. First Steps with Git

To get you started, we’ll use a small example to illustrate the workflow with Git. We create a repository and develop a one-liner, a “Hello, World!” program in Perl.

In order for Git to assign a commit to an author, you need to enter your name and email address:

$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"

Note that a subcommand is specified when Git is called, in this case config. Git provides all operations through such subcommands. It is also important that no equals sign is used when calling git config. The following call is therefore incorrect:

$ git config --global user.name = "John Doe"

This is a pitfall, especially for beginners, because Git does not output an error message, but takes the equals sign as the value to set.
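If you want to see this effect without touching your real configuration, you can try it on a throwaway file. The sketch below assumes the --file option of git config, which is used here only to redirect the setting into a temporary file.

```shell
set -eu
# Work on a temporary config file so that no real configuration is changed.
cfg=$(mktemp)

# Incorrect call: the equals sign is taken as the value.
git config --file "$cfg" user.name = "John Doe"
wrong=$(git config --file "$cfg" user.name)
echo "stored value: $wrong"

# Correct call: the value directly follows the option name.
git config --file "$cfg" user.name "John Doe"
right=$(git config --file "$cfg" user.name)
echo "stored value: $right"
```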

1.2.1. Our First Repository

Before we use Git to manage files, we need to create a repository for the sample project. The repository will be created locally, so it will only be on the file system of the machine you are working on.

It’s generally recommended that you practice using Git locally first, and only later dive into the decentralized features and functions of Git.

$ git init example
Initialized empty Git repository in /home/esc/example/.git/

First, Git creates the directory example/ if it doesn’t already exist. Git then initializes an empty repository in this directory and creates a subdirectory .git/ for it, which is used to manage internal data. If the example/ directory already exists, Git creates a new Git repository in it. If both the directory and a repository already exist, Git does nothing. We change to the directory and look at the current state with git status:
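The behavior described above can be checked in a scratch directory. This is a small sketch under the assumption that a temporary directory is acceptable for experiments; the marker file is purely hypothetical and only serves to show that an existing repository is left alone.

```shell
set -eu
tmp=$(mktemp -d)

# First call: creates the directory and the repository inside it.
git init -q "$tmp/example"
test -d "$tmp/example/.git" && echo "repository created"

# Second call on the same directory: existing data is not overwritten.
touch "$tmp/example/.git/marker"     # hypothetical marker file
git init -q "$tmp/example"
test -f "$tmp/example/.git/marker" && echo "existing repository untouched"
```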

$ cd example
$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

Git tells us that the first commit is still pending (Initial commit), but that it hasn’t found anything to commit yet (nothing to commit). Instead, it gives a hint as to what the next steps should be (most Git commands do that, by the way): “Create or copy files, and use git add to manage them with Git.”

1.2.2. Our First Commit

Now let’s give Git a first file to manage, which is a “Hello World!” program in Perl. Of course, you can write any program in the programming language of your choice instead.

We’ll first create the hello.pl file with the following content

print "Hello World!\n";

and execute the script once:

$ perl hello.pl
Hello World!

That means we’re ready to manage the file with Git. But let’s take a look at the output of git status first:

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

      hello.pl

nothing added to commit but untracked files present (use "git add" to track)

While the first commit is still pending, Git registers that there are already files in that directory, but the system is unaware of them — Git calls them untracked. This is, of course, our little Perl program. To manage it with Git, we use the command git add <file>:

$ git add hello.pl

The add generally stands for “add changes” — so you will need it whenever you have edited files, not just when you first add them!

Git doesn’t provide output for this command. Use git status to check if the call was successful:

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

      new file:   hello.pl

Git will apply the changes — our new file — at the next commit. However, this commit is not yet complete — we’ve only prepared it so far.

To be precise, we’ve added the file to the Index, an intermediate stage where you collect changes that will be included in the next commit. For further explanation of this concept, see Sec. 2.1.1, “Index”.

With git status, under Changes to be committed, you can always see which files are in the Index, i.e., will be included in the next commit.

Everything is ready for the first commit with the git commit command. We also pass the -m option on the command line with a commit message describing the commit:

$ git commit -m "First version"
[master (root-commit) 07cc103] First version
 1 file changed, 1 insertion(+)
 create mode 100644 hello.pl

Git will confirm that the process has been successfully completed and the file will be managed from now on. The somewhat cryptic output means Git has created the initial commit (root-commit) with the appropriate message. A line has been added to a file, and the file has been created with Unix permissions 0644.⁠^[8]

As you’ve no doubt noticed by now, git status is an indispensable command in your daily work — we’ll use it again here:

$ git status
On branch master
nothing to commit, working directory clean

Our sample repository is now “clean”, because there are no changes in the Working Tree or Index, nor are there any files that are not managed with Git (untracked files).
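The three states the file has passed through (untracked, in the index, committed) can be replayed in one go. The following sketch assumes a scratch repository and uses git status --porcelain, a compact, script-friendly variant of the status output that is not otherwise used in this chapter.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/example"
cd "$tmp/example"
# Identity for this scratch repository only (placeholder values).
git config user.name "John Doe"
git config user.email "john.doe@example.com"

printf 'print "Hello World!\\n";\n' > hello.pl

s_untracked=$(git status --porcelain)   # "?? hello.pl": untracked
git add hello.pl
s_staged=$(git status --porcelain)      # "A  hello.pl": in the index
git commit -q -m "First version"
s_clean=$(git status --porcelain)       # empty: nothing to report

printf '%s\n' "$s_untracked" "$s_staged" "[$s_clean]"
```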

1.2.3. Viewing Commits

To conclude this brief introduction, we’ll introduce you to two very useful commands that you’ll often use to examine the version history of projects.

First, git show allows you to examine a single commit; with no arguments, it shows the most recent one:

$ git show
commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100

    First version

diff --git a/hello.pl b/hello.pl
new file mode 100644
index 0000000..fa5a091
--- /dev/null
+++ b/hello.pl
@@ -0,0 +1 @@
+print "Hello World!\n";

You see all relevant information about the commit: the commit ID, the author, the date and time of the commit, the commit message, and a summary of the changes in Unified-Diff format.

By default, git show always prints the HEAD (a symbolic name for the most recent commit), but you could also specify, for example, the commit ID, which is the SHA-1 checksum of the commit, a unique prefix to it, or the branch (master in this case). Thus, the following commands are equivalent in this example:

$ git show
$ git show HEAD
$ git show master
$ git show 07cc103
$ git show 07cc103feb393a93616842921a7bec285178fd56
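That these names all refer to the same commit can be verified with git rev-parse, which translates a name into the full commit ID. This is a sketch in a scratch repository; the commit IDs there will differ from the ones printed in this book, so the script derives the branch name and the abbreviated ID itself.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/example"
cd "$tmp/example"
git config user.name "John Doe"              # placeholder identity
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

branch=$(git symbolic-ref --short HEAD)      # name of the current branch
full=$(git rev-parse HEAD)                   # full commit ID
from_branch=$(git rev-parse "$branch")       # resolved via the branch name
short=$(git rev-parse --short HEAD)          # abbreviated commit ID
from_prefix=$(git rev-parse "$short")        # resolved via the prefix

printf '%s\n' "$full" "$from_branch" "$from_prefix"
```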

If you want to view more than one commit, git log is recommended. More commits are needed to demonstrate the command in a meaningful way; otherwise, the output would be very similar to git show, since the sample repository currently contains only a single commit. So let’s add the following comment line to the “Hello World!” program:

# Hello World! in Perl

For the sake of the exercise, let’s take another look at the current status with git status:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

      modified:   hello.pl

no changes added to commit (use "git add" and/or "git commit -a")

After that, as already described in the output of the command, use git add to add the changes to the index. As mentioned earlier, git add is used both to add new files and to add changes to files already managed.

$ git add hello.pl

Then create a commit:

$ git commit -m "Comment line"
[master 8788e46] Comment line
 1 file changed, 1 insertion(+)

Now git log shows you the two commits:

$ git log
commit 8788e46167aec2f6be92c94c905df3b430f6ecd6
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Fri May 27 12:52:58 2011 +0200

    Comment line

commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100

    First version

1.3. Configuring Git

Like most text-based programs, Git offers a wealth of configuration options. So now’s the time to do some basic configuration. This includes color settings, which are turned on by default in newer versions and make the output of Git commands easier to read, and small aliases (abbreviations) for frequently needed commands.

You configure Git with the git config command. The configuration is saved in a format similar to an INI file. Without specifying further parameters, the configuration applies only to the current repository (.git/config). With the --global option, it is stored in the .gitconfig file in the user’s home directory, and is then valid for all repositories.⁠^[9]

Important settings that you should always configure are the user name and e-mail address:

$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"

Note that you must protect spaces in the setting value (using quotation marks or backslashes). Also, the value follows the name of the option directly; an equals sign is not necessary here either. The result of the command can be found in the file ~/.gitconfig:

$ less ~/.gitconfig
[user]
    name = John Doe
    email = john.doe@example.com

The settings are now “global”, meaning they apply to all repositories you edit under that user name. If you want to specify an e-mail address other than your globally defined one for a particular project, simply change the setting there (this time, of course, without adding --global):

$ git config user.email maintainer@project.example.com

When querying an option, Git will first use the setting in the current repository if it exists, otherwise the one from the global .gitconfig; if this does not exist either, it will fall back to the default value.⁠^[10] The default values of all options are documented in the git-config man page. You can get a list of all the settings you have set using git config -l.

You can also edit the .gitconfig file (or the repository .git/config) by hand. This is especially useful for deleting a setting — although git config also offers a --unset option, it is easier to delete the corresponding line in an editor.
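The lookup order and the effect of --unset can be tried out safely. The sketch below redirects HOME (and XDG_CONFIG_HOME) to a scratch directory so that --global writes to a throwaway ~/.gitconfig; this redirection is an assumption made for the demonstration, not something you need in normal use.

```shell
set -eu
tmp=$(mktemp -d)
mkdir "$tmp/home"
# Scratch "home" so that the real global configuration stays untouched.
export HOME="$tmp/home" XDG_CONFIG_HOME="$tmp/home/.config"

git init -q "$tmp/example"
cd "$tmp/example"

git config --global user.email "john.doe@example.com"
git config user.email "maintainer@project.example.com"
with_local=$(git config user.email)     # the repository setting wins

git config --unset user.email           # remove the repository setting
after_unset=$(git config user.email)    # the global setting is visible again

echo "$with_local"
echo "$after_unset"
```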

The commands git config -e or git config --global -e launch the editor configured for Git on the local or global configuration file.

Note, however, that when you set options with an appropriate command, Git automatically protects problematic characters in the option’s value so that no bad configuration files are created.

1.3.1. Git Aliases

Git offers you the possibility to abbreviate single commands and even whole command sequences via Aliases. The syntax is:

$ git config alias.<alias-name> <command>

To set st as an alias for status:

$ git config --global alias.st status
$ git st
On branch master
...

You can also include options in an alias, for example:

$ git config --global alias.gconfig 'config --global'

You will find more useful aliases later in the book; how to create more complex aliases is described in Sec. 8.3.8, “Extended Aliases”. But first, some useful abbreviations:

[alias]
    st = status
    ci = commit
    br = branch
    co = checkout
    df = diff
    he = help
    cl = clone

1.3.2. Adjusting Colours

Very helpful is the color.ui option, which controls whether Git colors the output of various commands. Thus, deleted files and lines appear red, new files and lines appear green, commit IDs appear yellow, etc. In newer Git versions (1.8.4 and later) this setting is already enabled by default, so you don’t need to do anything.

The color.ui option should be set to auto: if the output from Git goes to a terminal, colors are used. If the output is redirected to a file instead, or piped to another program, Git will not output color sequences, as this could interfere with automatic processing.

$ git config --global color.ui auto
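The difference between terminal output and redirected output can be made visible by searching the output for the escape character that introduces color sequences. In this sketch the value always, which forces colors regardless of where the output goes, is used only for comparison; the -c option sets the value for a single call.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/example"
cd "$tmp/example"
git config user.name "John Doe"              # placeholder identity
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

esc=$(printf '\033')    # the escape character that starts a color sequence

# auto: the output is captured, not sent to a terminal, so no colors.
auto_out=$(git -c color.ui=auto log --oneline)
# always: colors are forced even though the output is not a terminal.
always_out=$(git -c color.ui=always log --oneline)

case $auto_out in *"$esc"*) echo "auto: colored" ;; *) echo "auto: plain" ;; esac
case $always_out in *"$esc"*) echo "always: colored" ;; *) echo "always: plain" ;; esac
```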

1.3.3. Configuring Character Sets

Unless set otherwise, Git assumes UTF-8 as the character encoding for all text, especially author names and the commit message. If you want a different encoding, you should configure it explicitly:⁠^[11]

$ git config i18n.commitEncoding ISO-8859-1

Similarly, the setting i18n.logOutputEncoding determines the character set Git converts names and commit messages to before outputting them.

The encoding of the files managed by Git is not important here and is not affected by these settings — files are only bit streams that Git does not interpret.

If you have to handle files encoded according to ISO-8859-1 in a UTF-8 environment, you should adjust the setting of your pager (see below) accordingly. The following setting is recommended for authors:

$ git config core.pager 'env LESSCHARSET=iso8859 less'

1.3.4. Line End Settings

Since Git runs on Windows as well as on Unix-like systems, it has to solve the problem of different line-end conventions. (This only affects text files; binaries that Git recognizes as such are excluded from this treatment.)

The core.eol setting, which can take one of the values lf, crlf or native, is mainly relevant for this. The default setting native lets Git use the system default — Unix: Line Feed (lf) only, Windows: Carriage Return & Line Feed (crlf). The file is automatically converted to get line feeds only, but is checked out with CRLF if necessary.

Git can convert between the two types when you check out the file, but it’s important not to mix the two. For this, the core.safecrlf option provides a mechanism to warn the user (value warn) or even disallow the commit (value true).

A safe setting, which also works with older Git versions on Windows systems, is to set core.autocrlf to input: This will automatically replace CRLF with LF when reading files from the filesystem. Your editor must then be able to handle LF line endings accordingly.
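The effect of core.autocrlf input can be checked by counting carriage returns before and after git add. A minimal sketch, assuming a scratch repository and a made-up file name; git cat-file -p :notes.txt prints the version of the file stored in the index.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/example"
cd "$tmp/example"
git config core.autocrlf input

# A small text file with Windows line endings (CRLF).
printf 'line one\r\nline two\r\n' > notes.txt
git add notes.txt 2>/dev/null    # Git may print a warning about the conversion

# Count carriage returns in the working tree file and in the index.
cr_worktree=$(tr -cd '\r' < notes.txt | wc -c)
cr_index=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c)
echo "CR bytes in working tree: $cr_worktree, in index: $cr_index"
```

The file in the working tree keeps its CRLF line endings; only the copy that Git stores is converted.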

You can also specify these settings explicitly per file or subdirectory, so that the format is the same across all platforms (see Sec. 8.1, “Git Attributes — Treating Files Separately”).

1.3.5. Editor, Pager and Browser Settings

Git automatically starts an editor, pager, or browser for certain actions. Usually reasonable defaults are used, but if not, you can configure your preferred program with the following options:

core.editor
core.pager
web.browser

A word about the pager: By default, Git uses the less program, which is installed on most systems. The pager is started whenever a Git command produces output on a terminal. However, less is automatically configured via an environment variable to quit when the output fits completely on the terminal. So, if a command produces a lot of output, less automatically comes to the foreground; otherwise it remains invisible.

If core.pager is set to cat, Git will not use a pager. The same behavior can be achieved for a single command using the --no-pager option. In addition, you can use git config pager.diff false to ensure that the output of the diff command is never sent to the pager.

1.3.6. Configuration via Environment Variables

Some options can also be overridden by environment variables. In this way, options can be set in a shell script or alias for a single command only.

GIT_EDITOR: the editor that Git starts, for example, to create the commit message. Alternatively, Git uses the EDITOR variable.

GIT_PAGER: the pager to be used. The value cat switches the pager off.

GIT_AUTHOR_EMAIL, GIT_COMMITTER_EMAIL: uses the appropriate email address for the author or committer field when creating a commit.

GIT_AUTHOR_NAME, GIT_COMMITTER_NAME: analogous to the name.

GIT_DIR: Directory in which the Git repository is located; only makes sense if a repository is explicitly stored under a directory other than .git.

The latter variable is useful, for example, if you want to access the version history of another repository within a project without changing directory:

$ GIT_DIR=~/proj/example/.git git log

Alternatively, you can use the -c option before the subcommand to overwrite a setting for this call only. For example, you could tell Git to disable the core.trustctime option for the upcoming call:

$ git -c core.trustctime=false status
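The variables listed above can be combined in a short script. The following sketch assumes a scratch repository and placeholder names: it creates a commit whose author and committer come from the environment only, and then reads the history from another directory via GIT_DIR.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/example"
printf 'print "Hello World!\\n";\n' > "$tmp/example/hello.pl"

# Author and committer are taken from the environment for this call only.
(
  cd "$tmp/example"
  git add hello.pl
  GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com" \
  GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane.doe@example.com" \
    git commit -q -m "First version"
)

# Read the history from a different directory by pointing GIT_DIR at it.
cd "$tmp"
author=$(GIT_DIR="$tmp/example/.git" git log -1 --format='%an <%ae>')
echo "$author"
```

Note that the path is given via a variable here; a tilde inside double quotes would not be expanded by the shell.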

1.3.7. Automatic Error Correction

The value of the help.autocorrect option determines what Git should do if it can’t find the subcommand you entered, for example if you accidentally type git statsu instead of git status.

If the option is set to a number n greater than zero and Git finds only one subcommand similar to the typed command, this command is executed after n tenths of a second. A value of -1 executes the command immediately. If the option is unset or has the value 0, Git only lists the possibilities.

So to correct a typo after one second, set:

$ git config --global help.autocorrect 10
$ git statsu
WARNING: You called a Git command named 'statsu', which does not exist.
Continuing under the assumption that you meant 'status'
in 1.0 seconds automatically...
[...]

You can of course cancel the command during this time with Ctrl+C.

2. The Basics

In this chapter, we’ll introduce you to the most important Git commands that you can use to manage your project files in Git. Understanding the Git object model is essential for advanced usage; we’ll cover this important concept in the second section of the chapter. While these explanations may seem overly theoretical at first, we encourage you to read them carefully. All further actions will be much easier for you with the knowledge of this background.

2.1. Git Commands

The commands you learned to get started (especially add and commit) work on the index. In the following, we will take a closer look at the index and the extended use of these commands.

2.1.1. Index

The content of files for Git resides on three levels: the working tree, the index, and the Git repository. The working tree corresponds to the files as they reside on your workstation’s file system — so if you edit files with an editor, search in them with grep, etc., you always operate on the working tree.

The repository is the store for commits, that is, changes together with author, date, and description. The commits together make up the version history.

Unlike many other version control systems, Git introduces an additional level, the index. It’s a somewhat elusive intermediate level between the working tree and the repository. Its purpose is to prepare commits. This means that you don’t have to check in all the changes you have made to a file in a single commit.

The Git commands add and reset act (in their basic form) on the index, making changes to the index and deleting them again; only the commit command transfers the file to the repository as it is held in the index (Figure 1, “Commands add, reset and commit”).

Figure 1. Commands add, reset and commit

In the initial state, i.e. when git status outputs the message nothing to commit, the working tree and index are synchronized with HEAD. The index is therefore not “empty”, but contains the files in the same state as they are in the working tree.

Usually, the workflow is then as follows: First, you make a change to the working tree using an editor. This change is transferred to the index by add and finally saved in the repository by commit.

You can display the differences between these three levels using the diff command. A simple git diff shows the differences between the working tree and the index — the differences between the (actual) files on your working system and the files as they would be checked in if you called git commit.

The git diff --staged command, on the other hand, shows the differences between the index (also called the staging area) and the repository, that is, the differences that a commit would commit to the repository. In the initial state, when the working tree and index are in sync with HEAD, neither git diff nor git diff --staged produces output.
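The interplay of the three levels can be observed directly. The sketch below is a minimal example under assumed names (a scratch repository and a file notes.txt, both invented here); it records which files git diff and git diff --staged report before and after git add.

```shell
# Assumption: scratch repository, identity, and file name are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Change the file in the working tree only.
echo "second line" >> notes.txt
unstaged_before=$(git diff --name-only)           # working tree vs. index
staged_before=$(git diff --staged --name-only)    # index vs. HEAD

# Transfer the change to the index.
git add notes.txt
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --staged --name-only)

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```

Before git add the change shows up only in git diff; afterwards it shows up only in git diff --staged.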

If you want to apply all changes to all tracked files, there are two shortcuts. First, the -u or --update option of git add transfers all changes to files already known to Git to the index, but does not yet create a commit. Second, you can abbreviate further with the -a or --all option of git commit. This combines git add -u and git commit: it puts all changes to all tracked files into one commit, bypassing the index. Avoid getting into the habit of using these options; they may be handy as shortcuts on occasion, but they reduce flexibility.
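One detail of these shortcuts is easy to check for yourself: git commit -a picks up modified files that Git already tracks, but leaves new, untracked files alone. A minimal sketch, with invented file names in a scratch repository:

```shell
# Assumption: scratch repository, identity, and file names are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "v2" > tracked.txt      # modify a file Git already tracks
echo "new" > untracked.txt   # create a file Git does not know yet

git commit -q -a -m "Update tracked file"

# Only the untracked file is left over after the commit.
status=$(git status --porcelain)
echo "$status"
```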

2.1.1.1. Word-Based Diff

An alternative output format for git diff is the so-called Word-Diff, which is available via the --word-diff option. Instead of the removed and added lines, the output of git diff shows the added (green) and removed (red) words with an appropriate syntax and color-coded.⁠^[12] This is useful when you are only changing single words in a file, for example when correcting AsciiDoc or LaTeX documents, because a diff is difficult to read if added and removed lines differ by only one word:

$ git diff
...
-   die Option `--color-words` zur Verfgung steht. Statt der entfernten
+   die Option `--color-words` zur Verfügung steht. Statt der entfernten
...

However, if you use the --word-diff option, only the words that have been changed are displayed, marked accordingly. In addition, line breaks are ignored, which is also very practical because merely re-wrapping the words does not show up as a change in the diff output:

$ git diff --word-diff
...
--color-words zur [-Verfgung-]{+Verfügung+} steht.
...

If you work a lot with continuous text, it is a good idea to set up an alias to abbreviate this command, so that you only have to type git dw, for example:

$ git config --global alias.dw "diff --word-diff"
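To see the word-based markers without colors, you can capture the output in a variable. The following sketch uses an invented one-line file with a deliberate typo; the --no-color option only makes the captured text independent of local color settings.

```shell
# Assumption: scratch repository, identity, and file content are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "Hello wrold this is a test" > text.txt
git add text.txt
git commit -q -m "Add text"

# Correct a single word in the working tree.
echo "Hello world this is a test" > text.txt

word_diff=$(git diff --no-color --word-diff)
echo "$word_diff"
```

The changed line appears once, with the removed word in [-…-] and the added word in {+…+}.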

2.1.2. Creating Commits Step by Step

But why create commits step-by-step — don’t you always want to check in all changes?

Yes, of course, you usually want to commit your changes completely. However, it can be useful to check them in step by step, for example, to better reflect the development history.

An example: You have worked intensively on your software project for the past three hours, but because it was so exciting, you forgot to pack the four new features into handy commits. In addition, the features are scattered over various files.

Ideally, you want to be selective, that is, you don’t want to commit all changes from one file, but only certain lines (functions, definitions, tests, …), and from different files.

Git’s index provides the flexibility you need for this. You collect some changes in the index and pack them into a commit — but all other changes are still preserved in the files.

We’ll illustrate this using the “Hello World!” example from the previous chapter. As a reminder, here are the contents of the hello.pl file:

# Hello World! in Perl
print "Hello World!\n";

Now we prepare the file so that it has several independent changes that we don’t want to combine into a single commit. First, we add a shebang line at the beginning.⁠^[13] We also add a line naming the author, and the Perl statement use strict, which tells the Perl interpreter to be as strict as possible in its syntax analysis. It is important for our example that the file has been changed in several places:

#!/usr/bin/perl
# Hello World! in Perl
# Author: Valentin Haenel
use strict;
print "Hello World!\n";

With a simple git add hello.pl all new lines would be added to the index — so the state of the file in the index would be the same as in the working tree. Instead, we use the --patch option or short -p.⁠^[14] This has the effect that we are interactively asked which changes we want to add to the index. Git offers us each change one by one, and we can decide on a case-by-case basis how we want to handle them:

$ git add -p
diff --git a/hello.pl b/hello.pl
index c6f28d5..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,5 @@
+#!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
Stage this hunk [y,n,q,a,d,/,s,e,?]?

This is where Git shows all changes, since they’re very close together in the code. If the changes are far apart or spread across different files, they’re offered separately. The term hunk refers to loosely connected lines in the source code. Some of the options we have at this point include the following:

Stage this hunk [y,n,q,a,d,/,s,e,?]?

The options are each only one letter long and difficult to remember. A small reminder is always given by [?]. We have summarized the most important options below.

`y` (yes)	Transfer the current hunk to the index.
`n` (no)	Don’t pick up the current hunk.
`q` (quit)	Do not pick up the current hunk or any of the following ones.
`a` (all)	Pick up the current hunk and all those that follow (in the current file).
`s` (split)	Try to split the current hunk.
`e` (edit)	Edit the current hunk.⁠^[15]

In the example we split the current hunk and enter s for split.

Stage this hunk [y,n,q,a,d,/,s,e,?]? [s]
Split into 2 hunks.
@@ -1 +1,2 @@
+#!/usr/bin/perl
 # Hello World! in Perl

Git confirms that the hunk was successfully split, and now offers us a diff that contains only the shebang line.⁠^[16] We specify y for yes and q for quit on the next hunk. To check if everything worked, we use git diff with the --staged option, which shows the difference between index and HEAD (the latest commit):

$ git diff --staged
diff --git a/hello.pl b/hello.pl
index c6f28d5..d2cc6dc 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,3 @@
+#!/usr/bin/perl
 # Hello World! in Perl
 print "Hello World!\n";

To see which changes are not yet in the index, a simple call to git diff is enough to show us that — as expected — there are still two lines in the working tree:

$ git diff
diff --git a/hello.pl b/hello.pl
index d2cc6dc..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,3 +1,5 @@
 #!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";

At this point we could create a commit, but for demonstration purposes we want to start from scratch. So we use git reset HEAD to reset the index.

$ git reset HEAD
Unstaged changes after reset:
M   hello.pl

Git confirms and names the files that have changes in them; in this case, it’s just the one.

The git reset command is in a sense the counterpart of git add: Instead of transferring differences from the working tree to the index, reset transfers differences from the repository to the index. Carrying these changes over into the working tree as well is potentially destructive, as your changes may be lost. Therefore, this is only possible with the --hard option, which we discuss in Sec. 3.2.3, “Reset and the Index”.

If you frequently use git add -p, it is only a matter of time before you accidentally select a hunk you didn’t want. If the index was empty beforehand, this is not a problem, since you can simply reset it and start over. It only becomes a problem if you have already recorded many changes in the index and don’t want to lose them, i.e. when you want to remove one particular hunk from the index without touching the others.

Analogous to git add -p there is the command git reset -p, which removes single hunks from the index. To demonstrate this, let’s first apply all changes with git add hello.pl and then run git reset -p.

$ git reset -p
diff --git a/hello.pl b/hello.pl
index c6f28d5..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,5 @@
+#!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
Unstage this hunk [y,n,q,a,d,/,s,e,?]?

As in the example with git add -p, Git offers hunks one by one, but this time all the hunks in the index. Accordingly, the question is: Unstage this hunk [y,n,q,a,d,/,s,e,?]?, i.e. whether we want to remove the hunk from the index again. As before, by entering the question mark we get an extended description of the available options. At this point we press s once for split, n once for no and y once for yes. Now only the shebang line should be in the index:

$ git diff --staged
diff --git a/hello.pl b/hello.pl
index c6f28d5..d2cc6dc 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,3 @@
+#!/usr/bin/perl
 # Hello World! in Perl
 print "Hello World!\n";

In the interactive modes of git add and git reset, you must press the Enter key after entering an option. The following configuration setting will save you this extra keystroke.

$ git config --global interactive.singlekey true

A word of warning: git add -p may tempt you to check in versions of a file that do not run or are not syntactically correct (e.g. because you forgot an essential line). So don’t rely on your commit being correct just because make, which works on the working tree files, runs successfully. Even if a later commit fixes the problem, the broken commit remains a problem, among other things for automated debugging via bisect (see Sec. 4.8, “Finding Regressions — Git Bisect”).
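One way to guard against this is to test the staged state rather than the working tree. The sketch below is one possible approach, not a prescribed workflow: it exports the index into a separate directory with git checkout-index and runs a check there. The script build.sh is a made-up stand-in for a real build or test command.

```shell
# Assumption: scratch repository, identity, and build.sh are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'echo "ok"' > build.sh
git add build.sh
git commit -q -m "Add build script"

echo 'echo "staged version"' > build.sh
git add build.sh                          # this state would be committed
echo 'echo "unstaged edit"' > build.sh    # the working tree has moved on

# Export the files exactly as they are held in the index.
export_dir=$(mktemp -d)
git checkout-index -a --prefix="$export_dir/"

staged_output=$(sh "$export_dir/build.sh")
worktree_output=$(sh ./build.sh)
echo "index:        $staged_output"
echo "working tree: $worktree_output"
```

Running the check in the exported directory exercises what the next commit will contain, not what happens to be in the working tree.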

2.1.3. Creating Commits

You now know how to exchange changes between working tree, index, and repository. Let’s turn to the git commit command, which you use to “commit” changes to the repository.

A commit records the state of all the files in your project at a particular point in time, and also contains meta-information:⁠^[17]

Name and e-mail address of the author
Name and e-mail address of the committer
Creation date
Commit date

In fact, the name of the author does not have to be the name of the committer (who commits). Often, commits are integrated or edited by maintainers (for example, by rebase, which also adjusts the committer information, see Sec. 4.1, “Moving commits — Rebase”). The committer information is usually of secondary importance, though — most programs only show the author and the date the commit was made.

When you create a commit, Git uses the user.name and user.email settings configured in the previous section to identify the commit.
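The difference between author and committer becomes visible when a commit is recorded on behalf of someone else. In this minimal sketch both identities are invented: the configured user acts as committer, and the --author option of git commit supplies a different author.

```shell
# Assumption: scratch repository and both identities are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Carol Committer"
git config user.email "carol@example.com"

echo "content" > file.txt
git add file.txt
git commit -q --author="Alice Author <alice@example.com>" -m "Add file"

# %an/%ae are the author fields, %cn/%ce the committer fields.
author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
echo "author:    $author"
echo "committer: $committer"
```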

If you call git commit without any additional arguments, Git will combine all changes in the index into one commit, and open an editor to create a commit message. The message template always contains instructions commented out with hash marks (#), as well as information about which files are changed by the commit. If you call git commit -v, you will additionally get a diff of the changes you are about to check in, below the instructions. This is especially useful for keeping track of the changes, and for using the auto-complete feature of your editor.

Once you exit the editor, Git creates the commit. If you don’t specify a commit message or delete the entire contents of the file, Git will abort and not create a commit.

If you only want to write one line, you can use the --message option, or short -m, which allows you to specify the message directly on the command line, thus bypassing the editor:

$ git commit -m "This is the commit message"

2.1.3.1. Improving a Commit

If you entered git commit too hastily but want to improve the commit slightly, the --amend (“correct”) option helps. The option causes Git to “add” the changes in the index to the commit you just made.⁠^[18] You can also adjust the commit message. Note that the SHA-1 sum of the commit will change in any case.

The git commit --amend call only changes the current commit on a branch. Sec. 4.1.9, “Improving a Commit” describes how to improve past commits.

Calling git commit --amend automatically starts an editor, so you can edit the commit message as well. Often, however, you will only want to make a small correction to a file without adjusting the message. We, the authors, have found an alias fixup useful in this situation:

$ git config --global alias.fixup "commit --amend --no-edit"
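That amending replaces the commit rather than adding a new one can be checked by comparing commit IDs. A minimal sketch in a scratch repository, with an invented file name:

```shell
# Assumption: scratch repository, identity, and file name are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"
before=$(git rev-parse HEAD)

# Stage a small correction and fold it into the commit just made.
echo "two" >> file.txt
git add file.txt
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)
echo "before: $before"
echo "after:  $after"
echo "commits on the branch: $count"
```

The branch still holds a single commit, but its ID has changed.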

2.1.3.2. Good Commit Messages

What should a commit message look like? Not much can be changed in the outer form: The commit message must be at least one line long, but preferably no longer than 50 characters. This makes lists of commits easier to read. If you want to add a more detailed description (which is highly recommended!), separate it from the first line with a blank line. No line should be longer than 76 characters, as is usual for email.

Commit messages often follow the habits or specifics of a project. There may be conventions, such as references to the bug tracking or issue system, or a link to the appropriate API documentation.

Note the following points when writing a commit description:

Never create empty commit messages. Commit messages such as Update, Fix, Improvement, etc. are just as uninformative as an empty message; you might as well leave them out.

Very important: Describe why something was changed and what the implications are. What has been changed is always obvious from the diff!

Be critical and note if you think there is room for improvement or the commit may introduce bugs elsewhere.

The first line should not be longer than 50 characters, so the output of the version history always remains well formatted and readable.

If the message becomes longer, a short summary (with the important keywords) should be in the first line. After a blank line follows an extensive description.
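The layout described in these points can be sketched as follows. The message text is invented for illustration; it is passed on standard input with git commit -F - so that the editor is not needed.

```shell
# Assumption: scratch repository, identity, file, and message text are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "data" > parser.txt
git add parser.txt

# Short summary line, blank line, then the reasoning behind the change.
git commit -q -F - <<'EOF'
parser: reject empty input early

Empty input used to reach the tokenizer, which produced a confusing
error much later. Checking up front gives a clear message and avoids
the extra work.
EOF

subject=$(git log -1 --format=%s)
body_lines=$(git log -1 --format=%b | grep -c .)
echo "${#subject} characters in the first line: $subject"
echo "$body_lines lines of description"
```

Git treats the first line as the subject (%s) and everything after the blank line as the body (%b).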

We can’t stress enough how important a good commit description is. When committing, a developer remembers the changes well, but after a few days, the motivation behind them is often forgotten. Your colleagues or project members will thank you, too, because they can grasp changes much faster.

Writing a good commit message also helps to briefly reflect on what has been done and what is still to come. You may find that you’ve forgotten one important detail as you write it.

You can also argue in terms of time: Writing a good commit message takes a minute or two. But how much less time will the bug-finding process take if each commit is well documented? How much time will you save others (and yourself) if you provide a good description of a diff that may be hard to understand? Also, the blame tool, which annotates each line of a file with the commit that last changed it, becomes an indispensable aid when commit descriptions are detailed (see Sec. 4.3, “Who Made These Changes? — Git Blame”).

If you are not used to writing detailed commit messages, start today. Practice makes perfect, and once you get used to it, the work will go quickly — you and others will benefit.

The repository of the Git project itself is a prime example of good commit messages. Without knowing the details of Git, you’ll quickly know who changed what and why. You can also see how many hands a commit goes through before it’s integrated.

Unfortunately, the commit messages in most projects are still very spartan, so don’t be disappointed if your peers are lazy about writing, but rather set a good example and provide detailed descriptions.

2.1.4. Moving and Deleting Files

If you want to delete or move files managed by Git, use git rm or git mv. They act like the regular Unix commands, but they also modify the index so that the action is included in the next commit.⁠^[19]

Like the standard Unix commands, git rm also accepts the -r and -f options to recursively delete or force deletion. git mv also offers an option -f (force) if the new filename already exists and should be overwritten. Both commands accept the option -n or --dry-run, which simulates the process and does not modify files.

To delete a file from the index only, use git rm --cached. It then remains in the working tree.

You will often forget to move a file via git mv or delete it via git rm, and use the standard Unix commands instead. In this case, simply mark the file (already deleted by rm) as deleted in the index, too, using git rm <file>.

To rename the file, proceed as follows: First mark the old file name as deleted using git rm <old-name>. Then add the new file: git add <new-name>. Then check via git status whether the file is marked as “renamed”.

Internally, it doesn’t matter to Git whether you use git mv, or move a file regularly via mv and then run git add <new-name> and git rm <old-name>. In either case, only the reference to a blob object is changed (see Sec. 2.2, “The Object Model”).

However, Git comes with a so-called Rename Detection: If a blob is the same and is only referenced by a different file name, Git interprets this as a rename. If you want to examine the history of a file and follow it if it is renamed, use the following command:

$ git log --follow -- <file>
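The manual rename procedure and the rename detection can be tried out as follows. This is a minimal sketch with invented file names (old.txt, new.txt) in a scratch repository.

```shell
# Assumption: scratch repository, identity, and file names are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "some content that stays the same" > old.txt
git add old.txt
git commit -q -m "Add old.txt"

mv old.txt new.txt       # plain Unix move, Git is not involved yet
git rm -q old.txt        # record the deletion in the index
git add new.txt          # record the file under its new name

status=$(git status --porcelain)
echo "$status"
git commit -q -m "Rename old.txt to new.txt"

# Follow the file's history across the rename.
history=$(git log --follow --format=%s -- new.txt)
echo "$history"
```

Because the blob is unchanged, git status reports the pair as a rename, and git log --follow lists both commits.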

2.1.5. Using Grep on a Repository

If you want to search for an expression in all files of your project, you can usually use grep -R <expression> . (the trailing dot stands for the current directory).

However, Git offers its own grep command, which you can call using git grep <expression>. By default, this command searches for the expression in all files managed by Git. If you want to examine only some of the files instead, you can specify a path pattern after --. With the following command you can find all occurrences of border-color in all CSS files:

$ git grep border-color -- '*.css'
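The effect of the path pattern is easiest to see with two tracked files that both contain the expression. The sketch below uses invented files style.css and notes.txt; the -l option lists only the names of matching files.

```shell
# Assumption: scratch repository and file names are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '%s\n' 'a { border-color: red; }' > style.css
printf '%s\n' 'TODO: change border-color' > notes.txt
git add style.css notes.txt     # git grep searches tracked files

all=$(git grep --no-color -l border-color)
css_only=$(git grep --no-color -l border-color -- '*.css')
echo "all files: $all"
echo "CSS only:  $css_only"
```

The pattern is quoted so that Git, not the shell, expands it.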

The grep implementation of Git supports all common flags that are also present in GNU Grep. However, calling git grep is usually an order of magnitude faster, since Git has significant performance advantages due to the object database and the multithreaded design of the command.

The popular grep alternative ack is characterized mainly by the fact that it combines the lines of a file matching the search pattern under a corresponding “heading”, and uses striking colors. You can emulate the output of ack with git grep by using the following alias:

$ git config alias.ack '!git -c color.grep.filename="green bold" \
  -c color.grep.match="black yellow" -c color.grep.linenumber="yellow bold" \
  grep -n --break --heading --color=always --untracked'

2.1.6. Examining the Project History

Use git log to examine the project’s version history. The options of this command (most of which also work for git show) are very extensive, and we will introduce the most important ones below.

Without any arguments, git log will output the author, date, commit ID, and the full commit message for each commit. This is handy when you need a quick overview of who did what and when. However, the list is a bit cumbersome when you’re looking at a lot of commits.

If you only want to look at recently created commits, limit git log’s output to n commits with the -<n> option. For example, the last four commits are shown with:

$ git log -4

To display a single commit, enter:

$ git log -1 <commit>

The <commit> argument is a legal name for a single commit, such as the commit ID or SHA-1 sum. However, if you do not specify anything, Git automatically uses HEAD. Apart from single commits, the command also understands so-called commit ranges (series of commits), see Sec. 2.1.7, “Commit-Ranges”.

The -p (--patch) option appends the full patch in Unified-Diff format below the description. Thus, the output of git show <commit> is equivalent to that of git log -1 -p <commit>.

If you want to display the commits in compressed form, we recommend the --oneline option: It summarizes each commit with its abbreviated SHA-1 sum and the first line of the commit message. It is therefore important that you include as much useful information as possible in this line! For example, this would look like this:⁠^[20]

$ git log --oneline
25f3af3 Correctly report corrupted objects
786dabe tests: compress the setup tests
91c031d tests: cosmetic improvements to the repo-setup test
b312b41 exec_cmd: remove unused extern

The --oneline option is shorthand for --pretty=oneline --abbrev-commit. There are other ways to customize the output of git log. The possible values for the --pretty option are:

`oneline`	Commit ID and first line of the description.
`short`	Commit ID, first line of the description and author of the commit; output is four lines.
`medium`	Default; output of commit ID, author, date and complete description.
`full`	Commit ID, author’s name, name of the committer and full description — no date.
`fuller`	Like `medium`, but additionally date and name of the committer.
`email`	Formats the information from `medium` so that it looks like an e-mail.
`format:⁠<string>`	Any format can be adapted by placeholders; for details see the man page `git-log(1)`, section “Pretty Formats”.
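As an illustration of the format: variant, the following sketch prints the author name and the first line of the message, separated by a vertical bar. The placeholders %an and %s are documented in git-log(1); the repository content is invented.

```shell
# Assumption: scratch repository, identity, and file name are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "content" > file.txt
git add file.txt
git commit -q -m "Add file"

# %an = author name, %s = first line of the commit message.
line=$(git log -1 --pretty=format:'%an | %s')
echo "$line"
```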

Independently of this, you can display more information about the changes made by the commit below the commit message. Consider the following examples, which clearly show which files were changed in how many places:

$ git log -1 --oneline 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir

$ git log -1 --oneline --name-status 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
M       setup.c
M       t/t1510-repo-setup.sh

$ git log -1 --oneline --stat 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
 setup.c               |   19
 t/t1510-repo-setup.sh |  210 +++++++++++++++++------------------
 2 files changed, 134 insertions(+), 95 deletions(-)

$ git log -1 --oneline --shortstat 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
 2 files changed, 134 insertions(+), 95 deletions(-)

2.1.6.1. Time Constraints

You can restrict the time of the commits to be displayed using the --after or --since and --until or --before options. Within each pair the two options are synonymous: --after gives the same results as --since, and --until the same as --before.

You can specify absolute dates in any common format, or relative dates. Here are some examples:

$ git log --after='Tue Feb 1st, 2011'
$ git log --since='2011-01-01'
$ git log --since='two weeks ago' --before='one week ago'
$ git log --since='yesterday'
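To experiment with these options you need commits with known dates. The sketch below creates two empty commits and sets their timestamps through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables; the dates and messages are invented.

```shell
# Assumption: scratch repository, identity, dates, and messages are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

GIT_AUTHOR_DATE='2011-01-15T12:00:00' GIT_COMMITTER_DATE='2011-01-15T12:00:00' \
    git commit -q --allow-empty -m "January commit"
GIT_AUTHOR_DATE='2011-03-15T12:00:00' GIT_COMMITTER_DATE='2011-03-15T12:00:00' \
    git commit -q --allow-empty -m "March commit"

since_feb=$(git log --since='2011-02-01' --format=%s)
until_feb=$(git log --until='2011-02-01' --format=%s)
echo "since February: $since_feb"
echo "until February: $until_feb"
```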

2.1.6.2. File-Level Restrictions

If you specify one or more file or directory names after a git log call, Git will only display the commits that affect at least one of the specified files. Provided a project is well structured, the output of commits can be severely limited and a particular change can be found quickly.

Since filenames may collide with branches or tags, you should be sure to specify the filenames after a -- which means that only file arguments follow.

$ git log -- main.c
$ git log -- '*.h'
$ git log -- Documentation/

These calls only output the commits in which changes were made to the main.c file, an .h file, or a file under Documentation/.
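A minimal sketch of path limiting, using an invented layout with one source file and a Documentation/ directory:

```shell
# Assumption: scratch repository, identity, and file layout are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Add main.c"

mkdir Documentation
echo "Manual" > Documentation/manual.txt
git add Documentation
git commit -q -m "Add manual"

# Everything after -- is treated as a path, never as a branch or tag.
docs=$(git log --format=%s -- Documentation/)
code=$(git log --format=%s -- main.c)
echo "Documentation/: $docs"
echo "main.c:         $code"
```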

2.1.6.3. Grep for Commits

You can also search for commits in the style of grep, where the --author, --committer, and --grep options are available.

The first two options filter commits by author or committer name or address, as expected. For example, list all commits that Linus Torvalds has made since early 2010:

$ git log --since='2010-01-01' --author='Linus Torvalds'

You can also enter only part of the name or e-mail address here, so searching for 'Linus' would produce the same result.

For example, you can use --grep to search for keywords or phrases in the commit message, such as all commits that contain the word “fix” (not case-sensitive):

$ git log -i --grep=fix

The -i (or --regexp-ignore-case) option causes git log to ignore the pattern case (also works with --author and --committer).

All three options treat the values as regular expressions, just like grep (see the regex(7) man page). The -E and -F options change the behaviour of the options in the same way as egrep and fgrep: to use extended regular expressions or to search for the literal search term (whose special characters lose their meaning).
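The following sketch combines the filters on two empty commits. Names and messages are invented; the second commit gets a different author through the --author option of git commit.

```shell
# Assumption: scratch repository, identities, and messages are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Fix crash on empty input"
git commit -q --allow-empty --author="Alice Author <alice@example.com>" -m "Add feature"

# Case-insensitive search in the commit messages.
fixes=$(git log -i --grep=fix --format=%s)
# Part of the author name is enough.
by_alice=$(git log --author=Alice --format=%s)
echo "messages matching 'fix': $fixes"
echo "commits by Alice:        $by_alice"
```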

To search for changes, use the so-called Pickaxe tool. This will help you find commits whose diffs contain a certain regular expression (“grep for diffs”):

$ git log -p -G<regex>

The <regex> must be specified directly, i.e. without spaces, after the -G pickaxe option. The --pickaxe-all option causes all changes of a matching commit to be listed, not just those containing the change you are looking for.

Note that in earlier versions of Git, this operation was performed by the -S option. It differs from -G in that it only finds commits that change the number of times the pattern occurs. In particular, code moves, i.e. removals and additions elsewhere in the same file, are not found.
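The difference between the two options shows up when a line is merely moved. In the sketch below, the identifier compute_total and the file code.txt are invented; the second commit moves the line from the top of the file to the bottom without changing how often it occurs.

```shell
# Assumption: scratch repository, identity, file, and identifier are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '%s\n' 'compute_total' 'alpha' 'beta' 'gamma' > code.txt
git add code.txt
git commit -q -m "Add code"

# Move the line to the end of the file.
printf '%s\n' 'alpha' 'beta' 'gamma' 'compute_total' > code.txt
git commit -q -a -m "Move compute_total to the end"

with_G=$(git log -Gcompute_total --format=%s)   # diff lines matching the regex
with_S=$(git log -Scompute_total --format=%s)   # changes in the number of occurrences
echo "-G finds:"; echo "$with_G"
echo "-S finds:"; echo "$with_S"
```

With -G both commits are listed, because the moved line appears as a removal and an addition in the diff; with -S only the commit that introduced the line is found.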

Equipped with these tools, you can now tame masses of commits yourself. Just specify as many criteria as you need to reduce the number of commits.

2.1.7. Commit-Ranges

So far, we’ve only looked at commands that require only a single commit as an argument, explicitly identified by its commit ID, or implicitly by the symbolic name HEAD, which references the most recent commit.

The git show command displays information about a commit, while the git log command starts at a commit, and then goes back in the version history until the beginning of the repository (called the root commit) is reached.

An important tool for specifying a series of commits is the so-called commit range in the form <commit1>..<commit2>. Since we have not yet worked with multiple branches, this is simply a range of commits in a repository, from <commit1> exclusive to <commit2> inclusive. If you omit one of the two boundaries, Git uses HEAD in its place.
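A short sketch of a commit range on a linear history of three invented commits. HEAD~2 names the commit two steps before HEAD, so the range excludes the first commit and includes the other two.

```shell
# Assumption: scratch repository, identity, and messages are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git commit -q --allow-empty -m "third"

range=$(git log --format=%s HEAD~2..HEAD)
open_end=$(git log --format=%s HEAD~2..)    # omitted boundary defaults to HEAD
echo "$range"
```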

2.1.8. Differences between Commits

So far, git show and git log -p have only shown the difference from the previous commit. If you want to see the differences between several commits, use the command git diff.

The diff command performs several tasks. As already seen, you can examine the differences between the working tree and the index without specifying any commits, or the differences between index and HEAD with the --staged option.

However, if you pass two commits or a commit range to the command, the difference between these commits is displayed instead.
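As a minimal sketch, the following compares the state two commits back with the current one. The files are invented, and --name-only restricts the output to the names of the files that differ.

```shell
# Assumption: scratch repository, identity, and file names are made up for this sketch.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > a.txt
git add a.txt
git commit -q -m "Add a"

echo "two" > b.txt
git add b.txt
git commit -q -m "Add b"

echo "three" >> a.txt
git commit -q -a -m "Extend a"

# Two commits as separate arguments, or the same pair written as a range.
changed=$(git diff --name-only HEAD~2 HEAD)
range_form=$(git diff --name-only HEAD~2..HEAD)
echo "$changed"
```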

2.2. The Object Model

Git is based on a simple but extremely powerful object model. It is used to map the typical elements of a repository (files, directories, commits) and the development over time. Understanding this model is very important, and it helps to abstract from typical Git steps to better understand them.

In the following, we will again use a “Hello World!” program as an example, this time in the Python programming language.⁠^[21]

Figure 2. “Hello World!” Program in Python

The project consists of the file hello.py as well as a README file and a directory test. If you run the program with the command python hello.py, you will get the output: Hello World!. In the directory test is a simple shell script, test.sh, which displays an error message if the Python program does not output the string Hello World! as expected.

The repository for this project consists of the following four commits:

$ git log --oneline
e2c67eb Kommentar fehlte
8e2f5f9 Test Datei
308aea1 README Datei
b0400b0 Erste Version

2.2.1. SHA-1 — The Secure Hash Algorithm

SHA-1 is a secure hash algorithm that calculates a checksum of digital information: the SHA-1 sum. The algorithm was introduced in 1995 by the American National Institute of Standards and Technology (NIST) and the National Security Agency (NSA). SHA-1 was developed for cryptographic purposes and is used for checking the integrity of messages and as a basis for digital signatures. Figure 3, “SHA-1 Algorithm” shows how it works, where we calculate the checksum of hello.py.

The algorithm is a mathematical one-way function that maps a bit sequence of maximum length 2⁶⁴-1 bits (about 2 exbibytes) to a checksum of length 160 bits (20 bytes). The checksum is usually represented as a hexadecimal character string of length 40. For this checksum length the algorithm yields 2¹⁶⁰ (approx. 1.5 · 10⁴⁸) different values, and therefore it is very, very unlikely that two bit sequences have the same checksum. This property is called collision resistance.
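The sizes involved can be estimated by hand, using 2¹⁰ = 1024 = 1.024 · 10³ for the checksum space and 8 bits per byte for the maximum input length:

```latex
\begin{align*}
2^{160} &= \left(2^{10}\right)^{16} = 1.024^{16} \cdot 10^{48} \approx 1.46 \cdot 10^{48}, \\
2^{64}\ \text{bits} &= 2^{61}\ \text{bytes} = 2 \cdot 2^{60}\ \text{bytes} = 2\ \text{EiB}.
\end{align*}
```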

Figure 3. SHA-1 Algorithm

Despite all efforts of cryptologists, several years ago various theoretical attacks on SHA-1 became known, which are supposed to make the generation of collisions possible with a considerable computing effort.⁠^[22] For this reason, NIST today recommends the use of the successors of SHA-1: SHA-256, SHA-384 and SHA-512, which have longer checksums and thus make the generation of collisions more difficult. On the Git mailing list there was a debate about switching to one of these alternatives, but this step was not considered necessary.⁠^[23]

This is because, although there is a theoretical attack vector on the SHA-1 algorithm, this does not compromise the security of Git. In fact, the integrity of a repository is not primarily protected by the collision resistance of an algorithm, but by the fact that many developers have identical copies of the repository.

The SHA-1 algorithm plays a central role in Git because it is used to build checksums of the data stored in the Git repository, the Git objects. This makes them easy to reference as SHA-1 sums of their contents. In your daily work with Git, you will usually only use SHA-1 sums of commits, known as commit IDs. This reference can be passed to many Git commands, such as git show and git diff. Depending on the repository, you often only need to specify the first few characters of an SHA-1 sum, since in practice a prefix is sufficient to uniquely identify a commit.
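
Why a prefix is usually enough can be illustrated with a small lookup function. This is only a sketch of the idea, not Git's actual implementation; the function name resolve_prefix is made up for illustration, and the IDs are taken from the listings in this chapter.

```python
def resolve_prefix(object_ids, prefix):
    """Return the one ID that starts with prefix; fail if none or several match."""
    matches = [oid for oid in object_ids if oid.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]

ids = [
    "e2c67ebb6d2db2aab831f477306baa44036af635",
    "8e2f5f996373b900bd4e54c3aefc08ae44d0aac2",
    "a26b00aaef1492c697fd2f5a0593663ce07006bf",
]

print(resolve_prefix(ids, "e2c67eb"))  # e2c67ebb6d2db2aab831f477306baa44036af635
```

The more objects a repository contains, the longer the prefix has to be before it matches exactly one of them.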

2.2.2. The Git Objects

All data stored in a Git repository is available as Git objects. There are four types:⁠^[24]

Table 1. Git Objects
Object	Saves…	References other objects	Correspondence
Blob	File content	No	File
Tree	Blobs and Trees	Yes	Directory
Commit	Project state	Yes, a tree and further commits	Snapshot/archive at a point in time
Tag	Tag information	Yes, an object	Naming important snapshots or blobs

Figure 4, “Git Objects” shows three objects from the example project — a blob, a tree, and a commit.⁠^[25] The representation of each object includes the object type, the size in bytes, the SHA-1 sum, and the contents. The blob contains the content of the file hello.py (but not the file name). The tree contains references to one blob for each file in the project, i.e. one for hello.py and one for README, plus one tree per subdirectory, i.e. in this case only one for test. The files in the subdirectories are referenced separately in the respective trees that map these subdirectories.

Figure 4. Git Objects

So the commit object contains exactly one reference to a tree, and that reference is to the tree of the project content — this is a snapshot of the state of the project. The commit object also contains a reference to its direct ancestors, along with the metadata “author” and “committer” and the commit message.

Many Git commands expect a tree as an argument. Because a commit, for example, references exactly one tree, you can pass the commit instead; such an argument is called tree-ish. The term covers any object that can ultimately be resolved to a tree. This category also includes tags (see Sec. 3.1.3, “Tags — Marking Important Versions”). Similarly, commit-ish is an argument that can be resolved to a commit.
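
The resolution of a tree-ish or commit-ish argument can be pictured as following references until an object of the wanted type is reached. The sketch below models this on a toy dictionary; the function name peel and the key tag-0.1 are assumptions made for illustration, while the abbreviated IDs come from the example repository.

```python
# Toy object table; "tag-0.1" is a hypothetical ID for an annotated tag object.
objects = {
    "tag-0.1": {"type": "tag", "object": "e2c67eb"},
    "e2c67eb": {"type": "commit", "tree": "a26b00a"},
    "a26b00a": {"type": "tree"},
    "52ea6d6": {"type": "blob"},
}

def peel(object_id, wanted):
    """Follow tag and commit references until an object of type wanted is found."""
    while True:
        obj = objects[object_id]
        if obj["type"] == wanted:
            return object_id
        if obj["type"] == "tag":
            object_id = obj["object"]   # a tag points to another object
        elif obj["type"] == "commit" and wanted == "tree":
            object_id = obj["tree"]     # a commit points to exactly one tree
        else:
            raise ValueError(f"{object_id} cannot be resolved to a {wanted}")

print(peel("tag-0.1", "tree"))    # a26b00a
print(peel("tag-0.1", "commit"))  # e2c67eb
```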

File contents are always stored in blobs. Trees only contain references to blobs and other trees in the form of the SHA-1 sums of these objects. A commit in turn references a tree.

2.2.3. The Object Database

All Git objects are stored in the object database and are identifiable by their unique SHA-1 sum, i.e. you can find an object in the database by its SHA-1 sum once it has been stored. Thus, the object database basically functions like a large hash table, where the SHA-1 sums serve as keys for the stored contents:⁠^[26]

e2c67eb ⟶ commit
8e2f5f9 ⟶ commit
308aea1 ⟶ commit
b0400b0 ⟶ commit
a26b00a ⟶ tree
6cf9be8 ⟶ blob  (README)
52ea6d6 ⟶ blob  (hello.py)
c37fd6f ⟶ tree  (test)
e92bf15 ⟶ blob  (test/test.sh)
5b4b58b ⟶ tree
dcc027b ⟶ blob  (hello.py)
e4dc644 ⟶ tree
a347f5e ⟶ tree

You will first see the four commits that make up the Git repository, including the e2c67eb commit shown in Figure 4, “Git Objects”. This is followed by trees and blobs, each with file or directory correspondence. So-called top-level trees have no directory name: They refer to the top level of a project. A commit always references a top-level tree, so there are four of them.

The hierarchical relationship of the objects listed above is shown in Figure 5, “Hierarchical Relationship of Git Objects”. On the left-hand side, you can see the four commits that are already in the repository, and on the right-hand side, the referenced contents of the most recent commit (C4). As described above, each commit contains a reference to its direct predecessor (the resulting graph of commits is discussed below). This relationship is illustrated by the arrows pointing from one commit to the next.

Figure 5. Hierarchical Relationship of Git Objects

Each commit references the top-level tree — including the C4 commit in the example. The top-level tree in turn references the files hello.py and README in the form of blobs, and the subdirectory test in the form of another tree. Because of this hierarchical structure and the relationship of the individual objects to one another, Git is able to map the contents of a hierarchical file system as Git objects and store them in the object database.

2.2.4. Examining the Object Database

In a short digression we will go into how to examine the object database of Git. To do this, Git provides so-called plumbing commands, a group of low-level tools for Git, as opposed to the porcelain commands you usually work with. These commands are therefore not important for Git beginners, but are simply intended to give you a different approach to the concept of the object database. For more information, see Sec. 8.3, “Writing Your Own Git Commands”.

Let’s first look at the current commit. We use the git show command with the --format=raw option, which outputs the commit in raw format so that everything it contains is displayed.

$ git show --format=raw e2c67eb
commit e2c67ebb6d2db2aab831f477306baa44036af635
tree a26b00aaef1492c697fd2f5a0593663ce07006bf
parent 8e2f5f996373b900bd4e54c3aefc08ae44d0aac2
author Valentin Haenel <valentin.haenel@gmx.de> 1294515058 +0100
committer Valentin Haenel <valentin.haenel@gmx.de> 1294516312 +0100

    Kommentar fehlte
...

As you can see, all the information in Figure 4, “Git Objects” is output: the SHA-1 sums of the commit, tree, and direct ancestor, plus the author and committer (including the date as a Unix timestamp), and the commit message. The command also prints the diff against the previous commit. Strictly speaking, this diff is not part of the commit and is therefore omitted here.

Next, let’s take a look at the tree referenced by this commit, using git ls-tree, a plumbing command to list the contents stored in a tree. It’s similar to ls -l, except that it operates on the object database. With --abbrev=7 we shorten the output SHA-1 sums to seven characters.

$ git ls-tree --abbrev=7 a26b00a
100644 blob 6cf9be8  README
100644 blob 52ea6d6  hello.py
040000 tree c37fd6f  test

As in Figure 4, “Git Objects” the tree referenced by the commit contains one blob for each of the two files, and one tree (also: subtree) for the test directory. We can look at its contents again with ls-tree, since we now know the SHA-1 sum of the tree. As expected, you can see that the test tree references exactly one blob, the blob for the file test.sh.

$ git ls-tree --abbrev=7 c37fd6f
100755 blob e92bf15  test.sh

Finally, we make sure that the blob for hello.py really contains our “Hello World!” program and that the SHA-1 sum is correct. The command git show displays objects of any type. If we pass the SHA-1 sum of a blob, its contents are output. To check the SHA-1 sum we use the plumbing command git hash-object.

$ git show 52ea6d6
#! /usr/bin/env python

""" Hello World! """

print 'Hello World!'
$ git hash-object hello.py
52ea6d6f53b2990f5d6167553f43c98dc8788e81

A note for curious readers: git hash-object hello.py does not produce the same output as the Unix command sha1sum hello.py. This is because not only the file content is stored in a blob. Instead, the object type, in this case blob, and the size, in this case 67 bytes, are stored in a header at the beginning of the blob. The hash-object command therefore does not calculate the checksum of the file content, but of the blob object.
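
The difference can be reproduced without Git. The following sketch assumes the usual layout of a blob header, namely the word blob, a space, the size in bytes as a decimal number, and a NUL byte, followed by the file content. The function name git_blob_id is made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Checksum of a blob object: header plus content, not the content alone."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

empty = b""
print(git_blob_id(empty))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(hashlib.sha1(empty).hexdigest())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```

The first value is what git hash-object reports for an empty file, the second is what sha1sum reports for the same file.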

2.2.5. Deduplication

The four commits that make up the sample repository are shown again in Figure 6, “Repository Content”, but in a different way: The dashed bordered tree and blob objects indicate unchanged objects, all others were added or changed in the corresponding commit. The reading direction here is from bottom to top: at the bottom is C1, which contains only the file hello.py.

Since trees only contain references to blobs and other trees, each commit stores the status of all files, but not their contents. Normally, only a few files change during a commit. New blob objects (and therefore new tree objects) are now created for the new files or those to which changes have been made. However, the references to the unchanged files remain the same.

Figure 6. Repository Content

What is more, a file that exists twice is stored only once in the object database. The contents of this file are stored as a blob in the object database and are referenced by a tree in two places. This effect is known as deduplication: Duplicates are not merely removed; they cannot arise in the first place. Deduplication is an essential feature of content-addressable file systems, i.e. file systems that know files only by their contents (as Git does by giving each object its own SHA-1 sum as its “name”).

Consequently, a repository in which the same 1 MB file exists 1000 times takes up only slightly more than 1 MB. Git essentially has to manage the blob, plus a commit and a tree with 1000 blob entries (20 bytes each plus the length of the filename). A checkout of this repository, on the other hand, consumes about 1 GB of space on the filesystem because the checkout undoes the deduplication.⁠^[27]
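
The effect follows directly from using the checksum as the key. The sketch below is a toy content-addressable store, not Git's object database: it hashes the bare content without a header, and the payload is scaled down from 1 MB to 1 KiB to keep the example fast.

```python
import hashlib

store = {}  # toy object database: SHA-1 sum -> content

def put(content: bytes) -> str:
    """Store content under its own checksum and return that checksum."""
    key = hashlib.sha1(content).hexdigest()
    store[key] = content  # identical content always lands on the same key
    return key

payload = b"x" * 1024  # stand-in for the large file (scaled down)
tree = {f"copy-{i:04d}.bin": put(payload) for i in range(1000)}

stored_bytes = sum(len(content) for content in store.values())
print(len(tree), len(store), stored_bytes)  # 1000 1 1024
```

The tree has 1000 entries, but all of them carry the same key, so the content is kept only once.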

The git checkout and git reset commands restore a previous state (see also Sec. 3.2, “Restoring Versions”): You specify the reference of the corresponding commit, and Git searches for it in the object database. The reference is then used to find the tree object of this commit from the object database. Finally, Git uses the references contained in the tree object to find all other tree and blob objects in the object database and replicates them as directories and files on the file system. This allows you to restore exactly the project state that was saved with the commit at the time.
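
The last step, walking from a top-level tree to all files, can be sketched as a short recursion. The tree contents below are copied from the git ls-tree output shown earlier; the data layout and the function name list_files are assumptions made for illustration.

```python
# Tree ID -> entries (object type, object ID, name), as listed by git ls-tree.
trees = {
    "a26b00a": [
        ("blob", "6cf9be8", "README"),
        ("blob", "52ea6d6", "hello.py"),
        ("tree", "c37fd6f", "test"),
    ],
    "c37fd6f": [("blob", "e92bf15", "test.sh")],
}

def list_files(tree_id, prefix=""):
    """Yield (path, blob ID) for every file below the given tree."""
    for kind, object_id, name in trees[tree_id]:
        path = prefix + name
        if kind == "tree":
            yield from list_files(object_id, path + "/")  # descend into subtree
        else:
            yield path, object_id

print(dict(list_files("a26b00a")))
# {'README': '6cf9be8', 'hello.py': '52ea6d6', 'test/test.sh': 'e92bf15'}
```

A real checkout would additionally write each blob's content to the resulting path and set the file mode.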

2.2.6. The Graph Structure

Because each commit stores its direct ancestors, a graph structure is created. More precisely, the arrangement of the commits creates a Directed Acyclic Graph (DAG). A graph consists of two core elements: the nodes and the edges connecting these nodes. In a directed graph, the edges also have a direction, which means that when you traverse the graph, you can only follow edges that point in the appropriate direction to move from one node to the next. The acyclic property rules out that any route through the graph leads back to the node you started from. So you cannot move in a circle.⁠^[28]

Most Git commands are used to manipulate the graph: to add/remove nodes or to change the relation of the nodes to each other. You’ll know you’ve reached an advanced level of Git competency when you’ve internalized this rather abstract concept, and when you’re working with branches on a daily basis, you always think of the graph behind them. Understanding Git at this level is the first and only real hurdle to mastering Git safely in everyday life.

The graph structure is derived from the object model, because each commit knows its direct ancestor (possibly several in the case of a merge commit). The commits form the nodes of this graph — the references to ancestors form the edges.

An example graph is shown in Figure 7, “A Commit Graph”. It consists of several commits, which are colored to make it easier to distinguish between their affiliations to different development branches. First, the commits A, B, C, and D were made. They form the main development branch. Commits E and F contain feature development, which was transferred to the main development branch with commit H. Commit G is a single commit that has not yet been integrated into the main development branch.

Figure 7. A Commit Graph

One result of the graph structure is the cryptographically secured integrity of a repository. Git uses the SHA-1 sum of a commit to reference not only the contents of the project files at a given point in time, but also all commits executed up to that point, and their relationship to each other, i.e. the complete version history.

The object model makes this possible: each commit stores a reference to its ancestors. These references are then used to calculate the SHA-1 sum of the commit itself. So you get a different commit if you reference another ancestor.

Since the predecessor in turn references predecessors, and its SHA-1 sum depends on the predecessors, and so on, this means that the complete version history is implicitly encoded in the commit ID. Implicit here means: If even one bit of a commit changes anywhere in the version history, then the SHA-1 sum of subsequent commits, especially the topmost one, is no longer the same. The SHA-1 sum doesn’t say anything detailed about the version history, though; it’s just a checksum of it.
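
This chaining can be demonstrated with a toy model. The sketch below is not Git's real commit format (which also contains a header, author, and committer); the functions commit_id and build_history and the placeholder tree names are assumptions made for illustration.

```python
import hashlib

def commit_id(tree_id, parent_ids, message):
    """Toy commit checksum computed over tree, parents, and message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += ["", message]
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()

def build_history(messages):
    """Create a linear history and return the commit IDs, oldest first."""
    ids, parents = [], []
    for i, message in enumerate(messages):
        cid = commit_id(f"tree-{i}", parents, message)  # placeholder tree IDs
        ids.append(cid)
        parents = [cid]
    return ids

original = build_history(["Erste Version", "README Datei", "Test Datei"])
tampered = build_history(["Erste Version!", "README Datei", "Test Datei"])

print(original[-1] == tampered[-1])  # False
```

Only the first commit was altered, yet every later ID differs as well, because each ID is computed over the ID of its predecessor.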

2.2.6.1. References: Branches and Tags

However, there is not much you can do with a pure commit graph. To reference (i.e., work with) a node, you need to know its name, which is the SHA-1 sum of the commit. In everyday use, however, you rarely use the SHA-1 sum of a commit directly, but instead use symbolic names, called references, which Git can resolve to the SHA-1 sum.

Git basically offers two types of references, branches and tags. These are pointers into the commit graph, which are used to mark specific nodes. Branches have a “moving” character, meaning that they move along as new commits are added to the branch. Tags, on the other hand, are static in nature, and mark important points in the commit graph, such as releases.

Figure 8, “Example of a Commit Graph with Branches and Tags” shows the same commit graph with the branches master, feature, and bugfix, the HEAD reference, and the tags v0.1 and v0.2.

Figure 8. Example of a Commit Graph with Branches and Tags

3. Practical Version Control

The following chapter introduces all the essential techniques you’ll use in your daily work with Git. In addition to a more detailed description of the index and how to restore old versions, the focus is on working effectively with branches.

3.1. References: Branches and Tags

In the CVS/SVN environment, “Branch” and “Merge” are often a closed book to newcomers and a regular source of despair for experts. In Git, branching and merging are commonplace, simple, transparent, and fast. It’s common for a developer to create multiple branches and perform multiple merges in one day.

The tool Gitk helps you keep an overview of several branches. With gitk --all you display all branches. The tool visualizes the commit graph explained in the previous section. Each line represents one commit. Branches are displayed as green labels, tags as yellow pointers. For more information, see Sec. 3.6.2, “Gitk”.

Figure 9. The sample repository from Ch. 2, The Basics. For illustration purposes, the second commit has been tagged v0.1.

Because branches in Git are “cheap” and merges are easy, you can afford to use branches liberally. Want to try something, prepare a small bug fix, or start with an experimental feature? You can create a new branch for each of these. You want to test if one branch is compatible with the other? Merge them together, test everything, then discard the merge again and continue developing. This is common practice among developers using Git.

First, let’s look at references in general. References are nothing more than symbolic names for the hard to remember SHA-1 sums of commits.

These references are stored in .git/refs/. The name of a reference is determined by the file name, and the target is determined by the contents of the file. For example, the master branch you have been working on all along looks like this:

$ cat .git/refs/heads/master
89062b72afccda5b9e8ed77bf82c38577e603251

If Git needs to manage a lot of references, they may not be stored as files under .git/refs/. Instead, Git creates a container that contains packed references (Packed Refs): One line per reference with name and SHA-1 sum. This makes sequential resolution of many references faster. Git commands search for branches and tags in the .git/packed-refs file if the corresponding .git/refs/<name> file does not exist.
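
The lookup described here can be sketched as follows. The sample file content is made up (the tag ID and the peeled line are placeholders). The parser assumes the usual layout of .git/packed-refs: comment lines start with #, reference lines have the form "SHA-1 sum, space, name", and lines starting with ^ give the commit behind the annotated tag on the previous line. The function names are hypothetical.

```python
PACKED_REFS = """\
# pack-refs with: peeled fully-peeled sorted
89062b72afccda5b9e8ed77bf82c38577e603251 refs/heads/master
1111111111111111111111111111111111111111 refs/tags/v0.1
^e2c67ebb6d2db2aab831f477306baa44036af635
"""

def parse_packed_refs(text):
    """Return a dict mapping reference names to SHA-1 sums."""
    refs = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or line.startswith("^"):
            continue  # skip comments and peeled-tag lines
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs

def resolve_ref(name, loose_refs, packed_refs):
    """Prefer the loose file under .git/refs/; fall back to packed-refs."""
    if name in loose_refs:
        return loose_refs[name]
    return packed_refs.get(name)

packed = parse_packed_refs(PACKED_REFS)
print(resolve_ref("refs/heads/master", {}, packed))
# 89062b72afccda5b9e8ed77bf82c38577e603251
```

Here loose_refs stands in for the files under .git/refs/, modelled as a plain dictionary.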

Under .git/refs/ there are several directories that represent the “type” of reference. There is no fundamental difference between these references, only when and how they are used. The references you will use most often are branches. They are stored under .git/refs/heads/. Heads refers to what is sometimes called a “tip” in other systems: The latest commit on a development branch.⁠^[29] Branches move up when you make commits on a branch, so they remain at the top of the version history.

Figure 10. A branch always references the most recent commit

Branches in other developers' repositories (e.g. the master branch of the official repository), so-called remote tracking branches, are stored under .git/refs/remotes/ (see Sec. 5.2.2, “Remote-Tracking-Branches”). Tags, static references, which are mostly used for versioning, are stored under .git/refs/tags/ (see Sec. 3.1.3, “Tags — Marking Important Versions”).

3.1.1. HEAD and Other Symbolic References

One reference that you rarely use explicitly, but constantly use implicitly, is HEAD. It usually refers to the branch that is currently checked out, in this case master:

$ cat .git/HEAD
ref: refs/heads/master

HEAD can also point directly to a commit if you type git checkout <commit-id>. However, you are then in so-called detached-head mode, in which commits may get lost, see also Sec. 3.2.1, “Detached HEAD”.
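
The two states of HEAD can be told apart by the content of the file alone. The following is a small sketch; the function name parse_head and the labels symbolic and detached are chosen for illustration.

```python
def parse_head(content):
    """Classify the content of .git/HEAD as symbolic reference or detached HEAD."""
    content = content.strip()
    if content.startswith("ref: "):
        return ("symbolic", content[len("ref: "):])  # points to a branch
    return ("detached", content)                     # points directly to a commit

print(parse_head("ref: refs/heads/master\n"))
# ('symbolic', 'refs/heads/master')
print(parse_head("e2c67ebb6d2db2aab831f477306baa44036af635\n"))
# ('detached', 'e2c67ebb6d2db2aab831f477306baa44036af635')
```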

HEAD determines which files are found in the working tree, which commit becomes the predecessor when a new one is created, which commit is displayed by git show, and so on. When we speak of “the current branch”, we technically mean HEAD.

The simple commands log, show, and diff assume HEAD as their first argument if you do not pass any arguments. The output of git log is the same as the output of git log HEAD, and so on; this applies to most commands that operate on a commit if you don’t specify one explicitly. HEAD is thus similar to the shell variable PWD, which specifies “where you are”.

When we talk about a commit, a command usually doesn’t care whether you specify the commit ID in full or in abbreviated form, or whether you access the commit by reference, such as a tag or branch. However, such a reference may not always be unique. What happens if there is a branch master and a tag with the same name? Git checks if the following references exist:

.git/<name> (mostly only useful for HEAD or similar)
.git/refs/<name>
.git/refs/tags/<name>
.git/refs/heads/<name>
.git/refs/remotes/<name>
.git/refs/remotes/<name>/HEAD

Git will take the first matching reference it finds. So you should always give tags a unique scheme so that they don’t get confused with branches. This way you can address branches directly by name instead of heads/<name>.
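
The search order listed above can be written down as a short loop. This is a sketch of the rule only, with paths given relative to .git/; the function name resolve_name and the sample set of references are assumptions made for illustration.

```python
# Candidate locations in the order given above, relative to .git/
SEARCH_ORDER = [
    "{name}",
    "refs/{name}",
    "refs/tags/{name}",
    "refs/heads/{name}",
    "refs/remotes/{name}",
    "refs/remotes/{name}/HEAD",
]

def resolve_name(name, existing_refs):
    """Return the first candidate location that exists, or None."""
    for pattern in SEARCH_ORDER:
        candidate = pattern.format(name=name)
        if candidate in existing_refs:
            return candidate
    return None

# Hypothetical repository with a branch and a tag that are both called master
refs = {"HEAD", "refs/heads/master", "refs/tags/master", "refs/remotes/origin/HEAD"}

print(resolve_name("master", refs))        # refs/tags/master
print(resolve_name("heads/master", refs))  # refs/heads/master
print(resolve_name("origin", refs))        # refs/remotes/origin/HEAD
```

In this example the tag wins over the branch of the same name, which is exactly the confusion that a separate naming scheme for tags avoids.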

Especially important are the suffixes ^ and ~<n>. The syntax <ref>^ indicates the direct ancestor of <ref>. This does not always have to be unique: If two or more branches were merged, the merge commit has several direct ancestors. <ref>^ or <ref>^1 then denotes the first direct ancestor, <ref>^2 the second, and so on.⁠^[30] So the syntax HEAD^^ means “the direct ancestor of the direct ancestor of the current commit”, i.e. its grandparent. Note that ^ may have a special meaning in your shell and you may need to protect it with quotes or a backslash.

Figure 11. Relative References, ^ and ~<n>

The syntax <ref>~<n> is equivalent to repeating ^ n times: HEAD~10 thus denotes the tenth direct predecessor of the current commit. Note: This does not mean that there are only eleven commits between HEAD and HEAD~10. Since ^ follows only the first parent at any merge, the range between the two references contains these eleven commits plus all further commits that were integrated by merges. The syntax is documented in the git-rev-parse(1) man page in the “Specifying Revisions” section.
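
Both suffixes, and the caveat about merges, can be tried out on a toy graph. The graph below is a made-up example with one merge commit H; the function names are chosen for illustration and do not correspond to Git internals.

```python
# Commit -> list of parents; the first entry is the first parent.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["B"], "F": ["E"],
    "H": ["D", "F"],  # merge commit: first parent D, second parent F
}

def nth_parent(commit, n=1):
    """<commit>^n: the n-th direct ancestor."""
    return parents[commit][n - 1]

def ancestor(commit, n):
    """<commit>~n: follow the first parent n times."""
    for _ in range(n):
        commit = nth_parent(commit, 1)
    return commit

def reachable(commit):
    """All commits that can be reached from commit, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

print(nth_parent("H", 1), nth_parent("H", 2))  # D F
print(ancestor("H", 3))                        # B
print(sorted(reachable("H") - reachable(ancestor("H", 3))))
# ['C', 'D', 'E', 'F', 'H']
```

Three first-parent steps lead from H to B, but five commits lie in between, because the merge also brought in E and F.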

3.1.2. Managing Branches

A branch is created in Git in no time. All Git needs to do is identify the currently checked out commit and store the SHA-1 sum in the .git/refs/heads/<branch-name> file.

$ time git branch neuer-branch
git branch neuer-branch  0.00s user 0.00s system 100% cpu 0.008 total

The command is so fast because (unlike other systems) no files need to be copied and no additional metadata needs to be stored. Information about the structure of the version history can always be derived from the commit that a branch references and its ancestors.

Here is an overview of the most important options:

git branch [-v]

Lists local branches. The currently checked-out branch is marked with an asterisk. You can also use -v to display the commit IDs to which the branches point and the first line of the description of the corresponding commits.

$ git branch -v
  maint  65f13f2 Start 1.7.5.1 maintenance track
* master 791a765 Update draft release notes to 1.7.6
  next   b503560 Merge branch 'master' into next
  pu     d7a491c Merge branch 'js/info-man-path' into pu

git branch <branch> [<ref>]: Creates a new branch <branch> pointing to commit <ref> (<ref> can be the SHA-1 sum of a commit, another branch, etc.). If you do not specify a reference, this is HEAD, the current branch.

git branch -m <new-name>

git branch -m <old-name> <new-name>

In the first form the current branch is renamed to <new-name>. In the second form <old-name> is renamed to <new-name>. The command fails if this would overwrite another branch.

$ git branch -m master
fatal: A branch named 'master' already exists.

If you rename a branch, Git will not display a message. So you can check afterwards to make sure the renaming was successful:

$ git branch
* master
  test
$ git branch -m test pu/feature
$ git branch
* master
  pu/feature

git branch -M …: Like -m, except that a branch is also renamed if it overwrites another branch. Attention: Commits of the overwritten branch may be lost!

git branch -d <branch>: Delete <branch>. You can specify several branches at once. Git refuses to delete a branch if it is not yet fully integrated into its upstream branch, or, if it does not exist, into HEAD, the current branch. (For more on upstream branches, see Sec. 5.3.2, “git pull”).

git branch -D …: Deletes a branch, even if it contains commits that have not yet been integrated into the upstream or current branch. Note: These commits may be lost unless they are referenced differently.

3.1.2.1. Changing Branches: Checkout

You can change branches with git checkout <branch>. If you want to create a branch and switch to it right away, use git checkout -b <branch>. The command is equivalent to git branch <branch> && git checkout <branch>.

What happens during a checkout? Each branch references a commit, which in turn references a tree, that is, the image of a directory structure. A git checkout <branch> now resolves the reference <branch> to a commit and replicates the commit’s tree to the index and to the working tree (i.e., the filesystem).

Since Git knows which version of files are currently in the index and working tree, only the files that differ on the current and new branches need to be checked out.

Git makes it hard for users to lose information. Therefore, a checkout will rather fail than overwrite unsaved changes in a file. This happens in the following two cases:

The checkout would overwrite a file in the working tree that contains changes. Git will display the following error message: error: Your local changes to the following files would be overwritten by checkout: file.

The checkout would overwrite an untracked file, i.e. a file that is not managed by Git. Git then aborts with the error message: error: The following untracked working tree files would be overwritten by checkout: file.

If, however, changes are stored in the working tree or index that are compatible with both branches, a checkout takes over these changes. This would look like this, for example:

$ git checkout master
A   neue-datei.txt
Switched to branch master

This means that the file neue-datei.txt was added, which does not exist on either branch. Since no information can be lost here, the file is simply carried over. The message A neue-datei.txt reminds you which files you should still take care of. A stands for added, D for deleted and M for modified.

If you’re sure you don’t need your changes anymore, you can use git checkout -f to ignore the error messages and run the checkout anyway.

If you want to keep the changes and change the branch (e.g., interrupt your work and fix a bug on another branch), git stash will help (Sec. 4.5, “Outsourcing Changes — Git Stash”).

3.1.2.2. Branch Naming Conventions

In principle, you can name branches almost arbitrarily. Exceptions are spaces, some special characters with special meaning for Git (e.g. *, ^, :, ~), as well as two consecutive dots (..) or a dot at the beginning of the name.⁠^[31]
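
The exceptions named in this paragraph can be turned into a simple check. This is deliberately a partial sketch: it covers only the rules listed above, whereas Git's complete rule set is implemented by the command git check-ref-format. The function name and the character set are assumptions derived from the text.

```python
FORBIDDEN_CHARS = set(" *^:~")  # only the characters named in the text

def looks_like_valid_branch_name(name):
    """Check a name against the rules listed above (not Git's full rule set)."""
    if not name or name.startswith("."):
        return False
    if ".." in name:
        return False
    return not any(char in FORBIDDEN_CHARS for char in name)

for candidate in ["i18n/german", "pu/feature", "my branch", ".hidden", "a..b", "fix^1"]:
    print(candidate, looks_like_valid_branch_name(candidate))
```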

It makes sense to always write branch names entirely in lower case. Since Git manages branch names as files under .git/refs/heads/, upper and lower case are significant.

You can group branches into “namespaces” by using a / as a separator. Branches that are related to the translation of a software can then be named e.g. i18n/german, i18n/english etc. If several developers share a repository, you can also create “private” branches under <username>/<topic>. These namespaces are represented by a directory structure, so that a directory <username>/ with the branch file <topic> is created under .git/refs/heads/.

The main development branch of your project should always be called master. Bugfixes are often managed on a branch maint (short for “maintenance”). The next release is usually prepared on next. Features that are still in an experimental state should be developed in pu (for “proposed updates”) or in pu/<feature>. For a more detailed description of how to use branches to structure development and organize release cycles, see Ch. 6, Workflows.

3.1.2.3. Deleted Branches and “Lost” Commits

Commits each have one or more predecessors. Therefore, you can walk through the commit graph “directed”, that is, from newer to older commits, until you reach a root commit.

The reverse is not possible: if a commit knew its successor, this information would have to be stored in the commit. This would change the SHA-1 sum of the commit, and the successor would have to reference the corresponding new commit, which would give it a new SHA-1 sum, so the predecessor would have to be changed again, and so on. So Git can only walk through the commits from a named reference (such as a branch or HEAD) in the direction of earlier commits.

Therefore, if the “top” of a branch is deleted, the topmost commit is no longer referenced (in Git jargon: unreachable). As a result, the predecessor is no longer referenced, and so on, until the next commit comes along that is referenced in some way (either by a branch, or by having a successor that is itself referenced by a branch).

So when you delete a branch, the commits on that branch are not deleted, they are just “lost”. Git simply doesn’t find them anymore.

However, they will still be present in the object database for a while.⁠^[32] So you can easily restore a branch by explicitly specifying the previous (and supposedly deleted) commit as a reference:

$ git branch -D test
Deleted branch test (was e32bf29).
$ git branch test e32bf29
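
What “unreachable” means can be modelled in a few lines. The graph and branch names below are made up; the sketch only shows that commits stay in the database after their branch is deleted and become reachable again once a branch points to them.

```python
# Commit -> parents; master points to C, test points to Y.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"], "Y": ["X"]}
branches = {"master": "C", "test": "Y"}

def reachable_from(tips):
    """All commits that can be reached from the given branch tips."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def unreachable():
    """Commits in the database that no branch leads to."""
    return sorted(set(parents) - reachable_from(branches.values()))

print(unreachable())             # []
lost_tip = branches.pop("test")  # corresponds to git branch -D test
print(unreachable())             # ['X', 'Y']
branches["test"] = lost_tip      # corresponds to git branch test <commit-id>
print(unreachable())             # []
```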

Another way to retrieve deleted commits is the reflog (see Sec. 3.7, “Reflog”).

3.1.3. Tags — Marking Important Versions

SHA-1 sums are a very elegant solution to describe versions decentrally, but they are semantically poor and unwieldy for humans. Unlike linear revision numbers, commit IDs alone tell us nothing about the order of versions.

During the development of software projects, different “important” versions need to be marked so that they can be easily found in the repository. The most important ones are usually those that are released, called releases. Release candidates are also often marked in this way, i.e. versions that form the basis for the next version and are checked for critical bugs in the course of quality assurance without adding new features. Depending on the project and development model, there are different conventions for marking releases and procedures for preparing and publishing them.

In the open source area, two versioning schemes have become established: the classic major/minor/micro versioning scheme and, more recently, date-based versioning. With major/minor/micro versioning, which is used e.g. with the Linux kernel and also Git, a version is identified by three (often four) numbers: 2.6.39 or 1.7.1. With date-based versioning, on the other hand, the designation is derived from the time of the release, e.g.: 2011.05 or 2011-05-19. This has the great advantage that the age of a version is easily identifiable.⁠^[33]

Git offers tags (“labels”) that can be used to mark any Git object (usually a commit) in order to highlight prominent states in the development history. Like branches, tags are implemented as references to objects. Unlike branches, however, tags are static, meaning that they are not moved when new commits are added, and always point to the same object. There are two types of tags: annotated and lightweight. Annotated tags carry metadata, such as author, description, or GPG signature. Lightweight tags, on the other hand, “simply” point to a specific Git object. For both types of tags, Git creates references under .git/refs/tags/ or in .git/packed-refs. The difference is that for each annotated tag, Git creates a special Git object, a tag object, in the object database to store the metadata and the SHA-1 sum of the selected object, while a lightweight tag points directly to the selected object. Figure 12, “The Tag Object” shows the contents of a tag object; compare also the other Git objects in Figure 4, “Git Objects”.

Figure 12. The Tag Object

The tag object shown has both a size (158 bytes) and a SHA-1 sum. It contains the name (0.1), the object type and the SHA-1 sum of the referenced object as well as the name and e-mail of the author, which is called tagger in Git jargon. In addition, the tag contains a tag message that describes the version, for example, and optionally a GPG signature. In the Git project, for example, a tag message consists of the current version designation and the signature of the maintainer.

In the following, let’s first look at how you manage tags locally. Sec. 5.8, “Exchanging Tags” describes how you exchange tags between repositories.

3.1.3.1. Managing Tags

You can manage tags with the command git tag. Without arguments it shows all existing tags. Depending on the size of the project, it is worth limiting the output with the -l option and a corresponding pattern. With the following command you display all variants of version 1.7.1 of the git project, i.e. both the release candidates with the addition -rc* and the (four-digit) maintenance releases:

$ git tag -l v1.7.1*
v1.7.1
v1.7.1-rc0
v1.7.1-rc1
v1.7.1-rc2
v1.7.1.1
v1.7.1.2
v1.7.1.3
v1.7.1.4

The content of a tag is provided by git show:

$ git show 0.1 | head
tag 0.1
Tagger: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Wed Mar 23 16:52:03 2011 +0100

Erste Veröffentlichung

commit e2c67ebb6d2db2aab831f477306baa44036af635
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Sat Jan 8 20:30:58 2011 +0100

Gitk presents tags as yellow, arrow-like boxes that are clearly distinguishable from the green, rectangular branches:

Figure 13. Tags in Gitk

3.1.3.2. Lightweight Tags

To add a lightweight tag to HEAD, pass the desired name to the command (in this example, to mark an important commit):

$ git tag api-aenderung
$ git tag
api-aenderung

To tag a commit other than HEAD, additionally pass a reference to that commit (in this example, the commit 23 steps before HEAD):

$ git tag pre-regression HEAD~23
$ git tag
api-aenderung
pre-regression

Tags are unique — if you try to recreate a tag, Git will abort with an error message:

$ git tag pre-regression
fatal: tag 'pre-regression' already exists

3.1.3.3. Annotated Tags

Annotated tags are created with the -a option. As with git commit, an editor will open and allow you to write the tag message. Or you can pass the tag message with the option -m — in which case the option -a is redundant:

$ git tag -m "Second release" 0.2
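The difference between the two kinds of tags can be checked in a throwaway repository. The following is a minimal sketch; the repository, the file name and the tag names light-example and annotated-example are made up for illustration. It creates one tag of each kind and asks git cat-file for the type of the object each tag reference points to:

```shell
set -eu
repo=$(mktemp -d)                 # scratch repository, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

git tag light-example                               # lightweight tag on HEAD
git tag -m "Annotated example" annotated-example    # annotated tag (-m implies -a)

# Type of the object each tag reference points to directly
light_type=$(git cat-file -t light-example)
annotated_type=$(git cat-file -t annotated-example)
echo "light-example     -> $light_type"
echo "annotated-example -> $annotated_type"

# Both tags ultimately resolve to the same commit
git rev-parse 'light-example^{commit}' 'annotated-example^{commit}'
```

The lightweight tag should report the type commit, while the annotated tag reports tag, i.e. the intermediate tag object that stores the metadata.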

3.1.3.4. Signed Tags

To verify a signed tag, use the -v (verify) option:

$ git tag -v v1.7.1
object d599e0484f8ebac8cc50e9557a4c3d246826843d
type commit
tag v1.7.1
tagger Junio C Hamano <gitster@pobox.com> 1272072587 -0700

Git 1.7.1
gpg: Signature made Sat Apr 24 03:29:47 2010 CEST using DSA key ID F3119B9A
gpg: Good signature from "Junio C Hamano <junkio@cox.net>"
...

Of course, this assumes both that GnuPG is installed and that you have already imported the signer’s key.

In order to sign tags yourself, you must first set the preferred key:

$ git config --global user.signingkey <GPG-Key-ID>

Now you can create signed tags with the -s (sign) option:

$ git tag -s -m "Third release" 0.3

3.1.3.5. Deleting and Overwriting Tags

Use the -d and -f options to delete or overwrite tags:

$ git tag -d 0.2
Deleted tag '0.2' (was 4773c73)

These options should be used with caution, especially if you not only use the tags locally but also publish them. Under certain circumstances, the same tag may then point to different commits: version 1.0 in repository X points to a different commit than version 1.0 in repository Y. See also Sec. 5.8, “Exchanging Tags”.
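As a sketch of both options in a scratch repository (the tag name marker and the empty commits are purely illustrative), the following moves an existing tag with -f and then removes it with -d:

```shell
set -eu
repo=$(mktemp -d)                # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "First commit"
git commit -q --allow-empty -m "Second commit"

git tag marker HEAD~1            # lightweight tag on the first commit

# A plain "git tag marker" would now fail; -f moves the existing tag instead
git tag -f marker HEAD
moved_to=$(git rev-parse marker)

git tag -d marker                # delete the tag again
remaining=$(git tag -l marker)   # empty once the tag is gone
echo "tag pointed to $moved_to before deletion; remaining: '$remaining'"
```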

3.1.3.6. Lightweight vs. Annotated Tags

For public versioning of software, annotated tags are generally more useful. Unlike lightweight tags, they contain meta-information that shows who created a tag and when, so there is a clearly identifiable contact person. Users of the software can also find out who has approved a particular version. For example, it is clear that Junio C. Hamano tagged Git version 1.7.1, so it has his “seal of approval”. The cryptographic signature, of course, also confirms this statement. Lightweight tags, on the other hand, are particularly suitable for applying local markers, for example to identify certain commits relevant to the current task. However, make sure not to upload such tags to a public repository (see Sec. 5.8, “Exchanging Tags”), as they might spread. If you only use the tags locally, you can also delete them once they have served their purpose (see above).

3.1.3.7. Non-Commit Tags

With tags you can mark any Git object, not only commits, but also trees, blobs and even tag objects themselves! The classic example is to put the GPG public key used by the maintainer of a project to sign tags in a blob.

For example, the tag junio-gpg-pub in the Git repository of Git points to the key of Junio C. Hamano:

$ git show junio-gpg-pub | head -5
tag junio-gpg-pub
Tagger: Junio C Hamano <junkio@cox.net>
Date:   Tue Dec 13 16:33:29 2005 -0800

GPG key to sign git.git archive.

Because this blob object is not referenced by any tree, the file is virtually separate from the actual code, but still exists in the repository. In addition, a tag on a “lonely” blob is necessary so that it is not considered unreachable and is deleted during repository maintenance.⁠^[34]

To use the key, proceed as follows:

$ git cat-file blob junio-gpg-pub | gpg --import
gpg: key F3119B9A: public key "Junio C Hamano <junkio@cox.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1

You can then verify all tags in the Git-via-Git repository, as described above.

3.1.3.8. Describing Commits

Tags are very useful for describing any commit “better”. The git describe command gives a description consisting of the most recent tag and its relative position in the commit graph. Here’s an example from the git project: we describe a commit with the SHA-1 prefix 28ba96a, which is located in the commit graph seven commits after version 1.7.1:

Figure 14. The commit to be described highlighted in gray

$ git describe --tags
v1.7.1-7-g28ba96a

The output of git describe is formatted as follows:

<tag>-<position>-g<SHA-1>

The tag is v1.7.1; the position indicates that there are seven new commits between the tag and the described commit.⁠^[35] The g before the ID indicates that the description is derived from a Git repository, which is useful in environments with multiple version control systems. By default, git describe only searches for annotated tags, but the --tags option extends the search to include lightweight tags.
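The format can be reproduced in a scratch repository. This sketch assumes an invented annotated tag v1.0 followed by three empty commits; the abbreviated SHA-1 at the end of the description will differ on every run:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Release commit"
git tag -m "Version 1.0" v1.0     # annotated tag, found by git describe by default

# Three further commits after the tag
for n in 1 2 3; do
    git commit -q --allow-empty -m "Change $n"
done

description=$(git describe)
echo "$description"               # has the form v1.0-3-g<abbreviated SHA-1>

# Split the description into tag and position
tag=${description%%-*}
rest=${description#*-}
position=${rest%%-*}
echo "tag=$tag position=$position"
```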

The command is very useful because it translates a content-based identifier into something useful for humans: v1.7.1-7-g28ba96a is much closer to v1.7.1 than v1.7.1-213-g3183286. This allows you to compile the output directly into the software in a way that makes sense, just like in the Git project:

$ git describe
v1.7.5-rc2-8-g0e73bb4
$ make
GIT_VERSION = 1.7.5.rc2.8.g0e73bb
...
$ ./git --version
git version 1.7.5.rc2.8.g0e73bb

This way users know roughly which version they have, and can track which commit the version was compiled from.

3.2. Restoring Versions

The goal of version control software is not just to examine changes between commits. Above all, it is also important to restore older versions of a file or entire directory trees, or to undo changes. In Git, the commands checkout, reset, and revert are particularly useful for this.

The Git command checkout can not only change branches, but also restore files from previous commits. The general syntax is:

git checkout [-f] <reference> -- <pattern>

checkout resolves the given reference (and HEAD if missing) to a commit and extracts all files matching <pattern> to the working tree. If <pattern> is a directory, it refers to all files and subdirectories in it. Unless you explicitly specify a pattern, all files are checked out. Changes to a file are not simply overwritten, unless you specify the -f option (see above). HEAD is also set to the corresponding commit (or branch).

However, if you specify a pattern, checkout overwrites the file(s) without prompting. So to discard all changes to <file>, enter git checkout -- <file>: Git then replaces <file> with the version in the current branch. This way, you can also reconstruct the older state of a file:

$ git checkout ce66692 -- <file>

The double minus separates the patterns from the options or arguments. It is not strictly necessary, however: if there is no branch or other reference with that name, Git will try to find a file with that name. So the separation only makes it unambiguous that you want to restore the file(s) in question.

To view the contents of a file from a particular commit without checking it out, use the following command:

$ git show ce66692:<file>
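The following sketch runs these commands in a scratch repository; the file notes.txt and its contents are invented for the example. It discards an uncommitted edit, views an old version without checking it out, and then restores that old version:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first=$(git rev-parse HEAD)

echo "version 2" > notes.txt
git commit -q -a -m "Update notes"

echo "uncommitted edit" > notes.txt
git checkout -- notes.txt                    # discard the uncommitted edit
after_discard=$(cat notes.txt)

old_content=$(git show "$first:notes.txt")   # view only, nothing is checked out
git checkout "$first" -- notes.txt           # restore the old state of the file
after_restore=$(cat notes.txt)

echo "after discard: $after_discard"
echo "old content:   $old_content"
echo "after restore: $after_restore"
```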

Use --patch or -p to call git checkout in interactive mode. The procedure is the same as for git add -p (see Sec. 2.1.2, “Creating Commits Step by Step”), but here you can reset hunks of a file step-by-step.

3.2.1. Detached HEAD

If you check out a commit that is not referenced by a branch, you are in detached-HEAD mode:

$ git checkout 3329661
Note: checking out '3329661'.

You are in 'detached HEAD' state. You can look around, make
experimental changes and commit them, and you can discard any
commits you make in this state without impacting any branches
by performing another checkout.

If you want to create a new branch to retain commits you create,
you may do so (now or later) by using -b with the checkout command
again. Example:

  git checkout -b new_branch_name

HEAD is now at 3329661... Add LICENSE file

As the explanation already warns you (you can hide it by setting the option advice.detachedHead to false), commits you create now may get lost: since HEAD is then the only direct reference to these new commits, they are not referenced by any branch (they are unreachable, see above).

So working in detached-HEAD mode is especially useful if you want to check something quickly: Did the bug already exist in commit 3329661? Was there already a README file at the time of 3329661?

If you want to do more than just look around from the commit you checked out, for example, to see if your software already had a particular bug at the time, you should create a branch:

$ git checkout -b <temp-branch>

Then you can make commits as usual without fear of losing them.
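Whether HEAD is currently detached can also be checked from a script. The sketch below relies on git symbolic-ref failing when HEAD is not a symbolic reference; the branch name temp-branch and the empty commits are illustrative only:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "First"
git commit -q --allow-empty -m "Second"

# Check out a commit directly; the advice text is suppressed for this call only
git -c advice.detachedHead=false checkout -q HEAD~1

# symbolic-ref -q exits non-zero (silently) while HEAD is detached
if git symbolic-ref -q HEAD >/dev/null; then
    state_before=attached
else
    state_before=detached
fi

git checkout -q -b temp-branch    # create a branch at the current commit
state_after=$(git symbolic-ref --short HEAD)

echo "before: $state_before, after: on branch $state_after"
```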

3.2.2. Rolling Back Commits

If you want to undo all the changes a commit makes, the revert command helps. However, it does not delete a commit, but creates a new one whose changes are exactly the opposite of the other commit: Deleted lines become added lines, and vice versa.

Suppose you have a commit that creates a LICENSE file. The patch of the corresponding commit looks like this:

--- /dev/null
+++ b/LICENSE
@@ -0,0 +1 @@
+This software is released under the GNU GPL version 3 or newer.

Now you can undo the changes:

$ git revert 3329661
Finished one revert.
[master a68ad2d] Revert "Add LICENSE file"
 1 files changed, 0 insertions(+), 1 deletions(-)
 delete mode 100644 LICENSE

Git creates a new commit on the current branch — unless you specify otherwise — with the description Revert "<Old commit message>". This commit looks like this:

$ git show
commit a68ad2d41e9219383449d703521573477ee7da48
Author: Julius Plenz <feh@mali>
Date:   Mon Mar 7 05:28:47 2011 +0100

    Revert "Add LICENSE file"

    This reverts commit 3329661775af3c52e6b2ad7e9e7e7d789ba62712.

diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 3fd9c20..0000000
--- a/LICENSE
+++ /dev/null
@@ -1 +0,0 @@
-This software is released under the GNU GPL version 3 or newer.

Note that from now on, both the commit and the revert will appear in the version history of a project. You therefore only undo the changes, but do not delete any information from the version history.

You should therefore only use revert if you need to undo a change that has already been published. However, if you are developing locally in a separate branch, it makes more sense to delete these commits completely (see the following section on reset and the topic Rebase, Sec. 4.1, “Moving commits — Rebase”).

If you want to perform a revert, not for all changes of the commit but only for those to a single file, you can use this procedure:

$ git show -R 3329661 -- LICENSE | git apply --index
$ git commit -m 'Revert change to LICENSE from 3329661'

The git show command prints the changes from commit 3329661 that apply to the LICENSE file. The -R option causes the unified-diff format to be displayed “the other way around” (reverse). The output is passed to git apply to make the changes to the file and index. The changes are then checked in.
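To see the procedure end to end, the following sketch builds a small scratch repository in which one commit touches two files (the file main.c and all contents are assumptions made for this example) and then reverts only the part concerning LICENSE:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "Add main.c"

# One commit that changes two files
echo "license text" > LICENSE
echo "more code" >> main.c
git add LICENSE main.c
git commit -q -m "Add LICENSE and extend main.c"
target=$(git rev-parse HEAD)

# Apply the reversed diff of that commit, restricted to LICENSE
git show -R "$target" -- LICENSE | git apply --index
git commit -q -m "Revert change to LICENSE from $target"

ls                                # LICENSE is gone, main.c keeps both lines
main_lines=$(wc -l < main.c | tr -d ' ')
echo "main.c has $main_lines lines"
```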

Another way to undo a change is to check out a file from a previous commit, add it to the index, and check it in again:

$ git checkout 3329661 -- <file>
$ git add <file>
$ git commit -m 'Reverting <file> to resemble 3329661'

3.2.3. Reset and the Index

If you want to delete a commit completely rather than just undo it, use git reset. The reset command sets the HEAD (and thus the current branch), and optionally the index and working tree, to a particular commit. The syntax is git reset [<option>] [<commit>].

The most important types of resets are the following:

--soft: Resets only the HEAD; index and working tree remain unaffected.
--mixed: Default setting if you do not specify an option. Sets HEAD and index to the specified commit, but the files in the working tree are not affected.
--hard: Synchronizes HEAD, index and working tree and sets them to the same commit. Changes in the working tree may be lost!

If you call git reset without any options, this is equivalent to git reset --mixed HEAD. We’ve already seen this command: Git sets the current HEAD to HEAD (so it doesn’t change it) and the index to HEAD. In this case, the changes you previously added to the index are removed from it again, while the files in the working tree remain untouched.
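The effect of the three reset types can be observed step by step. This is a minimal sketch in a scratch repository, with an invented file file.txt; after each reset it records what the index and the working tree contain:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" > file.txt
git commit -q -a -m "Second"

# --soft: only HEAD moves back; the change stays in the index
git reset -q --soft HEAD^
soft_staged=$(git diff --cached --name-only)

# no option (= --mixed HEAD): the index is reset, the working tree is kept
git reset -q
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

# --hard: index and working tree are both set to HEAD
git reset -q --hard
hard_content=$(cat file.txt)

echo "soft:  staged='$soft_staged'"
echo "mixed: staged='$mixed_staged' unstaged='$mixed_unstaged'"
echo "hard:  file.txt contains '$hard_content'"
```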

The possible uses of this command are many and varied and will reappear in the various command sequences. Therefore it is important to understand the functionality, even if there are sometimes alternative commands that have the same effect.

Suppose you have made two commits to master that you actually want to move to a new branch to work on further. The following command sequence creates a new branch pointing to HEAD, and then resets HEAD and the current branch master by two commits. Finally, it checks out the new branch <new-feature>.

$ git branch <new-feature>
$ git reset --hard HEAD^^
$ git checkout <new-feature>

Alternatively, the following sequence has the same effect: you create a branch <new-feature> that points to the current commit. Then you delete master and re-create it so that it points to the second predecessor of the current commit.

$ git checkout -b <new-feature>
$ git branch -D master
$ git branch master HEAD^^

3.2.3.1. Using Reset

With reset you do not delete any commits, but only move references. As a result, commits that are no longer referenced become unreachable and are thus effectively deleted. So you can use reset to delete only the topmost commits on a branch, not arbitrary commits “somewhere in the middle,” as this would destroy the commit graph. (For the somewhat more complicated deletion of commits “in the middle,” see rebase, Sec. 4.1, “Moving commits — Rebase”).

Git always stores the original HEAD under ORIG_HEAD. So if you have performed a reset by mistake, use git reset --hard ORIG_HEAD to undo it (even if the commit was supposedly deleted). However, this does not affect lost changes to the working tree (which you have not yet checked in) — they are deleted irrevocably.

The result from above (moving two commits to a new branch) can also be achieved this way:

$ git reset --hard HEAD^^
$ git checkout -b <new-feature> ORIG_HEAD
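The following sketch replays this sequence in a scratch repository and verifies where the two branches end up. The branch name new-feature and the three empty commits are illustrative; the symbolic-ref call merely ensures that the initial branch is called master regardless of local configuration:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "Base"
git commit -q --allow-empty -m "Feature work 1"
git commit -q --allow-empty -m "Feature work 2"
base=$(git rev-parse HEAD~2)
tip=$(git rev-parse HEAD)

git reset -q --hard HEAD^^                 # master goes back two commits
git checkout -q -b new-feature ORIG_HEAD   # ORIG_HEAD still names the old tip

echo "master:      $(git rev-parse --short master)"
echo "new-feature: $(git rev-parse --short new-feature)"
```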

A common use of reset is to discard experimental changes. You want to try a patch? Add some debugging output? Change a few constants? If you don’t like the result, git reset --hard discards all changes in the working tree.

You can also use reset to “make your version history nice.” For example, if you have a few commits on a branch <feature> based on master, but they are not well structured (or much too large), you can create a branch <reorder-feature> and pack all changes into new commits:

$ git checkout -b <reorder-feature> <feature>
$ git reset master
$ git add -p
$ git commit
$ ...

The command git reset master sets index and HEAD to the state of master. However, your changes in the working tree are preserved, i.e. all changes that distinguish the branch <feature> from master are now only contained in the files in the working tree. Now you can add the changes incrementally using git add -p and package them into (several) handy commits.⁠^[36]

Suppose you are working on a change and want to check it in temporarily (to continue working on it later). You can then use the following commands:

$ git commit -m 'feature (still unfinished)'
(later)
$ git reset --soft HEAD^
(continue working)

The command git reset --soft HEAD^ resets the HEAD by one commit, but leaves the index and the working tree untouched. So all changes from your temporary commit are still in the index and working tree, but the commit itself is gone. You can now make further changes and create a new commit later. Similar functionality is provided by the --amend option for git commit, as well as the git stash command, which is explained in Sec. 4.5, “Outsourcing Changes — Git Stash”.

3.3. Merging Branches

Combining branches is called merging in Git; the commit that joins two or more branches is called a merge commit.

Git provides the merge subcommand, which allows you to integrate one branch into another. This means that all changes made on that branch are carried over into the current one.

Note that the command integrates the specified branch into the currently checked-out branch (i.e., HEAD). The command therefore only needs one argument:

$ git merge <branch-name>

If you handle your branches carefully, there should be no problems with merging. If there are, then this section also presents strategies for resolving merge conflicts.

First, we will look at an object-level merge process.

3.3.1. Two-Branches Merge

The two branches, topic and master, that you want to merge, each reference the most recent commit in a chain of commits (F and D), and these two commits in turn reference a tree (corresponding to the top-level directory of your project).

First, Git calculates a so-called merge base, that is, a commit that both of the commits to be merged have as a common ancestor. Usually there are several such bases (in the diagram below, A and B), and then the most recent one (which has the other bases as ancestors) is used.⁠^[37] In simple terms, this is the commit where the branches diverged (i.e., B).
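You can ask Git for the merge base directly with git merge-base. The sketch below recreates a history of the shape just described in a scratch repository (empty commits named after the letters used in the text) and checks that the computed base is the commit at which the branches diverged:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
fork_point=$(git rev-parse HEAD)  # commit B, where the branches will diverge

git checkout -q -b topic
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

git checkout -q master
git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"

merge_base=$(git merge-base master topic)
git log -1 --format='merge base: %h %s' "$merge_base"
```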

Now, if you want to merge two commits (D and F to M), then the trees referenced by the commits must be merged.

Figure 15. Merge base and merge commit

Git does this as follows:⁠^[38] If a tree entry (another tree or a blob) is the same in both commits, then that very tree entry will be taken over in the merge commit. This happens in two cases:

A file has not been changed by either commit, or a subdirectory does not contain a changed file: In the first case, the blob SHA-1 sum of this file is the same in both commits. In the second case, the same tree object is referenced by both commits. The referenced blob or tree is therefore the same as the one referenced in the merge base.
A file was changed on both sides and equivalently (same blobs). This happens, for example, if all changes to a file were copied from one branch using git cherry-pick (see Sec. 3.5, “Taking over Individual Commits: Cherry Picking”). The referenced blob is then not the same as in the merge base.

If a tree entry disappears in one of the commits, but is still present in the other, and is the same as in the merge base, then it is not taken over. This is equivalent to deleting a file or directory if no changes have been made to the file on the other side. Similarly, if a commit brings a new tree entry, it is copied to the merge tree.

Now what happens if a file from the commits has different blobs, that is, the file has been changed at least on one side? In the event that one of the blobs is the same as in the merge base, only one side of the file has been changed, so Git can simply adopt those changes.

However, if both blobs are different from the merge base, you might run into problems. First, Git tries to apply the changes on both sides.

A 3-way merge algorithm is usually employed for this purpose. Unlike the classic 2-way merge algorithm, which is used when you have two different versions A and B of a file and want to merge them, the 3-way algorithm involves a third version C of the file, extracted from the merge base mentioned above. Because a common ancestor of the file is known, the algorithm can in many cases decide better (that is, not only based on line numbers or context) how to merge changes. In practice, many trivial merge conflicts are thus resolved automatically without user intervention.

However, there are conflicts that no merge algorithm, no matter how good, can resolve. This happens, for example, if the context in version A of the file was changed immediately before a change made in version B or, worse still, if versions A, B and C all contain different variants of the same line.

Such a case is called a merge conflict. Git merges all the files as best it can, and then presents the conflicting changes to the user so they can manually merge them (and thus resolve the conflict) (see Sec. 3.4, “Resolving Merge Conflicts”).
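A 3-way merge of individual files can be tried out with git merge-file, which takes the current version, the common ancestor and the other version. The following sketch uses invented file names and contents; it first merges two non-overlapping changes and then provokes a conflict by changing the same line on both sides:

```shell
set -eu
dir=$(mktemp -d)                  # scratch directory, no repository needed
cd "$dir"

printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > base.txt      # common ancestor (C)
printf 'ALPHA\nbeta\ngamma\ndelta\nepsilon\n' > ours.txt      # version A: first line changed
printf 'alpha\nbeta\ngamma\ndelta\nEPSILON\n' > theirs.txt    # version B: last line changed

# -p prints the merge result instead of overwriting ours.txt
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
echo "$merged"

# Now both sides change the same line differently
printf 'alpha\nbeta\nGAMMA-A\ndelta\nepsilon\n' > ours2.txt
printf 'alpha\nbeta\nGAMMA-B\ndelta\nepsilon\n' > theirs2.txt
if git merge-file -p -q ours2.txt base.txt theirs2.txt > conflicted.txt; then
    outcome=clean
else
    outcome=conflict              # a positive exit status is the number of conflicts
fi
echo "second merge: $outcome"
cat conflicted.txt                # contains conflict markers around the third line
```

In the first case the result contains both changes, because the base shows which side modified which line; in the second case the base cannot help, and the conflict is left for the user.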

Although it is in principle possible to generate a syntactically correct resolution with an algorithm specially designed for the respective programming language, an algorithm cannot grasp the semantics, i.e. the meaning, of the code. Therefore, a resolution generated in this way would usually not make sense.

3.3.2. Fast Forward Merges: Fast Forwarding One Branch

The git merge command does not always create a merge commit. A trivial case, but one that occurs frequently, is the so-called fast-forward merge, i.e. simply moving the branch forward.

A fast-forward merge occurs when a branch, for example topic, is a descendant of a second branch, master:

Figure 16. Before the fast forward merge

A simple git merge topic on branch master now causes master to simply be moved forward; no merge commit is created.

Figure 17. After the fast forward merge — no merge commit was created

Of course, such a behavior only works if the two branches have not diverged, i.e. if the merge base of both branches is one of the two branches itself, in this case master.

This behavior is often desirable:

You want to integrate upstream changes, that is, changes from another Git repository. You typically use a command like git merge origin/master to do this. A git pull will also perform a merge. To learn how to merge changes between git repositories, see Ch. 5, Distributed Git.
You want to integrate an experimental branch. Because it’s quick and easy to create branches in Git, it’s a good idea to start a new branch for each feature. If you’ve tried something experimental on a branch and want to integrate it without the point of integration remaining visible in the history afterwards, you can do so by fast-forwarding.

With the options --ff-only and --no-ff you can adjust the merge behavior. If you use the first option and the branches cannot be merged using fast-forward, Git will abort with an error message. The second option forces Git to create a merge commit even though fast forward would have been possible.
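The behavior of both options can be sketched in a scratch repository. The branch names topic, topic2 and topic3 and the empty commits are made up for this example; the number of parents of HEAD shows whether a merge commit was created:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Base"

# Case 1: master has not moved, so --ff-only succeeds without a merge commit
git checkout -q -b topic
git commit -q --allow-empty -m "Topic work"
git checkout -q master
git merge -q --ff-only topic
ff_words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')    # commit + 1 parent

# Case 2: --no-ff forces a merge commit although fast-forward would work
git checkout -q -b topic2
git commit -q --allow-empty -m "More topic work"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic2'" topic2
noff_words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')  # commit + 2 parents

# Case 3: the branches have diverged, so --ff-only refuses to merge
git checkout -q -b topic3
git commit -q --allow-empty -m "Diverging topic work"
git checkout -q master
git commit -q --allow-empty -m "Diverging master work"
if git merge -q --ff-only topic3 2>/dev/null; then
    diverged=merged
else
    diverged=refused
fi

echo "ff-only: $ff_words ids, no-ff: $noff_words ids, diverged: $diverged"
```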

There are different opinions on whether changes should always be integrated via fast-forward or whether it is better to create a merge commit even when one is not strictly necessary. The result is the same in both cases: changes from one branch are integrated into another.

However, when you create a merge commit, the integration of a feature becomes clearly visible. Consider the following two excerpts from the version history of a project:

Figure 18. Integration of a feature with and without fast forward

In the upper version, you cannot easily see which commits were previously developed on branch sha1-caching, that is, which commits belong to a specific feature of the software.

In the lower version, however, you can see at first glance that there were exactly four commits on that branch, and that it was then integrated. Since nothing was developed in parallel, the merge commit would in principle be unnecessary, but it does make the integration of the feature clear.

So instead of relying on the magic of git merge, it makes sense to create two aliases (see Sec. 1.3.1, “Git Aliases”) that force or forbid fast forward merge:

nfm = merge --no-ff     # no-ff-merge
ffm = merge --ff-only   #    ff-merge

An explicit merge commit is also helpful because you can undo it with a single command. This is useful, for example, if you have integrated a branch but it has bugs: if the code is running in production, it is often desirable to back out the entire change until the bug is fixed. To do so, use:

git revert -m 1 <merge-commit>

Git then produces a new commit that reverses all changes introduced by the merge. The -m 1 option specifies which “side” of the merge should be considered the mainline, or stable line of development: its changes are preserved. In the above example, -m 1 would cause the changes made by the four commits from branch sha1-caching, the second strand of the merge, to be undone.
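A minimal sketch of backing out a merged feature, using a scratch repository with an invented branch feature and invented files app.txt and feature.txt:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable state"

git checkout -q -b feature
echo "new" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature
merge_commit=$(git rev-parse HEAD)

# Undo everything the merge brought in, keeping parent 1 (master) as mainline
git revert --no-edit -m 1 "$merge_commit" >/dev/null

subject=$(git log -1 --format=%s)
echo "$subject"
ls                                # feature.txt is gone again, app.txt remains
```

The merge commit and the commits of the feature branch remain in the history; only their effect on the files is reversed by the new commit.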

3.3.3. Merge Strategies

Git has five different merge strategies, some of which can be further adjusted by strategy options. You determine the strategy by -s, so a merge call is as follows:

git merge -s <strategy> <branch>

Some of these strategies can only merge two branches, others any number.

resolve: The resolve strategy can merge two branches using a 3-way merge technique. The newest (best) of all possible bases is used as the merge base. This strategy is fast and generally produces good results.

recursive: This is the standard strategy that Git uses to merge two branches. A 3-way merge algorithm is also used here. However, this strategy is more clever than resolve: If several merge bases exist, all of which have “equal rights,”⁠^[39] then Git first merges these bases together, and then uses the result as the merge base for the 3-way merge algorithm. In addition to the fact that merges with file renames can be processed more easily as a result, a test run on the version history of the Linux kernel has shown that this strategy results in fewer merge conflicts than the resolve strategy. The strategy can be adapted by various options (see below).

octopus: Standard strategy when three or more branches are merged. In contrast to the two strategies mentioned above, the octopus strategy can only perform merges if no error occurs, i.e. if no manual conflict resolution is necessary. The strategy is especially designed to integrate many topic branches that are known to be compatible with the mainline (main development strand).

ours: Can merge any number of branches, but does not use a merge algorithm. Instead, the blobs or trees of the current branch (that is, the branch from which you entered git merge) are always used. This strategy is mainly used when you want to overwrite old developments with the current state of affairs.

subtree: Works like recursive, but the strategy does not compare the trees “on equal footing,” but tries to find the tree of one side as a subtree of the other side and only then merge them. This strategy is useful, for example, if you manage the Documentation/ subdirectory of your project in a separate repository. Then you can merge the changes from that repository into the master repository by using git pull -s subtree <documentation-repo> to apply the subtree strategy, which recognizes the contents of <documentation-repo> as a subdirectory of the master repository and applies the merge process only to that subdirectory. This topic is discussed in more detail in Sec. 5.11, “Managing Subprojects”.

3.3.4. Options for the Recursive Strategy

The default strategy recursive knows several options that adjust the behavior especially with regard to conflict resolution. You specify them with the option -X; the syntax is:

git merge -s recursive -X <option> <branch>

If you only merge two branches, you do not need to explicitly specify the recursive strategy by -s recursive.

Since the strategy can only merge two branches, it is possible to speak of our version and theirs: our version is the checked-out branch in the merge process, while their version references the branch you want to integrate.

ours: If a merge conflict occurs that would normally need to be resolved manually, our version is used instead. This option differs from the ours strategy, however: the strategy ignores any changes made by the other side(s), whereas the ours option takes all changes made by our side and by the other side, and gives priority to our side only in the event of a conflict, and only at the conflicting places.

theirs: Like ours, except that the opposite is true: in case of conflicts, their version is preferred.

ignore-space-change, ignore-all-space, ignore-space-at-eol: Since whitespace does not play a syntactic role in most languages, these options allow you to tell Git to try to resolve a merge conflict automatically if whitespace is not important. A common use case is when an editor or IDE has automatically reformatted source code.

The option ignore-space-at-eol ignores whitespace at the end of the line, which is especially helpful if both sides use different line-end conventions (LF/CRLF). If you specify ignore-space-change, whitespace is also treated as a pure separator: Thus, when comparing a line, it is irrelevant how many spaces or tabs are in one place — indented lines remain indented, and separated words remain separated. The option ignore-all-space ignores any whitespace.

This is the general strategy: If their version brings in only whitespace changes covered by the specified option, they are ignored and our version is used; if they bring in further changes and our version has only whitespace changes, their version is used. However, if both sides have not only whitespace changes, there is still a merge conflict.

In general, after a merge that you could only solve by using one of these options, it is recommended to normalize the corresponding files again, i.e. to make the line endings and indentations uniform.

subtree=<tree>

Similar to the subtree strategy, but an explicit path is specified here. Similar to the above example, you would use:

git pull -Xsubtree=Documentation <documentation-repo>
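The difference between the ours option and the ours strategy described above can be observed directly. In this sketch (scratch repository; the branch name other and the files shared.txt and extra.txt are invented), the other branch changes a conflicting line and also adds a new file. The example is merged once with -X ours and once with -s ours:

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.name "Example User"
git config user.email "user@example.com"

echo "original" > shared.txt
git add shared.txt
git commit -q -m "Base"

git checkout -q -b other
echo "their change" > shared.txt
echo "extra" > extra.txt
git add shared.txt extra.txt
git commit -q -m "Their changes"

git checkout -q master
echo "our change" > shared.txt
git commit -q -a -m "Our change"

# Option: conflicts are decided in our favor, everything else is merged
git merge -q -X ours -m "Merge with -X ours" other
option_shared=$(cat shared.txt)
option_extra=$([ -e extra.txt ] && echo present || echo absent)

git reset -q --hard HEAD^         # drop the merge commit again

# Strategy: the other side's changes are ignored completely
git merge -q -s ours -m "Merge with -s ours" other
strategy_shared=$(cat shared.txt)
strategy_extra=$([ -e extra.txt ] && echo present || echo absent)

echo "-X ours: shared.txt='$option_shared', extra.txt $option_extra"
echo "-s ours: shared.txt='$strategy_shared', extra.txt $strategy_extra"
```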

3.4. Resolving Merge Conflicts

As already described, some conflicts cannot be resolved by algorithms — in this case manual rework is necessary. Good team coordination and fast integration cycles can minimize major merge conflicts. But especially in early development, when possibly the internals of a software are changed instead of adding new features, conflicts can occur.

If you are working in a larger team, the developer who has done most of the work on the conflicted code is usually responsible for finding a solution. However, such a conflict resolution is usually not difficult if the developer has a good overview of the software in general and of his piece of code and its interaction with other parts in particular.

We will go through the solution of a merge conflict using a simple example in C. Take a look at the following output.c file:

int i;

for(i = 0; i < nr_of_lines(); i++)
    output_line(i);

print_stats();

The piece of code goes through all lines of an output and prints them one after the other. Finally it prints brief statistics.

Now two developers change something in this code. The first one, Axel, writes a function that wraps the lines before they are output and replaces output_line in the above piece of code with his improved version output_wrapped_line:

int i;
int tw = 72;

for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);

print_stats();

The second developer, Beatrice, modifies the code so that her newly introduced configuration setting max_output_lines is honored and not too many lines are output:

int i;

for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}

print_stats();

So Beatrice uses the “obsolete” version output_line, and Axel does not yet have the construct that checks the configuration setting.

Now Beatrice tries to integrate her changes from branch B into the branch master, where Axel has already integrated his changes:

$ git checkout master
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Automatic merge failed; fix conflicts and then commit the result.

In the output.c file, Git now places conflict markers (highlighted in semi-bold below) to indicate where changes overlap. There are two sides: the first is HEAD, i.e. the branch into which Beatrice wants to integrate the changes, in this case master. The other side is the branch to be integrated, B. The two sides are separated by a row of equal signs:

int i;
int tw = 72;

<<<<<<< HEAD
for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);
=======
for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}
>>>>>>> B

print_stats();

Note that Git flags only the actually conflicting changes for Beatrice to resolve. Axel’s definition of tw above is merged without any problems, although it does not yet exist on Beatrice’s branch.

Beatrice must now resolve the conflict. This is done by first editing the file directly, modifying the code as it should be, and then removing the conflict markers. If Axel has documented in detail in his commit message⁠^[40] how his new function works, this should be done quickly:

int i;
int tw = 72;

for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_wrapped_line(i, tw);
}

print_stats();

Beatrice must then add the changes using git add. If no conflict markers remain in the file, this signals to Git that the conflict has been resolved. Finally, the result has to be checked in:

$ git add output.c
$ git commit

The commit message should definitely state how this conflict was resolved. It should also mention possible side effects on other parts of the program.

Normally, merge commits are “empty”, i.e., there is no diff output in git show (because the changes were caused by other commits). This is different in the case of a merge commit that resolves a conflict:

$ git show
commit 6e6c55810c884356402c078f30e45a997047058e
Merge: f894659 256329f
Author: Beatrice <beatrice@gitbu.ch>
Date:   Mon Feb 28 05:59:36 2011 +0100

    Merge branch 'B'

    * B:
      honor max_output_lines config option

    Conflicts:
        output.c

diff --cc output.c
index a2bd8ed,f4c8bec..e39e39d
--- a/output.c
+++ b/output.c
@@@ -1,7 -1,9 +1,10 @@@
  int i;
 +int tw = 72;

- for(i = 0; i < nr_of_lines(); i++)
+ for(i = 0; i < nr_of_lines(); i++) {
+     if(i > config_get("max_output_lines"))
+         break;
 -    output_line(i);
 +    output_wrapped_line(i, tw);
+ }

  print_stats();

This combined diff output differs from the usual unified diff format: There is not just one column with the markers for added (+), removed (-) and context or unchanged (␣), but two. Git compares the result with both parents. The changes in the second column correspond exactly to Axel’s commit; the (semi-bold) changes in the first column are Beatrice’s commit including the conflict resolution.

The standard procedure, as seen above, is the following:

1. Open the conflicting file
2. Resolve the conflict and remove the markers
3. Mark the file as “resolved” via git add
4. Repeat steps one to three for all files in which conflicts occurred
5. Check in the conflict resolutions via git commit
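
The procedure can be sketched as a small script. This is a minimal sketch under stated assumptions, not a recommended workflow: the throwaway repository, the file names a.txt and b.txt, and the “resolution” that simply overwrites each file are made up for illustration. It uses git diff --name-only --diff-filter=U to list the files that are still unmerged.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

# Base version, then conflicting changes to both files on B and on master
printf 'base\n' > a.txt
printf 'base\n' > b.txt
git add a.txt b.txt
git commit -q -m "base version"
git checkout -q -b B
printf 'from B\n' > a.txt
printf 'from B\n' > b.txt
git commit -q -am "change both files on B"
git checkout -q master
printf 'from master\n' > a.txt
printf 'from master\n' > b.txt
git commit -q -am "change both files on master"

git merge B >/dev/null 2>&1 || true         # stops with conflicts in both files

# Visit every unmerged file, resolve it, and mark it as resolved
unresolved=$(git diff --name-only --diff-filter=U)
for f in $unresolved; do
    printf 'from master and B\n' > "$f"     # stand-in for editing the file by hand
    git add "$f"
done
remaining=$(git diff --name-only --diff-filter=U)

# Check in the conflict resolution; the merge commit has B as its second parent
git commit -q --no-edit
second_parent=$(git rev-parse HEAD^2)
echo "files that had conflicts: $unresolved"
```

In a real conflict, the loop body would of course be replaced by editing each file by hand or by a merge tool.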

If you don’t know how to resolve the conflict on the spot (for example, because you want to ask the original developer to produce a conflict-free version of the code), you can use git merge --abort to abort the merge process, that is, to restore your working tree to the state it was in before you started the merge. This command also aborts a merge that you have already partially resolved. Caution: All changes that have not been checked in will be lost.

To get an overview of which commits caused changes to your file relevant to the merge conflict, you can use the command

git log --merge -p -- <file>

Git then lists the diffs of commits that have made changes to <file> since the merge base.

During a merge conflict, a file with conflicts is stored in the index in three stages: Stage one contains the version of the file in the merge base (that is, the common original version of the file), stage two contains the version from HEAD (that is, the version from the branch into which you are merging). Finally, stage three contains the version from the branch you are merging in (this has the symbolic reference MERGE_HEAD). The working tree contains the combination of these three stages with conflict markers. You can display each of these versions with git show :<n>:<file>:

$ git show :1:output.c
$ git show :2:output.c
$ git show :3:output.c
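
To see the three stages side by side, the following sketch provokes a conflict in a throwaway repository and reads each stage back. The repository, the file name file.txt, and its contents are assumptions made for illustration; git ls-files -u lists the unmerged index entries together with their stage numbers.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base version"
git checkout -q -b B
printf 'theirs\n' > file.txt
git commit -q -am "change on B"
git checkout -q master
printf 'ours\n' > file.txt
git commit -q -am "change on master"

git merge B >/dev/null 2>&1 || true         # conflict expected

git ls-files -u                             # one entry per stage: mode, blob ID, stage, path
stage1=$(git show :1:file.txt)              # merge base
stage2=$(git show :2:file.txt)              # HEAD, the branch being merged into
stage3=$(git show :3:file.txt)              # MERGE_HEAD, the branch being merged in
echo "$stage1 / $stage2 / $stage3"
```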

With a program specially developed for 3-way merges, however, it is much easier for you to keep an overview. The program looks at the three stages of a file, visualizes them accordingly and offers you options to move changes back and forth.

3.4.1. Help with Merging: Mergetool

In the case of non-trivial merge conflicts, a merge tool is recommended that visualizes the three stages of a file accordingly, thereby facilitating the resolution of the conflict.

Common IDEs and editors such as Vim and Emacs offer such a mode. There are also external tools such as KDiff3⁠^[41] and Meld.⁠^[42] The latter visualizes particularly well how a file has changed between commits.

Figure 19. The example merge conflict, visualized in the merge tool “Meld”

You launch such a merge tool via git mergetool. Git will go through all the files that contain conflicts and display each one (when you press enter) in a merge tool. By default this is Vimdiff.⁠^[43]

Such a program will usually display the three versions of a file (our side, their side, and the file merged as far as possible, including conflict markers) in three columns side by side, the latter sensibly in the middle. It is essential that you always make the change (the conflict resolution) in the middle file, i.e. in the working copy. The other files are temporary and are deleted again when the merge tool exits.

In principle, you can use any other tool. The mergetool script simply stores the three stages of the file under corresponding file names and starts the merge tool on these three files. When the tool exits, Git checks whether there are any conflict markers left in the file. If not, Git assumes that the conflict was resolved successfully and automatically adds the file to the index using git add. Finally, when you have finished processing all the files, you only need a single git commit call to seal the conflict resolution.

The merge.tool option determines which tool Git starts on the file. The following commands are already preconfigured, meaning that Git already knows in which order the program expects the arguments and which additional options need to be specified:

araxis bc3 codecompare deltawalker diffmerge diffuse
ecmerge emerge gvimdiff gvimdiff2 gvimdiff3 kdiff3
meld opendiff p4merge tkdiff tortoisemerge
vimdiff vimdiff2 vimdiff3 xxdiff

To use your own merge tool, you must set merge.tool to a suitable name, for example mymerge, and then at least specify the mergetool.mymerge.cmd option. The shell evaluates the expression stored in it; the variables BASE, LOCAL, REMOTE, and MERGED are set to the corresponding files, with MERGED referring to the file that contains the conflict markers. You can configure further properties of your merge command; see the mergetool section of the git-config(1) man page.
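
A configuration for such a custom tool might look like the following sketch. The program mymerge and all of its command-line flags are hypothetical placeholders; only the option names merge.tool, mergetool.<tool>.cmd and mergetool.<tool>.trustExitCode and the four variables come from Git. The settings are written to a throwaway repository here so that nothing in your real configuration is touched.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# "mymerge" and its flags are hypothetical; adapt them to your actual tool
git config merge.tool mymerge
git config mergetool.mymerge.cmd \
    'mymerge --base "$BASE" --ours "$LOCAL" --theirs "$REMOTE" --output "$MERGED"'
# Do not use the tool's exit status to judge whether the merge succeeded
git config mergetool.mymerge.trustExitCode false

configured_tool=$(git config --get merge.tool)
configured_cmd=$(git config --get mergetool.mymerge.cmd)
echo "$configured_tool: $configured_cmd"
```

The single quotes matter: they keep the shell from expanding the variables when the option is set, so that they are only filled in when git mergetool runs the command.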

If you temporarily (not permanently) decide to use another merge program, specify it with the -t <tool> option. So to try Meld, during a merge conflict, simply type git mergetool -t meld — of course Meld must be installed for this to work.

3.4.2. Rerere: Reuse Recorded Resolution

Git has a relatively unknown (and poorly documented), but very helpful feature: Rerere, short for Reuse Recorded Resolution. You need to set the rerere.enabled option to true to have the command called automatically (note the d at the end of enabled).

The idea behind Rerere is simple but effective: Whenever a merge conflict occurs, Rerere automatically records a pre-image, an image of the conflict file including markers. In the case of the example above, it would look like this:

$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Recorded preimage for 'output.c'
Automatic merge failed; fix conflicts and then commit the result.

If the conflict is resolved as above and the solution is checked in, Rerere saves the conflict resolution:

$ vim output.c
$ git add output.c
$ git commit
Recorded resolution for 'output.c'.
[master 681acc2] Merge branch 'B'

So far Rerere has not really helped. But now we can delete the merge commit completely (and are back to the situation before the merge). Then we execute the merge again:

$ git reset --hard HEAD^
HEAD is now at f894659 wrap output at 72 chars
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Resolved 'output.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.

Rerere notices that the conflict is known and that a solution has already been found.⁠^[44] So Rerere calculates a 3-way-merge between the saved pre-image, the saved solution and the version of the file in the working tree. This way Rerere can resolve not only the same conflicts, but also similar ones (if in the meantime further lines outside the conflict area have been changed).

The result is not added to the index directly. The solution is simply written to the file in the working tree. You can then use git diff to check whether the solution looks sensible, run tests if necessary, etc. If everything looks good, you accept the automatic solution via git add as usual.
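
The whole cycle can be reproduced in a throwaway repository. The following is a minimal sketch with a made-up file file.txt; it checks that after the repeated merge the working tree already contains the recorded solution while the index still lists the file as unmerged.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true              # switch Rerere on for this repository

printf 'base\n' > file.txt
git add file.txt
git commit -q -m "base version"
git checkout -q -b B
printf 'from B\n' > file.txt
git commit -q -am "change on B"
git checkout -q master
printf 'from master\n' > file.txt
git commit -q -am "change on master"

git merge B >/dev/null 2>&1 || true         # conflict: Rerere records the pre-image
printf 'from master and B\n' > file.txt     # resolve by hand
git add file.txt
git commit -q --no-edit                     # Rerere records the resolution

git reset -q --hard HEAD^                   # throw the merge commit away ...
git merge B >/dev/null 2>&1 || true         # ... and repeat the merge

content=$(cat file.txt)                               # the recorded resolution
unmerged=$(git diff --name-only --diff-filter=U)      # the index is still unmerged
echo "working tree: $content"
echo "still unmerged in the index: $unmerged"
```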

3.4.2.1. Why Rerere Makes Sense

One might object: Who would voluntarily discard a merge conflict resolution (possibly a costly one) only to repeat the merge at some later point?

However, the procedure is desirable: First of all, it doesn’t make sense to simply periodically and out of habit merge the mainline — i.e. the main development thread, e.g. master — into the topic branch (we will come back to this later). But if you have a long-lived topic branch and want to test it occasionally to see if it is compatible with the mainline, you don’t want to resolve the conflicts by hand every time — once resolved, Rerere will resolve conflicts automatically. This way you can successively develop your feature, knowing that it is in conflict with the mainline. But at the time of the integration of the feature the conflicts are all automatically resolvable (because you have occasionally saved conflict solutions with Rerere).

In addition, Rerere is also called automatically in conflict cases that arise in a rebase process (see Sec. 4.1, “Moving commits — Rebase”). Again, once conflicts have been resolved, they can be automatically resolved again. Once you have merged a branch into the mainline for test purposes and resolved a conflict, this solution is automatically applied when you rebuild this branch on the mainline via rebase.

3.4.2.2. Using Rerere

In order for the Rerere functionality to be used, you must set the rerere.enabled option to true, as mentioned above. Rerere will then be called automatically when a merge conflict occurs (to capture the pre-image, possibly to resolve the conflict) and when a conflict resolution is checked in (to save the resolution).

Rerere stores information such as pre-image and resolution in .git/rr-cache/, uniquely identified by a SHA-1 sum. You almost never need to call the git rerere subcommand, as it is already handled by merge and commit. You can also use git rerere gc to delete very old solutions.

What happens if a wrong conflict resolution was checked in? Then you should delete the recorded resolution, otherwise Rerere will reapply it when you repeat the conflicted merge. To do this, there is the command git rerere forget <file>: directly after Rerere has applied a wrong solution, you can delete the wrong solution in this way and restore the original state of the file (i.e. with conflict markers). If you only want to do the latter, git checkout -m <file> will also help.

3.4.3. Avoiding Conflicts

Decentralized version control systems generally manage merges much better than central ones. This is mainly due to the fact that it is common practice in decentralized systems to check in many small changes locally first. This avoids “monster commits”, which offer much more potential for conflict. This finer-grained development history and the fact that merges are recorded as data in the version history (as opposed to simply copying lines of code) mean that decentralized systems do not have to rely on the mere contents of files when merging.

Prevention is the best way to minimize merge conflicts. Make small commits! Combine your changes so that the resulting commit makes sense as a unit. Always build Topic Branches on the latest release. Merge from topic branches into “collection branches” or directly into master, not the other way around.⁠^[45] Using Rerere prevents conflicts that have already been resolved from constantly reoccurring.

Obviously, good communication among developers is also important for prevention: If several developers implement different and mutually influencing changes to the same function, this will certainly lead to conflicts sooner or later.

Another factor that unfortunately often leads to unnecessary(!) conflicts is autogenerated content. Suppose you write the documentation of a piece of software in AsciiDoc⁠^[46] or work on a LaTeX project with several contributors: Never add the compiled man pages or the compiled DVI/PS/PDF to the repository! In the autogenerated formats, small changes to the plaintext (i.e. in the AsciiDoc or LaTeX version) can cause large (and unpredictable) changes to the compiled formats that Git cannot resolve adequately. Instead, it makes sense to provide appropriate Makefile targets or scripts to generate the files, and possibly keep the compiled version on a separate branch.⁠^[47]

3.5. Taking over Individual Commits: Cherry Picking

It will happen that you don’t want to integrate an entire branch right away, but only parts of it first, i.e. individual commits. The Git command cherry-pick (“pick the good cherries”) is responsible for this.

The command expects one or more commits to be copied to the current branch. For example:

$ git cherry-pick d0c915d
$ git cherry-pick topic~5 topic~1
$ git cherry-pick topic~5..topic~1

The middle command copies two explicitly specified commits; the last command, on the other hand, copies all commits belonging to the specified commit range.

Unlike a merge, however, only the changes are integrated, not the commit itself. To integrate the commit itself, Git would also have to integrate the predecessor it references, then that commit’s predecessor, and so on, which is equivalent to a merge. So when you take over commits with cherry-pick, new commits are created with new commit IDs. Git can’t know that these commits are actually the same.

So if you are merging two branches that you have cherry-picked changes between, conflicts can occur.⁠^[48] These are usually trivial to resolve, and the strategy options ours and theirs might be helpful (see Sec. 3.3.4, “Options for the Recursive Strategy”). The rebase command, on the other hand, recognizes such commit duplications,⁠^[49] and omits the duplicated commits. This allows you to take some commits “from the middle” and then rebuild the branch the commits came from.
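
The difference between “same change” and “same commit” can be made visible with git cherry, which compares commits by the change they introduce rather than by their ID. The following is a minimal sketch in a throwaway repository; the branch name topic and all file names are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

printf 'base\n' > base.txt
git add base.txt
git commit -q -m "initial commit"
git checkout -q -b topic
printf 'fix\n' > fix.txt
git add fix.txt
git commit -q -m "bug fix"
printf 'feature\n' > feature.txt
git add feature.txt
git commit -q -m "new feature"
git checkout -q master
printf 'other\n' > other.txt
git add other.txt
git commit -q -m "unrelated work on master"

git cherry-pick topic~1 >/dev/null          # copy only the bug fix to master
original=$(git rev-parse topic~1)
copy=$(git rev-parse HEAD)                  # same change, but a new commit ID

# "-" marks commits on topic whose change already exists on master, "+" the others
git cherry -v master topic
duplicates=$(git cherry master topic | grep -c '^-')
```

Here the bug fix is reported with a leading minus sign because master already contains an equivalent change, whereas the feature commit is reported with a plus sign.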

The cherry-pick command also understands these merge strategy options itself: If you want to copy a commit to the current branch and, in case of conflict, let the changes from the copied commit win, use:

git cherry-pick -Xtheirs <commit>

The -n or --no-commit option tells Git to apply the changes from a commit to the index, but not to create a commit yet. This allows you to “aggregate” several small commits in the index first, and then package them as one commit:

$ git cherry-pick -n 785aa39 512f3e9 4e4a063
Finished one cherry-pick.
Finished one cherry-pick.
Finished one cherry-pick.
$ git commit -m "Various small changes"

3.6. Visualizing Repositories

Once you have created and merged a few branches, you will have noticed that it’s easy to lose track.

The arrangement of commits and their relationships to each other is called the topology of a repository. In the following, we will introduce the graphical program gitk, among other things, to examine these topologies.

For small repositories, first call gitk --all, which displays the entire repository as a graph. Clicking on the individual commits displays the meta-information as well as the generated patch.

3.6.1. Revision Parameters

Since the listing of multiple commits is hard to keep track of, we examine a small sample repository with several branches merged together:

Figure 20. The graph of commits as displayed in gitk

We recognize four branches (A-D) and one tag release. We can also display this tree on the console with the appropriate command line options using the log command (branch and tag names are printed in semi-bold for better distinction):

$ git log --decorate --pretty=oneline --abbrev-commit --graph --all
* c937566 (HEAD, D) commit on branch D
| *   b0b30ef (release, A) Merge branch 'C' into A
| |\
| | * 807db47 (C) commit on branch C
| | * 996a53b commit on branch C
| |/
|/|
| * 83f6bf3 commit on branch A
| *   5b2c291 Merge branch 'B' into A
| |\
| | * 2417cf7 (B) commit on branch B
| |/
|/|
| * 0bf1433 commit on branch A
|/
* 4783886 initial commit

The output of the log command is equivalent to the view in Gitk. However, git log is much faster than Gitk and does not require another program window.

So for a quick overview, it’s much more convenient to set up an alias that automatically adds the many long options. The authors use the alias tree for this, which you can define as follows:

$ git config --global alias.tree \
   'log --decorate --pretty=oneline --abbrev-commit --graph'

By using git tree --all you get an ASCII version of the graph of the git repository. In the following, we use this alias to represent the topology.

Now we change the above command: instead of the --all option, which puts all commits in the tree, we now specify B (the name of the branch)

$ git tree B
* 2417cf7 (B) commit on branch B
* 4783886 initial commit

We receive all commits that are accessible from B. A commit only knows its predecessor(s) (several if branches are merged). “All commits reachable from B” thus refers to the list of commits from B onwards, up to a commit that has no predecessor (called a root commit).

Instead of one, the command can also accept multiple references. So to get the same output as with the --all option, it is enough to specify the references A and D. B and C can be omitted because their commits are already “collected” on the way from A to the root commit.

Of course, you can also specify an SHA-1 sum directly instead of symbolic references:

$ git tree 5b2c291
*   5b2c291 Merge branch 'B' into A
|\
| * 2417cf7 (B) commit on branch B
* | 0bf1433 commit on branch A
|/
* 4783886 initial commit

If a reference is preceded by a caret (^), this negates the meaning.⁠^[50] So the notation ^A means: not the commits that are accessible from A. However, this switch only excludes commits; it does not include any others. So the above log command with the argument ^A alone will not output anything, because Git only knows which commits should not be displayed. So again, we add --all to list all commits, minus those that are accessible from A:

$ git tree --all ^A
* c937566 (HEAD, D) commit on branch D

An alternative notation is available with --not: Instead of ^A you can also write --not A.

Such commands are especially useful for examining the difference between two branches: Which commits are in branch D that are not in A? The command returns the answer:

$ git tree D ^A
* c937566 (HEAD, D) commit on branch D

Because this question is often asked, there is another, more intuitive notation for it: A..D is equivalent to D ^A:

$ git tree A..D
* c937566 (HEAD, D) commit on branch D

Of course the order is important here: “D without A” is a different set of commits than “A without D”! (Compare also the complete graph.)

In our example there is a tag release. To check which commits from branch D (which could stand for “Development”) are not yet included in the current release, simply specify release..D.

The syntax A..B can be remembered as the idiom “from A to B”. However, this “difference” is not symmetrical, i.e. A..B are usually not the same commits as B..A.

Alternatively, Git provides the symmetrical difference A...B (with three dots). It is equivalent to the argument A B --not $(git merge-base A B), so it includes all the commits that can be reached from A or B, but not from both.
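
The three notations can be compared by counting the commits each of them selects. The following minimal sketch builds a throwaway repository with one commit on branch A and two on branch B (a made-up layout, not the sample repository from the figure) and uses git rev-list --count to do the counting.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "initial commit"
git branch A
git branch B
git checkout -q A
git commit -q --allow-empty -m "commit on branch A"
git checkout -q B
git commit -q --allow-empty -m "first commit on branch B"
git commit -q --allow-empty -m "second commit on branch B"

only_in_b=$(git rev-list --count A..B)      # reachable from B, but not from A
only_in_a=$(git rev-list --count B..A)      # reachable from A, but not from B
symmetric=$(git rev-list --count A...B)     # reachable from A or B, but not from both
explicit=$(git rev-list --count A B --not $(git merge-base A B))
echo "A..B: $only_in_b  B..A: $only_in_a  A...B: $symmetric  explicit: $explicit"
```

In this layout the symmetric difference is simply the sum of the two one-sided differences, and the explicit form with git merge-base selects the same commits.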

3.6.1.1. Reference vs. List of References

In the example, A always refers to all commits that are accessible from A. But actually a branch is just a reference to a single commit. So why does log always list all commits reachable from A, while the git command show with the argument A only shows this one commit?

The difference is what the commands expect as an argument: show expects an object, that is, a reference to a single object, which is then displayed.⁠^[51] Many other commands expect one (or more) commits instead, and these commands convert the arguments into a list of commits (traversing the commit graph down to the root commit).

3.6.2. Gitk

Gitk is a graphical program implemented in Tcl, which is usually packaged by distributors along with the actual Git commands — so you can be sure to find it on almost any system.

It represents individual commits or the entire repository in a three-part view: at the top is the tree structure with two additional columns for author and date, below is a list of changes in unified diff format, and a list of files to restrict the changes displayed.

The graph view is intuitive: Different colors help to distinguish the different strands of development. Commits are always blue dots, with two exceptions: The HEAD is highlighted in yellow, and a commit that is not a root commit, but whose predecessor is not displayed, is shown in white.

Lines ending in an arrowhead indicate that further commits have been made on the branch. Gitk hides that part of the line, however, because the commits are far apart in time. A click on the arrowhead will take you to the continuation of the branch.

Branches appear as green labels, the currently checked out branch additionally bold. Tags are shown as yellow arrows.

You can delete or check out a branch with a right click on it. Right-clicking on commits opens a menu in which you can perform actions on the selected commit. The only thing that might be easier to do with Gitk than from the command line is cherry picking, i.e. transferring individual commits to another branch (see also Sec. 3.5, “Taking over Individual Commits: Cherry Picking”).

Figure 21. Complex topology in Gitk

Gitk accepts essentially the same options as git log. Some examples:

$ gitk --since=yesterday -- doc/
$ gitk e13404a..48effd3
$ gitk --all -n 100

The first command shows all commits since yesterday that have made changes to a file under the doc/ directory. The second command limits the commits to a specific range, while the third command shows the 100 most recent commits from all branches.

Experience shows that beginners are often confused because gitk by default only shows the current branch. This is probably because gitk is often called to get an overview of all branches. Therefore the following shell alias is useful: alias gik='gitk --all'.

Many users leave gitk open during work. Then it’s important to update the display from time to time so that more recent commits appear. With F5 (Update) you load all new commits and refresh the display of the references. Sometimes, however, if you delete a branch, for example, this is not enough. Although the branch is no longer displayed, there may still be unreachable commits in the GUI as artifacts. The key combination Ctrl+F5 (Reload) completely reloads the repository, which solves the problem.

As an alternative to gitk, you can use the GTK-based gitg or Qt-based qgit on UNIX systems; on an OS X system, for example, you can use GitX; for Windows, you can use GitExtensions. Some IDEs now also have corresponding visualizations (e.g. the Eclipse plugin EGit). Furthermore, you can use full-fledged Git clients like Atlassian SourceTree (OS X, Windows; free of charge), Tower (OS X; commercial) as well as SmartGit (Linux, OS X and Windows; free for non-commercial use).

3.7. Reflog

The reference log (reflog) consists of log files that Git creates for each branch and for HEAD. They record when a reference was moved, and from where to where. This happens especially with the checkout, reset, merge and rebase commands.

These log files are stored under .git/logs/ and are named after the reference. The reflog for the master branch can be found under .git/logs/refs/heads/master. There is also the command git reflog show <reference> to list the reflog:

$ git reflog show master
48effd3 master@{0}: HEAD^: updating HEAD
ef51665 master@{1}: rebase -i (finish): refs/heads/master onto 69b9e27
231d0a3 master@{2}: merge @{u}: Fast-forward
...

The Reflog command is rarely used directly and is just an alias for git log -g --oneline. In fact, the -g option causes the command not to show the predecessors in the commit graph, but to process the commits in the order in which they were reflogged.

You can easily try this: Create a test commit, then delete it again with git reset --hard HEAD^. The command git log -g will now first show the HEAD, then the deleted commit, and then the HEAD again.

The reflog thus also references commits that are otherwise no longer referenced, i.e. are “lost” (see Sec. 3.1.2, “Managing Branches”). The reflog can help you if you have deleted a branch that you would have needed after all. A git branch -D also deletes the branch’s own reflog, but you had to check out the branch in order to commit to it, so use git log -g HEAD to find the last time you checked out the branch you are looking for. Then create a branch that points to this (seemingly lost) commit ID, and your lost commits should be back.⁠^[52]
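
Such a rescue can be rehearsed in a throwaway repository. In the following minimal sketch the branch names feature and feature-restored are made up, and the lost commit happens to sit at HEAD@{1}; in a real repository you would look up the right entry in the output of git log -g HEAD first.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "initial commit"
git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
lost=$(git rev-parse HEAD)

git checkout -q master
git branch -D feature >/dev/null            # the branch and its own reflog are gone

git log -g --oneline HEAD                   # the reflog of HEAD still knows the commit
found=$(git rev-parse 'HEAD@{1}')           # the entry just before the switch back to master
git branch feature-restored "$found"
restored=$(git rev-parse feature-restored)
```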

Commands that expect one or more references can also implicitly use the reflog. In addition to the syntax already found in the output of git log -g (e.g. HEAD@{1} for the previous position of the HEAD), Git also understands <ref>@{<when>}. Git interprets the time <when> as an absolute or relative date and then consults the reflog of the corresponding reference to find the log entry closest to that point in time. This entry is then referenced.

Two examples:

$ git log 'master@{two weeks ago}..'
$ git show '@{1st of April, 2011}'

The first command lists all commits between HEAD and the commit the master branch pointed to two weeks ago (note the suffix .. which means a commit range up to HEAD). This doesn’t necessarily have to be a commit that is two weeks old: if, as a test, you moved the branch to the very first commit in the repository two weeks ago using git reset --hard <initial-commit>, then that very commit will be referenced.⁠^[53]

The second command shows the commit to which the currently checked out branch (because there is no explicit reference before the @) pointed on April 1, 2011. In both commands, the argument with the reflog suffix must be enclosed in quotation marks so that the shell passes it to Git as a single, complete argument.

Note that the reflog is only available locally and therefore does not belong to the repository. If you send a commit ID or tag name to another developer, it references the same commit, but a master@{yesterday} can reference different commits depending on the developer.

If you don’t specify a branch and time, Git will assume HEAD. This allows you to use @ as the short form for HEAD in commands. Furthermore, many commands understand the argument - as @{-1}, which stands for the branch that was checked out before the current one:

$ git checkout feature   # previously on "master"
$ git commit ...         # make changes and commits
$ git checkout -         # back to "master"
$ git merge -            # merge "feature"

4. Advanced Concepts

The following chapter covers selected advanced concepts. The focus is on the Rebase command with its many applications. We find out who changed a line in the source code (Blame) and when, and how to tell Git to ignore files and directories. We’ll also look at how to stash changes to the working tree and annotate commits (Notes). Finally, we show you how to quickly and automatically find commits that introduce a bug (Bisect).

4.1. Moving commits — Rebase

In the section on Git’s internals, we mentioned earlier that you can move and modify commits in a Git repository (graphically speaking) at will. In practice, this is made possible primarily by the git command rebase. This command is very powerful and important, but sometimes a bit more demanding to use.

Rebase is a coined word which means “to put something on a new base”. What it means is that a group of commits is moved around within the commit graph, rebuilding commit after commit on top of another node. The following graphics illustrate how this works:

Figure 22. Before the rebase

Figure 23. …and after that

In its simplest form the command is git rebase <reference> (in the above diagram: git rebase master). This means that Git first marks all commits <reference>..HEAD, i.e. the commits that can be reached from HEAD (the current branch) minus the commits that can be reached from <reference> - in other words, everything that is in the current branch but not in <reference>. In the diagram, these are E and F.

The list of these commits is stored temporarily. Git then checks out the commit <reference> and copies the individual cached commits in the original order as new commits to the branch.

There are a few points to consider:

Because the first node of the topic branch (E) now has a new predecessor (D), its metadata and thus its SHA-1 sum changes (it becomes E_). The second commit (F) then also has a different predecessor (E_ instead of E), its SHA-1 sum changes (it becomes F_) and so on - this is also called the ripple effect. Overall, all copied commits will have new SHA-1 sums - so they’re the same (in terms of changes), but not identical.

Such an action, just like a merge operation, can result in conflicting changes. Git can partially resolve them automatically, but aborts with an error message if the conflicts are not trivial. The rebase process can then either be “repaired” and continued, or aborted (see below).

If no other reference points to node F, it will be lost, because reference HEAD (and the corresponding branch, if applicable) will be shifted to node F_ in case of a successful rebase. So if F has no more reference (and no predecessors referencing F), Git can no longer find the node, and the tree “disappears”. If you’re not sure whether you need the original tree again, you can simply reference it with the tag command, for example. In that case, the commits will be preserved even after a rebase (but then in duplicate at different places in the commit graph).
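
The points above can be checked in a throwaway repository. The following is a minimal sketch; the branch name topic, the tag name topic-before-rebase and the file names are made up for illustration. The tag keeps the original commits reachable, while the rebased branch ends up with new commit IDs on top of master.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

printf 'c\n' > c.txt
git add c.txt
git commit -q -m "C"
git checkout -q -b topic
printf 'e\n' > e.txt
git add e.txt
git commit -q -m "E"
printf 'f\n' > f.txt
git add f.txt
git commit -q -m "F"
git checkout -q master
printf 'd\n' > d.txt
git add d.txt
git commit -q -m "D"

git checkout -q topic
git tag topic-before-rebase                 # keep a reference to the original F
old_tip=$(git rev-parse HEAD)
git rebase -q master                        # copy E and F on top of D
new_tip=$(git rev-parse HEAD)

kept=$(git rev-parse topic-before-rebase)   # still points to the original commit
base=$(git merge-base master topic)         # topic now starts at the tip of master
echo "before: $old_tip"
echo "after:  $new_tip"
```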

4.1.1. An Example

Consider the following situation: The sqlite-support branch branches off from the “fixed a bug…” commit. But the master branch has already moved on, and a new 1.4.2 release has been made.

Figure 24. Before the rebase

Now sqlite-support is checked out and rebased onto master:

$ git checkout sqlite-support
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: include sqlite header files, prototypes
Applying: generalize queries
Applying: modify Makefile to support sqlite

Rebase applies the three changes introduced by commits from the sqlite-support branch to the master branch. After that, the repository looks like this in Gitk:

Figure 25. After rebase

4.1.2. Extended Syntax and Conflicts

Normally git rebase will always rebuild the branch you are currently working on onto a new base. However, there is a shortcut: If you want to base topic on master, but you are on a completely different branch, you can do this via

$ git rebase master topic

Git does the following internally:

$ git checkout topic
$ git rebase master

Please note the (unfortunately not very intuitive) order:

git rebase <on which> <what>

A rebase can lead to conflicts. The process then stops with the following error message:

$ git rebase master
...
CONFLICT (content): Merge conflict in <file>
Failed to merge in the changes.
Patch failed at ...
The copy of the patch that failed is found in:
   .../.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

You proceed as with a regular merge conflict (see Sec. 3.4, “Resolving Merge Conflicts”) - git mergetool is very helpful here. Then simply add the changed file via git add and let the process continue via git rebase --continue.⁠^[54]

Alternatively, the problematic commit can be skipped using the git rebase --skip command. The commit is then lost unless it is referenced in another branch somewhere else! So you should only perform this action if you are certain that the commit is obsolete.

If none of this helps (e.g. if you can’t solve the conflict at that point, or if you realize that you are rebuilding the wrong tree), pull the emergency brake: git rebase --abort. This will discard all changes to the repository (including successfully copied commits), so that the state afterwards is exactly the same as it was when the rebase process was started. The command also helps if at some point you forget to finish a rebase process, and other commands complain that they can’t do their job because a rebase is in progress.
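
How the emergency brake behaves can be tried out safely in a scratch repository. This is a sketch under the assumption that both branches change the same line of a hypothetical file file.txt, which forces a conflict.

```shell
# Sketch: provoke a rebase conflict, then restore the original state with --abort.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "original" > file.txt && git add file.txt && git commit -q -m "base"
git checkout -q -b topic
echo "topic version" > file.txt && git commit -q -a -m "change on topic"
git checkout -q master
echo "master version" > file.txt && git commit -q -a -m "change on master"

git checkout -q topic
before=$(git rev-parse HEAD)

# Both branches changed the same line, so the rebase stops.
git rebase master >/dev/null 2>&1 || echo "rebase stopped with a conflict"

# Discard everything the rebase has done so far.
git rebase --abort
after=$(git rev-parse HEAD)
current_branch=$(git symbolic-ref --short HEAD)
echo "back on $current_branch at $after"
```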

4.1.3. Why Rebasing Makes Sense

Rebase is primarily useful for keeping the commit history of a project simple and easy to understand. For example, a developer might be working on a feature, but then have something else to do for a few weeks. Meanwhile, however, development on the project has progressed, there’s been a new release, etc. Only now does the developer get to finish the feature. (Even if you want to send patches via email, rebase helps to avoid conflicts, see Sec. 5.9, “Patches via E-mail”.)

For the version history it is now much more logical if the feature is not “dragged along” unfinished for a long period of time alongside the actual development, but instead branches off from the last stable release.

Rebase is good for exactly this change in history: The developer can now simply enter the command git rebase v1.4.2 on the branch where he developed the feature, to rebuild his feature branch on the commit with the release tag v1.4.2. This makes it much easier to see what differences the feature really brings to the software.

It also happens to every developer in the heat of the moment that commits end up in the wrong branch. There is a bug that happens to be there, which is quickly fixed by a commit; but then a test must be written directly to avoid this bug in the future (another commit), and this must be noted in the documentation. After the actual work is done, you can use Rebase to “transplant” those commits to another location in the commit graph.

Rebase can also be useful if a branch requires a feature that has only recently been incorporated into the software. A merge of the master branch does not make sense semantically, because then these and other changes are inseparably merged with the feature branch. Instead, you rebase the branch on a new commit that already contains the required feature, and then use that in further development.

4.1.4. When Rebasing Is Not Useful — Rebase vs. Merge

The concept of rebase is initially a little difficult to understand. But once you have understood what is possible with it, the question arises: What is the point of a simple merge if you can edit everything with rebase?

When git-rebase is not used, or hardly used at all, a project history often develops that becomes relatively unmanageable, because merges have to be performed constantly and for a few commits at a time.

If, on the other hand, too much rebase is used, there is a danger that the entire project will be senselessly linearized: The flexible branching of Git is used for development, but the branches are then integrated into the publishing branch one after the other (!) like a zip fastener via rebase. This presents us with two main problems:

Logically related commits are no longer recognizable as such. Since all commits are linear, the development of multiple features is inextricably intertwined.

The integration of a branch can no longer be easily undone, because identifying those commits that once belonged to a feature branch is only possible manually.

This defeats the purpose of Git’s flexible branching. The conclusion is that rebase should be used neither too much nor too little. Both make the project history (in different ways) confusing.

In general, you are doing well with the following rules of thumb:

A feature is integrated by merge when it is finished. It is best to avoid a fast-forward merge so that the merge commit is preserved as a record of the time of integration.

While you are developing, you should use rebase frequently (especially interactive rebase, see below).

Logically separate units should be developed on separate branches - logically related ones possibly on several, which are then merged by rebase (if that makes sense). The merging of logically separate units is then done by merge.
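
The first rule of thumb can be illustrated with the --no-ff option of git merge. The sketch below uses invented file and branch names; it only shows that a merge commit with two parents is created even though a fast-forward would have been possible.

```shell
# Sketch: integrate a finished feature with an explicit merge commit.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > README && git add README && git commit -q -m "initial commit"
git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "implement feature"

git checkout -q master
# master has not moved, so a plain merge would simply fast-forward;
# --no-ff forces a merge commit that documents the integration.
git merge -q --no-ff -m "Merge branch 'feature'" feature

# One word for the commit itself plus one word per parent.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "merge commit and its parents: $words IDs"
```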

4.1.5. A Word of Warning

As mentioned earlier, a rebase inevitably changes the SHA-1 sums of all commits that are “rebuilt”. If these changes have not yet been published, that is, if a developer has them only in a private repository, this is not a problem.

But if a branch (e.g. master) is published^[55] and later rewritten via rebase, this has unpleasant consequences for all involved: All branches based on master will now reference the old copy of the master branch that has been rewritten. So each branch must be rebased to the new master (which in turn changes all commit IDs). This effect continues, and can be very time-consuming to fix (depending on when such a rebase happens, and how many developers are involved in the project), especially if you’re new to Git.

Therefore you should always remember the following rule:

Only edit unpublished commits with the rebase command!

Exceptions are conventions like personal branches or pu. The latter is an abbreviation for Proposed Updates and is usually a branch where new, experimental features are tested for compatibility. No one builds their own work on this branch, so it can be rewritten without problems and prior notice.

Another possibility is offered by private branches, i.e. those that start with <user>/ for example. If you make an agreement that developers will do their own development on these branches, but always base their features on “official” branches, then the developers may rewrite their branches as they wish.

4.1.6. Avoiding Code Duplication

If a feature is being developed over a long period of time, and parts of the feature are already flowing into a mainstream release (e.g. via cherry-pick), the rebase command will detect these commits and omit them when copying or rebuilding the commits, because the change is already contained in the branch.

For example, after a rebase, the new branch consists only of the commits that have not yet been incorporated into the base branch. This way, commits do not appear twice in the version history of a project. If the branch had simply been merged, the same commits with different SHA-1 sums would sometimes be present in different places in the commit graph.
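
A small experiment makes this behavior visible. The sketch assumes a topic branch with two invented commits, the first of which is brought into master early via cherry-pick.

```shell
# Sketch: rebase skips commits whose changes are already contained in the base.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b topic
echo one > one.txt && git add one.txt && git commit -q -m "part one"
echo two > two.txt && git add two.txt && git commit -q -m "part two"

git checkout -q master
echo 1.0 > VERSION && git add VERSION && git commit -q -m "release"
git cherry-pick topic~1 >/dev/null        # "part one" flows into master early

git checkout -q topic
git rebase -q master 2>/dev/null          # newer Git versions print a note about the skipped commit

# Only the commit that master does not contain yet is copied.
remaining=$(git log --format=%s master..topic)
echo "commits on top of master: $remaining"
```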

4.1.7. Managing Patch Stacks

There are situations where there is a vanilla version (“simplest version”) of a piece of software and also a certain number of patches applied to it before the vanilla version is shipped. For example, your company builds software, but before each delivery to the customer, some adjustments have to be made (depending on the customer). Or you have open source software in use, but have adapted it a bit to your needs - every time a new, official version of the software is released, you have to reapply your changes and then rebuild the software.⁠^[56]

To manage patch stacks, there are some programs that build on top of Git, but give you the convenience of not having to work directly with the rebase command. For example, TopGit^[57] allows you to define dependencies between branches: if something changes in a branch and other branches depend on it, TopGit will rebuild them on demand. An alternative to TopGit is Stacked Git^[58].

4.1.8. Restricting Rebase via --onto

Now, you may have wondered: git rebase <reference> always copies all commits that are between <reference> and HEAD. But what if you only want to rebuild part of a branch, to “transplant” it, so to speak? Consider the following situation:

Figure 26. Before the rebase --onto

You were developing a feature on the branch topic when you noticed a bug; you created a branch bugfix and found another bug. Semantically speaking, your branch bugfix has nothing to do with the topic branch. Therefore, it makes sense to branch off from the master branch.

But if you now rebuild the branch bugfix using git rebase master, the following happens: All nodes that are in bugfix but not in master are copied to the master branch in order - that is, nodes D, E, F, and G. However, D and E are not part of the bugfix at all.

This is where the --onto option comes into play: It allows you to specify a start and end point for the list of commits to be copied. The general syntax is

git rebase --onto <on which> <start> <end>

In this example, we only want to rebuild the commits F and G (in other words: the commits from topic to bugfix) on top of master. Therefore the command is

$ git rebase --onto master topic bugfix

The result looks as expected:

Figure 27. After the rebase --onto
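
The situation can be recreated with a few commands. In this sketch, the commits are named after the nodes in the figure, and each commit adds one invented file so that the result is easy to inspect.

```shell
# Sketch: transplant only the commits F and G from topic..bugfix onto master.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

# Hypothetical helper: create a file named after the commit and commit it.
node() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

node C
git checkout -q -b topic
node D; node E
git checkout -q -b bugfix
node F; node G

# <on which> = master, <start> = topic (exclusive), <end> = bugfix (inclusive)
git rebase -q --onto master topic bugfix

subjects=$(git log --reverse --format=%s master..bugfix | tr '\n' ' ')
echo "bugfix now consists of: $subjects"
```

The topic branch itself is left unchanged and still contains D and E.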

4.1.9. Improving a Commit

You have learned about the commit --amend command in Sec. 2.1, “Git Commands”, which you can use to improve a commit. However, it only refers to the current (last) commit. With rebase --onto you can also adjust commits that are further back in the past.

First, find the commit you want to edit and create a branch to it:

$ git checkout -b fix-master 21d8691

Then you make your changes, add changed files with git add, and then correct the commit with git commit --amend --no-edit (the --no-edit option takes meta-information like the description of the old commit and does not offer it again for editing).

Now apply all the commits from the master branch from above to your corrected commit:

$ git rebase --onto fix-master 21d8691 master

This will copy all commits from 21d8691 (exclusive!) to master (inclusive!). The faulty commit 21d8691 is no longer referenced, and therefore no longer appears. The fix-master branch is now obsolete and can be deleted.

An equivalent way to edit a commit is with the edit action in the interactive rebase (see Sec. 4.2.2, “Editing Commits Arbitrarily”).
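
The complete procedure, as a sketch in a scratch repository: the faulty commit is determined via git rev-parse instead of a fixed ID, and the file names and contents are invented.

```shell
# Sketch: correct an older commit with commit --amend and rebase --onto.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > README && git add README && git commit -q -m "initial commit"
echo "helo" > greeting.txt && git add greeting.txt && git commit -q -m "add greeting"
echo more > other.txt && git add other.txt && git commit -q -m "add other file"

bad=$(git rev-parse HEAD~1)               # the commit containing the typo

git checkout -q -b fix-master "$bad"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q --amend --no-edit

# Copy everything after the faulty commit onto the corrected one.
git rebase -q --onto fix-master "$bad" master
git branch -d fix-master >/dev/null       # no longer needed

fixed=$(git show HEAD~1:greeting.txt)
echo "greeting in the corrected commit: $fixed"
```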

4.1.10. Fine Adjustment of Rebase

There are situations where you may need to adjust the default git rebase behavior. First, this is the case when you use rebase to edit a branch that contains merges. rebase can try to recreate these instead of linearizing the commits. The -p or --preserve-merges option is responsible for this.^[59]

With the -m or --merge option, you can tell git rebase to use merge strategies (see also Sec. 3.3.3, “Merge Strategies”). When using these strategies, keep in mind that rebase internally applies one commit after the other to the new base via cherry-pick; therefore the roles of ours and theirs are reversed: theirs refers to the branch you are building on a new base!

An interesting use case is therefore the strategy option theirs for the merge strategy recursive: If conflicts occur, priority is given to changes from the commit being copied. So such a scenario is useful if you know that there are conflicting changes, but are certain that the changes from the branch you are building are more correct than those from the tree you are building on. If you rebuild topic to master, such a call would look like this:

$ git checkout topic
$ git rebase -m -Xtheirs master

In cases where the recursive (default) strategy gives preference to changes from commits from topic, you will find a corresponding note Auto-merging <commit description>.

A small, very useful option that rebase passes directly to git apply is --whitespace=fix. It causes Git to automatically correct whitespace errors (such as trailing spaces). If you have merge conflicts due to whitespace (for example, due to changed indentation), you can also use the strategy options presented in Sec. 3.3.4, “Options for the Recursive Strategy” to have solutions generated automatically (for example, by specifying -Xignore-space-change).
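
The reversed roles of ours and theirs are easiest to see in an experiment. The sketch below assumes a hypothetical file config.txt that is changed in conflicting ways on both branches.

```shell
# Sketch: during a rebase, -Xtheirs prefers the changes of the branch being rebuilt.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "original" > config.txt && git add config.txt && git commit -q -m "base"
git checkout -q -b topic
echo "topic version" > config.txt && git commit -q -a -m "change on topic"
git checkout -q master
echo "master version" > config.txt && git commit -q -a -m "change on master"

git checkout -q topic
# Without -Xtheirs this rebase would stop with a conflict.
git rebase -q -m -Xtheirs master

result=$(cat config.txt)
echo "config.txt after the rebase: $result"
```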

4.2. Rewriting History — Interactive Rebase

Rebase has an interactive mode; it is technically implemented in the same way as the normal mode, but the typical use case is quite different, because the interactive rebase allows you to rewrite history, i.e. to edit commits at will (and not just move them).

In the interactive rebase you can

change the order of commits

delete commits

merge commits

split a commit into several ones

adjust the description of commits

edit commits in any other way you can think of

You activate the mode with the option -i or --interactive. Basically, the rebase process will run exactly as before, but you will get a list of commits that rebase will rewrite before the command starts. This could look like this, for example:

pick e6ec2b6 Fix expected values of setup tests on Windows
pick 95b104c t/README: hint about using $(pwd) rather than $PWD in tests
pick 91c031d tests: cosmetic improvements to the repo-setup test
pick 786dabe tests: compress the setup tests
pick 4868b2e Subject: setup: officially support --work-tree without --git-dir

Below this list is a help text that describes what you can do with the listed commits. Essentially, there are six possible actions for each commit. You simply write the action at the beginning of the line, before the SHA-1 sum, instead of the standard pick action. The following are the actions; you can also abbreviate each one by its initial letter, e.g. s for squash (the exception is exec, which is abbreviated as x, because e already stands for edit).

pick: “Use commit” (default). Corresponds to the handling of commits in the non-interactive rebase.

-: If you delete a line, the commit is not used (will be lost).

reword: Adjust the commit description.

squash: merge commit with the previous one; editor is opened to merge the descriptions

fixup: Like squash, but throws away the description of the commit.

edit: Free editing. You can perform arbitrary actions.

exec: The rest of the line is executed as a command on the shell. If the command does not end successfully (i.e. with a return value other than 0), the rebase stops.

The pick action is the simplest — it simply says that you want to use the commit, rebase should take that commit as it is. The opposite of pick is simply deleting an entire line. The commit is then lost (like git rebase --skip).

If you switch the order of the lines, Git will apply the commits in the newly defined order. In the beginning, the lines are in the order in which they will be applied later — that is, the exact opposite of the order in the tree view! Note that commits often build on top of each other; therefore, swapping commits will often cause conflicts if the commits make changes on the same files and in the same places.

The reword command is handy if you have typos in a commit message and want to correct them (or haven’t written a detailed one yet and want to do so now). The rebase process is stopped at the commit marked reword, and Git starts an editor that already displays the commit message. Once you exit the editor (don’t forget to save!), Git will enter the new description and let the rebase process continue.

4.2.1. Correcting Small Errors: Bug Squashing

The squash and fixup commands allow two or more commits to be merged together.

Nobody always writes error-free code immediately. Often there is a big commit in which you have implemented a new feature; shortly after that, small bugs are found. What to do? A detailed description of why you forgot to add or remove a line out of carelessness? Not really useful, and especially annoying for other developers who want to review your code later. It would be nice to maintain the illusion that the commit was bug-free the first time…

For every bug you find, make a small commit with a more or less meaningful description. This could look like this, for example:

$ git log --oneline master..feature
b5ffeb7 fix feature 1
34c4453 fix feature 2
ac445c6 fix feature 1
ae65efd implement feature 2
cf30f4d implement feature 1

When some such commits have accumulated, start an interactive rebase process over the last commits. Simply estimate how many commits you want to work on, and then edit the last five using git rebase -i HEAD~5, for example.

In the editor the commits now appear in reverse order compared to the output of git log. Now arrange the small bugfix commits so that they are below the commit you are fixing. Then mark the fix commits with squash (or s), like this:

pick cf30f4d implement feature 1
s ac445c6 fix feature 1
s b5ffeb7 fix feature 1
pick ae65efd implement feature 2
s 34c4453 fix feature 2

Save the file and close the editor; the rebase process starts. Because you selected squash, rebase stops after commits are merged. The editor will display the commit messages of the merged commits, which you now summarize appropriately. If you use the keyword fixup, or f for short, instead of squash, the commit message of the commits marked in this way will be thrown away—probably more convenient for this way of working.

After the rebase the version history looks much tidier:

$ git log --oneline master..feature
97fe253 implement feature 2
6329a8a implement feature 1
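
For experimenting, the list of commits can also be edited by a script instead of by hand: Git reads the environment variable GIT_SEQUENCE_EDITOR to decide which program edits the list. The following sketch uses sed for this (the -i option is assumed to be available, as in GNU sed) and the action fixup, so that no editor is needed for the commit message. File names and contents are invented.

```shell
# Sketch: merge a small fix into the commit it belongs to, without manual editing.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > README && git add README && git commit -q -m "initial commit"
git checkout -q -b feature
echo "v1" > feature.txt && git add feature.txt && git commit -q -m "implement feature 1"
echo "v2" > feature.txt && git commit -q -a -m "fix feature 1"

# Replace "pick" by "fixup" on the line of the fix commit, then let rebase run.
GIT_SEQUENCE_EDITOR="sed -i -e '/fix feature 1/s/^pick/fixup/'" \
  git rebase -i master >/dev/null 2>&1

history=$(git log --format=%s master..feature)
content=$(cat feature.txt)
echo "history: $history, content: $content"
```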

It often happens that you want to “fold” a small change into the last commit you made. Here the following alias is useful, which is similar to the fixup action:

$ git config --global alias.fixup "commit --amend --no-edit"

As mentioned above, the --no-edit option inherits one-to-one the meta-information of the old commit, especially the commit message.

If you start the commit message with fixup! or squash!, followed by the beginning of the description of the commit you want to fix, you can then run the command

$ git rebase -i --autosquash master

The commits marked with fixup! or squash! as above are automatically moved to the correct position and given the action squash or fixup. This allows you to exit the editor directly, and the commits are merged. If you frequently work with this option, you can also make this behavior the default for rebase calls by setting a configuration option: To do this, set the rebase.autosquash setting to true.
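
A sketch of this workflow follows. It relies on the --fixup option of git commit, which writes the fixup! message for you; the list of commits is accepted unchanged by setting GIT_SEQUENCE_EDITOR to true. File names and commit messages are invented.

```shell
# Sketch: let --autosquash sort and merge a fixup commit automatically.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > README && git add README && git commit -q -m "initial commit"
git checkout -q -b feature
echo "feature 1" > f1.txt && git add f1.txt && git commit -q -m "implement feature 1"
echo "feature 2" > f2.txt && git add f2.txt && git commit -q -m "implement feature 2"

# A late correction for feature 1; the message becomes "fixup! implement feature 1".
echo "fix" >> f1.txt && git add f1.txt
git commit -q --fixup=HEAD~1

# Accept the automatically rearranged list as it is.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash master >/dev/null 2>&1

history=$(git log --format=%s master..feature | tr '\n' '|')
echo "history: $history"
```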

4.2.2. Editing Commits Arbitrarily

If you mark a commit with edit, it can be edited as you wish. rebase will go through the commits sequentially, as in the other cases. For the commits marked edit, rebase stops and HEAD is set to the corresponding commit. You can then modify the commit as if it were the most recent in the branch. Afterwards, you let rebase continue running:

$ vim ...
# Making corrections
$ git add ...
$ git commit --amend
$ git rebase --continue

4.2.2.1. Splitting Commits

Every programmer knows this: Checking in every change in a disciplined and meticulous way is exhausting and often interrupts the workflow. In practice, this leads to commits that are large and confusing. But for the version history to be useful to other developers, and to yourself, the changes should be split into logical units that are as small as possible.

By the way, it is not only developers who benefit from this approach. Automated debugging using git bisect also works better and more accurately the smaller and more meaningful the commits are (see Sec. 4.8, “Finding Regressions — Git Bisect”).

With a little experience, you can split a commit very quickly. If you frequently produce large commits, the following step should become routine.

First you start the rebase process and mark the commit you want to split with edit. rebase stops there, HEAD points to that commit.

You then reset HEAD by one commit, but without discarding the changes from HEAD (the commit to be split). This is done with the reset command (see also Sec. 3.2.3, “Reset and the Index”; note that if you still need the commit description, you should copy it first):

$ git reset HEAD^

The changes caused by the commit being split are still present in the files, but the index and repository reflect the state of the previous commit. So you have moved the changes from the commit to be split to the unstaged state (you can verify this by looking at git diff before and after the reset call).

Now you can add some lines, create a commit, add more lines, and finally create a third commit for the remaining lines:

$ git add -p
$ git commit -m "First part"
$ git add -p
$ git commit -m "Second part"
$ git add -u
$ git commit -m "Third (and last) part"

What happens? You have reset the HEAD by using the reset command. With each call to git commit you create a new commit, based on the respective HEAD. Instead of the one big commit (which you threw away with the reset call) you have now put three smaller commits in its place.

Now let rebase continue (git rebase --continue) and rebuild the remaining commits on top of HEAD (which is now the latest of your three commits).
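
The whole sequence can be run through in a scratch repository. This sketch deviates from the steps above in two respects, so that it runs without user interaction: the commit is marked with edit by a sed call (GNU-style -i assumed) via GIT_SEQUENCE_EDITOR, and whole files are staged instead of individual hunks with git add -p. All names are invented.

```shell
# Sketch: split one large commit into two smaller ones during a rebase.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > README && git add README && git commit -q -m "initial commit"
git checkout -q -b feature
echo a > a.txt && echo b > b.txt && git add a.txt b.txt && git commit -q -m "big commit"
echo c > c.txt && git add c.txt && git commit -q -m "later commit"

# Mark the big commit with "edit"; rebase stops there.
GIT_SEQUENCE_EDITOR="sed -i -e '/big commit/s/^pick/edit/'" \
  git rebase -i master >/dev/null 2>&1

# Take the commit back, but keep its changes in the working tree.
git reset -q HEAD^
git add a.txt && git commit -q -m "First part"
git add b.txt && git commit -q -m "Second (and last) part"

# Rebuild the remaining commits on top of the new commits.
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

history=$(git log --reverse --format=%s master..feature | tr '\n' '|')
echo "history: $history"
```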

4.3. Who Made These Changes? — Git Blame

Like other version control systems, Git has a blame or annotate command that puts the date and author of the last change on all lines in a file. This allows you to quickly find out, for example, who is responsible for a line of code that causes a problem, or since when the problem has existed.

The annotate command is intended only for people switching from other version control systems; it has the same functionality as blame, but a slightly different output format. So if in doubt, you should always use blame.

Useful options are -M to display code shifts, and -C to display code copies. You can then use the file name in the output to see from which file code may have been copied or moved. If no file name is displayed, Git couldn’t find any code moves or copies. If you use these options, it’s usually a good idea to suppress the author and date with -s so that the display still fits the screen.

From the following output you can see, for example, that the function end_url_with_slash originally came from the file http.c. The option -L<m>,<n> limits the output to the corresponding lines.

$ git blame -C -s -L123,135 url.c
638794cd url.c  123) char *url_decode_parameter_value(const char **query)
638794cd url.c  124) {
ce83eda1 url.c  125)    struct strbuf out = STRBUF_INIT;
730220de url.c  126)    return url_decode_internal(query, "&", &out, 1);
638794cd url.c  127) }
d7e92806 http.c 128)
eb9d47cf http.c 129) void end_url_with_slash(struct strbuf *buf, const char *url)
5ace994f http.c 130) {
5ace994f http.c 131)    strbuf_addstr(buf, url);
5ace994f http.c 132)    if (buf->len && buf->buf[buf->len - 1] != '/')
5ace994f http.c 133)            strbuf_addstr(buf, "/");
5ace994f http.c 134) }
3793a309 url.c  135)

4.3.1. Blaming with Graphics

A convenient alternative to git blame on the console is the graphical tool git gui blame (you may need to install the git-gui package for this).

Figure 28. A piece of code, which was moved from another file

If you examine a file via git gui blame <file>, the different blocks that originate from different commits are displayed with a grey background. On the left you see the abbreviated commit ID and the initials of the author.

Only when you hover your mouse over such a block does a small popup window appear with information about the commit that changed the lines, possibly with a message stating from which file and which commit this block of code was moved or copied.

In code review, people are often interested in what a file looked like before a certain change was made. For this purpose, the graphical blame tool lets you go back in the version history: Right-click on the commit ID of a code block and select Blame Parent Commit from the context menu - now the predecessor of this change is displayed. You can go back several steps this way. Use the green arrow in the upper left corner to jump back into the future again.

4.4. Ignoring Files

In almost every project there are files that you do not want to version, be it the binary output of the compiler, the autogenerated documentation in HTML format or the backup files generated by your editor. Git offers several levels of ignoring files:

user-specific setting

repository-specific setting

repository-specific setting that is checked in with the project

Which option you choose depends entirely on your application. The user-specific settings should contain files and patterns that are relevant to the user, for example backup files that your editor creates. Such patterns are usually stored in a file in the $HOME directory. With the option core.excludesfile you specify which file this should be, e.g. in the case of ~/.gitignore:

$ git config --global core.excludesfile ~/.gitignore

Certain files and patterns are bound to a project and are valid for each participant, e.g. compiler output and autogenerated HTML documentation. You store these settings in the file .gitignore, which you check in as normal and thus deliver to all developers.

Finally, the .git/info/exclude file can be used for repository-specific settings that should not be delivered with a clone, i.e. settings that are both project and user specific.

4.4.1. Pattern Syntax

The syntax for patterns is based on the shell syntax:

Blank lines have no effect and can be used for structuring and separating.

Lines starting with a # are considered comments and have no effect.

Expressions beginning with ! are evaluated as negation.

Expressions ending with a / are evaluated as directory. The expression man/ covers the directory man, but not the file or symlink with the same name.

Expressions that do not contain a / will be evaluated as shell glob for the current and all subdirectories. The expression *.zip in the topmost .gitignore, for example, covers all zip files in the project’s directory structure.

The expression ** covers zero or more directories. Both t/data/set1/store.txt and t/README.txt are covered by the pattern t/**/*.txt.

Otherwise the pattern is evaluated as a shell glob, more precisely as a shell glob evaluated by the function fnmatch(3) with the flag FNM_PATHNAME. This means that a * does not match a /: the pattern doc/*html captures doc/index.html, but not doc/api/singleton.html.

Expressions beginning with a / are bound to the path. For example, the expression /*.sh includes upload.sh but not scripts/check-for-error.sh.

An example:⁠^[60]

$ cat ~/.gitignore
# vim swap files
.*.sw[nop]

# python bytecode
*.pyc

# documents
*.dvi
*.pdf

# miscellaneous
*.*~
*.out
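
Whether a pattern behaves as expected can be checked with git check-ignore, which prints those of the given paths that are ignored. The sketch below tests some of the rules described above; the directory layout is invented for illustration.

```shell
# Sketch: verify ignore patterns with git check-ignore.
repo=$(mktemp -d) && cd "$repo"
git init -q

cat > .gitignore <<'EOF'
# archives anywhere in the tree
*.zip
# the directory man and everything in it
man/
# shell scripts in the top-level directory only
/*.sh
# text files anywhere below t/
t/**/*.txt
EOF

mkdir -p man scripts sub t/data/set1
touch man/git.1 upload.sh scripts/check-for-error.sh sub/backup.zip \
      t/README.txt t/data/set1/store.txt

# Only the ignored paths are printed, one per line.
ignored=$(git check-ignore man/git.1 upload.sh scripts/check-for-error.sh \
          sub/backup.zip t/README.txt t/data/set1/store.txt)
echo "$ignored"
```

With the additional option -v, git check-ignore also names the file and the pattern responsible for each match.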

4.4.2. Ignoring and Tracking Later

Files that are already versioned are not automatically ignored. To ignore such a file anyway, explicitly tell Git to “forget” the file:

$ git rm documentation.pdf

To remove the file from the repository with the next commit, but still keep it in the working tree:

$ git rm --cached documentation.pdf

Files that are already ignored will not appear in the output of git status. Also, git add refuses to accept the file; the --force (or -f) option forces Git to consider the file after all:

$ git add documentation.pdf
The following paths are ignored by one of your .gitignore files:
documentation.pdf
Use -f if you really want to add them.
fatal: no files added
$ git add -f documentation.pdf
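
The interplay of these commands, as a sketch in a scratch repository (the file content is a placeholder):

```shell
# Sketch: stop tracking a file that should be ignored, but keep it on disk.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "placeholder" > documentation.pdf
git add documentation.pdf && git commit -q -m "add documentation"
echo "*.pdf" > .gitignore
git add .gitignore && git commit -q -m "ignore PDF files"

# The file is still versioned at this point; remove it from the index only.
git rm -q --cached documentation.pdf
git commit -q -m "stop tracking documentation.pdf"

tracked=$(git ls-files documentation.pdf)   # empty: no longer versioned
pending=$(git status --porcelain)           # empty: the file is ignored now

# Adding it again is refused unless -f is given.
if git add documentation.pdf 2>/dev/null; then add_refused=no; else add_refused=yes; fi
echo "tracked: '$tracked', add refused: $add_refused"
```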

4.4.3. Deleting Ignored and Unknown Files

The git clean command deletes ignored as well as unknown (so-called untracked) files. Since files may be irretrievably lost, the command has the --dry-run (or -n) option; it tells you what would be deleted. As a further precaution, the command refuses to delete anything unless you explicitly pass the --force (or -f) option.^[61]

By default, git clean only deletes the unknown files, with -X it only removes the ignored files, and with -x it removes both unknown and ignored files. With the option -d it additionally deletes directories that come into question. So to delete unknown as well as ignored files and directories, enter

$ git clean -dfx
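
The difference between the variants can be observed in a scratch repository. In this sketch, main.o stands for an ignored file, notes.txt and build/ for unknown ones; all names are invented.

```shell
# Sketch: dry run first, then delete unknown files, then ignored files as well.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo "*.o" > .gitignore && git add .gitignore && git commit -q -m "ignore object files"

touch main.o notes.txt
mkdir build && touch build/out.bin

preview=$(git clean -n -d)                # only lists what would be removed
echo "$preview"

git clean -q -f -d                        # unknown files and directories only
if [ -f main.o ]; then ignored_survives=yes; else ignored_survives=no; fi

git clean -q -f -d -x                     # ignored files as well
```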

4.5. Outsourcing Changes — Git Stash

The stash is a mechanism used to temporarily store changes in the working tree that have not yet been saved. A classic use case: your boss asks you to fix a critical bug as soon as possible, but you have just started to implement a new feature. With the git stash command, you can temporarily clean out the unfinished lines without creating a commit, and thus address the bug with a clean working tree. The stash also provides a workaround if you cannot change the branch because this would result in losing changes (see also Sec. 3.1.2, “Managing Branches”).

4.5.1. Basic Usage

With git stash you save the current state of working tree and index, if they differ from HEAD:

$ git stash
Saved working directory and index state WIP on master: b529e34 new spec how the script should behave
HEAD is now at b529e34 new spec how the script should behave

With the --keep-index option the index remains intact. This means that all changes that are already in the index remain in the working tree and in the index and are additionally stored in the stash.

The changes to the working tree and index are “put aside”, and Git does not create a commit on the current branch. To restore the saved state, i.e. to apply the saved patch to the current working tree and delete the stash at the same time, use

$ git stash pop
...
Dropped refs/stash@{0} (d4cc94c37e92390e5fabf184a3b5b7ebd5c3943a)

Between saving and restoring, you can change the repository as you like, e.g. change the branch, make commits, etc. The stash is always applied to the current working tree.

The command git stash pop is an abbreviation for the two commands git stash apply and git stash drop:

$ git stash apply
...
$ git stash drop
Dropped refs/stash@{0} (d4cc94c37e92390e5fabf184a3b5b7ebd5c3943a)

Both pop and apply restore only the changes in the working tree; the state of the index is not restored. With the --index option, the stored state of the index is restored as well.

The --patch (or short -p) option starts an interactive mode, i.e. you can select individual hunks to add to the stash just like with git add -p and git reset -p:

$ git stash -p

The configuration setting interactive.singlekey (see Sec. 2.1.2, “Creating Commits Step by Step”) also applies here.
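
The following sketch shows a complete round trip including the --index option. It assumes two invented files, one with a staged and one with an unstaged change.

```shell
# Sketch: stash staged and unstaged changes, then restore both states.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo one > staged.txt && echo two > unstaged.txt
git add staged.txt unstaged.txt && git commit -q -m "initial commit"

echo "change for the index" >> staged.txt && git add staged.txt
echo "change in the working tree" >> unstaged.txt

git stash -q
clean=$(git status --porcelain)           # empty: everything has been put aside

# ... fix the urgent bug here ...

git stash pop -q --index                  # without --index nothing would be staged
in_index=$(git diff --cached --name-only)
in_worktree=$(git diff --name-only)
echo "index: $in_index, working tree: $in_worktree"
```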

4.5.2. Solving Conflicts

Conflicts can occur if you apply a stash to a commit other than the one on which it was created:

$ git stash pop
Auto-merging hello.pl
CONFLICT (content): Merge conflict in hello.pl

In this case, use the usual recipes to solve the conflict, see Sec. 3.4, “Resolving Merge Conflicts”. Note, however, that the conflict markers are labeled Updated upstream (the version in the current working tree) and Stashed changes (the changes in the stash):

<<<<<<< Updated upstream
# E-Mail: valentin.haenel@gmx.de
=======
# E-Mail: valentin@gitbu.ch
>>>>>>> Stashed changes

If you have tried to apply a stash with git stash pop and a conflict has occurred, the stash will not be deleted automatically. You must explicitly delete it (after resolving the conflict) with git stash drop.

4.5.3. If You Can Not Apply the Stash…

The stash is applied to the current working tree by default, provided it is clean - if not, Git aborts:

$ git stash pop
Cannot apply to a dirty working tree, please stage your changes

While Git suggests that you add the changes to the index, how you should proceed depends on your goal. If you want to have the changes in the stash in addition to those in the working tree, the following recipe helps:

$ git add -u
$ git stash pop
$ git reset HEAD

For explanation: First, the unsaved changes to the working tree are added to the index; then the changes are extracted from the stash and applied to the working tree, and finally the index is reset.

Alternatively, you can create an additional stash and apply the changes you want to have to a clean working tree:

$ git stash
$ git stash apply stash@{1}
$ git stash drop stash@{1}

For this recipe you use several stashes. First you store the changes in the working tree into a new stash, then you get the changes you actually want from the previous stash and delete it after the application.
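
The second recipe, as a sketch in a scratch repository with two invented files. The quotes around stash@{1} are not strictly necessary, but they protect the braces in shells that treat them specially.

```shell
# Sketch: park the current changes in a new stash and apply an older one.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"       # placeholder identity (assumption)
git config user.email "user@example.com"

echo base > a.txt && echo base > b.txt
git add a.txt b.txt && git commit -q -m "initial commit"

echo "wanted change" >> a.txt
git stash -q                              # the stash we actually want back

echo "later change" >> b.txt              # the working tree is dirty again
git stash -q                              # becomes stash@{0}; the older one is now stash@{1}

git stash apply -q 'stash@{1}'
git stash drop -q 'stash@{1}'

remaining=$(git stash list | wc -l)
echo "stashes left: $remaining"
```

The later change is still available in the remaining stash and can be restored with git stash pop when needed.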

4.5.4. Adjusting Messages

By default, Git sets the following message for a stash

WIP on <branch>: <sha1> <commit-msg>

<branch>: the current branch

<sha1>: the commit ID of the HEAD

<commit-msg>: the commit message of the HEAD

In most cases this is sufficient to identify a stash. If you plan to keep your stashes longer (possible, but not really recommended), or if you want to create more than one, we recommend that you give them a more descriptive message:

$ git stash save "unfinished feature"
Saved working directory and index state On master: unfinished feature
HEAD is now at b529e34 new spec how the script should behave
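Newer Git versions (2.13 and later) provide git stash push -m for the same purpose, and their documentation marks git stash save as deprecated. A minimal sketch, assuming a throwaway repository and a hypothetical file script.sh:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo v1 > script.sh
git add script.sh
git commit -q -m "new spec how the script should behave"

echo v2 > script.sh                                  # uncommitted work
git stash push -m "unfinished feature" >/dev/null    # stash with a custom message
entry=$(git stash list)
echo "$entry"
```

The listed entry should then carry the custom message instead of the default WIP text.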

4.5.5. Viewing Stashes

Git manages all stashes as a stack, i.e. more recent states are on top and are processed first. The stashes are named with a reflog syntax (see also Sec. 3.7, “Reflog”):

    stash@{0}
    stash@{1}
    stash@{2}
    ...

If you create a new stash, it will be called stash@{0} and the number of the others will be incremented: stash@{0} becomes stash@{1}, stash@{1} becomes stash@{2} and so on.

If you do not specify an explicit stash, the commands apply, drop and show refer to the most recent, i.e. stash@{0}.

To view individual stashes, use git stash show. By default, this command prints a summary of the added and removed lines (like git diff --stat):

$ git stash show
git-stats.sh |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

The git stash show command also accepts general diff options that affect the format, e.g. `-p` to output a patch in diff format:

$ git stash show -p stash@{0}
diff --git a/git-stats.sh b/git-stats.sh
index 62f92fe..1235fd3 100755
--- a/git-stats.sh
+++ b/git-stats.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
-START=18.07.2010
-END=25.07.2010
+START=18.07.2000
+END=25.07.2020
  echo "Number of commits per author:"

The git stash list command prints a list of currently created stashes:

$ git stash list
stash@{0}: WIP on master: eae23b6 add number of merge commits to output
stash@{1}: WIP on master: b1ee2cf start and end date in one place only

4.5.6. Deleting Stashes

Individual stashes can be deleted with the command git stash drop, all with git stash clear. If you delete a stash by mistake, you won’t find it again via the usual reflog mechanisms! However, the following command prints the former stashes:⁠^[62]

$ git fsck --unreachable | grep commit | cut -d" "  -f3 | \
  xargs git log --merges --no-walk --grep=WIP

In case of emergency, note that you will find the command at the very end of the git-stash(1) man page.

Note also that the entries shown in this way exist only as unreachable objects in the object database and are therefore subject to the normal maintenance mechanisms: they are deleted after some time and are not kept permanently.

4.5.7. How Is the Stash Implemented?

Git creates two commit objects for each stash, one for working tree changes and one for index changes. Both have the current HEAD as their parent; the working tree object additionally has the index object as a parent. This makes a stash in Gitk appear as a triangle, which is a bit confusing at first:

Figure 29. A stash in Gitk

With the alias git tree (see Sec. 3.6.1, “Revision Parameters”) this looks like this:

*   f1fda63 (refs/stash) WIP on master: e2c67eb Comment was missing
|\
| * 4faee09 index on master: e2c67eb Comment was missing
|/
* e2c67eb (HEAD, master) Comment was missing
* 8e2f5f9 Test file
* 308aea1 README file
* b0400b0 First version

Since the stash objects are not referenced by a branch, the working tree object is kept alive with a special reference, refs/stash. However, this only applies to the latest stash. Older stashes are only referenced in the reflog (see Sec. 3.7, “Reflog”) and therefore do not appear in Gitk. In contrast to normal reflog entries, stored stashes do not expire and are therefore not deleted by the normal maintenance mechanisms.
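The triangle structure can be checked directly on the command line. The following sketch assumes a throwaway repository with a hypothetical file f that has both staged and unstaged changes:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo base > f
git add f
git commit -q -m "base"

echo staged >> f
git add f                    # change recorded in the index
echo unstaged >> f           # change only in the working tree
git stash >/dev/null

head=$(git rev-parse HEAD)
wip_parents=$(git log -1 --format=%P refs/stash)      # parents of the working tree commit
index_commit=$(git rev-parse 'refs/stash^2')          # second parent: the index commit
index_parent=$(git log -1 --format=%P "$index_commit")

echo "WIP commit parents:  $wip_parents"
echo "index commit parent: $index_parent"
```

The working tree commit should list HEAD and the index commit as its parents, and the index commit should have HEAD as its only parent.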

4.6. Annotating Commits — Git Notes

In general, it is not easy to modify or extend commits once they have been published. Sometimes, however, you wish you could “attach” information to commits afterwards, without the commit changing. This could be ticket numbers, information about whether the software compiled, who tested it, and so on.

Git offers a way to attach notes to a commit using the git notes command. The notes are an uncoupled branch of commits, referenced by refs/notes/commits, on which the development of the notes is stored. On this branch, the notes for a commit are stored in a file whose filename corresponds to the SHA-1 sum of the commit it describes.

But you can disregard these internals — in practice, you can manage the notes completely with git notes. The only important thing is to know: You can only save one note per commit ⁠^[63]. But you can edit or extend the notes afterwards.

To add a new note: git notes add <commit>. If you omit <commit>, HEAD will be used. Similar to git commit an editor opens where you write the note. Alternatively, you can specify it directly with -m "<note>".

By default, the note is always displayed below the commit message:

$ git show 8e8a7c1f
commit 8e8a7c1f4ca66aa024acde03a58c2b67fa901f88
Author: Julius Plenz <julius@plenz.com>
Date:   Sun May 22 15:48:46 2011 +0200

    Optimize loop

Notes:
    This causes bug #2319 and is fixed in v2.1.3-7-g6dfa88a

With the --no-notes option you can explicitly tell commands like log or show not to display notes.

The command git notes add will end with an error if a note already exists for the given commit. Use the git notes append command instead to append more lines to the note, or directly git notes edit to edit the note as desired.
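The interplay of add and append can be sketched as follows. The repository, the file name and the note texts are assumptions chosen only for illustration:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo code > loop.c
git add loop.c
git commit -q -m "Optimize loop"

git notes add -m "Causes bug #2319" HEAD

# A second 'add' is refused because a note already exists for this commit
if git notes add -m "another note" HEAD 2>/dev/null; then
    second_add=ok
else
    second_add=failed
fi

# 'append' extends the existing note instead
git notes append -m "Fixed in a later commit" HEAD
note=$(git notes show HEAD)
echo "$note"
```

The commit ID itself stays the same throughout, since the notes live on their own reference.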

By default, notes are neither uploaded nor downloaded; you have to do this explicitly with the following commands:

$ git push <remote> refs/notes/commits
$ git fetch <remote> refs/notes/commits:refs/notes/commits

The notes concept is not very well developed in Git. In particular, it is problematic when multiple developers create commit notes in parallel, and then need to merge them. For more information, see the git-notes(1) man page.

If you want to use notes, this is usually only useful in connection with ticket, bug tracking or continuous integration systems: These could automatically create notes and thus possibly store helpful additional information in the repository.

To automatically download the notes at each git fetch, add a refspec of the following form to the file .git/config (see also Sec. 5.3.1, “git fetch”):

  fetch = +refs/notes/*:refs/notes/*

4.7. Multiple Root Commits

The first commit created in a newly initialized repository is called the root commit. This commit is usually the only commit in the entire repository that has no predecessor.

However, it is also possible to have multiple root commits in one repository. This can be useful in the following cases:

You want to merge two independent projects that were previously developed in separate repositories (see also subtree-merges in Sec. 5.11.2, “Subtrees”).

You want to manage a fully decoupled branch where you keep a todo list, compiled binaries or autogenerated documentation.

In case you want to merge two repositories, this command is usually sufficient:

$ git fetch -n <other-repo> master:<other-master>
warning: no common commits
...
From <other-repo>
 * [new branch]      master     -> <other-master>

The master branch of the other repository is copied to the local repository as <other-master>, including all commits until Git finds a merge base or root commit. The warning "no common commits" already indicates that the two version histories do not have a common commit. The repository now has two root commits.

Note that a merge between two branches that do not share any commits will fail as soon as a file exists on both sides with differing content. This may be remedied by subtree-merges, see Sec. 5.11.2, “Subtrees”.

Instead of importing another repository, you can also create a completely detached branch, i.e. a second root commit. The following two commands are sufficient for this:

$ git checkout --orphan <newroot>
$ git rm --cached -rf .

The first one sets the HEAD to the (not yet existing) branch <newroot>. The rm command deletes all Git-managed files from the index, but leaves them intact in the working tree. So now you have an index that doesn’t contain anything and a branch that doesn’t have a commit yet.

You can now use the git add command to add files to the new root commit and then create it with git commit.
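The complete sequence might look like the following sketch. The branch name docs and the file names are hypothetical; git rev-list --max-parents=0 is used here to count the root commits:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'int main(void) { return 0; }' > code.c
git add code.c
git commit -q -m "first root commit"

git checkout -q --orphan docs        # HEAD now points to the unborn branch 'docs'
git rm -q --cached -rf .             # empty the index, keep the files on disk

echo "write documentation" > TODO
git add TODO
git commit -q -m "second root commit"

roots=$(git rev-list --max-parents=0 --all | wc -l | tr -d ' ')
files=$(git ls-tree --name-only HEAD)
echo "root commits: $roots, files on docs: $files"
```

The new branch should contain only TODO, while code.c remains in the working tree as an untracked file.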

4.8. Finding Regressions — Git Bisect

In software development, a regression refers to the point in time from which a certain feature of a program no longer functions. This can be after an update of libraries, after the introduction of new features that cause side effects etc.

To find such regressions is sometimes difficult. If you are using an extensive test suite, you are relatively well protected from including trivially detectable regressions (e.g. by running a make test before each commit).

If the regression is reproducible ("with the arguments <x> the program crashes", "the configuration setting <y> causes a memory access error"), then you can use Git to automate the search for the commit that causes this regression.

Git provides the command bisect for this purpose, whose algorithm works on the "divide and conquer" principle: First you define a point in time (i.e. a commit) when the regression had not yet occurred (called good), then a point in time when it occurs (called bad; if you leave it out, Git assumes HEAD). The bisect command is based on the idealized assumption that the regression was introduced by a single commit: there is a commit before which everything was fine, and after which the error occurs.⁠^[64]

Now Git chooses a commit from the middle between good and bad and checks it out. You must then check whether the regression is still present. If yes, Git will set bad to this commit, if no, good will be set to this commit. This removes about half of the commits to examine. Git repeats the step until only one commit remains.

So the number of steps bisect takes is logarithmic in the number of commits you examine: for n commits, you need about log₂(n) steps. For 32 commits, that’s a maximum of five steps, and for 1024 commits only a maximum of ten, because you can already eliminate 512 commits in the first step.
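The estimate can be reproduced with a few lines of shell arithmetic. The helper bisect_steps is a hypothetical name, and it only models repeated halving; the step count Git itself reports may differ slightly:

```shell
# Rough upper bound for the number of bisect steps: halve the number of
# candidate commits (rounding up) until only one is left.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

for commits in 32 1024; do
    echo "$commits commits: about $(bisect_steps "$commits") steps"
done
```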

4.8.1. Usage

You start a bisect session with the following commands:

$ git bisect start
$ git bisect bad <does-not-work>
$ git bisect good <works>

Once you’ve defined the two points, Git checks out a commit in the middle, so you’re now in detached-head mode (see Sec. 3.2.1, “Detached HEAD”). After you have checked whether the regression is still present, you can mark it with git bisect good or git bisect bad. Git will automatically check out the next commit.

You may not be able to test the checked out commit, for example, because the program does not compile correctly. In this case, you can use git bisect skip to have another commit nearby selected and proceed with it as usual. You can cancel the debugging at any time with git bisect reset.

4.8.2. Automation

Ideally, you can test automatically whether the error occurs — with a test that must run successfully if the regression does not occur.

You can then define the points good and bad as above. Afterwards you enter git bisect run <path/to/test>.

Based on the return value, bisect decides whether the checked commit is good (the script ends successfully, i.e. with return value 0) or bad (values 1 to 127). A special case is the return value 125, which causes a git bisect skip. So if you have a program that needs to be compiled, the first thing the test script should do is run a command like make || exit 125, so that the commit is skipped if the program does not compile properly.
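A self-contained sketch of such an automated session follows. The repository with eight commits, the marker string BUG and the script name t.sh are assumptions for illustration; a real test script would typically begin with a build step such as make || exit 125.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Eight commits; the sixth one introduces the "regression"
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -eq 6 ]; then
        echo "BUG" >> app.txt
    else
        echo "line $i" >> app.txt
    fi
    git add app.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then first=$(git rev-parse HEAD); fi
    if [ "$i" -eq 6 ]; then expected=$(git rev-parse HEAD); fi
done

# Untracked test script: exit 0 means good, exit 1 means bad
cat > t.sh <<'EOF'
#!/bin/sh
if grep -q BUG app.txt; then exit 1; fi
exit 0
EOF

git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$first" >/dev/null
git bisect run sh ./t.sh >/dev/null

found=$(git rev-parse refs/bisect/bad)    # the first bad commit
git bisect reset >/dev/null 2>&1
echo "first bad commit: $found"
```

Because t.sh is untracked, it survives the checkouts that bisect performs between the individual test runs.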

Bisect can then automatically identify the problematic commit. This looks like this, for example:

$ git bisect run ./t.sh
Bisecting: 9 revisions left to test after this (roughly 3 steps) ...
Bisecting: 4 revisions left to test after this (roughly 2 steps) ...
Bisecting: 2 revisions left to test after this (roughly 1 step) ...
Bisecting: 0 revisions left to test after this (roughly 0 steps) ...
d29758fffc080d0d0a8ee9e5266fdf75fcb98076 is the first bad commit

With small commits and meaningful descriptions you can save yourself a lot of work by using the bisect command when searching for obscure bugs.

So take special care not to create commits that leave the software in a broken state (does not compile, etc.), which a later commit will fix.

5. Distributed Git

Git is a distributed version control system. To understand this feature, a brief digression into the world of centralized version management is necessary: As the name suggests, in a central version control system, such as RCS, CVS, and Subversion, the development history is stored centrally on a repository server, and all developers synchronize their work with this one repository. Developers who want to change something download a current version to their computer (checkout), make their modifications, and then send them back to the server (commit).

5.1. How Does Distributed Version Control Work?

One of the major disadvantages of the centralized approach is that a connection to the server is required for most of the work steps. For example, if you want to view history or make a commit, you need a network connection to the server. Unfortunately, this is not always guaranteed: the server may be down, or you may be working on your laptop without a (W)LAN connection.

For distributed systems this is regulated differently: Basically, each developer has his or her own local copy of the repository, so the question arises of how developers share changes.

One approach is to provide a single “master repository” that all developers use to synchronize their local repositories. The developers connect to this repository from time to time, uploading their own commits (push) and downloading those of their colleagues (fetch or pull). This very centralized approach is often used in practice. For an illustration, see Figure 30, “Central workflow with distributed version management”.

However, there are two noteworthy alternatives in the Git environment that we will introduce in this chapter: the Integration Manager workflow, which uses multiple public repositories (Sec. 5.6, “Distributed Workflow with Multiple Remotes”), and patch exchange by e-mail (Sec. 5.9, “Patches via E-mail”).

Figure 30. Central workflow with distributed version management

Unlike central systems, Git’s commit and checkout processes are local. Other day-to-day tasks, such as reviewing history or switching to a branch, are also done locally. Only the uploading and downloading of commits are non-local operations. This has two important advantages over centralized version management: No network is needed, and everything is faster. How often you synchronize your repository depends, among other things, on the size and development speed of the project. If you’re working with a colleague on the internals of your software, you’ll probably need to synchronize more often than if you’re working on a feature that doesn’t have a major impact on the rest of the code base. It may well be that one synchronization per day is sufficient. So you can work productively even without a permanent network connection.

This chapter is about how to exchange changes between your local repository and a remote repository (aka remote), what to consider when working with multiple remotes, and how to email patches so that they can be easily applied by the recipient.

The most important commands at a glance:

git remote: General configuration of remotes: add, remove, rename, etc.

git clone: Download complete copy.

git pull and git fetch: Download commits and references from a remote.

git push: Upload commits and references to a remote.

5.2. Cloning Repositories

You have already seen the first command related to remote repositories: git clone. Here we illustrate the cloning process with our “git cheat sheet”:⁠^[65]

$ git clone git://github.com/esc/git-cheatsheet-de.git
Initialized empty Git repository in /tmp/test/git-cheatsheet-de/.git/
remote: Counting objects: 77, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 77 (delta 45), reused 0 (delta 0)
Receiving objects: 100% (77/77), 132.44 KiB, done.
Resolving deltas: 100% (45/45), done.

Git will issue various status messages when this call is made. The most important ones are: the notification of which directory the new repository will be cloned to (Initialized empty Git repository in /tmp/test/git-cheatsheet-de/.git/), and the confirmation that all objects have been successfully received (Receiving objects: 100% (77/77), 132.44 KiB, done.). If the cloning process is successful, the master branch is checked out,⁠^[66] and the working tree including repository is located in the directory git-cheatsheet-de.

$ cd git-cheatsheet-de
$ ls
cheatsheet.pdf  cheatsheet.tex  Makefile  README
$ ls -d .*
.git/

To create the clone in a different directory, simply pass it as an argument:

$ git clone git://github.com/esc/git-cheatsheet-de.git cheatsheet
Initialized empty Git repository in /tmp/test/cheatsheet/.git/
$ ls
cheatsheet/

Furthermore, the source repository, i.e. the origin of the clone, is configured as a remote repository named origin. The git remote command displays the setting:

$ git remote
origin

The setting is stored in the configuration file .git/config with the entry remote, in this case only for origin:

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://github.com/esc/git-cheatsheet-de.git

You will see two settings in the section: fetch and url. The first, called the refspec, specifies which changes are to be downloaded when synchronizing with the remote repository, and the second specifies the URL used to do this.

git remote is also used to manage remote repositories. For example, you can add more remote repositories using git remote add, adapt the URL for the remote repository using git remote set-url, and so on, but more on this later.

The name origin is just a convention; with git remote rename you can change the name of the source repository to suit your needs, for example, from origin to github:

$ git remote rename origin github
$ git remote
github

With the option --origin or -o you set the name immediately when cloning:

$ git clone -o github git://github.com/esc/git-cheatsheet-de.git

5.2.1. Repository URLs

Git supports several protocols for accessing a remote repository, the three most common being the Git protocol, SSH, and HTTP(S). Designed specifically for Git, the Git protocol optimizes data transfer by always transmitting the smallest possible amount of data. It doesn’t support authentication, so it’s often tunneled over an SSH connection. This ensures both efficient (Git protocol) and secure (SSH) transmission. HTTP(S) is used when a firewall is configured very restrictively and the allowed ports are drastically restricted.⁠^[67]

In general, a valid URL contains the transfer protocol, the address of the server and the path to the repository:⁠^[68]

ssh://[user@]gitbu.ch[:port]/path/to/repo.git/
git://gitbu.ch[:port]/path/to/repo.git/
http[s]://gitbu.ch[:port]/path/to/repo.git/

For the SSH protocol there is also a short form:

[user@]gitbu.ch:path/to/repo.git/

It is also possible to clone repositories locally using the following syntax:

/path/to/repo.git/
file:///path/to/repo.git/

If you want to know what URLs are configured for a remote repository, use git remote’s --verbose or -v option:

$ git remote -v
origin  git://github.com/esc/git-cheatsheet-de.git (fetch)
origin  git://github.com/esc/git-cheatsheet-de.git (push)

You can see that there are two URLs for the remote repository origin, but they are set to the same value by default. The first URL (fetch) specifies from where and with which protocol changes are downloaded. The second URL (push) specifies where changes are uploaded to and with which protocol. Different URLs are particularly interesting if you download or upload with different protocols. A common example is to download with the git protocol (git://) and upload with the SSH protocol (ssh://). It is then downloaded without authentication and encryption, which provides a speed advantage, but uploaded with authentication and encryption, which ensures that only you or other authorized people can upload. You can use the git remote set-url command to customize the URLs:

$ git remote set-url --add \
  --push origin git@github.com:esc/git-cheatsheet-de.git
$ git remote -v
origin  git://github.com/esc/git-cheatsheet-de.git (fetch)
origin  git@github.com:esc/git-cheatsheet-de.git (push)

If you want to customize the URL of a repository, it is often faster to do this directly in the .git/config configuration file. Git provides the git config -e command for this: it opens this file in your editor.
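The effect of separate fetch and push URLs on the configuration can be inspected without any network access. The following sketch assumes an empty throwaway repository; the URLs are the ones from the example above:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

git remote add origin git://github.com/esc/git-cheatsheet-de.git
git remote set-url --push origin git@github.com:esc/git-cheatsheet-de.git

# The fetch URL is stored as 'url', the push URL as 'pushurl'
fetch_url=$(git config remote.origin.url)
push_url=$(git config remote.origin.pushurl)
git remote -v
```

In .git/config this should appear as an additional pushurl entry in the [remote "origin"] section.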

5.2.2. Remote-Tracking-Branches

The current status of the remote repository is stored locally. Git uses the mechanism of remote tracking branches, special branches — local references — that reflect the state of the remote branches. They “track” the remote branches and are advanced or set by Git when synchronizing with the remote, if the branches in the remote have changed. In terms of the commit graph, remote tracking branches are markers within the graph that point to the same commits as the branches in the remote repository. You can’t modify remote tracking branches like normal branches; Git manages them automatically, so it updates them. When you clone a repository, Git initializes a remote tracking branch for each remote branch.

Figure 31. Generated Remote Tracking Branches

Figure 31, “Generated Remote Tracking Branches” shows an example. The origin remote repository has three branches: pu, maint, and master. Git creates a remote tracking branch in the cloned repository for each of these remote branches. It also creates a local branch master in the clone that corresponds to the remote branch master. This is checked out and is the branch you should work in if you plan to upload commits to the master (but see Sec. 5.3.1, “git fetch”).

In the cheat sheet example above, there is only one branch on the remote side, master. That’s why Git creates only one remote tracking branch in the clone, origin/master. The git branch -r command shows all remote tracking branches:

$ git branch -r
  origin/HEAD -> origin/master
  origin/master

The special entry origin/HEAD → origin/master states that in the remote repository the HEAD points to the branch master. This is important for cloning, because this branch is checked out after cloning. The list of remote tracking branches is a bit sparse in this example; you can see more entries in a clone of the Git-via-Git repository:

$ git branch -r
  origin/HEAD -> origin/master
  origin/html
  origin/maint
  origin/man
  origin/master
  origin/next
  origin/pu
  origin/todo

All branches can be displayed with git branch -a:

$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master

In this case, Git uses the prefix remotes/ to clearly distinguish remote tracking branches from normal ones. If you have enabled color output, the different branches will also be color-coded: the checked-out branch green, remote tracking branches red.

Remote tracking branches, too, are just references and are therefore stored under .git/refs like all references. However, since they are special references that are also linked to a remote repository, they end up under .git/refs/remotes/<remote-name> (see Sec. 3.1.1, “HEAD and Other Symbolic References”). In Gitk, the remote tracking branches are displayed with the prefix remotes/<remote-name>/, which is also colored dark yellow (Figure 32, “Branch next and the corresponding remote tracking branch in Gitk”).

Figure 32. Branch next and the corresponding remote tracking branch in Gitk

5.3. Downloading Commits

Now what does it mean when you synchronize two repositories, such as a clone with the source? Synchronization in this context means two things: first, downloading commits and references, and second, uploading. As far as the commit graph is concerned, the local graph needs to be synchronized with the one on the remote side, so that both have the same structure. In this section, we first discuss how to download commits and references from a remote. There are two commands for this: git fetch and git pull. We’ll first introduce both commands, and in Sec. 5.3.3, “git fetch vs. git pull” we’ll describe which command is preferable under which circumstances.

5.3.1. git fetch

As soon as new commits are created by other developers in a remote, you want to download them to your local repository. In the simplest case, you just want to find out which commits you don’t have locally, download them, and update the remote tracking branches so that they reflect the current status in the remote.

Use the git fetch command to do this:

$ git fetch origin
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master

Git acknowledges the call with a message that origin/master has been set from commit 79170e8 to commit 003e3c7. The notation master → origin/master indicates that the branch master from the remote was used to update the remote tracking branch origin/master. In other words: Branches from the remote on the left and remote tracking branches on the right.

See Figure 33, “Remote Tracking Branches are updated” for the effect this has on the commit graph: On the left side is the initial state of the remote origin and next to it that of the clone. Both the remote and the clone have new commits since the last synchronization (C and D). The remote tracking branch origin/master in the clone points to commit B; this is the last state of the remote known to the clone. By calling git fetch origin, Git updates the remote tracking branch in the clone to reflect the current status of the master (pointing to commit C) in the remote. To do this, Git downloads the missing commit C and then sets the remote tracking branch on it.

Figure 33. Remote Tracking Branches are updated

5.3.1.1. Refspec

The refspec (reference specification) ensures that the remote tracking branches are set. This is a description of the references to be retrieved from the remote. An example was given above:

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://github.com/esc/git-cheatsheet-de.git

In the entry fetch the refspec for the remote is stored. It has the form: <remote-refs>:<local-refs> with an optional plus (+). The example is configured so that all branches, i.e. all references stored in the remote under refs/heads, end up locally under refs/remotes/origin.⁠^[69] Thus, for example, the branch master from the remote origin (refs/heads/master) is stored locally as refs/remotes/origin/master.

Normally the remote tracking branches are “fast-forwarded”, similar to a fast-forward merge. The remote tracking branch is therefore only updated if the target commit is a descendant of the current reference. This may not be possible, for example, after a rebase. In this case, Git will refuse to update the remote tracking branch. However, the plus overrides this behavior, and the remote tracking branch is still updated. If this happens, Git will indicate this with the addition (forced update):

 + f5225b8..0efec48 pu         -> origin/pu  (forced update)

This setting is useful in practice and is therefore set by default. Furthermore, as a user you do not need to worry about setting the refspec, because if you use the command git clone or git remote add, Git automatically creates the corresponding default entry for you. Sometimes you may want to restrict the refspec explicitly. For example, if you use namespaces for all developers and you are only interested in the master branch and the branches of the other developers in your team (Beatrice and Carlos), it might look like this:

[remote "firma"]
    url = axel@example.com:produkt.git
    fetch = +refs/heads/master:refs/remotes/firma/master
    fetch = +refs/heads/beatrice/*:refs/remotes/firma/beatrice/*
    fetch = +refs/heads/carlos/*:refs/remotes/firma/carlos/*

With regard to the commit graph, Git only downloads those commits that are needed to reach the fetched references. This makes sense, because commits that are not “secured” by a reference are considered unreachable, and will eventually be deleted (see also Sec. 3.1.2, “Managing Branches”). In the last example, Git therefore does not need to download commits that are referenced only by branches that are not in the refspec. In terms of distribution, Git does not necessarily need to synchronize the entire commit graph; the “relevant” parts are sufficient.

Alternatively, you can specify the refspec on the command line:

$ git fetch origin +refs/heads/master:refs/remotes/origin/master

If there is a refspec that has no reference on the right side of the colon, there is no target to store. In this case, Git places the reference in the .git/FETCH_HEAD file instead, and you can use the special term FETCH_HEAD for a merge:

$ git fetch origin master
From github.com:esc/git-cheatsheet-de
 * branch            master     -> FETCH_HEAD
$ cat .git/FETCH_HEAD
003e3c70ce7310f6d6836748f45284383480d40e
    branch 'master' of github.com:esc/git-cheatsheet-de
$ git merge FETCH_HEAD

This feature can be useful if you are interested in a single remote branch that you have not configured a remote tracking branch for and do not want to do so.
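Both variants, fetching into FETCH_HEAD and fetching with an explicit refspec, can be tried out with two local repositories. This is only a sketch: the directory names upstream and clone are assumptions, and the branch name is read from the repository because the default differs between Git installations.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
git init -q "$base/upstream"
cd "$base/upstream"
echo 1 > f
git add f
git commit -q -m "one"
branch=$(git rev-parse --abbrev-ref HEAD)     # e.g. master

git clone -q "$base/upstream" "$base/clone"

echo 2 >> f                                   # new commit after the clone
git commit -q -a -m "two"
tip=$(git rev-parse HEAD)

cd "$base/clone"
# No right-hand side in the refspec: the result is recorded in FETCH_HEAD
git fetch -q origin "$branch"
fetched=$(git rev-parse FETCH_HEAD)

# Explicit refspec: remote branch on the left, remote tracking branch on the right
git fetch -q origin "+refs/heads/$branch:refs/remotes/origin/$branch"
tracking=$(git rev-parse "refs/remotes/origin/$branch")
echo "FETCH_HEAD: $fetched"
echo "tracking:   $tracking"
```

Neither call touches the local branch; that only happens with a subsequent merge or rebase.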

5.3.1.2. Deleting Expired Remote Tracking Branches

If a remote branch is deleted (as described in Sec. 5.4.1, “Deleting Remote References”), the corresponding remote tracking branch is referred to as stale (“expired”). Since such branches usually have no further use, delete them (prune):

$ git remote prune origin

To delete them directly while downloading:

$ git fetch --prune

Since this is often the desired behavior, Git offers the fetch.prune option. If you set it to true, git fetch will behave as if you had called it with the --prune option.
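How fetch.prune affects a stale remote tracking branch can be observed with two local repositories. The names upstream, clone and topic in this sketch are assumptions for illustration:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
git init -q "$base/upstream"
cd "$base/upstream"
echo 1 > f
git add f
git commit -q -m "one"
git branch topic

git clone -q "$base/upstream" "$base/clone"
git branch -q -D topic                    # delete the branch in the remote

cd "$base/clone"
before=$(git branch -r | grep -c 'origin/topic' || true)

git config fetch.prune true               # every fetch now prunes stale branches
git fetch -q origin
after=$(git branch -r | grep -c 'origin/topic' || true)
echo "origin/topic entries before: $before, after: $after"
```

To enable this behavior for all repositories, the option can also be set with git config --global.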

5.3.1.3. Working with Local Branches

So far we have only discussed how to track the changes in a remote. If you want to make changes yourself that are based on one of the branches in the remote, you must first create a local branch where you are allowed to make commits:⁠^[70]

$ git checkout -b next origin/next
Branch next set up to track remote branch next from origin.
Switched to a new branch next

If no local branch named next exists yet, the following abbreviation also works:

$ git checkout next
Branch next set up to track remote branch next from origin.
Switched to a new branch next

The set up to track message indicates that Git is configuring the branch next from the remote origin as the upstream branch for the local branch next. This is a kind of “shortcut” that benefits other Git commands. For more details, see Sec. 5.3.2, “git pull”.

You can work in the local branch as usual. Note, however, that you only ever commit locally. To publish your work, i.e. upload it to a remote branch, you still need the git push command (Sec. 5.4, “Uploading Commits: git push”).
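The upstream configuration created by the checkout can be queried afterwards. The following sketch uses two local repositories whose names upstream and clone are assumptions, and reads the result with the @{upstream} notation and git config:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
git init -q "$base/upstream"
cd "$base/upstream"
echo 1 > f
git add f
git commit -q -m "one"
git branch next

git clone -q "$base/upstream" "$base/clone"
cd "$base/clone"

# Create a local branch based on the remote tracking branch
git checkout -b next origin/next >/dev/null 2>&1

upstream=$(git rev-parse --abbrev-ref 'next@{upstream}')
remote=$(git config branch.next.remote)
merge=$(git config branch.next.merge)
echo "upstream: $upstream (remote=$remote, merge=$merge)"
```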

5.3.2. git pull

Suppose you want to transfer commits from the remote repository to your local branch. To do this, first run git fetch to download new commits, and then merge the changes from the corresponding remote tracking branch:⁠^[71]

$ git merge origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 +++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

For this use case, Git provides the git pull command to speed up your workflow. It is a combination of git fetch and git merge or git rebase.

Downloading new commits from origin and merging all commits referenced by the master there into the current branch can be done with the following command:

$ git pull origin master
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

In Figure 34, “What happens with a pull” we illustrate the process. On the left, you see the remote repository origin and next to it the current status of the local repository. The repository was cloned when it only contained commits A and B, so the remote tracking branch origin/master points to B. In the meantime, commits have been added to both the remote (C) and the local (D) repository.

On the right side is the state after git pull origin master. Commit C has been added to the local repository. The fetch call contained in the pull has updated the remote tracking branch, i.e. it points to the same commit as the master in origin and thus reflects the state there. In addition, the merge call contained in the pull has integrated the master from origin into the local master, as you can see from the merge commit M and the current position of the local master.

Figure 34. What happens with a pull

Alternatively, the --rebase option instructs the pull command to rebase the local branch onto the remote tracking branch after the fetch:

$ git pull --rebase origin master

In Figure 35, “What happens during a pull with rebase” you can see what happens if you perform a rebase instead of the default merge.

Figure 35. What happens during a pull with rebase

The initial situation is the same as in Figure 34, “What happens with a pull”. The fetch contained in the pull moves the remote tracking branch origin/master to commit C. However, rebase does not create a merge commit; instead, a call to rebase gives the commit D a new base, and the local master is set to the new commit D'. (Rebase is described in detail in Sec. 4.1, “Moving commits — Rebase”).
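
The two variants can be compared in a throwaway sandbox. The following is only a sketch under assumptions that are not part of the example above: the repository names (seed, origin.git, merged, rebased, other), the Demo identity and the helper function commit are made up for illustration. It recreates the situation from the figures (A and B shared, C on the remote, D local) and counts the merge commits that each kind of pull leaves behind.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <name>: add a file called <name> and commit it (hypothetical helper)
commit() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "Add $2"
}

# Shared history A, B on branch master, published as the bare repository origin.git
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
commit seed A
commit seed B
git clone -q --bare seed origin.git

# Two local clones in the same starting state (master and origin/master at B)
git clone -q origin.git merged
git clone -q origin.git rebased

# Someone else publishes commit C ...
git clone -q origin.git other
commit other C
git -C other push -q origin master

# ... while both local clones create their own commit D
commit merged D
commit rebased D

# fetch + merge: creates a merge commit
git -C merged pull -q --no-rebase --no-edit origin master
# fetch + rebase: D is replayed on top of C
git -C rebased pull -q --rebase origin master

merges_after_merge=$(git -C merged rev-list --merges --count master)
merges_after_rebase=$(git -C rebased rev-list --merges --count master)
echo "merge commits after pull:          $merges_after_merge"
echo "merge commits after pull --rebase: $merges_after_rebase"

cd / && rm -rf "$tmp"
```

The clone that pulled with a merge contains one merge commit, while the history of the clone that pulled with --rebase stays linear.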

5.3.2.1. Upstream Branches

Often git fetch, git pull and git push are executed without arguments. In that case, Git decides what to do based on, among other things, the configuration of the upstream branches. An excerpt from the repository’s config:

[branch "master"]
    remote = origin
    merge = refs/heads/master

The entry states that the local branch master is linked to the remote branch master in the origin repository.

The remote entry tells git fetch and git pull which remote to download commits from. The merge entry tells git pull to merge the new commits from the remote branch master into the local master. This allows both commands to be used without arguments, which is very common in practice.

$ git fetch
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
$ git pull
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

If no upstream branch is configured, git fetch falls back to the remote origin; if that does not exist either, it aborts:

$ git fetch
fatal: No remote repository specified.  Please, specify either a URL or
a remote name from which new revisions should be fetched.

If you want git pull to apply changes from an upstream branch by rebase instead of merge by default, set the branch.<name>.rebase setting to true, for example:

$ git config branch.master.rebase true
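
To see where these settings live, you can read them back with git config. The following sketch works in a temporary sandbox; the repository names seed and work and the Demo identity are assumptions chosen for illustration. It shows the upstream configuration that git clone writes for master, how @{upstream} resolves, and the effect of setting branch.master.rebase.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A repository with one commit on master, and a clone of it
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo A > seed/A
git -C seed add A
git -C seed commit -q -m "Add A"
git clone -q seed work

# git clone has written the upstream configuration for master
up_remote=$(git -C work config --get branch.master.remote)
up_merge=$(git -C work config --get branch.master.merge)
# @{upstream} resolves to the corresponding remote tracking branch
up_branch=$(git -C work rev-parse --abbrev-ref 'master@{upstream}')
echo "remote=$up_remote merge=$up_merge upstream=$up_branch"

# Make a plain "git pull" on master rebase instead of merge
git -C work config branch.master.rebase true
up_rebase=$(git -C work config --get branch.master.rebase)
echo "rebase=$up_rebase"

cd / && rm -rf "$tmp"
```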

5.3.3. git fetch vs. git pull

Git beginners often ask themselves whether they should use fetch or pull. The answer depends on how you develop: How big is the project? How many remotes are there? How heavily are branches used?

5.3.3.1. Distributed Git for Beginners

Especially for beginners, it makes sense for all participants to work on the same branch (usually master), synchronize with the same repository (central workflow) and use only git pull for downloading and git push for uploading. This eliminates the need to deal with more complex aspects such as the object model, branching and distribution, and participants can contribute improvements with just a few commands.

This results in the following workflow:

# Clone the repository
$ git clone <URL>
# Work and make local commits
$ git add ...
$ git commit
# Download changes from others
$ git pull
# Upload your own changes
$ git push
# Continue working, and repeat the synchronization as needed
$ git commit

This approach has advantages and disadvantages. The advantage is certainly that only a basic understanding of Git is necessary to follow the workflow successfully. The automatic configuration of upstream branches ensures that git push and git pull do the “right thing” without arguments. In addition, this workflow is similar to what Subversion users are used to.

However, there are also drawbacks, mainly related to implicit merging. Suppose the team consists of two people, Beatrice and Carlos. Both have made local commits, and Beatrice has already uploaded hers. Carlos now runs git pull and receives the message Merge made by recursive. If you keep the commit graph in mind, it’s logical: the local branch and the master of the remote have diverged, so they have been merged back together. However, Carlos doesn’t understand the message, since he was working on a different part of the code than his colleague, and in his opinion no merge was necessary. One problem is that the term merge carries an association that many people have from centralized version control, namely that changes to the same file are being merged. With Git, however, a merge is always to be understood as the joining of commits in the commit graph. This may mean merging changes to the same file, but it does not require it.

Besides confusing users, this workflow creates “nonsensical” commits in the history. Ideally, merge commits should be meaningful entries in the repository history. An outsider can immediately see that a development branch has been included. However, this workflow inevitably involves the local master and its remote counterpart diverging and being merged back together. The resulting merge commits make no sense — they are actually only a side effect of the workflow and reduce the readability of the history. Although the --rebase option for git pull offers a remedy, the man page explicitly advises against using this option unless you have already internalized the principle of rebase. Once you understand this, you’re also familiar with how the commit graph is created and how to manipulate it — it’s worthwhile for you to go straight for feature-driven development with branches as a workflow.

5.3.3.2. Distributed Git for Advanced Users

Once you understand the object model and the commit graph, we recommend that you use a workflow that essentially consists of git fetch, manual merges, and many branches. The following are some recipes as a suggestion.

If you are using master as your integration branch, you will need to move your local master forward after calling git fetch. To be precise, you need to advance all local branches that have a remote equivalent. Git provides the syntax @{upstream} and @{u}, both of which refer to the remote tracking branch configured for the current branch. This can be very helpful.

# Download changes from others
$ git remote update
...
   79170e8..003e3c7  master     -> origin/master

# Query the status of the remote tracking branches
$ git branch -vv
* master 79170e8 [origin/master: behind 1] Lizenz hinzugefügt

# Inspect the changes
$ git log -p ..@{u}

# Apply the downloaded changes
$ git merge @{u}
Updating 79170e8..003e3c7
Fast-forward
...

# ... or rebuild your own changes on top of them
$ git rebase @{u}

# Then upload the changes
$ git push

If you frequently synchronize local branches with your remote tracking branch, we recommend the following alias:

$ git config --global alias.fft "merge --ff-only @{u}"

This allows you to easily move forward a branch with git fft (Fast Forward Tracking). The --ff-only option prevents accidental merge commits from occurring where none should.
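
The protective effect of --ff-only can be tried out in a sandbox. The sketch below rests on assumptions made for illustration: the repositories seed, origin.git, work and other, the Demo identity and the helper commit are hypothetical, and the alias is stored in the sandbox repository's own configuration instead of with --global so that your personal configuration is left alone. It runs the alias once when the branch is merely behind its upstream and once when both sides have diverged.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <name>: add a file called <name> and commit it (hypothetical helper)
commit() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "Add $2"
}

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
commit seed A
git clone -q --bare seed origin.git
git clone -q origin.git work
git clone -q origin.git other

# Same alias as in the text, but stored only in this repository
git -C work config alias.fft 'merge --ff-only @{u}'

# Case 1: work is only behind its upstream, so the fast-forward succeeds
commit other B
git -C other push -q origin master
git -C work fetch -q
git -C work fft >/dev/null
case1=$(git -C work log -1 --format=%s)

# Case 2: both sides have new commits, so --ff-only refuses to merge
commit other C
git -C other push -q origin master
commit work D
git -C work fetch -q
if git -C work fft >/dev/null 2>&1; then case2=merged; else case2=refused; fi

echo "case 1 tip: $case1"
echo "case 2:     $case2"

cd / && rm -rf "$tmp"
```

In the second case the branch is left untouched, and you can then decide deliberately between git merge @{u} and git rebase @{u}.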

In this context, Ch. 6, Workflows is also helpful: it describes how to keep an overview when working with many topic branches.

5.4. Uploading Commits: git push

The counterpart to fetch and pull is the command git push. It uploads Git objects and references to a remote, e.g. the local master to the branch master in the remote origin:

$ git push origin master:master

As with git fetch, you specify the references for uploading with a refspec. However, the refspec has the opposite form:

<local-refs>:<remote-refs>

This time the local references are on the left side of the colon, and the remote references on the right.

If you omit the colon and the remote reference, the local name is also used on the remote side, and Git creates the reference there if it doesn’t exist:

$ git push origin master
Counting objects: 73, done.
Compressing objects: 100% (33/33), done.
Writing objects: 100% (73/73), 116.22 KiB, done.
Total 73 (delta 42), reused 68 (delta 40)
Unpacking objects: 100% (73/73), done.
To git@github.com:esc/git-cheatsheet-de.git
 * [new branch]      master -> master

Figure 36, “Upload references and commits” shows the process behind git push. The initial situation is shown on the left (it is the result of a pull call). Git uploads the missing commits D and M to the remote origin. At the same time, the remote branch master is advanced to the commit M, so that it matches the local branch master. In addition, the remote tracking branch origin/master is advanced so that it reflects the current status in the remote.

Figure 36. Upload references and commits

As with fetch, Git refuses to update references where the target commit is not a descendant of the current commit:

$ git push origin master
...
 ! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'git@github.com:esc/git-cheatsheet-de.git'
To prevent you from losing history, non-fast-forward updates were
rejected
Merge the remote changes before pushing again.  See the 'Note about
fast-forwards' section of 'git push --help' for details.

You can override this behavior either by prefixing the refspec with a plus (+) or by using the --force option (-f for short):^[72]

$ git push origin --force master
$ git push origin +master

Look out! Commits may be lost on the remote side — for example, if you have moved a branch using git reset --hard and commits are no longer referenced.

You’ll also get the error message if you have modified commits that have already been published via git push using git rebase or git commit --amend. So here’s the explicit warning again: avoid modifying commits that you have already published! The modified SHA-1 sums will cause duplication if others have already downloaded the original commits.
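
The rejection and the effect of a forced push can be reproduced without touching a real project. The following sketch makes several assumptions for illustration: the repositories seed, origin.git, alice and bob, the Demo identity and the helper commit are invented. Alice publishes commit B first; Bob, who has not fetched it, then tries to push his commit C.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <name>: add a file called <name> and commit it (hypothetical helper)
commit() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "Add $2"
}

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
commit seed A
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# Alice publishes B; Bob commits C on top of A without fetching
commit alice B
git -C alice push -q origin master
commit bob C

# Bob's master does not contain B, so a plain push is rejected
if git -C bob push -q origin master 2>/dev/null
then plain=accepted; else plain=rejected; fi

# A leading + in the refspec (like --force) overrides the check
git -C bob push -q origin +master

remote_tip=$(git --git-dir=origin.git log -1 --format=%s master)
remote_commits=$(git --git-dir=origin.git rev-list --count master)
echo "plain push: $plain"
echo "remote master now ends in '$remote_tip' and contains $remote_commits commits"

cd / && rm -rf "$tmp"
```

After the forced push, the remote master consists only of A and C; Alice's commit B is no longer referenced there, which is exactly the loss the warning above refers to.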

5.4.1. Deleting Remote References

There are two ways to delete references in the remote: The older one (before Git version 1.7.0) is to omit the local reference in the refspec — this statement means you want to upload “nothing”. So you replace an existing reference with the empty one.

$ git push origin :bugfix

However, newer Git versions usually use the git push command with the --delete option, which is syntactically much clearer:

$ git push origin --delete bugfix

Note that in other clones, the remote tracking branch origin/bugfix, if present, does not automatically disappear! See the section on pruning above (Sec. 5.3, “Downloading Commits”).
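
The following sketch illustrates this behavior in a sandbox. The repositories seed, origin.git, alice and bob, the Demo identity and the helper has_tracking are assumptions chosen for the example. Alice deletes the branch bugfix in the remote, and the script then checks whether Bob's remote tracking branch origin/bugfix still exists after a plain fetch and after a fetch with --prune.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A remote with the branches master and bugfix, cloned by Alice and Bob
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo A > seed/A
git -C seed add A
git -C seed commit -q -m "Add A"
git -C seed branch bugfix
git clone -q --bare seed origin.git
git clone -q origin.git alice
git clone -q origin.git bob

# has_tracking: report whether Bob still has origin/bugfix (hypothetical helper)
has_tracking() {
  if git -C bob rev-parse --verify --quiet refs/remotes/origin/bugfix >/dev/null
  then echo present; else echo gone; fi
}

# Alice deletes the branch in the remote
git -C alice push -q origin --delete bugfix
remote_branches=$(git --git-dir=origin.git for-each-ref --format='%(refname:short)' refs/heads)

# A plain fetch keeps Bob's stale remote tracking branch ...
git -C bob fetch -q
after_fetch=$(has_tracking)

# ... and only pruning removes it
git -C bob fetch -q --prune
after_prune=$(has_tracking)

echo "branches in the remote: $remote_branches"
echo "origin/bugfix after fetch: $after_fetch, after fetch --prune: $after_prune"

cd / && rm -rf "$tmp"
```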

5.4.2. Pushing without Arguments: push.default

In everyday life you often run git push without specifying remote and refspec. In this case, Git uses the configuration entries (upstream branch and push.default) to decide which references are sent where.

$ git push
...
To git@github.com:esc/git-cheatsheet-de.git
   79170e8..003e3c7  master -> master

By default, Git proceeds like this:⁠^[73] If you don’t specify a remote, Git will look for the upstream configuration of the current branch. If the name of the branch on the remote side matches the name of the local branch, the corresponding reference is uploaded (this is to protect you from uploading, for example, your branch devel to master if the upstream configuration is incorrect). If no upstream branch is configured, Git aborts with an error message:

$ git push
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master

If you use git push <remote> to specify a remote but no branch, Git will attempt to upload the current branch to the remote under the same name.

The strategy described here is also known as simple. For most use cases, it does what the user expects and protects against avoidable errors. However, you can set the push.default option responsible for this to one of the following values if required:

`nothing`	Do not upload anything. This is useful if you always want to explicitly specify which branch you want to upload to where.
`upstream`	If the current branch has an upstream branch, push there.
`current`	Push the current branch into a remote branch of the same name.
`matching`	Uploads all locally existing references for which a reference of the same name already exists in the corresponding remote. Attention: You are potentially uploading several branches at the same time!

5.4.3. Configuring the Upstream Branch

In some cases, Git will automatically configure upstream branches (for example, after a git clone). However, you need to do this explicitly, especially for new branches that you are uploading for the first time. You can do this either afterwards using the --set-upstream-to option or, in short, -u of git branch:

$ git push origin new-feature
$ git branch -u origin/new-feature
Branch new-feature set up to track remote branch new-feature from origin.

Alternatively, if you think of it in time, you can have git push write the configuration right away by calling it with the -u option:

$ git push -u origin new-feature

To view the upstream configuration of your branches, call git branch -vv. The output shows the upstream partner of a branch (if any) in square brackets.
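
What -u actually writes can be checked by reading the configuration before and after the push. This is a sketch in a temporary sandbox; the repositories seed, origin.git and work as well as the Demo identity are assumptions for illustration, while the branch name new-feature follows the example above.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo A > seed/A
git -C seed add A
git -C seed commit -q -m "Add A"
git clone -q --bare seed origin.git
git clone -q origin.git work

# A new local branch has no upstream configuration yet
git -C work checkout -q -b new-feature
echo F > work/F
git -C work add F
git -C work commit -q -m "Add F"
before=$(git -C work config --get branch.new-feature.remote || echo unset)

# Upload the branch and record the upstream in one step
git -C work push -q -u origin new-feature >/dev/null

after_remote=$(git -C work config --get branch.new-feature.remote)
after_merge=$(git -C work config --get branch.new-feature.merge)
echo "before: $before"
echo "after:  remote=$after_remote merge=$after_merge"

# The upstream now also appears in square brackets
git -C work branch -vv

cd / && rm -rf "$tmp"
```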

5.5. Examining Remotes

In this section, we introduce techniques for viewing a remote and comparing your local repository to it.

5.5.1. Overview of a Remote

The git remote show command gives a concise summary of the remote, including the branches available there, whether they are tracked locally (tracking status) and which local branches are configured for specific tasks.

The command must request the current status from the remote, i.e. the command fails if the remote is not available, e.g. due to a missing network connection. The option -n prevents the query.

$ git remote show origin
* remote origin
  Fetch URL: git://git.kernel.org/pub/scm/git/git.git
  Push  URL: git://git.kernel.org/pub/scm/git/git.git
  HEAD branch: master
  Remote branches:
    html   tracked
    maint  tracked
    man    tracked
    master tracked
    next   tracked
    pu     tracked
    todo   tracked
  Local branches configured for 'git pull':
    master merges with remote master
    pu     merges with remote pu
  Local refs configured for 'git push':
    master pushes to master (local out of date)
    pu     pushes to pu     (up to date)

5.5.2. Comparing with the Upstream

If you have configured an upstream branch, then when you switch branches (git checkout) or query the status (git status), Git reports how the branch compares to its upstream, for example:

$ git checkout master
Your branch is behind 'origin/master' by 73 commits, and can be
fast-forwarded.

Here there are four different possibilities:

The branches point to the same commit. Git doesn’t show any special message. This state is also called up-to-date.

The local branch has commits that are not yet available upstream:

Your branch is ahead of 'origin/master' by 16 commits.

The remote tracking branch has commits that are not yet available in the local branch:

Your branch is behind 'origin/master' by 73 commits, and can be fast-forwarded.

Both the second and third conditions apply, a state called diverged in Git jargon:

Your branch and 'origin/master' have diverged, and have 16 and 73 different commit(s) each, respectively.

With the -v (compare only) or -vv (compare and upstream name) option, git branch displays the appropriate information for local branches:

$ git branch -vv
* master      0a464e9 [origin/master: ahead 1] docs: fix grammar in
git-tags.txt
  feature     cd3065f Merge branch 'kc/gitweb-pathinfo-w-anchor'
  next        be8b495 [origin/next] Merge branch master into next
  pu          0c0c536 [origin/pu: behind 3] Merge branch
'jk/maint-merge-rename-create' into pu

The command prints the SHA-1 prefix for all branches and the commit message of the current commit. If an upstream is configured for the branch, Git returns both the name and a comparison to the upstream. In the example, you see four different branches. master has an additional commit that has not yet been uploaded to the remote, and is therefore ahead. The branch feature, on the other hand, has no upstream branch configured, so it currently exists only locally. The branch next is up-to-date with the corresponding remote tracking branch. The branch pu, on the other hand, “lags” behind its upstream and is therefore displayed as behind. The only state missing here is diverged; in that case both ahead and behind are shown, each with the number of “missing” commits.
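
If you need the ahead and behind numbers in a script, git rev-list can compute them directly. The sketch below sets up the missing diverged state in a sandbox; the repositories seed, origin.git, work and other, the Demo identity and the helper commit are assumptions made for illustration. The local master gets one commit of its own while the remote gains two.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <name>: add a file called <name> and commit it (hypothetical helper)
commit() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "Add $2"
}

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
commit seed A
git clone -q --bare seed origin.git
git clone -q origin.git work
git clone -q origin.git other

# The remote gains B and C, the local master gains D
commit other B
commit other C
git -C other push -q origin master
commit work D
git -C work fetch -q

# Prints two numbers: commits only in master, commits only in origin/master
set -- $(git -C work rev-list --left-right --count master...origin/master)
ahead=$1
behind=$2
echo "ahead $ahead, behind $behind"

# The same comparison as shown by git branch
git -C work branch -vv

cd / && rm -rf "$tmp"
```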

5.6. Distributed Workflow with Multiple Remotes

Git supports working with multiple remotes. A popular workflow that takes advantage of this feature is the Integration Manager Workflow. There is no “central” repository in the true sense of the word, that is, one that all active developers have write access to. Instead, there is only a quasi-official repository called blessed. It is accessible, for example, via the respective project domain and allows only the most important maintainers (or even only one) write access.

Everyone who wants to contribute to the project clones the blessed repository and starts working. As soon as he has fixed bugs or implemented a new feature, he makes his improvements available via a publicly accessible repository, a so-called developer public. He then sends a pull request to one of the maintainers of the official repository (or to the mailing list), requesting that certain code from his public repository be transferred to the official repository. You can see the infrastructure for this process in Figure 37, “Integration Manager Workflow”. Although it is theoretically possible to give interested parties direct access to your development machine, this almost never happens in practice.

Figure 37. Integration Manager Workflow

One of the maintainers who have access to the master repository then checks if the code works, if it meets the quality requirements, etc. Any errors or ambiguities are reported to the author of the code, who then corrects them in his repository. Only when the maintainer is satisfied does he commit the changes to the master repository, so that the code is delivered in one of the following releases. Maintainers who integrate new code are often referred to as Integration Managers, which gives the workflow its name. Such maintainers often have several remotes configured, one for each contributor.

One of the great advantages of this workflow is that, in addition to the maintainers, interested users, such as colleagues or friends of the developer, also have access to the public developer repositories. They don’t have to wait until the code has found its way into the official repository, but can try out the improvements immediately after deployment. The hosting platform GitHub in particular relies heavily on this workflow. The web interface used there offers a lot of features to support this workflow, e.g. a visualization that shows all available clones of a project and the commits contained in them, as well as the possibility to perform merges directly in the web interface. For a detailed description of this service, see Ch. 11, GitHub.

5.7. Managing Remotes

With git remote you can manage additional remotes. For example, to add a new remote from another developer, use the command git remote add. Most of the time you’ll want to initialize the remote tracking branches afterwards, which you can do with git fetch:

$ git remote add example git://example.com/example.git
$ git fetch example
...

To do both steps in one call, use the -f option, for fetch:

$ git remote add -f example git://example.com/example.git

If you no longer need the remote, you can remove it from your local configuration using git remote rm. This will also delete all remote tracking branches for that remote:

$ git remote rm example

Remotes do not necessarily have to be configured via git remote add. You can simply use the URL on the command line,⁠^[74] for example to download the objects and references for a bugfix:

$ git fetch git://example.com/example.git bugfix:bugfix

Of course this also works with pull and push.

If you work with several remotes, the command git remote update --prune is a good choice. This will fetch all remotes, and the --prune option will delete all expired remote tracking branches.

The following alias has proved to be very useful for us, as it combines many work steps that are often performed one after the other in practice:

$ git config --global alias.ru "remote update --prune"
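
The life cycle of an additional remote can be traced in a sandbox. This sketch assumes hypothetical repositories seed, work and theirs, a made-up Demo identity and a local path in place of a real URL. It adds theirs as the remote example with an immediate fetch, checks that a remote tracking branch was created, and then removes the remote again.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo A > seed/A
git -C seed add A
git -C seed commit -q -m "Add A"

# Our own clone, and another developer's clone with a bugfix branch
git clone -q seed work
git clone -q seed theirs
git -C theirs checkout -q -b bugfix
echo B > theirs/B
git -C theirs add B
git -C theirs commit -q -m "Add B"

# Add the remote and fetch it in one call
git -C work remote add -f example "$tmp/theirs" >/dev/null 2>&1
if git -C work rev-parse --verify --quiet refs/remotes/example/bugfix >/dev/null
then tracking=present; else tracking=missing; fi

# Removing the remote also deletes its remote tracking branches
git -C work remote rm example
left=$(git -C work for-each-ref refs/remotes/example | wc -l | tr -d ' ')
remotes=$(git -C work remote)

echo "example/bugfix after add -f: $tracking"
echo "refs left after rm: $left, remaining remotes: $remotes"

cd / && rm -rf "$tmp"
```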

5.7.1. Pull-Request

Git provides the request-pull command to generate a pull request automatically. The syntax is:

git request-pull <start> <URL> [<end>]

As <URL> you specify your public repository (either as the actual URL or as a configured remote repository), and as <start> you select the reference on which the feature is built (in many cases the branch master, which should match the master branch of the official repository). Optionally, you can specify an <end>; if you omit this, Git will use HEAD.

By default, the output goes to STDOUT and includes the repository’s URL and branch name, a short description of all commits grouped by author, and a diffstat, i.e. a tally of added and deleted lines per file. This output can easily be forwarded to an e-mail program. If you add the -p option, a patch with all changes is appended below the text.

For example, to ask someone to download the two latest commits from a repository:

$ git request-pull HEAD~2 origin
The following changes since commit d2640ac6a1a552781[...]c48e08e695d53:

  README verbessert (2010-11-20 21:27:20 +0100)

are available in the git repository at:
  git@github.com:esc/git-cheatsheet-de.git master

Valentin Haenel (2):
      Lizenz hinzugefügt
      URL hinzugefügt und Metadaten neu formatiert

 cheatsheet.pdf |  Bin 89513 -> 95619 bytes
 cheatsheet.tex |   18 ++++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

5.8. Exchanging Tags

Tags are also exchanged with the remote commands fetch or pull and push. In contrast to branches, which change, tags are “static”. For this reason, remote tags are not referenced separately in the local repository, so there is no equivalent of remote tracking branches for tags. Tags that you get from your remote repositories are stored by Git under .git/refs/tags/ or in .git/packed-refs, as usual.

5.8.1. Downloading Tags

In principle, Git automatically downloads new tags when you call git fetch or git pull. That is, if you download a commit that has a tag pointing to it, that tag will be included. However, if you use a refspec to exclude individual branches, then commits in those branches will not be downloaded, and thus no tags that may point to those commits will be downloaded. Conclusion: Git only downloads relevant tags. With the options --no-tags (no tags) and --tags or -t (all tags) you can adjust the default behavior. Note, however, that --tags not only downloads the tags, but necessarily also the commits to which they point.

Git notifies you when new tags arrive:

$ git fetch
[fetch output]
From git://git.kernel.org/pub/scm/git/git
 * [new tag]         v1.7.4.2   -> v1.7.4.2

If you want to know what tags are present on the remote side, use git ls-remote with the --tags option. For example, you can get all release candidates of Git version 1.7.1 with the following call:

$ git ls-remote --tags origin 'v1.7.1-rc*'
bdf533f9b47dc58ac452a4cc92c81dc0b2f5304f    refs/tags/v1.7.1-rc0
537f6c7fb40257776a513128043112ea43b5cdb8    refs/tags/v1.7.1-rc0^{}
d34cb027c31d8a80c5dbbf74272ecd07001952e6    refs/tags/v1.7.1-rc1
b9aa901856cee7ad16737343f6a372bb37871258    refs/tags/v1.7.1-rc1^{}
03c5bd5315930d8d88d0c6b521e998041a13bb26    refs/tags/v1.7.1-rc2
5469e2dab133a197dc2ca2fa47eb9e846ac19b66    refs/tags/v1.7.1-rc2^{}

Git outputs the SHA-1 sums of the tags and their contents.⁠^[75]
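
The pairs of lines in this output can be reproduced with a small local repository. The sketch below is an illustration under assumptions: the repository repo, the tag names light and v0.1 and the Demo identity are made up, and a local path stands in for the remote. It creates one lightweight and one annotated tag and compares the listed SHA-1 sums with the tagged commit.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
git -C repo symbolic-ref HEAD refs/heads/master
echo A > repo/A
git -C repo add A
git -C repo commit -q -m "Add A"

# One lightweight and one annotated tag on the same commit
git -C repo tag light
git -C repo tag -a -m "Release 0.1" v0.1

# ls-remote also accepts a path to a repository
git ls-remote --tags ./repo

lines=$(git ls-remote --tags ./repo | wc -l | tr -d ' ')
commit_id=$(git -C repo rev-parse HEAD)
light_id=$(git ls-remote --tags ./repo | awk '$2 == "refs/tags/light" { print $1 }')
tag_id=$(git ls-remote --tags ./repo | awk '$2 == "refs/tags/v0.1" { print $1 }')
peeled_id=$(git ls-remote --tags ./repo | awk '$2 == "refs/tags/v0.1^{}" { print $1 }')

cd / && rm -rf "$tmp"
```

The lightweight tag is listed once with the SHA-1 sum of the commit itself. The annotated tag appears twice: once with the SHA-1 sum of the tag object, and once with the suffix ^{} and the SHA-1 sum of the commit it points to.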

5.8.2. Uploading Tags

Git does not automatically upload tags. You need to pass them explicitly to git push, similar to the branches, e.g. to upload the tag v0.1:

$ git push origin v0.1

If you want to upload all tags at once, use the --tags option. But be careful: Avoid this option if you use Annotated Tags to mark versions and Lightweight Tags to mark something locally, as described in Sec. 3.1.3, “Tags — Marking Important Versions”, because with this option you would upload all tags, as already mentioned.

Attention: Once you have uploaded a tag, you should never change it! The reason: Let’s say Axel changes a tag, like v0.7, that he has already released. First it pointed to the 5b6eef commit, and now to bab18e. Beatrice had already downloaded the first version pointing to 5b6eef, but Carlos had not yet. The next time Beatrice calls git pull, Git won’t download the new version of the v0.7 tag; the assumption is that tags don’t change, so Git doesn’t check the validity of the tag! When Carlos now runs git pull, he also gets the v0.7 tag, but it now points to bab18e. In the end, two versions of the tag, each pointing to a different commit, are in circulation. Not a very helpful situation. It gets really confusing when both Carlos and Beatrice use the same public repository, and upload all tags by default.^[76] The tag “jumps” back and forth between two commits in the public repository, so to speak; which version you get with a clone depends on who pushed last.

If this mishap does happen to you, you have two options:

The sensible alternative: Instead of replacing the tag, create a new one and upload it as well. Name the new tag according to the project conventions. If the old tag is v0.7, name the new one something like v0.7.1.
If you really want to replace the tag: Admit publicly (mailing list, wiki, blog) that you made a mistake. Let all developers and users know that a tag has changed and ask them to check the tag with you. The size of the project and your willingness to take risks will determine whether this solution is feasible.

5.9. Patches via E-mail

An alternative to setting up a public repository is to automatically send patches via email. The format of the email is chosen so that maintainers can have Git automatically apply patches received via email. Especially for small bug fixes and sporadic collaboration, this is usually less time-consuming and faster. There are many projects that rely on this type of exchange, most notably the Git project itself.

The majority of patches for Git are contributed via the mailing list. There they go through a stringent review process, which usually leads to corrections and improvements. The patches are improved by the author and sent back to the list until a consensus is reached. Meanwhile, the maintainer regularly stores the patches in a branch in his repository, and makes them available for testing via the pu branch. If the patch series is considered finished by the participants on the list, the branch moves on to the different integration branches pu and next, where the changes are tested for compatibility and stability. If everything is in order, the branch finally ends up in the master and from there forms part of the next release.

The patches-via-e-mail approach is implemented by the following Git commands:

`git format-patch`	Format commits for sending as patches.
`git send-email`	Send patches.
`git am`	Add patches from a mailbox to the current branch (apply from mailbox).

5.9.1. Exporting Patches

The git format-patch command exports one or more commits as patches in Unix mailbox format and writes one file per commit. The file names consist of a sequential number and the commit message, and end in .patch.^[77] As an argument, the command expects either a single commit or a range such as A..B. If you specify a single commit, Git interprets it as the range from that commit to HEAD.

Figure 38. Formatting three commits to 'master' as patches

Figure 38, “Formatting three commits to 'master' as patches” shows the initial situation. We want to export the three commits in the fix-git-svn-docs branch, that is, all commits since master, as patches:

$ git format-patch master
0001-git-svn.txt-fix-usage-of-add-author-from.patch
0002-git-svn.txt-move-option-descriptions.patch
0003-git-svn.txt-small-typeface-improvements.patch

To export only the HEAD, use the -1 option; format-patch then creates a patch for the topmost commit only:

$ git format-patch -1
0001-git-svn.txt-small-typeface-improvements.patch

This also works for any SHA-1 sum:

$ git format-patch -1 9126ce7
0001-git-svn.txt-fix-usage-of-add-author-from.patch
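
The naming and numbering of the patch files can be tried out on a scratch repository. The following is a sketch under assumptions: the repository repo, the branch topic, the commit messages, the output directories series and single, the Demo identity and the helper commit are all invented for the example. It exports the two commits of topic relative to master and then the topmost commit alone.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <name>: add a file called <name> and commit it (hypothetical helper)
commit() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "Add $2"
}

git init -q repo
git -C repo symbolic-ref HEAD refs/heads/master
commit repo A
git -C repo checkout -q -b topic
commit repo B
commit repo C

# All commits since master, one file per commit
git -C repo format-patch -q -o "$tmp/series" master
# Only the topmost commit
git -C repo format-patch -q -o "$tmp/single" -1

series=$(ls "$tmp/series" | wc -l | tr -d ' ')
single=$(ls "$tmp/single")
subject=$(grep '^Subject:' "$tmp/series/0002-Add-C.patch")

ls "$tmp/series"
echo "single: $single"
echo "$subject"

cd / && rm -rf "$tmp"
```

Note that the single patch is again numbered 0001, because the numbering always starts over for each call.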

The generated files contain, among other things, the header fields From, Date and Subject, which are used for sending as e-mail. These fields are completed using the information available in the commit: author, date, and commit message. The files also contain a diffstat summary and the changes themselves as a patch in unified diff format. The [PATCH m/n] prefix^[78] in the subject line is used later by Git to apply the patches in the correct order.

A corresponding excerpt follows:

$ cat 0003-git-svn.txt-small-typeface-improvements.patch
From 6cf93e4dae1e5146242338b1b9297e6d2d8a08f4 Mon Sep 17 00:00:00 2001
From: Valentin Haenel 
Date: Fri, 22 Apr 2011 18:18:55 +0200
Subject: [PATCH 3/3] git-svn.txt: small typeface improvements

Signed-off-by: Valentin Haenel 
Acked-by: Eric Wong 
---
 Documentation/git-svn.txt |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
...

If you plan to send a series of patches, it is recommended that you use the --cover-letter option to create a kind of “cover page” in which you describe the series. By default the file is called 0000-cover-letter.patch. Apart from the default headers, such a file looks like this:

Subject: [PATCH 0/3] *** SUBJECT HERE ***

*** BLURB HERE ***

Valentin Haenel (3):
  git-svn.txt: fix usage of --add-author-from
  git-svn.txt: move option descriptions
  git-svn.txt: small typeface improvements

 Documentation/git-svn.txt |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

As you can see, the Subject: still has the prefix [PATCH 0/3]; this way, all recipients can immediately see that it is a cover page. The file also contains the output of git shortlog and git diff --stat. Replace *** SUBJECT HERE *** with a subject and *** BLURB HERE *** with a summary of the patch series. Send the file together with the patch files.

Frequently, mailing lists to which patches are sent are used to criticize the patches in terms of content and syntax and to ask the author for improvement. Once the author has made the improvements, he sends the corrected series back to the list as a reroll. Depending on the size of the patch series and the requirements of the project, a patch series may go through several rerolls until it is accepted.

When you send a patch series to a mailing list: Keep the commits on a separate branch, and incorporate the fixes in new commits (for missing functionality) or with interactive rebase (to adjust existing commits). Then use the git format-patch command with the --reroll-count=<n> option (or -v <n> for short): with <n> set to 2, the patches carry the prefix [PATCH v2] in the subject line, making it clear that this is the first reroll of the series.
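
A reroll with a cover letter can be generated as follows. This sketch runs in a sandbox and relies on assumptions made for illustration: the repository repo, the branch topic, the commit messages, the output directory v2, the Demo identity and the helper commit are hypothetical. It combines --cover-letter with --reroll-count=2 and inspects the resulting file names and subject lines.

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <name>: add a file called <name> and commit it (hypothetical helper)
commit() {
  echo "$2" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "Add $2"
}

git init -q repo
git -C repo symbolic-ref HEAD refs/heads/master
commit repo A
git -C repo checkout -q -b topic
commit repo B
commit repo C

# Second version of the series, including a cover letter
git -C repo format-patch -q --cover-letter --reroll-count=2 -o "$tmp/v2" master

files=$(ls "$tmp/v2" | wc -l | tr -d ' ')
cover=$(grep '^Subject:' "$tmp/v2/v2-0000-cover-letter.patch")
first=$(grep '^Subject:' "$tmp/v2/v2-0001-Add-B.patch")

ls "$tmp/v2"
echo "$cover"
echo "$first"

cd / && rm -rf "$tmp"
```

The reroll count shows up both as a v2- prefix in the file names and inside the [PATCH ...] prefix of every subject line, including the cover letter.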

5.9.2. Sending Patches

Send the generated files with git send-email (or an email client of your choice). The command expects as its only mandatory argument either one or more patch files, a directory full of patches, or a selection of commits (in which case Git also calls git format-patch internally):

$ git send-email 000*
0000-cover-letter.patch
0001-git-svn.txt-fix-usage-of-add-author-from.patch
0002-git-svn.txt-move-option-descriptions.patch
0003-git-svn.txt-small-typeface-improvements.patch
Who should the emails appear to be from? [Valentin Haenel
<valentin.haenel@gmx.de>]

$ git send-email master
/tmp/HMSotqIfnB/0001-git-svn.txt-fix-usage-of-add-author-from.patch
/tmp/HMSotqIfnB/0002-git-svn.txt-move-option-descriptions.patch
/tmp/HMSotqIfnB/0003-git-svn.txt-small-typeface-improvements.patch
Who should the emails appear to be from? [Valentin Haenel
<valentin.haenel@gmx.de>]

The command git send-email sets the fields Message-Id and In-Reply-To. This makes all e-mails after the first one look like replies to it, and thus most mail programs will display them as a continuous thread:⁠^[79]

Figure 39. Patch series as mail thread

You can customize the command with options such as --to, --from and --cc (see the git-send-email(1) man page). However, if not specified, the essential information is queried interactively; most important is an address to which the patches should be sent.⁠^[80]

Before the emails are actually sent, you will see the header again; check that everything is as you want it, and then answer the question Send this email? ([y]es|[n]o|[q]uit|[a]ll): with y for “yes”. To get familiar with the command, you can first send all emails only to yourself or use the --dry-run option.

As an alternative to git send-email, you can post the contents of the files to one of the many online pastebin services, for example dpaste⁠^[81] or gist.github⁠^[82], and send the reference to it via IRC or Jabber. In the pastebin case, the recipient downloads the content into a file and applies it with git am (see below).

If you want to use your preferred Mail User Agent (MUA) (e.g. Thunderbird, Kmail or others) to send patches, there may be a few things to consider. Some MUAs are notorious for mutilating patches so that Git won’t recognize them as such.⁠^[83]

5.9.3. Applying Patches

Patch emails exported with git format-patch are translated back into commits by the command git am (apply from mailbox). A new commit is created from each email, and its meta-information (author, commit message, etc.) is generated from the email header lines (From, Date). As mentioned earlier, Git uses the number in the subject to determine the order in which the commits should be applied. To complete the example from earlier: if the emails are in the Maildir directory patches, the following is sufficient:

$ git am patches
Applying: git-svn.txt: fix usage of --add-author-from
Applying: git-svn.txt: move option descriptions
Applying: git-svn.txt: small typeface improvements

The command understands Maildir and mbox formats as well as files that contain the output of git format-patch:

$ git \
  am 0001-git-svn.txt-fix-usage-of-add-author-from.patch
Applying: git-svn.txt: fix usage of --add-author-from

When you apply patches from others using git am, the values of Author/AuthorDate and Committer/CommitDate differ. This way, both the author of the commit and the person who committed it are credited. In particular, the attribution is retained: it remains traceable who wrote which lines of code. Gitk displays the author and committer values by default; on the command line, set the --format=fuller option, which is accepted by git log and git show, among others:

$ git show --format=fuller  12d3065
commit 12d30657d411979af3ab9ca7139b5290340e4abb
Author:     Valentin Haenel <valentin.haenel@gmx.de>
AuthorDate: Mon Apr 25 23:36:15 2011 +0200
Commit:     Junio C Hamano <gitster@pobox.com>
CommitDate: Tue Apr 26 11:48:34 2011 -0700

    git-svn.txt: fix usage of --add-author-from
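
The round trip from git format-patch to git am can be tried out locally. The following sketch uses two throw-away repositories; the names upstream and contributor and both identities are assumptions made up for the illustration.

```shell
#!/bin/sh
# Sketch: a contributor's patch is applied by a maintainer with git am.
# Repository names and identities are invented for this example.
set -e
work=$(mktemp -d)

# The maintainer's repository
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"
git config user.email "maintainer@example.org"
echo base > file.txt
git add file.txt
git commit -q -m "initial commit"

# The contributor clones it, commits a change and exports it as a patch
git clone -q "$work/upstream" "$work/contributor"
cd "$work/contributor"
git config user.name "Contributor"
git config user.email "contributor@example.org"
echo fix >> file.txt
git commit -q -a -m "file.txt: add fix"
git format-patch -o "$work/patches" origin/master

# The maintainer applies the patch; author and committer now differ
cd "$work/upstream"
git am "$work"/patches/*.patch
git log -1 --format=fuller
```

In the final output, the Author line should name the contributor and the Commit line the maintainer.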

With the Dictator and Lieutenants Workflow (Sec. 5.10, “A Distributed, Hierarchical Workflow”), it can happen that more than two people are involved in a commit. In this case, it makes sense that everyone who reviews the patch also “approves” it, especially the author. For this purpose, there is a --signoff option (-s for short) for the git commit and git am commands, which appends the committer’s name and email to the commit message:

Signed-off-by: Valentin Haenel <valentin.haenel@gmx.de>

This feature is especially useful for larger projects, which usually have guidelines on how to format commits and how best to send them.⁠^[84]
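
A minimal sketch of the sign-off trailer, assuming a throw-away repository and an invented identity:

```shell
#!/bin/sh
# Sketch: git commit -s appends a Signed-off-by line for the committer.
# The repository and the identity are made up for this example.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Example Committer"
git config user.email "committer@example.org"

echo hello > file.txt
git add file.txt
git commit -q -s -m "file.txt: add greeting"

# Print the full commit message including the trailer
git log -1 --format=%B
```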

Conflicts can occur when patches are applied with git am, e.g. if the patches are based on an older version and the lines concerned have already been changed. In this case, the process is interrupted and you have several options for how to proceed. Either resolve the conflict, update the index and continue the process with git am --continue, or skip the patch with git am --skip. Use git am --abort to abort the process and restore the original state of the branch.

Because patches usually contain changes made by others, it can sometimes be difficult to find the right solution to a conflict. The best strategy for patches that cannot be applied is to ask the author of the patches to rebase them to a well-defined base, such as the current master, and send them again.
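
The abort path can be sketched as follows. The example provokes a conflict on purpose; repository names, file contents and identities are assumptions chosen for the illustration.

```shell
#!/bin/sh
# Sketch: a patch that no longer applies, followed by git am --abort.
# All names and contents are invented for this example.
set -e
work=$(mktemp -d)

git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"
git config user.email "maintainer@example.org"
echo "version 1" > file.txt
git add file.txt
git commit -q -m "initial commit"

# A patch based on the old state of file.txt
git clone -q "$work/upstream" "$work/contributor"
cd "$work/contributor"
git config user.name "Contributor"
git config user.email "contributor@example.org"
echo "version 2 (contributor)" > file.txt
git commit -q -a -m "file.txt: contributor change"
git format-patch -o "$work/patches" origin/master

# Meanwhile the same line changes upstream
cd "$work/upstream"
echo "version 2 (maintainer)" > file.txt
git commit -q -a -m "file.txt: maintainer change"
before=$(git rev-parse HEAD)

# git am stops because the patch does not apply; --abort restores the branch
if ! git am "$work"/patches/*.patch; then
    git am --abort
fi
after=$(git rev-parse HEAD)
echo "HEAD before: $before"
echo "HEAD after:  $after"
```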

An alternative to git am is the somewhat rudimentary command git apply. It is used to apply a patch to the working tree or index (with the --index option). It is similar to the classic Unix command patch. It is especially useful if you want to edit the patch or metadata before committing, or if someone has sent you the output of git diff instead of git format-patch as a patch.
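
As a small sketch of git apply, the following script saves the output of git diff to a file and applies it to working tree and index again. The file names and the identity are assumptions made up for the example.

```shell
#!/bin/sh
# Sketch: apply a plain diff with git apply --index, without committing.
# File names and identity are invented for this example.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Example User"
git config user.email "user@example.org"
echo "line 1" > file.txt
git add file.txt
git commit -q -m "initial commit"

# Produce a plain diff, as someone might send the output of git diff
echo "line 2" >> file.txt
git diff > "$work/change.diff"
git checkout -q -- file.txt     # discard the change again

# Apply the diff to working tree and index; no commit is created
git apply --index "$work/change.diff"
git status --short
```

Because no commit is created, you can still edit the change or write your own commit message before running git commit.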

5.10. A Distributed, Hierarchical Workflow

The Integration Manager workflow does not scale with the size of the project. With large growth, at some point the maintainer is overwhelmed by the complexity of the project and the number of incoming patches. The so-called Dictator and Lieutenants workflow, which is used extensively in the development of the Linux kernel, provides a remedy. In this case, the software is usually divided into different subsystems, and contributions are examined by the lieutenants (also subsystem maintainers) and then forwarded to the Benevolent Dictator. The Benevolent Dictator uploads the changes to the blessed repository, which in turn is synchronized with all other participants.

Figure 40. Workflow: Dictator and Lieutenants

The workflow is based on trust: the dictator trusts the lieutenants and usually accepts their forwarded changes without further review. The advantage is that the dictator's workload is reduced while a veto right is retained, which led to the title Benevolent Dictator.

For historical reasons, the official repository is often simply the public repository of the current main maintainer or the original author. It is important to note that this repository has its special status only because of social conventions. Should another developer one day advance the project more effectively, that developer's public repository may become the new blessed repository. From a technical point of view, nothing stands in the way of this.

The projects that use this workflow in practice prefer to exchange patches by mail. However, the nature of the exchange is secondary, and subsystem maintainers may just as well receive pull requests from developers they know; or they may mix public repositories and patches sent by email at will. Git’s flexibility — especially the variety of different methods for exchanging changes — supports every conceivable workflow in the spirit of free, open development. Certainly a feature that has contributed greatly to Git’s popularity.

5.11. Managing Subprojects

For larger software projects, it is sometimes necessary to outsource certain parts of a program into separate projects. This is the case in the following situations, for example:

Your software depends on a specific version of a library that you want to ship with the source code.

Your initially small project grows so large over time that you want to move functionality to a library that you want to manage as a separate project.

Independent parts of your software are managed by other development groups.

Git lets you handle this in two different ways: you can manage the modules as Git submodules or as subtrees. In either case, you manage source code in a subdirectory of your project.

As submodules, you manage an isolated repository that has nothing to do with your parent repository. If you work with subtrees instead, the project history of the subdirectory becomes inseparable from the parent project. Both have advantages and disadvantages.

We’ll look at both techniques by way of example, creating a fictional project that requires libgit2. Similar to libgit.a, the library provides an API to examine and modify Git repositories.⁠^[85] The library is written in C and can also be used from Lua, Ruby, Python, PHP and JavaScript, among others.

5.11.1. Submodules

Submodules are managed by Git as subdirectories that have a special entry in the .gitmodules file. The command git submodule is responsible for handling them.

First we need to import the library. This is done with the following command:

$ git submodule add git://github.com/libgit2/libgit2.git libgit2
Cloning into libgit2...
remote: Counting objects: 4296, done.
remote: Compressing objects: 100% (1632/1632), done.
remote: Total 4296 (delta 3214), reused 3530 (delta 2603)
Receiving objects: 100% (4296/4296), 1.92 MiB | 788 KiB/s, done.
Resolving deltas: 100% (3214/3214), done.

From the output of git status we can now see that there is a new directory libgit2 and that the file .gitmodules has been created with the following content:

[submodule "libgit2"]
  path = libgit2
  url = git://github.com/libgit2/libgit2.git

This file has already been added to the index, prepared for committing. The libgit2 directory, on the other hand, does not appear in the output of git diff --staged as usual:

$ git diff --staged -- libgit2
diff --git a/libgit2 b/libgit2
new file mode 160000
index 0000000..7c80c19
--- /dev/null
+++ b/libgit2
@@ -0,0 +1 @@
+Subproject commit 7c80c19e1dffb4421f91913bc79b9cb7596634a4

Instead of listing all the files in the directory, Git saves a “special” file (recognizable by the unusual file mode 160000) that simply records the commit the module is currently on.

We import these changes, and from now on we can compile libgit2 in its subdirectory and then link against it:

$ git commit -m "Import libgit2 submodule"

The parent project and libgit2 are now merged in the working tree, but their version history is and remains separate. In the Git repository of libgit2 you can behave exactly the same way as in a “real” repository. For example, you can look at the output of git log in the parent project and after a cd libgit2 in the submodule.
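
The same steps can be reproduced without network access. The following sketch uses a small local repository named lib as a stand-in for libgit2; the names lib and super, the file contents and the identity are assumptions for the illustration. Recent Git versions refuse local-path submodules by default, which is why the sketch allows the file transport explicitly.

```shell
#!/bin/sh
# Sketch: add a local repository as a submodule and inspect its index entry.
# "lib" stands in for libgit2; all names are invented for this example.
set -e
work=$(mktemp -d)
ident() {
    git config user.name "Example User"
    git config user.email "user@example.org"
}

# A stand-in for the library repository
git init -q "$work/lib"
cd "$work/lib"
ident
echo "int answer(void) { return 42; }" > lib.c
git add lib.c
git commit -q -m "lib: initial commit"

# The parent project
git init -q "$work/super"
cd "$work/super"
ident
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "initial commit"

# Allow the file transport for this one command (needed on recent Git)
git -c protocol.file.allow=always submodule add "$work/lib" lib
git commit -q -m "Import lib submodule"

# The submodule is a single index entry with mode 160000
git ls-files --stage lib
cat .gitmodules
```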

5.11.1.1. Changes in Submodules

Now libgit2 has selected the branch development as default branch (i.e. the HEAD on the server side). It may not be the best idea to more or less wire this development branch to your repository.

So we change to the libgit2 directory and check out the latest tag, v0.10.0:

$ cd libgit2
$ git checkout v0.10.0
# message about "detached HEAD" state
$ cd ..
$ git diff
diff --git a/libgit2 b/libgit2
index 7c80c19..7064938 160000
--- a/libgit2
+++ b/libgit2
@@ -1 +1 @@
-Subproject commit 7c80c19e1dffb4421f91913bc79b9cb7596634a4
+Subproject commit 7064938bd5e7ef47bfd79a685a62c1e2649e2ce7

So the parent Git repository sees a change of HEAD, which was done by the git checkout v0.10.0 command in libgit2/, as a change to the pseudo-file libgit2, which now points to the corresponding new commit.

Now we can add this change to the index and save it as a commit:

$ git add libgit2
$ git commit -m "Set libgit2 version to v0.10.0"

Attention: Never add files from libgit2 or the directory libgit2/ (ending with a slash). This breaks Git's modular concept: you would suddenly be managing files from the submodule in the parent project.

Similarly, you can use git submodule update --remote (or git remote update in the libgit2/ directory) to download new commits and record a library update in the parent repository accordingly.

5.11.1.2. From a User Perspective

So what does it all look like from the perspective of a user cloning the project for the first time? First, it’s obvious that the submodule(s) are not hard-coded into the repository and are not shipped with it:

$ git clone /dev/shm/super clone-super
$ cd clone-super
$ ls
bar.c  foo.c  libgit2/
$ ls -l libgit2
total 0

The directory libgit2/ is empty. So everything Git knows about the submodules is in the .gitmodules file. You need to initialize this module first and then download the module’s repository:

$ git submodule init
Submodule 'libgit2' (git://github.com/libgit2/libgit2.git)
registered for path 'libgit2'
$ git submodule update
...
Submodule path 'libgit2': checked out '7064938bd5e7ef47bfd79a685a62c1e2649e2ce7'

So we see that libgit2 is automatically set to the v0.10.0 version defined in our repository. In principle, however, the user can now also change to the directory, check out the branch development and compile the project against this version. Submodules retain the flexibility of the sub-repository, so the entry recording which state the module is on is only a “recommendation”.
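
Pinning a submodule to a tag and then cloning the parent project can be sketched end to end with local repositories. The names lib, super and clone, the tag v0.1 and the identity are assumptions invented for this example; the file transport is again allowed explicitly because recent Git versions block it for submodules.

```shell
#!/bin/sh
# Sketch: pin a submodule to a tag, then initialize it in a fresh clone.
# All repository names, the tag and the identity are made up.
set -e
work=$(mktemp -d)
ident() {
    git config user.name "Example User"
    git config user.email "user@example.org"
}

git init -q "$work/lib"
cd "$work/lib"
ident
echo one > lib.txt
git add lib.txt
git commit -q -m "lib: first version"
git tag v0.1
echo two >> lib.txt
git commit -q -a -m "lib: development work"

git init -q "$work/super"
cd "$work/super"
ident
echo app > app.txt
git add app.txt
git commit -q -m "initial commit"
git -c protocol.file.allow=always submodule add "$work/lib" lib
git commit -q -m "Import lib submodule"

# Pin the submodule to the tagged release and record that in the parent
cd lib
git checkout -q v0.1
cd ..
git add lib                      # no trailing slash
git commit -q -m "Set lib version to v0.1"

# A fresh clone starts with an empty lib/ directory
git clone -q "$work/super" "$work/clone"
cd "$work/clone"
git submodule init
git -c protocol.file.allow=always submodule update
git submodule status
```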

5.11.2. Subtrees

Unlike submodules, which maintain their character as a standalone Git repository, when you work with Subtrees, you directly merge the history of two projects. A comparison of the two approaches follows.

Essentially, this technique is based on so-called subtree-merges, which were briefly discussed in Sec. 3.3.3, “Merge Strategies”. In our example, a subtree-merge is done by merging regular commits from the libgit2 repository under the libgit2/ tree (directory): a top-level file in the library repository thus becomes a top-level file in the libgit2/ tree, which in turn is part of the repository.

Git has a command to manage subtree-merges.⁠^[86] You must always explicitly specify which subdirectory you are referring to by using -P <prefix>. To import libgit2 at version 0.8.0, use:

$ git subtree add -P libgit2 \
  git://github.com/libgit2/libgit2.git v0.8.0
git fetch git://github.com/libgit2/libgit2.git v0.8.0
From git://github.com/libgit2/libgit2
 * tag               v0.8.0     -> FETCH_HEAD
Added dir 'libgit2'

The command automatically downloads all required commits and creates a merge commit that creates all the files of libgit2 under the directory libgit2/. The merge commit now links the previous version history to that of libgit2 (by referencing an original commit and then referencing other commits).

The result of this procedure is that your repository now contains all relevant commits from libgit2. Your repository now has two root commits (see also multi-root repositories in Sec. 4.7, “Multiple Root Commits”).

The files are now stored inseparably linked to the project. A git clone of this repository would also transfer all files under libgit2.⁠^[87]
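
The two root commits can be verified with a small local experiment. This sketch assumes that the git subtree command is installed (it ships in Git's contrib area and is packaged separately on some systems); the repository names lib and project and the identity are invented for the example.

```shell
#!/bin/sh
# Sketch: import a local library with git subtree add and count root commits.
# Assumes git subtree is available; all names are made up.
set -e
work=$(mktemp -d)
ident() {
    git config user.name "Example User"
    git config user.email "user@example.org"
}

git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/master
ident
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "lib: initial commit"

git init -q "$work/project"
cd "$work/project"
ident
echo "application code" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Merge the library history under the lib/ prefix
git subtree add -P lib "$work/lib" master

# One root commit per imported history, and lib.txt now lives under lib/
git rev-list --max-parents=0 HEAD
git ls-files
```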

Now what happens when you want to “upgrade” to v0.10.0? Use the pull command from git subtree for this:

$ git subtree -P libgit2 \
  pull git://github.com/libgit2/libgit2.git v0.10.0
From git://github.com/libgit2/libgit2
 * tag               v0.10.0    -> FETCH_HEAD
Merge made by the 'recursive' strategy.
...

Note: Since the original libgit2 commits are present, these commits also seem to change top-level files (e.g. COPYING) when you use git log --name-status to examine the version history. In fact, these changes are made under libgit2/; this is the responsibility of the merge commit, which aligns the trees accordingly.

If you’re not interested in the version history of a subproject, but want to anchor a particular state in the repository, you can use the --squash option. The git subtree add/pull commands then do not merge the corresponding commits, but only create a single commit that contains all changes. Note: Use this option with pull only if you also imported the project using --squash; otherwise merge conflicts will occur.

5.11.2.1. Splitting off a Subdirectory

At some point, you may be faced with the task of managing a subdirectory of your project as a separate repository. However, you may still want to integrate the changes into the original project.

For example, the documentation stored under doc/ will be managed in a separate repository from now on. Occasionally, that is, every few weeks, you want to transfer the latest developments to the master repository.

The git subtree command provides a dedicated subcommand, split, for this purpose, which you can use to automate this step. It creates a version history containing all changes to a directory and outputs the latest commit, which you can then upload to an (empty) remote.

$ git subtree split -P doc --rejoin
Merge made by the 'ours' strategy.
563c68aa14375f887d104d63bf817f1357482576
$ git push <new-doc-repo> 563c68aa14375:refs/heads/master

The --rejoin option causes the version history split off in this way to be directly reintegrated into the current project via git subtree merge. From now on you can integrate the new commits via git subtree pull. If you want to use the --squash option instead, omit --rejoin.
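
What the split history looks like can be sketched as follows, here without --rejoin to keep the example small. The sketch assumes that git subtree is installed; the repository name, file names and identity are invented.

```shell
#!/bin/sh
# Sketch: extract the history of doc/ with git subtree split.
# Assumes git subtree is available; all names are made up.
set -e
work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git config user.name "Example User"
git config user.email "user@example.org"

mkdir doc
echo "application code" > app.txt
echo "manual" > doc/manual.txt
git add app.txt doc
git commit -q -m "initial commit"
echo "more manual" >> doc/manual.txt
git commit -q -a -m "doc: extend manual"
echo "more code" >> app.txt
git commit -q -a -m "app: extend code"

# Extract the history of doc/ as a standalone line of commits
split=$(git subtree split -P doc)

# In the split history, manual.txt sits at the top level
git ls-tree --name-only "$split"
git log --oneline "$split"
```

The commit ID stored in split is what you would push to the new documentation repository.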

5.11.3. Submodules vs. Subtrees

The question “Submodules or Subtrees?” cannot be answered in general, but only on a case-by-case basis. The decisive criterion should be how closely the subproject belongs to the parent project: third-party software is more likely a case for submodules, whereas your own code with a limited number of commits and a direct relationship to the main project is more likely a case for a subtree.

For example, when you install CGit (see Sec. 7.5, “CGit — CGI for Git”), you must initialize and update a submodule to compile libgit.a. So CGit needs the source code of Git, but doesn’t want to merge the development history with that of Git (the comparatively few CGit commits would be lost in this!). You can, however, compile CGit against another version of Git if you wish — the flexibility of the sub-repository is preserved.

The graphical repository browser Gitk, on the other hand, is managed as a subtree. It is developed in git://ozlabs.org/~paulus/gitk, but is included in the main Git repository with the subtree-merge strategy under gitk-git/.

6. Workflows

In software development, the term workflow usually describes strategies that define how a team works (e.g. 'agile software development'). We can generally limit ourselves to literature references here.⁠^[88]

In Git, you can see “workflows” from two different perspectives: Workflows (command sequences) that affect individual users, and project-related workflows (e.g., release management). Both aspects are discussed below.

6.1. User

Below you will find a list of general development strategies (in no particular order):

Make commits as small and independent as possible: Divide your work into small, logical steps and make a commit for each step. The commits should be independent of future commits and should pass all tests (if any). This makes it easier for your colleagues or maintainers to keep track of what you have done. It also increases the efficiency of commands that examine the history, such as git bisect and git blame. Don’t be afraid of making commits too small. It’s easier in hindsight to combine several small commits with git rebase --interactive than to split one big one into several small ones.

Develop in topic branches: Branching is easy, fast and intuitive in Git. Subsequent merging works without problems, even repeatedly. Take advantage of Git’s flexibility: Don’t develop directly in master, but develop each feature in its own branch, called the Topic Branch.

This has several advantages: you can develop features independently; you get a well-defined point in time for integration (merge); you can rebase the development to be “streamlined” and clear before you publish it; you make it easier for other developers to test a new feature in isolation.

Use Namespaces: You can create different classes of branches by using / characters in the branch name. In a central repository you can create your own namespace using your initials (e.g. jp/refactor-base64) or store your features under experimental/ or pu/ (see below) depending on stability.

Rebase early, Rebase often: If you frequently work with Rebase on Topic Branches, you will create a much more readable version history. This is convenient for you and other developers and helps to split the actual programming process into logical units.

Merge small commits when they belong together. If necessary, take the time to split up large commits again in a sensible way (see Sec. 4.2.2, “Editing Commits Arbitrarily”).

However, only use Rebase for your own commits: do not modify already published commits or other developers' commits.

Make a conscious distinction between FF and regular merges: Always integrate changes from upstream via fast-forward (you simply fast-forward your local copy of the branches). In contrast, integrate new features through regular merges. The aliases presented in Sec. 3.3.2, “Fast Forward Merges: Fast Forwarding One Branch” are also helpful for making this distinction.

Note the merge direction: The command git merge pulls one or more branches into the current one. So always pay attention to the direction in which you perform a merge: Integrate topic branches into the mainline (the branch on which you are preparing the stable release), not the other way around.⁠^[89] This way you can isolate the history of a feature from the mainline even after the fact (git log topic lists only the relevant commits).

Criss-cross merges (crossed merges) should be avoided if possible: They occur when you integrate a branch A into a branch B and an older version of B into A.

Test the compatibility of features via Throw-Away Integration: Create a new (disposable) branch and merge the features whose compatibility you want to test. Run the test suite or test the interaction of the new components in another way. You can then delete the branch and continue developing the features separately. Such Throw-Away branches are usually not published.
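
The distinction between fast-forward and regular merges from the list above can be sketched in a throw-away repository. The branch names topic and mirror, the file names and the identity are assumptions for this example; mirror merely plays the role of a local copy of an upstream branch.

```shell
#!/bin/sh
# Sketch: regular merge for a feature, fast-forward for a tracking branch.
# Branch names, files and identity are invented for this example.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.org"
echo base > file.txt
git add file.txt
git commit -q -m "initial commit"

# Develop a feature on its own topic branch
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m "add feature"

# A branch that merely follows master, like a local copy of upstream
git branch mirror master

# Integrate the feature into the mainline with a regular merge commit
git checkout -q master
git merge -q --no-ff -m "Merge branch 'topic'" topic

# Update the following branch by fast-forward only; no merge commit appears
git checkout -q mirror
git merge -q --ff-only master

git log --oneline --graph master
```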

Certain work steps appear again and again. Here are a few general solution strategies:

Fix a small bug: If you notice a small bug that you want to fix quickly, you can do this in two ways: stash existing changes (see Sec. 4.5, “Outsourcing Changes — Git Stash”), check out the corresponding branch, fix the bug, switch back to the original branch, and apply the stash.

The other possibility is to fix the bug on the branch you are currently working on and to subsequently transfer the corresponding commit(s) via Cherry Pick or Rebase-Onto (see Sec. 3.5, “Taking over Individual Commits: Cherry Picking”) to the designated bugfix or topic branch.

Correcting a Commit: With git commit --amend you can modify the last commit. The --no-edit option causes the description to be retained and not offered again for editing.

To fix deeper commits, either use interactive rebase and the edit keyword (see Sec. 4.2.2, “Editing Commits Arbitrarily”), or create a small commit for each fix, then arrange them accordingly in the interactive rebase, and apply the fixup action to them to correct the original commit.

Which branches are not yet in master?: Use git branch -vv --no-merged to find out which branches are not yet included in the current branch.

Merge multiple changes from different sources: Use the index to combine several changes, e.g. changes that complement each other but are in different branches or as patches. The commands git apply, git cherry-pick --no-commit and git merge --squash apply the corresponding changes only to the working tree or index without creating a commit.
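
The fixup approach to correcting a deeper commit, described in the list above, can be partly automated. The following sketch uses git commit --fixup and git rebase --autosquash, two options not covered in the text above, so that the interactive rebase arranges the fix itself; setting GIT_SEQUENCE_EDITOR to true accepts the proposed order without opening an editor. File names and identity are invented for the example.

```shell
#!/bin/sh
# Sketch: correct an older commit with a fixup commit and autosquash.
# --fixup and --autosquash are an addition to the manual procedure above.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Example User"
git config user.email "user@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "initial commit"
echo "helo world" > a.txt          # typo to be corrected later
git add a.txt
git commit -q -m "add a.txt"
echo "other" > b.txt
git add b.txt
git commit -q -m "add b.txt"

# Record the correction as a fixup commit aimed at "add a.txt" (HEAD~1)
echo "hello world" > a.txt
git commit -q -a --fixup=HEAD~1

# Accept the automatically arranged todo list without opening an editor
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3
git log --oneline
```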

6.2. A Branching Model

The following section introduces a branching model based on the model described in the gitworkflows(7) man page. The branching model determines which branch performs which functions, when and how commits are taken from a branch, which commits are to be tagged as releases, etc. It is flexible, scales well, and can be extended as needed (see below).

In its basic form the model consists of four branches: maint, master, next, and pu (Proposed Updates). The master branch is used to prepare the next release and to collect trivial changes. pu branches are used for feature development (topic branches). In the next branch, reasonably stable new features are collected, tested for compatibility, stability and correctness, and improved if necessary. Critical bug fixes for previous versions are collected in the maint branch and published as maintenance releases.

In principle, commits are always integrated into another branch by a merge (in Figure 41, “Branch model according to gitworkflows (7)” indicated by arrows). Unlike cherry picking, commits are not duplicated, and you can easily see whether a branch already contains a particular commit or not.

The following diagram is a schematic representation of the ten-point workflow, which is explained in detail below.

Figure 41. Branch model according to gitworkflows (7)

1. New topic branches arise from well-defined points, e.g. tagged releases, on master.
```
$ git checkout -b pu/cmdline-refactor v0.1
```
2. Sufficiently stable features are taken from their respective pu branch to next (feature graduation).
```
$ git checkout next
$ git merge pu/cmdline-refactor
```
3. Release preparation: If enough new features have accumulated in next (feature driven development), next is merged to master and if necessary a release candidate tag (RC tag) is created (suffix -rc<n>).
```
$ git checkout master
$ git merge next
$ git tag -a v0.2-rc1
```
4. From now on, only so-called release critical bugs (RC bugs) are corrected directly in master. These are “show-stoppers”, i.e. bugs that significantly limit the functionality of the software or make new features unusable. If necessary, you can undo merges of problematic branches (see Sec. 3.2.2, “Rolling Back Commits”).

What happens to next during the release phase depends on the size of the project. If all developers are busy fixing the RC bugs, a development stop for next is a good idea. For larger projects, where development for the release after next is already being pushed forward during the release phase, next can continue to serve as an integration branch for new features.
5. Once all RC bugs have been eliminated, master is tagged as a release and, if necessary, published as a source code archive, distribution package, etc. Furthermore, master is merged to next to transfer all fixes for RC bugs. If no further commits have been made to next in the meantime, this is a fast-forward merge. Now new topic branches can be opened again, based on the new release.
```
$ git tag -a v0.2
$ git checkout next
$ git merge master
```
6. Feature branches that didn’t make it into the release can now either be merged into the next branch or, if they are not yet finished, be rebased onto a new, well-defined base.
```
$ git checkout pu/numeric-integration
$ git rebase next
```
7. In order to separate feature development from bug fixes and maintenance, bug fixes that affect a previous version are made in the branch maint. This maintenance branch, like the feature branches, branches off from master at well-defined points.
8. If enough bug fixes have accumulated or if a critical bug has been fixed, e.g. a security bug, the current commit on the maint branch is tagged as a maintenance release and can be published via the usual channels.
```
$ git checkout maint
$ git tag -a v0.1.1
```
Sometimes it happens that bug fixes made on master are also needed in maint. In this case it is okay to transfer them there using git cherry-pick. But this should be the exception rather than the rule.
9. To ensure that bug fixes are available in the future, the maint branch is merged to master after a maintenance release.
```
$ git checkout master
$ git merge maint
```
If the bug fixes are very urgent, they can be transferred to the appropriate branch (next or pu/*) using git cherry-pick. As with git cherry-pick to maint, this should only happen rarely.
10. When a new release is published, the maint branch is fast-forwarded to the state of master, so maint now contains all commits that make up the new release. If no fast-forward is possible here, this is an indication that there are still bug fixes in maint that are not in master (see point 9).
```
$ git checkout maint
$ git merge --ff-only master
```

You can extend the branching model as you wish. One approach that is often encountered is the use of namespaces (see Sec. 3.1, “References: Branches and Tags”) in addition to the pu/* branches. This has the advantage that each developer uses his own namespace, which is delimited by convention. Another very popular extension is to have a separate maint branch for each previous version. This makes it possible to maintain any number of older versions. For this purpose, before merging from maint to master, a corresponding branch for the version is created in point 9.

$ git branch maint-v0.1.2

But keep in mind that these additional maintenance branches mean an increased maintenance effort, because every new bug fix has to be checked. If it is also relevant for an older version, it must be added to the maintenance branch for that version using git cherry-pick. In addition, a new maintenance version may have to be tagged and published.

6.3. Release Management

As soon as a project has more than one or two developers, it usually makes sense to assign a developer to manage the releases. This Integration Manager decides after consultation with the others (e.g. via the mailing list) which branches are integrated and when new releases are made.

Each project has its own requirements for the release process. Below are some general tips on how to monitor development and partially automate the release process.⁠^[90]

6.3.1. Exploring Tasks

The maintainer of a software must have a good overview of the features that are actively being developed and will soon be integrated. In most development models, commits graduate from one branch to the next — in the model presented above, first from the pu branches to next and then to master.

First of all, you should always clean up your local branches in order not to lose the overview. The command git branch --merged master, which lists all branches that are already fully integrated into master (or another branch), is especially helpful here. You can usually delete these.
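
A short sketch of this clean-up check, together with its counterpart --no-merged, in a throw-away repository. The branch names pu/done and pu/wip, the file names and the identity are assumptions invented for the example.

```shell
#!/bin/sh
# Sketch: list branches that are, and are not, fully integrated into master.
# Branch names, files and identity are made up for this example.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.org"
echo base > file.txt
git add file.txt
git commit -q -m "initial commit"

# A finished feature that has been merged
git checkout -q -b pu/done
echo done > done.txt
git add done.txt
git commit -q -m "finished feature"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'pu/done'" pu/done

# A feature that is still in progress
git checkout -q -b pu/wip
echo wip > wip.txt
git add wip.txt
git commit -q -m "work in progress"
git checkout -q master

merged=$(git branch --merged master)
unmerged=$(git branch --no-merged master)
echo "merged into master:"
echo "$merged"
echo "not yet merged:"
echo "$unmerged"
```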

To get a rough overview of the tasks that need to be done, it is recommended to use git show-branch. Without any further arguments, it lists all local branches, each with an exclamation mark (!) in its own color. The current branch gets a star (*). Below this header, all commits are listed; for each branch, the respective column contains a plus (+) or a star (*) if the commit is part of that branch. A minus (-) indicates merge commits.

$ git show-branch
! [for-hjemli] initialize buf2 properly
 * [master] Merge branch 'stable'
  ! [z-custom] silently discard "error opening directory" messages
---
+   [for-hjemli] initialize buf2 properly
--  [master] Merge branch 'stable'
+*  [master^2] Add advice about scan-path in cgitrc.5.txt
+*  [master^2^] fix two encoding bugs
+*  [master^] make enable-log-linecount independent of -filecount
+*  [master~2] new_filter: correctly initialise ... for a new filter
+*  [master~3] source_filter: fix a memory leak
  + [z-custom] silently discard "error opening directory" messages
  + [z-custom^] Highlight odd rows
  + [z-custom~2] print upstream modification time
  + [z-custom~3] make latin1 default charset
+*+ [master~4] CGIT 0.9

Commits are shown only until a common merge base of all branches is found (in the example: master~4). If you don’t want to examine all branches at once, but only the branches under pu/, for example, then specify them explicitly as an argument. --topics <branch> defines <branch> as the integration branch, whose commits are not explicitly shown.

So the following command shows you all commits of all pu branches and their relation to master:

$ git show-branch --topics master "pu/*"

It is worth documenting the commands you use for release management (so that others can continue your tasks if necessary). You should also abbreviate common steps by using aliases.

You could convert the above command into an alias todo as follows:

$ git config --global alias.todo \
  "!git rev-parse --symbolic --branches | \
  xargs git show-branch --topics master"

However, the git show-branch command only recognizes identical commits, i.e. commits with the same SHA-1 sum. If you use git cherry-pick to copy a commit to another branch, the changes are almost the same, but git show-branch would not detect this because the SHA-1 sum of the commit changes.

The git cherry tool handles these cases. Internally it uses the small tool git-patch-id, which reduces a commit to its changes. It ignores whitespace changes and the contextual position of the hunks (line numbers). So the tool returns the same ID for patches that introduce essentially the same change.

Usually, git cherry is used when the question arises: Which commits have already been transferred to the integration branch? The command git cherry -v <upstream> <topic> is used for this: It lists all commits from <topic>, and puts a minus (-) in front of them if they are already in <upstream>, otherwise a plus (+). This looks like this:

$ git cherry --abbrev=7 -v master z-custom
+ ae8538e guess default branch from HEAD
- 6f70c3d fix two encoding bugs
- 42a6061 Add advice about scan-path in cgitrc.5.txt
+ cd3cf53 make latin1 default charset
+ 95f7179 Highlight odd rows
+ bbaabe9 silently discard "error opening directory" messages

Two of the patches have already been applied to master. git cherry recognizes this, although the commit IDs have changed.
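To see this detection at work, the following sketch builds a throwaway repository, copies one commit from a topic branch to master with git cherry-pick, and then asks git cherry for the status of the topic commits. The repository, the demo identity, the file names and the branch name topic are hypothetical.

```shell
# Throwaway repository; all names below are hypothetical demo values.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt && git add file.txt && git commit -q -m "base"

# Two commits on the topic branch.
git checkout -q -b topic
echo fix > fix.txt && git add fix.txt && git commit -q -m "fix encoding bug"
echo feature > feature.txt && git add feature.txt && git commit -q -m "add feature"

# master moves on, then receives a copy of the first topic commit.
git checkout -q master
echo other > other.txt && git add other.txt && git commit -q -m "unrelated work on master"
git cherry-pick topic~1 > /dev/null

# The copied commit has a new SHA-1 but the same patch ID.
git cherry -v master topic
marks=$(git cherry master topic | cut -c1 | tr -d '\n')
echo "$marks"
```

The first topic commit is reported with a minus because an equivalent patch is already in master; the second one still carries a plus.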

6.3.2. Creating Releases

Git provides the following two useful tools to help you prepare for a release:

git shortlog: Summarizes the output of git log.
git archive: Automatically creates a source code archive.

A good release includes a so-called changelog, i.e. a summary of the most important changes including thanks to people who have contributed help. This is where git shortlog comes in. It shows the respective authors, how many commits each one has made, and the commit messages of each commit. This makes it easy to see who did what.

$ git shortlog HEAD~3..
Georges Khaznadar (1):
      bugfix: 3294518

Kai Dietrich (6):
      delete grammar tests in master
      updated changelog and makefile
      in-code version number updated
      version number in README
      version number in distutils setup.py
      Merge branch 'prepare-release-0.9.3'

Valentin Haenel (3):
      test: add trivial test for color transform
      test: expose bug with ID 3294518
      Merge branch 'fix-3294518'

The --numbered or -n option sorts the output by the number of commits instead of alphabetically. With --summary or -s the commit messages are omitted.

However, don’t simply write the output of git log or git shortlog to the file CHANGELOG. Especially when there are many technical commits, such a changelog is not helpful (anyone interested in this information can always check the repository). But you can take the output as a basis, delete unimportant changes and combine the rest into meaningful groups.

The maintainer often needs to know what has changed since the last release. This is where git-describe (see Sec. 3.1.3, “Tags — Marking Important Versions”) comes in handy. In conjunction with --abbrev=0, it outputs the nearest tag reachable from HEAD:

$ git describe
wiki2beamer-0.9.2-20-g181f09a
$ git describe --abbrev=0
wiki2beamer-0.9.2

In combination with git shortlog the question can be answered very easily:

$ git shortlog -sn $(git describe --abbrev=0)..
    15  Kai Dietrich
     4  Valentin Haenel
     1  Georges Khaznadar
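The combination of the two commands can be tried out in a throwaway repository. The sketch below is only an illustration: the repository, the demo identity, the commit messages and the tag name demo-0.1.0 are hypothetical.

```shell
# Throwaway repository; all names below are hypothetical demo values.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "prepare release"
git tag -a -m "release 0.1.0" demo-0.1.0      # annotated tag marks the release
git commit -q --allow-empty -m "fix: typo in README"
git commit -q --allow-empty -m "feature: colored output"

# Nearest tag reachable from HEAD, without the -<n>-g<sha> suffix.
last_tag=$(git describe --abbrev=0)

# Who contributed how many commits since that tag?
git shortlog -sn "$last_tag"..

# The same range, counted with rev-list.
count=$(git rev-list --count "$last_tag"..)
echo "$count commits since $last_tag"
```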

The git archive command helps to create a source code archive. The command can handle both tar and zip format. Additionally, you can set a prefix for the files to be saved with the option --prefix=. The top level of the repository is then stored below this prefix, usually the name and version number of the software:

$ git archive --format=zip --prefix=wiki2beamer-0.9.3/ HEAD \
    > wiki2beamer-0.9.3.zip
$ git archive --format=tar --prefix=wiki2beamer-0.9.3/ HEAD \
    | gzip > wiki2beamer-0.9.3.tgz

As a mandatory argument the command expects a commit (or a tree), which is to be packed as an archive. In the above example this is HEAD, but it could also have been a commit ID, a reference (branch or tag) or directly a tree object.⁠^[91]

Again, you can use git describe after you have tagged a release commit. If you have a suitable tag scheme <name>-<X.Y.Z> as above, the following command is sufficient:

$ version=$(git describe)
$ git archive --format=zip --prefix=$version/ HEAD > $version.zip

Not all of the files you manage in your Git repository necessarily belong in the source code archive; the project website, for example, might not. For such cases you can also specify paths. To limit the archive to the src directory and the LICENSE and README files, use:

$ version=$(git describe)
$ git archive --format=zip --prefix=$version/ HEAD src LICENSE README \
    > $version.zip

Git will store the SHA-1 sum in the archive if you specify a commit as an argument. In tar format, this is stored as a pax header entry, which Git can read again with the command git get-tar-commit-id:

$ zcat wiki2beamer-0.9.3.tgz | git get-tar-commit-id
181f09a469546b4ebdc6f565ac31b3f07a19cecb

In zip files, Git simply saves the SHA-1 sum in the comment field:

$ unzip -l wiki2beamer-0.9.3.zip | head -5
Archive:  wiki2beamer-0.9.3.zip
181f09a469546b4ebdc6f565ac31b3f07a19cecb
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  05-06-2011 20:45   wiki2beamer-0.9.3/
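For the tar case, the round trip can be sketched end to end in a throwaway repository: tag a release, derive the archive name from git describe, restrict the archive to selected paths, and read the commit ID back. The repository, the demo identity, the file names and the tag demo-0.1.0 are hypothetical.

```shell
# Throwaway repository; all names below are hypothetical demo values.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir src
echo 'int main(void) { return 0; }' > src/main.c
echo "demo project" > README
echo "<html></html>" > website.html        # should not end up in the archive
git add . && git commit -q -m "initial import"
git tag -a -m "release 0.1.0" demo-0.1.0

# On a tagged commit, git describe prints just the tag name.
version=$(git describe)
git archive --format=tar --prefix="$version/" HEAD src README > "$version.tar"

# The commit ID travels inside the archive as a pax header.
stored=$(git get-tar-commit-id < "$version.tar")
head_id=$(git rev-parse HEAD)

# List the regular files in the archive (directory entries filtered out).
files=$(tar -tf "$version.tar" | grep -v '/$' | sort | tr '\n' ' ')
echo "$stored"
echo "$files"
```

The stored ID equals the ID of HEAD, and website.html is absent because it was not among the listed paths.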

One problem you should keep in mind is that files such as .gitignore are packed automatically. Since they have no meaning outside a Git repository, it is worth excluding them with the Git attribute export-ignore (see Sec. 8.1, “Git Attributes — Treating Files Separately”). This is done with an entry .gitignore export-ignore in .git/info/attributes.
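A minimal sketch of this exclusion, assuming a throwaway repository with hypothetical file contents and a demo identity:

```shell
# Throwaway repository; all names below are hypothetical demo values.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo '*.o' > .gitignore
echo "demo project" > README
git add .gitignore README && git commit -q -m "initial import"

# Repository-local attribute: never export .gitignore files.
mkdir -p .git/info
echo '.gitignore export-ignore' >> .git/info/attributes

# Pack HEAD and list what actually ends up in the archive.
listing=$(git archive --format=tar HEAD | tar -tf - | tr '\n' ' ')
echo "$listing"
```

Because the entry lives in .git/info/attributes, it applies only to this clone; a versioned .gitattributes file would carry the same rule to everyone who creates archives.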

You can also perform automatic keyword substitutions before packing the archive (see Sec. 8.1.2, “Keywords in Files”).

7. Git Servers

The following is about hosting Git repositories and Gitolite, which allows you to flexibly manage repository access rights via SSH public keys. It also explains how to install and configure the two web interfaces Gitweb and CGit, alternatively for Apache or Lighttpd.

7.1. Hosting a Git Server

First some basics: How do repositories on a server differ from those of a normal user? And how does Git exchange the changes?

7.1.1. The Git Protocol

Git is designed for decentralized repository management; the smallest unit for exchanging changes between repositories is the commit. However, since there are sometimes thousands of commits between two versions of a piece of software, and transferring them one by one would generate a lot of overhead, commits are grouped together into so-called packfiles before being transferred. These packfiles are a simple but effective format.⁠^[92] They are also used to store (older) commits on the hard disk in a space-saving way (git gc or git repack, see Sec. B.1, “Cleaning Up”).

These packfiles are usually transmitted via the Git protocol, which runs on port 9418/TCP by default. The Git protocol is deliberately kept very simple in design and offers only a few functions that are directly related to the structure of Git: What data to send or receive, and a way for the sender and receiver sides to agree on the smallest possible amount of data that needs to be transmitted to synchronize both sides.

Therefore, the Git protocol does not include any authentication option. Instead, Git uses an already existing, secure and simple authentication structure: SSH, the 'Secure Shell'.

So while the Git protocol can be used unencrypted and in raw form for anonymous read-only access, writing or uploading via Git protocol only works via SSH.

Git also supports transport via HTTP(S), FTP(S), and Rsync. Although Rsync is now considered deprecated, and should not be used anymore, there are some use cases for HTTP(S): In particularly restrictive environments with very restrictive firewall rules, you may be able to access a repository via HTTP(S) (i.e. only on port 80 or 443) for both read and write operations. Platforms like GitHub (see Ch. 11, GitHub) therefore offer HTTPS as default transport method.

7.1.2. Repositories on the Same Computer

If you want to synchronize changes between repositories on the same computer, no detour is necessary: Git communicates directly with the other side via Unix pipes, negotiates a common basis and synchronizes the data. (Of course, this requires that the user invoking the Git command has at least read permission to the other repository’s pack files.)

7.1.3. Bare Repositories: Repositories Without Working Tree

So far, you’ve probably worked mostly with Git repositories that combine working tree and repository: The repository-internal data is stored in the .git subdirectory, all other files belong to the working tree, i.e. you can edit them while Git observes and stores the changes to these files (tracking).

A so-called bare repository, i.e. a “mere” repository, has no assigned working tree. It contains only the files and directories that are stored in a “regular” repository under .git.

You create such a bare repository with git init --bare. Take a look at the difference between the two options:

$ cd /tmp/ && mkdir init-test && cd init-test
$ git init
Initialized empty Git repository in /tmp/init-test/.git/
$ ls -AF
.git/

$ mkdir ../init-test-bare && cd ../init-test-bare
$ git init --bare
Initialized empty Git repository in /tmp/init-test-bare/
$ ls -AF
branches/  config  description  HEAD  hooks/  info/  objects/  refs/

To create a backup of one of your normal repositories, you can create a new bare repository (e.g. on a USB stick) and upload all your references (and thus all your commits):

$ git init --bare /mnt/usb/repo-backup/
$ git push --all /mnt/usb/repo-backup/
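The backup procedure can be rehearsed without a USB stick. In the sketch below, two temporary directories stand in for the working repository and the backup medium; the paths, the demo identity and the commit message are hypothetical.

```shell
# Temporary stand-ins for the working repository and the backup medium.
work=$(mktemp -d)
backup=$(mktemp -d)/repo-backup.git

cd "$work" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "initial commit"

# Create the bare repository and upload all branches to it.
git init -q --bare "$backup"
git push -q --all "$backup"

# Verify: the backup is bare and its master points at the same commit.
is_bare=$(git --git-dir="$backup" rev-parse --is-bare-repository)
src_id=$(git rev-parse refs/heads/master)
dst_id=$(git --git-dir="$backup" rev-parse refs/heads/master)
echo "bare: $is_bare, master: $dst_id"
```

Note that --all covers branches only; if tags should be part of the backup as well, a separate git push --tags is needed.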

7.1.4. Repository Access Permissions

With git init the files are usually created with read and write permission according to the umask setting. This is also a convenient choice for the end user. However, if you want to set up a repository on a server, you can use the --shared option to specify who (at the filesystem level) can access the repository.

`umask`	Default, if `--shared` is not specified; uses the currently set `umask`.
`group`	Default, if only `--shared` is specified. Assigns write permissions to all group members. In particular, directories are also set to `g+sx` mode, allowing all group members to create new files (i.e. upload commits). Note that if the `umask` grants read permission to all users (`a+r`), this permission will still be granted.
`all`	Same as `group`, except that read permissions are explicitly granted for all, regardless of the `umask`.
`0<nnn>`	Sets the file mode explicitly to the octal value `0<nnn>`, overriding the `umask`.

When you initialize a repository with --shared, the receive.denyNonFastForwards option is automatically set. It prevents uploading commits that cannot be integrated via Fast-Forwards (even if the user explicitly wants to via git push -f).
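You can inspect what --shared changes by querying the configuration of a freshly created repository. The following sketch uses a temporary path as a hypothetical stand-in for a server directory.

```shell
# Hypothetical stand-in for a repository directory on a server.
repo=$(mktemp -d)/shared.git
git init -q --bare --shared=group "$repo"

# Set automatically by --shared: non-fast-forward pushes are refused.
deny=$(git --git-dir="$repo" config --bool receive.denyNonFastForwards)

# The sharing mode itself is recorded in core.sharedRepository.
shared=$(git --git-dir="$repo" config core.sharedRepository)

echo "receive.denyNonFastForwards = $deny"
echo "core.sharedRepository       = $shared"
ls -ld "$repo"                            # directory mode includes the setgid bit
```

If forced updates are wanted after all, the option can be switched off again with git config in the repository in question.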

7.1.5. Access via SSH: The Git Shell

Usually, write access to Git repositories located on another computer can only be granted via SSH. However, it is generally undesirable to grant a user who will have access to a repository the same user rights to the whole system.

Git works around this problem with the included git-shell program. It works like a shell, but only allows you to run four Git commands that are responsible for uploading and downloading pack files. Interactive use or execution of other commands is denied by the shell unless you explicitly enable the “Interactive Mode” of the shell — see the git-shell(1) man page for details.

If you create a new user and assign the Git shell to him, e.g. using chsh <user>, he cannot log in interactively via SSH, but he can upload commits to all Git repositories to which he has write permission.

7.1.6. Access via SSH: Public Keys

It’s a major advantage that Git uses SSH as an encrypted and authenticated transport channel, because most users already have a key pair (public/private) with which they log in on other computers.

So instead of tediously assigning (and then sending out) passwords to accounts, a system administrator can limit access to Git repositories to users who authenticate against SSH public keys. This saves time for the user (by eliminating the need to re-enter a password), but also saves the administrator from having to worry about password changes (which would not be easily possible using the Git shell).

7.1.7. Example: Two Users Want to Collaborate

In the following we will show you how to set up two users on your system, max and moritz, and let them work on the same repository.

First, we have to set up a repository that the two users will want to access later. Assuming that other repositories might follow later, we will create a Unix group git (generally for Git users) and a directory /var/repositories with read permission for members of the git group, as well as a git-example group and its corresponding directory, writeable only for members of git-example, in which the repository will later be located:

$ groupadd git
$ groupadd git-example
$ mkdir -m 0750 /var/repositories
$ mkdir -m 0770 /var/repositories/git-example
$ chown root:git /var/repositories
$ chown root:git-example /var/repositories/git-example

We also create a repository in the directory we just created:

$ git init --bare --shared /var/repositories/git-example
$ chown -R nobody:git-example /var/repositories/git-example

Next we create the two users. Note that this call will not create a home directory for the users under /home/. Also, both are added to the git and git-example groups:

$ adduser --no-create-home --shell /usr/bin/git-shell max
$ adduser --no-create-home --shell /usr/bin/git-shell moritz
$ adduser max git
$ adduser max git-example
$ adduser moritz git
$ adduser moritz git-example

Next, we have to assign a password to each user via passwd so that they can log in via SSH. Afterwards, the new users can now work together on a project. You add the remote as follows:

$ git remote add origin max@server:/var/repositories/git-example

All other users who want to contribute to this project must belong to the git-example group. So this approach is essentially based on the use of Unix groups and Unix users. However, a server admin usually wants to offer not only Git, but various services, and controlling user administration entirely via Unix groups is rather inflexible.

7.2. Gitolite: Simple Git Hosting

The approach to user management described above has some major disadvantages, namely:

A full Unix account must be created for each user. This means a lot of additional work for the administrator and possibly also opens security holes.
For each project a separate Unix group must be created.
For each user created, the access permissions must be adjusted manually (or via a script).

The program Gitolite provides a remedy.⁠^[93]

Gitolite originated from the Gitosis project, which is now considered obsolete. The idea: Only one Unix user (e.g. git) is created on the server. Internally, Gitolite then manages a list of users with associated SSH keys. But these users do not have a “real” user account on the system.

Users simply log into the git account with their public SSH keys. This has three major advantages:

No password needs to be assigned or changed.
Users can store multiple SSH keys (for different computers they work on).
Using the SSH key a user logs in with, Gitolite can uniquely⁠^[94] derive the internal username and thus the permissions on the repositories managed by Gitolite.

7.2.1. Installing Gitolite

The installation of Gitolite is simple. All you need to do is have your public key ready to register as an administrator. You don’t need root privileges unless you need to create the git user first,⁠^[95] so skip the next step if you have already created such a user.

First, create a user on the computer that will act as the git server (henceforth <server>). Usually, this user is called git, but you may also call it something else (e.g. gitolite). You can specify /home/git as your home directory or, as in this example, something like /var/git:

server# adduser --home /var/git git

Now switch to the git user. Gitolite needs the .ssh/ and bin/ directories, so we need to create them:

server$ mkdir -m 0700 ~/.ssh ~/bin

Now clone the Gitolite repository and install a symlink to bin (this is already the whole installation):

server$ git clone https://github.com/sitaramc/gitolite
server$ gitolite/install -ln

You can now configure Gitolite and enter your public key with which you want to manage the Gitolite configuration:

server$ bin/gitolite setup -pk <your-key>.pub

Check that Gitolite works on the computer where you normally work (and where you have stored the corresponding private key):

client$ ssh -T git@<server>
...
 R W    gitolite-admin

You should verify that your key gives you read and write permission to the gitolite-admin repository. Now clone it onto your computer:

client$ git clone git@<server>:gitolite-admin

The repository contains the entire configuration for Gitolite. You check in your changes there and upload them via git push: the server automatically updates the settings.

7.2.2. Configuring Gitolite

The Gitolite admin directory contains two subdirectories, conf and keydir. To introduce a new user to Gitolite, you need to put their SSH key under keydir/<user>.pub. If the user has multiple keys, you can store them in separate files of the format <user>@<description>.pub:

client$ cat > keydir/feh@laptop1.pub
ssh-dss AAAAB3NzaC1kc3M ... dTw== feh@mali
^D
client$ cat > keydir/feh@laptop2.pub
ssh-dss AAAAB3NzaC1kc3M ... 5LA== feh@deepthought
^D

Don’t forget to check in the new keys with git add keydir followed by git commit. To make them known to the gitolite installation, you also need to upload the commits using git push.

Then you can assign permissions to this username in the conf/gitolite.conf configuration file.

You can save yourself a lot of administrative work and typing by using macros. You can combine groups (of users or repositories), e.g.:

@test_developers = max markus felix
@test_repos      = test1 test2 test3

These macros are also evaluated recursively. When defining them, it does not have to be clear whether they are users or repositories; the macros are only evaluated at runtime. This allows you to create groups from other groups:

@proj = @developer @tester @admins

There is a special group @all which, depending on the context, contains all users or all repositories.

You can configure one (or more) repositories as follows:

repo @test_repos
    RW+ = @test_developers

R and W stand for read or write access. The plus means that forced uploading is also allowed (non-fast-forward, i.e. also deleting commits).

For a repository, of course, several such lines can be entered. In a small project there could be maintainers, other developers and testers. Then the access rights could be regulated as follows:

@maintainers = ... # main developers/leads
@developers  = ... # other developers
@testers     = ...

repo Projekt
    RW+ = @maintainers
    RW  = @developers
    R   = @testers

Thus, the testers have read-only access, while the developers are allowed to upload new commits, but only if they can be integrated via fast-forward. The maintainers are allowed “everything”.

These lines are processed sequentially. If the line applies to a user, Gitolite authorizes the user and grants him the appropriate rights. If no line matches the user, the user is rejected and is not allowed to change anything in the repository.

A user can view all his permissions by simply logging into the Git server via SSH. Immediately after installation, this is what it looks like to the administrator:

$ ssh -q git@<server>
hello feh, this is git@mjanja running gitolite3 v3.6.1-6-gdc8b590 on git 2.1.0

 R W     gitolite-admin
 R W     testing

7.2.3. Ownership and Description

If you want to install a web-based tool to browse the Git repositories later, you should also name a person in charge and describe the project:

repo <repo-name>
  # access rights
  config gitweb.owner = "Julius Plenz"
  config gitweb.description = "A test repository"

For this to work, you must first allow Gitolite to set these config settings. This is done on the server where Gitolite is installed, in the file .gitolite.rc: enter the value gitweb\..* under the GIT_CONFIG_KEYS key.

7.2.4. Access Rights on File or Branch Level

Especially in corporate environments, access rights often have to be differentiated even more finely than a mere “has access” and “must not access”. For this purpose, Gitolite offers access restriction on directory- and file- as well as tag- and branch-level.

We will first look at a case that occurs frequently: developers should be able to develop on development branches at will, but only a small group of maintainers should be able to edit “important” branches such as master.

This could be implemented along the following lines:

@maintainers = ...
@developers  = ...

repo Projekt
    RW+ dev/    = @developers
    RW+         = @maintainers
    R           = @developers

Here a “development namespace” is created: The group of developers can work with branches below dev/, e.g. create dev/feature or delete it. However, the developers can only read the master branch, not change it — this is reserved for the maintainers.

The part between the flags (RW+) and the equal sign is a so-called Perl-Compatible Regular Expression (PCRE). If it does not start with refs/, the expression refers to all references below refs/heads/, i.e. branches. In the above example, any references below refs/heads/dev/ can be modified — but not the dev branch itself, nor anything-dev!

But if such an expression starts explicitly with refs/, you can manage any references. In the following way you can set up that all maintainers are allowed to create Release Candidate tags,⁠^[96] but only one maintainer is really allowed to create the versioning tag (or any other):

repo Projekt
    RW+ refs/tags/v.*-rc[0-9]+$     = @maintainers
    RW+ refs/tags/                  = <project-lead>

If one of the maintainers still wants to upload a tag like v1.0, the following happens:

remote: W refs/tags/v1.0 <repository> <user> DENIED by fallthru
remote: error: hook declined to update refs/tags/v1.0
To <user>:<repository>
 ! [remote rejected] v1.0 -> v1.0 (hook declined)

As mentioned above, here the rules are applied one after the other. Since the tag v1.0 does not match the regular expression above, only the bottom line comes into question, but the username does not match. No line is left (fallthru), so the action is not allowed.
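The matching rule described in this section can be emulated in a few lines of shell to check which references a given expression would cover. This is only a sketch of the rule as described above, not Gitolite’s own code: the helper refex_matches is hypothetical, it assumes that expressions are anchored at the start only, and it uses grep -E (POSIX extended regular expressions) as a stand-in for Perl-compatible ones, which is sufficient for these simple patterns.

```shell
# Hypothetical helper emulating the matching rule; not part of Gitolite.
# Expressions that do not start with refs/ are taken relative to refs/heads/.
refex_matches() {
    refex=$1
    ref=$2
    case "$refex" in
        refs/*) pattern="^$refex" ;;
        *)      pattern="^refs/heads/$refex" ;;
    esac
    printf '%s\n' "$ref" | grep -Eq "$pattern"
}

matched=""
check() {
    if refex_matches "$1" "$2"; then
        echo "match:    $1  ->  $2"
        matched="$matched$2 "
    else
        echo "no match: $1  ->  $2"
    fi
}

# Branch namespace: only references below dev/ are covered.
check 'dev/' refs/heads/dev/feature
check 'dev/' refs/heads/dev
check 'dev/' refs/heads/anything-dev

# Release candidate tags versus final version tags.
check 'refs/tags/v.*-rc[0-9]+$' refs/tags/v1.0-rc1
check 'refs/tags/v.*-rc[0-9]+$' refs/tags/v1.0
```

Only dev/feature and the tag v1.0-rc1 match; the tag v1.0 falls through the release candidate rule, which corresponds to the rejection shown above.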

7.2.5. Personal Namespaces

The concept of personal namespaces is somewhat more flexible. This gives each developer his own hierarchy of branches that he can manage.

There is a special keyword for this, USER, which is replaced by the name of the user currently accessing the repository. This makes the following possible:

repo Projekt
    RW+ p/USER/  = @developers
    R            = @developers @maintainers

Now all developers can manage their branches under p/<user>/ as they like. The second directive makes sure that all developers can read these branches. Now max can e.g. create p/max/bugfixes, but moritz can only read them.

7.2.6. File-Level Access Control

Gitolite also allows file- and directory-level access restrictions. The virtual reference VREF/NAME is responsible for this. For example, you can restrict the documentation team’s write⁠^[97] access to doc/:

@doc = ...  # documentation team

repo Projekt
    RW VREF/NAME/doc/   = @doc
    -  VREF/NAME/       = @doc

However, the following pitfalls must be taken into account: As soon as the keyword VREF/NAME appears anywhere, the file-based rules are applied to all users. If none of them apply, access is allowed, so the second rule is important: it prohibits access for @doc unless the commit only modifies files under doc/ (see also Sec. 7.2.7, “Explicitly Prohibiting Actions” below).

Access control checks at the commit level which files are modified; if a commit contains changes to a file that the user is not allowed to edit, the entire push process is aborted. In particular, no actions can be performed that involve commits from other developers that modify files outside the allowed range.

Specifically, in relation to the above example, this means that @doc members generally cannot create new branches. Creating a new branch would mean creating a new reference to an initial commit and then fast-forwarding all commits from top to root, i.e., the entire project history. However, there are certainly commits in it that modify files outside of doc/, and so the action is prohibited.

7.2.7. Explicitly Prohibiting Actions

Previously, a user was only rejected if he failed all rules (fallthru), i.e. if no rights were assigned to him. But the - flag (instead of RW) can be used to explicitly restrict access. Again, the rules are passed through from top to bottom.

repo Projekt
    -   VREF/NAME/Makefile   = @developers

This directive prohibits members of @developers from making commits that modify the Makefile.⁠^[98]

By convention, you should never upload forced updates to the master or maint branches (see also Sec. 3.1, “References: Branches and Tags”). You can now enforce this policy with Gitolite:

repo Projekt
    RW  master maint    = @developers
    -   master maint    = @developers
    RW+                 = @developers

If a branch that is not called master or maint is uploaded, only the third rule is applied and arbitrary access (including non fast-forward updates) is allowed. Commits that can be integrated to master or maint via fast-forward are allowed by the first rule. Note the missing plus sign, though: A forced update is not covered by the first rule, but by the second one, which explicitly prohibits everything (that has not been allowed before).

7.2.8. Should Policies Be Enforced?

With the means presented here, and others described in the documentation,⁠^[99] you can enforce policies very flexibly. However, it may not be useful to control everything down to the smallest detail. As mentioned above, control at the file name level is especially problematic: if hours of work have gone into a commit, but it can’t be uploaded because one of those restrictions prohibits it, the frustration is great (and fixing that commit is not trivial, either; see rebase, Sec. 4.1, “Moving commits — Rebase”).

At the branch level, it makes sense to give only a limited group of developers access to “important” branches (such as master). Of course, strict control over who can do what comes at the expense of flexibility, and it’s this flexibility that makes branching in Git so practical.

7.3. Git Daemon: Anonymous Read-Only Access

The Git daemon allows unencrypted, anonymous, read-only access to Git repositories via the Git protocol. It comes with Git and usually runs on TCP port 9418 (and can thus be started without root privileges).

The transmission is not encrypted. However, the cryptographic integrity that Git constantly checks excludes the possibility of attackers manipulating the data stream and smuggling in malicious code.⁠^[100]

This approach is ideal for making source code available to a large number of people quickly and easily. Only the minimum of necessary information is downloaded (only the required commits are negotiated and then transferred packed).

In order to export one or more repositories, simply execute git daemon <path>, where <path> is the path where your repositories are located. You can also specify multiple paths. If you have already set up Gitolite as above, /var/git/repositories is a useful path.

For testing, you can run a Git daemon on a single repository:

$ touch .git/git-daemon-export-ok
$ git daemon --verbose /home/feh/testrepo

Then clone (preferably into a temporary directory) this very repository:

$ git clone git://localhost/home/feh/testrepo
Initialized empty Git repository in /tmp/tmp.kXtkwxKgkc/testrepo/.git/
remote: Counting objects: 130, done.
remote: Compressing objects: 100% (102/102), done.
Receiving objects: 100% (130/130), 239.71 KiB, done.
Resolving deltas: 100% (54/54), done.
remote: Total 130 (delta 54), reused 0 (delta 0)

However, the Git daemon will only export a repository if a git-daemon-export-ok file is created in the .git directory (as done above; in the case of bare repositories, of course, this must be done in the directory itself). This is done for security reasons: For example, /var/git/repositories may contain many (even private) repositories, but only those that really need to be exported without access control will receive this file.

However, the daemon accepts the --export-all option, which removes this restriction and exports all repositories in all subdirectories.
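The effect of the marker file can be sketched without starting a daemon: the loop below creates two bare repositories under a temporary base directory, marks only one of them, and then lists the repositories a daemon without --export-all would be willing to serve. The directory and the repository names public.git and private.git are hypothetical.

```shell
# Hypothetical base directory with one public and one private repository.
base=$(mktemp -d)
git init -q --bare "$base/public.git"
git init -q --bare "$base/private.git"

# Bare repository: the marker file goes directly into the repository directory.
touch "$base/public.git/git-daemon-export-ok"

# Collect the repositories that carry the marker file.
exported=""
for r in "$base"/*.git; do
    if [ -f "$r/git-daemon-export-ok" ]; then
        exported="$exported$(basename "$r") "
    fi
done
echo "exportable: $exported"

# A daemon started on this directory would then serve only public.git:
#   git daemon --verbose "$base"
```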

Another important setting is the Base Path, which is the path where the actual Git repositories are located. Start the Git daemon as follows:

$ git daemon --base-path=/var/git/repositories /var/git/repositories

The path in every request for a Git repository is then prefixed with the base path. Now users can clone a repository with the address git://<server>/<project>.git instead of using the cumbersome git://<server>/var/git/repositories/<project>.git.

7.3.1. Git-Daemon and Inetd

As a rule, the Git daemon is supposed to constantly deliver a large number of repositories. To do this, it runs constantly in the background or is restarted for each request. The latter task is typically performed by Inetd from OpenBSD. To make this work, you just need to add the following (one!) line to /etc/inetd.conf:

git     stream  tcp     nowait  <user>   /usr/bin/git git daemon
  --inetd --base-path=/var/git/repositories /var/git/repositories

<user> must be a user who has read access to the repositories. This can be root, because the Inetd normally runs with root privileges, but should be git or a similarly unprivileged account.

The configuration for xinetd is similar, but more self-explanatory. It is stored e.g. under /etc/xinetd.d/git-daemon:

service git
{
    disable         = no
    type            = UNLISTED
    port            = 9418
    socket_type     = stream
    wait            = no
    user            = <user>
    server          = /usr/bin/git
    server_args     = daemon --inetd --base-path=... ...
    log_on_failure  += USERID
}

Do not forget to restart the respective daemon via /etc/init.d/[x]inetd restart.⁠^[101]

7.3.2. The Debian Way: Git Daemon SV

Debian offers the git-daemon-run package which contains configuration files for sv.⁠^[102] The package essentially creates a gitlog user and two executable shell scripts, /etc/sv/git-daemon/run and /etc/sv/git-daemon/log/run. Modify the former to run the Git daemon in the directory where your repositories are located:

#!/bin/sh
exec 2>&1
echo 'git-daemon starting.'
exec git-daemon --verbose --listen=203.0.113.1 --user=git --group=git \
  --reuseaddr --base-path=/var/git/repositories /var/git/repositories

If you start the Git daemon from a shell script this way (or similarly via SysV-Init), the script will be executed with root privileges. The following options are therefore useful:

`--user=<user>`	The user which the daemon runs as (e.g. `git`). Must have read access to the repositories.
`--group=<group>`	The group which the daemon runs as. It makes sense to use the user group (`git`) or `nobody`.
`--reuseaddr`	Prevents the daemon restart from going wrong because there are still open connections waiting for a timeout. This option uses the bind address even if there are still connections. You should always specify this option if an instance is running continuously.

If you are using SysV-Init, which means that services are usually started via symlinks in /etc/rc2.d/ to scripts in /etc/init.d/, you will also need to create the following symlinks to start the Git daemon automatically when the system boots:

# ln -s /usr/bin/sv /etc/init.d/git-daemon
# ln -s ../init.d/git-daemon /etc/rc2.d/S92git-daemon
# ln -s ../init.d/git-daemon /etc/rc0.d/K10git-daemon
# ln -s ../init.d/git-daemon /etc/rc6.d/K10git-daemon

7.3.3. The Git Daemon on Production Systems

On a production system that is more than just a Git server, you may encounter the following situations:

There are several network cards or virtual interfaces.
The service should run on a different port.
Different IPs should deliver different repositories.

The Git daemon provides options to respond to such situations. They are summarized below. For more detailed explanations, please consult the git-daemon man page.

--max-connections=<n>: By default, the Git daemon only allows 32 simultaneous connections. With this option you can increase the number. A value of 0 allows any number of connections.⁠^[103]
--syslog: Uses the syslog mechanism instead of standard error to log error messages.
--port=<n>: Uses a port other than 9418.
--listen=<host/ip>: Determines which interface the Git daemon should bind to. By default the daemon is accessible on all interfaces, so it binds to 0.0.0.0. A setting of 127.0.0.1, for example, only allows connections from the local machine.
--interpolated-path=<template>: If the Git daemon should offer different repositories depending on the interface address, this is controlled by the <template>: %IP is replaced by the IP address of the interface on which the connection comes in, and %D by the requested path. With a template of /repos/%IP%D, a git clone git://localhost/testrepo will produce the following message in the log files: interpolated dir '/repos/127.0.0.1/testrepo' (because the connection is established via the loopback interface). For each interface on which the Git daemon runs, there must be a subdirectory in /repos/ named after the interface’s IP address, in which the exportable repositories are located.
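
The substitution performed by --interpolated-path can be mimicked in a few lines of shell, which may help when planning the directory layout. This is only an illustrative sketch: interpolate_path is a hypothetical helper and not part of Git, it handles just the two placeholders discussed above, and it assumes that neither value contains the characters | or &.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): expand %IP and %D in a template
# the way the option description above explains it.
interpolate_path() {
    template=$1   # e.g. /repos/%IP%D
    ip=$2         # address of the interface the connection arrived on
    dir=$3        # path part of the requested URL, e.g. /testrepo
    printf '%s\n' "$template" | sed -e "s|%IP|$ip|g" -e "s|%D|$dir|g"
}

# Directory that would be searched for git://localhost/testrepo
interpolate_path '/repos/%IP%D' 127.0.0.1 /testrepo
# prints /repos/127.0.0.1/testrepo
```

Running the helper for each address the daemon listens on yields the list of directories that have to exist below /repos/.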

7.3.4. Specifying Exportable Repositories on Gitolite

Gitolite knows a special username, daemon. For all repositories where this user has read permission, the file git-daemon-export-ok is automatically created. So you can use Gitolite to directly specify which repositories to export:

repo Projekt
    R = daemon

Note that this setting has no effect if you start the Git daemon with the --export-all option. Also, you cannot give this permission to all repositories via repo @all.
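
To verify which repositories actually carry the marker file, a small shell loop is enough. The following is a minimal sketch, assuming that all repositories sit directly below one base directory and end in .git; list_exported is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the repositories directly below a base path
# that contain the git-daemon-export-ok marker file.
list_exported() {
    base=$1
    for repo in "$base"/*.git; do
        if [ -f "$repo/git-daemon-export-ok" ]; then
            basename "$repo"
        fi
    done
}

# Base path as used in the examples of this chapter
list_exported /var/git/repositories
```

Repositories in nested directories are not found by this loop; for such layouts a find-based variant would be needed.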

7.4. Gitweb: The Integrated Web Frontend

Git comes with an integrated, browser-based frontend called Gitweb. The frontend allows you to search the entire version history of a project: Each commit can be viewed with full details, differences between commits, files or branches, as well as all log messages. In addition, each snapshot can be downloaded individually as a tar archive (this is especially handy for Git newbies).

To get an overview of the functionality, you can use the command git instaweb to set up a temporary web server with Gitweb without further configuration.

Git does not come with its own web server. You can use the --httpd=<webserver> option to specify which web server Git should use to deliver the page. If you just want to try out Gitweb, we recommend using the webrick web server — this is a small web server that automatically ships with the Ruby scripting language.

As soon as you execute the following command, the web server will be started and the page will be displayed in the browser (which browser is used can be specified with the --browser option).

$ git instaweb --httpd=webrick

Note that the command must be started at the top level of a Git directory. If necessary, stop the web server with the following command:

$ git instaweb --stop

7.4.1. Installing Gitweb Globally

Many distributions already include Gitweb as a separate package or directly in the Git package. Under Debian, the corresponding package is called gitweb. If you are not sure if Gitweb is available on your system, you should check under /usr/share/gitweb and install it if necessary.

Gitweb only requires a large Perl script plus a configuration file and optionally a logo, CSS stylesheet, and favicon. The configuration file is usually located in /etc/gitweb.conf, but can also be named differently. It is important that each time the Perl script is called, the environment variable GITWEB_CONFIG is used to specify where this file is located.

Usually you should already have such a file. The following list shows the most important configuration options.

Attention: The file must be written in valid Perl. In particular, do not forget the concluding semicolon when assigning variables!

$projectroot: The directory where your Git repositories are located.
$export_ok: File name that determines whether a repository should be visible in Gitweb. You should set this variable to "git-daemon-export-ok" so that only those repositories that are also delivered by the Git daemon are displayed.
@git_base_url_list: Array of URLs that can be used to clone the project. These URLs appear in the project overview and are very helpful to give people quick access to the source code after they have gotten a brief overview. It’s best to specify the URL where your Git daemon can be reached, e.g. ('git://git.example.com').
$projects_list: Assignment of projects and their owners. This project list can be automatically generated by Gitolite; see the sample configuration file below.
$home_text: Absolute path to a file containing, for example, a company or project-specific text module. This is displayed above the list of repositories.

If you installed Gitolite as mentioned above, and your repositories are located under /var/git/repositories, the following Gitweb configuration should be sufficient:

$projects_list = "/var/git/projects.list";
$projectroot = "/var/git/repositories";
$export_ok = "git-daemon-export-ok";
@git_base_url_list = ('git://example.com');

7.4.2. Gitweb and Apache

Assuming that you have installed the CGI script under /usr/lib/cgi-bin and the image and CSS files under /usr/share/gitweb (as the Debian gitweb package does), configure Apache as follows:

Create /etc/apache2/sites-available/git.example.com with the following content:

<VirtualHost *:80>
  ServerName    git.example.com
  ServerAdmin   admins@example.com

  SetEnv GITWEB_CONFIG /etc/gitweb.conf

  Alias /gitweb.css         /usr/share/gitweb/gitweb.css
  Alias /git-logo.png       /usr/share/gitweb/git-logo.png
  Alias /git-favicon.png    /usr/share/gitweb/git-favicon.png
  Alias /                   /usr/lib/cgi-bin/gitweb.cgi

  Options +ExecCGI
</VirtualHost>

Then you need to activate the virtual host and let Apache reload the configuration:

# a2ensite git.example.com
# /etc/init.d/apache2 reload

7.4.3. Gitweb and Lighttpd

Depending on how you implement virtual hosts in Lighttpd, the configuration might look different. Three things are important: That you make aliases for the globally installed Gitweb files, set the environment variable GITWEB_CONFIG and that CGI scripts are executed. To do this you need to load the modules mod_alias, mod_setenv and mod_cgi (if you haven’t already done so).

The configuration then looks like this:⁠^[104]

$HTTP["host"] =~ "^git\.example\.com(:\d+)?$" {
    setenv.add-environment = ( "GITWEB_CONFIG" => "/etc/gitweb.conf" )
    alias.url = (
        "/gitweb.css"       => "/usr/share/gitweb/gitweb.css",
        "/git-logo.png"     => "/usr/share/gitweb/git-logo.png",
        "/git-favicon.png"  => "/usr/share/gitweb/git-favicon.png",
        "/"                 => "/usr/lib/cgi-bin/gitweb.cgi",
    )
    $HTTP["url"] =~ "^/$" {
        cgi.assign = ( ".cgi" => "" )
    }
}

Figure 42. Gitweb’s Summary page

Figure 43. Viewing a commit in Gitweb

7.5. CGit — CGI for Git

CGit (“CGI for Git”) is an alternative web frontend. Unlike Gitweb, which is written entirely in Perl, CGit is written in C and uses caching where possible. This renders it much faster than Gitweb.

To install CGit, you need to download the sources first. You will need the latest version of Git to access routines from the Git source code. To do this, you need to initialize the already configured submodule and download the code:

$ git clone git://git.zx2c4.com/cgit
...
$ cd cgit
$ git submodule init
Submodule 'git' (git://git.kernel.org/pub/scm/git/git.git) registered
for path 'git'
$ git submodule update
<Git sources are being downloaded.>

By default CGit installs the CGI file in a somewhat obscure directory /var/www/htdocs/cgit. To choose more sensible alternatives, create a file cgit.conf in the CGit directory, which is automatically included in the Makefile:

CGIT_SCRIPT_PATH=/usr/lib/cgi-bin
CGIT_DATA_PATH=/usr/share/cgit

Now the program can be compiled and installed via make install. However, it is recommended to use checkinstall⁠^[105] so that you can easily get rid of the package if necessary.

Figure 44. Overview page of CGit

7.5.1. CGit, Apache and Lighttpd

The integration in Apache and Lighttpd is similar. However, since CGit uses “nicer” URLs (like http://git.example.com/dwm/tree/dwm.c for the dwm.c file from the dwm repository), a little effort is required to rewrite the URLs.

The following configurations run CGit on git.example.com:

<VirtualHost *:80>
  ServerName git.example.com

  AcceptPathInfo On
  Options +ExecCGI

  Alias /cgit.css /usr/share/cgit/cgit.css
  Alias /cgit.png /usr/share/cgit/cgit.png
  AliasMatch ^/(.*) /usr/lib/cgi-bin/cgit.cgi/$1
</VirtualHost>

For Lighttpd you have to resort to some tricks. You must not forget to configure virtual-root=/ (see below — this setting is not harmful for Apache either).

$HTTP["host"] =~ "^git\.example\.com(:\d+)?$" {
    alias.url = (
        "/cgit.css" => "/usr/share/cgit/cgit.css",
        "/cgit.png" => "/usr/share/cgit/cgit.png",
        "/cgit.cgi" => "/usr/lib/cgi-bin/cgit.cgi",
        "/"         => "/usr/lib/cgi-bin/cgit.cgi",
    )
    cgi.assign = ( ".cgi" => "" )
    url.rewrite-once = (
        "^/cgit\.(css|png)" => "$0", # pass static files through
        "^/.+" => "/cgit.cgi$0"
    )
}

7.5.2. Configuration

The configuration is controlled by the file /etc/cgitrc. A list of supported options can be found in the file cgitrc.5.txt in the source directory of CGit (unfortunately the program does not include any other documentation). The most important ones are listed below:

clone-prefix: URL where the source code (preferably via Git protocol) can be downloaded (similar to @git_base_url_list from Gitweb).
enable-index-links: If set to 1, another column appears in the repository listing, with direct links to the tabs “summary”, “log” and “tree”.
enable-gitweb-owner: If set to 1, the owner is read from the Git repository’s gitweb.owner configuration. Gitolite sets this option automatically when you specify a name, see Sec. 7.2.3, “Ownership and Description”.
enable-log-filecount: Displays a column for each commit, showing the number of changed files.
enable-log-linecount: Analogous to enable-log-filecount, displays a summary of added/removed lines.
scan-path: Path that CGit should search for Git repositories. Attention: This option doesn’t take into account whether the repository has been released by the git-daemon-export-ok file (see also project-list)! Also note that the repositories added in this way will only inherit the settings that were made up to that point. It is therefore recommended to list the scan-path line last in the file.
project-list: File listing the projects below scan-path that are to be included. Gitolite creates such a file for all public repositories. See the sample configuration below.
remove-suffix: If the option is set to 1: the .git suffix is removed from URLs or repository names.
root-title: Headline that is displayed on the home page, next to the logo.
root-desc: Lettering that is displayed on the home page, under the headline.
side-by-side-diffs: If the option is set to 1, diff output will display two files side by side instead of using the unified diff format.
snapshots: Specifies which snapshot formats are offered. By default, none are offered. Possible values are tar, tar.gz, tar.bz2 and zip. Specify the desired formats separated by spaces.
virtual-root: Specifies which URL CGit should prefix to each link. If you run CGit at the top level of a host, e.g. http://git.example.com, this option should be set to / (this is especially necessary if you use Lighttpd). If you want to run CGit in a subdirectory instead, you should adjust this option accordingly, e.g. to /git.

With the following configuration, any repository you have allowed Gitweb access to in Gitolite will appear in the listing — and the description and author (if specified, see Sec. 7.2.3, “Ownership and Description”) will also be displayed:

virtual-root=/
enable-gitweb-owner=1
remove-suffix=1
project-list=/var/git/projects.list
scan-path=/var/git/repositories

Figure 45. Viewing a commit in CGit

7.5.3. Special Configuration of Individual Repositories

With the scan-path option explained above, in combination with Gitolite it is usually not necessary to add and configure repositories individually. However, if you want to do this, or if your repositories are not stored in a central location, you can do this per repository as follows:

repo.url=foo
repo.path=/pub/git/foo.git
repo.desc=the master foo repository
repo.owner=fooman@example.com

For more repository-specific configurations, consult the sample configuration file or the explanations of the options in the cgitrc.5.txt file in the source directory of CGit. You can also group these manually configured repositories under different sections (option section).

7.5.4. Exploiting Caching

CGit is especially fast compared to Gitweb because it is written in C and also supports caching. This is especially necessary if you have many repositories and/or many page views in a short time.

CGit uses a simple hash mechanism to check if a request is already in the cache and not too old (configurable, see list below). If such a cache entry is present, it will be delivered instead of re-creating the same page (the HTTP header Last-Modified stays the same, i.e. the browser knows when the page is from).

CGit also caches the result of scan-path. This way CGit doesn’t have to add all repositories one by one for the overview page each time.

`cache-root`	Path where the cache files are stored; defaults to `/var/cache/cgit`.
`cache-size`	Number of entries (i.e. individual pages) that the cache contains. The default value is 0, so caching is disabled. A value of 500 should be enough even for large pages.
`cache-<type>-ttl`	Time in minutes for a cache entry to be considered “current”. You can configure the time specifically for individual pages. Possible `<type>` values are: `scanrc` for the result of `scan-path`, `root` for the repository listing, `repo` for the “home” page of a repository, and `dynamic` or `static` for the “dynamic” pages (such as for branch names) or static pages (such as for a commit identified by its SHA-1 sum). By default, these values are set to five minutes, except for `scanrc` (15).
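
As a rough illustration, the caching options could be collected in a fragment like the one below. The sketch only writes the fragment to a scratch file for review; the values are taken from the table above and are examples, not recommendations.

```shell
#!/bin/sh
# Write an illustrative cgitrc fragment with caching options to a
# scratch file. The values are examples taken from the table above.
cgitrc=$(mktemp)
cat > "$cgitrc" <<'EOF'
cache-root=/var/cache/cgit
cache-size=500
cache-root-ttl=5
cache-repo-ttl=5
cache-scanrc-ttl=15
EOF

cat "$cgitrc"
```

When merging such lines into /etc/cgitrc, they should go before the scan-path line, since repositories found by scan-path only inherit the settings made up to that point.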

Another important factor that influences how fast the index page is built is the use of so-called age files. Normally, the Idle column is recomputed on every request: CGit goes through the branches of each repository and notes their age. This is not very fast, though.

It’s more practical to use one file per repository, indicating when the last commit was uploaded. This is best done with hooks (see Sec. 8.2, “Hooks”). Use this command in the post-update hook:

mkdir -p info/web || exit 1
git for-each-ref \
    --sort=-committerdate \
    --format='%(committerdate:iso8601)' \
    --count=1 'refs/heads/*' \
    > info/web/last-modified

If you want to use a different path instead of info/web/last-modified (relative to $GIT_DIR), use the CGit configuration key agefile for the specification.

8. Git Automation

In this chapter, we’ll introduce advanced techniques for automating Git. In the first section about Git attributes, we’ll show you how to tell Git to treat certain files separately, for example, to call an external diff command on graphics.

We continue with hooks — small scripts that are executed when various git commands are called, for example to notify all developers via email when new commits arrive in the repository.

Then we’ll give a basic introduction to scripting with Git and show you useful plumbing commands.

Finally, we will introduce the powerful filter-branch command, which you can use to rewrite the project history on a large scale, for example to remove a file with a password from all commits.

8.1. Git Attributes — Treating Files Separately

Git attributes allow you to assign specific properties to individual files or a group of files so that Git treats them with special care; examples would be forcing the end of lines or marking certain files as binary.

You can write the attributes either in the file .gitattributes or in .git/info/attributes. The latter applies only to the local repository and is not versioned by Git. A .gitattributes file is usually checked in, so all developers use these attributes. You can also store additional attribute definitions in subdirectories.

One line in this file has the format:

<pattern> <attrib1> <attrib2> ...

An example:

*.eps   binary
*.tex   -text
*.c     filter=indent

Usually attributes can be set (e.g. binary), canceled (-text) or set to a value (filter=indent). The man page gitattributes(5) describes in detail how Git interprets the attributes.

A project that is developed in parallel on Windows and Unix machines suffers from the fact that the developers use different conventions for line endings. This is due to the operating system: Windows systems use a carriage return followed by a line feed (CRLF), while Unix-like systems use only a line feed (LF).

By means of suitable git attributes you can determine an adequate policy — in this case the attributes text or eol are responsible. The attribute text causes the line ends to be "normalized". Whether a developer’s editor uses CRLF or just LF, Git will only store the version with LF in the blob. If you set the attribute to auto, Git will only perform this normalization if the file also looks like text.

The eol attribute, on the other hand, determines what happens during a checkout. Regardless of the user’s core.eol setting, you can specify e.g. CRLF for some files (because the format requires it).

*.txt   text
*.csv   eol=crlf

With these attributes, .txt files are always saved internally with LF and checked out as CRLF if required (platform- or user-dependent). CSV files, on the other hand, are checked out with CRLF on all platforms. (Internally, Git saves all these blobs with plain LF line endings.)
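
To see which attributes Git applies to a given path, you can query them with git check-attr. The following sketch sets up a scratch repository with the two lines from above; the file names notes.txt and data.csv are hypothetical and do not even have to exist, since only the path is matched.

```shell
#!/bin/sh
# Scratch repository, so nothing in an existing project is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

printf '%s\n' '*.txt   text' '*.csv   eol=crlf' > .gitattributes

# Ask Git which values of "text" and "eol" apply to two example paths.
git check-attr text eol -- notes.txt data.csv
```

Each output line has the form <path>: <attribute>: <value>, which makes it easy to spot patterns that do not match as intended.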

8.1.1. Filter: Smudge and Clean

Git offers a filter to "smudge" files after a checkout and to "clean" files again before a git add.

The filters do not get any arguments, only the content of the blob on standard input. The output of the program is used as the new blob.

For each filter you have to define a Smudge and a Clean command. If one of the definitions is missing or if the filter is cat, the blob is taken over unchanged.

Which filter is used for which type of files is defined by the git attribute filter. For example, to automatically indent C files correctly before a commit, you can use the following filter definitions (instead of <indent>, any other name is possible):

$ git config filter.<indent>.clean indent
$ git config filter.<indent>.smudge cat
$ echo '*.c filter=<indent>' > .git/info/attributes

To "clean up" a C file, Git now automatically calls the indent program that should be installed on standard systems.⁠^[106]
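
The effect of a clean filter can be observed without installing indent. The sketch below uses a deliberately silly filter that converts text to upper case; the filter name upcase and the file name greeting.txt are arbitrary choices for this demonstration.

```shell
#!/bin/sh
# Scratch repository for the demonstration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# "upcase" is an arbitrary filter name chosen for this example.
git config filter.upcase.clean 'tr a-z A-Z'
git config filter.upcase.smudge cat
mkdir -p .git/info
echo '*.txt filter=upcase' > .git/info/attributes

echo 'hello' > greeting.txt
git add greeting.txt

cat greeting.txt                 # working tree copy: hello
git cat-file -p :greeting.txt    # blob written by the clean filter: HELLO
```

The working tree file stays as it was; only the blob that ends up in the index (and later in the commit) has passed through the clean command.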

8.1.2. Keywords in Files

In principle, filters allow the well-known keyword expansions to be realized, so that e.g. $Version$ becomes $Version: v1.5.4-rc2$.

You define the filters in your configuration and then equip corresponding files with this git attribute. This works like this, for example:

$ git config filter.version.smudge ~/bin/git-version.smudge
$ git config filter.version.clean ~/bin/git-version.clean
$ echo '* filter=version' > .git/info/attributes

A filter that replaces or cleans up the $Version$ keyword could be implemented as a Perl one-liner; first the Smudge filter:

#!/bin/sh
version=`git describe --tags`
exec perl -pe 's/\$Version(:\s[^\$]+)?\$/\$Version: '"$version"'\$/g'

And the Clean-Filter:

#!/usr/bin/perl -p
s/\$Version: [^\$]+\$/\$Version\$/g

It is important that repeated application of such a filter does not make uncontrolled changes in the file. A double call to Smudge should be fixed by a single call to Clean.

8.1.2.1. Restrictions

The concept of filters in Git is intentionally kept simple and will not be expanded in future versions. The filters receive no information about the context in which Git is currently located: Is a checkout happening? A merge? A diff? They only get the blob content. So the filters should only perform context-independent manipulations.

At the time Smudge is called, the HEAD may not yet be up to date (the above filter would write an incorrect version number to the file during a git checkout, because it is called before the HEAD is moved). So the filters are not very suitable for keyword expansion.

This may annoy users who have become accustomed to this feature in other version control systems. However, there are no good arguments for such an expansion within a version control system. The internal mechanisms Git uses to check if files have been modified are paralyzed (since they always have to go through the clean filter). Also, because of the structure of Git repositories, you can "track" a blob through the commits or trees, so you can always tell if a file belongs to a commit by its contents if necessary.

So keyword expansion is only useful outside of Git. This is not the responsibility of Git, but a Makefile target or script. For example, a make dist can replace all occurrences of VERSION with the output of git describe --tags. Git will display the files as "changed". Once the files are distributed (e.g. as a tarball), you can clean up with git reset --hard.

Alternatively, the export-subst attribute ensures that an expansion of the form $Format:<Pretty>$ is performed, where <Pretty> must be a format that is valid for git log --pretty=format:<Pretty>, e.g. %h for the abbreviated commit hash. Git will only expand these placeholders if the file is packaged via git archive (see Sec. 6.3.2, “Creating Releases”).
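
A short experiment shows the difference between the checked-out file and the archived one. This is a minimal sketch in a scratch repository; the file name VERSION.txt and the commit identity are made up for the demonstration.

```shell
#!/bin/sh
# Scratch repository for the demonstration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# VERSION.txt is a hypothetical file name used only here.
echo 'VERSION.txt export-subst' > .gitattributes
echo 'built from commit $Format:%h$' > VERSION.txt
git add .gitattributes VERSION.txt
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m 'Add version file'

# In the working tree the placeholder stays unexpanded ...
cat VERSION.txt

# ... but in the archive it is replaced by the abbreviated commit hash.
git archive HEAD VERSION.txt | tar -xOf - VERSION.txt
```

Note that the .gitattributes file has to be part of the archived commit for the expansion to take place.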

8.1.3. Own Diff Programs

Git’s internal diff mechanism is very well suited for all types of plaintext. But it fails with binaries - Git just tells you whether they differ or not. However, if you have a project where you need to manage binary data, such as PDFs, OpenOffice documents, or images, it’s a good idea to define a special program that creates meaningful diffs for these files.

For example, there are antiword and pdftotext to convert Word documents and PDFs to plaintext. There are analogous scripts for OpenOffice formats. For images you can use commands from the ImageMagick suite (see also the example below). If you manage statistical data, you can plot the changed recordsets side by side. Depending on the nature of the data, there are usually adequate ways to visualize changes.

Such conversion processes are, of course, lossy: You cannot use this diff output, for example to make meaningful changes to the files in a merge conflict. But to get a quick overview of who changed what, such techniques are sufficient.

8.1.3.1. API for External Diff Programs

Git provides a simple API for custom diff filters. A diff filter is always passed the following seven arguments:

path (name of the file in the Git repository)
old version of the file
old SHA-1 ID of the blob
old Unix rights
new version of the file
new SHA-1 ID of the blob
new Unix rights

The arguments 2 and 5 may be temporary files, which will be deleted as soon as the diff program quits again, so you don’t have to care about cleaning up.

If one of the two files does not exist (newly added or deleted), then /dev/null is passed as file name. The corresponding blob is then 00000…, even if a file does not yet exist as a fixed object in the object database (i.e. only in the working tree or index). The Diff command must be able to handle these cases accordingly.
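
To get a feeling for these arguments, you can register a diff program that does nothing but print them. The following sketch does so in a scratch repository; the script name show-args, the file name notes.txt and the commit identity are hypothetical.

```shell
#!/bin/sh
# Scratch repository for the demonstration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Hypothetical diff program that merely reports how it was called.
cat > show-args <<'EOF'
#!/bin/sh
echo "path=$1 old_mode=$4 new_mode=$7"
echo "old_file=$2 new_file=$5"
echo "old_id=$3 new_id=$6"
EOF
chmod +x show-args

echo 'first version' > notes.txt
git add notes.txt
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m 'Add notes'
echo 'second version' > notes.txt

# Temporarily use the script instead of the built-in diff.
GIT_EXTERNAL_DIFF="$repo/show-args" git diff
```

The script is called once per changed file, so a real diff program only ever has to deal with a single pair of files.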

8.1.3.2. Configuring External Diffs

There are two ways to call an external diff program. The first method is temporary: just set the environment variable GIT_EXTERNAL_DIFF to the path to your program before calling git diff:

$ GIT_EXTERNAL_DIFF=</path/to/diff-command> git diff HEAD^

The other option is persistent, but requires some configuration. First you define your own diff command <name>:

$ git config diff.<name>.command </path/to/diff-command>

The command needs to be able to handle the seven arguments mentioned above. Now you have to use the Git attribute diff to define which diff program is called. To do this, write e.g. the following lines in the .gitattributes file:

*.jpg diff=imgdiff
*.pdf diff=pdfdiff

When you check the file in, other users must also have set corresponding commands for imgdiff or pdfdiff, otherwise they will see the regular output. If you want to set this for one repository only, write this information to .git/info/attributes.

8.1.3.3. Comparing Pictures

A common use case is pictures: what has changed between two versions of an image? Visualizing this is not always easy. The tool compare from the ImageMagick suite marks the places that have changed for images of the same size. You can also animate the two images one after the other and recognize by the "flickering" where the image has changed.

Instead, we want a program that compares the two images. Between the two images a kind of "difference" is displayed: All areas where changes have occurred are copied from the new image onto a white background. So the diff shows which areas have been added.

Therefore we save the following script under $HOME/bin/imgdiff:⁠^[107]

#!/bin/sh

OLD="$2"
NEW="$5"

# "xc:none" is "nothing", i.e. it stands for a missing image
[ "$OLD" = "/dev/null" ] && OLD="xc:none"
[ "$NEW" = "/dev/null" ] && NEW="xc:none"

exec convert "$OLD" "$NEW" -alpha off \
    \( -clone 0-1 -compose difference -composite -threshold 0 \) \
    \( -clone 1-2 -compose copy_opacity -composite \
       -compose over -background white -flatten \) \
    -delete 2 -swap 1,2 +append \
    -background white -flatten x:

Finally, we need to configure the diff command and make sure it is used by an entry in the .git/info/attributes file.

$ git config diff.imgdiff.command ~/bin/imgdiff
$ echo "*.gif diff=imgdiff" > .git/info/attributes

As an example we use the original versions of the Tux.⁠^[108] First we insert the black and white Tux:

$ wget http://www.isc.tamu.edu/~lewing/linux/sit3-bw-tran.1.gif \
  -Otux.gif
$ git add tux.gif && git commit -m "tux added"

It will be replaced by a colored version in the next commit:

$ wget http://www.isc.tamu.edu/~lewing/linux/sit3-bwo-tran.1.gif \
  -Otux.gif
$ git diff

The output of the git diff command is a window with the following content: On the left the old version, on the right the new version, and in the middle a mask of those parts of the new image that are different from the old.

Figure 46. The output of git diff with the custom diff program imgdiff

The Tux example, including instructions, can also be found in a repository at: https://github.com/gitbuch/tux-diff-demo.

8.2. Hooks

Hooks provide a mechanism to "hook" into important Git commands and perform your own actions. Therefore, hooks are usually small shell scripts to perform automated tasks, such as sending emails as soon as new commits are uploaded, or checking for whitespace errors before a commit and issuing a warning if necessary.

For hooks to be executed by Git, they must be located in the hooks/ directory in the Git directory, i.e. under .git/hooks/ or under hooks/ at the top level for bare repositories. They must also be executable.

Git automatically installs sample hooks on a git init, but these are named <hook>.sample and are therefore not executed without user intervention (renaming the files).

You can activate a supplied hook e.g. like this:

$ mv .git/hooks/commit-msg.sample .git/hooks/commit-msg

Hooks come in two classes: those that are executed locally (checking commit messages or patches, performing actions after a merge or checkout, etc.), and those that are executed server-side when you publish changes via git push.⁠^[109]

Hooks whose name begins with pre- can often be used to decide whether or not to perform an action. If a pre-hook does not end successfully (i.e. with a non-zero exit status), the action is aborted. Technical documentation on how this works can be found in the githooks(5) man page.

8.2.1. Commits

pre-commit: Is called before the commit message is queried. If the hook terminates with a non-zero value, the commit process is aborted. The hook installed by default checks whether a newly added file has non-ASCII characters in the file name and whether there are whitespace errors in the modified files. With the -n or --no-verify option, git commit skips this hook.

prepare-commit-msg: Will be executed right before the message is displayed in an editor. Gets up to three parameters, the first of which is the file where the commit message is stored so that it can be edited. For example, the hook can add lines automatically. A non-zero exit status cancels the commit process. However, this hook cannot be skipped and therefore should not duplicate or replace the functionality of pre-commit.

commit-msg: Will be executed after the commit message is entered. The only argument is the file where the message is stored, so that it can be modified (normalization). This hook can be skipped by -n or --no-verify; if it does not terminate successfully, the commit process is aborted.

post-commit: Called after a commit has been created.

These hooks act only locally and are used to enforce certain policies regarding commits or commit messages. The pre-commit hook is especially useful for this. For example, some editors do not adequately indicate when there are spaces at the end of a line or when blank lines contain spaces. This is annoying when other developers have to clean up whitespace in addition to their regular changes. This is where Git helps with the following command:

$ git diff --cached --check
hooks.tex:82: trailing whitespace.
+ auch noch Whitespace aufräumen müssen._

The --check option lets git diff check for such whitespace errors and will only exit successfully if the changes are error-free. If you write this command in your pre-commit hook, you will always be warned if you want to check in whitespace errors. If you are quite sure, you can simply suspend the hook temporarily with git commit -n.
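
Put together, a minimal pre-commit hook of this kind might look as follows. The sketch installs the hook in a scratch repository and then tries to commit a file with trailing whitespace; the file name demo.txt and the commit identity are made up for the demonstration.

```shell
#!/bin/sh
# Scratch repository for the demonstration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Install a minimal pre-commit hook that runs the whitespace check.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
exec git diff --cached --check
EOF
chmod +x .git/hooks/pre-commit

# Stage a file whose only line ends in a trailing space.
printf 'trailing space \n' > demo.txt
git add demo.txt

if git -c user.name=Demo -c user.email=demo@example.com \
       commit -q -m 'Add demo file'; then
    echo "commit created"
else
    echo "commit rejected by pre-commit hook"
fi
```

Because the hook uses exec, the exit status of git diff becomes the exit status of the hook, which is what Git evaluates.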

Similarly, you can also store the "Check Syntax" command for a script language of your choice in this hook. For example, the following block for Perl scripts:

git diff --diff-filter=MA --cached --name-only |
while read -r file; do
    if [ -f "$file" ] && [ "$(head -n 1 "$file")" = "#!/usr/bin/perl" ]; then
        perl -c "$file" || exit 1
    fi
# the loop may run in a subshell, so pass its failure on to the hook
done || exit 1
true

The names of all files modified in the index (diff filter modified and added, see also Sec. 8.3.4, “Finding Changes”) are passed to a subshell that checks per file whether the first line is a Perl script. If so, the file is checked with perl -c. If there is a syntax error in the file, the command will issue an appropriate error message, and exit 1 will terminate the hook, so Git will abort the commit process before an editor is opened to enter the commit message.

The closing true is needed e.g. if a non-perl file was edited: Then the if construct fails, the shell returns the return value of the last command, and although there is nothing to complain about, Git will not execute the commit. With the line true the hook was successful if all passes of the while loop were successful.

The hook can of course be simplified by assuming that all Perl files are present as <name>.pl. Then the following code is sufficient:

git ls-files -z -- '*.pl' | xargs -0 -n 1 perl -c

Since you might want to check only the files managed by Git, a git ls-files is better than a simple ls, because that would also list untracked files ending in .pl.

Besides checking the syntax, you can of course also use Lint style programs that check the source code for "unsightly" or non portable constructs.

Such hooks are extremely useful to avoid accidentally checking in faulty code. If the warnings are inappropriate, you can always skip the pre-commit hook by using the -n option when committing.

8.2.2. Server Side

The following hooks are called on the receiver side of git receive-pack after the user enters git push in the local repository.

For a push operation, git send-pack creates one packfile on the local side (see also Sec. 2.2.3, “The Object Database”), which is received by git receive-pack on the recipient side. Such a packfile contains the new values of one or more references as well as the commits required by the recipient repository to completely map the version history. The two sides negotiate which commits these are in advance (similar to a merge base).

pre-receive: The hook is called once and receives a list of changed references on standard input (see below for the format). If the hook does not complete successfully, git receive-pack refuses to accept the packfile (the whole push operation fails).

update: Is called once per changed reference and gets three arguments: the name of the reference, its old state and the proposed new one. If the hook does not end successfully, the update of that single reference is denied (in contrast to pre-receive, where a packfile can only be accepted or rejected as a whole).

post-receive: Similar to pre-receive, but is called only after the references have been changed (so it has no influence on whether the packfile is accepted or not).

post-update: After all references are changed, this hook is executed once and gets the names of all changed references as arguments. However, the hook is not told which state the references were in before or are in now. (You can use post-receive for this.) A typical use case is a call to git update-server-info, which is necessary if you want to provide a repository via HTTP.

8.2.2.1. The Format of the Receive Hooks

The pre-receive and post-receive hooks receive the same input on standard input. The format is the following:

<old-sha1> <new-sha1> <ref-name>

This can look like this, for example:

0000000...0000000 ca0e8cf...12b14dc refs/heads/newbranch
ca0e8cf...12b14dc 0000000...0000000 refs/heads/oldbranch
6618257...93afb8d 62dec1c...ac5373b refs/heads/master

A SHA-1 sum of all zeros means "not present". So the first line describes a reference that was not present before, while the second line means the deletion of a reference. The third line represents a regular update.

You can easily read the references with the following loop:

while read old new ref; do
  # ...
done

The variables old and new then contain the SHA-1 sums, while ref contains the name of the reference. A git log $old..$new would list all new commits. The standard output of the hook is forwarded to git send-pack on the side where git push was entered. So you can forward any error messages or reports directly to the user.
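As an illustration of the three cases described above, the loop can be extended into a small classifier. This is only a sketch: the function names is_zero and describe_updates are hypothetical, and the zero check simply tests that the value contains no character other than 0.

```shell
# Succeeds if the given object name consists of zeros only ("not present")
is_zero() {
    case $1 in
    *[!0]*) return 1 ;;
    *)      return 0 ;;
    esac
}

# Read "<old-sha1> <new-sha1> <ref-name>" lines from standard input, as the
# receive hooks get them, and print one summary line per reference.
describe_updates() {
    while read -r old new ref; do
        if is_zero "$old"; then
            echo "created $ref"
        elif is_zero "$new"; then
            echo "deleted $ref"
        else
            echo "updated $ref"
        fi
    done
}
```

A post-receive hook could simply call describe_updates; since the hook's standard output is forwarded, the summary would appear in the terminal of the user who ran git push.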

8.2.2.2. Sending E-Mails

A practical use of the post-receive hook is to send out emails as soon as new commits are available in the repository. You can program this yourself, of course, but there is a ready-made script that comes with Git. You can find it in the Git source directory under contrib/hooks/post-receive-email, and some distributions, such as Debian, also install it along with Git to /usr/share/doc/git/contrib/hooks/post-receive-email.

Once you have copied the hook into the hooks/ subdirectory of your bare repository and made it executable, you can adjust the configuration accordingly:

$ less config
...
[hooks]
  mailinglist = "Author One <autor1@example.com>, autor2@example.com"
  envelopesender = "git@example.com"
  emailprefix = "[project] "

This means that for each push operation, one mail per reference is sent with a summary of the new commits. The mail goes to all recipients defined in hooks.mailinglist and comes from hooks.envelopesender. The subject line is prefixed with hooks.emailprefix, so that the mail can be filtered more easily. More options are documented in the comments of the hook script.

8.2.2.3. The Update Hook

The update hook is called for each reference individually. It is therefore particularly well suited to implement a kind of "access control" to certain branches.

In fact, the update hook is used by Gitolite (see Sec. 7.2, “Gitolite: Simple Git Hosting”) to decide whether a branch may be modified or not. Gitolite implements the hook as a Perl script that checks whether the appropriate permission is present and terminates with a zero or non-zero return value accordingly.
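A much simpler policy than Gitolite's can be sketched in a few lines of shell. The rules below (existing tags are immutable, master may not be deleted) are purely hypothetical, as is the function name check_update. The sketch relies on the update hook receiving the reference name, the old value and the new value, in this order, and on an all-zero value meaning "not present".

```shell
# Hypothetical policy for an update hook:
#   - existing tags must not be moved or deleted
#   - the branch master must not be deleted
# Arguments: <ref-name> <old-sha1> <new-sha1>
check_update() {
    ref=$1 old=$2 new=$3
    case $ref in
    refs/tags/*)
        case $old in
        *[!0]*) echo "*** $ref: existing tags may not be changed" >&2
                return 1 ;;
        esac ;;
    refs/heads/master)
        case $new in
        *[!0]*) ;;
        *)      echo "*** $ref: this branch may not be deleted" >&2
                return 1 ;;
        esac ;;
    esac
    return 0
}
```

The hook itself would then consist of the call check_update "$1" "$2" "$3" || exit 1.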

8.2.2.4. Deployment via Hooks

Git is a version control system and knows nothing about deployment processes. However, you can use a hook to implement a simple deployment procedure - e.g. for web applications. The update hook itself is not suitable for this, because it runs before the reference is changed; the post-update hook runs afterwards and gets the names of the changed references as arguments.

The following post-update hook will, if the master branch has changed, replicate the changes to /var/www/www.example.com:

for ref in "$@"; do
    [ "$ref" = "refs/heads/master" ] || continue
    env GIT_WORK_TREE=/var/www/www.example.com git checkout -f
done

So as soon as you upload new commits via git push to the server’s master branch, this hook will automatically update the web presence.

8.2.3. Applying Patches

The following hooks are each called by git am when one or more patches are applied.

applypatch-msg: Is called before a patch is applied. The hook receives as its only parameter the file where the commit message of the patch is stored. The hook can change the message if necessary. A non-zero exit status causes git am not to accept the patch.

pre-applypatch: Called after a patch has been applied, but before the change is committed. A non-zero exit status causes git am not to accept the patch.

post-applypatch: Is called after a patch has been applied.

The hooks installed by default execute the corresponding commit hooks commit-msg and pre-commit, if enabled.

8.2.4. Other Hooks

pre-rebase: Is executed before a rebase process starts. Gets as arguments the references that are also passed to the rebase command (e.g. for the git rebase master topic command, the hook gets the arguments master and topic). Based on the exit status git rebase decides whether the rebase process is executed or not.

pre-push: Is executed before a push operation starts. Receives on standard input lines of the form <local-ref> <local-sha1> <remote-ref> <remote-sha1>. If the hook does not terminate successfully, the push process is aborted.

post-rewrite: Is called by commands that rewrite commits (currently only git commit --amend and git rebase). Receives a list in the format <old-sha1> <new-sha1> on standard input.

post-checkout: Is called after a checkout. The first two parameters are the old and new reference to which HEAD points. The third parameter is a flag that indicates whether the branch was switched (1) or individual files were checked out (0).

post-merge: Will be executed if a merge was successfully completed. The hook gets a 1 as argument if the merge was a so called squash-merge, i.e. a merge that did not create a commit but only processed the files in the working tree.

pre-auto-gc: Is called before git gc --auto is executed. Prevents execution of the automatic garbage collection if the return value is not zero.
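To illustrate the input format of the pre-push hook described above, here is a minimal sketch of a local safety net. The policy (the remote master may only be updated from the local master) and the function name check_push are hypothetical; note that this rule would also refuse a deletion or a push such as HEAD:master.

```shell
# Hypothetical policy for a pre-push hook: the remote branch master may
# only be updated from the local branch master.
# Input lines: <local-ref> <local-sha1> <remote-ref> <remote-sha1>
check_push() {
    while read -r local_ref local_sha remote_ref remote_sha; do
        if [ "$remote_ref" = "refs/heads/master" ] &&
           [ "$local_ref" != "refs/heads/master" ]; then
            echo "pre-push: refusing to update $remote_ref from $local_ref" >&2
            return 1
        fi
    done
    return 0
}
```

In .git/hooks/pre-push the function would be called as check_push || exit 1, so that it reads the standard input of the hook.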

You can use hooks such as pre-commit, post-checkout and post-merge to teach Git "real" file permissions. This is necessary because a blob object reflects the contents of a file, but not its exact access rights. Instead, Git only knows "executable" or "non-executable".⁠^[110]

The script stored in the Git source directory under contrib/hooks/setgitperms.perl provides a ready-made solution that you can integrate into the above hooks. The script stores the real access rights in a .gitmeta file. If you read in the permissions (option -r) in the pre-commit hook and have the post-checkout and post-merge hooks write them back (option -w), the permissions of your files should be persistent. See the comments in the file for the exact commands.

The access rights are, however, only stable between checkouts: unless you check in the .gitmeta file and force the use of the hooks, clones of this repository will only get the "basic" access rights.

8.3. Writing Your Own Git Commands

Git follows the Unix philosophy of "one tool, one job" with its division into subcommands. It also divides the subcommands into two categories: Porcelain and Plumbing.

Porcelain refers to the "good porcelain" that is taken out of the cupboard for the end user: a tidy user interface and human-readable output. Plumbing commands, on the other hand, are mainly used for "plumbing work" in scripts and have a machine-readable output (usually line by line with unique separators).

In fact, a substantial part of the Porcelain commands are implemented as shell scripts. They use the various plumbing commands internally, but present a comprehensible interface to the outside. The commands rebase, am, bisect and stash are just a few examples.

It is therefore useful and easy to write your own shell scripts to automate frequently occurring tasks in your workflow. These could be scripts that control the release process of the software, create automatic changelogs or other operations tailored to the project.

Writing your own git command is very easy: You just have to place an executable file in a directory of your $PATH (e.g. in ~/bin) whose name starts with git-. If you type git <command> and <command> is neither an alias nor a known command, Git will simply try to run git-<command>.
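The mechanism can be tried out with a throwaway command. In the following sketch the command name hello is hypothetical, and a scratch directory created with mktemp is used instead of ~/bin so that nothing in your home directory is touched.

```shell
# Install a hypothetical command "git hello" into a scratch directory
# that is put at the front of $PATH (the text above uses ~/bin instead).
bindir=$(mktemp -d)
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello: $*"
EOF
chmod +x "$bindir/git-hello"
PATH="$bindir:$PATH"
export PATH
```

Afterwards, git hello world runs the script with world as its only argument.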

Even if you can write scripts in any language you like, we recommend using shell scripts: Not only are they easier to understand for outsiders, but above all, the typical operations used to combine Git commands - calling programs, redirecting output - are "intuitively" possible with the shell and do not require any complicated constructs, such as qx() in Perl or os.popen() in Python.

When writing shell scripts, please pay attention to POSIX compatibility!⁠^[111] This includes in particular not using "bashisms" like [[ … ]] (the POSIX equivalent is [ … ]). If your script does not run without problems with Dash⁠^[112], you should explicitly specify the shell used in the shebang line, e.g. via #!/bin/bash.

All scripts presented in the following section can also be found online, in the script collection for this book.⁠^[113]

8.3.1. Initialization

Typically, you want to ensure that your script is executed in a repository. For the necessary initialization tasks, Git offers git-sh-setup. You should include this shell script directly after the shebang line using . (known as source in interactive shells):

#!/bin/sh

. $(git --exec-path)/git-sh-setup

If Git cannot detect a repository, git-sh-setup will abort. The script will also abort if it is not run from the top level of a repository. In both cases your script will not be executed and an error message will be displayed. You can work around this behavior by setting the NONGIT_OK or SUBDIRECTORY_OK variable, respectively, before the call.

Beside this initialization mechanism there are some functions available, which do frequently occurring tasks. Below is an overview of the most important ones:

cd_to_toplevel: Switches to the top level of the Git repository.

say: Outputs the arguments, unless GIT_QUIET is set.

git_editor: Opens the editor set for Git on the specified files. It’s better to use this function than to call $EDITOR "blindly"; Git uses that variable only as a fallback.

git_pager: Opens the pager defined for Git.

require_work_tree: The function terminates with an error message if the repository has no working tree, as is the case with bare repositories. You should therefore call this function as a safeguard if you want to access files from the working tree.

8.3.2. Position in the Repository

In scripts you will often need the information from which directory the script was called. The Git command rev-parse offers some options for this. The following script, stored under ~/bin/git-whereami, illustrates how to "find your way" within a repository.

#!/bin/sh

SUBDIRECTORY_OK=Yes
. $(git --exec-path)/git-sh-setup

gitdir="$(git rev-parse --git-dir)"
absolute="$(git rev-parse --show-toplevel)"
relative="$(git rev-parse --show-cdup)"
prefix="$(git rev-parse --show-prefix)"

echo "gitdir    absolute    relative    prefix"
echo "$gitdir   $absolute   $relative   $prefix"

The output looks like this:

$ git whereami
gitdir          absolute    relative    prefix
.git            /tmp/repo
$ cd very/deep
$ git whereami
gitdir          absolute    relative    prefix
/tmp/repo/.git  /tmp/repo   ../../      very/deep/

Especially important is the prefix you get via --show-prefix. If your command accepts filenames and you want to find the blobs they correspond to in the object database, you must put this prefix in front of the filename. If you are in the very/deep directory and give the script the file name README, it will find the corresponding blob in the current tree via very/deep/README.

8.3.3. List References: rev-list

The core of the plumbing commands is git rev-list (revision list). Its basic function is to resolve one or more references to the SHA-1 sum(s) to which they correspond.

With a git log <ref1>..<ref2> you display the commit messages from <ref1> (exclusive) to <ref2> (inclusive). The git rev-list command resolves this range to the individual commits that are affected and prints them out line by line:

$ git rev-list master..topic
f4a6a973e38f9fac4b421181402be229786dbee9
bb8d8c12a4c9e769576f8ddeacb6eb4eedfa3751
c7c331668f544ac53de01bc2d5f5024dda7af283

So a script that operates on one or more commits can simply pass its arguments on to rev-list, in the same form that other Git commands understand. This way your script can even handle complicated expressions.

You can use the command, for example, to check whether a fast-forward from one branch to another is possible. A fast-forward from <ref1> to <ref2> is possible if Git can reach the commit marked by <ref1> in the commit graph of <ref2>. In other words, there is no commit reachable from <ref1> that can’t also be reached from <ref2>.

#!/bin/sh

SUBDIRECTORY_OK=Yes
. $(git --exec-path)/git-sh-setup

[ $# -eq 2 ] || { echo "usage: $(basename "$0") <ref1> <ref2>"; exit 1; }

# Both arguments must be resolvable by Git
for i in "$1" "$2"
do
    if ! git rev-parse --verify "$i" >/dev/null 2>&1 ; then
        echo "Ref '$i' does not exist!" && exit 1
    fi
done

# Commits reachable from one reference but not from the other
one_two=$(git rev-list "$1..$2")
two_one=$(git rev-list "$2..$1")

[ "$(git rev-parse "$1")" = "$(git rev-parse "$2")" ] \
&& echo "$1 and $2 point to the same commit!" && exit 2

[ -n "$one_two" ] && [ -z "$two_one" ] \
&& echo "FF from $1 to $2 possible!" && exit 0
[ -n "$two_one" ] && [ -z "$one_two" ] \
&& echo "FF from $2 to $1 possible!" && exit 0

echo "FF not possible! $1 and $2 have diverged!" && exit 3

The calls to rev-parse in the for loop check that the arguments are references that Git can resolve to a commit (or other database object) - if this fails, the script aborts with an error message.

The output of the script could look like this:

$ git check-ff topic master
FF from master to topic possible!
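If you only need a yes/no answer, the reachability test can also be delegated to Git itself: git merge-base --is-ancestor <a> <b> exits with status 0 if <a> can be reached from <b>. The following is a sketch, assuming a Git version that provides this option; the function name can_ff is hypothetical.

```shell
# Hypothetical helper: succeeds if a fast-forward from $1 to $2 is possible
can_ff() {
    # Exit status 0: $1 is an ancestor of (or identical to) $2
    git merge-base --is-ancestor "$1" "$2"
}
```

A script could then write, for example, can_ff master topic && echo "FF possible".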

For simple scripts, which expect only a limited number of options and arguments, a simple evaluation of these, as in the above script, is completely sufficient. However, if you are planning a more complex project, the so-called getopt mode of git rev-parse is recommended. This mode allows syntax analysis of command line options and offers a similar functionality as the C-library getopt. For details see the git-rev-parse(1) man page, section "Parseopt".

8.3.4. Finding Changes

You can tell git diff and git log to display information about the files that a commit has changed by using the --name-status option:

$ git log -1 --name-status 8c8674fc9
commit 8c8674fc954d8c4bc46f303a141f510ecf264fcd
...
M       git-pull.sh
M       t/t5520-pull.sh

Each name is preceded by one of five flags⁠^[114], which are shown in the list below:

A (added): File was added

D (deleted): File was deleted

M (modified): File was changed

C (copied): File was copied

R (renamed): File was renamed

The flags C and R are followed by a three-digit number indicating the percentage that has remained the same. So if you duplicate a file, this corresponds to the output C100. A file that is renamed and slightly modified in the same commit via git mv might show up as R094 - a 94% renaming.

$ git log -1 --name-status 0ecace728f
...
M       Makefile
R094    merge-index.c   builtin-merge-index.c
M       builtin.h
M       git.c
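In a script, this output can be processed line by line, since the fields are separated by tab characters. The following sketch splits the flag from the similarity value; the function name summarize_status and the output layout are our own choices.

```shell
# Hypothetical helper: read --name-status output on standard input and
# print a more verbose line for copies and renames.
summarize_status() {
    tab=$(printf '\t')
    while IFS=$tab read -r flag path newpath; do
        case $flag in
        [CR]*)
            # Split e.g. R094 into the flag R and the similarity 094
            printf '%s\n' "${flag%%[0-9]*} ${flag#?}% $path -> $newpath" ;;
        *)
            printf '%s\n' "$flag $path" ;;
        esac
    done
}
```

It could be used, for example, as git diff -M --name-status HEAD^ HEAD | summarize_status.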

You can use these flags to search for commits that have changed a specific file using diff filters. For example, if you want to find out who added a file and when, use the following command:

$ git log --pretty=format:'added by %an %ar' --diff-filter=A -- cache.h
added by Linus Torvalds 6 years ago

You can specify several flags to a diff filter directly after each other. The question "Who did most of the work on this file?" can often be answered by whose commits modified this file the most. This can be found out, for example, by doing the following:

$ git log --pretty=format:%an --diff-filter=M -- cache.h | \
  sort | uniq -c | sort -rn | head -n 5
    187 Junio C Hamano
    100 Linus Torvalds
     27 Johannes Schindelin
     26 Shawn O. Pearce
     24 Jeff King

8.3.5. The Object Database and rev-parse

The Git command rev-parse (revision parse) is an extremely flexible tool whose task is, among other things, to translate expressions describing commits or other objects of the object database into their complete SHA-1 sum. For example, the command converts abbreviated SHA-1 sums into the unique 40-character variant:

$ git rev-parse --verify be1ca37e5
be1ca37e540973bb1bc9b7cf5507f9f8d6bce415

The --verify option is passed to make Git print an appropriate error message if the passed reference is not a valid one.

However, the command can also abbreviate a SHA-1 sum with the --short option. The default is seven characters:

$ git rev-parse --verify --short be1ca37e540973bb1bc9b7cf5507f9f8d6bce415
be1ca37

If you want to find out the name of the branch that is currently checked out (as opposed to the commit ID), use git rev-parse --symbolic-full-name HEAD.

But rev-parse (and thus also all other Git commands that accept references as arguments) supports even more ways to reference objects.

<sha1>^{<type>}: Follows the reference <sha1> and resolves it to an object of type <type>. This way you can find the corresponding tree for a commit <commit> by specifying <commit>^{tree}. If you don’t specify an explicit type, the reference is resolved until Git finds an object that isn’t a tag (which is especially handy when you want to find the object a tag refers to).

Many git commands do not work on a commit, but on the trees that are referenced (e.g. the git diff command, which compares files, i.e. tree entries). In the man page, these arguments are called tree-ish. Git expects arbitrary references, which can be resolved to a tree, with which the command then continues to work.

<tree-ish>:<path>: Resolves the path <path> to the corresponding referenced tree or blob (corresponds to a directory or file). The referenced object is extracted from <tree-ish>, which can be a tag, a commit or a tree.

The following example illustrates how this special syntax works: The first command extracts the SHA-1 ID of the tree referenced by HEAD. The second command extracts the SHA-1 ID of the blob corresponding to the README file at the top level of the git repository. The third command then verifies that this really is a blob.

$ git rev-parse 'HEAD^{tree}'
89f156b00f35fe5c92ac75c9ccf51f043fe65dd9
$ git rev-parse 89f156b00f:README
67cfeb2016b24df1cb406c18145efd399f6a1792
$ git cat-file -t 67cfeb2016b
blob

A git show 67cfeb2016b would now show the actual contents of the blob. By redirecting with > you can extract the blob as a file to the file system.
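Combined with the prefix from --show-prefix, the <tree-ish>:<path> syntax also works from within a subdirectory. The following helper is only a sketch, and its name blob_of is hypothetical:

```shell
# Hypothetical helper: print the blob ID of a file, given relative to the
# current directory, as recorded in revision $1.
blob_of() {
    prefix=$(git rev-parse --show-prefix) || return 1
    git rev-parse --verify "$1:$prefix$2"
}
```

Called as blob_of HEAD README in the directory very/deep, it resolves HEAD:very/deep/README.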

The following script first finds the commit ID of the commit that last modified a particular file (the file is passed as the first argument, $1). Then the script extracts the file (with prefix, see above) from the predecessor of that commit ($ref^) and saves it in a temporary file.

Finally, Vim is called in diff mode on the temporary file and the current version, and then the temporary file is deleted.

#!/bin/sh

SUBDIRECTORY_OK=Yes
. $(git --exec-path)/git-sh-setup

[ -z "$1" ] && echo "usage: $(basename "$0") <file>" && exit 1
# Commit that last modified the file
ref="$(git log --pretty=format:%H --diff-filter=M -1 -- "$1")"
git rev-parse --verify "$ref" >/dev/null || exit 1

prefix="$(git rev-parse --show-prefix)"
temp="$(mktemp ".diff.$ref.XXXXXX")"
# Extract the file as it was in the predecessor of that commit
git show "$ref^:$prefix$1" > "$temp"

vim -f -d "$temp" "$1"
rm "$temp"

To resolve a lot of references with rev-parse, you should do this in one program call: rev-parse will print one line for each reference. With dozens or even hundreds of references, the single call is resource-saving and therefore faster.

8.3.6. Iterating References: for-each-ref

A common task is to iterate over references. For this, Git provides the general-purpose command for-each-ref. The common syntax is git for-each-ref --format=<format> <pattern>. You can use the pattern to restrict the references to be iterated over, e.g. refs/heads or refs/tags. With the format expression you specify which properties of the reference should be output. It consists of different fields %(fieldname), which are expanded to the corresponding values in the output.

refname: Name of the reference, e.g. refs/heads/master. The addition :short shows the short form, i.e. master.

objecttype: Type of object (blob, tree, commit or tag)

objectsize: Object size in bytes

objectname: Commit ID or SHA-1 sum

upstream: Remote-tracking branch of the upstream branch

Here is a simple example of how to display all SHA-1 sums of the release candidates of version 1.7.1:

$ git for-each-ref --format='%(objectname)--%(objecttype)--%(refname:short)' \
  refs/tags/v1.7.1-rc*
bdf533f9b47dc58ac452a4cc92c81dc0b2f5304f--tag--v1.7.1-rc0
d34cb027c31d8a80c5dbbf74272ecd07001952e6--tag--v1.7.1-rc1
03c5bd5315930d8d88d0c6b521e998041a13bb26--tag--v1.7.1-rc2

Note that the separators "--" are copied verbatim into the output, so you can add arbitrary characters for formatting.

Depending on the object type, other field names are also available. For a tag, for example, there is the tagger field, which contains the tag author, their e-mail address and the date. In addition, the fields taggername, taggeremail and taggerdate are available, containing only the name, the e-mail address and the date, respectively.

For example, if you want to know for a project who ever created a tag:

$ git for-each-ref --format='%(taggername)' refs/tags | sort -u
Junio C Hamano
Linus Torvalds
Pat Thoyts
Shawn O. Pearce

As a further interface for scripting languages, the options --shell, --python, --perl and --tcl are offered. With them, the fields are formatted as string literals of the respective language, so that the output can be evaluated via eval and translated into variables:

$ git for-each-ref --shell --format='ref=%(refname)' refs/tags/v1.7.1.*
ref='refs/tags/v1.7.1.1'
ref='refs/tags/v1.7.1.2'
ref='refs/tags/v1.7.1.3'
ref='refs/tags/v1.7.1.4'

This can be used to write the following script, which prints a summary of all branches that have an upstream branch - including SHA-1 sum of the most recent commit, its author, and tracking status. The output is very similar to git branch -vv, but a bit more readable. The authorname field contains the name of the commit author, similar to taggername. The core is the eval "$data" statement, which translates the line-by-line output of for-each-ref into the variables used later.

#!/bin/sh
SUBDIRECTORY_OK=Yes
. $(git --exec-path)/git-sh-setup

git for-each-ref --shell --format=\
"refname=%(refname:short) "\
"author=%(authorname) "\
"sha1=%(objectname) "\
"upstream=%(upstream:short)" \
refs/heads | while read -r data
do
    eval "$data"
    if [ -n "$upstream" ] ; then
        ahead=$(git rev-list "$upstream..$refname" | wc -l)
        behind=$(git rev-list "$refname..$upstream" | wc -l)
        echo "$refname"
        echo --------------------
        echo "    Upstream:    $upstream"
        echo "    Last author: $author"
        echo "    Commit-ID    $(git rev-parse --short "$sha1")"
        # printf instead of the non-POSIX echo -n
        printf "    Status:      "
        [ $ahead  -gt 0 ] && printf "ahead:%d " $ahead
        [ $behind -gt 0 ] && printf "behind:%d " $behind
        [ $behind -eq 0 ] && [ $ahead -eq 0 ] && printf "in sync!"
        echo
    fi
done

The output will look like this:

$ git tstatus
maint
--------------------
    Upstream:    origin/maint
    Last author: João Britto
    Commit-ID    4c007ae
    Status:      in sync!
master
--------------------
    Upstream:    origin/master
    Last author: Junio C Hamano
    Commit-ID    4e3aa87
    Status:      in sync!
next
--------------------
    Upstream:    origin/next
    Last author: Junio C Hamano
    Commit-ID    711ff78
    Status:      behind:22
pu
--------------------
    Upstream:    origin/pu
    Last author: Junio C Hamano
    Commit-ID    dba0393
    Status:      ahead:43 behind:126

The other field names as well as examples can be found in the git-for-each-ref(1) man page.

8.3.7. Rewrite References: git update-ref

If you use for-each-ref, you usually want to edit references as well - therefore the update-ref command should be mentioned. With it you can create references and safely change or delete them. Basically git update-ref works with two or three arguments:

git update-ref <ref> <new-value> [<oldvalue>]

Here is an example that moves master to HEAD^ if it currently points to HEAD:

$ git update-ref refs/heads/master HEAD^ HEAD

Or to create a new reference topic at ea0ccd3:

$ git update-ref refs/heads/topic ea0ccd3

To delete references there is the option -d:

git update-ref -d <ref> [<oldvalue>]

For example to delete the reference topic again:

$ git update-ref -d refs/heads/topic ea0ccd3

Of course, you could also manipulate the references with commands like echo <sha> > .git/refs/heads/<ref>, but update-ref brings various safeguards and helps to minimize possible damage. The addition <oldvalue> is optional, but helps to avoid programming errors. It also takes care of special cases (symlinks whose target is inside or outside the repository, references pointing to other references, etc.). An additional advantage is that git update-ref automatically makes entries in the reflog, which makes troubleshooting much easier.
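The protective effect of <oldvalue> can be wrapped in a small helper. This is a sketch only: the function name move_branch_if and the reflog message are hypothetical, while the -m option for the reflog entry is part of git update-ref.

```shell
# Hypothetical helper: move branch $1 to commit $2, but only if the branch
# still points to $3 (a compare-and-swap on the reference).
move_branch_if() {
    git update-ref -m "move_branch_if: moved by script" \
        "refs/heads/$1" "$2" "$3"
}
```

If the branch has been changed by someone else in the meantime, git update-ref refuses the update and the function returns a non-zero status, so the script can react accordingly.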

8.3.8. Extended Aliases

If all you need is a one-liner, it is usually not worthwhile to create your own script. Git aliases were developed for this use case. For example, it is possible to call external programs by prefixing them with an exclamation mark, for example to simply call gitk --all with git k:

$ git config --global alias.k '!gitk --all'

Another example, which deletes all branches already merged and uses a concatenation of commands for this is:

prune-local = !git branch --merged | grep -v ^* | xargs git branch -d

With certain constructs, you may want to rearrange the arguments passed to the alias or use them within a command chain. The following trick is suitable for this, where a shell function is built into the alias:

$ git config --global alias.demo '!f(){ echo $2 $1 ; }; f'
$ git demo foo bar
bar foo

This allows even more complex one-liners to be defined elegantly as aliases. The following construction determines, for a given file, which authors made how many commits in which the file was changed. If you send patches to the Git project’s mailing list, you are asked to send the mail via CC to the main authors of the files you changed. Use this alias to find out who they are.

who-signed = "!f(){ git log -- $1 | \
    grep Signed-off-by | sort | uniq --count | \
    sort --human-numeric-sort --reverse |\
    sed 's/Signed-off-by: / /' | head ; } ; f "

There are some things to consider here: An alias is always executed from the toplevel directory of the repository, so the argument must contain the path inside the repository. The alias also relies on the fact that all people involved have signed off on the commit with a Signed-off-by line, because these lines are used to generate the statistics. Since the alias is spread over several lines, it must be enclosed in quotes, otherwise Git cannot interpret the alias correctly. The final call to head limits the output to the top ten authors:

$ git who-signed Documentation/git-svn.txt
     46      Junio C Hamano <gitster@pobox.com>
     30      Eric Wong <normalperson@yhbt.net>
     27      Junio C Hamano <junkio@cox.net>
      5      Jonathan Nieder <jrnieder@uchicago.edu>
      4      Yann Dirson <ydirson@altern.org>
      4      Shawn O. Pearce <spearce@spearce.org>
      3      Wesley J. Landaker <wjl@icecavern.net>
      3      Valentin Haenel <valentin.haenel@gmx.de>
      3      Ben Jackson <ben@ben.com>
      3      Adam Roben <aroben@apple.com>

Further interesting ideas and suggestions can be found in the Git-Wiki on the page about aliases.⁠^[115]

8.4. Rewriting Version History

The previously introduced git rebase command and its interactive mode allow developers to edit commits at will. Code that is still in development can be "cleaned up" before it is integrated (e.g. via merge) and thus permanently merged with the software.

But what if all commits are to be changed afterwards, or at least a large part of them? Such requirements arise, for example, when a previously private project is to be published, but sensitive data (keys, certificates, passwords) are included in the commits.

Git offers the filter-branch command to automate this task. Basically, it works like this: You specify a set of references that Git should rewrite. You also define commands that are responsible for modifying the commit message, tree contents, commits, etc. Git goes through each commit and applies the appropriate filter to the appropriate part. The filters are executed via eval in the shell, so they can be complete commands or names of scripts. The following list describes the filters that Git offers:

--env-filter: Can be used to adjust the environment variables under which the commit is rewritten. Especially the variables GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL,DATE} can be exported with new values if needed.

--tree-filter: Creates a checkout for each commit to be rewritten, changes to that directory and executes the filter. Afterwards, new files are automatically added, files that no longer exist are removed, and all changes are applied.

--index-filter: Manipulates the index. Behaves similarly to the tree filter, except that Git doesn’t create a checkout, which makes the index filter faster.

--msg-filter: Receives the commit message on standard input and prints the new message on standard output.

--commit-filter: Is called instead of git commit-tree and can thus in principle make several commits from one. See the man page for details.

--tag-name-filter: Is called for all tag names that point to a commit that has been rewritten. If you use cat as the filter, the tags keep their names and are moved to the rewritten commits.

--subdirectory-filter: Only considers the commits that modify the specified directory. The rewritten history will contain only this directory, as the topmost directory in the repository.

The general syntax of the command is: git filter-branch <filter> -- <references>. Here <references> is an argument for rev-parse, so it can be one or more branch names, a range of the form <ref1>..<ref2>, or simply --all for all references. Note the double hyphen --, which separates the arguments for filter-branch from those for rev-parse!

As soon as one of the filters does not end with the return value zero on a commit, the whole rewrite process will abort. So be careful to catch possible error messages or ignore them by appending || true.

The original references are backed up under refs/original/, so when you rewrite the master branch, refs/original/refs/heads/master still points to the original, unrewritten commit (and, through it, to its predecessors). If this backup reference already exists, the filter-branch command will refuse to rewrite the reference unless you specify the -f (force) option.
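The interplay of a filter, the reference list and the backup reference can be tried out safely in a throwaway repository. The following is only a sketch: the repository, the identity "Demo User" and the message prefix are invented for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that a recent Git version is used, which otherwise pauses to print a warning.

```shell
# Throwaway repository; all names and contents are invented for this sketch.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause of recent Git versions
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git commit -q --allow-empty -m 'first commit'
old=$(git rev-parse master)

# The message filter reads the old message on stdin and writes the new one to stdout.
git filter-branch --msg-filter 'sed "1s/^/[imported] /"' -- master >/dev/null

new_subject=$(git log -1 --format=%s master)
backup=$(git rev-parse refs/original/refs/heads/master)
echo "new subject: $new_subject"
echo "backup reference still points to: $backup"
```

Running the same filter-branch call a second time without -f should be refused, because the backup reference already exists.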

You should always do your filter-branch experiments in a fresh clone. The chance of causing damage by unfortunate typos is not insignificant. However, if you like the result, you can easily make the new repository the main repository and set the old one aside as a backup.

The following examples deal with some typical use cases of the filter-branch command.

8.4.1. Removing Sensitive Information Afterwards

Ideally, sensitive data such as keys, certificates or passwords are not part of a repository. Large binary files or other junk data also inflate the size of the repository unnecessarily.

Open source software, the use of which is permitted, but the distribution of which is prohibited by license terms ('no distribution'), may of course not appear in a repository that you make available to the public.

In all these cases you can rewrite the project history so that nobody can find out that the corresponding data ever appeared in the version history of the project.

If you are working with git tags, it is always a good idea to pass the --tag-name-filter cat argument as well, so that tags pointing to commits to be rewritten will also point to the new version.

To delete only some files or subdirectories from the entire project history, use a simple index filter. All you have to do is tell Git to remove the corresponding entries from the index:

$ git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch <file>' \
  --prune-empty -- --all

The --cached and --ignore-unmatch arguments tell git rm to remove only the index entry, and not to abort with an error if the corresponding entry does not exist (e.g. because the file was not added until a particular commit). If you want to delete directories, you must also specify -r.

The argument --prune-empty makes sure that commits which do not change the tree after applying the filter are omitted. So if you have added a certificate with a commit, and this commit becomes an "empty" commit by removing the certificate, Git will omit it altogether.
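To see the effect of the index filter together with --prune-empty, you can replay the situation in a scratch repository. This is a minimal sketch under stated assumptions: the file names (README, secret.key), the commit messages and the identity are made up, and FILTER_BRANCH_SQUELCH_WARNING only suppresses the warning pause of recent Git versions.

```shell
# Throwaway repository; file names and messages are invented for this sketch.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
echo 'hello' > README
git add README && git commit -q -m 'add README'
echo 'dummy key material' > secret.key
git add secret.key && git commit -q -m 'add key'     # touches only the key
echo 'more text' >> README
git commit -q -a -m 'extend README'

# Remove the index entry in every commit; drop commits that become empty.
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch secret.key' \
  --prune-empty -- --all >/dev/null

commits=$(git rev-list --count master)                 # commits left on master
leftover=$(git rev-list --count master -- secret.key)  # commits still touching the key
echo "commits on master: $commits, commits touching secret.key: $leftover"
```

The check is limited to master on purpose: the backup reference under refs/original/ still contains the old commits until it is deleted.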

Similar to the command above, you can also move files or directories with git mv. If the operations are a bit more complex, you should consider designing several simple filters and calling them one after the other.

It is possible that a file you want to delete had a different name in the past. To check this, use the command git log --name-status --follow -- <file> to detect possible renames.

8.4.1.1. Removing Strings from Files

If you don’t want to change whole files, but only certain lines in all commits, a filter at index level is not sufficient. You must use a tree filter.

For each commit, Git will check out the relevant tree, change to the appropriate directory, and then run the filter. Any changes you make will be applied (without you having to use git add etc.).

To erase the password v3rYs3cr1T from all files and commits, the following commands are required:

$ git filter-branch --tree-filter 'git ls-files -z | \
  xargs -0 -n 1 sed -i "s/v3rYs3cr1T/PASSWORD/g" \
  2>/dev/null || true' -- master
Rewrite cbddbd3505086b79dc3b6bd92ac9f811c8a6f4d1 (142/142)
Ref 'refs/heads/master' was rewritten

The command performs an in-place replacement with sed on every file in the repository. Error messages are suppressed, and they do not cause the filter-branch call to abort.

After the references have been rewritten, you can use the pickaxe tool (-G<expression>, see Sec. 2.1.6, “Examining the Project History”) to verify that no commit really introduces the string v3rYs3cr1T anymore:

$ git log -p -G"v3rYs3cr1T"
# should not produce any output

Tree filters must check out the appropriate tree for each commit. This creates a considerable overhead for many commits and many files, so a filter-branch call can take a long time.

By specifying -d <path> you can instruct the command to check out the tree to <path> instead of .git-rewrite/. If you use a tmpfs here (especially /dev/shm or /tmp), the files are only held in memory, which can speed up the command call by several orders of magnitude.
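The following sketch combines the tree filter, the -d option and the pickaxe check in a scratch repository. It rests on several assumptions: GNU sed (for -i without a suffix), /dev/shm as a memory-backed directory where available (with a fallback to the regular temporary directory), and invented file contents. Note that the directory passed to -d must not exist yet.

```shell
# Throwaway repository; all names and contents are invented for this sketch.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause of recent Git versions
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
echo 'password = v3rYs3cr1T' > config.ini
git add config.ini && git commit -q -m 'add config'
echo 'timeout = 30' >> config.ini
git commit -q -a -m 'add timeout'

# Prefer a memory-backed directory if one is available (assumption: /dev/shm).
if [ -d /dev/shm ] && [ -w /dev/shm ]; then base=/dev/shm; else base=${TMPDIR:-/tmp}; fi
scratch=$(mktemp -d "$base/filter-branch.XXXXXX")

# filter-branch creates "$scratch/rewrite" itself and checks out each tree there.
git filter-branch -d "$scratch/rewrite" --tree-filter '
  git ls-files -z | xargs -0 -n 1 sed -i "s/v3rYs3cr1T/PASSWORD/g" 2>/dev/null || true
' -- master >/dev/null
rm -rf "$scratch"

first_line=$(git show master:config.ini | head -n 1)
hits=$(git log --format=%H -G'v3rYs3cr1T' master | wc -l | tr -d ' ')
echo "first line now: $first_line"
echo "commits on master that still introduce the password: $hits"
```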

8.4.1.2. Renaming a Developer

If you want to rename a developer, you can do this by changing the variable GIT_AUTHOR_NAME in an environment filter. For example like this:

$ git filter-branch -f --env-filter \
  'if [ "$GIT_AUTHOR_NAME" = "Julius Plenz" ];
  then export GIT_AUTHOR_NAME="Julius Foobar"; fi' -- master
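The filter above only touches the author name. If the committer fields and the e-mail address are to be changed as well, the same idea can be extended. The following is a sketch in a scratch repository; both identities ("Old Name", "New Name") and their addresses are hypothetical.

```shell
# Throwaway repository; both identities are invented for this sketch.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause of recent Git versions
export GIT_AUTHOR_NAME='Old Name' GIT_AUTHOR_EMAIL='old@example.com'
export GIT_COMMITTER_NAME='Old Name' GIT_COMMITTER_EMAIL='old@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'work done under the old identity'

# Match on the e-mail address and replace author and committer separately.
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_EMAIL" = "old@example.com" ]; then
    export GIT_AUTHOR_NAME="New Name" GIT_AUTHOR_EMAIL="new@example.com"
fi
if [ "$GIT_COMMITTER_EMAIL" = "old@example.com" ]; then
    export GIT_COMMITTER_NAME="New Name" GIT_COMMITTER_EMAIL="new@example.com"
fi' -- master >/dev/null

result=$(git log -1 --format='%an <%ae> / %cn <%ce>' master)
echo "$result"
```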

8.4.2. Extracting a Subdirectory

The subdirectory filter allows you to rewrite the commits so that a subdirectory of the current repository becomes the new top-level directory. All other directories and the former top-level directory are dropped. Commits that have not changed anything in the new top-level directory are also dropped.

In this way, you can, for example, extract the version history of a library from a larger project. The exchange between the outsourced project and the base project can work via submodules or subtree-merges (see Sec. 5.11, “Managing Subprojects”).

To split the directory t/ (containing the test suite) from the git source repository, the following command is sufficient:

$ git filter-branch --subdirectory-filter t -- master
Rewrite 2071fb015bc673d2514142d7614b56a37b3faaf2 (5252/5252)
Ref 'refs/heads/master' was rewritten

Attention: This command runs for several minutes.
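On a small scale, the behavior can be reproduced within seconds. The sketch below assumes an invented project with two directories, lib/ and src/, and extracts lib/; all file names and contents are hypothetical.

```shell
# Throwaway repository; directory and file names are invented for this sketch.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause of recent Git versions
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
mkdir lib src
echo 'int answer(void) { return 42; }' > lib/answer.c
echo 'int main(void) { return 0; }' > src/main.c
git add . && git commit -q -m 'initial import'
echo '/* only src changes here */' >> src/main.c
git commit -q -a -m 'touch src only'
echo '/* only lib changes here */' >> lib/answer.c
git commit -q -a -m 'touch lib'

# lib/ becomes the new top-level directory; commits outside lib/ are dropped.
git filter-branch --subdirectory-filter lib -- master >/dev/null

files=$(git ls-tree -r --name-only master)
commits=$(git rev-list --count master)
echo "files: $files"
echo "commits: $commits"
```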

8.4.3. Grafts: Subsequent Merges

Git provides a way to simulate merges via so-called graft points or grafts (to graft: to join a shoot onto another plant). Such grafts are stored line by line in the file .git/info/grafts and have the following format:

commit [parent1 [parent2 ...]]

In addition to the information that Git gets from the commit metadata, you can also specify one or more parents for any commits.⁠^[116]

Make sure to still treat the repository as a DAG and not to close any cycles: Do not define HEAD as the predecessor of the root commit! The grafts file is not part of the repository, so git clone does not copy this information; it merely helps Git find a merge base. However, when filter-branch is called, this graft information is hard-coded into the commits.

This is especially useful in two cases: If you import an old version history from a tool that cannot handle merges correctly (e.g. older Subversion versions), or if you want to "glue" two version histories together.

Let’s assume the development was switched to Git. But nobody has taken care of converting the old version history. So the new repository was started with an initial commit that reflected the state of the project at that time.

Meanwhile, you’ve successfully converted the old version history to Git, and now you want to append it before the initial commit (or instead). To do this, proceed as follows:

$ cd <new-repository>
$ git fetch <old-repository> master:old-master
... importing converted commits ...

You now have a multi-root repository. You then need to find the initial commit of the new repository ($old_root) and define the latest commit of the old, converted repository ($old_tip) as its predecessor:

$ old_root=`git rev-list --reverse master | head -n 1`
$ old_tip=`git rev-parse old-master`
$ echo $old_root $old_tip > .git/info/grafts

Look at the result with Gitk or a similar program. If you are satisfied, you can make the grafts permanent (all commits starting at $old_tip are rewritten). To do this, call git filter-branch without specifying any filters:

$ git filter-branch -- $old_tip..
Rewrite 1591ed7dbb3a683b9bf1d880d7a6ef5d252fc0a0 (1532/1532)
Ref 'refs/heads/master' was rewritten
$ rm .git/info/grafts

Of course you also have to delete the remaining backup references (see below).

8.4.4. Deleting Old Commits

After you have removed any sensitive data from all commits, you still need to make sure that these old commits do not reappear. In the repository you rewrote, this is done in three steps:

Delete the backup references under refs/original/.

You can do this with the following command:

$ git for-each-ref --format='%(refname)' -- 'refs/original/' | \
  xargs -n 1 git update-ref -d

If you have not yet rewritten or deleted old tags or other branches, you must of course do this first.

Delete the Reflog:

$ git reflog expire --verbose --expire=now --all

Delete the (orphaned) commits that are no longer reachable.
The best way to do this is the gc option --prune, which sets how long a commit must have been unreachable before it is deleted. In this case: now.
```
$ git gc --prune=now
```
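The three steps can be checked end to end in a scratch repository: after the cleanup, the old commit object should no longer exist. The following sketch uses invented file names and an invented identity; git cat-file -e merely tests whether an object is present.

```shell
# Throwaway repository; file names and messages are invented for this sketch.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause of recent Git versions
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
echo 'hello' > README
git add README && git commit -q -m 'add README'
echo 'dummy key material' > secret.key
git add secret.key && git commit -q -m 'add key'
old_tip=$(git rev-parse master)          # the commit that must disappear

git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch secret.key' \
  --prune-empty -- --all >/dev/null

# Step 1: delete the backup references.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
# Step 2: empty the reflog.
git reflog expire --expire=now --all
# Step 3: remove the unreachable objects.
git gc -q --prune=now

if git cat-file -e "$old_tip" 2>/dev/null; then old_present=yes; else old_present=no; fi
echo "old commit still present: $old_present"
```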

If other developers are working with an outdated version of the repository, they must now "migrate". It is essential that they do not use their development branches to pull old commits back into the cleaned up repository.

The best way to do this is to clone the new repository, fetch important branches from the old repository using git fetch, and rebase directly on the new commits. You can then dispose of the old commits using git gc --prune=now.

9. Interacting with Other Version Control Systems

Git has interfaces to other version control systems, which are important for two basic use cases:

Bidirectional communication: You want to develop locally in a Git repository, but also transfer the changes to an external repository or import changes from there to Git.
Migration: You want to import the version history stored in an existing repository of another system into Git.

Git offers the following interfaces — all of which allow two-way communication and complete conversion:

Subversion (svn): The git-svn tool provides all the essential subcommands for dealing with Subversion repositories and is discussed in detail in this chapter. The program is implemented in Perl and uses the Perl bindings for Git and Subversion. It is maintained together with the Git sources in the git.git repository (stored as git-svn.perl). Note: The tool is called git-svn, but is invoked as usual with git svn <command>. The technical documentation is available in the git-svn(1) man page.
Concurrent Versions System (cvs): The git cvsimport command imports and synchronizes a CVS repository; its counterpart is git cvsexportcommit.
Perforce (p4): With git p4 you address repositories of the proprietary Perforce system.

For interaction with other VCSs there are also many additional tools and scripts that improve, extend and partly replace the commands mentioned. Interfaces to further version control systems, such as Mercurial, are available as well. If the commands and recipes described in this chapter are not sufficient, an internet search is worthwhile. As a first starting point we recommend the Git-Wiki.^[117]

In addition to its immediate communication capabilities with other systems, Git has its own simple plain-text protocol that lets you translate the version history from any system in such a way that Git creates a repository from it. For a detailed description including an example, see Sec. 9.2, “Custom Importers” about Fast Import.

9.1. Subversion

The following is about how to use git-svn. We’ll show you how to convert Subversion repositories and how to use it to exchange changes between a Subversion repository and Git.

9.1.1. Conversion

The goal is to transfer the version history from a Subversion repository to a Git repository. Before you start, you will need to make preparations that may take some time, depending on the size of your project. However, good preparation helps you to avoid mistakes from the start.

9.1.1.1. Preparation

You should have the following information at hand:

Who are the authors? What are their e-mail addresses?
How is the repository structured? Are there branches and tags?
Should metadata about the Subversion revision be stored in the git commits?

Later, you will run the command git svn clone. The answers to the above questions will determine which options and arguments you use to do this.

Our experience has shown that a single conversion attempt is rarely sufficient. If the Subversion repository is not already local, it’s definitely worth making a local copy of it, so you don’t have to download the revisions over the network on a second attempt. You can use rsvndump, for example, to do this.^[118]

Subversion uses less extensive author metadata than Git; revisions are simply marked with a Subversion username, and there is no difference between the author and committer of a revision. In order for git-svn to convert Subversion usernames to the full names with email addresses typical of Git, a so-called authors file is required:

jplenz  = Julius Plenz <julius@plenz.com>
vhaenel = Valentin Haenel <valentin.haenel@gmx.de>

The file, e.g. authors.txt, is later passed to git-svn via --authors-file= or -A.

The following one-liner determines all Subversion usernames and helps you to create the file:

$ svn log --xml | grep author | sed 's_^.*>\(.*\)<.*$_\1_' | \
  sort --unique

If you do not specify an authors file when converting (or if an author is missing), git-svn will use the Subversion username as the author. The e-mail address is composed of the Subversion username and the UUID of the Subversion repository.
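The list of usernames can be turned directly into a skeleton for the authors file, which you then complete by hand. The sketch below is self-contained: instead of calling svn log --xml it uses a small invented XML sample, and the full names and the example.com addresses it writes are placeholders, not real data.

```shell
# Stand-in for the output of `svn log --xml`; revisions and messages are invented.
sample_log='<log>
<logentry revision="3">
<author>jplenz</author>
<msg>third</msg>
</logentry>
<logentry revision="2">
<author>vhaenel</author>
<msg>second</msg>
</logentry>
<logentry revision="1">
<author>jplenz</author>
<msg>first</msg>
</logentry>
</log>'

# Same pipeline as the one-liner above, followed by a loop that writes
# one "username = Name <email>" line per user with placeholder values.
authors=$(printf '%s\n' "$sample_log" \
  | grep author \
  | sed 's_^.*>\(.*\)<.*$_\1_' \
  | sort -u \
  | while read -r user; do
      printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done)
printf '%s\n' "$authors"
```

In a real conversion you would replace the printf of the sample with svn log --xml, redirect the result to authors.txt, and then fill in the actual names and addresses.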

Find out how the repository is structured in the next step. The following questions will help:

Does the repository have a so-called trunk (main development thread), branches and tags?
1. If so, is the default Subversion layout (trunk/, branches/, tags/) used?
2. If not, in which directories are trunk, branches and tags then?
Are only a single or multiple projects managed in the repository?

If the project follows the Subversion standard layout (Figure 47, “Standard Subversion layout”), use the argument --stdlayout or -s for short.

Figure 47. Standard Subversion layout

9.1.1.2. SVN Metadata

The --no-metadata argument prevents additional metadata from being included in the commit message. To what extent this makes sense for your use case is up to you to decide. From a technical standpoint, metadata is only necessary if you want to continue to interact with the Subversion repository. However, it may also be helpful to preserve the metadata, for example if you use the Subversion revision number in your bug tracking system.

The SVN metadata appears in the last line of each commit message and takes the following form:

git-svn-id: <URL>@<Revision> <UUID>

<URL> is the URL of the Subversion repository, <Revision> is the Subversion revision, and <UUID> (Universally Unique Identifier) is a sort of “fingerprint” of the Subversion repository. For example:

git-svn-id: file:///demo/trunk@8 2423f1c7-8de6-44f9-ab07-c0d4e8840b78
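If you need the individual fields, for example to look up a revision number from a bug tracker, the line can be taken apart with plain shell parameter expansion. This is a small sketch that works on the example line above; the variable names are chosen freely.

```shell
# The example line from above; in practice it would come from a commit message.
line='git-svn-id: file:///demo/trunk@8 2423f1c7-8de6-44f9-ab07-c0d4e8840b78'

rest=${line#git-svn-id: }   # strip the fixed prefix
uuid=${rest##* }            # everything after the last space
url_rev=${rest% *}          # everything before the last space
url=${url_rev%@*}           # part before the last @
rev=${url_rev##*@}          # part after the last @

echo "URL:      $url"
echo "Revision: $rev"
echo "UUID:     $uuid"
```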

9.1.1.3. Specifying a Username

How you specify the user name depends on the transport protocol. For those where Subversion handles authentication (e.g. http, https, and svn), use the --username option. For others (svn+ssh), you must specify the username as part of the URL, for example, svn+ssh://USER@svn.example.com.

9.1.1.4. Converting Standard Layouts

You can convert an SVN repository in standard layout with the following call (after you have created an Authors file):

$ git svn clone <http://svn.example.com/> -s -A <authors.txt> \
    --no-metadata <project-converted>

9.1.1.5. Non-Standard Layouts

If the repository is not laid out according to the Subversion standard layout, adjust the call to git svn accordingly: Instead of --stdlayout, explicitly specify the trunk with --trunk or -T, the branches with --branches or -b, and the tags with --tags or -t — if, for example, several projects are managed in one Subversion repository (Figure 48, “Non-Standard Layout”).

Figure 48. Non-Standard Layout

To convert project1, the call would be as follows:⁠^[119]

$ git svn clone <http://svn.example.com/> -T trunk/project1 \
  -b branches/project1 -t tags/project1 \
  -A <authors.txt> <project1-converted>

An SVN repository without branches or tags can simply be cloned by using the URL of the project directory and omitting --stdlayout entirely:

$ git svn clone <http://svn.example.com/projekt> -A authors.txt \
    --no-metadata <project-converted>

If several independent projects are managed in one repository, we recommend that you create a separate Git repository for each project. Unlike Subversion, Git is not suitable for managing multiple projects in one repository. The object model means that the development histories (commit graphs) would become inextricably linked. How to “link” projects from different Git repositories is described in Sec. 5.11, “Managing Subprojects”.

9.1.1.6. Postprocessing

Once git svn clone has run, you’ll usually need to do a bit of rework on the repository.

During conversion, git-svn ignores all Subversion properties except svn:executable. If the Subversion repository uses svn:ignore properties to exclude files, you can translate them into one or (recursively) several .gitignore files:

$ git svn create-ignore

The .gitignore files are only created and added to the index — you still have to check them in.

Git creates special Git branches under remotes/origin for the Subversion trunk and the Subversion branches and tags. They are very similar to remote tracking branches in that they reflect the state of the Subversion repository; they are Subversion tracking branches, so to speak. They are mainly used for bidirectional communication and are updated when synchronizing with the Subversion repository. However, if you only want to convert the repository, these branches are of no further use and should be rewritten to “real” Git branches (see below).

A Subversion tracking branch is created for the trunk and for each Subversion branch,⁠^[120] and for each Subversion tag a Subversion tracking branch is also created (no git tag, see below), but under remotes/origin/tags.

Assume that the Subversion repository has the following Subversion branches and tags:

Figure 49. Example Subversion branches and tags

In this case git svn creates the following git branches:

Figure 50. Converted Git Branches

You can adjust the prefix with the option --prefix=. For example, with the --prefix=svn/ statement, all converted references are stored under remotes/svn/ instead of remotes/origin.

As already mentioned, git-svn does not create Git tags for Subversion tags. This is because, from a technical point of view, Subversion tags are hardly different from Subversion branches. They are also created with svn copy and, unlike Git tags, can be changed afterwards. To be able to track such updates, Subversion tags are therefore also represented as Subversion tracking branches. Like the Subversion branches, they are of no use (but rather cause confusion) in a converted repository, and should be converted to real Git tags.

If you want to keep the Subversion branches and tags, you should translate the Subversion tracking branches into local Git branches or lightweight Git tags. The following shell script git-convert-refs will help you in the first step:⁠^[121]

#!/bin/sh

. $(git --exec-path)/git-sh-setup
svn_prefix='svn/'

convert_ref(){
  echo -n "converting: $1 to: $2 ..."
  git update-ref $2 $1
  git update-ref -d $1
  echo "done"
}

get_refs(){
  git for-each-ref $1 --format='%(refname)'
}

echo 'Converting svn tags'
get_refs refs/remotes/${svn_prefix}tags | while read svn_tag
do
  new_ref=$(echo $svn_tag | sed -e "s|remotes/$svn_prefix||")
  convert_ref $svn_tag $new_ref
done

echo "Converting svn branches"
get_refs refs/remotes/${svn_prefix} | while read svn_branch
do
  new_ref=$(echo $svn_branch | sed -e "s|remotes/$svn_prefix|heads/|")
  convert_ref $svn_branch $new_ref
done

The script assumes that the repository was converted with the --prefix=svn/ option. The two while loops do the following:

A git tag is created for each Subversion tracking branch that corresponds to a Subversion tag (e.g. refs/remotes/svn/tags/v1.0 → refs/tags/v1.0).
For each Subversion tracking branch that corresponds to a Subversion branch, a “real” local Git branch is created (e.g. refs/remotes/svn/bugfix → refs/heads/bugfix)

The script uses the plumbing commands git for-each-ref, which prints references matching the given expression line by line, and git update-ref, which rewrites and deletes references.⁠^[122]

See Figure 51, “Converted branches and tags before translation” and Figure 52, “Converted branches and tags after translation” to see how the script works. In the Subversion repository there is a trunk, a branch feature and the v1.0 tag. git-svn creates three branches under remotes/svn during the conversion process, as described above. The script git-convert-refs finally translates remotes/svn/trunk → trunk, remotes/svn/feature → feature and remotes/svn/tags/v1.0 becomes a lightweight tag.

Figure 51. Converted branches and tags before translation

Figure 52. Converted branches and tags after translation

After rewriting Subversion branches and tags, you will notice that all Git tags “sit” on very short branches (see tag v1.0 in Figure 52, “Converted branches and tags after translation” and Figure 53, “Converted Git tags on branches”). This is because each Subversion tag is stored with a Subversion commit. So the conversion behavior of git-svn is correct in principle, because one Git commit is created per Subversion revision — but a bit unwieldy for a Git repository: you cannot use git describe --tags, for example.

However, unless the Subversion tag has been modified afterwards, the tagged commit references the same tree as its ancestor, so you can move the tags to the ancestors. The following shell script git-fix-tags⁠^[123] will help here:

#!/bin/sh

. $(git --exec-path)/git-sh-setup
get_tree(){ git rev-parse $1^{tree}; }

git for-each-ref refs/tags --format='%(refname)' \
| while read tag
do
    sha1=$(git rev-parse $tag)
    tree=$(get_tree $tag )
    new=$sha1
    while true
    do
        parent=$(git rev-parse $new^)
        git rev-parse $new^2 > /dev/null 2>&1 && break
        parent_tree=$(get_tree $parent)
        [ "$parent_tree" != "$tree" ] && break
        new=$parent
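    done
    # Tag already sits on the right commit: go on with the next tag
    # (a break here would end the loop over all remaining tags).
    [ "$sha1" = "$new" ] && continue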
    echo -n "Found new commit for tag ${tag#refs/tags/}: " \
        $(git rev-parse --short $new)", resetting..."
    git update-ref $tag $new
    echo 'done'
done

The script examines every tagged commit. If there is a commit among the ancestors that references the same tree, the tag is moved to that commit. If the commit or one of its ancestors itself has multiple parents (after a merge), the search is aborted. In Figure 53, “Converted Git tags on branches”, you can see two tags that come into consideration: v1.0 and v2.0. The v1.0 tag was created from commit C1 and does not contain any subsequent changes. The v2.0 tag, on the other hand, was modified again after it was created from commit C2.

Figure 53. Converted Git tags on branches

In Figure 54, “Tag v1.0 was rewritten” you can see how tag v1.0 was moved by the above script to the ancestor (because the trees are the same). However, tag v2.0 remains in place (because the trees are different due to subsequent changes).

Figure 54. Tag v1.0 was rewritten
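The core test of the script, comparing the tree of the tagged commit with the tree of its parent, can be tried in isolation. The sketch below is a simplification under stated assumptions: the repository is invented, an empty commit stands in for the commit that git-svn creates for a Subversion tag, and the tag sits on master rather than on a separate short branch.

```shell
# Throwaway repository; contents and messages are invented for this sketch.
export GIT_AUTHOR_NAME='Demo User' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo User' GIT_COMMITTER_EMAIL='demo@example.com'
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
echo 'release 1.0' > VERSION
git add VERSION && git commit -q -m 'C1'
c1=$(git rev-parse HEAD)

# Stand-in for the Subversion tag commit: a new commit with an unchanged tree.
git commit -q --allow-empty -m 'create tag v1.0'
git update-ref refs/tags/v1.0 HEAD

tag_tree=$(git rev-parse 'v1.0^{tree}')      # tree of the tagged commit
parent_tree=$(git rev-parse 'v1.0^^{tree}')  # tree of its first parent

# Identical trees: the tag can be moved to the parent without losing content.
if [ "$tag_tree" = "$parent_tree" ]; then
    git update-ref refs/tags/v1.0 "$(git rev-parse 'v1.0^')"
fi
moved=$(git rev-parse v1.0)
echo "v1.0 now points to $moved"
```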

The tool git-svn-abandon⁠^[124] takes a similar approach to the two scripts presented, i.e. it converts Subversion tracking branches and moves tags. Instead of lightweight tags, however, it creates annotated tags and does some additional cleanup work, similar to the ones we’ll cover next. Another alternative for moving tags is the script git-move-tags-up.⁠^[125]

You should still decide how to handle the trunk reference (trunk or git-svn). After conversion, it will point to the same commit as master, so you can actually delete it:

$ git branch -d trunk

There may still be Git branches in the repository after the conversion that have already been merged into master. Remove them with the following command:

$ git checkout master
$ git branch --merged | grep -v '^*' | xargs git branch -d

You can also dispose of the remaining git-svn leftovers, which reside both in the repository configuration and in .git/:

$ rm -r .git/svn
$ git config --remove-section svn
$ git config --remove-section svn-remote.svn

You are then ready to upload the converted history to a remote repository to share it with other developers.

$ git remote add <example> <git@git.example.com:projekt1.git>
$ git push <example> --mirror

9.1.1.7. Subversion Merges

Subversion merges are detected by git-svn using the svn:mergeinfo properties and translated as Git merges, although not always. It depends on which Subversion revisions were merged and how. If all revisions affecting a branch have been merged (svn merge -r <N:M>), this is represented by a Git merge commit. However, if only individual revisions have been merged (via svn merge -c <N>), they are translated as if they had been applied with git cherry-pick instead.

For the following example, we have created a Subversion repository with a branch feature that is merged twice: once as a Subversion merge, which is considered a Git merge commit, and once as a Subversion merge, which is translated as cherry-pick. The result converted with git-svn is shown below.

Figure 55. Converted Subversion repository

The commits in the Subversion repository were made in the following order:

Standard layout
C1 on trunk
Branch feature
C1 on feature
C2 on feature
C2 on trunk
svn merge branches/feature trunk -c 5 (commit C2 on feature)
svn merge branches/feature trunk -r 3:5 (commits C1 and C2 on feature)

Finally, it should be mentioned that git-svn is by no means the only conversion tool. git-svn often suffers from speed problems with very large repositories. Two faster tools are frequently mentioned in this context: svn2git^[126] and svn-fe^[127] (svn-fast-export). If you encounter problems during the conversion (e.g. if the conversion has been running for several days and there is no end in sight), it is worth taking a look at the alternatives.

9.1.2. Bidirectional Communication

The git-svn tool can not only convert a Subversion repository, it also serves as a better Subversion client. This means you have all the benefits of Git locally (easy and flexible branching, local commits and history), while you can still upload your Git commits from your local Git repository as Subversion commits to a Subversion repository. Additionally, git-svn allows you to download new commits from other developers in the Subversion repository to your local Git repository. You should use git-svn if a complete conversion to Git is not feasible, but you’d like to take advantage of the local benefits of Git. Note that git-svn is a somewhat limited Subversion client, and not all Subversion features are fully available. There are some subtleties to consider, especially when uploading.

First, a summary of the most important git-svn commands:

`git svn init`	Create a Git repository to track a Subversion repository.
`git svn fetch`	Download new revisions from the Subversion repository.
`git svn clone`	Combination of `git svn init` and `git svn fetch`.
`git svn dcommit`	Upload Git commits as Subversion revisions to the Subversion repository (diff commit)
`git svn rebase`	Combination of `git svn fetch` and `git rebase`, usually executed before a `git svn dcommit`.

9.1.2.1. Cloning a Subversion Repository

To retrieve the repository, first follow the same procedure as in the Subversion conversion section — create an authors file and determine the repository layout. Then you can use git svn clone to clone the Subversion repository, for example:

$ git svn clone http://svn.example.com/ -s \
  -A <authors.txt> <project-git>

The call downloads all Subversion revisions and creates a Git repository from the history under <project-git>.

Cloning an entire Subversion history can be extremely time consuming under certain circumstances. From a Subversion point of view, a long history is not a problem because the svn checkout command usually only downloads the current revision. Something similar can be done with git-svn. To do this, you first have to initialize the local Git repository and then only download the current revision (HEAD) from the trunk or branch. The advantage here is certainly the speed, the disadvantage is that there is no local history:

$ git svn init http://svn.example.com/trunk project-git
$ cd project-git
$ git svn fetch -r HEAD

As an alternative to HEAD, you could specify any revision and then use git svn fetch to download the missing revisions up to HEAD, thus cloning only part of the history.

As part of the conversion, we described how to post-process the repository. Since you want to continue interacting with the Subversion repository in the future, this is not necessary here. Also, the --no-metadata option must not be used, because otherwise the metadata of the form git-svn-id: will disappear from the commit message, and Git will no longer be able to map the commits and revisions.

The call to git-svn creates several entries in the configuration file .git/config. First, an entry svn-remote.svn, which, similar to a remote entry for a Git repository, contains information about the URL and the Subversion branches and tags to track. For example, if you cloned a repository with a standard layout, it might look like this:

[svn-remote "svn"]
    url = http://svn.example.com/
    fetch = trunk:refs/remotes/origin/trunk
    branches = branches/*:refs/remotes/origin/*
    tags = tags/*:refs/remotes/origin/tags/*

In contrast to a regular remote entry this one additionally contains the values branches and tags. These in turn each contain a refspec describing how Subversion branches and tags are stored locally as Subversion tracking branches. The fetch entry only handles the Subversion trunk and must not contain any glob expressions.
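The entries can be read like any other configuration value, which is handy in scripts. In the following sketch the svn-remote section is written by hand into a scratch repository so that no Subversion server is needed; normally git svn creates these entries itself. The variable names are chosen freely.

```shell
# Scratch repository with a hand-written svn-remote section (normally created by git svn).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config svn-remote.svn.url 'http://svn.example.com/'
git config svn-remote.svn.fetch 'trunk:refs/remotes/origin/trunk'
git config svn-remote.svn.branches 'branches/*:refs/remotes/origin/*'
git config svn-remote.svn.tags 'tags/*:refs/remotes/origin/tags/*'

# Split the trunk refspec into its Subversion path and its local reference.
fetch=$(git config --get svn-remote.svn.fetch)
svn_path=${fetch%%:*}   # left of the first colon
git_ref=${fetch#*:}     # right of the first colon
echo "Subversion path '$svn_path' is tracked as '$git_ref'"

# List the whole section.
git config --get-regexp '^svn-remote\.svn\.'
```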

If you do not have any Subversion branches and tags, the corresponding entries are omitted:

[svn-remote "svn"]
    url = http://svn.example.com/
    fetch = :refs/remotes/git-svn

If you clone the repository with the prefix option, for example with --prefix=svn/, git svn will adjust the refspecs:

[svn-remote "svn"]
    url = http://svn.example.com/
    fetch = trunk:refs/remotes/svn/trunk
    branches = branches/*:refs/remotes/svn/*
    tags = tags/*:refs/remotes/svn/tags/*

If you specify an authors file, a separate entry is created for it. The file will still be needed in the future when you download new commits from the Subversion repository.

[svn]
    authorsfile = /home/valentin/svn-testing/authors.txt

In the section on conversion we described how to use create-ignore to create .gitignore files. However, if you want to continue working with the Subversion repository, there is little point in checking in the .gitignore files there. They have no effect on Subversion and only confuse other developers who continue to work with the native Subversion client (svn). Instead, you can store the patterns to ignore in the .git/info/exclude file (see Sec. 4.4, “Ignoring Files”), which is not part of the repository. The git svn show-ignore command, which searches for and outputs all svn:ignore properties, can help here:

$ git svn show-ignore > .git/info/exclude

9.1.2.2. Examining a Repository

In addition, git-svn provides some commands for examining the history and other properties of the repository:

`git svn log`	A hybrid of `svn log` and `git log`. The subcommand produces output similar to `svn log`, but uses the local repository to create it. Several options of `svn log` have been recreated, such as `-r <N>:<M>`. Unknown options, e.g. `-p`, are passed directly to `git log`, so that options from both commands can be mixed. For example, `git svn log -r 3:16 -p` shows revisions 3–16, including a patch of the changes.
`git svn blame`	Similar to `svn blame`. With the `--git-format` option, the output has the same format as `git blame`, but with Subversion revisions instead of the SHA-1 IDs.
`git svn find-rev`	Shows the SHA-1 ID of the Git commit that represents the changeset of a particular Subversion revision. The revision is passed with the syntax `r<N>`, where `<N>` is the revision number. For example, `git svn find-rev r6` prints `c56506a535f9d41b64850a757a9f6b15480b2c07`.
`git svn info`	Like `svn info`. Returns various information about the Subversion repository.
`git svn proplist`	Like `svn proplist`, prints a list of existing Subversion properties.
`git svn propget`	Like `svn propget`, outputs the value of a single Subversion property.

At the time of writing, git-svn can only query Subversion properties; it cannot create, modify or delete them.

9.1.2.3. Exchanging Commits

Similar to git fetch, git svn fetch downloads new commits from the Subversion repository. In the process, git-svn fetches all new Subversion revisions, translates them into Git commits, and finally updates the Subversion tracking branches. The output is a list of downloaded Subversion revisions, the files changed by the revision, the SHA-1 sum, and the Subversion tracking branch of the resulting Git commit, e.g:

$ git svn fetch
        A   COPYING
        M   README
r21 = 8d707316e1854afbc1b728af9f834e6954273425 (refs/remotes/trunk)
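
If you want to post-process this output, the summary lines can be picked out with sed. The following is only a small sketch: the function name svn_fetch_summary is invented here, and the pattern assumes the `r<N> = <SHA-1> (<ref>)` line format shown above.

```shell
# Hypothetical filter (not part of git-svn): reduce `git svn fetch` output
# on standard input to "<revision> <sha1> <ref>" triples and drop the
# per-file A/M lines.
svn_fetch_summary() {
    sed -n 's/^r\([0-9][0-9]*\) = \([0-9a-f]\{40\}\) (\(.*\))$/\1 \2 \3/p'
}

# Possible use:
#   git svn fetch | svn_fetch_summary
```

Each output line then maps one Subversion revision to the Git commit and tracking branch it ended up on.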

You can work locally in the Git repository as usual, but there is an important restriction when uploading commits to the Subversion repository: While git-svn is capable of rendering Subversion merges to some degree (see above), it can’t map local Git merges to Subversion merges, so only linear histories should be uploaded via git svn dcommit.

To make this linearization easier, there is the command git svn rebase. It first downloads all new commits from the Subversion repository and then uses git rebase to rebuild the current Git branch on top of the appropriate Subversion tracking branch.

Essentially, the workflow consists of the following commands:

$ git add/commit ...
$ git svn rebase
$ git svn dcommit
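
Because only linear histories should be uploaded, it can help to check for local merge commits before calling git svn dcommit. The following is a minimal sketch, not part of git-svn: the function name count_local_merges is made up, and the tracking branch origin/trunk in the usage comment assumes the standard layout shown earlier.

```shell
# Hypothetical helper (not part of git-svn): count the merge commits that
# are reachable from HEAD but not from the tracking branch given as $1.
count_local_merges() {
    git rev-list --merges "$1"..HEAD | wc -l | tr -d ' '
}

# Possible use before uploading (assumes the tracking branch origin/trunk):
#   [ "$(count_local_merges origin/trunk)" -eq 0 ] && git svn dcommit
```

A result other than 0 means that the branch still contains merges that need to be linearized first.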

Figure 56 shows what git svn rebase does. First, new revisions are downloaded from the Subversion repository, in this case C. Then the remotes/origin/trunk tracking branch is “advanced”, so to speak, and then corresponds to the current status in the Subversion repository. Finally, the current branch (in this case master) is rebuilt using git rebase. The commit D' can now be uploaded.

Figure 56. git svn rebase integrates the newly added Subversion revision as commit C — before D, which becomes D'.

With git svn dcommit, you upload the changeset of a Git commit as a revision to the Subversion repository. As part of the operation, the revision is again committed to the local Git repository as a Git commit, but this time with Subversion metadata in the commit message. This, of course, changes the SHA-1 sum of the commit, as shown in Figure 57 by the different commits D' and D''.

Figure 57. After a git svn dcommit, the commit D' has a new SHA-1 ID and becomes D'' because its commit description has been changed to store meta information.

As with git push, you should not use git rebase or git commit --amend to modify commits that you have already uploaded with git svn dcommit.

9.1.2.4. Subversion Branches and Tags

The subcommands git svn branch and git svn tag are used to create Subversion branches and tags. For example:

$ git svn tag -m "Tag Version 2.0" v2.0

In the Subversion repository, this creates the tags/v2.0 directory, whose content is a copy of the current HEAD.^[128] In the Git repository, a new Subversion tracking branch (remotes/origin/tags/v2.0) is created for this. The -m option passes an optional message. If you omit it, git-svn sets the message Create tag <tag>.

Git version 1.7.4 introduced a feature that allows you to perform Subversion merges. The feature is available to git svn dcommit via the --mergeinfo option and causes the Subversion property svn:mergeinfo to be set. The documentation for this option in the git-svn(1) man page is new in version 1.7.4.5 and later.

The following example shows how to create a branch with git-svn, commit to it, and merge it again later in the Subversion sense.

First create the Subversion branch — the command works basically like git svn tag:

$ git svn branch <feature>

Then you create a local branch to work with and commit to it. The branch must be based on the Subversion tracking branch <feature>:

$ git checkout -b <feature> origin/<feature>
$ git commit ...

Then upload the commits to the Subversion repository. The git svn rebase call is only necessary if another user has made commits to the Subversion feature branch in the meantime.

$ git svn rebase
$ git svn dcommit

Now you have to transfer the merge information separately. To do this, proceed as follows: First you merge the branch locally in the Git repository and then upload the resulting merge commit using --mergeinfo. The syntax for this option is:

$ git svn dcommit --mergeinfo=<branch-name>:<N>-<M>

Where <branch-name> is the Subversion name of the branch, e.g. /branches/<name>, <N> the first Subversion revision that changes the branch, and <M> the last.⁠^[129] Assuming you created the branch with revision 23 and now, after two commits, want to merge the branch again, the command would be:

$ git checkout master
$ git merge --no-ff <feature>
$ git svn dcommit --mergeinfo=/branches/feature:23-25

9.2. Custom Importers

Git offers an easy and convenient way to turn any version history into a Git repository using the fast-import subcommand. The fast-import protocol is text-based and very flexible.⁠^[130]

Any kind of data can be used as a basis: backups, tarballs, repositories of other version control systems, and so on. An import program, which you can write in any language, must translate the existing history into the so-called fast-import protocol and write it to standard output. This output is then processed by git fast-import, which uses it to create a full-featured Git repository.

For simple importers that only need to import a linear version history, three building blocks are important:

Data block	A data block begins with the keyword `data`, followed by a space, followed by the data length in bytes and a line break. This is immediately followed by the data, followed by another line break. The data block does not have to be ended explicitly, since its length is specified in bytes. It looks like this, for example:

data 4
test

File	To pass the contents of a file, use the following format in the simplest case: `M <mode> inline <path>` followed by a data block on the next line. So to import a file `README` with the content `test` (without a final newline!) the following construct is necessary:

M 644 inline README
data 4
test

Commit	For a commit, you must specify the appropriate metadata (at least the committer and date, and a commit message), followed by the changed files. This is done in the following format:

commit <branch>
committer <who> <email> <when>
<Data block for commit message>
deleteall

For `<branch>` use a corresponding branch on which the commit should be made, e.g. `refs/heads/master`. The name of the committer (`<who>`) is optional, but the email address is not. The format of `<when>` must be a Unix timestamp with timezone, e.g. `1303329307 +0200`.^[131] Analogous to the `committer` line, you can add an `author` line. The data block forms the commit message. The final `deleteall` tells Git to forget everything about files from previous commits. So for each commit, you add all the data completely new.^[132] Then follow one or more file definitions. This could look like this, for example:

commit refs/heads/master
committer Julius Plenz <julius@plenz.com> 1303329307 +0200
data 23
Import the README File
deleteall
M 644 inline README
data 4
test

Unless otherwise specified, commits are built upon each other in the order in which they are read (if they are on the same branch).
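
The byte count is the part that is easiest to get wrong by hand, so it is worth letting the shell compute it. The following is a minimal sketch; the function names data_block and file_block are invented for illustration and are not part of Git.

```shell
# Emit a fast-import data block for the string in $1.
# The length counts the bytes of $1 only; the newline printed after the
# content is the optional line break that may follow a data block.
data_block() {
    printf 'data %s\n' "$(printf '%s' "$1" | wc -c | tr -d ' ')"
    printf '%s\n' "$1"
}

# Emit an inline file definition: $1 is the path, $2 the content.
file_block() {
    printf 'M 644 inline %s\n' "$1"
    data_block "$2"
}

# Prints the three lines "M 644 inline README", "data 4" and "test"
file_block README test
```

Because the length is taken from the actual content, the same helper can be reused for commit messages.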

With these simple components we want to demonstrate how to turn old release tarballs into a Git repository using a small shell script.

First we download old releases of the editor Vim:

$ wget -q --mirror -nd ftp://ftp.home.vim.org/pub/vim/old/

For each tarball we now want to create a commit. For this we proceed as follows:

Read in archives line by line on Standard In and convert them into absolute path names (because the directory will be changed later).
For each of these archives, perform the following steps:
1. Store the version, the time of the last change, the current time and the commit message in the appropriate variables. The time zone is hard-coded for simplicity.
2. Create a temporary directory and unpack the archive there.
3. Output the corresponding lines commit, author, committer. Then the prepared commit message, whose length is counted by wc -c (byte count). Finally the keyword deleteall.
4. Output a corresponding file block for each file. The first component of the file name is discarded (e.g. ./vim-1.14/). The length of the following file is again counted using wc -c.
5. Delete the temporary directory.

All output of the script is sent to standard output, so it can easily be piped to git fast-import. The beginning of the output looks like this:

commit refs/heads/master
author Bram Moolenaar <bram@vim.org> 1033077600 +0200
committer Julius Plenz <julius@plenz.com> 1303330792 +0200
data 15
import vim-1.14
deleteall
M 644 inline src/vim.h
data 7494
/* vi:ts=4:sw=4
 *
 * VIM - Vi IMitation
...

To create a Git repository from this output, let’s proceed as follows:

$ git init vimgit
Initialized empty Git repository in /dev/shm/vimgit/.git/
$ cd vimgit
$ ls ../vim/*.tar.gz | <import-tarballs.sh> | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:         1350 (      1206 duplicates                  )
      blobs  :         1249 (      1177 duplicates        523 deltas)
      trees  :           87 (        29 duplicates          0 deltas)
      commits:           14 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:           1024 (         0 unique    )
      atoms:            354
Memory total:          2294 KiB
       pools:          2098 KiB
     objects:           195 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =    7668864 /    7668864
---------------------------------------------------------------------

The command outputs a lot of statistical data about the import process (and aborts with a corresponding error message if the input is not understood). A subsequent reset synchronizes index, working tree and repository, and the tarballs are successfully imported:

$ git reset --hard
HEAD is now at ddb8ffe import vim-4.5
$ git log --oneline
ddb8ffe import vim-4.5
4151b0c import vim-4.4
dbbdf3d import vim-4.3
6d5aa08 import vim-4.2
bde105d import vim-4.1
332228b import vim-4.0
...

For reference the complete script:⁠^[133]

#!/bin/sh

while read -r ar; do
    # Check each archive name and turn it into an absolute path
    [ -f "$ar" ] || { echo "not a file: $ar" >&2; exit 1; }
    readlink -f "$ar"
done |
while read -r archive; do
    # Collect the metadata; the time zone is hard-coded
    dir="$(mktemp -d /dev/shm/fi.XXXXXXXX)"
    version="$(basename "$archive" | sed 's/\.tar\.gz$//')"
    mod="$(stat -c %Y "$archive") +0200"
    now="$(date +%s) +0200"
    msg="import $version"

    # Unpack the archive, then emit the commit header, the commit message
    # and one file block per file (without the leading ./<dir>/ component)
    cd "$dir" &&
    tar xfz "$archive" &&
    echo "commit refs/heads/master" &&
    echo "author Bram Moolenaar <bram@vim.org> $mod" &&
    echo "committer Julius Plenz <julius@plenz.com> $now" &&
    printf 'data ' && printf '%s' "$msg" | wc -c && echo "$msg" &&
    echo "deleteall" &&
    find . -type f |
    while read -r f; do
        printf 'M 644 inline '
        echo "$f" | sed -e 's,^\./[^/]*/,,'
        printf 'data ' && wc -c < "$f" && cat "$f"
    done &&
    echo
    rm -fr "$dir"
done

As soon as the version history is a bit more complicated, the commands mark, from and merge become particularly interesting. By using mark you can assign an ID to any objects (commits or blobs) in order to access them as “named objects” and not always have to specify the data inline. The commands from and merge define the predecessor(s) of a commit, so that even complicated interdependencies between branches can be displayed. For more details see the man page.
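
To illustrate how mark, from and merge fit together, the following sketch prints a small stream with two branches and one merge. It is an illustration under assumptions, not taken from the man page: the helper names data_block and emit_stream, the committer identity and the timestamps are all placeholders.

```shell
# Hypothetical helper: data block with a computed byte count
data_block() {
    printf 'data %s\n' "$(printf '%s' "$1" | wc -c | tr -d ' ')"
    printf '%s\n' "$1"
}

# Print a small fast-import stream with two branches and a merge.
# Committer identity and timestamps are placeholder values.
emit_stream() {
    who='A U Thor <author@example.com>'

    # Two blobs, stored once and referred to later as :1 and :2
    printf 'blob\nmark :1\n'; data_block 'test'
    printf 'blob\nmark :2\n'; data_block 'notes'

    # First commit on master, remembered as :3
    printf 'commit refs/heads/master\nmark :3\n'
    printf 'committer %s 1303329307 +0200\n' "$who"
    data_block 'add README'
    printf 'M 644 :1 README\n\n'

    # New branch topic, starting from commit :3
    printf 'commit refs/heads/topic\nmark :4\n'
    printf 'committer %s 1303329400 +0200\n' "$who"
    data_block 'add NOTES'
    printf 'from :3\n'
    printf 'M 644 :2 NOTES\n\n'

    # Merge commit on master: the first parent is the current branch
    # tip (:3), the second parent is :4. fast-import does not merge
    # trees itself, so the file from topic is listed explicitly.
    printf 'commit refs/heads/master\nmark :5\n'
    printf 'committer %s 1303329500 +0200\n' "$who"
    data_block 'merge topic'
    printf 'merge :4\n'
    printf 'M 644 :2 NOTES\n\n'
}

emit_stream
```

In an empty repository, the output could be piped into git fast-import in the same way as the output of the tarball script. Since no deleteall is used here, each commit starts from the tree of its first parent.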

10. Shell Integration

Since you usually run Git commands in the shell, it is worth extending the shell so that it interacts with Git. Especially for Git beginners, such interaction between shell and Git is very helpful for keeping track of things.

There are two areas in which the shell can help you:

Display important information about a repository at the prompt. This way you don’t have to call git status and similar commands too often.
A custom completion helps you to enter Git commands correctly, even if you don’t know the exact syntax.

A good prompt should signal the state of the working tree in addition to the current branch. Are there any changes that are not yet saved? Are there already changes in the index?

A good completion should, for example, when entering git checkout and then pressing the Tab key, only offer branches from the repository for completion. But if you type git checkout -- only files should be completed. This saves time and protects against typos. Other completions are also useful, such as the existing remotes for git push and git pull.

In this chapter we will introduce basic recipes for two popular shells: the Bash and the Z-Shell. Instructions for other interactive shells can be found on the Internet.

The topic of shell integration is very extensive, so the tutorials presented here are only guidelines and ideas and do not claim to be complete. To make matters worse, the git community is developing the user interface - i.e. the existing subcommands and their options - very quickly. So please don’t be surprised if the completion is “lagging behind” and brand new subcommands and options are not (yet) available.

10.1. Git and the Bash

Both the functionality for completion and the status commands for the prompt are implemented in a script called git-completion.bash. It is managed together with the sources for Git. You can find the file in the contrib/completion directory of the Git project. Often the completion is already provided by your distribution or the git installer for your operating system. If you have installed the git package in Debian or Ubuntu, the file should already be in /usr/share/bash-completion/completions/git. In Gentoo, you install the file via the USE flag bash-completion of dev-vcs/git. The current maintainer is Shawn O. Pearce.

10.1.1. Completions

To activate the completion, load the script with the command source and pass the corresponding file as argument, e.g:

source ~/Downloads/git-2.1.0/contrib/completion/git-completion.bash
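
Since the location of the script differs between systems, a startup file such as ~/.bashrc can try several candidates. This is only a sketch: the helper name first_readable is invented, the first path is the Debian/Ubuntu location mentioned above, and ~/.git-completion.bash is an arbitrary, hypothetical place for a manually downloaded copy.

```shell
# Hypothetical helper: print the first argument that names a readable file.
first_readable() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Completion only matters in interactive shells, and only Bash can use
# this script, so do nothing in all other cases.
case $- in
*i*)
    if [ -n "${BASH_VERSION:-}" ] && script=$(first_readable \
            /usr/share/bash-completion/completions/git \
            "$HOME/.git-completion.bash"); then
        . "$script"
    fi
    ;;
esac
```

If none of the candidates exists, the snippet silently does nothing, so it is safe to share between machines.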

The completion completes among other things:

Git subcommands

For example, if you type git pu[TAB], the bash will offer you pull and push:

$ git pu[TAB]
pull push

Note: Only the porcelain commands and user aliases are completed; external and plumbing commands are not. Subcommands that have additional subcommands themselves, e.g. git remote or git stash, are also completed:

$ git remote [TAB]
add     prune     rename     rm     set-head     show     update

Local Branches and Tags

Useful for subcommands, such as checkout and rebase, that expect a local reference:

$ git branch
* master
  refactor-cmd-line
  refactor-profiling
$ git checkout refactor-[TAB]
refactor-cmd-line    refactor-profiling

Configured Remotes

Commands like git fetch and git remote are often called with a remote as argument. Completion helps here too:

$ git remote show [TAB]
github        sourceforge

Remote Branches and Tags

The completion can also “check” on the remote side to see which references are available. This is done, for example, with the command git pull, which expects a remote reference or a refspec:

$ git pull origin v1.7.1[TAB]
v1.7.1       v1.7.1.2     v1.7.1.4     v1.7.1-rc1
v1.7.1.1     v1.7.1.3     v1.7.1-rc0   v1.7.1-rc2

Of course this only works if the remote repository is available. In most cases a network connection and at least read access is required.

Options

Most subcommands have several long options like --bare. The completion usually knows these and completes them accordingly:

$ git diff --color[TAB]
--color         --color-words

Short options, such as -a, are not completed.

Files

File names are completed for Git commands that expect them. Good examples are git add and git checkout:

$ git add [TAB]
.git/     hello.py  README    test/
$ git checkout -- [TAB]
.git/     hello.py  README    test/

Git configuration options

The bash completion for Git also completes configuration options that you set with git config:

$ git config user.[TAB]
user.email        user.name         user.signingkey

As usual with bash completion, the input is completed automatically when it is unambiguous. If feature is the only branch that matches, typing git checkout fe[TAB] completes fe; the command git checkout feature then appears on the command line, and pressing Enter executes it. Only when the input is ambiguous does the bash display the possible completions.

10.1.2. The Prompt

Besides the completion there is another script that displays information about the Git repository in the prompt. To use it, load the file contrib/completion/git-prompt.sh (it may also be installed by your distribution, e.g. under /usr/lib/git-core/git-sh-prompt). Then, as in the following example, place a call to the __git_ps1 function in the PS1 variable. The function takes a so-called format string expression as argument: the string %s is replaced by the Git information, and all other characters are taken over unchanged.

source /usr/lib/git-core/git-sh-prompt
PS1='\u@\h \w$(__git_ps1 " (%s)") $ '

The characters are replaced as follows: \u is the username, \h is the hostname, \w is the current working directory and $(__git_ps1 " (%s)") are the git infos, which without additional configuration (see below) consist only of the branch name:

esc@creche ~ $ cd git-working/git
esc@creche ~/git-working/git (master) $

The format string expression allows you to customize the display of the git info by using additional characters or color codes, e.g. with the following prompt:

PS1='\u@\h \w$(__git_ps1 " (git)-[%s]") $ '

This looks like this:

esc@creche ~/git-working/git (git)-[master] $

If the current commit is not referenced by a branch (detached HEAD), either the tag or the abbreviated SHA-1 sum is displayed, each surrounded by a pair of parentheses:

esc@creche ~/git-working/git (git)-[(v1.7.1.4)] $
esc@creche ~/git-working/git (git)-[(e760924...)] $

If you are inside the $GIT_DIR or in a bare repository, this is signaled accordingly:

esc@creche ~/git-working/git/.git (git)-[GIT_DIR!] $
esc@creche ~/git-working/git.git/.git (git)-[BARE:master] $

It also indicates when you are in the middle of a merge, rebase or similar state where only certain operations are possible:

esc@creche ~/git-working/git (git)-[master|REBASE-i] $

You can also expand the display to show the status of the working tree using different symbols. To do this, you must set the following environment variables to a non-empty value, e.g. to 1.

GIT_PS1_SHOWDIRTYSTATE

For changes that are not yet in the index (unstaged), an asterisk (*) is displayed. For changes that are already in the index (staged), a plus (+) is displayed. The display requires the working tree to be read - this may slow down the shell for large repositories (Git has to check every file for modifications). You can therefore disable this behavior for individual repositories with the Git variable bash.showDirtyState:

$ git config bash.showDirtyState false

GIT_PS1_SHOWSTASHSTATE: If you have created one or more stashes, this is indicated by the dollar sign ($) in the prompt.

GIT_PS1_SHOWUNTRACKEDFILES: The existence of untracked files is indicated by a percent sign (%).

You can activate all this additional information as follows:

GIT_PS1_SHOWDIRTYSTATE=1
GIT_PS1_SHOWSTASHSTATE=1
GIT_PS1_SHOWUNTRACKEDFILES=1

If all of these states apply to the repository (i.e. there are unstaged and staged changes, stashes and untracked files), four additional characters (*, +, $ and %) are displayed in the prompt:

esc@creche ~/git-working/git (git)-[master *+$%] $
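
To see what the first two indicators mean in terms of Git commands, the following hand-written sketch reproduces the branch name and the flags for unstaged and staged changes. It is only an approximation and not how git-prompt.sh is implemented; the function name git_prompt_sketch is invented.

```shell
# Rough approximation of part of what __git_ps1 reports (hypothetical
# function, not taken from git-prompt.sh). Prints nothing outside a
# working tree or on a detached HEAD.
git_prompt_sketch() {
    [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" = true ] || return 0
    branch=$(git symbolic-ref --short -q HEAD 2>/dev/null) || return 0
    flags=''
    # git diff --quiet exits with a non-zero status if there are differences
    git diff --quiet 2>/dev/null          || flags="${flags}*"   # unstaged
    git diff --cached --quiet 2>/dev/null || flags="${flags}+"   # staged
    printf ' (%s%s)' "$branch" "${flags:+ $flags}"
}

# Possible use in Bash:
#   PS1='\u@\h \w$(git_prompt_sketch) $ '
```

The two git diff calls are also the reason why the dirty-state display can be slow in large repositories: the working tree has to be compared with the index each time the prompt is drawn.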

In newer Git versions, the script can also show the relationship to the upstream branch (@{upstream}). Enable this feature by setting GIT_PS1_SHOWUPSTREAM to the value git.^[134] The prompt then signals all states described in Section 5.5.2, "Comparison with the Upstream": up-to-date with the equal sign (=); ahead with the greater-than sign (>); behind with the less-than sign (<); diverged with both a less-than and a greater-than sign (<>). For example:

esc@creche ~/git-working/git (git)-[master >] $

This function is implemented with the --count option of the git rev-list plumbing command, which does not exist in old git versions, like 1.7.1. If you have such an old git version, but a current script and want to use this display anyway, set the value of the environment variable to legacy - the script will then use an alternative implementation that works without the said option. If you also want to know how far ahead or behind the branch is, add the value verbose. The prompt will also show the number of different commits:

esc@creche ~/git-working/git (git)-[master u+2] $

The desired values are to be assigned to the environment variable as a list:

GIT_PS1_SHOWUPSTREAM="legacy verbose git"

10.2. Git and the Z-Shell

Both completion and prompt functions are always included with the Z-Shell.

The Z-Shell has a very useful feature to call man pages: the run-help function. It is called by default with Esc+H in Emacs mode and displays the man page for the command that is already on the command line:

$ man[ESC][h]
#Man-Page man(1) is displayed

However, since Git consists of subcommands and each subcommand has its own man page, run-help does not work very well - only the man page git(1) is displayed. The included run-help-git function can help here:

$ git rebase[ESC][h]
#Man-Page git(1) is displayed
$ unalias run-help
$ autoload run-help
$ autoload run-help-git
$ git rebase[ESC][h]
#Man-Page git-rebase(1) is displayed

10.2.1. Completions

To activate completion for Git, first load the completion system:

$ autoload -Uz compinit && compinit

The completion completes among other things:

Git subcommands

Subcommands are also completed in the Z-Shell. The difference to Bash is that the Z-Shell displays a short description in addition to the actual command:

$ git pu[TAB]
pull     -- fetch from and merge with a remote repository
push     -- update remote refs along with associated objects

The same applies to subcommands, which themselves have subcommands:

$ git remote [TAB]
add      -- add a new remote
prune    -- delete all stale tracking branches for a given remote
rename   -- rename a remote from .git/config and update all...
rm       -- remove a remote from .git/config and all...
show     -- show information about a given remote
update   -- fetch updates for a set of remotes

As well as user aliases:

$ git t[TAB]
tag           -- create tag object signed with GPG
tree          -- alias for 'log --oneline --graph --decorate -23'

Local Branches and Tags: The Z-Shell also completes local branches and tags - no difference to Bash.

Configured Remotes: Configured remotes are known to the Z-Shell. For subcommands where only a configured remote is possible, e.g. git remote show, only configured remotes are displayed. Where this is ambiguous, e.g. with git pull, additional mechanisms of the Z-Shell take effect and usually a long list is displayed, which consists of the entries in the files .ssh/config (the configured SSH hosts) and .ssh/known_hosts (hosts you have already logged in to).

Options

Unlike Bash, Z-Shell knows both long and short options and shows them including a short description of the option. Here is an excerpt:

$ git branch -[TAB]
-a              -- list both remote-tracking branches and local branches
--contains      -- only list branches which contain the specified commit
--force     -f  -- force the creation of a new branch

Files: The Z-Shell is also able to complete file names, but it is a bit smarter than Bash. For example, for git add and git checkout, only files that actually have changes are offered, i.e. files that can either be added to the index or reset. Files that do not qualify are not offered.

Git configuration options

Like Bash, the Z-Shell completion for Git completes all configuration options for Git. The difference is that it also includes a short description of the options:

$ git config user.[TAB]
email        -- email address used for commits
name         -- full name used for commits
signingkey   -- default GPG key to use when creating signed tags

A big difference with the Z-Shell is the way completion works. The Z-Shell uses so-called menu completion: each time you press the Tab key again, it offers you the next possible completion.^[135]

$ git pu[TAB]
pull  -- fetch from and merge with another repository or local branch
push  -- update remote refs along with associated objects
$ git pu[TAB]
$ git pull[TAB]
$ git push

The Z-Shell is not (yet) able to complete references on the remote side - but this is on the to-do list. But the Z-Shell is already able to complete files over an SSH connection. This is especially useful in connection with public key authentication and preconfigured SSH hosts. Assume you have configured the following host in .ssh/config:

Host example
    HostName git.example.com
    User max

On the server in your home directory your projects are located as bare repositories: project1.git and project2.git. You also generated an SSH key and stored it in the .ssh/authorized_keys file on the server. You can now use completion across the SSH connection.

$ git clone example:[TAB]
project1.git/ project2.git/

This is made possible by the completion functions of the Z-shell for ssh.

10.2.2. The Prompt

The Z-Shell contains functions to add git info to the prompt. The functionality is part of the extensive vcs_info system, which knows about a dozen other version control programs besides Git, including Subversion, CVS and Mercurial. Detailed documentation can be found in the zshcontrib(1) man page, in the “Gathering Information From Version Control Systems” section. Here we will only present the settings and customization options relevant to Git.

First, you need to load vcs_info and adjust the prompt to display Git info. Two things are important here. The Z-Shell option prompt_subst must be set; it ensures that variables in the prompt are actually replaced. In addition, you must call vcs_info in the precmd function. precmd is called just before the prompt is displayed, and the call to vcs_info in it makes sure that the Git info is actually stored in the variable ${vcs_info_msg_0_}. Add the following lines to your .zshrc if they are not already included:

# load vcs_info
autoload -Uz vcs_info
# activate prompt_subst
setopt prompt_subst
# define precmd
precmd () { vcs_info }
# Set prompt
PS1='%n@%m %~${vcs_info_msg_0_} $ '

The prompt is composed as follows: %n is the username, %m is the hostname, %~ is the current working directory, and the variable ${vcs_info_msg_0_} contains the git info. It is important that the prompt is specified with single quotes. This saves the string ${vcs_info_msg_0_} and not the value of the variable. Only when the prompt is displayed is the value of the variable - i.e. the git info - substituted.

The above setting for PS1 looks like this:

esc@creche ~/git-working/git (git)-[master]- $

Since vcs_info works with a lot of version control systems, it’s worth activating only those you actually use:⁠^[136]

zstyle ':vcs_info:*' enable git

To customize vcs_info, use a so-called zstyle, a hierarchical configuration mechanism of the Z-Shell described in the zshmodules(1) man page.

Special states like merge or rebase operations are signaled accordingly:

esc@creche ~/git-working/git (git)-[master|bisect]- $

Also in case of a Detached-HEAD either the tag or the abbreviated SHA-1 sum is displayed:

esc@creche ~/git-working/git (git)-[v1.7.1.4] $
esc@creche ~/git-working/git (git)-[e760924...] $

The Z-Shell, like the Bash, can display states of the working tree. Switch this on with the following line:

zstyle ':vcs_info:git*:*' check-for-changes true

For example, vcs_info shows a U for changes that are not yet in the index (unstaged) and an S for changes that you have included in the index (staged):

esc@creche ~/git-working/git (git)-[master]US- $

A big advantage of vcs_info is that it can be adapted very easily. For example, if you do not like the letters U and S, you can replace them with other characters, e.g. * and +:

zstyle ':vcs_info:git*:*' unstagedstr '*'
zstyle ':vcs_info:git*:*' stagedstr '+'

Thus the Zsh prompt now looks more and more like the example from the section on bash:

esc@creche ~/git-working/git (git)-[master]*+- $

To display this information about changes that have not been committed yet, vcs_info must examine the working tree every time. Since this is known to cause problems with large repositories, you can exclude certain directories with a pattern:

zstyle ':vcs_info:*' disable-patterns "/home/esc/git-working/linux-2.6(|/*)"

Maybe you want to change the order of the characters. In this case you need to adjust two format string expressions: formats and actionformats. The first is the default format, the second is the format when you are in the middle of a merge, rebase or similar process:

zstyle ':vcs_info:git*:*' formats " (%s)-[%b%u%c]"
zstyle ':vcs_info:git*:*' actionformats " (%s)-[%b|%a%u%c]"

A selection of the most important characters can be found in the following table. For a detailed list, please refer to the above mentioned man page.

%s: version management system, in our case always git
%b: Current branch, e.g. master
%a: Current process, e.g. merge or rebase-i (only for actionformats)
%u: Character to indicate changes that are not yet in the index, e.g. U
%c: Character to indicate changes that are already in the index, e.g. S

With the above setting the prompt will look like this:

esc@creche ~/git-working/git (git)-[master*+] $

Unfortunately vcs_info cannot signal the existence of unknown files and created stashes by default. But since Z-Shell version 4.3.11 the system supports so called hooks — extensions, which inject additional information into the prompt. We will now introduce two such hooks, which implement the two missing features mentioned above.

The hooks for vcs_info are written as shell functions. Note that the function name has the prefix +vi- to avoid possible collisions. For a hook to really work it has to change a value in the associative array hook_com. In both examples we change the value of the entry staged by appending additional characters to mark certain states. We use the percent sign (%) to indicate unknown files and the dollar sign ($) for created stashes. The percent sign must be specified twice to prevent the Z-Shell from mistakenly interpreting it as formatting. For the hooks we use various plumbing commands (see Sec. 8.3, “Writing Your Own Git Commands”).

+vi-untracked(){
    # Append %% (displayed as a single %) if there are untracked files
    if [[ $(git rev-parse --is-inside-work-tree 2> /dev/null) == 'true' ]] && \
        [[ -n $(git ls-files --others --exclude-standard) ]] ; then
        hook_com[staged]+='%%'
    fi
}
+vi-stashed(){
    # Append $ if at least one stash exists
    if git rev-parse --verify refs/stash &> /dev/null ; then
        hook_com[staged]+='$'
    fi
}

We activate the hooks so that they are evaluated when the git info is set (+set-message):

zstyle ':vcs_info:git*+set-message:*' hooks stashed untracked

As in the bash example above, four additional characters (*, +, $ and %) may be displayed in the prompt (unstaged, staged, stashed and untracked):

esc@creche ~/git-working/git (git)-[master*+$%] $

With such hooks it is possible to extend the prompt as desired. For example, vcs_info does not show by default whether you are inside the $GIT_DIR or in a bare repository. With an appropriate hook you can include these signals in the prompt.
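
Such a hook might look like the following sketch. The function name +vi-gitdir and the markers (bare) and (GIT_DIR) are our own choices for illustration, not part of vcs_info, and the sketch assumes that vcs_info detects the repository at all in these locations. It appends the marker to the branch entry of hook_com.

```zsh
# Hypothetical hook: mark bare repositories and positions inside $GIT_DIR.
# The bare check comes first because a bare repository also counts as
# "inside the Git directory".
+vi-gitdir(){
    if [[ $(git rev-parse --is-bare-repository 2> /dev/null) == 'true' ]] ; then
        hook_com[branch]+='(bare)'
    elif [[ $(git rev-parse --is-inside-git-dir 2> /dev/null) == 'true' ]] ; then
        hook_com[branch]+='(GIT_DIR)'
    fi
}
# Register it together with the two hooks from this section
zstyle ':vcs_info:git*+set-message:*' hooks stashed untracked gitdir
```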

For more examples, see the Misc/vcs_info-examples file in the Z-Shell repository, including a hook that indicates the upstream branch relationship (section “Compare local changes to remote changes”). A minimal configuration for the Z-Shell according to the examples in this section can be found in the Scripts Collection for this book.⁠^[137]

11. GitHub

There are currently several hosting sites that offer free Git hosting for open source projects. By far the best known of all is GitHub.⁠^[138] Two other well-known pure Git hosting sites are Gitorious⁠^[139] and repo.or.cz.⁠^[140] Established hosting sites like Sourceforge⁠^[141] and Berlios⁠^[142] now offer Git hosting as well.

GitHub was founded in 2008 by Chris Wanstrath, P.J. Hyett and Tom Preston-Werner. The platform, developed in Ruby on Rails, has over three million users and hosts over ten million repositories. Even if you consider that many of these repositories are so-called Forks (clones) of other repositories or so-called Gists (source code snippets), this is still a considerable number. Many well-known projects use GitHub nowadays to manage their source code, among others the command line tool Curl,⁠^[143] the web framework Ruby on Rails⁠^[144] and the JavaScript library jQuery.⁠^[145]

The free offer includes unlimited Git repositories, with the restriction that these are publicly available (public repositories). In addition, GitHub offers paid options for individuals and companies to create and use restricted access repositories (private repositories). For large companies, GitHub offers a solution called GitHub Enterprise.

GitHub offers all the essential features you expect from a project hosting platform, including project wikis and issue trackers. What is special is that the wiki system Gollum⁠^[146] does not use a database as backend, but only a Git repository. As markup, GitHub offers several syntax options,⁠^[147] including Markdown, Textile, Mediawiki and AsciiDoc.

The issue tracker is designed for Git and also lists pull requests created via the web interface. Additionally, an email backend has been integrated into the issue tracker. Your responses to the incoming emails are automatically processed by GitHub and also displayed in the web interface. But what GitHub does not offer are mailing lists — for that you have to use alternatives.

Figure 58. GitHub page from Gollum

Figure 58, “GitHub page from Gollum” shows a section of the Gollum project page. The important menu items are Source (source code overview), Commits, Network (forks of the project with changes), Pull-Requests, Issues, Wiki and Graphs (statistical graphs). Other important controls are the Fork and Downloads buttons and the display of the clone URL.

With GitHub, the developer is the focus of attention: repositories are always assigned to users. This is a big difference to established hosting platforms, where projects are always the main focus and users are subordinate to them. (However, it is also possible to create project accounts in GitHub, to which users are then assigned in turn — popular with private repositories and larger projects).

GitHub offers many ways to share changes. While it is possible to take a centralized approach with GitHub (see Figure 30, “Central workflow with distributed version management”) by giving others access to your own repositories, the most common form of sharing is an Integration Manager workflow (see Figure 37, “Integration Manager Workflow”).

Figure 59. Workflow at GitHub

A potential contributor forks⁠^[148] a repository at GitHub.
The public repository is then cloned and changes are made.
Commits are uploaded to the public repository.
A pull request is sent to the project author. As already mentioned, these can be created and sent directly in the web interface.
The author loads the changes from the public repository, checks whether they meet his quality standards and integrates them locally via merge or cherry-pick if necessary.
The contributor’s changes are uploaded to the author’s public repository and thus merged with the software.
The contributor synchronizes his local repository with the author’s public repository.
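
The seven steps above can be tried out locally with the following sketch. It is only an approximation under stated assumptions: bare repositories in a temporary directory stand in for the public repositories on GitHub, forking is simulated by a bare clone, and all names (author-public.git, fork-public.git, fix-readme, the Demo identity) are hypothetical.

```shell
# Identity for the demo commits (placeholder values)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)

# Setup: the author's working repository and public repository
git init -q "$demo/author"
git -C "$demo/author" symbolic-ref HEAD refs/heads/master
echo "first line" > "$demo/author/README"
git -C "$demo/author" add README
git -C "$demo/author" commit -q -m "Initial commit"
git clone -q --bare "$demo/author" "$demo/author-public.git"

# Step 1: fork (a bare clone stands in for the Fork button)
git clone -q --bare "$demo/author-public.git" "$demo/fork-public.git"

# Step 2: clone the fork and make changes on a topic branch
git clone -q "$demo/fork-public.git" "$demo/contributor"
git -C "$demo/contributor" checkout -q -b fix-readme
echo "second line" >> "$demo/contributor/README"
git -C "$demo/contributor" commit -q -a -m "Extend README"

# Step 3: upload the commits to the fork
git -C "$demo/contributor" push -q origin fix-readme

# Step 4: the pull request is created in the web interface (no Git command)

# Step 5: the author fetches the branch and merges it locally
git -C "$demo/author" fetch -q "$demo/fork-public.git" fix-readme
git -C "$demo/author" merge -q --no-edit FETCH_HEAD

# Step 6: the author uploads the result to the public repository
git -C "$demo/author" push -q "$demo/author-public.git" master

# Step 7: the contributor synchronizes with the author's public repository
git -C "$demo/contributor" remote add upstream "$demo/author-public.git"
git -C "$demo/contributor" checkout -q master
git -C "$demo/contributor" pull -q --ff-only upstream master
```

Afterwards the contributor's master points to the same commit as master in the author's public repository. With a real GitHub project you would replace the local paths with the respective clone URLs.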

The GitHub web interface offers a lot of Web 2.0 comfort. For example, instead of steps 5 and 6 you can merge directly via the web interface with a single click. Of course, the system checks whether the merge can be done without conflicts; if not, a warning appears instead of the merge option.

It is now also possible to perform steps 1 to 4 completely in the web interface. To do this, click on the button Fork and edit this file in a foreign repository: the repository will be automatically forked for your user account, and a web-based editor will open, where you can enter your changes and a commit message. You will then be automatically redirected to the pull-request page.

Since you can quickly lose track of many forks, GitHub provides a graphical representation of the forks with pending changes, the so-called network graph:

Figure 60. The GitHub Network-Graph

GitHub offers even more visualizations under Graphs. Under Languages you can see which programming languages the project uses. The Impact graph shows which developer contributed how much and when. Punchcard shows the commit activity by weekday and time of day. Finally, Traffic lists the number of project page views during the last three months.

As the slogan Social Coding suggests, GitHub has several features that you can also find in social networks. For example, you can follow both individual users and repositories. You will then receive a kind of GitHub news ticker in your dashboard: messages about new and closed pull requests, new commits that have been uploaded, forks, etc. The news feeds of the users and repositories are also available as RSS feeds, if you prefer external newsreaders.

A small, still relatively unknown project can therefore become known very quickly via GitHub when a critical number of “followers” is reached.

GitHub also offers a pastebin service, the Gist. However, unlike other pastebin services, each gist on GitHub is a full-fledged Git repository. Especially for code snippets this is an interesting innovation.

GitHub also does a good job when connecting to external services. There are 50 so-called service hooks that you can use to forward messages concerning a repository to external services. Among them are old classics like e-mail and IRC, but also more modern alternatives like Twitter and Jabber.

GitHub also offers additional “gimmicks” that are very handy. For example, tags automatically become source code archives for download, both as .tar.gz and as .zip archives, as you can see in Figure 61, “Downloads created from tags”.

Figure 61. Downloads created from tags

For developers who often work with images, GitHub offers so-called Image View Modes.⁠^[149] They show differences between two versions of an image, similar to the script introduced in Sec. 8.1.3, “Own Diff Programs”. There are the following modes:

2-up: The two different versions are displayed side by side, see Figure 62, “The 2-up image mode”. Differences in size are also visible.

Figure 62. The 2-up image mode

Swipe: The image is split in the middle. On the left you see the old version and on the right the new one. Move the slider back and forth to see the changes. See Figure 63, “The swipe image mode”.

Figure 63. The swipe image mode

Onion Skin: A slider is also used here, but this time the new version is faded in, so there is a smooth transition between old and new.

Difference: Displays only the pixels that have been modified.

The programmers behind GitHub continue to refine the web interface, so innovative improvements are added regularly. The site has its own help page,⁠^[150] where working with the web interface is explained step by step with screenshots.

Appendix A: Installation

Installing Git is easy and fast, as it comes with pre-configured packages for most systems. For the sake of completeness, however, we’ll document the most important steps under Linux, Mac OS X, and Windows.

A.1. Linux

Due to the large number of Linux distributions only the installation on Debian, Fedora and Gentoo systems is described here. For other distributions, please refer to the documentation or the package management system, if necessary; of course, you can also compile and install Git from source code.

A.1.1. Debian/Ubuntu

Debian and Ubuntu provide ready-to-use packages that can be installed comfortably and quickly with the Debian package management system. The git installation is modularized, so you can install only certain parts of git if necessary.

git: main package, contains core commands (formerly git-core)
git-email: Add-on for sending patches by e-mail
git-gui: Graphical user interface
git-svn: subcommand svn to interact with Subversion repositories
git-cvs: Interaction with CVS
git-doc: documentation (will be installed under /usr/share/doc)
gitk: Program Gitk

There is also a meta-package git-all which installs all relevant packages. So on a regular workstation you should install Git as follows:

$ sudo aptitude install git-all

Under Ubuntu you can install the package git-all via the graphical package manager Synaptic.

A.1.2. Fedora

On a Fedora system, you should install Git using the package manager yum:

$ sudo yum install git

Analogous to the division into smaller packages as in Debian, certain additional features for Git are available in separate packages. To install all commands, you should install the git-all package.

A.1.3. Gentoo

Gentoo provides the ebuild dev-vcs/git. The graphical tool for creating commits (git gui) and the add-on for sending emails (git send-email) are installed by default. If you want to have a graphical user interface for viewing and editing the history (gitk) in addition, enable the USE flag `tk`. If you plan to use the Subversion interface, enable the subversion USE flag. To install via Portage, type the following command:

$ sudo emerge dev-vcs/git

A.1.4. Installation from Sources

If your distribution doesn’t provide a package for Git, the package is outdated, or you don’t have root privileges on the system, you should install Git directly from source.

Git depends on the five libraries expat (XML parser), curl (data transfer), zlib (compression), pcre (regular expressions) and openssl (encryption/hashing). You may need to compile their sources and install the libraries accordingly before proceeding.

First download the tarball of the current Git version⁠^[151] and unpack it:

$ wget https://www.kernel.org/pub/software/scm/git/git-2.1.0.tar.gz
$ tar xvf git-2.1.0.tar.gz

Now change to the git-2.1.0/ directory and compile the source code; then run make install:

$ cd git-2.1.0/
$ make -j8
$ make install

With make prefix=<prefix> install you can install Git to <prefix> (default: $HOME). Pass the same prefix=<prefix> when compiling.

A.2. Mac OS X

The Git for OS X project provides an installation program in disk image format (DMG).⁠^[152] So you can install it as usual.

A.3. Windows

The project Git for Windows provides an installation program for Microsoft Windows: msysGit. You can download the program⁠^[153] and install it as usual.

Appendix B: Repository Structure

Git stores the object database, the associated references, etc. in the so-called Git directory, often referred to as $GIT_DIR. By default, this is .git/. It exists only once for each Git repository, i.e. no additional .git/ directories are created in subdirectories.⁠^[154] Among other things, it contains the following entries:

`HEAD`	The `HEAD`, see Sec. 3.1.1, “HEAD and Other Symbolic References”. Besides `HEAD`, other important symbolic references may be stored on the top level, e.g. `ORIG_HEAD` or `FETCH_HEAD`.
`config`	The repository configuration file, see Sec. 1.3, “Configuring Git”.
`hooks/`	Contains the hooks set for this repository, see Sec. 8.2, “Hooks”.
`index`	The index or stage, see Sec. 2.1.1, “Index”.
`info/`	Additional repository information, such as patterns to be ignored (see Sec. 4.4, “Ignoring Files”) and also grafts (see Sec. 8.4.3, “Grafts: Subsequent Merges”). You can put your own information there if other tools can handle it (see e.g. the section on caching of CGit, Sec. 7.5.4, “Exploiting Caching”).
`logs/`	Log of changes to references; accessible via Reflog, see Sec. 3.7, “Reflog”. Contains a log file for each reference under `refs/` and `HEAD`.
`objects/`	The object database, see Sec. 2.2.3, “The Object Database”. For performance reasons, the objects are sorted into subdirectories that correspond to a two-character prefix of their SHA-1 sum (the commit `0a7ba55…` is stored below `0a/7ba55…`). In the subdirectory `pack/` you will find the packfiles and associated indices, which are created by the garbage collection (see below). In the `info/` subdirectory, Git will store a list of existing pack files if required.
`refs/`	All references, including branches in `refs/heads/`, see Sec. 3.1.1, “HEAD and Other Symbolic References”, tags in `refs/tags/`, see Sec. 3.1.3, “Tags — Marking Important Versions”, and remote tracking branches under `refs/remotes/`, see Sec. 5.2.2, “Remote-Tracking-Branches”.

A detailed technical description can be found in the man page gitrepository-layout(5).
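
To look at these entries yourself, a throwaway repository is enough. The following sketch assumes the default file-based storage; the temporary directory and the blob content are arbitrary. Note that index and logs/ are typically missing in a fresh repository and only appear after the first git add or commit.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Entries present right after initialization
ls -F "$repo/.git"

# HEAD is a symbolic reference to the current (still unborn) branch
cat "$repo/.git/HEAD"

# Write a blob and locate it in the object database:
# two-character directory, then the rest of the checksum as file name
oid=$(echo "hello" | git -C "$repo" hash-object -w --stdin)
prefix=$(printf '%s' "$oid" | cut -c1-2)
rest=$(printf '%s' "$oid" | cut -c3-)
ls "$repo/.git/objects/$prefix/$rest"
```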

Figure 64. The most important entries in .git/

B.1. Cleaning Up

As mentioned in Sec. 3.1.2, “Managing Branches”, for example, commits that are no longer referenced (whether by branches or other commits) can no longer be reached. This is usually the case if you wanted to delete a commit (or have rebuilt commits with Rebase). Git does not delete them from the object database immediately, but leaves them there for two weeks by default, even though they are no longer reachable.

Internally, Git uses the commands prune, prune-packed, fsck, repack, etc. for this. You rarely need to call them yourself, because the garbage collection, git gc, runs these tools automatically with appropriate options. It performs the following tasks:

Delete Dangling and Unreachable Objects. These occur during various operations and can usually be deleted after some time to save space (default: after two weeks).

Re-pack Loose Objects. Git uses so-called packfiles to pack several Git objects together. (Then there is no longer one file under .git/objects/ per blob, tree and commit — these are combined into one large, zlib-compressed file).

Search existing packfiles for old (unreachable) objects and “thin out” the packfiles accordingly. If necessary, several small packfiles are combined to large ones.

Packing references. This results in so-called Packed Refs, see also Sec. 3.1, “References: Branches and Tags”.

Delete old Reflog entries. By default this happens after 90 days.

Old conflict resolutions (see Rerere, Sec. 3.4.2, “Rerere: Reuse Recorded Resolution”) are discarded (15/60 days hold time for unresolved/solved).
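
The grace period for unreachable objects can be observed in a small experiment. The following is a sketch in a throwaway repository; the file names, the blob content and the Demo identity are arbitrary. It writes a blob that nothing references, runs a normal git gc, and then overrides the two-week default with --prune=now.

```shell
repo=$(mktemp -d)
git init -q "$repo"
echo "tracked" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "Initial commit"

# A blob that no commit, tree or reference points to
oid=$(echo "orphaned content" | git -C "$repo" hash-object -w --stdin)

# A normal run keeps the young unreachable object
git -C "$repo" gc -q
git -C "$repo" cat-file -e "$oid" && echo "still present after git gc"

# Overriding the grace period removes it
git -C "$repo" gc -q --prune=now
git -C "$repo" cat-file -e "$oid" 2> /dev/null || echo "gone after git gc --prune=now"
```

Use --prune=now with care on repositories that other processes are writing to at the same time.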

The garbage collection has three modes: automatic, normal and aggressive. You call the automatic mode via git gc --auto; it only takes action if the repository shows really blatant deficiencies. What “blatant” means is configurable: the following settings determine (globally or per repository) how many loose objects or small packfiles must accumulate before the automatic mode combines them into larger packs.

gc.auto (Default: 6700 objects): Combine objects into a packfile.
gc.autopacklimit (Default: 50 packs): Combine packs into one large pack file.
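
As a sketch, the two thresholds can be lowered for a single repository as follows. The values 1000 and 10 are arbitrary examples, not recommendations; with --global the settings would apply to all your repositories instead.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Pack as soon as roughly 1000 loose objects have accumulated
git -C "$repo" config gc.auto 1000
# Consolidate once more than 10 packfiles exist
git -C "$repo" config gc.autoPackLimit 10

# Show the resulting settings
git -C "$repo" config --get-regexp '^gc\.'
```

Setting gc.auto to 0 disables the automatic mode altogether.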

The automatic mode is invoked frequently, among others by receive-pack and rebase (interactive). In most cases the automatic mode does nothing, because the defaults are very conservative. If it does take action, it looks like this:

$ git gc --auto
Auto packing the repository for optimum performance. You may also
run "git gc" manually. See "git help gc" for more information.
...

B.2. Performance

You should either significantly lower the thresholds above which the automatic garbage collection takes effect, or call git gc from time to time. This has one obvious advantage, namely that disk space is saved:

$ du -sh .git
20M     .git
$ git gc
Counting objects: 3726, done.
Compressing objects: 100% (1639/1639), done.
Writing objects: 100% (3726/3726), done.
Total 3726 (delta 1961), reused 2341 (delta 1279)
Removing duplicate objects: 100% (256/256), done.
$ du -sh .git
6.3M    .git

Individual objects under .git/objects/ have been combined into a packfile:

$ ls -lh .git/objects/pack/pack-a97624dd23<...>.pack
-r-------- 1 feh feh 4.6M Jun  1 10:20 .git/objects/pack/pack-a97624dd23<...>.pack
$ file .git/objects/pack/pack-a97624dd23<...>.pack
.git/objects/pack/pack-a97624dd23<...>.pack: Git pack, version 2, 3726 objects

You can use git count-objects to output how many files the object database consists of. Here side by side before and after the above packing process:

$ git count-objects -v
count: 1905                             count: 58
size: 12700                             size: 456
in-pack: 3550                           in-pack: 3726
packs: 7                                packs: 1
size-pack: 4842                         size-pack: 4716
prune-packable: 97                      prune-packable: 0
garbage: 0                              garbage: 0
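
You can reproduce the effect on a small scale. The following sketch creates three commits in a throwaway repository (file name and Demo identity are arbitrary) and prints the statistics before and after packing. Each commit contributes one blob, one tree and one commit object, all of which are loose at first.

```shell
repo=$(mktemp -d)
git init -q "$repo"
for i in 1 2 3; do
    echo "line $i" >> "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
        commit -q -m "Commit $i"
done

echo "before:"
git -C "$repo" count-objects -v

git -C "$repo" gc -q

echo "after:"
git -C "$repo" count-objects -v
```

After the garbage collection, count drops to zero and all objects are reported under in-pack.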

Nowadays disk space is cheap, so a repository compressed to 30% is not a big gain. But the performance gain is not to be scoffed at. Usually one object (e.g. a commit) references further objects (blobs, trees). So if Git has to open one file per object (i.e. at least n blob objects for n managed files), this means n read operations on the file system.

Packfiles have two major advantages: First, Git creates an index for each packfile, which indicates at which offset of the file each object is found. In addition, the packing routine uses a heuristic to optimize object placement within the file (so that, for example, a tree object and the blob objects it references are stored “close” to each other). This allows Git to simply map the packfile into memory (keyword: “sliding mmap”). The “search object X” operation is then nothing more than a lookup in the pack index and a corresponding read at that location in the packfile, i.e. in memory. This relieves the file system and the operating system considerably.

The second advantage of packfiles is the delta compression. This way, objects are stored as deltas (changes) of other objects, if possible.⁠^[155] This saves memory space, but on the other hand also enables commands like git blame to detect copies of code pieces between files “inexpensively”, i.e. without much computing effort.

The aggressive mode should only be used in justified exceptional cases.⁠^[156]

Run a git gc on your publicly accessible repositories on a regular basis, e.g. via cron. Commits are always transmitted via the git protocol as packfiles, which are generated on demand, i.e. at the time of retrieval. If the entire repository is already available as one large packfile, parts of it can be extracted more quickly, and a complete clone of the repository does not require any additional computational operations (no huge packfile has to be packed). A regular garbage collection can therefore reduce the load on your server, and the user cloning process is also accelerated.

If the repository is particularly large, it can take a long time for the server to count all objects in a git clone. You can speed this up by regularly calling git repack -A -d -b from the cron-job: Git will then create a bitmap file in addition to the pack files, speeding up this process by one or two orders of magnitude.
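
To check that the bitmap is actually written, you can run the command once by hand. The sketch below uses a throwaway repository with a single commit (file name and Demo identity are arbitrary); on a server you would run the repack command in the bare repository instead.

```shell
repo=$(mktemp -d)
git init -q "$repo"
echo "content" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "Initial commit"

# -A -d: repack everything into one pack and drop redundant packs
# -b:    additionally write a bitmap index
git -C "$repo" repack -q -A -d -b

# Next to the .pack and .idx files there should now be a .bitmap file
ls "$repo/.git/objects/pack/"
```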

1. https://git-scm.com

2. http://vger.kernel.org/vger-lists.html#git

3. https://git.wiki.kernel.org/index.php/Main_Page

4. https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

5. https://git.wiki.kernel.org/index.php/GitFaq

6. https://git.wiki.kernel.org/index.php/GitSvnCrashCourse

7. https://stackoverflow.com

8. Even if you follow the example exactly, you will not get the same SHA-1 checksums, since they are calculated from the contents of the commit, the author, and the commit time, among other things.

9. Alternatively, you can store the user-specific configuration under the XDG-compliant path .config/git/config in your home directory (or relative to your set environment variable $XDG_CONFIG_HOME).

10. If available, settings from /etc/gitconfig are also read in (with lowest priority). You can set options in this file using the --system parameter, but you need root privileges to do this. Setting git options system-wide is unusual.

11. “i18n” is a common abbreviation for the word “internationalization” — the 18 stands for the number of omitted letters between the first and last letter of the word.

12. By default, words are separated by one or more spaces, but you can specify another regular expression to determine what a word is: git diff --word-diff-regex=. See also the git-diff(1) man page.

13. This is an instruction for the Kernel, telling it which program to use to interpret the script. Typical shebang lines include #!/bin/sh or #!/usr/bin/perl.

14. Strictly speaking, the -p option leads directly to the patch mode of git add⁠’s interactive mode. However, the interactive mode is rarely used in practice — in contrast to the patch mode — and is therefore not described further here. The documentation for this can be found in the git add(1) man page in the “Interactive Mode” section.

15. Git then opens the hunk in an editor; below is a guide to editing the hunk: To delete deleted lines (prefixed with -) — i.e. not add them to the index, but keep them in the working tree! — replace the minus sign with a space (the line becomes “context”). To delete + lines, simply remove them from the hunk.

16. However, you can usually not split hunks arbitrarily. At least one line of context, i.e. a line without prefix + or -, must be in between. If you still want to split the hunk, you have to use e for edit.

17. You can see this information in gitk or with the command git log --pretty=fuller.

18. In fact, Git creates a new commit whose changes are a combination of the changes made to the old commit and the index. The new commit then replaces the old one.

19. git rm deletes a file with the next commit, but it remains in the commit history. For information on how to delete a file completely, including from the version history, see Sec. 8.4.1, “Removing Sensitive Information Afterwards”.

20. This and the following examples are from the Git repository.

21. You can download the Git repository, which is examined in detail on the following pages, with the command:
git clone git://github.com/gitbuch/objektmodell-beispiel.git

22. https://en.wikipedia.org/wiki/SHA-1, “Attacks”.

23. https://web.archive.org/web/20120701221412/http://kerneltrap.org/mailarchive/git/2006/8/27/211001

24. The technical documentation is provided in the man page gittutorial-2(7).

25. The tag object is not shown here because it is not necessary for understanding the object structure. Instead, you will find it in Figure 12, “The Tag Object”.

26. Git stores all objects under .git/objects. A distinction is made between loose objects and packfiles. “Loose” objects store the content in a file whose name corresponds to the SHA-1 sum of the content (Git stores one file per object). In contrast, packfiles are compressed archives of many objects. This is done for performance reasons: Not only is the transfer or storage of these archives more efficient, but the file system is also relieved.

27. Internally, of course, Git has mechanisms to recognize blobs as deltas of other blobs and to tie them together to packfiles to save space.

28. These two properties, directed and acyclic, are the only constraints that must be placed on a graph that represents changes over time: future changes cannot be referenced (the edges always point to the past), and you can never arrive back at a point that already lies on the path (no cycles).

29. Of course, this does not prevent you from setting a branch to a commit “somewhere in the middle,” which can also be useful.

30. Because the order of the direct ancestors is stored during a merge, it is important to always merge from the smaller into the larger branch, e.g. topic into master. If you then use master^^ to examine commits in the master branch, you will not suddenly land on commits from the topic branch (see also Sec. 3.3, “Merging Branches”).

31. To find out how Git checks a reference for validity, see the git-check-ref-format(1) man page.

32. How long they stay there is determined by corresponding settings for the garbage collection (maintenance mechanisms), see Sec. B.1, “Cleaning Up”.

33. A detailed overview of the advantages and disadvantages of both schemes as well as a description of the release process etc. can be found in chapter 6 of the book Open Source Projektmanagement by Michael Prokop (Open Source Press, Munich, 2010).

34. To add such a tagged blob to a repository, use the following command: git tag -am "<description>" <tag-name> $(git hash-object -w <file>).

35. These are the commits captured with git log v1.7.1..28ba96a.

36. To verify that the changes in your new branch are the same as the old one, use git diff <reorder-feature> <feature> — if the command does not produce output, the branches will contain identical changes.

37. It is not absolutely necessary that a merge base exists; for example, if you manage multiple root commits in a repository (see Sec. 4.7, “Multiple Root Commits”) and then merge the branches built on top of them, there will be no common base if no merge has taken place before. In this case, a file that exists in different versions on both sides creates a conflict.

38. The following description explains the approach of the resolve strategy. It differs only slightly from the standard recursive strategy, see also the detailed description of this strategy in Sec. 3.3.3, “Merge Strategies”.

39. The recursive strategy is therefore only much more intelligent than resolve if the topology of the commits (i.e., the order in which they branched and merged) is much more complicated than simply branching and then merging.

40. Beatrice can list the commits relevant to the merge that changed the file output.c with git log --merge -p -- output.c.

41. http://kdiff3.sourceforge.net

42. https://meld.sourceforge.net

43. In Vimdiff you can use Ctrl+W followed by an arrow key or h, j, k, l to move to the window in the corresponding direction. With dp or do you move changes to the other side or apply them from there (diff put and diff obtain).

44. The message Automatic merge failed simply means that a conflict occurred that could not be solved by a 3-way merge. Since Rerere cannot guarantee a meaningful solution, the solution is only “provided,” but not considered the ultimate solution of the conflict.

45. More useful tips can be found in Ch. 6, Workflows.

46. AsciiDoc is a simple, wiki-like markup language: https://asciidoc.org. The Git documentation is in this format and is converted to HTML pages and man pages, and this book was also written in AsciiDoc!

47. For example, the repository of the Git project itself manages the autogenerated HTML documentation in a branch html, which is completely separated from the development branches. This way, merges between the code branches cannot lead to conflicts due to differently compiled HTML documentation. How to create such “decoupled” branches is described in Sec. 4.7, “Multiple Root Commits”.

48. This is because the merge command does not examine each commit individually. Instead, it compares three trees that contain these changes among others, see Sec. 3.3.1, “Two-Branches Merge”.

49. This is because rebase works internally with cherry-pick, which recognizes when the changes that would be introduced by the commit are already present. A similar functionality is provided by git cherry or git patch-id, which can detect almost identical patches.

50. Maybe the character ^ has a special meaning in your shell (this is for example the case in the Z-Shell or rc-Shell). In this case you have to mask the character, i.e. enclose the argument in quotation marks or prefix it with a backslash. In the Z-Shell, there is also the command noglob, which you use to precede git to remove the special meaning of ^.

51. This is not necessarily a commit — these can also be tags or blobs.

52. Whether or not the commits have fallen out because of their age depends, of course, on how often you perform a garbage collection via git gc. See also Sec. B.1, “Cleaning Up”.

53. If you want to list all commits in the last two weeks, use git log --since='two weeks ago' instead.

54. If you’re managing patch stacks with Git that have potential conflicts, you should definitely take a look at the Reuse Recorded Resolution feature, in short, rerere. Rerere saves conflict resolutions and automatically corrects conflicts if a resolution has already been saved, see also Sec. 3.4.2, “Rerere: Reuse Recorded Resolution”.

55. For example, by uploading the branch to a publicly available repository, see Sec. 5.4, “Uploading Commits: git push”.

56. In the latter case, for example, you simply do a git remote update (the new commits are loaded into the origin/master branch) and then rebuild your own branch from scratch on top of origin/master. See also Sec. 5.1, “How Does Distributed Version Control Work?”.

57. You can find the source code at https://repo.or.cz/w/topgit.git.

58. Short stg or StGit, reachable under https://stacked-git.github.io.

59. This also works fine as long as all branches and merges are above the new reference (i.e. only commits are included from which you can reach the new base). Otherwise, rebase will fail for every commit already in history (error message: “nothing to commit”); these must always be skipped with a git rebase --continue.

60. More examples can be found on the gitignore(5) man page and at https://docs.github.com/en/free-pro-team@latest/github/using-git/ignoring-files.

61. This behavior can be prevented by setting the clean.requireForce setting to false.

62. The command first selects all commit objects that are no longer accessible, and then restricts the list to those that are merge commits and whose commit message contains the string WIP — the properties that a commit object created as a stash has, see Sec. 4.5.7, “How Is the Stash Implemented?”.

63. That’s not quite true; you can only store one note per commit under refs/notes/commits, but you can store additional notes under e.g. refs/notes/bts that relate to the bug tracking system, and only one per commit there.

64. Of course, this commit need not be the core of the regression, it may have been prepared by a completely different commit.

65. We developed the cheat sheet in connection with various Git workshops. It is licensed under a Creative Commons License and is managed with the Git hosting platform GitHub, which we describe in Ch. 11, GitHub.

66. Strictly speaking, Git does not “blindly” check out the master branch. In fact, Git looks up which branch the HEAD of the other side references and checks it out.
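
The behavior described in footnote 66 can be tried out with a small sketch. It assumes a throwaway directory created with mktemp and a hypothetical branch name develop; neither appears in the original text.

```shell
#!/bin/sh
# Sketch: git clone checks out the branch that the remote HEAD references.
set -eu
base=$(mktemp -d)
# Helper that passes an identity so the example ignores global configuration.
g() { git -c user.name=Example -c user.email=example@example.com "$@"; }

git init -q "$base/origin"
cd "$base/origin"
# Point HEAD of the "other side" at a branch that is not called master.
git symbolic-ref HEAD refs/heads/develop
g commit -q --allow-empty -m "initial commit"

cd "$base"
git clone -q "$base/origin" "$base/copy"
branch=$(git -C "$base/copy" symbolic-ref --short HEAD)
echo "checked out after clone: $branch"
```

The clone ends up on develop because that is the branch the HEAD of the source repository points to.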

67. For more information on the Git protocol, see Sec. 7.1.1, “The Git Protocol” (see also Sec. 3.1.1, “HEAD and Other Symbolic References”).

68. For a complete list of possible URLs, see the git-clone(1) man page in the “Git URLs” section.

69. As in the shell, the asterisk (*) is interpreted as a wildcard and matches all files in a directory.

70. Remote tracking branches are only intended to track the branches in a remote. Checking out a remote tracking branch results in a detached HEAD state and a warning.

71. Merging from origin/master to master is a normal merging process. In the example above, no further local commits have been made in the meantime, and therefore no merge commit has been created; master has simply been fast-forwarded to origin/master.
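
The fast-forward case from footnote 71 can be reproduced in a minimal sketch. It assumes a throwaway repository, and it uses a local branch named upstream as a stand-in for origin/master; both are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch: merging without intervening local commits yields a fast-forward.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
g() { git -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "A"
git branch upstream              # hypothetical stand-in for origin/master
git checkout -q upstream
g commit -q --allow-empty -m "B" # a new commit that master does not have yet
git checkout -q master

g merge -q upstream              # master has no commits of its own

# Count the parents of the resulting tip: a merge commit would have two.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of master tip: $parent_count"
```

Because master had no commits of its own, its tip is simply moved to the same commit as upstream, and that commit has a single parent.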

72. But the “forcing” only takes place locally: The recipient server can prevent the upload despite the specification of the option -f. This is done with the receive.denyNonFastForwards option, or the RW rights assignment for Gitolite (see Sec. 7.2.2, “Configuring Gitolite”).

73. This is the default behavior since version 2.0 (push.default=simple). Without further configuration, earlier Git versions used the push.default=matching setting, which can be error-prone, especially for beginners.

74. In Git jargon such remotes are called anonymous.

75. The syntax <tag>^{} dereferences a tag object, i.e. it returns the commit, tree or blob object to which the tag points.
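
As a minimal sketch of the dereferencing described in footnote 75, the following creates an annotated tag in a throwaway repository and compares the tag object with the object it points to. The tag name v1.0 and the identity passed via -c are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: peel an annotated tag with <tag>^{}.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "initial commit"
g tag -a v1.0 -m "version 1.0"        # annotated tag, stored as a tag object

tag_object=$(git rev-parse v1.0)      # ID of the tag object itself
peeled=$(git rev-parse 'v1.0^{}')     # ID of the object the tag points to
head_commit=$(git rev-parse HEAD)

echo "v1.0     is a $(git cat-file -t "$tag_object")"
echo "v1.0^{}  is a $(git cat-file -t "$peeled")"
```

For a lightweight tag, which has no tag object of its own, both expressions would resolve to the same commit.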

76. For example with the alias push = push --tags.

77. See the git-format-patch(1) man page for information on how to customize the numbering, text and file suffix.

78. The number n is the total number of patches exported and m is the number of the current patch. For example, the subject line of the third patch of five would read [PATCH 3/5].
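
The numbering from footnote 78 can be observed with a short sketch. It assumes a throwaway repository with three hypothetical commits and prints only the subject lines that git format-patch generates.

```shell
#!/bin/sh
# Sketch: subject lines of a three-patch series carry [PATCH m/n].
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "base"
for n in first second third; do
    echo "$n" > "$n.txt"
    git add "$n.txt"
    g commit -q -m "Add $n file"
done

# Export the last three commits to standard output and keep the subjects.
subjects=$(g format-patch --stdout HEAD~3 | grep '^Subject:')
echo "$subjects"
```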

79. You can see in Figure 39, “Patch series as mail thread” a slightly different order of patches than in the previous examples. This is because the first version of the patch series consisted of only two patches, and the third one was added after feedback from the Git mailing list. The series was then expanded and rebased to the state as shown in this section.

80. If no Mail Transfer Agent (MTA) is installed on your system or configured to send e-mail, you can also use an external SMTP server. To do so, adjust the settings described in the section “Use GMail as the SMTP server” of the already mentioned man page.

81. https://dpaste.com

82. https://gist.github.com

83. Useful tips and tricks for various MUAs can be found in the Documentation/SubmittingPatches file in the Git-via-Git repository in the “MUA specific hints” section, and in the git-format-patch(1) man page in the “MUA specific hints” and “Discussion” sections.

84. For the Git project, you can find them at Documentation/SubmittingPatches in the source code repository.

85. The libgit.a is created when compiling Git and gathers all functions that are “public” in Git. However, it is not reentrant or thread-safe, so its use is limited. libgit2 does not have these restrictions.

86. The command is not a standard command of Git, but is installed automatically by some Linux distributions (e.g. Debian, Arch Linux) and by the Windows Git installer. Call git subtree to check whether the command is installed. If not, you can look for the script under /usr/share/doc/git/contrib/subtree/, or copy it from the source code of Git (under contrib/subtree).

87. Therefore, make sure that you only include content that you are allowed to pass on using this technology. Depending on the license, the use of a software may be allowed, but not the distribution.

88. Among others, the third chapter of Open Source Projektmanagement by Michael Prokop (Open Source Press, Munich, 2010) is recommended. The Manifesto for Agile Software Development at http://agilemanifesto.org is also informative.

89. An exception is if you need a new development in the mainline in your topic branch, but in that case you can consider rebuilding the topic branch via rebase so that it already contains the required functionality.

90. You can find further suggestions in chapter 6 of the book Open Source Projektmanagement by Michael Prokop (Open Source Press, Munich, 2010).

91. Each commit references exactly one tree. However, git archive behaves differently depending on whether you specify a commit (which references a tree) or a tree directly: For trees, the time of the last modification included in the archive is the system time — but for a commit, the time of the commit is set.

92. A more detailed description can be found in the Git source repository in the Documentation/technical directory. There you can find three files that explain the packfile format, partly based on explanations by Linus Torvalds on IRC: pack-format.txt, pack-heuristics.txt, pack-protocol.txt. Modern versions of Git also use an additional “Bitmap Reachability Index,” which is explained in bitmap-format.txt.

93. The installation and configuration described here refers to Gitolite version 3.6. Since Gitolite version 1.5, which was described in the first edition of this book, there have been some incompatible changes, which you can read about here: https://gitolite.com/gitolite/migr.html

94. A user can only authenticate to an SSH server with their private key if they can decrypt a message encrypted with their public (and Gitolite’s) key. Gitolite can derive the internal user name from the key with which the user authenticates.

95. Some distributions also provide ready-made packages of Gitolite. However, it is not recommended to use them because they are usually outdated and are installed globally and with a certain configuration. If you then choose a different username than the one chosen by the developers, you will have to spend a lot of extra effort to get Gitolite working.

96. A release candidate of a software is a pre-release version of a new release that is made available to the public (and not only to a small group of beta testers). Only bug fixes are then incorporated into the final release. Version 1.0 RC 1 (v1.0-rc1) is followed by RC 2 (v1.0-rc2) etc. until version 1.0 is released (v1.0).

97. Of course, Gitolite cannot prohibit read access to a subdirectory; this would reduce the concept of the Git object model, with its cryptographically guaranteed integrity, to absurdity.

98. Please also note that this may again cause problems when creating branches, see above.

99. The documentation can be found at https://gitolite.com. The author has also published the book Gitolite Essentials (Packt Publishing, 2014).

100. Strictly speaking, it is necessary for the copied HEAD to match that of the opposite side. Better still, check a version tag signed by a developer.

101. In some distributions, such as Debian, the daemon is called openbsd-inetd.

102. The program sv is part of the init framework runit (http://smarden.org/runit/). It replaces the functionality of SysV-Init, but can also be integrated into it.

103. Note that an instance of the Git daemon is not “expensive.” Packing the requested objects together is, however. So just because your server can handle dozens of HTTP requests per second doesn’t mean it can handle the same number of Git connections.

104. Note that the order in the alias.url directive is important. If you move the line "/" => … to the top, Lighttpd will no longer start or the alias assignment will not be the desired one.

105. The tool checkinstall automatically builds Debian or RPM packages containing all files that would have been installed by make install. Homepage of the program: https://www.asic-linux.com.mx/~izto/checkinstall/

106. You can download the program indent from the GNU project from https://www.gnu.org/software/indent/.

107. The convert command is part of the ImageMagick suite. If you replace -clone 1-2 with -clone 0.2, the different areas are copied from the old image.

108. The graphics were created for the release of Kernel 2.0 by Larry Ewing and can be found at https://www.isc.tamu.edu/~lewing/linux/.

109. “Server-side” here only means that they are not executed in the local repository, but on the “opposite side”.

110. If Git were to include full permissions, then a file with the same contents would not be the same blob for two different developers using different umask(2) settings. To prevent this from happening, Git uses a simplified permission management system.
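
The simplified permission handling from footnote 110 can be made visible with a minimal sketch. It assumes a throwaway repository on a file system that honors the executable bit, and two hypothetical files with identical content but different permissions.

```shell
#!/bin/sh
# Sketch: identical content gives the same blob; only the executable bit is kept.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'echo hello\n' > a.sh
printf 'echo hello\n' > b.sh
chmod 600 a.sh      # restrictive permissions, not executable
chmod 755 b.sh      # executable

git add a.sh b.sh
git ls-files -s     # columns: mode, blob ID, stage, file name

mode_a=$(git ls-files -s a.sh | cut -d' ' -f1)
mode_b=$(git ls-files -s b.sh | cut -d' ' -f1)
blob_a=$(git ls-files -s a.sh | cut -d' ' -f2)
blob_b=$(git ls-files -s b.sh | cut -d' ' -f2)
```

The 600 permissions are recorded as mode 100644 and the 755 permissions as 100755, while both entries reference the same blob.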

111. For example, you can have your shell scripts automatically checked at https://www.shellcheck.net/.

112. The Debian Almquist Shell (dash), a fork of the Almquist Shell (ash), is a very small, fast shell which is POSIX compatible. It provides the standard shell /bin/sh on many modern Debian systems as well as on Ubuntu.

113. https://github.com/gitbuch/buch-scripte

114. There are other flags (U, T and B), but in practice they usually play no role.

115. https://git.wiki.kernel.org/index.php/Aliases

116. In principle, you can also specify no predecessor at all; the corresponding commit then becomes a root commit.

117. https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Interaction_with_other_Revision_Control_Systems

118. http://rsvndump.sourceforge.net/

119. If several directories contain branches and/or tags, specify each of them with its own -t or -b argument.

120. If you did not specify a trunk via -T or --stdlayout during conversion, a single branch called remotes/git-svn will be generated.

121. The script is included in the script collection for this book. See: https://github.com/gitbuch/buch-scripte.

122. In principle, you could also perform these operations directly with the mv command below .git/refs/. However, the plumbing commands correctly handle “exotic” cases such as “Packed Refs” or references that are symlinks. In addition, git update-ref writes corresponding entries in the reflog and issues error messages if something goes wrong. See also Sec. 8.3, “Writing Your Own Git Commands”.
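
To illustrate why footnote 122 prefers the plumbing commands, the following sketch packs all references, so that no loose file exists below .git/refs/, and then moves a reference with git update-ref. The repository, the reference names topic and renamed, and the reflog message are hypothetical; the sketch also assumes the default file-based reference storage.

```shell
#!/bin/sh
# Sketch: moving a packed reference with git update-ref instead of mv.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
g() { git -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "initial commit"
head_commit=$(git rev-parse HEAD)
g update-ref refs/heads/topic "$head_commit"

# After packing, the reference lives in .git/packed-refs, not in its own file.
git pack-refs --all
if [ -e .git/refs/heads/topic ]; then loose=yes; else loose=no; fi
echo "loose file for topic present: $loose"

# mv would find nothing to move here; update-ref handles the packed reference.
g update-ref -m "example: moved from topic" refs/heads/renamed \
    "$(git rev-parse refs/heads/topic)"
g update-ref -d refs/heads/topic

git reflog show refs/heads/renamed
```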

123. You can also find this script in the script collection: https://github.com/gitbuch/buch-scripte.

124. https://github.com/nothingmuch/git-svn-abandon

125. https://gist.github.com/hartwork/fa275bedf8c2addeeb57

126. https://web.archive.org/web/20160118021532/http://gitorious.org/svn2git/svn2git

127. In the Git-via-Git repository under contrib/svn-fe

128. Compare the command: svn copy trunk tags/v2.0

129. Compare the Subversion command: svn merge -r 23:25 branches/feature trunk

130. For detailed technical documentation, see the git-fast-import(1) man page.

131. You can use the --date-format option to allow other date formats if required.

132. Although this requires a little more computing effort, it simplifies the structure of the import program considerably. Since import software is usually run only rarely and run time is not critical, this approach makes sense.

133. The script is available as part of our script collection at https://github.com/gitbuch/buch-scripte.

134. If you use git-svn, you can tell the script to use the SVN upstream (remotes/git-svn) for comparison (if it exists) instead of the upstream branch by setting the variable to the value auto.

135. The zshcompsys(1) man page describes how to further customize the completion. Especially the options group-name and menu-select are recommended.

136. A list of available systems can be obtained by calling the vcs_info_printsys function.

137. https://github.com/gitbuch/buch-scripte

138. https://github.com

139. https://web.archive.org/web/20150303192558/http://gitorious.org

140. https://repo.or.cz

141. https://sourceforge.net

142. https://www.berlios.de

143. https://curl.se

144. https://rubyonrails.org

145. https://jquery.com

146. https://github.com/gollum/gollum

147. https://github.com/github/markup

148. Not to be misunderstood as a project fork, where a project splits due to internal differences.

149. https://github.com/blog/817-behold-image-view-modes

150. https://help.github.com

151. https://www.kernel.org/pub/software/scm/git/

152. https://code.google.com/p/git-osx-installer/

153. https://gitforwindows.org

154. Since a bare repository (see Sec. 7.1.3, “Bare Repositories: Repositories Without Working Tree”) does not have a working tree, the contents normally located in .git form the top level in the directory structure, and there is no additional .git directory.
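
The layout difference from footnote 154 can be inspected with a short sketch. It assumes a throwaway directory and the hypothetical repository names normal and bare.git.

```shell
#!/bin/sh
# Sketch: compare the top level of a bare repository with a regular .git directory.
set -eu
base=$(mktemp -d)
git init -q "$base/normal"
git init -q --bare "$base/bare.git"

echo "contents of normal/.git:"
ls "$base/normal/.git"
echo "contents of bare.git:"
ls "$base/bare.git"

is_bare=$(git -C "$base/bare.git" rev-parse --is-bare-repository)
echo "bare.git is bare: $is_bare"
```

Files such as HEAD and directories such as objects and refs appear directly at the top level of bare.git, whereas the regular repository keeps them inside .git.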

155. This is not to be confused with version control systems that store incremental versions of a file. Within packfiles, objects are packed independently of their semantic context, i.e. especially their temporal sequence.

156. A detailed discussion of the topic can be found at https://metalinguist.wordpress.com/2007/12/06/the-woes-of-git-gc-aggressive-and-how-git-deltas-work/