Book Status
This document is still in Beta version, but fully translated; so enjoy reading it and leave us some feedback on how we might improve it. We’re currently proofreading and polishing the entire text, fixing some styling and formatting issues.
Any help with proofreading is much appreciated; if you wish to contribute submit your changes via pull request on the |
Preface
Git was developed in early 2005 by Linus Torvalds, the creator and current maintainer of the Linux kernel. For the management of the kernel sources, the development team had initially decided to use the commercial version control system BitKeeper. Problems arose when the company behind BitKeeper, which provided the tool to the project free of charge, accused a developer of revealing the mechanisms of the software by reverse engineering. As a result, Torvalds decided to write a new version control system.
Simply switching to another system was not an option: the alternatives had a centralized architecture and did not scale well enough. The demands that the kernel project places on a version control system are enormous: between two minor releases (e.g. 2.6.35 to 2.6.36), more than 500,000 lines change in almost 1000 files, and more than 1000 individuals are responsible for these changes.
So what were the design goals of the new program? Two characteristics crystallized quickly: speed and verifiable integrity of the managed data.
After only a few weeks of work, a first version of Git was able to manage its own source code. Implemented as a small shell script collection with performance-critical parts in C, this version was still far from being a “full-fledged” version control system.
Since version 1.5 (February 2007), Git offers a new and tidier user interface and extensive documentation, allowing people not directly involved in Git development to use it.
The basic concepts have remained the same up to current versions: First and foremost, the object model and index, key features that distinguish Git from other VCS.
The Unix philosophy of “one tool, one job” is also consistently applied here; the subcommands of Git are each independent, executable programs or scripts.
Even in version 2.0 there are still (as at the beginning of development) some subcommands implemented as shell scripts (e.g. git pull).
Linus Torvalds himself hardly does any programming on Git these days; a few months after the first release, Junio C. Hamano took over as maintainer.
Not only the revolutionary approach of Git, but also the fact that the entire kernel development was migrated to Git quickly and successfully, has fueled its rapid rise. Many projects, some of them very large, now use Git and benefit from the flexibility gained in this way.
Who Is This Book Intended For?
The book is aimed at both professional software developers and users who want to work on small scripts, web pages or other documents or who want to get actively involved in an (open source) project. It teaches basic version control techniques, introduces the basics of Git, and explains all the major use cases.
Work that you don’t manage with a version control system is work that you might have to do again—whether it’s because you accidentally delete a file or consider parts obsolete that you need later. For any form of productive text and development work, you need a tool that can record and manage changes to files. Git is flexible, fast, and equally suited for small projects by individuals or large projects involving hundreds of developers, such as the Linux kernel.
Developers who already use a different version control system can benefit from switching to Git. Git allows a much more flexible way of working and is in many respects not as restrictive as comparable systems. It supports true merging and guarantees the integrity of managed data.
Git also benefits open source projects, because each developer has his or her own repository, which prevents disputes over commit privileges. Git also makes it much easier for newcomers to get started.
Although most of the examples and techniques presented refer to source code, there is no fundamental difference to managing documents written in LaTeX, HTML, AsciiDoc or related formats.
How to Read the Book?
Ch. 1, Introduction and First Steps gives a brief overview: How do you initialize a git repository and manage files in it? It also covers the most important configuration settings.
Ch. 2, The Basics covers two key concepts of Git: the index and the object model. Along with other important commands that are introduced there, understanding these two concepts is essential to the safe use of Git.
Ch. 3, Practical Version Control discusses practical aspects of version control. In particular, it covers the branches and merges that are so central to Git. It also discusses how to resolve merge conflicts in detail.
Ch. 4, Advanced Concepts discusses advanced concepts, with a special focus on the Rebase command, an essential tool for any git professional. Other important commands follow, including Blame, Stash, and Bisect.
Only Ch. 5, Distributed Git looks at the distributed aspects of Git: how to share changes between repositories, how developers can collaborate. Then Ch. 6, Workflows provides an overview of strategies for coordinating development work in a project.
We recommend that you read at least the first five chapters in a row. They describe all the important concepts and techniques for using Git safely in large projects. You can read the following chapters in any order, depending on your interests and needs.
Ch. 7, Git Servers covers installation and maintenance of Git services: two web-based repository browsers and access management for hosted repositories using Gitolite.
Ch. 8, Git Automation summarizes various aspects of automation: How to write hooks and custom Git commands, and how to rewrite the complete version history if necessary.
Finally, Ch. 9, Interacting with Other Version Control Systems discusses migration from other systems to Git. The focus here is on converting existing Subversion repositories, and on the ability to talk to Subversion from within Git.
The appendices deal with the installation of Git and its integration into the shell. An outlook on the hosting service GitHub and a detailed description of the structure and maintenance mechanisms of a Git repository provide further background information.
Conventions
All examples are run in the shell. Even though some editors and IDEs now offer quite good Git integration, and even though there are a lot of graphical front-ends for Git, you should first learn the basics with the real Git commands.
The shell prompt is a single dollar sign ($); keyboard input is printed in semi-bold, like this:
$ git status
To find your way around the shell faster and better, we strongly recommend adding Git functionality to the shell, such as displaying the branch in the prompt (see Ch. 10, Shell Integration).
Unless otherwise noted, we refer to Git version 2.0. The examples all run with English locale settings.
Newly introduced terms are written in italics.
Installation and “The Git-Repository”
The installation of Git is described in detail in App. A, Installation. Some examples use the Git source repository, the repository where Git is actively developed. This repository is also called Git-via-Git or git.git.
After you have installed Git, you can download the repository with the following command
$ git clone git://git.kernel.org/pub/scm/git/git.git
The process takes a few minutes, depending on the connection speed and server load.
Documentation and Help
Comprehensive documentation for Git is available in the form of pre-installed man pages.
Almost every subcommand has its own man page, which you can call up in three equivalent ways, shown here for the git status command:
$ git help status
$ git status --help
$ man git-status
On the Git website[1] you can also find links to the official tutorial and other free documentation.
A large, vibrant community has formed around Git. The Git mailing list[2] is the lynchpin of the development: patches are sent in, new features are discussed, and questions about using Git are answered. However, the list, with sometimes more than 100 emails a day, some of them very technical, is only suitable for beginners to a limited extent.
The Git Wiki[3] contains documentation as well as an extensive link collection of tools based on Git[4] and FAQs[5].
Alternatively, the #git IRC channel on the Freenode network is a place to ask questions not already answered in the FAQs or documentation.
For those switching from the Subversion environment, the Git-SVN Crash Course[6] is recommended, a comparison of Git and Subversion commands that will help you transfer your Subversion knowledge to the Git world.
Also worth mentioning is Stack Overflow[7], a platform by programmers for programmers, on which technical issues, including Git, are discussed.
Downloads and Contacts
The sample repositories of the first two chapters and a collection of all longer scripts are available for download at http://gitbu.ch/.
If you have any comments, please contact us by e-mail at one of the following addresses: kontakt@gitbu.ch, valentin@gitbu.ch or julius@gitbu.ch.
Acknowledgements
First of all, we’d like to thank all the developers and maintainers of the Git project as well as the mailing list and the IRC channel.
Many thanks to Sebastian Pipping and Frank Terbeck for comments and tips. Special thanks to Holger Weiß for his review of the manuscript and helpful ideas. We thank the entire Open Source Press Team for the good and efficient cooperation.
Our thanks go especially to our parents, who have always supported and encouraged us.
Valentin Haenel and Julius Plenz — Berlin, June 2011
Preface to the 2nd Edition
In the 2nd edition, we have limited ourselves to carefully recording the changes in the use of Git that were introduced up to version 2.0 — in fact, many commands and error messages are now more consistent, so that in some places this represents a significant simplification of the text. Inspired by questions from Git training courses and our own experience, new hints on problems, solutions, and interesting features are included.
We thank all those who sent in corrections to the first edition: Philipp Hahn, Ralf Krüdewagen, Michael Prokop, Johannes Reinhold, Heiko Schlichting, Markus Weber.
Valentin Haenel and Julius Plenz — Berlin, September 2014
Preface to the Creative Commons Edition
The publisher Open Source Press, who initially convinced us to write this book at all and published it over the past few years, has ceased operations as of 31.12.2015 and has transferred all rights to the published texts back to the authors. We especially thank Markus Wirtz for the always good and productive collaboration that has connected us over many years.
Because the feedback on this text has been mostly very positive, we decided to make it freely available under a Creative Commons license.
Valentin Haenel and Julius Plenz — Berlin/Sydney, January 2016
1. Introduction and First Steps
The following chapter provides a concise introduction to the basic concepts and configuration settings of Git. A small sample project shows how to put a file under version control with Git, and the commands you use to perform the most important tasks.
1.1. Basic Terminology
Some important technical terms will be used repeatedly in the following and therefore require a brief explanation. If you have experience with another version control system, you will be familiar with some of the concepts involved, though perhaps under a different name.
- Version Control System (VCS)
-
A system for managing and versioning software or other digital information. Prominent examples are Git, Subversion, CVS, Mercurial (hg), Darcs and Bazaar. Synonyms are Software Configuration Management (SCM) and Revision Control System.
We distinguish between centralized and distributed systems. In a centralized system, such as Subversion, there must be a central server where the history of the project is stored. All developers must connect to this server to view the version history or make changes. In a distributed system like Git, there are many equivalent instances of the repository, so each developer has their own repository. The exchange of changes is more flexible, and does not necessarily take place through a central server.
- Repository
-
The repository is a database where Git stores the different states of each file in a project over time. In particular, every change is packaged and saved as a commit.
- Working Tree
-
The working directory of Git (sometimes called sandbox or checkout in other systems). This is where you make all modifications to the source code. It’s often called the Working Directory.
- Commit
-
Changes to the working tree, such as modified or new files, are stored in the repository as commits. A commit contains both these changes and metadata, such as the author of the changes, the date and time, and a commit message that describes the changes. A commit always references the status of all managed files at a particular point in time. The various Git commands are used to create, manipulate, view, or change the relationships between commits.
- HEAD
-
A symbolic reference to the newest commit in the current branch. This reference determines which files you find in the working tree for editing. It is therefore the “head” or tip of a development branch (not to be confused with HEAD in systems like CVS or SVN).
- SHA-1
-
The Secure Hash Algorithm creates a unique 160-bit checksum (40 hexadecimal characters) for any digital information. All commits in Git are named after their SHA-1 sum (commit ID), which is calculated from the contents and metadata of the commit. It is, so to speak, a content-dependent version number, such as f785b8f9ba1a1f5b707a2c83145301c807a7d661.
- Object model
-
A git repository can be modeled as a graph of commits, manipulated by git commands. This modeling makes it very easy to describe how Git works in detail. For a detailed description of the object model, see Sec. 2.2, “The Object Model”.
- Index
-
The index is an intermediate level between the working tree and the repository, where you prepare a commit. The index therefore indexes which changes to which files you want to package as commits. This concept is unique to Git and often causes difficulties for beginners and people switching to Git. We discuss the index in detail in Sec. 2.1.1, “Index”.
- Clone
-
When you download a Git repository from the Internet, you create a clone of that repository. The clone contains all the information contained in the source repository, especially the entire version history including all commits.
- Branch
-
A branch in the development. Branches are used in practice, for example, to develop new features, prepare releases, or to provide old versions with bug fixes. Branches are — just like the merging of branches (Merge) — extremely easy to handle in Git and an outstanding feature of the system.
- master
-
Because you need at least one branch to work with Git, the branch master is created when you initialize a new repository. The name is a convention (similar to trunk in other systems); you can rename or delete this branch as you wish, as long as at least one other branch is available. The master branch is technically no different from other branches.
- Tag
-
Tags are symbolic names for hard-to-remember SHA-1 sums. You can use tags to mark important commits, such as releases. A tag can simply be an identifier, such as v1.6.2, or it can contain additional metadata such as author, description, and GPG signature.
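The idea of a content-dependent name, described under SHA-1 above, can be tried out without a repository. The following is a minimal sketch, assuming that Git is installed and uses its default SHA-1 object format. Note that git hash-object computes the ID of a file object (a blob) rather than of a commit, and that the variable names are purely illustrative.

```shell
# Hash the same content twice and a slightly different content once.
# git hash-object prints the ID that Git would assign to the data.
id_a=$(printf 'Hello World!\n' | git hash-object --stdin)
id_b=$(printf 'Hello World!\n' | git hash-object --stdin)
id_c=$(printf 'Hello, World!\n' | git hash-object --stdin)

echo "same content:      $id_a"
echo "same content:      $id_b"
echo "different content: $id_c"
echo "length of an ID:   ${#id_a} hexadecimal characters"
```

Identical content always yields the identical ID, while even a single added comma produces a completely different one.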
1.2. First Steps with Git
To get you started, we’ll use a small example to illustrate the workflow with Git. We create a repository and develop a one-liner, a “Hello, World!” program in Perl.
In order for Git to assign a commit to an author, you need to enter your name and email address:
$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"
Note that a subcommand is specified when Git is called, in this case config.
Git provides all operations through such subcommands.
It is also important that no equals sign is used when calling git config.
The following call is therefore incorrect:
$ git config --global user.name = "John Doe"
This is a pitfall, especially for beginners, because Git does not output an error message, but takes the equals sign as the value to set.
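The pitfall can be reproduced safely with a throwaway configuration file. This is only a sketch: the --file option of git config and the temporary file are assumptions made here so that your real settings stay untouched; they are not part of the example above.

```shell
cfg=$(mktemp)   # throwaway configuration file (illustrative)

# Incorrect: the equals sign is passed as a separate argument
# and ends up as the stored value.
git config --file "$cfg" user.name = "John Doe"
wrong=$(git config --file "$cfg" user.name)

# Correct: the value follows the option name directly.
git config --file "$cfg" user.name "John Doe"
right=$(git config --file "$cfg" user.name)

echo "with equals sign:    $wrong"
echo "without equals sign: $right"
rm -f "$cfg"
```

Reading the value back right after setting it is a cheap way to catch this kind of mistake.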
1.2.1. Our First Repository
Before we use Git to manage files, we need to create a repository for the sample project. The repository will be created locally, so it will only be on the file system of the machine you are working on.
It’s generally recommended that you practice using Git locally first, and only later dive into the decentralized features and functions of Git.
$ git init example
Initialized empty Git repository in /home/esc/example/.git/
First, Git creates the directory example/ if it doesn’t already exist.
Git then initializes an empty repository in this directory and creates a subdirectory .git/ for it, which is used to manage internal data.
If the example/ directory already exists, Git creates a new Git repository in it.
If both the directory and a repository already exist, Git does nothing.
We change to the directory and look at the current state with git status:
$ cd example
$ git status
On branch master
Initial commit
nothing to commit (create/copy files and use "git add" to track)
Git tells us that we are about to create the first commit (Initial commit), but that it hasn’t found anything to commit (nothing to commit).
Instead, it gives a hint as to what the next steps should be (most Git commands do that, by the way): “Create or copy files, and use git add to manage them with Git.”
1.2.2. Our First Commit
Now let’s give Git a first file to manage, which is a “Hello World!” program in Perl. Of course, you can write any program in the programming language of your choice instead.
We’ll first create the hello.pl file with the following content
print "Hello World!\n";
and execute the script once:
$ perl hello.pl
Hello World!
That means we’re ready to manage the file with Git.
But let’s take a look at the output of git status first:
$ git status
On branch master
Initial commit
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        hello.pl
nothing added to commit but untracked files present (use "git add" to track)
While the first commit is still pending, Git registers that there are already files in the directory that the system does not yet know about; Git calls them untracked.
This is, of course, our little Perl program.
To manage it with Git, we use the command git add <file>:
$ git add hello.pl
The add generally stands for “add changes”, so you will need it whenever you have edited files, not just when you first add them!
Git doesn’t provide output for this command.
Use git status to check if the call was successful:
$ git status
On branch master
Initial commit
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.pl
Git will apply the changes — our new file — at the next commit. However, this commit is not yet complete — we’ve only prepared it so far.
To be precise, we’ve added the file to the Index, an intermediate stage where you collect changes that will be included in the next commit. For further explanation of this concept, see Sec. 2.1.1, “Index”.
With git status, under Changes to be committed, you can always see which files are in the Index, i.e., will be included in the next commit.
Everything is ready for the first commit with the git commit command.
We also pass the -m option on the command line with a commit message describing the commit:
$ git commit -m "First version"
[master (root-commit) 07cc103] First version
 1 file changed, 1 insertion(+)
 create mode 100644 hello.pl
Git confirms that the process has been completed successfully and that the file is managed from now on.
The somewhat cryptic output means that Git has created the initial commit (root-commit) with the appropriate message.
A line has been added to a file, and the file has been created with Unix permissions 0644.[8]
As you’ve no doubt noticed by now, git status is an indispensable command in your daily work. We’ll use it again here:
$ git status
On branch master
nothing to commit, working directory clean
Our sample repository is now “clean”, because there are no changes in the Working Tree or Index, nor are there any files that are not managed with Git (untracked files).
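The whole sequence from this section can be replayed in one go. The following sketch runs in a temporary directory and uses git status --porcelain, a terse, script-friendly variant of git status that is not introduced in the text; the directory and variable names are assumptions made for illustration.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init "$work/example"
cd "$work/example"
git config user.name "John Doe"   # identity for this repository only
git config user.email "john.doe@example.com"

printf 'print "Hello World!\\n";\n' > hello.pl
before=$(git status --porcelain)  # file is still untracked

git add hello.pl
staged=$(git status --porcelain)  # file is in the index

git commit -m "First version"
after=$(git status --porcelain)   # nothing left to report

echo "untracked: $before"
echo "staged:    $staged"
echo "clean:     '$after'"
```

The three snapshots correspond to the three git status outputs shown above: untracked, ready to be committed, and clean.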
1.2.3. Viewing Commits
To conclude this brief introduction, we’ll introduce you to two very useful commands that you’ll often use to examine the version history of projects.
First, git show allows you to examine a single commit; with no arguments, it shows the most recent one:
$ git show
commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100
    First version
diff --git a/hello.pl b/hello.pl
new file mode 100644
index 0000000..fa5a091
--- /dev/null
+++ b/hello.pl
@@ -0,0 +1 @@
+print "Hello World!\n";
You see all relevant information about the commit: the commit ID, the author, the date and time of the commit, the commit message, and a summary of the changes in Unified-Diff format.
By default, git show always prints the HEAD (a symbolic name for the most recent commit), but you could also specify, for example, the commit ID, which is the SHA-1 checksum of the commit, a unique prefix of it, or the branch (master in this case).
Thus, the following commands are equivalent in this example:
$ git show
$ git show HEAD
$ git show master
$ git show 07cc103
$ git show 07cc103feb393a93616842921a7bec285178fd56
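That these names all refer to the same commit can be checked with git rev-parse, which resolves a name to a full commit ID. The command is not introduced in the text, so treat this as a sketch; it builds its own throwaway repository and asks Git for the branch name, because the name of the initial branch depends on the Git version and configuration.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

branch=$(git symbolic-ref --short HEAD)   # name of the current branch
full_id=$(git rev-parse HEAD)             # full commit ID
short_id=$(git rev-parse --short HEAD)    # unique prefix of the ID

echo "HEAD      resolves to $full_id"
echo "$branch   resolves to $(git rev-parse "$branch")"
echo "$short_id resolves to $(git rev-parse "$short_id")"
```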
If you want to view more than one commit, git log is recommended.
More commits are needed to demonstrate the command in a meaningful way; otherwise, the output would be very similar to git show, since the sample repository currently contains only a single commit.
So let’s add the following comment line to the “Hello World!” program:
# Hello World! in Perl
For the sake of the exercise, let’s take another look at the current status with git status:
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
        modified:   hello.pl
no changes added to commit (use "git add" and/or "git commit -a")
After that, as already described in the output of the command, use git add to add the changes to the index.
As mentioned earlier, git add is used both to add new files and to add changes to files already managed.
$ git add hello.pl
Then create a commit:
$ git commit -m "Comment line"
[master 8788e46] Comment line
 1 file changed, 1 insertion(+)
Now git log shows you the two commits:
$ git log
commit 8788e46167aec2f6be92c94c905df3b430f6ecd6
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Fri May 27 12:52:58 2011 +0200
    Comment line
commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100
    First version
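For a quick overview, git log can also print one compact line per commit. The sketch below recreates the two commits in a temporary repository; the --format option and git rev-list --count are real Git features that are not covered in this section, and the variable names are illustrative.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config user.name "John Doe"
git config user.email "john.doe@example.com"

printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

# Second version: the same program with a leading comment line
printf '# Hello World! in Perl\nprint "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "Comment line"

# Abbreviated ID, author name and subject, newest commit first
git log --format='%h %an %s'

commit_count=$(git rev-list --count HEAD)
newest_subject=$(git log -1 --format=%s)
```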
1.3. Configuring Git
Like most text-based programs, Git offers a wealth of configuration options. So now’s the time to do some basic configuration. This includes color settings, which are turned on by default in newer versions and make the output of Git commands easier to read, and small aliases (abbreviations) for frequently needed commands.
You configure Git with the git config command.
The configuration is saved in a format similar to an INI file.
Without further parameters, the configuration applies only to the current repository (.git/config).
With the --global option, it is stored in the .gitconfig file in the user’s home directory, and is then valid for all repositories.[9]
Important settings that you should always configure are the user name and e-mail address:
$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"
Note that you must protect spaces in the setting value (using quotation marks or backslashes).
Also, the value follows the name of the option directly; an equals sign is not necessary here either.
The result of the command can be found in the file ~/.gitconfig:
$ less ~/.gitconfig
[user]
    name = John Doe
    email = john.doe@example.com
The settings are now “global”, meaning they apply to all repositories you edit under that user name.
If you want to specify an e-mail address other than your globally defined one for a particular project, simply change the setting there (this time, of course, without adding --global):
$ git config user.email maintainer@project.example.com
When querying an option, Git will first use the setting in the current repository if it exists, otherwise the one from the global .gitconfig; if this does not exist either, it will fall back to the default value.[10]
The default values of all options are listed in the git-config man page.
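The lookup order can be observed directly. The sketch below points HOME and XDG_CONFIG_HOME at a temporary directory for each Git call, so that --global writes to a throwaway .gitconfig instead of your real one; the helper function sgit and the directory names are hypothetical, and the sketch assumes that GIT_CONFIG_GLOBAL is not set in your environment.

```shell
fake_home=$(mktemp -d)            # stands in for the home directory

# Hypothetical helper: run git with the temporary home directory
sgit() {
    env HOME="$fake_home" XDG_CONFIG_HOME="$fake_home/.config" git "$@"
}

sgit init -q "$fake_home/example"
cd "$fake_home/example"

sgit config --global user.email "john.doe@example.com"
inherited=$(sgit config user.email)      # no local value yet

sgit config user.email "maintainer@project.example.com"
effective=$(sgit config user.email)      # local value takes precedence
global_value=$(sgit config --global user.email)

echo "before local setting: $inherited"
echo "after local setting:  $effective"
echo "global value:         $global_value"
```

The global value is left unchanged; it is merely shadowed inside this one repository.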
You can get a list of all the settings you have set using git config -l.
You can also edit the .gitconfig file (or, in the repository, .git/config) by hand.
This is especially useful for deleting a setting: although git config also offers a --unset option, it is easier to delete the corresponding line in an editor.
The commands |
Note, however, that when you set options with an appropriate command, Git automatically protects problematic characters in the option’s value so that no bad configuration files are created.
1.3.1. Git Aliases
Git offers you the possibility to abbreviate single commands and even whole command sequences via Aliases. The syntax is:
$ git config alias.<alias-name> <command>
To set st as an alias for status:
$ git config --global alias.st status
$ git st
On branch master
...
You can also include options in an alias, for example:
$ git config --global alias.gconfig 'config --global'
You will find more useful aliases later in the book; how to create more complex aliases is described in Sec. 8.3.8, “Extended Aliases”. But first, some useful abbreviations:
[alias]
    st = status
    ci = commit
    br = branch
    co = checkout
    df = diff
    he = help
    cl = clone
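To experiment with aliases without touching your global configuration, you can define them in a single repository by leaving out --global. The following sketch does this in a temporary repository; the alias name last and the file notes.txt are made up for this illustration.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

git config alias.st status                   # plain abbreviation
git config alias.last 'log -1 --format=%s'   # alias that includes options

printf 'scratch\n' > notes.txt    # untracked file, so status has output
git st --short                    # further options are passed through
last_subject=$(git last)
echo "subject of the last commit: $last_subject"
```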
1.3.2. Adjusting Colours
Very helpful is the color.ui option, which controls whether Git colors the output of various commands.
With colors enabled, deleted files and lines appear red, new files and lines appear green, commit IDs appear yellow, etc.
In newer Git versions (1.8.4 and later) this setting is already enabled by default, so you don’t need to do anything.
The color.ui option should be set to auto: if the output from Git goes to a terminal, colors are used.
If the output is written to a file instead, or piped to another program, Git will not emit color sequences, as this could interfere with automatic processing.
$ git config --global color.ui auto
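The effect of auto can be made visible by capturing the output of a command, which means Git is not writing to a terminal. The sketch below compares auto with the value always, which forces colors and is mentioned here only for contrast; it assumes that no other color settings in your configuration override color.ui, and all names are illustrative.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

esc=$(printf '\033')              # the escape character that starts a color sequence

piped=$(git -c color.ui=auto log --oneline)     # output is captured, not a terminal
forced=$(git -c color.ui=always log --oneline)  # colors regardless of destination

case "$piped"  in *"$esc"*) piped_has_color=yes ;;  *) piped_has_color=no ;;  esac
case "$forced" in *"$esc"*) forced_has_color=yes ;; *) forced_has_color=no ;; esac

echo "auto, captured:   color sequences present? $piped_has_color"
echo "always, captured: color sequences present? $forced_has_color"
```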
1.3.3. Configuring Character Sets
Unless set otherwise, Git assumes UTF-8 as the character encoding for all text, especially author names and the commit message. If you want a different encoding, you should configure it explicitly:[11]
$ git config i18n.commitEncoding ISO-8859-1
Similarly, the setting i18n.logOutputEncoding determines the character set Git converts names and commit messages to before outputting them.
The encoding of the files managed by Git is not important here and is not affected by these settings; files are only bit streams that Git does not interpret.
If you have to handle files encoded according to ISO-8859-1 in a UTF-8 environment, you should adjust the setting of your pager (see below) accordingly. The following setting has worked well for the authors:
$ git config core.pager 'env LESSCHARSET=iso8859 less'
1.3.4. Line End Settings
Since Git runs on Windows systems as well as on Unix-like systems, it has to solve the problem of different line-ending conventions. (This only affects text files; binaries that Git recognizes as such are excluded from this treatment.)
The core.eol setting, which can take one of the values lf, crlf or native, is mainly relevant for this.
The default setting native lets Git use the system default: Line Feed (lf) only on Unix, Carriage Return & Line Feed (crlf) on Windows.
Files are automatically converted so that they contain line feeds only, but are checked out with CRLF if necessary.
Git can convert between the two types when you check out a file, but it’s important not to mix the two.
For this, the core.safecrlf option provides a mechanism to warn the user (value warn) or even disallow the commit (value true).
A safe setting, which also works with older Git versions on Windows systems, is to set core.autocrlf to input: this will automatically replace CRLF with LF when reading files from the filesystem.
Your editor must then be able to handle LF line endings accordingly.
You can also specify these settings explicitly per file or subdirectory, so that the format is the same across all platforms (see Sec. 8.1, “Git Attributes — Treating Files Separately”).
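The effect of core.autocrlf input can be verified by counting carriage returns before and after a file has been added. This is a minimal sketch in a temporary repository; the file name notes.txt is made up, and git cat-file -p :notes.txt, which prints the version of the file stored in the index, is not covered in this chapter.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config core.autocrlf input    # applies to this repository only

# A text file with Windows-style (CRLF) line endings
printf 'first line\r\nsecond line\r\n' > notes.txt
git add notes.txt                 # Git may print a warning about the conversion

# Count carriage returns in the working tree file and in the index
cr_in_worktree=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
cr_in_index=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')

echo "carriage returns in the working tree: $cr_in_worktree"
echo "carriage returns in the index:        $cr_in_index"
```

The file in the working tree is left as it is; only the copy that Git stores is normalized to LF.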
1.3.5. Editor, Pager and Browser Settings
Git automatically starts an editor, pager, or browser for certain actions. Usually reasonable defaults are used, but if not, you can configure your preferred program with the following options:
- core.editor
- core.pager
- web.browser
A word about the pager: By default, Git uses the less program, which is installed on most basic systems.
The pager is started whenever a Git command produces output on a terminal.
However, less is automatically configured via an environment variable to quit when the output fits completely on the terminal.
So, if a command produces a lot of output, less will automatically come to the foreground, and remain invisible otherwise.
If core.pager is set to cat, Git will not use a pager.
The same behavior can also be achieved for a single command using the --no-pager parameter.
In addition, you can use git config pager.diff false to ensure that the output of the diff command is never sent to the pager.
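The three ways of switching off the pager can be tried side by side in a throwaway repository. In this sketch, pager.log is used instead of pager.diff simply because the sample repository has a log to show; the settings are stored without --global and therefore only affect the temporary repository.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

git --no-pager log --oneline      # no pager for this single call

git config pager.log false        # never page the output of git log here
git log --oneline

git config core.pager cat         # no pager for any command in this repository
git log --oneline
```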
1.3.6. Configuration via Environment Variables
Some options can also be overridden by environment variables. In this way, options can be set in a shell script or alias for a single command only.
- GIT_EDITOR
-
the editor that Git starts, for example, to create the commit message. Alternatively, Git uses the EDITOR variable.
- GIT_PAGER
-
the pager to be used. The value cat switches the pager off.
- GIT_AUTHOR_EMAIL, GIT_COMMITTER_EMAIL
-
uses the appropriate email address for the author or committer field when creating a commit.
- GIT_AUTHOR_NAME, GIT_COMMITTER_NAME
-
analogous for the name.
- GIT_DIR
-
Directory in which the Git repository is located; only makes sense if a repository is explicitly stored under a directory other than .git.
The latter variable is useful, for example, if you want to access the version history of another repository within a project without changing directory:
$ GIT_DIR=~/proj/example/.git git log
Alternatively, you can use the -c option before the subcommand to override a setting for this call only.
For example, you could tell Git to disable the core.trustctime option for the upcoming call:
$ git -c core.trustctime=false status
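The following sketch shows two of these variables in action: GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL override the configured author for a single commit, and GIT_DIR lets you read the history from outside the repository. The names Jane Roe and John Doe, the paths, and the shell variables are illustrative, and the sketch assumes that no GIT_COMMITTER_* variables are set in your environment.

```shell
work=$(mktemp -d)                 # scratch directory (illustrative)
git init -q "$work/example"
cd "$work/example"
git config user.name "John Doe"   # configured identity of this repository
git config user.email "john.doe@example.com"
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl

# The variables apply to this one command only
GIT_AUTHOR_NAME="Jane Roe" GIT_AUTHOR_EMAIL="jane.roe@example.com" \
    git commit -q -m "First version"

author=$(git log -1 --format='%an <%ae>')   # taken from the environment
committer=$(git log -1 --format='%cn')      # taken from the configuration

cd "$work"                        # leave the repository
subject=$(GIT_DIR="$work/example/.git" git log -1 --format=%s)

echo "author:    $author"
echo "committer: $committer"
echo "subject read via GIT_DIR: $subject"
```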
1.3.7. Automatic Error Correction
The value of the help.autocorrect option determines what Git should do if it can’t find the subcommand you entered, for example if you accidentally type git statsu instead of git status.
If the option is set to a number n greater than zero and Git finds exactly one subcommand similar to the typed command, this command is executed after n tenths of a second.
A value of -1 executes the command immediately.
If the option is unset or has the value 0, Git only lists the possible candidates.
So to correct a typo after one second, set:
$ git config --global help.autocorrect 10
$ git statsu
WARNING: You called a Git command named 'statsu', which does not exist.
Continuing under the assumption that you meant 'status'
in 1.0 seconds automatically...
[...]
You can of course cancel the command during this time with Ctrl+C.
2. The Basics
In this chapter, we’ll introduce you to the most important Git commands that you can use to manage your project files in Git. Understanding the Git object model is essential for advanced usage; we’ll cover this important concept in the second section of the chapter. While these explanations may seem overly theoretical at first, we encourage you to read them carefully. All further actions will be much easier for you with the knowledge of this background.
2.1. Git Commands
The commands you learned to get started (especially add and commit) work on the index. In the following, we will take a closer look at the index and the extended use of these commands.
2.1.1. Index
The content of files for Git resides on three levels: the working tree, the index, and the Git repository.
The working tree corresponds to the files as they reside on your workstation’s file system. So if you edit files with an editor, search in them with grep, etc., you always operate on the working tree.
The repository is the store for commits, that is, changes with author, date, and description. The commits together make up the version history.
Unlike many other version control systems, Git introduces an additional level, the index. It is a somewhat elusive intermediate level between the working tree and the repository. Its purpose is to prepare commits. This means that you don’t have to check in all the changes you have made to a file in a single commit.
The Git commands add and reset act (in their basic form) on the index, making changes to the index and removing them again; only the commit command transfers the file to the repository as it is held in the index (Figure 1, “Commands add, reset and commit”).
Figure 1. Commands add, reset and commit
In the initial state, i.e. when git status outputs the message nothing to commit, the working tree and index are synchronized with HEAD. The index is therefore not “empty”, but contains the files in the same state as they are in the working tree.
Usually, the workflow is then as follows: First, you make a change to the working tree using an editor. This change is transferred to the index by add and finally saved in the repository by commit.
You can display the differences between these three levels using the diff command. A simple git diff shows the differences between the working tree and the index, that is, the differences between the (actual) files on your working system and the files as they would be checked in if you called git commit.
The git diff --staged command, on the other hand, shows the differences between the index (also called the staging area) and the repository, that is, the differences that a commit would record in the repository. In the initial state, when the working tree and index are in sync with HEAD, neither git diff nor git diff --staged produces output.
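The interplay of the three levels can be sketched as a toy model. This is only an illustration under simplifying assumptions (each level is a plain dictionary from file name to content, and the helper names add, reset, commit and diff are chosen to mirror the Git commands); it is not how Git stores data internally.

```python
# Toy model: each level maps a file name to its content.
head = {"hello.pl": 'print "Hello World!\\n";\n'}
index = dict(head)       # initial state: the index is in sync with HEAD
worktree = dict(head)    # ... and so is the working tree


def diff(old, new):
    """Return the names of files whose content differs between two levels."""
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


def add(name):
    """Like `git add <name>`: copy the working tree version into the index."""
    index[name] = worktree[name]


def reset(name):
    """Like `git reset HEAD <name>`: copy the HEAD version back into the index."""
    index[name] = head[name]


def commit():
    """Like `git commit`: the index becomes the new HEAD snapshot."""
    head.clear()
    head.update(index)


# Edit the file in the working tree only.
worktree["hello.pl"] = "#!/usr/bin/perl\n" + worktree["hello.pl"]
unstaged_before = diff(index, worktree)   # what `git diff` would report
staged_before = diff(head, index)         # what `git diff --staged` would report

add("hello.pl")
unstaged_after = diff(index, worktree)
staged_after = diff(head, index)

commit()
staged_final = diff(head, index)
```

Before add, only the comparison of index and working tree reports hello.pl; after add, only the comparison of HEAD and index does; after commit, all three levels agree again.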
If you want to apply all changes to all files, there are two shortcuts: First, the -u or --update option of git add. This transfers all changes to files already managed by Git to the index, but does not yet create a commit.
You can abbreviate this further with the -a or --all option of git commit. This is a combination of git add -u and git commit, which puts all changes to all managed files into one commit, bypassing the index as a separate step. Avoid getting into the habit of using these options: they may be handy as shortcuts on occasion, but they reduce flexibility.
2.1.1.1. Word-Based Diff
An alternative output format for git diff is the so-called word diff, which is available via the --word-diff option. Instead of the removed and added lines, the output of git diff then shows the added (green) and removed (red) words, marked with an appropriate syntax and color-coded.[12]
This is useful when you are only changing single words in a file, for example when correcting AsciiDoc or LaTeX documents, because a diff is difficult to read if added and removed lines differ by only one word:
$ git diff
...
- die Option `--color-words` zur Verfgung steht. Statt der entfernten
+ die Option `--color-words` zur Verfügung steht. Statt der entfernten
...
However, if you use the --word-diff option, only the words that have changed are displayed and marked accordingly. In addition, line breaks are ignored, which is also very practical because re-wrapping the text does not show up as a change in the diff output:
$ git diff --word-diff
...
--color-words zur [-Verfgung-]{+Verfügung+} steht.
...
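The idea behind a word-based diff can be sketched in a few lines of Python. This is a minimal illustration and not Git's implementation: it assumes that words are separated by whitespace and uses the standard library's difflib to compare the word sequences; the function name word_diff is hypothetical.

```python
import difflib


def word_diff(old_text, new_text):
    """Mark removed words as [-word-] and added words as {+word+}."""
    # Splitting on any whitespace means that line breaks are ignored.
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        removed = "[-" + " ".join(old_words[i1:i2]) + "-]" if i2 > i1 else ""
        added = "{+" + " ".join(new_words[j1:j2]) + "+}" if j2 > j1 else ""
        out.append(removed + added)
    return " ".join(out)


old = "die Option `--color-words` zur Verfgung steht."
new = "die Option `--color-words` zur Verfügung steht."
print(word_diff(old, new))
```

Because the comparison works on words rather than lines, a text that was merely re-wrapped produces no markers at all.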
If you work a lot with continuous text, it is a good idea to set up an alias to abbreviate this command, so that you only have to type git dw, for example:
$ git config --global alias.dw "diff --word-diff"
2.1.2. Creating Commits Step by Step
But why create commits step-by-step — don’t you always want to check in all changes?
Yes, of course, you usually want to commit your changes completely. However, it can be useful to check them in step by step, for example, to better reflect the development history.
An example: You have worked intensively on your software project for the past three hours, but because it was so exciting, you forgot to pack the four new features into handy commits. In addition, the features are scattered over various files.
Ideally, you want to be selective, that is, you don’t want to commit all changes from one file, but only certain lines (functions, definitions, tests, …), and from different files.
Git’s index provides the flexibility you need for this. You collect some changes in the index and pack them into a commit — but all other changes are still preserved in the files.
We’ll illustrate this using the “Hello World!” example from the previous chapter.
As a reminder, the contents of the hello.pl file:
# Hello World! in Perl
print "Hello World!\n";
Now we prepare the file so that it has several independent changes that we don’t want to combine into a single commit.
First, we add a shebang line at the beginning.[13] We also add a line naming the author, and the Perl statement use strict, which tells the Perl interpreter to be as strict as possible in its syntax analysis.
It is important for our example that the file has been changed in several places:
#!/usr/bin/perl
# Hello World! in Perl
# Author: Valentin Haenel
use strict;
print "Hello World!\n";
With a simple git add hello.pl all new lines would be added to the index, so the state of the file in the index would be the same as in the working tree. Instead, we use the --patch option, or -p for short.[14] This has the effect that we are interactively asked which changes we want to add to the index. Git offers us each change one by one, and we can decide on a case-by-case basis how we want to handle them:
$ git add -p
diff --git a/hello.pl b/hello.pl
index c6f28d5..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,5 @@
+#!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
Stage this hunk [y,n,q,a,d,/,s,e,?]?
Here Git shows all changes at once, since they’re very close together in the code. If the changes are far apart or spread across different files, they’re offered separately. The term hunk refers to a group of neighbouring changed lines in the source code. At this point Git asks:
Stage this hunk [y,n,q,a,d,/,s,e,?]?
The options are each only one letter long and difficult to remember. Entering ? always displays a short reminder. We have summarized the most important options below.
y (yes) | Transfer the current hunk to the index. |
n (no) | Don’t pick up the current hunk. |
q (quit) | Do not pick up the current hunk or any of the following ones. |
a (all) | Pick up the current hunk and all those that follow (in the current file). |
s (split) | Try to split the current hunk. |
e (edit) | Edit the current hunk.[15] |
In the example we split the current hunk and enter s for split.
Stage this hunk [y,n,q,a,d,/,s,e,?]? [s]
Split into 2 hunks.
@@ -1 +1,2 @@
+#!/usr/bin/perl
 # Hello World! in Perl
Git confirms that the hunk was successfully split, and now offers us a diff that contains only the shebang line.[16] We answer y for yes, and q for quit on the next hunk. To check if everything worked, we use git diff with the --staged option, which shows the difference between index and HEAD (the latest commit):
$ git diff --staged
diff --git a/hello.pl b/hello.pl
index c6f28d5..d2cc6dc 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,3 @@
+#!/usr/bin/perl
 # Hello World! in Perl
 print "Hello World!\n";
To see which changes are not yet in the index, a simple call to git diff is enough. As expected, it shows that two lines are still only in the working tree:
$ git diff
diff --git a/hello.pl b/hello.pl
index d2cc6dc..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,3 +1,5 @@
 #!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
At this point we could create a commit, but for demonstration purposes we want to start from scratch.
So we use git reset HEAD to reset the index.
$ git reset HEAD
Unstaged changes after reset:
M hello.pl
Git confirms the reset and names the files that have changes in them; in this case, it’s just the one.
The git reset command is in a sense the counterpart of git add: Instead of transferring differences from the working tree to the index, reset transfers differences from the repository to the index. Transferring changes on into the working tree is potentially destructive, as your changes may be lost. Therefore, this is only possible with the --hard option, which we discuss in Sec. 3.2.3, “Reset and the Index”.
If you frequently use git add -p, it is only a matter of time before you accidentally select a hunk you didn’t want. If the index contained no other changes, this is not a problem since you can reset it to start over. It only becomes a problem if you have already recorded many changes in the index and don’t want to lose them, i.e. you want to remove a particular hunk from the index without touching the other hunks.
Analogous to git add -p there is the command git reset -p, which removes single hunks from the index. To demonstrate this, let’s first apply all changes with git add hello.pl and then run git reset -p.
$ git reset -p
diff --git a/hello.pl b/hello.pl
index c6f28d5..908e967 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,5 @@
+#!/usr/bin/perl
 # Hello World! in Perl
+# Author: Valentin Haenel
+use strict;
 print "Hello World!\n";
Unstage this hunk [y,n,q,a,d,/,s,e,?]?
As in the example with git add -p, Git offers hunks one by one, but this time all the hunks in the index. Accordingly, the question is: Unstage this hunk [y,n,q,a,d,/,s,e,?]?, i.e. whether we want to remove the hunk from the index again. As before, by entering the question mark we get an extended description of the available options. At this point we press s once for split, n once for no and y once for yes. Now only the shebang line should be in the index:
$ git diff --staged
diff --git a/hello.pl b/hello.pl
index c6f28d5..d2cc6dc 100644
--- a/hello.pl
+++ b/hello.pl
@@ -1,2 +1,3 @@
+#!/usr/bin/perl
 # Hello World! in Perl
 print "Hello World!\n";
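In the interactive modes of git add -p and git reset -p, the following setting lets you answer with a single keypress, without pressing Enter afterwards:
$ git config --global interactive.singlekey true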
A word of warning: git add -p may tempt you to check in versions of a file that are not executable or syntactically correct (e.g. because you forgot an essential line). So don’t rely on your commit being correct just because make (which works on the working tree files!) runs successfully.
Even if a later commit fixes the problem, it will still be a problem, among other things, with automated debugging via bisect (see Sec. 4.8, “Finding Regressions — Git Bisect”).
2.1.3. Creating Commits
You now know how to exchange changes between working tree, index, and repository. Let’s turn to the git commit command, which you use to “commit” changes to the repository.
A commit records the state of all the files in your project at a given point in time, and also contains meta-information:[17]
-
Name of the author and e-mail address
-
Name of the committer and e-mail address
-
Creation date
-
Commit date
In fact, the name of the author does not have to be the name of the committer (who commits). Often, commits are integrated or edited by maintainers (for example, by rebase, which also adjusts the committer information, see Sec. 4.1, “Moving commits — Rebase”). The committer information is usually of secondary importance, though — most programs only show the author and the date the commit was made.
When you create a commit, Git uses the user.name and user.email settings configured in the previous section to identify the author and committer.
If you call git commit without any additional arguments, Git will combine all changes in the index into one commit, and open an editor to create a commit message. The message template already contains instructions, commented out with hash marks (#), and information about which files are changed by the commit. If you call git commit -v, you will additionally get a diff of the changes you are about to check in, below the instructions. This is especially useful for keeping track of the changes, and for using the auto-complete feature of your editor.
Once you exit the editor, Git creates the commit. If you don’t specify a commit message or delete the entire contents of the file, Git will abort and not create a commit.
If you only want to write one line, you can use the --message option, or -m for short, which allows you to specify the message directly on the command line, thus bypassing the editor:
$ git commit -m "This is the commit message"
2.1.3.1. Improving a Commit
If you rashly entered git commit, but want to make the commit slightly better, the --amend (“correct”) option helps. The option causes Git to “add” the changes in the index to the commit you just made.[18] You can also adjust the commit message. Note that the SHA-1 sum of the commit will change in any case.
The git commit --amend call only changes the current commit on a branch. Sec. 4.1.9, “Improving a Commit” describes how to improve past commits.
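Calling git commit --amend opens the editor for the commit message. If you only want to add a change without editing the message, an alias such as fixup is handy:
$ git config --global alias.fixup "commit --amend --no-edit"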
2.1.3.2. Good Commit Messages
What should a commit message look like? Not much can be changed in the outer form: The commit message must be at least one line long, but preferably no longer than 50 characters. This makes lists of commits easier to read. If you want to add a more detailed description (which is highly recommended!), separate it from the first line with a blank line. No line should be longer than 76 characters, as is usual for email.
Commit messages often follow the habits or specifics of a project. There may be conventions, such as references to the bug tracking or issue system, or a link to the appropriate API documentation.
Note the following points when writing a commit description:
-
Never create empty commit messages. Commit messages such as Update, Fix, Improvement, etc. are just as meaningful as an empty message; you might as well leave the message out.
-
Very important: Describe why something was changed and what the implications are. What has been changed is always obvious from the diff!
-
Be critical and note if you think there is room for improvement or the commit may introduce bugs elsewhere.
-
The first line should not be longer than 50 characters, so the output of the version history always remains well formatted and readable.
-
If the message becomes longer, a short summary (with the important keywords) should be in the first line. After a blank line follows an extensive description.
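The formal rules above (a first line of at most 50 characters, a blank line before the detailed description, no line longer than 76 characters) can be checked mechanically. The following is a minimal sketch; the function name check_commit_message and the wording of the reported problems are hypothetical, and the check says nothing about whether the message explains why something was changed.

```python
def check_commit_message(message, subject_max=50, line_max=76):
    """Return a list of layout problems; an empty list means the layout is fine."""
    lines = message.rstrip("\n").split("\n")
    if not lines[0].strip():
        return ["empty commit message"]
    problems = []
    if len(lines[0]) > subject_max:
        problems.append("first line longer than %d characters" % subject_max)
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line after the first line")
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > line_max:
            problems.append("line %d longer than %d characters" % (number, line_max))
    return problems


good = "Fix crash on empty input\n\nThe parser assumed at least one line.\n"
print(check_commit_message(good))
print(check_commit_message("Update everything\nand more"))
```

A check like this could, for example, be wired into a hook script, but it can only verify the outer form, not the content.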
We can’t stress enough how important a good commit description is. When committing, a developer remembers the changes well, but after a few days, the motivation behind them is often forgotten. Your colleagues or project members will thank you, too, because they can understand changes much faster.
Writing a good commit message also helps to briefly reflect on what has been done and what is still to come. You may find that you’ve forgotten one important detail as you write it.
You can also argue about a timeline: The time it takes you to write a good commit message is a minute or two. But how much less time will the bug-finding process take if each commit is well documented? How much time will you save others (and yourself) if you provide a good description of a diff, which may be hard to understand? Also, the blame tool, which annotates each line of a file with the commit that last changed it, will become an indispensable tool for detailed commit descriptions (see Sec. 4.3, “Who Made These Changes? — Git Blame”).
If you are not used to writing detailed commit messages, start today. Practice makes perfect, and once you get used to it, the work will go quickly — you and others will benefit.
The Git repository is a prime example of good commit messaging. Without knowing the details of Git, you’ll quickly know who changed what and why. You can also see how many hands a commit goes through before it’s integrated.
Unfortunately, the commit messages in most projects are still very spartan, so don’t be disappointed if your peers are lazy about writing, but rather set a good example and provide detailed descriptions.
2.1.4. Moving and Deleting Files
If you want to delete or move files managed by Git, use git rm or git mv. They act like the regular Unix commands, but they also modify the index so that the action is included in the next commit.[19]
Like the standard Unix commands, git rm also accepts the -r and -f options to delete recursively or to force deletion. git mv also offers an option -f (force) if the new filename already exists and should be overwritten. Both commands accept the option -n or --dry-run, which simulates the process and does not modify files.
To delete a file from the index only, use git rm --cached <file>; the file itself remains in the working tree.
You will often forget to move a file via git mv or delete it via git rm, and use the standard Unix commands instead. In this case, simply mark the file (already deleted by rm) as deleted in the index, too, using git rm <file>.
To rename the file, proceed as follows: First mark the old file name as deleted using git rm <old-name>. Then add the new file: git add <new-name>. Then check via git status whether the file is marked as “renamed”.
Internally, it doesn’t matter to Git whether you move a file regularly via mv and then run git rm and git add, or use git mv. However, Git comes with a so-called rename detection: If a blob is the same and is only referenced by a different file name, Git interprets this as a rename. If you want to examine the history of a file and follow it across renames, use the following command:
$ git log --follow -- <file>
2.1.5. Using Grep on a Repository
If you want to search for an expression in all files of your project, you can usually use grep -R <expression> . for this. However, Git offers its own grep command, which you can call using git grep <expression>. By default, this command searches for the expression in all files managed by Git. If you want to examine only some of the files instead, you can specify a path pattern explicitly. With the following command you can find all occurrences of border-color in all CSS files:
$ git grep border-color -- '*.css'
The grep implementation of Git supports all common flags that are also present in GNU Grep. However, calling git grep is usually an order of magnitude faster, since Git has significant performance advantages due to the object database and the multithreaded design of the command.
The output style of the popular ack tool can be emulated with git grep using the following alias:
$ git config alias.ack '!git -c color.grep.filename="green bold" \
  -c color.grep.match="black yellow" -c color.grep.linenumber="yellow bold" \
  grep -n --break --heading --color=always --untracked'
2.1.6. Examining the Project History
Use git log to examine the project’s version history. The options of this command (most of which also work for git show) are very extensive, and we will introduce the most important ones below.
Without any arguments, git log will output the author, date, commit ID, and the full commit message for each commit. This is handy when you need a quick overview of who did what and when. However, the list is a bit cumbersome when you’re looking at a lot of commits.
If you only want to look at recently created commits, limit git log’s output to n commits with the -<n> option. For example, the last four commits are shown with:
$ git log -4
To display a single commit, enter:
$ git log -1 <commit>
The <commit> argument is a valid name for a single commit, such as the commit ID or SHA-1 sum. However, if you do not specify anything, Git automatically uses HEAD. Apart from single commits, the command also understands so-called commit ranges (series of commits), see Sec. 2.1.7, “Commit-Ranges”.
The -p (--patch) option appends the full patch in unified diff format below the description. Thus, the output of git show <commit> is equivalent to that of git log -1 -p <commit>.
If you want to display the commits in compressed form, we recommend the --oneline option: It summarizes each commit with its abbreviated SHA-1 sum and the first line of the commit message. It is therefore important that you include as much useful information as possible in this line! For example, this would look like this:[20]
$ git log --oneline
25f3af3 Correctly report corrupted objects
786dabe tests: compress the setup tests
91c031d tests: cosmetic improvements to the repo-setup test
b312b41 exec_cmd: remove unused extern
The --oneline option is shorthand for --pretty=oneline combined with --abbrev-commit. There are other ways to customize the output of git log. The possible values for the --pretty option are:
oneline | Commit ID and first line of the description. |
short | Commit ID, first line of the description and author of the commit; output is four lines. |
medium | Default; output of commit ID, author, date and complete description. |
full | Commit ID, author’s name, name of the committer and full description, but no date. |
fuller | Like full, with the author and commit dates added. |
email | Formats the information from medium like an e-mail. |
format:<string> | Any format can be adapted by placeholders; for details see the man page git-log(1). |
Independently of this, you can display more information about the changes made by the commit below the commit message. Consider the following examples, which clearly show which files were changed in how many places:
$ git log -1 --oneline 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
$ git log -1 --oneline --name-status 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
M setup.c
M t/t1510-repo-setup.sh
$ git log -1 --oneline --stat 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
 setup.c | 19
 t/t1510-repo-setup.sh | 210 +++++++++++++++++------------------
 2 files changed, 134 insertions(+), 95 deletions(-)
$ git log -1 --oneline --shortstat 4868b2ea
4868b2e setup: officially support --work-tree without --git-dir
 2 files changed, 134 insertions(+), 95 deletions(-)
2.1.6.1. Time Constraints
You can restrict the time of the commits to be displayed using the --after or --since and --until or --before options. The two options in each pair are synonymous, so they give the same results. You can specify absolute dates in any common format, or relative dates. Here are some examples:
$ git log --after='Tue Feb 1st, 2011'
$ git log --since='2011-01-01'
$ git log --since='two weeks ago' --before='one week ago'
$ git log --since='yesterday'
2.1.6.2. File-Level Restrictions
If you specify one or more file or directory names after a git log call, Git will only display the commits that affect at least one of the specified files. Provided a project is well structured, the list of commits can be narrowed down considerably and a particular change can be found quickly.
Since filenames may collide with branches or tags, you should specify the filenames after a --, which means that only file arguments follow.
$ git log -- main.c
$ git log -- *.h
$ git log -- Documentation/
These calls only output the commits in which changes were made to the main.c file, an .h file, or a file under Documentation/.
2.1.6.3. Grep for Commits
You can also search for commits in the style of grep, where the --author, --committer, and --grep options are available.
The first two options filter commits by author or committer name or address, as expected. For example, list all commits that Linus Torvalds has made since early 2010:
$ git log --since='2010-01-01' --author='Linus Torvalds'
You can also enter only part of the name or e-mail address here, so searching for 'Linus' would produce the same result.
For example, you can use --grep to search for keywords or phrases in the commit message, such as all commits that contain the word “fix” (not case-sensitive):
$ git log -i --grep=fix
The -i (or --regexp-ignore-case) option causes git log to ignore the case of the pattern (this also works with --author and --committer).
All three options treat the values as regular expressions, just like grep (see the regex(7) man page). The -E and -F options change the behaviour of the options in the same way as egrep and fgrep: to use extended regular expressions or to search for the literal search term (whose special characters lose their meaning).
To search for changes, use the so-called Pickaxe tool. This will help you find commits whose diffs contain a certain regular expression:
$ git log -p -G<regex>
Note that in earlier versions of Git, this operation was performed by the -S option.
Equipped with these tools, you can now tame masses of commits yourself. Just specify as many criteria as you need to reduce the number of commits.
2.1.7. Commit-Ranges
So far, we’ve only looked at commands that require only a single commit as an argument, explicitly identified by its commit ID, or implicitly by the symbolic name HEAD, which references the most recent commit. The git show command displays information about a commit, while the git log command starts at a commit, and then goes back in the version history until the beginning of the repository (called the root commit) is reached.
An important tool for specifying a series of commits is the so-called commit range in the form <commit1>..<commit2>. Since we have not yet worked with multiple branches, this is simply a range of commits in a repository, from <commit1> exclusive to <commit2> inclusive. If you omit one of the two boundaries, Git will take the value HEAD.
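For a history without branches, the meaning of a commit range can be sketched with a simple list. This is an illustration only: the commit names C1 to C4 and the function commit_range are hypothetical, and the model deliberately ignores branches and merges.

```python
def commit_range(history, start=None, end=None):
    """Commits in `start..end` for a linear history listed oldest first.

    `start` is excluded, `end` is included; a missing boundary means HEAD,
    which is taken to be the last entry of `history`.
    """
    head = history[-1]
    start = head if start is None else start
    end = head if end is None else end
    first = history.index(start) + 1   # exclusive lower boundary
    last = history.index(end) + 1      # inclusive upper boundary
    return history[first:last]


history = ["C1", "C2", "C3", "C4"]        # hypothetical commits, C4 is HEAD
print(commit_range(history, "C1", "C3"))  # ['C2', 'C3']
print(commit_range(history, "C2"))        # C2.. means C2..HEAD: ['C3', 'C4']
print(commit_range(history, end="C2"))    # ..C2 means HEAD..C2: []
```

The last call returns an empty list because every commit up to C2 is already part of the history of HEAD.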
2.1.8. Differences between Commits
So far, git show and git log -p have only shown the difference from the previous commit. If you want to see the differences between several commits, use the command git diff.
The diff command performs several tasks. As already seen, you can examine the differences between the working tree and the index without specifying any commits, or the differences between index and HEAD with the --staged option. However, if you pass two commits or a commit range to the command, the difference between these commits is displayed instead.
2.2. The Object Model
Git is based on a simple but extremely powerful object model. It is used to map the typical elements of a repository (files, directories, commits) and the development over time. Understanding this model is very important, and it helps to abstract from typical Git steps to better understand them.
In the following, we will again use a “Hello World!” program as an example, this time in the Python programming language.[21] The project consists of the file hello.py as well as a README file and a directory test. If you run the program with the command python hello.py, you will get the output: Hello World!. In the directory test is a simple shell script, test.sh, which displays an error message if the Python program does not output the string Hello World! as expected.
The repository for this project consists of the following four commits:
$ git log --oneline
e2c67eb Comment was missing
8e2f5f9 Test file
308aea1 README file
b0400b0 First version
2.2.1. SHA-1 — The Secure Hash Algorithm
SHA-1 is a secure hash algorithm that calculates a checksum of digital information: the SHA-1 sum.
The algorithm was introduced in 1995 by the American National Institute of Standards and Technology (NIST) and the National Security Agency (NSA).
SHA-1 was developed for cryptographic purposes and is used for checking the integrity of messages and as a basis for digital signatures.
Figure 3, “SHA-1 Algorithm” shows how it works, where we calculate the checksum of hello.py.
The algorithm is a mathematical one-way function that maps a bit sequence of maximum length 2^64-1 bits (about 2 exbibytes) to a checksum of length 160 bits (20 bytes). The checksum is usually represented as a hexadecimal character string of length 40. The algorithm results in 2^160 (approx. 1.5 · 10^48) different combinations for this length of checksum, and therefore it is very, very unlikely that two bit sequences have the same checksum. This property is called collision resistance.
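The figures above can be reproduced with Python's standard hashlib module. This is a small sketch; the input bytes are an arbitrary sample and not the actual content of hello.py.

```python
import hashlib

checksum = hashlib.sha1(b"Hello World!\n")   # arbitrary sample input

print(checksum.hexdigest())        # the checksum as a hexadecimal string
print(len(checksum.hexdigest()))   # 40
print(checksum.digest_size * 8)    # 160
print("%.2e" % (2 ** 160))         # 1.46e+48
```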
Despite all efforts of cryptologists, several years ago various theoretical attacks on SHA-1 became known, which are supposed to make the generation of collisions possible with a considerable computing effort.[22] For this reason, NIST today recommends the use of the successors of SHA-1: SHA-256, SHA-384 and SHA-512, which have longer checksums and thus make the generation of collisions more difficult. On the Git mailing list there was a debate about switching to one of these alternatives, but this step was not considered necessary.[23]
This is because, although there is a theoretical attack vector on the SHA-1 algorithm, this does not compromise the security of Git. In fact, the integrity of a repository is not primarily protected by the collision resistance of an algorithm, but by the fact that many developers have identical copies of the repository.
The SHA-1 algorithm plays a central role in Git because it is used to build checksums of the data stored in the Git repository, the Git objects. This makes them easy to reference as SHA-1 sums of their contents. In your daily work with Git, you will usually only use SHA-1 sums of commits, known as commit IDs. This reference can be passed to many Git commands, such as git show and git diff. Depending on the repository, you often only need to specify the first few characters of an SHA-1 sum, since in practice a prefix is sufficient to uniquely identify a commit.
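How an object ID follows from the content can be sketched for the simplest object type, the blob. One detail not covered above is assumed here: Git prepends a short header (the object type, the content length in bytes and a NUL byte) to the content before hashing. The function name git_blob_id is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 sum under which Git stores a blob with the given content."""
    header = b"blob %d\x00" % len(content)   # type, size in bytes, NUL byte
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

Identical content always yields the same ID, regardless of the file name, which is why the same blob can be referenced from several places.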
2.2.2. The Git Objects
All data stored in a Git repository is available as Git objects. There are four types:[24]
Object | Saves… | References other objects | Correspondence |
---|---|---|---|
Blob | File content | No | File |
Tree | Blobs and Trees | Yes | Directory |
Commit | Project state | Yes, a tree and further commits | Snapshot/Archive at a time |
Tag | Tag information | Yes, an object | Naming important snapshots or blobs |
Figure 4, “Git Objects” shows three objects from the example project: a blob, a tree, and a commit.[25] The representation of each object includes the object type, the size in bytes, the SHA-1 sum, and the contents. The blob contains the content of the file hello.py (but not the file name). The tree contains references to one blob for each file in the project, i.e. one for hello.py and one for README, plus one tree per subdirectory, i.e. in this case only one for test. The files in the subdirectories are referenced separately in the respective trees that map these subdirectories.
So the commit object contains exactly one reference to a tree, and that reference is to the tree of the project content — this is a snapshot of the state of the project. The commit object also contains a reference to its direct ancestors, along with the metadata “author” and “committer” and the commit message.
Many Git commands expect a tree as an argument. However, because a commit, for example, references a tree, this is called a tree-ish argument. This refers to any object that can ultimately be resolved to a tree. This category also includes tags (see Sec. 3.1.3, “Tags — Marking Important Versions”). Similarly, commit-ish is an argument that can be resolved to a commit.
File contents are always stored in blobs. Trees only contain references to blobs and other trees in the form of the SHA-1 sums of these objects. A commit in turn references a tree.
2.2.3. The Object Database
All Git objects are stored in the object database and are identifiable by their unique SHA-1 sum, i.e. you can find an object in the database by its SHA-1 sum once it has been stored. Thus, the object database basically functions like a large hash table, where the SHA-1 sums serve as keys for the stored contents:[26]
e2c67eb ⟶ commit
8e2f5f9 ⟶ commit
308aea1 ⟶ commit
b0400b0 ⟶ commit
a26b00a ⟶ tree
6cf9be8 ⟶ blob (README)
52ea6d6 ⟶ blob (hello.py)
c37fd6f ⟶ tree (test)
e92bf15 ⟶ blob (test/test.sh)
5b4b58b ⟶ tree
dcc027b ⟶ blob (hello.py)
e4dc644 ⟶ tree
a347f5e ⟶ tree
You will first see the four commits that make up the Git repository, including the e2c67eb
commit shown in Figure 4, “Git Objects”.
This is followed by trees and blobs, each with file or directory correspondence.
So-called top-level trees have no directory name: They refer to the top level of a project.
A commit always references a top-level tree, so there are four of them.
The hierarchical relationship of the objects listed above is shown in Figure 5, “Hierarchical Relationship of Git Objects”. On the left-hand side, you can see the four commits that are already in the repository, and on the right-hand side, the referenced contents of the most recent commit (C4). As described above, each commit contains a reference to its direct predecessor (the resulting graph of commits is discussed below). This relationship is illustrated by the arrows pointing from one commit to the next.
Each commit references the top-level tree — including the C4 commit in the example.
The top-level tree in turn references the files hello.py
and README
in the form of blobs, and the subdirectory test
in the form of another tree.
Because of this hierarchical structure and the relationship of the individual objects to one another, Git is able to map the contents of a hierarchical file system as Git objects and store them in the object database.
2.2.4. Examining the Object Database
In a short digression we will go into how to examine the object database of Git. To do this, Git provides so-called plumbing commands, a group of low-level tools for Git, as opposed to the porcelain commands you usually work with. These commands are therefore not important for Git beginners, but are simply intended to give you a different approach to the concept of the object database. For more information, see Sec. 8.3, “Writing Your Own Git Commands”.
Let’s first look at the current commit.
We’ll use the git show command with the --format=raw option to output the commit in raw format, so that everything this commit contains is displayed.
$ git show --format=raw e2c67eb
commit e2c67ebb6d2db2aab831f477306baa44036af635
tree a26b00aaef1492c697fd2f5a0593663ce07006bf
parent 8e2f5f996373b900bd4e54c3aefc08ae44d0aac2
author Valentin Haenel <valentin.haenel@gmx.de> 1294515058 +0100
committer Valentin Haenel <valentin.haenel@gmx.de> 1294516312 +0100

    Kommentar fehlte
...
As you can see, all the information in Figure 4, “Git Objects” is output: the SHA-1 sums of the commit, tree, and direct ancestor, plus the author and committer (including the date as a Unix timestamp), and the commit description. The command also outputs the diff against the previous commit; strictly speaking, however, this is not part of the commit and is therefore omitted here.
Next, let’s take a look at the tree referenced by this commit, using git ls-tree
, a plumbing command to list the contents stored in a tree.
It’s similar to ls -l
, except that it is in the object database.
With --abbrev=7
we shorten the output SHA-1 sums to seven characters.
$ git ls-tree --abbrev=7 a26b00a
100644 blob 6cf9be8    README
100644 blob 52ea6d6    hello.py
040000 tree c37fd6f    test
As in Figure 4, “Git Objects” the tree referenced by the commit contains one blob for each of the two files, and one tree (also: subtree) for the test
directory.
We can look at its contents again with ls-tree
, since we now know the SHA-1 sum of the tree.
As expected, you can see that the test
tree references exactly one blob, the blob for the file test.sh
.
$ git ls-tree --abbrev=7 c37fd6f
100755 blob e92bf15    test.sh
Finally, we make sure that the blob for hello.py
really contains our “Hello World!” program and that the SHA-1 sum is correct.
The command git show
shows any objects.
If we pass the SHA-1 sum of a blob, its contents are output.
To check the SHA-1 sum we use the plumbing command git hash-object
.
$ git show 52ea6d6
#! /usr/bin/env python

""" Hello World! """

print 'Hello World!'
$ git hash-object hello.py
52ea6d6f53b2990f5d6167553f43c98dc8788e81
A note for curious readers: git hash-object hello.py
does not produce the same output as the Unix command sha1sum hello.py
.
This is because not only the file content is stored in a blob.
Instead, the object type, in this case blob
, and the size, in this case 67 bytes, are stored in a header at the beginning of the blob.
The hash-object
command therefore does not calculate the checksum of the file content, but of the blob object.
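The difference between the two checksums can be reproduced in a few lines of Python. This is a minimal sketch, assuming the blob header consists of the word blob, a space, the size in bytes, and a NUL byte; the function names and the sample content are chosen for illustration only.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 sum Git would assign to a blob with this content."""
    # Header: object type, a space, the content size in bytes, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def plain_sha1(content: bytes) -> str:
    """Return the SHA-1 sum of the bare content, as sha1sum would."""
    return hashlib.sha1(content).hexdigest()


data = b"hello world\n"  # illustrative content, not the book's hello.py
print("blob object: ", git_blob_sha1(data))
print("file content:", plain_sha1(data))
```

Because the header is part of the hashed data, the two sums differ even though the file content is identical.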
2.2.5. Deduplication
The four commits that make up the sample repository are shown again in Figure 6, “Repository Content”, but in a different way: tree and blob objects with a dashed border are unchanged; all others were added or changed in the corresponding commit.
The reading direction here is from bottom to top: at the bottom is C1, which contains only the file hello.py
.
Since trees only contain references to blobs and other trees, each commit stores the status of all files, but not their contents. Normally, only a few files change during a commit. New blob objects (and therefore new tree objects) are now created for the new files or those to which changes have been made. However, the references to the unchanged files remain the same.
What is more, a file that occurs twice in the project exists only once in the object database. The contents of this file are stored as a single blob, which a tree references in two places. This effect is known as deduplication: duplicates are not merely prevented, they cannot arise in the first place. Deduplication is an essential feature of content-addressable file systems, i.e. file systems that know files only by their contents (as Git does by giving each object its own SHA-1 sum as its “name”).
Consequently, a repository in which the same 1 MB file exists 1000 times takes up only slightly more than 1 MB. Git essentially has to manage the blob, plus a commit and a tree with 1000 blob entries (20 bytes each plus the length of the filename). A checkout of this repository, on the other hand, consumes about 1 GB of space on the filesystem because Git writes each of the 1000 copies out as a separate file.[27]
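The effect can be illustrated with a toy content-addressable store. This is a sketch under simplifying assumptions, not Git's actual storage code: the class ObjectStore, the file names, and the scaled-down payload size are all hypothetical.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        # Same header scheme as a Git blob: type, size, NUL byte.
        key = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        self.objects[key] = content  # identical content maps to the same key
        return key


store = ObjectStore()
payload = b"x" * 100_000  # scaled down from 1 MB to keep the example fast

# A "tree" with 1000 entries that all reference the same content.
tree = {f"copy-{i}.bin": store.put(payload) for i in range(1000)}

print(len(tree), "tree entries,", len(store.objects), "stored blob")
```

The tree holds 1000 names, but every name maps to the same key, so the store contains the content only once.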
The git checkout
and git reset
commands restore a previous state (see also Sec. 3.2, “Restoring Versions”): You specify the reference of the corresponding commit, and Git searches for it in the object database.
The reference is then used to find the tree object of this commit from the object database.
Finally, Git uses the references contained in the tree object to find all other tree and blob objects in the object database and replicates them as directories and files on the file system.
This allows you to restore exactly the project state that was saved with the commit at the time.
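The lookup chain from commit to tree to blobs can be sketched with a small in-memory object database. The abbreviated IDs are the ones from the example repository above; the blob contents are placeholders, and the function checkout here is a hypothetical illustration rather than what Git does internally.

```python
# Toy object database keyed by the abbreviated IDs used in the text.
# The blob contents are placeholders, not the real file contents.
OBJECTS = {
    "e2c67eb": ("commit", {"tree": "a26b00a", "parents": ["8e2f5f9"]}),
    "a26b00a": ("tree", {"README": "6cf9be8", "hello.py": "52ea6d6", "test": "c37fd6f"}),
    "c37fd6f": ("tree", {"test.sh": "e92bf15"}),
    "6cf9be8": ("blob", b"<README content>"),
    "52ea6d6": ("blob", b"<hello.py content>"),
    "e92bf15": ("blob", b"<test.sh content>"),
}


def checkout(objects, commit_id):
    """Return a mapping of file paths to contents for the given commit."""
    kind, commit = objects[commit_id]
    if kind != "commit":
        raise ValueError(commit_id + " is not a commit")
    files = {}

    def walk(tree_id, prefix):
        # A tree only holds names and references; follow each reference.
        _, entries = objects[tree_id]
        for name, child_id in entries.items():
            child_kind, payload = objects[child_id]
            if child_kind == "tree":
                walk(child_id, prefix + name + "/")
            else:
                files[prefix + name] = payload

    walk(commit["tree"], "")
    return files


print(sorted(checkout(OBJECTS, "e2c67eb")))
```

Note that the file names come from the trees, while the blobs contribute only the contents.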
2.2.6. The Graph Structure
Because each commit stores its direct ancestors, a graph structure is created. More precisely, the arrangement of the commits creates a Directed Acyclic Graph (DAG). A graph consists of two core elements: the nodes and the edges connecting these nodes. In a directed graph, the edges are also characterized by a direction, which means that when you traverse the graph, you can only follow edges in the direction in which they point to move from one node to the next. The acyclic property rules out that you can find your way back to a node by any route through the graph. So you cannot move in a circle.[28]
Most Git commands are used to manipulate the graph: to add/remove nodes or to change the relation of the nodes to each other. You’ll know you’ve reached an advanced level of Git competency when you’ve internalized this rather abstract concept, and when you’re working with branches on a daily basis, you always think of the graph behind them. Understanding Git at this level is the first and only real hurdle to mastering Git safely in everyday life.
The graph structure is derived from the object model, because each commit knows its direct ancestor (possibly several in the case of a merge commit). The commits form the nodes of this graph — the references to ancestors form the edges.
An example graph is shown in Figure 7, “A Commit Graph”. It consists of several commits, which are colored to make it easier to distinguish between their affiliations to different development branches. First, the commits A, B, C, and D were made. They form the main development branch. Commits E and F contain feature development, which was transferred to the main development branch with commit H. Commit G is a single commit that has not yet been integrated into the main development branch.
One result of the graph structure is the cryptographically secured integrity of a repository. Git uses the SHA-1 sum of a commit to reference not only the contents of the project files at a given point in time, but also all commits executed up to that point, and their relationship to each other, i.e. the complete version history.
The object model makes this possible: each commit stores a reference to its ancestors. These references are then used to calculate the SHA-1 sum of the commit itself. So you get a different commit if you reference another ancestor.
Since the predecessor in turn references predecessors, and its SHA-1 sum depends on the predecessors, and so on, this means that the complete version history is implicitly encoded in the commit ID. Implicit here means: If even one bit of a commit changes anywhere in the version history, then the SHA-1 sum of subsequent commits, especially the topmost one, is no longer the same. The SHA-1 sum doesn’t say anything detailed about the version history, though; it’s just a checksum of it.
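The following sketch shows how a change deep in the history propagates to the newest commit ID. It is deliberately simplified: real commit objects also contain author, committer, and an object header, and the tree names used here are placeholders.

```python
import hashlib


def commit_id(tree, parents, message):
    """Toy commit ID: a SHA-1 over the tree, the parent IDs, and the message."""
    body = "tree " + tree + "\n"
    body += "".join("parent " + p + "\n" for p in parents)
    body += "\n" + message + "\n"
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def build_history(messages):
    """Create a linear chain of commits and return their IDs, oldest first."""
    ids, parents = [], []
    for i, message in enumerate(messages):
        cid = commit_id("tree-%d" % i, parents, message)  # placeholder tree names
        ids.append(cid)
        parents = [cid]  # the next commit references this one
    return ids


original = build_history(["first", "second", "third"])
tampered = build_history(["first (changed)", "second", "third"])

for old, new in zip(original, tampered):
    print(old[:7], "->", new[:7])
```

Only the oldest commit was altered, yet every later ID changes as well, because each ID is computed over the ID of its parent.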
2.2.6.1. References: Branches and Tags
However, there is not much you can do with a pure commit graph. To reference (i.e., work with) a node, you need to know its name, which is the SHA-1 sum of the commit. In everyday use, however, you rarely use the SHA-1 sum of a commit directly, but instead use symbolic names, called references, which Git can resolve to the SHA-1 sum.
Git basically offers two types of references, branches and tags. These are pointers into the commit graph that mark specific nodes. Branches have a “moving” character, meaning that they advance as new commits are added to the branch. Tags, on the other hand, are static in nature, and mark important points in the commit graph, such as releases.
Figure 8, “Example of a Commit Graph with Branches and Tags” shows the same commit graph with the branches master, feature, and bugfix, the HEAD reference, and the tags v0.1 and v0.2.
3. Practical Version Control
The following chapter introduces all the essential techniques you’ll use in your daily work with Git. In addition to a more detailed description of the index and how to restore old versions, the focus is on working effectively with branches.
3.1. References: Branches and Tags
In the CVS/SVN world, “branch” and “merge” are often a closed book for newcomers and a regular source of frustration for experts. In Git, branching and merging are commonplace, simple, transparent, and fast. It’s common for a developer to create multiple branches and perform multiple merges in one day.
The tool Gitk helps you keep an overview of several branches.
With gitk --all
you show all branches.
The tool visualizes the commit graph explained in the previous section.
Each line represents one commit.
Branches are displayed as green labels, tags as yellow pointers.
For more information, see Sec. 3.6.2, “Gitk”.
Because branches in Git are “cheap” and merges are easy, you can afford to use branches liberally. Want to try something, prepare a small bug fix, or start an experimental feature? You can create a new branch for each of these. Want to test whether one branch is compatible with another? Merge them, test everything, then discard the merge again and continue developing. This is common practice among developers using Git.
First, let’s look at references in general. References are nothing more than symbolic names for the hard-to-remember SHA-1 sums of commits.
These references are stored in .git/refs/
.
The name of a reference is determined by the file name, and the target is determined by the contents of the file.
For example, the master branch you have been working on all along looks like this:
$ cat .git/refs/heads/master
89062b72afccda5b9e8ed77bf82c38577e603251
If Git needs to manage a lot of references, they may not be stored as files under |
Under .git/refs/
there are several directories that represent the “type” of reference.
There is no fundamental difference between these references, only when and how they are used.
The references you will use most often are branches.
They are stored under .git/refs/heads/
.
Heads refers to what is sometimes called a “tip” in other systems: The latest commit on a development branch.[29]
Branches move up when you make commits on a branch, so they remain at the top of the version history.
Branches in other developers' repositories (e.g. the master branch of the official repository), so-called remote tracking branches, are stored under .git/refs/remotes/
(see Sec. 5.2.2, “Remote-Tracking-Branches”).
Tags, static references, which are mostly used for versioning, are stored under .git/refs/tags/
(see Sec. 3.1.3, “Tags — Marking Important Versions”).
3.1.1. HEAD and Other Symbolic References
One reference that you rarely use explicitly, but always implicitly, is HEAD
.
It usually refers to the currently checked-out branch, in this case master
:
$ cat .git/HEAD
ref: refs/heads/master
HEAD
can also point directly to a commit if you type git checkout <commit-id>
.
However, you are then in so-called detached-head mode, in which commits may get lost, see also Sec. 3.2.1, “Detached HEAD”.
The HEAD
determines which files are found in the working tree, which commit becomes the predecessor when a new one is created, which commit is displayed by git show
, and so on.
When we speak of “the current branch”, we mean, technically speaking, HEAD.
The simple commands log, show, and diff assume HEAD as their first argument when called without any arguments.
The output of git log
is the same as the output of git log HEAD
, and so on — this applies to most commands that operate on a commit if you don’t specify one explicitly.
HEAD
is thus similar to the shell variable PWD
, which specifies “where you are”.
When we talk about a commit, a command usually doesn’t care whether you specify the commit ID in full or in abbreviated form, or whether you access the commit by reference, such as a tag or branch.
However, such a reference may not always be unique.
What happens if there is a branch master
and a tag with the same name?
Git checks if the following references exist:
- .git/<name> (mostly only useful for HEAD or similar)
- .git/refs/<name>
- .git/refs/tags/<name>
- .git/refs/heads/<name>
- .git/refs/remotes/<name>
- .git/refs/remotes/<name>/HEAD
Git will take the first matching reference it finds.
So you should always give tags a unique scheme so that they don’t get confused with branches.
This way you can address branches directly by name instead of heads/<name>
.
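The search order can be expressed as a short lookup routine. This is an illustrative sketch: the function resolve and the set of existing references are hypothetical, and references are modelled as plain path strings relative to .git/ rather than read from a real repository.

```python
# Search order as listed above; "{}" stands for the name being resolved.
SEARCH_ORDER = [
    "{}",
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
]


def resolve(name, existing_refs):
    """Return the first candidate path (relative to .git/) that exists."""
    for pattern in SEARCH_ORDER:
        candidate = pattern.format(name)
        if candidate in existing_refs:
            return candidate
    return None


# Hypothetical repository with a branch and a tag that share the name "master".
refs = {"HEAD", "refs/heads/master", "refs/tags/master", "refs/tags/v0.1"}

print(resolve("master", refs))        # the tag is found before the branch
print(resolve("heads/master", refs))  # a more specific name selects the branch
print(resolve("v0.1", refs))
```

In this model the ambiguous name master resolves to the tag, which is why a distinct naming scheme for tags avoids surprises.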
Especially important are the suffixes ^
and ~<n>
.
The syntax <ref>^
indicates the direct ancestor of <ref>
.
This does not always have to be unique: If two or more branches were merged, the merge commit has several direct ancestors.
<ref>^
or <ref>^1
then denotes the first direct ancestor, <ref>^2
the second, and so on.[30]
So the syntax HEAD^^ means “the first direct ancestor of the first direct ancestor of the current commit”, i.e. the commit two levels back.
Note that ^
may have a special meaning in your shell and you may need to protect it with quotes or a backslash.
The syntax <ref>~<n>
is equivalent to repeating ^
n times: HEAD~10
thus denotes the tenth direct predecessor of the current commit.
Note: This does not mean that there are only eleven commits between HEAD and HEAD~10. Since ^ follows only the first parent at each merge, the range between the two references contains these eleven commits plus all further commits that were integrated by merges.
The syntax is documented in the git-rev-parse(1)
man page in the “Specifying Revisions” section.
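To make the two suffixes concrete, here is a small resolver working on a hypothetical commit graph. The commit names and the PARENTS table are invented for illustration, and only the ^, ^<n>, and ~<n> forms described above are handled.

```python
import re

# Hypothetical history: M is a merge whose first parent is C and second parent is F.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "E": ["B"],
    "F": ["E"],
    "M": ["C", "F"],
}


def resolve(rev, parents=PARENTS):
    """Resolve a revision such as 'M^2~1' to a commit name."""
    match = re.match(r"[^~^]+", rev)
    if match is None:
        raise ValueError("missing base name in " + repr(rev))
    commit = match.group(0)
    for op, number in re.findall(r"([~^])(\d*)", rev[match.end():]):
        n = int(number) if number else 1
        if op == "^":
            if n > 0:  # ^0 refers to the commit itself
                commit = parents[commit][n - 1]  # n-th direct ancestor
        else:
            for _ in range(n):  # ~n: follow the first parent n times
                commit = parents[commit][0]
    return commit


for rev in ["M^", "M^2", "M^^", "M~2", "M^2~1"]:
    print(rev, "->", resolve(rev))
```

M~2 and M^^ name the same commit, while M^2 leaves the first-parent line and enters the merged branch.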
3.1.2. Managing Branches
A branch is created in Git in no time.
All Git needs to do is identify the currently checked out commit and store the SHA-1 sum in the .git/refs/heads/<branch-name>
file.
$ time git branch neuer-branch
git branch neuer-branch  0.00s user 0.00s system 100% cpu 0.008 total
The command is so fast because (unlike other systems) no files need to be copied and no additional metadata needs to be stored. Information about the structure of the version history can always be derived from the commit that a branch references and its ancestors.
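How little work is involved can be seen in the following sketch, which mimics the creation of a branch by writing a single small file. It operates on a temporary directory rather than a real repository, and create_branch is a hypothetical helper; in practice you would always use git branch.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit_id: str) -> Path:
    """Write a reference file for the branch; refuse to overwrite an existing one."""
    ref = git_dir / "refs" / "heads" / name
    if ref.exists():
        raise FileExistsError("A branch named '%s' already exists." % name)
    ref.parent.mkdir(parents=True, exist_ok=True)  # names with "/" create directories
    ref.write_text(commit_id + "\n")
    return ref


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    ref = create_branch(git_dir, "neuer-branch", "89062b72afccda5b9e8ed77bf82c38577e603251")
    print(ref.relative_to(git_dir).as_posix(), "->", ref.read_text().strip())
```

The file contains nothing but the commit ID; everything else about the branch follows from the commit graph.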
Here is an overview of the most important options:
git branch [-v]
- Lists local branches. The currently checked-out branch is marked with an asterisk. You can also use -v to display the commit IDs to which the branches point and the first line of the description of the corresponding commits.
$ git branch -v
  maint  65f13f2 Start 1.7.5.1 maintenance track
* master 791a765 Update draft release notes to 1.7.6
  next   b503560 Merge branch 'master' into next
  pu     d7a491c Merge branch 'js/info-man-path' into pu
git branch <branch> [<ref>]
- Creates a new branch <branch> pointing to commit <ref> (<ref> can be the SHA-1 sum of a commit, another branch, etc.). If you do not specify a reference, HEAD, the current branch, is used.
git branch -m <new-name>
git branch -m <old-name> <new-name>
- In the first form, the current branch is renamed to <new-name>. In the second form, <old-name> is renamed to <new-name>. The command fails if this would overwrite another branch.
$ git branch -m master
fatal: A branch named 'master' already exists.
If you rename a branch, Git does not display a message. You can check afterwards to make sure the renaming was successful:
$ git branch
* master
  test
$ git branch -m test pu/feature
$ git branch
* master
  pu/feature
git branch -M …
- Like -m, except that a branch is also renamed if this overwrites another branch. Attention: Commits of the overwritten branch may be lost!
git branch -d <branch>
- Deletes <branch>. You can specify several branches at once. Git refuses to delete a branch if it is not yet fully integrated into its upstream branch or, if no upstream branch exists, into HEAD, the current branch. (For more on upstream branches, see Sec. 5.3.2, “git pull”.)
git branch -D …
- Deletes a branch, even if it contains commits that have not yet been integrated into the upstream or current branch. Note: These commits may be lost unless they are referenced differently.
3.1.2.1. Changing Branches: Checkout
You can change branches with git checkout <branch>
.
If you want to create a branch and switch to it right away, use git checkout -b <branch>
.
The command is equivalent to git branch <branch> && git checkout <branch>
.
What happens during a checkout?
Each branch references a commit, which in turn references a tree, that is, the image of a directory structure.
A git checkout <branch>
now resolves the reference <branch>
to a commit and replicates the commit’s tree to the index and to the working tree (i.e., the filesystem).
Since Git knows which version of files are currently in the index and working tree, only the files that differ on the current and new branches need to be checked out.
Git makes it hard for users to lose information. Therefore, a checkout fails rather than overwriting unsaved changes in a file. This happens in the following two cases:
- The checkout would overwrite a file in the working tree that contains changes. Git will display the following error message: error: Your local changes to the following files would be overwritten by checkout: file.
- The checkout would overwrite an untracked file, i.e. a file that is not managed by Git. Git then aborts with the error message: error: The following untracked working tree files would be overwritten by checkout: file.
If, however, changes are stored in the working tree or index that are compatible with both branches, a checkout takes over these changes. This would look like this, for example:
$ git checkout master
A       neue-datei.txt
Switched to branch 'master'
This means that the file neue-datei.txt was added, which does not exist on either branch. Since no information can be lost here, the file is simply carried over. The message A neue-datei.txt reminds you which files you still need to take care of.
A
stands for added, D
for deleted and M
for modified.
If you’re sure you don’t need your changes anymore, you can use git checkout -f
to ignore the error messages and run the checkout anyway.
If you want to keep the changes and change the branch (e.g., interrupt your work and fix a bug on another branch), git stash
will help (Sec. 4.5, “Outsourcing Changes — Git Stash”).
3.1.2.2. Branch Naming Conventions
In principle, you can name branches almost arbitrarily.
Exceptions are spaces, some special characters with special meaning for Git (e.g. *
, ^
, :
, ~
), as well as two consecutive dots (..
) or a dot at the beginning of the name.[31]
It makes sense to always write branch names entirely in lowercase letters. Since Git manages branch names as files under .git/refs/heads/, upper and lower case are significant, and names that differ only in case can collide on case-insensitive file systems.
You can group branches into “namespaces” by using a /
as a separator.
Branches that are related to the translation of a software can then be named e.g. i18n/german
, i18n/english
etc.
If several developers share a repository, you can also create “private” branches under <username>/<topic>
.
These namespaces are represented by a directory structure, so that a directory <username>/
with the branch file <topic>
is created under .git/refs/heads/
.
The main development branch of your project should always be called master
.
Bugfixes are often managed on a branch maint
(short for “maintenance”).
The next release is usually prepared on next
.
Features that are still in an experimental state should be developed in pu
(for “proposed updates”) or in pu/<feature>
.
For a more detailed description of how to use branches to structure development and organize release cycles, see Ch. 6, Workflows.
3.1.2.3. Deleted Branches and “Lost” Commits
Commits each have one or more predecessors. Therefore, you can walk through the commit graph “directed”, that is, from newer to older commits, until you reach a root commit.
The reverse direction is not possible: if a commit knew its successor, that information would have to be stored in the commit itself.
This would change the SHA-1 sum of the commit, and the successor would have to reference the corresponding new commit, which would give it a new SHA-1 sum, so the predecessor would have to be changed, and so on.
So Git can only go through the commits from a named reference (such as a branch or HEAD
) in the direction of earlier commits.
Therefore, if the “top” of a branch is deleted, the topmost commit is no longer referenced (in Git jargon: unreachable). As a result, the predecessor is no longer referenced, and so on, until the next commit comes along that is referenced in some way (either by a branch, or by having a successor that is itself referenced by a branch).
So when you delete a branch, the commits on that branch are not deleted, they are just “lost”. Git simply doesn’t find them anymore.
However, they will still be present in the object database for a while.[32] So you can easily restore a branch by explicitly specifying the previous (and supposedly deleted) commit as a reference:
$ git branch -D test
Deleted branch test (was e32bf29).
$ git branch test e32bf29
Another way to retrieve deleted commits is the reflog (see Sec. 3.7, “Reflog”).
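The notion of reachability can be sketched as a simple graph traversal starting from all named references. The commit names, the PARENTS table, and the function reachable are hypothetical and only serve to illustrate the idea.

```python
# Hypothetical history: master points to C, test points to T2; both fork from B.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "T1": ["B"], "T2": ["T1"]}


def reachable(refs, parents):
    """Return all commits that can be reached from the given references."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])  # only ancestors can be followed
    return seen


refs = {"master": "C", "test": "T2"}
print("before:", sorted(reachable(refs, PARENTS)))

del refs["test"]  # corresponds to deleting the branch
lost = set(PARENTS) - reachable(refs, PARENTS)
print("unreachable:", sorted(lost))

refs["test"] = "T2"  # pointing a branch at the old tip makes them reachable again
print("after restore:", sorted(reachable(refs, PARENTS)))
```

The commits themselves never leave the PARENTS table; only the entry point that led to them disappears and can be recreated.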
3.1.3. Tags — Marking Important Versions
SHA-1 sums are a very elegant solution to describe versions decentrally, but they are semantically poor and unwieldy for humans. Unlike linear revision numbers, commit IDs alone tell us nothing about the order of versions.
During the development of software projects, different “important” versions need to be marked so that they can be easily found in the repository. The most important ones are usually those that are released, called releases. Release candidates are also often marked in this way, i.e. versions that form the basis for the next version and are checked for critical bugs in the course of quality assurance without adding new features. Depending on the project and development model, there are different conventions for marking releases and procedures for preparing and publishing them.
In the open source area, two versioning schemes have become established: the classic major/minor/micro versioning scheme and, more recently, date-based versioning.
With major/minor/micro versioning, which is used e.g. with the Linux kernel and also Git, a version is identified by three (often four) numbers: 2.6.39
or 1.7.1
.
With date-based versioning, on the other hand, the designation is derived from the time of the release, e.g.: 2011.05
or 2011-05-19
.
This has the great advantage that the age of a version is easily identifiable.[33]
Git offers tags (“labels”) that can be used to mark any Git object — usually commits — to highlight prominent states in its development history.
Like branches, tags are implemented as references to objects.
Unlike branches, however, tags are static, meaning that they are not moved when new commits are added, and always point to the same object.
There are two types of tags: annotated and lightweight.
Annotated tags are tagged with metadata, such as author, description, or GPG signature.
Lightweight tags, on the other hand, “simply” point to a specific Git object.
For both types of tags, Git creates references under .git/refs/tags/
or .git/packed-refs
.
The difference is that for each annotated tag, Git creates a special Git object — a tag object — in the Object Database to store the metadata and SHA-1 sum of the selected object, while a Lightweight tag points directly to the selected object.
Figure 12, “The Tag Object” shows the contents of a tag object; compare also the other git objects, Figure 4, “Git Objects”.
The tag object shown has both a size (158 bytes) and a SHA-1 sum.
It contains the name (0.1
), the object type and the SHA-1 sum of the referenced object as well as the name and e-mail of the author, which is called tagger in Git jargon.
In addition, the tag contains a tag message that describes the version, for example, and optionally a GPG signature.
In the Git project, for example, a tag message consists of the current version designation and the signature of the maintainer.
In the following, let’s first look at how you manage tags locally. Sec. 5.8, “Exchanging Tags” describes how you exchange tags between repositories.
3.1.3.1. Managing Tags
You can manage tags with the command git tag
.
Without arguments it shows all existing tags.
Depending on the size of the project, it is worth limiting the output with the -l
option and a corresponding pattern.
With the following command you display all variants of version 1.7.1 of the git project, i.e. both the release candidates with the addition -rc*
and the (four-digit) maintenance releases:
$ git tag -l v1.7.1*
v1.7.1
v1.7.1-rc0
v1.7.1-rc1
v1.7.1-rc2
v1.7.1.1
v1.7.1.2
v1.7.1.3
v1.7.1.4
The content of a tag is provided by git show
:
$ git show 0.1 | head
tag 0.1
Tagger: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Wed Mar 23 16:52:03 2011 +0100

Erste Veröffentlichung

commit e2c67ebb6d2db2aab831f477306baa44036af635
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Sat Jan 8 20:30:58 2011 +0100
Gitk presents tags as yellow, arrow-like boxes that are clearly distinguishable from the green, rectangular branches:
3.1.3.2. Lightweight Tags
To add a lightweight tag to HEAD, pass the desired name to the command (in this example, to mark an important commit):
$ git tag api-aenderung
$ git tag
api-aenderung
You can also pass a revision such as HEAD~23 as a second argument to tag an earlier commit after the fact:
$ git tag pre-regression HEAD~23
$ git tag
api-aenderung
pre-regression
Tags are unique — if you try to recreate a tag, Git will abort with an error message:
$ git tag pre-regression
fatal: tag 'pre-regression' already exists
3.1.3.3. Annotated Tags
Annotated tags are created with the -a
option.
As with git commit
, an editor will open and allow you to write the tag message.
Or you can pass the tag message with the option -m
— in which case the option -a
is redundant:
$ git tag -m "Zweite Veröffentlichung" 0.2
3.1.3.4. Signed Tags
To verify a signed tag, use the -v
(verify) option:
$ git tag -v v1.7.1
object d599e0484f8ebac8cc50e9557a4c3d246826843d
type commit
tag v1.7.1
tagger Junio C Hamano <gitster@pobox.com> 1272072587 -0700

Git 1.7.1
gpg: Signature made Sat Apr 24 03:29:47 2010 CEST using DSA key ID F3119B9A
gpg: Good signature from "Junio C Hamano <junkio@cox.net>"
...
Of course, this assumes that you have GnuPG installed and have already imported the signer’s key.
In order to sign tags yourself, you must first set the preferred key:
$ git config --global user.signingkey <GPG-Key-ID>
Now you can create signed tags with the -s
(sign) option:
$ git tag -s -m "Dritte Veröffentlichung" 3.0
3.1.3.5. Deleting and Overwriting Tags
Use the -d
and -f
options to delete or overwrite tags:
$ git tag -d 0.2
Deleted tag '0.2' (was 4773c73)
The options should be used with caution, especially if you use the tags not only locally, but also publish them.
Otherwise, the same tag may end up pointing to different commits in different repositories: version 1.0 in repository X then points to a different commit than version 1.0 in repository Y.
But see also Sec. 5.8, “Exchanging Tags”.
3.1.3.6. Lightweight vs. Annotated Tags
For public versioning of software, annotated tags are generally more useful. Unlike lightweight tags, they contain meta-information that shows who created a tag and when, so there is a clear point of contact. Users of the software can also find out who approved a particular version. For example, it is clear that Junio C. Hamano tagged Git version 1.7.1, so it has his “seal of approval”. The cryptographic signature, of course, confirms this statement as well. Lightweight tags, on the other hand, are particularly suitable for local markers, for example to identify certain commits relevant to the current task. However, make sure not to upload such tags to a public repository (see Sec. 5.8, “Exchanging Tags”), as they might spread. If you only use the tags locally, you can delete them once they have served their purpose (see above).
3.1.3.7. Non-Commit Tags
With tags you can mark any Git object, not only commits, but also trees, blobs and even tag objects themselves! The classic example is to put the GPG public key used by the maintainer of a project to sign tags in a blob.
For example, the tag junio-gpg-pub
in the Git repository of Git points to the key of Junio C. Hamano:
$ git show junio-gpg-pub | head -5
tag junio-gpg-pub
Tagger: Junio C Hamano <junkio@cox.net>
Date:   Tue Dec 13 16:33:29 2005 -0800

GPG key to sign git.git archive.
Because this blob object is not referenced by any tree, the file is virtually separate from the actual code, but still exists in the repository. In addition, a tag on a “lonely” blob is necessary so that it is not considered unreachable and deleted during repository maintenance.[34]
To use the key, proceed as follows:
$ git cat-file blob junio-gpg-pub | gpg --import
gpg: key F3119B9A: public key "Junio C Hamano <junkio@cox.net>" imported
gpg: Total number processed: 1
gpg:               imported: 1
You can then verify all tags in the Git-via-Git repository, as described above.
3.1.3.8. Describing Commits
Tags are very useful for describing any commit “better”.
The git describe
command gives a description consisting of the most recent tag and its relative position in the commit graph.
Here’s an example from the git project: we describe a commit with the SHA-1 prefix 28ba96a
, which is located in the commit graph seven commits after version 1.7.1
:
$ git describe --tags
v1.7.1-7-g28ba96a
The output of git describe
is formatted as follows:
<tag>-<position>-g<SHA-1>
The tag is v1.7.1
; the position indicates that there are seven new commits between the tag and the described commit.[35]
The g
before the ID indicates that the description is derived from a Git repository, which is useful in environments with multiple version control systems.
By default, git describe
only searches for annotated tags, but the --tags
option extends the search to include lightweight tags.
The command is very useful because it translates a content-based identifier into something useful for humans: v1.7.1-7-g28ba96a
is much closer to v1.7.1
than v1.7.1-213-g3183286
.
This allows you to compile the output directly into the software in a way that makes sense, just like in the Git project:
$ git describe
v1.7.5-rc2-8-g0e73bb4
$ make
GIT_VERSION = 1.7.5.rc2.8.g0e73bb
...
$ ./git --version
git version 1.7.5.rc2.8.g0e73bb
This way a user knows roughly what version he has, and can track which commit the version was compiled from.
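The behavior of git describe with and without --tags can be tried out in a throwaway repository. The following is only a minimal sketch: the repository, the file name, and the tag names v1.0 and wip are made up for illustration, and the abbreviated commit ID at the end of the output will differ on every run.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .

echo one > file && git add file && git commit -q -m "first"
git tag -a v1.0 -m "version 1.0"          # annotated tag on the first commit
echo two >> file && git commit -q -am "second"
git tag wip                               # lightweight tag on the second commit
echo three >> file && git commit -q -am "third"

annotated_only=$(git describe)            # considers annotated tags only
with_lightweight=$(git describe --tags)   # also considers lightweight tags
echo "$annotated_only"                    # form: v1.0-2-g<abbreviated ID>
echo "$with_lightweight"                  # form: wip-1-g<abbreviated ID>
```

Without --tags the description is based on the annotated tag two commits back; with --tags the closer lightweight tag is used instead.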
3.2. Restoring Versions
The goal of version control software is not just to examine changes between commits.
Above all, it is also important to restore older versions of a file or entire directory trees, or to undo changes.
In Git, the commands checkout
, reset
, and revert
are particularly useful for this.
The Git command checkout
can not only change branches, but also restore files from previous commits.
The general syntax is:
git checkout [-f] <reference> -- <pattern>
checkout resolves the given reference (or HEAD if none is given) to a commit and extracts all files matching <pattern> to the working tree.
If <pattern>
is a directory, it refers to all files and subdirectories in it.
Unless you explicitly specify a pattern, all files are checked out.
Changes to a file are not simply overwritten, unless you specify the -f
option (see above).
HEAD
is also set to the corresponding commit (or branch).
However, if you specify a pattern, checkout overwrites the matching file(s) without prompting.
So to discard all changes to <file>
, enter git checkout -- <file>
: Git then replaces <file>
with the version in the current branch.
This way, you can also reconstruct the older state of a file:
$ git checkout ce66692 -- <file>
The double minus separates the patterns from the options or arguments. It is not strictly necessary, however: if there is no branch or other reference with that name, Git will try to find a file with that name. The separator simply makes it unambiguous that you want to restore the file(s) in question.
To view the contents of a file from a particular commit without checking it out, use the following command:
$ git show ce66692:<file>
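The three uses described above (discarding local changes, viewing an old version, and restoring an old version) can be sketched as follows. The repository and the file notes.txt are hypothetical and exist only for this illustration.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .

echo "version 1" > notes.txt && git add notes.txt && git commit -q -m "v1"
old=$(git rev-parse HEAD)
echo "version 2" > notes.txt && git commit -q -am "v2"

echo "scratch" > notes.txt             # an uncommitted edit
git checkout -- notes.txt              # discard it without prompting
after_discard=$(cat notes.txt)

shown=$(git show "$old":notes.txt)     # look at the old content; working tree untouched

git checkout "$old" -- notes.txt       # restore the old content into working tree and index
restored=$(cat notes.txt)
echo "$after_discard / $shown / $restored"
```

Note that the last command also places the old version in the index, so a subsequent git commit would record it.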
Use |
3.2.1. Detached HEAD
If you check out a commit that is not referenced by a branch, you are in detached-HEAD mode:
$ git checkout 3329661
Note: checking out '3329661'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 3329661... Add LICENSE file
As the explanation already warns you (you can hide it by setting the option advice.detachedHead to false), changes you make now may be lost: since your HEAD is the only direct reference to any subsequent commits, those commits are not referenced by a branch (they become unreachable as soon as you check out something else, see above).
So working in detached HEAD mode is especially useful if you want to try something quickly: Has the bug actually already appeared in commit 3329661
? Was there actually a README
file at the time of 3329661
?
If you want to do more than just look around from the commit you checked out, for example, to see if your software already had a particular bug at the time, you should create a branch:
$ git checkout -b <temp-branch>
Then you can make commits as usual without fear of losing them.
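One way to check from a script whether HEAD is currently detached is to ask git symbolic-ref, which fails when HEAD does not point to a branch. The sketch below uses a made-up repository and the hypothetical branch name temp-branch.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .

echo one > file && git add file && git commit -q -m "first"
first=$(git rev-parse HEAD)
echo two >> file && git commit -q -am "second"

git checkout -q "$first"                 # HEAD now points directly at a commit
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "HEAD is $state"

git checkout -q -b temp-branch           # from here on, new commits stay reachable
echo fix >> file && git commit -q -am "experiment"
branch=$(git symbolic-ref --short HEAD)
echo "now on $branch"
```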
3.2.2. Rolling Back Commits
If you want to undo all the changes a commit makes, the revert
command helps.
However, it does not delete a commit, but creates a new one whose changes are exactly the opposite of the other commit: Deleted lines become added lines, and vice versa.
Suppose you have a commit that creates a LICENSE
file.
The patch of the corresponding commit looks like this:
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1 @@
+This software is released under the GNU GPL version 3 or newer.
Now you can undo the changes:
$ git revert 3329661
Finished one revert.
[master a68ad2d] Revert "Add LICENSE file"
 1 files changed, 0 insertions(+), 1 deletions(-)
 delete mode 100644 LICENSE
Git creates a new commit on the current branch — unless you specify otherwise — with the description Revert "<Old commit message>"
.
This commit looks like this:
$ git show
commit a68ad2d41e9219383449d703521573477ee7da48
Author: Julius Plenz <feh@mali>
Date:   Mon Mar 7 05:28:47 2011 +0100

    Revert "Add LICENSE file"

    This reverts commit 3329661775af3c52e6b2ad7e9e7e7d789ba62712.

diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 3fd9c20..0000000
--- a/LICENSE
+++ /dev/null
@@ -1 +0,0 @@
-This software is released under the GNU GPL version 3 or newer.
Note that from now on, both the commit and the revert will appear in the version history of a project. You therefore only undo the changes, but do not delete any information from the version history.
You should therefore only use revert
if you need to undo a change that has already been published.
However, if you are developing locally in a separate branch, it makes more sense to delete these commits completely (see the following section on reset
and the topic Rebase, Sec. 4.1, “Moving commits — Rebase”).
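That a revert adds to the history rather than removing from it can be seen in a small experiment. This is a sketch in a throwaway repository; the README file and the commit messages are invented for the example, and --no-edit is used only to avoid opening an editor.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .

echo readme > README && git add README && git commit -q -m "Add README"
echo "GPL" > LICENSE && git add LICENSE && git commit -q -m "Add LICENSE file"

git revert --no-edit HEAD >/dev/null     # creates a new commit with the inverse patch
subject=$(git log -1 --format=%s)
commits=$(git rev-list HEAD | wc -l)
git log --oneline                        # original commit and revert are both listed
```

After the revert, the LICENSE file is gone from the working tree, but the history contains three commits instead of two.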
If you want to perform a revert, not for all changes in the commit but only for those to a single file, you can use this procedure:
$ git show -R 3329661 -- LICENSE | git apply --index
$ git commit -m 'Revert change to LICENSE from 3329661'
The git show
command prints the changes from commit 3329661
that apply to the LICENSE
file.
The -R
option causes the unified-diff format to be displayed “the other way around” (reverse).
The output is passed to git apply
to make the changes to the file and index.
The changes are then checked in.
Another way to undo a change is to check out a file from a previous commit, add it to the index, and check it in again:
$ git checkout 3329661 -- <file>
$ git add <file>
$ git commit -m 'Reverting <file> to resemble 3329661'
3.2.3. Reset and the Index
If you are deleting a commit completely, not just undoing it, use git reset
.
The reset command sets the HEAD
(and thus the current branch), and optionally the index and working tree, to a particular commit.
The syntax is git reset [<option>] [<commit>]
.
The most important types of resets are the following:
--soft
|
Resets only the |
--mixed
|
Default setting if you do not specify an option.
Sets |
--hard
|
Synchronizes |
If you call git reset
without any options, this is equivalent to a git reset --mixed HEAD
.
We’ve already seen this command: Git sets the current HEAD
to HEAD
(so it doesn’t change it) and the index to HEAD
, so the changes you previously added to the index are unstaged again (they remain in the working tree).
The possible uses of this command are many and varied and will reappear in the various command sequences. Therefore it is important to understand the functionality, even if there are sometimes alternative commands that have the same effect.
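The difference between the three reset types is easiest to see by checking, after each reset, what is left in the index and in the working tree. The following sketch uses a made-up repository with a single file f; the commit messages are arbitrary.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .

echo "first" > f && git add f && git commit -q -m "A"
echo "second version" > f && git commit -q -am "B"

git reset -q --soft HEAD^                        # only HEAD moves back
soft_staged=$(git diff --cached --name-only)     # change is still staged
git commit -q -m "B again"

git reset -q --mixed HEAD^                       # HEAD and index move back
mixed_staged=$(git diff --cached --name-only)    # nothing staged any more
mixed_unstaged=$(git diff --name-only)           # change only in the working tree
git commit -q -am "B once more"

git reset -q --hard HEAD^                        # HEAD, index and working tree move back
hard_content=$(cat f)
echo "soft: [$soft_staged] mixed: [$mixed_staged|$mixed_unstaged] hard: [$hard_content]"
```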
Suppose you have made two commits to master
that you actually want to move to a new branch to work on further.
The following command sequence creates a new branch pointing to HEAD
, and then resets HEAD
and the current branch master
back by two commits.
Then check out the new branch <new-feature>
.
$ git branch <new-feature>
$ git reset --hard HEAD^^
$ git checkout <new-feature>
Alternatively, the following sequence has the same effect: you create a branch <new-feature>
that points to the current commit.
Then you delete master
and re-create it so that it points to the second predecessor of the current commit.
$ git checkout -b <new-feature>
$ git branch -D master
$ git branch master HEAD^^
3.2.3.1. Using Reset
With reset
you do not delete any commits, but only move references.
As a result, the commits that are no longer referenced are lost, and are therefore deleted (unreachable).
So you can use reset
to delete only the topmost commits on a branch, not arbitrary commits “somewhere in the middle,” as this would destroy the commit graph.
(For the somewhat more complicated deletion of commits “in the middle,” see rebase, Sec. 4.1, “Moving commits — Rebase”).
Git always stores the original HEAD
under ORIG_HEAD
.
So if you have performed a reset by mistake, use git reset --hard ORIG_HEAD
to undo it (even if the commit was supposedly deleted).
However, this does not affect lost changes to the working tree (which you have not yet checked in) — they are deleted irrevocably.
The result from above (moving two commits to a new branch) can also be achieved this way:
$ git reset --hard HEAD^^
$ git checkout -b <new-feature> ORIG_HEAD
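The ORIG_HEAD variant can be verified in a scratch repository. This is only a sketch: the three commits and the branch name new-feature are invented, and the branch is named master explicitly so that the example does not depend on the local default branch name.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

for n in 1 2 3; do
    echo "$n" >> file
    git add file
    git commit -q -m "commit $n"
done
tip=$(git rev-parse HEAD)

git reset -q --hard HEAD^^                 # master loses its two newest commits
git checkout -q -b new-feature ORIG_HEAD   # ORIG_HEAD still names the old tip

feature_tip=$(git rev-parse new-feature)
master_commits=$(git rev-list master | wc -l)
echo "new-feature at $feature_tip, master has $master_commits commit(s)"
```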
A common use of reset
is to discard changes on a test basis.
You want to try a patch?
Add some debugging output?
Change a few constants?
If you don’t like the result, a git reset --hard
deletes all changes to the working tree.
You can also use reset
to “make your version history nice.”
For example, if you have a few commits on a branch <feature>
based on master
, but they are not well structured (or much too large), you can create a branch <reorder-feature>
and pack all changes into new commits:
$ git checkout -b <reorder-feature> <feature>
$ git reset master
$ git add -p
$ git commit
$ ...
The command git reset master
sets index and HEAD
to the state of master
.
However, your changes in the working tree are preserved, i.e. all changes that distinguish the branch <feature>
from master
are now only contained in the files in the working tree.
Now you can add the changes incrementally using git add -p
and package them into (several) handy commits.[36]
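The restructuring workflow can be sketched non-interactively. As a simplification, the example below stages whole files with git add instead of selecting hunks with git add -p; the branch names feature and reorder-feature and the files a.txt and b.txt are made up for the illustration.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b feature
echo x > a.txt && echo y > b.txt
git add a.txt b.txt && git commit -q -m "one big unstructured commit"

git checkout -q -b reorder-feature feature
git reset -q master                    # HEAD and index back to master; files stay on disk
git add a.txt && git commit -q -m "Add a.txt"
git add b.txt && git commit -q -m "Add b.txt"

new_commits=$(git rev-list master..reorder-feature | wc -l)
git diff --quiet feature reorder-feature && same_tree=yes
echo "$new_commits commits, same content as feature: $same_tree"
```

The branch reorder-feature ends up with the same content as feature, but split into two smaller commits.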
Suppose you are working on a change and want to check it in temporarily (to continue working on it later). You can then use the following commands:
$ git commit -m 'feature (still unfinished)'
(later)
$ git reset --soft HEAD^
(continue working)
The command git reset --soft HEAD^
resets the HEAD
back by one commit, but leaves the index and the working tree untouched.
So all changes from your temporary commit are still in the index and working tree, but the actual commit is lost.
You can now make further changes and create a new commit later.
Similar functionality is provided by the --amend
option for git commit
, as well as the git stash
command, which is explained in Sec. 4.5, “Outsourcing Changes — Git Stash”.
3.3. Merging Branches
Combining branches is called merging in Git; the commit that joins two or more branches is called a merge commit.
Git provides the merge
subcommand, which allows you to merge one branch into another.
This means that all changes made on that branch are incorporated into the current one.
Note that the command integrates the specified branch into the currently checked-out branch (i.e., HEAD
).
The command therefore only needs one argument:
$ git merge <branch-name>
If you handle your branches carefully, there should be no problems with merging. If there are, then this section also presents strategies for resolving merge conflicts.
First, we will look at an object-level merge process.
3.3.1. Two-Branches Merge
The two branches, topic
and master
, that you want to merge, each reference the most recent commit in a chain of commits (F and D), and these two commits in turn reference a tree (corresponding to the top-level directory of your project).
First, Git calculates a so-called merge base, that is, a commit that both of the commits to be merged have as a common ancestor. Usually there are several such bases (in the diagram below, A and B), and then the most recent one (which has the other bases as ancestors) is used.[37] In simple terms, this is the commit where the branches diverged (i.e., B).
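The merge base that Git would pick can be queried directly with git merge-base. The following sketch builds a small history in a throwaway repository; the commit messages A, B, C and E and the branch name topic are illustrative and only loosely mirror the situation described above.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo A > f && git add f && git commit -q -m "A"
echo B >> f && git commit -q -am "B"
fork=$(git rev-parse HEAD)               # the commit where the branches diverge

git checkout -q -b topic
echo E > t && git add t && git commit -q -m "E"
git checkout -q master
echo C > m && git add m && git commit -q -m "C"

base=$(git merge-base master topic)      # best common ancestor of both branch tips
echo "merge base: $base"
```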
Now, if you want to merge two commits (D and F to M), then the trees referenced by the commits must be merged.
Git does this as follows:[38] If a tree entry (another tree or a blob) is the same in both commits, then that very tree entry will be taken over in the merge commit. This happens in two cases:
-
A file has not been changed by either commit, or a subdirectory does not contain a changed file: In the first case, the blob SHA 1 sum of this file is the same in both commits. In the second case, the same tree object is referenced by both commits. The referenced blob or tree is therefore the same as the one referenced in the merge base.
-
A file was changed on both sides and equivalently (same blobs). This happens, for example, if all changes to a file were copied from one branch using
git cherry-pick
(see Sec. 3.5, “Taking over Individual Commits: Cherry Picking”). The referenced blob is then not the same as in the merge base.
If a tree entry disappears in one of the commits, but is still present in the other, and is the same as in the merge base, then it is not taken over. This is equivalent to deleting a file or directory if no changes have been made to the file on the other side. Similarly, if a commit brings a new tree entry, it is copied to the merge tree.
Now what happens if a file from the commits has different blobs, that is, the file has been changed at least on one side? In the event that one of the blobs is the same as in the merge base, only one side of the file has been changed, so Git can simply adopt those changes.
However, if both blobs are different from the merge base, you might run into problems. First, Git tries to apply the changes on both sides.
A 3-way merge algorithm is usually employed for this purpose. Unlike the classic 2-way merge algorithm, which is used when you have two different versions A and B of a file and want to merge them, this 3-way algorithm involves a third version C of the file, extracted from the above merge base. Because a common ancestor of the file is known, the algorithm can in many cases decide better (that is, not only based on line numbers or context) how to merge changes. In practice, many trivial merge conflicts are thus resolved automatically without user intervention.
However, there are conflicts that no merge algorithm, no matter how good, can resolve. This happens, for example, if version A changes the context directly adjacent to a change made in version B, or, worse still, if versions A, B and C each contain a different version of the same line.
Such a case is called a merge conflict. Git merges all the files as best it can, and then presents the conflicting changes to the user so they can manually merge them (and thus resolve the conflict) (see Sec. 3.4, “Resolving Merge Conflicts”).
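A 3-way merge of individual files can be tried without any repository using git merge-file, which takes the current version, the base version, and the other version as arguments. In this sketch all file names and contents are made up; the first merge combines changes to different lines, the second provokes a conflict on the same line.

```shell
set -e
cd "$(mktemp -d)"

printf 'one\ntwo\nthree\nfour\nfive\n' > base
printf 'ONE\ntwo\nthree\nfour\nfive\n' > ours      # our side changes line 1
printf 'one\ntwo\nthree\nfour\nFIVE\n' > theirs    # their side changes line 5
merged=$(git merge-file -p ours base theirs)       # -p prints the result instead of overwriting
echo "$merged"

printf 'one\ntwo\nOURS\nfour\nfive\n' > ours2      # both sides change line 3 differently
printf 'one\ntwo\nTHEIRS\nfour\nfive\n' > theirs2
conflicts=0
git merge-file -p ours2 base theirs2 > result || conflicts=$?   # exit status counts conflicts
echo "conflicts: $conflicts"
cat result                                         # contains conflict markers
```

The first call succeeds silently because the base version shows that each side changed a different line; the second cannot be decided automatically.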
Although it is basically possible to generate a syntactically correct resolution with an algorithm that is specially designed for the respective programming language, an algorithm cannot look beyond the semantics of the code, i.e., cannot grasp the meaning of the code. Therefore, a solution generated in this way would usually not make sense.
3.3.2. Fast Forward Merges: Fast Forwarding One Branch
The git merge
command does not always create a merge commit.
A trivial case, but one that occurs frequently, is the so-called fast-forward merge, i.e. simply moving the branch forward.
A fast-forward merge occurs when a branch, for example topic, is a descendant of a second branch, master
:
A simple git merge topic
in Branch master
now causes master
to simply be moved forward — no merge commit is created.
Of course, such a behavior only works if the two branches have not diverged, i.e. if the merge base of both branches is one of the two branches itself, in this case master
.
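A fast-forward can be observed in a scratch repository. The sketch below uses the branch names from the text and passes --ff-only, which makes Git refuse anything other than a fast-forward, so the example behaves the same regardless of local merge settings.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo base > f && git add f && git commit -q -m "base"
git checkout -q -b topic
echo more >> f && git commit -q -am "topic work"
git checkout -q master

git merge -q --ff-only topic                     # master is simply moved forward
master_tip=$(git rev-parse master)
topic_tip=$(git rev-parse topic)
parents=$(git log -1 --format=%P | wc -w)        # 1 parent: no merge commit was created
echo "master=$master_tip topic=$topic_tip parents=$parents"
```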
This behavior is often desirable:
-
You want to integrate upstream changes, that is, changes from another Git repository. You typically use a command like git merge origin/master to do this. A git pull will also perform a merge. To learn how to merge changes between Git repositories, see Ch. 5, Distributed Git.
-
You want to integrate an experimental branch. Because it’s quick and easy to create branches in Git, it’s a good idea to start a new branch for each feature. If you’ve tried something experimental on a branch and want to integrate it without it being visible later when the integration took place, you can do so by fast-forwarding.
With the options |
There are different opinions on whether changes should always be integrated via fast-forward or whether it is better to create a merge commit, although this is not absolutely necessary. The results are the same in both cases: Changes from one branch are integrated into another.
However, when you create a Merge-Commit, the integration of a feature becomes clear. Consider the following two excerpts from the version history of a project:
In the above case, you cannot easily see which commits were previously developed in branch sha1-caching
, that is, they have to do with a specific feature of the software.
In the lower version, however, you can see at first glance that there were exactly four commits on that branch, and that it was then integrated. Since nothing was developed in parallel, the merge commit would in principle be unnecessary, but it does make the integration of the feature clear.
So instead of relying on the magic of nfm = merge --no-ff # no-ff-merge ffm = merge --ff-only # ff-merge |
An explicit merge commit is also helpful because you can undo it with a single command. This is useful, for example, if you have integrated a branch but it has bugs: if the code is running in production, it is often desirable to back out the entire change until the bug is fixed. To do this, use:
git revert -m 1 <merge-commit>
Git then produces a new commit that reverses any changes made by the merge.
The -m 1
option here specifies which “side” of the merge should be considered the mainline, or stable line of development: its changes are preserved.
In the above example, -m 1
would cause the changes made by the four commits from branch sha1-caching
, the second parent of the merge, to be undone.
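Forcing a merge commit with --no-ff and backing it out again with revert -m 1 can be sketched as follows. The repository and the file cache.txt are invented; the branch name sha1-caching is borrowed from the example above purely for illustration.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo base > f && git add f && git commit -q -m "base"
git checkout -q -b sha1-caching
echo cache > cache.txt && git add cache.txt && git commit -q -m "Add cache"
git checkout -q master

git merge -q --no-ff -m "Merge branch 'sha1-caching'" sha1-caching
parents=$(git log -1 --format=%P | wc -w)     # 2 parents: an explicit merge commit

git revert --no-edit -m 1 HEAD >/dev/null     # keep parent 1 (master), undo the branch's changes
if [ -e cache.txt ]; then feature=present; else feature=removed; fi
echo "parents=$parents feature=$feature"
```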
3.3.3. Merge Strategies
Git has five different merge strategies, some of which can be further adjusted by strategy options.
You determine the strategy by -s
, so a merge call is as follows:
git merge -s <strategy> <branch>
Some of these strategies can only merge two branches, others any number.
resolve
-
The
resolve
strategy can merge two branches using a 3-way merge technique. The newest (best) of all possible bases is used as the merge base. This strategy is fast and generally produces good results.
recursive
-
This is the standard strategy that Git uses to merge two branches. A 3-way merge algorithm is also used here. However, this strategy is more clever than resolve: If several merge bases exist, all of which have “equal rights,”[39] then Git first merges these bases together, and then uses the result as the merge base for the 3-way merge algorithm. In addition to the fact that merges with file renames can be processed more easily as a result, a test run on the version history of the Linux kernel has shown that this strategy results in fewer merge conflicts than the resolve strategy. The strategy can be adapted by various options (see below).
octopus
-
Standard strategy when three or more branches are merged. In contrast to the two strategies mentioned above, the octopus strategy can only perform merges if no error occurs, i.e. if no manual conflict resolution is necessary. The strategy is especially designed to integrate many topic branches that are known to be compatible with the mainline (main development strand).
ours
-
Can merge any number of branches, but does not use a merge algorithm. Instead, the blobs or trees of the current branch (that is, the branch from which you entered
git merge
) are always used. This strategy is mainly used when you want to overwrite old developments with the current state of affairs.
subtree
-
Works like recursive, but the strategy does not compare the trees “on equal footing,” but tries to find the tree of one side as a subtree of the other side and only then merge them. This strategy is useful, for example, if you manage the Documentation/ subdirectory of your project in a separate repository. Then you can merge the changes from that repository into the master repository by using git pull -s subtree <documentation-repo> to apply the subtree strategy, which recognizes the contents of <documentation-repo> as a subdirectory of the master repository and applies the merge process only to that subdirectory. This topic is discussed in more detail in Sec. 5.11, “Managing Subprojects”.
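As an illustration of how a strategy is selected with -s, the following sketch merges a branch with the ours strategy and checks that the resulting tree is identical to the tree before the merge. The branch name old-topic and the file contents are made up.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo base > f && git add f && git commit -q -m "base"
git checkout -q -b old-topic
echo "old approach" > f && git commit -q -am "old approach"
git checkout -q master
echo "new approach" > f && git commit -q -am "new approach"

before=$(git rev-parse "HEAD^{tree}")
git merge -q -s ours -m "Merge old-topic, keeping our state" old-topic
after=$(git rev-parse "HEAD^{tree}")
parents=$(git log -1 --format=%P | wc -w)
content=$(cat f)
echo "tree unchanged: $([ "$before" = "$after" ] && echo yes), parents: $parents"
```

Although both sides changed the same line, no conflict arises, because the other side's content is never consulted.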
3.3.4. Options for the Recursive Strategy
The default strategy recursive
knows several options that adjust the behavior especially with regard to conflict resolution.
You specify them with the option -X
; the syntax is:
git merge -s recursive -X <option> <branch>
If you only merge two branches, you do not need to explicitly specify the recursive
strategy by -s recursive
.
Since the strategy can only merge two branches, it is possible to speak of our version and theirs: our version is the checked-out branch in the merge process, while their version references the branch you want to integrate.
ours
-
If a merge conflict occurs that would normally need to be resolved manually, our version is used instead. This strategy option differs from the ours merge strategy, however, because that strategy ignores any changes made by the other side(s). The ours option, on the other hand, takes all changes made by our side and the other side, and gives priority to our side only in the event of a conflict and only at the points of conflict.
theirs
-
Like
ours
, except that the opposite is true: in case of conflicts, their version is preferred.
ignore-space-change, ignore-all-space, ignore-space-at-eol
-
Since whitespace does not play a syntactic role in most languages, these options allow you to tell Git to try to resolve a merge conflict automatically if whitespace is not important. A common use case is when an editor or IDE has automatically reformatted source code.
The option ignore-space-at-eol ignores whitespace at the end of the line, which is especially helpful if both sides use different line-end conventions (LF/CRLF). If you specify ignore-space-change, whitespace is also treated as a pure separator: when comparing a line, it is irrelevant how many spaces or tabs are in one place; indented lines remain indented, and separated words remain separated. The option ignore-all-space ignores any whitespace.
This is the general strategy: If their version brings in only whitespace changes covered by the specified option, they are ignored and our version is used; if they bring in further changes and our version has only whitespace changes, their version is used. However, if both sides have not only whitespace changes, there is still a merge conflict.
In general, after a merge that you could only solve by using one of these options, it is recommended to normalize the corresponding files again, i.e. to make the line endings and indentations uniform.
subtree=<tree>
-
Similar to the subtree strategy, but an explicit path is specified here. Similar to the above example, you would use:
git pull -Xsubtree=Documentation <documentation-repo>
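The effect of the ours strategy option (as opposed to the ours strategy) can be sketched with a file in which both sides change the first line, while only the other side changes the last line. The repository, the branch name topic, and the file contents are invented for this example.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

printf '1\n2\n3\n4\n5\n6\n7\n' > f && git add f && git commit -q -m "base"
git checkout -q -b topic
printf 'theirs\n2\n3\n4\n5\n6\ntheirs-tail\n' > f && git commit -q -am "topic changes"
git checkout -q master
printf 'ours\n2\n3\n4\n5\n6\n7\n' > f && git commit -q -am "master change"

git merge -q -X ours -m "Merge topic, preferring our side on conflicts" topic
first=$(sed -n 1p f)      # conflicting line: our version wins
last=$(sed -n 7p f)       # non-conflicting change from the other side is kept
echo "first line: $first, last line: $last"
```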
3.4. Resolving Merge Conflicts
As already described, some conflicts cannot be resolved by algorithms — in this case manual rework is necessary. Good team coordination and fast integration cycles can minimize major merge conflicts. But especially in early development, when possibly the internals of a software are changed instead of adding new features, conflicts can occur.
If you are working in a larger team, the developer who has done most of the work on the conflicted code is usually responsible for finding a solution. However, such a conflict resolution is usually not difficult if the developer has a good overview of the software in general and of his piece of code and its interaction with other parts in particular.
We will go through the solution of a merge conflict using a simple example in C.
Take a look at the following output.c
file:
int i;
for(i = 0; i < nr_of_lines(); i++)
    output_line(i);
print_stats();
The piece of code goes through all lines of an output and outputs them one after the other. Finally it returns a small statistic.
Now two developers change something in this code.
The first one, Axel, writes a function that wraps the lines before they are output and replaces output_line
in the above piece of code with his improved version output_wrapped_line
:
int i;
int tw = 72;
for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);
print_stats();
The second developer, Beatrice, modifies the code so that her newly introduced configuration setting max_output_lines
is honored and not too many lines are output:
int i;
for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}
print_stats();
So Beatrice uses the “obsolete” version output_line
, and Axel does not yet have the construct that checks the configuration setting.
Now Beatrice tries to transfer her changes on Branch B to the branch master
, where Axel has already integrated his changes:
$ git checkout master $ git merge B Auto-merging output.c CONFLICT (content): Merge conflict in output.c Automatic merge failed; fix conflicts and then commit the result.
In the output.c
file, Git now places conflict markers, shown below, to indicate where the changes overlap.
There are two sides: The first is HEAD
, i.e. the branch to which Beatrice wants to apply the changes — in this case master
.
The other side is the branch to be integrated — B.
The two sides are separated by a series of equal signs:
int i;
int tw = 72;
<<<<<<< HEAD
for(i = 0; i < nr_of_lines(); i++)
    output_wrapped_line(i, tw);
=======
for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_line(i);
}
>>>>>>>
print_stats();
Note that Git flags only the changes that actually conflict. Axel’s definition of tw above is accepted without any problems, although it does not yet exist in Beatrice’s version.
Beatrice must now resolve the conflict. This is done by first editing the file directly, modifying the code as it should be, and then removing the conflict markers. If Axel has documented in detail in his commit message[40] how his new function works, this should be done quickly:
int i;
int tw = 72;
for(i = 0; i < nr_of_lines(); i++) {
    if(i > config_get("max_output_lines"))
        break;
    output_wrapped_line(i, tw);
}
print_stats();
Beatrice must then add the changes using git add
.
Adding the file, once it no longer contains conflict markers, signals to Git that the conflict has been resolved.
Finally, the result has to be checked in:
$ git add output.c
$ git commit
The commit message should definitely state how this conflict was resolved. It should also mention possible side effects on other parts of the program.
Normally, merge commits are “empty”, i.e., there is no diff output in git show
(because the changes were caused by other commits).
This is different in the case of a merge commit that resolves a conflict:
$ git show
commit 6e6c55810c884356402c078f30e45a997047058e
Merge: f894659 256329f
Author: Beatrice <beatrice@gitbu.ch>
Date:   Mon Feb 28 05:59:36 2011 +0100

    Merge branch 'B'

    * B:
      honor max_output_lines config option

    Conflicts:
        output.c

diff --cc output.c
index a2bd8ed,f4c8bec..e39e39d
--- a/output.c
+++ b/output.c
@@@ -1,7 -1,9 +1,10 @@@
  int i;
 +int tw = 72;
- for(i = 0; i < nr_of_lines(); i++)
+ for(i = 0; i < nr_of_lines(); i++) {
+     if(i > config_get("max_output_lines"))
+         break;
 -    output_line(i);
 +    output_wrapped_line(i, tw);
+ }
  print_stats();
This combined diff output differs from the usual unidiff format: There is not only one column with the markers for added (+
), removed (-
) and context or unchanged (␣
), but two.
So Git compares the result with both ancestors.
The lines changed in the second column are exactly the same as Axel’s commit; the (semi-bold) changes in the first column are Beatrice’s commit including conflict resolution.
The default way, as seen above, is the following:
-
Open conflicting file
-
Resolve conflict, remove markers
-
Mark file as “resolved” via
git add
-
Repeat steps one to three for all files where conflicts occurred
-
Check in conflict solutions via
git commit
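The steps above can be sketched as a scripted session. To keep the example short, output.c here consists of a single line and the "editing" step is simulated by overwriting the file; the branch names master and B follow the text, everything else is made up.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo "output_line(i);" > output.c
git add output.c && git commit -q -m "base"
git checkout -q -b B
echo "output_line(i); /* limited */" > output.c && git commit -q -am "B: limit output"
git checkout -q master
echo "output_wrapped_line(i, tw);" > output.c && git commit -q -am "master: wrap lines"

git merge B >/dev/null 2>&1 || echo "merge stopped with conflicts"
unmerged=$(git diff --name-only --diff-filter=U)    # files that still have conflicts

# Steps 1 and 2: edit the file into its final form, without conflict markers
echo "output_wrapped_line(i, tw); /* limited */" > output.c
git add output.c                                    # step 3: mark the file as resolved
git commit -q -m "Merge branch 'B', keeping wrapped output"   # step 5: conclude the merge
parents=$(git log -1 --format=%P | wc -w)
echo "was unmerged: $unmerged, parents of result: $parents"
```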
If you don’t know how to resolve the conflict right away (for example, if you want to ask the original developer to produce a conflict-free version of the code), you can use git merge --abort
to abort the merge process — that is, to restore your working tree to the state it was in before you initiated the merge.
This command also aborts a merge that you have already partially resolved.
Attention: All changes that have not been checked in will be lost.
To get an overview of which commits caused changes to your file relevant to the merge conflict, you can use the command git log --merge -p -- <file> Git then lists the diffs of commits that have made changes to |
If you are in a merge conflict, a file with conflicts is stored in three stages: Stage one contains the version of the file in the merge base (that is, the common original version of the file), stage two contains the version from the HEAD
(that is, the version from the branch into which you are merging).
Finally, stage three contains the file in the version of the branch you are merging in (this has the symbolic reference MERGE_HEAD
).
The working tree contains the combination of these three stages with conflict markers.
However, you can display these versions with git show :<n>:<file>
:
$ git show :1:output.c
$ git show :2:output.c
$ git show :3:output.c
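The three stages can be inspected in a deliberately conflicted scratch repository. In this sketch the file f, its contents, and the branch name B are invented; git ls-files -u is used to list the unmerged index entries, and git merge --abort restores the state before the merge.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q . && git symbolic-ref HEAD refs/heads/master

echo "base version" > f && git add f && git commit -q -m "base"
git checkout -q -b B
echo "their version" > f && git commit -q -am "change on B"
git checkout -q master
echo "our version" > f && git commit -q -am "change on master"

git merge B >/dev/null 2>&1 || true      # stops with a conflict in f

stage1=$(git show :1:f)                  # merge base
stage2=$(git show :2:f)                  # HEAD, the branch being merged into
stage3=$(git show :3:f)                  # MERGE_HEAD, the branch being merged in
entries=$(git ls-files -u | wc -l)       # one index entry per stage
echo "$stage1 | $stage2 | $stage3 ($entries unmerged entries)"

git merge --abort                        # back to the state before the merge
after_abort=$(cat f)
```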
With a program specially developed for 3-way merges, however, it is much easier for you to keep an overview. The program looks at the three stages of a file, visualizes them accordingly and offers you options to move changes back and forth.
3.4.1. Help with Merging: Mergetool
In the case of non-trivial merge conflicts, a merge tool is recommended that visualizes the three stages of a file accordingly, thereby facilitating the resolution of the conflict.
Common IDEs and editors such as Vim and Emacs offer such a mode. There are also external tools such as KDiff3[41] and Meld.[42] The latter visualizes particularly well how a file has changed between commits.
You launch such a merge tool via git mergetool
.
Git will go through all the files that contain conflicts and display each one (when you press enter) in a merge tool.
By default this is Vimdiff.[43]
Such a program will usually display the three versions of a file (our side, their side, and the file merged as far as possible, including conflict markers) in three columns side by side, the latter sensibly in the middle. It is always essential that you make the change (conflict resolution) in the middle file, i.e. in the working copy. The other files are temporary and are deleted again when the merge tool is finished.
In principle, you can use any other tool.
The mergetool
script simply stores the three stages of the file with the corresponding file name and starts the diff tool on these three files.
When the tool exits, Git checks whether any conflict markers are left in the file. If not, Git assumes that the conflict was resolved successfully and automatically adds the file to the index using git add
.
Finally, when you have finished processing all the files, you only need to make one commit call to seal the conflict resolution.
The merge.tool
option determines which tool Git starts on the file.
The following commands are already preconfigured, meaning that Git already knows in which order the program expects the arguments and which additional options need to be specified:
araxis bc3 codecompare deltawalker diffmerge diffuse ecmerge emerge gvimdiff gvimdiff2 gvimdiff3 kdiff3 meld opendiff p4merge tkdiff tortoisemerge vimdiff vimdiff2 vimdiff3 xxdiff
To use your own merge tool, you must set merge.tool
to a suitable name, for example mymerge
, and then at least specify the mergetool.mymerge.cmd
option.
The shell evaluates the expression stored in it, and the variables BASE
, LOCAL
, REMOTE
, and MERGED
, the last of which is the file with the conflict markers, are set to the corresponding files (temporary ones for the first three).
You can further configure the properties of your merge command, see the git-config(1)
man page in the mergetool
configuration section.
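As a sketch of such a configuration: the tool name mymerge follows the text above, while the program name my-merge-program and its argument order are purely hypothetical and have to be adapted to the tool you actually use.

```shell
set -e
cd "$(mktemp -d)"        # throwaway sandbox repository
git init -q

# "mymerge" and "my-merge-program" are placeholders for illustration.
git config merge.tool mymerge
git config mergetool.mymerge.cmd \
  'my-merge-program "$BASE" "$LOCAL" "$REMOTE" "$MERGED"'
# Let the exit status of the tool decide whether the merge succeeded.
git config mergetool.mymerge.trustExitCode true

git config --get-regexp '^merge'   # show the resulting settings
```

The single quotes matter here: the variables must reach Git unexpanded, because they are only filled in when the merge tool is started.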
If you temporarily (not permanently) decide to use another merge program, specify it with the |
3.4.2. Rerere: Reuse Recorded Resolution
Git has a relatively unknown (and poorly documented), but very helpful feature: Rerere, short for Reuse Recorded Resolution.
You need to set the rerere.enabled
option to true
to have the command called automatically (note the d
at the end of enabled
).
The idea behind Rerere is simple but effective: Whenever a merge conflict occurs, Rerere automatically records a pre-image, an image of the conflict file including markers. In the case of the example above, it would look like this:
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Recorded preimage for 'output.c'
Automatic merge failed; fix conflicts and then commit the result.
If the conflict is resolved as above and the solution is checked in, Rerere saves the conflict resolution:
$ vim output.c
$ git add output.c
$ git commit
Recorded resolution for 'output.c'.
[master 681acc2] Merge branch 'B'
So far Rerere has not really helped. But now we can delete the merge commit completely (and are back to the situation before the merge). Then we execute the merge again:
$ git reset --hard HEAD^
HEAD is now at f894659 wrap output at 72 chars
$ git merge B
Auto-merging output.c
CONFLICT (content): Merge conflict in output.c
Resolved 'output.c' using previous resolution.
Automatic merge failed; fix conflicts and then commit the result.
Rerere notices that the conflict is known and that a solution has already been found.[44] So Rerere calculates a 3-way-merge between the saved pre-image, the saved solution and the version of the file in the working tree. This way Rerere can resolve not only the same conflicts, but also similar ones (if in the meantime further lines outside the conflict area have been changed).
The result is not directly added to the index.
The solution is simply copied to the file.
You can then use git diff
to check whether the solution looks useful, run tests if necessary, etc.
If everything looks good, you can accept the automatic solution via git add
as usual.
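The whole cycle (record the pre-image, record the resolution, replay it) can be reproduced in a sandbox. This is a minimal sketch under assumptions: the repository, the identity settings and the contents of output.c are made up, and the conflict is resolved by overwriting the file instead of editing it by hand.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true           # note the "d" at the end

printf 'one\ntwo\nthree\n' > output.c
git add output.c
git commit -q -m "initial commit"

git checkout -q -b B
printf 'one\nTWO from B\nthree\n' > output.c
git commit -q -am "change on B"

git checkout -q master
printf 'one\nTWO from master\nthree\n' > output.c
git commit -q -am "change on master"

git merge B || true                      # conflict: Rerere records the pre-image
printf 'one\nTWO merged\nthree\n' > output.c   # stands in for the manual resolution
git add output.c
git commit -q -m "Merge branch 'B'"      # Rerere records the resolution

git reset -q --hard HEAD^                # throw the merge commit away
git merge B || true                      # same conflict: Rerere replays the resolution
cat output.c                             # resolved content, no conflict markers
git ls-files -u                          # the index still lists the file as unmerged
```

The last command illustrates the point made above: the replayed solution is only written to the working tree and still has to be added to the index.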
3.4.2.1. Why Rerere Makes Sense
One might object: Who voluntarily takes the risk of deleting a merge conflict resolution that has already been worked out (possibly at great cost), only to repeat the merge at some point?
However, the procedure is desirable: First of all, it doesn’t make sense to simply periodically and out of habit merge the mainline — i.e. the main development thread, e.g. master
— into the topic branch (we will come back to this later).
But if you have a long-lived topic branch and want to test it occasionally to see if it is compatible with the mainline, you don’t want to resolve the conflicts by hand every time — once resolved, Rerere will resolve conflicts automatically.
This way you can successively develop your feature, knowing that it is in conflict with the mainline.
But at the time of the integration of the feature the conflicts are all automatically resolvable (because you have occasionally saved conflict solutions with Rerere).
In addition, Rerere is also called automatically in conflict cases that arise in a rebase process (see Sec. 4.1, “Moving commits — Rebase”). Again, once conflicts have been resolved, they can be automatically resolved again. Once you have merged a branch into the mainline for test purposes and resolved a conflict, this solution is automatically applied when you rebuild this branch on the mainline via rebase.
3.4.2.2. Using Rerere
In order for the Rerere functionality to be used, you must set the rerere.enabled
option to true
, as mentioned above.
Rerere will then be called automatically when a merge conflict occurs (to capture the pre-image, possibly to resolve the conflict) and when a conflict resolution is checked in (to save the resolution).
Rerere stores information such as pre-image and resolution in .git/rr-cache/
, uniquely identified by a SHA-1 sum.
You almost never need to call the git rerere
subcommand, as it is already handled by merge
and commit
.
You can also use git rerere gc
to delete very old solutions.
What happens if a wrong conflict resolution was checked in?
Then you should delete the conflict resolution, otherwise Rerere will reapply the solution when you repeat the conflicted merge.
To do this, there is the command git rerere forget <file>
. Directly after Rerere has applied a wrong solution, you can delete the recorded solution in this way and restore the original state of the file (i.e. with conflict markers).
If you only want to do the latter, a git checkout -m <file>
will also help.
3.4.3. Avoiding Conflicts
Decentralized version control systems generally manage merges much better than central ones. This is mainly due to the fact that it is common practice in decentralized systems to check in many small changes locally first. This avoids “monster commits”, which offer much more potential for conflict. This finer-grained development history, together with the fact that merges are usually recorded as data in the version history (as opposed to simply copying lines of code), means that decentralized systems do not have to rely on the mere contents of files when merging.
Prevention is the best way to minimize merge conflicts.
Make small commits!
Combine your changes so that the resulting commit makes sense as a unit.
Always build Topic Branches on the latest release.
Merge from topic branches into “collection branches” or directly into master
, not the other way around.[45]
Using Rerere prevents conflicts that have already been resolved from constantly reoccurring.
Obviously, good communication among developers is also important for prevention: If several developers implement different and mutually influencing changes to the same function, this will certainly lead to conflicts sooner or later.
Another factor that unfortunately often leads to unnecessary(!) conflicts is autogenerated content. Suppose you write the documentation of a piece of software in AsciiDoc[46] or work on a LaTeX project with several contributors: Never add the compiled man pages or the compiled DVI/PS/PDF to the repository! In the autogenerated formats, small changes to the plaintext (i.e. in the AsciiDoc or LaTeX source) can cause large (and unpredictable) changes to the compiled formats that Git will not resolve adequately. Instead, it makes sense to provide appropriate Makefile targets or scripts to generate the files, and possibly keep the compiled version on a separate branch.[47]
3.5. Taking over Individual Commits: Cherry Picking
It will happen that you don’t want to integrate an entire branch directly, but rather parts, i.e. individual commits, first.
The cherry-pick
(“pick the good cherries”) git command is responsible for this.
The command expects one or more commits to be copied to the current branch. For example:
$ git cherry-pick d0c915d
$ git cherry-pick topic~5 topic~1
$ git cherry-pick topic~5..topic~1
The middle command copies two explicitly specified commits; the last command, on the other hand, copies all commits belonging to the specified commit range.
Unlike a merge, however, only the changes are integrated, not the commit itself.
To do this, it would have to reference its predecessor, so that the predecessor would also have to be integrated, and so on, which is equivalent to a merge.
So when you take over commits with cherry-pick
, new commits are created with a new commit ID.
Git can’t know that these commits are actually the same.
So if you are merging two branches that you have cherry-picked changes between, conflicts can occur.[48]
These are usually trivial to resolve, and the strategy options ours
and theirs
might be helpful (see Sec. 3.3.4, “Options for the Recursive Strategy”).
The rebase command, on the other hand, recognizes such commit duplications,[49] and omits the duplicated commits.
This allows you to take some commits “from the middle” and then rebuild the branch the commits came from.
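That a cherry-picked commit carries the same change under a new commit ID can be checked with git patch-id, which computes a checksum over the diff only. The following is a minimal sketch in a sandbox; the branch name topic and the file names are assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"

git checkout -q -b topic
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "add fix"

git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "unrelated change on master"

git cherry-pick topic                    # copy the tip commit of topic to master

orig=$(git rev-parse topic)
copy=$(git rev-parse master)
orig_patch=$(git show "$orig" | git patch-id | cut -d' ' -f1)
copy_patch=$(git show "$copy" | git patch-id | cut -d' ' -f1)
echo "commit IDs: $orig / $copy"             # differ
echo "patch IDs:  $orig_patch / $copy_patch" # identical
```

The commit IDs differ because predecessor and commit time differ, while the patch IDs match because the diffs are the same.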
The cherry-pick
command also understands these merge strategy options itself: If you want to copy a commit to the current branch and want the changes of the copied commit to take precedence in case of a conflict, use:
git cherry-pick -Xtheirs <commit>
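The -n option (--no-commit) applies the changes without creating a commit each time, so that you can combine several commits into one:
$ git cherry-pick -n 785aa39 512f3e9 4e4a063
Finished one cherry-pick.
Finished one cherry-pick.
Finished one cherry-pick.
$ git commit -m "Various small changes"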
3.6. Visualizing Repositories
When you have created and merged some branches, you will have noticed that it’s easy to lose track.
The arrangement of commits and their relationships to each other is called the topology of a repository.
In the following, we will introduce the graphical program gitk
, among other things, to examine these topologies.
For small repositories, first call gitk --all
, which displays the entire repository as a graph.
Clicking on the individual commits displays the meta-information as well as the generated patch.
3.6.1. Revision Parameters
Since the listing of multiple commits is hard to keep track of, we examine a small sample repository with several branches merged together:
[Figure: the sample repository in gitk]
We recognize four branches (A-D) and one tag release
.
We can also display this tree on the console with the appropriate command line options using the log
command (branch and tag names are printed in semi-bold for better distinction):
$ git log --decorate --pretty=oneline --abbrev-commit --graph --all
* c937566 (HEAD, D) commit on branch D
| * b0b30ef (release, A) Merge branch 'C' into A
| |\
| | * 807db47 (C) commit on branch C
| | * 996a53b commit on branch C
| |/
|/|
| * 83f6bf3 commit on branch A
| * 5b2c291 Merge branch 'B' into A
| |\
| | * 2417cf7 (B) commit on branch B
| |/
|/|
| * 0bf1433 commit on branch A
|/
* 4783886 initial commit
The output of the So for a quick overview, it’s much more convenient to set up an alias that automatically adds the many long options.
The authors use the following alias for this:
$ git config --global alias.tree \
  'log --decorate --pretty=oneline --abbrev-commit --graph'
By using |
Now we change the above command: instead of the --all
option, which puts all commits in the tree, we now specify B
(the name of the branch)
$ git tree B
* 2417cf7 (B) commit on branch B
* 4783886 initial commit
We get all commits that are reachable from B. A commit only knows its predecessor(s) (several if branches are merged). “All commits reachable from B” thus refers to the list of commits from B onwards, up to a commit that has no predecessor (called a root commit).
Instead of one, the command can also accept multiple references.
So to get the same output as with the --all
option, you must specify references A, B, and D.
C can be omitted because the commit is already “collected” on the way from A to the root commit.
Of course, you can also specify an SHA-1 sum directly instead of symbolic references:
$ git tree 5b2c291
* 5b2c291 Merge branch 'B' into A
|\
| * 2417cf7 (B) commit on branch B
* | 0bf1433 commit on branch A
|/
* 4783886 initial commit
If a reference is preceded by a caret (^
), this negates the meaning.[50]
So the notation ^A
means: not the commits that are reachable from A.
However, this switch only excludes these commits; it does not include any others.
So the above log command with the argument ^A
will not output anything, because Git only knows which commits should not be displayed.
So again, we add --all
to list all commits, minus those that are reachable from A:
$ git tree --all ^A
* c937566 (HEAD, D) commit on branch D
An alternative notation is available with --not
: Instead of ^A
you can also write --not A
.
Such commands are especially useful for examining the difference between two branches: Which commits are in branch D that are not in A? The following command gives the answer:
$ git tree D ^A
* c937566 (HEAD, D) commit on branch D
Because this question is often asked, there is another, more intuitive notation for it: A..D
is equivalent to D ^A
:
$ git tree A..D
* c937566 (HEAD, D) commit on branch D
Of course the order is important here: “D without A” is a different set of commits than “A without D”! (Compare also the complete graph.)
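The equivalence of the notations can be verified with git rev-list, which prints the commit IDs selected by a revision expression. This is a minimal sketch: the sandbox contains only two branches A and D with one commit each, which is far smaller than the sample repository above.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

git checkout -q -b A
git commit -q --allow-empty -m "commit on branch A"
git checkout -q -b D master
git commit -q --allow-empty -m "commit on branch D"

git rev-list D ^A        # reachable from D, but not from A
git rev-list D --not A   # same set, written with --not
git rev-list A..D        # same set, range notation
git rev-list D..A        # the other direction: a different commit
```

In this sandbox each of the first three commands prints the ID of the commit on branch D, the last one the ID of the commit on branch A.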
In our example there is a tag release
.
To check which commits from branch D (which could stand for “Development”) are not yet included in the current release, simply specify release..D
.
The syntax Alternatively, Git provides the symmetrical difference |
3.6.1.1. Reference vs. List of References
In the example, A always refers to all commits that are reachable from A.
But actually a branch is just a reference to a single commit.
So why does log
always list all commits reachable from A, while the git command show
with the argument A only shows this one commit?
The difference is what the commands expect as an argument: show
expects an object, that is, a reference to a single object, which is then displayed.[51]
Many other commands expect one (or more) commits instead, and these commands convert the arguments into a list of commits (traversing the history back to the root commit).
3.6.2. Gitk
Gitk is a graphical program implemented in Tcl, which is usually packaged by distributors along with the actual Git commands — so you can be sure to find it on almost any system.
It represents individual commits or the entire repository in a three-part view: at the top is the tree structure with two additional columns for author and date, below is a list of changes in unified diff format, and a list of files to restrict the changes displayed.
The graph view is intuitive: Different colors help to distinguish the different lines of development.
Commits are always blue dots, with two exceptions: The HEAD
is highlighted in yellow, and a commit that is not a root commit, but whose predecessor is not displayed, is shown in white.
Branches with an arrowhead indicate that further commits have been made on the branch. However, Gitk hides the continuation of the branch because of the large time gap between the commits. A click on the arrowhead will take you to the continuation of the branch.
Branches appear as green labels, the currently checked out branch additionally bold. Tags are shown as yellow arrows.
You can delete or check out a branch with a right click on it. Right-clicking on commits opens a menu in which you can perform actions on the selected commit. The only thing that might be easier to do with Gitk than from the command line is cherry picking, i.e. transferring individual commits to another branch (see also Sec. 3.5, “Taking over Individual Commits: Cherry Picking”).
Gitk accepts essentially the same options as git log
.
Some examples:
$ gitk --since=yesterday -- doc/
$ gitk e13404a..48effd3
$ gitk --all -n 100
The first command shows all commits since yesterday that have made changes to a file under the doc/
directory.
The second command limits the commits to a specific range, while the third command shows the 100 most recent commits from all branches.
Experience shows that beginners are often confused because |
Many users leave gitk
open during work.
Then it’s important to update the display from time to time so that more recent commits appear.
With F5 (Update) you load all new commits and refresh the display of the references.
Sometimes, however, if you delete a branch, for example, this is not enough.
Although the branch is no longer displayed, there may still be unreachable commits in the GUI as artifacts.
The key combination Ctrl+F5 (Reload) completely reloads the repository, which solves the problem.
As an alternative to gitk
, you can use the GTK-based gitg
or Qt-based qgit
on UNIX systems; on an OS X system, for example, you can use GitX; for Windows, you can use GitExtensions.
Some IDEs now also have corresponding visualizations (e.g. the Eclipse plugin EGit).
Furthermore, you can use full-fledged Git clients like Atlassian SourceTree (OS X, Windows; free of charge), Tower (OS X; commercial) as well as SmartGit (Linux, OS X and Windows; free for non-commercial use).
3.7. Reflog
The reference log (reflog) consists of log files that Git creates for each branch and HEAD
.
They record when a reference was moved, and from where to where.
This happens especially with the checkout
, reset
, merge
and rebase
commands.
These log files are stored under .git/logs/
and are named after the reference.
The reflog for the master
branch can be found under .git/logs/refs/heads/master
.
There is also the command git reflog show <reference>
to list the reflog:
$ git reflog show master
48effd3 master@{0}: HEAD^: updating HEAD
ef51665 master@{1}: rebase -i (finish): refs/heads/master onto 69b9e27
231d0a3 master@{2}: merge @{u}: Fast-forward
...
The Reflog command is rarely used directly and is just an alias for git log -g --oneline
.
In fact, the -g
option causes the command not to follow the predecessors in the commit graph, but to process the commits in the order in which they appear in the reflog.
You can easily try this: Create a test commit, then delete it again with git reset --hard HEAD^
.
The command git log -g
will now first show the HEAD
, then the deleted commit, and then the HEAD
again.
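The experiment can be scripted as follows. This is a minimal sketch in a sandbox with empty commits; the repository and commit messages are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

git commit -q --allow-empty -m "test commit"
lost=$(git rev-parse HEAD)               # remember the ID of the test commit
git reset -q --hard HEAD^                # "delete" the test commit again

git log -g --oneline                     # walks the reflog instead of the commit graph
git rev-parse 'HEAD@{1}'                 # the reflog still points to the test commit
```

Although no branch references the test commit any more, it can still be addressed through the reflog entry.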
The reflog thus also references commits that are otherwise no longer referenced, i.e. are “lost” (see Sec. 3.1.2, “Managing Branches”).
The reflog might help you if you have deleted a branch that you would have needed after all.
Admittedly, git branch -D
also deletes the branch’s reflog.
However, you had to check out the branch to commit to it, so use git log -g HEAD
to find the last time you checked out the branch you were looking for.
Then create a branch that points to this (seemingly lost) commit ID, and your lost commits should be back.[52]
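A recovery along these lines might look as follows. This is a sketch under assumptions: the branch name feature is made up, and in a real repository you would read the right entry off the output of git log -g HEAD instead of relying on a fixed position such as HEAD@{1}.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

git checkout -q -b feature
git commit -q --allow-empty -m "work on feature"
git checkout -q master
git branch -D feature                    # the branch and its reflog are gone

git log -g --oneline HEAD                # the reflog of HEAD still knows the commit
lost=$(git rev-parse 'HEAD@{1}')         # in this sandbox: the commit made on feature
git branch feature "$lost"               # recreate the branch at that commit
git log --oneline feature
```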
Commands that expect one or more references can also implicitly use Reflog.
In addition to the syntax already found in the output of git log -g
(e.g. HEAD@{1}
for the previous position of the HEAD), Git also understands <ref>@{<when>}
.
Git interprets the time <when>
as an absolute or relative date and then consults the reflog of the corresponding reference to find the log entry nearest to that point in time.
This is then referenced.
Two examples:
$ git log 'master@{two weeks ago}..'
$ git show '@{1st of April, 2011}'
The first command lists all commits between HEAD
and the commit the master
branch pointed to two weeks ago (note the suffix ..
which means a commit range up to HEAD
).
This doesn’t necessarily have to be a commit that is two weeks old: if, as a test, you moved the branch to the very first commit in the repository two weeks ago using git reset --hard <initial-commit>
, then that very commit will be referenced.[53]
The second line shows the commit to which the currently checked out branch (due to missing explicit reference before the @
) pointed on April 1, 2011.
In both commands, the argument with a reflog suffix must be enclosed in quotation marks so that the shell passes it to Git unchanged.
Note that the reflog is only available locally and therefore does not belong to the repository.
If you send a commit ID or tag name to another developer, it references the same commit, but a master@{yesterday}
can reference different commits depending on the developer.
If you don’t specify a branch and time, Git will assume
$ git checkout feature   # previously on "master"
$ git commit ...         # make changes, commits
$ git checkout -         # back to "master"
$ git merge -            # merge "feature"
4. Advanced Concepts
The following chapter covers selected advanced concepts. The focus is on the Rebase command with its many applications. We find out who changed a line in the source code (Blame) and when, and how to tell Git to ignore files and directories. We’ll also look at how to stash changes to the working tree and annotate commits (Notes). Finally, we show you how to quickly and automatically find commits that introduce a bug (Bisect).
4.1. Moving commits — Rebase
In the section on Git’s internals, we mentioned earlier that you can move and modify commits in a Git repository (graphically speaking) at will.
In practice, this is made possible primarily by the git command rebase
.
This command is very powerful and important, but sometimes a bit more demanding to use.
Rebase is a coined word meaning “to put something on a new base”. A group of commits is moved around within the commit graph by rebuilding commit after commit on top of another node. The following graphics illustrate how this works:
In its simplest form the command is git rebase <reference>
(in the above diagram: git rebase master
).
This means that Git first marks all commits <reference>..HEAD
, i.e. the commits that can be reached from HEAD
(the current branch) minus the commits that can be reached from <reference>
- in other words, everything that is in the current branch but not in <reference>
.
In the diagram, these are E and F.
The list of these commits is stored temporarily.
Git then checks out the commit <reference>
and copies the individual cached commits in the original order as new commits to the branch.
There are a few points to consider:
-
Because the first node of the topic branch (E) now has a new predecessor (D), its metadata and thus its SHA-1 sum changes (it becomes E_). The second commit (F) then also has a different predecessor (E_ instead of E), its SHA-1 sum changes (it becomes F_) and so on - this is also called the ripple effect. Overall, all copied commits will have new SHA-1 sums - so they’re the same (in terms of changes), but not identical.
-
Such an action, just like a merge operation, can result in conflicting changes. Git can partially resolve them automatically, but aborts with an error message if the conflicts are not trivial. The rebase process can then either be “repaired” and continued, or aborted (see below).
-
If no other reference points to node F, it will be lost, because reference HEAD (and the corresponding branch, if applicable) will be shifted to node F_ in case of a successful rebase. So if F has no more reference (and no predecessors referencing F), Git can no longer find the node, and the tree “disappears”. If you’re not sure whether you need the original tree again, you can simply reference it with the
tag
command, for example. In that case, the commits will be preserved even after a rebase (but then in duplicate at different places in the commit graph).
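The points above can be observed in a sandbox. The following is a minimal sketch: the helper function commit_file, the tag name topic-before-rebase and the file names are assumptions for illustration, and the graph is smaller than the one in the diagram.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: create a file named after the commit and commit it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit_file A
git checkout -q -b topic
commit_file E
commit_file F
old_tip=$(git rev-parse topic)
git tag topic-before-rebase              # keeps the original E and F reachable

git checkout -q master
commit_file D

git checkout -q topic
git rebase -q master                     # copies E and F on top of D
new_tip=$(git rev-parse topic)

echo "F before: $old_tip"
echo "F after:  $new_tip"
git log --oneline --graph --all          # shows both the old and the new copies
```

Without the tag, the original commits would only remain reachable through the reflog.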
4.1.1. An Example
Consider the following situation:
The sqlite-support
branch branches off from the “fixed a bug…” commit.
But the master
branch has already moved on, and a new 1.4.2 release has been made.
Now sqlite-support
is checked out and rebuilt on top of master
:
$ git checkout sqlite-support
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: include sqlite header files, prototypes
Applying: generalize queries
Applying: modify Makefile to support sqlite
Rebase applies the three changes introduced by commits from the sqlite-support
branch to the master
branch.
After that, the repository looks like this in Gitk:
4.1.2. Extended Syntax and Conflicts
Normally git rebase
will always rebuild the branch you are currently working on on top of a new base.
However, there is a shortcut:
If you want to base topic
on master
, but you are on a completely different branch, you can do this via
$ git rebase master topic
Git does the following internally:
$ git checkout topic
$ git rebase master
Please note the (unfortunately not very intuitive) order:
git rebase <on which> <what>
A rebase can lead to conflicts. The process then stops with the following error message:
$ git rebase master
...
CONFLICT (content): Merge conflict in <file>
Failed to merge in the changes.
Patch failed at ...
The copy of the patch that failed is found in:
   .../.git/rebase-apply/patch
When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".
You proceed as with a regular merge conflict (see Sec. 3.4, “Resolving Merge Conflicts”) - git mergetool
is very helpful here.
Then simply add the changed file via git add
and let the process continue via git rebase --continue
.[54]
Alternatively, the problematic commit can be skipped using the git rebase --skip
command.
The commit is then lost unless it is referenced in another branch somewhere else!
So you should only perform this action if you are certain that the commit is obsolete.
If none of this helps (e.g. if you can’t solve the conflict at that point, or if you realize that you are rebuilding the wrong tree), pull the emergency brake: git rebase --abort
.
This will discard all changes to the repository (including successfully copied commits), so that the state afterwards is exactly the same as it was when the rebase process was started.
The command also helps if at some point you forget to finish a rebase process, and other commands complain that they can’t do their job because a rebase is in progress.
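The emergency brake can be tested without risk in a sandbox. This is a minimal sketch; the conflicting contents of file.txt are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"

git checkout -q -b topic
echo "topic" > file.txt
git commit -q -am "change on topic"

git checkout -q master
echo "master" > file.txt
git commit -q -am "change on master"

git checkout -q topic
before=$(git rev-parse HEAD)
git rebase master || true                # stops with a conflict in file.txt
git status --short                       # file.txt is listed as unmerged

git rebase --abort                       # back to the state before the rebase
after=$(git rev-parse HEAD)
echo "before: $before, after: $after"
```

To continue instead of aborting, you would resolve the conflict in file.txt, run git add file.txt and then git rebase --continue.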
4.1.3. Why Rebasing Makes Sense
Rebase is primarily useful for keeping the commit history of a project simple and easy to understand. For example, a developer might be working on a feature, but then have something else to do for a few weeks. Meanwhile, however, development on the project has progressed, there’s been a new release, etc. Only now does the developer get to finish the feature. (Even if you want to send patches via email, rebase helps to avoid conflicts, see Sec. 5.9, “Patches via E-mail”.)
For the version history it is now much more logical if his feature was not “dragged along” unfinished for a long period of time alongside the actual development, but if the development branches off from the last stable release.
Rebase is good for exactly this change in history:
The developer can now simply enter the command git rebase v1.4.2
on the branch where he developed the feature, to rebuild his feature branch on the commit with the release tag v1.4.2
.
This makes it much easier to see what differences the feature really brings to the software.
It also happens to every developer in the heat of the moment that commits end up in the wrong branch. There is a bug that happens to be there, which is quickly fixed by a commit; but then a test must be written directly to avoid this bug in the future (another commit), and this must be noted in the documentation. After the actual work is done, you can use Rebase to “transplant” those commits to another location in the commit graph.
Rebase can also be useful if a branch requires a feature that has only recently been incorporated into the software.
A merge of the master
branch does not make sense semantically, because then these and other changes are inseparably merged with the feature branch.
Instead, you rebase the branch on a new commit that already contains the required feature, and then use that in further development.
4.1.4. When Rebasing Is Not Useful — Rebase vs. Merge
The concept of rebase is initially a little difficult to understand. But once you have understood what is possible with it, the question arises: What is the point of a simple merge if you can edit everything with rebase?
When git-rebase is not used, or hardly used at all, a project history often develops that becomes relatively unmanageable, because merges have to be performed constantly and for a few commits at a time.
If, on the other hand, too much rebase is used, there is a danger that the entire project will be senselessly linearized: The flexible branching of Git is used for development, but the branches are then integrated into the publishing branch one after the other (!) like a zip fastener via rebase. This presents us with two main problems:
-
Logically related commits are no longer recognizable as such. Since all commits are linear, the development of multiple features is inextricably intertwined.
-
The integration of a branch can no longer be easily undone, because identifying those commits that once belonged to a feature branch is only possible manually.
This squanders the advantages of Git’s flexible branching. The conclusion is that rebase should be used neither too much nor too little. Both make the project history (in different ways) confusing.
In general, you are doing well with the following rules of thumb:
-
A feature is integrated by merge when it is finished. It is best to avoid creating a fast forward merge so that the merge commit is preserved as the time of integration.
-
While you are developing, you should use rebase frequently (especially interactive rebase, see below).
-
Logically separate units should be developed on separate branches - logically related ones possibly on several, which are then merged by rebase (if that makes sense). The merging of logically separate units is then done by merge.
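The first rule of thumb corresponds to the --no-ff option of git merge. The following is a minimal sketch in a sandbox; the branch name feature and the commit messages are assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "initial commit"

git checkout -q -b feature
git commit -q --allow-empty -m "implement feature"

git checkout -q master
# master has not moved, so a plain merge would simply fast-forward.
# --no-ff forces a merge commit that documents the integration.
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --oneline --graph
git rev-list --parents -1 HEAD           # the merge commit, followed by its two parents
```

Because the merge commit exists, the integration of the feature remains visible in the history and can be identified as a unit later on.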
4.1.5. A Word of Warning
As mentioned earlier, a rebase inevitably changes the SHA-1 sums of all commits that are “rebuilt”. If these changes have not yet been published, that is, if a developer has them only in a private repository, this is not a problem.
But if a branch (e.g. master) is published[55] and later rewritten via rebase, this has unpleasant consequences for all involved:
All branches based on master
will now reference the old copy of the master
branch that has been rewritten.
So each branch must be rebased to the new master
(which in turn changes all commit IDs).
This effect continues, and can be very time-consuming to fix (depending on when such a rebase happens, and how many developers are involved in the project), especially if you’re new to git.
Therefore you should always remember the following rule:
Only edit unpublished commits with the rebase command!
Exceptions are conventions like personal branches or pu
.
The latter is an abbreviation for Proposed Updates and is usually a branch where new, experimental features are tested for compatibility.
No one builds their own work on this branch, so it can be rewritten without problems and prior notice.
Another possibility is offered by private branches, i.e. those that start with <user>/
for example.
If you make an agreement that developers will do their own development on these branches, but always base their features on “official” branches, then the developers may rewrite their branches as they wish.
4.1.6. Avoiding Code Duplication
If a feature is being developed over a long period of time, and parts of the feature are already flowing into a mainstream release (e.g. via cherry-pick
), the rebase command will detect these commits and omit them when copying or rebuilding the commits, because the change is already contained in the branch.
For example, after a rebase, the new branch consists only of the commits that have not yet been incorporated into the base branch. This way, commits do not appear twice in the version history of a project. If the branch had simply been merged, the same commits with different SHA-1 sums would sometimes be present in different places in the commit graph.
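This behavior can be reproduced in a sandbox. The following is a minimal sketch: the helper function commit_file and all branch, file and commit names are assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"                        # throwaway sandbox repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: create a file and commit it with the given message.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$2"; }

commit_file base "initial commit"
git checkout -q -b topic
commit_file x "feature part X"
commit_file y "feature part Y"

git checkout -q master
commit_file m "unrelated change on master"
git cherry-pick topic~1                  # part X flows into the mainline early

git checkout -q topic
git rebase master                        # part X is recognized and left out

git log --oneline master..topic          # only part Y is left on the branch
```

After the rebase, the change from part X exists exactly once in the history of topic, namely as the commit that was cherry-picked to master.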
4.1.7. Managing Patch Stacks
There are situations where there is a vanilla version (“simplest version”) of a piece of software and also a certain number of patches applied to it before the vanilla version is shipped. For example, your company builds software, but before each delivery to the customer, some adjustments have to be made (depending on the customer). Or you have open source software in use, but have adapted it a bit to your needs - every time a new, official version of the software is released, you have to reapply your changes and then rebuild the software.[56]
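Without any additional tooling, such a patch stack can be kept on its own branch and rebased whenever a new vanilla version arrives. The following sketch assumes a hypothetical branch `vanilla` for the unmodified releases and a branch `custom` that carries two invented customer patches:

```shell
#!/bin/sh
# Sketch: a patch stack maintained with plain rebase.
# The branches "vanilla" and "custom" and all file names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/vanilla
git config user.name "Demo User"
git config user.email "demo@example.com"

save() {  # save <file> <content> <message>
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

save app.txt "release 1.0" "vanilla 1.0"

git checkout -q -b custom
save branding.txt "customer logo" "patch: customer branding"
save limits.txt "customer limits" "patch: customer limits"

git checkout -q vanilla
save app.txt "release 2.0" "vanilla 2.0"   # a new official version appears

git rebase -q vanilla custom               # reapply the stack on top of it

patches=$(git rev-list --count vanilla..custom)
base=$(git merge-base vanilla custom)
git log --reverse --format=%s vanilla..custom
```

After the rebase, `custom` contains the new release plus the same two patches, in their original order.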
To manage patch stacks, there are some programs that build on top of Git, but give you the convenience of not having to work directly with the rebase command. For example, TopGit[57] allows you to define dependencies between branches: if something changes in a branch and other branches depend on it, TopGit will rebuild them on demand. An alternative to TopGit is Stacked Git[58].
4.1.8. Restricting Rebase via --onto
Now, you may have wondered: `git rebase <reference>` always copies all commits that are between `<reference>` and `HEAD`. But what if you only want to move part of a branch, to “transplant” it, so to speak?
Consider the following situation:
[Figure: rebase --onto]
You were developing a feature on the branch `topic` when you noticed a bug; you created a branch `bugfix` and found another bug. Semantically speaking, your branch `bugfix` has nothing to do with the `topic` branch. Therefore, it makes sense to branch off from the `master` branch.
But if you now rebuild the branch `bugfix` using `git rebase master`, the following happens: All nodes that are in `bugfix` but not in `master` are copied to the `master` branch in order, that is, nodes D, E, F, and G.
However, D and E are not part of the bugfix at all.
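Which commits a rebase would copy can be previewed with a commit range before anything is rewritten. The sketch below rebuilds a graph like the one described; the exact layout (A, B, and C on `master`, with `topic` branching off after B) is an assumption, since only the nodes D to G are named in the text.

```shell
#!/bin/sh
# Sketch: preview the commits a rebase would copy, without rewriting anything.
# The commits A, B, C on master and the branch point are assumptions.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

node() {  # node <name>: create a commit whose message is <name>
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

node A
node B
git checkout -q -b topic
node D
node E
git checkout -q -b bugfix
node F
node G
git checkout -q master
node C

# Everything in bugfix that is not in master: what "git rebase master" copies
plain=$(git log --reverse --format=%s master..bugfix | xargs)
# Everything in bugfix that is not in topic: the actual bugfix commits
wanted=$(git log --reverse --format=%s topic..bugfix | xargs)

echo "git rebase master would copy: $plain"
echo "commits belonging to the bugfix: $wanted"
```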
This is where the `--onto` option comes into play: It allows you to specify a start and end point for the list of commits to be copied.
The general syntax is
git rebase --onto <on which> <start> <end>
In this example, we only want to rebuild the commits F and G (in other words, the commits from `topic` to `bugfix`) on top of `master`.
Therefore the command is
$ git rebase --onto master topic bugfix
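To try the command in isolation, the following sketch sets up a comparable history in a scratch repository and then runs it. The commits A, B, and C on `master` and the point where `topic` branches off are assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: transplant only F and G onto master with rebase --onto.
# The commits A, B, C on master and the branch point are assumptions.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

node() {  # node <name>: create a commit whose message is <name>
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

node A
node B
git checkout -q -b topic
node D
node E
git checkout -q -b bugfix
node F
node G
git checkout -q master
node C

git rebase -q --onto master topic bugfix

on_bugfix=$(git log --reverse --format=%s master..bugfix | xargs)
on_topic=$(git log --reverse --format=%s master..topic | xargs)
bugfix_base=$(git rev-parse bugfix~2)

echo "bugfix now adds to master: $on_bugfix"
echo "topic still adds to master: $on_topic"
```

The branch `topic` is left untouched; only `bugfix` is moved, and it no longer contains D and E.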
The result looks as expected: