5. Distributed Git

Git is a distributed version control system. To understand what that means, a brief digression into the world of centralized version control is necessary. As the name suggests, in a centralized version control system such as RCS, CVS, or Subversion, the development history is stored centrally on a repository server, and all developers synchronize their work with this one repository. Developers who want to change something download a current version to their computer (checkout), make their modifications, and then send them back to the server (commit).

5.1. How Does Distributed Version Control Work?

One of the major disadvantages of the centralized approach is that most work steps require a connection to the server. For example, if you want to view the history or make a commit, you need a network connection to the server. Unfortunately, this is not always guaranteed: the server may be down, or you may be working on your laptop without a network connection.

Distributed systems handle this differently: each developer has his or her own local copy of the repository, so the question arises of how developers share changes.

One approach is to provide a single “master repository” that all developers use to synchronize their local repositories. The developers connect to this repository from time to time, uploading their own commits (push) and downloading those of their colleagues (fetch or pull). This very centralized approach is often used in practice. For an illustration, see Figure 30, “Central workflow with distributed version management”.

However, there are two noteworthy alternatives in the Git environment that we will introduce in this chapter: the Integration Manager workflow, which uses multiple public repositories (Sec. 5.6, “Distributed Workflow with Multiple Remotes”), and patch exchange by e-mail (Sec. 5.9, “Patches via E-mail”).

Figure 30. Central workflow with distributed version management

Unlike in centralized systems, Git’s commit and checkout operations are local. Other day-to-day tasks, such as reviewing the history or switching to a branch, are also done locally. Only the uploading and downloading of commits are non-local operations. This has two important advantages over centralized version control: no network is needed, and everything is faster. How often you synchronize your repository depends, among other things, on the size and development speed of the project. If you’re working with a colleague on the internals of your software, you’ll probably need to synchronize more often than if you’re working on a feature that doesn’t have a major impact on the rest of the code base. One synchronization per day may well be sufficient, so you can work productively even without a permanent network connection.

This chapter is about how to exchange changes between your local repository and a remote repository (aka remote), what to consider when working with multiple remotes, and how to email patches so that they can be easily applied by the recipient.

The most important commands at a glance:

git remote

General configuration of remotes: add, remove, rename, etc.

git clone

Download complete copy.

git pull and git fetch

Download commits and references from a remote.

git push

Upload commits and references to a remote.

5.2. Cloning Repositories

You have already seen the first command related to remote repositories: git clone. Here we illustrate the cloning process with our “git cheat sheet”:⁠[65]

$ git clone git://github.com/esc/git-cheatsheet-de.git
Initialized empty Git repository in /tmp/test/git-cheatsheet-de/.git/
remote: Counting objects: 77, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 77 (delta 45), reused 0 (delta 0)
Receiving objects: 100% (77/77), 132.44 KiB, done.
Resolving deltas: 100% (45/45), done.

Git will issue various status messages when this call is made. The most important ones are the notification of which directory the new repository is cloned into (Initialized empty Git repository in /tmp/test/git-cheatsheet-de/.git/) and the confirmation that all objects have been successfully received (Receiving objects: 100% (77/77), 132.44 KiB, done.). If the cloning process is successful, the master branch is checked out,[66] and the working tree including the repository is located in the directory git-cheatsheet-de.

$ cd git-cheatsheet-de
$ ls
cheatsheet.pdf  cheatsheet.tex  Makefile  README
$ ls -d .*
.git/

To create the clone in a different directory, simply pass it as an argument:

$ git clone git://github.com/esc/git-cheatsheet-de.git cheatsheet
Initialized empty Git repository in /tmp/test/cheatsheet/.git/
$ ls
cheatsheet/

Furthermore, the source repository, i.e. the origin of the clone, is configured as a remote repository named origin. The git remote command displays the setting:

$ git remote
origin

The setting is stored in the configuration file .git/config with the entry remote, in this case only for origin:

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://github.com/esc/git-cheatsheet-de.git

You will see two settings in the section: fetch and url. The first, called the refspec, specifies which changes are to be downloaded when synchronizing with the remote repository, and the second specifies the URL used to do this.

git remote is also used to manage remote repositories. For example, you can add more remote repositories using git remote add, adapt the URL for the remote repository using git remote set-url, and so on, but more on this later.

The name origin is just a convention; with git remote rename you can change the name of the source repository to suit your needs, for example, from origin to github:

$ git remote rename origin github
$ git remote
github

With the option --origin or -o you set the name immediately when cloning:

$ git clone -o github git://github.com/esc/git-cheatsheet-de.git
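The effect of -o can be tried out without network access. The following is a minimal sketch that clones a throwaway local repository twice; the scratch directory, the directory names (source, default, named) and the demo identity are assumptions made for illustration only.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A small source repository with a single commit on master
git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "Initial commit"

# Default clone: the source is registered under the name "origin"
git clone -q "$work/source" "$work/default"
default_name=$(git -C "$work/default" remote)

# Clone with -o: the remote is called "github" from the start
git clone -q -o github "$work/source" "$work/named"
custom_name=$(git -C "$work/named" remote)
custom_refspec=$(git -C "$work/named" config --get remote.github.fetch)

echo "default: $default_name, with -o: $custom_name"
echo "refspec: $custom_refspec"
```

Note that the chosen name also appears in the refspec, so the remote tracking branches of the second clone end up under github/ rather than origin/.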

5.2.1. Repository URLs

Git supports several protocols for accessing a remote repository, the most common three being Git protocol, SSH, and HTTP(S). Designed specifically for Git, the Git protocol favors data transfer by always transferring the smallest possible amount of data. It doesn’t support authentication, so it’s often transmitted over an SSH connection. This ensures both efficient (Git protocol) and secure (SSH) transmission. HTTP(S) is used when a firewall is configured very restrictively and the allowed ports are drastically restricted.⁠[67]

In general, a valid URL contains the transfer protocol, the address of the server and the path to the repository:⁠[68]

  • ssh://[user@]gitbu.ch[:port]/path/to/repo.git/

  • git://gitbu.ch[:port]/path/to/repo.git/

  • http[s]://gitbu.ch[:port]/path/to/repo.git/

For the SSH protocol there is also a short form:

  • [user@]gitbu.ch:path/to/repo.git/

It is also possible to clone repositories locally using the following syntax:

  • /path/to/repo.git/

  • file:///path/to/repo.git/

If you want to know what URLs are configured for a remote repository, use git remote’s --verbose or -v option:

$ git remote -v
origin  git://github.com/esc/git-cheatsheet-de.git (fetch)
origin  git://github.com/esc/git-cheatsheet-de.git (push)

You can see that there are two URLs for the remote repository origin; by default they are set to the same value. The first URL (fetch) specifies from where and with which protocol changes are downloaded. The second URL (push) specifies where changes are uploaded to and with which protocol. Different URLs are particularly useful if you download and upload with different protocols. A common example is to download with the Git protocol (git://) and upload with the SSH protocol (ssh://). Downloads then happen without authentication and encryption, which provides a speed advantage, while uploads are authenticated and encrypted, which ensures that only you or other authorized people can upload. You can use the git remote set-url command to customize the URLs:

$ git remote set-url --add \
  --push origin git@github.com:esc/git-cheatsheet-de.git
$ git remote -v
origin  git://github.com/esc/git-cheatsheet-de.git (fetch)
origin  git@github.com:esc/git-cheatsheet-de.git (push)

If you want to customize the URL of a repository, it is often faster to do this directly in the .git/config configuration file. Git provides the git config -e command for this: it opens this file in your editor.
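To see that a separate push URL really redirects uploads, here is a small sketch using local paths in place of the git:// and SSH URLs. The bare repository publish.git is a hypothetical stand-in for the location you have write access to.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "Initial commit"
git init -q --bare "$work/publish.git"   # stands in for the push location
git clone -q "$work/source" "$work/clone"

# Keep fetching from the source, but push somewhere else
git -C "$work/clone" remote set-url --push origin "$work/publish.git"
push_url=$(git -C "$work/clone" config --get remote.origin.pushurl)

# The push lands in publish.git, not in the source repository
git -C "$work/clone" push -q origin master
published=$(git -C "$work/publish.git" rev-parse refs/heads/master)
local_master=$(git -C "$work/clone" rev-parse master)

git -C "$work/clone" remote -v   # shows different (fetch) and (push) URLs
```

In the configuration file, the push URL is stored as a separate pushurl entry next to url in the [remote "origin"] section.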

5.2.2. Remote Tracking Branches

The current status of the remote repository is stored locally. For this, Git uses remote tracking branches: special branches (local references) that reflect the state of the branches in the remote. They “track” the remote branches, and Git advances or resets them when synchronizing with the remote if the branches there have changed. In terms of the commit graph, remote tracking branches are markers that point to the same commits as the branches in the remote repository. You can’t modify remote tracking branches like normal branches; Git manages and updates them automatically. When you clone a repository, Git initializes a remote tracking branch for each remote branch.

Figure 31. Generated Remote Tracking Branches

Figure 31, “Generated Remote Tracking Branches” shows an example. The origin remote repository has three branches: pu, maint, and master. Git creates a remote tracking branch in the cloned repository for each of these remote branches. It also creates a local branch master in the clone that corresponds to the remote branch master. This is checked out and is the branch you should work in if you plan to upload commits to the master (but see Sec. 5.3.1, “git fetch”).

In the cheat sheet repository cloned above, there is only one branch on the remote side, master. That’s why Git creates only one remote tracking branch in the clone, origin/master. The git branch -r command shows all remote tracking branches:

$ git branch -r
  origin/HEAD -> origin/master
  origin/master

The special entry origin/HEAD → origin/master states that in the remote repository HEAD points to the branch master. This is important for cloning, because this branch is checked out after cloning. The list of remote tracking branches is a bit sparse in this example; you can see more entries in a clone of Git’s own repository:

$ git branch -r
  origin/HEAD -> origin/master
  origin/html
  origin/maint
  origin/man
  origin/master
  origin/next
  origin/pu
  origin/todo

All branches can be displayed with git branch -a:

$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master

In this case, Git uses the prefix remotes/ to clearly distinguish remote tracking branches from normal ones. If you have enabled color output, the different branches will also be color-coded: the checked-out branch green, remote tracking branches red.

Remote tracking branches are themselves only references and are therefore stored under .git/refs like all references. However, since they are special references that are linked to a remote repository, they end up under .git/refs/remotes/<remote-name> (see Sec. 3.1.1, “HEAD and Other Symbolic References”). In Gitk, the remote tracking branches are displayed with the prefix remotes/<remote-name>/, which is additionally colored dark yellow (Figure 32, “Branch next and the corresponding remote tracking branch in Gitk”).
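The separation between local branches and remote tracking branches can also be inspected at the level of references. The following sketch rebuilds the situation with the three branches pu, maint and master in a throwaway repository (all paths are assumptions) and lists both namespaces with git for-each-ref.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Source repository with the branches master, maint and pu
git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "Initial commit"
git -C "$work/source" branch maint
git -C "$work/source" branch pu

git clone -q "$work/source" "$work/clone"

# Local branches live under refs/heads, remote tracking branches
# under refs/remotes/<remote-name>
local_branches=$(git -C "$work/clone" for-each-ref \
  --format='%(refname:short)' refs/heads)
tracking_refs=$(git -C "$work/clone" for-each-ref \
  --format='%(refname)' refs/remotes/origin)

echo "local:    $local_branches"
echo "tracking:"
echo "$tracking_refs"
```

Only master exists as a local branch after cloning, while every branch of the source has a counterpart under refs/remotes/origin.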

Figure 32. Branch next and the corresponding remote tracking branch in Gitk

5.3. Downloading Commits

Now what does it mean when you synchronize two repositories, such as a clone with the source? Synchronization in this context means two things: first, downloading commits and references, and second, uploading. As far as the commit graph is concerned, the local graph needs to be synchronized with the one on the remote side, so that both have the same structure. In this section, we first discuss how to download commits and references from a remote. There are two commands for this: git fetch and git pull. We’ll first introduce both commands, and in Sec. 5.3.3, “git fetch vs. git pull” we’ll describe which command is preferable under which circumstances.

5.3.1. git fetch

As soon as new commits are created by other developers in a remote, you want to download them to your local repository. In the simplest case, you just want to find out which commits you don’t have locally, download them, and update the remote tracking branches so that they reflect the current status in the remote.

Use the git fetch command to do this:

$ git fetch origin
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master

Git acknowledges the call with a message that origin/master has been set from commit 79170e8 to commit 003e3c7. The notation master → origin/master indicates that the branch master from the remote was used to update the remote tracking branch origin/master. In other words: Branches from the remote on the left and remote tracking branches on the right.
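That git fetch only moves the remote tracking branch and leaves your local branch alone can be checked with a small experiment. The sketch below uses a throwaway local repository as the remote; the paths and commit messages are illustrative assumptions.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "B"
git clone -q "$work/source" "$work/clone"

# State known to the clone before synchronizing
before=$(git -C "$work/clone" rev-parse origin/master)

# A new commit appears in the remote, then the clone fetches
git -C "$work/source" commit -q --allow-empty -m "C"
git -C "$work/clone" fetch -q origin

after=$(git -C "$work/clone" rev-parse origin/master)
local_master=$(git -C "$work/clone" rev-parse master)
remote_master=$(git -C "$work/source" rev-parse master)

echo "origin/master: $before -> $after"
echo "local master:  $local_master"
```

After the fetch, origin/master matches master in the remote, while the local master still points to the old commit.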

See Figure 33, “Remote Tracking Branches are updated” for the effect this has on the commit graph: On the left side is the initial state of the remote origin and next to it that of the clone. Both the remote and the clone have new commits since the last synchronization (C and D). The remote tracking branch origin/master in the clone points to commit B; this is the last state of the remote known to the clone. By calling git fetch origin, Git updates the remote tracking branch in the clone to reflect the current status of the master (pointing to commit C) in the remote. To do this, Git downloads the missing commit C and then sets the remote tracking branch on it.

Figure 33. Remote Tracking Branches are updated

5.3.1.1. Refspec

The refspec (reference specification) ensures that the remote tracking branches are set. This is a description of the references to be retrieved from the remote. An example was given above:

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://github.com/esc/git-cheatsheet-de.git

In the entry fetch the refspec for the remote is stored. It has the form: <remote-refs>:<local-refs> with an optional plus (+). The example is configured so that all branches, i.e. all references stored in the remote under refs/heads, end up locally under refs/remotes/origin.⁠[69] Thus, for example, the branch master from the remote origin (refs/heads/master) is stored locally as refs/remotes/origin/master.

Normally the remote tracking branches are “fast-forwarded”, similar to a fast-forward merge. The remote tracking branch is therefore only updated if the target commit is a descendant of the current reference. This may not be possible, for example, after a rebase. In this case, Git will refuse to update the remote tracking branch. However, the plus overrides this behavior, and the remote tracking branch is still updated. If this happens, Git will indicate this with the addition (forced update):

 + f5225b8..0efec48 pu         -> origin/pu  (forced update)

This setting is useful in practice and is therefore set by default. Furthermore, as a user you do not need to worry about setting the refspec, because if you use the command git clone or git remote add, Git automatically creates the corresponding default entry for you. Sometimes you may want to restrict the refspec explicitly. For example, if you use namespaces for all developers and you are only interested in the master branch and the branches of the other developers in your team (Beatrice and Carlos), it might look like this:

[remote "firma"]
    url = axel@example.com:produkt.git
    fetch = +refs/heads/master:refs/remotes/firma/master
    fetch = +refs/heads/beatrice/*:refs/remotes/firma/beatrice/*
    fetch = +refs/heads/carlos/*:refs/remotes/firma/carlos/*

With regard to the commit graph, Git only downloads those commits that are needed to reach the requested references. This makes sense, because commits that are not “secured” by a reference are considered unreachable and will eventually be deleted (see also Sec. 3.1.2, “Managing Branches”). In the last example, Git therefore does not need to download commits that are reachable only from branches that are not in the refspec. In terms of distribution, Git does not necessarily need to synchronize the entire commit graph; the “relevant” parts are sufficient.

Alternatively, you can specify the refspec on the command line:

$ git fetch origin +refs/heads/master:refs/remotes/origin/master

If there is a refspec that has no reference on the right side of the colon, there is no target to store. In this case, Git places the reference in the .git/FETCH_HEAD file instead, and you can use the special term FETCH_HEAD for a merge:

$ git fetch origin master
From github.com:esc/git-cheatsheet-de
 * branch            master     -> FETCH_HEAD
$ cat .git/FETCH_HEAD
003e3c70ce7310f6d6836748f45284383480d40e
    branch 'master' of github.com:esc/git-cheatsheet-de
$ git merge FETCH_HEAD

This feature can be useful if you are interested in a single remote branch that you have not configured a remote tracking branch for and do not want to do so.
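As a sketch of this use case, assume a colleague’s repository that is reachable by path and for which no remote is configured. The directory names mine and colleague are hypothetical; fetching by URL with a refspec that has no right-hand side leaves the result in FETCH_HEAD only.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "Initial commit"

# Two clones; the colleague is one commit ahead
git clone -q "$work/source" "$work/mine"
git clone -q "$work/source" "$work/colleague"
git -C "$work/colleague" commit -q --allow-empty -m "Work by colleague"

# Fetch a single branch by URL, without a target reference
git -C "$work/mine" fetch -q "$work/colleague" master
fetched=$(git -C "$work/mine" rev-parse FETCH_HEAD)
expected=$(git -C "$work/colleague" rev-parse master)

# No remote tracking branch was created for the colleague
remote_refs=$(git -C "$work/mine" for-each-ref --format='%(refname)' refs/remotes)

# Integrate the fetched commit
git -C "$work/mine" merge -q --ff-only FETCH_HEAD
merged=$(git -C "$work/mine" rev-parse master)
```

Keep in mind that FETCH_HEAD is overwritten by the next fetch, so it is only suitable for immediate use.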

5.3.1.2. Deleting Expired Remote Tracking Branches

If a remote branch is deleted (as described in Sec. 5.4.1, “Deleting Remote References”), the corresponding remote tracking branch is referred to as stale. Since such branches usually serve no further purpose, delete them (prune):

$ git remote prune origin

Or prune directly while downloading:

$ git fetch --prune

Since this is often the desired behavior, Git offers the fetch.prune option. If you set it to true, git fetch will behave as if you had called it with the --prune option.
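The difference between a plain fetch and a pruning fetch can be sketched as follows. The branch name bugfix and all paths are illustrative; for simplicity the branch is deleted directly in the source repository rather than via git push.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "Initial commit"
git -C "$work/source" branch bugfix
git clone -q "$work/source" "$work/clone"

# The branch disappears in the remote
git -C "$work/source" branch -q -D bugfix

# A plain fetch keeps the stale remote tracking branch
git -C "$work/clone" fetch -q origin
if git -C "$work/clone" show-ref --verify -q refs/remotes/origin/bugfix
then after_fetch=present; else after_fetch=gone; fi

# A pruning fetch removes it
git -C "$work/clone" fetch -q --prune origin
if git -C "$work/clone" show-ref --verify -q refs/remotes/origin/bugfix
then after_prune=present; else after_prune=gone; fi

echo "after fetch: $after_fetch, after fetch --prune: $after_prune"
```

With git config fetch.prune true in the clone, the first fetch would already have removed origin/bugfix.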

5.3.1.3. Working with Local Branches

So far we have only discussed how to track the changes in a remote. If you want to make changes yourself that are based on one of the branches in the remote, you must first create a local branch in which you can make commits:[70]

$ git checkout -b next origin/next
Branch next set up to track remote branch next from origin.
Switched to a new branch next

If no local branch named next exists yet, the following abbreviation also works:

$ git checkout next
Branch next set up to track remote branch next from origin.
Switched to a new branch next

The set up to track message indicates that Git is configuring the branch next from the remote origin as the upstream branch for the local branch next. This is a kind of “shortcut” that benefits other Git commands. For more details, see Sec. 5.3.2, “git pull”.
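What “set up to track” means in terms of configuration can be looked up after creating such a branch. The following sketch uses a throwaway repository with a branch next (paths are assumptions) and reads back the upstream both via the @{upstream} syntax and via the two configuration entries Git writes.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "Initial commit"
git -C "$work/source" branch next
git clone -q "$work/source" "$work/clone"

# Create a local branch from the remote tracking branch
git -C "$work/clone" checkout -q -b next origin/next

# The upstream branch, as Git resolves it
upstream=$(git -C "$work/clone" rev-parse --abbrev-ref 'next@{upstream}')

# The configuration entries behind it
cfg_remote=$(git -C "$work/clone" config --get branch.next.remote)
cfg_merge=$(git -C "$work/clone" config --get branch.next.merge)

echo "upstream of next: $upstream ($cfg_remote, $cfg_merge)"
```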

You can work in the local branch as usual. Note, however, that you only ever commit locally. To publish your work, i.e. upload it to a remote branch, you still need the git push command (Sec. 5.4, “Uploading Commits: git push”).

5.3.2. git pull

Suppose you want to transfer commits from the remote repository to your local branch. To do this, first run a git fetch to fetch new commits, and then merge the change from the corresponding remote tracking branch:⁠[71]

$ git merge origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 +++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

For this use case, Git provides the git pull command to speed up your workflow. It is a combination of git fetch and git merge or git rebase.

Downloading new commits from origin and merging all commits referenced by the master there into the current branch can be done with the following command:

$ git pull origin master
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

In Figure 34, “What happens with a pull” we illustrate the process. On the left, you see the remote repository origin and next to it the current status of the local repository. The repository was cloned when it only contained commits A and B, so the remote tracking branch origin/master points to B. In the meantime, commits have been added both in the remote (C) and in the local repository (D).

On the right side is the state after git pull origin master. Commit C has been added to the local repository. The fetch call contained in the pull has updated the remote tracking branch, i.e. it points to the same commit as the master in origin and thus reflects the state there. In addition, the merge call contained in the pull has integrated the master from origin into the local master, as you can see from the merge commit M and the current position of the local master.

Figure 34. What happens with a pull

Alternatively, the --rebase option instructs the pull command to rebase the local branch onto the remote tracking branch after the fetch:

$ git pull --rebase  origin master

In Figure 35, “What happens during a pull with rebase” you can see what happens if you perform a rebase instead of the default merge.

Figure 35. What happens during a pull with rebase

The initial situation is the same as in Figure 34, “What happens with a pull”. The fetch contained in the pull moves the remote tracking branch origin/master to commit C. However, rebase does not create a merge commit; instead, a call to rebase gives the commit D a new base, and the local master is set to the new commit D'. (Rebase is described in detail in Sec. 4.1, “Moving commits — Rebase”).
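The two outcomes can be compared side by side. The sketch below recreates the situation with commits A to D in throwaway repositories; the helper commit_file and all paths are hypothetical. The merge variant is spelled out with --no-rebase because some recent Git versions ask for an explicit choice when the branches have diverged.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: add a one-line file to a repository and commit it
commit_file() {
  echo "$2" > "$1/$2.txt"
  git -C "$1" add "$2.txt"
  git -C "$1" commit -q -m "$2"
}

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
commit_file "$work/source" A
commit_file "$work/source" B

# Two identical clones, then the histories diverge:
# C is added in the remote, D in each clone
git clone -q "$work/source" "$work/merge"
git clone -q "$work/source" "$work/rebase"
commit_file "$work/source" C
commit_file "$work/merge" D
commit_file "$work/rebase" D

# Pull with merge: creates the merge commit M
git -C "$work/merge" pull -q --no-rebase --no-edit origin master
# Pull with rebase: D is replayed on top of C as D'
git -C "$work/rebase" pull -q --rebase origin master

merges_with_merge=$(git -C "$work/merge" rev-list --merges --count HEAD)
merges_with_rebase=$(git -C "$work/rebase" rev-list --merges --count HEAD)
rebased_parent=$(git -C "$work/rebase" rev-parse HEAD^)
remote_tip=$(git -C "$work/source" rev-parse master)

echo "merge commits: $merges_with_merge (merge), $merges_with_rebase (rebase)"
```

In the rebased clone the history is linear: the parent of the new commit D' is the tip of master in the remote.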

5.3.2.1. Upstream Branches

Often git fetch, git pull and git push are executed without arguments. In that case, Git uses the upstream branch configuration, among other things, to decide what to do. From the repository’s config:

[branch "master"]
    remote = origin
    merge = refs/heads/master

The entry states that the local branch master is linked to the remote branch master in the origin repository.

The remote entry tells git fetch and git pull which remote to download commits from. The merge entry tells git pull to merge the new commits from the remote branch master into the local master. This allows both commands to be used without arguments, which is very common in practice.

$ git fetch
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
$ git pull
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

If no upstream branch is configured, git fetch falls back to origin; if that remote does not exist either, it aborts:

$ git fetch
fatal: No remote repository specified.  Please, specify either a URL or
a remote name from which new revisions should be fetched.

If you want changes from an upstream branch to be applied by rebase instead of merge by default on git pull, set the branch.<name>.rebase setting to true, for example:

$ git config branch.master.rebase true

5.3.3. git fetch vs. git pull

Git beginners often ask themselves whether they should use fetch or pull. The answer depends on how you develop: How big is the project? How many remotes are there? How heavily are branches used?

5.3.3.1. Distributed Git for Beginners

Especially for beginners, it makes sense that all participants work on the same branch (usually master), synchronize with the same repository (central workflow) and use only git pull for downloading and git push for uploading. This eliminates the need to deal with more complex aspects such as object model, branching and distribution; and participants can contribute improvements with just a few commands.

This results in the following workflow:

# Clone the repository
$ git clone <URL>
# Work and make local commits
$ git add ...
$ git commit
# Download changes from others
$ git pull
# Upload your own changes
$ git push
# Continue working, and repeat the synchronization as needed
$ git commit

This approach has advantages and disadvantages. The advantage is certainly that only a basic understanding of Git is necessary to follow the workflow successfully. The automatic configuration of upstream branches ensures that git push and git pull do the “right thing” without argument. In addition, this workflow is similar to what Subversion users are used to.

However, there are also drawbacks, mainly related to implicit merging. Suppose the team consists of two people, Beatrice and Carlos. Both have made local commits, and Beatrice has already uploaded hers. Carlos now runs git pull and receives the message Merge made by recursive. If you keep the commit graph in mind, it’s logical: the local branch and the master of the remote have diverged, so they have been merged back together. However, Carlos doesn’t understand the message, since he was working on a different part of the code than his colleague, and in his opinion no merge was necessary. One problem is that the term merge carries an association that many people have from centralized version control: that changes to the same file are being merged. With Git, however, a merge is always to be understood as the joining of commits in the commit graph. This may mean merging changes to the same file, but it does not require it.

Besides confusing users, this workflow creates “nonsensical” commits in the history. Ideally, merge commits should be meaningful entries in the repository history, so that an outsider can immediately see that a development branch has been integrated. However, this workflow inevitably involves the local master and its remote counterpart diverging and being merged back together. The resulting merge commits carry no meaning; they are only a side effect of the workflow and reduce the readability of the history. Although the --rebase option for git pull offers a remedy, the man page explicitly advises against using this option unless you have already internalized the principle of rebase. Once you have understood it, you are also familiar with how the commit graph is created and how to manipulate it, and it is worthwhile for you to go straight to feature-driven development with branches as a workflow.

5.3.3.2. Distributed Git for Advanced Users

Once you understand the object model and the commit graph, we recommend that you use a workflow that essentially consists of git fetch, manual merges, and many branches. The following are some recipes as a suggestion.

If you are using master as your integration branch, you will need to move your local master forward after calling git fetch. To be precise, you need to advance all local branches that have a remote equivalent. For this, Git provides the syntax @{upstream}, or @{u} for short, which refers to the remote tracking branch configured for the current branch. This can be very helpful.

# Download changes from others
$ git remote update
...
   79170e8..003e3c7  master     -> origin/master

# Query the status of the remote tracking branches
$ git branch -vv
* master 79170e8 [origin/master: behind 1] Lizenz hinzugefügt

# View the changes
$ git log -p ..@{u}

# Adopt the downloaded changes
$ git merge @{u}
Updating 79170e8..003e3c7
Fast-forward
...

# ... or rebuild your own changes on top of them
$ git rebase @{u}

# Then upload the changes
$ git push

If you frequently synchronize local branches with your remote tracking branch, we recommend the following alias:

$ git config --global alias.fft "merge --ff-only @{u}"

This allows you to easily move forward a branch with git fft (Fast Forward Tracking). The --ff-only option prevents accidental merge commits from occurring where none should.
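The protective effect of --ff-only can be demonstrated in a throwaway setup (paths and commit messages are illustrative). The sketch runs the command behind the alias twice: once when the local branch is merely behind its upstream, and once after both sides have diverged.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/master
git -C "$work/source" commit -q --allow-empty -m "A"
git clone -q "$work/source" "$work/clone"

# Case 1: the local branch is only behind, so it can be fast-forwarded
git -C "$work/source" commit -q --allow-empty -m "B"
git -C "$work/clone" fetch -q origin
git -C "$work/clone" merge -q --ff-only '@{u}'
ff_tip=$(git -C "$work/clone" rev-parse master)
upstream_tip=$(git -C "$work/clone" rev-parse origin/master)

# Case 2: both sides have new commits, so no fast-forward is possible
git -C "$work/clone" commit -q --allow-empty -m "Local work"
git -C "$work/source" commit -q --allow-empty -m "C"
git -C "$work/clone" fetch -q origin
if git -C "$work/clone" merge -q --ff-only '@{u}' 2>/dev/null
then diverged=merged; else diverged=refused; fi

echo "diverged branches: $diverged"
```

In the second case Git aborts instead of creating a merge commit, and you can decide deliberately between a merge and a rebase.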

In this context, Ch. 6, Workflows is also helpful; it describes how to keep an overview when working with many topic branches.

5.4. Uploading Commits: git push

The counterpart to fetch and pull is the command git push. It uploads Git objects and references to a remote, e.g. the local master to the branch master in the remote origin:

$ git push origin master:master

As with git fetch, you specify the references for uploading with a refspec. However, the refspec has the opposite form:

<local-refs>:<remote-refs>

This time the local references are on the left side of the colon, and the remote references on the right.

If you omit the colon and the remote reference, the local name will also be used on the remote side, and will be created by Git if it doesn’t exist:

$ git push origin master
Counting objects: 73, done.
Compressing objects: 100% (33/33), done.
Writing objects: 100% (73/73), 116.22 KiB, done.
Total 73 (delta 42), reused 68 (delta 40)
Unpacking objects: 100% (73/73), done.
To git@github.com:esc/git-cheatsheet-de.git
 * [new branch]      master -> master

Figure 36, “Upload references and commits” shows the process behind git push. The initial situation is shown on the left (it is the result of a pull call). Git uploads the missing commits D and M to the remote origin. At the same time, the remote branch master is advanced to the commit M, so that it matches the local branch master. In addition, the remote tracking branch origin/master is advanced so that it reflects the current status in the remote.

Figure 36. Upload references and commits

As with fetch, Git refuses to update references where the target commit is not a descendant of the current commit:

$ git push origin master
...
 ! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'git@github.com:esc/git-cheatsheet-de.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again.  See the 'Note about
fast-forwards' section of 'git push --help' for details.

You can override this behavior either by prefixing it with a plus (+) in the refspec or by using the --force or short -f option:⁠[72]

$ git push origin --force master
$ git push origin +master

Caution: Commits may be lost on the remote side, for example if you have moved a branch using git reset --hard and commits are no longer referenced.

You’ll also get this error message if you have used git rebase or git commit --amend to modify commits that have already been published via git push. So here’s the explicit warning again: avoid modifying commits that you have already published! The changed SHA-1 sums will cause duplicate commits if others have already downloaded the originals.
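The following sketch makes the danger of a forced push tangible. Two hypothetical developers, alice and bob, share a bare repository (all names and paths are assumptions): Bob’s normal push is rejected, and his forced push removes Alice’s commit from the shared branch.

```shell
work=$(mktemp -d)   # scratch directory; all paths and names are illustrative
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A shared bare repository with one initial commit, and two clones
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "A"
git clone -q --bare "$work/seed" "$work/shared.git"
git clone -q "$work/shared.git" "$work/alice"
git clone -q "$work/shared.git" "$work/bob"

# Alice publishes a commit first
git -C "$work/alice" commit -q --allow-empty -m "Alice's work"
git -C "$work/alice" push -q origin master
alice_commit=$(git -C "$work/alice" rev-parse master)

# Bob commits without fetching; his push is not a fast-forward
git -C "$work/bob" commit -q --allow-empty -m "Bob's work"
if git -C "$work/bob" push -q origin master 2>/dev/null
then plain_push=accepted; else plain_push=rejected; fi

# Forcing the push overwrites the remote branch
git -C "$work/bob" push -q --force origin master
shared_tip=$(git -C "$work/shared.git" rev-parse refs/heads/master)
bob_tip=$(git -C "$work/bob" rev-parse master)

# Alice's commit is no longer part of the shared master
if git -C "$work/shared.git" merge-base --is-ancestor "$alice_commit" refs/heads/master
then alice_work=kept; else alice_work=dropped; fi

echo "plain push: $plain_push, Alice's commit after force: $alice_work"
```

The safe alternative for Bob would have been to fetch first and integrate Alice’s commit by merge or rebase before pushing again.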

5.4.1. Deleting Remote References

There are two ways to delete references in the remote: The older one (before Git version 1.7.0) is to omit the local reference in the refspec — this statement means you want to upload “nothing”. So you replace an existing reference with the empty one.

$ git push origin :bugfix

However, newer Git versions usually use git push with the --delete option, which is syntactically much clearer:

$ git push origin --delete bugfix

Note that in other clones, the remote tracking branch origin/bugfix, if present, does not automatically disappear! See the section on pruning above (Sec. 5.3, “Downloading Commits”).

5.4.2. Pushing without Arguments: push.default

In everyday life you often run git push without specifying remote and refspec. In this case, Git uses the configuration entries (upstream branch and push.default) to decide which references are sent where.

$ git push
...
To git@github.com:esc/git-cheatsheet-de.git
   79170e8..003e3c7  master -> master

By default, Git proceeds like this:⁠[73] If you don’t specify a remote, Git will look for the upstream configuration of the current branch. If the name of the branch on the remote side matches the name of the local branch, the corresponding reference is uploaded (this is to protect you from uploading, for example, your branch devel to master if the upstream configuration is incorrect). If no upstream branch is configured, Git aborts with an error message:

$ git push
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master

If you use git push <remote> to specify a remote but no branch, Git will attempt to upload the current branch to the remote under the same name.

The strategy described here is also known as simple. For most use cases, it does what the user expects and protects against avoidable errors. However, you can set the push.default option responsible for this to one of the following values if required:

nothing

Do not upload anything. This is useful if you always want to explicitly specify which branch you want to upload to where.

upstream

If the current branch has an upstream branch, push there.

current

Push the current branch into a remote branch of the same name.

matching

Uploads all locally existing references for which a reference of the same name already exists in the corresponding remote. Attention: You are potentially uploading several branches at the same time!
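To get a feeling for the difference between these values, you can compare two of them in a scratch repository. This is only a sketch under assumptions: the remote is a local bare repository, and the branch name feature and the identity are made up for the example.

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/feature
git commit -q --allow-empty -m "start feature"
git remote add origin "$tmp/remote.git"

# With "nothing", a bare "git push" refuses to guess and fails.
git config push.default nothing
if git push -q 2>/dev/null; then
  nothing_result=pushed
else
  nothing_result=refused
fi

# With "current", the same command uploads the branch under its own name,
# even though no upstream branch has been configured.
git config push.default current
git push -q
remote_branch=$(git ls-remote --heads origin feature | cut -f2)

echo "push.default=nothing: $nothing_result"
echo "push.default=current created: $remote_branch"
```

Setting the option with git config inside the repository, as done here, affects only this repository; add --global if you want the value to apply everywhere.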

5.4.3. Configuring the Upstream Branch

In some cases, Git will automatically configure upstream branches (for example, after a git clone). However, you need to do this explicitly, especially for new branches that you are uploading for the first time. You can do this either afterwards using the --set-upstream-to option or, in short, -u of git branch:

$ git push origin new-feature
$ git branch -u origin/new-feature
Branch new-feature set up to track remote branch new-feature from origin.

Alternatively, if you remember to do so, you can have git push write the configuration right away by calling it with the -u option:

$ git push -u origin new-feature

To view the upstream configuration of your branches, call git branch -vv. The output shows the upstream partner of a branch (if any) in square brackets.
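Besides git branch -vv, the upstream configuration can also be queried directly, which is handy in scripts. The following sketch assumes a local bare repository as the remote and a hypothetical identity; it pushes a new branch with -u and then reads back what Git has stored.

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"
git remote add origin "$tmp/remote.git"
git push -q origin master

# Create a new branch and upload it, recording the upstream in one step.
git checkout -q -b new-feature
git commit -q --allow-empty -m "work on new feature"
git push -q -u origin new-feature

# The upstream is stored as two configuration entries of the branch.
remote=$(git config branch.new-feature.remote)
merge=$(git config branch.new-feature.merge)
# The @{upstream} notation resolves them to the remote tracking branch.
upstream=$(git rev-parse --abbrev-ref 'new-feature@{upstream}')

echo "remote=$remote merge=$merge upstream=$upstream"
```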

5.5. Examining Remotes

In this section, we introduce techniques for viewing a remote and comparing your local repository to it.

5.5.1. Overview of a Remote

The git remote show command gives a concise summary of the remote, including the branches available there, whether they are tracked locally (tracking status) and which local branches are configured for specific tasks.

The command must request the current status from the remote, i.e. the command fails if the remote is not available, e.g. due to a missing network connection. The option -n prevents the query.

$ git remote show origin
* remote origin
  Fetch URL: git://git.kernel.org/pub/scm/git/git.git
  Push  URL: git://git.kernel.org/pub/scm/git/git.git
  HEAD branch: master
  Remote branches:
    html   tracked
    maint  tracked
    man    tracked
    master tracked
    next   tracked
    pu     tracked
    todo   tracked
  Local branches configured for 'git pull':
    master merges with remote master
    pu     merges with remote pu
  Local refs configured for 'git push':
    master pushes to master (local out of date)
    pu     pushes to pu     (up to date)

5.5.2. Comparing with the Upstream

If you have configured an upstream branch, Git reports the status of the branch compared to its upstream whenever you switch branches (git checkout) or query the status (git status), for example:

$ git checkout master
Your branch is behind 'origin/master' by 73 commits, and can be
fast-forwarded.

Here there are four different possibilities:

  • The branches point to the same commit. Git doesn’t show any special message. This state is also called up-to-date.

  • The local branch has commits that are not yet available upstream:

    Your branch is ahead of 'origin/master' by 16 commits.

  • The remote tracking branch has commits that are not yet available in the local branch:

    Your branch is behind 'origin/master' by 73 commits, and can be fast-forwarded.

  • Both the second and third conditions apply, a state called diverged in Git jargon:

    Your branch and 'origin/master' have diverged, and have 16 and 73 different commit(s) each, respectively.

With the -v (compare only) or -vv (compare and upstream name) option, git branch displays the appropriate information for local branches:

$ git branch -vv
* master      0a464e9 [origin/master: ahead 1] docs: fix grammar in git-tags.txt
  feature     cd3065f Merge branch 'kc/gitweb-pathinfo-w-anchor'
  next        be8b495 [origin/next] Merge branch master into next
  pu          0c0c536 [origin/pu: behind 3] Merge branch 'jk/maint-merge-rename-create' into pu

The command prints the SHA-1 prefix and the commit message of the current commit for every branch. If an upstream is configured for the branch, Git shows both its name and a comparison to it. In the example, you see four different branches. master has an additional commit that has not yet been uploaded to the remote, and is therefore ahead. The branch feature, on the other hand, has no upstream branch configured, so it currently exists only locally. The branch next is up-to-date with the corresponding remote tracking branch. The branch pu, on the other hand, “lags” behind its upstream and is therefore displayed as behind. The only state missing here is diverged: in that case, both ahead and behind are shown, each with the number of “missing” commits.
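The diverged state is easy to reproduce, and the two counters can also be obtained directly with git rev-list. The following is a sketch under assumptions: a local bare repository serves as the remote, and the clone names mine and colleague as well as the identity are invented for the example.

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q --bare "$tmp/remote.git"
git --git-dir="$tmp/remote.git" symbolic-ref HEAD refs/heads/master

# Our own repository publishes a common base commit.
git init -q "$tmp/mine"
cd "$tmp/mine"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "common base"
git remote add origin "$tmp/remote.git"
git push -q -u origin master

# A colleague clones, adds two commits and uploads them.
git clone -q "$tmp/remote.git" "$tmp/colleague"
cd "$tmp/colleague"
git commit -q --allow-empty -m "colleague: first change"
git commit -q --allow-empty -m "colleague: second change"
git push -q origin master

# Meanwhile we commit locally and then fetch: the branches have diverged.
cd "$tmp/mine"
git commit -q --allow-empty -m "local work"
git fetch -q origin

# Left side: commits only in master; right side: only in origin/master.
set -- $(git rev-list --left-right --count master...origin/master)
ahead=$1
behind=$2
echo "ahead $ahead, behind $behind"

# git branch -vv reports the same two numbers in square brackets.
git branch -vv
```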

5.6. Distributed Workflow with Multiple Remotes

Git supports working with multiple remotes. A popular workflow that takes advantage of this feature is the Integration Manager Workflow. There is no “central” repository in the true sense of the word, that is, one that all active developers have write access to. Instead, there is only a quasi-official repository called blessed. It is accessible, for example, via the respective project domain and allows only the most important maintainers (or even only one) write access.

Everyone who wants to contribute to the project clones the blessed repository and starts working. As soon as he has fixed bugs or implemented a new feature, he makes his improvements available via a publicly accessible repository, a so-called developer public. He then sends a pull request to one of the maintainers of the official repository (or to the mailing list), requesting that certain code from his public repository be transferred to the official repository. You can see the infrastructure for this process in Figure 37, “Integration Manager Workflow”. Although it is theoretically possible to give interested parties direct access to your development machine, this almost never happens in practice.

integration manager workflow
Figure 37. Integration Manager Workflow

One of the maintainers who have access to the master repository then checks if the code works, if it meets the quality requirements, etc. Any errors or ambiguities are reported to the author of the code, who then corrects them in his repository. Only when the maintainer is satisfied does he commit the changes to the master repository, so that the code is delivered in one of the following releases. Maintainers who integrate new code are often referred to as Integration Managers, which gives the workflow its name. Such maintainers often have several remotes configured, one for each contributor.

One of the great advantages of this workflow is that, in addition to the maintainers, interested users, such as colleagues or friends of the developer, also have access to the public developer repositories. They don’t have to wait until the code has found its way into the official repository, but can try out the improvements as soon as they are published. The hosting platform GitHub in particular relies heavily on this workflow. The web interface used there offers a lot of features to support this workflow, e.g. a visualization that shows all available clones of a project and the commits contained in them, as well as the possibility to perform merges directly in the web interface. For a detailed description of this service, see Ch. 11, GitHub.

5.7. Managing Remotes

With git remote you can manage additional remotes. For example, to add a new remote from another developer, use the command git remote add. Most of the time you’ll want to initialize the remote tracking branches afterwards, which you can do with git fetch:

$ git remote add example git://example.com/example.git
$ git fetch example
...

To do both steps in one call, use the -f option, for fetch:

$ git remote add -f example git://example.com/example.git

If you no longer need the remote, you can remove it from your local configuration using git remote rm. This will also delete all remote tracking branches for that remote:

$ git remote rm example

Remotes do not necessarily have to be configured via git remote add. You can simply use the URL on the command line,⁠[74] for example to download the objects and references for a bugfix:

$ git fetch git://example.com/example.git bugfix:bugfix

Of course this also works with pull and push.

If you work with several remotes, the command git remote update --prune is a good choice. This will fetch all remotes, and the --prune option will delete all expired remote tracking branches.

The following alias has proved to be very useful for us, as it combines many work steps that are often performed one after the other in practice:

$ git config --global alias.ru "remote update --prune"

5.7.1. Pull-Request

To generate a pull request automatically, there is the git command request-pull. The syntax is:

git request-pull <start> <URL> [<end>]

As <URL> you specify your public repository (either as the actual URL or as a configured remote repository), and as <start> you select the reference on which the feature is built (in many cases the branch master, which should match the master branch of the official repository). Optionally, you can specify an <end>; if you omit this, Git will use HEAD.

By default, the output goes to STDOUT and includes the repository’s URL and branch name, a short description of all commits grouped by author, and a diffstat, i.e., a summary of added and deleted lines per file. This output can easily be forwarded to an e-mail program. If you add the -p option, a patch with all changes is appended below the text.

For example, to ask someone to download the two latest commits from a repository:

$ git request-pull HEAD~2 origin
The following changes since commit d2640ac6a1a552781[...]c48e08e695d53:

  README verbessert (2010-11-20 21:27:20 +0100)

are available in the git repository at:
  git@github.com:esc/git-cheatsheet-de.git master

Valentin Haenel (2):
      Lizenz hinzugefügt
      URL hinzugefügt und Metadaten neu formatiert

 cheatsheet.pdf |  Bin 89513 -> 95619 bytes
 cheatsheet.tex |   18 ++++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

5.8. Exchanging Tags

Tags are also exchanged with the remote commands fetch or pull and push. In contrast to branches, which change, tags are “static”. For this reason, remote tags are not referenced locally again, so there is no equivalent to the remote tracking branches for the tags. Tags that you get from your remote repositories are stored by Git under .git/refs/tags/ or in .git/packed-refs, as usual.

5.8.1. Downloading Tags

In principle, Git automatically downloads new tags when you call git fetch or git pull. That is, if you download a commit that has a tag pointing to it, that tag will be included. However, if you use a refspec to exclude individual branches, then commits in those branches will not be downloaded, and thus no tags that may point to those commits will be downloaded. Conclusion: Git only downloads relevant tags. With the options --no-tags (no tags) and --tags or -t (all tags) you can adjust the default behavior. Note, however, that --tags not only downloads the tags, but necessarily the commits to which they point.

Git notifies you when new tags arrive:

$ git fetch
[fetch output]
From git://git.kernel.org/pub/scm/git/git
 * [new tag]         v1.7.4.2   -> v1.7.4.2

If you want to know what tags are present on the remote side, use git ls-remote with the --tags option. For example, you can get all release candidates of Git version 1.7.1 with the following call (the pattern is quoted so that the shell does not expand it):

$ git ls-remote --tags origin 'v1.7.1-rc*'
bdf533f9b47dc58ac452a4cc92c81dc0b2f5304f    refs/tags/v1.7.1-rc0
537f6c7fb40257776a513128043112ea43b5cdb8    refs/tags/v1.7.1-rc0^{}
d34cb027c31d8a80c5dbbf74272ecd07001952e6    refs/tags/v1.7.1-rc1
b9aa901856cee7ad16737343f6a372bb37871258    refs/tags/v1.7.1-rc1^{}
03c5bd5315930d8d88d0c6b521e998041a13bb26    refs/tags/v1.7.1-rc2
5469e2dab133a197dc2ca2fa47eb9e846ac19b66    refs/tags/v1.7.1-rc2^{}

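Git outputs the SHA-1 sums of the tags and of their contents.[75] For an annotated tag, the first line shows the tag object itself, and the line ending in ^{} shows the commit it points to.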
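The two kinds of lines can be reproduced with a small test repository. The following sketch assumes a local repository that is queried by its path instead of a real remote; the tag names v0.1 and local-mark and the identity are made up for the example.

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical identity so that commits and tags work in a clean environment.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"

git tag -a v0.1 -m "release v0.1"   # annotated tag: a separate tag object
git tag local-mark                  # lightweight tag: points at the commit

# Query the repository as if it were a remote.
listing=$(git ls-remote --tags "$tmp/repo")
echo "$listing"

# For comparison: the tag object and the commit it points to.
tag_object=$(git rev-parse v0.1)
tagged_commit=$(git rev-parse 'v0.1^{commit}')
echo "tag object:    $tag_object"
echo "tagged commit: $tagged_commit"
```

The lightweight tag appears on a single line with the SHA-1 sum of the commit, whereas the annotated tag appears twice: once with the sum of the tag object and once, marked with ^{}, with the sum of the commit.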

5.8.2. Uploading Tags

Git does not automatically upload tags. You need to pass them explicitly to git push, similar to the branches, e.g. to upload the tag v0.1:

$ git push origin v0.1

If you want to upload all tags at once, use the --tags option. But be careful: Avoid this option if you use Annotated Tags to mark versions and Lightweight Tags to mark something locally, as described in Sec. 3.1.3, “Tags — Marking Important Versions”, because with this option you would upload all tags, as already mentioned.

Attention: Once you have uploaded a tag, you should never change it! The reason: Let’s say Axel changes a tag, like v0.7, that he has already released. First it pointed to the commit 5b6eef, and now to bab18e. Beatrice had already downloaded the first version pointing to 5b6eef, but Carlos had not yet. The next time Beatrice calls git pull, Git won’t download the new version of the v0.7 tag; the assumption is that tags don’t change, so Git doesn’t check the validity of the tag! When Carlos now runs git pull, he also gets the v0.7 tag, but it now points to bab18e. As a result, two versions of the tag, each pointing to a different commit, are in circulation. Not a very helpful situation. It gets really confusing when both Carlos and Beatrice use the same public repository and upload all tags by default.[76] The tag “jumps” back and forth between two commits in the public repository, so to speak; which version you get with a clone depends on who pushed last.

If this mishap does happen to you, you have two options:

  1. The sensible alternative: Instead of replacing the tag, create a new one and upload it as well. Name the new tag according to the project conventions. If the old tag is v0.7, name the new one something like v0.7.1.

  2. If you really want to replace the tag: Admit publicly (mailing list, wiki, blog) that you made a mistake. Let all developers and users know that a tag has changed and ask them to check their local copy of the tag. The size of the project and your willingness to take risks will determine whether this solution is feasible.

5.9. Patches via E-mail

An alternative to setting up a public repository is to automatically send patches via email. The format of the email is chosen so that maintainers can have Git automatically apply patches received via email. Especially for small bug fixes and sporadic collaboration, this is usually less time-consuming and faster. There are many projects that rely on this type of exchange, most notably the Git project itself.

The majority of patches for Git are contributed via the mailing list. There they go through a stringent review process, which usually leads to corrections and improvements. The patches are improved by the author and sent back to the list until a consensus is reached. Meanwhile, the maintainer regularly stores the patches in a branch in his repository, and makes them available for testing via the pu branch. If the patch series is considered finished by the participants on the list, the branch moves on to the different integration branches pu and next, where the changes are tested for compatibility and stability. If everything is in order, the branch finally ends up in the master and from there forms part of the next release.

The patches-via-e-mail approach is implemented by the following Git commands:

git format-patch

Format commits for sending as patches.

git send-email

Send patches.

git am

Add patches from a mailbox to the current branch (apply from mailbox).

5.9.1. Exporting Patches

The git format-patch command exports one or more commits as patches in Unix mailbox format and writes one file per commit. The file names consist of a sequential number and the commit message, and end in .patch.[77] As an argument, the command expects either a single commit or a range such as A..B. If you specify a single commit, Git will interpret this as the range from that commit to HEAD.

gitk screen format patch
Figure 38. Formatting three commits to 'master' as patches

Figure 38, “Formatting three commits to 'master' as patches” shows the initial situation. We want to export the three commits in the fix-git-svn-docs branch, that is, all commits since master, as patches:

$ git format-patch master
0001-git-svn.txt-fix-usage-of-add-author-from.patch
0002-git-svn.txt-move-option-descriptions.patch
0003-git-svn.txt-small-typeface-improvements.patch

To export only HEAD, use the -1 option; format-patch then creates a patch for the most recent commit only:

$ git format-patch -1
0001-git-svn.txt-small-typeface-improvements.patch

This also works for any other commit, specified by its SHA-1 sum:

$ git format-patch -1 9126ce7
0001-git-svn.txt-fix-usage-of-add-author-from.patch

The generated files contain, among other things, the header fields From, Date and Subject, which are used for sending as e-mail. These fields are filled in using the information available in the commit: author, date, and commit message. The files also contain a diffstat summary and the changes themselves as a patch in unified diff format. The [PATCH m/n] prefix[78] in the subject line is used later by Git to apply the patches in the correct order.

A corresponding excerpt follows:

$ cat 0003-git-svn.txt-small-typeface-improvements.patch
From 6cf93e4dae1e5146242338b1b9297e6d2d8a08f4 Mon Sep 17 00:00:00 2001
From: Valentin Haenel <valentin.haenel@gmx.de>
Date: Fri, 22 Apr 2011 18:18:55 +0200
Subject: [PATCH 3/3] git-svn.txt: small typeface improvements

Signed-off-by: Valentin Haenel <valentin.haenel@gmx.de>
Acked-by: Eric Wong 
---
 Documentation/git-svn.txt |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
...

If you plan to send a series of patches, it is recommended that you use the --cover-letter option to create a kind of “cover page” in which you describe the series. By default the file is called 0000-cover-letter.patch. Apart from the default headers, such a file looks like this:

Subject: [PATCH 0/3] *** SUBJECT HERE ***

*** BLURB HERE ***

Valentin Haenel (3):
  git-svn.txt: fix usage of --add-author-from
  git-svn.txt: move option descriptions
  git-svn.txt: small typeface improvements

 Documentation/git-svn.txt |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

As you can see, the Subject: still has the prefix [PATCH 0/3]; this way, all recipients can immediately see that it is a cover page. The file also contains the output of git shortlog and git diff --stat. Replace *** SUBJECT HERE *** with a subject and *** BLURB HERE *** with a summary of the patch series. Send the file together with the patch files.

Frequently, the mailing lists to which patches are sent are used to review the patches in terms of content and syntax and to ask the author for improvements. Once the author has made the improvements, he sends the corrected series back to the list as a reroll. Depending on the size of the patch series and the requirements of the project, a patch series may go through several rerolls until it is accepted.

When you send a patch series to a mailing list, keep the commits on a separate branch, and incorporate the fixes in new commits (for missing functionality) or with an interactive rebase (to adjust existing commits). Then use the git format-patch command with the --reroll-count=<n> option (or -v <n> for short): with -v 2, for example, the patches carry [PATCH v2] in the subject line, making it clear that this is the first reroll of the series.
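A first reroll with a cover letter might be generated as follows. This is a minimal sketch in a scratch repository; the branch name fix-docs, the file notes.txt, the commit messages and the identity are invented for the example.

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"

# The patch series lives on its own branch.
git checkout -q -b fix-docs
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "docs: first fix"
echo "second line" >> notes.txt
git commit -q -a -m "docs: second fix"

# Export all commits since master as version 2 of the series,
# including a cover letter, into a separate directory.
mkdir "$tmp/out"
git format-patch -q -v2 --cover-letter -o "$tmp/out" master
ls "$tmp/out"

# The reroll count shows up in the file names and in the subject lines.
subject=$(grep '^Subject:' "$tmp/out/v2-0001-docs-first-fix.patch" | head -n 1)
echo "$subject"
```

The -o option, which collects the generated files in a directory of your choice, is used here only to keep the working tree clean.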

5.9.2. Sending Patches

Send the generated files with git send-email (or an email client of your choice). The command expects as its only mandatory argument either one or more patch files, a directory full of patches, or a selection of commits (in which case Git also calls git format-patch internally):

$ git send-email 000*
0000-cover-letter.patch
0001-git-svn.txt-fix-usage-of-add-author-from.patch
0002-git-svn.txt-move-option-descriptions.patch
0003-git-svn.txt-small-typeface-improvements.patch
Who should the emails appear to be from? [Valentin Haenel
<valentin.haenel@gmx.de>]

$ git send-email master
/tmp/HMSotqIfnB/0001-git-svn.txt-fix-usage-of-add-author-from.patch
/tmp/HMSotqIfnB/0002-git-svn.txt-move-option-descriptions.patch
/tmp/HMSotqIfnB/0003-git-svn.txt-small-typeface-improvements.patch
Who should the emails appear to be from? [Valentin Haenel
<valentin.haenel@gmx.de>]

The command git send-email sets the fields Message-Id and In-Reply-To. This makes all e-mails after the first one look like replies to it, and thus most mail programs will display them as a continuous thread:[79]

mail thread
Figure 39. Patch series as mail thread

You can customize the command with options such as --to, --from and --cc (see the git-send-email(1) man page). If they are not specified, however, the essential information is queried interactively; most important is an address to which the patches should be sent.[80]

Before the emails are actually sent, you will see the header again; check that everything is as you want it, and then answer the question Send this email? ([y]es|[n]o|[q]uit|[a]ll): with y for “yes”. To get familiar with the command, you can first send all emails only to yourself or use the --dry-run option.

As an alternative to git send-email, you can post the contents of the files to one of the many online pastebin services, for example dpaste[81] or gist.github[82], and send the link via IRC or Jabber. The recipient then downloads the content into a file and applies it with git am (see below).

If you want to use your preferred Mail User Agent (MUA) (e.g. Thunderbird, Kmail or others) to send patches, there may be a few things to consider. Some MUAs are notorious for mutilating patches so that Git won’t recognize them as such.⁠[83]

5.9.3. Applying Patches

Patch emails exported with git format-patch are translated back into commits by the command git am (apply from mailbox). A new commit is created from each email, and its meta-information (author, commit message, etc.) is generated from the email header lines (From, Date). As mentioned earlier, Git uses the number in the subject to determine the order in which the commits should be applied. To complete the example from earlier: If the emails are in the Maildir directory patches, the following is sufficient:

$ git am patches
Applying: git-svn.txt: fix usage of --add-author-from
Applying: git-svn.txt: move option descriptions
Applying: git-svn.txt: small typeface improvements

The command understands Maildir and mbox formats as well as files that contain the output of git format-patch:

$ git am 0001-git-svn.txt-fix-usage-of-add-author-from.patch
Applying: git-svn.txt: fix usage of --add-author-from

When you apply patches from others using git am, the values of Author/AuthorDate and Committer/CommitDate differ. This way, both the author of the commit and the person who commits it are credited. In particular, the attribution is retained; it remains traceable who wrote which lines of code. Gitk displays the author and committer values by default; on the command line, use the --format=fuller option, which is accepted by git log and git show, among others:

$ git show --format=fuller 12d3065
commit 12d30657d411979af3ab9ca7139b5290340e4abb
Author:     Valentin Haenel <valentin.haenel@gmx.de>
AuthorDate: Mon Apr 25 23:36:15 2011 +0200
Commit:     Junio C Hamano <gitster@pobox.com>
CommitDate: Tue Apr 26 11:48:34 2011 -0700

    git-svn.txt: fix usage of --add-author-from
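The round trip from git format-patch to git am can be tried out locally without sending any mail. In the following sketch, the two identities "Alice Author" and "Mona Maintainer" as well as all paths and file names are hypothetical; the identities are switched via environment variables only to simulate two different people on one machine.

```shell
set -e
tmp=$(mktemp -d)

# Contributor: write a commit and export it as a patch file.
export GIT_AUTHOR_NAME="Alice Author" GIT_AUTHOR_EMAIL="alice@example.com"
export GIT_COMMITTER_NAME="Alice Author" GIT_COMMITTER_EMAIL="alice@example.com"
git init -q "$tmp/contributor"
cd "$tmp/contributor"
git commit -q --allow-empty -m "base"
echo "Hello" > README
git add README
git commit -q -m "Add README"
git format-patch -q -1 -o "$tmp/patches"

# Maintainer: a different identity applies the patch in another repository.
export GIT_AUTHOR_NAME="Mona Maintainer" GIT_AUTHOR_EMAIL="mona@example.com"
export GIT_COMMITTER_NAME="Mona Maintainer" GIT_COMMITTER_EMAIL="mona@example.com"
git init -q "$tmp/maintainer"
cd "$tmp/maintainer"
git commit -q --allow-empty -m "base"
git am -q "$tmp/patches"/*.patch

# The author is taken from the patch, the committer is whoever ran git am.
people=$(git log -1 --format='%an|%cn')
echo "$people"   # prints "Alice Author|Mona Maintainer"
```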

With the Dictator and Lieutenants Workflow (Sec. 5.10, “A Distributed, Hierarchical Workflow”), it can happen that more than two people are involved in a commit. In this case, it makes sense that everyone who reviews the patch also “approves” it, especially the author. For this purpose, there is a --signoff option (-s for short) for the git commit and git am commands, which appends the committer’s name and email to the commit message:

Signed-off-by: Valentin Haenel <valentin.haenel@gmx.de>

This feature is especially useful for larger projects, which usually have guidelines on how to format commits and how best to send them.⁠[84]

Conflicts can occur when patches are applied with git am, e.g. if the patches are based on an older version and the lines concerned have already been changed. In this case, the process is interrupted and you have several options for how to proceed. Either resolve the conflict, update the index and continue the process with git am --continue, or skip the patch with git am --skip. Use git am --abort to abort the process and restore the original state of the branch.

Because patches usually contain changes made by others, it can sometimes be difficult to find the right solution to a conflict. The best strategy for patches that cannot be applied is to ask the author of the patches to rebase them to a well-defined base, such as the current master, and send them again.

An alternative to git am is the somewhat rudimentary command git apply. It is used to apply a patch to the working tree or index (with the --index option). It is similar to the classic Unix command patch. It is especially useful if you want to edit the patch or metadata before committing, or if someone has sent you the output of git diff instead of git format-patch as a patch.
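The case of a plain git diff output can be sketched as follows. The file name greeting.txt, the patch file change.diff and the identity are made up for the example; the --check call is an optional dry run that only tests whether the patch would apply.

```shell
set -e
tmp=$(mktemp -d)
# Hypothetical identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Produce a plain diff (as someone might send it) and undo the change.
echo "hello world" > greeting.txt
git diff > "$tmp/change.diff"
git checkout -q -- greeting.txt

# Dry run first, then apply to both the working tree and the index.
git apply --check "$tmp/change.diff"
git apply --index "$tmp/change.diff"

staged=$(git diff --staged --name-only)
content=$(cat greeting.txt)
echo "staged: $staged, content: $content"
```

Unlike git am, this does not create a commit; you review the result and commit it yourself, with your own commit message and metadata.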

5.10. A Distributed, Hierarchical Workflow

The Integration Manager workflow does not scale with the size of the project. With large growth, at some point the maintainer is overwhelmed by the complexity of the project and the number of incoming patches. The so-called Dictator and Lieutenants workflow, which is used extensively in the development of the Linux kernel, provides a remedy. In this case, the software is usually divided into different subsystems, and contributions are examined by the lieutenants (also called subsystem maintainers) and then forwarded to the Benevolent Dictator. The Benevolent Dictator uploads the changes to the blessed repository, with which all other participants in turn synchronize.

patches via email
Figure 40. Workflow: Dictator and Lieutenants

The workflow is based on trust: The dictator trusts his lieutenants and usually accepts the modifications they forward without further checking. The advantage is that the dictator is relieved of work but still retains a veto right, which led to the title Benevolent Dictator.

For historical reasons, the official repository is often only the public repository of the current main maintainer or the original author. It is important to note that this repository exists only because of social conventions. Should another developer one day better advance the project, his public repository may become the new Blessed Repository. From a technical point of view, there is no reason not to do so.

The projects that use this workflow in practice prefer to exchange patches by mail. However, the nature of the exchange is secondary, and subsystem maintainers may just as well receive pull requests from developers they know; or they may mix public repositories and patches sent by email at will. Git’s flexibility — especially the variety of different methods for exchanging changes — supports every conceivable workflow in the spirit of free, open development. Certainly a feature that has contributed greatly to Git’s popularity.

5.11. Managing Subprojects

For larger software projects, it is sometimes necessary to outsource certain parts of a program into separate projects. This is the case in the following situations, for example:

  • Your software depends on a specific version of a library that you want to ship with the source code.

  • Your initially small project grows so large over time that you want to move functionality to a library that you want to manage as a separate project.

  • Independent parts of your software are managed by other development groups.

With Git, you can handle this in two different ways: You can manage the modules as Git submodules or as subtrees. In either case, you manage source code in a subdirectory of your project.

As submodules, you manage an isolated repository that has nothing to do with your parent repository. If you work with subtrees instead, the project history of the subdirectory becomes inseparable from the parent project. Both have advantages and disadvantages.

We’ll look at both techniques by way of example, creating a fictional project that requires libgit2. Similar to libgit.a, the library provides an API to examine and modify Git repositories.[85] The library is written in C; bindings make its functions available in Lua, Ruby, Python, PHP and JavaScript, among others.

5.11.1. Submodules

Submodules are managed by Git as subdirectories that have a special entry in the .gitmodules file. The command git submodule is responsible for handling them.

First we need to import the library. This is done with the following command:

$ git submodule add git://github.com/libgit2/libgit2.git libgit2
Cloning into libgit2...
remote: Counting objects: 4296, done.
remote: Compressing objects: 100% (1632/1632), done.
remote: Total 4296 (delta 3214), reused 3530 (delta 2603)
Receiving objects: 100% (4296/4296), 1.92 MiB | 788 KiB/s, done.
Resolving deltas: 100% (3214/3214), done.

From the output of git status we can now see that there is a new directory libgit2 and that the file .gitmodules has been created with the following content:

[submodule "libgit2"]
  path = libgit2
  url = git://github.com/libgit2/libgit2.git

This file has already been added to the index and is thus prepared for committing. The libgit2 directory, on the other hand, does not appear in the output of git diff --staged in the usual way:

$ git diff --staged -- libgit2
diff --git a/libgit2 b/libgit2
new file mode 160000
index 0000000..7c80c19
--- /dev/null
+++ b/libgit2
@@ -0,0 +1 @@
+Subproject commit 7c80c19e1dffb4421f91913bc79b9cb7596634a4

Instead of listing all the files in the directory, Git saves a “special” file (recognizable by the unusual file mode 160000) that simply records the commit the module is currently on.

We commit these changes, and from now on we can compile libgit2 in its subdirectory and then link against it:

$ git commit -m "Import libgit2 submodule"

The parent project and libgit2 are now merged in the working tree, but their version history is and remains separate. In the Git repository of libgit2 you can behave exactly the same way as in a “real” repository. For example, you can look at the output of git log in the parent project and after a cd libgit2 in the submodule.

5.11.1.1. Changes in Submodules

Now libgit2 has selected the branch development as default branch (i.e. the HEAD on the server side). It may not be the best idea to more or less wire this development branch to your repository.

So we change to the libgit2 directory and check out the latest tag, v0.10.0:

$ cd libgit2
$ git checkout v0.10.0
# message about "detached HEAD state"
$ cd ..
$ git diff
diff --git a/libgit2 b/libgit2
index 7c80c19..7064938 160000
--- a/libgit2
+++ b/libgit2
@@ -1 +1 @@
-Subproject commit 7c80c19e1dffb4421f91913bc79b9cb7596634a4
+Subproject commit 7064938bd5e7ef47bfd79a685a62c1e2649e2ce7

So the parent Git repository sees a change of HEAD, which was done by the git checkout v0.10.0 command in libgit2/, as a change to the pseudo-file libgit2, which now points to the corresponding new commit.

Now we can add this change to the index and save it as a commit:

$ git add libgit2
$ git commit -m "Set libgit2 version to v0.10.0"

Attention: Never add files from within libgit2, and never add the directory as libgit2/ (ending with a slash). This breaks Git's modular concept: you would suddenly be managing files from the submodule in the parent project.

Similarly, you can use git submodule update (or git remote update in the libgit2/ directory) to download new commits and then record a library update in the parent repository accordingly.
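The whole update cycle can be sketched as a script. This is only an illustration under stated assumptions: a local repository (upstream-lib) stands in for the library, the tags v1 and v2 and the file VERSION are invented for the example, and protocol.file.allow is set only because newer Git versions refuse local paths as submodule URLs.

```shell
#!/bin/sh
# Sketch: fetch a new release inside the submodule and record it in the
# parent project. Repository names, tags and files are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in library repository with a first tagged release.
git init -q upstream-lib
echo 'v1' > upstream-lib/VERSION
git -C upstream-lib add VERSION
git -C upstream-lib commit -q -m "Release v1"
git -C upstream-lib tag v1

# Parent project that includes the library as the submodule "lib".
git init -q super
cd super
git -c protocol.file.allow=always submodule add "$work/upstream-lib" lib
git commit -q -m "Import lib submodule at v1"

# Meanwhile, upstream publishes a new release.
echo 'v2' > "$work/upstream-lib/VERSION"
git -C "$work/upstream-lib" commit -q -am "Release v2"
git -C "$work/upstream-lib" tag v2

# Inside the submodule: download new commits and tags, then check out v2.
git -C lib fetch -q --tags origin
git -C lib checkout -q v2

# In the parent: stage the submodule path (no trailing slash) and commit.
git add lib
git commit -q -m "Set lib version to v2"

recorded=$(git rev-parse HEAD:lib)
expected=$(git -C "$work/upstream-lib" rev-parse v2)
echo "parent now records $recorded"
```

The parent project stores nothing but the new commit ID; the files of the library remain under the control of the submodule's own repository.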

5.11.1.2. From a User Perspective

So what does it all look like from the perspective of a user cloning the project for the first time? The first thing to notice is that submodules are not embedded in the repository and are not shipped with it:

$ git clone /dev/shm/super clone-super
$ cd clone-super
$ ls
bar.c  foo.c  libgit2/
$ ls -l libgit2
total 0

The directory libgit2/ is empty. Everything Git knows about the submodules is in the .gitmodules file. You first need to initialize the module and then download its repository:

$ git submodule init
Submodule 'libgit2' (git://github.com/libgit2/libgit2.git)
registered for path 'libgit2'
$ git submodule update
...
Submodule path 'libgit2': checked out '7064938bd5e7ef47bfd79a685a62c1e2649e2ce7'

So we see that libgit2 is automatically set to the version v0.10.0 recorded in our repository. In principle, however, the user can now change into the directory, check out the branch development, and compile the project against that version. Submodules retain the flexibility of a separate repository, so the entry recording which state the module is on is only a “recommendation”.
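The status prefix printed by git submodule status makes this “recommendation” visible. The sketch below assumes hypothetical local repositories (upstream-lib, super, clone-super) and invented tags v1 and v2; it combines initialization and download in one step with git submodule update --init. The protocol.file.allow setting is only needed because the example uses local paths as submodule URLs.

```shell
#!/bin/sh
# Sketch: the user's view after cloning a project with a submodule.
# Repository names and tags are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in library with a first release.
git init -q upstream-lib
echo 'v1' > upstream-lib/VERSION
git -C upstream-lib add VERSION
git -C upstream-lib commit -q -m "Release v1"
git -C upstream-lib tag v1

# Parent project pinning the library at v1.
git init -q super
(
    cd super
    git -c protocol.file.allow=always submodule add "$work/upstream-lib" lib
    git commit -q -m "Import lib submodule at v1"
)

# Upstream moves on after the parent has recorded v1.
echo 'v2' > upstream-lib/VERSION
git -C upstream-lib commit -q -am "Release v2"
git -C upstream-lib tag v2

# The user clones the parent project; the submodule directory is empty.
git clone -q super clone-super
cd clone-super
before=$(git submodule status lib | cut -c1)   # "-": not initialized

# Initialize and download in one step; the recorded commit is checked out.
git -c protocol.file.allow=always submodule update --init
pinned=$(git submodule status lib | cut -c1)   # " ": matches the recorded commit

# The user is free to pick another version inside the submodule.
git -C lib checkout -q v2
moved=$(git submodule status lib | cut -c1)    # "+": differs from the recorded commit

printf 'before=[%s] pinned=[%s] moved=[%s]\n' "$before" "$pinned" "$moved"
```

A leading + therefore only says that the checked-out commit differs from the one recorded in the parent project; nothing prevents you from building against it.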

5.11.2. Subtrees

Unlike submodules, which maintain their character as a standalone Git repository, when you work with Subtrees, you directly merge the history of two projects. A comparison of the two approaches follows.

Essentially, this technique is based on so-called subtree-merges, which were briefly discussed in Sec. 3.3.3, “Merge Strategies”. In our example, a subtree-merge is performed by merging regular commits from the libgit2 repository under the libgit2/ tree (directory). A top-level file in the library repository thus becomes a top-level file in the libgit2/ tree, which in turn is part of the parent repository.

Git has a command to manage subtree-merges.[86] You must always explicitly specify which subdirectory you are referring to by using -P <prefix>. To import libgit2 at version 0.8.0, use:

$ git subtree add -P libgit2 \
  git://github.com/libgit2/libgit2.git v0.8.0
git fetch git://github.com/libgit2/libgit2.git v0.8.0
From git://github.com/libgit2/libgit2
 * tag               v0.8.0     -> FETCH_HEAD
Added dir 'libgit2'

The command automatically downloads all required commits and creates a merge commit that places all files of libgit2 under the directory libgit2/. This merge commit links the existing version history to that of libgit2: it references the imported libgit2 commit as a parent, which in turn references its own ancestors.

The result of this procedure is that your repository now contains all relevant commits from libgit2. Your repository now has two root commits (see also multi-root repositories in Sec. 4.7, “Multiple Root Commits”).
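If git subtree is not available, the effect can be reproduced with core commands. The following sketch performs a manual subtree-merge with git read-tree --prefix, which only approximates what git subtree add does, and then counts the root commits. The repositories upstream-lib and project and their files are hypothetical, and the option --allow-unrelated-histories assumes Git 2.9 or later.

```shell
#!/bin/sh
# Sketch: manual subtree-merge with core Git commands, followed by a
# count of root commits. Repository and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in library repository with its own, unrelated history.
git init -q upstream-lib
echo 'license text' > upstream-lib/COPYING
git -C upstream-lib add COPYING
git -C upstream-lib commit -q -m "Add COPYING"

# The project that will absorb the library under lib/.
git init -q project
cd project
echo 'int main(void) { return 0; }' > foo.c
git add foo.c
git commit -q -m "Add foo.c"

# Download the library history; FETCH_HEAD then names its newest commit.
git fetch -q "$work/upstream-lib" HEAD

# Start a merge that keeps our own tree unchanged and does not commit yet.
git merge -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
# Place the library's tree under lib/ in the index and the working tree.
git read-tree --prefix=lib/ -u FETCH_HEAD
git commit -q -m "Merge upstream-lib as subtree under lib/"

# Both histories are now reachable from HEAD, each with its own root.
roots=$(git rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')
echo "root commits: $roots"
```

The merge commit has two parents, one from each history, so git rev-list reaches the initial commit of the project as well as the initial commit of the library.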

The files are now stored inseparably linked to the project. A git clone of this repository would also transfer all files under libgit2.⁠[87]

Now what happens when you want to “upgrade” to v0.10.0? Use the pull command from git subtree for this:

$ git subtree -P libgit2 \
  pull git://github.com/libgit2/libgit2.git v0.10.0
From git://github.com/libgit2/libgit2
 * tag               v0.10.0    -> FETCH_HEAD
Merge made by the 'recursive' strategy.
...

Note: Since the original libgit2 commits are present, these commits appear to change top-level files (e.g., COPYING) when you examine the version history with git log --name-status. In fact, the changes are made under libgit2/; the merge commit is responsible for aligning the trees accordingly.

If you’re not interested in the version history of a subproject, but want to anchor a particular state in the repository, you can use the --squash option. The git subtree add/pull commands then do not merge the corresponding commits, but only create a single commit that contains all changes. Note: Use this option only if you also imported the project using --squash; otherwise merge conflicts will result.

5.11.2.1. Splitting off a Subdirectory

At some point, you may be faced with the task of managing a subdirectory of your project as a separate repository. However, you may still want to integrate the changes into the original project.

For example, the documentation stored under doc/ will be managed in a separate repository from now on. Occasionally, that is, every few weeks, you want to transfer the latest developments to the master repository.

The git subtree command provides a dedicated subcommand, split, which you can use to automate this step. It creates a version history containing all changes to a directory and prints the ID of the latest commit, which you can then upload to an (empty) remote.

$ git subtree split -P doc --rejoin
Merge made by the 'ours' strategy.
563c68aa14375f887d104d63bf817f1357482576
$ git push <new-doc-repo> 563c68aa14375:refs/heads/master

The --rejoin option causes the version history split off in this way to be directly reintegrated into the current project via git subtree merge. From now on you can integrate the new commits via git subtree pull. If you want to use the --squash option instead, omit --rejoin.
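To try out the split without touching a real project, you can use a throwaway repository. The sketch below is an illustration under assumptions: the names project, doc-repo.git, doc-only and manual.txt are invented, the split is written to a branch with -b instead of using --rejoin, and the script skips the step if the git-subtree script is not found in Git's exec path.

```shell
#!/bin/sh
# Sketch: split the history of doc/ into its own branch and push it to
# an empty repository. All names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A project with code at the top level and documentation under doc/.
git init -q project
cd project
mkdir doc
echo 'int main(void) { return 0; }' > foo.c
echo 'User manual' > doc/manual.txt
git add foo.c doc/manual.txt
git commit -q -m "Add code and documentation"
echo 'More text' >> doc/manual.txt
git commit -q -am "Extend manual"

if [ -x "$(git --exec-path)/git-subtree" ]; then
    # Build a history that contains only the changes to doc/ and store
    # its latest commit in the new branch doc-only.
    git subtree split -P doc -b doc-only >/dev/null

    # In the split history, the contents of doc/ are at the top level.
    files=$(git ls-tree --name-only doc-only)

    # Upload the split history to an empty repository.
    git init -q --bare "$work/doc-repo.git"
    git push -q "$work/doc-repo.git" doc-only:refs/heads/master
    pushed=$(git --git-dir="$work/doc-repo.git" rev-parse refs/heads/master)
    echo "pushed $pushed containing: $files"
else
    echo "git subtree is not installed; skipping the split"
fi
```

Because no --rejoin is given here, the original project history stays untouched; the new branch merely offers a second view of the same changes.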

5.11.3. Submodules vs. Subtrees

The question “Submodules or Subtrees?” cannot be answered in general, only case by case. The decisive criterion should be how closely the subproject belongs to the parent project: third-party software is more likely a candidate for a submodule, while your own code with few commits and a direct relationship to the main project is better served by a subtree.

For example, when you install CGit (see Sec. 7.5, “CGit — CGI for Git”), you must initialize and update a submodule to compile libgit.a. So CGit needs the source code of Git, but doesn’t want to merge the development history with that of Git (the comparatively few CGit commits would be lost in this!). You can, however, compile CGit against another version of Git if you wish — the flexibility of the sub-repository is preserved.

The graphical repository browser Gitk, on the other hand, is managed as a subtree. It is developed in git://ozlabs.org/~paulus/gitk, but is included in the main Git repository with the subtree-merge strategy under gitk-git/.


65. We developed the cheat sheet in connection with various Git workshops. It is licensed under a Creative Commons License and is managed on the Git hosting platform GitHub, which we describe in Ch. 11, “GitHub”.
66. Strictly speaking, Git does not “blindly” check out the master branch. In fact, Git looks up which branch the HEAD of the other side references and checks it out.
67. For more information on the Git protocol, see Sec. 7.1.1, “The Git Protocol” (see also Sec. 3.1.1, “HEAD and Other Symbolic References”).
68. For a complete list of possible URLs, see the git-clone(1) man page in the “Git URLs” section.
69. As in the shell, the asterisk (*) is interpreted as a wildcard and matches all files in a directory.
70. Remote tracking branches are only intended to track the branches in a remote. Checking out a remote tracking branch results in a detached HEAD state and a warning.
71. Merging from origin/master to master is a normal merging process. In the example above, no further local commits have been made in the meantime and therefore no merge commits have been created. The master has been fast-forwarded to origin/master.
72. But the “forcing” only takes place locally: The recipient server can prevent the upload despite the specification of the option -f. This is done with the receive.denyNonFastForwards option, or the RW rights assignment for Gitolite (see Sec. 7.2.2, “Configuring Gitolite”).
73. This is the default behavior since version 2.0 (push.default=simple). Without further configuration, earlier Git versions used the push.default=matching setting, which can be error-prone, especially for beginners.
74. In Git jargon such remotes are called anonymous.
75. The syntax <tag>^{} dereferences a tag object, so returns the commit, tree or blob object to which the tag points.
76. For example with the alias push = push --tags.
77. See the git-format-patch(1) man page for information on how to customize the numbering, text and file suffix.
78. The number n is the total number of patches exported and m is the number of the current patch. For example, the subject line of the third patch of five would read [PATCH 3/5].
79. You can see in Figure 39, “Patch series as mail thread” a slightly different order of patches than in the previous examples. This is because the first version of the patch series consisted of only two patches, and the third one was added after feedback from the Git mailing list. The series was then expanded and rebased to the state as shown in this section.
80. If no Mail Transfer Agent (MTA) is installed on your system or configured to send e-mail, you can also use an external SMTP server. To do so, adjust the settings described in the section “Use GMail as the SMTP server” of the already mentioned man page.
81. https://dpaste.com
82. https://gist.github.com
83. Useful tips and tricks for various MUAs can be found in the Documentation/SubmittingPatches file in the Git project's own repository, in the “MUA specific hints” section, and in the git-format-patch(1) man page in the “MUA specific hints” and “Discussion” sections.
84. For the Git project, you can find them at Documentation/SubmittingPatches in the source code repository.
85. The libgit.a is created when compiling Git and gathers all functions that are “public” in Git. However, it is not reentrant or thread-safe, so its use is limited. libgit2 does not have these restrictions.
86. The command is not a standard command of Git, but is installed automatically by some Linux distributions (e.g. Debian, Arch Linux) and by the Windows Git installer. Call git subtree to check whether the command is installed. If not, you can look for the script under /usr/share/doc/git/contrib/subtree/, or copy it from the source code of Git (under contrib/subtree).
87. Therefore, make sure that you only include content that you are allowed to pass on using this technology. Depending on the license, the use of a software may be allowed, but not the distribution.