5. Distributed Git
Git is a distributed version control system. To understand what that means, a brief digression into the world of centralized version control is necessary. As the name suggests, in a centralized version control system such as RCS, CVS, or Subversion, the development history is stored centrally on a repository server, and all developers synchronize their work with this one repository. Developers who want to change something download a current version to their computer (checkout), make their modifications, and then send them back to the server (commit).
5.1. How Does Distributed Version Control Work?
One of the major disadvantages of the centralized approach is that most steps require a connection to the server. For example, to view the history or make a commit, you need a network connection to the server. Unfortunately, this is not always available: the server may be down, or you may be working on your laptop without a (W)LAN connection.
Distributed systems handle this differently: each developer has his or her own local copy of the repository, so the question arises of how developers share changes.
One approach is to provide a single “master repository” that all developers use to synchronize their local repositories. The developers connect to this repository from time to time, uploading their own commits (push) and downloading those of their colleagues (fetch or pull). This very centralized approach is often used in practice. For an illustration, see Figure 30, “Central workflow with distributed version management”.
However, there are two noteworthy alternatives in the Git environment that we will introduce in this chapter: the Integration Manager workflow, which uses multiple public repositories (Sec. 5.6, “Distributed Workflow with Multiple Remotes”), and patch exchange by e-mail (Sec. 5.9, “Patches via E-mail”).
Unlike centralized systems, Git’s commit and checkout operations are local. Other day-to-day tasks, such as reviewing history or switching to a branch, are also done locally. Only the uploading and downloading of commits are non-local operations. This has two important advantages over centralized version control: no network is needed, and everything is faster. How often you synchronize your repository depends, among other things, on the size and development speed of the project. If you’re working with a colleague on the internals of your software, you’ll probably need to synchronize more often than if you’re working on a feature that doesn’t have a major impact on the rest of the code base. It may well be that one synchronization per day is sufficient. So you can work productively even without a permanent network connection.
This chapter is about how to exchange changes between your local repository and a remote repository (aka remote), what to consider when working with multiple remotes, and how to email patches so that they can be easily applied by the recipient.
The most important commands at a glance:
git remote - General configuration of remotes: add, remove, rename, etc.
git clone - Download a complete copy.
git pull and git fetch - Download commits and references from a remote.
git push - Upload commits and references to a remote.
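The interplay of these commands can be tried out without a network. The following is a minimal sketch under stated assumptions: it uses local directories in a scratch folder instead of real server URLs, a placeholder identity ("Demo"), and the names `central.git`, `beatrice`, and `carlos`, which are purely illustrative. It also assumes a Git version that supports `git -C` (1.8.5 or later).

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption: removed again at the end)
# Placeholder identity so that commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A seed repository with one commit, published as a bare "central" repository.
git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
git -C "$demo/seed" commit -q --allow-empty -m "A"
git clone -q --bare "$demo/seed" "$demo/central.git"

# Two developers each download a complete copy.
git clone -q "$demo/central.git" "$demo/beatrice"
git clone -q "$demo/central.git" "$demo/carlos"

# Beatrice commits locally (no remote involved) and then uploads.
git -C "$demo/beatrice" commit -q --allow-empty -m "B"
git -C "$demo/beatrice" push -q origin master

# Carlos downloads her commit and integrates it into his branch.
git -C "$demo/carlos" pull -q origin master

beatrice_head=$(git -C "$demo/beatrice" rev-parse master)
carlos_head=$(git -C "$demo/carlos" rev-parse master)
echo "beatrice: $beatrice_head"
echo "carlos:   $carlos_head"
rm -rf "$demo"
```

After the final pull, both clones should report the same commit ID for `master`.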
5.2. Cloning Repositories
You have already seen the first command related to remote repositories: git clone
.
Here we illustrate the cloning process with our “git cheat sheet”:[65]
$ git clone git://github.com/esc/git-cheatsheet-de.git
Initialized empty Git repository in /tmp/test/git-cheatsheet-de/.git/
remote: Counting objects: 77, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 77 (delta 45), reused 0 (delta 0)
Receiving objects: 100% (77/77), 132.44 KiB, done.
Resolving deltas: 100% (45/45), done.
Git will issue various status messages when this call is made.
The most important ones are the notification of which directory the new repository is cloned to (Initialized empty Git repository in /tmp/test/git-cheatsheet-de/.git/) and the confirmation that all objects have been received successfully (Receiving objects: 100% (77/77), 132.44 KiB, done.).
If the cloning process is successful, the master branch is checked out,[66] and the working tree including the repository is located in the directory git-cheatsheet-de.
$ cd git-cheatsheet-de
$ ls
cheatsheet.pdf  cheatsheet.tex  Makefile  README
$ ls -d .*
.git/
To create the clone in a different directory, simply pass it as an argument:
$ git clone git://github.com/esc/git-cheatsheet-de.git cheatsheet
Initialized empty Git repository in /tmp/test/cheatsheet/.git/
$ ls
cheatsheet/
Furthermore, the source repository, i.e. the origin of the clone, is configured as a remote repository named origin
.
The git remote command displays the setting:
$ git remote
origin
The setting is stored in the configuration file .git/config with the entry remote, in this case only for origin:
[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://github.com/esc/git-cheatsheet-de.git
You will see two settings in the section: fetch
and url
.
The first, called the refspec, specifies which changes are to be downloaded when synchronizing with the remote repository, and the second specifies the URL used to do this.
git remote
is also used to manage remote repositories.
For example, you can add more remote repositories using git remote add
, adapt the URL for the remote repository using git remote set-url
, and so on, but more on this later.
The name origin
is just a convention; with git remote rename
you can change the name of the source repository to suit your needs, for example, from origin
to github
:
$ git remote rename origin github
$ git remote
github
With the option --origin
or -o
you set the name immediately when cloning:
$ git clone -o github git://github.com/esc/git-cheatsheet-de.git
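Both ways of naming the remote can be compared in a small experiment. This is a sketch, assuming a local directory (`source`, a hypothetical name) stands in for the GitHub URL and a placeholder identity is used for the single commit.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the repository that is being cloned.
git init -q "$demo/source"
git -C "$demo/source" symbolic-ref HEAD refs/heads/master
git -C "$demo/source" commit -q --allow-empty -m "Initial commit"

# By default, the source of the clone becomes the remote "origin".
git clone -q "$demo/source" "$demo/clone1"
default_remote=$(git -C "$demo/clone1" remote)

# Rename it afterwards; Git adjusts the refspec to the new name.
git -C "$demo/clone1" remote rename origin github
renamed_remote=$(git -C "$demo/clone1" remote)
renamed_refspec=$(git -C "$demo/clone1" config --get remote.github.fetch)

# Or choose the name right away when cloning.
git clone -q -o github "$demo/source" "$demo/clone2"
named_remote=$(git -C "$demo/clone2" remote)

echo "$default_remote -> $renamed_remote ($renamed_refspec)"
echo "clone -o: $named_remote"
rm -rf "$demo"
```

Note that the rename also rewrites the right-hand side of the fetch refspec, so the remote tracking branches move along with the new name.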
5.2.1. Repository URLs
Git supports several protocols for accessing a remote repository, the three most common being the Git protocol, SSH, and HTTP(S). Designed specifically for Git, the Git protocol keeps transfers efficient by always sending the smallest possible amount of data. It doesn’t support authentication, so it is often tunneled through an SSH connection. This ensures both efficient (Git protocol) and secure (SSH) transmission. HTTP(S) is used when a firewall is configured very restrictively and the allowed ports are drastically restricted.[67]
In general, a valid URL contains the transfer protocol, the address of the server and the path to the repository:[68]
- ssh://[user@]gitbu.ch[:port]/path/to/repo.git/
- git://gitbu.ch[:port]/path/to/repo.git/
- http[s]://gitbu.ch[:port]/path/to/repo.git/
For the SSH protocol there is also a short form:
- [user@]gitbu.ch:path/to/repo.git/
It is also possible to clone repositories locally using the following syntax:
- /path/to/repo.git/
- file:///path/to/repo.git/
If you want to know what URLs are configured for a remote repository, use git remote’s --verbose
or -v
option:
$ git remote -v
origin  git://github.com/esc/git-cheatsheet-de.git (fetch)
origin  git://github.com/esc/git-cheatsheet-de.git (push)
You can see that there are two URLs for the remote repository origin, but they are set to the same value by default.
The first URL (fetch
) specifies from where and with which protocol changes are downloaded.
The second URL (push
) specifies where changes are uploaded to and with which protocol.
Different URLs are particularly interesting if you download and upload with different protocols.
A common example is to download with the Git protocol (git://) and upload with the SSH protocol (ssh://).
Downloads then run without authentication and encryption, which provides a speed advantage, while uploads are authenticated and encrypted, which ensures that only you or other authorized people can upload.
You can use the git remote set-url
command to customize the URLs:
$ git remote set-url --add \
    --push origin git@github.com:esc/git-cheatsheet-de.git
$ git remote -v
origin  git://github.com/esc/git-cheatsheet-de.git (fetch)
origin  git@github.com:esc/git-cheatsheet-de.git (push)
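Where the separate push URL ends up can be checked in the configuration. The sketch below assumes a local stand-in repository and a made-up SSH address (`git@example.com:demo/repo.git`); nothing is contacted, because only the configuration is changed. It uses `set-url --push` without `--add`, which sets the push URL rather than appending one.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/source"
git -C "$demo/source" symbolic-ref HEAD refs/heads/master
git -C "$demo/source" commit -q --allow-empty -m "Initial commit"
git clone -q "$demo/source" "$demo/clone"

# Set a separate URL for uploads (hypothetical address, never contacted here).
git -C "$demo/clone" remote set-url --push origin git@example.com:demo/repo.git

# The fetch URL stays in remote.origin.url, the push URL goes to pushurl.
fetch_url=$(git -C "$demo/clone" config --get remote.origin.url)
push_url=$(git -C "$demo/clone" config --get remote.origin.pushurl)
git -C "$demo/clone" remote -v
rm -rf "$demo"
```

The two entries `url` and `pushurl` in the `[remote "origin"]` section correspond to the (fetch) and (push) lines of `git remote -v`.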
If you want to customize the URL of a repository, it is often faster to do this directly in the |
5.2.2. Remote Tracking Branches
The current status of the remote repository is stored locally. For this, Git uses remote tracking branches: special branches (local references) that reflect the state of the branches in the remote. They “track” the remote branches, and Git advances or resets them when synchronizing with the remote if the branches there have changed. In terms of the commit graph, remote tracking branches are markers within the graph that point to the same commits as the branches in the remote repository. You can’t modify remote tracking branches like normal branches; Git manages and updates them automatically. When you clone a repository, Git initializes a remote tracking branch for each remote branch.
Figure 31, “Generated Remote Tracking Branches” shows an example.
The origin
remote repository has three branches: pu
, maint
, and master
.
Git creates a remote tracking branch in the cloned repository for each of these remote branches.
It also creates a local branch master
in the clone that corresponds to the remote branch master
.
This is checked out and is the branch you should work in if you plan to upload commits to the master
(but see Sec. 5.3.1, “git fetch”).
In the git fetch example, there is only one branch on the remote side, master.
That’s why Git creates only one remote tracking branch in the clone, origin/master
.
The git branch -r
command shows all remote tracking branches:
$ git branch -r
  origin/HEAD -> origin/master
  origin/master
The special entry origin/HEAD -> origin/master states that in the remote repository HEAD points to the branch master.
This is important for cloning, because this branch is checked out after cloning.
The list of remote tracking branches is a bit sparse in this example; you can see more entries in a clone of the Git project’s own repository:
$ git branch -r
  origin/HEAD -> origin/master
  origin/html
  origin/maint
  origin/man
  origin/master
  origin/next
  origin/pu
  origin/todo
All branches can be displayed with git branch -a
:
$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master
In this case, Git uses the prefix remotes/
to clearly distinguish remote tracking branches from normal ones.
If you have enabled color output, the different branches will also be color-coded: the checked-out branch green, remote tracking branches red.
Remote tracking branches are likewise just references and are therefore stored under .git/refs like all references.
However, since they are special references that are tied to a remote repository, they end up under .git/refs/remotes/<remote-name> (see Sec. 3.1.1, “HEAD and Other Symbolic References”).
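The resulting references can be listed with `git for-each-ref`. The following sketch assumes a local stand-in repository with the three branches pu, maint, and master; the directory names and the identity are placeholders. It queries references through Git rather than looking at files, since Git may store references in packed form instead of as individual files under .git/refs.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Source repository with three branches: master (checked out), maint, pu.
git init -q "$demo/source"
git -C "$demo/source" symbolic-ref HEAD refs/heads/master
git -C "$demo/source" commit -q --allow-empty -m "A"
git -C "$demo/source" branch maint
git -C "$demo/source" branch pu

git clone -q "$demo/source" "$demo/clone"

# One remote tracking branch per remote branch, but only one local branch.
tracking=$(git -C "$demo/clone" for-each-ref --format='%(refname)' refs/remotes)
local_branches=$(git -C "$demo/clone" for-each-ref --format='%(refname)' refs/heads)
echo "$tracking"
echo "$local_branches"
rm -rf "$demo"
```

The listing should contain a reference under refs/remotes/origin/ for each of the three branches (plus origin/HEAD), while refs/heads contains only master.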
In Gitk, the remote tracking branches are displayed with the prefix remotes/<remote-name>/, which is also colored dark yellow (Figure 32, “Branch next and the corresponding remote tracking branch in Gitk”).
5.3. Downloading Commits
Now what does it mean when you synchronize two repositories, such as a clone with the source?
Synchronization in this context means two things: first, downloading commits and references, and second, uploading.
As far as the commit graph is concerned, the local graph needs to be synchronized with the one on the remote side, so that both have the same structure.
In this section, we first discuss how to download commits and references from a remote.
There are two commands for this: git fetch
and git pull
.
We’ll first introduce both commands, and in Sec. 5.3.3, “git fetch vs. git pull” we’ll describe which command is preferable under which circumstances.
5.3.1. git fetch
As soon as new commits are created by other developers in a remote, you want to download them to your local repository. In the simplest case, you just want to find out which commits you don’t have locally, download them, and update the remote tracking branches so that they reflect the current status in the remote.
Use the git fetch
command to do this:
$ git fetch origin
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Git acknowledges the call with a message that origin/master has been moved from commit 79170e8 to commit 003e3c7.
The notation master -> origin/master indicates that the branch master from the remote was used to update the remote tracking branch origin/master.
In other words: branches from the remote are on the left, remote tracking branches on the right.
See Figure 33, “Remote Tracking Branches are updated” for the effect this has on the commit graph: On the left side is the initial state of the remote origin and next to it that of the clone.
Both the remote and the clone have new commits since the last synchronization (C and D).
The remote tracking branch origin/master
in the clone points to commit B; this is the last state of the remote known to the clone.
By calling git fetch origin
, Git updates the remote tracking branch in the clone to reflect the current status of the master
(pointing to commit C) in the remote.
To do this, Git downloads the missing commit C and then sets the remote tracking branch on it.
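The effect described above, that a fetch moves only the remote tracking branch and leaves the local branch alone, can be observed directly. A minimal sketch, assuming local directories (`origin.git`, `clone`, `colleague`; all hypothetical names) in place of a real remote:

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
git -C "$demo/seed" commit -q --allow-empty -m "A"
git -C "$demo/seed" commit -q --allow-empty -m "B"
git clone -q --bare "$demo/seed" "$demo/origin.git"
git clone -q "$demo/origin.git" "$demo/clone"
git clone -q "$demo/origin.git" "$demo/colleague"

# A colleague publishes commit C in the remote.
git -C "$demo/colleague" commit -q --allow-empty -m "C"
git -C "$demo/colleague" push -q origin master

# Before the fetch, origin/master in the clone still points to B.
before=$(git -C "$demo/clone" rev-parse origin/master)
git -C "$demo/clone" fetch -q origin
after=$(git -C "$demo/clone" rev-parse origin/master)

local_master=$(git -C "$demo/clone" rev-parse master)
published=$(git -C "$demo/origin.git" rev-parse master)
echo "origin/master: $before -> $after"
echo "local master:  $local_master (unchanged)"
rm -rf "$demo"
```

After the fetch, origin/master matches master in the remote, while the local master still points to the old commit until you merge or rebase.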
5.3.1.1. Refspec
The refspec (reference specification) ensures that the remote tracking branches are set. This is a description of the references to be retrieved from the remote. An example was given above:
[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://github.com/esc/git-cheatsheet-de.git
In the entry fetch
the refspec for the remote is stored.
It has the form: <remote-refs>:<local-refs>
with an optional plus (+).
The example is configured so that all branches, i.e. all references stored in the remote under refs/heads
, end up locally under refs/remotes/origin
.[69]
Thus, for example, the branch master
from the remote origin
(refs/heads/master
) is stored locally as refs/remotes/origin/master
.
Normally the remote tracking branches are “fast-forwarded”, similar to a fast-forward merge.
The remote tracking branch is therefore only updated if the target commit is a descendant of the current reference.
This may not be possible, for example, after a rebase.
In this case, Git will refuse to update the remote tracking branch.
However, the plus overrides this behavior, and the remote tracking branch is still updated.
If this happens, Git will indicate this with the addition (forced update)
:
+ f5225b8..0efec48 pu -> origin/pu (forced update)
This setting is useful in practice and is therefore set by default.
Furthermore, as a user you do not need to worry about setting the refspec, because if you use the command git clone
or git remote add
, Git automatically creates the corresponding default entry for you.
Sometimes you may want to restrict the refspec explicitly.
For example, if you use namespaces for all developers and you are only interested in the master
branch and the branches of the other developers in your team (Beatrice and Carlos), it might look like this:
[remote "firma"]
    url = axel@example.com:produkt.git
    fetch = +refs/heads/master:refs/remotes/origin/master
    fetch = +refs/heads/beatrice/*:refs/remotes/origin/beatrice/*
    fetch = +refs/heads/carlos/*:refs/remotes/origin/carlos/*
With regard to the commit graph, Git only downloads those commits that are needed to reach the requested references. This makes sense, because commits that are not “secured” by a reference are considered unreachable and will eventually be deleted (see also Sec. 3.1.2, “Managing Branches”). In the last example, Git therefore does not need to download commits that are reachable only from branches that are not in the refspec. In terms of distribution, Git does not necessarily need to synchronize the entire commit graph; the “relevant” parts are sufficient.
Alternatively, you can specify the refspec on the command line:
$ git fetch origin +refs/heads/master:refs/remotes/origin/master
If a refspec has no reference on the right side of the colon, there is no target in which to store the result.
In this case, Git records the reference in the file .git/FETCH_HEAD instead, and you can use the special name FETCH_HEAD for a merge:
$ git fetch origin master
From github.com:esc/git-cheatsheet-de
 * branch            master     -> FETCH_HEAD
$ cat .git/FETCH_HEAD
003e3c70ce7310f6d6836748f45284383480d40e branch 'master' of github.com:esc/git-cheatsheet-de
$ git merge FETCH_HEAD
This feature can be useful if you are interested in a single remote branch that you have not configured a remote tracking branch for and do not want to do so.
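The FETCH_HEAD mechanism can be sketched as follows. The example assumes a local stand-in repository and a branch named `experiment` (a hypothetical name); it fetches by path instead of by remote name, so that no remote tracking branch is involved at all.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/source"
git -C "$demo/source" symbolic-ref HEAD refs/heads/master
git -C "$demo/source" commit -q --allow-empty -m "A"
git clone -q "$demo/source" "$demo/clone"

# A new branch appears in the source after the clone was made.
git -C "$demo/source" checkout -q -b experiment
git -C "$demo/source" commit -q --allow-empty -m "B"

# Fetch that single branch by URL; the refspec has no right-hand side,
# so the result is recorded only in FETCH_HEAD.
git -C "$demo/clone" fetch -q "$demo/source" experiment
fetched=$(git -C "$demo/clone" rev-parse FETCH_HEAD)
if git -C "$demo/clone" rev-parse -q --verify refs/remotes/origin/experiment >/dev/null
then tracking=yes; else tracking=no; fi

# Integrate the fetched commit into the current branch.
git -C "$demo/clone" merge -q --ff-only FETCH_HEAD
merged=$(git -C "$demo/clone" rev-parse master)
expected=$(git -C "$demo/source" rev-parse experiment)
echo "FETCH_HEAD: $fetched, remote tracking branch created: $tracking"
rm -rf "$demo"
```

The `--ff-only` option is used here only to keep the sketch non-interactive; a regular merge works the same way.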
5.3.1.2. Deleting Expired Remote Tracking Branches
If a remote branch is deleted (as described in Sec. 5.4.1, “Deleting Remote References”), the corresponding remote tracking branch is referred to as stale (“expired”). Since such branches usually have no further use, delete them (prune):
$ git remote prune origin
Or prune directly while downloading:
$ git fetch --prune
Since this is often the desired behavior, Git offers the fetch.prune
option.
If you set it to true
, git fetch will behave as if you had called it with the --prune
option.
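A small experiment shows a stale remote tracking branch and its removal. This is a sketch under assumptions: the remote is a local bare repository, and the branch `bugfix` is deleted there directly to simulate a deletion by someone else.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
git -C "$demo/seed" commit -q --allow-empty -m "A"
git clone -q --bare "$demo/seed" "$demo/origin.git"
git -C "$demo/origin.git" branch bugfix master

# The clone gets a remote tracking branch origin/bugfix.
git clone -q "$demo/origin.git" "$demo/clone"

# Helper (illustrative): does the clone have the given remote tracking branch?
has_tracking() {
  git -C "$demo/clone" rev-parse -q --verify "refs/remotes/origin/$1" >/dev/null
}

# The branch is deleted in the remote; the clone does not notice by itself.
git -C "$demo/origin.git" branch -q -D bugfix
if has_tracking bugfix; then before=present; else before=gone; fi

# Fetching with --prune removes the stale remote tracking branch.
git -C "$demo/clone" fetch -q --prune origin
if has_tracking bugfix; then after=present; else after=gone; fi

echo "before prune: $before, after prune: $after"
rm -rf "$demo"
```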
5.3.1.3. Working with Local Branches
So far we have only discussed how to track the change in a remote. If you make changes yourself that are based on one of the branches in the remote, you must first create a local branch where you are allowed to make commits:[70]
$ git checkout -b next origin/next
Branch next set up to track remote branch next from origin.
Switched to a new branch next
If no local branch named next exists yet, the following abbreviation also works:
$ git checkout next
Branch next set up to track remote branch next from origin.
Switched to a new branch next
The set up to track
message indicates that Git is configuring the branch next
from the remote origin
as the upstream branch for the local branch next
.
This is a kind of “shortcut” that benefits other Git commands.
For more details, see Sec. 5.3.2, “git pull”.
You can work in the local branch as usual.
Note, however, that you only ever commit locally.
To publish your work, i.e. upload it to a remote branch, you still need the git push
command (Sec. 5.4, “Uploading Commits: git push”).
5.3.2. git pull
Suppose you want to transfer commits from the remote repository to your local branch.
To do this, first run a git fetch
to fetch new commits, and then merge the change from the corresponding remote tracking branch:[71]
$ git merge origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)
For this use case, Git provides the git pull
command to speed up your workflow.
It is a combination of git fetch
and git merge
or git rebase
.
Downloading new commits from origin
and merging all commits referenced by the master
there into the current branch can be done with the following command:
$ git pull origin master
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)
In Figure 34, “What happens with a pull” we illustrate the process.
On the left, you see the remote repository origin
and next to it the current status of the local repository.
The repository was cloned when it only contained commits A and B, so the remote tracking branch origin/master points to B.
In the meantime, commits have been added both in the remote (C) and in the local repository (D).
On the right side is the state after git pull origin master
.
Commit C has been added to the local repository.
The fetch
call contained in the pull
has updated the remote tracking branch, i.e. it points to the same commit as the master
in origin
and thus reflects the state there.
In addition, the merge
call contained in the pull
has integrated the master
from origin
into the local master
, as you can see from the merge commit M and the current position of the local master
.
Alternatively, the --rebase option instructs the pull command to rebase the local branch onto the remote tracking branch after the fetch:
$ git pull --rebase origin master
In Figure 35, “What happens during a pull with rebase” you can see what happens if you perform a rebase instead of the default merge.
The initial situation is the same as in Figure 34, “What happens with a pull”.
The fetch
contained in the pull
moves the remote tracking branch origin/master
to commit C.
However, rebase
does not create a merge commit; instead, a call to rebase
gives the commit D a new base, and the local master
is set to the new commit D'.
(Rebase is described in detail in Sec. 4.1, “Moving commits — Rebase”).
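The difference between the two variants can be made visible by counting merge commits. The sketch below recreates the situation with commits A, C, and D in local scratch repositories; the helper `commit_file`, the directory names, and the identity are assumptions for illustration. The merge variant passes `--no-rebase` and `--no-edit` explicitly so that it runs the same way regardless of local configuration.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper (illustrative): commit_file <repo> <name> adds <name>.txt and commits it.
commit_file() {
  echo "$2" > "$1/$2.txt"
  git -C "$1" add "$2.txt"
  git -C "$1" commit -q -m "$2"
}

git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
commit_file "$demo/seed" A
git clone -q --bare "$demo/seed" "$demo/origin.git"
git clone -q "$demo/origin.git" "$demo/merge"
git clone -q "$demo/origin.git" "$demo/rebase"
git clone -q "$demo/origin.git" "$demo/colleague"

# The remote gains commit C; both local clones gain their own commit D.
commit_file "$demo/colleague" C
git -C "$demo/colleague" push -q origin master
commit_file "$demo/merge" D
commit_file "$demo/rebase" D

# Variant 1: fetch + merge creates a merge commit M.
git -C "$demo/merge" pull -q --no-rebase --no-edit origin master
# Variant 2: fetch + rebase puts D on top of C as a new commit D'.
git -C "$demo/rebase" pull -q --rebase origin master

merges_after_merge=$(git -C "$demo/merge" rev-list --merges --count HEAD)
merges_after_rebase=$(git -C "$demo/rebase" rev-list --merges --count HEAD)
rebased_parent=$(git -C "$demo/rebase" rev-parse HEAD~1)
remote_tip=$(git -C "$demo/rebase" rev-parse origin/master)
echo "merge commits: pull=$merges_after_merge, pull --rebase=$merges_after_rebase"
rm -rf "$demo"
```

In the rebased clone, the parent of the new commit D' is the commit that origin/master points to, so the history stays linear.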
5.3.2.1. Upstream Branches
Often git fetch
, git pull
and git push
are executed without arguments.
Git uses the configuration of the upstream branches to decide what to do, among other things.
From the repository’s config:
[branch "master"]
    remote = origin
    merge = refs/heads/master
The entry states that the local branch master
is linked to the remote branch master
in the origin
repository.
The remote entry tells git fetch and git pull which remote to download commits from.
The merge entry tells git pull to merge the new commits from the remote branch master into the local master.
This allows both commands to be used without arguments, which is very common in practice.
$ git fetch
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
$ git pull
...
From github.com:esc/git-cheatsheet-de
   79170e8..003e3c7  master     -> origin/master
Updating 79170e8..003e3c7
Fast-forward
 cheatsheet.pdf |  Bin 89792 -> 95619 bytes
 cheatsheet.tex |   19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)
If no upstream branch is configured, git fetch tries origin and otherwise aborts:
$ git fetch
fatal: No remote repository specified.  Please, specify either a URL or a
remote name from which new revisions should be fetched.
If you want git pull to rebase onto the upstream branch by default instead of merging, set branch.<name>.rebase to true, for example for master:
$ git config branch.master.rebase true
5.3.3. git fetch vs. git pull
Git beginners often ask themselves whether they should use fetch
or pull
.
The answer depends on how you develop: How big is the project?
How many remotes are there?
How heavily are branches used?
5.3.3.1. Distributed Git for Beginners
Especially for beginners, it makes sense that all participants work on the same branch (usually master
), synchronize with the same repository (central workflow) and use only git pull
for downloading and git push
for uploading.
This eliminates the need to deal with more complex aspects such as object model, branching and distribution; and participants can contribute improvements with just a few commands.
This results in the following workflow:
# Clone the repository
$ git clone <URL>
# Work and make local commits
$ git add ...
$ git commit
# Download changes from others
$ git pull
# Upload your own changes
$ git push
# Continue working, and repeat the synchronization as needed
$ git commit
This approach has advantages and disadvantages.
The advantage is certainly that only a basic understanding of Git is necessary to follow the workflow successfully.
The automatic configuration of upstream branches ensures that git push
and git pull
do the “right thing” without argument.
In addition, this workflow is similar to what Subversion users are used to.
However, there are also drawbacks, mainly related to implicit merging.
Suppose the team consists of two people, Beatrice and Carlos.
Both have made local commits, and Beatrice has already uploaded hers.
Carlos now runs git pull
and receives the message Merge made by recursive
.
If you keep the commit graph in mind, it’s logical: the local branch and the master
of the remote have diverged, so they have been merged back together.
However, Carlos doesn’t understand the message, since he was working on a different part of the code than his colleague, and in his opinion no merge was necessary.
One problem is that the term merge carries the association, familiar to many from centralized version control, that changes are merged into the same file.
With Git, however, a merge is always to be understood as the merging of commits into a commit graph.
This may mean merging changes to the same file, but it does not require it.
Besides confusing users, this workflow creates “nonsensical” commits in the history.
Ideally, merge commits should be meaningful entries in the repository history.
An outsider can immediately see that a development branch has been included.
However, this workflow inevitably involves the local master
and its remote counterpart diverging and being merged back together.
The resulting merge commits make no sense — they are actually only a side effect of the workflow and reduce the readability of the history.
Although the --rebase
option for git pull
offers a remedy, the man page explicitly advises against using this option unless you have already internalized the principle of rebase.
Once you understand this, you’re also familiar with how the commit graph is created and how to manipulate it — it’s worthwhile for you to go straight for feature-driven development with branches as a workflow.
5.3.3.2. Distributed Git for Advanced Users
Once you understand the object model and the commit graph, we recommend that you use a workflow that essentially consists of git fetch
, manual merges, and many branches.
The following are some recipes as a suggestion.
If you are using master as your integration branch, you will need to move your local master forward after calling git fetch.
To be precise, you need to advance all local branches that have a remote equivalent.
Git provides the syntax @{upstream} and @{u}, which stand for the remote tracking branch configured for the current branch.
This can be very helpful.
# Download changes from others
$ git remote update
...
   79170e8..003e3c7  master     -> origin/master
# Query the status of the remote tracking branches
$ git branch -vv
* master 79170e8 [origin/master: behind 1] Lizenz hinzugefügt
# Review the changes
$ git log -p ..@{u}
# Integrate the downloaded changes
$ git merge @{u}
Updating 79170e8..003e3c7
Fast-forward
...
# ... or rebase your own changes onto them
$ git rebase @{u}
# Then upload the changes
$ git push
If you frequently synchronize local branches with their remote tracking branches, we recommend the following alias:
$ git config --global alias.fft "merge --ff-only @{u}"
This allows you to easily move a branch forward with git fft.
In this context, Ch. 6, Workflows is also helpful; it describes how to keep track of things when working with many topic branches.
5.4. Uploading Commits: git push
The counterpart to fetch
and pull
is the command git push
.
This is used to upload Git objects and references to a remote, e.g. the local master
to the branch master
in the remote origin
:
$ git push origin master:master
As with git fetch
, you specify the references for uploading with a refspec.
However, the refspec has the opposite form:
<local-refs>:<remote-refs>
This time the local references are on the left side of the colon, and the remote references on the right.
If you omit the colon and the remote reference, the local name will also be used on the remote side, and will be created by Git if it doesn’t exist:
$ git push origin master
Counting objects: 73, done.
Compressing objects: 100% (33/33), done.
Writing objects: 100% (73/73), 116.22 KiB, done.
Total 73 (delta 42), reused 68 (delta 40)
Unpacking objects: 100% (73/73), done.
To git@github.com:esc/git-cheatsheet-de.git
 * [new branch]      master -> master
Figure 36, “Upload references and commits” shows the process behind git push
.
The initial situation is shown on the left (it is the result of a pull
call).
Git uploads the missing commits D and M to the remote origin
.
At the same time, the remote branch master is advanced to the commit M, so that it matches the local branch master
.
In addition, the remote tracking branch origin/master
is advanced so that it reflects the current status in the remote.
As with fetch, Git refuses to update references where the target commit is not a descendant of the current commit:
$ git push origin master
...
 ! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'git@github.com:esc/git-cheatsheet-de.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again.  See the 'Note about
fast-forwards' section of 'git push --help' for details.
You can override this behavior either by prefixing the reference in the refspec with a plus (+) or by using the --force option, or -f for short:[72]
$ git push origin --force master
$ git push origin +master
Be careful!
Commits may be lost on the remote side, for example if you have moved a branch using git reset --hard and commits are no longer referenced.
You’ll also get the error message if you have used git rebase or git commit --amend to modify commits that have already been published via git push.
So here’s the explicit warning again: avoid modifying commits that you have already published!
The modified SHA-1 sums will cause duplication if others have already downloaded the original commits.
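The rejection and the consequences of forcing can be reproduced safely in scratch repositories. The following sketch assumes two local clones named after the developers Beatrice and Carlos, a local bare repository as the remote, and an illustrative helper `commit_file`; it checks afterwards whether Carlos's commit is still part of the remote master.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper (illustrative): commit_file <repo> <name> adds <name>.txt and commits it.
commit_file() {
  echo "$2" > "$1/$2.txt"
  git -C "$1" add "$2.txt"
  git -C "$1" commit -q -m "$2"
}

git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
commit_file "$demo/seed" A
git clone -q --bare "$demo/seed" "$demo/origin.git"
git clone -q "$demo/origin.git" "$demo/beatrice"
git clone -q "$demo/origin.git" "$demo/carlos"

# Carlos publishes C; Beatrice commits D without downloading C first.
commit_file "$demo/carlos" C
git -C "$demo/carlos" push -q origin master
carlos_commit=$(git -C "$demo/carlos" rev-parse master)
commit_file "$demo/beatrice" D

# A plain push is rejected, because D is not a descendant of C.
if git -C "$demo/beatrice" push -q origin master 2>/dev/null
then plain_push=accepted; else plain_push=rejected; fi

# Forcing the push overwrites the remote branch ...
git -C "$demo/beatrice" push -q --force origin master
remote_tip=$(git -C "$demo/origin.git" rev-parse master)
beatrice_tip=$(git -C "$demo/beatrice" rev-parse master)

# ... and Carlos's commit is no longer reachable from the remote master.
if git -C "$demo/origin.git" merge-base --is-ancestor "$carlos_commit" master
then carlos_commit_kept=yes; else carlos_commit_kept=no; fi

echo "plain push: $plain_push, C still in remote master: $carlos_commit_kept"
rm -rf "$demo"
```

In a real project, the safer reaction to the rejection is to fetch and merge or rebase first, as the error message suggests.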
5.4.1. Deleting Remote References
There are two ways to delete references in the remote. The older one (before Git version 1.7.0) is to omit the local reference in the refspec, which means you want to upload “nothing”. You thus replace an existing reference with the empty one.
$ git push origin :bugfix
However, newer Git versions usually use the git push command with the --delete option, which is syntactically much clearer:
$ git push origin --delete bugfix
Note that in other clones, the remote tracking branch origin/bugfix
, if present, does not automatically disappear!
See the section on pruning above (Sec. 5.3, “Downloading Commits”).
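Both deletion syntaxes, and the fact that other clones keep their remote tracking branch, can be checked in a sketch like the following. It assumes a local bare repository as the remote and two branches, `bugfix` and `hotfix` (the latter a hypothetical name added for the second syntax).

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
git -C "$demo/seed" commit -q --allow-empty -m "A"
git clone -q --bare "$demo/seed" "$demo/origin.git"
git -C "$demo/origin.git" branch bugfix master
git -C "$demo/origin.git" branch hotfix master

git clone -q "$demo/origin.git" "$demo/work"
git clone -q "$demo/origin.git" "$demo/other"

# Older syntax: push "nothing" to the remote reference.
git -C "$demo/work" push -q origin :bugfix
# Newer, clearer syntax.
git -C "$demo/work" push -q origin --delete hotfix

# Only master is left in the remote.
remaining=$(git -C "$demo/origin.git" for-each-ref --format='%(refname:short)' refs/heads)

# The other clone still has its (now stale) remote tracking branch.
if git -C "$demo/other" rev-parse -q --verify refs/remotes/origin/bugfix >/dev/null
then stale_in_other_clone=yes; else stale_in_other_clone=no; fi

echo "remaining remote branches: $remaining"
echo "origin/bugfix still present in other clone: $stale_in_other_clone"
rm -rf "$demo"
```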
5.4.2. Pushing without Arguments: push.default
In day-to-day work you often run git push without specifying a remote and refspec.
In this case, Git uses the configuration entries (upstream branch and push.default) to decide which references are sent where.
$ git push
...
To git@github.com:esc/git-cheatsheet-de.git
   79170e8..003e3c7  master -> master
By default, Git proceeds like this:[73] If you don’t specify a remote, Git will look for the upstream configuration of the current branch.
If the name of the branch on the remote side matches the name of the local branch, the corresponding reference is uploaded (this is to protect you from uploading, for example, your branch devel
to master
if the upstream configuration is incorrect).
If no upstream branch is configured, Git aborts with an error message:
$ git push
fatal: The current branch master has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin master
If you use git push <remote>
to specify a remote but no branch, Git will attempt to upload the current branch to the remote under the same name.
The strategy described here is also known as simple
.
For most use cases, it does what the user expects and protects against avoidable errors.
However, you can set the push.default
option responsible for this to one of the following values if required:
nothing - Do not upload anything. This is useful if you always want to specify explicitly which branch to upload where.
upstream - If the current branch has an upstream branch, push there.
current - Push the current branch into a remote branch of the same name.
matching - Upload all locally existing references for which a reference of the same name already exists in the corresponding remote. Attention: you are potentially uploading several branches at the same time!
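Two of these values can be compared in a quick experiment. The sketch assumes a local bare repository as the remote and a new branch `new-feature` without an upstream configuration; the setting is passed per call with `git -c` so that no configuration file is modified.

```shell
set -e
demo=$(mktemp -d)   # scratch directory (assumption)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$demo/seed"
git -C "$demo/seed" symbolic-ref HEAD refs/heads/master
git -C "$demo/seed" commit -q --allow-empty -m "A"
git clone -q --bare "$demo/seed" "$demo/origin.git"
git clone -q "$demo/origin.git" "$demo/work"

# A new local branch with one commit.
git -C "$demo/work" checkout -q -b new-feature
git -C "$demo/work" commit -q --allow-empty -m "Work on new feature"

# push.default=nothing: a push without arguments refuses to upload anything.
if git -C "$demo/work" -c push.default=nothing push -q 2>/dev/null
then with_nothing=pushed; else with_nothing=refused; fi

# push.default=current: the current branch goes to a remote branch of the same name.
git -C "$demo/work" -c push.default=current push -q
if git -C "$demo/origin.git" rev-parse -q --verify refs/heads/new-feature >/dev/null
then with_current=pushed; else with_current=missing; fi

echo "nothing: $with_nothing, current: $with_current"
rm -rf "$demo"
```

To make a value permanent, you would set it with git config push.default instead of passing it on each call.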
5.4.3. Configuring the Upstream Branch
In some cases, Git will automatically configure upstream branches (for example, after a git clone
).
However, you need to do this explicitly, especially for new branches that you are uploading for the first time.
You can do this either afterwards using the --set-upstream-to
option or, in short, -u
of git branch
:
$ git push origin new-feature
$ git branch -u origin/new-feature
Branch new-feature set up to track remote branch new-feature from origin.
Alternatively, if you think of it in time, you can have git push write the configuration for you by calling it with the -u option:
$ git push -u origin new-feature
To view the upstream configuration of your branches, call git branch -vv
.
The output shows the upstream partner of a branch (if any) in square brackets.
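To check the upstream of a single branch from a script, the @{upstream} notation can be resolved with git rev-parse. This is a small sketch under the assumption that a local bare repository plays the role of origin; all paths and the demo identity are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/remote.git"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Create a new branch and push it, recording the upstream in the same step
git checkout -q -b new-feature
git push -q -u origin new-feature

# Resolve the configured upstream of the branch to a short name
upstream=$(git rev-parse --abbrev-ref 'new-feature@{upstream}')
echo "new-feature tracks: $upstream"

# The same information for all branches, in square brackets
git branch -vv
```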
5.5. Examining Remotes
In this section, we introduce techniques for viewing a remote and comparing your local repository to it.
5.5.1. Overview of a Remote
The git remote show
command gives a concise summary of the remote, including the branches available there, whether they are tracked locally (tracking status) and which local branches are configured for specific tasks.
The command must request the current status from the remote, i.e. the command fails if the remote is not available, e.g. due to a missing network connection.
The option -n
prevents the query.
$ git remote show origin
* remote origin
  Fetch URL: git://git.kernel.org/pub/scm/git/git.git
  Push  URL: git://git.kernel.org/pub/scm/git/git.git
  HEAD branch: master
  Remote branches:
    html   tracked
    maint  tracked
    man    tracked
    master tracked
    next   tracked
    pu     tracked
    todo   tracked
  Local branches configured for 'git pull':
    master merges with remote master
    pu     merges with remote pu
  Local refs configured for 'git push':
    master pushes to master (local out of date)
    pu     pushes to pu     (up to date)
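The offline variant can be tried with a remote that is deliberately unreachable. In this sketch the URL git://example.invalid/project.git is a placeholder that never resolves; with -n, Git only reports what is stored locally.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A placeholder URL that cannot be contacted
git remote add origin git://example.invalid/project.git

# -n skips the query to the remote, so this works without a network
summary=$(git remote show -n origin)
echo "$summary"
```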
5.5.2. Comparing with the Upstream
If you have configured an upstream branch, when you change the branch (git checkout
) and query the status (git status
), you will receive a notification about the status of the branch compared to the upstream, for example:
$ git checkout master
Your branch is behind 'origin/master' by 73 commits, and can be fast-forwarded.
Here there are four different possibilities:
-
The branches point to the same commit. Git doesn’t show any special message. This state is also called up-to-date.
-
The local branch has commits that are not yet available upstream:
Your branch is ahead of 'origin/master' by 16 commits.
-
The remote tracking branch has commits that are not yet available in the local branch:
Your branch is behind 'origin/master' by 73 commits, and can be fast-forwarded.
-
Both the second and third conditions apply, a state called diverged in Git jargon:
Your branch and 'origin/master' have diverged, and have 16 and 73 different commit(s) each, respectively.
With the -v
(compare only) or -vv
(compare and upstream name) option, git branch
displays the appropriate information for local branches:
$ git branch -vv
* master  0a464e9 [origin/master: ahead 1] docs: fix grammar in git-tags.txt
  feature cd3065f Merge branch 'kc/gitweb-pathinfo-w-anchor'
  next    be8b495 [origin/next] Merge branch master into next
  pu      0c0c536 [origin/pu: behind 3] Merge branch 'jk/maint-merge-rename-create' into pu
For every branch, the command prints the SHA-1 prefix and the commit message of the commit at its tip.
If an upstream is configured for the branch, Git returns both the name and a comparison to the upstream.
In the example, you see four different branches.
master
has an additional commit that has not yet been uploaded to the remote, and is therefore ahead.
The branch feature
, on the other hand, has no upstream branch configured, so it currently exists only locally.
The branch next
is up-to-date with the corresponding remote tracking branch.
The branch pu
, on the other hand, “lags” behind its upstream and is therefore displayed as behind
.
The only state missing here is diverged — then both ahead and behind are shown including the number of “missing” commits.
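The ahead and behind numbers can also be computed directly, which is handy in scripts. The sketch below assumes a local bare repository as origin and counts the commits on each side with git rev-list; the repository layout and names are invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/remote.git"

echo "first" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git push -q -u origin master

# One more commit that exists only locally
echo "second" >> file.txt
git commit -q -a -m "Local only"

# Left count: commits only in master (ahead);
# right count: commits only in origin/master (behind)
counts=$(git rev-list --left-right --count master...origin/master)
set -- $counts
ahead=$1
behind=$2
echo "ahead $ahead, behind $behind"
```

Both numbers being non-zero corresponds to the diverged state described above.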
5.6. Distributed Workflow with Multiple Remotes
Git supports working with multiple remotes. A popular workflow that takes advantage of this feature is the Integration Manager Workflow. There is no “central” repository in the true sense of the word, that is, one that all active developers have write access to. Instead, there is only a quasi-official repository called blessed. It is accessible, for example, via the respective project domain and allows only the most important maintainers (or even only one) write access.
Everyone who wants to contribute to the project clones the blessed repository and starts working. As soon as he has fixed bugs or implemented a new feature, he makes his improvements available via a publicly accessible repository, a so-called developer public. He then sends a pull request to one of the maintainers of the official repository (or to the mailing list), requesting that certain code from his public repository be transferred to the official repository. You can see the infrastructure for this process in Figure 37, “Integration Manager Workflow”. Although it is theoretically possible to give interested parties direct access to your development machine, this almost never happens in practice.
One of the maintainers who have access to the master repository then checks if the code works, if it meets the quality requirements, etc. Any errors or ambiguities are reported to the author of the code, who then corrects them in his repository. Only when the maintainer is satisfied does he commit the changes to the master repository, so that the code is delivered in one of the following releases. Maintainers who integrate new code are often referred to as Integration Managers, which gives the workflow its name. Such maintainers often have several remotes configured, one for each contributor.
One of the great advantages of this workflow is that, in addition to the maintainers, interested users, such as colleagues or friends of the developer, also have access to the public developer repositories. They don’t have to wait until the code has found its way into the official repository, but can try out the improvements as soon as they are published. The hosting platform GitHub in particular relies heavily on this workflow. The web interface used there offers many features to support it, e.g. a visualization that shows all available clones of a project and the commits contained in them, as well as the possibility to perform merges directly in the web interface. For a detailed description of this service, see Ch. 11, GitHub.
5.7. Managing Remotes
With git remote
you can manage additional remotes.
For example, to add a new remote from another developer, use the command git remote add
.
Most of the time you’ll want to initialize the remote tracking branches afterwards, which you can do with git fetch
:
$ git remote add example git://example.com/example.git
$ git fetch example
...
To do both steps in one call, use the -f option of git remote add:
$ git remote add -f example git://example.com/example.git
If you no longer need the remote, you can remove it from your local configuration using git remote rm
.
This will also delete all remote tracking branches for that remote:
$ git remote rm example
Remotes do not necessarily have to be configured via git remote add
.
You can simply use the URL on the command line,[74] for example to download the objects and references for a bugfix:
$ git fetch git://example.com/example.git bugfix:bugfix
Of course this also works with pull
and push
.
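Fetching by URL can be tried locally, since a file system path is also a valid repository URL. In this sketch the directories other and mine are hypothetical stand-ins for a colleague's repository and your own.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# The "other" repository, with a single commit on a branch named bugfix
git init -q "$work/other"
cd "$work/other"
git symbolic-ref HEAD refs/heads/bugfix
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix a bug"

# Your own repository, without any configured remote
git init -q "$work/mine"
cd "$work/mine"
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "My own work"

# Fetch the branch by URL and store it as a local branch of the same name
git fetch -q "$work/other" bugfix:bugfix

local_tip=$(git rev-parse bugfix)
remote_tip=$(git -C "$work/other" rev-parse bugfix)
remote_count=$(git remote | wc -l | tr -d ' ')
echo "bugfix is at $local_tip, configured remotes: $remote_count"
```

No remote tracking branch is created this way; the refspec bugfix:bugfix writes straight into a local branch.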
If you work with several remotes, the command git remote update --prune
is a good choice.
This will fetch
all remotes, and the --prune
option will delete all expired remote tracking branches.
The following alias has proved to be very useful for us, as it combines many work steps that are often performed one after the other in practice:
$ git config --global alias.ru "remote update --prune"
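The life cycle of a remote (add, update, remove) can be sketched with two local repositories. The remote name example matches the text above; the paths and the demo identity are assumptions made for this illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Repository of the "other developer"
git init -q "$work/example"
cd "$work/example"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

git init -q "$work/mine"
cd "$work/mine"

# Add the remote and fetch it in one step
git remote add -f example "$work/example" >/dev/null 2>&1
tracked=$(git branch -r | wc -l | tr -d ' ')

# Fetch all remotes and drop stale remote tracking branches
git remote update --prune >/dev/null 2>&1

# Removing the remote also removes its remote tracking branches
git remote rm example
remaining=$(git branch -r | wc -l | tr -d ' ')
echo "tracked before: $tracked, after removal: $remaining"
```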
5.7.1. Pull-Request
To generate a pull request automatically, there is the git command request-pull
.
The syntax is:
git request-pull <start> <URL> [<end>]
As <URL>
you specify your public repository (either as the actual URL or as a configured remote repository), and as <start>
you select the reference on which the feature is built (in many cases the branch master
, which should match the master branch of the official repository).
Optionally, you can specify an <end>
; if you omit this, Git will use HEAD
.
By default, the output goes to STDOUT and includes the repository’s URL and branch name, a short description of all commits grouped by author, and a diffstat, i.e., a summary of added and deleted lines per file.
This output can easily be forwarded to an e-mail program.
If you add the -p
option, a patch with all changes is appended below the text.
For example, to ask someone to download the two latest commits from a repository:
$ git request-pull HEAD~2 origin
The following changes since commit d2640ac6a1a552781[...]c48e08e695d53:

  README verbessert (2010-11-20 21:27:20 +0100)

are available in the git repository at:
  git@github.com:esc/git-cheatsheet-de.git master

Valentin Haenel (2):
      Lizenz hinzugefügt
      URL hinzugefügt und Metadaten neu formatiert

 cheatsheet.pdf |  Bin 89513 -> 95619 bytes
 cheatsheet.tex |   18 ++++++++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)
5.8. Exchanging Tags
Tags are also exchanged with the remote commands fetch
or pull
and push
.
In contrast to branches, which change, tags are “static”.
For this reason, remote tags are not referenced locally again, so there is no equivalent to the remote tracking branches for the tags.
Tags that you get from your remote repositories are stored by Git under .git/refs/tags/ or in .git/packed-refs, as usual.
5.8.1. Downloading Tags
In principle, Git automatically downloads new tags when you call git fetch
or git pull
.
That is, if you download a commit that has a tag pointing to it, that tag will be included.
However, if you use a refspec to exclude individual branches, then commits in those branches will not be downloaded, and thus no tags that may point to those commits will be downloaded.
Conclusion: Git only downloads relevant tags.
With the options --no-tags
(no tags) and --tags
or -t
(all tags) you can adjust the default behavior.
Note, however, that --tags downloads not only the tags, but necessarily also the commits to which they point.
Git notifies you when new tags arrive:
$ git fetch
[fetch output]
From git://git.kernel.org/pub/scm/git/git
 * [new tag]         v1.7.4.2   -> v1.7.4.2
If you want to know what tags are present on the remote side, use git ls-remote
with the --tags
option.
For example, you can get all release candidates of Git version 1.7.1 with the following call (the pattern is quoted so that the shell does not expand it):
$ git ls-remote --tags origin 'v1.7.1-rc*'
bdf533f9b47dc58ac452a4cc92c81dc0b2f5304f	refs/tags/v1.7.1-rc0
537f6c7fb40257776a513128043112ea43b5cdb8	refs/tags/v1.7.1-rc0^{}
d34cb027c31d8a80c5dbbf74272ecd07001952e6	refs/tags/v1.7.1-rc1
b9aa901856cee7ad16737343f6a372bb37871258	refs/tags/v1.7.1-rc1^{}
03c5bd5315930d8d88d0c6b521e998041a13bb26	refs/tags/v1.7.1-rc2
5469e2dab133a197dc2ca2fa47eb9e846ac19b66	refs/tags/v1.7.1-rc2^{}
Git outputs the SHA-1 sums of the tags and their contents.[75]
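The paired entries can be reproduced in a scratch repository. In this sketch, which uses the current directory (.) as the "remote" for simplicity, an annotated tag produces two lines: one for the tag object and one, marked with ^{}, for the commit it points to. The tag names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

git tag -a v0.1 -m "Release 0.1"   # annotated: a tag object of its own
git tag scratch                    # lightweight: just a reference

git ls-remote --tags .

# The ^{} line carries the SHA-1 of the tagged commit
peeled=$(git ls-remote --tags . | grep -F 'refs/tags/v0.1^{}' | cut -f1)
head=$(git rev-parse HEAD)
lines_for_scratch=$(git ls-remote --tags . | grep -c 'refs/tags/scratch')
```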
5.8.2. Uploading Tags
Git does not automatically upload tags.
You need to pass them explicitly to git push
, similar to the branches, e.g. to upload the tag v0.1
:
$ git push origin v0.1
If you want to upload all tags at once, use the --tags
option.
But be careful: Avoid this option if you use Annotated Tags to mark versions and Lightweight Tags to mark something locally, as described in Sec. 3.1.3, “Tags — Marking Important Versions”, because with this option you would upload all tags, as already mentioned.
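The difference between pushing one tag and pushing all of them can be seen in a short experiment. This is a sketch with a local bare repository as origin; v0.1 and local-note are hypothetical tag names for a release and a private marker.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git remote add origin "$work/remote.git"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

git tag -a v0.1 -m "Release 0.1"   # meant for publication
git tag local-note                 # private marker, should stay local

# Push exactly one tag; "git push --tags" would upload local-note as well
git push -q origin v0.1

remote_tags=$(git ls-remote --tags origin | cut -f2 | grep -v -F '^{}')
echo "tags on the remote: $remote_tags"
```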
Attention: Once you have uploaded a tag, you should never change it!
The reason: Let’s say Axel changes a tag, like v0.7
, that he has already released.
First it pointed to the 5b6eef
commit, and now to bab18e
.
Beatrice had already downloaded the first version pointing to 5b6eef
, but Carlos had not yet.
The next time Beatrice calls git pull, Git won’t download the new version of the v0.7 tag; the assumption is that tags don’t change, so Git doesn’t check whether the tag is still current!
When Carlos now runs git pull
, he also gets the v0.7
tag, but it now points to bab18e
.
In the end, two versions of the tag, each pointing to a different commit, are in circulation.
Not a very helpful situation.
It gets really confusing when both Carlos and Beatrice use the same public repository, and upload all tags by default.[76]
The tag “jumps” back and forth between two commits in the public repository, so to speak; which version you get with a clone depends on who pushed last.
If this mishap does happen to you, you have two options:
-
The sensible alternative: Instead of replacing the tag, create a new one and upload it as well. Name the new tag according to the project conventions. If the old tag is v0.7, name the new one something like v0.7.1.
-
If you really want to replace the tag: Admit publicly (mailing list, wiki, blog) that you made a mistake. Let all developers and users know that a tag has changed and ask them to check their copy of the tag. The size of the project and your willingness to take risks will determine whether this solution is feasible.
5.9. Patches via E-mail
An alternative to setting up a public repository is to automatically send patches via email. The format of the email is chosen so that maintainers can have Git automatically apply patches received via email. Especially for small bug fixes and sporadic collaboration, this is usually less time-consuming and faster. There are many projects that rely on this type of exchange, most notably the Git project itself.
The majority of patches for Git are contributed via the mailing list.
There they go through a stringent review process, which usually leads to corrections and improvements.
The patches are improved by the author and sent back to the list until a consensus is reached.
Meanwhile, the maintainer regularly stores the patches in a branch in his repository, and makes them available for testing via the pu
branch.
If the patch series is considered finished by the participants on the list, the branch moves on to the different integration branches pu
and next
, where the changes are tested for compatibility and stability.
If everything is in order, the branch finally ends up in the master
and from there forms part of the next release.
The patches-via-e-mail approach is implemented by the following Git commands:
git format-patch: Format commits for sending as patches.
git send-email: Send patches.
git am: Add patches from a mailbox to the current branch (apply from mailbox).
5.9.1. Exporting Patches
The git format-patch command exports one or more commits as patches in Unix mailbox format and writes one file per commit.
The file names consist of a sequential number and the commit message, and end in .patch.[77]
As an argument, the command expects either a single commit or a range such as A..B
.
If you specify a single commit, Git interprets it as the range from that commit to HEAD.
Figure 38, “Formatting three commits to 'master' as patches” shows the initial situation.
We want to export the three commits in the fix-git-svn-docs branch, that is, all commits since master, as patches:
$ git format-patch master
0001-git-svn.txt-fix-usage-of-add-author-from.patch
0002-git-svn.txt-move-option-descriptions.patch
0003-git-svn.txt-small-typeface-improvements.patch
To export only the HEAD commit, use the -1 option:
$ git format-patch -1
0001-git-svn.txt-small-typeface-improvements.patch
This also works for any SHA-1 sum:
$ git format-patch -1 9126ce7
0001-git-svn.txt-fix-usage-of-add-author-from.patch
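Both forms of the command can be tried in a scratch repository. The sketch below assumes a branch fix-docs with three commits on top of master and writes the patches into separate directories with -o; the branch, file and directory names are illustrative.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "base" > doc.txt
git add doc.txt
git commit -q -m "Initial commit"

git checkout -q -b fix-docs
for n in 1 2 3; do
    echo "fix $n" >> doc.txt
    git commit -q -a -m "Doc fix $n"
done

# All commits since master: one numbered file per commit
git format-patch -q -o "$work/series" master
series_count=$(ls "$work/series" | wc -l | tr -d ' ')

# Only the commit at HEAD
git format-patch -q -o "$work/single" -1
single_count=$(ls "$work/single" | wc -l | tr -d ' ')

ls "$work/series" "$work/single"
```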
The generated files contain, among other things, the header fields From
, Date
and Subject
, which are used for sending as e-mail.
These fields are completed using the information available in the commit — author, date, and commit message.
The files also contain a diff-stat summary and the changes themselves as a patch in unified diff format.
The [PATCH m/n] prefix[78] in the subject line is used later by Git to apply the patches in the correct order.
A corresponding excerpt follows:
$ cat 0003-git-svn.txt-small-typeface-improvements.patch
From 6cf93e4dae1e5146242338b1b9297e6d2d8a08f4 Mon Sep 17 00:00:00 2001
From: Valentin Haenel <valentin.haenel@gmx.de>
Date: Fri, 22 Apr 2011 18:18:55 +0200
Subject: [PATCH 3/3] git-svn.txt: small typeface improvements

Signed-off-by: Valentin Haenel <valentin.haenel@gmx.de>
Acked-by: Eric Wong
---
 Documentation/git-svn.txt |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
...
If you plan to send a series of patches, it is recommended that you use the --cover-letter
option to create a kind of “cover page” in which you describe the series.
By default the file is called 0000-cover-letter.patch
.
Apart from the default headers, such a file looks like this:
Subject: [PATCH 0/3] *** SUBJECT HERE ***

*** BLURB HERE ***

Valentin Haenel (3):
  git-svn.txt: fix usage of --add-author-from
  git-svn.txt: move option descriptions
  git-svn.txt: small typeface improvements

 Documentation/git-svn.txt |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)
As you can see, the Subject:
still has the prefix [PATCH 0/3]
; this way, all recipients can immediately see that it is a cover page.
The file also contains the output of git shortlog
and git diff --stat
.
Replace *** SUBJECT HERE *** with a subject and *** BLURB HERE *** with a summary of the patch series.
Send the file together with the patch files.
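Generating the cover letter can be sketched in a scratch repository as follows. The example assumes a branch topic with two commits, so the placeholder subject carries the prefix [PATCH 0/2]; all names are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "base" > doc.txt
git add doc.txt
git commit -q -m "Initial commit"

git checkout -q -b topic
for n in 1 2; do
    echo "improvement $n" >> doc.txt
    git commit -q -a -m "Improvement $n"
done

# Write the series plus a cover letter template
git format-patch -q --cover-letter -o "$work/out" master

cover="$work/out/0000-cover-letter.patch"
subject=$(grep '^Subject:' "$cover")
echo "$subject"
```

Edit the two placeholders in the generated file before sending the series.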
Frequently, mailing lists to which patches are sent are used to review the patches in terms of content and syntax and to ask the author for improvements. Once the author has made the improvements, he sends the corrected series back to the list as a reroll. Depending on the size of the patch series and the requirements of the project, a patch series may go through several rerolls until it is accepted. When you send a patch series to a mailing list: Keep the commits on a separate branch, and incorporate the fixes in new commits (for missing functionality) or with interactive rebase (to adjust existing commits).
Then use the |
5.9.2. Sending Patches
Send the generated files with git send-email
(or an email client of your choice).
The command expects as its only mandatory argument either one or more patch files, a directory full of patches, or a selection of commits (in which case Git also calls git format-patch
internally):
$ git send-email 000*
0000-cover-letter.patch
0001-git-svn.txt-fix-usage-of-add-author-from.patch
0002-git-svn.txt-move-option-descriptions.patch
0003-git-svn.txt-small-typeface-improvements.patch
Who should the emails appear to be from? [Valentin Haenel <valentin.haenel@gmx.de>]

$ git send-email master
/tmp/HMSotqIfnB/0001-git-svn.txt-fix-usage-of-add-author-from.patch
/tmp/HMSotqIfnB/0002-git-svn.txt-move-option-descriptions.patch
/tmp/HMSotqIfnB/0003-git-svn.txt-small-typeface-improvements.patch
Who should the emails appear to be from? [Valentin Haenel <valentin.haenel@gmx.de>]
The command git send-email
sets the fields Message-Id
and In-Reply-To
.
This makes all e-mails after the first one look like replies to the first, so most mail programs will display them as a single continuous thread.[79]
You can customize the command with options such as --to, --from and --cc (see the git-send-email(1) man page).
However, if not specified, the essential information is queried interactively — most important is an address to which the patches should be sent.[80]
Before the emails are actually sent, you will see the header again; check that everything is as you want it, and then answer the question Send this email? ([y]es|[n]o|[q]uit|[a]ll): with y for “yes”.
To get familiar with the command, you can first send all emails only to yourself or use the --dry-run
option.
As an alternative to |
If you want to use your preferred Mail User Agent (MUA) (e.g. Thunderbird, Kmail or others) to send patches, there may be a few things to consider. Some MUAs are notorious for mutilating patches so that Git won’t recognize them as such.[83]
5.9.3. Applying Patches
Patch emails exported with git format-patch
are translated back into commits by the git command git am
(apply from mailbox).
A new commit is created from each email, and its meta-information (author, commit message, etc.) is generated from the email header lines (From
, Date
).
As mentioned earlier, Git uses the number in the subject to determine the order in which the commits should be entered.
To complete the earlier example: If the emails are in the Maildir directory patches, the following is sufficient:
$ git am patches
Applying: git-svn.txt: fix usage of --add-author-from
Applying: git-svn.txt: move option descriptions
Applying: git-svn.txt: small typeface improvements
The command understands Maildir and mbox formats as well as files that contain the output of git format-patch:
$ git am 0001-git-svn.txt-fix-usage-of-add-author-from.patch
Applying: git-svn.txt: fix usage of --add-author-from
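The round trip from commits to patch files and back can be tested in a single scratch repository. In this sketch the patches are exported from a branch topic and applied to master with git am; no e-mail is actually sent, and all names are invented.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "base" > doc.txt
git add doc.txt
git commit -q -m "Initial commit"

# Two commits on a topic branch, exported as patch files
git checkout -q -b topic
echo "one" >> doc.txt
git commit -q -a -m "First change"
echo "two" >> doc.txt
git commit -q -a -m "Second change"
git format-patch -q -o "$work/patches" master

# Back on master, turn the patch files into commits again
git checkout -q master
git am -q "$work"/patches/*.patch

commit_count=$(git rev-list --count HEAD)
top_subject=$(git log -1 --format=%s)
echo "$commit_count commits, newest: $top_subject"
```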
When you apply patches from others using git am, the values of Author/AuthorDate and Committer/CommitDate differ.
This way, both the author of the commit and the person who committed it are credited.
In particular, attribution is retained; it remains traceable who wrote which lines of code.
With Gitk, the author and committer values are displayed by default; on the command line, set the --format=fuller
option, which is accepted by git log
and git show
, among others:
$ git show --format=fuller 12d3065
commit 12d30657d411979af3ab9ca7139b5290340e4abb
Author:     Valentin Haenel <valentin.haenel@gmx.de>
AuthorDate: Mon Apr 25 23:36:15 2011 +0200
Commit:     Junio C Hamano <gitster@pobox.com>
CommitDate: Tue Apr 26 11:48:34 2011 -0700

    git-svn.txt: fix usage of --add-author-from
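The split between author and committer can be reproduced with two made-up identities. In this sketch a hypothetical Alice writes the patch and a hypothetical Bob applies it; the identities are passed per command with -c so that no configuration is changed permanently.

```shell
set -e
work=$(mktemp -d)
alice="-c user.name=Alice -c user.email=alice@example.com"
bob="-c user.name=Bob -c user.email=bob@example.com"

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "base" > doc.txt
git add doc.txt
git $alice commit -q -m "Initial commit"

# Alice writes a change and exports it as a patch
git checkout -q -b topic
echo "better" >> doc.txt
git $alice commit -q -a -m "Improve docs"
git format-patch -q -o "$work/patches" master

# Bob applies the patch: Alice stays the author, Bob becomes the committer
git checkout -q master
git $bob am -q "$work"/patches/*.patch

who=$(git log -1 --format='%an|%cn')
echo "author|committer: $who"
git log -1 --format=fuller
```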
With the Dictator and Lieutenants Workflow (Sec. 5.10, “A Distributed, Hierarchical Workflow”), it can happen that more than two people are involved in a commit.
In this case, it makes sense that everyone who reviews the patch also “approves” it, especially the author.
For this purpose, there is a --signoff
option (-s
for short) for the git commit
and git am
commands, which appends the committer’s name and email to the commit message:
Signed-off-by: Valentin Haenel <valentin.haenel@gmx.de>
This feature is especially useful for larger projects, which usually have guidelines on how to format commits and how best to send them.[84]
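What the option adds can be checked in a scratch repository. This sketch uses a made-up identity (Demo) and reads the trailer back from the commit message.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo "hello" > file.txt
git add file.txt

# -s appends a Signed-off-by line with the committer's name and e-mail
git commit -q -s -m "Fix typo"

signoff=$(git log -1 --format=%B | grep '^Signed-off-by:')
echo "$signoff"
```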
Conflicts can occur when patches are applied with git am, e.g. if the patches are based on an older version and the lines concerned have already been changed.
In this case, the process is interrupted and you then have several options for how to proceed.
Either resolve the conflict, update the index and continue the process with git am --continue
, or skip the patch with git am --skip
.
Use git am --abort
to abort the process and restore the current status of the branch.
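The abort path can be rehearsed by provoking a conflict on purpose. In this sketch the same line of a hypothetical file notes.txt is changed both in the patch and on master, so git am stops; git am --abort then restores the branch.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "original line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# A patch that changes the line ...
git checkout -q -b topic
echo "topic edit" > notes.txt
git commit -q -a -m "Edit on topic"
git format-patch -q -o "$work/patches" master

# ... and a competing change of the same line on master
git checkout -q master
echo "master edit" > notes.txt
git commit -q -a -m "Edit on master"

before=$(git rev-parse HEAD)
if git am -q "$work"/patches/*.patch >/dev/null 2>&1; then
    outcome=applied
else
    outcome=conflict
    git am --abort        # give up and restore the branch
fi
after=$(git rev-parse HEAD)
echo "outcome: $outcome"
```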
Because patches usually contain changes made by others, it can sometimes be difficult to find the right solution to a conflict.
The best strategy for patches that cannot be applied is to ask the author of the patches to rebase them to a well-defined base, such as the current master
, and send them again.
An alternative to |
5.10. A Distributed, Hierarchical Workflow
The Integration Manager workflow does not scale with the size of the project. With large growth, at some point the maintainer is overwhelmed by the complexity of the project and the number of incoming patches. The so-called Dictator and Lieutenants workflow, which is used extensively in the development of the Linux kernel, provides a remedy. In this case, the software is usually divided into different subsystems, and contributions are examined by the lieutenants (also subsystem maintainers) and then forwarded to the Benevolent Dictator. The Benevolent Dictator uploads the changes to the blessed repository, which in turn is synchronized with all other participants.
The workflow is based on trust: The dictator trusts his lieutenants and usually accepts the changes they forward without checking them. The advantage is that the dictator is relieved of work but still retains a veto right, which led to the title Benevolent Dictator.
For historical reasons, the official repository is often only the public repository of the current main maintainer or the original author. It is important to note that this repository exists only because of social conventions. Should another developer one day better advance the project, his public repository may become the new Blessed Repository. From a technical point of view, there is no reason not to do so.
The projects that use this workflow in practice prefer to exchange patches by mail. However, the nature of the exchange is secondary, and subsystem maintainers may just as well receive pull requests from developers they know; or they may mix public repositories and patches sent by email at will. Git’s flexibility — especially the variety of different methods for exchanging changes — supports every conceivable workflow in the spirit of free, open development. Certainly a feature that has contributed greatly to Git’s popularity.
5.11. Managing Subprojects
For larger software projects, it is sometimes necessary to outsource certain parts of a program into separate projects. This is the case in the following situations, for example:
-
Your software depends on a specific version of a library that you want to ship with the source code.
-
Your initially small project grows so large over time that you want to move functionality to a library that you want to manage as a separate project.
-
Independent parts of your software are managed by other development groups.
Git offers two ways to do this: You can manage the modules as Git submodules or as subtrees. In either case, you manage source code in a subdirectory of your project.
With submodules, you manage an isolated repository that is independent of your parent repository. If you work with subtrees instead, the project history of the subdirectory becomes inseparable from the parent project. Both have advantages and disadvantages.
We’ll look at both techniques by way of example, creating a fictional project that requires libgit2
.
The library provides, similar to libgit.a
, an API to examine and modify Git repositories.[85]
The library, written in C, can also be used from Lua, Ruby, Python, PHP and JavaScript, among others.
5.11.1. Submodules
Submodules are managed by Git as subdirectories that have a special entry in the .gitmodules
file.
The command git submodule
is responsible for handling them.
First we need to import the library. This is done with the following command:
$ git submodule add git://github.com/libgit2/libgit2.git libgit2
Cloning into libgit2...
remote: Counting objects: 4296, done.
remote: Compressing objects: 100% (1632/1632), done.
remote: Total 4296 (delta 3214), reused 3530 (delta 2603)
Receiving objects: 100% (4296/4296), 1.92 MiB | 788 KiB/s, done.
Resolving deltas: 100% (3214/3214), done.
From the output of git status
we can now see that there is a new directory libgit2
and that the file .gitmodules
with the following content has been created:
[submodule "libgit2"]
    path = libgit2
    url = git://github.com/libgit2/libgit2.git
This file has already been added to the index, prepared for committing.
The libgit2 directory, on the other hand, does not appear in the usual form in the output of git diff --staged:
$ git diff --staged -- libgit2
diff --git a/libgit2 b/libgit2
new file mode 160000
index 0000000..b64e11d
--- /dev/null
+++ b/libgit2
@@ -0,0 +1 @@
+Subproject commit 7c80c19e1dffb4421f91913bc79b9cb7596634a4
Instead of listing all the files in the directory, Git saves a “special” file (recognizable by the unusual file mode 160000
) that simply records the commit the module is currently on.
We commit these changes, and from now on we can compile libgit2
in its subdirectory and then link against it:
$ git commit -m "Import libgit2 submodule"
The parent project and libgit2
are now merged in the working tree, but their version history is and remains separate.
In the Git repository of libgit2
you can work exactly the same way as in a “real” repository.
For example, you can look at the output of git log
in the parent project and after a cd libgit2
in the submodule.
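The import can be tried without network access by using a local repository as the library. This is a sketch under several assumptions: the repositories lib and super and the directory name libdemo are invented, and because recent Git versions refuse local-path submodule clones by default, the file protocol is allowed explicitly for the one command.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A tiny stand-in for the library
git init -q "$work/lib"
cd "$work/lib"
echo "int answer(void) { return 42; }" > lib.c
git add lib.c
git commit -q -m "Library code"

# The parent project
git init -q "$work/super"
cd "$work/super"
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Initial commit"

# Import the library as a submodule in the directory libdemo
git -c protocol.file.allow=always submodule --quiet add "$work/lib" libdemo
git commit -q -m "Import libdemo submodule"

# The parent stores only a gitlink entry (mode 160000) for the directory
mode=$(git ls-files -s libdemo | cut -d' ' -f1)
recorded=$(git ls-files -s libdemo | cut -d' ' -f2)
lib_head=$(git -C libdemo rev-parse HEAD)
echo "mode $mode, recorded commit $recorded"
cat .gitmodules
```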
5.11.1.1. Changes in Submodules
Now libgit2
has selected the branch development as default branch (i.e. the HEAD
on the server side).
It may not be the best idea to more or less wire this development branch to your repository.
So we change to the libgit2
directory and check out the latest tag, v0.10.0
:
$ cd libgit2
$ git checkout v0.10.0
# message about "detached HEAD state"
$ cd ..
$ git diff
diff --git a/libgit2 b/libgit2
index 7c80c19..7064938 160000
--- a/libgit2
+++ b/libgit2
@@ -1 +1 @@
-Subproject commit 7c80c19e1dffb4421f91913bc79b9cb7596634a4
+Subproject commit 7064938bd5e7ef47bfd79a685a62c1e2649e2ce7
So the parent Git repository sees a change of HEAD
, which was done by the git checkout v0.10.0
command in libgit2/
, as a change to the pseudo-file libgit2
, which now points to the corresponding new commit.
Now we can add this change to the index and save it as a commit:
$ git add libgit2
$ git commit -m "Set libgit2 version to v0.10.0"
Attention: Never add files from libgit2
or the directory libgit2/
(ends with slash) — this breaks the modular concept of Git, you will suddenly manage files from the submodules in the parent project.
Similarly, you can use submodule update
(or git remote update
in the libgit2/
directory) to download new commits and record a library update in the parent repository accordingly.
5.11.1.2. From a User Perspective
So what does it all look like from the perspective of a user cloning the project for the first time? First, it’s obvious that the submodule(s) are not hard-coded into the repository and are not shipped with it:
$ git clone /dev/shm/super clone-super
$ cd clone-super
$ ls
bar.c  foo.c  libgit2/
$ ls -l libgit2
total 0
The directory libgit2/
is empty.
So everything Git knows about the submodules is in the .gitmodules
file.
You need to initialize this module first and then download the module’s repository:
$ git submodule init
Submodule 'libgit2' (git://github.com/libgit2/libgit2.git) registered for path 'libgit2'
$ git submodule update
...
Submodule path 'libgit2': checked out '7064938bd5e7ef47bfd79a685a62c1e2649e2ce7'
So we see that libgit2
is automatically set to the v0.10.0
version defined in our repository.
But in principle the user can now also change to the directory, check out the branch development
and compile the project against this version.
Submodules retain the flexibility of the sub-repository, so the recorded state of the module is only a “recommendation”.
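The user's side of the story can be played through locally as well. The sketch below first builds a parent project with a submodule (the names lib, super, clone and libdemo are hypothetical), then clones it and populates the initially empty submodule directory. As an assumption about recent Git versions, the file protocol is allowed explicitly because the submodule URL is a local path.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Setup: a library and a parent project that uses it as a submodule
git init -q "$work/lib"
cd "$work/lib"
echo "int answer(void) { return 42; }" > lib.c
git add lib.c
git commit -q -m "Library code"

git init -q "$work/super"
cd "$work/super"
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" libdemo
git commit -q -m "Import libdemo submodule"

# The user's view: after cloning, the submodule directory is empty
git clone -q "$work/super" "$work/clone"
cd "$work/clone"
files_before=$(ls -A libdemo | wc -l | tr -d ' ')

# Register the submodule, then fetch and check out the recorded commit
git submodule --quiet init
git -c protocol.file.allow=always submodule --quiet update

recorded=$(git ls-files -s libdemo | cut -d' ' -f2)
checked_out=$(git -C libdemo rev-parse HEAD)
echo "files before: $files_before, now at $checked_out"
```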
5.11.2. Subtrees
Unlike submodules, which maintain their character as a standalone Git repository, when you work with Subtrees, you directly merge the history of two projects. A comparison of the two approaches follows.
Essentially, this technique is based on so-called subtree merges, which were briefly discussed in Sec. 3.3.3, “Merge Strategies”.
In our example, a subtree-merge is done by merging regular commits from the libgit2
repository under the libgit2/
tree (directory) — a top-level file in the library repository thus becomes a top-level file in the libgit2/
tree, which in turn is part of the repository.
Git has a command to manage subtree-merges.[86]
You must always explicitly specify which subdirectory you are referring to by using -P <prefix>
.
To import the libgit2
in version 0.8.0, use:
$ git subtree add -P libgit2 \
    git://github.com/libgit2/libgit2.git v0.8.0
git fetch git://github.com/libgit2/libgit2.git v0.8.0
From git://github.com/libgit2/libgit2
 * tag               v0.8.0     -> FETCH_HEAD
Added dir 'libgit2'
The command automatically downloads all required commits and creates a merge commit that places all the files of libgit2 under the directory libgit2/. The merge commit now links the previous version history to that of libgit2 (by referencing a libgit2 commit as a parent, which in turn references its own ancestors). The result of this procedure is that your repository now contains all relevant commits from libgit2.
Your repository now has two root commits (see also multi-root repositories in Sec. 4.7, “Multiple Root Commits”). The files are now inseparably linked to the project. A git clone of this repository would also transfer all files under libgit2.[87]
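The effect of git subtree add on the history can be checked with a small local experiment. This is a sketch under stated assumptions: lib and project are hypothetical throwaway repositories replacing the real libgit2 URL, the tag v0.8.0 is created locally for illustration, and git subtree must be installed (it is not part of every Git installation).

```shell
#!/bin/sh
# Sketch: import a local stand-in library with git subtree add and
# count the root commits afterwards. All repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the library, with a top-level file and a lightweight tag.
git init -q lib
echo 'license text' > lib/COPYING
git -C lib add COPYING
git -C lib commit -q -m 'library: add COPYING'
git -C lib tag v0.8.0

# The project that will absorb the library's history.
git init -q project
echo 'int main(void) { return 0; }' > project/foo.c
git -C project add foo.c
git -C project commit -q -m 'project: add foo.c'

# Fetch the tagged commit and merge it in under the prefix libgit2/.
git -C project subtree add -P libgit2 "$work/lib" v0.8.0

# Commits without parents: one from the project, one from the library.
roots=$(git -C project rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')
echo "root commits: $roots"
ls project/libgit2
```

The library's top-level COPYING ends up as libgit2/COPYING in the project, and the root-commit count reflects that both histories are now reachable from HEAD.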
Now what happens when you want to “upgrade” to v0.10.0? Use the pull command from git subtree for this:
$ git subtree -P libgit2 \
    pull git://github.com/libgit2/libgit2.git v0.10.0
From git://github.com/libgit2/libgit2
* tag v0.10.0 -> FETCH_HEAD
Merge made by the 'recursive' strategy.
...
Note: Since the original libgit2 commits are present, these commits also appear to change top-level files (e.g., COPYING) when you use git log --name-status to examine the version history. In fact, these changes are made under libgit2/; this is the job of the merge commit, which aligns the trees accordingly.
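The behaviour described in the note can be observed in a local experiment. The sketch below again uses hypothetical throwaway repositories (lib and project) and locally created tags in place of the real libgit2 releases, and it assumes git subtree is installed. GIT_MERGE_AUTOEDIT=no only keeps the merge from opening an editor.

```shell
#!/bin/sh
# Sketch: upgrade a subtree with git subtree pull and inspect the paths
# that the imported commits report. All repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
export GIT_MERGE_AUTOEDIT=no

work=$(mktemp -d)
cd "$work"

# Stand-in library with an "old" release.
git init -q lib
echo 'license text' > lib/COPYING
git -C lib add COPYING
git -C lib commit -q -m 'library: add COPYING'
git -C lib tag v0.8.0

# Project that imports the old release as a subtree.
git init -q project
echo 'int main(void) { return 0; }' > project/foo.c
git -C project add foo.c
git -C project commit -q -m 'project: add foo.c'
git -C project subtree add -P libgit2 "$work/lib" v0.8.0

# The library moves on and publishes a "new" release.
echo 'license text, revised' > lib/COPYING
git -C lib commit -q -am 'library: revise COPYING'
git -C lib tag v0.10.0

# Upgrade: fetch the new tag and merge it under libgit2/.
git -C project subtree pull -P libgit2 "$work/lib" v0.10.0

# The library's own commits list COPYING as a top-level path ...
names=$(git -C project log --format= --name-status)
printf '%s\n' "$names"

# ... but in the working tree the change landed under libgit2/.
cat project/libgit2/COPYING
```

In the log output the library commits report COPYING without a prefix, while the project's working tree contains only libgit2/COPYING.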
If you’re not interested in the version history of a subproject, but want to anchor a particular state in the repository, you can use the |
5.11.2.1. Splitting off a Subdirectory
At some point, you may be faced with the task of managing a subdirectory of your project as a separate repository. However, you may still want to integrate the changes into the original project.
For example, the documentation stored under doc/ will be managed in a separate repository from now on. Occasionally, that is, every few weeks, you want to transfer the latest developments to the master repository. The git subtree command provides a separate subcommand, split, for this purpose, which you can use to automate this step. It creates a version history containing all changes to a directory and prints the ID of the latest commit, which you can then upload to an (empty) remote.
$ git subtree split -P doc --rejoin
Merge made by the 'ours' strategy.
563c68aa14375f887d104d63bf817f1357482576
$ git push <new-docs-repo> 563c68aa14375:refs/heads/master
The --rejoin option causes the version history split off in this way to be directly reintegrated into the current project via git subtree merge. From now on you can integrate the new commits via git subtree pull. If you want to use the --squash option instead, omit --rejoin.
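The split step can be sketched with local repositories as well. The names project and docs.git are hypothetical, docs.git is an empty bare repository standing in for the new documentation remote, and git subtree is assumed to be installed. The sketch leaves out --rejoin so that the command prints only the commit ID, which makes it easy to capture in a variable.

```shell
#!/bin/sh
# Sketch: split the history of doc/ into its own line of commits and push
# it to an empty repository. All repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# A project with code at the top level and documentation under doc/.
git init -q project
mkdir project/doc
echo 'int main(void) { return 0; }' > project/foo.c
echo 'Manual' > project/doc/manual.txt
git -C project add .
git -C project commit -q -m 'add code and documentation'
echo 'More text' >> project/doc/manual.txt
git -C project commit -q -am 'doc: extend the manual'

# Empty bare repository standing in for the new documentation remote.
git init -q --bare docs.git

# split prints the ID of the newest commit of the extracted history.
split=$(git -C project subtree split -P doc)
git -C project push -q "$work/docs.git" "$split:refs/heads/master"

# In the new repository, the files from doc/ sit at the top level.
files=$(git --git-dir="$work/docs.git" ls-tree --name-only master)
count=$(git --git-dir="$work/docs.git" rev-list --count master)
echo "files:   $files"
echo "commits: $count"
```

Only commits that touched doc/ are carried over, and their paths are rewritten so that doc/manual.txt becomes manual.txt in the new repository.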
5.11.3. Submodules vs. Subtrees
The question “Submodules or Subtrees?” cannot be answered in general, but only on a case-by-case basis. The decisive criterion should be how closely the subproject belongs to the superordinate one: if you include third-party software, submodules are probably the better fit; for your own code with a limited number of commits and a direct relationship to the main project, a subtree is the more likely choice.
For example, when you install CGit (see Sec. 7.5, “CGit — CGI for Git”), you must initialize and update a submodule to compile libgit.a. So CGit needs the source code of Git, but doesn’t want to merge its development history with that of Git (the comparatively few CGit commits would get lost among them!). You can, however, compile CGit against another version of Git if you wish; the flexibility of the sub-repository is preserved.
The graphical repository browser Gitk, on the other hand, is managed as a subtree. It is developed in git://ozlabs.org/~paulus/gitk, but is included in the main Git repository with the subtree-merge strategy under gitk-git/.