Git Book — 4. Advanced Concepts

4. Advanced Concepts

The following chapter covers selected advanced concepts. The focus is on the Rebase command with its many applications. We find out who changed a line in the source code (Blame) and when, and how to tell Git to ignore files and directories. We’ll also look at how to stash changes to the working tree and annotate commits (Notes). Finally, we show you how to quickly and automatically find commits that introduce a bug (Bisect).

4.1. Moving commits — Rebase

In the section on Git’s internals, we mentioned earlier that you can move and modify commits in a Git repository (graphically speaking) at will. In practice, this is made possible primarily by the git command rebase. This command is very powerful and important, but sometimes a bit more demanding to use.

Rebase is a coined word that means “to put something on a new base”. A group of commits is moved within the commit graph by rebuilding them, commit after commit, on top of another node. The following graphics illustrate how this works:

Figure 22. Before the rebase

Figure 23. …and after that

In its simplest form the command is git rebase <reference> (in the above diagram: git rebase master). This means that Git first marks all commits <reference>..HEAD, i.e. the commits that can be reached from HEAD (the current branch) minus the commits that can be reached from <reference> - in other words, everything that is in the current branch but not in <reference>. In the diagram, these are E and F.

The list of these commits is stored temporarily. Git then checks out the commit <reference> and copies the individual cached commits in the original order as new commits to the branch.

There are a few points to consider:

Because the first node of the topic branch (E) now has a new predecessor (D), its metadata and thus its SHA-1 sum changes (it becomes E_). The second commit (F) then also has a different predecessor (E_ instead of E), its SHA-1 sum changes (it becomes F_) and so on - this is also called the ripple effect. Overall, all copied commits will have new SHA-1 sums - so they’re the same (in terms of changes), but not identical.

Such an action, just like a merge operation, can result in conflicting changes. Git can partially resolve them automatically, but aborts with an error message if the conflicts are not trivial. The rebase process can then either be “repaired” and continued, or aborted (see below).

If no other reference points to node F, it will be lost, because the reference HEAD (and the corresponding branch, if applicable) is moved to node F_ after a successful rebase. So if nothing references F any more (neither a branch or tag nor a successor commit), Git can no longer find the node, and the old commits “disappear”. If you’re not sure whether you will need the original commits again, you can simply reference them, for example with the tag command. In that case, the commits are preserved even after a rebase (but then exist in duplicate at different places in the commit graph).
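The points above can be tried out in a throwaway repository. The following is only a sketch: the temporary directory, the file names, the commit messages A, D, E, F and the tag name topic-old are made up for illustration.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit() {  # commit <file> <message>: write the message into the file and commit it
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit a.txt "A"
git checkout -q -b topic
commit e.txt "E"
commit f.txt "F"
git checkout -q master
commit d.txt "D"

git checkout -q topic
git tag topic-old                 # extra reference, so the original E and F stay reachable
old_tip=$(git rev-parse topic)
git rebase -q master
new_tip=$(git rev-parse topic)

# The copies carry the same changes but have different SHA-1 sums.
old_patch=$(git diff topic-old^ topic-old)
new_patch=$(git diff topic^ topic)
echo "F  (original): $old_tip"
echo "F_ (copy):     $new_tip"
git log --oneline --graph topic topic-old
```

The final git log call shows both the rebuilt branch and the original commits, which are kept alive only by the tag.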

4.1.1. An Example

Consider the following situation: The sqlite-support branch branches off from the “fixed a bug…” commit. But the master branch has already moved on, and a new 1.4.2 release has been made.

Figure 24. Before the rebase

Now sqlite-support is checked out and rebased onto master:

$ git checkout sqlite-support
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: include sqlite header files, prototypes
Applying: generalize queries
Applying: modify Makefile to support sqlite

Rebase applies the three changes introduced by commits from the sqlite-support branch to the master branch. After that, the repository looks like this in Gitk:

Figure 25. After rebase

4.1.2. Extended Syntax and Conflicts

Normally, git rebase rebuilds the branch you are currently working on onto a new base. However, there is a shortcut: If you want to rebase topic onto master, but you are on a completely different branch, you can do this via

$ git rebase master topic

Git does the following internally:

$ git checkout topic
$ git rebase master

Please note the (unfortunately not very intuitive) order:

git rebase <on which> <what>

A rebase can lead to conflicts. The process then stops with the following error message:

$ git rebase master
...
CONFLICT (content): Merge conflict in <file>
Failed to merge in the changes.
Patch failed at ...
The copy of the patch that failed is found in:
   .../.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

You proceed as with a regular merge conflict (see Sec. 3.4, “Resolving Merge Conflicts”) - git mergetool is very helpful here. Then simply add the changed file via git add and let the process continue via git rebase --continue.⁠^[54]

Alternatively, the problematic commit can be skipped using the git rebase --skip command. The commit is then lost unless it is referenced in another branch somewhere else! So you should only perform this action if you are certain that the commit is obsolete.

If none of this helps (e.g. if you can’t solve the conflict at that point, or if you realize that you are rebuilding the wrong tree), pull the emergency brake: git rebase --abort. This will discard all changes to the repository (including successfully copied commits), so that the state afterwards is exactly the same as it was when the rebase process was started. The command also helps if at some point you forget to finish a rebase process, and other commands complain that they can’t do their job because a rebase is in progress.
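The emergency brake can be rehearsed safely in a scratch repository. The sketch below provokes a conflict on purpose and then aborts; the repository location, file.txt and the commit messages are hypothetical.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b topic
echo "topic version" > file.txt
git commit -q -am "change on topic"

git checkout -q master
echo "master version" > file.txt
git commit -q -am "change on master"

git checkout -q topic
before=$(git rev-parse HEAD)
if git rebase master >/dev/null 2>&1; then
  echo "rebase went through without conflicts"
else
  git status --short      # "UU file.txt" marks the conflicted file
  git rebase --abort      # back to the state before the rebase started
fi
after=$(git rev-parse HEAD)
echo "HEAD before: $before"
echo "HEAD after:  $after"
```

Instead of aborting, you could edit file.txt, run git add file.txt and then git rebase --continue, as described above.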

4.1.3. Why Rebasing Makes Sense

Rebase is primarily useful for keeping the commit history of a project simple and easy to understand. For example, a developer might be working on a feature, but then have something else to do for a few weeks. Meanwhile, however, development on the project has progressed, there’s been a new release, etc. Only now does the developer get to finish a feature. (Even if you want to send patches via email, rebase helps to avoid conflicts, see Sec. 5.9, “Patches via E-mail”.)

For the version history it is now much more logical if his feature was not “dragged along” unfinished for a long period of time alongside the actual development, but if the development branches off from the last stable release.

Rebase is good for exactly this change in history: The developer can now simply enter the command git rebase v1.4.2 on the branch where he developed the feature, to rebuild his feature branch on the commit with the release tag v1.4.2. This makes it much easier to see what differences the feature really brings to the software.

It also happens to every developer in the heat of the moment that commits end up in the wrong branch. There is a bug that happens to be there, which is quickly fixed by a commit; but then a test must be written directly to avoid this bug in the future (another commit), and this must be noted in the documentation. After the actual work is done, you can use Rebase to “transplant” those commits to another location in the commit graph.

Rebase can also be useful if a branch requires a feature that has only recently been incorporated into the software. A merge of the master branch does not make sense semantically, because then these and other changes are inseparably merged with the feature branch. Instead, you rebase the branch on a new commit that already contains the required feature, and then use that in further development.

4.1.4. When Rebasing Is Not Useful — Rebase vs. Merge

The concept of rebase is initially a little difficult to understand. But once you have understood what is possible with it, the question arises: What is the point of a simple merge if you can edit everything with rebase?

When git-rebase is not used, or hardly used at all, a project history often develops that becomes relatively unmanageable, because merges have to be performed constantly and for a few commits at a time.

If, on the other hand, rebase is used too much, there is a danger that the entire project will be senselessly linearized: Git’s flexible branching is used for development, but the branches are then integrated into the release branch one after the other (!) via rebase, like the teeth of a zipper. This presents us with two main problems:

Logically related commits are no longer recognizable as such. Since all commits are linear, the development of multiple features is inextricably intertwined.

The integration of a branch can no longer be easily undone, because identifying those commits that once belonged to a feature branch is only possible manually.

This squanders the advantages of Git’s flexible branching. The conclusion is that rebase should be used neither too much nor too little. Both make the project history confusing (in different ways).

In general, you are doing well with the following rules of thumb:

A feature is integrated by merge when it is finished. It is best to avoid creating a fast forward merge so that the merge commit is preserved as the time of integration.

While you are developing, you should use rebase frequently (especially interactive rebase, see below).

Logically separate units should be developed on separate branches - logically related ones possibly on several, which are then merged by rebase (if that makes sense). The merging of logically separate units is then done by merge.

4.1.5. A Word of Warning

As mentioned earlier, a rebase inevitably changes the SHA-1 sums of all commits that are “rebuilt”. If these commits have not yet been published, that is, if a developer only has them in a private repository, this is not a problem.

But if a branch (e.g. master) is published⁠^[55] and later rewritten via rebase, this has unpleasant consequences for everyone involved: All branches based on master now reference the old copy of the rewritten master branch. So each branch must be rebased onto the new master (which in turn changes all commit IDs). This effect continues, and can be very time-consuming to fix (depending on when such a rebase happens, and how many developers are involved in the project), especially if you’re new to Git.

Therefore you should always remember the following rule:

Only edit unpublished commits with the rebase command!

Exceptions are conventions like personal branches or pu. The latter is an abbreviation for Proposed Updates and is usually a branch where new, experimental features are tested for compatibility. No one builds their own work on this branch, so it can be rewritten without problems and prior notice.

Another possibility is offered by private branches, i.e. those that start with <user>/ for example. If you make an agreement that developers will do their own development on these branches, but always base their features on “official” branches, then the developers may rewrite their branches as they wish.

4.1.6. Avoiding Code Duplication

If a feature is being developed over a long period of time, and parts of the feature are already flowing into a mainstream release (e.g. via cherry-pick), the rebase command will detect these commits and omit them when copying or rebuilding the commits, because the change is already contained in the branch.

For example, after a rebase, the new branch consists only of the commits that have not yet been incorporated into the base branch. This way, commits do not appear twice in the version history of a project. If the branch had simply been merged, the same commits with different SHA-1 sums would sometimes be present in different places in the commit graph.
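This behavior can be observed in a small experiment. The sketch below is only an illustration: the branch name feature, the file names and the commit messages are invented. One of two feature commits is cherry-picked into master before the feature branch is rebased.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit() {  # commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "base"
git checkout -q -b feature
commit part1.txt "feature part 1"
commit part2.txt "feature part 2"

git checkout -q master
commit release.txt "other work on master"
git cherry-pick feature~1 >/dev/null      # part 1 already flows into master

git checkout -q feature
before=$(git rev-list --count master..feature)
git rebase master >/dev/null 2>&1
after=$(git rev-list --count master..feature)

echo "commits only on feature before the rebase: $before"
echo "commits only on feature after the rebase:  $after"
git log --oneline master..feature
```

After the rebase, only the commit that master does not yet contain remains on the feature branch.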

4.1.7. Managing Patch Stacks

There are situations where there is a vanilla version (“simplest version”) of a piece of software and also a certain number of patches applied to it before the vanilla version is shipped. For example, your company builds software, but before each delivery to the customer, some adjustments have to be made (depending on the customer). Or you have open source software in use, but have adapted it a bit to your needs - every time a new, official version of the software is released, you have to reapply your changes and then rebuild the software.⁠^[56]

To manage patch stacks, there are some programs that build on top of Git and spare you from working directly with the rebase command. For example, TopGit⁠^[57] allows you to define dependencies between branches: if something changes in a branch and other branches depend on it, TopGit will rebuild them on demand. An alternative to TopGit is Stacked Git⁠^[58].

4.1.8. Restricting Rebase via --onto

Now, you may have wondered: git rebase <reference> always copies all commits that are between <reference> and HEAD. But what if you only want to move part of a branch, to “transplant” it, so to speak? Consider the following situation:

Figure 26. Before the rebase --onto

You were developing a feature on the branch topic when you noticed a bug; you created a branch bugfix and found another bug. Semantically speaking, your branch bugfix has nothing to do with the topic branch. Therefore, it makes sense to branch off from the master branch.

But if you now rebuild the branch bugfix using git rebase master, the following happens: All nodes that are in bugfix but not in master are copied to the master branch in order - that is, nodes D, E, F, and G. However, D and E are not part of the bugfix at all.

This is where the --onto option comes into play: It allows you to specify a start and end point for the list of commits to be copied. The general syntax is

git rebase --onto <on which> <start> <end>

In this example, we only want to rebuild the commits F and G (in other words: the commits from topic to bugfix) on top of master. Therefore the command is

$ git rebase --onto master topic bugfix

The result looks as expected:

Figure 27. After the rebase --onto
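The scenario from the figures can be reproduced roughly as follows. This is a sketch under assumptions: the commit letters are used as commit messages, and the file names are made up.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit() {  # commit <message>: one new file per commit
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
commit B
git checkout -q -b topic
commit D
commit E
git checkout -q -b bugfix         # bugfix branches off from topic
commit F
commit G

# Copy only the commits topic..bugfix (F and G) onto master.
git rebase --onto master topic bugfix >/dev/null 2>&1

moved=$(git log --reverse --format=%s master..bugfix | paste -sd ' ' -)
untouched=$(git log --reverse --format=%s master..topic | paste -sd ' ' -)
echo "bugfix now consists of: $moved"
echo "topic still consists of: $untouched"
```

D and E stay on topic only; bugfix no longer contains them.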

4.1.9. Improving a Commit

You have learned about the commit --amend command in Sec. 2.1, “Git Commands”, which you can use to improve a commit. However, it only refers to the current (last) commit. With rebase --onto you can also adjust commits that are further back in the past.

First, find the commit you want to edit and create a branch to it:

$ git checkout -b fix-master 21d8691

Then you make your changes, add changed files with git add, and then correct the commit with git commit --amend --no-edit (the --no-edit option takes meta-information like the description of the old commit and does not offer it again for editing).

Now apply all the commits from the master branch from above to your corrected commit:

$ git rebase --onto fix-master 21d8691 master

This will copy all commits from 21d8691 (exclusive!) to master (inclusive!). The faulty commit 21d8691 is no longer referenced, and therefore no longer appears. The fix-master branch is now obsolete and can be deleted.

An equivalent way to edit a commit is with the edit action in the interactive rebase (see Sec. 4.2.2, “Editing Commits Arbitrarily”).
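The whole procedure from this section, as a sketch in a scratch repository. The typo in greeting.txt, the file names and the commit messages are invented; the variable bad plays the role of 21d8691.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit() {  # commit <file> <content> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit greeting.txt "helo"  "add greeting"      # the faulty commit
commit more.txt     "more"  "later work"
commit again.txt    "again" "even later work"
bad=$(git rev-parse master~2)

git checkout -q -b fix-master "$bad"
echo "hello" > greeting.txt                     # the correction
git add greeting.txt
git commit -q --amend --no-edit

# Copy everything after the faulty commit onto the corrected one.
git rebase --onto fix-master "$bad" master >/dev/null 2>&1
git branch -q -d fix-master                     # no longer needed

git log --oneline master
```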

4.1.10. Fine Adjustment of Rebase

There are situations where you may need to adjust the default git rebase behavior. First, this is the case when you use rebase to edit a branch that contains merges. rebase can try to recreate these instead of linearizing the commits. The -p or --preserve-merges option is responsible for this (newer Git versions provide --rebase-merges for this purpose instead).⁠^[59]

With the -m or --merge option, you can tell git rebase to use merge strategies (see also Sec. 3.3.3, “Merge Strategies”). When using these strategies, keep in mind that rebase internally applies commit after commit to the new base via cherry-pick; therefore the roles of ours and theirs are reversed: theirs refers to the branch you are rebuilding on a new base!

An interesting use case is therefore the strategy option theirs for the merge strategy recursive: If conflicts occur, priority is given to changes from the commit being copied. So such a scenario is useful if you know that there are conflicting changes, but are certain that the changes from the branch you are building are more correct than those from the tree you are building on. If you rebuild topic to master, such a call would look like this:

$ git checkout topic
$ git rebase -m -Xtheirs master

In cases where the recursive (default) strategy gives preference to changes from commits from topic, you will find a corresponding note Auto-merging <commit description>.
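A minimal sketch of the reversed roles, assuming a scratch repository in which master and topic change the same line of a made-up file.txt:

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b topic
echo "topic version" > file.txt
git commit -q -am "change on topic"

git checkout -q master
echo "master version" > file.txt
git commit -q -am "change on master"

git checkout -q topic
# Conflicting hunks are resolved in favor of the commits being copied (topic).
git rebase -m -Xtheirs master >/dev/null 2>&1

result=$(cat file.txt)
echo "file.txt after the rebase: $result"
```

Without -Xtheirs, the same rebase would stop with a conflict in file.txt.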

A small, very useful option that rebase passes directly to git apply is --whitespace=fix. It causes Git to automatically correct whitespace errors (such as trailing spaces). If you have merge conflicts due to whitespace (for example, due to changed indentation), you can also use the strategy options presented in Sec. 3.3.4, “Options for the Recursive Strategy” to have solutions generated automatically (for example, by specifying -Xignore-space-change).

4.2. Rewriting History — Interactive Rebase

Rebase has an interactive mode; it is technically implemented in the same way as the normal mode, but the typical use case is quite different, because interactive rebase allows you to rewrite history, i.e. to edit commits at will (and not just move them).

In the interactive rebase you can

change the order of commits

delete commits

merge commits

split a commit into several ones

adjust the description of commits

edit commits in any other way you can think of

You activate the mode with the option -i or --interactive. Basically, the rebase process will run exactly as before, but you will get a list of commits that rebase will rewrite before the command starts. This could look like this, for example:

pick e6ec2b6 Fix expected values of setup tests on Windows
pick 95b104c t/README: hint about using $(pwd) rather than $PWD in tests
pick 91c031d tests: cosmetic improvements to the repo-setup test
pick 786dabe tests: compress the setup tests
pick 4868b2e Subject: setup: officially support --work-tree without --git-dir

Below this list is a help text that describes what you can do with the listed commits. Essentially, there are six possible actions for each commit. You simply write the action at the beginning of the line, before the SHA-1 sum, in place of the default pick action. The actions are listed below; you can also abbreviate each one by its initial letter, e.g. s for squash (exec is abbreviated x, since e already stands for edit).

pick: “Use commit” (default). Corresponds to the handling of commits in the non-interactive rebase.

-: If you delete a line, the commit is not used (will be lost).

reword: Adjust the commit description.

squash: merge commit with the previous one; editor is opened to merge the descriptions

fixup: Like squash, but throws away the description of the commit.

edit: Free editing. You can perform arbitrary actions.

exec: The rest of the line is executed as a command in the shell. If the command does not end successfully (i.e. with a return value other than 0), the rebase stops.

The pick action is the simplest — it simply says that you want to use the commit, rebase should take that commit as it is. The opposite of pick is simply deleting an entire line. The commit is then lost (like git rebase --skip).

If you switch the order of the lines, Git will apply the commits in the newly defined order. In the beginning, the lines are in the order in which they will be applied later — that is, the exact opposite of the order in the tree view! Note that commits often build on top of each other; therefore, swapping commits will often cause conflicts if the commits make changes on the same files and in the same places.

The reword command is handy if you have typos in a commit message and want to correct them (or haven’t written a detailed one yet and want to do so now). The rebase process stops at the commit marked reword, and Git starts an editor that already displays the commit message. Once you exit the editor (don’t forget to save!), Git will enter the new description and let the rebase process continue.
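The exec action is easiest to understand in a scripted run. The following sketch uses the --exec option of git rebase, which adds an exec line after every commit in the list, and sets GIT_SEQUENCE_EDITOR to true so that the list is accepted without opening an editor. The marker file BROKEN and the check test ! -e BROKEN are stand-ins for a real test suite; all names are hypothetical.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit() {  # commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "base"
commit feature.txt "add feature"
touch BROKEN
git add BROKEN
git commit -q -m "break build"     # stand-in for a commit that fails the tests
git rm -q BROKEN
git commit -q -m "fix build"

stopped_at=""
# The check runs after every commit; a non-zero exit status stops the rebase.
if ! GIT_SEQUENCE_EDITOR=true git rebase -i --exec 'test ! -e BROKEN' HEAD~3 >/dev/null 2>&1; then
  stopped_at=$(git log -1 --format=%s)   # commit after which the check failed
  git rebase --abort
fi
echo "check failed after commit: $stopped_at"
```

In real use you would repair the commit at this point and run git rebase --continue instead of aborting.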

4.2.1. Correcting Small Errors: Bug Squashing

The squash and fixup commands allow two or more commits to be merged together.

Nobody always writes error-free code immediately. Often there is a big commit in which you have implemented a new feature; shortly after that, small bugs are found. What to do? A detailed description of why you forgot to add or remove a line out of carelessness? Not really useful, and especially annoying for other developers who want to review your code later. It would be nice to maintain the illusion that the commit was bug-free the first time…

For every bug you find, make a small commit with a more or less meaningful description. This could look like this, for example:

$ git log --oneline master..feature
b5ffeb7 fix feature 1
34c4453 fix feature 2
ac445c6 fix feature 1
ae65efd implement feature 2
cf30f4d implement feature 1

When some such commits have accumulated, start an interactive rebase process over the last commits. Simply estimate how many commits you want to work on, and then edit the last five using git rebase -i HEAD~5, for example.

In the editor the commits now appear in reverse order compared to the output of git log. Now arrange the small bugfix commits so that they are below the commit you are fixing. Then mark the fix commits with squash (or s), like this:

pick cf30f4d implement feature 1
s ac445c6 fix feature 1
s b5ffeb7 fix feature 1
pick ae65efd implement feature 2
s 34c4453 fix feature 2

Save the file and close the editor; the rebase process starts. Because you selected squash, rebase stops after commits are merged. The editor will display the commit messages of the merged commits, which you now summarize appropriately. If you use the keyword fixup, or f for short, instead of squash, the commit message of the commits marked in this way will be thrown away—probably more convenient for this way of working.

After the rebase the version history looks much tidier:

$ git log --oneline master..feature
97fe253 implement feature 2
6329a8a implement feature 1

It often happens that you want to fold a small change into the last commit you made. Here the following alias is useful, which is similar to the fixup action:

$ git config --global alias.fixup "commit --amend --no-edit"

As mentioned above, the --no-edit option inherits one-to-one the meta-information of the old commit, especially the commit message.

Start the commit message with fixup! or squash!, followed by the beginning of the description of the commit you want to fix, and then execute the command

$ git rebase -i --autosquash master

The commits marked with fixup! or squash! as above are automatically moved to the correct position and given the action squash or fixup. This allows you to exit the editor directly, and the commits are merged. If you frequently work with this option, you can also make this behavior the default for rebase calls by setting a configuration option: To do this, set the rebase.autosquash setting to true.
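A sketch of this workflow without any typing in the editor: git commit --fixup=<commit> writes the fixup! message for you, and GIT_SEQUENCE_EDITOR=true accepts the rearranged list as it is. Branch, file names and commit messages are made up.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit() {  # commit <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "base"
git checkout -q -b feature
commit f1.txt "implement feature 1"
commit f2.txt "implement feature 2"

# A late fix for feature 1; the message becomes "fixup! implement feature 1".
echo "fixed" >> f1.txt
git add f1.txt
git commit -q --fixup=HEAD~1

GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash master >/dev/null 2>&1

subjects=$(git log --reverse --format=%s master..feature | paste -sd '|' -)
echo "$subjects"
```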

4.2.2. Editing Commits Arbitrarily

If you mark a commit with edit, it can be edited as you wish. rebase will go through the commits sequentially, as in the other cases. For the commits marked edit, rebase stops and HEAD is set to the corresponding commit. You can then modify the commit as if it were the most recent in the branch. Afterwards, you let rebase continue running:

$ vim ...
# make corrections
$ git add ...
$ git commit --amend
$ git rebase --continue

4.2.2.1. Splitting Commits

Every programmer knows this: Checking in every change in a disciplined and meticulous way is exhausting and often interrupts the workflow. In practice, this leads to commits that are large and confusing. But for the version history to remain useful to other developers, and to yourself, the changes should be split into logical units that are as small as possible.

By the way, this approach is helpful not only for developers. Automated debugging with git bisect also works better and more accurately the smaller and more meaningful the commits are (see Sec. 4.8 on finding regressions with git bisect).

With a little experience, you can split a commit very quickly. If you frequently produce large commits, the following step should become routine.

First you start the rebase process and mark the commit you want to split with edit. rebase stops there, HEAD points to that commit.

You then reset HEAD by one commit, but without discarding the changes from HEAD (the commit to be split). This is done with the reset command (see also Sec. 3.2.3, “Reset and the Index”; note that if you still need the commit description, you should copy it first):

$ git reset HEAD^

The changes caused by the commit being split are still present in the files, but the index and repository reflect the state of the previous commit. So you have moved the changes from the commit to be split to the unstaged state (you can verify this by looking at git diff before and after the reset call).

Now you can add some lines, create a commit, add more lines, and finally create a third commit for the remaining lines:

$ git add -p
$ git commit -m "First part"
$ git add -p
$ git commit -m "Second part"
$ git add -u
$ git commit -m "Third (and last) part"

What happens? You have reset the HEAD by using the reset command. With each call to git commit you create a new commit, based on the respective HEAD. Instead of the one big commit (which you threw away with the reset call) you have now put three smaller commits in its place.

Now let rebase continue (git rebase --continue) and rebuild the remaining commits on top of HEAD (which is now the latest of your three commits).
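The steps above can be sketched as a script. Two simplifications are assumed here: the todo list is edited by a sed call passed in GIT_SEQUENCE_EDITOR instead of by hand, and whole files are added instead of selecting hunks with git add -p. File names and commit messages are hypothetical.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "big commit"      # the commit we want to split

echo "later" > later.txt
git add later.txt
git commit -q -m "later work"

# Mark the first commit in the list ("big commit") with edit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i HEAD~2 >/dev/null 2>&1

git reset -q HEAD^                 # undo the commit, keep its changes in the working tree
git add a.txt
git commit -q -m "first part"
git add b.txt
git commit -q -m "second part"

GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

subjects=$(git log --reverse --format=%s | paste -sd '|' -)
echo "$subjects"
```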

4.3. Who Made These Changes? — Git Blame

Like other version control systems, Git has a blame or annotate command that puts the date and author of the last change on all lines in a file. This allows you to quickly find out, for example, who is responsible for a line of code that causes a problem, or since when the problem has existed.

The annotate command is only intended for people switching from other version control systems; it has the same functionality as blame, but a slightly different output format. So if in doubt, you should always use blame.

Useful options are -M to detect moved code, and -C to detect copied code. You can then use the file name in the output to see from which file code may have been copied or moved. If no file name is displayed, Git couldn’t find any code moves or copies. If you use these options, it’s usually a good idea to suppress the author and date with -s so that the display still fits the screen.

From the following output you can see, for example, that the function end_url_with_slash originally came from the file http.c. The option -L<m>,<n> limits the output to the corresponding lines.

$ git blame -C -s -L123,135 url.c
638794cd url.c  123) char *url_decode_parameter_value(const char **query)
638794cd url.c  124) {
ce83eda1 url.c  125)    struct strbuf out = STRBUF_INIT;
730220de url.c  126)    return url_decode_internal(query, "&", &out, 1);
638794cd url.c  127) }
d7e92806 http.c 128)
eb9d47cf http.c 129) void end_url_with_slash(struct strbuf *buf, const char *url)
5ace994f http.c 130) {
5ace994f http.c 131)    strbuf_addstr(buf, url);
5ace994f http.c 132)    if (buf->len && buf->buf[buf->len - 1] != '/')
5ace994f http.c 133)            strbuf_addstr(buf, "/");
5ace994f http.c 134) }
3793a309 url.c  135)
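To get a feeling for the output, here is a much smaller sketch with a made-up file notes.txt that is changed in two commits. It only uses the options -s and -L; detecting moves and copies with -M and -C needs longer passages of code than this toy file contains.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'line one\nline two\n' > notes.txt
git add notes.txt
git commit -q -m "first version"
first=$(git rev-parse HEAD)

printf 'line one\nline two, revised\n' > notes.txt
git commit -q -am "revise line two"
second=$(git rev-parse HEAD)

# Every line is prefixed with the commit that last changed it.
git blame -s -L 1,2 notes.txt

# Abbreviated ID of the commit that last touched line 2.
blamed=$(git blame -s -L 2,2 notes.txt | cut -d ' ' -f 1)
echo "line 2 was last changed in $blamed"
```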

4.3.1. Blaming with Graphics

A convenient alternative to git blame on the console is the graphical tool git gui blame (you may need to install the git-gui package for this).

Figure 28. A piece of code, which was moved from another file

If you examine a file via git gui blame <file>, the different blocks that originate from different commits are displayed with a grey background. On the left you see the abbreviated commit ID and the initials of the author.

Only when you hover your mouse over such a block does a small popup window appear with information about the commit that changed the lines, possibly with a message stating from which file and which commit this block of code was moved or copied.

In code review, people are often interested in what a file looked like before a certain change was made. For this purpose, the graphical blame tool lets you go back in the version history: Right-click on the commit ID of a code block and select Blame Parent Commit from the context menu - now the predecessor of this change is displayed. You can go back several steps this way. Use the green arrow in the upper left corner to jump back into the future again.

4.4. Ignoring Files

In almost every project there are files that you do not want to version, be it the binary output of the compiler, the autogenerated documentation in HTML format, or the backup files generated by your editor. Git offers several levels of ignoring files:

user-specific setting

repository-specific setting

repository-specific setting that is checked in along with the project

Which option you choose depends entirely on your application. The user-specific settings should contain files and patterns that are relevant to the user, for example backup files that your editor creates. Such patterns are usually stored in a file in the $HOME directory. With the option core.excludesfile you specify which file this should be, e.g. in the case of ~/.gitignore:

$ git config --global core.excludesfile ~/.gitignore

Certain files and patterns are bound to a project and are valid for each participant, e.g. compiler output and autogenerated HTML documentation. You store these settings in the file .gitignore, which you check in as normal and thus deliver to all developers.

Finally, the .git/info/exclude file can be used for repository-specific settings that should not be delivered with a clone, i.e. settings that are both project and user specific.

4.4.1. Pattern Syntax

The syntax for patterns is based on the shell syntax:

Blank lines have no effect and can be used for structuring and separating.

Lines starting with a # are considered comments and have no effect.

Expressions beginning with ! are evaluated as negation.

Expressions ending with a / are evaluated as a directory. The expression man/ covers the directory man, but not a file or symlink with the same name.

Expressions that do not contain a / are evaluated as a shell glob for the current directory and all its subdirectories. The expression *.zip in the topmost .gitignore, for example, covers all zip files in the project’s directory structure.

The expression ** covers zero or more directories. Both t/data/set1/store.txt and t/README.txt are covered by the pattern t/**/*.txt.

Otherwise the pattern is evaluated as a shell glob, more precisely as a shell glob evaluated by the function fnmatch(3) with the flag FNM_PATHNAME. This means that the pattern doc/*html covers doc/index.html, but not doc/api/singleton.html.

Expressions beginning with a / are bound to the path. For example, the expression /*.sh includes upload.sh but not scripts/check-for-error.sh.

An example:⁠^[60]

$ cat ~/.gitignore
# vim swap files
.*.sw[nop]

# python bytecode
*.pyc

# documents
*.dvi
*.pdf

# miscellaneous
*.*~
*.out
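If you are unsure whether a pattern covers a path, you can ask Git itself: git check-ignore -q exits successfully if the given path is ignored. The sketch below checks the example patterns from the rules above in a scratch repository; the helper function status and the .gitignore content are made up for illustration, and the paths do not have to exist.

```shell
set -eu
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitignore <<'EOF'
# zip files anywhere in the tree
*.zip
# shell scripts in the topmost directory only
/*.sh
# FNM_PATHNAME: * does not cross a slash
doc/*html
# ** covers zero or more directories
t/**/*.txt
EOF

status() {  # status <path>: prints "ignored" or "kept"
  if git check-ignore -q "$1"; then echo "ignored"; else echo "kept"; fi
}

for path in sub/dir/archive.zip upload.sh scripts/check-for-error.sh \
            doc/index.html doc/api/singleton.html \
            t/README.txt t/data/set1/store.txt; do
  printf '%-8s %s\n' "$(status "$path")" "$path"
done
```

With the -v option instead of -q, git check-ignore also names the file and the pattern responsible for the decision.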

4.4.2. Ignoring and Tracking Later

Files that are already versioned are not automatically ignored. To ignore such a file anyway, explicitly tell Git to “forget” the file:

$ git rm documentation.pdf

To delete the file with the next commit, but still keep it in the working tree:

$ git rm --cached documentation.pdf

Files that are already ignored will not appear in the output of git status. Also, git add refuses to accept such a file; the --force or -f option forces Git to consider the file after all:

$ git add documentation.pdf
The following paths are ignored by one of your .gitignore files:
documentation.pdf
Use -f if you really want to add them.
fatal: no files added
$ git add -f documentation.pdf

4.4.3. Deleting Ignored and Unknown Files

The git clean command deletes ignored as well as unknown (so-called untracked) files. Since files may be irretrievably lost, the command has the --dry-run (or -n) option; it tells you what would be deleted. As a further precaution, the command refuses to delete anything unless you explicitly pass the --force (or -f) option.^[61]

By default, git clean only deletes the unknown files, with -X it only removes the ignored files, and with -x it removes both unknown and ignored files. With the option -d it additionally deletes the directories in question. So to delete unknown as well as ignored files and directories, enter

$ git clean -dfx
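Before deleting anything, it is worth comparing the three modes with a dry run. The following sketch uses a throwaway repository; the file names notes.txt and main.o are invented for this example:

```shell
#!/bin/sh
# Sketch: compare the modes of git clean without deleting anything (-n)
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "*.o" > .gitignore
git add .gitignore
git commit -q -m "ignore object files"

touch notes.txt main.o            # one unknown file, one ignored file

unknown_only=$(git clean -n)      # default: unknown files only
ignored_only=$(git clean -n -X)   # -X: ignored files only
both=$(git clean -n -x)           # -x: unknown and ignored files

echo "default:"; echo "$unknown_only"
echo "with -X:"; echo "$ignored_only"
echo "with -x:"; echo "$both"
```

Only when the list matches your expectations, replace -n with -f.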

4.5. Outsourcing Changes — Git Stash

The stash is a mechanism used to temporarily store changes in the working tree that have not yet been saved. A classic use case: your boss asks you to fix a critical bug as soon as possible, but you have just started to implement a new feature. With the git stash command, you can temporarily clean out the unfinished lines without creating a commit, and thus address the bug with a clean working tree. The stash also provides a workaround if you cannot change the branch because this would result in losing changes (see also Sec. 3.1.2, “Managing Branches”).

4.5.1. Basic Usage

With git stash you save the current state of working tree and index, if they differ from HEAD:

$ git stash
Saved working directory and index state WIP on master: b529e34 new spec how the script should behave
HEAD is now at b529e34 new spec how the script should behave

With the --keep-index option the index remains intact. This means that all changes that are already in the index remain in the working tree and in the index and are additionally stored in the stash.

The changes to the working tree and index are "put aside", and Git does not create a commit on the current branch. To restore the saved state again, i.e. to apply the saved patch to the current working tree and delete the stash at the same time, use

$ git stash pop
...
Dropped refs/stash@{0} (d4cc94c37e92390e5fabf184a3b5b7ebd5c3943a)

Between saving and restoring, you can change the repository as you like, e.g. change the branch, make commits, etc. The stash is always applied to the current working tree.

The command git stash pop is an abbreviation for the two commands git stash apply and git stash drop:

$ git stash apply
...
$ git stash drop
Dropped refs/stash@{0} (d4cc94c37e92390e5fabf184a3b5b7ebd5c3943a)

Both pop and apply restore only the changes to the working tree; the state of the index is not restored. With the --index option, the stored state of the index is restored as well.
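The effect of --index can be observed in a small experiment. This is only a sketch in a throwaway repository; the file names staged.txt and unstaged.txt are made up for illustration:

```shell
#!/bin/sh
# Sketch: stash a staged and an unstaged change, then restore both states
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > staged.txt
echo "one" > unstaged.txt
git add .
git commit -q -m "initial version"

echo "two" >> staged.txt && git add staged.txt   # change recorded in the index
echo "two" >> unstaged.txt                       # change only in the working tree

git stash -q
clean=$(git status --porcelain)                  # empty: everything was put aside

git stash pop -q --index
staged=$(git diff --cached --name-only)          # what is in the index again
modified=$(git diff --name-only)                 # what differs from the index

echo "staged:   $staged"
echo "modified: $modified"
```

Without --index, both files would show up as modified but unstaged after the pop.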

The --patch (or short -p) option starts an interactive mode, i.e. you can select individual hunks to add to the stash just like with git add -p and git reset -p:

$ git stash -p

The configuration setting interactive.singlekey (see Sec. 2.1.2, “Creating Commits Step by Step”) also applies here.

4.5.2. Solving Conflicts

Conflicts can occur if you apply a stash to a commit other than the one on which it was created:

$ git stash pop
Auto-merging hello.pl
CONFLICT (content): Merge conflict in hello.pl

In this case, use the usual recipes to solve the conflict, see Sec. 3.4, “Resolving Merge Conflicts”. Note, however, that the conflict markers are labeled Updated upstream (the version in the current working tree) and Stashed changes (changes in the stash):

<<<<<<< Updated upstream
# E-Mail: valentin.haenel@gmx.de
=======
# E-Mail: valentin@gitbu.ch
>>>>>>> Stashed changes

If you have tried to apply a stash with git stash pop and a conflict has occurred, the stash will not be deleted automatically. You must explicitly delete it (after resolving the conflict) with git stash drop.

4.5.3. If You Can Not Apply the Stash…

The stash is applied to the current working tree by default, provided it is clean - if not, Git aborts:

$ git stash pop
Cannot apply to a dirty working tree, please stage your changes

While Git suggests that you add the changes to the index, how you should proceed depends on your goal. If you want to have the changes in the stash in addition to those in the working tree, here’s a good idea:

$ git add -u
$ git stash pop
$ git reset HEAD

For explanation: First, the unsaved changes to the working tree are added to the index; then the changes are extracted from the stash and applied to the working tree, and finally the index is reset.

Alternatively, you can create an additional stash and apply the changes you want to have to a clean working tree:

$ git stash
$ git stash apply stash@{1}
$ git stash drop stash@{1}

For this recipe you use several stashes. First you store the changes in the working tree into a new stash, then you get the changes you actually want from the previous stash and delete it after the application.

4.5.4. Adjusting Messages

By default, Git sets the following message for a stash:

WIP on <branch>: <sha1> <commit-msg>

<branch>: the current branch

<sha1>: the commit ID of the HEAD

<commit-msg>: the commit message of the HEAD

In most cases this is sufficient to identify a stash. If you plan to keep your stashes longer (possible, but not really recommended), or if you want to create more than one, we recommend that you give them a more descriptive message:

$ git stash save "unfinished feature"
Saved working directory and index state On master: unfinished feature
HEAD is now at b529e34 new spec how the script should behave

4.5.5. Viewing Stashes

Git manages all stashes as a stack, i.e. more recent states are on top and are processed first. The stashes are named with a reflog syntax (see also Sec. 3.7, “Reflog”):

    stash@{0}
    stash@{1}
    stash@{2}
    ...

If you create a new stash, it will be called stash@{0} and the number of the others will be incremented: stash@{0} becomes stash@{1}, stash@{1} becomes stash@{2} and so on.

If you do not specify an explicit stash, the commands apply, drop and show refer to the most recent, i.e. stash@{0}.
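The stack behavior can be reproduced with two stashes. A minimal sketch in a throwaway repository; the messages "first experiment" and "second experiment" are invented for this example:

```shell
#!/bin/sh
# Sketch: two stashes with custom messages, the newer one ends up on top
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial version"

echo "first" >> file.txt
git stash save -q "first experiment"     # becomes stash@{0}

echo "second" >> file.txt
git stash save -q "second experiment"    # new stash@{0}, the older one is now stash@{1}

top=$(git stash list | head -n 1)
count=$(git stash list | wc -l | tr -d ' ')

git stash list
```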

To view individual stashes, use git stash show. By default, this command prints a summary of the added and removed lines (like git diff --stat):

$ git stash show
git-stats.sh |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

The git stash show command also accepts general diff options that affect the format, e.g. -p to output a patch in diff format:

$ git stash show -p stash@{0}
diff --git a/git-stats.sh b/git-stats.sh
index 62f92fe..1235fd3 100755
--- a/git-stats.sh
+++ b/git-stats.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
-START=18.07.2010
-END=25.07.2010
+START=18.07.2000
+END=25.07.2020
 echo "Number of commits per author:"

The git stash list command prints a list of currently created stashes:

$ git stash list
stash@{0}: WIP on master: eae23b6 add number of merge commits to output
stash@{1}: WIP on master: b1ee2cf start and end date in one place only

4.5.6. Deleting Stashes

Individual stashes can be deleted with the command git stash drop, all with git stash clear. If you delete a stash by mistake, you won’t find it again via the usual reflog mechanisms! However, the following command prints the former stashes:⁠^[62]

$ git fsck --unreachable | grep commit | cut -d" "  -f3 | \
  xargs git log --merges --no-walk --grep=WIP

In case of emergency, note that you will find the command at the very end of the git-stash(1) man page.

It is also important that the entries shown in this way only exist as unreachable objects in the object database and are therefore subject to the normal maintenance mechanisms: they are deleted after some time and are not kept permanently.
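Once you know the commit ID of a dropped stash, you can pass it to git stash apply as long as the object still exists. The following sketch, run in a throwaway repository, drops a stash on purpose and recovers it with the pipeline shown above; the option --format=%H is added here only to reduce the output to the commit ID:

```shell
#!/bin/sh
# Sketch: recover a stash that was dropped by mistake
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial version"

echo "work in progress" >> file.txt
git stash -q
dropped=$(git rev-parse 'stash@{0}')   # remembered only to verify the result
git stash drop -q

# Unreachable merge commits whose message contains WIP
found=$(git fsck --unreachable 2>/dev/null | grep commit | cut -d" " -f3 |
        xargs git log --merges --no-walk --grep=WIP --format=%H)

git stash apply -q "$found"
restored=$(git diff --name-only)

echo "recovered stash commit: $found"
echo "restored changes in:    $restored"
```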

4.5.7. How Is the Stash Implemented?

Git creates two commit objects for each stash, one for working tree changes and one for index changes. Both have the current HEAD as their parent; the working tree object additionally has the index object as a second parent. This makes a stash in Gitk appear as a triangle, which is a bit confusing at first:

Figure 29. A stash in Gitk

With the alias git tree (see Sec. 3.6.1, “Revision Parameters”) this looks like this:

*   f1fda63 (refs/stash) WIP on master: e2c67eb comment was missing
|\
| * 4faee09 index on master: e2c67eb comment was missing
|/
* e2c67eb (HEAD, master) comment was missing
* 8e2f5f9 test file
* 308aea1 README file
* b0400b0 first version

Since the stash objects are not referenced by a branch, the working tree object is kept alive with a special reference, refs/stash. However, this only applies to the latest stash. Older stashes are only referenced in the Reflog (see Sec. 3.7, “Reflog”) and therefore do not appear in Gitk. In contrast to normal reflog entries, stored stashes do not expire and are therefore not deleted by the normal maintenance mechanisms.
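The triangle structure can also be checked on the command line. A minimal sketch in a throwaway repository that inspects the parents of the commit referenced by refs/stash:

```shell
#!/bin/sh
# Sketch: inspect the two commit objects behind a stash
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial version"

echo "change" >> file.txt
git stash -q

head=$(git rev-parse HEAD)
first_parent=$(git rev-parse 'refs/stash^1')    # first parent of the working tree commit
index_parent=$(git rev-parse 'refs/stash^2^')   # parent of the index commit

# rev-list --parents prints the commit followed by all of its parents
set -- $(git rev-list --parents -n 1 refs/stash)
parents=$(($# - 1))

echo "working tree commit has $parents parents"
git log --oneline --graph refs/stash
```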

4.6. Annotating Commits — Git Notes

In general, it is not easy to modify or extend commits once they have been published. Sometimes, however, you wish you could “attach” information to commits afterwards, without the commit changing. This could be ticket numbers, information about whether the software compiled, who tested it, and so on.

Git offers a way to attach notes to a commit using the git notes command. The notes are an uncoupled branch of commits, referenced by refs/notes/commits, on which the development of the notes is stored. On this branch, the notes for a commit are stored in a file whose filename corresponds to the SHA-1 sum of the commit it describes.

But you can disregard these internals; in practice, you can manage the notes completely with git notes. The only important thing to know is that you can save only one note per commit.^[63] However, you can edit or extend the notes afterwards.

To add a new note: git notes add <commit>. If you omit <commit>, HEAD will be used. Similar to git commit an editor opens where you write the note. Alternatively, you can specify it directly with -m "<note>".

By default, the note is always displayed below the commit message:

$ git show 8e8a7c1f
commit 8e8a7c1f4ca66aa024acde03a58c2b67fa901f88
Author: Julius Plenz <julius@plenz.com>
Date:   Sun May 22 15:48:46 2011 +0200

    Optimize loop

Notes:
    This causes bug #2319 and is fixed with v2.1.3-7-g6dfa88a

With the --no-notes option you can explicitly tell commands like log or show not to display notes.

The command git notes add will end with an error if a note already exists for the given commit. Use the git notes append command instead to append more lines to the note, or directly git notes edit to edit the note as desired.
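The interplay of add and append can be sketched as follows, again in a throwaway repository. The note texts are placeholders chosen for this example:

```shell
#!/bin/sh
# Sketch: one note per commit, extended afterwards with append
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "optimize loop"

git notes add -m "compiles: yes"

# A second add for the same commit is refused
if git notes add -m "second note" 2>/dev/null; then
    second_add=accepted
else
    second_add=refused
fi

git notes append -m "tested-by: Example User"
note=$(git notes show)
notes_ref=$(git rev-parse --verify -q refs/notes/commits)

echo "second add was $second_add"
echo "$note"
```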

By default, the notes are not uploaded or downloaded; you have to do this explicitly with the following commands:

$ git push <remote> refs/notes/commits
$ git fetch <remote> refs/notes/commits:refs/notes/commits

The notes concept is not very well developed in Git. In particular, it is problematic when multiple developers create commit notes in parallel, and then need to merge them. For more information, see the git-notes(1) man page.

If you want to use notes, this is usually only useful in connection with ticket, bug tracking or continuous integration systems: These could automatically create notes and thus possibly store helpful additional information in the repository.

To automatically download the notes at each git fetch, add a refspec of the following form to the file .git/config (see also Sec. 5.3.1, “git fetch”):

  fetch = +refs/notes/*:refs/notes/*
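Instead of editing the file by hand, the refspec can also be added with git config. A minimal sketch, assuming the remote is called origin; the URL is a placeholder and is never contacted here:

```shell
#!/bin/sh
# Sketch: add a second fetch refspec for notes to the remote "origin"
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git remote add origin https://example.com/project.git   # hypothetical URL

# --add appends a value instead of replacing the existing fetch refspec
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*'

refspecs=$(git config --get-all remote.origin.fetch)
echo "$refspecs"
```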

4.7. Multiple Root Commits

The first commit in a newly initialized repository is called the root commit. It is usually the only commit in the entire repository that has no predecessor.

However, it is also possible to have multiple root commits in one repository. This can be useful in the following cases:

You want to merge two independent projects that were previously developed in separate repositories (see also subtree-merges in Sec. 5.11.2, “Subtrees”).

You want to manage a fully decoupled branch where you keep a todo list, compiled binaries or autogenerated documentation.

In case you want to merge two repositories, this command is usually sufficient:

$ git fetch -n <other-repo> master:<other-master>
warning: no common commits
...
From <other-repo>
 * [new branch]      master     -> <other-master>

The master branch of the other repository is copied to the local repository as <other-master>, including all commits until Git finds a merge base or root commit. The warning "no common commits" already indicates that the two version histories do not have a common commit. The repository now has two root commits.

Note that a merge between two branches that do not share any commits will fail as soon as a file exists on both sides with differing content. This may be remedied by subtree-merges, see Sec. 5.11.2, “Subtrees”.

Instead of importing another repository, you can also create a completely detached branch, i.e. a second root commit. The following two commands are sufficient for this:

$ git checkout --orphan <newroot>
$ git rm --cached -rf .

The first one sets the HEAD to the (not yet existing) branch <newroot>. The rm command deletes all Git-managed files from the index, but leaves them intact in the working tree. So now you have an index that doesn’t contain anything and a branch that doesn’t have a commit yet.

You can now use the git add command to add files to the new root commit and then create it with git commit.
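Put together, the whole procedure might look like the following sketch. It runs in a throwaway repository; the branch name newroot and the files main.c and TODO are invented for this example:

```shell
#!/bin/sh
# Sketch: create a second root commit on a detached branch
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "first root commit"

git checkout -q --orphan newroot     # HEAD points to a branch without commits
git rm -q --cached -rf .             # empty the index, keep the working tree

echo "- write documentation" > TODO
git add TODO
git commit -q -m "second root commit"

# Root commits are the commits without any parent
roots=$(git rev-list --max-parents=0 --all | wc -l | tr -d ' ')
tracked=$(git ls-files)
if [ -f main.c ]; then kept=yes; else kept=no; fi

echo "root commits: $roots"
echo "files on newroot: $tracked"
```

The file main.c is still present in the working tree, but as an unknown file from the point of view of the new branch.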

4.8. Finding Regressions — Git Bisect

In software development, a regression refers to the point in time from which a certain feature of a program no longer functions. This can be after an update of libraries, after the introduction of new features that cause side effects etc.

Finding such regressions is sometimes difficult. If you use an extensive test suite, you are relatively well protected against introducing trivially detectable regressions (e.g. by running make test before each commit).

If the regression is reproducible ("with the arguments <x> the program crashes", "the configuration setting <y> causes a memory access error"), then you can use Git to automate the search for the commit that causes this regression.

Git provides the bisect command for this purpose, whose algorithm works on the "divide and conquer" principle: First you define a point in time (i.e. a commit) when the regression had not yet occurred (called good), then a point in time when it occurs (called bad; if you leave it out, Git assumes HEAD). The bisect command is based on the idealized assumption that the regression was introduced by a single commit, that is, there is a commit before which everything was fine and after which the error occurs.^[64]

Now Git chooses a commit from the middle between good and bad and checks it out. You must then check whether the regression is still present. If yes, Git will set bad to this commit, if no, good will be set to this commit. This removes about half of the commits to examine. Git repeats the step until only one commit remains.

So the number of steps bisect takes is logarithmic in the number of commits you examine: For n commits, you need about log₂(n) steps. For 32 commits, that’s a maximum of five steps, but for 1024 commits, that’s a maximum of only 10 steps, because you can eliminate 512 commits in the first step.
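The arithmetic can be checked with a few lines of shell. This sketch only models the idealized halving described above; the function name bisect_steps is made up, and the estimates Git prints during a real session may differ slightly:

```shell
#!/bin/sh
# Sketch: count how often n commits can be halved until one remains
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))     # each test eliminates about half of the commits
        steps=$((steps + 1))
    done
    echo "$steps"
}

steps_32=$(bisect_steps 32)
steps_1024=$(bisect_steps 1024)

echo "32 commits:   $steps_32 steps"
echo "1024 commits: $steps_1024 steps"
```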

4.8.1. Usage

You start a bisect session with the following commands:

$ git bisect start
$ git bisect bad <does-not-work>
$ git bisect good <works>

Once you’ve defined the two points, Git checks out a commit in the middle, so you’re now in detached-head mode (see Sec. 3.2.1, “Detached HEAD”). After you have checked whether the regression is still present, you can mark it with git bisect good or git bisect bad. Git will automatically check out the next commit.

You may not be able to test the checked out commit, for example, because the program does not compile correctly. In this case, you can use git bisect skip to have another commit nearby selected and proceed with it as usual. You can cancel the debugging at any time with git bisect reset.

4.8.2. Automation

Ideally, you can test automatically whether the error occurs — with a test that must run successfully if the regression does not occur.

You can then define the points good and bad as above. Afterwards you enter git bisect run <path/to/test>.

Based on the return value, bisect decides whether the checked out commit is good (the script ends successfully, i.e. with return value 0) or bad (values 1 to 127). A special case is the return value 125, which causes a git bisect skip. So if you have a program that needs to be compiled, the first thing your test script should do is run a command like make || exit 125, so that the commit is skipped if the program does not compile properly.

Bisect can then automatically identify the problematic commit. This looks like this, for example:

$ git bisect run ./t.sh
Bisecting: 9 revisions left to test after this (roughly 3 steps) ...
Bisecting: 4 revisions left to test after this (roughly 2 steps) ...
Bisecting: 2 revisions left to test after this (roughly 1 step) ...
Bisecting: 0 revisions left to test after this (roughly 0 steps) ...
d29758fffc080d0d0a8ee9e5266fdf75fcb98076 is the first bad commit
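To try out an automated session without a real project, you can build a small history with a known culprit. The following is only a sketch: the repository, the file file.txt and the marker BUG are invented, and the "test" is a simple grep that fails as soon as the marker is present:

```shell
#!/bin/sh
# Sketch: ten commits, the seventh introduces a "bug"; bisect run finds it
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

i=1
while [ "$i" -le 10 ]; do
    echo "line $i" >> file.txt
    if [ "$i" -eq 7 ]; then echo "BUG" >> file.txt; fi
    git add file.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 7 ]; then culprit=$(git rev-parse HEAD); fi
    i=$((i + 1))
done

git bisect start >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$good" >/dev/null

# The test command exits with 0 (good) if BUG is absent, with 1 (bad) otherwise
git bisect run sh -c '! grep -q BUG file.txt' >/dev/null

found=$(git rev-parse refs/bisect/bad)   # bisect records the first bad commit here
git bisect reset >/dev/null 2>&1

echo "first bad commit: $found"
```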

With small commits and meaningful descriptions you can save yourself a lot of work by using the bisect command when searching for obscure bugs.

So take special care not to create commits that leave the software in a broken state (does not compile, etc.), which a later commit will fix.

54. If you’re managing patch stacks with Git that have potential conflicts, you should definitely take a look at the Reuse Recorded Resolution feature, in short, rerere. Rerere saves conflict resolutions and automatically corrects conflicts if a resolution has already been saved, see also Sec. 3.4.2, “Rerere: Reuse Recorded Resolution”.

55. For example, by uploading the branch to a publicly available repository, see Sec. 5.4, “Uploading Commits: git push”.

56. In the latter case, for example, you simply do a git remote update (the new commits are loaded into the origin/master branch) and then build your own branch from scratch to origin/master. See also Sec. 5.1, “How Does Distributed Version Control Work?”.

57. You can find the source code at https://repo.or.cz/w/topgit.git.

58. Short stg or StGit, reachable under https://stacked-git.github.io.

59. This also works fine as long as all branches and merges are above the new reference (i.e. only commits are included from which you can reach the new base). Otherwise, rebase will fail for every commit already in history (error message: “nothing to commit”); these must always be skipped with a git rebase --continue.

60. More examples can be found on the gitignore(5) man page and at https://docs.github.com/en/free-pro-team@latest/github/using-git/ignoring-files.

61. This behavior can be prevented by setting the clean.requireForce setting to false.

62. The command first selects all commit objects that are no longer accessible, and then restricts the list to those that are merge commits and whose commit message contains the string WIP — the properties that a commit object created as a stash has, see Sec. 4.5.7, “How Is the Stash Implemented?”.

63. That’s not quite true; you can only store one note per commit under refs/notes/commits, but you can store additional notes under e.g. refs/notes/bts that relate to the bug tracking system, and only one per commit there.

64. Of course, this commit need not be the core of the regression, it may have been prepared by a completely different commit.