8. Git Automation
In this chapter, we’ll introduce advanced techniques for automating Git. In the first section about Git attributes, we’ll show you how to tell Git to treat certain files separately, for example, to call an external diff command on graphics.
We continue with hooks — small scripts that are executed when various git commands are called, for example to notify all developers via email when new commits arrive in the repository.
Then we’ll give a basic introduction to scripting with Git and show you useful plumbing commands.
Finally, we will introduce the powerful filter-branch command, which you can use to rewrite the project history on a large scale, for example to remove a file with a password from all commits.
8.1. Git Attributes — Treating Files Separately
Git attributes allow you to assign specific properties to individual files or a group of files so that Git treats them with special care; examples would be forcing the end of lines or marking certain files as binary.
You can write the attributes either in the file .gitattributes or in .git/info/attributes.
The latter applies only to the local repository and is not versioned by Git.
A .gitattributes file is usually checked in, so all developers use these attributes.
You can also store additional attribute definitions in .gitattributes files in subdirectories.
One line in this file has the format:
<pattern> <attrib1> <attrib2> ...
An example:
*.eps binary
*.tex -text
*.c filter=indent
Attributes can be set (e.g. binary), unset (-text) or set to a value (filter=indent).
The man page gitattributes(5) describes in detail how Git interprets the attributes.
A project that is developed in parallel on Windows and Unix machines suffers from the fact that the developers use different conventions for line endings. This is due to the operating system: Windows systems use a carriage return followed by a line feed (CRLF), while Unix-like systems use only a line feed (LF).
With suitable Git attributes you can define an appropriate policy; in this case the attributes text and eol are responsible.
The attribute text causes the line endings to be "normalized".
Whether a developer's editor uses CRLF or just LF, Git will only store the version with LF in the blob.
If you set the attribute to auto (text=auto), Git will only perform this normalization if the file also looks like text.
The eol attribute, on the other hand, determines what happens during a checkout.
Regardless of the user's core.eol setting, you can specify e.g. CRLF for some files (because the format requires it).
*.txt text
*.csv eol=crlf
With these attributes, .txt files are always stored internally with LF and checked out with CRLF if required (platform- or user-dependent).
CSV files, on the other hand, are checked out with CRLF on all platforms.
(Internally, Git stores all these blobs with plain LF line endings.)
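To see which attributes Git actually applies to a path, the plumbing command git check-attr can be used. The following is only a small sketch: it creates a throwaway repository (the temporary directory and the file names notes.txt and data.csv are assumptions for illustration) with the two attribute lines from above and queries the text and eol attributes.

```sh
# Throwaway repository for the demonstration (hypothetical location)
repo=$(mktemp -d)
git init -q "$repo"

# The two attribute lines from the example above
printf '%s\n' '*.txt text' '*.csv eol=crlf' > "$repo/.gitattributes"

# Ask Git which values the attributes have for two (hypothetical) paths;
# the files do not need to exist for this query
attrs=$(git -C "$repo" check-attr text eol -- notes.txt data.csv)
printf '%s\n' "$attrs"
```

Each output line has the form <path>: <attribute>: <value>, which makes the command convenient for debugging attribute patterns.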
8.1.1. Filter: Smudge and Clean
Git offers filters to "smudge" files after a checkout and to "clean" them again before a git add.
The filters do not get any arguments, only the content of the blob on standard input. The output of the program is used as the new blob.
For each filter you have to define a smudge and a clean command.
If one of the definitions is missing or if the filter is cat, the blob is taken over unchanged.
Which filter is used for which type of files is defined by the Git attribute filter.
For example, to automatically indent C files correctly before a commit, you can use the following filter definitions (the filter name indent is arbitrary; any other name is possible):
$ git config filter.indent.clean indent
$ git config filter.indent.smudge cat
$ echo '*.c filter=indent' > .git/info/attributes
To "clean up" a C file, Git now automatically calls the indent program, which should be installed on standard systems.[106]
8.1.2. Keywords in Files
In principle, filters make it possible to implement the well-known keyword expansion, so that e.g. $Version$ becomes $Version: v1.5.4-rc2$.
You define the filters in your configuration and then assign the corresponding Git attribute to the files in question. This works like this, for example:
$ git config filter.version.smudge ~/bin/git-version.smudge
$ git config filter.version.clean ~/bin/git-version.clean
$ echo '* filter=version' > .git/info/attributes
A filter that replaces or cleans up the $Version$ keyword could be implemented as a Perl one-liner; first the smudge filter:
#!/bin/sh
version=$(git describe --tags)
exec perl -pe 's/\$Version(:\s[^\$]+)?\$/\$Version: '"$version"'\$/g'
And the clean filter:
#!/usr/bin/perl -p
s/\$Version: [^\$]+\$/\$Version\$/g
It is important that repeated application of such a filter does not make uncontrolled changes in the file. A double call to Smudge should be fixed by a single call to Clean.
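This property can be checked without involving Git at all. The following sketch re-implements the two filters as shell functions using sed instead of Perl (a substitution made only for this illustration) and uses a fixed version string as a stand-in for the output of git describe --tags. It assumes the version string contains no characters that are special to sed, such as / or &.

```sh
# Hypothetical stand-in for the output of `git describe --tags`
version="v1.5.4-rc2"

smudge() {
    # Expand $Version$ (or an already expanded $Version: ...$)
    sed 's/\$Version\(: [^$]*\)\{0,1\}\$/$Version: '"$version"'$/g'
}

clean() {
    # Collapse an expanded keyword back to the bare $Version$
    sed 's/\$Version: [^$]*\$/$Version$/g'
}

# Smudging twice and cleaning once must give back the original line
printf 'release $Version$\n' | smudge | smudge | clean
```

Because smudge also matches an already expanded keyword, applying it twice has the same effect as applying it once, and a single clean restores the original content.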
8.1.2.1. Restrictions
The concept of filters in Git is intentionally kept simple and will not be expanded in future versions. The filters receive no information about the context in which Git is currently located: Is a checkout happening? A merge? A diff? They only get the blob content. So the filters should only perform context-independent manipulations.
At the time the smudge filter is called, HEAD may not yet be up to date (the above filter would write an incorrect version number to the file during a git checkout, because it is called before HEAD is moved).
So the filters are not very suitable for keyword expansion.
This may annoy users who have become accustomed to this feature in other version control systems. However, there are no good arguments for such an expansion within a version control system. It undermines the internal mechanisms Git uses to check whether files have been modified (since the files always have to go through the clean filter). Also, because of the structure of Git repositories, you can "track" a blob through the commits or trees, so you can always tell from its contents whether a file belongs to a commit if necessary.
So keyword expansion is only useful outside of Git.
This is not the responsibility of Git, but of a Makefile target or a script.
For example, a make dist target can replace all occurrences of VERSION with the output of git describe --tags.
Git will then display the files as "changed".
Once the files are distributed (e.g. as a tarball), you can clean up with git reset --hard.
Alternatively, the export-subst attribute ensures that an expansion of the form $Format:<Pretty>$ is performed.
Here, <Pretty> must be a format that is valid for git log --pretty=format:<Pretty>, e.g. %h for the abbreviated commit hash.
Git will only expand these placeholders if the file is packaged via git archive (see Sec. 6.3.2, “Creating Releases”).
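The effect of export-subst can be tried out in a scratch repository. The sketch below is only an illustration: the temporary directories, the file name version.txt, and the committer identity passed with -c are assumptions. The working tree file keeps the placeholder, while the copy produced by git archive contains the abbreviated commit ID.

```sh
# Scratch repository and extraction directory (hypothetical locations)
repo=$(mktemp -d)
out=$(mktemp -d)
git init -q "$repo"

# Mark version.txt for expansion and put a $Format:...$ placeholder in it
echo 'version.txt export-subst' > "$repo/.gitattributes"
echo 'built from $Format:%h$' > "$repo/version.txt"

git -C "$repo" add .gitattributes version.txt
git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "add version file"

# The placeholder is only expanded in the archive, not in the working tree
git -C "$repo" archive HEAD | tar -xf - -C "$out"
cat "$out/version.txt"
```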
8.1.3. Own Diff Programs
Git’s internal diff mechanism is very well suited for all types of plaintext. But it fails with binaries - Git just tells you whether they differ or not. However, if you have a project where you need to manage binary data, such as PDFs, OpenOffice documents, or images, it’s a good idea to define a special program that creates meaningful diffs for these files.
For example, there are antiword and pdftotext to convert Word documents and PDFs to plaintext.
There are analogous scripts for OpenOffice formats.
For images you can use commands from the ImageMagick suite (see also the example below).
If you manage statistical data, you can plot the changed data sets side by side.
Depending on the nature of the data, there are usually adequate ways to visualize changes.
Such conversion processes are, of course, lossy: you cannot use this diff output to make meaningful changes to the files in a merge conflict, for example. But to get a quick overview of who changed what, such techniques are sufficient.
8.1.3.1. API for External Diff Programs
Git provides a simple API for custom diff filters. A diff filter is always passed the following seven arguments:
1. path (name of the file in the Git repository)
2. old version of the file
3. old SHA-1 ID of the blob
4. old Unix file mode
5. new version of the file
6. new SHA-1 ID of the blob
7. new Unix file mode
Arguments 2 and 5 may be temporary files, which are deleted as soon as the diff program exits, so you do not have to take care of cleaning up.
If one of the two files does not exist (newly added or deleted), then /dev/null is passed as the file name.
The corresponding blob ID is then 00000…; this is also the case if a file does not yet exist as an object in the object database (i.e. it exists only in the working tree or index).
The diff command must be able to handle these cases accordingly.
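As a rough sketch of how a diff program might deal with these special cases, the following helper classifies a change based on the seven arguments. The function name describe_change is hypothetical; a real diff program would go on to compare the two files instead of just printing a label.

```sh
# Hypothetical helper: classify a change from the seven arguments
# that Git passes to an external diff program
describe_change() {
    path=$1
    old_file=$2
    new_file=$5
    if [ "$old_file" = "/dev/null" ]; then
        echo "added: $path"
    elif [ "$new_file" = "/dev/null" ]; then
        echo "deleted: $path"
    else
        echo "modified: $path"
    fi
}
```

Inside an actual diff script, the helper would be called as describe_change "$@".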
8.1.3.2. Configuring External Diffs
There are two ways to call an external diff program.
The first method is temporary: just set the environment variable GIT_EXTERNAL_DIFF to the path to your program before calling git diff:
$ GIT_EXTERNAL_DIFF=</path/to/diff-command> git diff HEAD^
The other option is persistent, but requires some configuration.
First you define your own diff command <name>:
$ git config diff.<name>.command </path/to/diff-command>
The command needs to be able to handle the seven arguments mentioned above.
Now you have to use the Git attribute diff to define which diff program is called.
To do this, write e.g. the following lines in the .gitattributes file:
*.jpg diff=imgdiff
*.pdf diff=pdfdiff
If you check the file in, other users must also have configured corresponding commands for imgdiff or pdfdiff, otherwise they will see the regular output.
If you want to set this for one repository only, write this information to .git/info/attributes.
8.1.3.3. Comparing Pictures
A common use case is pictures:
What has changed between two versions of an image?
Visualizing this is not always easy.
The tool compare from the ImageMagick suite marks the places that have changed for images of the same size.
You can also show the two images alternately as an animation and recognize by the "flickering" where the image has changed.
Instead, we want a program that shows the two images side by side. Between the two images a kind of "difference" is displayed: all areas where changes have occurred are copied from the new image onto a white background. So the diff shows which areas have been added.
To do so, we save the following script under $HOME/bin/imgdiff:[107]
#!/bin/sh

OLD="$2"
NEW="$5"

# "xc:none" is "nothing" and stands for a missing image
[ "$OLD" = "/dev/null" ] && OLD="xc:none"
[ "$NEW" = "/dev/null" ] && NEW="xc:none"

exec convert "$OLD" "$NEW" -alpha off \
    \( -clone 0-1 -compose difference -composite -threshold 0 \) \
    \( -clone 1-2 -compose copy_opacity -composite \
       -compose over -background white -flatten \) \
    -delete 2 -swap 1,2 +append \
    -background white -flatten x:
Finally, we need to configure the diff command and make sure it is used by adding an entry to the .git/info/attributes file.
$ git config diff.imgdiff.command ~/bin/imgdiff
$ echo "*.gif diff=imgdiff" > .git/info/attributes
As an example we use the original versions of Tux.[108] First we add the black-and-white Tux:
$ wget http://www.isc.tamu.edu/~lewing/linux/sit3-bw-tran.1.gif \
    -Otux.gif
$ git add tux.gif && git commit -m "added tux"
It will be replaced by a colored version in the next commit:
$ wget http://www.isc.tamu.edu/~lewing/linux/sit3-bwo-tran.1.gif \
    -Otux.gif
$ git diff
The output of the git diff command is a window with the following content: on the left the old version, on the right the new version, and in the middle a mask of those parts of the new image that differ from the old.
Figure: git diff with the custom diff program imgdiff
The Tux example, including instructions, can also be found in a repository at: https://github.com/gitbuch/tux-diff-demo.
8.2. Hooks
Hooks provide a mechanism to "hook" into important Git commands and perform your own actions. Hooks are usually small shell scripts that perform automated tasks, such as sending emails as soon as new commits are uploaded, or checking for whitespace errors before a commit and issuing a warning if necessary.
For hooks to be executed by Git, they must be located in the hooks/ directory in the Git directory, i.e. under .git/hooks/ or, for bare repositories, under hooks/ at the top level.
They must also be executable.
Git automatically installs sample hooks on a git init, but these are named <hook>.sample and are therefore not executed without user intervention (renaming the files).
You can activate a supplied hook e.g. like this:
$ mv .git/hooks/commit-msg.sample .git/hooks/commit-msg
Hooks come in two classes: those that are executed locally (checking commit messages or patches, performing actions after a merge or checkout, etc.), and those that are executed server-side when you publish changes via git push.[109]
Hooks whose name begins with pre- can often be used to decide whether or not to perform an action.
If a pre- hook does not end successfully (i.e. it exits with a non-zero status), the action is aborted.
Technical documentation on how this works can be found in the githooks(5) man page.
8.2.1. Commits
pre-commit
-
Is called before the commit message is queried. If the hook terminates with a non-zero value, the commit process is aborted. The hook installed by default checks whether a newly added file has non-ASCII characters in the file name and whether there are whitespace errors in the modified files. With the -n or --no-verify option, git commit skips this hook.
prepare-commit-msg
-
Will be executed right before the message is displayed in an editor. Gets up to three parameters, the first of which is the file where the commit message is stored so that it can be edited. For example, the hook can add lines automatically. A non-zero exit status cancels the commit process. However, this hook cannot be skipped and therefore should not duplicate or replace the functionality of pre-commit.
commit-msg
-
Will be executed after the commit message is entered. The only argument is the file where the message is stored, so that it can be modified (normalization). This hook can be skipped with -n or --no-verify; if it does not terminate successfully, the commit process is aborted.
post-commit
-
Called after a commit has been created.
These hooks act only locally and are used to enforce certain policies regarding commits or commit messages.
The pre-commit hook is especially useful for this.
For example, some editors do not adequately indicate when there are spaces at the end of a line or when blank lines contain spaces.
This is annoying when other developers have to clean up whitespace in addition to making regular changes.
This is where Git helps with the following command:
$ git diff --cached --check
hooks.tex:82: trailing whitespace.
+ auch noch Whitespace aufräumen müssen._
The --check option lets git diff check for such whitespace errors; the command only exits successfully if the changes are error-free.
If you put this command in your pre-commit hook, you will always be warned if you are about to check in whitespace errors.
If you are quite sure, you can simply skip the hook for a single commit with git commit -n.
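A minimal pre-commit hook built around this command could look roughly like the following sketch. The function name check_staged_whitespace and the wording of the message are assumptions; the check itself is just the git diff --cached --check call from above.

```sh
# Sketch of a pre-commit check: fails if the staged changes
# contain whitespace errors
check_staged_whitespace() {
    if ! git diff --cached --check; then
        echo "Whitespace errors found; fix them or commit with -n." >&2
        return 1
    fi
}
```

In the hook file .git/hooks/pre-commit, the last line would then be check_staged_whitespace || exit 1, so that the commit is aborted when the check fails.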
Similarly, you can also store the "Check Syntax" command for a script language of your choice in this hook. For example, the following block for Perl scripts:
git diff --diff-filter=MA --cached --name-only |
while read -r file; do
    # Check only regular files whose first line is the Perl shebang
    if [ -f "$file" ] && [ "$(head -n 1 "$file")" = "#!/usr/bin/perl" ]; then
        perl -c "$file" || exit 1
    fi
done || exit 1
true
The names of all files modified in the index (diff filter modified and added, see also Sec. 8.3.4, “Finding Changes”) are passed to a subshell that checks for each file whether its first line marks it as a Perl script.
If so, the file is checked with perl -c.
If there is a syntax error in the file, the command will issue an appropriate error message, and exit 1 terminates the loop.
Because the loop usually runs in a subshell, the || exit 1 after done is needed to terminate the hook itself with an error, so Git will abort the commit process before an editor is opened to enter the commit message.
The closing true makes the hook end successfully if all passes of the while loop were successful, for example if only non-Perl files were edited.
The hook can of course be simplified by assuming that all Perl files are named <name>.pl.
Then the following code is sufficient:
git ls-files -z -- '*.pl' | xargs -0 -n 1 perl -c
Since you probably want to check only the files managed by Git, git ls-files is better than a simple ls, because the latter would also list untracked files ending in .pl.
Besides checking the syntax, you can of course also use Lint style programs that check the source code for "unsightly" or non portable constructs.
Such hooks are extremely useful to avoid accidentally checking in faulty code.
If warnings are inappropriate, you can always skip the pre-commit hook by using the -n option when committing.
8.2.2. Server Side
The following hooks are called by git receive-pack on the receiving side after the user enters git push in the local repository.
For a push operation, git send-pack creates a packfile on the local side (see also Sec. 2.2.3, “The Object Database”), which is received by git receive-pack on the receiving side.
Along with the new values of one or more references, the receiver gets the commits it needs in order to completely reproduce the version history.
The two sides negotiate in advance which commits these are (similar to a merge base).
pre-receive
-
The hook is called once and receives a list of changed references on standard input (see below for the format). If the hook does not complete successfully, git receive-pack refuses to accept the changes (the whole push operation fails).
update
-
Is called once per changed reference and gets three arguments: the name of the reference, its old state and the proposed new one. If the hook does not end successfully, the update of that single reference is denied (in contrast to pre-receive, where the push can only be accepted or rejected as a whole).
post-receive
-
Similar to pre-receive, but is called only after the references have been changed (so it has no influence on whether the push is accepted or not).
post-update
-
After all references are changed, this hook is executed once and gets the names of all changed references as arguments. The hook is not told which state the references were in before or are in now. (You can use post-receive for this.) A typical use case is a call to git update-server-info, which is necessary if you want to provide a repository via HTTP.
8.2.2.1. The Format of the Receive Hooks
The pre-receive and post-receive hooks receive the same input on standard input.
The format is the following:
<old-sha1> <new-sha1> <name-of-reference>
This can look like this, for example:
0000000...0000000 ca0e8cf...12b14dc refs/heads/newbranch
ca0e8cf...12b14dc 0000000...0000000 refs/heads/oldbranch
6618257...93afb8d 62dec1c...ac5373b refs/heads/master
A SHA-1 sum of all zeros means "not present". So the first line describes a reference that was not present before, while the second line means the deletion of a reference. The third line represents a regular update.
You can easily read the references with the following loop:
while read old new ref; do
    # ...
done
The variables old and new then contain the SHA-1 sums, while ref contains the name of the reference.
A git log $old..$new would list all new commits.
The standard output of the hook is forwarded to git send-pack on the side where git push was entered.
So you can forward any error messages or reports directly to the user.
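Building on this loop, a hook can distinguish the three cases by comparing against the all-zero ID. The following is a sketch only: the function name classify_updates is hypothetical, and the code assumes 40-character SHA-1 IDs as in the examples above.

```sh
# The all-zero SHA-1 ID that stands for "not present"
zero=0000000000000000000000000000000000000000

# Hypothetical helper: reads "<old> <new> <ref>" lines on standard input
# and reports what happens to each reference
classify_updates() {
    while read -r old new ref; do
        if [ "$old" = "$zero" ]; then
            echo "created $ref"
        elif [ "$new" = "$zero" ]; then
            echo "deleted $ref"
        else
            echo "updated $ref"
        fi
    done
}
```

In a pre-receive or post-receive hook, simply calling classify_updates would process the list that Git supplies on standard input.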
8.2.2.2. Sending E-Mails
A practical use of the post-receive hook is to send out emails as soon as new commits are available in the repository.
You can program this yourself, of course, but there is a ready-made script that comes with Git.
You can find it in the Git source directory under contrib/hooks/post-receive-email, and some distributions, such as Debian, also install it along with Git to /usr/share/doc/git/contrib/hooks/post-receive-email.
Once you have copied the hook into the hooks/ subdirectory of your bare repository and made it executable, you can adjust the configuration accordingly:
$ less config
...
[hooks]
    mailinglist = "Autor Eins <autor1@example.com>, autor2@example.com"
    envelopesender = "git@example.com"
    emailprefix = "[project] "
This means that for each push operation, one mail per reference is sent with a summary of the new commits.
The mail goes to all recipients defined in hooks.mailinglist and comes from hooks.envelopesender.
The subject line is prefixed with hooks.emailprefix, so that the mail can be filtered more easily.
More options are documented in the comments of the hook script.
8.2.2.3. The Update Hook
The update hook is called for each reference individually.
It is therefore particularly well suited to implement a kind of "access control" for certain branches.
In fact, the update hook is used by Gitolite (see Sec. 7.2, “Gitolite: Simple Git Hosting”) to decide whether a branch may be modified or not.
Gitolite implements the hook as a Perl script that checks whether the appropriate permission is present and terminates with a zero or non-zero return value accordingly.
8.2.2.4. Deployment via Hooks
Git is a version control system and knows nothing about deployment processes. However, you can use a server-side hook to implement a simple deployment procedure, e.g. for web applications. The update hook is less suitable for this, because it runs before the reference is changed; the post-update hook runs afterwards.
The following post-update hook will, if the master branch has changed, check out the new state to /var/www/www.example.com:
for ref in "$@"; do
    [ "$ref" = "refs/heads/master" ] || continue
    env GIT_WORK_TREE=/var/www/www.example.com git checkout -f
done
So as soon as you upload new commits via git push to the server's master branch, this hook will automatically update the web presence.
8.2.3. Applying Patches
The following hooks are each called by git am
when one or more patches are applied.
applypatch-msg
-
Is called before a patch is applied. The hook receives as its only parameter the file where the commit message of the patch is stored. The hook can change the message if necessary. A non-zero exit status causes git am not to accept the patch.
pre-applypatch
-
Called after a patch has been applied, but before the change is committed. A non-zero exit status causes git am not to accept the patch.
post-applypatch
-
Is called after a patch has been applied and committed.
The hooks installed by default execute the corresponding commit hooks commit-msg and pre-commit, if enabled.
8.2.4. Other Hooks
pre-rebase
-
Is executed before a rebase process starts. Gets as arguments the references that are also passed to the rebase command (e.g. for the git rebase master topic command, the hook gets the arguments master and topic). Based on the exit status, git rebase decides whether the rebase process is executed or not.
pre-push
-
Is executed before a push operation starts. Receives on standard input lines of the form <local-ref> <local-sha1> <remote-ref> <remote-sha1>. If the hook does not terminate successfully, the push process is aborted.
post-rewrite
-
Is called by commands that rewrite commits (currently only git commit --amend and git rebase). Receives a list in the format <old-sha1> <new-sha1> on standard input.
post-checkout
-
Is called after a checkout. The first two parameters are the old and the new commit that HEAD points to. The third parameter is a flag that indicates whether the branch was switched (1) or individual files were checked out (0).
post-merge
-
Will be executed if a merge was successfully completed. The hook gets a 1 as argument if the merge was a so-called squash merge, i.e. a merge that did not create a commit but only processed the files in the working tree.
pre-auto-gc
-
Is called before git gc --auto is executed. Prevents execution of the automatic garbage collection if the return value is not zero.
You can use hooks to teach Git "real" file permissions.
This is because Git does not record the exact access rights of a file alongside its contents.
Instead, Git only knows "executable" or "non-executable".[110]
The script stored in the Git source directory under contrib/hooks/setgitperms.perl provides a ready-made solution that you can integrate into hooks.
The script stores the real access rights in a .gitmeta file.
If you do the read-in (option -r) in the pre-commit hook and give the hooks post-checkout and post-merge the command to write permissions (option -w), the permissions of your files should be persistent.
See the comments in the file for the exact commands.
The access rights are only stable between checkouts: unless you check in the .gitmeta file and force the use of the hooks, clones of this repository will only get the "basic" access rights.
8.3. Writing Your Own Git Commands
Git follows the Unix philosophy of "one tool, one job" with its division into subcommands. It also divides the subcommands into two categories: Porcelain and Plumbing.
Porcelain refers to the "good porcelain" that is taken out of the cupboard for the end user: a tidy user interface and human-readable output. Plumbing commands, on the other hand, are mainly used for "plumbing work" in scripts and have a machine-readable output (usually line by line with unique separators).
In fact, a substantial part of the Porcelain commands is implemented as shell script.
They use the various plumbing commands internally, but present a comprehensible interface to the outside.
The commands rebase, am, bisect and stash are just a few examples.
It is therefore useful and easy to write your own shell scripts to automate frequently occurring tasks in your workflow. These could be scripts that control the release process of the software, create automatic changelogs or other operations tailored to the project.
Writing your own Git command is very easy:
You just have to place an executable file whose name starts with git- in a directory of your $PATH (e.g. in ~/bin).
If you type git <command> and <command> is neither an alias nor a known command, Git will simply try to run git-<command>.
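The mechanism can be tried out in a few lines. In the following sketch, the command name git-hello and the temporary directory are assumptions made for the demonstration; in practice you would place the script in a permanent directory such as ~/bin.

```sh
# Hypothetical demo directory; in practice e.g. ~/bin
bindir=$(mktemp -d)

# A trivial custom command called "hello"
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from a custom command"
EOF
chmod +x "$bindir/git-hello"

# Make the directory part of the search path and call the new subcommand
PATH="$bindir:$PATH"
export PATH
git hello
```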
Even if you can write scripts in any language you like, we recommend using shell scripts:
Not only are they easier to understand for outsiders, but above all, the typical operations used to combine Git commands (calling programs, redirecting output) are "intuitively" possible with the shell and do not require any complicated constructs.
When writing shell scripts, please pay attention to POSIX compatibility![111]
This includes in particular avoiding "bashisms".
All scripts presented in the following section can also be found online, in the script collection for this book.[113]
8.3.1. Initialization
Typically, you want to ensure that your script is executed in a repository.
For the necessary initialization tasks, Git offers the script git-sh-setup.
You should include this shell script directly after the shebang line using . (known as source in interactive shells):
#!/bin/sh
. "$(git --exec-path)/git-sh-setup"
If Git cannot detect a repository, git-sh-setup will abort.
The script will also abort if it is not running at the top level of a repository.
In both cases your script will not be executed and an error message will be displayed.
You can work around this behavior by setting the NONGIT_OK or SUBDIRECTORY_OK variable before the call.
Beside this initialization mechanism there are some functions available, which do frequently occurring tasks. Below is an overview of the most important ones:
cd_to_toplevel
-
Switches to the top level of the Git repository.
say
-
Outputs the arguments, unless GIT_QUIET is set.
git_editor
-
Opens the editor configured for Git on the specified files. It is better to use this function than to call $EDITOR "blindly"; Git itself uses that variable only as a fallback.
git_pager
-
Opens the pager defined for Git.
require_work_tree
-
The function terminates with an error message if the repository has no working tree, which is the case with bare repositories. So you should call this function as a safeguard if you want to access files from the working tree.
8.3.2. Position in the Repository
In scripts you will often need the information from which directory the script was called.
The Git command rev-parse offers some options for this.
The following script, stored under ~/bin/git-whereami, illustrates how to "find your way" within a repository.
#!/bin/sh

SUBDIRECTORY_OK=Yes
. "$(git --exec-path)/git-sh-setup"

gitdir="$(git rev-parse --git-dir)"
absolute="$(git rev-parse --show-toplevel)"
relative="$(git rev-parse --show-cdup)"
prefix="$(git rev-parse --show-prefix)"

echo "gitdir absolute relative prefix"
echo "$gitdir $absolute $relative $prefix"
The output looks like this:
$ git whereami
gitdir absolute relative prefix
.git /tmp/repo
$ cd very/deep
$ git whereami
gitdir absolute relative prefix
/tmp/repo/.git /tmp/repo ../../ very/deep/
Especially important is the prefix you get via --show-prefix.
If your command accepts file names and you want to find the blobs they correspond to in the object database, you must put this prefix in front of the file name.
If you are in the very/deep directory and give the script the file name README, it will find the corresponding blob in the current tree via very/deep/README.
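As a sketch of this lookup, the following helper combines the prefix with a file name and asks rev-parse for the blob that HEAD records under that path. The function name blob_for is hypothetical; the <rev>:<path> syntax is the usual way to address an object inside a tree.

```sh
# Hypothetical helper: print the ID of the blob that HEAD records for a
# file name given relative to the current directory
blob_for() {
    prefix=$(git rev-parse --show-prefix) || return 1
    git rev-parse --verify "HEAD:$prefix$1"
}
```

Called as blob_for README from within very/deep, the helper resolves HEAD:very/deep/README.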
8.3.3. List References: rev-list
The core of the plumbing commands is git rev-list (revision list).
Its basic function is to resolve one or more references to the SHA-1 sum(s) to which they correspond.
With git log <ref1>..<ref2> you display the commit messages from <ref1> (exclusive) to <ref2> (inclusive).
The git rev-list command resolves such a range to the individual commits that are affected and prints them line by line:
$ git rev-list master..topic
f4a6a973e38f9fac4b421181402be229786dbee9
bb8d8c12a4c9e769576f8ddeacb6eb4eedfa3751
c7c331668f544ac53de01bc2d5f5024dda7af283
So a script that operates on one or more commits can simply pass its arguments on to rev-list, in the same form that other Git commands understand.
This way your script can even handle complicated expressions.
You can use the command, for example, to check whether a fast-forward from one branch to another is possible.
A fast-forward from <ref1> to <ref2> is possible if Git can reach the commit marked by <ref1> in the commit graph of <ref2>.
In other words, there is no commit reachable from <ref1> that cannot also be reached from <ref2>.
#!/bin/sh

SUBDIRECTORY_OK=Yes
. "$(git --exec-path)/git-sh-setup"

[ $# -eq 2 ] || { echo "usage: $(basename "$0") <ref1> <ref2>"; exit 1; }

# Both arguments must be resolvable references
for i in "$1" "$2"
do
    if ! git rev-parse --verify "$i" >/dev/null 2>&1 ; then
        echo "Ref: '$i' does not exist!" && exit 1
    fi
done

one_two=$(git rev-list "$1..$2")
two_one=$(git rev-list "$2..$1")

[ "$(git rev-parse "$1")" = "$(git rev-parse "$2")" ] \
    && echo "$1 and $2 point to the same commit!" && exit 2

[ -n "$one_two" ] && [ -z "$two_one" ] \
    && echo "FF from $1 to $2 possible!" && exit 0
[ -n "$two_one" ] && [ -z "$one_two" ] \
    && echo "FF from $2 to $1 possible!" && exit 0

echo "FF not possible! $1 and $2 are diverged!" && exit 3
The calls to rev-parse in the for loop check that the arguments are references that Git can resolve to a commit (or another database object); if this fails, the script aborts with an error message.
The output of the script could look like this:
$ git check-ff topic master
FF from master to topic possible!
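If only the yes/no answer is needed, the ancestry test can also be delegated to git merge-base with its --is-ancestor option. This is an alternative to the rev-list comparison, not what the script above does; the function name can_ff is hypothetical.

```sh
# Hypothetical helper: succeeds if <ref1> can be fast-forwarded to <ref2>,
# i.e. if <ref1> is an ancestor of <ref2>
can_ff() {
    git merge-base --is-ancestor "$1" "$2"
}
```

Note that the helper also succeeds if both references point to the same commit, a case that the script above reports separately.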
For simple scripts, which expect only a limited number of options and arguments, a simple evaluation of these, as in the above script, is completely sufficient.
However, if you are planning a more complex project, the so-called |
8.3.4. Finding Changes
You can tell git diff and git log to display information about the files that a commit has changed by using the --name-status option:
$ git log -1 --name-status 8c8674fc9
commit 8c8674fc954d8c4bc46f303a141f510ecf264fcd
...
M git-pull.sh
M t/t5520-pull.sh
Each name is preceded by one of five flags[114], which are shown in the list below:
A
(added)-
File was added
D
(deleted)-
File was deleted
M
(modified)-
File was changed
C
(copied)-
File was copied
R
(renamed)-
File was renamed
The flags C
and R
are followed by a three-digit number indicating the percentage that has remained the same.
So if you duplicate a file, this corresponds to the output C100
.
A file that is renamed via git mv
and slightly modified in the same commit might show up as R094
, meaning that 94% of its content has remained the same.
$ git log -1 --name-status 0ecace728f
...
M       Makefile
R094    merge-index.c   builtin-merge-index.c
M       builtin.h
M       git.c
You can use these flags to search for commits that have changed a specific file using diff filters. For example, if you want to find out who added a file and when, use the following command:
$ git log --pretty=format:'added by %an %ar' --diff-filter=A -- cache.h
added by Linus Torvalds 6 years ago
You can pass several flags to a diff filter by writing them directly one after the other, e.g. --diff-filter=AD. The question "Who did most of the work on this file?" can often be answered by looking at whose commits modified the file most often. You can find this out, for example, as follows:
$ git log --pretty=format:%an --diff-filter=M -- cache.h | \
    sort | uniq -c | sort -rn | head -n 5
    187 Junio C Hamano
    100 Linus Torvalds
     27 Johannes Schindelin
     26 Shawn O. Pearce
     24 Jeff King
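To see how several flags combine in one diff filter, consider the following minimal sketch. It creates a throwaway repository in which a hypothetical file a.txt is added, modified and deleted, and then asks only for the commits that added or deleted it.

```shell
#!/bin/sh
# Sketch: combining several flags in one --diff-filter.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "add a.txt"
echo two >> a.txt
git commit -q -am "modify a.txt"
git rm -q a.txt
git commit -q -m "delete a.txt"

# A and D together: commits that added or deleted the file.
# The commit that merely modified it is filtered out.
subjects=$(git log --pretty=format:%s --diff-filter=AD -- a.txt)
echo "$subjects"
```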
8.3.5. The Object Database and rev-parse
The Git command rev-parse
(revision parse) is an extremely flexible tool whose task is, among other things, to translate expressions describing commits or other objects of the object database into their complete SHA-1 sum.
For example, the command converts abbreviated SHA-1 sums into the unique 40-character variant:
$ git rev-parse --verify be1ca37e5
be1ca37e540973bb1bc9b7cf5507f9f8d6bce415
The --verify
option is passed to make Git print an appropriate error message if the passed reference is not a valid one.
However, the command can also abbreviate a SHA-1 sum with the --short
option.
The default is seven characters:
$ git rev-parse --verify --short be1ca37e540973bb1bc9b7cf5507f9f8d6bce415
be1ca37
If you want to find out the name of the branch that is currently checked out (as opposed to the commit ID), use |
But rev-parse
(and with it every other Git command that accepts references as arguments) supports further ways of referencing objects.
<sha1>^{<type>}
-
Follows the reference
<sha1>
and resolves it to an object of type <type>
. This way you can find the tree belonging to a commit <commit>
by specifying <commit>^{tree}
. If you don’t specify an explicit type, the reference is resolved until Git finds an object that isn’t a tag (which is especially handy when you want to find the commit a tag points to).
Many git commands do not work on a commit, but on the trees that are referenced (e.g. the git diff
command, which compares files, i.e. tree entries).
The man pages call such arguments tree-ish.
Git accepts any reference here that can be resolved to a tree, and the command then continues to work with that tree.
<tree-ish>:<path>
-
Resolves the path
<path>
to the corresponding tree or blob (that is, a directory or a file). The referenced object is looked up in <tree-ish>
, which can be a tag, a commit or a tree.
The following example illustrates how this special syntax works:
The first command extracts the SHA-1 ID of the tree referenced by HEAD
.
The second command extracts the SHA-1 ID of the blob corresponding to the README
file at the top level of the git repository.
The third command then verifies that this really is a blob.
$ git rev-parse 'HEAD^{tree}'
89f156b00f35fe5c92ac75c9ccf51f043fe65dd9
$ git rev-parse 89f156b00f:README
67cfeb2016b24df1cb406c18145efd399f6a1792
$ git cat-file -t 67cfeb2016b
blob
A git show 67cfeb2016b
would now show the actual contents of the blob.
By redirecting with >
you can extract the blob as a file to the file system.
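The notations described above can be tried out in a small throwaway repository. The following sketch assumes an annotated tag named v1.0 and a file named README, both chosen only for illustration. It resolves the tag to the tag object, the commit, the tree and the blob, and then writes the blob to a file.

```shell
#!/bin/sh
# Sketch: resolving objects with rev-parse and extracting a blob.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'hello\n' > README
git add README
git commit -q -m "add README"
git tag -a -m "release 1.0" v1.0          # annotated tag: an object of its own

tag_id=$(git rev-parse v1.0)              # the tag object itself
commit_id=$(git rev-parse 'v1.0^{}')      # resolved until a non-tag is found
tree_id=$(git rev-parse 'v1.0^{tree}')    # tree of the tagged commit
blob_id=$(git rev-parse v1.0:README)      # blob of README inside that tree

for id in "$tag_id" "$commit_id" "$tree_id" "$blob_id"
do
    echo "$id is a $(git cat-file -t "$id")"
done

# Extract the blob to the file system by redirecting the output.
git show "$blob_id" > README.old
```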
The following script first finds the commit ID of the commit that last modified a particular file (the file is passed as the first argument, $1
).
Then the script extracts the file (with prefix, see above) from the predecessor of the commit ($ref~
) that last modified the file, and saves it in a temporary file.
Finally, Vim is called in diff mode on both versions, and the temporary file is deleted afterwards.
#!/bin/sh

SUBDIRECTORY_OK=Yes
. $(git --exec-path)/git-sh-setup

[ -z "$1" ] && echo "usage: $(basename $0) <file>" && exit 1

# commit that last modified the file
ref="$(git log --pretty=format:%H --diff-filter=M -1 -- "$1")"
git rev-parse --verify $ref >/dev/null || exit 1

# path of the current directory relative to the top level
prefix="$(git rev-parse --show-prefix)"
temp="$(mktemp .diff.$ref.XXXXXX)"

# version of the file in the predecessor of that commit
git show "$ref^:$prefix$1" > $temp

vim -f -d $temp "$1"
rm $temp
To resolve a lot of references with |
8.3.6. Iterating References: for-each-ref
A common task is to iterate over references.
For this, Git provides the general-purpose command for-each-ref
.
The general syntax is git for-each-ref --format=<format> <pattern>
.
You can use the pattern to restrict the references to be iterated over, e.g. to refs/heads or refs/tags
.
With the format expression you specify which properties of the reference should be output.
It consists of fields of the form %(fieldname)
, which are expanded to the corresponding values in the output.
refname
-
Name of the reference, e.g. refs/heads/master. The suffix
:short
shows the short form, i.e. master
.
objecttype
-
Type of object (blob, tree, commit or tag)
objectsize
-
Object size in bytes
objectname
-
Commit ID or SHA-1 sum
upstream
-
Remote-tracking branch that is configured as the upstream branch
Here is a simple example that displays the SHA-1 sums of all release candidates of version 1.7.1
:
$ git for-each-ref --format='%(objectname)--%(objecttype)--%(refname:short)' \
    refs/tags/v1.7.1-rc*
bdf533f9b47dc58ac452a4cc92c81dc0b2f5304f--tag--v1.7.1-rc0
d34cb027c31d8a80c5dbbf74272ecd07001952e6--tag--v1.7.1-rc1
03c5bd5315930d8d88d0c6b521e998041a13bb26--tag--v1.7.1-rc2
Note that the separators "--" are copied to the output verbatim; in this way you can add further characters for formatting.
Depending on the object type, other field names are also available, for example, for a tag the tagger
field, which contains the name of the tagger, the e-mail address and the date.
In addition, the fields taggername
, taggeremail
and taggerdate
are available, each containing only the name, the e-mail address or the date, respectively.
For example, if you want to know who has ever created a tag in a project:
$ git for-each-ref --format='%(taggername)' refs/tags | sort -u
Junio C Hamano
Linus Torvalds
Pat Thoyts
Shawn O. Pearce
As a further interface for scripting languages, Git offers the options --shell
, --python
, --perl
and --tcl
.
With them, the fields are formatted as string literals in the respective language, so that they can be evaluated with eval
and turned into variables:
$ git for-each-ref --shell --format='ref=%(refname)' refs/tags/v1.7.1.*
ref='refs/tags/v1.7.1.1'
ref='refs/tags/v1.7.1.2'
ref='refs/tags/v1.7.1.3'
ref='refs/tags/v1.7.1.4'
This can be used to write the following script, which prints a summary of all branches that have an upstream branch, including the SHA-1 sum of the most recent commit, its author, and the tracking status.
The output is very similar to git branch -vv
, but a bit more readable.
The authorname
field contains the name of the commit author, similar to taggername
.
The core is the eval "$data"
statement, which translates the line-by-line output of for-each-ref
into the variables used later.
#!/bin/sh

SUBDIRECTORY_OK=Yes
. $(git --exec-path)/git-sh-setup

git for-each-ref --shell --format=\
"refname=%(refname:short) "\
"author=%(authorname) "\
"sha1=%(objectname) "\
"upstream=%(upstream:short)" \
    refs/heads | while read data
do
    # turn the shell-quoted fields of this line into variables
    eval "$data"
    if [ -n "$upstream" ] ; then
        ahead=$(git rev-list $upstream..$refname | wc -l)
        behind=$(git rev-list $refname..$upstream | wc -l)
        echo $refname
        echo --------------------
        echo " Upstream: "$upstream
        echo " Last author: "$author
        echo " Commit-ID "$(git rev-parse --short $sha1)
        echo -n " Status: "
        [ $ahead -gt 0 ] && echo -n "ahead:"$ahead" "
        [ $behind -gt 0 ] && echo -n "behind:"$behind" "
        [ $behind -eq 0 ] && [ $ahead -eq 0 ] && echo -n "synchron!"
        echo
    fi
done
The output will look like this:
$ git tstatus
maint
--------------------
 Upstream: origin/maint
 Last author: João Britto
 Commit-ID 4c007ae
 Status: synchron!
master
--------------------
 Upstream: origin/master
 Last author: Junio C Hamano
 Commit-ID 4e3aa87
 Status: synchron!
next
--------------------
 Upstream: origin/next
 Last author: Junio C Hamano
 Commit-ID 711ff78
 Status: behind:22
pu
--------------------
 Upstream: origin/pu
 Last author: Junio C Hamano
 Commit-ID dba0393
 Status: ahead:43 behind:126
The other field names as well as examples can be found in the git-for-each-ref(1)
man page.
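Besides --format, for-each-ref also accepts the options --sort and --count, which are often useful in scripts. The following sketch, run in a throwaway repository with the hypothetical branches old and recent, finds the branch with the most recent commit. The committer dates are set explicitly so that the order is reproducible.

```shell
#!/bin/sh
# Sketch: sorting and limiting the output of for-each-ref.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

# Fixed committer dates make the sort order reproducible.
GIT_COMMITTER_DATE="2020-01-01 12:00:00 +0000" \
    git commit -q --allow-empty -m "base"
git branch old
git checkout -q -b recent
GIT_COMMITTER_DATE="2020-02-01 12:00:00 +0000" \
    git commit -q --allow-empty -m "newer work"

# Newest commit first; --count=1 keeps only the first reference.
newest=$(git for-each-ref --sort=-committerdate --count=1 \
    --format='%(refname:short)' refs/heads)
echo "most recently committed branch: $newest"
```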
8.3.7. Rewrite References: git update-ref
If you use for-each-ref
, you usually want to edit references as well, which is what the update-ref
command is for.
With it you can create references and safely change or delete them.
Basically git update-ref
works with two or three arguments:
git update-ref <ref> <new-value> [<oldvalue>]
Here is an example that moves master
to HEAD^
only if it currently points to HEAD
:
$ git update-ref refs/heads/master HEAD^ HEAD
Or to create a new reference topic
at ea0ccd3
:
$ git update-ref refs/heads/topic ea0ccd3
To delete references there is the option -d
:
git update-ref -d <ref> [<oldvalue>]
For example to delete the reference topic
again:
$ git update-ref -d refs/heads/topic ea0ccd3
Of course, you could also manipulate the references with commands like echo <sha> > .git/refs/heads/<ref>
, but update-ref
brings various safeguards and helps to minimize possible damage.
The <oldvalue>
argument is optional, but helps to avoid programming errors.
In addition, update-ref takes care of special cases (symlinks whose target is inside or outside the repository, references pointing to other references, etc.).
An additional advantage is that git update-ref
automatically makes entries in the reflog, which makes troubleshooting much easier.
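The following minimal sketch shows the safeguard provided by the optional old value. It runs in a throwaway repository; the branch name topic and the variable names are chosen only for illustration.

```shell
#!/bin/sh
# Sketch: update-ref refuses to act if <oldvalue> does not match.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

git commit -q --allow-empty -m "first";  first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"; second=$(git rev-parse HEAD)
git commit -q --allow-empty -m "third";  third=$(git rev-parse HEAD)

git update-ref refs/heads/topic "$second"     # create topic at "second"

# Wrong <oldvalue>: topic is at "second", not at "first", so Git refuses.
if git update-ref refs/heads/topic "$third" "$first" 2>/dev/null; then
    guarded=moved
else
    guarded=refused
fi

# Correct <oldvalue>: the reference is moved.
git update-ref refs/heads/topic "$third" "$second"
moved_to=$(git rev-parse refs/heads/topic)

# Deleting works the same way; here the expected value matches.
git update-ref -d refs/heads/topic "$third"
if git rev-parse --verify -q refs/heads/topic >/dev/null; then
    deleted=no
else
    deleted=yes
fi
echo "first attempt: $guarded, deleted: $deleted"
```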
8.3.8. Extended Aliases
If all you need is a one-liner, it is usually not worthwhile to write a separate script.
Git aliases were developed for this use case.
An alias can call external programs if you prefix the command with an exclamation mark, for example to simply call gitk --all
with git k
:
$ git config --global alias.k '!gitk --all'
Another example, which chains several commands to delete all branches that have already been merged:
prune-local = !git branch --merged | grep -v ^* | xargs git branch -d
With certain constructs, you may want to rearrange the arguments passed to the alias or use them within a command chain. The following trick is suitable for this, where a shell function is built into the alias:
$ git config --global alias.demo '!f(){ echo $2 $1 ; }; f'
$ git demo foo bar
bar foo
This allows even more complex one-liners to be defined elegantly as aliases. The following construction determines, for a given file, which authors made how many commits that changed the file. If you send patches to the Git project’s mailing list, you are asked to CC the main authors of the files you changed. Use this alias to find out who they are.
who-signed = "!f(){ git log -- $1 | \
    grep Signed-off-by | sort | uniq --count | \
    sort --human-numeric-sort --reverse |\
    sed 's/Signed-off-by: / /' | head ; } ; f "
There are some things to consider here:
A shell alias is always executed from the top-level directory of the repository, so the argument must contain the path inside the repository.
The alias also relies on everyone involved having signed off on their commits with a Signed-off-by
line, because these lines are used to generate the statistics.
Since the alias is spread over several lines, it must be enclosed in quotes, otherwise Git cannot interpret the alias correctly.
The final call to head
limits the output to the top ten authors:
$ git who-signed Documentation/git-svn.txt
     46  Junio C Hamano <gitster@pobox.com>
     30  Eric Wong <normalperson@yhbt.net>
     27  Junio C Hamano <junkio@cox.net>
      5  Jonathan Nieder <jrnieder@uchicago.edu>
      4  Yann Dirson <ydirson@altern.org>
      4  Shawn O. Pearce <spearce@spearce.org>
      3  Wesley J. Landaker <wjl@icecavern.net>
      3  Valentin Haenel <valentin.haenel@gmx.de>
      3  Ben Jackson <ben@ben.com>
      3  Adam Roben <aroben@apple.com>
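That aliases starting with an exclamation mark run in the top-level directory can be checked with a small experiment. The sketch below assumes a Git version that exports the variable GIT_PREFIX to such aliases; it contains the subdirectory from which the alias was called. The alias name where and the file marker.txt are hypothetical, and the alias is stored in the configuration of a throwaway repository rather than globally.

```shell
#!/bin/sh
# Sketch: a "!" alias runs at the top level; GIT_PREFIX holds the
# directory it was called from.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
git commit -q --allow-empty -m "first"

touch marker.txt                  # exists only in the top-level directory
mkdir sub

# The test for marker.txt only succeeds if the alias runs at the top level.
git config alias.where \
    '!test -f marker.txt && echo "at top level, called from: ${GIT_PREFIX:-.}"'

out=$(cd sub && git where)
echo "$out"
```

An alias that takes a path argument can prepend this variable to the argument if it should also work from subdirectories.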
Further interesting ideas and suggestions can be found in the Git-Wiki on the page about aliases.[115]
8.4. Rewriting Version History
The previously introduced git rebase
command and its interactive mode allow developers to edit commits at will.
Code that is still in development can be "cleaned up" before it is integrated (e.g. via merge) and thus permanently merged with the software.
But what if all commits are to be changed afterwards, or at least a large part of them? Such requirements arise, for example, when a previously private project is to be published, but sensitive data (keys, certificates, passwords) are included in the commits.
Git offers the filter-branch
command to automate this task.
Basically, it works like this:
You specify a set of references that Git should rewrite.
You also define commands that are responsible for modifying the commit message, tree contents, commits, etc. Git goes through each commit and applies the appropriate filter to the appropriate part.
The filters are executed via eval
in the shell, so they can be complete commands or names of scripts.
The following list describes the filters that Git offers:
--env-filter
-
Can be used to adjust the environment variables under which the commit is rewritten. Especially the variables
GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL,DATE}
can be exported with new values if needed.
--tree-filter
-
Creates a checkout for each commit to be rewritten, changes to that directory and executes the filter. Afterwards, new files are automatically added, missing ones are removed, and all changes are recorded.
--index-filter
-
Manipulates the index. Behaves similarly to the tree filter, except that Git doesn’t create a checkout, which makes the index filter faster.
--msg-filter
-
Receives the commit message on standard input and prints the new message on standard output.
--commit-filter
-
Is called instead of
git commit-tree
and can thus in principle make several commits from one. See the man page for details.
--tag-name-filter
-
Is called for every tag name that points to a commit that has been rewritten. If you use
cat
as the filter, the tags keep their names and are moved to the rewritten commits.
--subdirectory-filter
-
Only considers the commits that modify the specified directory. The rewritten history contains only this directory, as the topmost directory in the repository.
The general syntax of the command is:
git filter-branch <filter> -- <references>
.
Here <references>
is an argument for rev-parse
, so it can be one or more branch names, a syntax of the form <ref1>..<ref2>
or simply --all
for all references.
Note the double dash --
, which separates the arguments for filter-branch
from those for rev-parse
!
As soon as one of the filters does not end with the return value zero on a commit, the whole rewrite process will abort.
So be careful to catch possible error messages or ignore them by appending || true
.
The original references are stored under refs/original/
, so when you rewrite the master
branch, refs/original/refs/heads/master
still points to the original, unrewritten commit (and thus to its predecessors).
If this backup reference already exists, the filter-branch
command will refuse to rewrite the reference unless you specify the -f
option for force.
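The following minimal sketch shows these mechanics on a throwaway repository: a message filter corrects a typo in all commit messages, the backup reference is created, and a second run without -f is refused. It assumes that git filter-branch is installed. The variable FILTER_BRANCH_SQUELCH_WARNING only suppresses the warning that recent Git versions print before running the command; to my knowledge older versions simply ignore it.

```shell
#!/bin/sh
# Sketch: --msg-filter, the backup reference and the need for -f.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

git commit -q --allow-empty -m "fix tpyo"
git commit -q --allow-empty -m "another tpyo fixed"
before=$(git rev-parse master)

# Assumption: recent Git versions warn and pause unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

# The filter reads each message on standard input and prints the new one.
git filter-branch --msg-filter 'sed "s/tpyo/typo/g"' -- master >/dev/null 2>&1

subjects=$(git log --pretty=format:%s master)
backup=$(git rev-parse refs/original/refs/heads/master)

# A second run without -f is refused because the backup already exists.
if git filter-branch --msg-filter cat -- master >/dev/null 2>&1; then
    second_run=accepted
else
    second_run=refused
fi
echo "$subjects"
echo "second run: $second_run"
```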
You should always do your |
The following examples deal with some typical use cases of the filter-branch
command.
8.4.1. Removing Sensitive Information Afterwards
Ideally, sensitive data such as keys, certificates or passwords are not part of a repository. Large binary files or other data junk are equally unwelcome, because they inflate the size of the repository unnecessarily.
Open source software whose use is permitted but whose distribution is prohibited by its license terms ('no distribution') must of course not appear in a repository that you make available to the public.
In all these cases you can rewrite the project history so that nobody can find out that the corresponding data ever appeared in the version history of the project.
If you are working with git tags, it is always a good idea to pass the |
To delete only some files or subdirectories from the entire project history, use a simple index filter. All you have to do is tell Git to remove the corresponding entries from the index:
$ git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch <file>' \
    --prune-empty -- --all
The --cached
and --ignore-unmatch
arguments tell git rm
to remove only the index entry, and not to abort with an error if the corresponding entry does not exist (e.g. because the file was not added until a particular commit). If you want to delete directories, you must also specify -r
.
The argument --prune-empty
makes sure that commits which do not change the tree after applying the filter are omitted.
So if you have added a certificate with a commit, and this commit becomes an "empty" commit by removing the certificate, Git will omit it altogether.
Similar to the command above, you can also move files or directories with git mv
.
If the operations are a bit more complex, you should consider designing several simple filters and calling them one after the other.
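To see the effect of the index filter together with --prune-empty, you can try it on a throwaway repository first. In the following sketch the file names main.c and secret.key are hypothetical; the variable FILTER_BRANCH_SQUELCH_WARNING merely suppresses the warning that recent Git versions print.

```shell
#!/bin/sh
# Sketch: removing a file from all commits with an index filter.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "add main.c"
echo "s3cret" > secret.key
git add secret.key
git commit -q -m "add key"        # this commit only adds the key
echo "more code" >> main.c
git commit -q -am "extend main.c"

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch secret.key' \
    --prune-empty -- --all >/dev/null 2>&1

# The commit that only added the key became empty and was dropped.
commits=$(git rev-list --count master)
touching=$(git log --pretty=format:%s master -- secret.key)
echo "commits left: $commits"
```

Note that the old commits are still reachable through the backup reference at this point, so the key has not yet disappeared from the object database.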
It is possible that a file you want to delete had a different name in the past.
To check this, use the command |
8.4.1.1. Removing Strings from Files
If you don’t want to change whole files, but only certain lines in all commits, a filter at index level is not sufficient. You must use a tree filter.
For each commit, Git will check out the relevant tree, change to the appropriate directory, and then run the filter.
Any changes you make will be applied (without you having to use git add
etc.).
To erase the password v3rYs3cr1T
from all files and commits, the following commands are required:
$ git filter-branch --tree-filter 'git ls-files -z | \
    xargs -0 -n 1 sed -i "s/v3rYs3cr1T/PASSWORD/g" \
    2>/dev/null || true' -- master
Rewrite cbddbd3505086b79dc3b6bd92ac9f811c8a6f4d1 (142/142)
Ref 'refs/heads/master' was rewritten
The command performs an in-place replacement with sed
on every file in the repository.
Error messages are suppressed, and failures do not cause the filter-branch
call to abort.
After the references have been rewritten, you can use the pickaxe tool (-G<expression>
, see Sec. 2.1.6, “Examining the Project History”) to verify that no commit really introduces the string v3rYs3cr1T
anymore:
$ git log -p -G"v3rYs3cr1T" # should not produce any output
Tree filters must check out the appropriate tree for each commit.
This creates a considerable overhead for many commits and many files, so a By specifying |
8.4.1.2. Renaming a Developer
To rename a developer, change the variable GIT_AUTHOR_NAME
in an environment filter wherever it holds the old name.
For example:
$ git filter-branch -f --env-filter \
    'if [ "$GIT_AUTHOR_NAME" = "Julius Plenz" ]; then
        export GIT_AUTHOR_NAME="Julius Foobar";
    fi' -- master
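A developer usually appears both as author and as committer, and often the e-mail address should change as well. The following sketch extends the idea accordingly, using the other variables from the filter list above. It runs on a throwaway repository, and all names and addresses are made up for illustration.

```shell
#!/bin/sh
# Sketch: env filter that rewrites author and committer, name and e-mail.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Old Name"
git config user.email "old@example.com"

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

export FILTER_BRANCH_SQUELCH_WARNING=1   # only silences a warning
git filter-branch --env-filter '
if [ "$GIT_AUTHOR_NAME" = "Old Name" ]; then
    export GIT_AUTHOR_NAME="New Name"
    export GIT_AUTHOR_EMAIL="new@example.com"
fi
if [ "$GIT_COMMITTER_NAME" = "Old Name" ]; then
    export GIT_COMMITTER_NAME="New Name"
    export GIT_COMMITTER_EMAIL="new@example.com"
fi' -- master >/dev/null 2>&1

# Collect every distinct author/committer combination in the new history.
people=$(git log --pretty=format:'%an <%ae> / %cn <%ce>' master | sort -u)
echo "$people"
```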
8.4.2. Extracting a Subdirectory
The Subdirectory filter allows you to rewrite the commits so that a subdirectory of the current repository becomes the new top-level directory. All other directories and the former top-level directory are dropped. Commits that have not changed anything in the new subdirectory are also dropped.
In this way, you can, for example, extract the version history of a library from a larger project. The exchange between the extracted project and the base project can work via submodules or subtree-merges (see Sec. 5.11, “Managing Subprojects”).
To split the directory t/
(containing the test suite) from the git source repository, the following command is sufficient:
$ git filter-branch --subdirectory-filter t -- master
Rewrite 2071fb015bc673d2514142d7614b56a37b3faaf2 (5252/5252)
Ref 'refs/heads/master' was rewritten
Attention: This command runs for several minutes.
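On a small throwaway repository the effect is easy to follow. In the sketch below, the directory lib and the file names are hypothetical: of three commits, only the two that touch lib survive, and the contents of lib end up at the top level.

```shell
#!/bin/sh
# Sketch: extracting a subdirectory with --subdirectory-filter.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir lib
echo "lib code" > lib/util.c
echo "app code" > app.c
git add .
git commit -q -m "initial import"
echo "app change" >> app.c
git commit -q -am "change app only"     # does not touch lib/
echo "lib change" >> lib/util.c
git commit -q -am "change lib only"

export FILTER_BRANCH_SQUELCH_WARNING=1   # only silences a warning
git filter-branch --subdirectory-filter lib -- master >/dev/null 2>&1

files=$(git ls-tree --name-only master)  # top-level entries after rewriting
commits=$(git rev-list --count master)
echo "top-level entries: $files"
echo "commits left: $commits"
```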
8.4.3. Grafts: Subsequent Merges
Git provides a way to simulate merges via so-called graft points, or grafts for short.
Such grafts are stored line by line in the file .git/info/grafts
and have the following format:
commit [parent1 [parent2 ...]]
In addition to the information that Git gets from the commit metadata, you can also specify one or more parents for any commits.[116]
Make sure that the repository still forms a DAG and that you do not create any cycles:
Do not define HEAD
as the predecessor of the root commit!
The grafts file is not part of the repository, so a git clone
does not copy this information; it merely helps Git find a merge base.
However, when filter-branch
is called, this graft information is hard-coded into the commits.
This is especially useful in two cases: if you import an old version history from a tool that cannot handle merges correctly (e.g. older Subversion versions), or if you want to "glue" two version histories together.
Let’s assume the development was switched to Git. But nobody has taken care of converting the old version history. So the new repository was started with an initial commit that reflected the state of the project at that time.
Meanwhile, you’ve successfully converted the old version history to Git, and now you want to append it before the initial commit (or instead). To do this, proceed as follows:
$ cd <new-repository>
$ git fetch <old-repository> master:old-master
... importing converted commits ...
You now have a multi-root repository.
You then need to find the initial commit of the new repository ($old_root
) and define the latest commit of the old, converted repository ($old_tip
) as its predecessor:
$ old_root=`git rev-list --reverse master | head -n 1`
$ old_tip=`git rev-parse old-master`
$ echo $old_root $old_tip > .git/info/grafts
Look at the result with Gitk or a similar program.
If you are satisfied, you can make the grafts permanent (all commits starting at $old_tip
are rewritten).
To do this, call git filter-branch
without specifying any filters:
$ git filter-branch -- $old_tip..
Rewrite 1591ed7dbb3a683b9bf1d880d7a6ef5d252fc0a0 (1532/1532)
Ref 'refs/heads/master' was rewritten
$ rm .git/info/grafts
Of course you also have to delete the remaining backup references (see below).
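Newer Git versions offer git replace --graft as another way to record the same kind of graft. The following sketch is not the procedure described above, but a hedged variant of it under the assumption that your Git version supports this option. It glues two unrelated histories together in a throwaway repository; the branch names old-master and new are chosen for illustration.

```shell
#!/bin/sh
# Sketch: gluing two histories together with "git replace --graft".
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

# The "old", converted history with two commits.
git commit -q --allow-empty -m "old 1"
git commit -q --allow-empty -m "old 2"
git branch old-master

# The "new" history, started from scratch with its own root commit.
git checkout -q --orphan new
git commit -q --allow-empty -m "new root"
git commit -q --allow-empty -m "new 2"

new_root=$(git rev-list --max-parents=0 new)   # root commit of the new history
old_tip=$(git rev-parse old-master)            # latest commit of the old history

# Record the old tip as the parent of the new root commit.
git replace --graft "$new_root" "$old_tip"

commits=$(git rev-list --count new)   # now includes the old commits as well
echo "commits reachable from new: $commits"
```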
8.4.4. Deleting Old Commits
After you have removed any sensitive data from all commits, you still need to make sure that these old commits do not reappear. In the repository you rewrote, this is done in three steps:
1. Delete the backup references under refs/original/. You can do this with the following command:
$ git for-each-ref --format='%(refname)' -- 'refs/original/' | \
    xargs -n 1 git update-ref -d
If you have not yet rewritten or deleted old tags or other branches, you must of course do this first.
2. Delete the reflog:
$ git reflog expire --verbose --expire=now --all
3. Delete the (orphaned) commits that are no longer reachable. The best way to do this is to use the gc option --prune, which sets how long a commit must have been unreachable before it is deleted; in this case: now.
$ git gc --prune=now
If other developers are working with an outdated version of the repository, they must now "migrate". It is essential that they do not use their development branches to pull old commits back into the cleaned up repository.
The best way to do this is to clone the new repository, fetch important branches from the old repository using git fetch
, and rebase directly on the new commits.
You can then dispose of the old commits using git gc --prune=now
.
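The three steps can be tried out end to end on a throwaway repository. The following sketch uses the hypothetical file names main.c and secret.key; it removes the key from the history and then checks that the old commit is really gone from the object database.

```shell
#!/bin/sh
# Sketch: rewriting history and then removing the old commits for good.
set -e
repo=$(mktemp -d)                 # throwaway repository for the demonstration
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "code" > main.c
git add main.c
git commit -q -m "add main.c"
echo "s3cret" > secret.key
git add secret.key
git commit -q -m "add key"
old_commit=$(git rev-parse HEAD)  # the commit that must disappear

export FILTER_BRANCH_SQUELCH_WARNING=1   # only silences a warning
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch secret.key' \
    --prune-empty -- master >/dev/null 2>&1

# Step 1: delete the backup references.
git for-each-ref --format='%(refname)' refs/original/ | \
    xargs -n 1 git update-ref -d
# Step 2: delete the reflog.
git reflog expire --expire=now --all
# Step 3: delete the orphaned objects.
git gc -q --prune=now

if git cat-file -e "$old_commit" 2>/dev/null; then
    old=present
else
    old=gone
fi
echo "old commit is $old"
```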