9. Interacting with Other Version Control Systems

Git has interfaces to other version control systems, which are important for two basic use cases:

Bidirectional communication

You want to develop locally in a Git repository, but also transfer the changes to an external repository or import changes from there to Git.

Migration

You want to import the version history stored in an existing repository of another system into Git.

Git offers the following interfaces — all of which allow two-way communication and complete conversion:

Subversion (svn)

The git-svn tool provides all the essential subcommands for dealing with Subversion repositories and is discussed in detail in this chapter. The program is implemented in Perl and uses the Perl bindings for Git and Subversion. It is maintained together with the Git sources in the git.git repository (stored as git-svn.perl). Note: the tool is named git-svn, but it is invoked in the usual way as git svn <command>. The technical documentation is available in the git-svn(1) man page.

Concurrent Versions System (cvs)

The git cvsimport command imports and synchronizes a CVS repository — its counterpart is git cvsexportcommit.

Perforce (p4)

With git p4 you address repositories of the proprietary Perforce system.

For interaction with other version control systems, there are also many additional tools and scripts that improve, extend, and in part replace the commands mentioned. Interfaces to further version control systems, such as Mercurial, are available as well. If the commands and recipes described in this chapter are not sufficient, it is worth searching the internet. As a first starting point we recommend the Git-Wiki.[117]

In addition to its immediate communication capabilities with other systems, Git has its own simple plain-text protocol that lets you translate the version history from any system in such a way that Git creates a repository from it. For a detailed description including an example, see Sec. 9.2, “Custom Importers” about Fast Import.

9.1. Subversion

The following is about how to use git-svn. We’ll show you how to convert Subversion repositories and how to use it to exchange changes between a Subversion repository and Git.

9.1.1. Conversion

The goal is to transfer the version history from a Subversion repository to a Git repository. Before you start, you will need to make preparations that may take some time, depending on the size of your project. However, good preparation helps you to avoid mistakes from the start.

9.1.1.1. Preparation

You should have the following information at hand:

  1. Who are the authors? What are their e-mail addresses?

  2. How is the repository structured? Are there branches and tags?

  3. Should metadata about the Subversion revision be stored in the git commits?

Later, you will run the command git svn clone. The answers to the above questions will determine which options and arguments you use to do this.

Our experience has shown that a single conversion attempt is rarely sufficient. If the Subversion repository is not already local, it is definitely worth making a local copy of it, so that you do not have to download the revisions over the network again on a second attempt. You can use rsvndump, for example, to do this.[118]

Subversion uses less extensive author metadata than Git; revisions are simply marked with a Subversion username, and there is no difference between the author and committer of a revision. In order for git-svn to convert Subversion usernames to the full names with email addresses typical of Git, a so-called authors file is required:

jplenz  = Julius Plenz <julius@plenz.com>
vhaenel = Valentin Haenel <valentin.haenel@gmx.de>

The file, e.g. authors.txt, is later passed to git-svn via --authors-file= or -A.

The following one-liner determines all Subversion usernames and helps you to create the file:

$ svn log --xml | grep author | sed 's_^.*>\(.*\)<.*$_\1_' | \
  sort --unique

If you do not specify an authors file when converting (or if an author is missing), git-svn will use the Subversion username as the author. The e-mail address is composed of the Subversion username and the UUID of the Subversion repository.
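
The usernames printed by the one-liner above can be turned into a skeleton authors file. The following is only a minimal sketch: the function name make_authors_skeleton and the placeholder domain example.com are assumptions made for illustration, and the real names and addresses still have to be filled in by hand.

```sh
#!/bin/sh
# Read Subversion usernames (one per line) on standard input and print
# one authors-file line per name. Full name and e-mail address are
# placeholders (assumption) that must be edited afterwards.
make_authors_skeleton() {
    while read -r user
    do
        [ -n "$user" ] || continue          # skip empty lines
        printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}

# Example with the two usernames used in this section:
printf 'jplenz\nvhaenel\n' | make_authors_skeleton
```

In practice you would append "| make_authors_skeleton > authors.txt" to the svn log pipeline and then edit the resulting file.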

Find out how the repository is structured in the next step. The following questions will help:

  1. Does the repository have a so-called trunk (main development thread), branches and tags?

    1. If so, is the default Subversion layout (trunk/, branches/, tags/) used?

    2. If not, in which directories are trunk, branches and tags then?

  2. Are only a single or multiple projects managed in the repository?

If the project follows the Subversion standard layout (Figure 47, “Standard Subversion layout”), use the argument --stdlayout or -s for short.

Figure 47. Standard Subversion layout

9.1.1.2. SVN Metadata

The --no-metadata argument prevents additional metadata from being included in the commit message. To what extent this makes sense for your use case is up to you to decide. From a technical standpoint, metadata is only necessary if you want to continue to interact with the Subversion repository. However, it may also be helpful to preserve the metadata, for example if you use the Subversion revision number in your bug tracking system.

The SVN metadata appears in the last line of each commit message and takes the following form:

git-svn-id: <URL>@<Revision> <UUID>

<URL> is the URL of the Subversion repository, <Revision> is the Subversion revision, and <UUID> (Universally Unique Identifier) is a sort of “fingerprint” of the Subversion repository. For example:

git-svn-id: file:///demo/trunk@8 2423f1c7-8de6-44f9-ab07-c0d4e8840b78
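
If you later need the individual parts of such a line, for example to look up a revision number in a bug tracker, they can be separated with plain shell parameter expansion. This is a sketch that assumes the line has exactly the form shown above; the variable names are chosen for illustration only.

```sh
#!/bin/sh
# Split a git-svn-id line into URL, revision and UUID.
line='git-svn-id: file:///demo/trunk@8 2423f1c7-8de6-44f9-ab07-c0d4e8840b78'

rest=${line#git-svn-id: }     # drop the key at the beginning
uuid=${rest##* }              # everything after the last space
url_rev=${rest% *}            # everything before the last space
rev=${url_rev##*@}            # everything after the last "@"
url=${url_rev%@*}             # everything before the last "@"

echo "URL:      $url"
echo "Revision: $rev"
echo "UUID:     $uuid"
```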

9.1.1.3. Specifying a Username

How you specify the user name depends on the transport protocol. For those where Subversion handles authentication (e.g. http, https, and svn), use the --username option. For others (svn+ssh), you must specify the username as part of the URL, for example, svn+ssh://USER@svn.example.com.

9.1.1.4. Converting Standard Layouts

You can convert an SVN repository in standard layout with the following call (after you have created an authors file):

$ git svn clone <http://svn.example.com/> -s -A <authors.txt> \
    --no-metadata <project-converted>

9.1.1.5. Non-Standard Layouts

If the repository is not laid out according to the Subversion standard layout, adjust the call to git svn accordingly: Instead of --stdlayout, explicitly specify the trunk with --trunk or -T, the branches with --branches or -b, and the tags with --tags or -t — if, for example, several projects are managed in one Subversion repository (Figure 48, “Non-Standard Layout”).

Figure 48. Non-Standard Layout

To convert project1, the call would be as follows:⁠[119]

$ git svn clone <http://svn.example.com/> -T trunk/project1 \
  -b branches/project1 -t tags/project1 \
  -A <authors.txt> <project1-converted>

An SVN repository without branches or tags can simply be cloned by using the URL of the project directory and omitting --stdlayout entirely:

$ git svn clone <http://svn.example.com/project> -A authors.txt \
    --no-metadata <project-converted>

If several independent projects are managed in one repository, we recommend that you create a separate Git repository for each project. Unlike Subversion, Git is not suitable for managing multiple projects in one repository. The object model means that the development histories (commit graphs) would become inextricably linked. How to “link” projects from different Git repositories is described in Sec. 5.11, “Managing Subprojects”.

9.1.1.6. Postprocessing

Once git svn clone has run, you’ll usually need to do a bit of rework on the repository.

During conversion, git-svn ignores all Subversion properties except svn:executable. If the Subversion repository uses svn:ignore properties to exclude files, you can translate them into a .gitignore file (or, recursively, into several .gitignore files):

$ git svn create-ignore

The .gitignore files are only created and added to the index — you still have to check them in.

git-svn creates special Git branches under remotes/origin for the Subversion trunk and the Subversion branches and tags. They are very similar to remote tracking branches in that they reflect the state of the Subversion repository; they are Subversion tracking branches, so to speak. They are mainly used for bidirectional communication and are updated when synchronizing with the Subversion repository. However, if you only want to convert the repository, these branches are of no further use and should be rewritten to “real” Git branches (see below).

A Subversion tracking branch is created for the trunk and for each Subversion branch.[120] For each Subversion tag, a Subversion tracking branch is created as well (not a Git tag, see below), but under remotes/origin/tags.

Assume that the Subversion repository has the following Subversion branches and tags:

Figure 49. Example Subversion branches and tags

In this case git svn creates the following git branches:

Figure 50. Converted Git Branches

You can adjust the prefix with the option --prefix=. For example, with --prefix=svn/, all converted references are stored under remotes/svn/ instead of remotes/origin/.

As already mentioned, git-svn does not create Git tags for Subversion tags. This is because, from a technical point of view, Subversion tags are hardly different from Subversion branches. They are also created with svn copy and, unlike Git tags, can be changed afterwards. To be able to track such updates, Subversion tags are therefore also represented as Subversion tracking branches. Like the Subversion branches, they are of no use in a converted repository (and rather cause confusion), and should be converted to real Git tags.

If you want to keep the Subversion branches and tags, you should translate the Subversion tracking branches into local Git branches or lightweight Git tags. The following shell script git-convert-refs will help you in the first step:⁠[121]

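#!/bin/sh
# git-convert-refs: turn Subversion tracking branches into Git tags and branches.

. "$(git --exec-path)/git-sh-setup"
svn_prefix='svn/'

# Create ref $2 pointing to the same commit as ref $1, then delete $1.
convert_ref(){
  printf 'converting: %s to: %s ...' "$1" "$2"
  git update-ref "$2" "$1"
  git update-ref -d "$1"
  echo "done"
}

# Print the full names of all refs below $1, one per line.
get_refs(){
  git for-each-ref --format='%(refname)' "$1"
}

echo 'Converting svn tags'
get_refs "refs/remotes/${svn_prefix}tags" | while read -r svn_tag
do
  new_ref=$(echo "$svn_tag" | sed -e "s|remotes/$svn_prefix||")
  convert_ref "$svn_tag" "$new_ref"
done

# The tag refs are gone by now, so only the branches (and trunk) remain.
echo 'Converting svn branches'
get_refs "refs/remotes/${svn_prefix}" | while read -r svn_branch
do
  new_ref=$(echo "$svn_branch" | sed -e "s|remotes/$svn_prefix|heads/|")
  convert_ref "$svn_branch" "$new_ref"
done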

The script assumes that the repository was converted with the --prefix=svn/ option. The two while loops do the following:

  • A Git tag is created for each Subversion tracking branch that corresponds to a Subversion tag (e.g. refs/remotes/svn/tags/v1.0 → refs/tags/v1.0).

  • For each Subversion tracking branch that corresponds to a Subversion branch, a “real” local Git branch is created (e.g. refs/remotes/svn/bugfix → refs/heads/bugfix).

The script uses the plumbing commands git for-each-ref, which prints references matching the given expression line by line, and git update-ref, which rewrites and deletes references.⁠[122]
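
To see the two plumbing commands in isolation, you can try them in a throwaway repository. The following sketch does not need Subversion: it fakes a single Subversion tracking ref by hand (an assumption made for the demonstration) and then translates it the same way convert_ref does.

```sh
#!/bin/sh
# Throwaway repository with one commit (identity passed via -c so that
# no global configuration is needed).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'C1'

# Simulate the ref that git-svn would create for a Subversion tag.
git update-ref refs/remotes/svn/tags/v1.0 HEAD

# Translate it: create refs/tags/v1.0, then delete the old ref.
git update-ref refs/tags/v1.0 refs/remotes/svn/tags/v1.0
git update-ref -d refs/remotes/svn/tags/v1.0

# List what is left under refs/tags and refs/remotes.
git for-each-ref --format='%(refname)' refs/tags refs/remotes
```

Only refs/tags/v1.0 should be listed afterwards, because the tracking ref has been deleted.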

See Figure 51, “Converted branches and tags before translation” and Figure 52, “Converted branches and tags after translation” to see how the script works. In the Subversion repository there is a trunk, a branch feature and the v1.0 tag. git-svn creates three branches under remotes/svn during the conversion process, as described above. The script git-convert-refs finally translates remotes/svn/trunk → trunk and remotes/svn/feature → feature, and remotes/svn/tags/v1.0 becomes a lightweight tag.

Figure 51. Converted branches and tags before translation

Figure 52. Converted branches and tags after translation

After rewriting Subversion branches and tags, you will notice that all Git tags “sit” on very short branches (see tag v1.0 in Figure 52, “Converted branches and tags after translation” and Figure 53, “Converted Git tags on branches”). This is because each Subversion tag is created by a Subversion commit of its own. The conversion behavior of git-svn is therefore correct in principle, because one Git commit is created per Subversion revision, but the result is a bit unwieldy for a Git repository: you cannot use git describe --tags, for example.

However, unless the Subversion tag has been modified afterwards, the tagged commit references the same tree as its ancestor, so you can move the tags to the ancestors. The following shell script git-fix-tags[123] will help here:

#!/bin/sh
# git-fix-tags: move each tag back along its ancestors while the tree is identical.

. "$(git --exec-path)/git-sh-setup"
get_tree(){ git rev-parse "$1^{tree}"; }

git for-each-ref refs/tags --format='%(refname)' \
| while read -r tag
do
    sha1=$(git rev-parse "$tag")
    tree=$(get_tree "$tag")
    new=$sha1
    while true
    do
        # Stop at a root commit (no parent) or at a merge (second parent).
        parent=$(git rev-parse --verify -q "$new^") || break
        git rev-parse --verify -q "$new^2" > /dev/null && break
        parent_tree=$(get_tree "$parent")
        [ "$parent_tree" != "$tree" ] && break
        new=$parent
    done
    # Nothing to move for this tag; go on with the next one.
    [ "$sha1" = "$new" ] && continue
    printf 'Found new commit for tag %s: %s, resetting...' \
        "${tag#refs/tags/}" "$(git rev-parse --short "$new")"
    git update-ref "$tag" "$new"
    echo 'done'
done

The script examines every tagged commit. If there is a commit among its ancestors that references the same tree, the tag is moved to it. If the commit or one of its ancestors has multiple parents (after a merge), the search is aborted. In Figure 53, “Converted Git tags on branches”, you can see two tags that come into consideration: v1.0 and v2.0. The v1.0 tag was created from commit C1 and does not contain any subsequent changes. The v2.0 tag, on the other hand, was modified again after it was created from commit C2.

Figure 53. Converted Git tags on branches

In Figure 54, “Tag v1.0 was rewritten” you can see how tag v1.0 was moved by the above script to the ancestor (because the trees are the same). However, tag v2.0 remains in place (because the trees are different due to subsequent changes).

Figure 54. Tag v1.0 was rewritten
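
The tree comparison at the heart of this procedure can be reproduced in a throwaway repository without Subversion. In the following sketch, an empty commit stands in for the extra commit that a Subversion tag produces; the repository contents and the helper function commit are assumptions made for the demonstration.

```sh
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Helper: commit with a fixed demo identity.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q "$@"; }

echo 'content' > file
git add file
commit -m 'C1'
# A commit without file changes, like the extra commit of a Subversion tag.
commit --allow-empty -m 'tag v1.0'
git update-ref refs/tags/v1.0 HEAD

tag_tree=$(git rev-parse 'refs/tags/v1.0^{tree}')
parent_tree=$(git rev-parse 'refs/tags/v1.0^^{tree}')

if [ "$tag_tree" = "$parent_tree" ]; then
    # Same tree: the tag can point to the parent instead.
    git update-ref refs/tags/v1.0 'refs/tags/v1.0^'
fi
git log -1 --format='%s' refs/tags/v1.0
```

Because both commits reference the same tree, the tag ends up on C1.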

The tool git-svn-abandon[124] takes a similar approach to the two scripts presented, i.e. it converts Subversion tracking branches and moves tags. Instead of lightweight tags, however, it creates annotated tags and does some additional cleanup work, similar to the ones we’ll cover next. Another alternative for moving tags is the script git-move-tags-up.⁠[125]

You should still decide how to handle the trunk reference (trunk or git-svn). After conversion, it will point to the same commit as master, so you can actually delete it:

$ git branch -d trunk

There may still be Git branches in the repository after the conversion that have already been merged into master. Remove them with the following command:

$ git checkout master
$ git branch --merged | grep -v '^*' | xargs git branch -d

You can also dispose of the remaining git-svn leftovers, which are found both in .git/ and in the repository configuration:

$ rm -r .git/svn
$ git config --remove-section svn
$ git config --remove-section svn-remote.svn

You are then ready to upload the converted history to a remote repository to share it with other developers.

$ git remote add <example> <git@git.example.com:project1.git>
$ git push <example> --mirror

9.1.1.7. Subversion Merges

Subversion merges are detected by git-svn using the svn:mergeinfo properties and translated into Git merges, although not always. It depends on which Subversion revisions were merged and how. If all revisions affecting a branch have been merged (svn merge -r <N:M>), this is represented by a Git merge commit. However, if only individual revisions have been merged (via svn merge -c <N>), they simply become ordinary commits, as if they had been applied with git cherry-pick.

For the following example, we have created a Subversion repository with a branch feature that is merged twice: once as a Subversion merge, which is considered a Git merge commit, and once as a Subversion merge, which is translated as cherry-pick. The result converted with git-svn is shown below.

Figure 55. Converted Subversion repository

The commits in the Subversion repository were made in the following order:

  1. Standard layout

  2. C1 on trunk

  3. Branch feature

  4. C1 on feature

  5. C2 on feature

  6. C2 on trunk

  7. svn merge branches/feature trunk -c 5 (commit C2 on feature)

  8. svn merge branches/feature trunk -r 3:5 (commits C1 and C2 on feature)

Finally, it should be mentioned that git-svn is by no means the only tool for conversion. git-svn often suffers from speed problems with very large repositories. Two faster tools are frequently mentioned in this context: svn2git[126] and svn-fe[127] (svn-fast-export). If you encounter problems during the conversion (e.g. if the conversion has been running for several days and there is no end in sight), it is worth taking a look at the alternatives.

9.1.2. Bidirectional Communication

The git-svn tool can not only convert a Subversion repository, it is also a better Subversion client. This means you have all the benefits of Git locally (easy and flexible branching, local commits and history), but you can still upload your Git commits from your local Git repository as Subversion commits to a Subversion repository. Additionally, git-svn allows you to download new commits from other developers in the Subversion repository to your local Git repository. You should use git-svn if a complete conversion to Git is not feasible, but you would like to take advantage of the local benefits of Git. Note that git-svn is a somewhat limited Subversion client, and not all Subversion features are fully available. There are some subtleties to consider, especially when uploading.

First, a summary of the most important git-svn commands:

git svn init

Create a Git repository to track a Subversion repository.

git svn fetch

Download new revisions from the Subversion repository.

git svn clone

Combination of git svn init and git svn fetch.

git svn dcommit

Upload Git commits as Subversion revisions to the Subversion repository (diff commit).

git svn rebase

Combination of git svn fetch and git rebase, usually executed before a git svn dcommit.

9.1.2.1. Cloning a Subversion Repository

To retrieve the repository, first follow the same procedure as in the Subversion conversion section — create an authors file and determine the repository layout. Then you can use git svn clone to clone the Subversion repository, for example:

$ git svn clone http://svn.example.com/ -s \
  -A <authors.txt> <project-git>

The call downloads all Subversion revisions and creates a Git repository from the history under <project-git>.

Cloning an entire Subversion history can be extremely time consuming under certain circumstances. From a Subversion point of view, a long history is not a problem because the svn checkout command usually only downloads the current revision. Something similar can be done with git-svn. To do this, you first have to initialize the local Git repository and then only download the current revision (HEAD) from the trunk or branch. The advantage here is certainly the speed, the disadvantage is that there is no local history:

$ git svn init http://svn.example.com/trunk project-git
$ cd project-git
$ git svn fetch -r HEAD

As an alternative to HEAD, you could specify any revision and then use git svn fetch to download the missing revisions up to HEAD, thus cloning only part of the history.

As part of the conversion, we described how to post-process the repository. Since you want to continue interacting with the Subversion repository in the future, this is not necessary here. Also, the --no-metadata option must not be used, because otherwise the metadata of the form git-svn-id: will disappear from the commit message, and Git will no longer be able to map the commits and revisions.

The call to git-svn creates several entries in the configuration file .git/config. First, an entry svn-remote.svn, which, similar to a remote entry for a Git repository, contains information about the URL and the Subversion branches and tags to track. For example, if you cloned a repository with a standard layout, it might look like this:

[svn-remote "svn"]
    url = http://svn.example.com/
    fetch = trunk:refs/remotes/origin/trunk
    branches = branches/*:refs/remotes/origin/*
    tags = tags/*:refs/remotes/origin/tags/*

In contrast to a regular remote entry this one additionally contains the values branches and tags. These in turn each contain a refspec describing how Subversion branches and tags are stored locally as Subversion tracking branches. The fetch entry only handles the Subversion trunk and must not contain any glob expressions.

If you do not have any Subversion branches and tags, the corresponding entries are omitted:

[svn-remote "svn"]
    url = http://svn.example.com/
    fetch = :refs/remotes/git-svn

If you clone the repository with the prefix option, for example with --prefix=svn/, git svn will adjust the refspecs:

[svn-remote "svn"]
    url = http://svn.example.com/
    fetch = trunk:refs/remotes/svn/trunk
    branches = branches/*:refs/remotes/svn/*
    tags = tags/*:refs/remotes/svn/tags/*

If you specify an authors file, a separate entry is created for it. The file will still be needed in the future when you download new commits from the Subversion repository.

[svn]
    authorsfile = /home/valentin/svn-testing/authors.txt
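
These entries are ordinary Git configuration and can be read back with git config. The following sketch recreates the standard-layout entries shown above by hand in a throwaway repository (instead of running git svn clone, which is an assumption made so that the example works without a Subversion server) and then queries them.

```sh
#!/bin/sh
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Recreate the entries for a standard layout (values taken from the text).
git config svn-remote.svn.url 'http://svn.example.com/'
git config svn-remote.svn.fetch 'trunk:refs/remotes/origin/trunk'
git config svn-remote.svn.branches 'branches/*:refs/remotes/origin/*'
git config svn-remote.svn.tags 'tags/*:refs/remotes/origin/tags/*'

# Query a single value, then everything in the svn-remote section.
git config --get svn-remote.svn.url
git config --get-regexp '^svn-remote\.'
```

In a repository created by git svn clone, only the two query commands at the end are needed.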

In the section on conversion we described how to use create-ignore to create .gitignore files. However, if you want to continue working with the Subversion repository, there is little point in checking in the .gitignore files there. They have no effect on Subversion and only confuse other developers who continue to work with the native Subversion client (svn). Instead, you can store the patterns to ignore in the .git/info/exclude file (see Sec. 4.4, “Ignoring Files”), which is not part of the repository. The git svn show-ignore command, which searches for and outputs all svn:ignore properties, can help here:

$ git svn show-ignore > .git/info/exclude

9.1.2.2. Examining a Repository

In addition, git-svn provides some commands for examining the history and other properties of the repository:

git svn log

A hybrid of svn log and git log. The subcommand produces output similar to svn log, but uses the local repository to create it. Several options of svn log have been recreated, such as -r <N>:<M>. Unknown options, e.g. -p, are passed directly to git log so that options from both commands can be mixed:

$ git svn log -r 3:16 -p

This shows revisions 3 to 16, each including a patch of the changes.

git svn blame

Similar to svn blame. With the --git-format option, the output has the same format as git blame, but with Subversion revisions instead of the SHA-1 IDs.

git svn find-rev

Shows the SHA-1 ID of the Git commit that corresponds to a particular Subversion revision. The revision is passed with the syntax r<N>, where <N> is the revision number:

$ git svn find-rev r6
c56506a535f9d41b64850a757a9f6b15480b2c07

git svn info

Like svn info. Returns various information about the Subversion repository.

git svn proplist

Like svn proplist, prints a list of existing Subversion properties.

git svn propget

Like svn propget, outputs the value of a single Subversion property.

Unfortunately, currently git-svn can only query Subversion properties, but cannot create, modify or delete them.
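
Because the git-svn-id line is part of the commit message, a revision can also be looked up with plain git log, for example in a clone where git-svn is not installed. The following is a sketch under the assumption that the commits carry metadata lines of the form described in the conversion section; the demo repository, the function name find_rev and the revision numbers are made up for illustration.

```sh
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Helper: empty commit with a fixed demo identity.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q --allow-empty "$@"; }

uuid='2423f1c7-8de6-44f9-ab07-c0d4e8840b78'
for rev in 6 16
do
    commit -m "Change for r$rev" -m "git-svn-id: file:///demo/trunk@$rev $uuid"
done

# Print the SHA-1 of the commit whose metadata line names revision $1.
# The trailing space in the pattern keeps "@6 " from matching "@16 ".
find_rev() {
    git log --all --format='%H' --grep="^git-svn-id: .*@$1 "
}
find_rev 6
```

Unlike git svn find-rev, this search relies only on the commit messages, so it stops working if the repository was converted with --no-metadata.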

9.1.2.3. Exchanging Commits

Similar to git fetch, git svn fetch downloads new commits from the Subversion repository. In the process, git-svn fetches all new Subversion revisions, translates them into Git commits, and finally updates the Subversion tracking branches. The output is a list of downloaded Subversion revisions, the files changed by the revision, the SHA-1 sum, and the Subversion tracking branch of the resulting Git commit, e.g:

$ git svn fetch
        A   COPYING
        M   README
r21 = 8d707316e1854afbc1b728af9f834e6954273425 (refs/remotes/trunk)

You can work locally in the Git repository as usual, but there is an important restriction when uploading commits to the Subversion repository: While git-svn is capable of rendering Subversion merges to some degree (see above), it can’t map local Git merges to Subversion merges, so only linear histories should be uploaded via git svn dcommit.

To make this linearization easier, there is the command git svn rebase. It first downloads all new commits from the Subversion repository and then rebases the current Git branch onto the corresponding Subversion tracking branch via git rebase.

Essentially, the workflow consists of the following commands:

$ git add/commit ...
$ git svn rebase
$ git svn dcommit

Figure 56 shows what git svn rebase does. First, new revisions are downloaded from the Subversion repository, in this case C. Then the remotes/origin/trunk tracking branch is “advanced”, so to speak, and then corresponds to the current state of the Subversion repository. Finally, the current branch (in this case master) is rebased onto it using git rebase. The commit D' can now be uploaded.

Figure 56. git svn rebase integrates the newly added Subversion revision as commit C — before D, which becomes D'.
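
The rebase half of this operation can be simulated with plain Git. In the following sketch there is no Subversion repository at all: the ref refs/remotes/origin/trunk is advanced by hand to stand in for the fetch (an assumption made for the demonstration), and git rebase then produces the linear history that git svn dcommit requires.

```sh
#!/bin/sh
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Helper: commit with a fixed demo identity.
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q "$@"; }

# Shared history: A and B, already known to both sides.
for c in A B; do echo "$c" > "$c"; git add "$c"; commit -m "$c"; done
git branch -M master
git update-ref refs/remotes/origin/trunk HEAD

# Local commit D on master.
echo D > D; git add D; commit -m D

# Stand-in for the fetch: the tracking branch advances by C.
git checkout -q --detach refs/remotes/origin/trunk
echo C > C; git add C; commit -m C
git update-ref refs/remotes/origin/trunk HEAD
git checkout -q master

# The rebase step: replay master on top of the tracking branch.
git -c user.name=Demo -c user.email=demo@example.com \
    rebase -q refs/remotes/origin/trunk

git log --format='%s' master
```

The log lists D, C, B, A from newest to oldest: D has been rebuilt as D' on top of C.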

With git svn dcommit, you upload a Git commit changeset as a revision to the Subversion repository. As part of the operation, the revision is again committed to the local Git repository as a Git commit, but this time with Subversion metadata in the commit message. This, of course, changes the SHA-1 sum of the commit, as shown in Figure 57 by the different commits D' and D''.

Figure 57. After a git svn dcommit, the commit D' has a new SHA-1 ID and becomes D'' because its commit description has been changed to store meta information.

As with git push, you must not use git rebase or git commit --amend to modify commits that you have already uploaded with git svn dcommit.

9.1.2.4. Subversion Branches and Tags

The subcommands git svn branch and git svn tag are used to create Subversion branches and tags. For example:

$ git svn tag -m "Tag Version 2.0" v2.0

In the Subversion repository, this creates the tags/v2.0 directory, whose contents are a copy of the current HEAD.[128] In the Git repository, a new Subversion tracking branch (remotes/origin/tags/v2.0) is created for it. The -m option passes a message; if it is omitted, git-svn uses the message Create tag <tag>.

Git version 1.7.4 introduced a feature that allows you to perform Subversion merges. The feature is available to git svn dcommit via the --mergeinfo option and causes the Subversion property svn:mergeinfo to be set. The option is documented in the git-svn(1) man page only as of version 1.7.4.5.

The following is an example of how to create a branch with git-svn, commit to it, and later merge it back in the Subversion sense.

First create the Subversion branch — the command works basically like git svn tag:

$ git svn branch <feature>

Then you create a local branch to work with and commit to it. The branch must be based on the Subversion tracking branch <feature>:

$ git checkout -b <feature> origin/<feature>
$ git commit ...

Then upload the commits to the Subversion repository. The git svn rebase call is only necessary if another user has made commits to the Subversion feature branch in the meantime.

$ git svn rebase
$ git svn dcommit

Now you have to transfer the merge information separately. To do this, proceed as follows: First you merge the branch locally in the Git repository and then upload the resulting merge commit using --mergeinfo. The syntax for this option is:

$ git svn dcommit --mergeinfo=<branch-name>:<N>-<M>

Where <branch-name> is the Subversion name of the branch, e.g. /branches/<name>, <N> the first Subversion revision that changes the branch, and <M> the last.⁠[129] Assuming you created the branch with revision 23 and now, after two commits, want to merge the branch again, the command would be:

$ git checkout master
$ git merge --no-ff <feature>
$ git svn dcommit --mergeinfo=/branches/feature:23-25

9.2. Custom Importers

Git offers an easy and convenient way to turn any version history into a Git repository using the fast-import subcommand. The fast-import protocol is text-based and very flexible.⁠[130]

Any kind of data can be used as a basis: backups, tarballs, repositories of other version control systems, and so on. An import program, which you can write in any language, must translate the existing history into the so-called fast-import protocol and write it to standard output. This output is then processed by git fast-import, which uses it to create a full-featured Git repository.

For simple importers that need to import a linear version history, three building blocks are important:

Data block

A data block begins with the keyword data, followed by a space, followed by the data length in bytes and a line break. This is immediately followed by the data, followed by another line break. The data block does not have to be ended explicitly, since its length is specified in bytes. It looks like this, for example:

data 4
test

File

To pass the contents of a file, use the following format in the simplest case: M <mode> inline <path> followed by a data block on the next line.

So to import a file README with the content test (without a final newline!) the following construct is necessary:

M 644 inline README
data 4
test

Commit

For a commit, you must specify the appropriate metadata (at least the committer and date, and a commit message), followed by the changed files. This is done in the following format:

commit <branch>
committer <who> <email> <when>
<Data block for commit message>
deleteall

For <branch> use a corresponding branch on which the commit should be made, e.g. refs/heads/master. The name of the committer (<who>) is optional, but the email address is not. The format of <when> must be a Unix timestamp with timezone, e.g. 1303329307 +0200.⁠[131] Analogous to the committer line, you can add an author line.

The data block forms the commit message. The final deleteall tells Git to forget everything about files from previous commits. So for each commit, you add all the data completely new.⁠[132] Then follow one or more file definitions. This could look like this, for example:

commit refs/heads/master
committer Julius Plenz <julius@plenz.com> 1303329307 +0200
data 23
Import the README File
deleteall
M 644 inline README
data 4
test

Unless otherwise specified, commits are built upon each other in the order in which they are read (if they are on the same branch).
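
The three building blocks can be put together in a few lines of shell. The following is a minimal sketch, not a full importer: the helper data_block, the committer identity and the temporary repository are assumptions made for illustration. It generates the commit shown above and feeds it to git fast-import.

```sh
#!/bin/sh
work=$(mktemp -d)
git init -q "$work/repo"

# Emit a data block: "data <bytes>", a line break, then the payload.
# The final line break is the terminator and is not counted as data.
data_block() {
    printf 'data %s\n' "$(printf '%s' "$1" | wc -c | tr -d ' ')"
    printf '%s\n' "$1"
}

{
    echo 'commit refs/heads/master'
    echo 'committer Jane Doe <jane@example.com> 1303329307 +0200'
    data_block 'Import the README file'
    echo 'deleteall'
    echo 'M 644 inline README'
    data_block 'test'
} | git -C "$work/repo" fast-import --quiet

# Check the result: commit subject and file content.
git -C "$work/repo" log --format='%s' master
git -C "$work/repo" show master:README
```

Computing the length with wc -c rather than writing it by hand avoids the most common mistake in hand-written streams, a byte count that does not match the payload.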

With these simple components we want to demonstrate how to turn old release tarballs into a Git repository using a small shell script.

First we download old releases of the editor Vim:

$ wget -q --mirror -nd ftp://ftp.home.vim.org/pub/vim/old/

For each tarball we now want to create a commit. For this we proceed as follows:

  1. Read the archive names line by line from standard input and convert them into absolute path names (because the directory will be changed later).

  2. For each of these archives, perform the following steps:

    1. Store the version, the time of the last change, the current time and the commit message in appropriate variables. The time zone is hard-coded for simplicity.

    2. Create a temporary directory and unpack the archive there.

    3. Output the corresponding lines commit, author, committer. Then the prepared commit message, whose length is counted by wc -c (byte count). Finally the keyword deleteall.

    4. Output a corresponding file block for each file. The first component of the file name is discarded (e.g. ./vim-1.14/). The length of the following file is again counted using wc -c.

    5. Delete the temporary directory.
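
Step 4 can be tried in isolation. The following sketch assumes a hypothetical helper file_blocks and a throwaway directory tree that stands in for an unpacked tarball. It prints one file block per file and strips the first path component with sed.

```shell
#!/bin/sh
# file_blocks DIR: print one "M 644 inline <path>" block for every regular
# file below DIR, dropping the leading "./<first-component>/" of each path.
# The function body runs in a subshell, so the cd does not leak out.
file_blocks() (
    cd "$1" || exit 1
    find . -type f | sort |
    while read -r f; do
        printf 'M 644 inline %s\n' "$(printf '%s\n' "$f" | sed -e 's,^\./[^/]*/,,')"
        printf 'data %s\n' "$(wc -c < "$f" | tr -d ' ')"
        cat "$f"
    done
)

# Demo: a small tree that mimics the layout of an unpacked release.
tmp="$(mktemp -d)"
mkdir -p "$tmp/vim-1.14/src"
printf 'hello\n' > "$tmp/vim-1.14/README"
printf '/* header */\n' > "$tmp/vim-1.14/src/vim.h"
file_blocks "$tmp"
rm -rf "$tmp"
```

The sort is not required by git fast-import; it only makes the output order predictable.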

All output of the script is sent to standard output, so it can easily be piped to git fast-import. The beginning of the output looks like this:

commit refs/heads/master
author Bram Moolenaar <bram@vim.org> 1033077600 +0200
committer Julius Plenz <julius@plenz.com> 1303330792 +0200
data 15
import vim-1.14
deleteall
M 644 inline src/vim.h
data 7494
/* vi:ts=4:sw=4
 *
 * VIM - Vi IMitation
...

To create a Git repository from this output, let’s proceed as follows:

$ git init vimgit
Initialized empty Git repository in /dev/shm/vimgit/.git/
$ cd vimgit
$ ls ../vim/*.tar.gz | <import-tarballs.sh> | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:         1350 (      1206 duplicates                  )
      blobs  :         1249 (      1177 duplicates        523 deltas)
      trees  :           87 (        29 duplicates          0 deltas)
      commits:           14 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:           1024 (         0 unique    )
      atoms:            354
Memory total:          2294 KiB
       pools:          2098 KiB
     objects:           195 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =    7668864 /    7668864
---------------------------------------------------------------------

The command prints extensive statistics about the import process (and aborts with a corresponding error message if it cannot parse the input). A subsequent reset synchronizes index, working tree and repository, and the tarballs are successfully imported:

$ git reset --hard
HEAD is now at ddb8ffe import vim-4.5
$ git log --oneline
ddb8ffe import vim-4.5
4151b0c import vim-4.4
dbbdf3d import vim-4.3
6d5aa08 import vim-4.2
bde105d import vim-4.1
332228b import vim-4.0
...

For reference, here is the complete script:⁠[133]

#!/bin/sh

while read ar; do
    [ -f "$ar" ] || { echo "not a file: $ar" >&2; exit 1; }
    readlink -f "$ar"
done |
while read archive; do
    dir="$(mktemp -d /dev/shm/fi.XXXXXXXX)"
    version="$(basename "$archive" | sed 's/\.tar\.gz$//')"
    mod="$(stat -c %Y "$archive") +0200"
    now="$(date +%s) +0200"
    msg="import $version"

    cd "$dir" &&
    tar xfz "$archive" &&
    echo "commit refs/heads/master" &&
    echo "author Bram Moolenaar <bram@vim.org> $mod" &&
    echo "committer Julius Plenz <julius@plenz.com> $now" &&
    echo -n "data " && echo -n "$msg" | wc -c && echo "$msg" &&
    echo "deleteall" &&
    find . -type f |
    while read f; do
        echo -n "M 644 inline "
        echo "$f" | sed -e 's,^\./[^/]*/,,'
        echo -n "data " && wc -c < "$f" && cat "$f"
    done &&
    echo
    rm -fr "$dir"
done

As soon as the version history becomes a bit more complicated, the commands mark, from and merge become particularly interesting. With mark you can assign an ID to any object (commit or blob) so that you can refer to it later as a “named object” and do not always have to specify the data inline. The commands from and merge define the predecessor(s) of a commit, so that even complicated interdependencies between branches can be represented. For more details see the man page.
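
As an illustration, the following sketch prints a small stream that uses all three commands: two blobs and three commits are given marks, a feature branch is started with from, and a merge commit on master names its second parent with merge. The branch name feature, the file names, the identity and the helper functions are assumptions chosen for this example.

```shell
#!/bin/sh
# data_block TEXT: print TEXT plus a trailing newline as a counted data block.
data_block() {
    printf 'data %s\n' "$(printf '%s\n' "$1" | wc -c | tr -d ' ')"
    printf '%s\n' "$1"
}

who='Jane Doe <jane@example.com> 1303329307 +0200'   # hypothetical identity

stream() {
    # Two blobs, stored once and referred to later by their marks.
    echo "blob"; echo "mark :1"; data_block "readme text"
    echo "blob"; echo "mark :2"; data_block "feature text"

    # Root commit on master; the file content is blob :1.
    echo "commit refs/heads/master"; echo "mark :3"
    echo "committer $who"; data_block "add README"
    echo "M 644 :1 README"

    # The branch "feature" starts at commit :3.
    echo "commit refs/heads/feature"; echo "mark :4"
    echo "committer $who"; data_block "add feature"
    echo "from :3"
    echo "M 644 :2 feature.txt"

    # Merge commit on master with the parents :3 and :4.
    echo "commit refs/heads/master"; echo "mark :5"
    echo "committer $who"; data_block "merge feature"
    echo "from :3"
    echo "merge :4"
    echo "M 644 :2 feature.txt"
}

stream
```

Running stream | git fast-import in an empty repository should create both branches, with the last commit on master having two parents. Because the blobs are referenced by mark, the content of feature.txt appears only once in the stream.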


117. https://git.wiki.kernel.org/index.php/Interfaces,_frontends,_and_tools#Interaction_with_other_Revision_Control_Systems
118. http://rsvndump.sourceforge.net/
119. If several directories contain branches and/or tags, specify each of them with its own -t or -b argument.
120. If you did not specify a trunk with -T or --stdlayout during conversion, a single branch called remote/git-svn will be generated.
121. The script is included in the script collection for this book. See: https://github.com/gitbuch/buch-scripte.
122. Basically you can also perform these operations directly with the command mv below .git/refs/. However, the plumbing commands make it possible to handle “exotic” cases like “Packed Refs” or references that are symlinks correctly. In addition, git update-ref writes corresponding entries in the reflog and issues error messages if something goes wrong. See also Sec. 8.3, “Writing Your Own Git Commands”.
123. You can also find this script in the script collection: https://github.com/gitbuch/buch-scripte.
124. https://github.com/nothingmuch/git-svn-abandon
125. https://gist.github.com/hartwork/fa275bedf8c2addeeb57
126. https://web.archive.org/web/20160118021532/http://gitorious.org/svn2git/svn2git
127. In the Git source repository (git.git) under contrib/svn-fe
128. Compare the command: svn copy trunk tags/v2.0
129. Compare the Subversion command: svn merge -r 23:25 branches/feature trunk
130. For detailed technical documentation, see the git-fast-import(1) man page.
131. You can use the --date-format option to allow other date formats if required.
132. Although this requires a little more computing effort, it simplifies the structure of the import program considerably. Since import software is usually run only rarely and run time is not critical, this approach makes sense.
133. The script is available as part of our script collection at https://github.com/gitbuch/buch-scripte.