Git Book — 1. Introduction and First Steps

1. Introduction and First Steps

The following chapter provides a concise introduction to the basic concepts and configuration settings of Git. A small sample project shows how to put a file under version control with Git, and the commands you use to perform the most important tasks.

1.1. Basic Terminology

Some important technical terms will be used repeatedly in the following and therefore require a brief explanation. If you have experience with another version control system, you will be familiar with some of the concepts involved, though perhaps under a different name.

Version Control System (VCS): A system for managing and versioning software or other digital information. Prominent examples are Git, Subversion, CVS, Mercurial (hg), Darcs and Bazaar. Synonyms are Software Configuration Management (SCM) and Revision Control System.

We distinguish between centralized and distributed systems. In a centralized system, such as Subversion, there must be a central server where the history of the project is stored. All developers must connect to this server to view the version history or make changes. In a distributed system like Git, there are many equivalent instances of the repository, so each developer has their own repository. The exchange of changes is more flexible, and does not necessarily take place through a central server.

Repository: The repository is a database where Git stores the different states of each file in a project over time. In particular, every change is packaged and saved as a commit.

Working Tree: The directory in which you edit the files of a project (called sandbox or checkout in some other systems). This is where you make all modifications to the source code. It is often also called the working directory.

Commit: Changes to the working tree, such as modified or new files, are stored in the repository as commits. A commit contains both these changes and metadata, such as the author of the changes, the date and time, and a commit message that describes the changes. A commit always references the status of all managed files at a particular point in time. The various Git commands are used to create, manipulate, view, or change the relationships between commits.

HEAD: A symbolic reference to the newest commit in the current branch. This reference determines which files you find in the working tree for editing. It is therefore the “head” or tip of a development branch (not to be confused with HEAD in systems like CVS or SVN).

SHA-1: The Secure Hash Algorithm creates a practically unique 160-bit checksum (40 hexadecimal characters) for any digital information. All commits in Git are named after their SHA-1 sum (commit ID), which is calculated from the contents and metadata of the commit. It is, so to speak, a content-dependent version number, such as f785b8f9ba1a1f5b707a2c83145301c807a7d661.
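
To get a feel for such content-dependent names, the following sketch uses git hash-object, which prints the ID Git would assign to a piece of content without storing anything. Note that it hashes plain file content rather than a commit, and that the sample strings are our own; the principle is the same, though: identical content yields identical IDs.

```shell
# Hash the same content twice and a slightly different content once.
# (The sample strings are made up for this illustration.)
id1=$(printf 'Hello World!\n' | git hash-object --stdin)
id2=$(printf 'Hello World!\n' | git hash-object --stdin)
id3=$(printf 'Hello, World!\n' | git hash-object --stdin)

echo "same content:      $id1 / $id2"
echo "different content: $id3"
echo "length in hexadecimal characters: ${#id1}"
```

Even a single added comma results in a completely different ID.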

Object model: A Git repository can be modeled as a graph of commits, manipulated by Git commands. This modeling makes it very easy to describe how Git works in detail. For a detailed description of the object model, see Sec. 2.2, “The Object Model”.

Index: The index is an intermediate level between the working tree and the repository, where you prepare a commit. The index therefore indexes which changes to which files you want to package as commits. This concept is unique to Git and often causes difficulties for beginners and people switching to Git. We discuss the index in detail in Sec. 2.1.1, “Index”.

Clone: When you download a Git repository from the Internet, you create a clone of that repository. The clone contains all the information contained in the source repository, especially the entire version history including all commits.

Branch: A separate line of development. Branches are used in practice, for example, to develop new features, prepare releases, or provide old versions with bug fixes. Branches, like the merging of branches (merge), are extremely easy to handle in Git and an outstanding feature of the system.

master: Because you need at least one branch to work with Git, the branch master is created when you initialize a new repository. The name is a convention (similar to trunk in other systems); you can rename or delete this branch as you wish, as long as at least one other branch is available. The master branch is technically no different from other branches.

Tag: Tags are symbolic names for hard-to-remember SHA-1 sums. You can use tags to mark important commits, such as releases. A tag can simply be an identifier, such as v1.6.2, or it can contain additional metadata such as author, description, and GPG signature.
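
As a rough sketch of the two kinds of tags, the following commands create one of each in a throwaway repository. The tag names v0.1 and v0.2 and the temporary directory are assumptions made for this illustration only.

```shell
# Create a scratch repository with a single (empty) commit.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
git commit -q --allow-empty -m "First version"

git tag v0.1                      # simple identifier for the current commit
git tag -a v0.2 -m "Second tag"   # tag with its own metadata (tagger, message)

# Ask Git what kind of object each tag name refers to.
t1=$(git cat-file -t v0.1)
t2=$(git cat-file -t v0.2)
echo "v0.1 refers to a: $t1"
echo "v0.2 refers to a: $t2"
```

The simple tag points directly at the commit, while the tag with metadata is stored as an object of its own.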

1.2. First Steps with Git

To get you started, we’ll use a small example to illustrate the workflow with Git. We create a repository and develop a one-liner, a “Hello, World!” program in Perl.

In order for Git to assign a commit to an author, you need to enter your name and email address:

$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"

Note that a subcommand is specified when Git is called, in this case config. Git provides all operations through such subcommands. It is also important that no equals sign is used when calling git config. The following call is therefore incorrect:

$ git config --global user.name = "John Doe"

This is a pitfall, especially for beginners, because Git does not output an error message, but takes the equals sign as the value to set.
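
If you suspect that you have fallen into this trap, you can simply query the value. The following sketch does this against a throwaway configuration file (an assumption of this example, chosen so that your real settings remain untouched) by means of the --file option.

```shell
cfg=$(mktemp)   # scratch configuration file instead of ~/.gitconfig

# The mistaken call from above, followed by a look at the result.
git config --file "$cfg" user.name = "John Doe"
echo "after the mistaken call: $(git config --file "$cfg" --get user.name)"

# The correct call replaces the bogus value.
git config --file "$cfg" user.name "John Doe"
name=$(git config --file "$cfg" --get user.name)
echo "after the correct call:  $name"
```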

1.2.1. Our First Repository

Before we use Git to manage files, we need to create a repository for the sample project. The repository will be created locally, so it will only be on the file system of the machine you are working on.

It’s generally recommended that you practice using Git locally first, and only later dive into the decentralized features and functions of Git.

$ git init example
Initialized empty Git repository in /home/esc/example/.git/

First, Git creates the directory example/ if it doesn’t already exist. Git then initializes an empty repository in this directory and creates a subdirectory .git/ for it, which is used to manage internal data. If the example/ directory already exists, Git creates the new repository inside it. If the directory already contains a repository, Git leaves the existing history untouched. We change to the directory and look at the current state with git status:

$ cd example
$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

Git tells us that we’re about to commit (Initial commit), but hasn’t found anything to commit (nothing to commit). Instead, it gives a hint as to what the next steps should be (most Git commands do that, by the way): “Create or copy files, and use git add to manage them with Git.”

1.2.2. Our First Commit

Now let’s give Git a first file to manage, which is a “Hello World!” program in Perl. Of course, you can write any program in the programming language of your choice instead.

We’ll first create the hello.pl file with the following content

print "Hello World!\n";

and execute the script once:

$ perl hello.pl
Hello World!

That means we’re ready to manage the file with Git. But let’s take a look at the output of git status first:

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

      hello.pl

nothing added to commit but untracked files present (use "git add" to track)

While the first commit is still pending, Git registers that there are already files in that directory, but the system is unaware of them — Git calls them untracked. This is, of course, our little Perl program. To manage it with Git, we use the command git add <file>:

$ git add hello.pl

The add generally stands for “add changes” — so you will need it whenever you have edited files, not just when you first add them!

Git doesn’t provide output for this command. Use git status to check if the call was successful:

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

      new file:   hello.pl

Git will apply the changes — our new file — at the next commit. However, this commit is not yet complete — we’ve only prepared it so far.

To be precise, we’ve added the file to the Index, an intermediate stage where you collect changes that will be included in the next commit. For further explanation of this concept, see Sec. 2.1.1, “Index”.

With git status, under Changes to be committed, you can always see which files are in the Index, i.e., will be included in the next commit.
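
If you only want the names of the files in the index, for example in a script, git diff --cached --name-only is one way to list them. The following sketch repeats the steps above in a scratch directory (the temporary path is an assumption of this example).

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Create the sample program and add it to the index.
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl

# List the files whose changes will be part of the next commit.
staged=$(git diff --cached --name-only)
echo "$staged"
```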

Everything is ready for the first commit with the git commit command. We also pass the -m option on the command line with a commit message describing the commit:

$ git commit -m "First version"
[master (root-commit) 07cc103] First version
 1 file changed, 1 insertion(+)
 create mode 100644 hello.pl

Git will confirm that the process has been successfully completed and the file will be managed from now on. The somewhat cryptic output means Git has created the initial commit (root-commit) with the appropriate message. A line has been added to a file, and the file has been created with Unix permissions 0644.⁠^[8]

As you’ve no doubt noticed by now, git status is an indispensable command in your daily work — we’ll use it again here:

$ git status
On branch master
nothing to commit, working directory clean

Our sample repository is now “clean”, because there are no changes in the Working Tree or Index, nor are there any files that are not managed with Git (untracked files).
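
In scripts, the same check can be sketched with git status --porcelain, which prints one line per changed or untracked file and nothing at all for a clean repository. The scratch repository below is an assumption of this example.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "john.doe@example.com"

printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

# Empty output means: nothing modified, nothing staged, nothing untracked.
if [ -z "$(git status --porcelain)" ]; then
    state=clean
else
    state=dirty
fi
echo "The repository is $state"
```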

1.2.3. Viewing Commits

To conclude this brief introduction, we’ll introduce you to two very useful commands that you’ll often use to examine the version history of projects.

First, git show allows you to examine a single commit. Without arguments, it shows the most recent one:

$ git show
commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100

    First version

diff --git a/hello.pl b/hello.pl
new file mode 100644
index 0000000..fa5a091
--- /dev/null
+++ b/hello.pl
@@ -0,0 +1 @@
+print "Hello World!\n";

You see all relevant information about the commit: the commit ID, the author, the date and time of the commit, the commit message, and a summary of the changes in Unified-Diff format.

By default, git show displays HEAD (a symbolic name for the most recent commit), but you can also specify a commit explicitly: by its commit ID (the SHA-1 checksum of the commit), by a unique prefix of that ID, or by the branch name (master in this case). Thus, the following commands are equivalent in this example:

$ git show
$ git show HEAD
$ git show master
$ git show 07cc103
$ git show 07cc103feb393a93616842921a7bec285178fd56
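
You can convince yourself of this equivalence with git rev-parse, which translates a name into the full commit ID. The sketch below works in a scratch repository, so the IDs and the branch name are whatever Git produces there and will differ from those in the text.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
git commit -q --allow-empty -m "First version"

full=$(git rev-parse HEAD)              # full commit ID
short=$(git rev-parse --short HEAD)     # unique prefix of the ID
branch=$(git symbolic-ref --short HEAD) # name of the current branch

# Every name resolves to the same commit.
for name in HEAD "$branch" "$short" "$full"; do
    echo "$name -> $(git rev-parse "$name")"
done
```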

If you want to view more than one commit, git log is recommended. More commits are needed to demonstrate the command in a meaningful way; otherwise, the output would be very similar to git show, since the sample repository currently contains only a single commit. So let’s add the following comment line to the “Hello World!” program:

# Hello World! in Perl

For the sake of the exercise, let’s take another look at the current status with git status:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

      modified:   hello.pl

no changes added to commit (use "git add" and/or "git commit -a")

After that, as already described in the output of the command, use git add to add the changes to the index. As mentioned earlier, git add is used both to add new files and to add changes to files already managed.

$ git add hello.pl

Then create a commit:

$ git commit -m "Comment line"
[master 8788e46] Comment line
 1 file changed, 1 insertion(+)

Now git log shows you the two commits:

$ git log
commit 8788e46167aec2f6be92c94c905df3b430f6ecd6
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Fri May 27 12:52:58 2011 +0200

    Comment line

commit 07cc103feb393a93616842921a7bec285178fd56
Author: Valentin Haenel <valentin.haenel@gmx.de>
Date:   Tue Nov 16 00:40:54 2010 +0100

    First version
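
For a quick overview, the --oneline option condenses each commit to its abbreviated ID and the first line of its message. The following sketch recreates the two commits in a scratch repository (an assumption of this example; the IDs will differ from those shown above).

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "john.doe@example.com"

# First commit: the program itself.
printf 'print "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "First version"

# Second commit: the program with the comment line.
printf '# Hello World! in Perl\nprint "Hello World!\\n";\n' > hello.pl
git add hello.pl
git commit -q -m "Comment line"

git log --oneline                     # one line per commit, newest first
count=$(git rev-list --count HEAD)    # number of commits reachable from HEAD
echo "commits: $count"
```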

1.3. Configuring Git

Like most text-based programs, Git offers a wealth of configuration options, so now is the time to do some basic configuration. This includes color settings, which are turned on by default in newer versions and make the output of Git commands easier to read, and small aliases (abbreviations) for frequently needed commands.

You configure Git with the git config command. The configuration is saved in a format similar to an INI file. Without specifying further parameters, the configuration applies only to the current repository (.git/config). With the --global option, it is stored in the .gitconfig file in the user’s home directory, and is then valid for all repositories.⁠^[9]

Important settings that you should always configure are the user name and e-mail address:

$ git config --global user.name "John Doe"
$ git config --global user.email "john.doe@example.com"

Note that you must protect spaces in the setting value (using quotation marks or backslashes). Also, the value follows the name of the option directly; as before, no equals sign is used. The result of the command can be found in the file ~/.gitconfig:

$ less ~/.gitconfig
[user]
    name = John Doe
    email = john.doe@example.com

The settings are now “global”, meaning they apply to all repositories you edit under that user name. If you want to specify an e-mail address other than your globally defined one for a particular project, simply change the setting there (this time, of course, without adding --global):

$ git config user.email maintainer@project.example.com

When querying an option, Git will first use the setting in the current repository if it exists, otherwise the one from the global .gitconfig; if this does not exist either, it will fall back to the default value.⁠^[10] The default values of all options are documented in the git-config man page. You can list all the settings you have made with git config -l.
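
The lookup order can be tried out safely with a sketch like the following. It points HOME at a temporary directory, an assumption made here so that the experiment does not touch your real ~/.gitconfig.

```shell
# Use an isolated home directory for the experiment.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

repo="$HOME/example"
git init -q "$repo"
cd "$repo"

# Only the global value exists: it is the one Git reports.
git config --global user.email "john.doe@example.com"
echo "global only:     $(git config user.email)"

# A value in the repository takes precedence over the global one.
git config user.email "maintainer@project.example.com"
effective=$(git config user.email)
global=$(git config --global user.email)
echo "with repository: $effective"
echo "global remains:  $global"
```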

You can also edit the .gitconfig file (or the repository .git/config) by hand. This is especially useful for deleting a setting — although git config also offers a --unset option, it is easier to delete the corresponding line in an editor.

The commands git config -e or git config --global -e launch the editor configured for Git on the local or global configuration file.

Note, however, that when you set options with git config, Git automatically protects problematic characters in the option’s value so that no broken configuration files are created.
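
For completeness, here is a small sketch of the --unset variant. It operates on a scratch file via --file, which is an assumption of this example; without that option the command acts on the repository configuration, or on the global one with --global.

```shell
cfg=$(mktemp)   # scratch configuration file

git config --file "$cfg" user.email "john.doe@example.com"
git config --file "$cfg" --unset user.email

# Querying a setting that does not exist makes git config fail.
if git config --file "$cfg" --get user.email >/dev/null; then
    result=still-set
else
    result=unset
fi
echo "user.email is $result"
```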

1.3.1. Git Aliases

Git lets you abbreviate single commands and even whole command sequences via aliases. The syntax is:

$ git config alias.<alias-name> <command>

To set st as an alias for status:

$ git config --global alias.st status
$ git st
On branch master
...

You can also include options in an alias, for example:

$ git config --global alias.gconfig 'config --global'

You will find more useful aliases later in the book; how to create more complex aliases is described in Sec. 8.3.8, “Extended Aliases”. But first, some useful abbreviations:

[alias]
    st = status
    ci = commit
    br = branch
    co = checkout
    df = diff
    he = help
    cl = clone
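
To see which aliases are currently defined, you can query all settings whose name begins with alias. using --get-regexp. The sketch below does this in an isolated home directory (an assumption of this example, so that your own aliases are neither shown nor changed).

```shell
# Use an isolated home directory for the experiment.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

git config --global alias.st status
git config --global alias.ci commit

# Print every setting in the alias section, one "name value" pair per line.
git config --global --get-regexp '^alias\.'

st=$(git config --global --get alias.st)
echo "git st expands to: git $st"
```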

1.3.2. Adjusting Colours

The color.ui option is very helpful: it controls whether Git colors the output of various commands. Deleted files and lines then appear red, new files and lines green, commit IDs yellow, and so on. In newer Git versions (1.8.4 and later) this option is enabled by default, so you don’t need to do anything.

The color.ui option should be set to auto: if Git writes its output to a terminal, colors are used. If the output is redirected to a file or piped to another program instead, Git will not emit color sequences, as these could interfere with automatic processing.

$ git config --global color.ui auto

1.3.3. Configuring Character Sets

Unless set otherwise, Git assumes UTF-8 as the character encoding for all text, especially author names and the commit message. If you want a different encoding, you should configure it explicitly:⁠^[11]

$ git config i18n.commitEncoding ISO-8859-1

Similarly, the setting i18n.logOutputEncoding determines the character set Git converts names and commit messages to before outputting them.

The encoding of the files managed by Git is not important here and is not affected by these settings — files are only bit streams that Git does not interpret.

If you have to handle files encoded according to ISO-8859-1 in a UTF-8 environment, you should adjust the setting of your pager (see below) accordingly. The following setting is recommended for authors:

$ git config core.pager 'env LESSCHARSET=iso8859 less'

1.3.4. Line End Settings

Since Git runs on Windows as well as on Unix-like systems, it has to solve the problem of different line-end conventions. (This only affects text files; binaries that Git recognizes as such are excluded from this treatment.)

The core.eol setting, which can take one of the values lf, crlf or native, is mainly relevant for this. The default setting native lets Git use the system default: line feed only (lf) on Unix, carriage return plus line feed (crlf) on Windows. In the repository, text files are stored with line feeds only; on checkout, Git converts them to CRLF where necessary.

Git can convert between the two types when you check out the file, but it’s important not to mix the two. For this, the core.safecrlf option provides a mechanism to warn the user (value warn) or even disallow the commit (value true).

A safe setting, which also works with older Git versions on Windows systems, is to set core.autocrlf to input: This will automatically replace CRLF with LF when reading files from the filesystem. Your editor must then be able to handle LF line endings accordingly.

You can also specify these settings explicitly per file or subdirectory, so that the format is the same across all platforms (see Sec. 8.1, “Git Attributes — Treating Files Separately”).
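
The effect of core.autocrlf input can be observed with a small experiment. The sketch below, which uses a scratch repository and a made-up file crlf.txt, adds a file with CRLF line endings and compares its size in the working tree with the size of the version Git has stored in the index.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.autocrlf input
git config core.safecrlf warn   # warn about the conversion, but allow it

# Two lines, each terminated by CRLF: 2 x 10 bytes.
printf 'line one\r\nline two\r\n' > crlf.txt
git add crlf.txt 2>/dev/null    # the conversion warning goes to stderr

worktree_size=$(( $(wc -c < crlf.txt) ))
index_size=$(git cat-file -s :crlf.txt)
echo "working tree: $worktree_size bytes, index: $index_size bytes"
```

The stored version is two bytes shorter, one carriage return per line, while the file in the working tree is left as it was.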

1.3.5. Editor, Pager and Browser Settings

Git automatically starts an editor, pager, or browser for certain actions. Usually reasonable defaults are used, but if not, you can configure your preferred program with the following options:

core.editor
core.pager
web.browser

A word about the pager: By default, Git uses the less program, which is part of most base installations. The pager is started whenever a Git command produces output on a terminal. However, Git configures less through an environment variable so that it quits immediately if the output fits completely on the screen. So less only comes to the foreground when a command produces a lot of output, and remains invisible otherwise.

If core.pager is set to cat, Git will not use a pager. You can achieve the same for a single command with the --no-pager option. In addition, you can use git config pager.diff false to ensure that the output of the diff command is never sent to the pager.
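
Both ways of bypassing the pager for a single call might look as follows; the scratch repository and the --format=%s option (which reduces the log output to the commit message's first line) are assumptions of this example.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "john.doe@example.com"
git commit -q --allow-empty -m "First version"

# Variant 1: switch the pager off for this call.
subject=$(git --no-pager log -1 --format=%s)

# Variant 2: use cat as the pager for this call.
subject2=$(git -c core.pager=cat log -1 --format=%s)

echo "$subject"
echo "$subject2"
```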

1.3.6. Configuration via Environment Variables

Some options can also be overridden by environment variables. In this way, options can be set in a shell script or alias for a single command only.

GIT_EDITOR: the editor that Git starts, for example, to create the commit message. Alternatively, Git uses the EDITOR variable.

GIT_PAGER: the pager to be used. The value cat switches the pager off.

GIT_AUTHOR_EMAIL, GIT_COMMITTER_EMAIL: uses the appropriate email address for the author or committer field when creating a commit.

GIT_AUTHOR_NAME, GIT_COMMITTER_NAME: analogous to the name.

GIT_DIR: Directory in which the Git repository is located; only makes sense if a repository is explicitly stored under a directory other than .git.

The latter variable is useful, for example, if you want to access the version history of another repository within a project without changing directory:

$ GIT_DIR="$HOME/proj/example/.git" git log

Alternatively, you can use the -c option before the subcommand to override a setting for this call only. For example, you could tell Git to disable the core.trustctime option for the upcoming call:

$ git -c core.trustctime=false status
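
As an illustration of such one-off overrides, the following sketch creates a commit whose author differs from the configured user, and then reads the history from outside the repository via GIT_DIR. The name Jane Roe, her address, and the scratch repository are made up for this example.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "John Doe"
git config user.email "john.doe@example.com"

# The variables apply to this single git call only.
GIT_AUTHOR_NAME="Jane Roe" GIT_AUTHOR_EMAIL="jane.roe@example.com" \
    git commit -q --allow-empty -m "Commit with a different author"

author=$(git log -1 --format='%an <%ae>')   # author name and address
committer=$(git log -1 --format=%cn)        # committer name
echo "author:    $author"
echo "committer: $committer"

# Query the same repository from a different directory.
cd /
subject=$(GIT_DIR="$repo/.git" git log -1 --format=%s)
echo "subject:   $subject"
```

The committer field still shows the configured user, because only the author variables were overridden.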

1.3.7. Automatic Error Correction

The value of the help.autocorrect option determines what Git should do if it can’t find the subcommand you entered, for example if you accidentally type git statsu instead of git status.

If the option is set to a number n greater than zero and Git finds exactly one subcommand similar to the one you typed, this command is executed after n tenths of a second. A value of -1 executes the command immediately. If the option is unset or set to 0, Git only lists the candidates.

So to correct a typo after one second, set:

$ git config --global help.autocorrect 10
$ git statsu
WARNING: You called a Git command named 'statsu', which does not exist.
Continuing under the assumption that you meant 'status'
in 1.0 seconds automatically...
[...]

You can of course cancel the command during this time with Ctrl+C.

8. Even if you follow the example exactly, you will not get the same SHA-1 checksums, since they are calculated from the contents of the commit, the author, and the commit time, among other things.

9. Alternatively, you can store the user-specific configuration under the XDG-compliant path .config/git/config in your home directory (or under git/config relative to $XDG_CONFIG_HOME, if that environment variable is set).

10. If available, settings from /etc/gitconfig are also read in (with lowest priority). You can set options in this file using the --system parameter, but you need root privileges to do this. Setting Git options system-wide is unusual.

11. “i18n” is a common abbreviation for the word “internationalization” — the 18 stands for the number of omitted letters between the first and last letter of the word.

Git: Distributed Version Control for Code and Documents
