Sharing Git repositories on Dreamhost with InDefero

This document explains how to install the InDefero project management system on a shared host like Dreamhost. These notes should also be useful for any other self-contained deployment with minor adjustments, as I’ve tried to explain in detail a number of points that weren’t too clear in the original documentation. While these notes focus on Git, InDefero also supports Mercurial and SVN backends; the installation part of this document may still be useful to you even if you intend to use either of those backends.

I wanted to have a way of hosting my own Git repositories and projects for collaboration with colleagues. While I already use GitHub for open source projects, I wanted an easy solution for private collaboration. For this type of work I don’t need all the bells and whistles of GitHub; my requirements are pretty simple:

  • Hosting multiple git repositories.
  • Open source code I can run on my own server.
  • The ability to add users and have different users on different projects, without needing to give each user a real shell account on the server.
  • Basic code management features: source viewing, change log and commits.
  • A few project documentation and ticket management features would be welcome, but their absence is not a deal breaker.

I looked around quite a bit, and InDefero seemed to fit the bill best. The various *Forge alternatives are enormous beasts. From what I found online, Trac’s git support may still be a bit flaky (I found discouraging comments about the plugin). Redmine looks neat but requires a full Rails stack, which I’m not really interested in getting into, and gitosis may be just a tad too bare-bones for what I wanted. InDefero kept appearing as a good balance of the above, so I decided to give it a try. I’d summarize it as a lightweight, google-code-like project management system whose git backend design is inspired by gitosis. It also supports Hg and SVN backends, though I’m not interested in those and won’t discuss their configuration here (I turned them off).

So far I have been very satisfied in that I think it does precisely what I need. There are one or two minor things I’d like to have that I miss from Trac, but in the end they are all sugar I can live without, since they involve git history and log manipulations I can easily do with a local client.

The one thing that wasn’t too easy was the installation: the documentation isn’t the best, the installation instructions are scattered across various documents, and a number of key steps are not explained at all (not even mentioned), so it took me a fair amount of time to get the whole thing working. I am putting together these notes in the hope they save someone else time, and the InDefero team is welcome to use any or all of this in their own documents. I estimate that with a document like this one, you should be able to start from scratch and configure an entire setup in 2-4 hours, assuming you are comfortable working in a Linux command-line environment and are used to building software from source.

Please let me know of any inaccuracies in this document so I can correct them for the benefit of others.

Preliminaries

The key sets of instructions that this document complements are:

Hopefully if I missed something, the above should let you figure out the whole thing.

For everything below, I have set the variable:

PREFIX=$HOME/usr/local

and I have a set of bash utilities that configure my various *PATH variables so that this is a valid installation prefix for me; all installs will be done using --prefix=$PREFIX. This means that the following path variables are all properly configured:

PATH: binary execution
LD_LIBRARY_PATH: dynamic linker search path
LIBRARY_PATH: static linking by gcc (like -L)
CPATH: generic include path for gcc (like -I), used for all languages
C_INCLUDE_PATH: C-specific include path, after CPATH
CPLUS_INCLUDE_PATH: C++-specific include path, after CPATH
PYTHONPATH: search path for python packages
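
For readers without such utilities, here is a minimal sketch of an equivalent ~/.bashrc fragment. The bin, lib and include subdirectory names are the conventional ones and are assumptions here, not a copy of my own setup; PYTHONPATH is left out because its value depends on the Python version in use.

```shell
# Sketch of a ~/.bashrc fragment; adjust to your own layout.
PREFIX="$HOME/usr/local"

# Prepend the prefix to each search path. The ${VAR:+:$VAR} form avoids
# leaving a stray trailing colon when the variable was empty or unset.
export PATH="$PREFIX/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LIBRARY_PATH="$PREFIX/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
export CPATH="$PREFIX/include${CPATH:+:$CPATH}"
```

C_INCLUDE_PATH and CPLUS_INCLUDE_PATH can be set the same way if you need language-specific include directories.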

I also use a little python script called unpack that knows how to unpack zip, tar, gzip, bz2 and other formats with a single call, so I don’t have to constantly remember what to type. If you don’t use unpack, simply substitute the appropriate tar/unzip commands as needed.
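
If you would rather not type the individual commands each time, a rough shell stand-in for such a helper could look like the sketch below. This is not my python script, only a hypothetical function with the same name that picks the extraction command from the file extension.

```shell
# unpack: extract an archive, choosing the tool from the file extension.
# The .tar.* patterns come first so they are not caught by *.gz or *.bz2.
unpack() {
    case "$1" in
        *.tar.gz|*.tgz)   tar xzf "$1" ;;
        *.tar.bz2|*.tbz2) tar xjf "$1" ;;
        *.tar)            tar xf "$1" ;;
        *.zip)            unzip -q "$1" ;;
        *.gz)             gunzip "$1" ;;
        *.bz2)            bunzip2 "$1" ;;
        *)
            echo "unpack: unrecognized archive type: $1" >&2
            return 1
            ;;
    esac
}
```

With this defined in your shell, the unpack commands used below should work as written for the tarballs in this document.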

Dreamhost configuration

This section contains the various things you need to do on the Dreamhost account, some of them via the control panel, some locally on the shell. Basically, you will need:

  • A custom domain for your InDefero install, along with a set of directories for it.
  • A MySQL database.
  • A new user whose purpose will be only to manage the Git repositories, and who will receive the SSH keys of your InDefero users.

We’ll now see these in detail.

Domain configuration and file layout

Since on a shared host I can’t install in /home/www as the default instructions suggest, I made a separate directory for the InDefero installation, along with a sub-domain pointing to it. In these instructions, I will use example.com as the generic domain and site for the subdomain hosting the InDefero site; substitute your values accordingly.

If the main site is called http://example.com, then I created a subdomain in the Dreamhost panel called http://site.example.com, which was pointed to the directory $HOME/example.com/site/www as the serving directory. The extra www subdirectory is meant to host the actual public files (akin to /var/www in a non-shared host), while $HOME/example.com/site holds the various files of the InDefero install, git repositories, etc. Specifically, in my case, the layout of $HOME/example.com/site is:

drwxrwxr-x 3 2009-11-30 18:31 git_repos/
drwxrwxr-x 2 2009-11-28 21:51 idf_upload/
drwxrwxr-x 2 2009-11-28 21:51 idf_upload_attach/
lrwxrwxrwx 1 2009-11-28 17:14 indefero -> indefero-0.8.8/
drwxrwxr-x 8 2009-11-29 21:38 indefero-0.8.8/
drwxrwxr-x 3 2009-12-01 03:41 mysql-dumps/
lrwxrwxrwx 1 2009-11-28 23:50 pluf -> pluf-master-091126/
drwxrwxr-x 7 2009-11-28 17:14 pluf-master-091126/
drwxrwxr-x 2 2009-11-28 23:37 www/
drwxrwxr-x 3 2009-11-30 18:00 tmp/

The www/ directory where files actually get served from only contains:

.htaccess -> ../indefero/www/.htaccess
favicon.gif
favicon.ico
index.php -> ../indefero/www/index.php
media -> ../indefero/www/media/

A local directory is needed for the actual git repositories. The InDefero suggested layout can’t be used on a shared host, which is why there is a git_repos subdirectory shown above. This will be put in the InDefero config file later.

The mysql-dumps directory will be used to store backups of our database, as explained below.

Note

I kept the main indefero and pluf directories as symlinks to the unpacked versions I downloaded, to make upgrades easier by repointing a symlink once things are tested to work (and to make backing out of a problem a little easier). I also created a git repo for each of indefero and pluf upon download, so that I can track my local changes as patches in git, which hopefully will make it easier to upgrade by showing me precisely what I changed from the default install.

MySQL

One thing the instructions didn’t mention, even in passing, is the separate MySQL configuration step required. This may be common knowledge for someone used to PHP, but it wasn’t for me. Using the Dreamhost panel, I made an SQL database with:

User: USER
Host: mysql.site.example.com
DB  : examplecom_site

For reference, the local login command is:

mysql -u USER -p -h mysql.site.example.com examplecom_site

Local user for Git management

We need a user to manage the git transactions. All tutorials I’ve found suggest the creation of a dedicated user called git. On Dreamhost this username is already taken, so I made another user (call it git2), and also created a custom group to which both git2 and my normal user belong. As long as this information is given to the proper InDefero config variables, the actual name of the user is irrelevant.

In the git user’s home directory, don’t forget to make the .ssh directory with the proper permissions and make an empty authorized_keys file. The InDefero instructions for the SyncGit plugin explain this, but they assume you have sudo access. On a shared host this isn’t the case, so you must do it manually by logging in as the new user, and then running the rest of the commands. For reference (substitute git2 with the name of your git user):

su - git2  # become the new user manually
cd
mkdir .ssh
touch .ssh/authorized_keys
chmod 0700 .ssh
chmod 0600 .ssh/authorized_keys
exit

Note

On the Dreamhost panel, when creating the new user, do not select “enhanced security”, because we need this new user to be able to share a group with the normal user, and if I understand correctly, “enhanced security” would lock down the new user too much.

Installing all the prerequisites

OpenSSL and Curl (for Git)

As suggested by this post, I built OpenSSL and Curl, as they provide some extra functionality to the Git we’ll build (the one on Dreamhost is very old). In my case they may not have been 100% necessary, as right now I don’t intend to have my InDefero repositories pulling, but it’s easy enough to do as part of the whole build. They are perfectly straightforward. First, the latest openssl:

wget http://www.openssl.org/source/openssl-0.9.8l.tar.gz
unpack openssl-0.9.8l.tar.gz
cd openssl-0.9.8l
./config shared zlib --prefix=$PREFIX
make
make install

And similarly for Curl:

wget http://curl.haxx.se/download/curl-7.19.7.tar.gz
unpack curl-7.19.7.tar.gz
cd curl-7.19.7/
./configure --prefix=$PREFIX --with-ssl=$PREFIX
make
make install

If you have symbol version problems with OpenSSL, you may find the following notes by Jacobo de Vera useful (thanks for the contribution!):

The upstream version of OpenSSL does not include versioning symbols. Since this causes problems when more than one version of a library is present, the versions of OpenSSL that are shipped with the main distros are patched to include versioning symbols. One of the symptoms of this is error messages like the following being output whenever a tool that uses OpenSSL is executed:

$ php
php: /home/user/run/lib/libssl.so.0.9.8: no version information
available (required by php)
php: /home/user/run/lib/libcrypto.so.0.9.8: no version information
available (required by php)
php: /home/user/run/lib/libcrypto.so.0.9.8: no version information
available (required by /usr/lib/libc-client.so.2002edebian)
php: /home/user/run/lib/libssl.so.0.9.8: no version information
available (required by /usr/lib/libc-client.so.2002edebian)
php: /home/user/run/lib/libssl.so.0.9.8: no version information
available (required by /usr/lib/libcurl.so.3)
php: /home/user/run/lib/libcrypto.so.0.9.8: no version information
available (required by /usr/lib/libcurl.so.3)

What I have done to avoid this, since I compiled my own OpenSSL in order to build a newer version of GIT (to replace the default one provided by Dreamhost), is to follow the steps that distros do, described in a comment here:

> http://rt.openssl.org/Ticket/Display.html?id=1222&user=guest&pass=guest
> For that to happen I introduced a version script openssl.ld with the
> following contents:
>
> OPENSSL_0.9.8 {
> global:
> *;
> };
>
> It has to be in the toplevel directory and in the engines directory.
>
> The SHARED_LDFLAGS get the additional options
> -Wl,--version-script=openssl.ld

SHARED_LDFLAGS is a variable in the Makefile that is generated by config. After installing this version, the errors disappear.
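
As a sketch of the first half of that recipe, the version script can be written into both locations with a small helper like the one below. The function name is hypothetical; the script contents are the ones quoted above. Adding -Wl,--version-script=openssl.ld to SHARED_LDFLAGS in the generated Makefile is still left as a manual edit, since I have not tried to automate it.

```shell
# Write the version script into the top level of the OpenSSL source tree
# (given as $1) and into its engines/ subdirectory.
write_openssl_version_script() {
    src="$1"
    for dir in "$src" "$src/engines"; do
        cat > "$dir/openssl.ld" <<'EOF' || return 1
OPENSSL_0.9.8 {
    global:
        *;
};
EOF
    done
}
```

For the build described above, it would be run as write_openssl_version_script openssl-0.9.8l after ./config and before make.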

Git

The Dreamhost wiki page on git has more details, including the NO_MMAP suggestion to prevent Dreamhost from killing git processes that access large files via mmap (this triggers a false positive on their automatic memory police). In my case, I built v1.6.5.3. After unpacking the sources, I used:

./configure --prefix=$PREFIX --with-openssl=$PREFIX --with-curl=$PREFIX
make NO_MMAP=1 install

Note that you must give the NO_MMAP flag in the install step as well: if you give it only in the make step and then run a plain make install, git will get rebuilt.

PEAR and PHP tools

The InDefero docs put this later, but to be 100% sure that all subsequent pear/php commands run using the proper versions, I think it’s safest to first set up the environment by putting this into your ~/.bashrc file and reloading it:

# PEAR/PHP install at dreamhost
export PHP_PEAR_PHP_BIN=/usr/local/php5/bin/php
export PATH=$HOME/usr/pear:/usr/local/php5/bin:$PATH

Now, we can do a local pear install. It seems pear also needs some caching directories, and I don’t know enough about it to be sure it’s safe to have the caching directories below the root pear path, so I’m keeping them separate. I made the following directories:

mkdir -p ~/usr/var/pear/cache
mkdir -p ~/usr/var/pear/temp

~/usr/pear will be the root pear tree, and ~/usr/var will hold server-style data in a single location; I use it for the PEAR cache and temporary directories. The InDefero installation instructions suggest using ~/tmp/pear, but I don’t like keeping anything in ~/tmp that I can’t simply destroy, so I used this layout instead.

Now I can create the pear config:

pear config-create ~/usr/ ~/.pearrc
pear config-set download_dir ~/usr/var/pear/cache/
pear config-set cache_dir ~/usr/var/pear/cache/
pear config-set temp_dir ~/usr/var/pear/temp/

With this configured, I ran the install and it all worked fine:

pear install -o PEAR
pear install --alldeps Mail
pear install --alldeps Mail_mime

A quick check gives me:

[usr]> pear list
INSTALLED PACKAGES, CHANNEL PEAR.PHP.NET:
=========================================
PACKAGE          VERSION STATE
Archive_Tar      1.3.3   stable
Auth_SASL        1.0.3   stable
Console_Getopt   1.2.3   stable
Mail             1.1.14  stable
Mail_Mime        1.5.2   stable
Mail_mimeDecode  1.5.0   stable
Net_SMTP         1.3.4   stable
Net_Socket       1.0.9   stable
PEAR             1.9.0   stable
Structures_Graph 1.0.3   stable
XML_Util         1.2.1   stable

The above worked for me, but Jacobo notes that he actually needed to do a full PEAR install. His notes are below in case you also need a from-scratch PEAR build (still assuming the PATH configuration listed above):

The version of PEAR that is currently available in DreamHost is so old that Archive_Tar, which is a dependency of the newer PEAR, refuses to install.

I wanted a clean PEAR installation, like the ones resulting from running pear install -o PEAR, but, ironically, first I needed a newer PEAR :)

I created a temp directory ~/tmpear and then ran this:

cd ~/tmpear
curl http://pear.php.net/go-pear | php

Said yes to everything but changed the installation prefix to be $HOME/tmpear. Once that was finished, ran this:

$HOME/tmpear/bin/pear config-create $HOME/usr .pearrc
$HOME/tmpear/bin/pear install -of PEAR

Notice the -f flag; otherwise it won’t reinstall it. Once that was done, removed the temporary installation:

rm -rf $HOME/tmpear

I then installed the rest of the needed packages:

pear install -a Mail
pear install -a Mail_Mime

Install and configure Pluf/InDefero

Once I had the file layout ready, for the actual installation of Pluf and InDefero, I followed the instructions as listed in part 3 of the InDefero Dreamhost/Mercurial instructions pretty much to the letter. That section describes fairly well the changes needed to the generic InDefero install (explained here).

This is the part where most of the work goes: editing the conf/idf.php configuration file (along with a few changes to path.php).

In the conf/idf.php file, I created this block of variables that summarizes most of my configuration:

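# fperez - variables
$fp = 'example.com';
# $HOME is a placeholder here: PHP will not expand it inside these strings,
# so write out the full path to your home directory instead.
$fp_home = '$HOME';
$fp_site = '$HOME/example.com/site';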
$fp_git_user_home = '/home/git2';
$fp_git_repos = "$fp_site/git_repos";
$fp_site_url = 'site.example.com';
$fp_mail_user = 'nobody@nowhere';
$fp_db_login = 'USER';
$fp_db_password = '???';
$fp_db_server = "mysql.$fp_site_url";
$fp_database = 'examplecom_site';

Then, with those variables I constructed the values for everything below in the actual file, minimizing repetition of paths and making the whole thing a bit easier to understand (for me).

In particular, don’t forget that the MySQL information must then be properly put into the php configuration file also:

# Database configuration
$cfg['db_login'] = $fp_db_login;
$cfg['db_password'] = $fp_db_password;
$cfg['db_server'] = $fp_db_server;
$cfg['db_version'] = '5.0'; # Only needed for MySQL
$cfg['db_table_prefix'] = 'indefero_';
$cfg['db_engine'] = 'MySQL';
$cfg['db_database'] = $fp_database;

A few other configuration variables that rely on the ones above, and on the directory layout previously explained for our site:

$cfg['url_upload'] = "http://$fp_site_url/media/upload";
$cfg['upload_path'] = "$fp_site/idf_upload";
$cfg['upload_issue_path'] = "$fp_site/idf_upload_attach";
$cfg['tmp_folder'] = "$fp_site/tmp";
$cfg['pear_path'] = "$fp_home/usr/pear/php";
$cfg['git_path'] = "$fp_home/usr/local/bin/git";

Initialize Pluf and InDefero

Once the db information above is correctly entered into the php config, the following should work, executed in the indefero/src directory:

$ php ../../pluf/src/migrate.php --conf=IDF/conf/idf.php -a -i -d
PHP include path: $HOME/usr/pear/php:.:/usr/local/php5/lib/php:/usr/local/lib/php:$HOME/example.com/site/pluf-master/src
Install all the apps
Pluf_Migrations_Install_setup
IDF_Migrations_Install_setup

Next, run the bootstrap script to create the first user. Once that’s working, use this .htaccess file:

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) /index.php?_pluf_action=$1

to get shorter urls for projects. Note: the last line is different from that on the website, and this is the correct one (from a message on the mailing list by the author).

You should now have a running installation; try it out by creating a new project. Enjoy!

Note

By default, InDefero does not create empty repositories on the server, nor is there an option to do so. The recommended workflow is simply to create the project on the server, then make a local repository and push to the InDefero host (the ‘Source’ tab for each project has nice copy/paste instructions for this).

New users

InDefero is meant as a public forge, but in my case I don’t actually need outsiders to create new accounts, and in fact I don’t want the functionality. I will create new accounts manually only for collaborators I am going to work with, and this is easily done by running the bootstrap script again with different user information. These users can then change their password via the web interface to whatever they want.

I actually disabled new account creation by simply commenting out from src/IDF/templates/idf/login_form.html the “I am new here” entry that normally leads to the new account page. Just surround the relevant line with {* and *} comment markers:

> git diff HEAD~2 login_form.html
diff --git a/src/IDF/templates/idf/login_form.html b/src/IDF/templates/idf/login_form.html
index 624d613..93d5566 100644
--- a/src/IDF/templates/idf/login_form.html
+++ b/src/IDF/templates/idf/login_form.html
@@ -10,8 +10,9 @@
 <p><label for="id_login">{trans 'My login is'}</label> <input type="text" name="login" id="id_login" value="{$login}" /></p>

 <h3>{trans 'Do you have a password?'}</h3>
-<p><input name="action" id="action-new-user" value="new-user" type="radio" /> <label for="action-new-user">{trans 'No, I am a new here.'}</label></p>
-
+{*
+<p><input name="action" id="action-new-user" value="new-user" type="radio" /> <label for="action-new-user">{trans 'No, I am a new here.'}</label></p> >
+*}
 <p><input name="action" id="action-login" value="login" type="radio" checked="checked" /> <label for="action-login">{trans 'Yes'}</label>, <label for="id_password">{trans 'my password is'}</label> <input type="password" name="password" id="id_password" /></p>

 <p><input type="submit" value="{trans 'Sign in'}" />

If you really want to disable account creation in full, you should also replace the indefero/src/IDF/templates/idf/register/index.html template with a mostly empty page, since otherwise people can still just navigate to the site.example.com/register URL and will get the registration page. This is what I did for my actual site.

If you make changes to the html templates, remember to flush the temporary and cache directories to force a refresh of the public pages.
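
A minimal sketch of such a flush is shown below. It assumes that the directory to empty is the tmp_folder configured in conf/idf.php ($HOME/example.com/site/tmp in the layout above); whether your install has other cache directories depends on your configuration. The function name is hypothetical.

```shell
# Remove everything inside the given directory, keeping the directory itself.
flush_dir() {
    dir="${1:?usage: flush_dir DIRECTORY}"
    [ -d "$dir" ] || return 1
    # -mindepth 1 protects the directory itself; -delete removes the contents.
    find "$dir" -mindepth 1 -delete
}
```

For this site it would be called as flush_dir "$HOME/example.com/site/tmp". Double-check the path before running it, since the removal is not reversible.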

SSH key synchronization and security

One small nugget that is stated in the InDefero Git docs but is not very clearly explained is how new users are given permissions to the repositories. This requires two things: a little cron job run by the special git user and understanding how the keys are managed.

Your InDefero users do not have shell access to your server; in order to use the repositories they must upload their SSH public key through the web interface. Every time a user’s SSH key is uploaded, InDefero leaves a little temporary file (its name is stored as $cfg['idf_plugin_syncgit_sync_file']). InDefero ships with a PHP script, indefero/scripts/gitcron.php, that detects this file and syncs the SSH keys from the database over to the ~/.ssh/authorized_keys file of the special git user. You can run this script manually to sync users, but it’s a good idea to leave it as a cron job in case users update their SSH keys later.
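
As a sketch, the snippet below builds a crontab entry for the sync script. The five-minute interval is an assumption, /home/YOUR_USER is a placeholder for the home directory of your normal user, and the PHP path is the one used earlier in these notes.

```shell
# Build the crontab entry that runs the sync script every five minutes.
# The interval is an assumption; /home/YOUR_USER is a placeholder.
PHP=/usr/local/php5/bin/php
SITE="/home/YOUR_USER/example.com/site"
CRON_LINE="*/5 * * * * $PHP $SITE/indefero/scripts/gitcron.php"
echo "$CRON_LINE"
```

The printed line goes into the crontab of the special git user (git2 in my case), for example by pasting it in with crontab -e while logged in as that user.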

An important point is that when these keys are uploaded, they do not give your InDefero users unrestricted login access, as this would defeat the isolation between projects that InDefero offers. Their SSH keys are saved as authorized, but only to run a single command: a little python script called indefero/scripts/gitserve.py that checks that user’s permissions in the database and only gives them access to the repositories consistent with those permissions. This ensures that the special git user is not a security hole: a user who knows the path to a repository he is not normally allowed to access cannot read it by bypassing the web interface. Many thanks to Loic D’Anterroches, the InDefero project lead, for clarifying this point.
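
The mechanism behind this is the command= option of the OpenSSH authorized_keys format, which forces a given command no matter what the client asks to run. The sketch below prints an entry of that general shape. The exact options and arguments that InDefero writes may differ, and the function name, the alice user name and the key shown are all made up for illustration.

```shell
# Print an authorized_keys entry that forces a single command for one key.
# $1: the forced command, $2: the user's public key line.
restricted_key_entry() {
    printf 'command="%s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' "$1" "$2"
}

# Illustrative call with a hypothetical user and a fake, truncated key.
restricted_key_entry "python /path/to/indefero/scripts/gitserve.py alice" \
    "ssh-rsa AAAA...example alice@laptop"
```

Looking at the git user’s ~/.ssh/authorized_keys after a sync is a quick way to confirm that every key is restricted in this way.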

Backing things up

OK, you now have a system up and running. How do you back it up? Most of the state of the project lives on the file system (and can thus be backed up with a simple rsync call), except for the SQL database. A simple way to handle this is to back it up manually once, and then to store this backup into a git repository. We can then run a cron job on the server that periodically runs a backup again, and then commits the changes to the repo. By storing an uncompressed dump of the backup, we make it easy for git to compute diffs and to later compress the entire repository efficiently.

We start by running a manual backup once:

cd mysql-dumps
mysqldump --opt -uUSER -pPASSWD -h HOSTNAME DBNAME > DUMPFILE

with this in place, we can now initialize the git repo:

git init
git add DUMPFILE
git commit -m "Initializing: repo to hold backups of SQL database for InDefero site."

And now, we can add a cron job that runs a script like the following every night:

#!/bin/bash
# Dump a backup of mysql database and store it in git repo.

#######################################################
# Configure here with your information
user=YOUR_SQL_USERNAME
hostname=YOUR_SQL_HOSTNAME
passwd="????"
dbname=YOUR_DATABASE_NAME
outdir=$HOME/example.com/site/mysql-dumps

# Make sure we use our own git
git=$HOME/usr/local/bin/git

#######################################################
# Code below
dumpfile=$dbname.sql

# Stop here if the dump directory is missing, rather than dumping elsewhere.
cd "$outdir" || exit 1
mysqldump --opt -u"$user" -p"$passwd" -h "$hostname" "$dbname" > "$dumpfile"
# Store the history in git itself.
"$git" commit -a -m "Automated backup"
# Run gc every time to compact repo and save space
echo "Optimizing repository"
"$git" gc

Since this script has to hold your SQL password in plain text, make it read-execute only for your user, and don’t use an important password there. Alternatively, if you want to play it safer, you can take the password as an argument and initiate the backup process remotely over SSH, from a trusted host. For my purposes this is sufficient.

Once the SQL database is nicely backed up in our site directory, the entire project state consists of plain files, and we can simply rsync it nightly to a remote host for off-site backup.
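
A minimal sketch of that off-site copy is shown below. The function name is hypothetical, and the options are one reasonable choice rather than the only one.

```shell
# Mirror SRC into DEST; DEST may be a local path or a remote host:path.
# -a preserves permissions, times and symlinks; -z compresses in transit;
# --delete removes files from DEST that no longer exist in SRC.
mirror_site() {
    rsync -az --delete "$1/" "$2/"
}
```

It could be called nightly from cron as, say, mirror_site "$HOME/example.com/site" backupuser@backup.example.org:indefero-site, where the user and host are placeholders and key-based SSH login to the remote host is assumed. Keep in mind that --delete propagates deletions too, so the mirror is a copy of the current state; the history lives in the git repositories themselves.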

That’s it. Every night the SQL database is backed up, and git gives us a revision history that is also very space efficient, as the gc step ensures that days with no real changes don’t take any extra space on disk (I tested this). A regular rsync off-site ensures that I have the entire site state and history safely stored, should anything happen at Dreamhost.

Final comments

So far I think InDefero does what I need it to. I hope to clarify a few small questions I have on the list (the author has been very responsive to my queries so far), but I think I’ll stick with it.

A few final points that I did not cover in these notes but that you may need in your own setup:

Email

I did not configure email delivery, as I only expect to make a few new users and I will do it by hand. Jacobo notes that if you eliminate from the IDF config file all email-related options, then the PEAR Mail module’s defaults should work; I haven’t tested this myself.

Git-daemon

This is mentioned in the last step of the official instructions, but the basic Dreamhost plan does not allow me to run daemons. However, git-daemon is only needed if you want to provide anonymous access to your repositories. This is not my case (I use github.com for all my public code), so I didn’t look further into this topic.