Installing DSpace on Debian 5

Lathe operator machining parts for transport planes (LOC)

This post is very similar to my recent post covering installation on CentOS but some of the processes are slightly different for the Debian flavour of Linux. This post should cover everything Debian users need to get up and running with DSpace, although it does not cover configuration of the Handle server.  This is described adequately by the official documentation.

Like the article above, this post is heavily indebted to Clive Gould’s description of the process using DSpace 1.5.2 and CentOS 5.3 though I’ve altered and expanded on a number of elements.

Prerequisites

DSpace has the following software prerequisites for all platforms:

  • Java JDK v5 or later
  • Apache Maven 2.0.8 or later
  • Apache Ant 1.7 or later
  • PostgreSQL database (Oracle is also an option)
  • Jakarta Tomcat (other servlet engines are also an option)
  • Perl 5

Installing on Debian 5

Normally we would assume that you have disabled the root login and created a separate user with sudo permission. However, this is not the standard approach on Debian machines and it complicates the installation process. Therefore the following instructions should be carried out as the root user unless stated otherwise. We recommend disabling the root account once the installation is complete.

Installing Prerequisites

System update

First, ensure that the system is up to date:

apt-get update
apt-get dist-upgrade

Note, if you are using the unofficial maintainer’s archive you may need to add their public key using the package in their repository:

apt-get install dmo-archive-keyring

Similarly if you are using the debian multimedia archive you may need to install their key. Download the package from the Debian Multimedia site and run the following command as root:

dpkg -i debian-multimedia-keyring_2008.10.16_all.deb

Create user

DSpace and Tomcat will need to run as the same user so create that user now:

useradd -m dspace
passwd dspace

Apache

Install the Apache web server using the standard packages:

aptitude install apache2 apache2.2-common apache2-mpm-prefork apache2-utils libexpat1 ssl-cert

Java SDK

The Java SDK needs to be installed using the Linux installer downloaded from the Oracle Java site, not the Debian package. Download the latest version, which is jdk-6u22-linux-i586.bin at the time of writing.

To install Java we will use two additional packages, fakeroot and java-package.

apt-get install fakeroot java-package

I had to run the above command with --fix-missing to get it to complete. Next run this command as a non-root user:

fakeroot make-jpkg jdk-6u22-linux-i586.bin

If this fails with the error “No matching plugin was found” then you may need to edit one of the configuration files so that java-package recognises this as a valid Java version. If so, edit (as root) the file /usr/share/java-package/sun-j2sdk.sh and add the following entry to the case statement that lists the supported JDK file names. This assumes the version shown above:

	"jdk-6u22-linux-i586.bin") # SUPPORTED
		j2se_version=1.6.0+update${archive_name:6:1}${revision}
		j2se_expected_min_size=130
		found=true
		;;

And re-run the fakeroot command given above.

You should now be able to install the package as root:

dpkg -i sun-j2sdk1.6_1.6.0+update2_i386.deb
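The update2 in the package file name is not a typo. If you added the sun-j2sdk.sh entry shown earlier, the version string is built with the bash substring expansion ${archive_name:6:1}, which takes a single character starting at offset 6 of the installer file name. A minimal sketch of that behaviour, assuming revision is empty (the real script sets it itself):

```shell
#!/bin/bash
# Substring expansion as used in the java-package entry: ${var:offset:length}
archive_name="jdk-6u22-linux-i586.bin"
revision=""   # assumption: empty here; the real script sets this itself

echo "1.6.0+update${archive_name:6:1}${revision}"   # prints 1.6.0+update2
echo "1.6.0+update${archive_name:6:2}${revision}"   # prints 1.6.0+update22
```

A length of 2 would pick up both digits of the update number, but that variation is untested with java-package, so the file name in the dpkg command follows the single-character form.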

PostgreSQL, Ant and Maven

These three prerequisites can be installed from the standard Debian packages.

apt-get install postgresql ant maven2

Configure Maven

You will need to create an environment variable for the location of Maven. Add the following lines to /etc/profile to set the Maven home directory environment variable and add Maven binaries to the system path:

export M2_HOME=/usr/share/maven2
export PATH=${PATH}:${M2_HOME}/bin

Log out and back in so that the profile changes are picked up and test the installation by confirming the Maven version:

mvn --version
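If the version check reports that the command cannot be found, the most likely cause is that the profile changes have not been picked up. The sketch below checks whether the Maven bin directory is on the PATH. path_contains is an illustrative helper name, and the fallback to /usr/share/maven2 simply mirrors the M2_HOME value set above.

```shell
# Succeeds if the given directory is one of the entries in PATH.
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "${M2_HOME:-/usr/share/maven2}/bin"; then
  echo "Maven bin directory is on the PATH"
else
  echo "Maven bin directory is NOT on the PATH"
fi
```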

HTTP Proxy

If you need to run Maven through an HTTP proxy you will need to configure this in the central settings.xml file. This should live under the Maven installation directory:

cd $M2_HOME/conf/

Uncomment the proxies section and add your details as illustrated below:

<settings xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>your.proxy.com</host>
      <port>8080</port>
      <username>proxyuser</username>
      <password>somepassword</password>
      <nonProxyHosts>localhost</nonProxyHosts>
    </proxy>
  </proxies>
</settings>

The username and password settings are only required if your proxy server requires authentication.

You can test the Maven installation and network connectivity by running:

mvn clean

This will give the message BUILD ERROR because it requires a pom.xml file but the output will indicate whether maven is able to download packages from its repositories.

Configure Ant

HTTP Proxy

Part of the DSpace installation needs to download files from the web. If you are accessing the internet through a proxy you may need to tell ant the proxy details. Edit /etc/profile and add the following lines, filling in the details of your HTTP proxy:

export ANT_HOME=/user/local/ant
export PATH=${PATH}:${ANT_HOME}/bin

Note that the proxy address should contain only the domain name (or IP address) and not include ‘http://’.
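The two export lines above only set ANT_HOME and the path. Ant's usual mechanism for proxy details is the ANT_OPTS environment variable, which is passed to the JVM that runs Ant. The following is a sketch rather than a tested part of this installation: PROXY_HOST and PROXY_PORT are illustrative variable names, and the values reuse the placeholders from the Maven example.

```shell
# Hypothetical proxy details; replace with your own.
PROXY_HOST=your.proxy.com   # host name or IP address only, no http:// prefix
PROXY_PORT=8080

# ANT_OPTS is passed to the JVM that runs Ant.
export ANT_OPTS="-Dhttp.proxyHost=${PROXY_HOST} -Dhttp.proxyPort=${PROXY_PORT}"
```

Adding the export line to /etc/profile alongside the others would make it available at the next login.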

Configure PostgreSQL

As per the DSpace installation instructions a couple of changes are needed to the PostgreSQL configuration. Edit the file /etc/postgresql/<version>/main/postgresql.conf and uncomment the line starting:

listen_addresses = 'localhost'

And in the file /etc/postgresql/<version>/main/pg_hba.conf add the line below. Note that this should be the first configuration entry in the list, i.e. before the line that says # "local" is for Unix domain socket connections only:

host dspace dspace 127.0.0.1 255.255.255.255 md5

Restart PostgreSQL to pick up the changes.

/etc/init.d/postgresql-<version> restart

Tomcat

DSpace will run on other servlet engines but for the purposes of this document we will use Apache Tomcat. In order to use Tomcat 6 we will need to download and install it.

Tomcat can be downloaded from the Apache site. The latest stable version at the time of writing is 6.0.26. Unpack it to /usr/local and create a symbolic link to the new folder called tomcat. The name of the directory created after unpacking the Tomcat distribution will be dependent on the version that you downloaded.

ln -s apache-tomcat-6.0.26 tomcat

The DSpace documentation mentions setting the TOMCAT_USER environment variable so that Tomcat will run as the dspace user. This is not necessary because we will be creating a startup script for Tomcat that will ensure that it runs as the correct user. Do however make sure that the tomcat directory is owned by the dspace user:

cd /usr/local
chown -R dspace:dspace apache-tomcat-6.0.26

Set some Java options to ensure that Tomcat is compatible with DSpace. Edit /etc/profile and add the following line:

export JAVA_OPTS="-Xmx512M -Xms64M -Dfile.encoding=UTF-8"

Log out and back in for these changes to be picked up.

Also tell the Tomcat connector to support UTF-8 encoding in URIs. Edit /usr/local/tomcat/conf/server.xml and add the URIEncoding attribute to the HTTP connector so that it reads:

<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443"
URIEncoding="UTF-8" />

And start Tomcat to check if it starts up OK:

cd /usr/local/tomcat/bin
./startup.sh
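One way to judge whether the startup succeeded is to look for Tomcat's startup message in its log. The sketch below assumes the log is at /usr/local/tomcat/logs/catalina.out, which follows from the install path used here but may differ on your system. tomcat_started is just an illustrative helper name.

```shell
# Succeeds if the given Tomcat log contains the "Server startup" message.
tomcat_started() {
  grep -q "Server startup in" "$1" 2>/dev/null
}

# Log location assumed from the /usr/local/tomcat install path used above.
if tomcat_started /usr/local/tomcat/logs/catalina.out; then
  echo "Tomcat reports a completed startup"
else
  echo "No startup message found yet"
fi
```

Bear in mind that catalina.out accumulates across restarts, so a message from an earlier run will also match. Check the timestamps if in doubt.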

If everything is fine shutdown Tomcat:

./shutdown.sh

Perl

Perl should already be installed on your Debian server installation.

Install DSpace

DSpace can be downloaded from Sourceforge. For this example we will be using the default release rather than the source release. Copy the distribution file to your chosen installation directory (e.g. /usr/local/) and unpack it.

This will create a directory called dspace-<version>-release where <version> is the number of the DSpace version you downloaded. Following the convention of the DSpace manual we will refer to this directory as [dspace-source] in the remainder of this document. Change the ownership of this directory to the dspace user:

chown -R dspace:dspace dspace-<version>-release

Configure PostgreSQL

The PostgreSQL database will need to be configured for use by DSpace. First create the user in the DBMS. You will only be able to do this as the postgres user:

su - postgres
createuser -U postgres -d -A -P dspace

Next, log out of the postgres account to get back to the root account:

exit

Then log in as dspace and create the database:

su - dspace
createdb -U dspace -E UNICODE dspace
exit
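At this point it is worth confirming that the dspace database user can log in over TCP, since that is the route covered by the pg_hba.conf entry added earlier. A minimal sketch, where check_dspace_db is an illustrative helper name and the host, user and database names are the ones used in this post:

```shell
# Try a trivial query over TCP as the dspace database user.
# psql will prompt for the password that was set with createuser -P.
check_dspace_db() {
  psql -h 127.0.0.1 -U dspace -d dspace -c 'SELECT 1;'
}

# Usage: check_dspace_db && echo "login OK"
```

If the login is rejected, recheck the position of the host line in pg_hba.conf and make sure PostgreSQL was restarted afterwards.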

Configure DSpace

Edit the DSpace configuration file [dspace-source]/dspace/config/dspace.cfg and set the following values appropriate to your installation.

dspace.dir = /usr/local/dspace
dspace.hostname = dspace.mydomain.com
dspace.baseUrl = http://dspace.mydomain.com
dspace.url = ${dspace.baseUrl}/xmlui
dspace.name = My DSpace Repository
db.username = dspace
db.password = XXXX
mail.server=smtp.mydomain.com
mail.from.address = dspace-noreply@mydomain.com
feedback.recipient = dspace-help@mydomain.com
mail.admin = dspace-help@mydomain.com
default.language = en_GB
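To double-check the values after editing, something like the following can read a single setting back out of the file. It is only a sketch: cfg_get is an illustrative helper name, and it assumes simple key = value lines as shown above.

```shell
# Print the value of one key from a properties-style file such as dspace.cfg.
# Matches "key = value" or "key=value" and skips commented-out lines.
cfg_get() {
  # Escape the dots in the key so they only match literal dots.
  key=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Usage: cfg_get dspace.dir /path/to/dspace.cfg
```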

Next create the DSpace installation directory. This should be the directory specified in the dspace.dir configuration option above and should be owned by the dspace user. Assuming a dspace directory of /usr/local/dspace:

cd /usr/local
mkdir dspace
chown dspace:dspace dspace

Again, following the convention of the DSpace manual we will refer to this installation directory as [dspace].

Build and install DSpace

Run the Maven package for DSpace as the dspace user:

su - dspace
cd [dspace-source]/dspace
mvn package

This may download a large number of additional packages in order to generate the DSpace installation.

Next, run the Ant build (still logged in as the dspace user):

cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant fresh_install

Configure Tomcat to use DSpace webapps

The DSpace installation instructions describe either copying the DSpace webapps into the default Tomcat webapps folder or pointing Tomcat to the DSpace webapps by changing the appBase setting for localhost. We do not recommend either of these approaches – the latter would remove access to all the default Tomcat webapps, which may not be what you want. Instead follow the instructions from the Windows installation section to set Contexts for paths relevant to DSpace.

Edit the file /usr/local/tomcat/conf/server.xml and add the following lines to the host section, substituting the actual paths to your installation directories:

<!-- DEFINE A CONTEXT PATH FOR DSpace XML User Interface  -->
<Context path="/xmlui" docBase="[dspace]/webapps/xmlui" debug="0"
reloadable="true" cachingAllowed="false"
allowLinking="true"/>

<!-- DEFINE A CONTEXT PATH FOR DSpace OAI User Interface  -->
<Context path="/oai" docBase="[dspace]/webapps/oai" debug="0"
reloadable="true" cachingAllowed="false"
allowLinking="true"/>

Before starting Tomcat, create the DSpace administration user. Still logged in as the dspace user run this command:

cd [dspace]/bin
./create-administrator

And then start Tomcat:

cd /usr/local/tomcat/bin
./shutdown.sh
./startup.sh

You can test the installation using lynx (or another browser if you have a GUI capable Debian installation).

lynx http://localhost:8080/xmlui

Additional Configuration Options

Proxy DSpace/Tomcat through Apache

We don’t want to have to type the port number in the URL in order to access our repository so the next step is to set up proxying through Apache to Tomcat using mod_jk.

Install the standard package for mod_jk:

apt-get install libapache2-mod-jk

Edit the file /etc/libapache2-mod-jk/workers.properties and change the following values:

workers.tomcat_home=/usr/local/tomcat
workers.java_home=/usr/lib/j2sdk1.6-sun
worker.list=worker1
worker.worker1.port=8009
worker.worker1.host=localhost
worker.worker1.type=ajp13
worker.worker1.lbfactor=1

And comment out the entries in the DEFAULT LOAD BALANCER WORKER DEFINITION section.

Next, create a file called /etc/apache2/mods-available/jk.conf containing:

#
# Mod_jk2 allows the Apache Web server to connect to application
# servers using the AJP protocol. This allows web applications to
# be integrated seamlessly into your Apache server's URI space and
# utilize Apache features such as SSL processing.
#
#
# mod_jk is configured in /etc/libapache2-mod-jk/workers.properties
#
# Where to find workers.properties
JkWorkersFile	/etc/libapache2-mod-jk/workers.properties
# Where to put jk logs
JkLogFile	/var/log/apache2/mod_jk.log
# Set the jk log level [debug/error/info]
JkLogLevel	info
# Select the log format
JkLogStampFormat	"[%a %b %d %H:%M:%S %Y] "
# JkOptions indicate to send SSL KEY SIZE,
JkOptions	+ForwardKeySize +ForwardURICompat -ForwardDirectories
# JkRequestLogFormat set the request format
JkRequestLogFormat	"%w %V %T"

(Adapted from Clive Gould – http://dspacebromley.blogspot.com/)

In the Virtual Host section of your Apache default site configuration file (/etc/apache2/sites-enabled/000-default) add the following lines to map the URL paths to the worker:

JkMount	/xmlui worker1
JkMount	/xmlui/* worker1
JkMount	/oai worker1
JkMount	/oai/* worker1

Restart Apache:

/etc/init.d/apache2 restart

The repository should now be available at the domain name specified in the section Configure DSpace (with the addition of /xmlui to the URL).

Configure Tomcat to start on boot

Configuring Tomcat to start when the system boots is a two stage process. First, log in as root and create a script file in the directory /etc/init.d/ called tomcat. Give it the following contents, changing any paths where relevant:

#!/bin/bash
### BEGIN INIT INFO
# Provides:          tomcat
# Required-Start:    $syslog
# Required-Stop:     $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start tomcat at boot time
# Description:       Enable tomcat.
### END INIT INFO
TOMCAT_HOME=/usr/local/tomcat
START_TOMCAT=${TOMCAT_HOME}/bin/startup.sh
STOP_TOMCAT=${TOMCAT_HOME}/bin/shutdown.sh
start() {
  echo -n "Starting tomcat: "
  su - dspace ${START_TOMCAT}
  echo "done."
}
stop() {
  echo -n "Shutting down tomcat: "
  su - dspace ${STOP_TOMCAT}
  echo "done."
}
case "$1" in
start)
  start
;;
stop)
  stop
;;
restart)
  stop
  sleep 6
  start
;;
*)
  echo "Usage: $0 {start|stop|restart}"
esac
exit 0

(Source http://cloudservers.rackspacecloud.com/index.php/CentOS_-_Tomcat_6, modified)

Make the script executable:

chmod 755 tomcat

Test that this script works to start and stop Tomcat (this assumes that Tomcat is not currently running):

/etc/init.d/tomcat start
/etc/init.d/tomcat stop

Second, use update-rc.d to create the symbolic links to start the service at the required boot levels.

update-rc.d tomcat defaults

Cron jobs

A number of cron jobs are required in order to carry out scheduled tasks such as sending subscription emails, generating thumbnail images of media and generating statistics. First create a set of cron jobs as the dspace user. From a root login:

su - dspace
crontab -e

Enter the following lines:

# Send out subscription e-mails at 01:00 every day
0 1 * * *  [dspace]/bin/sub-daily
# Run the media filter at 02:00 every day
0 2 * * *  [dspace]/bin/filter-media
# Run the checksum checker at 03:00
0 3 * * *  [dspace]/bin/checker -lp
# Mail the results to the sysadmin at 04:00
0 4 * * *  [dspace]/bin/dsrun org.dspace.checker.DailyReportEmailer -c
# Run stat analysis
0 1 * * * [dspace]/bin/stat-general
0 1 * * * [dspace]/bin/stat-monthly
0 2 * * * [dspace]/bin/stat-report-general
0 2 * * * [dspace]/bin/stat-report-monthly
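Remember that [dspace] in these lines is a placeholder and cron needs the real path. If you keep the entries in a file, a small filter like the one sketched below can fill in the path. fill_dspace_path is an illustrative helper name, and /usr/local/dspace is the installation directory assumed throughout this post.

```shell
# Replace every literal [dspace] placeholder on stdin with a real path.
fill_dspace_path() {
  sed "s|\[dspace\]|$1|g"
}

echo '0 1 * * *  [dspace]/bin/sub-daily' | fill_dspace_path /usr/local/dspace
# prints: 0 1 * * *  /usr/local/dspace/bin/sub-daily
```

With the entries saved in a file (say dspace.cron, a hypothetical name), the filtered output could be piped to crontab - instead of pasting the lines into the editor by hand.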

Log out and then back in as the postgres user and create another crontab to clean the database regularly:

exit
su - postgres
crontab -e

Add these lines (for example):

# Clean up the database nightly at 4.20am
20 4 * * * vacuumdb --analyze dspace > /dev/null 2>&1

Finished

This should complete the DSpace installation and you should be able to access the repository on your desired URL without the need to enter a port number.

Image credit: The Library of Congress


About Rob Ingram
Rob Ingram is the Technical Officer for the Repositories Support Project offering technical advice and support to UK repository networks.

6 Responses to Installing DSpace on Debian 5

  1. Hey! Thanks for the guide! It’s very useful!

  2. Hi again!
    I follow you guide, but I have one question
    How can I quit the /xmlui to the URL ?
    I would like a URL like dspace.mypage.edu.co
    Thanks :)

    • Rob Ingram says:

      I did have a good reference for this from the mail archives but the link I had doesn’t seem to work anymore and I can’t track down the post.

      WARNING: I’m just working from memory here – back everything up if you try this!

      The main points that I remember were:

      In main dspace config change the setting for:

      dspace.url = ${dspace.baseUrl}/xmlui

      to just:

      dspace.url = ${dspace.baseUrl}/

      In the tomcat server.xml comment out the context path definition for /xmlui and set the main document base for the server to /usr/local/dspace/webapps. Note that this will remove access to any other tomcat web apps.

      Create a symbolic link in /usr/local/dspace/webapps to xmlui called ROOT – this makes xmlui the root application for tomcat.

      Change the JkMount statement in /etc/apache2/sites-enabled/000-default so that the references to /xmlui are just /

  3. Graciano says:

    Hi
    java-package was removed from Squeeze.
    Have you tried this on Debian 6?
    Thanks

  4. Jesus Marquez says:

    Hi, i have one problem when execute ant fresh_install, next error ocurr:

    BUILD FAILED
    /home/jmarquez/download/dspace-1.7.1-src-release/dspace/target/dspace-1.7.1-build.dir/build.xml:809: Java returned: 1

    Thank any help!
