CDS Invenio INSTALLATION
========================
Revision: $Id$
About
=====
This document specifies how to build, customize, and install CDS
Invenio for the first time. See RELEASE-NOTES if you are upgrading
from a previous CDS Invenio release.
Contents
========
0. Prerequisites
1. Quick instructions for the impatient CDS Invenio admin
2. Detailed instructions for the patient CDS Invenio admin
3. Configuration philosophy explained and elucidated
0. Prerequisites
================
Here is the software you need to have around before you
start installing CDS Invenio:
a) Unix-like operating system. The main development and
production platform for CDS Invenio at CERN is Debian GNU/Linux,
but any Unix system supporting the software listed below
should do. Note that localhost should have an MTA running
so that CDS Invenio can email notification alerts or registration
information to the end users, contact moderators and
reviewers of submitted documents, inform administrators about
various runtime system information, etc.
Note that if you are using Debian "Sarge" GNU/Linux, you can
install most of the prerequisites and recommendations listed
below by running:
$ sudo apt-get install libapache2-mod-python2.3 \
libapache2-mod-php4 apache2-mpm-prefork mysql-server \
mysql-client php4-cli php4-mysql python2.3-mysqldb \
python2.3-numeric python2.3-4suite python2.3-psyco rxp \
wml gnuplot xpdf-utils gs-common antiword catdoc wv \
html2text ppthtml xlhtml clisp sbcl cmucl gettext
b) MySQL server (may be on a remote machine), and MySQL client
(must be available locally too). MySQL versions 4.0.x are
recommended; versions above 4.1.0 may cause trouble.
Please set the variable ``max_allowed_packet'' in your
``my.cnf'' init file to at least 4M.
c) Apache 2 server, with support for loading DSO modules, and
optionally with SSL support for HTTPS-secure user
authentication. Tested mainly with version 2.0.43 and above.
Apache 2.x is required for the mod_python module (see below).
Note for FreeBSD users: Thierry Thomas
reports troubles with Python open() in the Apache 2 and
mod_python context. The solution is either (1) to compile
Apache 2 with --enable-threads (the port has a knob
WITH_THREADS to do that); or (2) to add the following two
lines to Apache's envvars file:
LD_PRELOAD=/usr/lib/libc_r.so # or libpthread.so
export LD_PRELOAD
d) Python v2.3 or above, as well as the following Python modules:
- (mandatory) MySQLdb
- (mandatory) Numeric module (v21 and above)
- (recommended) PyStemmer, for indexing and ranking
- (recommended) PyRXP, for very fast XML MARC processing
- (recommended) Gnuplot.Py, for producing graphs
- (optional) 4suite, slower alternative to PyRXP
- (optional) Psyco, to speed up the code at places
- (optional) RDFLib, to use RDF ontologies and thesauri
- (optional) mechanize, to run regression web tests
e) mod_python Apache module. Tested mainly with versions
3.0BETA4 and above. mod_python 3.x is required for Apache 2.
Previous versions (as well as Apache 1 ones) exhibited some
problems with MySQL connectivity in our experience.
f) PHP compiled as Apache module, including MySQL support.
Tested mainly with PHP version 4.3.0 and above. (Note that if
you are compiling mod_php from source, it is good to compile
it against the same MySQL client library as mod_python, so
that the two Apache modules are using the same MySQL client
library. We saw Apache/PHP/Python problems in the past when
they weren't. Care should be taken even when you are using
precompiled packages. For example, on Ubuntu 5.10 "Breezy
Badger" it seems that python2.4-mysqldb depends on
libmysqlclient14, while php5-mysql depends on libmysqlclient12,
which leads to Apache segmentation faults.)
g) PHP compiled as a standalone command-line executable (CLI)
(in addition to Apache module) is required, too. As of PHP
4.3.0 you'll obtain the CLI executable by default, so you
don't have to compile it separately. Note that PHP CLI should
be compiled with the process control support (--enable-pcntl)
and the compression library (--with-zlib).
h) WML - Website META Language. Tested mainly with versions
2.0.8 and 2.0.9. Note that on Red Hat Linux 9, WML 2.0.9
compiled with Perl 5.8.0 exhibits problems, so you had better use
a downgraded or upgraded Perl for compiling WML on that platform.
i) If you want to be able to extract references from PDF fulltext
files, then you need to install pdftotext version 3 or later.
j) If you want to be able to search for words in the fulltext files
(i.e. to have fulltext indexing), then you also need to install
some of the following tools:
- for PDF files: pdftotext or pstotext
- for PostScript files: pstotext or ps2ascii
- for MS Word files: antiword, catdoc, or wvText
- for MS PowerPoint files: pptHtml and html2text
- for MS Excel files: xlhtml and html2text
k) If you have chosen to install fast XML MARC Python processors
in step d) above, then you have to install the parsers
themselves:
- (optional) RXP
- (optional) 4suite
l) (recommended) Gnuplot, the command-line driven interactive
plotting program. It is used to display download and citation
history graphs on the Detailed record pages on the web
interface. Note that Gnuplot is not required, only
recommended.
m) (recommended) A Common Lisp implementation, such as CLISP,
SBCL or CMUCL. It is used for the web server log analysing
tool and the metadata checking program. Note that any of the
three implementations CLISP, SBCL, or CMUCL will do. CMUCL
produces the fastest machine code, but it does not support UTF-8
yet. Pick CLISP if you don't know which to choose. Note that a
Common Lisp implementation is not required, only recommended.
n) GNU Gettext, a set of tools that makes it possible to
translate the application into multiple languages.
This is available by default on many systems.
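For item b) above, whether a given my.cnf already satisfies the 4M
minimum can be checked mechanically. The following is a minimal
sketch only, written for a modern Python 3 interpreter rather than
the Python 2.3 that CDS Invenio itself targets. The function names
are hypothetical, and the sketch assumes that the setting lives in
the [mysqld] section in plain "name = value" form; it does not follow
include directives or handle the older set-variable syntax.

```python
import re

# Multipliers for the size suffixes accepted in MySQL option files.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
MINIMUM = 4 * 1024 ** 2  # the 4M minimum recommended in item b)


def size_to_bytes(value):
    """Convert a size such as '4M' or '1048576' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value, re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognized size: %r" % value)
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def max_allowed_packet(cnf_text):
    """Return the [mysqld] max_allowed_packet in bytes, or None if unset."""
    section, found = None, None
    for raw in cnf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
        elif section == "mysqld" and "=" in line:
            key, value = line.split("=", 1)
            # MySQL treats dashes and underscores in option names alike.
            if key.strip().replace("-", "_") == "max_allowed_packet":
                found = size_to_bytes(value)  # a later entry overrides
    return found


if __name__ == "__main__":
    # Assumed sample content; read your real my.cnf here instead.
    sample = "[mysqld]\nmax_allowed_packet = 4M\n"
    value = max_allowed_packet(sample)
    print("ok" if value is not None and value >= MINIMUM else "too small")
```

A result of None means that the setting was not found in the section
inspected, in which case the server falls back to its built-in default.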
Note that the configure script checks whether you have all the
prerequisite software installed and that it won't let you continue
unless everything is in order. It also warns you if it cannot find
some optional but recommended software.
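If you would like a rough idea of what is missing before running
configure, a check along the following lines may help. It is only a
sketch for a modern Python 3 interpreter, not a copy of what the
configure script does: the command names in the two lists are
assumptions about how the prerequisites are usually installed, and
the helper name is hypothetical.

```python
import shutil

# Command names are assumptions about how each prerequisite is usually
# installed; the real configure script may look for different names.
REQUIRED = ["mysql", "php", "wml", "msgfmt"]
RECOMMENDED = ["pdftotext", "gnuplot", "clisp"]


def missing(tools, which=shutil.which):
    """Return the tools from the given list that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for label, tools in (("required", REQUIRED), ("recommended", RECOMMENDED)):
        absent = missing(tools)
        print("%s: %s" % (label, ", ".join(absent) if absent else "all found"))
```

The lookup function is passed in as a parameter so that the check can
be exercised without touching the real PATH.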
1. Quick instructions for the impatient CDS Invenio admin
=========================================================
$ cd /usr/local/src/
$ wget http://cdsware.cern.ch/download/cds-invenio-0.90.tar.gz
$ wget http://cdsware.cern.ch/download/cds-invenio-0.90.tar.gz.md5
$ wget http://cdsware.cern.ch/download/cds-invenio-0.90.tar.gz.sig
$ md5sum -v -c cds-invenio-0.90.tar.gz.md5
$ gpg --verify cds-invenio-0.90.tar.gz.sig cds-invenio-0.90.tar.gz
$ tar xvfz cds-invenio-0.90.tar.gz
$ cd cds-invenio-0.90
$ ./configure --prefix=/opt/cds-invenio \
--with-weburl=http://webserver.domain.com \
--with-sweburl=https://webserver.domain.com \
--with-dbhost=sqlserver.domain.com \
--with-dbname=cdsinvenio \
--with-dbuser=cdsinvenio \
--with-dbpass=myp1ss \
--with-python=/opt/python/bin/python2.3
$ vi ./config/config.wml ## optional, but strongly recommended
$ make
$ mysql -h sqlserver.domain.com -u root -p mysql
mysql> CREATE DATABASE cdsinvenio;
mysql> GRANT ALL PRIVILEGES ON cdsinvenio.* TO cdsinvenio@webserver.domain.com IDENTIFIED BY 'myp1ss';
$ sudo vi /path/to/apache/conf/httpd.conf ## see below in part 2
$ sudo vi /path/to/php/conf/php.ini ## see below in part 2
$ sudo /path/to/apache/bin/apachectl graceful
$ make create-tables ## optional
$ sudo ln -s /opt/cds-invenio/lib/python/invenio \
/usr/local/lib/python2.3/site-packages/invenio \
## optional
$ make install
$ make test ## optional
$ sudo chown -R www-data /opt/cds-invenio/var
$ make create-demo-site ## optional
$ make load-demo-records ## optional
$ make remove-demo-records ## optional
$ make drop-demo-site ## optional
$ firefox http://webserver.domain.com/admin/ ## optional
2. Detailed instructions for the patient CDS Invenio admin
==========================================================
CDS Invenio uses the standard GNU autoconf method to build and
install its files. This means that you proceed as follows:
$ cd /usr/local/src/
Change to a directory where we will configure and build
CDS Invenio. (The built files will be installed into
different "target" directories later.)
$ wget http://cdsware.cern.ch/download/cds-invenio-0.90.tar.gz
$ wget http://cdsware.cern.ch/download/cds-invenio-0.90.tar.gz.md5
$ wget http://cdsware.cern.ch/download/cds-invenio-0.90.tar.gz.sig
Fetch the CDS Invenio source tarball from the CDS Software
Consortium distribution server, together with the MD5 checksum
and GnuPG cryptographic signature files, which are useful for
verifying the integrity of the tarball.
$ md5sum -v -c cds-invenio-0.90.tar.gz.md5
Verify MD5 checksum.
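On systems without the md5sum utility, the same check can be done
with Python's standard hashlib module. This is a minimal sketch for a
modern Python 3 interpreter; the function names are hypothetical, and
it assumes the .md5 file holds a single line in the usual md5sum
layout (digest, whitespace, file name).

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large tarballs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_line(line):
    """Split one md5sum-style line into (digest, file name)."""
    digest, name = line.strip().split(None, 1)
    if name.startswith("*"):  # md5sum marks binary mode with '*'
        name = name[1:]
    return digest.lower(), name


def verify(md5_file):
    """Check the file named in md5_file against its recorded digest."""
    with open(md5_file) as handle:
        expected, name = parse_md5_line(handle.readline())
    return md5_of(name) == expected
```

As with "md5sum -c", a relative file name inside the .md5 file is
resolved against the current directory, so verify() should be called
from the directory holding the tarball. A matching MD5 digest only
guards against a corrupted download; the GnuPG signature is what
establishes who published the tarball.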
$ gpg --verify cds-invenio-0.90.tar.gz.sig cds-invenio-0.90.tar.gz
Verify GnuPG cryptographic signature. Note that you may
first have to import my public key into your keyring, if you
haven't done that already:
$ gpg --keyserver wwwkeys.pgp.net --recv-keys 0xBA5A2B67
The output of the gpg --verify command should then read:
Good signature from "Tibor Simko "
You can safely ignore any trusted signature certification
warning that may follow after the signature has been
successfully verified.
$ tar xvfz cds-invenio-0.90.tar.gz
Untar the distribution tarball.
$ cd cds-invenio-0.90
Go to the source directory.
$ ./configure --prefix=/opt/cds-invenio \
--with-weburl=http://webserver.domain.com \
--with-sweburl=https://webserver.domain.com \
--with-dbhost=sqlserver.domain.com \
--with-dbname=cdsinvenio \
--with-dbuser=cdsinvenio \
--with-dbpass=myp1ss \
--with-python=/opt/python/bin/python2.3
Configure the essential CDS Invenio parameters, which have the
following meanings:
--prefix=/opt/cds-invenio
CDS Invenio general installation directory, used to
hold command-line binaries and program libraries
containing the core CDS Invenio functionality, but
also to store web pages, runtime log and cache
information. Several subdirs like `bin', `lib', and
`var' will be created inside the --prefix directory
to this effect. Note that the --prefix directory
should be chosen outside of the Apache htdocs tree,
since only one subdirectory (prefix/var/www) is to be
accessible directly on the Web (see below).
--with-weburl=http://webserver.domain.com
The URL denoting the home URL of your CDS Invenio
installation. The files served by this URL will be
located in `prefix/var/www', so later on in your
Apache config file you would map `weburl' to
`prefix/var/www' (see below).
--with-sweburl=https://webserver.domain.com
The URL denoting the HTTPS-secure equivalent of the
home URL. The secure home URL will be used for
personalization pages, such as user login and
registration page. You must run SSL-enabled Apache
in order to use this feature. If you don't run
SSL-enabled Apache, then the user authentication will
be done via standard HTTP protocol, user credentials
travelling in clear text across the net. The
--with-sweburl option is optional.
--with-dbhost=sqlserver.domain.com
--with-dbname=cdsinvenio
--with-dbuser=cdsinvenio
--with-dbpass=myp1ss
The database server host, the database name, and the
database user credentials.
--with-python=/opt/python/bin/python2.3
Optionally, specify a path to some specific Python
binary. This is useful if you have more than one
Python installation on your system. If you don't set
this option, then the first Python that will be found
in your PATH will be chosen for running CDS Invenio.
CDS Invenio will not install into any directory other than the
one given in this configure line.
Do not use trailing slashes when specifying any of the above
values.
This configuration step is mandatory, and is referred to as
"pre-compile time configuration step a)" in the elucidative
explanatory commentary below.
Note that if you prefer to build CDS Invenio out of its
source tree, you may run the above configure command like
this:
$ mkdir build && cd build && ../configure --prefix=...
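The configure options described above, including the rule against
trailing slashes, can be assembled and checked before the command is
actually run. The helper below is a hypothetical sketch for a modern
Python 3 interpreter; it only builds the argument list and does not
invoke configure.

```python
def configure_command(prefix, weburl, dbhost, dbname, dbuser, dbpass,
                      sweburl=None, python=None):
    """Build the ./configure argument list, rejecting trailing slashes."""
    options = [
        ("--prefix", prefix),
        ("--with-weburl", weburl),
        ("--with-sweburl", sweburl),   # optional, see above
        ("--with-dbhost", dbhost),
        ("--with-dbname", dbname),
        ("--with-dbuser", dbuser),
        ("--with-dbpass", dbpass),
        ("--with-python", python),     # optional, see above
    ]
    command = ["./configure"]
    for flag, value in options:
        if value is None:
            continue  # option not requested
        if value.endswith("/"):
            raise ValueError("%s must not end with a slash: %r" % (flag, value))
        command.append("%s=%s" % (flag, value))
    return command


if __name__ == "__main__":
    # Example values taken from the configure line shown above.
    print(" \\\n    ".join(configure_command(
        "/opt/cds-invenio", "http://webserver.domain.com",
        "sqlserver.domain.com", "cdsinvenio", "cdsinvenio", "myp1ss")))
```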
$ vi ./config/config.wml ## optional, but strongly recommended
Optionally, customize your CDS Invenio installation. We
strongly recommend that you edit at least the top of this file,
where you can define some essential CDS Invenio
parameters like the name of your CDS Invenio document server
(look for CDSNAME and CDSNAMEINTL) or the email address of
the local CDS Invenio administrator (look for SUPPORTEMAIL
and ADMINEMAIL). The latter is needed if you want to use the
administration modules, which you will certainly want to do.
The rest of the "config.wml" file enables you to change the
CDS Invenio web page look and feel, and otherwise to influence
its behaviour and default parameters.
This configuration step is optional, but strongly
recommended. It is referred to as "compile time
configuration step b)" in the elucidative explanatory
commentary below.
$ make
Launch the CDS Invenio build. Since many messages are printed
during the build process, you may want to run it in a
fast-scrolling terminal such as rxvt or in a detached screen
session.
During this step all the pages and scripts will be
pre-created and customized based on the config you have
edited in the previous step.
Before proceeding further with the CDS Invenio installation, we
have to do some admin-level tasks on the MySQL and Apache
servers.
$ mysql -h sqlserver.domain.com -u root -p mysql
mysql> CREATE DATABASE cdsinvenio;
mysql> GRANT ALL PRIVILEGES ON cdsinvenio.* TO cdsinvenio@webserver.domain.com IDENTIFIED BY 'myp1ss';
You need to create a dedicated database on your MySQL server
that CDS Invenio can use for its purposes. Please
contact your MySQL administrator and ask them to execute the
above commands that will create the "cdsinvenio" database, a
user called "cdsinvenio" with password "myp1ss", and that
will grant all rights on the "cdsinvenio" database to the
"cdsinvenio" user. The credential values are the ones you
have chosen in the configure line above.
$ sudo vi /path/to/apache/conf/httpd.conf ## see below in part 2
Please ask your webserver administrator to put the following
lines in your "httpd.conf" configuration file:
AddDefaultCharset UTF-8
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps
This is to ensure that the browsers will get UTF-8 as the
default page encoding, and that "*.php" files will be
interpreted by the web server as PHP files.
As mentioned above, the web pages will get installed into
the `prefix/var/www' directory. Therefore you should
specify something along the lines of:
ServerSignature Off
ServerTokens Prod
NameVirtualHost *
ServerName webserver.domain.com
ServerAdmin cds.support@cern.ch
DocumentRoot /opt/cds-invenio/var/www
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
allow from all
ErrorLog /opt/cds-invenio/var/log/apache.err
LogLevel warn
CustomLog /opt/cds-invenio/var/log/apache.log combined
DirectoryIndex index.en.html index.html index.py index.en.php index.php
SetHandler python-program
PythonHandler invenio.webinterface_layout
PythonDebug On
AddHandler python-program .py
PythonHandler mod_python.publisher
PythonDebug On
This will tell Apache where to find the files, how to
interpret .py files, which files to serve as indexes, etc.
If you have configured the system to use a secure URL for
login (see above), then you have to specify a secure site too,
such as:
ServerSignature Off
ServerTokens Prod
NameVirtualHost *:443
SSLCertificateFile /etc/apache2/ssl/apache.pem
ServerName webserver.domain.com
ServerAdmin cds.support@cern.ch
SSLEngine on
DocumentRoot /opt/cds-invenio/var/www
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
allow from all
ErrorLog /opt/cds-invenio/var/log/apache-ssl.err
LogLevel warn
CustomLog /opt/cds-invenio/var/log/apache-ssl.log combined
DirectoryIndex index.en.html index.html index.py index.en.php index.php
SetHandler python-program
PythonHandler invenio.webinterface_layout
PythonDebug On
AddHandler python-program .py
PythonHandler mod_python.publisher
PythonDebug On
$ sudo vi /path/to/php/conf/php.ini ## see below in part 2
Please ask your webserver administrator to put the following
lines in your "php.ini" configuration file:
log_errors = on
display_errors = off
expose_php = off
max_execution_time = 160
register_globals = on
short_open_tag = on
This will set up some relevant PHP variables.
If your OS uses dynamically loaded PHP MySQL libraries, as
Debian "Sarge" GNU/Linux does, you will also have to specify:
extension=mysql.so
in both /etc/php4/{cli,apache2}/php.ini files.
$ sudo /path/to/apache/bin/apachectl graceful
Please ask your webserver administrator to restart the
Apache server after the above "httpd.conf" and "php.ini"
changes.
With these admin-level tasks, which must be performed as root,
out of the way, let's now go back to finish the installation of
the CDS Invenio package.
$ make create-tables ## optional
If you are installing for the first time, you have to create
CDS Invenio tables in the database.
Note that the `make install' process will warn you in case
the tables were not created and will ask you to run this
step manually before completing the make install process.
$ sudo ln -s /opt/cds-invenio/lib/python/invenio \
/usr/local/lib/python2.3/site-packages/invenio \
## optional
If you are installing for the first time, you will have to
create a symbolic link from Python's site-packages directory
that would indicate to Python where to find CDS Invenio's
Python files.
Note that the exact symlink target location depends on the
--prefix location (prefix/lib/python/invenio) and the exact
symlink source location depends on the Python version you
are using. (See also --with-python configuration option.)
Note that the `make install' process will warn you in case
the symbolic link was not created, and it will show you
the command to use to create it manually before completing
the make install process.
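The two ends of the symbolic link can also be worked out
programmatically. The sketch below is hypothetical and uses the
sysconfig module of a modern Python 3 interpreter; to give a
meaningful answer it has to be run with the same interpreter that was
passed to --with-python, which assumes that interpreter is recent
enough to provide sysconfig.

```python
import posixpath
import sysconfig


def invenio_symlink(prefix, site_packages=None):
    """Return (target, link) for the symbolic link described above."""
    if site_packages is None:
        # Pure-Python package directory of the running interpreter.
        site_packages = sysconfig.get_paths()["purelib"]
    # The link target lives under --prefix; the link itself is created
    # inside the interpreter's site-packages directory.
    target = posixpath.join(prefix, "lib", "python", "invenio")
    link = posixpath.join(site_packages, "invenio")
    return target, link


if __name__ == "__main__":
    target, link = invenio_symlink("/opt/cds-invenio")
    print("sudo ln -s %s %s" % (target, link))
```

The sketch only prints the command rather than creating the link,
since writing into site-packages normally requires superuser rights.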
$ make install
Install the web pages, scripts, utilities and everything
needed for runtime into the respective directories, as
specified earlier by the configure command.
After this step, you should be able to point your browser to
the chosen URL of your local CDS Invenio installation and see it
running!
$ make test
Optionally, you can run our test suite to verify the results
of known tests on your local CDS Invenio installation. Note
that this command should be run only after you have
installed the whole system via `make install'.
$ sudo chown -R www-data /opt/cds-invenio/var
One more superuser step, as we need to enable the Apache server
to write some log information and to cache interesting
entities inside the "var" subdirectory of our CDS Invenio
general installation directory.
Here we assume that your Apache server processes run
as the "www-data" user. Change this appropriately for your
system.
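If you are unsure which account your Apache server runs under, the
User directive in its configuration usually tells you. The following
hypothetical helper is a rough sketch for a modern Python 3
interpreter: it scans configuration text for that directive and
ignores container sections, included files and variable expansion, so
treat its answer as a hint only.

```python
def apache_user(conf_text, default=None):
    """Return the value of the last User directive found, or default."""
    user = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # Apache directive names are case-insensitive.
        if parts[0].lower() == "user" and len(parts) == 2:
            user = parts[1]
    return user


if __name__ == "__main__":
    # Assumed sample content; read your real httpd.conf here instead.
    sample = "ServerRoot /etc/apache2\nUser www-data\nGroup www-data\n"
    print("sudo chown -R %s /opt/cds-invenio/var" % apache_user(sample))
```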
$ make create-demo-site ## optional
This step is recommended to test your local CDS Invenio
installation. It should give you our "Atlantis Institute of
Science" demo installation.
$ make load-demo-records ## optional
Optionally, load some demo records to be able to test
indexing and searching of your local demo CDS Invenio
installation.
$ make remove-demo-records ## optional
Optionally, remove the demo records loaded in the previous
step but otherwise keep the demo collection, submit, format,
etc. configurations that you may reuse and modify for
production purposes.
$ make drop-demo-site ## optional
Optionally, drop also all the demo configuration so that
you'll have a blank CDS Invenio system for your production
purposes.
$ firefox http://webserver.domain.com/admin/ ## optional
Optionally, do further runtime configuration of CDS Invenio,
such as the definition of data collections, document types,
document formats, word file tables, etc.
This configuration step is optional, and is referred to as
"runtime configuration step c)" in the elucidative
explanatory commentary below.
3. Configuration philosophy explained and elucidated
====================================================
As you can see from the above, the configuration of CDS Invenio is
threefold:
(a) pre-compile time configuration phase
[uses command line options / while doing "configure"]
(b) compile time configuration phase
[uses WML / after "configure", while doing "make && make install"]
(c) runtime configuration phase
[uses MySQL / after "make install", while doing "firefox http://webserver.domain.com/admin/"]
What is the difference, and why?
(a) pre-compile time configuration phase
[uses command line options / while doing "configure"]
This configures essential CDS Invenio parameters that make your
CDS Invenio copy installable and runnable. The essential parameters
include: general CDS Invenio installation directory containing
(among others) binaries, libraries, and log and cache
directories; install directory for Web scripts and pages;
and MySQL user and server credentials.
This configuration step uses the standard GNU autoconf approach,
i.e. you will run the standard "configure" script. Note
that the only arguments that CDS Invenio takes into consideration
are the general "--prefix" one and CDS Invenio-specific
"--with-foo" arguments, see the end of "configure --help"
output.
This configuration step is mandatory. Without knowing
these essential parameters there is nothing to install and
nothing to run.
Usually, you do this step only once.
(b) compile time configuration phase
[uses WML / after "configure", while doing "make && make install"]
Optionally, you may choose to influence CDS Invenio behaviour, to
set up CDS Invenio system name, to choose its default parameters,
to change its look and feel, to add your local web pages, etc.
This configuration step uses WML, the Website META Language.
The most important configuration file is "config/config.wml"
that you can edit at your will. Optionally, if you are an
advanced user, you may edit other WML files in the
distribution tree.
After that, when you type "make", the CDS Invenio pages will be
pre-generated. We prefer that this configuration step be
done at compile time and not at runtime, for
several reasons: (i) the pre-generated pages impose less
load on the web server and the database server and so they
are served faster to the end user; (ii) we use several
different languages (Python, PHP) and by using independent
compile-time configuration language we can share the same
configuration variables across heterogeneous languages;
(iii) use of WML and page templates enables you to easily
change anything in the CDS Invenio system, even into deep
levels.
If you are changing parameters and/or the look and feel
of CDS Invenio pages, you may want to repeat these steps several
times:
$ vi config/config.wml
$ make drop-tables
$ make create-tables
$ make create-demo-site
$ make load-demo-records
$ firefox http://webserver.domain.com/
[...]
to see what it brings, until you are satisfied with the
result.
(Note that you may as well choose to change these parameters
inside the CDS Invenio library configuration files later on,
during runtime, at least for non-static Python and PHP
pages.)
(c) runtime configuration phase
[uses MySQL / after "make install", while doing "firefox http://webserver.domain.com/admin/"]
You will most probably want to define specific
data collections, to configure the submit and search pages for
those collections, to specify search options and word files to
search in, to choose the formats for displaying data, etc.
This configuration step uses MySQL configuration tables and
is done during the runtime, for your convenience. It means
that after the previous configuration step (b), and after a
successful "make install", if you are happy with the result
you no longer edit WML files within the CDS Invenio source tree but
rather configure the "fully running" CDS Invenio installation via
its Administration web interface.
Usually, you will do this step many times in the future, to
tweak the running installation, to add new collections and
data types, etc.
(Note that if you want to change something "deeper" in a
running CDS Invenio installation, such as look and feel of pages,
or to add some new pages, then you may want to edit library
configuration files for Python and PHP non-static pages, as
noted above. But if you want to do really "deep" changes,
you need to go back to WML source, so you may want to leave
your customized copy of the CDS Invenio WML source tree around.)
We hope that this explains why we have chosen this three-level
configuration model, and that you will find it convenient in real
life.
Good luck, and thanks for choosing CDS Invenio.
- CDS Development Group