CERN DOCUMENT SERVER SOFTWARE (CDSware) INSTALLATION
====================================================

Revision: $Id$

About
=====

This document specifies how to build, customize, and install the CERN
Document Server Software (CDSware) for the first time.  See
RELEASE-NOTES if you are upgrading from a previous CDSware release.

Contents
========

0. Prerequisites
1. Quick instructions for the impatient CDSware admin
2. Detailed instructions for the patient CDSware admin
3. Configuration philosophy explained and elucidated

0. Prerequisites
================

   Here is the software you need to have around before you start
   installing CDSware:

     a) Unix-like operating system.  The main development and
        production platform for CDSware at CERN is Debian GNU/Linux,
        but any Unix system supporting the software listed below
        should do.

        Note that localhost should have an MTA running so that
        CDSware can email notification alerts or registration
        information to the end users, contact moderators and
        reviewers of submitted documents, inform administrators
        about various runtime system information, etc.

     b) MySQL server (may be on a remote machine), and MySQL client
        (must be available locally too).  MySQL versions 4.0.x are
        recommended; versions above 4.1.0 may cause trouble.  Please
        set the variable ``max_allowed_packet'' in your ``my.cnf''
        init file to at least 4M.

     c) Apache 2 server, with support for loading DSO modules.
        Tested mainly with version 2.0.43 and above.  Apache 2 is
        required for the mod_python module (see below).

        Note for FreeBSD users: Thierry Thomas reports trouble with
        Python open() in the Apache 2 and mod_python context.  The
        solution is either (1) to compile Apache 2 with
        --enable-threads (the port has a knob WITH_THREADS to do
        that); or (2) to add the following two lines to Apache's
        envvars file:

           LD_PRELOAD=/usr/lib/libc_r.so # or libpthread.so
           export LD_PRELOAD

     d) Python v2.2.2 or above (v2.3.2 and above recommended), as
        well as the following Python modules:
          - (mandatory) MySQLdb version 0.9.2 (higher versions like
            MySQLdb 1.0.0 will not work!)
          - (mandatory) Numeric module (v21 and above)
          - (recommended) PyStemmer, for indexing and ranking
          - (recommended) PyRXP, for very fast XML MARC processing
          - (recommended) Gnuplot.Py, for producing graphs
          - (optional) 4suite, slower alternative to PyRXP
          - (optional) Psyco, to speed up the code at places

     e) mod_python Apache module.  Tested mainly with versions
        3.0BETA4 and above.  mod_python 3.x is required for Apache 2.
        Previous versions (as well as Apache 1 ones) exhibited some
        problems with MySQL connectivity in our experience.

     f) PHP compiled as Apache module, including MySQL support.
        Tested mainly with PHP version 4.3.0 and above.

     g) PHP compiled as a standalone command-line executable (CLI)
        (in addition to the Apache module) is required, too.  As of
        PHP 4.3.0 you obtain the CLI executable by default, so you
        don't have to compile it separately.  Note that PHP CLI
        should be compiled with the process control support
        (--enable-pcntl) and the compression library (--with-zlib).

     h) WML - Website META Language.  Tested mainly with versions
        2.0.8 and 2.0.9.  Note that on Red Hat Linux 9 the WML 2.0.9
        compiled with Perl 5.8.0 exhibits problems, so you had better
        use a downgraded or upgraded Perl for compiling WML on that
        platform.

     i) If you want to be able to search for words in the fulltext files (i.e.
        to have fulltext indexing), then you also need to install
        some of the following tools:
          - for PDF files: pdftotext or pstotext
          - for PostScript files: pstotext or ps2ascii
          - for MS Word files: antiword, catdoc, or wvText
          - for MS PowerPoint files: pptHtml and html2text
          - for MS Excel files: xlhtml and html2text

     j) If you have chosen to install fast XML MARC Python
        processors in step d) above, then you have to install the
        parsers themselves:
          - (optional) RXP
          - (optional) 4suite

     k) (recommended) Gnuplot, the command-line driven interactive
        plotting program.  It is used to display download and
        citation history graphs on the Detailed record pages on the
        web interface.  Note that Gnuplot is not required, only
        recommended.

     l) (recommended) A Common Lisp implementation, such as CLISP,
        SBCL or CMUCL.  It is used for the web server log analysing
        tool and the metadata checking program.  Any of the three
        implementations above will do.  CMUCL produces the fastest
        code, but it does not support UTF-8 yet.  Note that a Common
        Lisp implementation is not required, only recommended.

   Note that the configure script checks whether you have all the
   prerequisite software installed and won't let you continue unless
   everything is in order.  It also warns you if it cannot find some
   optional, but recommended software.

1. Quick instructions for the impatient CDSware admin
=====================================================

      $ cd /usr/local/src/
      $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz
      $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.md5
      $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.sig
      $ md5sum -v -c cdsware-0.3.2.tar.gz.md5
      $ gpg --verify cdsware-0.3.2.tar.gz.sig cdsware-0.3.2.tar.gz
      $ tar xvfz cdsware-0.3.2.tar.gz
      $ cd cdsware-0.3.2
      $ ./configure --prefix=/usr/local/cdsware-DEMO \
                    --with-webdir=/var/www/DEMO \
                    --with-weburl=http://webserver.domain.com/DEMO \
                    --with-dbhost=sqlserver.domain.com \
                    --with-dbname=cdsware \
                    --with-dbuser=cdsware \
                    --with-dbpass=myp1ss \
                    --with-python=/opt/python/bin/python2.3
      $ vi ./config/config.wml  ## optional, but strongly recommended
      $ make
      $ mysql -h sqlserver.domain.com -u root -p mysql
          mysql> CREATE DATABASE cdsware;
          mysql> GRANT ALL PRIVILEGES ON cdsware.* TO cdsware@webserver.domain.com IDENTIFIED BY 'myp1ss';
      $ sudo vi /path/to/apache/conf/httpd.conf  ## see below in part 2
      $ sudo vi /path/to/php/conf/php.ini        ## see below in part 2
      $ sudo /path/to/apache/bin/apachectl graceful
      $ make create-tables        ## optional
      $ make install
      $ make test                 ## optional
      $ sudo chown -R www-data /usr/local/cdsware-DEMO/var
      $ make create-demo-site     ## optional
      $ make load-demo-records    ## optional
      $ make remove-demo-records  ## optional
      $ make drop-demo-site       ## optional
      $ netscape http://webserver.domain.com/DEMO/admin/  ## optional

2. Detailed instructions for the patient CDSware admin
======================================================

   The CERN Document Server Software (CDSware) uses the standard GNU
   autoconf method to build and install its files.  This means that
   you proceed as follows:

      $ cd /usr/local/src/

          Change to a directory where we will configure and build
          the CDS Software.  (The built files will be installed into
          different "target" directories later.)
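The md5sum check listed in the quick instructions can also be
reproduced on a machine that lacks the md5sum tool.  What follows is
only a minimal Python sketch, not part of CDSware.  It assumes that
the ".md5" file uses the usual md5sum check format (hex digest,
whitespace, file name); the function names are hypothetical.  It does
not replace the GnuPG signature check.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that a large tarball need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_line(line):
    """Split one md5sum-style line into (digest, file name).

    Assumption: the line looks like "<hex digest>  <file name>".
    """
    digest, name = line.strip().split(None, 1)
    # md5sum puts a '*' in front of the file name in binary mode.
    if name.startswith("*"):
        name = name[1:]
    return digest.lower(), name


def tarball_matches(tarball_path, md5_path):
    """Compare `tarball_path` with the first entry of `md5_path`."""
    with open(md5_path) as handle:
        expected, _name = parse_md5_line(handle.readline())
    return md5_of_file(tarball_path) == expected
```

For example, tarball_matches("cdsware-0.3.2.tar.gz",
"cdsware-0.3.2.tar.gz.md5") would return True when the digest of the
downloaded tarball agrees with the published one.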
      $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz
      $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.md5
      $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.sig

          Fetch the CDSware source tarball from the CDSware
          distribution server, together with the MD5 checksum and
          GnuPG cryptographic signature files useful for verifying
          the integrity of the tarball.

      $ md5sum -v -c cdsware-0.3.2.tar.gz.md5

          Verify the MD5 checksum.

      $ gpg --verify cdsware-0.3.2.tar.gz.sig cdsware-0.3.2.tar.gz

          Verify the GnuPG cryptographic signature.  Note that you
          may first have to import my public key into your keyring,
          if you haven't done that already:

            $ gpg --keyserver wwwkeys.pgp.net --recv-keys 0xBA5A2B67

          The output of the gpg --verify command should then read:

            Good signature from "Tibor Simko "

          You can safely ignore any trusted signature certification
          warning that may follow after the signature has been
          successfully verified.

      $ tar xvfz cdsware-0.3.2.tar.gz

          Untar the distribution tarball.

      $ cd cdsware-0.3.2

          Go to the source directory.

      $ ./configure --prefix=/usr/local/cdsware-DEMO \
                    --with-webdir=/var/www/DEMO \
                    --with-weburl=http://webserver.domain.com/DEMO \
                    --with-dbhost=sqlserver.domain.com \
                    --with-dbname=cdsware \
                    --with-dbuser=cdsware \
                    --with-dbpass=myp1ss \
                    --with-python=/opt/python/bin/python2.3

          Configure essential CDSware parameters, with the following
          meaning:

            --prefix=/usr/local/cdsware-DEMO

                The CDSware general installation directory, used to
                hold command-line binaries and program libraries
                containing the core CDSware functionality, but also
                to store runtime log and cache information.  Several
                subdirs like `bin', `lib', and `var' will be created
                inside the --prefix directory to this effect.  Note
                that the --prefix directory should be chosen outside
                of the Apache htdocs tree, since no file from this
                directory is to be accessible on the Web.

            --with-webdir=/var/www/DEMO

                The directory holding the web interface to CDSware,
                i.e. containing all the callable scripts and web
                pages visible to the end user.  Must be located
                inside the Apache htdocs tree.  The scripts within
                this directory will generally call core CDSware
                libraries and binaries installed in the --prefix
                directory.

            --with-weburl=http://webserver.domain.com/DEMO

                The URL corresponding to the --with-webdir
                directory.  It will denote the home URL of your
                CDSware installation.

            --with-dbhost=sqlserver.domain.com
            --with-dbname=cdsware
            --with-dbuser=cdsware
            --with-dbpass=myp1ss

                The database server host, the database name, and the
                database user credentials.

            --with-python=/opt/python/bin/python2.3

                Optionally, specify a path to some specific Python
                binary.  This is useful if you have more than one
                Python installation on your system.  If you don't
                set this option, then the first Python found in your
                PATH will be chosen for running CDSware.

          CDSware won't install to any directory other than the two
          mentioned in this configuration line.

          Do not use trailing slashes when specifying any of the
          above values.

          This configuration step is mandatory, and is referred to
          as "pre-compile time configuration step a)" in the
          elucidative explanatory commentary below.

          (Note that if you prefer to build CDSware out of its
          source tree, you may run the above configure command like
          this: $ mkdir build && cd build && ../configure --prefix=...)

      $ vi ./config/config.wml  ## optional, but strongly recommended

          Optionally, customize your CDSware installation.  We
          strongly recommend that you edit at least the top of this
          file, where you can define some very essential CDSware
          parameters like the name of your CDSware document server
          or the email address of the local CDSware administrator.
          (The latter is needed if you want to use administration
          modules, and you certainly do!)

          The rest of the "config.wml" file enables you to change
          the CDSware web page look and feel, and otherwise to
          influence its behaviour and default parameters.
          This configuration step is optional, but strongly
          recommended.  It is referred to as "compile time
          configuration step b)" in the elucidative explanatory
          commentary below.

      $ make

          Launch the CDSware build.  All the pages will be
          pre-created based on the config you have edited in the
          previous step.

          Before proceeding further with the CDSware installation,
          we have to do some admin-level tasks on the MySQL and
          Apache servers.

      $ mysql -h sqlserver.domain.com -u root -p mysql
          mysql> CREATE DATABASE cdsware;
          mysql> GRANT ALL PRIVILEGES ON cdsware.* TO cdsware@webserver.domain.com IDENTIFIED BY 'myp1ss';

          You need to create a dedicated database on your MySQL
          server that CDSware can use for its purposes.  Please
          contact your MySQL administrator and ask them to execute
          the above commands, which will create the "cdsware"
          database and a user called "cdsware" with password
          "myp1ss", and will grant all rights on the "cdsware"
          database to the "cdsware" user.  (Of course, you are free
          to choose your own user credentials and database name; the
          above values are just an example.  They must match the
          values given on the configure line above.)

      $ sudo vi /path/to/apache/conf/httpd.conf  ## see below in part 2

          Please ask your webserver administrator to put the
          following lines in your "httpd.conf" configuration file:

            AddDefaultCharset UTF-8
            AddType application/x-httpd-php .php
            AddType application/x-httpd-php-source .phps
            DirectoryIndex index.en.html index.html index.py index.en.php index.php

          This is to ensure that the browsers will get UTF-8 as the
          default page encoding, that "*.php" files will be
          interpreted by the web server as PHP files, and that
          "index.py" or "index.en.php" will be considered as
          directory index files.

          In addition, you have to ask Apache to interpret .py files
          in the installation place via mod_python:

            AddHandler python-program .py
            PythonHandler mod_python.publisher
            PythonDebug On

      $ sudo vi /path/to/php/conf/php.ini  ## see below in part 2

          Please ask your webserver administrator to put the
          following lines in your "php.ini" configuration file:

            log_errors = on
            display_errors = off
            expose_php = off
            max_execution_time = 160
            register_globals = on
            short_open_tag = on

          This will set up some relevant PHP variables.

      $ sudo /path/to/apache/bin/apachectl graceful

          Please ask your webserver administrator to restart the
          Apache server after the above "httpd.conf" and "php.ini"
          changes.

          With these admin-level tasks, which have to be performed
          as root, out of the way, let's now go back to finish the
          installation of the CDSware package.

      $ make create-tables  ## optional

          Optionally, create CDSware tables on the MySQL server.
          You probably want to do this step only once, i.e. if you
          have not created any CDSware database and tables yet.

      $ make install

          Install the web pages, scripts, utilities and everything
          needed for runtime into the respective directories, as
          specified earlier by the configure command.

          After this step, you should be able to point your browser
          to the chosen URL of your local CDSware installation and
          see it running!

      $ make test  ## optional

          Optionally, you can run our test suite to verify the
          results of known tests on your local CDSware installation.
          Note that this command should be run only after you have
          installed the whole system via `make install'.

      $ sudo chown -R www-data /usr/local/cdsware-DEMO/var

          One more superuser step, as we need to enable the Apache
          server to write some log information and to cache
          interesting entities inside the "var" subdirectory of our
          CDSware general installation directory.  Here we assume
          that your Apache server processes run as the "www-data"
          user.  Change this appropriately for your system.
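To see whether the chown step above took effect, one can list the
entries under the "var" directory whose owner differs from the
expected one.  What follows is only a minimal Python sketch, not part
of CDSware; the function name is hypothetical, and the numeric user
ID of the Apache user has to be supplied by the caller.

```python
import os


def paths_not_owned_by(root, uid):
    """Return the paths under `root` (root included) not owned by `uid`."""
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidates.append(os.path.join(dirpath, name))
    # lstat() inspects symbolic links themselves instead of their targets.
    return [path for path in candidates if os.lstat(path).st_uid != uid]
```

Assuming a "www-data" account exists on the system, its numeric ID
can be looked up with pwd.getpwnam("www-data").pw_uid from the
standard library; an empty list returned for
"/usr/local/cdsware-DEMO/var" then means that every entry is owned by
that user.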
      $ make create-demo-site  ## optional

          This step is recommended to test your local CDSware
          installation.  It should give you our "Atlantis Institute
          of Science" demo installation, exactly as you see it at .

      $ make load-demo-records  ## optional

          Optionally, load some demo records to be able to test
          indexing and searching of your local demo CDSware
          installation.

      $ make remove-demo-records  ## optional

          Optionally, remove the demo records loaded in the previous
          step, but otherwise keep the demo collection, submit,
          format, etc. configurations that you may reuse and modify
          for production purposes.

      $ make drop-demo-site  ## optional

          Optionally, also drop all the demo configuration so that
          you'll have a blank CDSware system for your production
          purposes.

      $ netscape http://webserver.domain.com/DEMO/admin/  ## optional

          Optionally, do further runtime configuration of CDSware,
          such as the definition of data collections, document
          types, document formats, word file tables, etc.

          This configuration step is optional, and is referred to as
          "runtime configuration step c)" in the elucidative
          explanatory commentary below.

3. Configuration philosophy explained and elucidated
====================================================

   As you could see from the above, the configuration of CDSware is
   threefold:

     (a) pre-compile time configuration phase [uses command line
         options / while doing "configure"]
     (b) compile time configuration phase [uses WML / after
         "configure", while doing "make && make install"]
     (c) runtime configuration phase [uses MySQL / after "make
         install", while doing "netscape
         http://webserver.domain.com/DEMO/admin/"]

   What is the difference, and why?

   (a) pre-compile time configuration phase [uses command line
       options / while doing "configure"]

       This configures the essential CDSware parameters that make
       your CDSware copy installable and runnable.  The essential
       parameters include: the general CDSware installation
       directory containing (among others) binaries, libraries, and
       log and cache directories; the install directory for Web
       scripts and pages; and the MySQL user and server credentials.

       This configuration step uses the standard GNU autoconf
       approach, i.e. you run the standard "configure" script.  Note
       that the only arguments that CDSware takes into consideration
       are the general "--prefix" one and the CDSware-specific
       "--with-foo" arguments; see the end of the "configure --help"
       output.

       This configuration step is mandatory.  Without knowing these
       essential parameters there is nothing to install and nothing
       to run.  Usually, you do this step only once.

   (b) compile time configuration phase [uses WML / after
       "configure", while doing "make && make install"]

       Optionally, you may choose to influence CDSware behaviour, to
       set up the CDSware system name, to choose its default
       parameters, to change its look and feel, to add your local
       web pages, etc.

       This configuration step uses WML, the Website META Language.
       The most important configuration file is "config/config.wml",
       which you can edit at will.  Optionally, if you are an
       advanced user, you may edit other WML files in the
       distribution tree.  After that, when you type "make", the
       CDSware pages will be pre-generated.

       We prefer that this configuration step is done at compile
       time and not at runtime, for several reasons: (i) the
       pre-generated pages impose less load on the web server and
       the database server, and so they are served faster to the end
       user; (ii) we use several different languages (Python, PHP),
       and by using an independent compile-time configuration
       language we can share the same configuration variables across
       heterogeneous languages; (iii) the use of WML and page
       templates enables you to easily change anything in the
       CDSware system, even at deep levels.
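Reason (ii) above can be illustrated with a toy generator that writes
the same settings once as Python source and once as PHP source.  This
is only a sketch of the idea in Python; it is not how WML works and
it is not CDSware code.  The setting names "cdsname" and
"supportemail" are hypothetical, all values are assumed to be plain
strings, and all names are assumed to be valid identifiers in both
languages.

```python
def render_python(settings):
    """Render the settings as the source of a Python module."""
    lines = ["# generated file, do not edit"]
    for name, value in sorted(settings.items()):
        # repr() yields a correctly quoted Python string literal.
        lines.append("%s = %r" % (name, value))
    return "\n".join(lines) + "\n"


def php_quote(value):
    """Quote a string as a single-quoted PHP string literal."""
    # Inside PHP single quotes only backslash and quote need escaping.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def render_php(settings):
    """Render the settings as the source of a PHP include file."""
    lines = ["<?php", "// generated file, do not edit"]
    for name, value in sorted(settings.items()):
        lines.append("$%s = %s;" % (name, php_quote(value)))
    lines.append("?>")
    return "\n".join(lines) + "\n"


# Hypothetical settings, edited in one place only.
SETTINGS = {
    "cdsname": "Atlantis Institute of Science",
    "supportemail": "admin@example.org",
}
```

Calling render_python(SETTINGS) and render_php(SETTINGS) at build
time would give two files that cannot drift apart, because both are
derived from the single SETTINGS table.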
       If you are changing parameters and/or the look and feel of
       CDSware pages, you may want to repeat these steps several
       times:

          $ vi config/config.wml
          $ make drop-tables
          $ make create-tables
          $ make create-demo-site
          $ make load-demo-records
          $ netscape http://webserver.domain.com/DEMO/
          [...]

       to see what it brings, until you are satisfied with the
       result.

       (Note that you may as well choose to change these parameters
       inside the CDSware library configuration files later on,
       during runtime, at least for non-static Python and PHP
       pages.)

   (c) runtime configuration phase [uses MySQL / after "make
       install", while doing "netscape
       http://webserver.domain.com/DEMO/admin/"]

       Optionally, you will most probably want to define specific
       data collections, to configure the submit and search pages
       for those collections, to specify search options and word
       files to search in, to choose the formats in which data is
       displayed, etc.

       This configuration step uses MySQL configuration tables and
       is done during runtime, for your convenience.  This means
       that after the previous configuration step (b), and after a
       successful "make install", if you are happy with the result,
       you no longer edit WML files within the CDSware source tree
       but rather configure the "fully running" CDSware installation
       via its Administration web interface.

       Usually, you will do this step many times in the future, to
       tweak the running installation, to add new collections and
       data types, etc.

       (Note that if you want to change something "deeper" in a
       running CDSware installation, such as the look and feel of
       pages, or to add some new pages, then you may want to edit
       the library configuration files for Python and PHP non-static
       pages, as noted above.  But if you want to make really "deep"
       changes, you need to go back to the WML source, so you may
       want to keep your customized copy of the CDSware WML source
       tree around.)

   We hope that this explains why we have chosen this three-level
   configuration model, and that you will find it convenient in real
   life.

Good luck, and thanks for choosing the CERN Document Server Software.

       - CDS Development Group