## $Id$
## This file is part of the CERN Document Server Software (CDSware).
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
#include "cdspage.wml" \
title="HOWTO Run Your CDSware Installation" \
navbar_name="admin" \
navtrail_previous_links="<a class=navtrail href=<WEBURL>/admin/>Admin Area</a> > <a class=navtrail href=<WEBURL>/admin/howto/>Admin HOWTOs</a>" \
navbar_select="howto_run"
Version <: print generate_pretty_revision_date_string('$Id$'); :>
<h2>Overview</h2>
<p>This HOWTO guide gives you ideas on how to run your
CDSware installation and how to take care of its day-to-day
operation.
<h2>BibSched Periodical Tasks</h2>
<p>Many tasks that manipulate the bibliographic record database can be
set to run periodically. For example, the indexing engine should
scan periodically for newly arrived documents so that they are indexed
as soon as they enter the system. It is the role of the BibSched
system to take care of task scheduling and task execution.
<p>Periodical tasks (such as regular metadata indexing) as well as
one-time tasks (such as a batch upload of an acquired metadata file)
are not executed straight away but are stored in the BibSched task
queue. The BibSched daemon periodically checks the queue and launches
the tasks according to their order or their programmed runtime.
You can consider BibSched to be a kind of cron daemon for
bibliographic tasks.
<p>This means that after installing CDSware you will want to have the
BibSched daemon running permanently. To launch the BibSched daemon, run:
<blockquote>
<pre>
$ bibsched -d
</pre>
</blockquote>
To set up the indexing, reformatting, collection updating, and ranking
daemons to run periodically with a sleeping period of, say, 1 hour:
<blockquote>
<pre>
$ bibindex -f50000 -s1h
$ bibreformat -oHB,HD -s1h
$ webcoll -v0 -s1h
$ bibrank -f50000 -s1h
</pre>
</blockquote>
<strong>HINT:</strong> It is good to have these tasks
permanently in your BibSched queue so that your newly submitted
documents are processed further automatically.
<p>Note that the BibSched daemon's automatic mode stops as soon as any
task ends with an error. It is therefore a good idea to
inspect the BibSched queue from time to time. This can be done by running
the BibSched command-line admin interface:
<blockquote>
<pre>
$ bibsched
</pre>
</blockquote>
This interface lets you stop and start the daemon mode, delete
tasks already submitted, run some of the tasks manually, etc. Note
also that the BibSched daemon writes log and error files on its operation
and on the operation of its tasks. The log and error files can be
found on your system at <LOGDIR>.
<p><strong>HINT:</strong> You may want to launch the
<code>bibsched</code> command from time to time (say a couple of times
per day) to inspect the BibSched queue and to verify the status of the
BibSched system.
<p><strong>HINT:</strong> You may want to clean up old BibSched
tasks with the DONE status, say once per month, to keep the
task table slim and the BibSched daemon both faster and less memory
hungry:
<blockquote>
<pre>
$ echo "DELETE FROM schTASK WHERE status='DONE' AND runtime &lt; DATE_SUB(NOW(), INTERVAL 1 MONTH);" | dbexec
</pre>
</blockquote>
<h2>Recalculate ranking weights</h2>
<p>When you add new records to the system, the word frequency
ranking weights for old records are not recalculated by default, in
order to speed up the insertion of new records. This may slightly
affect the precision of word similarity searches. It is therefore
advised to explicitly run bibrank in the recalculating mode once in a
while:
<blockquote>
<pre>
$ bibrank -wwrd -R
</pre>
</blockquote>
You may want to do this either (i) periodically, say once per month,
or (ii) depending on the number of newly added records, say when the
database size grows by 2-3 percent.
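<p>As a rough illustration of option (ii), the growth check can be
sketched as a small shell function. This is only a sketch: the function
name <code>growth_exceeds</code> and its arguments are hypothetical and
not part of CDSware, and how you obtain the two record counts is left
to your installation.

```shell
# Return success (exit status 0) when the record count has grown by at
# least threshold_pct percent since the last recalculation.
# The function name and arguments are illustrative, not part of CDSware.
growth_exceeds() {
    old_count="$1"      # record count at the last recalculation
    new_count="$2"      # current record count
    threshold_pct="$3"  # whole-number percentage, e.g. 2 for two percent
    # Compare (new - old) / old >= pct / 100 using integer arithmetic,
    # cross-multiplied so that no division (and no rounding) is needed.
    [ $(( (new_count - old_count) * 100 )) -ge $(( old_count * threshold_pct )) ]
}
```

<p>A wrapper script could then run <code>bibrank -wwrd -R</code> only
when <code>growth_exceeds</code> succeeds for the stored and the current
record counts.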
<h2>Guest Users Cleanup</h2>
<p>Guest users create a lot of entries in <CDSNAME> tables that are
related to their web sessions, their search history, personal baskets,
etc. This data has to be garbage-collected periodically. At the
moment this is done via a command line program:
<blockquote>
<pre>
$ sessiongc
</pre>
</blockquote>
<strong>HINT:</strong> You may want to launch this command every day.
In the future the garbage collection task may be done via the BibSched
task queue. <!--FIXME-->
<h2>Alert Engine</h2>
<p><CDSNAME> users may set up automatic email notification alerts
that send them documents matching their profile
either daily, weekly, or monthly. It is the job of the alert
engine to do this. The alert engine has to be run every day:
<blockquote>
<pre>
$ alertengine
</pre>
</blockquote>
<strong>HINT:</strong> You may want to set up an external cron job
to call <code>alertengine</code> each day.
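<p>The daily runs of <code>sessiongc</code> and
<code>alertengine</code> could be wired into cron along the following
lines. This is a sketch under assumptions: the location of the
executables (<code>CDSWARE_BIN</code>) and the chosen times of day are
hypothetical and should be adapted to your installation. The snippet
only prints the crontab lines so that you can review them before
installing them with <code>crontab -e</code>.

```shell
# Assumed location of the CDSware executables; adjust to your installation.
CDSWARE_BIN="${CDSWARE_BIN:-/usr/local/cdsware-DEMO/bin}"

# Print crontab lines that run the two daily maintenance commands.
# Field order: minute hour day-of-month month day-of-week command.
# The times (03:30 and 04:00) are arbitrary examples.
print_cds_crontab() {
    cat <<EOF
30 3 * * * ${CDSWARE_BIN}/sessiongc
0 4 * * * ${CDSWARE_BIN}/alertengine
EOF
}

print_cds_crontab
```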
<h2>Cleaning Up the Filesystem</h2>
<p>BibSched creates log and err files in the <code>&lt;prefix&gt;/var/log</code>
directory, which is good to clean up from time to time. For example:
<blockquote>
<pre>
$ find /usr/local/cdsware-DEMO/var/log -name "bibsched_task_*" -size 0c -exec \rm -f {} \;
$ find /usr/local/cdsware-DEMO/var/log -name "bibsched_task_*" -atime +28 -exec \rm -f {} \;
$ find /usr/local/cdsware-DEMO/var/log -name "bibsched_task_*" -atime +7 -exec gzip -9 {} \;
</pre>
</blockquote>
<p>BibReformat creates temporary XML files in
<code>&lt;prefix&gt;/var/tmp</code> that may be deleted after they are
uploaded. For example:
<blockquote>
<pre>
$ find /usr/local/cdsware-DEMO/var/tmp -name "rec_fmt_*.xml" -size 0c -exec \rm -f {} \;
$ find /usr/local/cdsware-DEMO/var/tmp -name "rec_fmt_*.xml" -atime +28 -exec \rm -f {} \;
$ find /usr/local/cdsware-DEMO/var/tmp -name "rec_fmt_*.xml" -atime +7 -exec gzip -9 {} \;
</pre>
</blockquote>
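<p>The two recipes above follow the same three-pass pattern: remove
empty files, remove files not accessed for more than 28 days, and
compress files not accessed for more than 7 days. As a sketch, the
pattern can be wrapped in one shell function. The function name
<code>cleanup_old_files</code> and its arguments are hypothetical and
not part of CDSware. Skipping files that already end in
<code>.gz</code> is a small addition that is not in the commands above.

```shell
# Apply the three cleanup passes to one directory and file name pattern.
# The function name and arguments are illustrative, not part of CDSware.
cleanup_old_files() {
    dir="$1"      # directory to clean, e.g. the var/log directory
    pattern="$2"  # file name pattern, e.g. "bibsched_task_*"
    # Pass 1: remove empty files.
    find "$dir" -type f -name "$pattern" -size 0c -exec rm -f {} \;
    # Pass 2: remove files not accessed for more than 28 days.
    find "$dir" -type f -name "$pattern" -atime +28 -exec rm -f {} \;
    # Pass 3: compress remaining files not accessed for more than 7 days,
    # skipping those that are already gzipped.
    find "$dir" -type f -name "$pattern" -atime +7 ! -name "*.gz" -exec gzip -9 {} \;
}
```

<p>For example, <code>cleanup_old_files /usr/local/cdsware-DEMO/var/log "bibsched_task_*"</code>
would correspond to the BibSched log cleanup shown earlier.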
<p>The BibHarvest admin tool (oaiharvest) creates temporary XML files in
<code>&lt;prefix&gt;/var/tmp</code> that may be deleted after they are
uploaded. For example:
<blockquote>
<pre>
$ find /usr/local/cdsware-DEMO/var/tmp -name "bibharvestadmin.*" -exec \rm -f {} \;
$ find /usr/local/cdsware-DEMO/var/tmp -name "bibconvertrun*" -exec \rm -f {} \;
$ find /usr/local/cdsware-DEMO/var/tmp -name "oaiharvest*" -exec gzip -9 {} \;
</pre>
</blockquote>
<p>FIXME: Thoughts on WebSubmit log archives, what to keep, what not.