diff --git a/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc b/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc
index 0e9760256..8c8e4f862 100644
--- a/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc
+++ b/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc
@@ -1,292 +1,274 @@
## -*- mode: html; coding: utf-8; -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
To harvest records from an OAI compliant repository, run the
-bibharvest
command-line tool. For example:
-
-
-$ bibharvest -vListRecords -f2004-04-01 -u2004-04-02 -pmarcxml -o/tmp/z.xml \\
-    http://cdsweb.cern.ch/oai2d
For further help with the command-line harvesting tool, run
-bibharvest --help
.
-
-
-
In order to periodically harvest metadata from one or several repositories, it is possible to organize OAI sources through the BibHarvest Admin Interface. The interface allows the administrator to add new repositories as well as edit and delete existing ones. Once the database has been set up through the BibHarvest Admin Interface, run the oaiharvest command-line tool to start periodical harvesting.
-The first step requires the administrator to enter the baseURL of the OAI repository. This is done for validation purposes - i.e. to check that the baseURL actually points to an OAI-compliant repository.
(Note: the validation simply performs an 'Identify' query to the baseURL and checks that the reply contains crucial tags such as OAI-PMH and Identify.)
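As an illustration of the kind of check described in the note, here is a minimal Python sketch. It is not the code used by the Admin Interface; the function names `identify_url` and `looks_like_identify_reply` are hypothetical, and the sketch only assumes the standard OAI-PMH 2.0 XML namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 namespace used by compliant replies.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def identify_url(base_url):
    """Build the Identify request URL for a repository baseURL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def looks_like_identify_reply(xml_text):
    """Return True if the reply has an OAI-PMH root with an Identify child."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML, so it cannot be a valid reply.
        return False
    if root.tag != "{%s}OAI-PMH" % OAI_NS:
        return False
    return root.find("{%s}Identify" % OAI_NS) is not None
```

A caller would fetch `identify_url(base_url)` with any HTTP client and pass the response body to `looks_like_identify_reply`; the fetch is left out so the sketch stays free of network access.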
Once the baseURL is validated, the administrator is required to fill in the following fields:
Once a source has been added to the database, it will be visible in the overview page, as shown below:
name | baseURL | metadataprefix | frequency | bibconvertfile | postprocess | actions
---|---|---|---|---|---|---
cdsweb | http://cdsweb.cern.ch/oai2d | marcxml | daily | NULL | h-u | edit / delete
At this point it will be possible to edit the definition of this source by clicking on the appropriate action button. All the fields described in 2.2.1 can be modified (except for the Starting date).
No further validation is performed at this stage, so please take extra care when editing important fields such as baseURL and metadataprefix.
OAI repositories can be removed from the database by clicking on the appropriate action button in the overview page.
Once administrators have set up their desired OAI repositories in the database through the Admin Interface, they can invoke oaiharvest to start periodical harvesting.
Oaiharvest usage
oaiharvest [options]

Specific options:
 -r, --repository=REPOS_ONE,"REPOS TWO"  name of the OAI repositories to be harvested (default=all)
 -d, --dates=yyyy-mm-dd:yyyy-mm-dd       harvest repositories between specified dates
                                         (overrides repositories' last updated timestamps)
Scheduling options:
 -u, --user=USER        user name to store task, password needed
 -s, --sleeptime=SLEEP  time after which to repeat tasks (no) e.g.: 1s, 30m, 24h, 7d
 -t, --time=TIME        moment for the task to be active (now) e.g.: +15s, 5m, 3h, 2002-10-27 13:57:26
General options:
 -h, --help             print this help and exit
 -V, --version          print version and exit
 -v, --verbose=LEVEL    verbose level (from 0 to 9, default 1)
oaiharvest performs a number of operations on the repositories listed in the database. By default oaiharvest considers all repositories, one by one (this is overridden when the --repository argument is passed).
oaiharvest behaves according to the arguments passed at the command line:
If the --dates argument is not passed, it checks whether an update from the repository is needed (Note: the update status is calculated from the time of the last harvesting and the frequency chosen by the administrator).
If an update is needed, it runs bibharvest and harvests all the metadata that the repository has added since the date of the last update. When the update is finished, the last update value is set to the current time and date.
+If the --dates argument is passed, it simply harvests the metadata of the repository between the given dates. The last update date is left unchanged.
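The decision logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the oaiharvest implementation: the names `needs_update` and `plan_harvest` are hypothetical, and the harvesting frequency is assumed to be expressed in hours (the overview table shows labels such as "daily").

```python
from datetime import datetime, timedelta


def needs_update(last_update, frequency_hours, now):
    """True when the source was never harvested or the interval has elapsed."""
    if last_update is None:
        return True
    return now - last_update >= timedelta(hours=frequency_hours)


def plan_harvest(last_update, frequency_hours, now, dates=None):
    """Return (harvest_from, harvest_until, new_last_update), or None.

    With explicit dates, harvest exactly that range and keep the stored
    last update value. Without dates, harvest from the last update until
    now, but only when an update is due, and move the last update to now.
    """
    if dates is not None:
        start, end = dates
        return (start, end, last_update)
    if needs_update(last_update, frequency_hours, now):
        return (last_update, now, now)
    return None
```

Returning the new last update value alongside the harvesting window keeps the "leave unchanged" rule for `--dates` runs explicit in one place.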
In most cases, administrators will want oaiharvest to run in the background, i.e. run in sleep mode and wake up periodically (for instance every 24 hours) to check whether updates are needed:
$ oaiharvest -s 24h
In other cases, administrators may want to perform periodical harvesting only on specific sources:
$ oaiharvest -r cdsweb -s 12h
Administrators may also want to harvest from certain repositories between two specific dates. This is regarded as a one-off operation and does not affect the last update value of the source:
$ oaiharvest -r cdsweb -d 2005-05-05:2005-05-30
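For readers scripting around the tool, the yyyy-mm-dd:yyyy-mm-dd value accepted by --dates can be split and validated along these lines. This is a hedged sketch; `parse_dates_option` is a hypothetical helper and the way oaiharvest itself parses the argument may differ.

```python
from datetime import datetime


def parse_dates_option(value):
    """Split 'yyyy-mm-dd:yyyy-mm-dd' into a (start, end) pair of dates."""
    start_text, sep, end_text = value.partition(":")
    if not sep:
        raise ValueError("expected yyyy-mm-dd:yyyy-mm-dd, got %r" % value)
    # strptime raises ValueError on malformed or impossible dates.
    start = datetime.strptime(start_text, "%Y-%m-%d").date()
    end = datetime.strptime(end_text, "%Y-%m-%d").date()
    if start > end:
        raise ValueError("start date is after end date")
    return start, end
```

For example, `parse_dates_option("2005-05-05:2005-05-30")` yields the two dates used in the command above.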
-
Each OAI set is defined by one definition or by a union of several definitions. Each definition is provided separately and corresponds to one line in the OAI Repository Admin overview page. The following information is displayed:
Once the OAI Repository is defined, the next step is to expose the corresponding metadata via the OAI Repository Gateway. This is done by launching the oaiarchive script.
Oaiarchive usage
oaiarchive [options]

Options:
 -o --oaiset=          Specify setSpec
 -h --help             Print this help
 -V --version          Print version information and exit
Modes
 -a --add              Add records to OAI repository
 -d --delete           Remove records from OAI repository
 -r --report           Print OAI repository status
 -i --info             Give info about OAI set (default)
Additional parameters:
 -p --upload           Upload records
 -u --user=USER        User name to submit the task as, password needed.
 -v --verbose=LEVEL    Verbose level (0=min,1=normal,9=max).
 -s --sleeptime=SLEEP  Time after which to repeat tasks (no)
 -t --time=DATE        Moment for the task to be active (now).

To expose set 'setname' via OAI repository gateway:
$ oaiarchive -apo 'setname' -s24
To remove records defined by set 'setname' from OAI repository:
$ oaiarchive -dpo 'setname'
To print OAI set status launch:
$ oaiarchive -io 'setname'
To print out the current status of the OAI repository launch:
$ oaiarchive -r
Please note that the oaiarchive script can be scheduled via BibSched in order to periodically update the OAI Repository with respect to database modifications and OAI set definition modifications.
Please see also invenio.conf for finer-grained configuration of the OAI Repository.
global set can be used. To expose the global set with periodical updates on a daily basis launch:
$ oaiarchive -apo global -s24h
To perform the reverse operation, i.e. to remove all records from the global OAI set, remove the oaiarchive task from the BibSched queue and launch:
$ oaiarchive -dpo global