diff --git a/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc b/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc index 0e9760256..8c8e4f862 100644 --- a/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc +++ b/modules/bibharvest/doc/admin/bibharvest-admin-guide.webdoc @@ -1,292 +1,274 @@ ## -*- mode: html; coding: utf-8; -*- ## $Id$ ## This file is part of CDS Invenio. ## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN. ## ## CDS Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## CDS Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDS Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

Contents

1. Overview
2. OAI Data Harvesting
-       2.1 One time harvesting
-       2.2 Periodical harvesting
-3. OAI Repository
+       2.1 Bibharvest Admin Interface
+       2.2 oaiharvest command-line tool
+3. OAI Repository (Exporting)
       3.1 Definition of OAI sets
       3.2 Exposing metadata via OAI Repository Gateway

1. Overview

The BibHarvest module handles metadata gathering and delivery between OAI-PMH v.2.0 compliant repositories. Metadata exchange is performed on top of the OAI-PMH, the Open Archives Initiative's Protocol for Metadata Harvesting. The BibHarvest Admin Interface can be used to set up a database of OAI sources and determine the periodicity of harvesting.

2. OAI Data Harvesting

-

2.1 One-time harvesting

- -

To harvest records from an OAI-compliant repository, run the bibharvest command-line tool. For example:

-
-$ bibharvest -vListRecords -f2004-04-01 -u2004-04-02 -pmarcxml -o/tmp/z.xml \\
-             http://cdsweb.cern.ch/oai2d
-
-
- -

For further help with the command-line harvesting tool, run bibharvest --help.

2.2 Periodical harvesting

-

In order to periodically harvest metadata from one or several repositories, it is possible to organize OAI sources through the BibHarvest Admin Interface. The interface allows the administrator to add new repositories as well as edit and delete existing ones. Once the database has been set up through the BibHarvest Admin Interface, run the oaiharvest command-line tool to start periodical harvesting.

-

2.2.1. Bibharvest Admin Interface

+

2.1 Bibharvest Admin Interface

Add OAI sources

The first step requires the administrator to enter the baseURL of the OAI repository. This is done for validation purposes, i.e. to check that the baseURL actually points to an OAI-compliant repository. (Note: the validation simply performs an 'Identify' query to the baseURL and checks the reply for crucial tags such as OAI-PMH and Identify.)
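
The validation described above can be illustrated with a minimal Python sketch. This is not the BibHarvest implementation: the function names identify_url and looks_like_oai_repository are hypothetical, the sample reply is made up for illustration, and the check assumes that "crucial tags" means an OAI-PMH root element with an Identify child in the OAI-PMH 2.0 namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def identify_url(base_url):
    """Build the URL of an 'Identify' request for the given baseURL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def looks_like_oai_repository(reply_xml):
    """Return True if the reply has an OAI-PMH root with an Identify child.

    Hypothetical check; the real validation may inspect more (or less).
    """
    try:
        root = ET.fromstring(reply_xml)
    except ET.ParseError:
        return False  # not even well-formed XML
    if root.tag != "{%s}OAI-PMH" % OAI_NS:
        return False
    return root.find("{%s}Identify" % OAI_NS) is not None


# Illustrative reply only; a real repository returns more elements.
SAMPLE_REPLY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-01-01T00:00:00Z</responseDate>
  <request verb="Identify">http://cdsweb.cern.ch/oai2d</request>
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <baseURL>http://cdsweb.cern.ch/oai2d</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

if __name__ == "__main__":
    print(identify_url("http://cdsweb.cern.ch/oai2d"))
    print(looks_like_oai_repository(SAMPLE_REPLY))
```

In practice the reply would be fetched from the URL returned by identify_url; the sketch keeps the network call out so that the parsing logic can be read on its own.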
Once the baseURL is validated, the administrator is required to fill in the following fields:

Edit and delete OAI sources

Once a source has been added to the database, it will be visible in the overview page, as shown below:

name    baseURL                      metadataprefix  frequency  bibconvertfile  postprocess  actions
cdsweb  http://cdsweb.cern.ch/oai2d  marcxml         daily      NULL            h-u          edit / delete

At this point the definition of this source can be edited by clicking on the appropriate action button. All the fields described under "Add OAI sources" can be modified (except for the Starting date). No validation is performed at this stage, so please take extra care when editing important fields such as baseurl and metadataprefix.
OAI repositories can be removed from the database by clicking on the appropriate action button in the overview page.

-

2.2.2 oaiharvest command-line tool

+

2.2 oaiharvest command-line tool

Once administrators have set up their desired OAI repositories in the database through the Admin Interface, they can invoke oaiharvest to start periodical harvesting.

Oaiharvest usage

  oaiharvest [options]
 
  Specific options:
  -r, --repository=REPOS_ONE,"REPOS TWO"     name of the OAI repositories to be harvested (default=all)
  -d, --dates=yyyy-mm-dd:yyyy-mm-dd          harvest repositories between specified dates (overrides repositories' last updated timestamps)
 
  Scheduling options:
  -u,  --user=USER          user name to store task, password needed
  -s,  --sleeptime=SLEEP    time after which to repeat tasks (no)
                            e.g.: 1s, 30m, 24h, 7d
  -t,  --time=TIME          moment for the task to be active (now)
                            e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26
 
  General options:
  -h,  --help               print this help and exit
  -V,  --version            print version and exit
  -v,  --verbose=LEVEL      verbose level (from 0 to 9, default 1)
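
The --sleeptime examples in the usage text (1s, 30m, 24h, 7d) suggest a number followed by a unit letter. A minimal sketch of how such a value could be converted to seconds is given here; sleeptime_to_seconds is a hypothetical helper, not the parser used by oaiharvest, and it assumes that only the four units shown in the examples are valid.

```python
import re

# Seconds per unit letter, taken from the examples 1s, 30m, 24h, 7d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def sleeptime_to_seconds(value):
    """Convert a sleeptime string such as '24h' into a number of seconds.

    Assumption: a value is digits followed by exactly one of s, m, h, d.
    The real tool may accept other forms; this sketch rejects them.
    """
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError("unsupported sleeptime: %r" % value)
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for example in ("1s", "30m", "24h", "7d"):
        print(example, sleeptime_to_seconds(example))
```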
 
oaiharvest performs a number of operations on the repositories listed in the database. By default oaiharvest considers all repositories, one by one (this is overridden when the --repository argument is passed).
For each repository that is considered, oaiharvest behaves according to the arguments passed at the command line:
Oaiharvest usage examples

In most cases, administrators will want oaiharvest to run in the background, i.e. run in sleep mode and wake up periodically (e.g. every 24 hours) to check whether updates are needed, e.g. oaiharvest -s 24h

In other cases, administrators may want to perform periodical harvesting only on specific sources, e.g. oaiharvest -r cdsweb -s 12h

Another option is that administrators may want to harvest from certain repositories within two specific dates. This will be regarded as a one-off operation and will not affect the last update value of the source, e.g. oaiharvest -r cdsweb -d 2005-05-05:2005-05-30
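
To make the date-range case more concrete, the following Python sketch parses a --dates value and builds the corresponding OAI-PMH ListRecords request URL. It is only an illustration: parse_dates_option and list_records_url are hypothetical names, and it assumes that the two dates map onto the standard OAI-PMH "from" and "until" arguments, which is how the protocol expresses selective harvesting but is not confirmed here as the exact behaviour of oaiharvest.

```python
from datetime import date
from urllib.parse import urlencode


def parse_dates_option(value):
    """Split 'yyyy-mm-dd:yyyy-mm-dd' into a (start, end) pair of dates."""
    start_text, separator, end_text = value.partition(":")
    if not separator:
        raise ValueError("expected yyyy-mm-dd:yyyy-mm-dd, got %r" % value)
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if start > end:
        raise ValueError("start date is after end date")
    return start, end


def list_records_url(base_url, metadata_prefix, start, end):
    """Build a ListRecords URL restricted to the given date range.

    Assumption: the range is sent as the OAI-PMH 'from'/'until' arguments.
    """
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": start.isoformat(),
        "until": end.isoformat(),
    })
    return base_url + "?" + query


if __name__ == "__main__":
    start, end = parse_dates_option("2005-05-05:2005-05-30")
    print(list_records_url("http://cdsweb.cern.ch/oai2d", "marcxml", start, end))
```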

-

3. OAI Repository

+

3. OAI Repository (Exporting)

The OAI Repository corresponds to a set of metadata exposed for periodical harvesting by external OAI service providers. The following steps have to be done in order to expose metadata via OAI:

3.1. Definition of OAI sets

The OAI repository is composed of OAI sets that correspond to a part of your metadata base. OAI sets can be configured via the OAI Repository Admin Interface, featuring essential OAI repository maintenance tasks.

Each OAI set is defined by one definition or by a union of several definitions. Each such definition is provided separately and corresponds to one line in the OAI Repository Admin overview page. The following information is displayed:

3.2. Exposing metadata via OAI Repository Gateway

3.2.1 oaiarchive command-line tool

Once the OAI Repository is defined, the next step is to expose the corresponding metadata via the OAI Repository Gateway. This is done by launching the oaiarchive script.

Oaiarchive usage

 
  oaiarchive [options]
 
 
  Options:
 
  -o --oaiset=    Specify setSpec
  -h --help       Print this help
  -V --version    Print version information and exit
 
  Modes
  -a --add        Add records to OAI repository
  -d --delete     Remove records from OAI repository
  -r --report     Print OAI repository status
  -i --info       Give info about OAI set (default)
 
  Additional parameters:
  -p --upload     Upload records
 
  -u --user=USER       User name to submit the task as, password needed.
  -v --verbose=LEVEL   Verbose level (0=min,1=normal,9=max).
  -s --sleeptime=SLEEP Time after which to repeat tasks (no)
  -t --time=DATE       Moment for the task to be active (now).
 
 
To expose set 'setname' via OAI repository gateway:
  oaiarchive -apo 'setname' -s24h
 
To remove records defined by set 'setname' from OAI repository:
  oaiarchive -dpo 'setname'
 
To print OAI set status launch:
  oaiarchive -io 'setname'
 
To print out the current status of the OAI repository launch:
  oaiarchive -r
 

Please note that the oaiarchive script can be scheduled via BibSched in order to periodically update the OAI Repository to reflect changes in the database and in the OAI set definitions.

Please see also invenio.conf for finer-grained configuration of the OAI Repository.

3.2.2 Exposing entire metadata database

In order to expose all public records (the entire content of the Home collection) via the OAI Repository gateway, a predefined global set can be used. To expose the global set with periodical updates on a daily basis, launch:
  $ oaiarchive -apo global -s24h
 
To perform a reverse operation, i.e. to remove all records from the global OAI set, remove the oaiarchive task from the BibSched queue and launch:
  $ oaiarchive -dpo global