diff --git a/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc b/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc index ba216b6af..f1b53dbda 100644 --- a/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc +++ b/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc @@ -1,478 +1,483 @@ ## -*- mode: html; coding: utf-8; -*- ## This file is part of Invenio. -## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 CERN. +## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

Contents

1. Overview
2. Configuring BibUpload
3. Running BibUpload
       3.1. Inserting new records
       3.2. Inserting records into the Holding Pen
       3.3. Updating existing records
       3.4. Inserting and updating at the same time
       3.5. Updating preformatted output formats
       3.6. Uploading fulltext files
4. Batch Uploader
       4.1. Web interface - Cataloguers
       4.2. Web interface - Robots
       4.3. Daemon mode

1. Overview

BibUpload enables you to upload bibliographic data in MARCXML format into the Invenio bibliographic database. It is also used internally by other Invenio modules as the sole entry point of metadata into the bibliographic database.

Note that before uploading a MARCXML file, you may want to run the provided /opt/invenio/bin/xmlmarclint on it in order to verify its correctness.

2. Configuring BibUpload

BibUpload takes a MARCXML file as its input. There is nothing to be configured for these files. If the files have to be converted into MARCXML from some other format, structured or not, this is usually done beforehand via the BibConvert module.

Note that if you are using external system numbers for your records, such as when your records are being synchronized from an external system, then BibUpload treats tag 970 as the one containing the external system number. (To use a tag other than 970, you would have to edit the BibUpload config source file.)

Note also that BibUpload knows about OAI identifiers in a similar way, so that it will refuse, for example, to insert the same OAI harvested record twice.

3. Running BibUpload

3.1 Inserting new records

Suppose that you have a MARCXML file containing new records that is to be uploaded into Invenio. (For example, it might have been produced by BibConvert.) To finish the upload, you would call the BibUpload script in the insert mode as follows:

 $ bibupload -i file.xml
 
 
In the insert mode, all the records from the file will be treated as new. This means that they should contain neither 001 tags (holding record IDs) nor 970 tags (holding external system numbers). BibUpload refuses to upload records having these tags, in order to prevent potential double uploading. If your file does contain 001 or 970, then chances are that you want to update existing records, not re-upload them as new, and so BibUpload will warn you about this and will refuse to continue.

For example, to insert a new record, your file should look like this:

     <record>
         <datafield tag="100" ind1=" " ind2=" ">
             <subfield code="a">Doe, John</subfield>
         </datafield>
         <datafield tag="245" ind1=" " ind2=" ">
             <subfield code="a">On The Foo And Bar</subfield>
         </datafield>
     </record>
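
The rule above (no 001 and no 970 tags in insert mode) can be checked before calling bibupload. The following is only a minimal sketch using the Python standard library; the function names are hypothetical and are not part of BibUpload, and the check does not replace xmlmarclint or BibUpload's own validation.

```python
import xml.etree.ElementTree as ET

# Tags that, per the text above, must be absent in insert mode.
IDENTIFIER_TAGS = {"001", "970"}


def local_name(element):
    """Return the element name without any XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def identifier_tags_in(record):
    """Return the sorted identifier tags (001/970) found in one <record>."""
    found = set()
    for field in record:
        if local_name(field) in ("controlfield", "datafield"):
            if field.get("tag") in IDENTIFIER_TAGS:
                found.add(field.get("tag"))
    return sorted(found)


def check_insert_file(xml_text):
    """Return (record position, offending tags) for records unfit for -i."""
    root = ET.fromstring(xml_text)
    # Accept a bare <record> or any wrapper element that contains records.
    records = [el for el in root.iter() if local_name(el) == "record"]
    problems = []
    for position, record in enumerate(records, start=1):
        tags = identifier_tags_in(record)
        if tags:
            problems.append((position, tags))
    return problems
```

An empty result means that no record carries 001 or 970; any entry in the list points at a record that is probably meant for one of the update modes instead.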
 

3.2 Inserting records into the Holding Pen

A special mode of BibUpload that is tightly connected with BibEdit is the Holding Pen mode.

When you insert a record using the holding pen mode such as in the following example:

 $ bibupload -o file.xml
 
the records are not actually integrated into the database, but are instead put into an intermediate space called the holding pen, where authorized curators can review them, manipulate them and eventually approve them.

The holding pen is integrated with BibEdit.

3.3 Updating existing records

When you want to update existing records with new content from your input MARCXML file, the records in your input file should contain either tag 001 (holding record IDs) or tag 970 (holding external system numbers). BibUpload will try to match existing records via 001 and 970, and if it finds a record in the database that corresponds to a record from the file, it will update its content. Otherwise it will signal an error saying that it could not find the record to be updated.

For example, to update the title of record #123 via the correct mode, your input file should contain the record ID in the 001 tag and the title in the 245 tag as follows:

     <record>
         <controlfield tag="001">123</controlfield>
         <datafield tag="245" ind1=" " ind2=" ">
             <subfield code="a">My Newly Updated Title</subfield>
         </datafield>
     </record>
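
When many such corrections have to be produced, the input record can be generated rather than written by hand. This is a sketch only, assuming the Python standard library; build_title_correction is a hypothetical helper, not an Invenio function.

```python
import xml.etree.ElementTree as ET


def build_title_correction(record_id, new_title):
    """Build a <record> with the record ID in 001 and the title in 245 $a."""
    record = ET.Element("record")
    controlfield = ET.SubElement(record, "controlfield", {"tag": "001"})
    controlfield.text = str(record_id)
    datafield = ET.SubElement(
        record, "datafield", {"tag": "245", "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(datafield, "subfield", {"code": "a"})
    subfield.text = new_title
    # Serialising through ElementTree escapes characters such as & and <.
    return ET.tostring(record, encoding="unicode")
```

The returned string can be written to a file and passed to bibupload with the -c option described below.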
 

There are several updating modes:

 
     -r, --replace Replace existing records by those from the XML
                   MARC file.  The original content is wiped out
                   and fully replaced.  Signals error if record
                   is not found via matching record IDs or system
                   numbers.
                   Fields defined in Invenio config variable
                   CFG_BIBUPLOAD_STRONG_TAGS are not replaced.
 
                   Note also that `-r' can be combined with `-i'
                   into an `-ir' option that would automatically
                   either insert records as new if they are not
                   found in the system, or correct existing
                   records if they are found to exist.
 
     -a, --append  Append fields from XML MARC file at the end of
                   existing records.  The original content is
                   enriched only.  Signals error if record is not
                   found via matching record IDs or system
                   numbers.
 
     -c, --correct Correct fields of existing records by those
                   from XML MARC file.  The original record
                   content is modified only on those fields from
                   the XML MARC file where both the tags and the
                   indicators match: the original fields are
                   removed and replaced by those from the XML
                   MARC file.  Fields not present in XML MARC
                   file are not changed (unlike the -r option).
                   Fields with "provenance" subfields defined in
                   'CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS'
                   are protected against deletion unless the
                   input MARCXML contains a matching
                   provenance value.
                   Signals error if record is not found via
                   matching record IDs or system numbers.
 
     -d, --delete  Delete fields of existing records that are
                   contained in the XML MARC file. The fields in
                   the original record that are not present in
                   the XML MARC file are preserved.
                   This is incompatible with FFT (see below).
 

If you combine the --pretend parameter with the above updating modes you can actually test what would be executed without modifying the database or altering the system status.
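
The difference between the updating modes can be pictured with a toy model. The sketch below is not BibUpload's implementation: it represents a record as a list of (tag, ind1, ind2, value) tuples, ignores CFG_BIBUPLOAD_STRONG_TAGS and provenance protection, does not model field ordering, and assumes that delete removes only fields that match the input exactly.

```python
def apply_mode(existing, incoming, mode):
    """Return the fields of a record after a simplified update.

    existing, incoming: lists of (tag, ind1, ind2, value) tuples.
    mode: one of "replace", "append", "correct", "delete".
    """
    if mode == "replace":
        # The original content is wiped out and fully replaced.
        return list(incoming)
    if mode == "append":
        # The original content is enriched only.
        return list(existing) + list(incoming)
    if mode == "correct":
        # Only fields whose tag and indicators match the input are replaced.
        keys = {(tag, ind1, ind2) for tag, ind1, ind2, _ in incoming}
        kept = [field for field in existing if field[:3] not in keys]
        return kept + list(incoming)
    if mode == "delete":
        # Fields contained in the input are removed; the others are preserved.
        return [field for field in existing if field not in incoming]
    raise ValueError("unknown mode: %r" % (mode,))
```

For instance, correcting a record that holds fields 100, 245 and 700 with an input that holds only a new 245 leaves 100 and 700 untouched, whereas replacing it with the same input leaves only the new 245.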

3.4 Inserting and updating at the same time

Note that the insert/update modes can be combined together. For example, if you have a file that contains a mixture of new records with possibly some records to be updated, then you can run:

 $ bibupload -i -r file.xml
 
 
In this case BibUpload will try to do an update (for records having either 001 or 970 identifiers), or an insert (for the other ones).

3.5 Updating preformatted output formats

BibFormat can use this special upload mode during which metadata will not be updated, only the preformatted output formats for records:

     -f, --format        Upload only the format (FMT) fields.
                         The original content is not changed, and neither its modification date.
 
This is useful for bibreformat daemon only; human administrators don't need to explicitly know about this mode.

3.6 Uploading fulltext files

The fulltext files can be uploaded and revised via a special FFT ("fulltext file transfer") tag with the following semantics:

     FFT $a  ...  location of the docfile to upload (a filesystem path or a URL)
         $d  ...  docfile description (optional)
         $f  ...  format (optional; if not set, deduced from $a)
         $m  ...  new desired docfile name (optional; used for renaming files)
         $n  ...  docfile name (optional; if not set, deduced from $a)
         $o  ...  flag (repeatable subfield)
         $r  ...  restriction (optional, see below)
         $t  ...  docfile type (e.g. Main, Additional)
         $v  ...  version (used only with REVERT and DELETE-FILE, see below)
         $x  ...  url/path for an icon (optional)
         $z  ...  comment (optional)
 

For example, to upload a new fulltext file thesis.pdf associated to record ID 123:

     <record>
         <controlfield tag="001">123</controlfield>
         <datafield tag="FFT" ind1=" " ind2=" ">
             <subfield code="a">/tmp/thesis.pdf</subfield>
             <subfield code="t">Main</subfield>
             <subfield code="d">
               This is the fulltext version of my thesis in the PDF format.
               Chapter 5 still needs some revision.
             </subfield>
         </datafield>
     </record>
 

The FFT tag is repeatable, so one can pass along another FFT tag instance containing a pointer to e.g. the thesis defence slides. The subfields of an FFT tag are non-repeatable, with the exception of $o as listed above.

When more than one FFT tag is specified for the same document (e.g. for adding more than one format at a time), any of $t (docfile type), $m (new desired docfile name), $r (restriction), $v (version) and $x (url/path for an icon) that are specified should be specified identically in every FFT entry. For example, if you want to specify an icon for a document with two formats (say .pdf and .doc), you will write two FFT tags, both containing the same $x subfield.
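
This consistency rule lends itself to a simple check before upload. The sketch below is illustrative only: each FFT tag is represented as a plain dictionary of subfield code to value, and the docname is assumed to be $n or, failing that, the file name of $a without its extension, which is a guess at how the name is deduced.

```python
import os
from collections import defaultdict

# Subfields that must agree across all FFT entries of one document.
SHARED_CODES = ("t", "m", "r", "v", "x")


def docname_of(fft):
    """Return $n, or a docname guessed from $a (assumption, see lead-in)."""
    if "n" in fft:
        return fft["n"]
    return os.path.splitext(os.path.basename(fft["a"]))[0]


def inconsistent_subfields(ffts):
    """Map each docname to the shared subfield codes whose values differ."""
    groups = defaultdict(list)
    for fft in ffts:
        groups[docname_of(fft)].append(fft)
    problems = {}
    for docname, entries in groups.items():
        differing = [
            code
            for code in SHARED_CODES
            # A subfield present in one entry but absent in another also counts.
            if len({entry.get(code) for entry in entries}) > 1
        ]
        if differing:
            problems[docname] = differing
    return problems
```

An empty dictionary means that every document's FFT entries agree on the shared subfields.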

The bibupload process, when it encounters FFT tags, will automatically populate fulltext storage space (/opt/invenio/var/data/files) and metadata record associated tables (bibrec_bibdoc, bibdoc) as appropriate. It will also enrich the 856 tags (URL tags) of the MARC metadata of the record in question with references to the latest versions of each file.

Note that for the $a and $x subfields, filesystem paths must be absolute (e.g. /tmp/icon.gif is valid, while Desktop/icon.gif is not) and they must be readable by the user/group of the bibupload process that will handle the FFT.
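
A location can be checked against these two requirements before it is put into an FFT tag. The following is a rough sketch: check_fft_location is a hypothetical helper, only http and https URLs are recognised as URLs (an assumption), and readability is tested for the user running the script, who may differ from the user of the bibupload process.

```python
import os


def check_fft_location(location):
    """Return None if the location looks acceptable, else a short reason."""
    if location.startswith(("http://", "https://")):
        # URLs are accepted as they are; nothing is fetched here.
        return None
    if not os.path.isabs(location):
        return "path is not absolute"
    if not os.access(location, os.R_OK):
        return "file is not readable by the current user"
    return None
```

For the check to be meaningful, it should be run as the same user that runs bibupload.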

The bibupload process supports the usual modes correct, append, replace and insert, with semantics that are somewhat similar to those of the metadata upload:

objects being uploaded
     Metadata: MARC field instances characterized by tags (010-999)
     Fulltext: fulltext files characterized by unique file names (FFT $n)
insert
     Metadata: insert new record; must not exist
     Fulltext: insert new files; must not exist
append
     Metadata: append new tag instances for the given tag XXX, regardless of existing tag instances
     Fulltext: append new files, if filename (i.e. new format) not already present
correct
     Metadata: correct tag instances for the given tag XXX; delete existing ones and replace with given ones
     Fulltext: correct files with the given filename; add new revision or delete file; if the docname does not exist the file is added
replace
     Metadata: replace all tags, whatever XXX are
     Fulltext: replace all files, whatever filenames are
delete
     Metadata: delete all existing tag instances
     Fulltext: not supported

Note that you can mix regular MARC tags with special FFT tags in the incoming XML input file. Both record metadata and record files will be updated as a result. Hence be careful with some input modes, such as replace mode, if you would like to touch only files.

Note that in append and insert mode the $m subfield is ignored.

In order to rename a document, use the correct mode, specifying in the $n subfield the original docname that should be renamed and in $m the new name.

Special values can be assigned to the $t subfield.

Value        Meaning
PURGE        In order to purge previous file revisions (i.e. in order to keep only the latest file version), please use the correct mode with $n docname and $t PURGE as the special keyword.
DELETE       In order to delete all existing versions of a file, making it effectively hidden, please use the correct mode with $n docname and $t DELETE as the special keyword.
EXPUNGE      In order to expunge (i.e. remove completely, also from the filesystem) all existing versions of a file, making it effectively disappear, please use the correct mode with $n docname and $t EXPUNGE as the special keyword.
FIX-MARC     In order to synchronize MARC to the bibrec/bibdoc structure (e.g. after an update or a tweak in the database), please use the correct mode with $n docname and $t FIX-MARC as the special keyword.
FIX-ALL      In order to fix a record (i.e. put all its linked documents in a coherent state) and synchronize the MARC to the table, please use the correct mode with $n docname and $t FIX-ALL as the special keyword.
REVERT       In order to revert to a previous file revision (i.e. to create a new revision with the same content as some previous revision had), please use the correct mode with $n docname, $t REVERT as the special keyword and $v the number corresponding to the desired version.
DELETE-FILE  In order to delete a particular file added by mistake, please use the correct mode with $n docname, $t DELETE-FILE, specifying $v version and $f format. Note that this operation is not reversible. Note that if you don't specify a version, the last version will be used.
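
The special $t keywords above all follow the same pattern: a record with 001 and one FFT tag carrying $n, $t and, for some keywords, further subfields. The sketch below builds such a record; it is an illustration under assumptions, not Invenio code. The helper name is hypothetical, and the table of required subfields is read off the descriptions above ($v for REVERT; $f for DELETE-FILE, with $v optional).

```python
import xml.etree.ElementTree as ET

# Extra subfields each keyword needs besides $n and $t (see lead-in).
REQUIRED_EXTRA = {
    "PURGE": (),
    "DELETE": (),
    "EXPUNGE": (),
    "FIX-MARC": (),
    "FIX-ALL": (),
    "REVERT": ("v",),
    "DELETE-FILE": ("f",),  # $v may be omitted: the last version is then used
}


def build_special_fft(record_id, docname, keyword, **extra):
    """Build a correct-mode record with one FFT tag using a special $t value."""
    if keyword not in REQUIRED_EXTRA:
        raise ValueError("unknown special keyword: %r" % (keyword,))
    missing = [code for code in REQUIRED_EXTRA[keyword] if code not in extra]
    if missing:
        raise ValueError("%s needs subfields: %s" % (keyword, ", ".join(missing)))
    record = ET.Element("record")
    ET.SubElement(record, "controlfield", {"tag": "001"}).text = str(record_id)
    fft = ET.SubElement(
        record, "datafield", {"tag": "FFT", "ind1": " ", "ind2": " "}
    )
    subfields = {"n": docname, "t": keyword}
    subfields.update(extra)
    for code in sorted(subfields):
        ET.SubElement(fft, "subfield", {"code": code}).text = str(subfields[code])
    return ET.tostring(record, encoding="unicode")
```

For example, build_special_fft(123, "thesis", "REVERT", v=2) yields a record that, uploaded in the correct mode, asks for a new revision of thesis with the content of version 2.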

In order to preserve previous comments and descriptions when correcting, please use the KEEP-OLD-VALUE special keyword in the desired $d and $z subfields.

The $r subfield can contain a string that can be used to restrict the given document. The same value must be specified for all the formats of a given document. By default the keyword will be used as the status parameter for the "viewrestrdoc" action, which can be used to grant access rights to the desired users. For example, if you set the keyword "thesis", you can then connect the "thesisviewer" role to the action "viewrestrdoc" with the parameter "status" set to "thesis". Then all users linked with the "thesisviewer" role will be able to download the document, while any other user will not be allowed to. Note that if you use the keyword "KEEP-OLD-VALUE", the previous restrictions, if applicable, will be kept.

More advanced document-level restriction is indeed possible. If the value contains in fact:

Note that authors (as defined in the record MARC) and superadmin are always authorized to access a document, whatever the given value of the status is.

Some special flags might be set via FFT and associated with the current document by using the $o subfield. This feature is experimental. Currently only two flags are actively considered:

Note that each time bibupload is called on a record, the 8564 tags pointing to locally stored files are recreated on the basis of the full-text files connected to the record. Thus, if you wish to update some 8564 tag pointing to a locally managed file, the only way to do this is through the FFT tag, not by editing 8564 directly.

4. Batch Uploader

4.1 Web interface - Cataloguers

The batchuploader web interface can be used to upload either metadata files or documents. As opposed to daemon mode, actions will be executed only once.

The upload history shows the metadata and document uploads made through the web interface, not those made in daemon mode.

4.2 Web interface - Robots

If you need to use the batch upload function from the command line, this can be achieved with a curl call such as:

 $ curl -F 'file=@localfile.xml' -F 'mode=-i' http://cdsweb.cern.ch/batchuploader/robotupload -A invenio_webupload
 
 

This service performs (client, file) checking to ensure that the records are put into a collection the client has rights to.
To configure these permissions, check the CFG_BATCHUPLOADER_WEB_ROBOT_RIGHTS variable in the configuration file.
The allowed user agents can also be defined using the CFG_BATCHUPLOADER_WEB_ROBOT_AGENT variable.
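
The curl call shown above can also be driven from a script. The sketch below merely assembles the same curl arguments and runs them; the function names are hypothetical, the URL and user agent are whatever your installation expects, and curl is assumed to be available on the PATH.

```python
import subprocess


def robotupload_command(path, mode, url, agent):
    """Return the curl argument list mirroring the call shown above."""
    return [
        "curl",
        "-F", "file=@" + path,  # the local MARCXML file to send
        "-F", "mode=" + mode,   # the bibupload mode, e.g. -i
        url,
        "-A", agent,            # must be an allowed user agent
    ]


def robotupload(path, mode, url, agent):
    """Run curl and raise CalledProcessError if it exits with an error."""
    return subprocess.run(robotupload_command(path, mode, url, agent), check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.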

4.3 Daemon mode

The batchuploader daemon mode is intended to be a bibsched task for document or metadata upload. The parent directory in which the daemon will look for the metadata and documents folders must be specified in the Invenio configuration file.

An example of how the directories should be arranged, assuming that Invenio was installed in /opt/invenio, would be:

      /opt/invenio/var/batchupload
             /opt/invenio/var/batchupload/documents
                     /opt/invenio/var/batchupload/documents/append
                     /opt/invenio/var/batchupload/documents/revise
             /opt/invenio/var/batchupload/metadata
                     /opt/invenio/var/batchupload/metadata/append
                     /opt/invenio/var/batchupload/metadata/correct
                     /opt/invenio/var/batchupload/metadata/insert
                     /opt/invenio/var/batchupload/metadata/replace
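
The layout above can be created in one go. This is a minimal sketch assuming the Python standard library; create_batchupload_tree is a hypothetical helper, and the base directory must match the one set in the Invenio configuration file. The DONE folders mentioned below are not created here, since their exact location is not described in this guide.

```python
import os

# Folder names taken from the example layout above.
LAYOUT = {
    "documents": ("append", "revise"),
    "metadata": ("append", "correct", "insert", "replace"),
}


def create_batchupload_tree(base):
    """Create the documents/ and metadata/ mode folders under base."""
    created = []
    for kind, modes in LAYOUT.items():
        for mode in modes:
            path = os.path.join(base, kind, mode)
            # exist_ok makes the call safe to repeat on an existing tree.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling create_batchupload_tree("/opt/invenio/var/batchupload") as a user with write access there would reproduce the example layout.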
 

When running the batchuploader daemon there are two possible execution modes:

         -m,   --metadata    Look for metadata files in folders insert, append, correct and replace.
                             All files are uploaded and then moved to the corresponding DONE folder.
         -d,   --documents   Look for documents in folders append and revise. Uploaded files are then
                             moved to DONE folders if possible.
 
By default, metadata mode is used.

An example of invocation would be:

 $ batchuploader --documents
 
 

It is possible to schedule the batch uploader to run periodically. Read the Howto-run guide to see how.