Page MenuHomec4science

bibupload-admin-guide.webdoc
No OneTemporary

File Metadata

Created
Tue, Jan 21, 23:07

bibupload-admin-guide.webdoc

## -*- mode: html; coding: utf-8; -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
<!-- WebDoc-Page-Title: BibUpload Admin Guide -->
<!-- WebDoc-Page-Navtrail:<a class="navtrail" href="<CFG_SITE_URL>/help/admin<lang:link/>">_(Admin Area)_</a> -->
<!-- WebDoc-Page-Revision: $Id$ -->
<h2>Contents</h2>
<strong>1. <a href="#1">Overview</a></strong><br />
<strong>2. <a href="#2">Configuring BibUpload</a></strong><br />
<strong>3. <a href="#3">Running BibUpload</a></strong><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.1. <a href="#3.1">Inserting new records</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.2. <a href="#3.2">Updating existing records</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.3. <a href="#3.3">Inserting and updating at the same time</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.4. <a href="#3.4">Updating preformatted output formats</a><br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3.5. <a href="#3.5">Uploading fulltext files</a><br />
<a name="1"></a><h2>1. Overview</h2>
<p>BibUpload enables you to upload bibliographic data in MARCXML
format into CDS Invenio bibliographic database. It is also used
internally by other CDS Invenio modules as the sole entrance of
metadata into the bibliographic databases.</p>
<p>Note that before uploading a MARCXML file, you may want to run
provided <code>/opt/cds-invenio/bin/xmlmarclint</code> on it in order
to verify its correctness.</p>
<a name="2"></a><h2>2. Configuring BibUpload</h2>
<p>BibUpload takes a MARCXML file as its input. There is nothing to
be configured for these files. If the files have to be coverted into
MARCXML from some other format, structured or not, this is usually
done beforehand via <a href="bibconvert-admin">BibConvert</a> module.
</p>
<p>Note that if you are using external system numbers for your
records, such as when your records are being synchronized from an
external system, then BibUpload knows about the tag 970 as the one
containing external system number. (To change this 970 tag into
something else, you would have to edit BibUpload config source file.)
</p>
<p>Note also that in the similar way BibUpload knows about OAI
identifiers, so that it will refuse to insert the same OAI harvested
record twice, for example.
</p>
<a name="3"></a><h2>3. Running BibUpload</h2>
<a name="3.1"></a><h3>3.1 Inserting new records</h3>
<p>Consider that you have an MARCXML file containing new records that
is to be uploaded into the CDS Invenio. (For example, it might have
been produced by <a href="bibconvert-admin">BibConvert</a>.) To finish
the upload, you would call the BibUpload script in the insert mode as
follows:
<blockquote>
<pre>
$ bibupload -i file.xml
<pre>
</blockquote>
In the insert mode, all the records from the file will be treated as
new. This means that they should not contain neither 001 tags
(holding record IDs) nor 970 tags (holding external system numbers).
BibUpload would refuse to upload records having these tags, in order
to prevent potential double uploading. If your file does contain 001
or 970, then chances are that you want to update existing records, not
re-upload them as new, and so BibUpload will warn you about this and
will refuse to continue.
</p>
<p>For example, to insert a new record, your file should look like this:
<pre>
&lt;record&gt;
&lt;datafield tag="100" ind1=" " ind2=" "&gt;
&lt;subfield code="a"&gt;Doe, John&lt;/subfield&gt;
&lt;/datafield&gt;
&lt;datafield tag="245" ind1=" " ind2=" "&gt;
&lt;subfield code="a"&gt;On The Foo And Bar&lt;/subfield&gt;
&lt;/datafield&gt;
&lt;/record&gt;
</pre>
</p>
<a name="3.2"></a><h3>3.2 Updating existing records</h3>
<p>When you want to update existing records, with the new content from
your input MARCXML file, then your input file should contain either
tags 001 (holding record IDs) or tag 970 (holding external system
numbers). BibUpload will try to match existing records via 001 and
970 and if it finds a record in the database that corresponds to a
record from the file, it will update its content. Otherwise it will
signal an error saying that it could not find the
record-to-be-updated.
</p>
<p>For example, to update a title of record #123 via correct mode, your
input file should contain record ID in the 001 tag and the title in
245 tag as follows:
<pre>
&lt;record&gt;
&lt;controlfield tag="001"&gt;123&lt;/controlfield&gt;
&lt;datafield tag="245" ind1=" " ind2=" "&gt;
&lt;subfield code="a"&gt;My Newly Updated Title&lt;/subfield&gt;
&lt;/datafield&gt;
&lt;/record&gt;
</pre>
</p>
<p>There are several updating modes:
<pre>
-r, --replacerecord Replace existing records by those from the XML
MARC file. The original content is wiped out
and fully replaced. Signals error if record
is not found via matching record IDs or system
numbers.
Note also that `-r' can be combined with `-i'
into an `-ir' option that would automatically
either insert records as new if they are not
found in the system, or correct existing
records if they are found to exist.
-a, --appendfield Append fields from XML MARC file at the end of
existing records. The original content is
enriched only. Signals error if record is not
found via matching record IDs or system
numbers.
-c, --correctfield Correct fields of existing records by those
from XML MARC file. The original record
content is modified only in the fields from
the XML MARC file: the original fields are
removed and replaced by those from the XML
MARC file. Fields not present in XML MARC
file are not changed (unlike the -r option).
Signals error if record is not found via
matching record IDs or system numbers.
</pre>
</p>
<a name="3.3"></a><h3>3.3 Inserting and updating at the same time</h3>
<p>Note that the insert/update modes can be combined together. For
example, if you have a file that contains a mixture of new records
with possibly some records to be updated, then you can run:
<blockquote>
<pre>
$ bibupload -i -r file.xml
<pre>
</blockquote>
In this case BibUpload will try to do an update (for records having
either 001 or 970 identifiers), or an insert (for the other ones).
</p>
<a name="3.4"></a><h3>3.4 Updating preformatted output formats</h3>
<p>BibFormat can use this special upload mode during which metadata will
not be updated, only the preformatted output formats for records:
<pre>
-f, --format Upload only the format (FMT) fields.
The original content is not changed, and neither its modification date.
</pre>
This is useful for <code>bibreformat</code> daemon only; human
administrators don't need to explicitly know about this mode.
</p>
<a name="3.5"></a><h3>3.5 Uploading fulltext files</h3>
<p>The fulltext files can be uploaded and revised via a special FFT
("fulltext file transfer") tag with the following
semantic:
<pre>
FFT $a ... location of the docfile to upload (a filesystem path or a URL)
$n ... docfile name (optional; if not set, deduced from $a)
$m ... new desired docfile name (optional; used for renaming files)
$t ... docfile type (e.g. Main, Additional)
$d ... docfile description (optional)
$f ... format (optional; if not set, deduced from $a)
$z ... comment (optional)
$r ... restriction (optional, see below)
$v ... version (used only with REVERT, see below)
$x ... url/path for an icon (optional)
</pre>
</p>
<p>For example, to upload a new fulltext file <code>thesis.pdf</code>
associated to record ID 123:
<pre>
&lt;record&gt;
&lt;controlfield tag="001"&gt;123&lt;/controlfield&gt;
&lt;datafield tag="FFT" ind1=" " ind2=" "&gt;
&lt;subfield code="a"&gt;/tmp/thesis.pdf&lt;/subfield&gt;
&lt;subfield code="t"&gt;Main&lt;/subfield&gt;
&lt;subfield code="d"&gt;
This is the fulltext version of my thesis in the PDF format.
Chapter 5 still needs some revision.
&lt;/subfield&gt;
&lt;/datafield&gt;
&lt;/record&gt;
</pre>
</p>
<p>The FFT tag can be repetitive, so one can pass along another FFT
tag instance containing a pointer to e.g. the thesis defence slides.
The subfields of an FFT tag are non-repetitive.
</p>
<p>The bibupload process, when it encounters FFT tags, will
automatically populate fulltext storage space
(<code>/opt/cds-invenio/var/data/files</code>) and metadata record
associated tables (<code>bibrec_bibdoc</code>, <code>bibdoc</code>) as
appropriate. It will also enrich the 856 tags (URL tags) of the MARC
metadata of the record in question with references to the latest
versions of each file.
</p>
<p>The bibupload process supports the usual modes correct, append,
replace, insert with a semantic that is somewhat similar to the
semantic of the metadata upload:
<blockquote>
<table border="1">
<thead>
<tr>
<th></th><th>Metadata</th> <th>Fulltext</th>
</tr>
</thead>
<tbody>
<tr>
<td>objects being uploaded</td><td> MARC field instances characterized by tags (010-999) </td> <td> fulltext files characterized by unique file names (FFT $n)</td>
</tr>
<tr>
<td> insert </td><td> insert new record; must not exist </td><td> insert new files; must not exist </td>
</tr>
<tr>
<td> append </td><td> append new tag instances for the given tag XXX, regardless of existing tag instances</td><td> append new files, if filename (i.e. new format) not already present </td>
</tr>
<tr>
<td> correct </td><td> correct tag instances for the given tag XXX; delete existing ones and replace with given ones </td><td> correct files with the given filename; add new revision or delete file</td>
</tr>
<tr>
<td> replace </td><td> replace all tags, whatever XXX are</td><td> replace all files, whatever filenames are </td>
</tr>
</tbody>
</table>
</blockquote>
</p>
<p>Note, in append and insert mode, <pre>$m</pre> is ignored.
<p>In order to purge previous file revisions (i.e. in order to keep
only the latest file version), please use the correct mode with $n
docname and $t PURGE as the special keyword.
</p>
<p>In order to delete all existing versions of a file, making it
effectively hidden, please use the correct mode with $n docname
and $t DELETE as the special keyword.
</p>
<p>In order to expunge (i.e. remove completely, also from the filesystem) all existing versions of a file, making it
effectively disappear, please use the correct mode with $n docname
and $t EXPUNGE as the special keyword.
</p>
<p>In order to revert to a previous file revision (i.e. to create a new revision with
the same content as some previous revision had), please use the correct mode with
$n docname, $t REVERT as the special keyword and $v the number corresponding
to the desired version.</p>
<p>In order to preserve previous comments and descriptions when correcting, please use the KEEP-OLD-VALUE special keyword with the desired $d and $z subfield.
</p>
<p>In order to add an icon representing a document, you must use a $x subfields. All the FFT for different format of the document must have then the same $x subfield. You can use KEEP-OLD-VALUE in order to keep the previous icon when correcting.
</p>
<p>The $r subfield can contain a keyword that can be use to restrict the given document. The same keyword must be specified for all the format of a given document. The keyword will be used as the status parameter for the "viewrestrdoc" action, which can be used to give access right/restriction to desired user. e.g. if you set the keyword "thesis", you can the connect the "thesisviewer" to the action "viewrestrdoc" with parameter "status" set to "thesis". Then all the user which are linked with the "thesisviewer" role will be able to download the document. Instead any other user will not be allowed. Note, if you use the keyword "KEEP-OLD-VALUE" the previous restrictions if applicable will be kept.
</p>

Event Timeline