diff --git a/modules/bibconvert/doc/admin/guide.html.wml b/modules/bibconvert/doc/admin/guide.html.wml index 4d1015756..ea05af375 100644 --- a/modules/bibconvert/doc/admin/guide.html.wml +++ b/modules/bibconvert/doc/admin/guide.html.wml @@ -1,933 +1,935 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibConvert Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibconvert/>BibConvert Admin" \ navbar_name="admin" \ navbar_select="bibconvert-admin-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Contents

1. Overview
2. Configuration Example
3. Running BibConvert
4. BibConvert Configuration Guide
       Conventions
       4.1. Step 1 Definition of Source Record
       4.2. Step 2 Definition of Source Fields
       4.3. Step 3 Definition of Target Record
       4.4. Formatting in BibConvert
          4.4.1 Definition of Formatting Functions
          4.4.2 Generated Values

1. Overview

The BibConvert utility enables you to convert metadata records from various metadata formats into another metadata format supported by the local database. It is designed to process metadata records harvested in XML, converting them into MARC21 before they are finally uploaded into the database. However, BibConvert is flexible enough to deal with other structured metadata as well, so you can adapt it to insert the data you need into the database.

It is suitable for tasks such as conversion of records received from multiple data sources or conversion of records from another system that may support a different metadata format.

2. Configuration Example

OAI DublinCore into MARC21 and OAI MARC into MARC21 configurations will be provided as default configurations, ensuring the standard uploading sequence (incl. BibHarvest and BibUpload utilities). Other configurations can be created according to your needs. The configuration file that has to be created for each data source is a text file with the following structure:

 ### the configuration starts here
 ### Configuration of bibconvert templates
 ### source data : 
 
 === data extraction configuration template ===
 ### here comes the data extraction configuration template
 #   entry example:
 
 AU---%A---MAX---;---
 
 #   extracts maximum available data by field from metadata record
 #   the values are found between specified tags
 #   in this case between the '%A' tag and other tags defined
 #   repetitive values are recognized by a semicolon separator
 #   resp. by multiple presence of '%A' tag
 
 ===   data source configuration template   ===
 ### here comes the data source configuration template
 #   entry example:
 
 AU---<:FIRSTNAME:>-<:SURNAME:>
 
 #   describes the contents of extracted source data fields
 #   in this case, the field AU is described as having two distinct subfields
 
 ===   data target configuration template   ===
 ### here comes the data target configuration template
 #   entry example:
 
 AU::CONF(AU,,0)---<datafield id="700" ind1="" ind2=""><subfield code="a"><:AU*::SURNAME::CAP():>, <:AU*::FIRSTNAME::ABR():></subfield></datafield>
 
 #   This section concerns the desired output, whereas the previous two focus on the structure of the data source.
 #   Each line corresponds to one output line, composed of the given literals and values from the extracted source data fields.
 #   In this example, the XML Marc21 output line is defined, 
 #   containing re-formatted values of source fields SURNAME and FIRSTNAME
 
 ### the configuration ends here
 

Once a configuration is prepared, BibConvert converts the source data file according to it in batch mode. BibConvert is fully compatible with the Uploader1.x configuration language. For more information, see the BibConvert Configuration Guide section below.

3. Running BibConvert

For a fully functional demo, consider the following sample input data:

sample.dat -- sample bibliographic data to be converted and inputted into CDSware
sample.cfg -- sample configuration file, featuring knowledge base demo

To convert the above data into XML MARC, use the following command:

 $ bibconvert -b'<collection>' -csample.cfg -e'</collection>' < sample.dat > /tmp/sample.xml
 
 
and see the XML MARC output file. You would then continue the upload procedure by calling BibUpload.

Other useful BibConvert configuration examples:

dcq.cfg -- Qualified Dublin Core in SGML to XML MARC example
dcq.dat -- corresponding data file, featuring collection identifiers demo
bibtex.cfg -- BibTeX to XML MARC example

4. BibConvert Configuration Guide

Conventions


- comment line starts with '#' sign in the first column
- each section is declared by a line starting with '===' (further characters on the line are ignored)
- values are separated by '---'

4.1. Step 1 Definition of Source record

- Create/edit "data extraction configuration template" section of the configuration file.
- Each line of this section stands for a definition of one source field:

name---keyword---terminating string---separator---

- Choose a (valid) name allowed by the system
- Enter keyword and terminating string, which are boundary tags for the wanted value extraction
- In case the field is repetitive, enter the value separator
- "---" is a mandatory separator between all values, even zero-length ones
- MAX/MIN keywords can be used instead of terminating string
 

Example of a definition of author (repetitive) and title (non-repetitive) fields:

 === data extraction configuration template ===
 ### here comes the data extraction configuration template
 

AU---AU_---MAX---;---
TI---TI_---EOL------

4.2. Step 2 Definition of Source fields

Each field extracted from the source according to the definition done in the first step can have an internal structure, which is described in this section.

- Create/edit "data source configuration template" section of the configuration file.
- Each line of this section stands for a definition of one source field
- <name> corresponds to the name defined in step 1

name---{CONST<:SUBFIELD:>[CONST]}}

- Enter only constants that appear systematically.
- A constant of non-zero length has to be defined between two discrete subfields
- "---" is a mandatory separator between the name and the source field definition

Example of a definition of author (repetitive) and title (non-repetitive) fields:

 ===   data source configuration template   ===
 TI---<:TI:>
 AU---<:FIRSTNAME:>-<:SURNAME:>
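
One way to picture how such a definition splits a value into subfields is to turn it into a regular expression. The Python sketch below does that; it is an illustration under assumptions, not BibConvert's actual algorithm. The function names are hypothetical, and subfield names are assumed to be identifier-like.

```python
import re


def template_to_regex(template):
    """Turn e.g. '<:FIRSTNAME:>-<:SURNAME:>' into a regex with named groups."""
    pattern = ""
    pos = 0
    for m in re.finditer(r"<:([A-Za-z_]\w*):>", template):
        pattern += re.escape(template[pos:m.start()])  # constant part
        pattern += "(?P<%s>.*?)" % m.group(1)          # subfield value
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile("^" + pattern + "$", re.DOTALL)


def split_subfields(template, value):
    """Return a dict of subfield values, or None if the value does not fit."""
    m = template_to_regex(template).match(value)
    return m.groupdict() if m else None
```

The sketch also shows why a constant of non-zero length is needed between two subfields: without it, nothing marks where one subfield ends and the next begins.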
 

4.3. Step 3 Definition of target record

This definition describes the layout of the target record that is created by the conversion, together with the correspondence to the source fields defined in step 2.

- Create/edit "data target configuration template" section of the configuration file.
- Each line of this section stands for an output line created by the conversion.
- <name> corresponds to the name defined in steps 1 and 2

CODE---CONST<:name::SUBFIELD::FUNCT():>CONST<:GENERATED_VALUE:>

- CODE stands for a tag for readability (optional)
- "::" is a mandatory separator between the name and the subfield definition
- optionally, you can apply the appropriate formatting function(s) and generated values
- "::" is a mandatory separator between the subfield definition and the function(s)
- "---" is a mandatory separator between the tag and the output code definition
- mark repetitive source fields with an asterisk (*)

Example of a definition of author (repetitive) and title (non-repetitive) codes:

 
AU::CONF(AU,,0)---<datafield id="700" ind1="" ind2=""><subfield code="a"><:AU*::AU:></subfield></datafield>
TI::CONF(TI,,0)---<datafield id="245" ind1="" ind2=""><subfield code="a"><:TI::TI::SUP(SPACE, ):></subfield></datafield>

4.4 Formatting in BibConvert

 4.4.1 Definition of formatting functions

Every field can be processed with a variety of functions that partially or entirely change the original value.
Three types of functions are available, operating on single characters, on words, or on the entire value of the processed field.
 

Every function requires a certain number of parameters to be entered in brackets. If an insufficient number of parameters is present, the function uses default values. Default values are chosen so as to keep the original value.

The configuration of templates is case sensitive.

The following functions are available:

ADD(prefix,suffix) - add prefix/suffix
KB(kb_file,[0-9]) - lookup in kb_file and replace value
ABR(x,suffix)/ABRW(x,suffix) - abbreviation with suffix addition
ABRX() - abbreviate exclusively words longer than given limit
CUT(prefix,postfix) - remove substring from side
REP(x,y) - replacement of characters
SUP(type) - suppression of characters of specified type
LIM(n,L/R)/LIMW(str,L/R) - restriction to n letters
WORDS(n,side) - restriction to n words from L/R
MINL(n)/MAXL(n) - replacement of words shorter/greater than n
MINLW(n) - replacement of short values
EXP(str,1|0)/EXPW(type) - replacement of words from value if containing spec. type/string
IF(value,valueT,valueF) - replace T/F value
UP/DOWN/CAP/SHAPE/NUM - lower case and upper case, shape
SPLIT(n,h,str,from)/SPLITW(sep,h,str,from) - split into more lines
CONF(field,value,1/0)/CONFL(value,1/0) - confirm validity of a field
RANGE(from,to) - confirm only entries in the specified range
 

ADD(prefix,postfix)

default: ADD(,)    no addition

Adds a prefix/postfix to the value. This function can be used to add the proper field name as a prefix of the value itself:

ADD(WAU=,)    prefix for the first author (which may have been taken from the field AU2)
 

KB(kb_file)    -    kb_file search

default: KB(kb_file,1/0/R)

The input value is compared to a kb_file and may be replaced by another value. In the case that the input value is not recognized, it is by default kept without any modification. This default can be overridden by _DEFAULT_---default value entry in the kb_file

The file specified in the parameter is a text file representing a table of values that correspond to each other:

{input_value---output_value}

KB(file,1) searches the exact value passed.
KB(file,0) searches the KB code inside the value passed.
KB(file,2) as 0 but not case sensitive
KB(file,R) replacements are applied on substrings/characters only.

BibConvert looks up the value in the KB file in one of the following modes:
===========================================================
1 - case sensitive / match (default)
2 - not case sensitive / search
3 - case sensitive / search
4 - not case sensitive / match
5 - case sensitive / search (in KB)
6 - not case sensitive / search (in KB)
7 - case sensitive / search (reciprocal)
8 - not case sensitive / search (reciprocal)
9 - replace by _DEFAULT_ only
R - not case sensitive / search (reciprocal) replace

Edge spaces are not considered. The output value is not formatted any further.
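
As an illustration of the simplest case, the Python sketch below mimics an exact, case-sensitive lookup with the _DEFAULT_ fallback described above. It is not BibConvert's code; load_kb and kb_lookup are hypothetical names, and the other modes are deliberately left out.

```python
def load_kb(text):
    """Parse 'input_value---output_value' lines into a dictionary."""
    kb = {}
    for line in text.splitlines():
        if "---" in line:
            key, value = line.split("---", 1)
            kb[key.strip()] = value.strip()
    return kb


def kb_lookup(kb, value):
    """Exact, case-sensitive match; edge spaces are not considered."""
    value = value.strip()
    if value in kb:
        return kb[value]
    # Unrecognized value: use the _DEFAULT_ entry if present,
    # otherwise keep the value without modification.
    return kb.get("_DEFAULT_", value)
```

The knowledge base contents in any real file are site-specific; the sketch only shows the shape of the lookup.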

ABR(x,trm),ABRW(x,trm)  - abbreviate term to x places with(out) postfix

default: ABR(1,.)
default: ABRW(1,.)

The words in the input value are shortened according to the parameters specified. By default, only the initial character is kept and the output value is terminated by a dot.
ABRW takes entire value as one word.

 
example          input                output
ABR()            firstname_surname    f._s.
ABR(1,)          firstname_surname    f_s
ABR(10,COMMA)    firstname_surname    firstname,_surname,
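
The behaviour of ABR can be approximated in Python as follows. This is a sketch under assumptions: the underscore in the examples is taken to stand for a space, COMMA is taken to be a keyword for ",", and the function name abr is hypothetical.

```python
def abr(value, x=1, suffix="."):
    """Keep the first x characters of every word and append the suffix."""
    if suffix == "COMMA":
        suffix = ","  # assumed keyword for a literal comma
    return " ".join(word[:x] + suffix for word in value.split(" "))
```

For instance, abr("firstname surname") gives "f. s.", and an empty suffix gives "f s".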

ABRX() - abbreviate exclusively words longer than given limit

default: ABRX(1,.)

Exclusively words that reach the specified length limit in the input value are abbreviated. No suffix is appended to the words shorter than specified limit.

CUT(prefix,postfix) - remove substring from side

default: CUT(,)

Remove string from the value (reverse function to the "ADD")

REP(x,y)   - replace x with y

default: REP(,)    no replacement

The input value is searched for the string specified in the first parameter. All such strings are replaced with the string specified in the second parameter.

SUP(type,string)   - suppress chars of certain type

default: SUP(,)    type not recognized

All groups of characters belonging to the type specified in the first parameter are suppressed or replaced with a string specified in the second parameter.

Recognized types:

SPACE .. invisible chars incl. NEWLINE
ALPHA .. alphabetic
NALPHA .. not alphabetic
NUM .. numeric
NNUM    .. not numeric
ALNUM  .. alphanumeric
NALNUM  .. non alphanumeric
LOWER  .. lower case
UPPER  .. upper case
PUNCT  .. punctuation
NPUNCT  .. not punctuation
 

 
example         input       output
SUP(SPACE,-)    sep_1999    sep-1999
SUP(NNUM)       sep_1999    1999
SUP(NUM)        sep_1999    sep_
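
A rough Python equivalent of SUP, using regular expressions, might look like this. It is a sketch only: it covers a subset of the types, assumes ASCII character classes, reads the underscore in the examples as a space, and the name sup is hypothetical.

```python
import re

# Subset of the recognized types, mapped to ASCII character classes.
TYPES = {
    "SPACE": r"\s",
    "ALPHA": "[A-Za-z]",
    "NALPHA": "[^A-Za-z]",
    "NUM": "[0-9]",
    "NNUM": "[^0-9]",
    "ALNUM": "[A-Za-z0-9]",
    "NALNUM": "[^A-Za-z0-9]",
    "LOWER": "[a-z]",
    "UPPER": "[A-Z]",
}


def sup(value, kind="", replacement=""):
    """Replace every group of characters of the given type."""
    char_class = TYPES.get(kind)
    if char_class is None:
        return value  # type not recognized: value is kept
    # A function is used so that the replacement is taken literally.
    return re.sub("(?:%s)+" % char_class, lambda m: replacement, value)
```

Because whole groups are matched, several consecutive spaces collapse into a single replacement string.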

LIM(n,side)/LIMW(str,side)   - limit to n letters from L/R

default: LIM(0,)        no change
default: LIMW(,R)        no change

Limits the value in order to get the required number of characters by cutting excess characters from either side.
LIMW removes the Left/Right side to the (str) string.

 
example      input       output
LIM(4,L)     sep_1999    1999
LIM(4,R)     sep_1999    sep_
LIMW(_,R)    sep_1999    sep_
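
The LIM examples suggest that the side names the end from which excess characters are cut. A minimal Python sketch of that reading (the name lim is hypothetical, and LIMW is not covered):

```python
def lim(value, n=0, side=""):
    """Limit the value to n characters, cutting the excess from one side."""
    if n <= 0 or side not in ("L", "R"):
        return value  # corresponds to the default LIM(0,): no change
    if side == "L":
        return value[-n:]  # cut from the left, keep the last n characters
    return value[:n]       # cut from the right, keep the first n characters
```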

WORDS(n,side)  - limit to n words from L/R

default: WORDS(0,R)

Keeps the number of words specified in the first parameter from either side.
 
 

 
example       input       output
WORDS(1)      sep_1999    1999
WORDS(1,L)    sep_1999    sep_

MINL(n)   - exp. words shorter than n

default: MINL(1)

All words shorter than the limit specified in the parameter are removed from the sentence.
The words with length exactly n are kept.
 
 

 
example    input                 output
MINL(2)    History of Physics    History of Physics
MINL(3)    History of Physics    History Physics

MAXL(n)   - exp. words longer than n

default: MAXL(0)

All words greater in number of characters than the limit specified in the parameter are replaced. Words with length exactly n are kept.
 
 

 
example    input                 output
MAXL(2)    History of Physics    of
MAXL(3)    History of Physics    of
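
Both MINL and MAXL amount to filtering words by length. A small Python sketch of that idea, with hypothetical function names and words assumed to be separated by whitespace:

```python
def minl(value, n=1):
    """Drop words shorter than n; words of length exactly n are kept."""
    return " ".join(w for w in value.split() if len(w) >= n)


def maxl(value, n):
    """Drop words longer than n; words of length exactly n are kept."""
    return " ".join(w for w in value.split() if len(w) <= n)
```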

MINLW(n) - replacement of short values

default: MINLW(1) (no change)

The entire value is deleted if shorter than the specified limit.
This is used for the validation of created records, where we have 20 characters in the header.
The default validation is MINLW(21), i.e. the record entry will not be considered as valid, unless it contains at least 21 characters including the header. This default setting can be overridden by the -l command line option.

In order to increase the necessary length of the output line in the configuration itself, apply the function on the total value:

AU::MINLW(25)---CER <:SYSNO:> AU    L <:SURNAME:>, <:NAME:>
 
 

EXP(str,1|0) - exp./approve word containing specified string

default: EXP   (,0)     leave all value

The record is shortened by replacing words containing the specified string.
The second parameter states whether the string approves the word (0) or disables it (1).

For example, to get the email address from the value, use the following:
 
 

 
example     input                       output
EXP(@,0)    mail to: libdesk@cern.ch    libdesk@cern.ch
EXP(:,1)    mail to: libdesk@cern.ch    mail libdesk@cern.ch
EXP(@)      mail to: libdesk@cern.ch    libdesk@cern.ch
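
The two modes of EXP can be sketched in Python as a word filter. The function name exp and the parameter name disable are hypothetical; words are assumed to be separated by whitespace.

```python
def exp(value, string="", disable=0):
    """Keep (0) or drop (1) the words that contain the given string."""
    if not string:
        return value  # default: leave the whole value
    words = value.split()
    if disable:
        kept = [w for w in words if string not in w]
    else:
        kept = [w for w in words if string in w]
    return " ".join(kept)
```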

EXPW(type)   - exp. word from value if containing spec. type

default: EXPW        type not recognized
 

The sentence is shortened by replacing words containing specified type of character.

Types supported in EXPW function:

ALPHA .. alphabetic
NALPHA .. not alphabetic
NUM .. numeric
NNUM    .. not numeric
ALNUM  .. alphanumeric
NALNUM  .. non alphanumeric
LOWER  .. lower case
UPPER  .. upper case
PUNCT  .. punctuation
NPUNCT  .. non punctuation

Note: SPACE is not handled as a keyword, since all space characters are considered as word separators.
 
 

 
example       input       output
EXPW(NNUM)    sep_1999    1999
EXPW(NUM)     sep_1999    sep

IF(value,valueT,valueF) - replace T/F value

default: IF(,,)

Compares the value with the first parameter. In case the result is TRUE, the input value is replaced with the second parameter, otherwise the input value is replaced with the third parameter.
In case the input value has to be kept, whatever it is, the keyword ORIG can be used (usually in the place of the third parameter)
 
 

 
example                  input       output
IF(sep_1999,sep)         sep_1999    sep
IF(oct_1999,oct)         sep_1999
IF(oct_1999,oct,ORIG)    sep_1999    sep_1999
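
Following the prose description of IF and of the ORIG keyword, the logic can be sketched in Python like this. The function name if_ is hypothetical, and the underscore in the examples is read as a space.

```python
def if_(value, test="", value_true="", value_false=""):
    """Return value_true if value equals test, otherwise value_false.

    The keyword ORIG in either position keeps the input value.
    """
    chosen = value_true if value == test else value_false
    return value if chosen == "ORIG" else chosen
```

With ORIG as the third parameter, a value that fails the comparison passes through unchanged instead of being emptied.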

UP    - upper case

Convert all characters to upper case

DOWN   - lower case

Convert all characters to lower case

CAP    - make capitals

Convert the initial character of each word to upper case and the rest of characters to lower case

SHAPE    - format string

Suppresses all invalid spaces

NUM    - number

If it contains at least one digit, convert it into a number by suppressing other characters. Leading zeroes are deleted.
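
CAP and NUM, as described above, could be approximated in Python as follows. The function names are hypothetical, and leaving a value without digits unchanged is an assumption, since the text only covers values that contain a digit.

```python
import re


def cap(value):
    """Upper-case the first character of each word, lower-case the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in value.split(" "))


def num(value):
    """Keep only the digits and drop leading zeroes."""
    digits = re.sub(r"[^0-9]", "", value)
    if not digits:
        return value  # assumption: no digit, value left as it is
    return str(int(digits))
```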

SPLIT(n,h,str,from)

Splits the input value into more lines, where each line contains at most (n+h+length of str) characters, (n) being the number of characters following the number of characters in the header, specified in (h). The header repeats at the beginning of each line. An additional string can be inserted as a separator between the header and the following value. This string is specified by the third parameter (str). It is possible to restrict the application of (str) so it does not appear on the first line by entering "2" for (from)

SPLITW(sep,h,str,from)

Splits the input value into more lines by replacing the line separator stated in (sep) with CR/LFs. Also, as in the case of the SPLIT function, the first (h) characters are taken as a header and repeat at the beginning of each line.  An additional string can be inserted as a separator between the header and the following value. This string is specified by the third parameter (str). It is possible to restrict the application of (str) so it does not appear on the first line by entering "2" for (from)

CONF(field,value,1/0)  - confirm validity of a field

The input value is taken as it is, or refused depending on the value of some other field. In case the other (field) contains  the string specified in (value), then the input value is confirmed (1) or refused (0).

CONFL(str,1|0) - confirm validity of a field

The input value is confirmed if it contains (1) or lacks (0) the specified string (str).

RANGE(from,to) - confirm only entries in the specified range

Left side function of target template configuration section to select the desired entries from the repetitive field.
The range can only be continuous.

The entry is confirmed in case its input falls into the range from-to specified in the parameter, border values included. As an upper limit it is possible to use the keyword MAX.

This is useful in case of AU code, where the first entry has a different definition from other entries:

AU::RANGE(1,1)---CER <:SYSNO:> AU2    L <:AU::SURNAME:>, <:AU::NAME:>    ... takes the first name from the defined AU field
AU::RANGE(2,MAX)---CER <:SYSNO:> AU     L <:AU::SURNAME:> , <:AU::NAME:>    ... takes the rest of the names from the AU field
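
The selection performed by RANGE can be sketched in Python as a filter over the entries of a repetitive field, counted from 1 with both borders included. The function name select_range and the sample author names are hypothetical.

```python
def select_range(entries, start, end):
    """Return the entries whose 1-based position lies in start..end."""
    last = len(entries) if end == "MAX" else int(end)
    return [e for i, e in enumerate(entries, start=1) if start <= i <= last]


authors = ["Ellis, J", "Smith, A", "Brown, B"]  # sample data
first_author = select_range(authors, 1, 1)       # like RANGE(1,1)
other_authors = select_range(authors, 2, "MAX")  # like RANGE(2,MAX)
```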
 

DEFP() - default print

The value is printed by default even if it does not contain any variable input from the source file.

4.4.2 Generated values

In the template configurations, values can be either taken from the source or generated in the process itself. This is mainly useful for evaluating constant values.

Currently, the following date values are generated:
 

DATE(format,n)

default: DATE(,10)

where n is the number of digits required.

Generates the current date in the form given as a parameter. The format has to be given according to the ANSI C notation, i.e. the string is composed out of following components:

  %a    abbreviated weekday name
  %A    full weekday name
  %b    abbreviated month name
  %B    full month name
  %c    date and time representation
  %d    decimal day of month number (01-31)
  %H    hour (00-23)(24 hour format)
  %I    hour (01-12)(12 hour format)
  %j    day of year(001-366)
  %m    month (01-12)
  %M    minute (00-59)
  %p    local equivalent of a.m. or p.m.
  %S    second (00-59)
  %U    week number in year (00-53)(starting with Sunday)
  %V    week number in year
  %w    weekday (0-6)(starting with Sunday)
  %W    week number in year (00-53)(starting with Monday)
  %x    local date representation
  %X    local time representation
  %y    year (no century prefix)
  %Y    year (with century prefix)
  %Z    time zone name
  %%    %
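
Python's strftime uses the same C-style components, so it offers a quick way to try out a format string before putting it into a template. The sketch below is only an aid for experimenting: generate_date is a hypothetical name, the sample moment is arbitrary, and the digit-count parameter n of DATE is not modelled.

```python
import datetime


def generate_date(fmt, moment):
    """Format the given moment according to a C-style format string."""
    return moment.strftime(fmt)


# Fixed sample moment, so that the results are reproducible.
sample = datetime.datetime(2002, 12, 24, 13, 5, 9)
iso_day = generate_date("%Y-%m-%d", sample)
hour_24 = generate_date("%H", sample)  # 24 hour clock
hour_12 = generate_date("%I", sample)  # 12 hour clock
```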
 

WEEK(diff)

Enters the two-digit number of the current week (%V) increased by specified difference.
If the resulting number is negative, the returned value is zero (00).
Values are kept up to 99, three digit values are shortened from the left.

WEEK(-4)    returns 48, if current week is 52
WEEK           current week
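
The arithmetic described for WEEK can be written out in Python as below. The function names are hypothetical, and taking the current week as the ISO week number is an assumption based on the reference to %V.

```python
import datetime


def week(current_week, diff=0):
    """Two-digit week number shifted by diff.

    Negative results become 00; three-digit results are
    shortened from the left.
    """
    n = current_week + diff
    if n < 0:
        n = 0
    return ("%02d" % n)[-2:]


def current_iso_week():
    """ISO week number of today (assumed to correspond to %V)."""
    return datetime.date.today().isocalendar()[1]
```

For example, week(52, -4) gives "48", in line with the WEEK(-4) case above.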
 

SYSNO

 
Works the same as DATE, however the format of the resulting value is fixed so it complies with the requirements of further record handling. The format is 'whhmmss', where:

w     current weekday
hh    current hour
mm    current minute
ss    current second

Generated this way, the system number contains a variable value that changes every second. Because the system number identifies the record, it must stay the same for the entire record being processed. Unlike the DATE function, which simply generates a value in the given format, SYSNO keeps the value persistent throughout the entire record and excludes collisions with other records generated within a period of one week, with one-second granularity.

It is not possible to use the DATE function for generating a system number instead.

The system number is unique in range of one week only, according to the current definition.
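
The 'whhmmss' layout and the need to keep the value fixed within a record can be illustrated in Python. This is a sketch, not the BibConvert implementation: make_sysno is a hypothetical name, and numbering the weekday 0-6 starting with Sunday (%w) is an assumption, since the text does not say how the weekday is encoded.

```python
import datetime


def make_sysno(moment):
    """Build a 'whhmmss' value: weekday, hour, minute, second."""
    return moment.strftime("%w%H%M%S")


# The value is computed once per record and then reused for every
# output line of that record, so that it does not change mid-record.
sysno = make_sysno(datetime.datetime.now())
lines = ["CER %s AU    L %s" % (sysno, name) for name in ("Ellis, J", "Smith, A")]
```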
 
 

OAI

Inserts an OAI identifier incremented by one for each record. The starting value used for the first record in the batch job can be specified on the command line using the -o<starting_value> option.

diff --git a/modules/bibedit/doc/admin/guide.html.wml b/modules/bibedit/doc/admin/guide.html.wml index d70e8a41e..4c2a4f89e 100644 --- a/modules/bibedit/doc/admin/guide.html.wml +++ b/modules/bibedit/doc/admin/guide.html.wml @@ -1,185 +1,187 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibEdit Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibedit/>BibEdit Admin" \ navbar_name="admin" \ navbar_select="bibedit-admin-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Contents

1. Overview
2. Edit records via Web interface
3. Edit records via command line
4. Delete records via command line

1. Overview

BibEdit enables you to directly manipulate bibliographic data, edit a single record, do global replacements, and other cataloguing tasks.

2. Edit records via Web interface

The Bibliographic Metadata Editor on the Web is not implemented yet. Please use the command-line technique described below.

3. Edit records via command line

The idea is to download the record in XML MARC format, edit it with any editor, and upload the changes back. Note that you can edit any number of records at the same time: for example, you can download all records written by Qllis, J, open the file in your favourite text editor, and globally change the author name to the proper form Ellis, J.

Proceed as follows:

  1. Download the record in XML MARC. For example, download record ID 1234:
              $ wget -O z.xml 'http://your.site/search.py?recid=1234&of=xm'
             
    or download latest 5,000 public documents written by Qllis, J:
              $ wget -O z.xml 'http://your.site/search.py?p=Qllis%2C+J&f=author&of=xm&rg=5000'
             
  2. Edit the metadata as necessary:
              $ emacs z.xml
             
  3. Upload changes back:
              $ bibupload -r z.xml
             
  4. See the progress of the treatment of the file via BibSched:
              $ bibsched
             
    If you do not want to wait for the next wake-up time of indexing and formatting daemons, launch them manually now:
              $ bibindex
              $ bibreformat
              $ webcoll
              
    and watch the progress via bibsched.
After this, the record(s) should be fully modified and formatted, and all indexes and collections updated as necessary.
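
For a global change such as the author name correction mentioned above, the edit in step 2 can also be scripted instead of done by hand. A minimal Python sketch follows; replace_author is a hypothetical helper, and it assumes the name fills a whole XML element, as it does in a subfield.

```python
def replace_author(xml_text, wrong, right):
    """Replace a name wherever it is the entire content of an element."""
    return xml_text.replace(">%s<" % wrong, ">%s<" % right)


# Typical use on the downloaded file (left as comments here, since it
# needs the z.xml file from step 1):
#
#   with open("z.xml", encoding="utf-8") as f:
#       text = f.read()
#   with open("z.xml", "w", encoding="utf-8") as f:
#       f.write(replace_author(text, "Qllis, J", "Ellis, J"))
```

Matching on the surrounding angle brackets avoids touching longer names that merely contain the same string.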

4. Delete records via command line

Once a record has been uploaded, we prefer not to *destroy* it fully anymore (i.e. to wipe it out and to reuse its record ID for another record) for a variety of reasons. For example, some users may have put this record already into their baskets in the meantime, or the record might have been already announced by alert emails to the external world, or the OAI harvesters might have harvested it already, etc. We usually prefer only to *mark* records as deleted, so that our record IDs are ensured to stay permanent. That said, the canonical way to delete a record ID 1234 in CDSware v0.1.x development branch is to download its XML MARC:

        $ wget -O z.xml 'http://your.site/search.py?recid=1234&of=xm'
        
and to mark it as deleted by adding the indicator ``DELETED'' into the MARC 980 $$c tag:
        $ vi z.xml
        [...]
         <datafield tag="980" ind1="" ind2="">
           <subfield code="a">PREPRINT</subfield>
           <subfield code="c">DELETED</subfield>
         </datafield>
        [...]
        
and upload the modified record in the `replace' mode:
        $ bibupload -r z.xml
        
and watch the progress via bibsched, as mentioned in the section 3.
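
The marking step can also be done programmatically rather than in an editor. The Python sketch below uses the standard xml.etree.ElementTree module; mark_deleted is a hypothetical helper, and it assumes a single record element as the root and no XML namespaces, which may not hold for every downloaded file.

```python
import xml.etree.ElementTree as ET


def mark_deleted(record_xml):
    """Set subfield c of the 980 datafield to DELETED, adding it if missing."""
    record = ET.fromstring(record_xml)
    field = None
    for datafield in record.findall("datafield"):
        if datafield.get("tag") == "980":
            field = datafield
            break
    if field is None:
        field = ET.SubElement(
            record, "datafield", {"tag": "980", "ind1": "", "ind2": ""})
    for subfield in field.findall("subfield"):
        if subfield.get("code") == "c":
            subfield.text = "DELETED"
            break
    else:
        # No subfield c yet: append one.
        ET.SubElement(field, "subfield", {"code": "c"}).text = "DELETED"
    return ET.tostring(record, encoding="unicode")
```

The returned text would then be written back to the file and uploaded in the replace mode as described.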

This procedure will remove all necessary entries from the words index space, the collection cache space, etc, so that the record will not be findable anymore from the search interface by usual means. However, the record's HTML brief and detailed displays will remain untouched, so the record will still be shown to end users as before when they access their baskets, or when they access it via a direct URL distributed by the alert engine (search.py?recid=1234).

In some cases this may not be what is wanted. For example you may want to warn the users that the record has been deleted and hide its old contents. To do this, just modify the contents of the other MARC tags as appropriate, for example you can remove everything and leave only a title warning:

        $ cat z.xml
        <record>
        <controlfield tag="001">1234</controlfield>
        <datafield tag="245" ind1="" ind2="">
           <subfield code="a">The record has been deleted</subfield>
        </datafield>
        <datafield tag="980" ind1="" ind2="">
           <subfield code="c">DELETED</subfield>
        </datafield>
        </record>
 
so that the end users would see a message ``The record has been deleted'' instead of the usual title, authors, and stuff in their baskets.

P.S. Note that the ``bibXXx'' tables will keep having entries for the deleted records. These entries are to be cleaned from time to time by the BibEdit garbage collector. This GC isn't part of CDSware yet; moreover in the future we plan to abolish all the bibXXx tables, so that this won't be necessary anymore.

P.S. If you want to wipe out all the existing bibliographic content of your site, for example to start uploading the documents from scratch again, you can launch:

        $ /path/to/your/cdsware/bin/dbexec < /path/to/your/cdsware-source/modules/miscutil/sql/tabbibclean.sql
        $ /path/to/your/cdsware/bin/webcoll
      
diff --git a/modules/bibformat/doc/admin/guide.html.wml b/modules/bibformat/doc/admin/guide.html.wml index c119c7598..a4c9808ff 100644 --- a/modules/bibformat/doc/admin/guide.html.wml +++ b/modules/bibformat/doc/admin/guide.html.wml @@ -1,2488 +1,2490 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibFormat Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibformat/>BibFormat Admin" \ navbar_name="admin" \ navbar_select="bibformat-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Contents

1. Overview
2. Configuring BibFormat
3. Running BibFormat
       3.1 From Web interface
       3.2 From the command-line interface
4. Detailed Configuration Manual

1. Overview

The BibFormat admin interface enables you to specify how the bibliographic data is presented to the end user in the search interface and search results pages. For example, you may specify that titles should be printed in bold font, the abstract in small italic, etc. Moreover, the BibFormat is not only a simple bibliographic data output formatter, but also an automated link constructor. For example, from the information on journal name and pages, it may automatically create links to publisher's site based on some configuration rules.

2. Configuring BibFormat

By default, a simple HTML format based on the most common fields (title, author, abstract, keywords, fulltext link, etc) is defined. You will certainly want to define your own output formats if you have a specific metadata structure.

Here is a short guide of what you can configure:

Behaviours
Define one or more output BibFormat behaviours. These are then passed as parameters to the BibFormat modules while executing formatting.
Example: You can tell BibFormat that it has to enrich the incoming metadata file by the created format, or that it only has to print the format out.
Extraction Rules
Define how the metadata tags from input are mapped into internal BibFormat variable names. The variable names can afterwards be used in formatting and linking rules.
Example: You can tell that 100 $a field should be mapped into $100.a internal variable that you could use later.
Link Rules
Define rules for automated creation of URI links from mapped internal variables.
Example: You can define a rule for creating a link to the People database out of the $100.a internal variable representing the author's name. (The $100.a variable was mapped in the previous step, see the Extraction Rules.)
File Formats
Define file format types based on file extensions. This will be used when proposing various fulltext services.
Example: You can tell that *.pdf files will be treated as PDF files.
User Defined Functions (UDFs)
Define your own functions that you can reuse when creating your own output formats. This enables you to do complex formatting without ever touching the BibFormat core code.
Example: You can define a function that matches and extracts email addresses from a text file.
Formats
Define the output formats, i.e. how to create the output out of internal BibFormat variables that were extracted in a previous step. This is the functionality you would want to configure most of the time. It may reuse formats, user defined functions, knowledge bases, etc.
Example: You can tell that authors should be printed in italic, that if there are more than 10 authors only the first three should be printed, etc.
Knowledge Bases (KBs)
Define one or more knowledge bases that enable you to transform various forms of input data values into a single standard form on output.
Example: You can tell that Phys Rev D and Physical Review D are both the same journal and that these names should be standardized to Phys Rev : D.
Execution Test
Enables you to test your formats on your sample data file. Useful when debugging newly created formats.

To learn more on BibFormat configuration, you can consult the BibFormat Admin Guide.

3. Running BibFormat

3.1. From the Web interface

Run the Reformat Records tool. This tool lets you update the stored formats of bibliographic records.
It should normally be used after configuring BibFormat's Behaviours and Formats. When these are ready, you can choose to rebuild formats for selected collections or you can manually enter a search query and the web interface will accomplish all necessary formatting steps.
Example: You can request Photo collections to have their HTML brief formats rebuilt, or you can reformat all the records written by Ellis.

3.2. From the command-line interface

Suppose you have an XML MARC data file that is to be uploaded into CDSware. (For example, it might have been harvested from other sources and processed via BibConvert.) Having configured BibFormat and its default output type behaviour, you would then run this file through BibFormat as follows:

 $ bibformat < /tmp/sample.xml > /tmp/sample_with_fmt.xml
 
 
that would create default HTML formats and would "enrich" the input XML data file by this format. (You would then continue the upload procedure by calling successively BibUpload and BibWords.)

Now consider a different situation. You would like to add new formats, say "HTML portfolio" and "HTML captions", in order to nicely format multiple photographs on one page. Let us suppose that these two formats are called hp and hc and are already loaded in the collection_format table. (TODO: describe how this is done via WebAdmin.) You would then proceed as follows. Firstly, you would prepare the corresponding output behaviours called HP and HC (TODO: note the uppercase!) that would not enrich the input file but would produce an XML file with only 001 and FMT tags. (This is in order to update only the formats, not the bibliographic information.) You would also prepare the corresponding formats at the same time. Secondly, you would launch the formatting as follows:

 $ bibformat otype=HP,HC < /tmp/sample.xml > /tmp/sample_fmts_only.xml
 
 
that should give you an XML file containing only 001 and FMT tags. Finally, you would upload the formats:
 $ bibupload < /tmp/sample_fmts_only.xml
 
 
and that's it. The new formats should now appear in WebSearch.
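
The two command-line steps above can also be driven from a script. What follows is only a minimal Python sketch: it assumes that bibformat and bibupload are on the PATH and read standard input and write standard output exactly as in the commands shown, and the helper names build_steps and run_steps are hypothetical, not part of CDSware.

```python
import subprocess


def build_steps(source_xml, formats_xml, otypes):
    """Return the two steps as (argv, stdin_path, stdout_path) tuples."""
    return [
        # bibformat otype=HP,HC < source_xml > formats_xml
        (["bibformat", "otype=" + ",".join(otypes)], source_xml, formats_xml),
        # bibupload < formats_xml
        (["bibupload"], formats_xml, None),
    ]


def run_steps(steps):
    """Run each step in order; check=True stops at the first failure."""
    for argv, stdin_path, stdout_path in steps:
        with open(stdin_path, "rb") as stdin:
            if stdout_path is None:
                subprocess.run(argv, stdin=stdin, check=True)
            else:
                with open(stdout_path, "wb") as stdout:
                    subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)


steps = build_steps("/tmp/sample.xml", "/tmp/sample_fmts_only.xml", ["HP", "HC"])
# run_steps(steps)  # uncomment on a host where CDSware is installed
```

Keeping the construction of the commands separate from their execution makes it possible to inspect what would be run before touching the database.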

4. Detailed Configuration Manual

What follows is a transcription of an old FlexElink Configuration Manual v0.3 (2002-07-31). The text suffers from low HTML quality and missing screen snapshots. The terminology may not be fully up to date in places.

1. - About BibFormat
2. - How it works?
3. - A first look at the web configuration interface
4. - Mapping the input (OAI Extraction Rules)
5. - Defining output types: Behaviors
6. - Formats
7. - Knowledge bases (KBs)
8. - User Defined Functions (UDFs)
9. - Defining links
       9.1. - EXTERNAL link conditions
       9.2. - INTERNAL link conditions
       9.3. - Example
10. - User management
11. - Evaluation Language Reference

 


 

1. - About BibFormat

 

BibFormat is a piece of software that is part of the CERN Document Server (CDS, http://www.cds.ch) and, more specifically, of the CDS Search module (http://weblib.cern.ch).

Its mission, in a few words, is to provide a flexible mechanism for formatting the bibliographic records shown as results of CDS Search user queries, allowing administrators or users to customize how they are displayed. In addition, it offers a linking system that can automatically generate all the links included in the displayed records (fulltext access, electronic journal references, etc.), which considerably reduces maintenance.

To clarify this rather formal definition, the following figure illustrates the role of BibFormat inside the CDS Search module. Please note that the drawing shows only the main role that BibFormat plays in the CDS structure and is quite simplified; the underlying logic is somewhat more complex.

[Fig. 0]

 

As you can see, when a user query is received, Weblib determines which records from the database match it and then asks BibFormat to format them. BibFormat looks at its rule repository and, for each record, determines which format has to be used, applies the format specification and solves the possible links. It gives the formatted result back to Weblib, which builds an HTML page that includes the formatted results along with other information.

The advantage of all this is that anyone who has access to the BibFormat rule repository can modify the final appearance of a query result in the CDS Search module without altering the logic of the search engine.

To modify the BibFormat rule repository, a web configuration interface is provided. Throughout this document, we'll try to explain (in a friendly way and from the user's point of view) how to access this interface, how it is structured, and how to configure BibFormat through it to achieve the desired results.


 

2. - How it works?

 

We've outlined the role of BibFormat inside the CDS, so it's time now to have an overview of how it works and how it's organized. We'll try not to be very technical; however, some explanation of the BibFormat repository and architecture is needed to understand how it works.

BibFormat basically takes some bibliographic records as input and produces a formatted and linked version of them as output. By "formatted" we mean that BibFormat can produce an output containing a transformed version of the input data (normally an HTML view); the good part is that you can entirely specify the transformation to apply. By "linked" we mean that you can ask BibFormat to include inside this formatted version, where necessary, references to Internet resources related to the data, generated from some pre-configured rules.

As an example, imagine that you want the records resulting from CDS Search queries to show their title in bold followed by their authors separated by commas. To achieve this, you go to the BibFormat configuration interface and define a behavior for BibFormat in which you describe how to format incoming records:

"<b>" $title "</b>"
forall($author){
  $author separator(", ")
}

Figure 1. A first Evaluation Language example

Don't be scared! This is a first look at the way BibFormat allows you to describe formats. As you can see, BibFormat uses a special language that you'll have to learn if you want to specify formats or links. It may seem as difficult as a programming language, but you'll see that it is much easier than it looks at first sight.

The next figure shows how BibFormat works internally. When BibFormat is called, it receives a set of bibliographic records to format. It separates the records and translates each of them into a set of what we call "internal variables". These are simply an internal representation of the bibliographic record; the important thing about them is that they are available when you describe the formats.

Once it has these internal variables, the processor module looks in the behavior repository for the behavior you've asked BibFormat to apply (when BibFormat is called, you can indicate which of the pre-configured behaviors to apply, which is how it can have more than one behavior). Inside this behavior you specify which data you want to appear, how it has to appear and some links if they exist; in other words, the format. (Actually, it is something more than a format: it describes how BibFormat has to behave for a given input, which is why we refer to it as a behavior.) As we've already said, you can include links in a behavior specification. Links are a special BibFormat feature that helps you reduce the maintenance of your formats: you can include the same link in several formats or behaviors.


The picture below illustrates this explanation.

 

 

[Fig. 2]

 

In summary, BibFormat can transform an input made up of bibliographic records into HTML output (or any other text-based output) according to pre-configured specifications (behaviors) that you can define entirely using a dedicated language.

Note that BibFormat currently takes OAI MARC XML as the format for input records, but it can be adapted to other kinds of input (reading a database, a function call, etc.) with a little development.


3. - A first look at the web configuration interface

 

BibFormat can be configured through its configuration interface, which is accessible via the web. It is made up of a set of web pages that present the main configuration aspects of BibFormat and allow you to change them. In this section we take a first look at this web interface, how it is structured and how it corresponds to BibFormat features.

Before entering these web pages you'll be asked for your username and password. Only certain users are allowed to access the BibFormat web interface (WI): first you need a CDS account, which you can easily create using the standard CDS account manager; then you have to ask the BibFormat administrator to give you privileges to access the WI.

Once your password is accepted, you'll enter the configuration interface. You'll see that it is quite simple: it is structured in different sections, each of which corresponds to a BibFormat feature, and you can navigate through them using a navigation bar that is always present on the left.

 

[Fig. 3]

 

Here is a list of the different sections the interface offers and their correspondence with BibFormat features:

·        Behaviors: This is the main section, the one you enter by default when you access the web interface. It contains the definitions of the different pre-configured output types, or behaviors, which let you define how you want BibFormat to behave when each output type is selected. More information is given in the chapter Defining output types: Behaviors of this manual.

·        OAI Extraction Rules: The input types and mapping rules for OAI MARC XML inputs are defined here. You'll find here the information about all the internal variables and their correspondence with the input XML tags. See chapter Mapping the input of this manual for more information.

·        Link Rules: Allows you to access the link rules repository for defining the way links are generated. See chapter Defining Links for a more detailed description about the BibFormat linking system.

·        UDFs: Presents a list of all the User Defined Functions (UDFs) that you can use inside the Evaluation Language (EL) statements used for specifying different configuration aspects. You can also modify or extend this list within this section. Everything about using UDFs and defining new ones is covered in the chapter User Defined Functions (UDFs).

·        Formats: Another EL feature: you can define a piece of EL code under a name and re-use it whenever you want. See the chapter Formats.

·        KBs: A complete management interface for Knowledge Bases (KBs); these KBs are also available inside EL statements. See the chapter Knowledge bases (KBs) for more specific information.

·        Execution Test: You'll be able to execute BibFormat from this section and view the results and some debug info in a web page. You have to specify an input data file (through a URL).

·        User management: Allows you to define which CDS users can access the BibFormat web interface.

 

Each section has its own particularities, but the way of working with them follows a common pattern throughout the interface. Each section, with its common and particular characteristics, is treated in the following chapters of this manual.

 

 

 

 

 


4. - Mapping the input (OAI Extraction Rules)

 

We have already spoken a bit about BibFormat internal variables. They are key to understanding how BibFormat works. As you know, BibFormat takes some bibliographic records as input and, according to some pre-configured behavior, formats them into HTML, for example. The problem is that these input records can come in several formats: different XML conventions, database records, etc. For now, at CDS we only consider input that comes in OAI MARC XML, but in the near future we may have to extend it to accept other input formats.

That's the reason why internal variables exist: they provide a common way to refer to input data without relying on any concrete format. In other words, we define BibFormat links and behaviors in terms of these internal variables, and we have rules that define how to map an input format onto them, so that any behavior defined in BibFormat can be used with any input that can be mapped onto internal variables.

 

[Fig. 4]

 

You shouldn't worry too much about this, because it belongs more to the development/administration side, but it's important to know where internal variables come from and what they refer to. Besides, for CDS we only consider incoming data in OAI MARC XML format, so we'll talk only about this case.

Internal variables are quite a simple concept: a variable is just a label that represents some values from the input. In addition, a variable can have fields, which are also labels that represent values from the input, but values that are related to one another under the variable (e.g. you can have one variable that maps authors and another that maps authors' home institutes independently; but if you want to represent an author together with his or her home institute, you need to relate these two values in some way). Variables and their fields also support multiple values.

Focusing on OAI MARC XML, the concept of variable and field is already present in the input structure:

·        Each occurrence of OAI MARC XML varfield element will correspond to a different variable value.

·        Each occurrence of OAI MARC XML subfield inside a certain varfield element will correspond to a different field value of the variable that maps the varfield.

So what we have in BibFormat is a set of rules that tell which varfield element each variable name corresponds to and which subfield element each variable field name maps. Through the web interface you can add or delete variables or their fields, and you can even modify the mapping tags of variables (this way you can keep your formats independent of changes in the meaning of MARC tags).

In the web interface, all this is located in the OAI Ext. Rules section, as you can see in the following figure:

 

[Fig. 5]

Let's illustrate how BibFormat maps a certain input to variables and fields with an example:

Suppose we have this variable and field definition in BibFormat:

 

Var. label   Mapping tag                          Mult. V.   Fields (field label: mapping tag)
100          <varfield id="100" i1="" i2="">      Yes        a: <subfield label="a">
                                                             e: <subfield label="e">
909C0        <varfield id="909" i1="C" i2="0">    No         b: <subfield label="b">

 

 

 

And then a record like the following arrives as input:

<oai_marc>
   <varfield id="037" i1="" i2="">
      <subfield label="a">SCAN-0009119</subfield>
   </varfield>
   <varfield id="100" i1="" i2="">
      <subfield label="a">Racah, Giulio</subfield>
   </varfield>
   <varfield id="100" i1="" i2="">
      <subfield label="a">Guignard, G</subfield>
      <subfield label="e">editor</subfield>
   </varfield>
   <varfield id="909" i1="C" i2="0">
      <subfield label="b">11</subfield>
   </varfield>
   <varfield id="909" i1="C" i2="0">
      <subfield label="b">12</subfield>
   </varfield>
</oai_marc>


 

The result of the mapping would be like this:

 

 

Variable "100"
   Value# 0
      Field "a" value: Racah, Giulio
   Value# 1
      Field "a" value: Guignard, G
      Field "e" value: editor

Variable "909C0"
   Value# 0
      Field "b" value: 12

 

Notice that varfield 037 is not considered because there is no entry for it in the BibFormat configuration. Also notice how the values are created: if "allow multiple values" is set to "Yes", each occurrence of a varfield element creates a new value (variable "100"); otherwise, the last occurrence is taken as the single value of the variable (variable "909C0").
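
The mapping just described can be sketched in a few lines of Python. This is not BibFormat's own code, only an illustration under stated assumptions: the RULES structure and the map_record function are hypothetical names, and for brevity each field keeps a single value per variable value, although the text says fields may also be multi-valued.

```python
import xml.etree.ElementTree as ET

# Rules from the example table (hypothetical structure):
# variable label -> (varfield id, i1, i2, allow multiple values, field labels)
RULES = {
    "100": ("100", "", "", True, ["a", "e"]),
    "909C0": ("909", "C", "0", False, ["b"]),
}

RECORD = """<oai_marc>
   <varfield id="037" i1="" i2="">
      <subfield label="a">SCAN-0009119</subfield>
   </varfield>
   <varfield id="100" i1="" i2="">
      <subfield label="a">Racah, Giulio</subfield>
   </varfield>
   <varfield id="100" i1="" i2="">
      <subfield label="a">Guignard, G</subfield>
      <subfield label="e">editor</subfield>
   </varfield>
   <varfield id="909" i1="C" i2="0">
      <subfield label="b">11</subfield>
   </varfield>
   <varfield id="909" i1="C" i2="0">
      <subfield label="b">12</subfield>
   </varfield>
</oai_marc>"""


def map_record(xml_text, rules):
    """Map varfield/subfield elements to {variable: [{field: value}, ...]}."""
    variables = {}
    for varfield in ET.fromstring(xml_text).iter("varfield"):
        tag = (varfield.get("id"), varfield.get("i1"), varfield.get("i2"))
        for label, (vid, i1, i2, multiple, fields) in rules.items():
            if tag != (vid, i1, i2):
                continue  # varfields without a rule (e.g. 037) are ignored
            value = {
                sub.get("label"): sub.text
                for sub in varfield.iter("subfield")
                if sub.get("label") in fields
            }
            if multiple:
                variables.setdefault(label, []).append(value)
            else:
                variables[label] = [value]  # the last occurrence wins
    return variables


mapped = map_record(RECORD, RULES)
```

With the record above, mapped holds two values for variable "100" and a single value for "909C0" whose field "b" is "12", matching the result shown earlier.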

 


5. - Defining output types: Behaviors

 

Now that we already know how internal variables are structured and what they represent in the input, it's time to have a look at how to configure BibFormat to transform that input data mapped into variables into HTML results (although any text-based output could be generated).

When BibFormat is asked to format a set of bibliographic records, it is also necessary to specify which output type it has to use. This output type is a string that identifies a pre-configured set of conditions and actions that tell BibFormat how to behave with the given input data (that's why the terms output type and behavior are used interchangeably throughout this document).

BibFormat can have several pre-configured behaviors each one identified by a different label. There are two different types of behaviors (you can choose the behavior type when you define it):

1.     Normal → A behavior that outputs exactly the result of its evaluation.

2.     Input Enrich (only for XML inputs) → Echoes each XML record from the input, inserting the behavior result just before the closing XML element of the record.

Each behavior contains an ordered list of conditions; a condition can have zero or more associated actions (actions are ordered inside a condition). A condition is a behavior item described by an Evaluation Language (EL) expression that evaluates to "TRUE" or "FALSE". An action is an EL statement that produces some output.

When BibFormat is called to format a set of input records with a given behavior label, it looks up the conditions of that behavior. It evaluates their EL in order and, when one of them produces "TRUE", it looks up the associated actions. BibFormat then evaluates the actions in the specified order and concatenates their results.
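
As a rough illustration of this evaluation order, here is a minimal Python sketch. It is not BibFormat code: Python callables stand in for EL conditions and actions, the function name apply_behavior and the variable keys are hypothetical, and it assumes, as the SIMPLE example later in this chapter suggests, that only the first condition evaluating to "TRUE" is applied. The optional arguments mimic an Input Enrich behavior and assume that the closing tag is present in the record.

```python
def apply_behavior(conditions, variables, record_xml=None, closing_tag=None):
    """Evaluate conditions in apply order and run the actions of the first TRUE one.

    conditions: list of (apply_order, condition_fn, [action_fn, ...]).
    """
    result = ""
    for _order, condition, actions in sorted(conditions, key=lambda c: c[0]):
        if condition(variables):
            # Actions run in their listed order; their outputs are concatenated.
            result = "".join(action(variables) for action in actions)
            break  # assumption: only the first TRUE condition is applied
    if record_xml is None:
        return result  # Normal behavior: output the result as is
    # Input Enrich: echo the record with the result before its closing element.
    head, tag, tail = record_xml.rpartition(closing_tag)
    return head + result + tag + tail


simple = [
    (50, lambda v: True, [lambda v: "<b>" + v["245.a"] + "</b>"]),
    (10, lambda v: v["909C0.b"] == "27",
     [lambda v: "<b>" + v["245.a"] + "</b>", lambda v: " (standard)"]),
]

standard = apply_behavior(simple, {"245.a": "T", "909C0.b": "27"})
default = apply_behavior(simple, {"245.a": "T", "909C0.b": "11"})
enriched = apply_behavior(simple, {"245.a": "T", "909C0.b": "11"},
                          "<oai_marc>...</oai_marc>", "</oai_marc>")
```

Note that the conditions are listed out of order on purpose: the apply order number, not the position in the list, decides which one is tried first.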

By using different conditions you can specify alternative formats inside a behavior (imagine that you want to format a record differently depending on its base number). It's true that you could also achieve this with EL IF statements, but conditions are clearer, more efficient and more re-usable: you can change one condition without touching the rest, or you can give it priority over others (that is, the chance to be evaluated before them) by changing its apply order.

Actions are used for specifying the format itself, or whatever you want to be carried out when the condition is met.

Through the web interface you can define new output types or modify existing ones. It is quite easy to use: you just select the link on the desired item for the operation you want to perform on it.

 

[Fig. 6]

 

Let's have a look at a simple example to illustrate how to define a behavior that fits our needs:

Imagine a typical case where you want to format bibliographic records differently depending on their base number. Whenever a record from base 27 (standards) arrives, we want to show only its title and standard numbers; otherwise a default format is applied in which the title and authors are shown. We'll assume CDS variable notation and that the input rules are defined properly.

We are going to define a new NORMAL behavior for this situation; let's call it SIMPLE. It needs two conditions: one for applying the default format and another for the special base-27 format. The base number comes in variable 909C0.b, so the conditions are based on the content of this variable.

The resulting behavior should be defined like this:

 

SIMPLE (NORMAL)

Order   Condition         Actions
10      $909C0.b="27"     "<b>"$245.a"</b>"
                          forall($0248.a){
                            rep_prefix(" - ")  $0248.a separator("; ")
                          }
50      ""=""             "<b>"$245.a"</b>"
                          forall($100.a){
                            rep_prefix(" -  Authors: ")  $100.a separator("; ")
                          }

 

 

Some explanations on this example are needed:

·        As you can see, we have defined two conditions: one for the base-27 format and another for the default format. The important point is the order of the conditions. For each record in the input, the special condition is evaluated first (because it has a lower evaluation number, 10) and, if it is true, its format is applied. If the base is not 27, the default condition is evaluated and, because its EL code is always true, the default format is used for the record.

·        Don't worry too much about the action code; it is quite simple. The only unusual things are the functions rep_prefix and separator. These are special UDFs that behave in a particular way inside a FORALL statement:

o       rep_prefix → Prints its string argument only in the first iteration of a FORALL. In other words, it emits the prefix of the string generated by the FORALL statement.

o       separator → Prints its string argument in every FORALL iteration except the last one.
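
To make the effect of these two functions concrete, the following Python sketch imitates the default action of the SIMPLE behavior. It is an illustration only: the function names are hypothetical, and it assumes that the pieces of an EL statement are concatenated directly, with no whitespace added between them.

```python
def forall(values, prefix, sep):
    """Mimic: forall($v){ rep_prefix(prefix) $v separator(sep) }"""
    out = []
    last = len(values) - 1
    for i, value in enumerate(values):
        if i == 0:
            out.append(prefix)  # rep_prefix: first iteration only
        out.append(value)
        if i != last:
            out.append(sep)     # separator: every iteration except the last
    return "".join(out)


def default_action(title, authors):
    """Title in bold followed by the author list, as in condition 50."""
    return "<b>" + title + "</b>" + forall(authors, " -  Authors: ", "; ")


line = default_action("Some title", ["Racah, Giulio", "Guignard, G"])
```

When the author list is empty the loop body never runs, so neither the prefix nor a separator is printed and only the title remains.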


6. - Formats

 

Formats are a special construct offered by the BibFormat Evaluation Language (EL). A format lets you group some EL code under an identifier, which you can then call from any EL statement.

You can manage these formats using the web interface. It is quite easy to do so: when you access the Formats section, you are presented with a list of all the format identifiers already defined, each with a short description of what the format is for. From there you can see the whole EL code by using the link [Code]. You can add a new format using the set of input boxes at the end of the page. Already defined formats can also be deleted or modified.

 

[Fig. 7]

Note: When defining formats, take care not to use "recursive" format calls (either direct or indirect); these can lead to execution problems. For example, imagine that we have a format called "ex_1" that calls itself:

 

Format "ex_1"

"hello world"

format("ex_1")

 

This is a "direct" recursive call. You should not end up with calls of this kind, because the web interface should warn you when it finds them. However, "indirect" calls are not detected by the web interface, so you have to watch out for them yourself. One example of "indirect" recursion:

Format "ex_1"

"hello world"

format("ex_2")

Here the recursion arises when format "ex_2" in turn contains a call to format("ex_1").
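
Since indirect recursion is not detected by the web interface, it can be useful to check the format definitions yourself. The sketch below shows one way to do that in Python; it is not part of BibFormat. It assumes you have already extracted, by whatever means, which formats each format calls, and the function name find_recursion is hypothetical.

```python
def find_recursion(calls, start):
    """Return a call chain that leads back to an earlier format, or None.

    calls maps a format name to the list of formats it calls via format(...).
    """
    def visit(name, chain):
        if name in chain:
            # The chain has come back to a format that is already being expanded.
            return chain[chain.index(name):] + [name]
        for callee in calls.get(name, []):
            found = visit(callee, chain + [name])
            if found:
                return found
        return None

    return visit(start, [])


# Assumed call graph: "ex_1" calls "ex_2", and "ex_2" calls "ex_1" back.
indirect = find_recursion({"ex_1": ["ex_2"], "ex_2": ["ex_1"]}, "ex_1")
```

The returned chain lists the formats in calling order, which makes it easy to see where the loop has to be broken.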

 


7. - Knowledge bases (KBs)

 

This is yet another special feature provided by the BibFormat Evaluation Language. In a few words, it allows you to map one string value to another according to a pre-stored set of key values and the values they map to (the knowledge base). Every knowledge base is identified by a label that has to be unique among KB identifiers; remember that identifiers are not case-sensitive.

These sets of values normally lived in a file, but this new development called for easy KB management integrated in BibFormat. For this reason, you can manage KBs from the KBs section of the BibFormat configuration interface.

When you access the KBs section, the list of all defined KB identifiers is displayed. Below it you'll find a set of controls for adding new KBs. These controls work as elsewhere in the interface, with one special case: normally, you shouldn't fill in the input box that asks for the knowledge base table name. All knowledge base data is handled by a database in which each KB corresponds to a DB table, and this input box holds the internal table name for that KB; the KB manager will normally generate it for you, so you shouldn't need to use it.

 

[Fig. 8]

 

Each KB has a link for accessing the list of values that it contains. If you click on it, a new window will show you the list of current values (key and mapped ones) and a very easy interface to add new values or to delete existing ones (KB values are case sensitive).
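
The two case rules (KB identifiers are not case-sensitive, KB values are) are easy to mix up, so here is a small Python sketch of them. It is an illustration only and not how BibFormat stores KBs: the class name, the KB identifier JOURNALS and the choice to return unknown keys unchanged are all assumptions.

```python
class KnowledgeBases:
    """Toy in-memory stand-in for the KB tables."""

    def __init__(self):
        self._kbs = {}

    def add(self, kb_id, mappings):
        key = kb_id.lower()  # KB identifiers are not case-sensitive
        if key in self._kbs:
            raise ValueError("KB identifier already in use: " + kb_id)
        self._kbs[key] = dict(mappings)

    def lookup(self, kb_id, value):
        kb = self._kbs[kb_id.lower()]
        # KB values are case sensitive, so the key must match exactly.
        return kb.get(value, value)  # assumption: unknown keys pass through


kbs = KnowledgeBases()
kbs.add("JOURNALS", {
    "Phys Rev D": "Phys Rev : D",
    "Physical Review D": "Phys Rev : D",
})
standardized = kbs.lookup("journals", "Physical Review D")
untouched = kbs.lookup("Journals", "physical review d")
```

Both spellings of the journal name map to the same standard form, while the lower-case variant is not found because it differs in case from the stored key.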

 

[Fig. 9]

 


8. - User Defined Functions (UDFs)

 

User Defined Functions (UDFs) are one of the most powerful features of the BibFormat Evaluation Language (EL). The idea is that inside EL you can use operations or functions over strings. Formatting normally requires a large number of different string transformations, but we cannot hope to implement all these operations inside EL, because the set keeps growing and new needs appear all the time. To deal with this problem, BibFormat provides a mechanism that allows you to define as many functions (UDFs) as you want and use them inside any EL statement.

These functions are identified by a unique name and they receive the data on which they operate as parameters. They are defined in a programming language (PHP), so good knowledge of this language is needed.

BibFormat offers a complete UDF management through the UDFs web interface section. There you'll see a complete list of all defined UDFs with their identifier, parameters and a small documentation about what the UDF does. You can also add, delete or modify UDFs or even have a look at the PHP code of an already defined function (there you'll be able to launch small tests over the defined functions).

 

[Fig. 10]

 

The definition of these functions should be reserved for administrators, and some particularities have to be taken into account when defining UDFs:

·        When you add or modify a UDF you are asked for the parameter list; enter the parameter names separated by commas. Example: you want to define a new function for prefixing a given string with another, so you need two parameters (one for the string to be prefixed, let's name it str, and another for the prefix itself, let's name it prefix); you should enter them in the parameter input box like this: prefix, str

·        The order in which you specify the parameters when defining a function is the order in which they have to be passed to the UDF from an EL statement.

·        When defining the PHP code of a function, there are some important things to consider:

o       The result of a function has to be a string.

o       The parameters are available inside the PHP code as variables with the parameter name.

o       The result of the function has to be defined by a PHP result clause giving the resulting string.

o       Make sure the PHP code is correct (BibFormat has no way of knowing whether the code is correct and won't tell you if it isn't).

o       There are some special variables available inside the PHP definition:

§         $FIRST_ITERATION → Is equal to "1" in the first iteration of an EL FORALL statement, and "0" otherwise. If the call is made outside a FORALL, it is set to "1".

§         $LAST_ITERATION → The counterpart for the last iteration of an EL FORALL statement.

With these two variables you can define special FORALL functions, such as a function that prints a separator.
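
The rules above (parameters bound by position, a string result, iteration flags available to the function body) can be pictured with the following Python sketch. Real UDFs are written in PHP and managed through the web interface; the UDF class, the function name add_prefix and the default of "1" for LAST_ITERATION outside a FORALL are assumptions made for this illustration.

```python
class UDF:
    """Toy model of a user defined function with a named parameter list."""

    def __init__(self, name, parameter_list, body):
        self.name = name
        # "prefix, str" -> ["prefix", "str"]
        self.params = [p.strip() for p in parameter_list.split(",") if p.strip()]
        self.body = body

    def call(self, args, first_iteration="1", last_iteration="1"):
        if len(args) != len(self.params):
            raise TypeError(self.name + " expects " + str(len(self.params)) + " argument(s)")
        # Arguments are bound to parameter names in declaration order.
        env = dict(zip(self.params, args))
        env["FIRST_ITERATION"] = first_iteration
        env["LAST_ITERATION"] = last_iteration  # default is an assumption
        result = self.body(env)
        if not isinstance(result, str):
            raise TypeError("a UDF must return a string")
        return result


add_prefix = UDF("add_prefix", "prefix, str", lambda env: env["prefix"] + env["str"])
prefixed = add_prefix.call(["Dr. ", "Racah"])
```

Swapping the two arguments in the call would swap their roles, which is why the declared parameter order matters.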

 

 

 


9. - Defining links

 

As we've already said, BibFormat is not only a formatter; it also provides a link manager. But what do we mean by "link manager"? The idea is to have a set of rules that describe how to generate a link from certain data. If the link can be generated from those rules, the link manager can then check different things (e.g. whether the link is valid or, if it's a link to a file, whether the file exists and in which formats) and finally return the solved link. In other words, if you have a set of bibliographic records that can contain a certain link, and that link can be encoded in the link manager rules, you don't need to store the link in each bibliographic record; you just use the link manager to generate it dynamically. This way, you only have to maintain a small set of rules and not thousands of static links in records.

BibFormat allows you to configure different link definitions, each identified by a unique name. Each link definition has some associated parameters, which are the information passed to the rules defined for it. When you call the link manager to solve a link (from an EL statement, for example), you have to specify the identifier of the link definition to use and a value for each of its parameters (always string values). The link manager retrieves the rules associated with the specified link definition and interprets them using the given parameter values, telling you whether the link was generated correctly and returning the result (the solved link).

Through the web interface you can access the rule repository to see which link definitions are available, define new link rules or maintain existing ones. When adding or modifying a link definition you have to specify its parameters; please remember to separate them with commas.

 

[Fig. 10]

 

Link definitions are structurally quite similar to behaviors. Although there can be different types of them (as we'll see later), a link definition is made up of one or more conditions, and each condition can have one or more actions that tell how the link has to be built when the condition is met. In general, link rules (conditions and actions) have a particular structure and are described in Evaluation Language (EL), with one restriction: the EL LINK statement cannot be used. Each condition-actions group of a link definition can be of a different solving type. (When you create a new link definition, you are asked for its solving type, but this only sets the default for the conditions created for that link definition; you can change it afterwards and have a "mixed" link definition.) The structure of the rules and the way the link manager interprets them depend on their solving type. Currently, you can define link conditions of two solving types: EXTERNAL or INTERNAL. Each type is explained in more detail later.

As we've said, a link definition is made up of various link conditions. When the link manager is asked to solve a link for a given link definition, it retrieves all the link conditions associated with it. It takes the first of them (following the evaluation order: the lower the evaluation order number, the earlier the condition is considered) and evaluates its EL code with the parameter values passed. If the result is "TRUE", the associated actions are executed, the link is returned and the solving process finishes. If a condition fails, the link manager moves on to the next one. If all the conditions fail, the link manager reports that the link couldn't be solved. This is the general behavior of the link manager, but how it determines whether a link has been solved, and how the link is built, depend on the condition's solving type.

 

 

9.1. - EXTERNAL link conditions

 

This is the simplest way of solving links. It is intended for generating a link that points to an external resource (normally a web page). In this case the link condition has only one action, which is evaluated if the associated condition is "TRUE". When a condition of this type evaluates to "TRUE" and the action is executed, the result of the action is returned as the solved link and the link manager finishes.

 

[Fig. 11]

 

9.2. - INTERNAL link conditions

 

This condition solving type is intended to be used when you want to link to a document which is a file (inside or outside your file system) and that can be in different file formats.

This case is a bit more complex than the previous one, so we'll go step-by-step explaining differences and special features:

·        An INTERNAL condition has an associated base file path and base URL. The base file path is the string used as a prefix when looking for a file generated by the actions associated with that condition. The base URL is the string to which the link string (resulting from the actions) is appended. For example, if the base file path of a condition is /tmp/docs, the base URL is http://doc.cern.ch/, the condition is true and the result of the actions is test.pdf, then the file path the link manager has to check is /tmp/docs/test.pdf and, if the file exists, the generated link is http://doc.cern.ch/test.pdf.

·        Any condition of this type can have several associated file formats. This is a new concept that is only used for INTERNAL condition solving. A file format is simply a set of file extensions grouped under an identifier, and you can associate file format identifiers with a link condition. When the condition is true, the link manager combines each result from the condition actions with the associated file formats to check whether a file exists in any of those formats: when an action is evaluated, the link manager takes the file extensions of each associated file format identifier and checks whether base file path + resulting action string + file extension exists in the file system.

·        A condition of this type can have more than one associated action. Each action describes an alternative way of building the file path. When a condition of this type evaluates to "TRUE", the link manager retrieves its actions (in action apply order) and evaluates the first one. With the action result it builds the file path as base file path + resulting action string, and then combines this string with each of the file extensions. If any of the combinations exists in the file system, the link is generated (if more than one file format combination exists, the link variable has multiple values containing the different links); if not, the same process starts with the next action. If none of the actions leads to an existing file, the link is not generated.

·        When the link manager is called from an EL statement (see chapter Evaluation Language Reference) and the link is solved, a special internal variable holding the resulting link becomes accessible. For INTERNAL conditions, as said above, this variable can contain multiple values when the link manager finds several file formats. In addition, the LINK variable gets some special fields, one set per value, which you can access once the link is solved. The following table lists the LINK variable fields defined when an INTERNAL condition link is solved:

 

Field name     Value it contains

url            The same value as the LINK variable: the solved URL.

file           The full local path to the file the solved URL points to.

format_id      The file format id string.

format_desc    The file format description string (defined for each file format).
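
The INTERNAL solving steps described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the function name solve_internal and the shape of the formats dictionary are invented for the example, action results are passed in as ready-made strings, and file extensions are assumed to include their leading dot.

```python
import os
from typing import Dict, List


def solve_internal(base_path: str, base_url: str, action_results: List[str],
                   formats: Dict[str, dict]) -> List[Dict[str, str]]:
    """Return one entry per existing file for the first action that finds any.

    formats maps a file format id to {"desc": str, "extensions": [str, ...]}.
    Each returned entry carries the four LINK fields: url, file, format_id
    and format_desc.
    """
    for result in action_results:  # actions in apply order
        found = []
        for format_id, fmt in formats.items():
            for ext in fmt["extensions"]:
                # base file path + resulting action string + file extension
                path = os.path.join(base_path, result + ext)
                if os.path.isfile(path):
                    found.append({
                        "url": base_url.rstrip("/") + "/" + result + ext,
                        "file": path,
                        "format_id": format_id,
                        "format_desc": fmt["desc"],
                    })
        if found:
            return found  # the first action that hits an existing file wins
    return []  # no action led to an existing file: no link is generated
```

An empty list plays the role of "link not solved"; a list with several entries corresponds to a LINK variable with multiple values.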

 

 

9.3. - Example

 

As link generation is quite a complex topic (especially when it comes to INTERNAL linking), we'll try to illustrate it with a simple example.

Let's imagine we want to create a new link definition for generating full-text access to the documents archived on a document server (a file system that contains electronic versions of the documents). These documents are organized systematically according to three characteristics included in the bibliographic records: BASE, CATEGORY and ID. When the base is "CERNREP", the files are archived below the directory /pub/www/home/cernrep/ and can be stored following two different criteria that depend on the CATEGORY and ID values; these documents are all HTML. However, if the base is "PREPRINT" and the CATEGORY is either "HEP-TH" or "HEP-PH", they are stored under directory /archive/electronic|/pub/www/home/ following a certain criterion; in this case the documents can be in several file formats: PDF, PostScript, MS Word.

Of course, we want the link to be created only if the files corresponding to the bibliographic records exist.

So we start by creating a new link definition that we'll call FULLTEXT. It receives three parameters, which are the information we need for generating this kind of link: BASE, CATEGORY and ID. We select INTERNAL as the default solving type and then fill in the base file path and URL with some default values (these values are not important; they are copied by default to the conditions we are going to create afterwards).

 

[Fig. 12]

 

Then we create a condition for the first possibility: when BASE is "CERNREP". We select INTERNAL as the condition solving type because we want to link to a file and check its existence, and we fill in the base file path and URL with the corresponding values. Then we assign the file formats and enter the file archiving criteria as different actions.

 

[Fig. 13]

 

For the other possibility we proceed in the same way by adapting the definition to the requirements; we'll have something like this as result:

 

[Fig. 14]

 

Once we have finished the link definition, we can insert links of this type from a BibFormat behavior, for example. Let's imagine we have included a piece of EL code like this in a behavior because we want to insert a link to the full-text documents of any record:

 

link("FULLTEXT", $base, $category, $id)

{

  "Fulltext: "

  forall($link){

    "<a href=\"" $link.url "\">" $link.format_desc "</a>"

     separator " - "

  }

}

 

This EL statement will include the string "Fulltext: " followed by links to all the documents found for the values of the internal variables $base, $category and $id, separated by " - ".


10. - User management

 

The BibFormat web interface (WI) comes with a security mechanism which allows you to define which users can access the WI. BibFormat doesn't have built-in user management; instead it uses the CDS user schema (as it is part of CDS). So if you are not registered as a CDS user and you want access to the BibFormat WI, the first thing to do is to register in CDS through the standard procedure (for example, you can reach the CDS account management system via the CDS Search interface).

The BibFormat WI access policy is rather simple: it keeps a list of CDS users that can access the WI. If someone tries to access any part of the WI, the system asks them to identify themselves as a CDS user. If the CDS login is successful and the user is in BibFormat's access list, the user gains access to the WI.

There's a section in the WI which allows you to define which CDS users have access to the WI. Its use is rather simple: you can add CDS users to the access list by specifying either their CDS user id or their CDS login, and you can delete a CDS user from the access list by simply selecting the "delete" link for the corresponding user.

 

[Fig. 15]

 

When you install BibFormat for the first time and access the WI, you'll see that no login or password is asked for. The security mechanism is not activated until at least one user is added to BibFormat's access list. So if you don't want to limit access to the BibFormat WI, keep the access list empty.
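
The access policy boils down to a short decision rule, sketched here in Python. The function name can_access_wi and the representation of the access list as a set of CDS logins are assumptions made for this illustration only.

```python
from typing import Optional, Set


def can_access_wi(access_list: Set[str], cds_user: Optional[str]) -> bool:
    """Decide whether a visitor may use the BibFormat web interface.

    cds_user is the CDS login of a successfully authenticated user,
    or None if the visitor has not logged in.
    """
    if not access_list:
        # Empty access list: the security mechanism is not activated.
        return True
    return cds_user is not None and cds_user in access_list
```

Note that adding the first user to the list is what switches the protection on, so that first user should be an administrator.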


 

11. - Evaluation Language Reference

 

In this section we'll present a more or less formal definition of the Evaluation Language (EL). Although we use some formal methods to describe it, we'll also give a quick explanation of the elements that make up the language and how to combine them to get the desired results.

Just below you can find the EL definition, expressed in EBNF (Extended Backus-Naur Form) notation. We have used capital letters for non-terminal elements and lowercase/bold characters for terminal ones. One remark: whenever you find the mark [REX] after a definition, it means that the definition just before it is a regular expression describing a set of terminal strings.

 

SENTENCE ::= TERM {&& TERM | || TERM}

TERM ::= FACTOR {= FACTOR | != FACTOR | FACTOR}

FACTOR ::= [!] BASIC

BASIC ::= VARIABLE | LITERAL | FUNCTION | ( SENTENCE ) | FORALL |

IF | FORMAT | LINK | COUNT | KB

VARIABLE ::= $ STRING [. STRING]

LITERAL ::= "([^"] | \")*"  [REX]

FUNCTION ::= STRING ( [ SENTENCE {, SENTENCE} ] )

FORALL ::= forall ( VARIABLE [, LITERAL] ) { SENTENCE }

IF ::= if( SENTENCE ) { SENTENCE } [else { SENTENCE }]

FORMAT ::= format( SENTENCE )

LINK ::= link( SENTENCE , [SENTENCE {, SENTENCE}] ) { SENTENCE }

        [else { SENTENCE }]

 

COUNT ::= count( VARIABLE )

KB ::= kb( SENTENCE , SENTENCE )

STRING ::= [a-zA-Z0-9_]+ [REX]

 

This is just a formal way of describing the language, but don't worry if you don't understand it very well because just below these lines we'll try to describe it in a more informal way.

To begin with, you should know that EL is a language designed to work with strings (a string is a sequence of characters), but it also has some logic and comparison operations. One important thing to be aware of is that in EL blank spaces, tabs and carriage returns have no meaning other than separating elements of the language; that means that between two basic elements you can have as many spaces or carriage returns as you want.

One of the basic elements of the language is what we call LITERALS. They represent constant string values and are delimited by a pair of double quote (") symbols surrounding the string you want to express. Everything you put inside the double quotes is taken as it is, so inside a literal spaces and carriage returns do have meaning (it's the only case). If you want to express a double quote symbol inside a literal you have to escape it using \.

Some examples of literals:

·        If you want to represent the string hello, inside the EL you'll have to use "hello".

·        For the string hello "big"      man, the representation in EL is "hello \"big\"      man" (notice the escape characters and that spaces have meaning).

·        The string Let's see \"" has to be expressed as "Let's see \\\"\"".
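
If you generate EL code from a program, the quoting rule can be wrapped in a small helper. The Python sketch below is hypothetical (to_el_literal is not part of BibFormat), and it assumes that a backslash is itself escaped as \\, which is what the third example above suggests.

```python
def to_el_literal(text: str) -> str:
    """Wrap text in double quotes, escaping it for use as an EL literal."""
    # Assumption: backslashes are doubled first, then double quotes are
    # escaped with a backslash.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(to_el_literal("hello"))
print(to_el_literal('hello "big"      man'))
```

The order of the two replacements matters: escaping the quotes first would make the second step double the backslashes that were just added.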

Another important basic element of the language is VARIABLES. These elements represent string data from the input that you can refer to inside the language (a variable is also treated as a string). Variables are defined in advance by the administrator (or even by users), so you have to know which of them you have access to. Additionally, variables can contain FIELDS, which are simply other input values grouped under a variable because they are related to each other (for example, you could have a variable for the information about the author, with fields like name, birthplace, etc.). If you want to know more about variables and their correspondence with the input, look at the Mapping the Input section. A variable is expressed in EL as a dollar symbol followed by a name made up of letters, digits or underscores; variables are case-insensitive. To refer to a field of a variable, you simply add a dot followed by the field name (which is also made up of letters, digits or underscores).

Some examples about variables and fields:

·        Imagine you have a variable called author which contains the author information; to represent it in EL you would write $author. Wherever $author appears, BibFormat uses the value defined for it in the current input record.

·        Then you know that the field name of variable author contains the author full name and you want to refer to it inside an EL statement, so you'd write $author.name.

·        If we speak about CDS configuration, variable and field names correspond to MARC 21 tag & indicator names; so to refer to the main title of a bibliographic record we should use variable 245 field b, in EL terms: $245.b.

Now that we know the basic elements of the language we can start thinking about how to combine them. The most important (and only) string operation is concatenation: joining strings. This operation is implicit in the language, so we just put language elements one after another, and the result is the values of those elements one after another.

Some samples:

·        To represent the constant string Author: followed by the name of the author of the input record you should write "Author: " $100.a (assuming the CDS configuration, in which MARC 21 notation is used and authors correspond to variable 100 field a).

·        You want to output the title in bold (in HTML terms) followed by the author in normal characters, separated from the title by the character /: "<b>" $245.b "</b>/" $100.a

These two, literals and variables, are the only basic elements of the EL. You can combine them using concatenation to get new strings. But, of course, there are more operations you can apply to strings: UDFs (User Defined Functions). We'll also call these elements functions, because that is what they are: functions or operations applied to strings, where by strings we mean basic elements or the strings resulting from other operations. A UDF has a name that identifies it uniquely and takes some information that we call parameters. A UDF returns another string that depends on the parameter values (always strings). So to represent a function in EL you need its name followed by an opening parenthesis, the parameter values separated by commas and a closing parenthesis. There's a list of UDFs you can look at through the interface, and this list can be extended to fit your needs (look at the UDFs section of this manual).

Some examples:

·        You want to ensure that the title of a bibliographic record is always going to be in capital letters; good, there's a function called upper that takes one parameter and gives as result the parameter transformed in capital letters. You have to write the call like this: upper($245.b).

·        You want only the first 3 characters of an author name to appear in capital letters. We've seen there's a function for uppercasing a string, but there's another one, called copy, that extracts a substring from the string passed as first parameter, starting at the character position given by the 2nd parameter and with the length given by the 3rd one: copy( upper($100.a), "0", "3").

As you can see, these UDFs are very powerful because you can concatenate their result with another element (literal, variable or even function) and the parameters can be basic elements or expressions. More generally, any element or expression of the EL that yields a string value can be combined with other EL expressions or elements.
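
To make the "strings in, string out" contract concrete, here is a Python sketch of what the two UDFs used above might do. It is not the real implementation: it assumes that character positions are zero-based (the example passes "0" for the first character) and that numeric parameters arrive as strings. The author name is sample data.

```python
def upper(value: str) -> str:
    """Return the parameter transformed to capital letters."""
    return value.upper()


def copy(value: str, start: str, length: str) -> str:
    """Return the substring of value beginning at start, length characters long.

    All parameters are strings, as in EL; the numeric ones are converted here.
    Assumption: positions are zero-based.
    """
    begin = int(start)
    return value[begin:begin + int(length)]


# Mirrors the EL call copy( upper($100.a), "0", "3") for a sample author.
print(copy(upper("Ellis, John"), "0", "3"))
```

Because every function returns a string, a call can be used anywhere a literal or a variable can, including as the parameter of another call.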

Another very useful feature of EL is the possibility to use KNOWLEDGE BASES (KBs). A KB is just a set of key values that map (one-to-one) to another set of values; maybe "knowledge base" isn't a very appropriate name, because they are more like translation tables. BibFormat offers tools to create and maintain KBs that can be used in the EL afterwards (see chapter KBs management in this manual). You can see a KB invocation as a special function (the syntax for calling it is the same) with the name kb that takes two parameters: one indicating the KB name (BibFormat can handle several KBs) and another one for the key value to translate. The result is the mapped KB value, or an empty string if the key doesn't exist in the specified KB. A typical example is when you have months as numbers and you want to translate them into month names; you could have a KB that maps all the month numbers to month names and then call it like this: kb("MONTH", $m).
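
In programming terms a KB lookup behaves like a dictionary lookup with an empty-string default. The Python sketch below illustrates that; the KNOWLEDGE_BASES table and its shortened MONTH content are sample data, and returning an empty string for an unknown KB name is an assumption (the text above only specifies the behavior for an unknown key).

```python
from typing import Dict

# Sample data only: a shortened MONTH knowledge base.
KNOWLEDGE_BASES: Dict[str, Dict[str, str]] = {
    "MONTH": {"1": "January", "2": "February", "3": "March"},
}


def kb(kb_name: str, key: str) -> str:
    """Return the value mapped to key in the named KB, or "" if there is none."""
    return KNOWLEDGE_BASES.get(kb_name, {}).get(key, "")


print(kb("MONTH", "2"))
```

Since a missing key yields an empty string rather than an error, the result can be tested with the comparison operators described further below.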

Now let's move to FORMATS. A format is a piece of EL code grouped under a label (a name) that can be used in any other EL statement. BibFormat allows users to define as many formats as they want and to identify each of them with a simple name. In a few words, formats allow you to reuse EL code; within a format you can put any EL code (even other format calls) and all the variable values are fully available. Again, a format call in EL follows the same convention as functions: the word format followed by the format name (a string) between parentheses. Calling a format has the same effect as pasting the EL code defined inside that format, as it is, at the place where you make the call.

Example: Imagine you have to write the title of a bibliographic record with a certain formatting, let's say in bold and red, and you are going to use this formatted title in several places. You can take advantage of EL formats and define a format called TITLE that contains the code "<font color=\"red\"><b>" $245.b "</b></font>". Once this is done, you can use it to format records by printing their title in that way and their author after it: format("TITLE") "/" $100.a. The good thing is that if some day you decide to change the title formatting you only need to modify the TITLE format definition and not all the places where you show the title.

At this point, you have seen the basic elements and operations of EL. You may think that this is powerful enough to express your formatting work, but there are more complex situations that you'll have to face. We have tried to design the EL to be easy to use, but the advanced structures described next can sometimes get a bit complex.

All these basic elements and operations are fine, but there are times when you want to compare expressions and decide what to do depending on the result of the comparison. For this purpose, EL has an IF statement and a few comparison and logic operators built in (don't forget that any functionality needed can be achieved by defining new UDFs; EL only gives the basic operations that make this possible). Let's go step by step. First let's talk about the set of operators that can be used in a comparison:

1.     Comparison operators: Equal and non-equal (=, !=). They take two operands that have to be strings and produce a logic (true or false) value.

2.     Logical operators: AND, OR and NOT (&&, ||, !). All of them are applied to logical values; AND and OR take two operands, and NOT takes one.

All of them are right-associative (except NOT, which is unary and left-associative) and their precedence goes like this (from higher to lower): NOT, (EQUAL, NON-EQUAL), (AND, OR). These operators cannot be used just anywhere, only inside statements that expect a logic value as result, in other words, inside condition statements.

The IF structure is quite easy to learn. First comes the word IF followed by a condition statement surrounded by parentheses; then an EL statement in braces, which is executed only if the condition is true; optionally, an ELSE word followed by another EL statement in braces, which is executed only if the IF condition is not true.

Let's have a look at some examples:

·        I want the title of a record to appear followed by the constant Author: and then its author. But it would be nicer if the constant string appeared only when the record has an author:

 

format("TITLE") if($100.a!="") { "Author: " $100.a }

 

BibFormat is not only an EL processor. Among other things, it contains a link solver with its own rule repository, which lets it solve links automatically (see chapter Link solver of this manual). EL has one special structure for asking the link solver for links and including them in the formatted version of the bibliographic record. This way links are easy to maintain (you modify the rules independently of where the link is used) and as reusable as formats or UDFs. Links are identified by a label and need some information to be passed as parameters. They are followed by an EL statement that takes effect only if the link is solved; inside it you have access to a special variable, named LINK, which contains the solved link among other information (see chapter Link solver for more information about which values are accessible). Additionally, an else statement can be added (following the same syntax as in the IF construction) that takes effect only if the link can't be solved by the Link solver.

Example:

·        Let's return to our typical example of the simple format that contains the title and the author, but now we want the author to be linked to the search. Supposing that this kind of link is already defined under the label "AUTHOR_SEARCH", we should proceed like this:

 

format("TITLE") "/"

link("AUTHOR_SEARCH", $100.a)

     { "<a href=\""$link "\">"$100.a"</a>"}

 

The next step when talking about EL components is to deal with multiple values. Life is not so easy and, of course, a bibliographic record can have more than one author, or can have a related document which exists in more than one format and has to be linked. In other words, BibFormat supports variables and fields with multiple values (see chapter Mapping input), so a way of applying an EL statement to all the values of a variable or a field is quite useful. FORALL is our construction! It allows you to specify a variable or a field followed by an EL statement (between braces) that is applied to every value of the variable or the field; any reference to the iteration variable inside the FORALL EL statement refers to the current iteration value (if you refer to a variable that has multiple values outside a FORALL, the first value is used). One limitation is that you shouldn't nest FORALL statements; in other words, never put a FORALL inside another one. This construction also lets you limit the number of times you iterate over a variable or field by adding a literal with the number of iterations.

Some examples:

·        Let's continue refining our simple format; now we have to consider that there can be more than one author for one bibliographic record, so we want to show all of them with the link included, of course.

 

format("TITLE") "/"

forall($100.a)

{

  link("AUTHOR_SEARCH", $100.a)

     { "<a href=\""$link "\">"$100.a"</a>"}

}

·        Although this FORALL construction may not seem very useful, it is used a lot when defining formats or behaviors. Quite often you want some piece of EL code to take effect only if a certain variable or field exists; FORALL can also be used in that situation, and it is the most comfortable way of doing it. Imagine you want the title, then the constant string "Author: " followed by the authors of a bibliographic record, but you don't want the constant "Author: " to appear if there's no author at all. You could use something like this:

 

format("TITLE") " - "

forall($100.a)

{

  rep_prefix("Author: ") $100.a " "

}

As you can see we are using a new function: rep_prefix. This is a UDF which prints the string passed as parameter only once, at the beginning, inside a FORALL statement. But the interesting thing here is the use of FORALL.
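
The behavior of FORALL combined with rep_prefix can be mimicked in Python as a rough mental model. The helper names forall and author_line are invented for this sketch, and the "first iteration" flag is just one possible way of modelling what rep_prefix does.

```python
from typing import Callable, List, Optional


def forall(values: List[str], body: Callable[[str, bool], str],
           limit: Optional[int] = None) -> str:
    """Apply body to each value (at most limit of them) and join the results.

    body receives the current value and a flag telling whether this is the
    first iteration, which is what a rep_prefix-like function needs to know.
    """
    selected = values if limit is None else values[:limit]
    return "".join(body(value, index == 0) for index, value in enumerate(selected))


def author_line(authors: List[str]) -> str:
    # Mirrors: forall($100.a) { rep_prefix("Author: ") $100.a " " }
    return forall(authors,
                  lambda author, first: ("Author: " if first else "") + author + " ")


print(repr(author_line(["Ellis, J", "Smith, A"])))
print(repr(author_line([])))
```

With no authors the body never runs, so the "Author: " prefix does not appear at all, which is exactly the effect wanted in the example above.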

 

Finally, there's still one special EL function: COUNT. Because of certain special situations or strange input data in the variables, it is sometimes useful to know how many values a variable or a field contains. This function simply takes a variable or field as argument and returns a string with the number of values it contains; if the value returned is 0, the variable holds no value, which means that the variable doesn't exist or that no values were mapped to it from the input.

Examples:

·        As this is the last example, let's make it a bit more complicated. Continuing with our well-known simple format, we want all the authors of the record to appear if there are no more than 10; otherwise we want only the first one to appear, followed by the string "et al.". We'll also use a function called GT, which returns a non-empty string if the first parameter is greater than the second one.

 

format("TITLE") "/"

if(gt(count($100.a), "10")!="")

{ $100.a " et al." }

else

{

  forall($100.a)

  {

    link("AUTHOR_SEARCH", $100.a)

     { "<a href=\""$link "\">"$100.a"</a>"}

  }

}

 


diff --git a/modules/bibharvest/doc/admin/guide.html.wml b/modules/bibharvest/doc/admin/guide.html.wml index 2ee12d859..3fcbb17f5 100644 --- a/modules/bibharvest/doc/admin/guide.html.wml +++ b/modules/bibharvest/doc/admin/guide.html.wml @@ -1,84 +1,86 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibHarvest Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibharvest/>BibHarvest Admin" \ navbar_name="admin" \ navbar_select="bibharvest-admin-guide"

WARNING: THIS ADMIN GUIDE IS NOT FULLY COMPLETED
This Admin Guide is not yet completed. Moreover, some admin-level functionality for this module exists only in the form of manual recipes. We are in the process of developing both the guide as well as the web admin interface. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!
+

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Contents

1. Overview
2. OAI Data Harvesting
       2.1 BibHarvest command-line tool
       2.2 Periodical harvesting
3. OAI Data Providing

1. Overview

FIXME.

2. OAI Data Harvesting

2.1. BibHarvest command-line tool

To harvest records from an OAI compliant repository, run the bibharvest command-line tool. For example:

 $ bibharvest -vListRecords -f2004-04-01 -u2004-04-02 -pmarcxml -o/tmp/z.xml \\
              http://cdsweb.cern.ch/oai2d.py
 

For further help with the command-line harvesting tool, run bibharvest --help.

2.2. Periodical harvesting

It is not currently possible to set up periodical execution of bibharvest. You would have to set up an external cron job script to do that.
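
As an illustration of what such an external script could look like, here is a minimal Python sketch that builds the bibharvest command line for a one-day window, reusing only the options from the example in section 2.1. The helper name build_harvest_command is made up, and reading -f and -u as the start and end dates of the harvesting window is an assumption based on that example; check bibharvest --help before relying on it.

```python
import datetime
from typing import List


def build_harvest_command(day: datetime.date, repository_url: str,
                          output_path: str) -> List[str]:
    """Build a bibharvest call covering day up to the following day.

    Assumption: -f and -u take the start and end dates of the window,
    as the example in section 2.1 suggests.
    """
    next_day = day + datetime.timedelta(days=1)
    return [
        "bibharvest",
        "-vListRecords",
        "-f" + day.isoformat(),
        "-u" + next_day.isoformat(),
        "-pmarcxml",
        "-o" + output_path,
        repository_url,
    ]


yesterday = datetime.date.today() - datetime.timedelta(days=1)
command = build_harvest_command(yesterday, "http://cdsweb.cern.ch/oai2d.py", "/tmp/z.xml")
# Only printed here; a cron-driven script would pass the list to subprocess.run().
print(" ".join(command))
```

A daily cron entry calling a script like this would then approximate periodical harvesting until the module supports it natively.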

3. OAI Data Providing

FIXME. (See config.wml for OAI tags.)

diff --git a/modules/bibindex/doc/admin/guide.html.wml b/modules/bibindex/doc/admin/guide.html.wml index 372a07a6d..7ca5bd249 100644 --- a/modules/bibindex/doc/admin/guide.html.wml +++ b/modules/bibindex/doc/admin/guide.html.wml @@ -1,71 +1,73 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibIndex Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibindex/>BibIndex Admin" \ navbar_name="admin" \ navbar_select="bibindex-admin-guide"

WARNING: BIBINDEX ADMIN GUIDE IS UNDER DEVELOPMENT
BibIndex Admin Guide is not yet completed. Most of the admin-level functionality for BibIndex exists only in command-line mode. We are in the process of developing both the guide as well as the web admin interface. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!
+

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Contents

1. Overview
2. Configure Metadata Tags and Fields
       2.1 Configure Physical MARC Tags
       2.2 Configure Logical Fields
3. Configure Word/Phrase Indexes
       3.1 Define New Index
       3.2 Configure Word-Breaking Procedure
       3.3 Configure Stopwords List
       3.4 Configure Accent Stripping
4. Run BibIndex Daemon

1. Overview

2. Configure Metadata Tags and Fields

2.1 Configure Physical MARC Tags

2.2 Configure Logical Fields

3. Configure Word/Phrase Indexes

3.1 Define New Index

3.2 Configure Word-Breaking Procedure

3.3 Configure Stopwords List

3.4 Configure Accent Stripping

4. Run BibIndex Daemon

diff --git a/modules/bibrank/doc/admin/guide.html.wml b/modules/bibrank/doc/admin/guide.html.wml index e5930a588..e7ff277aa 100644 --- a/modules/bibrank/doc/admin/guide.html.wml +++ b/modules/bibrank/doc/admin/guide.html.wml @@ -1,377 +1,379 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibRank Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibrank/>BibRank Admin" \ navbar_name="admin" \ navbar_select="bibrank-admin-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Contents

1.Overview
2.Configuration Conventions
3.BibRank Admin Interface
       3.1.Main interface
       3.2.Add rank method
       3.3.Show details of rank method
       3.4.Modify rank method
       3.5.Delete rank method
       3.6.Modify translations
       3.7.Modify visibility toward collections
4.BibRank Daemon
       4.1.Command Line Interface
       4.2.Using BibRank
5.bibrankgkb Tool
      5.1.Command Line Interface
      5.2.Using bibrankgkb
6.Additional Information

1. Overview

The bibrank module currently consists of two tools:

bibrank - Generates star categories for ranking search results based on methods like:

 Journal Impact Factor
 ##Number of downloads
 ##Author Impact
 ##Citation Impact
 
bibrankgkb - Generates knowledge base files for use with bibrank

You may not need bibrankgkb; this depends on which ranking methods you plan to use and what data you already have. This guide takes you through the steps needed to create different kinds of ranking methods for the search engine to use.

2. Configuration Conventions

 - a comment line starts with a '#' sign in the first column
 - each section in a configuration file is declared inside '[' ']' signs
 - values in knowledge base files are separated by '---'
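The conventions above can be sketched in a few lines of Python. This is only an illustration, not the parser bibrank itself uses: the function name `parse_kb` is hypothetical, and it assumes that the '#' comment convention applies to knowledge base files as well as to configuration files.

```python
def parse_kb(text, separator="---"):
    """Parse knowledge base text into a dict of left side -> right side."""
    entries = {}
    for line in text.splitlines():
        # Skip empty lines and comment lines ('#' in the first column).
        if not line.strip() or line.startswith("#"):
            continue
        # Split on the first separator only; the right side is kept as text.
        left, found, right = line.partition(separator)
        if not found:
            raise ValueError("missing separator in line: %r" % line)
        entries[left.strip()] = right.strip()
    return entries


sample = "# journal name---jif value\nColloids Surf., A---1.98\nPhys. Lett., B---4.213\n"
print(parse_kb(sample))
```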
 

3. BibRank Admin Interface

The bibrank web interface lets you modify the configuration of most aspects of BibRank. For full functionality, it is advised to give the http daemon read/write access to your cdsware/etc/bibrank directory. Otherwise, you have to edit the configuration files from the console using your favourite text editor.

3.1 Main interface

The main interface screen lists all rank methods currently added. If you have added a 'long name' translation in the currently chosen language for a rank method, this name is shown. If not, and a translation in the default cdsware language exists, that one is used instead. If no translation exists at all, the bibrank code is used. To find out about the functionality available, see the topics below.
Explanation of concepts
 Rank method:
 A method responsible for creating the necessary data to rank a result.
 Translations:
 Each rank method may have many names in many languages.
 Collections:
 Which collections the rank method should be visible in.
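The name fallback described for the main interface (current language, then default language, then the bibrank code) can be sketched as follows. The function name `display_name`, the dictionary layout and the default language value "en" are assumptions made for illustration only.

```python
def display_name(code, long_names, current_lang, default_lang="en"):
    """Pick the name to show for a rank method.

    long_names maps a language code to the 'long name' translation.
    """
    # Try the currently chosen language first, then the default language.
    for lang in (current_lang, default_lang):
        if lang in long_names:
            return long_names[lang]
    # No translation at all: fall back to the bibrank code.
    return code


names = {"en": "Journal Impact Factor"}
print(display_name("jif", names, "fr"))
```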
 

3.2 Add rank method

Following the link in the upper right corner of the main interface takes you to the interface for adding a new rank method. The two options to decide upon are the bibrank code and the template to use; both values can be changed later. The bibrank code is used by the bibrank daemon to run the method, and should be fairly short and contain no spaces. The template you choose decides how the ranking will be done, and must be changed to suit your cdsware configuration before it is used. When you confirm the addition, the new rank method is added to the list of possible rank methods, and a configuration file is created if the httpd user has proper rights to the 'cdsware/etc/bibrank' directory. If not, the file has to be created manually with the name 'bibrankcode.cfg', where bibrankcode is the code given in the interface.

3.3 Show details of rank method

This interface gives you an overview of the current status of the rank method, and gives direct access to the various interfaces for changing the configuration. In the overview section, you see the bibrank code, for use with the bibrank daemon, and the date of the last run of the rank method. In the rank set section, you see how many records there are in each star category, and the threshold value deciding the range of each category. The collection part shows the collections in which the rank method is visible. The translations part shows the various translations in the languages available in cdsware. At the bottom, the configuration file is shown, if accessible.

3.4 Modify rank method

This interface lets you modify the bibrank code given when creating the rank method, and the configuration file of the rank method if the file can be accessed. If it cannot, the file may not exist, or the httpd user may not have enough rights to read it. At the bottom of the interface, you can choose a template, view it, and copy it over the old rank method configuration if wanted. Remember that the values present in the template are examples, and must be changed where necessary. See this documentation for information about this, and the 'BibRank Internals' link below for additional information.

3.5 Delete rank method

If it is necessary to delete a rank method, some precautions must be taken, since the configuration of the method will be lost. When a rank method is deleted, its configuration file ('cdsware/etc/bibrank/bibrankcode.cfg', where bibrankcode is the code of the rank method) is also deleted if it is accessible to the httpd user. If not, the file can be deleted manually from the console. Any bibrank tasks scheduled to run the deleted rank method must be modified or deleted manually.

3.6 Modify translations

If you want to use internationalization of the rank method names, you have to add the translations using the 'Modify translations' interface. The interface shows a list of the various name types, like 'long name' and 'short name', with 'long name' initially selected. Below it, a list of all the languages used in the cdsware installation is shown, with the possibility to add a translation for each language.

3.7 Modify visibility toward collections

For a rank method to be visible to the users of the cdsware search interface, it must be enabled for one or several collections. A rank method can be visible in the search interface of the whole site, or of just one collection. The upper listbox shows the collections for which the rank method is not shown in the search interface. To change this, select the wanted collection and press 'Enable' to enable the rank method for this collection. The collections for which the method has been enabled are shown in the lower listbox. To remove a collection from that list, select it and press the 'Disable' button.

4. BibRank Daemon

The bibrank daemon reads the necessary metadata from the cdsware database and combines it in different ways to output the records ranked into the given number of categories (stars).

4.1 Command Line Interface

 Usage: %s [options]
      Examples:
        %s --id=0-30000,30001-860000 --run=jif --verbose=9
        %s --modified='2002-10-27 13:57:26' --run=jif
        %s --rebalance --collection=Articles --run=jif
 
  Ranking options:
  -c, --collection=c1,c2    Collections to include in this rank method
                             if not given, the collections the method is
                             enabled for will be used.
  -i, --id=idr1,idr2        Record ranges to include in this rank method
  -m, --modified=[from]     Update records modified after date
  -k, --check=value         Check if the rank method  needs rebalancing, (if the top
                            star is higher than given percentage 0-1.0)
  -S, --stat                Show statistics
  -w, --run=rm1,rm2         Runs each rank method in the order given
  -r, --rebalance           Rebalance, do full update
 
  Scheduling options:
  -u, --user=USER           user name to store task, password needed
  -s, --sleeptime=SLEEP     time after which to repeat tasks (no)
                             e.g.: 1s, 30m, 24h, 7d
  -t, --time=TIME           moment for the task to be active (now)
                             e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26
 
  General options:
  -h, --help                print this help and exit
  -V, --version             print version and exit
  -v, --verbose=LEVEL       verbose level (from 0 to 9, default 1)
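To make the value formats in the help text concrete, here is a minimal Python sketch that parses an `--id` range list such as `0-30000,30001-860000` and a `--sleeptime` value such as `30m`. It is not taken from the bibrank source; the function names are hypothetical, and the meaning of the unit letters (seconds, minutes, hours, days) is inferred from the examples in the help text.

```python
import re

# Assumed meaning of the unit letters used in the --sleeptime examples.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_id_ranges(spec):
    """Turn '0-30000,30001-860000' into [(0, 30000), (30001, 860000)]."""
    ranges = []
    for part in spec.split(","):
        low, dash, high = part.strip().partition("-")
        # A single id without a dash is treated as a one-record range.
        ranges.append((int(low), int(high) if dash else int(low)))
    return ranges


def parse_sleeptime(spec):
    """Turn '1s', '30m', '24h' or '7d' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", spec.strip())
    if match is None:
        raise ValueError("unrecognised sleep time: %r" % spec)
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


print(parse_id_ranges("0-30000,30001-860000"))
print(parse_sleeptime("24h"))
```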
 

4.2 Using BibRank

Step 1 - Adding the rank option to the search interface

To be able to add the needed ranking data to the database, you first have to add the rank method to the database, together with the abbreviation you want to use for it. The configuration file (see Step 3) must have the same name as the abbreviation stored in the database.

Step 2 - Get necessary external data (e.g. jif values)

Check out bibrankgkb documentation below.

Example
jif.kb -- sample data with the name of the journals and jif values.

Step 3 - Create the configuration file

The configuration files for the different rank methods have different options, so verify that you are using the correct configuration file for your rank method.

Example
jif.cfg -- sample configuration file, for creating the ranking stars based on journal impact factor
Single_tag_rank_method:
 
[rank_method]
##The function which is responsible for doing the work, must be one of the listed ones above.
function = single_tag_rank_method
##How big the top star category should be of all available records. Remember that if a lot of records
##have the same rank value, the size may go above this limit
top_star_percentage = 0.10
##The importance of this rank method if several methods are merged into one rank method.
overall_importance = 1.0

##This section must be available if the single_tag_rank_method is going to be used
[single_tag_kb]
##The tag which got the value to be searched for on the left side in the kb file (like the journal name)
tag = 909C4p
##The path to the kb file which got the content of the tag above on left side, and value on the right side
kb_src = /log/cdsware-DEMODEV/etc/bibrank/jif.kb
##Tags that must be included for a record to be added to a star category, to disable remove tags
check_mandatory_tags = 909C4c,909C4v,909C4y
##For single_tag_rank_method, this needs to be 'yes', depends on the rank method, what it needs of data
enable_modified = yes
For other functions than the single_tag_rank_method, you may need different configuration files, which will be added here when supported by CDSware.
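As a rough illustration of how a file like the jif.cfg sample can be read, the sketch below loads a trimmed copy of it with Python's standard `configparser` module. That bibrank reads the file exactly this way is an assumption, and the function name `load_rank_config` is hypothetical.

```python
import configparser

# Trimmed copy of the jif.cfg sample; lines starting with '#' are comments.
SAMPLE_CFG = """\
[rank_method]
##The function which is responsible for doing the work
function = single_tag_rank_method
top_star_percentage = 0.10
overall_importance = 1.0

[single_tag_kb]
tag = 909C4p
kb_src = /log/cdsware-DEMODEV/etc/bibrank/jif.kb
check_mandatory_tags = 909C4c,909C4v,909C4y
enable_modified = yes
"""


def load_rank_config(text):
    """Return the options of a single-tag rank method as typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "function": parser.get("rank_method", "function"),
        "top_star_percentage": parser.getfloat("rank_method", "top_star_percentage"),
        "overall_importance": parser.getfloat("rank_method", "overall_importance"),
        "tag": parser.get("single_tag_kb", "tag"),
        "kb_src": parser.get("single_tag_kb", "kb_src"),
        # Comma-separated list of tags a record must contain.
        "check_mandatory_tags": parser.get("single_tag_kb", "check_mandatory_tags").split(","),
        "enable_modified": parser.getboolean("single_tag_kb", "enable_modified"),
    }


print(load_rank_config(SAMPLE_CFG))
```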

Step 4 - Add the ranking method as a scheduled task

When the configuration is okay, you can add the bibrank daemon to the task scheduler using the scheduling options. The daemon can then update the rank method automatically, for example once a day.

Example
 $ bibrank -wjif -r
 Task #53 was successfully scheduled for execution.
 

Step 5 - Full update, rebalancing

The first run of a new ranking method requires a full update (not the default) to establish the ranges used for the categories. A full update/rebalance is run with the --rebalance/-r option. Later on, you may occasionally want to run the program with the rebalance option again to rebalance the categories. To check whether this is necessary, run the bibrank daemon with the --check/-k option together with the maximum size allowed for the top star; a message is then printed on screen if a rebalance is needed.

Example
 $ bibrank 53      
 2004-03-09 14:28:47 --> Task #53 started.
 2004-03-09 14:28:47 --> Running: Journal Impact Factor.
 2004-03-09 14:28:47 --> Statistics: Journal Impact Factor , Top Star size: 10.0% , Overall Importance: 100.0%,
 2004-03-09 14:28:47 --> 0 star(s): Range>=      -9.9    7990
 2004-03-09 14:28:47 --> 1 star(s): Range>=      -1.0    1
 2004-03-09 14:28:47 --> 2 star(s): Range>=      0.964   2
 2004-03-09 14:28:47 --> 3 star(s): Range>=      2.047   0
 2004-03-09 14:28:47 --> 4 star(s): Range>=      3.13    2
 2004-03-09 14:28:47 --> 5 star(s): Range>=      4.213   6
 2004-03-09 14:28:47 --> Total: 8001
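The statistics in this example can be read as a list of lower bounds, one per star category. The Python sketch below shows one way to map a rank value to a star category using those bounds, and how a rebalance check on the top star could work. It is an interpretation of the output, not the daemon's actual code; the function names are hypothetical, and the thresholds and counts are copied from the example run.

```python
from bisect import bisect_right

# Lower bounds ("Range>=") and record counts from the example run, 0 to 5 stars.
THRESHOLDS = [-9.9, -1.0, 0.964, 2.047, 3.13, 4.213]
COUNTS = [7990, 1, 2, 0, 2, 6]


def star_for(value, thresholds=THRESHOLDS):
    """Return the highest star category whose lower bound is <= value."""
    # bisect_right counts how many lower bounds the value reaches.
    return max(bisect_right(thresholds, value) - 1, 0)


def top_star_share(counts=COUNTS):
    """Fraction of all records that sit in the top star category."""
    return counts[-1] / sum(counts)


def needs_rebalance(max_share, counts=COUNTS):
    """Assumed meaning of --check: top star larger than the given share (0-1.0)."""
    return top_star_share(counts) > max_share


print(star_for(1.98), sum(COUNTS), needs_rebalance(0.10))
```

With the numbers above, 6 of 8001 records are in the top star, far below the configured 10%, so under this reading no rebalance would be reported.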
 

Step 6 - Fast update of modified records

If you just want to update recently added or modified records, you can do a faster update by running the daemon without the rebalance option. If you give no further options, the daemon will try to update the records modified since the last run. To update records modified after a certain time, use the '--modified=date' option.
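The idea of a fast update, selecting only the records modified after a given moment, can be sketched as below. The record list and the function name `records_modified_after` are hypothetical; only the date format is taken from the `--modified` example in the command line help.

```python
from datetime import datetime

# Date format used in the --modified example: '2002-10-27 13:57:26'
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def records_modified_after(records, since):
    """Return ids of records whose modification date is later than `since`.

    records is a list of (record_id, modification_date_string) pairs.
    """
    cutoff = datetime.strptime(since, DATE_FORMAT)
    return [
        record_id
        for record_id, modified in records
        if datetime.strptime(modified, DATE_FORMAT) > cutoff
    ]


records = [
    (1, "2002-10-26 09:00:00"),
    (2, "2002-10-27 13:57:26"),
    (3, "2002-10-28 08:15:00"),
]
print(records_modified_after(records, "2002-10-27 13:57:26"))
```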

5. bibrankgkb Tool

Before the bibrank daemon can be used, a knowledge base (kb) file with the needed data in the correct format has to be created. This file can be created using the bibrankgkb tool, which can read the data from the cdsware database, from several web pages using regular expressions, or from another file. If a source uses a different naming convention, bibrankgkb can convert between the conventions using a conversion file.

5.1 Command Line Interface

 Usage: bibrankgkb %s [options]
      Examples:
        bibrankgkb --input=bibrankgkb.cfg --output=test.kb
        bibrankgkb -otest.cfg -v9
        bibrankgkb 
 
  Generate options:
  -i,  --input=file          input file, default from /etc/bibrank/bibrankgkb.cfg
  -o,  --output=file         output file, will be placed in current folder
  General options:
  -h,  --help                print this help and exit
  -V,  --version             print version and exit
  -v,  --verbose=LEVEL       verbose level (from 0 to 9, default 1)
 

5.2 Using bibrankgkb

Step 1 - Find sources

Since some of the data used for ranking purposes is not freely available, it cannot be bundled with CDSware. To get hold of the necessary data, you may find it useful to ask your library whether they have a copy of the data that can be used (like the Journal Impact Factors from the Science Citation Index), or to use Google to search the web for a public source.

Step 2 - Create configuration file

The default configuration file is shown below.
 
##The main section
[bibrankgkb]
##The url to a webpage with the data to be read, does not need to have the same name as this one, but if there are several links, the url should end with _0->
url_0 = http://www.taelinke.land.ru/impact_A.html
url_1 = http://www.taelinke.land.ru/impact_B.html
url_2 = http://www.taelinke.land.ru/impact_C.html
url_3 = http://www.taelinke.land.ru/impact_DE.html
url_4 = http://www.taelinke.land.ru/impact_FH.html
url_5 = http://www.taelinke.land.ru/impact_I.html
url_6 = http://www.taelinke.land.ru/impact_J.html
url_7 = http://www.taelinke.land.ru/impact_KN.html
url_8 = http://www.taelinke.land.ru/impact_QQ.html
url_9 = http://www.taelinke.land.ru/impact_RZ.html
##The regular expression for the url mentioned should be given here
url_regexp =

##The various sources that can be read in, can either be a file, webpage or from the database
kb_1 = /home/trondaks/w/cdsware/modules/bibrank/etc/cern_jif.kb
kb_2 = /home/trondaks/w/cdsware/modules/bibrank/etc/cdsware_jif.kb
kb_2_filter = /home/trondaks/w/cdsware/modules/bibrank/etc/convert.kb
kb_3 = SELECT id_bibrec,value FROM bib93x,bibrec_bib93x WHERE tag='938__f' AND id_bibxxx=id
kb_4 = SELECT id_bibrec,value FROM bib21x,bibrec_bib21x WHERE tag='210__a' AND id_bibxxx=id
##This points to the url above (the common part of the url is 'url_' followed by a number
kb_5 = url_%s

##This is the part that will be read by the bibrankgkb tool to determine what to read.
##The first two part (separated by ,,) gives where to look for the convertion file (which convert
##the names between to formats), and the second part is the datasource. A convertion file is not
##needed, as shown in create_0. If the source is from a file, url or the database, it must be
##given with file,www or db. If several create lines exists, each will be read in turn, and added
##to a common kb file.
##So this means that:
##create_0: Load from file in variable kb_1 without convertion
##create_1: Load from file in variable kb_2 using convertion from file kb_2_filter
##create_3: Load from www using url in variable kb_5 and regular expression in url_regexp
##create_4: Load from database using sql statements in kb_4 and kb_5
create_0 = ,, ,,file,,%(kb_1)s
create_1 = file,,%(kb_2_filter)s,,file,,%(kb_2)s
#create_2 = ,, ,,www,,%(kb_5)s,,%(url_regexp)s
#create_3 = ,, ,,db,,%(kb_4)s,,%(kb_4)s
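The layout of a `create_N` value, as described in the comments of the configuration file, can be illustrated with a small Python sketch. This is one reading of the format (fields separated by `,,`, the first two naming the conversion file, the rest naming the data source), not the tool's actual parser; the function name `parse_create` and the paths in the usage lines are hypothetical.

```python
def parse_create(value):
    """Split a create_N value into its conversion part and its data source part."""
    parts = [part.strip() for part in value.split(",,")]
    if len(parts) < 4:
        raise ValueError("expected at least four ',,'-separated fields: %r" % value)
    conv_type, conv_source, source_type = parts[0], parts[1], parts[2]
    return {
        # No conversion file is used when the first field is empty.
        "convert": (conv_type, conv_source) if conv_type else None,
        # One of 'file', 'www' or 'db' according to the comments.
        "source": source_type,
        # Remaining fields: file path, url plus regexp, or sql statements.
        "args": parts[3:],
    }


print(parse_create(",, ,,file,,/tmp/cern_jif.kb"))
print(parse_create("file,,/tmp/convert.kb,,file,,/tmp/cdsware_jif.kb"))
```

Note that the `%(kb_1)s` references in the real file would already have been substituted by the configuration reader before a value reaches a function like this one.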
Once you have found a source for the data and created the configuration file, you may also need to create a conversion file; this depends on the naming conventions used in the available data versus those used in your cdsware installation.
The available data may look like this:
 
COLLOID SURFACE A---1.98
But in cdsware you are using:
 
Colloids Surf., A---1.98
By using a conversion file like:
 
COLLOID SURFACE A---Colloids Surf., A
You can convert the source to the correct naming convention.
 
Colloids Surf., A---1.98
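The conversion step shown above amounts to a lookup on the left-hand side of each line. A minimal Python sketch, assuming '---' as separator and leaving names without a conversion entry unchanged (the function name `convert_names` is hypothetical):

```python
def convert_names(source_lines, conversion_lines, separator="---"):
    """Rename the left side of each source line using the conversion lines."""
    # Each conversion line maps a source name to the local naming convention.
    mapping = dict(line.split(separator, 1) for line in conversion_lines)
    converted = []
    for line in source_lines:
        name, value = line.split(separator, 1)
        # Names without a conversion entry are kept as they are.
        converted.append(mapping.get(name, name) + separator + value)
    return converted


source = ["COLLOID SURFACE A---1.98"]
conversion = ["COLLOID SURFACE A---Colloids Surf., A"]
print(convert_names(source, conversion))
```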

Step 3 - Run tool

When ready to run the tool, you may either use the default file (/etc/bibrank/bibrankgkb.cfg) or specify another one with the '--input' option. To test the configuration, use '--verbose=9' to print the output on screen; to save it to a file, use '--output=filename', but remember that the file will be saved in the program directory. The output may look like this:
 
 $ ./bibrankgkb -v9
 2004-03-11 17:30:17 --> Running: Generate Knowledgebase.
 2004-03-11 17:30:17 --> Reading data from file: /log/cdsware-DEMODEV/etc/bibrank/jif.kb
 2004-03-11 17:30:17 --> Reading data from file: /log/cdsware-DEMODEV/etc/bibrank/conv.kb
 2004-03-11 17:30:17 --> Using last resource for converting values.
 2004-03-11 17:30:17 --> Reading data from file: /log/cdsware-DEMODEV/etc/bibrank/jif2.kb
 2004-03-11 17:30:17 --> Converting between naming conventions given.
 2004-03-11 17:30:17 --> Colloids Surf., A---1.98
 2004-03-11 17:30:17 --> Phys. Rev. Lett.---6.462
 2004-03-11 17:30:17 --> J. High Energy Phys.---8.664
 2004-03-11 17:30:17 --> Nucl. Instrum. Methods Phys. Res., A---0.964
 2004-03-11 17:30:17 --> Phys. Lett., B---4.213
 2004-03-11 17:30:17 --> Phys. Rev., D---3.838
 2004-03-11 17:30:17 --> Total nr of lines: 6
 2004-03-11 17:30:17 --> Time used: 0 second(s).

6. Additional Information

BibRank Internals

diff --git a/modules/bibsched/doc/admin/guide.html.wml b/modules/bibsched/doc/admin/guide.html.wml index c8d1fd319..df810b2a5 100644 --- a/modules/bibsched/doc/admin/guide.html.wml +++ b/modules/bibsched/doc/admin/guide.html.wml @@ -1,65 +1,69 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibSched Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibsched/>BibSched Admin" \ navbar_name="admin" \ navbar_select="bibsched-admin-guide"

WARNING: THIS ADMIN GUIDE IS NOT FULLY COMPLETED
This Admin Guide is not yet completed. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!
-BibSched -- the bibliographic task scheduler -- is central unit of the +

Version <: print generate_pretty_revision_date_string('$Id$'); :> + +

Overview

+ +

BibSched -- the bibliographic task scheduler -- is the central unit of the system that allows all other modules to access the bibliographic database in a controlled manner, preventing sharing violation threats and assuring the coherent execution of the database update tasks. The module comes with an administrative interface that lets you monitor the task queue and intervene manually, for example to re-schedule queued tasks, change the task order, etc. You can run the administrative interface by doing:

 $ bibsched
 
BibSched can run in two modes: auto and manual. In the auto mode, it will execute tasks automatically as they arrive in the waiting queue. In the manual mode, the administrator has to launch the tasks manually.

diff --git a/modules/bibupload/doc/admin/guide.html.wml b/modules/bibupload/doc/admin/guide.html.wml index 78abe2a59..7df6798da 100644 --- a/modules/bibupload/doc/admin/guide.html.wml +++ b/modules/bibupload/doc/admin/guide.html.wml @@ -1,116 +1,120 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="BibUpload Admin Guide" \ navtrail_previous_links="/admin/> > /admin/bibupload/>BibUpload Admin" \ navbar_name="admin" \ navbar_select="bibupload-admin-guide"

WARNING: THIS ADMIN GUIDE IS NOT FULLY COMPLETED
This Admin Guide is not yet completed. Moreover, some admin-level functionality for this module exists only in the form of manual recipes. We are in the process of developing both the guide as well as the web admin interface. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!
-BibUpload enables you to upload bibliographic data in XML MARC +

Version <: print generate_pretty_revision_date_string('$Id$'); :> + +

Overview

+ +

BibUpload enables you to upload bibliographic data in XML MARC format into the CDSware bibliographic database.

Configuring BibUpload

There is nothing to be configured at the moment. All the data upload configuration is usually done when transforming the data via BibConvert.

NOTE: BibUpload currently assumes the 037 $a tag to be a "primary report number" that is unique throughout the system. Therefore, if you upload two records with the same 037 $a tag value, the new record will overwrite the existing one. See the beginning of the BibUpload file to know more.

More advanced BibUpload configuration functionality will be included later.

Running BibUpload

Suppose that you have an XML MARC file that is to be uploaded into CDSware. (For example, it might have been produced by BibConvert.) To finish the upload, you would call the BibUpload script as follows:

 $ bibupload -i file.xml
 
 

For available command-line options, see bibupload --help.

BibUpload Modes

FIXME
 
     -i, --insertrecord  Insert records from XML MARC file as new into the system.
                         Signals error if record already exists (see the -m matching 
                         option below on how this is decided).
 
     -r, --replacerecord Replace existing records by those from the XML MARC file.
                         The original content is wiped out and fully replaced.
                         Signals error if record is not found via -m matching criteria.
 
                         Note also that `-r' can be combined with `-i' into an `-ir' option
                         that would automatically either insert records as new if they are 
                         not found in the system, or correct existing records if they 
                         are found to exist.
 
     -a, --appendfield   Append fields from XML MARC file at the end of existing records.
                         The original content is enriched only.
                         Signals error if record is not found via -m matching criteria.
 
     -c, --correctfield  Correct fields of existing records by those from XML MARC file.  
                         The original record content is modified only in the fields 
                         from the XML MARC file: the original fields are removed and replaced 
                         by those from the XML MARC file.  Fields not present in XML MARC file 
                         are not changed  (unlike the -r option).
                         Signals error if record is not found via -m matching criteria.
 
     -f, --format        Upload only the format (FMT) fields. 
                         The original content is not changed, and neither is its modification date.
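The difference between the replace, append and correct modes can be illustrated on a toy record model. The Python sketch below is not BibUpload code: it assumes a record is simply a dictionary mapping a tag to a list of field values, and the function names are hypothetical.

```python
def replace_record(existing, incoming):
    """-r: the original content is wiped out and fully replaced."""
    return {tag: list(values) for tag, values in incoming.items()}


def append_fields(existing, incoming):
    """-a: incoming fields are added at the end; nothing is removed."""
    merged = {tag: list(values) for tag, values in existing.items()}
    for tag, values in incoming.items():
        merged.setdefault(tag, []).extend(values)
    return merged


def correct_fields(existing, incoming):
    """-c: fields present in the incoming record replace the original ones;
    fields absent from the incoming record are left unchanged."""
    merged = {tag: list(values) for tag, values in existing.items()}
    for tag, values in incoming.items():
        merged[tag] = list(values)
    return merged


existing = {"245": ["Old title"], "700": ["Smith, J"]}
incoming = {"245": ["New title"]}
print(replace_record(existing, incoming))
print(append_fields(existing, incoming))
print(correct_fields(existing, incoming))
```

In this model the 700 field survives a correct or append, but is lost on a replace, which is the distinction the help text draws between -c and -r.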
 
 
diff --git a/modules/webaccess/doc/admin/guide.html.wml b/modules/webaccess/doc/admin/guide.html.wml index e4a9faa70..a3ef5ef81 100644 --- a/modules/webaccess/doc/admin/guide.html.wml +++ b/modules/webaccess/doc/admin/guide.html.wml @@ -1,804 +1,803 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="WebAccess Admin Guide" \ navtrail_previous_links="/admin/>Admin Area > /admin/webaccess/>WebAccess Admin " \ navbar_name="admin" \ navbar_select="webaccess-admin-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :>

-WEBACCESS ADMIN GUIDE / $Date$
-
 1. Introduction, using roles
 2. WebAccess admin interface
 3. Example pages, illustrating snapshots
 
 
 1. INTRODUCTION, USING ROLES
 
   WebAccess is a common RBAC (role-based access control) system for all of
   CDSware. This means that users are connected to roles that cover
   different areas of access, e.g. administrator of the photo
   collection or system librarian. Users can be active in
   different areas and can of course be connected to as many roles as needed.
   
   The roles are connected to actions. An action identifies a task you
   can perform in CDSware. It can be defined to take any number of
   arguments in order to more clearly describe what you are allowing
   connected users to do.
   
   For example the system librarian can be allowed to run bibwords on
   the different indexes. To allow system librarians to run the
   bibwords indexing on the field author we connect role system
   librarian with action runbibwords using the argument
   index='author'.
   
   WebAccess is based on allowing users to perform actions. This means
   that only allowed actions are stored in the access control engine's
   database.
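The model described above (users connected to roles, roles connected to actions with arguments, and only allowed actions stored) can be sketched in Python as follows. The data structures, the e-mail address and the function name `is_allowed` are hypothetical stand-ins, not the WebAccess engine's API; the role, action and argument come from the runbibwords example.

```python
# Hypothetical in-memory stand-ins for the access control tables.
USER_ROLES = {"librarian@example.org": {"systemlibrarian"}}

# Each authorization connects a role to an action with fixed arguments.
# Only allowed combinations are stored; anything absent is denied.
AUTHORIZATIONS = {
    ("systemlibrarian", "runbibwords", (("index", "author"),)),
}


def is_allowed(user, action, **arguments):
    """Return True if one of the user's roles is connected to the action
    with exactly the given arguments."""
    wanted = tuple(sorted(arguments.items()))
    roles = USER_ROLES.get(user, set())
    return any((role, action, wanted) in AUTHORIZATIONS for role in roles)


print(is_allowed("librarian@example.org", "runbibwords", index="author"))
print(is_allowed("librarian@example.org", "runbibwords", index="title"))
```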
 
 
 2. WEBACCESS ADMIN INTERFACE
 
 All the WebAccess Administration web pages have certain
 features/design choices in common
 
 - Divided into steps
 
   The process of adding new authorizations/information is
   stepwise. The subtitle contains information about which step you are
   on and what you are supposed to do.
 
 - Restart from any wanted step 
 
   You can always start from an earlier step by simply clicking the
   wanted button. This is not a way to undo changes! No information
   about the previous database state is kept, so all changes are final.
 
 - Change or new entry must be confirmed
 
   On all the pages you will be asked to confirm the change, with
   information about what kind of change you are about to perform.
 
 - Links to other relevant admin areas on the right side
 
   To make it easier to perform your administration tasks, we have
   added a menu area on the right hand side of these pages. The menu
   contains links to other relevant admin pages and changes according to
   the page you are on and the information you have selected.
 
 
 3. EXAMPLE PAGES
 
 I. Role area
 II. Example - connecting role and user
 
 
 I. Role area
 
   Administration tasks start in one of the administration areas. The
   role area is the main area from where you can perform all your
   managing tasks. The other admin areas are just other ways of
   entering.
 
 

Role Administration

administration with roles as access point
Users:
add or remove users from the access to a role and its privileges.
Authorizations/Actions:
these terms mean almost the same, but an authorization is a
connection between a role and an action, possibly containing arguments.
Roles:
see all the information attached to a role and decide if you want to
delete it.
id  name             description                        users         authorizations / actions  role
2   photoadmin       administrator of the photo col...  add / remove  add / modify / remove     delete show details
9   submitter                                           add / remove  add / modify / remove     delete show details
1   superadmin       all rights                         add / remove  add / modify / remove     delete show details
4   systemlibrarian  system librarian                   add / remove  add / modify / remove     delete show details
3   webaccessadmin   access to web administrator in...  add / remove  add / modify / remove     delete show details
Create new role
go here to add a new role.
Create new action
go here to add a new action.
 
 II. Example - connecting role and user
   
   One of the important tasks that can be handled via the WebAccess Admin Web Interface
   is the delegation of access rights to users. This is done by connecting them to the 
   different roles offered.
 
   The task is divided into 5 simple and comprehensible steps. Below follow the pages from 
   the different steps, with comments on the ongoing procedure.
 
 - step 1 - select a role
 
   You must first select the role you want to connect users to. All the available roles are
   listed alphabetically in a select box. Just find the wanted role and select it. Then click on
   the button saying "select role". 
 
   If you start from the Role Area, this step is already done, and you start directly on step 2.
 
 

Connect user to role

step 1 - select a role
1. select role
Create new role
go here to add a new role.
 
 - step 2 - search for users
 
   As you can see, the subtitle of the page has now changed. The subtitle always tells you
   which step you are on and what your current task is.
 
   There may be thousands of users of your online library, so it is important
   to make it easy to identify the user you are looking for. Enter part of, or the entire, search 
   string, and all users with partly matching e-mails will be listed in the next step.
 
   You can also see that the right hand menu has changed. This area is always updated with links
   to related admin areas.
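   The partial matching of e-mails in this step can be pictured as a simple substring search.
   The sketch below is an illustration only: the function name find_users and the addresses are
   hypothetical, and case-insensitive matching is an assumption, not documented behaviour.

```python
def find_users(emails, pattern):
    """Return all e-mail addresses that contain the search pattern."""
    needle = pattern.strip().lower()
    # Assumed behaviour: case-insensitive substring match, sorted for display.
    return sorted(email for email in emails if needle in email.lower())


users = ["anna@example.org", "bob@example.com", "joanna@example.net"]
print(find_users(users, "anna"))
```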
 
 

Connect user to role

step 2 - search for users
1. select role
2. search pattern
Create new role
go here to add a new role.
Remove users
remove users from role superadmin.
Connected users
show all users connected to role superadmin.
Add authorization
start adding new authorizations to role superadmin.
 
 - step 3 - select a user.
 
   The select box contains all users with partly matching e-mail addresses. Select the one
   you want to connect to the role and continue.
 
   Notice the navigation trail that tells you where on the Administrator pages you are currently
   working. 
 
 

Connect user to role

step 3 - select a user
1. select role
2. search pattern
3. select user
Create new role
go here to add a new role.
Remove users
remove users from role superadmin.
Connected users
show all users connected to role superadmin.
Add authorization
start adding new authorizations to role superadmin.
 
 - step 4 - confirm to add user
 
   All WebAccess Administrator web pages display the action you are about to perform,
   explaining what kind of addition, change or update will be made to your access control
   data.
 
   If you are happy with your decision, simply confirm it.
 
 

Connect user to role

step 4 - confirm to add user
1. select role
2. search pattern
3. select user
add user mikael.vik@cern.ch to role superadmin?
Create new role
go here to add a new role.
Remove users
remove users from role superadmin.
Connected users
show all users connected to role superadmin.
Add authorization
start adding new authorizations to role superadmin.
 
 - step 5 - confirm user added.
 
   The user has now been added to this role. You can easily continue adding more users to this 
   role by restarting from step 2 or 3. You can also go directly to another area and keep working
   on the same role.
 
 

Connect user to role

step 5 - confirm user added
1. select role
2. search pattern
3. select user
add user mikael.vik@cern.ch to role superadmin?

confirm: user mikael.vik@cern.ch added to role superadmin.

Create new role
go here to add a new role.
Remove users
remove users from role superadmin.
Connected users
show all users connected to role superadmin.
Add authorization
start adding new authorizations to role superadmin.
 
 - we are done
 
   This example is very similar to all the other pages where you administer WebAccess. The pages
   are an easy gateway to maintaining access control rights and share a lot of features:
   - divided into steps
   - restart from any wanted step (not undo)
   - changes must be confirmed
   - link to other relevant areas
   - prevent unwanted input
 
   As an administrator with access to these pages you are free to manage the rights any way you want.
 
 - end of file -
 
diff --git a/modules/webalert/doc/admin/guide.html.wml b/modules/webalert/doc/admin/guide.html.wml index f7d7ef41b..4a520a8a8 100644 --- a/modules/webalert/doc/admin/guide.html.wml +++ b/modules/webalert/doc/admin/guide.html.wml @@ -1,77 +1,79 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="WebAlert Admin Guide" \ navtrail_previous_links="/admin/> > /admin/webalert/>WebAlert Admin" \ navbar_name="admin" \ navbar_select="webalert-admin-guide"

WARNING: THIS ADMIN GUIDE IS NOT FULLY COMPLETED
This Admin Guide is not yet completed. Moreover, some admin-level functionality for this module exists only in the form of manual recipes. We are in the process of developing both the guide and the web admin interface. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!
+

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Overview

Users may set up automatic e-mail notification alerts that send them documents matching their user profile, either daily, weekly, or monthly. The WebAlert module provides this functionality.

Configuring Alert Queries

Users may set up alert queries, for example, from their search history pages.

Administrators may edit existing users' alerts by modifying the user_query_basket table. (There is no web interface yet for this task.)

Running Alert Engine

The alert engine has to be run each day in order to send users email notifications for the alerts they have set up:

    $ alertengine
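
One way to run the engine daily is a crontab entry. The sketch below is an assumption rather than part of the CDSware distribution: the install path /opt/cdsware/bin, the log file location and the 02:30 schedule are all hypothetical and should be adapted to your installation.

```
# m  h  dom mon dow  command
# Hypothetical path and log file; adjust to where alertengine is installed.
30   2  *   *   *    /opt/cdsware/bin/alertengine >> /var/log/cdsware-alertengine.log 2>&1
```

Redirecting both standard output and standard error to a log file keeps a record of each run, which may help when users report missing alerts.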
    
HINT: You may want to set up an external cron job to call alertengine each day. diff --git a/modules/webbasket/doc/admin/guide.html.wml b/modules/webbasket/doc/admin/guide.html.wml index 141c8806f..8cc5d7226 100644 --- a/modules/webbasket/doc/admin/guide.html.wml +++ b/modules/webbasket/doc/admin/guide.html.wml @@ -1,27 +1,29 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="WebBasket Admin Guide" \ navtrail_previous_links="/admin/> > /admin/webbasket/>WebBasket Admin" \ navbar_name="admin" \ navbar_select="webbasket-admin-guide" -Not implemented yet. If you want to manipulate user baskets, see +

Version <: print generate_pretty_revision_date_string('$Id$'); :> + +

Not implemented yet. If you want to manipulate user baskets, see tables user_basket, basket, basket_record. diff --git a/modules/websession/doc/admin/guide.html.wml b/modules/websession/doc/admin/guide.html.wml index 5f5642cc3..37cf7bfaa 100644 --- a/modules/websession/doc/admin/guide.html.wml +++ b/modules/websession/doc/admin/guide.html.wml @@ -1,63 +1,65 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="WebSession Admin Guide" \ navtrail_previous_links="/admin/> > /admin/websession/>WebSession Admin" \ navbar_name="admin" \ navbar_select="websession-admin-guide"

WARNING: THIS ADMIN GUIDE IS NOT FULLY COMPLETED
This Admin Guide is not yet completed. Moreover, some admin-level functionality for this module exists only in the form of manual recipes. We are in the process of developing both the guide and the web admin interface. If you are interested in seeing some specific things implemented with high priority, please contact us at . Thanks for your interest!
+

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Guest User Sessions

Guest users create a lot of entries in tables that are related to their web sessions, their search history, personal baskets, etc. This data has to be garbage-collected periodically. At the moment this is done via a command line program:

    $ sessiongc
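
Until the garbage collection is handled elsewhere, a daily crontab entry is one possible way to schedule it. This is only a sketch: the install path /opt/cdsware/bin, the log file location and the 03:15 schedule are hypothetical and not taken from the CDSware distribution.

```
# m  h  dom mon dow  command
# Hypothetical path and log file; adjust to where sessiongc is installed.
15   3  *   *   *    /opt/cdsware/bin/sessiongc >> /var/log/cdsware-sessiongc.log 2>&1
```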
    
HINT: You may want to launch this command every day. In the future the garbage collection task may be done via BibSched task queue. diff --git a/modules/webstyle/doc/admin/guide.html.wml b/modules/webstyle/doc/admin/guide.html.wml index a11a75bee..a9abb1c82 100644 --- a/modules/webstyle/doc/admin/guide.html.wml +++ b/modules/webstyle/doc/admin/guide.html.wml @@ -1,54 +1,56 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "cdspage.wml" \ title="WebStyle Admin Guide" \ navtrail_previous_links="/admin/> > /admin/webstyle/>WebStyle Admin" \ navbar_name="admin" \ navbar_select="webstyle-admin-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Compile-time Configuration of Page Layout

The style of a CDSware installation is largely defined at configuration time, as explained in the INSTALL guide. You can modify the page header, footer, general portalboxes, etc. See the installation guide for more details.

Messages emitted by the web interface are edited at configuration time in the messages.wml file.

Run-time Configuration of Page Layout

At runtime, you will most probably want to modify the CDS style sheet and images.

The look of the search interface pages may be modified to a very large extent in the WebSearch Admin Interface by adding portalboxes at various places on the page.

Advanced Page Layout Changes

More advanced changes to the web page layout have to be carried out on the programming level. For example, most mod_python dynamic pages are using the page() function defined in the webpage.py file. diff --git a/modules/websubmit/doc/admin/index.html.wml b/modules/websubmit/doc/admin/index.html.wml index 4f6a84372..82e76fc6f 100644 --- a/modules/websubmit/doc/admin/index.html.wml +++ b/modules/websubmit/doc/admin/index.html.wml @@ -1,82 +1,84 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "configbis.wml" #include "cdspage.wml" \ title="" \ navtrail_previous_links="/admin/> > /admin/websubmit/>" \ navbar_name="admin" \ navbar_select="websubmit-admin-guide" +

Version <: print generate_pretty_revision_date_string('$Id$'); :> +

Table of Contents