diff --git a/ABOUT-NLS b/ABOUT-NLS
index cfc796ed6..60f81badf 100644
--- a/ABOUT-NLS
+++ b/ABOUT-NLS
@@ -1,310 +1,330 @@
Invenio NATIVE LANGUAGE SUPPORT
===============================
About
=====
This document describes the Native Language Support (NLS) in Invenio.
Contents
========
1. Native Language Support information for administrators
2. Native Language Support information for translators
3. Native Language Support information for programmers
A. Introducing a new language
B. Integrating translation contributions
1. Native Language Support information for administrators
=========================================================
Invenio is currently available in the following languages:
af = Afrikaans
ar = Arabic
bg = Bulgarian
ca = Catalan
cs = Czech
de = German
el = Greek
en = English
es = Spanish
fa = Persian (Farsi)
fr = French
gl = Galician
hr = Croatian
hu = Hungarian
it = Italian
ja = Japanese
ka = Georgian
lt = Lithuanian
no = Norwegian (Bokmål)
pl = Polish
pt = Portuguese
ro = Romanian
ru = Russian
rw = Kinyarwanda
sk = Slovak
sv = Swedish
uk = Ukrainian
zh_CN = Chinese (China)
zh_TW = Chinese (Taiwan)
If you are installing Invenio and you want to enable/disable some
languages, please just follow the standard installation procedure as
described in the INSTALL file. The default language of the
installation as well as the list of all user-seen languages can be
selected in the general invenio.conf file, see variables CFG_SITE_LANG
and CFG_SITE_LANGS.
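For example, a site offering English, French and German with English
as the default might set the following in invenio.conf (a sketch; use
the language codes appropriate for your installation):

CFG_SITE_LANG = en
CFG_SITE_LANGS = en,fr,de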
(Please note that some runtime Invenio daemons -- such as webcoll,
responsible for updating the collection cache, running every hour or
so -- may take twice as long when twice as many user-seen languages
are selected, because webcoll creates collection cache page elements
for every user-seen language.  Therefore, if you have defined
thousands of collections and you find webcoll to be slow in your
setup, you may want to limit the list of selected languages.)
2. Native Language Support information for translators
======================================================
If you want to contribute a translation to Invenio, then please follow
the procedure below:
- Please check whether a po/LL.po file exists for your language,
where LL stands for the ISO 639 language code (e.g. `el' for
Greek). If such a file exists, then this language is already
supported, in which case you may want to review the existing
translation (see below). If the file does not exist yet, then you
can create an empty one by copying the invenio.pot template file
into LL.po that you can review as described in the next item.
(Please note that you would have to translate some dynamic
elements that are currently not located in the PO file, see the
appendix A below.)
- Please edit LL.po to review the existing translation.  The PO file
format is a standard GNU gettext one and so you can take advantage
of dedicated editing modes of programs such as GNU Emacs, KBabel,
or poEdit to edit it. Pay special attention to strings marked as
fuzzy and untranslated. (E.g. in the Emacs PO mode, press `f' and
`u' to find them.) Do not forget to remove fuzzy marks for
reviewed translations. (E.g. in the Emacs PO mode, press `TAB' to
remove fuzzy status of a string.)
- After you are done with translations, please validate your file to
make sure it does not contain formatting errors. (E.g. in the
Emacs PO mode, press `V' to validate the file.)
- If you have access to a test installation of Invenio, you may want
to see your modified PO file in action:
$ cd po
$ emacs ja.po # edit Japanese translation
$ make update-gmo
$ make install
$ sudo apachectl restart
$ firefox http://your.site/?ln=ja # check it out in context
If you do not have access to a test installation, please
contribute your PO file to the developers team (see the next step)
and we shall install it on a test site and contact you so that you
will be able to check your translation in the global context of
the application.
(Note to developers: the ``make update-gmo'' command may need to be
run before ``make'' if the latter fails, even if you
are not touching translation business at all.  The reason is
that the gmo files are not stored in CVS, while they are included
in the distribution tarball. So, if you are building from CVS,
and you do not have them in your tree, you may get build errors in
directories like modules/webhelp/web/admin saying things like ``No
rule to make target `index.bg.html'''. The solution is to run
``make update-gmo'' to produce the gmo files before running
``make''. End of note to developers.)
- Please contribute your translation by emailing the file to
The INVENIO admin can configure the various ways in which authority control works for INVENIO by means of the bibauthority_config.py file. The location and full contents of this configuration file, with a commented example configuration, are shown at the bottom of this page. Its functionality is explained in the following paragraphs. For examples of how Authority Control works in Invenio from a user's perspective, cf. _(HOWTO Manage Authority Records)_.

Enforcing types of authority records

INVENIO is originally agnostic about the types of authority records it contains. Everything it needs to know about authority records comes, on the one hand, from the authority record types that are contained within the '980__a' fields, and from the configurations related to these types on the other hand. Whereas the '980__a' values are usually edited by the librarians, the INVENIO configuration is the responsibility of the administrator. It is important for librarians and administrators to communicate the exact authority record types as well as the desired functionality relative to the types for the various INVENIO modules.

BibEdit

As admin of an INVENIO instance, you have the possibility of configuring which fields are under authority control. In the “Configuration File Overview” at the end of this page you will find an example of a configuration which will enable the auto-complete functionality for the '100__a', '100__u', '110__a', '130__a', '150__a', '700__a' and '700__u' fields of a bibliographic record in BibEdit. The keys of the “CFG_BIBAUTHORITY_CONTROLLED_FIELDS” dictionary indicate which bibliographic fields are under authority control. If the user types Ctrl-Shift-A while typing within one of these fields, BibEdit will propose an auto-complete drop-down list. The user still has the option to enter values manually without use of the drop-down list. The values associated with each key of the dictionary indicate which kind of authority record is to be associated with this field. In the example given, the '100__a' field is associated with the authority record type 'AUTHOR'.

The “CFG_BIBAUTHORITY_AUTOSUGGEST_OPTIONS” dictionary gives us the remaining configurations, specific only to the auto-suggest functionality. The value for the 'index' key determines which index type will be used to find the authority records that will populate the drop-down with a list of suggestions (cf. the following paragraph on configuring BibIndex for authority records). The value of the 'insert_here_field' determines which authority record field contains the value that should be used both for constructing the strings of the entries in the drop-down list and as the value to be inserted directly into the edited subfield if the user clicks on one of the drop-down entries. Finally, the value for the 'disambiguation_fields' key is an ordered list of authority record fields that are used, in the order in which they appear in the list, to disambiguate between authority records with exactly the same value in their 'insert_here_field'.

BibIndex

As an admin of INVENIO, you have the possibility of configuring how indexing works with regard to authority records that are referenced by bibliographic records. When a bibliographic record is indexed for a particular index type, and if that index type contains MARC fields which are under authority control in this particular INVENIO instance (as configured by the “CFG_BIBAUTHORITY_CONTROLLED_FIELDS” dictionary in the bibauthority_config.py configuration file, mentioned above), then the indexer will include authority record data from specific MARC fields of these authority records in the same index.
Which authority record fields are to be used to enrich the indexes for bibliographic records can be configured with the “CFG_BIBAUTHORITY_AUTHORITY_SUBFIELDS_TO_INDEX” dictionary. In the example below, each of the 4 authority record types ('AUTHOR', 'INSTITUTE', 'JOURNAL' and 'SUBJECT') is given a list of authority record MARC fields which are to be scanned for data that is to be included in the indexed terms of the dependent bibliographic records. For the 'AUTHOR' authority records, the example specifies that the values of the fields '100__a', '100__d', '100__q', '400__a', '400__d', and '400__q' (i.e. name, alternative names, and year of birth) should all be included in the data to be indexed for any bibliographic records referencing these authority records in their authority-controlled subfields.

There are two cases that need special attention when indexing bibliographic
data that contains references to authority records.
The first case is relatively simple and requires the
enriching of bibliographic data with data from authority records
whenever a bibliographic record is being indexed. The second is a bit
more complex, for it requires detecting which bibliographic records
should be re-indexed, based on referenced authority records having
been updated within a given date range.

First of all, we need to say something about how INVENIO lets the
admin index the data. INVENIO's indexer (BibIndex) is always run as a
task that is executed by INVENIO's scheduler (BibSched). Typically,
this is done either by scheduling a bibindex task from the command
line (manually), or as part of a periodic task (BibTask) run
directly from BibSched, typically every 5 minutes. In case it is run
manually, the user has the option of specifying certain record IDs to
be re-indexed, e.g. by specifying ranges of IDs or collections to be
re-indexed. In this case, the selected records are re-indexed whether
or not there were any modifications to the data. Alternatively, the
user can specify a date range, in which case the indexer will search
all the record IDs that have been modified in the selected date range
(by default, the date range would specify all IDs modified since the
last time the indexer was run) and update the index only for those
records. As a third option, the user can specify specific types of
indexes. INVENIO lets you search by different criteria (e.g. 'any
field', 'title', 'author', 'abstract', 'keyword', 'journal', 'year',
'fulltext', …), and each of these criteria corresponds to a
separate index, indexing only the data from the relevant MARC
subfields. Normally, the indexer would update all index types for any
given record ID, but with this third option, the user can limit the
re-indexing to only specific types of indexes if desired.
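For illustration, the three ways of scheduling a bibindex task
described above might look as follows on the command line (a sketch;
the option names follow the bibindex help of recent releases, so
verify them with ``bibindex --help'' on your installation):

$ bibindex -u admin -i 123-456        # re-index a range of record IDs
$ bibindex -u admin -m 2013-01-01     # re-index records modified since a date
$ bibindex -u admin -w author,title   # re-index only selected index types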
Note: In reality, INVENIO creates not 1 but 6 different
indexes per index type: 3 are forward indexes (mapping words, pairs
or phrases to record IDs), and 3 are reverse indexes (mapping record IDs
to words, pairs or phrases). The word, pair and phrase indexes are
used for optimizing the searching speed depending on whether the user
searches for words, sub-phrases or entire phrases. These details are
however not relevant for BibAuthority. It simply finds the values to
be indexed and passes them on to the indexer, which indexes them as if
it were data coming directly from the bibliographic record.

Once the indexer knows which record ID (and optionally, which
index type) to re-index, including authority data is simply a
question of checking whether the MARC subfields currently being
indexed are under authority control (as specified in the BibAuthority
configuration file). If they are, the indexer applies the
following (pseudo-)algorithm, which fetches the necessary data from
the referenced authority records:

For each subfield and each record ID currently being re-indexed:
    If the subfield is under authority control (→ config file):
        Get the type of referenced authority record expected for this field.
        For each authority record control number found in the corresponding
        'XXX__0' subfields and matching the expected authority record type
        (control number prefix):
            Find the authority record ID (MARC field '001' control number)
            corresponding to the authority record control number (as
            contained in MARC field '035' of the authority record).
            For each authority record subfield marked as index-relevant
            for the given type (→ config file):
                Add the values of these subfields to the list of values
                to be returned and used for enriching the indexed strings.

The strings collected with this algorithm are simply added to the
strings already found by the indexer in the regular bibliographic
record MARC data. Once all the strings are collected, the indexer
goes on with the usual operation, parsing them 3 different times,
once for phrases, once for word-pairs, once for words, which are used
to populate the 6 forward and reverse index tables in the database.
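As a minimal sketch of this enrichment step in Python (the helper
names get_fieldvalues, find_authority_record_id and the two
dictionaries are stand-ins for the real Invenio internals, not the
actual API):

# Sketch of the index-enrichment step described above; helper names
# are placeholders, not the actual Invenio API.
def enrich_terms_with_authority_data(recID, tag, controlled_fields,
                                     subfields_to_index, prefix_sep='|'):
    """Return extra index terms for record recID and controlled subfield tag."""
    terms = []
    if tag not in controlled_fields:              # e.g. '700__u'
        return terms
    expected_type = controlled_fields[tag]        # e.g. 'INSTITUTE'
    # the 'XXX__0' subfields of the same field hold the control numbers
    for control_no in get_fieldvalues(recID, tag[:3] + '__0'):
        if not control_no.startswith(expected_type + prefix_sep):
            continue                              # reference of another type
        control_no = control_no.split(prefix_sep, 1)[1]
        # map the '035' control number of the authority record to its '001' ID
        auth_recID = find_authority_record_id(control_no)
        if auth_recID is None:
            continue
        for auth_tag in subfields_to_index.get(expected_type, []):
            terms.extend(get_fieldvalues(auth_recID, auth_tag))
    return terms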
When a bibindex task is created by date range, we are presented
with a trickier situation, which requires a more complex treatment
to work properly. As long as the bibindex task is configured
to index by record ID, the simple algorithm described above is enough
to properly index the authority data along with the data from
bibliographic records. This is true also if we use the third option
described above, specifying the particular index type to re-index
with the bibindex task. However, if we launch a bibindex task based
on a date range (by default the date range covers the time since the
last time a bibindex task was run for each of the index types),
bibindex would have no way to know that it must update the index for
a specific bibliographic record if one of the authority records it
references was modified in the specified date range. This would lead
to incomplete indexes.

A first idea was to modify the time-stamp of any dependent bibliographic
records as soon as an authority record is modified. Every MARC record
in INVENIO has a 'modification_date' time-stamp which indicates to
the indexer when this record was last modified. If we search for
dependent bibliographic records every time we modify an authority
record, and if we then update the 'modification_date' time-stamp for
each of these dependent bibliographic records, then we can be sure
that the indexer would find and re-index these bibliographic records
as well when indexing by a specified date-range. The problem with
this is performance. If we update the time-stamp for the
bibliographic record, this record will be re-indexed for all of the
mentioned index-types ('author', 'abstract', 'fulltext', etc.), even
though many of them may not cover MARC subfields that are under
authority control, and hence re-indexing them because of a change in
an authority record would be quite useless. In an INVENIO
installation there would typically be 15-30 index-types. Imagine if
you make a change to a 'journal' authority record and only 1 out of
the 20+ index-types is for 'journal'. INVENIO would be re-indexing
20+ index types instead of only the 1 index type which is relevant
to the type of the changed authority record.

There are two approaches that could solve this problem equally
well. The first approach would require checking – for each
authority record ID which is to be re-indexed – whether there are
any dependent bibliographic records that need to be re-indexed as
well. If done in the right manner, this approach would only re-index
the necessary index types that can contain information from
referenced authority records, and the user could specify the index
type to be re-indexed and the right bibliographic records would still
be found. The second approach works the other way around. Instead of
waiting until we find a recently modified authority record, and then
looking for dependent bibliographic records, we directly launch a
search for bibliographic records containing links to recently updated
authority records and add the record IDs found in this way to the
list of record IDs that need to be re-indexed.
Of the two approaches, the second one was chosen, based solely upon
considerations of integration into existing INVENIO code. As indexing
in INVENIO currently works, it is more natural and easily readable to
apply the second method than the first.

According to the second method, the pseudo-algorithm for finding
the bibliographic record IDs that need to be updated based upon
recently modified authority records in a given date range looks like
this:

For each index-type to re-index:
    For each subfield concerned by the index-type:
        If the subfield is under authority control (→ config file):
            Get the type of authority record associated with this field.
            Get all of the record IDs for authority records updated in
            the specified date range.
            For each record ID:
                Get the authority record control numbers of this record ID.
                For each authority record control number:
                    Search for the record IDs of bibliographic records
                    containing this control number (with the type in the
                    prefix) in the 'XXX__0' field of the current subfield,
                    and add them to the list of record IDs to be returned
                    to the caller, to be marked as needing re-indexing.
The record IDs returned in this way are added to the record IDs that
need to be re-indexed (by date range), and then the rest of the
indexing can run as usual.

The pseudo-algorithms described above were used as described in
this document, but were not each implemented in a single function. In
order for parts of them to be reusable and also for the various parts
to be properly integrated into existing python modules with similar
functionality (e.g. auxiliary search functions were added to INVENIO's
search_engine.py code), the pseudo-algorithms were split up into
multiple nested function calls and integrated where it seemed to best
fit the existing code base of INVENIO. In the case of the
pseudo-algorithm described in “Updating the index by date range”,
the very choice of the algorithm had already depended on how to best
integrate it into the existing code for date-range related indexing.

In order to reference authority records, we use alphanumeric strings
stored in the $0 subfields of fields that contain other, authority-controlled
subfields as well. The format of these alphanumeric strings for INVENIO is
in part determined by the MARC standard itself, which states that: Subfield $0 contains the system control
number of the related authority record, or a standard identifier such
as an International Standard Name Identifier (ISNI). The control
number or identifier is preceded by the appropriate MARC Organization
code (for a related authority record) or the Standard Identifier
source code (for a standard identifier scheme), enclosed in
parentheses. See MARC Code List for Organizations for a listing of
organization codes and Standard Identifier Source Codes for code
systems for standard identifiers. Subfield $0 is repeatable for
different control numbers or identifiers. An example of such a string could be
“(SzGeCERN)abc1234”, where “SzGeCERN” would be the MARC
organization code,
and abc1234 would be the unique identifier for this authority record
within the given organization.

Since it is possible for a single field (e.g. field '100') to have
multiple $0 subfields for the same field entry, we need a way to
specify which $0 subfield reference is associated with which other
subfield of the same field entry. For example, imagine that in bibliographic records both '700__a'
('other author' name) as well as '700__u' ('other author'
affiliation) are under authority control. In this case we
would have two '700__0' subfields. One of them would reference the
author authority record (for the name),
-the other one would reference an institution authority record
+the other one would reference an institute authority record
(for the affiliation).
INVENIO needs some way to know which $0 subfield is associated with
the $a subfield and which one with the $u subfield. We have chosen to solve this in the
following way. Every $0 subfield value will not only contain the
authority record control number, but in addition will be prefixed by
-the type of authority record (e.g. 'AUTHOR', 'INSTITUTION', 'JOURNAL'
+the type of authority record (e.g. 'AUTHOR', 'INSTITUTE', 'JOURNAL'
or 'SUBJECT'), separated from the control number by a separator, e.g. ':' (configurable). A possible
$0 subfield value could therefore be: “author:(SzGeCERN)abc1234”.
This will allow INVENIO to know that the $0 subfield containing
“author:(SzGeCERN)abc1234” is associated with the $a subfield
(author's name), containing e.g. “Ellis, John”, whereas the $0
-subfield containing “institution:(SzGeCERN)xyz4321” is associated
-with the $u subfield (author's affiliation/institution) of the same
+subfield containing “institute:(SzGeCERN)xyz4321” is associated
+with the $u subfield (author's affiliation/institute) of the same
field entry, containing e.g. “CERN”.
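A sketch of how such a prefixed $0 value can be taken apart (plain
Python; the separator is the configurable
CFG_BIBAUTHORITY_PREFIX_SEP):

# Split a prefixed $0 value such as "author:(SzGeCERN)abc1234"
# into the authority record type and the control number.
def split_control_no(subfield_0_value, sep=':'):
    auth_type, _, control_no = subfield_0_value.partition(sep)
    return auth_type, control_no  # e.g. ('author', '(SzGeCERN)abc1234')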
BibClassify automatically extracts keywords from fulltext documents.
The automatic assignment of keywords to textual documents has clear
benefits in the digital library environment, as it aids the
cataloguing, classification and retrieval of documents.

BibClassify performs an extraction of keywords based on the
recurrence of specific terms, taken from a controlled vocabulary. A
controlled vocabulary is a thesaurus of all the terms that are
relevant in a specific context. When a context is defined by a
discipline or branch of knowledge then the vocabulary is said to be a
subject thesaurus. Various existing subject thesauri can be found
here. A subject thesaurus can be expressed in several different
-formats. Different institutions/disciplines have developed different
+formats. Different institutes/disciplines have developed different
ways of representing their vocabulary systems. The taxonomy used by
bibclassify is expressed in RDF/SKOS. It allows not only listing keywords but
also specifying relations between the keywords and alternative ways to represent the
same keyword. The specification of the SKOS language and
various manuals
that aid the building of a semantic thesaurus can be found at the
SKOS W3C website.
Furthermore, BibClassify can function on top of an extended version of SKOS,
which includes special elements such as key chains, composite keywords and
special annotations. The extension of the SKOS language is documented in the
hacking guide. BibClassify computes the keywords of a fulltext document based on the
frequency of thesaurus terms in it. In other words, it calculates how many
times a thesaurus keyword (and its alternative and hidden labels, defined in
the taxonomy) appears in a text and it ranks the results. Unlike other similar
systems, BibClassify does not use any machine learning or AI methodologies --
just plain phrase matching using regular expressions:
it exploits the conformation and richness of the thesaurus to produce accurate
results. It is then clear that BibClassify performs best on top of rich,
well-structured, subject thesauri expressed in the RDF/SKOS language. A detailed account of the phrase matching mechanisms used by BibClassify is
included in the
hacking guide.

Dependencies. BibClassify requires
Python RDFLib in order to process the
RDF/SKOS taxonomy.

Launching. NB. BibClassify can run as a CDS
Invenio module or as a standalone program. If you already run a server with an
Invenio installation, you can simply run
/opt/invenio/bin/bibclassify [options]. Otherwise, you can run
bibclassify [options] from the BibClassify sources.
Configuration File Overview

The configuration file for the BibAuthority module can be found at
invenio/lib/python/invenio.legacy.bibauthority.config.py.
Below is a commented example configuration to show how one would typically configure the parameters for BibAuthority. The details of how this works were explained in the paragraphs above.
# CFG_BIBAUTHORITY_RECORD_CONTROL_NUMBER_FIELD
# the authority record field containing the authority record control number
CFG_BIBAUTHORITY_RECORD_CONTROL_NUMBER_FIELD = '035__a'
# Separator to be used in control numbers to separate the authority type
-# PREFIX (e.g. "INSTITUTION") from the control_no (e.g. "(CERN)abc123"
+# PREFIX (e.g. "INSTITUTE") from the control_no (e.g. "(CERN)abc123"
CFG_BIBAUTHORITY_PREFIX_SEP = '|'
# the ('980__a') string that identifies an authority record
CFG_BIBAUTHORITY_AUTHORITY_COLLECTION_IDENTIFIER = 'AUTHORITY'
# the name of the authority collection.
# This is needed for searching within the authority record collection.
CFG_BIBAUTHORITY_AUTHORITY_COLLECTION_NAME = 'Authority Records'
# used in log file and regression tests
CFG_BIBAUTHORITY_BIBINDEX_UPDATE_MESSAGE = \
"Indexing records dependent on modified authority records"
# CFG_BIBAUTHORITY_TYPE_NAMES
# Some administrators may want to be able to change the names used for the
# authority types. Although the keys of this dictionary are hard-coded into
# Invenio, the values are not and can therefore be changed to match whatever
# values are to be used in the MARC records.
# WARNING: These values shouldn't be changed on a running INVENIO installation
# ... since the same values are hard coded into the MARC data,
# ... including the 980__a subfields of all authority records
# ... and the $0 subfields of the bibliographic fields under authority control
CFG_BIBAUTHORITY_TYPE_NAMES = {
- 'INSTITUTION': 'INSTITUTION',
+ 'INSTITUTE': 'INSTITUTE',
'AUTHOR': 'AUTHOR',
'JOURNAL': 'JOURNAL',
'SUBJECT': 'SUBJECT',
}
# CFG_BIBAUTHORITY_CONTROLLED_FIELDS_BIBLIOGRAPHIC
# 1. tells us which bibliographic subfields are under authority control
# 2. tells us which bibliographic subfields refer to which type of
# ... authority record (must conform to the keys of CFG_BIBAUTHORITY_TYPE_NAMES)
CFG_BIBAUTHORITY_CONTROLLED_FIELDS_BIBLIOGRAPHIC = {
'100__a': 'AUTHOR',
- '100__u': 'INSTITUTION',
- '110__a': 'INSTITUTION',
+ '100__u': 'INSTITUTE',
+ '110__a': 'INSTITUTE',
'130__a': 'JOURNAL',
'150__a': 'SUBJECT',
- '260__b': 'INSTITUTION',
+ '260__b': 'INSTITUTE',
'700__a': 'AUTHOR',
- '700__u': 'INSTITUTION',
+ '700__u': 'INSTITUTE',
}
# CFG_BIBAUTHORITY_CONTROLLED_FIELDS_AUTHORITY
# Tells us which authority record subfields are under authority control
# used by autosuggest feature in BibEdit
# authority record subfields use the $4 field for the control_no (not $0)
CFG_BIBAUTHORITY_CONTROLLED_FIELDS_AUTHORITY = {
'500__a': 'AUTHOR',
- '510__a': 'INSTITUTION',
+ '510__a': 'INSTITUTE',
'530__a': 'JOURNAL',
'550__a': 'SUBJECT',
- '909C1u': 'INSTITUTION', # used in bfe_affiliation
- '920__v': 'INSTITUTION', # used by FZ Juelich demo data
+ '909C1u': 'INSTITUTE', # used in bfe_affiliation
+ '920__v': 'INSTITUTE', # used by FZ Juelich demo data
}
# constants for CFG_BIBEDIT_AUTOSUGGEST_TAGS
# CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_ALPHA for alphabetical sorting
# ... of drop-down suggestions
# CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_POPULAR for sorting of drop-down
# ... suggestions according to a popularity ranking
CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_ALPHA = 'alphabetical'
CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_POPULAR = 'by popularity'
# CFG_BIBAUTHORITY_AUTOSUGGEST_CONFIG
# some additional configuration for auto-suggest drop-down
# 'field' : which logical or MARC field to use for this
# ... auto-suggest type
# 'insert_here_field' : which authority record field to use
# ... for insertion into the auto-completed bibedit field
# 'disambiguation_fields': an ordered list of fields to use
# ... in case multiple suggestions have the same 'insert_here_field' values
# TODO: 'sort_by'. This has not been implemented yet !
CFG_BIBAUTHORITY_AUTOSUGGEST_CONFIG = {
'AUTHOR': {
'field': 'authorityauthor',
'insert_here_field': '100__a',
'sort_by': CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_POPULAR,
'disambiguation_fields': ['100__d', '270__m'],
},
- 'INSTITUTION':{
- 'field': 'authorityinstitution',
+ 'INSTITUTE':{
+ 'field': 'authorityinstitute',
'insert_here_field': '110__a',
'sort_by': CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_ALPHA,
'disambiguation_fields': ['270__b'],
},
'JOURNAL':{
'field': 'authorityjournal',
'insert_here_field': '130__a',
'sort_by': CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_POPULAR,
},
'SUBJECT':{
'field': 'authoritysubject',
'insert_here_field': '150__a',
'sort_by': CFG_BIBAUTHORITY_AUTOSUGGEST_SORT_ALPHA,
},
}
# list of authority record fields to index for each authority record type
# R stands for 'repeatable'
# NR stands for 'non-repeatable'
CFG_BIBAUTHORITY_AUTHORITY_SUBFIELDS_TO_INDEX = {
'AUTHOR': [
'100__a', #Personal Name (NR, NR)
'100__d', #Year of birth or other dates (NR, NR)
'100__q', #Fuller form of name (NR, NR)
'400__a', #(See From Tracing) (R, NR)
'400__d', #(See From Tracing) (R, NR)
'400__q', #(See From Tracing) (R, NR)
],
- 'INSTITUTION': [
+ 'INSTITUTE': [
'110__a', #(NR, NR)
'410__a', #(R, NR)
],
'JOURNAL': [
'130__a', #(NR, NR)
'130__f', #(NR, NR)
'130__l', #(NR, NR)
'430__a', #(R, NR)
],
'SUBJECT': [
'150__a', #(NR, NR)
'450__a', #(R, NR)
],
}
diff --git a/invenio/legacy/bibauthority/doc/hacking/bibauthority-internals.webdoc b/invenio/legacy/bibauthority/doc/hacking/bibauthority-internals.webdoc
index 5e1cfdf53..4f3203c8e 100644
--- a/invenio/legacy/bibauthority/doc/hacking/bibauthority-internals.webdoc
+++ b/invenio/legacy/bibauthority/doc/hacking/bibauthority-internals.webdoc
@@ -1,254 +1,254 @@
## -*- mode: html; coding: utf-8; -*-
## This file is part of Invenio.
-## Copyright (C) 2011 CERN.
+## Copyright (C) 2011, 2013 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
Here you will find a few explanations of the inner workings of BibAuthority.
Indexing
Introduction
Indexing by record ID, by modification date
or by index type
Enriching the index data – simple case
Updating the index by date range
Implementation specifics
Cross-referencing between MARC records
Contents
1. Overview
1.1 Thesaurus
1.2 Keyword extraction
2. Running BibClassify
1. Overview
1.1 Thesaurus
In RDF/SKOS, every keyword is wrapped around a concept which
encapsulates the full semantics and hierarchical status of a term - including
synonyms, alternative forms, broader concepts, notes and so on - rather than
just a plain keyword.
<Concept rdf:about="http://cern.ch/thesauri/HEP.rdf#scalar">
<composite rdf:resource="http://cern.ch/thesauri/HEP.rdf#Composite.fieldtheoryscalar"/>
<prefLabel xml:lang="en">scalar</prefLabel>
<note xml:lang="en">nostandalone</note>
</Concept>
<Concept rdf:about="http://cern.ch/thesauri/HEP.rdf#fieldtheory">
<composite rdf:resource="http://cern.ch/thesauri/HEP.rdf#Composite.fieldtheoryscalar"/>
<prefLabel xml:lang="en">field theory</prefLabel>
<altLabel xml:lang="en">QFT</altLabel>
<hiddenLabel xml:lang="en">/field theor\w*/</hiddenLabel>
<note xml:lang="en">nostandalone</note>
</Concept>
<Concept rdf:about="http://cern.ch/thesauri/HEP.rdf#Composite.fieldtheoryscalar">
<compositeOf rdf:resource="http://cern.ch/thesauri/HEP.rdf#scalar"/>
<compositeOf rdf:resource="http://cern.ch/thesauri/HEP.rdf#fieldtheory"/>
<prefLabel xml:lang="en">field theory: scalar</prefLabel>
<altLabel xml:lang="en">scalar field</altLabel>
</Concept>
1.2 Keyword extraction
2. Running BibClassify
In order to extract relevant keywords from a document fulltext.pdf
based on a controlled vocabulary thesaurus.rdf,
you would run BibClassify as follows:
$ bibclassify.py -k thesaurus.rdf fulltext.pdf
bibclassify --help
shows the options
available for BibClassify:
Usage: bibclassify [OPTION]... [FILE/URL]...
bibclassify [OPTION]... [DIRECTORY]...
Searches keywords in FILEs and/or files in DIRECTORY(ies). If a directory is
specified, BibClassify will generate keywords for all PDF documents contained
in the directory. Can also run in a daemon mode, in which case the files to
be run are looked for from the database (=records modified since the last run).
General options:
-h, --help display this help and exit
-V, --version output version information and exit
-v, --verbose=LEVEL sets the verbose to LEVEL (=0)
-k, --taxonomy=NAME sets the taxonomy NAME. It can be a simple
controlled vocabulary or a descriptive RDF/SKOS
and can be located in a local file or URL.
Standalone file mode options:
-o, --output-mode=TYPE changes the output format to TYPE (text, marcxml or
html) (=text)
-s, --spires outputs keywords in the SPIRES format
-n, --keywords-number=INT sets the number of keywords displayed (=20), use 0
to set no limit
-m, --matching-mode=TYPE changes the search mode to TYPE (full or partial)
(=full)
--detect-author-keywords detect keywords that are explicitely written in the
document
Daemon mode options:
-i, --recid=RECID extract keywords for a record and store into DB
(=all necessary ones for pre-defined taxonomies)
-c, --collection=COLL extract keywords for a collection and store into DB
(=all necessary ones for pre-defined taxonomies)
Taxonomy management options:
--check-taxonomy checks the taxonomy and reports warnings and errors
--rebuild-cache ignores the existing cache and regenerates it
--no-cache don't cache the taxonomy
Backward compatibility options (discouraged):
-q equivalent to -s
-f FILE URL sets the file to read the keywords from
Examples (standalone file mode):
$ bibclassify -k HEP.rdf http://arxiv.org/pdf/0808.1825
$ bibclassify -k HEP.rdf article.pdf
$ bibclassify -k HEP.rdf directory/
Examples (daemon mode):
$ bibclassify -u admin -s 24h -L 23:00-05:00
$ bibclassify -u admin -i 1234
$ bibclassify -u admin -c Preprints
As an example, running BibClassify on document
nucl-th/0204033 using the
high-energy physics RDF/SKOS taxonomy (HEP.rdf) would yield the
following results (based on the HEP taxonomy from October 10th 2008):
or, the following keyword-cloud HTML visualization:
Input file: 0204033.pdf
Author keywords:
Dense matter
Saturation
Unstable nuclei
Composite keywords:
10 nucleus: stability [36, 14]
6 saturation: density [25, 31]
6 energy: symmetry [35, 11]
4 nucleon: density [13, 31]
3 energy: Coulomb [35, 3]
2 energy: density [35, 31]
2 nuclear matter: asymmetry [21, 2]
1 n: matter [54, 36]
1 n: density [54, 31]
1 n: mass [54, 16]
Single keywords:
61 K0
23 equation of state
12 slope
4 mass number
4 nuclide
3 nuclear model
3 mass formula
2 charge distribution
2 elastic scattering
2 binding energy
BibUpload enables you to upload bibliographic data in MARCXML format into the Invenio bibliographic database. It is also used internally by other Invenio modules as the sole entry point of metadata into the bibliographic databases.
Note that before uploading a MARCXML file, you may want to run
the provided /opt/invenio/bin/xmlmarclint
on it in order
to verify its correctness.
BibUpload takes a MARCXML file as its input. There is nothing to be configured for these files. If the files have to be converted into MARCXML from some other format, structured or not, this is usually done beforehand via the BibConvert module.
Note that if you are using external system numbers for your records, such as when your records are being synchronized from an external system, then BibUpload knows about the tag 970 as the one containing external system number. (To change this 970 tag into something else, you would have to edit BibUpload config source file.)
Note also that in a similar way BibUpload knows about OAI identifiers, so that it will refuse to insert the same OAI-harvested record twice, for example.
Consider that you have a MARCXML file containing new records that is to be uploaded into Invenio. (For example, it might have been produced by BibConvert.) To finish the upload, you would call the BibUpload script in the insert mode as follows:

$ bibupload -i file.xml

In the insert mode, all the records from the file will be treated as new. This means that they should contain neither 001 tags (holding record IDs) nor 970 tags (holding external system numbers). BibUpload would refuse to upload records having these tags, in order to prevent potential double uploading. If your file does contain 001 or 970, then chances are that you want to update existing records, not re-upload them as new, and so BibUpload will warn you about this and will refuse to continue.
For example, to insert a new record, your file should look like this:
<record> <datafield tag="100" ind1=" " ind2=" "> <subfield code="a">Doe, John</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">On The Foo And Bar</subfield> </datafield> </record>
A special mode of BibUpload that is tightly connected with BibEdit is the Holding Pen mode.
When you insert a record using the holding pen mode, such as in the following example:

$ bibupload -o file.xml

the records are not actually integrated into the database, but are instead put into an intermediate space called the holding pen, where authorized curators can review them, manipulate them and eventually approve them.
The holding pen is integrated with BibEdit.
When you want to update existing records with the new content from your input MARCXML file, then your input file should contain either tags 001 (holding record IDs) or tags 970 (holding external system numbers). BibUpload will try to match existing records via 001 and 970 and, if it finds a record in the database that corresponds to a record from the file, it will update its content. Otherwise it will signal an error saying that it could not find the record-to-be-updated.
For example, to update the title of record #123 via the correct mode, your input file should contain the record ID in the 001 tag and the title in the 245 tag as follows:
<record> <controlfield tag="001">123</controlfield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">My Newly Updated Title</subfield> </datafield> </record>
There are several updating modes:
-r, --replace
    Replace existing records by those from the XML MARC file. The original content is wiped out and fully replaced. Signals error if record is not found via matching record IDs or system numbers. Fields defined in the Invenio config variable CFG_BIBUPLOAD_STRONG_TAGS are not replaced. Note also that `-r' can be combined with `-i' into an `-ir' option that would automatically either insert records as new if they are not found in the system, or correct existing records if they are found to exist.

-a, --append
    Append fields from the XML MARC file at the end of existing records. The original content is enriched only. Signals error if record is not found via matching record IDs or system numbers.

-c, --correct
    Correct fields of existing records by those from the XML MARC file. The original record content is modified only on those fields from the XML MARC file where both the tags and the indicators match: the original fields are removed and replaced by those from the XML MARC file. Fields not present in the XML MARC file are not changed (unlike the -r option). Fields with "provenance" subfields defined in CFG_BIBUPLOAD_CONTROLLED_PROVENANCE_TAGS are protected against deletion unless the input MARCXML contains a matching provenance value. Signals error if record is not found via matching record IDs or system numbers.

-d, --delete
    Delete fields of existing records that are contained in the XML MARC file. The fields in the original record that are not present in the XML MARC file are preserved. This is incompatible with FFT (see below).
Note that if you are using the --replace mode, and you specify
in the incoming MARCXML a 001 tag with a value representing a record ID that
does not exist, bibupload will not create the record on-the-fly, unless the
--force parameter was also passed on the command line. This is done
in order to avoid creating, by mistake, holes in the database list of record
identifiers. In fact, when you ask to --replace a non-existing record,
imposing a record ID with a value of, say, 1 000 000,
and, subsequently, you --insert
a new record, this will automatically receive an ID with
the value 1 000 001.
If you combine the --pretend parameter with the above updating modes, you can test what would be executed without modifying the database or altering the system status.
Note that the insert/update modes can be combined together. For example, if you have a file that contains a mixture of new records with possibly some records to be updated, then you can run:

$ bibupload -i -r file.xml

In this case BibUpload will try to do an update (for records having either 001 or 970 identifiers), or an insert (for the other ones).
The fulltext files can be uploaded and revised via a special FFT ("fulltext file transfer") tag with the following semantic:

FFT $a ... location of the docfile to upload (a filesystem path or a URL)
    $d ... docfile description (optional)
    $f ... format (optional; if not set, deduced from $a)
    $m ... new desired docfile name (optional; used for renaming files)
    $n ... docfile name (optional; if not set, deduced from $a)
    $o ... flag (repeatable subfield)
    $r ... restriction (optional, see below)
    $s ... set timestamp (optional, see below)
    $t ... docfile type (e.g. Main, Additional)
    $v ... version (used only with REVERT and DELETE-FILE, see below)
    $x ... url/path for an icon (optional)
    $z ... comment (optional)
    $w ... MoreInfo modification of the document
    $p ... MoreInfo modification of a current version of the document
    $b ... MoreInfo modification of a current version and format of the document
    $u ... MoreInfo modification of a format (of any version) of the document
For example, to upload a new fulltext file thesis.pdf
associated with record ID 123:
<record> <controlfield tag="001">123</controlfield> <datafield tag="FFT" ind1=" " ind2=" "> <subfield code="a">/tmp/thesis.pdf</subfield> <subfield code="t">Main</subfield> <subfield code="d"> This is the fulltext version of my thesis in the PDF format. Chapter 5 still needs some revision. </subfield> </datafield> </record>
The FFT tag is repeatable, so one can pass along another FFT tag instance containing a pointer to e.g. the thesis defence slides. The subfields of an FFT tag are non-repeatable.
When more than one FFT tag is specified for the same document (e.g. for adding more than one format at a time), if $t (docfile type), $m (new desired docfile name), $r (restriction), $v (version) or $x (url/path for an icon) are specified, they should be identically specified for each single entry of FFT. E.g. if you want to specify an icon for a document with two formats (say .pdf and .doc), you'll write two FFT tags, both containing the same $x subfield, as in the sketch below.
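For instance (a sketch; the file paths and icon are made-up values):

<record>
    <controlfield tag="001">123</controlfield>
    <datafield tag="FFT" ind1=" " ind2=" ">
        <subfield code="a">/tmp/article.pdf</subfield>
        <subfield code="t">Main</subfield>
        <subfield code="x">/tmp/icon.gif</subfield>
    </datafield>
    <datafield tag="FFT" ind1=" " ind2=" ">
        <subfield code="a">/tmp/article.doc</subfield>
        <subfield code="t">Main</subfield>
        <subfield code="x">/tmp/icon.gif</subfield>
    </datafield>
</record>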
The bibupload process, when it encounters FFT tags, will
automatically populate fulltext storage space
(/opt/invenio/var/data/files
) and metadata record
associated tables (bibrec_bibdoc
, bibdoc
) as
appropriate. It will also enrich the 856 tags (URL tags) of the MARC
metadata of the record in question with references to the latest
versions of each file.
Note that for $a and $x subfields filesystem paths must be absolute
(e.g. /tmp/icon.gif
is valid,
while Desktop/icon.gif
is not) and they must be readable
by the user/group of the bibupload process that will handle the FFT.
The bibupload process supports the usual modes correct, append, replace, insert with a semantic that is somewhat similar to the semantic of the metadata upload:
Mode | Metadata | Fulltext |
---|---|---|
objects being uploaded | MARC field instances characterized by tags (010-999) | fulltext files characterized by unique file names (FFT $n) |
insert | insert new record; must not exist | insert new files; must not exist |
append | append new tag instances for the given tag XXX, regardless of existing tag instances | append new files, if filename (i.e. new format) not already present |
correct | correct tag instances for the given tag XXX; delete existing ones and replace with given ones | correct files with the given filename; add new revision or delete file; if the docname does not exist the file is added |
replace | replace all tags, whatever XXX are | replace all files, whatever filenames are |
delete | delete all existing tag instances | not supported |
-Note, in append and insert mode, $m is ignored.
+Note that you can mix regular MARC tags with special FFT tags in
+the incoming XML input file. Both record metadata and record files
+will be updated as a result. Hence beware with some input modes, such
+as replace mode, if you would like to touch only files.
+
+Note that in append and insert mode the $m is ignored.
In order to rename a document, just use the correct mode, specifying in the $n subfield the original docname that should be renamed and in $m the new name.
Special values can be assigned to the $t subfield.
Value | Meaning |
---|---|
PURGE | In order to purge previous file revisions (i.e. in order to keep only the latest file version), please use the correct mode with $n docname and $t PURGE as the special keyword. |
DELETE | In order to delete all existing versions of a file, making it effectively hidden, please use the correct mode with $n docname and $t DELETE as the special keyword. |
EXPUNGE | In order to expunge (i.e. remove completely, also from the filesystem) all existing versions of a file, making it effectively disappear, please use the correct mode with $n docname and $t EXPUNGE as the special keyword. |
FIX-MARC | In order to synchronize MARC to the bibrec/bibdoc structure (e.g. after an update or a tweak in the database), please use the correct mode with $n docname and $t FIX-MARC as the special keyword. |
FIX-ALL | In order to fix a record (i.e. put all its linked documents in a coherent state) and synchronize the MARC to the table, please use the correct mode with $n docname and $t FIX-ALL as the special keyword. |
REVERT | In order to revert to a previous file revision (i.e. to create a new revision with the same content as some previous revision had), please use the correct mode with $n docname, $t REVERT as the special keyword and $v the number corresponding to the desired version. |
DELETE-FILE | In order to delete a particular file added by mistake, please use the correct mode with $n docname, $t DELETE-FILE, specifying $v version and $f format. Note that this operation is not reversible. Note that if you don't specify a version, the last version will be used. |
In order to preserve previous comments and descriptions when correcting, please use the KEEP-OLD-VALUE special keyword with the desired $d and $z subfield.
The $r subfield can contain a string that can be used to restrict the given document. The same value must be specified for all the formats of a given document. By default the keyword will be used as the status parameter for the "viewrestrdoc" action, which can be used to give access rights/restrictions to desired users. E.g. if you set the keyword "thesis", you can then connect the "thesisviewer" role to the action "viewrestrdoc" with parameter "status" set to "thesis". Then all the users which are linked with the "thesisviewer" role will be able to download the document, whereas any other users which are not considered as authors for the given record will not be allowed. Note that if you use the keyword "KEEP-OLD-VALUE", the previous restrictions, if applicable, will be kept.
More advanced document-level restriction is indeed possible. If the value contains, in fact:
Some special flags might be set via FFT and associated with the current document by using the $o subfield. This feature is experimental. Currently only two flags are actively considered:
Note that each time bibupload is called on a record, the 8564 tags pointing to locally stored files are recreated on the basis of the full-text files connected to the record. Thus, if you wish to update some 8564 tag pointing to a locally managed file, the only way to perform this is through the FFT tag, not by editing 8564 directly.
The subfield $s of FFT can be used to set the time stamp of the uploaded file to a given value, e.g. 2007-05-04 03:02:01. This is useful when uploading old files. When $s is not present, the current time will be used.
Sometimes, to implement a particular workflow or policy in a digital repository, it might be nice to receive an automatic machine-friendly feedback that acknowledges the outcome of a bibupload execution. To this aim the --callback-url command line parameter can be used. This parameter expects a URL to be specified to which a JSON-serialized response will be POSTed.
Say, you have an external service reachable via the URL http://www.example.org/accept_feedback. If the argument:
--callback-url http://www.example.org/accept_feedback

is added to the usual bibupload call, at the end of the execution of the corresponding bibupload task an HTTP POST request will be performed, if possible, to the given URL, reporting the outcome of the bibupload execution as a JSON-serialized response with the following structure:
For example, a possible JSON response posted to a specified URL can look like:
{ "results": [ { "recid": -1, "error_message": "ERROR: can not retrieve the record identifier", "success": false }, { "recid": 1000, "error_message": "", "success": true, "marcxml": "", "url": "http://www.example.org/record/1000" }, ... ] } 1000 ...
Note that, currently, in case the specified URL can not be reached at the time of the POST request, the whole bibupload task will fail.
If you use the same callback URL to receive the feedback from more than one bibupload request you might want to be able to correctly identify each bibupload call with the corresponding feedback. For this reason you can pass to the bibupload call an additional argument:
--nonce VALUE

where VALUE can be any string you wish. Such string will then be added to the JSON structure, as in (supposing you specified --nonce 1234):
{ "nonce": "1234", "results": [ { "recid": -1, "error_message": "ERROR: can not retrieve the record identifier", "success": false }, { "recid": 1000, "error_message": "", "success": true, "marcxml": "", "url": "http://www.example.org/record/1000" }, ... ] } 1000 ...
Some bits of meta-data should not be viewed by Invenio users directly, nor stored in the MARC format. This includes all types of non-standard data related to records and documents, for example flags related to documents (specified inside of an FFT tag) or bits of semantic information related to entities managed in Invenio. This type of data is usually machine-generated and should be used by modules of Invenio internally.
Invenio provides a general mechanism allowing objects related to different entities of Invenio to be stored. This mechanism is called MoreInfo and resembles well-known more-info solutions. Every entity (document, version of a document, format of a particular version of a document, relation between documents) can be assigned a dictionary of arbitrary values. The dictionary is divided into namespaces, which allow data from different modules, serving different purposes, to be kept separate.
BibUpload, the only gateway to uploading data into the Invenio database, allows MoreInfo structures to be populated. MoreInfo related to a given entity can be modified by providing a Pickle-serialised, base64-encoded Python object having the following structure:
{ "namespace": { "key": "value", "key2": "value2" } }
For example, the above dictionary would be uploaded as
KGRwMQpTJ25hbWVzcGFjZScKcDIKKGRwMwpTJ2tleTInCnA0ClMndmFsdWUyJwpwNQpzUydrZXknCnA2ClMndmFsdWUnCnA3CnNzLg==
Which is a base-64 encoded representation of the string
(dp0\nS'namespace'\np1\n(dp2\nS'key2'\np3\nS'value2'\np4\nsS'key'\np5\nS'value'\np6\nss.
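A sketch of how such a value can be produced (plain Python; pickle
protocol 0 yields the ASCII form shown above, though the exact bytes
may vary with dictionary ordering):

# Serialize a MoreInfo update: pickle (ASCII protocol 0), then base64.
import base64
import pickle

moreinfo = {"namespace": {"key": "value", "key2": "value2"}}
serialized = base64.b64encode(pickle.dumps(moreinfo, protocol=0))
print(serialized)  # value to put into the $m (or $w/$p/$u) subfield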
Data keys can be removed from a dictionary by providing None as a value. Empty namespaces are considered non-existent.
The string representation of modifications to the MoreInfo dictionary can be provided in several places, depending on which object it should be attached to. The most general upload method, the BDM tag, has the following semantic:
BDM $r ... Identifier of a relation between documents (optional)
    $i ... Identifier of a BibDoc (optional)
    $v ... Version of a BibDoc (optional)
    $n ... Name of a BibDoc (within a current record) (optional)
    $f ... Format of a BibDoc (optional)
    $m ... Serialised update to the MoreInfo dictionary
All subfields (except $m) are optional and allow the entity to which the MoreInfo should refer to be identified, as in the sketch below.
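For instance, attaching the serialized update from the previous
section to version 1 of an existing BibDoc might look like this (a
sketch; the record and document identifiers are made up):

<record>
    <controlfield tag="001">123</controlfield>
    <datafield tag="BDM" ind1=" " ind2=" ">
        <subfield code="i">456</subfield>
        <subfield code="v">1</subfield>
        <subfield code="m">KGRwMQpTJ25hbWVzcGFjZScKcDIKKGRwMwpTJ2tleTInCnA0ClMndmFsdWUyJwpwNQpzUydrZXknCnA2ClMndmFsdWUnCnA3CnNzLg==</subfield>
    </datafield>
</record>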
Besides the BDM tag, MoreInfo can be transferred using special subfields of the FFT and BDR tags. The first one allows the MoreInfo of a newly uploaded document to be modified, the second that of a relation. The additional subfields have the following semantic:
FFT $w ... MoreInfo modification of the document
    $p ... MoreInfo modification of a current version of the document
    $s ... MoreInfo modification of a current version and format of the document
    $u ... MoreInfo modification of a format (of any version) of the document

BDR $m ... MoreInfo modification of a relation between BibDocs
One of the additional pieces of non-MARC data which can be uploaded to Invenio is relations between documents. Similarly to MoreInfo, relations are intended to be used by Invenio modules. The semantics of the BDR field, which allows relations to be uploaded, looks as follows:
BDR $r ... Identifier of the relation (optional, can be provided if modifying a known relation)
    $i ... Identifier of the first document
    $n ... Name of the first document (within the current record) (optional)
    $v ... Version of the first document (optional)
    $f ... Format of the first document (optional)
    $j ... Identifier of the second document
    $o ... Name of the second document (within the current record) (optional)
    $w ... Version of the second document (optional)
    $g ... Format of the second document (optional)
    $t ... Type of the relation
    $m ... Modification of the MoreInfo of the relation
    $d ... Special field. If value=DELETE, the relation is removed
Behaviour of the BDR tag in different upload modes:
Mode | Behaviour |
---|---|
insert, append | Inserts a new relation if necessary. Appends fields to the MoreInfo structure. |
correct, replace | Creates a new relation if necessary, replaces the entire content of the MoreInfo field. |
In many cases, users want to upload large collections of documents using single BibUpload tasks. The infrastructure described in the rest of this manual allows easy upload of multiple documents, but lacks facilities for relating them to each other. A sample use-case which can not be satisfied by simple usage of FFT tags is uploading a document and relating it to another which is either already in the database or is being uploaded within the same BibUpload task. BibUpload provides a mechanism of temporary identifiers which allows scenarios similar to the aforementioned to be served.
A temporary identifier is a string (unique in the context of a single MARC XML document) which replaces a document number or a version number. In the context of BibDoc manipulations (FFT, BDR and BDM tags), temporary identifiers can appear everywhere a version or numerical ID is required. If a temporary identifier appears in the context of a document already having an ID assigned, it will be interpreted as this already existent number. If a newly created document is assigned a temporary identifier, the newly generated numerical ID is assigned to the temporary ID. In order to be recognised as a temporary identifier, a string has to begin with the prefix TMP:. The mechanism of temporary identifiers can not be used in the context of records, but only with BibDocs.
A BibUpload input using temporary identifiers can look like:
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">This is a record of the publication</subfield>
    </datafield>
    <datafield tag="FFT" ind1=" " ind2=" ">
      <subfield code="a">http://somedomain.com/document.pdf</subfield>
      <subfield code="t">Main</subfield>
      <subfield code="n">docname</subfield>
      <subfield code="i">TMP:id_identifier1</subfield>
      <subfield code="v">TMP:ver_identifier1</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">This is a record of a dataset extracted from the publication</subfield>
    </datafield>
    <datafield tag="FFT" ind1=" " ind2=" ">
      <subfield code="a">http://sample.com/dataset.data</subfield>
      <subfield code="t">Main</subfield>
      <subfield code="n">docname2</subfield>
      <subfield code="i">TMP:id_identifier2</subfield>
      <subfield code="v">TMP:ver_identifier2</subfield>
    </datafield>
    <datafield tag="BDR" ind1=" " ind2=" ">
      <subfield code="i">TMP:id_identifier1</subfield>
      <subfield code="v">TMP:ver_identifier1</subfield>
      <subfield code="j">TMP:id_identifier2</subfield>
      <subfield code="w">TMP:ver_identifier2</subfield>
      <subfield code="t">is_extracted_from</subfield>
    </datafield>
  </record>
</collection>
The batchuploader web interface can be used to upload either metadata files or documents. As opposed to daemon mode, actions are executed only once. The upload history shows only the metadata and document uploads made through the web interface, not those made in daemon mode.
If you need to use the batch upload function from the command line, this can be achieved with a curl call like:
$ curl -F 'file=@localfile.xml' -F 'mode=-i' http://cds.cern.ch/batchuploader/robotupload [-F 'callback_url=http://...'] -A invenio_webupload
This service checks the (client, file) pair to ensure that records are put only into collections the client has rights to. To configure these permissions, check the CFG_BATCHUPLOADER_WEB_ROBOT_RIGHTS variable in the configuration file.
The allowed user agents can also be defined using the CFG_BATCHUPLOADER_WEB_ROBOT_AGENT variable.
Note that you can receive machine-friendly feedback from the corresponding bibupload task launched by a given batchuploader request by adding the optional POST field callback_url, with the same semantics as the --callback-url command-line parameter of bibupload (see the previous paragraph, Obtaining feedbacks).
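For illustration only, below is a minimal sketch of an endpoint that could receive such feedback at callback_url. The port, class name and printing behaviour are assumptions for the example, not part of Invenio; the only assumption taken from this guide is that bibupload POSTs its results as a JSON body.

# Minimal sketch of a callback_url endpoint (illustrative assumptions:
# port 8000, plain stdout logging).
import json
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

class UploadFeedbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # bibupload POSTs its machine-friendly results as a JSON body
        length = int(self.headers.getheader('content-length', 0))
        feedback = json.loads(self.rfile.read(length))
        print "bibupload feedback:", feedback
        self.send_response(200)
        self.end_headers()

if __name__ == '__main__':
    HTTPServer(('', 8000), UploadFeedbackHandler).serve_forever()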
A second, more RESTful interface is also available: it suffices to append the specific mode (among "insert", "append", "correct", "delete" and "replace") to the URL, as in:
http://cds.cern.ch/batchuploader/robotupload/insert
The callback_url argument can be put in the query part of the URL, as in:
http://cds.cern.ch/batchuploader/robotupload/insert?callback_url=http://myhandler
In case the HTTP server that is going to receive the feedback at callback_url expects the request to be encoded as application/x-www-form-urlencoded rather than application/json (e.g. if the server is implemented directly in Oracle), you can additionally pass the special_treatment argument set to oracle. The feedback will then be encoded as an application/x-www-form-urlencoded request with a single form key called results, which will contain the final JSON data.
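To make the difference between the two encodings concrete, here is a small sketch; the feedback payload shown is made up for the example, not the actual structure bibupload produces.

# Sketch of the two callback encodings (the payload content is made up).
import json
import urllib

feedback = {'results': [{'recid': 1234, 'success': True}]}

# default: the request body is the JSON itself (application/json)
json_body = json.dumps(feedback)

# special_treatment=oracle: the same JSON is wrapped in a single
# 'results' form key (application/x-www-form-urlencoded)
oracle_body = urllib.urlencode({'results': json_body})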
The MARCXML content should then be specified as the body of the request. With curl this can be implemented as in:
$ curl -T localfile.xml http://cds.cern.ch/batchuploader/robotupload/insert?callback_url=http://... -A invenio_webupload -H "Content-Type: application/marcxml+xml"
The nonce argument that can be passed to BibUpload as described in the previous paragraph can also be specified with both robotupload interfaces. E.g.:
$ curl -F 'file=@localfile.xml' -F 'nonce=1234' -F 'mode=-i' http://cds.cern.ch/batchuploader/robotupload -F 'callback_url=http://...' -A invenio_webupload

and

$ curl -T localfile.xml 'http://cds.cern.ch/batchuploader/robotupload/insert?nonce=1234&callback_url=http://...' -A invenio_webupload -H "Content-Type: application/marcxml+xml"
The batchuploader daemon mode is intended to be a bibsched task for document or metadata upload. The parent directory where the daemon will look for folders metadata and documents must be specified in the invenio configuration file.
An example of how the directories should be arranged, assuming Invenio was installed in /opt/invenio, would be:
/opt/invenio/var/batchupload
/opt/invenio/var/batchupload/documents
/opt/invenio/var/batchupload/documents/append
/opt/invenio/var/batchupload/documents/revise
/opt/invenio/var/batchupload/metadata
/opt/invenio/var/batchupload/metadata/append
/opt/invenio/var/batchupload/metadata/correct
/opt/invenio/var/batchupload/metadata/insert
/opt/invenio/var/batchupload/metadata/replace
When running the batchuploader daemon there are two possible execution modes:
-m, --metadata   Look for metadata files in folders insert, append, correct
                 and replace. All files are uploaded and then moved to the
                 corresponding DONE folder.
-d, --documents  Look for documents in folders append and revise. Uploaded
                 files are then moved to DONE folders if possible.

By default, metadata mode is used.
An example of invocation would be:
$ batchuploader --documents

It is possible to program batch uploader to run periodically. Read the Howto-run guide to see how.

diff --git a/invenio/legacy/miscutil/sql/tabfill.sql b/invenio/legacy/miscutil/sql/tabfill.sql
index f3b574f5b..8d5d47d91 100644
--- a/invenio/legacy/miscutil/sql/tabfill.sql
+++ b/invenio/legacy/miscutil/sql/tabfill.sql
@@ -1,863 +1,863 @@
-INSERT INTO field VALUES (34,'authority institution','authorityinstitution');
+INSERT INTO field VALUES (34,'authority institute','authorityinstitute');
-INSERT INTO tag VALUES (217,'institution control','110__0');
+INSERT INTO tag VALUES (217,'institute control','110__0');
-INSERT INTO tag VALUES (220,'additional institution control', '260__0');
+INSERT INTO tag VALUES (220,'additional institute control', '260__0');
-INSERT INTO idxINDEX VALUES (13,'affiliation','This index contains words/phrases from institutional affiliation fields.','0000-00-00 00:00:00', '', 'native', '','No','No','No', 'BibIndexDefaultTokenizer');
+INSERT INTO idxINDEX VALUES (13,'affiliation','This index contains words/phrases from affiliation fields.','0000-00-00 00:00:00', '', 'native', '','No','No','No', 'BibIndexDefaultTokenizer');
-INSERT INTO idxINDEX VALUES (21,'authorityinstitution','This index contains words/phrases from institution authority records.','0000-00-00 00:00:00', '', 'native', '','No','No','No', 'BibIndexDefaultTokenizer');
+INSERT INTO idxINDEX VALUES (21,'authorityinstitute','This index contains words/phrases from institute authority records.','0000-00-00 00:00:00', '', 'native', '','No','No','No', 'BibIndexDefaultTokenizer');
No match close to %s found in given collections. Please try a different term.
Displaying matches in any collection...""" % p_orig)
        ## try to get nbhits for these phrases in any collection:
        for phrase in browsed_phrases:
            nbhits = get_nbhits_in_bibxxx(phrase, f, coll_hitset)
            if nbhits > 0:
                browsed_phrases_in_colls.append([phrase, nbhits])
    return browsed_phrases_in_colls
def browse_pattern(req, colls, p, f, rg, ln=CFG_SITE_LANG):
    """Displays either bibliographic phrases or words indexes."""
    # load the right message language
    _ = gettext_set_language(ln)

    browsed_phrases_in_colls = browse_pattern_phrases(req, colls, p, f, rg, ln)
    if len(browsed_phrases_in_colls) == 0:
        req.write(_("No values found."))
        return

    ## display results now:
    out = websearch_templates.tmpl_browse_pattern(
        f=f,
        fn=get_field_i18nname(get_field_name(f) or f, ln, False),
        ln=ln,
        browsed_phrases_in_colls=browsed_phrases_in_colls,
        colls=colls,
        rg=rg,
    )
    req.write(out)
    return
def browse_in_bibwords(req, p, f, ln=CFG_SITE_LANG):
    """Browse inside words indexes."""
    if not p:
        return
    _ = gettext_set_language(ln)

    urlargd = {}
    urlargd.update(req.argd)
    urlargd['action'] = 'search'
    nearest_box = create_nearest_terms_box(urlargd, p, f, 'w', ln=ln, intro_text_p=0)

    req.write(websearch_templates.tmpl_search_in_bibwords(
        p=p,
        f=f,
        ln=ln,
        nearest_box=nearest_box
    ))
    return
def search_pattern(req=None, p=None, f=None, m=None, ap=0, of="id", verbose=0, ln=CFG_SITE_LANG, display_nearest_terms_box=True, wl=0):
    """Search for complex pattern 'p' within field 'f' according to
       matching type 'm'.  Return hitset of recIDs.

       The function uses a multi-stage searching algorithm in case no
       exact match is found.  See the Search Internals document for a
       detailed description.

       The 'ap' argument governs whether alternative patterns are to
       be used in case there is no direct hit for (p,f,m).  For
       example, whether to replace non-alphanumeric characters by
       spaces if it would give some hits.  See the Search Internals
       document for a detailed description.  (ap=0 forbids the
       alternative pattern usage, ap=1 permits it.)
       'ap' is also used internally for allowing hidden tag search
       (for requests coming from webcoll, for example).  In this
       case ap=-9.

       The 'of' argument governs whether to print some information to
       the user in case no match is found.  (Usually it prints the
       information for HTML formats, otherwise it is silent.)

       The 'verbose' argument controls the level of debugging
       information to be printed (0=least, 9=most).

       All the parameters are assumed to have been previously washed.

       This function is suitable as a mid-level API.
    """
    _ = gettext_set_language(ln)
    hitset_empty = intbitset()
    # sanity check:
    if not p:
        hitset_full = intbitset(trailing_bits=1)
        hitset_full.discard(0)
        # no pattern, so return all universe
        return hitset_full

    # search stage 1: break up arguments into basic search units:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    basic_search_units = create_basic_search_units(req, p, f, m, of)
    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        write_warning("Search stage 1: basic search units are: %s" % cgi.escape(repr(basic_search_units)), req=req)
        write_warning("Search stage 1: execution took %.2f seconds." % (t2 - t1), req=req)

    # search stage 2: do search for each search unit and verify hit presence:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    basic_search_units_hitsets = []

    # prepare hiddenfield-related..
    myhiddens = cfg['CFG_BIBFORMAT_HIDDEN_TAGS']
    can_see_hidden = False
    if req:
        user_info = collect_user_info(req)
        can_see_hidden = user_info.get('precached_canseehiddenmarctags', False)
    if not req and ap == -9:  # special request, coming from webcoll
        can_see_hidden = True
    if can_see_hidden:
        myhiddens = []

    if CFG_INSPIRE_SITE and of.startswith('h'):
        # fulltext/caption search warnings for INSPIRE:
        fields_to_be_searched = [f for o, p, f, m in basic_search_units]
        if 'fulltext' in fields_to_be_searched:
            write_warning(_("Warning: full-text search is only available for a subset of papers mostly from %(x_range_from_year)s-%(x_range_to_year)s.") %
                          {'x_range_from_year': '2006',
                           'x_range_to_year': '2012'}, req=req)
        elif 'caption' in fields_to_be_searched:
            write_warning(_("Warning: figure caption search is only available for a subset of papers mostly from %(x_range_from_year)s-%(x_range_to_year)s.") %
                          {'x_range_from_year': '2008',
                           'x_range_to_year': '2012'}, req=req)

    for idx_unit in xrange(len(basic_search_units)):
        bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit]
        if bsu_f and len(bsu_f) < 2:
            if of.startswith("h"):
                write_warning(_("There is no index %(x_name)s. Searching for %(x_text)s in all fields.", x_name=bsu_f, x_text=bsu_p), req=req)
            bsu_f = ''
            bsu_m = 'w'
            if of.startswith("h") and verbose:
                write_warning(_('Instead searching %(x_name)s.', x_name=str([bsu_o, bsu_p, bsu_f, bsu_m])), req=req)
        try:
            basic_search_unit_hitset = search_unit(bsu_p, bsu_f, bsu_m, wl)
        except InvenioWebSearchWildcardLimitError as excp:
            basic_search_unit_hitset = excp.res
            if of.startswith("h"):
                write_warning(_("Search term too generic, displaying only partial results..."), req=req)
        # FIXME: print warning if we use native full-text indexing
        if bsu_f == 'fulltext' and bsu_m != 'w' and of.startswith('h') and not CFG_SOLR_URL:
            write_warning(_("No phrase index available for fulltext yet, looking for word combination..."), req=req)

        # check that the user is allowed to search with this tag
        # if he/she tries it
        if bsu_f and len(bsu_f) > 1 and bsu_f[0].isdigit() and bsu_f[1].isdigit():
            for htag in myhiddens:
                ltag = len(htag)
                samelenfield = bsu_f[0:ltag]
                if samelenfield == htag:  # user searches by a hidden tag
                    # we won't show you anything..
                    basic_search_unit_hitset = intbitset()
                    if verbose >= 9 and of.startswith("h"):
                        write_warning("Pattern %s hitlist omitted since it queries in a hidden tag %s" %
                                      (cgi.escape(repr(bsu_p)), repr(myhiddens)), req=req)
                    display_nearest_terms_box = False  # ..and stop spying, too.
        if verbose >= 9 and of.startswith("h"):
            write_warning("Search stage 1: pattern %s gave hitlist %s" % (cgi.escape(bsu_p), basic_search_unit_hitset), req=req)
        if len(basic_search_unit_hitset) > 0 or \
           ap < 1 or \
           bsu_o == "|" or \
           ((idx_unit+1) < len(basic_search_units) and basic_search_units[idx_unit+1][0] == "|"):
            # this basic search unit is retained: the hitset is non-empty,
            # or alternative patterns are disabled, or the unit is joined
            # to its neighbour by an OR operator:
            basic_search_units_hitsets.append(basic_search_unit_hitset)
req.write(""" This page describes how to use Authority Control in Invenio from a user's perspective For an explanation of how to configure Authority Control in Invenio, cf. _(BibAuthority Admin Guide)_. When adding an authority record to INVENIO, whether by uploading a MARC record manually or by adding a new record in BibEdit, it is important to add two separate '980' fields to the record.
-The first field will contain the value “AUTHORITY” in the $a subfield.
-This is to tell INVENIO that this is an authority record.
-The second '980' field will likewise contain a value in its $a subfield, only this time you must specify what kind of authority record it is.
-Typically an author authority record would contain the term “AUTHOR”, an institution would contain “INSTITUTION” etc.
+ When adding an authority record to INVENIO, whether by uploading a MARC record manually or by adding a new record in BibEdit, it is important to add two separate '980' fields to the record.
+The first field will contain the value “AUTHORITY” in the $a subfield.
+This is to tell INVENIO that this is an authority record.
+The second '980' field will likewise contain a value in its $a subfield, only this time you must specify what kind of authority record it is.
+Typically an author authority record would contain the term “AUTHOR”, an institute would contain “INSTITUTE” etc.
It is important to communicate these exact terms to the INVENIO admin who will configure how INVENIO handles each of these authority record types for the individual INVENIO modules.
Further, you must add a unique control number to each authority record. In Invenio, this number must be contained in the 035__ $a field of the authority record; it consists of the MARC code (enclosed in parentheses) of the organization originating the system control number, followed immediately by the number, e.g. "(SzGeCERN)abc123".
Cf. 035 - System Control Number on the MARC 21 reference page.
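Putting the two '980' fields and the control number together, a minimal author authority record could look like this in MARCXML (a sketch; the name and control number are illustrative):

<record>
  <datafield tag="035" ind1=" " ind2=" ">
    <subfield code="a">(SzGeCERN)abc123</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Ellis, J.</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">AUTHORITY</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">AUTHOR</subfield>
  </datafield>
</record>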
When creating links between MARC records, we must distinguish two cases: 1) references from bibliographic records towards authority records, and 2) references between authority records.

Example: You have an article (bibliographic record) with author "Ellis" in the 100__ $a field and you want to create a reference to the authority record for this author. This can be done by inserting the control number of this authority record (as contained in the 035__ $a subfield of the authority record) into the $0 subfield of the same 100__ field of the bibliographic record, prefixed by the type of authority record being referenced and a (configurable) separator. A 100 field might look like this:
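(A sketch; the author name is illustrative, and the control numbers match the ones used in this example.)

<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Ellis, J.</subfield>
  <subfield code="0">AUTHOR:(CERN)abc123</subfield>
  <subfield code="0">INSTITUTE:(CERN)xyz789</subfield>
</datafield>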
A 100 field might look like this: In this case, since we are referencing an AUTHOR authority record, the 100__ $0 subfield would read, e.g. "AUTHOR:(CERN)abc123". If you want to reference an institution, e.g. SLAC, as affiliation for an author, you would prefix the control number with "INSTITUTION". You would add another 100__ $0 subfield to the same 100 field and add the value "INSTITUTION:(CERN)xyz789". In this case, since we are referencing an AUTHOR authority record, the 100__ $0 subfield would read, e.g. "AUTHOR:(CERN)abc123". If you want to reference an institute, e.g. SLAC, as affiliation for an author, you would prefix the control number with "INSTITUTE". You would add another 100__ $0 subfield to the same 100 field and add the value "INSTITUTE:(CERN)xyz789". Links between authority records use the 5xx fields. AUTHOR records use the 500 fields, INSTITUTION records the 510 fields and so on, according to the MARC 21 standard.
- Links between authority records use the 5xx fields. AUTHOR records use the 500 fields, INSTITUTE records the 510 fields and so on, according to the MARC 21 standard.
+
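The 100 field example promised above (the same example as in the admin howto; control numbers are illustrative):

    100__ $a Ellis, J.
          $0 AUTHOR:(CERN)abc123
          $u CERN
          $0 INSTITUTE:(CERN)xyz789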
return "<br />No results found."
if search_timed_out:
return "<br />The search engine did not respond in time."
return websearch_templates.tmpl_print_hosted_results(
url_and_engine=url_and_engine,
ln=ln,
of=of,
req=req,
limit=limit,
display_body = em == "" or EM_REPOSITORY["body"] in em,
display_add_to_basket = em == "" or EM_REPOSITORY["basket"] in em)
class BibSortDataCacher(DataCacher):
"""
Cache holding all structures created by bibsort
( _data, data_dict).
"""
def __init__(self, method_name):
self.method_name = method_name
self.method_id = 0
res = None
try:
res = run_sql("""SELECT id from bsrMETHOD where name = %s""", (self.method_name,))
except Exception:
res = None
if res and res[0]:
self.method_id = res[0][0]
else:
self.method_id = 0
def cache_filler():
method_id = self.method_id
alldicts = {}
if self.method_id == 0:
return {}
try:
res_data = run_sql("""SELECT data_dict_ordered from bsrMETHODDATA \
where id_bsrMETHOD = %s""", (method_id,))
res_buckets = run_sql("""SELECT bucket_no, bucket_data from bsrMETHODDATABUCKET\
where id_bsrMETHOD = %s""", (method_id,))
except Exception:
# database problems, return empty cache
return {}
try:
data_dict_ordered = deserialize_via_marshal(res_data[0][0])
except Exception:
data_dict_ordered = {}
alldicts['data_dict_ordered'] = data_dict_ordered # recid: weight
if not res_buckets:
alldicts['bucket_data'] = {}
return alldicts
for row in res_buckets:
bucket_no = row[0]
try:
bucket_data = intbitset(row[1])
except Exception:
bucket_data = intbitset([])
alldicts.setdefault('bucket_data', {})[bucket_no] = bucket_data
return alldicts
def timestamp_verifier():
method_id = self.method_id
res = run_sql("""SELECT last_updated from bsrMETHODDATA where id_bsrMETHOD = %s""", (method_id,))
try:
update_time_methoddata = str(res[0][0])
except IndexError:
update_time_methoddata = '1970-01-01 00:00:00'
res = run_sql("""SELECT max(last_updated) from bsrMETHODDATABUCKET where id_bsrMETHOD = %s""", (method_id,))
try:
update_time_buckets = str(res[0][0])
except IndexError:
update_time_buckets = '1970-01-01 00:00:00'
return max(update_time_methoddata, update_time_buckets)
DataCacher.__init__(self, cache_filler, timestamp_verifier)
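# Illustrative sketch (not part of the patch) of the DataCacher contract used
# above; the names and internals below are assumptions for illustration only.
# cache_filler() rebuilds the whole cache dict in one go, while
# timestamp_verifier() returns the newest relevant DB timestamp, so that
# recreate_cache_if_needed() can compare the two and refill a stale cache:
#
#     class ToyDataCacher(object):
#         def __init__(self, cache_filler, timestamp_verifier):
#             self.cache_filler = cache_filler
#             self.timestamp_verifier = timestamp_verifier
#             self.timestamp = ''
#             self.cache = {}
#             self.recreate_cache_if_needed()
#
#         def recreate_cache_if_needed(self):
#             newest = self.timestamp_verifier()  # 'YYYY-MM-DD HH:MM:SS' sorts lexicographically
#             if not self.timestamp or newest > self.timestamp:
#                 self.cache = self.cache_filler()
#                 self.timestamp = newest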
def get_sorting_methods():
if not CFG_BIBSORT_BUCKETS: # we do not want to use buckets
return {}
try: # make sure the method has some data
res = run_sql("""SELECT m.name, m.definition FROM bsrMETHOD m, bsrMETHODDATA md WHERE m.id = md.id_bsrMETHOD""")
except Exception:
return {}
return dict(res)
sorting_methods = get_sorting_methods()
cache_sorted_data = {}
for sorting_method in sorting_methods:
try:
cache_sorted_data[sorting_method].is_ok_p
except Exception:
cache_sorted_data[sorting_method] = BibSortDataCacher(sorting_method)
def get_tags_from_sort_fields(sort_fields):
"""Given a list of sort_fields, return the tags associated with it and
also the name of the field that has no tags associated, to be able to
display a message to the user."""
tags = []
if not sort_fields:
return [], ''
for sort_field in sort_fields:
if sort_field and str(sort_field[0:2]).isdigit():
# sort_field starts with two digits, so this is probably a MARC tag already
tags.append(sort_field)
else:
# let us check the 'field' table
field_tags = get_field_tags(sort_field)
if field_tags:
tags.extend(field_tags)
else:
return [], sort_field
return tags, ''
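# Illustrative behaviour of get_tags_from_sort_fields (the 'author' mapping is
# an assumption; actual mappings come from the installation's 'field' table):
#     get_tags_from_sort_fields(['100__a'])      -> (['100__a'], '')           # MARC tags pass through
#     get_tags_from_sort_fields(['author'])      -> (['100__a', '700__a'], '') # looked up in the 'field' table
#     get_tags_from_sort_fields(['nosuchfield']) -> ([], 'nosuchfield')        # caller warns the user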
def rank_records(req, rank_method_code, rank_limit_relevance, hitset_global, pattern=None, verbose=0, sort_order='d', of='hb', ln=CFG_SITE_LANG, rg=None, jrec=None, field=''):
"""Initial entry point for ranking records, acts like a dispatcher.
(i) rank_method_code is in bsrMETHOD, bibsort buckets can be used;
(ii)rank_method_code is not in bsrMETHOD, use bibrank;
"""
if CFG_BIBSORT_BUCKETS and sorting_methods:
for sort_method in sorting_methods:
definition = sorting_methods[sort_method]
if definition.startswith('RNK') and \
definition.replace('RNK:','').strip().lower() == string.lower(rank_method_code):
(solution_recs, solution_scores) = sort_records_bibsort(req, hitset_global, sort_method, '', sort_order, verbose, of, ln, rg, jrec, 'r')
#return (solution_recs, solution_scores, '', '', '')
comment = ''
if verbose > 0:
comment = 'find_citations retlist %s' % [[solution_recs[i], solution_scores[i]] for i in range(len(solution_recs))]
return (solution_recs, solution_scores, '(', ')', comment)
return rank_records_bibrank(rank_method_code, rank_limit_relevance, hitset_global, pattern, verbose, field, rg, jrec)
def sort_records(req, recIDs, sort_field='', sort_order='d', sort_pattern='', verbose=0, of='hb', ln=CFG_SITE_LANG, rg=None, jrec=None):
"""Initial entry point for sorting records, acts like a dispatcher.
(i) sort_field is in the bsrMETHOD, and thus, the BibSort has sorted the data for this field, so we can use the cache;
(ii)sort_field is not in bsrMETHOD, and thus, the cache does not contain any information regarding this sorting method"""
_ = gettext_set_language(ln)
#we should return sorted records up to irec_max(exclusive)
dummy, irec_max = get_interval_for_records_to_sort(len(recIDs), jrec, rg)
#calculate the min index on the reverted list
index_min = max(len(recIDs) - irec_max, 0) #just to be sure that the min index is not negative
#bibsort does not handle sort_pattern for now, use bibxxx
if sort_pattern:
return sort_records_bibxxx(req, recIDs, None, sort_field, sort_order, sort_pattern, verbose, of, ln, rg, jrec)
use_sorting_buckets = True
if not CFG_BIBSORT_BUCKETS or not sorting_methods: #ignore the use of buckets, use old-fashioned sorting
use_sorting_buckets = False
if not sort_field:
if use_sorting_buckets:
return sort_records_bibsort(req, recIDs, 'latest first', sort_field, sort_order, verbose, of, ln, rg, jrec)
else:
return recIDs[index_min:]
sort_fields = string.split(sort_field, ",")
if len(sort_fields) == 1:
# we have only one sorting_field, check if it is treated by BibSort
for sort_method in sorting_methods:
definition = sorting_methods[sort_method]
if use_sorting_buckets and \
((definition.startswith('FIELD') and \
definition.replace('FIELD:','').strip().lower() == string.lower(sort_fields[0])) or \
sort_method == sort_fields[0]):
#use BibSort
return sort_records_bibsort(req, recIDs, sort_method, sort_field, sort_order, verbose, of, ln, rg, jrec)
#deduce sorting MARC tag out of the 'sort_field' argument:
tags, error_field = get_tags_from_sort_fields(sort_fields)
if error_field:
if use_sorting_buckets:
return sort_records_bibsort(req, recIDs, 'latest first', sort_field, sort_order, verbose, of, ln, rg, jrec)
else:
if of.startswith('h'):
write_warning(_("Sorry, %(x_name)s does not seem to be a valid sort option. The records will not be sorted.", x_name=cgi.escape(error_field)), "Error", req=req)
return recIDs[index_min:]
if tags:
for sort_method in sorting_methods:
definition = sorting_methods[sort_method]
if definition.startswith('MARC') \
and definition.replace('MARC:','').strip().split(',') == tags \
and use_sorting_buckets:
#this list of tags has a designated method in BibSort, so use it
return sort_records_bibsort(req, recIDs, sort_method, sort_field, sort_order, verbose, of, ln, rg, jrec)
#we do not have this sort_field in the BibSort tables -> do the old-fashioned sorting
return sort_records_bibxxx(req, recIDs, tags, sort_field, sort_order, sort_pattern, verbose, of, ln, rg, jrec)
return recIDs[index_min:]
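# Dispatch summary for sort_records (illustrative recap of the code above):
#   - sort_pattern given                          -> sort_records_bibxxx (no cache)
#   - no sort_field and buckets usable            -> sort_records_bibsort('latest first')
#   - sort_field/tags known to BibSort + buckets  -> sort_records_bibsort (cached)
#   - anything else                               -> sort_records_bibxxx fallback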
def sort_records_bibsort(req, recIDs, sort_method, sort_field='', sort_order='d', verbose=0, of='hb', ln=CFG_SITE_LANG, rg=None, jrec=None, sort_or_rank = 's'):
"""This function orders the recIDs list, based on a sorting method(sort_field) using the BibSortDataCacher for speed"""
_ = gettext_set_language(ln)
#sanity check
if sort_method not in sorting_methods:
if sort_or_rank == 'r':
return rank_records_bibrank(sort_method, 0, recIDs, None, verbose)
else:
return sort_records_bibxxx(req, recIDs, None, sort_field, sort_order, '', verbose, of, ln, rg, jrec)
if verbose >= 3 and of.startswith('h'):
write_warning("Sorting (using BibSort cache) by method %s (definition %s)." \
% (cgi.escape(repr(sort_method)), cgi.escape(repr(sorting_methods[sort_method]))), req=req)
#we should return sorted records up to irec_max(exclusive)
dummy, irec_max = get_interval_for_records_to_sort(len(recIDs), jrec, rg)
solution = intbitset([])
input_recids = intbitset(recIDs)
cache_sorted_data[sort_method].recreate_cache_if_needed()
sort_cache = cache_sorted_data[sort_method].cache
bucket_numbers = sort_cache['bucket_data'].keys()
#check if all buckets have been constructed
if len(bucket_numbers) != CFG_BIBSORT_BUCKETS:
if verbose > 3 and of.startswith('h'):
write_warning("Not all buckets have been constructed.. switching to old fashion sorting.", req=req)
if sort_or_rank == 'r':
return rank_records_bibrank(sort_method, 0, recIDs, None, verbose)
else:
return sort_records_bibxxx(req, recIDs, None, sort_field, sort_order, '', verbose, of, ln, rg, jrec)
if sort_order == 'd':
bucket_numbers.reverse()
for bucket_no in bucket_numbers:
solution.union_update(input_recids & sort_cache['bucket_data'][bucket_no])
if len(solution) >= irec_max:
break
dict_solution = {}
missing_records = []
for recid in solution:
try:
dict_solution[recid] = sort_cache['data_dict_ordered'][recid]
except KeyError:
#recid is in buckets, but not in the bsrMETHODDATA,
#maybe because the value has been deleted, but the change has not yet been propagated to the buckets
missing_records.append(recid)
#check if there are recids that are not in any bucket -> to be added at the end/top, ordered by insertion date
if len(solution) < irec_max:
#some records have not been yet inserted in the bibsort structures
#or, some records have no value for the sort_method
missing_records = sorted(missing_records + list(input_recids.difference(solution)))
#the records need to be sorted in reverse order for the print record function
#the return statement should be equivalent with the following statements
#(these are clearer, but less efficient, since they revert the same list twice)
#sorted_solution = (missing_records + sorted(dict_solution, key=dict_solution.__getitem__, reverse=sort_order=='d'))[:irec_max]
#sorted_solution.reverse()
#return sorted_solution
if sort_method.strip().lower().startswith('latest') and sort_order == 'd':
# if we want to sort the records on their insertion date, add the missing records at the top
solution = sorted(dict_solution, key=dict_solution.__getitem__, reverse=sort_order=='a') + missing_records
else:
solution = missing_records + sorted(dict_solution, key=dict_solution.__getitem__, reverse=sort_order=='a')
#calculate the min index on the reverted list
index_min = max(len(solution) - irec_max, 0) #just to be sure that the min index is not negative
#return all the records up to irec_max, but on the reverted list
if sort_or_rank == 'r':
# we need the recids, with values
return (solution[index_min:], [dict_solution.get(record, 0) for record in solution[index_min:]])
else:
return solution[index_min:]
def sort_records_bibxxx(req, recIDs, tags, sort_field='', sort_order='d', sort_pattern='', verbose=0, of='hb', ln=CFG_SITE_LANG, rg=None, jrec=None):
"""OLD FASHION SORTING WITH NO CACHE, for sort fields that are not run in BibSort
Sort records in 'recIDs' list according sort field 'sort_field' in order 'sort_order'.
If more than one instance of 'sort_field' is found for a given record, try to choose that that is given by
'sort pattern', for example "sort by report number that starts by CERN-PS".
Note that 'sort_field' can be field code like 'author' or MARC tag like '100__a' directly."""
_ = gettext_set_language(ln)
#we should return sorted records up to irec_max(exclusive)
dummy, irec_max = get_interval_for_records_to_sort(len(recIDs), jrec, rg)
#calculate the min index on the reverted list
index_min = max(len(recIDs) - irec_max, 0) #just to be sure that the min index is not negative
## check arguments:
if not sort_field:
return recIDs[index_min:]
if len(recIDs) > CFG_WEBSEARCH_NB_RECORDS_TO_SORT:
if of.startswith('h'):
write_warning(_("Sorry, sorting is allowed on sets of up to %(x_name)d records only. Using default sort order.", x_name=CFG_WEBSEARCH_NB_RECORDS_TO_SORT), "Warning", req=req)
return recIDs[index_min:]
recIDs_dict = {}
recIDs_out = []
if not tags:
# tags have not been computed yet
sort_fields = string.split(sort_field, ",")
tags, error_field = get_tags_from_sort_fields(sort_fields)
if error_field:
if of.startswith('h'):
write_warning(_("Sorry, %(x_name)s does not seem to be a valid sort option. The records will not be sorted.", x_name=cgi.escape(error_field)), "Error", req=req)
return recIDs[index_min:]
if verbose >= 3 and of.startswith('h'):
write_warning("Sorting by tags %s." % cgi.escape(repr(tags)), req=req)
if sort_pattern:
write_warning("Sorting preferentially by %s." % cgi.escape(sort_pattern), req=req)
## check if we have sorting tag defined:
if tags:
# fetch the necessary field values:
for recID in recIDs:
val = "" # will hold value for recID according to which sort
vals = [] # will hold all values found in sorting tag for recID
for tag in tags:
if CFG_CERN_SITE and tag == '773__c':
# CERN hack: journal sorting
# 773__c contains page numbers, e.g. 3-13, and we want to sort by 3, and numerically:
vals.extend(["%050s" % x.split("-", 1)[0] for x in get_fieldvalues(recID, tag)])
else:
vals.extend(get_fieldvalues(recID, tag))
if sort_pattern:
# try to pick that tag value that corresponds to sort pattern
bingo = 0
for v in vals:
if v.lower().startswith(sort_pattern.lower()): # bingo!
bingo = 1
val = v
break
if not bingo: # sort_pattern not present, so add other vals after spaces
val = sort_pattern + " " + string.join(vals)
else:
# no sort pattern defined, so join them all together
val = string.join(vals)
val = strip_accents(val.lower()) # sort values regardless of accents and case
if val in recIDs_dict:
recIDs_dict[val].append(recID)
else:
recIDs_dict[val] = [recID]
# sort them:
recIDs_dict_keys = recIDs_dict.keys()
recIDs_dict_keys.sort()
# now that keys are sorted, create output array:
for k in recIDs_dict_keys:
for s in recIDs_dict[k]:
recIDs_out.append(s)
# ascending or descending?
if sort_order == 'a':
recIDs_out.reverse()
# okay, we are done
# return only up to the maximum that we need to sort
if len(recIDs_out) != len(recIDs):
dummy, irec_max = get_interval_for_records_to_sort(len(recIDs_out), jrec, rg)
index_min = max(len(recIDs_out) - irec_max, 0) #just to be sure that the min index is not negative
return recIDs_out[index_min:]
else:
# good, no sort needed
return recIDs[index_min:]
def get_interval_for_records_to_sort(nb_found, jrec=None, rg=None):
"""calculates in which interval should the sorted records be
a value of 'rg=-9999' means to print all records: to be used with care."""
if not jrec:
jrec = 1
if not rg:
#return all
return jrec-1, nb_found
if rg == -9999: # print all records
rg = nb_found
else:
rg = abs(rg)
if jrec < 1: # sanity checks
jrec = 1
if jrec > nb_found:
jrec = max(nb_found-rg+1, 1)
# will sort records from irec_min to irec_max excluded
irec_min = jrec - 1
irec_max = irec_min + rg
if irec_min < 0:
irec_min = 0
if irec_max > nb_found:
irec_max = nb_found
return irec_min, irec_max
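# Worked examples for get_interval_for_records_to_sort (0-based, half-open):
#   nb_found=100, jrec=11,   rg=10    -> (10, 20)   # records 11..20
#   nb_found=5,   jrec=10,   rg=10    -> (0, 5)     # jrec beyond the end snaps back
#   nb_found=100, jrec=1,    rg=-9999 -> (0, 100)   # -9999 means "all records"
#   nb_found=100, jrec=None, rg=None  -> (0, 100)   # no paging requested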
def print_records(req, recIDs, jrec=1, rg=CFG_WEBSEARCH_DEF_RECORDS_IN_GROUPS, format='hb', ot='', ln=CFG_SITE_LANG,
relevances=[], relevances_prologue="(", relevances_epilogue="%%)",
decompress=zlib.decompress, search_pattern='', print_records_prologue_p=True,
print_records_epilogue_p=True, verbose=0, tab='', sf='', so='d', sp='',
rm='', em=''):
"""
Prints list of records 'recIDs' formatted according to 'format' in
groups of 'rg' starting from 'jrec'.
Assumes that the input list 'recIDs' is sorted in reverse order,
so it counts records from tail to head.
A value of 'rg=-9999' means to print all records: to be used with care.
Print also list of RELEVANCES for each record (if defined), in
between RELEVANCE_PROLOGUE and RELEVANCE_EPILOGUE.
Print prologue and/or epilogue specific to 'format' if
'print_records_prologue_p' and/or print_records_epilogue_p' are
True.
'sf' is sort field and 'rm' is ranking method that are passed here
only for proper linking purposes: e.g. when a certain ranking
method or a certain sort field was selected, keep it selected in
any dynamic search links that may be printed.
"""
if em != "" and EM_REPOSITORY["body"] not in em:
return
# load the right message language
_ = gettext_set_language(ln)
# sanity checking:
if req is None:
return
# get user_info (for formatting based on user)
if isinstance(req, cStringIO.OutputType):
user_info = {}
else:
user_info = collect_user_info(req)
if len(recIDs):
nb_found = len(recIDs)
if rg == -9999: # print all records
rg = nb_found
else:
rg = abs(rg)
if jrec < 1: # sanity checks
jrec = 1
if jrec > nb_found:
jrec = max(nb_found-rg+1, 1)
# will print records from irec_max to irec_min excluded:
irec_max = nb_found - jrec
irec_min = nb_found - jrec - rg
if irec_min < 0:
irec_min = -1
if irec_max >= nb_found:
irec_max = nb_found - 1
#req.write("%s:%d-%d" % (recIDs, irec_min, irec_max))
if format.startswith('x'):
# print header if needed
if print_records_prologue_p:
print_records_prologue(req, format)
# print records
recIDs_to_print = [recIDs[x] for x in range(irec_max, irec_min, -1)]
if ot:
# asked to print some filtered fields only, so call print_record() on the fly:
for irec in range(irec_max, irec_min, -1):
x = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose, sf=sf, so=so, sp=sp, rm=rm)
req.write(x)
if x:
req.write('\n')
else:
format_records(recIDs_to_print,
format,
ln=ln,
search_pattern=search_pattern,
record_separator="\n",
user_info=user_info,
req=req)
# print footer if needed
if print_records_epilogue_p:
print_records_epilogue(req, format)
elif format.startswith('t') or str(format[0:3]).isdigit():
# we are doing plain text output:
for irec in range(irec_max, irec_min, -1):
x = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose, sf=sf, so=so, sp=sp, rm=rm)
req.write(x)
if x:
req.write('\n')
elif format == 'excel':
recIDs_to_print = [recIDs[x] for x in range(irec_max, irec_min, -1)]
create_excel(recIDs=recIDs_to_print, req=req, ln=ln, ot=ot, user_info=user_info)
else:
# we are doing HTML output:
if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"):
# portfolio and on-the-fly formats:
for irec in range(irec_max, irec_min, -1):
req.write(print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose, sf=sf, so=so, sp=sp, rm=rm))
elif format.startswith("hb"):
# HTML brief format:
display_add_to_basket = True
if user_info:
if user_info['email'] == 'guest':
if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS > 4:
display_add_to_basket = False
else:
if not user_info['precached_usebaskets']:
display_add_to_basket = False
if em != "" and EM_REPOSITORY["basket"] not in em:
display_add_to_basket = False
req.write(websearch_templates.tmpl_record_format_htmlbrief_header(
ln = ln))
for irec in range(irec_max, irec_min, -1):
row_number = jrec+irec_max-irec
recid = recIDs[irec]
if relevances and relevances[irec]:
relevance = relevances[irec]
else:
relevance = ''
record = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose, sf=sf, so=so, sp=sp, rm=rm)
req.write(websearch_templates.tmpl_record_format_htmlbrief_body(
ln = ln,
recid = recid,
row_number = row_number,
relevance = relevance,
record = record,
relevances_prologue = relevances_prologue,
relevances_epilogue = relevances_epilogue,
display_add_to_basket = display_add_to_basket
))
req.write(websearch_templates.tmpl_record_format_htmlbrief_footer(
ln = ln,
display_add_to_basket = display_add_to_basket))
elif format.startswith("hd"):
# HTML detailed format:
for irec in range(irec_max, irec_min, -1):
if record_exists(recIDs[irec]) == -1:
write_warning(_("The record has been deleted."), req=req)
merged_recid = get_merged_recid(recIDs[irec])
if merged_recid:
write_warning(_("The record %(x_rec)d replaces it.", x_rec=merged_recid), req=req)
continue
unordered_tabs = get_detailed_page_tabs(get_colID(guess_primary_collection_of_a_record(recIDs[irec])),
recIDs[irec], ln=ln)
ordered_tabs_id = [(tab_id, values['order']) for (tab_id, values) in iteritems(unordered_tabs)]
ordered_tabs_id.sort(lambda x, y: cmp(x[1], y[1]))
link_ln = ''
if ln != CFG_SITE_LANG:
link_ln = '?ln=%s' % ln
recid = recIDs[irec]
recid_to_display = recid # Record ID used to build the URL.
if CFG_WEBSEARCH_USE_ALEPH_SYSNOS:
try:
recid_to_display = get_fieldvalues(recid,
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG)[0]
except IndexError:
# No external sysno is available, keep using
# internal recid.
pass
tabs = [(unordered_tabs[tab_id]['label'], \
'%s/%s/%s/%s%s' % (CFG_SITE_URL, CFG_SITE_RECORD, recid_to_display, tab_id, link_ln), \
tab_id == tab,
unordered_tabs[tab_id]['enabled']) \
for (tab_id, order) in ordered_tabs_id
if unordered_tabs[tab_id]['visible'] == True]
tabs_counts = get_detailed_page_tabs_counts(recid)
citedbynum = tabs_counts['Citations']
references = tabs_counts['References']
discussions = tabs_counts['Discussions']
# load content
if tab == 'usage':
req.write(webstyle_templates.detailed_record_container_top(recIDs[irec],
tabs,
ln,
citationnum=citedbynum,
referencenum=references,
discussionnum=discussions))
r = calculate_reading_similarity_list(recIDs[irec], "downloads")
downloadsimilarity = None
downloadhistory = None
#if r:
# downloadsimilarity = r
if CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS:
downloadhistory = create_download_history_graph_and_box(recIDs[irec], ln)
r = calculate_reading_similarity_list(recIDs[irec], "pageviews")
viewsimilarity = None
if r: viewsimilarity = r
content = websearch_templates.tmpl_detailed_record_statistics(recIDs[irec],
ln,
downloadsimilarity=downloadsimilarity,
downloadhistory=downloadhistory,
viewsimilarity=viewsimilarity)
req.write(content)
req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec],
tabs,
ln))
elif tab == 'citations':
recid = recIDs[irec]
req.write(webstyle_templates.detailed_record_container_top(recid,
tabs,
ln,
citationnum=citedbynum,
referencenum=references,
discussionnum=discussions))
req.write(websearch_templates.tmpl_detailed_record_citations_prologue(recid, ln))
# Citing
citinglist = calculate_cited_by_list(recid)
req.write(websearch_templates.tmpl_detailed_record_citations_citing_list(recid,
ln,
citinglist,
sf=sf,
so=so,
sp=sp,
rm=rm))
# Self-cited
selfcited = get_self_cited_by(recid)
req.write(websearch_templates.tmpl_detailed_record_citations_self_cited(recid,
ln, selfcited=selfcited, citinglist=citinglist))
# Co-cited
s = calculate_co_cited_with_list(recid)
cociting = None
if s:
cociting = s
req.write(websearch_templates.tmpl_detailed_record_citations_co_citing(recid,
ln,
cociting=cociting))
# Citation history, if needed
citationhistory = None
if citinglist:
citationhistory = create_citation_history_graph_and_box(recid, ln)
#debug
if verbose > 3:
write_warning("Citation graph debug: " + \
str(len(citationhistory)), req=req)
req.write(websearch_templates.tmpl_detailed_record_citations_citation_history(recid, ln, citationhistory))
req.write(websearch_templates.tmpl_detailed_record_citations_epilogue(recid, ln))
req.write(webstyle_templates.detailed_record_container_bottom(recid,
tabs,
ln))
elif tab == 'references':
req.write(webstyle_templates.detailed_record_container_top(recIDs[irec],
tabs,
ln,
citationnum=citedbynum,
referencenum=references,
discussionnum=discussions))
req.write(format_record(recIDs[irec], 'HDREF', ln=ln, user_info=user_info, verbose=verbose))
req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec],
tabs,
ln))
elif tab == 'keywords':
from invenio.bibclassify_webinterface import \
record_get_keywords, write_keywords_body, \
generate_keywords
from invenio.webinterface_handler import wash_urlargd
form = req.form
argd = wash_urlargd(form, {
'generate': (str, 'no'),
'sort': (str, 'occurrences'),
'type': (str, 'tagcloud'),
'numbering': (str, 'off'),
})
recid = recIDs[irec]
req.write(webstyle_templates.detailed_record_container_top(recid,
tabs, ln, citationnum=citedbynum, referencenum=references))
if argd['generate'] == 'yes':
# The user asked to generate the keywords.
keywords = generate_keywords(req, recid, argd)
else:
# Get the keywords contained in the MARC.
keywords = record_get_keywords(recid, argd)
if argd['sort'] == 'related' and not keywords:
req.write('You may want to run BibIndex.')
# Output the keywords or the generate button.
write_keywords_body(keywords, req, recid, argd)
req.write(webstyle_templates.detailed_record_container_bottom(recid,
tabs, ln))
elif tab == 'plots':
req.write(webstyle_templates.detailed_record_container_top(recIDs[irec],
tabs,
ln))
content = websearch_templates.tmpl_record_plots(recID=recIDs[irec],
ln=ln)
req.write(content)
req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec],
tabs,
ln))
else:
# Metadata tab
req.write(webstyle_templates.detailed_record_container_top(recIDs[irec],
tabs,
ln,
show_short_rec_p=False,
citationnum=citedbynum, referencenum=references,
discussionnum=discussions))
creationdate = None
modificationdate = None
if record_exists(recIDs[irec]) == 1:
creationdate = get_creation_date(recIDs[irec])
modificationdate = get_modification_date(recIDs[irec])
content = print_record(recIDs[irec], format, ot, ln,
search_pattern=search_pattern,
user_info=user_info, verbose=verbose,
sf=sf, so=so, sp=sp, rm=rm)
content = websearch_templates.tmpl_detailed_record_metadata(
recID = recIDs[irec],
ln = ln,
format = format,
creationdate = creationdate,
modificationdate = modificationdate,
content = content)
# display of the next-hit/previous-hit/back-to-search links
# on the detailed record pages
content += websearch_templates.tmpl_display_back_to_search(req,
recIDs[irec],
ln)
req.write(content)
req.write(webstyle_templates.detailed_record_container_bottom(recIDs[irec],
tabs,
ln,
creationdate=creationdate,
modificationdate=modificationdate,
show_short_rec_p=False))
if len(tabs) > 0:
# Add the mini box at bottom of the page
if CFG_WEBCOMMENT_ALLOW_REVIEWS:
from invenio.modules.comments.api import get_mini_reviews
reviews = get_mini_reviews(recid = recIDs[irec], ln=ln)
else:
reviews = ''
actions = format_record(recIDs[irec], 'HDACT', ln=ln, user_info=user_info, verbose=verbose)
files = format_record(recIDs[irec], 'HDFILE', ln=ln, user_info=user_info, verbose=verbose)
req.write(webstyle_templates.detailed_record_mini_panel(recIDs[irec],
ln,
format,
files=files,
reviews=reviews,
actions=actions))
else:
# Other formats
for irec in range(irec_max, irec_min, -1):
req.write(print_record(recIDs[irec], format, ot, ln,
search_pattern=search_pattern,
user_info=user_info, verbose=verbose,
sf=sf, so=so, sp=sp, rm=rm))
else:
write_warning(_("Use different search terms."), req=req)
def print_records_prologue(req, format, cc=None):
"""
Print the appropriate prologue for list of records in the given
format.
"""
prologue = "" # no prologue needed for HTML or Text formats
if format.startswith('xm'):
prologue = websearch_templates.tmpl_xml_marc_prologue()
elif format.startswith('xn'):
prologue = websearch_templates.tmpl_xml_nlm_prologue()
elif format.startswith('xw'):
prologue = websearch_templates.tmpl_xml_refworks_prologue()
elif format.startswith('xr'):
prologue = websearch_templates.tmpl_xml_rss_prologue(cc=cc)
elif format.startswith('xe8x'):
prologue = websearch_templates.tmpl_xml_endnote_8x_prologue()
elif format.startswith('xe'):
prologue = websearch_templates.tmpl_xml_endnote_prologue()
elif format.startswith('xo'):
prologue = websearch_templates.tmpl_xml_mods_prologue()
elif format.startswith('xp'):
prologue = websearch_templates.tmpl_xml_podcast_prologue(cc=cc)
elif format.startswith('x'):
prologue = websearch_templates.tmpl_xml_default_prologue()
req.write(prologue)
def print_records_epilogue(req, format):
"""
Print the appropriate epilogue for list of records in the given
format.
"""
epilogue = "" # no epilogue needed for HTML or Text formats
if format.startswith('xm'):
epilogue = websearch_templates.tmpl_xml_marc_epilogue()
elif format.startswith('xn'):
epilogue = websearch_templates.tmpl_xml_nlm_epilogue()
elif format.startswith('xw'):
epilogue = websearch_templates.tmpl_xml_refworks_epilogue()
elif format.startswith('xr'):
epilogue = websearch_templates.tmpl_xml_rss_epilogue()
elif format.startswith('xe8x'):
epilogue = websearch_templates.tmpl_xml_endnote_8x_epilogue()
elif format.startswith('xe'):
epilogue = websearch_templates.tmpl_xml_endnote_epilogue()
elif format.startswith('xo'):
epilogue = websearch_templates.tmpl_xml_mods_epilogue()
elif format.startswith('xp'):
epilogue = websearch_templates.tmpl_xml_podcast_epilogue()
elif format.startswith('x'):
epilogue = websearch_templates.tmpl_xml_default_epilogue()
req.write(epilogue)
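# Usage note (illustrative): print_records_prologue/epilogue bracket a whole
# result list, not a single record, e.g. for MARCXML output:
#     print_records_prologue(req, 'xm')   # e.g. the <collection ...> opening
#     ... print_record(...) for each hit ...
#     print_records_epilogue(req, 'xm')   # the matching closing tag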
def get_record(recid):
"""Directly the record object corresponding to the recid."""
if CFG_BIBUPLOAD_SERIALIZE_RECORD_STRUCTURE:
value = run_sql("SELECT value FROM bibfmt WHERE id_bibrec=%s AND FORMAT='recstruct'", (recid, ))
if value:
try:
return deserialize_via_marshal(value[0][0])
except Exception:
### In case of corruption, let's rebuild it!
pass
return create_record(print_record(recid, 'xm'))[0]
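# Usage note (illustrative): get_record(recid) prefers the marshal-serialized
# 'recstruct' cache in the bibfmt table and silently falls back to re-parsing
# the 'xm' MARCXML when that cache is absent or corrupted:
#     record = get_record(10)   # BibRecord structure (a dict keyed by MARC tag)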
def print_record(recID, format='hb', ot='', ln=CFG_SITE_LANG, decompress=zlib.decompress,
search_pattern=None, user_info=None, verbose=0, sf='', so='d',
sp='', rm='', brief_links=True):
"""
Prints record 'recID' formatted according to 'format'.
'sf' is sort field and 'rm' is ranking method that are passed here
only for proper linking purposes: e.g. when a certain ranking
method or a certain sort field was selected, keep it selected in
any dynamic search links that may be printed.
"""
if format == 'recstruct':
return get_record(recID)
_ = gettext_set_language(ln)
display_claim_this_paper = False
try:
display_claim_this_paper = user_info["precached_viewclaimlink"]
except (KeyError, TypeError):
display_claim_this_paper = False
#check from user information if the user has the right to see hidden fields/tags in the
#records as well
can_see_hidden = False
if user_info:
can_see_hidden = user_info.get('precached_canseehiddenmarctags', False)
out = ""
# sanity check:
record_exist_p = record_exists(recID)
if record_exist_p == 0: # doesn't exist
return out
# New Python BibFormat procedure for formatting
# Old procedure follows further below
# We must still check some special formats, but these
# should disappear when BibFormat improves.
if not (CFG_BIBFORMAT_USE_OLD_BIBFORMAT \
or format.lower().startswith('t') \
or format.lower().startswith('hm') \
or str(format[0:3]).isdigit() \
or ot):
# Unspecified format is hd
if format == '':
format = 'hd'
if record_exist_p == -1 and get_output_format_content_type(format) == 'text/html':
# HTML output displays a default value for deleted records.
# Other formats have to deal with it.
out += _("The record has been deleted.")
# was record deleted-but-merged ?
merged_recid = get_merged_recid(recID)
if merged_recid:
out += ' ' + _("The record %(x_rec)d replaces it.", x_rec=merged_recid)
else:
out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
# at the end of HTML brief mode, print the "Detailed record" functionality:
if brief_links and format.lower().startswith('hb') and \
format.lower() != 'hb_p':
out += websearch_templates.tmpl_print_record_brief_links(ln=ln,
recID=recID,
sf=sf,
so=so,
sp=sp,
rm=rm,
display_claim_link=display_claim_this_paper)
return out
# Old PHP BibFormat procedure for formatting
# print record opening tags, if needed:
if format == "marcxml" or format == "oai_dc":
out += " " + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"], can_see_hidden)) + "
"
+ out += "\n" + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"], can_see_hidden)) + "
"
else:
- out += "\n" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot, can_see_hidden)) + "
"
+ out += "\n" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot, can_see_hidden)) + "
"
elif format.startswith("h") and ot:
## user directly asked for some tags to be displayed only
if record_exist_p == -1:
out += "\n" + get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"], can_see_hidden) + "
"
else:
out += "\n" + get_fieldvalues_alephseq_like(recID, ot, can_see_hidden) + "
"
elif format == "hd":
# HTML detailed format
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
# look for detailed format existence:
query = "SELECT value FROM bibfmt WHERE id_bibrec=%s AND format=%s"
res = run_sql(query, (recID, format), 1)
if res:
# record 'recID' is formatted in 'format', so print it
out += "%s" % decompress(res[0][0])
else:
# record 'recID' is not formatted in 'format', so try to call BibFormat on the fly or use default format:
out_record_in_format = call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
if out_record_in_format:
out += out_record_in_format
else:
out += websearch_templates.tmpl_print_record_detailed(
ln = ln,
recID = recID,
)
elif format.startswith("hb_") or format.startswith("hd_"):
# underscore means that HTML brief/detailed formats should be called on-the-fly; suitable for testing formats
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
elif format.startswith("hx"):
# BibTeX format, called on the fly:
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
elif format.startswith("hs"):
# for citation/download similarity navigation links:
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
out += '<a href="%s">' % websearch_templates.build_search_url(recid=recID, ln=ln)
# firstly, title:
titles = get_fieldvalues(recID, "245__a")
if titles:
for title in titles:
out += "<strong>%s</strong>" % title
else:
# usual title not found, try conference title:
titles = get_fieldvalues(recID, "111__a")
if titles:
for title in titles:
out += "<strong>%s</strong>" % title
else:
# just print record ID:
out += "<strong>%s %d</strong>" % (get_field_i18nname("record ID", ln, False), recID)
out += "</a>"
# secondly, authors:
authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a")
if authors:
out += " - %s" % authors[0]
if len(authors) > 1:
out += " et al"
# thirdly publication info:
publinfos = get_fieldvalues(recID, "773__s")
if not publinfos:
publinfos = get_fieldvalues(recID, "909C4s")
if not publinfos:
publinfos = get_fieldvalues(recID, "037__a")
if not publinfos:
publinfos = get_fieldvalues(recID, "088__a")
if publinfos:
out += " - %s" % publinfos[0]
else:
# fourthly publication year (if not publication info):
years = get_fieldvalues(recID, "773__y")
if not years:
years = get_fieldvalues(recID, "909C4y")
if not years:
years = get_fieldvalues(recID, "260__c")
if years:
out += " (%s)" % years[0]
else:
# HTML brief format by default
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
query = "SELECT value FROM bibfmt WHERE id_bibrec=%s AND format=%s"
res = run_sql(query, (recID, format))
if res:
# record 'recID' is formatted in 'format', so print it
out += "%s" % decompress(res[0][0])
else:
# record 'recID' is not formatted in 'format', so try to call BibFormat on the fly or use default format:
if CFG_WEBSEARCH_CALL_BIBFORMAT:
out_record_in_format = call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
if out_record_in_format:
out += out_record_in_format
else:
out += websearch_templates.tmpl_print_record_brief(
ln = ln,
recID = recID,
)
else:
out += websearch_templates.tmpl_print_record_brief(
ln = ln,
recID = recID,
)
# at the end of HTML brief mode, print the "Detailed record" functionality:
if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"):
pass # do nothing for portfolio and on-the-fly formats
else:
out += websearch_templates.tmpl_print_record_brief_links(ln=ln,
recID=recID,
sf=sf,
so=so,
sp=sp,
rm=rm,
display_claim_link=display_claim_this_paper)
# print record closing tags, if needed:
if format == "marcxml" or format == "oai_dc":
out += " Search Cache
"
req.write(out)
# show collection reclist cache:
out = "Collection reclist cache
"
out += "- collection table last updated: %s" % get_table_update_time('collection')
out += "
- reclist cache timestamp: %s" % collection_reclist_cache.timestamp
out += "
- reclist cache contents:"
out += ""
for coll in collection_reclist_cache.cache.keys():
if collection_reclist_cache.cache[coll]:
out += "%s (%d)
"
req.write(out)
# show field i18nname cache:
out = "
" % (coll, len(collection_reclist_cache.cache[coll]))
out += "Field I18N names cache
"
out += "- fieldname table last updated: %s" % get_table_update_time('fieldname')
out += "
- i18nname cache timestamp: %s" % field_i18nname_cache.timestamp
out += "
- i18nname cache contents:"
out += ""
for field in field_i18nname_cache.cache.keys():
for ln in field_i18nname_cache.cache[field].keys():
out += "%s, %s = %s
"
req.write(out)
# show collection i18nname cache:
out = "
" % (field, ln, field_i18nname_cache.cache[field][ln])
out += "Collection I18N names cache
"
out += "- collectionname table last updated: %s" % get_table_update_time('collectionname')
out += "
- i18nname cache timestamp: %s" % collection_i18nname_cache.timestamp
out += "
- i18nname cache contents:"
out += ""
for coll in collection_i18nname_cache.cache.keys():
for ln in collection_i18nname_cache.cache[coll].keys():
out += "%s, %s = %s
"
req.write(out)
req.write("")
return "\n"
def perform_request_log(req, date=""):
"""Display search log information for given date."""
req.content_type = "text/html"
req.send_http_header()
req.write("")
req.write("
" % (coll, ln, collection_i18nname_cache.cache[coll][ln])
out += "Search Log
")
if date: # case A: display stats for a day
yyyymmdd = string.atoi(date)
req.write("""")
req.write("
")
else: # case B: display summary stats per day
yyyymm01 = int(time.strftime("%Y%m01", time.localtime()))
yyyymmdd = int(time.strftime("%Y%m%d", time.localtime()))
req.write(""" " % ("No.", "Time", "Pattern", "Field", "Collection", "Number of Hits"))
# read file:
p = os.popen("grep ^%d %s/search.log" % (yyyymmdd, CFG_LOGDIR), 'r')
lines = p.readlines()
p.close()
# process lines:
i = 0
for line in lines:
try:
datetime, dummy_aas, p, f, c, nbhits = string.split(line,"#")
i += 1
req.write("%s %s %s %s %s %s " \
% (i, datetime[8:10], datetime[10:12], datetime[12:], p, f, c, nbhits))
except:
pass # ignore eventual wrong log lines
req.write("#%d %s:%s:%s %s %s %s %s """)
req.write("
")
req.write("")
return "\n"
def get_all_field_values(tag):
"""
Return all existing values stored for a given tag.
@param tag: the full tag, e.g. 909C0b
@type tag: string
@return: the list of values
@rtype: list of strings
"""
table = 'bib%02dx' % int(tag[:2])
return [row[0] for row in run_sql("SELECT DISTINCT(value) FROM %s WHERE tag=%%s" % table, (tag, ))]
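# Worked example (illustrative): for tag '909C0b' the lookup goes to table
# 'bib90x' (first two digits of the tag) and effectively runs:
#     SELECT DISTINCT(value) FROM bib90x WHERE tag='909C0b'
# returning every distinct value stored for that tag on this installation.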
def get_most_popular_field_values(recids, tags, exclude_values=None, count_repetitive_values=True, split_by=0):
"""
Analyze RECIDS and look for TAGS and return most popular values
and the frequency with which they occur sorted according to
descending frequency.
If a value is found in EXCLUDE_VALUES, then do not count it.
If COUNT_REPETITIVE_VALUES is True, then we count every occurrence
of value in the tags. If False, then we count the value only once
regardless of the number of times it may appear in a record.
(But, if the same value occurs in another record, we count it, of
course.)
@return: list of tuples containing tag and its frequency
Example:
>>> get_most_popular_field_values(range(11,20), '980__a')
[('PREPRINT', 10), ('THESIS', 7), ...]
>>> get_most_popular_field_values(range(11,20), ('100__a', '700__a'))
[('Ellis, J', 10), ('Ellis, N', 7), ...]
>>> get_most_popular_field_values(range(11,20), ('100__a', '700__a'), ('Ellis, J'))
[('Ellis, N', 7), ...]
"""
def _get_most_popular_field_values_helper_sorter(val1, val2):
"""Compare VAL1 and VAL2 according to, firstly, frequency, then
secondly, alphabetically."""
compared_via_frequencies = cmp(valuefreqdict[val2],
valuefreqdict[val1])
if compared_via_frequencies == 0:
return cmp(val1.lower(), val2.lower())
else:
return compared_via_frequencies
valuefreqdict = {}
## sanity check:
if not exclude_values:
exclude_values = []
if isinstance(tags, str):
tags = (tags,)
## find values to count:
vals_to_count = []
displaytmp = {}
if count_repetitive_values:
# counting technique A: can look up many records at once: (very fast)
for tag in tags:
vals_to_count.extend(get_fieldvalues(recids, tag, sort=False,
split_by=split_by))
else:
# counting technique B: must count record-by-record: (slow)
for recid in recids:
vals_in_rec = []
for tag in tags:
for val in get_fieldvalues(recid, tag, False):
vals_in_rec.append(val)
# do not count repetitive values within this record
# (even across various tags, so need to unify again):
dtmp = {}
for val in vals_in_rec:
dtmp[val.lower()] = 1
displaytmp[val.lower()] = val
vals_in_rec = dtmp.keys()
vals_to_count.extend(vals_in_rec)
## are we to exclude some of found values?
for val in vals_to_count:
if val not in exclude_values:
if val in valuefreqdict:
valuefreqdict[val] += 1
else:
valuefreqdict[val] = 1
## sort by descending frequency of values:
if not CFG_NUMPY_IMPORTABLE:
## original version
out = []
vals = valuefreqdict.keys()
vals.sort(_get_most_popular_field_values_helper_sorter)
for val in vals:
tmpdisplv = ''
if val in displaytmp:
tmpdisplv = displaytmp[val]
else:
tmpdisplv = val
out.append((tmpdisplv, valuefreqdict[val]))
return out
else:
f = [] # frequencies
n = [] # original names
ln = [] # lowercased names
## build lists within one iteration
for (val, freq) in iteritems(valuefreqdict):
f.append(-1 * freq)
if val in displaytmp:
n.append(displaytmp[val])
else:
n.append(val)
ln.append(val.lower())
## sort by frequency (desc) and then by lowercased name.
return [(n[i], -1 * f[i]) for i in numpy.lexsort([ln, f])]
def profile(p="", f="", c=CFG_SITE_NAME):
"""Profile search time."""
import profile
import pstats
profile.run("perform_request_search(p='%s',f='%s', c='%s')" % (p, f, c), "perform_request_search_profile")
p = pstats.Stats("perform_request_search_profile")
p.strip_dirs().sort_stats("cumulative").print_stats()
return 0
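# Example invocation (illustrative): profile the default-collection search for
# pattern 'ellis' and dump cumulative timings to stdout:
#     profile(p='ellis')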
def perform_external_collection_search_with_em(req, current_collection, pattern_list, field,
external_collection, verbosity_level=0, lang=CFG_SITE_LANG,
selected_external_collections_infos=None, em=""):
perform_external_collection_search(req, current_collection, pattern_list, field, external_collection,
verbosity_level, lang, selected_external_collections_infos,
print_overview=em == "" or EM_REPOSITORY["overview"] in em,
print_search_info=em == "" or EM_REPOSITORY["search_info"] in em,
print_see_also_box=em == "" or EM_REPOSITORY["see_also_box"] in em,
print_body=em == "" or EM_REPOSITORY["body"] in em)
@cache.memoize(timeout=5)
def get_fulltext_terms_from_search_pattern(search_pattern):
keywords = []
if search_pattern is not None:
for unit in create_basic_search_units(None, search_pattern.encode('utf-8'), None):
bsu_o, bsu_p, bsu_f, bsu_m = unit[0], unit[1], unit[2], unit[3]
if (bsu_o != '-' and bsu_f in [None, 'fulltext']):
if bsu_m == 'a' and bsu_p.startswith('%') and bsu_p.endswith('%'):
# remove leading and trailing `%' representing partial phrase search
keywords.append(bsu_p[1:-1])
else:
keywords.append(bsu_p)
return keywords
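# Illustrative behaviour (the exact unit tuples come from
# create_basic_search_units and are an assumption here): for a pattern like
# u'boson -author:ellis fulltext:"%heavy ion%"', the negated author unit is
# skipped (bsu_o == '-'), 'boson' is kept as-is (no field restriction), and
# the partial phrase keeps its text with the surrounding '%' stripped,
# yielding ['boson', 'heavy ion'].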
diff --git a/invenio/legacy/webhelp/web/admin/howto/howto-authority.webdoc b/invenio/legacy/webhelp/web/admin/howto/howto-authority.webdoc
index 78a2b1491..9314de0ea 100644
--- a/invenio/legacy/webhelp/web/admin/howto/howto-authority.webdoc
+++ b/invenio/legacy/webhelp/web/admin/howto/howto-authority.webdoc
@@ -1,99 +1,99 @@
## -*- mode: html; coding: utf-8; -*-
## This file is part of Invenio.
-## Copyright (C) 2007, 2008, 2009, 2010, 2011 CERN.
+## Copyright (C) 2007, 2008, 2009, 2010, 2011, 2013 CERN.
##
## Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
" % ("Day", "Number of Queries"))
for day in range(yyyymm01, yyyymmdd + 1):
p = os.popen("grep -c ^%d %s/search.log" % (day, CFG_LOGDIR), 'r')
for line in p.readlines():
req.write("""%s %s """ % \
(day, CFG_SITE_URL, day, line))
p.close()
req.write("%s %s Introduction
How to MARC authority records
1. The 980 field
-2. The 035 field
3. Links between MARC records
3.1 Creating a reference from a bibliographic record
100__ $a Ellis, J.
$0 AUTHOR:(CERN)abc123
$u CERN
- $0 INSTITUTION:(CERN)xyz789
+ $0 INSTITUTE:(CERN)xyz789
-3.2 Creating links between authority records
-
Subfield codes:
$a - Corporate name or jurisdiction name as entry element (NR)
e.g. "SLAC National Accelerator Laboratory" or "European Organization for Nuclear Research"
$w - Control subfield (NR)
'a' - for predecessor
'b' - for successor
't' - for top / parent
$4 - Relationship code (R)
- The control number of the referenced authority record,
+ The control number of the referenced authority record,
e.g. "(CERN)iii000"
-
-Example: You want to add a predecessor to an INSTITUTION authority record. Let's say "Institution A" has control number "(CERN)iii000" and its successor "Institution B" has control number "(CERN)iii001". In order to designate Institution A as predecessor of Institution B, we would add a 510 field to Institution B with a $w value of 'a', a $a value of 'Institution A', and a $4 value of '(CERN)iii000' like this:
+Example: You want to add a predecessor to an INSTITUTE authority record. Let's say "Institute A" has control number "(CERN)iii000" and its successor "Institute B" has control number "(CERN)iii001". In order to designate Institute A as predecessor of Institute B, we would add a 510 field to Institute B with a $w value of 'a', a $a value of 'Institute A', and a $4 value of '(CERN)iii000' like this:
-510__ $a Institution A
+510__ $a Institute A
      $w a
-     $4 INSTITUTION:(CERN)iii000
+     $4 INSTITUTE:(CERN)iii000
All other MARC fields should follow the MARC 21 Format for Authority Data
-Once the authority records have been given the appropriate '980__a' values (cf. above), creating a collection of authority records is no different from creating any other collection in INVENIO. You can simply define a new collection defined by the usual collection query 'collection:AUTHOR' for author authority records, or 'collection:INSTITUTION' for institutions, etc.
+Once the authority records have been given the appropriate '980__a' values (cf. above), creating a collection of authority records is no different from creating any other collection in INVENIO. You can simply define a new collection defined by the usual collection query 'collection:AUTHOR' for author authority records, or 'collection:INSTITUTE' for institutes, etc.
The recommended way of creating collections for authority records is to create a “virtual collection” for the main 'collection:AUTHORITY' collection and then add the individual authority record collections as regular children of this collection. This will allow you to browse and search within authority records without making this the default for all INVENIO searches.
When using BibEdit to modify MARC meta-data of bibliographic records, certain fields may be configured (by the admin of your INVENIO installation) to offer you auto-complete functionality based upon the data contained in authority records for that field. For example, if MARC subfield 100__ $a was configured to be under authority control, then typing the beginning of a word into this subfield will trigger a drop-down list, offering you a choice of values to choose from. When you click on one of the entries in the drop-down list, this will not only populate the immediate subfield you are editing, but it will also insert a reference into a new $0 subfield of the same MARC field you are editing. This reference tells the system that the author you are referring to is the author as contained in the 'author' authority record with the given authority record control number.
The illustration below demonstrates how this works:
Typing “Elli” into the 100__ $a subfield will present you with a list of authors that contain a word starting with “Elli” somewhere in their name. In case there are multiple authors with similar or identical names (as is the case in the example shown here), you will receive additional information about these authors to help you disambiguate. The fields to be used for disambiguation can be configured by your INVENIO administrator. If such fields have not been configured, or if they are not sufficient for disambiguation, the authority record control number will be used to ensure a unique value for each entry in the drop-down list. In the example above, the first author can be uniquely identified by his email address, whereas for the second we have only the authority record control number as a uniquely identifying characteristic.
-If in the shown example you click on the first author from the list, this author's name will automatically be inserted into the 100__ $a subfield you were editing, while the authority type and the authority record control number “author:(SzGeCERN)abc123” , is inserted into a new $0 subfield (cf. Illustration 2). This new subfield tells INVENIO that “Ellis, John” is associated with the 'author' authority record containing the authority record control number “(SzGeCERN)abc123”. In this example you can also see that the author's affiliation has been entered in the same way as well, using the auto-complete option for the 100__ $u subfield. In this case the author's affiliation is the “University of Oxford”, which is associated in this INVENIO installation with the 'institution' authority record containing the authority record control number “(SzGeCERN)inst0001”.
-If INVENIO has no authority record data to match what you type into the authority-controlled subfield, you still have the possibility to enter a value manually.
\ No newline at end of file
+If in the shown example you click on the first author from the list, this author's name will automatically be inserted into the 100__ $a subfield you were editing, while the authority type and the authority record control number “author:(SzGeCERN)abc123” is inserted into a new $0 subfield (cf. Illustration 2). This new subfield tells INVENIO that “Ellis, John” is associated with the 'author' authority record containing the authority record control number “(SzGeCERN)abc123”. In this example you can also see that the author's affiliation has been entered in the same way, using the auto-complete option for the 100__ $u subfield. In this case the author's affiliation is the “University of Oxford”, which is associated in this INVENIO installation with the 'institute' authority record containing the authority record control number “(SzGeCERN)inst0001”.
+If INVENIO has no authority record data to match what you type into the authority-controlled subfield, you still have the possibility to enter a value manually.
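+For illustration, after both auto-complete selections the resulting MARC field could look like this (control numbers as in the example above; the exact separator is configurable):
+
+100__ $a Ellis, John
+      $u University of Oxford
+      $0 author:(SzGeCERN)abc123
+      $0 institute:(SzGeCERN)inst0001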
diff --git a/invenio/legacy/websearch/templates.py b/invenio/legacy/websearch/templates.py index 9184b1356..b86b00647 100644 --- a/invenio/legacy/websearch/templates.py +++ b/invenio/legacy/websearch/templates.py @@ -1,4643 +1,4643 @@ # -*- coding: utf-8 -*- ## This file is part of Invenio. ## Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013 CERN. ## ## Invenio is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## Invenio is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with Invenio; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. # pylint: disable=C0301 __revision__ = "$Id$" import time import cgi import string import re import locale from six import iteritems from urllib import quote, urlencode from xml.sax.saxutils import escape as xml_escape from invenio.config import \ CFG_WEBSEARCH_LIGHTSEARCH_PATTERN_BOX_WIDTH, \ CFG_WEBSEARCH_SIMPLESEARCH_PATTERN_BOX_WIDTH, \ CFG_WEBSEARCH_ADVANCEDSEARCH_PATTERN_BOX_WIDTH, \ CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD, \ CFG_WEBSEARCH_USE_ALEPH_SYSNOS, \ CFG_WEBSEARCH_SPLIT_BY_COLLECTION, \ CFG_WEBSEARCH_DEF_RECORDS_IN_GROUPS, \ CFG_BIBRANK_SHOW_READING_STATS, \ CFG_BIBRANK_SHOW_DOWNLOAD_STATS, \ CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS, \ CFG_BIBRANK_SHOW_CITATION_LINKS, \ CFG_BIBRANK_SHOW_CITATION_STATS, \ CFG_BIBRANK_SHOW_CITATION_GRAPHS, \ CFG_WEBSEARCH_RSS_TTL, \ CFG_SITE_LANG, \ CFG_SITE_NAME, \ CFG_SITE_NAME_INTL, \ CFG_VERSION, \ CFG_SITE_URL, \ CFG_SITE_SUPPORT_EMAIL, \ CFG_SITE_ADMIN_EMAIL, \ CFG_CERN_SITE, \ CFG_INSPIRE_SITE, \ CFG_WEBSEARCH_DEFAULT_SEARCH_INTERFACE, \ CFG_WEBSEARCH_ENABLED_SEARCH_INTERFACES, \ CFG_WEBSEARCH_MAX_RECORDS_IN_GROUPS, \ CFG_BIBINDEX_CHARS_PUNCTUATION, \ CFG_WEBCOMMENT_ALLOW_COMMENTS, \ CFG_WEBCOMMENT_ALLOW_REVIEWS, \ CFG_WEBSEARCH_WILDCARD_LIMIT, \ CFG_WEBSEARCH_SHOW_COMMENT_COUNT, \ CFG_WEBSEARCH_SHOW_REVIEW_COUNT, \ CFG_SITE_RECORD, \ CFG_WEBSEARCH_PREV_NEXT_HIT_LIMIT from invenio.legacy.dbquery import run_sql from invenio.base.i18n import gettext_set_language from invenio.base.globals import cfg from invenio.utils.url import make_canonical_urlargd, drop_default_urlargd, create_html_link, create_url from invenio.utils.html import nmtoken_from_string from invenio.ext.legacy.handler import wash_urlargd from invenio.legacy.bibrank.citation_searcher import get_cited_by_count from invenio.legacy.webuser import session_param_get from invenio.modules.search.services import \ CFG_WEBSEARCH_MAX_SEARCH_COLL_RESULTS_TO_PRINT from intbitset import intbitset from invenio.legacy.websearch_external_collections import external_collection_get_state, get_external_collection_engine from invenio.legacy.websearch_external_collections.utils import get_collection_id from invenio.legacy.websearch_external_collections.config import CFG_EXTERNAL_COLLECTION_MAXRESULTS from invenio.legacy.bibrecord import get_fieldvalues _RE_PUNCTUATION = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION) _RE_SPACES = re.compile(r"\s+") class Template: # This dictionary maps Invenio language code to locale codes (ISO 639) tmpl_localemap = { 'bg': 
'bg_BG', 'ar': 'ar_AR', 'ca': 'ca_ES', 'de': 'de_DE', 'el': 'el_GR', 'en': 'en_US', 'es': 'es_ES', 'pt': 'pt_BR', 'fa': 'fa_IR', 'fr': 'fr_FR', 'it': 'it_IT', 'ka': 'ka_GE', 'lt': 'lt_LT', 'ro': 'ro_RO', 'ru': 'ru_RU', 'rw': 'rw_RW', 'sk': 'sk_SK', 'cs': 'cs_CZ', 'no': 'no_NO', 'sv': 'sv_SE', 'uk': 'uk_UA', 'ja': 'ja_JA', 'pl': 'pl_PL', 'hr': 'hr_HR', 'zh_CN': 'zh_CN', 'zh_TW': 'zh_TW', 'hu': 'hu_HU', 'af': 'af_ZA', 'gl': 'gl_ES' } tmpl_default_locale = "en_US" # which locale to use by default, useful in case of failure # Type of the allowed parameters for the web interface for search results @property def search_results_default_urlargd(self): from invenio.modules.search.washers import \ search_results_default_urlargd return search_results_default_urlargd # ...and for search interfaces search_interface_default_urlargd = { 'aas': (int, CFG_WEBSEARCH_DEFAULT_SEARCH_INTERFACE), 'as': (int, CFG_WEBSEARCH_DEFAULT_SEARCH_INTERFACE), 'verbose': (int, 0), 'em' : (str, "")} # ...and for RSS feeds rss_default_urlargd = {'c' : (list, []), 'cc' : (str, ""), 'p' : (str, ""), 'f' : (str, ""), 'p1' : (str, ""), 'f1' : (str, ""), 'm1' : (str, ""), 'op1': (str, ""), 'p2' : (str, ""), 'f2' : (str, ""), 'm2' : (str, ""), 'op2': (str, ""), 'p3' : (str, ""), 'f3' : (str, ""), 'm3' : (str, ""), 'wl' : (int, CFG_WEBSEARCH_WILDCARD_LIMIT)} tmpl_openurl_accepted_args = { 'id' : (list, []), 'genre' : (str, ''), 'aulast' : (str, ''), 'aufirst' : (str, ''), 'auinit' : (str, ''), 'auinit1' : (str, ''), 'auinitm' : (str, ''), 'issn' : (str, ''), 'eissn' : (str, ''), 'coden' : (str, ''), 'isbn' : (str, ''), 'sici' : (str, ''), 'bici' : (str, ''), 'title' : (str, ''), 'stitle' : (str, ''), 'atitle' : (str, ''), 'volume' : (str, ''), 'part' : (str, ''), 'issue' : (str, ''), 'spage' : (str, ''), 'epage' : (str, ''), 'pages' : (str, ''), 'artnum' : (str, ''), 'date' : (str, ''), 'ssn' : (str, ''), 'quarter' : (str, ''), 'url_ver' : (str, ''), 'ctx_ver' : (str, ''), 'rft_val_fmt' : (str, ''), 'rft_id' : (list, []), 'rft.atitle' : (str, ''), 'rft.title' : (str, ''), 'rft.jtitle' : (str, ''), 'rft.stitle' : (str, ''), 'rft.date' : (str, ''), 'rft.volume' : (str, ''), 'rft.issue' : (str, ''), 'rft.spage' : (str, ''), 'rft.epage' : (str, ''), 'rft.pages' : (str, ''), 'rft.artnumber' : (str, ''), 'rft.issn' : (str, ''), 'rft.eissn' : (str, ''), 'rft.aulast' : (str, ''), 'rft.aufirst' : (str, ''), 'rft.auinit' : (str, ''), 'rft.auinit1' : (str, ''), 'rft.auinitm' : (str, ''), 'rft.ausuffix' : (str, ''), 'rft.au' : (list, []), 'rft.aucorp' : (str, ''), 'rft.isbn' : (str, ''), 'rft.coden' : (str, ''), 'rft.sici' : (str, ''), 'rft.genre' : (str, 'unknown'), 'rft.chron' : (str, ''), 'rft.ssn' : (str, ''), 'rft.quarter' : (int, ''), 'rft.part' : (str, ''), 'rft.btitle' : (str, ''), 'rft.isbn' : (str, ''), 'rft.atitle' : (str, ''), 'rft.place' : (str, ''), 'rft.pub' : (str, ''), 'rft.edition' : (str, ''), 'rft.tpages' : (str, ''), 'rft.series' : (str, ''), } tmpl_opensearch_rss_url_syntax = "%(CFG_SITE_URL)s/rss?p={searchTerms}&jrec={startIndex}&rg={count}&ln={language}" % {'CFG_SITE_URL': CFG_SITE_URL} tmpl_opensearch_html_url_syntax = "%(CFG_SITE_URL)s/search?p={searchTerms}&jrec={startIndex}&rg={count}&ln={language}" % {'CFG_SITE_URL': CFG_SITE_URL} def tmpl_openurl2invenio(self, openurl_data): """ Return an Invenio url corresponding to a search with the data included in the openurl form map. 
""" def isbn_to_isbn13_isbn10(isbn): isbn = isbn.replace(' ', '').replace('-', '') if len(isbn) == 10 and isbn.isdigit(): ## We already have isbn10 return ('', isbn) if len(isbn) != 13 and isbn.isdigit(): return ('', '') isbn13, isbn10 = isbn, isbn[3:-1] checksum = 0 weight = 10 for char in isbn10: checksum += int(char) * weight weight -= 1 checksum = 11 - (checksum % 11) if checksum == 10: isbn10 += 'X' if checksum == 11: isbn10 += '0' else: isbn10 += str(checksum) return (isbn13, isbn10) from invenio.legacy.search_engine import perform_request_search doi = '' pmid = '' bibcode = '' oai = '' issn = '' isbn = '' for elem in openurl_data['id']: if elem.startswith('doi:'): doi = elem[len('doi:'):] elif elem.startswith('pmid:'): pmid = elem[len('pmid:'):] elif elem.startswith('bibcode:'): bibcode = elem[len('bibcode:'):] elif elem.startswith('oai:'): oai = elem[len('oai:'):] for elem in openurl_data['rft_id']: if elem.startswith('info:doi/'): doi = elem[len('info:doi/'):] elif elem.startswith('info:pmid/'): pmid = elem[len('info:pmid/'):] elif elem.startswith('info:bibcode/'): bibcode = elem[len('info:bibcode/'):] elif elem.startswith('info:oai/'): oai = elem[len('info:oai/')] elif elem.startswith('urn:ISBN:'): isbn = elem[len('urn:ISBN:'):] elif elem.startswith('urn:ISSN:'): issn = elem[len('urn:ISSN:'):] ## Building author query aulast = openurl_data['rft.aulast'] or openurl_data['aulast'] aufirst = openurl_data['rft.aufirst'] or openurl_data['aufirst'] auinit = openurl_data['rft.auinit'] or \ openurl_data['auinit'] or \ openurl_data['rft.auinit1'] + ' ' + openurl_data['rft.auinitm'] or \ openurl_data['auinit1'] + ' ' + openurl_data['auinitm'] or aufirst[:1] auinit = auinit.upper() if aulast and aufirst: author_query = 'author:"%s, %s" or author:"%s, %s"' % (aulast, aufirst, aulast, auinit) elif aulast and auinit: author_query = 'author:"%s, %s"' % (aulast, auinit) else: author_query = '' ## Building title query title = openurl_data['rft.atitle'] or \ openurl_data['atitle'] or \ openurl_data['rft.btitle'] or \ openurl_data['rft.title'] or \ openurl_data['title'] if title: title_query = 'title:"%s"' % title title_query_cleaned = 'title:"%s"' % _RE_SPACES.sub(' ', _RE_PUNCTUATION.sub(' ', title)) else: title_query = '' ## Building journal query jtitle = openurl_data['rft.stitle'] or \ openurl_data['stitle'] or \ openurl_data['rft.jtitle'] or \ openurl_data['title'] if jtitle: journal_query = 'journal:"%s"' % jtitle else: journal_query = '' ## Building isbn query isbn = isbn or openurl_data['rft.isbn'] or \ openurl_data['isbn'] isbn13, isbn10 = isbn_to_isbn13_isbn10(isbn) if isbn13: isbn_query = 'isbn:"%s" or isbn:"%s"' % (isbn13, isbn10) elif isbn10: isbn_query = 'isbn:"%s"' % isbn10 else: isbn_query = '' ## Building issn query issn = issn or openurl_data['rft.eissn'] or \ openurl_data['eissn'] or \ openurl_data['rft.issn'] or \ openurl_data['issn'] if issn: issn_query = 'issn:"%s"' % issn else: issn_query = '' ## Building coden query coden = openurl_data['rft.coden'] or openurl_data['coden'] if coden: coden_query = 'coden:"%s"' % coden else: coden_query = '' ## Building doi query if False: #doi: #FIXME Temporaly disabled until doi field is properly setup doi_query = 'doi:"%s"' % doi else: doi_query = '' ## Trying possible searches if doi_query: if perform_request_search(p=doi_query): return '%s/search?%s' % (CFG_SITE_URL, urlencode({ 'p' : doi_query, 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hd'})) if isbn_query: if perform_request_search(p=isbn_query): return '%s/search?%s' % 
(CFG_SITE_URL, urlencode({ 'p' : isbn_query, 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hd'})) if coden_query: if perform_request_search(p=coden_query): return '%s/search?%s' % (CFG_SITE_URL, urlencode({ 'p' : coden_query, 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hd'})) if author_query and title_query: if perform_request_search(p='%s and %s' % (title_query, author_query)): return '%s/search?%s' % (CFG_SITE_URL, urlencode({ 'p' : '%s and %s' % (title_query, author_query), 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hd'})) if title_query: result = len(perform_request_search(p=title_query)) if result == 1: return '%s/search?%s' % (CFG_SITE_URL, urlencode({ 'p' : title_query, 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hd'})) elif result > 1: return '%s/search?%s' % (CFG_SITE_URL, urlencode({ 'p' : title_query, 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hb'})) ## Nothing worked, let's return a search that the user can improve if author_query and title_query: return '%s/search%s' % (CFG_SITE_URL, make_canonical_urlargd({ 'p' : '%s and %s' % (title_query_cleaned, author_query), 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hb'}, {})) elif title_query: return '%s/search%s' % (CFG_SITE_URL, make_canonical_urlargd({ 'p' : title_query_cleaned, 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hb'}, {})) else: ## Mmh. Too few information provided. return '%s/search%s' % (CFG_SITE_URL, make_canonical_urlargd({ 'p' : 'recid:-1', 'sc' : CFG_WEBSEARCH_SPLIT_BY_COLLECTION, 'of' : 'hb'}, {})) def tmpl_opensearch_description(self, ln): """ Returns the OpenSearch description file of this site. """ _ = gettext_set_language(ln) return """%(example)s
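The isbn_to_isbn13_isbn10 helper above recomputes the ISBN-10 check digit
from the middle nine digits of an ISBN-13. A standalone sketch of the same
arithmetic, verified against the well-known pair 978-0-306-40615-7 and
0-306-40615-2 (the helper name is illustrative, not part of the module):

    def isbn13_to_isbn10_sketch(isbn13):
        """Recompute the ISBN-10 check digit from an ISBN-13 string."""
        digits = isbn13.replace('-', '')[3:-1]   # drop the '978' prefix and the ISBN-13 check digit
        checksum = sum(int(d) * w for d, w in zip(digits, range(10, 1, -1)))
        checksum = 11 - (checksum % 11)
        if checksum == 10:
            return digits + 'X'
        elif checksum == 11:
            return digits + '0'
        return digits + str(checksum)

    assert isbn13_to_isbn10_sketch('978-0-306-40615-7') == '0306406152'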
        ## [intervening code lost in conversion; the fragment below is from
        ##  the search-box template]
        ''' % {'example': _("Example: %(x_sample_search_query)s") % \
                          {'x_sample_search_query': example_query_link},
               'more': more}

        # display options to search in current collection or everywhere
        search_in = ''
        if collection_name != CFG_SITE_NAME_INTL.get(ln, CFG_SITE_NAME):
            ## [radio-button markup lost in conversion]
            search_in += ''' ''' % \
                         {'search_in_collection_name': _("Search in %(x_collection_name)s") % \
                                                       {'x_collection_name': collection_name},
                          'collection_id': collection_id,
                          'root_collection_name': CFG_SITE_NAME,
                          'search_everywhere': _("Search everywhere")}

        # print commentary start:
        ## [the HTML bodies of the search-form templates that followed here
        ##  were lost in conversion; only their interpolation placeholders
        ##  survive. The lost tables rendered: the "Search Tips" and
        ##  simple/advanced search links (%(msg_search_tips)s, %(asearch)s,
        ##  %(ssearch)s), the header row (%(header)s), the light-search field
        ##  selector (%(middle_option)s), the three advanced-search rows
        ##  (%(matchbox_m1..m3)s | %(middle_option_1..3)s |
        ##  %(andornot_op1..2)s), the search-options box (%(searchheader)s,
        ##  %(searchoptions)s), the "added/modified ... until" date filters
        ##  (%(added_or_modified)s, %(date_added)s, %(date_until)s), and the
        ##  sort/display/format columns (%(sortoptions)s %(rankoptions)s |
        ##  %(displayoptions)s | %(formatoptions)s), plus a final %(title)s
        ##  row.]
""" % \ { 'narrowsearchbox': {'r': 'narrowsearchbox', 'v': 'focusonsearchbox'}[type]} if type == 'r': if son.restricted_p() and son.restricted_p() != father.restricted_p(): out += """ | """ % {'name' : cgi.escape(son.name) } # hosted collections are checked by default only when configured so elif str(son.dbquery).startswith("hostedcollection:"): external_collection_engine = get_external_collection_engine(str(son.name)) if external_collection_engine and external_collection_engine.selected_by_default: out += """""" % {'name' : cgi.escape(son.name) } elif external_collection_engine and not external_collection_engine.selected_by_default: out += """""" % {'name' : cgi.escape(son.name) } else: # strangely, the external collection engine was never found. In that case, # why was the hosted collection here in the first place? out += """""" % {'name' : cgi.escape(son.name) } else: out += """""" % {'name' : cgi.escape(son.name) } else: out += '' out += """%(link)s%(recs)s """ % {
'link': create_html_link(self.build_search_interface_url(c=son.name, ln=ln, aas=aas),
{}, style_prolog + cgi.escape(son.get_name(ln)) + style_epilog),
'recs' : self.tmpl_nbrecs_info(son.nbrecs, ln=ln)}
# the following prints the "external collection" arrow just after the name and
# number of records of the hosted collection
# 1) we might want to make the arrow work as an anchor to the hosted collection as well.
# That would probably require a new separate function under invenio.utils.url
# 2) we might want to place the arrow between the name and the number of records of the hosted collection
# That would require editing/splitting the above out += ... statement
if type == 'r':
if str(son.dbquery).startswith("hostedcollection:"):
out += """""" % \
{ 'siteurl' : CFG_SITE_URL, 'name' : cgi.escape(son.name), }
if son.restricted_p():
out += """ [%(msg)s] """ % { 'msg' : _("restricted") }
if display_grandsons and len(grandsons[i]):
# iterate through grandsons:
out += """ """ for grandson in grandsons[i]: out += """ %(link)s%(nbrec)s """ % { 'link': create_html_link(self.build_search_interface_url(c=grandson.name, ln=ln, aas=aas), {}, cgi.escape(grandson.get_name(ln))), 'nbrec' : self.tmpl_nbrecs_info(grandson.nbrecs, ln=ln)} # the following prints the "external collection" arrow just after the name and # number of records of the hosted collection # Some relatives comments have been made just above if type == 'r': if str(grandson.dbquery).startswith("hostedcollection:"): out += """""" % \ { 'siteurl' : CFG_SITE_URL, 'name' : cgi.escape(grandson.name), } out += """ |
        ## [intervening template methods lost in conversion; the loop below is
        ##  from the brief HTML record-list template]
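The collection tree above leans on create_html_link(base_url, urlargd,
link_label, attrs) from invenio.utils.url. A sketch of an equivalent helper,
to show the shape of its output -- illustrative only; the real function also
canonicalizes and washes its URL arguments:

    import cgi
    from urllib import urlencode  # Python 2, as in this module

    def html_link_sketch(base_url, urlargd, label, attrs=None):
        """Build <a href="base?args" k="v">label</a>; `label` is assumed to
        be already HTML-escaped, as in the calls above."""
        href = base_url
        if urlargd:
            href += '?' + urlencode(urlargd)
        attr_str = ''.join(' %s="%s"' % (k, cgi.escape(v, True))
                           for k, v in (attrs or {}).items())
        return '<a href="%s"%s>%s</a>' % (cgi.escape(href, True), attr_str, label)

    # e.g. html_link_sketch('/search', {'cc': 'Books'}, 'Books')
    # -> '<a href="/search?cc=Books">Books</a>'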
        for recid in recids:
            if grid_layout:
                ## [grid-cell markup lost in conversion]
                body += '''
                %(body)s
                ''' % {
                    'recid': recid['id'],
                    'body': recid['body']}
            else:
                ## [row markup lost in conversion; the row shows the record
                ##  date next to the formatted record body, restored below
                ##  from the %(date)s/%(body)s placeholders]
                body += '''
                %(date)s
                %(body)s
                ''' % {
                    'date': recid['date'],
                    'body': recid['body']}
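The loop above switches between one grid cell and one table row per record.
A condensed sketch of the same dispatch (the recid['id'] / recid['body']
mapping follows the code above; the markup is illustrative, since the
original template strings were lost):

    def render_hits_sketch(recids, grid_layout):
        """Concatenate one fragment per record: grid cell or table row."""
        body = ''
        for recid in recids:
            if grid_layout:
                body += '<div class="cell" id="record-%(recid)s">%(body)s</div>' % {
                    'recid': recid['id'], 'body': recid['body']}
            else:
                body += '<tr><td>%(date)s</td><td>%(body)s</td></tr>' % {
                    'date': recid.get('date', ''), 'body': recid['body']}
        return body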
        ## [box markup lost in conversion: a %(header)s row above a %(body)s
        ##  cell, closed just before the code below]
        if f:
            out += _("Words nearest to %(x_word)s inside %(x_field)s in any collection are:") % \
                   {'x_word': '<em>' + cgi.escape(p) + '</em>',
                    'x_field': '<em>' + cgi.escape(f) + '</em>'}
        else:
            out += _("Words nearest to %(x_word)s in any collection are:") % \
                   {'x_word': '<em>' + cgi.escape(p) + '</em>'}
        out += '<br />' + nearest_box + '<br />'
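The nearest_box interpolated above is built elsewhere in this file as a
two-column hits | term listing (see the placeholders in the residue below).
A hypothetical sketch of such a rendering, with the searched pattern
emphasised as in the surrounding code:

    def nearest_terms_rows_sketch(terms, p):
        """Render (hits, term) pairs as two-column rows, emphasising the
        searched pattern `p` when it appears in the list."""
        rows = []
        for hits, term in terms:
            label = '<em>%s</em>' % term if term == p else term
            rows.append('%6d  %s' % (hits, label))
        return '\n'.join(rows)

    # e.g. nearest_terms_rows_sketch([(3, 'ellis'), (0, 'elliss')], 'elliss')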
        ## [table markup lost in conversion: rows of %(hits)s | %(term)s; the
        ##  function wraps `out` in the opening/closing table markup]

    def tmpl_browse_pattern(self, f, fn, ln, browsed_phrases_in_colls, colls, rg):
        """
        Displays the *Nearest search terms* box

        Parameters:
          - 'f' *string* - field (*not* i18nized)
          - 'fn' *string* - field name (i18nized)
          - 'ln' *string* - The language to display
          - 'browsed_phrases_in_colls' *array* - the phrases to display
          - 'colls' *array* - the list of collection parameters of the search (c's)
          - 'rg' *int* - the number of records
        """
        # load the right message language
        _ = gettext_set_language(ln)

        ## [table markup lost in conversion: a header row of
        ##  %(hits)s | %(fn)s, data rows of %(nbhits)s | %(link)s, and a
        ##  final row with %(link_previous)s %(link_next)s navigation]
        ## [intervening code lost in conversion; what follows is from the
        ##  search-results navigation template. Its enclosing table markup
        ##  and if-conditions were also lost; the surviving branches are kept
        ##  in source order.]
            out += '''%(collection_link)s''' % {
                'collection_id': collection_id,
                'siteurl': CFG_SITE_URL,
                'collection_link': create_html_link(self.build_search_interface_url(c=collection, aas=aas, ln=ln),
                                                    {}, cgi.escape(collection_name))}
        else:
            out += """%(recs_found)s""" % {
                'recs_found': _("%(x_rec)s records found",
                                x_rec=('<strong>' + self.tmpl_nice_number(nb_found, ln) + '</strong>'))}
        #elif nb_found = -963:
        #    out += """%(recs_found)s""" % {
        #        'recs_found': _("%s records found") % ('<strong>' + self.tmpl_nice_number(nb_found, ln) + '</strong>')}
        else:
            out += ""
            # we do not care about timed-out hosted collections here, because the
            # number of records found will never be bigger than rg anyway, since
            # it's negative
            if nb_found > rg:
                out += "<strong>" + cgi.escape(collection_name) + "</strong> : " + \
                       _("%(x_rec)s records found",
                         x_rec=('<strong>' + self.tmpl_nice_number(nb_found, ln) + '</strong>')) + " "

        if nb_found > rg:
            # navigation arrows are needed, since we have many hits
            query = {'p': p, 'f': f,
                     'cc': collection,
                     'sf': sf, 'so': so,
                     'sp': sp, 'rm': rm,
                     'of': of, 'ot': ot,
                     'aas': aas, 'ln': ln,
                     'p1': p1, 'p2': p2, 'p3': p3,
                     'f1': f1, 'f2': f2, 'f3': f3,
                     'm1': m1, 'm2': m2, 'm3': m3,
                     'op1': op1, 'op2': op2,
                     'sc': 0,
                     'd1y': d1y, 'd1m': d1m, 'd1d': d1d,
                     'd2y': d2y, 'd2m': d2m, 'd2d': d2d,
                     'dt': dt,
                     }
            # @todo here
            def img(gif, txt):
                ## [the <img> tag markup was lost in conversion; it rendered
                ##  %(siteurl)s/img/%(gif)s.gif with alt text %(txt)s]
                return '' % {'txt': txt, 'gif': gif, 'siteurl': CFG_SITE_URL}

            if jrec - rg > 1:
                out += create_html_link(self.build_search_url(query, jrec=1, rg=rg),
                                        {}, img('sb', _("begin")),
                                        {'class': 'img'})
            if jrec > 1:
                out += create_html_link(self.build_search_url(query, jrec=max(jrec - rg, 1), rg=rg),
                                        {}, img('sp', _("previous")),
                                        {'class': 'img'})
            if jrec + rg - 1 < nb_found:
                out += "%d - %d" % (jrec, jrec + rg - 1)
            else:
                out += "%d - %d" % (jrec, nb_found)
            if nb_found >= jrec + rg:
                out += create_html_link(self.build_search_url(query, jrec=jrec + rg, rg=rg),
                                        {}, img('sn', _("next")),
                                        {'class': 'img'})
            if nb_found >= jrec + rg + rg:
                out += create_html_link(self.build_search_url(query, jrec=nb_found - rg + 1, rg=rg),
                                        {}, img('se', _("end")),
                                        {'class': 'img'})

            # still in the navigation part
            cc = collection
            sc = 0
            for var in ['p', 'cc', 'f', 'sf', 'so', 'of', 'rg', 'aas', 'ln',
                        'p1', 'p2', 'p3', 'f1', 'f2', 'f3', 'm1', 'm2', 'm3',
                        'op1', 'op2', 'sc', 'd1y', 'd1m', 'd1d', 'd2y', 'd2m',
                        'd2d', 'dt']:
                out += self.tmpl_input_hidden(name=var, value=vars()[var])
            for var in ['ot', 'sp', 'rm']:
                if vars()[var]:
                    out += self.tmpl_input_hidden(name=var, value=vars()[var])
            if pl_in_url:
                fieldargs = cgi.parse_qs(pl_in_url)
                for fieldcode in all_fieldcodes:
                    # get_fieldcodes():
                    if fieldcode in fieldargs:
                        for val in fieldargs[fieldcode]:
                            out += self.tmpl_input_hidden(name=fieldcode, value=val)
            ## [jump-to-record input markup lost in conversion]
            out += """ %(jump)s """ % {
                'jump': _("jump to record:"),
                'jrec': jrec,
            }

        if not middle_only:
            out += " | "
        else:
            out += ""
        # right table cell: cpu time info
        if not middle_only:
            if cpu_time > -1:
                out += """%(time)s""" % {
                    'time': _("Search took %(x_sec)s seconds.",
                              x_sec=('%.2f' % cpu_time)),
                }
            ## [closing table markup lost in conversion]
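The navigation block above derives four jump targets from the first
displayed hit jrec, the page size rg, and the total nb_found. The same
arithmetic, extracted into a standalone sketch (a hypothetical helper that
mirrors the conditions in the code, not part of the module):

    def paging_targets_sketch(jrec, rg, nb_found):
        """Return the jrec values for the begin/previous/next/end arrows,
        or None where the code above would not draw the arrow."""
        return {
            'begin':    1 if jrec - rg > 1 else None,
            'previous': max(jrec - rg, 1) if jrec > 1 else None,
            'next':     jrec + rg if nb_found >= jrec + rg else None,
            'end':      nb_found - rg + 1 if nb_found >= jrec + rg + rg else None,
        }

    # e.g. paging_targets_sketch(jrec=11, rg=10, nb_found=35)
    # -> {'begin': None, 'previous': 1, 'next': 21, 'end': 26}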
        ## [intervening code lost in conversion; the fragment below is the
        ##  results-overview box, whose table markup wrapped %(founds)s]
        out = """%(founds)s
        """ % {
            'founds': _("%(x_fmt_open)sResults overview:%(x_fmt_close)s Found %(x_nb_records)s records in %(x_nb_seconds)s seconds.") % \
                      {'x_fmt_open': '<strong>',
                       'x_fmt_close': '</strong>',
                       'x_nb_records': '<strong>' + self.tmpl_nice_number(results_final_nb_total, ln) + '</strong>',
                       'x_nb_seconds': '%.2f' % cpu_time}
        }
        # if there were (only) hosted_collections that timed out during the
        # pre-search, print a fuzzier message
        else:
            if results_final_nb_total == 0:
                ## [table markup with the fuzzier message lost in conversion]
                out = """