diff --git a/INSTALL b/INSTALL
index 3c9b54200..4914b8cfa 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,554 +1,557 @@
CDS Invenio INSTALLATION
========================
Revision: $Id$
About
=====
This document specifies how to build, customize, and install CDS
Invenio for the first time. See RELEASE-NOTES if you are upgrading
from a previous CDS Invenio release.
Contents
========
0. Prerequisites
1. Quick instructions for the impatient CDS Invenio admin
2. Detailed instructions for the patient CDS Invenio admin
0. Prerequisites
================
Here is the software you need to have around before you
start installing CDS Invenio:
a) Unix-like operating system. The main development and
production platforms for CDS Invenio at CERN are GNU/Linux
distributions SLC (RHEL), Debian, and Gentoo, but we also
develop on FreeBSD and Mac OS X. Basically any Unix system
supporting the software listed below should do.
Note that if you are using Debian "Sarge" GNU/Linux, you can
install most of the below-mentioned prerequisites and
recommendations by running:
$ sudo apt-get install libapache2-mod-python2.3 \
apache2-mpm-prefork mysql-server-4.1 mysql-client-4.1 \
python2.3-mysqldb python2.3-4suite \
python2.3-xml python2.3-libxml2 python2.3-libxslt1 \
rxp gnuplot xpdf-utils gs-common antiword catdoc \
wv html2text ppthtml xlhtml clisp gettext
You can also install the following packages:
$ sudo apt-get install python2.3-psyco sbcl cmucl
These three packages are not available on all Debian
"Sarge" GNU/Linux architectures (e.g. not on AMD64), but since
they are only recommended, you can safely continue without them.
Note that you can consult CDS Invenio wiki pages at
for more
system-specific notes.
Note that the web application server should run a Message
Transfer Agent (MTA) such as Postfix so that CDS Invenio can
email notification alerts or registration information to the
end users, contact moderators and reviewers of submitted
documents, inform administrators about various runtime system
information, etc.
b) MySQL server (may be on a remote machine), and MySQL client
(must be available locally too). MySQL versions 4.1 or 5.0
are supported. Please set the variable "max_allowed_packet"
in your "my.cnf" init file to at least 4M. You may also want
to run your MySQL server natively in UTF-8 mode by setting
"default-character-set=utf8" in various parts of your "my.cnf"
file, such as in the "[mysql]" part and elsewhere.
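As a sketch, the my.cnf settings described above could look like the following fragment (section names and the exact location of my.cnf vary between MySQL packages, so treat this as an illustration rather than a definitive snippet):

```
[mysqld]
max_allowed_packet = 4M
default-character-set = utf8

[mysql]
default-character-set = utf8
```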
c) Apache 2 server, with support for loading DSO modules, and
optionally with SSL support for HTTPS-secure user
authentication. Tested mainly with version 2.0.43 and above.
Apache 2.x is required for the mod_python module (see below).
d) Python v2.3 or above:
as well as the following Python modules:
- (mandatory) MySQLdb (version >= 1.2.1_p2; see below)
- (recommended) PyXML, for XML processing:
- (recommended) PyRXP, for very fast XML MARC processing:
- (recommended) libxml2-python, for XML/XLST processing:
- (recommended) Gnuplot.Py, for producing graphs:
- (recommended) Snowball Stemmer, for stemming:
- (optional) 4suite, slower alternative to PyRXP and
libxml2-python:
- (optional) feedparser, for web journal creation:
- (optional) Psyco, to speed up the code at places:
- (optional) RDFLib, to use RDF ontologies and thesauri:
- (optional) mechanize, to run regression web test suite:
Note: MySQLdb version 1.2.1_p2 or higher is recommended. If
you are using an older version of MySQLdb, you may run
into problems with character encoding.
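As a quick sanity check, a small shell helper along these lines can compare an installed version string against the 1.2.1 minimum (this helper is purely illustrative and not part of CDS Invenio; it assumes plain dotted version numbers):

```shell
# Compare a dotted version number against the recommended MySQLdb
# minimum and report whether it is new enough.
version_ok () {
    required="1.2.1"
    # Sort the two versions numerically field by field; if the
    # required version sorts first (or ties), the candidate is fine.
    lowest=$(printf '%s\n%s\n' "$required" "$1" \
        | sort -t . -k1,1n -k2,2n -k3,3n | head -n 1)
    if [ "$lowest" = "$required" ]; then
        echo "ok"
    else
        echo "too old"
    fi
}

version_ok 1.2.2    # new enough
version_ok 1.1.9    # needs upgrading
```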
e) mod_python Apache module. Tested mainly with versions
3.0BETA4 and above. mod_python 3.x is required for Apache 2.
Previous versions (as well as Apache 1 ones) exhibited some
problems with MySQL connectivity in our experience.
f) If you want to be able to extract references from PDF fulltext
files, then you need to install at least pdftotext version 3.
g) If you want to be able to search for words in the fulltext
files (i.e. to have fulltext indexing) or to stamp submitted
files, then you also need to install some of the following
tools:
- for PDF file stamping: pdftk, pdf2ps
- for PDF files: pdftotext or pstotext
- for PostScript files: pstotext or ps2ascii
- for MS Word files: antiword, catdoc, or wvText
- for MS PowerPoint files: pptHtml and html2text
- for MS Excel files: xlhtml and html2text
h) If you have chosen to install fast XML MARC Python processors
in step d) above, then you have to install the parsers
themselves:
- (optional) RXP:
- (optional) 4suite:
i) (recommended) Gnuplot, the command-line driven interactive
plotting program. It is used to display download and citation
history graphs on the Detailed record pages on the web
interface. Note that Gnuplot must be compiled with PNG output
support, that is, with the GD library. Note also that Gnuplot
is not required, only recommended.
j) (recommended) A Common Lisp implementation, such as CLISP,
SBCL or CMUCL. It is used for the web server log analysing
tool and the metadata checking program. Note that any of the
three implementations CLISP, SBCL, or CMUCL will do. CMUCL
produces the fastest machine code, but it does not support UTF-8
yet. Pick CLISP if you are unsure which to choose. Note that a
Common Lisp implementation is not required, only recommended.
k) GNU gettext, a set of tools that makes it possible to
translate the application in multiple languages.
This is available by default on many systems.
Note that the configure script checks whether you have all the
prerequisite software installed and will not let you continue
unless everything is in order. It also warns you if it cannot find
some optional but recommended software.
1. Quick instructions for the impatient CDS Invenio admin
=========================================================
1a. Installation
----------------
$ cd /usr/local/src/
$ wget http://cdsware.cern.ch/download/cds-invenio-0.99.0.tar.gz
$ wget http://cdsware.cern.ch/download/cds-invenio-0.99.0.tar.gz.md5
$ wget http://cdsware.cern.ch/download/cds-invenio-0.99.0.tar.gz.sig
$ md5sum -v -c cds-invenio-0.99.0.tar.gz.md5
$ gpg --verify cds-invenio-0.99.0.tar.gz.sig cds-invenio-0.99.0.tar.gz
$ tar xvfz cds-invenio-0.99.0.tar.gz
$ cd cds-invenio-0.99.0
$ ./configure
$ make
$ make install
$ make install-jsmath-plugin ## optional
1b. Configuration
-----------------
$ emacs /opt/cds-invenio/etc/invenio.conf
$ emacs /opt/cds-invenio/etc/invenio-local.conf
$ /opt/cds-invenio/bin/inveniocfg --update-all
$ /opt/cds-invenio/bin/inveniocfg --create-tables
$ /opt/cds-invenio/bin/inveniocfg --create-apache-conf
$ sudo /path/to/apache/bin/apachectl graceful
$ sudo chgrp -R www-data /opt/cds-invenio
$ sudo chmod -R g+r /opt/cds-invenio
$ sudo chmod -R g+rw /opt/cds-invenio/var
$ sudo find /opt/cds-invenio -type d -exec chmod g+rxw {} \;
$ /opt/cds-invenio/bin/inveniocfg --create-demo-site
$ /opt/cds-invenio/bin/inveniocfg --load-demo-records
$ /opt/cds-invenio/bin/inveniocfg --run-unit-tests
$ /opt/cds-invenio/bin/inveniocfg --run-regression-tests
$ /opt/cds-invenio/bin/inveniocfg --remove-demo-records
$ /opt/cds-invenio/bin/inveniocfg --drop-demo-site
$ firefox http://your.site.com/help/admin/howto-run
2. Detailed instructions for the patient CDS Invenio admin
==========================================================
2a. Installation
----------------
CDS Invenio uses the standard GNU autoconf method to build and
install its files. This means that you proceed as follows:
$ cd /usr/local/src/
Change to a directory where we will configure and build
CDS Invenio. (The built files will be installed into
different "target" directories later.)
$ wget http://cdsware.cern.ch/download/cds-invenio-0.99.0.tar.gz
$ wget http://cdsware.cern.ch/download/cds-invenio-0.99.0.tar.gz.md5
$ wget http://cdsware.cern.ch/download/cds-invenio-0.99.0.tar.gz.sig
Fetch CDS Invenio source tarball from the CDS Software
Consortium distribution server, together with MD5 checksum
and GnuPG cryptographic signature files useful for verifying
the integrity of the tarball.
$ md5sum -v -c cds-invenio-0.99.0.tar.gz.md5
Verify MD5 checksum.
$ gpg --verify cds-invenio-0.99.0.tar.gz.sig cds-invenio-0.99.0.tar.gz
Verify GnuPG cryptographic signature. Note that you may
first have to import my public key into your keyring, if you
haven't done that already:
$ gpg --keyserver wwwkeys.eu.pgp.net --recv-keys 0xBA5A2B67
The output of the gpg --verify command should then read:
Good signature from "Tibor Simko "
You can safely ignore any trusted signature certification
warning that may follow after the signature has been
successfully verified.
$ tar xvfz cds-invenio-0.99.0.tar.gz
Untar the distribution tarball.
$ cd cds-invenio-0.99.0
Go to the source directory.
$ ./configure
Configure CDS Invenio software for building on this specific
platform. You can use the following optional parameters:
--prefix=/opt/cds-invenio
Optionally, specify the CDS Invenio general
installation directory (default is /opt/cds-invenio).
It will contain command-line binaries and program
libraries containing the core CDS Invenio
functionality, but also store web pages, runtime log
and cache information, document data files, etc.
Several subdirs like `bin', `etc', `lib', or `var'
will be created inside the prefix directory to this
effect. Note that the prefix directory should be
chosen outside of the Apache htdocs tree, since only
one of its subdirectories (prefix/var/www) is to be
accessible directly via the Web (see below).
Note that CDS Invenio won't install to any other
directory but to the prefix mentioned in this
configuration line.
--with-python=/opt/python/bin/python2.3
Optionally, specify a path to some specific Python
binary. This is useful if you have more than one
Python installation on your system. If you don't set
this option, then the first Python found in your PATH
will be used for running CDS Invenio.
--with-mysql=/opt/mysql/bin/mysql
Optionally, specify a path to some specific MySQL
client binary. This is useful if you have more than
one MySQL installation on your system. If you don't
set this option, then the first MySQL client
executable found in your PATH will be used for
running CDS Invenio.
--with-clisp=/opt/clisp/bin/clisp
Optionally, specify a path to the CLISP executable.
This is useful if you have more than one CLISP
installation on your system. If you don't set this
option, then the first executable found in your PATH
will be used for running CDS Invenio.
--with-cmucl=/opt/cmucl/bin/lisp
Optionally, specify a path to the CMUCL executable.
This is useful if you have more than one CMUCL
installation on your system. If you don't set this
option, then the first executable found in your PATH
will be used for running CDS Invenio.
--with-sbcl=/opt/sbcl/bin/sbcl
Optionally, specify a path to the SBCL executable.
This is useful if you have more than one SBCL
installation on your system. If you don't set this
option, then the first executable found in your PATH
will be used for running CDS Invenio.
This configuration step is mandatory. Usually, you do this
step only once.
(Note that if you prefer to build CDS Invenio out of its
source tree, you may run the above configure command like
this: mkdir build && cd build && ../configure --prefix=...
FIXME: this is not working right now as per the introduction
of intbitset_setup.py.)
$ make
Launch the CDS Invenio build. Since many messages are printed
during the build process, you may want to run it in a
fast-scrolling terminal such as rxvt or in a detached screen
session.
During this step all the pages and scripts will be
pre-created and customized based on the configuration
options you have chosen in the previous step.
Note that on systems such as FreeBSD or Mac OS X you have to
use GNU make ("gmake") instead of "make".
$ make install
Install the web pages, scripts, utilities and everything
needed for runtime into the respective directories, as
specified earlier by the configure command.
Note that if you are installing CDS Invenio for the first
time, you will be asked to create a symbolic link for the
"invenio" Python module from Python's site-packages
directory to instruct Python where to find CDS Invenio's
Python files. The process will hint at the exact
command to use based on the values you have used in the
configure line.
(Note also that on some operating systems you might need to
create another symlink manually for lib64:
$ sudo ln -s /opt/cds-invenio/lib/python/invenio \
/usr/local/lib64/python2.3/site-packages/invenio
if you happen to encounter trouble finding the
intbitset libraries.)
$ sudo make install-jsmath-plugin ## optional
This will automatically download jsMath, a JavaScript
library to render LaTeX formulas in the client
browser, and install it in the proper place.
Note that in order to enable the rendering you will
later have to set the variable
CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS in invenio.conf
to a suitable list of output format codes such as
"['hd', 'hb']".
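For instance, an invenio-local.conf override enabling jsMath for those two formats could read as follows (a sketch using the format codes quoted above; adapt the list to your own output formats):

```
[Invenio]
CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS = ['hd', 'hb']
```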
2b. Configuration
-----------------
Once the basic software installation is done, we proceed to
configuring your Invenio system.
$ emacs /opt/cds-invenio/etc/invenio.conf
$ emacs /opt/cds-invenio/etc/invenio-local.conf
Customize your CDS Invenio installation. The 'invenio.conf'
file contains the vanilla default configuration parameters
of a CDS Invenio installation, as coming from the
distribution. You could in principle go ahead and change
the values according to your local needs.
However, you can also create a file named
'invenio-local.conf' in the same directory where
'invenio.conf' lives and put there only the localizations
you need to have different from the default ones. For
example:
$ cat /opt/cds-invenio/etc/invenio-local.conf
[Invenio]
- WEBURL = http://your.site.com
- SWEBURL = https://your.site.com
+ CFG_SITE_URL = http://your.site.com
+ CFG_SITE_SECURE_URL = https://your.site.com
+ CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com
+ CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com
The Invenio system will then read both the default
invenio.conf file and your customized invenio-local.conf
file and it will override any default options with the ones
you have set in your local file. This cascading of
configuration parameters will ease your future upgrades.
You should override at least the parameters from the top of
invenio.conf file in order to define some very essential
runtime parameters such as the visible URL of your document
- server (look for WEBURL and SWEBURL), the database
- credentials (look for CFG_DATABASE_*), the name of your
- document server (look for CDSNAME and CDSNAMEINTL), or the
- email address of the local CDS Invenio administrator (look
- for SUPPORTEMAIL and ADMINEMAIL).
+ server (look for CFG_SITE_URL and CFG_SITE_SECURE_URL), the
+ database credentials (look for CFG_DATABASE_*), the name of
+ your document server (look for CFG_SITE_NAME and
+ CFG_SITE_NAME_INTL_*), or the email address of the local CDS
+ Invenio administrator (look for CFG_SITE_SUPPORT_EMAIL and
+ CFG_SITE_ADMIN_EMAIL).
$ /opt/cds-invenio/bin/inveniocfg --update-all
Make the rest of the Invenio system aware of your
invenio.conf changes. This step is mandatory each time you
edit your conf files.
$ /opt/cds-invenio/bin/inveniocfg --create-tables
If you are installing CDS Invenio for the first time, you
have to create database tables.
Note that this step checks for potential problems such as
the database connection rights and may ask you to perform
some more administrative steps in case it detects a problem.
Notably, it may ask you to set up database access
permissions, based on your configure values.
If you are installing CDS Invenio for the first time, you
have to create a dedicated database on your MySQL server
that CDS Invenio can use for its purposes. Please
contact your MySQL administrator and ask them to execute
the commands this step proposes.
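For illustration, the administrative commands proposed by this step are typically of the following shape (this sketch uses the default database name and the example credentials from invenio.conf; your actual values come from your configure step):

```
CREATE DATABASE cdsinvenio DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON cdsinvenio.*
    TO cdsinvenio@localhost IDENTIFIED BY 'my123p$ss';
```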
At this point you should have successfully completed the
"make install" process. We continue by setting up the
Apache web server.
$ /opt/cds-invenio/bin/inveniocfg --create-apache-conf
Running this command will generate Apache virtual host
configurations matching your installation. You will be
instructed to check the created files (usually they are
located under /opt/cds-invenio/etc/apache/) and to add
the following Include statements to your httpd.conf:
Include /opt/cds-invenio/etc/apache/invenio-apache-vhost.conf
Include /opt/cds-invenio/etc/apache/invenio-apache-vhost-ssl.conf
$ sudo /path/to/apache/bin/apachectl graceful
Please ask your webserver administrator to restart the
Apache server after the above "httpd.conf" changes.
$ sudo chgrp -R www-data /opt/cds-invenio
$ sudo chmod -R g+r /opt/cds-invenio
$ sudo chmod -R g+rw /opt/cds-invenio/var
$ sudo find /opt/cds-invenio -type d -exec chmod g+rxw {} \;
One more superuser step, because we need to enable the
Apache server to read files from the installation place,
to write some log information, and to cache interesting
entities inside the "var" subdirectory of our CDS
Invenio installation directory.
Here we assume that your Apache server processes are
run under the "www-data" group. Change this
appropriately for your system.
Moreover, note that if you are using SELinux extensions
(e.g. on Fedora Core 6), you may have to check and
enable the Apache user's write access there too.
After these admin-level tasks performed as root, let us
now go back and finish the installation of CDS Invenio.
$ /opt/cds-invenio/bin/inveniocfg --create-demo-site
This step is recommended to test your local CDS Invenio
installation. It should give you our "Atlantis Institute of
Fictive Science" demo installation, exactly as you see it at
.
$ /opt/cds-invenio/bin/inveniocfg --load-demo-records
Optionally, load some demo records to be able to test
indexing and searching of your local CDS Invenio demo
installation.
$ /opt/cds-invenio/bin/inveniocfg --run-unit-tests
Optionally, you can run the unit test suite to verify the
unit behaviour of your local CDS Invenio installation. Note
that this command should be run only after you have
installed the whole system via `make install'.
$ /opt/cds-invenio/bin/inveniocfg --run-regression-tests
Optionally, you can run the full regression test suite to
verify the functional behaviour of your local CDS Invenio
installation. Note that this command requires the demo
site to have been created and the demo records loaded.
Note also that running the regression test suite may
alter the database content with junk data, so rebuilding
the demo site afterwards is strongly recommended.
$ /opt/cds-invenio/bin/inveniocfg --remove-demo-records
Optionally, remove the demo records loaded in the previous
step, while otherwise keeping the demo collection, submission,
format, and other configurations that you may reuse and
modify for your own production purposes.
$ /opt/cds-invenio/bin/inveniocfg --drop-demo-site
Optionally, also drop all the demo configuration so that
you'll end up with a completely blank CDS Invenio system.
However, you may find it more practical not to drop
the demo site configuration but to start customizing from
there.
$ firefox http://your.site.com/help/admin/howto-run
In order to start using your CDS Invenio installation, you
can start indexing, formatting and other daemons as
indicated in the "HOWTO Run" guide on the above URL. You
can also use the Admin Area web interfaces to perform
further runtime configurations such as the definition of
data collections, document types, document formats, word
indexes, etc.
Good luck, and thanks for choosing CDS Invenio.
- CDS Development Group
diff --git a/RELEASE-NOTES b/RELEASE-NOTES
index 7fdf399fa..b6ab62a51 100644
--- a/RELEASE-NOTES
+++ b/RELEASE-NOTES
@@ -1,181 +1,183 @@
--------------------------------------------------------------------
CDS Invenio v0.99.0 is released
FIXME 20, 2008
http://cdsware.cern.ch/invenio/news.html
--------------------------------------------------------------------
CDS Invenio v0.99.0 was released on FIXME 20, 2008.
What's new:
-----------
*) FIXME
Download:
---------
Installation notes:
-------------------
Please follow the INSTALL file bundled in the distribution tarball.
Upgrade notes:
--------------
If you are upgrading from CDS Invenio v0.92.1, then please follow
these steps:
- Launch the bibsched monitor and wait until all active bibsched
tasks are finished. Then put the bibsched daemon into manual mode.
- Stop all submission procedures and other write operations. For
example, you may want to stop Apache, edit httpd.conf to
introduce a global site redirect to a temporary splash page
saying that upgrade is in progress, and restart Apache.
- Take a backup of your current MySQL database and your CDS Invenio
installation directory (usually /opt/cds-invenio).
- First of all, note that CDS Invenio v0.99.0 requires MySQL server
version 4.1 at least, and the database must be running in UTF-8
mode. If you have been running an older MySQL server, or if you
have not created your database and data in the default charset
UTF-8 but, say, in Latin-1, then you must dump and reload all
your tables. In order to check which version and charset you
have been using, you can run:
$ echo "SELECT VERSION()" | /opt/cds-invenio/bin/dbexec
$ echo "SHOW CREATE DATABASE cdsinvenio" | /opt/cds-invenio/bin/dbexec
$ echo "SHOW CREATE TABLE bib10x" | /opt/cds-invenio/bin/dbexec
FIXME: provide some more detailed instructions how to dump data,
alter table charset, and load data back.
Note that the table definition and data charset altering process
may take some time. If you have to upgrade your MySQL server as
well, you may want to prepare the migration on another server.
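Until detailed instructions appear here, one possible dump-and-reload cycle looks roughly like the following sketch (database name, credentials and file names are assumptions; the mysqldump/mysql calls are shown as comments because they depend on your server setup, and the charset rewrite is illustrated on a one-line sample dump; rehearse the whole procedure on a copy of your data first):

```shell
# 1) Dump the existing database in its current Latin-1 charset:
#
#      mysqldump -u cdsinvenio -p --default-character-set=latin1 \
#          --skip-set-charset cdsinvenio > cdsinvenio.sql
#
# 2) Rewrite the table charset declarations inside the dump from
#    Latin-1 to UTF-8 (a one-line sample dump stands in here for
#    the real mysqldump output):
printf 'CREATE TABLE t (x INT) ENGINE=MyISAM DEFAULT CHARSET=latin1;\n' \
    > cdsinvenio.sql
sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' cdsinvenio.sql \
    > cdsinvenio-utf8.sql
cat cdsinvenio-utf8.sql
#
# 3) Reload the rewritten dump into a UTF-8 database:
#
#      mysql -u cdsinvenio -p --default-character-set=utf8 cdsinvenio \
#          < cdsinvenio-utf8.sql
```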
- Second, you must upgrade some Python modules, namely your MySQLdb
module as indicated in the INSTALL file. Note that this will make
your current site unusable, but we have stopped Apache already
anyway. If you have been using Stemmer, you must upgrade its
version as indicated in the INSTALL file; the new version is not
backward-compatible.
- Untar the new sources and rerun configure with your old prefix
argument (your old configure line is available at
/opt/cds-invenio-BACKUP/etc/build/config.nice) and install CDS
Invenio (see INSTALL file, part 1a).
- CDS Invenio v0.99.0 uses a new INI-style of configuration.
Please see INSTALL file, part 1b on how to customize your
installation. You will have to create 'invenio-local.conf' where
you have to merge your old config.wml values (they are available
- at /opt/cds-invenio-BACKUP/lib/wml/invenio/config.wml).
+ at /opt/cds-invenio-BACKUP/lib/wml/invenio/config.wml). Beware
+ of variable name updates, e.g. 'filedirsize' has now become
+ 'CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT'.
- If you have previously customized your page header and footer via
WML, then you should now put your page customizations in a new
template skin (please see the WebStyle Admin Guide for more
information). Also, if you have edited v0.92.1 CSS style sheet,
you may want to merge your changes into new elements of the
0.99.0 style sheet.
- If you have customized your /opt/cds-invenio/etc/ files in your
previous installation (for example to edit the stopwords list or
to configure language stemmers), then you have to restore your
changes into the corresponding /opt/cds-invenio/etc/ files.
- Update your database table structure:
$ make update-v0.92.1-tables
If you are upgrading from previous CDSware releases such as
v0.7.1, then you may need to run more update statements of this
kind, as indicated in previous RELEASE-NOTES files. (You could
also start afresh, copying your old data into a fresh v0.99.0
installation.)
- If you have previously customized your page templates
(e.g. webbasket_templates.py), then check for changes the new
version brings:
$ make check-custom-templates
This script will check if your customized page templates
(see the WebStyle Admin Guide) still conform to the default
templates. This gives you a chance to fix your templates
before running 'make install', by providing a list of
incompatibilities of your custom templates with the new
versions of the templates.
This step is only useful if you are upgrading and if you
have previously customized default page templates.
- You also have to run several migration scripts manually:
$ python ./modules/webaccess/lib/collection_restrictions_migration_kit.py
This script will migrate restricted collections that used
Apache user groups to a new firewall-like role definition
language.
$ python ./modules/websubmit/lib/fulltext_files_migration_kit.py
This script will check and update your fulltext file
storage system to the new style (e.g. unique document
names).
$ python ./modules/websession/lib/password_migration_kit.py
This script will update your local user table in order to
use encrypted passwords for more security.
- CDS Invenio v0.99.0 uses a new, faster word indexer. You will
have to reindex all your indexes by running:
$ /opt/cds-invenio/bin/bibindex -u admin -R
- You have to edit your httpd.conf in order to make Apache aware
of the new URLs. Please run:
$ /opt/cds-invenio/bin/inveniocfg --create-apache-conf
and check differences with your current setup and edit as
appropriate.
- Restart Apache and check whether everything is alright with your
system.
Note that the detailed record pages (/record/10) now have a
tab-style look by default. You may want to update your formats
to this style or else to disable this feature. Please see the
WebStyle Admin Guide for more information.
- Put the bibsched daemon back into the automatic mode. You are
done.
Further notes and issues:
-------------------------
*) Some modules of this release (e.g. mail submission system) are
still experimental and not yet activated. You may have a peek at
what is planned, but please do not rely on them.
*) The admin-level functionality of several modules is not fully
developed or documented yet.
What's next:
------------
*) Addressing the known issues mentioned above. Strengthening the
documentation towards the v1.0 release.
*) Improving the record editing capabilities.
*) Deploying the new URL schema for all pages (admin).
- end of file -
\ No newline at end of file
diff --git a/config/invenio-autotools.conf.in b/config/invenio-autotools.conf.in
index 83bb1ec26..6285c73a5 100644
--- a/config/invenio-autotools.conf.in
+++ b/config/invenio-autotools.conf.in
@@ -1,78 +1,75 @@
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
## DO NOT EDIT THIS FILE.
## YOU SHOULD NOT EDIT THESE VALUES. THEY WERE AUTOMATICALLY
## CALCULATED BY AUTOTOOLS DURING THE "CONFIGURE" STAGE.
[Invenio]
-VERSION = @VERSION@
-CFG_PATH_PHP = @PHP@
+## Invenio version:
+CFG_VERSION = @VERSION@
+
+## directories detected from 'configure --prefix ...' parameters:
CFG_PREFIX = @prefix@
-BINDIR = @prefix@/bin
-PYLIBDIR = @prefix@/lib/python
-LOGDIR = @localstatedir@/log
-ETCDIR = @prefix@/etc
-LOCALEDIR = @prefix@/share/locale
-TMPDIR = @localstatedir@/tmp
-CACHEDIR = @localstatedir@/cache
-WEBDIR = @localstatedir@/www
+CFG_BINDIR = @prefix@/bin
+CFG_PYLIBDIR = @prefix@/lib/python
+CFG_LOGDIR = @localstatedir@/log
+CFG_ETCDIR = @prefix@/etc
+CFG_LOCALEDIR = @prefix@/share/locale
+CFG_TMPDIR = @localstatedir@/tmp
+CFG_CACHEDIR = @localstatedir@/cache
+CFG_WEBDIR = @localstatedir@/www
+
+## paths to interesting programs:
+CFG_PATH_PHP = @PHP@
CFG_PATH_ACROREAD = @ACROREAD@
CFG_PATH_GZIP = @GZIP@
CFG_PATH_GUNZIP = @GUNZIP@
CFG_PATH_TAR = @TAR@
CFG_PATH_DISTILLER = @PS2PDF@
CFG_PATH_GFILE = @FILE@
CFG_PATH_CONVERT = @CONVERT@
CFG_PATH_PDFTOTEXT = @PDFTOTEXT@
CFG_PATH_PDFTK = @PDFTK@
CFG_PATH_PDF2PS = @PDF2PS@
CFG_PATH_PSTOTEXT = @PSTOTEXT@
CFG_PATH_PSTOASCII = @PSTOASCII@
CFG_PATH_ANTIWORD = @ANTIWORD@
CFG_PATH_CATDOC = @CATDOC@
CFG_PATH_WVTEXT = @WVTEXT@
CFG_PATH_PPTHTML = @PPTHTML@
CFG_PATH_XLHTML = @XLHTML@
CFG_PATH_HTMLTOTEXT = @HTMLTOTEXT@
## CFG_BIBINDEX_PATH_TO_STOPWORDS_FILE -- path to the stopwords file. You
## probably don't want to change this path, although you may want to
## change the content of that file. Note that the file is used by the
## rank engine internally, so it should be given even if stopword
## removal in the indexes is not used.
CFG_BIBINDEX_PATH_TO_STOPWORDS_FILE = @prefix@/etc/bibrank/stopwords.kb
-## Furthemore, here are some legacy config variables used by
-## WebSubmit. FIXME: clean most of them away and rename existing ones
-## to fit the CFG_WEBSUBMIT_* naming schema.
-
-counters = @localstatedir@/data/submit/counters
-storage = @localstatedir@/data/submit/storage
-filedir = @localstatedir@/data/files
-xmlmarc2textmarc = @prefix@/bin/xmlmarc2textmarc
-bibupload = @prefix@/bin/bibupload
-bibformat = @prefix@/bin/bibformat
-bibwords = @prefix@/bin/bibwords
-bibconvert = @prefix@/bin/bibconvert
-bibconvertconf = @prefix@/etc/bibconvert/config
+## helper style of variables for WebSubmit:
+CFG_WEBSUBMIT_COUNTERSDIR = @localstatedir@/data/submit/counters
+CFG_WEBSUBMIT_STORAGEDIR = @localstatedir@/data/submit/storage
+CFG_WEBSUBMIT_FILEDIR = @localstatedir@/data/files
+CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR = @prefix@/etc/bibconvert/config
## - end of file -
\ No newline at end of file
diff --git a/config/invenio.conf b/config/invenio.conf
index d3032310d..7c3b8aaa8 100644
--- a/config/invenio.conf
+++ b/config/invenio.conf
@@ -1,560 +1,567 @@
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
###################################################
## About 'invenio.conf' and 'invenio-local.conf' ##
###################################################
## The 'invenio.conf' file contains the vanilla default configuration
## parameters of a CDS Invenio installation, as coming from the
## distribution. The file should be self-explanatory. Once installed
## in its usual location (usually /opt/cds-invenio/etc), you could in
## principle go ahead and change the values according to your local
## needs.
##
## However, you can also create a file named 'invenio-local.conf' in
## the same directory where 'invenio.conf' lives and put there only
## the localizations you need to have different from the default ones.
## For example:
##
## $ cat /opt/cds-invenio/etc/invenio-local.conf
## [Invenio]
-## WEBURL = http://your.site.com
-## SWEBURL = https://your.site.com
+## CFG_SITE_URL = http://your.site.com
+## CFG_SITE_SECURE_URL = https://your.site.com
+## CFG_SITE_ADMIN_EMAIL = john.doe@your.site.com
+## CFG_SITE_SUPPORT_EMAIL = john.doe@your.site.com
##
## The Invenio system will then read both the default invenio.conf
## file and your customized invenio-local.conf file and it will
## override any default options with the ones you have set in your
## local file. This cascading of configuration parameters will ease
## your future upgrades.
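The cascading read described above can be sketched with the standard library's ConfigParser; the read_invenio_conf helper below is illustrative only and is not Invenio's actual startup code:

```python
try:
    from configparser import ConfigParser              # Python 3
except ImportError:                                    # Python 2 fallback
    from ConfigParser import SafeConfigParser as ConfigParser

def read_invenio_conf(etcdir):
    """Read the default invenio.conf, then let an optional
    invenio-local.conf in the same directory override it."""
    config = ConfigParser()
    # ConfigParser.read() processes files in order; values from files
    # read later override values read earlier, which is exactly the
    # cascading behaviour described above.
    config.read([etcdir + '/invenio.conf', etcdir + '/invenio-local.conf'])
    return config
```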
[Invenio]
###################################
## Part 1: Essential parameters ##
###################################
## This part defines essential CDS Invenio internal parameters that
## everybody should override, like the name of the server or the email
## address of the local CDS Invenio administrator.
-## Specify which MySQL server to use, the name of the database to use,
-## and the database access credentials.
-
+## CFG_DATABASE_* - specify which MySQL server to use, the name of the
+## database to use, and the database access credentials.
CFG_DATABASE_HOST = localhost
CFG_DATABASE_NAME = cdsinvenio
CFG_DATABASE_USER = cdsinvenio
CFG_DATABASE_PASS = my123p$ss
-## WEBURL - specify URL under which your installation will be visible.
-## For example, use "http://webserver.domain.com". Do not leave
+## CFG_SITE_URL - specify URL under which your installation will be
+## visible. For example, use "http://your.site.com". Do not leave
## trailing slash.
-WEBURL = http://localhost
+CFG_SITE_URL = http://localhost
-## SWEBURL - specify secure URL under which your installation secure
-## pages such as login or registration will be visible. For example,
-## use "https://webserver.domain.com". Do not leave trailing slash.
-## If you don't plan on using HTTPS, then you may leave this empty.
-SWEBURL = https://localhost
+## CFG_SITE_SECURE_URL - specify secure URL under which your
+## installation secure pages such as login or registration will be
+## visible. For example, use "https://your.site.com". Do not leave
+## trailing slash. If you don't plan on using HTTPS, then you may
+## leave this empty.
+CFG_SITE_SECURE_URL = https://localhost
-## CDSNAME -- the visible name of your CDS Invenio installation.
-CDSNAME = Atlantis Institute of Fictive Science
+## CFG_SITE_NAME -- the visible name of your CDS Invenio installation.
+CFG_SITE_NAME = Atlantis Institute of Fictive Science
-## CDSNAMEINTL -- the international versions of CDSNAME in various
+## CFG_SITE_NAME_INTL -- the international versions of CFG_SITE_NAME in various
## languages, defined using the standard locale-like language codes.
-CDSNAMEINTL_en = Atlantis Institute of Fictive Science
-CDSNAMEINTL_fr = Atlantis Institut des Sciences Fictives
-CDSNAMEINTL_de = Atlantis Institut der fiktiven Wissenschaft
-CDSNAMEINTL_es = Atlantis Instituto de la Ciencia Fictive
-CDSNAMEINTL_ca = Institut Atlantis de Ciència Fictícia
-CDSNAMEINTL_pt = Instituto Atlantis de Ciência Fictícia
-CDSNAMEINTL_it = Atlantis Istituto di Scienza Fittizia
-CDSNAMEINTL_ru = Атлантис Институт фиктивных Наук
-CDSNAMEINTL_sk = Atlantis Inštitút Fiktívnych Vied
-CDSNAMEINTL_cs = Atlantis Institut Fiktivních Věd
-CDSNAMEINTL_no = Atlantis Institutt for Fiktiv Vitenskap
-CDSNAMEINTL_sv = Atlantis Institut för Fiktiv Vetenskap
-CDSNAMEINTL_el = Ινστιτούτο Φανταστικών Επιστημών Ατλαντίδος
-CDSNAMEINTL_uk = Інститут вигаданих наук в Атлантісі
-CDSNAMEINTL_ja = Fictive 科学のAtlantis の協会
-CDSNAMEINTL_pl = Instytut Fikcyjnej Nauki Atlantis
-CDSNAMEINTL_bg = Институт за фиктивни науки Атлантис
-CDSNAMEINTL_hr = Institut Fiktivnih Znanosti Atlantis
-CDSNAMEINTL_zh_CN = 阿特兰提斯虚拟科学学院
-CDSNAMEINTL_zh_TW = 阿特蘭提斯虛擬科學學院
-
-## CDSLANG -- the default language of the interface:
-CDSLANG = en
-
-## CDSLANGS -- list of all languages the user interface should be
-## available in, separated by commas. The order specified below will
-## be respected on the interface pages. A good default would be to
-## use the alphabetical order. Currently supported languages include
-## Bulgarian, Catalan, Czech, German, Greek, English, Spanish, French,
-## Italian, Japanese, Norwegian, Polish, Portuguese, Russian, Slovak, Swedish,
-## and Ukrainian, Chinese (China), Chinese (Taiwan), so that the current
-## eventual maximum you can currently select is
+CFG_SITE_NAME_INTL_en = Atlantis Institute of Fictive Science
+CFG_SITE_NAME_INTL_fr = Atlantis Institut des Sciences Fictives
+CFG_SITE_NAME_INTL_de = Atlantis Institut der fiktiven Wissenschaft
+CFG_SITE_NAME_INTL_es = Atlantis Instituto de la Ciencia Fictive
+CFG_SITE_NAME_INTL_ca = Institut Atlantis de Ciència Fictícia
+CFG_SITE_NAME_INTL_pt = Instituto Atlantis de Ciência Fictícia
+CFG_SITE_NAME_INTL_it = Atlantis Istituto di Scienza Fittizia
+CFG_SITE_NAME_INTL_ru = Атлантис Институт фиктивных Наук
+CFG_SITE_NAME_INTL_sk = Atlantis Inštitút Fiktívnych Vied
+CFG_SITE_NAME_INTL_cs = Atlantis Institut Fiktivních Věd
+CFG_SITE_NAME_INTL_no = Atlantis Institutt for Fiktiv Vitenskap
+CFG_SITE_NAME_INTL_sv = Atlantis Institut för Fiktiv Vetenskap
+CFG_SITE_NAME_INTL_el = Ινστιτούτο Φανταστικών Επιστημών Ατλαντίδος
+CFG_SITE_NAME_INTL_uk = Інститут вигаданих наук в Атлантісі
+CFG_SITE_NAME_INTL_ja = Fictive 科学のAtlantis の協会
+CFG_SITE_NAME_INTL_pl = Instytut Fikcyjnej Nauki Atlantis
+CFG_SITE_NAME_INTL_bg = Институт за фиктивни науки Атлантис
+CFG_SITE_NAME_INTL_hr = Institut Fiktivnih Znanosti Atlantis
+CFG_SITE_NAME_INTL_zh_CN = 阿特兰提斯虚拟科学学院
+CFG_SITE_NAME_INTL_zh_TW = 阿特蘭提斯虛擬科學學院
+
+## CFG_SITE_LANG -- the default language of the interface:
+CFG_SITE_LANG = en
+
+## CFG_SITE_LANGS -- list of all languages the user interface should
+## be available in, separated by commas. The order specified below
+## will be respected on the interface pages. A good default would be
+## to use the alphabetical order. Currently supported languages
+## include Bulgarian, Catalan, Czech, German, Greek, English, Spanish,
+## French, Italian, Japanese, Norwegian, Polish, Portuguese, Russian,
+## Slovak, Swedish, Ukrainian, Chinese (China), and Chinese (Taiwan),
+## so that the maximum you can currently select is
## "bg,ca,cs,de,el,en,es,fr,hr,it,ja,no,pl,pt,ru,sk,sv,uk,zh_CN,zh_TW".
-CDSLANGS = bg,ca,cs,de,el,en,es,fr,hr,it,ja,no,pl,pt,ru,sk,sv,uk,zh_CN,zh_TW
-
-## ALERTENGINEEMAIL -- the email address from which the alert emails
-## will appear to be send:
-ALERTENGINEEMAIL = cds.alert@cdsdev.cern.ch
+CFG_SITE_LANGS = bg,ca,cs,de,el,en,es,fr,hr,it,ja,no,pl,pt,ru,sk,sv,uk,zh_CN,zh_TW
-## SUPPORTEMAIL -- the email address of the support team for this
-## installation:
-SUPPORTEMAIL = cds.support@cern.ch
+## CFG_SITE_SUPPORT_EMAIL -- the email address of the support team for
+## this installation:
+CFG_SITE_SUPPORT_EMAIL = cds.support@cern.ch
-## ADMINEMAIL -- the email address of the 'superuser' for this
-## installation. Enter your email address below and login with this
-## address when using CDS Invenio administration modules. You will then
-## be automatically recognized as superuser of the system.
-ADMINEMAIL = cds.support@cern.ch
+## CFG_SITE_ADMIN_EMAIL -- the email address of the 'superuser' for
+## this installation. Enter your email address below and login with
+## this address when using CDS Invenio administration modules. You
+## will then be automatically recognized as superuser of the system.
+CFG_SITE_ADMIN_EMAIL = cds.support@cern.ch
## CFG_MAX_CACHED_QUERIES -- maximum number of cached queries. After
## reaching this number of cached queries, the cache is pruned by
## deleting half of the least recently accessed cached queries.
CFG_MAX_CACHED_QUERIES = 10000
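The pruning policy described above (drop the older half of the cache once the limit is reached) can be sketched as follows; cache_query is an illustrative helper, not Invenio's implementation:

```python
from collections import OrderedDict

def cache_query(cache, query, result, limit=10000):
    """Store a query result in `cache` (an OrderedDict ordered from
    least to most recently stored); once `limit` entries are reached,
    prune the older half before inserting the new entry."""
    if query in cache:
        del cache[query]                  # re-insert to mark as fresh
    elif len(cache) >= limit:
        for _ in range(limit // 2):       # drop the older half
            cache.popitem(last=False)
    cache[query] = result
```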
# FIXME: change name to express SQL queries
## CFG_MISCUTIL_USE_SQLALCHEMY -- whether to use SQLAlchemy.pool in
## the DB engine of CDS Invenio. It is okay to enable this flag even
## if you have not installed SQLAlchemy. Note that Invenio will lose
## some performance if CFG_MISCUTIL_USE_SQLALCHEMY is enabled.
CFG_MISCUTIL_USE_SQLALCHEMY = False
## CFG_MISCUTIL_SMTP_HOST -- which server to use as outgoing mail server to
## send outgoing emails generated by the system, for example concerning
## submissions or email notification alerts.
CFG_MISCUTIL_SMTP_HOST = localhost
## CFG_MISCUTIL_SMTP_PORT -- which port to use on the outgoing mail server
## defined in the previous step.
CFG_MISCUTIL_SMTP_PORT = 25
## CFG_APACHE_PASSWORD_FILE -- the file where Apache user credentials
## are stored. Must be an absolute pathname. If the value does not
## start with a slash, it is considered to be the filename of a file
## located under prefix/var/tmp directory. This is useful for the
## demo site testing purposes. For the production site, if you plan
## to restrict access to some collections based on the Apache user
## authentication mechanism, you should put here an absolute path to
## your Apache password file.
CFG_APACHE_PASSWORD_FILE = demo-site-apache-user-passwords
## CFG_APACHE_GROUP_FILE -- the file where Apache user groups are
## defined. See the documentation of the preceding config variable.
CFG_APACHE_GROUP_FILE = demo-site-apache-user-groups
## CFG_CERN_SITE -- do we want to enable CERN-specific code, like the
## one that proposes links to famous HEP sites such as Spires and KEK?
## Put "1" for "yes" and "0" for "no".
CFG_CERN_SITE = 0
################################
## Part 2: Web page style ##
################################
## The variables affecting the page style. The most important one is
## the 'template skin' you would like to use. Please refer to the
## WebStyle Admin Guide for more explanation. The other variables are
## listed here mostly for backwards compatibility purposes only.
## CFG_WEBSTYLE_TEMPLATE_SKIN -- what template skin do you want to
## use?
CFG_WEBSTYLE_TEMPLATE_SKIN = default
## CFG_WEBSTYLE_CDSPAGEBOXLEFTTOP -- optional global HTML left top box:
CFG_WEBSTYLE_CDSPAGEBOXLEFTTOP =
## CFG_WEBSTYLE_CDSPAGEBOXLEFTBOTTOM -- optional global HTML left bottom box:
CFG_WEBSTYLE_CDSPAGEBOXLEFTBOTTOM =
## CFG_WEBSTYLE_CDSPAGEBOXRIGHTTOP -- optional global HTML right top box:
CFG_WEBSTYLE_CDSPAGEBOXRIGHTTOP =
## CFG_WEBSTYLE_CDSPAGEBOXRIGHTBOTTOM -- optional global HTML right bottom box:
CFG_WEBSTYLE_CDSPAGEBOXRIGHTBOTTOM =
##################################
## Part 3: WebSearch parameters ##
##################################
## This section contains some configuration parameters for WebSearch
## module. Please note that WebSearch is mostly configured on
## run-time via its WebSearch Admin web interface. The parameters
## below are the ones that you probably do not want to modify very
## often during the runtime. (Note that you may modify them
## afterwards too, though.)
## CFG_WEBSEARCH_SEARCH_CACHE_SIZE -- how many queries we want to
## cache in memory per Apache httpd process? This cache is used
## mainly for "next/previous page" functionality, but it also caches
## "popular" user queries if more than one user happens to search for
## the same thing. Note that large numbers may lead to great memory
## consumption. We recommend a value not greater than 100.
CFG_WEBSEARCH_SEARCH_CACHE_SIZE = 100
## CFG_WEBSEARCH_FIELDS_CONVERT -- if you migrate from an older
## system, you may want to map field codes of your old system (such as
## 'ti') to CDS Invenio/MySQL ("title"). Use Python dictionary syntax
## for the translation table, e.g. {'wau':'author', 'wti':'title'}.
## Usually you don't want to do that, and you would use empty dict {}.
CFG_WEBSEARCH_FIELDS_CONVERT = {}
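As a sketch, assuming the example mapping from the comment above, such a translation table would be applied like this (convert_field_code is a hypothetical helper, not an Invenio function):

```python
# Example mapping taken from the comment above; empty {} means no
# translation at all.
CFG_WEBSEARCH_FIELDS_CONVERT = {'wau': 'author', 'wti': 'title'}

def convert_field_code(code, table=CFG_WEBSEARCH_FIELDS_CONVERT):
    """Translate an old-system field code through the table, falling
    back to the code itself when no mapping is defined."""
    return table.get(code, code)
```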
## CFG_WEBSEARCH_SIMPLESEARCH_PATTERN_BOX_WIDTH -- width of the search
## pattern window in the simple search interface, in characters.
CFG_WEBSEARCH_SIMPLESEARCH_PATTERN_BOX_WIDTH = 40
## CFG_WEBSEARCH_ADVANCEDSEARCH_PATTERN_BOX_WIDTH -- width of the
## search pattern window in the advanced search interface, in
## characters.
CFG_WEBSEARCH_ADVANCEDSEARCH_PATTERN_BOX_WIDTH = 30
## CFG_WEBSEARCH_NB_RECORDS_TO_SORT -- how many records do we still
## want to sort? For higher numbers we print only a warning and won't
## perform any sorting other than default 'latest records first', as
## sorting would be very time-consuming then. We recommend a value of
## not more than a couple of thousand.
CFG_WEBSEARCH_NB_RECORDS_TO_SORT = 1000
## CFG_WEBSEARCH_CALL_BIBFORMAT -- if a record is being displayed but
## it was not preformatted in the "HTML brief" format, do we want to
## call BibFormatting on the fly? Put "1" for "yes" and "0" for "no".
## Note that "1" will display the record exactly as if it were fully
## preformatted, but it may be slow due to on-the-fly processing; "0"
## will display a default format very fast, but it may not have all
## the fields as in the fully preformatted HTML brief format. Note
## also that this option is active only for old (PHP) formats; the new
## (Python) formats are called on the fly by default anyway, since
## they are much faster. When unsure, please set "0" here.
CFG_WEBSEARCH_CALL_BIBFORMAT = 0
## CFG_WEBSEARCH_USE_ALEPH_SYSNOS -- do we want to make old SYSNOs
## visible rather than MySQL's record IDs? You may use this if you
## migrate from a different e-doc system, and you store your old
## system numbers into 970__a. Put "1" for "yes" and "0" for
## "no". Usually you don't want to do that, though.
CFG_WEBSEARCH_USE_ALEPH_SYSNOS = 0
## CFG_WEBSEARCH_I18N_LATEST_ADDITIONS -- Put "1" if you want the
## "Latest Additions" in the web collection pages to show
## internationalized records. Useful only if your brief BibFormat
## templates contain internationalized strings. Otherwise put "0" in
## order not to slow down the creation of latest additions by WebColl.
CFG_WEBSEARCH_I18N_LATEST_ADDITIONS = 0
## CFG_WEBSEARCH_INSTANT_BROWSE -- the number of records to display
## under 'Latest Additions' in the web collection pages.
CFG_WEBSEARCH_INSTANT_BROWSE = 10
## CFG_WEBSEARCH_INSTANT_BROWSE_RSS -- the number of records to
## display in the RSS feed.
CFG_WEBSEARCH_INSTANT_BROWSE_RSS = 25
## CFG_WEBSEARCH_RSS_TTL -- number of minutes that indicates how long
## a feed cache is valid.
CFG_WEBSEARCH_RSS_TTL = 360
## CFG_WEBSEARCH_RSS_MAX_CACHED_REQUESTS -- maximum number of requests
## kept in cache. If the cache is full, subsequent requests are not cached.
CFG_WEBSEARCH_RSS_MAX_CACHED_REQUESTS = 1000
## CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD -- up to how many author names
## to print explicitly; for more, print "et al". Note that this is
## used in default formatting that is seldom used, as usually
## BibFormat defines all the format. The value below is only used
## when BibFormat fails, for example.
CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD = 3
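A minimal sketch of the "et al" threshold behaviour; the format_authors helper and the "; " separator are illustrative assumptions, not Invenio's actual default formatting:

```python
def format_authors(authors, threshold=3):
    """Print at most `threshold` author names explicitly and append
    'et al' when the list is longer."""
    if len(authors) <= threshold:
        return '; '.join(authors)
    return '; '.join(authors[:threshold]) + ' et al'
```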
## CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS -- whether or not to
## show collection grandsons in Narrow Search boxes (sons are shown by
## default, grandsons are configurable here). Use 0 for no and 1 for
## yes.
CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS = 1
## CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX -- shall we
## create help links for Ellis, Nick or Ellis, Nicholas and friends
## when Ellis, N was searched for? Useful if you have one author
## stored in the database under several name formats, namely surname
## comma firstname and surname comma initial cataloging policy. Use 0
## for no and 1 for yes.
CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX = 1
## CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS -- jsMath is a Javascript
## library that renders (La)TeX mathematical formulas in the client
## browser. This parameter must contain a list of output formats for
## which to apply jsMath rendering, for example "['hd', 'hb']". If
## the list is empty, jsMath is disabled.
CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS = []
#######################################
## Part 4: BibHarvest OAI parameters ##
#######################################
## This part defines parameters for the CDS Invenio OAI gateway.
## Useful if you are running CDS Invenio as OAI data provider.
## CFG_OAI_ID_FIELD -- OAI identifier MARC field:
CFG_OAI_ID_FIELD = 909COo
## CFG_OAI_SET_FIELD -- OAI set MARC field:
CFG_OAI_SET_FIELD = 909COp
## CFG_OAI_DELETED_POLICY -- OAI deleted records policy
## (no/transient/persistent).
CFG_OAI_DELETED_POLICY = no
## CFG_OAI_ID_PREFIX -- OAI identifier prefix:
CFG_OAI_ID_PREFIX = atlantis.cern.ch
## CFG_OAI_SAMPLE_IDENTIFIER -- OAI sample identifier:
CFG_OAI_SAMPLE_IDENTIFIER = oai:atlantis.cern.ch:CERN-TH-4036
## CFG_OAI_IDENTIFY_DESCRIPTION -- description for the OAI Identify verb:
CFG_OAI_IDENTIFY_DESCRIPTION = <description>
 <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
                                     http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
  <scheme>oai</scheme>
  <repositoryIdentifier>atlantis.cern.ch</repositoryIdentifier>
  <delimiter>:</delimiter>
  <sampleIdentifier>oai:atlantis.cern.ch:CERN-TH-4036</sampleIdentifier>
 </oai-identifier>
 </description>
 <description>
 <eprints xmlns="http://www.openarchives.org/OAI/1.1/eprints"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.openarchives.org/OAI/1.1/eprints
                              http://www.openarchives.org/OAI/1.1/eprints.xsd">
  <content>
   <URL>http://atlantis.cern.ch/</URL>
  </content>
  <metadataPolicy>
   <text>Free and unlimited use by anybody with obligation to refer to original record</text>
  </metadataPolicy>
  <dataPolicy>
   <text>Full content, i.e. preprints may not be harvested by robots</text>
  </dataPolicy>
  <submissionPolicy>
   <text>Submission restricted. Submitted documents are subject of approval by OAI repository admins.</text>
  </submissionPolicy>
 </eprints>
 </description>
## CFG_OAI_LOAD -- OAI number of records in a response:
CFG_OAI_LOAD = 1000
## CFG_OAI_EXPIRE -- OAI resumptionToken expiration time:
CFG_OAI_EXPIRE = 90000
## CFG_OAI_SLEEP -- the gateway reports "service unavailable" for
## CFG_OAI_SLEEP seconds between two consecutive requests:
CFG_OAI_SLEEP = 10
##################################
## Part 5: WebSubmit parameters ##
##################################
## This section contains some configuration parameters for WebSubmit
## module. Please note that WebSubmit is mostly configured on
## run-time via its WebSubmit Admin web interface. The parameters
## below are the ones that you probably do not want to modify during
## the runtime.
-## filedirsize -- all attached fulltext files are stored
-## under the CFG_FILE_DIR directory, inside subdirectories called gX
-## this variable indicates the maximum number of files stored in each
-## subdirectories.
-filedirsize = 5000
+## CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT -- the fulltext
+## documents are stored under "/opt/cds-invenio/var/data/files/gX/Y"
+## directories where X is 0,1,... and Y stands for the bibdoc ID. Thus
+## documents Y are grouped into directories X, and this variable
+## indicates the maximum number of documents Y stored in each
+## directory X. This limit is imposed solely for filesystem
+## performance reasons in order not to have too many subdirectories in
+## a given directory.
+CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT = 5000
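Assuming groups are filled in simple ID order, the gX/Y layout described above can be sketched as follows (bibdoc_directory is a hypothetical helper, not Invenio's actual file-system code):

```python
def bibdoc_directory(bibdoc_id, base='/opt/cds-invenio/var/data/files',
                     group_limit=5000):
    """Map a bibdoc ID Y to its "gX/Y" storage directory, with at most
    `group_limit` documents per gX group."""
    group = bibdoc_id // group_limit      # which gX group the ID falls in
    return '%s/g%d/%d' % (base, group, bibdoc_id)
```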
#################################
## Part 6: BibIndex parameters ##
#################################
## This section contains some configuration parameters for BibIndex
## module. Please note that BibIndex is mostly configured on run-time
## via its BibIndex Admin web interface. The parameters below are the
## ones that you probably do not want to modify very often during the
## runtime.
## CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY -- when fulltext indexing, do
## you want to index locally stored files only, or also external URLs?
## Use "0" to say "no" and "1" to say "yes".
CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY = 0
## CFG_BIBINDEX_REMOVE_STOPWORDS -- when indexing, do we want to remove
## stopwords? Use "0" to say "no" and "1" to say "yes".
CFG_BIBINDEX_REMOVE_STOPWORDS = 0
## CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS -- characters considered as
## alphanumeric separators of word-blocks inside words. You probably
## don't want to change this.
CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS = \!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~
# FIXME: maybe remove backslashes
## CFG_BIBINDEX_CHARS_PUNCTUATION -- characters considered as punctuation
## between word-blocks inside words. You probably don't want to
## change this.
CFG_BIBINDEX_CHARS_PUNCTUATION = \.\,\:\;\?\!\"\(\)\'\`\<\>
# FIXME: maybe remove backslashes
## CFG_BIBINDEX_REMOVE_HTML_MARKUP -- should we attempt to remove HTML markup
## before indexing? Use 1 if you have HTML markup inside metadata
## (e.g. in abstracts), use 0 otherwise.
CFG_BIBINDEX_REMOVE_HTML_MARKUP = 0
## CFG_BIBINDEX_REMOVE_LATEX_MARKUP -- should we attempt to remove LATEX markup
## before indexing? Use 1 if you have LATEX markup inside metadata
## (e.g. in abstracts), use 0 otherwise.
CFG_BIBINDEX_REMOVE_LATEX_MARKUP = 0
## CFG_BIBINDEX_MIN_WORD_LENGTH -- minimum word length allowed to be added to
## index. Terms shorter than this length will be discarded.
## Useful to keep the database clean, however you can safely leave
## this value on 0 for up to 1,000,000 documents.
CFG_BIBINDEX_MIN_WORD_LENGTH = 0
## CFG_BIBINDEX_URLOPENER_USERNAME and CFG_BIBINDEX_URLOPENER_PASSWORD --
## access credentials to access restricted URLs, interesting only if
## you are fulltext-indexing files located on a remote server that is
## only available via username/password. But it's probably better to
## handle this case via IP or some convention; the current scheme is
## mostly there for demo only.
CFG_BIBINDEX_URLOPENER_USERNAME = mysuperuser
CFG_BIBINDEX_URLOPENER_PASSWORD = mysuperpass
## CFG_INTBITSET_ENABLE_SANITY_CHECKS --
## Enable sanity checks for integers passed to the intbitset data
## structures. It is good to enable this during debugging and to
## disable it for speed improvements.
CFG_INTBITSET_ENABLE_SANITY_CHECKS = False
#######################################
## Part 7: Access control parameters ##
#######################################
## This section contains some configuration parameters for the access
## control system. Please note that WebAccess is mostly configured on
## run-time via its WebAccess Admin web interface. The parameters
## below are the ones that you probably do not want to modify very
## often during the runtime. (If you do want to modify them during
## runtime, for example to deny access temporarily because of backups,
## you can edit access_control_config.py directly, no need to get back
## here and no need to redo the make process.)
## CFG_ACCESS_CONTROL_LEVEL_SITE -- defines how open this site is.
## Use 0 for normal operation of the site, 1 for read-only site (all
## write operations temporarily closed), 2 for site fully closed.
## Useful for site maintenance.
CFG_ACCESS_CONTROL_LEVEL_SITE = 0
## CFG_ACCESS_CONTROL_LEVEL_GUESTS -- guest users access policy. Use
## 0 to allow guest users, 1 not to allow them (all users must login).
CFG_ACCESS_CONTROL_LEVEL_GUESTS = 0
## CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS -- account registration and
## activation policy. When 0, users can register and accounts are
## automatically activated. When 1, users can register but admin must
## activate the accounts. When 2, users cannot register nor update
## their email address, only admin can register accounts. When 3,
## users cannot register nor update email address nor password, only
## admin can register accounts. When 4, the same as 3 applies, and
## additionally the user cannot change the login method.
CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS = 0
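For quick reference, the account levels documented above can be summarized as follows; the dictionary is a reading aid only, not Invenio code:

```python
# Summary of CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS values (illustrative).
ACCOUNT_POLICY = {
    0: "users register; accounts are activated automatically",
    1: "users register; admin must activate the accounts",
    2: "only admin registers accounts; users cannot update their email",
    3: "as 2, and users cannot update their password either",
    4: "as 3, and users cannot change their login method",
}
```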
## CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN -- limit account
## registration to certain email addresses? If wanted, give domain
## name below, e.g. "cern.ch". If not wanted, leave it empty.
CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN =
## CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS -- send a
## notification email to the administrator when a new account is
## created? Use 0 for no, 1 for yes.
CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS = 0
## CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT -- send a
## notification email to the user when a new account is created in
## order to verify the validity of the provided email address? Use
## 0 for no, 1 for yes.
CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT = 1
## CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION -- send a
## notification email to the user when a new account is activated?
## Use 0 for no, 1 for yes.
CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION = 0
## CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION -- send a
## notification email to the user when an account is deleted or an
## account request is rejected? Use 0 for no, 1 for yes.
CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION = 0
###############################
## FIXME: Undocumented ones: ##
###############################
## BibRank:
CFG_BIBRANK_SHOW_READING_STATS = 1
CFG_BIBRANK_SHOW_DOWNLOAD_STATS = 1
CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS = 1
CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS_CLIENT_IP_DISTRIBUTION = 0
CFG_BIBRANK_SHOW_CITATION_LINKS = 1
CFG_BIBRANK_SHOW_CITATION_STATS = 1
CFG_BIBRANK_SHOW_CITATION_GRAPHS = 1
## WebComment:
CFG_WEBCOMMENT_ALLOW_COMMENTS = 1
CFG_WEBCOMMENT_ALLOW_REVIEWS = 1
CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS = 0
CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN = 5
CFG_WEBCOMMENT_NB_COMMENTS_IN_DETAILED_VIEW = 1
CFG_WEBCOMMENT_NB_REVIEWS_IN_DETAILED_VIEW = 1
CFG_WEBCOMMENT_ADMIN_NOTIFICATION_LEVEL = 1
CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_COMMENTS_IN_SECONDS = 20
CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_REVIEWS_IN_SECONDS = 20
# FIXME: not found in modules subdir?!
CFG_WEBCOMMENT_TIMELIMIT_VOTE_VALIDITY_IN_DAYS = 365
# FIXME: not found in modules subdir?!
CFG_WEBCOMMENT_TIMELIMIT_REPORT_VALIDITY_IN_DAYS = 100
-
## BibSched:
CFG_BIBSCHED_REFRESHTIME = 5
# CFG_BIBSCHED_LOG_PAGER = "/bin/more"
CFG_BIBSCHED_LOG_PAGER = None
+## WebAlert:
+
+## CFG_WEBALERT_ALERT_ENGINE_EMAIL -- the email address from which the
+## alert emails will appear to be sent:
+CFG_WEBALERT_ALERT_ENGINE_EMAIL = cds.alert@cdsdev.cern.ch
+
##########################
## THAT's ALL, FOLKS! ##
##########################
\ No newline at end of file
diff --git a/modules/bibclassify/lib/bibclassify_daemon.py b/modules/bibclassify/lib/bibclassify_daemon.py
index 05dc0a4b8..2c5798a45 100644
--- a/modules/bibclassify/lib/bibclassify_daemon.py
+++ b/modules/bibclassify/lib/bibclassify_daemon.py
@@ -1,175 +1,175 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibClassify daemon.
FIXME: the code below requires collection table to be updated to add column:
clsMETHOD_fk mediumint(9) unsigned NOT NULL,
This is not clean and should be fixed.
"""
__revision__ = "$Id$"
import sys
from invenio.dbquery import run_sql
from invenio.bibtask import task_init, write_message, get_datetime, \
task_set_option, task_get_option, task_get_task_param, task_update_status, \
task_update_progress
from invenio.bibclassifylib import generate_keywords_rdf
from invenio.config import *
from os import popen, remove, listdir
import sys
from invenio.intbitset import intbitset
from invenio.search_engine import get_collection_reclist
from invenio.bibdocfile import BibRecDocs
import time
import os
def get_recids_foreach_ontology():
"""Returns an array containing hash objects containing the collection, its
corresponding ontology and the records belonging to the given collection."""
rec_onts = []
res = run_sql("""SELECT clsMETHOD.name, last_updated, collection.name
FROM clsMETHOD JOIN collection_clsMETHOD ON clsMETHOD.id=id_clsMETHOD
JOIN collection ON id_collection=collection.id""")
for ontology, date_last_run, collection in res:
recs = get_collection_reclist(collection)
if recs:
if not date_last_run:
date_last_run = '0000-00-00'
modified_records = intbitset(run_sql("SELECT id FROM bibrec WHERE modification_date >=%s", (date_last_run, )))
recs &= modified_records
if recs:
rec_onts.append({
'ontology' : ontology,
'collection' : collection,
'recIDs' : recs
})
return rec_onts
def update_date_of_last_run():
"""
Update bibclassify daemon table information about last run time.
"""
run_sql("UPDATE clsMETHOD SET last_updated=NOW()")
def task_run_core():
"""Runs anayse_documents for each ontology,collection,record ids set."""
- outfilename = tmpdir + "/bibclassifyd_%s.xml" % time.strftime("%Y%m%dH%M%S", time.localtime())
+ outfilename = CFG_TMPDIR + "/bibclassifyd_%s.xml" % time.strftime("%Y%m%dH%M%S", time.localtime())
outfiledesc = open(outfilename, "w")
coll_counter = 0
print >> outfiledesc, """<?xml version="1.0" encoding="UTF-8"?>"""
print >> outfiledesc, """<collection xmlns="http://www.loc.gov/MARC21/slim">"""
for onto_rec in get_recids_foreach_ontology():
write_message('Applying taxonomy %s to collection %s (%s records)' % (onto_rec['ontology'], onto_rec['collection'], len(onto_rec['recIDs'])))
if onto_rec['recIDs']:
coll_counter += analyse_documents(onto_rec['recIDs'], onto_rec['ontology'], onto_rec['collection'], outfilename, outfiledesc)
print >> outfiledesc, '</collection>'
outfiledesc.close()
if coll_counter:
- cmd = "%s/bibupload -n -c '%s' " % (bindir, outfilename)
+ cmd = "%s/bibupload -n -c '%s' " % (CFG_BINDIR, outfilename)
errcode = 0
try:
errcode = os.system(cmd)
except OSError, e:
print 'command' + cmd + ' failed ',e
if errcode != 0:
write_message("WARNING, %s failed, error code is %s" % (cmd,errcode))
return 0
update_date_of_last_run()
return 1
def analyse_documents(recs, ontology, collection, outfilename, outfiledesc):
"""For each collection, parse the documents attached to the records in collection with the corresponding ontology."""
time_now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
did_something = False
counter = 1
max = len(recs)
# store running time:
# see which records we'll have to process:
#recIDs = get_recIDs_of_modified_records_since_last_run()
temp_text = None
if recs:
# process records:
cmd = None
path = None
- temp_text = tmpdir + '/bibclassify.pdftotext.' + str(os.getpid())
+ temp_text = CFG_TMPDIR + '/bibclassify.pdftotext.' + str(os.getpid())
for rec in recs:
bibdocfiles = BibRecDocs(rec).list_latest_files()
found_one_pdf = False
for bibdocfile in bibdocfiles:
if bibdocfile.get_format() == '.pdf':
found_one_pdf = True
if found_one_pdf:
did_something = True
print >> outfiledesc, '<record>'
print >> outfiledesc, """<controlfield tag="001">%(recID)s</controlfield>""" % ({'recID':rec})
for f in bibdocfiles:
if f.get_format() == '.pdf':
cmd = "%s '%s' '%s'" % (CFG_PATH_PDFTOTEXT, f.get_full_path(), temp_text)
else:
write_message("Can't parse file %s." % f.get_full_path(), verbose=3)
continue
errcode = os.system(cmd)
if errcode != 0 or not os.path.exists("%s" % temp_text):
write_message("Error while executing command %s Error code was: %s " % (cmd, errcode))
write_message('Generating keywords for %s' % f.get_full_path())
- print >> outfiledesc, generate_keywords_rdf(temp_text, etcdir + '/bibclassify/' + ontology + '.rdf', 2, 70, 25, 0, False, verbose=0, ontology=ontology)
+ print >> outfiledesc, generate_keywords_rdf(temp_text, CFG_ETCDIR + '/bibclassify/' + ontology + '.rdf', 2, 70, 25, 0, False, verbose=0, ontology=ontology)
print >> outfiledesc, '</record>'
task_update_progress("Done %s of %s for collection %s." % (counter, max, collection))
counter += 1
else:
write_message("Nothing to be done, move along")
return did_something
def cleanup_tmp():
"""Remove old temporary files created by this module"""
- for f in listdir(tmpdir):
- if 'bibclassify' in f: remove(tmpdir + '/' +f)
+ for f in listdir(CFG_TMPDIR):
+ if 'bibclassify' in f: remove(CFG_TMPDIR + '/' +f)
def main():
"""Constructs the bibclassifyd bibtask."""
cleanup_tmp()
task_init(authorization_action='runbibclassify',
authorization_msg="BibClassify Task Submission",
description="""Examples:
%s -u admin
""" % (sys.argv[0],),
version=__revision__,
task_run_fnc = task_run_core)
if __name__ == '__main__':
main()
# FIXME: one can have more than one ontologies in clsMETHOD.
# bibclassifyd -w HEP,Pizza
# FIXME: add more CLI options like bibindex ones, e.g.
# bibclassifyd -a -i 10-20
# FIXME: outfiledesc can be multiple files, e.g. when processing
# 100000 records it is good to store results by 1000 records
# (see oaiharvest)
diff --git a/modules/bibclassify/lib/bibclassifylib.py b/modules/bibclassify/lib/bibclassifylib.py
index 0d3a66a7b..d94b3869c 100644
--- a/modules/bibclassify/lib/bibclassifylib.py
+++ b/modules/bibclassify/lib/bibclassifylib.py
@@ -1,805 +1,805 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Bibclassify keyword extractor command line entry point.
"""
__revision__ = "$Id$"
import getopt
import string
import os
import re
import sys
import time
import copy
import shelve
from invenio.bibtask import write_message
# Please point the following variables to the correct paths if using standalone (Invenio-independent) version
TMPDIR_STANDALONE = "/tmp"
PDFTOTEXT_STANDALONE = "/usr/bin/pdftotext"
fontSize = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
def usage(code, msg=''):
"Prints usage for this module."
if msg:
sys.stderr.write("Error: %s.\n" % msg)
usagetext = """
Usage: bibclassify [options]
Examples:
bibclassify -f file.pdf -k thesaurus.txt -o TEXT
bibclassify -f file.txt -K taxonomy.rdf -l 120 -m FULL
Specific options:
-f, --file=FILENAME name of the file to be classified (Use '.pdf' extension for PDF files; every other extension is treated as text)
-k, --thesaurus=FILENAME name of the text thesaurus (one keyword per line)
-K, --taxonomy=FILENAME name of the RDF SKOS taxonomy/ontology (a local file or URL)
-o, --output=HTML|TEXT|MARCXML output list of keywords in either HTML, text, or MARCXML
-l, --limit=INTEGER maximum number of keywords that will be processed to generate results (the higher the -l value, the more possible composite keywords)
-n, --nkeywords=INTEGER maximum number of single keywords that will be generated
-m, --mode=FULL|PARTIAL processing mode: PARTIAL (run on abstract and selected pages), FULL (run on whole document - more accurate, but slower)
-q, --spires outputs composite keywords in the SPIRES standard format (ckw1, ckw2)
General options:
-h, --help print this help and exit
-V, --version print version and exit
-v, --verbose=LEVEL Verbose level (0=min, 1=default, 9=max).
"""
sys.stderr.write(usagetext)
sys.exit(code)
def generate_keywords(textfile, dictfile, verbose=0):
""" A method that generates a sorted list of keywords of a document (textfile) based on a simple thesaurus (dictfile). """
keylist = []
keyws = []
wordlista = os.popen("more " + dictfile)
thesaurus = [x[:-1] for x in wordlista.readlines()]
for keyword in thesaurus:
try:
string.atoi(keyword)
except ValueError:
dummy = 1
else:
continue
if len(keyword)<=1: #whitespace or one char - get rid of
continue
else:
dictOUT = os.popen('grep -iwc "' +keyword.strip()+'" '+textfile).read()
try:
occur = int(dictOUT)
if occur != 0:
keylist.append([occur, keyword])
except ValueError:
continue
keylist.sort()
keylist.reverse()
for item in keylist:
keyws.append(item[1])
return keyws
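# generate_keywords() above shells out to "more" and "grep -iwc"; the same
# whole-word, case-insensitive counting can be sketched with the standard
# library alone. This is an illustrative sketch, not part of the module:
# the function name count_thesaurus_keywords is hypothetical, and isdigit()
# only approximates the original's string.atoi() check for numeric entries.

```python
import re

def count_thesaurus_keywords(text, thesaurus):
    """Count whole-word, case-insensitive occurrences of each thesaurus
    keyword in text; return the keywords sorted by descending frequency."""
    keylist = []
    for keyword in thesaurus:
        keyword = keyword.strip()
        # Skip numeric or one-character entries, as generate_keywords() does.
        if len(keyword) <= 1 or keyword.isdigit():
            continue
        # \b gives the same whole-word semantics as grep -w.
        pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
        occur = len(pattern.findall(text))
        if occur:
            keylist.append((occur, keyword))
    # Highest count first, mirroring the sort()/reverse() pair above.
    keylist.sort(reverse=True)
    return [kw for _, kw in keylist]
```
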
def generate_keywords_rdf(textfile, dictfile, output, limit, nkeywords, mode, spires, verbose=0, ontology=None):
""" A method that generates a sorted list of keywords (text or html output) based on a RDF thesaurus. """
import rdflib
keylist = []
ckwlist = {}
outlist = []
compositesOUT = []
compositesTOADD = []
keys2drop = []
raw = []
composites = {}
compositesIDX = {}
text_out = ""
html_out = []
store = None
reusing_compiled_ontology_p = False
compiled_ontology_db = None
compiled_ontology_db_file = dictfile + '.db'
namespace = rdflib.Namespace("http://www.w3.org/2004/02/skos/core#")
if not(os.access(dictfile,os.F_OK) and os.access(compiled_ontology_db_file,os.F_OK) and os.path.getmtime(compiled_ontology_db_file) > os.path.getmtime(dictfile)):
# changed graph type, recommended by devel team
store = rdflib.ConjunctiveGraph()
store.parse(dictfile)
compiled_ontology_db = shelve.open(compiled_ontology_db_file)
compiled_ontology_db['graph'] = store
if verbose >= 3:
write_message("Creating compiled ontology %s for the first time" % compiled_ontology_db_file, sys.stderr)
else:
if verbose >= 3:
write_message("Reusing compiled ontology %s" % compiled_ontology_db_file, sys.stderr)
reusing_compiled_ontology_p = True
compiled_ontology_db = shelve.open(compiled_ontology_db_file)
store = compiled_ontology_db['graph']
size = int(os.stat(textfile).st_size)
rtmp = open(textfile, 'r')
atmp = open(textfile, 'r')
# ASSUMPTION: guessing that the first 10% of the file contains the title and abstract
abstract = " " + str(atmp.read(int(size*0.1))) + " "
if mode == 1:
# Partial mode: analysing only abstract + title + middle portion of document
# The abstract and title together are generally no more than 20% of the whole document.
text_string = " " + str(rtmp.read(int(size*0.2)))
throw_away = str(rtmp.read(int(size*0.25)))
text_string += str(rtmp.read(int(size*0.2)))
else:
# Full mode: get all document
text_string = " " + str(rtmp.read()) + " "
atmp.close()
rtmp.close()
try:
# Here we are trying to match the human-assigned keywords
# These are generally found in a document after the key phrase "keywords" or similar
# str.find() returns 0 for a match at the start and -1 for no match,
# so compare against -1 instead of truth-testing the result.
if text_string.find("Keywords:") != -1:
safe_keys = text_string.split("Keywords:")[1].split("\n")[0]
elif text_string.find("Key words:") != -1:
safe_keys = text_string.split("Key words:")[1].split("\n")[0]
elif text_string.find("Key Words:") != -1:
safe_keys = text_string.split("Key Words:")[1].split("\n")[0]
else:
safe_keys = ""
except:
safe_keys = ""
if safe_keys != "":
write_message("Author keyword string detected: %s" % safe_keys, verbose=8)
# Here we start the big for loop around all concepts in the RDF ontology
if not reusing_compiled_ontology_p:
# we have to compile ontology first:
for s,pref in store.subject_objects(namespace["prefLabel"]):
dictOUT = 0
safeOUT = 0
hideOUT = 0
candidates = []
wildcard = ""
regex = False
nostandalone = False
# For each concept, we gather the candidates (i.e. prefLabel, hiddenLabel and altLabel)
candidates.append(pref.strip())
# If the candidate is a ckw and it has no altLabel, we are not interested at this point, go to the next item
if store.value(s,namespace["compositeOf"],default=False,any=True) and not store.value(s,namespace["altLabel"],default=False,any=True):
continue
if str(store.value(s,namespace["note"],any=True)) == "nostandalone":
nostandalone = True
for alt in store.objects(s, namespace["altLabel"]):
candidates.append(alt.strip())
for hid in store.objects(s, namespace["hiddenLabel"]):
candidates.append(hid.strip())
# We then create a regex pattern for each candidate and we match it in the document
# First we match any possible candidate containing regex. These have to be handled a priori
# (because they might cause double matching, e.g. "gauge theor*" will match "gauge theory")
for candidate in candidates:
if candidate.find("/", 0, 1) > -1:
# We have a wildcard or other regex, do not escape chars
# Wildcards matched with '\w*'. These truncations should go into hidden labels in the ontology
regex = True
pattern = makePattern(candidate, 3)
wildcard = pattern
hideOUT += len(re.findall(pattern,text_string))
# print "HIDEOUT: " + str(candidate) + " " + str(hideOUT)
for candidate in candidates:
# Different patterns are created according to the type of candidate keyword encountered
if candidate.find("/", 0, 1) > -1:
# We have already taken care of this
continue
elif regex and candidate.find("/", 0, 1) == -1 and len(re.findall(wildcard," " + candidate + " ")) > 0:
# The wildcard in hiddenLabel matches this candidate: skip it
# print "\ncase 2 touched\n"
continue
elif candidate.find("-") > -1:
# We have a hyphen -> e.g. "word-word". Look for: "word-word", "wordword", "word word" (case insensitive)
pattern = makePattern(candidate, 2)
elif candidate[:2].isupper() or len(candidate) < 3:
# First two letters are uppercase or very short keyword. This could be an acronym. Better leave case untouched
pattern = makePattern(candidate, 1)
else:
# Let's do some plain case insensitive search
pattern = makePattern(candidate, 0)
if len(candidate) < 3:
# We have a short keyword
if len(re.findall(pattern,abstract))> 0:
# The short keyword appears in the abstract/title, retain it
dictOUT += len(re.findall(pattern,text_string))
safeOUT += len(re.findall(pattern,safe_keys))
else:
dictOUT += len(re.findall(pattern,text_string))
safeOUT += len(re.findall(pattern,safe_keys))
dictOUT += hideOUT
if dictOUT > 0 and store.value(s,namespace["compositeOf"],default=False,any=True):
# This is a ckw whose altLabel occurs in the text
ckwlist[s.strip()] = dictOUT
elif dictOUT > 0:
keylist.append([dictOUT, s.strip(), pref.strip(), safeOUT, candidates, nostandalone])
regex = False
keylist.sort()
keylist.reverse()
compiled_ontology_db['keylist'] = keylist
compiled_ontology_db.close()
else:
# we can reuse compiled ontology:
keylist = compiled_ontology_db['keylist']
compiled_ontology_db.close()
if limit > len(keylist):
limit = len(keylist)
if nkeywords > limit:
nkeywords = limit
# Sort out composite keywords based on limit (default=70)
# Work out whether, among the top `limit` single keywords, there are possible composite combinations
# Generate compositesIDX dictionary of the form: s (URI) : keylist
for i in range(limit):
try:
if store.value(rdflib.Namespace(keylist[i][1]),namespace["composite"],default=False,any=True):
compositesIDX[keylist[i][1]] = keylist[i]
for composite in store.objects(rdflib.Namespace(keylist[i][1]),namespace["composite"]):
if composites.has_key(composite):
composites[composite].append(keylist[i][1])
else:
composites[composite]=[keylist[i][1]]
elif store.value(rdflib.Namespace(keylist[i][1]),namespace["compositeOf"],default=False,any=True):
compositesIDX[keylist[i][1]] = keylist[i]
else:
outlist.append(keylist[i])
except:
write_message("Problem with composites.. : %s" % keylist[i][1])
for s_CompositeOf in composites:
if len(composites.get(s_CompositeOf)) > 2:
write_message("%s - Sorry! Only composite combinations of max 2 keywords are supported at the moment." % s_CompositeOf)
elif len(composites.get(s_CompositeOf)) > 1:
# We have a composite match. Need to look for composite1 near composite2
comp_one = compositesIDX[composites.get(s_CompositeOf)[0]][2]
comp_two = compositesIDX[composites.get(s_CompositeOf)[1]][2]
# Now check that comp_one and comp_two really correspond to ckw1 : ckw2
if store.value(rdflib.Namespace(s_CompositeOf),namespace["prefLabel"],default=False,any=True).split(":")[0].strip() == comp_one:
# order is correct
searchables_one = compositesIDX[composites.get(s_CompositeOf)[0]][4]
searchables_two = compositesIDX[composites.get(s_CompositeOf)[1]][4]
comp_oneOUT = compositesIDX[composites.get(s_CompositeOf)[0]][0]
comp_twoOUT = compositesIDX[composites.get(s_CompositeOf)[1]][0]
else:
# reverse order
comp_one = compositesIDX[composites.get(s_CompositeOf)[1]][2]
comp_two = compositesIDX[composites.get(s_CompositeOf)[0]][2]
searchables_one = compositesIDX[composites.get(s_CompositeOf)[1]][4]
searchables_two = compositesIDX[composites.get(s_CompositeOf)[0]][4]
comp_oneOUT = compositesIDX[composites.get(s_CompositeOf)[1]][0]
comp_twoOUT = compositesIDX[composites.get(s_CompositeOf)[0]][0]
compOUT = 0
wildcards = []
phrases = []
for searchable_one in searchables_one:
# Work out all possible combination of comp1 near comp2
c1 = searchable_one
if searchable_one.find("/", 0, 1) > -1: m1 = 3
elif searchable_one.find("-") > -1: m1 = 2
elif searchable_one[:2].isupper() or len(searchable_one) < 3: m1 = 1
else: m1 = 0
for searchable_two in searchables_two:
c2 = searchable_two
if searchable_two.find("/", 0, 1) > -1: m2 = 3
elif searchable_two.find("-") > -1: m2 = 2
elif searchable_two[:2].isupper() or len(searchable_two) < 3: m2 = 1
else: m2 = 0
c = [c1,c2]
m = [m1,m2]
patterns = makeCompPattern(c, m)
if m1 == 3 or m2 == 3:
# One of the composites had a wildcard inside
wildcards.append(patterns[0])
wildcards.append(patterns[1])
else:
# No wildcards
phrase1 = c1 + " " + c2
phrase2 = c2 + " " + c1
phrases.append([phrase1, patterns[0]])
phrases.append([phrase2, patterns[1]])
THIScomp = len(re.findall(patterns[0],text_string)) + len(re.findall(patterns[1],text_string))
compOUT += THIScomp
if len(wildcards)>0:
for wild in wildcards:
for phrase in phrases:
if len(re.findall(wild," " + phrase[0] + " ")) > 0:
compOUT = compOUT - len(re.findall(phrase[1],text_string))
# Add extra results due to altLabels, calculated in the first part
if ckwlist.get(s_CompositeOf, 0) > 0:
# Add count and pop the item out of the dictionary
compOUT += ckwlist.pop(s_CompositeOf)
if compOUT > 0 and spires:
# Output ckws in spires standard output mode (,)
if store.value(rdflib.Namespace(s_CompositeOf),namespace["spiresLabel"],default=False,any=True):
compositesOUT.append([compOUT, store.value(rdflib.Namespace(s_CompositeOf),namespace["spiresLabel"],default=False,any=True), comp_one, comp_two, comp_oneOUT, comp_twoOUT])
else:
compositesOUT.append([compOUT, store.value(rdflib.Namespace(s_CompositeOf),namespace["prefLabel"],default=False,any=True).replace(":",","), comp_one, comp_two, comp_oneOUT, comp_twoOUT])
keys2drop.append(comp_one.strip())
keys2drop.append(comp_two.strip())
elif compOUT > 0:
# Output ckws in bibclassify mode (:)
compositesOUT.append([compOUT, store.value(rdflib.Namespace(s_CompositeOf),namespace["prefLabel"],default=False,any=True), comp_one, comp_two, comp_oneOUT, comp_twoOUT])
keys2drop.append(comp_one.strip())
keys2drop.append(comp_two.strip())
# Deal with ckws that only occur as altLabels
ckwleft = len(ckwlist)
while ckwleft > 0:
compositesTOADD.append(ckwlist.popitem())
ckwleft = ckwleft - 1
for s_CompositeTOADD, compTOADD_OUT in compositesTOADD:
if spires:
compositesOUT.append([compTOADD_OUT, store.value(rdflib.Namespace(s_CompositeTOADD),namespace["prefLabel"],default=False,any=True).replace(":",","), "null", "null", 0, 0])
else:
compositesOUT.append([compTOADD_OUT, store.value(rdflib.Namespace(s_CompositeTOADD),namespace["prefLabel"],default=False,any=True), "null", "null", 0, 0])
compositesOUT.sort()
compositesOUT.reverse()
# Some more keylist filtering: inclusion, e.g. subtract "magnetic" if we have "magnetic field"
for i in keylist:
pattern_to_match = " " + i[2].strip() + " "
for j in keylist:
test_key = " " + j[2].strip() + " "
if test_key.strip() != pattern_to_match.strip() and test_key.find(pattern_to_match) > -1:
keys2drop.append(pattern_to_match.strip())
text_out += "\nComposite keywords:\n"
for ncomp, pref_cOf_label, comp_one, comp_two, comp_oneOUT, comp_twoOUT in compositesOUT:
safe_comp_mark = " "
safe_one_mark = ""
safe_two_mark = ""
if safe_keys.find(pref_cOf_label)>-1:
safe_comp_mark = "*"
if safe_keys.find(comp_one)>-1:
safe_one_mark = "*"
if safe_keys.find(comp_two)>-1:
safe_two_mark = "*"
raw.append([str(ncomp),str(pref_cOf_label)])
text_out += str(ncomp) + safe_comp_mark + " " + str(pref_cOf_label) + " [" + str(comp_oneOUT) + safe_one_mark + ", " + str(comp_twoOUT) + safe_two_mark + "]\n"
if safe_comp_mark == "*": html_out.append([ncomp, str(pref_cOf_label), 1])
else: html_out.append([ncomp, str(pref_cOf_label), 0])
text_out += "\n\nSingle keywords:\n"
for i in range(limit):
safe_mark = " "
try:
idx = keys2drop.index(keylist[i][2].strip())
except:
idx = -1
if safe_keys.find(keylist[i][2])>-1:
safe_mark = "*"
if idx == -1 and nkeywords > 0 and not keylist[i][5]:
text_out += str(keylist[i][0]) + safe_mark + " " + keylist[i][2] + "\n"
raw.append([keylist[i][0], keylist[i][2]])
if safe_mark == "*": html_out.append([keylist[i][0], keylist[i][2], 1])
else: html_out.append([keylist[i][0], keylist[i][2], 0])
nkeywords = nkeywords - 1
if output == 0:
# Output some text
return text_out
elif output == 2:
# return marc xml output.
xml = ""
for key in raw:
xml += """
%sBibClassify/%s""" % (key[1],os.path.splitext(os.path.basename(ontology))[0])
return xml
else:
# Output some HTML
html_out.sort()
html_out.reverse()
return make_tag_cloud(html_out)
def make_tag_cloud(entries):
"""Using the counts for each of the tags, build a simple HTML page
containing a tag cloud representation. The CSS describes ten levels,
each with a different font-size, line-height and font-weight.
"""
max_occurrence = int(entries[0][0])
ret = "\n"
ret += "\n"
ret += "Keyword Cloud\n"
ret += "\n"
ret += "\n"
ret += "\n"
ret += "\n"
cloud = ""
cloud_list = []
cloud += ''
# Generate some ad-hoc count distribution
for i in range(0, len(entries)):
count = int(entries[i][0])
tag = str(entries[i][1])
color = int(entries[i][2])
if count < (max_occurrence/10):
cloud_list.append([tag,0,color])
elif count < (max_occurrence/7.5):
cloud_list.append([tag,1,color])
elif count < (max_occurrence/5):
cloud_list.append([tag,2,color])
elif count < (max_occurrence/4):
cloud_list.append([tag,3,color])
elif count < (max_occurrence/3):
cloud_list.append([tag,4,color])
elif count < (max_occurrence/2):
cloud_list.append([tag,5,color])
elif count < (max_occurrence/1.7):
cloud_list.append([tag,6,color])
elif count < (max_occurrence/1.5):
cloud_list.append([tag,7,color])
elif count < (max_occurrence/1.3):
cloud_list.append([tag,8,color])
else:
cloud_list.append([tag,9,color])
cloud_list.sort()
for i in range(0, len(cloud_list)):
cloud += '<span class="level%s" ' % cloud_list[i][1]
if cloud_list[i][2] > 0:
cloud += 'style="color:red" '
cloud += '> %s </span>' % cloud_list[i][0]
cloud += '\n'
ret += cloud + '\n'
ret += "\n"
ret += "\n"
return ret
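# The ad-hoc count distribution above maps each keyword count to one of ten
# CSS levels. A compact sketch of that binning follows; the function name
# cloud_level is illustrative only, and true division is used where the
# Python 2 original relies on mixed int/float division, so edge cases near
# the integer cutoffs may differ slightly.

```python
def cloud_level(count, max_occurrence):
    """Map a keyword count to one of ten font-size levels (0..9),
    mirroring the ad-hoc thresholds used by make_tag_cloud()."""
    # Divisors taken from the if/elif chain in make_tag_cloud().
    divisors = [10, 7.5, 5, 4, 3, 2, 1.7, 1.5, 1.3]
    for level, divisor in enumerate(divisors):
        if count < max_occurrence / divisor:
            return level
    # Counts at or above max_occurrence/1.3 get the largest font.
    return 9
```
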
def makeCompPattern(candidates, modes):
"""Takes a pair of composite keywords (candidates) and compiles a REGEX expression around them, according to the chosen modes for each one:
- 0 : plain case-insensitive search
- 1 : plain case-sensitive search
- 2 : hyphen
- 3 : wildcard"""
begREGEX = '(?:[^A-Za-z0-9\+-])('
endREGEX = ')(?=[^A-Za-z0-9\+-])'
pattern_text = []
patterns = []
for i in range(2):
if modes[i] == 0:
pattern_text.append(str(re.escape(candidates[i]) + 's?'))
if modes[i] == 1:
pattern_text.append(str(re.escape(candidates[i])))
if modes[i] == 2:
hyphen = True
parts = candidates[i].split("-")
pattern_string = ""
for part in parts:
if len(part)<1 or part.find(" ", 0, 1)> -1:
# This is not really a hyphen, maybe a minus sign: treat as isupper().
hyphen = False
pattern_string = pattern_string + re.escape(part) + "[- \t]?"
if hyphen:
pattern_text.append(pattern_string)
else:
pattern_text.append(re.escape(candidates[i]))
if modes[i] == 3:
pattern_text.append(candidates[i].replace("/",""))
pattern_one = re.compile(begREGEX + pattern_text[0] + "s?[ \s,-]*" + pattern_text[1] + endREGEX, re.I)
pattern_two = re.compile(begREGEX + pattern_text[1] + "s?[ \s,-]*" + pattern_text[0] + endREGEX, re.I)
patterns.append(pattern_one)
patterns.append(pattern_two)
return patterns
def makePattern(candidate, mode):
"""Takes a keyword (candidate) and compiles a REGEX expression around it, according to the chosen mode:
- 0 : plain case-insensitive search
- 1 : plain case-sensitive search
- 2 : hyphen
- 3 : wildcard"""
# NB. At the moment, some patterns are compiled having an optional trailing "s".
# This is a very basic method to find plurals in English.
# If this program is to be used in other languages, please remove the "s?" from the REGEX
# Also, inclusion of plurals at the ontology level would be preferred.
begREGEX = '(?:[^A-Za-z0-9\+-])('
endREGEX = ')(?=[^A-Za-z0-9\+-])'
try:
if mode == 0:
pattern = re.compile(begREGEX + re.escape(candidate) + 's?' + endREGEX, re.I)
if mode == 1:
pattern = re.compile(begREGEX + re.escape(candidate) + endREGEX)
if mode == 2:
hyphen = True
parts = candidate.split("-")
pattern_string = begREGEX
for part in parts:
if len(part)<1 or part.find(" ", 0, 1)> -1:
# This is not really a hyphen, maybe a minus sign: treat as isupper().
hyphen = False
pattern_string = pattern_string + re.escape(part) + "[- \t]?"
pattern_string += endREGEX
if hyphen:
pattern = re.compile(pattern_string, re.I)
else:
pattern = re.compile(begREGEX + re.escape(candidate) + endREGEX, re.I)
if mode == 3:
pattern = re.compile(begREGEX + candidate.replace("/","") + endREGEX, re.I)
except re.error:
print "Invalid thesaurus term: " + candidate
pattern = None
return pattern
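# The mode-0 behaviour of makePattern() above (case-insensitive whole-word
# match with an optional plural "s") can be sketched in isolation. The
# function name match_count and the demo strings are illustrative only;
# the boundary classes are copied from begREGEX/endREGEX above.

```python
import re

# Word-boundary guards equivalent to begREGEX/endREGEX in makePattern().
BEG = r'(?:[^A-Za-z0-9\+-])('
END = r')(?=[^A-Za-z0-9\+-])'

def match_count(keyword, text):
    """Count case-insensitive whole-word matches of keyword, allowing an
    optional trailing plural 's', as makePattern() mode 0 does."""
    pattern = re.compile(BEG + re.escape(keyword) + 's?' + END, re.I)
    # Pad the text so the boundary classes can match at the edges,
    # just as generate_keywords_rdf() pads text_string with spaces.
    return len(pattern.findall(' ' + text + ' '))
```
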
def profile(t="", d=""):
import profile
import pstats
profile.run("generate_keywords_rdf(textfile='%s',dictfile='%s')" % (t, d), "bibclassify_profile")
p = pstats.Stats("bibclassify_profile")
p.strip_dirs().sort_stats("cumulative").print_stats()
return 0
def main():
"""Main function """
global options
long_flags =["file=",
"thesaurus=","ontology=",
"output=","limit=", "nkeywords=", "mode=",
"spires", "help", "version"]
short_flags ="f:k:K:o:l:n:m:qhVv:"
spires = False
limit = 70
nkeywords = 25
input_file = ""
dict_file = ""
output = 0
mode = 0
verbose = 0
try:
opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags)
except getopt.GetoptError, err:
write_message(err, sys.stderr)
usage(1)
if args:
usage(1)
try:
- from invenio.config import tmpdir, CFG_PATH_PDFTOTEXT, version
+ from invenio.config import CFG_TMPDIR, CFG_PATH_PDFTOTEXT, CFG_VERSION
version_bibclassify = 0.1
- bibclassify_engine_version = "CDS Invenio/%s bibclassify/%s" % (version, version_bibclassify)
+ bibclassify_engine_version = "CDS Invenio/%s bibclassify/%s" % (CFG_VERSION, version_bibclassify)
except:
- tmpdir = TMPDIR_STANDALONE
+ CFG_TMPDIR = TMPDIR_STANDALONE
CFG_PATH_PDFTOTEXT = PDFTOTEXT_STANDALONE
- temp_text = tmpdir + '/bibclassify.pdftotext.' + str(os.getpid())
+ temp_text = CFG_TMPDIR + '/bibclassify.pdftotext.' + str(os.getpid())
try:
for opt in opts:
if opt == ("-h","") or opt == ("--help",""):
usage(1)
elif opt == ("-V","") or opt == ("--version",""):
print bibclassify_engine_version
sys.exit(1)
elif opt[0] in [ "-v", "--verbose" ]:
verbose = opt[1]
elif opt[0] in [ "-f", "--file" ]:
if opt[1].find(".pdf")>-1:
# Treat as PDF
cmd = "%s " % CFG_PATH_PDFTOTEXT + opt[1] + " " + temp_text
errcode = os.system(cmd)
if errcode == 0 and os.path.exists("%s" % temp_text):
input_file = temp_text
else:
print "Error while running %s.\n" % cmd
sys.exit(1)
else:
# Treat as text
input_file = opt[1]
elif opt[0] in [ "-k", "--thesaurus" ]:
if dict_file=="":
dict_file = opt[1]
else:
print "Please specify either a text thesaurus (-k) or an RDF taxonomy (-K), not both"
sys.exit(1)
elif opt[0] in [ "-K", "--taxonomy" ]:
if dict_file=="" and opt[1].find(".rdf")!=-1:
dict_file = opt[1]
else:
print "Please specify either a text thesaurus (-k) or an RDF taxonomy (-K), not both"
sys.exit(1)
elif opt[0] in [ "-o", "--output" ]:
try:
if str(opt[1]).lower().strip() == "html":
output = 1
elif str(opt[1]).lower().strip() == "text":
output = 0
elif str(opt[1]).lower().strip() == "marcxml":
output = 2
else:
write_message('Output mode (-o) can only be "HTML", "TEXT", or "MARCXML". Using default output mode (TEXT)')
except:
write_message('Output mode (-o) can only be "HTML", "TEXT", or "MARCXML". Using default output mode (TEXT)')
elif opt[0] in [ "-m", "--mode" ]:
try:
if str(opt[1]).lower().strip() == "partial":
mode = 1
elif str(opt[1]).lower().strip() == "full":
mode = 0
else:
write_message('Processing mode (-m) can only be "PARTIAL" or "FULL". Using default processing mode (FULL)')
except:
write_message('Processing mode (-m) can only be "PARTIAL" or "FULL". Using default processing mode (FULL)')
elif opt[0] in [ "-q", "--spires" ]:
spires = True
elif opt[0] in [ "-l", "--limit" ]:
try:
num = int(opt[1])
if num>1:
limit = num
else:
write_message("Number of keywords for processing (--limit) must be an integer higher than 1. Using default value of 70...")
except ValueError:
write_message("Number of keywords for processing (-l) must be an integer. Using default value of 70...")
elif opt[0] in [ "-n", "--nkeywords" ]:
try:
num = int(opt[1])
if num>1:
nkeywords = num
else:
write_message("Number of keywords (--nkeywords) must be an integer higher than 1. Using default value of 25...")
except ValueError:
write_message("Number of keywords (-n) must be an integer. Using default value of 25...")
except StandardError, e:
write_message(e, sys.stderr)
sys.exit(1)
if input_file == "" or dict_file == "":
write_message("Need to enter the name of an input file AND a thesaurus file\n")
usage(1)
# Weak method to detect dict_file. Need to improve this (e.g. by looking inside the metadata with rdflib?)
if dict_file.find(".rdf")!=-1:
outcome = generate_keywords_rdf(input_file, dict_file, output, limit, nkeywords, mode, spires, verbose, dict_file)
else: # Treat as text
outcome = generate_keywords(input_file, dict_file, verbose)
print outcome
if limit > len(outcome): limit = len(outcome)
if output == 0:
for i in range(limit):
print outcome[i]
else:
print ""
print ""
print "Keywords"
print ""
print ""
print ''
for i in range(limit):
print "" + str(outcome[i]) + " "
print ''
print ""
print ""
return
if __name__ == '__main__':
main()
diff --git a/modules/bibconvert/lib/bibconvert.py b/modules/bibconvert/lib/bibconvert.py
index 2da2a596c..bb1dd6700 100644
--- a/modules/bibconvert/lib/bibconvert.py
+++ b/modules/bibconvert/lib/bibconvert.py
@@ -1,2100 +1,2100 @@
## $Id$
-
+
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""BibConvert tool to convert bibliographic records from any format to any format."""
__revision__ = "$Id$"
import fileinput
import string
import os
import re
import sys
import time
import getopt
from time import gmtime, strftime, localtime
import os.path
from invenio.config import \
CFG_OAI_ID_PREFIX, \
- version,\
- etcdir
+ CFG_VERSION,\
+ CFG_ETCDIR
from invenio.search_engine import perform_request_search
-CFG_BIBCONVERT_KB_PATH = "%s%sbibconvert%sKB" % (etcdir, os.sep, os.sep)
+CFG_BIBCONVERT_KB_PATH = "%s%sbibconvert%sKB" % (CFG_ETCDIR, os.sep, os.sep)
### Matching records with database content
def parse_query_string(query_string):
"""Parse query string, e.g.:
Input: 245__a::REP(-, )::SHAPE::SUP(SPACE, )::MINL(4)::MAXL(8)::EXPW(PUNCT)::WORDS(4,L)::SHAPE::SUP(SPACE, )||700__a::MINL(2)::REP(COMMA,).
Output:[['245__a','REP(-,)','SHAPE','SUP(SPACE, )','MINL(4)','MAXL(8)','EXPW(PUNCT)','WORDS(4,L)','SHAPE','SUP(SPACE, )'],['700__a','MINL(2)','REP(COMMA,)']]
"""
query_string_out = []
query_string_out_in = []
query_string_split_1 = query_string.split('||')
for item_1 in query_string_split_1:
query_string_split_2 = item_1.split('::')
query_string_out_in = []
for item in query_string_split_2:
query_string_out_in.append(item)
query_string_out.append(query_string_out_in)
return query_string_out
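# parse_query_string() above amounts to a nested split: '||' separates
# fields, '::' separates formatting steps. A condensed equivalent, shown
# here only as an illustration of the behaviour documented in the docstring:

```python
def parse_query_string(query_string):
    """Split a bibconvert query string into per-field lists of
    formatting functions: '||' separates fields, '::' separates steps."""
    return [item.split('::') for item in query_string.split('||')]
```
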
def set_conv():
"""
bibconvert common settings
=======================
minimal length of output line = 1
maximal length of output line = 4096
"""
conv_setting = [
- 1,
+ 1,
4096
]
return conv_setting
def get_pars(fn):
"Read a function and its parameters into a list"
-
+
out = []
out.append(re.split('\(|\)', fn)[0])
out.append(re.split(',', re.split('\(|\)', fn)[1]))
return out
def get_other_par(par, cfg):
"Get other parameter (par) from the configuration file (cfg)"
out = ""
other_parameters = {
'_QRYSTR_' : '_QRYSTR_---.*$',
'_MATCH_' : '_MATCH_---.*$',
'_RECSEP_' : '_RECSEP_---.*$',
'_EXTCFG_' : '_EXTCFG_---.*$',
'_SRCTPL_' : '_SRCTPL_---.*$',
'_DSTTPL_' : '_DSTTPL_---.*$',
'_RECHEAD_': '_RECHEAD_---.*$',
'_RECFOOT_': '_RECFOOT_---.*$',
'_HEAD_' : '_HEAD_---.*$',
'_FOOT_' : '_FOOT_---.*$',
'_EXT_' : '_EXT_---.*$',
'_SEP_' : '_SEP_---.*$',
'_COD_' : '_COD_---.*$',
'_FRK_' : '_FRK_---.*$',
'_NC_' : '_NC_---.*$',
'_MCH_' : '_MCH_---.*$',
'_UPL_' : '_UPL_---.*$',
'_AUTO_' : '_AUTO_---.*$'
-
+
}
-
+
parameters = other_parameters.keys()
for line in fileinput.input(cfg):
pattern = re.compile(other_parameters[par])
items = pattern.findall(line)
for item in items:
out = item.split('---')[1]
return out
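# get_other_par() above scans the configuration file for "NAME---value"
# lines, keeping the value from the last matching line. The same lookup
# over in-memory lines can be sketched as follows; the name get_param is
# hypothetical (the real function reads the file via fileinput).

```python
import re

def get_param(name, lines):
    """Return the value following 'NAME---' on the last matching line,
    mimicking get_other_par() for an in-memory list of lines."""
    out = ""
    # The parameter names used in bibconvert contain no regex metacharacters.
    pattern = re.compile(name + '---.*$')
    for line in lines:
        for item in pattern.findall(line):
            # Later matches overwrite earlier ones, as in get_other_par().
            out = item.split('---')[1]
    return out
```
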
def append_to_output_file(filename, output):
"Append one line of bibconvert output to the output file"
try:
file = open(filename, 'a')
file.write(output)
file.close()
except IOError, e:
exit_on_error("Cannot write into %s" % filename)
-
+
return 1
-
+
def sub_keywd(out):
"bibconvert keywords literal substitution"
out = string.replace(out, "EOL", "\n")
out = string.replace(out, "_CR_", "\r")
out = string.replace(out, "_LF_", "\n")
out = string.replace(out, "\\", '\\')
out = string.replace(out, "\r", '\r')
out = string.replace(out, "BSLASH", '\\')
- out = string.replace(out, "COMMA", ',')
+ out = string.replace(out, "COMMA", ',')
out = string.replace(out, "LEFTB", '[')
out = string.replace(out, "RIGHTB", ']')
out = string.replace(out, "LEFTP", '(')
out = string.replace(out, "RIGHTP", ')')
-
+
return out
def check_split_on(data_item_split, sep, tpl_f):
"""
bibconvert conditional split with following conditions
===================================================
::NEXT(N,TYPE,SIDE) - next N chars are of the TYPE having the separator on the SIDE
::PREV(N,TYPE,SIDE) - prev.N chars are of the TYPE having the separator on the SIDE
- """
+ """
fn = get_pars(tpl_f)[0]
par = get_pars(tpl_f)[1]
-
-
+
+
done = 0
while (done == 0):
if ( (( fn == "NEXT" ) and ( par[2]=="R" )) or
(( fn == "PREV" ) and ( par[2]=="L" )) ):
test_value = data_item_split[0][-(string.atoi(par[0])):]
-
+
elif ( ((fn == "NEXT") and ( par[2]=="L")) or
((fn == "PREV") and ( par[2]=="R")) ):
-
+
test_value = data_item_split[1][:(string.atoi(par[0]))]
data_item_split_tmp = []
if ((FormatField(test_value, "SUP(" + par[1] + ",)") != "") \
or (len(test_value) < string.atoi(par[0]))):
data_item_split_tmp = data_item_split[1].split(sep, 1)
if(len(data_item_split_tmp)==1):
done = 1
data_item_split[0] = data_item_split[0] + sep + \
data_item_split_tmp[0]
data_item_split[1] = ""
else:
data_item_split[0] = data_item_split[0] + sep + \
data_item_split_tmp[0]
data_item_split[1] = data_item_split_tmp[1]
else:
done = 1
return data_item_split
def get_subfields(data, subfield, src_tpl):
"Get subfield according to the template"
out = []
for data_item in data:
found = 0
for src_tpl_item in src_tpl:
if (src_tpl_item[:2] == "<:"):
if (src_tpl_item[2:-2] == subfield):
found = 1
else:
sep_in_list = src_tpl_item.split("::")
sep = sep_in_list[0]
-
+
data_item_split = data_item.split(sep, 1)
if (len(data_item_split)==1):
data_item = data_item_split[0]
else:
if (len(sep_in_list) > 1):
data_item_split = check_split_on(data_item.split(sep, 1),
sep_in_list[0],
sep_in_list[1])
if(found == 1):
data_item = data_item_split[0]
else:
data_item = string.join(data_item_split[1:], sep)
out.append(data_item)
return out
def exp_n(word):
"Remove newlines and carriage returns from a string."
out = ""
-
+
for ch in word:
if ((ch != '\n') and (ch != '\r')):
out = out + ch
- return out
-
+ return out
+
def exp_e(list):
"Expunge empty elements from a list"
out = []
for item in list:
item = exp_n(item)
if ((item != '\r\n' and item != '\r' \
and item != '\n' and item !="" \
and len(item)!=0)):
out.append(item)
return out
def sup_e(word):
"Remove spaces from a string"
out = ""
-
+
for ch in word:
if (ch != ' '):
out = out + ch
- return out
+ return out
def select_line(field_code, list):
"Return the data value matching the given field code from a list of [field_code, data] pairs"
-
+
out = ['']
for field in list:
-
+
field[0] = sup_e(field[0])
field_code = sup_e(field_code)
if (field[0] == field_code):
out = field[1]
return out
def parse_field_definition(source_field_definition):
"Split a source field definition into a list of its components"
-
+
word_list = []
out = []
word = ""
counter = 0
-
+
if (len(source_field_definition.split("---"))==4):
out = source_field_definition.split("---")
else:
element_list_high = source_field_definition.split("<:")
for word_high in element_list_high:
element_list_low = word_high.split(':>')
for word_low in element_list_low:
word_list.append(word_low)
word_list.append(":>")
- word_list.pop()
+ word_list.pop()
word_list.append("<:")
word_list.pop()
for item in word_list:
word = word + item
if (item == "<:"):
counter = counter + 1
if (item == ":>"):
counter = counter - 1
if counter == 0:
out.append(word)
word = ""
return out
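# The bracket-counter grouping above can be shown on its own: chunks are
# emitted so that each <: ... :> placeholder (even nested) stays whole. This
# is an illustrative re-implementation, not the production parser.

```python
def split_template(defn):
    """Group a template string into chunks so that each <: ... :>
    placeholder stays in one piece -- a sketch of the counter logic."""
    out, word, depth, i = [], "", 0, 0
    while i < len(defn):
        two = defn[i:i + 2]
        if two == "<:":
            if depth == 0 and word:      # flush constant before a placeholder
                out.append(word)
                word = ""
            depth += 1
            word += two
            i += 2
        elif two == ":>":
            depth -= 1
            word += two
            i += 2
            if depth == 0:               # placeholder (possibly nested) done
                out.append(word)
                word = ""
        else:
            word += defn[i]
            i += 1
    if word:
        out.append(word)
    return out

print(split_template("pre<:100::a:>post"))  # -> ['pre', '<:100::a:>', 'post']
```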
def parse_template(template):
"""
bibconvert parse template
    =========================
- in - template filename
+ in - template filename
    out - [ [ field_code , [ field_template_parsed ] ] , [] ]
"""
out = []
for field_def in read_file(template, 1):
field_tpl_new = []
if ((len(field_def.split("---", 1)) > 1) and (field_def[:1] != "#")):
-
+
field_code = field_def.split("---", 1)[0]
field_tpl = parse_field_definition(field_def.split("---", 1)[1])
-
+
field_tpl_new = field_tpl
field_tpl = exp_e(field_tpl_new)
out_data = [field_code, field_tpl]
out.append(out_data)
-
+
return out
def parse_common_template(template, part):
"""
bibconvert parse template
=========================
in - template filename
    out - [ [ field_code , [ field_template_parsed ] ] , [] ]
"""
out = []
counter = 0
for field_def in read_file(template, 1):
if (exp_n(field_def)[:3] == "==="):
counter = counter + 1
-
+
elif (counter == part):
-
+
field_tpl_new = []
if ((len(field_def.split("---", 1)) > 1) and (field_def[:1]!="#")):
-
+
field_code = field_def.split("---", 1)[0]
field_tpl = parse_field_definition(field_def.split("---", 1)[1])
-
+
field_tpl_new = field_tpl
field_tpl = exp_e(field_tpl_new)
out_data = [field_code, field_tpl]
out.append(out_data)
return out
def parse_input_data_f(source_data_open, source_tpl):
"""
bibconvert parse input data
    ===========================
in - input source data location (filehandle)
source data template
source_field_code list of source field codes
         source_field_data list of source field data values (repetitive fields, one occurrence per line)
out - [ [ source_field_code , [ source_field_data ] ] , [] ]
source_data_template entry - field_code---[const]<:subfield_code:>[const][<:subfield_code:>][]
destination_templace entry - [::GFF()]---[const]<:field_code::subfield_code[::FF()]:>[]
input data file; by line: - fieldcode value
"""
global separator
out = [['', []]]
count = 0
values = []
while (count < 1):
line = source_data_open.readline()
if (line == ""):
return(-1)
line_split = line.split(" ", 1)
if (re.sub("\s", "", line) == separator):
count = count + 1
if (len(line_split) == 2):
field_code = line_split[0]
field_value = exp_n(line_split[1])
-
+
values.append([field_code, field_value])
item_prev = ""
stack = ['']
-
+
for item in values:
if ((item[0]==item_prev)or(item_prev == "")):
stack.append(item[1])
item_prev = item[0]
else:
out.append([item_prev, stack])
item_prev = item[0]
stack = []
stack.append(item[1])
try:
if (stack[0] != ""):
if (out[0][0]==""):
out = []
out.append([field_code, stack])
except IndexError, e:
out = out
-
+
return out
def parse_input_data_fx(source_tpl):
"""
bibconvert parse input data
    ===========================
in - input source data location (filehandle)
source data template
source_field_code list of source field codes
         source_field_data list of source field data values (repetitive fields, one occurrence per line)
out - [ [ source_field_code , [ source_field_data ] ] , [] ]
- extraction_template_entry -
+ extraction_template_entry -
input data file - specified by extract_tpl
"""
global separator
count = 0
record = ""
field_data_1_in_list = []
out = [['', []]]
while (count <10):
line = sys.stdin.readline()
if (line == ""):
count = count + 1
if (record == "" and count):
return (-1)
if (re.sub("\s", "", line) == separator):
count = count + 10
else:
record = record + line
for field_defined in extract_tpl_parsed:
try:
field_defined[1][0] = sub_keywd(field_defined[1][0])
field_defined[1][1] = sub_keywd(field_defined[1][1])
except IndexError, e:
field_defined = field_defined
-
+
try:
field_defined[1][2] = sub_keywd(field_defined[1][2])
except IndexError, e:
field_defined = field_defined
-
+
field_data_1 =""
-
+
if ((field_defined[1][0][0:2] == '//') and \
(field_defined[1][0][-2:] == '//')):
field_defined_regexp = field_defined[1][0][2:-2]
try:
####
if (len(re.split(field_defined_regexp, record)) == 1):
field_data_1 = ""
field_data_1_in_list = []
else:
field_data_1_tmp = re.split(field_defined_regexp, record, 1)[1]
field_data_1_in_list = field_data_1_tmp.split(field_defined_regexp)
-
+
except IndexError, e:
field_data_1 = ""
else:
try:
if (len(record.split(field_defined[1][0])) == 1):
field_data_1 = ""
field_data_1_in_list = []
else:
field_data_1_tmp = record.split(field_defined[1][0], 1)[1]
field_data_1_in_list = field_data_1_tmp.split(field_defined[1][0])
except IndexError, e:
field_data_1 = ""
-
+
spliton = []
outvalue = ""
field_data_2 = ""
field_data = ""
-
+
try:
if ((field_defined[1][1])=="EOL"):
spliton = ['\n']
elif ((field_defined[1][1])=="MIN"):
spliton = ['\n']
elif ((field_defined[1][1])=="MAX"):
for item in extract_tpl_parsed:
try:
spliton.append(item[1][0])
except IndexError, e:
spliton = spliton
elif (field_defined[1][1][0:2] == '//') and \
(field_defined[1][1][-2:] == '//'):
spliton = [field_defined[1][1][2:-2]]
-
+
else:
spliton = [field_defined[1][1]]
-
+
except IndexError,e :
spliton = ""
outvalues = []
-
+
for field_data in field_data_1_in_list:
outvalue = ""
for splitstring in spliton:
-
+
field_data_2 = ""
if (len(field_data.split(splitstring))==1):
if (outvalue == ""):
field_data_2 = field_data
else:
field_data_2 = outvalue
else:
field_data_2 = field_data.split(splitstring)[0]
-
+
outvalue = field_data_2
field_data = field_data_2
-
+
outvalues.append(outvalue)
outvalues = exp_e(outvalues)
if (len(outvalues) > 0):
if (out[0][0]==""):
out = []
outstack = []
if (len(field_defined[1])==3):
-
+
spliton = [field_defined[1][2]]
if (field_defined[1][2][0:2] == '//') and \
(field_defined[1][2][-2:] == '//'):
spliton = [field_defined[1][2][2:-2]]
for item in outvalues:
stack = re.split(spliton[0], item)
for stackitem in stack:
- outstack.append(stackitem)
+ outstack.append(stackitem)
else:
outstack = outvalues
-
+
out.append([field_defined[0], outstack])
return out
def parse_input_data_d(source_data, source_tpl):
"""
bibconvert parse input data
    ===========================
in - input source data location (directory)
source data template
source_field_code list of source field codes
         source_field_data list of source field data values (repetitive fields, one occurrence per line)
out - [ [ source_field_code , [ source_field_data ] ] , [] ]
source_data_template entry - field_code---[const]<:subfield_code:>[const][<:subfield_code:>][]
    destination_template entry - [::GFF()]---[const]<:field_code::subfield_code[::FF()]:>[]
input data dir; by file: - fieldcode value per line
"""
-
+
out = []
-
+
for source_field_tpl in read_file(source_tpl, 1):
source_field_code = source_field_tpl.split("---")[0]
source_field_data = read_file(source_data + source_field_code, 0)
source_field_data = exp_e(source_field_data)
-
+
out_data = [source_field_code, source_field_data]
out.append(out_data)
-
+
return out
def sub_empty_lines(value):
    "Collapse consecutive blank lines"
    out = re.sub('\n\n+', '', value)
    return out
def set_par_defaults(par1, par2):
"Set default parameter when not defined"
par_new_in_list = par2.split(",")
i = 0
out = []
for par in par_new_in_list:
-
+
if (len(par1)>i):
if (par1[i] == ""):
out.append(par)
else:
out.append(par1[i])
else:
out.append(par)
i = i + 1
return out
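# The default-merging behaviour above, shown standalone (a hypothetical
# re-implementation for illustration, renamed to avoid clashing with the
# function defined above): empty or missing entries of par1 are filled from
# the comma-separated default string par2.

```python
def merge_defaults(par1, par2):
    """Empty or missing entries of par1 are filled from the
    comma-separated default string par2 (mirrors set_par_defaults)."""
    defaults = par2.split(",")
    return [par1[i] if i < len(par1) and par1[i] != "" else d
            for i, d in enumerate(defaults)]

print(merge_defaults(["5", ""], "1,.,X"))  # -> ['5', '.', 'X']
```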
def generate(keyword):
"""
    bibconvert generated values:
    ============================
    SYSNO() - generate date as '%w%H%M%S'
    WEEK(N) - generate date as '%V' with shift (N)
    DATE(format) - generate date in the specified FORMAT
    VALUE(value) - enter value literally
OAI() - generate oai_identifier, starting value given at command line as -o
"""
out = keyword
fn = keyword + "()"
par = get_pars(fn)[1]
fn = get_pars(fn)[0]
-
+
par = set_par_defaults(par, "")
-
+
if (fn == "SYSNO"):
out = sysno500
if (fn == "SYSNO330"):
out = sysno
if (fn == "WEEK"):
par = set_par_defaults(par, "0")
out = "%02d" % (string.atoi(strftime("%V", localtime())) \
+ string.atoi(par[0]))
if (string.atoi(out)<0):
out = "00"
if (fn == "VALUE"):
par = set_par_defaults(par, "")
out = par[0]
if (fn == "DATE"):
par = set_par_defaults(par, "%w%H%M%S," + "%d" % set_conv()[1])
out = strftime(par[0], localtime())
out = out[:string.atoi(par[1])]
if (fn == "XDATE"):
par = set_par_defaults(par,"%w%H%M%S," + ",%d" % set_conv()[1])
out = strftime(par[0], localtime())
out = par[1] + out[:string.atoi(par[2])]
if (fn == "OAI"):
out = "%s:%d" % (CFG_OAI_ID_PREFIX, tcounter + oai_identifier_from)
return out
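# The WEEK(N) branch above shifts the ISO week number and clamps negative
# results at "00". In isolation, under the same strftime format (function
# name is illustrative):

```python
from time import strftime, localtime

def week_shifted(shift):
    """Zero-padded '%V' week number plus an integer shift, clamped
    at "00" -- mirrors the WEEK(N) branch of generate()."""
    week = int(strftime("%V", localtime())) + shift
    return "%02d" % week if week >= 0 else "00"

print(week_shifted(0))   # current ISO week, zero-padded
```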
def read_file(filename, exception):
"Read file into list"
out = []
if (os.path.isfile(filename)):
file = open(filename,'r')
out = file.readlines()
file.close()
else:
if exception:
exit_on_error("Cannot access file: %s" % filename)
return out
-
+
def crawl_KB(filename, value, mode):
"""
bibconvert look-up value in KB_file in one of following modes:
===========================================================
1 - case sensitive / match (default)
2 - not case sensitive / search
3 - case sensitive / search
4 - not case sensitive / match
5 - case sensitive / search (in KB)
6 - not case sensitive / search (in KB)
7 - case sensitive / search (reciprocal)
8 - not case sensitive / search (reciprocal)
9 - replace by _DEFAULT_ only
R - not case sensitive / search (reciprocal) (8) replace
"""
if (os.path.isfile(filename) != 1):
# Look for KB in same folder as extract_tpl, if exists
try:
pathtmp = string.split(extract_tpl,"/")
pathtmp.pop()
path = string.join(pathtmp,"/")
filename = path + "/" + filename
except NameError:
# File was not found. Try to look inside default KB
# directory
filename = CFG_BIBCONVERT_KB_PATH + os.sep + filename
-
+
# FIXME: Remove \n from returned value?
if (os.path.isfile(filename)):
-
+
file_to_read = open(filename,"r")
-
+
file_read = file_to_read.readlines()
for line in file_read:
code = string.split(line, "---")
-
+
if (mode == "2"):
value_to_cmp = string.lower(value)
code[0] = string.lower(code[0])
if ((len(string.split(value_to_cmp, code[0])) > 1) \
or (code[0]=="_DEFAULT_")):
value = code[1]
return value
-
+
elif ((mode == "3") or (mode == "0")):
if ((len(string.split(value, code[0])) > 1) or \
(code[0] == "_DEFAULT_")):
value = code[1]
return value
elif (mode == "4"):
value_to_cmp = string.lower(value)
code[0] = string.lower(code[0])
if ((code[0] == value_to_cmp) or \
(code[0] == "_DEFAULT_")):
value = code[1]
return value
elif (mode == "5"):
if ((len(string.split(code[0], value)) > 1) or \
(code[0] == "_DEFAULT_")):
value = code[1]
return value
-
+
elif (mode == "6"):
value_to_cmp = string.lower(value)
code[0] = string.lower(code[0])
if ((len(string.split(code[0], value_to_cmp)) > 1) or \
(code[0] == "_DEFAULT_")):
value = code[1]
return value
-
+
elif (mode == "7"):
if ((len(string.split(code[0], value)) > 1) or \
(len(string.split(value,code[0])) > 1) or \
(code[0] == "_DEFAULT_")):
value = code[1]
return value
-
+
elif (mode == "8"):
value_to_cmp = string.lower(value)
code[0] = string.lower(code[0])
if ((len(string.split(code[0], value_to_cmp)) > 1) or \
(len(string.split(value_to_cmp, code[0])) > 1) or \
(code[0] == "_DEFAULT_")):
value = code[1]
return value
-
+
elif (mode == "9"):
if (code[0]=="_DEFAULT_"):
value = code[1]
return value
elif (mode == "R"):
value_to_cmp = string.lower(value)
code[0] = string.lower(code[0])
if ((len(string.split(code[0], value_to_cmp)) > 1) or \
(len(string.split(value_to_cmp, code[0])) > 1) or \
(code[0] == "_DEFAULT_")):
value = value.replace(code[0], code[1])
else:
if ((code[0] == value) or (code[0]=="_DEFAULT_")):
value = code[1]
return value
else:
sys.stderr.write("Warning: given KB could not be found. \n")
return value
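# The KB file format is one "search---replace" pair per line. Mode "4"
# (case-insensitive exact match with _DEFAULT_ fallback) can be sketched
# against an in-memory table; the KB entries below are hypothetical examples,
# and the _DEFAULT_ key is checked before lowercasing so the fallback fires.

```python
def kb_lookup_mode4(value, kb_lines):
    """Case-insensitive exact match with _DEFAULT_ fallback,
    a sketch of mode "4" of crawl_KB. kb_lines: ["key---val", ...]"""
    for line in kb_lines:
        key, _, repl = line.partition("---")
        if key == "_DEFAULT_" or key.lower() == value.lower():
            return repl
    return value  # no KB line matched: keep the value unchanged

kb = ["CERN---European Organization for Nuclear Research",
      "_DEFAULT_---UNKNOWN"]
print(kb_lookup_mode4("cern", kb))
```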
def FormatField(value, fn):
"""
bibconvert formatting functions:
================================
- ADD(prefix,suffix) - add prefix/suffix
- KB(kb_file,mode) - lookup in kb_file and replace value
+ ADD(prefix,suffix) - add prefix/suffix
+ KB(kb_file,mode) - lookup in kb_file and replace value
ABR(N,suffix) - abbreviate to N places with suffix
ABRX() - abbreviate exclusively words longer
ABRW() - abbreviate word (limit from right)
REP(x,y) - replace
SUP(type) - remove characters of certain TYPE
LIM(n,side) - limit to n letters from L/R
LIMW(string,side) - L/R after split on string
WORDS(n,side) - limit to n words from L/R
IF(value,valueT,valueF) - replace on IF condition
MINL(n) - replace words shorter than n
MINLW(n) - replace words shorter than n
MAXL(n) - replace words longer than n
EXPW(type) - replace word from value containing TYPE
EXP(STR,0/1) - replace word from value containing string
NUM() - take only digits in given string
SHAPE() - remove extra space
UP() - to uppercase
DOWN() - to lowercase
CAP() - make capitals each word
SPLIT(n,h,str,from) - only for final Aleph field, i.e. AB , maintain whole words
SPLITW(sep,h,str,from) - only for final Aleph field, split on string
    CONF(field,value,0/1) - confirm validity of output line (check other field)
CONFL(substr,0/1) - confirm validity of output line (check field being processed)
CUT(prefix,postfix) - remove substring from side
RANGE(MIN,MAX) - select items in repetitive fields
RE(regexp) - regular expressions
IFDEFP(field,value,0/1) - confirm validity of output line (check other field)
NOTE: This function works for CONSTANT
lines - those without any variable values in
them.
JOINMULTILINES(prefix,suffix) - Given a field-value with newlines in it,
split the field on the new lines (\n), separating
them with prefix, then suffix. E.g.:
For the field XX with the value:
Test
Case, A
And the function call:
<:XX^::XX::JOINMULTILINES(,):>
The results would be:
TestCase, A
One note on this: <:XX^::XX:
Without the ^ the newlines will be lost as
bibconvert will remove them, so you'll
never see an effect from this function.
-
-
+
+
bibconvert character TYPES
==========================
ALPHA - alphabetic
    NALPHA - not alphabetic
NUM - numeric
NNUM - not numeric
ALNUM - alphanumeric
NALNUM - non alphanumeric
LOWER - lowercase
UPPER - uppercase
    PUNCT - punctuation
    NPUNCT - non-punctuation
SPACE - space
"""
global data_parsed
out = value
fn = fn + "()"
par = get_pars(fn)[1]
fn = get_pars(fn)[0]
regexp = "//"
NRE = len(regexp)
value = sub_keywd(value)
par_tmp = []
for item in par:
item = sub_keywd(item)
par_tmp.append(item)
- par = par_tmp
-
+ par = par_tmp
+
if (fn == "RE"):
new_value = ""
par = set_par_defaults(par,".*,0")
if (re.search(par[0], value) and (par[1] == "0")):
new_value = value
out = new_value
-
+
if (fn == "KB"):
new_value = ""
-
+
par = set_par_defaults(par, "KB,0")
new_value = crawl_KB(par[0], value, par[1])
out = new_value
elif (fn == "ADD"):
-
+
par = set_par_defaults(par, ",")
out = par[0] + value + par[1]
-
+
elif (fn == "ABR"):
- par = set_par_defaults(par, "1,.")
+ par = set_par_defaults(par, "1,.")
out = value[:string.atoi(par[0])] + par[1]
elif (fn == "ABRW"):
tmp = FormatField(value, "ABR(1,.)")
tmp = tmp.upper()
out = tmp
elif (fn == "ABRX"):
- par = set_par_defaults(par, ",")
- toout = []
+ par = set_par_defaults(par, ",")
+ toout = []
tmp = value.split(" ")
for wrd in tmp:
if (len(wrd) > string.atoi(par[0])):
wrd = wrd[:string.atoi(par[0])] + par[1]
toout.append(wrd)
out = string.join(toout, " ")
elif (fn == "SUP"):
par = set_par_defaults(par, ",")
if(par[0]=="NUM"):
out = re.sub('\d+', par[1], value)
-
+
if(par[0]=="NNUM"):
out = re.sub('\D+', par[1], value)
if(par[0]=="ALPHA"):
out = re.sub('[a-zA-Z]+', par[1], value)
if(par[0]=="NALPHA"):
out = re.sub('[^a-zA-Z]+', par[1], value)
if((par[0]=="ALNUM") or (par[0] == "NPUNCT")):
out = re.sub('\w+', par[1], value)
if(par[0]=="NALNUM"):
out = re.sub('\W+', par[1], value)
if(par[0]=="PUNCT"):
out = re.sub('\W+', par[1], value)
-
+
if(par[0]=="LOWER"):
out = re.sub('[a-z]+', par[1], value)
if(par[0]=="UPPER"):
out = re.sub('[A-Z]+', par[1], value)
if(par[0]=="SPACE"):
out = re.sub('\s+', par[1], value)
-
+
elif (fn == "LIM"):
- par = set_par_defaults(par,",")
+ par = set_par_defaults(par,",")
if (par[1] == "L"):
- out = value[(len(value) - string.atoi(par[0])):]
+ out = value[(len(value) - string.atoi(par[0])):]
if (par[1] == "R"):
out = value[:string.atoi(par[0])]
elif (fn == "LIMW"):
- par = set_par_defaults(par,",")
+ par = set_par_defaults(par,",")
if (par[0]!= ""):
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
par[0] = re.search(par[0], value).group()
tmp = value.split(par[0])
if (par[1] == "L"):
out = par[0] + tmp[1]
if (par[1] == "R"):
out = tmp[0] + par[0]
elif (fn == "WORDS"):
tmp2 = [value]
- par = set_par_defaults(par, ",")
+ par = set_par_defaults(par, ",")
if (par[1] == "R"):
tmp = value.split(" ")
- tmp2 = []
+ tmp2 = []
i = 0
while (i < string.atoi(par[0])):
tmp2.append(tmp[i])
i = i + 1
if (par[1] == "L"):
tmp = value.split(" ")
tmp.reverse()
tmp2 = []
i = 0
while (i < string.atoi(par[0])):
tmp2.append(tmp[i])
i = i + 1
tmp2.reverse()
out = string.join(tmp2, " ")
elif (fn == "MINL"):
-
- par = set_par_defaults(par, "1")
+
+ par = set_par_defaults(par, "1")
tmp = value.split(" ")
tmp2 = []
i = 0
for wrd in tmp:
if (len(wrd) >= string.atoi(par[0])):
tmp2.append(wrd)
out = string.join(tmp2, " ")
elif (fn == "MINLW"):
- par = set_par_defaults(par, "1")
+ par = set_par_defaults(par, "1")
if (len(value) >= string.atoi(par[0])):
out = value
else:
out = ""
elif (fn == "MAXL"):
- par = set_par_defaults(par, "4096")
+ par = set_par_defaults(par, "4096")
tmp = value.split(" ")
tmp2 = []
i = 0
for wrd in tmp:
if (len(wrd) <= string.atoi(par[0])):
tmp2.append(wrd)
out = string.join(tmp2, " ")
-
+
elif (fn == "REP"):
set_par_defaults(par, ",")
if (par[0]!= ""):
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
                out = re.sub(par[0], par[1], value)
else:
out = value.replace(par[0], par[1])
elif (fn == "SHAPE"):
-
+
if (value != ""):
out = value.strip()
elif (fn == "UP"):
out = value.upper()
elif (fn == "DOWN"):
out = value.lower()
elif (fn == "CAP"):
tmp = value.split(" ")
out2 = []
for wrd in tmp:
wrd2 = wrd.capitalize()
out2.append(wrd2)
out = string.join(out2, " ")
elif (fn == "IF"):
par = set_par_defaults(par, ",,")
N = 0
while N < 3:
if (par[N][0:NRE] == regexp and par[N][-NRE:] == regexp):
par[N] = par[N][NRE:-NRE]
par[N] = re.search(par[N], value).group()
N += 1
if (value == par[0]):
out = par[1]
else:
out = par[2]
if (out == "ORIG"):
out = value
elif (fn == "EXP"):
par = set_par_defaults(par, ",0")
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
par[0] = re.search(par[0], value).group()
-
+
tmp = value.split(" ")
out2 = []
for wrd in tmp:
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
if ((re.search(par[0], wrd).group() == wrd) and \
(par[1] == "1")):
out2.append(wrd)
if ((re.search(par[0], wrd).group() != wrd) and \
(par[1] == "0")):
out2.append(wrd)
else:
if ((len(wrd.split(par[0])) == 1) and \
(par[1] == "1")):
out2.append(wrd)
if ((len(wrd.split(par[0])) != 1) and \
(par[1] == "0")):
- out2.append(wrd)
+ out2.append(wrd)
out = string.join(out2," ")
elif (fn == "EXPW"):
par = set_par_defaults(par,",0")
tmp = value.split(" ")
out2 = []
for wrd in tmp:
if ((FormatField(wrd,"SUP(" + par[0] + ")") == wrd) and \
(par[1] == "1")):
out2.append(wrd)
if ((FormatField(wrd,"SUP(" + par[0] + ")") != wrd) and \
(par[1] == "0")):
out2.append(wrd)
-
+
out = string.join(out2," ")
-
+
elif fn == "JOINMULTILINES":
## Take a string, split it on newlines, and join them together, with
## a prefix and suffix for each segment. If prefix and suffix are
## empty strings, make suffix a single space.
prefix = par[0]
suffix = par[1]
if prefix == "" and suffix == "":
## Values should at least be separated by something;
## make suffix a space:
suffix = " "
new_value = ""
vals_list = value.split("\n")
for item in vals_list:
new_value += "%s%s%s" % (prefix, item, suffix)
new_value.rstrip(" ")
## Update "out" with the newly created value:
out = new_value
elif (fn == "SPLIT"):
par = set_par_defaults(par, "%d,0,,1" % conv_setting[1])
length = string.atoi(par[0]) + (string.atoi(par[1]))
header = string.atoi(par[1])
headerplus = par[2]
starting = string.atoi(par[3])
line = ""
tmp2 = []
tmp3 = []
tmp = value.split(" ")
linenumber = 1
if (linenumber >= starting):
tmp2.append(headerplus)
line = line + headerplus
-
+
for wrd in tmp:
line = line + " " + wrd
tmp2.append(wrd)
if (len(line) > length):
linenumber = linenumber + 1
line = tmp2.pop()
toout = string.join(tmp2)
tmp3.append(toout)
tmp2 = []
line2 = value[:header]
if (linenumber >= starting):
line3 = line2 + headerplus + line
else:
line3 = line2 + line
- line = line3
- tmp2.append(line)
+ line = line3
+ tmp2.append(line)
tmp3.append(line)
out = string.join(tmp3, "\n")
out = FormatField(out, "SHAPE()")
elif (fn == "SPLITW"):
par = set_par_defaults(par, ",0,,1")
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
        str = re.search(par[0], value).group()
header = string.atoi(par[1])
headerplus = par[2]
starting = string.atoi(par[3])
counter = 1
-
+
tmp2 = []
tmp = re.split(par[0], value)
last = tmp.pop()
-
+
for wrd in tmp:
counter = counter + 1
if (counter >= starting):
tmp2.append(value[:header] + headerplus + wrd + str)
else:
tmp2.append(value[:header] + wrd + str)
if (last != ""):
counter = counter + 1
if (counter >= starting):
tmp2.append(value[:header] + headerplus + last)
else:
tmp2.append(value[:header] + last)
-
+
out = string.join(tmp2,"\n")
elif (fn == "CONF"):
par = set_par_defaults(par, ",,1")
found = 0
par1 = ""
data = select_line(par[0], data_parsed)
-
+
for line in data:
if (par[1][0:NRE] == regexp and par[1][-NRE:] == regexp):
par1 = par[1][NRE:-NRE]
else:
par1 = par[1]
if (par1 == ""):
if (line == ""):
found = 1
elif (len(re.split(par1,line)) > 1 ):
found = 1
if ((found == 1) and (string.atoi(par[2]) == 1)):
out = value
if ((found == 1) and (string.atoi(par[2]) == 0)):
out = ""
if ((found == 0) and (string.atoi(par[2]) == 1)):
out = ""
if ((found == 0) and (string.atoi(par[2]) == 0)):
out = value
return out
elif (fn == "IFDEFP"):
par = set_par_defaults(par, ",,1")
found = 0
par1 = ""
data = select_line(par[0], data_parsed)
if len(data) == 0 and par[1] == "":
## The "found" condition is that the field was empty
found = 1
else:
## Seeking a value in the field - conduct the search:
for line in data:
if (par[1][0:NRE] == regexp and par[1][-NRE:] == regexp):
par1 = par[1][NRE:-NRE]
else:
par1 = par[1]
if (par1 == ""):
if (line == ""):
found = 1
elif (len(re.split(par1,line)) > 1 ):
found = 1
if ((found == 1) and (string.atoi(par[2]) == 1)):
out = value
if ((found == 1) and (string.atoi(par[2]) == 0)):
out = ""
if ((found == 0) and (string.atoi(par[2]) == 1)):
out = ""
if ((found == 0) and (string.atoi(par[2]) == 0)):
out = value
return out
elif (fn == "CONFL"):
set_par_defaults(par,",1")
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
if (re.search(par[0], value)):
- if (string.atoi(par[1]) == 1):
+ if (string.atoi(par[1]) == 1):
out = value
else:
out = ""
else:
- if (string.atoi(par[1]) == 1):
+ if (string.atoi(par[1]) == 1):
out = ""
else:
out = value
return out
elif (fn == "CUT"):
par = set_par_defaults(par, ",")
left = value[:len(par[0])]
right = value[-(len(par[1])):]
if (left == par[0]):
out = out[len(par[0]):]
if (right == par[1]):
out = out[:-(len(par[1]))]
-
+
return out
elif (fn == "NUM"):
tmp = re.findall('\d', value)
out = string.join(tmp, "")
return out
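# Templates invoke the functions above as "FN(p1,p2)" strings. A tiny
# dispatcher for just two of them (ADD and LIM) shows the calling convention;
# this is an illustrative sketch, not the production FormatField parser.

```python
def fmt(value, fn, *par):
    """Minimal dispatcher for two bibconvert-style functions:
    ADD(prefix,suffix) and LIM(n,side) -- illustration only."""
    if fn == "ADD":
        prefix, suffix = (par + ("", ""))[:2]
        return prefix + value + suffix
    if fn == "LIM":
        n, side = int(par[0]), par[1]
        # side "L" keeps the last n characters, "R" the first n,
        # matching the LIM branch above
        return value[-n:] if side == "L" else value[:n]
    return value

print(fmt("report", "ADD", "<", ">"))  # -> <report>
print(fmt("abcdef", "LIM", "3", "R"))  # -> abc
```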
def format_field(value, fn):
"""
bibconvert formatting functions:
================================
- ADD(prefix,suffix) - add prefix/suffix
- KB(kb_file,mode) - lookup in kb_file and replace value
+ ADD(prefix,suffix) - add prefix/suffix
+ KB(kb_file,mode) - lookup in kb_file and replace value
ABR(N,suffix) - abbreviate to N places with suffix
ABRX() - abbreviate exclusively words longer
ABRW() - abbreviate word (limit from right)
REP(x,y) - replace
SUP(type) - remove characters of certain TYPE
LIM(n,side) - limit to n letters from L/R
LIMW(string,side) - L/R after split on string
WORDS(n,side) - limit to n words from L/R
IF(value,valueT,valueF) - replace on IF condition
MINL(n) - replace words shorter than n
MINLW(n) - replace words shorter than n
MAXL(n) - replace words longer than n
EXPW(type) - replace word from value containing TYPE
EXP(STR,0/1) - replace word from value containing string
NUM() - take only digits in given string
SHAPE() - remove extra space
UP() - to uppercase
DOWN() - to lowercase
CAP() - make capitals each word
SPLIT(n,h,str,from) - only for final Aleph field, i.e. AB , maintain whole words
SPLITW(sep,h,str,from) - only for final Aleph field, split on string
    CONF(field,value,0/1) - confirm validity of output line (check other field)
CONFL(substr,0/1) - confirm validity of output line (check field being processed)
CUT(prefix,postfix) - remove substring from side
RANGE(MIN,MAX) - select items in repetitive fields
RE(regexp) - regular expressions
-
+
bibconvert character TYPES
==========================
ALPHA - alphabetic
    NALPHA - not alphabetic
NUM - numeric
NNUM - not numeric
ALNUM - alphanumeric
NALNUM - non alphanumeric
LOWER - lowercase
UPPER - uppercase
    PUNCT - punctuation
    NPUNCT - non-punctuation
SPACE - space
"""
global data_parsed
out = value
fn = fn + "()"
par = get_pars(fn)[1]
fn = get_pars(fn)[0]
regexp = "//"
NRE = len(regexp)
value = sub_keywd(value)
par_tmp = []
for item in par:
item = sub_keywd(item)
par_tmp.append(item)
- par = par_tmp
-
+ par = par_tmp
+
if (fn == "RE"):
new_value = ""
par = set_par_defaults(par, ".*,0")
if (re.search(par[0], value) and (par[1] == "0")):
new_value = value
out = new_value
-
+
if (fn == "KB"):
new_value = ""
-
+
par = set_par_defaults(par, "KB,0")
new_value = crawl_KB(par[0], value, par[1])
out = new_value
elif (fn == "ADD"):
-
+
par = set_par_defaults(par, ",")
out = par[0] + value + par[1]
-
+
elif (fn == "ABR"):
- par = set_par_defaults(par, "1,.")
+ par = set_par_defaults(par, "1,.")
out = value[:string.atoi(par[0])] + par[1]
elif (fn == "ABRW"):
tmp = format_field(value,"ABR(1,.)")
tmp = tmp.upper()
out = tmp
elif (fn == "ABRX"):
- par = set_par_defaults(par, ",")
- toout = []
+ par = set_par_defaults(par, ",")
+ toout = []
tmp = value.split(" ")
for wrd in tmp:
if (len(wrd) > string.atoi(par[0])):
wrd = wrd[:string.atoi(par[0])] + par[1]
toout.append(wrd)
out = string.join(toout, " ")
elif (fn == "SUP"):
par = set_par_defaults(par, ",")
if(par[0] == "NUM"):
out = re.sub('\d+', par[1], value)
-
+
if(par[0] == "NNUM"):
out = re.sub('\D+', par[1], value)
if(par[0] == "ALPHA"):
out = re.sub('[a-zA-Z]+', par[1], value)
if(par[0] == "NALPHA"):
out = re.sub('[^a-zA-Z]+', par[1], value)
if((par[0] == "ALNUM") or (par[0] == "NPUNCT")):
out = re.sub('\w+', par[1], value)
if(par[0] == "NALNUM"):
out = re.sub('\W+', par[1], value)
if(par[0] == "PUNCT"):
out = re.sub('\W+', par[1], value)
-
+
if(par[0] == "LOWER"):
out = re.sub('[a-z]+', par[1], value)
if(par[0] == "UPPER"):
out = re.sub('[A-Z]+', par[1], value)
if(par[0] == "SPACE"):
out = re.sub('\s+', par[1], value)
-
+
elif (fn == "LIM"):
- par = set_par_defaults(par, ",")
+ par = set_par_defaults(par, ",")
if (par[1] == "L"):
- out = value[(len(value) - string.atoi(par[0])):]
+ out = value[(len(value) - string.atoi(par[0])):]
if (par[1] == "R"):
out = value[:string.atoi(par[0])]
elif (fn == "LIMW"):
- par = set_par_defaults(par, ",")
+ par = set_par_defaults(par, ",")
if (par[0]!= ""):
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
par[0] = re.search(par[0], value).group()
tmp = value.split(par[0])
if (par[1] == "L"):
out = par[0] + tmp[1]
if (par[1] == "R"):
out = tmp[0] + par[0]
elif (fn == "WORDS"):
tmp2 = [value]
- par = set_par_defaults(par, ",")
+ par = set_par_defaults(par, ",")
if (par[1] == "R"):
tmp = value.split(" ")
- tmp2 = []
+ tmp2 = []
i = 0
while (i < string.atoi(par[0])):
tmp2.append(tmp[i])
i = i + 1
if (par[1] == "L"):
tmp = value.split(" ")
tmp.reverse()
tmp2 = []
i = 0
while (i < string.atoi(par[0])):
tmp2.append(tmp[i])
i = i + 1
tmp2.reverse()
out = string.join(tmp2, " ")
elif (fn == "MINL"):
-
- par = set_par_defaults(par, "1")
+
+ par = set_par_defaults(par, "1")
tmp = value.split(" ")
tmp2 = []
i = 0
for wrd in tmp:
if (len(wrd) >= string.atoi(par[0])):
tmp2.append(wrd)
out = string.join(tmp2, " ")
elif (fn == "MINLW"):
- par = set_par_defaults(par, "1")
+ par = set_par_defaults(par, "1")
if (len(value) >= string.atoi(par[0])):
out = value
else:
out = ""
elif (fn == "MAXL"):
- par = set_par_defaults(par, "4096")
+ par = set_par_defaults(par, "4096")
tmp = value.split(" ")
tmp2 = []
i = 0
for wrd in tmp:
if (len(wrd) <= string.atoi(par[0])):
tmp2.append(wrd)
out = string.join(tmp2, " ")
-
+
elif (fn == "REP"):
set_par_defaults(par, ",")
if (par[0]!= ""):
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
                out = re.sub(par[0], par[1], value)
else:
out = value.replace(par[0], par[1])
elif (fn == "SHAPE"):
-
+
if (value != ""):
out = value.strip()
elif (fn == "UP"):
out = value.upper()
elif (fn == "DOWN"):
out = value.lower()
elif (fn == "CAP"):
tmp = value.split(" ")
out2 = []
for wrd in tmp:
wrd2 = wrd.capitalize()
out2.append(wrd2)
out = string.join(out2," ")
elif (fn == "IF"):
par = set_par_defaults(par,",,")
N = 0
while N < 3:
if (par[N][0:NRE] == regexp and par[N][-NRE:] == regexp):
par[N] = par[N][NRE:-NRE]
par[N] = re.search(par[N], value).group()
N += 1
if (value == par[0]):
out = par[1]
else:
out = par[2]
if (out == "ORIG"):
out = value
elif (fn == "EXP"):
par = set_par_defaults(par, ",0")
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
par[0] = re.search(par[0], value).group()
-
+
tmp = value.split(" ")
out2 = []
for wrd in tmp:
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
if ((re.search(par[0], wrd).group() == wrd) and \
(par[1] == "1")):
out2.append(wrd)
if ((re.search(par[0], wrd).group() != wrd) and \
(par[1] == "0")):
out2.append(wrd)
else:
if ((len(wrd.split(par[0])) == 1) and \
(par[1] == "1")):
out2.append(wrd)
if ((len(wrd.split(par[0])) != 1) and \
(par[1] == "0")):
- out2.append(wrd)
+ out2.append(wrd)
out = string.join(out2," ")
elif (fn == "EXPW"):
par = set_par_defaults(par,",0")
tmp = value.split(" ")
out2 = []
for wrd in tmp:
if ((format_field(wrd,"SUP(" + par[0] + ")") == wrd) and \
(par[1] == "1")):
out2.append(wrd)
if ((format_field(wrd,"SUP(" + par[0] + ")") != wrd) and \
(par[1] == "0")):
out2.append(wrd)
-
+
out = string.join(out2," ")
-
+
elif (fn == "SPLIT"):
par = set_par_defaults(par, "%d,0,,1" % conv_setting[1])
length = string.atoi(par[0]) + (string.atoi(par[1]))
header = string.atoi(par[1])
headerplus = par[2]
starting = string.atoi(par[3])
line = ""
tmp2 = []
tmp3 = []
tmp = value.split(" ")
linenumber = 1
if (linenumber >= starting):
tmp2.append(headerplus)
line = line + headerplus
-
+
for wrd in tmp:
line = line + " " + wrd
tmp2.append(wrd)
if (len(line) > length):
linenumber = linenumber + 1
line = tmp2.pop()
toout = string.join(tmp2)
tmp3.append(toout)
tmp2 = []
line2 = value[:header]
if (linenumber >= starting):
line3 = line2 + headerplus + line
else:
line3 = line2 + line
- line = line3
- tmp2.append(line)
+ line = line3
+ tmp2.append(line)
tmp3.append(line)
out = string.join(tmp3, "\n")
out = format_field(out, "SHAPE()")
elif (fn == "SPLITW"):
par = set_par_defaults(par, ",0,,1")
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
        str = re.search(par[0], value).group()
header = string.atoi(par[1])
headerplus = par[2]
starting = string.atoi(par[3])
counter = 1
-
+
tmp2 = []
tmp = re.split(par[0], value)
last = tmp.pop()
-
+
for wrd in tmp:
counter = counter + 1
if (counter >= starting):
tmp2.append(value[:header] + headerplus + wrd + str)
else:
tmp2.append(value[:header] + wrd + str)
if (last != ""):
counter = counter + 1
if (counter >= starting):
tmp2.append(value[:header] + headerplus + last)
else:
tmp2.append(value[:header] + last)
-
+
out = string.join(tmp2, "\n")
elif (fn == "CONF"):
par = set_par_defaults(par, ",,1")
found = 0
par1 = ""
data = select_line(par[0], data_parsed)
-
+
for line in data:
if (par[1][0:NRE] == regexp and par[1][-NRE:] == regexp):
par1 = par[1][NRE:-NRE]
else:
par1 = par[1]
if (par1 == ""):
if (line == ""):
found = 1
elif (len(re.split(par1,line)) > 1 ):
found = 1
if ((found == 1) and (string.atoi(par[2]) == 1)):
out = value
if ((found == 1) and (string.atoi(par[2]) == 0)):
out = ""
if ((found == 0) and (string.atoi(par[2]) == 1)):
out = ""
if ((found == 0) and (string.atoi(par[2]) == 0)):
out = value
-
+
return out
-
+
elif (fn == "CONFL"):
par = set_par_defaults(par, ",1")
if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp):
par[0] = par[0][NRE:-NRE]
if (re.search(par[0], value)):
- if (string.atoi(par[1]) == 1):
+ if (string.atoi(par[1]) == 1):
out = value
else:
out = ""
else:
- if (string.atoi(par[1]) == 1):
+ if (string.atoi(par[1]) == 1):
out = ""
else:
out = value
return out
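The CONFL branch above is a small truth table: keep the value when a regular expression matches and par[1] is 1, or when it does not match and par[1] is 0. A minimal, self-contained sketch of that logic (the name `confl` and the boolean parameter are illustrative, not part of bibconvert):

```python
import re

def confl(value, pattern, keep_if_match=True):
    # Keep value only when the regex search result agrees with
    # keep_if_match; otherwise return an empty string.
    matched = re.search(pattern, value) is not None
    return value if matched == keep_if_match else ""
```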
elif (fn == "CUT"):
par = set_par_defaults(par, ",")
left = value[:len(par[0])]
right = value[-(len(par[1])):]
if (left == par[0]):
out = out[len(par[0]):]
if (right == par[1]):
out = out[:-(len(par[1]))]
-
+
return out
elif (fn == "NUM"):
tmp = re.findall('\d', value)
out = string.join(tmp, "")
return out
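The NUM branch keeps only the digit characters of the value. As a standalone helper (`keep_digits` is an illustrative name):

```python
import re

def keep_digits(value):
    # Same effect as the NUM formatting function: drop everything
    # except decimal digits and join the remainder.
    return "".join(re.findall(r"\d", value))
```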
## Match records with the database content
##
def match_in_database(record, query_string):
"Check if record is in alreadey in database with an oai identifier. Returns recID if present, 0 otherwise."
query_string_parsed = parse_query_string(query_string)
search_pattern = []
search_field = []
for query_field in query_string_parsed:
ind1 = query_field[0][3:4]
if ind1 == "_":
ind1 = ""
ind2 = query_field[0][4:5]
if ind2 == "_":
ind2 = ""
stringsplit = "" % (query_field[0][0:3], ind1, ind2, query_field[0][5:6])
formatting = query_field[1:]
record1 = string.split(record, stringsplit)
-
+
if len(record1) > 1:
-
+
matching_value = string.split(record1[1], "<")[0]
for fn in formatting:
matching_value = FormatField(matching_value, fn)
search_pattern.append(matching_value)
search_field.append(query_field[0])
search_field.append("")
search_field.append("")
search_field.append("")
search_pattern.append("")
search_pattern.append("")
search_pattern.append("")
recID_list = perform_request_search(p1=search_pattern[0],
f1=search_field[0],
p2=search_pattern[1],
f2=search_field[1],
p3=search_pattern[2],
f3=search_field[2])
return recID_list
def parse_query_string(query_string):
"""Parse query string, e.g.:
Input: 245__a::REP(-, )::SHAPE::SUP(SPACE, )::MINL(4)::MAXL(8)::EXPW(PUNCT)::WORDS(4,L)::SHAPE::SUP(SPACE, )||700__a::MINL(2)::REP(COMMA,).
Output:[['245__a','REP(-,)','SHAPE','SUP(SPACE, )','MINL(4)','MAXL(8)','EXPW(PUNCT)','WORDS(4,L)','SHAPE','SUP(SPACE, )'],['700__a','MINL(2)','REP(COMMA,)']]
"""
query_string_out = []
query_string_out_in = []
query_string_split_1 = query_string.split('||')
for item_1 in query_string_split_1:
query_string_split_2 = item_1.split('::')
query_string_out_in = []
for item in query_string_split_2:
query_string_out_in.append(item)
query_string_out.append(query_string_out_in)
return query_string_out
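The function above reduces to two nested splits; the docstring example can be reproduced with this condensed, self-contained equivalent:

```python
def parse_query_string(query_string):
    # Split on '||' to separate field specifications, then on '::'
    # to separate each field tag from its chain of formatting functions.
    return [part.split("::") for part in query_string.split("||")]
```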
def exit_on_error(error_message):
"exit when error occured"
sys.stderr.write("\n bibconvert data convertor\n")
sys.stderr.write(" Error: %s\n" % error_message)
sys.exit()
return 0
def create_record(begin_record_header,
ending_record_footer,
query_string,
match_mode,
Xcount):
"Create output record"
global data_parsed
out_to_print = ""
out = []
field_data_item_LIST = []
ssn5cnt = "%3d" % Xcount
sysno = generate("DATE(%w%H%M%S)")
sysno500 = generate("XDATE(%w%H%M%S)," + ssn5cnt)
-
+
for T_tpl_item_LIST in target_tpl_parsed:
# the line is printed only if the variables inside are not empty
print_line = 0
to_output = []
- rows = 1
+ rows = 1
for field_tpl_item_STRING in T_tpl_item_LIST[1]:
save_field_newlines = 0
DATA = []
if (field_tpl_item_STRING[:2]=="<:"):
field_tpl_item_STRING = field_tpl_item_STRING[2:-2]
field = field_tpl_item_STRING.split("::")[0]
if (len(field_tpl_item_STRING.split("::")) == 1):
value = generate(field)
to_output.append([value])
else:
subfield = field_tpl_item_STRING.split("::")[1]
if (field[-1] == "*"):
repetitive = 1
field = field[:-1]
elif field[-1] == "^":
## Keep the newlines in a field's value:
repetitive = 0
save_field_newlines = 1
field = field[:-1]
else:
repetitive = 0
if dirmode:
DATA = select_line(field, data_parsed)
else:
DATA = select_line(field, data_parsed)
if save_field_newlines == 1:
## put newlines back into the element value:
DATA = [string.join(DATA, "\n")]
elif (repetitive == 0):
DATA = [string.join(DATA, " ")]
SRC_TPL = select_line(field, source_tpl_parsed)
try:
## Get the components that this field is composed of:
field_components = field_tpl_item_STRING.split("::")
num_field_components = len(field_components)
## Test the number of components. If it is greater than 2,
## some kind of function must be called on the value of
## the field, and it should therefore be evaluated. If, however,
## the field is made up of only 2 components (i.e. no functions
## are called on its value) AND the value is empty, do not bother
## to evaluate it.
##
## E.g. In the following line:
## 300---<:Num::Num:><:Num::Num::IF(,mult. p):>
##
## If we have a value "3" for page number (Num), we want the following result:
## 3 p
## If however, we have no value for page number (Num), we want this result:
## mult. p
## The functions relating to the datafield must therefore be executed
##
## If however, the template contains this line:
## 300---<:Num::Num:>
##
## If we have a value "3" for page number (Num), we want the following result:
## 3
## If however, we have no value for page number (Num), we do NOT want the line
## to be printed at all - we should SKIP the element and not return an empty
## value (that would be pointless).
if (DATA[0] != "" or num_field_components > 2):
DATA = get_subfields(DATA, subfield, SRC_TPL)
FF = field_tpl_item_STRING.split("::")
if (len(FF) > 2):
FF = FF[2:]
for fn in FF:
# DATAFORMATTED = []
if (len(DATA) != 0):
DATA = get_subfields(DATA, subfield, SRC_TPL)
FF = field_tpl_item_STRING.split("::")
if (len(FF) > 2):
FF = FF[2:]
for fn2 in FF:
DATAFORMATTED = []
for item in DATA:
item = FormatField(item, fn2)
if item != "":
DATAFORMATTED.append(item)
DATA = DATAFORMATTED
if (len(DATA) > rows):
rows = len(DATA)
if DATA[0] != "":
print_line = 1
to_output.append(DATA)
except IndexError, e:
pass
else:
to_output.append([field_tpl_item_STRING])
current = 0
default_print = 0
while (current < rows):
line_to_print = []
for item in to_output:
if (item == []):
item = ['']
if (len(item) <= current):
printout = item[0]
else:
printout = item[current]
line_to_print.append(printout)
output = exp_n(string.join(line_to_print,""))
global_formatting_functions = T_tpl_item_LIST[0].split("::")[1:]
for GFF in global_formatting_functions:
if (GFF[:5] == "RANGE"):
parR = get_pars(GFF)[1]
parR = set_par_defaults(parR,"MIN,MAX")
if (parR[0]!="MIN"):
if (string.atoi(parR[0]) > (current+1)):
output = ""
if (parR[1]!="MAX"):
if (string.atoi(parR[1]) < (current+1)):
output = ""
elif (GFF[:6] == "IFDEFP"):
## Like a DEFP and a CONF combined. I.e. Print the line
## EVEN if it's a constant, but ONLY IF the condition in
## the IFDEFP is met.
## If the value returned is an empty string, no line will
## be printed.
output = FormatField(output, GFF)
print_line = 1
elif (GFF[:4] == "DEFP"):
default_print = 1
else:
output = FormatField(output, GFF)
if ((len(output) > set_conv()[0] and print_line == 1) or default_print):
out_to_print = out_to_print + output + "\n"
current = current + 1
###
out_flag = 0
if query_string:
recID = match_in_database(out_to_print, query_string)
-
+
if len(recID) == 1 and match_mode == 1:
ctrlfield = "%d" % (recID[0])
out_to_print = ctrlfield + "\n" + out_to_print
out_flag = 1
-
+
if len(recID) == 0 and match_mode == 0:
out_flag = 1
-
+
if len(recID) > 1 and match_mode == 2:
out_flag = 1
-
-
+
+
if out_flag or match_mode == -1:
if begin_record_header != "":
out_to_print = begin_record_header + "\n" + out_to_print
if ending_record_footer != "":
out_to_print = out_to_print + "\n" + ending_record_footer
else:
out_to_print = ""
-
+
return out_to_print
def convert(ar_):
global dirmode, Xcount, conv_setting, sysno, sysno500, separator, tcounter, source_data, query_string, match_mode, begin_record_header, ending_record_footer, output_rec_sep, begin_header, ending_footer, oai_identifier_from, source_tpl, source_tpl_parsed, target_tpl, target_tpl_parsed, extract_tpl, extract_tpl_parsed, data_parsed
dirmode, Xcount, conv_setting, sysno, sysno500, separator, tcounter, source_data, query_string, match_mode, begin_record_header, ending_record_footer, output_rec_sep, begin_header, ending_footer, oai_identifier_from, source_tpl, source_tpl_parsed, target_tpl, target_tpl_parsed, extract_tpl, extract_tpl_parsed = ar_
# separator = spt
# Added by Alberto
separator = sub_keywd(separator)
-
+
if dirmode:
if (os.path.isdir(source_data)):
data_parsed = parse_input_data_d(source_data, source_tpl)
-
+
record = create_record(begin_record_header, ending_record_footer, query_string, match_mode, Xcount)
if record != "":
print record
tcounter = tcounter + 1
if output_rec_sep != "":
print output_rec_sep
else:
exit_on_error("Cannot access directory: %s" % source_data)
-
+
else:
done = 0
print begin_header
while (done == 0):
data_parsed = parse_input_data_fx(source_tpl)
if (data_parsed == -1):
done = 1
else:
if (data_parsed[0][0]!= ''):
record = create_record(begin_record_header, ending_record_footer, query_string, match_mode, Xcount)
Xcount += 1
if record != "":
print record
tcounter = tcounter + 1
if output_rec_sep != "":
print output_rec_sep
print ending_footer
return
diff --git a/modules/bibconvert/lib/bibconvert_bfx_engine.py b/modules/bibconvert/lib/bibconvert_bfx_engine.py
index f2aa705d0..39d7d6f38 100644
--- a/modules/bibconvert/lib/bibconvert_bfx_engine.py
+++ b/modules/bibconvert/lib/bibconvert_bfx_engine.py
@@ -1,303 +1,303 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
bibconvert_bfx_engine - XML processing library for CDS Invenio
using bfx stylesheets.
Does almost what an XSLT processor does, but using a special
syntax for the transformation stylesheet: a combination of
'BibFormat for XML' (bibformat bfx) templates and XPath is
used.
Dependencies: bibformat_bfx_engine.py
Used by: bibconvert.in
"""
__revision__ = "$Id$"
import sys
import os
from cStringIO import StringIO
processor_type = -1
try:
# Try to load
from xml.xpath import Evaluate
from xml.dom import minidom, Node
from xml.xpath.Context import Context
processor_type = 0
except ImportError:
pass
-# TODO: Try to explicitely load 4suite Xpath
+# TODO: Try to explicitly load 4Suite XPath
#
# From :
## 1. PyXML usage (do not use with 4Suite)
## * import xml.xslt
## * import xml.xpath
## 2. 4Suite usage (use these imports)
## * import Ft.Xml.XPath
## * import Ft.Xml.Xslt
-
+
from invenio import bibformat_bfx_engine
-from invenio.config import etcdir
+from invenio.config import CFG_ETCDIR
-CFG_BFX_TEMPLATES_PATH = "%s%sbibconvert%sconfig" % (etcdir, os.sep, os.sep)
+CFG_BFX_TEMPLATES_PATH = "%s%sbibconvert%sconfig" % (CFG_ETCDIR, os.sep, os.sep)
def convert(xmltext, template_filename=None, template_source=None):
"""
Processes an XML text according to a template, and returns the result.
The template can be given either by name (or by path) or by source.
If source is given, name is ignored.
bibconvert_bfx_engine will look for template_filename in standard directories
for templates. If not found, template_filename will be assumed to be a path to
a template. If none can be found, return None.
Raises an exception if it cannot find an appropriate XPath module.
@param xmltext The string representation of the XML to process
@param template_filename The name of the template to use for the processing
@param template_source The configuration describing the processing.
@return the transformed XML text.
"""
if processor_type == -1:
# No XPath processor found
raise "No XPath processor could be found"
-
+
# Retrieve template and read it
if template_source:
template = template_source
elif template_filename:
try:
path_to_templates = (CFG_BFX_TEMPLATES_PATH + os.sep +
template_filename)
if os.path.exists(path_to_templates):
template = file(path_to_templates).read()
elif os.path.exists(template_filename):
template = file(template_filename).read()
else:
sys.stderr.write(template_filename +' does not exist.')
return None
except IOError:
sys.stderr.write(template_filename +' could not be read.')
return None
else:
sys.stderr.write('No template filename was given.')
return None
# Prepare some variables
- out_file = StringIO() # Virtual file-like object to write result in
+ out_file = StringIO() # Virtual file-like object to write result in
trans = XML2XMLTranslator()
trans.set_xml_source(xmltext)
parser = bibformat_bfx_engine.BFXParser(trans)
-
+
# Load template
# This might print some info. Redirect to stderr
# but do no print on standard output
standard_output = sys.stdout
sys.stdout = sys.stderr
# Always set 'template_name' to None, otherwise
# bibformat for XML will look for it in wrong directory
template_tree = parser.load_template(template_name=None,
template_source=template)
sys.stdout = standard_output
# Transform the source using loaded template
parser.walk(template_tree, out_file)
- output = out_file.getvalue()
+ output = out_file.getvalue()
return output
class XML2XMLTranslator:
"""
Generic translator for XML.
"""
def __init__(self):
'''
Create an instance of the translator, initialized with the list of defined labels and their rules.
'''
self.xml_source = ''
self.dom = None
self.current_node = None
self.namespaces = {}
def is_defined(self, name):
'''
Check whether a variable is defined.
Accepts all names; get_value will return an empty string if the variable does not exist.
-
+
@param name the name of the variable
'''
return True
## context = Context(self.current_node, processorNss=self.namespaces)
-
+
## results_list = Evaluate(name, context=context)
## if results_list != []:
## return True
## else:
## return False
-
+
def get_num_elements(self, name):
'''
An API function to get the number of elements for a variable.
Do not use this function to build loops; use iterator instead.
'''
context = Context(self.current_node, processorNss=self.namespaces)
results_list = Evaluate(name, context=context)
return len(results_list)
def get_value(self, name, display_type='value'):
'''
The API function for querying the translator for the values of a certain variable.
Called in a loop, it will return a different value each time.
-
+
@param name the name of the variable you want the value of
@param display_type an optional value for the type of the desired output, one of: value, tag, ind1, ind2, code, fulltag;
These can be easily added in the proper place of the code (display_value)
'''
context = Context(self.current_node, processorNss=self.namespaces)
results_list = Evaluate(name, context=context)
if len(results_list) == 0:
return ''
# Select text node value of selected nodes
# and concatenate
return ' '.join([node.childNodes[0].nodeValue.encode( "utf-8" )
for node in results_list])
-
+
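`Evaluate` and `Context` come from the long-unmaintained PyXML package. The same "join the text of all matching nodes" behaviour can be sketched with plain minidom, using a tag-name lookup as a stand-in for a real XPath query (`text_values` is an illustrative name, not part of this module):

```python
from xml.dom import minidom

def text_values(dom, tag_name):
    # Stand-in for the Evaluate() call above: concatenate the text
    # content of every matching element, space-separated.
    nodes = dom.getElementsByTagName(tag_name)
    return " ".join(node.childNodes[0].nodeValue
                    for node in nodes if node.childNodes)
```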
def iterator(self, name):
'''
An iterator over the values of a certain name.
The iterator changes the state of internal variables and objects.
When get_value is called in a loop, this will result in a different value each time.
'''
saved_node = self.current_node
context = Context(self.current_node, processorNss=self.namespaces)
results_list = Evaluate(name, context=context)
for node in results_list:
self.current_node = node
yield node
self.current_node = saved_node
-
+
def call_function(self, function_name, parameters=None):
'''
Call an external element which is a Python file, using BibFormat
@param function_name the name of the function to call
@param parameters a dictionary of the parameters to pass as key=value pairs
@return a string value, which is the result of the function call
'''
#No support for this in bibconvert_bfx_engine
## if parameters is None:
## parameters = {}
## bfo = BibFormatObject(self.recID)
## format_element = get_format_element(function_name)
## (value, errors) = eval_format_element(format_element, bfo, parameters)
## #to do: check errors from function call
## return value
return ""
-
+
def set_xml_source(self, xmltext):
"""
Specify the source XML for this transformer
@param xmltext the XML text representation to use as source
"""
self.xml_source = xmltext
self.dom = minidom.parseString(xmltext)
self.current_node = self.dom
self.namespaces = build_namespaces(self.dom)
def doc_order_iter_filter(node, filter_func):
"""
Iterates over each node in document order,
applying the filter function to each in turn,
starting with the given node, and yielding each node in
cases where the filter function computes true
@param node the starting point (subtree rooted at node will be iterated over document order)
@param filter_func a callable object taking a node and returning true or false
"""
if filter_func(node):
yield node
for child in node.childNodes:
for cn in doc_order_iter_filter(child, filter_func):
yield cn
return
def get_all_elements(node):
"""
Returns an iterator (using document order) over all element nodes
that are descendants of the given one
"""
return doc_order_iter_filter(
node, lambda n: n.nodeType == Node.ELEMENT_NODE
)
def build_namespaces(dom):
"""
Build the namespaces present in dom tree.
-
+
Must be called prior to processing an XML file
in order to execute XPath queries correctly.
@param dom the dom tree to parse to discover namespaces
@return a dictionary with prefix as key and namespace as value
"""
namespaces = {}
for elem in get_all_elements(dom):
if elem.prefix is not None:
namespaces[elem.prefix] = elem.namespaceURI
-
+
for attr in elem.attributes.values():
if attr.prefix is not None:
namespaces[attr.prefix] = attr.namespaceURI
return namespaces
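build_namespaces only needs a document-order walk plus the `prefix`/`namespaceURI` attributes that minidom exposes. A self-contained equivalent (the name `collect_namespaces` is illustrative; note that namespace-declaration attributes themselves may also surface under the `xmlns` prefix):

```python
from xml.dom import minidom, Node

def collect_namespaces(node, namespaces=None):
    # Walk the tree in document order, recording prefix -> namespace URI
    # for every element and attribute that carries a prefix.
    if namespaces is None:
        namespaces = {}
    if node.nodeType == Node.ELEMENT_NODE:
        if node.prefix:
            namespaces[node.prefix] = node.namespaceURI
        for attr in node.attributes.values():
            if attr.prefix:
                namespaces[attr.prefix] = attr.namespaceURI
    for child in node.childNodes:
        collect_namespaces(child, namespaces)
    return namespaces
```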
def bc_profile():
"""
Runs a benchmark
"""
global xmltext
-
+
convert(xmltext, 'oaidc2marcxml.bfx')
return
def benchmark():
"""
Benchmark the module, using profile and pstats
"""
import profile
import pstats
from invenio.bibformat import record_get_xml
global xmltext
-
+
xmltext = record_get_xml(10, 'oai_dc')
profile.run('bc_profile()', "bibconvert_xslt_profile")
p = pstats.Stats("bibconvert_xslt_profile")
p.strip_dirs().sort_stats("cumulative").print_stats()
if __name__ == "__main__":
# FIXME: Implement command line options
pass
diff --git a/modules/bibconvert/lib/bibconvert_xslt_engine.py b/modules/bibconvert/lib/bibconvert_xslt_engine.py
index 56c645f83..d38c8b9e7 100644
--- a/modules/bibconvert/lib/bibconvert_xslt_engine.py
+++ b/modules/bibconvert/lib/bibconvert_xslt_engine.py
@@ -1,263 +1,263 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
bibconvert_xslt_engine - Wrapper for an XSLT engine.
Customized to support BibConvert functions through the
use of XPath 'format' function.
Dependencies: Need one of the following XSLT processors:
- libxml2 & libxslt
- 4suite
Used by: bibconvert.in
FIXME: - Find better namespace for functions
- Find less bogus URI (given as param to processor)
for source and template
- Implement command-line options
- Think about better handling of 'value' parameter
in bibconvert_function_*
"""
__revision__ = "$Id$"
import sys
import os
from invenio.config import \
- etcdir, \
+ CFG_ETCDIR, \
weburl
from invenio.bibconvert import FormatField
# The namespace used for BibConvert functions
CFG_BIBCONVERT_FUNCTION_NS = "http://cdsweb.cern.ch/bibconvert/fn"
# Import one XSLT processor
#
# processor_type:
# -1 : No processor found
# 0 : libxslt
# 1 : 4suite
processor_type = -1
try:
# libxml2 & libxslt
import libxml2
import libxslt
processor_type = 0
except ImportError:
pass
if processor_type == -1:
try:
# 4suite
from Ft.Xml.Xslt import Processor
from Ft.Xml import InputSource
from xml.dom import Node
processor_type = 1
except ImportError:
pass
-CFG_BIBCONVERT_XSL_PATH = "%s%sbibconvert%sconfig" % (etcdir, os.sep, os.sep)
+CFG_BIBCONVERT_XSL_PATH = "%s%sbibconvert%sconfig" % (CFG_ETCDIR, os.sep, os.sep)
def bibconvert_function_libxslt(ctx, value, func):
"""
libxslt extension function:
Bridge between BibConvert formatting functions and XSL stylesheets.
Can be used in that way in XSL stylesheet
(provided xmlns:fn="http://cdsweb.cern.ch/bibconvert/fn" has been declared):
(Adds strings 'mypref' and 'mysuff' as prefix/suffix to current node value,
using BibConvert ADD function)
-
+
if value is int, value is converted to string
if value is Node (PyCObj), first child node (text node) is taken as value
"""
try:
if isinstance(value, str):
string_value = value
elif isinstance(value, int):
string_value = str(value)
else:
string_value = libxml2.xmlNode(_obj=value[0]).children.content
return FormatField(string_value, func).rstrip('\n')
except Exception, err:
sys.stderr.write("Error during formatting function evaluation: " + \
str(err) + \
'\n')
-
+
return ''
def bibconvert_function_4suite(ctx, value, func):
"""
4suite extension function:
Bridge between BibConvert formatting functions and XSL stylesheets.
Can be used in that way in XSL stylesheet
(provided xmlns:fn="http://cdsweb.cern.ch/bibconvert/fn" has been declared):
(Adds strings 'mypref' and 'mysuff' as prefix/suffix to current node value,
using BibConvert ADD function)
if value is int, value is converted to string
if value is Node, first child node (text node) is taken as value
"""
try:
if len(value) > 0 and isinstance(value[0], Node):
string_value = value[0].firstChild.nodeValue
if string_value is None:
string_value = ''
else:
string_value = str(value)
return FormatField(string_value, func).rstrip('\n')
-
+
except Exception, err:
sys.stderr.write("Error during formatting function evaluation: " + \
str(err) + \
'\n')
-
+
return ''
def convert(xmltext, template_filename=None, template_source=None):
"""
Processes an XML text according to a template, and returns the result.
The template can be given either by name (or by path) or by source.
If source is given, name is ignored.
bibconvert_xslt_engine will look for template_filename in standard directories
for templates. If not found, template_filename will be assumed to be a path to
a template. If none can be found, return None.
Raises an exception if it cannot find an appropriate XSLT processor.
@param xmltext The string representation of the XML to process
@param template_filename The name of the template to use for the processing
@param template_source The configuration describing the processing.
@return the transformed XML text.
"""
if processor_type == -1:
# No XSLT processor found
raise "No XSLT processor could be found"
-
+
# Retrieve template and read it
if template_source:
template = template_source
elif template_filename:
try:
path_to_templates = (CFG_BIBCONVERT_XSL_PATH + os.sep +
template_filename)
if os.path.exists(path_to_templates):
template = file(path_to_templates).read()
elif os.path.exists(template_filename):
template = file(template_filename).read()
else:
sys.stderr.write(template_filename +' does not exist.')
return None
except IOError:
sys.stderr.write(template_filename +' could not be read.')
return None
else:
sys.stderr.write('No template filename was given.')
return None
result = ""
if processor_type == 0:
# libxml2 & libxslt
-
+
# Register BibConvert functions for use in XSL
libxslt.registerExtModuleFunction("format",
CFG_BIBCONVERT_FUNCTION_NS,
bibconvert_function_libxslt)
# Load template and source
template_xml = libxml2.parseDoc(template)
processor = libxslt.parseStylesheetDoc(template_xml)
source = libxml2.parseDoc(xmltext)
# Transform
result_object = processor.applyStylesheet(source, None)
result = processor.saveResultToString(result_object)
# Deallocate
processor.freeStylesheet()
source.freeDoc()
result_object.freeDoc()
elif processor_type == 1:
# 4suite
# Init
processor = Processor.Processor()
-
+
# Register BibConvert functions for use in XSL
processor.registerExtensionFunction(CFG_BIBCONVERT_FUNCTION_NS,
"format",
bibconvert_function_4suite)
# Load template and source
transform = InputSource.DefaultFactory.fromString(template,
uri=weburl)
source = InputSource.DefaultFactory.fromString(xmltext,
uri=weburl)
processor.appendStylesheet(transform)
# Transform
result = processor.run(source)
else:
sys.stderr.write("No XSLT processor could be found")
-
+
return result
## def bc_profile():
## """
## Runs a benchmark
## """
## global xmltext
## convert(xmltext, 'oaidc2marcxml.xsl')
## return
## def benchmark():
## """
## Benchmark the module, using profile and pstats
## """
## import profile
## import pstats
## from invenio.bibformat import record_get_xml
## global xmltext
-
+
## xmltext = record_get_xml(10, 'oai_dc')
## profile.run('bc_profile()', "bibconvert_xslt_profile")
## p = pstats.Stats("bibconvert_xslt_profile")
## p.strip_dirs().sort_stats("cumulative").print_stats()
-
+
if __name__ == "__main__":
pass
diff --git a/modules/bibedit/lib/bibedit_engine.py b/modules/bibedit/lib/bibedit_engine.py
index 2271c4fd6..82c949d3a 100644
--- a/modules/bibedit/lib/bibedit_engine.py
+++ b/modules/bibedit/lib/bibedit_engine.py
@@ -1,390 +1,390 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import os
import time
import cPickle
-from invenio.config import bindir, tmpdir
+from invenio.config import CFG_BINDIR, CFG_TMPDIR
from invenio.bibedit_dblayer import marc_to_split_tag
from invenio.bibedit_config import *
from invenio.search_engine import print_record, record_exists
from invenio.bibrecord import record_xml_output, create_record, field_add_subfield, record_add_field
import invenio.template
bibedit_templates = invenio.template.load('bibedit')
def perform_request_index(ln, recid, cancel, delete, confirm_delete, uid, temp, format_tag, edit_tag,
delete_tag, num_field, add, dict_value=None):
"""Returns the body of main page. """
errors = []
warnings = []
body = ''
if cancel != 0:
os.system("rm %s.tmp" % get_file_path(cancel))
if delete != 0:
if confirm_delete != 0:
body = bibedit_templates.tmpl_deleted(ln, 1, delete, temp, format_tag)
else:
(record, junk) = get_record(ln, delete, uid, "false")
add_field(delete, uid, record, "980", "", "", "c", "DELETED")
save_temp_record(record, uid, "%s.tmp" % get_file_path(delete))
save_xml_record(delete)
body = bibedit_templates.tmpl_deleted(ln)
else:
if recid != 0 :
if record_exists(recid) > 0:
(record, body) = get_record(ln, recid, uid, temp)
if record != '':
if add == 3:
body = ''
if edit_tag is not None and dict_value is not None:
record = edit_record(recid, uid, record, edit_tag, dict_value, num_field)
if delete_tag is not None and num_field is not None:
record = delete_field(recid, uid, record, delete_tag, num_field)
if add == 4:
tag = dict_value.get("add_tag" , '')
ind1 = dict_value.get("add_ind1" , '')
ind2 = dict_value.get("add_ind2" , '')
subcode = dict_value.get("add_subcode", '')
value = dict_value.get("add_value" , '')
if tag != '' and subcode != '' and value != '':
record = add_field(recid, uid, record, tag, ind1, ind2, subcode, value)
body += bibedit_templates.tmpl_table_header(ln, "record", recid, temp, format_tag, add=add)
keys = record.keys()
keys.sort()
for tag in keys:
fields = record.get(str(tag), "empty")
if fields != "empty":
for field in fields:
if field[0]: # Only display if has subfield(s)
body += bibedit_templates.tmpl_table_value(ln, recid, tag,
field, format_tag, "record", add)
if add == 3:
body += bibedit_templates.tmpl_table_value(ln, recid, '', [], format_tag, "record", add, 1)
body += bibedit_templates.tmpl_table_footer(ln, "record", add)
else:
body = bibedit_templates.tmpl_record_choice_box(ln, 2)
else:
if record_exists(recid) == -1:
body = bibedit_templates.tmpl_record_choice_box(ln, 3)
else:
body = bibedit_templates.tmpl_record_choice_box(ln, 1)
else:
body = bibedit_templates.tmpl_record_choice_box(ln, 0)
return (body, errors, warnings)
def perform_request_edit(ln, recid, uid, tag, num_field, num_subfield,
format_tag, temp, del_subfield, add, dict_value):
"""Returns the body of edit page. """
errors = []
warnings = []
body = ''
if record_exists(recid) in (-1, 0):
body = bibedit_templates.tmpl_record_choice_box(ln, 0)
return (body, errors, warnings)
(record, junk) = get_record(ln, recid, uid, temp)
if del_subfield is not None:
record = delete_subfield(recid, uid, record, tag, num_field, num_subfield)
if add == 2:
subcode = dict_value.get("add_subcode", "empty")
value = dict_value.get("add_value" , "empty")
if subcode == '':
subcode = "empty"
if value == '':
value = "empty"
if value != "empty" and subcode != "empty":
record = add_subfield(recid, uid, tag, record, num_field, subcode, value)
body += bibedit_templates.tmpl_table_header(ln, "edit", recid, temp=temp,
tag=tag, num_field=num_field, add=add)
tag = tag[:3]
fields = record.get(str(tag), "empty")
if fields != "empty":
for field in fields:
if field[4] == int(num_field) :
body += bibedit_templates.tmpl_table_value(ln, recid, tag, field, format_tag, "edit", add)
break
body += bibedit_templates.tmpl_table_footer(ln, "edit", add)
return (body, errors, warnings)
def perform_request_submit(ln, recid):
"""Submits record to the database. """
save_xml_record(recid)
errors = []
warnings = []
body = bibedit_templates.tmpl_submit(ln)
return (body, errors, warnings)
def get_file_path(recid):
""" return the file path of record. """
- return "%s/%s_%s" % (tmpdir, CFG_BIBEDIT_TMPFILENAMEPREFIX, str(recid))
+ return "%s/%s_%s" % (CFG_TMPDIR, CFG_BIBEDIT_TMPFILENAMEPREFIX, str(recid))
def save_xml_record(recid):
"""Saves XML record file to database. """
file_path = get_file_path(recid)
file_temp = open("%s.xml" % file_path, 'w')
file_temp.write(record_xml_output(get_temp_record("%s.tmp" % file_path)[1]))
file_temp.close()
- os.system("%s/bibupload -u bibedit -r %s.xml" % (bindir, file_path))
+ os.system("%s/bibupload -u bibedit -r %s.xml" % (CFG_BINDIR, file_path))
os.system("rm %s.tmp" % file_path)
def save_temp_record(record, uid, file_path):
""" Save record dict in temp file. """
file_temp = open(file_path, "w")
cPickle.dump([uid, record], file_temp)
file_temp.close()
def get_temp_record(file_path):
"""Loads record dict from a temp file. """
file_temp = open(file_path)
[uid, record] = cPickle.load(file_temp)
file_temp.close()
return (uid, record)
def get_record(ln, recid, uid, temp):
"""Returns a record dict, and warning message in case of error. """
file_path = get_file_path(recid)
if temp != "false":
warning_temp_file = bibedit_templates.tmpl_warning_temp_file(ln)
else:
warning_temp_file = ''
if os.path.isfile("%s.tmp" % file_path):
(uid_record_temp, record) = get_temp_record("%s.tmp" % file_path)
if uid_record_temp != uid:
time_tmp_file = os.path.getmtime("%s.tmp" % file_path)
time_out_file = int(time.time()) - CFG_BIBEDIT_TIMEOUT
if time_tmp_file < time_out_file :
os.system("rm %s.tmp" % file_path)
record = create_record(print_record(recid, 'xm'))[0]
save_temp_record(record, uid, "%s.tmp" % file_path)
else:
record = ''
else:
record = create_record(print_record(recid, 'xm'))[0]
save_temp_record(record, uid, "%s.tmp" % file_path)
return (record, warning_temp_file)
######### EDIT #########
def edit_record(recid, uid, record, edit_tag, dict_value, num_field):
"""Edits value of a record. """
for num_subfield in range( len(dict_value.keys())/3 ): # Iterate over subfield indices of field
new_subcode = dict_value.get("subcode%s" % num_subfield, None)
old_subcode = dict_value.get("old_subcode%s" % num_subfield, None)
new_value = dict_value.get("value%s" % num_subfield, None)
old_value = dict_value.get("old_value%s" % num_subfield, None)
if new_value is not None and old_value is not None \
and new_subcode is not None and old_subcode is not None: # Make sure we actually get these values
if new_value != '' and new_subcode != '': # Forbid empty values
if new_value != old_value or \
new_subcode != old_subcode: # only change when necessary
edit_tag = edit_tag[:5]
record = edit_subfield(record,
edit_tag,
new_subcode,
new_value,
num_field,
num_subfield)
save_temp_record(record, uid, "%s.tmp" % get_file_path(recid))
return record
def edit_subfield(record, tag, new_subcode, new_value, num_field, num_subfield):
"""Edits the value of a subfield. """
new_value = bibedit_templates.tmpl_clean_value(str(new_value), "html")
(tag, ind1, ind2, junk) = marc_to_split_tag(tag)
fields = record.get(str(tag), None)
if fields is not None:
i = -1
for field in fields:
i += 1
if field[4] == int(num_field):
subfields = field[0]
j = -1
for subfield in subfields:
j += 1
if j == num_subfield: # Rely on counted index to identify subfield to edit...
record[tag][i][0][j] = (new_subcode, new_value)
break
break
return record
######### ADD ########
def add_field(recid, uid, record, tag, ind1, ind2, subcode, value_subfield):
"""Adds a new field to the record. """
tag = tag[:3]
new_field_number = record_add_field(record, tag, ind1, ind2)
record = add_subfield(recid, uid, tag, record, new_field_number, subcode, value_subfield)
save_temp_record(record, uid, "%s.tmp" % get_file_path(recid))
return record
def add_subfield(recid, uid, tag, record, num_field, subcode, value):
"""Adds a new subfield to a field. """
tag = tag[:3]
fields = record.get(str(tag))
i = -1
for field in fields:
i += 1
if field[4] == int(num_field) :
subfields = field[0]
same_subfield = False
for subfield in subfields:
if subfield[0] == subcode:
same_subfield = True
if not same_subfield:
field_add_subfield(record[tag][i], subcode, value)
break
save_temp_record(record, uid, "%s.tmp" % get_file_path(recid))
return record
######### DELETE ########
def delete_field(recid, uid, record, tag, num_field):
"""Deletes field in record. """
(tag, junk, junk, junk) = marc_to_split_tag(tag)
tmp = []
for field in record[tag]:
if field[4] != int(num_field) :
tmp.append(field)
if tmp != []:
record[tag] = tmp
else:
del record[tag]
save_temp_record(record, uid, "%s.tmp" % get_file_path(recid))
return record
def delete_subfield(recid, uid, record, tag, num_field, num_subfield):
"""Deletes subfield of a field. """
(tag, junk, junk, subcode) = marc_to_split_tag(tag)
tmp = []
i = -1
for field in record[tag]:
i += 1
if field[4] == int(num_field):
j = 0
for subfield in field[0]:
if j != num_subfield:
tmp.append((subfield[0], subfield[1]))
j += 1
break
record[tag][i] = (tmp, record[tag][i][1], record[tag][i][2], record[tag][i][3], record[tag][i][4])
save_temp_record(record, uid, "%s.tmp" % get_file_path(recid))
return record
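The save_temp_record()/get_temp_record() pair above keeps edits-in-progress alive between requests by pickling `[uid, record]` to a temp file. A minimal, self-contained sketch of that round trip (the original code uses Python 2's cPickle; plain `pickle` is the modern equivalent, and the file name here is illustrative):

```python
import os
import pickle
import tempfile

def save_temp_record(record, uid, file_path):
    """Persist [uid, record] so a later request can resume the edit."""
    with open(file_path, "wb") as fh:
        pickle.dump([uid, record], fh)

def get_temp_record(file_path):
    """Load (uid, record) back from the temp file."""
    with open(file_path, "rb") as fh:
        uid, record = pickle.load(fh)
    return uid, record

path = os.path.join(tempfile.gettempdir(), "bibedit_demo.tmp")
record_in = {"100": [([("a", "Doe, John")], " ", " ", "", 1)]}
save_temp_record(record_in, 42, path)
uid, record_out = get_temp_record(path)
os.remove(path)
print(uid, record_out == record_in)  # 42 True
```

Binary mode (`"wb"`/`"rb"`) matters here: pickle output is not text, so the text-mode `open(file_path, "w")` in the Python 2 original would corrupt the stream on Python 3.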
diff --git a/modules/bibedit/lib/bibrecord_config.py b/modules/bibedit/lib/bibrecord_config.py
index 0c0d15dc4..5b2957f25 100644
--- a/modules/bibedit/lib/bibrecord_config.py
+++ b/modules/bibedit/lib/bibrecord_config.py
@@ -1,55 +1,55 @@
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
### CONFIGURATION OPTIONS FOR BIBRECORD LIBRARY
"""bibrecord configuration"""
__revision__ = "$Id$"
-from invenio.config import etcdir
+from invenio.config import CFG_ETCDIR
# location of the MARC21 DTD file:
-CFG_MARC21_DTD = "%s/bibedit/MARC21slim.dtd" % etcdir
+CFG_MARC21_DTD = "%s/bibedit/MARC21slim.dtd" % CFG_ETCDIR
# pylint: disable-msg=C0301
# internal dictionary of warning messages:
CFG_BIBRECORD_WARNING_MSGS = {
0: '' ,
1: 'WARNING: tag missing for field(s)\nValue stored with tag \'000\'',
2: 'WARNING: bad range for tags (tag must be in range 001-999)\nValue stored with tag \'000\'',
3: 'WARNING: Missing attribute \'code\' for subfield\nValue stored with code \'\'',
4: 'WARNING: Missing attribute \'ind1\'\n Value stored with ind1 = \'\'',
5: 'WARNING: Missing attribute \'ind2\'\n Value stored with ind2 = \'\'',
6: 'Import Error\n',
7: 'WARNING: value expected of type string.',
8: 'WARNING: empty datafield',
98:'WARNING: problems importing invenio',
99: 'Document not well formed'
- }
+ }
# verbose level to be used when creating records from XML: (0=least, ..., 9=most)
CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL = 0
# correction level to be used when creating records from XML: (0=no, 1=yes)
CFG_BIBRECORD_DEFAULT_CORRECT = 0
# XML parsers available: (0=minidom, 1=4suite, 2=PyRXP)
-CFG_BIBRECORD_PARSERS_AVAILABLE = [0, 1, 2]
+CFG_BIBRECORD_PARSERS_AVAILABLE = [0, 1, 2]
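An integer-keyed table like CFG_BIBRECORD_WARNING_MSGS above lets parsing code emit warnings by code and attach per-record details. A hypothetical sketch of that consumption pattern (the names and the two-argument signature are illustrative, not the real `bibrecord.warning()` API, which takes a tuple):

```python
# Illustrative subset of an integer-keyed warning table.
WARNING_MSGS = {
    0: '',
    1: "WARNING: tag missing for field(s)\nValue stored with tag '000'",
    99: 'Document not well formed',
}

def warning(code, details=''):
    """Look up a warning message by code and append optional details."""
    msg = WARNING_MSGS.get(code, 'Unknown warning code %s' % code)
    return ("%s %s" % (msg, details)).strip()

print(warning(99))  # Document not well formed
```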
diff --git a/modules/bibedit/lib/bibrecord_tests.py b/modules/bibedit/lib/bibrecord_tests.py
index f01fa9866..f1b8632e1 100644
--- a/modules/bibedit/lib/bibrecord_tests.py
+++ b/modules/bibedit/lib/bibrecord_tests.py
@@ -1,822 +1,822 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import unittest
from string import expandtabs, replace
-from invenio.config import tmpdir, etcdir
+from invenio.config import CFG_TMPDIR, CFG_ETCDIR
from invenio import bibrecord
# pylint: disable-msg=C0301
class BibRecordSanityTest(unittest.TestCase):
""" bibrecord - sanity test (xml -> create records -> xml)"""
def test_for_sanity(self):
""" bibrecord - demo file sanity test (xml -> create records -> xml)"""
- f = open(tmpdir + '/demobibdata.xml', 'r')
+ f = open(CFG_TMPDIR + '/demobibdata.xml', 'r')
xmltext = f.read()
f.close()
# let's try to reproduce the demo XML MARC file by parsing it and printing it back:
recs = map((lambda x:x[0]), bibrecord.create_records(xmltext))
xmltext_reproduced = bibrecord.records_xml_output(recs)
x = xmltext_reproduced
y = xmltext
# 'normalize' the two XML MARC files for the purpose of comparing
x = expandtabs(x)
y = expandtabs(y)
x = x.replace(' ', '')
y = y.replace(' ', '')
- x = x.replace('\n' % etcdir,
+ x = x.replace('\n' % CFG_ETCDIR,
'')
x = x.replace('', "\n")
x = x.replace('', "\n\n")
x = x[1:100]
y = y[1:100]
self.assertEqual(x, y)
class BibRecordSuccessTest(unittest.TestCase):
""" bibrecord - demo file parsing test """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
- f = open(tmpdir + '/demobibdata.xml', 'r')
+ f = open(CFG_TMPDIR + '/demobibdata.xml', 'r')
xmltext = f.read()
f.close()
self.recs = map((lambda x: x[0]), bibrecord.create_records(xmltext))
def test_records_created(self):
""" bibrecord - demo file how many records are created """
self.assertEqual(95, len(self.recs))
def test_tags_created(self):
""" bibrecord - demo file which tags are created """
## check if the tags are correct
# tags = ['020', '037', '041', '080', '088', '100', '245', '246', '250', '260', '270', '300', '340', '490', '500', '502', '520', '590', '595', '650', '653', '690', '700', '710', '856', '909', '980', '999']
tags = [u'003', u'005', '020', '035', '037', '041', '080', '088', '100', '245', '246', '250', '260', '269', '270', '300', '340', '490', '500', '502', '520', '590', '595', '650', '653', '690', '695', '700', '710', '720', '856', '859', '901', '909', '916', '960', '961', '962', '963', '970', '980', '999', 'FFT']
t = []
for rec in self.recs:
t.extend(rec.keys())
t.sort()
#eliminate the elements repeated
tt = []
for x in t:
if not x in tt:
tt.append(x)
self.assertEqual(tags, tt)
def test_fields_created(self):
"""bibrecord - demo file how many fields are created"""
## check if the number of fields for each record is correct
fields = [14, 14, 8, 11, 11, 12, 11, 15, 10, 18, 14, 16, 10, 9, 15, 10, 11, 11, 11, 9, 10, 10, 10, 8, 8, 8, 9, 9, 9, 10, 8, 8, 8, 8, 14, 13, 14, 14, 15, 12, 12, 12, 15, 14, 12, 16, 16, 15, 15, 14, 16, 15, 15, 15, 16, 15, 16, 15, 15, 16, 15, 14, 14, 15, 12, 13, 11, 15, 8, 11, 14, 13, 12, 13, 6, 6, 25, 24, 27, 26, 26, 24, 26, 27, 25, 28, 24, 23, 27, 25, 25, 26, 26, 24, 19]
cr = []
ret = []
for rec in self.recs:
cr.append(len(rec.values()))
ret.append(rec)
self.assertEqual(fields, cr)
class BibRecordBadInputTreatmentTest(unittest.TestCase):
""" bibrecord - testing for bad input treatment """
def test_wrong_attribute(self):
"""bibrecord - bad input subfield \'cde\' instead of \'code\'"""
ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS
xml_error1 = """
33engDoe, JohnOn the foo and bar
"""
(rec, st, e) = bibrecord.create_record(xml_error1, 1, 1)
ee =''
for i in e:
if type(i).__name__ == 'str':
if i.count(ws[3])>0:
ee = i
self.assertEqual(bibrecord.warning((3, '(field number: 4)')), ee)
def test_missing_attribute(self):
""" bibrecord - bad input missing \"tag\" """
ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS
xml_error2 = """
33engDoe, JohnOn the foo and bar
"""
(rec, st, e) = bibrecord.create_record(xml_error2, 1, 1)
ee = ''
for i in e:
if type(i).__name__ == 'str':
if i.count(ws[1])>0:
ee = i
self.assertEqual(bibrecord.warning((1, '(field number(s): [2])')), ee)
def test_empty_datafield(self):
""" bibrecord - bad input no subfield """
ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS
xml_error3 = """
33Doe, JohnOn the foo and bar
"""
(rec, st, e) = bibrecord.create_record(xml_error3, 1, 1)
ee = ''
for i in e:
if type(i).__name__ == 'str':
if i.count(ws[8])>0:
ee = i
self.assertEqual(bibrecord.warning((8, '(field number: 2)')), ee)
def test_missing_tag(self):
"""bibrecord - bad input missing end \"tag\" """
ws = bibrecord.CFG_BIBRECORD_WARNING_MSGS
xml_error4 = """
33engDoe, JohnOn the foo and bar
"""
(rec, st, e) = bibrecord.create_record(xml_error4, 1, 1)
ee = ''
for i in e:
if type(i).__name__ == 'str':
if i.count(ws[99])>0:
ee = i
self.assertEqual(bibrecord.warning((99, '(Tagname : datafield)')), ee)
class BibRecordAccentedUnicodeLettersTest(unittest.TestCase):
""" bibrecord - testing accented UTF-8 letters """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
self.xml_example_record = """33engDöè1, JohnDoe2, J>ohneditorПушкинOn the foo and bar2"""
(self.rec, st, e) = bibrecord.create_record(self.xml_example_record, 1, 1)
def test_accented_unicode_characters(self):
"""bibrecord - accented Unicode letters"""
self.assertEqual(self.xml_example_record,
bibrecord.record_xml_output(self.rec))
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", " ", " "),
[([('a', 'Döè1, John')], " ", " ", "", 3), ([('a', 'Doe2, J>ohn'), ('b', 'editor')], " ", " ", "", 4)])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "245", " ", "1"),
[([('a', 'Пушкин')], " ", '1', "", 5)])
class BibRecordGettingFieldValuesTest(unittest.TestCase):
""" bibrecord - testing for getting field/subfield values """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
xml_example_record = """
33engDoe1, JohnDoe2, JohneditorOn the foo and bar1On the foo and bar2
"""
(self.rec, st, e) = bibrecord.create_record(xml_example_record, 1, 1)
def test_get_field_instances(self):
"""bibrecord - getting field instances"""
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", " ", " "),
[([('a', 'Doe1, John')], " ", " ", "", 3), ([('a', 'Doe2, John'), ('b', 'editor')], " ", " ", "", 4)])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "", " ", " "),
[('245', [([('a', 'On the foo and bar1')], " ", '1', "", 5), ([('a', 'On the foo and bar2')], " ", '2', "", 6)]), ('001', [([], " ", " ", '33', 1)]), ('100', [([('a', 'Doe1, John')], " ", " ", "", 3), ([('a', 'Doe2, John'), ('b', 'editor')], " ", " ", "", 4)]), ('041', [([('a', 'eng')], " ", " ", "", 2)])])
def test_get_field_values(self):
"""bibrecord - getting field values"""
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "a"),
['Doe1, John', 'Doe2, John'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"),
['editor'])
def test_get_field_value(self):
"""bibrecord - getting first field value"""
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "a"),
'Doe1, John')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "b"),
'editor')
def test_get_subfield_values(self):
"""bibrecord - getting subfield values"""
fi1, fi2 = bibrecord.record_get_field_instances(self.rec, "100", " ", " ")
self.assertEqual(bibrecord.field_get_subfield_values(fi1, "b"), [])
self.assertEqual(bibrecord.field_get_subfield_values(fi2, "b"), ["editor"])
class BibRecordGettingFieldValuesViaWildcardsTest(unittest.TestCase):
""" bibrecord - testing for getting field/subfield values via wildcards """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
xml_example_record = """
1val1val2val3val4aval4bval5val6val7aval7b
"""
(self.rec, st, e) = bibrecord.create_record(xml_example_record, 1, 1)
def test_get_field_instances_via_wildcard(self):
"""bibrecord - getting field instances via wildcards"""
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", " ", " "),
[])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", "%", " "),
[])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", "%", "%"),
[([('a', 'val1')], 'C', '5', "", 2)])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "55%", "A", "%"),
[([('a', 'val2')], 'A', 'B', "", 3),
([('a', 'val3')], 'A', " ", "", 4),
([('a', 'val6')], 'A', 'C', "", 7),
([('a', 'val7a'), ('b', 'val7b')], 'A', " ", "", 8)])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "55%", "A", " "),
[([('a', 'val3')], 'A', " ", "", 4),
([('a', 'val7a'), ('b', 'val7b')], 'A', " ", "", 8)])
self.assertEqual(bibrecord.record_get_field_instances(self.rec, "556", "A", " "),
[([('a', 'val7a'), ('b', 'val7b')], 'A', " ", "", 8)])
def test_get_field_values_via_wildcard(self):
"""bibrecord - getting field values via wildcards"""
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", " "),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", " ", " "),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", "%", " "),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", " "),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", "z"),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "%"),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "a"),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", " ", "a"),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", "a"),
['val1'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "%", "%", "%"),
['val1'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", "%", "a"),
['val2', 'val3', 'val6', 'val7a'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", " ", "a"),
['val3', 'val7a'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "556", "A", " ", "a"),
['val7a'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "555", " ", " ", " "),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "555", " ", " ", "z"),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "555", " ", " ", "%"),
['val4a', 'val4b'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", " ", " ", "b"),
['val4b'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "%", "%", "b"),
['val4b', 'val7b'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", " ", "b"),
['val7b'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", "%", "b"),
['val7b'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", " ", "a"),
['val3', 'val7a'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "A", "%", "a"),
['val2', 'val3', 'val6', 'val7a'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", "%", "%", "a"),
['val2', 'val3', 'val4a', 'val5', 'val6', 'val7a'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "55%", " ", " ", "a"),
['val4a'])
def test_get_field_value_via_wildcard(self):
"""bibrecord - getting first field value via wildcards"""
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", " "),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", " ", " "),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", "%", " "),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", "%", " "),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "%"),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", " ", " ", "a"),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", " ", "a"),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", "%", "a"),
'val1')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "100", "%", "%", "%"),
'val1')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", "%", "a"),
'val2')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", " ", "a"),
'val3')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "556", "A", " ", "a"),
'val7a')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "555", " ", " ", " "),
'')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "555", " ", " ", "%"),
'val4a')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", " ", " ", "b"),
'val4b')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "%", "%", "b"),
'val4b')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", " ", "b"),
'val7b')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", "%", "b"),
'val7b')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", " ", "a"),
'val3')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "A", "%", "a"),
'val2')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", "%", "%", "a"),
'val2')
self.assertEqual(bibrecord.record_get_field_value(self.rec, "55%", " ", " ", "a"),
'val4a')
class BibRecordAddFieldTest(unittest.TestCase):
""" bibrecord - testing adding field """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
xml_example_record = """
33engDoe1, JohnDoe2, JohneditorOn the foo and bar1On the foo and bar2
"""
(self.rec, st, e) = bibrecord.create_record(xml_example_record, 1, 1)
def test_add_controlfield(self):
"""bibrecord - adding controlfield"""
field_number_1 = bibrecord.record_add_field(self.rec, "003", " ", " ", "SzGeCERN")
field_number_2 = bibrecord.record_add_field(self.rec, "004", " ", " ", "Test")
self.assertEqual(field_number_1, 7)
self.assertEqual(field_number_2, 8)
self.assertEqual(bibrecord.record_get_field_values(self.rec, "003", " ", " ", ""),
['SzGeCERN'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "004", " ", " ", ""),
['Test'])
def test_add_datafield(self):
"""bibrecord - adding datafield"""
field_number_1 = bibrecord.record_add_field(self.rec, "100", " ", " ", "",
[('a', 'Doe3, John')])
field_number_2 = bibrecord.record_add_field(self.rec, "100", " ", " ", "",
[('a', 'Doe4, John'), ('b', 'editor')])
self.assertEqual(field_number_1, 7)
self.assertEqual(field_number_2, 8)
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "a"),
['Doe1, John', 'Doe2, John', 'Doe3, John', 'Doe4, John'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"),
['editor', 'editor'])
def test_add_controlfield_on_desired_position(self):
"""bibrecord - adding controlfield on desired position"""
field_number_1 = bibrecord.record_add_field(self.rec, "005", " ", " ", "Foo", [], 0)
field_number_2 = bibrecord.record_add_field(self.rec, "006", " ", " ", "Bar", [], 0)
self.assertEqual(field_number_1, 0)
self.assertEqual(field_number_2, 7)
def test_add_datafield_on_desired_position(self):
"""bibrecord - adding datafield on desired position"""
field_number_1 = bibrecord.record_add_field(self.rec, "100", " ", " ", " ",
[('a', 'Doe3, John')], 0)
field_number_2 = bibrecord.record_add_field(self.rec, "100", " ", " ", " ",
[('a', 'Doe4, John'), ('b', 'editor')], 0)
self.assertEqual(field_number_1, 0)
self.assertEqual(field_number_2, 7)
class BibRecordDeleteFieldTest(unittest.TestCase):
""" bibrecord - testing field deletion """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
xml_example_record = """
33engDoe1, JohnDoe2, JohneditorOn the foo and bar1On the foo and bar2
"""
(self.rec, st, e) = bibrecord.create_record(xml_example_record, 1, 1)
xml_example_record_empty = """
"""
(self.rec_empty, st, e) = bibrecord.create_record(xml_example_record_empty, 1, 1)
def test_delete_controlfield(self):
"""bibrecord - deleting controlfield"""
bibrecord.record_delete_field(self.rec, "001", " ", " ")
self.assertEqual(bibrecord.record_get_field_values(self.rec, "001", " ", " ", " "),
[])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"),
['editor'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "a"),
['On the foo and bar2'])
def test_delete_datafield(self):
"""bibrecord - deleting datafield"""
bibrecord.record_delete_field(self.rec, "100", " ", " ")
self.assertEqual(bibrecord.record_get_field_values(self.rec, "001", " ", " ", ""),
['33'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", " ", " ", "b"),
[])
bibrecord.record_delete_field(self.rec, "245", " ", " ")
self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "1", "a"),
['On the foo and bar1'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "a"),
['On the foo and bar2'])
bibrecord.record_delete_field(self.rec, "245", " ", "2")
self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "1", "a"),
['On the foo and bar1'])
self.assertEqual(bibrecord.record_get_field_values(self.rec, "245", " ", "2", "a"),
[])
def test_add_delete_add_field_to_empty_record(self):
"""bibrecord - adding, deleting, and adding back a field to an empty record"""
field_number_1 = bibrecord.record_add_field(self.rec_empty, "003", " ", " ", "SzGeCERN")
self.assertEqual(field_number_1, 1)
self.assertEqual(bibrecord.record_get_field_values(self.rec_empty, "003", " ", " ", ""),
['SzGeCERN'])
bibrecord.record_delete_field(self.rec_empty, "003", " ", " ")
self.assertEqual(bibrecord.record_get_field_values(self.rec_empty, "003", " ", " ", ""),
[])
field_number_1 = bibrecord.record_add_field(self.rec_empty, "003", " ", " ", "SzGeCERN2")
self.assertEqual(field_number_1, 1)
self.assertEqual(bibrecord.record_get_field_values(self.rec_empty, "003", " ", " ", ""),
['SzGeCERN2'])
class BibRecordSpecialTagParsingTest(unittest.TestCase):
""" bibrecord - parsing special tags (FMT, FFT)"""
def setUp(self):
# pylint: disable-msg=C0103
"""setting up example records"""
self.xml_example_record_with_fmt = """
33engHBLet us see if this gets inserted well.
"""
self.xml_example_record_with_fft = """
33engfile:///foo.pdfhttp://bar.com/baz.ps.gz
"""
self.xml_example_record_with_xyz = """
33engHBLet us see if this gets inserted well.
"""
def test_parsing_file_containing_fmt_special_tag_with_correcting(self):
"""bibrecord - parsing special FMT tag, correcting on"""
rec, st, e = bibrecord.create_record(self.xml_example_record_with_fmt, 1, 1)
self.assertEqual(rec,
{u'001': [([], " ", " ", '33', 1)],
'FMT': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)],
'041': [([('a', 'eng')], " ", " ", "", 2)]})
self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"),
['eng'])
self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "f"),
['HB'])
self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "g"),
['Let us see if this gets inserted well.'])
def test_parsing_file_containing_fmt_special_tag_without_correcting(self):
"""bibrecord - parsing special FMT tag, correcting off"""
rec, st, e = bibrecord.create_record(self.xml_example_record_with_fmt, 1, 0)
self.assertEqual(rec,
{u'001': [([], " ", " ", '33', 1)],
'FMT': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)],
'041': [([('a', 'eng')], " ", " ", "", 2)]})
self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"),
['eng'])
self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "f"),
['HB'])
self.assertEqual(bibrecord.record_get_field_values(rec, "FMT", " ", " ", "g"),
['Let us see if this gets inserted well.'])
def test_parsing_file_containing_fft_special_tag_with_correcting(self):
"""bibrecord - parsing special FFT tag, correcting on"""
rec, st, e = bibrecord.create_record(self.xml_example_record_with_fft, 1, 1)
self.assertEqual(rec,
{u'001': [([], " ", " ", '33', 1)],
'FFT': [([('a', 'file:///foo.pdf'), ('a', 'http://bar.com/baz.ps.gz')], " ", " ", "", 3)],
'041': [([('a', 'eng')], " ", " ", "", 2)]})
self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"),
['eng'])
self.assertEqual(bibrecord.record_get_field_values(rec, "FFT", " ", " ", "a"),
['file:///foo.pdf', 'http://bar.com/baz.ps.gz'])
def test_parsing_file_containing_fft_special_tag_without_correcting(self):
"""bibrecord - parsing special FFT tag, correcting off"""
rec, st, e = bibrecord.create_record(self.xml_example_record_with_fft, 1, 0)
self.assertEqual(rec,
{u'001': [([], " ", " ", '33', 1)],
'FFT': [([('a', 'file:///foo.pdf'), ('a', 'http://bar.com/baz.ps.gz')], " ", " ", "", 3)],
'041': [([('a', 'eng')], " ", " ", "", 2)]})
self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"),
['eng'])
self.assertEqual(bibrecord.record_get_field_values(rec, "FFT", " ", " ", "a"),
['file:///foo.pdf', 'http://bar.com/baz.ps.gz'])
def test_parsing_file_containing_xyz_special_tag_with_correcting(self):
"""bibrecord - parsing unrecognized special XYZ tag, correcting on"""
# XYZ should not get accepted when correcting is on; should get changed to 000
rec, st, e = bibrecord.create_record(self.xml_example_record_with_xyz, 1, 1)
self.assertEqual(rec,
{u'001': [([], " ", " ", '33', 1)],
'000': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)],
'041': [([('a', 'eng')], " ", " ", "", 2)]})
self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"),
['eng'])
self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "f"),
[])
self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "g"),
[])
self.assertEqual(bibrecord.record_get_field_values(rec, "000", " ", " ", "f"),
['HB'])
self.assertEqual(bibrecord.record_get_field_values(rec, "000", " ", " ", "g"),
['Let us see if this gets inserted well.'])
def test_parsing_file_containing_xyz_special_tag_without_correcting(self):
"""bibrecord - parsing unrecognized special XYZ tag, correcting off"""
# XYZ should get accepted without correcting
rec, st, e = bibrecord.create_record(self.xml_example_record_with_xyz, 1, 0)
self.assertEqual(rec,
{u'001': [([], " ", " ", '33', 1)],
'XYZ': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')], " ", " ", "", 3)],
'041': [([('a', 'eng')], " ", " ", "", 2)]})
self.assertEqual(bibrecord.record_get_field_values(rec, "041", " ", " ", "a"),
['eng'])
self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "f"),
['HB'])
self.assertEqual(bibrecord.record_get_field_values(rec, "XYZ", " ", " ", "g"),
['Let us see if this gets inserted well.'])
class BibRecordPrintingTest(unittest.TestCase):
""" bibrecord - testing for printing record """
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize stuff"""
self.xml_example_record = """
81TEST-ARTICLE-2006-001ARTICLE-2006-001Test ti"""
self.xml_example_record_short = """
81TEST-ARTICLE-2006-001ARTICLE-2006-001"""
self.xml_example_multi_records = """
81TEST-ARTICLE-2006-001ARTICLE-2006-001Test ti82Author, t"""
self.xml_example_multi_records_short = """
81TEST-ARTICLE-2006-001ARTICLE-2006-00182"""
def test_print_rec(self):
"""bibrecord - print rec"""
rec, st, e = bibrecord.create_record(self.xml_example_record, 1, 1)
rec_short, st_short, e_short = bibrecord.create_record(self.xml_example_record_short, 1, 1)
self.assertEqual(bibrecord.create_record(bibrecord.print_rec(rec, tags=[]), 1, 1)[0], rec)
self.assertEqual(bibrecord.create_record(bibrecord.print_rec(rec, tags=["001", "037"]), 1, 1)[0], rec_short)
self.assertEqual(bibrecord.create_record(bibrecord.print_rec(rec, tags=["037"]), 1, 1)[0], rec_short)
def test_print_recs(self):
"""bibrecord - print multiple recs"""
list_of_recs = bibrecord.create_records(self.xml_example_multi_records, 1, 1)
list_of_recs_elems = [elem[0] for elem in list_of_recs]
list_of_recs_short = bibrecord.create_records(self.xml_example_multi_records_short, 1, 1)
list_of_recs_short_elems = [elem[0] for elem in list_of_recs_short]
self.assertEqual(bibrecord.create_records(bibrecord.print_recs(list_of_recs_elems, tags=[]), 1, 1), list_of_recs)
self.assertEqual(bibrecord.create_records(bibrecord.print_recs(list_of_recs_elems, tags=["001", "037"]), 1, 1), list_of_recs_short)
self.assertEqual(bibrecord.create_records(bibrecord.print_recs(list_of_recs_elems, tags=["037"]), 1, 1), list_of_recs_short)
def create_test_suite():
"""Return test suite for the bibrecord module"""
return unittest.TestSuite((unittest.makeSuite(BibRecordSanityTest, 'test'),
unittest.makeSuite(BibRecordSuccessTest, 'test'),
unittest.makeSuite(BibRecordBadInputTreatmentTest, 'test'),
unittest.makeSuite(BibRecordGettingFieldValuesTest, 'test'),
unittest.makeSuite(BibRecordGettingFieldValuesViaWildcardsTest, 'test'),
unittest.makeSuite(BibRecordAddFieldTest, 'test'),
unittest.makeSuite(BibRecordDeleteFieldTest, 'test'),
unittest.makeSuite(BibRecordAccentedUnicodeLettersTest, 'test'),
unittest.makeSuite(BibRecordSpecialTagParsingTest, 'test'),
unittest.makeSuite(BibRecordPrintingTest, 'test'),
))
if __name__ == '__main__':
unittest.TextTestRunner(verbosity=2).run(create_test_suite())
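The assertions above all walk the same in-memory record shape. A minimal standalone sketch of that shape and lookup (a hypothetical helper, not the bibrecord API, assuming field instances of the form (subfields, ind1, ind2, controlfield_value, position)):

```python
def get_field_values(rec, tag, ind1, ind2, code):
    """Collect every value of subfield `code` from fields `tag` with
    matching indicators, mirroring record_get_field_values() semantics."""
    values = []
    for subfields, i1, i2, _value, _pos in rec.get(tag, []):
        if i1 == ind1 and i2 == ind2:
            values.extend(v for c, v in subfields if c == code)
    return values

# sample record shaped like the structures asserted in the tests above
rec = {'XYZ': [([('f', 'HB'), ('g', 'Let us see if this gets inserted well.')],
               ' ', ' ', '', 3)]}
```

Querying an absent tag simply yields an empty list, matching the empty-list assertions in the tests.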
diff --git a/modules/bibedit/lib/refextract_config.py b/modules/bibedit/lib/refextract_config.py
index ebc1b1025..7f3b858dc 100644
--- a/modules/bibedit/lib/refextract_config.py
+++ b/modules/bibedit/lib/refextract_config.py
@@ -1,75 +1,75 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""RefExtract configuration."""
__revision__ = "$Id$"
-from invenio.config import version, etcdir, cdsname
+from invenio.config import CFG_VERSION, CFG_ETCDIR, cdsname
# pylint: disable-msg=C0301
# version number:
-CFG_REFEXTRACT_VERSION = "CDS Invenio/%s refextract/%s" % (version, version)
+CFG_REFEXTRACT_VERSION = "CDS Invenio/%s refextract/%s" % (CFG_VERSION, CFG_VERSION)
# periodicals knowledge base:
-CFG_REFEXTRACT_KB_JOURNAL_TITLES = "%s/bibedit/refextract-journal-titles.kb" % etcdir
+CFG_REFEXTRACT_KB_JOURNAL_TITLES = "%s/bibedit/refextract-journal-titles.kb" % CFG_ETCDIR
# report numbers knowledge base:
-CFG_REFEXTRACT_KB_REPORT_NUMBERS = "%s/bibedit/refextract-report-numbers.kb" % etcdir
+CFG_REFEXTRACT_KB_REPORT_NUMBERS = "%s/bibedit/refextract-report-numbers.kb" % CFG_ETCDIR
## MARC Fields and subfields used by refextract:
## reference fields:
CFG_REFEXTRACT_CTRL_FIELD_RECID = "001" ## control-field recid
CFG_REFEXTRACT_TAG_ID_REFERENCE = "999" ## ref field tag
CFG_REFEXTRACT_IND1_REFERENCE = "C" ## ref field ind1
CFG_REFEXTRACT_IND2_REFERENCE = "5" ## ref field ind2
CFG_REFEXTRACT_SUBFIELD_MARKER = "o" ## ref marker subfield
CFG_REFEXTRACT_SUBFIELD_MISC = "m" ## ref misc subfield
CFG_REFEXTRACT_SUBFIELD_REPORT_NUM = "r" ## ref reportnum subfield
CFG_REFEXTRACT_SUBFIELD_TITLE = "s" ## ref title subfield
CFG_REFEXTRACT_SUBFIELD_URL = "u" ## ref url subfield
CFG_REFEXTRACT_SUBFIELD_URL_DESCR = "z" ## ref url-text subfield
## refextract statistics fields:
CFG_REFEXTRACT_TAG_ID_EXTRACTION_STATS = "999" ## ref-stats tag
CFG_REFEXTRACT_IND1_EXTRACTION_STATS = "C" ## ref-stats ind1
CFG_REFEXTRACT_IND2_EXTRACTION_STATS = "6" ## ref-stats ind2
CFG_REFEXTRACT_SUBFIELD_EXTRACTION_STATS = "a" ## ref-stats subfield
## Internal tags are used by refextract to mark-up recognised citation
## information. These are the "closing tags":
CFG_REFEXTRACT_MARKER_CLOSING_REPORT_NUM = r"</cds.REPORTNUMBER>"
CFG_REFEXTRACT_MARKER_CLOSING_TITLE = r"</cds.TITLE>"
CFG_REFEXTRACT_MARKER_CLOSING_SERIES = r"</cds.SER>"
CFG_REFEXTRACT_MARKER_CLOSING_VOLUME = r"</cds.VOL>"
CFG_REFEXTRACT_MARKER_CLOSING_YEAR = r"</cds.YR>"
CFG_REFEXTRACT_MARKER_CLOSING_PAGE = r"</cds.PG>"
## XML Record and collection opening/closing tags:
CFG_REFEXTRACT_XML_VERSION = u"""<?xml version="1.0" encoding="UTF-8"?>"""
CFG_REFEXTRACT_XML_COLLECTION_OPEN = u"""<collection xmlns="http://www.loc.gov/MARC21/slim">"""
CFG_REFEXTRACT_XML_COLLECTION_CLOSE = u"""</collection>\n"""
CFG_REFEXTRACT_XML_RECORD_OPEN = u"<record>"
CFG_REFEXTRACT_XML_RECORD_CLOSE = u"</record>"
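Taken together, the field constants above describe where refextract writes an extracted reference: datafield 999 with indicators C5, one subfield per recognised piece. A small standalone sketch of how they combine (hypothetical helper name, not refextract code):

```python
# MARC address of an extracted reference line, per the constants above
TAG, IND1, IND2 = "999", "C", "5"
SUBFIELD_MARKER, SUBFIELD_REPORT_NUM = "o", "r"

def reference_field(marker, report_num):
    """Return a (full tag, subfields) pair for one extracted reference."""
    subfields = [(SUBFIELD_MARKER, marker), (SUBFIELD_REPORT_NUM, report_num)]
    return (TAG + IND1 + IND2, subfields)

field = reference_field("[1]", "hep-th/0101001")
```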
diff --git a/modules/bibformat/lib/bibformat_bfx_engine.py b/modules/bibformat/lib/bibformat_bfx_engine.py
index aa17be5d3..de176a104 100644
--- a/modules/bibformat/lib/bibformat_bfx_engine.py
+++ b/modules/bibformat/lib/bibformat_bfx_engine.py
@@ -1,1260 +1,1260 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
-BFX formatting engine.
+BFX formatting engine.
For API: see format_with_bfx() docstring below.
"""
__revision__ = "$Id$"
import re
import copy as p_copy
from xml.dom import minidom, Node
from xml.sax import saxutils
from invenio.bibformat_engine import BibFormatObject, get_format_element, eval_format_element
-from invenio.bibformat_bfx_engine_config import CFG_BIBFORMAT_BFX_LABEL_DEFINITIONS, CFG_BIBFORMAT_BFX_TEMPLATES_PATH
+from invenio.bibformat_bfx_engine_config import CFG_BIBFORMAT_BFX_LABEL_DEFINITIONS, CFG_BIBFORMAT_BFX_TEMPLATES_PATH
from invenio.bibformat_bfx_engine_config import CFG_BIBFORMAT_BFX_FORMAT_TEMPLATE_EXTENSION, CFG_BIBFORMAT_BFX_ELEMENT_NAMESPACE
from invenio.bibformat_bfx_engine_config import CFG_BIBFORMAT_BFX_ERROR_MESSAGES, CFG_BIBFORMAT_BFX_WARNING_MESSAGES
address_pattern = r'(?P<parent>[a-z_]*):?/?(?P<tag>[0-9_?\w]*)/?(?P<code>[\w_?]?)#?(?P<reg>.*)'
def format_with_bfx(recIDs, out_file, template_name, preprocess=None):
'''
Format a set of records according to a BFX template.
This is the main entry point to the BFX engine.
-
+
@param recIDs a list of record IDs to format
@param out_file an object to write in; this can be every object which has a 'write' method: file, req, StringIO
@param template_name the file name of the BFX template without the path and the .bfx extension
@param preprocess an optional function; every record is passed through this function for initial preprocessing before formatting
'''
trans = MARCTranslator(CFG_BIBFORMAT_BFX_LABEL_DEFINITIONS)
trans.set_record_ids(recIDs, preprocess)
parser = BFXParser(trans)
template_tree = parser.load_template(template_name)
parser.walk(template_tree, out_file)
return None
class BFXParser:
'''
A general-purpose parser for generating xml/xhtml/text output based on a template system.
Must be initialised with a translator. A translator is like a black box that returns values, calls functions, etc...
Works with every translator supporting the following simple interface:
- is_defined(name)
- get_value(name)
- iterator(name)
- call_function(func_name, list_of_parameters)
Customized for MARC to XML conversion through the use of a MARCTranslator.
- Templates are strict XML files. They are built by combining any tags with the
+ Templates are strict XML files. They are built by combining any tags with the
special BFX tags living in the http://cdsware.cern.ch/invenio/ namespace.
Easily extensible by tags of your own.
Defined tags:
- template: defines a template
- template_ref: a reference to a template
- loop structure
- if, then, elif, else structure
- text: output text
- field: query translator for field 'name'
- element: call external functions
'''
def __init__(self, translator):
'''
Create an instance of the BFXParser class. Initialize with a translator.
The BFXParser makes queries to the translator for the values of certain names.
For the communication it uses the following translator methods:
- is_defined(name)
- iterator(name)
- get_value(name, [display_specifier])
@param translator the translator used by the class instance
'''
self.translator = translator
self.known_operators = ['style', 'format', 'template', 'template_ref', 'text', 'field', 'element', 'loop', 'if', 'then', 'else', 'elif']
self.flags = {} # store flags here;
self.templates = {} # store templates and formats here
- self.start_template_name = None #the name of the template from which the 'execution' starts;
+ self.start_template_name = None #the name of the template from which the 'execution' starts;
#this is usually a format or the only template found in a doc
def load_template(self, template_name, template_source=None):
'''
Load a BFX template file.
A template file can have one of two forms:
- it is a file with a single template. Root tag is 'template'.
In an API call the single template element is 'executed'.
- it is a 'style' file which contains exactly one format and zero or more templates. Root tag is 'style' with children 'format' and 'template'(s).
In this case only the format code is 'executed'; naturally, it may contain references to other templates in the document.
Template can be given by name (in that case search path is in
standard directory for bfx template) or directly using the template source.
If given, template_source overrides template_name
-
+
@param template_name the name of the BFX template, the same as the name of the filename without the extension
@return a DOM tree of the template
'''
if template_source is None:
template_file_name = CFG_BIBFORMAT_BFX_TEMPLATES_PATH + '/' + template_name + '.' + CFG_BIBFORMAT_BFX_FORMAT_TEMPLATE_EXTENSION
#load document
doc = minidom.parse(template_file_name)
else:
doc = minidom.parseString(template_source)
#set exec flag to false and walk document to find templates and formats
self.flags['exec'] = False
self.walk(doc)
#check found templates
if self.start_template_name:
start_template = self.templates[self.start_template_name]['node']
else:
#print CFG_BIBFORMAT_BFX_WARNING_MESSAGES['WRN_BFX_NO_FORMAT_FOUND']
if len(self.templates) == 1:
# no format found, check if there is a default template
self.start_template_name = self.templates.keys()[0]
start_template = self.templates[self.start_template_name]['node']
else:
#no formats found, templates either zero or more than one
if len(self.templates) > 1:
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_TOO_MANY_TEMPLATES']
#else:
# print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_NO_TEMPLATES_FOUND']
return None
self.flags['exec'] = True
return start_template
def parse_attribute(self, expression):
'''
A function to check if an expression is of the special form [!name:display].
A short form for a field reference, used in element attributes.
@param expression a string, usually taken from an attribute value
@return if the string is special, parse it and return the corresponding value; else return the initial expression
'''
output = expression
pattern = '\[!(?P<tmp>[\w_.:]*)\]'
expr = re.compile(pattern)
match = expr.match(expression)
if match:
tmp = match.group('tmp')
tmp = tmp.split(':')
var = tmp[0]
display = ''
if len(tmp) == 2:
display = tmp[1]
output = self.translator.get_value(var, display)
output = xml_escape(output)
return output
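The `[!name:display]` shorthand handled by parse_attribute() can be shown in isolation (a standalone sketch, not the method itself, with the translator lookup and XML escaping left out):

```python
import re

# same pattern as parse_attribute(): [!name] or [!name:display]
SHORTHAND = re.compile(r'\[!(?P<tmp>[\w_.:]*)\]')

def split_shorthand(expression):
    """Return (var, display) for a [!name:display] attribute value,
    or None when the expression is a plain value."""
    match = SHORTHAND.match(expression)
    if not match:
        return None
    parts = match.group('tmp').split(':')
    display = parts[1] if len(parts) == 2 else ''
    return parts[0], display
```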
-
+
def walk(self, parent, out_file=None):
'''
Walk a template DOM tree.
The main function in the parser. It is recursively called until all the nodes are processed.
This function is used in two different ways:
- for initial loading of the template (and validation)
- for 'execution' of a format/template
The different behaviour is achieved through the use of flags, which can be set to True or False.
@param parent a node to process; in an API call this is the root node
@param out_file an object to write to; must have a 'write' method
-
+
@return None
'''
for node in parent.childNodes:
if node.nodeType == Node.TEXT_NODE:
value = get_node_value(node)
value = value.strip()
if out_file:
out_file.write(value)
if node.nodeType == Node.ELEMENT_NODE:
#get values
name, attributes, element_namespace = get_node_name(node), get_node_attributes(node), get_node_namespace(node)
# write values
if element_namespace != CFG_BIBFORMAT_BFX_ELEMENT_NAMESPACE:
#parse all the attributes
for key in attributes.keys():
attributes[key] = self.parse_attribute(attributes[key])
if node_has_subelements(node):
if out_file:
out_file.write(create_xml_element(name=name, attrs=attributes, element_type=xmlopen))
self.walk(node, out_file) #walk subnodes
if out_file:
out_file.write(create_xml_element(name=name, element_type=xmlclose))
else:
if out_file:
out_file.write(create_xml_element(name=name, attrs=attributes, element_type=xmlempty))
#name is a special name, must fall in one of the next cases:
elif node.localName == 'style':
self.ctl_style(node, out_file)
elif node.localName == 'format':
self.ctl_format(node, out_file)
elif node.localName == 'template':
self.ctl_template(node, out_file)
elif node.localName == 'template_ref':
self.ctl_template_ref(node, out_file)
elif node.localName == 'element':
self.ctl_element(node, out_file)
elif node.localName == 'field':
self.ctl_field(node, out_file)
elif node.localName == 'text':
self.ctl_text(node, out_file)
elif node.localName == 'loop':
self.ctl_loop(node, out_file)
elif node.localName == 'if':
self.ctl_if(node, out_file)
elif node.localName == 'then':
self.ctl_then(node, out_file)
elif node.localName == 'else':
self.ctl_else(node, out_file)
elif node.localName == 'elif':
self.ctl_elif(node, out_file)
else:
if node.localName in self.known_operators:
print 'Note for programmer: you haven\'t implemented operator %s.' % (name)
else:
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_INVALID_OPERATOR_NAME'] % (name)
return None
def ctl_style(self, node, out_file):
'''
Process a style root node.
'''
#exec mode
if self.flags['exec']:
return None
#test mode
self.walk(node, out_file)
return None
def ctl_format(self, node, out_file):
'''
Process a format node.
Get name, description and content attributes.
This function is called only in test mode.
'''
#exec mode
if self.flags['exec']:
return None
#test mode
attrs = get_node_attributes(node)
#get template name and give control to ctl_template
if attrs.has_key('name'):
name = attrs['name']
if self.templates.has_key(name):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_DUPLICATE_NAME'] % (name)
return None
self.start_template_name = name
self.ctl_template(node, out_file)
else:
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_TEMPLATE_NO_NAME']
return None
return None
-
+
def ctl_template(self, node, out_file):
'''
Process a template node.
Get name, description and content attributes.
Register name and store for later calls from template_ref.
This function is called only in test mode.
'''
#exec mode
if self.flags['exec']:
return None
#test mode
attrs = get_node_attributes(node)
#get template name
if attrs.has_key('name'):
name = attrs['name']
if self.templates.has_key(name):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_DUPLICATE_NAME'] % (name)
return None
self.templates[name] = {}
self.templates[name]['node'] = node
else:
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_TEMPLATE_NO_NAME']
return None
#get template description
if attrs.has_key('description'):
description = attrs['description']
else:
description = ''
print CFG_BIBFORMAT_BFX_WARNING_MESSAGES['WRN_BFX_TEMPLATE_NO_DESCRIPTION']
self.templates[name]['description'] = description
#get content-type of resulting output
if attrs.has_key('content'):
content_type = attrs['content']
else:
content_type = 'text/xml'
print CFG_BIBFORMAT_BFX_WARNING_MESSAGES['WRN_BFX_TEMPLATE_NO_CONTENT']
self.templates[name]['content_type'] = content_type
#walk node
self.walk(node, out_file)
return None
-
+
def ctl_template_ref(self, node, out_file):
'''
Reference to an external template.
This function is called only in execution mode. Bad references appear as run-time errors.
'''
#test mode
if not self.flags['exec']:
return None
#exec mode
attrs = get_node_attributes(node)
- if not attrs.has_key('name'):
+ if not attrs.has_key('name'):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_TEMPLATE_REF_NO_NAME']
return None
name = attrs['name']
#first check for a template in the same file, that is in the already cached templates
if self.templates.has_key(name):
node_to_walk = self.templates[name]['node']
self.walk(node_to_walk, out_file)
else:
#load a file and execute it
pass
- #template_file_name = CFG_BIBFORMAT_BFX_TEMPLATES_PATH + name + '/' + CFG_BIBFORMAT_BFX_FORMAT_TEMPLATE_EXTENSION
+ #template_file_name = CFG_BIBFORMAT_BFX_TEMPLATES_PATH + name + '/' + CFG_BIBFORMAT_BFX_FORMAT_TEMPLATE_EXTENSION
#try:
# node = minidom.parse(template_file_name)
#except:
# print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_TEMPLATE_NOT_FOUND'] % (template_file_name)
return None
-
+
def ctl_element(self, node, out_file):
'''
Call an external element (written in Python).
'''
#test mode
if not self.flags['exec']:
return None
#exec mode
parameters = get_node_attributes(node)
if not parameters.has_key('name'):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_ELEMENT_NO_NAME']
return None
function_name = parameters['name']
del parameters['name']
#now run external bfe_name.py, with param attrs
if function_name:
value = self.translator.call_function(function_name, parameters)
value = xml_escape(value)
out_file.write(value)
return None
-
+
def ctl_field(self, node, out_file):
'''
Get the value of a field by its name.
'''
#test mode
- if not self.flags['exec']:
+ if not self.flags['exec']:
return None
#exec mode
attrs = get_node_attributes(node)
if not attrs.has_key('name'):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_FIELD_NO_NAME']
return None
display = ''
if attrs.has_key('display'):
display = attrs['display']
var = attrs['name']
if not self.translator.is_defined(var):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_NO_SUCH_FIELD'] % (var)
return None
value = self.translator.get_value(var, display)
value = xml_escape(value)
out_file.write(value)
return None
def ctl_text(self, node, out_file):
'''
Output a text
'''
#test mode
- if not self.flags['exec']:
+ if not self.flags['exec']:
return None
#exec mode
attrs = get_node_attributes(node)
if not attrs.has_key('value'):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_TEXT_NO_VALUE']
return None
value = attrs['value']
value = value.replace(r'\n', '\n')
#value = xml_escape(value)
if type(value) == type(u''):
value = value.encode('utf-8')
out_file.write(value)
return None
def ctl_loop(self, node, out_file):
'''
Loop through a set of values.
'''
#test mode
- if not self.flags['exec']:
+ if not self.flags['exec']:
self.walk(node, out_file)
return None
#exec mode
attrs = get_node_attributes(node)
if not attrs.has_key('object'):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_LOOP_NO_OBJECT']
return None
name = attrs['object']
if not self.translator.is_defined(name):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_NO_SUCH_FIELD'] % (name)
return None
for new_object in self.translator.iterator(name):
self.walk(node, out_file)
return None
def ctl_if(self, node, out_file):
'''
An if/then/elif/.../elif/else construct.
'If' can have several forms:
: True if var is non-empty, eval as string
: True if var=value, eval as string
: True if var<value, try to eval as num, else eval as string
: True if var>value, try to eval as num, else eval as string
: True if var<=value, try to eval as num, else eval as string
: True if var>=value, try to eval as num, else eval as string
: True if var in [val1, val2], eval as string
: True if var not in [val1, val2], eval as string
: True if var!=value, eval as string
: Match against a regular expression
-
+
Example:
(an if element testing a field against 'Pauli': the 'then' branch outputs 'Pauli', the 'else' branch outputs 'other')
'''
#test mode
if not self.flags['exec']:
self.walk(node, out_file)
return None
#exec mode
attrs = get_node_attributes(node)
if not attrs.has_key('name'):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_IF_NO_NAME']
return None
- #determine result
+ #determine result
var = attrs['name']
if not self.translator.is_defined(var):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_NO_SUCH_FIELD'] % (var)
- return None
+ return None
value = self.translator.get_value(var)
value = value.strip()
#equal
if attrs.has_key('eq'):
pattern = attrs['eq']
if is_number(pattern) and is_number(value):
result = (float(value)==float(pattern))
else:
result = (value==pattern)
#not equal
elif attrs.has_key('neq'):
pattern = attrs['neq']
if is_number(pattern) and is_number(value):
result = (float(value)!=float(pattern))
else:
- result = (value!=pattern)
+ result = (value!=pattern)
#lower than
elif attrs.has_key('lt'):
pattern = attrs['lt']
if is_number(pattern) and is_number(value):
result = (float(value)<float(pattern))
else:
result = (value<pattern)
#greater than
elif attrs.has_key('gt'):
pattern = attrs['gt']
if is_number(pattern) and is_number(value):
result = (float(value)>float(pattern))
else:
result = (value>pattern)
#lower or equal than
elif attrs.has_key('le'):
pattern = attrs['le']
if is_number(pattern) and is_number(value):
result = (float(value)<=float(pattern))
else:
result = (value<=pattern)
#greater or equal than
elif attrs.has_key('ge'):
pattern = attrs['ge']
if is_number(pattern) and is_number(value):
result = (float(value)>=float(pattern))
else:
result = (value>=pattern)
#in
elif attrs.has_key('in'):
pattern = attrs['in']
values = pattern.split()
result = (value in values)
#not in
elif attrs.has_key('nin'):
pattern = attrs['nin']
values = pattern.split()
result = (value not in values)
#match against a regular expression
elif attrs.has_key('like'):
pattern = attrs['like']
try:
expr = re.compile(pattern)
result = expr.match(value)
except:
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_INVALID_RE'] % (pattern)
#simple form: True if non-empty, otherwise False
else:
result = value
#end of evaluation
#=================
#validate subnodes
then_node = get_node_subelement(node, 'then', CFG_BIBFORMAT_BFX_ELEMENT_NAMESPACE)
else_node = get_node_subelement(node, 'else', CFG_BIBFORMAT_BFX_ELEMENT_NAMESPACE)
elif_node = get_node_subelement(node, 'elif', CFG_BIBFORMAT_BFX_ELEMENT_NAMESPACE)
#having else and elif siblings at the same time is a syntax error
if (else_node is not None) and (elif_node is not None):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_IF_WRONG_SYNTAX']
return None
- #now walk appropriate nodes, according to the result
+ #now walk appropriate nodes, according to the result
if result: #True
if then_node:
self.walk(then_node, out_file)
#todo: add short form, without 'then', just elements within if statement to walk on 'true' and no 'elif' or 'else' elements
else: #False
if elif_node:
self.ctl_if(elif_node, out_file)
elif else_node:
self.walk(else_node, out_file)
return None
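The comparison rule ctl_if applies is the same in every branch: compare numerically when both operands parse as numbers, otherwise fall back to string comparison. A standalone sketch of that rule (is_number is defined elsewhere in the module; the definition below is an assumed equivalent):

```python
import operator

def is_number(s):
    # assumed equivalent of the module's is_number() helper
    try:
        float(s)
        return True
    except ValueError:
        return False

def compare(value, pattern, op):
    """Apply an operator the way ctl_if does: numerically when possible."""
    if is_number(value) and is_number(pattern):
        value, pattern = float(value), float(pattern)
    return op(value, pattern)
```

This is why lt="10" correctly treats '9' as smaller, where a plain string comparison would not.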
def ctl_then(self, node, out_file):
'''
Calling 'then' directly from the walk function means a syntax error.
'''
#test mode
if not self.flags['exec']:
self.walk(node, out_file)
return None
#exec mode
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_IF_WRONG_SYNTAX']
return None
-
+
def ctl_else(self, node, out_file):
'''
Calling 'else' directly from the walk function means a syntax error.
'''
#test mode
if not self.flags['exec']:
self.walk(node, out_file)
return None
#exec mode
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_IF_WRONG_SYNTAX']
return None
-
+
def ctl_elif(self, node, out_file):
'''
Calling 'elif' directly from the walk function means a syntax error.
'''
#test mode
if not self.flags['exec']:
self.walk(node, out_file)
return None
- #exec mode
+ #exec mode
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_IF_WRONG_SYNTAX']
return None
-
-
+
+
class MARCTranslator:
'''
memory[name]
[name]['addresses'] - the set of rules for each of the defined names
[name]['parent'] - the name of the parent; '' if none;
[name]['children'] - a list with the name of the children of every variable
[name]['object'] - stored state of object for performance efficiency
'''
def __init__(self, labels=None):
'''
Create an instance of the translator and init with the list of the defined labels and their rules.
'''
if labels is None:
labels = {}
self.recIDs = []
self.recID = 0
self.recID_index = 0
self.record = None
self.memory = {}
pattern = address_pattern
- expr = re.compile(pattern)
+ expr = re.compile(pattern)
for name in labels.keys():
self.memory[name] = {}
self.memory[name]['object'] = None
self.memory[name]['parent'] = ''
self.memory[name]['children'] = []
self.memory[name]['addresses'] = p_copy.deepcopy(labels[name])
for name in self.memory:
for i in range(len(self.memory[name]['addresses'])):
address = self.memory[name]['addresses'][i]
match = expr.match(address)
if not match:
print 'Invalid address: ', name, address
else:
parent_name = match.group('parent')
if parent_name:
if not self.memory.has_key(parent_name):
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_NO_SUCH_FIELD'] % (parent_name)
else:
self.memory[name]['parent'] = parent_name
#now make parent aware of children
if not name in self.memory[parent_name]['children']:
self.memory[parent_name]['children'].append(name)
level = self.determine_level(parent_name)
self.memory[name]['addresses'][i] = self.memory[name]['addresses'][i].replace(parent_name, '/'*level)
#special case 'record'
self.memory['record'] = {}
self.memory['record']['object'] = None
self.memory['record']['parent'] = ''
self.memory['record']['children'] = []
def set_record_ids(self, recIDs, preprocess=None):
'''
Initialize the translator with the set of record IDs.
@param recIDs a list of the record IDs
@param preprocess an optional function which acts on every record structure after creating it
- This can be used to enrich the record with fields not present in the record initially,
+ This can be used to enrich the record with fields not present in the record initially,
verify the record data, or similar.
Another solution is to use external function elements.
'''
- self.record = None
+ self.record = None
self.recIDs = recIDs
self.preprocess = preprocess
if self.recIDs:
self.recID_index = 0
self.recID = self.recIDs[self.recID_index]
self.record = get_record(self.recID)
if self.preprocess:
self.preprocess(self.record)
- return None
-
+ return None
+
def determine_level(self, name):
'''
Determine the type of the variable, whether this is an instance or a subfield.
This is done by observing the first provided address for the name.
todo: define variable types in config file, remove this function, results in a clearer concept
'''
level = 0 #default value
if self.memory.has_key(name):
expr = re.compile(address_pattern)
if self.memory[name]['addresses']:
match = expr.match(self.memory[name]['addresses'][0])
if match:
- tag = match.group('tag')
+ tag = match.group('tag')
code = match.group('code')
reg = match.group('reg')
if reg:
level = 2 #subfield
elif code:
level = 2 #subfield
elif tag:
level = 1 #instance
return level
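determine_level() relies on which named groups of address_pattern matched. Parsing a typical address in isolation (standalone sketch; 'authors:700/a' is a hypothetical label address):

```python
import re

# same grammar as address_pattern above: parent:tag/code#regexp
ADDRESS = re.compile(
    r'(?P<parent>[a-z_]*):?/?(?P<tag>[0-9_?\w]*)/?(?P<code>[\w_?]?)#?(?P<reg>.*)')

def parse_address(address):
    """Split an address into (parent, tag, code, reg) as the translator does."""
    m = ADDRESS.match(address)
    return m.group('parent'), m.group('tag'), m.group('code'), m.group('reg')

# a code or regexp present -> level 2 (subfield); only a tag -> level 1 (instance)
```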
#========================================
#API functions for querying the translator
#========================================
def is_defined(self, name):
'''
Check whether a variable is defined.
@param name the name of the variable
'''
return self.memory.has_key(name)
def get_num_elements(self, name):
'''
An API function to get the number of elements for a variable.
Do not use this function to build loops; use iterator() instead.
'''
if name == 'record':
return len(self.recIDs)
num = 0
for part in self.iterator(name):
num = num + 1
return num
def get_value(self, name, display_type='value'):
'''
The API function for querying the translator for values of a certain variable.
Called in a loop will result in a different value each time.
Objects are cached in memory, so subsequent calls for the same variable take less time.
@param name the name of the variable you want the value of
@param display_type an optional value for the type of the desired output, one of: value, tag, ind1, ind2, code, fulltag;
These can be easily added in the proper place of the code (display_value)
'''
if name == 'record':
return ''
record = self.get_object(name)
return self.display_record(record, display_type)
-
+
def iterator(self, name):
'''
An iterator over the values of a certain name.
The iterator changes state of internal variables and objects.
When calling get_value in a loop, this will result each time in a different value.
'''
if name == 'record':
for self.recID in self.recIDs:
self.record = get_record(self.recID)
if self.preprocess:
self.preprocess(self.record)
yield str(self.recID)
else:
full_object = self.build_object(name)
level = self.determine_level(name)
for new_object in record_parts(full_object, level):
self.memory[name]['object'] = new_object
#parent has changed state; also set children's state to None;
for children_name in self.memory[name]['children']:
self.memory[children_name]['object'] = None
yield new_object
#the result for a call of the same name after an iterator should be the same as if there was no iterator called before
self.memory[name]['object'] = None
-
+
def call_function(self, function_name, parameters=None):
'''
Call an external element which is a Python file, using BibFormat
@param function_name the name of the function to call
@param parameters a dictionary of the parameters to pass as key=value pairs
@return a string value, which is the result of the function call
'''
if parameters is None:
parameters = {}
bfo = BibFormatObject(self.recID)
format_element = get_format_element(function_name)
(value, errors) = eval_format_element(format_element, bfo, parameters)
#to do: check errors from function call
return value
-
+
#========================================
#end of API functions
#========================================
def get_object(self, name):
'''
Responsible for creating the desired object, corresponding to provided name.
If the object is not cached in memory, it is built again.
Directly called by API function get_value.
The result is then formatted by display_record according to display_type.
'''
if self.memory[name]['object'] is not None:
return self.memory[name]['object']
new_object = self.build_object(name)
#if you have reached here you are not in an iterator; return first non-empty
- level = self.determine_level(name)
+ level = self.determine_level(name)
for tmp_object in record_parts(new_object, level):
#get the first non-empty
if tmp_object:
new_object = tmp_object
break
self.memory[name]['object'] = new_object
return new_object
def build_object(self, name):
'''
Build the object from the list of addresses
A slave function for get_object.
'''
new_object = {}
- parent_name = self.memory[name]['parent'];
+ parent_name = self.memory[name]['parent'];
has_parent = parent_name
- for address in self.memory[name]['addresses']:
+ for address in self.memory[name]['addresses']:
if not has_parent:
tmp_object = copy(self.record, address)
new_object = merge(new_object, tmp_object)
else: #has parent
parent_object = self.get_object(parent_name) #already returns the parents instance
tmp_object = copy(parent_object, address)
new_object = merge(new_object, tmp_object)
return new_object
-
-
+
+
def display_record(self, record, display_type='value'):
'''
Decide what the final output value is according to the display_type.
@param record the record structure to display; this is most probably just a single subfield
@param display_type a string specifying the desired output; can be one of: value, tag, ind1, ind2, code, fulltag
@return a string to output
'''
output = ''
tag, ind1, ind2, code, value = '', '', '', '', ''
if record:
tags = record.keys()
tags.sort()
if tags:
fulltag = tags[0]
tag, ind1, ind2 = fulltag[0:3], fulltag[3:4], fulltag[4:5]
field_instances = record[fulltag]
if field_instances:
field_instance = field_instances[0]
codes = field_instance.keys()
codes.sort()
if codes:
code = codes[0]
value = field_instance[code]
if not display_type:
display_type = 'value'
if display_type == 'value':
output = value
elif display_type == 'tag':
output = tag
elif display_type == 'ind1':
ind1 = ind1.replace('_', ' ')
output = ind1
elif display_type == 'ind2':
ind2 = ind2.replace('_', ' ')
output = ind2
elif display_type == 'code':
output = code
elif display_type == 'fulltag':
output = tag + ind1 + ind2
else:
print CFG_BIBFORMAT_BFX_ERROR_MESSAGES['ERR_BFX_INVALID_DISPLAY_TYPE'] % (display_type)
- return output
+ return output
'''
Functions for use with the structure representing a MARC record defined here.
This record structure differs from the one defined in bibrecord.
The reason is that we want a symmetry between controlfields and datafields.
In this format controlfields are represented internally as a subfield value with code ' ' of a datafield.
This allows for easier handling of the fields.
-However, there is a restriction associated with this structure and it is that subfields cannot be repeated
+However, there is a restriction associated with this structure and it is that subfields cannot be repeated
in the same instance. If this is the case, the result will be incorrect.
The record structure has the form:
fields={field_tag:field_instances}
field_instances=[field_instance]
field_instance={field_code:field_value}
'''
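To make this layout concrete, here is a small hand-built record in the structure just described (all values hypothetical):

```python
# Hand-built record in the structure described above (hypothetical values).
# A controlfield (tag 001) is stored like a datafield whose single subfield
# has code ' '; datafields use 5-character tags (tag + ind1 + ind2, with '_'
# standing in for a blank indicator).
record = {
    '001': [{' ': '123'}],                              # controlfield
    '245__': [{'a': 'Some title', 'b': 'A subtitle'}],  # datafield, blank indicators
    '700__': [{'a': 'Doe, J.'}, {'a': 'Smith, A.'}],    # repeatable field: two instances
}

assert len(record['700__']) == 2            # two field instances
assert record['245__'][0]['a'] == 'Some title'
```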
def convert_record(old_record):
'''
Convert a record from the format defined in bibrecord to the format defined here
@param old_record the record as returned from bibrecord.create_record()
@return a record of the new form
'''
fields = {}
old_tags = old_record.keys()
old_tags.sort()
for old_tag in old_tags:
if int(old_tag) < 11:
- #controlfields
+ #controlfields
new_tag = old_tag
fields[new_tag] = [{' ':old_record[old_tag][0][3]}]
else:
#datafields
old_field_instances = old_record[old_tag]
num_fields = len(old_field_instances)
for i in range(num_fields):
old_field_instance = old_field_instances[i]
ind1 = old_field_instance[1]
if not ind1 or ind1 == ' ':
ind1 = '_'
ind2 = old_field_instance[2]
if not ind2 or ind2 == ' ':
ind2 = '_'
new_tag = old_tag + ind1 + ind2
new_field_instance = {}
for old_subfield in old_field_instance[0]:
new_code = old_subfield[0]
new_value = old_subfield[1]
if new_field_instance.has_key(new_code):
print 'Error: Repeating subfield codes in the same instance!'
new_field_instance[new_code] = new_value
if not fields.has_key(new_tag):
fields[new_tag] = []
fields[new_tag].append(new_field_instance)
return fields
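To illustrate the mapping convert_record performs, here is a minimal Python 3 re-sketch of its logic (simplified, with hypothetical input values; the engine's actual code above targets Python 2 and this is not a drop-in replacement):

```python
def convert_record_sketch(old_record):
    """Python 3 sketch of convert_record: bibrecord-style input, where each
    instance is (subfields, ind1, ind2, controlfield_value, position)."""
    fields = {}
    for old_tag in sorted(old_record):
        if int(old_tag) < 11:
            # controlfields become a pseudo-datafield with subfield code ' '
            fields[old_tag] = [{' ': old_record[old_tag][0][3]}]
        else:
            for subfields, ind1, ind2, _value, *_ in old_record[old_tag]:
                ind1 = ind1 if ind1 and ind1 != ' ' else '_'
                ind2 = ind2 if ind2 and ind2 != ' ' else '_'
                new_tag = old_tag + ind1 + ind2
                # (code, value) pairs become an instance dict
                instance = {code: value for code, value in subfields}
                fields.setdefault(new_tag, []).append(instance)
    return fields

old_record = {
    '001': [([], ' ', ' ', '123', 1)],
    '245': [([('a', 'Some title')], ' ', ' ', '', 2)],
}
assert convert_record_sketch(old_record) == {
    '001': [{' ': '123'}],
    '245__': [{'a': 'Some title'}],
}
```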
def get_record(recID):
'''
Get a record with a specific recID.
@param recID the ID of the record
@return a record in the structure defined here
'''
bfo = BibFormatObject(recID)
return convert_record(bfo.get_record())
def print_record(record):
'''
Print a record.
'''
tags = record.keys()
tags.sort()
for tag in tags:
field_instances = record[tag]
for field_instance in field_instances:
print tag, field_instance
def record_fields_value(record, tag, subfield):
'''
Return a list of all the fields with a certain tag and subfield code.
- Works on subfield level.
+ Works on subfield level.
@param record a record
@param tag a 3 or 5 letter tag; required
@param subfield a subfield code; required
'''
output = []
if record.has_key(tag):
for field_instance in record[tag]:
if field_instance.has_key(subfield):
output.append(field_instance[subfield])
return output
def record_add_field_instance(record, tag, field_instance):
'''
Add a field_instance to the beginning of the instances of a corresponding tag.
@param record a record
@param tag a 3 or 5 letter tag; required
@param field_instance the field instance to add
@return None
'''
if not record.has_key(tag):
record[tag] = []
record[tag] = [field_instance] + record[tag]
return None
def record_num_parts(record, level):
'''
Count the number of instances or the number of subfields in the whole record.
@param record
@param level either 1 or 2
level=1 - view record on instance level
level=2 - view record on subfield level
@return the number of parts
'''
num = 0
for part in record_parts(record, level):
num = num + 1
return num
def record_parts(record, level):
'''
An iterator over the instances or subfields of a record.
@param record
@param level either 1 or 2
level=1 - iterate over instances
level=2 - iterate over subfields
@yield a record structure representing the part (instance or subfield)
- '''
+ '''
if level == 1:
names = record.keys()
names.sort()
for name in names:
old_field_instances = record[name]
for old_field_instance in old_field_instances:
new_record = {}
- new_field_instances = []
+ new_field_instances = []
new_field_instance = {}
for old_field_code in old_field_instance.keys():
new_field_code = old_field_code
new_field_value = old_field_instance[old_field_code]
new_field_instance[new_field_code] = new_field_value
new_field_instances.append(new_field_instance)
new_record[name] = []
new_record[name].extend(new_field_instances)
- yield new_record
+ yield new_record
if level == 2:
names = record.keys()
names.sort()
for name in names:
old_field_instances = record[name]
for old_field_instance in old_field_instances:
old_field_codes = old_field_instance.keys()
old_field_codes.sort()
for old_field_code in old_field_codes:
new_record = {}
new_field_instances = []
new_field_instance = {}
new_field_code = old_field_code
new_field_value = old_field_instance[old_field_code]
new_field_instance[new_field_code] = new_field_value
new_field_instances.append(new_field_instance)
new_record[name] = []
new_record[name].extend(new_field_instances)
- yield new_record
+ yield new_record
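The two iteration levels of record_parts can be sketched in a compact Python 3 form (a simplification for illustration, not the engine's code):

```python
def record_parts_sketch(record, level):
    """Python 3 sketch: yield single-part records at instance level (1)
    or subfield level (2), mirroring record_parts above."""
    for tag in sorted(record):
        for instance in record[tag]:
            if level == 1:
                # one yielded record per field instance
                yield {tag: [dict(instance)]}
            else:
                # one yielded record per subfield
                for code in sorted(instance):
                    yield {tag: [{code: instance[code]}]}

record = {'245__': [{'a': 'Title', 'b': 'Sub'}]}
assert list(record_parts_sketch(record, 1)) == [{'245__': [{'a': 'Title', 'b': 'Sub'}]}]
assert list(record_parts_sketch(record, 2)) == [
    {'245__': [{'a': 'Title'}]},
    {'245__': [{'b': 'Sub'}]},
]
```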
def copy(old_record, address=''):
'''
Copy a record by filtering all parts of the old record specified by address
(A better name for the function is filter.)
@param old_record the initial record
@param address an address; for examples see bibformat_bfx_engine_config.
If no address is specified, return the initial record.
@return the filtered record
'''
if not old_record:
return {}
tag_pattern, code_pattern, reg_pattern = '', '', ''
expr = re.compile(address_pattern)
match = expr.match(address)
if match:
tag_pattern = match.group('tag')
code_pattern = match.group('code')
reg_pattern = match.group('reg')
if tag_pattern:
tag_pattern = tag_pattern.replace('?','[0-9_\w]')
else:
tag_pattern = r'.*'
if code_pattern:
code_pattern = code_pattern.replace('?','[\w ]')
else:
- code_pattern = r'.*'
+ code_pattern = r'.*'
tag_expr = re.compile(tag_pattern)
code_expr = re.compile(code_pattern)
new_record = {}
for tag in old_record.keys():
tag_match = tag_expr.match(tag)
if tag_match:
if tag_match.end() == len(tag):
old_field_instances = old_record[tag]
new_field_instances = []
for old_field_instance in old_field_instances:
new_field_instance = {}
for old_field_code in old_field_instance.keys():
new_field_code = old_field_code
code_match = code_expr.match(new_field_code)
if code_match:
new_field_value = old_field_instance[old_field_code]
new_field_instance[new_field_code] = new_field_value
- if new_field_instance:
+ if new_field_instance:
new_field_instances.append(new_field_instance)
if new_field_instances:
new_record[tag] = new_field_instances
#in new_record pass all subfields through regexp
if reg_pattern:
for tag in new_record:
field_instances = new_record[tag]
for field_instance in field_instances:
field_codes = field_instance.keys()
for field_code in field_codes:
field_instance[field_code] = pass_through_regexp(field_instance[field_code], reg_pattern)
return new_record
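The address-based filtering that copy() performs can be sketched as follows (Python 3, simplified: only the tag and code parts of an address, no '#regexp' suffix; the '?' wildcard expansions are taken from the code above):

```python
import re

def filter_sketch(record, tag_pattern, code_pattern):
    # '?' is a single-character wildcard, expanded as in copy() above
    tag_re = re.compile(tag_pattern.replace('?', r'[0-9_\w]') + '$')
    code_re = re.compile(code_pattern.replace('?', r'[\w ]') + '$')
    out = {}
    for tag, instances in record.items():
        if not tag_re.match(tag):
            continue
        # keep only matching subfield codes, then drop emptied instances
        kept = [{c: v for c, v in inst.items() if code_re.match(c)}
                for inst in instances]
        kept = [inst for inst in kept if inst]
        if kept:
            out[tag] = kept
    return out

record = {'245__': [{'a': 'Title', 'b': 'Sub'}], '700__': [{'a': 'Doe, J.'}]}
assert filter_sketch(record, '245__', 'a') == {'245__': [{'a': 'Title'}]}
assert filter_sketch(record, '?????', 'a') == {'245__': [{'a': 'Title'}],
                                               '700__': [{'a': 'Doe, J.'}]}
```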
def merge(record1, record2):
'''
Merge two records.
Controlfields with the same tag in record2 as in record1 are ignored.
@param record1, record2
@return the merged record
'''
new_record = {}
if record1:
new_record = copy(record1)
if not record2:
return new_record
for tag in record2.keys():
#append only datafield tags;
#if controlfields conflict, leave first;
old_field_instances = record2[tag]
new_field_instances = []
for old_field_instance in old_field_instances:
new_field_instance = {}
for old_field_code in old_field_instance.keys():
new_field_code = old_field_code
new_field_value = old_field_instance[old_field_code]
new_field_instance[new_field_code] = new_field_value
- if new_field_instance:
+ if new_field_instance:
new_field_instances.append(new_field_instance)
if new_field_instances:
#controlfield
if len(tag) == 3:
if not new_record.has_key(tag):
new_record[tag] = []
new_record[tag].extend(new_field_instances)
#datafield
if len(tag) == 5:
if not new_record.has_key(tag):
new_record[tag] = []
new_record[tag].extend(new_field_instances)
return new_record
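A Python 3 sketch of the merge semantics can help here; note the "keep the first controlfield on conflict" rule is taken from the docstring, not traced line-by-line through the code above:

```python
def merge_sketch(r1, r2):
    """Python 3 sketch of merge(): instances from r2 are appended, except
    that a controlfield (3-char tag) already present in r1 keeps r1's value."""
    out = {tag: [dict(i) for i in insts] for tag, insts in (r1 or {}).items()}
    for tag, insts in (r2 or {}).items():
        if len(tag) == 3 and tag in out:
            continue  # conflicting controlfield: keep the first record's value
        out.setdefault(tag, []).extend(dict(i) for i in insts)
    return out

r1 = {'001': [{' ': '1'}], '700__': [{'a': 'Doe, J.'}]}
r2 = {'001': [{' ': '2'}], '700__': [{'a': 'Smith, A.'}]}
merged = merge_sketch(r1, r2)
assert merged['001'] == [{' ': '1'}]                       # controlfield kept from r1
assert merged['700__'] == [{'a': 'Doe, J.'}, {'a': 'Smith, A.'}]
```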
#======================
#Help functions
#=====================
xmlopen = 1
xmlclose = 2
xmlfull = 3
xmlempty = 4
def create_xml_element(name, value='', attrs=None, element_type=xmlfull, level=0):
'''
Create an XML element as a string.
@param name the name of the element
@param value the element value; default is ''
@param attrs a dictionary with the element attributes
@param element_type a constant which defines the type of the output:
xmlopen = 1 opening tag only
xmlclose = 2 closing tag only
xmlfull = 3 full element with a value
xmlempty = 4 empty (self-closing) element
@return a formatted XML string
'''
output = ''
if attrs is None:
attrs = {}
- if element_type == xmlempty:
+ if element_type == xmlempty:
output += '<'+name
for attrname in attrs.keys():
attrvalue = attrs[attrname]
if type(attrvalue) == type(u''):
attrvalue = attrvalue.encode('utf-8')
output += ' %s="%s"' % (attrname, attrvalue)
output += ' />'
if element_type == xmlfull:
output += '<'+name
for attrname in attrs.keys():
attrvalue = attrs[attrname]
if type(attrvalue) == type(u''):
attrvalue = attrvalue.encode('utf-8')
output += ' %s="%s"' % (attrname, attrvalue)
output += '>'
output += value
output += '</'+name+'>'
if element_type == xmlopen:
output += '<'+name
for attrname in attrs.keys():
output += ' '+attrname+'="'+attrs[attrname]+'"'
output += '>'
if element_type == xmlclose:
output += '</'+name+'>'
output = ' '*level + output
if type(output) == type(u''):
output = output.encode('utf-8')
return output
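Usage of create_xml_element can be illustrated with a Python 3 condensation covering the xmlfull and xmlempty cases (attribute ordering and unicode handling simplified; an illustration, not the function above):

```python
def create_xml_element_sketch(name, value='', attrs=None, empty=False, level=0):
    """Python 3 sketch of create_xml_element for the xmlfull/xmlempty cases."""
    attrs = attrs or {}
    attr_str = ''.join(' %s="%s"' % (k, v) for k, v in attrs.items())
    if empty:
        out = '<%s%s />' % (name, attr_str)       # xmlempty
    else:
        out = '<%s%s>%s</%s>' % (name, attr_str, value, name)  # xmlfull
    return ' ' * level + out                      # indent by 'level' spaces

assert create_xml_element_sketch('subfield', 'Doe, J.', {'code': 'a'}) == \
    '<subfield code="a">Doe, J.</subfield>'
assert create_xml_element_sketch('leader', empty=True, level=2) == '  <leader />'
```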
def xml_escape(value):
'''
Escape a string value for use as an XML element or attribute value.
@param value the string value to escape
@return escaped value
'''
return saxutils.escape(value)
def xml_unescape(value):
'''
Unescape a string value for use as an XML element.
@param value the string value to unescape
@return unescaped value
'''
return saxutils.unescape(value)
-
+
def node_has_subelements(node):
'''
Check whether a node has any child nodes.
Only element and text nodes are considered.
@return True if childnodes exist, False otherwise.
'''
result = False
for node in node.childNodes:
if node.nodeType == Node.ELEMENT_NODE or node.nodeType == Node.TEXT_NODE:
result = True
return result
def get_node_subelement(parent_node, name, namespace = None):
'''
Get the first childnode with specific name and (optional) namespace
@param parent_node the node to check
@param name the name to search
@param namespace An optional namespace URI. This is usually a URL: http://cdsware.cern.ch/invenio/
@return the found node; None otherwise
'''
output = None
for node in parent_node.childNodes:
if node.nodeType == Node.ELEMENT_NODE and node.localName == name and node.namespaceURI == namespace:
output = node
return output
return output
def get_node_value(node):
'''
Get the node value of a node. For use with text nodes.
@param node a text node
@return a string of the nodevalue encoded in utf-8
'''
return node.nodeValue.encode('utf-8')
def get_node_namespace(node):
'''
Get node namespace. For use with element nodes.
@param node an element node
@return the namespace of the node
'''
return node.namespaceURI
def get_node_name(node):
'''
Get the name of a node. For use with element nodes.
@param node an element node
@return a string of the node name
- '''
+ '''
return node.nodeName
def get_node_attributes(node):
'''
Get attributes of an element node. For use with element nodes
@param node an element node
@return a dictionary of the attributes as key:value pairs
'''
attributes = {}
attrs = node.attributes
for attrname in attrs.keys():
attrnode = attrs.get(attrname)
attrvalue = attrnode.nodeValue
attributes[attrname] = attrvalue
return attributes
def pass_through_regexp(value, regexp):
'''
Pass a value through a regular expression.
@param value a string
@param regexp a regexp with a group 'value' in it. No group named 'value' will result in an error.
@return if the string matches the regexp, return named group 'value', otherwise return ''
'''
output = ''
expr = re.compile(regexp)
match = expr.match(value)
if match:
output = match.group('value')
return output
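Since pass_through_regexp requires a named group called 'value', a self-contained Python 3 copy of it with two worked calls may help (the regexp style mirrors the label definitions above; repeated here only so the example runs on its own):

```python
import re

def pass_through_regexp(value, regexp):
    # Same logic as the function above: return the named group 'value'
    # on a match, '' otherwise.
    output = ''
    match = re.compile(regexp).match(value)
    if match:
        output = match.group('value')
    return output

# extract the surname from an "Surname, Names" author string
assert pass_through_regexp('Doe, John', r'(?P<value>.*),[ ]*(.*)') == 'Doe'
# no match yields the empty string
assert pass_through_regexp('no-comma', r'(?P<value>.*),[ ]*(.*)') == ''
```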
def is_number(value):
'''
Check if a value is a number.
@param value the value to check
@return True or False
'''
result = True
try:
float(value)
except ValueError:
result = False
return result
-
+
diff --git a/modules/bibformat/lib/bibformat_bfx_engine_config.py b/modules/bibformat/lib/bibformat_bfx_engine_config.py
index 56c7828e5..6cab88bdf 100644
--- a/modules/bibformat/lib/bibformat_bfx_engine_config.py
+++ b/modules/bibformat/lib/bibformat_bfx_engine_config.py
@@ -1,117 +1,117 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
# pylint: disable-msg=C0301
"""BibFormat BFX engine configuration."""
__revision__ = "$Id$"
import os
-from invenio.config import etcdir
+from invenio.config import CFG_ETCDIR
-CFG_BIBFORMAT_BFX_TEMPLATES_PATH = "%s%sbibformat%sformat_templates" % (etcdir, os.sep, os.sep)
+CFG_BIBFORMAT_BFX_TEMPLATES_PATH = "%s%sbibformat%sformat_templates" % (CFG_ETCDIR, os.sep, os.sep)
CFG_BIBFORMAT_BFX_FORMAT_TEMPLATE_EXTENSION = "bfx"
CFG_BIBFORMAT_BFX_ELEMENT_NAMESPACE = "http://cdsware.cern.ch/invenio/"
CFG_BIBFORMAT_BFX_LABEL_DEFINITIONS = {
#record is a reserved keyword, don't use it
#define one or more addresses for each name or zero if you plan to define them later
'controlfield': [r'/???'],
'datafield': [r'/?????'],
'datafield.subfield': [r'datafield/?'],
'recid': [r'/001'],
'article_id': [],
'language': [r'/041__/a'],
'title': [r'/245__/a'],
'subtitle': [r'/245__/b'],
'secondary_title': [r'/773__/p'],
'first_author': [r'/100__/a'],
'author': [r'/100__/a',
r'/700__/a'],
'author.surname': [r'author#(?P<value>.*),[ ]*(.*)'],
'author.names': [r'author#(.*),[ ]*(?P<value>.*)'],
'abstract': [r'/520__/a'],
'publisher': [r'/260__/b'],
'publisher_location': [r'/260__/a'],
'issn': [r'/022__/a'],
'doi': [r'/773__/a'],
'journal_name_long': [r'/222__/a',
r'/210__/a',
r'/773__/p',
r'/909C4/p'],
'journal_name_short': [r'/210__/a',
r'/773__/p',
r'/909C4/p'],
'journal_name': [r'/773__/p',
r'/909C4/p'],
'journal_volume': [r'/773__/v',
r'/909C4/v'],
'journal_issue': [r'/773__/n'],
'pages': [r'/773__/c',
r'/909C4/c'],
'first_page': [r'/773__/c#(?P<value>\d*)-(\d*)',
r'/909C4/c#(?P<value>\d*)-(\d*)'],
'last_page': [r'/773__/c#(\d*)-(?P<value>\d*)',
r'/909C4/c#(\d*)-(?P<value>\d*)'],
'date': [r'/260__/c'],
'year': [r'/773__/y#(.*)(?P<value>\d\d\d\d).*',
r'/260__/c#(.*)(?P<value>\d\d\d\d).*',
r'/925__/a#(.*)(?P<value>\d\d\d\d).*',
r'/909C4/y'],
'doc_type': [r'/980__/a'],
'doc_status': [r'/980__/c'],
'uri': [r'/8564_/u',
r'/8564_/q'],
'subject': [r'/65017/a'],
'keyword': [r'/6531_/a'],
'day': [],
'month': [],
'creation_date': [],
'reference': []
}
CFG_BIBFORMAT_BFX_ERROR_MESSAGES = \
{
'ERR_BFX_TEMPLATE_REF_NO_NAME' : 'Error: Missing attribute "name" in TEMPLATE_REF.',
'ERR_BFX_TEMPLATE_NOT_FOUND' : 'Error: Template %s not found.',
'ERR_BFX_ELEMENT_NO_NAME' : 'Error: Missing attribute "name" in ELEMENT.',
'ERR_BFX_FIELD_NO_NAME' : 'Error: Missing attribute "name" in FIELD.',
'ERR_BFX_LOOP_NO_OBJECT' : 'Error: Missing attribute "object" in LOOP.',
'ERR_BFX_NO_SUCH_FIELD' : 'Error: Field %s is not defined',
'ERR_BFX_IF_NO_NAME' : 'Error: Missing attribute "name" in IF.',
'ERR_BFX_TEXT_NO_VALUE' : 'Error: Missing attribute "value" in TEXT.',
'ERR_BFX_INVALID_RE' : 'Error: Invalid regular expression: %s',
'ERR_BFX_INVALID_OPERATOR_NAME' : 'Error: Name %s is not recognised as a valid operator name.',
'ERR_BFX_INVALID_DISPLAY_TYPE' : 'Error: Invalid display type. Must be one of: value, tag, ind1, ind2, code; received: %s',
'ERR_BFX_IF_WRONG_SYNTAX' : 'Error: Invalid syntax of IF statement.',
'ERR_BFX_DUPLICATE_NAME' : 'Error: Duplicate name: %s.',
'ERR_BFX_TEMPLATE_NO_NAME' : 'Error: No name defined for the template.',
'ERR_BFX_NO_TEMPLATES_FOUND' : 'Error: No templates found in the document.',
'ERR_BFX_TOO_MANY_TEMPLATES' : 'Error: More than one template found in the document. No format found.'
}
CFG_BIBFORMAT_BFX_WARNING_MESSAGES = \
{
'WRN_BFX_TEMPLATE_NO_DESCRIPTION' : 'Warning: No description entered for the template.',
'WRN_BFX_TEMPLATE_NO_CONTENT' : 'Warning: No content type specified for the template. Using default: text/xml.',
'WRN_BFX_NO_FORMAT_FOUND' : 'Warning: No format found. Will look for a default template.'
}
diff --git a/modules/bibformat/lib/bibformat_config.py b/modules/bibformat/lib/bibformat_config.py
index 39d99b050..caffdb43b 100644
--- a/modules/bibformat/lib/bibformat_config.py
+++ b/modules/bibformat/lib/bibformat_config.py
@@ -1,97 +1,97 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
# pylint: disable-msg=C0301
"""BibFormat configuration parameters."""
__revision__ = "$Id$"
import os
-from invenio.config import etcdir, pylibdir
+from invenio.config import CFG_ETCDIR, CFG_PYLIBDIR
#True if old php format written in EL must be used by Invenio.
#False if new python format must be used. If set to 'False' but
#new format cannot be found, old format will be used.
CFG_BIBFORMAT_USE_OLD_BIBFORMAT = False
#Paths to main formats directories
-CFG_BIBFORMAT_TEMPLATES_PATH = "%s%sbibformat%sformat_templates" % (etcdir, os.sep, os.sep)
+CFG_BIBFORMAT_TEMPLATES_PATH = "%s%sbibformat%sformat_templates" % (CFG_ETCDIR, os.sep, os.sep)
CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = "invenio.bibformat_elements"
-CFG_BIBFORMAT_ELEMENTS_PATH = "%s%sinvenio%sbibformat_elements" % (pylibdir, os.sep, os.sep)
-CFG_BIBFORMAT_OUTPUTS_PATH = "%s%sbibformat%soutput_formats" % (etcdir, os.sep, os.sep)
+CFG_BIBFORMAT_ELEMENTS_PATH = "%s%sinvenio%sbibformat_elements" % (CFG_PYLIBDIR, os.sep, os.sep)
+CFG_BIBFORMAT_OUTPUTS_PATH = "%s%sbibformat%soutput_formats" % (CFG_ETCDIR, os.sep, os.sep)
#File extensions of formats
CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION = "bft"
CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION = "bfo"
CFG_BIBFORMAT_ERROR_MESSAGES = \
{ 'ERR_BIBFORMAT_INVALID_TAG' : '%s is an invalid tag.',
'ERR_BIBFORMAT_NO_TEMPLATE_FOUND' : 'No template could be found for output format %s.',
'ERR_BIBFORMAT_CANNOT_RESOLVE_ELEMENT_NAME' : 'Could not find format element named %s.',
'ERR_BIBFORMAT_CANNOT_RESOLVE_OUTPUT_NAME' : 'Could not find output format named %s.',
'ERR_BIBFORMAT_CANNOT_RESOLVE_TEMPLATE_FILE' : 'Could not find format template named %s.',
'ERR_BIBFORMAT_FORMAT_ELEMENT_NOT_FOUND' : 'Format element %s could not be found.',
'ERR_BIBFORMAT_BAD_BFO_RECORD' : 'Could not initialize new BibFormatObject with record id %s.',
'ERR_BIBFORMAT_NB_OUTPUTS_LIMIT_REACHED' : 'Could not find a fresh name for output format %s.',
'ERR_BIBFORMAT_KB_ID_UNKNOWN' : 'Knowledge base with id %s is unknown.',
'ERR_BIBFORMAT_OUTPUT_FORMAT_CODE_UNKNOWN' : 'Output format with code %s could not be found.',
'ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE' : 'Format template %s cannot be read. %s',
'ERR_BIBFORMAT_CANNOT_WRITE_TEMPLATE_FILE' : 'BibFormat could not write to format template %s. %s',
'ERR_BIBFORMAT_CANNOT_READ_OUTPUT_FILE' : 'Output format %s cannot be read. %s',
'ERR_BIBFORMAT_CANNOT_WRITE_OUTPUT_FILE' : 'BibFormat could not write to output format %s. %s',
'ERR_BIBFORMAT_EVALUATING_ELEMENT' : 'Error when evaluating format element %s with parameters %s',
'ERR_BIBFORMAT_CANNOT_READ_ELEMENT_FILE' : 'Format element %s cannot be read. %s',
'ERR_BIBFORMAT_INVALID_OUTPUT_RULE_FIELD' : 'Should be "tag field_number:" at line %s.',
'ERR_BIBFORMAT_INVALID_OUTPUT_RULE_FIELD_TAG' : 'Invalid tag "%s" at line %s.',
'ERR_BIBFORMAT_OUTPUT_CONDITION_OUTSIDE_FIELD': 'Condition "%s" is outside a tag specification at line %s.',
'ERR_BIBFORMAT_INVALID_OUTPUT_CONDITION' : 'Condition "%s" can only have a single separator --- at line %s.',
'ERR_BIBFORMAT_WRONG_OUTPUT_RULE_TEMPLATE_REF': 'Template "%s" does not exist at line %s.',
'ERR_BIBFORMAT_WRONG_OUTPUT_LINE' : 'Line %s could not be understood at line %s.',
'ERR_BIBFORMAT_OUTPUT_WRONG_TAG_CASE' : '"tag" must be lowercase in "%s" at line %s.',
'ERR_BIBFORMAT_OUTPUT_RULE_FIELD_COL' : 'Tag specification "%s" must end with colon ":" at line %s.',
'ERR_BIBFORMAT_OUTPUT_TAG_MISSING' : 'Tag specification "%s" must start with "tag" at line %s.',
'ERR_BIBFORMAT_OUTPUT_WRONG_DEFAULT_CASE' : '"default" keyword must be lowercase in "%s" at line %s',
'ERR_BIBFORMAT_OUTPUT_RULE_DEFAULT_COL' : 'Missing colon ":" after "default" in "%s" at line %s.',
'ERR_BIBFORMAT_OUTPUT_DEFAULT_MISSING' : 'Default template specification "%s" must start with "default :" at line %s.',
'ERR_BIBFORMAT_FORMAT_ELEMENT_FORMAT_FUNCTION': 'Format element %s has no function named "format".',
'ERR_BIBFORMAT_VALIDATE_NO_FORMAT' : 'No format specified for validation. Please specify one.',
'ERR_BIBFORMAT_TEMPLATE_HAS_NO_NAME' : 'Could not find a name specified in tag "<name>" inside format template %s.',
'ERR_BIBFORMAT_TEMPLATE_HAS_NO_DESCRIPTION' : 'Could not find a description specified in tag "<description>" inside format template %s.',
'ERR_BIBFORMAT_TEMPLATE_CALLS_UNREADABLE_ELEM': 'Format template %s calls unreadable element "%s". Check element file permissions.',
'ERR_BIBFORMAT_TEMPLATE_CALLS_UNLOADABLE_ELEM': 'Cannot load element "%s" in template %s. Check element code.',
'ERR_BIBFORMAT_TEMPLATE_CALLS_UNDEFINED_ELEM' : 'Format template %s calls undefined element "%s".',
'ERR_BIBFORMAT_TEMPLATE_WRONG_ELEM_ARG' : 'Format element %s uses unknown parameter "%s" in format template %s.',
'ERR_BIBFORMAT_IN_FORMAT_ELEMENT' : 'Error in format element %s. %s',
'ERR_BIBFORMAT_NO_RECORD_FOUND_FOR_PATTERN' : 'No Record Found for %s.',
'ERR_BIBFORMAT_NBMAX_NOT_INT' : '"nbMax" parameter for %s must be an "int".',
'ERR_BIBFORMAT_EVALUATING_ELEMENT_ESCAPE' : 'Escape mode for format element %s could not be retrieved. Using default mode instead.'
}
CFG_BIBFORMAT_WARNING_MESSAGES = \
{ 'WRN_BIBFORMAT_OUTPUT_FORMAT_NAME_TOO_LONG' : 'Name %s is too long for output format %s in language %s. Truncated to first 256 characters.',
'WRN_BIBFORMAT_KB_NAME_UNKNOWN' : 'Cannot find knowledge base named %s.',
'WRN_BIBFORMAT_KB_MAPPING_UNKNOWN' : 'Cannot find a mapping with key %s in knowledge base %s.',
'WRN_BIBFORMAT_CANNOT_WRITE_IN_ETC_BIBFORMAT' : 'Cannot write in etc/bibformat dir of your Invenio installation. Check directory permission.',
'WRN_BIBFORMAT_CANNOT_WRITE_MIGRATION_STATUS' : 'Cannot write file migration_status.txt in etc/bibformat dir of your Invenio installation. Check file permission.',
'WRN_BIBFORMAT_CANNOT_EXECUTE_REQUEST' : 'Your request could not be executed.'
}
diff --git a/modules/bibformat/lib/bibformat_engine.py b/modules/bibformat/lib/bibformat_engine.py
index 0668c8e59..2ccc2e37b 100644
--- a/modules/bibformat/lib/bibformat_engine.py
+++ b/modules/bibformat/lib/bibformat_engine.py
@@ -1,2008 +1,2008 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Formats a single XML Marc record using specified format.
There is no API for the engine. Instead use bibformat.py.
SEE: bibformat.py, bibformat_utils.py
"""
__revision__ = "$Id$"
import re
import sys
import os
import inspect
import traceback
import zlib
import cgi
from invenio.config import \
CFG_PATH_PHP, \
- bindir, \
+ CFG_BINDIR, \
cdslang
from invenio.errorlib import \
register_errors, \
get_msgs_for_code_list
from invenio.bibrecord import \
create_record, \
record_get_field_instances, \
record_get_field_value, \
record_get_field_values
from invenio.bibformat_xslt_engine import format
from invenio.dbquery import run_sql
from invenio.messages import \
language_list_long, \
wash_language, \
gettext_set_language
from invenio import bibformat_dblayer
from invenio.bibformat_config import \
CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION, \
CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION, \
CFG_BIBFORMAT_TEMPLATES_PATH, \
CFG_BIBFORMAT_ELEMENTS_PATH, \
CFG_BIBFORMAT_OUTPUTS_PATH, \
CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
from invenio.bibformat_utils import \
record_get_xml, \
parse_tag
from invenio.htmlutils import HTMLWasher
from invenio.webuser import collect_user_info
if CFG_PATH_PHP: #Remove when call_old_bibformat is removed
from xml.dom import minidom
import tempfile
# Cache for data we have already read and parsed
format_templates_cache = {}
format_elements_cache = {}
format_outputs_cache = {}
kb_mappings_cache = {}
cdslangs = language_list_long()
html_field = '<!--HTML-->' # String indicating that field should be
# treated as HTML (and therefore no escaping of
# HTML tags should occur).
# Appears in some field values.
washer = HTMLWasher() # Used to remove dangerous tags from HTML
# sources
# Regular expression for finding <lang>...</lang> tag in format templates
pattern_lang = re.compile(r'''
<lang\s*> #start tag, including closing '>'
(?P<langs>.*?) #anything but the next group (greedy)
(</lang\s*>) #end tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
# Builds regular expression for finding each known language in tags
ln_pattern_text = r"<("
for lang in cdslangs:
ln_pattern_text += lang[0] +r"|"
ln_pattern_text = ln_pattern_text.rstrip(r"|")
ln_pattern_text += r")>(.*?)</\1>"
ln_pattern = re.compile(ln_pattern_text, re.IGNORECASE | re.DOTALL)
# Regular expression for finding text to be translated
translation_pattern = re.compile(r'_\((?P.*?)\)_', \
re.IGNORECASE | re.DOTALL | re.VERBOSE)
# Regular expression for finding <name> tag in format templates
pattern_format_template_name = re.compile(r'''
<name\s*> #start tag, including closing '>'
(?P<name>.*?) #name value. any char that is not end tag
(</name\s*>)(\n)? #end tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
# Regular expression for finding <description> tag in format templates
pattern_format_template_desc = re.compile(r'''
<description\s*> #start tag, including closing '>'
(?P<desc>.*?) #description value. any char that is not end tag
</description\s*>(\n)? #end tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
# Regular expression for finding <BFE_ > tags in format templates
pattern_tag = re.compile(r'''
<BFE_ #start of the tag
(?P<function_name>[^/\s]+) #any char but a space or slash
\s* #any number of spaces
(?P<params>(\s* #params here
(?P<param>([^=\s])*)\s* #param name: any chars that is not a white space or equality. Followed by space(s)
=\s* #equality: = followed by any number of spaces
(?P<sep>[\'"]) #one of the separators
(?P<value>.*?) #param value: any chars that is not a separator like previous one
(?P=sep) #same separator as starting one
)*) #many params
\s* #any number of spaces
(/)?> #end of the tag
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)
# Regular expression for finding params inside tags in format templates
pattern_function_params = re.compile('''
(?P<param>([^=\s])*)\s* # Param name: any chars that is not a white space or equality. Followed by space(s)
=\s* # Equality: = followed by any number of spaces
(?P<sep>[\'"]) # One of the separators
(?P<value>.*?) # Param value: any chars that is not a separator like previous one
(?P=sep) # Same separator as starting one
''', re.VERBOSE | re.DOTALL )
# Regular expression for finding format elements "params" attributes
# (defined by @param)
pattern_format_element_params = re.compile('''
@param\s* # Begins with @param keyword followed by space(s)
(?P<name>[^\s=]*)\s* # A single keyword, and then space(s)
#(=\s*(?P<sep>[\'"]) # Equality, space(s) and then one of the separators
#(?P<default>.*?) # Default value: any chars that is not a separator like previous one
#(?P=sep) # Same separator as starting one
#)?\s* # Default value for param is optional. Followed by space(s)
(?P<desc>.*) # Any text that is not end of line (thanks to MULTILINE parameter)
''', re.VERBOSE | re.MULTILINE)
# Regular expression for finding format elements "see also" attribute
# (defined by @see)
pattern_format_element_seealso = re.compile('''@see\s*(?P<sees>.*)''',
re.VERBOSE | re.MULTILINE)
#Regular expression for finding 2 expressions in quotes, separated by
#comma (as in template("1st","2nd") )
#Used when parsing output formats
## pattern_parse_tuple_in_quotes = re.compile('''
## (?P<sep1>[\'"])
## (?P<val1>.*)
## (?P=sep1)
## \s*,\s*
## (?P<sep2>[\'"])
## (?P<val2>.*)
## (?P=sep2)
## ''', re.VERBOSE | re.MULTILINE)
def call_old_bibformat(recID, format="HD", on_the_fly=False, verbose=0):
"""
FIXME: REMOVE FUNCTION WHEN MIGRATION IS DONE
Calls BibFormat for the record RECID in the desired output format FORMAT.
@param on_the_fly if False, try to return an already preformatted version of the record in the database
Note: this function always tries to return HTML, so when
bibformat returns XML with embedded HTML format inside the tag
FMT $g, as is suitable for prestoring output formats, we
perform un-XML-izing here in order to return HTML body only.
"""
out = ""
res = []
if not on_the_fly:
# look for formatted notice existence:
query = "SELECT value, last_updated FROM bibfmt WHERE "\
"id_bibrec='%s' AND format='%s'" % (recID, format)
res = run_sql(query, None, 1)
if res:
# record 'recID' is formatted in 'format', so print it
if verbose == 9:
last_updated = res[0][1]
out += """\n
Found preformatted output for record %i (cache updated on %s).
""" % (recID, last_updated)
decompress = zlib.decompress
return "%s" % decompress(res[0][0])
else:
# record 'recID' is not formatted in 'format',
# so try to call BibFormat on the fly or use default format:
if verbose == 9:
out += """\n<br/><span class="quicknote">
Formatting record %i on-the-fly with old BibFormat.
</span>""" % recID
# Retrieve MARCXML
# Build it on-the-fly only if 'call_old_bibformat' was called
# with format=xm and on_the_fly=True
xm_record = record_get_xml(recID, 'xm',
on_the_fly=(on_the_fly and format == 'xm'))
## import platform
## # Some problem have been found using either popen or os.system command.
## # Here is a temporary workaround until the issue is solved.
## if platform.python_compiler().find('Red Hat') > -1:
## # use os.system
## (result_code, result_path) = tempfile.mkstemp()
-## command = "( %s/bibformat otype=%s ) > %s" % (bindir, format, result_path)
+## command = "( %s/bibformat otype=%s ) > %s" % (CFG_BINDIR, format, result_path)
## (xm_code, xm_path) = tempfile.mkstemp()
## xm_file = open(xm_path, "w")
## xm_file.write(xm_record)
## xm_file.close()
## command = command + " <" + xm_path
## os.system(command)
## result_file = open(result_path,"r")
## bibformat_output = result_file.read()
## result_file.close()
## os.remove(result_path)
## os.remove(xm_path)
## else:
## # use popen
- pipe_input, pipe_output, pipe_error = os.popen3(["%s/bibformat" % bindir,
+ pipe_input, pipe_output, pipe_error = os.popen3(["%s/bibformat" % CFG_BINDIR,
"otype=%s" % format],
'rw')
pipe_input.write(xm_record)
pipe_input.flush()
pipe_input.close()
bibformat_output = pipe_output.read()
pipe_output.close()
pipe_error.close()
if bibformat_output.startswith("<record>"):
dom = minidom.parseString(bibformat_output)
for e in dom.getElementsByTagName('subfield'):
if e.getAttribute('code') == 'g':
for t in e.childNodes:
out += t.data.encode('utf-8')
else:
out += bibformat_output
return out
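The un-XML-izing step mentioned in the docstring boils down to parsing the returned MARCXML and concatenating the text of the `FMT $g` subfields; minidom decodes the embedded HTML entities on the way. A small self-contained sketch (the sample record is made up):

```python
from xml.dom import minidom

# Hypothetical prestored output: HTML escaped inside a $g subfield
bibformat_output = ('<record><datafield tag="FMT" ind1=" " ind2=" ">'
                    '<subfield code="g">&lt;b&gt;A title&lt;/b&gt;</subfield>'
                    '</datafield></record>')

out = ""
dom = minidom.parseString(bibformat_output)
for e in dom.getElementsByTagName('subfield'):
    if e.getAttribute('code') == 'g':
        for t in e.childNodes:
            out += t.data  # text nodes arrive with entities already decoded
```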
def format_record(recID, of, ln=cdslang, verbose=0,
search_pattern=[], xml_record=None, user_info=None):
"""
Formats a record given output format. Main entry function of
bibformat engine.
Returns a formatted version of the record in the specified
language, search pattern, and with the specified output format.
The function will define which format template must be applied.
You can either specify a record ID to format, or give its XML
representation. If 'xml_record' is not None, it is used instead
of recID.
'user_info' allows granting access to some functionalities on a
page depending on the user's privileges. 'user_info' is the same
object as the one returned by 'webuser.collect_user_info(req)'
@param recID the ID of record to format
@param of an output format code (or short identifier for the output format)
@param ln the language to use to format the record
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings, stop if error in format elements
9: errors and warnings, stop if error (debug mode ))
@param search_pattern list of strings representing the user request in web interface
@param xml_record an xml string representing the record to format
@param user_info the information of the user who will view the formatted page
@return formatted record
"""
out = ""
errors_ = []
# Temporary workflow (during migration of formats):
# Call new BibFormat
# But if format not found for new BibFormat, then call old BibFormat
#Create a BibFormatObject that carries the record and the formatting context
bfo = BibFormatObject(recID, ln, search_pattern, xml_record, user_info, of)
#Find out which format template to use based on record and output format.
template = decide_format_template(bfo, of)
if verbose == 9 and template is not None:
out += """\n<br/><span class="quicknote">
Using %s template for record %i.
</span>""" % (template, recID)
############### FIXME: REMOVE WHEN MIGRATION IS DONE ###############
path = "%s%s%s" % (CFG_BIBFORMAT_TEMPLATES_PATH, os.sep, template)
if template is None or not os.access(path, os.R_OK):
# template not found in new BibFormat. Call old one
if verbose == 9:
if template is None:
out += """\n<br/><span class="quicknote">
No template found for output format %s and record %i.
(Check invenio.err log file for more details)
</span>""" % (of, recID)
else:
out += """\n<br/><span class="quicknote">
Template %s could not be read.
</span>""" % (template)
if CFG_PATH_PHP:
if verbose == 9:
out += """\n<br/><span class="quicknote">
Using old BibFormat for record %s.
</span>""" % recID
return out + call_old_bibformat(recID, format=of, on_the_fly=True, verbose=verbose)
############################# END ##################################
error = get_msgs_for_code_list([("ERR_BIBFORMAT_NO_TEMPLATE_FOUND", of)],
stream='error', ln=cdslang)
errors_.append(error)
if verbose == 0:
register_errors(error, 'error')
elif verbose > 5:
return out + error[0][1]
return out
# Format with template
(out_, errors) = format_with_format_template(template, bfo, verbose)
errors_.extend(errors)
out += out_
return out
def decide_format_template(bfo, of):
"""
Returns the format template name that should be used for formatting
given output format and BibFormatObject.
Looks at the rules of 'of' and takes the first one that matches.
If no rule matches, returns the default template, or None if no
default is defined.
To match, we ignore letter case and spaces before and after the value
of the rule and the value of the record.
@param bfo a BibFormatObject
@param of the code of the output format to use
"""
output_format = get_output_format(of)
for rule in output_format['rules']:
value = bfo.field(rule['field']).strip()#Remove spaces
pattern = rule['value'].strip() #Remove spaces
match_obj = re.match(pattern, value, re.IGNORECASE)
if match_obj is not None and \
match_obj.start() == 0 and match_obj.end() == len(value):
return rule['template']
template = output_format['default']
if template != '':
return template
else:
return None
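decide_format_template treats a rule as matching only when the pattern covers the whole field value, hence the extra start()/end() check on top of re.match. That check in isolation, with illustrative values:

```python
import re

def rule_matches(pattern, value):
    # strip spaces on both sides, then require a full, case-insensitive match
    pattern = pattern.strip()
    value = value.strip()
    match_obj = re.match(pattern, value, re.IGNORECASE)
    return match_obj is not None and \
           match_obj.start() == 0 and match_obj.end() == len(value)
```

For example `rule_matches("PREPRINT", " preprint ")` holds, while `rule_matches("PRE", "PREPRINT")` does not, because the latter only matches a prefix of the value.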
def format_with_format_template(format_template_filename, bfo,
verbose=0, format_template_code=None):
""" Format a record given a
format template. Also returns errors
Returns a formatted version of the record represented by bfo,
in the language specified in bfo, and with the specified format template.
If format_template_code is provided, the template will not be loaded from
format_template_filename (but format_template_filename will still be used to
determine whether a bft or xsl transformation applies). This allows previewing
format code without having to save the file on disk.
@param format_template_filename the filename of a format template
@param bfo the object containing parameters for the current formatting
@param format_template_code if not empty, use code as template instead of reading format_template_filename (used for previews)
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return tuple (formatted text, errors)
"""
_ = gettext_set_language(bfo.lang)
def translate(match):
"""
Translate matching values
"""
word = match.group("word")
translated_word = _(word)
return translated_word
errors_ = []
if format_template_code is not None:
format_content = str(format_template_code)
else:
format_content = get_format_template(format_template_filename)['code']
if format_template_filename is None or \
format_template_filename.endswith("."+CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION):
# .bft
filtered_format = filter_languages(format_content, bfo.lang)
localized_format = translation_pattern.sub(translate, filtered_format)
(evaluated_format, errors) = eval_format_template_elements(localized_format,
bfo,
verbose)
errors_ = errors
else:
#.xsl
# Fetch MARCXML. Build 'xm' on the fly unless we are currently formatting in 'xm'
xml_record = record_get_xml(bfo.recID, 'xm', on_the_fly=(bfo.format != 'xm'))
# Transform MARCXML using stylesheet
evaluated_format = format(xml_record, template_source=format_content)
return (evaluated_format, errors_)
def eval_format_template_elements(format_template, bfo, verbose=0):
"""
Evaluates the format elements of the given template and replaces each element with its value.
Also returns errors.
Prepares the format template content so that each MARC code can be directly replaced with its value.
This implies: 1) Look for special tags
2) replace special tags by their evaluation
@param format_template the format template code
@param bfo the object containing parameters for the current formatting
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return tuple (result, errors)
"""
errors_ = []
# First define insert_element_code(match), used in re.sub() function
def insert_element_code(match):
"""
Analyses 'match', interprets the corresponding code, and returns the result of the evaluation.
Called by substitution in 'eval_format_template_elements(...)'
@param match a match object corresponding to the special tag that must be interpreted
"""
function_name = match.group("function_name")
try:
format_element = get_format_element(function_name, verbose)
except Exception, e:
if verbose >= 5:
return '<b><span style="color: rgb(255, 0, 0);">' + \
cgi.escape(str(e)).replace('\n', '<br/>') + \
'</span></b>'
if format_element is None:
error = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_RESOLVE_ELEMENT_NAME", function_name)],
stream='error', ln=cdslang)
errors_.append(error)
if verbose >= 5:
return '<b><span style="color: rgb(255, 0, 0);">' + \
error[0][1] + '</span></b>'
else:
params = {}
# Look for function parameters given in format template code
all_params = match.group('params')
if all_params is not None:
function_params_iterator = pattern_function_params.finditer(all_params)
for param_match in function_params_iterator:
name = param_match.group('param')
value = param_match.group('value')
params[name] = value
# Evaluate element with params and return (Do not return errors)
(result, errors) = eval_format_element(format_element,
bfo,
params,
verbose)
errors_.append(errors)
return result
# Substitute special tags in the format by our own text.
# Special tags have the form <BFE_ELEMENT_NAME [param="value"]* />
format = pattern_tag.sub(insert_element_code, format_template)
return (format, errors_)
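The substitution above relies on re.sub accepting a function: each special-tag match is handed to insert_element_code, and whatever it returns replaces the tag in place. The same mechanism with a toy tag syntax and element registry (both made up for the example):

```python
import re

# Toy version of the special-tag syntax, e.g. <BFE_TITLE />
pattern_tag = re.compile(r"<BFE_(?P<function_name>[A-Z_]+)\s*/>")

elements = {'TITLE': lambda: "The Standard Model"}  # stand-in element registry

def insert_element_code(match):
    # look up the element named in the tag and evaluate it
    element = elements.get(match.group('function_name'))
    return element() if element else ""

result = pattern_tag.sub(insert_element_code, "<h1><BFE_TITLE /></h1>")
```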
def eval_format_element(format_element, bfo, parameters={}, verbose=0):
"""
Returns the result of the evaluation of the given format element
name, with given BibFormatObject and parameters. Also returns
the errors of the evaluation.
@param format_element a format element structure as returned by get_format_element
@param bfo a BibFormatObject used for formatting
@param parameters a dict of parameters to be used for formatting. Key is parameter and value is value of parameter
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@return tuple (result, errors)
"""
errors = []
#Load special values given as parameters
prefix = parameters.get('prefix', "")
suffix = parameters.get('suffix', "")
default_value = parameters.get('default', "")
escape = parameters.get('escape', "")
output_text = ''
# 3 possible cases:
# a) format element file is found: we execute it
# b) format element file is not found, but exist in tag table (e.g. bfe_isbn)
# c) format element is totally unknown. Do nothing or report error
if format_element is not None and format_element['type'] == "python":
# a) We found an element with the tag name, of type "python"
# Prepare a dict 'params' to pass as parameter to 'format'
# function of element
params = {}
# Look for parameters defined in format element
# Fill them with specified default values and values
# given as parameters
for param in format_element['attrs']['params']:
name = param['name']
default = param['default']
params[name] = parameters.get(name, default)
# Add BibFormatObject
params['bfo'] = bfo
# Execute function with given parameters and return result.
function = format_element['code']
try:
output_text = apply(function, (), params)
except Exception, e:
name = format_element['attrs']['name']
error = ("ERR_BIBFORMAT_EVALUATING_ELEMENT", name, str(params))
errors.append(error)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
tb = sys.exc_info()[2]
error_string = get_msgs_for_code_list(error,
stream='error',
ln=cdslang)
stack = traceback.format_exception(Exception, e, tb, limit=None)
output_text = '<b><span style="color: rgb(255, 0, 0);">' + \
str(error_string[0][1]) + "".join(stack) + '</span></b> '
# None can be returned when evaluating function
if output_text is None:
output_text = ""
else:
output_text = str(output_text)
# Escaping:
# (1) By default, everything is escaped in mode 1
# (2) If evaluated element has 'escape_values()' function, use
# its returned value as escape mode, and override (1)
# (3) If template has a defined parameter (in allowed values),
# use it, and override (1) and (2)
# (1)
escape_mode = 1
# (2)
escape_function = format_element['escape_function']
if escape_function is not None:
try:
escape_mode = apply(escape_function, (), {'bfo': bfo})
except Exception, e:
error = ("ERR_BIBFORMAT_EVALUATING_ELEMENT_ESCAPE", name)
errors.append(error)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
tb = sys.exc_info()[2]
error_string = get_msgs_for_code_list(error,
stream='error',
ln=cdslang)
output_text += '<b><span style="color: rgb(255, 0, 0);">' + \
str(error_string[0][1]) + '</span></b> '
# (3)
if escape in ['0', '1', '2', '3', '4']:
escape_mode = int(escape)
#If escape is equal to 1, then escape all
# HTML reserved chars.
if escape_mode > 0:
output_text = escape_field(output_text, mode=escape_mode)
# Add prefix and suffix if they have been given as parameters and if
# the evaluation of element is not empty
if output_text.strip() != "":
output_text = prefix + output_text + suffix
# Add the default value if output_text is empty
if output_text == "":
output_text = default_value
return (output_text, errors)
elif format_element is not None and format_element['type'] == "field":
# b) We have not found an element in files that has the tag
# name. Then look for it in the table "tag"
#
#
#
# Load special values given as parameters
separator = parameters.get('separator', "")
nbMax = parameters.get('nbMax', "")
escape = parameters.get('escape', "1") # By default, escape here
# Get the fields tags that have to be printed
tags = format_element['attrs']['tags']
output_text = []
# Get values corresponding to tags
for tag in tags:
p_tag = parse_tag(tag)
values = record_get_field_values(bfo.get_record(),
p_tag[0],
p_tag[1],
p_tag[2],
p_tag[3])
if len(values)>0 and isinstance(values[0], dict):
#flatten dict to its values only
values_list = map(lambda x: x.values(), values)
#output_text.extend(values)
for values in values_list:
output_text.extend(values)
else:
output_text.extend(values)
if nbMax != "":
try:
nbMax = int(nbMax)
output_text = output_text[:nbMax]
except ValueError:
name = format_element['attrs']['name']
error = ("ERR_BIBFORMAT_NBMAX_NOT_INT", name)
errors.append(error)
if verbose < 5:
register_errors(error, 'error')
elif verbose >= 5:
error_string = get_msgs_for_code_list(error,
stream='error',
ln=cdslang)
output_text.append(error_string[0][1])
# Add prefix and suffix if they have been given as parameters and if
# the evaluation of element is not empty.
# If evaluation is empty string, return default value if it exists.
# Else return empty string
if ("".join(output_text)).strip() != "":
# If escape is equal to 1, then escape all
# HTML reserved chars.
if escape == '1':
output_text = cgi.escape(separator.join(output_text))
else:
output_text = separator.join(output_text)
output_text = prefix + output_text + suffix
else:
#Return default value
output_text = default_value
return (output_text, errors)
else:
# c) Element is unknown
error = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_RESOLVE_ELEMENT_NAME", format_element)],
stream='error', ln=cdslang)
errors.append(error)
if verbose < 5:
register_errors(error, 'error')
return ("", errors)
elif verbose >= 5:
if verbose >= 9:
sys.exit(error[0][1])
return ('<b><span style="color: rgb(255, 0, 0);">' + \
error[0][1] + '</span></b>', errors)
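The three-layer escaping decision inside eval_format_element (default mode 1, overridden by the element's escape_values() return value, overridden in turn by the template's escape parameter) can be isolated as a small sketch:

```python
def resolve_escape_mode(template_param="", element_mode=None):
    # (1) default: escape everything
    escape_mode = 1
    # (2) the element's escape_values() result, when present, overrides (1)
    if element_mode is not None:
        escape_mode = element_mode
    # (3) an allowed template parameter overrides both (1) and (2)
    if template_param in ['0', '1', '2', '3', '4']:
        escape_mode = int(template_param)
    return escape_mode
```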
def filter_languages(format_template, ln='en'):
"""
Filters the language tags that do not correspond to the specified language.
@param format_template the format template code
@param ln the language that is NOT filtered out from the template
@return the format template with unnecessary languages filtered out
"""
# First define search_lang_tag(match) and clean_language_tag(match), used
# in re.sub() function
def search_lang_tag(match):
"""
Searches for a <lang>...</lang> tag and removes the inner localized tags
such as <en>, <fr>, that do not match current_lang.
If current_lang cannot be found inside <lang>...</lang>, try to use 'cdslang'
@param match a match object corresponding to the special tag that must be interpreted
"""
current_lang = ln
def clean_language_tag(match):
"""
Return tag text content if tag language of match is output language.
Called by substitution in 'filter_languages(...)'
@param match a match object corresponding to the special tag that must be interpreted
"""
if match.group(1) == current_lang:
return match.group(2)
else:
return ""
# End of clean_language_tag
lang_tag_content = match.group("langs")
# Try to find tag with current lang. If it does not exist,
# then current_lang becomes cdslang until the end of this
# replace
pattern_current_lang = re.compile(r"<("+current_lang+ \
r")\s*>(.*?)(</"+current_lang+r"\s*>)", re.IGNORECASE | re.DOTALL)
if re.search(pattern_current_lang, lang_tag_content) is None:
current_lang = cdslang
cleaned_lang_tag = ln_pattern.sub(clean_language_tag, lang_tag_content)
return cleaned_lang_tag
# End of search_lang_tag
filtered_format_template = pattern_lang.sub(search_lang_tag, format_template)
return filtered_format_template
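filter_languages works as two nested substitutions: an outer pattern isolates each language block, and an inner one keeps only the tag matching the requested language. A self-contained sketch with assumed `<lang>`/`<en>`-style tag names (the real patterns, pattern_lang and ln_pattern, are defined elsewhere in this module):

```python
import re

pattern_lang = re.compile(r"<lang>(?P<langs>.*?)</lang>", re.DOTALL)
ln_pattern = re.compile(r"<(\w{2})\s*>(.*?)</\1\s*>", re.DOTALL)

def filter_languages_sketch(template, ln='en'):
    def search_lang_tag(match):
        def clean_language_tag(m):
            # keep the tag's content only when its language is the output language
            return m.group(2) if m.group(1) == ln else ""
        return ln_pattern.sub(clean_language_tag, match.group("langs"))
    return pattern_lang.sub(search_lang_tag, template)
```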
def get_format_template(filename, with_attributes=False):
"""
Returns the structured content of the given format template.
If 'with_attributes' is true, returns the name and description. Else 'attrs' is not
returned as key in dictionary (though it might be, if it has already been loaded previously)
{'code':"Some template code"
'attrs': {'name': "a name", 'description': "a description"}
}
@param filename the filename of a format template
@param with_attributes if True, fetch the attributes (name and description) of the format template
@return structured content of format template
"""
# Get from cache whenever possible
global format_templates_cache
if not filename.endswith("."+CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION) and \
not filename.endswith(".xsl"):
return None
if format_templates_cache.has_key(filename):
# If we must return with attributes and template exist in
# cache with attributes then return cache.
# Else reload with attributes
if with_attributes and \
format_templates_cache[filename].has_key('attrs'):
return format_templates_cache[filename]
format_template = {'code':""}
try:
path = "%s%s%s" % (CFG_BIBFORMAT_TEMPLATES_PATH, os.sep, filename)
format_file = open(path)
format_content = format_file.read()
format_file.close()
# Load format template code
# Remove name and description
if filename.endswith("."+CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION):
code_and_description = pattern_format_template_name.sub("",
format_content)
code = pattern_format_template_desc.sub("", code_and_description)
else:
code = format_content
format_template['code'] = code
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE", filename, str(e))],
stream='error', ln=cdslang)
register_errors(errors, 'error')
# Save attributes if necessary
if with_attributes:
format_template['attrs'] = get_format_template_attrs(filename)
# Cache and return
format_templates_cache[filename] = format_template
return format_template
def get_format_templates(with_attributes=False):
"""
Returns the list of all format templates, as a dictionary with filenames as keys.
If 'with_attributes' is true, returns the name and description. Else 'attrs' is not
returned as key in each dictionary (though it might be, if it has already been loaded previously)
[{'code':"Some template code"
'attrs': {'name': "a name", 'description': "a description"}
},
...
]
@param with_attributes if True, fetch the attributes (names and description) for formats
"""
format_templates = {}
files = os.listdir(CFG_BIBFORMAT_TEMPLATES_PATH)
for filename in files:
if filename.endswith("."+CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION) or \
filename.endswith(".xsl"):
format_templates[filename] = get_format_template(filename,
with_attributes)
return format_templates
def get_format_template_attrs(filename):
"""
Returns the attributes of the format template with given filename
The attributes are {'name', 'description'}
Caution: the function does not check that the path exists or
that the format template is valid.
@param filename the filename of a format template
"""
attrs = {}
attrs['name'] = ""
attrs['description'] = ""
try:
template_file = open("%s%s%s" % (CFG_BIBFORMAT_TEMPLATES_PATH,
os.sep,
filename))
code = template_file.read()
template_file.close()
match = None
if filename.endswith(".xsl"):
# .xsl
attrs['name'] = filename[:-4]
else:
# .bft
match = pattern_format_template_name.search(code)
if match is not None:
attrs['name'] = match.group('name')
else:
attrs['name'] = filename
match = pattern_format_template_desc.search(code)
if match is not None:
attrs['description'] = match.group('desc').rstrip('.')
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_READ_TEMPLATE_FILE",
filename, str(e))],
stream='error', ln=cdslang)
register_errors(errors, 'error')
attrs['name'] = filename
return attrs
def get_format_element(element_name, verbose=0, with_built_in_params=False):
"""
Returns the format element structured content.
Return None if element cannot be loaded (file not found, not readable or
invalid)
The returned structure is {'attrs': {some attributes in dict. See get_format_element_attrs_from_*}
'code': the_function_code,
'type': "field" or "python", depending on whether the element is defined in a file or in the 'tag' table,
'escape_function': the function to call to know if element output must be escaped}
@param element_name the name of the format element to load
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode ))
@param with_built_in_params if True, load the parameters built in all elements
@return a dictionary with format element attributes
"""
# Get from cache whenever possible
global format_elements_cache
errors = []
# Resolve filename and prepare 'name' as key for the cache
filename = resolve_format_element_filename(element_name)
if filename is not None:
name = filename.upper()
else:
name = element_name.upper()
if format_elements_cache.has_key(name):
element = format_elements_cache[name]
if not with_built_in_params or \
(with_built_in_params and \
element['attrs'].has_key('builtin_params')):
return element
if filename is None:
# Element is maybe in tag table
if bibformat_dblayer.tag_exists_for_name(element_name):
format_element = {'attrs': get_format_element_attrs_from_table( \
element_name,
with_built_in_params),
'code':None,
'escape_function':None,
'type':"field"}
# Cache and return
format_elements_cache[name] = format_element
return format_element
else:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_FORMAT_ELEMENT_NOT_FOUND",
element_name)],
stream='error', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
sys.stderr.write(errors[0][1])
return None
else:
format_element = {}
module_name = filename
if module_name.endswith(".py"):
module_name = module_name[:-3]
# Load element
try:
module = __import__(CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH + \
"." + module_name)
# Load last module in import path
# For eg. load bfe_name in
# invenio.bibformat_elements.bfe_name
# Used to keep flexibility regarding where elements
# directory is (for eg. test cases)
components = CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH.split(".")
for comp in components[1:]:
module = getattr(module, comp)
except Exception, e:
# We catch all exceptions here, as we just want to print
# traceback in all cases
tb = sys.exc_info()[2]
stack = traceback.format_exception(Exception, e, tb, limit=None)
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_IN_FORMAT_ELEMENT",
element_name,"\n" + "\n".join(stack[-2:-1]))],
stream='error', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
sys.stderr.write(errors[0][1])
if errors:
if verbose >= 7:
raise Exception, errors[0][1]
return None
# Load function 'format()' inside element
try:
function_format = module.__dict__[module_name].format
format_element['code'] = function_format
except AttributeError, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_FORMAT_ELEMENT_FORMAT_FUNCTION",
element_name)],
stream='warning', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
sys.stderr.write(errors[0][1])
if errors:
if verbose >= 7:
raise Exception, errors[0][1]
return None
# Load function 'escape_values()' inside element
function_escape = getattr(module.__dict__[module_name],
'escape_values',
None)
format_element['escape_function'] = function_escape
# Prepare, cache and return
format_element['attrs'] = get_format_element_attrs_from_function( \
function_format,
element_name,
with_built_in_params)
format_element['type'] = "python"
format_elements_cache[name] = format_element
return format_element
def get_format_elements(with_built_in_params=False):
"""
Returns the list of format elements attributes as dictionary structure
Elements declared in files have priority over elements declared in the 'tag' table
The returned object has this format:
{element_name1: {'attrs': {'description':..., 'seealso':...
'params':[{'name':..., 'default':..., 'description':...}, ...]
'builtin_params':[{'name':..., 'default':..., 'description':...}, ...]
},
'code': code_of_the_element
},
element_name2: {...},
...}
Returns only elements that could be loaded (no errors in their code)
@return a dict of format elements with name as key, and a dict as attributes
@param with_built_in_params if True, load the parameters built in all elements
"""
format_elements = {}
mappings = bibformat_dblayer.get_all_name_tag_mappings()
for name in mappings:
format_elements[name.upper().replace(" ", "_").strip()] = get_format_element(name, with_built_in_params=with_built_in_params)
files = os.listdir(CFG_BIBFORMAT_ELEMENTS_PATH)
for filename in files:
filename_test = filename.upper().replace(" ", "_")
if filename_test.endswith(".PY") and filename.upper() != "__INIT__.PY":
if filename_test.startswith("BFE_"):
filename_test = filename_test[4:]
element_name = filename_test[:-3]
element = get_format_element(element_name,
with_built_in_params=with_built_in_params)
if element is not None:
format_elements[element_name] = element
return format_elements
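get_format_elements derives an element's lookup name from its filename: uppercase, spaces to underscores, drop the BFE_ prefix and the .py suffix. The same normalization as a standalone helper (the function name is mine, not part of this module):

```python
def element_name_from_filename(filename):
    # normalize like get_format_elements: uppercase, "_" for spaces
    name = filename.upper().replace(" ", "_")
    if name.startswith("BFE_"):
        name = name[4:]          # drop the element prefix
    if name.endswith(".PY"):
        name = name[:-3]         # drop the extension
    return name
```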
def get_format_element_attrs_from_function(function, element_name,
with_built_in_params=False):
""" Returns the attributes of the
function given as parameter.
It looks for standard parameters of the function, default
values and comments in the docstring.
The attributes are {'name' : "name of element" #basically the name of 'name' parameter
'description': "a string description of the element",
'seealso' : ["element_1.py", "element_2.py", ...] #a list of related elements
'params': [{'name':"param_name", #a list of parameters for this element (except 'bfo')
'default':"default value",
'description': "a description"}, ...],
'builtin_params': {name: {'name':"param_name",#the parameters builtin for all elem of this kind
'default':"default value",
'description': "a description"}, ...},
}
@param function the formatting function of a format element
@param element_name the name of the element
@param with_built_in_params if True, load the parameters built in all elements
"""
attrs = {}
attrs['description'] = ""
attrs['name'] = element_name.replace(" ", "_").upper()
attrs['seealso'] = []
docstring = function.__doc__
if isinstance(docstring, str):
# Look for function description in docstring
#match = pattern_format_element_desc.search(docstring)
description = docstring.split("@param")[0]
description = description.split("@see")[0]
attrs['description'] = description.strip().rstrip('.')
# Look for @see in docstring
match = pattern_format_element_seealso.search(docstring)
if match is not None:
elements = match.group('see').rstrip('.').split(",")
for element in elements:
attrs['seealso'].append(element.strip())
params = {}
# Look for parameters in function definition
(args, varargs, varkw, defaults) = inspect.getargspec(function)
# Prepare args and defaults_list such that we can have a mapping
# from args to defaults
args.reverse()
if defaults is not None:
defaults_list = list(defaults)
defaults_list.reverse()
else:
defaults_list = []
for arg, default in map(None, args, defaults_list):
if arg == "bfo":
#Don't keep this as a parameter. It is hidden from users, and
#exists in all elements of this kind
continue
param = {}
param['name'] = arg
if default is None:
#In case no check is made inside the element, we prefer to
#print "" (nothing) rather than None in output
param['default'] = ""
else:
param['default'] = default
param['description'] = "(no description provided)"
params[arg] = param
if isinstance(docstring, str):
# Look for @param descriptions in docstring.
# Add description to existing parameters in params dict
params_iterator = pattern_format_element_params.finditer(docstring)
for match in params_iterator:
name = match.group('name')
if params.has_key(name):
params[name]['description'] = match.group('desc').rstrip('.')
attrs['params'] = params.values()
# Load built-in parameters if necessary
if with_built_in_params:
builtin_params = []
# Add 'prefix' parameter
param_prefix = {}
param_prefix['name'] = "prefix"
param_prefix['default'] = ""
param_prefix['description'] = """A prefix printed only if the
record has a value for this element"""
builtin_params.append(param_prefix)
# Add 'suffix' parameter
param_suffix = {}
param_suffix['name'] = "suffix"
param_suffix['default'] = ""
param_suffix['description'] = """A suffix printed only if the
record has a value for this element"""
builtin_params.append(param_suffix)
# Add 'default' parameter
param_default = {}
param_default['name'] = "default"
param_default['default'] = ""
param_default['description'] = """A default value printed if the
record has no value for this element"""
builtin_params.append(param_default)
# Add 'escape' parameter
param_escape = {}
param_escape['name'] = "escape"
param_escape['default'] = ""
param_escape['description'] = """If set to 1, replaces special
characters '&', '<' and '>' of this
element by SGML entities"""
builtin_params.append(param_escape)
attrs['builtin_params'] = builtin_params
return attrs
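The reverse/reverse dance above exists because the defaults tuple returned by getargspec aligns with the *last* arguments of the function; reversing both lists pairs them from the end. The same pairing as a standalone helper (hypothetical name):

```python
def pair_args_with_defaults(args, defaults):
    # defaults apply to the trailing arguments, so pair the two lists from the end
    defaults_list = list(defaults) if defaults is not None else []
    rev_defaults = list(reversed(defaults_list))
    pairs = []
    for i, arg in enumerate(reversed(args)):
        default = rev_defaults[i] if i < len(rev_defaults) else None
        pairs.append((arg, default))
    pairs.reverse()  # restore the original argument order
    return pairs
```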
def get_format_element_attrs_from_table(element_name,
with_built_in_params=False):
"""
Returns the attributes of the format element with given name in 'tag' table.
Returns None if element_name does not exist in tag table.
The attributes are {'name' : "name of element" #basically the name of 'element_name' parameter
'description': "a string description of the element",
'seealso' : [] #a list of related elements. Always empty in this case
'params': [], #a list of parameters for this element. Always empty in this case
'builtin_params': [{'name':"param_name", #the parameters builtin for all elem of this kind
'default':"default value",
'description': "a description"}, ...],
'tags':["950.1", "203.a"] #the list of tags printed by this element
}
@param element_name the name of an element in the database
@param with_built_in_params if True, load the parameters built in all elements
"""
attrs = {}
tags = bibformat_dblayer.get_tags_from_name(element_name)
field_label = "field"
if len(tags)>1:
field_label = "fields"
attrs['description'] = "Prints %s %s of the record" % (field_label,
", ".join(tags))
attrs['name'] = element_name.replace(" ", "_").upper()
attrs['seealso'] = []
attrs['params'] = []
attrs['tags'] = tags
# Load built-in parameters if necessary
if with_built_in_params:
builtin_params = []
# Add 'prefix' parameter
param_prefix = {}
param_prefix['name'] = "prefix"
param_prefix['default'] = ""
param_prefix['description'] = """A prefix printed only if the
record has a value for this element"""
builtin_params.append(param_prefix)
# Add 'suffix' parameter
param_suffix = {}
param_suffix['name'] = "suffix"
param_suffix['default'] = ""
param_suffix['description'] = """A suffix printed only if the
record has a value for this element"""
builtin_params.append(param_suffix)
# Add 'separator' parameter
param_separator = {}
param_separator['name'] = "separator"
param_separator['default'] = " "
param_separator['description'] = """A separator between elements of
the field"""
builtin_params.append(param_separator)
# Add 'nbMax' parameter
param_nbMax = {}
param_nbMax['name'] = "nbMax"
param_nbMax['default'] = ""
param_nbMax['description'] = """The maximum number of values to
print for this element. No limit if not
specified"""
builtin_params.append(param_nbMax)
# Add 'default' parameter
param_default = {}
param_default['name'] = "default"
param_default['default'] = ""
param_default['description'] = """A default value printed if the
record has no value for this element"""
builtin_params.append(param_default)
# Add 'escape' parameter
param_escape = {}
param_escape['name'] = "escape"
param_escape['default'] = ""
param_escape['description'] = """If set to 1, replaces special
characters '&', '<' and '>' of this
element by SGML entities"""
builtin_params.append(param_escape)
attrs['builtin_params'] = builtin_params
return attrs
def get_output_format(code, with_attributes=False, verbose=0):
"""
Returns the structured content of the given output format
If 'with_attributes' is True, also returns the names and descriptions of the output formats,
else 'attrs' is not returned in the dict (it might be, if it has already been loaded previously).
If the output format corresponding to 'code' is not found, returns an empty structure.
See get_output_format_attrs() to learn more on the attributes
{'rules': [ {'field': "980__a",
'value': "PREPRINT",
'template': "filename_a.bft",
},
{...}
],
'attrs': {'names': {'generic':"a name", 'sn':{'en': "a name", 'fr':"un nom"}, 'ln':{'en':"a long name"}}
'description': "a description"
'code': "fnm1",
'content_type': "application/ms-excel",
'visibility': 1
}
'default':"filename_b.bft"
}
@param code the code of an output_format
@param with_attributes if True, fetch the attributes (names and description) for format
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode))
@return structured content of output format
"""
output_format = {'rules':[], 'default':""}
filename = resolve_output_format_filename(code, verbose)
if filename is None:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_OUTPUT_FORMAT_CODE_UNKNOWN", code)],
stream='error', ln=cdslang)
register_errors(errors, 'error')
if with_attributes: #Create empty attrs if asked for attributes
output_format['attrs'] = get_output_format_attrs(code, verbose)
return output_format
# Get from cache whenever possible
global format_outputs_cache
if format_outputs_cache.has_key(filename):
# If we must return attributes but the cached copy has
# none, load them now
if with_attributes and not \
format_outputs_cache[filename].has_key('attrs'):
format_outputs_cache[filename]['attrs'] = get_output_format_attrs(code, verbose)
return format_outputs_cache[filename]
try:
if with_attributes:
output_format['attrs'] = get_output_format_attrs(code, verbose)
path = "%s%s%s" % (CFG_BIBFORMAT_OUTPUTS_PATH, os.sep, filename )
format_file = open(path)
current_tag = ''
for line in format_file:
line = line.strip()
if line == "":
# Ignore blank lines
continue
if line.endswith(":"):
# Retrieve tag
# Remove trailing ':', spaces and end-of-line characters
clean_line = line.rstrip(": \n\r")
# The tag starts at second position
current_tag = "".join(clean_line.split()[1:]).strip()
elif line.find('---') != -1:
words = line.split('---')
template = words[-1].strip()
condition = ''.join(words[:-1])
value = ""
output_format['rules'].append({'field': current_tag,
'value': condition,
'template': template,
})
elif line.find(':') != -1:
# Default case
default = line.split(':')[1].strip()
output_format['default'] = default
except Exception, e:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_READ_OUTPUT_FILE", filename, str(e))],
stream='error', ln=cdslang)
register_errors(errors, 'error')
# Cache and return
format_outputs_cache[filename] = output_format
return output_format
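The rule-parsing loop above can be exercised in isolation. Below is a minimal standalone sketch of the same `.bfo` syntax handling (`parse_output_format` is a hypothetical helper, not part of the engine); unlike the engine, it strips whitespace from the condition value:

```python
def parse_output_format(lines):
    """Parse .bfo-style lines into the rules structure built above.

    Hypothetical standalone helper: the condition value is stripped of
    surrounding whitespace, which the engine leaves as-is.
    """
    output_format = {'rules': [], 'default': ""}
    current_tag = ''
    for line in lines:
        line = line.strip()
        if line == "":
            continue                      # ignore blank lines
        if line.endswith(":"):
            # "tag 980__a:" -> the tag starts at the second word
            clean_line = line.rstrip(": \n\r")
            current_tag = "".join(clean_line.split()[1:]).strip()
        elif '---' in line:
            # "PREPRINT --- PREPRINT.bft" -> condition and template
            words = line.split('---')
            output_format['rules'].append({'field': current_tag,
                                           'value': ''.join(words[:-1]).strip(),
                                           'template': words[-1].strip()})
        elif ':' in line:
            # "default: HD.bft" -> fallback template
            output_format['default'] = line.split(':')[1].strip()
    return output_format
```

A rule thus pairs the most recently seen `tag` line with each `condition --- template` line that follows it.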
def get_output_format_attrs(code, verbose=0):
"""
Returns the attributes of an output format.
The attributes contain 'code', which is the short identifier of the output format
(to be given as parameter in format_record function to specify the output format),
'description', a description of the output format, 'visibility', the visibility of
the format in the output format list on public pages, and 'names', the localized names
of the output format. If 'content_type' is specified, then the search_engine will
send the user a file with this content type, containing the result of the formatting.
The 'names' dict always contains the 'generic', 'ln' (for long names) and 'sn' (for
short names) keys. 'generic' is the default name for the output format. 'ln' and 'sn'
contain long and short localized names of the output format. Only the languages for
which a localization exists are used.
{'names': {'generic':"a name", 'sn':{'en': "a name", 'fr':"un nom"}, 'ln':{'en':"a long name"}}
'description': "a description"
'code': "fnm1",
'content_type': "application/ms-excel",
'visibility': 1
}
@param code the short identifier of the format
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode))
@return structured content of output format attributes
"""
if code.endswith("."+CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION):
code = code[:-(len(CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION) + 1)]
attrs = {'names':{'generic':"",
'ln':{},
'sn':{}},
'description':'',
'code':code.upper(),
'content_type':"",
'visibility':1}
filename = resolve_output_format_filename(code, verbose)
if filename is None:
return attrs
attrs['names'] = bibformat_dblayer.get_output_format_names(code)
attrs['description'] = bibformat_dblayer.get_output_format_description(code)
attrs['content_type'] = bibformat_dblayer.get_output_format_content_type(code)
attrs['visibility'] = bibformat_dblayer.get_output_format_visibility(code)
return attrs
def get_output_formats(with_attributes=False):
"""
Returns all output formats, as a dictionary with their filenames as keys.
If 'with_attributes' is True, also returns the names and descriptions of the output formats,
else 'attrs' is not returned in the dicts (it might be, if it has already been loaded previously).
See get_output_format_attrs() to learn more on the attributes
{'filename_1.bfo': {'rules': [ {'field': "980__a",
'value': "PREPRINT",
'template': "filename_a.bft",
},
{...}
],
'attrs': {'names': {'generic':"a name", 'sn':{'en': "a name", 'fr':"un nom"}, 'ln':{'en':"a long name"}}
'description': "a description"
'code': "fnm1"
}
'default':"filename_b.bft"
},
'filename_2.bfo': {...},
...
}
@return the dictionary of output formats, keyed by filename
"""
output_formats = {}
files = os.listdir(CFG_BIBFORMAT_OUTPUTS_PATH)
for filename in files:
if filename.endswith("."+CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION):
code = "".join(filename.split(".")[:-1])
output_formats[filename] = get_output_format(code, with_attributes)
return output_formats
def get_kb_mapping(kb, string, default=""):
"""
Returns the value of 'string' in the knowledge base 'kb'.
If 'kb' does not exist or 'string' does not exist in 'kb', returns the
'default' string value.
@param kb a knowledge base name
@param string a key in a knowledge base
@param default a default value if 'string' is not in 'kb'
@return the value corresponding to the given string in given kb
"""
global kb_mappings_cache
if kb_mappings_cache.has_key(kb):
kb_cache = kb_mappings_cache[kb]
if kb_cache.has_key(string):
value = kb_mappings_cache[kb][string]
if value is None:
return default
else:
return value
else:
# Precreate for caching this kb
kb_mappings_cache[kb] = {}
value = bibformat_dblayer.get_kb_mapping_value(kb, string)
kb_mappings_cache[kb][str(string)] = value
if value is None:
return default
else:
return value
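The caching pattern above (one dict per knowledge base, with None cached for misses so the database is only hit once per key) can be sketched independently; `make_kb_lookup` is a hypothetical helper and `fetch` stands in for `bibformat_dblayer.get_kb_mapping_value`:

```python
def make_kb_lookup(fetch):
    """Build a cached knowledge-base lookup (hypothetical helper;
    'fetch(kb, key)' stands in for the database layer and returns
    None for missing keys)."""
    cache = {}
    def lookup(kb, key, default=""):
        kb_cache = cache.setdefault(kb, {})
        if key not in kb_cache:
            # Cache misses too (as None), so the db is hit only once per key
            kb_cache[key] = fetch(kb, key)
        value = kb_cache[key]
        return default if value is None else value
    return lookup
```

Caching the None result is what keeps repeated lookups of a missing key from generating repeated database queries.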
def resolve_format_element_filename(string):
"""
Returns the filename of the element corresponding to 'string'.
This is necessary since format template code calls
elements case-insensitively.
It is also recommended that format element filenames be
prefixed with bfe_ ; we need to look for these too.
The element name may be given with or without the "BFE_" prefix.
@param string a name for a format element
@return the corresponding filename, with the right case
"""
if not string.endswith(".py"):
name = string.replace(" ", "_").upper() +".PY"
else:
name = string.replace(" ", "_").upper()
files = os.listdir(CFG_BIBFORMAT_ELEMENTS_PATH)
for filename in files:
test_filename = filename.replace(" ", "_").upper()
if test_filename == name or \
test_filename == "BFE_" + name or \
"BFE_" + test_filename == name:
return filename
# No element with that name found
# Do not log error, as it might be a normal execution case:
# element can be in database
return None
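The matching rules above (case-insensitive, spaces folded to underscores, optional `bfe_` prefix on either side) can be captured in a small standalone sketch; `match_element_filename` is a hypothetical helper operating on a plain list instead of `os.listdir`:

```python
def match_element_filename(name, filenames):
    """Case-insensitively match an element name against candidate
    filenames, tolerating an optional 'bfe_' prefix on either side
    (hypothetical helper mirroring the lookup above)."""
    if not name.endswith(".py"):
        name = name.replace(" ", "_").upper() + ".PY"
    else:
        name = name.replace(" ", "_").upper()
    for filename in filenames:
        test = filename.replace(" ", "_").upper()
        # Match exactly, or with 'BFE_' added to either side
        if test == name or test == "BFE_" + name or "BFE_" + test == name:
            return filename
    return None
```

So "test 1", "TEST_1.py" and "bfe_test_1" all resolve to the same file.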
def resolve_output_format_filename(code, verbose=0):
"""
Returns the filename of the output format corresponding to 'code'.
This is necessary since output format names are not case sensitive
but most file systems are.
@param code the code for an output format
@param verbose the level of verbosity from 0 to 9 (0: silent,
5: errors,
7: errors and warnings,
9: errors and warnings, stop if error (debug mode))
@return the corresponding filename, with the right case, or None if not found
"""
#Remove non alphanumeric chars (except .)
code = re.sub(r"[^.0-9a-zA-Z]", "", code)
if not code.endswith("."+CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION):
code = re.sub(r"\W", "", code)
code += "."+CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION
files = os.listdir(CFG_BIBFORMAT_OUTPUTS_PATH)
for filename in files:
if filename.upper() == code.upper():
return filename
# No output format with that name found
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_CANNOT_RESOLVE_OUTPUT_NAME", code)],
stream='error', ln=cdslang)
if verbose == 0:
register_errors(errors, 'error')
elif verbose >= 5:
sys.stderr.write(errors[0][1])
if verbose >= 9:
sys.exit(errors[0][1])
return None
def get_fresh_format_template_filename(name):
"""
Returns a new filename and name for a template with the given name.
Used when writing a new template to a file, so that the name
has no spaces and is unique in the template directory.
Returns (unique_filename, modified_name)
@param name a name for a format template
@return the corresponding filename, and the modified name if necessary
"""
#name = re.sub(r"\W", "", name) #Remove non alphanumeric chars
name = name.replace(" ", "_")
filename = name
# Remove non alphanumeric chars (except .)
filename = re.sub(r"[^.0-9a-zA-Z]", "", filename)
path = CFG_BIBFORMAT_TEMPLATES_PATH + os.sep + filename \
+ "." + CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION
index = 1
while os.path.exists(path):
index += 1
filename = name + str(index)
path = CFG_BIBFORMAT_TEMPLATES_PATH + os.sep + filename \
+ "." + CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION
if index > 1:
returned_name = (name + str(index)).replace("_", " ")
else:
returned_name = name.replace("_", " ")
return (filename + "." + CFG_BIBFORMAT_FORMAT_TEMPLATE_EXTENSION,
returned_name) #filename.replace("_", " "))
def get_fresh_output_format_filename(code):
"""
Returns a new filename for an output format with the given code.
Used when writing a new output format to a file, so that the code
has no spaces and is unique in the output format directory. The filename
also needs to be at most 6 chars long, as the convention is that
filename == output format code (+ .extension)
We return an uppercase code.
Returns (unique_filename, modified_code)
@param code the code of an output format
@return the corresponding filename, and the modified code if necessary
"""
#code = re.sub(r"\W", "", code) #Remove non alphanumeric chars
code = code.upper().replace(" ", "_")
# Remove non alphanumeric chars (except .)
code = re.sub(r"[^.0-9a-zA-Z]", "", code)
if len(code) > 6:
code = code[:6]
filename = code
path = CFG_BIBFORMAT_OUTPUTS_PATH + os.sep + filename \
+ "." + CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION
index = 2
while os.path.exists(path):
filename = code + str(index)
if len(filename) > 6:
filename = code[:-(len(str(index)))]+str(index)
index += 1
path = CFG_BIBFORMAT_OUTPUTS_PATH + os.sep + filename \
+ "." + CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION
# Sanity check: we should never need anywhere near 99999 attempts
if index >= 99999:
errors = get_msgs_for_code_list([("ERR_BIBFORMAT_NB_OUTPUTS_LIMIT_REACHED", code)],
stream='error', ln=cdslang)
register_errors(errors, 'error')
sys.exit("Output format cannot be named as %s"%code)
return (filename + "." + CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION, filename)
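The collision handling above, which truncates the code so that code plus counter stays within 6 characters, can be sketched without touching the file system; `fresh_code` is a hypothetical helper and `existing` stands in for the directory listing:

```python
import re

def fresh_code(code, existing):
    """Derive a unique output format code of at most 6 characters
    (hypothetical helper; 'existing' stands in for the set of codes
    already present in the output format directory)."""
    code = code.upper().replace(" ", "_")
    # Remove non-alphanumeric chars (except '.'); note this also
    # strips the underscores just inserted, as in the engine
    code = re.sub(r"[^.0-9a-zA-Z]", "", code)[:6]
    filename = code
    index = 2
    while filename in existing:
        filename = code + str(index)
        if len(filename) > 6:
            # Truncate the code so code + counter fits in 6 chars
            filename = code[:-len(str(index))] + str(index)
        index += 1
    return filename
```

For example, a clash on a full 6-character code replaces its last character with the counter rather than growing past the limit.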
def clear_caches():
"""
Clear the caches (Output Format, Format Templates and Format Elements)
"""
global format_templates_cache, format_elements_cache , \
format_outputs_cache, kb_mappings_cache
format_templates_cache = {}
format_elements_cache = {}
format_outputs_cache = {}
kb_mappings_cache = {}
class BibFormatObject:
"""
An object that encapsulates a record and associated methods, and that is
passed as a parameter to every format element's 'format' function.
The object is made specifically for a given formatting, i.e. it includes
for example the language of the formatting.
The object provides basic accessors to the record. For full access, one can get
the record with get_record() and then use BibRecord methods on the returned object.
"""
# The record
record = None
# The language in which the formatting has to be done
lang = cdslang
# A list of strings describing the context in which the record has
# to be formatted.
# It represents the words of the user request in the web interface search
search_pattern = []
# The id of the record
recID = 0
uid = None # DEPRECATED: use bfo.user_info['uid'] instead
# The information about the user, as returned by
# 'webuser.collect_user_info(req)'
user_info = None
# The format in which the record is being formatted
format = ''
req = None # DEPRECATED: use bfo.user_info instead
def __init__(self, recID, ln=cdslang, search_pattern=[],
xml_record=None, user_info=None, format=''):
"""
Creates a new bibformat object for the given record.
You can either specify a record ID to format, or give its XML representation.
If 'xml_record' is not None, it is used instead of 'recID'.
'user_info' allows granting access to some functionalities on
a page depending on the user's privileges. It is a dictionary
of the following form:
user_info = {
'remote_ip' : '',
'remote_host' : '',
'referer' : '',
'uri' : '',
'agent' : '',
'apache_user' : '',
'apache_group' : [],
'uid' : -1,
'nickname' : '',
'email' : '',
'group' : [],
'guest' : '1'
}
@param recID the id of a record
@param ln the language in which the record has to be formatted
@param search_pattern list of strings representing the request used by the user in the web interface
@param xml_record an XML string of the record to format
@param user_info the information of the user who will view the formatted page
@param format the format used for formatting this record
"""
if xml_record is not None:
# If record is given as parameter
self.record = create_record(xml_record)[0]
recID = record_get_field_value(self.record, "001")
self.lang = wash_language(ln)
self.search_pattern = search_pattern
self.recID = recID
self.format = format
self.user_info = user_info
if self.user_info is None:
self.user_info = collect_user_info(None)
def get_record(self):
"""
Returns the record of this BibFormatObject instance
@return the record structure as returned by BibRecord
"""
# Create record if necessary
if self.record is None:
# on-the-fly creation if current output is xm
record = create_record(record_get_xml(self.recID, 'xm',
on_the_fly=(self.format.lower() == 'xm')))
self.record = record[0]
return self.record
def control_field(self, tag, escape=0):
"""
Returns the value of the control field given by 'tag' in the record.
@param tag the MARC code of a field
@param escape 1 if the returned value should be escaped, else 0
@return value of field 'tag' in the record
"""
if self.get_record() is None:
#Case where BibRecord could not parse object
return ''
p_tag = parse_tag(tag)
field_value = record_get_field_value(self.get_record(),
p_tag[0],
p_tag[1],
p_tag[2],
p_tag[3])
if escape == 0:
return field_value
else:
return escape_field(field_value, escape)
def field(self, tag, escape=0):
"""
Returns the value of the field corresponding to tag in the
current record.
If the value does not exist, returns an empty string.
The 'escape' parameter allows escaping special characters
of the field. The value of escape can be:
0 - no escaping
1 - escape all HTML characters
2 - escape all HTML characters by default. If the field starts with <!--HTML-->,
escape only unsafe characters, but leave basic HTML tags.
@param tag the MARC code of a field
@param escape 1 if the returned value should be escaped, else 0 (see above for other modes)
@return value of field tag in record
"""
list_of_fields = self.fields(tag)
if len(list_of_fields) > 0:
# Escaping below
if escape == 0:
return list_of_fields[0]
else:
return escape_field(list_of_fields[0], escape)
else:
return ""
def fields(self, tag, escape=0, repeatable_subfields_p=False):
"""
Returns the list of values corresponding to "tag".
If tag has an undefined subcode (such as 999C5),
the function returns a list of dictionaries, whose keys
are the subcodes and whose values are the values of tag.subcode.
If the tag has a subcode, simply returns the list of values
corresponding to tag.
Eg. for given MARC:
999C5 $a value_1a $b value_1b
999C5 $b value_2b
999C5 $b value_3b $b value_3b_bis
>> bfo.fields('999C5b')
>> ['value_1b', 'value_2b', 'value_3b', 'value_3b_bis']
>> bfo.fields('999C5')
>> [{'a':'value_1a', 'b':'value_1b'},
{'b':'value_2b'},
{'b':'value_3b'}]
By default the function returns only one value for each
subfield (that is, it considers that repeatable subfields are
not allowed). This is why in the above example 'value_3b_bis' is
not shown for bfo.fields('999C5'). (Note that it is not
defined which of value_3b or value_3b_bis is returned.) This
is to simplify the use of the function, as most of the time
subfields are not repeatable (this way we get a string
instead of a list). You can allow repeatable subfields by
setting the 'repeatable_subfields_p' parameter to True. In
this mode, the above example would return:
>> bfo.fields('999C5b', repeatable_subfields_p=True)
>> ['value_1b', 'value_2b', 'value_3b']
>> bfo.fields('999C5', repeatable_subfields_p=True)
>> [{'a':['value_1a'], 'b':['value_1b']},
{'b':['value_2b']},
{'b':['value_3b', 'value_3b_bis']}]
NOTICE THAT THE RETURNED STRUCTURE IS DIFFERENT. Also note
that whatever the value of 'repeatable_subfields_p' is,
bfo.fields('999C5b') always shows all fields, even repeatable
ones. This is because the parameter has no impact on the
returned structure (it is always a list).
The 'escape' parameter allows escaping special characters
of the fields. The value of escape can be:
0 - no escaping
1 - escape all HTML characters
2 - escape all dangerous HTML tags
3 - mix of modes 1 and 2: if the value of the field starts with
<!--HTML-->, then use mode 2, else use mode 1
4 - remove all HTML tags
@param tag the MARC code of a field
@param escape 1 if the returned values should be escaped, else 0
@param repeatable_subfields_p if True, returns the list of subfields in the dictionary
@return values of field tag in record
"""
if self.get_record() is None:
# Case where BibRecord could not parse object
return []
p_tag = parse_tag(tag)
if p_tag[3] != "":
# Subcode has been defined. Simply returns list of values
values = record_get_field_values(self.get_record(),
p_tag[0],
p_tag[1],
p_tag[2],
p_tag[3])
if escape == 0:
return values
else:
return [escape_field(value, escape) for value in values]
else:
# Subcode is undefined. Returns list of dicts.
# However it might be the case of a control field.
instances = record_get_field_instances(self.get_record(),
p_tag[0],
p_tag[1],
p_tag[2])
if repeatable_subfields_p:
list_of_instances = []
for instance in instances:
instance_dict = {}
for subfield in instance[0]:
if not instance_dict.has_key(subfield[0]):
instance_dict[subfield[0]] = []
if escape == 0:
instance_dict[subfield[0]].append(subfield[1])
else:
instance_dict[subfield[0]].append(escape_field(subfield[1], escape))
list_of_instances.append(instance_dict)
return list_of_instances
else:
if escape == 0:
return [dict(instance[0]) for instance in instances]
else:
return [dict([ (subfield[0], escape_field(subfield[1], escape)) \
for subfield in instance[0] ]) \
for instance in instances]
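The two return shapes of `fields()` can be illustrated with a standalone sketch that groups `(subcode, value)` pairs the same two ways; `group_subfields` is a hypothetical helper, with 'instances' standing in for the BibRecord field instances:

```python
def group_subfields(instances, repeatable=False):
    """Group MARC field instances into dicts keyed by subfield code,
    mirroring the two return shapes of bfo.fields() above
    (hypothetical helper; each instance is a list of (code, value)
    pairs)."""
    if not repeatable:
        # One value per subfield code; for a repeated code,
        # dict() keeps the last value seen
        return [dict(instance) for instance in instances]
    grouped = []
    for instance in instances:
        subfields = {}
        for code, value in instance:
            # Collect every value, so repeated codes become lists
            subfields.setdefault(code, []).append(value)
        grouped.append(subfields)
    return grouped
```

The non-repeatable shape maps codes to strings; the repeatable shape maps codes to lists, which is why callers must know which mode they asked for.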
def kb(self, kb, string, default=""):
"""
Returns the value of "string" in the knowledge base "kb".
If 'kb' does not exist or 'string' does not exist in 'kb',
returns the 'default' string, or an empty string if not specified.
@param kb a knowledge base name
@param string the string we want to translate
@param default a default value returned if 'string' not found in 'kb'
"""
if string is None:
return default
val = get_kb_mapping(kb, string, default)
if val is None:
return default
else:
return val
def escape_field(value, mode=0):
"""
Utility function used to escape the value of a field in the given mode.
- mode 0: no escaping
- mode 1: escaping all HTML/XML characters (escaped chars are shown as escaped)
- mode 2: escaping dangerous HTML tags to avoid XSS, but
keeping basic ones. Escaped tags are removed.
- mode 3: mix of mode 1 and mode 2: if field_value starts with <!--HTML-->,
then use mode 2, else use mode 1
- mode 4: escaping all HTML/XML tags (escaped tags are removed)
"""
if mode == 1:
return cgi.escape(value)
elif mode == 2:
return washer.wash(value,
allowed_attribute_whitelist=['href',
'name',
'class']
)
elif mode == 3:
if value.lstrip(' \n').startswith(html_field):
return washer.wash(value,
allowed_attribute_whitelist=['href',
'name',
'class']
)
else:
return cgi.escape(value)
elif mode == 4:
return washer.wash(value,
allowed_attribute_whitelist=[],
allowed_tag_whitelist=[]
)
else:
return value
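Modes 0 and 1 above can be tried standalone; in this sketch `html.escape` (with `quote=False`) stands in for the legacy `cgi.escape` used by the engine, and the washer-based modes 2-4 are omitted:

```python
import html

def escape_field_sketch(value, mode=0):
    """Sketch of escape modes 0 and 1 only (hypothetical helper;
    html.escape stands in for the legacy cgi.escape)."""
    if mode == 1:
        # Escape '&', '<' and '>' but not quotes, like cgi.escape
        return html.escape(value, quote=False)
    # Mode 0 (and any unhandled mode) returns the value untouched
    return value
```

Mode 1 is what makes field values safe to embed directly in HTML output.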
def bf_profile():
"""
Runs a benchmark
"""
for i in range(1, 51):
format_record(i, "HD", ln=cdslang, verbose=9, search_pattern=[])
return
if __name__ == "__main__":
import profile
import pstats
#bf_profile()
profile.run('bf_profile()', "bibformat_profile")
p = pstats.Stats("bibformat_profile")
p.strip_dirs().sort_stats("cumulative").print_stats()
diff --git a/modules/bibformat/lib/bibformat_engine_tests.py b/modules/bibformat/lib/bibformat_engine_tests.py
index 57cfc0447..a2d921f37 100644
--- a/modules/bibformat/lib/bibformat_engine_tests.py
+++ b/modules/bibformat/lib/bibformat_engine_tests.py
@@ -1,695 +1,695 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Test cases for the BibFormat engine. Also test
some utilities function in bibformat_utils module"""
__revision__ = "$Id$"
# pylint: disable-msg=C0301
import unittest
import os
import sys
from invenio import bibformat_engine
from invenio import bibformat_utils
from invenio import bibformat_config
from invenio import bibformatadminlib
from invenio import bibrecord
-from invenio.config import tmpdir
+from invenio.config import CFG_TMPDIR
#CFG_BIBFORMAT_OUTPUTS_PATH = "..%setc%soutput_formats" % (os.sep, os.sep)
#CFG_BIBFORMAT_TEMPLATES_PATH = "..%setc%sformat_templates" % (os.sep, os.sep)
#CFG_BIBFORMAT_ELEMENTS_PATH = "elements"
-CFG_BIBFORMAT_OUTPUTS_PATH = "%s" % (tmpdir)
-CFG_BIBFORMAT_TEMPLATES_PATH = "%s" % (tmpdir)
-CFG_BIBFORMAT_ELEMENTS_PATH = "%s%stests_bibformat_elements" % (tmpdir, os.sep)
+CFG_BIBFORMAT_OUTPUTS_PATH = "%s" % (CFG_TMPDIR)
+CFG_BIBFORMAT_TEMPLATES_PATH = "%s" % (CFG_TMPDIR)
+CFG_BIBFORMAT_ELEMENTS_PATH = "%s%stests_bibformat_elements" % (CFG_TMPDIR, os.sep)
CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = "tests_bibformat_elements"
class FormatTemplateTest(unittest.TestCase):
""" bibformat - tests on format templates"""
def test_get_format_template(self):
"""bibformat - format template parsing and returned structure"""
bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH = CFG_BIBFORMAT_TEMPLATES_PATH
#Test correct parsing and structure
template_1 = bibformat_engine.get_format_template("Test1.bft", with_attributes=True)
self.assert_(template_1 is not None)
self.assertEqual(template_1['code'], "test")
self.assertEqual(template_1['attrs']['name'], "name_test")
self.assertEqual(template_1['attrs']['description'], "desc_test")
#Test correct parsing and structure of file without description or name
template_2 = bibformat_engine.get_format_template("Test_2.bft", with_attributes=True)
self.assert_(template_2 is not None)
self.assertEqual(template_2['code'], "test")
self.assertEqual(template_2['attrs']['name'], "Test_2.bft")
self.assertEqual(template_2['attrs']['description'], "")
#Test correct parsing and structure of file without description or name
unknown_template = bibformat_engine.get_format_template("test_no_template.test", with_attributes=True)
self.assertEqual(unknown_template, None)
def test_get_format_templates(self):
""" bibformat - loading multiple format templates"""
bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH = CFG_BIBFORMAT_TEMPLATES_PATH
templates = bibformat_engine.get_format_templates(with_attributes=True)
#test correct loading
self.assert_("Test1.bft" in templates.keys())
self.assert_("Test_2.bft" in templates.keys())
self.assert_("Test3.bft" in templates.keys())
self.assert_("Test_no_template.test" not in templates.keys())
#Test correct parsing and structure
self.assertEqual(templates['Test1.bft']['code'], "test")
self.assertEqual(templates['Test1.bft']['attrs']['name'], "name_test")
self.assertEqual(templates['Test1.bft']['attrs']['description'], "desc_test")
def test_get_format_template_attrs(self):
""" bibformat - correct parsing of attributes in format template"""
bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH = CFG_BIBFORMAT_TEMPLATES_PATH
attrs = bibformat_engine.get_format_template_attrs("Test1.bft")
self.assertEqual(attrs['name'], "name_test")
self.assertEqual(attrs['description'], "desc_test")
def test_get_fresh_format_template_filename(self):
""" bibformat - getting fresh filename for format template"""
bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH = CFG_BIBFORMAT_TEMPLATES_PATH
filename_and_name_1 = bibformat_engine.get_fresh_format_template_filename("Test")
self.assert_(len(filename_and_name_1) >= 2)
self.assertEqual(filename_and_name_1[0], "Test.bft")
filename_and_name_2 = bibformat_engine.get_fresh_format_template_filename("Test1")
self.assert_(len(filename_and_name_2) >= 2)
self.assert_(filename_and_name_2[0] != "Test1.bft")
path = bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH + os.sep + filename_and_name_2[0]
self.assert_(not os.path.exists(path))
class FormatElementTest(unittest.TestCase):
""" bibformat - tests on format templates"""
def setUp(self):
# pylint: disable-msg=C0103
"""bibformat - setting python path to test elements"""
- sys.path.append('%s' % tmpdir)
+ sys.path.append('%s' % CFG_TMPDIR)
def test_resolve_format_element_filename(self):
"""bibformat - resolving format elements filename """
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = CFG_BIBFORMAT_ELEMENTS_PATH
#Test element filenames not starting with bfe_, with underscores instead of spaces
filenames = ["test 1", "test 1.py", "bfe_test 1", "bfe_test 1.py", "BFE_test 1",
"BFE_TEST 1", "BFE_TEST 1.py", "BFE_TeST 1.py", "BFE_TeST 1",
"BfE_TeST 1.py", "BfE_TeST 1","test_1", "test_1.py", "bfe_test_1",
"bfe_test_1.py", "BFE_test_1",
"BFE_TEST_1", "BFE_TEST_1.py", "BFE_Test_1.py", "BFE_TeST_1",
"BfE_TeST_1.py", "BfE_TeST_1"]
for i in range(len(filenames)-2):
filename_1 = bibformat_engine.resolve_format_element_filename(filenames[i])
self.assert_(filename_1 is not None)
filename_2 = bibformat_engine.resolve_format_element_filename(filenames[i+1])
self.assertEqual(filename_1, filename_2)
#Test element filenames starting with bfe_, with underscores instead of spaces
filenames = ["test 2", "test 2.py", "bfe_test 2", "bfe_test 2.py", "BFE_test 2",
"BFE_TEST 2", "BFE_TEST 2.py", "BFE_TeST 2.py", "BFE_TeST 2",
"BfE_TeST 2.py", "BfE_TeST 2","test_2", "test_2.py", "bfe_test_2",
"bfe_test_2.py", "BFE_test_2",
"BFE_TEST_2", "BFE_TEST_2.py", "BFE_TeST_2.py", "BFE_TeST_2",
"BfE_TeST_2.py", "BfE_TeST_2"]
for i in range(len(filenames)-2):
filename_1 = bibformat_engine.resolve_format_element_filename(filenames[i])
self.assert_(filename_1 is not None)
filename_2 = bibformat_engine.resolve_format_element_filename(filenames[i+1])
self.assertEqual(filename_1, filename_2)
#Test non existing element
non_existing_element = bibformat_engine.resolve_format_element_filename("BFE_NON_EXISTING_ELEMENT")
self.assertEqual(non_existing_element, None)
def test_get_format_element(self):
"""bibformat - format elements parsing and returned structure"""
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = CFG_BIBFORMAT_ELEMENTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
#Test loading with different kinds of names, for an element with spaces in name, without bfe_
element_1 = bibformat_engine.get_format_element("test 1", with_built_in_params=True)
self.assert_(element_1 is not None)
element_1_bis = bibformat_engine.get_format_element("bfe_tEst_1.py", with_built_in_params=True)
self.assertEqual(element_1, element_1_bis)
#Test loading with different kinds of names, for an element without spaces in name, with bfe_
element_2 = bibformat_engine.get_format_element("test 2", with_built_in_params=True)
self.assert_(element_2 is not None)
element_2_bis = bibformat_engine.get_format_element("bfe_tEst_2.py", with_built_in_params=True)
self.assertEqual(element_2, element_2_bis)
#Test loading incorrect elements
element_3 = bibformat_engine.get_format_element("test 3", with_built_in_params=True)
self.assertEqual(element_3, None)
element_4 = bibformat_engine.get_format_element("test 4", with_built_in_params=True)
self.assertEqual(element_4, None)
unknown_element = bibformat_engine.get_format_element("TEST_NO_ELEMENT", with_built_in_params=True)
self.assertEqual(unknown_element, None)
#Test element without docstring
element_5 = bibformat_engine.get_format_element("test_5", with_built_in_params=True)
self.assert_(element_5 is not None)
self.assertEqual(element_5['attrs']['description'], '')
self.assert_({'name':"param1",
'description':"(no description provided)",
'default':""} in element_5['attrs']['params'] )
self.assertEqual(element_5['attrs']['seealso'], [])
#Test correct parsing:
#Test type of element
self.assertEqual(element_1['type'], "python")
#Test name = element filename, with underscore instead of spaces,
#without BFE_ and uppercase
self.assertEqual(element_1['attrs']['name'], "TEST_1")
#Test description parsing
self.assertEqual(element_1['attrs']['description'], "Prints test")
#Test @see parsing
self.assertEqual(element_1['attrs']['seealso'], ["element2.py", "unknown_element.py"])
#Test @param parsing
self.assert_({'name':"param1",
'description':"desc 1",
'default':""} in element_1['attrs']['params'] )
self.assert_({'name':"param2",
'description':"desc 2",
'default':"default value"} in element_1['attrs']['params'] )
#Test non existing element
non_existing_element = bibformat_engine.get_format_element("BFE_NON_EXISTING_ELEMENT")
self.assertEqual(non_existing_element, None)
def test_get_format_element_attrs_from_function(self):
""" bibformat - correct parsing of attributes in 'format' docstring"""
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = CFG_BIBFORMAT_ELEMENTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
element_1 = bibformat_engine.get_format_element("test 1", with_built_in_params=True)
function = element_1['code']
attrs = bibformat_engine.get_format_element_attrs_from_function(function,
element_1['attrs']['name'],
with_built_in_params=True)
self.assertEqual(attrs['name'], "TEST_1")
#Test description parsing
self.assertEqual(attrs['description'], "Prints test")
#Test @see parsing
self.assertEqual(attrs['seealso'], ["element2.py", "unknown_element.py"])
def test_get_format_elements(self):
"""bibformat - multiple format elements parsing and returned structure"""
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = CFG_BIBFORMAT_ELEMENTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
elements = bibformat_engine.get_format_elements()
self.assert_(isinstance(elements, dict))
self.assertEqual(elements['TEST_1']['attrs']['name'], "TEST_1")
self.assertEqual(elements['TEST_2']['attrs']['name'], "TEST_2")
self.assert_("TEST_3" not in elements.keys())
self.assert_("TEST_4" not in elements.keys())
def test_get_tags_used_by_element(self):
"""bibformat - identification of tag usage inside element"""
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = bibformat_config.CFG_BIBFORMAT_ELEMENTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = bibformat_config.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
tags = bibformatadminlib.get_tags_used_by_element('bfe_abstract.py')
self.failUnless(len(tags) == 4,
'Could not correctly identify tags used in bfe_abstract.py')
class OutputFormatTest(unittest.TestCase):
""" bibformat - tests on output formats"""
def test_get_output_format(self):
""" bibformat - output format parsing and returned structure """
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
filename_1 = bibformat_engine.resolve_output_format_filename("test1")
output_1 = bibformat_engine.get_output_format(filename_1, with_attributes=True)
self.assertEqual(output_1['attrs']['names']['generic'], "")
self.assert_(isinstance(output_1['attrs']['names']['ln'], dict))
self.assert_(isinstance(output_1['attrs']['names']['sn'], dict))
self.assertEqual(output_1['attrs']['code'], "TEST1")
self.assert_(len(output_1['attrs']['code']) <= 6)
self.assertEqual(len(output_1['rules']), 4)
self.assertEqual(output_1['rules'][0]['field'], '980.a')
self.assertEqual(output_1['rules'][0]['template'], 'Picture_HTML_detailed.bft')
self.assertEqual(output_1['rules'][0]['value'], 'PICTURE ')
self.assertEqual(output_1['rules'][1]['field'], '980.a')
self.assertEqual(output_1['rules'][1]['template'], 'Article.bft')
self.assertEqual(output_1['rules'][1]['value'], 'ARTICLE')
self.assertEqual(output_1['rules'][2]['field'], '980__a')
self.assertEqual(output_1['rules'][2]['template'], 'Thesis_detailed.bft')
self.assertEqual(output_1['rules'][2]['value'], 'THESIS ')
self.assertEqual(output_1['rules'][3]['field'], '980__a')
self.assertEqual(output_1['rules'][3]['template'], 'Pub.bft')
self.assertEqual(output_1['rules'][3]['value'], 'PUBLICATION ')
filename_2 = bibformat_engine.resolve_output_format_filename("TEST2")
output_2 = bibformat_engine.get_output_format(filename_2, with_attributes=True)
self.assertEqual(output_2['attrs']['names']['generic'], "")
self.assert_(isinstance(output_2['attrs']['names']['ln'], dict))
self.assert_(isinstance(output_2['attrs']['names']['sn'], dict))
self.assertEqual(output_2['attrs']['code'], "TEST2")
self.assert_(len(output_2['attrs']['code']) <= 6)
self.assertEqual(output_2['rules'], [])
unknown_output = bibformat_engine.get_output_format("unknow", with_attributes=True)
self.assertEqual(unknown_output, {'rules':[],
'default':"",
'attrs':{'names':{'generic':"", 'ln':{}, 'sn':{}},
'description':'',
'code':"UNKNOW",
'visibility': 1,
'content_type':""}})
def test_get_output_formats(self):
""" bibformat - loading multiple output formats """
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
outputs = bibformat_engine.get_output_formats(with_attributes=True)
self.assert_(isinstance(outputs, dict))
self.assert_("TEST1.bfo" in outputs.keys())
self.assert_("TEST2.bfo" in outputs.keys())
self.assert_("unknow.bfo" not in outputs.keys())
#Test correct parsing
output_1 = outputs["TEST1.bfo"]
self.assertEqual(output_1['attrs']['names']['generic'], "")
self.assert_(isinstance(output_1['attrs']['names']['ln'], dict))
self.assert_(isinstance(output_1['attrs']['names']['sn'], dict))
self.assertEqual(output_1['attrs']['code'], "TEST1")
self.assert_(len(output_1['attrs']['code']) <= 6)
def test_get_output_format_attrs(self):
""" bibformat - correct parsing of attributes in output format"""
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
attrs= bibformat_engine.get_output_format_attrs("TEST1")
self.assertEqual(attrs['names']['generic'], "")
self.assert_(isinstance(attrs['names']['ln'], dict))
self.assert_(isinstance(attrs['names']['sn'], dict))
self.assertEqual(attrs['code'], "TEST1")
self.assert_(len(attrs['code']) <= 6)
def test_resolve_output_format(self):
""" bibformat - resolving output format filename"""
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
filenames = ["test1", "test1.bfo", "TEST1", "TeST1", "TEST1.bfo", "test1"]
for i in range(len(filenames)-2):
filename_1 = bibformat_engine.resolve_output_format_filename(filenames[i])
self.assert_(filename_1 is not None)
filename_2 = bibformat_engine.resolve_output_format_filename(filenames[i+1])
self.assertEqual(filename_1, filename_2)
def test_get_fresh_output_format_filename(self):
""" bibformat - getting fresh filename for output format"""
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
filename_and_name_1 = bibformat_engine.get_fresh_output_format_filename("test")
self.assert_(len(filename_and_name_1) >= 2)
self.assertEqual(filename_and_name_1[0], "TEST.bfo")
filename_and_name_1_bis = bibformat_engine.get_fresh_output_format_filename("")
self.assert_(len(filename_and_name_1_bis) >= 2)
self.assertEqual(filename_and_name_1_bis[0], "TEST.bfo")
filename_and_name_2 = bibformat_engine.get_fresh_output_format_filename("test1")
self.assert_(len(filename_and_name_2) >= 2)
self.assert_(filename_and_name_2[0] != "TEST1.bfo")
path = bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH + os.sep + filename_and_name_2[0]
self.assert_(not os.path.exists(path))
filename_and_name_3 = bibformat_engine.get_fresh_output_format_filename("test1testlong")
self.assert_(len(filename_and_name_3) >= 2)
self.assert_(filename_and_name_3[0] != "TEST1TESTLONG.bft")
self.assert_(len(filename_and_name_3[0]) <= 6 + 1 + len(bibformat_config.CFG_BIBFORMAT_FORMAT_OUTPUT_EXTENSION))
path = bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH + os.sep + filename_and_name_3[0]
self.assert_(not os.path.exists(path))
class PatternTest(unittest.TestCase):
""" bibformat - tests on re patterns"""
def test_pattern_lang(self):
""" bibformat - correctness of pattern 'pattern_lang'"""
text = '''
Here is my test text
Some wordsQuelques motsEinige Wörter garbage
Here ends the middle of my test text
EnglishFrançaisDeutschHere ends my test text
'''
result = bibformat_engine.pattern_lang.search(text)
self.assertEqual(result.group("langs"), "Some wordsQuelques motsEinige Wörter garbage ")
text = '''
Here is my test text
'''
result = bibformat_engine.pattern_lang.search(text)
self.assertEqual(result.group("langs"), "Some wordsQuelques motsEinige Wörter garbage ")
def test_ln_pattern(self):
""" bibformat - correctness of pattern 'ln_pattern'"""
text = "Some wordsQuelques motsEinige Wörter garbage "
result = bibformat_engine.ln_pattern.search(text)
self.assertEqual(result.group(1), "en")
self.assertEqual(result.group(2), "Some words")
def test_pattern_format_template_name(self):
""" bibformat - correctness of pattern 'pattern_format_template_name'"""
text = '''
garbage
a namea description on
2 lines
the content of the template
content
'''
result = bibformat_engine.pattern_format_template_name.search(text)
self.assertEqual(result.group('name'), "a name")
def test_pattern_format_template_desc(self):
""" bibformat - correctness of pattern 'pattern_format_template_desc'"""
text = '''
garbage
a namea description on
2 lines
the content of the template
content
'''
result = bibformat_engine.pattern_format_template_desc.search(text)
self.assertEqual(result.group('desc'), '''a description on
2 lines ''')
def test_pattern_tag(self):
""" bibformat - correctness of pattern 'pattern_tag'"""
text = '''
garbage but part of content
a namea description on
2 lines
the content of the template
my content is so nice!
'''
result = bibformat_engine.pattern_tag.search(text)
self.assertEqual(result.group('function_name'), "tiTLE")
self.assertEqual(result.group('params').strip(), '''param1="value1"
param2=""''')
def test_pattern_function_params(self):
""" bibformat - correctness of pattern 'test_pattern_function_params'"""
text = '''
param1="" param2="value2"
param3="value3" garbage
'''
names = ["param1", "param2", "param3"]
values = ["", "value2", "value3"]
results = bibformat_engine.pattern_format_element_params.finditer(text) #TODO
param_i = 0
for match in results:
self.assertEqual(match.group('param'), names[param_i])
self.assertEqual(match.group('value'), values [param_i])
param_i += 1
def test_pattern_format_element_params(self):
""" bibformat - correctness of pattern 'pattern_format_element_params'"""
text = '''
a description for my element
some text
@param param1 desc1
@param param2 desc2
@see seethis, seethat
'''
names = ["param1", "param2"]
descriptions = ["desc1", "desc2"]
results = bibformat_engine.pattern_format_element_params.finditer(text) #TODO
param_i = 0
for match in results:
self.assertEqual(match.group('name'), names[param_i])
self.assertEqual(match.group('desc'), descriptions[param_i])
param_i += 1
def test_pattern_format_element_seealso(self):
""" bibformat - correctness of pattern 'pattern_format_element_seealso' """
text = '''
a description for my element
some text
@param param1 desc1
@param param2 desc2
@see seethis, seethat
'''
result = bibformat_engine.pattern_format_element_seealso.search(text)
self.assertEqual(result.group('see').strip(), 'seethis, seethat')
class MiscTest(unittest.TestCase):
""" bibformat - tests on various functions"""
def test_parse_tag(self):
""" bibformat - result of parsing tags"""
tags_and_parsed_tags = ['245COc', ['245', 'C', 'O', 'c'],
'245C_c', ['245', 'C', '' , 'c'],
'245__c', ['245', '' , '' , 'c'],
'245__$$c', ['245', '' , '' , 'c'],
'245__$c', ['245', '' , '' , 'c'],
'245 $c', ['245', '' , '' , 'c'],
'245 $$c', ['245', '' , '' , 'c'],
'245__.c', ['245', '' , '' , 'c'],
'245 .c', ['245', '' , '' , 'c'],
'245C_$c', ['245', 'C', '' , 'c'],
'245CO$$c', ['245', 'C', 'O', 'c'],
'245CO.c', ['245', 'C', 'O', 'c'],
'245$c', ['245', '' , '' , 'c'],
'245.c', ['245', '' , '' , 'c'],
'245$$c', ['245', '' , '' , 'c'],
'245__%', ['245', '' , '' , '%'],
'245__$$%', ['245', '' , '' , '%'],
'245__$%', ['245', '' , '' , '%'],
'245 $%', ['245', '' , '' , '%'],
'245 $$%', ['245', '' , '' , '%'],
'245$%', ['245', '' , '' , '%'],
'245.%', ['245', '' , '' , '%'],
'245_O.%', ['245', '' , 'O', '%'],
'245.%', ['245', '' , '' , '%'],
'245$$%', ['245', '' , '' , '%'],
'2%5$$a', ['2%5', '' , '' , 'a'],
'2%%%%a', ['2%%', '%', '%', 'a'],
'2%%__a', ['2%%', '' , '' , 'a'],
'2%%a', ['2%%', '' , '' , 'a']]
for i in range(0, len(tags_and_parsed_tags), 2):
parsed_tag = bibformat_utils.parse_tag(tags_and_parsed_tags[i])
self.assertEqual(parsed_tag, tags_and_parsed_tags[i+1])
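The tag grammar exercised above (3-character tag number, optional indicators with `_` or space meaning "empty", and an optional `$`, `$$` or `.` separator before the subfield code) can be sketched as a small standalone parser. This is a hypothetical re-implementation for illustration only; the real parser is `bibformat_utils.parse_tag`:

```python
def parse_tag_sketch(tag):
    """Split a MARC tag string into [tag, ind1, ind2, subfield code].

    Hypothetical sketch matching the expectations listed in
    MiscTest.test_parse_tag; not the actual bibformat_utils.parse_tag.
    """
    parsed = ['', '', '', '']
    tag = tag.replace(' ', '_')        # a space acts like an empty indicator
    parsed[0] = tag[:3]                # tag number, e.g. '245'
    rest = tag[3:]
    for sep in ('$$', '$', '.'):       # optional subfield-code separators
        if sep in rest:
            head, _, code = rest.partition(sep)
            rest = head + code
            break
    if len(rest) == 3:                 # ind1 + ind2 + subfield code
        parsed[1] = rest[0].replace('_', '')
        parsed[2] = rest[1].replace('_', '')
        parsed[3] = rest[2]
    elif len(rest) == 2:               # one (empty) indicator + code
        parsed[1] = rest[0].replace('_', '')
        parsed[3] = rest[1]
    elif len(rest) == 1:               # subfield code only
        parsed[3] = rest
    return parsed
```

Against the pairs in `tags_and_parsed_tags`, e.g. `parse_tag_sketch('245C_$c')` yields `['245', 'C', '', 'c']`.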
class FormatTest(unittest.TestCase):
""" bibformat - generic tests on function that do the formatting. Main functions"""
def setUp(self):
# pylint: disable-msg=C0103
""" bibformat - prepare BibRecord objects"""
self.xml_text_1 = '''
33thesisDoe1, JohnDoe2, JohneditorOn the foo and bar1On the foo and bar299999
'''
#rec_1 = bibrecord.create_record(self.xml_text_1)
self.bfo_1 = bibformat_engine.BibFormatObject(recID=None,
ln='fr',
xml_record=self.xml_text_1)
self.xml_text_2 = '''
33thesis Doe1, JohnDoe2, JohneditorOn the foo and bar1On the foo and bar2
'''
#self.rec_2 = bibrecord.create_record(xml_text_2)
self.bfo_2 = bibformat_engine.BibFormatObject(recID=None,
ln='fr',
xml_record=self.xml_text_2)
self.xml_text_3 = '''
33engDoe1, JohnDoe2, JohneditorOn the foo and bar1On the foo and bar2article
'''
#self.rec_3 = bibrecord.create_record(xml_text_3)
self.bfo_3 = bibformat_engine.BibFormatObject(recID=None,
ln='fr',
xml_record=self.xml_text_3)
def test_decide_format_template(self):
""" bibformat - choice made by function decide_format_template"""
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
result = bibformat_engine.decide_format_template(self.bfo_1, "test1")
self.assertEqual(result, "Thesis_detailed.bft")
result = bibformat_engine.decide_format_template(self.bfo_3, "test3")
self.assertEqual(result, "Test3.bft")
#Only default matches
result = bibformat_engine.decide_format_template(self.bfo_2, "test1")
self.assertEqual(result, "Default_HTML_detailed.bft")
#No match at all for record
result = bibformat_engine.decide_format_template(self.bfo_2, "test2")
self.assertEqual(result, None)
#Non existing output format
result = bibformat_engine.decide_format_template(self.bfo_2, "UNKNOW")
self.assertEqual(result, None)
def test_format_record(self):
""" bibformat - correct formatting"""
bibformat_engine.CFG_BIBFORMAT_OUTPUTS_PATH = CFG_BIBFORMAT_OUTPUTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = CFG_BIBFORMAT_ELEMENTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH = CFG_BIBFORMAT_TEMPLATES_PATH
#use output format that has no match TEST DISABLED DURING MIGRATION
#result = bibformat_engine.format_record(recID=None, of="test2", xml_record=self.xml_text_2)
#self.assertEqual(result.replace("\n", ""),"")
#use output format that links to an unknown template
result = bibformat_engine.format_record(recID=None, of="test3", xml_record=self.xml_text_2)
self.assertEqual(result.replace("\n", ""),"")
#Unknown output format TEST DISABLED DURING MIGRATION
#result = bibformat_engine.format_record(recID=None, of="unkno", xml_record=self.xml_text_3)
#self.assertEqual(result.replace("\n", ""),"")
#Default formatting
result = bibformat_engine.format_record(recID=None, ln='fr', of="test3", xml_record=self.xml_text_3)
self.assertEqual(result,'''
hi
this is my template\ntesttfrgarbage\n test me!<b>ok</b>a default valueeditor\n test me!oka default valueeditor\n test me!<b>ok</b>a default valueeditor\n''')
def test_format_with_format_template(self):
""" bibformat - correct formatting with given template"""
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_PATH = CFG_BIBFORMAT_ELEMENTS_PATH
bibformat_engine.CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH = CFG_BIBFORMAT_ELEMENTS_IMPORT_PATH
bibformat_engine.CFG_BIBFORMAT_TEMPLATES_PATH = CFG_BIBFORMAT_TEMPLATES_PATH
template = bibformat_engine.get_format_template("Test3.bft")
result = bibformat_engine.format_with_format_template(format_template_filename = None,
bfo=self.bfo_1,
verbose=0,
format_template_code=template['code'])
self.assert_(isinstance(result, tuple))
self.assertEqual(result[0],'''
hi
this is my template\ntesttfrgarbage\n test me!<b>ok</b>a default valueeditor\n test me!oka default valueeditor\n test me!<b>ok</b>a default valueeditor\n99999''')
def create_test_suite():
"""Return test suite for the bibformat module"""
return unittest.TestSuite((unittest.makeSuite(FormatTemplateTest,'test'),
unittest.makeSuite(OutputFormatTest,'test'),
unittest.makeSuite(FormatElementTest,'test'),
unittest.makeSuite(PatternTest,'test'),
unittest.makeSuite(MiscTest,'test'),
unittest.makeSuite(FormatTest,'test')))
if __name__ == '__main__':
unittest.TextTestRunner(verbosity=2).run(create_test_suite())
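For reference, the suite-building pattern used by `create_test_suite()` can be exercised with a toy test case; `TestLoader.loadTestsFromTestCase` is the modern spelling of the now-deprecated `unittest.makeSuite` used above (the class and function names below are illustrative only):

```python
import unittest

class ToyTest(unittest.TestCase):
    """Stand-in test case mirroring the suite construction above."""
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

def create_toy_suite():
    # loadTestsFromTestCase replaces the deprecated unittest.makeSuite
    loader = unittest.TestLoader()
    return unittest.TestSuite((loader.loadTestsFromTestCase(ToyTest),))

result = unittest.TextTestRunner(verbosity=0).run(create_toy_suite())
```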
diff --git a/modules/bibformat/lib/bibformat_templates.py b/modules/bibformat/lib/bibformat_templates.py
index a0a29cdeb..44e2de973 100644
--- a/modules/bibformat/lib/bibformat_templates.py
+++ b/modules/bibformat/lib/bibformat_templates.py
@@ -1,2301 +1,2301 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""HTML Templates for BibFormat administration"""
__revision__ = "$Id$"
# non Invenio imports
import cgi
# Invenio imports
from invenio.messages import gettext_set_language
from invenio.config import weburl, sweburl
from invenio.messages import language_list_long
from invenio.config import CFG_PATH_PHP
class Template:
"""Templating class, refer to bibformat.py for examples of call"""
def tmpl_admin_index(self, ln, warnings, is_admin):
"""
Returns the main BibFormat admin page.
@param ln language
@param warnings a list of warnings to display at top of page. None if no warning
@param is_admin indicate if user is authorized to use BibFormat
@return main BibFormat admin page
"""
_ = gettext_set_language(ln) # load the right message language
out = ''
if warnings:
out += '''
%(warnings)s
''' % {'warnings': ' '.join(warnings)}
out += '''
This is where you can edit the formatting styles available for the records. '''
if not is_admin:
out += '''You need to
login to enter.
''' % {'weburl':weburl}
out += '''
'''% {'weburl':weburl, 'ln':ln}
if CFG_PATH_PHP:
#Show PHP admin only if PHP is enabled
out += '''
Old
BibFormat admin interface (in gray box)
The BibFormat admin interface enables you to specify how the
bibliographic data is presented to the end user in the search
interface and search results pages. For example, you may specify that
titles should be printed in bold font, the abstract in small italic,
etc. Moreover, BibFormat is not only a simple bibliographic data
output formatter, but also an automated link
constructor. For example, from the information on journal name
and pages, it can automatically create links to the publisher's site
based on configuration rules.
Configuring BibFormat
By default, a simple HTML format based on the most common fields
(title, author, abstract, keywords, fulltext link, etc.) is defined.
You will certainly want to define your own output formats if you have a
specific metadata structure.
Define one or more output BibFormat behaviours. These are then
passed as parameters to the BibFormat modules while executing
formatting.
Example: You can tell BibFormat that it has to enrich the
incoming metadata file with the created format, or that it only has to
print the format out.
Define how the metadata tags from input are mapped into internal
BibFormat variable names. The variable names can afterwards be used
in formatting and linking rules.
Example: You can specify that the 100 $a field
should be mapped to the $100.a internal variable, which you
can then use later.
Define rules for automated creation of URI links from mapped
internal variables.
Example: You can define a rule for creating a link to the
People database out of the $100.a internal variable
representing the author's name. (The $100.a variable was mapped
in the previous step; see the Extraction Rules.)
Define file format types based on file extensions. This will be
used when proposing various fulltext services.
Example: You can tell that *.pdf files will
be treated as PDF files.
Define your own functions that you can reuse when creating your
own output formats. This enables you to do complex formatting without
ever touching the BibFormat core code.
Example: You can define a function how to match and
extract email addresses out of a text file.
Define the output formats, i.e. how to create the output out of
internal BibFormat variables that were extracted in a previous step.
This is the functionality you would want to configure most of the
time. It may reuse formats, user defined functions, knowledge bases,
etc.
Example: You can tell that authors should be printed in
italic, that if there are more than 10 authors only the first three
should be printed, etc.
Define one or more knowledge bases that enable you to transform
various forms of input data values into a unique standard form on
the output.
Example: You can tell that Phys Rev D and
Physical Review D are both the same journal and that these
names should be standardized to Phys Rev : D.
Enables you to test your formats on a sample data file. Useful
when debugging newly created formats.
To learn more on BibFormat configuration, you can consult the BibFormat Admin Guide.
Running BibFormat
From the Web interface
Run Reformat Records tool.
This tool permits you to update stored formats for bibliographic records.
It should normally be used after configuring BibFormat's
Behaviours and
Formats.
When these are ready, you can choose to rebuild formats for selected
collections or you can manually enter a search query and the web interface
will accomplish all necessary formatting steps.
Example: You can request Photo collections to have their HTML
brief formats rebuilt, or you can reformat all the records written by Ellis.
From the command-line interface
Consider having an XML MARC data file that is to be uploaded into
CDS Invenio. (For example, it might have been harvested from other
sources and processed via BibConvert.)
Having configured BibFormat and its default output type behaviour, you
would then run this file through BibFormat as follows:
that would create default HTML formats and would "enrich" the input
XML data file with this format. (You would then continue the upload
procedure by calling successively BibUpload and BibWords.)
+ href="../bibindex/">BibIndex.)
Now consider a different situation. You would like to add new
formats, say "HTML portfolio" and "HTML captions", in order to
nicely format multiple photographs on one page. Let us suppose that
these two formats are called hp and hc and
are already loaded in the collection_format table.
(TODO: describe how this is done via WebAdmin.) You would then
proceed as follows: firstly, you would prepare the corresponding output behaviours called HP
and HC (TODO: note the uppercase!) that would not enrich
the input file but would produce an XML file with only
001 and FMT tags. (This ensures that only the formats,
not the bibliographic information, are updated.) You would
also prepare corresponding formats
at the same time. Secondly, you would launch the formatting as
follows:
that should give you an XML file containing only 001 and FMT tags.
Finally, you would upload the formats:
$ bibupload < /tmp/sample_fmts_only.xml
and that's it. The new formats should now appear in WebSearch.
''' % {'weburl':weburl, 'ln':ln}
return out
def tmpl_admin_format_template_show_attributes(self, ln, name, description, filename, editable,
all_templates=[], new=False):
"""
Returns a page to change format template name and description
If template is new, offer a way to create a duplicate from an
existing template
@param ln language
@param name the name of the format
@param description the description of the format
@param filename the filename of the template
@param editable True if we let user edit, else False
@param all_templates a list of tuples (filename, name) of all other templates
@param new if True, the format template has just been added (is new)
@return editor for 'format'
"""
_ = gettext_set_language(ln) # load the right message language
out = ""
out += '''
''' % {'ln':ln,
'menu':_("Menu"),
'filename':filename,
'close_editor': _("Close Editor"),
'modify_template_attributes': _("Modify Template Attributes"),
'template_editor': _("Template Editor"),
'check_dependencies': _("Check Dependencies")
}
disabled = ""
readonly = ""
if not editable:
disabled = 'disabled="disabled"'
readonly = 'readonly="readonly"'
out += '''
''' % {"description": description,
'ln':ln,
'filename':filename,
'disabled':disabled,
'readonly':readonly,
'description_label': _("Description"),
'update_format_attributes': _("Update Format Attributes"),
'weburl':weburl
}
return out
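The `disabled`/`readonly` toggling above is the usual way to render inert form inputs for templates the user may not edit. A minimal standalone sketch (the helper name and markup are hypothetical; `html.escape` stands in for the `cgi.escape` of this code's Python 2 era):

```python
from html import escape  # cgi.escape in the Python 2 era of this code

def form_field(value, editable):
    """Render a text input, adding disabled/readonly attributes
    when the template must not be edited (hypothetical helper)."""
    disabled = '' if editable else ' disabled="disabled"'
    readonly = '' if editable else ' readonly="readonly"'
    return '<input type="text" value="%s"%s%s />' % (
        escape(value, quote=True), disabled, readonly)
```

Escaping the value matters because template names and descriptions may contain quotes or angle brackets.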
def tmpl_admin_format_template_show_dependencies(self, ln, name, filename, output_formats, format_elements, tags):
"""
Shows the dependencies (on elements) of the given format.
@param name the name of the template
@param filename the filename of the template
@param format_elements the elements (and list of tags in each element) this template depends on
@param output_formats the output formats that depend on this template
@param tags the tags that are called by the format elements this template depends on.
"""
_ = gettext_set_language(ln) # load the right message language
out = '''
'
for output_format in output_formats:
name = output_format['names']['generic']
filename = output_format['filename']
out += ''' %(name)s''' % {'filename':filename,
'name':name,
'ln':ln}
if len(output_format['tags']) > 0:
out += "("+", ".join(output_format['tags'])+")"
out += " "
#Print format elements (and tags)
out += '
'
if len(format_elements) == 0:
out += '
This format template uses no format element.
'
for format_element in format_elements:
name = format_element['name']
out += ''' %(name)s''' % {'name':"bfe_"+name.lower(),
'anchor':name.upper(),
'ln':ln}
if len(format_element['tags']) > 0:
out += "("+", ".join(format_element['tags'])+")"
out += " "
#Print tags
out += '
'
if len(tags) == 0:
out += '
This format template uses no tag.
'
for tag in tags:
out += '''%(tag)s ''' % { 'tag':tag}
out += '''
*Note: Some tags linked with this format template might not be shown. Check manually.
'''
return out
def tmpl_admin_format_template_show(self, ln, name, description, code, filename, ln_for_preview, pattern_for_preview, editable, content_type_for_preview, content_types):
"""
Returns the editor for format templates.
@param ln language
@param name the name of the format template
@param description the description of the format template
@param code the code (content) of the template to edit
@param filename the filename of the template
@param ln_for_preview the language for the preview (for bfo)
@param pattern_for_preview the search pattern to be used for the preview (for bfo)
@param editable True if we let user edit, else False
@param content_type_for_preview the content type to use for the preview
@param content_types the list of available content types
@return editor for the format template
"""
_ = gettext_set_language(ln) # load the right message language
out = ""
# If xsl, hide some options in the menu
nb_menu_options = 4
if filename.endswith('.xsl'):
nb_menu_options = 2
out += '''
''' % {'weburl':weburl, 'ln':ln}
return out
def tmpl_admin_format_template_show_short_doc(self, ln, format_elements):
"""
Prints the format element documentation in a condensed way to display
inside format template editor.
This page is different from others: it is displayed inside a
else:
oai_out2.write("\n")
oai_out2.write("%s"
"\n" % recID)
if oaisetentry:
# Record is still part of some set
# Keep the OAI ID as such
pass
else:
# Remove record from OAI repository
# i.e. remove OAI ID
oai_out2.write(datafield_id_head)
oai_out2.write("\n")
oai_out2.write("\n")
oai_out2.write("\n")
oai_out2.write(datafield_set_head)
oai_out2.write("\n")
if oaisetentry:
# Record is still part of some set
oai_out2.write(oaisetentry)
else:
# Remove record from OAI repository
oai_out2.write("\n")
oai_out2.write("\n")
oai_out2.write("\n")
elif mode == 1 or mode == 4:
# Add mode (1)
# or clean mode (4)
if add_ID_entry or oaisetentry:
if (CFG_OAI_ID_FIELD[0:5] == CFG_OAI_SET_FIELD[0:5]):
# Put set and OAI ID in the same datafield
oai_out.write("\n")
oai_out.write("%s"
"\n" % recID)
oai_out.write(datafield_id_head)
oai_out.write("\n")
if add_ID_entry:
oai_out.write(oaiIDentry)
if oaisetentry:
oai_out.write(oaisetentry)
oai_out.write("\n")
oai_out.write("\n")
else:
oai_out.write("\n")
oai_out.write("%s"
"\n" % recID)
if add_ID_entry:
oai_out.write(datafield_id_head)
oai_out.write("\n")
oai_out.write(oaiIDentry)
oai_out.write("\n")
if oaisetentry:
oai_out.write(datafield_set_head)
oai_out.write("\n")
oai_out.write(oaisetentry)
oai_out.write("\n")
oai_out.write("\n")
if mode == 4:
# Update records that should no longer be in this set
# Fetch records that are currently marked with this set in
# the database
oai_has_list = perform_request_search(c=cdsname, p1=set,
f1=CFG_OAI_SET_FIELD, m1="e", ap=0)
# Fetch records that should not be in this set
# (oai_has_list - recID_list)
records_to_update = [rec_id for rec_id in oai_has_list \
if rec_id not in recID_list]
datafield_set_head = "" % (CFG_OAI_SET_FIELD[0:3], \
CFG_OAI_SET_FIELD[3:4].replace('_', ' '), \
CFG_OAI_SET_FIELD[4:5].replace('_', ' '))
datafield_id_head = "" % (CFG_OAI_ID_FIELD[0:3], \
CFG_OAI_ID_FIELD[3:4].replace('_', ' '), \
CFG_OAI_ID_FIELD[4:5].replace('_', ' '))
for recID in records_to_update:
oaiIDentry = "oai:%s:%s\n" % \
(CFG_OAI_ID_FIELD[5:6], CFG_OAI_ID_PREFIX, recID)
### Check to which sets this record belongs
query = "select b3.value from bibrec_bib%sx as br " \
"left join bib%sx as b3 on br.id_bibxxx=b3.id " \
"where b3.tag=%%s and br.id_bibrec=%%s" % \
(CFG_OAI_SET_FIELD[0:2], CFG_OAI_SET_FIELD[0:2])
res = run_sql(query, (CFG_OAI_SET_FIELD, recID))
oaisetentry = ''
for in_set in res:
if in_set[0] != set:
oaisetentry += "%s" \
"\n" % \
(CFG_OAI_SET_FIELD[5:6], in_set[0])
if (CFG_OAI_ID_FIELD[0:5] == CFG_OAI_SET_FIELD[0:5]):
# Put set and OAI ID in the same datafield
oai_out2.write("\n")
oai_out2.write("%s"
"\n" % recID)
oai_out2.write(datafield_id_head)
oai_out2.write("\n")
# if oaisetentry:
# Record is still part of some sets
oai_out2.write(oaiIDentry)
oai_out2.write(oaisetentry)
## else:
## # Remove record from OAI repository
## oai_out.write("\n")
## oai_out.write("\n")
oai_out2.write("\n")
oai_out2.write("\n")
else:
oai_out2.write("\n")
oai_out2.write("%s"
"\n" % recID)
## if oaisetentry:
## # Record is still part of some set
## # Keep the OAI ID as such
## pass
## else:
## # Remove record from OAI repository
## # i.e. remove OAI ID
## oai_out.write(datafield_id_head)
## oai_out.write("\n")
## oai_out.write("\n")
## oai_out.write("\n")
oai_out2.write(datafield_set_head)
oai_out2.write("\n")
# if oaisetentry:
# Record is still part of some set
oai_out2.write(oaisetentry)
# else:
# # Remove record from OAI repository
# oai_out.write("\n")
oai_out2.write("\n")
oai_out2.write("\n")
if mode == 1 or mode == 4:
oai_out.close()
if mode == 2 or mode == 4:
oai_out2.close()
if upload:
if (mode == 1 or mode == 4) and oaisetentrycount > 0:
# Check if file is empty or not:
len_file = os.stat(filename)[ST_SIZE]
if len_file > 0:
- command = "%s/bibupload -a %s" % (bindir, filename)
+ command = "%s/bibupload -a %s" % (CFG_BINDIR, filename)
os.system(command)
if mode == 2 or mode == 4:
# Check if file is empty or not:
len_file = os.stat(filename2)[ST_SIZE]
if len_file > 0:
- command = "%s/bibupload -c %s" % (bindir, filename2)
+ command = "%s/bibupload -c %s" % (CFG_BINDIR, filename2)
os.system(command)
return True
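The clean-mode bookkeeping above reduces to a list difference: records currently marked with the set in the database, minus the records that should carry it, gives the records whose OAI set field needs rewriting. A toy illustration with made-up record IDs:

```python
# Made-up record IDs, for illustration only.
oai_has_list = [11, 12, 13, 14]   # records marked with the set in the DB
recID_list = [12, 14, 15]         # records that should belong to the set

# Same comprehension as in oaiarchive_task's clean mode (mode == 4):
# records 11 and 13 must have their OAI set field rewritten.
records_to_update = [rec_id for rec_id in oai_has_list
                     if rec_id not in recID_list]
```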
#########################
def main():
"""Main that construct all the bibtask."""
task_set_option('upload', 0)
task_set_option('mode', 0)
task_set_option('oaiset', 0)
task_set_option('nice', 0)
task_init(authorization_action='runoaiarchive',
authorization_msg="OAI Archive Task Submission",
description="Examples:\n"
" Expose set 'setSpec' via OAI repository gateway\n"
" oaiarchive --oaiset='setSpec' --add --upload\n"
" oaiarchive -apo 'setSpec'\n\n"
" Expose multiple sets via OAI repository gateway\n"
" oaiarchive --oaiset='setSpec1 setSpec2 setSpec3' --add --upload\n"
" oaiarchive -apo 'setSpec1 setSpec2 setSpec3'\n\n"
" Remove records defined by 'setSpec' from OAI repository\n"
" oaiarchive --oaiset='setSpec' --delete --upload\n"
" oaiarchive -dpo 'setSpec'\n\n"
" Expose entire repository via OAI gateway\n"
" oaiarchive --set=global --add --upload\n"
" oaiarchive -apo global\n\n"
" Print OAI set status\n"
" oaiarchive --oaiset='setSpec' --info\n"
" oaiarchive -io 'setSpec'\n\n"
" Print OAI repository status\n"
" oaiarchive -r\n\n",
help_specific_usage="Options:\n"
" -o --oaiset= Specify setSpec(s) (whitespace separated list of setSpecs) to expose via OAI\n"
"Modes\n"
" -a --add Add records to OAI repository\n"
" -d --delete Remove records from OAI repository\n"
" -r --report OAI repository status\n"
" -i --info Give info about OAI set (default)\n"
"Additional parameters:\n"
" -p --upload Upload records\n",
version=__revision__,
specific_params=("ado:pirn", [
"add",
"delete",
"oaiset=",
"upload",
"info",
"report",
"no-process"]),
task_submit_elaborate_specific_parameter_fnc=
task_submit_elaborate_specific_parameter,
task_run_fnc=oaiarchive_task)
def task_submit_elaborate_specific_parameter(key, value, opts, args):
"""Elaborate specific CLI parameters of oaiarchive"""
if key in ("-n", "--nice"):
task_set_option("nice", value)
elif key in ("-o", "--oaiset"):
sets = [s for s in value.split(' ') if s]
task_set_option("oaiset", sets)
elif key in ("-a", "--add"):
task_set_option("mode", 1)
elif key in ("-d", "--delete"):
task_set_option("mode", 2)
elif key in ("-c", "--clean"):
task_set_option("mode", 4)
elif key in ("-p", "--upload"):
task_set_option("upload", 1)
elif key in ("-i", "--info"):
task_set_option("mode", 0)
elif key in ("-r", "--report"):
task_set_option("mode", 3)
elif key in ("-n", "--no-process"):
task_set_option("upload", 0)
else:
return False
return True
### okay, here we go:
if __name__ == '__main__':
main()
diff --git a/modules/bibharvest/lib/oaiarchiveadminlib.py b/modules/bibharvest/lib/oaiarchiveadminlib.py
index d8926bac5..4fb1f47ec 100644
--- a/modules/bibharvest/lib/oaiarchiveadminlib.py
+++ b/modules/bibharvest/lib/oaiarchiveadminlib.py
@@ -1,751 +1,751 @@
## $Id$
## Administrator interface for the OAI repository and archive
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""CDS Invenio OAI Repository and Archive Administrator Interface."""
__revision__ = "$Id$"
import sys
import cgi
import re
import os
import string
import ConfigParser
import time
import random
import urllib
from invenio.config import \
cdslang, \
- tmpdir, \
- version, \
+ CFG_TMPDIR, \
+ CFG_VERSION, \
weburl
import invenio.access_control_engine as access_manager
from invenio.dbquery import run_sql
from invenio.webpage import page, pageheaderonly, pagefooteronly
from invenio.webuser import getUid, get_email
from invenio.oaiarchive_engine import parse_set_definition
import invenio.template
bibharvest_templates = invenio.template.load('bibharvest')
-tmppath = tmpdir + '/oaiarchiveadmin.' + str(os.getpid())
+tmppath = CFG_TMPDIR + '/oaiarchiveadmin.' + str(os.getpid())
guideurl = "help/admin/bibharvest-admin-guide"
def getnavtrail(previous = ''):
"""Get navtrail"""
return bibharvest_templates.tmpl_getnavtrail(previous = previous, ln = cdslang)
def perform_request_index(ln=cdslang):
"""OAI archive admin index"""
out = '''
Define below the sets to expose through the OAI harvesting
protocol. You will have to run the
oaiarchive
utility to apply the settings you have defined here.
"
return out
def input_text(ln, title, name, value):
""""""
if name is None:
name = ""
if value is None:
value = ""
text = """
%s
""" % title
text += """
""" % (cgi.escape(name,1), cgi.escape(value, 1))
return text
def pagebody_text(title):
""""""
text = """%s""" % title
return text
def bar_text(title):
""""""
text = """%s""" % title
return text
def input_form(oai_set_name, oai_set_spec, oai_set_collection, oai_set_description, oai_set_definition, oai_set_reclist, oai_set_p1, oai_set_f1,oai_set_m1, oai_set_p2, oai_set_f2,oai_set_m2, oai_set_p3, oai_set_f3, oai_set_m3, oai_set_op1, oai_set_op2, ln=cdslang):
""""""
modes = {
'r' : 'Regular Expression',
'a' : 'All of the words',
'y' : 'Any of the words',
'e' : 'Exact phrase',
'p' : 'Partial phrase'
}
mode_dropdown = [['r','',modes['r']],
['e','',modes['e']],
['p','',modes['p']],
['a','',modes['a']],
['y','',modes['y']],
['','','']]
operators = {
'a' : 'AND',
'o' : 'OR',
'n' : 'AND NOT',
}
mode_operators_1 = [['a','',operators['a']],
['o','',operators['o']],
['n','',operators['n']],
['a','','']]
mode_operators_2 = [['a','',operators['a']],
['o','',operators['o']],
['n','',operators['n']],
['a','','']]
text = " "
text += "
"
text += input_text(ln = cdslang, title = "OAI Set spec:", name = "oai_set_spec", value = oai_set_spec)
text += "
"
text += input_text(ln = cdslang, title = "OAI Set name:", name = "oai_set_name", value = oai_set_name)
text += "
"
# text += input_text(ln = cdslang, title = "OAI Set description", name = "oai_set_description", value = oai_set_description)
#text += "
"
#menu = create_drop_down_menu("SELECT distinct(name) from collection")
#menu.append(['','',''])
#if (oai_set_collection):
# menu.append([oai_set_collection,'selected',oai_set_collection])
#else:
# menu.append(['','selected','Collection'])
text += input_text(ln = cdslang, title = "Collection(s):", name="oai_set_collection", value=oai_set_collection)
#text += drop_down_menu("oai_set_collection", menu)
text += "
"
text += input_text(ln = cdslang, title = "Phrase:", name = "oai_set_p1", value = oai_set_p1)
text += "
"
fields = create_drop_down_menu("SELECT distinct(code) from field")
fields.append(['','',''])
if (oai_set_f1):
fields.append([oai_set_f1,'selected',oai_set_f1])
else:
fields.append(['','selected','Field'])
if (oai_set_m1):
mode_dropdown.append([oai_set_m1,'selected',modes[oai_set_m1]])
else:
mode_dropdown.append(['','selected','Mode'])
text += drop_down_menu("oai_set_f1", fields)
text += "
"
text += drop_down_menu("oai_set_m1", mode_dropdown)
text += "
"
if (oai_set_op1):
mode_operators_1.append([oai_set_op1,'selected',operators[oai_set_op1]])
else:
mode_operators_1.append(['','selected','Operators'])
text += drop_down_menu("oai_set_op1", mode_operators_1)
text += "
"
text += input_text(ln = cdslang, title = "Phrase:", name = "oai_set_p2", value = oai_set_p2)
text += "
"
fields = create_drop_down_menu("SELECT distinct(code) from field")
fields.append(['','',''])
if (oai_set_f2):
fields.append([oai_set_f2,'selected',oai_set_f2])
else:
fields.append(['','selected','Field'])
if (oai_set_m2):
mode_dropdown.append([oai_set_m2,'selected',modes[oai_set_m2]])
else:
mode_dropdown.append(['','selected','Mode'])
text += drop_down_menu("oai_set_f2", fields)
text += "
"
text += drop_down_menu("oai_set_m2", mode_dropdown)
text += "
"
if (oai_set_op2):
mode_operators_2.append([oai_set_op2,'selected',operators[oai_set_op2]])
else:
mode_operators_2.append(['','selected','Operators'])
text += drop_down_menu("oai_set_op2", mode_operators_2)
text += "
"
text += input_text(ln = cdslang, title = "Phrase:", name = "oai_set_p3", value = oai_set_p3)
text += "
"
fields = create_drop_down_menu("SELECT distinct(code) from field")
fields.append(['','',''])
if (oai_set_f3):
fields.append([oai_set_f3,'selected',oai_set_f3])
else:
fields.append(['','selected','Field'])
if (oai_set_m3):
mode_dropdown.append([oai_set_m3,'selected',modes[oai_set_m3]])
else:
mode_dropdown.append(['','selected','Mode'])
text += drop_down_menu("oai_set_f3", fields)
text += "
"
text += drop_down_menu("oai_set_m3", mode_dropdown)
text += "
"
return text
def check_user(req, role, adminarea=2, authorized=0):
""""""
(auth_code, auth_message) = access_manager.acc_authorize_action(req, role)
if not authorized and auth_code != 0:
return ("false", auth_message)
return ("", auth_message)
def transform_tuple(header=[], tuple=[], start='', end='', extracolumn=''):
""""""
align = []
try:
firstrow = tuple[0]
if type(firstrow) in [int, long]:
align = ['admintdright']
elif type(firstrow) in [str, dict]:
align = ['admintdleft']
else:
for item in firstrow:
if type(item) is int:
align.append('admintdright')
else:
align.append('admintdleft')
except IndexError:
firstrow = []
tblstr = ''
for h in header:
tblstr += '
%s
\n' % (h, )
if tblstr: tblstr = '
\n%s\n
\n' % (tblstr, )
tblstr = start + '
\n' + tblstr
try:
extra = '
'
if type(firstrow) not in [int, long, str, dict]:
for i in range(len(firstrow)): extra += '
%s
\n' % (align[i], firstrow[i])
else:
extra += '
%s
\n' % (align[0], firstrow)
#extra += '
\n%s\n
\n
\n' % (len(tuple), extracolumn)
extra += '\n'
except IndexError:
extra = ''
tblstr += extra
j = 1
for row in tuple[1:]:
style = ''
if j % 2:
style = ' style="background-color: rgb(235, 247, 255);"'
j += 1
tblstr += '
\n' % style
if type(row) not in [int, long, str, dict]:
for i in range(len(row)): tblstr += '
%s
\n' % (align[i], row[i])
else:
tblstr += '
%s
\n' % (align[0], row)
tblstr += '
\n'
tblstr += '
\n '
tblstr += end
return tblstr
def nice_box(header='', datalist=[], cls="admin_wvar"):
""""""
if len(datalist) == 1: per = '100'
else: per = '75'
out = '
\n'
out += """
%s
""" % (len(datalist), header)
out += '
\n'
out += """
%s
""" % (per+'%', datalist[0])
if len(datalist) > 1:
out += """
%s
""" % ('25%', datalist[1])
out += '
\n'
out += """
"""
return out
def extended_input_form(action="", text="", button="func", cnfrm='', **hidden):
""""""
out = '\n'
return out
diff --git a/modules/bibharvest/lib/oaiharvestlib.py b/modules/bibharvest/lib/oaiharvestlib.py
index 2535f9392..1cb6a6142 100644
--- a/modules/bibharvest/lib/oaiharvestlib.py
+++ b/modules/bibharvest/lib/oaiharvestlib.py
@@ -1,570 +1,570 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
oaiharvest implementation. See oaiharvest executable for entry point.
"""
__revision__ = "$Id$"
import os
import re
import time
import calendar
import shutil
from invenio.config import \
- bindir, \
- tmpdir
+ CFG_BINDIR, \
+ CFG_TMPDIR
from invenio.dbquery import run_sql
from invenio.bibtask import task_get_option, task_set_option, write_message, \
task_update_status, task_init
## precompile some often-used regexp for speed reasons:
re_subfields = re.compile('\$\$\w')
re_html = re.compile("(?s)<[^>]*>|&#?\w+;")
re_datetime_shift = re.compile("([-\+]{0,1})([\d]+)([dhms])")
-tmpHARVESTpath = tmpdir + '/oaiharvest'
+tmpHARVESTpath = CFG_TMPDIR + '/oaiharvest'
def get_nb_records_in_file(filename):
"""
Return the number of records in FILENAME, which is either a harvested
or a converted file. Useful for statistics.
"""
try:
nb = open(filename, 'r').read().count("<record") # count opening MARCXML record tags
except IOError:
nb = 0 # file does not exist, or similar
except:
nb = -1
return nb
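The counting above can be sketched as a standalone helper, runnable outside Invenio. The function name is made up for illustration, and the "<record" marker is an assumption about the MARCXML produced by the harvest and convert steps:

```python
def count_records(path, marker="<record"):
    """Count occurrences of MARKER in the file at PATH; 0 if unreadable."""
    try:
        with open(path) as marcxml:
            return marcxml.read().count(marker)
    except IOError:
        # missing or unreadable file: report zero records, as above
        return 0
```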
def task_run_core():
"""Run the harvesting task. The row argument is the Bibharvest task
queue row, containing if, arguments, etc.
Return 1 in case of success and 0 in case of failure.
"""
reposlist = []
datelist = []
dateflag = 0
### go ahead: build up the reposlist
if task_get_option("repository") is not None:
### user requests harvesting from selected repositories
write_message("harvesting from selected repositories")
for reposname in task_get_option("repository"):
row = get_row_from_reposname(reposname)
if row == []:
write_message("source name " + reposname + " is not valid")
continue
else:
reposlist.append(get_row_from_reposname(reposname))
else:
### user requests harvesting from all repositories
write_message("harvesting from all repositories in the database")
reposlist = get_all_rows_from_db()
### go ahead: check if user requested from-until harvesting
if task_get_option("dates"):
### for each repos simply perform a from-until date harvesting...
### no need to update anything
dateflag = 1
for element in task_get_option("dates"):
datelist.append(element)
error_happened_p = False
for repos in reposlist:
postmode = str(repos[0][9])
setspecs = str(repos[0][10])
harvested_files = []
if postmode == "h" or postmode == "h-c" or \
postmode == "h-u" or postmode == "h-c-u" or \
postmode == "h-c-f-u":
- harvestpath = tmpdir + "/oaiharvest" + str(os.getpid())
+ harvestpath = CFG_TMPDIR + "/oaiharvest" + str(os.getpid())
if dateflag == 1:
res = call_bibharvest(prefix=repos[0][2],
baseurl=repos[0][1],
harvestpath=harvestpath,
fro=str(datelist[0]),
until=str(datelist[1]),
setspecs=setspecs)
if res[0] == 1 :
write_message("source " + str(repos[0][6]) + \
" was harvested from " + str(datelist[0]) \
+ " to " + str(datelist[1]))
harvested_files = res[1]
else:
write_message("an error occurred while harvesting "
"from source " +
str(repos[0][6]) + " for the dates chosen")
error_happened_p = True
continue
elif dateflag != 1 and repos[0][7] is None and repos[0][8] != 0:
write_message("source " + str(repos[0][6]) + \
" was never harvested before - harvesting whole "
"repository")
res = call_bibharvest(prefix=repos[0][2],
baseurl=repos[0][1],
harvestpath=harvestpath,
setspecs=setspecs)
if res[0] == 1 :
update_lastrun(repos[0][0])
harvested_files = res[1]
else :
write_message("an error occurred while harvesting from "
"source " + str(repos[0][6]))
error_happened_p = True
continue
elif dateflag != 1 and repos[0][8] != 0:
### check that update is actually needed,
### i.e. lastrun+frequency>today
timenow = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
lastrundate = re.sub(r'\.[0-9]+$', '',
str(repos[0][7])) # remove trailing .00
timeinsec = int(repos[0][8])*60*60
updatedue = add_timestamp_and_timelag(lastrundate, timeinsec)
proceed = compare_timestamps_with_tolerance(updatedue, timenow)
if proceed == 0 or proceed == -1 : #update needed!
write_message("source " + str(repos[0][6]) +
" is going to be updated")
fromdate = str(repos[0][7])
fromdate = fromdate.split()[0] # get rid of time
# of the day for the moment
res = call_bibharvest(prefix=repos[0][2],
baseurl=repos[0][1],
harvestpath=harvestpath,
fro=fromdate,
setspecs=setspecs)
if res[0] == 1 :
update_lastrun(repos[0][0])
harvested_files = res[1]
else :
write_message("an error occurred while harvesting "
"from source " + str(repos[0][6]))
error_happened_p = True
continue
else:
write_message("source " + str(repos[0][6]) +
" does not need updating")
continue
elif dateflag != 1 and repos[0][8] == 0:
write_message("source " + str(repos[0][6]) +
" has frequency set to 'Never' so it will not be updated")
continue
# print stats:
for harvested_file in harvested_files:
write_message("File %s contains %i records." % \
(harvested_file,
get_nb_records_in_file(harvested_file)))
if postmode == "h-u":
res = 0
for harvested_file in harvested_files:
res += call_bibupload(harvested_file)
if res == 0:
write_message("material harvested from source " +
str(repos[0][6]) + " was successfully uploaded")
else:
write_message("an error occurred while uploading "
"harvest from " + str(repos[0][6]))
error_happened_p = True
continue
if postmode == "h-c" or postmode == "h-c-u" or postmode == "h-c-f-u":
- convert_dir = tmpdir
+ convert_dir = CFG_TMPDIR
convertpath = convert_dir + os.sep +"bibconvertrun" + \
str(os.getpid())
converted_files = []
i = 0
for harvested_file in harvested_files:
converted_file = convertpath+".%07d" % i
converted_files.append(converted_file)
res = call_bibconvert(config=str(repos[0][5]),
harvestpath=harvested_file,
convertpath=converted_file)
i += 1
if res == 0:
write_message("material harvested from source " +
str(repos[0][6]) + " was successfully converted")
else:
write_message("an error occurred while converting from " +
str(repos[0][6]))
error_happened_p = True
continue
# print stats:
for converted_file in converted_files:
write_message("File %s contains %i records." % \
(converted_file,
get_nb_records_in_file(converted_file)))
if postmode == "h-c-u":
res = 0
for converted_file in converted_files:
res += call_bibupload(converted_file)
if res == 0:
write_message("material harvested from source " +
str(repos[0][6]) + " was successfully uploaded")
else:
write_message("an error occurred while uploading "
"harvest from " + str(repos[0][6]))
error_happened_p = True
continue
elif postmode == "h-c-f-u":
# first call bibfilter:
res = 0
for converted_file in converted_files:
res += call_bibfilter(str(repos[0][11]), converted_file)
if res == 0:
write_message("material harvested from source " +
str(repos[0][6]) + " was successfully bibfiltered")
else:
write_message("an error occurred while uploading "
"harvest from " + str(repos[0][6]))
error_happened_p = True
continue
# print stats:
for converted_file in converted_files:
write_message("File %s contains %i records." % \
(converted_file + ".insert.xml",
get_nb_records_in_file(converted_file + ".insert.xml")))
write_message("File %s contains %i records." % \
(converted_file + ".correct.xml",
get_nb_records_in_file(converted_file + ".correct.xml")))
# only then call upload:
for converted_file in converted_files:
res += call_bibupload(converted_file + ".insert.xml", "-i")
res += call_bibupload(converted_file + ".correct.xml", "-c")
if res == 0:
write_message("material harvested from source " +
str(repos[0][6]) + " was successfully uploaded")
else:
write_message("an error occurred while uploading "
"harvest from " + str(repos[0][6]))
error_happened_p = True
continue
elif postmode not in ["h", "h-c", "h-u",
"h-c-u", "h-c-f-u"]: ### this should not happen
write_message("invalid postprocess mode: " + postmode +
" skipping repository")
error_happened_p = True
continue
if error_happened_p:
task_update_status("DONE WITH ERRORS")
else:
task_update_status("DONE")
return True
def add_timestamp_and_timelag(timestamp,
timelag):
""" Adds a time lag in seconds to a given date (timestamp).
Returns the resulting date. """
# remove any trailing .00 in timestamp:
timestamp = re.sub(r'\.[0-9]+$', '', timestamp)
# first convert timestamp to Unix epoch seconds:
timestamp_seconds = calendar.timegm(time.strptime(timestamp,
"%Y-%m-%d %H:%M:%S"))
# now add them:
result_seconds = timestamp_seconds + timelag
result = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(result_seconds))
return result
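The same timestamp arithmetic as add_timestamp_and_timelag, as a self-contained sketch (the function name is hypothetical): parse a 'YYYY-MM-DD HH:MM:SS' string as UTC, add the lag in seconds, and format the result back.

```python
import calendar
import re
import time

def shift_timestamp(timestamp, timelag):
    """Add TIMELAG seconds to a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    # drop any trailing fractional seconds, e.g. '...:26.00' -> '...:26'
    timestamp = re.sub(r'\.[0-9]+$', '', timestamp)
    seconds = calendar.timegm(time.strptime(timestamp, "%Y-%m-%d %H:%M:%S"))
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(seconds + timelag))
```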
def update_lastrun(index):
""" A method that updates the lastrun of a repository
successfully harvested """
try:
today = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
sql = 'UPDATE oaiHARVEST SET lastrun=%s WHERE id=%s'
run_sql(sql, (today, index))
return 1
except StandardError, e:
return (0, e)
def call_bibharvest(prefix, baseurl, harvestpath,
fro="", until="", setspecs=""):
""" A method that calls bibharvest and writes harvested output to disk """
try:
- command = '%s/bibharvest -o %s -v ListRecords -p %s ' % (bindir,
+ command = '%s/bibharvest -o %s -v ListRecords -p %s ' % (CFG_BINDIR,
harvestpath,
prefix)
if fro != "":
command += '-f %s ' % fro
if until != "":
command += '-u %s ' % until
if setspecs != "":
command += '-s "%s" ' % setspecs
command += baseurl
print "Start harvesting"
#print command
os.system(command)
#print "Harvesting finished, merging files"
harvest_dir, harvest_filename = os.path.split(harvestpath)
#print "get files"
files = os.listdir(harvest_dir)
#print "sort file"
files.sort()
harvested_files = [harvest_dir + os.sep + filename for \
filename in files \
if filename.startswith(harvest_filename)]
#print "open dest file"
## hf = file(harvestpath, 'w')
## for f in files:
## if f.startswith(filename)
## print "processing file %s"%f
## rf = file(os.path.join(harvestdir, f), 'r')
## hf.write(rf.read())
## hf.write("\n")
## rf.close()
## #os.remove(os.path.join(harvestdir, f))
## hf.close()
## print "Files merged"
return (1, harvested_files)
except StandardError, e:
print e
return (0, e)
def call_bibconvert(config, harvestpath, convertpath):
""" A method that reads from a file and converts according to a BibConvert
Configuration file. Converted output is returned """
- command = """%s/bibconvert -c %s < %s > %s """ % (bindir, config,
+ command = """%s/bibconvert -c %s < %s > %s """ % (CFG_BINDIR, config,
harvestpath, convertpath)
os.popen(command)
return 0
def call_bibupload(marcxmlfile, mode="-r -i"):
"""Call bibupload in insert mode on MARCXMLFILE."""
if os.path.exists(marcxmlfile):
- command = '%s/bibupload %s %s ' % (bindir, mode, marcxmlfile)
+ command = '%s/bibupload %s %s ' % (CFG_BINDIR, mode, marcxmlfile)
return os.system(command)
else:
write_message("marcxmlfile %s does not exist" % marcxmlfile)
return 1
def call_bibfilter(bibfilterprogram, marcxmlfile):
"""
Call bibfilter program BIBFILTERPROGRAM on MARCXMLFILE that is a
MARCXML file usually obtained after harvest and convert steps.
The bibfilter should produce two files called MARCXMLFILE.insert.xml
and MARCXMLFILE.correct.xml, the first file containing parts of
MARCXML to be uploaded in insert mode and the second file part of
MARCXML to be uploaded in correct mode.
Return 0 if everything went okay, 1 otherwise.
"""
if bibfilterprogram:
if not os.path.isfile(bibfilterprogram):
write_message("bibfilterprogram %s is not a file" %
bibfilterprogram)
return 1
elif not os.path.isfile(marcxmlfile):
write_message("marcxmlfile %s is not a file" % marcxmlfile)
return 1
else:
return os.system('%s %s' % (bibfilterprogram, marcxmlfile))
else:
try:
write_message("no bibfilterprogram defined, copying %s only" %
marcxmlfile)
shutil.copy(marcxmlfile, marcxmlfile + ".insert.xml")
return 0
except:
write_message("cannot copy %s into %s.insert.xml" % marcxmlfile)
return 1
def get_row_from_reposname(reposname):
""" Returns all information about a row (OAI source)
from the source name """
try:
sql = """SELECT id, baseurl, metadataprefix, arguments,
comment, bibconvertcfgfile, name, lastrun,
frequency, postprocess, setspecs,
bibfilterprogram
FROM oaiHARVEST WHERE name=%s"""
res = run_sql(sql, (reposname, ))
reposdata = []
for element in res:
reposdata.append(element)
return reposdata
except StandardError, e:
return (0, e)
def get_all_rows_from_db():
""" This method retrieves the full database of repositories and returns
a list containing (in exact order):
| id | baseurl | metadataprefix | arguments | comment
| bibconvertcfgfile | name | lastrun | frequency
| postprocess | setspecs | bibfilterprogram
"""
try:
reposlist = []
sql = """SELECT id FROM oaiHARVEST"""
idlist = run_sql(sql)
for index in idlist:
sql = """SELECT id, baseurl, metadataprefix, arguments,
comment, bibconvertcfgfile, name, lastrun,
frequency, postprocess, setspecs,
bibfilterprogram
FROM oaiHARVEST WHERE id=%s""" % index
reposelements = run_sql(sql)
repos = []
for element in reposelements:
repos.append(element)
reposlist.append(repos)
return reposlist
except StandardError, e:
return (0, e)
def compare_timestamps_with_tolerance(timestamp1,
timestamp2,
tolerance=0):
"""Compare two timestamps TIMESTAMP1 and TIMESTAMP2, of the form
'2005-03-31 17:37:26'. Optionally receives a TOLERANCE argument
(in seconds). Return -1 if TIMESTAMP1 is less than TIMESTAMP2
minus TOLERANCE, 0 if they are equal within TOLERANCE limit,
and 1 if TIMESTAMP1 is greater than TIMESTAMP2 plus TOLERANCE.
"""
# remove any trailing .00 in timestamps:
timestamp1 = re.sub(r'\.[0-9]+$', '', timestamp1)
timestamp2 = re.sub(r'\.[0-9]+$', '', timestamp2)
# first convert timestamps to Unix epoch seconds:
timestamp1_seconds = calendar.timegm(time.strptime(timestamp1,
"%Y-%m-%d %H:%M:%S"))
timestamp2_seconds = calendar.timegm(time.strptime(timestamp2,
"%Y-%m-%d %H:%M:%S"))
# now compare them:
if timestamp1_seconds < timestamp2_seconds - tolerance:
return -1
elif timestamp1_seconds > timestamp2_seconds + tolerance:
return 1
else:
return 0
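The tolerance comparison documented above can be sketched standalone (hypothetical name, same -1/0/1 contract):

```python
import calendar
import re
import time

def compare_ts(ts1, ts2, tolerance=0):
    """Return -1, 0 or 1 as TS1 compares to TS2 within TOLERANCE seconds."""
    def to_seconds(ts):
        # strip trailing fractional seconds, then convert to Unix epoch (UTC)
        ts = re.sub(r'\.[0-9]+$', '', ts)
        return calendar.timegm(time.strptime(ts, "%Y-%m-%d %H:%M:%S"))
    s1, s2 = to_seconds(ts1), to_seconds(ts2)
    if s1 < s2 - tolerance:
        return -1
    if s1 > s2 + tolerance:
        return 1
    return 0
```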
def get_dates(dates):
""" A method to validate and process the dates input by the user
at the command line """
twodates = []
if dates:
datestring = dates.split(":")
if len(datestring)==2:
for date in datestring:
### perform some checks on the date format
datechunks = date.split("-")
if len(datechunks)==3:
try:
if int(datechunks[0]) and int(datechunks[1]) and \
int(datechunks[2]):
twodates.append(date)
except StandardError:
write_message("Dates have invalid format, not "
"'yyyy-mm-dd:yyyy-mm-dd'")
twodates = None
return twodates
else:
write_message("Dates have invalid format, not "
"'yyyy-mm-dd:yyyy-mm-dd'")
twodates = None
return twodates
## final check: date1 must be smaller than date2
date1 = str(twodates[0]) + " 01:00:00"
date2 = str(twodates[1]) + " 01:00:00"
if compare_timestamps_with_tolerance(date1, date2)!=-1:
write_message("First date must be before second date.")
twodates = None
return twodates
else:
write_message("Dates have invalid format, not "
"'yyyy-mm-dd:yyyy-mm-dd'")
twodates = None
else:
twodates = None
return twodates
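The validation performed by get_dates can be sketched compactly with strptime doing the format checks (hypothetical function name; returns None on any invalid input, like the original):

```python
import calendar
import time

def parse_date_range(dates):
    """Split 'yyyy-mm-dd:yyyy-mm-dd' into [from, until], or None if invalid."""
    try:
        fro, until = dates.split(":")
        fro_sec = calendar.timegm(time.strptime(fro, "%Y-%m-%d"))
        until_sec = calendar.timegm(time.strptime(until, "%Y-%m-%d"))
    except ValueError:
        # wrong number of parts, or a date that does not parse
        return None
    if fro_sec >= until_sec:
        return None  # first date must be before second date
    return [fro, until]
```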
def get_repository_names(repositories):
""" A method to validate and process the repository names input by the
user at the command line """
repository_names = []
if repositories:
names = repositories.split(", ")
for name in names:
### take into account both single word names and multiple word
### names (which get wrapped around "" or '')
quote = "'"
doublequote = '"'
if name.startswith(quote) and name.endswith(quote):
name = name.split(quote)[1]
if name.startswith(doublequote) and name.endswith(doublequote):
name = name.split(doublequote)[1]
repository_names.append(name)
else:
repository_names = None
return repository_names
def main():
"""Main that construct all the bibtask."""
task_set_option("repository", None)
task_set_option("dates", None)
task_init(authorization_action='runoaiharvest',
authorization_msg="oaiharvest Task Submission",
description="""Examples:
oaiharvest -r arxiv -s 24h
oaiharvest -r pubmed -d 2005-05-05:2005-05-10 -t 10m\n""",
help_specific_usage=' -r, --repository=REPOS_ONE, "REPOS TWO" '
'name of the OAI repositories to be harvested (default=all)\n'
' -d, --dates=yyyy-mm-dd:yyyy-mm-dd '
'harvest repositories between specified dates '
'(overrides repositories\' last updated timestamps)\n',
version=__revision__,
specific_params=("r:d:", ["repository=", "dates=", ]),
task_submit_elaborate_specific_parameter_fnc=
task_submit_elaborate_specific_parameter,
task_run_fnc=task_run_core)
def task_submit_elaborate_specific_parameter(key, value, opts, args):
"""Elaborate specific cli parameters for oaiharvest."""
if key in ("-r", "--repository"):
task_set_option('repository', get_repository_names(value))
elif key in ("-d", "--dates"):
task_set_option('dates', get_dates(value))
if value is not None and task_get_option("dates") is None:
raise StandardError, "Date format not valid."
else:
return False
return True
### okay, here we go:
if __name__ == '__main__':
main()
diff --git a/modules/bibindex/lib/bibindex_engine.py b/modules/bibindex/lib/bibindex_engine.py
index 5805394c1..95229aacf 100644
--- a/modules/bibindex/lib/bibindex_engine.py
+++ b/modules/bibindex/lib/bibindex_engine.py
@@ -1,1584 +1,1584 @@
# -*- coding: utf-8 -*-
##
## $Id$
## BibIndex: bibliographic data, reference and fulltext indexing utility.
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibIndex indexing engine implementation. See bibindex executable for entry point.
"""
__revision__ = "$Id$"
import os
import re
import sys
import time
import urllib2
import tempfile
import traceback
from invenio.config import \
CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS, \
CFG_BIBINDEX_CHARS_PUNCTUATION, \
CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY, \
CFG_BIBINDEX_MIN_WORD_LENGTH, \
CFG_BIBINDEX_REMOVE_HTML_MARKUP, \
CFG_BIBINDEX_REMOVE_LATEX_MARKUP, \
- weburl, tmpdir
+ weburl, CFG_TMPDIR
from invenio.bibindex_engine_config import *
from invenio.bibdocfile import bibdocfile_url_to_fullpath, bibdocfile_url_p, decompose_bibdocfile_url
from invenio.search_engine import perform_request_search, strip_accents, wash_index_term, get_index_stemming_language
from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, deserialize_via_marshal
from invenio.bibindex_engine_stopwords import is_stopword
from invenio.bibindex_engine_stemmer import stem
from invenio.bibtask import task_init, write_message, get_datetime, \
task_set_option, task_get_option, task_get_task_param, task_update_status, \
task_update_progress
from invenio.intbitset import intbitset
from invenio.errorlib import register_exception
## import optional modules:
try:
import psyco
psyco.bind(get_words_from_phrase)
psyco.bind(WordTable.merge_with_old_recIDs)
except:
pass
## precompile some often-used regexp for speed reasons:
re_subfields = re.compile('\$\$\w')
re_html = re.compile("(?s)<[^>]*>|&#?\w+;")
re_block_punctuation_begin = re.compile(r"^"+CFG_BIBINDEX_CHARS_PUNCTUATION+"+")
re_block_punctuation_end = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION+"+$")
re_punctuation = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION)
re_separators = re.compile(CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS)
re_datetime_shift = re.compile("([-\+]{0,1})([\d]+)([dhms])")
nb_char_in_line = 50 # for verbose pretty printing
chunksize = 1000 # default size of chunks that the records will be treated by
base_process_size = 4500 # process base size
_last_word_table = None
## Dictionary merging functions
def intersection(dict1, dict2):
"Returns intersection of the two dictionaries."
int_dict = {}
if len(dict1) > len(dict2):
for e in dict2:
if dict1.has_key(e):
int_dict[e] = 1
else:
for e in dict1:
if dict2.has_key(e):
int_dict[e] = 1
return int_dict
def union(dict1, dict2):
"Returns union of the two dictionaries."
union_dict = {}
for e in dict1.keys():
union_dict[e] = 1
for e in dict2.keys():
union_dict[e] = 1
return union_dict
def diff(dict1, dict2):
"Returns dict1 - dict2."
diff_dict = {}
for e in dict1.keys():
if not dict2.has_key(e):
diff_dict[e] = 1
return diff_dict
def list_union(list1, list2):
"Returns union of the two lists."
union_dict = {}
for e in list1:
union_dict[e] = 1
for e in list2:
union_dict[e] = 1
return union_dict.keys()
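These {key: 1} dictionary helpers predate Python's set type; the same merges can be sketched directly with comprehensions (hypothetical wrapper names), mirroring the original's trick of iterating over the smaller dictionary for the intersection:

```python
def keys_intersection(dict1, dict2):
    """Keys present in both dictionaries, as a {key: 1} map."""
    smaller, larger = (dict1, dict2) if len(dict1) <= len(dict2) else (dict2, dict1)
    return dict((key, 1) for key in smaller if key in larger)

def keys_union(dict1, dict2):
    """Keys present in either dictionary, as a {key: 1} map."""
    return dict((key, 1) for key in list(dict1) + list(dict2))

def keys_diff(dict1, dict2):
    """Keys of dict1 that are not in dict2, as a {key: 1} map."""
    return dict((key, 1) for key in dict1 if key not in dict2)
```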
## safety function for killing slow DB threads:
def kill_sleepy_mysql_threads(max_threads=CFG_MAX_MYSQL_THREADS, thread_timeout=CFG_MYSQL_THREAD_TIMEOUT):
"""Check the number of DB threads and if there are more than
MAX_THREADS of them, kill all threads that are in a sleeping
state for more than THREAD_TIMEOUT seconds. (This is useful
for working around the the max_connection problem that appears
during indexation in some not-yet-understood cases.) If some
threads are to be killed, write info into the log file.
"""
res = run_sql("SHOW FULL PROCESSLIST")
if len(res) > max_threads:
for row in res:
r_id, dummy, dummy, dummy, r_command, r_time, dummy, dummy = row
if r_command == "Sleep" and int(r_time) > thread_timeout:
run_sql("KILL %s", (r_id,))
write_message("WARNING: too many DB threads, killing thread %s" % r_id, verbose=1)
return
## MARC-21 tag/field access functions
def get_fieldvalues(recID, tag):
"""Returns list of values of the MARC-21 'tag' fields for the record
'recID'."""
out = []
bibXXx = "bib" + tag[0] + tag[1] + "x"
bibrec_bibXXx = "bibrec_" + bibXXx
query = "SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'" \
% (bibXXx, bibrec_bibXXx, recID, tag)
res = run_sql(query)
for row in res:
out.append(row[0])
return out
def get_associated_subfield_value(recID, tag, value, associated_subfield_code):
"""Return list of ASSOCIATED_SUBFIELD_CODE, if exists, for record
RECID and TAG of value VALUE. Used by fulltext indexer only.
Note: TAG must be 6 characters long (tag+ind1+ind2+sfcode),
otherwise en empty string is returned.
FIXME: what if many tag values have the same value but different
associated_subfield_code? Better use bibrecord library for this.
"""
out = ""
if len(tag) != 6:
return out
bibXXx = "bib" + tag[0] + tag[1] + "x"
bibrec_bibXXx = "bibrec_" + bibXXx
query = """SELECT bb.field_number, b.tag, b.value FROM %s AS b, %s AS bb
WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s%%'""" % \
(bibXXx, bibrec_bibXXx, recID, tag[:-1])
res = run_sql(query)
field_number = -1
for row in res:
if row[1] == tag and row[2] == value:
field_number = row[0]
if field_number > 0:
for row in res:
if row[0] == field_number and row[1] == tag[:-1] + associated_subfield_code:
out = row[2]
break
return out
def get_field_tags(field):
"""Returns a list of MARC tags for the field code 'field'.
Returns empty list in case of error.
Example: field='author', output=['100__%','700__%']."""
out = []
query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f
WHERE f.code='%s' AND ft.id_field=f.id AND t.id=ft.id_tag
ORDER BY ft.score DESC""" % field
res = run_sql(query)
for row in res:
out.append(row[0])
return out
## Fulltext word extraction functions
def get_fulltext_urls_from_html_page(htmlpagebody):
"""Parses htmlpagebody data (the splash page content) looking for
url_directs referring to probable fulltexts.
Returns an array of (ext,url_direct) to fulltexts.
Note: it looks for file format extensions as defined by global
'CONV_PROGRAMS' structure, minus the HTML ones, because we don't
want to index HTML pages that the splash page might point to.
"""
out = []
for ext in CONV_PROGRAMS.keys():
expr = re.compile( r"\"(http://[\w]+\.+[\w]+[^\"'><]*\." + \
ext + r")\"")
match = expr.search(htmlpagebody)
if match and ext not in ['htm', 'html']:
out.append([ext, match.group(1)])
#else: # FIXME: workaround for getfile, should use bibdoc tables
#expr_getfile = re.compile(r"\"(http://.*getfile\.py\?.*format=" + ext + r"&version=.*)\"")
#match = expr_getfile.search(htmlpagebody)
#if match and ext not in ['htm', 'html']:
#out.append([ext, match.group(1)])
return out
def get_words_from_local_fulltext(path, ext='', stemming_language=None):
# FIXME
words = {} # accumulator for extracted words (keys are words, values 1)
if not ext:
ext = path[len(file_strip_ext(path))+1:].lower()
tmp_name = path.replace(';', '\\;')
tmp_dst_name = tempfile.mkstemp('invenio.tmp.txt', dir=CFG_TMPDIR)[1]
# try all available conversion programs according to their order:
bingo = 0
for conv_program in CONV_PROGRAMS.get(ext, []):
if os.path.exists(conv_program):
# intelligence on how to run various conversion programs:
cmd = "" # wil keep command to run
bingo = 0 # had we success?
if os.path.basename(conv_program) == "pdftotext":
cmd = "%s -enc UTF-8 %s %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "pstotext":
if ext == "ps.gz":
# is there gzip available?
if os.path.exists(CONV_PROGRAMS_HELPERS["gz"]):
cmd = "%s -cd %s | %s > %s" \
% (CONV_PROGRAMS_HELPERS["gz"], tmp_name, conv_program, tmp_dst_name)
else:
cmd = "%s %s > %s" \
% (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "ps2ascii":
if ext == "ps.gz":
# is there gzip available?
if os.path.exists(CONV_PROGRAMS_HELPERS["gz"]):
cmd = "%s -cd %s | %s > %s"\
% (CONV_PROGRAMS_HELPERS["gz"], tmp_name,
conv_program, tmp_dst_name)
else:
cmd = "%s %s %s" \
% (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "antiword":
cmd = "%s %s > %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "catdoc":
cmd = "%s %s > %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "wvText":
cmd = "%s %s %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "ppthtml":
# is there html2text available?
if os.path.exists(CONV_PROGRAMS_HELPERS["html"]):
cmd = "%s %s | %s > %s"\
% (conv_program, tmp_name,
CONV_PROGRAMS_HELPERS["html"], tmp_dst_name)
else:
cmd = "%s %s > %s" \
% (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "xlhtml":
# is there html2text available?
if os.path.exists(CONV_PROGRAMS_HELPERS["html"]):
cmd = "%s %s | %s > %s" % \
(conv_program, tmp_name,
CONV_PROGRAMS_HELPERS["html"], tmp_dst_name)
else:
cmd = "%s %s > %s" % \
(conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "html2text":
cmd = "%s %s > %s" % \
(conv_program, tmp_name, tmp_dst_name)
else:
sys.stderr.write("Error: Do not know how to handle %s conversion program.\n" % conv_program)
# try to run it:
try:
write_message("..... launching %s" % cmd, verbose=9)
# Note we replace ; in order to make happy internal file names
errcode = os.system(cmd)
if errcode == 0 and os.path.exists(tmp_dst_name):
bingo = 1
break # bingo!
else:
write_message("Error while running %s for %s.\n" % (cmd, path), sys.stderr)
except:
write_message("Error running %s for %s.\n" % (cmd, path), sys.stderr)
# were we successful?
if bingo:
tmp_name_txt_file = open(tmp_dst_name)
for phrase in tmp_name_txt_file.xreadlines():
for word in get_words_from_phrase(phrase, stemming_language):
if not words.has_key(word):
words[word] = 1
tmp_name_txt_file.close()
else:
write_message("No conversion success for %s.\n" % (path), sys.stderr)
# delete temp files (they might not exist):
try:
os.unlink(tmp_dst_name)
except StandardError:
write_message("Error: Could not delete file. It didn't exist", sys.stderr)
write_message("... reading fulltext files from %s ended" % path, verbose=2)
return words.keys()
def get_words_from_fulltext(url_direct_or_indirect, stemming_language=None):
"""Returns all the words contained in the document specified by
URL_DIRECT_OR_INDIRECT with the words being split by various
SRE_SEPARATORS regexp set earlier. If FORCE_FILE_EXTENSION is
set (e.g. to "pdf", then treat URL_DIRECT_OR_INDIRECT as a PDF
file. (This is interesting to index Indico for example.) Note
also that URL_DIRECT_OR_INDIRECT may be either a direct URL to
the fulltext file or an URL to a setlink-like page body that
presents the links to be indexed. In the latter case the
URL_DIRECT_OR_INDIRECT is parsed to extract actual direct URLs
to fulltext documents, for all knows file extensions as
specified by global CONV_PROGRAMS config variable.
"""
if CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY and \
url_direct_or_indirect.find(weburl) < 0:
return []
write_message("... reading fulltext files from %s started" % url_direct_or_indirect, verbose=2)
fulltext_urls = []
if bibdocfile_url_p(url_direct_or_indirect):
write_message("... url %s is an internal url" % url_direct_or_indirect, verbose=9)
ext = decompose_bibdocfile_url(url_direct_or_indirect)[2]
if ext[0] == '.':
ext = ext[1:].lower()
fulltext_urls = [(ext, url_direct_or_indirect)]
else:
# check for direct link in url
url_direct_or_indirect_ext = url_direct_or_indirect.split(".")[-1].lower()
if url_direct_or_indirect_ext in CONV_PROGRAMS.keys():
fulltext_urls = [(url_direct_or_indirect_ext, url_direct_or_indirect)]
# Indirect URL. Try to discover the real fulltext(s) from this splash page URL.
if not fulltext_urls:
# read "setlink" data
try:
htmlpagebody = urllib2.urlopen(url_direct_or_indirect).read()
except Exception, e:
register_exception()
sys.stderr.write("Error: Cannot read %s: %s" % (url_direct_or_indirect, e))
return []
fulltext_urls = get_fulltext_urls_from_html_page(htmlpagebody)
write_message("... fulltext_urls = %s" % fulltext_urls, verbose=9)
write_message('... data to elaborate: %s' % fulltext_urls, verbose=9)
words = {}
# process as many urls as they were found:
for (ext, url_direct) in fulltext_urls:
write_message(".... processing %s from %s started" % (ext, url_direct), verbose=2)
# sanity check:
if not url_direct:
break
if bibdocfile_url_p(url_direct):
# Let's manage this with BibRecDocs...
# We got something like http://$(weburl)/record/xxx/yyy.ext
try:
tmp_name = bibdocfile_url_to_fullpath(url_direct)
write_message("Found internal path %s for url %s" % (tmp_name, url_direct), verbose=2)
no_src_delete = True
except Exception, e:
register_exception()
sys.stderr.write("Error in retrieving fulltext from internal url %s: %s\n" % (url_direct, e))
break # try other fulltext files...
else:
# read fulltext file:
try:
url = urllib2.urlopen(url_direct)
no_src_delete = False
except Exception, e:
register_exception()
sys.stderr.write("Error: Cannot read %s: %s\n" % (url_direct, e))
break # try other fulltext files...
tmp_fd, tmp_name = tempfile.mkstemp('invenio.tmp')
data_chunk = url.read(8*1024)
while data_chunk:
os.write(tmp_fd, data_chunk)
data_chunk = url.read(8*1024)
os.close(tmp_fd)
tmp_dst_name = tempfile.mkstemp('invenio.tmp.txt', dir=CFG_TMPDIR)[1]
bingo = 0
# try all available conversion programs according to their order:
for conv_program in CONV_PROGRAMS.get(ext, []):
if os.path.exists(conv_program):
# intelligence on how to run various conversion programs:
cmd = "" # will keep command to run
bingo = 0 # had we success?
if os.path.basename(conv_program) == "pdftotext":
cmd = "%s -enc UTF-8 %s %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "pstotext":
if ext == "ps.gz":
# is there gzip available?
if os.path.exists(CONV_PROGRAMS_HELPERS["gz"]):
cmd = "%s -cd %s | %s > %s" \
% (CONV_PROGRAMS_HELPERS["gz"], tmp_name, conv_program, tmp_dst_name)
else:
cmd = "%s %s > %s" \
% (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "ps2ascii":
if ext == "ps.gz":
# is there gzip available?
if os.path.exists(CONV_PROGRAMS_HELPERS["gz"]):
cmd = "%s -cd %s | %s > %s"\
% (CONV_PROGRAMS_HELPERS["gz"], tmp_name,
conv_program, tmp_dst_name)
else:
cmd = "%s %s %s" \
% (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "antiword":
cmd = "%s %s > %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "catdoc":
cmd = "%s %s > %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "wvText":
cmd = "%s %s %s" % (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "ppthtml":
# is there html2text available?
if os.path.exists(CONV_PROGRAMS_HELPERS["html"]):
cmd = "%s %s | %s > %s"\
% (conv_program, tmp_name,
CONV_PROGRAMS_HELPERS["html"], tmp_dst_name)
else:
cmd = "%s %s > %s" \
% (conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "xlhtml":
# is there html2text available?
if os.path.exists(CONV_PROGRAMS_HELPERS["html"]):
cmd = "%s %s | %s > %s" % \
(conv_program, tmp_name,
CONV_PROGRAMS_HELPERS["html"], tmp_dst_name)
else:
cmd = "%s %s > %s" % \
(conv_program, tmp_name, tmp_dst_name)
elif os.path.basename(conv_program) == "html2text":
cmd = "%s %s > %s" % \
(conv_program, tmp_name, tmp_dst_name)
else:
sys.stderr.write("Error: Do not know how to handle %s conversion program.\n" % conv_program)
# try to run it:
try:
write_message("..... launching %s" % cmd, verbose=9)
# Note we replace ; in order to make happy internal file names
errcode = os.system(cmd.replace(';', '\\;'))
if errcode == 0 and os.path.exists(tmp_dst_name):
bingo = 1
break # bingo!
else:
write_message("Error while running %s for %s.\n" % (cmd, url_direct), sys.stderr)
except:
write_message("Error running %s for %s.\n" % (cmd, url_direct), sys.stderr)
# were we successful?
if bingo:
tmp_name_txt_file = open(tmp_dst_name)
for phrase in tmp_name_txt_file.xreadlines():
for word in get_words_from_phrase(phrase, stemming_language):
if not words.has_key(word):
words[word] = 1
tmp_name_txt_file.close()
else:
write_message("No conversion success for %s.\n" % (url_direct), sys.stderr)
# delete temp files (they might not exist):
try:
if not no_src_delete:
os.unlink(tmp_name)
os.unlink(tmp_dst_name)
except StandardError:
write_message("Error: Could not delete file. It didn't exist", sys.stderr)
write_message(".... processing %s from %s ended" % (ext, url_direct), verbose=2)
write_message("... reading fulltext files from %s ended" % url_direct_or_indirect, verbose=2)
return words.keys()
latex_markup_re = re.compile(r"\\begin(\[.+?\])?\{.+?\}|\\end\{.+?\}|\\\w+(\[.+?\])?\{(?P<inside1>.*?)\}|\{\\\w+ (?P<inside2>.*?)\}")
def remove_latex_markup(phrase):
ret_phrase = ''
index = 0
for match in latex_markup_re.finditer(phrase):
ret_phrase += phrase[index:match.start()]
ret_phrase += match.group('inside1') or match.group('inside2') or ''
index = match.end()
ret_phrase += phrase[index:]
return ret_phrase
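## Illustrative example (assuming the 'inside1'/'inside2' named groups are
## defined in latex_markup_re as referenced above): the braced content of a
## LaTeX command is kept while the command itself is stripped, e.g.
##   remove_latex_markup(r"\emph{Higgs} boson") -> "Higgs boson"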
def get_nothing_from_phrase(phrase, stemming_language=None):
""" A dump implementation of get_words_from_phrase to be used when
when a tag should not be indexed (such as when trying to extract phrases from
8564_u)."""
return []
latex_formula_re = re.compile(r'\$.*?\$|\\\[.*?\\\]')
def get_words_from_phrase(phrase, stemming_language=None):
"""Return list of words found in PHRASE. Note that the phrase is
split into groups depending on the alphanumeric characters and
punctuation characters definition present in the config file.
"""
words = {}
formulas = []
if CFG_BIBINDEX_REMOVE_HTML_MARKUP and phrase.find("</") > -1:
phrase = re_html.sub(' ', phrase)
if CFG_BIBINDEX_REMOVE_LATEX_MARKUP:
formulas = latex_formula_re.findall(phrase)
phrase = remove_latex_markup(phrase)
phrase = latex_formula_re.sub(' ', phrase)
phrase = phrase.lower()
# 1st split phrase into blocks according to whitespace
for block in strip_accents(phrase).split():
# 2nd remove leading/trailing punctuation and add block:
block = re_block_punctuation_begin.sub("", block)
block = re_block_punctuation_end.sub("", block)
if block:
if stemming_language:
block = apply_stemming_and_stopwords_and_length_check(block, stemming_language)
if block:
words[block] = 1
# 3rd break each block into subblocks according to punctuation and add subblocks:
for subblock in re_punctuation.split(block):
if stemming_language:
subblock = apply_stemming_and_stopwords_and_length_check(subblock, stemming_language)
if subblock:
words[subblock] = 1
# 4th break each subblock into alphanumeric groups and add groups:
for alphanumeric_group in re_separators.split(subblock):
if stemming_language:
alphanumeric_group = apply_stemming_and_stopwords_and_length_check(alphanumeric_group, stemming_language)
if alphanumeric_group:
words[alphanumeric_group] = 1
for block in formulas:
words[block] = 1
return words.keys()
phrase_delimiter_re = re.compile(r'[\.:;\?\!]')
space_cleaner_re = re.compile(r'\s+')
def get_phrases_from_phrase(phrase, stemming_language=None):
"""Return list of phrases found in PHRASE. Note that the phrase is
split into groups depending on the alphanumeric characters and
punctuation characters definition present in the config file.
"""
words = {}
phrase = strip_accents(phrase)
# 1st split phrase into blocks according to whitespace
for block1 in phrase_delimiter_re.split(strip_accents(phrase)):
block1 = block1.strip()
if block1 and stemming_language:
new_words = []
for block2 in re_punctuation.split(block1):
block2 = block2.strip()
if block2:
for block3 in block2.split():
block3 = block3.strip()
if block3:
block3 = apply_stemming_and_stopwords_and_length_check(block3, stemming_language)
if block3:
new_words.append(block3)
block1 = ' '.join(new_words)
if block1:
words[block1] = 1
return words.keys()
def apply_stemming_and_stopwords_and_length_check(word, stemming_language):
"""Return WORD after applying stemming and stopword and length checks.
See the config file in order to influence these.
"""
# now check against stopwords:
if is_stopword(word):
return ""
# then check the word length:
if len(word) < CFG_BIBINDEX_MIN_WORD_LENGTH:
return ""
# stem word, when configured so:
if stemming_language:
word = stem(word, stemming_language)
return word
def remove_subfields(s):
"Removes subfields from string, e.g. 'foo $$c bar' becomes 'foo bar'."
return re_subfields.sub(' ', s)
def get_index_id_from_index_name(index_name):
"""Returns the words/phrase index id for INDEXNAME.
Returns empty string in case there is no words table for this index.
Example: field='author', output=4."""
out = 0
query = """SELECT w.id FROM idxINDEX AS w
WHERE w.name='%s' LIMIT 1""" % index_name
res = run_sql(query, None, 1)
if res:
out = res[0][0]
return out
def get_index_tags(indexname):
"""Returns the list of tags that are indexed inside INDEXNAME.
Returns empty list in case there are no tags indexed in this index.
Note: uses get_field_tags() defined before.
Example: field='author', output=['100__%', '700__%']."""
out = []
query = """SELECT f.code FROM idxINDEX AS w, idxINDEX_field AS wf,
field AS f WHERE w.name='%s' AND w.id=wf.id_idxINDEX
AND f.id=wf.id_field""" % indexname
res = run_sql(query)
for row in res:
out.extend(get_field_tags(row[0]))
return out
def get_all_indexes():
"""Returns the list of the names of all defined words indexes.
Returns empty list in case there are no tags indexed in this index.
Example: output=['global', 'author']."""
out = []
query = """SELECT name FROM idxINDEX"""
res = run_sql(query)
for row in res:
out.append(row[0])
return out
def split_ranges(parse_string):
"""Parse a string a return the list or ranges."""
recIDs = []
ranges = parse_string.split(",")
for arange in ranges:
tmp_recIDs = arange.split("-")
if len(tmp_recIDs)==1:
recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])])
else:
if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check
tmp = tmp_recIDs[0]
tmp_recIDs[0] = tmp_recIDs[1]
tmp_recIDs[1] = tmp
recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])])
return recIDs
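## Illustrative examples:
##   split_ranges("1-5,7") -> [[1, 5], [7, 7]]
##   split_ranges("10-2") -> [[2, 10]]  (bounds are swapped by the sanity check)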
def get_word_tables(tables):
""" Given a list of table names it return a dictionary of index_id : index_tags.
if tables is empty it returns the whole dictionary."""
wordTables = {}
if tables:
indexes = tables.split(",")
for index in indexes:
index_id = get_index_id_from_index_name(index)
if index_id:
wordTables[index_id] = get_index_tags(index)
else:
write_message("Error: There is no %s words table." % index, sys.stderr)
else:
for index in get_all_indexes():
index_id = get_index_id_from_index_name(index)
wordTables[index_id] = get_index_tags(index)
return wordTables
def get_date_range(var):
"Returns the two dates contained as a low,high tuple"
limits = var.split(",")
if len(limits)==1:
low = get_datetime(limits[0])
return low, None
if len(limits)==2:
low = get_datetime(limits[0])
high = get_datetime(limits[1])
return low, high
return None, None
def create_range_list(res):
"""Creates a range list from a recID select query result contained
in res. The result is expected to have ascending numerical order."""
if not res:
return []
row = res[0]
if not row:
return []
else:
range_list = [[row[0], row[0]]]
for row in res[1:]:
row_id = row[0]
if row_id == range_list[-1][1] + 1:
range_list[-1][1] = row_id
else:
range_list.append([row_id, row_id])
return range_list
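## Illustrative example: for query rows (1,), (2,), (3,), (5,) the
## consecutive ids are collapsed into ranges:
##   create_range_list([(1,), (2,), (3,), (5,)]) -> [[1, 3], [5, 5]]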
def beautify_range_list(range_list):
"""Returns a non overlapping, maximal range list"""
ret_list = []
for new in range_list:
found = 0
for old in ret_list:
if new[0] <= old[0] <= new[1] + 1 or new[0] - 1 <= old[1] <= new[1]:
old[0] = min(old[0], new[0])
old[1] = max(old[1], new[1])
found = 1
break
if not found:
ret_list.append(new)
return ret_list
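## Illustrative example: overlapping or adjacent ranges are merged:
##   beautify_range_list([[1, 5], [4, 10], [12, 12]]) -> [[1, 10], [12, 12]]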
def truncate_index_table(index_name):
"""Properly truncate the given index."""
index_id = get_index_id_from_index_name(index_name)
if index_id:
write_message('Truncating %s index table in order to reindex.' % index_name, verbose=2)
run_sql("UPDATE idxINDEX SET last_updated='0000-00-00 00:00:00' WHERE id=%s", (index_id,))
run_sql("TRUNCATE idxWORD%02dF" % index_id)
run_sql("TRUNCATE idxWORD%02dR" % index_id)
run_sql("TRUNCATE idxPHRASE%02dF" % index_id)
run_sql("TRUNCATE idxPHRASE%02dR" % index_id)
class WordTable:
"A class to hold the words table."
def __init__(self, index_id, fields_to_index, table_name_pattern, default_get_words_fnc, tag_to_words_fnc_map, wash_index_terms=True):
"""Creates words table instance.
@param index_id the index integer identifier
@param fields_to_index a list of fields to index
@param table_name_pattern e.g. idxWORD%02dF or idxPHRASE%02dF
@param default_get_words_fnc the default function called to extract
words from a metadata field
@param tag_to_words_fnc_map a mapping to specify a particular function to
extract words from particular metadata (such as 8564_u)
"""
self.index_id = index_id
self.tablename = table_name_pattern % index_id
self.recIDs_in_mem = []
self.fields_to_index = fields_to_index
self.value = {}
self.stemming_language = get_index_stemming_language(index_id)
self.wash_index_terms = wash_index_terms
# tagToFunctions mapping. It offers an indirection level necessary for
# indexing fulltext. The default is get_words_from_phrase
self.tag_to_words_fnc_map = tag_to_words_fnc_map
self.default_get_words_fnc = default_get_words_fnc
if self.stemming_language:
write_message('Stemming(%s) is enabled for table %s' % (self.stemming_language, self.tablename))
else:
write_message('Stemming is disabled for table %s' % self.tablename)
def get_field(self, recID, tag):
"""Returns list of values of the MARC-21 'tag' fields for the
record 'recID'."""
out = []
bibXXx = "bib" + tag[0] + tag[1] + "x"
bibrec_bibXXx = "bibrec_" + bibXXx
query = """SELECT value FROM %s AS b, %s AS bb
WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id
AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag);
res = run_sql(query)
for row in res:
out.append(row[0])
return out
def clean(self):
"Cleans the words table."
self.value = {}
def put_into_db(self, mode="normal"):
"""Updates the current words table in the corresponding DB
idxFOO table. Mode 'normal' means normal execution,
mode 'emergency' means words index reverting to old state.
"""
write_message("%s %s wordtable flush started" % (self.tablename, mode), verbose=2)
write_message('...updating %d words into %s started' % \
(len(self.value), self.tablename), verbose=2)
task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value)))
self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem)
if mode == "normal":
for group in self.recIDs_in_mem:
query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
nb_words_total = len(self.value)
nb_words_report = int(nb_words_total/10.0)
nb_words_done = 0
for word in self.value.keys():
self.put_word_into_db(word)
nb_words_done += 1
if nb_words_report != 0 and ((nb_words_done % nb_words_report) == 0):
write_message('......processed %d/%d words' % (nb_words_done, nb_words_total))
task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total))
write_message('...updating %d words into %s ended' % \
(nb_words_total, self.tablename), verbose=9)
write_message('...updating reverse table %sR started' % self.tablename[:-1])
if mode == "normal":
for group in self.recIDs_in_mem:
query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
query = """DELETE FROM %sR WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
write_message('End of updating wordTable into %s' % self.tablename, verbose=9)
elif mode == "emergency":
for group in self.recIDs_in_mem:
query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
query = """DELETE FROM %sR WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9)
write_message('...updating reverse table %sR ended' % self.tablename[:-1])
self.clean()
self.recIDs_in_mem = []
write_message("%s %s wordtable flush ended" % (self.tablename, mode))
task_update_progress("%s flush ended" % (self.tablename))
def load_old_recIDs(self, word):
"""Load existing hitlist for the word from the database index files."""
query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename
res = run_sql(query, (word,))
if res:
return intbitset(res[0][0])
else:
return None
def merge_with_old_recIDs(self,word,set):
"""Merge the system numbers stored in memory (hash of recIDs with value +1 or -1
according to whether to add/delete them) with those stored in the database index
and received in set universe of recIDs for the given word.
Return 0 in case no change was done to SET, return 1 in case SET was changed.
"""
oldset = intbitset(set)
set.update_with_signs(self.value[word])
return set != oldset
def put_word_into_db(self, word):
"""Flush a single word to the database and delete it from memory"""
set = self.load_old_recIDs(word)
if set: # merge the word recIDs found in memory:
if self.merge_with_old_recIDs(word,set) == 0:
# nothing to update:
write_message("......... unchanged hitlist for ``%s''" % word, verbose=9)
pass
else:
# yes there were some new words:
write_message("......... updating hitlist for ``%s''" % word, verbose=9)
run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename,
(set.fastdump(), word))
else: # the word is new, will create new set:
write_message("......... inserting hitlist for ``%s''" % word, verbose=9)
set = intbitset(self.value[word].keys())
run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename,
(word, set.fastdump()))
if not set: # never store empty words
run_sql("DELETE from %s WHERE term=%%s" % self.tablename,
(word,))
del self.value[word]
def display(self):
"Displays the word table."
keys = self.value.keys()
keys.sort()
for k in keys:
write_message("%s: %s" % (k, self.value[k]))
def count(self):
"Returns the number of words in the table."
return len(self.value)
def info(self):
"Prints some information on the words table."
write_message("The words table contains %d words." % self.count())
def lookup_words(self, word=""):
"Lookup word from the words table."
if not word:
done = 0
while not done:
try:
word = raw_input("Enter word: ")
done = 1
except (EOFError, KeyboardInterrupt):
return
if self.value.has_key(word):
write_message("The word '%s' is found %d times." \
% (word, len(self.value[word])))
else:
write_message("The word '%s' does not exist in the word file."\
% word)
def update_last_updated(self, starting_time=None):
"""Update last_updated column of the index table in the database.
Puts starting time there so that if the task was interrupted for record download,
the records will be reindexed next time."""
if starting_time is None:
return None
write_message("updating last_updated to %s...", starting_time, verbose=9)
return run_sql("UPDATE idxINDEX SET last_updated=%s WHERE id=%s",
(starting_time, self.index_id,))
def add_recIDs(self, recIDs, opt_flush):
"""Fetches records which id in the recIDs range list and adds
them to the wordTable. The recIDs range list is of the form:
[[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
"""
global chunksize, _last_word_table
flush_count = 0
records_done = 0
records_to_go = 0
for arange in recIDs:
records_to_go = records_to_go + arange[1] - arange[0] + 1
time_started = time.time() # will measure profile time
for arange in recIDs:
i_low = arange[0]
chunksize_count = 0
while i_low <= arange[1]:
# calculate chunk group of recIDs and treat it:
i_high = min(i_low+opt_flush-flush_count-1,arange[1])
i_high = min(i_low+chunksize-chunksize_count-1, i_high)
try:
self.chk_recID_range(i_low, i_high)
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option('verbose') >= 9:
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
self.put_into_db()
sys.exit(1)
write_message("%s adding records #%d-#%d started" % \
(self.tablename, i_low, i_high))
if CFG_CHECK_MYSQL_THREADS:
kill_sleepy_mysql_threads()
task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high))
self.del_recID_range(i_low, i_high)
just_processed = self.add_recID_range(i_low, i_high)
flush_count = flush_count + i_high - i_low + 1
chunksize_count = chunksize_count + i_high - i_low + 1
records_done = records_done + just_processed
write_message("%s adding records #%d-#%d ended " % \
(self.tablename, i_low, i_high))
if chunksize_count >= chunksize:
chunksize_count = 0
# flush if necessary:
if flush_count >= opt_flush:
self.put_into_db()
self.clean()
write_message("%s backing up" % (self.tablename))
flush_count = 0
self.log_progress(time_started,records_done,records_to_go)
# iterate:
i_low = i_high + 1
if flush_count > 0:
self.put_into_db()
self.log_progress(time_started,records_done,records_to_go)
def add_recIDs_by_date(self, dates, opt_flush):
"""Add records that were modified between DATES[0] and DATES[1].
If DATES is not set, then add records that were modified since
the last update of the index.
"""
if not dates:
table_id = self.tablename[-3:-1]
query = """SELECT last_updated FROM idxINDEX WHERE id='%s'
""" % table_id
res = run_sql(query)
if not res:
return
if not res[0][0]:
dates = ("0000-00-00", None)
else:
dates = (res[0][0], None)
if dates[1] is None:
res = run_sql("""SELECT b.id FROM bibrec AS b
WHERE b.modification_date >= %s ORDER BY b.id ASC""",
(dates[0],))
elif dates[0] is None:
res = run_sql("""SELECT b.id FROM bibrec AS b
WHERE b.modification_date <= %s ORDER BY b.id ASC""",
(dates[1],))
else:
res = run_sql("""SELECT b.id FROM bibrec AS b
WHERE b.modification_date >= %s AND
b.modification_date <= %s ORDER BY b.id ASC""",
(dates[0], dates[1]))
alist = create_range_list(res)
if not alist:
write_message( "No new records added. %s is up to date" % self.tablename)
else:
self.add_recIDs(alist, opt_flush)
def add_recID_range(self, recID1, recID2):
"""Add records from RECID1 to RECID2."""
wlist = {}
self.recIDs_in_mem.append([recID1,recID2])
# secondly fetch all needed tags:
for tag in self.fields_to_index:
get_words_function = self.tag_to_words_fnc_map.get(tag, self.default_get_words_fnc)
bibXXx = "bib" + tag[0] + tag[1] + "x"
bibrec_bibXXx = "bibrec_" + bibXXx
query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb
WHERE bb.id_bibrec BETWEEN %d AND %d
AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag)
res = run_sql(query)
for row in res:
recID,phrase = row
if not wlist.has_key(recID):
wlist[recID] = []
new_words = get_words_function(phrase, stemming_language=self.stemming_language) # ,self.separators
wlist[recID] = list_union(new_words, wlist[recID])
# were there some words for these recIDs found?
if len(wlist) == 0: return 0
recIDs = wlist.keys()
for recID in recIDs:
# was this record marked as deleted?
if "DELETED" in self.get_field(recID, "980__c"):
wlist[recID] = []
write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9)
write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9)
# put words into reverse index table with FUTURE status:
for recID in recIDs:
run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % self.tablename[:-1],
(recID, serialize_via_marshal(wlist[recID])))
# ... and, for new records, enter the CURRENT status as empty:
try:
run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % self.tablename[:-1],
(recID, serialize_via_marshal([])))
except DatabaseError:
# okay, it's an already existing record, no problem
pass
# put words into memory word list:
put = self.put
for recID in recIDs:
for w in wlist[recID]:
put(recID, w, 1)
return len(recIDs)
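`list_union` above merges the words extracted from each new phrase into the record's accumulated term list. A minimal sketch, assuming it simply returns the deduplicated union of the two lists (the real implementation lives elsewhere in the engine):

```python
def list_union_sketch(new_words, accumulated):
    """Return the union of two word lists, without duplicates."""
    union = dict.fromkeys(accumulated)  # start from the accumulated list
    for word in new_words:
        union[word] = None              # add the newly extracted words
    return list(union)
```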
def log_progress(self, start, done, todo):
"""Calculate progress and store it.
start: start time,
done: records processed,
todo: total number of records"""
time_elapsed = time.time() - start
# consistency check
if time_elapsed == 0 or done > todo:
return
time_recs_per_min = done/(time_elapsed/60.0)
write_message("%d records took %.1f seconds to complete. (%.1f recs/min)" \
% (done, time_elapsed, time_recs_per_min))
if time_recs_per_min:
write_message("Estimated runtime: %.1f minutes" % \
((todo-done)/time_recs_per_min))
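The progress arithmetic above (records per minute, then an ETA from the remaining count) can be isolated into a pure function; this is just the same two formulas pulled out for clarity:

```python
def progress_estimate(time_elapsed, done, todo):
    """Return (recs_per_min, minutes_remaining) for a running task."""
    recs_per_min = done / (time_elapsed / 60.0)
    minutes_remaining = (todo - done) / recs_per_min
    return recs_per_min, minutes_remaining
```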
def put(self, recID, word, sign):
"Adds or deletes a word for recID in the in-memory word list."
try:
if self.wash_index_terms:
word = wash_index_term(word)
if self.value.has_key(word):
# the word 'word' exists already: update its sign
self.value[word][recID] = sign
else:
self.value[word] = {recID: sign}
except:
write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID))
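`put` maintains the in-memory inverted index `self.value`, mapping each word to a `{recID: sign}` dict (sign +1 to add, -1 to delete). The core update in isolation, with a plain dict standing in for the table object:

```python
def put_sketch(index, recid, word, sign):
    """Record word -> {recid: sign} in an in-memory inverted index."""
    if word in index:
        index[word][recid] = sign    # word already known: update its sign
    else:
        index[word] = {recid: sign}  # first occurrence of this word
```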
def del_recIDs(self, recIDs):
"""Delete the records whose IDs are in the recIDs range list
from the word table. The recIDs range list is of the form:
[[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
"""
count = 0
for arange in recIDs:
self.del_recID_range(arange[0],arange[1])
count = count + arange[1] - arange[0] + 1
self.put_into_db()
def del_recID_range(self, low, high):
"""Deletes records with 'recID' system number between low
and high from memory words index table."""
write_message("%s fetching existing words for records #%d-#%d started" % \
(self.tablename, low, high), verbose=3)
self.recIDs_in_mem.append([low,high])
query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec
BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high)
recID_rows = run_sql(query)
for recID_row in recID_rows:
recID = recID_row[0]
wlist = deserialize_via_marshal(recID_row[1])
for word in wlist:
self.put(recID, word, -1)
write_message("%s fetching existing words for records #%d-#%d ended" % \
(self.tablename, low, high), verbose=3)
def report_on_table_consistency(self):
"""Check reverse words index tables (e.g. idxWORD01R) for
interesting states such as the 'TEMPORARY' state.
Prints a small report (number of words, number of bad records).
"""
# find number of words:
query = """SELECT COUNT(*) FROM %s""" % (self.tablename)
res = run_sql(query, None, 1)
if res:
nb_words = res[0][0]
else:
nb_words = 0
# find number of records:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1])
res = run_sql(query, None, 1)
if res:
nb_records = res[0][0]
else:
nb_records = 0
# report stats:
write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records))
# find possible bad states in reverse tables:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1])
res = run_sql(query)
if res:
nb_bad_records = res[0][0]
else:
nb_bad_records = 999999999
if nb_bad_records:
write_message("EMERGENCY: %s needs to repair %d of %d records" % \
(self.tablename, nb_bad_records, nb_records))
else:
write_message("%s is in consistent state" % (self.tablename))
return nb_bad_records
def repair(self, opt_flush):
"""Repair the whole table"""
# find possible bad states in reverse tables:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1])
res = run_sql(query, None, 1)
if res:
nb_bad_records = res[0][0]
else:
nb_bad_records = 0
if nb_bad_records == 0:
return
query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \
% (self.tablename[:-1])
res = run_sql(query)
recIDs = create_range_list(res)
flush_count = 0
records_done = 0
records_to_go = 0
for arange in recIDs:
records_to_go = records_to_go + arange[1] - arange[0] + 1
time_started = time.time() # will measure profile time
for arange in recIDs:
i_low = arange[0]
chunksize_count = 0
while i_low <= arange[1]:
# calculate chunk group of recIDs and treat it:
i_high = min(i_low+opt_flush-flush_count-1,arange[1])
i_high = min(i_low+chunksize-chunksize_count-1, i_high)
try:
self.fix_recID_range(i_low, i_high)
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option('verbose') >= 9:
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
self.put_into_db()
sys.exit(1)
flush_count = flush_count + i_high - i_low + 1
chunksize_count = chunksize_count + i_high - i_low + 1
records_done = records_done + i_high - i_low + 1
if chunksize_count >= chunksize:
chunksize_count = 0
# flush if necessary:
if flush_count >= opt_flush:
self.put_into_db("emergency")
self.clean()
flush_count = 0
self.log_progress(time_started,records_done,records_to_go)
# iterate:
i_low = i_high + 1
if flush_count > 0:
self.put_into_db("emergency")
self.log_progress(time_started,records_done,records_to_go)
write_message("%s inconsistencies repaired." % self.tablename)
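The chunking arithmetic in `repair` above caps each processed slice twice: by how many records may accumulate before the next flush, and by the global `chunksize`. The boundary computation, pulled out on its own:

```python
def next_chunk_high(i_low, range_high, opt_flush, flush_count,
                    chunksize, chunksize_count):
    """Upper bound of the next chunk of record IDs to process."""
    i_high = min(i_low + opt_flush - flush_count - 1, range_high)
    i_high = min(i_low + chunksize - chunksize_count - 1, i_high)
    return i_high
```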
def chk_recID_range(self, low, high):
"""Check if the reverse index table is in proper state"""
## check db
query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT'
AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high)
res = run_sql(query, None, 1)
if res[0][0]==0:
write_message("%s for %d-%d is in consistent state" % (self.tablename, low, high))
return # okay, words table is consistent
## inconsistency detected!
write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename)
write_message("""EMERGENCY: Errors found. You should check consistency of the %s - %sR tables.\nRunning 'bibindex --repair' is recommended.""" \
% (self.tablename, self.tablename[:-1]))
raise StandardError
def fix_recID_range(self, low, high):
"""Try to fix reverse index database consistency (e.g. table idxWORD01R) in the low,high doc-id range.
Possible states for a recID follow:
CUR TMP FUT: very bad things have happened: warn!
CUR TMP : very bad things have happened: warn!
CUR FUT: delete FUT (crash before flushing)
CUR : database is ok
TMP FUT: add TMP to memory and del FUT from memory
flush (revert to old state)
TMP : very bad things have happened: warn!
FUT: very bad things have happened: warn!
"""
state = {}
query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\
% (self.tablename[:-1], low, high)
res = run_sql(query)
for row in res:
if not state.has_key(row[0]):
state[row[0]]=[]
state[row[0]].append(row[1])
ok = 1 # will hold info on whether we will be able to repair
for recID in state.keys():
if not 'TEMPORARY' in state[recID]:
if 'FUTURE' in state[recID]:
if 'CURRENT' not in state[recID]:
write_message("EMERGENCY: Record %d is in inconsistent state. Can't repair it." % recID)
ok = 0
else:
write_message("EMERGENCY: Inconsistency in record %d detected" % recID)
query = """DELETE FROM %sR
WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID)
run_sql(query)
write_message("EMERGENCY: Inconsistency in record %d repaired." % recID)
else:
if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]:
self.recIDs_in_mem.append([recID,recID])
# Get the words file
query = """SELECT type,termlist FROM %sR
WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID)
write_message(query, verbose=9)
res = run_sql(query)
for row in res:
wlist = deserialize_via_marshal(row[1])
write_message("Words are %s " % wlist, verbose=9)
if row[0] == 'TEMPORARY':
sign = 1
else:
sign = -1
for word in wlist:
self.put(recID, word, sign)
else:
write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID))
ok = 0
if not ok:
write_message("""EMERGENCY: Unrepairable errors found. You should check consistency
of the %s - %sR tables. Deleting affected entries from these tables
is recommended.""" % (self.tablename, self.tablename[:-1]))
raise StandardError
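The state table in the `fix_recID_range` docstring can be read as a small decision function. A sketch that models just that table (action names here are descriptive labels, not identifiers from the engine):

```python
def classify_states(states):
    """Map the set of row types found for a recID to a repair action."""
    s = set(states)
    if s == {'CURRENT'}:
        return 'ok'
    if s == {'CURRENT', 'FUTURE'}:
        return 'delete-future'  # crash happened before flushing
    if s == {'TEMPORARY', 'FUTURE'}:
        return 'revert'         # replay TMP into memory, drop FUT
    return 'warn'               # any other combination is unrepairable
```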
def test_fulltext_indexing():
"""Tests fulltext indexing programs on PDF, PS, DOC, PPT,
XLS. Prints list of words and word table on the screen. Does not
integrate anything into the database. Useful when debugging
problems with fulltext indexing: call this function instead of main().
"""
print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=atlnot&categ=Communication&id=com-indet-2002-012") # protected URL
print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a00388&id=a00388s2t7") # XLS
print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a02883&id=a02883s1t6/transparencies") # PPT
print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a99149&id=a99149s1t10/transparencies") # DOC
print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=preprint&categ=cern&id=lhc-project-report-601") # PDF
sys.exit(0)
def main():
"""Main that construct all the bibtask."""
task_set_option('cmd', 'add')
task_set_option('id', [])
task_set_option("modified", [])
task_set_option("collection", [])
task_set_option("maxmem", 0)
task_set_option("flush", 10000)
task_set_option("windex", ','.join(get_all_indexes()))
task_set_option("reindex", False)
task_init(authorization_action='runbibindex',
authorization_msg="BibIndex Task Submission",
description="""Examples:
\t%s -a -i 234-250,293,300-500 -u admin@localhost
\t%s -a -w author,fulltext -M 8192 -v3
\t%s -d -m +4d -A on --flush=10000\n""" % ((sys.argv[0],) * 3), help_specific_usage=""" Indexing options:
-a, --add\t\tadd or update words for selected records
-d, --del\t\tdelete words for selected records
-i, --id=low[-high]\t\tselect according to doc recID
-m, --modified=from[,to]\tselect according to modification date
-c, --collection=c1[,c2]\tselect according to collection
-R, --reindex\treindex the selected indexes from scratch
Repairing options:
-k, --check\t\tcheck consistency for all records in the table(s)
-r, --repair\t\ttry to repair all records in the table(s)
Specific options:
-w, --windex=w1[,w2]\tword/phrase indexes to consider (all)
-M, --maxmem=XXX\tmaximum memory usage in kB (no limit)
-f, --flush=NNN\t\tfull consistent table flush after NNN records (10000)
""",
version=__revision__,
specific_params=("adi:m:c:w:krRM:f:", [
"add",
"del",
"id=",
"modified=",
"collection=",
"windex=",
"check",
"repair",
"reindex",
"maxmem=",
"flush=",
]),
task_stop_helper_fnc=task_stop_table_close_fnc,
task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter,
task_run_fnc=task_run_core)
def task_submit_elaborate_specific_parameter(key, value, opts, args):
""" Given the string key, check its meaning, possibly using the
value. Usually this fills some key in the options dict.
It must return True if it has elaborated the key, False if it doesn't
know that key.
eg:
if key in ['-n', '--number']:
self.options['number'] = value
return True
return False
"""
if key in ("-a", "--add"):
task_set_option("cmd", "add")
if ("-d","") in opts or ("--del","") in opts:
raise StandardError, "Cannot have --add and --del at the same time!"
elif key in ("-k", "--check"):
task_set_option("cmd", "check")
elif key in ("-r", "--repair"):
task_set_option("cmd", "repair")
elif key in ("-d", "--del"):
task_set_option("cmd", "del")
elif key in ("-i", "--id"):
task_set_option('id', task_get_option('id') + split_ranges(value))
elif key in ("-m", "--modified"):
task_set_option("modified", get_date_range(value))
elif key in ("-c", "--collection"):
task_set_option("collection", value)
elif key in ("-R", "--reindex"):
task_set_option("reindex", True)
elif key in ("-w", "--windex"):
task_set_option("windex", value)
elif key in ("-M", "--maxmem"):
task_set_option("maxmem", int(value))
if task_get_option("maxmem") < base_process_size + 1000:
raise StandardError, "Memory usage should be higher than %d kB" % \
(base_process_size + 1000)
elif key in ("-f", "--flush"):
task_set_option("flush", int(value))
else:
return False
return True
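The callback above follows the bibtask convention: return True when the key was consumed, False to let the generic option handling take over. The pattern in isolation, with a plain dict instead of the global option store (option names here are illustrative):

```python
def elaborate_sketch(options, key, value):
    """Consume a task-specific option; return True if handled."""
    if key in ("-f", "--flush"):
        options["flush"] = int(value)
    elif key in ("-R", "--reindex"):
        options["reindex"] = True
    else:
        return False  # unknown key: defer to the generic handler
    return True
```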
def task_stop_table_close_fnc():
""" Close tables to STOP. """
global _last_word_table
if _last_word_table:
_last_word_table.put_into_db()
def task_run_core():
"""Runs the task by fetching arguments from the BibSched task queue.
This is what BibSched will be invoking via daemon call.
Return 1 in case of success and 0 in case of failure."""
global _last_word_table
if task_get_option("cmd") == "check":
wordTables = get_word_tables(task_get_option("windex"))
for index_id, index_tags in wordTables.iteritems():
wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
_last_word_table = wordTable
wordTable.report_on_table_consistency()
_last_word_table = None
return True
if False: # FIXME: remove when idxPHRASE will be plugged to search_engine
if task_get_option("cmd") == "check":
wordTables = get_word_tables(task_get_option("windex"))
for index_id, index_tags in wordTables.iteritems():
wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
_last_word_table = wordTable
wordTable.report_on_table_consistency()
_last_word_table = None
return True
if task_get_option("reindex"):
for index_name in task_get_option("windex").split(','):
truncate_index_table(index_name)
# Let's work on single words!
wordTables = get_word_tables(task_get_option("windex"))
for index_id, index_tags in wordTables.iteritems():
wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
_last_word_table = wordTable
wordTable.report_on_table_consistency()
try:
if task_get_option("cmd") == "del":
if task_get_option("id"):
wordTable.del_recIDs(task_get_option("id"))
elif task_get_option("collection"):
l_of_colls = task_get_option("collection").split(",")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID,recID])
wordTable.del_recIDs(recIDs_range)
else:
write_message("Missing IDs of records to delete from index %s." % wordTable.tablename,
sys.stderr)
raise StandardError
elif task_get_option("cmd") == "add":
if task_get_option("id"):
wordTable.add_recIDs(task_get_option("id"), task_get_option("flush"))
elif task_get_option("collection"):
l_of_colls = task_get_option("collection").split(",")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID,recID])
wordTable.add_recIDs(recIDs_range, task_get_option("flush"))
else:
wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush"))
# only update last_updated if run via automatic mode:
wordTable.update_last_updated(task_get_task_param('task_starting_time'))
elif task_get_option("cmd") == "repair":
wordTable.repair(task_get_option("flush"))
else:
write_message("Invalid command found processing %s" % \
wordTable.tablename, sys.stderr)
raise StandardError
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option("verbose") >= 8:
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
if _last_word_table:
_last_word_table.put_into_db()
sys.exit(1)
wordTable.report_on_table_consistency()
if False: # FIXME: remove when idxPHRASE will be plugged to search_engine
# Let's work on phrases now
wordTables = get_word_tables(task_get_option("windex"))
for index_id, index_tags in wordTables.iteritems():
wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
_last_word_table = wordTable
wordTable.report_on_table_consistency()
try:
if task_get_option("cmd") == "del":
if task_get_option("id"):
wordTable.del_recIDs(task_get_option("id"))
elif task_get_option("collection"):
l_of_colls = task_get_option("collection").split(",")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID,recID])
wordTable.del_recIDs(recIDs_range)
else:
write_message("Missing IDs of records to delete from index %s." % wordTable.tablename,
sys.stderr)
raise StandardError
elif task_get_option("cmd") == "add":
if task_get_option("id"):
wordTable.add_recIDs(task_get_option("id"), task_get_option("flush"))
elif task_get_option("collection"):
l_of_colls = task_get_option("collection").split(",")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID,recID])
wordTable.add_recIDs(recIDs_range, task_get_option("flush"))
else:
wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush"))
# only update last_updated if run via automatic mode:
wordTable.update_last_updated(task_get_task_param('task_starting_time'))
elif task_get_option("cmd") == "repair":
wordTable.repair(task_get_option("flush"))
else:
write_message("Invalid command found processing %s" % \
wordTable.tablename, sys.stderr)
raise StandardError
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option("verbose") >= 9:
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
if _last_word_table:
_last_word_table.put_into_db()
sys.exit(1)
wordTable.report_on_table_consistency()
_last_word_table = None
return True
### okay, here we go:
if __name__ == '__main__':
main()
diff --git a/modules/bibindex/lib/bibindex_engine_config.py b/modules/bibindex/lib/bibindex_engine_config.py
index e41d2ba9c..eda01a752 100644
--- a/modules/bibindex/lib/bibindex_engine_config.py
+++ b/modules/bibindex/lib/bibindex_engine_config.py
@@ -1,67 +1,67 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibIndex indexing engine configuration parameters.
"""
__revision__ = \
"$Id$"
## configuration parameters read from the general config file:
from invenio.config import \
- version, cdsname,\
+ CFG_VERSION, cdsname,\
CFG_PATH_PDFTOTEXT, \
CFG_PATH_PSTOTEXT, \
CFG_PATH_PSTOASCII, \
CFG_PATH_ANTIWORD, \
CFG_PATH_CATDOC, \
CFG_PATH_WVTEXT, \
CFG_PATH_PPTHTML, \
CFG_PATH_XLHTML, \
CFG_PATH_HTMLTOTEXT, \
CFG_PATH_GZIP
## version number:
-BIBINDEX_ENGINE_VERSION = "CDS Invenio/%s bibindex/%s" % (version, version)
+BIBINDEX_ENGINE_VERSION = "CDS Invenio/%s bibindex/%s" % (CFG_VERSION, CFG_VERSION)
## programs used to convert fulltext files to text:
CONV_PROGRAMS = { ### PS switched off at the moment, since PDF is faster
#"ps": [CFG_PATH_PSTOTEXT, CFG_PATH_PSTOASCII],
#"ps.gz": [CFG_PATH_PSTOTEXT, CFG_PATH_PSTOASCII],
"pdf": [CFG_PATH_PDFTOTEXT, CFG_PATH_PSTOTEXT, CFG_PATH_PSTOASCII],
"doc": [CFG_PATH_ANTIWORD, CFG_PATH_CATDOC, CFG_PATH_WVTEXT],
"ppt": [CFG_PATH_PPTHTML],
"xls": [CFG_PATH_XLHTML],
"htm": [CFG_PATH_HTMLTOTEXT],
"html": [CFG_PATH_HTMLTOTEXT],}
## helper programs used if the above programs convert only to html or
## other intermediate file formats:
CONV_PROGRAMS_HELPERS = {"html": CFG_PATH_HTMLTOTEXT,
"gz": CFG_PATH_GZIP}
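CONV_PROGRAMS maps each file extension to an ordered list of candidate converters, so the indexer can fall back to the next tool when one is not installed. The lookup can be sketched as follows (the paths in the test are hypothetical; `pick_converter` is not a function from the engine):

```python
import os

def pick_converter(conv_programs, extension):
    """Return the first converter for EXTENSION that exists on disk."""
    for program in conv_programs.get(extension, []):
        if program and os.path.exists(program):
            return program
    return None
```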
## safety parameters concerning DB thread-multiplication problem:
CFG_CHECK_MYSQL_THREADS = 0 # to check or not to check the problem?
CFG_MAX_MYSQL_THREADS = 50 # how many threads (connections) we
# consider as still safe
CFG_MYSQL_THREAD_TIMEOUT = 20 # we'll kill threads that were sleeping
# for more than X seconds
diff --git a/modules/bibindex/lib/bibindexadminlib.py b/modules/bibindex/lib/bibindexadminlib.py
index 55107d153..83525c9b2 100644
--- a/modules/bibindex/lib/bibindexadminlib.py
+++ b/modules/bibindex/lib/bibindexadminlib.py
@@ -1,1701 +1,1701 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""CDS Invenio BibIndex Administrator Interface."""
__revision__ = "$Id$"
import cgi
import re
import os
import urllib
import time
import random
from zlib import compress,decompress
from invenio.config import \
cdslang, \
- version, \
+ CFG_VERSION, \
weburl, \
- bindir
+ CFG_BINDIR
from invenio.bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform
from invenio.dbquery import run_sql, get_table_status_info
from invenio.webpage import page, pageheaderonly, pagefooteronly, adderrorbox
from invenio.webuser import getUid, get_email
from invenio.bibindex_engine_stemmer import get_stemming_language_map
import invenio.template
websearch_templates = invenio.template.load('websearch')
def getnavtrail(previous = ''):
"""Get the navtrail"""
navtrail = """Admin Area """ % (weburl,)
navtrail = navtrail + previous
return navtrail
def perform_index(ln=cdslang, mtype='', content=''):
"""start area for modifying indexes
mtype - the method that called this method.
content - the output from that method."""
fin_output = """
""" % (weburl, ln, weburl, ln, weburl, ln, weburl, ln, weburl)
if mtype == "perform_showindexoverview" and content:
fin_output += content
elif mtype == "perform_showindexoverview" or not mtype:
fin_output += perform_showindexoverview(ln, callback='')
if mtype == "perform_editindexes" and content:
fin_output += content
elif mtype == "perform_editindexes" or not mtype:
fin_output += perform_editindexes(ln, callback='')
if mtype == "perform_addindex" and content:
fin_output += content
elif mtype == "perform_addindex" or not mtype:
fin_output += perform_addindex(ln, callback='')
return addadminbox("Menu", [fin_output])
def perform_field(ln=cdslang, mtype='', content=''):
"""Start area for modifying fields
mtype - the method that called this method.
content - the output from that method."""
fin_output = """
""" % (weburl, ln, weburl, ln, weburl, ln, weburl, ln, weburl)
if mtype == "perform_editfields" and content:
fin_output += content
elif mtype == "perform_editfields" or not mtype:
fin_output += perform_editfields(ln, callback='')
if mtype == "perform_addfield" and content:
fin_output += content
elif mtype == "perform_addfield" or not mtype:
fin_output += perform_addfield(ln, callback='')
if mtype == "perform_showfieldoverview" and content:
fin_output += content
elif mtype == "perform_showfieldoverview" or not mtype:
fin_output += perform_showfieldoverview(ln, callback='')
return addadminbox("Menu", [fin_output])
def perform_editfield(fldID, ln=cdslang, mtype='', content='', callback='yes', confirm=-1):
"""Form to modify a field. This method calls other methods which in turn call it again, sending back their output.
If callback is set, the method will call perform_editcollection; if not, it will just return its output.
fldID - id of the field
mtype - the method that called this method.
content - the output from that method."""
fld_dict = dict(get_def_name('', "field"))
if fldID in [-1, "-1"]:
return addadminbox("Edit logical field", ["""Please go back and select a logical field"""])
fin_output = """
""" % (weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln)
if mtype == "perform_modifyfield" and content:
fin_output += content
elif mtype == "perform_modifyfield" or not mtype:
fin_output += perform_modifyfield(fldID, ln, callback='')
if mtype == "perform_modifyfieldtranslations" and content:
fin_output += content
elif mtype == "perform_modifyfieldtranslations" or not mtype:
fin_output += perform_modifyfieldtranslations(fldID, ln, callback='')
if mtype == "perform_modifyfieldtags" and content:
fin_output += content
elif mtype == "perform_modifyfieldtags" or not mtype:
fin_output += perform_modifyfieldtags(fldID, ln, callback='')
if mtype == "perform_deletefield" and content:
fin_output += content
elif mtype == "perform_deletefield" or not mtype:
fin_output += perform_deletefield(fldID, ln, callback='')
return addadminbox("Edit logical field '%s'" % fld_dict[int(fldID)], [fin_output])
def perform_editindex(idxID, ln=cdslang, mtype='', content='', callback='yes', confirm=-1):
"""Form to modify an index. This method calls other methods which in turn call it again, sending back their output.
idxID - id of the index
mtype - the method that called this method.
content - the output from that method."""
if idxID in [-1, "-1"]:
return addadminbox("Edit index", ["""Please go back and select an index"""])
fin_output = """
"
body = [output]
if callback:
return perform_index(ln, "perform_showindexoverview", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_editindexes(ln=cdslang, callback='yes', content='', confirm=-1):
"""show a list of indexes that can be edited."""
subtitle = """2. Edit index [?]""" % (weburl)
fin_output = ''
idx = get_idx()
output = ""
if len(idx) > 0:
text = """
Index name
"""
output += createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/editindex" % weburl,
text=text,
button="Edit",
ln=ln,
confirm=1)
else:
output += """No indexes exist"""
body = [output]
if callback:
return perform_index(ln, "perform_editindexes", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_editfields(ln=cdslang, callback='yes', content='', confirm=-1):
"""show a list of all logical fields that can be edited."""
subtitle = """5. Edit logical field [?]""" % (weburl)
fin_output = ''
res = get_fld()
output = ""
if len(res) > 0:
text = """
Field name
"""
output += createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/editfield" % weburl,
text=text,
button="Edit",
ln=ln,
confirm=1)
else:
output += """No logical fields exist"""
body = [output]
if callback:
return perform_field(ln, "perform_editfields", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_addindex(ln=cdslang, idxNAME='', callback="yes", confirm=-1):
"""form to add a new index.
idxNAME - the name of the new index"""
output = ""
subtitle = """3. Add new index"""
text = """
Index name
""" % idxNAME
output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addindex" % weburl,
text=text,
ln=ln,
button="Add index",
confirm=1)
if idxNAME and confirm in ["1", 1]:
res = add_idx(idxNAME)
output += write_outcome(res) + """ Configure this index.""" % (weburl, res[1], ln)
elif confirm not in ["-1", -1]:
output += """Please give the index a name.
"""
body = [output]
if callback:
return perform_index(ln, "perform_addindex", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyindextranslations(idxID, ln=cdslang, sel_type='', trans=[], confirm=-1, callback='yes'):
"""Modify the translations of a index
sel_type - the nametype to modify
trans - the translations in the same order as the languages from get_languages()"""
output = ''
subtitle = ''
cdslangs = get_languages()
if confirm in ["2", 2] and idxID:
finresult = modify_translations(idxID, cdslangs, sel_type, trans, "idxINDEX")
idx_dict = dict(get_def_name('', "idxINDEX"))
if idxID and idx_dict.has_key(int(idxID)):
idxID = int(idxID)
subtitle = """2. Modify translations for index. [?]""" % weburl
if type(trans) is str:
trans = [trans]
if sel_type == '':
sel_type = get_idx_nametypes()[0][0]
header = ['Language', 'Translation']
actions = []
types = get_idx_nametypes()
if len(types) > 1:
text = """
Name type
"""
output += createhiddenform(action="modifyindextranslations#2",
text=text,
button="Select",
idxID=idxID,
ln=ln,
confirm=0)
if confirm in [-1, "-1", 0, "0"]:
trans = []
for (key, value) in cdslangs:
try:
trans_names = get_name(idxID, key, sel_type, "idxINDEX")
trans.append(trans_names[0][0])
except StandardError, e:
trans.append('')
for nr in range(0,len(cdslangs)):
actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))])
actions[-1].append('' % trans[nr])
text = tupletotable(header=header, tuple=actions)
output += createhiddenform(action="modifyindextranslations#2",
text=text,
button="Modify",
idxID=idxID,
sel_type=sel_type,
ln=ln,
confirm=2)
if sel_type and len(trans):
if confirm in ["2", 2]:
output += write_outcome(finresult)
body = [output]
if callback:
return perform_editindex(idxID, ln, "perform_modifyindextranslations", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyfieldtranslations(fldID, ln=cdslang, sel_type='', trans=[], confirm=-1, callback='yes'):
"""Modify the translations of a field
sel_type - the nametype to modify
trans - the translations in the same order as the languages from get_languages()"""
output = ''
subtitle = ''
cdslangs = get_languages()
if confirm in ["2", 2] and fldID:
finresult = modify_translations(fldID, cdslangs, sel_type, trans, "field")
fld_dict = dict(get_def_name('', "field"))
if fldID and fld_dict.has_key(int(fldID)):
fldID = int(fldID)
subtitle = """3. Modify translations for logical field '%s' [?]""" % (fld_dict[fldID], weburl)
if type(trans) is str:
trans = [trans]
if sel_type == '':
sel_type = get_fld_nametypes()[0][0]
header = ['Language', 'Translation']
actions = []
types = get_fld_nametypes()
if len(types) > 1:
text = """
Name type
"""
output += createhiddenform(action="modifyfieldtranslations#3",
text=text,
button="Select",
fldID=fldID,
ln=ln,
confirm=0)
if confirm in [-1, "-1", 0, "0"]:
trans = []
for (key, value) in cdslangs:
try:
trans_names = get_name(fldID, key, sel_type, "field")
trans.append(trans_names[0][0])
except StandardError, e:
trans.append('')
for nr in range(0,len(cdslangs)):
actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))])
actions[-1].append('' % trans[nr])
text = tupletotable(header=header, tuple=actions)
output += createhiddenform(action="modifyfieldtranslations#3",
text=text,
button="Modify",
fldID=fldID,
sel_type=sel_type,
ln=ln,
confirm=2)
if sel_type and len(trans):
if confirm in ["2", 2]:
output += write_outcome(finresult)
body = [output]
if callback:
return perform_editfield(fldID, ln, "perform_modifytranslations", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_showdetailsfieldtag(fldID, tagID, ln=cdslang, callback="yes", confirm=-1):
"""form to add a new field.
fldNAME - the name of the new field
code - the field code"""
fld_dict = dict(get_def_name('', "field"))
fldID = int(fldID)
tagname = run_sql("SELECT name from tag where id=%s" % tagID)[0][0]
output = ""
subtitle = """Showing details for MARC tag '%s'""" % tagname
output += " This MARC tag is used directly in these logical fields: "
fld_tag = get_fld_tags('', tagID)
exist = {}
for (id_field,id_tag, tname, tvalue, score) in fld_tag:
output += "%s, " % fld_dict[int(id_field)]
exist[id_field] = 1
output += " This MARC tag is used indirectly in these logical fields: "
tag = run_sql("SELECT value from tag where id=%s" % id_tag)
tag = tag[0][0]
for i in range(0, len(tag) - 1):
res = run_sql("SELECT id_field,id_tag FROM field_tag,tag WHERE tag.id=field_tag.id_tag AND tag.value LIKE '%s%%'" % tag[0:i])
for (id_field, id_tag) in res:
output += "%s, " % fld_dict[int(id_field)]
exist[id_field] = 1
res = run_sql("SELECT id_field,id_tag FROM field_tag,tag WHERE tag.id=field_tag.id_tag AND tag.value like '%s'" % tag)
for (id_field, id_tag) in res:
if not exist.has_key(id_field):
output += "%s, " % fld_dict[int(id_field)]
body = [output]
if callback:
return perform_modifyfieldtags(fldID, ln, "perform_showdetailsfieldtag", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_showdetailsfield(fldID, ln=cdslang, callback="yes", confirm=-1):
"""form to add a new field.
fldNAME - the name of the new field
code - the field code"""
fld_dict = dict(get_def_name('', "field"))
col_dict = dict(get_def_name('', "collection"))
fldID = int(fldID)
col_fld = get_col_fld('', '', fldID)
sort_types = dict(get_sort_nametypes())
fin_output = ""
subtitle = """5. Show usage for logical field '%s'""" % fld_dict[fldID]
output = "This logical field is used in these collections: "
ltype = ''
exist = {}
for (id_collection, id_field, id_fieldvalue, ftype, score, score_fieldvalue) in col_fld:
if ltype != ftype:
output += " %s: " % sort_types[ftype]
ltype = ftype
exist = {}
if not exist.has_key(id_collection):
output += "%s, " % col_dict[int(id_collection)]
exist[id_collection] = 1
if not col_fld:
output = "This field is not used by any collections."
fin_output = addadminbox('Collections', [output])
body = [fin_output]
if callback:
return perform_editfield(ln, "perform_showdetailsfield", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_addfield(ln=cdslang, fldNAME='', code='', callback="yes", confirm=-1):
"""form to add a new field.
fldNAME - the name of the new field
code - the field code"""
output = ""
subtitle = """6. Add new logical field"""
code = str.replace(code,' ', '')
text = """
Field name Field code
""" % (fldNAME, code)
output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addfield" % weburl,
text=text,
ln=ln,
button="Add field",
confirm=1)
if fldNAME and code and confirm in ["1", 1]:
res = add_fld(fldNAME, code)
output += write_outcome(res)
elif confirm not in ["-1", -1]:
output += """Please give the logical field a name and code.
"""
body = [output]
if callback:
return perform_field(ln, "perform_addfield", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_deletefield(fldID, ln=cdslang, callback='yes', confirm=0):
"""form to remove a field.
fldID - the field id from table field.
"""
fld_dict = dict(get_def_name('', "field"))
if not fld_dict.has_key(int(fldID)):
return """Field does not exist"""
subtitle = """4. Delete the logical field '%s' [?]""" % (fld_dict[int(fldID)], weburl)
output = ""
if fldID:
fldID = int(fldID)
if confirm in ["0", 0]:
check = run_sql("SELECT id_field from idxINDEX_field where id_field=%s" % fldID)
text = ""
if check:
text += """This field is used in an index, deletion may cause problems. """
text += """Do you want to delete the logical field '%s' and all its relations and definitions.""" % (fld_dict[fldID])
output += createhiddenform(action="deletefield#4",
text=text,
button="Confirm",
fldID=fldID,
confirm=1)
elif confirm in ["1", 1]:
res = delete_fld(fldID)
if res[0] == 1:
return """ Field deleted.""" + write_outcome(res)
else:
output += write_outcome(res)
body = [output]
if callback:
return perform_editfield(fldID, ln, "perform_deletefield", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_deleteindex(idxID, ln=cdslang, callback='yes', confirm=0):
"""form to delete an index.
idxID - the index id from table idxINDEX.
"""
if idxID:
subtitle = """5. Delete the index. [?]""" % weburl
output = ""
if confirm in ["0", 0]:
idx = get_idx(idxID)
if idx:
text = ""
text += """By deleting an index, you may also loose any indexed data in the forward and reverse table for this index. """
text += """Do you want to delete the index '%s' and all its relations and definitions.""" % (idx[0][1])
output += createhiddenform(action="deleteindex#5",
text=text,
button="Confirm",
idxID=idxID,
confirm=1)
else:
return """ Index specified does not exist."""
elif confirm in ["1", 1]:
res = delete_idx(idxID)
if res[0] == 1:
return """ Index deleted.""" + write_outcome(res)
else:
output += write_outcome(res)
body = [output]
if callback:
return perform_editindex(idxID, ln, "perform_deleteindex", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_showfieldoverview(ln=cdslang, callback='', confirm=0):
subtitle = """4. Logical fields overview"""
output = """
"""
output += """
%s
%s
%s
""" % ("Field", "MARC Tags", "Translations")
query = "SELECT id,name FROM field"
res = run_sql(query)
col_dict = dict(get_def_name('', "collection"))
fld_dict = dict(get_def_name('', "field"))
for field_id,field_name in res:
query = "SELECT tag.value FROM tag, field_tag WHERE tag.id=field_tag.id_tag AND field_tag.id_field=%d ORDER BY field_tag.score DESC,tag.value ASC" % field_id
res = run_sql(query)
field_tags = ""
for row in res:
field_tags = field_tags + row[0] + ", "
if field_tags.endswith(", "):
field_tags = field_tags[:-2]
if not field_tags:
field_tags = """None"""
lang = get_lang_list("fieldname", "id_field", field_id)
output += """
"
body = [output]
if callback:
return perform_field(ln, "perform_showfieldoverview", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyindex(idxID, ln=cdslang, idxNAME='', idxDESC='', callback='yes', confirm=-1):
"""form to modify an index name.
idxID - the id of the index to change.
idxNAME - new name of index
idxDESC - description of index content"""
subtitle = ""
output = ""
if idxID not in [-1, "-1"]:
subtitle = """1. Modify index name. [?]""" % weburl
if confirm in [-1, "-1"]:
idx = get_idx(idxID)
idxNAME = idx[0][1]
idxDESC = idx[0][2]
text = """
Index name Index description
""" % (idxNAME, idxDESC)
output += createhiddenform(action="modifyindex#1",
text=text,
button="Modify",
idxID=idxID,
ln=ln,
confirm=1)
if idxID > -1 and idxNAME and confirm in [1, "1"]:
res = modify_idx(idxID, idxNAME, idxDESC)
output += write_outcome(res)
elif confirm in [1, "1"]:
output += """ Please give a name for the index."""
else:
output = """No index to modify."""
body = [output]
if callback:
return perform_editindex(idxID, ln, "perform_modifyindex", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyindexstemming(idxID, ln=cdslang, idxSTEM='', callback='yes', confirm=-1):
"""form to modify an index name.
idxID - the index name to change.
idxSTEM - new stemming language code"""
subtitle = ""
output = ""
stemming_language_map = get_stemming_language_map()
stemming_language_map['None'] = ''
if idxID not in [-1, "-1"]:
subtitle = """4. Modify index stemming language. [?]""" % weburl
if confirm in [-1, "-1"]:
idx = get_idx(idxID)
idxSTEM = idx[0][4]
if not idxSTEM:
idxSTEM = ''
language_html_element = """"""
text = """
Index stemming language
""" + language_html_element
output += createhiddenform(action="modifyindexstemming#4",
text=text,
button="Modify",
idxID=idxID,
ln=ln,
confirm=0)
if confirm in [0, "0"] and get_idx(idxID)[0][4] == idxSTEM:
output += """Stemming language has not been changed"""
elif confirm in [0, "0"]:
text = """
You are going to change the stemming language for this index. Note that you should not enable stemming for structured-data indexes such as "report number", "year", "author" or "collection". On the other hand, it is advisable to enable stemming for indexes such as "fulltext", "abstract" or "title", since this improves retrieval quality. Beware that after changing the stemming language of an index you will have to reindex it. It is a good idea to change the stemming language and reindex during low-usage hours of your service, since searching will not be fully functional until the reindexing is completed. Are you sure you want to change the stemming language of this index?"""
output += createhiddenform(action="modifyindexstemming#4",
text=text,
button="Modify",
idxID=idxID,
idxSTEM=idxSTEM,
ln=ln,
confirm=1)
elif idxID > -1 and confirm in [1, "1"]:
res = modify_idx_stemming(idxID, idxSTEM)
output += write_outcome(res)
output += """ Please note you must run as soon as possible:
$> %s/bibindex --reindex -w %s
- """ % (bindir, get_idx(idxID)[0][1])
+ """ % (CFG_BINDIR, get_idx(idxID)[0][1])
elif confirm in [1, "1"]:
output += """ Please give a name for the index."""
else:
output = """No index to modify."""
body = [output]
if callback:
return perform_editindex(idxID, ln, "perform_modifyindexstemming", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyfield(fldID, ln=cdslang, code='', callback='yes', confirm=-1):
"""form to modify a field.
fldID - the field to change."""
subtitle = ""
output = ""
fld_dict = dict(get_def_name('', "field"))
if fldID not in [-1, "-1"]:
if confirm in [-1, "-1"]:
res = get_fld(fldID)
code = res[0][2]
else:
code = str.replace("%s" % code, " ", "")
fldID = int(fldID)
subtitle = """1. Modify field code for logical field '%s' [?]""" % (fld_dict[int(fldID)], weburl)
text = """
Field code
""" % code
output += createhiddenform(action="modifyfield#2",
text=text,
button="Modify",
fldID=fldID,
ln=ln,
confirm=1)
if fldID > -1 and confirm in [1, "1"]:
fldID = int(fldID)
res = modify_fld(fldID, code)
output += write_outcome(res)
else:
output = """No field to modify.
"""
body = [output]
if callback:
return perform_editfield(fldID, ln, "perform_modifyfield", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyindexfields(idxID, ln=cdslang, callback='yes', content='', confirm=-1):
"""Modify which logical fields to use in this index.."""
output = ''
subtitle = """3. Modify index fields. [?]""" % weburl
output = """
""" % (weburl, fldID, ln, weburl, fldID, ln)
header = ['', 'Value', 'Comment', 'Actions']
actions = []
res = get_fld_tags(fldID)
if len(res) > 0:
i = 0
for (fldID, tagID, tname, tvalue, score) in res:
move = ""
if i != 0:
move += """""" % (weburl, fldID, tagID, res[i - 1][1], ln, random.randint(0, 1000), weburl)
else:
move += " "
i += 1
if i != len(res):
move += '' % (weburl, fldID, tagID, res[i][1], ln, random.randint(0, 1000), weburl)
actions.append([move, tvalue, tname])
for col in [(('Details','showdetailsfieldtag'), ('Modify','modifytag'),('Remove','removefieldtag'),)]:
actions[-1].append('%s' % (weburl, col[0][1], fldID, tagID, ln, col[0][0]))
for (label, function) in col[1:]:
actions[-1][-1] += ' / %s' % (weburl, function, fldID, tagID, ln, label)
output += tupletotable(header=header, tuple=actions)
else:
output += """No fields exists"""
output += content
body = [output]
if callback:
return perform_editfield(fldID, ln, "perform_modifyfieldtags", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_addtag(fldID, ln=cdslang, value=['',-1], name='', callback="yes", confirm=-1):
"""form to add a new field.
fldNAME - the name of the new field
code - the field code"""
output = ""
subtitle = """Add MARC tag to logical field"""
text = """
Add new tag: Tag value Tag comment
""" % ((name=='' and value[0] or name), value[0])
text += """Or existing tag: Tag
"""
output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addtag" % weburl,
text=text,
fldID=fldID,
ln=ln,
button="Add tag",
confirm=1)
if (value[0] and value[1] in [-1, "-1"]) or (not value[0] and value[1] not in [-1, "-1"]):
if confirm in ["1", 1]:
res = add_fld_tag(fldID, name, (value[0] !='' and value[0] or value[1]))
output += write_outcome(res)
elif confirm not in ["-1", -1]:
output += """Please choose to add either a new or an existing MARC tag, but not both.
"""
body = [output]
if callback:
return perform_modifyfieldtags(fldID, ln, "perform_addtag", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifytag(fldID, tagID, ln=cdslang, name='', value='', callback='yes', confirm=-1):
"""form to modify a field.
fldID - the field to change."""
subtitle = ""
output = ""
fld_dict = dict(get_def_name('', "field"))
fldID = int(fldID)
tagID = int(tagID)
tag = get_tags(tagID)
if confirm in [-1, "-1"] and not value and not name:
name = tag[0][1]
value = tag[0][2]
subtitle = """Modify MARC tag"""
text = """
Any modifications will apply to all logical fields using this tag. Tag value Comment
""" % (value, name)
output += createhiddenform(action="modifytag#4.1",
text=text,
button="Modify",
fldID=fldID,
tagID=tagID,
ln=ln,
confirm=1)
if name and value and confirm in [1, "1"]:
res = modify_tag(tagID, name, value)
output += write_outcome(res)
body = [output]
if callback:
return perform_modifyfieldtags(fldID, ln, "perform_modifytag", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_removefieldtag(fldID, tagID, ln=cdslang, callback='yes', confirm=0):
"""form to remove a tag from a field.
fldID - the current field, remove the tag from this field.
tagID - remove the tag with this id"""
subtitle = """Remove MARC tag from logical field"""
output = ""
fld_dict = dict(get_def_name('', "field"))
if fldID and tagID:
fldID = int(fldID)
tagID = int(tagID)
tag = get_fld_tags(fldID, tagID)
if confirm not in ["1", 1]:
text = """Do you want to remove the tag '%s - %s ' from the field '%s'.""" % (tag[0][3], tag[0][2], fld_dict[fldID])
output += createhiddenform(action="removefieldtag#4.1",
text=text,
button="Confirm",
fldID=fldID,
tagID=tagID,
confirm=1)
elif confirm in ["1", 1]:
res = remove_fldtag(fldID, tagID)
output += write_outcome(res)
body = [output]
if callback:
return perform_modifyfieldtags(fldID, ln, "perform_removefieldtag", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_addindexfield(idxID, ln=cdslang, fldID='', callback="yes", confirm=-1):
"""form to add a new field.
fldNAME - the name of the new field
code - the field code"""
output = ""
subtitle = """Add logical field to index"""
text = """
Field name
"""
output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addindexfield" % weburl,
text=text,
idxID=idxID,
ln=ln,
button="Add field",
confirm=1)
if fldID and not fldID in [-1, "-1"] and confirm in ["1", 1]:
res = add_idx_fld(idxID, fldID)
output += write_outcome(res)
elif confirm in ["1", 1]:
output += """Please select a field to add."""
body = [output]
if callback:
return perform_modifyindexfields(idxID, ln, "perform_addindexfield", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_removeindexfield(idxID, fldID, ln=cdslang, callback='yes', confirm=0):
"""form to remove a field from an index.
idxID - the current index, remove the field from this index.
fldID - remove the field with this id"""
subtitle = """Remove field from index"""
output = ""
if fldID and idxID:
fldID = int(fldID)
idxID = int(idxID)
fld = get_fld(fldID)
idx = get_idx(idxID)
if fld and idx and confirm not in ["1", 1]:
text = """Do you want to remove the field '%s' from the index '%s'.""" % (fld[0][1], idx[0][1])
output += createhiddenform(action="removeindexfield#3.1",
text=text,
button="Confirm",
idxID=idxID,
fldID=fldID,
confirm=1)
elif confirm in ["1", 1]:
res = remove_idxfld(idxID, fldID)
output += write_outcome(res)
body = [output]
if callback:
return perform_modifyindexfields(idxID, ln, "perform_removeindexfield", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_switchtagscore(fldID, id_1, id_2, ln=cdslang):
"""Switch the score of id_1 and id_2 in the table type.
colID - the current collection
id_1/id_2 - the id's to change the score for.
type - like "format" """
output = ""
name_1 = run_sql("select name from tag where id=%s" % id_1)[0][0]
name_2 = run_sql("select name from tag where id=%s" % id_2)[0][0]
res = switch_score(fldID, id_1, id_2)
output += write_outcome(res)
return perform_modifyfieldtags(fldID, ln, content=output)
def perform_deletetag(fldID, ln=cdslang, tagID=-1, callback='yes', confirm=-1):
"""form to delete an MARC tag not in use.
fldID - the collection id of the current collection.
fmtID - the format id to delete."""
subtitle = """Delete an unused MARC tag"""
output = """
Deleting an MARC tag will also delete the translations associated.
"""
fldID = int(fldID)
if tagID not in [-1," -1"] and confirm in [1, "1"]:
ares = delete_tag(tagID)
fld_tag = get_fld_tags()
fld_tag = dict(map(lambda x: (x[1], x[0]), fld_tag))
tags = get_tags()
text = """
MARC tag """
if i == 0:
output += """No unused MARC tags """
else:
output += createhiddenform(action="deletetag#4.1",
text=text,
button="Delete",
fldID=fldID,
ln=ln,
confirm=0)
if tagID not in [-1,"-1"]:
tagID = int(tagID)
tags = get_tags(tagID)
if confirm in [0, "0"]:
text = """Do you want to delete the MARC tag '%s'.""" % tags[0][2]
output += createhiddenform(action="deletetag#4.1",
text=text,
button="Confirm",
fldID=fldID,
tagID=tagID,
ln=ln,
confirm=1)
elif confirm in [1, "1"]:
output += write_outcome(ares)
elif confirm not in [-1, "-1"]:
output += """Choose a MARC tag to delete."""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_modifyfieldtags(fldID, ln, content=output)
def compare_on_val(first, second):
"""Compare the two values"""
return cmp(first[1], second[1])
def get_col_fld(colID=-1, type = '', id_field=''):
"""Returns either all portalboxes associated with a collection, or based on either colID or language or both.
colID - collection id
ln - language id"""
sql = "SELECT id_collection,id_field,id_fieldvalue,type,score,score_fieldvalue FROM collection_field_fieldvalue, field WHERE id_field=field.id"
try:
if id_field:
sql += " AND id_field=%s" % id_field
sql += " ORDER BY type, score desc, score_fieldvalue desc"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_idx(idxID=''):
sql = "SELECT id,name,description,last_updated,stemming_language FROM idxINDEX"
try:
if idxID:
sql += " WHERE id=%s" % idxID
sql += " ORDER BY id asc"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_fld_tags(fldID='', tagID=''):
"""Returns tags associated with a field.
fldID - field id
tagID - tag id"""
sql = "SELECT id_field,id_tag, tag.name, tag.value, score FROM field_tag,tag WHERE tag.id=field_tag.id_tag"
try:
if fldID:
sql += " AND id_field=%s" % fldID
if tagID:
sql += " AND id_tag=%s" % tagID
sql += " ORDER BY score desc, tag.value, tag.name"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_tags(tagID=''):
"""Returns all or a given tag.
tagID - tag id
ln - language id"""
sql = "SELECT id, name, value FROM tag"
try:
if tagID:
sql += " WHERE id=%s" % tagID
sql += " ORDER BY name, value"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_fld(fldID=''):
"""Returns all fields or only the given field"""
try:
if not fldID:
res = run_sql("SELECT id, name, code FROM field ORDER by name, code")
else:
res = run_sql("SELECT id, name, code FROM field WHERE id=%s ORDER by name, code" % fldID)
return res
except StandardError, e:
return ""
def get_fld_value(fldvID = ''):
"""Returns fieldvalue"""
try:
sql = "SELECT id, name, value FROM fieldvalue"
if fldvID:
sql += " WHERE id=%s" % fldvID
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_idx_fld(idxID=''):
"""Return a list of fields associated with one or all indexes"""
try:
sql = "SELECT id_idxINDEX, idxINDEX.name, id_field, field.name, regexp_punctuation, regexp_alphanumeric_separators FROM idxINDEX, field, idxINDEX_field WHERE idxINDEX.id = idxINDEX_field.id_idxINDEX AND field.id = idxINDEX_field.id_field"
if idxID:
sql += " AND id_idxINDEX=%s" % idxID
sql += " ORDER BY id_idxINDEX asc"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_col_nametypes():
"""Return a list of the various translationnames for the fields"""
type = []
type.append(('ln', 'Long name'))
return type
def get_fld_nametypes():
"""Return a list of the various translationnames for the fields"""
type = []
type.append(('ln', 'Long name'))
return type
def get_idx_nametypes():
"""Return a list of the various translationnames for the index"""
type = []
type.append(('ln', 'Long name'))
return type
def get_sort_nametypes():
"""Return a list of the various translationnames for the fields"""
type = {}
type['soo'] = 'Sort options'
type['seo'] = 'Search options'
type['sew'] = 'Search within'
return type
def remove_fld(colID,fldID, fldvID=''):
"""Removes a field from the collection given.
colID - the collection the format is connected to
fldID - the field which should be removed from the collection."""
try:
sql = "DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s" % (colID, fldID)
if fldvID:
sql += " AND id_fieldvalue=%s" % fldvID
res = run_sql(sql)
return (1, "")
except StandardError, e:
return (0, e)
def remove_idxfld(idxID, fldID):
"""Remove a field from a index in table idxINDEX_field
idxID - index id from idxINDEX
fldID - field id from field table"""
try:
sql = "DELETE FROM idxINDEX_field WHERE id_field=%s and id_idxINDEX=%s" % (fldID, idxID)
res = run_sql(sql)
return (1, "")
except StandardError, e:
return (0, e)
def remove_fldtag(fldID,tagID):
"""Removes a tag from the field given.
fldID - the field the tag is connected to
tagID - the tag which should be removed from the field."""
try:
sql = "DELETE FROM field_tag WHERE id_field=%s AND id_tag=%s" % (fldID, tagID)
res = run_sql(sql)
return (1, "")
except StandardError, e:
return (0, e)
def delete_tag(tagID):
"""Deletes all data for the given field
fldID - delete all data in the tables associated with field and this id """
try:
res = run_sql("DELETE FROM tag where id=%s" % tagID)
return (1, "")
except StandardError, e:
return (0, e)
def delete_idx(idxID):
"""Deletes all data for the given index together with the idxWORDXXR and idxWORDXXF tables"""
try:
res = run_sql("DELETE FROM idxINDEX WHERE id=%s" % idxID)
res = run_sql("DELETE FROM idxINDEXNAME WHERE id_idxINDEX=%s" % idxID)
res = run_sql("DELETE FROM idxINDEX_field WHERE id_idxINDEX=%s" % idxID)
res = run_sql("DROP TABLE idxWORD%sF" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("DROP TABLE idxWORD%sR" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("DROP TABLE idxPHRASE%sF" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("DROP TABLE idxPHRASE%sR" % (idxID < 10 and "0%s" % idxID or idxID))
return (1, "")
except StandardError, e:
return (0, e)
def delete_fld(fldID):
"""Deletes all data for the given field
fldID - delete all data in the tables associated with field and this id """
try:
res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_field=%s" % fldID)
res = run_sql("DELETE FROM field_tag WHERE id_field=%s" % fldID)
res = run_sql("DELETE FROM idxINDEX_field WHERE id_field=%s" % fldID)
res = run_sql("DELETE FROM field WHERE id=%s" % fldID)
return (1, "")
except StandardError, e:
return (0, e)
def add_idx(idxNAME):
"""Add a new index. returns the id of the new index.
idxID - the id for the index, number
idxNAME - the default name for the default language of the format."""
try:
idxID = 0
res = run_sql("SELECT id from idxINDEX WHERE name=%s", (idxNAME,))
if res:
return (0, (0, "A index with the given name already exists."))
for i in range(1, 100):
res = run_sql("SELECT id from idxINDEX WHERE id=%s" % i)
res2 = get_table_status_info("idxWORD%s%%" % (i < 10 and "0%s" % i or i))
if not res and not res2:
idxID = i
break
if idxID == 0:
return (0, (0, "Not possible to create new indexes, delete an index and try again."))
res = run_sql("INSERT INTO idxINDEX (id, name) VALUES (%s,%s)", (idxID, idxNAME))
type = get_idx_nametypes()[0][0]
res = run_sql("INSERT INTO idxINDEXNAME (id_idxINDEX, ln, type, value) VALUES (%s,%s,%s,%s)",
(idxID, cdslang, type, idxNAME))
res = run_sql("""CREATE TABLE IF NOT EXISTS idxWORD%sF (
id mediumint(9) unsigned NOT NULL auto_increment,
term varchar(50) default NULL,
hitlist longblob,
PRIMARY KEY (id),
UNIQUE KEY term (term)
) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("""CREATE TABLE IF NOT EXISTS idxWORD%sR (
id_bibrec mediumint(9) unsigned NOT NULL,
termlist longblob,
type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT',
PRIMARY KEY (id_bibrec,type)
) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("""CREATE TABLE `idxPHRASE%sF` (
`id` mediumint(9) unsigned NOT NULL auto_increment,
`term` text default NULL,
`hitlist` longblob,
PRIMARY KEY (`id`),
KEY `term` (`term`(50))
) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("""CREATE TABLE `idxPHRASE%sR` (
`id_bibrec` mediumint(9) unsigned NOT NULL default '0',
`termlist` longblob,
`type` enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT',
PRIMARY KEY (`id_bibrec`,`type`)
) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))
res = run_sql("SELECT id from idxINDEX WHERE id=%s" % idxID)
res2 = get_table_status_info("idxWORD%sF" % (idxID < 10 and "0%s" % idxID or idxID))
res3 = get_table_status_info("idxWORD%sR" % (idxID < 10 and "0%s" % idxID or idxID))
if res and res2 and res3:
return (1, res[0][0])
elif not res:
return (0, (0, "Could not add the new index to idxINDEX"))
elif not res2:
return (0, (0, "Forward table not created for unknown reason."))
elif not res3:
return (0, (0, "Reverse table not created for unknown reason."))
except StandardError, e:
return (0, e)
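The per-index table names above are built with the `(idxID < 10 and "0%s" % idxID or idxID)` idiom, which zero-pads ids below 10 so that index 1 maps to `idxWORD01F`/`idxWORD01R`. A hypothetical helper (not part of this module) showing the same naming rule via `%02d`:

```python
def idx_table_name(prefix, idx_id, suffix):
    """Build a per-index table name such as idxWORD01F: index ids below 10
    are zero-padded to two digits, ids of 10 and above are used as-is."""
    return "%s%02d%s" % (prefix, idx_id, suffix)
```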
def add_fld(name, code):
"""Add a new logical field. Returns the id of the field.
code - the code for the field,
name - the default name for the default language of the field."""
try:
type = get_fld_nametypes()[0][0]
res = run_sql("INSERT INTO field (name, code) VALUES (%s,%s)", (name, code))
fldID = run_sql("SELECT id FROM field WHERE code=%s", (code,))
res = run_sql("INSERT INTO fieldname (id_field, type, ln, value) VALUES (%s,%s,%s,%s)", (fldID[0][0], type, cdslang, name))
if fldID:
return (1, fldID[0][0])
else:
raise StandardError
except StandardError, e:
return (0, e)
def add_fld_tag(fldID, name, value):
"""Add a sort/search/field to the collection.
colID - the id of the collection involved
fmtID - the id of the format.
score - the score of the format, decides sorting, if not given, place the format on top"""
try:
res = run_sql("SELECT score FROM field_tag WHERE id_field=%s ORDER BY score desc" % (fldID))
if res:
score = int(res[0][0]) + 1
else:
score = 0
res = run_sql("SELECT id FROM tag WHERE value=%s", (value,))
if not res:
if name == '':
name = value
res = run_sql("INSERT INTO tag (name, value) VALUES (%s,%s)", (name, value))
res = run_sql("SELECT id FROM tag WHERE value=%s", (value,))
res = run_sql("INSERT INTO field_tag(id_field, id_tag, score) values(%s, %s, %s)" % (fldID, res[0][0], score))
return (1, "")
except StandardError, e:
return (0, e)
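The score computed at the top of `add_fld_tag` places a new tag above all existing tags of the field: highest existing score plus one, or 0 for a field with no tags yet. A minimal sketch of that rule (the helper name is illustrative, not part of the module):

```python
def next_score(existing_scores):
    """Scores order the tags within a field; a newly added tag goes on top
    by taking the current maximum score plus one (0 if there are none)."""
    if existing_scores:
        return max(existing_scores) + 1
    return 0
```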
def add_idx_fld(idxID, fldID):
"""Add a field to an index"""
try:
sql = "SELECT id_idxINDEX FROM idxINDEX_field WHERE id_idxINDEX=%s and id_field=%s"
res = run_sql(sql, (idxID, fldID))
if res:
return (0, (0, "The field selected already exists for this index"))
sql = "INSERT INTO idxINDEX_field(id_idxINDEX, id_field) values (%s, %s)"
res = run_sql(sql, (idxID, fldID))
return (1, "")
except StandardError, e:
return (0, e)
def modify_idx(idxID, idxNAME, idxDESC):
"""Modify index name or index description in idxINDEX table"""
try:
res = run_sql("UPDATE idxINDEX SET name=%s WHERE id=%s", (idxNAME, idxID))
res = run_sql("UPDATE idxINDEX SET description=%s WHERE ID=%s", (idxDESC, idxID))
return (1, "")
except StandardError, e:
return (0, e)
def modify_idx_stemming(idxID, idxSTEM):
"""Modify the index stemming language in idxINDEX table"""
try:
res = run_sql("UPDATE idxINDEX SET stemming_language=%s WHERE ID=%s", (idxSTEM, idxID))
return (1, "")
except StandardError, e:
return (0, e)
def modify_fld(fldID, code):
"""Modify the code of field
fldID - the id of the field to modify
code - the new code"""
try:
sql = "UPDATE field SET code='%s'" % code
sql += " WHERE id=%s" % fldID
res = run_sql(sql)
return (1, "")
except StandardError, e:
return (0, e)
def modify_tag(tagID, name, value):
"""Modify the name and value of a tag.
tagID - the id of the tag to modify
name - the new name of the tag
value - the new value of the tag"""
try:
sql = "UPDATE tag SET name='%s' WHERE id=%s" % (name, tagID)
res = run_sql(sql)
sql = "UPDATE tag SET value='%s' WHERE id=%s" % (value, tagID)
res = run_sql(sql)
return (1, "")
except StandardError, e:
return (0, e)
def switch_score(fldID, id_1, id_2):
"""Switch the scores of id_1 and id_2 in the table given by the argument.
colID - collection the id_1 or id_2 is connected to
id_1/id_2 - id field from tables like format..portalbox...
table - name of the table"""
try:
res1 = run_sql("SELECT score FROM field_tag WHERE id_field=%s and id_tag=%s" % (fldID, id_1))
res2 = run_sql("SELECT score FROM field_tag WHERE id_field=%s and id_tag=%s" % (fldID, id_2))
res = run_sql("UPDATE field_tag SET score=%s WHERE id_field=%s and id_tag=%s" % (res2[0][0], fldID, id_1))
res = run_sql("UPDATE field_tag SET score=%s WHERE id_field=%s and id_tag=%s" % (res1[0][0], fldID, id_2))
return (1, "")
except StandardError, e:
return (0, e)
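`switch_score` reorders two tags by reading both scores and writing them back crosswise in two UPDATE statements. The same exchange on an in-memory `{tag_id: score}` mapping, as a sketch (helper name is illustrative):

```python
def swapped_scores(score_map, id_1, id_2):
    """Return a copy of a {tag_id: score} mapping with the scores of
    id_1 and id_2 exchanged -- the reordering switch_score performs in SQL."""
    result = dict(score_map)
    result[id_1], result[id_2] = score_map[id_2], score_map[id_1]
    return result
```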
def get_lang_list(table, field, id):
langs = run_sql("select ln from %s where %s=%s" % (table, field, id))
exists = {}
lang = ''
for lng in langs:
if not exists.has_key(lng[0]):
lang += lng[0] + ", "
exists[lng[0]] = 1
if lang.endswith(", "):
lang = lang [:-2]
if len(exists) == 0:
lang = """None"""
return lang
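`get_lang_list` builds a comma-separated string while using a dictionary to drop duplicate language codes, falling back to "None" when nothing is found. The same order-preserving dedup-and-join, sketched as a standalone helper (not part of the module):

```python
def unique_join(items, empty="None"):
    """Join items with ", ", keeping only the first occurrence of each
    -- the bookkeeping get_lang_list does with its 'exists' dictionary."""
    seen = {}
    kept = []
    for item in items:
        if item not in seen:
            seen[item] = 1
            kept.append(item)
    return ", ".join(kept) if kept else empty
```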
diff --git a/modules/bibmatch/lib/bibmatch_engine.py b/modules/bibmatch/lib/bibmatch_engine.py
index d1ef61fed..b78d59f64 100644
--- a/modules/bibmatch/lib/bibmatch_engine.py
+++ b/modules/bibmatch/lib/bibmatch_engine.py
@@ -1,562 +1,562 @@
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""BibMatch tool to match records with database content."""
__revision__ = "$Id$"
import fileinput
import string
import os
import sys
import getopt
from invenio.config import \
- bibconvert, \
- version
+ CFG_BINDIR, \
+ CFG_VERSION
from invenio.search_engine import perform_request_search
from invenio.bibrecord import *
from invenio import bibconvert
from invenio.dbquery import run_sql
def usage():
"""Print help"""
print >> sys.stderr, \
""" Usage: %s [options]
Examples:
$ bibmatch [--print-new] --field=\"title\" < input.xml > output.xml
$ bibmatch --print-match --field=\"245__a\" --mode=\"a\" < input.xml > output.xml
$ bibmatch --print-ambiguous --query-string=\"245__a||100__a\" < input.xml > output.xml
$ bibmatch [options] < input.xml > output.xml
Options:
Output:
-0 --print-new (default)
-1 --print-match
-2 --print-ambiguous
-b --batch-output=(filename)
Simple query:
-f --field=(field)
Advanced query:
-c --config=(config-filename)
-q --query-string=(uploader_querystring)
-m --mode=(a|e|o|p|r)[3]
-o --operator=(a|o)[2]
General options:
-h, --help print this help and exit
-V, --version print version information and exit
-v, --verbose=LEVEL verbose level (from 0 to 9, default 1)
""" % sys.argv[0]
sys.exit(1)
return
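The `--query-string` values shown above use the uploader format: fields separated by `||`, each optionally followed by `::`-separated BibConvert format names. A minimal sketch of that split (hypothetical `parse_uploader_qrystr` helper; the real `Querystring.from_qrystr` below also handles modes and operators):

```python
def parse_uploader_qrystr(qrystr):
    """Split '245__a::UP||100__a' into ([fields], [format lists]),
    padded to the three entries BibMatch works with."""
    fields, formats = [], []
    for part in qrystr.split("||"):
        pieces = part.split("::")
        fields.append(pieces[0])      # first piece is the MARC tag
        formats.append(pieces[1:])    # the rest are format names
    while len(fields) < 3:
        fields.append("")
        formats.append([])
    return fields, formats
```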
class Querystring:
"Holds the information about querystring (p1,f1,m1,op1,p2,f2,m2,op2,p3,f3,m3,as)."
def __init__(self, mode="1"):
"""Creates querystring instance"""
self.pattern = []
self.field = []
self.mode = []
self.operator = []
self.format = []
self.pattern.append("")
self.pattern.append("")
self.pattern.append("")
self.field.append("")
self.field.append("")
self.field.append("")
self.mode.append("")
self.mode.append("")
self.mode.append("")
self.operator.append("")
self.operator.append("")
self.format.append([])
self.format.append([])
self.format.append([])
self.advanced = 0
return
def from_qrystr(self, qrystr="", search_mode="eee", operator="aa"):
"""Converts qrystr into querystring (uploader format)"""
self.default()
self.field = []
self.format = []
self.mode = ["e","e","e"]
fields = string.split(qrystr,"||")
for field in fields:
tags = string.split(field, "::")
i = 0
format = []
for tag in tags:
if(i==0):
self.field.append(tag)
else:
format.append(tag)
i +=1
self.format.append(format)
while(len(self.format) < 3):
self.format.append("")
while(len(self.field) < 3):
self.field.append("")
i = 0
for lett in search_mode:
self.mode[i] = lett
i += 1
i = 0
for lett in operator:
self.operator[i] = lett
i += 1
return
def default(self):
self.pattern = []
self.field = []
self.mode = []
self.operator = []
self.format = []
self.pattern.append("")
self.pattern.append("")
self.pattern.append("")
self.field.append("245__a")
self.field.append("")
self.field.append("")
self.mode.append("a")
self.mode.append("")
self.mode.append("")
self.operator.append("")
self.operator.append("")
self.format.append([])
self.format.append([])
self.format.append([])
self.advanced = 1
return
def change_search_mode(self, mode="a"):
self.mode = [mode,mode,mode]
return
def search_engine_encode(self):
field_ = []
for field in self.field:
i = 0
field__ = ""
for letter in field:
if(letter == "%"):
if(i==5):
letter = "a"
else:
letter = "_"
i+=1
field__ += str(letter)
field_.append(field__)
self.field = field_
return
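As I read the encode loop above, a `%` wildcard is rewritten to `_` in the tag and indicator positions (0-4) and to the default subfield code `a` in position 5. A standalone sketch under that assumption (hypothetical `normalize_tag` helper):

```python
def normalize_tag(tag):
    """Rewrite '%' wildcards: positions 0-4 become '_',
    the subfield-code position 5 becomes 'a'."""
    out = []
    for i, ch in enumerate(tag):
        if ch == "%":
            ch = "a" if i == 5 else "_"
        out.append(ch)
    return "".join(out)
```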
def get_field_tags(field):
"Get the list of MARC tags corresponding to the logical field 'field' from the database."
query = "select tag.value from tag left join field_tag on tag.id=field_tag.id_tag left join field on field_tag.id_field=field.id where field.code='%s'" % field;
out = []
res = run_sql(query)
for row in res:
out.append(row[0])
return out
def get_subfield(field, subfield):
"Return subfield of a field."
for sbf in field:
if(sbf[0][0][0] == subfield):
return sbf[0][0][1]
return ""
def matched_records(recID_lists):
"Analyze list of matches. Ambiguous record result is always preferred."
recID_tmp = []
for recID_list in recID_lists:
if(len(recID_list) > 1):
return 2
if(len(recID_list) == 1):
if(len(recID_tmp) == 0):
recID_tmp.append(recID_list[0])
else:
if(recID_list[0] in recID_tmp):
pass
else:
return 2
if(len(recID_tmp) == 1):
return 1
return 0
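The three outcomes (0 = new, 1 = unique match, 2 = ambiguous) can be restated compactly; a behaviourally equivalent sketch of `matched_records` (hypothetical `classify_match` name):

```python
def classify_match(recid_lists):
    """0 = new, 1 = unique match, 2 = ambiguous.  Any multi-hit
    list, or two single hits that disagree, means ambiguous."""
    agreed = None
    for recids in recid_lists:
        if len(recids) > 1:
            return 2
        if len(recids) == 1:
            if agreed is None:
                agreed = recids[0]
            elif recids[0] != agreed:
                return 2
    return 1 if agreed is not None else 0
```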
def matched_records_min(recID_lists):
"Analyze lists of matches. New record result is preferred if result is unmatched."
min = 2
for recID_list in recID_lists:
if(len(recID_list) < min):
min = len(recID_list)
if(min==1):
return min
return min
def matched_records_max(recID_lists):
"Analyze lists of matches. Ambiguous result is preferred if result is unmatched."
max = 0
for recID_list in recID_lists:
if(len(recID_list) == 1):
return 1
if(len(recID_list) > max):
max = len(recID_list)
if (max > 1):
return 2
elif (max == 1):
return 1
else:
return 0
return 2
def main():
# A record matches the database content when the defined search gives exactly one record in the result set.
# By default the match is done on the title field.
# Using advanced search, at most 3 fields can be queried concurrently.
# qrystr - query string in the UpLoader format
try:
opts, args = getopt.getopt(sys.argv[1:],"012hVm:f:q:c:nv:o:b:",
[
"print-new",
"print-match",
"print-ambiguous",
"help",
"version",
"mode=",
"field=",
"query-string=",
"config=",
"no-process",
"verbose=",
"operator=",
"batch-output="
])
except getopt.GetoptError, e:
usage()
recs_out = []
recID_list = []
recID_lists = []
qrystrs = []
match_mode = 0 # default match mode to print new records
rec_new = 0 # indicator that record is new
rec_match = 0 # indicator that record is matched
matched = 0 # number of records matched
record_counter = 0 # number of records processed
noprocess = 0
result = [0,0,0]
perform_request_search_mode = "eee"
operator = "aa"
verbose = 1 # 0..be quiet
level = 1 # 1..exact match
file_read = ""
records = []
batch_output = ""
predefined_fields = ["title", "author"]
for opt, opt_value in opts:
if opt in ["-0", "--print-new"]:
match_mode = 0
if opt in ["-1", "--print-match"]:
match_mode = 1
if opt in ["-2", "--print-ambiguous"]:
match_mode = 2
if opt in ["-n", "--no-process"]:
noprocess = 1
if opt in ["-h", "--help"]:
usage()
sys.exit(0)
if opt in ["-V", "--version"]:
print __revision__
sys.exit(0)
if opt in ["-v", "--verbose"]:
verbose = int(opt_value)
if opt in ["-q", "--query-string"]:
qrystrs.append(opt_value)
if opt in ["-m", "--mode"]:
perform_request_search_mode = opt_value
if opt in ["-o", "--operator"]:
operator = opt_value
if opt in ["-b", "--batch-output"]:
batch_output = opt_value
if opt in ["-f", "--field"]:
alternate_querystring = []
if opt_value in predefined_fields:
alternate_querystring = get_field_tags(opt_value)
for item in alternate_querystring:
qrystrs.append(item)
else:
qrystrs.append(opt_value)
if opt in ["-c", "--config"]:
config_file = opt_value
config_file_read = bibconvert.read_file(config_file, 0)
for line in config_file_read:
tmp = string.split(line, "---")
if(tmp[0] == "QRYSTR"):
qrystrs.append(tmp[1])
if verbose:
sys.stderr.write("\nBibMatch: Parsing input file ... ")
for line_in in sys.stdin:
file_read += line_in
records = create_records(file_read)
if len(records) == 0:
if verbose:
sys.stderr.write("\nBibMatch: Input file contains no records.\n")
sys.exit()
else:
if verbose:
sys.stderr.write("read %d records" % len(records))
sys.stderr.write("\nBibMatch: Matching ...")
### Prepare batch output
if (batch_output != ""):
out_0 = []
out_1 = []
out_2 = []
for rec in records:
### for each query-string
record_counter += 1
if (verbose > 1):
sys.stderr.write("\n Processing record: #%d .." % record_counter)
recID_lists = []
if(len(qrystrs)==0):
qrystrs.append("")
more_detailed_info = ""
for qrystr in qrystrs:
querystring = Querystring()
querystring.default()
if(qrystr != ""):
querystring.from_qrystr(qrystr, perform_request_search_mode, operator)
else:
querystring.default()
### search engine qrystr encode
querystring.search_engine_encode()
### get field values
inst = []
### get appropriate corresponding fields from database
i = 0
for field in querystring.field:
### use expanded tags
tag = field[0:3]
ind1 = field[3:4]
ind2 = field[4:5]
code = field[5:6]
if((ind1 == "_")or(ind1 == "%")):
ind1 = ""
if((ind2 == "_")or(ind2 == "%")):
ind2 = ""
if((code == "_")or(code == "%")):
code = "a"
if(field != "001"):
sbf = get_subfield(record_get_field_instances(rec[0], tag, ind1, ind2), code)
inst.append(sbf)
elif(field in ["001"]):
sbf = record_get_field_values(rec[0], field, ind1="", ind2="", code="")
inst.append(sbf)
else:
inst.append("")
i += 1
### format acquired field values
i = 0
for instance in inst:
for format in querystring.format[i]:
inst[i] = bibconvert.FormatField(inst[i],format)
i += 1
### perform sensible request search only
if(inst[0]!=""):
recID_list = perform_request_search(
p1=inst[0], f1=querystring.field[0], m1=querystring.mode[0], op1=querystring.operator[0],
p2=inst[1], f2=querystring.field[1], m2=querystring.mode[1], op2=querystring.operator[1],
p3=inst[2], f3=querystring.field[2], m3=querystring.mode[2], as=querystring.advanced)
else:
recID_list = []
recID_lists.append(recID_list)
### more detailed info ...
if(verbose > 8):
more_detailed_info = "%s\n Matched recIDs: %s" % (more_detailed_info, recID_lists)
if(verbose > 2):
more_detailed_info = "%s\n On query: %s, %s, %s, %s\n %s, %s, %s, %s\n %s, %s, %s\n" % (more_detailed_info, inst[0], querystring.field[0], querystring.mode[0], querystring.operator[0], inst[1], querystring.field[1], querystring.mode[1], querystring.operator[1], inst[2], querystring.field[2], querystring.mode[2])
### for multi-tagged fields (e.g. title), the most extreme per-tag result is taken: a unique match on any tag wins, otherwise any ambiguity makes the record ambiguous
rec_match = matched_records_max(recID_lists)
### print-new
if (rec_match==0):
result[0] += 1
if(match_mode==0):
recs_out.append(rec)
if (batch_output != ""):
out_0.append(rec)
if verbose:
sys.stderr.write(".")
if (verbose > 1):
sys.stderr.write("NEW")
### print-match
elif (rec_match <= level):
result[1] += 1
if(match_mode==1):
recs_out.append(rec)
if (batch_output != ""):
out_1.append(rec)
if verbose:
sys.stderr.write(".")
if (verbose > 1):
sys.stderr.write("MATCH")
### print-ambiguous
elif(rec_match > level):
result[2] += 1
if(match_mode==2):
recs_out.append(rec)
if (batch_output != ""):
out_2.append(rec)
if verbose:
sys.stderr.write(".")
if (verbose > 1):
sys.stderr.write("AMBIGUOUS")
else:
pass
sys.stderr.write(more_detailed_info)
if verbose:
sys.stderr.write("\n\n Bibmatch report\n")
sys.stderr.write("=" * 35)
sys.stderr.write("\n New records : %d" % result[0])
sys.stderr.write("\n Matched records : %d" % result[1])
sys.stderr.write("\n Ambiguous records : %d\n" % result[2])
sys.stderr.write("=" * 35)
sys.stderr.write("\n Total records : %d\n" % record_counter)
if noprocess:
pass
else:
for record in recs_out:
print print_rec(record[0])
if (batch_output != ""):
filename = "%s.0" % batch_output
file_0 = open(filename,"w")
filename = "%s.1" % batch_output
file_1 = open(filename,"w")
filename = "%s.2" % batch_output
file_2 = open(filename,"w")
for record in out_0:
file_0.write(print_rec(record[0]))
for record in out_1:
file_1.write(print_rec(record[0]))
for record in out_2:
file_2.write(print_rec(record[0]))
file_0.close()
file_1.close()
file_2.close()
diff --git a/modules/bibrank/lib/bibrank.py b/modules/bibrank/lib/bibrank.py
index e1cdd3639..0845c6e86 100644
--- a/modules/bibrank/lib/bibrank.py
+++ b/modules/bibrank/lib/bibrank.py
@@ -1,258 +1,258 @@
#!@PYTHON@
## -*- mode: python; coding: utf-8; -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibRank ranking daemon.
Usage: %s [options]
Ranking examples:
%s -wjif -a --id=0-30000,30001-860000 --verbose=9
%s -wjif -d --modified='2002-10-27 13:57:26'
%s -wwrd --rebalance --collection=Articles
%s -wwrd -a -i 234-250,293,300-500 -u admin
Ranking options:
-w, --run=r1[,r2] runs each rank method in the order given
-c, --collection=c1[,c2] select according to collection
-i, --id=low[-high] select according to doc recID
-m, --modified=from[,to] select according to modification date
-l, --lastupdate select according to last update
-a, --add add or update words for selected records
-d, --del delete words for selected records
-S, --stat show statistics for a method
-R, --recalculate recalculate weight data, used by word frequency
method; should be used if ca 1% of the documents
have been changed since last time -R was used
Repairing options:
-k, --check check consistency for all records in the table(s)
check if update of ranking data is necessary
-r, --repair try to repair all records in the table(s)
Scheduling options:
-u, --user=USER user name to store task, password needed
-s, --sleeptime=SLEEP time after which to repeat tasks (no)
e.g.: 1s, 30m, 24h, 7d
-t, --time=TIME moment for the task to be active (now)
e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26
General options:
-h, --help print this help and exit
-V, --version print version and exit
-v, --verbose=LEVEL verbose level (from 0 to 9, default 1)
"""
__revision__ = "$Id$"
import sys
import traceback
import ConfigParser
-from invenio.config import etcdir
+from invenio.config import CFG_ETCDIR
from invenio.dbquery import run_sql
from invenio.bibtask import task_init, write_message, task_get_option, \
task_set_option, get_datetime, task_update_status
from invenio.bibrank_tag_based_indexer import single_tag_rank_method, citation
from invenio.bibrank_word_indexer import word_similarity
nb_char_in_line = 50 # for verbose pretty printing
chunksize = 1000 # default size of chunks that the records will be treated by
base_process_size = 4500 # process base size
def split_ranges(parse_string):
"""Split ranges of numbers"""
recIDs = []
ranges = parse_string.split(",")
for rang in ranges:
tmp_recIDs = rang.split("-")
if len(tmp_recIDs)==1:
recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])])
else:
if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check
tmp = tmp_recIDs[0]
tmp_recIDs[0] = tmp_recIDs[1]
tmp_recIDs[1] = tmp
recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])])
return recIDs
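The `--id=0-30000,30001-860000` syntax from the usage text maps onto this parser: commas separate ranges, a dash separates low-high, and reversed bounds are swapped. A condensed sketch with the same behaviour:

```python
def parse_ranges(spec):
    """Parse '234-250,293' into [[234, 250], [293, 293]],
    swapping reversed bounds as the sanity check above does."""
    out = []
    for chunk in spec.split(","):
        bounds = [int(x) for x in chunk.split("-")]
        if len(bounds) == 1:
            out.append([bounds[0], bounds[0]])
        else:
            out.append([min(bounds), max(bounds)])
    return out
```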
def get_date_range(var):
"Returns the two dates contained as a low,high tuple"
limits = var.split(",")
if len(limits)==1:
low = get_datetime(limits[0])
return low, None
if len(limits)==2:
low = get_datetime(limits[0])
high = get_datetime(limits[1])
return low, high
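`--modified=from[,to]` therefore yields an open-ended range for a single date. A sketch of the same split using `datetime.strptime` (the real code goes through `get_datetime`, which also accepts relative times such as `+15s`; the fixed format string here is an assumption):

```python
from datetime import datetime

def date_range(spec, fmt="%Y-%m-%d %H:%M:%S"):
    """'from' -> (from, None); 'from,to' -> (from, to)."""
    parts = spec.split(",")
    low = datetime.strptime(parts[0], fmt)
    high = datetime.strptime(parts[1], fmt) if len(parts) == 2 else None
    return low, high
```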
def task_run_core():
"""Run the indexing task. The row argument is the BibSched task
queue row, containing id, arguments, etc.
Return 1 in case of success and 0 in case of failure.
"""
try:
for key in task_get_option("run"):
write_message("")
- filename = etcdir + "/bibrank/" + key + ".cfg"
+ filename = CFG_ETCDIR + "/bibrank/" + key + ".cfg"
write_message("Getting configuration from file: %s" % filename,
verbose=9)
config = ConfigParser.ConfigParser()
try:
config.readfp(open(filename))
except StandardError, e:
write_message("Cannot find configuration file: %s. "
"The rank method may also not be registered using "
"the BibRank Admin Interface." % filename, sys.stderr)
raise StandardError
#Using the function variable to call the function related to the
#rank method
cfg_function = config.get("rank_method", "function")
func_object = globals().get(cfg_function)
if func_object:
func_object(key)
else:
write_message("Cannot run method '%s', no function to call"
% key)
except StandardError, e:
write_message("\nException caught: %s" % e, sys.stderr)
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
sys.exit(1)
return True
def main():
"""Main that construct all the bibtask."""
task_set_option('quick', 'yes')
task_set_option('cmd', 'add')
task_set_option("flush", 100000)
task_set_option('collection', [])
task_set_option("id", [])
task_set_option("check", "")
task_set_option("stat", "")
task_set_option("modified", "")
task_set_option("last_updated", "last_updated")
task_set_option("run", [])
res = run_sql("SELECT name from rnkMETHOD")
for (name,) in res:
task_get_option("run").append(name)
task_init(authorization_action='runbibrank',
authorization_msg="BibRank Task Submission",
description="""Ranking examples:
%s -wjif -a --id=0-30000,30001-860000 --verbose=9
%s -wjif -d --modified='2002-10-27 13:57:26'
%s -wjif --rebalance --collection=Articles
%s -wsbr -a -i 234-250,293,300-500 -u admin
""",
help_specific_usage="""Ranking options:
-w, --run=r1[,r2] runs each rank method in the order given
-c, --collection=c1[,c2] select according to collection
-i, --id=low[-high] select according to doc recID
-m, --modified=from[,to] select according to modification date
-l, --lastupdate select according to last update
-a, --add add or update words for selected records
-d, --del delete words for selected records
-S, --stat show statistics for a method
-R, --recalculate recalculate weight data, used by word frequency
method; should be used if ca 1%% of the documents have
been changed since last time -R was used
Repairing options:
-k, --check check consistency for all records in the table(s)
check if update of ranking data is necessary
-r, --repair try to repair all records in the table(s)
""",
version=__revision__,
specific_params=("ladSi:m:c:kUrRM:f:w:", [
"lastupdate",
"add",
"del",
"repair",
"maxmem",
"flush",
"stat",
"rebalance",
"id=",
"collection=",
"check",
"modified=",
"update",
"run="]),
task_submit_elaborate_specific_parameter_fnc=
task_submit_elaborate_specific_parameter,
task_run_fnc=task_run_core)
def task_submit_elaborate_specific_parameter(key, value, opts, dummy):
"""Elaborate a specific parameter of CLI bibrank."""
if key in ("-a", "--add"):
task_set_option("cmd", "add")
if ("-x","") in opts or ("--del","") in opts:
raise StandardError, "--add incompatible with --del"
elif key in ("--run", "-w"):
task_set_option("run", [])
run = value.split(",")
for run_key in range(0, len(run)):
task_get_option('run').append(run[run_key])
elif key in ("-r", "--repair"):
task_set_option("cmd", "repair")
elif key in ("-d", "--del"):
task_set_option("cmd", "del")
elif key in ("-k", "--check"):
task_set_option("cmd", "check")
elif key in ("-S", "--stat"):
task_set_option("cmd", "stat")
elif key in ("-i", "--id"):
task_set_option("id", task_get_option("id") + split_ranges(value))
task_set_option("last_updated", "")
elif key in ("-c", "--collection"):
task_set_option("collection", value)
elif key in ("-R", "--rebalance"):
task_set_option("quick", "no")
elif key in ("-f", "--flush"):
task_set_option("flush", int(value))
elif key in ("-M", "--maxmem"):
task_set_option("maxmem", int(value))
if task_get_option("maxmem") < base_process_size + 1000:
raise StandardError, "Memory usage should be higher than %d kB" % \
(base_process_size + 1000)
elif key in ("-m", "--modified"):
task_set_option("modified", get_date_range(value))  # e.g. 2002-10-27 13:57:26
task_set_option("last_updated", "")
elif key in ("-l", "--lastupdate"):
task_set_option("last_updated", "last_updated")
else:
return False
return True
if __name__ == "__main__":
main()
diff --git a/modules/bibrank/lib/bibrank_grapher.py b/modules/bibrank/lib/bibrank_grapher.py
index c2c845a20..bddb19ec1 100644
--- a/modules/bibrank/lib/bibrank_grapher.py
+++ b/modules/bibrank/lib/bibrank_grapher.py
@@ -1,208 +1,208 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
-import os
-import sys
-import time
-import tempfile
+import os
+import sys
+import time
+import tempfile
from invenio.config import \
images, \
- storage, \
- version, \
- webdir
-from invenio.websubmit_config import *
-
-## test gnuplot presence:
-cfg_gnuplot_available = 1
-try:
- import Gnuplot
-except ImportError, e:
- cfg_gnuplot_available = 0
-
-def write_coordinates_in_tmp_file(lists_coordinates):
+ CFG_WEBSUBMIT_STORAGEDIR, \
+ CFG_VERSION, \
+ CFG_WEBDIR
+from invenio.websubmit_config import *
+
+## test gnuplot presence:
+cfg_gnuplot_available = 1
+try:
+ import Gnuplot
+except ImportError, e:
+ cfg_gnuplot_available = 0
+
+def write_coordinates_in_tmp_file(lists_coordinates):
"""write the graph coordinates in a temporary file for reading it later
by the create_temporary_image method
lists_coordinates is a list of list of this form:
[[(1,3),(2,4),(3,5)],[(1,5),(2,5),(3,6)]]
This file is organized into one or more sets of 2 columns.
Each set is separated from the others by two blank lines.
Each inner list represents a set and each tuple a line in the file where the first element
of the tuple is the element of the first column, and second element of the
tuple is the element of the second column.
With gnuplot, first column is used as x coordinates, and second column as y coordinates.
One set represents a curve in the graph.
"""
max_y_datas = 0
- tempfile.tempdir = webdir + "/img"
- fname = tempfile.mktemp()
- file_dest = open(fname, 'a')
- for list_elem in lists_coordinates:
- y_axe = []
- #prepare data and store them in a file
- for key_value in list_elem:
- file_dest.write("%s %s\n"%(key_value[0], key_value[1]))
+ tempfile.tempdir = CFG_WEBDIR + "/img"
+ fname = tempfile.mktemp()
+ file_dest = open(fname, 'a')
+ for list_elem in lists_coordinates:
+ y_axe = []
+ #prepare data and store them in a file
+ for key_value in list_elem:
+ file_dest.write("%s %s\n"%(key_value[0], key_value[1]))
y_axe.append(key_value[1])
max_tmp = 0
if y_axe:
- max_tmp = max(y_axe)
- if max_tmp > max_y_datas:
- max_y_datas = max_tmp
+ max_tmp = max(y_axe)
+ if max_tmp > max_y_datas:
+ max_y_datas = max_tmp
file_dest.write("\n\n")
file_dest.close()
- return [fname, max_y_datas]
-
+ return [fname, max_y_datas]
+
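The data-file layout described in the docstring above, one `x y` pair per line with two blank lines between sets so each set can be addressed as a gnuplot `index`, can be sketched as a pure function (hypothetical `format_sets` helper):

```python
def format_sets(lists_coordinates):
    """Render [[(1, 3), (2, 4)], [(1, 5)]] in the gnuplot data-file
    layout: one 'x y' line per point, two blank lines between sets."""
    blocks = []
    for coords in lists_coordinates:
        blocks.append("\n".join("%s %s" % (x, y) for x, y in coords))
    return "\n\n\n".join(blocks) + "\n"
```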
def create_temporary_image(recid, kind_of_graphe, data_file, x_label, y_label, origin_tuple, y_max, docid_list, graphe_titles, intervals):
"""From a temporary file, draw a gnuplot graph
The arguments are as follows:
recid - record ID
kind_of_graphe - takes one of these values: "citation", "download_history", "download_users"
All the common gnuplot commands for these cases are written at the beginning.
Afterwards the particular commands depending on each case are written.
data_file - Name of the temporary file which contains the gnuplot data used to plot the graph.
This file is organized into one or more sets of 2 columns.
First column contains x coordinates, and second column contains y coordinates.
Each set is separated from the others by two blank lines.
x_label - Name of the x axis.
y_label - Name of the y axis.
origin_tuple - Reference coordinates for positioning the graph.
y_max - Max value of y. Used to set the y range.
docid_list - In the download_history case, docid_list is used to plot multiple curves.
graphe_titles - List of graph titles. Used to name the curves in the legend.
intervals - x tics location and xrange specification"""
-
- if cfg_gnuplot_available == 0:
+
+ if cfg_gnuplot_available == 0:
return (None, None)
- #For different curves
- color_line_list = ['4', '3', '2', '9', '6']
- #Gnuplot graphe object
- g = Gnuplot.Gnuplot()
- #Graphe name: file to store graph
- graphe_name = "tmp_%s_%s_stats.png" % (kind_of_graphe, recid)
- g('set terminal png small')
- g('set output "%s/img/%s"' % (webdir, graphe_name))
- len_intervals = len(intervals)
- len_docid_list = len(docid_list)
- # Standard options
- g('set size 0.5,0.5')
- g('set origin %s,%s'% (origin_tuple[0], origin_tuple[1]))
- if x_label == '':
- g('unset xlabel')
- else:
- g.xlabel(s = x_label)
- if x_label == '':
- g('unset ylabel')
- else:
- g.ylabel(s = y_label)
- g('set bmargin 5')
- #let a place at the top of the graph
- g('set tmargin 1')
-
- #Will be passed to g at the end to plot the graphe
- plot_text = ""
-
+ #For different curves
+ color_line_list = ['4', '3', '2', '9', '6']
+ #Gnuplot graphe object
+ g = Gnuplot.Gnuplot()
+ #Graphe name: file to store graph
+ graphe_name = "tmp_%s_%s_stats.png" % (kind_of_graphe, recid)
+ g('set terminal png small')
+ g('set output "%s/img/%s"' % (CFG_WEBDIR, graphe_name))
+ len_intervals = len(intervals)
+ len_docid_list = len(docid_list)
+ # Standard options
+ g('set size 0.5,0.5')
+ g('set origin %s,%s'% (origin_tuple[0], origin_tuple[1]))
+ if x_label == '':
+ g('unset xlabel')
+ else:
+ g.xlabel(s = x_label)
+ if x_label == '':
+ g('unset ylabel')
+ else:
+ g.ylabel(s = y_label)
+ g('set bmargin 5')
+ #let a place at the top of the graph
+ g('set tmargin 1')
+
+ #Will be passed to g at the end to plot the graphe
+ plot_text = ""
+
if kind_of_graphe == 'download_history':
- g('set xdata time') #Set x scale as date
- g('set timefmt "%m/%Y"') #Inform about format in file .dat
+ g('set xdata time') #Set x scale as date
+ g('set timefmt "%m/%Y"') #Inform about format in file .dat
g('set format x "%b %y"') #Format displaying
if len(intervals) > 1 :
- g('set xrange ["%s":"%s"]' % (intervals[0], intervals[len_intervals-1]))
+ g('set xrange ["%s":"%s"]' % (intervals[0], intervals[len_intervals-1]))
y_offset = max(3, float(y_max)/60)
g('set yrange [0:%s]' %str(y_max + y_offset))
- if len_intervals > 1 and len_intervals <= 12:
+ if len_intervals > 1 and len_intervals <= 12:
g('set xtics rotate %s' % str(tuple(intervals)))#to prevent duplicate tics
- elif len_intervals > 12 and len_intervals <= 24:
- g('set xtics rotate "%s", 7776000, "%s"' % (intervals[0], intervals[len_intervals-1])) #3 months intervalls
+ elif len_intervals > 12 and len_intervals <= 24:
+ g('set xtics rotate "%s", 7776000, "%s"' % (intervals[0], intervals[len_intervals-1])) #3 months intervalls
else :
- g('set xtics rotate "%s",15552000, "%s"' % (intervals[0], intervals[len_intervals-1])) #6 months intervalls
-
+ g('set xtics rotate "%s",15552000, "%s"' % (intervals[0], intervals[len_intervals-1])) #6 months intervalls
+
if len_docid_list <= 1: #Only one curve
- #g('set style fill solid 0.25')
+ #g('set style fill solid 0.25')
if len(intervals)<=4:
plot_text = plot_command(1, data_file, (0, 0), "", "imp", color_line_list[0], 20)
else:
plot_text = plot_command(1, data_file, (0, 0), "", "linespoint", color_line_list[0], 1, "pt 26", "ps 0.5")
elif len_docid_list > 1: #Multiple curves
if len(intervals)<=4:
plot_text = plot_command(1, data_file, (0, 0), graphe_titles[0], "imp", color_line_list[0], 20)
else:
plot_text = plot_command(1, data_file, (0, 0), graphe_titles[0], "linespoint", color_line_list[0], 1, "pt 26", "ps 0.5")
for d in range(1, len_docid_list):
if len(intervals)<=4:
plot_text += plot_command(0, data_file, (d, d) , graphe_titles[d], "imp", color_line_list[d], 20)
else :
plot_text += plot_command(0, data_file, (d, d) , graphe_titles[d], "linespoint", color_line_list[d], 1, "pt 26", "ps 0.5")
if len(intervals)>2:
- plot_text += plot_command(0, data_file, (len_docid_list, len_docid_list), "", "impulses", 0, 2 )
- plot_text += plot_command(0, data_file, (len_docid_list, len_docid_list), "TOTAL", "lines", 0, 5)
-
- elif kind_of_graphe == 'download_users':
- g('set size 0.25,0.5')
- g('set xrange [0:4]')
- g('set yrange [0:100]')
- g('set format y "%g %%"')
- g("""set xtics ("" 0, "CERN\\n Users" 1, "Other\\n Users" 3, "" 4)""")
- g('set ytics 0,10,100')
- g('set boxwidth 0.7 relative')
- g('set style fill solid 0.25')
- plot_text = 'plot "%s" using 1:2 title "" with boxes lt 7 lw 2' % data_file
-
- else: #citation
- g('set boxwidth 0.6 relative')
- g('set style fill solid 0.250000 border -1')
- g('set xtics rotate %s'% str(tuple(intervals)))
- g('set xrange [%s:%s]' % (str(intervals[0]), str(intervals[len_intervals-1])))
- g('set yrange [0:%s]' %str(y_max+2))
- plot_text = """plot "% s" index 0:0 using 1:2 title "" w steps lt %s lw 3""" % (data_file, color_line_list[1])
-
- g('%s' % plot_text)
- return (graphe_name, data_file)
-
-def remove_old_img(prefix_file_name):
- """Detele all the images older than 10 minutes to prevent to much storage
+ plot_text += plot_command(0, data_file, (len_docid_list, len_docid_list), "", "impulses", 0, 2 )
+ plot_text += plot_command(0, data_file, (len_docid_list, len_docid_list), "TOTAL", "lines", 0, 5)
+
+ elif kind_of_graphe == 'download_users':
+ g('set size 0.25,0.5')
+ g('set xrange [0:4]')
+ g('set yrange [0:100]')
+ g('set format y "%g %%"')
+ g("""set xtics ("" 0, "CERN\\n Users" 1, "Other\\n Users" 3, "" 4)""")
+ g('set ytics 0,10,100')
+ g('set boxwidth 0.7 relative')
+ g('set style fill solid 0.25')
+ plot_text = 'plot "%s" using 1:2 title "" with boxes lt 7 lw 2' % data_file
+
+ else: #citation
+ g('set boxwidth 0.6 relative')
+ g('set style fill solid 0.250000 border -1')
+ g('set xtics rotate %s'% str(tuple(intervals)))
+ g('set xrange [%s:%s]' % (str(intervals[0]), str(intervals[len_intervals-1])))
+ g('set yrange [0:%s]' %str(y_max+2))
+ plot_text = """plot "% s" index 0:0 using 1:2 title "" w steps lt %s lw 3""" % (data_file, color_line_list[1])
+
+ g('%s' % plot_text)
+ return (graphe_name, data_file)
+
+def remove_old_img(prefix_file_name):
+ """Delete all the images older than 10 minutes to prevent too much storage
Takes 0.0 seconds for 50 files to delete"""
-
- command = "find %s/img/ -name tmp_%s*.png -amin +10 -exec rm -f {} \;" % (webdir, prefix_file_name)
- return os.system(command)
-
-def plot_command(first_line, file_source, indexes, title, style, line_type, line_width, point_type="", point_size=""):
+
+ command = "find %s/img/ -name tmp_%s*.png -amin +10 -exec rm -f {} \;" % (CFG_WEBDIR, prefix_file_name)
+ return os.system(command)
+
+def plot_command(first_line, file_source, indexes, title, style, line_type, line_width, point_type="", point_size=""):
"""Return a string of a gnuplot plot command. Particularly useful when plotting multiple curves.
From a temporary file, draw a gnuplot graph
Return a plot command string as follows:
- plot datafile , datafile ,...
+ plot datafile , datafile ,...
The arguments are as follows:
first_line - only the drawing command of the first curve contains the word plot
file_source - data file source which contains coordinates
indexes - points out set number in data file source
title - title of the curve in the legend box
style - representation of the curve, e.g. linespoints, lines ...
line_type - color of the line
line_width - width of the line
point_type - optional parameter: if not mentioned it is an empty string.
Used in the case of style = linespoints to set the point style"""
- if first_line:
- plot_text = """plot "%s" index %s:%s using 1:2 title "%s" with %s lt %s lw %s %s %s""" % (file_source, indexes[0], indexes[1], title, style, line_type, line_width, point_type, point_size)
- else:
- plot_text = """, "%s" index %s:%s using 1:2 title "%s" with %s lt %s lw %s %s %s""" % (file_source, indexes[0], indexes[1], title, style, line_type, line_width, point_type, point_size)
- return plot_text
+ if first_line:
+ plot_text = """plot "%s" index %s:%s using 1:2 title "%s" with %s lt %s lw %s %s %s""" % (file_source, indexes[0], indexes[1], title, style, line_type, line_width, point_type, point_size)
+ else:
+ plot_text = """, "%s" index %s:%s using 1:2 title "%s" with %s lt %s lw %s %s %s""" % (file_source, indexes[0], indexes[1], title, style, line_type, line_width, point_type, point_size)
+ return plot_text
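Chaining these fragments yields one gnuplot statement: the first fragment begins with `plot`, each later one with `,`. A reduced sketch of that assembly (hypothetical `build_plot` helper with a fixed `lines` style):

```python
def build_plot(datafile, titles):
    """Concatenate one plot fragment per curve, each reading a
    different set ('index i:i') from the same data file."""
    text = ""
    for i, title in enumerate(titles):
        prefix = "plot" if i == 0 else ","
        text += '%s "%s" index %d:%d using 1:2 title "%s" with lines' % (
            prefix, datafile, i, i, title)
    return text
```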
diff --git a/modules/bibrank/lib/bibrank_record_sorter.py b/modules/bibrank/lib/bibrank_record_sorter.py
index c1eec595f..4cf3a5392 100644
--- a/modules/bibrank/lib/bibrank_record_sorter.py
+++ b/modules/bibrank/lib/bibrank_record_sorter.py
@@ -1,678 +1,678 @@
# -*- coding: utf-8 -*-
##
## $Id$
## Ranking of records using different parameters and methods on the fly.
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import sys
import string
import time
import math
import re
import ConfigParser
import traceback
import copy
from invenio.config import \
cdslang, \
- etcdir, \
- version
+ CFG_ETCDIR, \
+ CFG_VERSION
from invenio.dbquery import run_sql, serialize_via_marshal, deserialize_via_marshal
from invenio.webpage import adderrorbox
from invenio.bibindex_engine_stemmer import stem
from invenio.bibindex_engine_stopwords import is_stopword
from invenio.bibrank_citation_searcher import calculate_cited_by_list, get_cited_by, get_cited_by_list
from invenio.intbitset import intbitset
def compare_on_val(first, second):
return cmp(second[1], first[1])
def check_term(term, col_size, term_rec, max_occ, min_occ, termlength):
"""Check if the tem is valid for use
term - the term to check
col_size - the number of records in database
term_rec - the number of records which contains this term
max_occ - max frequency of the term allowed
min_occ - min frequency of the term allowed
termlength - the minimum length of the terms allowed"""
try:
if is_stopword(term, 1) or (len(term) <= termlength) or ((float(term_rec) / float(col_size)) >= max_occ) or ((float(term_rec) / float(col_size)) <= min_occ):
return ""
if int(term):
return ""
except StandardError, e:
pass
return "true"
def create_rnkmethod_cache():
"""Create cache with vital information for each rank method."""
global methods
bibrank_meths = run_sql("SELECT name from rnkMETHOD")
methods = {}
global voutput
voutput = ""
for (rank_method_code,) in bibrank_meths:
try:
- file = etcdir + "/bibrank/" + rank_method_code + ".cfg"
+ file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg"
config = ConfigParser.ConfigParser()
config.readfp(open(file))
except StandardError, e:
pass
cfg_function = config.get("rank_method", "function")
if config.has_section(cfg_function):
methods[rank_method_code] = {}
methods[rank_method_code]["function"] = cfg_function
methods[rank_method_code]["prefix"] = config.get(cfg_function, "relevance_number_output_prologue")
methods[rank_method_code]["postfix"] = config.get(cfg_function, "relevance_number_output_epilogue")
methods[rank_method_code]["chars_alphanumericseparators"] = r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]"
else:
- raise Exception("Error in configuration file: %s" % (etcdir + "/bibrank/" + rank_method_code + ".cfg"))
+ raise Exception("Error in configuration file: %s" % (CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg"))
i8n_names = run_sql("""SELECT ln,value from rnkMETHODNAME,rnkMETHOD where id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name=%s""", (rank_method_code,))
for (ln, value) in i8n_names:
methods[rank_method_code][ln] = value
if config.has_option(cfg_function, "table"):
methods[rank_method_code]["rnkWORD_table"] = config.get(cfg_function, "table")
methods[rank_method_code]["col_size"] = run_sql("SELECT count(*) FROM %sR" % methods[rank_method_code]["rnkWORD_table"][:-1])[0][0]
if config.has_option(cfg_function, "stemming") and config.get(cfg_function, "stemming"):
try:
methods[rank_method_code]["stemmer"] = config.get(cfg_function, "stemming")
except Exception,e:
pass
if config.has_option(cfg_function, "stopword"):
methods[rank_method_code]["stopwords"] = config.get(cfg_function, "stopword")
if config.has_section("find_similar"):
methods[rank_method_code]["max_word_occurence"] = float(config.get("find_similar", "max_word_occurence"))
methods[rank_method_code]["min_word_occurence"] = float(config.get("find_similar", "min_word_occurence"))
methods[rank_method_code]["min_word_length"] = int(config.get("find_similar", "min_word_length"))
methods[rank_method_code]["min_nr_words_docs"] = int(config.get("find_similar", "min_nr_words_docs"))
methods[rank_method_code]["max_nr_words_upper"] = int(config.get("find_similar", "max_nr_words_upper"))
methods[rank_method_code]["max_nr_words_lower"] = int(config.get("find_similar", "max_nr_words_lower"))
methods[rank_method_code]["default_min_relevance"] = int(config.get("find_similar", "default_min_relevance"))
if config.has_section("combine_method"):
i = 1
methods[rank_method_code]["combine_method"] = []
while config.has_option("combine_method", "method%s" % i):
methods[rank_method_code]["combine_method"].append(string.split(config.get("combine_method", "method%s" % i), ","))
i += 1
def is_method_valid(colID, rank_method_code):
"""Checks if a method is valid for the collection given"""
enabled_colls = dict(run_sql("SELECT id_collection, score from collection_rnkMETHOD,rnkMETHOD WHERE id_rnkMETHOD=rnkMETHOD.id AND name='%s'" % rank_method_code))
try:
colID = int(colID)
except TypeError:
return 0
if enabled_colls.has_key(colID):
return 1
else:
while colID:
colID = run_sql("SELECT id_dad FROM collection_collection WHERE id_son=%s" % colID)
if colID and enabled_colls.has_key(colID[0][0]):
return 1
elif colID:
colID = colID[0][0]
return 0
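The collection-tree walk in `is_method_valid` can be sketched with in-memory dictionaries standing in for the `collection_rnkMETHOD` and `collection_collection` tables (all names hypothetical):

```python
def is_method_valid(col_id, enabled_colls, parent_of):
    # A rank method applies to a collection if the collection itself,
    # or any ancestor in the collection tree, has the method enabled.
    while col_id:
        if col_id in enabled_colls:
            return 1
        col_id = parent_of.get(col_id)  # climb to the "dad" collection
    return 0
```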
def get_bibrank_methods(collection, ln=cdslang):
"""Returns a list of rank methods and the name om them in the language defined by the ln parameter, if collection is given, only methods enabled for that collection is returned."""
if not globals().has_key('methods'):
create_rnkmethod_cache()
avail_methods = []
for (rank_method_code, options) in methods.iteritems():
if options.has_key("function") and is_method_valid(collection, rank_method_code):
if options.has_key(ln):
avail_methods.append((rank_method_code, options[ln]))
elif options.has_key(cdslang):
avail_methods.append((rank_method_code, options[cdslang]))
else:
avail_methods.append((rank_method_code, rank_method_code))
return avail_methods
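The language fallback used when building `avail_methods` reduces to a small helper (names hypothetical; `options` is one method's cached dictionary):

```python
def pick_method_name(options, ln, default_ln="en", code="wrd"):
    # Prefer the name in the requested language, then the site default
    # language, and finally fall back to the method code itself.
    if ln in options:
        return options[ln]
    if default_ln in options:
        return options[default_ln]
    return code
```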
def rank_records(rank_method_code, rank_limit_relevance, hitset_global, pattern=[], verbose=0):
"""rank_method_code, e.g. `jif' or `sbr' (word frequency vector model)
rank_limit_relevance, e.g. `23' for `nbc' (number of citations) or `0.10' for `vec'
hitset, search engine hits;
pattern, search engine query or record ID (you check the type)
verbose, verbose level
output:
list of records
list of rank values
prefix
postfix
verbose_output"""
global voutput
voutput = ""
configcreated = ""
starttime = time.time()
afterfind = starttime - time.time()
aftermap = starttime - time.time()
try:
hitset = copy.deepcopy(hitset_global) #we are receiving a global hitset
if not globals().has_key('methods'):
create_rnkmethod_cache()
function = methods[rank_method_code]["function"]
#we get 'citation' method correctly here
func_object = globals().get(function)
if func_object and pattern and pattern[0][0:6] == "recid:" and function == "word_similarity":
result = find_similar(rank_method_code, pattern[0][6:], hitset, rank_limit_relevance, verbose)
elif rank_method_code == "citation" and pattern:
#we get rank_method_code correctly here. pattern[0] is the search word - not used by find_cit
result = find_citations(rank_method_code, pattern[0][6:], hitset, verbose)
elif func_object:
result = func_object(rank_method_code, pattern, hitset, rank_limit_relevance, verbose)
else:
result = rank_by_method(rank_method_code, pattern, hitset, rank_limit_relevance, verbose)
except Exception, e:
result = (None, "", adderrorbox("An error occured when trying to rank the search result "+rank_method_code, ["Unexpected error: %s Traceback:%s" % (e, traceback.format_tb(sys.exc_info()[2]))]), voutput)
afterfind = time.time() - starttime
if result[0] and result[1]: #split into two lists for search_engine
results_similar_recIDs = map(lambda x: x[0], result[0])
results_similar_relevances = map(lambda x: x[1], result[0])
result = (results_similar_recIDs, results_similar_relevances, result[1], result[2], "%s" % configcreated + result[3])
aftermap = time.time() - starttime;
else:
result = (None, None, result[1], result[2], result[3])
if verbose > 0:
voutput = voutput+"\nElapsed time after finding: "+str(afterfind)+"\nElapsed after mapping: "+str(aftermap)
print string.replace(voutput, " ", "\n")
#add stuff from here into voutput from result
tmp = result[4]+voutput
result = (result[0],result[1],result[2],result[3],tmp)
#dbg = string.join(map(str,methods[rank_method_code].items()))
#result = (None, "", adderrorbox("Debug ",rank_method_code+" "+dbg),"",voutput);
return result
def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance,verbose):
"""combining several methods into one based on methods/percentage in config file"""
global voutput
result = {}
try:
for (method, percent) in methods[rank_method_code]["combine_method"]:
function = methods[method]["function"]
func_object = globals().get(function)
percent = int(percent)
if func_object:
this_result = func_object(method, pattern, hitset, rank_limit_relevance, verbose)[0]
else:
this_result = rank_by_method(method, pattern, hitset, rank_limit_relevance, verbose)[0]
for i in range(0, len(this_result)):
(recID, value) = this_result[i]
if value > 0:
result[recID] = result.get(recID, 0) + int((float(i) / len(this_result)) * float(percent))
result = result.items()
result.sort(lambda x, y: cmp(x[1], y[1]))
return (result, "(", ")", voutput)
except Exception, e:
return (None, "Warning: %s method cannot be used for ranking your query." % rank_method_code, "", voutput)
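The blending scheme in `combine_method` can be sketched without the database plumbing; `combine_ranks` is a hypothetical stand-in that takes the per-method result lists directly:

```python
def combine_ranks(method_results, percentages):
    # method_results: one ranked list [(recid, value), ...] per method,
    # worst record first, as the individual ranking functions return them.
    combined = {}
    for this_result, percent in zip(method_results, percentages):
        for i, (recid, value) in enumerate(this_result):
            if value > 0:
                # A record's contribution grows with its position in the list.
                combined[recid] = combined.get(recid, 0) + int(
                    (float(i) / len(this_result)) * float(percent))
    return sorted(combined.items(), key=lambda item: item[1])
```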
def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance,verbose):
"""Ranking of records based on predetermined values.
input:
rank_method_code - the code of the method, from the name field in rnkMETHOD, used to get predetermined values from
rnkMETHODDATA
lwords - a list of words from the query
hitset - a list of hits for the query found by search_engine
rank_limit_relevance - show only records with a rank value above this
verbose - verbose value
output:
reclist - a list of sorted records, with unsorted added to the end: [[23,34], [344,24], [1,01]]
prefix - what to show before the rank value
postfix - what to show after the rank value
voutput - contains extra information, content dependent on verbose value"""
global voutput
rnkdict = run_sql("SELECT relevance_data FROM rnkMETHODDATA,rnkMETHOD where rnkMETHOD.id=id_rnkMETHOD and rnkMETHOD.name='%s'" % rank_method_code)
if not rnkdict:
return (None, "Warning: Could not load ranking data for method %s." % rank_method_code, "", voutput)
max_recid = int(run_sql("SELECT max(id) FROM bibrec")[0][0])
lwords_hitset = None
for j in range(0, len(lwords)): #find which docs to search based on ranges..should be done in search_engine...
if lwords[j] and lwords[j][:6] == "recid:":
if not lwords_hitset:
lwords_hitset = HitSet()
lword = lwords[j][6:]
if string.find(lword, "->") > -1:
lword = string.split(lword, "->")
if int(lword[0]) >= max_recid or int(lword[1]) >= max_recid + 1:
return (None, "Warning: Given record IDs are out of range.", "", voutput)
for i in range(int(lword[0]), int(lword[1])):
lwords_hitset.add(int(i))
elif lword < max_recid + 1:
lwords_hitset.add(int(lword))
else:
return (None, "Warning: Given record IDs are out of range.", "", voutput)
rnkdict = deserialize_via_marshal(rnkdict[0][0])
if verbose > 0:
voutput += " Running rank method: %s, using rank_by_method function in bibrank_record_sorter " % rank_method_code
voutput += "Ranking data loaded, size of structure: %s " % len(rnkdict)
lrecIDs = list(hitset)
if verbose > 0:
voutput += "Number of records to rank: %s " % len(lrecIDs)
reclist = []
reclist_addend = []
if not lwords_hitset: #rank all docs; can this be sped up using something other than a for loop?
for recID in lrecIDs:
if rnkdict.has_key(recID):
reclist.append((recID, rnkdict[recID]))
del rnkdict[recID]
else:
reclist_addend.append((recID, 0))
else: #rank docs in hitset; can this be sped up using something other than a for loop?
lwords_lrecIDs = lwords_hitset.items()
for recID in lwords_lrecIDs:
if rnkdict.has_key(recID) and recID in hitset:
reclist.append((recID, rnkdict[recID]))
del rnkdict[recID]
elif recID in hitset:
reclist_addend.append((recID, 0))
if verbose > 0:
voutput += "Number of records ranked: %s " % len(reclist)
voutput += "Number of records not ranked: %s " % len(reclist_addend)
reclist.sort(lambda x, y: cmp(x[1], y[1]))
return (reclist_addend + reclist, methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput)
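The `recid:` range handling above can be sketched in isolation (error handling omitted; `parse_recid_ranges` is a hypothetical helper):

```python
def parse_recid_ranges(lwords, max_recid):
    # Collect record IDs named by "recid:" query terms: "recid:10->20"
    # selects a range, a bare "recid:7" selects one record.
    hitset = set()
    for word in lwords:
        if word.startswith("recid:"):
            lword = word[6:]
            if "->" in lword:
                low, high = lword.split("->")
                # As in the original loop, the upper bound is exclusive.
                hitset.update(range(int(low), int(high)))
            elif int(lword) < max_recid + 1:
                hitset.add(int(lword))
    return hitset
```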
def find_citations(rank_method_code, recID, hitset, verbose):
"""Rank by the amount of citations."""
#calculate the cited-by values for all the members of the hitset
#returns: ((recordid,weight),prefix,postfix,message)
global voutput
voutput = ""
#If the recID is numeric, return only stuff that cites it. Otherwise return
#stuff that cites hitset
#try to convert to int
recisint = True
recidint = 0
try:
recidint = int(recID)
except:
recisint = False
ret = []
if recisint:
myrecords = get_cited_by(recidint) #this is a simple list
for r in myrecords:
ret.append([r,0])
else:
ret = get_cited_by_list(hitset)
ret.sort(lambda x,y:cmp(x[1],y[1])) #ascending by the second member of the tuples
if verbose > 0:
voutput = voutput+"\nrecID "+str(recID)+" hitset "+str(hitset)+"\n"+"find_citations retlist "+str(ret)
#voutput = voutput + str(ret)
if ret:
return (ret,"(", ")", "")
else:
return ((),"", "", "")
def find_similar(rank_method_code, recID, hitset, rank_limit_relevance,verbose):
"""Finding terms to use for calculating similarity. Terms are taken from the recid given, returns a list of recids's and relevance,
input:
rank_method_code - the code of the method, from the name field in rnkMETHOD
recID - records to use for find similar
hitset - a list of hits for the query found by search_engine
rank_limit_relevance - show only records with a rank value above this
verbose - verbose value
output:
reclist - a list of sorted records: [[23,34], [344,24], [1,01]]
prefix - what to show before the rank value
postfix - what to show after the rank value
voutput - contains extra information, content dependent on verbose value"""
startCreate = time.time()
global voutput
if verbose > 0:
voutput += " Running rank method: %s, using find_similar/word_frequency in bibrank_record_sorter " % rank_method_code
rank_limit_relevance = methods[rank_method_code]["default_min_relevance"]
try:
recID = int(recID)
except Exception,e :
return (None, "Warning: Error in record ID, please check that a number is given.", "", voutput)
rec_terms = run_sql("""SELECT termlist FROM %sR WHERE id_bibrec=%%s""" % methods[rank_method_code]["rnkWORD_table"][:-1], (recID,))
if not rec_terms:
return (None, "Warning: Requested record does not seem to exist.", "", voutput)
rec_terms = deserialize_via_marshal(rec_terms[0][0])
#Get all documents using terms from the selected documents
if len(rec_terms) == 0:
return (None, "Warning: Record specified has no content indexed for use with this method.", "", voutput)
else:
terms = "%s" % rec_terms.keys()
terms_recs = dict(run_sql("""SELECT term, hitlist FROM %s WHERE term IN (%s)""" % (methods[rank_method_code]["rnkWORD_table"], terms[1:len(terms) - 1])))
tf_values = {}
#Calculate all term frequencies
for (term, tf) in rec_terms.iteritems():
if len(term) >= methods[rank_method_code]["min_word_length"] and terms_recs.has_key(term) and tf[1] != 0:
tf_values[term] = int((1 + math.log(tf[0])) * tf[1]) #calculate term weight
tf_values = tf_values.items()
tf_values.sort(lambda x, y: cmp(y[1], x[1])) #sort based on weight
lwords = []
stime = time.time()
(recdict, rec_termcount) = ({}, {})
for (t, tf) in tf_values: #t=term, tf=term frequency
term_recs = deserialize_via_marshal(terms_recs[t])
if len(tf_values) <= methods[rank_method_code]["max_nr_words_lower"] or (len(term_recs) >= methods[rank_method_code]["min_nr_words_docs"] and (((float(len(term_recs)) / float(methods[rank_method_code]["col_size"])) <= methods[rank_method_code]["max_word_occurence"]) and ((float(len(term_recs)) / float(methods[rank_method_code]["col_size"])) >= methods[rank_method_code]["min_word_occurence"]))): #too complicated...something must be done
lwords.append((t, methods[rank_method_code]["rnkWORD_table"])) #list of terms used
(recdict, rec_termcount) = calculate_record_relevance_findsimilar((t, round(tf, 4)) , term_recs, hitset, recdict, rec_termcount, verbose, "true") #true tells the function to not calculate all unimportant terms
if len(tf_values) > methods[rank_method_code]["max_nr_words_lower"] and (len(lwords) == methods[rank_method_code]["max_nr_words_upper"] or tf < 0):
break
if len(recdict) == 0 or len(lwords) == 0:
return (None, "Could not find any similar documents, possibly because of error in ranking data.", "", voutput)
else: #sort if we got something to sort
(reclist, hitset) = sort_record_relevance_findsimilar(recdict, rec_termcount, hitset, rank_limit_relevance, verbose)
if verbose > 0:
voutput += " Number of terms: %s " % run_sql("SELECT count(id) FROM %s" % methods[rank_method_code]["rnkWORD_table"])[0][0]
voutput += "Number of terms to use for query: %s " % len(lwords)
voutput += "Terms: %s " % lwords
voutput += "Current number of recIDs: %s " % (methods[rank_method_code]["col_size"])
voutput += "Prepare time: %s " % (str(time.time() - startCreate))
voutput += "Total time used: %s " % (str(time.time() - startCreate))
rank_method_stat(rank_method_code, reclist, lwords)
return (reclist[:len(reclist)], methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput)
def word_similarity(rank_method_code, lwords, hitset, rank_limit_relevance, verbose):
"""Ranking a records containing specified words and returns a sorted list.
input:
rank_method_code - the code of the method, from the name field in rnkMETHOD
lwords - a list of words from the query
hitset - a list of hits for the query found by search_engine
rank_limit_relevance - show only records with a rank value above this
verbose - verbose value
output:
reclist - a list of sorted records: [[23,34], [344,24], [1,01]]
prefix - what to show before the rank value
postfix - what to show after the rank value
voutput - contains extra information, content dependent on verbose value"""
global voutput
startCreate = time.time()
if verbose > 0:
voutput += " Running rank method: %s, using word_frequency function in bibrank_record_sorter " % rank_method_code
lwords_old = lwords
lwords = []
#Check terms, remove non alphanumeric characters. Use both unstemmed and stemmed version of all terms.
for i in range(0, len(lwords_old)):
term = string.lower(lwords_old[i])
if not methods[rank_method_code]["stopwords"] == "True" or methods[rank_method_code]["stopwords"] and not is_stopword(term, 1):
lwords.append((term, methods[rank_method_code]["rnkWORD_table"]))
terms = string.split(string.lower(re.sub(methods[rank_method_code]["chars_alphanumericseparators"], ' ', term)))
for term in terms:
if methods[rank_method_code].has_key("stemmer"): # stem word
term = stem(string.replace(term, ' ', ''), methods[rank_method_code]["stemmer"])
if lwords_old[i] != term: #add if stemmed word is different than original word
lwords.append((term, methods[rank_method_code]["rnkWORD_table"]))
(recdict, rec_termcount, lrecIDs_remove) = ({}, {}, {})
#For each term, if accepted, get a list of the records using the term
#then calculate the relevance for each term before sorting the list of records
for (term, table) in lwords:
term_recs = run_sql("""SELECT term, hitlist FROM %s WHERE term=%%s""" % methods[rank_method_code]["rnkWORD_table"], (term,))
if term_recs: #if term exists in database, use for ranking
term_recs = deserialize_via_marshal(term_recs[0][1])
(recdict, rec_termcount) = calculate_record_relevance((term, int(term_recs["Gi"][1])) , term_recs, hitset, recdict, rec_termcount, verbose, quick=None)
del term_recs
if len(recdict) == 0 or (len(lwords) == 1 and lwords[0] == ""):
return (None, "Records not ranked. The query is not detailed enough, or not enough records found, for ranking to be possible.", "", voutput)
else: #sort if we got something to sort
(reclist, hitset) = sort_record_relevance(recdict, rec_termcount, hitset, rank_limit_relevance, verbose)
#Add any documents not ranked to the end of the list
if hitset:
lrecIDs = list(hitset) #using 2-3mb
reclist = zip(lrecIDs, [0] * len(lrecIDs)) + reclist #using 6mb
if verbose > 0:
voutput += " Current number of recIDs: %s " % (methods[rank_method_code]["col_size"])
voutput += "Number of terms: %s " % run_sql("SELECT count(id) FROM %s" % methods[rank_method_code]["rnkWORD_table"])[0][0]
voutput += "Terms: %s " % lwords
voutput += "Prepare and pre calculate time: %s " % (str(time.time() - startCreate))
voutput += "Total time used: %s " % (str(time.time() - startCreate))
rank_method_stat(rank_method_code, reclist, lwords)
return (reclist, methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput)
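The term preprocessing step in `word_similarity` (lowercase, split on separators, stem) can be sketched with a trivial suffix-stripping stemmer in place of the real one:

```python
import re

# Same separator class as chars_alphanumericseparators above.
SEPARATORS = re.compile(r"[0-9!\"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]")

def expand_query_terms(words, stem=lambda part: part.rstrip("s")):
    # Lowercase each query word, split it on the separator characters,
    # and add each stemmed fragment that differs from the original word.
    # The suffix-stripping `stem` default is a stand-in for a real stemmer.
    terms = []
    for word in words:
        term = word.lower()
        terms.append(term)
        for part in SEPARATORS.sub(" ", term).split():
            stemmed = stem(part)
            if stemmed != word:
                terms.append(stemmed)
    return terms
```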
def calculate_record_relevance(term, invidx, hitset, recdict, rec_termcount, verbose, quick=None):
"""Calculating the relevance of the documents based on the input, calculates only one word
term - (term, query term factor) the term and its importance in the overall search
invidx - {recid: tf, Gi: norm value} The Gi value is used as a idf value
hitset - a hitset with records that are allowed to be ranked
recdict - contains currently ranked records, is returned with new values
rec_termcount - {recid: count} the number of terms in this record that matches the query
verbose - verbose value
quick - if quick=yes, only terms with a positive qtf are used, to limit the number of records to sort"""
(t, qtf) = term
if invidx.has_key("Gi"):#Gi = weigth for this term, created by bibrank_word_indexer
Gi = invidx["Gi"][1]
del invidx["Gi"]
else: #if not existing, bibrank should be run with -R
return (recdict, rec_termcount)
if not quick or (qtf >= 0 or (qtf < 0 and len(recdict) == 0)):
#Only accept records existing in the hitset received from the search engine
for (j, tf) in invidx.iteritems():
if j in hitset:#only include docs found by search_engine based on query
try: #calculates rank value
recdict[j] = recdict.get(j, 0) + int(math.log(tf[0] * Gi * tf[1] * qtf))
except:
return (recdict, rec_termcount)
rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
elif quick: #much used term, do not include all records, only use already existing ones
for (j, tf) in recdict.iteritems(): #i.e: if doc contains important term, also count unimportant
if invidx.has_key(j):
tf = invidx[j]
recdict[j] = recdict.get(j, 0) + int(math.log(tf[0] * Gi * tf[1] * qtf))
rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
return (recdict, rec_termcount)
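A minimal sketch of the scoring step, assuming the same `{recid: (tf, norm)}` inverted-index shape with a special `"Gi"` entry (names hypothetical):

```python
import math

def score_term(invidx, hitset, qtf, recdict, rec_termcount):
    # invidx maps recid -> (tf, norm); the special "Gi" entry carries the
    # term's global weight, which plays the role of an idf factor.
    Gi = invidx.pop("Gi")[1]
    for recid, (tf, norm) in invidx.items():
        if recid in hitset:  # only rank docs the search engine found
            recdict[recid] = recdict.get(recid, 0) + int(
                math.log(tf * Gi * norm * qtf))
            rec_termcount[recid] = rec_termcount.get(recid, 0) + 1
    return recdict, rec_termcount
```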
def calculate_record_relevance_findsimilar(term, invidx, hitset, recdict, rec_termcount, verbose, quick=None):
"""Calculating the relevance of the documents based on the input, calculates only one word
term - (term, query term factor) the term and its importance in the overall search
invidx - {recid: tf, Gi: norm value} The Gi value is used as a idf value
hitset - a hitset with records that are allowed to be ranked
recdict - contains currently ranked records, is returned with new values
rec_termcount - {recid: count} the number of terms in this record that matches the query
verbose - verbose value
quick - if quick=yes, only terms with a positive qtf are used, to limit the number of records to sort"""
(t, qtf) = term
if invidx.has_key("Gi"): #Gi = weigth for this term, created by bibrank_word_indexer
Gi = invidx["Gi"][1]
del invidx["Gi"]
else: #if not existing, bibrank should be run with -R
return (recdict, rec_termcount)
if not quick or (qtf >= 0 or (qtf < 0 and len(recdict) == 0)):
#Only accept records existing in the hitset received from the search engine
for (j, tf) in invidx.iteritems():
if j in hitset: #only include docs found by search_engine based on query
#calculate rank value
recdict[j] = recdict.get(j, 0) + int((1 + math.log(tf[0])) * Gi * tf[1] * qtf)
rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
elif quick: #much used term, do not include all records, only use already existing ones
for (j, tf) in recdict.iteritems(): #i.e: if doc contains important term, also count unimportant
if invidx.has_key(j):
tf = invidx[j]
recdict[j] = recdict[j] + int((1 + math.log(tf[0])) * Gi * tf[1] * qtf)
rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
return (recdict, rec_termcount)
def sort_record_relevance(recdict, rec_termcount, hitset, rank_limit_relevance, verbose):
"""Sorts the dictionary and returns records with a relevance higher than the given value.
recdict - {recid: value} unsorted
rank_limit_relevance - a value > 0 usually
verbose - verbose value"""
startCreate = time.time()
global voutput
reclist = []
#remove all ranked documents so that unranked can be added to the end
hitset -= recdict.keys()
#gives each record a score between 0-100
divideby = max(recdict.values())
for (j, w) in recdict.iteritems():
w = int(w * 100 / divideby)
if w >= rank_limit_relevance:
reclist.append((j, w))
#sort scores
reclist.sort(lambda x, y: cmp(x[1], y[1]))
if verbose > 0:
voutput += "Number of records sorted: %s " % len(reclist)
voutput += "Sort time: %s " % (str(time.time() - startCreate))
return (reclist, hitset)
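The 0-100 normalization and relevance cut-off can be sketched on its own, without the hitset bookkeeping:

```python
def normalize_scores(recdict, rank_limit_relevance):
    # Scale every raw score to 0-100 relative to the best record, then
    # keep only records at or above the cut-off, worst first.
    divideby = max(recdict.values())
    reclist = []
    for recid, raw in recdict.items():
        score = int(raw * 100 / divideby)
        if score >= rank_limit_relevance:
            reclist.append((recid, score))
    reclist.sort(key=lambda item: item[1])
    return reclist
```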
def sort_record_relevance_findsimilar(recdict, rec_termcount, hitset, rank_limit_relevance, verbose):
"""Sorts the dictionary and returns records with a relevance higher than the given value.
recdict - {recid: value} unsorted
rank_limit_relevance - a value > 0 usually
verbose - verbose value"""
startCreate = time.time()
global voutput
reclist = []
#Multiply with the number of terms of the total number of terms in the query existing in the records
for j in recdict.keys():
if recdict[j] > 0 and rec_termcount[j] > 1:
recdict[j] = math.log((recdict[j] * rec_termcount[j]))
else:
recdict[j] = 0
hitset -= recdict.keys()
#gives each record a score between 0-100
divideby = max(recdict.values())
for (j, w) in recdict.iteritems():
w = int(w * 100 / divideby)
if w >= rank_limit_relevance:
reclist.append((j, w))
#sort scores
reclist.sort(lambda x, y: cmp(x[1], y[1]))
if verbose > 0:
voutput += "Number of records sorted: %s " % len(reclist)
voutput += "Sort time: %s " % (str(time.time() - startCreate))
return (reclist, hitset)
def rank_method_stat(rank_method_code, reclist, lwords):
"""Shows some statistics about the searchresult.
rank_method_code - name field from rnkMETHOD
reclist - a list of sorted and ranked records
lwords - the words in the query"""
global voutput
if len(reclist) > 20:
j = 20
else:
j = len(reclist)
voutput += " Rank statistics: "
for i in range(1, j + 1):
voutput += "%s,Recid:%s,Score:%s " % (i,reclist[len(reclist) - i][0],reclist[len(reclist) - i][1])
for (term, table) in lwords:
term_recs = run_sql("""SELECT hitlist FROM %s WHERE term=%%s""" % table, (term,))
if term_recs:
term_recs = deserialize_via_marshal(term_recs[0][0])
if term_recs.has_key(reclist[len(reclist) - i][0]):
voutput += "%s-%s / " % (term, term_recs[reclist[len(reclist) - i][0]])
voutput += " "
voutput += " Score variation: "
count = {}
for i in range(0, len(reclist)):
count[reclist[i][1]] = count.get(reclist[i][1], 0) + 1
i = 100
while i >= 0:
if count.has_key(i):
voutput += "%s-%s " % (i, count[i])
i -= 1
try:
import psyco
psyco.bind(find_similar)
psyco.bind(rank_by_method)
psyco.bind(calculate_record_relevance)
psyco.bind(word_similarity)
psyco.bind(sort_record_relevance)
except StandardError, e:
pass
diff --git a/modules/bibrank/lib/bibrank_tag_based_indexer.py b/modules/bibrank/lib/bibrank_tag_based_indexer.py
index 4db515889..72e03ef75 100644
--- a/modules/bibrank/lib/bibrank_tag_based_indexer.py
+++ b/modules/bibrank/lib/bibrank_tag_based_indexer.py
@@ -1,434 +1,434 @@
# -*- coding: utf-8 -*-
## $Id$
## Ranking of records using different parameters and methods.
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import sys
import time
import marshal
import traceback
import ConfigParser
from invenio.config import \
cdslang, \
- etcdir
+ CFG_ETCDIR
from invenio.search_engine import perform_request_search, HitSet
from invenio.bibrank_citation_indexer import get_citation_weight
from invenio.bibrank_downloads_indexer import *
from invenio.dbquery import run_sql, serialize_via_marshal, deserialize_via_marshal
from invenio.bibtask import task_get_option, write_message
options = {}
def citation_exec(rank_method_code, name, config):
"""Creating the rank method data for citation"""
dict = get_citation_weight(rank_method_code, config)
date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
if dict: intoDB(dict, date, rank_method_code)
else: write_message("no need to update the indexes for citations")
def download_weight_filtering_user(run):
return bibrank_engine(run)
def download_weight_total(run):
return bibrank_engine(run)
def file_similarity_by_times_downloaded(run):
return bibrank_engine(run)
def download_weight_filtering_user_exec (rank_method_code, name, config):
"""Ranking by number of downloads per User.
Only one full Text Download is taken in account for one specific userIP address"""
time1 = time.time()
dic = fromDB(rank_method_code)
last_updated = get_lastupdated(rank_method_code)
keys = new_downloads_to_index(last_updated)
filter_downloads_per_hour(keys, last_updated)
dic = get_download_weight_filtering_user(dic, keys)
date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
intoDB(dic, date, rank_method_code)
time2 = time.time()
return {"time":time2-time1}
def download_weight_total_exec(rank_method_code, name, config):
"""rankink by total number of downloads without check the user ip
if users downloads 3 time the same full text document it has to be count as 3 downloads"""
time1 = time.time()
dic = fromDB(rank_method_code)
last_updated = get_lastupdated(rank_method_code)
keys = new_downloads_to_index(last_updated)
filter_downloads_per_hour(keys, last_updated)
dic = get_download_weight_total(dic, keys)
date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
intoDB(dic, date, rank_method_code)
time2 = time.time()
return {"time":time2-time1}
def file_similarity_by_times_downloaded_exec(rank_method_code, name, config):
"""update dictionnary {recid:[(recid, nb page similarity), ()..]}"""
time1 = time.time()
dic = fromDB(rank_method_code)
last_updated = get_lastupdated(rank_method_code)
keys = new_downloads_to_index(last_updated)
filter_downloads_per_hour(keys, last_updated)
dic = get_file_similarity_by_times_downloaded(dic, keys)
date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
intoDB(dic, date, rank_method_code)
time2 = time.time()
return {"time":time2-time1}
def single_tag_rank_method_exec(rank_method_code, name, config):
"""Creating the rank method data"""
startCreate = time.time()
rnkset = {}
rnkset_old = fromDB(rank_method_code)
date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
rnkset_new = single_tag_rank(config)
rnkset = union_dicts(rnkset_old, rnkset_new)
intoDB(rnkset, date, rank_method_code)
def single_tag_rank(config):
"""Connect the given tag with the data from the kb file given"""
write_message("Loading knowledgebase file", verbose=9)
kb_data = {}
records = []
write_message("Reading knowledgebase file: %s" % config.get(config.get("rank_method", "function"), "kb_src"))
input = open(config.get(config.get("rank_method", "function"), "kb_src"), 'r')
data = input.readlines()
for line in data:
if not line[0:1] == "#":
parts = string.split(string.strip(line), "---")
kb_data[string.strip(parts[0])] = parts[1]
write_message("Number of lines read from knowledgebase file: %s" % len(kb_data))
tag = config.get(config.get("rank_method", "function"), "tag")
tags = config.get(config.get("rank_method", "function"), "check_mandatory_tags").split(", ")
if tags == ['']:
tags = ""
records = []
for (recids, recide) in options["recid_range"]:
write_message("......Processing records #%s-%s" % (recids, recide))
recs = run_sql("SELECT id_bibrec, value FROM bib%sx, bibrec_bib%sx WHERE tag=%%s AND id_bibxxx=id and id_bibrec >=%%s and id_bibrec<=%%s" % (tag[0:2], tag[0:2]), (tag, recids, recide))
valid = HitSet(trailing_bits=1)
valid.discard(0)
for key in tags:
newset = HitSet()
newset += [recid[0] for recid in (run_sql("SELECT id_bibrec FROM bib%sx, bibrec_bib%sx WHERE id_bibxxx=id AND tag=%%s AND id_bibxxx=id and id_bibrec >=%%s and id_bibrec<=%%s" % (tag[0:2], tag[0:2]), (key, recids, recide)))]
valid.intersection_update(newset)
if tags:
recs = filter(lambda x: x[0] in valid, recs)
records = records + list(recs)
write_message("Number of records found with the necessary tags: %s" % len(records))
records = filter(lambda x: x[0] in options["validset"], records)
rnkset = {}
for key, value in records:
if kb_data.has_key(value):
if not rnkset.has_key(key):
rnkset[key] = float(kb_data[value])
else:
if float(kb_data[value]) > rnkset[key]:
rnkset[key] = float(kb_data[value])
else:
rnkset[key] = 0
write_message("Number of records available in rank method: %s" % len(rnkset))
return rnkset
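The knowledge-base file format parsed above (lines of `key---value`, with `#` marking comment lines) can be exercised with a standalone Python 3 sketch of the same parsing; the sample lines are illustrative, not from a real kb file:

```python
def parse_kb_lines(lines):
    """Parse 'key---value' knowledge-base lines; '#' lines are comments."""
    kb = {}
    for line in lines:
        if not line[0:1] == "#":
            parts = line.strip().split("---")
            kb[parts[0].strip()] = parts[1]
    return kb

# illustrative content only:
print(parse_kb_lines(["# journal weights", "Phys. Rev.---0.8", "JHEP---0.9"]))
# → {'Phys. Rev.': '0.8', 'JHEP': '0.9'}
```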
def get_lastupdated(rank_method_code):
"""Get the last time the rank method was updated"""
res = run_sql("SELECT rnkMETHOD.last_updated FROM rnkMETHOD WHERE name=%s", (rank_method_code, ))
if res:
return res[0][0]
else:
raise Exception("Is this the first run? Please do a complete update.")
def intoDB(dic, date, rank_method_code):
"""Insert the rank method data into the database"""
mid = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, ))
del_rank_method_codeDATA(rank_method_code)
serdata = serialize_via_marshal(dic)
midstr = str(mid[0][0])
run_sql("INSERT INTO rnkMETHODDATA(id_rnkMETHOD, relevance_data) VALUES (%s,%s)", (midstr, serdata,))
run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s", (date, rank_method_code))
def fromDB(rank_method_code):
"""Get the data for a rank method"""
id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, ))
res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], ))
if res:
return deserialize_via_marshal(res[0][0])
else:
return {}
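For context, `serialize_via_marshal` and `deserialize_via_marshal` are imported from `invenio.dbquery`. The sketch below shows one plausible round-trip implementation (marshal plus zlib compression for BLOB storage) — an assumption for illustration, not necessarily the actual `dbquery` code:

```python
import marshal
import zlib

def serialize_via_marshal(obj):
    # assumed implementation: marshal the object, then zlib-compress
    # so it fits compactly into a database BLOB column
    return zlib.compress(marshal.dumps(obj))

def deserialize_via_marshal(blob):
    # inverse: decompress, then unmarshal
    return marshal.loads(zlib.decompress(blob))

weights = {1: 0.5, 2: 0.75}
assert deserialize_via_marshal(serialize_via_marshal(weights)) == weights
```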
def del_rank_method_codeDATA(rank_method_code):
"""Delete the data for a rank method"""
id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, ))
res = run_sql("DELETE FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], ))
def del_recids(rank_method_code, range_rec):
"""Delete some records from the rank method"""
id = run_sql("SELECT id from rnkMETHOD where name=%s", (rank_method_code, ))
res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s", (id[0][0], ))
if res:
rec_dict = deserialize_via_marshal(res[0][0])
write_message("Old size: %s" % len(rec_dict))
for (recids, recide) in range_rec:
for i in range(int(recids), int(recide)):
if rec_dict.has_key(i):
del rec_dict[i]
write_message("New size: %s" % len(rec_dict))
date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
intoDB(rec_dict, date, rank_method_code)
else:
write_message("Create before deleting!")
def union_dicts(dict1, dict2):
"Returns union of the two dicts."
union_dict = {}
for (key, value) in dict1.iteritems():
union_dict[key] = value
for (key, value) in dict2.iteritems():
union_dict[key] = value
return union_dict
def rank_method_code_statistics(rank_method_code):
"""Print statistics"""
method = fromDB(rank_method_code)
max = -999999
maxcount = 0
min = 999999
mincount = 0
for (recID, value) in method.iteritems():
if value < min and value > 0:
min = value
if value > max:
max = value
for (recID, value) in method.iteritems():
if value == min:
mincount += 1
if value == max:
maxcount += 1
write_message("Showing statistic for selected method")
write_message("Method name: %s" % getName(rank_method_code))
write_message("Short name: %s" % rank_method_code)
write_message("Last run: %s" % get_lastupdated(rank_method_code))
write_message("Number of records: %s" % len(method))
write_message("Lowest value: %s - Number of records: %s" % (min, mincount))
write_message("Highest value: %s - Number of records: %s" % (max, maxcount))
write_message("Divided into 10 sets:")
for i in range(1, 11):
setcount = 0
distinct_values = {}
lower = -1.0 + ((float(max + 1) / 10)) * (i - 1)
upper = -1.0 + ((float(max + 1) / 10)) * i
for (recID, value) in method.iteritems():
if value >= lower and value <= upper:
setcount += 1
distinct_values[value] = 1
write_message("Set %s (%s-%s) Number of records: %s, distinct values: %s" % (i, lower, upper, setcount, len(distinct_values)))
def check_method(rank_method_code):
write_message("Checking rank method...")
if len(fromDB(rank_method_code)) == 0:
write_message("Rank method not yet executed, please run it to create the necessary data.")
else:
if len(add_recIDs_by_date(rank_method_code)) > 0:
write_message("Records modified, update recommended")
else:
write_message("No records modified, update not necessary")
def bibrank_engine(run):
"""Run the indexing task.
Return 1 in case of success and 0 in case of failure.
"""
try:
import psyco
psyco.bind(single_tag_rank)
psyco.bind(single_tag_rank_method_exec)
psyco.bind(serialize_via_marshal)
psyco.bind(deserialize_via_marshal)
except StandardError, e:
print "Psyco ERROR", e
startCreate = time.time()
sets = {}
try:
options["run"] = []
options["run"].append(run)
for rank_method_code in options["run"]:
cfg_name = getName(rank_method_code)
write_message("Running rank method: %s." % cfg_name)
- file = etcdir + "/bibrank/" + rank_method_code + ".cfg"
+ file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg"
config = ConfigParser.ConfigParser()
try:
config.readfp(open(file))
except StandardError, e:
write_message("Cannot find configuration file: %s" % file, sys.stderr)
raise StandardError
cfg_short = rank_method_code
cfg_function = config.get("rank_method", "function") + "_exec"
cfg_name = getName(cfg_short)
options["validset"] = get_valid_range(rank_method_code)
if task_get_option("collection"):
l_of_colls = string.split(task_get_option("collection"), ", ")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID, recID])
options["recid_range"] = recIDs_range
elif task_get_option("id"):
options["recid_range"] = task_get_option("id")
elif task_get_option("modified"):
options["recid_range"] = add_recIDs_by_date(rank_method_code, task_get_option("modified"))
elif task_get_option("last_updated"):
options["recid_range"] = add_recIDs_by_date(rank_method_code)
else:
write_message("No records specified, updating all", verbose=2)
min_id = run_sql("SELECT min(id) from bibrec")[0][0]
max_id = run_sql("SELECT max(id) from bibrec")[0][0]
options["recid_range"] = [[min_id, max_id]]
if task_get_option("quick") == "no":
write_message("Recalculate parameter not used, parameter ignored.", verbose=9)
if task_get_option("cmd") == "del":
del_recids(cfg_short, options["recid_range"])
elif task_get_option("cmd") == "add":
func_object = globals().get(cfg_function)
func_object(rank_method_code, cfg_name, config)
elif task_get_option("cmd") == "stat":
rank_method_code_statistics(rank_method_code)
elif task_get_option("cmd") == "check":
check_method(rank_method_code)
elif task_get_option("cmd") == "repair":
pass
else:
write_message("Invalid command found processing %s" % rank_method_code, sys.stderr)
raise StandardError
except StandardError, e:
write_message("\nException caught: %s" % e, sys.stderr)
if task_get_option("verbose") >= 9:
traceback.print_tb(sys.exc_info()[2])
raise StandardError
if task_get_option("verbose"):
showtime((time.time() - startCreate))
return 1
def get_valid_range(rank_method_code):
"""Return a range of records"""
write_message("Getting records from collections enabled for rank method.", verbose=9)
res = run_sql("SELECT collection.name FROM collection, collection_rnkMETHOD, rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name=%s", (rank_method_code, ))
l_of_colls = []
for coll in res:
l_of_colls.append(coll[0])
if len(l_of_colls) > 0:
recIDs = perform_request_search(c=l_of_colls)
else:
recIDs = []
valid = HitSet()
valid += recIDs
return valid
def add_recIDs_by_date(rank_method_code, dates=""):
"""Return recID range from records modified between DATES[0] and DATES[1].
If DATES is not set, then add records modified since the last run of
the ranking method RANK_METHOD_CODE.
"""
if not dates:
try:
dates = (get_lastupdated(rank_method_code), '')
except Exception, e:
dates = ("0000-00-00 00:00:00", '')
if dates[0] is None:
dates = ("0000-00-00 00:00:00", '')
query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s"""
if dates[1]:
query += " and b.modification_date <= %s"
query += " ORDER BY b.id ASC"
if dates[1]:
res = run_sql(query, (dates[0], dates[1]))
else:
res = run_sql(query, (dates[0], ))
list = create_range_list(res)
if not list:
write_message("No new records added since last time method was run")
return list
def getName(rank_method_code, ln=cdslang, type='ln'):
"""Returns the name of the method if it exists"""
try:
rnkid = run_sql("SELECT id FROM rnkMETHOD where name=%s", (rank_method_code, ))
if rnkid:
rnkid = str(rnkid[0][0])
res = run_sql("SELECT value FROM rnkMETHODNAME where type=%s and ln=%s and id_rnkMETHOD=%s", (type, ln, rnkid))
if not res:
res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln=%s and id_rnkMETHOD=%s and type=%s", (cdslang, rnkid, type))
if not res:
return rank_method_code
return res[0][0]
else:
raise Exception
except Exception, e:
write_message("Cannot run rank method: either the given method code is wrong, or the method has not been added using the web interface.")
raise Exception
def create_range_list(res):
"""Creates a range list from a recID select query result contained
in res. The result is expected to have ascending numerical order."""
if not res:
return []
row = res[0]
if not row:
return []
else:
range_list = [[row[0], row[0]]]
for row in res[1:]:
id = row[0]
if id == range_list[-1][1] + 1:
range_list[-1][1] = id
else:
range_list.append([id, id])
return range_list
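The same compression logic, restated over plain integers (the module version consumes one-column SQL result rows), makes the behavior easy to check:

```python
def create_range_list(ids):
    """Compress an ascending list of recIDs into inclusive [low, high] ranges."""
    range_list = []
    for i in ids:
        # extend the last range if this id is contiguous, else open a new one
        if range_list and i == range_list[-1][1] + 1:
            range_list[-1][1] = i
        else:
            range_list.append([i, i])
    return range_list

print(create_range_list([1, 2, 3, 7, 8, 10]))  # → [[1, 3], [7, 8], [10, 10]]
```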
def single_tag_rank_method(run):
return bibrank_engine(run)
def showtime(timeused):
"""Show time used for method"""
write_message("Time used: %d second(s)." % timeused, verbose=9)
def citation(run):
return bibrank_engine(run)
diff --git a/modules/bibrank/lib/bibrank_word_indexer.py b/modules/bibrank/lib/bibrank_word_indexer.py
index 2802993a1..a195c8973 100644
--- a/modules/bibrank/lib/bibrank_word_indexer.py
+++ b/modules/bibrank/lib/bibrank_word_indexer.py
@@ -1,1186 +1,1186 @@
## $Id$
## BibRank word frequency indexer utility.
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import sys
import time
import urllib
import traceback
import math
import re
import ConfigParser
from invenio.config import \
cdslang, \
- etcdir
+ CFG_ETCDIR
from invenio.search_engine import perform_request_search, strip_accents, wash_index_term
from invenio.dbquery import run_sql, DatabaseError, serialize_via_marshal, deserialize_via_marshal
from invenio.bibindex_engine_stemmer import is_stemmer_available_for_language, stem
from invenio.bibindex_engine_stopwords import is_stopword
from invenio.bibindex_engine import beautify_range_list, \
kill_sleepy_mysql_threads, create_range_list
from invenio.bibtask import write_message, task_get_option, task_update_progress, \
task_update_status
from invenio.intbitset import intbitset
options = {} # global variable to hold task options
## safety parameters concerning DB thread-multiplication problem:
CFG_CHECK_MYSQL_THREADS = 0 # to check or not to check the problem?
CFG_MAX_MYSQL_THREADS = 50 # how many threads (connections) we consider as still safe
CFG_MYSQL_THREAD_TIMEOUT = 20 # we'll kill threads that were sleeping for more than X seconds
## override urllib's default password-asking behaviour:
class MyFancyURLopener(urllib.FancyURLopener):
def prompt_user_passwd(self, host, realm):
# supply some dummy credentials by default
return ("mysuperuser", "mysuperpass")
def http_error_401(self, url, fp, errcode, errmsg, headers):
# do not bother with protected pages
raise IOError, (999, 'unauthorized access')
#urllib._urlopener = MyFancyURLopener()
nb_char_in_line = 50 # for verbose pretty printing
chunksize = 1000 # default number of records to treat per chunk
base_process_size = 4500 # process base size
## Dictionary merging functions
def dict_union(list1, list2):
"Returns union of the two dictionaries."
union_dict = {}
for (e, count) in list1.iteritems():
union_dict[e] = count
for (e, count) in list2.iteritems():
if not union_dict.has_key(e):
union_dict[e] = count
else:
union_dict[e] = (union_dict[e][0] + count[0], count[1])
#for (e, count) in list2.iteritems():
# list1[e] = (list1.get(e, (0, 0))[0] + count[0], count[1])
#return list1
return union_dict
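Restated in Python 3 for quick testing, the merge semantics above are: weights add for terms present in both dicts, and the flag of the second dict wins:

```python
def dict_union(a, b):
    """Merge two {term: (weight, flag)} dicts: weights accumulate for
    shared terms, and the second dict's flag wins."""
    union = dict(a)
    for term, (weight, flag) in b.items():
        if term in union:
            union[term] = (union[term][0] + weight, flag)
        else:
            union[term] = (weight, flag)
    return union

merged = dict_union({"cern": (2, 0)}, {"cern": (3, 0), "lhc": (1, 0)})
# merged == {"cern": (5, 0), "lhc": (1, 0)}
```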
# tagToFunctions mapping. It offers an indirection level necessary for
# indexing fulltext. The default is get_words_from_phrase
tagToWordsFunctions = {}
def get_words_from_phrase(phrase, weight, lang="",
chars_punctuation=r"[\.\,\:\;\?\!\"]",
chars_alphanumericseparators=r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]",
split=str.split):
"Returns list of words from phrase 'phrase'."
words = {}
phrase = strip_accents(phrase)
phrase = phrase.lower()
#Getting rid of strange characters
phrase = re.sub("&eacute;", 'e', phrase)
phrase = re.sub("&egrave;", 'e', phrase)
phrase = re.sub("&agrave;", 'a', phrase)
phrase = re.sub("&nbsp;", ' ', phrase)
phrase = re.sub("&laquo;", ' ', phrase)
phrase = re.sub("&raquo;", ' ', phrase)
phrase = re.sub("&ecirc;", ' ', phrase)
phrase = re.sub("&amp;", ' ', phrase)
if phrase.find("</") > -1:
#Most likely html, remove html code
phrase = re.sub("(?s)<[^>]*>|&#?\w+;", ' ', phrase)
#removes http links
phrase = re.sub("(?s)http://[^( )]*", '', phrase)
phrase = re.sub(chars_punctuation, ' ', phrase)
#With the splitting below, single characters standing alone (e.g. "c", "a", "b") are not added to the index, but combinations such as "c++" or "c$" are.
for word in split(phrase):
if options["remove_stopword"] == "True" and not is_stopword(word, 1) and check_term(word, 0):
if lang and lang !="none" and options["use_stemming"]:
word = stem(word, lang)
if not words.has_key(word):
words[word] = (0, 0)
else:
if not words.has_key(word):
words[word] = (0, 0)
words[word] = (words[word][0] + weight, 0)
elif options["remove_stopword"] == "True" and not is_stopword(word, 1):
phrase = re.sub(chars_alphanumericseparators, ' ', word)
for word_ in split(phrase):
if lang and lang !="none" and options["use_stemming"]:
word_ = stem(word_, lang)
if word_:
if not words.has_key(word_):
words[word_] = (0,0)
words[word_] = (words[word_][0] + weight, 0)
return words
class WordTable:
"A class to hold the words table."
def __init__(self, tablename, fields_to_index, separators="[^\s]"):
"Creates words table instance."
self.tablename = tablename
self.recIDs_in_mem = []
self.fields_to_index = fields_to_index
self.separators = separators
self.value = {}
def get_field(self, recID, tag):
"""Returns list of values of the MARC-21 'tag' fields for the
record 'recID'."""
out = []
bibXXx = "bib" + tag[0] + tag[1] + "x"
bibrec_bibXXx = "bibrec_" + bibXXx
query = """SELECT value FROM %s AS b, %s AS bb
WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id
AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag);
res = run_sql(query)
for row in res:
out.append(row[0])
return out
def clean(self):
"Cleans the words table."
self.value={}
def put_into_db(self, mode="normal"):
"""Updates the current words table in the corresponding DB
rnkWORD table. Mode 'normal' means normal execution,
mode 'emergency' means the words index is being reverted to its old state.
"""
write_message("%s %s wordtable flush started" % (self.tablename,mode))
write_message('...updating %d words into %sR started' % \
(len(self.value), self.tablename[:-1]))
task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value)))
self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem)
if mode == "normal":
for group in self.recIDs_in_mem:
query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
nb_words_total = len(self.value)
nb_words_report = int(nb_words_total/10)
nb_words_done = 0
for word in self.value.keys():
self.put_word_into_db(word, self.value[word])
nb_words_done += 1
if nb_words_report!=0 and ((nb_words_done % nb_words_report) == 0):
write_message('......processed %d/%d words' % (nb_words_done, nb_words_total))
task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total))
write_message('...updating %d words into %s ended' % \
(nb_words_total, self.tablename), verbose=9)
#if options["verbose"]:
# write_message('...updating reverse table %sR started' % self.tablename[:-1])
if mode == "normal":
for group in self.recIDs_in_mem:
query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
query = """DELETE FROM %sR WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
write_message('End of updating wordTable into %s' % self.tablename, verbose=9)
elif mode == "emergency":
write_message("emergency")
for group in self.recIDs_in_mem:
query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
query = """DELETE FROM %sR WHERE id_bibrec
BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \
(self.tablename[:-1], group[0], group[1])
write_message(query, verbose=9)
run_sql(query)
write_message('End of emergency flushing wordTable into %s' % self.tablename, verbose=9)
#if options["verbose"]:
# write_message('...updating reverse table %sR ended' % self.tablename[:-1])
self.clean()
self.recIDs_in_mem = []
write_message("%s %s wordtable flush ended" % (self.tablename, mode))
task_update_progress("%s flush ended" % (self.tablename))
def load_old_recIDs(self,word):
"""Load existing hitlist for the word from the database index files."""
query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename
res = run_sql(query, (word,))
if res:
return deserialize_via_marshal(res[0][0])
else:
return None
def merge_with_old_recIDs(self,word,recIDs, set):
"""Merge the recIDs stored in memory (hash of recIDs whose value[0] is > -1,
to add or update them, or -1, to delete them) with those stored in the
database index and received in SET, the universe of recIDs for the given word.
Return 0 in case no change was done to SET, return 1 in case SET was changed.
"""
set_changed_p = 0
for recID,sign in recIDs.iteritems():
if sign[0] == -1 and set.has_key(recID):
# delete recID if existent in set and if marked as to be deleted
del set[recID]
set_changed_p = 1
elif sign[0] > -1 and not set.has_key(recID):
# add recID if not existent in set and if marked as to be added
set[recID] = sign
set_changed_p = 1
elif sign[0] > -1 and sign[0] != set[recID][0]:
set[recID] = sign
set_changed_p = 1
return set_changed_p
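A standalone Python 3 restatement of the merge rules above, handy for testing outside the database layer (function and variable names are mine):

```python
def merge_with_old_recIDs(mem, stored):
    """Apply in-memory (sign, flag) entries to a stored hitlist dict:
    sign[0] == -1 deletes the recID, sign[0] > -1 adds or updates it.
    Return 1 if the stored hitlist changed, 0 otherwise."""
    changed = 0
    for recID, sign in mem.items():
        if sign[0] == -1 and recID in stored:
            # marked for deletion and present: drop it
            del stored[recID]
            changed = 1
        elif sign[0] > -1 and recID not in stored:
            # marked for addition and absent: add it
            stored[recID] = sign
            changed = 1
        elif sign[0] > -1 and sign[0] != stored[recID][0]:
            # present but with a different weight: update it
            stored[recID] = sign
            changed = 1
    return changed

hitlist = {10: (1, 0), 11: (2, 0)}
merge_with_old_recIDs({10: (-1, 0), 12: (3, 0)}, hitlist)
# hitlist == {11: (2, 0), 12: (3, 0)}
```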
def put_word_into_db(self, word, recIDs, split=str.split):
"""Flush a single word to the database and delete it from memory"""
set = self.load_old_recIDs(word)
#write_message("%s %s" % (word, self.value[word]))
if set: # merge the word recIDs found in memory:
options["modified_words"][word] = 1
if self.merge_with_old_recIDs(word, recIDs, set) == 0:
# nothing to update:
write_message("......... unchanged hitlist for ``%s''" % word, verbose=9)
pass
else:
# yes there were some new words:
write_message("......... updating hitlist for ``%s''" % word, verbose=9)
run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename,
(serialize_via_marshal(set), word))
else: # the word is new, will create new set:
write_message("......... inserting hitlist for ``%s''" % word, verbose=9)
set = self.value[word]
if len(set) > 0:
#new word, add to list
options["modified_words"][word] = 1
run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename,
(word, serialize_via_marshal(set)))
if not set: # never store empty words
run_sql("DELETE from %s WHERE term=%%s" % self.tablename,
(word,))
del self.value[word]
def display(self):
"Displays the word table."
keys = self.value.keys()
keys.sort()
for k in keys:
write_message("%s: %s" % (k, self.value[k]))
def count(self):
"Returns the number of words in the table."
return len(self.value)
def info(self):
"Prints some information on the words table."
write_message("The words table contains %d words." % self.count())
def lookup_words(self, word=""):
"Lookup word from the words table."
if not word:
done = 0
while not done:
try:
word = raw_input("Enter word: ")
done = 1
except (EOFError, KeyboardInterrupt):
return
if self.value.has_key(word):
write_message("The word '%s' is found %d times." \
% (word, len(self.value[word])))
else:
write_message("The word '%s' does not exist in the word file."\
% word)
def update_last_updated(self, rank_method_code, starting_time=None):
"""Update last_updated column of the index table in the database.
Puts starting time there so that if the task was interrupted for record download,
the records will be reindexed next time."""
if starting_time is None:
return None
write_message("updating last_updated to %s..." % starting_time, verbose=9)
return run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s",
(starting_time, rank_method_code,))
def add_recIDs(self, recIDs):
"""Fetches the records whose id is in the recIDs range list and adds
them to the wordTable. The recIDs range list is of the form:
[[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
"""
global chunksize
flush_count = 0
records_done = 0
records_to_go = 0
for range in recIDs:
records_to_go = records_to_go + range[1] - range[0] + 1
time_started = time.time() # will measure profile time
for range in recIDs:
i_low = range[0]
chunksize_count = 0
while i_low <= range[1]:
# calculate chunk group of recIDs and treat it:
i_high = min(i_low+task_get_option("flush")-flush_count-1,range[1])
i_high = min(i_low+chunksize-chunksize_count-1, i_high)
try:
self.chk_recID_range(i_low, i_high)
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option('verbose') >= 9:
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
sys.exit(1)
write_message("%s adding records #%d-#%d started" % \
(self.tablename, i_low, i_high))
if CFG_CHECK_MYSQL_THREADS:
kill_sleepy_mysql_threads()
task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high))
self.del_recID_range(i_low, i_high)
just_processed = self.add_recID_range(i_low, i_high)
flush_count = flush_count + i_high - i_low + 1
chunksize_count = chunksize_count + i_high - i_low + 1
records_done = records_done + just_processed
write_message("%s adding records #%d-#%d ended " % \
(self.tablename, i_low, i_high))
if chunksize_count >= chunksize:
chunksize_count = 0
# flush if necessary:
if flush_count >= task_get_option("flush"):
self.put_into_db()
self.clean()
write_message("%s backing up" % (self.tablename))
flush_count = 0
self.log_progress(time_started,records_done,records_to_go)
# iterate:
i_low = i_high + 1
if flush_count > 0:
self.put_into_db()
self.log_progress(time_started,records_done,records_to_go)
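Stripped of the flush bookkeeping and DB calls, the chunk arithmetic in the loop above reduces to this standalone sketch:

```python
def iter_chunks(low, high, chunksize):
    """Yield inclusive (i_low, i_high) chunk bounds covering low..high."""
    i_low = low
    while i_low <= high:
        # cap the chunk at chunksize records or at the end of the range
        i_high = min(i_low + chunksize - 1, high)
        yield (i_low, i_high)
        i_low = i_high + 1

print(list(iter_chunks(1, 10, 4)))  # → [(1, 4), (5, 8), (9, 10)]
```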
def add_recIDs_by_date(self, dates=""):
"""Add recIDs modified between DATES[0] and DATES[1].
If DATES is not set, then add records modified since the last run of
the ranking method.
"""
if not dates:
write_message("Using the last update time for the rank method")
query = """SELECT last_updated FROM rnkMETHOD WHERE name='%s'
""" % options["current_run"]
res = run_sql(query)
if not res:
return
if not res[0][0]:
dates = ("0000-00-00",'')
else:
dates = (res[0][0],'')
query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >=
'%s'""" % dates[0]
if dates[1]:
query += "and b.modification_date <= '%s'" % dates[1]
query += " ORDER BY b.id ASC"
res = run_sql(query)
list = create_range_list(res)
if not list:
write_message( "No new records added. %s is up to date" % self.tablename)
else:
self.add_recIDs(list)
return list
def add_recID_range(self, recID1, recID2):
"""Add records from RECID1 to RECID2."""
wlist = {}
normalize = {}
self.recIDs_in_mem.append([recID1,recID2])
# secondly fetch all needed tags:
for (tag, weight, lang) in self.fields_to_index:
if tag in tagToWordsFunctions.keys():
get_words_function = tagToWordsFunctions[tag]
else:
get_words_function = get_words_from_phrase
bibXXx = "bib" + tag[0] + tag[1] + "x"
bibrec_bibXXx = "bibrec_" + bibXXx
query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb
WHERE bb.id_bibrec BETWEEN %d AND %d
AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag)
res = run_sql(query)
nb_total_to_read = len(res)
verbose_idx = 0 # for verbose pretty printing
for row in res:
recID, phrase = row
if recID in options["validset"]:
if not wlist.has_key(recID): wlist[recID] = {}
new_words = get_words_function(phrase, weight, lang) # ,self.separators
wlist[recID] = dict_union(new_words,wlist[recID])
# were there some words for these recIDs found?
if len(wlist) == 0: return 0
recIDs = wlist.keys()
for recID in recIDs:
# was this record marked as deleted?
if "DELETED" in self.get_field(recID, "980__c"):
wlist[recID] = {}
write_message("... record %d was declared deleted, removing its word list" % recID, verbose=9)
write_message("... record %d, termlist: %s" % (recID, wlist[recID]), verbose=9)
# put words into reverse index table with FUTURE status:
for recID in recIDs:
run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'FUTURE')" % self.tablename[:-1],
(recID, serialize_via_marshal(wlist[recID])))
# ... and, for new records, enter the CURRENT status as empty:
try:
run_sql("INSERT INTO %sR (id_bibrec,termlist,type) VALUES (%%s,%%s,'CURRENT')" % self.tablename[:-1],
(recID, serialize_via_marshal([])))
except DatabaseError:
# okay, it's an already existing record, no problem
pass
# put words into memory word list:
put = self.put
for recID in recIDs:
for (w, count) in wlist[recID].iteritems():
put(recID, w, count)
return len(recIDs)
def log_progress(self, start, done, todo):
"""Calculate progress and store it.
start: start time,
done: records processed,
todo: total number of records"""
time_elapsed = time.time() - start
# consistency check
if time_elapsed == 0 or done > todo:
return
time_recs_per_min = done/(time_elapsed/60.0)
write_message("%d records took %.1f seconds to complete. (%.1f recs/min)"\
% (done, time_elapsed, time_recs_per_min))
if time_recs_per_min:
write_message("Estimated runtime: %.1f minutes" % \
((todo-done)/time_recs_per_min))
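The rate and ETA arithmetic in `log_progress` can be isolated into a small pure function for illustration (the function name and tuple return are mine, not part of the module):

```python
def estimate_progress(start, now, done, todo):
    """Return (recs_per_min, eta_minutes) using the same arithmetic as
    log_progress, or None when the numbers are inconsistent."""
    elapsed = now - start
    # consistency check, as in log_progress
    if elapsed == 0 or done > todo:
        return None
    recs_per_min = done / (elapsed / 60.0)
    eta_minutes = (todo - done) / recs_per_min
    return (recs_per_min, eta_minutes)

print(estimate_progress(0, 60, 100, 400))  # → (100.0, 3.0)
```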
def put(self, recID, word, sign):
"Adds/deletes a word to the word list."
try:
word = wash_index_term(word)
if self.value.has_key(word):
# the word 'word' exist already: update sign
self.value[word][recID] = sign
# PROBLEM ?
else:
self.value[word] = {recID: sign}
except:
write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID))
def del_recIDs(self, recIDs):
"""Fetches the records whose id is in the recIDs range list and deletes
them to the wordTable. The recIDs range list is of the form:
[[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
"""
count = 0
for range in recIDs:
self.del_recID_range(range[0],range[1])
count = count + range[1] - range[0]
self.put_into_db()
def del_recID_range(self, low, high):
"""Deletes records with 'recID' system number between low
and high from memory words index table."""
write_message("%s fetching existing words for records #%d-#%d started" % \
(self.tablename, low, high), verbose=3)
self.recIDs_in_mem.append([low,high])
query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec
BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high)
recID_rows = run_sql(query)
for recID_row in recID_rows:
recID = recID_row[0]
wlist = deserialize_via_marshal(recID_row[1])
for word in wlist:
self.put(recID, word, (-1, 0))
write_message("%s fetching existing words for records #%d-#%d ended" % \
(self.tablename, low, high), verbose=3)
def report_on_table_consistency(self):
"""Check reverse words index tables (e.g. rnkWORD01R) for
interesting states such as 'TEMPORARY' state.
Prints small report (no of words, no of bad words).
"""
# find number of words:
query = """SELECT COUNT(*) FROM %s""" % (self.tablename)
res = run_sql(query, None, 1)
if res:
nb_words = res[0][0]
else:
nb_words = 0
# find number of records:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1])
res = run_sql(query, None, 1)
if res:
nb_records = res[0][0]
else:
nb_records = 0
# report stats:
write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records))
# find possible bad states in reverse tables:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1])
res = run_sql(query)
if res:
nb_bad_records = res[0][0]
else:
nb_bad_records = 999999999
if nb_bad_records:
write_message("EMERGENCY: %s needs to repair %d of %d records" % \
(self.tablename, nb_bad_records, nb_records))
else:
write_message("%s is in consistent state" % (self.tablename))
return nb_bad_records
def repair(self):
"""Repair the whole table"""
# find possible bad states in reverse tables:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1])
res = run_sql(query, None, 1)
if res:
nb_bad_records = res[0][0]
else:
nb_bad_records = 0
# find number of records:
query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1])
res = run_sql(query)
if res:
nb_records = res[0][0]
else:
nb_records = 0
if nb_bad_records == 0:
return
query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \
% (self.tablename[:-1])
res = run_sql(query)
recIDs = create_range_list(res)
flush_count = 0
records_done = 0
records_to_go = 0
for id_range in recIDs:
records_to_go = records_to_go + id_range[1] - id_range[0] + 1
time_started = time.time() # will measure profile time
for id_range in recIDs:
i_low = id_range[0]
chunksize_count = 0
while i_low <= id_range[1]:
# calculate chunk group of recIDs and treat it:
i_high = min(i_low+task_get_option("flush")-flush_count-1, id_range[1])
i_high = min(i_low+chunksize-chunksize_count-1, i_high)
try:
self.fix_recID_range(i_low, i_high)
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option("verbose") >= 9:
traceback.print_tb(sys.exc_info()[2])
task_update_status("ERROR")
sys.exit(1)
flush_count = flush_count + i_high - i_low + 1
chunksize_count = chunksize_count + i_high - i_low + 1
records_done = records_done + i_high - i_low + 1
if chunksize_count >= chunksize:
chunksize_count = 0
# flush if necessary:
if flush_count >= task_get_option("flush"):
self.put_into_db("emergency")
self.clean()
flush_count = 0
self.log_progress(time_started,records_done,records_to_go)
# iterate:
i_low = i_high + 1
if flush_count > 0:
self.put_into_db("emergency")
self.log_progress(time_started,records_done,records_to_go)
write_message("%s inconsistencies repaired." % self.tablename)
def chk_recID_range(self, low, high):
"""Check if the reverse index table is in proper state"""
## check db
query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT'
AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high)
res = run_sql(query, None, 1)
if res[0][0]==0:
write_message("%s for %d-%d is in consistent state"%(self.tablename,low,high))
return # okay, words table is consistent
## inconsistency detected!
write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename)
write_message("""EMERGENCY: Errors found. You should check consistency of the %s - %sR tables.\nRunning 'bibindex --repair' is recommended.""" \
% (self.tablename, self.tablename[:-1]))
raise StandardError
def fix_recID_range(self, low, high):
"""Try to fix reverse index database consistency (e.g. table rnkWORD01R) in the low,high doc-id range.
Possible states for a recID follow:
CUR TMP FUT: very bad things have happened: warn!
CUR TMP : very bad things have happened: warn!
CUR FUT: delete FUT (crash before flushing)
CUR : database is ok
TMP FUT: add TMP to memory and del FUT from memory
flush (revert to old state)
TMP : very bad things have happened: warn!
FUT: very bad things have happened: warn!
"""
state = {}
query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\
% (self.tablename[:-1], low, high)
res = run_sql(query)
for row in res:
if not state.has_key(row[0]):
state[row[0]]=[]
state[row[0]].append(row[1])
ok = 1 # will hold info on whether we will be able to repair
for recID in state.keys():
if not 'TEMPORARY' in state[recID]:
if 'FUTURE' in state[recID]:
if 'CURRENT' not in state[recID]:
write_message("EMERGENCY: Record %d is in inconsistent state. Can't repair it" % recID)
ok = 0
else:
write_message("EMERGENCY: Inconsistency in record %d detected" % recID)
query = """DELETE FROM %sR
WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID)
run_sql(query)
write_message("EMERGENCY: Inconsistency in record %d repaired." % recID)
else:
if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]:
self.recIDs_in_mem.append([recID,recID])
# Get the words file
query = """SELECT type,termlist FROM %sR
WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID)
write_message(query, verbose=9)
res = run_sql(query)
for row in res:
wlist = deserialize_via_marshal(row[1])
write_message("Words are %s " % wlist, verbose=9)
if row[0] == 'TEMPORARY':
sign = 1
else:
sign = -1
for word in wlist:
self.put(recID, word, wlist[word])
else:
write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID))
ok = 0
if not ok:
write_message("""EMERGENCY: Unrepairable errors found. You should check consistency
of the %s - %sR tables. Deleting affected records is
recommended.""" % (self.tablename, self.tablename[:-1]))
raise StandardError
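The decision table in the `fix_recID_range()` docstring above can be summarized as a tiny standalone sketch (the helper function and its return strings are illustrative only, not part of the original code):

```python
def repair_action(states):
    """Map the set of row types found for one recID to the repair action
    described in the fix_recID_range() docstring."""
    s = frozenset(states)
    if s == frozenset(['CURRENT']):
        return 'ok'
    if s == frozenset(['CURRENT', 'FUTURE']):
        return 'delete FUTURE'        # crash happened before flushing
    if s == frozenset(['TEMPORARY', 'FUTURE']):
        return 'revert to old state'  # replay TMP, drop FUT from memory
    return 'warn'                     # CUR+TMP+FUT, CUR+TMP, TMP or FUT alone

print(repair_action(['CURRENT', 'FUTURE']))
```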
def word_index(run):
"""Run the indexing task. The row argument is the BibSched task
queue row, containing if, arguments, etc.
Return 1 in case of success and 0 in case of failure.
"""
## import optional modules:
try:
import psyco
psyco.bind(get_words_from_phrase)
psyco.bind(WordTable.merge_with_old_recIDs)
psyco.bind(update_rnkWORD)
psyco.bind(check_rnkWORD)
except StandardError,e:
print "Warning: Psyco", e
pass
global languages
max_recid = int(run_sql("SELECT max(id) FROM bibrec")[0][0])
options["run"] = []
options["run"].append(run)
for rank_method_code in options["run"]:
method_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
write_message("Running rank method: %s" % getName(rank_method_code))
try:
- file = etcdir + "/bibrank/" + rank_method_code + ".cfg"
+ file = CFG_ETCDIR + "/bibrank/" + rank_method_code + ".cfg"
config = ConfigParser.ConfigParser()
config.readfp(open(file))
except StandardError, e:
write_message("Cannot find configurationfile: %s" % file, sys.stderr)
raise StandardError
options["current_run"] = rank_method_code
options["modified_words"] = {}
options["table"] = config.get(config.get("rank_method", "function"), "table")
options["use_stemming"] = config.get(config.get("rank_method","function"),"stemming")
options["remove_stopword"] = config.get(config.get("rank_method","function"),"stopword")
tags = get_tags(config) #get the tags to include
options["validset"] = get_valid_range(rank_method_code) #get the records from the collections the method is enabled for
function = config.get("rank_method","function")
wordTable = WordTable(options["table"], tags)
wordTable.report_on_table_consistency()
try:
if task_get_option("cmd") == "del":
if task_get_option("id"):
wordTable.del_recIDs(task_get_option("id"))
elif task_get_option("collection"):
l_of_colls = task_get_option("collection").split(",")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID,recID])
wordTable.del_recIDs(recIDs_range)
else:
write_message("Missing IDs of records to delete from index %s.", wordTable.tablename,
sys.stderr)
raise StandardError
elif task_get_option("cmd") == "add":
if task_get_option("id"):
wordTable.add_recIDs(task_get_option("id"))
elif task_get_option("collection"):
l_of_colls = task_get_option("collection").split(",")
recIDs = perform_request_search(c=l_of_colls)
recIDs_range = []
for recID in recIDs:
recIDs_range.append([recID,recID])
wordTable.add_recIDs(recIDs_range)
elif task_get_option("last_updated"):
wordTable.add_recIDs_by_date("")
# only update last_updated if run via automatic mode:
wordTable.update_last_updated(rank_method_code, method_starting_time)
elif task_get_option("modified"):
wordTable.add_recIDs_by_date(task_get_option("modified"))
else:
wordTable.add_recIDs([[0,max_recid]])
elif task_get_option("cmd") == "repair":
wordTable.repair()
check_rnkWORD(options["table"])
elif task_get_option("cmd") == "check":
check_rnkWORD(options["table"])
options["modified_words"] = {}
elif task_get_option("cmd") == "stat":
rank_method_code_statistics(options["table"])
else:
write_message("Invalid command found processing %s" % \
wordTable.tablename, sys.stderr)
raise StandardError
update_rnkWORD(options["table"], options["modified_words"])
except StandardError, e:
write_message("Exception caught: %s" % e, sys.stderr)
if task_get_option("verbose") >= 9:
traceback.print_tb(sys.exc_info()[2])
sys.exit(1)
wordTable.report_on_table_consistency()
# We are done. State it in the database, close and quit
return 1
def get_tags(config):
"""Get the tags that should be used creating the index and each tag's parameter"""
tags = []
function = config.get("rank_method","function")
i = 1
shown_error = 0
#try:
if 1:
while config.has_option(function,"tag%s"% i):
tag = config.get(function, "tag%s" % i)
tag = tag.split(",")
tag[1] = int(tag[1].strip())
tag[2] = tag[2].strip()
#check if stemmer for language is available
if config.get(function,"stemming") and stem("information", "english") != "inform":
if shown_error == 0:
write_message("Warning: Stemming not working. Please check it out!")
shown_error = 1
elif tag[2] and tag[2] != "none" and config.get(function,"stemming") and not is_stemmer_available_for_language(tag[2]):
write_message("Warning: Stemming not available for language '%s'." % tag[2])
tags.append(tag)
i += 1
#except Exception:
# write_message("Could not read data from configuration file, please check for errors")
# raise StandardError
return tags
def get_valid_range(rank_method_code):
"""Returns which records are valid for this rank method, according to which collections it is enabled for."""
#if options["verbose"] >=9:
# write_message("Getting records from collections enabled for rank method.")
#res = run_sql("SELECT collection.name FROM collection,collection_rnkMETHOD,rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name='%s'" % rank_method_code)
#l_of_colls = []
#for coll in res:
# l_of_colls.append(coll[0])
#if len(l_of_colls) > 0:
# recIDs = perform_request_search(c=l_of_colls)
#else:
# recIDs = []
valid = intbitset(trailing_bits=1)
valid.discard(0)
#valid.addlist(recIDs)
return valid
def check_term(term, termlength):
"""Check if term contains not allowed characters, or for any other reasons for not using this term."""
try:
if len(term) <= termlength:
return False
reg = re.compile(r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]")
if re.search(reg, term):
return False
term = str.replace(term, "-", "")
term = str.replace(term, ".", "")
term = str.replace(term, ",", "")
if int(term):
return False
except StandardError, e:
pass
return True
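The rule implemented by `check_term()` above (reject short terms and terms containing digits or punctuation) can be sketched in isolation like this; `is_indexable_term` and its simplifications are ours, not part of the original module:

```python
import re

# Same character class as check_term() above: digits plus ASCII punctuation.
_BAD_CHARS = re.compile(r"[0-9!\"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]")

def is_indexable_term(term, min_length=2):
    """Return True if term is longer than min_length and contains no
    digits or punctuation (a simplified check_term)."""
    if len(term) <= min_length:
        return False
    return not _BAD_CHARS.search(term)

print(is_indexable_term("ranking"))  # True
print(is_indexable_term("hep-th"))   # False: contains '-'
```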
def check_rnkWORD(table):
"""Checks for any problems in rnkWORD tables."""
i = 0
errors = {}
termslist = run_sql("SELECT term FROM %s" % table)
N = run_sql("select max(id_bibrec) from %sR" % table[:-1])[0][0]
write_message("Checking integrity of rank values in %s" % table)
terms = map(lambda x: x[0], termslist)
while i < len(terms):
query_params = ()
for j in range(i, ((i+5000)< len(terms) and (i+5000) or len(terms))):
query_params += (terms[j],)
terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term IN (%s)" % (table, (len(query_params)*"%s,")[:-1]),
query_params)
for (t, hitlist) in terms_docs:
term_docs = deserialize_via_marshal(hitlist)
if (term_docs.has_key("Gi") and term_docs["Gi"][1] == 0) or not term_docs.has_key("Gi"):
write_message("ERROR: Missing value for term: %s (%s) in %s: %s" % (t, repr(t), table, len(term_docs)))
errors[t] = 1
i += 5000
write_message("Checking integrity of rank values in %sR" % table[:-1])
i = 0
while i < N:
docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec>=%s and id_bibrec<=%s" % (table[:-1], i, i+5000))
for (j, termlist) in docs_terms:
termlist = deserialize_via_marshal(termlist)
for (t, tf) in termlist.iteritems():
if tf[1] == 0 and not errors.has_key(t):
errors[t] = 1
write_message("ERROR: Gi missing for record %s and term: %s (%s) in %s" % (j,t,repr(t), table))
terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term=%%s" % table, (t,))
termlist = deserialize_via_marshal(terms_docs[0][1])
i += 5000
if len(errors) == 0:
write_message("No direct errors found, but nonconsistent data may exist.")
else:
write_message("%s errors found during integrity check, repair and rebalancing recommended." % len(errors))
options["modified_words"] = errors
def rank_method_code_statistics(table):
"""Shows some statistics about this rank method."""
maxID = run_sql("select max(id) from %s" % table)
maxID = maxID[0][0]
terms = {}
Gi = {}
write_message("Showing statistics of terms in index:")
write_message("Important: For the 'Least used terms', the number of terms is shown first, and the number of occurences second.")
write_message("Least used terms---Most important terms---Least important terms")
i = 0
while i < maxID:
terms_docs=run_sql("SELECT term, hitlist FROM %s WHERE id>= %s and id < %s" % (table, i, i + 10000))
for (t, hitlist) in terms_docs:
term_docs=deserialize_via_marshal(hitlist)
terms[len(term_docs)] = terms.get(len(term_docs), 0) + 1
if term_docs.has_key("Gi"):
Gi[t] = term_docs["Gi"]
i=i + 10000
terms=terms.items()
terms.sort(lambda x, y: cmp(y[1], x[1]))
Gi=Gi.items()
Gi.sort(lambda x, y: cmp(y[1], x[1]))
for i in range(0, 20):
write_message("%s/%s---%s---%s" % (terms[i][0],terms[i][1], Gi[i][0],Gi[len(Gi) - i - 1][0]))
def update_rnkWORD(table, terms):
"""Updates rnkWORDF and rnkWORDR with Gi and Nj values. For each term in rnkWORDF, a Gi value for the term is added. And for each term in each document, the Nj value for that document is added. In rnkWORDR, the Gi value for each term in each document is added. For description on how things are computed, look in the hacking docs.
table - name of forward index to update
terms - modified terms"""
stime = time.time()
Gi = {}
Nj = {}
N = run_sql("select count(id_bibrec) from %sR" % table[:-1])[0][0]
if len(terms) == 0 and task_get_option("quick") == "yes":
write_message("No terms to process, ending...")
return ""
elif task_get_option("quick") == "yes": #not used -R option, fast calculation (not accurate)
write_message("Beginning post-processing of %s terms" % len(terms))
#Locate all documents related to the modified/new/deleted terms; if doing a
#fast update, only take into account new/modified occurrences
write_message("Phase 1: Finding records containing modified terms")
terms = terms.keys()
i = 0
while i < len(terms):
terms_docs = get_from_forward_index(terms, i, (i+5000), table)
for (t, hitlist) in terms_docs:
term_docs = deserialize_via_marshal(hitlist)
if term_docs.has_key("Gi"):
del term_docs["Gi"]
for (j, tf) in term_docs.iteritems():
if (task_get_option("quick") == "yes" and tf[1] == 0) or task_get_option("quick") == "no":
Nj[j] = 0
write_message("Phase 1: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms)))
i += 5000
write_message("Phase 1: Finished finding records containing modified terms")
#Find all terms in the records found in last phase
write_message("Phase 2: Finding all terms in affected records")
records = Nj.keys()
i = 0
while i < len(records):
docs_terms = get_from_reverse_index(records, i, (i + 5000), table)
for (j, termlist) in docs_terms:
doc_terms = deserialize_via_marshal(termlist)
for (t, tf) in doc_terms.iteritems():
Gi[t] = 0
write_message("Phase 2: ......processed %s/%s records " % ((i+5000>len(records) and len(records) or (i+5000)), len(records)))
i += 5000
write_message("Phase 2: Finished finding all terms in affected records")
else: #recalculate
max_id = run_sql("SELECT MAX(id) FROM %s" % table)
max_id = max_id[0][0]
write_message("Beginning recalculation of %s terms" % max_id)
terms = []
i = 0
while i < max_id:
terms_docs = get_from_forward_index_with_id(i, (i+5000), table)
for (t, hitlist) in terms_docs:
Gi[t] = 0
term_docs = deserialize_via_marshal(hitlist)
if term_docs.has_key("Gi"):
del term_docs["Gi"]
for (j, tf) in term_docs.iteritems():
Nj[j] = 0
write_message("Phase 1: ......processed %s/%s terms" % ((i+5000)>max_id and max_id or (i+5000), max_id))
i += 5000
write_message("Phase 1: Finished finding which records contains which terms")
write_message("Phase 2: Jumping over..already done in phase 1 because of -R option")
terms = Gi.keys()
Gi = {}
i = 0
if task_get_option("quick") == "no":
#Calculating Fi and Gi value for each term
write_message("Phase 3: Calculating importance of all affected terms")
while i < len(terms):
terms_docs = get_from_forward_index(terms, i, (i+5000), table)
for (t, hitlist) in terms_docs:
term_docs = deserialize_via_marshal(hitlist)
if term_docs.has_key("Gi"):
del term_docs["Gi"]
Fi = 0
Gi[t] = 1
for (j, tf) in term_docs.iteritems():
Fi += tf[0]
for (j, tf) in term_docs.iteritems():
if tf[0] != Fi:
Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N)
write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms)))
i += 5000
write_message("Phase 3: Finished calculating importance of all affected terms")
else:
#Use the existing Gi value instead of calculating a new one. Loses some accuracy.
write_message("Phase 3: Getting approximate importance of all affected terms")
while i < len(terms):
terms_docs = get_from_forward_index(terms, i, (i+5000), table)
for (t, hitlist) in terms_docs:
term_docs = deserialize_via_marshal(hitlist)
if term_docs.has_key("Gi"):
Gi[t] = term_docs["Gi"][1]
elif len(term_docs) == 1:
Gi[t] = 1
else:
Fi = 0
Gi[t] = 1
for (j, tf) in term_docs.iteritems():
Fi += tf[0]
for (j, tf) in term_docs.iteritems():
if tf[0] != Fi:
Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N)
write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms)))
i += 5000
write_message("Phase 3: Finished getting approximate importance of all affected terms")
write_message("Phase 4: Calculating normalization value for all affected records and updating %sR" % table[:-1])
records = Nj.keys()
i = 0
while i < len(records):
#Calculating the normalization value for each document, and adding the Gi value to each term in each document.
docs_terms = get_from_reverse_index(records, i, (i + 5000), table)
for (j, termlist) in docs_terms:
doc_terms = deserialize_via_marshal(termlist)
for (t, tf) in doc_terms.iteritems():
if Gi.has_key(t):
Nj[j] = Nj.get(j, 0) + math.pow(Gi[t] * (1 + math.log(tf[0])), 2)
Git = int(math.floor(Gi[t]*100))
if Git >= 0:
Git += 1
doc_terms[t] = (tf[0], Git)
else:
Nj[j] = Nj.get(j, 0) + math.pow(tf[1] * (1 + math.log(tf[0])), 2)
Nj[j] = 1.0 / math.sqrt(Nj[j])
Nj[j] = int(Nj[j] * 100)
if Nj[j] >= 0:
Nj[j] += 1
run_sql("UPDATE %sR SET termlist=%%s WHERE id_bibrec=%%s" % table[:-1],
(serialize_via_marshal(doc_terms), j))
write_message("Phase 4: ......processed %s/%s records" % ((i+5000>len(records) and len(records) or (i+5000)), len(records)))
i += 5000
write_message("Phase 4: Finished calculating normalization value for all affected records and updating %sR" % table[:-1])
write_message("Phase 5: Updating %s with new normalization values" % table)
i = 0
terms = Gi.keys()
while i < len(terms):
#Adding the Gi value to each term, and adding the normalization value to each term in each document.
terms_docs = get_from_forward_index(terms, i, (i+5000), table)
for (t, hitlist) in terms_docs:
term_docs = deserialize_via_marshal(hitlist)
if term_docs.has_key("Gi"):
del term_docs["Gi"]
for (j, tf) in term_docs.iteritems():
if Nj.has_key(j):
term_docs[j] = (tf[0], Nj[j])
Git = int(math.floor(Gi[t]*100))
if Git >= 0:
Git += 1
term_docs["Gi"] = (0, Git)
run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % table,
(serialize_via_marshal(term_docs), t))
write_message("Phase 5: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms)))
i += 5000
write_message("Phase 5: Finished updating %s with new normalization values" % table)
write_message("Time used for post-processing: %.1fmin" % ((time.time() - stime) / 60))
write_message("Finished post-processing")
def get_from_forward_index(terms, start, stop, table):
terms_docs = ()
for j in range(start, (stop < len(terms) and stop or len(terms))):
terms_docs += run_sql("SELECT term, hitlist FROM %s WHERE term=%%s" % table,
(terms[j],))
return terms_docs
def get_from_forward_index_with_id(start, stop, table):
terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE id BETWEEN %s AND %s" % (table, start, stop))
return terms_docs
def get_from_reverse_index(records, start, stop, table):
current_recs = "%s" % records[start:stop]
current_recs = current_recs[1:-1]
docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec IN (%s)" % (table[:-1], current_recs))
return docs_terms
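`get_from_reverse_index()` above builds its IN (...) list by interpolating the repr of a Python list slice, which happens to work for integer record ids but is fragile. A more defensive pattern, assuming `run_sql` accepts `%s` placeholders with a parameter tuple (as the other calls in this file do), is to generate one placeholder per id; the helper name here is ours:

```python
def build_reverse_index_query(table, record_ids):
    """Return (query, params) with one %s placeholder per record id,
    for the rnkWORDxxR reverse table derived from the forward table name."""
    placeholders = ",".join(["%s"] * len(record_ids))
    query = ("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec IN (%s)"
             % (table[:-1], placeholders))
    return query, tuple(record_ids)

query, params = build_reverse_index_query("rnkWORD01F", [1, 2, 3])
print(query)
print(params)
```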
#def test_word_separators(phrase="hep-th/0101001"):
#"""Tests word separating policy on various input."""
#print "%s:" % phrase
#gwfp = get_words_from_phrase(phrase)
#for (word, count) in gwfp.iteritems():
#print "\t-> %s - %s" % (word, count)
def getName(methname, ln=cdslang, type='ln'):
"""Returns the name of the rank method, either in default language or given language.
methname = short name of the method
ln - the language to get the name in
type - which name "type" to get."""
try:
rnkid = run_sql("SELECT id FROM rnkMETHOD where name='%s'" % methname)
if rnkid:
rnkid = str(rnkid[0][0])
res = run_sql("SELECT value FROM rnkMETHODNAME where type='%s' and ln='%s' and id_rnkMETHOD=%s" % (type, ln, rnkid))
if not res:
res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln='%s' and id_rnkMETHOD=%s and type='%s'" % (cdslang, rnkid, type))
if not res:
return methname
return res[0][0]
else:
raise Exception
except Exception, e:
write_message("Cannot run rank method, either given code for method is wrong, or it has not been added using the webinterface.")
raise Exception
def word_similarity(run):
"""Call correct method"""
return word_index(run)
diff --git a/modules/bibrank/lib/bibrankadminlib.py b/modules/bibrank/lib/bibrankadminlib.py
index 610d1f4b0..044d4f50f 100644
--- a/modules/bibrank/lib/bibrankadminlib.py
+++ b/modules/bibrank/lib/bibrankadminlib.py
@@ -1,1041 +1,1041 @@
## $Id$
## Administrator interface for BibRank
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""CDS Invenio BibRank Administrator Interface."""
__revision__ = "$Id$"
import cgi
import re
import os
import ConfigParser
from zlib import compress,decompress
import marshal
try:
from mod_python import apache
except ImportError:
pass
from invenio.config import \
cdslang, \
- etcdir, \
- version, \
+ CFG_ETCDIR, \
+ CFG_VERSION, \
weburl
import invenio.access_control_engine as acce
from invenio.messages import language_list_long
from invenio.dbquery import run_sql
from invenio.webpage import page, pageheaderonly, pagefooteronly
from invenio.webuser import getUid, get_email
def getnavtrail(previous = ''):
navtrail = """Admin Area """ % (weburl,)
navtrail = navtrail + previous
return navtrail
def check_user(req, role, adminarea=2, authorized=0):
(auth_code, auth_message) = is_adminuser(req, role)
if not authorized and auth_code != 0:
return ("false", auth_message)
return ("", auth_message)
def is_adminuser(req, role):
"""check if user is a registered administrator. """
return acce.acc_authorize_action(req, role)
def perform_index(ln=cdslang):
"""create the bibrank main area menu page."""
header = ['Code', 'Translations', 'Collections', 'Rank method']
rnk_list = get_def_name('', "rnkMETHOD")
actions = []
for (rnkID, name) in rnk_list:
actions.append([name])
for col in [(('Modify', 'modifytranslations'),),
(('Modify', 'modifycollection'),),
(('Show Details', 'showrankdetails'),
('Modify', 'modifyrank'),
('Delete', 'deleterank'))]:
actions[-1].append('%s' % (weburl, col[0][1], rnkID, ln, col[0][0]))
for (str, function) in col[1:]:
actions[-1][-1] += ' / %s' % (weburl, function, rnkID, ln, str)
output = """
Add new rank method
""" % (weburl, ln)
output += tupletotable(header=header, tuple=actions)
return addadminbox("""Overview of rank methods   [?]""" % weburl, datalist=[output, ''])
def perform_modifycollection(rnkID='', ln=cdslang, func='', colID='', confirm=0):
"""Modify which collections the rank method is visible to"""
output = ""
subtitle = ""
if rnkID:
rnkNAME = get_def_name(rnkID, "rnkMETHOD")[0][1]
if func in ["0", 0] and confirm in ["1", 1]:
finresult = attach_col_rnk(rnkID, colID)
elif func in ["1", 1] and confirm in ["1", 1]:
finresult = detach_col_rnk(rnkID, colID)
if colID:
colNAME = get_def_name(colID, "collection")[0][1]
subtitle = """Step 1 - Select collection to enable/disable rank method '%s' for""" % rnkNAME
output = """
The rank method is currently enabled for these collections:
"""
col_list = get_rnk_col(rnkID, ln)
if not col_list:
output += """No collections"""
else:
for (id, name) in col_list:
output += """%s, """ % name
output += """
"""
col_list = get_def_name('', "collection")
col_rnk = dict(get_rnk_col(rnkID))
col_list = filter(lambda x: not col_rnk.has_key(x[0]), col_list)
if col_list:
text = """
Enable for:
"""
output += createhiddenform(action="modifycollection",
text=text,
button="Enable",
rnkID=rnkID,
ln=ln,
func=0,
confirm=1)
if confirm in ["0", 0] and func in ["0", 0] and colID:
subtitle = "Step 2 - Confirm to enable rank method for the chosen collection"
text = "
Please confirm to enable rank method '%s' for the collection '%s'
" % (rnkNAME, colNAME)
output += createhiddenform(action="modifycollection",
text=text,
button="Confirm",
rnkID=rnkID,
ln=ln,
colID=colID,
func=0,
confirm=1)
elif confirm in ["1", 1] and func in ["0", 0] and colID:
subtitle = "Step 3 - Result"
output += write_outcome(finresult)
elif confirm not in ["0", 0] and func in ["0", 0]:
output += """Please select a collection."""
col_list = get_rnk_col(rnkID, ln)
if col_list:
text = """
Disable for:
"""
output += createhiddenform(action="modifycollection",
text=text,
button="Disable",
rnkID=rnkID,
ln=ln,
func=1,
confirm=1)
if confirm in ["1", 1] and func in ["1", 1] and colID:
subtitle = "Step 3 - Result"
output += write_outcome(finresult)
elif confirm not in ["0", 0] and func in ["1", 1]:
output += """Please select a collection."""
body = [output]
return addadminbox(subtitle + """   [?]""" % weburl, body)
def perform_modifytranslations(rnkID, ln, sel_type, trans, confirm, callback='yes'):
"""Modify the translations of a rank method"""
output = ''
subtitle = ''
cdslangs = get_languages()
cdslangs.sort()
if confirm in ["2", 2] and rnkID:
finresult = modify_translations(rnkID, cdslangs, sel_type, trans, "rnkMETHOD")
rnk_name = get_def_name(rnkID, "rnkMETHOD")[0][1]
rnk_dict = dict(get_i8n_name('', ln, get_rnk_nametypes()[0][0], "rnkMETHOD"))
if rnkID and rnk_dict.has_key(int(rnkID)):
rnkID = int(rnkID)
subtitle = """3. Modify translations for rank method '%s'""" % rnk_name
if type(trans) is str:
trans = [trans]
if sel_type == '':
sel_type = get_rnk_nametypes()[0][0]
header = ['Language', 'Translation']
actions = []
text = """
Name type
"""
output += createhiddenform(action="modifytranslations",
text=text,
button="Select",
rnkID=rnkID,
ln=ln,
confirm=0)
if confirm in [-1, "-1", 0, "0"]:
trans = []
for key, value in cdslangs:
try:
trans_names = get_name(rnkID, key, sel_type, "rnkMETHOD")
trans.append(trans_names[0][0])
except StandardError, e:
trans.append('')
for nr in range(0,len(cdslangs)):
actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))])
actions[-1].append('' % trans[nr])
text = tupletotable(header=header, tuple=actions)
output += createhiddenform(action="modifytranslations",
text=text,
button="Modify",
rnkID=rnkID,
sel_type=sel_type,
ln=ln,
confirm=2)
if sel_type and len(trans) and confirm in ["2", 2]:
output += write_outcome(finresult)
body = [output]
return addadminbox(subtitle + """   [?]""" % weburl, body)
def perform_addrankarea(rnkcode='', ln=cdslang, template='', confirm=-1):
"""form to add a new rank method with these values:"""
subtitle = 'Step 1 - Create new rank method'
output = """
BibRank code:
A unique code that identifies a rank method; it is used when running the bibrank daemon and to name the configuration file for the method.
The template file includes the necessary parameters for the chosen rank method, and only needs to be edited with the correct tags and paths.
For more information, please go to the BibRank guide and read the section about adding a rank method
""" % weburl
text = """
BibRank code
""" % (rnkcode)
text += """ Cfg template
"""
output += createhiddenform(action="addrankarea",
text=text,
button="Add rank method",
ln=ln,
confirm=1)
if rnkcode:
if confirm in ["0", 0]:
subtitle = 'Step 2 - Confirm addition of rank method'
text = """Add rank method with BibRank code: '%s'.""" % (rnkcode)
if template:
text += """ Using configuration template: '%s'.""" % (template)
else:
text += """ Create empty configuration file."""
output += createhiddenform(action="addrankarea",
text=text,
rnkcode=rnkcode,
button="Confirm",
template=template,
confirm=1)
elif confirm in ["1", 1]:
rnkID = add_rnk(rnkcode)
subtitle = "Step 3 - Result"
if rnkID[0] == 1:
rnkID = rnkID[1]
text = """Added new rank method with BibRank code '%s'""" % rnkcode
try:
if template:
- infile = open("%s/bibrank/%s" % (etcdir, template), 'r')
+ infile = open("%s/bibrank/%s" % (CFG_ETCDIR, template), 'r')
indata = infile.readlines()
infile.close()
else:
indata = ()
- file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0]), 'w')
+ file = open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0]), 'w')
for line in indata:
file.write(line)
file.close()
if template:
text += """ Configuration file created using '%s' as template.""" % template
else:
text += """ Empty configuration file created."""
except StandardError, e:
- text += """ Sorry, could not create configuration file: '%s/bibrank/%s.cfg', either because it already exists, or not enough rights to create file. Please create the file in the path given.""" % (etcdir, get_rnk_code(rnkID)[0][0])
+ text += """ Sorry, could not create configuration file: '%s/bibrank/%s.cfg', either because it already exists, or not enough rights to create file. Please create the file in the path given.""" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0])
else:
text = """Sorry, could not add rank method, rank method with the same BibRank code probably exists."""
output += text
elif not rnkcode and confirm not in [-1, "-1"]:
output += """Sorry, could not add rank method, not enough data submitted."""
body = [output]
return addadminbox(subtitle + """   [?]""" % weburl, body)
def perform_modifyrank(rnkID, rnkcode='', ln=cdslang, template='', cfgfile='', confirm=0):
"""form to modify a rank method
rnkID - id of the rank method
"""
if not rnkID:
return "No ranking method selected."
if not get_rnk_code(rnkID):
return "Ranking method %s does not seem to exist." % str(rnkID)
subtitle = 'Step 1 - Please modify the wanted values below'
if not rnkcode:
oldcode = get_rnk_code(rnkID)[0]
else:
oldcode = rnkcode
output = """
When changing the BibRank code of a rank method, you must also change any scheduled tasks using the old value.
For more information, please go to the BibRank guide and read the section about modifying a rank method's BibRank code.
""" % weburl
text = """
BibRank code
""" % (oldcode)
try:
text += """Cfg file"""
textarea = ""
if cfgfile:
textarea +=cfgfile
else:
- file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0]))
+ file = open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0]))
for line in file.readlines():
textarea += line
text += """"""
except StandardError, e:
- text += """Cannot load file, either it does not exist, or not enough rights to read it: '%s/bibrank/%s.cfg' Please create the file in the path given.""" % (etcdir, get_rnk_code(rnkID)[0][0])
+ text += """Cannot load file, either it does not exist, or not enough rights to read it: '%s/bibrank/%s.cfg' Please create the file in the path given.""" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0])
output += createhiddenform(action="modifyrank",
text=text,
rnkID=rnkID,
button="Modify",
confirm=1)
if rnkcode and confirm in ["1", 1] and get_rnk_code(rnkID)[0][0] != rnkcode:
oldcode = get_rnk_code(rnkID)[0][0]
result = modify_rnk(rnkID, rnkcode)
subtitle = "Step 3 - Result"
if result:
text = """Rank method modified."""
try:
- file = open("%s/bibrank/%s.cfg" % (etcdir, oldcode), 'r')
- file2 = open("%s/bibrank/%s.cfg" % (etcdir, rnkcode), 'w')
+ file = open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, oldcode), 'r')
+ file2 = open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, rnkcode), 'w')
lines = file.readlines()
for line in lines:
file2.write(line)
file.close()
file2.close()
- os.remove("%s/bibrank/%s.cfg" % (etcdir, oldcode))
+ os.remove("%s/bibrank/%s.cfg" % (CFG_ETCDIR, oldcode))
except StandardError, e:
- text = """Sorry, could not change name of cfg file, must be done manually: '%s/bibrank/%s.cfg'""" % (etcdir, oldcode)
+ text = """Sorry, could not change name of cfg file, must be done manually: '%s/bibrank/%s.cfg'""" % (CFG_ETCDIR, oldcode)
else:
text = """Sorry, could not modify rank method."""
output += text
if cfgfile and confirm in ["1", 1]:
try:
- file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0]), 'w')
+ file = open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0]), 'w')
file.write(cfgfile)
file.close()
- text = """ Configuration file modified: '%s/bibrank/%s.cfg'""" % (etcdir, get_rnk_code(rnkID)[0][0])
+ text = """ Configuration file modified: '%s/bibrank/%s.cfg'""" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0])
except StandardError, e:
- text = """ Sorry, could not modify configuration file, please check for rights to do so: '%s/bibrank/%s.cfg' Please modify the file manually.""" % (etcdir, get_rnk_code(rnkID)[0][0])
+ text = """ Sorry, could not modify configuration file, please check for rights to do so: '%s/bibrank/%s.cfg' Please modify the file manually.""" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0])
output += text
finoutput = addadminbox(subtitle + """   [?]""" % weburl, [output])
output = ""
text = """
Select """
output += createhiddenform(action="modifyrank",
text=text,
rnkID=rnkID,
button="Show template",
confirm=0)
try:
if template:
textarea = ""
text = """Content:"""
- file = open("%s/bibrank/%s" % (etcdir, template), 'r')
+ file = open("%s/bibrank/%s" % (CFG_ETCDIR, template), 'r')
lines = file.readlines()
for line in lines:
textarea += line
file.close()
text += """"""
output += text
except StandardError, e:
- output += """Cannot load file, either it does not exist, or not enough rights to read it: '%s/bibrank/%s'""" % (etcdir, template)
+ output += """Cannot load file, either it does not exist, or not enough rights to read it: '%s/bibrank/%s'""" % (CFG_ETCDIR, template)
finoutput += addadminbox("View templates", [output])
return finoutput
def perform_deleterank(rnkID, ln=cdslang, confirm=0):
"""form to delete a rank method
"""
subtitle =''
output = """
WARNING:
When deleting a rank method, you also delete all data related to it, such as translations, which collections
it was attached to, and the data necessary to rank the search results. Any scheduled tasks using the deleted rank method will also stop working.
For more information, please go to the BibRank guide and read the section regarding deleting a rank method.
""" % weburl
if rnkID:
if confirm in ["0", 0]:
rnkNAME = get_def_name(rnkID, "rnkMETHOD")[0][1]
subtitle = 'Step 1 - Confirm deletion'
text = """Delete rank method '%s'.""" % (rnkNAME)
output += createhiddenform(action="deleterank",
text=text,
button="Confirm",
rnkID=rnkID,
confirm=1)
elif confirm in ["1", 1]:
try:
rnkNAME = get_def_name(rnkID, "rnkMETHOD")[0][1]
rnkcode = get_rnk_code(rnkID)[0][0]
table = ""
try:
config = ConfigParser.ConfigParser()
- config.readfp(open("%s/bibrank/%s.cfg" % (etcdir, rnkcode), 'r'))
+ config.readfp(open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, rnkcode), 'r'))
table = config.get(config.get('rank_method', "function"), "table")
except Exception:
pass
result = delete_rnk(rnkID, table)
subtitle = "Step 2 - Result"
if result:
text = """Rank method deleted"""
try:
- os.remove("%s/bibrank/%s.cfg" % (etcdir, rnkcode))
- text += """ Configuration file deleted: '%s/bibrank/%s.cfg'.""" % (etcdir, rnkcode)
+ os.remove("%s/bibrank/%s.cfg" % (CFG_ETCDIR, rnkcode))
+ text += """ Configuration file deleted: '%s/bibrank/%s.cfg'.""" % (CFG_ETCDIR, rnkcode)
except StandardError, e:
- text += """ Sorry, could not delete configuration file: '%s/bibrank/%s.cfg'. Please delete the file manually.""" % (etcdir, rnkcode)
+ text += """ Sorry, could not delete configuration file: '%s/bibrank/%s.cfg'. Please delete the file manually.""" % (CFG_ETCDIR, rnkcode)
else:
text = """Sorry, could not delete rank method"""
except StandardError, e:
text = """Sorry, could not delete rank method, most likely already deleted"""
output = text
body = [output]
return addadminbox(subtitle + """   [?]""" % weburl, body)
def perform_showrankdetails(rnkID, ln=cdslang):
"""Returns details about the rank method given by rnkID"""
if not rnkID:
return "No ranking method selected."
if not get_rnk_code(rnkID):
return "Ranking method %s does not seem to exist." % str(rnkID)
subtitle = """Overview [Modify]""" % (weburl, rnkID, ln)
text = """
BibRank code: %s
Last updated by BibRank:
""" % (get_rnk_code(rnkID)[0][0])
if get_rnk(rnkID)[0][2]:
text += "%s " % get_rnk(rnkID)[0][2]
else:
text += "Not yet run. "
output = addadminbox(subtitle, [text])
subtitle = """Rank method statistics"""
text = ""
try:
text = "Not yet implemented"
except StandardError, e:
text = "BibRank not yet run, cannot show statistics for method"
output += addadminbox(subtitle, [text])
subtitle = """Attached to collections [Modify]""" % (weburl, rnkID, ln)
text = ""
col = get_rnk_col(rnkID, ln)
for key, value in col:
text+= "%s " % value
if not col:
text += "No collections"
output += addadminbox(subtitle, [text])
subtitle = """Translations [Modify]""" % (weburl, rnkID, ln)
prev_lang = ''
trans = get_translations(rnkID)
types = get_rnk_nametypes()
types = dict(map(lambda x: (x[0], x[1]), types))
text = ""
languages = dict(get_languages())
if trans:
for lang, type, name in trans:
if lang and languages.has_key(lang) and type and name:
if prev_lang != lang:
prev_lang = lang
text += """%s: """ % (languages[lang])
if types.has_key(type):
text+= """'%s'(%s) """ % (name, types[type])
else:
text = """No translations exist"""
output += addadminbox(subtitle, [text])
- subtitle = """Configuration file: '%s/bibrank/%s.cfg' [Modify]""" % (etcdir, get_rnk_code(rnkID)[0][0], weburl, rnkID, ln)
+ subtitle = """Configuration file: '%s/bibrank/%s.cfg' [Modify]""" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0], weburl, rnkID, ln)
text = ""
try:
- file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0]))
+ file = open("%s/bibrank/%s.cfg" % (CFG_ETCDIR, get_rnk_code(rnkID)[0][0]))
text += """
"""
for line in file.readlines():
text += line
text += """
"""
except StandardError, e:
text = """Cannot load file, either it does not exist, or not enough rights to read it."""
output += addadminbox(subtitle, [text])
return output
def compare_on_val(second, first):
return cmp(second[1], first[1])
def get_rnk_code(rnkID):
"""Returns the name from rnkMETHOD based on argument
rnkID - id from rnkMETHOD"""
try:
res = run_sql("SELECT name FROM rnkMETHOD where id=%s" % (rnkID))
return res
except StandardError, e:
return ()
def get_rnk(rnkID=''):
"""Return one or all rank methods
rnkID - return the rank method given, or all if not given"""
try:
if rnkID:
res = run_sql("SELECT id,name,DATE_FORMAT(last_updated, '%%Y-%%m-%%d %%H:%%i:%%s') from rnkMETHOD WHERE id=%s" % rnkID)
else:
res = run_sql("SELECT id,name,DATE_FORMAT(last_updated, '%%Y-%%m-%%d %%H:%%i:%%s') from rnkMETHOD")
return res
except StandardError, e:
return ()
def get_translations(rnkID):
"""Returns the translations in rnkMETHODNAME for a rankmethod
rnkID - the id of the rankmethod from rnkMETHOD """
try:
res = run_sql("SELECT ln, type, value FROM rnkMETHODNAME where id_rnkMETHOD=%s ORDER BY ln,type" % (rnkID))
return res
except StandardError, e:
return ()
def get_rnk_nametypes():
"""Return a list of the various translation names for the rank methods"""
type = []
type.append(('ln', 'Long name'))
#type.append(('sn', 'Short name'))
return type
def get_col_nametypes():
"""Return a list of the various translation names for the collections"""
type = []
type.append(('ln', 'Long name'))
return type
def get_rnk_col(rnkID, ln=cdslang):
""" Returns a list of the collections the given rank method is attached to
rnkID - id from rnkMETHOD"""
try:
res1 = dict(run_sql("SELECT id_collection, '' FROM collection_rnkMETHOD WHERE id_rnkMETHOD=%s" % rnkID))
res2 = get_def_name('', "collection")
result = filter(lambda x: res1.has_key(x[0]), res2)
return result
except StandardError, e:
return ()
def get_templates():
- """Read etcdir/bibrank and returns a list of all files with 'template' """
+ """Reads CFG_ETCDIR/bibrank and returns a list of all files containing 'template_' """
templates = []
- files = os.listdir(etcdir + "/bibrank/")
+ files = os.listdir(CFG_ETCDIR + "/bibrank/")
for file in files:
if str.find(file,"template_") != -1:
templates.append(file)
return templates
def attach_col_rnk(rnkID, colID):
"""attach rank method to collection
rnkID - id from rnkMETHOD table
colID - id of collection, as in collection table """
try:
res = run_sql("INSERT INTO collection_rnkMETHOD(id_collection, id_rnkMETHOD) values (%s,%s)" % (colID, rnkID))
return (1, "")
except StandardError, e:
return (0, e)
def detach_col_rnk(rnkID, colID):
"""detach rank method from collection
rnkID - id from rnkMETHOD table
colID - id of collection, as in collection table """
try:
res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_collection=%s AND id_rnkMETHOD=%s" % (colID, rnkID))
return (1, "")
except StandardError, e:
return (0, e)
def delete_rnk(rnkID, table=""):
"""Deletes all data for the given rank method
rnkID - delete all data in the tables associated with ranking and this id """
try:
res = run_sql("DELETE FROM rnkMETHOD WHERE id=%s" % rnkID)
res = run_sql("DELETE FROM rnkMETHODNAME WHERE id_rnkMETHOD=%s" % rnkID)
res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_rnkMETHOD=%s" % rnkID)
res = run_sql("DELETE FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s" % rnkID)
if table:
res = run_sql("truncate %s" % table)
res = run_sql("truncate %sR" % table[:-1])
return (1, "")
except StandardError, e:
return (0, e)
def modify_rnk(rnkID, rnkcode):
"""change the code for the rank method given
rnkID - change in rnkMETHOD where id is like this
rnkcode - new value for field 'name' in rnkMETHOD """
try:
res = run_sql("UPDATE rnkMETHOD set name=%s WHERE id=%s", (rnkcode, rnkID))
return (1, "")
except StandardError, e:
return (0, e)
def add_rnk(rnkcode):
"""Adds a new rank method to rnkMETHOD
rnkcode - the "code" for the rank method, to be used by bibrank daemon """
try:
res = run_sql("INSERT INTO rnkMETHOD (name) VALUES (%s)", (rnkcode,))
res = run_sql("SELECT id FROM rnkMETHOD WHERE name=%s", (rnkcode,))
if res:
return (1, res[0][0])
else:
raise StandardError
except StandardError, e:
return (0, e)
def addadminbox(header='', datalist=[], cls="admin_wvar"):
"""used to create table around main data on a page, row based.
header - header on top of the table
datalist - list of the data to be added row by row
cls - possible to select which css-class to format the look of the table."""
if len(datalist) == 1: per = '100'
else: per = '75'
output = '
"""
return output
def tupletotable(header=[], tuple=[], start='', end='', extracolumn=''):
"""create html table for a tuple.
header - optional header for the columns
tuple - create table of this
start - text to be added in the beginning, most likely beginning of a form
end - text to be added in the end, most likely end of a form.
extracolumn - mainly used to put in a button. """
# study first row in tuple for alignment
align = []
try:
firstrow = tuple[0]
if type(firstrow) in [int, long]:
align = ['admintdright']
elif type(firstrow) in [str, dict]:
align = ['admintdleft']
else:
for item in firstrow:
if type(item) is int:
align.append('admintdright')
else:
align.append('admintdleft')
except IndexError:
firstrow = []
tblstr = ''
for h in header + ['']:
tblstr += '
%s
\n' % (h, )
if tblstr: tblstr = '
\n%s\n
\n' % (tblstr, )
tblstr = start + '
\n' + tblstr
# extra column
try:
extra = '
'
if type(firstrow) not in [int, long, str, dict]:
# for data in firstrow: extra += '
%s
\n' % ('admintd', data)
for i in range(len(firstrow)): extra += '
%s
\n' % (align[i], firstrow[i])
else:
extra += '
%s
\n' % (align[0], firstrow)
extra += '
\n%s\n
\n
\n' % (len(tuple), extracolumn)
except IndexError:
extra = ''
tblstr += extra
# for i in range(1, len(tuple)):
for row in tuple[1:]:
tblstr += '
\n'
# row = tuple[i]
if type(row) not in [int, long, str, dict]:
# for data in row: tblstr += '
%s
\n' % (data,)
for i in range(len(row)): tblstr += '
%s
\n' % (align[i], row[i])
else:
tblstr += '
%s
\n' % (align[0], row)
tblstr += '
\n'
tblstr += '
\n '
tblstr += end
return tblstr
def tupletotable_onlyselected(header=[], tuple=[], selected=[], start='', end='', extracolumn=''):
"""create html table for a tuple.
header - optional header for the columns
tuple - create table of this
selected - indexes of selected rows in the tuple
start - put this in the beginning
end - put this in the end
extracolumn - mainly used to put in a button"""
tuple2 = []
for index in selected:
tuple2.append(tuple[int(index)-1])
return tupletotable(header=header,
tuple=tuple2,
start=start,
end=end,
extracolumn=extracolumn)
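The index arithmetic inside tupletotable_onlyselected can be sketched on its own: form checkboxes submit 1-based row indexes (often as strings), so each one is converted to int and shifted down by one before indexing into the rows. The sample rows below are invented for illustration.

```python
# 1-based row selection, as done by tupletotable_onlyselected
rows = [("first", 1), ("second", 2), ("third", 3)]
selected = ["1", "3"]          # 1-based indexes, as received from a form
picked = [rows[int(i) - 1] for i in selected]
print(picked)
```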
def addcheckboxes(datalist=[], name='authids', startindex=1, checked=[]):
"""adds checkboxes in front of the listdata.
datalist - add checkboxes in front of this list
name - name of all the checkboxes, values will be associated with this name
startindex - usually 1 because of the header
checked - values of checkboxes to be pre-checked """
if not type(checked) is list: checked = [checked]
for row in datalist:
if 1 or row[0] not in [-1, "-1", 0, "0"]: # always box, check another place
chkstr = str(startindex) in checked and 'checked="checked"' or ''
row.insert(0, '' % (name, startindex, chkstr))
else:
row.insert(0, '')
startindex += 1
return datalist
def createhiddenform(action="", text="", button="confirm", cnfrm='', **hidden):
"""create select with hidden values and submit button
action - name of the action to perform on submit
text - additional text, can also be used to add non hidden input
button - value/caption on the submit button
cnfrm - if given, must check checkbox to confirm
**hidden - dictionary with name=value pairs for hidden input """
output = '\n'
return output
def get_languages():
languages = []
for (lang, lang_namelong) in language_list_long():
languages.append((lang, lang_namelong))
languages.sort()
return languages
def get_def_name(ID, table):
"""Returns a list of (id, name) tuples from the given table, sorted by name
ID - if given, return only the row with this id
table - name of the table to read from"""
name = "name"
if table[-1:].isupper():
name = "NAME"
try:
if ID:
res = run_sql("SELECT id,name FROM %s where id=%s" % (table, ID))
else:
res = run_sql("SELECT id,name FROM %s" % table)
res = list(res)
res.sort(compare_on_val)
return res
except StandardError, e:
return []
def get_i8n_name(ID, ln, rtype, table):
"""Returns a list of the names, either with the name in the current language, the default language, or just the name from the given table
ln - a language supported by CDS Invenio
type - the type of value wanted, like 'ln', 'sn'"""
name = "name"
if table[-1:].isupper():
name = "NAME"
try:
res = ""
if ID:
res = run_sql("SELECT id_%s,value FROM %s%s where type='%s' and ln='%s' and id_%s=%s" % (table, table, name, rtype,ln, table, ID))
else:
res = run_sql("SELECT id_%s,value FROM %s%s where type='%s' and ln='%s'" % (table, table, name, rtype,ln))
if ln != cdslang:
if ID:
res1 = run_sql("SELECT id_%s,value FROM %s%s WHERE ln='%s' and type='%s' and id_%s=%s" % (table, table, name, cdslang, rtype, table, ID))
else:
res1 = run_sql("SELECT id_%s,value FROM %s%s WHERE ln='%s' and type='%s'" % (table, table, name, cdslang, rtype))
res2 = dict(res)
result = filter(lambda x: not res2.has_key(x[0]), res1)
res = res + result
if ID:
res1 = run_sql("SELECT id,name FROM %s where id=%s" % (table, ID))
else:
res1 = run_sql("SELECT id,name FROM %s" % table)
res2 = dict(res)
result = filter(lambda x: not res2.has_key(x[0]), res1)
res = res + result
res = list(res)
res.sort(compare_on_val)
return res
except StandardError, e:
raise StandardError
def get_name(ID, ln, rtype, table):
"""Returns the value from the table name based on arguments
ID - id
ln - a language supported by CDS Invenio
type - the type of value wanted, like 'ln', 'sn'
table - tablename"""
name = "name"
if table[-1:].isupper():
name = "NAME"
try:
res = run_sql("SELECT value FROM %s%s WHERE type='%s' and ln='%s' and id_%s=%s" % (table, name, rtype, ln, table, ID))
return res
except StandardError, e:
return ()
def modify_translations(ID, langs, sel_type, trans, table):
"""add or modify translations in tables given by table
frmID - the id of the format from the format table
sel_type - the name type
langs - the languages
trans - the translations, in same order as in langs
table - the table"""
name = "name"
if table[-1:].isupper():
name = "NAME"
try:
for nr in range(0,len(langs)):
res = run_sql("SELECT value FROM %s%s WHERE id_%s=%%s AND type=%%s AND ln=%%s" % (table, name, table),
(ID, sel_type, langs[nr][0]))
if res:
if trans[nr]:
res = run_sql("UPDATE %s%s SET value=%%s WHERE id_%s=%%s AND type=%%s AND ln=%%s" % (table, name, table),
(trans[nr], ID, sel_type, langs[nr][0]))
else:
res = run_sql("DELETE FROM %s%s WHERE id_%s=%%s AND type=%%s AND ln=%%s" % (table, name, table),
(ID, sel_type, langs[nr][0]))
else:
if trans[nr]:
res = run_sql("INSERT INTO %s%s (id_%s, type, ln, value) VALUES (%%s,%%s,%%s,%%s)" % (table, name, table),
(ID, sel_type, langs[nr][0], trans[nr]))
return (1, "")
except StandardError, e:
return (0, e)
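The select-then-update/delete/insert pattern that modify_translations follows can be sketched against an in-memory SQLite table instead of Invenio's run_sql; this is a hedged illustration only, and the helper name set_translation is made up here.

```python
import sqlite3

# In-memory stand-in for the rnkMETHODNAME translation table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rnkMETHODNAME (id_rnkMETHOD INT, type TEXT, ln TEXT, value TEXT)")

def set_translation(conn, rnk_id, sel_type, ln, value):
    # Same branching as modify_translations: update an existing row,
    # delete it when the new value is empty, or insert a fresh one.
    cur = conn.execute("SELECT value FROM rnkMETHODNAME WHERE id_rnkMETHOD=? AND type=? AND ln=?",
                       (rnk_id, sel_type, ln))
    exists = cur.fetchone() is not None
    if exists and value:
        conn.execute("UPDATE rnkMETHODNAME SET value=? WHERE id_rnkMETHOD=? AND type=? AND ln=?",
                     (value, rnk_id, sel_type, ln))
    elif exists:
        conn.execute("DELETE FROM rnkMETHODNAME WHERE id_rnkMETHOD=? AND type=? AND ln=?",
                     (rnk_id, sel_type, ln))
    elif value:
        conn.execute("INSERT INTO rnkMETHODNAME (id_rnkMETHOD, type, ln, value) VALUES (?,?,?,?)",
                     (rnk_id, sel_type, ln, value))

set_translation(conn, 1, "ln", "en", "Word similarity")
set_translation(conn, 1, "ln", "en", "Word frequency")   # exercises the update path
result = conn.execute("SELECT value FROM rnkMETHODNAME").fetchall()
print(result)
```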
def write_outcome(res):
try:
if res and res[0] == 1:
return """Operation successfully completed."""
elif res:
return """Operation failed. Reason: %s""" % res[1][1]
except Exception, e:
return """Operation failed. Reason unknown """
diff --git a/modules/bibrank/lib/bibrankgkb.py b/modules/bibrank/lib/bibrankgkb.py
index f90cec1a9..94373c1b1 100644
--- a/modules/bibrank/lib/bibrankgkb.py
+++ b/modules/bibrank/lib/bibrankgkb.py
@@ -1,286 +1,286 @@
## -*- mode: python; coding: utf-8; -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Usage: bibrankgkb %s [options]
Examples:
bibrankgkb --input=bibrankgkb.cfg --output=test.kb
bibrankgkb -otest.kb -v9
bibrankgkb -v9
Generate options:
-i, --input=file input file, default from /etc/bibrank/bibrankgkb.cfg
-o, --output=file output file, will be placed in current folder
General options:
-h, --help print this help and exit
-V, --version print version and exit
-v, --verbose=LEVEL verbose level (from 0 to 9, default 1)
"""
__revision__ = "$Id$"
import getopt
import sys
import time
import urllib
import re
import ConfigParser
-from invenio.config import etcdir
+from invenio.config import CFG_ETCDIR
from invenio.dbquery import run_sql
opts_dict = {}
task_id = -1
def bibrankgkb(config):
"""Generates a .kb file based on input from the configuration file"""
if opts_dict["verbose"] >= 1:
write_message("Running: Generate Knowledgebase.")
journals = {}
journal_src = {}
i = 0
#Reading the configuration file
while config.has_option("bibrankgkb","create_%s" % i):
cfg = config.get("bibrankgkb", "create_%s" % i).split(",,")
conv = {}
temp = {}
#Input source 1, either file, www or from db
if cfg[0] == "file":
conv = get_from_source(cfg[0], cfg[1])
del cfg[0:2]
elif cfg[0] == "www":
j = 0
urls = {}
while config.has_option("bibrankgkb", cfg[1] % j):
urls[j] = config.get("bibrankgkb", cfg[1] % j)
j = j + 1
conv = get_from_source(cfg[0], (urls, cfg[2]))
del cfg[0:3]
elif cfg[0] == "db":
conv = get_from_source(cfg[0], (cfg[1], cfg[2]))
del cfg[0:3]
if not conv:
del cfg[0:2]
else:
if opts_dict["verbose"] >= 9:
write_message("Using last resource for converting values.")
#Input source 2, either file, www or from db
if cfg[0] == "file":
temp = get_from_source(cfg[0], cfg[1])
elif cfg[0] == "www":
j = 0
urls = {}
while config.has_option("bibrankgkb", cfg[1] % j):
urls[j] = config.get("bibrankgkb", cfg[1] % j)
j = j + 1
temp = get_from_source(cfg[0], (urls, cfg[2]))
elif cfg[0] == "db":
temp = get_from_source(cfg[0], (cfg[1], cfg[2]))
i = i + 1
#If a conversion file is given, the names will be converted to the correct convention
if len(conv) != 0:
if opts_dict["verbose"] >= 9:
write_message("Converting between naming conventions given.")
temp = convert(conv, temp)
if len(journals) != 0:
for element in temp.keys():
if not journals.has_key(element):
journals[element] = temp[element]
else:
journals = temp
#Writing output file
if opts_dict["output"]:
f = open(opts_dict["output"], 'w')
f.write("#Created by %s\n" % __revision__)
f.write("#Sources:\n")
for key in journals.keys():
f.write("%s---%s\n" % (key, journals[key]))
f.close()
if opts_dict["verbose"] >= 9:
write_message("Output complete: %s" % opts_dict["output"])
write_message("Number of hits: %s" % len(journals))
if opts_dict["verbose"] >= 9:
write_message("Result:")
for key in journals.keys():
write_message("%s---%s" % (key, journals[key]))
write_message("Total nr of lines: %s" % len(journals))
def showtime(timeused):
if opts_dict["verbose"] >= 9:
write_message("Time used: %d second(s)." % timeused)
def get_from_source(type, data):
"""Read a source based on the input to the function"""
datastruct = {}
if type == "db":
jvalue = run_sql(data[0])
jname = dict(run_sql(data[1]))
if opts_dict["verbose"] >= 9:
write_message("Reading data from database using SQL statements:")
write_message(jvalue)
write_message(jname)
for key, value in jvalue:
if jname.has_key(key):
key2 = jname[key].strip()
datastruct[key2] = value
#print "%s---%s" % (key2, value)
elif type == "file":
input = open(data, 'r')
if opts_dict["verbose"] >= 9:
write_message("Reading data from file: %s" % data)
data = input.readlines()
datastruct = {}
for line in data:
#print line
if not line[0:1] == "#":
key = line.strip().split("---")[0].strip()
value = line.strip().split("---")[1]
datastruct[key] = value
#print "%s---%s" % (key,value)
elif type == "www":
if opts_dict["verbose"] >= 9:
write_message("Reading data from www using regexp: %s" % data[1])
write_message("Reading data from url:")
for link in data[0].keys():
if opts_dict["verbose"] >= 9:
write_message(data[0][link])
page = urllib.urlopen(data[0][link])
input = page.read()
#Using the regexp from config file
reg = re.compile(data[1])
iterator = re.finditer(reg, input)
for match in iterator:
if match.group("value"):
key = match.group("key").strip()
value = match.group("value").replace(",", ".")
datastruct[key] = value
if opts_dict["verbose"] == 9:
print "%s---%s" % (key, value)
return datastruct
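The "www" source type relies on a config-supplied regexp exposing the named groups 'key' and 'value'. The pattern and page fragment below are invented samples, but they show the same finditer loop and the decimal-comma normalisation used above.

```python
import re

# Hypothetical config regexp with the named groups bibrankgkb expects
pattern = r"<td>(?P<key>[^<]+)</td><td>(?P<value>[\d,]+)</td>"
page = "<tr><td>Phys. Rev. D </td><td>4,689</td></tr>"
datastruct = {}
for match in re.finditer(pattern, page):
    if match.group("value"):
        key = match.group("key").strip()
        value = match.group("value").replace(",", ".")  # normalise decimal comma
        datastruct[key] = value
print(datastruct)
```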
def convert(convstruct, journals):
"""Converting between names"""
if len(convstruct) > 0 and len(journals) > 0:
invconvstruct = dict(map(lambda x: (x[1], x[0]), convstruct.items()))
tempjour = {}
for name in journals.keys():
if convstruct.has_key(name):
tempjour[convstruct[name]] = journals[name]
elif invconvstruct.has_key(name):
tempjour[name] = journals[name]
return tempjour
else:
return journals
def write_message(msg, stream = sys.stdout):
"""Write message and flush output stream (may be sys.stdout or sys.stderr). Useful for debugging stuff."""
if stream == sys.stdout or stream == sys.stderr:
stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime()))
try:
stream.write("%s\n" % msg)
except UnicodeEncodeError:
stream.write("%s\n" % msg.encode('ascii', 'backslashreplace'))
stream.flush()
else:
sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream)
return
def usage(code, msg=''):
"Prints usage for this module."
if msg:
sys.stderr.write("Error: %s.\n" % msg)
print >> sys.stderr, \
""" Usage: %s [options]
Examples:
%s --input=bibrankgkb.cfg --output=test.kb
%s -otest.kb -v9
%s -v9
Generate options:
-i, --input=file input file, default from /etc/bibrank/bibrankgkb.cfg
-o, --output=file output file, will be placed in current folder
General options:
-h, --help print this help and exit
-V, --version print version and exit
-v, --verbose=LEVEL verbose level (from 0 to 9, default 1)
""" % ((sys.argv[0],) * 4)
sys.exit(code)
def command_line():
global opts_dict
long_flags = ["input=", "output=", "help", "version", "verbose="]
short_flags = "i:o:hVv:"
format_string = "%Y-%m-%d %H:%M:%S"
sleeptime = ""
try:
opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags)
except getopt.GetoptError, err:
write_message(err, sys.stderr)
usage(1)
if args:
usage(1)
- opts_dict = {"input": "%s/bibrank/bibrankgkb.cfg" % etcdir, "output":"", "verbose":1}
+ opts_dict = {"input": "%s/bibrank/bibrankgkb.cfg" % CFG_ETCDIR, "output":"", "verbose":1}
sched_time = time.strftime(format_string)
user = ""
try:
for opt in opts:
if opt == ("-h","") or opt == ("--help",""):
usage(1)
elif opt == ("-V","") or opt == ("--version",""):
print __revision__
sys.exit(1)
elif opt[0] in ["--input", "-i"]:
opts_dict["input"] = opt[1]
elif opt[0] in ["--output", "-o"]:
opts_dict["output"] = opt[1]
elif opt[0] in ["--verbose", "-v"]:
opts_dict["verbose"] = int(opt[1])
else:
usage(1)
startCreate = time.time()
file = opts_dict["input"]
config = ConfigParser.ConfigParser()
config.readfp(open(file))
bibrankgkb(config)
if opts_dict["verbose"] >= 9:
showtime((time.time() - startCreate))
except StandardError, e:
write_message(e, sys.stderr)
sys.exit(1)
return
def main():
command_line()
if __name__ == "__main__":
main()
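The knowledge-base file that bibrankgkb writes (and later re-reads through its "file" source type) is a plain "key---value" line format with '#' comment lines skipped; a small round-trip sketch, with invented journal entries:

```python
# Parse kb lines of the form "key---value", skipping '#' comments
lines = [
    "#Created by bibrankgkb",
    "Phys. Rev. D---4.689",
    "Nucl. Phys. B---5.297",
]
kb = {}
for line in lines:
    if not line.startswith("#"):
        key, value = line.strip().split("---", 1)
        kb[key] = value
print(kb)
```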
diff --git a/modules/bibsched/lib/bibsched.py b/modules/bibsched/lib/bibsched.py
index 1fc63fe69..517ba8038 100644
--- a/modules/bibsched/lib/bibsched.py
+++ b/modules/bibsched/lib/bibsched.py
@@ -1,936 +1,936 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""BibSched - task management, scheduling and executing system for CDS Invenio
"""
__revision__ = "$Id$"
### -- local configuration section starts here ---
# which tasks are recognized as valid?
cfg_valid_processes = ["bibindex", "bibupload", "bibreformat",
"webcoll", "bibtaskex", "bibrank",
"oaiharvest", "oaiarchive", "inveniogc",
"webstatadmin", "bibclassifyd"]
### -- local configuration section ends here ---
import os
import string
import sys
import time
import re
import marshal
import getopt
import curses
import curses.panel
from curses.wrapper import wrapper
from socket import gethostname
import signal
from invenio.config import \
CFG_PREFIX, \
CFG_BIBSCHED_REFRESHTIME, \
CFG_BIBSCHED_LOG_PAGER, \
- bindir, \
- logdir
+ CFG_BINDIR, \
+ CFG_LOGDIR
from invenio.dbquery import run_sql
def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"):
"""Returns a date string according to the format string.
It can handle normal date strings and shifts with respect
to now."""
try:
date = time.time()
shift_re = re.compile("([-\+]{0,1})([\d]+)([dhms])")
factors = {"d":24*3600, "h":3600, "m":60, "s":1}
m = shift_re.match(var)
if m:
sign = m.groups()[0] == "-" and -1 or 1
factor = factors[m.groups()[2]]
value = float(m.groups()[1])
date = time.localtime(date + sign * factor * value)
date = time.strftime(format_string, date)
else:
date = time.strptime(var, format_string)
date = time.strftime(format_string, date)
return date
except:
return None
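The relative-shift branch of get_datetime can be isolated as a small sketch: strings such as "-2h" or "+30m" are decomposed into sign, magnitude, and unit, then turned into a number of seconds to add to the current time. The helper name shift_seconds is ours, not the module's.

```python
import re

# Same pattern and unit factors as get_datetime
shift_re = re.compile(r"([-+]?)(\d+)([dhms])")
factors = {"d": 24 * 3600, "h": 3600, "m": 60, "s": 1}

def shift_seconds(var):
    """Return the signed number of seconds a shift string denotes, or None."""
    m = shift_re.match(var)
    if not m:
        return None
    sign = -1 if m.group(1) == "-" else 1
    return sign * factors[m.group(3)] * float(m.group(2))

print(shift_seconds("-2h"), shift_seconds("+30m"), shift_seconds("45s"))
```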
def get_my_pid(process, args=''):
if sys.platform.startswith('freebsd'):
COMMAND = "ps -o pid,args | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, args)
else:
COMMAND = "ps -C %s o '%%p%%a' | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, process, args)
answer = string.strip(os.popen(COMMAND).read())
if answer == '':
answer = 0
else:
answer = answer[:string.find(answer,' ')]
return int(answer)
def get_task_pid(task_name, task_id):
"""Return the pid of task_name/task_id"""
try:
pid = int(open(os.path.join(CFG_PREFIX, 'var', 'run', 'bibsched_task_%d.pid' % task_id)).read())
except IOError:
return get_my_pid(task_name, str(task_id))
try:
os.kill(pid, signal.SIGCONT)
except OSError:
return get_my_pid(task_name, str(task_id))
return pid
def get_output_channelnames(task_id):
"Construct and return filename for stdout and stderr for the task 'task_id'."
- filenamebase = "%s/bibsched_task_%d" % (logdir, task_id)
+ filenamebase = "%s/bibsched_task_%d" % (CFG_LOGDIR, task_id)
return [filenamebase + ".log", filenamebase + ".err"]
def is_task_scheduled(task_name):
"""Check if a certain task_name is due for execution (WAITING or RUNNING)"""
sql = "SELECT COUNT(proc) FROM schTASK WHERE proc = %s AND (status = 'WAITING' or status = 'RUNNING')"
return run_sql(sql, (task_name,))[0][0] > 0
def get_task_ids_by_descending_date(task_name, statuses=['SCHEDULED']):
"""Returns list of task ids, ordered by descending runtime."""
sql = "SELECT id FROM schTASK WHERE proc=%s AND (" + \
" OR ".join(["status = '%s'" % x for x in statuses]) + ") ORDER BY runtime DESC"
return [x[0] for x in run_sql(sql, (task_name,))]
def get_task_options(task_id):
"""Returns options for task_id read from the BibSched task queue table."""
res = run_sql("SELECT arguments FROM schTASK WHERE id=%s", (task_id,))
try:
return marshal.loads(res[0][0])
except IndexError:
return {}
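get_task_options stores each task's arguments as a marshal blob in the schTASK table; a minimal round-trip of that serialisation (the options dict is a made-up sample):

```python
import marshal

# Serialise and restore a task-options dict, as schTASK.arguments does
options = {"verbose": 9, "input": "/etc/bibrank/bibrankgkb.cfg"}
blob = marshal.dumps(options)
restored = marshal.loads(blob)
print(restored)
```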
class Manager:
def __init__(self):
self.helper_modules = cfg_valid_processes
self.running = 1
self.footer_move_mode = "[KeyUp/KeyDown Move] [M Select mode] [Q Quit]"
self.footer_auto_mode = "[A Manual mode] [1/2 Display Type] [P Purge Done] [Q Quit]"
self.footer_select_mode = "[KeyUp/KeyDown/PgUp/PgDown Select] [L View Log] [1/2 Display Type] [M Move mode] [A Auto mode] [Q Quit]"
self.footer_waiting_item = "[R Run] [D Delete]"
self.footer_running_item = "[S Sleep] [T Stop] [K Kill]"
self.footer_stopped_item = "[I Initialise] [D Delete]"
self.footer_sleeping_item = "[W Wake Up]"
self.item_status = ""
self.selected_line = 2
self.rows = []
self.panel = None
self.display = 2
self.first_visible_line = 0
self.move_mode = 0
self.auto_mode = 0
self.currentrow = ["", "", "", "", "", "", ""]
wrapper(self.start)
def handle_keys(self, chr):
if chr == -1:
return
if self.auto_mode and (chr not in (curses.KEY_UP, curses.KEY_DOWN,
curses.KEY_PPAGE, curses.KEY_NPAGE,
ord("q"), ord("Q"), ord("a"),
ord("A"), ord("1"), ord("2"),
ord("p"), ord("P"))):
self.display_in_footer("in automatic mode")
self.stdscr.refresh()
elif self.move_mode and (chr not in (curses.KEY_UP, curses.KEY_DOWN,
ord("m"), ord("M"), ord("q"),
ord("Q"))):
self.display_in_footer("in move mode")
self.stdscr.refresh()
else:
if chr == curses.KEY_UP:
if self.move_mode:
self.move_up()
else:
self.selected_line = max(self.selected_line - 1, 2)
self.repaint()
if chr == curses.KEY_PPAGE:
self.selected_line = max(self.selected_line - 10, 2)
self.repaint()
elif chr == curses.KEY_DOWN:
if self.move_mode:
self.move_down()
else:
self.selected_line = min(self.selected_line + 1, len(self.rows) + 1 )
self.repaint()
elif chr == curses.KEY_NPAGE:
self.selected_line = min(self.selected_line + 10, len(self.rows) + 1 )
self.repaint()
elif chr == curses.KEY_HOME:
self.first_visible_line = 0
self.selected_line = 2
elif chr in (ord("a"), ord("A")):
self.change_auto_mode()
elif chr in (ord("l"), ord("L")):
self.openlog()
elif chr in (ord("w"), ord("W")):
self.wakeup()
elif chr in (ord("r"), ord("R")):
self.run()
elif chr in (ord("s"), ord("S")):
self.sleep()
elif chr in (ord("k"), ord("K")):
self.kill()
elif chr in (ord("t"), ord("T")):
self.stop()
elif chr in (ord("d"), ord("D")):
self.delete()
elif chr in (ord("i"), ord("I")):
self.init()
elif chr in (ord("m"), ord("M")):
self.change_select_mode()
elif chr in (ord("p"), ord("P")):
self.purge_done()
elif chr == ord("1"):
self.display = 1
self.first_visible_line = 0
self.selected_line = 2
self.display_in_footer("only done processes are displayed")
elif chr == ord("2"):
self.display = 2
self.first_visible_line = 0
self.selected_line = 2
self.display_in_footer("only not done processes are displayed")
elif chr in (ord("q"), ord("Q")):
if curses.panel.top_panel() == self.panel:
self.panel.bottom()
curses.panel.update_panels()
else:
self.running = 0
return
def set_status(self, task_id, status):
return run_sql("UPDATE schTASK set status=%s WHERE id=%s", (status, task_id))
def set_progress(self, task_id, progress):
return run_sql("UPDATE schTASK set progress=%s WHERE id=%s", (progress, task_id))
def openlog(self):
task_id = self.currentrow[0]
status = self.currentrow[5]
if status != 'WAITING':
tmpname = os.tmpnam()
tmpfile = open(tmpname, "w")
try:
- tmpfile.write(open(os.path.join(logdir, 'bibsched_task_%d.log' % task_id)).read())
+ tmpfile.write(open(os.path.join(CFG_LOGDIR, 'bibsched_task_%d.log' % task_id)).read())
except IOError:
pass
try:
- tmpfile.write(open(os.path.join(logdir, 'bibsched_task_%d.err' % task_id)).read())
+ tmpfile.write(open(os.path.join(CFG_LOGDIR, 'bibsched_task_%d.err' % task_id)).read())
except IOError:
pass
tmpfile.close()
if CFG_BIBSCHED_LOG_PAGER:
pager = CFG_BIBSCHED_LOG_PAGER
else:
pager = os.environ.get('PAGER', '/bin/more')
curses.endwin()
os.spawnlp(os.P_WAIT, pager, pager, tmpname)
os.remove(tmpname)
curses.panel.update_panels()
def count_processes(self, status):
out = 0
res = run_sql("SELECT COUNT(id) FROM schTASK WHERE status=%s GROUP BY status", (status,))
try:
out = res[0][0]
except:
pass
return out
def wakeup(self):
task_id = self.currentrow[0]
process = self.currentrow[1]
status = self.currentrow[5]
if self.count_processes('RUNNING') + self.count_processes('CONTINUING') >= 1:
self.display_in_footer("a process is already running!")
elif status == "SLEEPING":
mypid = get_task_pid(process, task_id)
if mypid != 0:
os.kill(mypid, signal.SIGCONT)
self.display_in_footer("process woken up")
else:
self.display_in_footer("process is not sleeping")
self.stdscr.refresh()
def _display_YN_box(self, msg):
msg += ' (Y/N)'
rows = msg.split('\n')
height = len(rows) + 2
width = max([len(row) for row in rows]) + 4
self.win = curses.newwin(
height,
width,
(self.height - height) / 2 + 1,
(self.width - width) / 2 + 1
)
self.panel = curses.panel.new_panel( self.win )
self.panel.top()
self.win.border()
i = 1
for row in rows:
self.win.addstr(i, 2, row)
i += 1
self.win.refresh()
while 1:
c = self.win.getch()
if c in (ord('y'), ord('Y')):
return True
elif c in (ord('n'), ord('N')):
return False
def purge_done(self):
if self._display_YN_box("You are going to purge all the list of DONE tasks.\n"
"This will definitely alter your task history.\nAre you sure?"):
run_sql("DELETE FROM schTASK WHERE status='DONE'")
curses.panel.update_panels()
self.display_in_footer("DONE processes purged")
else:
curses.panel.update_panels()
def run(self):
task_id = self.currentrow[0]
process = self.currentrow[1]
status = self.currentrow[5]
sleeptime = self.currentrow[4]
if self.count_processes('RUNNING') + self.count_processes('CONTINUING') >= 1:
self.display_in_footer("a process is already running!")
elif status == "STOPPED" or status == "WAITING":
if process in self.helper_modules:
- program = os.path.join(bindir, process)
+ program = os.path.join(CFG_BINDIR, process)
fdout, fderr = get_output_channelnames(task_id)
COMMAND = "%s %s >> %s 2>> %s &" % (program, str(task_id), fdout, fderr)
os.system(COMMAND)
Log("manually running task #%d (%s)" % (task_id, process))
if sleeptime:
new_runtime = get_datetime(sleeptime)
new_task_arguments = marshal.loads(self.currentrow[7])
new_task_arguments["runtime"] = new_runtime
new_task_id = run_sql("INSERT INTO schTASK (proc,user,runtime,sleeptime,arguments,status)"\
" VALUES (%s,%s,%s,%s,%s,'WAITING')",
(process, self.currentrow[2], new_runtime, sleeptime,
self.currentrow[7]))
new_task_arguments["task"] = new_task_id
run_sql("""UPDATE schTASK SET arguments=%s WHERE id=%s""",
(marshal.dumps(new_task_arguments), new_task_id))
else:
self.display_in_footer("process status should be STOPPED or WAITING!")
self.stdscr.refresh()
def sleep(self):
task_id = self.currentrow[0]
process = self.currentrow[1]
status = self.currentrow[5]
if status != 'RUNNING' and status != 'CONTINUING':
self.display_in_footer("this process is not running!")
else:
mypid = get_task_pid(process, task_id)
if mypid != 0:
os.kill(mypid, signal.SIGUSR1)
self.display_in_footer("USR1 signal sent to process #%s" % mypid)
else:
self.set_status(task_id, 'STOPPED')
self.display_in_footer("cannot find process...")
self.stdscr.refresh()
def kill(self):
task_id = self.currentrow[0]
process = self.currentrow[1]
#status = self.currentrow[5]
mypid = get_task_pid(process, task_id)
if mypid != 0:
os.kill(mypid, signal.SIGKILL)
self.set_status(task_id, 'STOPPED')
self.display_in_footer("KILL signal sent to process #%s" % mypid)
else:
self.set_status(task_id, 'STOPPED')
self.display_in_footer("cannot find process...")
self.stdscr.refresh()
def stop(self):
task_id = self.currentrow[0]
process = self.currentrow[1]
#status = self.currentrow[5]
mypid = get_task_pid(process, task_id)
if mypid != 0:
os.kill(mypid, signal.SIGTERM)
self.display_in_footer("TERM signal sent to process #%s" % mypid)
else:
self.set_status(task_id, 'STOPPED')
self.display_in_footer("cannot find process...")
self.stdscr.refresh()
def delete(self):
task_id = self.currentrow[0]
#process = self.currentrow[1]
status = self.currentrow[5]
if status != 'RUNNING' and status != 'CONTINUING' and status != 'SLEEPING':
self.set_status(task_id, "%s_DELETED" % status)
self.display_in_footer("process deleted")
self.selected_line = max(self.selected_line, 2)
else:
self.display_in_footer("cannot delete running processes")
self.stdscr.refresh()
def init(self):
task_id = self.currentrow[0]
#process = self.currentrow[1]
status = self.currentrow[5]
if status != 'RUNNING' and status != 'CONTINUING' and status != 'SLEEPING':
self.set_status(task_id, "WAITING")
self.set_progress(task_id, "None")
self.display_in_footer("process initialised")
else:
self.display_in_footer("cannot initialise running processes")
self.stdscr.refresh()
def change_select_mode(self):
if self.move_mode:
self.move_mode = 0
else:
status = self.currentrow[5]
if status in ( "RUNNING" , "CONTINUING" , "SLEEPING" ):
self.display_in_footer("cannot move running processes!")
else:
self.move_mode = 1
self.stdscr.refresh()
def change_auto_mode(self):
if self.auto_mode:
- program = os.path.join(bindir, "bibsched")
+ program = os.path.join(CFG_BINDIR, "bibsched")
COMMAND = "%s -q stop" % program
os.system(COMMAND)
self.auto_mode = 0
else:
- program = os.path.join( bindir, "bibsched")
+ program = os.path.join( CFG_BINDIR, "bibsched")
COMMAND = "%s -q start" % program
os.system(COMMAND)
self.auto_mode = 1
self.move_mode = 0
self.stdscr.refresh()
def move_up(self):
self.display_in_footer("not implemented yet")
self.stdscr.refresh()
def move_down(self):
self.display_in_footer("not implemented yet")
self.stdscr.refresh()
def put_line(self, row):
col_w = [ 5 , 11 , 21 , 21 , 7 , 11 , 25 ]
maxx = self.width
if self.y == self.selected_line - self.first_visible_line and self.y > 1:
if self.auto_mode:
attr = curses.color_pair(2) + curses.A_STANDOUT + curses.A_BOLD
elif self.move_mode:
attr = curses.color_pair(7) + curses.A_STANDOUT + curses.A_BOLD
else:
attr = curses.color_pair(8) + curses.A_STANDOUT + curses.A_BOLD
self.item_status = row[5]
self.currentrow = row
elif self.y == 0:
if self.auto_mode:
attr = curses.color_pair(2) + curses.A_STANDOUT + curses.A_BOLD
elif self.move_mode:
attr = curses.color_pair(7) + curses.A_STANDOUT + curses.A_BOLD
else:
attr = curses.color_pair(8) + curses.A_STANDOUT + curses.A_BOLD
elif row[5] == "DONE":
attr = curses.color_pair(5) + curses.A_BOLD
elif row[5] == "STOPPED":
attr = curses.color_pair(6) + curses.A_BOLD
elif row[5].find("ERROR") > -1:
attr = curses.color_pair(4) + curses.A_BOLD
elif row[5] == "WAITING":
attr = curses.color_pair(3) + curses.A_BOLD
elif row[5] in ("RUNNING","CONTINUING") :
attr = curses.color_pair(2) + curses.A_BOLD
else:
attr = curses.A_BOLD
myline = str(row[0]).ljust(col_w[0])
myline += str(row[1]).ljust(col_w[1])
myline += str(row[2]).ljust(col_w[2])
myline += str(row[3])[:19].ljust(col_w[3])
myline += str(row[4]).ljust(col_w[4])
myline += str(row[5]).ljust(col_w[5])
myline += str(row[6]).ljust(col_w[6])
myline = myline.ljust(maxx)
self.stdscr.addnstr(self.y, 0, myline, maxx, attr)
self.y = self.y+1
def display_in_footer(self, footer, i = 0, print_time_p=0):
if print_time_p:
footer = "%s %s" % (footer, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
maxx = self.stdscr.getmaxyx()[1]
footer = footer.ljust(maxx)
if self.auto_mode:
colorpair = 2
elif self.move_mode:
colorpair = 7
else:
colorpair = 1
self.stdscr.addnstr(self.y - i, 0, footer, maxx - 1, curses.A_STANDOUT + curses.color_pair(colorpair) + curses.A_BOLD )
def repaint(self):
self.y = 0
self.stdscr.clear()
self.height, self.width = self.stdscr.getmaxyx()
maxy = self.height - 2
#maxx = self.width
self.put_line( ("ID", "PROC", "USER", "RUNTIME", "SLEEP", "STATUS", "PROGRESS") )
self.put_line( ("---", "----", "----", "-------------------", "-----", "-----", "--------") )
if self.selected_line > maxy + self.first_visible_line - 1:
self.first_visible_line = self.selected_line - maxy + 1
if self.selected_line < self.first_visible_line + 2:
self.first_visible_line = self.selected_line - 2
for row in self.rows[self.first_visible_line:self.first_visible_line+maxy-2]:
task_id, proc, user, runtime, sleeptime, status, progress, arguments = row
self.put_line( row )
self.y = self.stdscr.getmaxyx()[0] - 1
if self.auto_mode:
self.display_in_footer(self.footer_auto_mode, print_time_p=1)
elif self.move_mode:
self.display_in_footer(self.footer_move_mode, print_time_p=1)
else:
self.display_in_footer(self.footer_select_mode, print_time_p=1)
footer2 = ""
if self.item_status.find("DONE") > -1 or self.item_status == "ERROR" or self.item_status == "STOPPED":
footer2 += self.footer_stopped_item
elif self.item_status == "RUNNING" or self.item_status == "CONTINUING":
footer2 += self.footer_running_item
elif self.item_status == "SLEEPING":
footer2 += self.footer_sleeping_item
elif self.item_status == "WAITING":
footer2 += self.footer_waiting_item
self.display_in_footer(footer2, 1)
self.stdscr.refresh()
def start(self, stdscr):
ring = 0
if curses.has_colors():
curses.start_color()
curses.init_pair(8, curses.COLOR_WHITE, curses.COLOR_BLACK)
curses.init_pair(1, curses.COLOR_WHITE, curses.COLOR_RED)
curses.init_pair(2, curses.COLOR_GREEN, curses.COLOR_BLACK)
curses.init_pair(3, curses.COLOR_MAGENTA, curses.COLOR_BLACK)
curses.init_pair(4, curses.COLOR_RED, curses.COLOR_BLACK)
curses.init_pair(5, curses.COLOR_BLUE, curses.COLOR_BLACK)
curses.init_pair(6, curses.COLOR_CYAN, curses.COLOR_BLACK)
curses.init_pair(7, curses.COLOR_YELLOW, curses.COLOR_BLACK)
self.stdscr = stdscr
self.base_panel = curses.panel.new_panel( self.stdscr )
self.base_panel.bottom()
curses.panel.update_panels()
self.height, self.width = stdscr.getmaxyx()
self.stdscr.clear()
if server_pid (): self.auto_mode = 1
if self.display == 1:
where = "and status='DONE'"
order = "DESC"
else:
where = "and status!='DONE'"
order = "ASC"
self.rows = run_sql("""SELECT id,proc,user,runtime,sleeptime,status,progress,arguments FROM schTASK WHERE status NOT LIKE '%%DELETED%%' %s ORDER BY runtime %s""" % (where, order))
self.repaint()
ring = 0
while self.running:
ring += 1
char = -1
try:
char = timed_out(self.stdscr.getch, 1)
if char == 27: # escaping sequence
char = self.stdscr.getch()
if char == 79: # arrow
char = self.stdscr.getch()
if char == 65: #arrow up
char = curses.KEY_UP
elif char == 66: #arrow down
char = curses.KEY_DOWN
elif char == 72:
char = curses.KEY_PPAGE
elif char == 70:
char = curses.KEY_NPAGE
elif char == 91:
char = self.stdscr.getch()
if char == 53:
char = self.stdscr.getch()
if char == 126:
char = curses.KEY_HOME
except TimedOutExc:
char = -1
self.handle_keys(char)
if ring == 4:
if self.display == 1:
where = "and status='DONE'"
order = "DESC"
else:
where = "and status!='DONE'"
order = "ASC"
self.rows = run_sql("""SELECT id,proc,user,runtime,sleeptime,status,progress,arguments FROM schTASK WHERE status NOT LIKE '%%DELETED%%' %s ORDER BY runtime %s""" % (where, order))
ring = 0
self.repaint()
class BibSched:
def __init__(self):
self.helper_modules = cfg_valid_processes
self.running = {}
self.sleep_done = {}
self.sleep_sent = {}
self.stop_sent = {}
self.suicide_sent = {}
def set_status(self, task_id, status):
return run_sql("UPDATE schTASK set status=%s WHERE id=%s", (status, task_id))
def can_run( self, proc ):
return len( self.running.keys() ) == 0
def get_running_processes(self):
row = None
res = run_sql("SELECT id,proc,user,UNIX_TIMESTAMP(runtime),sleeptime,arguments,status FROM schTASK "\
" WHERE status='RUNNING' or status='CONTINUING' LIMIT 1")
try:
row = res[0]
except:
pass
return row
def handle_row( self, row ):
task_id, proc, user, runtime, sleeptime, arguments, status = row
if status == "SLEEP":
if task_id in self.running.keys():
self.set_status( task_id, "SLEEP SENT" )
os.kill( self.running[task_id], signal.SIGUSR1 )
self.sleep_sent[task_id] = self.running[task_id]
elif status == "SLEEPING":
if task_id in self.sleep_sent.keys():
self.sleep_done[task_id] = self.sleep_sent[task_id]
del self.sleep_sent[task_id]
if status == "WAKEUP":
if task_id in self.sleep_done.keys():
self.running[task_id] = self.sleep_done[task_id]
del self.sleep_done[task_id]
os.kill( self.running[task_id], signal.SIGCONT )
self.set_status( task_id, "RUNNING" )
elif status == "STOP":
if task_id in self.running.keys():
self.set_status( task_id, "STOP SENT" )
os.kill( self.running[task_id], signal.SIGTERM )
self.stop_sent[task_id] = self.running[task_id]
del self.running[task_id]
elif status == "STOPPED" and task_id in self.stop_sent.keys():
del self.stop_sent[task_id]
elif status == "SUICIDE":
if task_id in self.running.keys():
self.set_status( task_id, "SUICIDE SENT" )
os.kill( self.running[task_id], signal.SIGABRT )
self.suicide_sent[task_id] = self.running[task_id]
del self.running[task_id]
elif status == "SUICIDED" and task_id in self.suicide_sent.keys():
del self.suicide_sent[task_id]
elif status.find("DONE") > -1 and task_id in self.running.keys():
del self.running[task_id]
elif self.can_run(proc) and status == "WAITING" and runtime <= time.time():
if proc in self.helper_modules:
- program = os.path.join(bindir, proc)
+ program = os.path.join(CFG_BINDIR, proc)
fdout, fderr = get_output_channelnames(task_id)
COMMAND = "%s %s >> %s 2>> %s" % (program, str(task_id), fdout, fderr)
Log("task #%d (%s) started" % (task_id, proc))
os.system(COMMAND)
Log("task #%d (%s) ended" % (task_id, proc))
self.running[task_id] = get_task_pid(proc, task_id)
if sleeptime:
new_runtime = get_datetime(sleeptime)
new_task_arguments = marshal.loads(arguments)
new_task_arguments["runtime"] = new_runtime
new_task_id = run_sql("INSERT INTO schTASK (proc,user,runtime,sleeptime,arguments,status)"\
" VALUES (%s,%s,%s,%s,%s,'WAITING')",
(proc, user, new_runtime, sleeptime, arguments))
new_task_arguments["task"] = new_task_id
run_sql("""UPDATE schTASK SET arguments=%s WHERE id=%s""",
(marshal.dumps(new_task_arguments), new_task_id))
def watch_loop(self):
running_process = self.get_running_processes()
if running_process:
proc = running_process[ 1 ]
task_id = running_process[ 0 ]
if get_task_pid(proc, task_id):
self.running[task_id] = get_task_pid(proc, task_id)
else:
self.set_status(task_id,"ERROR")
rows = []
while 1:
for row in rows:
self.handle_row( row )
time.sleep(CFG_BIBSCHED_REFRESHTIME)
rows = run_sql("SELECT id,proc,user,UNIX_TIMESTAMP(runtime),sleeptime,arguments,status FROM schTASK ORDER BY runtime ASC")
class TimedOutExc(Exception):
def __init__(self, value = "Timed Out"):
self.value = value
def __str__(self):
return repr(self.value)
def timed_out(f, timeout, *args, **kwargs):
def handler(signum, frame):
raise TimedOutExc()
old = signal.signal(signal.SIGALRM, handler)
signal.alarm(timeout)
try:
result = f(*args, **kwargs)
finally:
signal.signal(signal.SIGALRM, old)
signal.alarm(0)
return result
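`timed_out()` above implements a SIGALRM-based timeout around an arbitrary call; the same pattern in isolation (Unix-only, since it relies on `signal.alarm`, and usable only from the main thread):

```python
import signal
import time

class TimedOutExc(Exception):
    pass

def timed_out(f, timeout, *args, **kwargs):
    # Install a SIGALRM handler that aborts f() after `timeout` seconds.
    def handler(signum, frame):
        raise TimedOutExc()
    old = signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)
    try:
        result = f(*args, **kwargs)
    finally:
        # Always restore the previous handler and cancel the alarm,
        # whether f() returned or was interrupted.
        signal.signal(signal.SIGALRM, old)
        signal.alarm(0)
    return result

try:
    timed_out(time.sleep, 1, 5)   # a 5-second sleep under a 1-second limit
    outcome = "finished"
except TimedOutExc:
    outcome = "timed out"
print(outcome)   # prints "timed out"
```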
def Log(message):
- log = open(logdir + "/bibsched.log","a")
+ log = open(CFG_LOGDIR + "/bibsched.log","a")
log.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime()))
log.write(message)
log.write("\n")
log.close()
def redirect_stdout_and_stderr():
"This function redirects stdout and stderr to bibsched.log and bibsched.err file."
- sys.stdout = open(logdir + "/bibsched.log", "a")
- sys.stderr = open(logdir + "/bibsched.err", "a")
+ sys.stdout = open(CFG_LOGDIR + "/bibsched.log", "a")
+ sys.stderr = open(CFG_LOGDIR + "/bibsched.err", "a")
def usage(exitcode=1, msg=""):
"""Prints usage info."""
if msg:
sys.stderr.write("Error: %s.\n" % msg)
sys.stderr.write ("""\
Usage: %s [options] [start|stop|restart|monitor|status]
The following commands are available for bibsched:
start start bibsched in background
stop stop a running bibsched
restart restart a running bibsched
monitor enter the interactive monitor
status get report about current status of the queue
Command options:
-d, --daemon\t Launch BibSched in the daemon mode (deprecated, use 'start')
General options:
-h, --help \t\t Print this help.
-V, --version \t\t Print version information.
""" % sys.argv [0])
#sys.stderr.write(" -v, --verbose=LEVEL \t Verbose level (0=min, 1=default, 9=max).\n")
sys.exit(exitcode)
pidfile = os.path.join(CFG_PREFIX, 'var', 'run', 'bibsched.pid')
def error (msg):
print >> sys.stderr, "error: " + msg
sys.exit (1)
def server_pid ():
# The pid must be stored on the filesystem
try:
pid = int (open (pidfile).read ())
except IOError:
return None
# Even if the pid is available, we check if it corresponds to an
# actual process, as it might have been killed externally
try:
os.kill (pid, signal.SIGCONT)
except OSError:
return None
return pid
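`server_pid()` above combines a pid file with a liveness probe. A minimal sketch of the same idea, using signal 0 (a pure existence check) instead of SIGCONT, and a temporary file standing in for the real pid file:

```python
import os
import signal
import tempfile

def server_pid(pidfile):
    # The pid must be stored on the filesystem...
    try:
        pid = int(open(pidfile).read())
    except (IOError, OSError, ValueError):
        return None
    # ...and must correspond to a live process: signal 0 delivers
    # nothing, but still performs the existence/permission check.
    try:
        os.kill(pid, 0)
    except OSError:
        return None
    return pid

pidfile = os.path.join(tempfile.mkdtemp(), "demo.pid")
open(pidfile, "w").write(str(os.getpid()))
print(server_pid(pidfile) == os.getpid())   # our own pid is certainly alive
```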
def start (verbose = True):
""" Fork this process in the background and start processing
requests. The process PID is stored in a pid file, so that it can
be stopped later on."""
if verbose:
sys.stdout.write ("starting bibsched: ")
sys.stdout.flush ()
pid = server_pid ()
if pid:
error ("another instance of bibsched (pid %d) is running" % pid)
# start the child process using the "double fork" technique
pid = os.fork ()
if pid > 0: sys.exit (0)
os.setsid ()
os.chdir ('/')
pid = os.fork ()
if pid > 0:
if verbose:
sys.stdout.write ('pid %d\n' % pid)
Log ("daemon started (pid %d)" % pid)
open (pidfile, 'w').write ('%d' % pid)
return
sys.stdin.close ()
redirect_stdout_and_stderr ()
sched = BibSched()
sched.watch_loop ()
return
def stop (verbose = True):
pid = server_pid ()
if not pid:
error ('bibsched seems not to be running.')
try: os.kill (pid, signal.SIGKILL)
except OSError:
print >> sys.stderr, 'no bibsched process found'
Log ("daemon stopped (pid %d)" % pid)
if verbose: print "stopping bibsched: pid %d" % pid
os.unlink (pidfile)
return
def monitor(verbose = True):
redirect_stdout_and_stderr()
manager = Manager()
return
def write_message(msg, stream=sys.stdout, verbose=1):
"""Write message and flush output stream (may be sys.stdout or sys.stderr).
Useful for debugging stuff."""
if msg:
if stream == sys.stdout or stream == sys.stderr:
stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ",
time.localtime()))
try:
stream.write("%s\n" % msg)
except UnicodeEncodeError:
stream.write("%s\n" % msg.encode('ascii', 'backslashreplace'))
stream.flush()
else:
sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream)
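`write_message()` falls back to a `'backslashreplace'` ASCII encoding when the stream cannot encode the message; the fallback in isolation:

```python
# A message containing non-ASCII characters (e-acute).
msg = u"r\u00e9sum\u00e9"

# 'backslashreplace' substitutes \xNN escapes for characters the
# target codec cannot represent, instead of raising UnicodeEncodeError.
safe = msg.encode("ascii", "backslashreplace")
print(safe)
```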
def report_queue_status(verbose=True):
"""
Report about the current status of BibSched queue on standard output.
"""
def report_about_processes(status='RUNNING'):
"""
Helper function to report about processes with the given status.
"""
res = run_sql("""SELECT id,proc,user,runtime,sleeptime,status,progress
FROM schTASK WHERE status=%s ORDER BY id ASC""",
(status,))
write_message("%s processes: %d" % (status, len(res)))
for (proc_id, proc_proc, proc_user, proc_runtime, proc_sleeptime,
proc_status, proc_progress) in res:
write_message(' * ID="%s" PROC="%s" USER="%s" RUNTIME="%s" SLEEPTIME="%s" STATUS="%s" PROGRESS="%s"' % \
(proc_id, proc_proc, proc_user, proc_runtime,
proc_sleeptime, proc_status, proc_progress))
return
write_message("BibSched queue status report for %s:" % gethostname())
report_about_processes('Running')
report_about_processes('Waiting')
write_message("Done.")
return
def restart(verbose = True):
stop(verbose)
start(verbose)
return
def main():
verbose = True
try:
opts, args = getopt.getopt(sys.argv[1:], "hVdq", [
"help","version","daemon", "quiet"])
except getopt.GetoptError, err:
Log ("Error: %s" % err)
usage(1, err)
for opt, arg in opts:
if opt in ["-h", "--help"]:
usage (0)
elif opt in ["-V", "--version"]:
print __revision__
sys.exit(0)
elif opt in ["-d", "--daemon"]:
redirect_stdout_and_stderr ()
sched = BibSched()
Log("daemon started")
sched.watch_loop()
elif opt in ['-q', '--quiet']:
verbose = False
else:
usage(1)
try: cmd = args [0]
except IndexError: cmd = 'monitor'
try:
{ 'start': start,
'stop': stop,
'restart': restart,
'monitor': monitor,
'status': report_queue_status, } [cmd] (verbose)
except KeyError:
usage (1, 'unknown command: %s' % `cmd`)
return
if __name__ == '__main__':
main()
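`main()` dispatches the chosen command through a plain dict of callables, with `KeyError` doubling as the unknown-command path. The pattern in isolation, with hypothetical stub handlers in place of the real `start`/`stop`:

```python
def start(verbose=True):
    return "starting"

def stop(verbose=True):
    return "stopping"

commands = {"start": start, "stop": stop}

def dispatch(cmd, verbose=True):
    # An unrecognized command surfaces as KeyError, mirroring
    # bibsched's fall-through to usage().
    try:
        return commands[cmd](verbose)
    except KeyError:
        return "unknown command: %s" % cmd

print(dispatch("start"))        # prints "starting"
print(dispatch("frobnicate"))   # prints "unknown command: frobnicate"
```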
diff --git a/modules/bibupload/lib/bibupload.py b/modules/bibupload/lib/bibupload.py
index ff416c07a..368793501 100644
--- a/modules/bibupload/lib/bibupload.py
+++ b/modules/bibupload/lib/bibupload.py
@@ -1,1598 +1,1598 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibUpload: Receive MARC XML file and update the appropriate database
tables according to options.
Usage: bibupload [options] input.xml
Examples:
$ bibupload -i input.xml
Options:
-a, --append new fields are appended to the existing record
-c, --correct fields are replaced by the new ones in the
existing record
-f, --format takes only the FMT fields into account.
Does not update
-i, --insert insert the new record in the database
-r, --replace the existing record is entirely replaced
by the new one
-z, --reference update references (update only 999 fields)
-s, --stage=STAGE stage to start from in the algorithm
(0: always done; 1: FMT tags;
2: FFT tags; 3: BibFmt; 4: Metadata update; 5: time update)
-n, --notimechange do not change record last modification date
when updating
Scheduling options:
-u, --user=USER user name to store task, password needed
General options:
-h, --help print this help and exit
-v, --verbose=LEVEL verbose level (from 0 to 9, default 1)
-V --version print the script version
"""
__revision__ = "$Id$"
import os
import sys
import time
from zlib import compress
import re
import urllib2
import tempfile
from invenio.config import CFG_OAI_ID_FIELD, weburl
from invenio.bibupload_config import *
from invenio.dbquery import run_sql, \
Error
from invenio.bibrecord import create_records, \
create_record, \
record_add_field, \
record_delete_field, \
record_xml_output, \
record_get_field_instances, \
record_get_field_values, \
field_get_subfield_values
from invenio.dateutils import convert_datestruct_to_datetext
from invenio.bibformat import format_record
-from invenio.config import filedir, \
- filedirsize, \
+from invenio.config import CFG_WEBSUBMIT_FILEDIR, \
+ CFG_WEBSUBMIT_FILESYSTEM_BIBDOC_GROUP_LIMIT, \
htdocsurl, \
- tmpdir, \
+ CFG_TMPDIR, \
CFG_PREFIX
from invenio.bibtask import task_init, write_message, get_datetime, \
task_set_option, task_get_option, task_get_task_param, task_update_status, \
task_update_progress
from invenio.bibdocfile import BibRecDocs, file_strip_ext, normalize_format
#Statistic variables
stat = {}
stat['nb_records_to_upload'] = 0
stat['nb_records_updated'] = 0
stat['nb_records_inserted'] = 0
stat['nb_errors'] = 0
stat['exectime'] = time.localtime()
### bibupload engine functions:
def bibupload(record, opt_tag=None, opt_mode=None,
opt_stage_to_start_from=1, opt_notimechange=0):
"""Main function: process a record and fit it in the tables
bibfmt, bibrec, bibrec_bibxxx, bibxxx with proper record
metadata.
Return (error_code, recID) of the processed record.
"""
assert(opt_mode in ('insert', 'replace', 'replace_or_insert', 'reference',
'correct', 'append', 'format'))
error = None
# If a specific tag was requested, check that it exists in the record
if opt_tag is not None and not(record.has_key(opt_tag)):
write_message(" Failed: Tag not found, enter a valid tag to update.",
verbose=1, stream=sys.stderr)
return (1, -1)
# Extraction of the Record Id from 001, SYSNO or OAIID tags:
rec_id = retrieve_rec_id(record, opt_mode)
if rec_id == -1:
return (1, -1)
elif rec_id > 0:
write_message(" -Retrieve record ID (found %s): DONE." % rec_id, verbose=2)
if not record.has_key('001'):
# Found record ID by means of SYSNO or OAIID, and the
# input MARCXML buffer does not have this 001 tag, so we
# should add it now:
error = record_add_field(record, '001', '', '', rec_id, [], 0)
if error is None:
write_message(" Failed: " \
"Error during adding the 001 controlfield " \
"to the record", verbose=1, stream=sys.stderr)
return (1, int(rec_id))
else:
error = None
write_message(" -Added tag 001: DONE.", verbose=2)
write_message(" -Check if the xml marc file is already in the database: DONE" , verbose=2)
# Reference mode: check if there are reference tags
if opt_mode == 'reference':
error = extract_tag_from_record(record, CFG_BIBUPLOAD_REFERENCE_TAG)
if error is None:
write_message(" Failed: No reference tags has been found...",
verbose=1, stream=sys.stderr)
return (1, -1)
else:
error = None
write_message(" -Check if reference tags exist: DONE", verbose=2)
if opt_mode == 'insert' or \
(opt_mode == 'replace_or_insert' and rec_id is None):
insert_mode_p = True
# Insert the record into the bibrec databases to have a recordId
rec_id = create_new_record()
write_message(" -Creation of a new record id (%d): DONE" % rec_id, verbose=2)
# we add the record Id control field to the record
error = record_add_field(record, '001', '', '', rec_id, [], 0)
if error is None:
write_message(" Failed: " \
"Error during adding the 001 controlfield " \
"to the record", verbose=1, stream=sys.stderr)
return (1, int(rec_id))
else:
error = None
elif opt_mode != 'insert' and opt_mode != 'format' and \
opt_stage_to_start_from != 5:
insert_mode_p = False
# Update Mode
# Retrieve the old record to update
rec_old = create_record(format_record(int(rec_id), 'xm'), 2)[0]
if rec_old is None:
write_message(" Failed during the creation of the old record!",
verbose=1, stream=sys.stderr)
return (1, int(rec_id))
else:
write_message(" -Retrieve the old record to update: DONE", verbose=2)
# In Replace mode, take over old strong tags if applicable:
if opt_mode == 'replace' or \
opt_mode == 'replace_or_insert':
copy_strong_tags_from_old_record(record, rec_old)
# Delete tags to correct in the record
if opt_mode == 'correct' or opt_mode == 'reference':
delete_tags_to_correct(record, rec_old, opt_tag)
write_message(" -Delete the old tags to correct in the old record: DONE",
verbose=2)
# Append new tag to the old record and update the new record with the old_record modified
if opt_mode == 'append' or opt_mode == 'correct' or \
opt_mode == 'reference':
record = append_new_tag_to_old_record(record, rec_old,
opt_tag, opt_mode)
write_message(" -Append new tags to the old record: DONE", verbose=2)
# now we clear all the rows from bibrec_bibxxx from the old
# record (they will be populated later (if needed) during
# stage 4 below):
delete_bibrec_bibxxx(rec_old, rec_id)
write_message(" -Clean bibrec_bibxxx: DONE", verbose=2)
write_message(" -Stage COMPLETED", verbose=2)
# Have a look if we have FMT tags
write_message("Stage 1: Start (Insert of FMT tags if exist).", verbose=2)
if opt_stage_to_start_from <= 1 and \
extract_tag_from_record(record, 'FMT') is not None:
record = insert_fmt_tags(record, rec_id, opt_mode)
if record is None:
write_message(" Stage 1 failed: Error while inserting FMT tags",
verbose=1, stream=sys.stderr)
return (1, int(rec_id))
elif record == 0:
# Mode format finished
stat['nb_records_updated'] += 1
return (0, int(rec_id))
write_message(" -Stage COMPLETED", verbose=2)
else:
write_message(" -Stage NOT NEEDED", verbose=2)
# Have a look if we have FFT tags
write_message("Stage 2: Start (Process FFT tags if exist).", verbose=2)
if opt_stage_to_start_from <= 2 and \
extract_tag_from_record(record, 'FFT') is not None:
record = elaborate_fft_tags(record, rec_id, opt_mode)
write_message(" -Stage COMPLETED", verbose=2)
else:
write_message(" -Stage NOT NEEDED", verbose=2)
# Update of the BibFmt
write_message("Stage 3: Start (Update bibfmt).", verbose=2)
if opt_stage_to_start_from <= 3:
# format the single record as xml
rec_xml_new = record_xml_output(record)
# Update bibfmt with the format xm of this record
if opt_mode != 'format':
error = update_bibfmt_format(rec_id, rec_xml_new, 'xm')
if error == 1:
write_message(" Failed: error during update_bibfmt_format",
verbose=1, stream=sys.stderr)
return (1, int(rec_id))
# archive MARCXML format of this record for version history purposes:
if opt_mode != 'format':
error = archive_marcxml_for_history(rec_id)
if error == 1:
write_message(" Failed to archive MARCXML for history",
verbose=1, stream=sys.stderr)
return (1, int(rec_id))
else:
write_message(" -Archived MARCXML for history : DONE", verbose=2)
write_message(" -Stage COMPLETED", verbose=2)
# Update the database MetaData
write_message("Stage 4: Start (Update the database with the metadata).",
verbose=2)
if opt_stage_to_start_from <= 4:
if opt_mode == 'insert' or \
opt_mode == 'replace' or \
opt_mode == 'replace_or_insert' or \
opt_mode == 'append' or \
opt_mode == 'correct' or \
opt_mode == 'reference':
update_database_with_metadata(record, rec_id)
else:
write_message(" -Stage NOT NEEDED in mode %s" % opt_mode,
verbose=2)
write_message(" -Stage COMPLETED", verbose=2)
else:
write_message(" -Stage NOT NEEDED", verbose=2)
# Finally we update the bibrec table with the current date
write_message("Stage 5: Start (Update bibrec table with current date).",
verbose=2)
if opt_stage_to_start_from <= 5 and \
opt_notimechange == 0 and \
not insert_mode_p:
now = convert_datestruct_to_datetext(time.localtime())
write_message(" -Retrieved current localtime: DONE", verbose=2)
update_bibrec_modif_date(now, rec_id)
write_message(" -Stage COMPLETED", verbose=2)
else:
write_message(" -Stage NOT NEEDED", verbose=2)
# Increase statistics
if insert_mode_p:
stat['nb_records_inserted'] += 1
else:
stat['nb_records_updated'] += 1
    # Upload of this record finished
write_message("Record "+str(rec_id)+" DONE", verbose=1)
return (0, int(rec_id))
def print_out_bibupload_statistics():
"""Print the statistics of the process"""
out = "Task stats: %(nb_input)d input records, %(nb_updated)d updated, " \
"%(nb_inserted)d inserted, %(nb_errors)d errors. " \
"Time %(nb_sec).2f sec." % { \
'nb_input': stat['nb_records_to_upload'],
'nb_updated': stat['nb_records_updated'],
'nb_inserted': stat['nb_records_inserted'],
'nb_errors': stat['nb_errors'],
'nb_sec': time.time() - time.mktime(stat['exectime']) }
write_message(out)
def open_marc_file(path):
"""Open a file and return the data"""
try:
# open the file containing the marc document
marc_file = open(path,'r')
marc = marc_file.read()
marc_file.close()
except IOError, erro:
write_message("Error: %s" % erro, verbose=1, stream=sys.stderr)
write_message("Exiting.", sys.stderr)
task_update_status("ERROR")
sys.exit(1)
return marc
def xml_marc_to_records(xml_marc):
"""create the records"""
# Creation of the records from the xml Marc in argument
recs = create_records(xml_marc, 1, 1)
if recs == []:
write_message("Error: Cannot parse MARCXML file.", verbose=1, stream=sys.stderr)
write_message("Exiting.", sys.stderr)
task_update_status("ERROR")
sys.exit(1)
elif recs[0][0] is None:
write_message("Error: MARCXML file has wrong format: %s" % recs,
verbose=1, stream=sys.stderr)
write_message("Exiting.", sys.stderr)
task_update_status("ERROR")
sys.exit(1)
else:
recs = map((lambda x:x[0]), recs)
return recs
def find_record_format(rec_id, format):
"""Look whether record REC_ID is formatted in FORMAT,
i.e. whether FORMAT exists in the bibfmt table for this record.
Return the number of times it is formatted: 0 if not, 1 if yes,
2 if found more than once (should never occur).
"""
out = 0
query = """SELECT COUNT(id) FROM bibfmt WHERE id_bibrec=%s AND format=%s"""
params = (rec_id, format)
res = []
try:
res = run_sql(query, params)
out = res[0][0]
except Error, error:
write_message(" Error during find_record_format() : %s " % error, verbose=1, stream=sys.stderr)
return out
def find_record_from_recid(rec_id):
"""
Try to find record in the database from the REC_ID number.
Return record ID if found, None otherwise.
"""
    res = None
    try:
        res = run_sql("SELECT id FROM bibrec WHERE id=%s",
                      (rec_id,))
    except Error, error:
        write_message(" Error during find_record_from_recid(): %s "
            % error, verbose=1, stream=sys.stderr)
if res:
return res[0][0]
else:
return None
def find_record_from_sysno(sysno):
"""
Try to find record in the database from the external SYSNO number.
Return record ID if found, None otherwise.
"""
bibxxx = 'bib'+CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:2]+'x'
bibrec_bibxxx = 'bibrec_' + bibxxx
    res = None
    try:
res = run_sql("""SELECT bb.id_bibrec FROM %(bibrec_bibxxx)s AS bb,
%(bibxxx)s AS b WHERE b.tag=%%s AND b.value=%%s
AND bb.id_bibxxx=b.id""" % \
{'bibxxx': bibxxx,
'bibrec_bibxxx': bibrec_bibxxx},
(CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG, sysno,))
except Error, error:
write_message(" Error during find_record_from_sysno(): %s " % error,
verbose=1, stream=sys.stderr)
if res:
return res[0][0]
else:
return None
def find_record_from_extoaiid(extoaiid):
"""
Try to find record in the database from the external EXTOAIID number.
Return record ID if found, None otherwise.
"""
bibxxx = 'bib'+CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:2]+'x'
bibrec_bibxxx = 'bibrec_' + bibxxx
    res = None
    try:
res = run_sql("""SELECT bb.id_bibrec FROM %(bibrec_bibxxx)s AS bb,
%(bibxxx)s AS b WHERE b.tag=%%s AND b.value=%%s
AND bb.id_bibxxx=b.id""" % \
{'bibxxx': bibxxx,
'bibrec_bibxxx': bibrec_bibxxx},
(CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG, extoaiid,))
except Error, error:
write_message(" Error during find_record_from_extoaiid(): %s "
% error, verbose=1, stream=sys.stderr)
if res:
return res[0][0]
else:
return None
def find_record_from_oaiid(oaiid):
"""
Try to find record in the database from the OAI ID number.
Return record ID if found, None otherwise.
"""
bibxxx = 'bib'+CFG_OAI_ID_FIELD[0:2]+'x'
bibrec_bibxxx = 'bibrec_' + bibxxx
    res = None
    try:
res = run_sql("""SELECT bb.id_bibrec FROM %(bibrec_bibxxx)s AS bb,
%(bibxxx)s AS b WHERE b.tag=%%s AND b.value=%%s
AND bb.id_bibxxx=b.id""" % \
{'bibxxx': bibxxx,
'bibrec_bibxxx': bibrec_bibxxx},
(CFG_OAI_ID_FIELD, oaiid,))
except Error, error:
write_message(" Error during find_record_from_oaiid(): %s " % error,
verbose=1, stream=sys.stderr)
if res:
return res[0][0]
else:
return None
def extract_tag_from_record(record, tag_number):
""" Extract the tag_number for record."""
# first step verify if the record is not already in the database
if record:
return record.get(tag_number, None)
return None
def retrieve_rec_id(record, opt_mode):
"""Retrieve the record Id from a record by using tag 001 or SYSNO or OAI ID
    tag. opt_mode is the desired mode."""
rec_id = None
# 1st step: we look for the tag 001
tag_001 = extract_tag_from_record(record, '001')
if tag_001 is not None:
# We extract the record ID from the tag
rec_id = tag_001[0][3]
# if we are in insert mode => error
if opt_mode == 'insert':
write_message(" Failed : Error tag 001 found in the xml" \
" submitted, you should use the option replace," \
" correct or append to replace an existing" \
" record. (-h for help)",
verbose=1, stream=sys.stderr)
return -1
else:
# we found the rec id and we are not in insert mode => continue
# we try to match rec_id against the database:
if find_record_from_recid(rec_id) is not None:
# okay, 001 corresponds to some known record
return rec_id
else:
                # The record doesn't exist yet. We shall try to check
                # the SYSNO or OAI id later.
write_message(" -Tag 001 value not found in database.",
verbose=9)
rec_id = None
else:
write_message(" -Tag 001 not found in the xml marc file.", verbose=9)
if rec_id is None:
# 2nd step we look for the SYSNO
sysnos = record_get_field_values(record,
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or "",
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or "",
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6])
if sysnos:
sysno = sysnos[0] # there should be only one external SYSNO
write_message(" -Checking if SYSNO " + sysno + \
" exists in the database", verbose=9)
# try to find the corresponding rec id from the database
rec_id = find_record_from_sysno(sysno)
if rec_id is not None:
# rec_id found
pass
else:
# The record doesn't exist yet. We will try to check
# external and internal OAI ids later.
write_message(" -Tag SYSNO value not found in database.",
verbose=9)
rec_id = None
else:
write_message(" -Tag SYSNO not found in the xml marc file.",
verbose=9)
if rec_id is None:
        # 3rd step: we look for the external OAI ID
extoaiids = record_get_field_values(record,
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or "",
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or "",
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6])
if extoaiids:
extoaiid = extoaiids[0] # there should be only one external OAI ID
write_message(" -Checking if EXTOAIID " + extoaiid + \
" exists in the database", verbose=9)
# try to find the corresponding rec id from the database
rec_id = find_record_from_extoaiid(extoaiid)
if rec_id is not None:
# rec_id found
pass
else:
# The record doesn't exist yet. We will try to check
# OAI id later.
write_message(" -Tag EXTOAIID value not found in database.",
verbose=9)
rec_id = None
else:
write_message(" -Tag EXTOAIID not found in the xml marc file.", verbose=9)
if rec_id is None:
# 4th step we look for the OAI ID
oaiidvalues = record_get_field_values(record,
CFG_OAI_ID_FIELD[0:3],
CFG_OAI_ID_FIELD[3:4] != "_" and \
CFG_OAI_ID_FIELD[3:4] or "",
CFG_OAI_ID_FIELD[4:5] != "_" and \
CFG_OAI_ID_FIELD[4:5] or "",
CFG_OAI_ID_FIELD[5:6])
if oaiidvalues:
oaiid = oaiidvalues[0] # there should be only one OAI ID
write_message(" -Check if local OAI ID " + oaiid + \
" exist in the database", verbose=9)
# try to find the corresponding rec id from the database
rec_id = find_record_from_oaiid(oaiid)
if rec_id is not None:
# rec_id found
pass
else:
write_message(" -Tag OAI ID value not found in database.",
verbose=9)
rec_id = None
else:
write_message(" -Tag SYSNO not found in the xml marc file.",
verbose=9)
# Now we should have detected rec_id from SYSNO or OAIID
# tags. (None otherwise.)
if rec_id:
if opt_mode == 'insert':
write_message(" Failed : Record found in the database," \
" you should use the option replace," \
" correct or append to replace an existing" \
" record. (-h for help)",
verbose=1, stream=sys.stderr)
return -1
else:
if opt_mode != 'insert' and \
opt_mode != 'replace_or_insert':
write_message(" Failed : Record not found in the database."\
" Please insert the file before updating it."\
" (-h for help)", verbose=1, stream=sys.stderr)
return -1
return rec_id
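The identifier cascade above (tag 001, then SYSNO, then external OAI id, then local OAI id) can be sketched as a generic lookup chain. A minimal modern-Python sketch; `match_record_id` and the lambda finders are illustrative names, not part of this module:

```python
def match_record_id(lookups):
    """Try each (label, finder) pair in order and return the first hit.

    Mirrors the cascade in retrieve_rec_id(): each finder returns a
    record id or None, and the first non-None result wins.
    """
    for label, finder in lookups:
        rec_id = finder()
        if rec_id is not None:
            return (label, rec_id)
    return (None, None)
```

For example, `match_record_id([('001', lambda: None), ('sysno', lambda: 42)])` yields `('sysno', 42)`.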
### Insert functions
def create_new_record():
"""Create new record in the database"""
now = convert_datestruct_to_datetext(time.localtime())
query = """INSERT INTO bibrec (creation_date, modification_date)
VALUES (%s, %s)"""
params = (now, now)
try:
rec_id = run_sql(query, params)
return rec_id
except Error, error:
write_message(" Error during the creation_new_record function : %s "
% error, verbose=1, stream=sys.stderr)
return None
def insert_bibfmt(id_bibrec, marc, format):
"""Insert the format in the table bibfmt"""
# compress the marc value
pickled_marc = compress(marc)
# get the current time
now = convert_datestruct_to_datetext(time.localtime())
query = """INSERT INTO bibfmt (id_bibrec, format, last_updated, value)
VALUES (%s, %s, %s, %s)"""
try:
row_id = run_sql(query, (id_bibrec, format, now, pickled_marc))
return row_id
except Error, error:
write_message(" Error during the insert_bibfmt function : %s "
% error, verbose=1, stream=sys.stderr)
return None
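insert_bibfmt() stores the value zlib-compressed (the compress() call above; the variable name pickled_marc is historical). A round-trip sketch, assuming plain zlib compression:

```python
from zlib import compress, decompress

def pack_marcxml(xml_text):
    # bibfmt's value column holds the zlib-compressed MARCXML
    return compress(xml_text.encode('utf-8'))

def unpack_marcxml(blob):
    return decompress(blob).decode('utf-8')
```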
def insert_record_bibxxx(tag, value):
"""Insert the record into bibxxx"""
# determine into which table one should insert the record
table_name = 'bib'+tag[0:2]+'x'
# check if the tag, value combination exists in the table
query = """SELECT id,value FROM %s """ % table_name
query += """ WHERE tag=%s AND value=%s"""
params = (tag, value)
    res = []
    try:
res = run_sql(query, params)
except Error, error:
write_message(" Error during the insert_record_bibxxx function : %s "
% error, verbose=1, stream=sys.stderr)
# Note: compare now the found values one by one and look for
# string binary equality (e.g. to respect lowercase/uppercase
# match), regardless of the charset etc settings. Ideally we
# could use a BINARY operator in the above SELECT statement, but
# we would have to check compatibility on various MySQLdb versions
# etc; this approach checks all matched values in Python, not in
# MySQL, which is less cool, but more conservative, so it should
# work better on most setups.
for row in res:
row_id = row[0]
row_value = row[1]
if row_value == value:
return (table_name, row_id)
# We got here only when the tag,value combination was not found,
# so it is now necessary to insert the tag,value combination into
# bibxxx table as new.
query = """INSERT INTO %s """ % table_name
query += """ (tag, value) values (%s , %s)"""
params = (tag, value)
    row_id = None
    try:
row_id = run_sql(query, params)
except Error, error:
write_message(" Error during the insert_record_bibxxx function : %s "
% error, verbose=1, stream=sys.stderr)
return (table_name, row_id)
def insert_record_bibrec_bibxxx(table_name, id_bibxxx,
field_number, id_bibrec):
"""Insert the record into bibrec_bibxxx"""
# determine into which table one should insert the record
full_table_name = 'bibrec_'+ table_name
# insert the proper row into the table
query = """INSERT INTO %s """ % full_table_name
query += """(id_bibrec,id_bibxxx, field_number) values (%s , %s, %s)"""
params = (id_bibrec, id_bibxxx, field_number)
    res = None
    try:
res = run_sql(query, params)
except Error, error:
write_message(" Error during the insert_record_bibrec_bibxxx"
" function 2nd query : %s " % error, verbose=1, stream=sys.stderr)
return res
def download_url(url, format):
"""Download a url (if it corresponds to a remote file) and return a local url
to it."""
protocol = urllib2.urlparse.urlsplit(url)[0]
    (tmp, tmppath) = tempfile.mkstemp(suffix=format, dir=CFG_TMPDIR)
tmp = os.fdopen(tmp, 'w')
try:
if protocol in ('', 'file'):
path = urllib2.urlparse.urlsplit(url)[2]
for allowed_path in CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS:
if path.startswith(allowed_path):
tmp.write(open(path).read())
return tmppath
raise StandardError, "%s is not in one of the allowed paths." % path
else:
tmp.write(urllib2.urlopen(url).read())
return tmppath
except Exception, e:
tmp.close()
os.remove(tmppath)
raise e
def make_mark_url(recid, docname, format):
"""Return a url valid to be used for MARC."""
return '%s/record/%s/files/%s%s' % (weburl, recid, docname, format)
def get_docname_from_url(url):
"""Return a potential docname given a url"""
path = urllib2.urlparse.urlsplit(url)[2]
filename = os.path.split(path)[-1]
return file_strip_ext(filename)
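get_docname_from_url() and the format detection in elaborate_fft_tags() both split a URL path into a basename and its extension. A modern-Python sketch of that split; note that the real file_strip_ext() consults a configured extension list (so compound extensions like '.tar.gz' may be handled differently), while os.path.splitext here is a simplification:

```python
import os
from urllib.parse import urlsplit  # stands in for urllib2.urlparse above

def split_docname_and_format(url):
    """Split a file URL into (docname, format): the basename without
    extension, and the extension including its leading dot."""
    path = urlsplit(url)[2]
    filename = os.path.split(path)[-1]
    docname, format = os.path.splitext(filename)
    return docname, format
```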
def elaborate_fft_tags(record, rec_id, mode):
"""
    Process FFT tags that should contain $a with file paths or URLs
to get the fulltext from. This function enriches record with
proper 8564 URL tags, downloads fulltext files and stores them
into var/data structure where appropriate.
CFG_BIBUPLOAD_WGET_SLEEP_TIME defines time to sleep in seconds in
between URL downloads.
Note: if an FFT tag contains multiple $a subfields, we upload them
into different 856 URL tags in the metadata. See regression test
case test_multiple_fft_insert_via_http().
"""
# Let's define some handy sub procedure.
def _synchronize_8564(rec_id, record, descriptions, comments, changed):
"""Sinchronize the 8564 tags for record with actual files. descriptions
should be a dictionary docname:description for the new description to be
inserted."""
write_message("Synchronizing MARC of recid '%s' with:\n%s\nwith descriptions: %s\nand comments: %s\nand changed urls: %s" % (rec_id, record, descriptions, comments, changed), verbose=9, stream=sys.stderr)
tags8564s = record_get_field_instances(record, '856', '4', ' ')
filtered_tags8564s = []
old_comments = {}
old_descriptions = {}
# Let's discover all the previous internal urls, in order to rebuild them!
for field in tags8564s:
to_be_removed = False
for value in field_get_subfield_values(field, 'u'):
if value.startswith('%s/record/%s/files/' % (weburl, rec_id)):
to_be_removed = True
description = field_get_subfield_values(field, 'y')
if description:
old_descriptions[value] = description[0]
comment = field_get_subfield_values(field, 'z')
if comment:
old_comments[value] = comment[0]
if not to_be_removed:
filtered_tags8564s.append(field)
# Let's keep in the record only external 8564
record_delete_field(record, '856', '4', ' ') # First we delete 8564
for field in filtered_tags8564s: # Then we readd external ones
record_add_field(record, '856', '4', ' ', '', field[0])
# Now we refresh with existing internal 8564
bibrecdocs = BibRecDocs(rec_id)
latest_files = bibrecdocs.list_latest_files()
for file in latest_files:
new_url = '%s/record/%s/files/%s' % (weburl, rec_id, file.get_full_name())
new_subfield = [('u', new_url)]
description = descriptions.get(new_url, '')
if description == 'KEEP-OLD-VALUE':
description = old_descriptions.get(changed.get(new_url, new_url), '')
if description:
new_subfield.append(('y', description))
comment = comments.get(new_url, '')
if comment == 'KEEP-OLD-VALUE':
comment = old_comments.get(changed.get(new_url, new_url), '')
if comment:
new_subfield.append(('z', comment))
record_add_field(record, '856', '4', ' ', '', new_subfield)
        # Let's handle all the icons
for bibdoc in bibrecdocs.list_bibdocs():
icon = bibdoc.get_icon()
if icon:
icon = icon.list_all_files()
if icon:
icon = icon[0].get_full_name()
new_subfield = [('q', '%s/record/%s/files/%s' % (weburl, rec_id, icon))]
new_subfield.append(('x', 'icon'))
record_add_field(record, '856', '4', ' ', '', new_subfield)
return record
def _add_new_format(bibdoc, url, format, docname, doctype, newname):
"""Adds a new format for a given bibdoc. Returns True when everything's fine."""
try:
tmpurl = download_url(url, format)
try:
bibdoc.add_file_new_format(tmpurl)
except StandardError, e:
write_message("('%s', '%s', '%s', '%s', '%s') not inserted because format already exists (%s)." % (url, format, docname, doctype, newname, e), stream=sys.stderr)
os.remove(tmpurl)
return False
except Exception, e:
write_message("Error in downloading '%s' because of: %s" % (url, e))
return False
return True
def _add_new_version(bibdoc, url, format, docname, doctype, newname):
"""Adds a new version for a given bibdoc. Returns True when everything's fine."""
try:
tmpurl = download_url(url, format)
try:
bibdoc.add_file_new_version(tmpurl)
except StandardError, e:
write_message("('%s', '%s', '%s', '%s', '%s') not inserted because '%s'." % (url, format, docname, doctype, newname, e), stream=sys.stderr)
os.remove(tmpurl)
return False
except Exception, e:
write_message("Error in downloading '%s' because of: %s" % (url, e))
return False
return True
def _add_new_icon(bibdoc, url, restriction):
"""Adds a new icon to an existing bibdoc, replacing the previous one if it exists. If url is empty, just remove the current icon."""
if not url:
bibdoc.delete_icon()
else:
try:
path = urllib2.urlparse.urlsplit(url)[2]
filename = os.path.split(path)[-1]
format = filename[len(file_strip_ext(filename)):].lower()
tmpurl = download_url(url, format)
try:
icondoc = bibdoc.add_icon(tmpurl, 'icon-%s' % bibdoc.get_docname())
if restriction and restriction != 'KEEP-OLD-VALUE':
icondoc.set_status(restriction)
except StandardError, e:
write_message("('%s', '%s') icon not added because '%s'." % (url, format, e), stream=sys.stderr)
os.remove(tmpurl)
return False
except Exception, e:
write_message("Error in downloading '%s' because of: %s" % (url, e))
return False
return True
tuple_list = extract_tag_from_record(record, 'FFT')
if tuple_list: # FFT Tags analysis
write_message("FFTs: "+str(tuple_list), verbose=9)
docs = {} # docnames and their data
comments = {} # files and their comments
descriptions = {} # docnames and their descriptions
changed = {} # local urls changed because of docname->newname
for fft in record_get_field_instances(record, 'FFT', ' ', ' '):
# Let's discover the type of the document
doctype = field_get_subfield_values(fft, 't')
if doctype:
doctype = doctype[0]
else: # Default is Main
doctype = 'Main'
# Let's discover the url.
url = field_get_subfield_values(fft, 'a')
if url:
url = url[0]
else:
url = ''
# Let's discover the description
description = field_get_subfield_values(fft, 'd')
if description:
description = description[0]
else:
description = ''
# Let's discover the desired docname to be created/altered
name = field_get_subfield_values(fft, 'n')
if name:
name = file_strip_ext(name[0])
else:
if url:
name = get_docname_from_url(url)
else:
                    write_message("fft '%s' doesn't specify a url or a name" % str(fft), stream=sys.stderr)
break
# Let's discover the desired new docname in case we want to change it
newname = field_get_subfield_values(fft, 'm')
if newname:
newname = file_strip_ext(newname[0])
else:
newname = name
# Let's discover the desired format
format = field_get_subfield_values(fft, 'f')
if format:
format = format[0]
else:
if url:
path = urllib2.urlparse.urlsplit(url)[2]
filename = os.path.split(path)[-1]
format = filename[len(file_strip_ext(filename)):]
else:
format = ''
format = normalize_format(format)
# Let's discover the icon
icon = field_get_subfield_values(fft, 'x')
if icon:
icon = icon[0]
else:
icon = ''
# Let's discover the comment
comment = field_get_subfield_values(fft, 'z')
if comment:
comment = comment[0]
else:
comment = ''
# Let's discover the restriction
restriction = field_get_subfield_values(fft, 'r')
if restriction:
restriction = restriction[0]
else:
restriction = ''
version = field_get_subfield_values(fft, 'v')
if version:
version = version[0]
else:
version = ''
if docs.has_key(name): # new format considered
(doctype2, newname2, restriction2, icon2, version2, urls) = docs[name]
if doctype2 != doctype:
write_message("fft '%s' specifies a different doctype from previous fft with docname '%s'" % (str(fft), name), stream=sys.stderr)
break
if newname2 != newname:
write_message("fft '%s' specifies a different newname from previous fft with docname '%s'" % (str(fft), name), stream=sys.stderr)
break
if restriction2 != restriction:
write_message("fft '%s' specifies a different restriction from previous fft with docname '%s'" % (str(fft), name), stream=sys.stderr)
break
if icon2 != icon:
                    write_message("fft '%s' specifies a different icon than the previous fft with docname '%s'" % (str(fft), name), stream=sys.stderr)
break
if version2 != version:
                    write_message("fft '%s' specifies a different version than the previous fft with docname '%s'" % (str(fft), name), stream=sys.stderr)
break
for (url2, format2) in urls:
if format == format2:
write_message("fft '%s' specifies a second file '%s' with the same format '%s' from previous fft with docname '%s'" % (str(fft), url, format, name), stream=sys.stderr)
if url:
urls.append((url, format))
else:
if url:
docs[name] = (doctype, newname, restriction, icon, version, [(url, format)])
else:
docs[name] = (doctype, newname, restriction, icon, version, [])
comments['%s/record/%s/files/%s%s' % (weburl, rec_id, newname, format)] = comment
descriptions['%s/record/%s/files/%s%s' % (weburl, rec_id, newname, format)] = description
if newname != name:
changed['%s/record/%s/files/%s%s' % (weburl, rec_id, newname, format)] = '%s/record/%s/files/%s%s' % (weburl, rec_id, name, format)
write_message('Result of FFT analysis:\n\tDocs: %s\n\tComments: %s\n\tDescriptions: %s' % (docs, comments, descriptions), verbose=9, stream=sys.stderr)
# Let's remove all FFT tags
record_delete_field(record, 'FFT', ' ', ' ')
# Preprocessed data elaboration
bibrecdocs = BibRecDocs(rec_id)
if mode == 'replace': # First we erase previous bibdocs
for bibdoc in bibrecdocs.list_bibdocs():
bibdoc.delete()
bibrecdocs.build_bibdoc_list()
for docname, (doctype, newname, restriction, icon, version, urls) in docs.iteritems():
write_message("Elaborating olddocname: '%s', newdocname: '%s', doctype: '%s', restriction: '%s', icon: '%s', urls: '%s', mode: '%s'" % (docname, newname, doctype, restriction, icon, urls, mode), verbose=9, stream=sys.stderr)
if mode in ('insert', 'replace'): # new bibdocs, new docnames, new marc
if newname in bibrecdocs.get_bibdoc_names(doctype):
write_message("('%s', '%s', '%s') not inserted because docname already exists." % (doctype, newname, urls), stream=sys.stderr)
break
try:
bibdoc = bibrecdocs.add_bibdoc(doctype, newname)
bibdoc.set_status(restriction)
except Exception, e:
write_message("('%s', '%s', '%s') not inserted because: '%s'." % (doctype, newname, urls, e), stream=sys.stderr)
break
for (url, format) in urls:
_add_new_format(bibdoc, url, format, docname, doctype, newname)
if icon and not icon == 'KEEP-OLD-VALUE':
_add_new_icon(bibdoc, icon, restriction)
elif mode == 'replace_or_insert': # to be thought as correct_or_insert
for bibdoc in bibrecdocs.list_bibdocs():
if bibdoc.get_docname() == docname:
if doctype not in ('PURGE', 'DELETE', 'EXPUNGE', 'REVERT'):
if newname != docname:
try:
bibdoc.change_name(newname)
except StandardError, e:
write_message(e, stream=sys.stderr)
break
found_bibdoc = False
for bibdoc in bibrecdocs.list_bibdocs():
if bibdoc.get_docname() == docname:
found_bibdoc = True
if doctype == 'PURGE':
bibdoc.purge()
elif doctype == 'DELETE':
bibdoc.delete()
elif doctype == 'EXPUNGE':
bibdoc.expunge()
elif doctype == 'REVERT':
try:
bibdoc.revert(version)
except Exception, e:
                                write_message('(%s, %s) not correctly reverted: %s' % (newname, version, e))
else:
if restriction != 'KEEP-OLD-VALUE':
bibdoc.set_status(restriction)
# Since the docname already existed we have to first
# bump the version by pushing the first new file
# then pushing the other files.
if urls:
(first_url, first_format) = urls[0]
other_urls = urls[1:]
if not _add_new_version(bibdoc, first_url, first_format, docname, doctype, newname):
continue
for (url, format) in other_urls:
_add_new_format(bibdoc, url, format, docname, doctype, newname)
if icon != 'KEEP-OLD-VALUE':
_add_new_icon(bibdoc, icon, restriction)
if not found_bibdoc:
bibdoc = bibrecdocs.add_bibdoc(doctype, newname)
for (url, format) in urls:
_add_new_format(bibdoc, url, format, docname, doctype, newname)
if icon and not icon == 'KEEP-OLD-VALUE':
_add_new_icon(bibdoc, icon, restriction)
elif mode == 'correct':
for bibdoc in bibrecdocs.list_bibdocs():
if bibdoc.get_docname() == docname:
if doctype not in ('PURGE', 'DELETE', 'EXPUNGE', 'REVERT'):
if newname != docname:
try:
bibdoc.change_name(newname)
except StandardError, e:
write_message(e, stream=sys.stderr)
break
found_bibdoc = False
for bibdoc in bibrecdocs.list_bibdocs():
if bibdoc.get_docname() == newname:
found_bibdoc = True
if doctype == 'PURGE':
bibdoc.purge()
elif doctype == 'DELETE':
bibdoc.delete()
elif doctype == 'EXPUNGE':
bibdoc.expunge()
elif doctype == 'REVERT':
try:
bibdoc.revert(version)
except Exception, e:
write_message('(%s, %s) not correctly reverted: %s' % (newname, version, e))
else:
if restriction != 'KEEP-OLD-VALUE':
bibdoc.set_status(restriction)
if urls:
(first_url, first_format) = urls[0]
other_urls = urls[1:]
if not _add_new_version(bibdoc, first_url, first_format, docname, doctype, newname):
continue
for (url, format) in other_urls:
                                    _add_new_format(bibdoc, url, format, docname, doctype, newname)
if icon != 'KEEP-OLD-VALUE':
_add_new_icon(bibdoc, icon, restriction)
if not found_bibdoc:
                write_message("('%s', '%s', '%s') not added because docname '%s' didn't exist." % (doctype, newname, urls, docname), stream=sys.stderr)
elif mode == 'append':
found_bibdoc = False
for bibdoc in bibrecdocs.list_bibdocs():
if bibdoc.get_docname() == docname:
found_bibdoc = True
for (url, format) in urls:
_add_new_format(bibdoc, url, format, docname, doctype, newname)
if icon not in ('', 'KEEP-OLD-VALUE'):
_add_new_icon(bibdoc, icon, restriction)
if not found_bibdoc:
try:
bibdoc = bibrecdocs.add_bibdoc(doctype, docname)
bibdoc.set_status(restriction)
for (url, format) in urls:
_add_new_format(bibdoc, url, format, docname, doctype, newname)
if icon and not icon == 'KEEP-OLD-VALUE':
_add_new_icon(bibdoc, icon, restriction)
except Exception, e:
write_message("('%s', '%s', '%s') not appended because: '%s'." % (doctype, newname, urls, e), stream=sys.stderr)
write_message('Changed urls: %s' % str(changed), verbose=9, stream=sys.stderr)
return _synchronize_8564(rec_id, record, descriptions, comments, changed)
else:
return record
def insert_fmt_tags(record, rec_id, opt_mode):
"""Process and insert FMT tags"""
fmt_fields = record_get_field_instances(record, 'FMT')
if fmt_fields:
for fmt_field in fmt_fields:
# Get the f, g subfields of the FMT tag
try:
f_value = field_get_subfield_values(fmt_field, "f")[0]
except IndexError:
f_value = ""
try:
g_value = field_get_subfield_values(fmt_field, "g")[0]
except IndexError:
g_value = ""
# Update the format
res = update_bibfmt_format(rec_id, g_value, f_value)
if res == 1:
write_message(" Failed: Error during update_bibfmt", verbose=1, stream=sys.stderr)
# If we are in format mode, we only care about the FMT tag
if opt_mode == 'format':
return 0
# We delete the FMT Tag of the record
record_delete_field(record, 'FMT')
write_message(" -Delete field FMT from record : DONE", verbose=2)
return record
elif opt_mode == 'format':
write_message(" Failed: Format updated failed : No tag FMT found", verbose=1, stream=sys.stderr)
return None
else:
return record
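The try/except IndexError pattern above fetches the first value of a subfield, defaulting to the empty string. The same idea as a helper; here `subfields` is a simplified list of (code, value) pairs, a hypothetical stand-in for the record structures this module uses:

```python
def first_subfield(subfields, code, default=''):
    """Return the first value stored under a subfield code, or default."""
    for subfield_code, value in subfields:
        if subfield_code == code:
            return value
    return default
```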
### Update functions
def update_bibrec_modif_date(now, bibrec_id):
"""Update the date of the record in bibrec table """
query = """UPDATE bibrec SET modification_date=%s WHERE id=%s"""
params = (now, bibrec_id)
try:
run_sql(query, params)
write_message(" -Update record modification date : DONE" , verbose=2)
except Error, error:
write_message(" Error during update_bibrec_modif_date function : %s" % error,
verbose=1, stream=sys.stderr)
def update_bibfmt_format(id_bibrec, format_value, format_name):
"""Update the format in the table bibfmt"""
# We check if the format is already in bibFmt
nb_found = find_record_format(id_bibrec, format_name)
if nb_found == 1:
# we are going to update the format
# get the current time
now = convert_datestruct_to_datetext(time.localtime())
# compress the format_value value
pickled_format_value = compress(format_value)
# update the format:
query = """UPDATE bibfmt SET last_updated=%s, value=%s WHERE id_bibrec=%s AND format=%s"""
params = (now, pickled_format_value, id_bibrec, format_name)
try:
row_id = run_sql(query, params)
if row_id is None:
write_message(" Failed: Error during update_bibfmt_format function", verbose=1, stream=sys.stderr)
return 1
else:
write_message(" -Update the format %s in bibfmt : DONE" % format_name , verbose=2)
return 0
except Error, error:
write_message(" Error during the update_bibfmt_format function : %s " % error, verbose=1, stream=sys.stderr)
elif nb_found > 1:
write_message(" Failed: Same format %s found several time in bibfmt for the same record." % format_name, verbose=1, stream=sys.stderr)
return 1
else:
# Insert the format information in BibFMT
res = insert_bibfmt(id_bibrec, format_value, format_name)
if res is None:
write_message(" Failed: Error during insert_bibfmt", verbose=1, stream=sys.stderr)
return 1
else:
write_message(" -Insert the format %s in bibfmt : DONE" % format_name , verbose=2)
return 0
def archive_marcxml_for_history(recID):
"""
Archive current MARCXML format of record RECID from BIBFMT table
into hstRECORD table. Useful to keep MARCXML history of records.
Return 0 if everything went fine. Return 1 otherwise.
"""
try:
res = run_sql("SELECT id_bibrec, value, last_updated FROM bibfmt WHERE format='xm' AND id_bibrec=%s",
(recID,))
if res:
run_sql("""INSERT INTO hstRECORD (id_bibrec, marcxml, job_id, job_name, job_person, job_date, job_details)
VALUES (%s,%s,%s,%s,%s,%s,%s)""",
(res[0][0], res[0][1], task_get_option('task', 0), 'bibupload', task_get_option('user','UNKNOWN'), res[0][2],
'mode: ' + task_get_option('mode','UNKNOWN') + '; file: ' + task_get_option('file_path','UNKNOWN') + '.'))
except Error, error:
write_message(" Error during archive_marcxml_for_history: %s " % error,
verbose=1, stream=sys.stderr)
return 1
return 0
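update_database_with_metadata() below assembles a full bibxxx tag key from the 3-digit tag, the two indicators (a blank indicator is stored as '_'), and the subfield code. A standalone sketch of that encoding; `build_full_tag` is an illustrative name, not a function of this module:

```python
def build_full_tag(tag, ind1, ind2, subfield_code=''):
    """Build the bibxxx key: 3-digit tag, indicators with blanks
    stored as '_', then the subfield code."""
    def norm(ind):
        return '_' if ind in ('', ' ') else ind
    return tag + norm(ind1) + norm(ind2) + subfield_code
```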
def update_database_with_metadata(record, rec_id):
"""Update the database tables with the record and the record id given in parameter"""
for tag in record.keys():
# check if tag is not a special one:
if tag not in CFG_BIBUPLOAD_SPECIAL_TAGS:
# for each tag there is a list of tuples representing datafields
tuple_list = record[tag]
# this list should contain the elements of a full tag [tag, ind1, ind2, subfield_code]
tag_list = []
tag_list.append(tag)
for single_tuple in tuple_list:
# these are the contents of a single tuple
subfield_list = single_tuple[0]
ind1 = single_tuple[1]
ind2 = single_tuple[2]
# append the ind's to the full tag
if ind1 == '' or ind1 == ' ':
tag_list.append('_')
else:
tag_list.append(ind1)
if ind2 == '' or ind2 == ' ':
tag_list.append('_')
else:
tag_list.append(ind2)
datafield_number = single_tuple[4]
if tag in CFG_BIBUPLOAD_SPECIAL_TAGS:
# nothing to do for special tags (FFT, FMT)
pass
elif tag in CFG_BIBUPLOAD_CONTROLFIELD_TAGS and tag != "001":
value = single_tuple[3]
# get the full tag
full_tag = ''.join(tag_list)
# update the tables
write_message(" insertion of the tag "+full_tag+" with the value "+value, verbose=9)
# insert the tag and value into bibxxx
(table_name, bibxxx_row_id) = insert_record_bibxxx(full_tag, value)
#print 'tname, bibrow', table_name, bibxxx_row_id;
if table_name is None or bibxxx_row_id is None:
write_message(" Failed during insert_record_bibxxx", verbose=1, stream=sys.stderr)
# connect bibxxx and bibrec with the table bibrec_bibxxx
res = insert_record_bibrec_bibxxx(table_name, bibxxx_row_id, datafield_number, rec_id)
if res is None:
write_message(" Failed during insert_record_bibrec_bibxxx", verbose=1, stream=sys.stderr)
else:
# get the tag and value from the content of each subfield
for subfield in subfield_list:
subtag = subfield[0]
value = subfield[1]
tag_list.append(subtag)
# get the full tag
full_tag = ''.join(tag_list)
# update the tables
write_message(" insertion of the tag "+full_tag+" with the value "+value, verbose=9)
# insert the tag and value into bibxxx
(table_name, bibxxx_row_id) = insert_record_bibxxx(full_tag, value)
if table_name is None or bibxxx_row_id is None:
write_message(" Failed during insert_record_bibxxx", verbose=1, stream=sys.stderr)
# connect bibxxx and bibrec with the table bibrec_bibxxx
res = insert_record_bibrec_bibxxx(table_name, bibxxx_row_id, datafield_number, rec_id)
if res is None:
write_message(" Failed during insert_record_bibrec_bibxxx", verbose=1, stream=sys.stderr)
# remove the subtag from the list
tag_list.pop()
tag_list.pop()
tag_list.pop()
tag_list.pop()
write_message(" -Update the database with metadata : DONE", verbose=2)
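For illustration, the tag/indicator bookkeeping that `update_database_with_metadata` performs with `tag_list` can be sketched as a standalone helper. This is a hypothetical function, not part of bibupload: it builds the bibxxx-style "full tag" in one step, storing blank indicators as underscores and appending the subfield code (absent for controlfields) last.

```python
def build_full_tag(tag, ind1, ind2, subfield_code=None):
    """Build a bibxxx-style full tag such as '100__a': blank
    indicators become underscores, and the subfield code (if any)
    is appended after the two indicators."""
    full = tag
    for ind in (ind1, ind2):
        full += '_' if ind in ('', ' ') else ind
    if subfield_code is not None:
        full += subfield_code
    return full
```

For example, a datafield `100` with blank indicators and subfield `a` yields the full tag `100__a`, which is the key used when inserting into the bib10x table.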
def append_new_tag_to_old_record(record, rec_old, opt_tag, opt_mode):
"""Append new tags to an old record"""
if opt_tag is not None:
tag = opt_tag
if tag in CFG_BIBUPLOAD_CONTROLFIELD_TAGS:
if tag == '001':
pass
else:
# if it is a controlfield, just access the value
for single_tuple in record[tag]:
controlfield_value = single_tuple[3]
# add the field to the old record
newfield_number = record_add_field(rec_old, tag, "", "", controlfield_value)
if newfield_number is None:
write_message(" Error when adding the field "+tag, verbose=1, stream=sys.stderr)
else:
# For each tag there is a list of tuples representing datafields
for single_tuple in record[tag]:
# We retrieve the information of the tag
subfield_list = single_tuple[0]
ind1 = single_tuple[1]
ind2 = single_tuple[2]
# We add the datafield to the old record
write_message(" Adding tag: %s ind1=%s ind2=%s code=%s" % (tag, ind1, ind2, subfield_list), verbose=9)
newfield_number = record_add_field(rec_old, tag, ind1, ind2, "", subfield_list)
if newfield_number is None:
write_message("Error when adding the field "+tag, verbose=1, stream=sys.stderr)
else:
# Go through each tag in the appended record
for tag in record.keys():
# Reference mode appends only the reference tag
if opt_mode == 'reference':
if tag == CFG_BIBUPLOAD_REFERENCE_TAG:
for single_tuple in record[tag]:
# We retrieve the information of the tag
subfield_list = single_tuple[0]
ind1 = single_tuple[1]
ind2 = single_tuple[2]
# We add the datafield to the old record
write_message(" Adding tag: %s ind1=%s ind2=%s code=%s" % (tag, ind1, ind2, subfield_list), verbose=9)
newfield_number = record_add_field(rec_old, tag, ind1, ind2, "", subfield_list)
if newfield_number is None:
write_message(" Error when adding the field "+tag, verbose=1, stream=sys.stderr)
else:
if tag in CFG_BIBUPLOAD_CONTROLFIELD_TAGS:
if tag == '001':
pass
else:
# if it is a controlfield, just access the value
for single_tuple in record[tag]:
controlfield_value = single_tuple[3]
# add the field to the old record
newfield_number = record_add_field(rec_old, tag, "", "", controlfield_value)
if newfield_number is None:
write_message(" Error when adding the field "+tag, verbose=1, stream=sys.stderr)
else:
# For each tag there is a list of tuples representing datafields
for single_tuple in record[tag]:
# We retrieve the information of the tag
subfield_list = single_tuple[0]
ind1 = single_tuple[1]
ind2 = single_tuple[2]
# We add the datafield to the old record
write_message(" Adding tag: %s ind1=%s ind2=%s code=%s" % (tag, ind1, ind2, subfield_list), verbose=9)
newfield_number = record_add_field(rec_old, tag, ind1, ind2, "", subfield_list)
if newfield_number is None:
write_message(" Error when adding the field "+tag, verbose=1, stream=sys.stderr)
return rec_old
def copy_strong_tags_from_old_record(record, rec_old):
"""
Look for strong tags in RECORD and REC_OLD. If no strong tags are
found in RECORD, then copy them over from REC_OLD. This function
modifies RECORD structure on the spot.
"""
for strong_tag in CFG_BIBUPLOAD_STRONG_TAGS:
if not record_get_field_instances(record, strong_tag):
strong_tag_old_field_instances = record_get_field_instances(rec_old, strong_tag)
if strong_tag_old_field_instances:
for strong_tag_old_field_instance in strong_tag_old_field_instances:
sf_vals, fi_ind1, fi_ind2, controlfield, dummy = strong_tag_old_field_instance
record_add_field(record, strong_tag, fi_ind1, fi_ind2, controlfield, sf_vals)
return
### Delete functions
def delete_tags_to_correct(record, rec_old, opt_tag):
"""
Delete tags from REC_OLD which are also existing in RECORD. When
deleting, pay attention not only to tags, but also to indicators,
so that fields with the same tags but different indicators are not
deleted.
"""
# browse through all the tags from the MARCXML file:
for tag in record.keys():
# do we have to delete only a special tag or any tag?
if opt_tag is None or opt_tag == tag:
# check if the tag exists in the old record too:
if rec_old.has_key(tag) and tag != '001':
# the tag does exist, so delete all record's tag+ind1+ind2 combinations from rec_old
for dummy_sf_vals, ind1, ind2, dummy_cf, dummy_field_number in record[tag]:
write_message(" Delete tag: " + tag + " ind1=" + ind1 + " ind2=" + ind2, verbose=9)
record_delete_field(rec_old, tag, ind1, ind2)
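The selection rule used by `delete_tags_to_correct` (same tag present in both records, never 001, matching indicators, optional single-tag filter) can be sketched as a side-effect-free helper. This is a hypothetical illustration that only lists the tag+indicator combinations correct mode would delete, using the same five-element field tuples as above:

```python
def fields_to_delete(record, rec_old, opt_tag=None):
    """List (tag, ind1, ind2) combinations that correct mode would
    delete from the old record: the tag exists in both records, is
    not 001, and matches the optional tag filter."""
    combos = []
    for tag, fields in record.items():
        if (opt_tag is None or opt_tag == tag) and tag in rec_old and tag != '001':
            # each field tuple is (subfields, ind1, ind2, controlfield, field_number)
            for _subfields, ind1, ind2, _cf, _num in fields:
                combos.append((tag, ind1, ind2))
    return combos
```

Fields with the same tag but different indicators survive, because only the exact tag+ind1+ind2 combinations listed here are removed.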
def delete_bibrec_bibxxx(record, id_bibrec):
"""Delete from the bibrec_bibxxx tables all rows belonging to the given record"""
# we clear all the rows from bibrec_bibxxx from the old record
for tag in record.keys():
if tag not in CFG_BIBUPLOAD_SPECIAL_TAGS:
# for each name construct the bibrec_bibxxx table name
table_name = 'bibrec_bib'+tag[0:2]+'x'
# delete all the records with proper id_bibrec
# note: the table name cannot be bound as an SQL parameter,
# so it is interpolated; the record id is passed as a bound parameter
query = """DELETE FROM `%s` WHERE id_bibrec=%%s""" % table_name
try:
run_sql(query, (id_bibrec,))
except Error, error:
write_message(" Error during the delete_bibrec_bibxxx function: %s " % error, verbose=1, stream=sys.stderr)
def wipe_out_record_from_all_tables(recid):
"""
Wipe out completely the record and all its traces of RECID from
the database (bibrec, bibrec_bibxxx, bibxxx, bibfmt). Useful for
the time being for test cases.
"""
# delete all the linked bibdocs
for bibdoc in BibRecDocs(recid).list_bibdocs():
bibdoc.expunge()
# delete from bibrec:
run_sql("DELETE FROM bibrec WHERE id=%s", (recid,))
# delete from bibrec_bibxxx:
for i in range(0, 10):
for j in range(0, 10):
run_sql("DELETE FROM %(bibrec_bibxxx)s WHERE id_bibrec=%%s" % \
{'bibrec_bibxxx': "bibrec_bib%i%ix" % (i, j)},
(recid,))
# delete all unused bibxxx values:
for i in range(0, 10):
for j in range(0, 10):
run_sql("DELETE %(bibxxx)s FROM %(bibxxx)s " \
" LEFT JOIN %(bibrec_bibxxx)s " \
" ON %(bibxxx)s.id=%(bibrec_bibxxx)s.id_bibxxx " \
" WHERE %(bibrec_bibxxx)s.id_bibrec IS NULL" % \
{'bibxxx': "bib%i%ix" % (i, j),
'bibrec_bibxxx': "bibrec_bib%i%ix" % (i, j)})
# delete from bibfmt:
run_sql("DELETE FROM bibfmt WHERE id_bibrec=%s", (recid,))
# delete from bibrec_bibdoc:
run_sql("DELETE FROM bibrec_bibdoc WHERE id_bibrec=%s", (recid,))
return
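The `bib%i%ix` loops above rely on the bibxxx naming scheme: the first two digits of a MARC tag select one of the 100 bibXXx value tables and its matching bibrec_bibXXx link table. A minimal sketch of that mapping, as a hypothetical helper (not part of bibupload):

```python
def bibxxx_table_names(tag):
    """Derive the table pair for a MARC tag: the first two digits of
    the tag pick the bibXXx table holding (full_tag, value) rows and
    the bibrec_bibXXx table linking them to records."""
    prefix = tag[0:2]
    return ('bib%sx' % prefix, 'bibrec_bib%sx' % prefix)
```

Iterating `i` and `j` over 0..9, as `wipe_out_record_from_all_tables` does, therefore visits every one of the 100 table pairs.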
def delete_bibdoc(id_bibrec):
"""Mark as DELETED all bibdoc documents corresponding to the given bibrec id"""
query = """UPDATE bibdoc SET status='DELETED'
WHERE id IN (SELECT id_bibdoc FROM bibrec_bibdoc
WHERE id_bibrec=%s)"""
params = (id_bibrec,)
try:
run_sql(query, params)
except Error, error:
write_message(" Error during the delete_bibdoc function: %s " % error,
verbose=1, stream=sys.stderr)
def delete_bibrec_bibdoc(id_bibrec):
"""Delete from the table bibrec_bibdoc all rows belonging to the given bibrec id"""
# delete all the records with proper id_bibrec
query = """DELETE FROM bibrec_bibdoc WHERE id_bibrec=%s"""
params = (id_bibrec,)
try:
run_sql(query, params)
except Error, error:
write_message(" Error during the delete_bibrec_bibdoc function: %s " % error,
verbose=1, stream=sys.stderr)
def main():
"""Main function constructing the bibtask."""
task_set_option('mode', None)
task_set_option('verbose', 1)
task_set_option("tag", None)
task_set_option("file_path", None)
task_set_option("notimechange", 0)
task_set_option("stage_to_start_from", 1)
task_init(authorization_action='runbibupload',
authorization_msg="BibUpload Task Submission",
description="""Receive MARC XML file and update appropriate database
tables according to options.
Examples:
$ bibupload -i input.xml
""",
help_specific_usage=""" -a, --append\t\tnew fields are appended to the existing record
-c, --correct\t\tfields are replaced by the new ones in the existing record
-f, --format\t\ttakes only the FMT fields into account; does not update other metadata
-i, --insert\t\tinsert the new record in the database
-r, --replace\t\tthe existing record is entirely replaced by the new one
-z, --reference\tupdate references (update only 999 fields)
-s, --stage=STAGE\tstage to start from in the algorithm (0: always done; 1: FMT tags;
\t\t\t2: FFT tags; 3: BibFmt; 4: Metadata update; 5: time update)
-n, --notimechange\tdo not change record last modification date when updating
""",
version=__revision__,
specific_params=("ircazs:fn",
[
"insert",
"replace",
"correct",
"append",
"reference",
"stage=",
"format",
"notimechange",
]),
task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter,
task_run_fnc=task_run_core)
def task_submit_elaborate_specific_parameter(key, value, opts, args):
""" Given the string key, check its meaning, possibly using the
value. Usually this fills some key in the options dict.
It must return True if it has elaborated the key, or False if it
does not know that key.
eg:
if key in ('-n', '--number'):
task_set_option('number', value)
return True
return False
"""
# No time change option
if key in ("-n", "--notimechange"):
task_set_option('notimechange', 1)
# Insert mode option
elif key in ("-i", "--insert"):
if task_get_option('mode') == 'replace':
# if also replace found, then set to replace_or_insert
task_set_option('mode', 'replace_or_insert')
else:
task_set_option('mode', 'insert')
task_set_option('file_path', os.path.abspath(args[0]))
# Replace mode option
elif key in ("-r", "--replace"):
if task_get_option('mode') == 'insert':
# if also insert found, then set to replace_or_insert
task_set_option('mode', 'replace_or_insert')
else:
task_set_option('mode', 'replace')
task_set_option('file_path', os.path.abspath(args[0]))
# Correct mode option
elif key in ("-c", "--correct"):
task_set_option('mode', 'correct')
task_set_option('file_path', os.path.abspath(args[0]))
# Append mode option
elif key in ("-a", "--append"):
task_set_option('mode', 'append')
task_set_option('file_path', os.path.abspath(args[0]))
# Reference mode option
elif key in ("-z", "--reference"):
task_set_option('mode', 'reference')
task_set_option('file_path', os.path.abspath(args[0]))
# Format mode option
elif key in ("-f", "--format"):
task_set_option('mode', 'format')
task_set_option('file_path', os.path.abspath(args[0]))
else:
return False
return True
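The `-i`/`-r` branches above combine into `replace_or_insert` when both flags are given, in either order. A minimal sketch of that mode-resolution rule as a standalone, hypothetical helper (the real parser also records the file path and the other modes):

```python
def resolve_mode(previous_mode, key):
    """Combine upload-mode flags the way the option parser does:
    seeing both -i and -r, in either order, yields
    'replace_or_insert'; otherwise the last flag wins."""
    if key in ('-i', '--insert'):
        return 'replace_or_insert' if previous_mode == 'replace' else 'insert'
    if key in ('-r', '--replace'):
        return 'replace_or_insert' if previous_mode == 'insert' else 'replace'
    return previous_mode
```

So `bibupload -i -r input.xml` and `bibupload -r -i input.xml` both end up in `replace_or_insert` mode.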
def task_submit_check_options():
""" Reimplement this method for having the possibility to check options
before submitting the task, in order for example to provide default
values. It must return False if there are errors in the options.
"""
if task_get_option('mode') is None:
write_message("Please specify at least one update/insert mode!")
return False
if task_get_option('file_path') is None:
write_message("Missing filename! -h for help.")
return False
return True
def task_run_core():
""" Reimplement to add the body of the task."""
error = 0
write_message("Input file '%s', input mode '%s'." %
(task_get_option('file_path'), task_get_option('mode')))
write_message("STAGE 0:", verbose=2)
if task_get_option('file_path') is not None:
recs = xml_marc_to_records(open_marc_file(task_get_option('file_path')))
stat['nb_records_to_upload'] = len(recs)
write_message(" -Open XML marc: DONE", verbose=2)
if recs is not None:
# We process the records one by one
for record in recs:
error = bibupload(
record,
opt_tag=task_get_option('tag'),
opt_mode=task_get_option('mode'),
opt_stage_to_start_from=task_get_option('stage_to_start_from'),
opt_notimechange=task_get_option('notimechange'))
if error[0] == 1:
if record:
write_message(record_xml_output(record),
stream=sys.stderr)
else:
write_message("Record could not be parsed",
stream=sys.stderr)
stat['nb_errors'] += 1
task_update_progress("Done %d out of %d." % \
(stat['nb_records_inserted'] + \
stat['nb_records_updated'],
stat['nb_records_to_upload']))
else:
write_message(" Error bibupload failed: No record found",
verbose=1, stream=sys.stderr)
if task_get_option('verbose') >= 1:
# Print out the statistics
print_out_bibupload_statistics()
# Check if there were errors
return not stat['nb_errors'] >= 1
if __name__ == "__main__":
main()
diff --git a/modules/bibupload/lib/bibupload_config.py b/modules/bibupload/lib/bibupload_config.py
index 5d7cb469b..de8aac2fa 100644
--- a/modules/bibupload/lib/bibupload_config.py
+++ b/modules/bibupload/lib/bibupload_config.py
@@ -1,60 +1,60 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
BibUpload Engine configuration.
"""
__revision__ = "$Id$"
-from invenio.config import tmpdir
+from invenio.config import CFG_TMPDIR
CFG_BIBUPLOAD_CONTROLFIELD_TAGS = ['001', '002', '003', '004',
'005', '006', '007', '008']
CFG_BIBUPLOAD_SPECIAL_TAGS = ['FMT', 'FFT']
-CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS = ('/tmp', '/home', '/afs', tmpdir)
+CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS = ('/tmp', '/home', '/afs', CFG_TMPDIR)
CFG_BIBUPLOAD_REFERENCE_TAG = '999'
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG = '970__a' # useful for matching when
# our records come from an
# external digital library
# system
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG = '035__a' # useful for matching when
# we harvest stuff via OAI
# that we do not want to
# reexport via Invenio
# OAI; so records may have
# only the source OAI ID
# stored in this tag (kind
# of like external system
# number too)
CFG_BIBUPLOAD_STRONG_TAGS = ['964'] # The list of tags that are strong
# enough to resist the replace
# mode. Useful for tags that
# might be created from an
# external non-metadata-like
# source, e.g. the information
# about the number of copies left.
diff --git a/modules/bibupload/lib/bibupload_regression_tests.py b/modules/bibupload/lib/bibupload_regression_tests.py
index 3e695e31f..c1188956b 100644
--- a/modules/bibupload/lib/bibupload_regression_tests.py
+++ b/modules/bibupload/lib/bibupload_regression_tests.py
@@ -1,3093 +1,3093 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
# pylint: disable-msg=C0301
"""Regression tests for the BibUpload."""
__revision__ = "$Id$"
import re
import unittest
import os
import time
from urllib2 import urlopen
from md5 import md5
-from invenio.config import CFG_OAI_ID_FIELD, CFG_PREFIX, weburl, tmpdir
+from invenio.config import CFG_OAI_ID_FIELD, CFG_PREFIX, weburl, CFG_TMPDIR
from invenio import bibupload
from invenio.bibupload_config import CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG, \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG, \
CFG_BIBUPLOAD_STRONG_TAGS
from invenio.search_engine import print_record
from invenio.dbquery import run_sql
from invenio.dateutils import convert_datestruct_to_datetext
from invenio.testutils import make_test_suite, warn_user_about_tests_and_run
from invenio.bibtask import task_set_option
from invenio.bibdocfile import BibRecDocs
# helper functions:
def remove_tag_001_from_xmbuffer(xmbuffer):
"""Remove tag 001 from MARCXML buffer. Useful for testing two
MARCXML buffers without paying attention to recIDs attributed
during the bibupload.
"""
return re.sub(r'<controlfield tag="001">.*</controlfield>', '', xmbuffer)
def compare_xmbuffers(xmbuffer1, xmbuffer2):
"""Compare two XM (XML MARC) buffers by removing whitespaces
before testing.
"""
def remove_blanks_from_xmbuffer(xmbuffer):
"""Remove \n and blanks from XMBUFFER."""
out = xmbuffer.replace("\n", "")
out = out.replace(" ", "")
return out
# remove whitespace:
xmbuffer1 = remove_blanks_from_xmbuffer(xmbuffer1)
xmbuffer2 = remove_blanks_from_xmbuffer(xmbuffer2)
if xmbuffer1 != xmbuffer2:
print "\n=" + xmbuffer1 + "=\n"
print "\n=" + xmbuffer2 + "=\n"
return (xmbuffer1 == xmbuffer2)
def remove_tag_001_from_hmbuffer(hmbuffer):
"""Remove tag 001 from HTML MARC buffer. Useful for testing two
HTML MARC buffers without paying attention to recIDs attributed
during the bibupload.
"""
return re.sub(r'(^|\n)(<pre>)?[0-9]{9}\s001__\s\d+($|\n)', '', hmbuffer)
def compare_hmbuffers(hmbuffer1, hmbuffer2):
"""Compare two HM (HTML MARC) buffers by removing whitespaces
before testing.
"""
# remove eventual <pre>, </pre> formatting:
hmbuffer1 = re.sub(r'^<pre>', '', hmbuffer1)
hmbuffer2 = re.sub(r'^<pre>', '', hmbuffer2)
hmbuffer1 = re.sub(r'</pre>$', '', hmbuffer1)
hmbuffer2 = re.sub(r'</pre>$', '', hmbuffer2)
# remove leading recid, leaving only field values:
hmbuffer1 = re.sub(r'(^|\n)[0-9]{9}\s', '', hmbuffer1)
hmbuffer2 = re.sub(r'(^|\n)[0-9]{9}\s', '', hmbuffer2)
# remove leading whitespace:
hmbuffer1 = re.sub(r'(^|\n)\s+', '', hmbuffer1)
hmbuffer2 = re.sub(r'(^|\n)\s+', '', hmbuffer2)
buffers_are_equal = (hmbuffer1 == hmbuffer2)
if not buffers_are_equal:
print "\n=" + hmbuffer1 + "=\n"
print "\n=" + hmbuffer2 + "=\n"
return buffers_are_equal
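The normalization steps performed on HM buffers before comparison can be condensed into one hypothetical helper (a sketch mirroring `compare_hmbuffers`, with the difference that it preserves the newlines the original substitutions swallow; the 001 line is handled separately by `remove_tag_001_from_hmbuffer`):

```python
import re

def normalize_hm(buf):
    """Normalize an HTML MARC buffer roughly as compare_hmbuffers
    does: drop the <pre> wrapper, the nine-digit record ids, and
    the leading indentation of each line."""
    buf = re.sub(r'^<pre>\n?', '', buf)       # opening <pre> wrapper
    buf = re.sub(r'\n?</pre>$', '', buf)      # closing </pre> wrapper
    buf = re.sub(r'(^|\n)[0-9]{9}\s', r'\1', buf)  # leading recids
    buf = re.sub(r'(^|\n)\s+', r'\1', buf)    # leading whitespace
    return buf
```

After this pass only the field labels and `$$`-prefixed subfield values remain, so two buffers describing the same record compare equal regardless of recid or indentation.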
def try_url_download(url):
"""Try to download a given URL"""
try:
open_url = urlopen(url)
open_url.read()
except Exception, e:
raise StandardError, "Downloading %s is impossible because of %s" \
% (url, str(e))
return True
class BibUploadInsertModeTest(unittest.TestCase):
"""Testing insert mode."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialise the MARCXML variable"""
self.test = """somethingTester, J YMITTester, K JCERN2Tester, GCERN3test11test31test12test32test13test33test21test41test22test42test14test51test52Tester, TCERN"""
self.test_hm = """
100__ $$aTester, T$$uCERN
111__ $$atest11$$ctest31
111__ $$atest12$$ctest32
111__ $$atest13$$ctest33
111__ $$btest21$$dtest41
111__ $$btest22$$dtest42
111__ $$atest14
111__ $$etest51
111__ $$etest52
245__ $$asomething
700__ $$aTester, J Y$$uMIT
700__ $$aTester, K J$$uCERN2
700__ $$aTester, G$$uCERN3
"""
def test_create_record_id(self):
"""bibupload - insert mode, trying to create a new record ID in the database"""
rec_id = bibupload.create_new_record()
self.assertNotEqual(-1, rec_id)
def test_no_retrieve_record_id(self):
"""bibupload - insert mode, detection of record ID in the input file"""
# We create the record out of the XML MARC
recs = bibupload.xml_marc_to_records(self.test)
# We call the function which should retrieve the record id
rec_id = bibupload.retrieve_rec_id(recs[0], 'insert')
# We compare the value found with None
self.assertEqual(None, rec_id)
def test_insert_complete_xmlmarc(self):
"""bibupload - insert mode, trying to insert complete MARCXML file"""
# Initialize the global variable
task_set_option('verbose', 0)
# We create the record out of the XML MARC
recs = bibupload.xml_marc_to_records(self.test)
# We call the main function with the record as a parameter
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# We retrieve the inserted xml
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
# Compare if the two MARCXML are the same
self.failUnless(compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm),
self.test))
self.failUnless(compare_hmbuffers(remove_tag_001_from_hmbuffer(inserted_hm),
self.test_hm))
class BibUploadAppendModeTest(unittest.TestCase):
"""Testing append mode."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize the MARCXML variable"""
self.test_existing = """123456789Tester, TDESY"""
self.test_to_append = """123456789Tester, UCERN"""
self.test_expected_xm = """123456789Tester, TDESYTester, UCERN"""
self.test_expected_hm = """
001__ 123456789
100__ $$aTester, T$$uDESY
100__ $$aTester, U$$uCERN
"""
# insert test record:
test_to_upload = self.test_existing.replace('123456789',
'')
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
self.test_recid = recid
# replace test buffers with real recid of inserted test record:
self.test_existing = self.test_existing.replace('123456789',
str(self.test_recid))
self.test_to_append = self.test_to_append.replace('123456789',
str(self.test_recid))
self.test_expected_xm = self.test_expected_xm.replace('123456789',
str(self.test_recid))
self.test_expected_hm = self.test_expected_hm.replace('123456789',
str(self.test_recid))
def test_retrieve_record_id(self):
"""bibupload - append mode, the input file should contain a record ID"""
task_set_option('verbose', 0)
# We create the record out of the XML MARC
recs = bibupload.xml_marc_to_records(self.test_to_append)
# We call the function which should retrieve the record id
rec_id = bibupload.retrieve_rec_id(recs[0], 'append')
# We compare the value found with None
self.assertEqual(str(self.test_recid), rec_id)
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(self.test_recid)
return
def test_update_modification_record_date(self):
"""bibupload - append mode, checking the update of the modification date"""
# Initialize the global variable
task_set_option('verbose', 0)
# We create the record out of the XML MARC
recs = bibupload.xml_marc_to_records(self.test_existing)
# We call the function which should retrieve the record id
rec_id = bibupload.retrieve_rec_id(recs[0], opt_mode='append')
# Retrieve current localtime
now = time.localtime()
# We update the modification date
bibupload.update_bibrec_modif_date(convert_datestruct_to_datetext(now), rec_id)
# We retrieve the modification date from the database
query = """SELECT DATE_FORMAT(modification_date,'%%Y-%%m-%%d %%H:%%i:%%s') FROM bibrec where id = %s"""
res = run_sql(query % rec_id)
# We compare the two results
self.assertEqual(res[0][0], convert_datestruct_to_datetext(now))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(self.test_recid)
return
def test_append_complete_xml_marc(self):
"""bibupload - append mode, appending complete MARCXML file"""
# Now we append a datafield
# We create the record out of the XML MARC
recs = bibupload.xml_marc_to_records(self.test_to_append)
# We call the main function with the record as a parameter
err, recid = bibupload.bibupload(recs[0], opt_mode='append')
# We retrieve the inserted xm
after_append_xm = print_record(recid, 'xm')
after_append_hm = print_record(recid, 'hm')
# Compare if the two MARCXML are the same
self.failUnless(compare_xmbuffers(after_append_xm, self.test_expected_xm))
self.failUnless(compare_hmbuffers(after_append_hm, self.test_expected_hm))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(self.test_recid)
return
class BibUploadCorrectModeTest(unittest.TestCase):
"""
Testing correcting a record containing similar tags (identical
tag, different indicators). Currently CDS Invenio replaces only
those tags whose indicators match too, unlike ALEPH500, which
pays no attention to indicators and corrects all fields with
the same tag, regardless of the indicator values.
"""
def setUp(self):
"""Initialize the MARCXML test record."""
self.testrec1_xm = """
123456789SzGeCERNTest, JaneTest InstituteTest, JohnTest UniversityCoolTest, JimTest Laboratory
"""
self.testrec1_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, Jane$$uTest Institute
10047 $$aTest, John$$uTest University
10048 $$aCool
10047 $$aTest, Jim$$uTest Laboratory
"""
self.testrec1_xm_to_correct = """
123456789Test, JosephTest AcademyTest2, JosephTest2 Academy
"""
self.testrec1_corrected_xm = """
123456789SzGeCERNTest, JaneTest InstituteCoolTest, JosephTest AcademyTest2, JosephTest2 Academy
"""
self.testrec1_corrected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, Jane$$uTest Institute
10048 $$aCool
10047 $$aTest, Joseph$$uTest Academy
10047 $$aTest2, Joseph$$uTest2 Academy
"""
# insert test record:
task_set_option('verbose', 0)
test_record_xm = self.testrec1_xm.replace('123456789',
'')
recs = bibupload.xml_marc_to_records(test_record_xm)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recID:
self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid))
self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid))
self.testrec1_xm_to_correct = self.testrec1_xm_to_correct.replace('123456789', str(recid))
self.testrec1_corrected_xm = self.testrec1_corrected_xm.replace('123456789', str(recid))
self.testrec1_corrected_hm = self.testrec1_corrected_hm.replace('123456789', str(recid))
# test of the inserted record:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm, self.testrec1_xm))
self.failUnless(compare_hmbuffers(inserted_hm, self.testrec1_hm))
def test_record_correction(self):
"""bibupload - correct mode, similar MARCXML tags/indicators"""
# correct some tags:
recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_correct)
err, recid = bibupload.bibupload(recs[0], opt_mode='correct')
corrected_xm = print_record(recid, 'xm')
corrected_hm = print_record(recid, 'hm')
# did it work?
self.failUnless(compare_xmbuffers(corrected_xm, self.testrec1_corrected_xm))
self.failUnless(compare_hmbuffers(corrected_hm, self.testrec1_corrected_hm))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(recid)
return
class BibUploadReplaceModeTest(unittest.TestCase):
"""Testing replace mode."""
def setUp(self):
"""Initialize the MARCXML test record."""
self.testrec1_xm = """
123456789SzGeCERNTest, JaneTest InstituteTest, JohnTest UniversityCoolTest, JimTest Laboratory
"""
self.testrec1_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, Jane$$uTest Institute
10047 $$aTest, John$$uTest University
10048 $$aCool
10047 $$aTest, Jim$$uTest Laboratory
"""
self.testrec1_xm_to_replace = """
123456789Test, JosephTest AcademyTest2, JosephTest2 Academy
"""
self.testrec1_replaced_xm = """
123456789Test, JosephTest AcademyTest2, JosephTest2 Academy
"""
self.testrec1_replaced_hm = """
001__ 123456789
10047 $$aTest, Joseph$$uTest Academy
10047 $$aTest2, Joseph$$uTest2 Academy
"""
# insert test record:
task_set_option('verbose', 0)
test_record_xm = self.testrec1_xm.replace('123456789',
'')
recs = bibupload.xml_marc_to_records(test_record_xm)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recID:
self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid))
self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid))
self.testrec1_xm_to_replace = self.testrec1_xm_to_replace.replace('123456789', str(recid))
self.testrec1_replaced_xm = self.testrec1_replaced_xm.replace('123456789', str(recid))
self.testrec1_replaced_hm = self.testrec1_replaced_hm.replace('123456789', str(recid))
# test of the inserted record:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm, self.testrec1_xm))
self.failUnless(compare_hmbuffers(inserted_hm, self.testrec1_hm))
def test_record_replace(self):
"""bibupload - replace mode, similar MARCXML tags/indicators"""
# replace some tags:
recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_replace)
err, recid = bibupload.bibupload(recs[0], opt_mode='replace')
replaced_xm = print_record(recid, 'xm')
replaced_hm = print_record(recid, 'hm')
# did it work?
self.failUnless(compare_xmbuffers(replaced_xm, self.testrec1_replaced_xm))
self.failUnless(compare_hmbuffers(replaced_hm, self.testrec1_replaced_hm))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(recid)
return
class BibUploadReferencesModeTest(unittest.TestCase):
"""Testing references mode."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize the MARCXML variable"""
self.test_insert = """123456789Tester, TCERN"""
self.test_reference = """123456789M. Lüscher and P. Weisz, String excitation energies in SU(N) gauge theories beyond the free-string approximation,J. High Energy Phys. 07 (2004) 014"""
self.test_reference_expected_xm = """123456789Tester, TCERNM. Lüscher and P. Weisz, String excitation energies in SU(N) gauge theories beyond the free-string approximation,J. High Energy Phys. 07 (2004) 014"""
self.test_insert_hm = """
001__ 123456789
100__ $$aTester, T$$uCERN
"""
self.test_reference_expected_hm = """
001__ 123456789
100__ $$aTester, T$$uCERN
%(reference_tag)sC5 $$mM. Lüscher and P. Weisz, String excitation energies in SU(N) gauge theories beyond the free-string approximation,$$sJ. High Energy Phys. 07 (2004) 014
""" % {'reference_tag': bibupload.CFG_BIBUPLOAD_REFERENCE_TAG}
# insert test record:
task_set_option('verbose', 0)
test_insert = self.test_insert.replace('123456789',
'')
recs = bibupload.xml_marc_to_records(test_insert)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recID:
self.test_insert = self.test_insert.replace('123456789', str(recid))
self.test_insert_hm = self.test_insert_hm.replace('123456789', str(recid))
self.test_reference = self.test_reference.replace('123456789', str(recid))
self.test_reference_expected_xm = self.test_reference_expected_xm.replace('123456789', str(recid))
self.test_reference_expected_hm = self.test_reference_expected_hm.replace('123456789', str(recid))
# test of the inserted record:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm, self.test_insert))
self.failUnless(compare_hmbuffers(inserted_hm, self.test_insert_hm))
self.test_recid = recid
def test_reference_complete_xml_marc(self):
"""bibupload - reference mode, inserting references MARCXML file"""
# We create the record out of the XML MARC
recs = bibupload.xml_marc_to_records(self.test_reference)
# We call the main function with the record as a parameter
err, recid = bibupload.bibupload(recs[0], opt_mode='reference')
# We retrieve the inserted xml
reference_xm = print_record(recid, 'xm')
reference_hm = print_record(recid, 'hm')
# Check whether the two MARCXML buffers are the same
self.failUnless(compare_xmbuffers(reference_xm, self.test_reference_expected_xm))
self.failUnless(compare_hmbuffers(reference_hm, self.test_reference_expected_hm))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(self.test_recid)
return
class BibUploadFMTModeTest(unittest.TestCase):
"""Testing FMT mode."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize the MARCXML variable"""
self.new_xm_with_fmt = """
<record>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="FMT" ind1=" " ind2=" ">
<subfield code="f">HB</subfield>
<subfield code="g">Test. Okay.</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux</subfield>
</datafield>
</record>
"""
self.expected_xm_after_inserting_new_xm_with_fmt = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux</subfield>
</datafield>
</record>
"""
self.expected_hm_after_inserting_new_xm_with_fmt = """
001__ 123456789
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux
"""
self.recid3_xm_before_all_the_tests = print_record(3, 'xm')
self.recid3_hm_before_all_the_tests = print_record(3, 'hm')
self.recid3_xm_with_fmt = """
<record>
<controlfield tag="001">3</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="FMT" ind1=" " ind2=" ">
<subfield code="f">HB</subfield>
<subfield code="g">Test. Here is some format value.</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Doe, John</subfield>
<subfield code="u">CERN</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the foos and bars</subfield>
</datafield>
</record>
"""
self.recid3_xm_with_fmt_only_first = """
<record>
<controlfield tag="001">3</controlfield>
<datafield tag="FMT" ind1=" " ind2=" ">
<subfield code="f">HB</subfield>
<subfield code="g">Test. Let us see if this gets inserted well.</subfield>
</datafield>
</record>
"""
self.recid3_xm_with_fmt_only_second = """
<record>
<controlfield tag="001">3</controlfield>
<datafield tag="FMT" ind1=" " ind2=" ">
<subfield code="f">HB</subfield>
<subfield code="g">Test. Yet another test, to be run after the first one.</subfield>
</datafield>
<datafield tag="FMT" ind1=" " ind2=" ">
<subfield code="f">HD</subfield>
<subfield code="g">Test. Let's see what will be stored in the detailed format field.</subfield>
</datafield>
</record>
"""
def restore_recid3(self):
"""Helper function that restores recID 3 MARCXML, using the
value saved before all the tests started to execute.
(see self.recid3_xm_before_all_the_tests).
Does not restore HB and HD formats.
"""
recs = bibupload.xml_marc_to_records(self.recid3_xm_before_all_the_tests)
err, recid = bibupload.bibupload(recs[0], opt_mode='replace')
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm, self.recid3_xm_before_all_the_tests))
self.failUnless(compare_hmbuffers(inserted_hm, self.recid3_hm_before_all_the_tests))
def test_inserting_new_record_containing_fmt_tag(self):
"""bibupload - FMT tag, inserting new record containing FMT tag"""
recs = bibupload.xml_marc_to_records(self.new_xm_with_fmt)
(dummy, new_recid) = bibupload.bibupload(recs[0], opt_mode='insert')
xm_after = print_record(new_recid, 'xm')
hm_after = print_record(new_recid, 'hm')
hb_after = print_record(new_recid, 'hb')
self.failUnless(compare_xmbuffers(xm_after,
self.expected_xm_after_inserting_new_xm_with_fmt.replace('123456789', str(new_recid))))
self.failUnless(compare_hmbuffers(hm_after,
self.expected_hm_after_inserting_new_xm_with_fmt.replace('123456789', str(new_recid))))
self.failUnless(hb_after.startswith("Test. Okay."))
def test_updating_existing_record_formats_in_format_mode(self):
"""bibupload - FMT tag, updating existing record via format mode"""
xm_before = print_record(3, 'xm')
hm_before = print_record(3, 'hm')
# insert first format value:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt_only_first)
bibupload.bibupload(recs[0], opt_mode='format')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
self.assertEqual(xm_after, xm_before)
self.assertEqual(hm_after, hm_before)
self.failUnless(hb_after.startswith("Test. Let us see if this gets inserted well."))
# now insert another format value and recheck:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt_only_second)
bibupload.bibupload(recs[0], opt_mode='format')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
hd_after = print_record(3, 'hd')
self.assertEqual(xm_after, xm_before)
self.assertEqual(hm_after, hm_before)
self.failUnless(hb_after.startswith("Test. Yet another test, to be run after the first one."))
self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field."))
# restore original record 3:
self.restore_recid3()
def test_updating_existing_record_formats_in_correct_mode(self):
"""bibupload - FMT tag, updating existing record via correct mode"""
xm_before = print_record(3, 'xm')
hm_before = print_record(3, 'hm')
# insert first format value:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt_only_first)
bibupload.bibupload(recs[0], opt_mode='correct')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
self.assertEqual(xm_after, xm_before)
self.assertEqual(hm_after, hm_before)
self.failUnless(hb_after.startswith("Test. Let us see if this gets inserted well."))
# now insert another format value and recheck:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt_only_second)
bibupload.bibupload(recs[0], opt_mode='correct')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
hd_after = print_record(3, 'hd')
self.assertEqual(xm_after, xm_before)
self.assertEqual(hm_after, hm_before)
self.failUnless(hb_after.startswith("Test. Yet another test, to be run after the first one."))
self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field."))
# restore original record 3:
self.restore_recid3()
def test_updating_existing_record_formats_in_replace_mode(self):
"""bibupload - FMT tag, updating existing record via replace mode"""
# insert first format value:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt_only_first)
bibupload.bibupload(recs[0], opt_mode='replace')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
self.failUnless(compare_xmbuffers(xm_after,
    '<record><controlfield tag="001">3</controlfield></record>'))
self.failUnless(compare_hmbuffers(hm_after,
    '000000003 001__ 3'))
self.failUnless(hb_after.startswith("Test. Let us see if this gets inserted well."))
# now insert another format value and recheck:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt_only_second)
bibupload.bibupload(recs[0], opt_mode='replace')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
hd_after = print_record(3, 'hd')
self.failUnless(compare_xmbuffers(xm_after, """
<record>
<controlfield tag="001">3</controlfield>
</record>"""))
self.failUnless(compare_hmbuffers(hm_after, '000000003 001__ 3'))
self.failUnless(hb_after.startswith("Test. Yet another test, to be run after the first one."))
self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field."))
# final insertion and recheck:
recs = bibupload.xml_marc_to_records(self.recid3_xm_with_fmt)
bibupload.bibupload(recs[0], opt_mode='replace')
xm_after = print_record(3, 'xm')
hm_after = print_record(3, 'hm')
hb_after = print_record(3, 'hb')
hd_after = print_record(3, 'hd')
self.failUnless(compare_xmbuffers(xm_after, """
<record>
<controlfield tag="001">3</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Doe, John</subfield>
<subfield code="u">CERN</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the foos and bars</subfield>
</datafield>
</record>
"""))
self.failUnless(compare_hmbuffers(hm_after, """
001__ 3
003__ SzGeCERN
100__ $$aDoe, John$$uCERN
245__ $$aOn the foos and bars
"""))
self.failUnless(hb_after.startswith("Test. Here is some format value."))
self.failUnless(hd_after.startswith("Test. Let's see what will be stored in the detailed format field."))
# restore original record 3:
self.restore_recid3()
class BibUploadRecordsWithSYSNOTest(unittest.TestCase):
"""Testing uploading of records that have external SYSNO present."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize the MARCXML test records."""
self.verbose = 0
# Note that SYSNO fields are repeated but with different
# subfields; this is to test that bibupload does not
# mistakenly pick up the wrong values.
self.xm_testrec1 = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="%(sysnosubfieldcode)s">sysno1</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="0">sysno2</subfield>
</datafield>
</record>
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ",
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ",
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
self.hm_testrec1 = """
001__ 123456789
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 1
%(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$%(sysnosubfieldcode)ssysno1
%(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$0sysno2
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4],
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5],
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
self.xm_testrec1_to_update = """
<record>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1 Updated</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="%(sysnosubfieldcode)s">sysno1</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="0">sysno2</subfield>
</datafield>
</record>
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ",
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ",
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
self.xm_testrec1_updated = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1 Updated</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="%(sysnosubfieldcode)s">sysno1</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="0">sysno2</subfield>
</datafield>
</record>
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ",
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ",
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
self.hm_testrec1_updated = """
001__ 123456789
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 1 Updated
%(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$%(sysnosubfieldcode)ssysno1
%(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$0sysno2
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4],
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5],
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
self.xm_testrec2 = """
<record>
<controlfield tag="001">987654321</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 2</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="%(sysnosubfieldcode)s">sysno2</subfield>
</datafield>
<datafield tag="%(sysnotag)s" ind1="%(sysnoind1)s" ind2="%(sysnoind2)s">
<subfield code="0">sysno1</subfield>
</datafield>
</record>
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4] or " ",
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5] or " ",
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
self.hm_testrec2 = """
001__ 987654321
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 2
%(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$%(sysnosubfieldcode)ssysno2
%(sysnotag)s%(sysnoind1)s%(sysnoind2)s $$0sysno1
""" % {'sysnotag': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[0:3],
'sysnoind1': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[3:4],
'sysnoind2': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[4:5],
'sysnosubfieldcode': CFG_BIBUPLOAD_EXTERNAL_SYSNO_TAG[5:6],
}
def test_insert_the_same_sysno_record(self):
"""bibupload - SYSNO tag, refuse to insert the same SYSNO record"""
# initialize bibupload mode:
if self.verbose:
print "test_insert_the_same_sysno_record() started"
# insert record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
task_set_option('verbose', 0)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# insert record 2 first time:
testrec_to_insert_first = self.xm_testrec2.replace('<controlfield tag="001">987654321</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
task_set_option('verbose', 0)
err2, recid2 = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid2, 'xm')
inserted_hm = print_record(recid2, 'hm')
# use real recID when comparing whether it worked:
self.xm_testrec2 = self.xm_testrec2.replace('987654321', str(recid2))
self.hm_testrec2 = self.hm_testrec2.replace('987654321', str(recid2))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec2))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec2))
# try to insert updated record 1, it should fail:
recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update)
task_set_option('verbose', 0)
err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='insert')
self.assertEqual(-1, recid1_updated)
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
bibupload.wipe_out_record_from_all_tables(recid1_updated)
if self.verbose:
print "test_insert_the_same_sysno_record() finished"
def test_insert_or_replace_the_same_sysno_record(self):
"""bibupload - SYSNO tag, allow inserting or replacing the same SYSNO record"""
# initialize bibupload mode:
task_set_option('verbose', self.verbose)
if self.verbose:
print "test_insert_or_replace_the_same_sysno_record() started"
# insert/replace record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# try to insert/replace updated record 1, it should be okay:
task_set_option('verbose', self.verbose)
recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update)
err1_updated, recid1_updated = bibupload.bibupload(recs[0],
opt_mode='replace_or_insert')
inserted_xm = print_record(recid1_updated, 'xm')
inserted_hm = print_record(recid1_updated, 'hm')
self.assertEqual(recid1, recid1_updated)
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1_updated = self.xm_testrec1_updated.replace('123456789', str(recid1))
self.hm_testrec1_updated = self.hm_testrec1_updated.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1_updated))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1_updated))
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid1_updated)
if self.verbose:
print "test_insert_or_replace_the_same_sysno_record() finished"
def test_replace_nonexisting_sysno_record(self):
"""bibupload - SYSNO tag, refuse to replace non-existing SYSNO record"""
# initialize bibupload mode:
task_set_option('verbose', self.verbose)
if self.verbose:
print "test_replace_nonexisting_sysno_record() started"
# insert record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# try to replace record 2; it should fail:
testrec_to_insert_first = self.xm_testrec2.replace('<controlfield tag="001">987654321</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err2, recid2 = bibupload.bibupload(recs[0], opt_mode='replace')
self.assertEqual(-1, recid2)
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
if self.verbose:
print "test_replace_nonexisting_sysno_record() finished"
class BibUploadRecordsWithEXTOAIIDTest(unittest.TestCase):
"""Testing uploading of records that have external EXTOAIID present."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize the MARCXML test records."""
self.verbose = 0
# Note that EXTOAIID fields are repeated but with different
# subfields; this is to test that bibupload does not
# mistakenly pick up the wrong values.
self.xm_testrec1 = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="%(extoaiidsubfieldcode)s">extoaiid1</subfield>
</datafield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="0">extoaiid2</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1</subfield>
</datafield>
</record>
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ",
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ",
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
self.hm_testrec1 = """
001__ 123456789
003__ SzGeCERN
%(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$%(extoaiidsubfieldcode)sextoaiid1
%(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$0extoaiid2
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 1
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4],
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5],
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
self.xm_testrec1_to_update = """
<record>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="%(extoaiidsubfieldcode)s">extoaiid1</subfield>
</datafield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="0">extoaiid2</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1 Updated</subfield>
</datafield>
</record>
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ",
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ",
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
self.xm_testrec1_updated = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="%(extoaiidsubfieldcode)s">extoaiid1</subfield>
</datafield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="0">extoaiid2</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1 Updated</subfield>
</datafield>
</record>
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ",
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ",
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
self.hm_testrec1_updated = """
001__ 123456789
003__ SzGeCERN
%(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$%(extoaiidsubfieldcode)sextoaiid1
%(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$0extoaiid2
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 1 Updated
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4],
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5],
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
self.xm_testrec2 = """
<record>
<controlfield tag="001">987654321</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="%(extoaiidsubfieldcode)s">extoaiid2</subfield>
</datafield>
<datafield tag="%(extoaiidtag)s" ind1="%(extoaiidind1)s" ind2="%(extoaiidind2)s">
<subfield code="0">extoaiid1</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 2</subfield>
</datafield>
</record>
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4] or " ",
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] != "_" and \
CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5] or " ",
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
self.hm_testrec2 = """
001__ 987654321
003__ SzGeCERN
%(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$%(extoaiidsubfieldcode)sextoaiid2
%(extoaiidtag)s%(extoaiidind1)s%(extoaiidind2)s $$0extoaiid1
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 2
""" % {'extoaiidtag': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[0:3],
'extoaiidind1': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[3:4],
'extoaiidind2': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[4:5],
'extoaiidsubfieldcode': CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG[5:6],
}
def test_insert_the_same_extoaiid_record(self):
"""bibupload - EXTOAIID tag, refuse to insert the same EXTOAIID record"""
# initialize bibupload mode:
task_set_option('verbose', self.verbose)
if self.verbose:
print "test_insert_the_same_extoaiid_record() started"
# insert record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# insert record 2 first time:
testrec_to_insert_first = self.xm_testrec2.replace('<controlfield tag="001">987654321</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err2, recid2 = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid2, 'xm')
inserted_hm = print_record(recid2, 'hm')
# use real recID when comparing whether it worked:
self.xm_testrec2 = self.xm_testrec2.replace('987654321', str(recid2))
self.hm_testrec2 = self.hm_testrec2.replace('987654321', str(recid2))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec2))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec2))
# try to insert updated record 1, it should fail:
recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update)
err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='insert')
self.assertEqual(-1, recid1_updated)
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
bibupload.wipe_out_record_from_all_tables(recid1_updated)
if self.verbose:
print "test_insert_the_same_extoaiid_record() finished"
def test_insert_or_replace_the_same_extoaiid_record(self):
"""bibupload - EXTOAIID tag, allow inserting or replacing the same EXTOAIID record"""
# initialize bibupload mode:
task_set_option('verbose', self.verbose)
if self.verbose:
print "test_insert_or_replace_the_same_extoaiid_record() started"
# insert/replace record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# try to insert/replace updated record 1, it should be okay:
recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update)
err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1_updated, 'xm')
inserted_hm = print_record(recid1_updated, 'hm')
self.assertEqual(recid1, recid1_updated)
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1_updated = self.xm_testrec1_updated.replace('123456789', str(recid1))
self.hm_testrec1_updated = self.hm_testrec1_updated.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1_updated))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1_updated))
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid1_updated)
if self.verbose:
print "test_insert_or_replace_the_same_extoaiid_record() finished"
def test_replace_nonexisting_extoaiid_record(self):
"""bibupload - EXTOAIID tag, refuse to replace non-existing EXTOAIID record"""
# initialize bibupload mode:
task_set_option('verbose', self.verbose)
if self.verbose:
print "test_replace_nonexisting_extoaiid_record() started"
# insert record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# try to replace record 2; it should fail:
testrec_to_insert_first = self.xm_testrec2.replace('<controlfield tag="001">987654321</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err2, recid2 = bibupload.bibupload(recs[0], opt_mode='replace')
self.assertEqual(-1, recid2)
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
if self.verbose:
print "test_replace_nonexisting_extoaiid_record() finished"
class BibUploadRecordsWithOAIIDTest(unittest.TestCase):
"""Testing uploading of records that have OAI ID present."""
def setUp(self):
# pylint: disable-msg=C0103
"""Initialize the MARCXML test records."""
self.verbose = 0
# Note that OAI fields are repeated but with different
# subfields; this is to test that bibupload does not
# mistakenly pick up the wrong values.
self.xm_testrec1 = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="%(oaisubfieldcode)s">oai:foo:1</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="0">oai:foo:2</subfield>
</datafield>
</record>
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \
CFG_OAI_ID_FIELD[3:4] or " ",
'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \
CFG_OAI_ID_FIELD[4:5] or " ",
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
self.hm_testrec1 = """
001__ 123456789
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 1
%(oaitag)s%(oaiind1)s%(oaiind2)s $$%(oaisubfieldcode)soai:foo:1
%(oaitag)s%(oaiind1)s%(oaiind2)s $$0oai:foo:2
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4],
'oaiind2': CFG_OAI_ID_FIELD[4:5],
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
self.xm_testrec1_to_update = """
<record>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1 Updated</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="%(oaisubfieldcode)s">oai:foo:1</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="0">oai:foo:2</subfield>
</datafield>
</record>
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \
CFG_OAI_ID_FIELD[3:4] or " ",
'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \
CFG_OAI_ID_FIELD[4:5] or " ",
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
self.xm_testrec1_updated = """
<record>
<controlfield tag="001">123456789</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 1 Updated</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="%(oaisubfieldcode)s">oai:foo:1</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="0">oai:foo:2</subfield>
</datafield>
</record>
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \
CFG_OAI_ID_FIELD[3:4] or " ",
'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \
CFG_OAI_ID_FIELD[4:5] or " ",
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
self.hm_testrec1_updated = """
001__ 123456789
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 1 Updated
%(oaitag)s%(oaiind1)s%(oaiind2)s $$%(oaisubfieldcode)soai:foo:1
%(oaitag)s%(oaiind1)s%(oaiind2)s $$0oai:foo:2
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4],
'oaiind2': CFG_OAI_ID_FIELD[4:5],
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
self.xm_testrec2 = """
<record>
<controlfield tag="001">987654321</controlfield>
<controlfield tag="003">SzGeCERN</controlfield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bar, Baz</subfield>
<subfield code="u">Foo</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">On the quux and huux 2</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="%(oaisubfieldcode)s">oai:foo:2</subfield>
</datafield>
<datafield tag="%(oaitag)s" ind1="%(oaiind1)s" ind2="%(oaiind2)s">
<subfield code="0">oai:foo:1</subfield>
</datafield>
</record>
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4] != "_" and \
CFG_OAI_ID_FIELD[3:4] or " ",
'oaiind2': CFG_OAI_ID_FIELD[4:5] != "_" and \
CFG_OAI_ID_FIELD[4:5] or " ",
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
self.hm_testrec2 = """
001__ 987654321
003__ SzGeCERN
100__ $$aBar, Baz$$uFoo
245__ $$aOn the quux and huux 2
%(oaitag)s%(oaiind1)s%(oaiind2)s $$%(oaisubfieldcode)soai:foo:2
%(oaitag)s%(oaiind1)s%(oaiind2)s $$0oai:foo:1
""" % {'oaitag': CFG_OAI_ID_FIELD[0:3],
'oaiind1': CFG_OAI_ID_FIELD[3:4],
'oaiind2': CFG_OAI_ID_FIELD[4:5],
'oaisubfieldcode': CFG_OAI_ID_FIELD[5:6],
}
def test_insert_the_same_oai_record(self):
"""bibupload - OAIID tag, refuse to insert the same OAI record"""
task_set_option('verbose', self.verbose)
# insert record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# insert record 2 first time:
testrec_to_insert_first = self.xm_testrec2.replace('<controlfield tag="001">987654321</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err2, recid2 = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid2, 'xm')
inserted_hm = print_record(recid2, 'hm')
# use real recID when comparing whether it worked:
self.xm_testrec2 = self.xm_testrec2.replace('987654321', str(recid2))
self.hm_testrec2 = self.hm_testrec2.replace('987654321', str(recid2))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec2))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec2))
# try to insert updated record 1, it should fail:
recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update)
err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='insert')
self.assertEqual(-1, recid1_updated)
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
bibupload.wipe_out_record_from_all_tables(recid1_updated)
def test_insert_or_replace_the_same_oai_record(self):
"""bibupload - OAIID tag, allow inserting or replacing the same OAI record"""
# initialize bibupload mode:
task_set_option('verbose', self.verbose)
# insert/replace record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# try to insert/replace updated record 1, it should be okay:
recs = bibupload.xml_marc_to_records(self.xm_testrec1_to_update)
err1_updated, recid1_updated = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1_updated, 'xm')
inserted_hm = print_record(recid1_updated, 'hm')
self.assertEqual(recid1, recid1_updated)
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1_updated = self.xm_testrec1_updated.replace('123456789', str(recid1))
self.hm_testrec1_updated = self.hm_testrec1_updated.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1_updated))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1_updated))
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid1_updated)
def test_replace_nonexisting_oai_record(self):
"""bibupload - OAIID tag, refuse to replace non-existing OAI record"""
task_set_option('verbose', self.verbose)
# insert record 1 first time:
testrec_to_insert_first = self.xm_testrec1.replace('<controlfield tag="001">123456789</controlfield>',
                                                   '')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='replace_or_insert')
inserted_xm = print_record(recid1, 'xm')
inserted_hm = print_record(recid1, 'hm')
# use real recID in test buffers when comparing whether it worked:
self.xm_testrec1 = self.xm_testrec1.replace('123456789', str(recid1))
self.hm_testrec1 = self.hm_testrec1.replace('123456789', str(recid1))
self.failUnless(compare_xmbuffers(inserted_xm,
self.xm_testrec1))
self.failUnless(compare_hmbuffers(inserted_hm,
self.hm_testrec1))
# try to replace record 2; it should fail:
testrec_to_insert_first = self.xm_testrec2.replace('987654321',
'')
recs = bibupload.xml_marc_to_records(testrec_to_insert_first)
err2, recid2 = bibupload.bibupload(recs[0], opt_mode='replace')
self.assertEqual(-1, recid2)
# delete test records
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
class BibUploadIndicatorsTest(unittest.TestCase):
"""
Testing the upload of a MARCXML record whose indicators contain
either a blank space (as per the MARC schema) or an empty string
(old behaviour).
"""
def setUp(self):
"""Initialize the MARCXML test record."""
self.testrec1_xm = """
SzGeCERNTest, JohnTest University
"""
self.testrec1_hm = """
003__ SzGeCERN
100__ $$aTest, John$$uTest University
"""
self.testrec2_xm = """
SzGeCERNTest, JohnTest University
"""
self.testrec2_hm = """
003__ SzGeCERN
100__ $$aTest, John$$uTest University
"""
def test_record_with_spaces_in_indicators(self):
"""bibupload - inserting MARCXML with spaces in indicators"""
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(self.testrec1_xm)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm),
self.testrec1_xm))
self.failUnless(compare_hmbuffers(remove_tag_001_from_hmbuffer(inserted_hm),
self.testrec1_hm))
bibupload.wipe_out_record_from_all_tables(recid)
def test_record_with_no_spaces_in_indicators(self):
"""bibupload - inserting MARCXML with no spaces in indicators"""
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(self.testrec2_xm)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(remove_tag_001_from_xmbuffer(inserted_xm),
self.testrec2_xm))
self.failUnless(compare_hmbuffers(remove_tag_001_from_hmbuffer(inserted_hm),
self.testrec2_hm))
bibupload.wipe_out_record_from_all_tables(recid)
class BibUploadUpperLowerCaseTest(unittest.TestCase):
"""
Testing treatment of similar records with only upper and lower
case value differences in the bibxxx table.
"""
def setUp(self):
"""Initialize the MARCXML test records."""
self.testrec1_xm = """
SzGeCERNTest, JohnTest University
"""
self.testrec1_hm = """
003__ SzGeCERN
100__ $$aTest, John$$uTest University
"""
self.testrec2_xm = """
SzGeCERNTeSt, JoHnTest UniVeRsity
"""
self.testrec2_hm = """
003__ SzGeCERN
100__ $$aTeSt, JoHn$$uTest UniVeRsity
"""
def test_record_with_upper_lower_case_letters(self):
"""bibupload - inserting similar MARCXML records with upper/lower case"""
task_set_option('verbose', 0)
# insert test record #1:
recs = bibupload.xml_marc_to_records(self.testrec1_xm)
err1, recid1 = bibupload.bibupload(recs[0], opt_mode='insert')
recid1_inserted_xm = print_record(recid1, 'xm')
recid1_inserted_hm = print_record(recid1, 'hm')
# insert test record #2:
recs = bibupload.xml_marc_to_records(self.testrec2_xm)
err1, recid2 = bibupload.bibupload(recs[0], opt_mode='insert')
recid2_inserted_xm = print_record(recid2, 'xm')
recid2_inserted_hm = print_record(recid2, 'hm')
# let us compare stuff now:
self.failUnless(compare_xmbuffers(remove_tag_001_from_xmbuffer(recid1_inserted_xm),
self.testrec1_xm))
self.failUnless(compare_hmbuffers(remove_tag_001_from_hmbuffer(recid1_inserted_hm),
self.testrec1_hm))
self.failUnless(compare_xmbuffers(remove_tag_001_from_xmbuffer(recid2_inserted_xm),
self.testrec2_xm))
self.failUnless(compare_hmbuffers(remove_tag_001_from_hmbuffer(recid2_inserted_hm),
self.testrec2_hm))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(recid1)
bibupload.wipe_out_record_from_all_tables(recid2)
class BibUploadStrongTagsTest(unittest.TestCase):
"""Testing treatment of strong tags and the replace mode."""
def setUp(self):
"""Initialize the MARCXML test record."""
self.testrec1_xm = """
123456789SzGeCERNTest, JaneTest InstituteTest titleA valueAnother value
""" % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]}
self.testrec1_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, Jane$$uTest Institute
245__ $$aTest title
%(strong_tag)s__ $$aA value$$bAnother value
""" % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]}
self.testrec1_xm_to_replace = """
123456789Test, JosephTest Academy
"""
self.testrec1_replaced_xm = """
123456789Test, JosephTest AcademyA valueAnother value
""" % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]}
self.testrec1_replaced_hm = """
001__ 123456789
100__ $$aTest, Joseph$$uTest Academy
%(strong_tag)s__ $$aA value$$bAnother value
""" % {'strong_tag': bibupload.CFG_BIBUPLOAD_STRONG_TAGS[0]}
# insert test record:
task_set_option('verbose', 0)
test_record_xm = self.testrec1_xm.replace('123456789',
'')
recs = bibupload.xml_marc_to_records(test_record_xm)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recID:
self.testrec1_xm = self.testrec1_xm.replace('123456789', str(recid))
self.testrec1_hm = self.testrec1_hm.replace('123456789', str(recid))
self.testrec1_xm_to_replace = self.testrec1_xm_to_replace.replace('123456789', str(recid))
self.testrec1_replaced_xm = self.testrec1_replaced_xm.replace('123456789', str(recid))
self.testrec1_replaced_hm = self.testrec1_replaced_hm.replace('123456789', str(recid))
# test of the inserted record:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm, self.testrec1_xm))
self.failUnless(compare_hmbuffers(inserted_hm, self.testrec1_hm))
def test_strong_tags_persistence(self):
"""bibupload - strong tags, persistence in replace mode"""
# replace all metadata tags; will the strong tags be kept?
recs = bibupload.xml_marc_to_records(self.testrec1_xm_to_replace)
err, recid = bibupload.bibupload(recs[0], opt_mode='replace')
replaced_xm = print_record(recid, 'xm')
replaced_hm = print_record(recid, 'hm')
# did it work?
self.failUnless(compare_xmbuffers(replaced_xm, self.testrec1_replaced_xm))
self.failUnless(compare_hmbuffers(replaced_hm, self.testrec1_replaced_hm))
# clean up after ourselves:
bibupload.wipe_out_record_from_all_tables(recid)
class BibUploadFFTModeTest(unittest.TestCase):
"""
Testing treatment of fulltext file transfer import mode.
"""
def _test_bibdoc_status(self, recid, docname, status):
res = run_sql('SELECT bd.status FROM bibrec_bibdoc as bb JOIN bibdoc as bd ON bb.id_bibdoc = bd.id WHERE bb.id_bibrec = %s AND bd.docname = %s', (recid, docname))
self.failUnless(res)
self.assertEqual(status, res[0][0])
def test_simple_fft_insert(self):
"""bibupload - simple FFT insert"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gif
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/cds.gif
""" % {'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
""" % {'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/cds.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self.failUnless(try_url_download(testrec_expected_url))
bibupload.wipe_out_record_from_all_tables(recid)
def test_exotic_format_fft_append(self):
"""bibupload - exotic format FFT append"""
# define the test case:
testfile = os.path.join(CFG_TMPDIR, 'test.ps.Z')
fd = open(testfile, 'w')
fd.write('TEST')
fd.close()
test_to_upload = """
SzGeCERNTest, JohnTest University
"""
testrec_to_append = """
123456789%s
""" % testfile
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/test.ps.Z
""" % {'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/test.ps.Z
""" % {'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/test.ps.Z" \
% {'weburl': weburl}
testrec_expected_url2 = "%(weburl)s/record/123456789/files/test?format=ps.Z" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_to_append = testrec_to_append.replace('123456789',
str(recid))
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
testrec_expected_url2 = testrec_expected_url2.replace('123456789',
str(recid))
recs = bibupload.xml_marc_to_records(testrec_to_append)
err, recid = bibupload.bibupload(recs[0], opt_mode='append')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self.assertEqual(urlopen(testrec_expected_url).read(), 'TEST')
self.assertEqual(urlopen(testrec_expected_url2).read(), 'TEST')
bibupload.wipe_out_record_from_all_tables(recid)
def test_fft_check_md5_through_bibrecdoc_str(self):
"""bibupload - simple FFT insert, check md5 through BibRecDocs.str()"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest University%s/img/head.gif
""" % weburl
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
original_md5 = md5(urlopen('%s/img/head.gif' % weburl).read()).hexdigest()
bibrec_str = str(BibRecDocs(int(recid)))
md5_found = False
for row in bibrec_str.split('\n'):
if 'checksum' in row:
if original_md5 in row:
md5_found = True
self.failUnless(md5_found)
bibupload.wipe_out_record_from_all_tables(recid)
def test_detailed_fft_insert(self):
"""bibupload - detailed FFT insert"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifSuperMainThis is a descriptionThis is a commentCIDIESSEhttp://cds.cern.ch/img/cds.gifSuperMain.jpegThis is a descriptionThis is a second commentCIDIESSE
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/CIDIESSE.gifThis is a descriptionThis is a comment%(weburl)s/record/123456789/files/CIDIESSE.jpegThis is a descriptionThis is a second comment
""" % {'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/CIDIESSE.gif$$yThis is a description$$zThis is a comment
8564_ $$u%(weburl)s/record/123456789/files/CIDIESSE.jpeg$$yThis is a description$$zThis is a second comment
""" % {'weburl': weburl}
testrec_expected_url1 = "%(weburl)s/record/123456789/files/CIDIESSE.gif" % {'weburl': weburl}
testrec_expected_url2 = "%(weburl)s/record/123456789/files/CIDIESSE.jpeg" % {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url1 = testrec_expected_url1.replace('123456789',
str(recid))
testrec_expected_url2 = testrec_expected_url2.replace('123456789',
str(recid))
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self.failUnless(try_url_download(testrec_expected_url1))
self.failUnless(try_url_download(testrec_expected_url2))
bibupload.wipe_out_record_from_all_tables(recid)
def test_simple_fft_insert_with_restriction(self):
"""bibupload - simple FFT insert with restriction"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifthesishttp://cds.cern.ch/img/cds.gif
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/cds.gif%(weburl)s/record/123456789/files/icon-cds.gificon
""" % {'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
8564_ $$q%(weburl)s/record/123456789/files/icon-cds.gif$$xicon
""" % {'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/cds.gif" \
% {'weburl': weburl}
testrec_expected_icon = "%(weburl)s/record/123456789/files/icon-cds.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
testrec_expected_icon = testrec_expected_icon.replace('123456789',
str(recid))
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
open_url = urlopen(testrec_expected_url)
self.failUnless("This file is restricted" in open_url.read())
open_icon = urlopen(testrec_expected_icon)
restricted_icon = urlopen("%s/img/restricted.gif" % weburl)
self.failUnless(open_icon.read() == restricted_icon.read())
bibupload.wipe_out_record_from_all_tables(recid)
def test_simple_fft_insert_with_icon(self):
"""bibupload - simple FFT insert with icon"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifhttp://cds.cern.ch/img/cds.gif
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/cds.gif%(weburl)s/record/123456789/files/icon-cds.gificon
""" % {'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
8564_ $$q%(weburl)s/record/123456789/files/icon-cds.gif$$xicon
""" % {'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/cds.gif" \
% {'weburl': weburl}
testrec_expected_icon = "%(weburl)s/record/123456789/files/icon-cds.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
testrec_expected_icon = testrec_expected_icon.replace('123456789',
str(recid))
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(try_url_download(testrec_expected_icon))
bibupload.wipe_out_record_from_all_tables(recid)
def test_multiple_fft_insert(self):
"""bibupload - multiple FFT insert"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifhttp://cdsweb.cern.ch/img/head.gifhttp://doc.cern.ch/archive/electronic/hep-th/0101/0101001.pdf%(prefix)s/var/tmp/demobibdata.xml/etc/passwd
""" % { 'prefix': CFG_PREFIX }
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/0101001.pdf%(weburl)s/record/123456789/files/cds.gif%(weburl)s/record/123456789/files/demobibdata.xml%(weburl)s/record/123456789/files/head.gif
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/0101001.pdf
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
8564_ $$u%(weburl)s/record/123456789/files/demobibdata.xml
8564_ $$u%(weburl)s/record/123456789/files/head.gif
""" % { 'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_urls = []
for files in ('cds.gif', 'head.gif', '0101001.pdf', 'demobibdata.xml'):
testrec_expected_urls.append('%(weburl)s/record/%(recid)s/files/%(files)s' % {'weburl' : weburl, 'files' : files, 'recid' : recid})
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
# FIXME: the XM buffer comparison below may be fragile since,
# apparently, the returned XML can have an unpredictable (though
# still correct) row order.  Comparing the HTML MARC output is safe
# because each value is represented by a single row, so a
# row-to-row comparison can be employed.
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
for url in testrec_expected_urls:
self.failUnless(try_url_download(url))
self._test_bibdoc_status(recid, 'head', '')
self._test_bibdoc_status(recid, '0101001', '')
self._test_bibdoc_status(recid, 'cds', '')
self._test_bibdoc_status(recid, 'demobibdata', '')
bibupload.wipe_out_record_from_all_tables(recid)
def test_simple_fft_correct(self):
"""bibupload - simple FFT correct"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gif
"""
test_to_correct = """
123456789http://cds.cern.ch/img/cds.gif
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/cds.gif
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/patata.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='correct')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'cds', '')
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_detailed_fft_correct(self):
"""bibupload - detailed FFT correct"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifTryComment
"""
test_to_correct = """
123456789http://cdsweb.cern.ch/img/head.gifcdspatataNext TryKEEP-OLD-VALUE
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/patata.gifNext TryComment
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/patata.gif$$yNext Try$$zComment
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/patata.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='correct')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'patata', '')
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_no_url_fft_correct(self):
"""bibupload - no_url FFT correct"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifTryComment
"""
test_to_correct = """
123456789cdspatata.gifKEEP-OLD-VALUENext Comment
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/patata.gifTryNext Comment
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/patata.gif$$yTry$$zNext Comment
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/patata.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='correct')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'patata', '')
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_new_icon_fft_append(self):
"""bibupload - new icon FFT append"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest University
"""
test_to_correct = """
123456789cdshttp://cds.cern.ch/img/cds.gif
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/icon-cds.gificon
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$q%(weburl)s/record/123456789/files/icon-cds.gif$$xicon
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/icon-cds.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='append')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'cds', '')
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_multiple_fft_correct(self):
"""bibupload - multiple FFT correct"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifTryCommentRestrictedhttp://cds.cern.ch/img/cds.gif.jpegTry jpegComment jpegRestricted
"""
test_to_correct = """
123456789http://cds.cern.ch/img/cds.gifpatata.gifNew restricted
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/patata.gif
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/patata.gif
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/patata.gif" \
% {'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='correct')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'patata', 'New restricted')
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_purge_fft_correct(self):
"""bibupload - purge FFT correct"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest Universityhttp://cds.cern.ch/img/cds.gifhttp://cdsweb.cern.ch/img/head.gif
"""
test_to_correct = """
123456789http://cds.cern.ch/img/cds.gif
"""
test_to_purge = """
123456789http://cds.cern.ch/img/cds.gifPURGE
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/cds.gif%(weburl)s/record/123456789/files/head.gif
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
8564_ $$u%(weburl)s/record/123456789/files/head.gif
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/cds.gif" % { 'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
test_to_purge = test_to_purge.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='correct')
# purge test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_purge)
bibupload.bibupload(recs[0], opt_mode='correct')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'cds', '')
self._test_bibdoc_status(recid, 'head', '')
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_revert_fft_correct(self):
"""bibupload - revert FFT correct"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest University%s/img/iconpen.gifcds
""" % weburl
test_to_correct = """
123456789%s/img/head.gifcds
""" % weburl
test_to_revert = """
123456789cdsREVERT1
"""
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/cds.gif
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/cds.gif
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/cds.gif" % { 'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_correct = test_to_correct.replace('123456789',
str(recid))
test_to_revert = test_to_revert.replace('123456789',
str(recid))
# correct test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_correct)
bibupload.bibupload(recs[0], opt_mode='correct')
# revert test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_revert)
bibupload.bibupload(recs[0], opt_mode='correct')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
self._test_bibdoc_status(recid, 'cds', '')
expected_content_version1 = urlopen('%s/img/iconpen.gif' % weburl).read()
expected_content_version2 = urlopen('%s/img/head.gif' % weburl).read()
expected_content_version3 = expected_content_version1
content_version1 = urlopen('%s/record/%s/files/cds.gif?version=1' % (weburl, recid)).read()
content_version2 = urlopen('%s/record/%s/files/cds.gif?version=2' % (weburl, recid)).read()
content_version3 = urlopen('%s/record/%s/files/cds.gif?version=3' % (weburl, recid)).read()
self.assertEqual(expected_content_version1, content_version1)
self.assertEqual(expected_content_version2, content_version2)
self.assertEqual(expected_content_version3, content_version3)
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
def test_simple_fft_replace(self):
"""bibupload - simple FFT replace"""
# define the test case:
test_to_upload = """
SzGeCERNTest, JohnTest University%s/img/iconpen.gifcds
""" % weburl
test_to_replace = """
123456789SzGeCERNTest, JohnTest University%s/img/head.gif
""" % weburl
testrec_expected_xm = """
123456789SzGeCERNTest, JohnTest University%(weburl)s/record/123456789/files/head.gif
""" % { 'weburl': weburl}
testrec_expected_hm = """
001__ 123456789
003__ SzGeCERN
100__ $$aTest, John$$uTest University
8564_ $$u%(weburl)s/record/123456789/files/head.gif
""" % { 'weburl': weburl}
testrec_expected_url = "%(weburl)s/record/123456789/files/head.gif" % { 'weburl': weburl}
# insert test record:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_upload)
err, recid = bibupload.bibupload(recs[0], opt_mode='insert')
# replace test buffers with real recid of inserted test record:
testrec_expected_xm = testrec_expected_xm.replace('123456789',
str(recid))
testrec_expected_hm = testrec_expected_hm.replace('123456789',
str(recid))
testrec_expected_url = testrec_expected_url.replace('123456789',
str(recid))
test_to_replace = test_to_replace.replace('123456789',
str(recid))
# replace test record with new FFT:
task_set_option('verbose', 0)
recs = bibupload.xml_marc_to_records(test_to_replace)
bibupload.bibupload(recs[0], opt_mode='replace')
# compare expected results:
inserted_xm = print_record(recid, 'xm')
inserted_hm = print_record(recid, 'hm')
self.failUnless(try_url_download(testrec_expected_url))
self.failUnless(compare_xmbuffers(inserted_xm,
testrec_expected_xm))
self.failUnless(compare_hmbuffers(inserted_hm,
testrec_expected_hm))
expected_content_version = urlopen('%s/img/head.gif' % weburl).read()
content_version = urlopen('%s/record/%s/files/head.gif' % (weburl, recid)).read()
self.assertEqual(expected_content_version, content_version)
#print "\nRecid: " + str(recid) + "\n"
#print testrec_expected_hm + "\n"
#print print_record(recid, 'hm') + "\n"
bibupload.wipe_out_record_from_all_tables(recid)
test_suite = make_test_suite(BibUploadInsertModeTest,
BibUploadAppendModeTest,
BibUploadCorrectModeTest,
BibUploadReplaceModeTest,
BibUploadReferencesModeTest,
BibUploadRecordsWithSYSNOTest,
BibUploadRecordsWithEXTOAIIDTest,
BibUploadRecordsWithOAIIDTest,
BibUploadFMTModeTest,
BibUploadIndicatorsTest,
BibUploadUpperLowerCaseTest,
BibUploadStrongTagsTest,
BibUploadFFTModeTest)
#test_suite = make_test_suite(BibUploadStrongTagsTest,)
if __name__ == "__main__":
warn_user_about_tests_and_run(test_suite)
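The FFT tests above all follow the same pattern: the test buffers are authored with the dummy record ID 123456789 and, once bibupload returns the real recid, every expected buffer is rewritten with str.replace(). A minimal, self-contained sketch of that substitution step (the helper name is ours, not part of bibupload):

```python
# Hypothetical helper sketching the recid-substitution step used by
# the FFT tests above: buffers are written with the dummy record ID
# 123456789 and rewritten once the real recid is known.
PLACEHOLDER_RECID = '123456789'

def substitute_recid(recid, buffers):
    """Return copies of the given test buffers with the dummy
    record ID replaced by the real one."""
    return [buf.replace(PLACEHOLDER_RECID, str(recid)) for buf in buffers]
```

In the real tests this is spelled out once per buffer (testrec_expected_xm, testrec_expected_hm, testrec_expected_url, and the correction buffers).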
diff --git a/modules/elmsubmit/lib/elmsubmit.py b/modules/elmsubmit/lib/elmsubmit.py
index 1662e3bbf..54a3291d8 100644
--- a/modules/elmsubmit/lib/elmsubmit.py
+++ b/modules/elmsubmit/lib/elmsubmit.py
@@ -1,273 +1,273 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
# import sys
import os
import os.path
import smtplib
import invenio.elmsubmit_EZEmail as elmsubmit_EZEmail
import invenio.elmsubmit_submission_parser as elmsubmit_submission_parser
# import the config file
-from invenio.config import tmpdir, cdsname
-from invenio.config import storage
+from invenio.config import CFG_TMPDIR, cdsname
+from invenio.config import CFG_WEBSUBMIT_STORAGEDIR
import invenio.elmsubmit_config as elmsubmit_config
import invenio.elmsubmit_field_validation as elmsubmit_field_validation
from invenio.elmsubmit_misc import random_alphanum_string as _random_alphanum_string
import invenio.elmsubmit_generate_marc as elmsubmit_generate_marc
def process_email(email_string):
""" main entry point of the module, handles whole processing of the email
"""
# See if we can parse the email:
try:
e = elmsubmit_EZEmail.ParseMessage(email_string)
except elmsubmit_EZEmail.EZEmailParseError, err:
try:
if err.basic_email_info['from'] is None:
raise ValueError
response = elmsubmit_EZEmail.CreateMessage(to=err.basic_email_info['from'],
_from=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'],
message=elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['bad_email'],
subject="Re: " + (err.basic_email_info.get('Subject', '') or ''),
references=[err.basic_email_info.get('message-id', '') or ''],
wrap_message=False)
_send_smtp(_from=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'], to=err.basic_email_info['from'], msg=response)
raise elmsubmitError("Email could not be parsed. Reported to sender.")
except ValueError:
raise elmsubmitError("From: field of submission email could not be parsed. Could not report to sender.")
# See if we can parse the submission fields in the email:
try:
# Note that this returns a dictionary loaded with utf8 byte strings:
submission_dict = elmsubmit_submission_parser.parse_submission(e.primary_message.encode('utf8'))
# Add the submitter's email:
submission_dict['SuE'] = e.from_email.encode('utf8')
except elmsubmit_submission_parser.SubmissionParserError:
_notify(msg=e, response=elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['bad_submission'])
raise elmsubmitSubmissionError("Could not parse submission.")
# Check we have been given the required fields:
available_fields = submission_dict.keys()
if not len(filter(lambda x: x in available_fields, elmsubmit_config.CFG_ELMSUBMIT_REQUIRED_FIELDS)) == len(elmsubmit_config.CFG_ELMSUBMIT_REQUIRED_FIELDS):
response = elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['missing_fields_1'] + elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['missing_fields_2'] + "\n\n" + repr(elmsubmit_config.CFG_ELMSUBMIT_REQUIRED_FIELDS)
_notify(msg=e, response=response)
raise elmsubmitSubmissionError("Submission does not contain the required fields %s." % (elmsubmit_config.CFG_ELMSUBMIT_REQUIRED_FIELDS))
# Check that the fields we have been given validate OK:
map(lambda field: validate_submission_field(e, submission_dict, field, submission_dict[field]), elmsubmit_config.CFG_ELMSUBMIT_REQUIRED_FIELDS)
# Get a submission directory:
folder_name = 'elmsubmit_' + _random_alphanum_string(15)
- storage_dir = os.path.join(tmpdir, folder_name)
+ storage_dir = os.path.join(CFG_TMPDIR, folder_name)
try:
os.makedirs(storage_dir)
except EnvironmentError:
_notify(msg=e, response=elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['temp_problem'])
_notify_admin(response="Could not create directory: %s" % (storage_dir))
raise elmsubmitError("Could not create directory: %s" % (storage_dir))
# Process the files list:
process_files(e, submission_dict, storage_dir)
#generate the appropriate Marc_XML for the submission
marc_xml = elmsubmit_generate_marc.generate_marc(submission_dict)
- # Write the Marc to a file in tmpdir
+ # Write the Marc to a file in CFG_TMPDIR
file_name = folder_name + '.xml'
- fullpath = os.path.join(tmpdir, file_name)
+ fullpath = os.path.join(CFG_TMPDIR, file_name)
try:
open(fullpath, 'wb').write(marc_xml)
except EnvironmentError:
response_email = elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['temp_problem']
admin_response_email = "There was a problem writing data to directory %s." % (storage_dir)
error = elmsubmitError("There was a problem writing data to directory %s." % (storage_dir))
return (response_email, admin_response_email, error)
# print marc_xml
return marc_xml
def validate_submission_field(msg, submission_dict, field, value):
try:
(field_documentation, fixed_value, validation_success) = getattr(elmsubmit_field_validation, field)(value.decode('utf8'))
submission_dict[field] = fixed_value.encode('utf8')
if not validation_success:
_notify(msg=msg, response=elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['bad_field'] + ' ' + field.upper() + '\n\n'
+ elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['correct_format'] + '\n\n' + field_documentation)
raise elmsubmitSubmissionError("Submission contains field %s which does not validate." % (field))
except AttributeError:
# No validation defined for this field:
pass
def process_files(msg, submission_dict, storage_dir):
""" extract the files out of the email and include them in the submission dict
"""
files = map(lambda filename: filename.decode('utf8'), submission_dict['files'])
# Check for the special filename 'all': if we find it, add all of
# the files attached to the email to the list of files to submit:
if 'all' in files:
f = lambda attachment: attachment['filename'] is not None
g = lambda attachment: attachment['filename'].lower()
attached_filenames = map(g, filter(f, msg.attachments))
files.extend(attached_filenames)
files = filter(lambda name: name != 'all', files)
# Filter out duplicate filenames:
_temp = {}
map(lambda filename: _temp.update({ filename : 1}), files)
files = _temp.keys()
# Get the files out of the mail message:
# file dictionary with file content needed for saving the file to proper directory
file_dict = {}
# file list needed to be included in submission_dict
file_list = []
for filename in files:
# See if we have special keyword self (which uses the mail message itself as the file):
if filename == 'self':
file_attachment = msg.original_message
filename = _random_alphanum_string(8) + '_' + msg.date_sent_utc.replace(' ', '_').replace(':', '-') + '.msg'
else:
nominal_attachments = filter(lambda attachment: attachment['filename'].lower() == filename, msg.attachments)
try:
file_attachment = nominal_attachments[0]['file']
except IndexError:
_notify(msg=msg, response=elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['missing_attachment'] + ' ' + filename)
raise elmsubmitSubmissionError("Submission is missing attached file: %s" % (filename))
file_dict[filename.encode('utf8')] = file_attachment
#merge the file name and the storage dir in the submission_dict
full_file_name = os.path.join(storage_dir, filename.encode('utf8'))
file_list.append(full_file_name)
submission_dict['files'] = file_list
def create_files((path, dictionary_or_data)):
"""
Take any dictionary, e.g.:
{ 'title' : 'The loveliest title.',
'name' : 'Pete the dog.',
'info' : 'pdf file content'
}
and create a set of files in the given directory:
directory/title
directory/name
directory/info
so that each filename is a dictionary key, and the contents of
each file is the value that the key pointed to.
"""
fullpath = os.path.join(storage_dir, path)
try:
dictionary_or_data.has_key
except AttributeError:
open(fullpath, 'wb').write(dictionary_or_data)
try:
map(create_files, file_dict.items())
except EnvironmentError:
response_email = elmsubmit_config.CFG_ELMSUBMIT_NOLANGMSGS['temp_problem']
admin_response_email = "There was a problem writing data to directory %s." % (storage_dir)
error = elmsubmitError("There was a problem writing data to directory %s." % (storage_dir))
return (response_email, admin_response_email, error)
return None
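The duplicate-filtering idiom in process_files() above (loading filenames into a throwaway dictionary) can be shown on its own; note that it does not preserve the original order of the list, which is acceptable here:

```python
# Sketch of the duplicate-filtering idiom used in process_files()
# above: each filename becomes a key of a temporary dictionary, so it
# survives only once.  The order of the result is not guaranteed.
def filter_duplicate_filenames(files):
    _temp = {}
    for filename in files:
        _temp[filename] = 1
    return list(_temp.keys())
```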
def _send_smtp(_from, to, msg):
s = smtplib.SMTP()
s.connect(host=elmsubmit_config.CFG_ELMSUBMIT_SERVERS['smtp'])
s.sendmail(_from, to, msg)
s.close()
def _notify(msg, response):
response = elmsubmit_EZEmail.CreateMessage(to=[(msg.from_name, msg.from_email)],
_from=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'],
message=response,
subject="Re: " + msg.subject,
references=[msg.message_id],
wrap_message=False)
_send_smtp(_from=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'], to=msg.from_email, msg=response)
def _notify_admin(response):
response = elmsubmit_EZEmail.CreateMessage(to=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'],
_from=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'],
message=response,
subject="%s / elmsubmit problem." % cdsname,
wrap_message=False)
_send_smtp(_from=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'], to=elmsubmit_config.CFG_ELMSUBMIT_PEOPLE['admin'], msg=response)
class elmsubmitError(Exception):
pass
class elmsubmitSubmissionError(elmsubmitError):
pass
class _elmsubmitPrivateError(Exception):
"""
An empty parent class for all the private errors in this module.
"""
pass
diff --git a/modules/elmsubmit/lib/elmsubmit_tests.py b/modules/elmsubmit/lib/elmsubmit_tests.py
index 0b50e825e..240d4358f 100644
--- a/modules/elmsubmit/lib/elmsubmit_tests.py
+++ b/modules/elmsubmit/lib/elmsubmit_tests.py
@@ -1,241 +1,241 @@
# -*- coding: utf-8 -*-
## $Id$
## CDS Invenio elmsubmit unit tests.
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Unit tests for the elmsubmit."""
__revision__ = "$Id$"
import unittest
import re
import os
import os.path
from string import expandtabs, replace
-from invenio.config import tmpdir
+from invenio.config import CFG_TMPDIR
import invenio.elmsubmit_config as elmsubmit_config
import xml.dom.minidom
from invenio import elmsubmit
class MarcTest(unittest.TestCase):
""" elmsubmit - test for saniy """
def test_simple_marc(self):
"""elmsubmit - parsing simple email"""
try:
- f=open(os.path.join(tmpdir, elmsubmit_config.CFG_ELMSUBMIT_FILES['test_case_1']),'r')
+ f=open(os.path.join(CFG_TMPDIR, elmsubmit_config.CFG_ELMSUBMIT_FILES['test_case_1']),'r')
email = f.read()
f.close()
# let's try to parse an example email and compare it with the appropriate marc xml
x = elmsubmit.process_email(email)
y = """somethingSimko, TCERN"""
- # in order to properly compare the marc files we have to remove the FFT node, it includes a random generated file path
+ # in order to properly compare the MARC files we have to remove the FFT node, as it includes a randomly generated file path
dom_x = xml.dom.minidom.parseString(x)
datafields = dom_x.getElementsByTagName("datafield")
#remove all the FFT datafields
for node in datafields:
if (node.hasAttribute("tag") and node.getAttribute("tag") == "FFT"):
node.parentNode.removeChild(node)
node.unlink()
new_x = dom_x.toprettyxml("","\n")
dom_y = xml.dom.minidom.parseString(y)
new_y = dom_y.toprettyxml("","\n")
# 'normalize' the two XML MARC files for the purpose of comparing
new_x = expandtabs(new_x)
new_y = expandtabs(new_y)
new_x = new_x.replace(' ','')
new_y = new_y.replace(' ','')
new_x = new_x.replace('\n','')
new_y = new_y.replace('\n','')
# compare the two xml marcs
self.assertEqual(new_x,new_y)
except IOError:
self.fail("WARNING: the test case file does not exist; test not run.")
def test_complex_marc(self):
"""elmsubmit - parsing complex email with multiple fields"""
- try:
- f=open(os.path.join(tmpdir, elmsubmit_config.CFG_ELMSUBMIT_FILES['test_case_2']),'r')
+ try:
+ f=open(os.path.join(CFG_TMPDIR, elmsubmit_config.CFG_ELMSUBMIT_FILES['test_case_2']),'r')
email = f.read()
f.close()
# let's try to reproduce the demo XML MARC file by parsing it and printing it back:
x = elmsubmit.process_email(email)
y = """somethingLe Meur, J YMITJedrzejek, K JCERN2Favre, GCERN3test11test31test12test32test13test33test21test41test22test42test14test51test52Simko, TCERN"""
- # in order to properly compare the marc files we have to remove the FFT node, it includes a random generated file path
+ # in order to properly compare the MARC files we have to remove the FFT node, as it includes a randomly generated file path
dom_x = xml.dom.minidom.parseString(x)
datafields = dom_x.getElementsByTagName("datafield")
#remove all the FFT datafields
for node in datafields:
if (node.hasAttribute("tag") and node.getAttribute("tag") == "FFT"):
node.parentNode.removeChild(node)
node.unlink()
new_x = dom_x.toprettyxml("","\n")
dom_y = xml.dom.minidom.parseString(y)
new_y = dom_y.toprettyxml("","\n")
# 'normalize' the two XML MARC files for the purpose of comparing
new_x = expandtabs(new_x)
new_y = expandtabs(new_y)
new_x = new_x.replace(' ','')
new_y = new_y.replace(' ','')
new_x = new_x.replace('\n','')
new_y = new_y.replace('\n','')
-
+
# compare the two xml marcs
self.assertEqual(new_x,new_y)
except IOError:
self.fail("WARNING: the test case file does not exist; test not run.")
class FileStorageTest(unittest.TestCase):
""" testing proper storage of files """
def test_read_text_files(self):
"""elmsubmit - reading text files"""
try:
-
- f=open(os.path.join(tmpdir, elmsubmit_config.CFG_ELMSUBMIT_FILES['test_case_2']),'r')
+
+ f=open(os.path.join(CFG_TMPDIR, elmsubmit_config.CFG_ELMSUBMIT_FILES['test_case_2']),'r')
email = f.read()
f.close()
# let's try to see if the files were properly stored:
xml_marc = elmsubmit.process_email(email)
dom = xml.dom.minidom.parseString(xml_marc)
datafields = dom.getElementsByTagName("datafield")
# get the file addresses
file_list = []
for node in datafields:
if (node.hasAttribute("tag") and node.getAttribute("tag") == "FFT"):
children = node.childNodes
for child in children:
if (child.hasChildNodes()):
file_list.append(child.firstChild.nodeValue)
f=open(file_list[0], 'r')
x = f.read()
f.close()
x = x.strip()
y = """second attachment"""
self.assertEqual(x,y)
f=open(file_list[1], 'r')
x = f.read()
f.close()
x = x.strip()
y = """some attachment"""
self.assertEqual(x,y)
except IOError:
self.fail("WARNING: the test case file does not exist; test not run.")
-
+
def create_test_suite():
"""Return test suite for the elmsubmit module"""
return unittest.TestSuite((unittest.makeSuite(MarcTest,'test'), unittest.makeSuite(FileStorageTest,'test')))
# unittest.makeSuite(BadInputTreatmentTest,'test'),
# unittest.makeSuite(GettingFieldValuesTest,'test'),
# unittest.makeSuite(AccentedUnicodeLettersTest,'test')))
if __name__ == '__main__':
unittest.TextTestRunner(verbosity=2).run(create_test_suite())
-
+
diff --git a/modules/miscutil/bin/testsuite.in b/modules/miscutil/bin/testsuite.in
index c11a9667b..6304f88d0 100644
--- a/modules/miscutil/bin/testsuite.in
+++ b/modules/miscutil/bin/testsuite.in
@@ -1,118 +1,118 @@
#!@PYTHON@
## -*- mode: python; coding: utf-8; -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Run CDS Invenio test suite."""
__revision__ = "$Id$"
import getopt
import unittest
import sys
# we first import webinterface_tests to be sure to have the fake
# Apache environment working:
from invenio import webinterface_tests
# now we import the rest:
-from invenio.config import version
+from invenio.config import CFG_VERSION
from invenio import search_engine_tests
from invenio import bibindex_engine_tests
from invenio import bibindex_engine_stemmer_tests
from invenio import bibrecord_tests
from invenio import bibrank_citation_indexer_tests
from invenio import bibrank_citation_searcher_tests
from invenio import bibrank_downloads_indexer_tests
from invenio import bibrank_record_sorter_tests
from invenio import bibrank_tag_based_indexer_tests
from invenio import oai_repository_tests
from invenio import bibconvert_tests
from invenio import errorlib_tests
from invenio import elmsubmit_tests
from invenio import bibformat_engine_tests
from invenio import websearch_external_collections_getter_tests
from invenio import webuser_tests
from invenio import webgroup_tests
from invenio import dbquery_tests
from invenio import dateutils_tests
from invenio import htmlutils_tests
from invenio import access_control_firerole_tests
from invenio import intbitset_tests
from invenio import textutils_tests
def usage():
"""Print usage info on standard error output."""
sys.stderr.write("Usage: %s [options]\n" % sys.argv[0])
sys.stderr.write("General options:\n")
sys.stderr.write(" -h, --help \t\t Print this help.\n")
sys.stderr.write(" -V, --version \t\t Print version information.\n")
sys.stderr.write("Description: run CDS Invenio test suite.\n")
return
def create_all_test_suites():
"""Return all tests suites for all CDS Invenio modules."""
return unittest.TestSuite((search_engine_tests.create_test_suite(),
bibindex_engine_tests.create_test_suite(),
bibindex_engine_stemmer_tests.create_test_suite(),
bibrecord_tests.create_test_suite(),
bibrank_citation_indexer_tests.create_test_suite(),
bibrank_citation_searcher_tests.create_test_suite(),
bibrank_downloads_indexer_tests.create_test_suite(),
bibrank_record_sorter_tests.create_test_suite(),
bibrank_tag_based_indexer_tests.create_test_suite(),
oai_repository_tests.create_test_suite(),
bibconvert_tests.create_test_suite(),
errorlib_tests.create_test_suite(),
elmsubmit_tests.create_test_suite(),
webinterface_tests.create_test_suite(),
bibformat_engine_tests.create_test_suite(),
websearch_external_collections_getter_tests.create_test_suite(),
webuser_tests.create_test_suite(),
webgroup_tests.create_test_suite(),
dbquery_tests.create_test_suite(),
dateutils_tests.create_test_suite(),
htmlutils_tests.create_test_suite(),
access_control_firerole_tests.create_test_suite(),
intbitset_tests.create_test_suite(),
textutils_tests.create_test_suite(),
))
def print_info_line():
"""Prints info line about tests to be executed."""
- info_line = """CDS Invenio v%s test suite results:""" % version
+ info_line = """CDS Invenio v%s test suite results:""" % CFG_VERSION
sys.stderr.write(info_line + "\n")
sys.stderr.write("=" * len(info_line) + "\n")
if __name__ == "__main__":
try:
opts, args = getopt.getopt(sys.argv[1:], "hV", ["help", "version"])
except getopt.GetoptError:
usage()
sys.exit(2)
for opt in opts:
if opt[0] in ("-V","--version"):
print __revision__
sys.exit(0)
elif opt[0] in ("-h","--help"):
usage()
sys.exit(0)
print_info_line()
unittest.TextTestRunner(verbosity=2).run(create_all_test_suites())
diff --git a/modules/miscutil/lib/dbquery.py b/modules/miscutil/lib/dbquery.py
index 4a6291684..97a297669 100644
--- a/modules/miscutil/lib/dbquery.py
+++ b/modules/miscutil/lib/dbquery.py
@@ -1,361 +1,361 @@
## $Id$
## CDS Invenio utility to run SQL queries. The core is taken from
## modpython FAQ and modified to suit our needs. The insert_id() is
## inspired by Erik Forsberg's mod_python slides.
## FIXME: note that this version of persistent connectivity to the
## database is not thread-safe; it works all right in the prefork model
## only (apache2-mpm-prefork). We should rather replace it with the
## connection pool technique when time permits. See:
## http://modpython.org/FAQ/faqw.py?req=show&file=faq03.003.htp
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""CDS Invenio utility to run SQL queries."""
__revision__ = "$Id$"
# dbquery clients can import these from here:
# pylint: disable-msg=W0611
from MySQLdb import escape_string
from MySQLdb import Warning, Error, InterfaceError, DataError, \
DatabaseError, OperationalError, IntegrityError, \
InternalError, NotSupportedError, \
ProgrammingError
import string
import time
import marshal
from zlib import compress, decompress
from thread import get_ident
from invenio.config import CFG_ACCESS_CONTROL_LEVEL_SITE, \
CFG_MAX_CACHED_QUERIES, CFG_MISCUTIL_USE_SQLALCHEMY
if CFG_MISCUTIL_USE_SQLALCHEMY:
try:
import sqlalchemy.pool as pool
import MySQLdb as mysqldb
mysqldb = pool.manage(mysqldb, use_threadlocal=True)
connect = mysqldb.connect
except ImportError:
CFG_MISCUTIL_USE_SQLALCHEMY = False
from MySQLdb import connect
else:
from MySQLdb import connect
## DB config variables. These variables are to be set in invenio.conf
## by admins and then replaced in situ in this file by calling
## "inveniocfg --update-dbexec".
## Note that they are defined here and not in config.py in order to
## prevent them from being exported accidentally elsewhere, as no-one
## should know DB credentials but this file.
CFG_DATABASE_HOST = 'localhost'
CFG_DATABASE_NAME = 'cdsinvenio'
CFG_DATABASE_USER = 'cdsinvenio'
CFG_DATABASE_PASS = 'my123p$ss'
_DB_CONN = {}
def _db_login(relogin = 0):
"""Login to the database."""
## Note: we are using "use_unicode=False", because we want to
## receive strings from MySQL as Python UTF-8 binary string
## objects, not as Python Unicode string objects, as of yet.
## Note: "charset='utf8'" is needed for recent MySQLdb versions
## (such as 1.2.1_p2 and above). For older MySQLdb versions such
## as 1.2.0, an explicit "init_command='SET NAMES utf8'" parameter
## would constitute an equivalent. But we are not bothering with
## older MySQLdb versions here, since we are recommending to
## upgrade to more recent versions anyway.
if CFG_MISCUTIL_USE_SQLALCHEMY:
return connect(host=CFG_DATABASE_HOST, db=CFG_DATABASE_NAME,
user=CFG_DATABASE_USER, passwd=CFG_DATABASE_PASS,
use_unicode=False, charset='utf8')
else:
thread_ident = get_ident()
if relogin:
_DB_CONN[thread_ident] = connect(host=CFG_DATABASE_HOST,
db=CFG_DATABASE_NAME,
user=CFG_DATABASE_USER,
passwd=CFG_DATABASE_PASS,
use_unicode=False, charset='utf8')
return _DB_CONN[thread_ident]
else:
if _DB_CONN.has_key(thread_ident):
return _DB_CONN[thread_ident]
else:
_DB_CONN[thread_ident] = connect(host=CFG_DATABASE_HOST,
db=CFG_DATABASE_NAME,
user=CFG_DATABASE_USER,
passwd=CFG_DATABASE_PASS,
use_unicode=False, charset='utf8')
return _DB_CONN[thread_ident]
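The per-thread connection caching in _db_login() above can be sketched independently: connections are keyed by the calling thread's identifier, so each thread in the prefork model reuses its own connection, and relogin forces a fresh one. A minimal stand-in (connect here is any factory callable, not MySQLdb.connect):

```python
import threading

_CONN_CACHE = {}

def get_thread_connection(connect, relogin=False):
    """Return this thread's cached connection, creating it with the
    supplied connect() factory, or recreating it when relogin is set."""
    ident = threading.get_ident()
    if relogin or ident not in _CONN_CACHE:
        _CONN_CACHE[ident] = connect()
    return _CONN_CACHE[ident]
```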
def _db_logout():
"""Close a connection."""
try:
del _DB_CONN[get_ident()]
except KeyError:
pass
try:
_db_cache
except NameError:
_db_cache = {}
def run_sql_cached(sql, param=None, n=0, with_desc=0, affected_tables=[]):
"""
Run the SQL query and cache the SQL command for later reuse.
@param param: tuple of string params to insert in the query
(see notes below)
@param n: number of tuples in result (0 for unbounded)
@param with_desc: if true, will return a
DB API 7-tuple describing columns in query
@param affected_tables: list of names of the tables affected by the
query, used to decide whether we should update the cache or can
return the cached result, depending on the last modification time
of the corresponding tables. If empty, and if the cached result is
present in the cache, always return the cached result without
recomputing it. (This is useful to speed up queries that operate
on objects that virtually never change, e.g. the list of defined
logical fields, which usually remains constant in between Apache
restarts. Note that this may be a bit dangerous as a default for
any query.)
@return: the result as provided by run_sql
Note that it is pointless and even wrong to use this function with
SQL commands other than SELECT.
"""
global _db_cache
key = repr((sql, param, n, with_desc))
# Garbage collecting needed?
if len(_db_cache) >= CFG_MAX_CACHED_QUERIES:
_db_cache = {}
# Query already in the cache?
if not _db_cache.has_key(key) or \
(affected_tables and _db_cache[key][1] < max([get_table_update_time(table) for table in affected_tables])):
# Let's update the cache
result = run_sql(sql, param, n, with_desc)
_db_cache[key] = (result, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
### log_sql_query_cached(key, result, False) ### UNCOMMENT ONLY IF you REALLY want to log all queries
else:
result = _db_cache[key][0]
### log_sql_query_cached(key, result, True) ### UNCOMMENT ONLY IF you REALLY want to log all queries
return result
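The invalidation rule applied above reduces to a single comparison: a cached entry is reused unless some affected table was modified after the entry was stored, and an empty affected_tables list means the entry never expires. A sketch of just that rule (timestamps here are any comparable values; the real code compares "%Y-%m-%d %H:%M:%S" strings, which compare correctly lexicographically):

```python
def cache_entry_is_stale(entry_time, table_update_times):
    """Return True when some affected table changed after the cache
    entry was stored; an empty list means the entry never expires."""
    if not table_update_times:
        return False
    return entry_time < max(table_update_times)
```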
def run_sql(sql, param=None, n=0, with_desc=0):
"""Run SQL on the server with PARAM and return result.
@param param: tuple of string params to insert in the query
(see notes below)
@param n: number of tuples in result (0 for unbounded)
@param with_desc: if true, will return a
DB API 7-tuple describing columns in query
@return: for SELECT, SHOW, DESCRIBE statements: tuples of data,
followed by the column description if with_desc is set;
for INSERT: the last row id;
otherwise: the SQL result as provided by the database.
When the site is closed for maintenance (as governed by the
config variable CFG_ACCESS_CONTROL_LEVEL_SITE), do not attempt
to run any SQL queries but return an empty list immediately.
Useful for keeping the website up while the MySQL database
is down for maintenance, hot copies, table repairs, etc.
In case of problems, exceptions are raised according to the
Python DB API 2.0. The client code can import them from this
file and catch them.
"""
if CFG_ACCESS_CONTROL_LEVEL_SITE == 2:
# do not connect to the database as the site is closed for maintenance:
return []
### log_sql_query(sql, param) ### UNCOMMENT ONLY IF you REALLY want to log all queries
if param:
param = tuple(param)
try:
db = _db_login()
cur = db.cursor()
rc = cur.execute(sql, param)
except OperationalError: # unexpected disconnect, bad malloc error, etc
# FIXME: reconnect is currently always forced; perhaps we should ping() first?
try:
db = _db_login(relogin = 1)
cur = db.cursor()
rc = cur.execute(sql, param)
except OperationalError: # again an unexpected disconnect, bad malloc error, etc
raise
if string.upper(string.split(sql)[0]) in ("SELECT", "SHOW", "DESC", "DESCRIBE"):
if n:
recset = cur.fetchmany(n)
else:
recset = cur.fetchall()
if with_desc:
return recset, cur.description
else:
return recset
else:
if string.upper(string.split(sql)[0]) == "INSERT":
rc = cur.lastrowid
return rc
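The retry logic in run_sql() above (one forced re-login on an unexpected disconnect, a second OperationalError propagating to the caller) can be sketched with a hypothetical connection object; `FlakyConnection` and `login` are invented for illustration, not part of MySQLdb.

```python
class OperationalError(Exception):
    """Stand-in for MySQLdb's OperationalError."""

class FlakyConnection:
    # Hypothetical connection: a stale handle fails, a fresh one works.
    def __init__(self, alive=False):
        self.alive = alive
    def execute(self, sql):
        if not self.alive:
            raise OperationalError("server has gone away")
        return "ok: " + sql

def login(relogin=False):
    # A re-login returns a fresh (working) connection.
    return FlakyConnection(alive=relogin)

def run(sql):
    # Mirror of run_sql's retry: on an unexpected disconnect, force one
    # re-login and retry; a second failure would propagate unchanged.
    try:
        return login().execute(sql)
    except OperationalError:
        return login(relogin=True).execute(sql)
```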
def blob_to_string(ablob):
"""Return string representation of ABLOB. Useful to treat MySQL
BLOBs in the same way for both recent and old MySQLdb versions.
"""
if type(ablob) is str:
# BLOB is already a string in MySQLdb 0.9.2
return ablob
else:
# BLOB is array.array in MySQLdb 1.0.0 and later
return ablob.tostring()
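A version-agnostic variant of blob_to_string() might look as follows; note that `array.tostring()` used above is the Python 2 era name, renamed `tobytes()` in later Pythons, and that more recent drivers may also hand back `bytes` or `bytearray`. The `latin-1` decoding is an assumption for the sketch, chosen because it maps bytes one-to-one.

```python
import array

def blob_to_text(ablob):
    # Normalize the various types a driver may return for a BLOB column.
    if isinstance(ablob, str):
        return ablob                                # already a string
    if isinstance(ablob, array.array):
        return ablob.tobytes().decode("latin-1")    # MySQLdb 1.0-style
    if isinstance(ablob, (bytes, bytearray)):
        return bytes(ablob).decode("latin-1")
    return str(ablob)
```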
def log_sql_query_cached(key, result, hit_p):
"""Log SQL query cached into prefix/var/log/dbquery.log log file. In order
to enable logging of all SQL queries, please uncomment two lines
in run_sql_cached() above. Useful for fine-level debugging only!
"""
- from invenio.config import logdir
+ from invenio.config import CFG_LOGDIR
from invenio.dateutils import convert_datestruct_to_datetext
from invenio.textutils import indent_text
- log_path = logdir + '/dbquery.log'
+ log_path = CFG_LOGDIR + '/dbquery.log'
date_of_log = convert_datestruct_to_datetext(time.localtime())
message = date_of_log + '-->\n'
message += indent_text('Key:\n' + indent_text(str(key), 2, wrap=True), 2)
message += indent_text('Result:\n' + indent_text(str(result) + (hit_p and ' HIT' or ' MISS'), 2, wrap=True), 2)
message += 'Cached queries: %i\n\n' % len(_db_cache)
try:
log_file = open(log_path, 'a+')
log_file.writelines(message)
log_file.close()
except:
pass
def log_sql_query(sql, param=None):
"""Log SQL query into prefix/var/log/dbquery.log log file. In order
to enable logging of all SQL queries, please uncomment one line
in run_sql() above. Useful for fine-level debugging only!
"""
- from invenio.config import logdir
+ from invenio.config import CFG_LOGDIR
from invenio.dateutils import convert_datestruct_to_datetext
from invenio.textutils import indent_text
- log_path = logdir + '/dbquery.log'
+ log_path = CFG_LOGDIR + '/dbquery.log'
date_of_log = convert_datestruct_to_datetext(time.localtime())
message = date_of_log + '-->\n'
message += indent_text('Query:\n' + indent_text(str(sql), 2, wrap=True), 2)
message += indent_text('Params:\n' + indent_text(str(param), 2, wrap=True), 2)
message += '-----------------------------\n\n'
try:
log_file = open(log_path, 'a+')
log_file.writelines(message)
log_file.close()
except:
pass
def get_table_update_time(tablename):
"""Return update time of TABLENAME. TABLENAME can contain
wildcard `%' in which case we return the maximum update time
value.
"""
# Note: in order to work with all of MySQL 4.0, 4.1 and 5.0, this
# function uses the SHOW TABLE STATUS technique with a dirty column
# position lookup to return the correct value. (It makes use of the
# Index_Length column, which is either of type long (when some
# indexes are defined) or None (when no indexes are defined, e.g.
# the table is empty).) Once we rely solely on MySQL 5.0, we can
# employ the much cleaner technique of using SELECT UPDATE_TIME
# FROM INFORMATION_SCHEMA.TABLES WHERE table_name='collection'.
res = run_sql("SHOW TABLE STATUS LIKE '%s'" % tablename)
update_times = [] # store all update times
for row in res:
if type(row[10]) is long or \
row[10] is None:
# MySQL-4.1 and 5.0 have creation_time in 11th position,
# so return next column:
update_times.append(str(row[12]))
else:
# MySQL-4.0 has creation_time in 10th position, which is
# of type datetime.datetime or str (depending on the
# version of MySQLdb), so return next column:
update_times.append(str(row[11]))
return max(update_times)
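The column-position probe used above can be demonstrated on fake SHOW TABLE STATUS rows: column 10 is Index_Length (an integer, or None) on MySQL 4.1/5.0 but a datetime-ish value on 4.0, so its type tells us where Update_time lives. The row layouts below are illustrative fixtures, not real server output.

```python
def extract_update_time(row):
    # MySQL 4.1/5.0: col 10 is Index_Length (int or None), and
    # Update_time sits in col 12; MySQL 4.0: col 10 is a datetime-ish
    # value and Update_time sits in col 11.
    if row[10] is None or isinstance(row[10], int):
        return str(row[12])
    return str(row[11])

def max_update_time(rows):
    """Maximum Update_time over all rows matching a wildcard pattern."""
    return max(extract_update_time(row) for row in rows)
```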
def get_table_status_info(tablename):
"""Return table status information on TABLENAME. Returned is a
dict with keys like Name, Rows, Data_length, Max_data_length,
etc. If TABLENAME does not exist, return empty dict.
"""
# Note: again a hack so that it works on all MySQL 4.0, 4.1, 5.0
res = run_sql("SHOW TABLE STATUS LIKE '%s'" % tablename)
table_status_info = {} # store table status info
for row in res:
if type(row[10]) is long or \
row[10] is None:
# MySQL-4.1 and 5.0 have creation time in 11th position:
table_status_info['Name'] = row[0]
table_status_info['Rows'] = row[4]
table_status_info['Data_length'] = row[6]
table_status_info['Max_data_length'] = row[8]
table_status_info['Create_time'] = row[11]
table_status_info['Update_time'] = row[12]
else:
# MySQL-4.0 has creation_time in 10th position, which is
# of type datetime.datetime or str (depending on the
# version of MySQLdb):
table_status_info['Name'] = row[0]
table_status_info['Rows'] = row[3]
table_status_info['Data_length'] = row[5]
table_status_info['Max_data_length'] = row[7]
table_status_info['Create_time'] = row[10]
table_status_info['Update_time'] = row[11]
return table_status_info
def serialize_via_marshal(obj):
"""Serialize Python object via marshal into a compressed string."""
return compress(marshal.dumps(obj))
def deserialize_via_marshal(string):
"""Decompress and deserialize string into a Python object via marshal."""
return marshal.loads(decompress(string))
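The two helpers above pair marshal (fast, but limited to core Python types such as dicts, lists, strings and ints, which is exactly what they are used for) with zlib compression. A minimal round-trip sketch:

```python
import marshal
import zlib

def serialize(obj):
    # marshal.dumps gives a compact byte string for core Python types;
    # zlib.compress shrinks it further before storing it as a BLOB.
    return zlib.compress(marshal.dumps(obj))

def deserialize(blob):
    return marshal.loads(zlib.decompress(blob))
```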
try:
import psyco
psyco.bind(serialize_via_marshal)
psyco.bind(deserialize_via_marshal)
except StandardError, e:
pass
diff --git a/modules/miscutil/lib/errorlib.py b/modules/miscutil/lib/errorlib.py
index b72073c96..4e694ce09 100644
--- a/modules/miscutil/lib/errorlib.py
+++ b/modules/miscutil/lib/errorlib.py
@@ -1,444 +1,444 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
""" Error handling library """
__revision__ = "$Id$"
import traceback
import os
import sys
import time
from cStringIO import StringIO
-from invenio.config import cdslang, logdir, alertengineemail, adminemail, supportemail, cdsname, weburl
+from invenio.config import cdslang, CFG_LOGDIR, CFG_WEBALERT_ALERT_ENGINE_EMAIL, adminemail, supportemail, cdsname, weburl
from invenio.miscutil_config import CFG_MISCUTIL_ERROR_MESSAGES
from invenio.urlutils import wash_url_argument
from invenio.messages import wash_language, gettext_set_language
from invenio.dateutils import convert_datestruct_to_datetext
def get_client_info(req):
"""
Returns a dictionary with client information
@param req: mod_python request
"""
try:
return \
{ 'host' : req.hostname,
'url' : req.unparsed_uri,
'time' : convert_datestruct_to_datetext(time.localtime()),
'browser' : req.headers_in.has_key('User-Agent') and req.headers_in['User-Agent'] or "N/A",
'client_ip' : req.connection.remote_ip
}
except:
return {}
def get_pretty_wide_client_info(req):
"""Return in a pretty way all the avilable information about the current
user/client"""
if req:
from invenio.webuser import collect_user_info
user_info = collect_user_info(req)
keys = user_info.keys()
keys.sort()
max_key = max([len(key) for key in keys])
ret = ""
fmt = "%% %is: %%s\n" % max_key
for key in keys:
ret += fmt % (key, user_info[key])
return ret
else:
return "No client information available"
def get_tracestack():
"""
If an exception has been caught, return its traceback; otherwise
return a traceback of what is currently on the stack.
"""
if traceback.format_tb(sys.exc_info()[2]):
delimiter = "\n"
tracestack_pretty = "Traceback: \n%s" % delimiter.join(traceback.format_tb(sys.exc_info()[2]))
else:
tracestack = traceback.extract_stack()[:-1] #force traceback except for this call
tracestack_pretty = "%sForced traceback (most recent call last)" % (' '*4,)
for trace_tuple in tracestack:
tracestack_pretty += """
File "%(file)s", line %(line)s, in %(function)s
%(text)s""" % \
{ 'file' : trace_tuple[0],
'line' : trace_tuple[1],
'function' : trace_tuple[2],
'text' : trace_tuple[3] is not None and str(trace_tuple[3]) or ""
}
return tracestack_pretty
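The "forced traceback" branch above (no exception in flight, so the current call stack is snapshotted instead) can be reduced to a short standalone helper; `forced_traceback` is an illustrative name for this sketch.

```python
import traceback

def forced_traceback():
    # Snapshot the current call stack, dropping this helper's own
    # frame, as get_tracestack() does for its own call.
    stack = traceback.extract_stack()[:-1]
    lines = ["    Forced traceback (most recent call last)"]
    for filename, lineno, func, text in stack:
        lines.append('  File "%s", line %s, in %s\n    %s'
                     % (filename, lineno, func, text or ""))
    return "\n".join(lines)
```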
def register_exception(force_stack=False, stream='error', req=None, prefix='', suffix='', alert_admin=False):
"""
Log error exceptions to invenio.err and warning exceptions to
invenio.log. Errors are logged with client information (if req is
given).
@param force_stack: when True, the stack is always printed; when
False, it is printed only when the exception class name does not
start with the word Invenio
@param stream: 'error' or 'warning'
@param req: mod_python request
@param prefix: a message to be printed before the exception in
the log
@param suffix: a message to be printed after the exception in
the log
@param alert_admin: whether to send the exception to the
administrator via email
@return 1 if successfully wrote to stream, 0 if not
"""
try:
## Let's extract exception information
exc_info = sys.exc_info()
if exc_info[0]:
## We found an exception.
## We want to extract the name of the Exception
exc_name = exc_info[0].__name__
exc_value = str(exc_info[1])
## Let's record when and where and what
www_data = "%(time)s -> %(name)s: %(value)s" % {
'time' : time.strftime("%Y-%m-%d %H:%M:%S"),
'name' : exc_name,
'value' : exc_value
}
## Let's retrieve contextual user related info, if any
try:
client_data = get_pretty_wide_client_info(req)
except Exception, e:
client_data = "Error in retrieving contextual information: %s" % e
## Let's extract the traceback
if not exc_name.startswith('Invenio') or force_stack:
## We put a large traceback only if requested
## or the Exception is not an Invenio one.
tracestack = traceback.extract_stack()[-5:-2]
tracestack_data = "%sForced traceback (most recent call last)" % (' '*4,)
for trace_tuple in tracestack:
tracestack_data += """
File "%(file)s", line %(line)s, in %(function)s
%(text)s""" % \
{ 'file' : trace_tuple[0],
'line' : trace_tuple[1],
'function' : trace_tuple[2],
'text' : trace_tuple[3] is not None and str(trace_tuple[3]) or ""
}
else:
tracestack_data = ""
exception_data = StringIO()
## Let's print the exception (and the traceback)
traceback.print_exception(exc_info[0], exc_info[1], exc_info[2], None, exception_data)
exception_data = exception_data.getvalue()
log_stream = StringIO()
email_stream = StringIO()
## If a prefix was requested let's print it
if prefix:
print >> log_stream, prefix
print >> email_stream, prefix
print >> email_stream, "The following problem occurred on %s" % weburl
print >> email_stream, ">>> Registered exception"
print >> log_stream, www_data
print >> email_stream, www_data
print >> email_stream, ">>> User details"
print >> log_stream, client_data
print >> email_stream, client_data
print >> email_stream, ">>> Traceback details"
if tracestack_data:
print >> log_stream, tracestack_data
print >> email_stream, tracestack_data
print >> log_stream, exception_data
print >> email_stream, exception_data
## If a suffix was requested let's print it
if suffix:
print >> log_stream, suffix
print >> email_stream, suffix
log_text = log_stream.getvalue()
email_text = email_stream.getvalue()
## Preparing the exception dump
stream = stream=='error' and 'err' or 'log'
## We now have the whole trace
written_to_log = False
try:
## Let's try to write into the log.
- open(os.path.join(logdir, 'invenio.' + stream), 'a').write(log_text)
+ open(os.path.join(CFG_LOGDIR, 'invenio.' + stream), 'a').write(log_text)
written_to_log = True
finally:
if alert_admin or not written_to_log:
## If requested or if it's impossible to write in the log
from invenio.mailutils import send_email
send_email(adminemail, adminemail, subject='Registered exception at %s' % weburl, content=email_text)
return 1
else:
return 0
except Exception, e:
- print >> sys.stderr, "Error in registering exception to '%s': '%s'" % (logdir + '/invenio.' + stream, e)
+ print >> sys.stderr, "Error in registering exception to '%s': '%s'" % (CFG_LOGDIR + '/invenio.' + stream, e)
return 0
def register_errors(errors_or_warnings_list, stream, req=None):
"""
Log errors to invenio.err and warnings to invenio.log. Errors are
logged with client information (if req is given) and a tracestack;
warnings are logged with just the warning message.
@param errors_or_warnings_list: list of tuples (err_name, err_msg)
err_name = ERR_ + %(module_directory_name)s + _ + %(error_name)s #ALL CAPS
err_name must be stored in the file: module_directory_name + _config.py
as a key of the dict named: CFG_ + %(module_directory_name)s + _ERROR_MESSAGES
@param stream: 'error' or 'warning'
@param req: mod_python request
@return: integer 1 if successfully wrote to stream, integer 0 if not;
appends another error to errors_or_warnings_list if unsuccessful
"""
client_info_dict = ""
if stream == "error":
# call the stack trace now
tracestack_pretty = get_tracestack()
# if req is given, get client info
if req:
client_info_dict = get_client_info(req)
if client_info_dict:
client_info = \
'''URL: http://%(host)s%(url)s
Browser: %(browser)s
Client: %(client_ip)s''' % client_info_dict
else:
client_info = "No client information available"
else:
client_info = "No client information available"
# check arguments
errors_or_warnings_list = wash_url_argument(errors_or_warnings_list, 'list')
stream = wash_url_argument(stream, 'str')
for etuple in errors_or_warnings_list:
etuple = wash_url_argument(etuple, 'tuple')
# map the stream arg to a log file; anything other than 'error' or 'warning' records an error and defaults to 'log'
if stream == 'error':
stream = 'err'
elif stream == 'warning':
stream = 'log'
else:
stream = 'log'
error = 'ERR_MISCUTIL_BAD_FILE_ARGUMENT_PASSED'
errors_or_warnings_list.append((error, eval(CFG_MISCUTIL_ERROR_MESSAGES[error])% stream))
# update log_errors
- stream_location = os.path.join(logdir, '/invenio.' + stream)
+ stream_location = os.path.join(CFG_LOGDIR, 'invenio.' + stream)
errors = ''
for etuple in errors_or_warnings_list:
try:
errors += "%s%s : %s \n " % (' '*4*7+' ', etuple[0], etuple[1])
except:
errors += "%s%s \n " % (' '*4*7+' ', etuple)
if errors:
errors = errors[(4*7+1):-3] # get rid of the beginning spaces and the last '\n'
msg = """
%(time)s --> %(errors)s%(error_file)s""" % \
{ 'time' : client_info_dict and client_info_dict['time'] or time.strftime("%Y-%m-%d %H:%M:%S"),
'errors' : errors,
'error_file' : stream=='err' and "\n%s%s\n%s\n" % (' '*4, client_info, tracestack_pretty) or ""
}
try:
stream_to_write = open(stream_location, 'a+')
stream_to_write.writelines(msg)
stream_to_write.close()
return_value = 1
except :
error = 'ERR_MISCUTIL_WRITE_FAILED'
errors_or_warnings_list.append((error, CFG_MISCUTIL_ERROR_MESSAGES[error] % stream_location))
return_value = 0
return return_value
def get_msg_associated_to_code(err_code, stream='error'):
"""
Return the message string associated with the given code.
@param err_code: error or warning code
@param stream: 'error' or 'warning'
@return: tuple (err_code, formatted_message)
"""
err_code = wash_url_argument(err_code, 'str')
stream = wash_url_argument(stream, 'str')
try:
module_directory_name = err_code.split('_')[1].lower()
module_config = module_directory_name + '_config'
module_dict_name = "CFG_" + module_directory_name.upper() + "_%s_MESSAGES" % stream.upper()
module = __import__(module_config, globals(), locals(), [module_dict_name])
module_dict = getattr(module, module_dict_name)
err_msg = module_dict[err_code]
except ImportError:
error = 'ERR_MISCUTIL_IMPORT_ERROR'
err_msg = CFG_MISCUTIL_ERROR_MESSAGES[error] % (err_code,
module_config)
err_code = error
except AttributeError:
error = 'ERR_MISCUTIL_NO_DICT'
err_msg = CFG_MISCUTIL_ERROR_MESSAGES[error] % (err_code,
module_config,
module_dict_name)
err_code = error
except KeyError:
error = 'ERR_MISCUTIL_NO_MESSAGE_IN_DICT'
err_msg = CFG_MISCUTIL_ERROR_MESSAGES[error] % (err_code,
module_config + '.' + module_dict_name)
err_code = error
except:
error = 'ERR_MISCUTIL_UNDEFINED_ERROR'
err_msg = CFG_MISCUTIL_ERROR_MESSAGES[error] % err_code
err_code = error
return (err_code, err_msg)
def get_msgs_for_code_list(code_list, stream='error', ln=cdslang):
"""
@param code_list: list of tuples [(err_name, arg1, ..., argN), ...]
err_name = ERR_ + %(module_directory_name)s + _ + %(error_name)s #ALL CAPS
err_name must be stored in file: module_directory_name + _config.py
as the key for dict with name: CFG_ + %(module_directory_name)s + _ERROR_MESSAGES
For warnings, same thing except:
err_name can begin with either 'ERR' or 'WRN'
dict name ends with _warning_messages
@param stream: 'error' or 'warning'
@return list of tuples of length 2 [('ERR_...', err_msg), ...]
If code_list is empty, return None. If errors occur while
retrieving the error messages, append an error to the list.
"""
ln = wash_language(ln)
_ = gettext_set_language(ln)
out = []
if code_list is None:
return None
code_list = wash_url_argument(code_list, 'list')
stream = wash_url_argument(stream, 'str')
for code_tuple in code_list:
if not(type(code_tuple) is tuple):
code_tuple = (code_tuple,)
nb_tuple_args = len(code_tuple) - 1
err_code = code_tuple[0]
if stream == 'error' and not err_code.startswith('ERR'):
error = 'ERR_MISCUTIL_NO_ERROR_MESSAGE'
out.append((error, eval(CFG_MISCUTIL_ERROR_MESSAGES[error])))
continue
elif stream == 'warning' and not (err_code.startswith('ERR') or err_code.startswith('WRN')):
error = 'ERR_MISCUTIL_NO_WARNING_MESSAGE'
out.append((error, eval(CFG_MISCUTIL_ERROR_MESSAGES[error])))
continue
(new_err_code, err_msg) = get_msg_associated_to_code(err_code, stream)
if err_msg[:2] == '_(' and err_msg[-1] == ')':
# err_msg is internationalized
err_msg = eval(err_msg)
nb_msg_args = err_msg.count('%') - err_msg.count('%%')
parsing_error = ""
if new_err_code != err_code or nb_msg_args == 0:
# undefined_error or immediately displayable error
out.append((new_err_code, err_msg))
continue
try:
if nb_msg_args == nb_tuple_args:
err_msg = err_msg % code_tuple[1:]
elif nb_msg_args < nb_tuple_args:
err_msg = err_msg % code_tuple[1:nb_msg_args+1]
parsing_error = 'ERR_MISCUTIL_TOO_MANY_ARGUMENT'
parsing_error_message = eval(CFG_MISCUTIL_ERROR_MESSAGES[parsing_error])
parsing_error_message %= code_tuple[0]
elif nb_msg_args > nb_tuple_args:
code_tuple = list(code_tuple)
for dummy in range(nb_msg_args - nb_tuple_args):
code_tuple.append('???')
code_tuple = tuple(code_tuple)
err_msg = err_msg % code_tuple[1:]
parsing_error = 'ERR_MISCUTIL_TOO_FEW_ARGUMENT'
parsing_error_message = eval(CFG_MISCUTIL_ERROR_MESSAGES[parsing_error])
parsing_error_message %= code_tuple[0]
except:
parsing_error = 'ERR_MISCUTIL_BAD_ARGUMENT_TYPE'
parsing_error_message = eval(CFG_MISCUTIL_ERROR_MESSAGES[parsing_error])
parsing_error_message %= code_tuple[0]
out.append((err_code, err_msg))
if parsing_error:
out.append((parsing_error, parsing_error_message))
if not(out):
out = None
return out
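The argument-matching step above (count the %-placeholders in the message, truncate surplus arguments, pad missing ones with '???') can be shown in isolation. Note the sketch counts '%%' as a literal percent sign by subtracting twice its occurrences, a slight tightening of the `count('%') - count('%%')` expression used above; `fill_message` is an illustrative name.

```python
def fill_message(msg, args):
    # Count real %-placeholders ('%%' is a literal percent sign).
    nb_needed = msg.count('%') - 2 * msg.count('%%')
    args = list(args)
    if len(args) > nb_needed:
        args = args[:nb_needed]          # too many: drop the surplus
    while len(args) < nb_needed:
        args.append('???')               # too few: pad, as above
    return msg % tuple(args)
```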
def send_error_report_to_admin(header, url, time_msg,
browser, client, error,
sys_error, traceback_msg):
"""
Sends an email to the admin with client info and tracestack
"""
- from_addr = '%s Alert Engine <%s>' % (cdsname, alertengineemail)
+ from_addr = '%s Alert Engine <%s>' % (cdsname, CFG_WEBALERT_ALERT_ENGINE_EMAIL)
to_addr = adminemail
body = """
The following error was seen by a user and sent to you.
%(contact)s
%(header)s
%(url)s
%(time)s
%(browser)s
%(client)s
%(error)s
%(sys_error)s
%(traceback)s
Please see the %(logdir)s/invenio.err for traceback details.""" % \
{ 'header' : header,
'url' : url,
'time' : time_msg,
'browser' : browser,
'client' : client,
'error' : error,
'sys_error' : sys_error,
'traceback' : traceback_msg,
- 'logdir' : logdir,
+ 'logdir' : CFG_LOGDIR,
'contact' : "Please contact %s quoting the following information:" % (supportemail,) #! is support email always cds?
}
from invenio.mailutils import send_email
send_email(from_addr, to_addr, subject="Error notification", content=body)
diff --git a/modules/miscutil/lib/errorlib_regression_tests.py b/modules/miscutil/lib/errorlib_regression_tests.py
index 7c448a171..2e24ac646 100644
--- a/modules/miscutil/lib/errorlib_regression_tests.py
+++ b/modules/miscutil/lib/errorlib_regression_tests.py
@@ -1,80 +1,80 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""errorlib Regression Test Suite."""
__revision__ = "$Id$"
import unittest
import os
from invenio.errorlib import register_exception
-from invenio.config import weburl, logdir
+from invenio.config import weburl, CFG_LOGDIR
from invenio.testutils import make_test_suite, warn_user_about_tests_and_run, \
test_web_page_content, merge_error_messages
class ErrorlibWebPagesAvailabilityTest(unittest.TestCase):
"""Check errorlib web pages whether they are up or not."""
def test_your_baskets_pages_availability(self):
"""errorlib - availability of error sending pages"""
baseurl = weburl + '/error/'
_exports = ['', 'send']
error_messages = []
for url in [baseurl + page for page in _exports]:
error_messages.extend(test_web_page_content(url))
if error_messages:
self.fail(merge_error_messages(error_messages))
return
class ErrorlibRegisterExceptionTest(unittest.TestCase):
"""Check errorlib register_exception functionality."""
def test_simple_register_exception(self):
"""errorlib - simple usage of register_exception"""
try:
raise Exception('test-exception')
except:
result = register_exception()
- log_content = open(os.path.join(logdir, 'invenio.err')).read()
+ log_content = open(os.path.join(CFG_LOGDIR, 'invenio.err')).read()
self.failUnless('test_simple_register_exception' in log_content)
self.failUnless('test-exception' in log_content)
self.assertEqual(1, result, "register_exception did not return 1")
def test_alert_admin_register_exception(self):
"""errorlib - alerting admin with register_exception"""
try:
raise Exception('test-exception')
except:
result = register_exception(alert_admin=True)
- log_content = open(os.path.join(logdir, 'invenio.err')).read()
+ log_content = open(os.path.join(CFG_LOGDIR, 'invenio.err')).read()
self.failUnless('test_alert_admin_register_exception' in log_content)
self.failUnless('test-exception' in log_content)
self.assertEqual(1, result, "register_exception did not return 1")
test_suite = make_test_suite(ErrorlibWebPagesAvailabilityTest,
ErrorlibRegisterExceptionTest)
if __name__ == "__main__":
warn_user_about_tests_and_run(test_suite)
diff --git a/modules/miscutil/lib/inveniocfg.py b/modules/miscutil/lib/inveniocfg.py
index e45ba3f5a..1246fb3fb 100644
--- a/modules/miscutil/lib/inveniocfg.py
+++ b/modules/miscutil/lib/inveniocfg.py
@@ -1,848 +1,852 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Invenio configuration and administration CLI tool.
Usage: inveniocfg [options]
General options:
-h, --help print this help
-V, --version print version number
Options to finish your installation:
--create-apache-conf create Apache configuration files
--create-tables create DB tables for Invenio
--drop-tables drop DB tables of Invenio
Options to set up and test a demo site:
--create-demo-site create demo site
--load-demo-records load demo records
--remove-demo-records remove demo records, keeping demo site
--drop-demo-site drop demo site configurations too
--run-unit-tests run unit test suite (needs DB connectivity)
--run-regression-tests run regression test suite (needs demo site)
Options to update config files in situ:
--update-all perform all the update options
--update-config-py update config.py file from invenio.conf file
--update-dbquery-py update dbquery.py with DB credentials from invenio.conf
--update-dbexec update dbexec with DB credentials from invenio.conf
- --update-bibconvert-tpl update bibconvert templates with WEBURL from invenio.conf
+ --update-bibconvert-tpl update bibconvert templates with CFG_SITE_URL from invenio.conf
Options to update DB tables:
--reset-all perform all the reset options
- --reset-cdsname reset tables to take account of new CDSNAME and CDSNAMEINTL
+ --reset-sitename reset tables to take account of new CFG_SITE_NAME*
--reset-adminemail reset tables to take account of new ADMINEMAIL
--reset-fieldnames reset tables to take account of new I18N names from PO files
Options to help the work:
--list print names and values of all options from conf files
--get get value of a given option from conf files
--conf-dir path to directory where invenio*.conf files are [optional]
"""
__revision__ = "$Id$"
from ConfigParser import ConfigParser
import os
import re
import shutil
import sys
def print_usage():
"""Print help."""
print __doc__
def print_version():
"""Print version information."""
print __revision__
def convert_conf_option(option_name, option_value):
"""
Convert conf option into Python config.py line, converting
values to ints or strings as appropriate.
"""
- ## 1) convert option name:
- if option_name in ['cdsname', 'cdslang', 'supportemail',
- 'adminemail', 'alertengineemail', 'webdir',
- 'weburl', 'sweburl', 'bindir', 'pylibdir',
- 'cachedir', 'logdir', 'tmpdir', 'etcdir',
- 'version', 'localedir', 'cdslangs',
- 'cdsnameintl', 'counters', 'storage',
- 'filedir', 'filedirsize', 'xmlmarc2textmarc',
- 'bibupload', 'bibformat', 'bibwords',
- 'bibconvert', 'bibconvertconf',]:
- # keep lowercase for these "legacy" names:
- pass
- else:
- # otherwise convert to uppercase:
- option_name = option_name.upper()
+ ## 1) convert option name to uppercase:
+ option_name = option_name.upper()
+
+ ## also, adjust some conf names due to backwards compatibility:
+ option_name_replace_data = {'CFG_SITE_URL': 'weburl',
+ 'CFG_SITE_SECURE_URL': 'sweburl',
+ 'CFG_SITE_NAME': 'cdsname',
+ 'CFG_SITE_NAME_INTL': 'cdsnameintl',
+ 'CFG_SITE_LANG': 'cdslang',
+ 'CFG_SITE_LANGS': 'cdslangs',
+ 'CFG_SITE_SUPPORT_EMAIL': 'supportemail',
+ 'CFG_SITE_ADMIN_EMAIL': 'adminemail',
+ }
+ if option_name_replace_data.has_key(option_name):
+ option_name = option_name_replace_data[option_name]
## 2) convert option value to int or string:
try:
option_value = int(option_value)
except ValueError:
option_value = '"' + option_value + '"'
## 3a) special cases: regexps
if option_name in ['CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS',
'CFG_BIBINDEX_CHARS_PUNCTUATION']:
option_value = 'r"[' + option_value[1:-1] + ']"'
## 3b) special cases: True, False, None
if option_value in ['"True"', '"False"', '"None"']:
option_value = option_value[1:-1]
## 3c) special cases: dicts or lists
if option_name in ['CFG_WEBSEARCH_FIELDS_CONVERT',
'CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS']:
option_value = option_value[1:-1]
## 3d) special cases: cdslangs
if option_name == 'cdslangs':
out = "["
for lang in option_value[1:-1].split(","):
out += "'%s', " % lang
out += "]"
option_value = out
## 3e) special cases: multiline
if option_name == 'CFG_OAI_IDENTIFY_DESCRIPTION':
# make triple quotes
option_value = '""' + option_value + '""'
## 3f) ignore some options:
- if option_name == 'CDSNAMEINTL':
+ if option_name == 'cdsnameintl':
# treated elsewhere
return
## 4) finally, return output line:
return '%s = %s' % (option_name, option_value)
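The core of convert_conf_option() above (integers stay bare, True/False/None stay bare, everything else is quoted, and the name goes uppercase) can be reduced to a few lines; `conf_to_python_line` is an illustrative name and this sketch omits the regexp, dict/list and multiline special cases.

```python
def conf_to_python_line(name, value):
    # Emit one config.py assignment from a raw conf option.
    name = name.upper()
    try:
        value = str(int(value))          # integers stay bare
    except ValueError:
        if value in ('True', 'False', 'None'):
            pass                         # booleans/None stay bare too
        else:
            value = '"%s"' % value       # everything else is quoted
    return '%s = %s' % (name, value)
```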
def cli_cmd_update_config_py(conf):
"""
Update new config.py from conf options, keeping previous
config.py in a backup copy.
"""
print ">>> Going to update config.py..."
## location where config.py is:
- configpyfile = conf.get("Invenio", "pylibdir") + \
+ configpyfile = conf.get("Invenio", "CFG_PYLIBDIR") + \
os.sep + 'invenio' + os.sep + 'config.py'
## backup current config.py file:
if os.path.exists(configpyfile):
shutil.copy(configpyfile, configpyfile + '.OLD')
## here we go:
fdesc = open(configpyfile, 'w')
## generate preamble:
fdesc.write("# -*- coding: utf-8 -*-\n")
fdesc.write("# DO NOT EDIT THIS FILE! IT WAS AUTOMATICALLY GENERATED\n")
fdesc.write("# FROM INVENIO.CONF BY EXECUTING:\n")
fdesc.write("# " + " ".join(sys.argv) + "\n")
- ## special treatment for CDSNAMEINTL options:
+ ## special treatment for CFG_SITE_NAME_INTL options:
fdesc.write("cdsnameintl = {}\n")
- for lang in conf.get("Invenio", "cdslangs").split(","):
+ for lang in conf.get("Invenio", "CFG_SITE_LANGS").split(","):
fdesc.write("cdsnameintl['%s'] = \"%s\"\n" % (lang, conf.get("Invenio",
- "cdsnameintl_" + lang)))
+ "CFG_SITE_NAME_INTL_" + lang)))
## special treatment for legacy WebSubmit options: (FIXME: phase them out)
- fdesc.write("accessurl = '%s/search'\n" % conf.get("Invenio", "WEBURL"))
- fdesc.write("urlpath = '%s'\n" % conf.get("Invenio", "WEBURL"))
- fdesc.write("images = '%s/img'\n" % conf.get("Invenio", "WEBURL"))
- fdesc.write("htdocsurl = '%s'\n" % conf.get("Invenio", "WEBURL"))
+ fdesc.write("accessurl = '%s/search'\n" % conf.get("Invenio", "CFG_SITE_URL"))
+ fdesc.write("urlpath = '%s'\n" % conf.get("Invenio", "CFG_SITE_URL"))
+ fdesc.write("images = '%s/img'\n" % conf.get("Invenio", "CFG_SITE_URL"))
+ fdesc.write("htdocsurl = '%s'\n" % conf.get("Invenio", "CFG_SITE_URL"))
## process all the options normally:
for section in conf.sections():
for option in conf.options(section):
if not option.startswith('CFG_DATABASE_'):
# put all options except for db credentials into config.py
line_out = convert_conf_option(option, conf.get(section, option))
if line_out:
fdesc.write(line_out + "\n")
## generate postamble:
fdesc.write("\n")
fdesc.write("# END OF GENERATED FILE")
## we are done:
fdesc.close()
+ print "You may want to restart Apache now."
print ">>> config.py updated successfully."
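The loop above emits one config.py line per conf option via convert_conf_option(), which is defined elsewhere in this file. A minimal Python 3 sketch of the idea — the quoting rule and the option names here are illustrative assumptions, not Invenio's actual converter:

```python
# Sketch (Python 3): turn ConfigParser options into config.py assignments.
# Integers stay bare, everything else is quoted; names are uppercased to
# match the CFG_* convention.  Illustrative only.
from configparser import ConfigParser

def convert_option(name, value):
    """Emit a single config.py line for one conf option."""
    try:
        int(value)
        return "%s = %s" % (name.upper(), value)
    except ValueError:
        return "%s = %r" % (name.upper(), value)

conf = ConfigParser()
conf.read_string("""
[Invenio]
CFG_SITE_LANG = en
CFG_MAX_RECORDS = 100
""")

lines = [convert_option(opt, conf.get("Invenio", opt))
         for opt in conf.options("Invenio")]
print("\n".join(lines))
```

Note that ConfigParser lowercases option names on read, which is why the sketch (and cli_cmd_get() below in this file) can treat lookups case-insensitively.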
def cli_cmd_update_dbquery_py(conf):
"""
Update lib/dbquery.py file with DB parameters read from conf file.
Note: this edits dbquery.py in situ, taking a backup first.
Use only when you know what you are doing.
"""
print ">>> Going to update dbquery.py..."
## location where dbquery.py is:
- dbquerypyfile = conf.get("Invenio", "pylibdir") + \
+ dbquerypyfile = conf.get("Invenio", "CFG_PYLIBDIR") + \
os.sep + 'invenio' + os.sep + 'dbquery.py'
## backup current dbquery.py file:
if os.path.exists(dbquerypyfile):
shutil.copy(dbquerypyfile, dbquerypyfile + '.OLD')
## replace db parameters:
out = ''
for line in open(dbquerypyfile, 'r').readlines():
match = re.search(r'^CFG_DATABASE_(HOST|NAME|USER|PASS)(\s*=\s*)\'.*\'$', line)
if match:
dbparam = 'CFG_DATABASE_' + match.group(1)
out += "%s%s'%s'\n" % (dbparam, match.group(2),
conf.get('Invenio', dbparam))
else:
out += line
fdesc = open(dbquerypyfile, 'w')
fdesc.write(out)
fdesc.close()
+ print "You may want to restart Apache now."
print ">>> dbquery.py updated successfully."
def cli_cmd_update_dbexec(conf):
"""
Update bin/dbexec file with DB parameters read from conf file.
Note: this edits dbexec in situ, taking a backup first.
Use only when you know what you are doing.
"""
print ">>> Going to update dbexec..."
## location where dbexec is:
- dbexecfile = conf.get("Invenio", "bindir") + \
+ dbexecfile = conf.get("Invenio", "CFG_BINDIR") + \
os.sep + 'dbexec'
## backup current dbexec file:
if os.path.exists(dbexecfile):
shutil.copy(dbexecfile, dbexecfile + '.OLD')
## replace db parameters:
out = ''
for line in open(dbexecfile, 'r').readlines():
match = re.search(r'^CFG_DATABASE_(HOST|NAME|USER|PASS)(\s*=\s*)\'.*\'$', line)
if match:
dbparam = 'CFG_DATABASE_' + match.group(1)
out += "%s%s'%s'\n" % (dbparam, match.group(2),
conf.get("Invenio", dbparam))
else:
out += line
fdesc = open(dbexecfile, 'w')
fdesc.write(out)
fdesc.close()
print ">>> dbexec updated successfully."
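Both updaters above rely on the same idiom: read the target file line by line, rewrite only the lines matching the CFG_DATABASE_* assignment pattern, and pass everything else through untouched. A self-contained Python 3 sketch of that substitution (the replacement values are illustrative):

```python
# Sketch (Python 3) of the in-situ substitution used by
# cli_cmd_update_dbquery_py()/cli_cmd_update_dbexec(): only the four
# CFG_DATABASE_* assignment lines are rewritten.
import re

NEW_VALUES = {'CFG_DATABASE_HOST': 'db.example.org',   # illustrative values
              'CFG_DATABASE_NAME': 'invenio'}

def substitute(text):
    out = []
    for line in text.splitlines(True):
        match = re.search(r"^CFG_DATABASE_(HOST|NAME|USER|PASS)(\s*=\s*)'.*'$",
                          line)
        if match:
            param = 'CFG_DATABASE_' + match.group(1)
            # keep the original spacing around '=' via group(2):
            out.append("%s%s'%s'\n" % (param, match.group(2),
                                       NEW_VALUES.get(param, '')))
        else:
            out.append(line)
    return ''.join(out)

source = "import os\nCFG_DATABASE_HOST = 'localhost'\n"
print(substitute(source))
```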
def cli_cmd_update_bibconvert_tpl(conf):
"""
Update bibconvert/config/*.tpl files looking for 856
http://.../record/ lines, replacing the URL prefix with
CFG_SITE_URL taken from the conf file. Note: this edits tpl files in situ, taking a
backup first. Use only when you know what you are doing.
"""
print ">>> Going to update bibconvert templates..."
## location where bibconvert/config/*.tpl are:
- tpldir = conf.get("Invenio", 'ETCDIR') + \
+ tpldir = conf.get("Invenio", 'CFG_ETCDIR') + \
os.sep + 'bibconvert' + os.sep + 'config'
## find all *.tpl files:
for tplfilename in os.listdir(tpldir):
if tplfilename.endswith(".tpl"):
## change tpl file:
tplfile = tpldir + os.sep + tplfilename
shutil.copy(tplfile, tplfile + '.OLD')
out = ''
for line in open(tplfile, 'r').readlines():
match = re.search(r'^(.*)http://.*?/record/(.*)$', line)
if match:
out += "%s%s/record/%s\n" % (match.group(1),
- conf.get("Invenio", 'WEBURL'),
+ conf.get("Invenio", 'CFG_SITE_URL'),
match.group(2))
else:
out += line
fdesc = open(tplfile, 'w')
fdesc.write(out)
fdesc.close()
print ">>> bibconvert templates updated successfully."
-def cli_cmd_reset_cdsname(conf):
+def cli_cmd_reset_sitename(conf):
"""
- Reset collection-related tables with new CDSNAME and
- CDSNAMEINTL read from conf files.
+ Reset collection-related tables with new CFG_SITE_NAME and
+ CFG_SITE_NAME_INTL* read from conf files.
"""
- print ">>> Going to reset CDSNAME and CDSNAMEINTL..."
+ print ">>> Going to reset CFG_SITE_NAME and CFG_SITE_NAME_INTL..."
from invenio.dbquery import run_sql, IntegrityError
- # reset CDSNAME:
- cdsname = conf.get("Invenio", "cdsname")
+ # reset CFG_SITE_NAME:
+ sitename = conf.get("Invenio", "CFG_SITE_NAME")
try:
run_sql("""INSERT INTO collection (id, name, dbquery, reclist, restricted) VALUES
- (1,%s,NULL,NULL,NULL)""", (cdsname,))
+ (1,%s,NULL,NULL,NULL)""", (sitename,))
except IntegrityError:
- run_sql("""UPDATE collection SET name=%s WHERE id=1""", (cdsname,))
- # reset CDSNAMEINTL:
- for lang in conf.get("Invenio", "cdslangs").split(","):
- cdsname_lang = conf.get("Invenio", "cdsnameintl_" + lang)
+ run_sql("""UPDATE collection SET name=%s WHERE id=1""", (sitename,))
+ # reset CFG_SITE_NAME_INTL:
+ for lang in conf.get("Invenio", "CFG_SITE_LANGS").split(","):
+ sitename_lang = conf.get("Invenio", "CFG_SITE_NAME_INTL_" + lang)
try:
run_sql("""INSERT INTO collectionname (id_collection, ln, type, value) VALUES
- (%s,%s,%s,%s)""", (1, lang, 'ln', cdsname_lang))
+ (%s,%s,%s,%s)""", (1, lang, 'ln', sitename_lang))
except IntegrityError:
run_sql("""UPDATE collectionname SET value=%s
WHERE ln=%s AND id_collection=1 AND type='ln'""",
- (cdsname_lang, lang))
- print ">>> CDSNAME and CDSNAMEINTL reset successfully."
+ (sitename_lang, lang))
+ print "You may want to restart Apache now."
+ print ">>> CFG_SITE_NAME and CFG_SITE_NAME_INTL* reset successfully."
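The reset functions use a try-INSERT, fall-back-to-UPDATE pattern keyed on IntegrityError. A Python 3 sketch of the same pattern, using sqlite3 in place of MySQL so it runs standalone (table and values are illustrative):

```python
# Sketch (Python 3): INSERT-then-UPDATE upsert via IntegrityError,
# mirroring cli_cmd_reset_sitename() but against an in-memory sqlite3 DB.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT)")

def set_site_name(name):
    try:
        # first attempt: plain INSERT for id=1
        conn.execute("INSERT INTO collection (id, name) VALUES (1, ?)", (name,))
    except sqlite3.IntegrityError:
        # row already exists: fall back to UPDATE
        conn.execute("UPDATE collection SET name=? WHERE id=1", (name,))

set_site_name('Atlantis Institute')       # first call inserts
set_site_name('Atlantis Institute v2')    # second call updates
print(conn.execute("SELECT name FROM collection WHERE id=1").fetchone()[0])
```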
def cli_cmd_reset_adminemail(conf):
"""
Reset user-related tables with the new CFG_SITE_ADMIN_EMAIL read from conf files.
"""
print ">>> Going to reset CFG_SITE_ADMIN_EMAIL..."
from invenio.dbquery import run_sql
- adminemail = conf.get("Invenio", "adminemail")
+ adminemail = conf.get("Invenio", "CFG_SITE_ADMIN_EMAIL")
run_sql("DELETE FROM user WHERE id=1")
run_sql("""INSERT INTO user (id, email, password, note, nickname) VALUES
(1, %s, AES_ENCRYPT(email, ''), 1, 'admin')""",
(adminemail,))
+ print "You may want to restart Apache now."
print ">>> CFG_SITE_ADMIN_EMAIL reset successfully."
def cli_cmd_reset_fieldnames(conf):
"""
Reset I18N field names such as author, title, etc and other I18N
ranking method names such as word similarity. Their translations
are taken from the PO files.
"""
print ">>> Going to reset I18N field names..."
from invenio.messages import gettext_set_language, language_list_long
from invenio.dbquery import run_sql, IntegrityError
## get field id and name list:
field_id_name_list = run_sql("SELECT id, name FROM field")
## get rankmethod id and name list:
rankmethod_id_name_list = run_sql("SELECT id, name FROM rnkMETHOD")
## update names for every language:
for lang, dummy in language_list_long():
_ = gettext_set_language(lang)
## this list is put here in order for PO system to pick names
## suitable for translation
field_name_names = {"any field": _("any field"),
"title": _("title"),
"author": _("author"),
"abstract": _("abstract"),
"keyword": _("keyword"),
"report number": _("report number"),
"subject": _("subject"),
"reference": _("reference"),
"fulltext": _("fulltext"),
"collection": _("collection"),
"division": _("division"),
"year": _("year"),
"experiment": _("experiment"),
"record ID": _("record ID"),}
## update I18N names for every language:
for (field_id, field_name) in field_id_name_list:
try:
run_sql("""INSERT INTO fieldname (id_field,ln,type,value) VALUES
(%s,%s,%s,%s)""", (field_id, lang, 'ln',
field_name_names[field_name]))
except IntegrityError:
run_sql("""UPDATE fieldname SET value=%s
WHERE id_field=%s AND ln=%s AND type=%s""",
(field_name_names[field_name], field_id, lang, 'ln',))
## ditto for rank methods:
rankmethod_name_names = {"wrd": _("word similarity"),
"demo_jif": _("journal impact factor"),
"citation": _("times cited"),}
for (rankmethod_id, rankmethod_name) in rankmethod_id_name_list:
try:
run_sql("""INSERT INTO rnkMETHODNAME (id_rnkMETHOD,ln,type,value) VALUES
(%s,%s,%s,%s)""", (rankmethod_id, lang, 'ln',
rankmethod_name_names[rankmethod_name]))
except IntegrityError:
run_sql("""UPDATE rnkMETHODNAME SET value=%s
WHERE id_rnkMETHOD=%s AND ln=%s AND type=%s""",
(rankmethod_name_names[rankmethod_name], rankmethod_id, lang, 'ln',))
print ">>> I18N field names reset successfully."
def test_db_connection():
"""
Test DB connection, and if fails, advise user how to set it up.
Useful to be called during table creation.
"""
print "Testing DB connection...",
from invenio.textutils import wrap_text_in_a_box
from invenio.dbquery import run_sql, Error
## first, test connection to the DB server:
try:
run_sql("SHOW TABLES")
except Error, err:
from invenio.dbquery import CFG_DATABASE_HOST, CFG_DATABASE_NAME, \
CFG_DATABASE_USER, CFG_DATABASE_PASS
print wrap_text_in_a_box("""\
DATABASE CONNECTIVITY ERROR %(errno)d: %(errmsg)s.\n
Perhaps you need to set up database and connection rights?
If yes, then please login as MySQL admin user and run the
following commands now:
$ mysql -h %(dbhost)s -u root -p mysql
mysql> CREATE DATABASE %(dbname)s DEFAULT CHARACTER SET utf8;
mysql> GRANT ALL PRIVILEGES ON %(dbname)s.* TO %(dbuser)s@%(webhost)s IDENTIFIED BY '%(dbpass)s';
mysql> QUIT
The values printed above were detected from your configuration.
If they are not right, then please edit your invenio.conf file
and rerun 'inveniocfg --update-all' first.
If the problem is of different nature, then please inspect
the above error message and fix the problem before continuing.""" % \
{'errno': err.args[0],
'errmsg': err.args[1],
'dbname': CFG_DATABASE_NAME,
'dbhost': CFG_DATABASE_HOST,
'dbuser': CFG_DATABASE_USER,
'dbpass': CFG_DATABASE_PASS,
'webhost': CFG_DATABASE_HOST == 'localhost' and 'localhost' or os.popen('hostname -f', 'r').read().strip(),
})
sys.exit(1)
print "ok"
## second, test insert/select of a Unicode string to detect
## possible Python/MySQL/MySQLdb mis-setup:
print "Testing Python/MySQL/MySQLdb UTF-8 chain...",
try:
beta_in_utf8 = "β" # Greek beta in UTF-8 is 0xCEB2
run_sql("CREATE TEMPORARY TABLE test__invenio__utf8 (x char(1), y varbinary(2)) DEFAULT CHARACTER SET utf8")
run_sql("INSERT INTO test__invenio__utf8 (x, y) VALUES (%s, %s)", (beta_in_utf8, beta_in_utf8))
res = run_sql("SELECT x,y,HEX(x),HEX(y),LENGTH(x),LENGTH(y),CHAR_LENGTH(x),CHAR_LENGTH(y) FROM test__invenio__utf8")
assert res[0] == ('\xce\xb2', '\xce\xb2', 'CEB2', 'CEB2', 2L, 2L, 1L, 2L)
run_sql("DROP TEMPORARY TABLE test__invenio__utf8")
except Exception, err:
print wrap_text_in_a_box("""\
DATABASE RELATED ERROR %s\n
A problem was detected with the UTF-8 treatment in the chain
between the Python application, the MySQLdb connector, and
the MySQL database. You may perhaps have installed older
versions of some prerequisite packages?\n
Please check the INSTALL file and please fix this problem
before continuing.""" % err)
sys.exit(1)
print "ok"
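The UTF-8 chain test above hinges on the byte-vs-character distinction for the Greek beta: stored correctly, it occupies two bytes (0xCE 0xB2) but one character, which is exactly what the LENGTH()/CHAR_LENGTH() assertion checks. The same invariant in a quick Python 3 check:

```python
# Sketch (Python 3): the byte/character invariant behind the MySQL UTF-8
# round-trip assertion in test_db_connection().
beta = u"\u03b2"                      # Greek small beta
encoded = beta.encode('utf-8')
assert encoded == b'\xce\xb2'         # UTF-8 bytes 0xCE 0xB2
assert len(encoded) == 2              # LENGTH() in MySQL counts bytes
assert len(beta) == 1                 # CHAR_LENGTH() counts characters
print(encoded.hex().upper())
```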
def cli_cmd_create_tables(conf):
"""Create and fill Invenio DB tables. Useful for the installation process."""
print ">>> Going to create and fill tables..."
from invenio.config import CFG_PREFIX
test_db_connection()
for cmd in ["%s/bin/dbexec < %s/lib/sql/invenio/tabcreate.sql" % (CFG_PREFIX, CFG_PREFIX),
"%s/bin/dbexec < %s/lib/sql/invenio/tabfill.sql" % (CFG_PREFIX, CFG_PREFIX)]:
if os.system(cmd):
print "ERROR: failed execution of", cmd
sys.exit(1)
- cli_cmd_reset_cdsname(conf)
+ cli_cmd_reset_sitename(conf)
cli_cmd_reset_adminemail(conf)
cli_cmd_reset_fieldnames(conf)
for cmd in ["%s/bin/webaccessadmin -u admin -c -a" % CFG_PREFIX]:
if os.system(cmd):
print "ERROR: failed execution of", cmd
sys.exit(1)
print ">>> Tables created and filled successfully."
def cli_cmd_drop_tables(conf):
"""Drop Invenio DB tables. Useful for the uninstallation process."""
print ">>> Going to drop tables..."
from invenio.config import CFG_PREFIX
from invenio.textutils import wrap_text_in_a_box, wait_for_user
if '--yes-i-know' not in sys.argv:
wait_for_user(wrap_text_in_a_box("""\
WARNING: You are going to destroy your database tables!\n
Press Ctrl-C if you want to abort this action.\n
Press ENTER to proceed with this action."""))
cmd = "%s/bin/dbexec < %s/lib/sql/invenio/tabdrop.sql" % (CFG_PREFIX, CFG_PREFIX)
if os.system(cmd):
print "ERROR: failed execution of", cmd
sys.exit(1)
print ">>> Tables dropped successfully."
def cli_cmd_create_demo_site(conf):
"""Create demo site. Useful for testing purposes."""
print ">>> Going to create demo site..."
from invenio.config import CFG_PREFIX
from invenio.dbquery import run_sql
run_sql("TRUNCATE schTASK")
for cmd in ["%s/bin/dbexec < %s/lib/sql/invenio/democfgdata.sql" % (CFG_PREFIX, CFG_PREFIX),
"%s/bin/webaccessadmin -u admin -c -r -D" % CFG_PREFIX,
"%s/bin/webcoll -u admin" % CFG_PREFIX,
"%s/bin/webcoll 1" % CFG_PREFIX,]:
if os.system(cmd):
print "ERROR: failed execution of", cmd
sys.exit(1)
print ">>> Demo site created successfully."
def cli_cmd_load_demo_records(conf):
"""Load demo records. Useful for testing purposes."""
from invenio.config import CFG_PREFIX
from invenio.dbquery import run_sql
print ">>> Going to load demo records..."
run_sql("TRUNCATE schTASK")
for cmd in ["%s/bin/bibupload -i %s/var/tmp/demobibdata.xml" % (CFG_PREFIX, CFG_PREFIX),
"%s/bin/bibupload 1" % CFG_PREFIX,
"%s/bin/bibindex -u admin" % CFG_PREFIX,
"%s/bin/bibindex 2" % CFG_PREFIX,
"%s/bin/bibreformat -u admin -o HB" % CFG_PREFIX,
"%s/bin/bibreformat 3" % CFG_PREFIX,
"%s/bin/bibupload 4" % CFG_PREFIX,
"%s/bin/webcoll -u admin" % CFG_PREFIX,
"%s/bin/webcoll 5" % CFG_PREFIX,
"%s/bin/bibrank -u admin" % CFG_PREFIX,
"%s/bin/bibrank 6" % CFG_PREFIX,]:
if os.system(cmd):
print "ERROR: failed execution of", cmd
sys.exit(1)
print ">>> Demo records loaded successfully."
def cli_cmd_remove_demo_records(conf):
"""Remove demo records. Useful when you are finished testing."""
print ">>> Going to remove demo records..."
from invenio.config import CFG_PREFIX
from invenio.dbquery import run_sql
from invenio.textutils import wrap_text_in_a_box, wait_for_user
if '--yes-i-know' not in sys.argv:
wait_for_user(wrap_text_in_a_box("""\
WARNING: You are going to destroy your records and documents!\n
Press Ctrl-C if you want to abort this action.\n
Press ENTER to proceed with this action."""))
if os.path.exists(CFG_PREFIX + os.sep + 'var' + os.sep + 'data' + os.sep + 'files'):
shutil.rmtree(CFG_PREFIX + os.sep + 'var' + os.sep + 'data' + os.sep + 'files')
run_sql("TRUNCATE schTASK")
for cmd in ["%s/bin/dbexec < %s/lib/sql/invenio/tabbibclean.sql" % (CFG_PREFIX, CFG_PREFIX),
"%s/bin/webcoll -u admin" % CFG_PREFIX,
"%s/bin/webcoll 1" % CFG_PREFIX,]:
if os.system(cmd):
print "ERROR: failed execution of", cmd
sys.exit(1)
print ">>> Demo records removed successfully."
def cli_cmd_drop_demo_site(conf):
"""Drop demo site completely. Useful when you are finished testing."""
print ">>> Going to drop demo site..."
from invenio.textutils import wrap_text_in_a_box, wait_for_user
if '--yes-i-know' not in sys.argv:
wait_for_user(wrap_text_in_a_box("""\
WARNING: You are going to destroy your site and documents!\n
Press Ctrl-C if you want to abort this action.\n
Press ENTER to proceed with this action."""))
cli_cmd_drop_tables(conf)
cli_cmd_create_tables(conf)
cli_cmd_remove_demo_records(conf)
print ">>> Demo site dropped successfully."
def cli_cmd_run_unit_tests(conf):
"""Run unit tests, usually on the working demo site."""
from invenio.config import CFG_PREFIX
os.system("%s/bin/testsuite" % CFG_PREFIX)
def cli_cmd_run_regression_tests(conf):
"""Run regression tests, usually on the working demo site."""
from invenio.config import CFG_PREFIX
if '--yes-i-know' in sys.argv:
os.system("%s/bin/regressiontestsuite --yes-i-know" % CFG_PREFIX)
else:
os.system("%s/bin/regressiontestsuite" % CFG_PREFIX)
def cli_cmd_create_apache_conf(conf):
"""
Create Apache conf files for this site, keeping previous
files in a backup copy.
"""
print ">>> Going to create Apache conf files..."
from invenio.textutils import wrap_text_in_a_box
- apache_conf_dir = conf.get("Invenio", 'ETCDIR') + \
+ apache_conf_dir = conf.get("Invenio", 'CFG_ETCDIR') + \
os.sep + 'apache'
if not os.path.exists(apache_conf_dir):
os.mkdir(apache_conf_dir)
apache_vhost_file = apache_conf_dir + os.sep + \
'invenio-apache-vhost.conf'
apache_vhost_ssl_file = apache_conf_dir + os.sep + \
'invenio-apache-vhost-ssl.conf'
apache_vhost_body = """\
AddDefaultCharset UTF-8
ServerSignature Off
ServerTokens Prod
NameVirtualHost *:80
<Files *.pyc>
    deny from all
</Files>
<Files *~>
    deny from all
</Files>
<VirtualHost *:80>
    ServerName %(servername)s
    ServerAlias %(serveralias)s
    ServerAdmin %(serveradmin)s
    DocumentRoot %(webdir)s
    <Directory %(webdir)s>
        Options FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        allow from all
    </Directory>
    ErrorLog %(logdir)s/apache.err
    LogLevel warn
    CustomLog %(logdir)s/apache.log combined
    DirectoryIndex index.en.html index.html
    <Location />
        SetHandler python-program
        PythonHandler invenio.webinterface_layout
        PythonDebug On
    </Location>
    <Directory %(webdir)s>
        AddHandler python-program .py
        PythonHandler mod_python.publisher
        PythonDebug On
    </Directory>
</VirtualHost>
-""" % {'servername': conf.get('Invenio', 'WEBURL').replace("http://", ""),
- 'serveralias': conf.get('Invenio', 'WEBURL').replace("http://", "").split('.')[0],
- 'serveradmin': conf.get('Invenio', 'ADMINEMAIL'),
- 'webdir': conf.get('Invenio', 'WEBDIR'),
- 'logdir': conf.get('Invenio', 'LOGDIR'),
+""" % {'servername': conf.get('Invenio', 'CFG_SITE_URL').replace("http://", ""),
+ 'serveralias': conf.get('Invenio', 'CFG_SITE_URL').replace("http://", "").split('.')[0],
+ 'serveradmin': conf.get('Invenio', 'CFG_SITE_ADMIN_EMAIL'),
+ 'webdir': conf.get('Invenio', 'CFG_WEBDIR'),
+ 'logdir': conf.get('Invenio', 'CFG_LOGDIR'),
}
apache_vhost_ssl_body = """\
ServerSignature Off
ServerTokens Prod
NameVirtualHost *:443
#SSLCertificateFile /etc/apache2/ssl/apache.pem
SSLCertificateFile /etc/apache2/ssl/server.crt
SSLCertificateKeyFile /etc/apache2/ssl/server.key
<Files *.pyc>
    deny from all
</Files>
<Files *~>
    deny from all
</Files>
<VirtualHost *:443>
    ServerName %(servername)s
    ServerAlias %(serveralias)s
    ServerAdmin %(serveradmin)s
    SSLEngine on
    DocumentRoot %(webdir)s
    <Directory %(webdir)s>
        Options FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        allow from all
    </Directory>
    ErrorLog %(logdir)s/apache-ssl.err
    LogLevel warn
    CustomLog %(logdir)s/apache-ssl.log combined
    DirectoryIndex index.en.html index.html
    <Location />
        SetHandler python-program
        PythonHandler invenio.webinterface_layout
        PythonDebug On
    </Location>
    <Directory %(webdir)s>
        AddHandler python-program .py
        PythonHandler mod_python.publisher
        PythonDebug On
    </Directory>
</VirtualHost>
-""" % {'servername': conf.get('Invenio', 'SWEBURL').replace("http://", ""),
- 'serveralias': conf.get('Invenio', 'SWEBURL').replace("http://", "").split('.')[0],
- 'serveradmin': conf.get('Invenio', 'ADMINEMAIL'),
- 'webdir': conf.get('Invenio', 'WEBDIR'),
- 'logdir': conf.get('Invenio', 'LOGDIR'),
+""" % {'servername': conf.get('Invenio', 'CFG_SITE_URL_SECURE').replace("https://", ""),
+ 'serveralias': conf.get('Invenio', 'CFG_SITE_URL_SECURE').replace("https://", "").split('.')[0],
+ 'serveradmin': conf.get('Invenio', 'CFG_SITE_ADMIN_EMAIL'),
+ 'webdir': conf.get('Invenio', 'CFG_WEBDIR'),
+ 'logdir': conf.get('Invenio', 'CFG_LOGDIR'),
}
# write HTTP vhost snippet:
if os.path.exists(apache_vhost_file):
shutil.copy(apache_vhost_file,
apache_vhost_file + '.OLD')
fdesc = open(apache_vhost_file, 'w')
fdesc.write(apache_vhost_body)
fdesc.close()
print "Created file", apache_vhost_file
# write HTTPS vhost snippet:
- if conf.get('Invenio', 'SWEBURL') != \
- conf.get('Invenio', 'WEBURL'):
+ if conf.get('Invenio', 'CFG_SITE_URL_SECURE') != \
+ conf.get('Invenio', 'CFG_SITE_URL'):
if os.path.exists(apache_vhost_ssl_file):
shutil.copy(apache_vhost_ssl_file,
apache_vhost_ssl_file + '.OLD')
fdesc = open(apache_vhost_ssl_file, 'w')
fdesc.write(apache_vhost_ssl_body)
fdesc.close()
print "Created file", apache_vhost_ssl_file
print ""
print wrap_text_in_a_box("""\
Apache virtual host configurations for your site have been
created. You can check created files and put the following
include statements in your httpd.conf:\n
Include %s
Include %s
""" % (apache_vhost_file, apache_vhost_ssl_file))
print ">>> Apache conf files created."
def cli_cmd_get(conf, varname):
"""
Return value of VARNAME read from CONF files. Useful for
third-party programs to access values of conf options such as
CFG_PREFIX. Return None if VARNAME is not found.
"""
# do not pay attention to upper/lower case:
varname = varname.lower()
# do not pay attention to section names yet:
all_options = {}
for section in conf.sections():
for option in conf.options(section):
all_options[option] = conf.get(section, option)
return all_options.get(varname, None)
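cli_cmd_get() works because ConfigParser lowercases option names on read, so washing the requested name to lowercase and merging all sections gives a case-insensitive, section-agnostic lookup. A Python 3 sketch of the same lookup (conf contents are illustrative):

```python
# Sketch (Python 3) of cli_cmd_get(): case-insensitive, section-agnostic
# option lookup over a ConfigParser instance.
from configparser import ConfigParser

conf = ConfigParser()
conf.read_string("[Invenio]\nCFG_PREFIX = /opt/invenio\n")

def get_var(conf, varname):
    varname = varname.lower()          # ConfigParser stores options lowercased
    all_options = {}
    for section in conf.sections():    # flatten all sections into one dict
        for option in conf.options(section):
            all_options[option] = conf.get(section, option)
    return all_options.get(varname)

print(get_var(conf, 'CFG_PREFIX'))
```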
def cli_cmd_list(conf):
"""
Print a list of all conf options and values from CONF.
"""
for section in conf.sections():
for option in conf.options(section):
print option, '=', conf.get(section, option)
def main():
"""Main entry point."""
conf = ConfigParser()
if '--help' in sys.argv or \
'-h' in sys.argv:
print_usage()
elif '--version' in sys.argv or \
'-V' in sys.argv:
print_version()
else:
confdir = None
if '--conf-dir' in sys.argv:
try:
confdir = sys.argv[sys.argv.index('--conf-dir') + 1]
except IndexError:
pass # missing --conf-dir argument value
if confdir is None or not os.path.exists(confdir):
print "ERROR: bad or missing --conf-dir option value."
sys.exit(1)
else:
## try to detect path to conf dir (relative to this bin dir):
confdir = re.sub(r'/bin$', '/etc', sys.path[0])
## read conf files:
for conffile in [confdir + os.sep + 'invenio.conf',
confdir + os.sep + 'invenio-autotools.conf',
confdir + os.sep + 'invenio-local.conf',]:
if os.path.exists(conffile):
conf.read(conffile)
else:
print "ERROR: Badly guessed conf file location", conffile
print "(Please use --conf-dir option.)"
sys.exit(1)
## decide what to do:
done = False
for opt_idx in range(0, len(sys.argv)):
opt = sys.argv[opt_idx]
if opt == '--conf-dir':
# already treated before, so skip silently:
pass
elif opt == '--get':
try:
varname = sys.argv[opt_idx + 1]
except IndexError:
print "ERROR: bad or missing --get option value."
sys.exit(1)
if varname.startswith('-'):
print "ERROR: bad or missing --get option value."
sys.exit(1)
varvalue = cli_cmd_get(conf, varname)
if varvalue is not None:
print varvalue
else:
sys.exit(1)
done = True
elif opt == '--list':
cli_cmd_list(conf)
done = True
elif opt == '--create-tables':
cli_cmd_create_tables(conf)
done = True
elif opt == '--drop-tables':
cli_cmd_drop_tables(conf)
done = True
elif opt == '--create-demo-site':
cli_cmd_create_demo_site(conf)
done = True
elif opt == '--load-demo-records':
cli_cmd_load_demo_records(conf)
done = True
elif opt == '--remove-demo-records':
cli_cmd_remove_demo_records(conf)
done = True
elif opt == '--drop-demo-site':
cli_cmd_drop_demo_site(conf)
done = True
elif opt == '--run-unit-tests':
cli_cmd_run_unit_tests(conf)
done = True
elif opt == '--run-regression-tests':
cli_cmd_run_regression_tests(conf)
done = True
elif opt == '--update-all':
cli_cmd_update_config_py(conf)
cli_cmd_update_dbquery_py(conf)
cli_cmd_update_dbexec(conf)
cli_cmd_update_bibconvert_tpl(conf)
done = True
elif opt == '--update-config-py':
cli_cmd_update_config_py(conf)
done = True
elif opt == '--update-dbquery-py':
cli_cmd_update_dbquery_py(conf)
done = True
elif opt == '--update-dbexec':
cli_cmd_update_dbexec(conf)
done = True
elif opt == '--update-bibconvert-tpl':
cli_cmd_update_bibconvert_tpl(conf)
done = True
elif opt == '--reset-all':
- cli_cmd_reset_cdsname(conf)
+ cli_cmd_reset_sitename(conf)
cli_cmd_reset_adminemail(conf)
cli_cmd_reset_fieldnames(conf)
done = True
- elif opt == '--reset-cdsname':
- cli_cmd_reset_cdsname(conf)
+ elif opt == '--reset-sitename':
+ cli_cmd_reset_sitename(conf)
done = True
elif opt == '--reset-adminemail':
cli_cmd_reset_adminemail(conf)
done = True
elif opt == '--reset-fieldnames':
cli_cmd_reset_fieldnames(conf)
done = True
elif opt == '--create-apache-conf':
cli_cmd_create_apache_conf(conf)
done = True
elif opt.startswith("-") and opt != '--yes-i-know':
print "ERROR: unknown option", opt
sys.exit(1)
if not done:
print """ERROR: Please specify a command. Please see '--help'."""
sys.exit(1)
if __name__ == '__main__':
main()
diff --git a/modules/miscutil/lib/mailutils.py b/modules/miscutil/lib/mailutils.py
index 423c7958d..e6e81e11b 100644
--- a/modules/miscutil/lib/mailutils.py
+++ b/modules/miscutil/lib/mailutils.py
@@ -1,297 +1,297 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Invenio mail sending utilities. send_email() is the main API function
people should be using; just check out its docstring.
"""
__revision__ = "$Id$"
import sys
from time import sleep
import smtplib
import socket
import re
import os
from email.Header import Header
from email.MIMEText import MIMEText
from email.MIMEMultipart import MIMEMultipart
from email.MIMEImage import MIMEImage
from cStringIO import StringIO
from formatter import DumbWriter, AbstractFormatter
from invenio.config import \
supportemail, \
weburl, \
cdslang, \
cdsnameintl, \
cdsname, \
adminemail, \
CFG_MISCUTIL_SMTP_HOST, \
CFG_MISCUTIL_SMTP_PORT, \
- version
+ CFG_VERSION
from invenio.messages import wash_language, gettext_set_language
from invenio.errorlib import get_msgs_for_code_list, register_errors, register_exception
def send_email(fromaddr,
toaddr,
subject="",
content="",
html_content='',
html_images={},
header=None,
footer=None,
html_header=None,
html_footer=None,
copy_to_admin=0,
attempt_times=1,
attempt_sleeptime=10,
debug_level=0,
ln=cdslang,
charset='utf-8'
):
"""Send a forged email to TOADDR from FROMADDR with a message created from subject, content and possibly
header and footer.
@param fromaddr: [string] sender
@param toaddr: [string] receivers separated by ,
@param subject: [string] subject of the email
@param content: [string] content of the email
@param html_content: [string] html version of the email
@param html_images: [dict] dictionary of image id, image path
@param header: [string] header to add, None for the Default
@param footer: [string] footer to add, None for the Default
@param html_header: [string] header to add to the html part, None for the Default
@param html_footer: [string] footer to add to the html part, None for the Default
@param copy_to_admin: [int] if 1, add adminemail to the receivers
@param attempt_times: [int] number of tries
@param attempt_sleeptime: [int] seconds in between tries
@param debug_level: [int] debug level
@param ln: [string] invenio language
@param charset: which charset to use in message ('utf-8' by default)
If sending fails, try to send it ATTEMPT_TIMES, and wait for
ATTEMPT_SLEEPTIME seconds in between tries.
e.g.:
send_email('foo.bar@cern.ch', 'bar.foo@cern.ch', 'Let\'s try!', 'check 1234', 'check1234', {'image1': '/tmp/quantum.jpg'})
@return [bool]: True if email was sent okay, False if it was not.
"""
toaddr = toaddr.strip()
usebcc = ',' in toaddr # More than one address, let's use Bcc in place of To
if copy_to_admin:
if len(toaddr) > 0:
toaddr += ",%s" % (adminemail,)
else:
toaddr = adminemail
body = forge_email(fromaddr, toaddr, subject, content, html_content, html_images, usebcc, header, footer, html_header, html_footer, ln, charset)
toaddr = toaddr.split(",")
if attempt_times < 1 or len(toaddr[0]) == 0:
log('ERR_MISCUTIL_NOT_ATTEMPTING_SEND_EMAIL', fromaddr, toaddr, body)
return False
try:
server = smtplib.SMTP(CFG_MISCUTIL_SMTP_HOST, CFG_MISCUTIL_SMTP_PORT)
if debug_level > 2:
server.set_debuglevel(1)
else:
server.set_debuglevel(0)
server.sendmail(fromaddr, toaddr, body)
server.quit()
except (smtplib.SMTPException, socket.error):
if attempt_times > 1:
if (debug_level > 1):
log('ERR_MISCUTIL_CONNECTION_SMTP', attempt_sleeptime, sys.exc_info()[0], fromaddr, toaddr, body)
sleep(attempt_sleeptime)
# retry with the original arguments rather than the already-forged body:
return send_email(fromaddr, ','.join(toaddr), subject, content, html_content, html_images, attempt_times=attempt_times-1, attempt_sleeptime=attempt_sleeptime)
else:
log('ERR_MISCUTIL_SENDING_EMAIL', fromaddr, toaddr, body)
return False
except Exception:
register_exception()
return False
return True
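send_email() implements its retry policy by recursing with a decremented attempt counter. The same policy can be written as a loop; a Python 3 sketch with an injectable sender function, so no real SMTP connection is involved (the function and failure scenario are illustrative):

```python
# Sketch (Python 3): the retry-with-sleep policy of send_email(), written
# iteratively.  send_func stands in for the SMTP sendmail call.
import time

def send_with_retries(send_func, attempt_times=3, attempt_sleeptime=0.01):
    """Call send_func() up to attempt_times times, sleeping between failures."""
    for attempt in range(attempt_times):
        try:
            send_func()
            return True
        except (IOError, OSError):            # stand-in for SMTP/socket errors
            if attempt < attempt_times - 1:
                time.sleep(attempt_sleeptime)
    return False

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 2:                        # fail once, then succeed
        raise IOError("SMTP down")

result = send_with_retries(flaky)
print(result)
```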
def email_header(ln=cdslang):
"""The header of the email
@param ln: language
@return header as a string"""
ln = wash_language(ln)
_ = gettext_set_language(ln)
#standard header
out = """%(hello)s
""" % {
'hello': _("Hello:")
}
return out
def email_html_header(ln=cdslang):
"""The header of the email
@param ln: language
@return header as a string"""
ln = wash_language(ln)
_ = gettext_set_language(ln)
#standard header
out = """%(hello)s
""" % {
'hello': _("Hello:")
}
return out
def email_footer(ln=cdslang):
"""The footer of the email
@param ln: language
@return footer as a string"""
ln = wash_language(ln)
_ = gettext_set_language(ln)
#standard footer
out = """\n\n%(best_regards)s
--
%(cdsnameintl)s <%(weburl)s>
%(need_intervention_please_contact)s <%(supportemail)s>
""" % {
'cdsnameintl': cdsnameintl.get(ln, cdsname),
'best_regards': _("Best regards"),
'weburl': weburl,
'need_intervention_please_contact': _("Need human intervention? Contact"),
'supportemail': supportemail
}
return out
def email_html_footer(ln=cdslang):
"""The html footer of the email
@param ln: language
@return footer as a string"""
ln = wash_language(ln)
_ = gettext_set_language(ln)
#standard footer
out = """
%(best_regards)s%(cdsnameintl)s
%(need_intervention_please_contact)s %(supportemail)s
""" % {
'cdsnameintl': cdsnameintl.get(ln, cdsname),
'best_regards': _("Best regards"),
'weburl': weburl,
'need_intervention_please_contact': _("Need human intervention? Contact"),
'supportemail': supportemail
}
return out
def forge_email(fromaddr, toaddr, subject, content, html_content='',
html_images={}, usebcc=False, header=None, footer=None,
html_header=None, html_footer=None, ln=cdslang,
charset='utf-8'):
"""Prepare email. Add header and footer if needed.
@param fromaddr: [string] sender
@param toaddr: [string] receivers separated by ,
@param usebcc: [bool] True for using Bcc in place of To
@param subject: [string] subject of the email
@param content: [string] content of the email
@param html_content: [string] html version of the email
@param html_images: [dict] dictionary of image id, image path
@param header: [string] None for the default header
@param footer: [string] None for the default footer
@param ln: language
@param charset: which charset to use in message ('utf-8' by default)
@return forged email as a string"""
if header is None:
content = email_header(ln) + content
else:
content = header + content
if footer is None:
content += email_footer(ln)
else:
content += footer
if html_content:
if html_header is None:
html_content = email_html_header(ln) + html_content
else:
html_content = html_header + html_content
if html_footer is None:
html_content += email_html_footer(ln)
else:
html_content += html_footer
msg_root = MIMEMultipart('related')
msg_root['Subject'] = Header(subject, charset)
msg_root['From'] = fromaddr
if usebcc:
msg_root['Bcc'] = toaddr
else:
msg_root['To'] = toaddr
msg_root.preamble = 'This is a multi-part message in MIME format.'
msg_alternative = MIMEMultipart('alternative')
msg_root.attach(msg_alternative)
msg_text = MIMEText(content, _charset=charset)
msg_alternative.attach(msg_text)
msg_text = MIMEText(html_content, 'html', _charset=charset)
msg_alternative.attach(msg_text)
for image_id, image_path in html_images.iteritems():
msg_image = MIMEImage(open(image_path, 'rb').read())
msg_image.add_header('Content-ID', '<%s>' % image_id)
msg_image.add_header('Content-Disposition', 'attachment', filename=os.path.split(image_path)[1])
msg_root.attach(msg_image)
else:
msg_root = MIMEText(content, _charset=charset)
msg_root['From'] = fromaddr
if usebcc:
msg_root['Bcc'] = toaddr
else:
msg_root['To'] = toaddr
msg_root['Subject'] = Header(subject, charset)
- msg_root.add_header('User-Agent', 'CDS Invenio %s' % version)
+ msg_root.add_header('User-Agent', 'CDS Invenio %s' % CFG_VERSION)
return msg_root.as_string()
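forge_email() nests a multipart/alternative part (plain text plus HTML) inside a multipart/related root, so inline images can later be attached to the root and referenced by Content-ID from the HTML part. A Python 3 sketch of the same layout (addresses and bodies are placeholders):

```python
# Sketch (Python 3) of the MIME tree forge_email() builds:
# multipart/related -> multipart/alternative -> (text/plain, text/html).
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg_root = MIMEMultipart('related')
msg_root['Subject'] = 'Demo'
msg_root['From'] = 'from@example.org'          # placeholder addresses
msg_root['To'] = 'to@example.org'

msg_alternative = MIMEMultipart('alternative')
msg_root.attach(msg_alternative)
# plain-text part first, HTML part last (mail clients prefer the last one):
msg_alternative.attach(MIMEText('Hello', _charset='utf-8'))
msg_alternative.attach(MIMEText('<p>Hello</p>', 'html', _charset='utf-8'))

flat = msg_root.as_string()
print(msg_root.get_content_type())
```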
RE_NEWLINES = re.compile(r'<br>|<br/>', re.I)
RE_SPACES = re.compile(r'\s+')
RE_HTML_TAGS = re.compile(r'<.+?>')
def email_strip_html(html_content):
"""Strip html tags from html_content, trying to respect formatting."""
html_content = RE_SPACES.sub(' ', html_content)
html_content = RE_NEWLINES.sub('\n', html_content)
html_content = RE_HTML_TAGS.sub('', html_content)
html_content = html_content.split('\n')
out = StringIO()
out_format = AbstractFormatter(DumbWriter(out))
for row in html_content:
out_format.add_flowing_data(row)
out_format.end_paragraph(1)
return out.getvalue()
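email_strip_html above leans on the legacy formatter module (AbstractFormatter/DumbWriter, removed in Python 3.10) for paragraph reflowing; the core three-pass idea can be sketched with the regexps alone. The break-tag pattern below is an assumption, since those tags were mangled in this copy of the source:

```python
import re

RE_SPACES = re.compile(r'\s+')
RE_NEWLINES = re.compile(r'<br\s*/?>', re.I)  # assumed newline-producing tags
RE_HTML_TAGS = re.compile(r'<.+?>')

def strip_html(html_content):
    """Collapse whitespace, map break tags to newlines, drop other tags."""
    html_content = RE_SPACES.sub(' ', html_content)
    html_content = RE_NEWLINES.sub('\n', html_content)
    return RE_HTML_TAGS.sub('', html_content)
```

The pass order matters: whitespace is collapsed before break tags are turned into newlines, so the regenerated newlines survive to the output.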
def log(*error):
"""Register error
@param error: tuple of the form (ERR_, arg1, arg2, ...)
"""
_ = gettext_set_language(cdslang)
errors = get_msgs_for_code_list([error], 'error', cdslang)
register_errors(errors, 'error')
diff --git a/modules/miscutil/lib/messages.py b/modules/miscutil/lib/messages.py
index 7633713a5..6a5035e9a 100644
--- a/modules/miscutil/lib/messages.py
+++ b/modules/miscutil/lib/messages.py
@@ -1,89 +1,89 @@
# -*- coding: utf-8 -*-
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""CDS Invenio international messages functions, to be used by all
I18N interfaces. Typical usage in the caller code is:
from messages import gettext_set_language
[...]
def square(x, ln=cdslang):
_ = gettext_set_language(ln)
print _("Hello there!")
print _("The square of %s is %s.") % (x, x*x)
In the caller code, all output strings should be made translatable via
the _() convention.
For more information, see ABOUT-NLS file.
"""
__revision__ = "$Id$"
import gettext
-from invenio.config import localedir, cdslangs
+from invenio.config import CFG_LOCALEDIR, cdslangs
lang = {}
for ln in cdslangs:
- lang[ln] = gettext.translation('cds-invenio', localedir, languages = [ln], fallback = True)
+ lang[ln] = gettext.translation('cds-invenio', CFG_LOCALEDIR, languages = [ln], fallback = True)
def gettext_set_language(ln):
"""
Set the _ gettext function in every caller function
Usage::
_ = gettext_set_language(ln)
"""
return lang[ln].gettext
def wash_language(ln):
"""Look at LN and check if it is one of allowed languages for the interface.
Return it in case of success, return the default language otherwise."""
if ln in cdslangs:
return ln
else:
return 'en'
def language_list_long():
"""Return list of [short name, long name] for all enabled languages."""
cfg_all_language_names = {'bg': 'Български',
'ca': 'Català',
'cs': 'Česky',
'de': 'Deutsch',
'el': 'Ελληνικά',
'en': 'English',
'es': 'Español',
'fr': 'Français',
'hr': 'Hrvatski',
'it': 'Italiano',
'ja': '日本語',
'no': 'Norsk/Bokmål',
'pl': 'Polski',
'pt': 'Português',
'ru': 'Русский',
'sk': 'Slovensky',
'sv': 'Svenska',
'uk': 'Українська',
'zh_CN': '中文(简)',
'zh_TW': '中文(繁)',}
enabled_lang_list = []
for lang in cdslangs:
enabled_lang_list.append([lang,cfg_all_language_names[lang]])
return enabled_lang_list
diff --git a/modules/webaccess/doc/admin/webaccess-admin-guide.webdoc b/modules/webaccess/doc/admin/webaccess-admin-guide.webdoc
index 901291fcc..57f4bf9a8 100644
--- a/modules/webaccess/doc/admin/webaccess-admin-guide.webdoc
+++ b/modules/webaccess/doc/admin/webaccess-admin-guide.webdoc
@@ -1,1125 +1,1125 @@
## -*- mode: html; coding: utf-8; -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
WebAccess is a common role-based access control (RBAC) system for all of
CDS Invenio. This means that users are connected to roles that cover
different areas of access, e.g. administrator of the photo
collection or system librarian. Users can be active in
different areas and of course connected to as many roles as needed.
The roles are connected to actions. An action identifies a task you
can perform in CDS Invenio. It can be defined to take any number of
arguments in order to more clearly describe what you are allowing
connected users to do.
- For example the system librarian can be allowed to run bibwords on
+ For example the system librarian can be allowed to run bibindex on
the different indexes. To allow system librarians to run the
- bibwords indexing on the field author we connect role system
- librarian with action runbibwords using the argument
+ bibindex indexing on the field author we connect role system
+ librarian with action runbibindex using the argument
index='author'.
Additionally, roles could have firewall-like role
definitions. A definition is a formal description of which
users are entitled to belong to the role. So you have two ways for
connecting users to roles. Either linking explicitly a user with the
role or describing the characteristics that makes users belong to
the role.
WebAccess is based on allowing users to perform actions. This means
that only allowed actions are stored in the access control engine's
database.
All the WebAccess Administration web pages have certain
features/design choices in common:
- Divided into steps
The process of adding new authorizations/information is
stepwise. The subtitle contains information about which step you are
on and what you are supposed to do.
- Restart from any wanted step
You can always start from an earlier step by simply clicking the
wanted button. This is not a way to undo changes! No information
about the previous database state is kept, so all changes are definite.
- Changes and new entries must be confirmed
On all the pages you will be asked to confirm the change, with
information about what kind of change you are about to perform.
- Links to other relevant admin areas on the right side
To make it easier to perform your administration tasks, we have
added a menu area on the right hand side of these pages. The menu
contains links to other relevant admin pages and changes according to
the page you are on and the information you have selected.
I. Role area
II. Example - connecting role and user
I. Role area
Administration tasks start in one of the administration areas. The
role area is the main area from where you can perform all your
managing tasks. The other admin areas are just other ways of
entering.
II. Example - connecting role and user
One of the important tasks that can be handled via the WebAccess Admin Web Interface
is the delegation of access rights to users. This is done by connecting them to the
different roles offered.
The task is divided into 5 simple and comprehensive steps. Below follow the pages from
the different steps with comments on the ongoing procedure.
- step 1 - select a role
You must first select the role you want to connect users to. All the available roles are
listed alphabetically in a select box. Just find the wanted role and select it. Then click on
the button saying "select role".
If you start from the Role Area, this step is already done, and you start directly on step 2.
- step 2 - search for users
As you can see, the subtitle of the page has now changed. The subtitle always tells you
which step you are on and what your current task is.
There can possibly be thousands of users using your online library, so it is important
to make it easier to identify the user you are looking for. Give part of, or the entire, search
string and all users with partly matching e-mails will be listed on the next step.
You can also see that the right hand menu has changed. This area is always updated with links
to related admin areas.
- step 3 - select a user.
The select box contains all users with partly matching e-mail addresses. Select the one
you want to connect to the role and continue.
Notice the navigation trail that tells you where on the Administrator pages you are currently
working.
- step 4 - confirm to add user
All WebAccess Administrator web pages display the action you are about to perform,
explaining what kind of addition, change or update will be done to your access control
data.
If you are happy with your decision, simply confirm it.
- step 5 - confirm user added.
The user has now been added to this role. You can easily continue adding more users to this
role by restarting from step 2 or 3. You can also go directly to another area and keep working
on the same role.
- we are done
This example is very similar to all the other pages where you administrate WebAccess. The pages
are an easy gateway to maintaining access control rights and share a lot of features.
- divided into steps
- restart from any wanted step (not undo)
- changes must be confirmed
- link to other relevant areas
- prevent unwanted input
As an administrator with access to these pages you are free to manage the rights any way you want.
Here you can administrate the accounts and the access policy for your CDS Invenio installation.
- Access policy:
To change the access policy, the general config file (or
access_control_config.py) must be edited manually in a text
editor. The site can there be defined as opened or closed, you can
edit the access policy level for guest accounts, registered
accounts and decide when to warn the owner of the account when
something happens with it, either when it is created, deleted or
approved. The Apache server must be restarted after modifying
these settings.
The two levels for guest accounts are:
0 - Allow guest accounts
1 - Do not allow guest accounts
The five levels for normal accounts are:
0 - Allow user to create account, automatically activate new accounts
1 - Allow user to create account, administrator must activate account
2 - Only administrators can create account. User cannot edit the email address.
3 - Users cannot register or update account information (email/password)
4 - User cannot change default login method
You can configure CDS Invenio to send an email:
1. To an admin email-address when an account is created
2. To the owner of an account when it is created
3. To the owner of an account when it is activated
4. To the owner of an account when it is deleted
Define how open the site is:
0 = normal operation of the site
1 = read-only site, all write operations temporarily closed
2 = site fully closed
CFG_ACCESS_CONTROL_LEVEL_SITE = 0
Access policy for guests:
0 = Allow guests to search,
1 = Guests cannot search (all users must login)
CFG_ACCESS_CONTROL_LEVEL_GUESTS = 0
Access policy for accounts:
0 = Users can register, accounts are automatically activated
1 = Users can register, but admin must activate the accounts
2 = Users cannot register or change email address, only admin can register accounts.
3 = Users cannot register or update email address or password, only admin can register accounts.
4 = Same as 3, but user cannot change login method.
CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS = 0
Limit email addresses available to use when registering a new account (example: cern.ch):
CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN = ""
Send an email when a new account is created by a user:
CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS = 0
Send an email to the user notifying when the account is created:
CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT = 0
Send an email to the user notifying when the account is activated:
CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION = 0
Send an email to the user notifying when the account is deleted/rejected:
CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION = 0
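Taken together, the account levels above form a ladder of increasing restrictions. One way to read them, as a sketch (the helper name is illustrative, not part of CDS Invenio):

```python
def account_policy(level):
    """Map a CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS value to what users may do."""
    return {
        'self_register': level <= 1,        # 0 and 1 allow self-registration
        'auto_activate': level == 0,        # 1 and up require admin activation
        'edit_email': level <= 1,           # 2 and up lock the email address
        'edit_password': level <= 2,        # 3 and up also lock the password
        'change_login_method': level <= 3,  # 4 also locks the login method
    }
```

Each step up keeps the restrictions of the previous level and adds one more, which is why a single integer is enough to encode the policy.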
- Account overview:
Here you find an overview of the number of guest accounts, registered accounts and accounts
awaiting activation, with a link to the activation page.
- Create account:
For creating new accounts, the email address must be unique. If configured to do so, an email
will be sent to the given address when an account is created.
- Edit accounts:
For activating or rejecting accounts in addition to modifying them. An activated account can be
inactivated for a short period of time, but this will not warn the account owner. To find accounts
enter a part of the email address of the account and then search. This may take some time. If there
are more than the selected number of accounts per page, you can use the next/prev links to switch
pages. The accounts to search in can also be limited to only activated or not activated accounts.
- Edit account:
When editing one account, you can change the email address, password, delete the account, or modify
the baskets or alerts belonging to one account. Which login method should be the default for this
account can also be selected. To modify baskets or alerts, you need to login as the user, and
modify the desired data as a normal user. Remember to log out as the user when you are finished
editing.
CDS Invenio supports using external login systems to authenticate users.
When a user wants to login, the username and password given by the user are checked against the selected
system; if the user is authenticated by the external system, a valid email-address is returned to
CDS Invenio and used to recognize the user within CDS Invenio.
If a new user is trying to login without having an account, using an external login system, an account
is automatically created in CDS Invenio to recognize and store the user's settings. The password for the
local account is randomly generated.
If you want the user to be unable to change login method and account username / password, forcing use
of certain external systems, set CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS to 4 as mentioned in the last paragraph.
If a user is changing login method from an external one to the internal one, they also need to either change the
password before logging out, or set the password via the lost password email service.
If you are using CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS with a value greater than 1, note
that, even if the first login of a user through an external authentication technically means registering
the user into the system, this is not the behaviour the user semantically expects. The user is already
registered at an authority that we trust, so there's no need to prevent the user from being imported
into the system. That's why, for external authentication, CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS is not
considered beyond what is said above.
If an external login system is used, you may want to protect the user's username / password using HTTPS.
To add a new system, two changes must be made (for the time being):
- The name of the method, whether it is the default or not, and the classname must be added to the variable
CFG_EXTERNAL_AUTHENTICATION in access_control_config.py. At least one method must be marked as the
default one. The internal login method should be given with None as classname.
Example:
CFG_EXTERNAL_AUTHENTICATION = {"%s (internal)" % cdsname: (None, True), "CERN NICE (external)":
(AuthCernWrapper(), False)}
- A class must be created, derived from the class ExternalAuth inside the file
external_authentication.py. This class must include at least the
function auth_user. This function returns a valid email-address in CDS Invenio if the user
is authenticated, not necessarily the same as the one entered by the user as username. If the user
is not authenticated, it returns None.
The class could also provide five more methods: fetch_user_preferences, user_exists,
fetch_user_groups_membership, fetch_all_users_groups_membership and fetch_user_nickname.
The first should take an email and possibly a password and should return a dictionary of keys
and values representing external preferences, info or settings. If, for some reason, you would like
to force some kind of hiding for some particular field, you should export the related key
prefixed by "HIDDEN_". Those fields won't be displayed in tables and pages regarding external
settings.
The second method should check through the external system if a particular email exists. If you
provide such a method, then a user will be able to switch from and to this authorization method.
The third method should take an email and (if necessary) a password
and should return a dictionary of external group names together with their descriptions, for which
the user has a membership. Those groups will be merged into the groups system.
The user will be a member of those groups and will be able to use them in any place
where groups are useful, but won't be able to unsubscribe from or administrate them.
The fourth method should just return a dictionary with external groups as keys and tuples containing
a group description and a list of emails of users belonging to each group. Those memberships
will be merged into the database in the same way as by the previous method, but can
provide batch synchronization of groups.
The fifth method should just return the nickname as it is known by the external authentication
system, given the usual email/username and the password.
Note: if your system has more than one external login method, then incoherence in the group
memberships could happen when a user switches their login method. This will be fixed at some point in the
future.
If you add an enforce_external_nicknames attribute to your class and set it to True, the system
will import external nicknames whenever the user logs in with the external login method for the
first time. Since a nickname is not changeable, it will then stay fixed forever. If the nickname is
already registered in the system (suppose it is linked with a local account), it will not be
imported. If this attribute doesn't exist or is set to False, no nickname will be
imported and the user will be free to choose a nickname in the future (which will then again
stay forever).
Note: every method will receive as its last parameter the mod_python request object, which can
be used for particular purposes.
Example template:
from invenio.external_authentication import ExternalAuth, InvenioWebAccessExternalAuthError
class ExternalAuthFoo(ExternalAuth):
"""External authentication template example."""
def __init__ (self):
"""Initialize stuff here."""
self.name = None
self.enforce_external_nicknames = False
pass
def auth_user(self, username, password, req=None):
"""Authenticate user-supplied USERNAME and PASSWORD.
Return None if authentication failed, or the email address of the
person if the authentication was successful. In order to do
this you may perhaps have to keep a translation table between
usernames and email addresses.
Raise InvenioWebAccessExternalAuthError in case of external troubles.
"""
raise NotImplementedError
#return None
def user_exists(self, email, req=None):
"""Checks against external_authentication for existance of email.
@return True if the user exists, False otherwise
"""
raise NotImplementedError
def fetch_user_groups_membership(self, username, password=None, req=None):
"""Given a username, returns a dictionary of groups
and their description to which the user is subscribed.
Raise InvenioWebAccessExternalAuthError in case of troubles.
"""
raise NotImplementedError
#return {}
def fetch_user_preferences(self, username, password=None, req=None):
"""Given a username and a password, returns a dictionary of keys and
values, corresponding to external infos and settings.
userprefs = {"telephone": "2392489",
"address": "10th Downing Street"}
"""
raise NotImplementedError
#return {}
def fetch_all_users_groups_membership(self, req=None):
"""Fetch all the groups with a description, and users who belong to
each groups.
@return {'mygroup': ('description', ['email1', 'email2', ...]), ...}
"""
raise NotImplementedError
def fetch_user_nickname(self, username, password, req=None):
"""Given a username and a password, returns the right nickname belonging
to that user (username could be an email).
"""
raise NotImplementedError
#return Nickname
In the WebAccess RBAC system, roles are built up from their names,
description and definition.
A definition is the way to formally and implicitly define which users belong
to which roles.
A definition is expressed in a firewall-like rules language. It's built up
of rows which are matched from top to bottom, in order to decide if the
current user (whether he/she is logged in or not) may belong to a role.
Any row has this syntax:
ALLOW/DENY ANY
or
ALLOW/DENY [NOT] field {one or more values}
The rows are parsed from top to bottom. If a row matches the user, then the
user belongs to the role if the rule is an ALLOW rule; otherwise, if the
rule is a DENY one, the user doesn't belong to the role.
A rule of the kind ALLOW|DENY ANY always matches, regardless of the user.
Note that in place of ANY you can use the word ALL. The semantics are the same; the
system supports both to let the user comply with English grammar.
The second type of rule is interpreted as follows: given a dictionary
of keys:values describing a user (we will cover this below), the rule
considers the value associated with the key named in field, and checks
if it corresponds to at least one of the values in the "one or more values" list.
This is a list of comma separated strings, which can be literal
(double-)quoted strings or regexps (marked by `/' ... `/' signs). If at
least one value matches (literally or through the regexp language), the
whole rule is considered to match.
If the optional NOT keyword is specified, the matching is inverted: if at
least one value of the rule matches, the rule is skipped; if none of the
values of the rule match, the whole rule matches.
A DENY ANY rule is implicitly added at the end of every definition.
Any field is valid, but only rules concerning fields which currently
exist in the user describing dictionary are checked. All the rules
with non-existent fields are skipped.
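The matching procedure described above (top-to-bottom rows, NOT inversion, skipping rules on absent fields, implicit DENY ANY) can be sketched as a tiny evaluator. The tuple-based rule representation is a simplification of the real textual syntax:

```python
import re

def firerole_check(user, rules):
    """Evaluate (allow, negate, field, values) rules top to bottom.
    `values` entries are regexp strings; a rule whose field is absent
    from the user dictionary is skipped.  An implicit DENY ANY closes
    the list, so unmatched users do not belong to the role.
    """
    for allow, negate, field, values in rules:
        if field == 'any':
            return allow  # ALLOW/DENY ANY always matches
        if field not in user:
            continue  # rules on non-existent fields are skipped
        matched = any(re.match(v, str(user[field]), re.I) for v in values)
        if matched != negate:  # NOT inverts the match
            return allow
    return False  # implicit DENY ANY

rules = [
    (True, True, 'email', [r'.*@gmail\.com$', r'.*@hotmail\.com$']),  # allow not email ...
    (False, False, 'group', [r'badguys$']),                           # deny group badguys
    (False, False, 'any', []),                                        # deny all
]
```

The `matched != negate` comparison captures both behaviours at once: a plain rule fires when a value matches, a NOT rule fires when none do.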
The user describing dictionary is built at runtime with all the information
that can be gathered about the current user (and their session).
Currently valid fields are: uid, email, nickname, apache_user, remote_ip,
remote_host, groups, apache_groups and all the external settings provided
by the external authentication systems (e.g. CERN SSO provides:
external_authmethod, external_building, external_department, external_email,
external_external, external_firstname, external_fullname, external_homdir,
external_homeinstitute, external_lastname, external_login, external_mobilenumber,
external_phonenumber).
Among those fields there are some special cases, namely remote_ip and
(apache_)groups. Rules can refer to remote_ip either using a literal
expression for specifying a list of single IPs, or a usual regexp (or list
of regexps), or, also, using the common network group/mask notation
(e.g. "127.0.0.0/24") as a literal string, which is a mix between literal
expressions and regexps. (apache_)groups are related to group memberships.
Since a user will probably belong to more than one group, the rule
matches if there's at least one group to which the user belongs that matches
at least one of the expressions (NOT rules behave as you can imagine).
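The group/mask notation for remote_ip (e.g. "127.0.0.0/24") can be checked with Python's standard ipaddress module; this is a sketch of the idea, not the FireRole implementation:

```python
import ipaddress

def ip_in_network(remote_ip, network):
    """True if remote_ip falls inside the given CIDR network."""
    return ipaddress.ip_address(remote_ip) in ipaddress.ip_network(network)

# "127.0.0.0/24" covers the range 127.0.0.0 - 127.0.0.255
```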
The dictionary is built using the current user session. If the user is
authenticated in some way (Apache, locally, externally, SSO...) then more
information can be provided to the firerole system in order to decide if the
user should belong to a role or not.
The default fields that are always there are:
uid: an integer representing the user id
nickname: the nickname of the user
email: the email of the user
group/groups: local or external groups to which the user belongs
guest: 1 if the user is a guest (not logged), 0 otherwise
plus all the external setting retrieved by an external authentication system.
If the action to which the role is attached is invoked from the web interface of
CDS Invenio, then you will have these additional fields:
remote_ip: the remote ip address of the user who is browsing
remote_host: the remote hostname of the user who is browsing
referer: the webpage from where the user is coming from
uri: the uri the user is visiting
agent: the agent string describing the user's browser
apache_user: the Apache user provided by the authenticated user
apache_group/apache_groups: the Apache groups to which the Apache user
belongs
Note that you can specify either (apache_)group or (apache_)groups (with or
without the trailing s). They are semantically equal and are supported just
to let people comply with English grammar.
Every rule is case-insensitive (apart from values, which must match literally,
and regexp values, which don't explicitly specify case-insensitive matches).
Every rule may contain comments preceded by the '#' character.
Any comment is discarded.
When you set a definition for a role, it is actually compiled and stored
in a binary compressed form inside the database. If the syntax isn't correct
this will be stated and the definition won't be set or updated.
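The compile-and-store step above can be sketched as follows; pickle and zlib here stand in for whatever serialization the engine actually uses, and the parsed-rule shape is an assumption:

```python
import pickle
import zlib

def compile_role_definition(rules):
    """Serialize a parsed definition into a compressed binary blob,
    suitable for storing in a database column."""
    return zlib.compress(pickle.dumps(rules))

def load_role_definition(blob):
    """Inverse of compile_role_definition: recover the parsed rules."""
    return pickle.loads(zlib.decompress(blob))
```

Storing the compiled form means the textual syntax only needs to be parsed (and validated) once, when the definition is set.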
Example of role definition:
allow not email /.*@gmail.com/,/.*@hotmail.com/
deny group badguys
allow remote_ip "127.0.0.0/24"
deny all
This definition would match all users whose emails don't end with @gmail.com or
@hotmail.com, as well as those who don't belong to the group badguys and have a remote_ip
in the 24-bit mask network of 127.0.0.0. All the other users don't belong
to the role which is being defined.
diff --git a/modules/webaccess/lib/access_control_engine.py b/modules/webaccess/lib/access_control_engine.py
index 92af85f8d..c9ff37609 100644
--- a/modules/webaccess/lib/access_control_engine.py
+++ b/modules/webaccess/lib/access_control_engine.py
@@ -1,413 +1,413 @@
## $Id$
## CDS Invenio Access Control Engine in mod_python.
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""CDS Invenio Access Control Engine in mod_python."""
__revision__ = "$Id$"
from invenio.config import \
CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS, \
- version, sweburl
+ CFG_VERSION, sweburl
from invenio.dbquery import run_sql_cached, ProgrammingError
import invenio.access_control_admin as aca
from invenio.access_control_config import SUPERADMINROLE, CFG_WEBACCESS_WARNING_MSGS, CFG_WEBACCESS_MSGS
from invenio import webuser
from invenio import access_control_firerole
from invenio.urlutils import make_canonical_urlargd
from urllib import quote
called_from = 1 #1=web,0=cli
try:
import _apache
except ImportError, e:
called_from = 0
def make_list_apache_firerole(name_action, arguments):
"""Given an action and a dictionary arguments returns a list of all the
roles (and their descriptions) which are authorized to perform this
action with these arguments, and whose FireRole definition expect
an Apache Password membership.
"""
roles = aca.acc_find_possible_roles(name_action, arguments)
ret = []
for role in roles:
res = run_sql_cached('SELECT name, description, firerole_def_ser FROM accROLE WHERE id=%s', (role, ), affected_tables=['accROLE'])
if access_control_firerole.acc_firerole_suggest_apache_p(access_control_firerole.deserialize(res[0][2])):
ret.append((res[0][0], res[0][1]))
return ret
def _format_list_of_apache_firerole(roles, referer):
"""Given a list of tuples (role, description) (returned by make_list_apache_firerole), and a referer url, returns a nice string for
presenting urls that let the user login with Apache password through
Firerole.
This function is needed only at CERN for aiding in the migration of
Apache Passwords restricted collections to FireRole roles.
Please use it with care."""
out = ""
if roles:
out += "
Here's a list of administrative roles you may have " \
"received authorization to, via an Apache password. If you are aware " \
"of such a password, please follow the corresponding link."
out += "
"
out += ""
return out
def make_apache_message(name_action, arguments, referer=None):
"""Given an action name and a dictionary of arguments and a refere url
it returns a a nice string for presenting urls that let the user login
with Apache password through Firerole authorized roles.
This function is needed only at CERN for aiding in the migration of
Apache Passwords restricted collections to FireRole roles.
Please use it with care."""
if not referer:
referer = '%s/youraccount/youradminactivities' % sweburl
roles = make_list_apache_firerole(name_action, arguments)
if roles:
return _format_list_of_apache_firerole(roles, referer)
else:
return ""
## access control engine function
def acc_authorize_action(req, name_action, verbose=0, check_only_uid_p=False, **arguments):
"""Check if user is allowed to perform action
with given list of arguments.
Return (0, message) if authentication succeeds, (error code, error message) if it fails.
The arguments are as follows:
req - could be either one of these three things:
id_user of the current user
user_info dictionary built against the user details
req mod_python request object
name_action - the name of the action
arguments - dictionary with keyword=value pairs created automatically
by python on the extra arguments. these depend on the
given action.
check_only_uid_p - hidden parameter needed to only check against uids without
looking at role definitions
"""
#TASK -1: Checking external source if user is authorized:
#if CFG_:
# em_pw = run_sql("SELECT email, FROM user WHERE id=%s", (id_user,))
# if em_pw:
# if not CFG_EXTERNAL_ACCESS_CONTROL.loginUser(em_pw[0][0], em_pw[0][1]):
# return (10, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[10], (called_from and CFG_WEBACCESS_MSGS[1] or "")))
# TASK 0: find id and allowedkeywords of action
user_info = {}
if check_only_uid_p:
id_user = req
else:
if type(req) in [type(1), type(1L)]: # req is id_user
id_user = req
user_info = webuser.collect_user_info(id_user)
elif type(req) == type({}): # req is user_info
user_info = req
if user_info.has_key('uid'):
id_user = user_info['uid']
else:
return (4, CFG_WEBACCESS_WARNING_MSGS[4])
else: # req is req
user_info = webuser.collect_user_info(req)
if user_info.has_key('uid'):
id_user = user_info['uid']
else:
return (4, CFG_WEBACCESS_WARNING_MSGS[4])
# Check if just the userid is enough to execute this action
(auth_code, auth_message) = acc_authorize_action(id_user, name_action, verbose, check_only_uid_p=True, **arguments)
if auth_code == 0:
return (auth_code, auth_message)
if verbose: print 'task 0 - get action info'
query1 = """select a.id, a.allowedkeywords, a.optional
from accACTION a
where a.name = '%s'""" % (name_action)
try:
id_action, aallowedkeywords, optional = run_sql_cached(query1, affected_tables=['accACTION'])[0]
except (ProgrammingError, IndexError):
return (3, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[3] % name_action, (called_from and CFG_WEBACCESS_MSGS[1] or "")))
defkeys = aallowedkeywords.split(',')
for key in arguments.keys():
if key not in defkeys:
if user_info.has_key('uri'):
return (8, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[8], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or ""))) #incorrect arguments?
else:
return (8, "%s" % (CFG_WEBACCESS_WARNING_MSGS[8]))
# -------------------------------------------
# TASK 1: check if user is a superadmin
# we know the action exists. no connection with role is necessary
# passed arguments must have allowed keywords
# no check to see if the argument exists
if verbose: print 'task 1 - is user %s' % (SUPERADMINROLE, )
if check_only_uid_p:
if run_sql_cached("""SELECT r.id
FROM accROLE r LEFT JOIN user_accROLE ur
ON r.id = ur.id_accROLE
WHERE r.name = '%s' AND
ur.id_user = '%s' AND ur.expiration>=NOW()""" % (SUPERADMINROLE, id_user), affected_tables=['accROLE', 'user_accROLE']):
return (0, CFG_WEBACCESS_WARNING_MSGS[0])
else:
if access_control_firerole.acc_firerole_check_user(user_info, access_control_firerole.load_role_definition(aca.acc_get_role_id(SUPERADMINROLE))):
return (0, CFG_WEBACCESS_WARNING_MSGS[0])
# TASK 2: check if user exists and find all the user's roles and create or-string
if verbose: print 'task 2 - find user and userroles'
try:
query2 = """SELECT email, note from user where id=%s""" % id_user
res2 = run_sql_cached(query2, affected_tables=['user'])
if check_only_uid_p:
if not res2:
raise Exception
if res2:
if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 1 and res2[0][1] not in [1, "1"]:
if res2[0][0]:
if user_info.has_key('uri'):
return (9, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[9] % res2[0][0], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or "")))
else:
return (9, CFG_WEBACCESS_WARNING_MSGS[9] % res2[0][0])
else:
raise Exception
if check_only_uid_p:
query2 = """SELECT ur.id_accROLE FROM user_accROLE ur WHERE ur.id_user=%s AND ur.expiration>=NOW() ORDER BY ur.id_accROLE """ % id_user
res2 = run_sql_cached(query2, affected_tables=['user_accROLE'])
except Exception:
if user_info.has_key('uri'):
return (6, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[6], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or "")))
else:
return (6, CFG_WEBACCESS_WARNING_MSGS[6])
if check_only_uid_p:
if not res2:
if user_info.has_key('uri'):
return (2, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[2], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or ""))) #user has no roles
else:
return (2, CFG_WEBACCESS_WARNING_MSGS[2])
# -------------------------------------------
# create role string (add default value? roles='(raa.id_accROLE='def' or ')
str_roles = ''
for (role, ) in res2:
if str_roles: str_roles += ','
str_roles += '%s' % (role, )
# TASK 3: authorizations with no arguments given
if verbose: print 'task 3 - checks with no arguments'
if not arguments:
# 3.1
if optional == 'no':
if verbose: print ' - action with zero arguments'
if check_only_uid_p:
connection = run_sql_cached("""SELECT id_accROLE FROM accROLE_accACTION_accARGUMENT
WHERE id_accROLE IN (%s) AND
id_accACTION = %s AND
argumentlistid = 0 AND
id_accARGUMENT = 0 """ % (str_roles, id_action), affected_tables=['accROLE_accACTION_accARGUMENT'])
if connection:
return (0, CFG_WEBACCESS_WARNING_MSGS[0])
else:
if user_info.has_key('uri'):
return (1, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or "")))
else:
return (1, "%s" % CFG_WEBACCESS_WARNING_MSGS[1])
else:
connection = run_sql_cached("""SELECT id_accROLE FROM
accROLE_accACTION_accARGUMENT
WHERE id_accACTION = %s AND
argumentlistid = 0 AND
id_accARGUMENT = 0 """ % id_action, affected_tables=['accROLE_accACTION_accARGUMENT'])
for id_accROLE in connection:
if access_control_firerole.acc_firerole_check_user(user_info, access_control_firerole.load_role_definition(id_accROLE[0])):
return (0, CFG_WEBACCESS_WARNING_MSGS[0])
if user_info.has_key('uri'):
return (1, "%s %s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or ""), make_apache_message(name_action, arguments, user_info.get('referer', None))))
else:
return (1, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], make_apache_message(name_action, arguments, user_info.get('referer', None))))
# 3.2
if optional == 'yes':
if verbose: print ' - action with optional arguments'
if check_only_uid_p:
connection = run_sql_cached("""SELECT id_accROLE FROM accROLE_accACTION_accARGUMENT
WHERE id_accROLE IN (%s) AND
id_accACTION = %s AND
id_accARGUMENT = -1 AND
argumentlistid = -1 """ % (str_roles, id_action), affected_tables=['accROLE_accACTION_accARGUMENT'])
if connection:
return (0, CFG_WEBACCESS_WARNING_MSGS[0])
else:
if user_info.has_key('uri'):
return (1, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or "")))
else:
return (1, CFG_WEBACCESS_WARNING_MSGS[1])
else:
connection = run_sql_cached("""SELECT id_accROLE FROM
accROLE_accACTION_accARGUMENT
WHERE id_accACTION = %s AND
id_accARGUMENT = -1 AND
argumentlistid = -1 """ % id_action, affected_tables=['accROLE_accACTION_accARGUMENT'])
for id_accROLE in connection:
if access_control_firerole.acc_firerole_check_user(user_info, access_control_firerole.load_role_definition(id_accROLE[0])):
return (0, CFG_WEBACCESS_WARNING_MSGS[0])
if user_info.has_key('uri'):
return (1, "%s %s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or ""), make_apache_message(name_action, arguments, user_info.get('referer', None))))
else:
return (1, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], make_apache_message(name_action, arguments, user_info.get('referer', None))))
# none of the zero-argument tests succeeded
if verbose: print ' - no authorization without arguments'
return (5, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[5], (called_from and "%s" % (CFG_WEBACCESS_MSGS[1] or ""))))
# TASK 4: create list of keyword and values that satisfy part of the authentication and create or-string
if verbose: print 'task 4 - create keyword=value pairs'
# create dictionary with default values and replace entries from input arguments
defdict = {}
for key in defkeys:
try: defdict[key] = arguments[key]
except KeyError: return (5, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[5], (called_from and "%s" % (CFG_WEBACCESS_MSGS[1] or "")))) # all keywords must be present
# except KeyError: defdict[key] = 'x' # default value, this is not in use...
# create or-string from arguments
str_args = ''
for key in defkeys:
if str_args: str_args += ' OR '
str_args += """(arg.keyword = '%s' AND arg.value = '%s')""" % (key, defdict[key])
# TASK 5: find all the table entries that partially authorize the action in question
if verbose: print 'task 5 - find table entries that are part of the result'
if check_only_uid_p:
query4 = """SELECT DISTINCT raa.id_accROLE, raa.id_accACTION, raa.argumentlistid,
raa.id_accARGUMENT, arg.keyword, arg.value
FROM accROLE_accACTION_accARGUMENT raa, accARGUMENT arg
WHERE raa.id_accACTION = %s AND
raa.id_accROLE IN (%s) AND
(%s) AND
raa.id_accARGUMENT = arg.id """ % (id_action, str_roles, str_args)
else:
query4 = """SELECT DISTINCT raa.id_accROLE, raa.id_accACTION, raa.argumentlistid,
raa.id_accARGUMENT, arg.keyword, arg.value, ar.firerole_def_ser
FROM accROLE_accACTION_accARGUMENT raa INNER JOIN accROLE ar ON
raa.id_accROLE = ar.id, accARGUMENT arg
WHERE raa.id_accACTION = %s AND
(%s) AND
raa.id_accARGUMENT = arg.id """ % (id_action, str_args)
try: res4 = run_sql_cached(query4, affected_tables=['accROLE_accACTION_accARGUMENT', 'accARGUMENT', 'accROLE'])
except ProgrammingError:
return (3, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[3] % id_action, (called_from and "%s" % (CFG_WEBACCESS_MSGS[1] or ""))))
res5 = []
if check_only_uid_p:
if not res4:
if user_info.has_key('uri'):
return (1, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or ""))) # no entries at all
else:
return (1, CFG_WEBACCESS_WARNING_MSGS[1])
res5 = []
for res in res4:
res5.append(res)
else:
for row in res4:
if access_control_firerole.acc_firerole_check_user(user_info, access_control_firerole.load_role_definition(row[0])):
res5.append(row)
if not res5:
if user_info.has_key('uri'):
return (1, "%s %s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or ""), make_apache_message(name_action, arguments, user_info.get('referer', None))))
else:
return (1, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[1], make_apache_message(name_action, arguments, user_info.get('referer', None))))
res5.sort()
# USER AUTHENTICATED TO PERFORM ACTION WITH ONE ARGUMENT
if len(defdict) == 1: return (0, CFG_WEBACCESS_WARNING_MSGS[0])
# CHECK WITH MORE THAN 1 ARGUMENT
# TASK 6: run through the result and try to satisfy authentication
if verbose: print 'task 6 - combine results and try to satisfy'
cur_role = cur_action = cur_arglistid = 0
booldict = {}
for key in defkeys: booldict[key] = 0
# run through the results
for (role, action, arglistid, arg, keyword, val) in res5 + [(-1, -1, -1, -1, -1, -1)]:
# not the same role or argumentlist (authorization group), i.e. check if things are satisfied
# if cur_arglistid != arglistid or cur_role != role or cur_action != action:
if (cur_arglistid, cur_role, cur_action) != (arglistid, role, action):
if verbose: print ' : checking new combination',
# test if all keywords are satisfied
for value in booldict.values():
if not value: break
else:
if verbose: print '-> found satisfying combination'
return (0, CFG_WEBACCESS_WARNING_MSGS[0]) # USER AUTHENTICATED TO PERFORM ACTION
if verbose: print '-> not this one'
# assign the values for the current tuple from the query
cur_arglistid, cur_role, cur_action = arglistid, role, action
for key in booldict.keys():
booldict[key] = 0
# mark the keyword as qualified for the action (whatever the result of the test)
booldict[keyword] = 1
if verbose: print 'finished'
# authentication failed
if user_info.has_key('uri'):
return (4, "%s %s" % (CFG_WEBACCESS_WARNING_MSGS[4], (called_from and "%s %s" % (CFG_WEBACCESS_MSGS[0] % quote(user_info['uri']), CFG_WEBACCESS_MSGS[1]) or "")))
else:
- return (4, CFG_WEBACCESS_WARNING_MSGS[4])
\ No newline at end of file
+ return (4, CFG_WEBACCESS_WARNING_MSGS[4])
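The function above always reports its verdict as a `(code, message)` tuple, with code 0 meaning "authorized". A minimal standalone sketch of how a caller consumes that convention (illustrative names only, not the Invenio API):

```python
# Illustrative sketch of the (error_code, message) convention used by
# acc_authorize_action(): code 0 grants access, any non-zero code
# carries a human-readable denial message.
WARNING_MSGS = {
    0: "Authorization granted",
    1: "Not authorized to perform this action",
}

def is_authorized(auth_result):
    """Return True when a (code, message) pair grants access."""
    auth_code, _auth_message = auth_result
    # Callers only branch on the numeric code; the message is for display.
    return auth_code == 0

granted = is_authorized((0, WARNING_MSGS[0]))
denied = is_authorized((1, WARNING_MSGS[1]))
```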
diff --git a/modules/webaccess/lib/external_authentication_cern_wrapper.py b/modules/webaccess/lib/external_authentication_cern_wrapper.py
index 1cff1c3cb..cf65fbd60 100644
--- a/modules/webaccess/lib/external_authentication_cern_wrapper.py
+++ b/modules/webaccess/lib/external_authentication_cern_wrapper.py
@@ -1,171 +1,171 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Nice API Python wrapper."""
__revision__ = \
"$Id$"
import httplib
import urllib
import re
-from invenio.config import etcdir
+from invenio.config import CFG_ETCDIR
class AuthCernWrapper:
"""Wrapper class for CERN NICE/CRA webservice"""
def __init__(self):
"""Create a connection to CERN NICE/CRA webservice.
Authentication credential should be in the file
- etcdir/webaccess/cern_nice_soap_credentials.txt which must contain
+ CFG_ETCDIR/webaccess/cern_nice_soap_credentials.txt which must contain
username:password in base64 encoding.
"""
self._cern_nice_soap_auth = \
- open(etcdir + "/webaccess/cern_nice_soap_credentials.txt",
+ open(CFG_ETCDIR + "/webaccess/cern_nice_soap_credentials.txt",
"r").read().strip()
self._headers = {"Content-type": "application/x-www-form-urlencoded",
"Accept": "text/plain",
"Authorization": "Basic " + self._cern_nice_soap_auth}
self._conn = httplib.HTTPSConnection("winservices-soap.web.cern.ch")
def __del__(self):
"""Close the CERN Nice webservice connection."""
if self._conn:
self._conn.close()
def _request(self, name, params):
"""Call the name request with a dictionary parms.
@return the XML response.
"""
params = urllib.urlencode(params)
self._conn.request("POST",
"/winservices-soap/generic/Authentication.asmx/%s" % name,
params, self._headers)
response = self._conn.getresponse()
return response.read()
def ccid_is_nice(self, ccid):
"""Verify this CCID belongs to a Nice account. Returns login or -1
if not found.
"""
data = self._request("CCIDisNice", {"CCID": ccid})
match = re.search('(?P.*)', data)
if match:
if match == -1:
return False
else:
return match.group("CCID")
def get_groups_for_user(self, user_name):
"""Returns a string array containing Groups the specified User is
member of. UserName is NICE Login or Email. Listname can be 'listname'
or 'listname@cern.ch'."""
data = self._request("GetGroupsForUser", {"UserName": user_name})
groups = []
for match in re.finditer("<string>(?P<group>.*)</string>", data):
groups.append(match.group("group"))
return groups
def user_is_member_of_list(self, user_name, list_name):
"""Check if one user is member of specified simba list. UserName is
NICE Login or Email. Listname can be 'listname' or 'listname@cern.ch'.
"""
data = self._request("UserIsMemberOfList",
{"UserName": user_name, "ListName": list_name})
match = re.search('<boolean>(?P<membership>.*)</boolean>', data)
if match:
match = match.group("membership")
if match == "true":
return True
else:
return False
return None
def user_is_member_of_group(self, user_name, group_name):
"""Check if one user is member of specified NICE Group. UserName is
NICE Login or Email."""
data = self._request("UserIsMemberOfGroup",
{"UserName": user_name, "GroupName": group_name})
match = re.search("(?P.*)", data)
if match:
match = match.group("membership")
if match == "true":
return True
else:
return False
return None
def get_user_info(self, user_name, password):
"""Authenticates user from login and password. Login can be email
address or NICE login."""
data = self._request("GetUserInfo",
{"UserName": user_name, "Password": password})
infore = re.compile("<(?P<field>[^>]+)>(?P<value>[^<]*)")
infos = {}
for row in data.split('\r\n'):
match = infore.search(row)
if match:
infos[match.group("field")] = match.group("value")
return infos
def search_groups(self, pattern):
"""Search for a group, based on given pattern. 3 characters minimum are
required. Search is done with: *pattern*."""
data = self._request("SearchGroups", {"pattern": pattern})
groups = []
for match in re.finditer("<string>(?P<group>.*)</string>", data):
groups.append(match.group("group"))
return groups
def get_user_info_ex(self, user_name, password, group_name):
"""Authenticates user from login and password. Login can be email
address or NICE login. Group membership is verified at the same time,
multiple groups can be specified, separated with ','."""
data = self._request("GetUserInfoEx", {"UserName": user_name,
"Password": password,
"GroupName": group_name})
infore = re.compile("<(?P<field>[^>]+)>(?P<value>[^<]*)")
infos = {}
for row in data.split('\r\n'):
match = infore.search(row)
if match:
infos[match.group("field")] = match.group("value")
return infos
def list_users(self, display_name):
"""Search users with given display name. Display name is firstname +
lastname, or email, and can contain *."""
data = self._request("ListUsers", {"DisplayName": display_name})
users = []
infore = re.compile("<(?P<field>[^>]+)>(?P<value>[^<]*)")
for row in data.split('\r\n'):
if "" in row:
current_user = {}
elif "" in row:
users.append(current_user)
else:
match = infore.search(row)
if match:
current_user[match.group("field")] = match.group("value")
return users
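The field/value parsing used by get_user_info() and list_users() above can be sketched independently of the CERN webservice. The sample payload below is made up for illustration and does not reproduce the real Nice response format:

```python
import re

# Hypothetical SOAP-style payload: one <field>value</field> pair per
# CRLF-separated line, as the wrapper methods above assume.
sample = "<login>jdoe</login>\r\n<email>jdoe@example.org</email>"

# Named groups pull the tag name and its text content out of each row.
infore = re.compile(r"<(?P<field>[^>]+)>(?P<value>[^<]*)</")

infos = {}
for row in sample.split('\r\n'):
    match = infore.search(row)
    if match:
        infos[match.group("field")] = match.group("value")
```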
diff --git a/modules/webalert/bin/alertengine.in b/modules/webalert/bin/alertengine.in
index 73d2ff2d5..bca7283b3 100644
--- a/modules/webalert/bin/alertengine.in
+++ b/modules/webalert/bin/alertengine.in
@@ -1,82 +1,82 @@
#!@PYTHON@
## -*- mode: python; coding: utf-8; -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-## General Public License for more details.
+## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Alert engine command line interface"""
__revision__ = "$Id$"
try:
import sys
import getopt
- from invenio.config import version, supportemail
+ from invenio.config import CFG_VERSION, supportemail
from invenio.alert_engine import run_alerts
from time import time
except ImportError, e:
print "Error: %s" % e
import sys
sys.exit(1)
import datetime
def usage():
print """Usage: alertengine [OPTION]
Run the alert engine.
-h, --help display this help and exit
-V, --version output version information and exit
-d, --date="YEAR-MONTH-DAY" run the alert engine as if it were the
specified day, for test purposes (default: today)
Report bugs to <%s>""" % supportemail
-
+
def main():
date = datetime.date.today()
-
+
try:
opts, args = getopt.getopt(sys.argv[1:], "hVd:",
["help", "version", "date="])
except getopt.GetoptError:
usage()
sys.exit(2)
for o, a in opts:
if o in ("-h", "--help"):
usage()
sys.exit()
if o in ("-V", "--version"):
print __revision__
sys.exit(0)
if o in ("-d", "--date"):
year, month, day = map(int, a.split('-'))
date = datetime.date(year, month, day)
-
-
+
+
run_alerts(date)
if __name__ == "__main__":
t0 = time()
- main()
+ main()
t1 = time()
print 'Alert engine finished in %.2f seconds' % (t1 - t0)
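The getopt-based option handling in main() above can be exercised in isolation by feeding it a fixed argument list instead of sys.argv; the helper name below is illustrative:

```python
import getopt

# Sketch of the -d/--date handling in main() above, against a fixed
# argument vector rather than sys.argv.
def parse_date_option(argv):
    """Return the (year, month, day) passed via -d/--date, or None."""
    opts, _args = getopt.getopt(argv, "hVd:", ["help", "version", "date="])
    for opt, val in opts:
        if opt in ("-d", "--date"):
            year, month, day = map(int, val.split('-'))
            return (year, month, day)
    return None

parsed = parse_date_option(["--date=2008-01-31"])
```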
diff --git a/modules/webalert/lib/alert_engine.py b/modules/webalert/lib/alert_engine.py
index 083c8a63a..e3d80b4dc 100644
--- a/modules/webalert/lib/alert_engine.py
+++ b/modules/webalert/lib/alert_engine.py
@@ -1,369 +1,369 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Alert engine implementation."""
## rest of the Python code goes below
__revision__ = "$Id$"
from cgi import parse_qs
from re import search, sub
import sys
from time import localtime, strftime, mktime, sleep
from string import split
import smtplib
import datetime
from email.Header import Header
from email.Message import Message
from email.MIMEText import MIMEText
from invenio.config import \
- logdir, \
+ CFG_LOGDIR, \
supportemail, \
- version, \
+ CFG_VERSION, \
weburl
from invenio.search_engine import perform_request_search
from invenio.alert_engine_config import *
from invenio.webinterface_handler import wash_urlargd
from invenio.dbquery import run_sql
from invenio.htmlparser import *
from invenio.webuser import get_email
from invenio.mailutils import send_email
import invenio.template
websearch_templates = invenio.template.load('websearch')
webalert_templates = invenio.template.load('webalert')
def update_date_lastrun(alert):
return run_sql('update user_query_basket set date_lastrun=%s where id_user=%s and id_query=%s and id_basket=%s;', (strftime("%Y-%m-%d"), alert[0], alert[1], alert[2],))
def get_alert_queries(frequency):
return run_sql('select distinct id, urlargs from query q, user_query_basket uqb where q.id=uqb.id_query and uqb.frequency=%s and uqb.date_lastrun <= now();', (frequency,))
def get_alert_queries_for_user(uid):
return run_sql('select distinct id, urlargs, uqb.frequency from query q, user_query_basket uqb where q.id=uqb.id_query and uqb.id_user=%s and uqb.date_lastrun <= now();', (uid,))
def get_alerts(query, frequency):
r = run_sql('select id_user, id_query, id_basket, frequency, date_lastrun, alert_name, notification from user_query_basket where id_query=%s and frequency=%s;', (query['id_query'], frequency,))
return {'alerts': r, 'records': query['records'], 'argstr': query['argstr'], 'date_from': query['date_from'], 'date_until': query['date_until']}
# def add_record_to_basket(record_id, basket_id):
# if CFG_WEBALERT_DEBUG_LEVEL > 0:
# print "-> adding record %s into basket %s" % (record_id, basket_id)
# try:
# return run_sql('insert into basket_record (id_basket, id_record) values(%s, %s);', (basket_id, record_id,))
# except:
# return 0
# def add_records_to_basket(record_ids, basket_id):
# # TBD: generate the list and add all records in one step (see below)
# for i in record_ids:
# add_record_to_basket(i, basket_id)
# Optimized version:
def add_records_to_basket(record_ids, basket_id):
nrec = len(record_ids)
if nrec > 0:
vals = '(%s,%s)' % (basket_id, record_ids[0])
if nrec > 1:
for i in record_ids[1:]:
vals += ',(%s, %s)' % (basket_id, i)
if CFG_WEBALERT_DEBUG_LEVEL > 0:
print "-> adding %s records into basket %s: %s" % (nrec, basket_id, vals)
try:
if CFG_WEBALERT_DEBUG_LEVEL < 4:
return run_sql('insert into basket_record (id_basket, id_record) values %s;' % vals) # Cannot use the run_sql(, (,)) form for some reason
else:
print ' NOT ADDED, DEBUG LEVEL == 4'
return 0
except:
return 0
else:
return 0
def get_query(alert_id):
r = run_sql('select urlargs from query where id=%s', (alert_id,))
return r[0][0]
def email_notify(alert, records, argstr):
if len(records) == 0:
return
msg = ""
if CFG_WEBALERT_DEBUG_LEVEL > 0:
msg = "*** THIS MESSAGE WAS SENT IN DEBUG MODE ***\n\n"
url = weburl + "/search?" + argstr
# Extract the pattern and catalogue list from the formatted query
query = parse_qs(argstr)
pattern = query.get('p', [''])[0]
catalogues = query.get('c', [])
frequency = alert[3]
msg += webalert_templates.tmpl_alert_email_body(
alert[5], url, records, pattern, catalogues, frequency)
msg = MIMEText(msg, _charset='utf-8')
email = get_email(alert[0])
if email == 'guest':
print "********************************************************************************"
print "The following alert was not send, because cannot detect user email address:"
print " " + repr(argstr)
print "********************************************************************************"
return
msg['To'] = email
# Let the template fill in missing fields
webalert_templates.tmpl_alert_email_headers(alert[5], msg)
sender = msg['From']
body = msg.as_string()
if CFG_WEBALERT_DEBUG_LEVEL > 0:
print "********************************************************************************"
print body
print "********************************************************************************"
if CFG_WEBALERT_DEBUG_LEVEL < 2:
send_email(sender, email, content=body,
attempt_time=CFG_WEBALERT_SEND_EMAIL_NUMBER_OF_TRIES,
attempt_sleeptime=CFG_WEBALERT_SEND_EMAIL_SLEEPTIME_BETWEEN_TRIES)
if CFG_WEBALERT_DEBUG_LEVEL == 4:
send_email(sender, supportemail, content=body,
attempt_time=CFG_WEBALERT_SEND_EMAIL_NUMBER_OF_TRIES,
attempt_sleeptime=CFG_WEBALERT_SEND_EMAIL_SLEEPTIME_BETWEEN_TRIES)
def get_argument(args, argname):
if args.has_key(argname):
return args[argname]
else:
return []
def _date_to_tuple(date):
return [int(part) for part in (date.year, date.month, date.day)]
def get_record_ids(argstr, date_from, date_until):
argd = wash_urlargd(parse_qs(argstr), websearch_templates.search_results_default_urlargd)
p = get_argument(argd, 'p')
c = get_argument(argd, 'c')
cc = get_argument(argd, 'cc')
as = get_argument(argd, 'as')
f = get_argument(argd, 'f')
so = get_argument(argd, 'so')
sp = get_argument(argd, 'sp')
ot = get_argument(argd, 'ot')
p1 = get_argument(argd, 'p1')
f1 = get_argument(argd, 'f1')
m1 = get_argument(argd, 'm1')
op1 = get_argument(argd, 'op1')
p2 = get_argument(argd, 'p2')
f2 = get_argument(argd, 'f2')
m2 = get_argument(argd, 'm2')
op2 = get_argument(argd, 'op2')
p3 = get_argument(argd, 'p3')
f3 = get_argument(argd, 'f3')
m3 = get_argument(argd, 'm3')
sc = get_argument(argd, 'sc')
d1y, d1m, d1d = _date_to_tuple(date_from)
d2y, d2m, d2d = _date_to_tuple(date_until)
return perform_request_search(of='id', p=p, c=c, cc=cc, f=f, so=so, sp=sp, ot=ot,
as=as, p1=p1, f1=f1, m1=m1, op1=op1, p2=p2, f2=f2,
m2=m2, op2=op2, p3=p3, f3=f3, m3=m3, sc=sc, d1y=d1y,
d1m=d1m, d1d=d1d, d2y=d2y, d2m=d2m, d2d=d2d)
def get_argument_as_string(argstr, argname):
args = parse_qs(argstr)
a = get_argument(args, argname)
r = ''
if len(a):
r = a[0]
for i in a[1:len(a)]:
r += ", %s" % i
return r
def get_pattern(argstr):
return get_argument_as_string(argstr, 'p')
def get_catalogue(argstr):
return get_argument_as_string(argstr, 'c')
def get_catalogue_num(argstr):
args = parse_qs(argstr)
a = get_argument(args, 'c')
return len(a)
def run_query(query, frequency, date_until):
"""Return a dictionary containing the information of the performed query.
The information contains the id of the query, the arguments as a
string, and the list of found records."""
if frequency == 'day':
date_from = date_until - datetime.timedelta(days=1)
elif frequency == 'week':
date_from = date_until - datetime.timedelta(weeks=1)
else:
# Months are not an explicit notion of timedelta (and the most
# ambiguous one, too). So we explicitly take the same day of
# the previous month.
d, m, y = (date_until.day, date_until.month, date_until.year)
m = m - 1
if m == 0:
m = 12
y = y - 1
date_from = datetime.date(year=y, month=m, day=d)
recs = get_record_ids(query[1], date_from, date_until)
n = len(recs)
if n:
log('query %08s produced %08s records' % (query[0], len(recs)))
if CFG_WEBALERT_DEBUG_LEVEL > 2:
print "[%s] run query: %s with dates: from=%s, until=%s\n found rec ids: %s" % (
strftime("%c"), query, date_from, date_until, recs)
return {'id_query': query[0], 'argstr': query[1],
'records': recs, 'date_from': date_from, 'date_until': date_until}
def process_alert_queries(frequency, date):
"""Run the alerts according to the frequency.
Retrieves the queries for which an alert exists, performs it, and
processes the corresponding alerts."""
alert_queries = get_alert_queries(frequency)
for aq in alert_queries:
q = run_query(aq, frequency, date)
alerts = get_alerts(q, frequency)
process_alerts(alerts)
def replace_argument(argstr, argname, argval):
"""Replace the given date argument value with the new one.
If the argument is missing, it is added."""
if search('%s=\d+' % argname, argstr):
r = sub('%s=\d+' % argname, '%s=%s' % (argname, argval), argstr)
else:
r = argstr + '&%s=%s' % (argname, argval)
return r
def update_arguments(argstr, date_from, date_until):
"""Replace date arguments in argstr with the ones specified by date_from and date_until.
Absent arguments are added."""
d1y, d1m, d1d = _date_to_tuple(date_from)
d2y, d2m, d2d = _date_to_tuple(date_until)
r = replace_argument(argstr, 'd1y', d1y)
r = replace_argument(r, 'd1m', d1m)
r = replace_argument(r, 'd1d', d1d)
r = replace_argument(r, 'd2y', d2y)
r = replace_argument(r, 'd2m', d2m)
r = replace_argument(r, 'd2d', d2d)
return r
def log(msg):
try:
- log = open(logdir + '/alertengine.log', 'a')
+ log = open(CFG_LOGDIR + '/alertengine.log', 'a')
log.write(strftime('%Y%m%d%H%M%S#'))
log.write(msg + '\n')
log.close()
except:
pass
def process_alerts(alerts):
# TBD: do not generate the email each time, forge it once and then
# send it to all appropriate people
for a in alerts['alerts']:
if alert_use_basket_p(a):
add_records_to_basket(alerts['records'], a[2])
if alert_use_notification_p(a):
argstr = update_arguments(alerts['argstr'], alerts['date_from'], alerts['date_until'])
email_notify(a, alerts['records'], argstr)
update_date_lastrun(a)
def alert_use_basket_p(alert):
return alert[2] != 0
def alert_use_notification_p(alert):
return alert[6] == 'y'
def run_alerts(date):
"""Run the alerts.
First decide which alerts to run according to the current local
time, and runs them."""
if date.day == 1:
process_alert_queries('month', date)
if date.isoweekday() == 1: # first day of the week
process_alert_queries('week', date)
process_alert_queries('day', date)
def process_alert_queries_for_user(uid, date):
"""Process the alerts for the given user id.
All alerts are with reference date set as the current local time."""
alert_queries = get_alert_queries_for_user(uid)
print alert_queries
for aq in alert_queries:
frequency = aq[2]
q = run_query(aq, frequency, date)
alerts = get_alerts(q, frequency)
process_alerts(alerts)
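The date-argument rewriting done by replace_argument()/update_arguments() above amounts to a regex substitute-or-append on the query string. A standalone sketch (the helper name is illustrative):

```python
import re

# Sketch of the replace_argument() technique above: rewrite a numeric
# URL argument in place, or append it when it is absent.
def replace_date_argument(argstr, argname, argval):
    if re.search(r'%s=\d+' % argname, argstr):
        return re.sub(r'%s=\d+' % argname, '%s=%s' % (argname, argval), argstr)
    return argstr + '&%s=%s' % (argname, argval)

updated = replace_date_argument('p=ellis&d1y=2007', 'd1y', 2008)
appended = replace_date_argument('p=ellis', 'd1y', 2008)
```

Both paths converge on the same canonical string, which is what lets update_arguments() apply the six date components unconditionally.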
diff --git a/modules/webalert/lib/webalert_templates.py b/modules/webalert/lib/webalert_templates.py
index 3e8234d38..875540681 100644
--- a/modules/webalert/lib/webalert_templates.py
+++ b/modules/webalert/lib/webalert_templates.py
@@ -1,562 +1,562 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import cgi
import time
import string
from invenio.config import \
- alertengineemail, \
+ CFG_WEBALERT_ALERT_ENGINE_EMAIL, \
cdsname, \
supportemail, \
weburl
from invenio.messages import gettext_set_language
from invenio.htmlparser import get_as_text, wrap
from invenio.alert_engine_config import CFG_WEBALERT_MAX_NUM_OF_RECORDS_IN_ALERT_EMAIL
class Template:
def tmpl_errorMsg(self, ln, error_msg, rest = ""):
"""
Adds an error message to the output
Parameters:
- 'ln' *string* - The language to display the interface in
- 'error_msg' *string* - The error message
- 'rest' *string* - The rest of the page
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
%(error)s
%(rest)s""" % {
'error' : error_msg,
'rest' : rest
}
return out
def tmpl_textual_query_info_from_urlargs(self, ln, args):
"""
Displays a human-intelligible textual representation of a query
Parameters:
- 'ln' *string* - The language to display the interface in
- 'args' *array* - The URL arguments array (parsed)
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
if args.has_key('p'):
out += "" + _("Pattern") + ": " + string.join(args['p'], "; ") + " "
if args.has_key('f'):
out += "" + _("Field") + ": " + string.join(args['f'], "; ") + " "
if args.has_key('p1'):
out += "" + _("Pattern 1") + ": " + string.join(args['p1'], "; ") + " "
if args.has_key('f1'):
out += "" + _("Field 1") + ": " + string.join(args['f1'], "; ") + " "
if args.has_key('p2'):
out += "" + _("Pattern 2") + ": " + string.join(args['p2'], "; ") + " "
if args.has_key('f2'):
out += "" + _("Field 2") + ": " + string.join(args['f2'], "; ") + " "
if args.has_key('p3'):
out += "" + _("Pattern 3") + ": " + string.join(args['p3'], "; ") + " "
if args.has_key('f3'):
out += "" + _("Field 3") + ": " + string.join(args['f3'], "; ") + " "
if args.has_key('c'):
out += "" + _("Collections") + ": " + string.join(args['c'], "; ") + " "
elif args.has_key('cc'):
out += "" + _("Collection") + ": " + string.join(args['cc'], "; ") + " "
return out
def tmpl_account_list_alerts(self, ln, alerts):
"""
Displays all the alerts in the main "Your account" page
Parameters:
- 'ln' *string* - The language to display the interface in
- 'alerts' *array* - The existing alerts IDs ('id' + 'name' pairs)
"""
# load the right message language
_ = gettext_set_language(ln)
out = """""" % {
'show' : _("SHOW"),
}
return out
def tmpl_input_alert(self, ln, query, alert_name, action, frequency, notification,
baskets, old_id_basket, id_basket, id_query,
guest, guesttxt):
"""
Displays an alert adding form.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'query' *string* - The HTML code of the textual representation of the query (as returned ultimately by tmpl_textual_query_info_from_urlargs...)
- 'alert_name' *string* - The alert name
- 'action' *string* - The action to complete ('update' or 'add')
- 'frequency' *string* - The frequency of alert running ('day', 'week', 'month')
- 'notification' *string* - If notification should be sent by email ('y', 'n')
- 'baskets' *array* - The existing baskets ('id' + 'name' pairs)
- 'old_id_basket' *string* - The id of the previous basket of this alert
- 'id_basket' *string* - The id of the basket of this alert
- 'id_query' *string* - The id of the query associated to this alert
- 'guest' *bool* - If the user is a guest user
- 'guesttxt' *string* - The HTML content of the warning box for guest users (produced by webaccount.tmpl_warning_guest_user)
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
out += """
%(notify_cond)s
%(query_text)s:
%(query)s
""" % {
'notify_cond' : _("This alert will notify you each time/only if a new item satisfies the following query:"),
'query_text' : _("QUERY"),
'query' : query,
}
out += ''
if guest:
out += guesttxt
return out
def tmpl_list_alerts(self, ln, alerts, guest, guesttxt):
"""
Displays the list of alerts
Parameters:
- 'ln' *string* - The language to display the interface in
- 'alerts' *array* - The existing alerts:
- 'queryid' *string* - The id of the associated query
- 'queryargs' *string* - The query string
- 'textargs' *string* - The textual description of the query string
- 'userid' *string* - The user id
- 'basketid' *string* - The basket id
- 'basketname' *string* - The basket name
- 'alertname' *string* - The alert name
- 'frequency' *string* - The frequency of alert running ('day', 'week', 'month')
- 'notification' *string* - If notification should be sent by email ('y', 'n')
- 'created' *string* - The date of alert creation
- 'lastrun' *string* - The last running date
- 'guest' *bool* - If the user is a guest user
- 'guesttxt' *string* - The HTML content of the warning box for guest users (produced by webaccount.tmpl_warning_guest_user)
"""
# load the right message language
_ = gettext_set_language(ln)
out = _("Set a new alert from %(x_url1_open)syour searches%(x_url1_close)s, the %(x_url2_open)spopular searches%(x_url2_close)s, or the input form.")
out += _("You have defined %s alerts.") % str(len(alerts))
if guest:
out += guesttxt
return out
def tmpl_display_alerts(self, ln, permanent, nb_queries_total, nb_queries_distinct, queries, guest, guesttxt):
"""
Displays the list of performed searches
Parameters:
- 'ln' *string* - The language to display the interface in
- 'permanent' *string* - If displaying most popular searches ('y') or only personal searches ('n')
- 'nb_queries_total' *string* - The number of personal queries in the last period
- 'nb_queries_distinct' *string* - The number of distinct queries in the last period
- 'queries' *array* - The existing queries:
- 'id' *string* - The id of the associated query
- 'args' *string* - The query string
- 'textargs' *string* - The textual description of the query string
- 'lastrun' *string* - The last running date (only for personal queries)
- 'guest' *bool* - If the user is a guest user
- 'guesttxt' *string* - The HTML content of the warning box for guest users (produced by webaccount.tmpl_warning_guest_user)
"""
# load the right message language
_ = gettext_set_language(ln)
if len(queries) == 0:
out = _("You have not executed any search yet. Please go to the %(x_url_open)ssearch interface%(x_url_close)s first.") % \
{'x_url_open': '',
'x_url_close': ''}
return out
out = ''
# display message: number of items in the list
if permanent == "n":
msg = _("You have performed %(x_nb1)s searches (%(x_nb2)s different questions) during the last 30 days or so.") % {'x_nb1': nb_queries_total,
'x_nb2': nb_queries_distinct}
out += msg
else:
# permanent="y"
msg = _("Here are the %s most popular searches.")
msg %= ('' + str(len(queries)) + '')
out += msg
# display the list of searches
out += """
%(no)s
%(question)s
%(action)s
""" % {
'no' : "#",
'question' : _("Question"),
'action' : _("Action")
}
if permanent == "n":
    out += '%s' % _("Last Run")
out += "\n"
i = 0
for query in queries :
i += 1
# id, pattern, base, search url and search set alert, date
out += """
""" % {
'index' : i,
'textargs' : query['textargs'],
'weburl' : weburl,
'args' : cgi.escape(query['args']),
'id' : query['id'],
'ln': ln,
'execute_query' : _("Execute search"),
'set_alert' : _("Set new alert")
}
if permanent == "n":
    out += '%s' % query['lastrun']
out += """
\n"""
out += "\n"
if guest :
out += guesttxt
return out
def tmpl_alert_email_headers(self, name, headers):
headers['Subject'] = 'Alert %s run on %s' % (
name, time.strftime("%Y-%m-%d"))
- headers['From'] = '%s Alert Engine <%s>' % (cdsname, alertengineemail)
+ headers['From'] = '%s Alert Engine <%s>' % (cdsname, CFG_WEBALERT_ALERT_ENGINE_EMAIL)
def tmpl_alert_email_body(self, name, url, records, pattern,
catalogues, frequency):
l = len(catalogues)
if l == 0:
collections = ''
elif l == 1:
collections = "collection: %s\n" % catalogues[0]
else:
collections = "collections: %s\n" % wrap(', '.join(catalogues))
if pattern:
pattern = 'pattern: %s\n' % pattern
frequency = {'day': 'daily',
'week': 'weekly',
'month': 'monthly'}[frequency]
l = len(records)
if l == 1:
total = '1 record'
else:
total = '%d records' % l
body = """\
Hello:
Below are the results of the email notification alert that
you set up with the %(cdsname)s.
This is an automatic message, please don't reply to it.
For any question, please use <%(supportemail)s> instead.
alert name: %(name)s
%(pattern)s%(collections)sfrequency: %(frequency)s
run time: %(runtime)s
found: %(total)s
url: <%(url)s>
""" % {'supportemail': supportemail,
'name': name,
'cdsname': cdsname,
'pattern': pattern,
'collections': collections,
'frequency': frequency,
'runtime': time.strftime("%a %Y-%m-%d %H:%M:%S"),
'total': total,
'url': url}
for index, recid in enumerate(records[:CFG_WEBALERT_MAX_NUM_OF_RECORDS_IN_ALERT_EMAIL]):
body += "\n%i) " % (index + 1)
body += self.tmpl_alert_email_record(recid)
body += "\n"
if len(records) > CFG_WEBALERT_MAX_NUM_OF_RECORDS_IN_ALERT_EMAIL:
body += '''
Only the first %s records were displayed. Please consult the search
URL given at the top of this email to see all the results.
''' % CFG_WEBALERT_MAX_NUM_OF_RECORDS_IN_ALERT_EMAIL
body += '''
--
%s Alert Service <%s>
Unsubscribe? See <%s>
Need human intervention? Contact <%s>
''' % (cdsname, weburl, weburl + '/youralerts/list', supportemail)
return body
def tmpl_alert_email_record(self, recid):
""" Format a single record."""
out = wrap(get_as_text(recid))
out += "Detailed record: <%s/record/%s>" % (weburl, recid)
return out
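As a self-contained illustration of how `tmpl_alert_email_body` assembles its summary fields (the stored frequency keyword mapped to an adverb, the record count pluralized), here is a minimal sketch; the function name is ours and not part of the module:

```python
# Standalone sketch of the summary logic in tmpl_alert_email_body:
# map the stored frequency keyword to an adverb and pluralize the
# record count before interpolating both into the message body.
def summarize_alert(frequency, records):
    adverb = {'day': 'daily', 'week': 'weekly', 'month': 'monthly'}[frequency]
    if len(records) == 1:
        total = '1 record'
    else:
        total = '%d records' % len(records)
    return 'frequency: %s / found: %s' % (adverb, total)
```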
diff --git a/modules/webbasket/lib/webbasket_webinterface.py b/modules/webbasket/lib/webbasket_webinterface.py
index c8f5f7d2e..dbffc7670 100644
--- a/modules/webbasket/lib/webbasket_webinterface.py
+++ b/modules/webbasket/lib/webbasket_webinterface.py
@@ -1,742 +1,742 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""WebBasket Web Interface."""
__revision__ = "$Id$"
__lastupdated__ = """$Date$"""
from mod_python import apache
-from invenio.config import weburl, webdir, cdslang, \
+from invenio.config import weburl, CFG_WEBDIR, cdslang, \
CFG_ACCESS_CONTROL_LEVEL_SITE
from invenio.messages import gettext_set_language
from invenio.webpage import page
from invenio.webuser import getUid, page_not_authorized, isGuestUser
from invenio.webbasket import *
from invenio.webbasket_config import CFG_WEBBASKET_CATEGORIES, \
CFG_WEBBASKET_ACTIONS
from invenio.urlutils import get_referer, redirect_to_url
from invenio.webinterface_handler import wash_urlargd, WebInterfaceDirectory
class WebInterfaceYourBasketsPages(WebInterfaceDirectory):
"""Defines the set of /yourbaskets pages."""
_exports = ['', 'display', 'display_item', 'write_comment',
'save_comment', 'delete_comment', 'add', 'delete',
'modify', 'edit', 'create_basket', 'display_public',
'list_public_baskets', 'unsubscribe', 'subscribe']
def index(self, req, form):
"""Index page."""
redirect_to_url(req, '%s/yourbaskets/display?%s' % (weburl, req.args))
def display(self, req, form):
"""Display basket"""
argd = wash_urlargd(form, {'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'bsk_to_sort': (int, 0),
'sort_by_title': (str, ""),
'sort_by_date': (str, ""),
'of': (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/display",
navmenuid = 'yourbaskets')
(body, errors, warnings) = perform_request_display(uid,
argd['category'],
argd['topic'],
argd['group'],
argd['ln'])
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(uid=uid,
category=argd['category'],
topic=argd['topic'],
group=argd['group'],
ln=argd['ln'])
return page(title = _("Display baskets"),
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def display_item(self, req, form):
""" Display basket item """
argd = wash_urlargd(form, {'bskid': (int, 0),
'recid': (int, 0),
'format': (str, "hb"),
'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'of': (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/display_item",
navmenuid = 'yourbaskets')
(body, errors, warnings) = perform_request_display_item(
uid=uid,
bskid=argd['bskid'],
recid=argd['recid'],
format=argd['format'],
category=argd['category'],
topic=argd['topic'],
group_id=argd['group'],
ln=argd['ln'])
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(uid=uid,
category=argd['category'],
topic=argd['topic'],
group=argd['group'],
bskid=argd['bskid'],
ln=argd['ln'])
return page(title = _("Details and comments"),
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def write_comment(self, req, form):
"""Write a comment (just interface for writing)"""
argd = wash_urlargd(form, {'bskid': (int, 0),
'recid': (int, 0),
'cmtid': (int, 0),
'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'of' : (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/write_comment",
navmenuid = 'yourbaskets')
(body, errors, warnings) = perform_request_write_comment(
uid=uid,
bskid=argd['bskid'],
recid=argd['recid'],
cmtid=argd['cmtid'],
category=argd['category'],
topic=argd['topic'],
group_id=argd['group'],
ln=argd['ln'])
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(uid=uid,
category=argd['category'],
topic=argd['topic'],
group=argd['group'],
bskid=argd['bskid'],
ln=argd['ln'])
return page(title = _("Write a comment"),
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def save_comment(self, req, form):
"""Save comment on record in basket"""
argd = wash_urlargd(form, {'bskid': (int, 0),
'recid': (int, 0),
'title': (str, ""),
'text': (str, ""),
'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'of' : (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/save_comment",
navmenuid = 'yourbaskets')
(errors_saving, infos) = perform_request_save_comment(
uid=uid,
bskid=argd['bskid'],
recid=argd['recid'],
title=argd['title'],
text=argd['text'],
ln=argd['ln'])
(body, errors_displaying, warnings) = perform_request_display_item(
uid=uid,
bskid=argd['bskid'],
recid=argd['recid'],
format='hb',
category=argd['category'],
topic=argd['topic'],
group_id=argd['group'],
ln=argd['ln'])
body = create_infobox(infos) + body
# list.extend() returns None, so keep the extended list itself
errors_saving.extend(errors_displaying)
errors = errors_saving
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(uid=uid,
category=argd['category'],
topic=argd['topic'],
group=argd['group'],
bskid=argd['bskid'],
ln=argd['ln'])
return page(title = _("Details and comments"),
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def delete_comment(self, req, form):
"""Delete a comment
@param bskid: id of basket (int)
@param recid: id of record (int)
@param cmtid: id of comment (int)
@param category: category (see webbasket_config) (str)
@param topic: nb of topic currently displayed (int)
@param group: id of group baskets currently displayed (int)
@param ln: language"""
argd = wash_urlargd(form, {'bskid': (int, 0),
'recid': (int, 0),
'cmtid': (int, 0),
'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'of' : (str, '')
})
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/delete_comment",
navmenuid = 'yourbaskets')
_ = gettext_set_language(argd['ln'])
url = weburl + '/yourbaskets/display_item?recid=%i&bskid=%i' % \
(argd['recid'], argd['bskid'])
url += '&category=%s&topic=%i&group=%i&ln=%s' % \
(argd['category'], argd['topic'],
argd['group'], argd['ln'])
errors = perform_request_delete_comment(uid,
argd['bskid'],
argd['recid'],
argd['cmtid'])
if not(len(errors)):
redirect_to_url(req, url)
else:
return page(uid = uid,
title = '',
body = '',
language = argd['ln'],
errors = errors,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def add(self, req, form):
"""Add records to baskets.
@param recid: list of records to add
@param bskids: list of baskets to add records to. if not provided,
will return a page where user can select baskets
@param referer: URL of the referring page
@param new_basket_name: add record to new basket
@param new_topic_name: new basket goes into new topic
@param create_in_topic: # of topic to put basket into
@param ln: language"""
argd = wash_urlargd(form, {'recid': (list, []),
'bskids': (list, []),
'referer': (str, ""),
'new_basket_name': (str, ""),
'new_topic_name': (str, ""),
'create_in_topic': (int, -1),
"of" : (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/add",
navmenuid = 'yourbaskets')
if not argd['referer']:
argd['referer'] = get_referer(req)
(body, errors, warnings) = perform_request_add(
uid=uid,
recids=argd['recid'],
bskids=argd['bskids'],
referer=argd['referer'],
new_basket_name=argd['new_basket_name'],
new_topic_name=argd['new_topic_name'],
create_in_topic=argd['create_in_topic'],
ln=argd['ln'])
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
if not(len(warnings)) :
title = _("Your Baskets")
else:
title = _("Add records to baskets")
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
return page(title = title,
body = body,
navtrail = navtrail,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def delete(self, req, form):
"""Delete basket interface"""
argd = wash_urlargd(form, {'bskid': (int, -1),
'confirmed': (int, 0),
'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'of' : (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/delete",
navmenuid = 'yourbaskets')
(body, errors, warnings)=perform_request_delete(
uid=uid,
bskid=argd['bskid'],
confirmed=argd['confirmed'],
category=argd['category'],
selected_topic=argd['topic'],
selected_group_id=argd['group'],
ln=argd['ln'])
if argd['confirmed']:
url = weburl
url += '/yourbaskets/display?category=%s&topic=%i&group=%i&ln=%s' %\
(argd['category'], argd['topic'], argd['group'], argd['ln'])
redirect_to_url(req, url)
else:
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(uid=uid,
category=argd['category'],
topic=argd['topic'],
group=argd['group'],
bskid=argd['bskid'],
ln=argd['ln'])
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
return page(title = _("Delete a basket"),
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def modify(self, req, form):
"""Modify basket content interface (reorder, suppress record, etc.)"""
argd = wash_urlargd(form, {'action': (str, ""),
'bskid': (int, -1),
'recid': (int, 0),
'category':
(str, CFG_WEBBASKET_CATEGORIES['PRIVATE']),
'topic': (int, 0),
'group': (int, 0),
'of' : (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/modify",
navmenuid = 'yourbaskets')
url = weburl
url += '/yourbaskets/display?category=%s&topic=%i&group=%i&ln=%s' %\
(argd['category'], argd['topic'], argd['group'], argd['ln'])
if argd['action'] == CFG_WEBBASKET_ACTIONS['DELETE']:
delete_record(uid, argd['bskid'], argd['recid'])
redirect_to_url(req, url)
elif argd['action'] == CFG_WEBBASKET_ACTIONS['UP']:
move_record(uid, argd['bskid'], argd['recid'], argd['action'])
redirect_to_url(req, url)
elif argd['action'] == CFG_WEBBASKET_ACTIONS['DOWN']:
move_record(uid, argd['bskid'], argd['recid'], argd['action'])
redirect_to_url(req, url)
elif argd['action'] == CFG_WEBBASKET_ACTIONS['COPY']:
title = _("Copy record to basket")
referer = get_referer(req)
(body, errors, warnings) = perform_request_add(uid=uid,
recids=argd['recid'],
referer=referer,
ln=argd['ln'])
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
else:
title = ''
body = ''
warnings = ''
errors = [('ERR_WEBBASKET_UNDEFINED_ACTION',)]
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(uid=uid,
category=argd['category'],
topic=argd['topic'],
group=argd['group'],
bskid=argd['bskid'],
ln=argd['ln'])
return page(title = title,
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def edit(self, req, form):
"""Edit basket interface"""
argd = wash_urlargd(form, {'bskid': (int, 0),
'groups': (list, []),
'topic': (int, 0),
'add_group': (str, ""),
'group_cancel': (str, ""),
'submit': (str, ""),
'cancel': (str, ""),
'delete': (str, ""),
'new_name': (str, ""),
'new_topic': (int, -1),
'new_topic_name': (str, ""),
'new_group': (str, ""),
'external': (str, ""),
'of' : (str, '')
})
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/edit",
navmenuid = 'yourbaskets')
_ = gettext_set_language(argd['ln'])
if argd['cancel']:
url = weburl + '/yourbaskets/display?category=%s&topic=%i&ln=%s'
url %= (CFG_WEBBASKET_CATEGORIES['PRIVATE'], argd['topic'],
argd['ln'])
redirect_to_url(req, url)
elif argd['delete']:
url = weburl
url += '/yourbaskets/delete?bskid=%i&category=%s&topic=%i&ln=%s' %\
(argd['bskid'], CFG_WEBBASKET_CATEGORIES['PRIVATE'],
argd['topic'], argd['ln'])
redirect_to_url(req, url)
elif argd['add_group'] and not(argd['new_group']):
body = perform_request_add_group(uid=uid,
bskid=argd['bskid'],
topic=argd['topic'],
ln=argd['ln'])
errors = []
warnings = []
elif (argd['add_group'] and argd['new_group']) or argd['group_cancel']:
if argd['add_group']:
perform_request_add_group(uid=uid,
bskid=argd['bskid'],
topic=argd['topic'],
group_id=argd['new_group'],
ln=argd['ln'])
(body, errors, warnings) = perform_request_edit(uid=uid,
bskid=argd['bskid'],
topic=argd['topic'],
ln=argd['ln'])
elif argd['submit']:
(body, errors, warnings) = perform_request_edit(
uid=uid,
bskid=argd['bskid'],
topic=argd['topic'],
new_name=argd['new_name'],
new_topic=argd['new_topic'],
new_topic_name=argd['new_topic_name'],
groups=argd['groups'],
external=argd['external'],
ln=argd['ln'])
if argd['new_topic'] != -1:
argd['topic'] = argd['new_topic']
url = weburl + '/yourbaskets/display?category=%s&topic=%i&ln=%s' %\
(CFG_WEBBASKET_CATEGORIES['PRIVATE'],
argd['topic'], argd['ln'])
redirect_to_url(req, url)
else:
(body, errors, warnings) = perform_request_edit(uid=uid,
bskid=argd['bskid'],
topic=argd['topic'],
ln=argd['ln'])
navtrail = ''\
'%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
navtrail_end = create_basket_navtrail(
uid=uid,
category=CFG_WEBBASKET_CATEGORIES['PRIVATE'],
topic=argd['topic'],
group=0,
bskid=argd['bskid'],
ln=argd['ln'])
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
return page(title = _("Edit basket"),
body = body,
navtrail = navtrail + navtrail_end,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def create_basket(self, req, form):
"""Create basket interface"""
argd = wash_urlargd(form, {'new_basket_name': (str, ""),
'new_topic_name': (str, ""),
'create_in_topic': (int, -1),
'topic_number': (int, -1),
'of' : (str, ''),
})
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1:
return page_not_authorized(req, "../yourbaskets/create_basket",
navmenuid = 'yourbaskets')
_ = gettext_set_language(argd['ln'])
if argd['new_basket_name'] and \
(argd['new_topic_name'] or argd['create_in_topic'] != -1):
topic = perform_request_create_basket(
uid=uid,
new_basket_name=argd['new_basket_name'],
new_topic_name=argd['new_topic_name'],
create_in_topic=argd['create_in_topic'],
ln=argd['ln'])
url = weburl + '/yourbaskets/display?category=%s&topic=%i&ln=%s'
url %= (CFG_WEBBASKET_CATEGORIES['PRIVATE'], int(topic), argd['ln'])
redirect_to_url(req, url)
else:
(body, errors, warnings) = perform_request_create_basket(
uid=uid,
new_basket_name=argd['new_basket_name'],
new_topic_name=argd['new_topic_name'],
create_in_topic=argd['create_in_topic'],
topic_number=argd['topic_number'],
ln=argd['ln'])
navtrail = '%s'
navtrail %= (weburl, argd['ln'], _("Your Account"))
if isGuestUser(uid):
body = create_guest_warning_box(argd['ln']) + body
return page(title = _("Create basket"),
body = body,
navtrail = navtrail,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def display_public(self, req, form):
"""Display public basket. If of is x** then output will be XML"""
argd = wash_urlargd(form, {'bskid': (int, 0),
'of': (str, "hb"),
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE == 2:
return page_not_authorized(req, "../yourbaskets/display_public",
navmenuid = 'yourbaskets')
if argd['bskid'] == 0:
# No given basket => display list of public baskets
(body, errors, warnings) = perform_request_list_public_baskets(
0, 1, 1,
argd['ln'])
return page(title = _("List of public baskets"),
body = body,
navtrail = '',
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
of = argd['of'])
if len(argd['of']) and argd['of'][0]=='x':
# XML output
req.content_type = "text/xml"
req.send_http_header()
return perform_request_display_public(bskid=argd['bskid'],
of=argd['of'],
ln=argd['ln'])
(body, errors, warnings) = perform_request_display_public(
bskid=argd['bskid'],
ln=argd['ln'])
referer = get_referer(req)
if 'list_public_basket' not in referer:
referer = weburl + '/yourbaskets/list_public_baskets?ln=' + \
argd['ln']
navtrail = '%s' % \
(referer, _("List of public baskets"))
return page(title = _("Public basket"),
body = body,
navtrail = navtrail,
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def list_public_baskets(self, req, form):
"""List of public baskets interface"""
argd = wash_urlargd(form, {'inf_limit': (int, 0),
'order': (int, 1),
'asc': (int, 1),
'of': (str, '')
})
_ = gettext_set_language(argd['ln'])
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE == 2:
return page_not_authorized(req, "../yourbaskets/list_public_baskets",
navmenuid = 'yourbaskets')
(body, errors, warnings) = perform_request_list_public_baskets(
argd['inf_limit'],
argd['order'],
argd['asc'], argd['ln'])
return page(title = _("List of public baskets"),
body = body,
navtrail = '',
uid = uid,
lastupdated = __lastupdated__,
language = argd['ln'],
errors = errors,
warnings = warnings,
req = req,
navmenuid = 'yourbaskets',
of = argd['of'])
def unsubscribe(self, req, form):
"""Unsubscribe from a basket."""
argd = wash_urlargd(form, {'bskid': (int, 0),
'of': (str, '')
})
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE == 2:
return page_not_authorized(req, "../yourbaskets/unsubscribe",
navmenuid = 'yourbaskets')
perform_request_unsubscribe(uid, argd['bskid'])
url = weburl + '/yourbaskets/display?category=%s&ln=%s'
url %= (CFG_WEBBASKET_CATEGORIES['EXTERNAL'], argd['ln'])
redirect_to_url(req, url)
def subscribe(self, req, form):
"""Subscribe to a basket."""
argd = wash_urlargd(form, {'bskid': (int, 0),
'of': (str, '')
})
uid = getUid(req)
if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE == 2:
return page_not_authorized(req, "../yourbaskets/subscribe",
navmenuid = 'yourbaskets')
errors = perform_request_subscribe(uid, argd['bskid'])
if len(errors):
return page(errors=errors,
uid=uid,
language=argd['ln'],
body = '',
title = '',
req=req,
navmenuid = 'yourbaskets')
url = weburl + '/yourbaskets/display?category=%s&ln=%s'
url %= (CFG_WEBBASKET_CATEGORIES['EXTERNAL'], argd['ln'])
redirect_to_url(req, url)
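Every handler in this interface starts by washing its form arguments against a schema of `(type, default)` pairs. A minimal sketch of that pattern follows; the real `wash_urlargd` also handles language codes and mod_python field objects, and this helper name is illustrative only:

```python
# Minimal sketch of the (type, default) washing pattern used by the
# handlers above: coerce each declared argument to its type and fall
# back to the declared default when it is absent or malformed.
def wash_args(form, schema):
    washed = {}
    for name, (typ, default) in schema.items():
        try:
            washed[name] = typ(form[name])
        except (KeyError, ValueError, TypeError):
            washed[name] = default
    return washed
```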
diff --git a/modules/webcomment/lib/webcomment.py b/modules/webcomment/lib/webcomment.py
index fe9f2c747..1f910bbf5 100644
--- a/modules/webcomment/lib/webcomment.py
+++ b/modules/webcomment/lib/webcomment.py
@@ -1,1050 +1,1050 @@
# -*- coding: utf-8 -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
""" Comments and reviews for records """
__revision__ = "$Id$"
# non CDS Invenio imports:
import time
import math
# CDS Invenio imports:
from invenio.dbquery import run_sql
from invenio.config import cdslang, \
- alertengineemail,\
+ CFG_WEBALERT_ALERT_ENGINE_EMAIL,\
adminemail,\
weburl,\
cdsname,\
CFG_WEBCOMMENT_ALLOW_REVIEWS,\
CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS,\
CFG_WEBCOMMENT_ALLOW_COMMENTS,\
CFG_WEBCOMMENT_ADMIN_NOTIFICATION_LEVEL,\
CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN,\
CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_COMMENTS_IN_SECONDS,\
CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_REVIEWS_IN_SECONDS
from invenio.webmessage_mailutils import email_quote_txt
from invenio.webuser import get_user_info
from invenio.dateutils import convert_datetext_to_dategui, \
datetext_default, \
convert_datestruct_to_datetext
from invenio.mailutils import send_email
from invenio.messages import wash_language, gettext_set_language
from invenio.urlutils import wash_url_argument
from invenio.webuser import isGuestUser
from invenio.webcomment_config import CFG_WEBCOMMENT_ACTION_CODE
try:
import invenio.template
webcomment_templates = invenio.template.load('webcomment')
except:
pass
def perform_request_display_comments_or_remarks(recID, ln=cdslang, display_order='od', display_since='all', nb_per_page=100, page=1, voted=-1, reported=-1, reviews=0, uid=-1):
"""
Returns all the comments (reviews) of a specific internal record or external basket record.
@param recID: record id where (internal record IDs > 0) or (external basket record IDs < -100)
@param display_order: hh = highest helpful score, review only
lh = lowest helpful score, review only
hs = highest star score, review only
ls = lowest star score, review only
od = oldest date
nd = newest date
@param display_since: all= no filtering by date
nd = n days ago
nw = n weeks ago
nm = n months ago
ny = n years ago
where n is a single digit integer between 0 and 9
@param nb_per_page: number of results per page
@param page: results page
@param voted: boolean, active if user voted for a review, see perform_request_vote function
@param reported: boolean, active if user reported a certain comment/review, perform_request_report function
@param reviews: boolean, enabled if reviews, disabled for comments
@param uid: the id of the user who is reading comments
@return html body.
"""
errors = []
warnings = []
nb_reviews = 0
nb_comments = 0
# wash arguments
recID = wash_url_argument(recID, 'int')
ln = wash_language(ln)
display_order = wash_url_argument(display_order, 'str')
display_since = wash_url_argument(display_since, 'str')
nb_per_page = wash_url_argument(nb_per_page, 'int')
page = wash_url_argument(page, 'int')
voted = wash_url_argument(voted, 'int')
reported = wash_url_argument(reported, 'int')
reviews = wash_url_argument(reviews, 'int')
# vital argument check
(valid, error_body) = check_recID_is_in_range(recID, warnings, ln)
if not(valid):
return (error_body, errors, warnings)
# Query the database and filter results
res = query_retrieve_comments_or_remarks(recID, display_order, display_since, reviews)
res2 = query_retrieve_comments_or_remarks(recID, display_order, display_since, not reviews)
nb_res = len(res)
if reviews:
nb_reviews = nb_res
nb_comments = len(res2)
else:
nb_reviews = len(res2)
nb_comments = nb_res
# checking non-vital arguments - will be set to default if wrong
#if page <= 0 or page.lower() != 'all':
if page < 0:
page = 1
warnings.append(('WRN_WEBCOMMENT_INVALID_PAGE_NB',))
if nb_per_page < 0:
nb_per_page = 100
warnings.append(('WRN_WEBCOMMENT_INVALID_NB_RESULTS_PER_PAGE',))
if CFG_WEBCOMMENT_ALLOW_REVIEWS and reviews:
if display_order not in ['od', 'nd', 'hh', 'lh', 'hs', 'ls']:
display_order = 'hh'
warnings.append(('WRN_WEBCOMMENT_INVALID_REVIEW_DISPLAY_ORDER',))
else:
if display_order not in ['od', 'nd']:
display_order = 'od'
warnings.append(('WRN_WEBCOMMENT_INVALID_DISPLAY_ORDER',))
# filter results according to page and number of results per page
if nb_per_page > 0:
if nb_res > 0:
last_page = int(math.ceil(nb_res / float(nb_per_page)))
else:
last_page = 1
if page > last_page:
page = 1
warnings.append(("WRN_WEBCOMMENT_INVALID_PAGE_NB",))
if nb_res > nb_per_page: # if more than one page of results
if page < last_page:
res = res[(page-1)*(nb_per_page) : (page*nb_per_page)]
else:
res = res[(page-1)*(nb_per_page) : ]
else: # one page of results
pass
else:
last_page = 1
# Send to template
avg_score = 0.0
if not CFG_WEBCOMMENT_ALLOW_COMMENTS and not CFG_WEBCOMMENT_ALLOW_REVIEWS: # comments not allowed by admin
errors.append(('ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED',))
if reported > 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED',))
elif reported == 0:
warnings.append(('WRN_WEBCOMMENT_ALREADY_REPORTED',))
if CFG_WEBCOMMENT_ALLOW_REVIEWS and reviews:
avg_score = calculate_avg_score(res)
if voted > 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED',))
elif voted == 0:
warnings.append(('WRN_WEBCOMMENT_ALREADY_VOTED',))
body = webcomment_templates.tmpl_get_comments(recID,
ln,
nb_per_page, page, last_page,
display_order, display_since,
CFG_WEBCOMMENT_ALLOW_REVIEWS,
res, nb_comments, avg_score,
warnings,
border=0,
reviews=reviews,
total_nb_reviews=nb_reviews,
uid=uid)
return (body, errors, warnings)
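The page-clamping and slicing logic above can be sketched as a standalone helper. This is an illustrative simplification, not part of the Invenio API; the name `paginate` is hypothetical:

```python
import math

def paginate(results, page, per_page):
    """Clamp the requested page number and return one page of results,
    mirroring the page-filtering step above (illustrative only)."""
    # last_page is at least 1, even for an empty result set
    last_page = max(1, int(math.ceil(len(results) / float(per_page))))
    # an out-of-range page falls back to the first page
    if page < 1 or page > last_page:
        page = 1
    start = (page - 1) * per_page
    return results[start:start + per_page], page, last_page
```

Unlike the original, this sketch also treats page 0 as invalid, which avoids an empty slice for `page=0`.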
def perform_request_vote(cmt_id, client_ip_address, value, uid=-1):
"""
Vote positively or negatively for a comment/review
@param cmt_id: review id
@param value: +1 for voting positively
-1 for voting negatively
@return integer 1 if successful, integer 0 if not
"""
cmt_id = wash_url_argument(cmt_id, 'int')
client_ip_address = wash_url_argument(client_ip_address, 'str')
value = wash_url_argument(value, 'int')
uid = wash_url_argument(uid, 'int')
if cmt_id > 0 and value in [-1, 1] and check_user_can_vote(cmt_id, client_ip_address, uid):
action_date = convert_datestruct_to_datetext(time.localtime())
action_code = CFG_WEBCOMMENT_ACTION_CODE['VOTE']
query = """INSERT INTO cmtACTIONHISTORY (id_cmtRECORDCOMMENT,
id_bibrec, id_user, client_host, action_time,
action_code)
VALUES (%i, NULL, %i, inet_aton('%s'), '%s', '%s')"""
query %= (cmt_id, uid, client_ip_address, action_date, action_code)
run_sql(query)
return query_record_useful_review(cmt_id, value)
else:
return 0
def check_user_can_comment(recID, client_ip_address, uid=-1):
""" Check if a user hasn't already commented within the last seconds
time limit: CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_COMMENTS_IN_SECONDS
@param recID: record id
@param client_ip_address: IP => use: str(req.get_remote_host(apache.REMOTE_NOLOOKUP))
@param uid: user id, as given by invenio.webuser.getUid(req)
"""
recID = wash_url_argument(recID, 'int')
client_ip_address = wash_url_argument(client_ip_address, 'str')
uid = wash_url_argument(uid, 'int')
max_action_time = time.time() - CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_COMMENTS_IN_SECONDS
max_action_time = convert_datestruct_to_datetext(time.localtime(max_action_time))
action_code = CFG_WEBCOMMENT_ACTION_CODE['ADD_COMMENT']
query = """SELECT id_bibrec
FROM cmtACTIONHISTORY
WHERE id_bibrec=%i AND
action_code='%s' AND
action_time>'%s'
""" % (recID, action_code, max_action_time)
if uid < 0:
query += " AND client_host=inet_aton('%s')" % client_ip_address
else:
query += " AND id_user=%i" % uid
res = run_sql(query)
return len(res) == 0
def check_user_can_review(recID, client_ip_address, uid=-1):
""" Check if a user hasn't already reviewed within the last seconds
time limit: CFG_WEBCOMMENT_TIMELIMIT_PROCESSING_REVIEWS_IN_SECONDS
@param cmt_id: comment id
@param client_ip_address: IP => use: str(req.get_remote_host(apache.REMOTE_NOLOOKUP))
@param uid: user id, as given by invenio.webuser.getUid(req)
"""
action_code = CFG_WEBCOMMENT_ACTION_CODE['ADD_REVIEW']
query = """SELECT id_bibrec
FROM cmtACTIONHISTORY
WHERE id_bibrec=%i AND
action_code='%s'
""" % (recID, action_code)
if uid < 0:
query += " AND client_host=inet_aton('%s')" % client_ip_address
else:
query += " AND id_user=%i" % uid
res = run_sql(query)
return len(res) == 0
def check_user_can_vote(cmt_id, client_ip_address, uid=-1):
""" Checks if a user hasn't already voted
@param cmt_id: comment id
@param client_ip_address: IP => use: str(req.get_remote_host(apache.REMOTE_NOLOOKUP))
@param uid: user id, as given by invenio.webuser.getUid(req)
"""
cmt_id = wash_url_argument(cmt_id, 'int')
client_ip_address = wash_url_argument(client_ip_address, 'str')
uid = wash_url_argument(uid, 'int')
query = """SELECT id_cmtRECORDCOMMENT
FROM cmtACTIONHISTORY
WHERE id_cmtRECORDCOMMENT=%i""" % cmt_id
if uid < 0:
query += " AND client_host=inet_aton('%s')" % client_ip_address
else:
query += " AND id_user=%i" % uid
res = run_sql(query)
return (len(res) == 0)
def perform_request_report(cmt_id, client_ip_address, uid=-1):
"""
Report a comment/review for inappropriate content.
Will send an email to the administrator if number of reports is a multiple of CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN
@param cmt_id: comment id
@return integer 1 if successful, integer 0 if not
"""
cmt_id = wash_url_argument(cmt_id, 'int')
if cmt_id <= 0:
return 0
(query_res, nb_abuse_reports) = query_record_report_this(cmt_id)
if query_res == 0:
return 0
if not(check_user_can_report(cmt_id, client_ip_address, uid)):
return 0
action_date = convert_datestruct_to_datetext(time.localtime())
action_code = CFG_WEBCOMMENT_ACTION_CODE['REPORT_ABUSE']
query = """INSERT INTO cmtACTIONHISTORY (id_cmtRECORDCOMMENT, id_bibrec,
id_user, client_host, action_time, action_code)
VALUES (%i, NULL, %i, inet_aton('%s'), '%s', '%s')"""
query %= (cmt_id, uid, client_ip_address, action_date, action_code)
run_sql(query)
if nb_abuse_reports % CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN == 0:
(cmt_id2,
id_bibrec,
id_user,
cmt_body,
cmt_date,
cmt_star,
cmt_vote, cmt_nb_votes_total,
cmt_title,
cmt_reported) = query_get_comment(cmt_id)
(user_nb_abuse_reports,
user_votes,
user_nb_votes_total) = query_get_user_reports_and_votes(int(id_user))
(nickname, user_email, last_login) = query_get_user_contact_info(id_user)
- from_addr = '%s Alert Engine <%s>' % (cdsname, alertengineemail)
+ from_addr = '%s Alert Engine <%s>' % (cdsname, CFG_WEBALERT_ALERT_ENGINE_EMAIL)
to_addr = adminemail
subject = "An abuse report has been sent by a user"
body = '''
The following comment has been reported a total of %(cmt_reported)s times.
Author: nickname = %(nickname)s
email = %(user_email)s
user_id = %(uid)s
This user has:
total number of reports = %(user_nb_abuse_reports)s
%(votes)s
Comment: comment_id = %(cmt_id)s
record_id = %(id_bibrec)s
date written = %(cmt_date)s
nb reports = %(cmt_reported)s
%(review_stuff)s
body =
---start body---
%(cmt_body)s
---end body---
Please go to the WebComment Admin interface %(comment_admin_link)s to delete this message if necessary. A warning will be sent to the user in question.''' % \
{ 'cfg-report_max' : CFG_WEBCOMMENT_NB_REPORTS_BEFORE_SEND_EMAIL_TO_ADMIN,
'nickname' : nickname,
'user_email' : user_email,
'uid' : id_user,
'user_nb_abuse_reports' : user_nb_abuse_reports,
'user_votes' : user_votes,
'votes' : CFG_WEBCOMMENT_ALLOW_REVIEWS and \
"total number of positive votes\t= %s\n\t\t\t\ttotal number of negative votes\t= %s" % \
(user_votes, (user_nb_votes_total - user_votes)) or "\n",
'cmt_id' : cmt_id,
'id_bibrec' : id_bibrec,
'cmt_date' : cmt_date,
'cmt_reported' : cmt_reported,
'review_stuff' : CFG_WEBCOMMENT_ALLOW_REVIEWS and \
"star score\t\t= %s\n\t\t\treview title\t\t= %s" % (cmt_star, cmt_title) or "",
'cmt_body' : cmt_body,
'comment_admin_link' : weburl + "/admin/webcomment/webcommentadmin.py",
'user_admin_link' : "user_admin_link" #! FIXME
}
#FIXME to be added to email when websession module is over:
#If you wish to ban the user, you can do so via the User Admin Panel %(user_admin_link)s.
send_email(from_addr, to_addr, subject, body)
return 1
def check_user_can_report(cmt_id, client_ip_address, uid=-1):
""" Checks if a user hasn't already reported a comment
@param cmt_id: comment id
@param client_ip_address: IP => use: str(req.get_remote_host(apache.REMOTE_NOLOOKUP))
@param uid: user id, as given by invenio.webuser.getUid(req)
"""
cmt_id = wash_url_argument(cmt_id, 'int')
client_ip_address = wash_url_argument(client_ip_address, 'str')
uid = wash_url_argument(uid, 'int')
query = """SELECT id_cmtRECORDCOMMENT
FROM cmtACTIONHISTORY
WHERE id_cmtRECORDCOMMENT=%i""" % cmt_id
if uid < 0:
query += " AND client_host=inet_aton('%s')" % client_ip_address
else:
query += " AND id_user=%i" % uid
res = run_sql(query)
return (len(res) == 0)
def query_get_user_contact_info(uid):
"""
Get the user contact information
@return tuple (nickname, email, last_login), if none found return ()
Note: for the moment, if no nickname, will return email address up to the '@'
"""
query1 = """SELECT nickname, email,
DATE_FORMAT(last_login, '%%Y-%%m-%%d %%H:%%i:%%s')
FROM user WHERE id=%s"""
params1 = (uid,)
res1 = run_sql(query1, params1)
if res1:
return res1[0]
else:
return ()
def query_get_user_reports_and_votes(uid):
"""
Retrieve total number of reports and votes of a particular user
@param uid: user id
@return tuple (total_nb_reports, total_nb_votes_yes, total_nb_votes_total)
if none found return ()
"""
query1 = """SELECT nb_votes_yes,
nb_votes_total,
nb_abuse_reports
FROM cmtRECORDCOMMENT
WHERE id_user=%s"""
params1 = (uid,)
res1 = run_sql(query1, params1)
if len(res1) == 0:
return ()
nb_votes_yes = nb_votes_total = nb_abuse_reports = 0
for cmt_tuple in res1:
nb_votes_yes += int(cmt_tuple[0])
nb_votes_total += int(cmt_tuple[1])
nb_abuse_reports += int(cmt_tuple[2])
return (nb_abuse_reports, nb_votes_yes, nb_votes_total)
def query_get_comment(comID):
"""
Get all fields of a comment
@param comID: comment id
@return tuple (comID, id_bibrec, id_user, body, date_creation, star_score, nb_votes_yes, nb_votes_total, title, nb_abuse_reports)
if none found return ()
"""
query1 = """SELECT id,
id_bibrec,
id_user,
body,
DATE_FORMAT(date_creation, '%%Y-%%m-%%d %%H:%%i:%%s'),
star_score,
nb_votes_yes,
nb_votes_total,
title,
nb_abuse_reports
FROM cmtRECORDCOMMENT
WHERE id=%s"""
params1 = (comID,)
res1 = run_sql(query1, params1)
if len(res1)>0:
return res1[0]
else:
return ()
def query_record_report_this(comID):
"""
Increment the number of reports for a comment
@param comID: comment id
@return tuple (success, new_total_nb_reports_for_this_comment) where
success is integer 1 if success, integer 0 if not
if none found, return ()
"""
#retrieve nb_abuse_reports
query1 = "SELECT nb_abuse_reports FROM cmtRECORDCOMMENT WHERE id=%s"
params1 = (comID,)
res1 = run_sql(query1, params1)
if len(res1)==0:
return ()
#increment and update
nb_abuse_reports = int(res1[0][0]) + 1
query2 = "UPDATE cmtRECORDCOMMENT SET nb_abuse_reports=%s WHERE id=%s"
params2 = (nb_abuse_reports, comID)
res2 = run_sql(query2, params2)
return (int(res2), nb_abuse_reports)
def query_record_useful_review(comID, value):
"""
Private function
Adjust the number of useful votes and number of total votes for a comment.
@param comID: comment id
@param value: +1 or -1
@return integer 1 if successful, integer 0 if not
"""
# retrieve nb_useful votes
query1 = "SELECT nb_votes_total, nb_votes_yes FROM cmtRECORDCOMMENT WHERE id=%s"
params1 = (comID,)
res1 = run_sql(query1, params1)
if len(res1)==0:
return 0
# modify and insert new nb_useful votes
nb_votes_yes = int(res1[0][1])
if value >= 1:
nb_votes_yes = int(res1[0][1]) + 1
nb_votes_total = int(res1[0][0]) + 1
query2 = "UPDATE cmtRECORDCOMMENT SET nb_votes_total=%s, nb_votes_yes=%s WHERE id=%s"
params2 = (nb_votes_total, nb_votes_yes, comID)
res2 = run_sql(query2, params2)
return int(res2)
def query_retrieve_comments_or_remarks (recID, display_order='od', display_since='0000-00-00 00:00:00',
ranking=0):
"""
Private function
Retrieve tuple of comments or remarks from the database
@param recID: record id
@param display_order: hh = highest helpful score
lh = lowest helpful score
hs = highest star score
ls = lowest star score
od = oldest date
nd = newest date
@param display_since: datetime, e.g. 0000-00-00 00:00:00
@param ranking: boolean, enabled if reviews, disabled for comments
@param full_reviews_p: boolean, filter out empty reviews (with score only) if False
@return tuple of comment where comment is
tuple (nickname, date_creation, body, id) if ranking disabled or
tuple (nickname, date_creation, body, nb_votes_yes, nb_votes_total, star_score, title, id)
Note: for the moment, if no nickname, will return email address up to '@'
"""
display_since = calculate_start_date(display_since)
order_dict = { 'hh' : "cmt.nb_votes_yes/(cmt.nb_votes_total+1) DESC, cmt.date_creation DESC ",
'lh' : "cmt.nb_votes_yes/(cmt.nb_votes_total+1) ASC, cmt.date_creation ASC ",
'ls' : "cmt.star_score ASC, cmt.date_creation DESC ",
'hs' : "cmt.star_score DESC, cmt.date_creation DESC ",
'od' : "cmt.date_creation ASC ",
'nd' : "cmt.date_creation DESC "
}
# Ranking only done for comments and when allowed
if ranking and recID > 0:
try:
display_order = order_dict[display_order]
except:
display_order = order_dict['od']
else:
# comments, or external basket record (recID < 0) => no ranking!
ranking = 0
try:
if display_order[-1] == 'd':
display_order = order_dict[display_order]
else:
display_order = order_dict['od']
except:
display_order = order_dict['od']
query = """SELECT user.nickname,
cmt.id_user,
DATE_FORMAT(cmt.date_creation, '%%Y-%%m-%%d %%H:%%i:%%s'),
cmt.body,
%(ranking)s cmt.id
FROM %(table)s cmt LEFT JOIN user ON
user.id=cmt.id_user
WHERE %(id_bibrec)s=%(recID)i
%(ranking_only)s
%(display_since)s
ORDER BY %(display_order)s"""
params = { 'ranking' : ranking and ' cmt.nb_votes_yes, cmt.nb_votes_total, cmt.star_score, cmt.title, ' or '',
'ranking_only' : ranking and ' AND cmt.star_score>0 ' or ' AND cmt.star_score=0 ',
'id_bibrec' : recID > 0 and 'cmt.id_bibrec' or 'cmt.id_bibrec_or_bskEXTREC',
'table' : recID > 0 and 'cmtRECORDCOMMENT' or 'bskRECORDCOMMENT',
'recID' : recID,
'display_since' : display_since=='0000-00-00 00:00:00' and ' ' or 'AND cmt.date_creation>=\'%s\' ' % display_since,
'display_order' : display_order
}
res = run_sql(query % params)
if res:
return res
return ()
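The `params` dict above leans on the `cond and A or B` pattern, the pre-Python-2.5 conditional idiom. A short, self-contained illustration of how it behaves, and of its one pitfall, may help (values are illustrative):

```python
# "cond and A or B" yields A when cond is truthy, else B --
# provided A itself is truthy, as the non-empty strings above are.
ranking = 0
clause = ranking and ' AND cmt.star_score>0 ' or ' AND cmt.star_score=0 '
assert clause == ' AND cmt.star_score=0 '

# Pitfall: when A is falsy (e.g. ''), B is returned even if cond is true.
broken = True and '' or 'fallback'
assert broken == 'fallback'

# Safe variants: wrap in a list, or use the modern conditional expression.
safe = (True and [''] or ['fallback'])[0]
assert safe == ''
modern = '' if True else 'fallback'
assert modern == ''
```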
def query_add_comment_or_remark(reviews=0, recID=0, uid=-1, msg="", note="", score=0, priority=0, client_ip_address=''):
"""
Private function
Insert a comment/review or remark into the database
@param recID: record id
@param uid: user id
@param msg: comment body
@param note: comment title
@param score: review star score
@param priority: remark priority #!FIXME
@return integer >0 representing id if successful, integer 0 if not
"""
current_date = calculate_start_date('0d')
#change utf-8 message into general unicode
msg = msg.decode('utf-8')
note = note.decode('utf-8')
#change general unicode back to utf-8
msg = msg.encode('utf-8')
note = note.encode('utf-8')
query = """INSERT INTO cmtRECORDCOMMENT (id_bibrec,
id_user,
body,
date_creation,
star_score,
nb_votes_total,
title)
VALUES (%s, %s, %s, %s, %s, %s, %s)"""
params = (recID, uid, msg, current_date, score, 0, note)
res = run_sql(query, params)
if res:
action_code = CFG_WEBCOMMENT_ACTION_CODE[reviews and 'ADD_REVIEW' or 'ADD_COMMENT']
action_time = convert_datestruct_to_datetext(time.localtime())
query2 = """INSERT INTO cmtACTIONHISTORY (id_cmtRECORDCOMMENT,
id_bibrec, id_user, client_host, action_time, action_code)
VALUES ('', %i, %i, inet_aton('%s'), '%s', '%s')"""
params2 = (recID, uid, client_ip_address, action_time, action_code)
run_sql(query2%params2)
return int(res)
def calculate_start_date(display_since):
"""
Private function
Returns the datetime of display_since argument in MYSQL datetime format
calculated according to the local time.
@param display_since = all= no filtering
nd = n days ago
nw = n weeks ago
nm = n months ago
ny = n years ago
where n is a single digit number
@return string of wanted datetime.
If 'all' given as argument, will return datetext_default
datetext_default is defined in miscutils/lib/dateutils and
equals 0000-00-00 00:00:00 => MySQL format
If a bad argument is given, will return datetext_default
"""
# time type and seconds coefficients
time_types = {'d':0, 'w':0, 'm':0, 'y':0}
## verify argument
# argument wrong size
if display_since in (None, 'all') or (len(display_since) > 2):
return datetext_default
try:
nb = int(display_since[0])
except:
return datetext_default
if str(display_since[1]) in time_types:
time_type = str(display_since[1])
else:
return datetext_default
## calculate date
# initialize the coef
if time_type == 'w':
time_types[time_type] = 7
else:
time_types[time_type] = 1
start_time = time.localtime()
start_time = (start_time[0] - nb*time_types['y'],
start_time[1] - nb*time_types['m'],
start_time[2] - nb*time_types['d'] - nb*time_types['w'],
start_time[3],
start_time[4],
start_time[5],
start_time[6],
start_time[7],
start_time[8])
return convert_datestruct_to_datetext(start_time)
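A minimal, standalone sketch of the `display_since` parsing idea used by calculate_start_date() above. The name `start_datetext` is illustrative, not part of the Invenio API; unlike the original, it approximates months and years as fixed numbers of seconds rather than adjusting the struct_time fields:

```python
import time

def start_datetext(display_since):
    """Return a 'YYYY-MM-DD HH:MM:SS' string for e.g. '2w' (two weeks
    ago), or the MySQL zero date for 'all' or malformed input."""
    default = "0000-00-00 00:00:00"
    if not display_since or display_since == 'all' or len(display_since) != 2:
        return default
    try:
        nb = int(display_since[0])
    except ValueError:
        return default
    # seconds per time-type unit (months/years approximated)
    seconds_per = {'d': 86400, 'w': 7 * 86400,
                   'm': 30 * 86400, 'y': 365 * 86400}
    if display_since[1] not in seconds_per:
        return default
    past = time.localtime(time.time() - nb * seconds_per[display_since[1]])
    return time.strftime('%Y-%m-%d %H:%M:%S', past)
```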
def count_comments(recID):
"""
Returns the number of comments made on a record.
"""
recID = int(recID)
query = """SELECT count(id) FROM cmtRECORDCOMMENT
WHERE id_bibrec=%i AND star_score=0"""
return run_sql(query % recID)[0][0]
def count_reviews(recID):
"""
Returns the number of reviews made on a record.
"""
recID = int(recID)
query = """SELECT count(id) FROM cmtRECORDCOMMENT
WHERE id_bibrec=%i AND star_score>0"""
return run_sql(query % recID)[0][0]
def get_first_comments_or_remarks(recID=-1,
ln=cdslang,
nb_comments='all',
nb_reviews='all',
voted=-1,
reported=-1):
"""
Get the first nb comments/reviews or remarks.
In the case of comments, will get both comments and reviews
Comments and remarks sorted by most recent date, reviews sorted by highest helpful score
@param recID: record id
@param ln: language
@param nb: number of comment/reviews or remarks to get
@param voted: 1 if user has voted for a remark
@param reported: 1 if user has reported a comment or review
@return if comment, tuple (comments, reviews) both being html of first nb comments/reviews
if remark, tuple (remarks, None)
"""
warnings = []
errors = []
voted = wash_url_argument(voted, 'int')
reported = wash_url_argument(reported, 'int')
## check recID argument
if type(recID) is not int:
return ()
if recID >= 1: #comment or review. NB: suppressed reference to basket (handled in webbasket)
if CFG_WEBCOMMENT_ALLOW_REVIEWS:
res_reviews = query_retrieve_comments_or_remarks(recID=recID, display_order="hh", ranking=1)
nb_res_reviews = len(res_reviews)
## check nb argument
if type(nb_reviews) is int and nb_reviews < len(res_reviews):
first_res_reviews = res_reviews[:nb_reviews]
else:
first_res_reviews = res_reviews
if CFG_WEBCOMMENT_ALLOW_COMMENTS:
res_comments = query_retrieve_comments_or_remarks(recID=recID, display_order="od", ranking=0)
nb_res_comments = len(res_comments)
## check nb argument
if type(nb_comments) is int and nb_comments < len(res_comments):
first_res_comments = res_comments[:nb_comments]
else:
first_res_comments = res_comments
else: #error
errors.append(('ERR_WEBCOMMENT_RECID_INVALID', recID)) #!FIXME dont return error anywhere since search page
# comment
if recID >= 1:
comments = reviews = ""
if reported > 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT',))
elif reported == 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT',))
if CFG_WEBCOMMENT_ALLOW_COMMENTS: # normal comments
comments = webcomment_templates.tmpl_get_first_comments_without_ranking(recID, ln, first_res_comments, nb_res_comments, warnings)
if CFG_WEBCOMMENT_ALLOW_REVIEWS: # ranked comments
#calculate average score
avg_score = calculate_avg_score(res_reviews)
if voted > 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT',))
elif voted == 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT',))
reviews = webcomment_templates.tmpl_get_first_comments_with_ranking(recID, ln, first_res_reviews, nb_res_reviews, avg_score, warnings)
return (comments, reviews)
# remark
else:
return(webcomment_templates.tmpl_get_first_remarks(first_res_comments, ln, nb_res_comments), None)
def calculate_avg_score(res):
"""
private function
Calculate the avg score of reviews present in res
@param res: tuple of tuple returned from query_retrieve_comments_or_remarks
@return a float of the average score rounded to the closest 0.5
"""
c_star_score = 6
avg_score = 0.0
nb_reviews = 0
for comment in res:
if comment[c_star_score] > 0:
avg_score += comment[c_star_score]
nb_reviews += 1
if nb_reviews == 0:
return 0.0
avg_score = avg_score / nb_reviews
avg_score_unit = avg_score - math.floor(avg_score)
if avg_score_unit < 0.25:
avg_score = math.floor(avg_score)
elif avg_score_unit > 0.75:
avg_score = math.floor(avg_score) + 1
else:
avg_score = math.floor(avg_score) + 0.5
if avg_score > 5:
avg_score = 5.0
return avg_score
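The round-to-nearest-half rule used by calculate_avg_score() above can be isolated as a small helper. A hedged sketch, with the hypothetical name `round_to_half`:

```python
import math

def round_to_half(x, max_score=5.0):
    """Round x to the nearest 0.5 (same rule as the average-score
    computation above) and cap the result at max_score."""
    frac = x - math.floor(x)
    if frac < 0.25:
        x = math.floor(x)
    elif frac > 0.75:
        x = math.floor(x) + 1
    else:
        x = math.floor(x) + 0.5
    return min(float(x), max_score)
```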
def perform_request_add_comment_or_remark(recID=0,
uid=-1,
action='DISPLAY',
ln=cdslang,
msg=None,
score=None,
note=None,
priority=None,
reviews=0,
comID=-1,
client_ip_address=None):
"""
Add a comment/review or remark
@param recID: record id
@param uid: user id
@param action: 'DISPLAY' to display add form
'SUBMIT' to submit comment once form is filled
'REPLY' to reply to an existing comment
@param ln: language
@param msg: the body of the comment/review or remark
@param score: star score of the review
@param note: title of the review
@param priority: priority of remark (int)
@param reviews: boolean, if enabled will add a review, if disabled will add a comment
@param comID: if replying, the id of the comment being replied to
@return html add form if action is display or reply
html successful added form if action is submit
"""
warnings = []
errors = []
actions = ['DISPLAY', 'REPLY', 'SUBMIT']
_ = gettext_set_language(ln)
## check arguments
check_recID_is_in_range(recID, warnings, ln)
if uid <= 0:
errors.append(('ERR_WEBCOMMENT_UID_INVALID', uid))
return ('', errors, warnings)
user_contact_info = query_get_user_contact_info(uid)
nickname = ''
if user_contact_info:
if user_contact_info[0]:
nickname = user_contact_info[0]
# show the form
if action == 'DISPLAY':
if reviews and CFG_WEBCOMMENT_ALLOW_REVIEWS:
return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings)
elif not reviews and CFG_WEBCOMMENT_ALLOW_COMMENTS:
return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings)
else:
errors.append(('ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED',))
elif action == 'REPLY':
if reviews and CFG_WEBCOMMENT_ALLOW_REVIEWS:
errors.append(('ERR_WEBCOMMENT_REPLY_REVIEW',))
return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings)
elif not reviews and CFG_WEBCOMMENT_ALLOW_COMMENTS:
if comID > 0:
comment = query_get_comment(comID)
if comment:
user_info = get_user_info(comment[2])
if user_info:
date_creation = convert_datetext_to_dategui(str(comment[4]))
msg = _("%(x_name)s wrote on %(x_date)s:")% {'x_name': user_info[2], 'x_date': date_creation}
msg += "\n\n" + comment[3]
msg = email_quote_txt(text=msg)
return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings)
else:
errors.append(('ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED',))
# check before submitting form
elif action == 'SUBMIT':
if reviews and CFG_WEBCOMMENT_ALLOW_REVIEWS:
if note.strip() in ["", "None"] and not CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS:
warnings.append(('WRN_WEBCOMMENT_ADD_NO_TITLE',))
if score == 0 or score > 5:
warnings.append(("WRN_WEBCOMMENT_ADD_NO_SCORE",))
if msg.strip() in ["", "None"] and not CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS:
warnings.append(('WRN_WEBCOMMENT_ADD_NO_BODY',))
# if no warnings, submit
if len(warnings) == 0:
if reviews:
if check_user_can_review(recID, client_ip_address, uid):
success = query_add_comment_or_remark(reviews, recID=recID, uid=uid, msg=msg,
note=note, score=score, priority=0,
client_ip_address=client_ip_address)
else:
warnings.append(('WRN_WEBCOMMENT_CANNOT_REVIEW_TWICE',))
success = 1
else:
if check_user_can_comment(recID, client_ip_address, uid):
success = query_add_comment_or_remark(reviews, recID=recID, uid=uid, msg=msg,
note=note, score=score, priority=0,
client_ip_address=client_ip_address)
else:
warnings.append(('WRN_WEBCOMMENT_TIMELIMIT',))
success = 1
if success > 0:
if CFG_WEBCOMMENT_ADMIN_NOTIFICATION_LEVEL > 0:
notify_admin_of_new_comment(comID=success)
return (webcomment_templates.tmpl_add_comment_successful(recID, ln, reviews, warnings), errors, warnings)
else:
errors.append(('ERR_WEBCOMMENT_DB_INSERT_ERROR',))
# if there are warnings or inserting the comment failed, redisplay the form with the warnings
if reviews and CFG_WEBCOMMENT_ALLOW_REVIEWS:
return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings)
else:
return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings)
# unknown action send to display
else:
warnings.append(('WRN_WEBCOMMENT_ADD_UNKNOWN_ACTION',))
if reviews and CFG_WEBCOMMENT_ALLOW_REVIEWS:
return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, ln, msg, score, note, warnings), errors, warnings)
else:
return (webcomment_templates.tmpl_add_comment_form(recID, uid, ln, msg, warnings), errors, warnings)
return ('', errors, warnings)
def notify_admin_of_new_comment(comID):
"""
Sends an email to the admin with details regarding comment with ID = comID
"""
comment = query_get_comment(comID)
if len(comment) > 0:
(comID2,
id_bibrec,
id_user,
body,
date_creation,
star_score, nb_votes_yes, nb_votes_total,
title,
nb_abuse_reports) = comment
else:
return
user_info = query_get_user_contact_info(id_user)
if len(user_info) > 0:
(nickname, email, last_login) = user_info
if not nickname:
nickname = email.split('@')[0]
else:
nickname = email = last_login = "ERROR: Could not retrieve"
from invenio.search_engine import print_record
record = print_record(recID=id_bibrec, format='hs')
review_stuff = '''
Star score = %s
Title = %s''' % (star_score, title)
out = '''
The following %(comment_or_review)s has just been posted (%(date)s).
AUTHOR:
Nickname = %(nickname)s
Email = %(email)s
User ID = %(uid)s
RECORD CONCERNED:
Record ID = %(recID)s
Record =
%(record_details)s
%(comment_or_review_caps)s:
%(comment_or_review)s ID = %(comID)s %(review_stuff)s
Body =
%(body)s
ADMIN OPTIONS:
To delete comment go to %(weburl)s/admin/webcomment/webcommentadmin.py/delete?comid=%(comID)s
''' % \
{ 'comment_or_review' : star_score>0 and 'review' or 'comment',
'comment_or_review_caps': star_score>0 and 'REVIEW' or 'COMMENT',
'date' : date_creation,
'nickname' : nickname,
'email' : email,
'uid' : id_user,
'recID' : id_bibrec,
'record_details' : record,
'comID' : comID2,
'review_stuff' : star_score>0 and review_stuff or "",
'body' : body.replace(' ','\n'),
'weburl' : weburl
}
- from_addr = '%s WebComment <%s>' % (cdsname, alertengineemail)
+ from_addr = '%s WebComment <%s>' % (cdsname, CFG_WEBALERT_ALERT_ENGINE_EMAIL)
to_addr = adminemail
subject = "A new comment/review has just been posted"
send_email(from_addr, to_addr, subject, out)
def check_recID_is_in_range(recID, warnings=[], ln=cdslang):
"""
Check that recID is >= 0
Append error messages to the warnings list
@param recID: record id
@param warnings: the warnings list of the calling function
@return tuple (boolean, html) where boolean (1=true, 0=false)
and html is the body of the page to display if there was a problem
"""
# Make warnings into a list if needed
if type(warnings) is not list:
warnings = [warnings]
try:
recID = int(recID)
except:
pass
if type(recID) is int:
if recID > 0:
from invenio.search_engine import record_exists
success = record_exists(recID)
if success == 1:
return (1,"")
else:
warnings.append(('ERR_WEBCOMMENT_RECID_INEXISTANT', recID))
return (0, webcomment_templates.tmpl_record_not_found(status='inexistant', recID=recID, ln=ln))
elif recID == 0:
warnings.append(('ERR_WEBCOMMENT_RECID_MISSING',))
return (0, webcomment_templates.tmpl_record_not_found(status='missing', recID=recID, ln=ln))
else:
warnings.append(('ERR_WEBCOMMENT_RECID_INVALID', recID))
return (0, webcomment_templates.tmpl_record_not_found(status='invalid', recID=recID, ln=ln))
else:
warnings.append(('ERR_WEBCOMMENT_RECID_NAN', recID))
return (0, webcomment_templates.tmpl_record_not_found(status='nan', recID=recID, ln=ln))
def check_int_arg_is_in_range(value, name, errors, gte_value, lte_value=None):
"""
Check that variable with name 'name' >= gte_value and optionally <= lte_value
Append error messages to errors list
@param value: variable value
@param name: variable name
@param errors: list of error tuples (error_id, value)
@param gte_value: greater than or equal to value
@param lte_value: less than or equal to value
@return boolean (1=true, 0=false)
"""
# Make errors into a list if needed
if type(errors) is not list:
errors = [errors]
if type(gte_value) is not int:
errors.append(('ERR_WEBCOMMENT_PROGRAMNING_ERROR',))
return 0
if type(value) is not int:
errors.append(('ERR_WEBCOMMENT_ARGUMENT_NAN', value))
return 0
if value < gte_value:
errors.append(('ERR_WEBCOMMENT_ARGUMENT_INVALID', value))
return 0
if lte_value:
if type(lte_value) is not int:
errors.append(('ERR_WEBCOMMENT_PROGRAMNING_ERROR',))
return 0
if value > lte_value:
errors.append(('ERR_WEBCOMMENT_ARGUMENT_INVALID', value))
return 0
return 1
def get_mini_reviews(recid, ln=cdslang):
"""
Returns the web controls to add reviews to a record from the
detailed record pages mini-panel.
@param recid the id of the displayed record
@param ln the user's language
"""
if CFG_WEBCOMMENT_ALLOW_SHORT_REVIEWS:
action = 'SUBMIT'
else:
action = 'DISPLAY'
reviews = query_retrieve_comments_or_remarks(recid, ranking=1)
return webcomment_templates.tmpl_mini_review(recid, ln, action=action,
avg_score=calculate_avg_score(reviews),
nb_comments_total=len(reviews))
diff --git a/modules/webjournal/lib/webjournal.py b/modules/webjournal/lib/webjournal.py
index b35a2f6aa..e5a3e9751 100644
--- a/modules/webjournal/lib/webjournal.py
+++ b/modules/webjournal/lib/webjournal.py
@@ -1,436 +1,436 @@
# -*- coding: utf-8 -*-
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
from urllib2 import urlopen
import smtplib
import sets
import time
from invenio.bibformat_engine import BibFormatObject, \
format_with_format_template
from invenio.errorlib import register_exception
from invenio.webpage import page
-from invenio.config import weburl, etcdir
+from invenio.config import weburl, CFG_ETCDIR
from invenio.urlutils import redirect_to_url
from invenio.webuser import collect_user_info
from invenio.webjournal_config import InvenioWebJournalNoIndexTemplateError, \
InvenioWebJournalNoIssueNumberTagError, \
InvenioWebJournalNoArticleTemplateError, \
InvenioWebJournalNoArticleRuleError, \
InvenioWebJournalNoPopupTemplateError, \
InvenioWebJournalReleaseUpdateError, \
InvenioWebJournalIssueNotFoundDBError, \
InvenioWebJournalJournalIdNotFoundDBError, \
InvenioWebJournalArchiveDateWronglyFormedError
from invenio.webjournal_utils import get_xml_from_config
from invenio.webjournal_utils import get_recid_from_order_CERNBulletin, \
get_article_page_from_cache, \
cache_article_page, \
createhtmlmail, \
put_css_in_file, \
get_monday_of_the_week, \
get_current_issue_time, \
get_all_issue_weeks, \
release_journal_update, \
get_next_journal_issues, \
issue_times_to_week_strings, \
issue_week_strings_to_times, \
release_journal_issue, \
was_alert_sent_for_issue, \
update_DB_for_alert, \
get_current_issue, \
get_current_publication, \
get_list_of_issues_for_publication, \
count_down_to_monday, \
count_week_string_up
from invenio.webjournal_templates import tmpl_webjournal_alert_success_msg, \
tmpl_webjournal_alert_subject_CERNBulletin, \
tmpl_webjournal_alert_plain_text_CERNBulletin, \
tmpl_webjournal_alert_interface, \
tmpl_webjournal_issue_control_interface, \
tmpl_webjournal_issue_control_success_msg, \
tmpl_webjournal_update_an_issue, \
tmpl_webjournal_updated_issue_msg, \
tmpl_webjournal_alert_was_already_sent, \
tmpl_webjournal_admin_interface
def perform_request_index(req, journal_name, issue_number, language, category):
"""
Central logic function for index pages.
Brings together format templates and MARC rules from the config, with
the requested index page, given by the url parameters.
From config:
- page template for index pages -> formatting
- MARC rule list -> Category Navigation
- MARC tag used for issue numbers -> search (later in the format
elements)
Uses BibFormatObject and format_with_format_template to produce the
required HTML.
"""
# init all the values we need from config.xml
config_strings = get_xml_from_config(["index", "rule", "issue_number"],
journal_name)
try:
try:
index_page_template = config_strings["index"][0]
except:
raise InvenioWebJournalNoIndexTemplateError(language, journal_name)
except InvenioWebJournalNoIndexTemplateError, e:
register_exception(req=req)
return e.user_box()
index_page_template_path = 'webjournal/%s' % (index_page_template)
rule_list = config_strings["rule"]
try:
if len(rule_list) == 0:
raise InvenioWebJournalNoArticleRuleError(language, journal_name)
except InvenioWebJournalNoArticleRuleError, e:
register_exception(req=req)
return e.user_box()
try:
try:
issue_number_tag = config_strings["issue_number"][0]
except:
raise InvenioWebJournalNoIssueNumberTagError(language, journal_name)
except InvenioWebJournalNoIssueNumberTagError, e:
register_exception(req=req)
return e.user_box()
# get the current category for index display
current_category_in_list = 0
i = 0
if category != "":
for rule_string in rule_list:
category_from_config = rule_string.split(",")[0]
if category_from_config.lower() == category.lower():
current_category_in_list = i
i+=1
else:
# add the first category to the url string as a default
req.journal_defaults["category"] = rule_list[0].split(",")[0]
# get the important values for the category from the config file
rule_string = rule_list[current_category_in_list].replace(" ", "")
category = rule_string.split(",")[0]
rule = rule_string.split(",")[1]
marc_datafield = rule.split(":")[0]
rule_match = rule.split(":")[1]
marc_tag = marc_datafield[:3]
marc_ind1 = (str(marc_datafield[3]) == "_") and " " or marc_datafield[3]
marc_ind2 = (str(marc_datafield[4]) == "_") and " " or marc_datafield[4]
marc_subfield = marc_datafield[5]
# create a marc record, containing category and issue number
temp_marc = '''<record>
<controlfield tag="001">0</controlfield>
<datafield tag="%s" ind1="%s" ind2="%s">
<subfield code="%s">%s</subfield>
</datafield>
<datafield tag="%s" ind1="%s" ind2="%s">
<subfield code="%s">%s</subfield>
</datafield>
</record>''' % (issue_number_tag[:3],
(issue_number_tag[3] == "_") and " " or issue_number_tag[3],
(issue_number_tag[4] == "_") and " " or issue_number_tag[4],
issue_number_tag[5],
issue_number, marc_tag, marc_ind1,
marc_ind2, marc_subfield, rule_match)
#temp_marc = temp_marc.decode('utf-8').encode('utf-8')
# create a record and get HTML back from bibformat
user_info = collect_user_info(req)
bfo = BibFormatObject(0, ln=language, xml_record=temp_marc, user_info=user_info)
bfo.req = req
html = format_with_format_template(index_page_template_path, bfo)[0]
return html
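The rule strings parsed above follow the "Category,tagXYs:MATCH" convention, where "_" in the MARC datafield stands for a blank indicator. A standalone sketch of that parsing (the function name and sample rule string are made up for illustration):

```python
def parse_category_rule(rule_string):
    # Split a config rule such as "News, 980__a:BULLETINNEWS" into
    # (category, tag, ind1, ind2, subfield, match); "_" maps to a
    # blank MARC indicator, as in perform_request_index above.
    rule_string = rule_string.replace(" ", "")
    category, rule = rule_string.split(",", 1)
    marc_datafield, rule_match = rule.split(":", 1)
    tag = marc_datafield[:3]
    ind1 = (marc_datafield[3] == "_") and " " or marc_datafield[3]
    ind2 = (marc_datafield[4] == "_") and " " or marc_datafield[4]
    subfield = marc_datafield[5]
    return category, tag, ind1, ind2, subfield, rule_match

# parse_category_rule("News, 980__a:BULLETINNEWS")
#   -> ('News', '980', ' ', ' ', 'a', 'BULLETINNEWS')
```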
def perform_request_article(req, journal_name, issue_number, language,
category, number, editor):
"""
Central logic function for article pages.
Loads the format template for article display and displays the requested
article using BibFormat.
'Editor' mode generates edit links on the article view page and disables
caching.
"""
# init all the values we need from config.xml
config_strings = get_xml_from_config(["detailed", "rule"], journal_name)
try:
try:
index_page_template = config_strings["detailed"][0]
except:
raise InvenioWebJournalNoArticleTemplateError(language,
journal_name)
except InvenioWebJournalNoArticleTemplateError, e:
register_exception(req=req)
return e.user_box()
index_page_template_path = 'webjournal/%s' % (index_page_template)
rule_list = config_strings["rule"]
try:
if len(rule_list) == 0:
raise InvenioWebJournalNoArticleRuleError(language, journal_name)
except InvenioWebJournalNoArticleRuleError, e:
register_exception(req=req)
return e.user_box()
# get the current category for index display
current_category_in_list = 0
i = 0
if category != "":
for rule_string in rule_list:
category_from_config = rule_string.split(",")[0]
if category_from_config.lower() == category.lower():
current_category_in_list = i
i+=1
rule_string = rule_list[current_category_in_list].replace(" ", "")
rule = rule_string.split(",")[1]
# try to get the page from the cache
recid = get_recid_from_order_CERNBulletin(number, rule, issue_number)
cached_html = get_article_page_from_cache(journal_name, category, recid,
issue_number, language)
if cached_html and editor == "False":
return cached_html
# create a record and get HTML back from bibformat
user_info = collect_user_info(req)
bfo = BibFormatObject(recid, ln=language, user_info=user_info)
bfo.req = req
html_out = format_with_format_template(index_page_template_path,
bfo)[0]
# cache if not in editor mode
if editor == "False":
cache_article_page(html_out, journal_name, category,
recid, issue_number, language)
return html_out
def perform_request_administrate(journal_name, language):
"""
"""
current_issue = get_current_issue(language, journal_name)
current_publication = get_current_publication(journal_name,
current_issue,
language)
issue_list = get_list_of_issues_for_publication(current_publication)
next_issue_number = count_week_string_up(issue_list[-1])
return tmpl_webjournal_admin_interface(journal_name, current_issue,
current_publication, issue_list,
next_issue_number, language)
def perform_request_alert(req, journal_name, issue_number, language,
sent, plain_text, subject, recipients,
html_mail, force):
"""
All the logic for alert emails.
Messages are retrieved from templates. (should be migrated to msg class)
Mails can be edited by an interface form.
Sent as HTML plus plain text, or plain text only if so configured.
"""
subject = tmpl_webjournal_alert_subject_CERNBulletin(journal_name,
issue_number)
plain_text = tmpl_webjournal_alert_plain_text_CERNBulletin(journal_name,
language,
issue_number)
plain_text = plain_text.encode('utf-8')
if sent == "False":
interface = tmpl_webjournal_alert_interface(language, journal_name,
subject, plain_text)
return page(title="alert system", body=interface)
else:
if was_alert_sent_for_issue(issue_number,
journal_name,
language) != False and force == "False":
return tmpl_webjournal_alert_was_already_sent(language, journal_name,
subject, plain_text,
recipients,
html_mail, issue_number)
if html_mail == "html":
html_file = urlopen('%s/journal/?name=%s&ln=en'
% (weburl, journal_name))
html_string = html_file.read()
html_file.close()
html_string = put_css_in_file(html_string, journal_name)
else:
html_string = plain_text.replace("\n", " ")
message = createhtmlmail(html_string, plain_text,
subject, recipients)
server = smtplib.SMTP("localhost", 25)
server.sendmail('Bulletin-Support@cern.ch', recipients, message)
# todo: has to go to some messages config
update_DB_for_alert(issue_number, journal_name, language)
return tmpl_webjournal_alert_success_msg(language, journal_name)
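createhtmlmail above assembles the HTML/plain-text alternative message by hand; with the newer standard library email package the same structure can be sketched like this (function name, addresses, and subject are placeholders, not the module's API):

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_alert_mail(html_string, plain_text, subject, recipients):
    # multipart/alternative: capable mail clients render the HTML part,
    # all others fall back to the plain-text part.
    msg = MIMEMultipart('alternative')
    msg['Subject'] = subject
    msg['From'] = 'Bulletin-Support@cern.ch'
    msg['To'] = recipients
    msg.attach(MIMEText(plain_text, 'plain', 'utf-8'))
    msg.attach(MIMEText(html_string, 'html', 'utf-8'))
    return msg.as_string()
```

The resulting string can then be handed to smtplib.SMTP.sendmail exactly as in the code above.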
def perform_request_issue_control(req, journal_name, issue_numbers,
language, add, action):
"""
Central logic for issue control.
Regenerates the flat files current_issue and issue_group that control
which issue is currently active for the journal.
Todo: move issue control to DB
"""
if action == "cfg" or action == "Refresh" or action == "Add_One":
# find out if we are in update or release
try:
current_issue_time = get_current_issue_time(journal_name)
all_issue_weeks = get_all_issue_weeks(current_issue_time,
journal_name,
language)
except InvenioWebJournalIssueNotFoundDBError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalJournalIdNotFoundDBError, e:
register_exception(req=req)
return e.user_box()
if max(all_issue_weeks) > current_issue_time:
# propose an update
next_issue_week = None
all_issue_weeks.sort()
for issue_week in all_issue_weeks:
if issue_week > current_issue_time:
next_issue_week = issue_week
break
output = tmpl_webjournal_update_an_issue(language,
journal_name,
issue_times_to_week_strings([next_issue_week,])[0],
issue_times_to_week_strings([current_issue_time,])[0])
else:
# propose a release
next_issues = get_next_journal_issues(current_issue_time,
journal_name)
next_issues = issue_times_to_week_strings(next_issues,
language)
if action == "Refresh":
next_issues += issue_numbers
next_issues = list(sets.Set(next_issues))# avoid double entries
elif action == "Add_One":
next_issues += issue_numbers
next_issues = list(sets.Set(next_issues))# avoid double entries
next_issues_times = issue_week_strings_to_times(next_issues,
language)
highest_issue_so_far = max(next_issues_times)
one_more_issue = get_next_journal_issues(highest_issue_so_far,
journal_name,
language,
1)
one_more_issue = issue_times_to_week_strings(one_more_issue,
language)
next_issues += one_more_issue
next_issues = list(sets.Set(next_issues)) # avoid double entries
next_issues.sort()
else:
# get the next (default 2) issue numbers to publish
next_issues = get_next_journal_issues(current_issue_time,
journal_name,
language)
next_issues = issue_times_to_week_strings(next_issues,
language)
output = tmpl_webjournal_issue_control_interface(language,
journal_name,
next_issues)
elif action == "Publish":
publish_issues = issue_numbers
publish_issues = list(sets.Set(publish_issues)) # avoid double entries
publish_issues.sort()
try:
release_journal_issue(publish_issues, journal_name, language)
except InvenioWebJournalJournalIdNotFoundDBError, e:
register_exception(req=req)
return e.user_box()
output = tmpl_webjournal_issue_control_success_msg(language,
publish_issues, journal_name)
elif action == "Update":
try:
try:
update_issue = issue_numbers[0]
except:
raise InvenioWebJournalReleaseUpdateError(language, journal_name)
except InvenioWebJournalReleaseUpdateError, e:
register_exception(req=req)
return e.user_box()
try:
release_journal_update(update_issue, journal_name, language)
except InvenioWebJournalJournalIdNotFoundDBError, e:
register_exception(req=req)
return e.user_box()
output = tmpl_webjournal_updated_issue_msg(language, update_issue,
journal_name)
return page(title="Publish System", body=output)
def perform_request_popup(req, language, journal_name, type, record):
"""
"""
config_strings = get_xml_from_config(["popup"], journal_name)
try:
try:
popup_page_template = config_strings["popup"][0]
except:
raise InvenioWebJournalNoPopupTemplateError(language)
except InvenioWebJournalNoPopupTemplateError, e:
register_exception(req=req)
return e.user_box()
popup_page_template_path = 'webjournal/%s' % popup_page_template
user_info = collect_user_info(req)
bfo = BibFormatObject(record, ln=language, user_info=user_info)
bfo.req = req
html = format_with_format_template(popup_page_template_path, bfo)[0]
return html
def perform_request_search(journal_name, language, req, issue,
archive_year, archive_issue, archive_select,
archive_date, archive_search):
"""
Logic for the search / archive page.
"""
config_strings = get_xml_from_config(["search", "issue_number", "rule"],
journal_name)
try:
try:
search_page_template = config_strings["search"][0]
except:
raise InvenioWebJournalNoSearchTemplateError(journal_name, language)
except InvenioWebJournalNoSearchTemplateError, e:
register_exception(req=req)
return e.user_box()
search_page_template_path = 'webjournal/%s' % (search_page_template)
# just an empty buffer record, since all values are in req.journal_defaults
if archive_select == "False" and archive_search == "False":
temp_marc = '''<record><controlfield tag="001">0</controlfield></record>'''
user_info = collect_user_info(req)
bfo = BibFormatObject(0, ln=language, xml_record=temp_marc, user_info=user_info)
bfo.req = req
html = format_with_format_template(search_page_template_path, bfo)[0]
return html
elif archive_select == "Go":
redirect_to_url(req, "%s/journal/?name=%s&issue=%s&ln=%s" % (weburl,
journal_name,
archive_issue,
language))
elif archive_search == "Go":
archive_issue_time = time.strptime(archive_date, "%d/%m/%Y")
archive_issue_time = count_down_to_monday(archive_issue_time)
archive_issue = issue_times_to_week_strings([archive_issue_time,])[0]
redirect_to_url(req, "%s/journal/?name=%s&issue=%s&ln=%s" % (weburl,
journal_name,
archive_issue,
language))
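count_down_to_monday above rounds the submitted archive date to the Monday of its week before converting it to an issue-week string; a standalone sketch with datetime (hypothetical helper name, same dd/mm/YYYY format as the form):

```python
import datetime

def monday_of_week(archive_date):
    # Parse "dd/mm/YYYY" and step back to that week's Monday
    # (date.weekday(): Monday == 0 ... Sunday == 6).
    day = datetime.datetime.strptime(archive_date, "%d/%m/%Y").date()
    return day - datetime.timedelta(days=day.weekday())

# monday_of_week("02/12/2007") -> datetime.date(2007, 11, 26)
```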
diff --git a/modules/webjournal/lib/webjournal_config.py b/modules/webjournal/lib/webjournal_config.py
index cd701e3b7..f21b07925 100644
--- a/modules/webjournal/lib/webjournal_config.py
+++ b/modules/webjournal/lib/webjournal_config.py
@@ -1,680 +1,680 @@
# -*- coding: utf-8 -*-
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
import os
-from invenio.config import adminemail, supportemail, etcdir, weburl, cdslang
+from invenio.config import adminemail, supportemail, CFG_ETCDIR, weburl, cdslang
from invenio.messages import gettext_set_language
from invenio.webpage import page
from invenio.htmlutils import escape_html
from invenio.messages import gettext_set_language
from invenio.webjournal_utils import parse_url_string
from invenio.webjournal_templates import tmpl_webjournal_error_box,\
tmpl_webjournal_missing_info_box
#from invenio.data_cacher import SQLDataCacher
#
#CFG_JOURNAL_CONFIG_CACHE = {}
#
#def initialize_config_cache():
# """
# """
# journal_id_names = SQLDataCacher("SELECT * FROM jrnJOURNAL", affected_tables=(jrnJOURNAL))
class InvenioWebJournalNoIndexTemplateError(Exception):
"""Exception if no index template is specified in the config."""
def __init__(self, language, journal_name):
"""Initialisation."""
self.journal = journal_name
self.language = language
def __str__(self):
"""String representation."""
return 'Admin did not provide a template for the index page of \
journal: %s. \
The path to such a file should be given in the config.xml of \
this journal under the tag ...\
' % repr(self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Internal configuration error'),
_("There is no format configured for this journal's index page"),
'Admin did not provide a template for the index page of journal: %s. \
The path to such a file should be given in the config.xml of \
this journal under the tag ...\
' % escape_html(self.journal))
class InvenioWebJournalNoArticleTemplateError(Exception):
"""
Exception if no article view template is specified in the config.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.journal = journal_name
self.language = language
def __str__(self):
"""
String representation.
"""
return 'Admin did not provide a template for the article view page of journal: %s. \
The path to such a file should be given in the config.xml of this journal \
under the tag ...' % repr(self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Internal configuration error'),
_("There is no format configured for this journal's article page"),
'Admin did not provide a template for the article view page of journal: %s. \
The path to such a file should be given in the config.xml of \
this journal under the tag ...\
' % escape_html(self.journal))
class InvenioWebJournalNoSearchTemplateError(Exception):
"""
Exception if no search page template is specified in the config.
"""
def __init__(self, journal_name, language=cdslang):
"""
Initialisation.
"""
self.journal = journal_name
self.language = language
def __str__(self):
"""
String representation.
"""
return 'Admin did not provide a template for the search page view page of journal: %s. \
The path to such a file should be given in the config.xml of this journal \
under the tag ...' % repr(self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Internal configuration error'),
_("There is no format configured for this journal's search page"),
'Admin did not provide a template for the search page of journal: %s. \
The path to such a file should be given in the config.xml of \
this journal under the tag ...\
' % escape_html(self.journal))
class InvenioWebJournalNoPopupTemplateError(Exception):
"""
Exception if no popup template is specified in the config.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.journal = journal_name
self.language = language
def __str__(self):
"""
String representation.
"""
return 'Admin did not provide a template for the popup view page \
of journal: %s. \
The path to such a file should be given in the config.xml of this \
journal under the tag \
...' % repr(
self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Internal configuration error'),
_("There is no format configured for this journal's popup page"),
'Admin did not provide a template for the popup page of journal: %s. \
The path to such a file should be given in the config.xml of \
this journal under the tag ...\
' % escape_html(self.journal))
class InvenioWebJournalNoArticleRuleError(Exception):
"""
Exception if there are no article type rules defined.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.journal = journal_name
self.language = language
def __str__(self):
"""
String representation.
"""
return 'The config.xml file for journal: %s does not contain any \
article rules. These rules are needed to associate collections from \
your Invenio installation to navigable article types. A rule should \
have the form of NameOfArticleType, \
marc_tag:ExpectedContentOfMarcTag' % escape_html(self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_("No journal articles"),
_("Problem with the configuration of this journal"),
"The system couldn't find the definitions for different article \
kinds (e.g. News, Sports, etc). If there is nothing defined, \
nothing can be shown and it thus indicates that there is either a \
problem with the setup of this journal or in the software itself. \
There is nothing you can do at this moment. If you wish you can \
send an inquiry to the responsible developers. We apologize \
for the inconvenience.")
class InvenioWebJournalNoIssueNumberTagError(Exception):
"""
Exception if there is no marc tag for issue number defined.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.journal = journal_name
self.language = language
def __str__(self):
"""
String representation.
"""
return 'The config.xml file for journal: %s does not contain a marc tag \
to deduce the issue number from. WebJournal is an issue number based \
system, meaning you have to give some form of numbering system in a \
dedicated marc tag, so the system can see which is the active journal \
publication for a given date.' % repr(self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_("No journal issues"),
_("Problem with the configuration of this journal"),
"The system couldn't find a definition for an issue \
numbering system. Issue numbers control the date of the \
publication you are seeing. This indicates that there is an \
error in the setup of this journal or the Software itself. \
There is nothing you can do at the moment. If you wish you \
can send an inquiry to the responsible developers. We \
apologize for the inconvenience.")
class InvenioWebJournalNoArticleNumberError(Exception):
"""
Exception if an article was called without its order number.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.journal = journal_name
self.language = language
def __str__(self):
"""
String representation.
"""
return 'In Journal %s an article was called without specifying the order \
of this article in the issue. This parameter is mandatory and should be \
provided by internal links in any case. Maybe this was a bad direct url \
hack. Check where the request came from.' % repr(self.journal)
def user_box(self):
"""
user-friendly error message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Journal article error'),
_('We could not know which article you were looking for'),
'The url you passed did not provide an article number or the \
article number was badly formed. If you \
came to this page through some link on the journal page, please \
report this to the admin. If you got this link through some \
external resource, e.g. an email, you can try to put in a number \
for the article in the url by hand or just visit the front \
page at %s/journal/?name=%s' % (weburl, self.journal))
class InvenioWebJournalNoJournalOnServerError(Exception):
"""
Exception that is thrown if there are no Journal instances on the server
"""
def __init__(self, language):
"""
Initialisation.
"""
self.language = language
def __str__(self):
"""
String representation.
"""
return 'Apparently there are no journals configured on this \
installation of CDS Invenio. You can try to use the sample Invenio \
Atlantis Journal for testing.'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('No journals available'),
_('We could not provide you any journals'),
_('It seems that there are no journals defined on this server. '
'Please contact support if this is not right.'))
class InvenioWebJournalNoNameError(Exception):
"""
"""
def __init__(self, language):
"""
Initialisation.
"""
self.language = language
def __str__(self):
"""
String representation.
"""
return 'User probably forgot to add the name parameter for the journal. \
Maybe you also want to check whether the DNS mappings are configured correctly.'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return webjournal_missing_info_box(self.language,
_("Select a journal on this server"),
_("We couldn't guess which journal you are looking for"),
_("You did not provide an argument for a journal name. "
"Please select the journal you want to read in the list below."))
class InvenioWebJournalNoCurrentIssueError(Exception):
"""
"""
def __init__(self, language):
"""
Initialisation.
"""
self.language = language
def __str__(self):
"""
String representation.
"""
return 'There seems to be no current issue number stored for this \
journal. Is this the first time you use the journal? Otherwise, check the \
configuration.'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('No current issue'),
_('We could not find any information on the current issue'),
_('The configuration for the current issue seems to be empty. '
'Try providing an issue number or check with support.'))
class InvenioWebJournalIssueNumberBadlyFormedError(Exception):
"""
"""
def __init__(self, language, issue):
"""
Initialisation.
"""
self.language = language
self.issue = issue
def __str__(self):
"""
String representation.
"""
return 'The issue number was badly formed. If this comes from the \
user it is no problem.'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Issue number badly formed'),
_('We could not read the issue number you provided'),
'The issue number you provided in the url seems to be badly \
formed. Issue numbers have to be in the form of ww/YYYY, so \
e.g. 50/2007. You provided the issue number like so: \
%s.' % escape_html(self.issue))
class InvenioWebJournalArchiveDateWronglyFormedError (Exception):
"""
"""
def __init__(self, language, date):
"""
Initialisation.
"""
self.language = language
self.date = date
def __str__(self):
"""
String representation.
"""
return 'The archive date was badly formed. If this comes from the \
user it is no problem.'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Archive date badly formed'),
_('We could not read the archive date you provided'),
'The archive date you provided in the form seems to be badly \
formed. Archive dates have to be in the form of dd/mm/YYYY, so \
e.g. 02/12/2007. You provided the archive date like so: \
%s.' % escape_html(self.date))
class InvenioWebJournalNoPopupTypeError(Exception):
"""
Exception that is thrown if a popup is requested without specifying the
type of the popup to call.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.language = language
self.journal_name = journal_name
def __str__(self):
"""
String representation.
"""
return 'There was no popup type provided for a popup window on \
journal %s.' % repr(self.journal_name)
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('No popup type'),
_('We could not know what kind of popup you requested'),
'You called a popup window on CDS Invenio without \
specifying the type of the popup. Does this link come \
from a CDS Invenio Journal? If so, please contact \
support.')
class InvenioWebJournalNoPopupRecordError(Exception):
"""
Exception that is thrown if a popup is requested without specifying
the record it refers to.
"""
def __init__(self, language, journal_name, recid):
"""
Initialisation.
"""
self.language = language
self.journal_name = journal_name
self.recid = recid
def __str__(self):
"""
String representation.
"""
return 'There was no recid provided to the popup system of webjournal \
or the recid was badly formed. The recid was %s' % repr(self.recid)
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('No popup record'),
_('We could not deduce the popup article you requested'),
'You called a popup window on CDS Invenio without \
specifying a record in which you are interested or the \
record was badly formed. Does this link come \
from a CDS Invenio Journal? If so, please contact \
support.')
class InvenioWebJournalReleaseUpdateError(Exception):
"""
Exception that is thrown if an update release was not successful.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.language = language
self.journal_name = journal_name
def __str__(self):
"""
String representation.
"""
return 'There were no updates submitted on a click on the update button. \
This should never happen and must be due to an internal error.'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Update error'),
_('There was an internal error'),
'We encountered an internal error trying to update the \
journal issue. You can try to launch the update again or \
contact the Administrator. We apologize for the \
inconvenience.')
class InvenioWebJournalReleaseDBError(Exception):
"""
Exception that is thrown if an update release was not successful.
"""
def __init__(self, language):
"""
Initialisation.
"""
self.language = language
def __str__(self):
"""
String representation.
"""
return 'There was an error in synchronizing DB times with the actual \
python time objects. Debug the code in: \
webjournal_utils.issue_times_to_week_strings'
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Journal publishing DB error'),
_('There was an internal error'),
'We encountered an internal error trying to publish the \
journal issue. You can try to launch the publish interface \
again or contact the Administrator. We apologize for the \
inconvenience.')
class InvenioWebJournalIssueNotFoundDBError(Exception):
"""
Exception that is thrown if an issue number was not found in the database.
"""
def __init__(self, language, journal_name, issue_number):
"""
Initialisation.
"""
self.language = language
self.journal_name = journal_name
self.issue_number = issue_number
def __str__(self):
"""
String representation.
"""
return 'The issue %s could not be found in the DB for journal %s.' % (self.issue_number,
self.journal_name)
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Journal issue error'),
_('We could not find a current issue in the Database'),
'We encountered an internal error trying to get an issue \
number. You can try to refresh the page or \
contact the Administrator. We apologize for the \
inconvenience.')
class InvenioWebJournalJournalIdNotFoundDBError(Exception):
"""
Exception that is thrown if the id of a journal could not be found in the database.
"""
def __init__(self, language, journal_name):
"""
Initialisation.
"""
self.language = language
self.journal_name = journal_name
def __str__(self):
"""
String representation.
"""
return 'The id for journal %s was not found in the Database. Make \
sure the entry exists!' % (self.journal_name)
def user_box(self):
"""
user-friendly message with formatting.
"""
_ = gettext_set_language(self.language)
return tmpl_webjournal_error_box(self.language,
_('Journal ID error'),
_('We could not find the id for this journal in the Database'),
'We encountered an internal error trying to get the id \
for this journal. You can try to refresh the page or \
contact the Administrator. We apologize for the \
inconvenience.')
#!!! deprecated !!!#
def webjournal_missing_info_box(language, title, msg_title, msg):
"""
returns a box indicating that the given journal was not found on the
server, leaving the opportunity to select an existing journal from a list.
"""
#params = parse_url_string(req)
#try:
# language = params["ln"]
#except:
# language = cdslang
_ = gettext_set_language(language)
title = _(title)
box_title = _(msg_title)
box_text = _(msg)
box_list_title = _("Available journals")
find_journals = lambda path: [entry for entry in os.listdir(str(path)) if os.path.isdir(str(path)+str(entry))]
try:
- all_journals = find_journals('%s/webjournal/' % etcdir)
+ all_journals = find_journals('%s/webjournal/' % CFG_ETCDIR)
except:
all_journals = []
box = '''
''' % (weburl, title_msg, msg, supportemail)
return page(title=title, body=box)
diff --git a/modules/webjournal/lib/webjournal_templates.py b/modules/webjournal/lib/webjournal_templates.py
index 990225b5f..a2bb1bc26 100644
--- a/modules/webjournal/lib/webjournal_templates.py
+++ b/modules/webjournal/lib/webjournal_templates.py
@@ -1,447 +1,447 @@
# -*- coding: utf-8 -*-
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
import os
import time
-from invenio.config import adminemail, supportemail, etcdir, weburl, cdslang
+from invenio.config import adminemail, supportemail, CFG_ETCDIR, weburl, cdslang
from invenio.messages import gettext_set_language
from invenio.webpage import page
from invenio.webjournal_utils import get_number_of_articles_for_issue, \
get_release_time, \
get_announcement_time, \
get_current_publication
def tmpl_webjournal_missing_info_box(language, title, msg_title, msg):
"""
returns a box indicating that the given journal was not found on the
server, leaving the opportunity to select an existing journal from a list.
"""
_ = gettext_set_language(language)
box_title = msg_title
box_text = msg
box_list_title = _("Available Journals")
# todo: move to DB call
find_journals = lambda path: [entry for entry in os.listdir(str(path)) if os.path.isdir(str(path)+str(entry))]
try:
- all_journals = find_journals('%s/webjournal/' % etcdir)
+ all_journals = find_journals('%s/webjournal/' % CFG_ETCDIR)
except:
all_journals = []
box = '''
''' % (weburl, title_msg, msg, mail_msg)
return page(title=title, body=box)
def tmpl_webjournal_regenerate_success(language, journal_name, issue_number):
"""
Success message if a user applied the "regenerate" link. Links back to
the regenerated journal.
"""
_ = gettext_set_language(language)
return page(
title=_("Issue regenerated"),
body = '''
The issue number %s for the journal %s has been successfully
regenerated.
Look at your changes: >> %s
''' % (issue_number, journal_name, weburl, journal_name, journal_name))
def tmpl_webjournal_regenerate_error(language, journal_name, issue_number):
"""
Failure message for a regeneration try.
"""
_ = gettext_set_language(language)
return page(
title=_("Regeneration Error"),
body = _("The issue could not be correctly regenerated. "
"Please contact your administrator."))
def tmpl_webjournal_feature_record_interface(language, journal_name):
"""
Draws an interface form to feature a specific record from CDS Invenio.
"""
_ = gettext_set_language(language)
interface = '''
''' % (weburl, journal_name)
return page(title=_("Feature a record"), body=interface)
def tmpl_webjournal_feature_record_success(language, journal_name, recid):
"""
Draw a success message for featuring a record and a backlink to the journal
"""
_ = gettext_set_language(language)
title = "Successfully featured record: %s" % recid
msg = '''Return to your journal here >>
%s''' % (weburl,
journal_name,
journal_name)
return page(title = title, body = msg)
def tmpl_webjournal_alert_plain_text_CERNBulletin(journal_name, language, issue):
"""
Plain-text message for the CERN Bulletin alert. Not multilingual, since the
message is always sent in both languages.
"""
current_publication = get_current_publication(journal_name, issue)
plain_text = u'''Dear Subscriber,
The latest issue of the CERN Bulletin, no. %s, has been released.
You can access it at the following URL:
http://bulletin.cern.ch/
Best Wishes,
CERN Bulletin team
----
Cher Abonné,
Le nouveau numéro du CERN Bulletin, no. %s, vient de paraître.
Vous pouvez y accéder à cette adresse :
http://bulletin.cern.ch/fre
Bonne lecture,
L'équipe du Bulletin du CERN
''' % (current_publication, current_publication)
return plain_text
def tmpl_webjournal_alert_subject_CERNBulletin(journal_name, issue):
"""
Subject text for the CERN Bulletin release.
"""
return "CERN bulletin %s released" % get_current_publication(journal_name,
issue)
def tmpl_webjournal_alert_interface(language, journal_name, subject,
plain_text):
"""
Alert eMail interface.
"""
_ = gettext_set_language(language)
interface = '''
''' % (weburl, journal_name, subject, plain_text)
return interface
def tmpl_webjournal_alert_was_already_sent(language, journal_name,
subject, plain_text, recipients,
html_mail, issue):
"""
"""
_ = gettext_set_language(language)
interface = '''
''' % (weburl, journal_name, recipients,
subject, plain_text, html_mail, issue, weburl, journal_name,
issue)
return page(title="Confirmation Required", body=interface)
def tmpl_webjournal_alert_success_msg(language, journal_name):
"""
Success message for the alert system.
"""
_ = gettext_set_language(language)
title = _("Alert sent successfully!")
body = 'Return to your journal here: >> \
%s ' % (weburl, journal_name,
journal_name)
return page(title=title, body=body)
def tmpl_webjournal_issue_control_interface(language, journal_name,
active_issues):
"""
"""
_ = gettext_set_language(language)
interface = '''
Publishing Interface
This interface gives you the possibility to create
your current webjournal publication. Every checked
issue number will be in the current publication. Once
you've made your selection you can publish the new
issue by clicking the Publish button at the end.
''' % (weburl,
journal_name,
"".join(['
%s
'
% (issue, issue) for issue in active_issues]),
)
return interface
def tmpl_webjournal_issue_control_success_msg(language,
active_issues, journal_name):
"""
"""
_ = gettext_set_language(language)
issue_string = "".join([" - %s" % issue for issue in active_issues])
title = '
' % (weburl, journal_name,
journal_name, weburl,
journal_name, weburl, journal_name)
return title + body
def tmpl_webjournal_update_an_issue(language, journal_name, next_issue,
current_issue):
"""
A form that lets a user make an update to an issue number.
"""
_ = gettext_set_language(language)
current_articles = get_number_of_articles_for_issue(current_issue,
journal_name,
language)
next_articles = get_number_of_articles_for_issue(next_issue,
journal_name,
language)
html = '''
The Issue that was released on week %s has pending updates scheduled. The
next update for this issue is %s.
Note: If you want to make a new release, please click through all the
pending updates first.
Do you want to release the update from issue
%s (%s) to issue %s (%s)
now?
''' % (current_issue, next_issue,
current_issue,
",".join(["%s : %s" % (item[0], item[1]) for item in current_articles.iteritems()]),
next_issue,
",".join(["%s : %s" % (item[0], item[1]) for item in next_articles.iteritems()]),
weburl, journal_name, next_issue)
return html
def tmpl_webjournal_updated_issue_msg(language, update_issue, journal_name):
"""
Prints a success message for the Update release of a journal.
"""
_ = gettext_set_language(language)
title = '
''' % ((issue==current_issue) and "background:#00FF00;" or "background:#F1F1F1;",
issue, (issue==next_issue_number) and "?" or current_publication,
"\n".join(['
''' % ("\n".join([issue_box for issue_box in issue_boxes]),
weburl, weburl, journal_name, weburl, journal_name)
return page(title=title, body=body)
diff --git a/modules/webjournal/lib/webjournal_utils.py b/modules/webjournal/lib/webjournal_utils.py
index 405448bb3..ea27eca06 100644
--- a/modules/webjournal/lib/webjournal_utils.py
+++ b/modules/webjournal/lib/webjournal_utils.py
@@ -1,1237 +1,1237 @@
# -*- coding: utf-8 -*-
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Various utilities for WebJournal, e.g. config parser, etc.
"""
from invenio.bibformat_engine import BibFormatObject
from invenio.errorlib import register_exception
from invenio.search_engine import search_pattern
-from invenio.config import etcdir, weburl, adminemail, cachedir, cdslang
+from invenio.config import CFG_ETCDIR, weburl, adminemail, CFG_CACHEDIR, cdslang
from invenio.messages import gettext_set_language
from invenio.webpage import page
from invenio.dbquery import run_sql
from xml.dom import minidom
from urllib2 import urlopen
import time
import datetime
import re
import os
import cPickle
############################ MAPPING FUNCTIONS ################################
def get_order_dict_from_recid_list(list, issue_number):
"""
this is a centralized function that takes a list of recids and brings them
into order using a centralized algorithm. this always has to be kept in sync
with the reverse function get_recid_from_order(order)
parameters:
list: a list of all recid's that should be brought into order
issue_number: the issue_number for which we are deriving the order
(this has to be one number)
returns:
ordered_records: a dictionary with the recids ordered by keys
"""
ordered_records = {}
for record in list:
temp_rec = BibFormatObject(record)
issue_numbers = temp_rec.fields('773__n')
order_number = temp_rec.fields('773__c')
# todo: the marc fields have to be set'able by some sort of config interface
n = 0
for temp_issue in issue_numbers:
if temp_issue == issue_number:
try:
order_number = int(order_number[n])
except:
# todo: Warning, record does not support numbering scheme
order_number = -1
n+=1
if order_number != -1:
try:
ordered_records[order_number] = record
except:
pass
# todo: Error, there are two records with the same order_number in the issue
else:
ordered_records[max(ordered_records.keys()) + 1] = record
return ordered_records
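The ordering algorithm above can be sketched as a standalone function, with the BibFormatObject lookups replaced by plain tuples (a hypothetical stand-in for the 773__n / 773__c MARC fields): a record whose order number cannot be parsed falls back to -1 and is appended after the current maximum key.

```python
def order_records(records, issue_number):
    """records: list of (recid, issue_numbers, order_numbers) tuples,
    mirroring the 773__n and 773__c fields read in the real function."""
    ordered = {}
    for recid, issues, orders in records:
        order = -1
        for n, issue in enumerate(issues):
            if issue == issue_number:
                try:
                    order = int(orders[n])
                except (ValueError, IndexError):
                    # record does not support the numbering scheme
                    order = -1
        if order != -1:
            ordered[order] = recid
        else:
            # append after the highest order seen so far
            ordered[max(ordered.keys()) + 1] = recid
    return ordered
```

Note that, like the original, this assumes at least one well-numbered record precedes any unnumbered one; otherwise `max()` is called on an empty dict.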
def get_records_in_same_issue_in_order(recid):
"""
"""
raise ("Not implemented yet.")
def get_recid_from_order(order, rule, issue_number):
"""
takes the order of a record in the journal as passed in the url arguments
and derives the recid using the current issue number and the rule defining
this kind of record.
parameters:
order: the order at which the record appears in the journal as passed
in the url
rule: the defining rule of the journal record category
issue_number: the issue number for which we are searching
returns:
recid: the recid of the ordered record
"""
# get the id list
all_records = list(search_pattern(p="%s and 773__n:%s" %
(rule, issue_number),
f="&action_search=Search"))
ordered_records = {}
for record in all_records:
temp_rec = BibFormatObject(record)
issue_numbers = temp_rec.fields('773__n')
order_number = temp_rec.fields('773__c')
# todo: fields for issue number and order number have to become generic
n = 0
for temp_issue in issue_numbers:
if temp_issue == issue_number:
try:
order_number = int(order_number[n])
except:
# todo: Warning, record does not support numbering scheme
order_number = -1
n+=1
if order_number != -1:
try:
ordered_records[order_number] = record
except:
pass
# todo: Error, there are two records with the same order_number in the issue
else:
ordered_records[max(ordered_records.keys()) + 1] = record
try:
recid = ordered_records[int(order)]
except:
pass
# todo: ERROR, numbering scheme inconsistency
return recid
# todo: move to a template
def please_login(req, journal_name, ln="en", title="", message="", backlink=""):
"""
"""
_ = gettext_set_language(ln)
if title == "":
title_out = _("Please login to perform this action.")
else:
title_out = title
if message == "":
message_out = _("In order to publish webjournal issues you must be logged \
in and be authorized for this kind of task. If you have a \
login, use the link \
below to login.")
else:
message_out = message
if backlink == "":
backlink_out = "%s/journal/issue_control?name=%s" % (weburl, journal_name)
else:
backlink_out = backlink
title_msg = _("We need you to login")
body_out = '''
''' % (weburl,
title_msg,
message_out,
weburl,
backlink_out,
adminemail)
return page(title = title_out,
body = body_out,
description = "",
keywords = "",
language = ln,
req = req)
def get_rule_string_from_rule_list(rule_list, category):
"""
"""
i = 0
current_category_in_list = 0
for rule_string in rule_list:
category_from_config = rule_string.split(",")[0]
if category_from_config.lower() == category.lower():
current_category_in_list = i
i+=1
try:
rule_string = rule_list[current_category_in_list]
except:
rule_string = ""
# todo: exception
return rule_string
def get_category_from_rule_string(rule_string):
"""
"""
pass
def get_rule_string_from_category(category):
"""
"""
pass
######################## TIME / ISSUE FUNCTIONS ###############################
def get_monday_of_the_week(week_number, year):
"""
CERN Bulletin specific function that returns the Monday of the given week
as a string, e.g. "Monday 07 January 2008".
timetuple = issue_week_strings_to_times(['%s/%s' % (week_number, year), ])[0]
return time.strftime("%A %d %B %Y", timetuple)
def get_issue_number_display(issue_number, journal_name, language=cdslang):
"""
Returns the display string for a given issue number.
"""
journal_id = get_journal_id(journal_name, language)
issue_display = run_sql("SELECT issue_display FROM jrnISSUE \
WHERE issue_number=%s AND id_jrnJOURNAL=%s", (issue_number,
journal_id))[0][0]
return issue_display
def get_current_issue_time(journal_name, language=cdslang):
"""
Return the current issue of a journal as a time object.
"""
current_issue = get_current_issue(language, journal_name)
week_number = current_issue.split("/")[0]
year = current_issue.split("/")[1]
current_issue_time = issue_week_strings_to_times(['%s/%s' %
(week_number, year), ])[0]
return current_issue_time
def get_all_issue_weeks(issue_time, journal_name, language):
"""
Function that takes an issue_number, checks the DB for the issue_display
which can contain the other (update) weeks involved with this issue and
returns all issues in a list of timetuples (always for Monday of each
week).
"""
from invenio.webjournal_config import InvenioWebJournalIssueNotFoundDBError
journal_id = get_journal_id(journal_name)
issue_string = issue_times_to_week_strings([issue_time,])[0]
try:
issue_display = run_sql(
"SELECT issue_display FROM jrnISSUE WHERE issue_number=%s \
AND id_jrnJOURNAL=%s",
(issue_string, journal_id))[0][0]
except:
raise InvenioWebJournalIssueNotFoundDBError(language, journal_name,
issue_string)
issue_bounds = issue_display.split("/")[0].split("-")
year = issue_display.split("/")[1]
all_issue_weeks = []
if len(issue_bounds) == 2:
# is the year changing? -> "52-02/2008"
if int(issue_bounds[0]) > int(issue_bounds[1]):
# get everything from the old year
old_year_issues = []
low_bound_time = issue_week_strings_to_times(['%s/%s' %
(issue_bounds[0],
str(int(year)-1)), ])[0]
# if the year changes over the week we always take the higher year
low_bound_date = datetime.date(int(time.strftime("%Y", low_bound_time)),
int(time.strftime("%m", low_bound_time)),
int(time.strftime("%d", low_bound_time)))
week_counter = datetime.timedelta(weeks=1)
date = low_bound_date
# count up the weeks until you get to the new year
while date.year != int(year):
old_year_issues.append(date.timetuple())
#format = time.strftime("%W/%Y", date.timetuple())
date = date + week_counter
# get everything from the new year
new_year_issues = []
for i in range(1, int(issue_bounds[1])+1):
to_append = issue_week_strings_to_times(['%s/%s' % (i, year),])[0]
new_year_issues.append(to_append)
all_issue_weeks += old_year_issues
all_issue_weeks += new_year_issues
else:
for i in range(int(issue_bounds[0]), int(issue_bounds[1])+1):
to_append = issue_week_strings_to_times(['%s/%s' % (i, year),])[0]
all_issue_weeks.append(to_append)
elif len(issue_bounds) == 1:
to_append = issue_week_strings_to_times(['%s/%s' %
(issue_bounds[0], year),])[0]
all_issue_weeks.append(to_append)
else:
return False
return all_issue_weeks
def count_down_to_monday(current_time):
"""
Takes a timetuple and counts it down to the next monday and returns
this time.
"""
next_monday = datetime.date(int(time.strftime("%Y", current_time)),
int(time.strftime("%m", current_time)),
int(time.strftime("%d", current_time)))
counter = datetime.timedelta(days=-1)
while next_monday.weekday() != 0:
next_monday = next_monday + counter
return next_monday.timetuple()
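The count-down above walks backwards one day at a time until it hits Monday; a minimal self-contained sketch of the same loop, using only the standard library:

```python
import datetime

def monday_of(timetuple):
    # Walk backwards day by day until weekday() == 0 (Monday),
    # mirroring count_down_to_monday() above.
    d = datetime.date(timetuple.tm_year, timetuple.tm_mon, timetuple.tm_mday)
    while d.weekday() != 0:
        d -= datetime.timedelta(days=1)
    return d.timetuple()
```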
def get_next_journal_issues(current_issue_time, journal_name,
language=cdslang, number=2):
"""
Returns the next issue numbers from the current_issue_time.
"""
#now = '%s-%s-%s 00:00:00' % (int(time.strftime("%Y", current_issue_time)),
# int(time.strftime("%m", current_issue_time)),
# int(time.strftime("%d", current_issue_time)))
#
now = datetime.date(int(time.strftime("%Y", current_issue_time)),
int(time.strftime("%m", current_issue_time)),
int(time.strftime("%d", current_issue_time)))
week_counter = datetime.timedelta(weeks=1)
date = now
next_issues = []
for i in range(1, number+1):
date = date + week_counter
#date = run_sql("SELECT %s + INTERVAL 1 WEEK", (date,))[0][0]
#date_formated = time.strptime(date, "%Y-%m-%d %H:%M:%S")
#raise '%s %s' % (repr(now), repr(date_formated))
next_issues.append(date.timetuple())
#next_issues.append(date_formated)
return next_issues
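The week-stepping logic can be condensed to a small helper (a sketch of the datetime arithmetic, without the DB-backed alternatives shown in the commented-out code):

```python
import datetime

def next_issue_weeks(start, count=2):
    """Return the timetuples `count` weeks after `start` (a timetuple),
    one week apart, as get_next_journal_issues() does."""
    d = datetime.date(*start[:3])  # tm_year, tm_mon, tm_mday
    out = []
    for _ in range(count):
        d += datetime.timedelta(weeks=1)
        out.append(d.timetuple())
    return out
```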
def issue_times_to_week_strings(issue_times, language=cdslang):
"""
Function that converts Python time objects to MySQL week strings by
searching up and down the time horizon, rechecking each candidate against
the MySQL result until a matching week string is found.
"""
issue_strings = []
for issue in issue_times:
# do the initial pythonic week view
week = time.strftime("%W/%Y", issue)
week += " Monday"
Limit = 5
counter = 0
success = False
# try going up 5
while success == False and counter <= Limit:
counter += 1
success = get_consistent_issue_week(issue, week)
if success == False:
week = count_week_string_up(week)
else:
break
# try going down 5
counter = 0
while success == False and counter <= Limit:
counter += 1
success = get_consistent_issue_week(issue, week)
if success == False:
week = count_week_string_down(week)
else:
break
from invenio.webjournal_config import InvenioWebJournalReleaseDBError
if success == False:
raise InvenioWebJournalReleaseDBError(language)
#check_for_time = run_sql("SELECT STR_TO_DATE(%s, %s)",
# (week, conversion_rule))[0][0]
#while (issue != check_for_time.timetuple()):
# week = str(int(week.split("/")[0]) + 1) + "/" + week.split("/")[1]
# if week[1] == "/":
# week = "0" + week
# #raise repr(week)
# check_for_time = run_sql("SELECT STR_TO_DATE(%s, %s)",
# (week, conversion_rule))[0][0]
issue_strings.append(week.split(" ")[0])
return issue_strings
def count_week_string_up(week):
"""
Function that takes a week string representation and counts it up by one.
"""
week_nr = week.split("/")[0]
year = week.split("/")[1]
if week_nr == "53":
week_nr = "01"
year = str(int(year) + 1)
else:
week_nr = str(int(week_nr) + 1)
if len(week_nr) == 1:
week_nr = "0" + week_nr
return "%s/%s" % (week_nr, year)
def count_week_string_down(week):
"""
Function that takes a week string representation and counts it down by one.
"""
week_nr = week.split("/")[0]
year = week.split("/")[1]
if week_nr == "01":
week_nr = "53"
year = str(int(year)-1)
else:
week_nr = str(int(week_nr)-1)
if len(week_nr) == 1:
week_nr = "0" + week_nr
return "%s/%s" % (week_nr, year)
def get_consistent_issue_week(issue_time, issue_week):
"""
This is the central consistency function between our Python and MySQL dates.
We use mysql times because of a bug in Scientific Linux that does not allow
us to reconvert a week number to a timetuple.
The function takes a week string, e.g. "02/2008" and its according timetuple
from our functions. Then it retrieves the mysql timetuple for this week and
compares the two times. If they are equal our times are consistent, if not,
we return False and some function should try to approach a consistent result
(see example in issue_times_to_week_strings()).
"""
conversion_rule = '%v/%x %W'
mysql_repr = run_sql("SELECT STR_TO_DATE(%s, %s)",
(issue_week, conversion_rule))[0][0]
if mysql_repr.timetuple() == issue_time:
return issue_week
else:
return False
def issue_week_strings_to_times(issue_weeks, language=cdslang):
"""
Converts a list of issue week strings (WW/YYYY) to python time objects.
"""
issue_times = []
for issue in issue_weeks:
week_number = issue.split("/")[0]
year = issue.split("/")[1]
to_convert = '%s/%s Monday' % (year, week_number)
conversion_rule = '%x/%v %W'
result = run_sql("SELECT STR_TO_DATE(%s, %s)",
(to_convert, conversion_rule))[0][0]
issue_times.append(result.timetuple())
return issue_times
def release_journal_update(update_issue, journal_name, language=cdslang):
"""
Releases an update to a journal.
"""
journal_id = get_journal_id(journal_name, language)
run_sql("UPDATE jrnISSUE set date_released=NOW() \
WHERE issue_number=%s \
AND id_jrnJOURNAL=%s", (update_issue,
journal_id))
def sort_by_week_number(x, y):
"""
Sorts a list of week numbers.
"""
year_x = x.split("/")[1]
year_y = y.split("/")[1]
if cmp(year_x, year_y) != 0:
return cmp(year_x, year_y)
else:
week_x = x.split("/")[0]
week_y = y.split("/")[0]
return cmp(week_x, week_y)
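The comparator above is Python 2 style (`cmp` and comparator-based `sort` are gone in Python 3); the same ordering, year first and then zero-padded week, can be expressed as a sort key:

```python
def week_key(issue):
    # "WW/YYYY" -> ("YYYY", "WW"); string comparison works because
    # both parts are zero-padded, matching sort_by_week_number().
    week, year = issue.split("/")
    return (year, week)

issues = ["52/2007", "02/2008", "01/2008"]
issues.sort(key=week_key)
```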
def release_journal_issue(publish_issues, journal_name, language=cdslang):
"""
Releases a new issue.
"""
journal_id = get_journal_id(journal_name, language)
if len(publish_issues) > 1:
publish_issues.sort(sort_by_week_number)
low_bound = publish_issues[0]
high_bound = publish_issues[-1]
issue_display = '%s-%s/%s' % (low_bound.split("/")[0],
high_bound.split("/")[0],
high_bound.split("/")[1])
# remember convention: if we are going over a new year, take the higher
else:
issue_display = publish_issues[0]
# produce the DB lines
for publish_issue in publish_issues:
run_sql("INSERT INTO jrnISSUE (id_jrnJOURNAL, issue_number, issue_display) \
VALUES(%s, %s, %s)", (journal_id,
publish_issue,
issue_display))
# set first issue to published
release_journal_update(publish_issues[0], journal_name, language)
def delete_journal_issue(issue, journal_name, language=cdslang):
"""
Deletes an issue from the DB.
"""
journal_id = get_journal_id(journal_name, language)
run_sql("DELETE FROM jrnISSUE WHERE issue_number=%s \
AND id_jrnJOURNAL=%s",(issue, journal_id))
def was_alert_sent_for_issue(issue, journal_name, language):
"""
"""
journal_id = get_journal_id(journal_name, language)
date_announced = run_sql("SELECT date_announced FROM jrnISSUE \
WHERE issue_number=%s \
AND id_jrnJOURNAL=%s", (issue, journal_id))[0][0]
if date_announced == None:
return False
else:
return date_announced.timetuple()
def update_DB_for_alert(issue, journal_name, language):
"""
"""
journal_id = get_journal_id(journal_name, language)
run_sql("UPDATE jrnISSUE set date_announced=NOW() \
WHERE issue_number=%s \
AND id_jrnJOURNAL=%s", (issue,
journal_id))
def get_number_of_articles_for_issue(issue, journal_name, language=cdslang):
"""
Function that returns a dictionary with all categories and number of
articles in each category.
"""
config_strings = get_xml_from_config(["rule",], journal_name)
rule_list = config_strings["rule"]
all_articles = {}
for rule in rule_list:
category_name = rule.split(",")[0]
if issue[0] == "0" and len(issue) == 7:
week_nr = issue.split("/")[0]
year = issue.split("/")[1]
issue_nr_alternative = "%s/%s" % (week_nr[1], year)
all_records_of_a_type = list(search_pattern(p='65017a:"%s" and 773__n:%s' %
(category_name, issue),
f="&action_search=Search"))
all_records_of_a_type += list(search_pattern(p='65017a:"%s" and 773__n:%s' %
(category_name, issue_nr_alternative),
f="&action_search=Search"))
else:
all_records_of_a_type = list(search_pattern(p='65017a:"%s" and 773__n:%s' %
(category_name, issue),
f="&action_search=Search"))
all_articles[category_name] = len(all_records_of_a_type)
return all_articles
def get_list_of_issues_for_publication(publication):
"""
Takes a publication string, e.g. 23-24/2008 and splits it down to a list
of single issues.
"""
year = publication.split("/")[1]
issues_string = publication.split("/")[0]
bounds = issues_string.split("-")
issues = []
if len(bounds) == 2:
low_bound = issues_string.split("-")[0]
high_bound = issues_string.split("-")[1]
if int(low_bound) < int(high_bound):
for i in range(int(low_bound), int(high_bound)+1):
issue_nr = str(i)
if len(issue_nr) == 1:
issue_nr = "0" + issue_nr
issues.append("%s/%s" % (issue_nr, year))
else:
for i in range(int(low_bound), 53+1):
issue_nr = str(i)
if len(issue_nr) == 1:
issue_nr = "0" + issue_nr
issues.append("%s/%s" % (issue_nr, str(int(year)-1)))
for i in range(1, int(high_bound) + 1):
issue_nr = str(i)
if len(issue_nr) == 1:
issue_nr = "0" + issue_nr
issues.append("%s/%s" % (issue_nr, year))
else:
issues.append("%s/%s" % (bounds[0], year))
return issues
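The splitting logic, including the year-wrap convention ("52-02/2008" means weeks 52-53 of 2007 plus 01-02 of 2008), can be restated compactly:

```python
def issues_for_publication(publication):
    """Split a publication string like "23-24/2008" into single issues,
    as get_list_of_issues_for_publication() does."""
    issues_part, year = publication.split("/")
    bounds = issues_part.split("-")
    if len(bounds) == 2:
        low, high = int(bounds[0]), int(bounds[1])
        if low < high:
            span = [(i, year) for i in range(low, high + 1)]
        else:
            # range wraps a year boundary: low..53 of the previous year,
            # then 1..high of the displayed year
            span = [(i, str(int(year) - 1)) for i in range(low, 54)]
            span += [(i, year) for i in range(1, high + 1)]
        return ["%02d/%s" % (i, y) for i, y in span]
    return ["%s/%s" % (bounds[0], year)]
```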
def get_release_time(issue, journal_name, language=cdslang):
"""
Gets the date at which an issue was released from the DB.
"""
journal_id = get_journal_id(journal_name, language)
try:
release_date = run_sql("SELECT date_released FROM jrnISSUE \
WHERE issue_number=%s AND id_jrnJOURNAL=%s",
(issue, journal_id))[0][0]
except:
return False
if release_date == None:
return False
else:
return release_date.timetuple()
def get_announcement_time(issue, journal_name, language=cdslang):
"""
Get the date at which an issue was announced through the alert system.
"""
journal_id = get_journal_id(journal_name, language)
try:
announce_date = run_sql("SELECT date_announced FROM jrnISSUE \
WHERE issue_number=%s AND id_jrnJOURNAL=%s",
(issue, journal_id))[0][0]
except:
return False
if announce_date == None:
return False
else:
return announce_date.timetuple()
######################## GET DEFAULTS FUNCTIONS ###############################
def get_journal_id(journal_name, language=cdslang):
"""
Get the id for this journal from the DB.
"""
from invenio.webjournal_config import InvenioWebJournalJournalIdNotFoundDBError
try:
journal_id = run_sql("SELECT id FROM jrnJOURNAL WHERE name=%s",
(journal_name,))[0][0]
except:
raise InvenioWebJournalJournalIdNotFoundDBError(language, journal_name)
return journal_id
def guess_journal_name(language):
"""
tries to guess which journal a user was looking for when no journal name
was provided.
if there is only one journal on the server, returns its name; otherwise
raises an error that presents a list of the possible journals.
"""
from invenio.webjournal_config import InvenioWebJournalNoJournalOnServerError
from invenio.webjournal_config import InvenioWebJournalNoNameError
all_journals = run_sql("SELECT * FROM jrnJOURNAL ORDER BY id")
if len(all_journals) == 0:
raise InvenioWebJournalNoJournalOnServerError(language)
elif len(all_journals) == 1:
return all_journals[0][1]
else:
raise InvenioWebJournalNoNameError(language)
def get_current_issue(language, journal_name):
"""
Returns the current issue of a journal as a string.
"""
journal_id = get_journal_id(journal_name, language)
try:
current_issue = run_sql("SELECT issue_number FROM jrnISSUE \
WHERE date_released <= NOW() AND id_jrnJOURNAL=%s \
ORDER BY date_released DESC LIMIT 1", (journal_id,))[0][0]
except:
# start the first journal ever with today's date
current_issue = time.strftime("%W/%Y", time.localtime())
run_sql("INSERT INTO jrnISSUE \
(id_jrnJOURNAL, issue_number, issue_display) \
VALUES(%s, %s, %s)", (journal_id,
current_issue,
current_issue))
return current_issue
def get_current_publication(journal_name, current_issue, language=cdslang):
"""
Returns the current publication string (current issue + updates).
"""
journal_id = get_journal_id(journal_name, language)
current_publication = run_sql("SELECT issue_display FROM jrnISSUE \
WHERE issue_number=%s AND \
id_jrnJOURNAL=%s",
(current_issue, journal_id))[0][0]
return current_publication
def get_xml_from_config(xpath_list, journal_name):
"""
wrapper for minidom.getElementsByTagName()
Takes a list of string expressions and a journal name and searches the config
file of this journal for the given xpath queries. Returns a dictionary with
a key for each query and a list of string (innerXml) results for each key.
Has a special field "config_fetching_error" that returns an error when
something has gone wrong.
"""
# get and open the config file
results = {}
- config_path = '%s/webjournal/%s/config.xml' % (etcdir, journal_name)
+ config_path = '%s/webjournal/%s/config.xml' % (CFG_ETCDIR, journal_name)
config_file = minidom.Document
try:
config_file = minidom.parse("%s" % config_path)
except:
#todo: raise exception "error: no config file found"
results["config_fetching_error"] = "could not find config file"
return results
for xpath in xpath_list:
result_list = config_file.getElementsByTagName(xpath)
results[xpath] = []
for result in result_list:
try:
result_string = result.firstChild.toxml(encoding="utf-8")
except:
# WARNING, config did not have a value
continue
results[xpath].append(result_string)
return results
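The query mechanism can be exercised without a journal config on disk by parsing an in-memory string; the fragment below is hypothetical (the real file lives at CFG_ETCDIR/webjournal/&lt;journal_name&gt;/config.xml), but the tag lookup works the same way:

```python
from xml.dom import minidom

# Hypothetical config fragment, for illustration only.
CONFIG = """<config>
  <rule>News, 65017a:News</rule>
  <rule>Official, 65017a:Official</rule>
  <issue_number>16/2008</issue_number>
</config>"""

def query_config(xml_string, tags):
    # Same shape as get_xml_from_config(): one list of innerXml
    # strings per requested tag name.
    doc = minidom.parseString(xml_string)
    results = {}
    for tag in tags:
        results[tag] = [node.firstChild.toxml()
                        for node in doc.getElementsByTagName(tag)
                        if node.firstChild is not None]
    return results
```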
def parse_url_string(req):
"""
centralized function to parse any url string given in webjournal.
returns:
args: all arguments in dict form
"""
args = {}
# first get what you can from the argument string
try:
argument_string = req.args
except:
argument_string = ""
try:
arg_list = argument_string.split("&")
except:
# no arguments
arg_list = []
for entry in arg_list:
try:
key = entry.split("=")[0]
except IndexError:
# todo: WARNING, could not parse one argument
continue
try:
val = entry.split("=")[1]
except:
# todo: WARNING, could not parse one argument
continue
try:
args[key] = val
except:
# todo: WARNING, argument given twice
continue
# secondly try to get default arguments
try:
for entry in req.journal_defaults.keys():
try:
args[entry] = req.journal_defaults[entry]
except:
# todo: Error, duplicate entry from args and defaults
pass
except:
# no defaults
pass
return args
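The hand-rolled split-and-catch above predates the stdlib query-string helpers. An equivalent, sturdier sketch (Python 3; the `parse_args` name is ours, and under the module's mod_python/Python 2 runtime the same helper lives in `urlparse`/`cgi`):

```python
from urllib.parse import parse_qs

def parse_args(argument_string, defaults=None):
    """Parse 'a=1&b=2' into {'a': '1', 'b': '2'}, then overlay defaults."""
    # parse_qs returns lists of values; keep only the first, as the
    # original loop effectively does
    args = {key: values[0] for key, values in parse_qs(argument_string).items()}
    args.update(defaults or {})
    return args

print(parse_args("name=CERNBulletin&issue=22/2007", {"ln": "en"}))
```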
######################## EMAIL HELPER FUNCTIONS ###############################
def createhtmlmail (html, text, subject, toaddr):
"""
Create a mime-message that will render HTML in popular
MUAs, text in better ones.
"""
import MimeWriter
import mimetools
import cStringIO
out = cStringIO.StringIO() # output buffer for our message
htmlin = cStringIO.StringIO(html)
txtin = cStringIO.StringIO(text)
writer = MimeWriter.MimeWriter(out)
#
# set up some basic headers... we put subject here
# because smtplib.sendmail expects it to be in the
# message body
#
writer.addheader("Subject", subject)
writer.addheader("MIME-Version", "1.0")
writer.addheader("To", toaddr)
#
# start the multipart section of the message
# multipart/alternative seems to work better
# on some MUAs than multipart/mixed
#
writer.startmultipartbody("alternative")
writer.flushheaders()
#
# the plain text section
#
subpart = writer.nextpart()
subpart.addheader("Content-Transfer-Encoding", "quoted-printable")
#pout = subpart.startbody("text/plain", [("charset", 'us-ascii')])
pout = subpart.startbody("text/plain", [("charset", 'utf-8')])
mimetools.encode(txtin, pout, 'quoted-printable')
txtin.close()
#
# start the html subpart of the message
#
subpart = writer.nextpart()
subpart.addheader("Content-Transfer-Encoding", "quoted-printable")
#
# returns us a file-ish object we can write to
#
#pout = subpart.startbody("text/html", [("charset", 'us-ascii')])
pout = subpart.startbody("text/html", [("charset", 'utf-8')])
mimetools.encode(htmlin, pout, 'quoted-printable')
htmlin.close()
#
# Now that we're done, close our writer and
# return the message body
#
writer.lastpart()
msg = out.getvalue()
out.close()
return msg
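MimeWriter and mimetools were already on their way out when this was written. The same multipart/alternative layout (plain text first, HTML last, so capable MUAs prefer the HTML part) can be sketched with the email package; the function name and addresses below are placeholders, not the module's API:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_html_mail(html, text, subject, toaddr):
    """Build a multipart/alternative message with a plain-text and an HTML part."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    msg["To"] = toaddr
    # order matters: the last attached part is the one preferred by the MUA
    msg.attach(MIMEText(text, "plain", "utf-8"))
    msg.attach(MIMEText(html, "html", "utf-8"))
    return msg.as_string()

raw = build_html_mail("<b>hello</b>", "hello", "Test", "reader@example.org")
```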
def put_css_in_file(html_message, journal_name):
"""
Takes an external css file and puts all the content of it in the head
of an HTML file in style tags. (Used for HTML emails)
"""
config_strings = get_xml_from_config(["screen"], journal_name)
try:
css_path = config_strings["screen"][0]
except:
register_exception(req=None,
suffix="No css file for journal %s. Is this right?"
% journal_name)
return
css_file = urlopen('%s/%s' % (weburl, css_path))
css = css_file.read()
css = make_full_paths_in_css(css, journal_name)
html_parted = html_message.split("</head>")
if len(html_parted) > 1:
html = '%s<style type="text/css">%s</style></head>%s' % (html_parted[0],
css,
html_parted[1])
else:
html_parted = html_message.split("<html>")
if len(html_parted) > 1:
html = '%s<html><head><style type="text/css">%s</style></head>%s' % (html_parted[0],
css,
html_parted[1])
else:
return
return html
def make_full_paths_in_css(css, journal_name):
"""
Replaces relative url(...) references in the given css by full
<weburl>/img/<journal_name> paths so the css also works in
standalone HTML emails.
"""
url_pattern = re.compile('''url\(["']?\s*(?P<url>\S*)\s*["']?\)''',
re.DOTALL)
url_iter = url_pattern.finditer(css)
rel_to_full_path = {}
for url in url_iter:
url_string = url.group("url")
url_string = url_string.replace("\"", "")
url_string = url_string.replace("\'", "")
if url_string[:7] != "http://":
rel_to_full_path[url_string] = '"%s/img/%s/%s"' % (weburl,
journal_name,
url_string)
for url in rel_to_full_path.keys():
css = css.replace(url, rel_to_full_path[url])
return css
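A quick check of the rewriting above in isolation; the `WEBURL` value and the `absolutize` name are placeholders standing in for the configured `weburl` and the module's function:

```python
import re

WEBURL = "http://cds.example.org"  # stands in for the configured weburl

url_pattern = re.compile(r'''url\(["']?\s*(?P<url>\S*?)\s*["']?\)''')

def absolutize(css, journal_name):
    """Rewrite relative url(...) references to full /img/<journal> paths."""
    def repl(match):
        target = match.group("url")
        if target.startswith("http://"):
            return match.group(0)  # absolute URLs are left untouched
        return 'url("%s/img/%s/%s")' % (WEBURL, journal_name, target)
    return url_pattern.sub(repl, css)

print(absolutize("body { background: url(bg.png); }", "CERNBulletin"))
```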
############################ CACHING FUNCTIONS ################################
def cache_index_page(html, journal_name, category, issue, ln):
"""
Caches the index page main area of a Bulletin
(right hand menu cannot be cached)
"""
issue = issue.replace("/", "_")
category = category.replace(" ", "")
- if not (os.path.isdir('%s/webjournal/%s' % (cachedir, journal_name) )):
- os.makedirs('%s/webjournal/%s' % (cachedir, journal_name))
- cached_file = open('%s/webjournal/%s/%s_index_%s_%s.html' % (cachedir,
+ if not (os.path.isdir('%s/webjournal/%s' % (CFG_CACHEDIR, journal_name) )):
+ os.makedirs('%s/webjournal/%s' % (CFG_CACHEDIR, journal_name))
+ cached_file = open('%s/webjournal/%s/%s_index_%s_%s.html' % (CFG_CACHEDIR,
journal_name,
issue, category,
ln), "w")
cached_file.write(html)
cached_file.close()
def get_index_page_from_cache(journal_name, category, issue, ln):
"""
Function to get an index page from the cache.
False if not in cache.
"""
issue = issue.replace("/", "_")
category = category.replace(" ", "")
try:
cached_file = open('%s/webjournal/%s/%s_index_%s_%s.html'
- % (cachedir, journal_name, issue, category, ln)).read()
+ % (CFG_CACHEDIR, journal_name, issue, category, ln)).read()
except:
return False
return cached_file
def cache_article_page(html, journal_name, category, recid, issue, ln):
"""
Caches an article view of a journal.
"""
issue = issue.replace("/", "_")
category = category.replace(" ", "")
- if not (os.path.isdir('%s/webjournal/%s' % (cachedir, journal_name) )):
- os.makedirs('%s/webjournal/%s' % (cachedir, journal_name))
+ if not (os.path.isdir('%s/webjournal/%s' % (CFG_CACHEDIR, journal_name) )):
+ os.makedirs('%s/webjournal/%s' % (CFG_CACHEDIR, journal_name))
cached_file = open('%s/webjournal/%s/%s_article_%s_%s_%s.html'
- % (cachedir, journal_name, issue, category, recid, ln),
+ % (CFG_CACHEDIR, journal_name, issue, category, recid, ln),
"w")
cached_file.write(html)
cached_file.close()
def get_article_page_from_cache(journal_name, category, recid, issue, ln):
"""
Gets an article view of a journal from cache.
False if not in cache.
"""
issue = issue.replace("/", "_")
category = category.replace(" ", "")
try:
cached_file = open('%s/webjournal/%s/%s_article_%s_%s_%s.html'
- % (cachedir, journal_name, issue, category, recid, ln)).read()
+ % (CFG_CACHEDIR, journal_name, issue, category, recid, ln)).read()
except:
return False
return cached_file
def clear_cache_for_article(journal_name, category, recid, issue):
"""
Resets the cache for an article (e.g. after an article has been modified)
"""
issue = issue.replace("/", "_")
category = category.replace(" ", "")
# try to delete the article cached file
try:
os.remove('%s/webjournal/%s/%s_article_%s_%s_en.html' %
- (cachedir, journal_name, issue, category, recid))
+ (CFG_CACHEDIR, journal_name, issue, category, recid))
except:
pass
try:
os.remove('%s/webjournal/%s/%s_article_%s_%s_fr.html' %
- (cachedir, journal_name, issue, category, recid))
+ (CFG_CACHEDIR, journal_name, issue, category, recid))
except:
pass
# delete the index page for the category
try:
os.remove('%s/webjournal/%s/%s_index_%s_en.html'
- % (cachedir, journal_name, issue, category))
+ % (CFG_CACHEDIR, journal_name, issue, category))
except:
pass
try:
os.remove('%s/webjournal/%s/%s_index_%s_fr.html'
- % (cachedir, journal_name, issue, category))
+ % (CFG_CACHEDIR, journal_name, issue, category))
except:
pass
# delete the entry in the recid_order_map
# todo: make this per entry
try:
os.remove('%s/webjournal/%s/%s_recid_order_map.dat'
- % (cachedir, journal_name, issue))
+ % (CFG_CACHEDIR, journal_name, issue))
except:
pass
return True
def clear_cache_for_issue(journal_name, issue):
"""
clears the cache of a whole issue.
"""
issue = issue.replace("/", "_")
all_cached_files = os.listdir('%s/webjournal/%s/'
- % (cachedir, journal_name))
+ % (CFG_CACHEDIR, journal_name))
for cached_file in all_cached_files:
if cached_file[:7] == issue:
try:
os.remove('%s/webjournal/%s/%s'
- % (cachedir, journal_name, cached_file))
+ % (CFG_CACHEDIR, journal_name, cached_file))
except:
return False
return True
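The cache file layout used by the functions above, collected in one place (paths and the helper name are illustrative, not part of the module):

```python
import os

def index_cache_path(cache_dir, journal_name, issue, category, ln):
    """Mirror the naming scheme
    <cache>/webjournal/<journal>/<issue>_index_<category>_<ln>.html
    with '/' in the issue and spaces in the category flattened."""
    issue = issue.replace("/", "_")
    category = category.replace(" ", "")
    return os.path.join(cache_dir, "webjournal", journal_name,
                        "%s_index_%s_%s.html" % (issue, category, ln))

print(index_cache_path("/tmp/cache", "CERNBulletin", "22/2007", "Official News", "en"))
```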
def cache_recid_data_dict_CERNBulletin(recid, issue, rule, order):
"""
The CERN Bulletin has a specific recid data dict that is cached
using cPickle.
"""
issue = issue.replace("/", "_")
# get whats in there
- if not os.path.isdir('%s/webjournal/CERNBulletin' % cachedir):
- os.makedirs('%s/webjournal/CERNBulletin' % cachedir)
+ if not os.path.isdir('%s/webjournal/CERNBulletin' % CFG_CACHEDIR):
+ os.makedirs('%s/webjournal/CERNBulletin' % CFG_CACHEDIR)
try:
temp_file = open('%s/webjournal/CERNBulletin/%s_recid_order_map.dat'
- % (cachedir, issue))
+ % (CFG_CACHEDIR, issue))
except:
temp_file = open('%s/webjournal/CERNBulletin/%s_recid_order_map.dat'
- % (cachedir, issue), "w")
+ % (CFG_CACHEDIR, issue), "w")
try:
recid_map = cPickle.load(temp_file)
except:
recid_map = ""
temp_file.close()
# add new recid
if recid_map == "":
recid_map = {}
if not recid_map.has_key(rule):
recid_map[rule] = {}
recid_map[rule][order] = recid
# save back
temp_file = open('%s/webjournal/CERNBulletin/%s_recid_order_map.dat'
- % (cachedir, issue), "w")
+ % (CFG_CACHEDIR, issue), "w")
cPickle.dump(recid_map, temp_file)
temp_file.close()
def get_cached_recid_data_dict_CERNBulletin(issue, rule):
"""
Function to restore from cache the dict Data Type that the CERN Bulletin
uses for mapping between the order of an article and its recid.
"""
issue = issue.replace("/", "_")
try:
temp_file = open('%s/webjournal/CERNBulletin/%s_recid_order_map.dat'
- % (cachedir, issue))
+ % (CFG_CACHEDIR, issue))
except:
return {}
try:
recid_map = cPickle.load(temp_file)
except:
return {}
try:
recid_dict = recid_map[rule]
except:
recid_dict = {}
return recid_dict
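The pickle round-trip used by the two caching functions above, in miniature (file name, rule string, and recids below are made up for the example):

```python
import os
import pickle
import tempfile

# rule -> {order: recid}, the shape cached by the CERN Bulletin
recid_map = {"980__a:BULLETIN": {1: 55555, 2: 55556}}

path = os.path.join(tempfile.mkdtemp(), "22_2007_recid_order_map.dat")
with open(path, "wb") as fh:
    pickle.dump(recid_map, fh)       # cache_recid_data_dict side
with open(path, "rb") as fh:
    restored = pickle.load(fh)       # get_cached_recid_data_dict side
assert restored["980__a:BULLETIN"][1] == 55555
```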
######################### CERN SPECIFIC FUNCTIONS #############################
def get_order_dict_from_recid_list_CERNBulletin(list, issue_number):
"""
special derivative of the get_order_dict_from_recid_list function that
extends the behavior to return a dictionary in which every entry is
itself a dict (there can be several number 1 articles), keyed on the
second level by the upload time in epoch seconds. Every inner entry is
a tuple with an additional boolean indicating whether the article
carries a graphical "new" flag, e.g.:
{1: {10349: (rec, True), 24792: (rec, False)}, 2: {736424: (rec, False)}}
The ordering inside an order number is given by upload date, so the
ordering is:
1st level -> order number
2nd level -> upload date
"""
ordered_records = {}
for record in list:
temp_rec = BibFormatObject(record)
issue_numbers = temp_rec.fields('773__n')
order_number = temp_rec.fields('773__c')
try:
# upload_date = run_sql("SELECT modification_date FROM bibrec WHERE id=%s", (record, ))[0][0]
upload_date = run_sql("SELECT creation_date FROM bibrec WHERE id=%s", (record, ))[0][0]
except:
pass
#return repr(time.mktime(upload_date.timetuple()))
# todo: the marc fields have to be set'able by some sort of config interface
n = 0
for temp_issue in issue_numbers:
if temp_issue == issue_number:
try:
order_number = int(order_number[n])
except:
# todo: Warning, record does not support numbering scheme
order_number = -1
n+=1
if order_number != -1:
try:
if ordered_records.has_key(order_number):
ordered_records[order_number][int(time.mktime(upload_date.timetuple()))] = (record, True)
else:
ordered_records[order_number] = {int(time.mktime(upload_date.timetuple())):(record, False)}
except:
pass
# todo: Error, there are two records with the same order_number in the issue
else:
ordered_records[max(ordered_records.keys()) + 1] = record
return ordered_records
def get_recid_from_order_CERNBulletin(order, rule, issue_number):
"""
same functionality as get_recid_from_order above, but extended for
the CERN Bulletin so that multiple entries for the first article are
possible.
parameters:
order: the order at which the record appears in the journal as passed
in the url
rule: the defining rule of the journal record category
issue_number: the issue number for which we are searching
returns:
recid: the recid of the ordered record
"""
# try to get it from cache
recid_dict = {}
recid_dict = get_cached_recid_data_dict_CERNBulletin(issue_number, rule)
if recid_dict.has_key(order):
recid = recid_dict[order]
return recid
alternative_issue_number = "00/0000"
# get the id list
if issue_number[0] == "0":
alternative_issue_number = issue_number[1:]
all_records = list(search_pattern(p="%s and 773__n:%s" %
(rule, issue_number),
f="&action_search=Search"))
all_records += list(search_pattern(p="%s and 773__n:%s" %
(rule, alternative_issue_number),
f="&action_search=Search"))
else:
all_records = list(search_pattern(p="%s and 773__n:%s" %
(rule, issue_number),
f="&action_search=Search"))
#raise repr(all_records)
ordered_records = {}
new_addition_records = []
for record in all_records:
temp_rec = BibFormatObject(record) # todo: refactor with get_fieldValues from search_engine
issue_numbers = temp_rec.fields('773__n')
order_number = temp_rec.fields('773__c')
#raise "%s:%s" % (repr(issue_numbers), repr(order_number))
# todo: fields for issue number and order number have to become generic
n = 0
for temp_issue in issue_numbers:
if temp_issue == issue_number or temp_issue == alternative_issue_number:
try:
order_number = int(order_number[n])
except:
register_exception(stream="warning", suffix="There \
was an article in the journal that does not support \
a numbering scheme")
order_number = -1000
n+=1
if order_number == -1000:
ordered_records[max(ordered_records.keys()) + 1] = record
elif order_number <= 1:
new_addition_records.append(record)
else:
try:
ordered_records[order_number] = record
except:
register_exception(stream='warning', suffix="There \
were double entries for an order in this journal.")
# process the CERN Bulletin specific new additions
if len(new_addition_records) > 1 and int(order) <= 1:
# if we are dealing with a new addition (order number smaller 1)
ordered_new_additions = {}
for record in new_addition_records:
#upload_date = run_sql("SELECT modification_date FROM bibrec WHERE id=%s", (record, ))[0][0]
upload_date = run_sql("SELECT creation_date FROM bibrec WHERE id=%s", (record, ))[0][0]
ordered_new_additions[int(time.mktime(upload_date.timetuple()))] = record
i = 1
while len(ordered_new_additions) > 0:
temp_key = pop_oldest_article_CERNBulletin(ordered_new_additions)
record = ordered_new_additions.pop(int(temp_key))
ordered_records[i] = record
i -=1
else:
# if we have only one record on 1 just push it through
ordered_records[1] = new_addition_records[0]
try:
recid = ordered_records[int(order)]
except:
register_exception()
cache_recid_data_dict_CERNBulletin(recid, issue_number, rule, order)
return recid
def pop_newest_article_CERNBulletin(news_article_dict):
"""
pop key of the most recent article (highest c-timestamp)
"""
keys = news_article_dict.keys()
keys.sort()
key = keys[len(keys)-1]
return key
def pop_oldest_article_CERNBulletin(news_article_dict):
"""
pop key of the oldest article (lowest c-timestamp)
"""
keys = news_article_dict.keys()
keys.sort()
key = keys[0]
return key
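The two helpers above just select the extreme keys; sorting the whole key list is unnecessary. A minimal equivalent sketch (function names are ours):

```python
def pop_newest(article_dict):
    """Key with the highest creation timestamp."""
    return max(article_dict)

def pop_oldest(article_dict):
    """Key with the lowest creation timestamp."""
    return min(article_dict)

# epoch-second keys mapping to recids, as in the CERN Bulletin cache
timestamps = {1180000000: 45, 1190000000: 44}
print(pop_newest(timestamps), pop_oldest(timestamps))
```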
########################### REGULAR EXPRESSIONS ###############################
header_pattern = re.compile('<p.*?>(?P<header>.*?)</p>\s*(<br/>\s*)?', re.DOTALL)
#url(["']?(?P<url>\S*)["']?)
diff --git a/modules/webjournal/lib/webjournal_webinterface.py b/modules/webjournal/lib/webjournal_webinterface.py
index df5b5511b..0ef3e001b 100644
--- a/modules/webjournal/lib/webjournal_webinterface.py
+++ b/modules/webjournal/lib/webjournal_webinterface.py
@@ -1,627 +1,627 @@
# -*- coding: utf-8 -*-
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""WebJournal Web Interface."""
__revision__ = "$Id$"
__lastupdated__ = """$Date$"""
import time
import os
import urllib
from urllib2 import urlopen
from email import message_from_string
from xml.dom import minidom
from mod_python import apache
from invenio.webinterface_handler import wash_urlargd, WebInterfaceDirectory
from invenio.access_control_engine import acc_authorize_action
-from invenio.config import weburl, webdir, cdslang, etcdir
+from invenio.config import weburl, CFG_WEBDIR, cdslang, CFG_ETCDIR
from invenio.webpage import page
from invenio.webuser import getUid
from invenio.urlutils import redirect_to_url
from invenio.errorlib import register_exception
from invenio.bibformat_engine import format_with_format_template, BibFormatObject
from invenio.search_engine import search_pattern
from webjournal_config import *
from invenio.webjournal_utils import get_recid_from_order, \
get_recid_from_order_CERNBulletin, \
parse_url_string, \
get_xml_from_config, \
please_login, \
get_current_issue, \
get_rule_string_from_rule_list, \
get_monday_of_the_week, \
cache_index_page, \
get_index_page_from_cache, \
get_article_page_from_cache, \
cache_article_page, \
clear_cache_for_issue
from invenio.webjournal_washer import wash_category, \
wash_issue_number, \
wash_journal_name, \
wash_journal_language, \
wash_article_number, \
wash_popup_type, \
wash_popup_record, \
wash_archive_date
from invenio.webjournal import perform_request_index, \
perform_request_article, \
perform_request_alert, \
perform_request_issue_control, \
perform_request_popup, \
perform_request_administrate, \
perform_request_search
from invenio.webjournal_templates import tmpl_webjournal_regenerate_success, \
tmpl_webjournal_regenerate_error, \
tmpl_webjournal_feature_record_interface, \
tmpl_webjournal_feature_record_success, \
tmpl_webjournal_alert_plain_text_CERNBulletin, \
tmpl_webjournal_alert_subject_CERNBulletin, \
tmpl_webjournal_alert_success_msg, \
tmpl_webjournal_alert_interface
class WebInterfaceJournalPages(WebInterfaceDirectory):
"""Defines the set of /journal pages."""
_exports = ['', 'article', 'issue_control', 'edit_article', 'alert', 'search',
'feature_record', 'popup', 'regenerate', 'administrate']
# profiler
#def index(self, req, form):
# import hotshot
# pr = hotshot.Profile('/tmp/journal_profile')
# return pr.runcall(self.index_bla, req=req, form=form)
def index(self, req, form):
"""
Index page.
Washes all the parameters and stores them in journal_defaults dict
for subsequent format_elements.
Passes on to logic function and eventually returns HTML.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'issue': (str, ""),
'category': (str, ""),
'ln': (str, "")}
)
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
issue_number = wash_issue_number(language, journal_name,
argd['issue'])
category = wash_category(language, argd['category'])
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoCurrentIssueError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalIssueNumberBadlyFormedError, e:
register_exception(req=req)
return e.user_box()
# the journal_defaults will be used by format elements that have no
# direct access to the params here, no more checking needed
req.journal_defaults = {"name": journal_name,
"issue": issue_number,
"ln": language,
"category": category}
-
+
html = perform_request_index(req, journal_name, issue_number, language,
category)
return html
-
+
def article(self, req, form):
"""
Article page.
Washes all the parameters and stores them in journal_defaults dict
for subsequent format_elements.
Passes on to logic function and eventually returns HTML.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'issue': (str, ""),
'category': (str, ""),
'number': (str, ""),
'ln': (str, ""),
'editor': (str, "False")}
)
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
issue_number = wash_issue_number(language, journal_name,
argd['issue'])
category = wash_category(language, argd['category'])
number = wash_article_number(language, argd['number'], journal_name)
editor = argd['editor']
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoCurrentIssueError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalIssueNumberBadlyFormedError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoArticleNumberError, e:
register_exception(req=req)
return e.user_box()
# automatically make all logged in users of cfgwebjournal editors
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] == 0:
editor = "True"
# the journal_defaults will be used by format elements that have no
# direct access to the params here, no more checking needed
req.journal_defaults = {"name" : journal_name,
"issue" : issue_number,
"ln" : language,
"category" : category,
"editor" : editor,
"number" : number}
-
+
html = perform_request_article(req, journal_name, issue_number,
language, category, number, editor)
return html
-
+
def edit_article(self, req, form):
"""
Simple url redirecter to toggle the edit mode on for article pages.
Checks if user is logged in.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'ln': (str, "")})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
-
+
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] != 0:
return please_login(req, journal_name,
backlink='%s/journal/edit_article?%s'
% (weburl, urllib.quote(req.args)))
# todo: use make_canonical_url from urlutils
redirect_to_url(req,
"%s/journal/article?%s&editor=True"
% (weburl, req.args))
def administrate(self, req, form):
"""Index page."""
argd = wash_urlargd(form, {'name': (str, ""),
'ln': (str, "")
})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
# check for user rights
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] != 0:
return please_login(req, journal_name,
backlink='%s/journal/administrate?name=%s'
% (weburl, journal_name))
-
+
return perform_request_administrate(journal_name, language)
-
+
def feature_record(self, req, form):
"""
Interface to feature a record. Will be saved in a flat file.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'recid': (str, "init"),
'featured': (str, "false"),
'url': (str, "init"),
'ln': (str, "")
})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
recid = argd['recid']
url = argd['url']
featured = argd['featured']
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
# check for user rights
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] != 0:
return please_login(req, journal_name,
backlink='%s/journal/feature_record?name=%s'
% (weburl, journal_name))
-
+
if recid == "init":
return tmpl_webjournal_feature_record_interface(language,
- journal_name)
+ journal_name)
else:
# todo: move to DB, maybe?
fptr = open('%s/webjournal/%s/featured_record'
- % (etcdir, journal_name), "w")
+ % (CFG_ETCDIR, journal_name), "w")
fptr.write(recid)
fptr.write('\n')
fptr.write(argd['url'])
fptr.close()
return tmpl_webjournal_feature_record_success(language,
- journal_name, recid)
-
+ journal_name, recid)
+
def regenerate(self, req, form):
"""
Clears the cache for the issue given.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'issue': (str, ""),
'ln': (str, "")})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
issue_number = wash_issue_number(language, journal_name,
argd['issue'])
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoCurrentIssueError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalIssueNumberBadlyFormedError, e:
register_exception(req=req)
return e.user_box()
# check for user rights
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] != 0:
return please_login(req, journal_name,
backlink='%s/journal/regenerate?name=%s'
% (weburl, journal_name))
# clear cache
success = clear_cache_for_issue(journal_name, issue_number)
if success:
return tmpl_webjournal_regenerate_success(language, journal_name,
issue_number)
else:
return tmpl_webjournal_regenerate_error(language, journal_name,
issue_number)
-
+
def alert(self, req, form):
"""
Alert system.
Sends an email alert, in HTML/PlainText or only PlainText to a mailing
list to alert for new journal releases.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'sent': (str, "False"),
'plainText': (str, u''),
'htmlMail': (str, ""),
'recipients': (str, ""),
'subject': (str, ""),
'ln': (str, ""),
'issue': (str, ""),
'force': (str, "False")})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
issue_number = wash_issue_number(language, journal_name,
argd['issue'])
plain_text = argd['plainText']
html_mail = argd['htmlMail']
recipients = argd['recipients']
subject = argd['subject']
sent = argd['sent']
force = argd['force']
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoCurrentIssueError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalIssueNumberBadlyFormedError, e:
register_exception(req=req)
return e.user_box()
# login
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] != 0:
return please_login(req, journal_name,
backlink='%s/journal/alert?name=%s'
% (weburl, journal_name))
-
+
html = perform_request_alert(req, journal_name, issue_number, language,
sent, plain_text, subject, recipients,
html_mail, force)
return html
-
+
def issue_control(self, req, form):
"""
page that allows full control over creating, backtracing, adding to,
removing from issues.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'add': (str, ""),
'action_publish': (str, "cfg"),
'issue_number': (list, []),
'ln': (str, "")}
)
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
issue_numbers = []
for number in argd['issue_number']:
if number != "ww/YYYY":
issue_numbers.append(wash_issue_number(language,
journal_name,
number))
add = argd['add']
action = argd['action_publish']
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoCurrentIssueError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalIssueNumberBadlyFormedError, e:
register_exception(req=req)
return e.user_box()
# check user rights
if acc_authorize_action(getUid(req), 'cfgwebjournal',
name="%s" % journal_name)[0] != 0:
return please_login(req, journal_name)
-
+
html = perform_request_issue_control(req, journal_name, issue_numbers,
language, add, action)
-
+
return html
-
+
def popup(self, req, form):
"""
simple pass-through function that serves as a checker for popups.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'record': (str, ""),
'type': (str, ""),
'ln': (str, "")
})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
type = wash_popup_type(language, argd['type'], journal_name)
record = wash_popup_record(language, argd['record'], journal_name)
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoPopupTypeError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoPopupRecordError, e:
register_exception(req=req)
return e.user_box()
-
+
html = perform_request_popup(req, language, journal_name, type, record)
-
+
return html
-
-
-
+
+
+
def search(self, req, form):
"""
Creates a temporary record containing all the information needed for
the search, meaning list of issue_numbers (timeframe), list of keywords,
list of categories to search in. In this way everything can be configured
globally in the config for the given webjournal and we can reuse the bibformat
for whatever search we want.
"""
argd = wash_urlargd(form, {'name': (str, ""),
'issue': (str, ""),
'archive_year': (str, ""),
'archive_issue': (str, ""),
'archive_select': (str, "False"),
'archive_date': (str, ""),
'archive_search': (str, "False"),
'ln': (str, cdslang)})
try:
language = wash_journal_language(argd['ln'])
journal_name = wash_journal_name(language, argd['name'])
archive_issue = wash_issue_number(language, journal_name,
argd['archive_issue'])
archive_date = wash_archive_date(language, journal_name,
argd['archive_date'])
issue_number = wash_issue_number(language, journal_name,
argd['issue'])
archive_year = argd['archive_year']
archive_select = argd['archive_select']
archive_search = argd['archive_search']
except InvenioWebJournalNoJournalOnServerError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoNameError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalNoCurrentIssueError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalIssueNumberBadlyFormedError, e:
register_exception(req=req)
return e.user_box()
except InvenioWebJournalArchiveDateWronglyFormedError, e:
register_exception(req=req)
return e.user_box()
req.journal_defaults = {"name" : journal_name,
"issue" : issue_number,
"archive_year" : archive_year,
"archive_issue" : archive_issue,
"archive_select" : archive_select,
"archive_date" : archive_date,
"archive_search" : archive_search,
"language" : language,
}
-
+
html = perform_request_search(journal_name, language, req, issue_number,
archive_year, archive_issue,
archive_select, archive_date, archive_search)
return html
-
+
#if argd['name'] == "":
# register_exception(stream='warning',
# suffix="User tried to search without providing a journal name.")
# return webjournal_missing_info_box(req, title="Journal not found",
# msg_title="We don't know which journal you are looking for",
# msg='''You were looking for a journal without providing a name.
# Unfortunately we cannot know which journal you are looking for.
# Below you have a selection of journals that are available on this server.
# If you should find your journal there, just click the link,
# otherwise please contact the server admin and ask for existence
# of the journal you are looking for.''')
#else:
# journal_name = argd['name']
-
+
# config_strings = get_xml_from_config(["search", "issue_number", "rule"], journal_name)
# try:
- # try:
+ # try:
# search_page_template = config_strings["search"][0]
# except:
# raise InvenioWebJournalNoArticleTemplateError(journal_name) # todo: new exception
# except InvenioWebJournalNoArticleTemplateError:
# register_exception(req=req)
# return webjournal_error_box(req,
# "Search Page Template not found",
# "Problem with the configuration for this journal.",
# "The system couldn't find the template for the search result page of this journal. This is a mandatory file and thus indicates that the journal was setup wrong or produced an internal error. If you are neither admin nor developer there is nothing you can do at this point, but send an email request. We apologize for the inconvenience.")
# search_page_template_path = 'webjournal/%s' % (search_page_template)
# try:
# try:
# issue_number_tag = config_strings["issue_number"][0]
# except KeyError:
# raise InvenioWebJournalNoIssueNumberTagError(journal_name)
# except InvenioWebJournalNoIssueNumberTagError:
# register_exception(req=req)
# return webjournal_error_box(req,
# title="No Issues",
# title_msg="Problem with the configuration of this journal",
# msg="The system couldn't find a definition for an issue numbering system. Issue numbers conrol the date of the publication you are seing. This indicates that there is an error in the setup of this journal or the Software itself. There is nothing you can do at the moment. If you wish you can send an inquiry to the responsible developers. We apologize for the inconvenience.")
# rule_list = config_strings["rule"]
# try:
# if len(rule_list) == 0:
- # raise InvenioWebJournalNoArticleRuleError()
- # except InvenioWebJournalNoArticleRuleError, e:
+ # raise InvenioWebJournalNoArticleRuleError()
+ # except InvenioWebJournalNoArticleRuleError, e:
# register_exception(req=req)
# return webjournal_error_box(req,
# "No searchable Articles",
# "Problem with the configuration of this journal",
# "The system couldn't find the definitions for different article kinds (e.g. News, Sports, etc.). If there is nothing defined, nothing can be shown and it thus indicates that there is either a problem with the setup of this journal or in the Software itself. There is nothing you can do at this moment. If you wish you can send an inquiry to the responsible developers. We apologize for the inconvenience.")
# category_rules = []
# if argd['category'] == []:
# # append all categories
# for rule_string in rule_list:
# marc = {}
# marc["category"] = rule_string.split(",")[0]
# rule = rule_string.split(",")[1]
# marc_datafield = rule.split(":")[0]
# marc["rule_match"] = rule.split(":")[1]
# marc["marc_tag"] = marc_datafield[1:4]
# marc["marc_ind1"] = (marc_datafield[4] == "_") and " " or marc_datafield[4]
# marc["marc_ind2"] = (marc_datafield[5] == "_") and " " or marc_datafield[5]
# marc["marc_subfield"] = marc_datafield[6]
# category_rules.append(marc)
# else:
# # append only categories from the url param
# for single_category in argd['category']:
# rule_string = get_rule_string_from_rule_list(rule_list, single_category)
# marc = {}
# marc["category"] = rule_string.split(",")[0]
# rule = rule_string.split(",")[1]
# marc_datafield = rule.split(":")[0]
# marc["rule_match"] = rule.split(":")[1]
# marc["marc_tag"] = marc_datafield[1:4]
# marc["marc_ind1"] = (marc_datafield[4] == "_") and " " or marc_datafield[4]
# marc["marc_ind2"] = (marc_datafield[5] == "_") and " " or marc_datafield[5]
# marc["marc_subfield"] = marc_datafield[6]
# category_rules.append(marc)
- #
+ #
# category_fields = "\n".join(['''
# <datafield tag="%s" ind1="%s" ind2="%s">
# <subfield code="%s">%s</subfield></datafield>
- #
+ #
# ''' % (marc["marc_tag"],
# marc["marc_ind1"],
# marc["marc_ind2"],
# marc["marc_subfield"],
# marc["rule_match"]) for marc in category_rules])
- #
+ #
# issue_number_fields = "\n".join(['''
# <datafield tag="%s" ind1="%s" ind2="%s">
# <subfield code="%s">%s</subfield>
# </datafield>
# ''' % (issue_number_tag[:3],
# (issue_number_tag[3] == "_") and " " or issue_number_tag[3],
# (issue_number_tag[4] == "_") and " " or issue_number_tag[4],
# issue_number_tag[5],
# issue_number) for issue_number in argd['issue']])
- #
+ #
# temp_marc = '''<record>
# <controlfield tag="001">0</controlfield>
# %s
# %s
# </record>''' % (issue_number_fields, category_fields)
#
#
# # create a record and get HTML back from bibformat
# bfo = BibFormatObject(0, ln=argd['ln'], xml_record=temp_marc, req=req) # pass 0 for rn, we don't need it
# html_out = format_with_format_template(search_page_template_path, bfo)[0]
- #
+ #
# #perform_request_search(cc="News Articles", p="families and 773__n:23/2007")
# #cc = argd['category']
# #p = keyword
# #for issue_number in argd['issue_number']:
# # p += " and 773__n:%s" % issue_number
# ## todo: issue number tag generic from config
# #results = perform_request_search(cc=cc, p=p)
- #
+ #
# return html_out
if __name__ == "__main__":
index()
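The handler above follows a recurring WebJournal pattern: every URL argument is passed through a `wash_*` function, and each `InvenioWebJournal*Error` is caught, registered, and rendered as a user-facing box via `e.user_box()` instead of a traceback. A minimal sketch of that pattern, with a hypothetical `JournalError`/`wash_issue_number` pair standing in for the real Invenio helpers (Python 3 `except ... as e` syntax is used here; the codebase itself uses the Python 2 `except ..., e` form):

```python
import re

class JournalError(Exception):
    # Hypothetical stand-in for the InvenioWebJournal*Error hierarchy.
    def user_box(self):
        # Render a user-facing error box instead of a traceback.
        return "<div class='errorbox'>%s</div>" % self.args[0]

def wash_issue_number(raw):
    # Accept only "NN/YYYY"-style issue numbers, e.g. "23/2007".
    if not re.match(r"^\d{1,2}/\d{4}$", raw):
        raise JournalError("Badly formed issue number: %s" % raw)
    return raw

def handle(raw_issue):
    # Wash first; convert validation failures into a rendered error box.
    try:
        issue = wash_issue_number(raw_issue)
    except JournalError as e:
        return e.user_box()
    return "issue %s ok" % issue
```

The point of the pattern is that validation and presentation of bad input stay out of the main request logic: the happy path reads top to bottom, and every failure mode maps to one exception class with its own rendering.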
diff --git a/modules/webjournal/lib/widgets/bfe_webjournal_widget_seminars.py b/modules/webjournal/lib/widgets/bfe_webjournal_widget_seminars.py
index e5461fa25..df67bb5f3 100644
--- a/modules/webjournal/lib/widgets/bfe_webjournal_widget_seminars.py
+++ b/modules/webjournal/lib/widgets/bfe_webjournal_widget_seminars.py
@@ -1,155 +1,155 @@
# -*- coding: utf-8 -*-
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
-from invenio.config import cachedir
+from invenio.config import CFG_CACHEDIR
from urllib2 import urlopen
from xml.dom import minidom
import time
Cached_Filename = "webjournal_widget_seminars.xml"
Indico_Seminar_Location = "http://indico.cern.ch/tools/export.py?fid=1l7&date=today&days=1&of=xml"
Update_Frequency = 3600 # in seconds
def format(bfo):
"""
"""
out = get_widget_HTML(bfo)
return out
def escape_values(bfo):
"""
"""
return 0
def get_widget_HTML(bfo):
"""
Indico seminars of the day service
Gets seminars of the day from CERN Indico every 60 minutes and displays
them in a widget.
"""
try:
- seminar_xml = minidom.parse('%s/%s' % (cachedir, Cached_Filename))
+ seminar_xml = minidom.parse('%s/%s' % (CFG_CACHEDIR, Cached_Filename))
except:
_update_seminars()
- seminar_xml = minidom.parse('%s/%s' % (cachedir, Cached_Filename))
+ seminar_xml = minidom.parse('%s/%s' % (CFG_CACHEDIR, Cached_Filename))
try:
timestamp = seminar_xml.firstChild.getAttribute("time")
except:
timestamp = time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime())
-
+
last_update = time.mktime(time.strptime(timestamp, "%a, %d %b %Y %H:%M:%S %Z"))
now = time.mktime(time.gmtime())
if last_update + Update_Frequency < now:
_update_seminars()
- seminar_xml = minidom.parse('%s/%s' % (cachedir, Cached_Filename))
+ seminar_xml = minidom.parse('%s/%s' % (CFG_CACHEDIR, Cached_Filename))
html = ""
seminars = seminar_xml.getElementsByTagName("seminar")
if len(seminars) == 0:
return "
"
-
+
return html.encode('utf-8')
-
+
def _update_seminars():
"""
helper function that gets the xml data source from CERN Indico and creates
a dedicated xml file in the cache for easy use in the widget.
"""
indico_xml = urlopen(Indico_Seminar_Location)
xml_file_handler = minidom.parseString(indico_xml.read())
seminar_xml = ['<seminars time="%s">' % time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime()), ]
agenda_items = xml_file_handler.getElementsByTagName("agenda_item")
for item in agenda_items:
seminar_xml.extend(["", ])
try:
start_time = item.getElementsByTagName("start_time")[0].firstChild.toxml()
except:
start_time = ""
seminar_xml.extend(["%s" % start_time, ])
try:
category = item.getElementsByTagName("category")[0].firstChild.toxml()
category = category.split("/")[-1]
category = category.replace("&", "")
category = category.replace("nbsp;", "")
- category = category.replace(" ", "")
+ category = category.replace(" ", "")
except:
category = ""
seminar_xml.extend(["%s" % category, ])
try:
title = item.getElementsByTagName("title")[0].firstChild.toxml()
except:
title = ""
seminar_xml.extend(["%s" % title, ])
try:
url = item.getElementsByTagName("agenda_url")[0].firstChild.toxml()
except:
url = "#"
seminar_xml.extend(["%s" % url, ])
try:
speaker = item.getElementsByTagName("speaker")[0].firstChild.toxml()
except:
speaker = ""
seminar_xml.extend(["%s" % speaker, ])
try:
room = item.getElementsByTagName("room")[0].firstChild.toxml()
except:
room = ""
seminar_xml.extend(["%s" % room, ])
seminar_xml.extend(["", ])
- seminar_xml.extend(["", ])
+ seminar_xml.extend(["", ])
# write the created file to cache
- fptr = open("%s/%s" % (cachedir, Cached_Filename), "w")
+ fptr = open("%s/%s" % (CFG_CACHEDIR, Cached_Filename), "w")
fptr.write(("\n".join(seminar_xml)).encode('utf-8'))
fptr.close()
-
+
if __name__ == "__main__":
- get_widget_HTML()
\ No newline at end of file
+ get_widget_HTML()
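The seminars widget implements a simple file-based cache: parse the cached XML, regenerate it on failure, and regenerate again when the cached copy is older than `Update_Frequency`. The same idea can be sketched generically; `get_cached` is a hypothetical helper (not Invenio API), using file mtime where the widget parses a stored timestamp attribute:

```python
import os
import time

CACHE_TTL = 3600  # seconds, mirroring the widget's Update_Frequency

def get_cached(path, fetch):
    """Return the contents of `path`, refreshing it via fetch() when the
    cached file is missing or older than CACHE_TTL seconds."""
    try:
        # Treat a stale file the same as a missing one.
        if time.time() - os.path.getmtime(path) > CACHE_TTL:
            raise OSError("stale cache")
    except OSError:
        # Missing or stale: fetch fresh data and rewrite the cache file.
        with open(path, "w") as f:
            f.write(fetch())
    with open(path) as f:
        return f.read()
```

Keying freshness on the file's mtime avoids having to embed and re-parse a timestamp inside the cached document, which is where the widget's `time.strptime` round-trip can fail.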
diff --git a/modules/webjournal/lib/widgets/bfe_webjournal_widget_weather.py b/modules/webjournal/lib/widgets/bfe_webjournal_widget_weather.py
index e3dca69fe..c1f3b528e 100644
--- a/modules/webjournal/lib/widgets/bfe_webjournal_widget_weather.py
+++ b/modules/webjournal/lib/widgets/bfe_webjournal_widget_weather.py
@@ -1,132 +1,132 @@
# -*- coding: utf-8 -*-
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
"""
from invenio import errorlib
-from invenio.config import cachedir
+from invenio.config import CFG_CACHEDIR
import feedparser
import time
from urllib2 import urlopen
from invenio.errorlib import register_exception
import re
Weather_Service = "Yahoo! Weather"
# rss feed on yahoo weather, check developer.yahoo.com/weather for details
RSS_Feed = "http://weather.yahooapis.com/forecastrss?p=SZXX0008&u=c"
# filename of the rss feed in cache
Cached_Filename = "webjournal_widget_YahooWeather.rss"
# filename of flat file in cache that holds the expire time
Expire_Time_Filename = "weather_RSS_expires"
image_pattern = re.compile('''
(<img\s*src=)(?P<image>\S*)\s*/>
''', re.DOTALL | re.IGNORECASE | re.VERBOSE)
def format(bfo, title_en="", title_fr=""):
"""
wrapper function needed for BibFormat to route the widget HTML
"""
out = get_widget_HTML()
if bfo.lang == "fr":
title = title_fr
else:
title = title_en
if title != "":
try:
weather_image_match = image_pattern.findall(out)[0]
weather_image = weather_image_match[1]
out = re.sub(image_pattern, "", out)
except:
register_exception(req=bfo.req)
weather_image = ""
weather_image = weather_image.replace("\"", "\'")
out = '''<div class="weatherWidget" style="background-image:url(%s);">
<h3>%s</h3>
%s
</div>''' % (weather_image, title, out)
return out
def escape_values(bfo):
"""
"""
return 0
def get_widget_HTML():
"""
weather forecast using Yahoo! Weather service
we check and store the "expires" data from the rss feed to decide when
an update is needed.
- there always resides a cached version in cds cachedir along with a flat
+ a cached copy always resides in CFG_CACHEDIR along with a flat
file that indicates the time when the feed expires.
"""
try:
- weather_feed = feedparser.parse('%s/%s' % (cachedir, Cached_Filename))
+ weather_feed = feedparser.parse('%s/%s' % (CFG_CACHEDIR, Cached_Filename))
except:
_update_feed()
- weather_feed = feedparser.parse('%s/%s' % (cachedir, Cached_Filename))
-
+ weather_feed = feedparser.parse('%s/%s' % (CFG_CACHEDIR, Cached_Filename))
+
now_in_gmt = time.gmtime()
now_time_string = time.strftime( "%a, %d %b %Y %H:%M:%S GMT", now_in_gmt)
try:
- expire_time_string = open('%s/%s' (cachedir, Expire_Time_Filename)).read()
+ expire_time_string = open('%s/%s' % (CFG_CACHEDIR, Expire_Time_Filename)).read()
expire_time = time.strptime(expire_time_string, "%a, %d %b %Y %H:%M:%S %Z")
#expire_time['tm_isdt'] = 0
expire_in_seconds = time.mktime(expire_time)
now_in_seconds = time.mktime(now_in_gmt)
diff = time.mktime(expire_time) - time.mktime(now_in_gmt)
except:
diff = -1
if diff < 0:
_update_feed()
- weather_feed = feedparser.parse('%s/%s' % (cachedir, Cached_Filename))
-
+ weather_feed = feedparser.parse('%s/%s' % (CFG_CACHEDIR, Cached_Filename))
+
# construct the HTML
html = weather_feed.entries[0]['summary']
-
+
return html
-
-
+
+
def _update_feed():
"""
helper function that updates the feed by copying the new rss file to the
cache dir and resetting the time string on the expireTime flat file
"""
feed = urlopen(RSS_Feed)
- cached_file = open('%s/%s' % (cachedir, Cached_Filename), 'w')
+ cached_file = open('%s/%s' % (CFG_CACHEDIR, Cached_Filename), 'w')
cached_file.write(feed.read())
cached_file.close()
feed_data = feedparser.parse(RSS_Feed)
expire_time = feed_data.headers['expires']
- expire_file = open('%s/%s' % (cachedir, Expire_Time_Filename), 'w')
+ expire_file = open('%s/%s' % (CFG_CACHEDIR, Expire_Time_Filename), 'w')
expire_file.write(expire_time)
expire_file.close()
if __name__ == "__main__":
from invenio.bibformat_engine import BibFormatObject
myrec = BibFormatObject(7)
- format(myrec)
\ No newline at end of file
+ format(myrec)
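Both widgets build cache paths with the `'%s/%s' % (dir, name)` idiom. A `%` operator left out of that expression makes Python try to *call* the string object, raising a TypeError at runtime rather than failing at import, which is easy to miss in rarely-exercised branches. A minimal illustration (`cache_path` is a hypothetical helper, not Invenio API):

```python
def cache_path(cachedir, filename):
    # String interpolation requires the '%' operator between
    # the format string and the argument tuple.
    return '%s/%s' % (cachedir, filename)

# By contrast, '%s/%s' (cachedir, filename) -- without '%' -- attempts
# to call the str object and raises TypeError when executed.
```

Because the broken form is syntactically valid, only a test or an actual cache-expiry hit will expose it; the safe habit is to exercise every cache branch at least once.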
diff --git a/modules/websearch/lib/search_engine.py b/modules/websearch/lib/search_engine.py
index 7f4aaa607..e489359a2 100644
--- a/modules/websearch/lib/search_engine.py
+++ b/modules/websearch/lib/search_engine.py
@@ -1,4060 +1,4060 @@
# -*- coding: utf-8 -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
# pylint: disable-msg=C0301
"""CDS Invenio Search Engine in mod_python."""
__lastupdated__ = """$Date$"""
__revision__ = "$Id$"
## import general modules:
import cgi
import copy
import string
import os
import re
import time
import urllib
import zlib
## import CDS Invenio stuff:
from invenio.config import \
CFG_CERN_SITE, \
CFG_OAI_ID_FIELD, \
CFG_WEBCOMMENT_ALLOW_REVIEWS, \
CFG_WEBSEARCH_CALL_BIBFORMAT, \
CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX, \
CFG_WEBSEARCH_FIELDS_CONVERT, \
CFG_WEBSEARCH_NB_RECORDS_TO_SORT, \
CFG_WEBSEARCH_SEARCH_CACHE_SIZE, \
CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS, \
CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS, \
cdslang, \
cdsname, \
- logdir, \
+ CFG_LOGDIR, \
weburl
from invenio.search_engine_config import CFG_EXPERIMENTAL_FEATURES, InvenioWebSearchUnknownCollectionError
from invenio.bibrank_record_sorter import get_bibrank_methods, rank_records
from invenio.bibrank_downloads_similarity import register_page_view_event, calculate_reading_similarity_list
from invenio.bibindex_engine_stemmer import stem
from invenio.bibformat import format_record, format_records, get_output_format_content_type, create_excel
from invenio.bibformat_config import CFG_BIBFORMAT_USE_OLD_BIBFORMAT
from invenio.bibrank_downloads_grapher import create_download_history_graph_and_box
from invenio.data_cacher import DataCacher
from invenio.websearch_external_collections import print_external_results_overview, perform_external_collection_search
from invenio.access_control_admin import acc_get_action_id
from invenio.access_control_config import VIEWRESTRCOLL, \
CFG_ACC_GRANT_AUTHOR_RIGHTS_TO_EMAILS_IN_TAGS
from invenio.websearchadminlib import get_detailed_page_tabs
from invenio.intbitset import intbitset as HitSet
from invenio.webinterface_handler import wash_urlargd
from invenio.urlutils import make_canonical_urlargd
from invenio.dbquery import DatabaseError
from invenio.access_control_engine import acc_authorize_action
import invenio.template
webstyle_templates = invenio.template.load('webstyle')
webcomment_templates = invenio.template.load('webcomment')
from invenio.bibrank_citation_searcher import calculate_cited_by_list, calculate_co_cited_with_list, get_self_cited_in, get_self_cited_by
from invenio.bibrank_citation_grapher import create_citation_history_graph_and_box
from invenio.dbquery import run_sql, run_sql_cached, get_table_update_time, Error
from invenio.webuser import getUid, collect_user_info
from invenio.webpage import page, pageheaderonly, pagefooteronly, create_error_box
from invenio.messages import gettext_set_language
try:
from mod_python import apache
except ImportError, e:
pass # ignore user personalisation, needed e.g. for command-line
try:
import invenio.template
websearch_templates = invenio.template.load('websearch')
except:
pass
## global vars:
search_cache = {} # will cache results of previous searches
cfg_nb_browse_seen_records = 100 # limit of the number of records to check when browsing certain collection
cfg_nicely_ordered_collection_list = 0 # do we propose collection list nicely ordered or alphabetical?
collection_reclist_cache_timestamp = 0
field_i18nname_cache_timestamp = 0
collection_i18nname_cache_timestamp = 0
## precompile some often-used regexp for speed reasons:
re_word = re.compile('[\s]')
re_quotes = re.compile('[\'\"]')
re_doublequote = re.compile('\"')
re_equal = re.compile('\=')
re_logical_and = re.compile('\sand\s', re.I)
re_logical_or = re.compile('\sor\s', re.I)
re_logical_not = re.compile('\snot\s', re.I)
re_operators = re.compile(r'\s([\+\-\|])\s')
re_pattern_wildcards_at_beginning = re.compile(r'(\s)[\*\%]+')
re_pattern_single_quotes = re.compile("'(.*?)'")
re_pattern_double_quotes = re.compile("\"(.*?)\"")
re_pattern_regexp_quotes = re.compile("\/(.*?)\/")
re_pattern_short_words = re.compile(r'([\s\"]\w{1,3})[\*\%]+')
re_pattern_space = re.compile("__SPACE__")
re_pattern_today = re.compile("\$TODAY\$")
re_unicode_lowercase_a = re.compile(unicode(r"(?u)[áàäâãå]", "utf-8"))
re_unicode_lowercase_ae = re.compile(unicode(r"(?u)[æ]", "utf-8"))
re_unicode_lowercase_e = re.compile(unicode(r"(?u)[éèëê]", "utf-8"))
re_unicode_lowercase_i = re.compile(unicode(r"(?u)[íìïî]", "utf-8"))
re_unicode_lowercase_o = re.compile(unicode(r"(?u)[óòöôõø]", "utf-8"))
re_unicode_lowercase_u = re.compile(unicode(r"(?u)[úùüû]", "utf-8"))
re_unicode_lowercase_y = re.compile(unicode(r"(?u)[ýÿ]", "utf-8"))
re_unicode_lowercase_c = re.compile(unicode(r"(?u)[çć]", "utf-8"))
re_unicode_lowercase_n = re.compile(unicode(r"(?u)[ñ]", "utf-8"))
re_unicode_uppercase_a = re.compile(unicode(r"(?u)[ÁÀÄÂÃÅ]", "utf-8"))
re_unicode_uppercase_ae = re.compile(unicode(r"(?u)[Æ]", "utf-8"))
re_unicode_uppercase_e = re.compile(unicode(r"(?u)[ÉÈËÊ]", "utf-8"))
re_unicode_uppercase_i = re.compile(unicode(r"(?u)[ÍÌÏÎ]", "utf-8"))
re_unicode_uppercase_o = re.compile(unicode(r"(?u)[ÓÒÖÔÕØ]", "utf-8"))
re_unicode_uppercase_u = re.compile(unicode(r"(?u)[ÚÙÜÛ]", "utf-8"))
re_unicode_uppercase_y = re.compile(unicode(r"(?u)[Ý]", "utf-8"))
re_unicode_uppercase_c = re.compile(unicode(r"(?u)[ÇĆ]", "utf-8"))
re_unicode_uppercase_n = re.compile(unicode(r"(?u)[Ñ]", "utf-8"))
re_latex_lowercase_a = re.compile("\\\\[\"H'`~^vu=k]\{?a\}?")
re_latex_lowercase_ae = re.compile("\\\\ae\\{\\}?")
re_latex_lowercase_e = re.compile("\\\\[\"H'`~^vu=k]\\{?e\\}?")
re_latex_lowercase_i = re.compile("\\\\[\"H'`~^vu=k]\\{?i\\}?")
re_latex_lowercase_o = re.compile("\\\\[\"H'`~^vu=k]\\{?o\\}?")
re_latex_lowercase_u = re.compile("\\\\[\"H'`~^vu=k]\\{?u\\}?")
re_latex_lowercase_y = re.compile("\\\\[\"']\\{?y\\}?")
re_latex_lowercase_c = re.compile("\\\\['uc]\\{?c\\}?")
re_latex_lowercase_n = re.compile("\\\\[c'~^vu]\\{?n\\}?")
re_latex_uppercase_a = re.compile("\\\\[\"H'`~^vu=k]\\{?A\\}?")
re_latex_uppercase_ae = re.compile("\\\\AE\\{?\\}?")
re_latex_uppercase_e = re.compile("\\\\[\"H'`~^vu=k]\\{?E\\}?")
re_latex_uppercase_i = re.compile("\\\\[\"H'`~^vu=k]\\{?I\\}?")
re_latex_uppercase_o = re.compile("\\\\[\"H'`~^vu=k]\\{?O\\}?")
re_latex_uppercase_u = re.compile("\\\\[\"H'`~^vu=k]\\{?U\\}?")
re_latex_uppercase_y = re.compile("\\\\[\"']\\{?Y\\}?")
re_latex_uppercase_c = re.compile("\\\\['uc]\\{?C\\}?")
re_latex_uppercase_n = re.compile("\\\\[c'~^vu]\\{?N\\}?")
class RestrictedCollectionDataCacher(DataCacher):
def __init__(self):
def cache_filler():
ret = []
try:
viewcollid = acc_get_action_id(VIEWRESTRCOLL)
res = run_sql("""SELECT DISTINCT ar.value
FROM accROLE_accACTION_accARGUMENT raa JOIN accARGUMENT ar ON raa.id_accARGUMENT = ar.id
WHERE ar.keyword = 'collection' AND raa.id_accACTION = %s""", (viewcollid,))
except Exception:
# database problems, return empty cache
return []
for coll in res:
ret.append(coll[0])
return ret
def timestamp_getter():
return max(get_table_update_time('accROLE_accACTION_accARGUMENT'), get_table_update_time('accARGUMENT'))
DataCacher.__init__(self, cache_filler, timestamp_getter)
def collection_restricted_p(collection):
cache = restricted_collection_cache.get_cache()
return collection in cache
try:
restricted_collection_cache.is_ok_p
except Exception:
restricted_collection_cache = RestrictedCollectionDataCacher()
def check_user_can_view_record(user_info, recid):
"""Check if the user is authorized to view the given recid. The function
grants access in two cases: either the user has author rights on this record,
or he has view rights to the primary collection this record belongs to.
Returns the same type as acc_authorize_action
"""
def _is_user_in_authorized_author_list_for_recid(user_info, recid):
"""Return True if the user have submitted the given record."""
authorized_emails = []
for tag in CFG_ACC_GRANT_AUTHOR_RIGHTS_TO_EMAILS_IN_TAGS:
authorized_emails.extend(get_fieldvalues(recid, tag))
for email in authorized_emails:
email = email.strip().lower()
if user_info['email'].strip().lower() == email:
return True
return False
record_primary_collection = guess_primary_collection_of_a_record(recid)
if collection_restricted_p(record_primary_collection):
(auth_code, auth_msg) = acc_authorize_action(user_info, VIEWRESTRCOLL, collection=record_primary_collection)
if auth_code == 0 or _is_user_in_authorized_author_list_for_recid(user_info, recid):
return (0, '')
else:
return (auth_code, auth_msg)
else:
return (0, '')
class IndexStemmingDataCacher(DataCacher):
def __init__(self):
def cache_filler():
try:
res = run_sql("""SELECT id, stemming_language FROM idxINDEX""")
except DatabaseError:
# database problems, return empty cache
return {}
return dict(res)
def timestamp_getter():
return get_table_update_time('idxINDEX')
DataCacher.__init__(self, cache_filler, timestamp_getter)
def get_index_stemming_language(index_id):
cache = index_stemming_cache.get_cache()
return cache[index_id]
try:
index_stemming_cache.is_ok_p
except Exception:
index_stemming_cache = IndexStemmingDataCacher()
class FieldI18nNameDataCacher(DataCacher):
def __init__(self):
def cache_filler():
ret = {}
try:
res = run_sql("SELECT f.name,fn.ln,fn.value FROM fieldname AS fn, field AS f WHERE fn.id_field=f.id AND fn.type='ln'") # ln=long name
except Exception:
# database problems, return empty cache
return {}
for f, ln, i18nname in res:
if i18nname:
if not ret.has_key(f):
ret[f] = {}
ret[f][ln] = i18nname
return ret
def timestamp_getter():
return get_table_update_time('fieldname')
DataCacher.__init__(self, cache_filler, timestamp_getter)
def get_field_i18nname(self, f, ln=cdslang):
out = f
try:
out = self.get_cache()[f][ln]
except KeyError:
pass # translation in LN does not exist
return out
try:
if not field_i18n_name_cache.is_ok_p:
raise Exception
except Exception:
field_i18n_name_cache = FieldI18nNameDataCacher()
class CollectionRecListDataCacher(DataCacher):
def __init__(self):
def cache_filler():
ret = {}
try:
res = run_sql("SELECT name,reclist FROM collection")
except Exception:
# database problems, return empty cache
return {}
for name, reclist in res:
ret[name] = None # this will be filled later during runtime by calling get_collection_reclist(coll)
return ret
def timestamp_getter():
return get_table_update_time('collection')
DataCacher.__init__(self, cache_filler, timestamp_getter)
def get_collection_reclist(self, coll):
cache = self.get_cache()
if not cache[coll]:
# not yet in the cache, so calculate it and fill the cache:
set = HitSet()
query = "SELECT nbrecs,reclist FROM collection WHERE name='%s'" % coll
res = run_sql(query, None, 1)
if res:
try:
set = HitSet(res[0][1])
except:
pass
self.cache[coll] = set
cache[coll] = set
# finally, return reclist:
return cache[coll]
try:
if not collection_reclist_cache.is_ok_p:
raise Exception
except Exception:
collection_reclist_cache = CollectionRecListDataCacher()
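The `DataCacher` subclasses above all follow one pattern: a `cache_filler` callback rebuilds the cache, and a `timestamp_getter` callback (typically a table's last-update time) decides when a rebuild is needed. A minimal sketch of that pattern; `SimpleDataCacher` is illustrative, not Invenio's actual `DataCacher` API:

```python
class SimpleDataCacher(object):
    """Rebuild a cached value whenever a timestamp source advances."""

    def __init__(self, cache_filler, timestamp_getter):
        self.cache_filler = cache_filler
        self.timestamp_getter = timestamp_getter
        self.timestamp = None
        self.cache = None

    def get_cache(self):
        # Refill only when the underlying data reports a newer timestamp;
        # otherwise serve the cached value without touching the source.
        current = self.timestamp_getter()
        if self.cache is None or current != self.timestamp:
            self.cache = self.cache_filler()
            self.timestamp = current
        return self.cache
```

In the classes above, `cache_filler` wraps a `run_sql` query and `timestamp_getter` wraps `get_table_update_time`, so the cache invalidates itself as soon as the table changes, with no explicit TTL.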
class CollectionI18nDataCacher(DataCacher):
def __init__(self):
def cache_filler():
ret = {}
try:
res = run_sql("SELECT c.name,cn.ln,cn.value FROM collectionname AS cn, collection AS c WHERE cn.id_collection=c.id AND cn.type='ln'") # ln=long name
except Exception:
# database problems, return empty cache
return {}
for c, ln, i18nname in res:
if i18nname:
if not ret.has_key(c):
ret[c] = {}
ret[c][ln] = i18nname
return ret
def timestamp_getter():
return get_table_update_time('collectionname')
DataCacher.__init__(self, cache_filler, timestamp_getter)
def get_coll_i18nname(self, c, ln=cdslang):
"""Return nicely formatted collection name (of name type 'ln',
'long name') for collection C in language LN."""
cache = self.get_cache()
out = c
try:
out = cache[c][ln]
except KeyError:
pass # translation in LN does not exist
return out
try:
if not collection_i18n_name_cache.is_ok_p:
raise Exception
except Exception:
collection_i18n_name_cache = CollectionI18nDataCacher()
def get_alphabetically_ordered_collection_list(level=0, ln=cdslang):
"""Returns nicely ordered (score respected) list of collections, more exactly list of tuples
(collection name, printable collection name).
Suitable for create_search_box()."""
out = []
query = "SELECT id,name FROM collection ORDER BY name ASC"
res = run_sql(query)
for c_id, c_name in res:
# make a nice printable name (e.g. truncate c_printable for
# long collection names in given language):
c_printable = get_coll_i18nname(c_name, ln)
if len(c_printable)>30:
c_printable = c_printable[:30] + "..."
if level:
c_printable = " " + level * '-' + " " + c_printable
out.append([c_name, c_printable])
return out
def get_nicely_ordered_collection_list(collid=1, level=0, ln=cdslang):
"""Returns nicely ordered (score respected) list of collections, more exactly list of tuples
(collection name, printable collection name).
Suitable for create_search_box()."""
colls_nicely_ordered = []
query = "SELECT c.name,cc.id_son FROM collection_collection AS cc, collection AS c "\
" WHERE c.id=cc.id_son AND cc.id_dad='%s' ORDER BY score DESC" % collid
res = run_sql(query)
for c, cid in res:
# make a nice printable name (e.g. truncate c_printable for
# long collection names in given language):
c_printable = get_coll_i18nname(c, ln)
if len(c_printable)>30:
c_printable = c_printable[:30] + "..."
if level:
c_printable = " " + level * '-' + " " + c_printable
colls_nicely_ordered.append([c, c_printable])
colls_nicely_ordered = colls_nicely_ordered + get_nicely_ordered_collection_list(cid, level+1, ln=ln)
return colls_nicely_ordered
def get_index_id_from_field(field):
"""Returns first index id where the field code FIELD is indexed.
Returns zero in case there is no table for this index.
Example: field='author', output=4."""
out = 0
res = run_sql("""SELECT w.id FROM idxINDEX AS w, idxINDEX_field AS wf, field AS f
WHERE f.code=%s AND wf.id_field=f.id AND w.id=wf.id_idxINDEX
LIMIT 1""", (field,))
if res:
out = res[0][0]
return out
def get_words_from_pattern(pattern):
"Returns list of whitespace-separated words from pattern."
words = {}
for word in string.split(pattern):
if not words.has_key(word):
words[word] = 1
return words.keys()
def create_basic_search_units(req, p, f, m=None, of='hb'):
"""Splits search pattern and search field into a list of independently searchable units.
- A search unit consists of '(operator, pattern, field, type, hitset)' tuples where
'operator' is set union (|), set intersection (+) or set exclusion (-);
'pattern' is either a word (e.g. muon*) or a phrase (e.g. 'nuclear physics');
'field' is either a code like 'title' or MARC tag like '100__a';
'type' is the search type ('w' for word file search, 'a' for access file search).
- Optionally, the function accepts the match type argument 'm'.
If it is set (e.g. from advanced search interface), then it
performs this kind of matching. If it is not set, then a guess is made.
'm' can have values: 'a'='all of the words', 'o'='any of the words',
'p'='phrase/substring', 'r'='regular expression',
'e'='exact value'.
- Warnings are printed on req (when not None) in case of HTML output formats."""
opfts = [] # will hold (o,p,f,t,h) units
## check arguments: if matching type phrase/string/regexp, do we have field defined?
if (m=='p' or m=='r' or m=='e') and not f:
m = 'a'
if of.startswith("h"):
print_warning(req, "This matching type cannot be used within any field. I will perform a word search instead." )
print_warning(req, "If you want to phrase/substring/regexp search in a specific field, e.g. inside title, then please choose within title search option.")
## is desired matching type set?
if m:
## A - matching type is known; good!
if m == 'e':
# A1 - exact value:
opfts.append(['+', p, f, 'a']) # '+' since we have only one unit
elif m == 'p':
# A2 - phrase/substring:
opfts.append(['+', "%" + p + "%", f, 'a']) # '+' since we have only one unit
elif m == 'r':
# A3 - regular expression:
opfts.append(['+', p, f, 'r']) # '+' since we have only one unit
elif m == 'a' or m == 'w':
# A4 - all of the words:
p = strip_accents(p) # strip accents for 'w' mode, FIXME: delete when not needed
for word in get_words_from_pattern(p):
opfts.append(['+', word, f, 'w']) # '+' in all units
elif m == 'o':
# A5 - any of the words:
p = strip_accents(p) # strip accents for 'w' mode, FIXME: delete when not needed
for word in get_words_from_pattern(p):
if len(opfts)==0:
opfts.append(['+', word, f, 'w']) # '+' in the first unit
else:
opfts.append(['|', word, f, 'w']) # '|' in further units
else:
if of.startswith("h"):
print_warning(req, "Matching type '%s' is not implemented yet." % m, "Warning")
opfts.append(['+', "%" + p + "%", f, 'a'])
else:
## B - matching type is not known: let us try to determine it by some heuristics
if f and p[0] == '"' and p[-1] == '"':
## B0 - does 'p' start and end by double quote, and is 'f' defined? => doing ACC search
opfts.append(['+', p[1:-1], f, 'a'])
elif f and p[0] == "'" and p[-1] == "'":
## B0bis - does 'p' start and end by single quote, and is 'f' defined? => doing ACC search
opfts.append(['+', '%' + p[1:-1] + '%', f, 'a'])
elif f and p[0] == "/" and p[-1] == "/":
## B0ter - does 'p' start and end by a slash, and is 'f' defined? => doing regexp search
opfts.append(['+', p[1:-1], f, 'r'])
elif f and string.find(p, ',') >= 0:
## B1 - does 'p' contain comma, and is 'f' defined? => doing ACC search
opfts.append(['+', p, f, 'a'])
elif f and str(f[0:2]).isdigit():
## B2 - does 'f' exist and starts by two digits? => doing ACC search
opfts.append(['+', p, f, 'a'])
else:
## B3 - doing WRD search, but maybe ACC too
# search units are separated by spaces unless the space is within single or double quotes
# so, let us replace temporarily any space within quotes by '__SPACE__'
p = re_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", p)
p = re_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", p)
p = re_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", p)
# wash argument:
p = re_equal.sub(":", p)
p = re_logical_and.sub(" ", p)
p = re_logical_or.sub(" |", p)
p = re_logical_not.sub(" -", p)
p = re_operators.sub(r' \1', p)
for pi in string.split(p): # iterate through separated units (or items, as "pi" stands for "p item")
pi = re_pattern_space.sub(" ", pi) # replace back '__SPACE__' by ' '
# firstly, determine set operator
if pi[0] == '+' or pi[0] == '-' or pi[0] == '|':
oi = pi[0]
pi = pi[1:]
else:
# okay, there is no operator, so let us decide what to do by default
oi = '+' # by default we are doing set intersection...
# secondly, determine search pattern and field:
if string.find(pi, ":") > 0:
fi, pi = string.split(pi, ":", 1)
else:
fi, pi = f, pi
# look also for old ALEPH field names:
if fi and CFG_WEBSEARCH_FIELDS_CONVERT.has_key(string.lower(fi)):
fi = CFG_WEBSEARCH_FIELDS_CONVERT[string.lower(fi)]
# wash 'pi' argument:
if re_quotes.match(pi):
# B3a - quotes are found => do ACC search (phrase search)
if fi:
if pi[0] == '"' and pi[-1] == '"':
pi = string.replace(pi, '"', '') # remove quote signs
opfts.append([oi, pi, fi, 'a'])
elif pi[0] == "'" and pi[-1] == "'":
pi = string.replace(pi, "'", "") # remove quote signs
opfts.append([oi, "%" + pi + "%", fi, 'a'])
else: # unbalanced quotes, so do WRD query:
opfts.append([oi, pi, fi, 'w'])
else:
                        # fi is not defined; check whether we are doing an exact or a subphrase search (double vs. single quotes):
if pi[0] == '"' and pi[-1] == '"':
opfts.append([oi, pi[1:-1], "anyfield", 'a'])
if of.startswith("h"):
print_warning(req, "Searching for an exact match inside any field may be slow. You may want to search for words instead, or choose to search within specific field.")
else:
# nope, subphrase in global index is not possible => change back to WRD search
pi = strip_accents(pi) # strip accents for 'w' mode, FIXME: delete when not needed
for pii in get_words_from_pattern(pi):
# since there may be '-' and other chars that we do not index in WRD
opfts.append([oi, pii, fi, 'w'])
if of.startswith("h"):
print_warning(req, "The partial phrase search does not work in any field. I'll do a boolean AND searching instead.")
print_warning(req, "If you want to do a partial phrase search in a specific field, e.g. inside title, then please choose 'within title' search option.", "Tip")
print_warning(req, "If you want to do exact phrase matching, then please use double quotes.", "Tip")
                elif fi and str(fi[0:2]).isdigit():
# B3b - fi exists and starts by two digits => do ACC search
opfts.append([oi, pi, fi, 'a'])
elif fi and not get_index_id_from_field(fi):
# B3c - fi exists but there is no words table for fi => try ACC search
opfts.append([oi, pi, fi, 'a'])
elif fi and pi.startswith('/') and pi.endswith('/'):
# B3d - fi exists and slashes found => try regexp search
opfts.append([oi, pi[1:-1], fi, 'r'])
else:
# B3e - general case => do WRD search
pi = strip_accents(pi) # strip accents for 'w' mode, FIXME: delete when not needed
for pii in get_words_from_pattern(pi):
opfts.append([oi, pii, fi, 'w'])
## sanity check:
for i in range(0, len(opfts)):
try:
pi = opfts[i][1]
if pi == '*':
if of.startswith("h"):
print_warning(req, "Ignoring standalone wildcard word.", "Warning")
del opfts[i]
if pi == '' or pi == ' ':
fi = opfts[i][2]
if fi:
if of.startswith("h"):
print_warning(req, "Ignoring empty %s search term." % fi, "Warning")
del opfts[i]
except:
pass
## return search units:
return opfts
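# The heuristics above can be illustrated with a minimal standalone sketch
# (a hypothetical helper, not part of this module) that turns a query string
# into [set_operator, pattern, field, match_type] units the way the B3 branch
# does, without the washing/quoting machinery:

```python
def sketch_basic_search_units(p, f=""):
    """Minimal sketch of the B3 heuristics: split a query into
    [set_operator, pattern, field, match_type] units, where the operator
    is '+' (AND, the default), '-' (NOT) or '|' (OR), and match_type is
    'r' (regexp, for /slashed/ patterns) or 'w' (word)."""
    units = []
    for pi in p.split():
        # set operator: leading '+', '-' or '|'; default is intersection
        oi = '+'
        if pi[0] in '+-|':
            oi, pi = pi[0], pi[1:]
        # optional "field:pattern" prefix overrides the default field
        fi = f
        if ':' in pi:
            fi, pi = pi.split(':', 1)
        # slashes denote a regular-expression unit, otherwise a word unit
        if pi.startswith('/') and pi.endswith('/') and len(pi) > 1:
            units.append([oi, pi[1:-1], fi, 'r'])
        else:
            units.append([oi, pi, fi, 'w'])
    return units
```

This is only a sketch of the unit structure; the real function additionally
washes quotes, logical operators, and ALEPH field aliases.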
def page_start(req, of, cc, as, ln, uid, title_message=None,
description='', keywords='', recID=-1, tab=''):
"Start page according to given output format."
_ = gettext_set_language(ln)
if not title_message: title_message = _("Search Results")
if not req:
return # we were called from CLI
content_type = get_output_format_content_type(of)
if of.startswith('x'):
if of == 'xr':
# we are doing RSS output
req.content_type = "application/rss+xml"
req.send_http_header()
            req.write("""<?xml version="1.0" encoding="UTF-8"?>\n""")
else:
# we are doing XML output:
req.content_type = "text/xml"
req.send_http_header()
            req.write("""<?xml version="1.0" encoding="UTF-8"?>\n""")
elif of.startswith('t') or str(of[0:3]).isdigit():
# we are doing plain text output:
req.content_type = "text/plain"
req.send_http_header()
elif of == "id":
pass # nothing to do, we shall only return list of recIDs
elif content_type == 'text/html':
# we are doing HTML output:
req.content_type = "text/html"
req.send_http_header()
if not description:
description = "%s %s." % (cc, _("Search Results"))
if not keywords:
keywords = "%s, WebSearch, %s" % (get_coll_i18nname(cdsname, ln), get_coll_i18nname(cc, ln))
argd = {}
if req.args:
argd = cgi.parse_qs(req.args)
rssurl = websearch_templates.build_rss_url(argd)
navtrail = create_navtrail_links(cc, as, ln)
navtrail_append_title_p = 1
# FIXME: Find a good point to put this code.
# This is a nice hack to trigger jsMath only when displaying single
# records.
if of.lower() in CFG_WEBSEARCH_USE_JSMATH_FOR_FORMATS:
metaheaderadd = """
"""
else:
metaheaderadd = ''
        if tab != '' or ((of != '' and of.lower() != 'hd') and of != 'hb'):
# If we are not in information tab in HD format, customize
# the nav. trail to have a link back to main record. (Due
# to the way perform_request_search() works, hb
# (lowercase) is equal to hd)
            if (of != '' and of.lower() != 'hd') and of != 'hb':
# Export
format_name = of
query = "SELECT name FROM format WHERE code=%s"
res = run_sql(query, (of,))
if res:
format_name = res[0][0]
                navtrail += ' &gt; <a class="navtrail" href="%s/record/%s">%s</a> &gt; %s' % \
                            (weburl, recID, title_message, format_name)
else:
# Discussion, citations, etc. tabs
tab_label = get_detailed_page_tabs(cc, ln=ln)[tab]['label']
                navtrail += ' &gt; <a class="navtrail" href="%s/record/%s">%s</a> &gt; %s' % \
                            (weburl, recID, title_message, _(tab_label))
navtrail_append_title_p = 0
req.write(pageheaderonly(req=req, title=title_message,
navtrail=navtrail,
description=description,
keywords=keywords,
metaheaderadd=metaheaderadd,
uid=uid,
language=ln,
navmenuid='search',
navtrail_append_title_p=\
navtrail_append_title_p,
rssurl=rssurl))
req.write(websearch_templates.tmpl_search_pagestart(ln=ln))
#else:
# req.send_http_header()
def page_end(req, of="hb", ln=cdslang):
"End page according to given output format: e.g. close XML tags, add HTML footer, etc."
if of == "id":
return [] # empty recID list
if not req:
return # we were called from CLI
if of.startswith('h'):
req.write(websearch_templates.tmpl_search_pageend(ln = ln)) # pagebody end
req.write(pagefooteronly(lastupdated=__lastupdated__, language=ln, req=req))
return "\n"
def create_inputdate_box(name="d1", selected_year=0, selected_month=0, selected_day=0, ln=cdslang):
"Produces 'From Date', 'Until Date' kind of selection box. Suitable for search options."
_ = gettext_set_language(ln)
    box = ""
    # day
    box += """<select name="%sd"><option value="">%s""" % (name, _("any day"))
    for day in range(1, 32):
        box += """<option value="%02d"%s>%02d""" % (day, is_selected(day, selected_day), day)
    box += """</select>"""
    # month
    box += """<select name="%sm"><option value="">%s""" % (name, _("any month"))
    for mm, month in [(1, _("January")), (2, _("February")), (3, _("March")),
                      (4, _("April")), (5, _("May")), (6, _("June")),
                      (7, _("July")), (8, _("August")), (9, _("September")),
                      (10, _("October")), (11, _("November")), (12, _("December"))]:
        box += """<option value="%d"%s>%s""" % (mm, is_selected(mm, selected_month), month)
    box += """</select>"""
    # year
    box += """<select name="%sy"><option value="">%s""" % (name, _("any year"))
    this_year = int(time.strftime("%Y", time.localtime()))
    for year in range(this_year - 20, this_year + 1):
        box += """<option value="%d"%s>%d""" % (year, is_selected(year, selected_year), year)
    box += """</select>"""
    return box
def create_search_box(cc, colls, p, f, rg, sf, so, sp, rm, of, ot, as,
ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3,
m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec,
action=""):
"""Create search box for 'search again in the results page' functionality."""
# load the right message language
_ = gettext_set_language(ln)
# some computations
cc_intl = get_coll_i18nname(cc, ln)
cc_colID = get_colID(cc)
colls_nicely_ordered = []
if cfg_nicely_ordered_collection_list:
colls_nicely_ordered = get_nicely_ordered_collection_list(ln=ln)
else:
colls_nicely_ordered = get_alphabetically_ordered_collection_list(ln=ln)
colls_nice = []
for (cx, cx_printable) in colls_nicely_ordered:
if not cx.startswith("Unnamed collection"):
colls_nice.append({ 'value' : cx,
'text' : cx_printable
})
coll_selects = []
if colls and colls[0] != cdsname:
# some collections are defined, so print these first, and only then print 'add another collection' heading:
for c in colls:
if c:
temp = []
temp.append({ 'value' : '',
'text' : '*** %s ***' % _("any collection")
})
for val in colls_nice:
# print collection:
                    if not val['value'].startswith("Unnamed collection"):
temp.append({ 'value' : val['value'],
'text' : val['text'],
'selected' : (c == re.sub("^[\s\-]*","", val['value']))
})
coll_selects.append(temp)
coll_selects.append([{ 'value' : '',
'text' : '*** %s ***' % _("add another collection")
}] + colls_nice)
else: # we searched in CDSNAME, so print 'any collection' heading
coll_selects.append([{ 'value' : '',
'text' : '*** %s ***' % _("any collection")
}] + colls_nice)
sort_fields = [{
'value' : '',
'text' : _("latest first")
}]
query = """SELECT DISTINCT(f.code),f.name FROM field AS f, collection_field_fieldvalue AS cff
WHERE cff.type='soo' AND cff.id_field=f.id
ORDER BY cff.score DESC, f.name ASC"""
res = run_sql(query)
for code, name in res:
sort_fields.append({
'value' : code,
'text' : name,
})
## ranking methods
ranks = [{
'value' : '',
        'text' : "- %s %s -" % (_("OR").lower(), _("rank by")),
}]
for (code, name) in get_bibrank_methods(cc_colID, ln):
# propose found rank methods:
ranks.append({
'value' : code,
'text' : name,
})
formats = []
query = """SELECT code,name FROM format WHERE visibility='1' ORDER BY name ASC"""
res = run_sql(query)
if res:
# propose found formats:
for code, name in res:
formats.append({ 'value' : code,
'text' : name
})
else:
formats.append({'value' : 'hb',
'text' : _("HTML brief")
})
return websearch_templates.tmpl_search_box(
ln = ln,
as = as,
cc_intl = cc_intl,
cc = cc,
ot = ot,
sp = sp,
action = action,
fieldslist = get_searchwithin_fields(ln=ln, colID=cc_colID),
f1 = f1,
f2 = f2,
f3 = f3,
m1 = m1,
m2 = m2,
m3 = m3,
p1 = p1,
p2 = p2,
p3 = p3,
op1 = op1,
op2 = op2,
rm = rm,
p = p,
f = f,
coll_selects = coll_selects,
d1y = d1y, d2y = d2y, d1m = d1m, d2m = d2m, d1d = d1d, d2d = d2d,
dt = dt,
sort_fields = sort_fields,
sf = sf,
so = so,
ranks = ranks,
sc = sc,
rg = rg,
formats = formats,
of = of,
pl = pl,
jrec = jrec,
ec = ec,
)
def create_navtrail_links(cc=cdsname, as=0, ln=cdslang, self_p=1, tab=''):
"""Creates navigation trail links, i.e. links to collection
ancestors (except Home collection). If as==1, then links to
Advanced Search interfaces; otherwise Simple Search.
"""
dads = []
for dad in get_coll_ancestors(cc):
if dad != cdsname: # exclude Home collection
dads.append ((dad, get_coll_i18nname (dad, ln)))
if self_p and cc != cdsname:
dads.append((cc, get_coll_i18nname(cc, ln)))
return websearch_templates.tmpl_navtrail_links(
as=as, ln=ln, dads=dads)
def get_searchwithin_fields(ln='en', colID=None):
    """Retrieves the field names used in the 'search within' selection box for the collection ID colID."""
res = None
if colID:
res = run_sql_cached("""SELECT f.code,f.name FROM field AS f, collection_field_fieldvalue AS cff
WHERE cff.type='sew' AND cff.id_collection=%s AND cff.id_field=f.id
ORDER BY cff.score DESC, f.name ASC""", (colID,))
if not res:
res = run_sql_cached("SELECT code,name FROM field ORDER BY name ASC")
fields = [{
'value' : '',
'text' : get_field_i18nname("any field", ln)
}]
for field_code, field_name in res:
if field_code and field_code != "anyfield":
fields.append({ 'value' : field_code,
'text' : get_field_i18nname(field_name, ln)
})
return fields
def create_andornot_box(name='op', value='', ln='en'):
"Returns HTML code for the AND/OR/NOT selection box."
_ = gettext_set_language(ln)
out = """
""" % (name,
is_selected('a', value), _("AND"),
is_selected('o', value), _("OR"),
is_selected('n', value), _("AND NOT"))
return out
def create_matchtype_box(name='m', value='', ln='en'):
"Returns HTML code for the 'match type' selection box."
_ = gettext_set_language(ln)
out = """
""" % (name,
is_selected('a', value), _("All of the words:"),
is_selected('o', value), _("Any of the words:"),
is_selected('e', value), _("Exact phrase:"),
is_selected('p', value), _("Partial phrase:"),
is_selected('r', value), _("Regular expression:"))
return out
def is_selected(var, fld):
"Checks if the two are equal, and if yes, returns ' selected'. Useful for select boxes."
if type(var) is int and type(fld) is int:
if var == fld:
return " selected"
elif str(var) == str(fld):
return " selected"
elif fld and len(fld)==3 and fld[0] == "w" and var == fld[1:]:
return " selected"
return ""
def wash_colls(cc, c, split_colls=0):
"""Wash collection list by checking whether user has deselected
anything under 'Narrow search'. Checks also if cc is a list or not.
Return list of cc, colls_to_display, colls_to_search since the list
of collections to display is different from that to search in.
This is because users might have chosen 'split by collection'
functionality.
    The behaviour of "collections to display" depends solely on
    whether the user has deselected a particular collection: e.g. if
    the user started from the 'Articles and Preprints' page and
    deselected 'Preprints', then the collection to display is
    'Articles'.  If nothing was deselected, then the collection to
    display is 'Articles & Preprints'.
The behaviour of "collections to search in" depends on the
'split_colls' parameter:
    * if it is equal to 0, then we can wash the colls list down
      and search solely in the collection the user started from;
    * if it is equal to 1, then we are splitting to the first level
      of collections, i.e. collections as they appear on the page
      we started to search from;
The function raises exception
InvenioWebSearchUnknownCollectionError
if cc or one of c collections is not known.
"""
colls_out = []
colls_out_for_display = []
# check what type is 'cc':
if type(cc) is list:
for ci in cc:
if collection_reclist_cache.has_key(ci):
# yes this collection is real, so use it:
cc = ci
break
else:
# check once if cc is real:
if not collection_reclist_cache.has_key(cc):
if cc:
raise InvenioWebSearchUnknownCollectionError(cc)
else:
cc = cdsname # cc is not set, so replace it with Home collection
# check type of 'c' argument:
if type(c) is list:
colls = c
else:
colls = [c]
# remove all 'unreal' collections:
colls_real = []
for coll in colls:
if collection_reclist_cache.has_key(coll):
colls_real.append(coll)
else:
if coll:
raise InvenioWebSearchUnknownCollectionError(coll)
colls = colls_real
# check if some real collections remain:
if len(colls)==0:
colls = [cc]
# then let us check the list of non-restricted "real" sons of 'cc' and compare it to 'coll':
res = run_sql("""SELECT c.name FROM collection AS c,
collection_collection AS cc,
collection AS ccc
WHERE c.id=cc.id_son AND cc.id_dad=ccc.id
AND ccc.name=%s AND cc.type='r'
AND c.restricted IS NULL""", (cc,))
l_cc_nonrestricted_sons = []
l_c = colls
for row in res:
l_cc_nonrestricted_sons.append(row[0])
l_c.sort()
l_cc_nonrestricted_sons.sort()
if l_cc_nonrestricted_sons == l_c:
colls_out_for_display = [cc] # yep, washing permitted, it is sufficient to display 'cc'
else:
colls_out_for_display = colls # nope, we need to display all 'colls' successively
# remove duplicates:
colls_out_for_display_nondups=filter(lambda x, colls_out_for_display=colls_out_for_display: colls_out_for_display[x-1] not in colls_out_for_display[x:], range(1, len(colls_out_for_display)+1))
colls_out_for_display = map(lambda x, colls_out_for_display=colls_out_for_display:colls_out_for_display[x-1], colls_out_for_display_nondups)
# second, let us decide on collection splitting:
if split_colls == 0:
# type A - no sons are wanted
colls_out = colls_out_for_display
# elif split_colls == 1:
else:
# type B - sons (first-level descendants) are wanted
for coll in colls_out_for_display:
coll_sons = get_coll_sons(coll)
if coll_sons == []:
colls_out.append(coll)
else:
colls_out = colls_out + coll_sons
# remove duplicates:
colls_out_nondups=filter(lambda x, colls_out=colls_out: colls_out[x-1] not in colls_out[x:], range(1, len(colls_out)+1))
colls_out = map(lambda x, colls_out=colls_out:colls_out[x-1], colls_out_nondups)
return (cc, colls_out_for_display, colls_out)
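# The order-preserving duplicate removal done twice above with the filter/map
# lambdas can be expressed more directly; a minimal standalone equivalent (an
# illustrative helper, not part of this module):

```python
def dedup_keep_last(items):
    """Remove duplicates from a list, keeping the *last* occurrence of
    each element in its original position -- the same effect as the
    filter/map idiom used in wash_colls(), which keeps an element only
    when it does not reappear later in the list."""
    out = []
    for i, item in enumerate(items):
        if item not in items[i + 1:]:
            out.append(item)
    return out
```

Note that the idiom keeps the last occurrence, not the first: for
`['Articles', 'Preprints', 'Articles']` the first `'Articles'` is dropped.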
def strip_accents(x):
"""Strip accents in the input phrase X (assumed in UTF-8) by replacing
accented characters with their unaccented cousins (e.g. é by e).
Return such a stripped X."""
x = re_latex_lowercase_a.sub("a", x)
x = re_latex_lowercase_ae.sub("ae", x)
x = re_latex_lowercase_e.sub("e", x)
x = re_latex_lowercase_i.sub("i", x)
x = re_latex_lowercase_o.sub("o", x)
x = re_latex_lowercase_u.sub("u", x)
    x = re_latex_lowercase_y.sub("y", x)
x = re_latex_lowercase_c.sub("c", x)
x = re_latex_lowercase_n.sub("n", x)
x = re_latex_uppercase_a.sub("A", x)
x = re_latex_uppercase_ae.sub("AE", x)
x = re_latex_uppercase_e.sub("E", x)
x = re_latex_uppercase_i.sub("I", x)
x = re_latex_uppercase_o.sub("O", x)
x = re_latex_uppercase_u.sub("U", x)
x = re_latex_uppercase_y.sub("Y", x)
x = re_latex_uppercase_c.sub("C", x)
x = re_latex_uppercase_n.sub("N", x)
# convert input into Unicode string:
try:
y = unicode(x, "utf-8")
except:
return x # something went wrong, probably the input wasn't UTF-8
# asciify Latin-1 lowercase characters:
y = re_unicode_lowercase_a.sub("a", y)
y = re_unicode_lowercase_ae.sub("ae", y)
y = re_unicode_lowercase_e.sub("e", y)
y = re_unicode_lowercase_i.sub("i", y)
y = re_unicode_lowercase_o.sub("o", y)
y = re_unicode_lowercase_u.sub("u", y)
y = re_unicode_lowercase_y.sub("y", y)
y = re_unicode_lowercase_c.sub("c", y)
y = re_unicode_lowercase_n.sub("n", y)
# asciify Latin-1 uppercase characters:
y = re_unicode_uppercase_a.sub("A", y)
y = re_unicode_uppercase_ae.sub("AE", y)
y = re_unicode_uppercase_e.sub("E", y)
y = re_unicode_uppercase_i.sub("I", y)
y = re_unicode_uppercase_o.sub("O", y)
y = re_unicode_uppercase_u.sub("U", y)
y = re_unicode_uppercase_y.sub("Y", y)
y = re_unicode_uppercase_c.sub("C", y)
y = re_unicode_uppercase_n.sub("N", y)
# return UTF-8 representation of the Unicode string:
return y.encode("utf-8")
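# On modern Python the same Latin asciification can be sketched with the
# standard unicodedata module (NFKD decomposition plus dropping the combining
# marks); this is an illustrative alternative, not this module's own
# regexp-based implementation:

```python
import unicodedata

def strip_accents_sketch(s):
    """Replace accented Latin characters by their unaccented cousins,
    e.g. 'é' -> 'e': decompose to NFKD form so each accented letter
    becomes a base letter plus combining marks, then drop the marks."""
    decomposed = unicodedata.normalize('NFKD', s)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Unlike the table above, this handles any decomposable character, but it does
not cover ligature-style replacements such as 'æ' -> 'ae'.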
def wash_index_term(term, max_char_length=50):
"""
Return washed form of the index term TERM that would be suitable
for storing into idxWORD* tables. I.e., lower the TERM, and
truncate it safely to MAX_CHAR_LENGTH UTF-8 characters (meaning,
in principle, 4*MAX_CHAR_LENGTH bytes).
The function works by an internal conversion of TERM, when needed,
from its input Python UTF-8 binary string format into Python
Unicode format, and then truncating it safely to the given number
    of UTF-8 characters, without possible mis-truncation in the
    middle of a multi-byte UTF-8 character that could otherwise
    happen if we worked with the UTF-8 binary representation directly.
Note that MAX_CHAR_LENGTH corresponds to the length of the term
column in idxINDEX* tables.
"""
washed_term = unicode(term, 'utf-8').lower()
if len(washed_term) <= max_char_length:
# no need to truncate the term, because it will fit
# nicely even if it uses four-byte UTF-8 characters
return washed_term.encode('utf-8')
else:
# truncate the term in a safe position:
return washed_term[:max_char_length].encode('utf-8')
def wash_pattern(p):
"""Wash pattern passed by URL. Check for sanity of the wildcard by
removing wildcards if they are appended to extremely short words
    (1-3 letters).  TODO: instead of this approximate treatment, it
    would be much better to introduce a time limit, e.g. to kill a
    query if it does not finish in 10 seconds."""
# strip accents:
# p = strip_accents(p) # FIXME: when available, strip accents all the time
# add leading/trailing whitespace for the two following wildcard-sanity checking regexps:
p = " " + p + " "
# get rid of wildcards at the beginning of words:
p = re_pattern_wildcards_at_beginning.sub("\\1", p)
# replace spaces within quotes by __SPACE__ temporarily:
p = re_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", p)
p = re_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", p)
p = re_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", p)
# get rid of extremely short words (1-3 letters with wildcards):
p = re_pattern_short_words.sub("\\1", p)
# replace back __SPACE__ by spaces:
p = re_pattern_space.sub(" ", p)
# replace special terms:
p = re_pattern_today.sub(time.strftime("%Y-%m-%d", time.localtime()), p)
# remove unnecessary whitespace:
p = string.strip(p)
return p
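# The wildcard-sanity step (dropping wildcards glued to very short words) can
# be sketched with a plain regexp; the pattern below is an illustrative
# stand-in, not the module's actual re_pattern_short_words:

```python
import re

# Drop trailing '*' or '%' wildcards from words of 1-3 word characters,
# keeping the word itself; longer words keep their wildcard.
_short_wildcard = re.compile(r'\b(\w{1,3})[\*%]+(?=\s|$)')

def drop_short_wildcards(p):
    """Remove wildcards appended to extremely short (1-3 letter) words,
    since expanding them would match far too many index terms."""
    return _short_wildcard.sub(r'\1', p)
```

So `"el* ellis*"` becomes `"el ellis*"`: the short word loses its wildcard,
the longer one keeps it.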
def wash_field(f):
"""Wash field passed by URL."""
# get rid of unnecessary whitespace:
f = string.strip(f)
# wash old-style CDS Invenio/ALEPH 'f' field argument, e.g. replaces 'wau' and 'au' by 'author'
if CFG_WEBSEARCH_FIELDS_CONVERT.has_key(string.lower(f)):
        f = CFG_WEBSEARCH_FIELDS_CONVERT[string.lower(f)]
return f
def wash_dates(d1="", d1y=0, d1m=0, d1d=0, d2="", d2y=0, d2m=0, d2d=0):
"""
Take user-submitted date arguments D1 (full datetime string) or
    (D1Y, D1M, D1D) year, month, day tuple and D2 or (D2Y, D2M, D2D)
    and return (Y1-M1-D1 H1:M1:S1, Y2-M2-D2 H2:M2:S2) datetime
strings in the YYYY-MM-DD HH:MM:SS format suitable for time
restricted searching.
Note that when both D1 and (D1Y, D1M, D1D) parameters are present,
the precedence goes to D1. Ditto for D2*.
Note that when (D1Y, D1M, D1D) are taken into account, some values
may be missing and are completed e.g. to 01 or 12 according to
whether it is the starting or the ending date.
"""
datetext1, datetext2 = "", ""
# sanity checking:
if d1 == "" and d1y == 0 and d1m == 0 and d1d == 0 and d2 == "" and d2y == 0 and d2m == 0 and d2d == 0:
return ("", "") # nothing selected, so return empty values
# wash first (starting) date:
if d1:
# full datetime string takes precedence:
datetext1 = d1
else:
# okay, first date passed as (year,month,day):
if d1y:
datetext1 += "%04d" % d1y
else:
datetext1 += "0000"
if d1m:
datetext1 += "-%02d" % d1m
else:
datetext1 += "-01"
if d1d:
datetext1 += "-%02d" % d1d
else:
datetext1 += "-01"
datetext1 += " 00:00:00"
# wash second (ending) date:
if d2:
# full datetime string takes precedence:
datetext2 = d2
else:
# okay, second date passed as (year,month,day):
if d2y:
datetext2 += "%04d" % d2y
else:
datetext2 += "9999"
if d2m:
datetext2 += "-%02d" % d2m
else:
datetext2 += "-12"
if d2d:
datetext2 += "-%02d" % d2d
else:
            datetext2 += "-31" # NOTE: perhaps we should use the maximum day
                               # number of the given month, but for our querying
                               # it's not needed, 31 will always do
datetext2 += " 00:00:00"
# okay, return constructed YYYY-MM-DD HH:MM:SS datetexts:
return (datetext1, datetext2)
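# The completion rules above (missing parts of the start date default to the
# earliest value, missing parts of the end date to the latest) can be
# sketched standalone (hypothetical helper, without the D1/D2 full-string
# precedence handling):

```python
def complete_dates_sketch(d1y=0, d1m=0, d1d=0, d2y=0, d2m=0, d2d=0):
    """Build a ('YYYY-MM-DD HH:MM:SS', 'YYYY-MM-DD HH:MM:SS') pair:
    missing start-date parts are completed downwards (0000, 01, 01),
    missing end-date parts upwards (9999, 12, 31), as in wash_dates().
    Day 31 is used even for shorter months, which is harmless for
    range comparisons."""
    start = "%04d-%02d-%02d 00:00:00" % (d1y or 0, d1m or 1, d1d or 1)
    end = "%04d-%02d-%02d 00:00:00" % (d2y or 9999, d2m or 12, d2d or 31)
    return (start, end)
```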
def get_colID(c):
"Return collection ID for collection name C. Return None if no match found."
colID = None
res = run_sql("SELECT id FROM collection WHERE name=%s", (c,), 1)
if res:
colID = res[0][0]
return colID
def get_coll_i18nname(c, ln=cdslang):
"""Return nicely formatted collection name (of name type 'ln',
'long name') for collection C in language LN."""
global collection_i18nname_cache
global collection_i18nname_cache_timestamp
# firstly, check whether the collectionname table was modified:
if get_table_update_time('collectionname') > collection_i18nname_cache_timestamp:
# yes it was, cache clear-up needed:
collection_i18nname_cache = create_collection_i18nname_cache()
# secondly, read i18n name from either the cache or return common name:
out = c
try:
out = collection_i18nname_cache[c][ln]
except KeyError:
pass # translation in LN does not exist
return out
def get_field_i18nname(f, ln=cdslang):
"""Return nicely formatted field name (of type 'ln', 'long name')
for field F in language LN."""
global field_i18nname_cache
global field_i18nname_cache_timestamp
# firstly, check whether the fieldname table was modified:
if get_table_update_time('fieldname') > field_i18nname_cache_timestamp:
# yes it was, cache clear-up needed:
field_i18nname_cache = create_field_i18nname_cache()
# secondly, read i18n name from either the cache or return common name:
out = f
try:
out = field_i18nname_cache[f][ln]
except KeyError:
pass # translation in LN does not exist
return out
def get_coll_ancestors(coll):
"Returns a list of ancestors for collection 'coll'."
coll_ancestors = []
coll_ancestor = coll
while 1:
res = run_sql("""SELECT c.name FROM collection AS c
LEFT JOIN collection_collection AS cc ON c.id=cc.id_dad
LEFT JOIN collection AS ccc ON ccc.id=cc.id_son
WHERE ccc.name=%s ORDER BY cc.id_dad ASC LIMIT 1""",
(coll_ancestor,))
if res:
coll_name = res[0][0]
coll_ancestors.append(coll_name)
coll_ancestor = coll_name
else:
break
# ancestors found, return reversed list:
coll_ancestors.reverse()
return coll_ancestors
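# The ancestor walk above (repeatedly look up the dad of the current
# collection until none is found, then reverse) can be sketched against a
# plain child -> parent mapping standing in for the SQL lookup:

```python
def ancestors_sketch(coll, parent_of):
    """Return the ancestors of 'coll', oldest first, given a
    child -> parent dict (a stand-in for the collection_collection
    dad lookup done by get_coll_ancestors)."""
    chain = []
    node = coll
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    chain.reverse()  # collected bottom-up, returned top-down
    return chain
```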
def get_coll_sons(coll, type='r', public_only=1):
"""Return a list of sons (first-level descendants) of type 'type' for collection 'coll'.
If public_only, then return only non-restricted son collections.
"""
coll_sons = []
query = "SELECT c.name FROM collection AS c "\
"LEFT JOIN collection_collection AS cc ON c.id=cc.id_son "\
"LEFT JOIN collection AS ccc ON ccc.id=cc.id_dad "\
"WHERE cc.type=%s AND ccc.name=%s"
if public_only:
query += " AND c.restricted IS NULL "
query += " ORDER BY cc.score DESC"
res = run_sql(query, (type, coll))
for name in res:
coll_sons.append(name[0])
return coll_sons
def get_coll_real_descendants(coll):
"""Return a list of all descendants of collection 'coll' that are defined by a 'dbquery'.
IOW, we need to decompose compound collections like "A & B" into "A" and "B" provided
that "A & B" has no associated database query defined.
"""
coll_sons = []
res = run_sql("""SELECT c.name,c.dbquery FROM collection AS c
LEFT JOIN collection_collection AS cc ON c.id=cc.id_son
LEFT JOIN collection AS ccc ON ccc.id=cc.id_dad
WHERE ccc.name=%s ORDER BY cc.score DESC""",
(coll,))
for name, dbquery in res:
if dbquery: # this is 'real' collection, so return it:
coll_sons.append(name)
else: # this is 'composed' collection, so recurse:
coll_sons.extend(get_coll_real_descendants(name))
return coll_sons
def get_collection_reclist(coll):
"""Return hitset of recIDs that belong to the collection 'coll'.
But firstly check the last updated date of the collection table.
If it's newer than the cache timestamp, then empty the cache,
since new records could have been added."""
global collection_reclist_cache
global collection_reclist_cache_timestamp
# firstly, check whether the collection table was modified:
if get_table_update_time('collection') > collection_reclist_cache_timestamp:
# yes it was, cache clear-up needed:
collection_reclist_cache = create_collection_reclist_cache()
# secondly, read reclist from either the cache or the database:
if not collection_reclist_cache[coll]:
        # not yet in the cache, so calculate it and fill the cache:
        res = run_sql("SELECT nbrecs,reclist FROM collection WHERE name=%s",
                      (coll,), 1)
if res:
try:
set = HitSet(res[0][1])
except:
set = HitSet()
collection_reclist_cache[coll] = set
# finally, return reclist:
return collection_reclist_cache[coll]
def coll_restricted_p(coll):
"Predicate to test if the collection coll is restricted or not."
if not coll:
return 0
res = run_sql("SELECT restricted FROM collection WHERE name=%s", (coll,))
if res and res[0][0] is not None:
return 1
else:
return 0
def coll_restricted_group(coll):
"Return Apache group to which the collection is restricted. Return None if it's public."
if not coll:
return None
res = run_sql("SELECT restricted FROM collection WHERE name=%s", (coll,))
if res:
return res[0][0]
else:
return None
def create_collection_reclist_cache():
"""Creates list of records belonging to collections. Called on startup
and used later for intersecting search results with collection universe."""
global collection_reclist_cache_timestamp
# populate collection reclist cache:
collrecs = {}
try:
res = run_sql("SELECT name,reclist FROM collection")
except Error:
# database problems, set timestamp to zero and return empty cache
collection_reclist_cache_timestamp = 0
return collrecs
for name, reclist in res:
collrecs[name] = None # this will be filled later during runtime by calling get_collection_reclist(coll)
# update timestamp:
try:
collection_reclist_cache_timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
except NameError:
collection_reclist_cache_timestamp = 0
return collrecs
try:
collection_reclist_cache.has_key(cdsname)
except:
try:
collection_reclist_cache = create_collection_reclist_cache()
except:
collection_reclist_cache = {}
def create_collection_i18nname_cache():
"""Create cache of I18N collection names of type 'ln' (=long name).
Called on startup and used later during the search time."""
global collection_i18nname_cache_timestamp
# populate collection I18N name cache:
names = {}
try:
res = run_sql("SELECT c.name,cn.ln,cn.value FROM collectionname AS cn, collection AS c WHERE cn.id_collection=c.id AND cn.type='ln'") # ln=long name
except Error:
# database problems, set timestamp to zero and return empty cache
collection_i18nname_cache_timestamp = 0
return names
for c, ln, i18nname in res:
if i18nname:
if not names.has_key(c):
names[c] = {}
names[c][ln] = i18nname
# update timestamp:
try:
collection_i18nname_cache_timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
except NameError:
collection_i18nname_cache_timestamp = 0
return names
try:
collection_i18nname_cache.has_key(cdsname)
except:
try:
collection_i18nname_cache = create_collection_i18nname_cache()
except:
collection_i18nname_cache = {}
def create_field_i18nname_cache():
"""Create cache of I18N field names of type 'ln' (=long name).
Called on startup and used later during the search time."""
global field_i18nname_cache_timestamp
    # populate field I18N name cache:
names = {}
try:
res = run_sql("SELECT f.name,fn.ln,fn.value FROM fieldname AS fn, field AS f WHERE fn.id_field=f.id AND fn.type='ln'") # ln=long name
except Error:
# database problems, set timestamp to zero and return empty cache
field_i18nname_cache_timestamp = 0
return names
for f, ln, i18nname in res:
if i18nname:
if not names.has_key(f):
names[f] = {}
names[f][ln] = i18nname
# update timestamp:
try:
field_i18nname_cache_timestamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
except NameError:
field_i18nname_cache_timestamp = 0
return names
try:
field_i18nname_cache.has_key(cdsname)
except:
try:
field_i18nname_cache = create_field_i18nname_cache()
except:
field_i18nname_cache = {}
def browse_pattern(req, colls, p, f, rg, ln=cdslang):
    """Browse either bibliographic phrase indexes or word indexes, and display the results."""
# load the right message language
_ = gettext_set_language(ln)
## do we search in words indexes?
if not f:
return browse_in_bibwords(req, p, f)
## is p enclosed in quotes? (coming from exact search)
if p.startswith('"') and p.endswith('"'):
p = p[1:-1]
p_orig = p
## okay, "real browse" follows:
browsed_phrases = get_nearest_terms_in_bibxxx(p, f, rg, 1)
while not browsed_phrases:
# try again and again with shorter and shorter pattern:
try:
p = p[:-1]
browsed_phrases = get_nearest_terms_in_bibxxx(p, f, rg, 1)
except:
# probably there are no hits at all:
req.write(_("No values found."))
return
## try to check hits in these particular collection selection:
browsed_phrases_in_colls = []
if 0:
for phrase in browsed_phrases:
phrase_hitset = HitSet()
phrase_hitsets = search_pattern("", phrase, f, 'e')
for coll in colls:
phrase_hitset.union_update(phrase_hitsets[coll])
if len(phrase_hitset) > 0:
# okay, this phrase has some hits in colls, so add it:
browsed_phrases_in_colls.append([phrase, len(phrase_hitset)])
## were there hits in collections?
if browsed_phrases_in_colls == []:
if browsed_phrases != []:
            #print_warning(req, """No match close to %s found in given collections.
            #Please try a different term. Displaying matches in any collection...""" % p_orig)
## try to get nbhits for these phrases in any collection:
for phrase in browsed_phrases:
browsed_phrases_in_colls.append([phrase, get_nbhits_in_bibxxx(phrase, f)])
## display results now:
out = websearch_templates.tmpl_browse_pattern(
f=f,
fn=get_field_i18nname(f, ln),
ln=ln,
browsed_phrases_in_colls=browsed_phrases_in_colls,
colls=colls,
)
req.write(out)
return
def browse_in_bibwords(req, p, f, ln=cdslang):
"""Browse inside words indexes."""
if not p:
return
_ = gettext_set_language(ln)
urlargd = {}
urlargd.update(req.argd)
urlargd['action'] = 'search'
nearest_box = create_nearest_terms_box(urlargd, p, f, 'w', ln=ln, intro_text_p=0)
req.write(websearch_templates.tmpl_search_in_bibwords(
p = p,
f = f,
ln = ln,
nearest_box = nearest_box
))
return
def search_pattern(req=None, p=None, f=None, m=None, ap=0, of="id", verbose=0, ln=cdslang):
"""Search for complex pattern 'p' within field 'f' according to
matching type 'm'. Return hitset of recIDs.
       The function uses a multi-stage searching algorithm in case
       no exact match is found.  See the Search Internals document
       for a detailed description.
       The 'ap' argument governs whether alternative patterns are
       to be used in case there is no direct hit for (p,f,m).  For
       example, whether to replace non-alphanumeric characters by
       spaces if it would give some hits.  See the Search Internals
       document for a detailed description.  (ap=0 forbids the
       alternative pattern usage, ap=1 permits it.)
       The 'of' argument governs whether to print some information
       to the user in case no match is found.  (Usually it prints
       the information for HTML output formats; otherwise it is
       silent.)
The 'verbose' argument controls the level of debugging information
to be printed (0=least, 9=most).
All the parameters are assumed to have been previously washed.
This function is suitable as a mid-level API.
"""
_ = gettext_set_language(ln)
hitset_empty = HitSet()
# sanity check:
if not p:
hitset_full = HitSet(trailing_bits=1)
hitset_full.discard(0)
# no pattern, so return all universe
return hitset_full
# search stage 1: break up arguments into basic search units:
if verbose and of.startswith("h"):
t1 = os.times()[4]
basic_search_units = create_basic_search_units(req, p, f, m, of)
if verbose and of.startswith("h"):
t2 = os.times()[4]
print_warning(req, "Search stage 1: basic search units are: %s" % basic_search_units)
print_warning(req, "Search stage 1: execution took %.2f seconds." % (t2 - t1))
# search stage 2: do search for each search unit and verify hit presence:
if verbose and of.startswith("h"):
t1 = os.times()[4]
basic_search_units_hitsets = []
for idx_unit in range(0, len(basic_search_units)):
bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit]
basic_search_unit_hitset = search_unit(bsu_p, bsu_f, bsu_m)
if verbose >= 9 and of.startswith("h"):
            print_warning(req, "Search stage 2: pattern %s gave hitlist %s" % (bsu_p, list(basic_search_unit_hitset)))
        if len(basic_search_unit_hitset) > 0 or \
           ap==0 or \
           bsu_o=="|" or \
           ((idx_unit+1) < len(basic_search_units) and basic_search_units[idx_unit+1][0]=="|"):
            # stage 2-1: this basic search unit is retained, since either
            # the hitset is non-empty, or the approximate pattern treatment
            # is switched off, or the unit is joined by OR to its neighbours:
            basic_search_units_hitsets.append(basic_search_unit_hitset)
        else:
            # stage 2-2: no hits found for this search unit; try to
            # replace non-alphanumeric characters inside the pattern:
            if re.search(r'[^a-zA-Z0-9\s\:]', bsu_p):
                if bsu_p.startswith('"') and bsu_p.endswith('"'):
                    # it is an exact phrase query, replace by truncation:
                    bsu_pn = re.sub(r'[^a-zA-Z0-9\s\:]+', "*", bsu_p)
                else:
                    # it is a word query, replace by spaces:
                    bsu_pn = re.sub(r'[^a-zA-Z0-9\s\:]+', " ", bsu_p)
                basic_search_unit_hitset = search_pattern(req=None, p=bsu_pn, f=bsu_f, m=bsu_m, of="id", ln=ln)
                if len(basic_search_unit_hitset) > 0:
                    # we retain the new unit instead
                    if of.startswith('h'):
                        print_warning(req, _("No exact match found for %(x_query1)s, using %(x_query2)s instead...") % \
                                      {'x_query1': "<em>" + cgi.escape(bsu_p) + "</em>",
                                       'x_query2': "<em>" + cgi.escape(bsu_pn) + "</em>"})
                    basic_search_units[idx_unit][1] = bsu_pn
                    basic_search_units_hitsets.append(basic_search_unit_hitset)
else:
# stage 2-3: no hits found either, propose nearest indexed terms:
if of.startswith('h'):
if req:
if bsu_f == "recid":
print_warning(req, "Requested record does not seem to exist.")
else:
print_warning(req, create_nearest_terms_box(req.argd, bsu_p, bsu_f, bsu_m, ln=ln))
return hitset_empty
else:
# stage 2-3: no hits found either, propose nearest indexed terms:
if of.startswith('h'):
if req:
if bsu_f == "recid":
print_warning(req, "Requested record does not seem to exist.")
else:
print_warning(req, create_nearest_terms_box(req.argd, bsu_p, bsu_f, bsu_m, ln=ln))
return hitset_empty
if verbose and of.startswith("h"):
t2 = os.times()[4]
for idx_unit in range(0, len(basic_search_units)):
print_warning(req, "Search stage 2: basic search unit %s gave %d hits." %
(basic_search_units[idx_unit][1:], len(basic_search_units_hitsets[idx_unit])))
print_warning(req, "Search stage 2: execution took %.2f seconds." % (t2 - t1))
# search stage 3: apply boolean query for each search unit:
if verbose and of.startswith("h"):
t1 = os.times()[4]
# let the initial set be the complete universe:
hitset_in_any_collection = HitSet(trailing_bits=1)
hitset_in_any_collection.discard(0)
for idx_unit in range(0, len(basic_search_units)):
this_unit_operation = basic_search_units[idx_unit][0]
this_unit_hitset = basic_search_units_hitsets[idx_unit]
if this_unit_operation == '+':
hitset_in_any_collection.intersection_update(this_unit_hitset)
elif this_unit_operation == '-':
hitset_in_any_collection.difference_update(this_unit_hitset)
elif this_unit_operation == '|':
hitset_in_any_collection.union_update(this_unit_hitset)
else:
if of.startswith("h"):
print_warning(req, "Invalid set operation %s." % this_unit_operation, "Error")
if len(hitset_in_any_collection) == 0:
# no hits found, propose alternative boolean query:
if of.startswith('h'):
nearestterms = []
for idx_unit in range(0, len(basic_search_units)):
bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit]
if bsu_p.startswith("%") and bsu_p.endswith("%"):
bsu_p = "'" + bsu_p[1:-1] + "'"
bsu_nbhits = len(basic_search_units_hitsets[idx_unit])
# create a similar query, but with the basic search unit only
argd = {}
argd.update(req.argd)
argd['p'] = bsu_p
argd['f'] = bsu_f
nearestterms.append((bsu_p, bsu_nbhits, argd))
text = websearch_templates.tmpl_search_no_boolean_hits(
ln=ln, nearestterms=nearestterms)
print_warning(req, text)
if verbose and of.startswith("h"):
t2 = os.times()[4]
print_warning(req, "Search stage 3: boolean query gave %d hits." % len(hitset_in_any_collection))
print_warning(req, "Search stage 3: execution took %.2f seconds." % (t2 - t1))
return hitset_in_any_collection
def search_unit(p, f=None, m=None):
"""Search for basic search unit defined by pattern 'p' and field
'f' and matching type 'm'. Return hitset of recIDs.
All the parameters are assumed to have been previously washed.
'p' is assumed to be already a ``basic search unit'' so that it
is searched as such and is not broken up in any way. Only
wildcard and span queries are being detected inside 'p'.
This function is suitable as a low-level API.
"""
## create empty output results set:
set = HitSet()
if not p: # sanity checking
return set
if m == 'a' or m == 'r':
# we are doing either direct bibxxx search or phrase search or regexp search
set = search_unit_in_bibxxx(p, f, m)
else:
# we are doing bibwords search by default
set = search_unit_in_bibwords(p, f)
return set
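# Example of the dispatch above (illustrative): search_unit('ellis', 'author', 'a')
# performs a phrase search inside the author bibxxx tables, while
# search_unit('ellis', 'author') falls through to the default word-index
# search in the idxWORDxxF tables.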
def search_unit_in_bibwords(word, f, decompress=zlib.decompress):
"""Searches for 'word' inside bibwordsX table for field 'f' and returns hitset of recIDs."""
set = HitSet() # will hold output result set
set_used = 0 # not-yet-used flag, to be able to circumvent set operations
# deduce into which bibwordsX table we will search:
stemming_language = get_index_stemming_language(get_index_id_from_field("anyfield"))
bibwordsX = "idxWORD%02dF" % get_index_id_from_field("anyfield")
if f:
index_id = get_index_id_from_field(f)
if index_id:
bibwordsX = "idxWORD%02dF" % index_id
stemming_language = get_index_stemming_language(index_id)
else:
return HitSet() # word index f does not exist
# wash 'word' argument and run query:
word = string.replace(word, '*', '%') # we now use '*' as the truncation character
words = string.split(word, "->", 1) # check for span query
if len(words) == 2:
word0 = re_word.sub('', words[0])
word1 = re_word.sub('', words[1])
if stemming_language:
word0 = stem(word0, stemming_language)
word1 = stem(word1, stemming_language)
res = run_sql("SELECT term,hitlist FROM %s WHERE term BETWEEN %%s AND %%s" % bibwordsX,
(wash_index_term(word0), wash_index_term(word1)))
else:
word = re_word.sub('', word)
if stemming_language:
word = stem(word, stemming_language)
if string.find(word, '%') >= 0: # do we have wildcard in the word?
res = run_sql("SELECT term,hitlist FROM %s WHERE term LIKE %%s" % bibwordsX,
(wash_index_term(word),))
else:
res = run_sql("SELECT term,hitlist FROM %s WHERE term=%%s" % bibwordsX,
(wash_index_term(word),))
# fill the result set:
for word, hitlist in res:
hitset_bibwrd = HitSet(hitlist)
# add the results:
if set_used:
set.union_update(hitset_bibwrd)
else:
set = hitset_bibwrd
set_used = 1
# okay, return result set:
return set
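# Note on the span-query check above (illustrative): splitting on "->" at
# most once distinguishes the two query shapes, e.g.:
#   string.split("muon", "->", 1)   gives ["muon"]      -> single-term lookup
#   string.split("aa->az", "->", 1) gives ["aa", "az"]  -> term BETWEEN aa AND az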
def search_unit_in_bibxxx(p, f, type):
"""Searches for pattern 'p' inside bibxxx tables for field 'f' and returns hitset of recIDs found.
The search type is defined by 'type' (e.g. equals to 'r' for a regexp search)."""
p_orig = p # saving for eventual future 'no match' reporting
query_addons = "" # will hold additional SQL code for the query
query_params = () # will hold parameters for the query (their number may vary depending on TYPE argument)
# wash arguments:
f = string.replace(f, '*', '%') # replace truncation char '*' in field definition
if type == 'r':
query_addons = "REGEXP %s"
query_params = (p,)
else:
p = string.replace(p, '*', '%') # we now use '*' as the truncation character
ps = string.split(p, "->", 1) # check for span query:
if len(ps) == 2:
query_addons = "BETWEEN %s AND %s"
query_params = (ps[0], ps[1])
else:
if string.find(p, '%') > -1:
query_addons = "LIKE %s"
query_params = (ps[0],)
else:
query_addons = "= %s"
query_params = (ps[0],)
# construct 'tl' which defines the tag list (MARC tags) to search in:
tl = []
if str(f[0]).isdigit() and str(f[1]).isdigit():
tl.append(f) # 'f' seems to be okay as it starts by two digits
else:
# convert old ALEPH tag names, if appropriate: (TODO: get rid of this before entering this function)
if CFG_WEBSEARCH_FIELDS_CONVERT.has_key(string.lower(f)):
f = CFG_WEBSEARCH_FIELDS_CONVERT[string.lower(f)]
# deduce desired MARC tags on the basis of chosen 'f'
tl = get_field_tags(f)
if not tl:
# f index does not exist, nevermind
pass
# okay, start search:
l = [] # will hold list of recID that matched
for t in tl:
# deduce into which bibxxx table we will search:
digit1, digit2 = int(t[0]), int(t[1])
bx = "bib%d%dx" % (digit1, digit2)
bibx = "bibrec_bib%d%dx" % (digit1, digit2)
# construct and run query:
if t == "001":
res = run_sql("SELECT id FROM bibrec WHERE id %s" % query_addons,
query_params)
else:
query = "SELECT bibx.id_bibrec FROM %s AS bx LEFT JOIN %s AS bibx ON bx.id=bibx.id_bibxxx WHERE bx.value %s" % \
(bx, bibx, query_addons)
if len(t) != 6 or t[-1:]=='%':
# wildcard query, or only the beginning of field 't'
# is defined, so add wildcard character:
query += " AND bx.tag LIKE %s"
res = run_sql(query, query_params + (t + '%',))
else:
# exact query for 't':
query += " AND bx.tag=%s"
res = run_sql(query, query_params + (t,))
# fill the result set:
for id_bibrec in res:
if id_bibrec[0]:
l.append(id_bibrec[0])
# check no of hits found:
nb_hits = len(l)
# okay, return result set:
set = HitSet(l)
return set
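# Example (illustrative): for f='author', get_field_tags() typically returns
# ['100__%', '700__%']; these end with the '%' wildcard, so the wildcard branch
# above is taken and all author subfields (100__a, 100__u, ...) are searched.
# A full six-character tag such as '700__a' is instead matched exactly via
# bx.tag=%s.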
def search_unit_in_bibrec(datetext1, datetext2, type='c'):
"""
Return hitset of recIDs found that were either created or modified
(according to 'type' arg being 'c' or 'm') from datetext1 until datetext2, inclusive.
Does not pay attention to pattern, collection, anything. Useful
to intersect later on with the 'real' query.
"""
set = HitSet()
if type.startswith("m"):
type = "modification_date"
else:
type = "creation_date" # by default we are searching for creation dates
res = run_sql("SELECT id FROM bibrec WHERE %s>=%%s AND %s<=%%s" % (type, type),
(datetext1, datetext2))
for row in res:
set += row[0]
return set
def intersect_results_with_collrecs(req, hitset_in_any_collection, colls, ap=0, of="hb", verbose=0, ln=cdslang):
"""Return dict of hitsets given by intersection of hitset with the collection universes."""
_ = gettext_set_language(ln)
# search stage 4: intersect with the collection universe:
if verbose and of.startswith("h"):
t1 = os.times()[4]
results = {}
results_nbhits = 0
for coll in colls:
results[coll] = hitset_in_any_collection & get_collection_reclist(coll)
results_nbhits += len(results[coll])
if results_nbhits == 0:
# no hits found, try to search in Home:
results_in_Home = hitset_in_any_collection & get_collection_reclist(cdsname)
if len(results_in_Home) > 0:
# some hits found in Home, so propose this search:
if of.startswith("h"):
url = websearch_templates.build_search_url(req.argd, cc=cdsname, c=[])
                print_warning(req, _("No match found in collection %(x_collection)s. Other public collections gave %(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.") %\
                              {'x_collection': '<em>' + string.join([get_coll_i18nname(coll, ln) for coll in colls], ', ') + '</em>',
                               'x_url_open': '<a class="nearestterms" href="%s">' % (url),
                               'x_nb_hits': len(results_in_Home),
                               'x_url_close': '</a>'})
results = {}
else:
# no hits found in Home, recommend different search terms:
if of.startswith("h"):
print_warning(req, _("No public collection matched your query. "
"If you were looking for a non-public document, please choose "
"the desired restricted collection first."))
results = {}
if verbose and of.startswith("h"):
t2 = os.times()[4]
print_warning(req, "Search stage 4: intersecting with collection universe gave %d hits." % results_nbhits)
print_warning(req, "Search stage 4: execution took %.2f seconds." % (t2 - t1))
return results
def intersect_results_with_hitset(req, results, hitset, ap=0, aptext="", of="hb"):
"""Return intersection of search 'results' (a dict of hitsets
with collection as key) with the 'hitset', i.e. apply
'hitset' intersection to each collection within search
'results'.
       If the final 'results' set turns out to be empty and 'ap'
       (approximate pattern) is true, then print 'aptext' and
       return the original 'results' set unchanged.  If 'ap' is
       false, then return the empty results set.
"""
if ap:
results_ap = copy.deepcopy(results)
else:
results_ap = {} # will return empty dict in case of no hits found
nb_total = 0
for coll in results.keys():
results[coll].intersection_update(hitset)
nb_total += len(results[coll])
if nb_total == 0:
if of.startswith("h"):
print_warning(req, aptext)
results = results_ap
return results
def create_similarly_named_authors_link_box(author_name, ln=cdslang):
    """Return a box similar to the ``Not satisfied...'' one by proposing
       author searches for similar names.  Namely, take AUTHOR_NAME
       and the first initial of the first name (after the comma) and
       look into the author index whether authors with e.g. middle
       names exist.  Useful mainly for the CERN Library, which sometimes
       contains name forms like Ellis-N, Ellis-Nick, Ellis-Nicolas, all
       denoting the same person.  The box isn't proposed if no similarly
       named authors are found.
"""
# return nothing if not configured:
if CFG_WEBSEARCH_CREATE_SIMILARLY_NAMED_AUTHORS_LINK_BOX == 0:
return ""
# return empty box if there is no initial:
if re.match(r'[^ ,]+, [^ ]', author_name) is None:
return ""
# firstly find name comma initial:
author_name_to_search = re.sub(r'^([^ ,]+, +[^ ,]).*$', '\\1', author_name)
# secondly search for similar name forms:
similar_author_names = {}
for name in author_name_to_search, strip_accents(author_name_to_search):
for tag in get_field_tags("author"):
# deduce into which bibxxx table we will search:
digit1, digit2 = int(tag[0]), int(tag[1])
bx = "bib%d%dx" % (digit1, digit2)
bibx = "bibrec_bib%d%dx" % (digit1, digit2)
if len(tag) != 6 or tag[-1:]=='%':
# only the beginning of field 't' is defined, so add wildcard character:
res = run_sql("""SELECT bx.value FROM %s AS bx
WHERE bx.value LIKE %%s AND bx.tag LIKE %%s""" % bx,
(name + "%", tag + "%"))
else:
res = run_sql("""SELECT bx.value FROM %s AS bx
WHERE bx.value LIKE %%s AND bx.tag=%%s""" % bx,
(name + "%", tag))
for row in res:
similar_author_names[row[0]] = 1
# remove the original name and sort the list:
try:
del similar_author_names[author_name]
except KeyError:
pass
# thirdly print the box:
out = ""
if similar_author_names:
out_authors = similar_author_names.keys()
out_authors.sort()
tmp_authors = []
for out_author in out_authors:
nbhits = get_nbhits_in_bibxxx(out_author, "author")
if nbhits:
tmp_authors.append((out_author, nbhits))
out += websearch_templates.tmpl_similar_author_names(
authors=tmp_authors, ln=ln)
return out
def create_nearest_terms_box(urlargd, p, f, t='w', n=5, ln=cdslang, intro_text_p=True):
"""Return text box containing list of 'n' nearest terms above/below 'p'
for the field 'f' for matching type 't' (words/phrases) in
language 'ln'.
Propose new searches according to `urlargs' with the new words.
If `intro_text_p' is true, then display the introductory message,
otherwise print only the nearest terms in the box content.
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
nearest_terms = []
if not p: # sanity check
p = "."
# look for nearest terms:
if t == 'w':
nearest_terms = get_nearest_terms_in_bibwords(p, f, n, n)
if not nearest_terms:
return "%s %s." % (_("No words index available for"), get_field_i18nname(f, ln))
else:
nearest_terms = get_nearest_terms_in_bibxxx(p, f, n, n)
if not nearest_terms:
return "%s %s." % (_("No phrase index available for"), get_field_i18nname(f, ln))
terminfo = []
for term in nearest_terms:
if t == 'w':
hits = get_nbhits_in_bibwords(term, f)
else:
hits = get_nbhits_in_bibxxx(term, f)
argd = {}
argd.update(urlargd)
# check which fields contained the requested parameter, and replace it.
for (px, fx) in ('p', 'f'), ('p1', 'f1'), ('p2', 'f2'), ('p3', 'f3'):
if px in argd:
if f == argd[fx] or f == "anyfield" or f == "":
if string.find(argd[px], p) > -1:
argd[px] = string.replace(argd[px], p, term)
break
else:
if string.find(argd[px], f+':'+p) > -1:
argd[px] = string.replace(argd[px], f+':'+p, f+':'+term)
break
elif string.find(argd[px], f+':"'+p+'"') > -1:
argd[px] = string.replace(argd[px], f+':"'+p+'"', f+':"'+term+'"')
break
terminfo.append((term, hits, argd))
intro = ""
if intro_text_p: # add full leading introductory text
if f:
            intro = _("Search term %(x_term)s inside index %(x_index)s did not match any record. Nearest terms in any collection are:") % \
                    {'x_term': "<em>" + cgi.escape(p.startswith("%") and p.endswith("%") and p[1:-1] or p) + "</em>",
                     'x_index': "<em>" + cgi.escape(get_field_i18nname(f, ln)) + "</em>"}
        else:
            intro = _("Search term %s did not match any record. Nearest terms in any collection are:") % \
                    ("<em>" + cgi.escape(p.startswith("%") and p.endswith("%") and p[1:-1] or p) + "</em>")
return websearch_templates.tmpl_nearest_term_box(p=p, ln=ln, f=f, terminfo=terminfo,
intro=intro)
def get_nearest_terms_in_bibwords(p, f, n_below, n_above):
"""Return list of +n -n nearest terms to word `p' in index for field `f'."""
nearest_words = [] # will hold the (sorted) list of nearest words to return
# deduce into which bibwordsX table we will search:
bibwordsX = "idxWORD%02dF" % get_index_id_from_field("anyfield")
if f:
index_id = get_index_id_from_field(f)
if index_id:
bibwordsX = "idxWORD%02dF" % index_id
else:
return nearest_words
# firstly try to get `n' closest words above `p':
res = run_sql("SELECT term FROM %s WHERE term<%%s ORDER BY term DESC LIMIT %%s" % bibwordsX,
(p, n_above))
for row in res:
nearest_words.append(row[0])
nearest_words.reverse()
# secondly insert given word `p':
nearest_words.append(p)
# finally try to get `n' closest words below `p':
res = run_sql("SELECT term FROM %s WHERE term>%%s ORDER BY term ASC LIMIT %%s" % bibwordsX,
(p, n_below))
for row in res:
nearest_words.append(row[0])
return nearest_words
def get_nearest_terms_in_bibxxx(p, f, n_below, n_above):
"""Browse (-n_above, +n_below) closest bibliographic phrases
for the given pattern p in the given field f, regardless
of collection.
Return list of [phrase1, phrase2, ... , phrase_n]."""
## determine browse field:
if not f and string.find(p, ":") > 0: # does 'p' contain ':'?
f, p = string.split(p, ":", 1)
## We are going to take max(n_below, n_above) as the number of
    ## values to fetch from bibXXx.  This is needed to work around
## MySQL UTF-8 sorting troubles in 4.0.x. Proper solution is to
## use MySQL 4.1.x or our own idxPHRASE in the future.
n_fetch = 2*max(n_below, n_above)
## construct 'tl' which defines the tag list (MARC tags) to search in:
tl = []
if str(f[0]).isdigit() and str(f[1]).isdigit():
tl.append(f) # 'f' seems to be okay as it starts by two digits
else:
# deduce desired MARC tags on the basis of chosen 'f'
tl = get_field_tags(f)
## start browsing to fetch list of hits:
browsed_phrases = {} # will hold {phrase1: 1, phrase2: 1, ..., phraseN: 1} dict of browsed phrases (to make them unique)
# always add self to the results set:
browsed_phrases[p.startswith("%") and p.endswith("%") and p[1:-1] or p] = 1
for t in tl:
# deduce into which bibxxx table we will search:
digit1, digit2 = int(t[0]), int(t[1])
bx = "bib%d%dx" % (digit1, digit2)
bibx = "bibrec_bib%d%dx" % (digit1, digit2)
# firstly try to get `n' closest phrases above `p':
if len(t) != 6 or t[-1:]=='%': # only the beginning of field 't' is defined, so add wildcard character:
res = run_sql("""SELECT bx.value FROM %s AS bx
WHERE bx.value<%%s AND bx.tag LIKE %%s
ORDER BY bx.value DESC LIMIT %%s""" % bx,
(p, t + "%", n_fetch))
else:
res = run_sql("""SELECT bx.value FROM %s AS bx
WHERE bx.value<%%s AND bx.tag=%%s
ORDER BY bx.value DESC LIMIT %%s""" % bx,
(p, t, n_fetch))
for row in res:
browsed_phrases[row[0]] = 1
# secondly try to get `n' closest phrases equal to or below `p':
if len(t) != 6 or t[-1:]=='%': # only the beginning of field 't' is defined, so add wildcard character:
res = run_sql("""SELECT bx.value FROM %s AS bx
WHERE bx.value>=%%s AND bx.tag LIKE %%s
ORDER BY bx.value ASC LIMIT %%s""" % bx,
(p, t + "%", n_fetch))
else:
res = run_sql("""SELECT bx.value FROM %s AS bx
WHERE bx.value>=%%s AND bx.tag=%%s
ORDER BY bx.value ASC LIMIT %%s""" % bx,
(p, t, n_fetch))
for row in res:
browsed_phrases[row[0]] = 1
# select first n words only: (this is needed as we were searching
# in many different tables and so aren't sure we have more than n
# words right; this of course won't be needed when we shall have
# one ACC table only for given field):
phrases_out = browsed_phrases.keys()
phrases_out.sort(lambda x, y: cmp(string.lower(strip_accents(x)),
string.lower(strip_accents(y))))
# find position of self:
try:
idx_p = phrases_out.index(p)
    except ValueError:
idx_p = len(phrases_out)/2
# return n_above and n_below:
return phrases_out[max(0, idx_p-n_above):idx_p+n_below]
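# Example of the windowing above (illustrative): with n_above=2 and n_below=3,
# if the sorted browse list is ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
# and p == 'gamma' (so idx_p == 2), the returned slice is phrases_out[0:5],
# i.e. all five phrases.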
def get_nbhits_in_bibwords(word, f):
"""Return number of hits for word 'word' inside words index for field 'f'."""
out = 0
# deduce into which bibwordsX table we will search:
bibwordsX = "idxWORD%02dF" % get_index_id_from_field("anyfield")
if f:
index_id = get_index_id_from_field(f)
if index_id:
bibwordsX = "idxWORD%02dF" % index_id
else:
return 0
if word:
res = run_sql("SELECT hitlist FROM %s WHERE term=%%s" % bibwordsX,
(word,))
for hitlist in res:
out += len(HitSet(hitlist[0]))
return out
def get_nbhits_in_bibxxx(p, f):
    """Return number of hits for phrase 'p' inside the bibxxx tables for field 'f'."""
## determine browse field:
if not f and string.find(p, ":") > 0: # does 'p' contain ':'?
f, p = string.split(p, ":", 1)
## construct 'tl' which defines the tag list (MARC tags) to search in:
tl = []
if str(f[0]).isdigit() and str(f[1]).isdigit():
tl.append(f) # 'f' seems to be okay as it starts by two digits
else:
# deduce desired MARC tags on the basis of chosen 'f'
tl = get_field_tags(f)
# start searching:
recIDs = {} # will hold dict of {recID1: 1, recID2: 1, ..., } (unique recIDs, therefore)
for t in tl:
# deduce into which bibxxx table we will search:
digit1, digit2 = int(t[0]), int(t[1])
bx = "bib%d%dx" % (digit1, digit2)
bibx = "bibrec_bib%d%dx" % (digit1, digit2)
if len(t) != 6 or t[-1:]=='%': # only the beginning of field 't' is defined, so add wildcard character:
res = run_sql("""SELECT bibx.id_bibrec FROM %s AS bibx, %s AS bx
WHERE bx.value=%%s AND bx.tag LIKE %%s
AND bibx.id_bibxxx=bx.id""" % (bibx, bx),
(p, t + "%"))
else:
res = run_sql("""SELECT bibx.id_bibrec FROM %s AS bibx, %s AS bx
WHERE bx.value=%%s AND bx.tag=%%s
AND bibx.id_bibxxx=bx.id""" % (bibx, bx),
(p, t))
for row in res:
recIDs[row[0]] = 1
return len(recIDs)
def get_mysql_recid_from_aleph_sysno(sysno):
"""Returns DB's recID for ALEPH sysno passed in the argument (e.g. "002379334CER").
Returns None in case of failure."""
out = None
res = run_sql("""SELECT bb.id_bibrec FROM bibrec_bib97x AS bb, bib97x AS b
WHERE b.value=%s AND b.tag='970__a' AND bb.id_bibxxx=b.id""",
(sysno,))
if res:
out = res[0][0]
return out
def guess_primary_collection_of_a_record(recID):
    """Return the primary collection name a record recID belongs to, by testing its 980 identifier.
       May lead to bad guesses when a collection is defined dynamically via dbquery.
       In that case, return 'cdsname'."""
out = cdsname
dbcollids = get_fieldvalues(recID, "980__a")
if dbcollids:
dbquery = "collection:" + dbcollids[0]
res = run_sql("SELECT name FROM collection WHERE dbquery=%s", (dbquery,))
if res:
out = res[0][0]
return out
def get_tag_name(tag_value, prolog="", epilog=""):
"""Return tag name from the known tag value, by looking up the 'tag' table.
Return empty string in case of failure.
       Example: input='100__%', output='first author'."""
out = ""
res = run_sql("SELECT name FROM tag WHERE value=%s", (tag_value,))
if res:
out = prolog + res[0][0] + epilog
return out
def get_fieldcodes():
"""Returns a list of field codes that may have been passed as 'search options' in URL.
Example: output=['subject','division']."""
out = []
res = run_sql("SELECT DISTINCT(code) FROM field")
for row in res:
out.append(row[0])
return out
def get_field_tags(field):
"""Returns a list of MARC tags for the field code 'field'.
Returns empty list in case of error.
Example: field='author', output=['100__%','700__%']."""
out = []
query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f
WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag
ORDER BY ft.score DESC"""
res = run_sql(query, (field, ))
for val in res:
out.append(val[0])
return out
def get_fieldvalues(recID, tag):
"""Return list of field values for field TAG inside record RECID."""
out = []
if tag == "001___":
# we have asked for recID that is not stored in bibXXx tables
out.append(str(recID))
else:
# we are going to look inside bibXXx tables
digits = tag[0:2]
try:
intdigits = int(digits)
if intdigits < 0 or intdigits > 99:
raise ValueError
except ValueError:
# invalid tag value asked for
return []
bx = "bib%sx" % digits
bibx = "bibrec_bib%sx" % digits
        query = "SELECT bx.value FROM %s AS bx, %s AS bibx " \
                " WHERE bibx.id_bibrec=%%s AND bx.id=bibx.id_bibxxx AND bx.tag LIKE %%s " \
                " ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx)
        res = run_sql(query, (recID, tag))
for row in res:
out.append(row[0])
return out
def get_fieldvalues_alephseq_like(recID, tags_in):
"""Return buffer of ALEPH sequential-like textual format with fields found in the list TAGS_IN for record RECID."""
out = ""
if type(tags_in) is not list:
tags_in = [tags_in,]
if len(tags_in) == 1 and len(tags_in[0]) == 6:
## case A: one concrete subfield asked, so print its value if found
    ## (use with care: can fool you if the field has multiple occurrences)
out += string.join(get_fieldvalues(recID, tags_in[0]),"\n")
else:
## case B: print our "text MARC" format; works safely all the time
# find out which tags to output:
dict_of_tags_out = {}
if not tags_in:
for i in range(0, 10):
for j in range(0, 10):
dict_of_tags_out["%d%d%%" % (i, j)] = 1
else:
for tag in tags_in:
if len(tag) == 0:
for i in range(0, 10):
for j in range(0, 10):
dict_of_tags_out["%d%d%%" % (i, j)] = 1
elif len(tag) == 1:
for j in range(0, 10):
dict_of_tags_out["%s%d%%" % (tag, j)] = 1
elif len(tag) < 5:
dict_of_tags_out["%s%%" % tag] = 1
                elif len(tag) >= 6:
dict_of_tags_out[tag[0:5]] = 1
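        # Example of the expansion above (illustrative): tags_in=['1'] yields
        # the keys '10%' ... '19%'; tags_in=['100'] (len < 5) yields '100%';
        # a full subfield tag like '100__a' keeps only its five-character
        # field part '100__'.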
tags_out = dict_of_tags_out.keys()
tags_out.sort()
# search all bibXXx tables as needed:
for tag in tags_out:
digits = tag[0:2]
try:
intdigits = int(digits)
if intdigits < 0 or intdigits > 99:
raise ValueError
except ValueError:
# invalid tag value asked for
continue
if tag.startswith("001") or tag.startswith("00%"):
if out:
out += "\n"
out += "%09d %s %d" % (recID, "001__", recID)
bx = "bib%sx" % digits
bibx = "bibrec_bib%sx" % digits
            query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\
                    "WHERE bb.id_bibrec=%%s AND b.id=bb.id_bibxxx AND b.tag LIKE %%s "\
                    "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx)
            res = run_sql(query, (recID, tag + '%'))
# go through fields:
field_number_old = -999
field_old = ""
for row in res:
field, value, field_number = row[0], row[1], row[2]
ind1, ind2 = field[3], field[4]
if ind1 == "_":
ind1 = ""
if ind2 == "_":
ind2 = ""
# print field tag
if field_number != field_number_old or field[:-1] != field_old[:-1]:
if out:
out += "\n"
out += "%09d %s " % (recID, field[:5])
field_number_old = field_number
field_old = field
# print subfield value
if field[0:2] == "00" and field[-1:] == "_":
out += value
else:
out += "$$%s%s" % (field[-1:], value)
return out
def record_exists(recID):
"""Return 1 if record RECID exists.
Return 0 if it doesn't exist.
Return -1 if it exists but is marked as deleted."""
out = 0
    res = run_sql("SELECT id FROM bibrec WHERE id=%s", (recID,), 1)
if res:
# record exists; now check whether it isn't marked as deleted:
dbcollids = get_fieldvalues(recID, "980__%")
if ("DELETED" in dbcollids) or (CFG_CERN_SITE and "DUMMY" in dbcollids):
out = -1 # exists, but marked as deleted
else:
out = 1 # exists fine
return out
def record_public_p(recID):
"""Return 1 if the record is public, i.e. if it can be found in the Home collection.
Return 0 otherwise.
"""
return recID in get_collection_reclist(cdsname)
def get_creation_date(recID, fmt="%Y-%m-%d"):
"Returns the creation date of the record 'recID'."
out = ""
res = run_sql("SELECT DATE_FORMAT(creation_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1)
if res:
out = res[0][0]
return out
def get_modification_date(recID, fmt="%Y-%m-%d"):
"Returns the date of last modification for the record 'recID'."
out = ""
res = run_sql("SELECT DATE_FORMAT(modification_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1)
if res:
out = res[0][0]
return out
def print_warning(req, msg, type='', prologue=' ', epilogue=' '):
"Prints warning message and flushes output."
if req and msg:
req.write(websearch_templates.tmpl_print_warning(
msg = msg,
type = type,
prologue = prologue,
epilogue = epilogue,
))
return
def print_search_info(p, f, sf, so, sp, rm, of, ot, collection=cdsname, nb_found=-1, jrec=1, rg=10,
as=0, ln=cdslang, p1="", p2="", p3="", f1="", f2="", f3="", m1="", m2="", m3="", op1="", op2="",
sc=1, pl_in_url="",
d1y=0, d1m=0, d1d=0, d2y=0, d2m=0, d2d=0, dt="",
cpu_time=-1, middle_only=0):
"""Prints stripe with the information on 'collection' and 'nb_found' results and CPU time.
Also, prints navigation links (beg/next/prev/end) inside the results set.
       If middle_only is set to 1, it will only print the middle box information (beg/next/prev/end/etc) links.
This is suitable for displaying navigation links at the bottom of the search results page."""
out = ""
# sanity check:
if jrec < 1:
jrec = 1
if jrec > nb_found:
jrec = max(nb_found-rg+1, 1)
return websearch_templates.tmpl_print_search_info(
ln = ln,
weburl = weburl,
collection = collection,
as = as,
collection_name = get_coll_i18nname(collection, ln),
collection_id = get_colID(collection),
middle_only = middle_only,
rg = rg,
nb_found = nb_found,
sf = sf,
so = so,
rm = rm,
of = of,
ot = ot,
p = p,
f = f,
p1 = p1,
p2 = p2,
p3 = p3,
f1 = f1,
f2 = f2,
f3 = f3,
m1 = m1,
m2 = m2,
m3 = m3,
op1 = op1,
op2 = op2,
pl_in_url = pl_in_url,
d1y = d1y,
d1m = d1m,
d1d = d1d,
d2y = d2y,
d2m = d2m,
d2d = d2d,
dt = dt,
jrec = jrec,
sc = sc,
sp = sp,
all_fieldcodes = get_fieldcodes(),
cpu_time = cpu_time,
)
def print_results_overview(req, colls, results_final_nb_total, results_final_nb, cpu_time, ln=cdslang, ec=[]):
"""Prints results overview box with links to particular collections below."""
out = ""
new_colls = []
for coll in colls:
new_colls.append({
'id': get_colID(coll),
'code': coll,
'name': get_coll_i18nname(coll, ln),
})
return websearch_templates.tmpl_print_results_overview(
ln = ln,
weburl = weburl,
results_final_nb_total = results_final_nb_total,
results_final_nb = results_final_nb,
cpu_time = cpu_time,
colls = new_colls,
ec = ec,
)
def sort_records(req, recIDs, sort_field='', sort_order='d', sort_pattern='', verbose=0, of='hb', ln=cdslang):
"""Sort records in 'recIDs' list according sort field 'sort_field' in order 'sort_order'.
If more than one instance of 'sort_field' is found for a given record, try to choose that that is given by
'sort pattern', for example "sort by report number that starts by CERN-PS".
Note that 'sort_field' can be field code like 'author' or MARC tag like '100__a' directly."""
_ = gettext_set_language(ln)
## check arguments:
if not sort_field:
return recIDs
if len(recIDs) > CFG_WEBSEARCH_NB_RECORDS_TO_SORT:
if of.startswith('h'):
print_warning(req, _("Sorry, sorting is allowed on sets of up to %d records only. Using default sort order.") % CFG_WEBSEARCH_NB_RECORDS_TO_SORT, "Warning")
return recIDs
sort_fields = string.split(sort_field, ",")
recIDs_dict = {}
recIDs_out = []
## first deduce sorting MARC tag out of the 'sort_field' argument:
tags = []
for sort_field in sort_fields:
if sort_field and str(sort_field[0:2]).isdigit():
            # sort_field starts with two digits, so this is probably a MARC tag already
tags.append(sort_field)
else:
# let us check the 'field' table
query = """SELECT DISTINCT(t.value) FROM tag AS t, field_tag AS ft, field AS f
WHERE f.code='%s' AND ft.id_field=f.id AND t.id=ft.id_tag
ORDER BY ft.score DESC""" % sort_field
res = run_sql(query)
if res:
for row in res:
tags.append(row[0])
else:
if of.startswith('h'):
print_warning(req, _("Sorry, %s does not seem to be a valid sort option. Choosing title sort instead.") % sort_field, "Error")
tags.append("245__a")
if verbose >= 3:
print_warning(req, "Sorting by tags %s." % tags)
if sort_pattern:
print_warning(req, "Sorting preferentially by %s." % sort_pattern)
## check if we have sorting tag defined:
if tags:
# fetch the necessary field values:
for recID in recIDs:
val = "" # will hold value for recID according to which sort
vals = [] # will hold all values found in sorting tag for recID
for tag in tags:
vals.extend(get_fieldvalues(recID, tag))
if sort_pattern:
# try to pick that tag value that corresponds to sort pattern
bingo = 0
for v in vals:
if v.lower().startswith(sort_pattern.lower()): # bingo!
bingo = 1
val = v
break
if not bingo: # sort_pattern not present, so add other vals after spaces
val = sort_pattern + " " + string.join(vals)
else:
# no sort pattern defined, so join them all together
val = string.join(vals)
val = strip_accents(val.lower()) # sort values regardless of accents and case
if recIDs_dict.has_key(val):
recIDs_dict[val].append(recID)
else:
recIDs_dict[val] = [recID]
# sort them:
recIDs_dict_keys = recIDs_dict.keys()
recIDs_dict_keys.sort()
# now that keys are sorted, create output array:
for k in recIDs_dict_keys:
for s in recIDs_dict[k]:
recIDs_out.append(s)
# ascending or descending?
if sort_order == 'a':
recIDs_out.reverse()
# okay, we are done
return recIDs_out
else:
# good, no sort needed
return recIDs
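The grouping-and-sorting strategy used above (bucket recIDs by a normalized sort value, then walk the sorted keys) can be sketched in isolation. `sort_ids_by_value` is a hypothetical helper for illustration only; the real code additionally strips accents, resolves MARC tags, and applies the sort pattern:

```python
def sort_ids_by_value(ids, values):
    """Group ids by their lowercased sort value, then emit them in
    ascending value order -- mirroring the recIDs_dict logic above."""
    groups = {}
    for i, val in zip(ids, values):
        # one bucket per normalized value; ties keep their input order
        groups.setdefault(val.lower(), []).append(i)
    out = []
    for key in sorted(groups):
        out.extend(groups[key])
    return out

print(sort_ids_by_value([1, 2, 3], ["b", "A", "b"]))  # [2, 1, 3]
```

Note that in `sort_records` it is the *ascending* case (`sort_order == 'a'`) that reverses this list, because `print_records` consumes the record list from tail to head.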
def print_records(req, recIDs, jrec=1, rg=10, format='hb', ot='', ln=cdslang, relevances=[], relevances_prologue="(", relevances_epilogue="%%)", decompress=zlib.decompress, search_pattern='', print_records_prologue_p=True, print_records_epilogue_p=True, verbose=0, tab=''):
"""
Prints list of records 'recIDs' formatted according to 'format' in
groups of 'rg' starting from 'jrec'.
Assumes that the input list 'recIDs' is sorted in reverse order,
so it counts records from tail to head.
A value of 'rg=-9999' means to print all records: to be used with care.
Print also list of RELEVANCES for each record (if defined), in
between RELEVANCE_PROLOGUE and RELEVANCE_EPILOGUE.
Print prologue and/or epilogue specific to 'format' if
    'print_records_prologue_p' and/or 'print_records_epilogue_p' are
True.
"""
# load the right message language
_ = gettext_set_language(ln)
# sanity checking:
if req is None:
return
# get user_info (for formatting based on user)
user_info = collect_user_info(req)
if len(recIDs):
nb_found = len(recIDs)
if rg == -9999: # print all records
rg = nb_found
else:
rg = abs(rg)
if jrec < 1: # sanity checks
jrec = 1
if jrec > nb_found:
jrec = max(nb_found-rg+1, 1)
# will print records from irec_max to irec_min excluded:
irec_max = nb_found - jrec
irec_min = nb_found - jrec - rg
if irec_min < 0:
irec_min = -1
if irec_max >= nb_found:
irec_max = nb_found - 1
#req.write("%s:%d-%d" % (recIDs, irec_min, irec_max))
if format.startswith('x'):
# print header if needed
if print_records_prologue_p:
print_records_prologue(req, format)
# print records
recIDs_to_print = [recIDs[x] for x in range(irec_max, irec_min, -1)]
format_records(recIDs_to_print,
format,
ln=ln,
search_pattern=search_pattern,
record_separator="\n",
user_info=user_info,
req=req)
# print footer if needed
if print_records_epilogue_p:
print_records_epilogue(req, format)
elif format.startswith('t') or str(format[0:3]).isdigit():
# we are doing plain text output:
for irec in range(irec_max, irec_min, -1):
x = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
req.write(x)
if x:
req.write('\n')
elif format == 'excel':
recIDs_to_print = [recIDs[x] for x in range(irec_max, irec_min, -1)]
create_excel(recIDs=recIDs_to_print, req=req, ln=ln)
else:
# we are doing HTML output:
if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"):
# portfolio and on-the-fly formats:
for irec in range(irec_max, irec_min, -1):
req.write(print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose))
elif format.startswith("hb"):
# HTML brief format:
req.write(websearch_templates.tmpl_record_format_htmlbrief_header(
ln = ln))
for irec in range(irec_max, irec_min, -1):
row_number = jrec+irec_max-irec
recid = recIDs[irec]
if relevances and relevances[irec]:
relevance = relevances[irec]
else:
relevance = ''
record = print_record(recIDs[irec], format, ot, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
req.write(websearch_templates.tmpl_record_format_htmlbrief_body(
ln = ln,
recid = recid,
row_number = row_number,
relevance = relevance,
record = record,
relevances_prologue = relevances_prologue,
relevances_epilogue = relevances_epilogue,
))
req.write(websearch_templates.tmpl_record_format_htmlbrief_footer(
ln = ln))
elif format.startswith("hd"):
# HTML detailed format:
for irec in range(irec_max, irec_min, -1):
unordered_tabs = get_detailed_page_tabs(get_colID(guess_primary_collection_of_a_record(recIDs[irec])),
recIDs[irec], ln=ln)
ordered_tabs_id = [(tab_id, values['order']) for (tab_id, values) in unordered_tabs.iteritems()]
ordered_tabs_id.sort(lambda x,y: cmp(x[1],y[1]))
link_ln = ''
if ln != cdslang:
link_ln = '?ln=%s' % ln
tabs = [(unordered_tabs[tab_id]['label'], \
'%s/record/%s/%s%s' % (weburl, recIDs[irec], tab_id, link_ln), \
tab_id == tab,
unordered_tabs[tab_id]['enabled']) \
for (tab_id, order) in ordered_tabs_id
if unordered_tabs[tab_id]['visible'] == True]
content = ''
# load content
if tab == 'usage':
r = calculate_reading_similarity_list(recIDs[irec], "downloads")
downloadsimilarity = None
downloadhistory = None
#if r:
# downloadsimilarity = r
if CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS:
downloadhistory = create_download_history_graph_and_box(recIDs[irec], ln)
r = calculate_reading_similarity_list(recIDs[irec], "pageviews")
viewsimilarity = None
if r: viewsimilarity = r
content = websearch_templates.tmpl_detailed_record_statistics(recIDs[irec],
ln,
downloadsimilarity=downloadsimilarity,
downloadhistory=downloadhistory,
viewsimilarity=viewsimilarity)
req.write(webstyle_templates.detailed_record_container(content,
recIDs[irec],
tabs,
ln))
elif tab == 'citations':
citinglist = []
citationhistory = None
recid = recIDs[irec]
selfcited = get_self_cited_by(recid)
r = calculate_cited_by_list(recid)
if r:
citinglist = r
citationhistory = create_citation_history_graph_and_box(recid, ln)
r = calculate_co_cited_with_list(recid)
cociting = None
if r:
cociting = r
content = websearch_templates.tmpl_detailed_record_citations(recid,
ln,
citinglist=citinglist,
citationhistory=citationhistory,
cociting=cociting,
selfcited=selfcited)
req.write(webstyle_templates.detailed_record_container(content,
recid,
tabs,
ln))
elif tab == 'references':
content = format_record(recIDs[irec], 'HDREF', ln=ln, user_info=user_info, verbose=verbose)
req.write(webstyle_templates.detailed_record_container(content,
recIDs[irec],
tabs,
ln))
else:
# Metadata tab
content = print_record(recIDs[irec], format, ot, ln,
search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
creationdate = None
modifydate = None
if record_exists(recIDs[irec]) == 1:
creationdate = get_creation_date(recIDs[irec])
modifydate = get_modification_date(recIDs[irec])
content = websearch_templates.tmpl_detailed_record_metadata(
recID = recIDs[irec],
ln = ln,
format = format,
creationdate = creationdate,
modifydate = modifydate,
content = content)
req.write(webstyle_templates.detailed_record_container(content,
recIDs[irec],
tabs,
ln=ln,
creationdate=creationdate,
modifydate=modifydate,
show_short_rec_p=False))
if len(tabs) > 0:
# Add the mini box at bottom of the page
if CFG_WEBCOMMENT_ALLOW_REVIEWS:
from invenio.webcomment import get_mini_reviews
reviews = get_mini_reviews(recid = recIDs[irec], ln=ln)
else:
reviews = ''
actions = format_record(recIDs[irec], 'HDACT', ln=ln, user_info=user_info, verbose=verbose)
files = format_record(recIDs[irec], 'HDFILE', ln=ln, user_info=user_info, verbose=verbose)
req.write(webstyle_templates.detailed_record_mini_panel(recIDs[irec],
ln,
format,
files=files,
reviews=reviews,
actions=actions))
else:
# Other formats
for irec in range(irec_max, irec_min, -1):
req.write(print_record(recIDs[irec], format, ot, ln,
search_pattern=search_pattern,
user_info=user_info, verbose=verbose))
else:
print_warning(req, _("Use different search terms."))
def print_records_prologue(req, format):
"""
Print the appropriate prologue for list of records in the given
format.
"""
prologue = "" # no prologue needed for HTML or Text formats
if format.startswith('xm'):
prologue = websearch_templates.tmpl_xml_marc_prologue()
elif format.startswith('xn'):
prologue = websearch_templates.tmpl_xml_nlm_prologue()
elif format.startswith('xr'):
prologue = websearch_templates.tmpl_xml_rss_prologue()
elif format.startswith('x'):
prologue = websearch_templates.tmpl_xml_default_prologue()
req.write(prologue)
def print_records_epilogue(req, format):
"""
Print the appropriate epilogue for list of records in the given
format.
"""
epilogue = "" # no epilogue needed for HTML or Text formats
if format.startswith('xm'):
epilogue = websearch_templates.tmpl_xml_marc_epilogue()
elif format.startswith('xn'):
epilogue = websearch_templates.tmpl_xml_nlm_epilogue()
elif format.startswith('xr'):
epilogue = websearch_templates.tmpl_xml_rss_epilogue()
elif format.startswith('x'):
epilogue = websearch_templates.tmpl_xml_default_epilogue()
req.write(epilogue)
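The prefix dispatch shared by `print_records_prologue` and `print_records_epilogue` can be sketched as one ordered table lookup. The wrapper strings below are placeholders (the real ones come from `websearch_templates`); what matters is that the specific prefixes are tested before the catch-all `'x'`:

```python
def xml_wrapper_for(format):
    """Pick an XML (prologue, epilogue) pair by output-format prefix.
    Order matters: 'xm'/'xn'/'xr' must be tested before plain 'x',
    exactly as in the if/elif chains above. Placeholder strings only."""
    for prefix, pair in (('xm', ('<marc>', '</marc>')),
                         ('xn', ('<nlm>', '</nlm>')),
                         ('xr', ('<rss>', '</rss>')),
                         ('x',  ('<xml>', '</xml>'))):
        if format.startswith(prefix):
            return pair
    return ('', '')  # HTML and text formats need no wrapper

print(xml_wrapper_for('xm'))  # ('<marc>', '</marc>')
print(xml_wrapper_for('hb'))  # ('', '')
```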
def print_record(recID, format='hb', ot='', ln=cdslang, decompress=zlib.decompress,
search_pattern=None, user_info=None, verbose=0):
"""Prints record 'recID' formatted accoding to 'format'."""
_ = gettext_set_language(ln)
out = ""
# sanity check:
record_exist_p = record_exists(recID)
if record_exist_p == 0: # doesn't exist
return out
# New Python BibFormat procedure for formatting
# Old procedure follows further below
# We must still check some special formats, but these
# should disappear when BibFormat improves.
if not (CFG_BIBFORMAT_USE_OLD_BIBFORMAT \
or format.lower().startswith('t') \
or format.lower().startswith('hm') \
or str(format[0:3]).isdigit() \
or ot):
# Unspecified format is hd
if format == '':
format = 'hd'
if record_exist_p == -1 and get_output_format_content_type(format) == 'text/html':
# HTML output displays a default value for deleted records.
# Other format have to deal with it.
out += _("The record has been deleted.")
else:
out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
# at the end of HTML brief mode, print the "Detailed record" functionality:
if format.lower().startswith('hb') and \
format.lower() != 'hb_p':
out += websearch_templates.tmpl_print_record_brief_links(
ln = ln,
recID = recID,
weburl = weburl
)
return out
# Old PHP BibFormat procedure for formatting
# print record opening tags, if needed:
if format == "marcxml" or format == "oai_dc":
out += " \n"
out += " \n"
for oai_id in get_fieldvalues(recID, CFG_OAI_ID_FIELD):
out += " %s\n" % oai_id
out += " %s\n" % get_modification_date(recID)
out += " \n"
out += " \n"
if format.startswith("xm") or format == "marcxml":
# look for detailed format existence:
query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format)
res = run_sql(query, None, 1)
if res and record_exist_p == 1:
# record 'recID' is formatted in 'format', so print it
out += "%s" % decompress(res[0][0])
else:
# record 'recID' is not formatted in 'format' -- they are not in "bibfmt" table; so fetch all the data from "bibXXx" tables:
if format == "marcxml":
out += """ \n"""
out += " %d\n" % int(recID)
elif format.startswith("xm"):
out += """ \n"""
out += " %d\n" % int(recID)
if record_exist_p == -1:
# deleted record, so display only OAI ID and 980:
oai_ids = get_fieldvalues(recID, CFG_OAI_ID_FIELD)
if oai_ids:
out += "%s\n" % \
(CFG_OAI_ID_FIELD[0:3], CFG_OAI_ID_FIELD[3:4], CFG_OAI_ID_FIELD[4:5], CFG_OAI_ID_FIELD[5:6], oai_ids[0])
out += "DELETED\n"
else:
# controlfields
query = "SELECT b.tag,b.value,bb.field_number FROM bib00x AS b, bibrec_bib00x AS bb "\
"WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '00%%' "\
"ORDER BY bb.field_number, b.tag ASC" % recID
res = run_sql(query)
for row in res:
field, value = row[0], row[1]
value = encode_for_xml(value)
out += """ %s\n""" % \
(encode_for_xml(field[0:3]), value)
# datafields
i = 1 # Do not process bib00x and bibrec_bib00x, as
# they are controlfields. So start at bib01x and
# bibrec_bib00x (and set i = 0 at the end of
# first loop)
for digit1 in range(0, 10):
for digit2 in range(i, 10):
bx = "bib%d%dx" % (digit1, digit2)
bibx = "bibrec_bib%d%dx" % (digit1, digit2)
query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\
"WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '%s%%' "\
"ORDER BY bb.field_number, b.tag ASC" % (bx, bibx, recID, str(digit1)+str(digit2))
res = run_sql(query)
field_number_old = -999
field_old = ""
for row in res:
field, value, field_number = row[0], row[1], row[2]
ind1, ind2 = field[3], field[4]
if ind1 == "_" or ind1 == "":
ind1 = " "
if ind2 == "_" or ind2 == "":
ind2 = " "
# print field tag
if field_number != field_number_old or field[:-1] != field_old[:-1]:
                                if field_number_old != -999:
                                    out += """        </datafield>\n"""
                                out += """        <datafield tag="%s" ind1="%s" ind2="%s">\n""" % \
                                       (encode_for_xml(field[0:3]), encode_for_xml(ind1), encode_for_xml(ind2))
field_number_old = field_number
field_old = field
# print subfield value
value = encode_for_xml(value)
out += """ %s\n""" % \
(encode_for_xml(field[-1:]), value)
# all fields/subfields printed in this run, so close the tag:
if field_number_old != -999:
out += """ \n"""
i = 0 # Next loop should start looking at bib%0 and bibrec_bib00x
# we are at the end of printing the record:
out += " \n"
elif format == "xd" or format == "oai_dc":
# XML Dublin Core format, possibly OAI -- select only some bibXXx fields:
out += """ \n"""
if record_exist_p == -1:
out += ""
else:
for f in get_fieldvalues(recID, "041__a"):
out += " %s\n" % f
for f in get_fieldvalues(recID, "100__a"):
out += " %s\n" % encode_for_xml(f)
for f in get_fieldvalues(recID, "700__a"):
out += " %s\n" % encode_for_xml(f)
for f in get_fieldvalues(recID, "245__a"):
out += " %s\n" % encode_for_xml(f)
for f in get_fieldvalues(recID, "65017a"):
out += " %s\n" % encode_for_xml(f)
for f in get_fieldvalues(recID, "8564_u"):
out += " %s\n" % encode_for_xml(f)
for f in get_fieldvalues(recID, "520__a"):
out += " %s\n" % encode_for_xml(f)
out += " %s\n" % get_creation_date(recID)
out += " \n"
elif str(format[0:3]).isdigit():
# user has asked to print some fields only
if format == "001":
out += "%s\n" % (format, recID, format)
else:
vals = get_fieldvalues(recID, format)
for val in vals:
out += "%s\n" % (format, val, format)
elif format.startswith('t'):
## user directly asked for some tags to be displayed only
if record_exist_p == -1:
out += get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"])
else:
out += get_fieldvalues_alephseq_like(recID, ot)
elif format == "hm":
if record_exist_p == -1:
out += "
"
elif format == "hd":
# HTML detailed format
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
# look for detailed format existence:
query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format)
res = run_sql(query, None, 1)
if res:
# record 'recID' is formatted in 'format', so print it
out += "%s" % decompress(res[0][0])
else:
# record 'recID' is not formatted in 'format', so try to call BibFormat on the fly or use default format:
out_record_in_format = call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
if out_record_in_format:
out += out_record_in_format
else:
out += websearch_templates.tmpl_print_record_detailed(
ln = ln,
recID = recID,
weburl = weburl,
)
elif format.startswith("hb_") or format.startswith("hd_"):
# underscore means that HTML brief/detailed formats should be called on-the-fly; suitable for testing formats
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
elif format.startswith("hx"):
# BibTeX format, called on the fly:
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
out += call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
elif format.startswith("hs"):
# for citation/download similarity navigation links:
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
            out += '<a href="%s">' % websearch_templates.build_search_url(recid=recID, ln=ln)
# firstly, title:
titles = get_fieldvalues(recID, "245__a")
if titles:
for title in titles:
out += "%s" % title
else:
# usual title not found, try conference title:
titles = get_fieldvalues(recID, "111__a")
if titles:
for title in titles:
out += "%s" % title
else:
# just print record ID:
out += "%s %d" % (get_field_i18nname("record ID", ln), recID)
out += ""
# secondly, authors:
authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a")
if authors:
out += " - %s" % authors[0]
if len(authors) > 1:
out += " et al"
# thirdly publication info:
publinfos = get_fieldvalues(recID, "773__s")
if not publinfos:
publinfos = get_fieldvalues(recID, "909C4s")
if not publinfos:
publinfos = get_fieldvalues(recID, "037__a")
if not publinfos:
publinfos = get_fieldvalues(recID, "088__a")
if publinfos:
out += " - %s" % publinfos[0]
else:
# fourthly publication year (if not publication info):
years = get_fieldvalues(recID, "773__y")
if not years:
years = get_fieldvalues(recID, "909C4y")
if not years:
years = get_fieldvalues(recID, "260__c")
if years:
out += " (%s)" % years[0]
else:
# HTML brief format by default
if record_exist_p == -1:
out += _("The record has been deleted.")
else:
query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format)
res = run_sql(query)
if res:
# record 'recID' is formatted in 'format', so print it
out += "%s" % decompress(res[0][0])
else:
            # record 'recID' is not formatted in 'format', so try to call BibFormat on the fly, or use default format:
if CFG_WEBSEARCH_CALL_BIBFORMAT:
out_record_in_format = call_bibformat(recID, format, ln, search_pattern=search_pattern,
user_info=user_info, verbose=verbose)
if out_record_in_format:
out += out_record_in_format
else:
out += websearch_templates.tmpl_print_record_brief(
ln = ln,
recID = recID,
weburl = weburl,
)
else:
out += websearch_templates.tmpl_print_record_brief(
ln = ln,
recID = recID,
weburl = weburl,
)
# at the end of HTML brief mode, print the "Detailed record" functionality:
if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"):
pass # do nothing for portfolio and on-the-fly formats
else:
out += websearch_templates.tmpl_print_record_brief_links(
ln = ln,
recID = recID,
weburl = weburl,
)
# print record closing tags, if needed:
if format == "marcxml" or format == "oai_dc":
out += " \n"
out += " \n"
return out
def encode_for_xml(s):
"Encode special chars in string so that it would be XML-compliant."
    s = string.replace(s, '&', '&amp;')
    s = string.replace(s, '<', '&lt;')
return s
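The replacement order in `encode_for_xml` matters: `&` must be escaped before `<`, otherwise the ampersand introduced by `&lt;` would itself be re-escaped. A minimal self-contained sketch (using `str.replace` instead of the module-level `string.replace`):

```python
def encode_for_xml_sketch(s):
    """Escape the two characters that break XML well-formedness.
    '&' is replaced first; doing it after '<' would corrupt '&lt;'."""
    s = s.replace('&', '&amp;')
    s = s.replace('<', '&lt;')
    return s

print(encode_for_xml_sketch('a < b & c'))  # a &lt; b &amp; c
```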
def call_bibformat(recID, format="HD", ln=cdslang, search_pattern=None, user_info=None, verbose=0):
"""
Calls BibFormat and returns formatted record.
BibFormat will decide by itself if old or new BibFormat must be used.
"""
keywords = []
if search_pattern is not None:
units = create_basic_search_units(None, str(search_pattern), None)
keywords = [unit[1] for unit in units if unit[0] != '-']
return format_record(recID,
of=format,
ln=ln,
search_pattern=keywords,
user_info=user_info,
verbose=verbose)
def log_query(hostname, query_args, uid=-1):
"""
Log query into the query and user_query tables.
Return id_query or None in case of problems.
"""
id_query = None
if uid > 0:
# log the query only if uid is reasonable
res = run_sql("SELECT id FROM query WHERE urlargs=%s", (query_args,), 1)
try:
id_query = res[0][0]
except:
id_query = run_sql("INSERT INTO query (type, urlargs) VALUES ('r', %s)", (query_args,))
if id_query:
run_sql("INSERT INTO user_query (id_user, id_query, hostname, date) VALUES (%s, %s, %s, %s)",
(uid, id_query, hostname,
time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())))
return id_query
def log_query_info(action, p, f, colls, nb_records_found_total=-1):
"""Write some info to the log file for later analysis."""
try:
        log = open(CFG_LOGDIR + "/search.log", "a")
log.write(time.strftime("%Y%m%d%H%M%S#", time.localtime()))
log.write(action+"#")
log.write(p+"#")
log.write(f+"#")
for coll in colls[:-1]:
log.write("%s," % coll)
log.write("%s#" % colls[-1])
log.write("%d" % nb_records_found_total)
log.write("\n")
log.close()
except:
pass
return
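The sequence of `log.write` calls above produces one `'#'`-separated line per search. The shape of that line can be sketched with a single hypothetical helper (the real code writes the pieces directly and assumes `colls` is non-empty):

```python
import time

def format_search_log_line(action, p, f, colls, nb_found, now=None):
    """Build one search.log line: timestamp#action#pattern#field#colls#nb.
    Hypothetical helper mirroring the log.write() sequence above."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(now))
    # collections are comma-joined, matching the colls[:-1]/colls[-1] writes
    return "%s#%s#%s#%s#%s#%d\n" % (stamp, action, p, f, ",".join(colls), nb_found)

line = format_search_log_line("search", "ellis", "author", ["Books", "Theses"], 42)
print(line)
```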
def wash_url_argument(var, new_type):
"""Wash list argument into 'new_type', that can be 'list',
'str', or 'int'. Useful for washing mod_python passed
arguments, that are all lists of strings (URL args may be
multiple), but we sometimes want only to take the first value,
and sometimes to represent it as string or numerical value."""
out = []
    if new_type == 'list': # return list
if type(var) is list:
out = var
else:
out = [var]
elif new_type == 'str': # return str
if type(var) is list:
try:
out = "%s" % var[0]
except:
out = ""
elif type(var) is str:
out = var
else:
out = "%s" % var
elif new_type == 'int': # return int
if type(var) is list:
try:
out = string.atoi(var[0])
except:
out = 0
elif type(var) is int:
out = var
elif type(var) is str:
try:
out = string.atoi(var)
except:
out = 0
else:
out = 0
return out
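Because mod_python delivers every URL argument as a list of strings, `wash_url_argument` has to coerce values defensively. A compact sketch of the same coercion rules (`wash_url_argument_sketch` is an illustrative name, not part of the module):

```python
def wash_url_argument_sketch(var, new_type):
    """Coerce a mod_python-style argument (often a list of strings)
    into a list, str, or int, defaulting safely on bad input."""
    if new_type == 'list':
        return var if isinstance(var, list) else [var]
    if new_type == 'str':
        if isinstance(var, list):
            return str(var[0]) if var else ""
        return str(var)
    if new_type == 'int':
        # take the first element of a list, then attempt integer conversion
        candidate = var[0] if isinstance(var, list) and var else var
        try:
            return int(candidate)
        except (TypeError, ValueError):
            return 0
    return []

print(wash_url_argument_sketch(['7'], 'int'))  # 7
print(wash_url_argument_sketch('abc', 'int'))  # 0
```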
### CALLABLES
def perform_request_search(req=None, cc=cdsname, c=None, p="", f="", rg=10, sf="", so="d", sp="", rm="", of="id", ot="", as=0,
p1="", f1="", m1="", op1="", p2="", f2="", m2="", op2="", p3="", f3="", m3="", sc=0, jrec=0,
recid=-1, recidb=-1, sysno="", id=-1, idb=-1, sysnb="", action="", d1="",
d1y=0, d1m=0, d1d=0, d2="", d2y=0, d2m=0, d2d=0, dt="", verbose=0, ap=0, ln=cdslang, ec=None, tab=""):
"""Perform search or browse request, without checking for
authentication. Return list of recIDs found, if of=id.
Otherwise create web page.
The arguments are as follows:
req - mod_python Request class instance.
cc - current collection (e.g. "ATLAS"). The collection the
user started to search/browse from.
c - collection list (e.g. ["Theses", "Books"]). The
collections user may have selected/deselected when
starting to search from 'cc'.
p - pattern to search for (e.g. "ellis and muon or kaon").
f - field to search within (e.g. "author").
rg - records in groups of (e.g. "10"). Defines how many hits
per collection in the search results page are
displayed.
sf - sort field (e.g. "title").
so - sort order ("a"=ascending, "d"=descending).
sp - sort pattern (e.g. "CERN-") -- in case there are more
values in a sort field, this argument tells which one
to prefer
rm - ranking method (e.g. "jif"). Defines whether results
should be ranked by some known ranking method.
of - output format (e.g. "hb"). Usually starting "h" means
HTML output (and "hb" for HTML brief, "hd" for HTML
detailed), "x" means XML output, "t" means plain text
output, "id" means no output at all but to return list
of recIDs found. (Suitable for high-level API.)
ot - output only these MARC tags (e.g. "100,700,909C0b").
Useful if only some fields are to be shown in the
output, e.g. for library to control some fields.
as - advanced search ("0" means no, "1" means yes). Whether
search was called from within the advanced search
interface.
p1 - first pattern to search for in the advanced search
interface. Much like 'p'.
f1 - first field to search within in the advanced search
interface. Much like 'f'.
m1 - first matching type in the advanced search interface.
("a" all of the words, "o" any of the words, "e" exact
phrase, "p" partial phrase, "r" regular expression).
op1 - first operator, to join the first and the second unit
in the advanced search interface. ("a" add, "o" or,
"n" not).
p2 - second pattern to search for in the advanced search
interface. Much like 'p'.
f2 - second field to search within in the advanced search
interface. Much like 'f'.
m2 - second matching type in the advanced search interface.
("a" all of the words, "o" any of the words, "e" exact
phrase, "p" partial phrase, "r" regular expression).
op2 - second operator, to join the second and the third unit
in the advanced search interface. ("a" add, "o" or,
"n" not).
p3 - third pattern to search for in the advanced search
interface. Much like 'p'.
f3 - third field to search within in the advanced search
interface. Much like 'f'.
m3 - third matching type in the advanced search interface.
("a" all of the words, "o" any of the words, "e" exact
phrase, "p" partial phrase, "r" regular expression).
sc - split by collection ("0" no, "1" yes). Governs whether
we want to present the results in a single huge list,
               or split by collection.
jrec - jump to record (e.g. "234"). Used for navigation
inside the search results.
recid - display record ID (e.g. "20000"). Do not
search/browse but go straight away to the Detailed
record page for the given recID.
recidb - display record ID bis (e.g. "20010"). If greater than
'recid', then display records from recid to recidb.
Useful for example for dumping records from the
database for reformatting.
sysno - display old system SYS number (e.g. ""). If you
migrate to CDS Invenio from another system, and store your
old SYS call numbers, you can use them instead of recid
if you wish so.
id - the same as recid, in case recid is not set. For
backwards compatibility.
idb - the same as recid, in case recidb is not set. For
backwards compatibility.
sysnb - the same as sysno, in case sysno is not set. For
backwards compatibility.
action - action to do. "SEARCH" for searching, "Browse" for
browsing. Default is to search.
          d1 - first datetime in full YYYY-mm-dd HH:MM:SS format
(e.g. "1998-08-23 12:34:56"). Useful for search limits
on creation/modification date (see 'dt' argument
below). Note that 'd1' takes precedence over d1y, d1m,
d1d if these are defined.
d1y - first date's year (e.g. "1998"). Useful for search
limits on creation/modification date.
d1m - first date's month (e.g. "08"). Useful for search
limits on creation/modification date.
d1d - first date's day (e.g. "23"). Useful for search
limits on creation/modification date.
          d2 - second datetime in full YYYY-mm-dd HH:MM:SS format
(e.g. "1998-09-02 12:34:56"). Useful for search limits
on creation/modification date (see 'dt' argument
below). Note that 'd2' takes precedence over d2y, d2m,
d2d if these are defined.
d2y - second date's year (e.g. "1998"). Useful for search
limits on creation/modification date.
d2m - second date's month (e.g. "09"). Useful for search
limits on creation/modification date.
d2d - second date's day (e.g. "02"). Useful for search
limits on creation/modification date.
dt - first and second date's type (e.g. "c"). Specifies
whether to search in creation dates ("c") or in
modification dates ("m"). When dt is not set and d1*
and d2* are set, the default is "c".
verbose - verbose level (0=min, 9=max). Useful to print some
internal information on the searching process in case
something goes wrong.
ap - alternative patterns (0=no, 1=yes). In case no exact
match is found, the search engine can try alternative
patterns e.g. to replace non-alphanumeric characters by
a boolean query. ap defines if this is wanted.
ln - language of the search interface (e.g. "en"). Useful
for internationalization.
ec - list of external search engines to search as well
(e.g. "SPIRES HEP").
"""
selected_external_collections_infos = None
# wash all arguments requiring special care
try:
(cc, colls_to_display, colls_to_search) = wash_colls(cc, c, sc) # which colls to search and to display?
except InvenioWebSearchUnknownCollectionError, exc:
colname = exc.colname
if of.startswith("h"):
page_start(req, of, cc, as, ln, getUid(req),
websearch_templates.tmpl_collection_not_found_page_title(colname, ln))
req.write(websearch_templates.tmpl_collection_not_found_page_body(colname, ln))
return page_end(req, of, ln)
elif of == "id":
return []
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
else:
return page_end(req, of, ln)
p = wash_pattern(p)
f = wash_field(f)
p1 = wash_pattern(p1)
f1 = wash_field(f1)
p2 = wash_pattern(p2)
f2 = wash_field(f2)
p3 = wash_pattern(p3)
f3 = wash_field(f3)
datetext1, datetext2 = wash_dates(d1, d1y, d1m, d1d, d2, d2y, d2m, d2d)
_ = gettext_set_language(ln)
# backwards compatibility: id, idb, sysnb -> recid, recidb, sysno (if applicable)
if sysnb != "" and sysno == "":
sysno = sysnb
if id > 0 and recid == -1:
recid = id
if idb > 0 and recidb == -1:
recidb = idb
    # TODO deduce passed search limiting criteria (if applicable)
pl, pl_in_url = "", "" # no limits by default
if action != "browse" and req and req.args: # we do not want to add options while browsing or while calling via command-line
fieldargs = cgi.parse_qs(req.args)
for fieldcode in get_fieldcodes():
if fieldargs.has_key(fieldcode):
for val in fieldargs[fieldcode]:
pl += "+%s:\"%s\" " % (fieldcode, val)
pl_in_url += "&%s=%s" % (urllib.quote(fieldcode), urllib.quote(val))
# deduce recid from sysno argument (if applicable):
if sysno: # ALEPH SYS number was passed, so deduce DB recID for the record:
recid = get_mysql_recid_from_aleph_sysno(sysno)
if recid is None:
recid = 0 # use recid 0 to indicate that this sysno does not exist
# deduce collection we are in (if applicable):
if recid > 0:
cc = guess_primary_collection_of_a_record(recid)
# deduce user id (if applicable):
try:
uid = getUid(req)
except:
uid = 0
## 0 - start output
if recid >= 0: # recid can be 0 if deduced from sysno and if such sysno does not exist
## 1 - detailed record display
title, description, keywords = \
websearch_templates.tmpl_record_page_header_content(req, recid, ln)
page_start(req, of, cc, as, ln, uid, title, description, keywords, recid, tab)
# Default format is hb but we are in detailed -> change 'of'
if of == "hb":
of = "hd"
if record_exists(recid):
if recidb <= recid: # sanity check
recidb = recid + 1
if of == "id":
return [recidx for recidx in range(recid, recidb) if record_exists(recidx)]
else:
print_records(req, range(recid, recidb), -1, -9999, of, ot, ln, search_pattern=p, verbose=verbose, tab=tab)
if req and of.startswith("h"): # register detailed record page view event
client_ip_address = str(req.get_remote_host(apache.REMOTE_NOLOOKUP))
register_page_view_event(recid, uid, client_ip_address)
else: # record does not exist
if of == "id":
return []
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
elif of.startswith("h"):
print_warning(req, _("Requested record does not seem to exist."))
elif action == "browse":
## 2 - browse needed
page_start(req, of, cc, as, ln, uid, _("Browse"))
if of.startswith("h"):
req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1,
p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action))
try:
if as == 1 or (p1 or p2 or p3):
browse_pattern(req, colls_to_search, p1, f1, rg, ln)
browse_pattern(req, colls_to_search, p2, f2, rg, ln)
browse_pattern(req, colls_to_search, p3, f3, rg, ln)
else:
browse_pattern(req, colls_to_search, p, f, rg, ln)
except:
if of.startswith("h"):
req.write(create_error_box(req, verbose=verbose, ln=ln))
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
elif rm and p.startswith("recid:"):
## 3-ter - similarity search needed
page_start(req, of, cc, as, ln, uid, _("Search Results"))
if of.startswith("h"):
req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1,
p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action))
if record_exists(p[6:]) != 1:
# record does not exist
if of.startswith("h"):
print_warning(req, "Requested record does not seem to exist.")
if of == "id":
return []
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
else:
# record exists, so find similar ones to it:
t1 = os.times()[4]
results_similar_recIDs, results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, results_similar_comments = \
rank_records(rm, 0, get_collection_reclist(cdsname), string.split(p), verbose)
if results_similar_recIDs:
t2 = os.times()[4]
cpu_time = t2 - t1
if of.startswith("h"):
req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, cdsname, len(results_similar_recIDs),
jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2,
sc, pl_in_url,
d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time))
print_warning(req, results_similar_comments)
print_records(req, results_similar_recIDs, jrec, rg, of, ot, ln,
results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, search_pattern=p, verbose=verbose)
elif of=="id":
return results_similar_recIDs
elif of.startswith("x"):
print_records(req, results_similar_recIDs, jrec, rg, of, ot, ln,
results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, search_pattern=p, verbose=verbose)
else:
# rank_records failed and returned some error message to display:
if of.startswith("h"):
print_warning(req, results_similar_relevances_prologue)
print_warning(req, results_similar_relevances_epilogue)
print_warning(req, results_similar_comments)
if of == "id":
return []
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
elif p.startswith("cocitedwith:"): #WAS EXPERIMENTAL
## 3-terter - cited by search needed
page_start(req, of, cc, as, ln, uid, _("Search Results"))
if of.startswith("h"):
req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1,
p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action))
recID = p[12:]
if record_exists(recID) != 1:
# record does not exist
if of.startswith("h"):
print_warning(req, "Requested record does not seem to exist.")
if of == "id":
return []
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
else:
# record exists, so find co-cited ones:
t1 = os.times()[4]
results_cocited_recIDs = map(lambda x: x[0], calculate_co_cited_with_list(int(recID)))
if results_cocited_recIDs:
t2 = os.times()[4]
cpu_time = t2 - t1
if of.startswith("h"):
req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, cdsname, len(results_cocited_recIDs),
jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2,
sc, pl_in_url,
d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time))
print_records(req, results_cocited_recIDs, jrec, rg, of, ot, ln, search_pattern=p, verbose=verbose)
elif of=="id":
return results_cocited_recIDs
elif of.startswith("x"):
print_records(req, results_cocited_recIDs, jrec, rg, of, ot, ln, search_pattern=p, verbose=verbose)
else:
# co-cited search returned no results or failed:
if of.startswith("h"):
print_warning(req, "nothing found")
if of == "id":
return []
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
else:
## 3 - common search needed
page_start(req, of, cc, as, ln, uid, _("Search Results"))
if of.startswith("h"):
req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1,
p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, dt, jrec, ec, action))
t1 = os.times()[4]
results_in_any_collection = HitSet()
if as == 1 or (p1 or p2 or p3):
## 3A - advanced search
try:
results_in_any_collection = search_pattern(req, p1, f1, m1, ap=ap, of=of, verbose=verbose, ln=ln)
if len(results_in_any_collection) == 0:
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
if p2:
results_tmp = search_pattern(req, p2, f2, m2, ap=ap, of=of, verbose=verbose, ln=ln)
if op1 == "a": # add
results_in_any_collection.intersection_update(results_tmp)
elif op1 == "o": # or
results_in_any_collection.union_update(results_tmp)
elif op1 == "n": # not
results_in_any_collection.difference_update(results_tmp)
else:
if of.startswith("h"):
print_warning(req, "Invalid set operation %s." % op1, "Error")
if len(results_in_any_collection) == 0:
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
if p3:
results_tmp = search_pattern(req, p3, f3, m3, ap=ap, of=of, verbose=verbose, ln=ln)
if op2 == "a": # add
results_in_any_collection.intersection_update(results_tmp)
elif op2 == "o": # or
results_in_any_collection.union_update(results_tmp)
elif op2 == "n": # not
results_in_any_collection.difference_update(results_tmp)
else:
if of.startswith("h"):
print_warning(req, "Invalid set operation %s." % op2, "Error")
except:
if of.startswith("h"):
req.write(create_error_box(req, verbose=verbose, ln=ln))
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
else:
## 3B - simple search
try:
results_in_any_collection = search_pattern(req, p, f, ap=ap, of=of, verbose=verbose, ln=ln)
except:
if of.startswith("h"):
req.write(create_error_box(req, verbose=verbose, ln=ln))
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
if len(results_in_any_collection) == 0:
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
# search_cache_key = p+"@"+f+"@"+string.join(colls_to_search,",")
# if search_cache.has_key(search_cache_key): # is the result in search cache?
# results_final = search_cache[search_cache_key]
# else:
# results_final = search_pattern(req, p, f, colls_to_search)
# search_cache[search_cache_key] = results_final
# if len(search_cache) > CFG_WEBSEARCH_SEARCH_CACHE_SIZE: # is the cache full? (sanity cleaning)
# search_cache.clear()
# search stage 4: intersection with collection universe:
try:
results_final = intersect_results_with_collrecs(req, results_in_any_collection, colls_to_search, ap, of, verbose, ln)
except:
if of.startswith("h"):
req.write(create_error_box(req, verbose=verbose, ln=ln))
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
if results_final == {}:
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
if of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
# search stage 5: apply search option limits and restrictions:
if datetext1 != "":
if verbose and of.startswith("h"):
print_warning(req, "Search stage 5: applying time limits, from %s until %s..." % (datetext1, datetext2))
try:
results_final = intersect_results_with_hitset(req,
results_final,
search_unit_in_bibrec(datetext1, datetext2, dt),
ap,
aptext= _("No match within your time limits, "
"discarding this condition..."),
of=of)
except:
if of.startswith("h"):
req.write(create_error_box(req, verbose=verbose, ln=ln))
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
if results_final == {}:
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
if pl:
pl = wash_pattern(pl)
if verbose and of.startswith("h"):
print_warning(req, "Search stage 5: applying search pattern limit %s..." % (pl,))
try:
results_final = intersect_results_with_hitset(req,
results_final,
search_pattern(req, pl, ap=0, ln=ln),
ap,
aptext=_("No match within your search limits, "
"discarding this condition..."),
of=of)
except:
if of.startswith("h"):
req.write(create_error_box(req, verbose=verbose, ln=ln))
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
if results_final == {}:
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
if of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
return page_end(req, of, ln)
t2 = os.times()[4]
cpu_time = t2 - t1
## search stage 6: display results:
results_final_nb_total = 0
results_final_nb = {} # will hold number of records found in each collection
# (in simple dict to display overview more easily)
for coll in results_final.keys():
results_final_nb[coll] = len(results_final[coll])
#results_final_nb_total += results_final_nb[coll]
# Now let us calculate results_final_nb_total more precisely,
# in order to get the total number of "distinct" hits across
# searched collections; this is useful because a record might
# have been attributed to more than one primary collection; so
# we have to avoid counting it multiple times. The price to
# pay for this accuracy of results_final_nb_total is somewhat
# increased CPU time.
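# E.g. (hypothetical) a record attributed to both "Articles" and "Preprints"
# must be counted once: len(set([1, 2]) | set([2, 3])) == 3, not 2 + 2.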
if len(results_final) == 1:
# only one collection; no need to union them
results_final_for_all_selected_colls = results_final.values()[0]
results_final_nb_total = results_final_nb.values()[0]
else:
# okay, some work ahead to union hits across collections:
results_final_for_all_selected_colls = HitSet()
for coll in results_final.keys():
results_final_for_all_selected_colls.union_update(results_final[coll])
results_final_nb_total = len(results_final_for_all_selected_colls)
if results_final_nb_total == 0:
if of.startswith('h'):
print_warning(req, "No match found, please enter different search terms.")
elif of.startswith("x"):
# Print empty, but valid XML
print_records_prologue(req, of)
print_records_epilogue(req, of)
else:
# yes, some hits found: good!
# collection list may have changed due to not-exact-match-found policy so check it out:
for coll in results_final.keys():
if coll not in colls_to_search:
colls_to_search.append(coll)
# print results overview:
if of == "id":
# we have been asked to return list of recIDs
recIDs = list(results_final_for_all_selected_colls)
if sf: # do we have to sort?
recIDs = sort_records(req, recIDs, sf, so, sp, verbose, of)
elif rm: # do we have to rank?
results_final_for_all_colls_rank_records_output = rank_records(rm, 0, results_final_for_all_selected_colls,
string.split(p) + string.split(p1) +
string.split(p2) + string.split(p3), verbose)
if results_final_for_all_colls_rank_records_output[0]:
recIDs = results_final_for_all_colls_rank_records_output[0]
return recIDs
elif of.startswith("h"):
req.write(print_results_overview(req, colls_to_search, results_final_nb_total, results_final_nb, cpu_time, ln, ec))
selected_external_collections_infos = print_external_results_overview(req, cc, [p, p1, p2, p3], f, ec, verbose, ln)
# print number of hits found for XML outputs:
if of.startswith("x"):
req.write("<!-- Search-Engine-Total-Number-Of-Results: %s -->\n" % results_final_nb_total)
# print records:
if len(colls_to_search)>1:
cpu_time = -1 # we do not want to have search time printed on each collection
print_records_prologue(req, of)
for coll in colls_to_search:
if results_final.has_key(coll) and len(results_final[coll]):
if of.startswith("h"):
req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, coll, results_final_nb[coll],
jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2,
sc, pl_in_url,
d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time))
results_final_recIDs = list(results_final[coll])
results_final_relevances = []
results_final_relevances_prologue = ""
results_final_relevances_epilogue = ""
if sf: # do we have to sort?
results_final_recIDs = sort_records(req, results_final_recIDs, sf, so, sp, verbose, of)
elif rm: # do we have to rank?
results_final_recIDs_ranked, results_final_relevances, results_final_relevances_prologue, results_final_relevances_epilogue, results_final_comments = \
rank_records(rm, 0, results_final[coll],
string.split(p) + string.split(p1) +
string.split(p2) + string.split(p3), verbose)
if of.startswith("h"):
print_warning(req, results_final_comments)
if results_final_recIDs_ranked:
results_final_recIDs = results_final_recIDs_ranked
else:
# rank_records failed and returned some error message to display:
print_warning(req, results_final_relevances_prologue)
print_warning(req, results_final_relevances_epilogue)
print_records(req, results_final_recIDs, jrec, rg, of, ot, ln,
results_final_relevances,
results_final_relevances_prologue,
results_final_relevances_epilogue,
search_pattern=p,
print_records_prologue_p=False,
print_records_epilogue_p=False,
verbose=verbose)
if of.startswith("h"):
req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, coll, results_final_nb[coll],
jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2,
sc, pl_in_url,
d1y, d1m, d1d, d2y, d2m, d2d, dt, cpu_time, 1))
print_records_epilogue(req, of)
if f == "author" and of.startswith("h"):
req.write(create_similarly_named_authors_link_box(p, ln))
# log query:
try:
id_query = log_query(req.get_remote_host(), req.args, uid)
if of.startswith("h") and id_query:
# Alert/RSS teaser:
req.write(websearch_templates.tmpl_alert_rss_teaser_box_for_query(id_query, ln=ln))
except:
# do not log query if req is None (used by CLI interface)
pass
log_query_info("ss", p, f, colls_to_search, results_final_nb_total)
# External searches
if of.startswith("h"):
perform_external_collection_search(req, cc, [p, p1, p2, p3], f, ec, verbose, ln, selected_external_collections_infos)
return page_end(req, of, ln)
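# Note: the advanced-search operators op1/op2 above behave like plain set
# algebra on record ID sets; a minimal sketch with hypothetical hit sets:
#   hits1, hits2 = set([1, 2, 3]), set([2, 3, 4])
#   "a" (and) -> hits1 & hits2 == set([2, 3])
#   "o" (or)  -> hits1 | hits2 == set([1, 2, 3, 4])
#   "n" (not) -> hits1 - hits2 == set([1])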
def perform_request_cache(req, action="show"):
"""Manipulates the search engine cache."""
global search_cache
global collection_reclist_cache
global collection_reclist_cache_timestamp
global field_i18nname_cache
global field_i18nname_cache_timestamp
global collection_i18nname_cache
global collection_i18nname_cache_timestamp
req.content_type = "text/html"
req.send_http_header()
out = ""
out += "<h1>Search Cache</h1>"
# clear cache if requested:
if action == "clear":
search_cache = {}
collection_reclist_cache = create_collection_reclist_cache()
# show collection reclist cache:
out += "<h2>Collection reclist cache</h2>"
out += "- collection table last updated: %s" % get_table_update_time('collection')
out += "<br/>- reclist cache timestamp: %s" % collection_reclist_cache_timestamp
out += "<br/>- reclist cache contents:"
out += "<blockquote>"
for coll in collection_reclist_cache.keys():
if collection_reclist_cache[coll]:
out += "%s (%d)<br/>" % (coll, len(get_collection_reclist(coll)))
out += "</blockquote>"
# show search cache:
out += "<h2>Search Cache</h2>"
out += "<blockquote>"
if len(search_cache):
out += """<table border="1">"""
out += "<tr><td><strong>%s</strong></td><td><strong>%s</strong></td><td><strong>%s</strong></td><td><strong>%s</strong></td></tr>" % \
("Pattern", "Field", "Collection", "Number of Hits")
for search_cache_key in search_cache.keys():
p, f, c = string.split(search_cache_key, "@", 2)
# find out about length of cached data:
l = 0
for coll in search_cache[search_cache_key]:
l += len(search_cache[search_cache_key][coll])
out += "<tr><td>%s</td><td>%s</td><td>%s</td><td>%d</td></tr>" % (p, f, c, l)
out += "</table>"
else:
out += "<p>Search cache is empty."
out += "</blockquote>"
out += """<p><a href="%s/search/cache?action=clear">clear cache</a>""" % weburl
# show field i18nname cache:
out += "<h2>Field I18N names cache</h2>"
out += "- fieldname table last updated: %s" % get_table_update_time('fieldname')
out += "<br/>- i18nname cache timestamp: %s" % field_i18nname_cache_timestamp
out += "<br/>- i18nname cache contents:"
out += "<blockquote>"
for field in field_i18nname_cache.keys():
for ln in field_i18nname_cache[field].keys():
out += "%s, %s = %s<br/>" % (field, ln, field_i18nname_cache[field][ln])
out += "</blockquote>"
# show collection i18nname cache:
out += "<h2>Collection I18N names cache</h2>"
out += "- collectionname table last updated: %s" % get_table_update_time('collectionname')
out += "<br/>- i18nname cache timestamp: %s" % collection_i18nname_cache_timestamp
out += "<br/>- i18nname cache contents:"
out += "<blockquote>"
for coll in collection_i18nname_cache.keys():
for ln in collection_i18nname_cache[coll].keys():
out += "%s, %s = %s<br/>" % (coll, ln, collection_i18nname_cache[coll][ln])
out += "</blockquote>"
req.write("<html>")
req.write(out)
req.write("</html>")
return "\n"
def perform_request_log(req, date=""):
"""Display search log information for given date."""
req.content_type = "text/html"
req.send_http_header()
req.write("<html>")
req.write("<h1>Search Log</h1>")
if date: # case A: display stats for a day
yyyymmdd = string.atoi(date)
req.write("<p><strong>Date: %d</strong></p>" % yyyymmdd)
req.write("""<table border="1">""")
req.write("<tr><td><strong>%s</strong></td><td><strong>%s</strong></td><td><strong>%s</strong></td><td><strong>%s</strong></td><td><strong>%s</strong></td><td><strong>%s</strong></td></tr>" % ("No.", "Time", "Pattern", "Field", "Collection", "Number of Hits"))
# read file:
- p = os.popen("grep ^%d %s/search.log" % (yyyymmdd, logdir), 'r')
+ p = os.popen("grep ^%d %s/search.log" % (yyyymmdd, CFG_LOGDIR), 'r')
lines = p.readlines()
p.close()
# process lines:
i = 0
for line in lines:
try:
datetime, as, p, f, c, nbhits = string.split(line,"#")
i += 1
req.write("<tr><td align=\"right\">#%d</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>" % (i, datetime, p, f, c, nbhits))
except:
pass # ignore eventual wrong log lines
req.write("</table>")
else: # case B: display summary stats per day
yyyymm01 = int(time.strftime("%Y%m01", time.localtime()))
yyyymmdd = int(time.strftime("%Y%m%d", time.localtime()))
req.write("""<table border="1">""")
req.write("<tr><td><strong>%s</strong></td><td><strong>%s</strong></td></tr>" % ("Day", "Number of Queries"))
for day in range(yyyymm01, yyyymmdd + 1):
- p = os.popen("grep -c ^%d %s/search.log" % (day, logdir), 'r')
+ p = os.popen("grep -c ^%d %s/search.log" % (day, CFG_LOGDIR), 'r')
for line in p.readlines():
req.write("<tr><td>%d</td><td align=\"right\">%s</td></tr>" % (day, line))
req.write("</table>")
req.write("</html>")
return "\n"
def profile(p="", f="", c=cdsname):
"""Profile search time."""
import profile
import pstats
profile.run("perform_request_search(p='%s',f='%s', c='%s')" % (p, f, c), "perform_request_search_profile")
p = pstats.Stats("perform_request_search_profile")
p.strip_dirs().sort_stats("cumulative").print_stats()
return 0
## test cases:
#print wash_colls(cdsname,"Library Catalogue", 0)
#print wash_colls("Periodicals & Progress Reports",["Periodicals","Progress Reports"], 0)
#print wash_field("wau")
#print print_record(20,"tm","001,245")
#print create_opft_search_units(None, "PHE-87-13","reportnumber")
#print ":"+wash_pattern("* and % doo * %")+":\n"
#print ":"+wash_pattern("*")+":\n"
#print ":"+wash_pattern("ellis* ell* e*%")+":\n"
#print run_sql("SELECT name,dbquery from collection")
#print get_index_id("author")
#print get_coll_ancestors("Theses")
#print get_coll_sons("Articles & Preprints")
#print get_coll_real_descendants("Articles & Preprints")
#print get_collection_reclist("Theses")
#print log(sys.stdin)
#print search_unit_in_bibrec('2002-12-01','2002-12-12')
#print type(wash_url_argument("-1",'int'))
#print get_nearest_terms_in_bibxxx("ellis", "author", 5, 5)
#print call_bibformat(68, "HB_FLY")
#print create_collection_i18nname_cache()
#print get_fieldvalues(10, "980__a")
#print get_fieldvalues_alephseq_like(10,"001___")
#print get_fieldvalues_alephseq_like(10,"980__a")
#print get_fieldvalues_alephseq_like(10,"foo")
#print get_fieldvalues_alephseq_like(10,"-1")
#print get_fieldvalues_alephseq_like(10,"99")
#print get_fieldvalues_alephseq_like(10,["001", "980"])
## profiling:
#profile("of the this")
#print perform_request_search(p="ellis")
diff --git a/modules/websearch/lib/websearch_templates.py b/modules/websearch/lib/websearch_templates.py
index 3e6e36af5..737d59147 100644
--- a/modules/websearch/lib/websearch_templates.py
+++ b/modules/websearch/lib/websearch_templates.py
@@ -1,2910 +1,2910 @@
# -*- coding: utf-8 -*-
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
# pylint: disable-msg=C0301
__revision__ = "$Id$"
import urllib
import time
import cgi
import gettext
import string
import locale
from invenio.config import \
CFG_WEBSEARCH_ADVANCEDSEARCH_PATTERN_BOX_WIDTH, \
CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD, \
CFG_WEBSEARCH_USE_ALEPH_SYSNOS, \
CFG_BIBRANK_SHOW_READING_STATS, \
CFG_BIBRANK_SHOW_DOWNLOAD_STATS, \
CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS, \
CFG_BIBRANK_SHOW_CITATION_LINKS, \
CFG_BIBRANK_SHOW_CITATION_STATS, \
CFG_BIBRANK_SHOW_CITATION_GRAPHS, \
CFG_WEBSEARCH_RSS_TTL, \
cdslang, \
cdsname, \
cdsnameintl, \
- version, \
+ CFG_VERSION, \
weburl, \
supportemail
from invenio.dbquery import run_sql
from invenio.messages import gettext_set_language
#from invenio.search_engine_config import CFG_EXPERIMENTAL_FEATURES
from invenio.urlutils import make_canonical_urlargd, drop_default_urlargd, create_html_link, create_url
from invenio.htmlutils import nmtoken_from_string
from invenio.webinterface_handler import wash_urlargd
from invenio.websearch_external_collections import external_collection_get_state
def get_fieldvalues(recID, tag):
"""Return list of field values for field TAG inside record RECID.
FIXME: should be imported commonly for search_engine too."""
out = []
if tag == "001___":
# we have asked for recID that is not stored in bibXXx tables
out.append(str(recID))
else:
# we are going to look inside bibXXx tables
digit = tag[0:2]
bx = "bib%sx" % digit
bibx = "bibrec_bib%sx" % digit
query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag LIKE '%s' " \
"ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx, recID, tag)
res = run_sql(query)
for row in res:
out.append(row[0])
return out
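# Usage sketch (hypothetical record): get_fieldvalues(10, "245__a") might
# return ['Some title'], while get_fieldvalues(10, "001___") returns ['10'].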
class Template:
# This dictionary maps CDS Invenio language code to locale codes (ISO 639)
tmpl_localemap = {
'bg': 'bg_BG',
'ca': 'ca_ES',
'de': 'de_DE',
'el': 'el_GR',
'en': 'en_US',
'es': 'es_ES',
'pt': 'pt_BR',
'fr': 'fr_FR',
'it': 'it_IT',
'ru': 'ru_RU',
'sk': 'sk_SK',
'cs': 'cs_CZ',
'no': 'no_NO',
'sv': 'sv_SE',
'uk': 'uk_UA',
'ja': 'ja_JP',
'pl': 'pl_PL',
'hr': 'hr_HR',
'zh_CN': 'zh_CN',
'zh_TW': 'zh_TW',
}
tmpl_default_locale = "en_US" # which locale to use by default, useful in case of failure
# Type of the allowed parameters for the web interface for search results
search_results_default_urlargd = {
'cc': (str, cdsname),
'c': (list, []),
'p': (str, ""), 'f': (str, ""),
'rg': (int, 10),
'sf': (str, ""),
'so': (str, "d"),
'sp': (str, ""),
'rm': (str, ""),
'of': (str, "hb"),
'ot': (list, []),
'as': (int, 0),
'p1': (str, ""), 'f1': (str, ""), 'm1': (str, ""), 'op1':(str, ""),
'p2': (str, ""), 'f2': (str, ""), 'm2': (str, ""), 'op2':(str, ""),
'p3': (str, ""), 'f3': (str, ""), 'm3': (str, ""),
'sc': (int, 0),
'jrec': (int, 0),
'recid': (int, -1), 'recidb': (int, -1), 'sysno': (str, ""),
'id': (int, -1), 'idb': (int, -1), 'sysnb': (str, ""),
'action': (str, "search"),
'action_search': (str, ""),
'action_browse': (str, ""),
'd1': (str, ""),
'd1y': (int, 0), 'd1m': (int, 0), 'd1d': (int, 0),
'd2': (str, ""),
'd2y': (int, 0), 'd2m': (int, 0), 'd2d': (int, 0),
'dt': (str, ""),
'ap': (int, 1),
'verbose': (int, 0),
'ec': (list, []),
}
# ...and for search interfaces
search_interface_default_urlargd = {
'as': (int, 0),
'verbose': (int, 0)}
# ...and for RSS feeds
rss_default_urlargd = {'c' : (list, []),
'cc' : (str, ""),
'p' : (str, ""),
'f' : (str, ""),
'p1' : (str, ""),
'f1' : (str, ""),
'm1' : (str, ""),
'op1': (str, ""),
'p2' : (str, ""),
'f2' : (str, ""),
'm2' : (str, ""),
'op2': (str, ""),
'p3' : (str, ""),
'f3' : (str, ""),
'm3' : (str, "")}
tmpl_openurl_accepted_args = {
'genre' : (str, ''),
'aulast' : (str, ''),
'aufirst' : (str, ''),
'auinit' : (str, ''),
'auinit1' : (str, ''),
'auinitm' : (str, ''),
'issn' : (str, ''),
'eissn' : (str, ''),
'coden' : (str, ''),
'isbn' : (str, ''),
'sici' : (str, ''),
'bici' : (str, ''),
'title' : (str, ''),
'stitle' : (str, ''),
'atitle' : (str, ''),
'volume' : (str, ''),
'part' : (str, ''),
'issue' : (str, ''),
'spage' : (str, ''),
'epage' : (str, ''),
'pages' : (str, ''),
'artnum' : (str, ''),
'date' : (str, ''),
'ssn' : (str, ''),
'quarter' : (str, ''),
'url_ver' : (str, ''),
'ctx_ver' : (str, ''),
'rft_val_fmt' : (str, ''),
'rfr_id' : (str, ''),
'rft.atitle' : (str, ''),
'rft.title' : (str, ''),
'rft.jtitle' : (str, ''),
'rft.stitle' : (str, ''),
'rft.date' : (str, ''),
'rft.volume' : (str, ''),
'rft.issue' : (str, ''),
'rft.spage' : (str, ''),
'rft.epage' : (str, ''),
'rft.pages' : (str, ''),
'rft.artnumber' : (str, ''),
'rft.issn' : (str, ''),
'rft.eissn' : (str, ''),
'rft.aulast' : (str, ''),
'rft.aufirst' : (str, ''),
'rft.auinit' : (str, ''),
'rft.auinit1' : (str, ''),
'rft.auinitm' : (str, ''),
'rft.ausuffix' : (str, ''),
'rft.au' : (list, []),
'rft.aucorp' : (str, ''),
'rft.isbn' : (str, ''),
'rft.coden' : (str, ''),
'rft.sici' : (str, ''),
'rft.genre' : (str, 'unknown'),
'rft.chron' : (str, ''),
'rft.ssn' : (str, ''),
'rft.quarter' : (int, ''),
'rft.part' : (str, ''),
'rft.btitle' : (str, ''),
'rft.place' : (str, ''),
'rft.pub' : (str, ''),
'rft.edition' : (str, ''),
'rft.tpages' : (str, ''),
'rft.series' : (str, ''),
}
def tmpl_openurl2invenio(self, openurl_data):
""" Return an Invenio url corresponding to a search with the data
included in the openurl form map.
"""
from invenio.search_engine import perform_request_search
aulast = openurl_data['rft.aulast'] or openurl_data['aulast']
aufirst = openurl_data['rft.aufirst'] or openurl_data['aufirst']
auinit = openurl_data['rft.auinit'] or \
openurl_data['auinit'] or \
openurl_data['rft.auinit1'] + ' ' + openurl_data['rft.auinitm'] or \
openurl_data['auinit1'] + ' ' + openurl_data['auinitm'] or aufirst[:1]
auinit = auinit.upper()
if aulast and aufirst:
author_query = 'author:"%s, %s" or author:"%s, %s"' % (aulast, aufirst, aulast, auinit)
elif aulast and auinit:
author_query = 'author:"%s, %s"' % (aulast, auinit)
else:
author_query = ''
title = openurl_data['rft.atitle'] or \
openurl_data['atitle'] or \
openurl_data['rft.btitle'] or \
openurl_data['rft.title'] or \
openurl_data['title']
if title:
title_query = 'title:"%s"' % title
else:
title_query = ''
jtitle = openurl_data['rft.stitle'] or \
openurl_data['stitle'] or \
openurl_data['rft.jtitle'] or \
openurl_data['title']
if jtitle:
journal_query = 'journal:"%s"' % jtitle
else:
journal_query = ''
isbn = openurl_data['rft.isbn'] or \
openurl_data['isbn']
if isbn:
isbn_query = '020__a:"%s"' % isbn
else:
isbn_query = ''
issn = openurl_data['rft.eissn'] or \
openurl_data['eissn'] or \
openurl_data['rft.issn'] or \
openurl_data['issn']
if issn:
issn_query = '022__a:"%s"' % issn
else:
issn_query = ''
coden = openurl_data['rft.coden'] or openurl_data['coden']
if coden:
coden_query = '030__a:"%s"' % coden
else:
coden_query = ''
if openurl_data['rfr_id'].startswith('info:doi/'):
doi_query = '773__a:"%s"' % openurl_data['rfr_id'][len('info:doi/'):]
else:
doi_query = ''
if doi_query:
if perform_request_search(p=doi_query):
return '%s/search%s' % (weburl, make_canonical_urlargd({
'p' : doi_query,
'sc' : 1,
'of' : 'hd'}, {}))
if isbn_query:
if perform_request_search(p=isbn_query):
return '%s/search%s' % (weburl, make_canonical_urlargd({
'p' : isbn_query,
'sc' : 1,
'of' : 'hd'}, {}))
if coden_query:
if perform_request_search(p=coden_query):
return '%s/search%s' % (weburl, make_canonical_urlargd({
'p' : coden_query,
'sc' : 1,
'of' : 'hd'}, {}))
if author_query and title_query:
if perform_request_search(p='%s and %s' % (title_query, author_query)):
return '%s/search%s' % (weburl, make_canonical_urlargd({
'p' : '%s and %s' % (title_query, author_query),
'sc' : 1,
'of' : 'hd'}, {}))
if title_query:
if perform_request_search(p=title_query):
return '%s/search%s' % (weburl, make_canonical_urlargd({
'p' : title_query,
'sc' : 1,
'of' : 'hb'}, {}))
if title:
return '%s/search%s' % (weburl, make_canonical_urlargd({
'p' : title,
'sc' : 1,
'of' : 'hb'}, {}))
return ''
def build_search_url(self, known_parameters={}, **kargs):
""" Helper for generating a canonical search
url. 'known_parameters' is the dictionary of query parameters you
inherit from your current query. You can then pass keyword
arguments to modify this query.
build_search_url(known_parameters, of="xm")
The generated URL is absolute.
"""
parameters = {}
parameters.update(known_parameters)
parameters.update(kargs)
# Now, we only have the arguments which have _not_ their default value
parameters = drop_default_urlargd(parameters, self.search_results_default_urlargd)
# Asking for a recid? Return a /record/ URL
if 'recid' in parameters:
target = "%s/record/%d" % (weburl, parameters['recid'])
del parameters['recid']
target += make_canonical_urlargd(parameters, self.search_results_default_urlargd)
return target
return "%s/search%s" % (weburl, make_canonical_urlargd(parameters, self.search_results_default_urlargd))
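# Usage sketch (hypothetical values): starting from the current query and
# switching only the output format,
#   build_search_url({'p': 'ellis'}, of="xm")
# returns something like weburl + '/search?p=ellis&of=xm' (parameters still
# holding their default value, e.g. rg=10, are dropped from the URL).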
def build_search_interface_url(self, known_parameters={}, **kargs):
""" Helper for generating a canonical search interface URL."""
parameters = {}
parameters.update(known_parameters)
parameters.update(kargs)
c = parameters['c']
del parameters['c']
# Now, we only have the arguments which have _not_ their default value
if c and c != cdsname:
base = weburl + '/collection/' + urllib.quote(c)
else:
base = weburl
return create_url(base, drop_default_urlargd(parameters, self.search_results_default_urlargd))
def build_rss_url(self, known_parameters, **kargs):
"""Helper for generating a canonical RSS URL"""
parameters = {}
parameters.update(known_parameters)
parameters.update(kargs)
# Keep only interesting parameters
argd = wash_urlargd(parameters, self.rss_default_urlargd)
if argd:
# Handle 'c' differently since it is a list
c = argd.get('c', [])
del argd['c']
# Create query, and drop empty params
args = make_canonical_urlargd(argd, self.rss_default_urlargd)
if c != []:
# Add collections
c = [urllib.quote(coll) for coll in c]
args += '&c=' + '&c='.join(c)
return weburl + '/rss' + args
def tmpl_record_page_header_content(self, req, recid, ln):
""" Provide extra information in the header of /record pages """
_ = gettext_set_language(ln)
title = get_fieldvalues(recid, "245__a")
if title:
title = _("Record") + '#%d: %s' %(recid, cgi.escape(title[0]))
else:
title = _("Record") + ' #%d' % recid
keywords = ', '.join(get_fieldvalues(recid, "6531_a"))
description = ' '.join(get_fieldvalues(recid, "520__a"))
description += "\n"
description += '; '.join(get_fieldvalues(recid, "100__a") + get_fieldvalues(recid, "700__a"))
return [cgi.escape(x, True) for x in (title, description, keywords)]
def tmpl_navtrail_links(self, as, ln, dads):
"""
Creates the navigation bar at top of each search page (*Home > Root collection > subcollection > ...*)
Parameters:
- 'as' *bool* - Should we display an advanced search box?
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'separator' *string* - The separator between two consecutive collections
- 'dads' *list* - A list of parent links, each one being a dictionary of ('name', 'longname')
"""
out = []
for url, name in dads:
out.append(create_html_link(self.build_search_interface_url(c=url, as=as, ln=ln), {}, cgi.escape(name), {'class': 'navtrail'}))
return ' > '.join(out)
def tmpl_webcoll_body(self, ln, collection, te_portalbox,
searchfor, np_portalbox, narrowsearch,
focuson, instantbrowse, ne_portalbox):
""" Creates the body of the main search page.
Parameters:
- 'ln' *string* - language of the page being generated
- 'collection' - collection id of the page being generated
- 'te_portalbox' *string* - The HTML code for the portalbox on top of search
- 'searchfor' *string* - The HTML code for the search options
- 'np_portalbox' *string* - The HTML code for the portalbox on bottom of search
- 'narrowsearch' *string* - The HTML code for the search categories (left bottom of page)
- 'focuson' *string* - The HTML code for the "focuson" categories (right bottom of page)
- 'ne_portalbox' *string* - The HTML code for the bottom of the page
"""
if not narrowsearch:
narrowsearch = instantbrowse
body = '''
''' % {'ne_portalbox' : ne_portalbox}
return body
def tmpl_portalbox(self, title, body):
"""Creates portalboxes based on the parameters
Parameters:
- 'title' *string* - The title of the box
- 'body' *string* - The HTML code for the body of the box
"""
out = """
%(title)s
%(body)s
""" % {'title' : cgi.escape(title), 'body' : body}
return out
def tmpl_searchfor_simple(self, ln, collection_id, collection_name, record_count, middle_option):
"""Produces simple *Search for* box for the current collection.
Parameters:
- 'ln' *string* - The language to display
- 'header' *string* - header of search form
- 'middle_option' *string* - HTML code for the options (any field, specific fields ...)
"""
# load the right message language
_ = gettext_set_language(ln)
out = '''
'''
argd = drop_default_urlargd({'ln': ln, 'cc': collection_id, 'sc': 1},
self.search_results_default_urlargd)
# Only add non-default hidden values
for field, value in argd.items():
out += self.tmpl_input_hidden(field, value)
header = _("Search %s records for:") % \
self.tmpl_nbrecs_info(record_count, "","")
asearchurl = self.build_search_interface_url(c=collection_id, as=1, ln=ln)
# print commentary start:
out += '''
''' % {'ln' : ln,
'langlink': ln != cdslang and '?ln=' + ln or '',
'weburl' : weburl,
'asearch' : create_html_link(asearchurl, {}, _('Advanced Search')),
'header' : header,
'middle_option' : middle_option,
'msg_search' : _('Search'),
'msg_browse' : _('Browse'),
'msg_search_tips' : _('Search Tips')}
return out
def tmpl_searchfor_advanced(self,
ln, # current language
collection_id,
collection_name,
record_count,
middle_option_1, middle_option_2, middle_option_3,
searchoptions,
sortoptions,
rankoptions,
displayoptions,
formatoptions
):
"""
Produces advanced *Search for* box for the current collection.
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'ssearchurl' *string* - The URL to simple search form
- 'header' *string* - header of search form
- 'middle_option_1' *string* - HTML code for the first row of options (any field, specific fields ...)
- 'middle_option_2' *string* - HTML code for the second row of options (any field, specific fields ...)
- 'middle_option_3' *string* - HTML code for the third row of options (any field, specific fields ...)
- 'searchoptions' *string* - HTML code for the search options
- 'sortoptions' *string* - HTML code for the sort options
- 'rankoptions' *string* - HTML code for the rank options
- 'displayoptions' *string* - HTML code for the display options
- 'formatoptions' *string* - HTML code for the format options
"""
# load the right message language
_ = gettext_set_language(ln)
out = '''
'''
argd = drop_default_urlargd({'ln': ln, 'as': 1, 'cc': collection_id, 'sc': 1},
self.search_results_default_urlargd)
# Only add non-default hidden values
for field, value in argd.items():
out += self.tmpl_input_hidden(field, value)
header = _("Search %s records for") % \
self.tmpl_nbrecs_info(record_count, "","")
header += ':'
ssearchurl = self.build_search_interface_url(c=collection_id, as=0, ln=ln)
out += '''
''' % {
'added' : _("Added/modified since:"),
'until' : _("until:"),
'added_or_modified': self.tmpl_inputdatetype(ln=ln),
'date_added' : self.tmpl_inputdate("d1", ln=ln),
'date_until' : self.tmpl_inputdate("d2", ln=ln),
'msg_sort' : _("Sort by:"),
'msg_display' : _("Display results:"),
'msg_format' : _("Output format:"),
'sortoptions' : sortoptions,
'rankoptions' : rankoptions,
'displayoptions' : displayoptions,
'formatoptions' : formatoptions
}
return out
def tmpl_matchtype_box(self, name='m', value='', ln='en'):
"""Returns HTML code for the 'match type' selection box.
Parameters:
- 'name' *string* - The name of the produced select
- 'value' *string* - The selected value (if any value is already selected)
- 'ln' *string* - the language to display
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
""" % {'name' : name,
'sela' : self.tmpl_is_selected('a', value),
'opta' : _("All of the words:"),
'selo' : self.tmpl_is_selected('o', value),
'opto' : _("Any of the words:"),
'sele' : self.tmpl_is_selected('e', value),
'opte' : _("Exact phrase:"),
'selp' : self.tmpl_is_selected('p', value),
'optp' : _("Partial phrase:"),
'selr' : self.tmpl_is_selected('r', value),
'optr' : _("Regular expression:")
}
return out
def tmpl_is_selected(self, var, fld):
"""
Checks if *var* and *fld* are equal, and if yes, returns ' selected="selected"'. Useful for select boxes.
Parameters:
- 'var' *string* - First value to compare
- 'fld' *string* - Second value to compare
"""
if var == fld:
return ' selected="selected"'
else:
return ""
def tmpl_andornot_box(self, name='op', value='', ln='en'):
"""
Returns HTML code for the AND/OR/NOT selection box.
Parameters:
- 'name' *string* - The name of the produced select
- 'value' *string* - The selected value (if any value is already selected)
- 'ln' *string* - the language to display
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
""" % {'name' : name,
'sela' : self.tmpl_is_selected('a', value), 'opta' : _("AND"),
'selo' : self.tmpl_is_selected('o', value), 'opto' : _("OR"),
'seln' : self.tmpl_is_selected('n', value), 'optn' : _("AND NOT")
}
return out
def tmpl_inputdate(self, name, ln, sy = 0, sm = 0, sd = 0):
"""
Produces *From Date*, *Until Date* kind of selection box. Suitable for search options.
Parameters:
- 'name' *string* - The base name of the produced selects
- 'ln' *string* - the language to display
"""
# load the right message language
_ = gettext_set_language(ln)
box = """
"""
# month
box += """
"""
# year
box += """
"""
return box
def tmpl_inputdatetype(self, dt='', ln=cdslang):
"""
Produces input date type selection box to choose
added-or-modified date search option.
Parameters:
- 'dt' *string* - date type (c=created, m=modified)
- 'ln' *string* - the language to display
"""
# load the right message language
_ = gettext_set_language(ln)
box = """
""" % { 'added': _("Added since:"),
'modified': _("Modified since:"),
'sel': self.tmpl_is_selected(dt, 'm'),
}
return box
def tmpl_narrowsearch(self, as, ln, type, father,
has_grandchildren, sons, display_grandsons,
grandsons):
"""
Creates list of collection descendants of type *type* under title *title*.
If as==1, then links to Advanced Search interfaces; otherwise Simple Search.
Suitable for 'Narrow search' and 'Focus on' boxes.
Parameters:
- 'as' *bool* - Should we display an advanced search box?
- 'ln' *string* - The language to display
- 'type' *string* - The type of the produced box (virtual collections or normal collections)
- 'father' *collection* - The current collection
- 'has_grandchildren' *bool* - If the current collection has grand children
- 'sons' *list* - The list of the sub-collections (first level)
- 'display_grandsons' *bool* - If the grand children collections should be displayed (2 level deep display)
- 'grandsons' *list* - The list of sub-collections (second level)
"""
# load the right message language
_ = gettext_set_language(ln)
title = {'r': _("Narrow by collection:"),
'v': _("Focus on:")}[type]
if has_grandchildren:
style_prolog = ""
style_epilog = ""
else:
style_prolog = ""
style_epilog = ""
out = """
%(title)s
""" % {'title' : title,
'narrowsearchbox': {'r': 'narrowsearchbox',
'v': 'focusonsearchbox'}[type]}
# iterate through sons:
i = 0
for son in sons:
out += """
""" % \
{ 'narrowsearchbox': {'r': 'narrowsearchbox',
'v': 'focusonsearchbox'}[type]}
if type == 'r':
if son.restricted_p() and son.restricted_p() != father.restricted_p():
out += """
""" % {'name' : cgi.escape(son.name) }
else:
out += """ """ % {'name' : cgi.escape(son.name) }
else:
out += ''
out += """
%(link)s%(recs)s """ % {
'link': create_html_link(self.build_search_interface_url(c=son.name, ln=ln, as=as),
{}, style_prolog + cgi.escape(son.get_name(ln)) + style_epilog),
'recs' : self.tmpl_nbrecs_info(son.nbrecs, ln=ln)}
if son.restricted_p():
out += """ [%(msg)s] """ % { 'msg' : _("restricted") }
if display_grandsons and len(grandsons[i]):
# iterate through grandsons:
out += """ """
for grandson in grandsons[i]:
out += """ %(link)s%(nbrec)s """ % {
'link': create_html_link(self.build_search_interface_url(c=grandson.name, ln=ln, as=as),
{},
cgi.escape(grandson.get_name(ln))),
'nbrec' : self.tmpl_nbrecs_info(grandson.nbrecs, ln=ln)}
out += """
"""
i += 1
out += "
"
return out
def tmpl_searchalso(self, ln, engines_list, collection_id):
_ = gettext_set_language(ln)
box_name = _("Search also:")
html = """
%(box_name)s
""" % locals()
for engine in engines_list:
internal_name = engine.name
name = _(internal_name)
base_url = engine.base_url
if external_collection_get_state(engine, collection_id) == 3:
checked = ' checked="checked"'
else:
checked = ''
html += """
"""
return html
def tmpl_nbrecs_info(self, number, prolog=None, epilog=None, ln=cdslang):
"""
Return information on the number of records.
Parameters:
- 'number' *string* - The number of records
- 'prolog' *string* (optional) - An HTML code to prefix the number (if **None**, will be
'(')
- 'epilog' *string* (optional) - An HTML code to append to the number (if **None**, will be
')')
"""
if number is None:
number = 0
if prolog is None:
prolog = ''' ('''
if epilog is None:
epilog = ''')'''
return prolog + self.tmpl_nice_number(number, ln) + epilog
def tmpl_box_restricted_content(self, ln):
"""
Displays a box containing a *restricted content* message
Parameters:
- 'ln' *string* - The language to display
"""
# load the right message language
_ = gettext_set_language(ln)
return _("The contents of this collection is restricted.")
def tmpl_box_no_records(self, ln):
"""
Displays a box containing a *no content* message
Parameters:
- 'ln' *string* - The language to display
"""
# load the right message language
_ = gettext_set_language(ln)
return _("This collection does not contain any document yet.")
def tmpl_instant_browse(self, as, ln, recids, more_link = None):
"""
Formats a list of records (given in the recids list) from the database.
Parameters:
- 'as' *int* - Advanced Search interface or not (0 or 1)
- 'ln' *string* - The language to display
- 'recids' *list* - the list of records from the database
- 'more_link' *string* - the "More..." link for the record. If not given, will not be displayed
"""
# load the right message language
_ = gettext_set_language(ln)
body = '''
'''
for recid in recids:
body += '''
%(date)s
%(body)s
''' % {'date': recid['date'],
'body': recid['body']
}
body += "
''' % {'header' : _("Latest additions:"),
'body' : body,
}
def tmpl_searchwithin_select(self, ln, fieldname, selected, values):
"""
Produces 'search within' selection box for the current collection.
Parameters:
- 'ln' *string* - The language to display
- 'fieldname' *string* - the name of the select box produced
- 'selected' *string* - which of the values is selected
- 'values' *list* - the list of values in the select
"""
out = '"""
return out
def tmpl_select(self, fieldname, values, selected=None, css_class=''):
"""
Produces a generic select box
Parameters:
- 'css_class' *string* - optional, a css class to display this select with
- 'fieldname' *list* - the name of the select box produced
- 'selected' *string* - which of the values is selected
- 'values' *list* - the list of values in the select
"""
if css_class != '':
class_field = ' class="%s"' % css_class
else:
class_field = ''
out = '"""
return out
def tmpl_record_links(self, weburl, recid, ln):
"""
Displays the *More info* and *Find similar* links for a record
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'recid' *string* - the id of the displayed record
"""
# load the right message language
_ = gettext_set_language(ln)
out = ''' %(detailed)s - %(similar)s''' % {
'detailed': create_html_link(self.build_search_url(recid=recid, ln=ln),
{},
_("Detailed record"), {'class': "moreinfo"}),
'similar': create_html_link(self.build_search_url(p="recid:%d" % recid, rm='wrd', ln=ln),
{},
_("Similar records"),
{'class': "moreinfo"})}
if CFG_BIBRANK_SHOW_CITATION_LINKS:
out += ''' - %s ''' % \
create_html_link(self.build_search_url(p='recid:%d' % recid, rm='citation', ln=ln),
{}, _("Cited by"), {'class': "moreinfo"})
return out
def tmpl_record_body(self, weburl, titles, authors, dates, rns, abstracts, urls_u, urls_z, ln):
"""
Displays the "HTML basic" format of a record
Parameters:
- 'weburl' *string* - The base URL for the site
- 'authors' *list* - the authors (as strings)
- 'dates' *list* - the dates of publication
- 'rns' *list* - the quicknotes for the record
- 'abstracts' *list* - the abstracts for the record
- 'urls_u' *list* - URLs to the original versions of the notice
- 'urls_z' *list* - Not used
"""
out = ""
for title in titles:
out += "%(title)s " % {
'title' : cgi.escape(title)
}
if authors:
out += " / "
for author in authors[:CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD]:
out += '%s; ' % \
create_html_link(self.build_search_url(p=author, f='author', ln=ln),
{}, cgi.escape(author))
if len(authors) > CFG_WEBSEARCH_AUTHOR_ET_AL_THRESHOLD:
out += "et al"
for date in dates:
out += " %s." % cgi.escape(date)
for rn in rns:
out += """ [%(rn)s]""" % {'rn' : cgi.escape(rn)}
for abstract in abstracts:
out += " %(abstract)s [...]" % {'abstract' : cgi.escape(abstract[:1+string.find(abstract, '.')]) }
for idx in range(0, len(urls_u)):
out += """ %(name)s""" % {
'url' : urls_u[idx],
'name' : urls_u[idx]
}
return out
def tmpl_search_in_bibwords(self, p, f, ln, nearest_box):
"""
Displays the *Words like current ones* links for a search
Parameters:
- 'p' *string* - Current search words
- 'f' *string* - the fields in which the search was done
- 'nearest_box' *string* - the HTML code for the "nearest_terms" box - most probably from a create_nearest_terms_box call
"""
# load the right message language
_ = gettext_set_language(ln)
out = '\n'
if f:
out += _("Words nearest to %(x_word)s inside %(x_field)s in any collection are:") % {'x_word': '' + cgi.escape(p) + '',
'x_field': '' + cgi.escape(f) + ''}
else:
out += _("Words nearest to %(x_word)s in any collection are:") % {'x_word': '' + cgi.escape(p) + ''}
out += ' ' + nearest_box + '\n'
return out
def tmpl_nearest_term_box(self, p, ln, f, terminfo, intro):
"""
Displays the *Nearest search terms* box
Parameters:
- 'p' *string* - Current search words
- 'f' *string* - a collection description (if the search has been completed in a collection)
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'terminfo': tuple (term, hits, argd) for each near term
- 'intro' *string* - the intro HTML to prefix the box with
"""
out = '''
'''
for term, hits, argd in terminfo:
if hits:
hitsinfo = str(hits)
else:
hitsinfo = '-'
term = cgi.escape(term)
if term == p: # print search word for orientation:
nearesttermsboxbody_class = "nearesttermsboxbodyselected"
if hits > 0:
term = create_html_link(self.build_search_url(argd), {},
term, {'class': "nearesttermsselected"})
else:
nearesttermsboxbody_class = "nearesttermsboxbody"
term = create_html_link(self.build_search_url(argd), {},
term, {'class': "nearestterms"})
out += '''
'''
return out
def tmpl_browse_pattern(self, f, fn, ln, browsed_phrases_in_colls, colls):
"""
Displays the results of browsing a phrase in a given field
Parameters:
- 'f' *string* - field (*not* i18nized)
- 'fn' *string* - field name (i18nized)
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'browsed_phrases_in_colls' *array* - the phrases to display
- 'colls' *array* - the list of collection parameters of the search (c's)
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
%(hits)s
%(fn)s
""" % {
'hits' : _("Hits"),
'fn' : cgi.escape(fn)
}
if len(browsed_phrases_in_colls) == 1:
# one hit only found:
phrase, nbhits = browsed_phrases_in_colls[0][0], browsed_phrases_in_colls[0][1]
query = {'c': colls,
'ln': ln,
'p': phrase,
'f': f}
out += """
%(nbhits)s
%(link)s
""" % {'nbhits': nbhits,
'link': create_html_link(self.build_search_url(query),
{}, cgi.escape(phrase))}
elif len(browsed_phrases_in_colls) > 1:
# first display what was found but the last one:
for phrase, nbhits in browsed_phrases_in_colls[:-1]:
query = {'c': colls,
'ln': ln,
'p': phrase,
'f': f}
out += """
%(nbhits)s
%(link)s
""" % {'nbhits' : nbhits,
'link': create_html_link(self.build_search_url(query),
{},
cgi.escape(phrase))}
# now display last hit as "next term":
phrase, nbhits = browsed_phrases_in_colls[-1]
query = {'c': colls,
'ln': ln,
'p': phrase,
'f': f}
out += """
"""
return out
def tmpl_search_box(self, ln, as, cc, cc_intl, ot, sp,
action, fieldslist, f1, f2, f3, m1, m2, m3,
p1, p2, p3, op1, op2, rm, p, f, coll_selects,
d1y, d2y, d1m, d2m, d1d, d2d, dt, sort_fields,
sf, so, ranks, sc, rg, formats, of, pl, jrec, ec):
"""
Displays the main search box
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'as' *bool* - Should we display an advanced search box?
- 'cc_intl' *string* - the i18nized current collection name
- 'cc' *string* - the internal current collection name
- 'ot', 'sp' *string* - hidden values
- 'action' *string* - the action demanded by the user
- 'fieldslist' *list* - the list of all fields available, for use in select within boxes in advanced search
- 'p, f, f1, f2, f3, m1, m2, m3, p1, p2, p3, op1, op2, op3, rm' *strings* - the search parameters
- 'coll_selects' *array* - a list of lists, each containing the collections selects to display
- 'd1y, d2y, d1m, d2m, d1d, d2d' *int* - the search between dates
- 'dt' *string* - the dates' types (creation dates, modification dates)
- 'sort_fields' *array* - the select information for the sort fields
- 'sf' *string* - the currently selected sort field
- 'so' *string* - the currently selected sort order ("a" or "d")
- 'ranks' *array* - ranking methods
- 'rm' *string* - selected ranking method
- 'sc' *string* - split by collection or not
- 'rg' *string* - selected results/page
- 'formats' *array* - available output formats
- 'of' *string* - the selected output format
- 'pl' *string* - 'limit to' search pattern
"""
# load the right message language
_ = gettext_set_language(ln)
# These are hidden fields the user does not manipulate
# directly
argd = drop_default_urlargd({
'ln': ln, 'as': as,
'cc': cc, 'ot': ot, 'sp': sp, 'ec': ec,
}, self.search_results_default_urlargd)
out = '''
%(ccname)s
"""
return out
def tmpl_input_hidden(self, name, value):
"Produces the HTML code for a hidden field "
if isinstance(value, list):
list_input = [self.tmpl_input_hidden(name, val) for val in value]
return "\n".join(list_input)
return """""" % {
'name' : cgi.escape(str(name), 1),
'value' : cgi.escape(str(value), 1),
}
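tmpl_input_hidden above flattens list values into one hidden field per element via recursion; here is a standalone sketch of that idea (hypothetical names, and a pluggable escaper stubbed with str instead of cgi.escape):

```python
def hidden_fields(name, value, escape=str):
    """Render one hidden <input> per value; lists recurse element-wise."""
    if isinstance(value, list):
        return "\n".join(hidden_fields(name, v, escape) for v in value)
    return '<input type="hidden" name="%s" value="%s" />' % (
        escape(name), escape(value))
```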
def _add_mark_to_field(self, value, fields, ln, chars = 1):
"""Adds the current value as a MARC tag in the fields array
Useful for advanced search"""
# load the right message language
_ = gettext_set_language(ln)
out = fields
if value and str(value[0:chars]).isdigit():
out.append({'value' : value,
'text' : str(value) + " " + _("MARC tag")
})
return out
def tmpl_search_pagestart(self, ln) :
"page start for search page. Will display after the page header"
return """
"""
def tmpl_search_pageend(self, ln) :
"page end for search page. Will display just before the page footer"
return """
"""
def tmpl_print_warning(self, msg, type, prologue, epilogue):
"""Prints warning message and flushes output.
Parameters:
- 'msg' *string* - The message string
- 'type' *string* - the warning type
- 'prologue' *string* - HTML code to display before the warning
- 'epilogue' *string* - HTML code to display after the warning
"""
out = '\n%s' % (prologue)
if type:
out += '%s: ' % type
out += '%s%s' % (msg, epilogue)
return out
def tmpl_print_search_info(self, ln, weburl, middle_only,
collection, collection_name, collection_id,
as, sf, so, rm, rg, nb_found, of, ot, p, f, f1,
f2, f3, m1, m2, m3, op1, op2, p1, p2,
p3, d1y, d1m, d1d, d2y, d2m, d2d, dt,
all_fieldcodes, cpu_time, pl_in_url,
jrec, sc, sp):
"""Prints stripe with the information on 'collection' and 'nb_found' results and CPU time.
Also, prints navigation links (beg/next/prev/end) inside the results set.
If middle_only is set to 1, it will only print the middle box information (beg/next/prev/end, etc.) links.
This is suitable for displaying navigation links at the bottom of the search results page.
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'middle_only' *bool* - Only display parts of the interface
- 'collection' *string* - the collection name
- 'collection_name' *string* - the i18nized current collection name
- 'as' *bool* - if we display the advanced search interface
- 'sf' *string* - the currently selected sort format
- 'so' *string* - the currently selected sort order ("a" or "d")
- 'rm' *string* - selected ranking method
- 'rg' *int* - selected results/page
- 'nb_found' *int* - number of results found
- 'of' *string* - the selected output format
- 'ot' *string* - hidden values
- 'p' *string* - Current search words
- 'f' *string* - the fields in which the search was done
- 'f1, f2, f3, m1, m2, m3, p1, p2, p3, op1, op2' *strings* - the search parameters
- 'jrec' *int* - number of first record on this page
- 'd1y, d2y, d1m, d2m, d1d, d2d' *int* - the search between dates
- 'dt' *string* the dates' type (creation date, modification date)
- 'all_fieldcodes' *array* - all the available fields
- 'cpu_time' *float* - the time of the query in seconds
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
# left table cells: print collection name
if not middle_only:
out += '''
'''
return out
def tmpl_nice_number(self, number, ln=cdslang, thousands_separator=',', max_ndigits_after_dot=None):
"""
Return nicely printed number NUMBER in language LN using
given THOUSANDS_SEPARATOR character.
If max_ndigits_after_dot is specified and the number is a float, the
number is rounded by taking into consideration up to
max_ndigits_after_dot digits after the dot.
This version does not pay attention to locale. See
tmpl_nice_number_via_locale().
"""
if type(number) is float:
if max_ndigits_after_dot is not None:
number = round(number, max_ndigits_after_dot)
int_part, frac_part = str(number).split('.')
return '%s.%s' % (self.tmpl_nice_number(int(int_part), ln, thousands_separator), frac_part)
else:
chars_in = list(str(number))
number = len(chars_in)
chars_out = []
for i in range(0, number):
if i % 3 == 0 and i != 0:
chars_out.append(thousands_separator)
chars_out.append(chars_in[number-i-1])
chars_out.reverse()
return ''.join(chars_out)
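The digit-grouping loop above walks the digit string right-to-left, emitting a separator every three characters; extracted as a standalone helper (hypothetical name) it behaves like this:

```python
def group_thousands(number, thousands_separator=','):
    """Insert a separator every three digits, e.g. 1234567 -> '1,234,567'."""
    chars_in = list(str(number))
    n = len(chars_in)
    chars_out = []
    for i in range(n):
        # every third character (except at the very start), emit a separator
        if i % 3 == 0 and i != 0:
            chars_out.append(thousands_separator)
        chars_out.append(chars_in[n - i - 1])
    chars_out.reverse()
    return ''.join(chars_out)
```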
def tmpl_nice_number_via_locale(self, number, ln=cdslang):
"""
Return nicely printed number NUMBER in language LN using the locale.
See also version tmpl_nice_number().
"""
if number is None:
return None
# Temporarily switch the numeric locale to the requested one, and format the number
# In case the system has no locale definition, use the vanilla form
ol = locale.getlocale(locale.LC_NUMERIC)
try:
locale.setlocale(locale.LC_NUMERIC, self.tmpl_localemap.get(ln, self.tmpl_default_locale))
except locale.Error:
return str(number)
try:
number = locale.format('%d', number, True)
except TypeError:
return str(number)
locale.setlocale(locale.LC_NUMERIC, ol)
return number
def tmpl_record_format_htmlbrief_header(self, ln):
"""Returns the header of the search results list when output
is html brief. Note that this function is called for each collection
results when 'split by collection' is enabled.
See also: tmpl_record_format_htmlbrief_footer(..),
tmpl_record_format_htmlbrief_body(..)
Parameters:
- 'ln' *string* - The language to display
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
""" % {
'weburl' : weburl,
}
return out
def tmpl_record_format_htmlbrief_footer(self, ln):
"""Returns the footer of the search results list when output
is html brief. Note that this function is called for each collection
results when 'split by collection' is enabled.
See also: tmpl_record_format_htmlbrief_header(..),
tmpl_record_format_htmlbrief_body(..)
Parameters:
- 'ln' *string* - The language to display
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
""" % {
'basket' : _("ADD TO BASKET")
}
return out
def tmpl_record_format_htmlbrief_body(self, ln, recid,
row_number, relevance,
record, relevances_prologue,
relevances_epilogue):
"""Returns the html brief format of one record. Used in the
search results list for each record.
See also: tmpl_record_format_htmlbrief_header(..),
tmpl_record_format_htmlbrief_footer(..)
Parameters:
- 'ln' *string* - The language to display
- 'row_number' *int* - The position of this record in the list
- 'recid' *int* - The recID
- 'relevance' *string* - The relevance of the record
- 'record' *string* - The formatted record
- 'relevances_prologue' *string* - HTML code to prepend the relevance indicator
- 'relevances_epilogue' *string* - HTML code to append to the relevance indicator (used mostly for formatting)
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
%(number)s.
""" % {'recid': recid,
'number': row_number}
if relevance:
out += """
""" % record
return out
def tmpl_print_results_overview(self, ln, weburl, results_final_nb_total, cpu_time, results_final_nb, colls, ec):
"""Prints results overview box with links to particular collections below.
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'results_final_nb_total' *int* - The total number of hits for the query
- 'colls' *array* - The collections with hits, in the format:
- 'coll[code]' *string* - The code of the collection (canonical name)
- 'coll[name]' *string* - The display name of the collection
- 'results_final_nb' *array* - The number of hits, indexed by the collection codes:
- 'cpu_time' *string* - The time the query took
- 'url_args' *string* - The rest of the search query
- 'ec' *array* - selected external collections
"""
if len(colls) == 1 and not ec:
# if one collection only and no external collections, print nothing:
return ""
# load the right message language
_ = gettext_set_language(ln)
# first find total number of hits:
out = """
%(founds)s
""" % {
'founds' : _("%(x_fmt_open)sResults overview:%(x_fmt_close)s Found %(x_nb_records)s records in %(x_nb_seconds)s seconds.") %\
{'x_fmt_open': '',
'x_fmt_close': '',
'x_nb_records': '' + self.tmpl_nice_number(results_final_nb_total, ln) + '',
'x_nb_seconds': '%.2f' % cpu_time}
}
# then print hits per collection:
for coll in colls:
if results_final_nb.has_key(coll['code']) and results_final_nb[coll['code']] > 0:
out += '''%(coll_name)s,
%(number)s ''' % {
'coll' : coll['id'],
'coll_name' : cgi.escape(coll['name']),
'number' : _("%s records found") % ('' + self.tmpl_nice_number(results_final_nb[coll['code']], ln) + '')
}
out += "
"
return out
def tmpl_search_no_boolean_hits(self, ln, nearestterms):
"""No hits found, proposes alternative boolean queries
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'nearestterms' *array* - Parts of the interface to display, in the format:
- 'nearestterms[nbhits]' *int* - The resulting number of hits
- 'nearestterms[url_args]' *string* - The search parameters
- 'nearestterms[p]' *string* - The search terms
"""
# load the right message language
_ = gettext_set_language(ln)
out = _("Boolean query returned no hits. Please combine your search terms differently.")
out += '''
'''
for term, hits, argd in nearestterms:
out += '''
'''
return out
def tmpl_similar_author_names(self, authors, ln):
"""No hits found, proposes alternative boolean queries
Parameters:
- 'authors': a list of (name, hits) tuples
- 'ln' *string* - The language to display
"""
# load the right message language
_ = gettext_set_language(ln)
out = '''
%(similar)s
''' % {
'similar' : _("See also: similar author names")
}
for author, hits in authors:
out += '''
'''
return out
def tmpl_print_record_detailed(self, recID, ln, weburl):
"""Displays a detailed on-the-fly record
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'recID' *int* - The record id
"""
# okay, need to construct a simple "Detailed record" format of our own:
out = "
"
# secondly, title:
titles = get_fieldvalues(recID, "245__a")
for title in titles:
out += "
%s
" % cgi.escape(title)
# thirdly, authors:
authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a")
if authors:
out += "
"
for author in authors:
out += '%s; ' % create_html_link(self.build_search_url(
ln=ln,
p=author,
f='author'),
{}, cgi.escape(author))
out += "
"
# fourthly, date of creation:
dates = get_fieldvalues(recID, "260__c")
for date in dates:
out += "
%s
" % date
# fifthly, abstract:
abstracts = get_fieldvalues(recID, "520__a")
for abstract in abstracts:
out += """
Abstract: %s
""" % abstract
# fifthly bis, keywords:
keywords = get_fieldvalues(recID, "6531_a")
if len(keywords):
out += """
Keyword(s):"""
for keyword in keywords:
out += '%s; ' % create_html_link(
self.build_search_url(ln=ln,
p=keyword,
f='keyword'),
{}, cgi.escape(keyword))
out += ''
# fifthly bis bis, published in:
prs_p = get_fieldvalues(recID, "909C4p")
prs_v = get_fieldvalues(recID, "909C4v")
prs_y = get_fieldvalues(recID, "909C4y")
prs_n = get_fieldvalues(recID, "909C4n")
prs_c = get_fieldvalues(recID, "909C4c")
for idx in range(0, len(prs_p)):
out += """
Publ. in: %s""" % prs_p[idx]
if prs_v and prs_v[idx]:
out += """%s""" % prs_v[idx]
if prs_y and prs_y[idx]:
out += """(%s)""" % prs_y[idx]
if prs_n and prs_n[idx]:
out += """, no.%s""" % prs_n[idx]
if prs_c and prs_c[idx]:
out += """, p.%s""" % prs_c[idx]
out += """.
"""
# sixthly, fulltext link:
urls_z = get_fieldvalues(recID, "8564_z")
urls_u = get_fieldvalues(recID, "8564_u")
for idx in range(0, len(urls_u)):
link_text = "URL"
try:
if urls_z[idx]:
link_text = urls_z[idx]
except IndexError:
pass
out += """
""" % (link_text, urls_u[idx], urls_u[idx])
# print some white space at the end:
out += "
"
return out
def tmpl_print_record_list_for_similarity_boxen(self, title, recID_score_list, ln=cdslang):
"""Print list of records in the "hs" (HTML Similarity) format for similarity boxes.
RECID_SCORE_LIST is a list of (recID1, score1), (recID2, score2), etc.
"""
from invenio.search_engine import print_record, record_public_p
recID_score_list_to_be_printed = []
# firstly find 5 first public records to print:
nb_records_to_be_printed = 0
nb_records_seen = 0
while nb_records_to_be_printed < 5 and nb_records_seen < len(recID_score_list) and nb_records_seen < 50:
# looking through first 50 records only, picking first 5 public ones
(recID, score) = recID_score_list[nb_records_seen]
nb_records_seen += 1
if record_public_p(recID):
nb_records_to_be_printed += 1
recID_score_list_to_be_printed.append([recID, score])
# secondly print them:
out = '''
%(title)s
''' % { 'title': cgi.escape(title) }
for recid, score in recID_score_list_to_be_printed:
out += '''
"""
return out
def tmpl_print_record_brief(self, ln, recID, weburl):
"""Displays a brief record on-the-fly
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'recID' *int* - The record id
"""
out = ""
# record 'recID' does not exist in format 'format', so print some default format:
# firstly, title:
titles = get_fieldvalues(recID, "245__a")
# secondly, authors:
authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a")
# thirdly, date of creation:
dates = get_fieldvalues(recID, "260__c")
# thirdly bis, report numbers:
rns = get_fieldvalues(recID, "037__a")
rns = get_fieldvalues(recID, "088__a")
# fourthly, beginning of abstract:
abstracts = get_fieldvalues(recID, "520__a")
# fifthly, fulltext link:
urls_z = get_fieldvalues(recID, "8564_z")
urls_u = get_fieldvalues(recID, "8564_u")
return self.tmpl_record_body(
weburl = weburl,
titles = titles,
authors = authors,
dates = dates,
rns = rns,
abstracts = abstracts,
urls_u = urls_u,
urls_z = urls_z,
ln=ln)
def tmpl_print_record_brief_links(self, ln, recID, weburl):
"""Displays links for brief record on-the-fly
Parameters:
- 'ln' *string* - The language to display
- 'weburl' *string* - The base URL for the site
- 'recID' *int* - The record id
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
if CFG_WEBSEARCH_USE_ALEPH_SYSNOS:
alephsysnos = get_fieldvalues(recID, "970__a")
if len(alephsysnos)>0:
alephsysno = alephsysnos[0]
out += ' %s' % \
create_html_link(self.build_search_url(sysno=alephsysno,
ln=ln),
{}, _("Detailed record"),
{'class': "moreinfo"})
else:
out += ' %s' % \
create_html_link(self.build_search_url(recid=recID, ln=ln),
{},
_("Detailed record"),
{'class': "moreinfo"})
else:
out += ' %s' % \
create_html_link(self.build_search_url(recid=recID, ln=ln),
{}, _("Detailed record"),
{'class': "moreinfo"})
out += ' - %s' % \
create_html_link(self.build_search_url(p="recid:%d" % recID,
rm="wrd",
ln=ln),
{}, _("Similar records"),
{'class': "moreinfo"})
if CFG_BIBRANK_SHOW_CITATION_LINKS:
out += ' - %s' % \
create_html_link(self.build_search_url(p="recid:%d" % recID,
rm="citation",
ln=ln),
{}, _("Cited by"),
{'class': "moreinfo"})
return out
def tmpl_xml_rss_prologue(self):
"""Creates XML RSS 2.0 prologue."""
out = """%(cdsname)s
%(weburl)s
%(cdsname)s latest documents%(cdslang)s%(timestamp)sCDS Invenio %(version)s%(supportemail)s%(timetolive)s%(weburl)s/img/cds.png%(cdsname)s
%(weburl)s
Search Search this site:p
%(weburl)s/search
""" % {'cdsname': cdsname,
'weburl': weburl,
'cdslang': cdslang,
'timestamp': time.strftime("%a, %d %b %Y %H:%M:%S %Z", time.localtime()),
- 'version': version,
+ 'version': CFG_VERSION,
'supportemail': supportemail,
'timetolive': CFG_WEBSEARCH_RSS_TTL
}
return out
def tmpl_xml_rss_epilogue(self):
"""Creates XML RSS 2.0 epilogue."""
out = """\
\n"""
return out
def tmpl_xml_nlm_prologue(self):
"""Creates XML NLM prologue."""
out = """\n"""
return out
def tmpl_xml_nlm_epilogue(self):
"""Creates XML NLM epilogue."""
out = """\n"""
return out
def tmpl_xml_marc_prologue(self):
"""Creates XML MARC prologue."""
out = """\n"""
return out
def tmpl_xml_marc_epilogue(self):
"""Creates XML MARC epilogue."""
out = """\n"""
return out
def tmpl_xml_default_prologue(self):
"""Creates XML default format prologue. (Sanity calls only.)"""
out = """\n"""
return out
def tmpl_xml_default_epilogue(self):
"""Creates XML default format epilogue. (Sanity calls only.)"""
out = """\n"""
return out
def tmpl_collection_not_found_page_title(self, colname, ln=cdslang):
"""
Create page title for cases when a nonexistent collection was requested.
"""
_ = gettext_set_language(ln)
out = _("Collection %s Not Found") % cgi.escape(colname)
return out
def tmpl_collection_not_found_page_body(self, colname, ln=cdslang):
"""
Create page body for cases when a nonexistent collection was requested.
"""
_ = gettext_set_language(ln)
out = """
%(title)s
%(sorry)s
%(you_may_want)s
""" % { 'title': self.tmpl_collection_not_found_page_title(colname, ln),
'sorry': _("Sorry, collection %s does not seem to exist.") % \
('' + cgi.escape(colname) + ''),
'you_may_want': _("You may want to start browsing from %s.") % \
('' + \
cgi.escape(cdsnameintl.get(ln, cdsname)) + '')}
return out
def tmpl_alert_rss_teaser_box_for_query(self, id_query, ln):
"""Propose teaser for setting up this query as alert or RSS feed.
Parameters:
- 'id_query' *int* - ID of the query we make teaser for
- 'ln' *string* - The language to display
"""
# load the right message language
_ = gettext_set_language(ln)
# get query arguments:
res = run_sql("SELECT urlargs FROM query WHERE id=%s", (id_query,))
argd = {}
if res:
argd = cgi.parse_qs(res[0][0])
rssurl = self.build_rss_url(argd)
alerturl = weburl + '/youralerts/input?ln=%s&idq=%s' % (ln, id_query)
out = '''
%(similar)s
%(msg_alert)s
''' % {
'similar' : _("Interested in being notified about new results for this query?"),
'msg_alert': _("""Set up a personal %(x_url1_open)semail alert%(x_url1_close)s
or subscribe to the %(x_url2_open)sRSS feed%(x_url2_close)s.""") % \
{'x_url1_open': ' ' % (alerturl, weburl) + ' ' % (alerturl),
'x_url1_close': '',
'x_url2_open': ' ' % (rssurl, weburl) + ' ' % rssurl,
'x_url2_close': '',
}}
return out
def tmpl_detailed_record_metadata(self, recID, ln, format,
content,
creationdate=None,
modifydate=None):
"""Returns the main detailed page of a record
Parameters:
- 'recID' *int* - The ID of the printed record
- 'ln' *string* - The language to display
- 'format' *string* - The format in used to print the record
- 'content' *string* - The main content of the page
- 'creationdate' *string* - The creation date of the printed record
- 'modifydate' *string* - The last modification date of the printed record
"""
_ = gettext_set_language(ln)
out = content
return out
def tmpl_detailed_record_statistics(self, recID, ln,
downloadsimilarity,
downloadhistory, viewsimilarity):
"""Returns the statistics page of a record
Parameters:
- 'recID' *int* - The ID of the printed record
- 'ln' *string* - The language to display
- downloadsimilarity *string* - downloadsimilarity box
- downloadhistory *string* - downloadhistory box
- viewsimilarity *string* - viewsimilarity box
"""
# load the right message language
_ = gettext_set_language(ln)
out = ''
if CFG_BIBRANK_SHOW_DOWNLOAD_STATS and downloadsimilarity is not None:
similar = self.tmpl_print_record_list_for_similarity_boxen (
_("People who downloaded this document also downloaded:"), downloadsimilarity, ln)
out = similar
out += ' '
if CFG_BIBRANK_SHOW_READING_STATS and viewsimilarity is not None:
out += self.tmpl_print_record_list_for_similarity_boxen (
_("People who viewed this page also viewed:"), viewsimilarity, ln)
if CFG_BIBRANK_SHOW_DOWNLOAD_GRAPHS and downloadhistory is not None:
out += downloadhistory + ' '
return out
def tmpl_detailed_record_citations(self, recID, ln,
citinglist, citationhistory,
cociting, selfcited):
"""Returns the citations page of a record
Parameters:
- 'recID' *int* - The ID of the printed record
- 'ln' *string* - The language to display
- citinglist *list* - a list of tuples [(x1,y1),(x2,y2),..] where x is doc id and y is number of citations
- citationhistory *string* - citationhistory box
- cociting *string* - cociting box
- selfcited list - a list of self-citations for recID
"""
# load the right message language
_ = gettext_set_language(ln)
out = ''
if CFG_BIBRANK_SHOW_CITATION_STATS and citinglist is not None:
similar = self.tmpl_print_record_list_for_similarity_boxen(
_("Cited by: %s records") % len (citinglist), citinglist, ln)
out += '''
%(similar)s %(more)s
''' % {
'more': create_html_link(
self.build_search_url(p='recid:%d' % \
recID, #XXXX
rm='citation', ln=ln),
{}, _("more")),
'similar': similar}
if CFG_BIBRANK_SHOW_CITATION_GRAPHS and selfcited is not None:
sc_scorelist = [] #a score list for print..
for s in selfcited:
#copy weight from citations
weight = 0
for c in citinglist:
(crec,score) = c
if crec == s:
weight = score
tmp = [s,weight]
sc_scorelist.append(tmp)
scite = self.tmpl_print_record_list_for_similarity_boxen (
_(".. of which self-citations: %s records") % len (selfcited), sc_scorelist, ln)
out += scite
if CFG_BIBRANK_SHOW_CITATION_STATS and cociting is not None:
similar = self.tmpl_print_record_list_for_similarity_boxen (
_("Co-cited with: %s records") % len (cociting), cociting, ln)
out += '''
%(similar)s %(more)s
''' % { 'more': create_html_link(self.build_search_url(p='cocitedwith:%d' % recID, ln=ln),
{}, _("more")),
'similar': similar}
if CFG_BIBRANK_SHOW_CITATION_GRAPHS and citationhistory is not None:
out += '%s' % citationhistory
out += ''
return out
def tmpl_detailed_record_references(self, recID, ln, content):
"""Returns the discussion page of a record
Parameters:
- 'recID' *int* - The ID of the printed record
- 'ln' *string* - The language to display
- 'content' *string* - The main content of the page
"""
# load the right message language
_ = gettext_set_language(ln)
out = ''
if content is not None:
out += content
return out
diff --git a/modules/websearch/lib/websearch_webcoll.py b/modules/websearch/lib/websearch_webcoll.py
index 6b032efe3..068cca312 100644
--- a/modules/websearch/lib/websearch_webcoll.py
+++ b/modules/websearch/lib/websearch_webcoll.py
@@ -1,890 +1,890 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""Create CDS Invenio collection cache."""
__revision__ = "$Id$"
import calendar
import copy
import sys
import cgi
import re
import os
import string
import time
from invenio.config import \
CFG_CERN_SITE, \
CFG_WEBSEARCH_INSTANT_BROWSE, \
CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS, \
CFG_WEBSEARCH_I18N_LATEST_ADDITIONS, \
- cachedir, \
+ CFG_CACHEDIR, \
cdslang, \
cdsname, \
weburl
from invenio.messages import gettext_set_language, language_list_long
from invenio.search_engine import HitSet, search_pattern, get_creation_date, get_field_i18nname, collection_restricted_p
from invenio.dbquery import run_sql, Error, get_table_update_time
from invenio.access_control_engine import acc_authorize_action
from invenio.bibrank_record_sorter import get_bibrank_methods
from invenio.dateutils import convert_datestruct_to_dategui
from invenio.bibformat import format_record
from invenio.websearch_external_collections import \
external_collection_load_states, \
dico_collection_external_searches, \
external_collection_sort_engine_by_name
from invenio.bibtask import task_init, task_get_option, task_set_option, \
write_message, task_has_option, task_update_progress
import invenio.template
websearch_templates = invenio.template.load('websearch')
## global vars
collection_house = {} # will hold collections we treat in this run of the program; a dict of {collname2, collobject1}, ...
# cfg_cache_last_updated_timestamp_tolerance -- cache timestamp
# tolerance (in seconds), to account for the fact that an admin might
# accidentally happen to edit the collection definitions at exactly
# the same second when some webcoll process was about to be started.
# In order to be safe, let's put an exaggerated timestamp tolerance
# value such as 20 seconds:
cfg_cache_last_updated_timestamp_tolerance = 20
# cfg_cache_last_updated_timestamp_file -- location of the cache
# timestamp file:
-cfg_cache_last_updated_timestamp_file = "%s/collections/last_updated" % cachedir
+cfg_cache_last_updated_timestamp_file = "%s/collections/last_updated" % CFG_CACHEDIR
def get_collection(colname):
"""Return collection object from the collection house for given colname.
If it does not exist, create it."""
if not collection_house.has_key(colname):
colobject = Collection(colname)
collection_house[colname] = colobject
return collection_house[colname]
## auxiliary functions:
def mymkdir(newdir, mode=0777):
"""works the way a good mkdir should :)
- already exists, silently complete
- regular file in the way, raise an exception
- parent directory(ies) do not exist, make them as well
"""
if os.path.isdir(newdir):
pass
elif os.path.isfile(newdir):
raise OSError("a file with the same name as the desired " \
"dir, '%s', already exists." % newdir)
else:
head, tail = os.path.split(newdir)
if head and not os.path.isdir(head):
mymkdir(head, mode)
if tail:
os.umask(022)
os.mkdir(newdir, mode)
def is_selected(var, fld):
"Checks if the two are equal, and if yes, returns ' selected'. Useful for select boxes."
if var == fld:
return ' selected="selected"'
else:
return ""
def get_field(recID, tag):
"Gets list of field 'tag' for the record with 'recID' system number."
out = []
digit = tag[0:2]
bx = "bib%sx" % digit
bibx = "bibrec_bib%sx" % digit
query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag='%s'" \
% (bx, bibx, recID, tag)
res = run_sql(query)
for row in res:
out.append(row[0])
return out
class Collection:
"Holds the information on collections (id,name,dbquery)."
def __init__(self, name=""):
"Creates collection instance by querying the DB configuration database about 'name'."
self.calculate_reclist_run_already = 0 # to speed things up without much refactoring
self.update_reclist_run_already = 0 # to speed things up without much refactoring
self.reclist_with_nonpublic_subcolls = HitSet()
if not name:
self.name = cdsname # by default we are working on the home page
self.id = 1
self.dbquery = None
self.nbrecs = None
self.reclist = HitSet()
else:
self.name = name
try:
res = run_sql("""SELECT id,name,dbquery,nbrecs,reclist FROM collection
WHERE name=%s""", (name,))
if res:
self.id = res[0][0]
self.name = res[0][1]
self.dbquery = res[0][2]
self.nbrecs = res[0][3]
try:
self.reclist = HitSet(res[0][4])
except:
self.reclist = HitSet()
else: # collection does not exist!
self.id = None
self.dbquery = None
self.nbrecs = None
self.reclist = HitSet()
except Error, e:
print "Error %d: %s" % (e.args[0], e.args[1])
sys.exit(1)
def get_name(self, ln=cdslang, name_type="ln", prolog="", epilog="", prolog_suffix=" ", epilog_suffix=""):
"""Return nicely formatted collection name for language LN.
The NAME_TYPE may be 'ln' (=long name), 'sn' (=short name), etc."""
out = prolog
i18name = ""
res = run_sql("SELECT value FROM collectionname WHERE id_collection=%s AND ln=%s AND type=%s", (self.id, ln, name_type))
try:
i18name += res[0][0]
except IndexError:
pass
if i18name:
out += i18name
else:
out += self.name
out += epilog
return out
def get_ancestors(self):
"Returns list of ancestors of the current collection."
ancestors = []
id_son = self.id
while 1:
query = "SELECT cc.id_dad,c.name FROM collection_collection AS cc, collection AS c "\
"WHERE cc.id_son=%d AND c.id=cc.id_dad" % int(id_son)
res = run_sql(query, None, 1)
if res:
col_ancestor = get_collection(res[0][1])
ancestors.append(col_ancestor)
id_son = res[0][0]
else:
break
ancestors.reverse()
return ancestors
def restricted_p(self):
"""Predicate to test if the collection is restricted or not. Return the contect of the
`restrited' column of the collection table (typically Apache group). Otherwise return
None if the collection is public."""
if collection_restricted_p(self.name):
return 1
return None
def get_sons(self, type='r'):
"Returns list of direct sons of type 'type' for the current collection."
sons = []
id_dad = self.id
query = "SELECT cc.id_son,c.name FROM collection_collection AS cc, collection AS c "\
"WHERE cc.id_dad=%d AND cc.type='%s' AND c.id=cc.id_son ORDER BY score DESC, c.name ASC" % (int(id_dad), type)
res = run_sql(query)
for row in res:
sons.append(get_collection(row[1]))
return sons
def get_descendants(self, type='r'):
"Returns list of all descendants of type 'type' for the current collection."
descendants = []
id_dad = self.id
query = "SELECT cc.id_son,c.name FROM collection_collection AS cc, collection AS c "\
"WHERE cc.id_dad=%d AND cc.type='%s' AND c.id=cc.id_son ORDER BY score DESC" % (int(id_dad), type)
res = run_sql(query)
for row in res:
col_desc = get_collection(row[1])
descendants.append(col_desc)
descendants += col_desc.get_descendants()
return descendants
def write_cache_file(self, filename='', filebody=''):
"Write a file inside collection cache."
# open file:
- dirname = "%s/collections/%d" % (cachedir, self.id)
+ dirname = "%s/collections/%d" % (CFG_CACHEDIR, self.id)
mymkdir(dirname)
fullfilename = dirname + "/%s.html" % filename
try:
os.umask(022)
f = open(fullfilename, "w")
except IOError, v:
try:
(code, message) = v
except:
code = 0
message = v
print "I/O Error: " + str(message) + " (" + str(code) + ")"
sys.exit(1)
# print user info:
write_message("... creating %s" % fullfilename, verbose=6)
sys.stdout.flush()
# print page body:
f.write(filebody)
# close file:
f.close()
def update_webpage_cache(self):
"""Create collection page header, navtrail, body (including left and right stripes) and footer, and
call write_cache_file() afterwards to update the collection webpage cache."""
## precalculate latest additions for non-aggregate
## collections (the info is independent of ln and as)
if self.dbquery and not CFG_WEBSEARCH_I18N_LATEST_ADDITIONS:
self.create_latest_additions_info()
## do this for each language:
for lang, lang_fullname in language_list_long():
# but only if this language was requested, or no specific language was chosen:
if task_get_option("language", lang) == lang:
if self.dbquery and CFG_WEBSEARCH_I18N_LATEST_ADDITIONS:
self.create_latest_additions_info(ln=lang)
# load the right message language
_ = gettext_set_language(lang)
## first, update navtrail:
for as in range(0, 2):
self.write_cache_file("navtrail-as=%s-ln=%s" % (as, lang),
self.create_navtrail_links(as, lang))
## second, update page body:
for as in range(0, 2): # do both simple search and advanced search pages:
body = websearch_templates.tmpl_webcoll_body(
ln=lang, collection=self.name,
te_portalbox = self.create_portalbox(lang, 'te'),
searchfor = self.create_searchfor(as, lang),
np_portalbox = self.create_portalbox(lang, 'np'),
narrowsearch = self.create_narrowsearch(as, lang, 'r'),
focuson = self.create_narrowsearch(as, lang, "v") + \
self.create_external_collections_box(),
instantbrowse = self.create_instant_browse(as=as, ln=lang),
ne_portalbox = self.create_portalbox(lang, 'ne')
)
self.write_cache_file("body-as=%s-ln=%s" % (as, lang), body)
## third, write portalboxes:
self.write_cache_file("portalbox-tp-ln=%s" % lang, self.create_portalbox(lang, "tp"))
self.write_cache_file("portalbox-te-ln=%s" % lang, self.create_portalbox(lang, "te"))
self.write_cache_file("portalbox-lt-ln=%s" % lang, self.create_portalbox(lang, "lt"))
self.write_cache_file("portalbox-rt-ln=%s" % lang, self.create_portalbox(lang, "rt"))
## fourth, write 'last updated' information:
self.write_cache_file("last-updated-ln=%s" % lang,
convert_datestruct_to_dategui(time.localtime(),
ln=lang))
return
def create_navtrail_links(self, as=0, ln=cdslang):
"""Creates navigation trail links, i.e. links to collection
ancestors (except Home collection). If as==1, then links to
Advanced Search interfaces; otherwise Simple Search.
"""
dads = []
for dad in self.get_ancestors():
if dad.name != cdsname: # exclude Home collection
dads.append((dad.name, dad.get_name(ln)))
return websearch_templates.tmpl_navtrail_links(
as=as, ln=ln, dads=dads)
def create_portalbox(self, lang=cdslang, position="rt"):
"""Creates portalboxes of language CDSLANG of the position POSITION by consulting DB configuration database.
The position may be: 'lt'='left top', 'rt'='right top', etc."""
out = ""
query = "SELECT p.title,p.body FROM portalbox AS p, collection_portalbox AS cp "\
" WHERE cp.id_collection=%d AND p.id=cp.id_portalbox AND cp.ln='%s' AND cp.position='%s' "\
" ORDER BY cp.score DESC" % (self.id, lang, position)
res = run_sql(query)
for row in res:
title, body = row[0], row[1]
if title:
out += websearch_templates.tmpl_portalbox(title = title,
body = body)
else:
# no title specified, so print body ``as is'' only:
out += body
return out
def create_narrowsearch(self, as=0, ln=cdslang, type="r"):
"""Creates list of collection descendants of type 'type' under title 'title'.
If as==1, then links to Advanced Search interfaces; otherwise Simple Search.
Suitable for 'Narrow search' and 'Focus on' boxes."""
# get list of sons and analyse it
sons = self.get_sons(type)
if not sons:
return ''
# get descendents
descendants = self.get_descendants(type)
grandsons = []
if CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS:
# load grandsons for each son
for son in sons:
grandsons.append(son.get_sons())
# return ""
return websearch_templates.tmpl_narrowsearch(
as = as,
ln = ln,
type = type,
father = self,
has_grandchildren = len(descendants)>len(sons),
sons = sons,
display_grandsons = CFG_WEBSEARCH_NARROW_SEARCH_SHOW_GRANDSONS,
grandsons = grandsons
)
def create_external_collections_box(self, ln=cdslang):
external_collection_load_states()
if not dico_collection_external_searches.has_key(self.id):
return ""
engines_list = external_collection_sort_engine_by_name(dico_collection_external_searches[self.id])
return websearch_templates.tmpl_searchalso(ln, engines_list, self.id)
def create_latest_additions_info(self, rg=CFG_WEBSEARCH_INSTANT_BROWSE, ln=cdslang):
"""
Create info about latest additions that will be used for
create_instant_browse() later.
"""
self.latest_additions_info = []
if self.nbrecs and self.reclist:
# firstly, get last 'rg' records:
recIDs = list(self.reclist)
# FIXME: temporary hack in order to tweak latest additions
# list for some CERN collections:
if CFG_CERN_SITE and self.name in ['CERN Yellow Reports']:
# detect recIDs only from the current year:
recIDs = list(self.reclist & \
search_pattern(p='year:' + time.strftime("%Y", time.localtime())))
total = len(recIDs)
to_display = min(rg, total)
for idx in range(total-1, total-to_display-1, -1):
recid = recIDs[idx]
self.latest_additions_info.append({'id': recid,
'format': format_record(recid, "hb", ln=ln),
'date': get_creation_date(recid, fmt="%Y-%m-%d %H:%i")})
return
def create_instant_browse(self, rg=CFG_WEBSEARCH_INSTANT_BROWSE, as=0, ln=cdslang):
"Searches database and produces list of last 'rg' records."
if self.restricted_p():
return websearch_templates.tmpl_box_restricted_content(ln = ln)
# FIXME: temporary hack in order not to display latest
# additions box for some CERN collections:
if CFG_CERN_SITE and self.name in ['Periodicals', 'Electronic Journals']:
return ""
try:
self.latest_additions_info
latest_additions_info_p = True
except:
latest_additions_info_p = False
if latest_additions_info_p:
passIDs = []
for idx in range(0, min(len(self.latest_additions_info), rg)):
passIDs.append({'id': self.latest_additions_info[idx]['id'],
'body': self.latest_additions_info[idx]['format'] + \
websearch_templates.tmpl_record_links(weburl=weburl,
recid=self.latest_additions_info[idx]['id'],
ln=ln),
'date': self.latest_additions_info[idx]['date']})
if self.nbrecs > rg:
url = websearch_templates.build_search_url(
cc=self.name, jrec=rg+1, ln=ln, as=as)
else:
url = ""
return websearch_templates.tmpl_instant_browse(
as=as, ln=ln, recids=passIDs, more_link=url)
return websearch_templates.tmpl_box_no_records(ln=ln)
def create_searchoptions(self):
"Produces 'Search options' portal box."
box = ""
query = """SELECT DISTINCT(cff.id_field),f.code,f.name FROM collection_field_fieldvalue AS cff, field AS f
WHERE cff.id_collection=%d AND cff.id_fieldvalue IS NOT NULL AND cff.id_field=f.id
ORDER BY cff.score DESC""" % self.id
res = run_sql(query)
if res:
for row in res:
field_id = row[0]
field_code = row[1]
field_name = row[2]
query_bis = """SELECT fv.value,fv.name FROM fieldvalue AS fv, collection_field_fieldvalue AS cff
WHERE cff.id_collection=%d AND cff.type='seo' AND cff.id_field=%d AND fv.id=cff.id_fieldvalue
ORDER BY cff.score_fieldvalue DESC, cff.score DESC, fv.name ASC""" % (self.id, field_id)
res_bis = run_sql(query_bis)
if res_bis:
values = [{'value' : '', 'text' : 'any ' + field_name}] # FIXME: internationalisation of "any"
for row_bis in res_bis:
values.append({'value' : cgi.escape(row_bis[0], 1), 'text' : row_bis[1]})
box += websearch_templates.tmpl_select(
fieldname = field_code,
values = values
)
return box
def create_sortoptions(self, ln=cdslang):
"""Produces 'Sort options' portal box."""
# load the right message language
_ = gettext_set_language(ln)
box = ""
query = """SELECT f.code,f.name FROM field AS f, collection_field_fieldvalue AS cff
WHERE id_collection=%d AND cff.type='soo' AND cff.id_field=f.id
ORDER BY cff.score DESC, f.name ASC""" % self.id
values = [{'value' : '', 'text': "- %s -" % _("latest first")}]
res = run_sql(query)
if res:
for row in res:
values.append({'value' : row[0], 'text': row[1]})
else:
for tmp in ('title', 'author', 'report number', 'year'):
values.append({'value' : tmp.replace(' ', ''), 'text' : get_field_i18nname(tmp, ln)})
box = websearch_templates.tmpl_select(
fieldname = 'sf',
css_class = 'address',
values = values
)
box += websearch_templates.tmpl_select(
fieldname = 'so',
css_class = 'address',
values = [
{'value' : 'a' , 'text' : _("asc.")},
{'value' : 'd' , 'text' : _("desc.")}
]
)
return box
def create_rankoptions(self, ln=cdslang):
"Produces 'Rank options' portal box."
# load the right message language
_ = gettext_set_language(ln)
values = [{'value' : '', 'text': "- %s %s -" % (string.lower(_("OR")), _("rank by"))}]
for (code, name) in get_bibrank_methods(self.id, ln):
values.append({'value' : code, 'text': name})
box = websearch_templates.tmpl_select(
fieldname = 'sf',
css_class = 'address',
values = values
)
return box
def create_displayoptions(self, ln=cdslang):
"Produces 'Display options' portal box."
# load the right message language
_ = gettext_set_language(ln)
values = []
for i in ['10', '25', '50', '100', '250', '500']:
values.append({'value' : i, 'text' : i + ' ' + _("results")})
box = websearch_templates.tmpl_select(
fieldname = 'rg',
css_class = 'address',
values = values
)
if self.get_sons():
box += websearch_templates.tmpl_select(
fieldname = 'sc',
css_class = 'address',
values = [
{'value' : '1' , 'text' : _("split by collection")},
{'value' : '0' , 'text' : _("single list")}
]
)
return box
def create_formatoptions(self, ln=cdslang):
"Produces 'Output format options' portal box."
# load the right message language
_ = gettext_set_language(ln)
box = ""
values = []
query = """SELECT f.code,f.name FROM format AS f, collection_format AS cf
WHERE cf.id_collection=%d AND cf.id_format=f.id AND f.visibility='1'
ORDER BY cf.score DESC, f.name ASC""" % self.id
res = run_sql(query)
if res:
for row in res:
values.append({'value' : row[0], 'text': row[1]})
else:
values.append({'value' : 'hb', 'text' : "HTML %s" % _("brief")})
box = websearch_templates.tmpl_select(
fieldname = 'of',
css_class = 'address',
values = values
)
return box
def create_searchwithin_selection_box(self, fieldname='f', value='', ln='en'):
"""Produces 'search within' selection box for the current collection."""
# get values
query = """SELECT f.code,f.name FROM field AS f, collection_field_fieldvalue AS cff
WHERE cff.type='sew' AND cff.id_collection=%d AND cff.id_field=f.id
ORDER BY cff.score DESC, f.name ASC""" % self.id
res = run_sql(query)
values = [{'value' : '', 'text' : get_field_i18nname("any field", ln)}]
if res:
for row in res:
values.append({'value' : row[0], 'text' : row[1]})
else:
if CFG_CERN_SITE:
for tmp in ['title', 'author', 'abstract', 'report number', 'year']:
values.append({'value' : tmp.replace(' ', ''), 'text' : get_field_i18nname(tmp, ln)})
else:
for tmp in ['title', 'author', 'abstract', 'keyword', 'report number', 'year', 'fulltext', 'reference']:
values.append({'value' : tmp.replace(' ', ''), 'text' : get_field_i18nname(tmp, ln)})
return websearch_templates.tmpl_searchwithin_select(
fieldname = fieldname,
ln = ln,
selected = value,
values = values
)
def create_searchexample(self):
"Produces search example(s) for the current collection."
out = "$collSearchExamples = getSearchExample(%d, $se);" % self.id
return out
def create_searchfor(self, as=0, ln=cdslang):
"Produces either Simple or Advanced 'Search for' box for the current collection."
if as == 1:
return self.create_searchfor_advanced(ln)
else:
return self.create_searchfor_simple(ln)
def create_searchfor_simple(self, ln=cdslang):
"Produces simple 'Search for' box for the current collection."
return websearch_templates.tmpl_searchfor_simple(
ln=ln,
collection_id = self.name,
collection_name=self.get_name(ln=ln),
record_count=self.nbrecs,
middle_option = self.create_searchwithin_selection_box(ln=ln),
)
def create_searchfor_advanced(self, ln=cdslang):
"Produces advanced 'Search for' box for the current collection."
return websearch_templates.tmpl_searchfor_advanced(
ln = ln,
collection_id = self.name,
collection_name=self.get_name(ln=ln),
record_count=self.nbrecs,
middle_option_1 = self.create_searchwithin_selection_box('f1', ln=ln),
middle_option_2 = self.create_searchwithin_selection_box('f2', ln=ln),
middle_option_3 = self.create_searchwithin_selection_box('f3', ln=ln),
searchoptions = self.create_searchoptions(),
sortoptions = self.create_sortoptions(ln),
rankoptions = self.create_rankoptions(ln),
displayoptions = self.create_displayoptions(ln),
formatoptions = self.create_formatoptions(ln)
)
def calculate_reclist(self):
"""Calculate, set and return the (reclist, reclist_with_nonpublic_subcolls) tuple for given collection."""
if self.calculate_reclist_run_already:
# do we have to recalculate?
return (self.reclist, self.reclist_with_nonpublic_subcolls)
write_message("... calculating reclist of %s" % self.name, verbose=6)
reclist = HitSet() # will hold results for public sons only; good for storing into DB
reclist_with_nonpublic_subcolls = HitSet() # will hold results for both public and nonpublic sons; good for deducing total
# number of documents
if not self.dbquery:
# A - collection does not have dbquery, so query recursively all its sons
# that are either non-restricted or that have the same restriction rules
for coll in self.get_sons():
coll_reclist, coll_reclist_with_nonpublic_subcolls = coll.calculate_reclist()
if ((coll.restricted_p() is None) or
(coll.restricted_p() == self.restricted_p())):
# add this reclist ``for real'' only if it is public
reclist.union_update(coll_reclist)
reclist_with_nonpublic_subcolls.union_update(coll_reclist_with_nonpublic_subcolls)
else:
# B - collection does have dbquery, so compute it:
# (note: explicitly remove DELETED records)
if CFG_CERN_SITE:
reclist = search_pattern(None, self.dbquery + \
' -collection:"DELETED" -collection:"DUMMY"')
else:
reclist = search_pattern(None, self.dbquery + ' -collection:"DELETED"')
reclist_with_nonpublic_subcolls = copy.deepcopy(reclist)
# store the results:
self.nbrecs = len(reclist_with_nonpublic_subcolls)
self.reclist = reclist
self.reclist_with_nonpublic_subcolls = reclist_with_nonpublic_subcolls
# last but not least, update the speed-up flag:
self.calculate_reclist_run_already = 1
# return the two sets:
return (self.reclist, self.reclist_with_nonpublic_subcolls)
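The two-set recursion above (public records vs. all records, unioned over child collections) can be sketched with plain Python sets. This is a toy model, not Invenio's `Collection`/`HitSet`: the hypothetical `Node` class stands in for a collection that either has a dbquery (branch B, a leaf with its own record set) or aggregates its sons (branch A).

```python
class Node(object):
    """Toy stand-in for Collection: either holds a record set or child nodes."""
    def __init__(self, records=None, sons=(), public=True):
        self.records = records   # set of recids when the node has a dbquery
        self.sons = list(sons)
        self.public = public

    def reclist(self):
        """Return (public_records, all_records), mirroring calculate_reclist()."""
        if self.records is not None:           # branch B: leaf with dbquery
            return set(self.records), set(self.records)
        pub, all_ = set(), set()               # branch A: union over sons
        for son in self.sons:
            son_pub, son_all = son.reclist()
            if son.public:                     # add "for real" only if public
                pub |= son_pub
            all_ |= son_all                    # always counts toward the total
        return pub, all_
```

A restricted son thus contributes to the total count but not to the public record list, which is exactly why the real code keeps two sets.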
def update_reclist(self):
"Update the record universe for given collection; nbrecs, reclist of the collection table."
if self.update_reclist_run_already:
# do we have to reupdate?
return 0
write_message("... updating reclist of %s (%s recs)" % (self.name, self.nbrecs), verbose=6)
sys.stdout.flush()
try:
run_sql("UPDATE collection SET nbrecs=%s, reclist=%s WHERE id=%s",
(self.nbrecs, self.reclist.fastdump(), self.id))
self.reclist_updated_since_start = 1
except Error, e:
print "Database Query Error %d: %s." % (e.args[0], e.args[1])
sys.exit(1)
# last but not least, update the speed-up flag:
self.update_reclist_run_already = 1
return 0
def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"):
"""Returns a date string according to the format string.
It can handle normal date strings and shifts with respect
to now."""
date = time.time()
shift_re = re.compile("([-\+]{0,1})([\d]+)([dhms])")
factors = {"d":24*3600, "h":3600, "m":60, "s":1}
m = shift_re.match(var)
if m:
sign = m.groups()[0] == "-" and -1 or 1
factor = factors[m.groups()[2]]
value = float(m.groups()[1])
date = time.localtime(date + sign * factor * value)
date = time.strftime(format_string, date)
else:
date = time.strptime(var, format_string)
date = time.strftime(format_string, date)
return date
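The shift syntax handled above ("+1d", "-2h", "30m", …) can be isolated into a small standalone helper. `shift_seconds` is a hypothetical re-implementation for illustration, not part of Invenio; the per-unit factors mirror the `factors` dict in `get_datetime`:

```python
import re

# Signed shift expression: optional sign, digits, then a unit letter.
_SHIFT_RE = re.compile(r"([-+]?)(\d+)([dhms])")
_FACTORS = {"d": 24 * 3600, "h": 3600, "m": 60, "s": 1}

def shift_seconds(var):
    """Return the signed number of seconds encoded by VAR, e.g. '-2h' -> -7200."""
    m = _SHIFT_RE.match(var)
    if not m:
        raise ValueError("not a shift expression: %r" % var)
    sign = -1 if m.group(1) == "-" else 1
    return sign * int(m.group(2)) * _FACTORS[m.group(3)]
```

`get_datetime` then simply adds this offset to `time.time()` and formats the result; a non-matching argument falls through to plain `strptime` parsing.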
def get_current_time_timestamp():
"""Return timestamp corresponding to the current time."""
return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
def compare_timestamps_with_tolerance(timestamp1,
timestamp2,
tolerance=0):
"""Compare two timestamps TIMESTAMP1 and TIMESTAMP2, of the form
'2005-03-31 17:37:26'. Optionally receives a TOLERANCE argument
(in seconds). Return -1 if TIMESTAMP1 is less than TIMESTAMP2
minus TOLERANCE, 0 if they are equal within TOLERANCE limit,
and 1 if TIMESTAMP1 is greater than TIMESTAMP2 plus TOLERANCE.
"""
# remove any trailing .00 in timestamps:
timestamp1 = re.sub(r'\.[0-9]+$', '', timestamp1)
timestamp2 = re.sub(r'\.[0-9]+$', '', timestamp2)
# first convert timestamps to Unix epoch seconds:
timestamp1_seconds = calendar.timegm(time.strptime(timestamp1, "%Y-%m-%d %H:%M:%S"))
timestamp2_seconds = calendar.timegm(time.strptime(timestamp2, "%Y-%m-%d %H:%M:%S"))
# now compare them:
if timestamp1_seconds < timestamp2_seconds - tolerance:
return -1
elif timestamp1_seconds > timestamp2_seconds + tolerance:
return 1
else:
return 0
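The three-way comparison with tolerance can be exercised in isolation. This is a self-contained copy of the logic above (same timestamp format, same trailing-fraction stripping), kept separate so it can be tested without the Invenio imports:

```python
import calendar
import re
import time

def compare_with_tolerance(ts1, ts2, tolerance=0):
    """Return -1/0/1 comparing 'YYYY-MM-DD HH:MM:SS' timestamps within TOLERANCE seconds."""
    def to_secs(ts):
        # strip any trailing fractional seconds, then convert to Unix epoch:
        ts = re.sub(r'\.[0-9]+$', '', ts)
        return calendar.timegm(time.strptime(ts, "%Y-%m-%d %H:%M:%S"))
    s1, s2 = to_secs(ts1), to_secs(ts2)
    if s1 < s2 - tolerance:
        return -1
    elif s1 > s2 + tolerance:
        return 1
    return 0
```

With a tolerance of a few seconds, timestamps that differ by less than the tolerance compare as equal, which is what lets webcoll avoid needless cache rebuilds when the database and cache timestamps are nearly simultaneous.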
def get_database_last_updated_timestamp():
"""Return last updated timestamp for collection-related and
record-related database tables.
"""
database_tables_timestamps = []
database_tables_timestamps.append(get_table_update_time('bibrec'))
database_tables_timestamps.append(get_table_update_time('bibfmt'))
database_tables_timestamps.append(get_table_update_time('idxWORD%'))
database_tables_timestamps.append(get_table_update_time('collection%'))
database_tables_timestamps.append(get_table_update_time('portalbox'))
database_tables_timestamps.append(get_table_update_time('field%'))
database_tables_timestamps.append(get_table_update_time('format%'))
database_tables_timestamps.append(get_table_update_time('rnkMETHODNAME'))
return max(database_tables_timestamps)
def get_cache_last_updated_timestamp():
"""Return last updated cache timestamp."""
try:
f = open(cfg_cache_last_updated_timestamp_file, "r")
except IOError:
return "1970-01-01 00:00:00"
timestamp = f.read()
f.close()
return timestamp
def set_cache_last_updated_timestamp(timestamp):
"""Set last updated cache timestamp to TIMESTAMP."""
try:
f = open(cfg_cache_last_updated_timestamp_file, "w")
f.write(timestamp)
f.close()
except IOError:
pass
return timestamp
def main():
"""Main that construct all the bibtask."""
task_init(authorization_action="runwebcoll",
authorization_msg="WebColl Task Submission",
description="""Description: webcoll updates the collection cache
(record universe for a given collection plus web page elements)
based on invenio.conf and DB configuration parameters.
If the collection name is passed as the second argument, it will
update this collection only. If the collection name is immediately
followed by a plus sign, it will also update all its descendants. The
top-level collection name may be entered as the empty string.\n""",
help_specific_usage=" -c, --collection\t Update cache for the given"
"collection only. [all]\n"
" -f, --force\t Force update even if cache is up to date. [no]\n"
" -p, --part\t Update only certain cache parts (1=reclist,"
" 2=webpage). [both]\n"
" -l, --language\t Update pages in only certain language"
" (e.g. fr). [all]\n",
version=__revision__,
specific_params=("c:fp:l:", [
"collection=",
"force",
"part=",
"language="
]),
task_submit_elaborate_specific_parameter_fnc=task_submit_elaborate_specific_parameter,
task_submit_check_options_fnc=task_submit_check_options,
task_run_fnc=task_run_core)
def task_submit_elaborate_specific_parameter(key, value, opts, args):
""" Given the string key it checks it's meaning, eventually using the value.
Usually it fills some key in the options dict.
It must return True if it has elaborated the key, False, if it doesn't
know that key.
eg:
if key in ['-n', '--number']:
self.options['number'] = value
return True
return False
"""
if key in ("-c", "--collection"):
task_set_option("collection", value)
elif key in ("-f", "--force"):
task_set_option("force", 1)
elif key in ("-p", "--part"):
task_set_option("part", int(value))
elif key in ("-l", "--language"):
task_set_option("language", value)
else:
return False
return True
def task_submit_check_options():
if task_has_option('collection'):
coll = get_collection(task_get_option("collection"))
if coll.id is None:
raise StandardError, 'Collection %s does not exist' % coll.name
return True
def task_run_core():
""" Reimplement to add the body of the task."""
task_run_start_timestamp = get_current_time_timestamp()
colls = []
# decide whether we need to run or not, by comparing last updated timestamps:
write_message("Database timestamp is %s." % get_database_last_updated_timestamp(), verbose=3)
write_message("Collection cache timestamp is %s." % get_cache_last_updated_timestamp(), verbose=3)
if task_has_option("part"):
write_message("Running cache update part %s only." % task_get_option("part"), verbose=3)
if task_has_option("force") or \
compare_timestamps_with_tolerance(get_database_last_updated_timestamp(),
get_cache_last_updated_timestamp(),
cfg_cache_last_updated_timestamp_tolerance) >= 0:
## either forced update was requested or cache is not up to date, so recreate it:
# firstly, decide which collections to do:
if task_has_option("collection"):
coll = get_collection(task_get_option("collection"))
colls.append(coll)
else:
res = run_sql("SELECT name FROM collection ORDER BY id")
for row in res:
colls.append(get_collection(row[0]))
# secondly, update collection reclist cache:
if task_get_option('part', 1) == 1:
i = 0
for coll in colls:
i += 1
write_message("%s / reclist cache update" % coll.name)
coll.calculate_reclist()
coll.update_reclist()
task_update_progress("Part 1/2: done %d/%d" % (i, len(colls)))
# thirdly, update collection webpage cache:
if task_get_option("part", 2) == 2:
i = 0
for coll in colls:
i += 1
write_message("%s / webpage cache update" % coll.name)
coll.update_webpage_cache()
task_update_progress("Part 2/2: done %d/%d" % (i, len(colls)))
# finally update the cache last updated timestamp:
# (but only when all collections were updated, not when only
# some of them were forced-updated as per admin's demand)
if not task_has_option("collection"):
set_cache_last_updated_timestamp(task_run_start_timestamp)
write_message("Collection cache timestamp is set to %s." % get_cache_last_updated_timestamp(), verbose=3)
else:
## cache up to date, we don't have to run
write_message("Collection cache is up to date, no need to run.")
## we are done:
return True
### okay, here we go:
if __name__ == '__main__':
main()
diff --git a/modules/websearch/lib/websearch_webinterface.py b/modules/websearch/lib/websearch_webinterface.py
index 2fa0cf87e..f5eefd6bd 100644
--- a/modules/websearch/lib/websearch_webinterface.py
+++ b/modules/websearch/lib/websearch_webinterface.py
@@ -1,826 +1,826 @@
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""WebSearch URL handler."""
__revision__ = "$Id$"
import cgi
import os
import datetime
from urllib import quote
from mod_python import apache
try:
Set = set
except NameError:
from sets import Set
from invenio.config import \
weburl, \
cdsname, \
- cachedir, \
+ CFG_CACHEDIR, \
cdslang, \
adminemail, \
sweburl, \
CFG_WEBSEARCH_INSTANT_BROWSE_RSS, \
CFG_WEBSEARCH_RSS_TTL, \
CFG_WEBSEARCH_RSS_MAX_CACHED_REQUESTS
from invenio.dbquery import Error
from invenio.webinterface_handler import wash_urlargd, WebInterfaceDirectory
from invenio.urlutils import redirect_to_url, make_canonical_urlargd, drop_default_urlargd
from invenio.webuser import getUid, page_not_authorized, get_user_preferences, \
collect_user_info, http_check_credentials
from invenio import search_engine
from invenio.websubmit_webinterface import WebInterfaceFilesPages
from invenio.webcomment_webinterface import WebInterfaceCommentsPages
from invenio.webpage import page, create_error_box
from invenio.messages import gettext_set_language
from invenio.search_engine import get_colID, get_coll_i18nname, \
check_user_can_view_record, collection_restricted_p, restricted_collection_cache
from invenio.access_control_engine import acc_authorize_action
from invenio.access_control_config import VIEWRESTRCOLL
from invenio.access_control_mailcookie import mail_cookie_create_authorize_action
from invenio.bibformat import format_records
from invenio.websearch_webcoll import mymkdir, get_collection
from invenio.intbitset import intbitset
from invenio.bibupload import find_record_from_sysno
import invenio.template
websearch_templates = invenio.template.load('websearch')
search_results_default_urlargd = websearch_templates.search_results_default_urlargd
search_interface_default_urlargd = websearch_templates.search_interface_default_urlargd
output_formats = ['xm', 'xd', 'hm', 'hx', 'hd', 'hb', 'xe', 'xn']
def wash_search_urlargd(form):
"""
Create canonical search arguments from those passed via web form.
"""
argd = wash_urlargd(form, search_results_default_urlargd)
# Sometimes, users pass ot=245,700 instead of
# ot=245&ot=700. Normalize that.
ots = []
for ot in argd['ot']:
ots += ot.split(',')
argd['ot'] = ots
# We can either get the mode of function as
# action=, or by setting action_browse or
# action_search.
if argd['action_browse']:
argd['action'] = 'browse'
elif argd['action_search']:
argd['action'] = 'search'
else:
if argd['action'] not in ('browse', 'search'):
argd['action'] = 'search'
del argd['action_browse']
del argd['action_search']
return argd
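The `ot` normalization above (accepting both `ot=245&ot=700` and `ot=245,700`) reduces to a small helper. A minimal sketch, extracted for illustration only:

```python
def normalize_ot(values):
    """Split comma-joined 'ot' values so ot=245,700 equals ot=245&ot=700."""
    ots = []
    for ot in values:
        ots += ot.split(',')
    return ots
```

Both URL styles therefore yield the same list of output tags before the search engine sees them.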
class WebInterfaceAuthorPage(WebInterfaceDirectory):
""" Handle /author/Doe%2C+John etc """
_exports = ['author']
def __init__(self, authorname=''):
"""Constructor."""
self.authorname = authorname
def _lookup(self, component, path):
"""This handler parses dynamic URLs (/author/John+Doe)."""
return WebInterfaceAuthorPage(component), path
def __call__(self, req, form):
"""Serve the page in the given language."""
argd = wash_urlargd(form, {'ln': (str, cdslang)})
req.argd = argd #needed since perform_req_search
#wants to check it in case of no results
self.authorname = self.authorname.replace("+"," ")
search_engine.perform_request_search(req=req, p=self.authorname, f="author", of="hb")
index = __call__
class WebInterfaceRecordPages(WebInterfaceDirectory):
""" Handling of a /record/ URL fragment """
_exports = ['', 'files', 'reviews', 'comments', 'usage',
'references', 'export', 'citations']
#_exports.extend(output_formats)
def __init__(self, recid, tab, format=None):
self.recid = recid
self.tab = tab
self.format = format
self.export = self
self.files = WebInterfaceFilesPages(self.recid)
self.reviews = WebInterfaceCommentsPages(self.recid, reviews=1)
self.comments = WebInterfaceCommentsPages(self.recid)
self.usage = self
self.references = self
self.citations = self
self.export = WebInterfaceRecordExport(self.recid, self.format)
return
def __call__(self, req, form):
argd = wash_search_urlargd(form)
argd['recid'] = self.recid
argd['tab'] = self.tab
if self.format is not None:
argd['of'] = self.format
req.argd = argd
uid = getUid(req)
if uid == -1:
return page_not_authorized(req, "../",
text="You are not authorized to view this record.",
navmenuid='search')
elif uid > 0:
pref = get_user_preferences(uid)
try:
argd['rg'] = int(pref['websearch_group_records'])
except (KeyError, ValueError):
pass
user_info = collect_user_info(req)
(auth_code, auth_msg) = check_user_can_view_record(user_info, self.recid)
if auth_code and user_info['email'] == 'guest' and not user_info['apache_user']:
cookie = mail_cookie_create_authorize_action(VIEWRESTRCOLL, {'collection' : search_engine.guess_primary_collection_of_a_record(self.recid)})
target = '/youraccount/login' + \
make_canonical_urlargd({'action': cookie, 'ln' : argd['ln'], 'referer' : \
weburl + '/record/' + str(self.recid) + make_canonical_urlargd(argd, \
search_results_default_urlargd)}, {'ln' : cdslang})
return redirect_to_url(req, target)
elif auth_code:
return page_not_authorized(req, "../", \
text = auth_msg,\
navmenuid='search')
# mod_python does not like to return [] in case when of=id:
out = search_engine.perform_request_search(req, **argd)
if out == []:
return str(out)
else:
return out
# Return the same page whether we ask for /record/123 or /record/123/
index = __call__
class WebInterfaceRecordRestrictedPages(WebInterfaceDirectory):
""" Handling of a /record-restricted/ URL fragment """
_exports = ['', 'files', 'reviews', 'comments', 'usage',
'references', 'export', 'citations']
#_exports.extend(output_formats)
def __init__(self, recid, tab, format=None):
self.recid = recid
self.tab = tab
self.format = format
self.files = WebInterfaceFilesPages(self.recid)
self.reviews = WebInterfaceCommentsPages(self.recid, reviews=1)
self.comments = WebInterfaceCommentsPages(self.recid)
self.usage = self
self.references = self
self.citations = self
self.export = WebInterfaceRecordExport(self.recid, self.format)
return
def __call__(self, req, form):
argd = wash_search_urlargd(form)
argd['recid'] = self.recid
if self.format is not None:
argd['of'] = self.format
req.argd = argd
uid = getUid(req)
user_info = collect_user_info(req)
if uid == -1:
return page_not_authorized(req, "../",
text="You are not authorized to view this record.",
navmenuid='search')
elif uid > 0:
pref = get_user_preferences(uid)
try:
argd['rg'] = int(pref['websearch_group_records'])
except (KeyError, ValueError):
pass
record_primary_collection = search_engine.guess_primary_collection_of_a_record(self.recid)
if collection_restricted_p(record_primary_collection):
(auth_code, dummy) = acc_authorize_action(user_info, VIEWRESTRCOLL, collection=record_primary_collection)
if auth_code:
return page_not_authorized(req, "../",
text="You are not authorized to view this record.",
navmenuid='search')
# Keep all the arguments, they might be reused in the
# record page itself to derive other queries
req.argd = argd
# mod_python does not like to return [] in case when of=id:
out = search_engine.perform_request_search(req, **argd)
if out == []:
return str(out)
else:
return out
# Return the same page whether we ask for /record/123 or /record/123/
index = __call__
class WebInterfaceSearchResultsPages(WebInterfaceDirectory):
""" Handling of the /search URL and its sub-pages. """
_exports = ['', 'authenticate', 'cache', 'log']
def __call__(self, req, form):
""" Perform a search. """
argd = wash_search_urlargd(form)
_ = gettext_set_language(argd['ln'])
if req.method == 'POST':
raise apache.SERVER_RETURN, apache.HTTP_METHOD_NOT_ALLOWED
uid = getUid(req)
user_info = collect_user_info(req)
if uid == -1:
return page_not_authorized(req, "../",
text = _("You are not authorized to view this area."),
navmenuid='search')
elif uid > 0:
pref = get_user_preferences(uid)
try:
argd['rg'] = int(pref['websearch_group_records'])
except (KeyError, ValueError):
pass
involved_collections = Set()
involved_collections.update(argd['c'])
involved_collections.add(argd['cc'])
if argd['id'] > 0:
argd['recid'] = argd['id']
if argd['idb'] > 0:
argd['recidb'] = argd['idb']
if argd['sysno']:
tmp_recid = find_record_from_sysno(argd['sysno'])
if tmp_recid:
argd['recid'] = tmp_recid
if argd['sysnb']:
tmp_recid = find_record_from_sysno(argd['sysnb'])
if tmp_recid:
argd['recidb'] = tmp_recid
if argd['recid'] > 0:
if argd['recidb'] > argd['recid']:
# Hack to check if among the restricted collections
# at least a record of the range is there and
# then if the user is not authorized for that
# collection.
recids = intbitset(xrange(argd['recid'], argd['recidb']))
restricted_colls = restricted_collection_cache.get_cache()
for collname in restricted_colls:
(auth_code, auth_msg) = acc_authorize_action(user_info, VIEWRESTRCOLL, collection=collname)
if auth_code:
coll_recids = get_collection(collname).reclist
if coll_recids & recids:
if auth_code and user_info['email'] == 'guest' and not user_info['apache_user']:
cookie = mail_cookie_create_authorize_action(VIEWRESTRCOLL, {'collection' : collname})
target = '/youraccount/login' + \
make_canonical_urlargd({'action' : cookie, 'ln' : argd['ln'], 'referer' : \
weburl + '/search' + make_canonical_urlargd(argd, \
search_results_default_urlargd)}, {'ln' : cdslang})
return redirect_to_url(req, target)
else:
return page_not_authorized(req, "../", \
text = auth_msg,\
navmenuid='search')
else:
involved_collections.add(search_engine.guess_primary_collection_of_a_record(argd['recid']))
# If any of the collection requires authentication, redirect
# to the authentication form.
for coll in involved_collections:
if collection_restricted_p(coll):
(auth_code, auth_msg) = acc_authorize_action(user_info, VIEWRESTRCOLL, collection=coll)
if auth_code and user_info['email'] == 'guest' and not user_info['apache_user']:
cookie = mail_cookie_create_authorize_action(VIEWRESTRCOLL, {'collection' : coll})
target = '/youraccount/login' + \
make_canonical_urlargd({'action' : cookie, 'ln' : argd['ln'], 'referer' : \
weburl + '/search' + make_canonical_urlargd(argd, \
search_results_default_urlargd)}, {'ln' : cdslang})
return redirect_to_url(req, target)
elif auth_code:
return page_not_authorized(req, "../", \
text = auth_msg,\
navmenuid='search')
# Keep all the arguments, they might be reused in the
# search_engine itself to derive other queries
req.argd = argd
# mod_python does not like to return [] in case when of=id:
out = search_engine.perform_request_search(req, **argd)
if out == []:
return str(out)
else:
return out
def cache(self, req, form):
"""Search cache page."""
argd = wash_urlargd(form, {'action': (str, 'show')})
return search_engine.perform_request_cache(req, action=argd['action'])
def log(self, req, form):
"""Search log page."""
argd = wash_urlargd(form, {'date': (str, '')})
return search_engine.perform_request_log(req, date=argd['date'])
def authenticate(self, req, form):
"""Restricted search results pages."""
argd = wash_search_urlargd(form)
user_info = collect_user_info(req)
for coll in argd['c'] + [argd['cc']]:
if collection_restricted_p(coll):
(auth_code, dummy) = acc_authorize_action(user_info, VIEWRESTRCOLL, collection=coll)
if auth_code:
return page_not_authorized(req, "../",
text="You are not authorized to view this collection.",
navmenuid='search')
# Keep all the arguments, they might be reused in the
# search_engine itself to derive other queries
req.argd = argd
uid = getUid(req)
if uid > 0:
pref = get_user_preferences(uid)
try:
argd['rg'] = int(pref['websearch_group_records'])
except (KeyError, ValueError):
pass
# mod_python does not like to return [] in case when of=id:
out = search_engine.perform_request_search(req, **argd)
if out == []:
return str(out)
else:
return out
# Parameters for the legacy URLs, of the form /?c=ALEPH
legacy_collection_default_urlargd = {
'as': (int, 0),
'verbose': (int, 0),
'c': (str, cdsname)}
class WebInterfaceSearchInterfacePages(WebInterfaceDirectory):
""" Handling of collection navigation."""
_exports = [('index.py', 'legacy_collection'),
('', 'legacy_collection'),
('search.py', 'legacy_search'),
'search', 'openurl', 'testsso']
search = WebInterfaceSearchResultsPages()
def testsso(self, req, form):
""" For testing single sign-on """
req.add_common_vars()
sso_env = {}
for var, value in req.subprocess_env.iteritems():
if var.startswith('HTTP_ADFS_'):
sso_env[var] = value
out = "SSO test"
out += "
"
for var, value in sso_env.iteritems():
out += "
%s
%s
" % (var, value)
out += "
"
return out
def _lookup(self, component, path):
""" This handler is invoked for the dynamic URLs (for
collections and records)"""
if component == 'collection':
c = '/'.join(path)
def answer(req, form):
"""Accessing collections cached pages."""
# Accessing collections: this is for accessing the
# cached page on top of each collection.
argd = wash_urlargd(form, search_interface_default_urlargd)
# We simply return the cached page of the collection
argd['c'] = c
if not argd['c']:
# collection argument not present; display
# home collection by default
argd['c'] = cdsname
return display_collection(req, **argd)
return answer, []
elif component == 'record' or component == 'record-restricted':
try:
recid = int(path[0])
except IndexError:
# display record #1 for URL /record without a number
recid = 1
except ValueError:
if path[0] == '':
# display record #1 for URL /record/ without a number
recid = 1
else:
# display page not found for URLs like /record/foo
return None, []
if recid <= 0:
# display page not found for URLs like /record/-5 or /record/0
return None, []
format = None
tab = ''
try:
if path[1] in ['', 'files', 'reviews', 'comments',
'usage', 'references', 'citations']:
tab = path[1]
elif path[1] == 'export':
tab = ''
format = path[2]
# format = None
# elif path[1] in output_formats:
# tab = ''
# format = path[1]
else:
# display page not found for URLs like /record/references
# for a collection where 'references' tabs is not visible
return None, []
except IndexError:
# Keep normal url if tabs is not specified
pass
#if component == 'record-restricted':
#return WebInterfaceRecordRestrictedPages(recid, tab, format), path[1:]
#else:
return WebInterfaceRecordPages(recid, tab, format), path[1:]
return None, []
def openurl(self, req, form):
""" OpenURL Handler."""
argd = wash_urlargd(form, websearch_templates.tmpl_openurl_accepted_args)
ret_url = websearch_templates.tmpl_openurl2invenio(argd)
if ret_url:
return redirect_to_url(req, ret_url)
else:
return redirect_to_url(req, weburl)
def legacy_collection(self, req, form):
"""Collection URL backward compatibility handling."""
accepted_args = dict(legacy_collection_default_urlargd)
accepted_args.update({'referer' : (str, '%s/youraccount/your'),
'realm' : (str, '')})
argd = wash_urlargd(form, accepted_args)
# Apache authentication stuff
if argd['realm']:
http_check_credentials(req, argd['realm'])
return redirect_to_url(req, argd['referer'] or '%s/youraccount/youradminactivities' % sweburl)
del argd['referer']
del argd['realm']
# If we specify no collection, then we don't need to redirect
# the user, so that accessing the base URL returns the
# default collection.
if not form.has_key('c'):
return display_collection(req, **argd)
# make the collection an element of the path, and keep the
# other query elements as is. If the collection is cdsname,
# however, redirect to the main URL.
c = argd['c']
del argd['c']
if c == cdsname:
target = '/'
else:
target = '/collection/' + quote(c)
target += make_canonical_urlargd(argd, legacy_collection_default_urlargd)
return redirect_to_url(req, target)
def legacy_search(self, req, form):
"""Search URL backward compatibility handling."""
argd = wash_search_urlargd(form)
# We either jump into the generic search form, or the specific
# /record/... display if a recid is requested
if argd['recid'] != -1:
target = '/record/%d' % argd['recid']
del argd['recid']
else:
target = '/search'
target += make_canonical_urlargd(argd, search_results_default_urlargd)
return redirect_to_url(req, target)
def display_collection(req, c, as, verbose, ln):
"""Display search interface page for collection c by looking
in the collection cache."""
_ = gettext_set_language(ln)
req.argd = drop_default_urlargd({'as': as, 'verbose': verbose, 'ln': ln},
search_interface_default_urlargd)
# get user ID:
try:
uid = getUid(req)
user_preferences = {}
if uid == -1:
return page_not_authorized(req, "../",
text="You are not authorized to view this collection",
navmenuid='search')
elif uid > 0:
user_preferences = get_user_preferences(uid)
except Error:
return page(title=_("Internal Error"),
body = create_error_box(req, verbose=verbose, ln=ln),
description="%s - Internal Error" % cdsname,
keywords="%s, Internal Error" % cdsname,
language=ln,
req=req,
navmenuid='search')
# start display:
req.content_type = "text/html"
req.send_http_header()
# deduce collection id:
colID = get_colID(c)
if type(colID) is not int:
page_body = '<p>' + (_("Sorry, collection %s does not seem to exist.") % ('<strong>' + str(c) + '</strong>')) + '</p>'
"""
if not rnkmethods:
output += """No rank methods"""
else:
for id, name in rnkmethods:
output += """%s, """ % name
output += """
"""
rnk_list = get_def_name('', "rnkMETHOD")
rnk_dict_in_col = dict(get_col_rnk(colID, ln))
rnk_list = filter(lambda x: not rnk_dict_in_col.has_key(x[0]), rnk_list)
if rnk_list:
text = """
Enable:
"""
output += createhiddenform(action="modifyrankmethods#9",
text=text,
button="Enable",
colID=colID,
ln=ln,
func=0,
confirm=1)
if confirm in ["1", 1] and func in ["0", 0] and int(rnkID) != -1:
output += write_outcome(finresult)
elif confirm not in ["0", 0] and func in ["0", 0]:
output += """Please select a rank method."""
coll_list = get_col_rnk(colID, ln)
if coll_list:
text = """
Disable:
"""
output += createhiddenform(action="modifyrankmethods#9",
text=text,
button="Disable",
colID=colID,
ln=ln,
func=1,
confirm=1)
if confirm in ["1", 1] and func in ["1", 1] and int(rnkID) != -1:
output += write_outcome(finresult)
elif confirm not in ["0", 0] and func in ["1", 1]:
output += """Please select a rank method."""
body = [output]
if callback:
return perform_editcollection(colID, ln, "perform_modifyrankmethods", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_addcollectiontotree(colID, ln, add_dad='', add_son='', rtype='', mtype='', callback='yes', confirm=-1):
"""Form to add a collection to the tree.
add_dad - the dad to add the collection to
add_son - the collection to add
rtype - add it as a regular or virtual
mtype - add it to the regular or virtual tree."""
output = ""
output2 = ""
subtitle = """Attach collection to tree   [?]""" % (weburl)
col_dict = dict(get_def_name('', "collection"))
if confirm not in [-1, "-1"] and not (add_son and add_dad and rtype):
output2 += """All fields must be filled.
"""
elif add_son and add_dad and rtype:
add_son = int(add_son)
add_dad = int(add_dad)
if confirm not in [-1, "-1"]:
if add_son == add_dad:
output2 += """Cannot add a collection as a pointer to itself.
"""
elif check_col(add_dad, add_son):
res = add_col_dad_son(add_dad, add_son, rtype)
output2 += write_outcome(res)
if res[0] == 1:
output2 += """ The collection will appear on your website after the next webcoll run. You can either run it manually or wait until bibsched does it for you.
"""
else:
output2 += """Cannot add the collection '%s' as a %s subcollection of '%s' since it will either create a loop, or the association already exists.
""" % (col_dict[add_son], (rtype=="r" and 'regular' or 'virtual'), col_dict[add_dad])
add_son = ''
add_dad = ''
rtype = ''
tree = get_col_tree(colID)
col_list = col_dict.items()
col_list.sort(compare_on_val)
output = show_coll_not_in_tree(colID, ln, col_dict)
text = """
Attach collection: to parent collection:
"""
text += """
with relationship:
""" % ((rtype=="r" and 'selected="selected"' or ''), (rtype=="v" and 'selected="selected"' or ''))
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/addcollectiontotree" % weburl,
text=text,
button="Add",
colID=colID,
ln=ln,
confirm=1)
output += output2
#output += perform_showtree(colID, ln)
body = [output]
if callback:
return perform_index(colID, ln, mtype="perform_addcollectiontotree", content=addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_addcollection(colID, ln, colNAME='', dbquery='', callback="yes", confirm=-1):
"""form to add a new collection.
colNAME - the name of the new collection
dbquery - the dbquery of the new collection"""
output = ""
subtitle = """Create new collection [?]""" % (weburl)
text = """
Default name
""" % colNAME
output = createhiddenform(action="%s/admin/websearch/websearchadmin.py/addcollection" % weburl,
text=text,
colID=colID,
ln=ln,
button="Add collection",
confirm=1)
if colNAME and confirm in ["1", 1]:
res = add_col(colNAME, '')
output += write_outcome(res)
if res[0] == 1:
output += perform_addcollectiontotree(colID=colID, ln=ln, add_son=res[1], callback='')
elif confirm not in ["-1", -1]:
output += """Please give the collection a name."""
body = [output]
if callback:
return perform_index(colID, ln=ln, mtype="perform_addcollection", content=addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifydbquery(colID, ln, dbquery='', callback='yes', confirm=-1):
"""form to modify the dbquery of the collection.
dbquery - the dbquery of the collection."""
subtitle = ''
output = ""
col_dict = dict(get_def_name('', "collection"))
if colID and col_dict.has_key(int(colID)):
colID = int(colID)
subtitle = """1. Modify collection query for collection '%s'   [?]""" % (col_dict[colID], weburl)
if confirm == -1:
res = run_sql("SELECT dbquery FROM collection WHERE id=%s" % colID)
dbquery = res[0][0]
if not dbquery:
dbquery = ''
reg_sons = len(get_col_tree(colID, 'r'))
vir_sons = len(get_col_tree(colID, 'v'))
if reg_sons > 1:
if dbquery:
output += "Warning: This collection has subcollections and should therefore not have a collection query; for further explanation, check the WebSearch Guide "
elif reg_sons <= 1:
if not dbquery:
output += "Warning: This collection does not have any subcollections and should therefore have a collection query; for further explanation, check the WebSearch Guide "
text = """
Query
""" % cgi.escape(dbquery, 1)
output += createhiddenform(action="modifydbquery",
text=text,
button="Modify",
colID=colID,
ln=ln,
confirm=1)
if confirm in ["1", 1]:
res = modify_dbquery(colID, dbquery)
if res:
if dbquery == "":
text = """Query removed for this collection."""
else:
text = """Query set for this collection."""
else:
text = """Sorry, could not change query."""
output += text
body = [output]
if callback:
return perform_editcollection(colID, ln, "perform_modifydbquery", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifycollectiontree(colID, ln, move_up='', move_down='', move_from='', move_to='', delete='', rtype='', callback='yes', confirm=0):
"""to modify the collection tree: move a collection up and down, delete a collection, or change the father of the collection.
colID - the main collection of the tree, the root
move_up - move this collection up (is not the collection id, but the place in the tree)
move_down - move this collection down (is not the collection id, but the place in the tree)
move_from - move this collection from the current position (is not the collection id, but the place in the tree)
move_to - move the move_from collection and set this as its father. (is not the collection id, but the place in the tree)
delete - delete this collection from the tree (is not the collection id, but the place in the tree)
rtype - the type of the collection in the tree, regular or virtual"""
colID = int(colID)
tree = get_col_tree(colID, rtype)
col_dict = dict(get_def_name('', "collection"))
subtitle = """Modify collection tree: %s [?] Printer friendly version""" % (col_dict[colID], weburl, weburl, colID, ln)
fin_output = ""
output = ""
try:
if move_up:
move_up = int(move_up)
switch = find_last(tree, move_up)
if switch and switch_col_treescore(tree[move_up], tree[switch]):
output += """Moved the %s collection '%s' up and '%s' down.
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_up][0]], col_dict[tree[switch][0]])
else:
output += """Could not move the %s collection '%s' up.
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_up][0]])
elif move_down:
move_down = int(move_down)
switch = find_next(tree, move_down)
if switch and switch_col_treescore(tree[move_down], tree[switch]):
output += """Moved the %s collection '%s' down and '%s' up.
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_down][0]], col_dict[tree[switch][0]])
else:
output += """Could not move the %s collection '%s' down.
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_down][0]])
elif delete:
delete = int(delete)
if confirm in [0, "0"]:
if col_dict[tree[delete][0]] != col_dict[tree[delete][3]]:
text = """Do you want to remove the %s collection '%s' and its subcollections in the %s collection '%s'?
""" % ((tree[delete][4]=="r" and 'regular' or 'virtual'), col_dict[tree[delete][0]], (rtype=="r" and 'regular' or 'virtual'), col_dict[tree[delete][3]])
else:
text = """Do you want to remove all subcollections of the %s collection '%s'?
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[delete][3]])
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifycollectiontree#tree" % weburl,
text=text,
button="Confirm",
colID=colID,
delete=delete,
rtype=rtype,
ln=ln,
confirm=1)
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/index?mtype=perform_modifycollectiontree#tree" % weburl,
text="To cancel",
button="Cancel",
colID=colID,
ln=ln)
else:
if remove_col_subcol(tree[delete][0], tree[delete][3], rtype):
if col_dict[tree[delete][0]] != col_dict[tree[delete][3]]:
output += """Removed the %s collection '%s' and its subcollections from the collection '%s'.
""" % ((tree[delete][4]=="r" and 'regular' or 'virtual'), col_dict[tree[delete][0]], col_dict[tree[delete][3]])
else:
output += """Removed the subcollections of the %s collection '%s'.
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[delete][3]])
else:
output += """Could not remove the collection from the tree.
"""
delete = ''
elif move_from and not move_to:
move_from_rtype = move_from[0]
move_from_id = int(move_from[1:])
text = """Select collection to place the %s collection '%s' under.
""" % ((move_from_rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_from_id][0]])
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/index?mtype=perform_modifycollectiontree#tree" % weburl,
text=text,
button="Cancel",
colID=colID,
ln=ln)
elif move_from and move_to:
move_from_rtype = move_from[0]
move_from_id = int(move_from[1:])
move_to_rtype = move_to[0]
move_to_id = int(move_to[1:])
tree_from = get_col_tree(colID, move_from_rtype)
tree_to = get_col_tree(colID, move_to_rtype)
if confirm in [0, '0']:
if move_from_id == move_to_id and move_from_rtype == move_to_rtype:
output += """Cannot move to itself.
"""
elif tree_from[move_from_id][3] == tree_to[move_to_id][0] and move_from_rtype==move_to_rtype:
output += """The collection is already there.
"""
elif check_col(tree_to[move_to_id][0], tree_from[move_from_id][0]) or (tree_to[move_to_id][0] == 1 and tree_from[move_from_id][3] == tree_to[move_to_id][0] and move_from_rtype != move_to_rtype):
text = """Move %s collection '%s' to the %s collection '%s'.
""" % ((tree_from[move_from_id][4]=="r" and 'regular' or 'virtual'), col_dict[tree_from[move_from_id][0]], (tree_to[move_to_id][4]=="r" and 'regular' or 'virtual'), col_dict[tree_to[move_to_id][0]])
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifycollectiontree#tree" % weburl,
text=text,
button="Confirm",
colID=colID,
move_from=move_from,
move_to=move_to,
ln=ln,
rtype=rtype,
confirm=1)
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/index?mtype=perform_modifycollectiontree#tree" % weburl,
text="""To cancel""",
button="Cancel",
colID=colID,
ln=ln)
else:
output += """Cannot move the collection '%s' and set it as a subcollection of '%s' since it will create a loop.
""" % (col_dict[tree_from[move_from_id][0]], col_dict[tree_to[move_to_id][0]])
else:
if (move_to_id != 0 and move_col_tree(tree_from[move_from_id], tree_to[move_to_id])) or (move_to_id == 0 and move_col_tree(tree_from[move_from_id], tree_to[move_to_id], move_to_rtype)):
output += """Moved %s collection '%s' to the %s collection '%s'.
""" % ((move_from_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_from[move_from_id][0]], (move_to_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_to[move_to_id][0]])
else:
output += """Could not move %s collection '%s' to the %s collection '%s'.
""" % ((move_from_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_from[move_from_id][0]], (move_to_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_to[move_to_id][0]])
move_from = ''
move_to = ''
else:
output += """
"""
except StandardError, e:
return """An error occurred.
"""
output += """
"""
body = [output]
return addadminbox(subtitle, body)
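# The expression (rtype == "r" and 'regular' or 'virtual') recurs throughout
# the tree-manipulation code above. A small sketch of a helper that names it;
# 'rtype_label' is a hypothetical name, not part of the original module:
def rtype_label(rtype):
    """Return 'regular' for collection type 'r', 'virtual' otherwise."""
    return rtype == "r" and 'regular' or 'virtual'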
def perform_addportalbox(colID, ln, title='', body='', callback='yes', confirm=-1):
"""form to add a new portalbox
title - the title of the portalbox
body - the body of the portalbox"""
col_dict = dict(get_def_name('', "collection"))
colID = int(colID)
subtitle = """Create new portalbox"""
text = """
Title Body
""" % (cgi.escape(title), cgi.escape(body))
output = createhiddenform(action="addportalbox#5.1",
text=text,
button="Add",
colID=colID,
ln=ln,
confirm=1)
if body and confirm in [1, "1"]:
res = add_pbx(title, body)
output += write_outcome(res)
if res[1] == 1:
output += """Add portalbox to collection""" % (colID, ln, res[1])
elif confirm not in [-1, "-1"]:
output += """Body field must be filled.
"""
body = [output]
return perform_showportalboxes(colID, ln, content=addadminbox(subtitle, body))
def perform_addexistingportalbox(colID, ln, pbxID=-1, score=0, position='', sel_ln='', callback='yes', confirm=-1):
"""form to add an existing portalbox to a collection.
colID - the collection to add the portalbox to
pbxID - the portalbox to add
score - the importance of the portalbox.
position - the position of the portalbox on the page
sel_ln - the language of the portalbox"""
subtitle = """Add existing portalbox to collection"""
output = ""
colID = int(colID)
res = get_pbx()
pos = get_pbx_pos()
lang = dict(get_languages())
col_dict = dict(get_def_name('', "collection"))
pbx_dict = dict(map(lambda x: (x[0], x[1]), res))
col_pbx = get_col_pbx(colID)
col_pbx = dict(map(lambda x: (x[0], x[5]), col_pbx))
if len(res) > 0:
text = """
Portalbox Language Position
"
output += createhiddenform(action="addexistingportalbox#5.2",
text=text,
button="Add",
colID=colID,
ln=ln,
confirm=1)
else:
output = """No existing portalboxes to add; please create a new one.
"""
if pbxID > -1 and position and sel_ln and confirm in [1, "1"]:
pbxID = int(pbxID)
res = add_col_pbx(colID, pbxID, sel_ln, position, '')
output += write_outcome(res)
elif pbxID > -1 and confirm not in [-1, "-1"]:
output += """All fields must be filled.
"""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showportalboxes(colID, ln, content=output)
def perform_deleteportalbox(colID, ln, pbxID=-1, callback='yes', confirm=-1):
"""form to delete a portalbox which is not in use.
colID - the current collection.
pbxID - the id of the portalbox"""
subtitle = """Delete an unused portalbox"""
output = ""
colID = int(colID)
if pbxID not in [-1, "-1"] and confirm in [1, "1"]:
ares = get_pbx()
pbx_dict = dict(map(lambda x: (x[0], x[1]), ares))
if pbx_dict.has_key(int(pbxID)):
pname = pbx_dict[int(pbxID)]
ares = delete_pbx(int(pbxID))
else:
return """This portalbox does not exist"""
res = get_pbx()
col_dict = dict(get_def_name('', "collection"))
pbx_dict = dict(map(lambda x: (x[0], x[1]), res))
col_pbx = get_col_pbx()
col_pbx = dict(map(lambda x: (x[0], x[5]), col_pbx))
if len(res) > 0:
text = """
Portalbox """
output += createhiddenform(action="deleteportalbox#5.3",
text=text,
button="Delete",
colID=colID,
ln=ln,
confirm=1)
if pbxID not in [-1, "-1"]:
pbxID = int(pbxID)
if confirm in [1, "1"]:
output += write_outcome(ares)
elif confirm not in [-1, "-1"]:
output += """Choose a portalbox to delete.
"""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showportalboxes(colID, ln, content=output)
def perform_modifyportalbox(colID, ln, pbxID=-1, score='', position='', sel_ln='', title='', body='', callback='yes', confirm=-1):
"""form to modify a portalbox in a collection, or change the portalbox itself.
colID - the id of the collection.
pbxID - the portalbox to change
score - the score of the portalbox connected to colID which should be changed.
position - the position of the portalbox in collection colID to change."""
subtitle = ""
output = ""
colID = int(colID)
res = get_pbx()
pos = get_pbx_pos()
lang = dict(get_languages())
col_dict = dict(get_def_name('', "collection"))
pbx_dict = dict(map(lambda x: (x[0], x[1]), res))
col_pbx = get_col_pbx(colID)
col_pbx = dict(map(lambda x: (x[0], x[5]), col_pbx))
if pbxID not in [-1, "-1"]:
pbxID = int(pbxID)
subtitle = """Modify portalbox '%s' for this collection""" % pbx_dict[pbxID]
col_pbx = get_col_pbx(colID)
if not (score and position) and not (body and title):
for (id_pbx, id_collection, tln, score, position, title, body) in col_pbx:
if id_pbx == pbxID:
break
output += """Collection (presentation) specific values (Changes apply only to this collection.) """
text = """
Position """
output += createhiddenform(action="modifyportalbox#5.4",
text=text,
button="Modify",
colID=colID,
pbxID=pbxID,
score=score,
title=title,
body=cgi.escape(body, 1),
sel_ln=sel_ln,
ln=ln,
confirm=3)
if pbxID > -1 and score and position and confirm in [3, "3"]:
pbxID = int(pbxID)
res = modify_pbx(colID, pbxID, sel_ln, score, position, '', '')
res2 = get_pbx()
pbx_dict = dict(map(lambda x: (x[0], x[1]), res2))
output += write_outcome(res)
output += """ Portalbox (content) specific values (any changes appear everywhere the portalbox is used.)"""
text = """
Title
""" % cgi.escape(title)
text += """
Body
""" % cgi.escape(body)
output += createhiddenform(action="modifyportalbox#5.4",
text=text,
button="Modify",
colID=colID,
pbxID=pbxID,
sel_ln=sel_ln,
score=score,
position=position,
ln=ln,
confirm=4)
if pbxID > -1 and confirm in [4, "4"]:
pbxID = int(pbxID)
res = modify_pbx(colID, pbxID, sel_ln, '', '', title, body)
output += write_outcome(res)
else:
output = """No portalbox to modify."""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showportalboxes(colID, ln, content=output)
def perform_switchpbxscore(colID, id_1, id_2, sel_ln, ln):
"""Switch the score of id_1 and id_2 in collection_portalbox.
colID - the current collection
id_1/id_2 - the ids to switch the scores for.
sel_ln - the language of the portalbox"""
output = ""
res = get_pbx()
pbx_dict = dict(map(lambda x: (x[0], x[1]), res))
res = switch_pbx_score(colID, id_1, id_2, sel_ln)
output += write_outcome(res)
return perform_showportalboxes(colID, ln, content=output)
def perform_showportalboxes(colID, ln, callback='yes', content='', confirm=-1):
"""show the portalboxes of this collection.
colID - the portalboxes to show the collection for."""
colID = int(colID)
col_dict = dict(get_def_name('', "collection"))
subtitle = """5. Modify portalboxes for collection '%s'   [?]""" % (col_dict[colID], weburl)
output = ""
pos = get_pbx_pos()
output = """
Portalbox actions (not related to this collection)
"
i += 1
if i != len(res):
move += """""" % (weburl, colID, ln, pbxID, res[i][0], tln, random.randint(0, 1000), weburl)
move += """
"""
actions.append(["%s" % (i==1 and pos[position] or ''), "%s" % (i==1 and lang[tln] or ''), move, "%s" % title])
for col in [(('Modify', 'modifyportalbox'), ('Remove', 'removeportalbox'),)]:
actions[-1].append('%s' % (weburl, col[0][1], colID, ln, pbxID, tln, col[0][0]))
for (label, function) in col[1:]:
actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, pbxID, tln, label)
output += tupletotable(header=header, tuple=actions)
else:
output += """No portalboxes exist for this collection"""
output += content
body = [output]
if callback:
return perform_editcollection(colID, ln, "perform_showportalboxes", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_removeportalbox(colID, ln, pbxID='', sel_ln='', callback='yes', confirm=0):
"""form to remove a portalbox from a collection.
colID - the current collection, remove the portalbox from this collection.
sel_ln - remove the portalbox with this language
pbxID - remove the portalbox with this id"""
subtitle = """Remove portalbox"""
output = ""
col_dict = dict(get_def_name('', "collection"))
res = get_pbx()
pbx_dict = dict(map(lambda x: (x[0], x[1]), res))
if colID and pbxID and sel_ln:
colID = int(colID)
pbxID = int(pbxID)
if confirm in ["0", 0]:
text = """Do you want to remove the portalbox '%s' from the collection '%s'?""" % (pbx_dict[pbxID], col_dict[colID])
output += createhiddenform(action="removeportalbox#5.5",
text=text,
button="Confirm",
colID=colID,
pbxID=pbxID,
sel_ln=sel_ln,
confirm=1)
elif confirm in ["1", 1]:
res = remove_pbx(colID, pbxID, sel_ln)
output += write_outcome(res)
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showportalboxes(colID, ln, content=output)
def perform_switchfmtscore(colID, type, id_1, id_2, ln):
"""Switch the score of id_1 and id_2 in the table type.
colID - the current collection
id_1/id_2 - the ids to switch the scores for.
type - like "format" """
fmt_dict = dict(get_def_name('', "format"))
res = switch_score(colID, id_1, id_2, type)
output = write_outcome(res)
return perform_showoutputformats(colID, ln, content=output)
def perform_switchfldscore(colID, id_1, id_2, fmeth, ln):
"""Switch the score of id_1 and id_2 in collection_field_fieldvalue.
colID - the current collection
id_1/id_2 - the ids to switch the scores for.
fld_dict = dict(get_def_name('', "field"))
res = switch_fld_score(colID, id_1, id_2)
output = write_outcome(res)
if fmeth == "soo":
return perform_showsortoptions(colID, ln, content=output)
elif fmeth == "sew":
return perform_showsearchfields(colID, ln, content=output)
elif fmeth == "seo":
return perform_showsearchoptions(colID, ln, content=output)
def perform_switchfldvaluescore(colID, id_1, id_fldvalue_1, id_fldvalue_2, ln):
"""Switch the scores of id_fldvalue_1 and id_fldvalue_2 for field id_1 in collection_field_fieldvalue.
colID - the current collection
id_fldvalue_1/id_fldvalue_2 - the ids of the fieldvalues to switch the scores for."""
name_1 = run_sql("SELECT name FROM fieldvalue WHERE id=%s", (id_fldvalue_1, ))[0][0]
name_2 = run_sql("SELECT name FROM fieldvalue WHERE id=%s", (id_fldvalue_2, ))[0][0]
res = switch_fld_value_score(colID, id_1, id_fldvalue_1, id_fldvalue_2)
output = write_outcome(res)
return perform_modifyfield(colID, fldID=id_1, ln=ln, content=output)
def perform_addnewfieldvalue(colID, fldID, ln, name='', value='', callback="yes", confirm=-1):
"""form to add a new fieldvalue.
name - the name of the new fieldvalue
value - the value of the new fieldvalue
"""
output = ""
subtitle = """Add new value"""
text = """
Display name Search value
""" % (name, value)
output = createhiddenform(action="%s/admin/websearch/websearchadmin.py/addnewfieldvalue" % weburl,
text=text,
colID=colID,
fldID=fldID,
ln=ln,
button="Add",
confirm=1)
if name and value and confirm in ["1", 1]:
res = add_fldv(name, value)
output += write_outcome(res)
if res[0] == 1:
res = add_col_fld(colID, fldID, 'seo', res[1])
if res[0] == 0:
output += " " + write_outcome(res)
elif confirm not in ["-1", -1]:
output += """Please fill in name and value.
"""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_modifyfield(colID, fldID=fldID, ln=ln, content=output)
def perform_modifyfieldvalue(colID, fldID, fldvID, ln, name='', value='', callback="yes", confirm=-1):
"""form to modify a fieldvalue.
name - the name of the fieldvalue
value - the value of the fieldvalue
"""
if confirm in [-1, "-1"]:
res = get_fld_value(fldvID)
(id, name, value) = res[0]
output = ""
subtitle = """Modify existing value"""
output = """
Warning: Modifications made below will also affect all other places where this value is used.
"""
text = """
Display name Search value
""" % (name, value)
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifyfieldvalue" % weburl,
text=text,
colID=colID,
fldID=fldID,
fldvID=fldvID,
ln=ln,
button="Update",
confirm=1)
output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifyfieldvalue" % weburl,
text="Delete value and all associations",
colID=colID,
fldID=fldID,
fldvID=fldvID,
ln=ln,
button="Delete",
confirm=2)
if name and value and confirm in ["1", 1]:
res = update_fldv(fldvID, name, value)
output += write_outcome(res)
elif confirm in ["2", 2]:
res = delete_fldv(fldvID)
output += write_outcome(res)
elif confirm not in ["-1", -1]:
output += """Please fill in name and value."""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_modifyfield(colID, fldID=fldID, ln=ln, content=output)
def perform_removefield(colID, ln, fldID='', fldvID='', fmeth='', callback='yes', confirm=0):
"""form to remove a field from a collection.
colID - the current collection, remove the field from this collection.
fldID - remove the field with this id
fldvID - the id of the fieldvalue, if any
fmeth - the type of field: sort option, search field or search option"""
if fmeth == "soo":
field = "sort option"
elif fmeth == "sew":
field = "search field"
elif fmeth == "seo":
field = "search option"
else:
field = "field"
subtitle = """Remove %s""" % field
output = ""
col_dict = dict(get_def_name('', "collection"))
fld_dict = dict(get_def_name('', "field"))
res = get_fld_value()
fldv_dict = dict(map(lambda x: (x[0], x[1]), res))
if colID and fldID:
colID = int(colID)
fldID = int(fldID)
if fldvID and fldvID != "None":
fldvID = int(fldvID)
if confirm in ["0", 0]:
text = """Do you want to remove the %s '%s' %s from the collection '%s'?""" % (field, fld_dict[fldID], (fldvID not in ["", "None"] and "with value '%s'" % fldv_dict[fldvID] or ''), col_dict[colID])
output += createhiddenform(action="removefield#6.5",
text=text,
button="Confirm",
colID=colID,
fldID=fldID,
fldvID=fldvID,
fmeth=fmeth,
confirm=1)
elif confirm in ["1", 1]:
res = remove_fld(colID, fldID, fldvID)
output += write_outcome(res)
body = [output]
output = " " + addadminbox(subtitle, body)
if fmeth == "soo":
return perform_showsortoptions(colID, ln, content=output)
elif fmeth == "sew":
return perform_showsearchfields(colID, ln, content=output)
elif fmeth == "seo":
return perform_showsearchoptions(colID, ln, content=output)
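# perform_removefield above maps the 'fmeth' code to a human-readable field
# type with an if/elif chain, and the same soo/sew/seo codes drive the
# dispatch to perform_showsortoptions/-searchfields/-searchoptions in several
# functions. A table-driven sketch of the label half; '_FMETH_LABELS' and
# 'fmeth_label' are hypothetical names, not part of the original module:
_FMETH_LABELS = {"soo": "sort option", "sew": "search field", "seo": "search option"}

def fmeth_label(fmeth):
    """Return the human-readable name for a field method code, 'field' if unknown."""
    return _FMETH_LABELS.get(fmeth, "field")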
def perform_removefieldvalue(colID, ln, fldID='', fldvID='', fmeth='', callback='yes', confirm=0):
"""form to remove a fieldvalue from a search option in a collection.
colID - the current collection, remove the value from a field in this collection.
fldID - the field the value belongs to
fldvID - remove the fieldvalue with this id"""
subtitle = """Remove value"""
output = ""
col_dict = dict(get_def_name('', "collection"))
fld_dict = dict(get_def_name('', "field"))
res = get_fld_value()
fldv_dict = dict(map(lambda x: (x[0], x[1]), res))
if colID and fldID:
colID = int(colID)
fldID = int(fldID)
if fldvID and fldvID != "None":
fldvID = int(fldvID)
if confirm in ["0", 0]:
text = """Do you want to remove the value '%s' from the search option '%s'?""" % (fldv_dict[fldvID], fld_dict[fldID])
output += createhiddenform(action="removefieldvalue#7.4",
text=text,
button="Confirm",
colID=colID,
fldID=fldID,
fldvID=fldvID,
fmeth=fmeth,
confirm=1)
elif confirm in ["1", 1]:
res = remove_fld(colID, fldID, fldvID)
output += write_outcome(res)
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_modifyfield(colID, fldID=fldID, ln=ln, content=output)
def perform_rearrangefieldvalue(colID, fldID, ln, callback='yes', confirm=-1):
"""rearrange the fieldvalues alphabetically.
colID - the collection
fldID - the field to rearrange the fieldvalues for
"""
subtitle = "Order values alphabetically"
output = ""
col_fldv = get_col_fld(colID, 'seo', fldID)
col_fldv = dict(map(lambda x: (x[1], x[0]), col_fldv))
fldv_names = get_fld_value()
fldv_names = map(lambda x: (x[0], x[1]), fldv_names)
if not col_fldv.has_key(None):
vscore = len(col_fldv)
for (fldvID, name) in fldv_names:
if col_fldv.has_key(fldvID):
run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=%s WHERE id_collection=%s AND id_field=%s AND id_fieldvalue=%s", (vscore, colID, fldID, fldvID))
vscore -= 1
output += write_outcome((1, ""))
else:
output += write_outcome((0, (0, "No values to order")))
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_modifyfield(colID, fldID, ln, content=output)
def perform_rearrangefield(colID, ln, fmeth, callback='yes', confirm=-1):
"""rearrange the fields alphabetically.
colID - the collection
"""
subtitle = "Order fields alphabetically"
output = ""
col_fld = dict(map(lambda x: (x[0], x[1]), get_col_fld(colID, fmeth)))
fld_names = get_def_name('', "field")
if len(col_fld) > 0:
score = len(col_fld)
for (fldID, name) in fld_names:
if col_fld.has_key(fldID):
run_sql("UPDATE collection_field_fieldvalue SET score=%s WHERE id_collection=%s AND id_field=%s", (score, colID, fldID))
score -= 1
output += write_outcome((1, ""))
else:
output += write_outcome((0, (0, "No fields to order")))
body = [output]
output = " " + addadminbox(subtitle, body)
if fmeth == "soo":
return perform_showsortoptions(colID, ln, content=output)
elif fmeth == "sew":
return perform_showsearchfields(colID, ln, content=output)
elif fmeth == "seo":
return perform_showsearchoptions(colID, ln, content=output)
def perform_addexistingfieldvalue(colID, fldID, fldvID=-1, ln=cdslang, callback='yes', confirm=-1):
"""form to add an existing fieldvalue to a field.
colID - the collection
fldID - the field to add the fieldvalue to
fldvID - the fieldvalue to add"""
subtitle = """Add existing value to search option"""
output = ""
if fldvID not in [-1, "-1"] and confirm in [1, "1"]:
fldvID = int(fldvID)
ares = add_col_fld(colID, fldID, 'seo', fldvID)
colID = int(colID)
fldID = int(fldID)
lang = dict(get_languages())
res = get_def_name('', "field")
col_dict = dict(get_def_name('', "collection"))
fld_dict = dict(res)
col_fld = dict(map(lambda x: (x[0], x[1]), get_col_fld(colID, 'seo')))
fld_value = get_fld_value()
fldv_dict = dict(map(lambda x: (x[0], x[1]), fld_value))
text = """
Value """
output += createhiddenform(action="addexistingfieldvalue#7.4",
text=text,
button="Add",
colID=colID,
fldID=fldID,
ln=ln,
confirm=1)
if fldvID not in [-1, "-1"] and confirm in [1, "1"]:
output += write_outcome(ares)
elif confirm in [1, "1"]:
output += """Select a value to add and try again."""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_modifyfield(colID, fldID, ln, content=output)
def perform_addexistingfield(colID, ln, fldID=-1, fldvID=-1, fmeth='', callback='yes', confirm=-1):
"""form to add an existing field to a collection.
colID - the collection to add the field to
fldID - the field to add
fldvID - the id of the fieldvalue, if any
fmeth - the type of field: sort option, search field or search option"""
subtitle = """Add existing field to collection"""
output = ""
if fldID not in [-1, "-1"] and confirm in [1, "1"]:
fldID = int(fldID)
ares = add_col_fld(colID, fldID, fmeth, fldvID)
colID = int(colID)
lang = dict(get_languages())
res = get_def_name('', "field")
col_dict = dict(get_def_name('', "collection"))
fld_dict = dict(res)
col_fld = dict(map(lambda x: (x[0], x[1]), get_col_fld(colID, fmeth)))
fld_value = get_fld_value()
fldv_dict = dict(map(lambda x: (x[0], x[1]), fld_value))
if fldvID:
fldvID = int(fldvID)
text = """
Field """
output += createhiddenform(action="addexistingfield#6.2",
text=text,
button="Add",
colID=colID,
fmeth=fmeth,
ln=ln,
confirm=1)
if fldID not in [-1, "-1"] and confirm in [1, "1"]:
output += write_outcome(ares)
elif fldID in [-1, "-1"] and confirm not in [-1, "-1"]:
output += """Select a field.
"""
body = [output]
output = " " + addadminbox(subtitle, body)
if fmeth == "soo":
return perform_showsortoptions(colID, ln, content=output)
elif fmeth == "sew":
return perform_showsearchfields(colID, ln, content=output)
elif fmeth == "seo":
return perform_showsearchoptions(colID, ln, content=output)
def perform_showsortoptions(colID, ln, callback='yes', content='', confirm=-1):
"""show the sort options of this collection."""
colID = int(colID)
col_dict = dict(get_def_name('', "collection"))
fld_dict = dict(get_def_name('', "field"))
fld_type = get_sort_nametypes()
subtitle = """8. Modify sort options for collection '%s'   [?]""" % (col_dict[colID], weburl)
output = """
Field actions (not related to this collection)
Go to the BibIndex interface to modify the available sort options
'
return addadminbox(subtitle, [output])
def perform_update_external_collections(colID, ln, state_list, recurse_list):
colID = int(colID)
changes = []
output = ""
if not state_list:
return 'Warning: no state found. ' + perform_manage_external_collections(colID, ln)
external_collections = external_collection_sort_engine_by_name(external_collections_dictionary.values())
if len(external_collections) != len(state_list):
return 'Warning: size of state_list differs from external_collections! ' + perform_manage_external_collections(colID, ln)
for (external_collection, state) in zip(external_collections, state_list):
state = int(state)
collection_name = external_collection.name
recurse = recurse_list and collection_name in recurse_list
oldstate = external_collection_get_state(external_collection, colID)
if oldstate != state or recurse:
changes += external_collection_get_update_state_list(external_collection, colID, state, recurse)
external_collection_apply_changes(changes)
return output + '
' + perform_manage_external_collections(colID, ln)
def perform_showdetailedrecordoptions(colID, ln, callback='yes', content='', confirm=-1):
"""Show the interface to configure detailed record page to the user."""
colID = int(colID)
subtitle = """12. Configuration of detailed record page
  [?]""" % weburl
output = '''
Show tabs:
''' % {'colID': colID}
for (tab_id, tab_info) in get_detailed_page_tabs(colID).iteritems():
if tab_id == 'comments' and \
not CFG_WEBCOMMENT_ALLOW_REVIEWS and \
not CFG_WEBCOMMENT_ALLOW_COMMENTS:
continue
check = ''
output += '''
''' % {'tabid':tab_id,
'check':((tab_info['visible'] and 'checked="checked"') or ''),
'label':tab_info['label']}
output += '
'
output += '
'
output += '''
'''
output += '
'
return addadminbox(subtitle, [output])
def perform_update_detailed_record_options(colID, ln, tabs, recurse):
"""Update the preferences for the tab to show/hide in the detailed record page."""
colID = int(colID)
changes = []
output = 'Operation successfully completed.'
if '' in tabs:
tabs.remove('')
tabs.append('metadata')
def update_settings(colID, tabs, recurse):
run_sql("DELETE FROM collectiondetailedrecordpagetabs WHERE id_collection=%s", (colID, ))
run_sql("REPLACE INTO collectiondetailedrecordpagetabs" + \
" SET id_collection=%s, tabs=%s", (colID, ';'.join(tabs)))
if recurse:
for descendant_id in get_collection_descendants(colID):
update_settings(descendant_id, tabs, recurse)
update_settings(colID, tabs, recurse)
return perform_editcollection(colID, ln, "perform_modifytranslations",
output + perform_showdetailedrecordoptions(colID, ln))
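# perform_update_detailed_record_options above stores the enabled tabs as a
# semicolon-joined string, dropping the empty marker entry and appending the
# 'metadata' tab (assumed unconditional, as the stripped indentation makes
# the original nesting ambiguous). A pure-function sketch of that encoding;
# 'encode_tabs' is a hypothetical name, not part of the original module:
def encode_tabs(tabs):
    """Return the tabs column value: empty marker dropped, 'metadata' appended."""
    tabs = list(tabs)
    if '' in tabs:
        tabs.remove('')
    tabs.append('metadata')
    return ';'.join(tabs)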
def perform_addexistingoutputformat(colID, ln, fmtID=-1, callback='yes', confirm=-1):
"""form to add an existing output format to a collection.
colID - the collection the format should be added to
fmtID - the format to add."""
subtitle = """Add existing output format to collection"""
output = ""
if fmtID not in [-1, "-1"] and confirm in [1, "1"]:
ares = add_col_fmt(colID, fmtID)
colID = int(colID)
res = get_def_name('', "format")
fmt_dict = dict(res)
col_dict = dict(get_def_name('', "collection"))
col_fmt = get_col_fmt(colID)
col_fmt = dict(map(lambda x: (x[0], x[2]), col_fmt))
if len(res) > 0:
text = """
Output format
"""
output += createhiddenform(action="addexistingoutputformat#10.2",
text=text,
button="Add",
colID=colID,
ln=ln,
confirm=1)
else:
output = """No existing output formats to add; please create a new one."""
if fmtID not in [-1, "-1"] and confirm in [1, "1"]:
output += write_outcome(ares)
elif fmtID in [-1, "-1"] and confirm not in [-1, "-1"]:
output += """Please select an output format."""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showoutputformats(colID, ln, content=output)
def perform_deleteoutputformat(colID, ln, fmtID=-1, callback='yes', confirm=-1):
"""form to delete an output format not in use.
colID - the collection id of the current collection.
fmtID - the format id to delete."""
subtitle = """Delete an unused output format"""
output = """
Deleting an output format will also delete its associated translations.
"""
colID = int(colID)
if fmtID not in [-1, "-1"] and confirm in [1, "1"]:
fmt_dict = dict(get_def_name('', "format"))
old_colNAME = fmt_dict[int(fmtID)]
ares = delete_fmt(int(fmtID))
res = get_def_name('', "format")
fmt_dict = dict(res)
col_dict = dict(get_def_name('', "collection"))
col_fmt = get_col_fmt()
col_fmt = dict(map(lambda x: (x[0], x[2]), col_fmt))
if len(res) > 0:
text = """
Output format """
output += createhiddenform(action="deleteoutputformat#10.3",
text=text,
button="Delete",
colID=colID,
ln=ln,
confirm=0)
if fmtID not in [-1, "-1"]:
fmtID = int(fmtID)
if confirm in [0, "0"]:
text = """Do you want to delete the output format '%s'?
""" % fmt_dict[fmtID]
output += createhiddenform(action="deleteoutputformat#10.3",
text=text,
button="Confirm",
colID=colID,
fmtID=fmtID,
ln=ln,
confirm=1)
elif confirm in [1, "1"]:
output += write_outcome(ares)
elif confirm not in [-1, "-1"]:
output += """Choose an output format to delete.
"""
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showoutputformats(colID, ln, content=output)
def perform_removeoutputformat(colID, ln, fmtID='', callback='yes', confirm=0):
"""form to remove an output format from a collection.
colID - the collection id of the current collection.
fmtID - the format id.
"""
subtitle = """Remove output format"""
output = ""
col_dict = dict(get_def_name('', "collection"))
fmt_dict = dict(get_def_name('', "format"))
if colID and fmtID:
colID = int(colID)
fmtID = int(fmtID)
if confirm in ["0", 0]:
text = """Do you want to remove the output format '%s' from the collection '%s'?""" % (fmt_dict[fmtID], col_dict[colID])
output += createhiddenform(action="removeoutputformat#10.5",
text=text,
button="Confirm",
colID=colID,
fmtID=fmtID,
confirm=1)
elif confirm in ["1", 1]:
res = remove_fmt(colID, fmtID)
output += write_outcome(res)
body = [output]
output = " " + addadminbox(subtitle, body)
return perform_showoutputformats(colID, ln, content=output)
def perform_index(colID=1, ln=cdslang, mtype='', content='', confirm=0):
"""The index method, calling methods to show the collection tree, create new collections and add collections to tree.
"""
subtitle = "Overview"
colID = int(colID)
col_dict = dict(get_def_name('', "collection"))
output = ""
fin_output = ""
if not col_dict.has_key(1):
res = add_col(cdsname, '')
if res:
fin_output += """Created root collection. """
else:
return "Cannot create root collection, please check database."
if cdsname != run_sql("SELECT name from collection WHERE id=1")[0][0]:
res = run_sql("UPDATE collection SET name=%s WHERE id=1", (cdsname,))
if res:
fin_output += """The name of the root collection has been modified to match the installation name '%(cdsname)s'. """ % {'cdsname' : cdsname}
else:
return "Error renaming root collection."
fin_output += """
For managing the collections, select an item from the menu. """
if mtype == "perform_addcollection" and content:
fin_output += content
elif mtype == "perform_addcollection" or mtype == "perform_showall":
fin_output += perform_addcollection(colID=colID, ln=ln, callback='')
fin_output += " "
if mtype == "perform_addcollectiontotree" and content:
fin_output += content
elif mtype == "perform_addcollectiontotree" or mtype == "perform_showall":
fin_output += perform_addcollectiontotree(colID=colID, ln=ln, callback='')
fin_output += " "
if mtype == "perform_modifycollectiontree" and content:
fin_output += content
elif mtype == "perform_modifycollectiontree" or mtype == "perform_showall":
fin_output += perform_modifycollectiontree(colID=colID, ln=ln, callback='')
fin_output += " "
if mtype == "perform_checkwebcollstatus" and content:
fin_output += content
elif mtype == "perform_checkwebcollstatus" or mtype == "perform_showall":
fin_output += perform_checkwebcollstatus(colID, ln, callback='')
if mtype == "perform_checkcollectionstatus" and content:
fin_output += content
elif mtype == "perform_checkcollectionstatus" or mtype == "perform_showall":
fin_output += perform_checkcollectionstatus(colID, ln, callback='')
body = [fin_output]
return addadminbox('Menu', body)
def show_coll_not_in_tree(colID, ln, col_dict):
"""Returns collections not in tree"""
tree = get_col_tree(colID)
in_tree = {}
output = "These collections are not in the tree, and should be added: "
for (id, up, down, dad, reltype) in tree:
in_tree[id] = 1
in_tree[dad] = 1
res = run_sql("SELECT id from collection")
if len(res) != len(in_tree):
for id in res:
if not in_tree.has_key(id[0]):
output += """%s ,
""" % (weburl, id[0], ln, col_dict[id[0]])
output += "
"
else:
output = ""
return output
def create_colltree(tree, col_dict, colID, ln, move_from='', move_to='', rtype='', edit=''):
"""Creates the presentation of the collection tree, with the buttons for modifying it.
tree - the tree to present, from get_col_tree()
col_dict - the name of the collections in a dictionary
colID - the collection id to start with
move_from - if a collection to be moved has been chosen
move_to - the collection which should be set as father of move_from
rtype - the type of the tree, regular or virtual
edit - if the method should output the edit buttons."""
if move_from:
move_from_rtype = move_from[0]
move_from_id = int(move_from[1:len(move_from)])
tree_from = get_col_tree(colID, move_from_rtype)
tree_to = get_col_tree(colID, rtype)
tables = 0
tstack = []
i = 0
text = """
"""
for i in range(0, len(tree)):
id_son = tree[i][0]
up = tree[i][1]
down = tree[i][2]
dad = tree[i][3]
reltype = tree[i][4]
tmove_from = ""
j = i
while j > 0:
j = j - 1
try:
if tstack[j][1] == dad:
table = tstack[j][2]
for k in range(0, tables - table):
tables = tables - 1
text += """
"""
break
except StandardError, e:
pass
text += """
"""
if i > 0 and tree[i][1] == 0:
tables = tables + 1
text += """
"""
if i == 0:
tstack.append((id_son, dad, 1))
else:
tstack.append((id_son, dad, tables))
if up == 1 and edit:
text += """""" % (weburl, colID, ln, i, rtype, tree[i][0], weburl)
else:
text += """ """
text += "
"
if down == 1 and edit:
text += """""" % (weburl, colID, ln, i, rtype, tree[i][0], weburl)
else:
text += """ """
text += "
"
if edit:
if move_from and move_to:
tmove_from = move_from
move_from = ''
if not (move_from == "" and i == 0) and not (move_from != "" and int(move_from[1:len(move_from)]) == i and rtype == move_from[0]):
check = "true"
if move_from:
#if tree_from[move_from_id][0] == tree_to[i][0] or not check_col(tree_to[i][0], tree_from[move_from_id][0]):
# check = ''
#elif not check_col(tree_to[i][0], tree_from[move_from_id][0]):
# check = ''
#if not check and (tree_to[i][0] == 1 and tree_from[move_from_id][3] == tree_to[i][0] and move_from_rtype != rtype):
# check = "true"
if check:
text += """
""" % (weburl, colID, ln, move_from, rtype, i, rtype, weburl, col_dict[tree_from[int(move_from[1:len(move_from)])][0]], col_dict[tree_to[i][0]])
else:
try:
text += """""" % (weburl, colID, ln, rtype, i, rtype, tree[i][0], weburl, col_dict[tree[i][0]])
except KeyError:
pass
else:
text += """
""" % weburl
else:
text += """
""" % weburl
text += """
"""
if edit:
try:
text += """""" % (weburl, colID, ln, i, rtype, tree[i][0], weburl)
except KeyError:
pass
elif i != 0:
text += """
""" % weburl
text += """
"""
if tmove_from:
move_from = tmove_from
try:
text += """%s%s%s%s%s""" % (tree[i][0], (reltype=="v" and '' or ''), weburl, tree[i][0], ln, col_dict[id_son], (move_to=="%s%s" %(rtype, i) and ' ' % weburl or ''), (move_from=="%s%s" % (rtype, i) and ' ' % weburl or ''), (reltype=="v" and '' or ''))
except KeyError:
pass
text += """
"""
while tables > 0:
text += """
"""
tables = tables - 1
text += """
"""
return text
def perform_deletecollection(colID, ln, confirm=-1, callback='yes'):
"""form to delete a collection
colID - id of collection
"""
subtitle =''
output = """
WARNING:
When deleting a collection, you also delete all data related to the collection, such as translations, relations to other collections, and information about which rank methods to use.
For more information, please consult the WebSearch guide, section on deleting a collection.
""" % weburl
col_dict = dict(get_def_name('', "collection"))
if colID != 1 and colID and col_dict.has_key(int(colID)):
colID = int(colID)
subtitle = """4. Delete collection '%s'   [?]""" % (col_dict[colID], weburl)
res = run_sql("SELECT * from collection_collection WHERE id_dad=%s" % colID)
res2 = run_sql("SELECT * from collection_collection WHERE id_son=%s" % colID)
if not res and not res2:
if confirm in ["-1", -1]:
text = """Do you want to delete this collection?"""
output += createhiddenform(action="deletecollection#4",
text=text,
colID=colID,
button="Delete",
confirm=0)
elif confirm in ["0", 0]:
text = """Are you sure you want to delete this collection?"""
output += createhiddenform(action="deletecollection#4",
text=text,
colID=colID,
button="Confirm",
confirm=1)
elif confirm in ["1", 1]:
result = delete_col(colID)
if not result:
raise Exception
else:
output = """Cannot delete a collection that is part of the collection tree. Remove the collection from the tree and try again."""
else:
subtitle = """4. Delete collection"""
output = """Not possible to delete the root collection"""
body = [output]
if callback:
return perform_editcollection(colID, ln, "perform_deletecollection", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_editcollection(colID=1, ln=cdslang, mtype='', content=''):
"""interface to modify a collection. This method calls other methods, which in turn call this method again with their output.
if callback, the method will call perform_editcollection; if not, it will just return its output.
colID - id of the collection
mtype - the method that called this method.
content - the output from that method."""
colID = int(colID)
col_dict = dict(get_def_name('', "collection"))
if not col_dict.has_key(colID):
return """Collection deleted.
"""
fin_output = """
""" % (colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln)
if mtype == "perform_modifydbquery" and content:
fin_output += content
elif mtype == "perform_modifydbquery" or not mtype:
fin_output += perform_modifydbquery(colID, ln, callback='')
if mtype == "perform_modifyrestricted" and content:
fin_output += content
elif mtype == "perform_modifyrestricted" or not mtype:
fin_output += perform_modifyrestricted(colID, ln, callback='')
if mtype == "perform_modifytranslations" and content:
fin_output += content
elif mtype == "perform_modifytranslations" or not mtype:
fin_output += perform_modifytranslations(colID, ln, callback='')
if mtype == "perform_deletecollection" and content:
fin_output += content
elif mtype == "perform_deletecollection" or not mtype:
fin_output += perform_deletecollection(colID, ln, callback='')
if mtype == "perform_showportalboxes" and content:
fin_output += content
elif mtype == "perform_showportalboxes" or not mtype:
fin_output += perform_showportalboxes(colID, ln, callback='')
if mtype == "perform_showsearchfields" and content:
fin_output += content
elif mtype == "perform_showsearchfields" or not mtype:
fin_output += perform_showsearchfields(colID, ln, callback='')
if mtype == "perform_showsearchoptions" and content:
fin_output += content
elif mtype == "perform_showsearchoptions" or not mtype:
fin_output += perform_showsearchoptions(colID, ln, callback='')
if mtype == "perform_showsortoptions" and content:
fin_output += content
elif mtype == "perform_showsortoptions" or not mtype:
fin_output += perform_showsortoptions(colID, ln, callback='')
if mtype == "perform_modifyrankmethods" and content:
fin_output += content
elif mtype == "perform_modifyrankmethods" or not mtype:
fin_output += perform_modifyrankmethods(colID, ln, callback='')
if mtype == "perform_showoutputformats" and content:
fin_output += content
elif mtype == "perform_showoutputformats" or not mtype:
fin_output += perform_showoutputformats(colID, ln, callback='')
if mtype == "perform_manage_external_collections" and content:
fin_output += content
elif mtype == "perform_manage_external_collections" or not mtype:
fin_output += perform_manage_external_collections(colID, ln, callback='')
if mtype == "perform_showdetailedrecordoptions" and content:
fin_output += content
elif mtype == "perform_showdetailedrecordoptions" or not mtype:
fin_output += perform_showdetailedrecordoptions(colID, ln, callback='')
return addadminbox("Overview of edit options for collection '%s'" % col_dict[colID], [fin_output])
def perform_checkwebcollstatus(colID, ln, confirm=0, callback='yes'):
"""Check status of the collection tables with respect to the webcoll cache."""
subtitle = """Webcoll Status   [?]""" % weburl
output = ""
colID = int(colID)
col_dict = dict(get_def_name('', "collection"))
output += """ Last updates: """
collection_table_update_time = ""
collection_web_update_time = ""
collection_table_update_time = get_table_update_time('collection')
output += "Collection table last updated: %s " % collection_table_update_time
try:
file = open("%s/collections/last_updated" % CFG_CACHEDIR)
collection_web_update_time = file.readline().strip()
output += "Collection cache last updated: %s " % collection_web_update_time
file.close()
except:
pass
# reformat collection_web_update_time to the format suitable for comparisons
try:
collection_web_update_time = time.strftime("%Y-%m-%d %H:%M:%S",
time.strptime(collection_web_update_time, "%d %b %Y %H:%M:%S"))
except ValueError, e:
pass
if collection_table_update_time > collection_web_update_time:
output += """ Warning: The collections have been modified since the last time webcoll was executed. To process the changes, webcoll must be run again. """
header = ['ID', 'Name', 'Time', 'Status', 'Progress']
actions = []
output += """ Last BibSched tasks: """
res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='webcoll' and runtime< now() ORDER by runtime")
if len(res) > 0:
(id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[len(res) - 1]
webcoll_update_time = runtime
actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')])
else:
actions.append(['', 'webcoll', '', '', 'Not executed yet'])
res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='bibindex' and runtime< now() ORDER by runtime")
if len(res) > 0:
(id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[len(res) - 1]
actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')])
else:
actions.append(['', 'bibindex', '', '', 'Not executed yet'])
output += tupletotable(header=header, tuple=actions)
output += """ Next scheduled BibSched run: """
actions = []
res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='webcoll' and runtime > now() ORDER by runtime")
webcoll_future = ""
if len(res) > 0:
(id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[0]
webcoll_update_time = runtime
actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')])
webcoll_future = "yes"
else:
actions.append(['', 'webcoll', '', '', 'Not scheduled'])
res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='bibindex' and runtime > now() ORDER by runtime")
bibindex_future = ""
if len(res) > 0:
(id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[0]
actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')])
bibindex_future = "yes"
else:
actions.append(['', 'bibindex', '', '', 'Not scheduled'])
output += tupletotable(header=header, tuple=actions)
if webcoll_future == "":
output += """ Warning: webcoll is not scheduled for a future run by bibsched; any updates to the collections will not be processed. """
if bibindex_future == "":
output += """ Warning: bibindex is not scheduled for a future run by bibsched; any updates to the records will not be processed. """
body = [output]
if callback:
return perform_index(colID, ln, "perform_checkwebcollstatus", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_modifyrestricted(colID, ln, rest='', callback='yes', confirm=-1):
"""modify which apache group is allowed to access the collection.
rest - the groupname"""
subtitle = ''
output = ""
col_dict = dict(get_def_name('', "collection"))
action_id = acc_get_action_id(VIEWRESTRCOLL)
if colID and col_dict.has_key(int(colID)):
colID = int(colID)
subtitle = """2. Modify access restrictions for collection '%s'   [?]""" % (col_dict[colID], weburl)
output = """
Please note that CDS Invenio versions greater than 0.92.1 manage collection restrictions via the standard
WebAccess Admin Interface (action '%s', action id %s).
""" % (VIEWRESTRCOLL, action_id)
body = [output]
if callback:
return perform_editcollection(colID, ln, "perform_modifyrestricted", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def perform_checkcollectionstatus(colID, ln, confirm=0, callback='yes'):
"""Check the configuration of the collections."""
from invenio.search_engine import collection_restricted_p
subtitle = """Collection Status   [?]""" % weburl
output = ""
colID = int(colID)
col_dict = dict(get_def_name('', "collection"))
collections = run_sql("SELECT id, name, dbquery FROM collection ORDER BY id")
header = ['ID', 'Name', 'Query', 'Subcollections', 'Restricted', 'I18N', 'Status']
rnk_list = get_def_name('', "rnkMETHOD")
actions = []
for (id, name, dbquery) in collections:
reg_sons = len(get_col_tree(id, 'r'))
vir_sons = len(get_col_tree(id, 'v'))
status = ""
langs = run_sql("SELECT ln from collectionname where id_collection=%s" % id)
i8n = ""
if len(langs) > 0:
for lang in langs:
i8n += "%s, " % lang
else:
i8n = """None"""
if (reg_sons > 1 and dbquery) or dbquery=="":
status = """1:Query"""
elif dbquery is None and reg_sons == 1:
status = """2:Query"""
elif dbquery == "" and reg_sons == 1:
status = """3:Query"""
if (reg_sons > 1 or vir_sons > 1):
subs = """Yes"""
else:
subs = """No"""
if dbquery is None:
dbquery = """No"""
restricted = collection_restricted_p(name)
if restricted:
restricted = """Yes"""
if status:
status += """,4:Restricted"""
else:
status += """4:Restricted"""
else:
restricted = """No"""
if status == "":
status = """OK"""
actions.append([id, """%s""" % (weburl, id, ln, name), dbquery, subs, restricted, i8n, status])
output += tupletotable(header=header, tuple=actions)
body = [output]
if callback:
return perform_index(colID, ln, "perform_checkcollectionstatus", addadminbox(subtitle, body))
else:
return addadminbox(subtitle, body)
def get_col_tree(colID, rtype=''):
"""Returns a presentation of the tree as a list. TODO: Add loop detection
colID - startpoint for the tree
rtype - get regular or virtual part of the tree"""
try:
colID = int(colID)
stack = [colID]
ssize = 0
tree = [(colID, 0, 0, colID, 'r')]
while len(stack) > 0:
ccolID = stack.pop()
if ccolID == colID and rtype:
res = run_sql("SELECT id_son, score, type FROM collection_collection WHERE id_dad=%s AND type='%s' ORDER BY score ASC,id_son" % (ccolID, rtype))
else:
res = run_sql("SELECT id_son, score, type FROM collection_collection WHERE id_dad=%s ORDER BY score ASC,id_son" % ccolID)
ssize += 1
ntree = []
for i in range(0, len(res)):
id_son = res[i][0]
score = res[i][1]
rtype = res[i][2]
stack.append(id_son)
if i == (len(res) - 1):
up = 0
else:
up = 1
if i == 0:
down = 0
else:
down = 1
ntree.insert(0, (id_son, up, down, ccolID, rtype))
tree = tree[0:ssize] + ntree + tree[ssize:len(tree)]
return tree
except StandardError, e:
return ()
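get_col_tree flattens the dad/son relation into (id, up, down, dad, type) tuples by popping a stack and splicing each batch of sons into the list right after their father; the up/down flags mark which move arrows apply to each sibling. The same walk can be sketched without the database, over a hypothetical in-memory dad -> sons mapping:

```python
def build_tree(sons, col_id):
    """Flatten a dad -> sons mapping into (id, up, down, dad) tuples,
    mirroring the stack-and-splice traversal of get_col_tree."""
    stack = [col_id]
    ssize = 0
    tree = [(col_id, 0, 0, col_id)]
    while stack:
        dad = stack.pop()
        res = sons.get(dad, [])
        ssize += 1
        ntree = []
        for i, son in enumerate(res):
            stack.append(son)
            up = 0 if i == len(res) - 1 else 1    # last sibling: up flag off
            down = 0 if i == 0 else 1             # first sibling: down flag off
            ntree.insert(0, (son, up, down, dad))
        # splice the sons in just after their father
        tree = tree[:ssize] + ntree + tree[ssize:]
    return tree

# hypothetical tree: 1 has sons 2 and 3; 2 has son 4
build_tree({1: [2, 3], 2: [4]}, 1)
```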
def add_col_dad_son(add_dad, add_son, rtype):
"""Add a son to a collection (dad)
add_dad - add to this collection id
add_son - add this collection id
rtype - either regular or virtual"""
try:
res = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s ORDER BY score ASC" % add_dad)
highscore = 0
for score in res:
if int(score[0]) > highscore:
highscore = int(score[0])
highscore += 1
res = run_sql("INSERT INTO collection_collection(id_dad,id_son,score,type) values(%s,%s,%s,'%s')" % (add_dad, add_son, highscore, rtype))
return (1, highscore)
except StandardError, e:
return (0, e)
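add_col_dad_son appends the new son at the end of its siblings by giving it a score one past the highest existing one. The score scan reduces to a few lines, shown here standalone:

```python
def next_score(scores):
    """Return one past the highest existing sibling score
    (1 when there are no siblings yet), so new sons sort last."""
    highscore = 0
    for score in scores:
        if int(score) > highscore:
            highscore = int(score)
    return highscore + 1

next_score([10, 20, 15])   # 21
next_score([])             # 1
```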
def compare_on_val(first, second):
"""Compare the two values"""
return cmp(first[1], second[1])
def get_col_fld(colID=-1, type = '', id_field=''):
"""Returns the field/fieldvalue pairs associated with a collection, optionally filtered by field id or type.
colID - collection id
id_field - field id"""
sql = "SELECT id_field,id_fieldvalue,type,score,score_fieldvalue FROM collection_field_fieldvalue, field WHERE id_field=field.id"
try:
if colID > -1:
sql += " AND id_collection=%s" % colID
if id_field:
sql += " AND id_field=%s" % id_field
if type:
sql += " AND type='%s'" % type
sql += " ORDER BY type, score desc, score_fieldvalue desc"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_col_pbx(colID=-1, ln='', position = ''):
"""Returns either all portalboxes associated with a collection, or based on either colID or language or both.
colID - collection id
ln - language id"""
sql = "SELECT id_portalbox, id_collection, ln, score, position, title, body FROM collection_portalbox, portalbox WHERE id_portalbox = portalbox.id"
try:
if colID > -1:
sql += " AND id_collection=%s" % colID
if ln:
sql += " AND ln='%s'" % ln
if position:
sql += " AND position='%s'" % position
sql += " ORDER BY position, ln, score desc"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_col_fmt(colID=-1):
"""Returns all formats currently associated with a collection, or for one specific collection
colID - the id of the collection"""
try:
if colID not in [-1, "-1"]:
res = run_sql("SELECT id_format, id_collection, code, score FROM collection_format, format WHERE id_format = format.id AND id_collection=%s ORDER BY score desc" % colID)
else:
res = run_sql("SELECT id_format, id_collection, code, score FROM collection_format, format WHERE id_format = format.id ORDER BY score desc")
return res
except StandardError, e:
return ""
def get_col_rnk(colID, ln):
""" Returns a list of the rank methods the given collection is attached to
colID - id from collection"""
try:
res1 = dict(run_sql("SELECT id_rnkMETHOD, '' FROM collection_rnkMETHOD WHERE id_collection=%s" % colID))
res2 = get_def_name('', "rnkMETHOD")
result = filter(lambda x: res1.has_key(x[0]), res2)
return result
except StandardError, e:
return ()
def get_pbx():
"""Returns all portalboxes"""
try:
res = run_sql("SELECT id, title, body FROM portalbox ORDER by title,body")
return res
except StandardError, e:
return ""
def get_fld_value(fldvID = ''):
"""Returns fieldvalue"""
try:
sql = "SELECT id, name, value FROM fieldvalue"
if fldvID:
sql += " WHERE id=%s" % fldvID
sql += " ORDER BY name"
res = run_sql(sql)
return res
except StandardError, e:
return ""
def get_pbx_pos():
"""Returns a list of all the positions for a portalbox"""
position = {}
position["rt"] = "Right Top"
position["lt"] = "Left Top"
position["te"] = "Title Epilog"
position["tp"] = "Title Prolog"
position["ne"] = "Narrow by coll epilog"
position["np"] = "Narrow by coll prolog"
return position
def get_sort_nametypes():
"""Return a list of the various translationnames for the fields"""
type = {}
type['soo'] = 'Sort options'
type['seo'] = 'Search options'
type['sew'] = 'Search within'
return type
def get_fmt_nametypes():
"""Return a list of the various translationnames for the output formats"""
type = []
type.append(('ln', 'Long name'))
return type
def get_fld_nametypes():
"""Return a list of the various translationnames for the fields"""
type = []
type.append(('ln', 'Long name'))
return type
def get_col_nametypes():
"""Return a list of the various translationnames for the collections"""
type = []
type.append(('ln', 'Long name'))
return type
def find_last(tree, start_son):
"""Find the previous collection in the tree with the same father as start_son"""
id_dad = tree[start_son][3]
while start_son > 0:
start_son -= 1
if tree[start_son][3] == id_dad:
return start_son
def find_next(tree, start_son):
"""Find the next collection in the tree with the same father as start_son"""
id_dad = tree[start_son][3]
while start_son < len(tree) - 1:
start_son += 1
if tree[start_son][3] == id_dad:
return start_son
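find_last and find_next scan the flattened tree outward from a starting index for the nearest entry with the same father, which is what the up/down swap operates on. Both directions can be sketched in one standalone helper; the sample tree tuples are hypothetical:

```python
def find_sibling(tree, start, step):
    """Index of the nearest entry in direction step (-1 = previous,
    +1 = next) sharing the same father as tree[start], else None."""
    id_dad = tree[start][3]
    i = start + step
    while 0 <= i < len(tree):
        if tree[i][3] == id_dad:
            return i
        i += step
    return None

# hypothetical flattened tree: (id, up, down, dad, type)
tree = [(1, 0, 0, 1, 'r'),
        (3, 0, 1, 1, 'r'),
        (5, 0, 0, 3, 'r'),
        (2, 1, 0, 1, 'r')]
find_sibling(tree, 1, +1)   # 3: entry (2, ...) is the next with dad 1
find_sibling(tree, 2, +1)   # None: (5, ...) has no later sibling
```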
def remove_col_subcol(id_son, id_dad, type):
"""Remove a collection as a son of another collection in the tree; if the collection isn't used elsewhere in the tree, also remove all registered sons of the id_son.
id_son - collection id of son to remove
id_dad - the id of the dad"""
try:
if id_son != id_dad:
tree = get_col_tree(id_son)
res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s" % (id_son, id_dad))
else:
tree = get_col_tree(id_son, type)
res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s and type='%s'" % (id_son, id_dad, type))
if not run_sql("SELECT * from collection_collection WHERE id_son=%s and type='%s'" % (id_son, type)):
for (id, up, down, dad, rtype) in tree:
res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s" % (id, dad))
return (1, "")
except StandardError, e:
return (0, e)
def check_col(add_dad, add_son):
"""Check if the collection can be placed as a son of the dad without causing loops.
add_dad - collection id
add_son - collection id"""
try:
stack = [add_dad]
res = run_sql("SELECT id_dad FROM collection_collection WHERE id_dad=%s AND id_son=%s" % (add_dad, add_son))
if res:
raise StandardError
while len(stack) > 0:
colID = stack.pop()
res = run_sql("SELECT id_dad FROM collection_collection WHERE id_son=%s" % colID)
for id in res:
if int(id[0]) == int(add_son):
raise StandardError
else:
stack.append(id[0])
return (1, "")
except StandardError, e:
return (0, e)
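check_col prevents cycles by walking upward from add_dad through every registered father: if add_son is ever reached, the attachment would close a loop. The walk can be sketched over a hypothetical in-memory son -> dads mapping:

```python
def creates_loop(dads_of, add_dad, add_son):
    """True if making add_son a son of add_dad would create a cycle,
    i.e. add_son is add_dad itself or one of its ancestors."""
    if add_dad == add_son:
        return True
    stack = [add_dad]
    while stack:
        col = stack.pop()
        for dad in dads_of.get(col, []):
            if dad == add_son:
                return True
            stack.append(dad)
    return False

# hypothetical chain 1 -> 2 -> 3 (maps each id to its dads)
dads_of = {2: [1], 3: [2]}
creates_loop(dads_of, 3, 1)   # True: 1 is an ancestor of 3
creates_loop(dads_of, 1, 3)   # False: 3 lies below 1
```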
def attach_rnk_col(colID, rnkID):
"""attach rank method to collection
rnkID - id from rnkMETHOD table
colID - id of collection, as in collection table """
try:
res = run_sql("INSERT INTO collection_rnkMETHOD(id_collection, id_rnkMETHOD) values (%s,%s)" % (colID, rnkID))
return (1, "")
except StandardError, e:
return (0, e)
def detach_rnk_col(colID, rnkID):
"""detach rank method from collection
rnkID - id from rnkMETHOD table
colID - id of collection, as in collection table """
try:
res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_collection=%s AND id_rnkMETHOD=%s" % (colID, rnkID))
return (1, "")
except StandardError, e:
return (0, e)
def switch_col_treescore(col_1, col_2):
try:
res1 = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s and id_son=%s" % (col_1[3], col_1[0]))
res2 = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s and id_son=%s" % (col_2[3], col_2[0]))
res = run_sql("UPDATE collection_collection SET score=%s WHERE id_dad=%s and id_son=%s" % (res2[0][0], col_1[3], col_1[0]))
res = run_sql("UPDATE collection_collection SET score=%s WHERE id_dad=%s and id_son=%s" % (res1[0][0], col_2[3], col_2[0]))
return (1, "")
except StandardError, e:
return (0, e)
def move_col_tree(col_from, col_to, move_to_rtype=''):
"""Move a collection from one point in the tree to another; it becomes a son of the endpoint.
col_from - move this collection from current point
col_to - and set it as a son of this collection.
move_to_rtype - either virtual or regular collection"""
try:
res = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s ORDER BY score asc" % col_to[0])
highscore = 0
for score in res:
if int(score[0]) > highscore:
highscore = int(score[0])
highscore += 1
if not move_to_rtype:
move_to_rtype = col_from[4]
res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s" % (col_from[0], col_from[3]))
res = run_sql("INSERT INTO collection_collection(id_dad,id_son,score,type) values(%s,%s,%s,'%s')" % (col_to[0], col_from[0], highscore, move_to_rtype))
return (1, "")
except StandardError, e:
return (0, e)
def remove_pbx(colID, pbxID, ln):
"""Removes a portalbox from the collection given.
colID - the collection the format is connected to
pbxID - the portalbox which should be removed from the collection.
ln - the language of the portalbox to be removed"""
try:
res = run_sql("DELETE FROM collection_portalbox WHERE id_collection=%s AND id_portalbox=%s AND ln='%s'" % (colID, pbxID, ln))
return (1, "")
except StandardError, e:
return (0, e)
def remove_fmt(colID, fmtID):
"""Removes a format from the collection given.
colID - the collection the format is connected to
fmtID - the format which should be removed from the collection."""
try:
res = run_sql("DELETE FROM collection_format WHERE id_collection=%s AND id_format=%s" % (colID, fmtID))
return (1, "")
except StandardError, e:
return (0, e)
def remove_fld(colID, fldID, fldvID=''):
"""Removes a field from the collection given.
colID - the collection the format is connected to
fldID - the field which should be removed from the collection."""
try:
sql = "DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s" % (colID, fldID)
if fldvID:
if fldvID != "None":
sql += " AND id_fieldvalue=%s" % fldvID
else:
sql += " AND id_fieldvalue is NULL"
res = run_sql(sql)
return (1, "")
except StandardError, e:
return (0, e)
def delete_fldv(fldvID):
"""Deletes all data for the given fieldvalue
fldvID - delete all data in the tables associated with fieldvalue and this id"""
try:
res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_fieldvalue=%s" % fldvID)
res = run_sql("DELETE FROM fieldvalue WHERE id=%s" % fldvID)
return (1, "")
except StandardError, e:
return (0, e)
def delete_pbx(pbxID):
"""Deletes all data for the given portalbox
pbxID - delete all data in the tables associated with portalbox and this id """
try:
res = run_sql("DELETE FROM collection_portalbox WHERE id_portalbox=%s" % pbxID)
res = run_sql("DELETE FROM portalbox WHERE id=%s" % pbxID)
return (1, "")
except StandardError, e:
return (0, e)
def delete_fmt(fmtID):
"""Deletes all data for the given format
fmtID - delete all data in the tables associated with format and this id """
try:
res = run_sql("DELETE FROM format WHERE id=%s" % fmtID)
res = run_sql("DELETE FROM collection_format WHERE id_format=%s" % fmtID)
res = run_sql("DELETE FROM formatname WHERE id_format=%s" % fmtID)
return (1, "")
except StandardError, e:
return (0, e)
def delete_col(colID):
"""Deletes all data for the given collection
colID - delete all data in the tables associated with collection and this id """
try:
res = run_sql("DELETE FROM collection WHERE id=%s" % colID)
res = run_sql("DELETE FROM collectionname WHERE id_collection=%s" % colID)
res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_collection=%s" % colID)
res = run_sql("DELETE FROM collection_collection WHERE id_dad=%s" % colID)
res = run_sql("DELETE FROM collection_collection WHERE id_son=%s" % colID)
res = run_sql("DELETE FROM collection_portalbox WHERE id_collection=%s" % colID)
res = run_sql("DELETE FROM collection_format WHERE id_collection=%s" % colID)
res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_collection=%s" % colID)
return (1, "")
except StandardError, e:
return (0, e)
def add_fmt(code, name, rtype):
"""Add a new output format. Returns the id of the format.
code - the code for the format, max 6 chars.
name - the default name for the default language of the format.
rtype - the default nametype"""
try:
res = run_sql("INSERT INTO format (code, name) values (%s,%s)", (code, name))
fmtID = run_sql("SELECT id FROM format WHERE code=%s", (code,))
res = run_sql("INSERT INTO formatname(id_format, type, ln, value) VALUES (%s,%s,%s,%s)",
(fmtID[0][0], rtype, cdslang, name))
return (1, fmtID)
except StandardError, e:
return (0, e)
def update_fldv(fldvID, name, value):
"""Modify existing fieldvalue
fldvID - id of fieldvalue to modify
value - the value of the fieldvalue
name - the name of the fieldvalue."""
try:
res = run_sql("UPDATE fieldvalue set name=%s where id=%s", (name, fldvID))
res = run_sql("UPDATE fieldvalue set value=%s where id=%s", (value, fldvID))
return (1, "")
except StandardError, e:
return (0, e)
def add_fldv(name, value):
"""Add a new fieldvalue, returns id of fieldvalue
value - the value of the fieldvalue
name - the name of the fieldvalue."""
try:
res = run_sql("SELECT id FROM fieldvalue WHERE name=%s and value=%s", (name, value))
if not res:
res = run_sql("INSERT INTO fieldvalue (name, value) values (%s,%s)", (name, value))
res = run_sql("SELECT id FROM fieldvalue WHERE name=%s and value=%s", (name, value))
if res:
return (1, res[0][0])
else:
raise StandardError
except StandardError, e:
return (0, e)
def add_pbx(title, body):
try:
res = run_sql("INSERT INTO portalbox (title, body) values (%s,%s)", (title, body))
res = run_sql("SELECT id FROM portalbox WHERE title=%s AND body=%s", (title, body))
if res:
return (1, res[0][0])
else:
raise StandardError
except StandardError, e:
return (0, e)
def add_col(colNAME, dbquery=None):
"""Adds a new collection to collection table
colNAME - the default name for the collection, saved to collection and collectionname
dbquery - query related to the collection"""
    # Sometimes '' is passed instead of None, so normalize it to None
    if not dbquery:
        dbquery = None
    try:
        rtype = get_col_nametypes()[0][0]
        colID = run_sql("SELECT id FROM collection WHERE id=1")
        if colID:
            res = run_sql("INSERT INTO collection (name,dbquery) VALUES (%s,%s)",
                          (colNAME, dbquery))
        else:
            res = run_sql("INSERT INTO collection (id,name,dbquery) VALUES (1,%s,%s)",
                          (colNAME, dbquery))
        colID = run_sql("SELECT id FROM collection WHERE name=%s", (colNAME,))
        res = run_sql("INSERT INTO collectionname(id_collection, type, ln, value) VALUES (%s,%s,%s,%s)",
                      (colID[0][0], rtype, cdslang, colNAME))
if colID:
return (1, colID[0][0])
else:
raise StandardError
except StandardError, e:
return (0, e)
def add_col_pbx(colID, pbxID, ln, position, score=''):
"""add a portalbox to the collection.
colID - the id of the collection involved
pbxID - the portalbox to add
ln - which language the portalbox is for
score - decides which portalbox is the most important
position - position on page the portalbox should appear."""
    try:
        if score:
            res = run_sql("INSERT INTO collection_portalbox(id_portalbox, id_collection, ln, score, position) VALUES (%s,%s,%s,%s,%s)",
                          (pbxID, colID, ln, score, position))
        else:
            res = run_sql("SELECT score FROM collection_portalbox WHERE id_collection=%s AND ln=%s AND position=%s ORDER BY score DESC, ln, position",
                          (colID, ln, position))
            if res:
                score = int(res[0][0])
            else:
                score = 0
            res = run_sql("INSERT INTO collection_portalbox(id_portalbox, id_collection, ln, score, position) VALUES (%s,%s,%s,%s,%s)",
                          (pbxID, colID, ln, score + 1, position))
return (1, "")
except StandardError, e:
return (0, e)
def add_col_fmt(colID, fmtID, score=''):
"""Add a output format to the collection.
colID - the id of the collection involved
fmtID - the id of the format.
score - the score of the format, decides sorting, if not given, place the format on top"""
    try:
        if score:
            res = run_sql("INSERT INTO collection_format(id_format, id_collection, score) VALUES (%s,%s,%s)",
                          (fmtID, colID, score))
        else:
            res = run_sql("SELECT score FROM collection_format WHERE id_collection=%s ORDER BY score DESC",
                          (colID,))
            if res:
                score = int(res[0][0])
            else:
                score = 0
            res = run_sql("INSERT INTO collection_format(id_format, id_collection, score) VALUES (%s,%s,%s)",
                          (fmtID, colID, score + 1))
return (1, "")
except StandardError, e:
return (0, e)
def add_col_fld(colID, fldID, type, fldvID=''):
"""Add a sort/search/field to the collection.
colID - the id of the collection involved
fldID - the id of the field.
fldvID - the id of the fieldvalue.
type - which type, seo, sew...
score - the score of the format, decides sorting, if not given, place the format on top"""
try:
if fldvID and fldvID not in [-1, "-1"]:
run_sql("DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s and type='%s' and id_fieldvalue is NULL" % (colID, fldID, type))
res = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s and type='%s' ORDER BY score desc" % (colID, fldID, type))
if res:
score = int(res[0][0])
res = run_sql("SELECT score_fieldvalue FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s and type='%s' ORDER BY score_fieldvalue desc" % (colID, fldID, type))
else:
res = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s and type='%s' ORDER BY score desc" % (colID, type))
if res:
score = int(res[0][0]) + 1
else:
score = 1
res = run_sql("SELECT * FROM collection_field_fieldvalue where id_field=%s and id_collection=%s and type='%s' and id_fieldvalue=%s" % (fldID, colID, type, fldvID))
if not res:
run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=score_fieldvalue+1 WHERE id_field=%s AND id_collection=%s and type='%s'" % (fldID, colID, type))
res = run_sql("INSERT INTO collection_field_fieldvalue(id_field, id_fieldvalue, id_collection, type, score, score_fieldvalue) values (%s,%s,%s,'%s',%s,%s)" % (fldID, fldvID, colID, type, score, 1))
else:
return (0, (1, "Already exists"))
else:
res = run_sql("SELECT * FROM collection_field_fieldvalue WHERE id_collection=%s AND type='%s' and id_field=%s and id_fieldvalue is NULL" % (colID, type, fldID))
if res:
return (0, (1, "Already exists"))
else:
run_sql("UPDATE collection_field_fieldvalue SET score=score+1")
res = run_sql("INSERT INTO collection_field_fieldvalue(id_field, id_collection, type, score,score_fieldvalue) values (%s,%s,'%s',%s, 0)" % (fldID, colID, type, 1))
return (1, "")
except StandardError, e:
return (0, e)
def modify_dbquery(colID, dbquery=None):
"""Modify the dbquery of an collection.
colID - the id of the collection involved
dbquery - the new dbquery"""
# BTW, sometimes '' is passed instead of None, so change it to None
if not dbquery:
dbquery = None
try:
res = run_sql("UPDATE collection SET dbquery=%s WHERE id=%s", (dbquery, colID))
return (1, "")
except StandardError, e:
return (0, e)
def modify_pbx(colID, pbxID, sel_ln, score='', position='', title='', body=''):
"""Modify a portalbox
colID - the id of the collection involved
pbxID - the id of the portalbox that should be modified
sel_ln - the language of the portalbox that should be modified
title - the title
body - the content
score - if several portalboxes in one position, who should appear on top.
position - position on page"""
try:
if title:
res = run_sql("UPDATE portalbox SET title=%s WHERE id=%s", (title, pbxID))
if body:
res = run_sql("UPDATE portalbox SET body=%s WHERE id=%s", (body, pbxID))
        if score:
            res = run_sql("UPDATE collection_portalbox SET score=%s WHERE id_collection=%s AND id_portalbox=%s AND ln=%s", (score, colID, pbxID, sel_ln))
        if position:
            res = run_sql("UPDATE collection_portalbox SET position=%s WHERE id_collection=%s AND id_portalbox=%s AND ln=%s", (position, colID, pbxID, sel_ln))
return (1, "")
except Exception, e:
return (0, e)
def switch_fld_score(colID, id_1, id_2):
"""Switch the scores of id_1 and id_2 in collection_field_fieldvalue
colID - collection the id_1 or id_2 is connected to
id_1/id_2 - id field from tables like format..portalbox...
table - name of the table"""
try:
res1 = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s" % (colID, id_1))
res2 = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s" % (colID, id_2))
if res1[0][0] == res2[0][0]:
return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem."))
else:
res = run_sql("UPDATE collection_field_fieldvalue SET score=%s WHERE id_collection=%s and id_field=%s" % (res2[0][0], colID, id_1))
res = run_sql("UPDATE collection_field_fieldvalue SET score=%s WHERE id_collection=%s and id_field=%s" % (res1[0][0], colID, id_2))
return (1, "")
except StandardError, e:
return (0, e)
def switch_fld_value_score(colID, id_1, fldvID_1, fldvID_2):
"""Switch the scores of two field_value
colID - collection the id_1 or id_2 is connected to
id_1/id_2 - id field from tables like format..portalbox...
table - name of the table"""
try:
res1 = run_sql("SELECT score_fieldvalue FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (colID, id_1, fldvID_1))
res2 = run_sql("SELECT score_fieldvalue FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (colID, id_1, fldvID_2))
if res1[0][0] == res2[0][0]:
return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem."))
else:
res = run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=%s WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (res2[0][0], colID, id_1, fldvID_1))
res = run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=%s WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (res1[0][0], colID, id_1, fldvID_2))
return (1, "")
except Exception, e:
return (0, e)
def switch_pbx_score(colID, id_1, id_2, sel_ln):
"""Switch the scores of id_1 and id_2 in the table given by the argument.
colID - collection the id_1 or id_2 is connected to
id_1/id_2 - id field from tables like format..portalbox...
table - name of the table"""
try:
res1 = run_sql("SELECT score FROM collection_portalbox WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (colID, id_1, sel_ln))
res2 = run_sql("SELECT score FROM collection_portalbox WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (colID, id_2, sel_ln))
if res1[0][0] == res2[0][0]:
return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem."))
res = run_sql("UPDATE collection_portalbox SET score=%s WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (res2[0][0], colID, id_1, sel_ln))
res = run_sql("UPDATE collection_portalbox SET score=%s WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (res1[0][0], colID, id_2, sel_ln))
return (1, "")
except Exception, e:
return (0, e)
def switch_score(colID, id_1, id_2, table):
"""Switch the scores of id_1 and id_2 in the table given by the argument.
colID - collection the id_1 or id_2 is connected to
id_1/id_2 - id field from tables like format..portalbox...
table - name of the table"""
    try:
        # Table names cannot be bound as SQL parameters, so only interpolate
        # the table suffix; the values themselves are bound.
        res1 = run_sql("SELECT score FROM collection_%s WHERE id_collection=%%s AND id_%s=%%s" % (table, table), (colID, id_1))
        res2 = run_sql("SELECT score FROM collection_%s WHERE id_collection=%%s AND id_%s=%%s" % (table, table), (colID, id_2))
        if res1[0][0] == res2[0][0]:
            return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the MySQL client to fix the problem."))
        res = run_sql("UPDATE collection_%s SET score=%%s WHERE id_collection=%%s AND id_%s=%%s" % (table, table), (res2[0][0], colID, id_1))
        res = run_sql("UPDATE collection_%s SET score=%%s WHERE id_collection=%%s AND id_%s=%%s" % (table, table), (res1[0][0], colID, id_2))
return (1, "")
except Exception, e:
return (0, e)
def get_detailed_page_tabs(colID=None, recID=None, ln=cdslang):
"""
Returns the complete list of tabs to be displayed in the
detailed record pages.
    Returned structure is a dict with
    - key : last component of the URL that leads to the detailed record tab: http://.../record/74/key
    - values: a dictionary with the following keys:
        - label: *string* label to be printed as tab (not localized here)
        - visible: *boolean* if False, tab should not be shown
        - enabled: *boolean* if False, tab should be disabled
- order: *int* position of the tab in the list of tabs
- ln: language of the tab labels
returns dict
"""
_ = gettext_set_language(ln)
tabs = {'metadata' : {'label': _('Information'), 'visible': False, 'enabled': True, 'order': 1},
'references': {'label': _('References'), 'visible': False, 'enabled': True, 'order': 2},
'citations' : {'label': _('Citations'), 'visible': False, 'enabled': True, 'order': 3},
'comments' : {'label': _('Discussion'), 'visible': False, 'enabled': True, 'order': 4},
'usage' : {'label': _('Usage statistics'), 'visible': False, 'enabled': True, 'order': 5},
'files' : {'label': _('Fulltext'), 'visible': False, 'enabled': True, 'order': 6}
}
res = run_sql("SELECT tabs FROM collectiondetailedrecordpagetabs " + \
"WHERE id_collection='%s'" % colID)
if len(res) > 0:
tabs_state = res[0][0].split(';')
for tab_state in tabs_state:
if tabs.has_key(tab_state):
                tabs[tab_state]['visible'] = True
else:
# no preference set for this collection.
# assume all tabs are displayed
for key in tabs.keys():
tabs[key]['visible'] = True
if not CFG_WEBCOMMENT_ALLOW_COMMENTS and \
not CFG_WEBCOMMENT_ALLOW_REVIEWS:
tabs['comments']['visible'] = False
tabs['comments']['enabled'] = False
if recID is not None:
# Disable references if no references found
bfo = BibFormatObject(recID)
if bfe_references.format(bfo, '', '') == '':
tabs['references']['enabled'] = False
        # Disable citations if no citations found
if len(get_cited_by(recID)) == 0:
# TODO: Also check for cocitations
tabs['citations']['enabled'] = False
# Disable fulltext if no file found
brd = BibRecDocs(recID)
if len(brd.list_bibdocs()) == 0:
tabs['files']['enabled'] = False
tabs[''] = tabs['metadata']
del tabs['metadata']
return tabs
diff --git a/modules/websession/lib/inveniogc.py b/modules/websession/lib/inveniogc.py
index fb1761902..04dd769ab 100644
--- a/modules/websession/lib/inveniogc.py
+++ b/modules/websession/lib/inveniogc.py
@@ -1,483 +1,483 @@
## -*- mode: python; coding: utf-8; -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""
Invenio garbage collector.
"""
__revision__ = "$Id$"
import sys
import datetime
import time
import os
try:
from invenio.dbquery import run_sql
- from invenio.config import logdir, tmpdir, cachedir, \
+ from invenio.config import CFG_LOGDIR, CFG_TMPDIR, CFG_CACHEDIR, \
CFG_WEBSEARCH_RSS_TTL
from invenio.websession_config import CFG_WEBSESSION_NOT_CONFIRMED_EMAIL_ADDRESS_EXPIRE_IN_DAYS
from invenio.bibtask import task_init, task_set_option, task_get_option, \
write_message, write_messages
from invenio.access_control_mailcookie import mail_cookie_gc
from invenio.bibdocfile import BibDoc
except ImportError, e:
print "Error: %s" % (e, )
sys.exit(1)
# configure variables
CFG_MYSQL_ARGUMENTLIST_SIZE = 100
# After how many days to remove obsolete log/err files
CFG_MAX_ATIME_RM_LOG = 28
# After how many days to zip obsolete log/err files
CFG_MAX_ATIME_ZIP_LOG = 7
# After how many days to remove obsolete bibreformat fmt xml files
CFG_MAX_ATIME_RM_FMT = 28
# After how many days to zip obsolete bibreformat fmt xml files
CFG_MAX_ATIME_ZIP_FMT = 7
# After how many days to remove obsolete bibharvest fmt xml files
CFG_MAX_ATIME_RM_OAI = 28
# After how many days to zip obsolete bibharvest fmt xml files
CFG_MAX_ATIME_ZIP_OAI = 7
# After how many days to remove deleted bibdocs
CFG_DELETED_BIBDOC_MAXLIFE = 365*10
# After how many days to remove old cached webjournal files
CFG_WEBJOURNAL_TTL = 7
def gc_exec_command(command):
""" Exec the command logging in appropriate way its output."""
write_message(' %s' % command, verbose=9)
(dummy, output, errors) = os.popen3(command)
write_messages(errors.read())
write_messages(output.read())
def clean_logs():
""" Clean the logs from obsolete files. """
write_message("""CLEANING OF LOG FILES STARTED""")
write_message("- deleting/gzipping bibsched empty/old err/log "
"BibSched files")
vstr = task_get_option('verbose') > 1 and '-v' or ''
gc_exec_command('find %s -name "bibsched_task_*"'
' -size 0c -exec rm %s -f {} \;' \
- % (logdir, vstr))
+ % (CFG_LOGDIR, vstr))
gc_exec_command('find %s -name "bibsched_task_*"'
' -atime +%s -exec rm %s -f {} \;' \
- % (logdir, CFG_MAX_ATIME_RM_LOG, vstr))
+ % (CFG_LOGDIR, CFG_MAX_ATIME_RM_LOG, vstr))
gc_exec_command('find %s -name "bibsched_task_*"'
' -atime +%s -exec gzip %s -9 {} \;' \
- % (logdir, CFG_MAX_ATIME_ZIP_LOG, vstr))
+ % (CFG_LOGDIR, CFG_MAX_ATIME_ZIP_LOG, vstr))
write_message("- deleting/gzipping temporary empty/old "
"BibReformat xml files")
gc_exec_command('find %s -name "rec_fmt_*"'
' -size 0c -exec rm %s -f {} \;' \
- % (tmpdir, vstr))
+ % (CFG_TMPDIR, vstr))
gc_exec_command('find %s -name "rec_fmt_*"'
' -atime +%s -exec rm %s -f {} \;' \
- % (tmpdir, CFG_MAX_ATIME_RM_FMT, vstr))
+ % (CFG_TMPDIR, CFG_MAX_ATIME_RM_FMT, vstr))
gc_exec_command('find %s -name "rec_fmt_*"'
' -atime +%s -exec gzip %s -9 {} \;' \
- % (tmpdir, CFG_MAX_ATIME_ZIP_FMT, vstr))
+ % (CFG_TMPDIR, CFG_MAX_ATIME_ZIP_FMT, vstr))
write_message("- deleting/gzipping temporary old "
"BibHarvest xml files")
gc_exec_command('find %s -name "bibharvestadmin.*"'
' -exec rm %s -f {} \;' \
- % (tmpdir, vstr))
+ % (CFG_TMPDIR, vstr))
gc_exec_command('find %s -name "bibconvertrun.*"'
' -exec rm %s -f {} \;' \
- % (tmpdir, vstr))
+ % (CFG_TMPDIR, vstr))
gc_exec_command('find %s -name "oaiharvest*"'
' -atime +%s -exec gzip %s -9 {} \;' \
- % (tmpdir, CFG_MAX_ATIME_ZIP_OAI, vstr))
+ % (CFG_TMPDIR, CFG_MAX_ATIME_ZIP_OAI, vstr))
gc_exec_command('find %s -name "oaiharvest*"'
' -atime +%s -exec rm %s -f {} \;' \
- % (tmpdir, CFG_MAX_ATIME_RM_OAI, vstr))
+ % (CFG_TMPDIR, CFG_MAX_ATIME_RM_OAI, vstr))
gc_exec_command('find %s -name "oai_archive*"'
' -atime +%s -exec rm %s -f {} \;' \
- % (tmpdir, CFG_MAX_ATIME_RM_OAI, vstr))
+ % (CFG_TMPDIR, CFG_MAX_ATIME_RM_OAI, vstr))
write_message("""CLEANING OF LOG FILES FINISHED""")
def clean_cache():
"""Clean the cache for expired and old files."""
write_message("""CLEANING OF OLD CACHED RSS REQUEST STARTED""")
- rss_cache_dir = "%s/rss/" % cachedir
+ rss_cache_dir = "%s/rss/" % CFG_CACHEDIR
try:
filenames = os.listdir(rss_cache_dir)
except OSError:
filenames = []
count = 0
for filename in filenames:
filename = os.path.join(rss_cache_dir, filename)
last_update_time = datetime.datetime.fromtimestamp(os.stat(os.path.abspath(filename)).st_mtime)
if not (datetime.datetime.now() < last_update_time + datetime.timedelta(minutes=CFG_WEBSEARCH_RSS_TTL)):
try:
os.remove(filename)
count += 1
except OSError, e:
write_message("Error: %s" % e)
write_message("""%s rss cache file pruned out of %s.""" % (count, len(filenames)))
write_message("""CLEANING OF OLD CACHED RSS REQUEST FINISHED""")
write_message("""CLEANING OF OLD CACHED WEBJOURNAL FILES STARTED""")
- webjournal_cache_dir = "%s/webjournal/" % cachedir
+ webjournal_cache_dir = "%s/webjournal/" % CFG_CACHEDIR
try:
filenames = os.listdir(webjournal_cache_dir)
except OSError:
filenames = []
count = 0
for filename in filenames:
filename = os.path.join(webjournal_cache_dir, filename)
last_update_time = datetime.datetime.fromtimestamp(os.stat(os.path.abspath(filename)).st_mtime)
if not (datetime.datetime.now() < last_update_time + datetime.timedelta(days=CFG_WEBJOURNAL_TTL)):
try:
os.remove(filename)
count += 1
except OSError, e:
write_message("Error: %s" % e)
write_message("""%s webjournal cache file pruned out of %s.""" % (count, len(filenames)))
write_message("""CLEANING OF OLD CACHED WEBJOURNAL FILES FINISHED""")
def clean_bibxxx():
"""
Clean unreferenced bibliographic values from bibXXx tables.
This is useful to prettify browse results, as it removes
old, no longer used values.
WARNING: this function must be run only when no bibupload is
running and/or sleeping.
"""
write_message("""CLEANING OF UNREFERENCED bibXXx VALUES STARTED""")
for xx in range(0, 100):
bibxxx = 'bib%02dx' % xx
bibrec_bibxxx = 'bibrec_bib%02dx' % xx
if task_get_option('verbose') >= 9:
num_unref_values = run_sql("""SELECT COUNT(*) FROM %(bibxxx)s
LEFT JOIN %(bibrec_bibxxx)s
ON %(bibxxx)s.id=%(bibrec_bibxxx)s.id_bibxxx
WHERE %(bibrec_bibxxx)s.id_bibrec IS NULL""" % \
{'bibxxx': bibxxx,
'bibrec_bibxxx': bibrec_bibxxx,})[0][0]
run_sql("""DELETE %(bibxxx)s FROM %(bibxxx)s
LEFT JOIN %(bibrec_bibxxx)s
ON %(bibxxx)s.id=%(bibrec_bibxxx)s.id_bibxxx
WHERE %(bibrec_bibxxx)s.id_bibrec IS NULL""" % \
{'bibxxx': bibxxx,
'bibrec_bibxxx': bibrec_bibxxx,})
if task_get_option('verbose') >= 9:
write_message(""" - %d unreferenced %s values cleaned""" % \
(num_unref_values, bibxxx))
write_message("""CLEANING OF UNREFERENCED bibXXx VALUES FINISHED""")
def clean_documents():
"""Delete all the bibdocs that have been set as deleted and have not been
modified since CFG_DELETED_BIBDOC_MAXLIFE days. Returns the number of
bibdocs involved."""
write_message("""CLEANING OF OBSOLETED DELETED DOCUMENTS STARTED""")
write_message("select id from bibdoc where status='DELETED' and NOW()>ADDTIME(modification_date, '%s 0:0:0')" % CFG_DELETED_BIBDOC_MAXLIFE, verbose=9)
records = run_sql("select id from bibdoc where status='DELETED' and NOW()>ADDTIME(modification_date, '%s 0:0:0')" % CFG_DELETED_BIBDOC_MAXLIFE)
for record in records:
bibdoc = BibDoc(record[0])
bibdoc.expunge()
write_message("DELETE FROM bibdoc WHERE id=%i" % int(record[0]), verbose=9)
run_sql("DELETE FROM bibdoc WHERE id=%s", (record[0], ))
write_message("""%s obsoleted deleted documents cleaned""" % len(records))
write_message("""CLEANING OF OBSOLETED DELETED DOCUMENTS FINISHED""")
return len(records)
def guest_user_garbage_collector():
"""Session Garbage Collector
program flow/tasks:
1: delete expired sessions
1b:delete guest users without session
2: delete queries not attached to any user
3: delete baskets not attached to any user
4: delete alerts not attached to any user
5: delete expired mailcookies
5b: delete expired not confirmed email address
6: delete expired roles memberships
verbose - level of program output.
0 - nothing
1 - default
9 - max, debug"""
# dictionary used to keep track of number of deleted entries
delcount = {'session': 0,
'user': 0,
'user_query': 0,
'query': 0,
'bskBASKET': 0,
'user_bskBASKET': 0,
'bskREC': 0,
'bskRECORDCOMMENT': 0,
'bskEXTREC': 0,
'bskEXTFMT': 0,
'user_query_basket': 0,
'mail_cookie': 0,
'email_addresses': 0,
'role_membership' : 0}
write_message("CLEANING OF GUEST SESSIONS STARTED")
# 1 - DELETE EXPIRED SESSIONS
write_message("- deleting expired sessions")
timelimit = time.time()
write_message(" DELETE FROM session WHERE"
" session_expiry < %d \n" % (timelimit, ), verbose=9)
delcount['session'] += run_sql("DELETE FROM session WHERE"
" session_expiry < %s """ % (timelimit, ))
# 1b - DELETE GUEST USERS WITHOUT SESSION
write_message("- deleting guest users without session")
# get uids
write_message(""" SELECT u.id\n FROM user AS u LEFT JOIN session AS s\n ON u.id = s.uid\n WHERE s.uid IS NULL AND u.email = ''""", verbose=9)
result = run_sql("""SELECT u.id
FROM user AS u LEFT JOIN session AS s
ON u.id = s.uid
WHERE s.uid IS NULL AND u.email = ''""")
write_message(result, verbose=9)
if result:
# work on slices of result list in case of big result
for i in range(0, len(result), CFG_MYSQL_ARGUMENTLIST_SIZE):
# create string of uids
uidstr = ''
for (id_user, ) in result[i:i+CFG_MYSQL_ARGUMENTLIST_SIZE]:
if uidstr: uidstr += ','
uidstr += "%s" % (id_user, )
# delete users
write_message(" DELETE FROM user WHERE"
" id IN (TRAVERSE LAST RESULT) AND email = '' \n", verbose=9)
delcount['user'] += run_sql("DELETE FROM user WHERE"
" id IN (%s) AND email = ''" % (uidstr, ))
# 2 - DELETE QUERIES NOT ATTACHED TO ANY USER
# first step, delete from user_query
write_message("- deleting user_queries referencing"
" non-existent users")
# find user_queries referencing non-existent users
write_message(" SELECT DISTINCT uq.id_user\n"
" FROM user_query AS uq LEFT JOIN user AS u\n"
" ON uq.id_user = u.id\n WHERE u.id IS NULL", verbose=9)
result = run_sql("""SELECT DISTINCT uq.id_user
FROM user_query AS uq LEFT JOIN user AS u
ON uq.id_user = u.id
WHERE u.id IS NULL""")
write_message(result, verbose=9)
# delete in user_query one by one
write_message(" DELETE FROM user_query WHERE"
" id_user = 'TRAVERSE LAST RESULT' \n", verbose=9)
for (id_user, ) in result:
delcount['user_query'] += run_sql("""DELETE FROM user_query
WHERE id_user = %s""" % (id_user, ))
# delete the actual queries
write_message("- deleting queries not attached to any user")
# select queries that must be deleted
write_message(""" SELECT DISTINCT q.id\n FROM query AS q LEFT JOIN user_query AS uq\n ON uq.id_query = q.id\n WHERE uq.id_query IS NULL AND\n q.type <> 'p' """, verbose=9)
result = run_sql("""SELECT DISTINCT q.id
FROM query AS q LEFT JOIN user_query AS uq
ON uq.id_query = q.id
WHERE uq.id_query IS NULL AND
q.type <> 'p'""")
write_message(result, verbose=9)
# delete queries one by one
write_message(""" DELETE FROM query WHERE id = 'TRAVERSE LAST RESULT \n""", verbose=9)
for (id_user, ) in result:
delcount['query'] += run_sql("""DELETE FROM query WHERE id = %s""", (id_user, ))
# 3 - DELETE BASKETS NOT OWNED BY ANY USER
write_message("- deleting baskets not owned by any user")
# select basket ids
write_message(""" SELECT ub.id_bskBASKET\n FROM user_bskBASKET AS ub LEFT JOIN user AS u\n ON u.id = ub.id_user\n WHERE u.id IS NULL""", verbose=9)
try:
result = run_sql("""SELECT ub.id_bskBASKET
FROM user_bskBASKET AS ub LEFT JOIN user AS u
ON u.id = ub.id_user
WHERE u.id IS NULL""")
except:
result = []
write_message(result, verbose=9)
# delete from user_basket and basket one by one
write_message(""" DELETE FROM user_bskBASKET WHERE id_bskBASKET = 'TRAVERSE LAST RESULT' """, verbose=9)
write_message(""" DELETE FROM bskBASKET WHERE id = 'TRAVERSE LAST RESULT' """, verbose=9)
write_message(""" DELETE FROM bskREC WHERE id_bskBASKET = 'TRAVERSE LAST RESULT'""", verbose=9)
write_message(""" DELETE FROM bskRECORDCOMMENT WHERE id_bskBASKET = 'TRAVERSE LAST RESULT' \n""", verbose=9)
for (id_basket, ) in result:
delcount['user_bskBASKET'] += run_sql("""DELETE FROM user_bskBASKET WHERE id_bskBASKET = %s""", (id_basket, ))
delcount['bskBASKET'] += run_sql("""DELETE FROM bskBASKET WHERE id = %s""", (id_basket, ))
delcount['bskREC'] += run_sql("""DELETE FROM bskREC WHERE id_bskBASKET = %s""", (id_basket, ))
delcount['bskRECORDCOMMENT'] += run_sql("""DELETE FROM bskRECORDCOMMENT WHERE id_bskBASKET = %s""", (id_basket, ))
write_message(""" SELECT DISTINCT ext.id, rec.id_bibrec_or_bskEXTREC FROM bskEXTREC AS ext \nLEFT JOIN bskREC AS rec ON ext.id=-rec.id_bibrec_or_bskEXTREC WHERE id_bibrec_or_bskEXTREC is NULL""", verbose=9)
try:
result = run_sql("""SELECT DISTINCT ext.id FROM bskEXTREC AS ext
LEFT JOIN bskREC AS rec ON ext.id=-rec.id_bibrec_or_bskEXTREC
WHERE id_bibrec_or_bskEXTREC is NULL""")
except:
result = []
write_message(result, verbose=9)
write_message(""" DELETE FROM bskEXTREC WHERE id = 'TRAVERSE LAST RESULT' """, verbose=9)
write_message(""" DELETE FROM bskEXTFMT WHERE id_bskEXTREC = 'TRAVERSE LAST RESULT' \n""", verbose=9)
for (id_basket, ) in result:
delcount['bskEXTREC'] += run_sql("""DELETE FROM bskEXTREC WHERE id=%s""", (id_basket,))
delcount['bskEXTFMT'] += run_sql("""DELETE FROM bskEXTFMT WHERE id_bskEXTREC=%s""", (id_basket,))
# 4 - DELETE ALERTS NOT OWNED BY ANY USER
write_message('- deleting alerts not owned by any user')
# select user ids in uqb that reference non-existent users
write_message("""SELECT DISTINCT uqb.id_user FROM user_query_basket AS uqb LEFT JOIN user AS u ON uqb.id_user = u.id WHERE u.id IS NULL""", verbose=9)
result = run_sql("""SELECT DISTINCT uqb.id_user FROM user_query_basket AS uqb LEFT JOIN user AS u ON uqb.id_user = u.id WHERE u.id IS NULL""")
write_message(result, verbose=9)
# delete all these entries
for (id_user, ) in result:
write_message("""DELETE FROM user_query_basket WHERE id_user = 'TRAVERSE LAST RESULT """, verbose=9)
delcount['user_query_basket'] += run_sql("""DELETE FROM user_query_basket WHERE id_user = %s """, (id_user, ))
# 5 - delete expired mailcookies
write_message("""mail_cookie_gc()""", verbose=9)
delcount['mail_cookie'] = mail_cookie_gc()
## 5b - delete expired not confirmed email address
write_message("""DELETE FROM user WHERE note='2' AND NOW()>ADDTIME(last_login, '%s 0:0:0')""" % CFG_WEBSESSION_NOT_CONFIRMED_EMAIL_ADDRESS_EXPIRE_IN_DAYS, verbose=9)
delcount['email_addresses'] = run_sql("""DELETE FROM user WHERE note='2' AND NOW()>ADDTIME(last_login, '%s 0:0:0')""" % CFG_WEBSESSION_NOT_CONFIRMED_EMAIL_ADDRESS_EXPIRE_IN_DAYS)
# 6 - delete expired roles memberships
write_message("""DELETE FROM user_accROLE WHERE expiration',
'x_url_close': ''}
accBody += "
"
bask=aler=msgs= _("The %(x_fmt_open)sguest%(x_fmt_close)s users need to %(x_url_open)sregister%(x_url_close)s first") %\
{'x_fmt_open': '',
'x_fmt_close': '',
'x_url_open': '',
'x_url_close': ''}
sear= _("No queries found")
else:
user = username
accBody = websession_templates.tmpl_account_body(
ln = ln,
user = user,
)
return websession_templates.tmpl_account_page(
ln = ln,
weburl = weburl,
accBody = accBody,
baskets = bask,
alerts = aler,
searches = sear,
messages = msgs,
groups = grps,
administrative = perform_youradminactivities(user_info, ln)
)
def template_account(title, body, ln):
"""It is a template for print each of the options from the user's account."""
return websession_templates.tmpl_account_template(
ln = ln,
title = title,
body = body
)
def warning_guest_user(type, ln=cdslang):
"""It returns an alert message,showing that the user is a guest user and should log into the system."""
# load the right message language
_ = gettext_set_language(ln)
return websession_templates.tmpl_warning_guest_user(
ln = ln,
type = type,
)
def perform_delete(ln):
"""Delete the account of the user, not implement yet."""
# TODO
return websession_templates.tmpl_account_delete(ln = ln)
def perform_set(email, ln, verbose=0):
"""Perform_set(email,password): edit your account parameters, email and
password.
"""
try:
res = run_sql("SELECT id, nickname FROM user WHERE email=%s", (email,))
uid = res[0][0]
nickname = res[0][1]
except:
uid = 0
nickname = ""
CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS
prefs = get_user_preferences(uid)
if CFG_EXTERNAL_AUTHENTICATION.has_key(prefs['login_method']) and CFG_EXTERNAL_AUTHENTICATION[prefs['login_method']][0]:
CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL = 3
out = websession_templates.tmpl_user_preferences(
ln = ln,
email = email,
email_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL >= 2),
password_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL >= 3),
nickname = nickname,
)
if len(CFG_EXTERNAL_AUTHENTICATION) > 1:
try:
uid = run_sql("SELECT id FROM user where email=%s", (email,))
uid = uid[0][0]
except:
uid = 0
current_login_method = prefs['login_method']
methods = CFG_EXTERNAL_AUTHENTICATION.keys()
        # Filter out methods that don't provide user_exists, so we can check
        # that a user exists in the external auth method before letting
        # him/her switch. Iterate over a copy, since we mutate the list.
        for method in methods[:]:
            if CFG_EXTERNAL_AUTHENTICATION[method][0]:
                try:
                    if not CFG_EXTERNAL_AUTHENTICATION[method][0].user_exists(email):
                        methods.remove(method)
                except (AttributeError, InvenioWebAccessExternalAuthError):
                    methods.remove(method)
methods.sort()
if len(methods) > 1:
out += websession_templates.tmpl_user_external_auth(
ln = ln,
methods = methods,
current = current_login_method,
method_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 4)
)
try:
current_group_records = prefs['websearch_group_records']
except KeyError:
current_group_records = 10
try:
show_latestbox = prefs['websearch_latestbox']
except KeyError:
show_latestbox = True
try:
show_helpbox = prefs['websearch_helpbox']
except KeyError:
show_helpbox = True
out += websession_templates.tmpl_user_websearch_edit(
ln = ln,
current = current_group_records,
show_latestbox = show_latestbox,
show_helpbox = show_helpbox,
)
if verbose >= 9:
for key, value in prefs.items():
out += "%s:%s " % (key, value)
out += perform_display_external_user_settings(prefs, ln)
return out
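A side note on the login-method filtering loop above: removing items from a Python list while iterating directly over it skips elements, so the loop should walk a shallow copy. A minimal, self-contained sketch with illustrative method names (not Invenio code):

```python
# Removing entries while looping: iterate over a copy (methods[:]),
# mutate the original list.
methods = ["ldap", "sso", "nice", "internal"]
for method in methods[:]:          # methods[:] is a shallow copy
    if method != "internal":       # pretend these fail user_exists()
        methods.remove(method)
print(methods)                     # -> ['internal']
```

Iterating over `methods` itself would skip every second removable entry, because removal shifts the remaining items left under the iterator.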
def create_register_page_box(referer='', ln=cdslang):
"""Register a new account."""
return websession_templates.tmpl_register_page(
referer = referer,
ln = ln,
level = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS,
supportemail = supportemail,
cdsname = cdsname
)
## create_login_page_box(): ask for the user's email and password, for login into the system
def create_login_page_box(referer='', apache_msg="", ln=cdslang):
# List of (referer regexp, message to print) pairs
_ = gettext_set_language(ln)
login_referrer2msg = (
(re.compile(r"/search"), "<p>" + _("This collection is restricted.  If you think you have the right to access it, please authenticate yourself.") + "</p>"),
)
msg = ""
for regexp, txt in login_referrer2msg:
if regexp.search(referer):
msg = txt
break
# FIXME: Temporary Hack to help CDS current migration
if CFG_CERN_SITE and apache_msg:
return msg + apache_msg
if apache_msg:
msg += apache_msg + "<p>Otherwise, please provide the correct" \
" authorization data in the following form.</p>"
internal = None
for system in CFG_EXTERNAL_AUTHENTICATION.keys():
if not CFG_EXTERNAL_AUTHENTICATION[system][0]:
internal = system
break
register_available = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS <= 1 and internal
methods = CFG_EXTERNAL_AUTHENTICATION.keys()
methods.sort()
selected = ''
for method in methods:
if CFG_EXTERNAL_AUTHENTICATION[method][1]:
selected = method
break
return websession_templates.tmpl_login_form(
ln = ln,
referer = referer,
internal = internal,
register_available = register_available,
methods = methods,
selected_method = selected,
supportemail = supportemail,
msg = msg,
)
# perform_logout: display the message that the user is no longer authorized
def perform_logout(req, ln):
return websession_templates.tmpl_account_logout(ln = ln)
# perform_lost: ask the user for his email, in order to send him a password reset link
def perform_lost(ln):
return websession_templates.tmpl_lost_password_form(ln)
# perform_reset_password: ask the user for a new password to replace the lost one
def perform_reset_password(ln, email, reset_key, msg=''):
return websession_templates.tmpl_reset_password_form(ln, email, reset_key, msg)
# perform_emailSent(email): confirm that a password reset link has been emailed to the 'email' address
def perform_emailSent(email, ln):
return websession_templates.tmpl_account_emailSent(ln = ln, email = email)
# perform_emailMessage: display an error message when the email entered is not correct, and suggest trying again
def perform_emailMessage(eMsg, ln):
return websession_templates.tmpl_account_emailMessage( ln = ln,
msg = eMsg
)
# perform_back(): template for returning to a previous page, used for login, register and settings
def perform_back(mess,act,linkname='', ln='en'):
if not linkname:
linkname = act
return websession_templates.tmpl_back_form(
ln = ln,
message = mess,
act = act,
link = linkname,
)
diff --git a/modules/websession/lib/webgroup_dblayer.py b/modules/websession/lib/webgroup_dblayer.py
index 257a09b47..628c7abbf 100644
--- a/modules/websession/lib/webgroup_dblayer.py
+++ b/modules/websession/lib/webgroup_dblayer.py
@@ -1,442 +1,442 @@
# -*- coding: utf-8 -*-
##
## $Id$
##
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
""" Database related functions for groups"""
__revision__ = "$Id$"
from time import localtime
from zlib import decompress
from invenio.config import \
cdslang, \
- version
+ CFG_VERSION
from invenio.dbquery import run_sql
from invenio.dateutils import convert_datestruct_to_datetext
from invenio.messages import gettext_set_language
from invenio.websession_config import CFG_WEBSESSION_GROUP_JOIN_POLICY
def get_groups_by_user_status(uid, user_status, login_method='INTERNAL'):
"""Select all the groups the user is admin of.
@param uid: user id
@return ((id_usergroup,
group_name,
group_description, ))
"""
query = """SELECT g.id,
g.name,
g.description
FROM usergroup g, user_usergroup ug
WHERE ug.id_user=%s AND
ug.id_usergroup=g.id AND
ug.user_status=%s AND
g.login_method = %s
ORDER BY g.name"""
uid = int(uid)
res = run_sql(query, (uid, user_status, login_method))
return res
def get_groups_by_login_method(uid, login_method):
"""Select all the groups the user is member of selecting the login_method.
@param uid: user id
@param login_method: the login_method (>0 external)
@return ((id_usergroup,
group_name,
group_description, ))
"""
query = """SELECT g.id,
g.name,
g.description
FROM usergroup g, user_usergroup ug
WHERE ug.id_user=%s AND
ug.id_usergroup=g.id AND
g.login_method=%s
ORDER BY g.name"""
uid = int(uid)
res = run_sql(query, (uid, login_method))
return res
def get_groups_with_description(uid):
"""Select all the groups the user is member of.
@param uid: user id
@return ((id_usergroup,
group_name,
group_description, ))
"""
query = """SELECT g.id,
g.name,
g.description
FROM usergroup g, user_usergroup ug
WHERE ug.id_user=%s AND
ug.id_usergroup=g.id
ORDER BY g.name"""
uid = int(uid)
res = run_sql(query, (uid, ))
return res
def get_external_groups(uid):
"""Select all the groups the user is member of selecting the login_method.
@param uid: user id
@param login_method: the login_method (>0 external)
@return ((id_usergroup,
group_name,
group_description, ))
"""
query = """SELECT g.id,
g.name,
g.description
FROM usergroup g, user_usergroup ug
WHERE ug.id_user=%s AND
ug.id_usergroup=g.id AND
g.login_method != 'INTERNAL'
ORDER BY g.name"""
uid = int(uid)
res = run_sql(query, (uid, ))
return res
def get_groups(uid):
"""Select all the groups id the user is member of."""
query = """SELECT g.id, g.name
FROM usergroup g, user_usergroup ug
WHERE ug.id_user=%s AND
ug.id_usergroup=g.id
"""
res = run_sql(query, (uid, ))
res = list(res)
return res
def get_group_id(group_name, login_method):
"""@return the id of the group called group_name with given login_method."""
return run_sql("""
SELECT id FROM usergroup
WHERE login_method = %s AND name = %s""", (login_method, group_name,))
def get_login_method_groups(uid, login_method):
"""Select all the external groups of a particular login_method for which
the user is subscrided.
@return ((group_name, group_id))
"""
return run_sql("""
SELECT g.name as name, g.id as id
FROM user_usergroup as u JOIN usergroup as g
ON u.id_usergroup = g.id
WHERE u.id_user = %s and g.login_method = %s""",
(uid, login_method,))
def get_all_login_method_groups(login_method):
"""Select all the external groups of a particular login_method.
@return ({group_name: group_id, ...})
"""
return dict(run_sql("""
SELECT name, id
FROM usergroup
WHERE login_method = %s""",
(login_method,)))
def get_all_users_with_groups_with_login_method(login_method):
"""Select all the users that belong at least to one external group
of kind login_method.
"""
return dict(run_sql("""
SELECT DISTINCT u.email, u.id
FROM user AS u JOIN user_usergroup AS uu ON u.id = uu.id_user
JOIN usergroup AS ug ON ug.id = uu.id_usergroup
WHERE ug.login_method = %s""", (login_method,)))
def get_visible_group_list(uid, pattern=""):
"""List the group the user can join (not already member
of the group regardless user's status).
@return groups {id : name} whose name matches pattern
"""
grpID = []
groups = {}
# list the groups the user is already a member of
query = """SELECT distinct(id_usergroup)
FROM user_usergroup
WHERE id_user=%i """
uid = int(uid)
query %= uid
res = run_sql(query)
map(lambda x: grpID.append(int(x[0])), res)
query2 = """SELECT id,name
FROM usergroup
WHERE (join_policy='%s' OR join_policy='%s')""" % (
CFG_WEBSESSION_GROUP_JOIN_POLICY['VISIBLEOPEN'],
CFG_WEBSESSION_GROUP_JOIN_POLICY['VISIBLEMAIL'])
if len(grpID) == 1 :
query2 += """ AND id!=%i""" % grpID[0]
elif len(grpID) > 1:
query2 += """ AND id NOT IN %s""" % str(tuple(grpID))
if pattern:
res2 = run_sql(query2 + """ AND name RLIKE %s ORDER BY name""", (pattern,))
else:
res2 = run_sql(query2 + """ ORDER BY name""")
for (gid, name) in res2:
groups.setdefault(gid, name)
return groups
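The `id NOT IN %s` clause above is built by interpolating `str(tuple(grpID))` into the query, which is why one-element lists need a special case (`str((5,))` yields the SQL-invalid `"(5,)"`). An alternative sketch (illustrative, not Invenio code) generates one `%s` placeholder per id, so the ids can be passed as bound parameters of `run_sql` instead:

```python
# Build an SQL "(%s, %s, ...)" fragment with one placeholder per value,
# suitable for "... AND id NOT IN " + in_clause(grpID) with params tuple(grpID).
def in_clause(ids):
    """Return an SQL fragment with one %s placeholder per id."""
    return "(%s)" % ", ".join(["%s"] * len(ids))

print(in_clause([3, 5, 8]))   # -> (%s, %s, %s)
print(in_clause([5]))         # -> (%s)
```

This removes both the trailing-comma special case and the direct string interpolation of values into SQL.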
def insert_new_group(uid,
new_group_name,
new_group_description,
join_policy,
login_method='INTERNAL'):
"""Create a new group and affiliate a user."""
query1 = """INSERT INTO usergroup (id, name, description, join_policy,
login_method)
VALUES (NULL,%s,%s,%s,%s)
"""
params1 = (new_group_name,
new_group_description,
join_policy,
login_method)
res1 = run_sql(query1, params1)
date = convert_datestruct_to_datetext(localtime())
uid = int(uid)
query2 = """INSERT INTO user_usergroup (id_user, id_usergroup, user_status,
user_status_date)
VALUES (%s,%s,'A',%s)
"""
params2 = (uid, res1, date)
res2 = run_sql(query2, params2)
return res1
def insert_only_new_group(new_group_name,
new_group_description,
join_policy,
login_method='INTERNAL'):
"""Create a group with no user in (yet).
@return its id
"""
query = """INSERT INTO usergroup (name, description, join_policy, login_method)
VALUES (%s, %s, %s, %s)
"""
res = run_sql(query, (new_group_name, new_group_description, join_policy, login_method))
return res
def insert_new_member(uid,
grpID,
status):
"""Insert new member."""
query = """INSERT INTO user_usergroup (id_user, id_usergroup, user_status,
user_status_date)
VALUES (%s,%s,%s,%s)
"""
date = convert_datestruct_to_datetext(localtime())
res = run_sql(query, (uid, grpID, status, date))
return res
def get_group_infos(grpID):
"""Get group infos."""
query = """SELECT id,name,description,join_policy,login_method FROM usergroup
WHERE id = %s"""
grpID = int(grpID)
res = run_sql(query, (grpID, ))
return res
def get_all_groups_description(login_method):
"""Get all groups description, dictionary with key name."""
query = """SELECT name, description
FROM usergroup
WHERE login_method = %s
"""
res = run_sql(query, (login_method, ))
if res:
return dict(res)
else:
return {}
def update_group_infos(grpID,
group_name,
group_description,
join_policy):
"""Update group."""
res = run_sql("""UPDATE usergroup
SET name=%s, description=%s, join_policy=%s
WHERE id=%s""",
(group_name, group_description, join_policy, grpID))
return res
def get_user_status(uid, grpID):
"""Get the status of the user for the given group."""
query = """SELECT user_status FROM user_usergroup
WHERE id_user = %i
AND id_usergroup=%i"""
uid = int(uid)
grpID = int(grpID)
res = run_sql(query% (uid, grpID))
return res
def get_users_by_status(grpID, status, ln=cdslang):
"""Get the list of users with the given status.
@return ((id, nickname), ); nickname is "user#uid" if
the user has no nickname
"""
_ = gettext_set_language(ln)
res = run_sql("""SELECT ug.id_user, u.nickname
FROM user_usergroup ug, user u
WHERE ug.id_usergroup = %s
AND ug.id_user=u.id
AND user_status = %s""",
(grpID, status))
users = []
if res:
for (mid, nickname) in res:
nn = nickname
if not nickname:
nn = _("user") + "#%i" % mid
users.append((mid, nn))
return tuple(users)
def delete_member(grpID, member_id):
"""Delete member."""
query = """DELETE FROM user_usergroup
WHERE id_usergroup = %i
AND id_user = %i"""
grpID = int(grpID)
member_id = int(member_id)
res = run_sql(query% (grpID, member_id))
return res
def delete_group_and_members(grpID):
"""Delete the group and its members."""
query = """DELETE FROM usergroup
WHERE id = %i
"""
grpID = int(grpID)
res = run_sql(query% grpID)
query = """DELETE FROM user_usergroup
WHERE id_usergroup = %i
"""
res = run_sql(query% grpID)
return res
def add_pending_member(grpID, member_id, user_status):
"""Change user status:
Pending member becomes normal member"""
date = convert_datestruct_to_datetext(localtime())
res = run_sql("""UPDATE user_usergroup
SET user_status = %s, user_status_date = %s
WHERE id_usergroup = %s
AND id_user = %s""",
(user_status, date, grpID, member_id))
return res
def leave_group(grpID, uid):
"""Remove user from the group member list."""
query = """DELETE FROM user_usergroup
WHERE id_usergroup=%i
AND id_user=%i"""
grpID = int(grpID)
uid = int(uid)
res = run_sql(query% (grpID, uid))
return res
def drop_external_groups(userId):
"""Drops all the external groups memberships of userid."""
query = """DELETE user_usergroup FROM user_usergroup, usergroup
WHERE user_usergroup.id_user=%s
AND usergroup.id = user_usergroup.id_usergroup
AND usergroup.login_method <> 'INTERNAL'"""
return run_sql(query, (userId,))
def group_name_exist(group_name, login_method='INTERNAL'):
"""Get all group id whose name like group_name and login_method."""
query = """SELECT id
FROM usergroup
WHERE login_method=%s AND name=%s"""
res = run_sql(query, (login_method, group_name,))
return res
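Positional `%s` placeholders in `run_sql` queries are paired with the params tuple strictly by order, so the tuple must list values in the same order as the placeholders appear. A tiny illustration (not Invenio code; quoting is naively simplified) of MySQLdb-style positional binding:

```python
# Naive stand-in for how a DB driver pairs %s placeholders with params,
# purely to show that the pairing is positional (quoting omitted).
def bind(query, params):
    return query % tuple(repr(p) for p in params)

query = "SELECT id FROM usergroup WHERE login_method=%s AND name=%s"
# Correct: params listed in placeholder order.
bound = bind(query, ("INTERNAL", "editors"))
print(bound)
# Swapping the params would silently search for login_method='editors'.
```

A swapped tuple produces a syntactically valid query that just never matches, which is why such bugs are easy to miss.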
def get_group_login_method(grpID):
"""Return the login_method of the group or None if the grpID doesn't exist."""
query = """SELECT login_method
FROM usergroup
WHERE id=%s"""
res = run_sql(query, (grpID, ))
if res:
return res[0][0]
else:
return None
def count_nb_group_user(uid, user_status):
"""
@param uid: user id
@param user_status: member status
@return number of groups the user belongs to
with the given status, 0 if none
"""
res = run_sql("""SELECT count(id_user)
FROM user_usergroup
WHERE id_user = %s
AND user_status = %s""",
(uid, user_status))
if res:
return int(res[0][0])
else:
return 0
def get_all_users():
"""@return all the email:id"""
query = """SELECT UPPER(email), id
FROM user
WHERE email != ''
"""
res = run_sql(query)
if res:
return dict(res)
else:
return {}
def get_users_in_group(grpID):
"""@return all uids of users belonging to group grpID"""
grpID = int(grpID)
query = """SELECT id_user
FROM user_usergroup
WHERE id_usergroup = %s
"""
res = run_sql(query, (grpID, ))
return [uid[0] for uid in res]
########################## helpful functions ##################################
def __decompress_last(item):
"""private function, used to shorten code"""
item = list(item)
item[-1] = decompress(item[-1])
return item
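The helper above inflates the last column of a result row, since some blobs are stored zlib-compressed in the database. A self-contained round-trip sketch of the same operation (illustrative values; shown with Python 3 bytes semantics):

```python
# Store a compressed blob as the last column of a row, then inflate it
# the way __decompress_last() does.
from zlib import compress, decompress

row = (42, "my-basket", compress(b"serialized basket content"))
item = list(row)                 # rows are tuples; copy to a mutable list
item[-1] = decompress(item[-1])  # inflate only the last column
print(item)                      # -> [42, 'my-basket', b'serialized basket content']
```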
diff --git a/modules/websession/lib/websession_templates.py b/modules/websession/lib/websession_templates.py
index a5c2a3fd3..c00792186 100644
--- a/modules/websession/lib/websession_templates.py
+++ b/modules/websession/lib/websession_templates.py
@@ -1,2152 +1,2151 @@
## $Id$
## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
__revision__ = "$Id$"
import urllib
import time
import cgi
import gettext
import string
import locale
from invenio.config import \
CFG_CERN_SITE, \
- bibformat, \
cdslang, \
cdsname, \
cdsnameintl, \
supportemail, \
sweburl, \
- version, \
+ CFG_VERSION, \
weburl
from invenio.access_control_config import CFG_EXTERNAL_AUTH_USING_SSO, \
CFG_EXTERNAL_AUTH_LOGOUT_SSO
from invenio.websession_config import \
CFG_WEBSESSION_RESET_PASSWORD_EXPIRE_IN_DAYS, \
CFG_WEBSESSION_ADDRESS_ACTIVATION_EXPIRE_IN_DAYS
from invenio.urlutils import make_canonical_urlargd
from invenio.messages import gettext_set_language
from invenio.websession_config import CFG_WEBSESSION_GROUP_JOIN_POLICY
class Template:
def tmpl_back_form(self, ln, message, act, link):
"""
A standard one-message-go-back-link page.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'message' *string* - The message to display
- 'act' *string* - The action to accomplish when going back
- 'link' *string* - The link text
"""
out = """
""" % (key, value)
return out
def tmpl_external_user_settings(self, ln, html_settings):
_ = gettext_set_language(ln)
out = """
%(external_user_settings)s
%(html_settings)s
%(external_user_groups)s
%(consult_external_groups)s
""" % {
'external_user_settings' : _('External account settings'),
'html_settings' : html_settings,
'consult_external_groups' : _('You can consult the list of your external groups directly in the %(x_url_open)sgroups page%(x_url_close)s.') % {
'x_url_open' : '<a href="../yourgroups/display?ln=%s">' % ln,
'x_url_close' : '</a>'
},
'external_user_groups' : _('External user groups'),
}
return out
def tmpl_user_preferences(self, ln, email, email_disabled, password_disabled, nickname):
"""
Displays a form for the user to change his email/password.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'email' *string* - The email of the user
- 'email_disabled' *boolean* - If the user has the right to edit his email
- 'password_disabled' *boolean* - If the user has the right to edit his password
- 'nickname' *string* - The nickname of the user (empty string if user does not have it)
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
""" % {
'change_user' : _("If you want to change your email or set for the first time your nickname, please set new values in the form below."),
'edit_params' : _("Edit login credentials"),
'nickname_label' : _("Nickname"),
'nickname' : nickname,
'nickname_prefix' : nickname=='' and ' '+_("Example")+':johnd' or '',
'new_email' : _("New email address"),
'mandatory' : _("mandatory"),
'example' : _("Example"),
'note' : _("Note"),
'set_values' : _("Set new values"),
'email' : email,
'email_disabled' : email_disabled and "readonly" or "",
'sweburl': sweburl,
'fixed_nickname_note' : _('Since this is considered as a signature for comments and reviews, once set it can not be changed.')
}
if not password_disabled and not CFG_EXTERNAL_AUTH_USING_SSO:
out += """
%(change_pass)s
%(old_password)s:
(%(mandatory)s)
%(note)s:
%(old_password_note)s
%(new_password)s:
(%(optional)s)
%(note)s:
%(password_note)s
%(retype_password)s:
""" % {
'change_pass' : _("If you want to change your password, please enter the old one and set the new value in the form below."),
'mandatory' : _("mandatory"),
'old_password' : _("Old password"),
'new_password' : _("New password"),
'optional' : _("optional"),
'note' : _("Note"),
'password_note' : _("The password phrase may contain punctuation, spaces, etc."),
'old_password_note' : _("You must fill the old password in order to set a new one."),
'retype_password' : _("Retype password"),
'set_values' : _("Set new password"),
'password_disabled' : password_disabled and "disabled" or "",
'sweburl': sweburl,
}
elif not CFG_EXTERNAL_AUTH_USING_SSO and CFG_CERN_SITE:
out += "
"
elif CFG_EXTERNAL_AUTH_USING_SSO and CFG_CERN_SITE:
out += "<p>" + _("""You can change or reset your CERN account password by means of the %(x_url_open)sCERN account system%(x_url_close)s.""") % \
{'x_url_open' : '', 'x_url_close' : ''} + "</p>"
return out
def tmpl_user_websearch_edit(self, ln, current = 10, show_latestbox = True, show_helpbox = True):
_ = gettext_set_language(ln)
out = """
%(edit_websearch_settings)s
%(show_latestbox)s
%(show_helpbox)s
%(select_group_records)s
""" % {
'update_settings' : _("Update settings"),
'select_group_records' : _("Number of search results per page"),
}
return out
def tmpl_user_external_auth(self, ln, methods, current, method_disabled):
"""
Displays a form for the user to change his authentication method.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'methods' *array* - The methods of authentication
- 'method_disabled' *boolean* - If the user has the right to change this
- 'current' *string* - The currently selected method
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
%(edit_method)s
%(explain_method)s:
%(select_method)s:
""" % {
'edit_method' : _("Edit login method"),
'explain_method' : _("Please select which login method you would like to use to authenticate yourself"),
'select_method' : _("Select method"),
'sweburl': sweburl,
}
for system in methods:
out += """%(system)s """ % {
'system' : system,
'disabled' : method_disabled and 'disabled="disabled"' or "",
'selected' : current == system and 'checked="checked"' or "",
}
out += """
""" % {
'select_method' : _("Select method"),
}
return out
def tmpl_lost_password_form(self, ln):
"""
Displays a form for the user to ask for his password sent by email.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'msg' *string* - Explicative message on top of the form.
"""
# load the right message language
_ = gettext_set_language(ln)
out = "
" + _("If you have lost the password for your %(cdsname)s %(x_fmt_open)sinternal account%(x_fmt_close)s, then please enter your email address in the following form in order to have a password reset link emailed to you.") % {'x_fmt_open' : '', 'x_fmt_close' : '', 'cdsname' : cdsnameintl[ln]} + "
" + _("Note that if you have been using an external login system, then we cannot do anything and you have to ask there.") + " "
out += _("Alternatively, you can ask %s to change your login system from external to internal.") % ("""%(email)s""" % { 'email' : supportemail }) + "
"
return out
def tmpl_account_info(self, ln, uid, guest, CFG_CERN_SITE):
"""
Displays the account information
Parameters:
- 'ln' *string* - The language to display the interface in
- 'uid' *string* - The user id
- 'guest' *boolean* - If the user is guest
- 'CFG_CERN_SITE' *boolean* - If the site is a CERN site
"""
# load the right message language
_ = gettext_set_language(ln)
out = """
%(account_offer)s
""" % {
'account_offer' : _("%s offers you the possibility to personalize the interface, to set up your own personal library of documents, or to set up an automatic alert query that would run periodically and would notify you of search results by email.") % cdsnameintl[ln],
}
if not guest:
out += """
""" % {
'ln' : ln,
'your_settings' : _("Your Settings"),
'change_account' : _("Set or change your account email address or password. Specify your preferences about the look and feel of the interface.")
}
out += """
%(basket_explain)s""" % {
'ln' : ln,
'your_searches' : _("Your Searches"),
'search_explain' : _("View all the searches you performed during the last 30 days."),
'your_baskets' : _("Your Baskets"),
'basket_explain' : _("With baskets you can define specific collections of items, store interesting records you want to access later or share with others."),
}
if guest:
out += self.tmpl_warning_guest_user(ln = ln, type = "baskets")
out += """
%(explain_alerts)s""" % {
'ln' : ln,
'your_alerts' : _("Your Alerts"),
'explain_alerts' : _("Subscribe to a search which will be run periodically by our service. The result can be sent to you via Email or stored in one of your baskets."),
}
if guest:
out += self.tmpl_warning_guest_user(type="alerts", ln = ln)
out += "
""" % {
'your_loans' : _("Your Loans"),
'explain_loans' : _("Check out the books you have on loan, submit borrowing requests, etc. Requires CERN ID."),
}
out += """
"""
return out
def tmpl_warning_guest_user(self, ln, type):
"""
Displays a warning message about the specified type
Parameters:
- 'ln' *string* - The language to display the interface in
- 'type' *string* - The type of data that will get lost in case of guest account (for the moment: 'alerts' or 'baskets')
"""
# load the right message language
_ = gettext_set_language(ln)
if (type=='baskets'):
msg = _("You are logged in as a guest user, so your baskets will disappear at the end of the current session.") + ' '
elif (type=='alerts'):
msg = _("You are logged in as a guest user, so your alerts will disappear at the end of the current session.") + ' '
msg += _("If you wish you can %(x_url_open)slogin or register here%(x_url_close)s.") % {'x_url_open': '',
'x_url_close': ''}
return """
%s
""" % msg
def tmpl_account_body(self, ln, user):
"""
Displays the body of the actions of the user
Parameters:
- 'ln' *string* - The language to display the interface in
- 'user' *string* - The username (nickname or email)
"""
# load the right message language
_ = gettext_set_language(ln)
out = _("You are logged in as %(x_user)s. You may want to a) %(x_url1_open)slogout%(x_url1_close)s; b) edit your %(x_url2_open)saccount settings%(x_url2_close)s.") %\
{'x_user': user,
'x_url1_open': '',
'x_url1_close': '',
'x_url2_open': '',
'x_url2_close': '',
}
return out + "
"
def tmpl_account_template(self, title, body, ln, url):
"""
Displays a block of the your account page
Parameters:
- 'ln' *string* - The language to display the interface in
- 'title' *string* - The title of the block
- 'body' *string* - The body of the block
- 'url' *string* - The URL to go to the proper section
"""
out ="""
""" % (url, title, body)
return out
def tmpl_account_page(self, ln, weburl, accBody, baskets, alerts, searches, messages, groups, administrative):
"""
Displays the your account page
Parameters:
- 'ln' *string* - The language to display the interface in
- 'weburl' *string* - The URL of CDS Invenio
- 'accBody' *string* - The body of the heading block
- 'baskets' *string* - The body of the baskets block
- 'alerts' *string* - The body of the alerts block
- 'searches' *string* - The body of the searches block
- 'messages' *string* - The body of the messages block
- 'groups' *string* - The body of the groups block
- 'administrative' *string* - The body of the administrative block
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
out += self.tmpl_account_template(_("Your Account"), accBody, ln, '/youraccount/edit?ln=%s' % ln)
out += self.tmpl_account_template(_("Your Messages"), messages, ln, '/yourmessages/display?ln=%s' % ln)
out += self.tmpl_account_template(_("Your Baskets"), baskets, ln, '/yourbaskets/display?ln=%s' % ln)
out += self.tmpl_account_template(_("Your Alert Searches"), alerts, ln, '/youralerts/list?ln=%s' % ln)
out += self.tmpl_account_template(_("Your Searches"), searches, ln, '/youralerts/display?ln=%s' % ln)
groups_description = _("You can consult the list of %(x_url_open)syour groups%(x_url_close)s you are administering or are a member of.")
groups_description %= {'x_url_open': '',
'x_url_close': ''}
out += self.tmpl_account_template(_("Your Groups"), groups_description, ln, '/yourgroups/display?ln=%s' % ln)
submission_description = _("You can consult the list of %(x_url_open)syour submissions%(x_url_close)s and inquire about their status.")
submission_description %= {'x_url_open': '',
'x_url_close': ''}
out += self.tmpl_account_template(_("Your Submissions"), submission_description, ln, '/yoursubmissions.py?ln=%s' % ln)
approval_description = _("You can consult the list of %(x_url_open)syour approvals%(x_url_close)s with the documents you approved or refereed.")
approval_description %= {'x_url_open': '',
'x_url_close': ''}
out += self.tmpl_account_template(_("Your Approvals"), approval_description, ln, '/yourapprovals.py?ln=%s' % ln)
out += self.tmpl_account_template(_("Your Administrative Activities"), administrative, ln, '/admin')
return out
def tmpl_account_emailMessage(self, ln, msg):
"""
Displays a link to retrieve the lost password
Parameters:
- 'ln' *string* - The language to display the interface in
- 'msg' *string* - Explicative message on top of the form.
"""
# load the right message language
_ = gettext_set_language(ln)
out =""
out +="""
%(msg)s %(try_again)s
""" % {
'ln' : ln,
'msg' : msg,
'try_again' : _("Try again")
}
return out
def tmpl_account_reset_password_email_body(self, email, reset_key, ip_address, ln=cdslang):
"""
The body of the email that sends lost internal account
passwords to users.
"""
_ = gettext_set_language(ln)
out = """
%(intro)s
%(intro2)s
<%(link)s>
%(outro)s
%(outro2)s""" % {
'intro': _("Somebody (possibly you) coming from %(ip_address)s "
"has asked\nfor a password reset at %(cdsname)s\nfor "
"the account \"%(email)s\"." % {
'cdsname' :cdsnameintl.get(ln, cdsname),
'email' : email,
'ip_address' : ip_address,
}
),
'intro2' : _("If you want to reset the password for this account, please go to:"),
'link' : "%s/youraccount/access%s" %
(sweburl, make_canonical_urlargd({
'ln' : ln,
'mailcookie' : reset_key
}, {})),
'outro' : _("in order to confirm the validity of this request."),
'outro2' : _("Please note that this URL will remain valid for about %(days)s days only.") % {'days' : CFG_WEBSESSION_RESET_PASSWORD_EXPIRE_IN_DAYS},
}
return out
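The reset link above is built with Invenio's `make_canonical_urlargd`, which serializes the `ln` and `mailcookie` arguments into a query string appended to `sweburl`. A minimal stand-in (hypothetical helper name, without the escaping and default-dropping the real function performs) of what the assembled URL looks like:

```python
# Hypothetical sketch of the reset-link assembly; the real code uses
# make_canonical_urlargd, which also URL-escapes values.
def build_access_url(sweburl, ln, mailcookie):
    return "%s/youraccount/access?ln=%s&mailcookie=%s" % (
        sweburl, ln, mailcookie)

print(build_access_url("https://cds.example.org", "en", "abc123"))
# -> https://cds.example.org/youraccount/access?ln=en&mailcookie=abc123
```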
def tmpl_account_address_activation_email_body(self, email, address_activation_key, ip_address, ln=cdslang):
"""
The body of the email that sends email address activation cookie
passwords to users.
"""
_ = gettext_set_language(ln)
out = """
%(intro)s
%(intro2)s
<%(link)s>
%(outro)s
%(outro2)s""" % {
'intro': _("Somebody (possibly you) coming from %(ip_address)s "
"has asked\nto register a new account at %(cdsname)s\nfor the "
"email address \"%(email)s\"." % {
'cdsname' :cdsnameintl.get(ln, cdsname),
'email' : email,
'ip_address' : ip_address,
}
),
'intro2' : _("If you want to complete this account registration, please go to:"),
'link' : "%s/youraccount/access%s" %
(sweburl, make_canonical_urlargd({
'ln' : ln,
'mailcookie' : address_activation_key
}, {})),
'outro' : _("in order to confirm the validity of this request."),
'outro2' : _("Please note that this URL will remain valid for about %(days)s days only.") % {'days' : CFG_WEBSESSION_ADDRESS_ACTIVATION_EXPIRE_IN_DAYS},
}
return out
def tmpl_account_emailSent(self, ln, email):
"""
Displays a confirmation message for an email sent
Parameters:
- 'ln' *string* - The language to display the interface in
- 'email' *string* - The email to which the message has been sent
"""
# load the right message language
_ = gettext_set_language(ln)
out =""
out += _("Okay, a password reset link has been emailed to %s.") % email
return out
def tmpl_account_delete(self, ln):
"""
Displays a confirmation message about deleting the account
Parameters:
- 'ln' *string* - The language to display the interface in
"""
# load the right message language
_ = gettext_set_language(ln)
out = "
" + _("""Deleting your account""") + '
'
return out
def tmpl_account_logout(self, ln):
"""
Displays a confirmation message about logging out
Parameters:
- 'ln' *string* - The language to display the interface in
"""
# load the right message language
_ = gettext_set_language(ln)
out = _("You are no longer recognized by our system.") + ' '
if CFG_EXTERNAL_AUTH_USING_SSO and CFG_EXTERNAL_AUTH_LOGOUT_SSO:
out += _("""You are still recognized by the centralized
%(x_fmt_open)sSSO%(x_fmt_close)s system. You can
%(x_url_open)slogout from SSO%(x_url_close)s, too.""") % \
{'x_fmt_open' : '', 'x_fmt_close' : '',
'x_url_open' : '<a href="%s">' % CFG_EXTERNAL_AUTH_LOGOUT_SSO,
'x_url_close' : '</a>'}
out += ' '
out += _("If you wish you can %(x_url_open)slogin here%(x_url_close)s.") % \
{'x_url_open': '',
'x_url_close': ''}
return out
def tmpl_login_form(self, ln, referer, internal, register_available, methods, selected_method, supportemail, msg=None):
"""
Displays a login form
Parameters:
- 'ln' *string* - The language to display the interface in
- 'referer' *string* - The referer URL - will be redirected upon after login
- 'internal' *boolean* - Whether internal authentication is used
- 'register_available' *boolean* - If users can register freely in the system
- 'methods' *array* - The available authentication methods
- 'selected_method' *string* - The default authentication method
- 'supportemail' *string* - The email of the support team
- 'msg' *string* - The message to print before the form, if needed
"""
# load the right message language
_ = gettext_set_language(ln)
if not msg:
out = "<p>%(please_login)s</p>" % {
'please_login' : _("If you already have an account, please login using the form below.")
}
if CFG_CERN_SITE:
out += "
" + _("If you don't own a CERN account yet, you can register a %(x_url_open)snew CERN lightweight account%(x_url_close)s.") % {'x_url_open' : '', 'x_url_close' : ''} + "
"
else:
if register_available:
out += "
"+_("If you don't own an account yet, please %(x_url_open)sregister%(x_url_close)s an internal account.") %\
{'x_url_open': '',
'x_url_close': ''} + "
"
else:
out += "<p>" + _("It is not possible to create an account yourself. Contact %s if you want an account.") % ('<a href="mailto:%s">%s</a>' % (supportemail, supportemail)) + "</p>"
else:
out = "<p>%s</p>" % msg
out += """
"""
if len(methods) > 1:
# more than one method, must make a select
login_select = """<select name="login_method">"""
for method in methods:
login_select += """<option value="%(method)s" %(selected)s>%(method)s</option>""" % {
'method' : method,
'selected' : (method == selected_method and 'selected="selected"' or '')
}
login_select += "</select>"
out += """
%(login_title)s
%(login_select)s
""" % {
'login_title' : _("Login method:"),
'login_select' : login_select,
}
else:
# only one login method available
out += """<input type="hidden" name="login_method" value="%s">""" % (methods[0])
out += """
%(username)s:
%(password)s:
""" % {
'ln': ln,
'referer' : cgi.escape(referer),
'username' : _("Username"),
'password' : _("Password"),
'login' : _("login"),
}
if internal:
out += """ (%(lost_pass)s)""" % {
'ln' : ln,
'lost_pass' : _("Lost your password?")
}
out += """
"""
out += """
%(note)s: %(note_text)s
""" % {
'note' : _("Note"),
'note_text': _("You can use your nickname or your email address to login.")}
return out
def tmpl_lost_your_password_teaser(self, ln=cdslang):
"""Displays a short sentence reminding the user that they may
have lost their password. Used by the registration page.
"""
_ = gettext_set_language(ln)
out = ""
out += """<a href="./lost?ln=%(ln)s">%(maybe_lost_pass)s</a>""" % {
'ln' : ln,
'maybe_lost_pass': _("Maybe you have lost your password?")
}
return out
def tmpl_reset_password_form(self, ln, email, reset_key, msg=''):
"""Display a form to reset the password."""
_ = gettext_set_language(ln)
out = "<p>%s</p>" % _("Your request is valid. Please set the new "
"desired password in the following form.")
if msg:
out += """
%s
""" % msg
out += """
%(set_password_for)s:
%(email)s
%(type_new_password)s:
%(type_it_again)s:
""" % {
'ln' : ln,
'reset_key' : reset_key,
'email' : email,
'set_password_for' : _('Set a new password for'),
'type_new_password' : _('Type the new password'),
'type_it_again' : _('Type again the new password'),
'set_new_password' : _('Set the new password')
}
return out
def tmpl_register_page(self, ln, referer, level, supportemail, cdsname):
"""
Displays a registration form
Parameters:
- 'ln' *string* - The language to display the interface in
- 'referer' *string* - The referrer URL; the user will be redirected to it after login
- 'level' *int* - Login level (0 - all access, 1 - accounts activated, 2+ - no self-registration)
- 'supportemail' *string* - The email of the support team
- 'cdsname' *string* - The name of the installation
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
if level <= 1:
out += _("Please enter your email address and desired nickname and password:")
if level == 1:
out += " " + _("It will not be possible to use the account before it has been verified and activated.")
out += """
%(email_address)s: (%(mandatory)s)
%(example)s:john.doe@example.com
%(nickname)s: (%(mandatory)s)
%(example)s:johnd
%(password)s: (%(optional)s)
%(note)s: %(password_contain)s
%(retype)s:
%(note)s: %(explain_acc)s""" % {
'referer' : cgi.escape(referer),
'email_address' : _("Email address"),
'nickname' : _("Nickname"),
'password' : _("Password"),
'mandatory' : _("mandatory"),
'optional' : _("optional"),
'example' : _("Example"),
'note' : _("Note"),
'password_contain' : _("The password phrase may contain punctuation, spaces, etc."),
'retype' : _("Retype Password"),
'register' : _("register"),
'explain_acc' : _("Please do not use valuable passwords such as your Unix, AFS or NICE passwords with this service. Your email address will stay strictly confidential and will not be disclosed to any third party. It will be used to identify you for personal services of %s. For example, you may set up an automatic alert search that will look for new preprints and will notify you daily of new arrivals by email.") % cdsname,
}
return out
def tmpl_account_adminactivities(self, ln, weburl, uid, guest, roles, activities):
"""
Displays the admin activities block for this user
Parameters:
- 'ln' *string* - The language to display the interface in
- 'weburl' *string* - The address of the site
- 'uid' *string* - The user id
- 'guest' *boolean* - If the user is guest
- 'roles' *array* - The current user roles
- 'activities' *array* - The user allowed activities
"""
# load the right message language
_ = gettext_set_language(ln)
out = ""
# guest condition
if guest:
return _("You seem to be a guest user. You have to %(x_url_open)slogin%(x_url_close)s first.") % \
{'x_url_open': '<a href="%s/youraccount/login?ln=%s">' % (sweburl, ln),
'x_url_close': '</a>'}
# no rights condition
if not roles:
return "<p>" + _("You are not authorized to access administrative functions.") + "</p>"
# displaying form
out += "<p>" + _("You are enabled to the following roles: %(x_role)s.") % {'x_role': ('<strong>' + string.join(roles, ", ") + '</strong>')} + '</p>'
if activities:
out += _("Here are some interesting web admin links for you:")
# print proposed links:
activities.sort(lambda x, y: cmp(string.lower(x), string.lower(y)))
for action in activities:
if action == "runbibedit":
out += """ %s""" % (weburl, ln, _("Run BibEdit"))
if action == "cfgbibformat":
out += """ %s""" % (weburl, ln, _("Configure BibFormat"))
if action == "cfgbibharvest":
out += """ %s""" % (weburl, ln, _("Configure BibHarvest"))
if action == "cfgoairepository":
out += """ %s""" % (weburl, ln, _("Configure OAI Repository"))
if action == "cfgbibindex":
out += """ %s""" % (weburl, ln, _("Configure BibIndex"))
if action == "cfgbibrank":
out += """ %s""" % (weburl, ln, _("Configure BibRank"))
if action == "cfgwebaccess":
out += """ %s""" % (weburl, ln, _("Configure WebAccess"))
if action == "cfgwebcomment":
out += """ %s""" % (weburl, ln, _("Configure WebComment"))
if action == "cfgwebsearch":
out += """ %s""" % (weburl, ln, _("Configure WebSearch"))
if action == "cfgwebsubmit":
out += """ %s""" % (weburl, ln, _("Configure WebSubmit"))
out += " " + _("For more admin-level activities, see the complete %(x_url_open)sAdmin Area%(x_url_close)s.") %\
{'x_url_open': '',
'x_url_close': ''}
return out
def tmpl_create_userinfobox(self, ln, url_referer, guest, username, submitter, referee, admin):
"""
Displays the user block
Parameters:
- 'ln' *string* - The language to display the interface in
- 'url_referer' *string* - URL of the page being displayed
- 'guest' *boolean* - If the user is guest
- 'username' *string* - The username (nickname or email)
- 'submitter' *boolean* - If the user is submitter
- 'referee' *boolean* - If the user is referee
- 'admin' *boolean* - If the user is admin
"""
# load the right message language
_ = gettext_set_language(ln)
out = """ """ % weburl
if guest:
out += """%(guest_msg)s ::
%(login)s""" % {
'weburl' : weburl,
'sweburl': sweburl,
'ln' : ln,
'guest_msg' : _("guest"),
'session' : _("session"),
'alerts' : _("alerts"),
'baskets' : _("baskets"),
'login' : _("login"),
'referer' : url_referer and ('&referer=%s' % urllib.quote(url_referer)) or '',
}
else:
out += """%(username)s ::
%(account)s ::
%(messages)s ::
%(baskets)s ::
%(alerts)s ::
%(groups)s ::
%(stats)s :: """ % {
'username' : username,
'weburl' : weburl,
'sweburl' : sweburl,
'ln' : ln,
'account' : _("account"),
'alerts' : _("alerts"),
'messages': _("messages"),
'baskets' : _("baskets"),
'groups' : _("groups"),
'stats' : _("statistics"),
}
if submitter:
out += """%(submission)s :: """ % {
'weburl' : weburl,
'ln' : ln,
'submission' : _("submissions"),
}
if referee:
out += """%(approvals)s :: """ % {
'weburl' : weburl,
'ln' : ln,
'approvals' : _("approvals"),
}
if admin:
out += """%(administration)s :: """ % {
'sweburl' : sweburl,
'ln' : ln,
'administration' : _("administration"),
}
out += """%(logout)s""" % {
'sweburl' : sweburl,
'ln' : ln,
'logout' : _("logout"),
}
return out
def tmpl_warning(self, warnings, ln=cdslang):
"""
Prepare the warnings list
@param warnings: list of warning tuples (warning_msg, arg1, arg2, etc)
@return html string of warnings
"""
from invenio.errorlib import get_msgs_for_code_list
span_class = 'important'
out = ""
if not isinstance(warnings, list):
warnings = [warnings]
if warnings:
warnings_parsed = get_msgs_for_code_list(warnings, 'warning', ln)
for (warning_code, warning_text) in warnings_parsed:
if not warning_code.startswith('WRN'):
#display only warnings that begin with WRN to user
continue
span_class = 'important'
out += '''<span class="%(span_class)s">%(warning)s</span><br />''' % \
{ 'span_class' : span_class,
'warning' : warning_text }
return out
else:
return ""
def tmpl_warnings(self, warnings, ln=cdslang):
"""
Display one warning field per warning
@param warnings: list of warning strings
@param ln: language
@return html output
"""
if not isinstance(warnings, (list, tuple)):
warnings = [warnings]
warningbox = ""
if warnings:
warningbox = "<div class=\"warningbox\">\n<b>Warning:</b>\n"
for warning in warnings:
lines = warning.split("\n")
warningbox += "<p>"
for line in lines[0:-1]:
warningbox += line + "<br />\n"
warningbox += lines[-1] + "</p>"
warningbox += "</div>\n"
return warningbox
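Both tmpl_warning() and tmpl_warnings() above start by coercing a scalar argument into a list so that callers may pass either a single warning or a sequence. A minimal standalone sketch of that normalization pattern (the helper name is hypothetical, not part of Invenio):

```python
def normalize_to_list(value):
    """Wrap a scalar in a list; pass lists and tuples through
    (as a list), mirroring the isinstance() checks used by the
    warning templates above."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]

# A scalar becomes a one-element list; sequences pass through.
print(normalize_to_list("WRN_DEMO"))   # ['WRN_DEMO']
print(normalize_to_list(("a", "b")))   # ['a', 'b']
```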
def tmpl_display_all_groups(self,
infos,
admin_group_html,
member_group_html,
external_group_html = None,
warnings=[],
ln=cdslang):
"""
Displays the 3 tables of groups: admin, member and external
Parameters:
- 'ln' *string* - The language to display the interface in
- 'admin_group_html' *string* - HTML code for displaying all the groups
the user is the administrator of
- 'member_group_html' *string* - HTML code for displaying all the groups
the user is member of
- 'external_group_html' *string* - HTML code for displaying all the
external groups the user is member of
"""
_ = gettext_set_language(ln)
group_text = self.tmpl_infobox(infos)
group_text += self.tmpl_warning(warnings)
if external_group_html:
group_text += """
%s
%s
%s
""" % (admin_group_html, member_group_html, external_group_html)
else:
group_text += """
%s
%s
""" % (admin_group_html, member_group_html)
return group_text
def tmpl_display_admin_groups(self, groups, ln=cdslang):
"""
Display the groups the user is admin of.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'groups' *list* - All the groups the user is admin of
"""
_ = gettext_set_language(ln)
img_link = """
<a href="%(weburl)s/yourgroups/%(action)s?grpID=%(grpID)s&amp;ln=%(ln)s"><img src="%(weburl)s/img/%(img)s" alt="%(text)s" />%(text)s</a>
"""
out = self.tmpl_group_table_title(img="/img/group_admin.png",
text=_("You are an administrator of the following groups:") )
out += """
%s
%s
""" %(_("Group"), _("Description"))
if len(groups) == 0:
out += """
%s
""" %(_("You are not an administrator of any groups."),)
for group_data in groups:
(grpID, name, description) = group_data
edit_link = img_link % {'weburl' : weburl,
'grpID' : grpID,
'ln': ln,
'img':"webbasket_create_small.png",
'text':_("Edit group"),
'action':"edit"
}
members_link = img_link % {'weburl' : weburl,
'grpID' : grpID,
'ln': ln,
'img':"webbasket_usergroup.png",
'text':_("Edit %s members") % '',
'action':"members"
}
out += """
%s
%s
%s
%s
""" % (cgi.escape(name), cgi.escape(description), edit_link, members_link)
out += """
""" % {'ln': ln,
'write_label': _("Create new group"),
}
return out
def tmpl_display_member_groups(self, groups, ln=cdslang):
"""
Display the groups the user is member of.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'groups' *list* - All the groups the user is a member of
"""
_ = gettext_set_language(ln)
group_text = self.tmpl_group_table_title(img="/img/webbasket_us.png", text=_("You are a member of the following groups:"))
group_text += """
""" % {'ln': ln,
'join_label': _("Join new group"),
'leave_label':_("Leave group")
}
return group_text
def tmpl_display_external_groups(self, groups, ln=cdslang):
"""
Display the external groups the user is member of.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'groups' *list* - All the external groups the user is a member of
"""
_ = gettext_set_language(ln)
group_text = self.tmpl_group_table_title(img="/img/webbasket_us.png", text=_("You are a member of the following external groups:"))
group_text += """
"""
return group_text
def tmpl_display_input_group_info(self,
group_name,
group_description,
join_policy,
act_type="create",
grpID="",
warnings=[],
ln=cdslang):
"""
Display group data when creating or updating a group:
Name, description, join_policy.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'group_name' *string* - name of the group
- 'group_description' *string* - description of the group
- 'join_policy' *string* - join policy
- 'act_type' *string* - info about action : create or edit(update)
- 'grpID' *string* - ID of the group(not null in case of group editing)
- 'warnings' *list* - Display warning if values are not correct
"""
_ = gettext_set_language(ln)
#default
hidden_id = ""
form_name = "create_group"
action = weburl + '/yourgroups/create'
button_label = _("Create new group")
button_name = "create_button"
label = _("Create new group")
delete_text = ""
if act_type == "update":
form_name = "update_group"
action = weburl + '/yourgroups/edit'
button_label = _("Update group")
button_name = "update"
label = _('Edit group %s') % cgi.escape(group_name)
delete_text = """<input type="submit" value="%s" name="%s" class="formbutton">"""
delete_text %= (_("Delete group"), "delete")
if grpID != "":
hidden_id = """<input type="hidden" name="grpID" value="%s">"""
hidden_id %= grpID
out = self.tmpl_warning(warnings)
out += """
%(label)s
%(name_label)s
%(description_label)s
%(join_policy_label)s
%(join_policy)s
%(hidden_id)s
%(delete_text)s
"""
out %= {'action' : action,
'logo': weburl + '/img/webbasket_create.png',
'label': label,
'form_name' : form_name,
'name_label': _("Group name:"),
'delete_text': delete_text,
'description_label': _("Group description:"),
'join_policy_label': _("Group join policy:"),
'group_name': cgi.escape(group_name, 1),
'group_description': cgi.escape(group_description, 1),
'button_label': button_label,
'button_name':button_name,
'cancel_label':_("Cancel"),
'hidden_id':hidden_id,
'ln': ln,
'join_policy' :self.__create_join_policy_selection_menu("join_policy",
join_policy,
ln)
}
return out
def tmpl_display_input_join_group(self,
group_list,
group_name,
group_from_search,
search,
warnings=[],
ln=cdslang):
"""
Display the groups the user can join.
The user can either pick from the default select list or use the search box.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'group_list' *list* - All the group the user can join
- 'group_name' *string* - Name of the group the user is looking for
- 'group_from_search' *list* - List of the groups matching group_name that the user can join
- 'search' *int* - Whether the user is looking for a group using group_name
- 'warnings' *list* - Warnings to display if two groups are selected
"""
_ = gettext_set_language(ln)
out = self.tmpl_warning(warnings)
search_content = ""
if search:
search_content = """
"""
out %= {'action' : weburl + '/yourgroups/join',
'logo': weburl + '/img/webbasket_create.png',
'label': _("Join group"),
'group_name': cgi.escape(group_name, 1),
'label2':_("or find it") + ': ',
'list_label':_("Choose group:"),
'ln': ln,
'find_label': _("Find group"),
'cancel_label':_("Cancel"),
'group_list' :self.__create_select_menu("grpID",group_list, _("Please select:")),
'search_content' : search_content
}
return out
def tmpl_display_manage_member(self,
grpID,
group_name,
members,
pending_members,
infos=[],
warnings=[],
ln=cdslang):
"""Display current members and waiting members of a group.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'grpID' *string* - ID of the group
- 'group_name' *string* - Name of the group
- 'members' *list* - List of the current members
- 'pending_members' *list* - List of the waiting members
- 'infos' *tuple of 2 lists* - Messages to inform the user about their last action
- 'warnings' *list* - Warnings to display if two groups are selected
"""
_ = gettext_set_language(ln)
out = self.tmpl_warning(warnings)
out += self.tmpl_infobox(infos)
out += """
%(title)s
%(header1)s
%(member_text)s
%(header2)s
%(pending_text)s
%(header3)s
%(invite_text)s
"""
if members:
member_text = self.__create_select_menu("member_id", members, _("Please select:"))
else:
member_text = _("No members.")
if pending_members:
pending_text = self.__create_select_menu("pending_member_id", pending_members, _("Please select:"))
else:
pending_text = _("No members awaiting approval.")
header1 = self.tmpl_group_table_title(text=_("Current members"))
header2 = self.tmpl_group_table_title(text=_("Members awaiting approval"))
header3 = _("Invite new members")
link_open = '<a href="%s/yourmessages/write?ln=%s">'
link_open %= (weburl, ln)
invite_text = _("If you want to invite new members to join your group, please use the %(x_url_open)sweb message%(x_url_close)s system.") % \
{'x_url_open': link_open,
'x_url_close': ''}
action = weburl + '/yourgroups/members?ln=' + ln
out %= {'title':_('Group: %s') % group_name,
'member_text' : member_text,
'pending_text' :pending_text,
'action':action,
'grpID':grpID,
'header1': header1,
'header2': header2,
'header3': header3,
'img_alt_header1': _("Current members"),
'img_alt_header2': _("Members awaiting approval"),
'img_alt_header3': _("Invite new members"),
'invite_text': invite_text,
'imgurl': weburl + '/img',
'cancel_label':_("Cancel"),
'ln':ln
}
return out
def tmpl_display_input_leave_group(self,
groups,
warnings=[],
ln=cdslang):
"""Display groups the user can leave.
Parameters:
- 'ln' *string* - The language to display the interface in
- 'groups' *list* - List of groups the user is currently member of
- 'warnings' *list* - Display warning if no group is selected
"""
_ = gettext_set_language(ln)
out = self.tmpl_warning(warnings)
out += """
%(label)s
%(list_label)s
%(groups)s
%(submit)s
"""
if groups:
groups = self.__create_select_menu("grpID", groups, _("Please select:"))
list_label = _("Group list")
submit = """<input type="submit" value="%s" class="formbutton">""" % _("Leave group")
else :
groups = _("You are not a member of any group.")
list_label = ""
submit = ""
action = weburl + '/yourgroups/leave?ln=%s'
action %= (ln)
out %= {'groups' : groups,
'list_label' : list_label,
'action':action,
'logo': weburl + '/img/webbasket_create.png',
'label' : _("Leave group"),
'cancel_label':_("Cancel"),
'ln' :ln,
'submit' : submit
}
return out
def tmpl_confirm_delete(self, grpID, ln=cdslang):
"""
Display a confirmation message when deleting a group.
@param ln: language
@return html output
"""
_ = gettext_set_language(ln)
action = weburl + '/yourgroups/edit'
out = """
%(message)s
"""% {'message': _("Are you sure you want to delete this group?"),
'ln':ln,
'yes_label': _("Yes"),
'no_label': _("No"),
'grpID':grpID,
'action': action
}
return out
def tmpl_confirm_leave(self, uid, grpID, ln=cdslang):
"""
Display a confirmation message when leaving a group.
@param ln: language
@return html output
"""
_ = gettext_set_language(ln)
action = weburl + '/yourgroups/leave'
out = """
%(message)s
"""% {'message': _("Are you sure you want to leave this group?"),
'ln':ln,
'yes_label': _("Yes"),
'no_label': _("No"),
'grpID':grpID,
'action': action
}
return out
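tmpl_confirm_delete() and tmpl_confirm_leave() both assemble a template string first and only then fill it with `out %= {...}`. A minimal self-contained sketch of that two-phase pattern (the function name and form markup are illustrative assumptions, not the Invenio markup):

```python
def render_confirm_box(message, yes_label, no_label, action):
    # Phase 1: build the skeleton with named placeholders.
    out = """<form action="%(action)s" method="post">
<p>%(message)s</p>
<input type="submit" name="confirmed" value="%(yes_label)s">
<input type="submit" name="cancel" value="%(no_label)s">
</form>"""
    # Phase 2: one substitution pass fills every placeholder at once.
    out %= {'action': action,
            'message': message,
            'yes_label': yes_label,
            'no_label': no_label}
    return out

box = render_confirm_box("Are you sure you want to leave this group?",
                         "Yes", "No", "/yourgroups/leave")
```

Keeping the markup and the substitution dictionary separate makes a missing key fail loudly: a placeholder with no matching entry raises KeyError at the `%=` line instead of producing silently broken HTML.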
def __create_join_policy_selection_menu(self, name, current_join_policy, ln=cdslang):
"""Private function. Create a drop-down menu for selecting the join policy.
@param current_join_policy: join policy as defined in CFG_WEBSESSION_GROUP_JOIN_POLICY
@param ln: language
"""
_ = gettext_set_language(ln)
elements = [(CFG_WEBSESSION_GROUP_JOIN_POLICY['VISIBLEOPEN'],
_("Visible and open for new members")),
(CFG_WEBSESSION_GROUP_JOIN_POLICY['VISIBLEMAIL'],
_("Visible but new members need approval"))
]
select_text = _("Please select:")
return self.__create_select_menu(name, elements, select_text, selected_key=current_join_policy)
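__create_join_policy_selection_menu() delegates the actual markup to the private __create_select_menu() helper. A simplified, hypothetical analogue of such a builder is sketched below (not the Invenio implementation; it uses Python 3's html.escape where code of this era would use cgi.escape):

```python
from html import escape

def build_select_menu(name, elements, select_text, selected_key=None):
    """Render an HTML <select> from (key, label) pairs, marking the
    entry whose key equals selected_key as selected."""
    out = '<select name="%s">' % escape(name, quote=True)
    # Leading dummy entry prompting the user to choose.
    out += '<option value="-1">%s</option>' % escape(select_text)
    for key, label in elements:
        selected = ' selected="selected"' if key == selected_key else ''
        out += '<option value="%s"%s>%s</option>' % (
            escape(str(key), quote=True), selected, escape(label))
    out += '</select>'
    return out

menu = build_select_menu("join_policy",
                         [("VO", "Visible and open for new members"),
                          ("VM", "Visible but new members need approval")],
                         "Please select:", selected_key="VM")
```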
def __create_select_menu(self, name, elements, select_text, multiple=0, selected_key=None):
""" Private function. Return a popup (select) menu.
@param name: name of the HTML control
@param elements: list of (key, value) tuples
"""
if multiple:
out = """