diff --git a/INSTALL b/INSTALL index 646108270..d662cf868 100644 --- a/INSTALL +++ b/INSTALL @@ -1,605 +1,617 @@ CERN DOCUMENT SERVER SOFTWARE (CDSware) INSTALLATION ==================================================== Revision: $Id$ About ===== This document specifies how to build, customize, and install the CERN Document Server Software (CDSware) for the first time. See RELEASE-NOTES if you are upgrading from a previous CDSware release. Contents ======== 0. Prerequisites 1. Quick instructions for the impatient CDSware admin 2. Detailed instructions for the patient CDSware admin 3. Configuration philosophy explained and elucidated 0. Prerequisites ================ Here is the software you need to have around before you start installing CDSware: a) Unix-like operating system. The main development and production platform for CDSware at CERN is Debian GNU/Linux, but any Unix system supporting the software listed below should do. Note that localhost should have an MTA running so that CDSware can email notification alerts or registration information to the end users, contact moderators and reviewers of submitted documents, inform administrators about various runtime system information, etc. b) MySQL server (may be on a remote machine), and MySQL client (must be available locally too). MySQL versions 4.0.x are recommended, versions above 4.1.0 may cause troubles. Please set the variable ``max_allowed_packet'' in your ``my.cnf'' init file to at least 4M. c) Apache 2 server, with support for loading DSO modules. Tested mainly with version 2.0.43 and above. Apache 2 is required for the mod_python module (see below). Note for FreeBSD users: Thierry Thomas reports troubles with Python open() in the Apache 2 and mod_python context. 
The solution is either (1) to compile Apache 2 with --enable-threads (the port has a knob WITH_THREADS to do that); or (2) to add the two following two lines in Apache's envvars file: LD_PRELOAD=/usr/lib/libc_r.so # or libpthread.so export LD_PRELOAD d) Python v2.2.2 or above (v2.3.2 and above recommended): as well as the following Python modules: - (mandatory) MySQLdb: - (mandatory) Numeric module (v21 and above): - (recommended) PyStemmer, for indexing and ranking: - (recommended) PyRXP, for very fast XML MARC processing: - (recommended) Gnuplot.Py, for producing graphs: - (optional) 4suite, slower alternative to PyRXP: - (optional) Psyco, to speed up the code at places: e) mod_python Apache module. Tested mainly with versions 3.0BETA4 and above. mod_python 3.x is required for Apache 2. Previous versions (as well as Apache 1 ones) exhibited some problems with MySQL connectivity in our experience. f) PHP compiled as Apache module, including MySQL support. Tested mainly with PHP version 4.3.0 and above. (Note that if you are compiling mod_php from source, it is good to compile it against the same MySQL client library as mod_python, so that the two Apache modules are using the same MySQL client library. We saw Apache/PHP/Python problems in the past when they weren't.) g) PHP compiled as a standalone command-line executable (CLI) (in addition to Apache module) is required, too. As of PHP 4.3.0 you'll obtain the CLI executable by default, so you don't have to compile it separately. Note that PHP CLI should be compiled with the process control support (--enable-pcntl) and the compression library (--with-zlib). h) WML - Website META Language. Tested mainly with versions 2.0.8 and 2.0.9. Note that on Red Hat Linux 9 the WML 2.0.9 compiled with Perl 5.8.0 exhibits problems, so you better use downgraded/upgraded Perl for compiling WML on that platform. 
i) If you want to be able to extract references from PDF fulltext files, then you need to install pdftotext version 3 at least. j) If you want to be able to search for words in the fulltext files (i.e. to have fulltext indexing), then you need as well to install some of the following tools: - for PDF files: pdftotext or pstotext - for PostScript files: pstotext or ps2ascii - for MS Word files: antiword, catdoc, or wvText - for MS PowerPoint files: pptHtml and html2text - for MS Excel files: xlhtml and html2text k) If you have chosen to install fast XML MARC Python processors in the step d) above, then you have to install the parsers themselves: - (optional) RXP: - (optional) 4suite: l) (recommended) Gnuplot, the command-line driven interactive plotting program. It is used to display download and citation history graphs on the Detailed record pages on the web interface. Note that Gnuplot is not required, only recommended. m) (recommended) A Common Lisp implementation, such as CLISP, SBCL or CMUCL. It is used for the web server log analysing tool and the metadata checking program. Note that any of the three implementations CLISP, SBCL, or CMUCL will do. CMUCL produces fastest machine code, but it does not support UTF-8 yet. Pick up CLISP if you don't know what to do. Note that a Common Lisp implementation is not required, only recommended. n) GNU Gettext, a set of tools that makes it possible to translate the application in multiple languages. This is available by default on many systems. Note that the configure script checks whether you have all the prerequisite software installed and that it won't let you to continue unless everything is in order. It also warns you if it cannot find some optional, but recommended software. 1. 
Quick instructions for the impatient CDSware admin ===================================================== $ cd /usr/local/src/ $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.md5 $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.sig $ md5sum -v -c cdsware-0.3.2.tar.gz.md5 $ gpg --verify cdsware-0.3.2.tar.gz.sig cdsware-0.3.2.tar.gz $ tar xvfz cdsware-0.3.2.tar.gz $ cd cdsware-0.3.2 $ ./configure --prefix=/usr/local/cdsware-DEMO \ --with-webdir=/var/www/DEMO \ --with-weburl=http://webserver.domain.com/DEMO \ --with-dbhost=sqlserver.domain.com \ --with-dbname=cdsware \ --with-dbuser=cdsware \ --with-dbpass=myp1ss \ --with-python=/opt/python/bin/python2.3 $ vi ./config/config.wml ## optional, but strongly recommended $ make $ mysql -h sqlserver.domain.com -u root -p mysql mysql> CREATE DATABASE cdsware; mysql> GRANT ALL PRIVILEGES ON cdsware.* TO cdsware@webserver.domain.com IDENTIFIED BY 'myp1ss'; $ sudo vi /path/to/apache/conf/httpd.conf ## see below in part 2 $ sudo vi /path/to/php/conf/php.ini ## see below in part 2 $ sudo /path/to/apache/bin/apachectl graceful + $ sudo ln -s /usr/local/cdsware-DEMO/lib/python/cdsware \ + /usr/lib/python2.3/site-packages/cdsware $ make create-tables ## optional $ make install $ make test ## optional $ sudo chown -R www-data /usr/local/cdsware-DEMO/var $ make create-demo-site ## optional $ make load-demo-records ## optional $ make remove-demo-records ## optional $ make drop-demo-site ## optional $ netscape http://webserver.domain.com/cdsware/admin/ ## optional 2. Detailed instructions for the patient CDSware admin ====================================================== The CERN Document Server Software (CDSware) uses standard GNU autoconf method to build and install its files. This means that you proceed as follows: $ cd /usr/local/src/ Change to a directory where we will configure and build the CDS Software. 
(The built files will be installed into different "target" directories later.) $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.md5 $ wget http://cdsware.cern.ch/download/cdsware-0.3.2.tar.gz.sig Fetch CDSware source tarball from the CDSware distribution server, together with MD5 checksum and GnuPG cryptographic signature files useful for verifying the integrity of the tarball. $ md5sum -v -c cdsware-0.3.2.tar.gz.md5 Verify MD5 checksum. $ gpg --verify cdsware-0.3.2.tar.gz.sig cdsware-0.3.2.tar.gz Verify GnuPG cryptographic signature. Note that you may first have to import my public key into your keyring, if you haven't done that already: $ gpg --keyserver wwwkeys.pgp.net --recv-keys 0xBA5A2B67 The output of the gpg --verify command should then read: Good signature from "Tibor Simko " You can safely ignore any trusted signature certification warning that may follow after the signature has been successfully verified. $ tar xvfz cdsware-0.3.2.tar.gz Untar the distribution tarball. $ cd cdsware-0.3.2 Go to the source directory. $ ./configure --prefix=/usr/local/cdsware-DEMO \ --with-webdir=/var/www/DEMO \ --with-weburl=http://webserver.domain.com/DEMO \ --with-dbhost=sqlserver.domain.com \ --with-dbname=cdsware \ --with-dbuser=cdsware \ --with-dbpass=myp1ss \ --with-python=/opt/python/bin/python2.3 Configure essential CDSware parameters, with the following signification: --prefix=/usr/local/cdsware-DEMO The CDSware general installation directory, used to hold command-line binaries and program libraries containing the core CDSware functionality, but also to store runtime log and cache information. Several subdirs like `bin', `lib', and `var' will be created inside the --prefix directory to this effect. Note that the --prefix directory should be chosen outside of the Apache htdocs tree, since no file from this directory is to be accessible on the Web. 
--with-webdir=/var/www/DEMO The directory holding the web interface to CDSware, i.e. containing all the callable scripts and web pages visible to the end user. Must be located inside Apache htdocs tree. The scripts within this directory will generally call core CDSware libraries and binaries installed in the --prefix directory. --with-weburl=http://webserver.domain.com/DEMO The URL corresponding to the --with-webdir directory. It will denote the home URL of your CDSware installation. --with-dbhost=sqlserver.domain.com --with-dbname=cdsware --with-dbuser=cdsware --with-dbpass=myp1ss The database server host, the database name, and the database user credentials. --with-python=/opt/python/bin/python2.3 Optionally, specify a path to some specific Python binary. This is useful if you have more than one Python installation on your system. If you don't set this option, then the first Python that will be found in your PATH will be chosen for running CDSware. CDSware won't install to any other directory but to the two mentioned in this configuration line. Do not use trailing slashes when specifying any of the above values. This configuration step is mandatory, and is referred to as "pre-compile time configuration step a)" in the elucidative explanatory commentary below. (Note that if you prefer to build CDSware out of its source tree, you may run the above configure command like this: $ mkdir build && cd build && ../configure --prefix=...) $ vi ./config/config.wml ## optional, but strongly recommended Optionally, customize your CDSware installation. We strongly recommend you to edit at least the top of this file where you can define some very essential CDSware parameters like the name of your CDSware document server or the email address of the local CDSware administrator. (The latter is needed if you want to use administration modules, and you certainly do!) 
The rest of the "config.wml" file enables you to change the CDSware web page look and feel, and otherwise to influence its behaviour and default parameters. This configuration step is optional, but strongly recommended. It is referred to as "compile time configuration step b)" in the elucidative explanatory commentary below. $ make Launch the CDSware build. Since many messages are printed during the build process, you may want to run it in a fast-scrolling terminal such as rxvt or in a detached screen session. During this step all the pages and scripts will be pre-created and customized based on the config you have edited in the previous step. Before proceeding further with the CDSware installation, we have to do some admin-level tasks on the MySQL and Apache servers. $ mysql -h sqlserver.domain.com -u root -p mysql mysql> CREATE DATABASE cdsware; mysql> GRANT ALL PRIVILEGES ON cdsware.* TO cdsware@webserver.domain.com IDENTIFIED BY 'myp1ss'; You need to create a dedicated database on your MySQL server that the CDSware can use for its purposes. Please contact your MySQL administrator and ask him to execute the above commands that will create the "cdsware" database, a user called "cdsware" with password "myp1ss", and that will grant all rights on the "cdsware" database to the "cdsware" user. (Of course, you are free to choose your own user credentials and the database name; the above values were just an example. See also the configure line below.) 
$ sudo vi /path/to/apache/conf/httpd.conf ## see below in part 2 Please ask your webserver administrator to put the following lines in your "httpd.conf" configuration file: AddDefaultCharset UTF-8 AddType application/x-httpd-php .php AddType application/x-httpd-php-source .phps DirectoryIndex index.en.html index.html index.py index.en.php index.php This is to ensure that the browsers will get UTF-8 as the default page encoding, that "*.php" files will be interpreted by the web server as PHP files, and that "index.py" or "index.en.php" will be considered as directory index file. In addition, you have to ask Apache to interpret .py files in the installation place via mod_python and to indicate path to CDSware's Python library directory (composed from your --prefix choice followed by `/lib/python'): AddHandler python-program .py PythonHandler mod_python.publisher PythonPath "['/usr/local/cdsware-DEMO/lib/python']+sys.path" PythonDebug On $ sudo vi /path/to/php/conf/php.ini ## see below in part 2 Please ask your webserver administrator to put the following lines in your "php.ini" configuration file: log_errors = on display_errors = off expose_php = off max_execution_time = 160 register_globals = on short_open_tag = on This will set up some relevant PHP variables. $ sudo /path/to/apache/bin/apachectl graceful Please ask your webserver administrator to restart the Apache server after the above "httpd.conf" and "php.ini" changes. After these admin-level tasks to be performed as root, let's now go back to finish the installation of the CDSware package. + $ sudo ln -s /usr/local/cdsware-DEMO/lib/python/cdsware \ + /usr/lib/python2.3/site-packages/cdsware + + You have to create a symbolic link inside Python's + site-packages directory so that Python can find the CDSware + modules. Note that the link target depends on the --prefix + option you have used, and the link location depends on the + Python version you are using. (See also the --with-python + configure option.)
+ $ make create-tables ## optional Optionally, create CDSware tables on the MySQL server. You probably want to do this step only once, i.e. if you have not created any CDSware database and tables yet. $ make install Install the web pages, scripts, utilities and everything needed for runtime into the respective directories, as specified earlier by the configure command. After this step, you should be able to point your browser to the chosen URL of your local CDSware installation and see it running! $ make test Optionally, you can run our test suite to verify the results of known tests on your local CDSware installation. Note that this command should be run only after you have installed the whole system via `make install'. $ sudo chown -R www-data /usr/local/cdsware-DEMO/var One more superuser step, as we need to enable Apache server to write some log information and to cache interesting entities inside the "var" subdirectory of our CDSware general installation directory. Here we assume that your Apache server processes are run under "www-data" group. Change this appropriately for your system. $ make create-demo-site ## optional This step is recommended to test your local CDSware installation. It should give you our "Atlantis Institute of Science" demo installation, exactly as you see it at . $ make load-demo-records ## optional Optionally, load some demo records to be able to test indexing and searching of your local demo CDSware installation. $ make remove-demo-records ## optional Optionally, remove the demo records loaded in the previous step but otherwise keep the demo collection, submit, format etc configurations that you may reuse and modify for production purposes. $ make drop-demo-site ## optional Optionally, drop also all the demo configuration so that you'll have a blank CDSware system for your production purposes. 
$ netscape http://webserver.domain.com/DEMO/admin/ ## optional Optionally, do further runtime configuration of the CDSware, like definition of data collections, document types, document formats, word file tables, etc. This configuration step is optional, and is referred to as "runtime configuration step c)" in the elucidative explanatory commentary below. 3. Configuration philosophy explained and elucidated ==================================================== As you could see from the above, the configuration of the CDSware is threefold: (a) pre-compile time configuration phase [uses command line options / while doing "configure"] (b) compile time configuration phase [uses WML / after "configure", while doing "make && make install"] (c) runtime configuration phase [uses MySQL / after "make install", while doing "netscape http://webserver.domain.com/DEMO/admin/"] What is the difference, and why? (a) pre-compile time configuration phase [uses command line options / while doing "configure"] This configures essential CDSware parameters that makes your CDSware copy installable and runable. The essential parameters include: general CDSware installation directory containing (among others) binaries, libraries, and log and cache directories; install directory for Web scripts and pages; and MySQL user and server credentials. This configuration step uses standard GNU autoconf approach, i.e. you will run the standard "configure" script. Note that the only arguments that CDSware takes into consideration are the general "--prefix" one and CDSware-specific "--with-foo" arguments, see the end of "configure --help" output. This configuration step is mandatory. Without knowing theses essential parameters there is nothing to install and nothing to run. Usually, you do this step only once. 
(b) compile time configuration phase [uses WML / after "configure", while doing "make && make install"] Optionally, you may choose to influence CDSware behaviour, to set up CDSware system name, to choose its default parameters, to change its look and feel, to add your local web pages, etc. This configuration step uses WML, the Website META Language. The most important configuration file is "config/config.wml" that you can edit at your will. Optionally, if you are an advanced user, you may edit other WML files in the distribution tree. After that, when you type "make", the CDSware pages will be pre-generated. We prefer that this configuration step is done during compile-time and not runtime, because of multiple reasons: (i) the pre-generated pages impose less load on the web server and the database server and so they are served faster to the end user; (ii) we use several different languages (Python, PHP) and by using independent compile-time configuration language we can share the same configuration variables across heterogeneous languages; (iii) use of WML and page templates enables you to easily change anything in the CDSware system, even into deep levels. If you are changing parameters and/or look and feel of CDSware pages, you may want to repeat these step several times: $ vi config/config.wml $ make drop-tables $ make create-tables $ make create-demo-site $ make load-demo-records $ netscape http://webserver.domain.com/DEMO/ [...] to see what it brings, until you are satisfied with the result. (Note that you may as well choose to change these parameters inside the CDSware library configuration files later on, during runtime, at least for non-static Python and PHP pages.) 
(c) runtime configuration phase [uses MySQL / after "make install", while doing "netscape http://webserver.domain.com/DEMO/admin/"] Optionally, you will most probably want to define specific data collections, to configure submit and search page for those collections, to specify search options and word files to search in, formats how to display data, etc. This configuration step uses MySQL configuration tables and is done during the runtime, for your convenience. It means that after previous configuration step (b), and after successful "make install", if you are happy with its result you no longer edit WML files within the CDSware source tree but rather configure "fully running" CDSware installation via its Administration web interface. Usually, you will do this step many times in the future, to tweak the running installation, to add new collections and data types, etc. (Note that if you want to change something "deeper" in a running CDSware installation, such as look and feel of pages, or to add some new pages, then you may want to edit library configuration files for Python and PHP non-static pages, as noted above. But if you want to do really "deep" changes, you need to go back to WML source, so you may want to leave your customized copy of the CDSware WML source tree around.) We hope that this explains why we have chosen this three-level configuration model, and that you will find it convenient in real life. Good luck, and thanks for choosing the CERN Document Server Software. - CDS Development Group diff --git a/modules/bibconvert/bin/bibconvert.in b/modules/bibconvert/bin/bibconvert.in index 99672dfc4..b031c454e 100644 --- a/modules/bibconvert/bin/bibconvert.in +++ b/modules/bibconvert/bin/bibconvert.in @@ -1,188 +1,186 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibConvert tool to convert bibliographic records from any format to any format.""" __version__ = "$Id$" try: import fileinput import string import os import re import sys import time import getopt from time import gmtime, strftime, localtime import os.path except ImportError, e: print "Error: %s" % e import sys sys.exit(1) try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.search_engine import perform_request_search from cdsware.config import * - from cdsware import bibconvert + import cdsware.bibconvert except ImportError, e: print "Error: %s" % e sys.exit(1) ### MAIN ### ar_ = [] -conv_setting = bibconvert.set_conv() -sysno = bibconvert.generate("DATE(%w%H%M%S)") -sysno500 = bibconvert.generate("DATE(%w%H%M%S)") +conv_setting = cdsware.bibconvert.set_conv() +sysno = cdsware.bibconvert.generate("DATE(%w%H%M%S)") +sysno500 = cdsware.bibconvert.generate("DATE(%w%H%M%S)") separator = "" tcounter = 0 source_data = "" query_string = "" match_mode = -1 begin_record_header = "" ending_record_footer = "" output_rec_sep = "" begin_header = "" ending_footer = "" oai_identifier_from = 1 opts, args = getopt.getopt(sys.argv[1:],"c:d:hVl:o:b:e:B:E:s:m:C:", [ "config", "directory", "help", "version", "length", "oai", "header", "footer", 
"record-header", "record-footer", "separator", "match", "config-alt" ]) # get options and arguments dirmode = 0 Xcount = 0 for opt, opt_value in opts: if opt in ["-c", "--config"]: extract_tpl = opt_value - extract_tpl_parsed = bibconvert.parse_common_template(extract_tpl,1) + extract_tpl_parsed = cdsware.bibconvert.parse_common_template(extract_tpl,1) source_tpl = opt_value - source_tpl_parsed = bibconvert.parse_common_template(source_tpl,2) + source_tpl_parsed = cdsware.bibconvert.parse_common_template(source_tpl,2) target_tpl = opt_value - target_tpl_parsed = bibconvert.parse_common_template(target_tpl,3) + target_tpl_parsed = cdsware.bibconvert.parse_common_template(target_tpl,3) elif opt in ["-d", "--directory"]: source_data = opt_value source_data = source_data + "/" extract_tpl = "/" extract_tpl_parsed = None dirmode = 1 elif opt in ["-h", "--help"]: - bibconvert.printInfo() + cdsware.bibconvert.printInfo() sys.exit(0) elif opt in ["-V", "--version"]: print __version__ sys.exit(0) elif opt in ["-l", "--length"]: try: conv_setting[0] = string.atoi(opt_value) except ValueError, e: conv_setting[0] = 1 elif opt in ["-o", "--oai"]: try: oai_identifier_from = string.atoi(opt_value) except ValueError, e: oai_identifier_from = 1 elif opt in ["-b", "--header"]: begin_header = opt_value elif opt in ["-e", "--footer"]: ending_footer = opt_value elif opt in ["-B", "--record-header"]: begin_record_header = opt_value elif opt in ["-E", "--record-footer"]: ending_record_footer = opt_value elif opt in ["-s", "--separator"]: separator = opt_value elif opt in ["-t", "--output_separator"]: output_rec_sep = opt_value elif opt in ["-m", "--match"]: match_mode = string.atoi(opt_value[0:1]) query_string = opt_value[1:] elif opt in ["-C", "--config-alt"]: if opt_value[0:1] == "x": extract_tpl = opt_value[1:] - extract_tpl_parsed = bibconvert.parse_template(extract_tpl) + extract_tpl_parsed = cdsware.bibconvert.parse_template(extract_tpl) if opt_value[0:1] == "t": target_tpl = 
opt_value[1:] - target_tpl_parsed = bibconvert.parse_template(target_tpl) + target_tpl_parsed = cdsware.bibconvert.parse_template(target_tpl) if opt_value[0:1] == "s": source_tpl = opt_value[1:] - source_tpl_parsed = bibconvert.parse_template(source_tpl) + source_tpl_parsed = cdsware.bibconvert.parse_template(source_tpl) ar_.append(dirmode) ar_.append(Xcount) ar_.append(conv_setting) ar_.append(sysno) ar_.append(sysno500) ar_.append(separator) ar_.append(tcounter) ar_.append(source_data) ar_.append(query_string) ar_.append(match_mode) ar_.append(begin_record_header) ar_.append(ending_record_footer) ar_.append(output_rec_sep) ar_.append(begin_header) ar_.append(ending_footer) ar_.append(oai_identifier_from) ar_.append(source_tpl) ar_.append(source_tpl_parsed) ar_.append(target_tpl) ar_.append(target_tpl_parsed) ar_.append(extract_tpl) ar_.append(extract_tpl_parsed) -bibconvert.convert(ar_) +cdsware.bibconvert.convert(ar_) diff --git a/modules/bibconvert/lib/bibconvert.py b/modules/bibconvert/lib/bibconvert.py index c54a4844c..568e895be 100644 --- a/modules/bibconvert/lib/bibconvert.py +++ b/modules/bibconvert/lib/bibconvert.py @@ -1,1479 +1,1479 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""BibConvert tool to convert bibliographic records from any format to any format.""" __version__ = "$Id$" try: import fileinput import string import os import re import sys import time import getopt from time import gmtime, strftime, localtime import os.path except ImportError, e: print "Error: %s" % e import sys sys.exit(1) try: - from config import * - from search_engine import perform_request_search - from oai_repository_config import oaiidprefix + from cdsware.config import * + from cdsware.search_engine import perform_request_search + from cdsware.oai_repository_config import oaiidprefix except ImportError, e: print "Error: %s" % e sys.exit(1) ### Matching records with database content def parse_query_string(query_string): """Parse query string, e.g.: Input: 245__a::REP(-, )::SHAPE::SUP(SPACE, )::MINL(4)::MAXL(8)::EXPW(PUNCT)::WORDS(4,L)::SHAPE::SUP(SPACE, )||700__a::MINL(2)::REP(COMMA,). Output:[['245__a','REP(-,)','SHAPE','SUP(SPACE, )','MINL(4)','MAXL(8)','EXPW(PUNCT)','WORDS(4,L)','SHAPE','SUP(SPACE, )'],['700__a','MINL(2)','REP(COMMA,)']] """ query_string_out = [] query_string_out_in = [] query_string_split_1 = query_string.split('||') for item_1 in query_string_split_1: query_string_split_2 = item_1.split('::') query_string_out_in = [] for item in query_string_split_2: query_string_out_in.append(item) query_string_out.append(query_string_out_in) return query_string_out def set_conv(): """ bibconvert common settings ======================= minimal length of output line = 1 maximal length of output line = 4096 """ conv_setting = [ 1, 4096 ] return conv_setting def get_pars(fn): "Read function and its parameters into list" out = [] out.append(re.split('\(|\)', fn)[0]) out.append(re.split(',', re.split('\(|\)', fn)[1])) return out def append_to_output_file(filename, output): "bibconvert output file creation by output line" try: file = open(filename, 'a') file.write(output) file.close() except IOError, e: exit_on_error("Cannot write into %s" % filename) 
return 1 def sub_keywd(out): "bibconvert keywords literal substitution" out = string.replace(out, "EOL", "\n") out = string.replace(out, "_CR_", "\r") out = string.replace(out, "_LF_", "\n") out = string.replace(out, "\\", '\\') out = string.replace(out, "\r", '\r') out = string.replace(out, "BSLASH", '\\') out = string.replace(out, "COMMA", ',') out = string.replace(out, "LEFTB", '[') out = string.replace(out, "RIGHTB", ']') out = string.replace(out, "LEFTP", '(') out = string.replace(out, "RIGHTP", ')') return out def check_split_on(data_item_split, sep, tpl_f): """ bibconvert conditional split with following conditions =================================================== ::NEXT(N,TYPE,SIDE) - next N chars are of the TYPE having the separator on the SIDE ::PREV(N,TYPE,SIDE) - prev.N chars are of the TYPE having the separator on the SIDE """ fn = get_pars(tpl_f)[0] par = get_pars(tpl_f)[1] done = 0 while (done == 0): if ( (( fn == "NEXT" ) and ( par[2]=="R" )) or (( fn == "PREV" ) and ( par[2]=="L" )) ): test_value = data_item_split[0][-(string.atoi(par[0])):] elif ( ((fn == "NEXT") and ( par[2]=="L")) or ((fn == "PREV") and ( par[2]=="R")) ): test_value = data_item_split[1][:(string.atoi(par[0]))] data_item_split_tmp = [] if ((FormatField(test_value, "SUP(" + par[1] + ",)") != "") or (len(test_value) < string.atoi(par[0]))): data_item_split_tmp = data_item_split[1].split(sep, 1) if(len(data_item_split_tmp)==1): done = 1 data_item_split[0] = data_item_split[0] + sep + data_item_split_tmp[0] data_item_split[1] = "" else: data_item_split[0] = data_item_split[0] + sep + data_item_split_tmp[0] data_item_split[1] = data_item_split_tmp[1] else: done = 1 return data_item_split def get_subfields(data, subfield, src_tpl): "Get subfield according to the template" out = [] for data_item in data: found = 0 for src_tpl_item in src_tpl: if (src_tpl_item[:2] == "<:"): if (src_tpl_item[2:-2] == subfield): found = 1 else: sep_in_list = src_tpl_item.split("::") sep = sep_in_list[0] 
data_item_split = data_item.split(sep, 1) if (len(data_item_split)==1): data_item = data_item_split[0] else: if (len(sep_in_list) > 1): data_item_split = check_split_on(data_item.split(sep, 1), sep_in_list[0], sep_in_list[1]) if(found == 1): data_item = data_item_split[0] else: data_item = string.join(data_item_split[1:], sep) out.append(data_item) return out def exp_n(word): "Replace newlines and carriage return's from string." out = "" for ch in word: if ((ch != '\n') and (ch != '\r')): out = out + ch return out def exp_e(list): "Expunge empty elements from a list" out = [] for item in list: item = exp_n(item) if ((item != '\r\n' and item != '\r' and item != '\n' and item !="" and len(item)!=0)): out.append(item) return out def sup_e(word): "Replace spaces" out = "" for ch in word: if (ch != ' '): out = out + ch return out def select_line(field_code, list): "Return appropriate item from a list" out = [''] for field in list: field[0] = sup_e(field[0]) field_code = sup_e(field_code) if (field[0] == field_code): out = field[1] return out def parse_field_definition(source_field_definition): "Create list of source_field_definition" word_list = [] out = [] word = "" counter = 0 if (len(source_field_definition.split("---"))==4): out = source_field_definition.split("---") else: element_list_high = source_field_definition.split("<:") for word_high in element_list_high: element_list_low = word_high.split(':>') for word_low in element_list_low: word_list.append(word_low) word_list.append(":>") word_list.pop() word_list.append("<:") word_list.pop() for item in word_list: word = word + item if (item == "<:"): counter = counter + 1 if (item == ":>"): counter = counter - 1 if counter == 0: out.append(word) word = "" return out def parse_template(template): """ bibconvert parse template ====================== in - template filename out - [ [ field_code , [ field_template_parsed ] , [] ] """ out = [] for field_def in read_file(template, 1): field_tpl_new = [] if 
((len(field_def.split("---", 1)) > 1) and (field_def[:1] != "#")): field_code = field_def.split("---", 1)[0] field_tpl = parse_field_definition(field_def.split("---", 1)[1]) field_tpl_new = field_tpl field_tpl = exp_e(field_tpl_new) out_data = [field_code, field_tpl] out.append(out_data) return out def parse_common_template(template, part): """ bibconvert parse template ========================= in - template filename out - [ [ field_code , [ field_template_parsed ] , [] ] """ out = [] counter = 0 for field_def in read_file(template, 1): if (exp_n(field_def)[:3] == "==="): counter = counter + 1 elif (counter == part): field_tpl_new = [] if ((len(field_def.split("---", 1)) > 1) and (field_def[:1]!="#")): field_code = field_def.split("---", 1)[0] field_tpl = parse_field_definition(field_def.split("---", 1)[1]) field_tpl_new = field_tpl field_tpl = exp_e(field_tpl_new) out_data = [field_code, field_tpl] out.append(out_data) return out def parse_input_data_f(source_data_open, source_tpl): """ bibconvert parse input data ======================== in - input source data location (filehandle) source data template source_field_code list of source field codes source_field_data list of source field data values (repetitive fields: one occurrence per line) out - [ [ source_field_code , [ source_field_data ] ] , [] ] source_data_template entry - field_code---[const]<:subfield_code:>[const][<:subfield_code:>][] destination_template entry - [::GFF()]---[const]<:field_code::subfield_code[::FF()]:>[] input data file; by line: - fieldcode value """ global separator out = [['',[]]] count = 0 values = [] while (count < 1): line = source_data_open.readline() if (line == ""): return(-1) line_split = line.split(" ", 1) if (re.sub("\s", "", line) == separator): count = count + 1 if (len(line_split) == 2): field_code = line_split[0] field_value = exp_n(line_split[1]) values.append([field_code, field_value]) item_prev = "" stack = [''] for item in values: if ((item[0]==item_prev)or(item_prev
== "")): stack.append(item[1]) item_prev = item[0] else: out.append([item_prev, stack]) item_prev = item[0] stack = [] stack.append(item[1]) try: if (stack[0] != ""): if (out[0][0]==""): out = [] out.append([field_code, stack]) except IndexError, e: out = out return out def parse_input_data_fx(source_tpl): """ bibconvert parse input data ======================== in - input source data location (filehandle) source data template source_field_code list of source field codes source_field_data list of source field data values (repetitive fields: one occurrence per line) out - [ [ source_field_code , [ source_field_data ] ] , [] ] extraction_template_entry - input data file - specified by extract_tpl """ global separator count = 0 record = "" field_data_1_in_list = [] out = [['',[]]] while (count <10): line = sys.stdin.readline() if (line == ""): count = count + 1 if (record == "" and count): return (-1) if (re.sub("\s", "", line) == separator): count = count + 10 else: record = record + line for field_defined in extract_tpl_parsed: try: field_defined[1][0] = sub_keywd(field_defined[1][0]) field_defined[1][1] = sub_keywd(field_defined[1][1]) except IndexError, e: field_defined = field_defined try: field_defined[1][2] = sub_keywd(field_defined[1][2]) except IndexError, e: field_defined = field_defined field_data_1 ="" if ((field_defined[1][0][0:2] == '//') and (field_defined[1][0][-2:] == '//')): field_defined_regexp = field_defined[1][0][2:-2] try: #### if (len(re.split(field_defined_regexp, record)) == 1): field_data_1 = "" field_data_1_in_list = [] else: field_data_1_tmp = re.split(field_defined_regexp, record, 1)[1] field_data_1_in_list = field_data_1_tmp.split(field_defined_regexp) except IndexError, e: field_data_1 = "" else: try: if (len(record.split(field_defined[1][0])) == 1): field_data_1 = "" field_data_1_in_list = [] else: field_data_1_tmp = record.split(field_defined[1][0], 1)[1] field_data_1_in_list = field_data_1_tmp.split(field_defined[1][0]) except
IndexError, e: field_data_1 = "" spliton = [] outvalue = "" field_data_2 = "" field_data = "" try: if ((field_defined[1][1])=="EOL"): spliton = ['\n'] elif ((field_defined[1][1])=="MIN"): spliton = ['\n'] elif ((field_defined[1][1])=="MAX"): for item in extract_tpl_parsed: try: spliton.append(item[1][0]) except IndexError, e: spliton = spliton elif (field_defined[1][1][0:2] == '//') and (field_defined[1][1][-2:] == '//'): spliton = [field_defined[1][1][2:-2]] else: spliton = [field_defined[1][1]] except IndexError,e : spliton = "" outvalues = [] for field_data in field_data_1_in_list: outvalue = "" for splitstring in spliton: field_data_2 = "" if (len(field_data.split(splitstring))==1): if (outvalue == ""): field_data_2 = field_data else: field_data_2 = outvalue else: field_data_2 = field_data.split(splitstring)[0] outvalue = field_data_2 field_data = field_data_2 outvalues.append(outvalue) outvalues = exp_e(outvalues) if (len(outvalues) > 0): if (out[0][0]==""): out = [] outstack = [] if (len(field_defined[1])==3): spliton = [field_defined[1][2]] if (field_defined[1][2][0:2] == '//') and (field_defined[1][2][-2:] == '//'): spliton = [field_defined[1][2][2:-2]] for item in outvalues: stack = re.split(spliton[0], item) for stackitem in stack: outstack.append(stackitem) else: outstack = outvalues out.append([field_defined[0], outstack]) return out def parse_input_data_d(source_data, source_tpl): """ bibconvert parse input data ======================== in - input source data location (directory) source data template source_field_code list of source field codes source_field_data list of source field data values (repetitive fields: one occurrence per line) out - [ [ source_field_code , [ source_field_data ] ] , [] ] source_data_template entry - field_code---[const]<:subfield_code:>[const][<:subfield_code:>][] destination_template entry - [::GFF()]---[const]<:field_code::subfield_code[::FF()]:>[] input data dir; by file: - fieldcode value per line """ out = [] for
source_field_tpl in read_file(source_tpl, 1): source_field_code = source_field_tpl.split("---")[0] source_field_data = read_file(source_data + source_field_code, 0) source_field_data = exp_e(source_field_data) out_data = [source_field_code, source_field_data] out.append(out_data) return out def sub_empty_lines(value): out = re.sub('\n\n+', '', value) return out def set_par_defaults(par1, par2): "Set default parameters where not defined" par_new_in_list = par2.split(",") i = 0 out = [] for par in par_new_in_list: if (len(par1)>i): if (par1[i] == ""): out.append(par) else: out.append(par1[i]) else: out.append(par) i = i + 1 return out def generate(keyword): """ bibconvert generated values: ============================ SYSNO() - generate date as '%w%H%M%S' WEEK(N) - generate date as '%V' with shift (N) DATE(format) - generate date in specified FORMAT VALUE(value) - enter value literally OAI() - generate oai_identifier, starting value given at command line as -o """ out = keyword fn = keyword + "()" par = get_pars(fn)[1] fn = get_pars(fn)[0] par = set_par_defaults(par, "") if (fn == "SYSNO"): out = sysno500 if (fn == "SYSNO330"): out = sysno if (fn == "WEEK"): par = set_par_defaults(par, "0") out = "%02d" % (string.atoi(strftime("%V", localtime())) + string.atoi(par[0])) if (string.atoi(out)<0): out = "00" if (fn == "VALUE"): par = set_par_defaults(par, "") out = par[0] if (fn == "DATE"): par = set_par_defaults(par, "%w%H%M%S," + "%d" % set_conv()[1]) out = strftime(par[0],localtime()) out = out[:string.atoi(par[1])] if (fn == "XDATE"): par = set_par_defaults(par,"%w%H%M%S," + ",%d" % set_conv()[1]) out = strftime(par[0],localtime()) out = par[1] + out[:string.atoi(par[2])] if (fn == "OAI"): out = "%s:%d" % (oaiidprefix,tcounter + oai_identifier_from) return out def read_file(filename,exception): "Read file into list" out = [] if (os.path.isfile(filename)): file = open(filename,'r') out = file.readlines() file.close() else: if exception: exit_on_error("Cannot access
file: %s" % filename) return out def crawl_KB(filename,value,mode): """ bibconvert look-up value in KB_file in one of following modes: =========================================================== 1 - case sensitive / match (default) 2 - not case sensitive / search 3 - case sensitive / search 4 - not case sensitive / match 5 - case sensitive / search (in KB) 6 - not case sensitive / search (in KB) 7 - case sensitive / search (reciprocal) 8 - not case sensitive / search (reciprocal) 9 - replace by _DEFAULT_ only R - not case sensitive / search (reciprocal) (8) replace """ if (os.path.isfile(filename) != 1): pathtmp = string.split(extract_tpl,"/") pathtmp.pop() path = string.join(pathtmp,"/") filename = path + "/" + filename if (os.path.isfile(filename)): file_to_read = open(filename,"r") file_read = file_to_read.readlines() for line in file_read: code = string.split(line,"---") if (mode == "2"): value_to_cmp = string.lower(value) code[0] = string.lower(code[0]) if ((len(string.split(value_to_cmp,code[0])) > 1)or(code[0]=="_DEFAULT_")): value = code[1] return value elif ((mode == "3") or (mode == "0")): if ((len(string.split(value,code[0])) > 1)or(code[0]=="_DEFAULT_")): value = code[1] return value elif (mode == "4"): value_to_cmp = string.lower(value) code[0] = string.lower(code[0]) if ((code[0] == value_to_cmp)or(code[0]=="_DEFAULT_")): value = code[1] return value elif (mode == "5"): if ((len(string.split(code[0],value)) > 1)or(code[0]=="_DEFAULT_")): value = code[1] return value elif (mode == "6"): value_to_cmp = string.lower(value) code[0] = string.lower(code[0]) if ((len(string.split(code[0],value_to_cmp)) > 1)or(code[0]=="_DEFAULT_")): value = code[1] return value elif (mode == "7"): if ((len(string.split(code[0],value)) > 1)or(len(string.split(value,code[0])) > 1)or(code[0]=="_DEFAULT_")): value = code[1] return value elif (mode == "8"): value_to_cmp = string.lower(value) code[0] = string.lower(code[0]) if ((len(string.split(code[0],value_to_cmp)) > 
1)or(len(string.split(value_to_cmp,code[0]))>1)or(code[0]=="_DEFAULT_")): value = code[1] return value elif (mode == "9"): if (code[0]=="_DEFAULT_"): value = code[1] return value elif (mode == "R"): value_to_cmp = string.lower(value) code[0] = string.lower(code[0]) if ((len(string.split(code[0],value_to_cmp)) > 1)or(len(string.split(value_to_cmp,code[0]))>1)or(code[0]=="_DEFAULT_")): value = value.replace(code[0],code[1]) else: if ((code[0] == value)or(code[0]=="_DEFAULT_")): value = code[1] return value return value def FormatField(value, fn): """ bibconvert formatting functions: ================================ ADD(prefix,suffix) - add prefix/suffix KB(kb_file,mode) - look up in kb_file and replace value ABR(N,suffix) - abbreviate to N places with suffix ABRX(N,suffix) - abbreviate only words longer than N places, with suffix ABRW() - abbreviate word (limit from right) REP(x,y) - replace SUP(type) - remove characters of certain TYPE LIM(n,side) - limit to n letters from L/R LIMW(string,side) - L/R after split on string WORDS(n,side) - limit to n words from L/R IF(value,valueT,valueF) - replace on IF condition MINL(n) - replace words shorter than n MINLW(n) - replace the whole value if shorter than n MAXL(n) - replace words longer than n EXPW(type) - replace word from value containing TYPE EXP(STR,0/1) - replace word from value containing string NUM() - take only digits in given string SHAPE() - remove extra space UP() - to uppercase DOWN() - to lowercase CAP() - capitalize each word SPLIT(n,h,str,from) - only for final Aleph field, i.e.
AB , maintain whole words SPLITW(sep,h,str,from) - only for final Aleph field, split on string CONF(field,value,0/1) - confirm validity of output line (check other field) CONFL(substr,0/1) - confirm validity of output line (check field being processed) CUT(prefix,postfix) - remove prefix/postfix substring RANGE(MIN,MAX) - select items in repetitive fields RE(regexp) - regular expressions bibconvert character TYPES ========================== ALPHA - alphabetic NALPHA - not alphabetic NUM - numeric NNUM - not numeric ALNUM - alphanumeric NALNUM - non-alphanumeric LOWER - lowercase UPPER - uppercase PUNCT - punctuation NPUNCT - non-punctuation SPACE - space """ global data_parsed out = value fn = fn + "()" par = get_pars(fn)[1] fn = get_pars(fn)[0] regexp = "//" NRE = len(regexp) value = sub_keywd(value) par_tmp = [] for item in par: item = sub_keywd(item) par_tmp.append(item) par = par_tmp if (fn == "RE"): new_value = "" par = set_par_defaults(par,".*,0") if (re.search(par[0],value) and (par[1] == "0")): new_value = value out = new_value if (fn == "KB"): new_value = "" par = set_par_defaults(par,"KB,0") new_value = crawl_KB(par[0],value,par[1]) out = new_value elif (fn == "ADD"): par = set_par_defaults(par,",") out = par[0] + value + par[1] elif (fn == "ABR"): par = set_par_defaults(par,"1,.") out = value[:string.atoi(par[0])] + par[1] elif (fn == "ABRW"): tmp = FormatField(value,"ABR(1,.)") tmp = tmp.upper() out = tmp elif (fn == "ABRX"): par = set_par_defaults(par,",") toout = [] tmp = value.split(" ") for wrd in tmp: if (len(wrd) > string.atoi(par[0])): wrd = wrd[:string.atoi(par[0])] + par[1] toout.append(wrd) out = string.join(toout," ") elif (fn == "SUP"): par = set_par_defaults(par,",") if(par[0]=="NUM"): out = re.sub('\d+',par[1],value) if(par[0]=="NNUM"): out = re.sub('\D+',par[1],value) if(par[0]=="ALPHA"): out = re.sub('[a-zA-Z]+',par[1],value) if(par[0]=="NALPHA"): out = re.sub('[^a-zA-Z]+',par[1],value) if((par[0]=="ALNUM")or(par[0]=="NPUNCT")): out =
re.sub('\w+',par[1],value) if(par[0]=="NALNUM"): out = re.sub('\W+',par[1],value) if(par[0]=="PUNCT"): out = re.sub('\W+',par[1],value) if(par[0]=="LOWER"): out = re.sub('[a-z]+',par[1],value) if(par[0]=="UPPER"): out = re.sub('[A-Z]+',par[1],value) if(par[0]=="SPACE"): out = re.sub('\s+',par[1],value) elif (fn == "LIM"): par = set_par_defaults(par,",") if (par[1] == "L"): out = value[(len(value) - string.atoi(par[0])):] if (par[1] == "R"): out = value[:string.atoi(par[0])] elif (fn == "LIMW"): par = set_par_defaults(par,",") if (par[0]!= ""): if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp): par[0] = par[0][NRE:-NRE] par[0] = re.search(par[0],value).group() tmp = value.split(par[0]) if (par[1] == "L"): out = par[0] + tmp[1] if (par[1] == "R"): out = tmp[0] + par[0] elif (fn == "WORDS"): tmp2 = [value] par = set_par_defaults(par,",") if (par[1] == "R"): tmp = value.split(" ") tmp2 = [] i = 0 while (i < string.atoi(par[0])): tmp2.append(tmp[i]) i = i + 1 if (par[1] == "L"): tmp = value.split(" ") tmp.reverse() tmp2 = [] i = 0 while (i < string.atoi(par[0])): tmp2.append(tmp[i]) i = i + 1 tmp2.reverse() out = string.join(tmp2, " ") elif (fn == "MINL"): par = set_par_defaults(par,"1") tmp = value.split(" ") tmp2 = [] i = 0 for wrd in tmp: if (len(wrd) >= string.atoi(par[0])): tmp2.append(wrd) out = string.join(tmp2, " ") elif (fn == "MINLW"): par = set_par_defaults(par,"1") if (len(value) >= string.atoi(par[0])): out = value else: out = "" elif (fn == "MAXL"): par = set_par_defaults(par,"4096") tmp = value.split(" ") tmp2 = [] i = 0 for wrd in tmp: if (len(wrd) <= string.atoi(par[0])): tmp2.append(wrd) out = string.join(tmp2, " ") elif (fn == "REP"): par = set_par_defaults(par,",") if (par[0]!= ""): if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp): par[0] = par[0][NRE:-NRE] out = re.sub(par[0],par[1],value) else: out = value.replace(par[0],par[1]) elif (fn == "SHAPE"): if (value != ""): out = value.strip() elif (fn == "UP"): out = value.upper() elif (fn == "DOWN"):
out = value.lower() elif (fn == "CAP"): tmp = value.split(" ") out2 = [] for wrd in tmp: wrd2 = wrd.capitalize() out2.append(wrd2) out = string.join(out2," ") elif (fn == "IF"): par = set_par_defaults(par,",,") N = 0 while N < 3: if (par[N][0:NRE] == regexp and par[N][-NRE:] == regexp): par[N] = par[N][NRE:-NRE] par[N] = re.search(par[N],value).group() N += 1 if (value == par[0]): out = par[1] else: out = par[2] if (out == "ORIG"): out = value elif (fn == "EXP"): par = set_par_defaults(par,",0") if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp): par[0] = par[0][NRE:-NRE] par[0] = re.search(par[0],value).group() tmp = value.split(" ") out2 = [] for wrd in tmp: if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp): par[0] = par[0][NRE:-NRE] if ((re.search(par[0],wrd).group() == wrd) and (par[1]=="1")): out2.append(wrd) if ((re.search(par[0],wrd).group() != wrd) and (par[1]=="0")): out2.append(wrd) else: if ((len(wrd.split(par[0])) == 1)and(par[1]=="1")): out2.append(wrd) if ((len(wrd.split(par[0])) != 1)and(par[1]=="0")): out2.append(wrd) out = string.join(out2," ") elif (fn == "EXPW"): par = set_par_defaults(par,",0") tmp = value.split(" ") out2 = [] for wrd in tmp: if ((FormatField(wrd,"SUP(" + par[0] + ")") == wrd)and(par[1]=="1")): out2.append(wrd) if ((FormatField(wrd,"SUP(" + par[0] + ")") != wrd)and(par[1]=="0")): out2.append(wrd) out = string.join(out2," ") elif (fn == "SPLIT"): par = set_par_defaults(par,"%d,0,,1" % conv_setting[1]) length = string.atoi(par[0]) + (string.atoi(par[1])) header = string.atoi(par[1]) headerplus = par[2] starting = string.atoi(par[3]) line = "" tmp2 = [] tmp3 = [] tmp = value.split(" ") linenumber = 1 if (linenumber >= starting): tmp2.append(headerplus) line = line + headerplus for wrd in tmp: line = line + " " + wrd tmp2.append(wrd) if (len(line) > length): linenumber = linenumber + 1 line = tmp2.pop() toout = string.join(tmp2) tmp3.append(toout) tmp2 = [] line2 = value[:header] if (linenumber >= starting): line3 = 
line2 + headerplus + line else: line3 = line2 + line line = line3 tmp2.append(line) tmp3.append(line) out = string.join(tmp3,"\n") out = FormatField(out,"SHAPE()") elif (fn == "SPLITW"): par = set_par_defaults(par,",0,,1") if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp): par[0] = par[0][NRE:-NRE] str = re.search(par[0], value) header = string.atoi(par[1]) headerplus = par[2] starting = string.atoi(par[3]) counter = 1 tmp2 = [] tmp = re.split(par[0],value) last = tmp.pop() for wrd in tmp: counter = counter + 1 if (counter >= starting): tmp2.append(value[:header] + headerplus + wrd + str) else: tmp2.append(value[:header] + wrd + str) if (last != ""): counter = counter + 1 if (counter >= starting): tmp2.append(value[:header] + headerplus + last) else: tmp2.append(value[:header] + last) out = string.join(tmp2,"\n") elif (fn == "CONF"): par = set_par_defaults(par,",,1") found = 0 par1 = "" data = select_line(par[0],data_parsed) for line in data: if (par[1][0:NRE] == regexp and par[1][-NRE:] == regexp): par1 = par[1][NRE:-NRE] else: par1 = par[1] if (par1 == ""): if (line == ""): found = 1 elif (len(re.split(par1,line)) > 1 ): found = 1 if ((found == 1)and(string.atoi(par[2]) == 1)): out = value if ((found == 1)and(string.atoi(par[2]) == 0)): out = "" if ((found == 0)and(string.atoi(par[2]) == 1)): out = "" if ((found == 0)and(string.atoi(par[2]) == 0)): out = value return out elif (fn == "CONFL"): par = set_par_defaults(par,",1") if (par[0][0:NRE] == regexp and par[0][-NRE:] == regexp): par[0] = par[0][NRE:-NRE] if (re.search(par[0],value)): if (string.atoi(par[1]) == 1): out = value else: out = "" else: if (string.atoi(par[1]) == 1): out = "" else: out = value return out elif (fn == "CUT"): par = set_par_defaults(par,",") left = value[:len(par[0])] right = value[-(len(par[1])):] if (left == par[0]): out = out[len(par[0]):] if (right == par[1]): out = out[:-(len(par[1]))] return out elif (fn == "NUM"): tmp = re.findall('\d',value) out = string.join(tmp,"") return out
def printInfo(): "Print usage information when not enough parameters are given" print """ BibConvert data convertor Usage: bibconvert [options] -ctemplate.cfg < input.dat Options: -c'config' configuration templates file -d'directory' source_data fields are located in separate files in 'directory' (one record) -h print this help -V print version number -l'length' minimum line length (default = 1) -o'value' OAI identifier starts with specified value (default = 1) -b'file header' insert file header -e'file footer' insert file footer -B'record header' insert record header -E'record footer' insert record footer -s'record separator' record separator, default empty line (EOLEOL) -m0'query_string' match records using query string, output unmatched -m1'query_string' match records using query string, output matched -m2'query_string' match records using query string, output ambiguous -Cx'field extraction template' alternative to -c when configuration is split into several files -Cs'source data template' alternative to -c when configuration is split into several files -Ct'target data template' alternative to -c when configuration is split into several files """ ## Match records with the database content ## def match_in_database(record, query_string): "Check if record is already in the database, using query_string. Returns the list of matching recIDs (empty if none)."
query_string_parsed = parse_query_string(query_string) search_pattern = [] search_field = [] for query_field in query_string_parsed: ind1 = query_field[0][3:4] if ind1 == "_": ind1 = "" ind2 = query_field[0][4:5] if ind2 == "_": ind2 = "" stringsplit = "" % (query_field[0][0:3], ind1, ind2, query_field[0][5:6]) formatting = query_field[1:] record1 = string.split(record, stringsplit) if len(record1) > 1: matching_value = string.split(record1[1],"<")[0] for fn in formatting: matching_value = FormatField(matching_value, fn) search_pattern.append(matching_value) search_field.append(query_field[0]) search_field.append("") search_field.append("") search_field.append("") search_pattern.append("") search_pattern.append("") search_pattern.append("") recID_list = perform_request_search(p1=search_pattern[0],f1=search_field[0],p2=search_pattern[1],f2=search_field[1],p3=search_pattern[2],f3=search_field[2]) return recID_list def parse_query_string(query_string): """Parse query string, e.g.: Input: 245__a::REP(-, )::SHAPE::SUP(SPACE, )::MINL(4)::MAXL(8)::EXPW(PUNCT)::WORDS(4,L)::SHAPE::SUP(SPACE, )||700__a::MINL(2)::REP(COMMA,). 
Output:[['245__a','REP(-, )','SHAPE','SUP(SPACE, )','MINL(4)','MAXL(8)','EXPW(PUNCT)','WORDS(4,L)','SHAPE','SUP(SPACE, )'],['700__a','MINL(2)','REP(COMMA,)']] """ query_string_out = [] query_string_out_in = [] query_string_split_1 = query_string.split('||') for item_1 in query_string_split_1: query_string_split_2 = item_1.split('::') query_string_out_in = [] for item in query_string_split_2: query_string_out_in.append(item) query_string_out.append(query_string_out_in) return query_string_out def exit_on_error(error_message): "Exit when an error occurred" sys.stderr.write("\n bibconvert data convertor\n") sys.stderr.write(" Error: %s\n" % error_message) sys.exit(1) return 0 def create_record(begin_record_header, ending_record_footer, query_string, match_mode, Xcount): "Create output record" global data_parsed out_to_print = "" out = [] field_data_item_LIST = [] ssn5cnt = "%3d" % Xcount sysno = generate("DATE(%w%H%M%S)") sysno500 = generate("XDATE(%w%H%M%S)," + ssn5cnt) for T_tpl_item_LIST in target_tpl_parsed: # the line is printed only if the variables inside are not empty print_line = 0 to_output = [] rows = 1 for field_tpl_item_STRING in T_tpl_item_LIST[1]: DATA = [] if (field_tpl_item_STRING[:2]=="<:"): field_tpl_item_STRING = field_tpl_item_STRING[2:-2] field = field_tpl_item_STRING.split("::")[0] if (len(field_tpl_item_STRING.split("::")) == 1): value = generate(field) to_output.append([value]) else: subfield = field_tpl_item_STRING.split("::")[1] if (field[-1] == "*"): repetitive = 1 field = field[:-1] else: repetitive = 0 if dirmode: DATA = select_line(field,data_parsed) else: DATA = select_line(field,data_parsed) if (repetitive == 0): DATA = [string.join(DATA," ")] SRC_TPL = select_line(field,source_tpl_parsed) try: if (DATA[0] != ""): DATA = get_subfields(DATA,subfield,SRC_TPL) FF = field_tpl_item_STRING.split("::") if (len(FF) > 2): FF = FF[2:] for fn in FF: # DATAFORMATTED = [] if (len(DATA) != 0 and DATA[0] != ""): DATA = get_subfields(DATA,subfield,SRC_TPL)
FF = field_tpl_item_STRING.split("::") if (len(FF) > 2): FF = FF[2:] for fn2 in FF: DATAFORMATTED = [] for item in DATA: item = FormatField(item,fn) DATAFORMATTED.append(item) DATA = DATAFORMATTED if (len(DATA) > rows): rows = len(DATA) if DATA != "": print_line = 1 to_output.append(DATA) except IndexError, e: pass else: to_output.append([field_tpl_item_STRING]) current = 0 default_print = 0 while (current < rows): line_to_print = [] for item in to_output: if (item==[]): item =[''] if (len(item) <= current): printout = item[0] else: printout = item[current] line_to_print.append(printout) output = exp_n(string.join(line_to_print,"")) global_formatting_functions = T_tpl_item_LIST[0].split("::")[1:] for GFF in global_formatting_functions: if (GFF[:5] == "RANGE"): parR = get_pars(GFF)[1] parR = set_par_defaults(parR,"MIN,MAX") if (parR[0]!="MIN"): if (string.atoi(parR[0]) > (current+1)): output = "" if (parR[1]!="MAX"): if (string.atoi(parR[1]) < (current+1)): output = "" elif (GFF[:4] == "DEFP"): default_print = 1 else: output = FormatField(output,GFF) if ((len(output) > set_conv()[0] and print_line == 1) or default_print): out_to_print = out_to_print + output + "\n" current = current + 1 ### out_flag = 0 if query_string: recID = match_in_database(out_to_print, query_string) if len(recID) == 1 and match_mode == 1: ctrlfield = "%d" % (recID[0]) out_to_print = ctrlfield + "\n" + out_to_print out_flag = 1 if len(recID) == 0 and match_mode == 0: out_flag = 1 if len(recID) > 1 and match_mode == 2: out_flag = 1 if out_flag or match_mode == -1: if begin_record_header != "": out_to_print = begin_record_header + "\n" + out_to_print if ending_record_footer != "": out_to_print = out_to_print + "\n" + ending_record_footer else: out_to_print = "" return out_to_print def convert(ar_): global dirmode, Xcount, conv_setting, sysno, sysno500, separator, tcounter, source_data, query_string, match_mode, begin_record_header ,ending_record_footer,output_rec_sep, begin_header, 
ending_footer, oai_identifier_from, source_tpl, source_tpl_parsed, target_tpl, target_tpl_parsed, extract_tpl, extract_tpl_parsed, data_parsed dirmode, Xcount, conv_setting, sysno, sysno500, separator, tcounter, source_data, query_string, match_mode, begin_record_header ,ending_record_footer,output_rec_sep, begin_header, ending_footer, oai_identifier_from, source_tpl, source_tpl_parsed, target_tpl, target_tpl_parsed, extract_tpl, extract_tpl_parsed = ar_ # separator = spt if dirmode: if (os.path.isdir(source_data)): data_parsed = parse_input_data_d(source_data,source_tpl) record = create_record(begin_record_header, ending_record_footer, query_string, match_mode, Xcount) if record != "": print record tcounter = tcounter + 1 if output_rec_sep != "": print output_rec_sep else: exit_on_error("Cannot access directory: %s" % source_data) else: done = 0 print begin_header while (done == 0): data_parsed = parse_input_data_fx(source_tpl) if (data_parsed == -1): done = 1 else: if (data_parsed[0][0]!= ''): record = create_record(begin_record_header, ending_record_footer, query_string, match_mode, Xcount) Xcount += 1 if record != "": print record tcounter = tcounter + 1 if output_rec_sep != "": print output_rec_sep print ending_footer return diff --git a/modules/bibconvert/lib/bibconvert_tests.py b/modules/bibconvert/lib/bibconvert_tests.py index 44e883b95..d4d42035d 100644 --- a/modules/bibconvert/lib/bibconvert_tests.py +++ b/modules/bibconvert/lib/bibconvert_tests.py @@ -1,134 +1,135 @@ # -*- coding: utf-8 -*- ## $Id$ ## CDSware bibconvert unit tests. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """Unit tests for the bibconvert.""" __version__ = "$Id$" -import bibconvert import unittest import re +from cdsware import bibconvert + class TestFormattingFunctions(unittest.TestCase): """Test bibconvert formatting functions.""" def test_ff(self): """bibconvert - formatting functions""" self.assertEqual("Hello world!", bibconvert.FormatField("ello world", "ADD(H,!)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world", "ABR(11,!)")) self.assertEqual("Hello world!", bibconvert.FormatField("xHello world!x", "CUT(x,x)")) self.assertEqual("Hello world!", bibconvert.FormatField("He11o wor1d!", "REP(1,l)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "SUP(NUM)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "LIM(12,R)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "WORDS(2)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "MINL(5)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "MAXL(12)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world! @", "EXP(@,1)")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "IF(Hello world!,ORIG,)")) self.assertEqual("", bibconvert.FormatField("Hello world!", "NUM()")) self.assertEqual("Hello world!", bibconvert.FormatField("Hello world! 
", "SHAPE()")) self.assertEqual("HELLO WORLD!", bibconvert.FormatField("Hello world!", "UP()")) self.assertEqual("hello world!", bibconvert.FormatField("Hello world!", "DOWN()")) self.assertEqual("Hello World!", bibconvert.FormatField("Hello world!", "CAP()")) class TestGlobalFormattingFunctions(unittest.TestCase): """Test bibconvert global formatting functions.""" def test_gff(self): """bibconvert - global formatting functions""" self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!","DEFP()")) class TestGenerateValues(unittest.TestCase): """Test bibconvert value generation.""" def test_gv(self): """bibconvert - value generation""" self.assertEqual("Hello world!", bibconvert.generate("VALUE(Hello world!)")) class TestParseData(unittest.TestCase): """Test bibconvert input data parsing.""" def test_idp(self): """bibconvert - input data parsing""" self.assertEqual(['A','B','C','D'], bibconvert.parse_field_definition("A---B---C---D")) class TestRegExp(unittest.TestCase): """Test bibconvert regular expressions""" def test_regexp(self): """bibconvert - regular expressions""" self.assertEqual("Hello world!", bibconvert.FormatField("Hello world!", "RE([A-Z][a-z].*!)")) class TestBCCL(unittest.TestCase): """Test bibconvert BCCL compliance""" def test_bccl_09(self): """bibconvert - BCCL v.0.9 compliance""" self.assertEqual(1, 1) class TestKnowledgeBase(unittest.TestCase): """Test bibconvert knowledge base""" def test_enc(self): """bibconvert - knowledge base""" self.assertEqual(1, 1) class TestErrorCodes(unittest.TestCase): """Test bibconvert error codes""" def test_enc(self): """bibconvert - error codes""" self.assertEqual(1, 1) class TestEncodings(unittest.TestCase): """Test bibconvert encodings""" def test_enc(self): """bibconvert - encodings""" self.assertEqual(1, 1) def create_test_suite(): """Return test suite for the bibconvert module.""" return unittest.TestSuite((unittest.makeSuite(TestFormattingFunctions, 'test'),
unittest.makeSuite(TestGlobalFormattingFunctions, 'test'), unittest.makeSuite(TestGenerateValues, 'test'), unittest.makeSuite(TestParseData, 'test'), unittest.makeSuite(TestRegExp, 'test'), unittest.makeSuite(TestBCCL, 'test'), unittest.makeSuite(TestKnowledgeBase, 'test'), unittest.makeSuite(TestErrorCodes, 'test'), unittest.makeSuite(TestEncodings, 'test'))) if __name__ == "__main__": unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibedit/bin/refextract.in b/modules/bibedit/bin/refextract.in index 6b2f31559..ada784242 100644 --- a/modules/bibedit/bin/refextract.in +++ b/modules/bibedit/bin/refextract.in @@ -1,54 +1,52 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ "bibrefextract" is used to extract and process the "references" or "citations" made to other documents from within a document. A document's "references" section is usually found at the end of the document, and generally consists of a list of the works cited during the course of the document. "bibrefextract" can attempt to identify a document's references section and extract it from the document. 
 It can also attempt to standardise the references (correct the names of
 journals etc so that they are written in a standard format), and mark
 them up so that they can be linked to the full articles on the Web by
 means of hyper-links.
 "bibrefextract" has 4 phases of processing (passes):
  1. Convert PDF or Postscript file to plaintext (UTF-8).
  2. Extract References from plaintext.
  3. Standardise titles in extracted reference lines.
  4. Markup standardised titles (this pass can only be performed if pass 3
     was also performed).
 Options:
  --help, -h     Display help/usage message and exit.
  --version, -V  Print version number and exit.
 """
 try:
     import sys
     import os
-    pylibdir = "@prefix@/lib/python"
-    sys.path.append('%s' % pylibdir)
     from cdsware.refextract import *
 except ImportError, e:
     import sys
     sys.exit("E: %s" % e)
 if __name__ == '__main__':
     main()
diff --git a/modules/bibedit/bin/xmlmarclint.in b/modules/bibedit/bin/xmlmarclint.in
index d85c36fc6..a9f7cc11f 100644
--- a/modules/bibedit/bin/xmlmarclint.in
+++ b/modules/bibedit/bin/xmlmarclint.in
@@ -1,120 +1,118 @@
 #!@PYTHON@
 ## -*- mode: python; coding: utf-8; -*-
 ##
 ## $Id$
 ##
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
""" XML MARC lint - check your XML MARC files """ import getopt import string import sys try: import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.bibrecord import * except ImportError, e: print "Error: %s" % e import sys sys.exit(1) cmdusage = """command usage: %s [-v ] xmlfile or %s --help """ % (sys.argv[0], sys.argv[0]) helpmsg = cmdusage try: opts,args=getopt.getopt(sys.argv[1:], "c:v:h:",["-help"]) except getopt.GetoptError: print cmdusage sys.exit(2) badrecords = [] listofrecs=[] verbose= 0 if len(args)==1: xmlfile = args[0] elif len(args)==0: if len(opts)==1: if opts[0][0] in ['-help','-h']: print helpmsg else: print cmdusage sys.exit(2) else: print cmdusage sys.exit(2) for opt in opts: if not opt[0] in ['-v']: print cmdusage sys.exit(2) if opt[0] == '-v': try: verbose = string.atoi(opt[1]) except ValueError: print 'Verbose must be an integer' sys.exit(2) global parser try: f = open(xmlfile,'r') xmltext = f.read() f.close() except IOError: print 'File not found\n Please check the name' import sys sys.exit(1) parser = parser listofrecs = create_records(xmltext,0,1) badr = filter((lambda x: x[1]==0),listofrecs) badrecords = map((lambda x:x[0]),badr) s='' e='' if verbose: if verbose <=3: e=print_errors(concat(map((lambda x:x[2]),listofrecs))) else: s=print_recs(badrecords) e=print_errors(concat(map((lambda x:x[2]),listofrecs))) else: if badrecords !=[]: print 'Bad records detected! For more information, set verbosity.' sys.exit(1) if s!='' or e!='': print s print e sys.exit(1) diff --git a/modules/bibedit/lib/bibedit_templates.py b/modules/bibedit/lib/bibedit_templates.py index e4ef4a7af..288fa2968 100644 --- a/modules/bibedit/lib/bibedit_templates.py +++ b/modules/bibedit/lib/bibedit_templates.py @@ -1,22 +1,22 @@ ## $Id$ ## CDSware WebStyle templates. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
-from config import *
-from messages import gettext_set_language, language_list_long
+from cdsware.config import *
+from cdsware.messages import gettext_set_language, language_list_long
diff --git a/modules/bibedit/lib/bibrecord.py b/modules/bibedit/lib/bibrecord.py
index a578a7afe..294c75223 100644
--- a/modules/bibedit/lib/bibrecord.py
+++ b/modules/bibedit/lib/bibrecord.py
@@ -1,973 +1,973 @@
 # -*- coding: utf-8 -*-
 ##
 ## $Id$
 ##
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
""" BibRecord - XML MARC processing library for CDSware """ ### IMPORT INTERESTING MODULES AND XML PARSERS ## import interesting modules: try: import sys import re from zlib import decompress import_error = 0 except ImportError, e: import_error = 1 imperr = e ## test available parsers: try: import sys import string err=[] except ImportError, e: parser = -3 err1 = e try: - from bibrecord_config import * + from cdsware.bibrecord_config import * verbose = cfg_bibrecord_default_verbose_level correct = cfg_bibrecord_default_correct parsers = cfg_bibrecord_parsers_available except ImportError, e: parser = -2 verbose = 0 correct = 0 parsers = [] if parsers == []: print 'No parser available' sys.exit(2) else: j,i=1,1 if 2 in parsers: try: import pyRXP parser = 2 ## function to show the pyRXP_parser warnings ## def warnCB(s): """ function used to treat the PyRXP parser warnings""" global err err.append((0,'Parse warning:\n'+s)) err2 = "" except ImportError,e : err2=e i=0 elif 1 in parsers: try: from Ft.Xml.Domlette import NonvalidatingReader parser = 1 except ImportError,e : err2=e j=0 elif 0 in parsers: try: from xml.dom.minidom import parseString parser = 0 except ImportError,e : err2=e parser = -1 if not i: if 1 in parsers: try: from Ft.Xml.Domlette import NonvalidatingReader parser = 1 except ImportError,e : err2=e j=0 elif 0 in parsers: try: from xml.dom.minidom import parseString parser = 0 except ImportError,e : err2=e parser = -1 else: parser = -1 if not j: if 0 in parsers: try: from xml.dom.minidom import parseString parser = 0 except ImportError,e : err2=e parser = -1 else: parser = -1 ### INTERFACE / VISIBLE FUNCTIONS def create_records(xmltext,verbose=verbose,correct=correct): """ creates a list of records """ global import_error err = [] if import_error == 1: err.append((6,imperr)) else: if sys.version >= '2.3': pat = r".*?" 
p = re.compile(pat,re.DOTALL) # DOTALL - to ignore whitespaces list = p.findall(xmltext) else: l = xmltext.split('') n=len(l) ind = (l[n-1]).rfind('') aux = l[n-1][:ind+9] l[n-1] = aux list=[] for s in l: if s != '': i = -1 while (s[i].isspace()): i=i-1 if i == -1:#in case there are no spaces at the end i=len(s)-1 if s[:i+1].endswith(''): list.append(''+s) listofrec = map((lambda x:create_record(x,verbose,correct)),list) return listofrec return [] # Record :: {tag : [Field]} # Field :: (Subfields,ind1,ind2,value) # Subfields :: [(code,value)] def create_record(xmltext,verbose = verbose, correct=correct): """ creates a record object and returns it uses pyRXP if installed else uses 4Suite domlette or xml.dom.minidom """ global parser (i,errors) = testImports(parser) if i==0: print "Error: no suitable XML parsers found. Please read INSTALL file." sys.exit() try: if parser==2: ## the following is because of DTD validation t = """ \n""" % cfg_marc21_dtd t = "%s%s" % (t,xmltext) t = "%s" % t xmltext = t (rec,er) = create_record_RXP(xmltext,verbose,correct) elif parser: (rec,er) = create_record_4suite(xmltext,verbose,correct) else: (rec,er) = create_record_minidom(xmltext,verbose,correct) errs = warnings(er) except Exception, e: print e errs = warnings(concat(err)) return (None,0,errs) if errs == []: return (rec,1,errs) else: return (rec,0,errs) def record_get_field_instances(rec, tag="", ind1="", ind2=""): """Return the list of field instances of record REC matching TAG and IND1 and IND2. 
When TAG is an emtpy string, then return all field instances.""" out = [] if tag: if record_has_field(rec, tag): for possible_field_instance in rec[tag]: if possible_field_instance[1] == ind1 and \ possible_field_instance[2] == ind2: out.append(possible_field_instance) else: return rec.items() return out def record_has_field(rec,tag): """checks whether record 'rec' contains tag 'tag'""" return rec.has_key(tag) def record_add_field(rec,tag,value,ind1="",ind2=""): """ adds new field defined by the tag|value|ind1|ind2 parameters to record 'rec' returns the new field """ val=rec.values() if val != []: ord = max([f[4] for x in val for f in x]) else: ord = 1 newfield = create_field(value,ind1,ind2,[],ord) if rec.has_key(tag): rec[tag].append(newfield) else: rec[tag] = [newfield] return newfield def record_delete_field(rec,tag,ind1="",ind2=""): """ delete all fields defined with marc tag 'tag' and indicators 'ind1' and 'ind2' from record 'rec' """ newlist = [] if rec.has_key(tag): for field in rec[tag]: if not (field[1]==ind1 and field[2]==ind2): newlist.append(field) rec[tag] = newlist def record_get_field_value(rec,tag,ind1="",ind2="",code=""): """ retrieves the value of the first field containing tag 'tag' and indicators 'ind1' and 'ind2' inside record 'rec'. Returns the found value as a string. If no matching field is found returns the empty string. if the tag has a '%', it will retrieve the value of first field containg tag, which first characters are those before '%' in tag. 
The ind1, ind2 and code parameters will be ignored """ s = tag.split('%') if len(s) > 1: t = s[0] keys=rec.keys() tags=[k for k in keys if k.startswith(t)] for tag in tags: fields = rec[tag] for field in fields: if field[3] != "": return field[3] else: for subfield in field[0]: return subfield[1] else: if rec.has_key(tag): fields = rec[tag] for field in fields: if field[1]==ind1 and field[2]==ind2: if field[3] != "": return field[3] else: for subfield in field[0]: if subfield[0]==code: return subfield[1] return "" def record_get_field_values(rec,tag,ind1="",ind2="",code=""): """ retrieves the values of all the fields containing tag 'tag' and indicators 'ind1' and 'ind2' inside record 'rec'. Returns the found values as a list. If no matching field is found returns an empty list. if the tag has a '%', it will retrieve the value of all fields containg tag, which first characters are those before '%' in tag. The ind1, ind2 and code parameters will be ignored """ tmp = [] s = tag.split('%') if len(s) > 1: t = s[0] keys=rec.keys() tags=[k for k in keys if k.startswith(t)] for tag in tags: fields = rec[tag] for field in fields: if field[3] != "": tmp.append(field[3]) else: for subfield in field[0]: tmp.append(subfield[1]) else: if rec.has_key(tag): fields = rec[tag] for field in fields: if field[1]==ind1 and field[2]==ind2: if field[3] != "": tmp.append(field[3]) else: for subfield in field[0]: if subfield[0]==code: tmp.append(subfield[1]) return tmp def print_rec(rec,format=1): """prints a record format = 1 -- XML format = 2 -- HTML (not implemented) """ if format==1: text = record_xml_output(rec) else: return '' return text def print_recs(listofrec,format=1): """prints a list of records format = 1 -- XML format = 2 -- HTML (not implemented) if 'listofrec' is not a list it returns empty string """ text = "" if type(listofrec).__name__ !='list': return "" else: for rec in listofrec: text = "%s\n%s" % (text,print_rec(rec,format)) return text def record_xml_output(rec): 
"""generates the XML for record 'rec' and returns it as a string""" xmltext = "\n" #add the tag 'tag' to each field in rec[tag] fields=[] for tag in rec.keys(): for field in rec[tag]: fields.append((tag,field)) record_order_fields(fields) for field in fields: xmltext = "%s%s" % (xmltext,field_xml_output(field[1],field[0]))#field[0]=tag xmltext = "%s" % xmltext return xmltext def records_xml_output(listofrec): """generates the XML for the list of records 'listofrec' and returns it as a string""" xmltext = """ \n""" % cfg_marc21_dtd for rec in listofrec: xmltext = "%s%s" % (xmltext, record_xml_output(rec)) xmltext = "%s" % xmltext return xmltext def field_get_subfield_instances(field): """returns the list of subfields associated with field 'field'""" return field[0] def field_get_subfield_values(field_instance, code): """Return subfield CODE values of the field instance FIELD.""" out = [] for sf_code, sf_value in field_instance[0]: if sf_code == code: out.append(sf_value) return out def field_add_subfield(field,code,value): """adds a subfield to field 'field'""" field[0].append(create_subfield(code,value)) ### IMPLEMENTATION / INVISIBLE FUNCTIONS def create_record_RXP(xmltext, verbose=verbose, correct=correct): """ creates a record object and returns it uses the RXP parser If verbose>3 then the parser will be strict and will stop in case of well-formedness errors or DTD errors If verbose=0, the parser will not give warnings If 0 We will try to correct errors such as missing attributtes correct = 0 -> there will not be any attempt to correct errors """ record = {} global err ord = 1 # this is needed because of the record_xml_output function, where we need to know # the order of the fields TAG, ATTRS,CHILD_LIST = range(3) if verbose > 3: p = pyRXP.Parser(ErrorOnValidityErrors=1, ProcessDTD=1, ErrorOnUnquotedAttributeValues=1, warnCB = warnCB, srcName='string input') else: p = pyRXP.Parser(ErrorOnValidityErrors=0, ProcessDTD=1, ErrorOnUnquotedAttributeValues=0, warnCB = 
warnCB, srcName='string input') if correct: (rec,e) = wash(xmltext) err.extend(e) return (rec,e) root1=p(xmltext) #root = (tagname, attr_dict, child_list, reserved) if root1[0]=='collection': recs = [t for t in root1[CHILD_LIST] if type(t).__name__=='tuple' and t[TAG]=="record"] if recs !=[]: root = recs[0] else: root = None else: root=root1 # get childs of 'controlfield' childs_controlfield = [] if not root[2]==None: childs_controlfield =[t for t in root[CHILD_LIST] if type(t).__name__=='tuple' and t[TAG]=="controlfield"] # get childs of 'datafield' childs_datafield = [] if not root[CHILD_LIST]==None: childs_datafield =[t for t in root[CHILD_LIST] if type(t).__name__=='tuple' and t[TAG]=="datafield"] for controlfield in childs_controlfield: s=controlfield[ATTRS]["tag"] value='' if not controlfield==None: value=''.join([ n for n in controlfield[CHILD_LIST] if type(n).__name__ == 'str']) name = type(value).__name__ if name in ["int","long"] : st = str(value) elif name in ['str', 'unicode']: st = value else: if verbose: err.append((7,'Type found: ' + name)) st = "" # the type of value is not correct. (user insert something like a list...) field = ([],"","",st,ord) #field = (subfields, ind1, ind2,value,ord) if record.has_key(s): record[s].append(field) else: record[s]=[field] ord = ord+1 for datafield in childs_datafield: #create list of subfields subfields = [] childs_subfield = [] if not datafield[CHILD_LIST]==None: childs_subfield =[t for t in datafield[CHILD_LIST] if type(t).__name__=='tuple' and t[0]=="subfield"] for subfield in childs_subfield: value='' if not subfield==None: value=''.join([ n for n in subfield[CHILD_LIST] if type(n).__name__ == 'str']) #get_string_value(subfield) if subfield[ATTRS].has_key('code'): subfields.append((subfield[ATTRS]["code"],value)) else: subfields.append(('!',value)) #create field if datafield[ATTRS].has_key('tag'): s = datafield[ATTRS]["tag"] else: s = '!' 
if datafield[ATTRS].has_key('ind1'): ind1 = datafield[ATTRS]["ind1"] else: ind1 = '!' if datafield[ATTRS].has_key('ind2'): ind2 = datafield[ATTRS]["ind2"] else: ind2 = '!' field = (subfields,ind1,ind2,"",ord) if record.has_key(s): record[s].append(field) else: record[s]=[field] ord = ord+1 return (record,err) def create_record_minidom(xmltext, verbose=verbose, correct=correct): """ creates a record object and returns it uses xml.dom.minidom """ record = {} ord=1 global err if correct: xmlt = xmltext (rec,e) = wash(xmlt,0) err.extend(e) return (rec,err) dom = parseString(xmltext) root = dom.childNodes[0] for controlfield in get_childs_by_tag_name(root,"controlfield"): s = controlfield.getAttribute("tag") text_nodes = controlfield.childNodes v = u''.join([ n.data for n in text_nodes ]).encode("utf-8") name = type(v).__name__ if (name in ["int","long"]) : field = ([],"","",str(v),ord) # field = (subfields, ind1, ind2,value) elif name in ['str', 'unicode']: field = ([],"","",v,ord) else: if verbose: err.append((7,'Type found: ' + name)) field = ([],"","","",ord)# the type of value is not correct. (user insert something like a list...) if record.has_key(s): record[s].append(field) else: record[s]=[field] ord=ord+1 for datafield in get_childs_by_tag_name(root,"datafield"): subfields = [] for subfield in get_childs_by_tag_name(datafield,"subfield"): text_nodes = subfield.childNodes v = u''.join([ n.data for n in text_nodes ]).encode("utf-8") code = subfield.getAttributeNS(None,'code').encode("utf-8") if code != '': subfields.append((code,v)) else: subfields.append(('!',v)) s = datafield.getAttribute("tag").encode("utf-8") if s == '': s = '!' 
ind1 = datafield.getAttribute("ind1").encode("utf-8") ind2 = datafield.getAttribute("ind2").encode("utf-8") if record.has_key(s): record[s].append((subfields,ind1,ind2,"",ord)) else: record[s]=[(subfields,ind1,ind2,"",ord)] ord = ord+1 return (record,err) def create_record_4suite(xmltext,verbose=verbose,correct=correct): """ creates a record object and returns it uses 4Suite domlette """ record = {} global err if correct: xmlt = xmltext (rec,e) = wash(xmlt,1) err.extend(e) return (rec,e) dom = NonvalidatingReader.parseString(xmltext,"urn:dummy") root = dom.childNodes[0] ord=1 for controlfield in get_childs_by_tag_name(root,"controlfield"): s = controlfield.getAttributeNS(None,"tag") text_nodes = controlfield.childNodes v = u''.join([ n.data for n in text_nodes ]).encode("utf-8") name = type(v).__name__ if (name in ["int","long"]) : field = ([],"","",str(v),ord) # field = (subfields, ind1, ind2,value) elif name in ['str','unicode']: field = ([],"","",v,ord) else: if verbose: err.append((7,'Type found: ' + name)) field = ([],"","","",ord)# the type of value is not correct. (user insert something like a list...) if record.has_key(s): record[s].append(field) else: record[s]=[field] ord=ord+1 for datafield in get_childs_by_tag_name(root,"datafield"): subfields = [] for subfield in get_childs_by_tag_name(datafield,"subfield"): text_nodes = subfield.childNodes v = u''.join([ n.data for n in text_nodes ]).encode("utf-8") code = subfield.getAttributeNS(None,'code').encode("utf-8") if code != '': subfields.append((code,v)) else: subfields.append(('!',v)) s = datafield.getAttributeNS(None,"tag").encode("utf-8") if s == '': s = '!' 
ind1 = datafield.getAttributeNS(None,"ind1").encode("utf-8") ind2 = datafield.getAttributeNS(None,"ind2").encode("utf-8") if record.has_key(s): record[s].append((subfields,ind1,ind2,"",ord)) else: record[s]=[(subfields,ind1,ind2,"",ord)] ord=ord+1 return (record,err) def record_order_fields(rec,fun="order_by_ord"): """orders field inside record 'rec' according to a function""" rec.sort(eval(fun)) return def record_order_subfields(rec,fun="order_by_code"): """orders subfield inside record 'rec' according to a function""" for tag in rec: for field in rec[tag]: field[0].sort(eval(fun)) return def concat(list): """concats a list of lists""" newl = [] for l in list: newl.extend(l) return newl def create_field(value,ind1="",ind2="",subfields=[],ord=-1): """ creates a field object and returns it""" name = type(value).__name__ if name in ["int","long"] : s = str(value) elif name in ['str', 'unicode']: s = value else: err.append((7,'Type found: ' + name)) s="" field = (subfields,ind1,ind2,s,ord) return field def field_add_subfield(field,code,value): """adds a subfield to field 'field'""" field[0].append(create_subfield(code,value)) def field_xml_output(field,tag): """generates the XML for field 'field' and returns it as a string""" xmltext = "" if field[3] != "": xmltext = "%s %s\n" % (xmltext,tag,encode_for_xml(field[3])) else: xmltext = "%s \n" % (xmltext,tag,field[1],field[2]) for subfield in field[0]: xmltext = "%s%s" % (xmltext,subfield_xml_output(subfield)) xmltext = "%s \n" % xmltext return xmltext def create_subfield(code,value): """ creates a subfield object and returns it""" if type(value).__name__ in ["int","long"]: s = str(value) else: s = value subfield = (code, s) return subfield def subfield_xml_output(subfield): """generates the XML for a subfield object and return it as a string""" xmltext = " %s\n" % (subfield[0],encode_for_xml(subfield[1])) return xmltext def order_by_ord(field1, field2): """function used to order the fields according to their ord 
value""" return cmp(field1[1][4], field2[1][4]) def order_by_code(subfield1,subfield2): """function used to order the subfields according to their code value""" return cmp(subfield1[0],subfield2[0]) def get_childs_by_tag_name(node, local): """retrieves all childs from node 'node' with name 'local' and returns them as a list""" cNodes = list(node.childNodes) res = [child for child in cNodes if child.nodeName==local] return res def get_string_value(node): """gets all child text nodes of node 'node' and returns them as a unicode string""" text_nodes = node.childNodes return u''.join([ n.data for n in text_nodes ]) def get_childs_by_tag_name_RXP(listofchilds,tag): """retrieves all childs from 'listofchilds' with tag name 'tag' and returns them as a list. listofchilds is a list returned by the RXP parser """ l=[] if not listofchilds==None: l =[t for t in listofchilds if type(t).__name__=='tuple' and t[0]==tag] return l def getAttribute_RXP(root, attr): """ returns the attributte 'attr' from root 'root' root is a node returned by RXP parser """ try: return u''.join(root[1][attr]) except KeyError,e: return "" def get_string_value_RXP(node): """gets all child text nodes of node 'node' and returns them as a unicode string""" if not node==None: return ''.join([ n for n in node[2] if type(n).__name__ == 'str']) else: return "" def encode_for_xml(s): "Encode special chars in string so that it would be XML-compliant." s = string.replace(s, '&', '&') s = string.replace(s, '<', '<') return s def print_errors(list): """ creates a unique string with the strings in list, using '\n' as a separator """ text="" for l in list: text = '%s\n%s'% (text,l) return text def wash(xmltext, parser=2): """ Check the structure of the xmltext. Returns a record structure and a list of errors. 
parser = 1 - 4_suite parser = 2 - pyRXP parser = 0 - minidom """ errors=[] i,e1 = tagclose('datafield',xmltext) j,e2 = tagclose('controlfield',xmltext) k,e3 = tagclose('subfield',xmltext) w,e4 = tagclose('record',xmltext) errors.extend(e1) errors.extend(e2) errors.extend(e3) errors.extend(e4) if i and j and k and w and parser!=-3: if parser==1: (rec,ee) = create_record_4suite(xmltext,0,0) elif parser==2: (rec,ee) = create_record_RXP(xmltext,0,0) else: (rec,ee) = create_record_minidom(xmltext,0,0) else: return (None,errors) keys = rec.keys() for tag in keys: upper_bound = '999' n = len(tag) if n>3: i=n-3 while i>0: upper_bound = '%s%s' % ('0',upper_bound) i = i-1 if tag == '!': # missing tag errors.append((1, '(field number(s): ' + ([f[4] for f in rec[tag]]).__str__()+')')) v=rec[tag] rec.__delitem__(tag) rec['000'] = v tag = '000' elif not ("001" <= tag <=upper_bound): errors.append(2) v = rec[tag] rec.__delitem__(tag) rec['000'] = v tag = '000' fields =[] for field in rec[tag]: if field[0]==[] and field[3]=='': ## datafield without any subfield errors.append((8,'(field number: '+field[4].__str__()+')')) subfields=[] for subfield in field[0]: if subfield[0]=='!': errors.append((3,'(field number: '+field[4].__str__()+')')) newsub = ('',subfield[1]) else: newsub = subfield subfields.append(newsub) if field[1]=='!': errors.append((4,'(field number: '+field[4].__str__()+')')) ind1 = "" else: ind1 = field[1] if field[2]=='!': errors.append((5,'(field number: '+field[4].__str__()+')')) ind2 = "" else: ind2=field[2] newf = (subfields,ind1,ind2,field[3],field[4]) fields.append(newf) rec[tag]=fields return (rec,errors) def tagclose(tagname,xmltext): """ checks if an XML document does not hae any missing tag with name tagname """ import re errors=[] pat_open = '<'+tagname+'.*?>' pat_close = '' p_open = re.compile(pat_open,re.DOTALL) # DOTALL - to ignore whitespaces p_close = re.compile(pat_close,re.DOTALL) list1 = p_open.findall(xmltext) list2 = p_close.findall(xmltext) if 
len(list1)!=len(list2): errors.append((99,'(Tagname : ' + tagname + ')')) return (0,errors) else: return (1,errors) def testImports(c): """ Test if the import statements did not failed""" errors=[] global err1,err2 if c==-1: i = 0 errors.append((6,err2)) elif c == -3: i=0 errors.append((6,err1)) else: i=1 return (i,errors) def warning(code): """ It returns a warning message of code 'code'. If code = (cd, str) it returns the warning message of code 'cd' and appends str at the end""" ws = cfg_bibrecord_warning_msgs s='' if type(code).__name__ == 'str': return code if type(code).__name__ == 'tuple': if type(code[1]).__name__ == 'str': s = code[1] c = code[0] else: c = code if ws.has_key(c): return ws[c]+s else: return "" def warnings(l): """it applies the function warning to every element in l""" list = [] for w in l: list.append(warning(w)) return list diff --git a/modules/bibedit/lib/bibrecord_config.py b/modules/bibedit/lib/bibrecord_config.py index 0d585b64b..3fc31cceb 100644 --- a/modules/bibedit/lib/bibrecord_config.py +++ b/modules/bibedit/lib/bibrecord_config.py @@ -1,51 +1,51 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
 ### CONFIGURATION OPTIONS FOR BIBRECORD LIBRARY
 """bibrecord configuration"""
-from config import etcdir
+from cdsware.config import etcdir
 # location of the MARC21 DTD file:
 cfg_marc21_dtd = "%s/bibedit/MARC21slim.dtd" % etcdir
 # internal dictionary of warning messages:
 cfg_bibrecord_warning_msgs = {
     0: '' ,
     1: 'WARNING: tag missing for field(s)\nValue stored with tag \'000\'',
     2: 'WARNING: bad range for tags (tag must be in range 001-999)\nValue stored with tag \'000\'',
     3: 'WARNING: Missing atributte \'code\' for subfield\nValue stored with code \'\'',
     4: 'WARNING: Missing attributte \'ind1\'\n Value stored with ind1 = \'\'',
     5: 'WARNING: Missing attributte \'ind2\'\n Value stored with ind2 = \'\'',
     6: 'Import Error\n',
     7: 'WARNING: value expected of type string.',
     8: 'WARNING: empty datafield',
     98:'WARNING: problems importing cdsware',
     99: 'Document not well formed'
     }
 # verbose level to be used when creating records from XML: (0=least, ..., 9=most)
 cfg_bibrecord_default_verbose_level=0
 # correction level to be used when creating records from XML: (0=no, 1=yes)
 cfg_bibrecord_default_correct=0
 # XML parsers available: (0=minidom, 1=4suite, 2=PyRXP)
 cfg_bibrecord_parsers_available = [0,1,2]
diff --git a/modules/bibedit/lib/bibrecord_tests.py b/modules/bibedit/lib/bibrecord_tests.py
index 613a60108..9961e7097 100644
--- a/modules/bibedit/lib/bibrecord_tests.py
+++ b/modules/bibedit/lib/bibrecord_tests.py
@@ -1,281 +1,282 @@
 # -*- coding: utf-8 -*-
 ##
 ## $Id$
 ##
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
-from config import tmpdir, etcdir
-import bibrecord
 import unittest
 from string import expandtabs, replace
+from cdsware.config import tmpdir, etcdir
+from cdsware import bibrecord
+
 class SanityTest(unittest.TestCase):
     """ bibrecord - sanity test (xml -> create records -> xml)"""
     def test_for_sanity(self):
         """ bibrecord - demo file sanity test (xml -> create records -> xml)"""
         f=open(tmpdir + '/demobibdata.xml','r')
         xmltext = f.read()
         f.close()
         # let's try to reproduce the demo XML MARC file by parsing it and printing it back:
         recs = map((lambda x:x[0]), bibrecord.create_records(xmltext))
         xmltext_reproduced = bibrecord.records_xml_output(recs)
         x = xmltext_reproduced
         y = xmltext
         # 'normalize' the two XML MARC files for the purpose of comparing
         x = expandtabs(x)
         y = expandtabs(y)
         x = x.replace(' ','')
         y = y.replace(' ','')
         x = x.replace('\n' % etcdir, '')
         x = x.replace('', "\n")
         x = x.replace('', "\n\n")
         self.assertEqual(x,y)
 class SuccessTest(unittest.TestCase):
     """ bibrecord - demo file parsing test """
     def setUp(self):
         f=open(tmpdir + '/demobibdata.xml','r')
         xmltext = f.read()
         f.close()
         self.recs = map((lambda x:x[0]),bibrecord.create_records(xmltext))
     def test_records_created(self):
         """ bibrecord - demo file how many records are created """
         self.assertEqual(76,len(self.recs))
     def test_tags_created(self):
         """ bibrecord - demo file which tags are created """
         ## check if the tags are correct
         tags= ['020', '037', '041', '080', '088', '100', '245', '246', '250', '260', '270', '300', '340', '490', '500', '502', '520',
'590', '595', '650', '653', '690', '700', '710', '856','909','980','999'] t=[] for rec in self.recs: t.extend(rec.keys()) t.sort() #eliminate the elements repeated tt = [] for x in t: if not x in tt: tt.append(x) self.assertEqual(tags,tt) def test_fields_created(self): """bibrecord - demo file how many fields are created""" ## check if the number of fields for each record is correct fields=[13,13, 8, 11, 10,12, 10, 14, 10, 17, 13, 15, 10, 9, 14, 10, 11, 11, 11, 9, 10, 10, 10, 8, 8, 8, 9, 9, 9, 10, 8, 8, 8,8, 14, 13, 14, 14, 15, 12,12, 12,14, 13, 11, 15, 15, 14, 14, 13, 15, 14, 14, 14, 15, 14, 15, 14, 14, 15, 14, 13, 13, 14, 11, 13, 11, 14, 8, 10, 13, 12, 11, 12, 6, 6] cr=[] ret=[] for rec in self.recs: cr.append(len(rec.values())) ret.append(rec) self.assertEqual(fields,cr) class BadInputTreatmentTest(unittest.TestCase): """ bibrecord - testing for bad input treatment """ def test_wrong_attribute(self): """bibrecord - bad input subfield \'cde\' instead of \'code\'""" ws = bibrecord.cfg_bibrecord_warning_msgs xml_error1 = """ 33 eng Doe, John On the foo and bar """ (rec,st,e) = bibrecord.create_record(xml_error1,1,1) ee='' for i in e: if type(i).__name__ == 'str': if i.count(ws[3])>0: ee = i self.assertEqual(bibrecord.warning((3,'(field number: 4)')),ee) def test_missing_attribute(self): """ bibrecord - bad input missing \"tag\" """ ws = bibrecord.cfg_bibrecord_warning_msgs xml_error2 = """ 33 eng Doe, John On the foo and bar """ (rec,st,e) = bibrecord.create_record(xml_error2,1,1) ee='' for i in e: if type(i).__name__ == 'str': if i.count(ws[1])>0: ee = i self.assertEqual(bibrecord.warning((1,'(field number(s): [2])')),ee) def test_empty_datafield(self): """ bibrecord - bad input no subfield """ ws = bibrecord.cfg_bibrecord_warning_msgs xml_error3 = """ 33 Doe, John On the foo and bar """ (rec,st,e) = bibrecord.create_record(xml_error3,1,1) ee='' for i in e: if type(i).__name__ == 'str': if i.count(ws[8])>0: ee = i self.assertEqual(bibrecord.warning((8,'(field 
number: 2)')),ee) def test_missing_tag(self): """bibrecord - bad input missing end \"tag\"""" ws = bibrecord.cfg_bibrecord_warning_msgs xml_error4 = """ 33 eng Doe, John On the foo and bar """ (rec,st,e) = bibrecord.create_record(xml_error4,1,1) ee = '' for i in e: if type(i).__name__ == 'str': if i.count(ws[99])>0: ee = i self.assertEqual(bibrecord.warning((99,'(Tagname : datafield)')),ee) class AccentedUnicodeLettersTest(unittest.TestCase): """ bibrecord - testing accented UTF-8 letters """ def setUp(self): xml_example_record = """ 33 eng Döè1, John Doe2, J>ohn editor Пушкин On the foo and bar2 """ (self.rec, st, e) = bibrecord.create_record(xml_example_record,1,1) def test_accented_unicode_characters(self): """bibrecord - accented Unicode letters""" self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", "", ""), [([('a', 'Döè1, John')], '', '', '', 3), ([('a', 'Doe2, J>ohn'), ('b', 'editor')], '', '', '', 4)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "245", "", "1"), [([('a', 'Пушкин')], '', '1', '', 5)]) class GettingFieldValuesTest(unittest.TestCase): """ bibrecord - testing for getting field/subfield values """ def setUp(self): xml_example_record = """ 33 eng Doe1, John Doe2, John editor On the foo and bar1 On the foo and bar2 """ (self.rec, st, e) = bibrecord.create_record(xml_example_record,1,1) def test_get_field_instances(self): """bibrecord - getting field instances""" self.assertEqual(bibrecord.record_get_field_instances(self.rec, "100", "", ""), [([('a', 'Doe1, John')], '', '', '', 3), ([('a', 'Doe2, John'), ('b', 'editor')], '', '', '', 4)]) self.assertEqual(bibrecord.record_get_field_instances(self.rec, "", "", ""), [('245', [([('a', 'On the foo and bar1')], '', '1', '', 5), ([('a', 'On the foo and bar2')], '', '2', '', 6)]), ('001', [([], '', '', '33', 1)]), ('100', [([('a', 'Doe1, John')], '', '', '', 3), ([('a', 'Doe2, John'), ('b', 'editor')], '', '', '', 4)]), ('041', [([('a', 'eng')], '', '', '', 2)])]) 
def test_get_field_values(self): """bibrecord - getting field values""" self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "", "", "a"), ['Doe1, John', 'Doe2, John']) self.assertEqual(bibrecord.record_get_field_values(self.rec, "100", "", "", "b"), ['editor']) def test_get_subfield_values(self): """bibrecord - getting subfield values""" fi1, fi2 = bibrecord.record_get_field_instances(self.rec, "100", "", "") self.assertEqual(bibrecord.field_get_subfield_values(fi1, "b"), []) self.assertEqual(bibrecord.field_get_subfield_values(fi2, "b"), ["editor"]) def create_test_suite(): """Return test suite for the bibrecord module""" return unittest.TestSuite((unittest.makeSuite(SanityTest,'test'), unittest.makeSuite(SuccessTest,'test'), unittest.makeSuite(BadInputTreatmentTest,'test'), unittest.makeSuite(GettingFieldValuesTest,'test'), unittest.makeSuite(AccentedUnicodeLettersTest,'test'))) if __name__ == '__main__': unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibedit/lib/refextract.py b/modules/bibedit/lib/refextract.py index f23258994..af062f5e6 100644 --- a/modules/bibedit/lib/refextract.py +++ b/modules/bibedit/lib/refextract.py @@ -1,3002 +1,3002 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. try: import sys, re, string, time import os, getopt, cgi - from refextract_config import * from cStringIO import StringIO + from cdsware.refextract_config import * except ImportError, e: raise ImportError(e) class StringBuffer1: """This class is a String buffer, used for concatenation of strings. This version uses a memory file as a string buffer """ def __init__(self): self._bufferFile = StringIO() def append(self, itm): """Add a string to the string buffer""" self._bufferFile.write("%s"%(itm,)) def get(self): """Get buffered string and return it as string object""" return self._bufferFile.getvalue() class StringBuffer2: """This class is a String buffer, used for concatenation of strings. This version uses a list as a string buffer """ def __init__(self): self._buffer = [] def append(self, itm): """Add a new string into the buffer""" self._buffer.append(itm) def get(self): """Join all strings in th buffer into a single string and return it""" return ''.join(self._buffer) class SystemMessage: def __init__(self): self._helpMessage = """refextract recid:pdffile [recid:pdffile]""" self._versionMessage = cfg_refextract_version def getHelpMessage(self): return self._helpMessage def getVersionMessage(self): return self._versionMessage class ReferenceSection: """Concrete class representing the Reference section of a document. 
Once References have been extracted, they are put into a ReferenceSection object, which contains a list of "ReferenceLine" objects """ class ReferenceSectionIterator: def __init__(self, reflines): self._mylist = reflines self._listptr = 0 def next(self): try: item = self._mylist[self._listptr] self._listptr += 1 return item except IndexError: raise StopIteration def __init__(self, refLineStrings = []): """Initialise a ReferenceSection object with the lines composing the references of a document. If a string argument is supplied, it will be appended as the first reference line. If a list argument is supplied, each element of the list that contains a string will be appended to the list of reference lines in order. Arguments of neither String or list type will be ignored and an empty ReferenceSection object will be the result """ self._referenceLines = [] self._lnPtr = 0 if type(refLineStrings) is list: for line in refLineStrings: self.addNewLine(line) else: if not self.addNewLine(refLineStrings): self._referenceLines = [] self.resetLinePointer() def __iter__(self): """return self as iterator object""" return ReferenceSection.ReferenceSectionIterator(self._referenceLines) def resetLinePointer(self): """Reset the position of the ReferenceLine pointer of a ReferenceSection object to point at the first line""" self._lnPtr = 0 def gotoNextLine(self): """Move the position of the ReferenceLine pointer of a ReferenceSection object to point at the next line""" if self.lineExists(self._lnPtr+1): self._lnPtr = self._lnPtr+1 return True else: return False def gotoLine(self, lNum): """Move the position of the ReferenceLine pointer of a ReferenceSection object to point at the line number supplied""" if self.lineExists(lNum-1): self._lnPtr = lNum-1 return True else: return False def getCurrentLineAsString(self): """Return a String containing the text contents of the ReferenceLine object currently pointed to by the internal pointer of a ReferenceSection object. 
Returns empty string if no ReferenceLine is currently pointed at """ if self.lineExists(self._lnPtr): return self._referenceLines[self._lnPtr].getContent() else: self.resetLinePointer() return u'' def getCurrentLine(self): """Return the ReferenceLine object that is currently pointed to by the internal pointer of a ReferenceSection object. If no object pointed at, return the 'None' object """ if self.lineExists(self._lnPtr): return self._referenceLines[self._lnPtr] else: self.resetLinePointer() return None def getLineAsString(self, lNum): """Return a String containing the text contents of the ReferenceLine object at the line number supplied (1..n). Returns ain empty String if line number does not exist """ if self.lineExists(lNum-1): return self._referenceLines[lNum-1].getContent() else: return u"" def getLine(self, lNum): """Return the ReferenceLine at the line number supplied (1..n) Returns 'None' object if line number does not exist""" if self.lineExists(lNum-1): return self._referenceLines[lNum-1] else: return None def displayAllLines(self): """Display all ReferenceLine objects stored within a ReferenceSection object consecutively as Strings on the standard output stream""" for x in self._referenceLines: x.display() def displayCurrentLine(self): """Display the ReferenceLine that is currently pointed to by a ReferenceSection object""" if self.lineExists(self._lnPtr): self._referenceLines[self._lnPtr].display() else: self.resetLinePointer() def displayLine(self, lNum): """Display the reference line at the line number supplied (1..n). Will display nothing if the line number does not exist""" if self.lineExists(lNum-1): self._referenceLines[lNum-1].display() def lineExists(self, lNum): """Returns True if line lNum exists in a ReferenceSection, False if not. 
(Reminder: Lines in the range 0..N)""" if (lNum < len(self._referenceLines)) and (lNum >= 0): return True else: return False def addNewLine(self, lineTxt): """Takes one String argument (lineTxt) and attempts to create a new ReferenceLine with this text, adding it to the last place in the referenceLines list of a ReferenceSection object. Returns True if successful, False if not """ if type(lineTxt) is str or type(lineTxt) is unicode: ln = ReferenceLine(lineTxt) self._referenceLines.append(ln) return True else: return False def setContentLine(self, newContent): """Set the contents of the current line to that supplied in the 'newContent' argument. Return True on success, False on failure""" if self.lineExists(self._lnPtr): return self._referenceLines[self._lnPtr].setContent(newContent) else: return False def lAppendLineText(self, appendStr): """Append text to the beginning of the ReferenceLine object currently pointed at by a ReferenceSection object""" if self.lineExists(self._lnPtr): return self._referenceLines[self._lnPtr].lAppend(appendStr) else: return False def rAppendLineText(self, appendStr): """Append text to the end of the ReferenceLine object currently pointed at by a ReferenceSection object""" if self.lineExists(self._lnPtr): return self._referenceLines[self._lnPtr].rAppend(appendStr) else: return False def isEmpty(self): """Return True if the reference section contains no reference lines, False if it does contain lines""" return (len(self._referenceLines) < 1) class ReferenceLine: """Concrete class representing an individual reference line as extracted from a document""" def __init__(self, data=''): """Initialise a ReferenceLine's contents with the supplied String. 
If argument supplied is not a String, the ReferenceLine object's contents will be initialised with a blank String """ if type(data) is str or type(data) is unicode: self._content = data else: self._content = u'' def getContent(self): """Return a String version of a ReferenceLine's contents""" return self._content def display(self): """Display a ReferenceLine as a String on the standard output stream""" print self._content.encode("utf-8") def setContent(self, newContent=u''): """Set the content of a ReferenceLine to a new text String. Returns True if successful, False if not""" if type(newContent) is str or type(newContent) is unicode: self._content = newContent return True else: return False def rAppend(self, appendStr): """Append a text String to the end of a ReferenceLine object's textual content. Returns True if append successful, False if not""" if type(appendStr) is str or type(appendStr) is unicode: self._content=self._content + appendStr return True else: return False def lAppend(self, appendStr): """Append a text String to the beginning of a ReferenceLine objects textual content. 
Returns True if append successful False if not""" if type(appendStr) is str or type(appendStr) is unicode: self._content = appendStr+self._content return True else: return False class ReferenceSectionDisplayer: def display(self, refsect, recid=None, myostream=sys.stdout): if isinstance(refsect, ReferenceSection): myostream.write("%s" % (self._rawReferebcesToString(refsect,recid).encode("utf-8"),)) myostream.flush() elif isinstance(refsect, ProcessedReferenceSection): myostream.write("%s" % (self._processedReferebcesToMARCXMLString(refsect,recid).encode("utf-8"),)) myostream.flush() def _rawReferebcesToString(self,refsect,recid=None): refstr = u"" if not refsect.isEmpty(): # Section Header refstr += u"#################### START REFERENCE SECTION " if recid is not None: refstr += u"SYSID: '%s' " % (recid,) refstr += u"####################\n" for x in refsect: refstr += x.getContent()+u"\n" # Section Footer refstr += u"#################### END REFERENCE SECTION ####################\n" return refstr def _processedReferebcesToMARCXMLString(self,refsect,recid=None): refsectmainbody = refsect.getSelfMARCXML() if len(refsectmainbody.strip()) > 0: out = u""" \n""" if recid is not None and (type(recid) is unicode or type(recid) is str): out += u""" """+cgi.escape(recid)+u"""\n""" out += refsectmainbody out += u""" \n""" else: out = u"" return out class RegexWordSpacer: """Concrete Class. Adds optional regex space matchers and quantifiers (\s*?) between the characters of a word. Useful because sometimes the document conversion process breaks up words with spaces """ def space(self, word): """Add the space chars to a word & return the regex pattern (not compiled)""" newWord = None if type(word) is str or type(word) is unicode: newWord = u'' p_spc = re.compile(unicode(r'\s'),re.UNICODE) for x in word: m_spc = p_spc.match(x) if m_spc is None: newWord = newWord+x+unicode(r'\s*?') else: newWord = newWord+x return newWord class DocumentSearchPatternListCompiler: """Abstract class. 
Used to get a 'DocumentSearchCompiledPatternList' object, which is used for searching lines of a document for a given pattern """ def getCompiledPatternList(self, prefix = u'', suffix = u''): """Return a list of compiled regex patterns""" pass def createPatterns(self, prefix = u'', suffix = u''): """Create the regex patterns (don't compile though)""" pass class RefSecnTitleListCompiler(DocumentSearchPatternListCompiler): """Concrete class. Used to return a 'DocumentSearchCompiledPatternList' object containing regex patterns enabling the identification of possible reference section titles in a text line """ def getCompiledPatternList(self, prefix = u'', suffix = u''): """Return a list of compiled regex patterns used to ID reference section title""" patterns = self.createPatterns() return CompiledPatternList(patterns) def createPatterns(self, prefix = u'', suffix = u''): """Create the regex patterns (don't compile though)""" patternList = [] titles = self.getTitles() sectMarker = unicode(r'^\s*?([\[\-\{\(])?\s*?((\w|\d){1,5}([\.\-\,](\w|\d){1,5})?\s*?[\.\-\}\)\]]\s*?)?(?P') lineEnd = unicode(r'(\s+?s\s*?e\s*?c\s*?t\s*?i\s*?o\s*?n\s*?)?)') lineEnd = lineEnd+unicode(r'($|\s*?[\[\{\(\<]\s*?[1a-z]\s*?[\}\)\>\]]|\:)') s = RegexWordSpacer() for x in titles: if (type(x) is str or type(x) is unicode) and len(x) > 1: s = RegexWordSpacer() namePtn = sectMarker+s.space(x)+lineEnd patternList.append(namePtn) elif (type(x) is str or type(x) is unicode) and len(x) > 0: namePtn = sectMarker+s.space(x)+lineEnd patternList.append(namePtn) return patternList def getTitles(self): """Get and return a list of the titles to be searched for""" titles = [] titles.append(u'references') titles.append(u'r\u00C9f\u00E9rences') titles.append(u'r\u00C9f\u00C9rences') titles.append(u'reference') titles.append(u'refs') titles.append(u'r\u00E9f\u00E9rence') titles.append(u'r\u00C9f\u00C9rence') titles.append(u'r\xb4ef\xb4erences') titles.append(u'r\u00E9fs') titles.append(u'r\u00C9fs') 
titles.append(u'bibliography') titles.append(u'bibliographie') titles.append(u'citations') return titles class PostRefSecnTitleListCompiler(DocumentSearchPatternListCompiler): """Concrete class. Used to return a 'DocumentSearchCompiledPatternList' object containing regex patterns enabling the identification of possible titles that usually follow the reference section in a doc """ def getCompiledPatternList(self, prefix = '', suffix = ''): """Return a list of compiled regex patterns used to ID post reference section title""" patterns = self.createPatterns() return CompiledPatternList(patterns) def createPatterns(self, prefix = '', suffix = ''): """Create the regex patterns (don't compile though)""" patterns = [] thead = unicode(r'^\s*?([\{\(\<\[]?\s*?(\w|\d)\s*?[\)\}\>\.\-\]]?\s*?)?') ttail = unicode(r'(\s*?\:\s*?)?') numatn = unicode(r'(\d+|\w\b|i{1,3}v?|vi{0,3})[\.\,]?\b') s = RegexWordSpacer() # Section titles: patterns.append(thead+s.space(u'appendix')+ttail) patterns.append(thead+s.space(u'appendices')+ttail) patterns.append(thead+s.space(u'acknowledgement')+unicode(r's?')+ttail) patterns.append(thead+s.space(u'table')+unicode(r'\w?s?\d?')+ttail) patterns.append(thead+s.space(u'figure')+unicode(r's?')+ttail) patterns.append(thead+s.space(u'annex')+unicode(r's?')+ttail) patterns.append(thead+s.space(u'discussion')+unicode(r's?')+ttail) patterns.append(thead+s.space(u'remercie')+unicode(r's?')+ttail) # Figure nums: patterns.append(r'^\s*?'+s.space(u'figure')+numatn) patterns.append(r'^\s*?'+s.space(u'fig')+unicode(r'\.\s*?')+numatn) patterns.append(r'^\s*?'+s.space(u'fig')+unicode(r'\.?\s*?\d\w?\b')) # Table nums: patterns.append(r'^\s*?'+s.space(u'table')+numatn) patterns.append(r'^\s*?'+s.space(u'tab')+unicode(r'\.\s*?')+numatn) patterns.append(r'^\s*?'+s.space(u'tab')+unicode(r'\.?\s*?\d\w?\b')) return patterns class PostRefSecnKWListCompiler(DocumentSearchPatternListCompiler): """Concrete class. 
Used to return a 'DocumentSearchCompiledPatternList' object containing regex patterns enabling the identification of Key Words/phrases that are often found in lines following the reference section of a document """ def getCompiledPatternList(self, prefix = u'', suffix = u''): """Return a list of compiled regex patterns used to ID keywords usually found in lines after a reference section""" patterns = self.createPatterns() return CompiledPatternList(patterns) def createPatterns(self, prefix = u'', suffix = u''): """Create the regex patterns (don't compile though)""" patterns = [] s = RegexWordSpacer() patterns.append(unicode(r'(')+s.space(u'prepared')+unicode(r'|')+s.space(u'created')+unicode(r').*?(AAS\s*?)?\sLATEX')) patterns.append(unicode(r'AAS\s+?LATEX\s+?')+s.space(u'macros')+u'v') patterns.append(unicode(r'^\s*?')+s.space(u'This paper has been produced using')) patterns.append(unicode(r'^\s*?')+s.space(u'This article was processed by the author using Springer-Verlag')+u' LATEX') return patterns class FirstRefLineNumerationListCompiler(DocumentSearchPatternListCompiler): """Concrete class. Used to return a 'DocumentSearchCompiledPatternList' object containing regex patterns enabling the identification of the first reference line by its numeration marker """ def getCompiledPatternList(self, prefix = u'', suffix = u''): """Return a list of compiled regex patterns used to ID the first reference line by its numeration marker""" patterns = self.createPatterns() return CompiledPatternList(patterns) def createPatterns(self, prefix = u'', suffix = u''): """Create the regex patterns (don't compile though)""" patterns = [] g_name = unicode(r'(?P<mark>') g_close = u')' patterns.append(g_name+unicode(r'(?P<left>\[)\s*?(?P<num>\d+)\s*?(?P<right>\])')+g_close) patterns.append(g_name+unicode(r'(?P<left>\{)\s*?(?P<num>\d+)\s*?(?P<right>\})')+g_close) return patterns class RefLineNumerationListCompiler(DocumentSearchPatternListCompiler): """Concrete class. 
Used to return a 'DocumentSearchCompiledPatternList' object containing regex patterns enabling the ID of any reference line by its numeration marker """ def getCompiledPatternList(self, prefix = u'', suffix = u''): """Return a list of compiled regex patterns used to ID the numeration marker for a reference line""" patterns = self.createPatterns() return CompiledPatternList(patterns) def createPatterns(self, prefix = u'', suffix = u''): """Create the regex patterns (don't compile though)""" patterns = [] if type(prefix) is str or type(prefix) is unicode: title = prefix else: title = u'' g_name = unicode(r'(?P<mark>') g_close = u')' space = unicode(r'\s*?') patterns.append(space+title+g_name+unicode(r'\[\s*?(?P<linenumber>\d+)\s*?\]')+g_close) patterns.append(space+title+g_name+unicode(r'\[\s*?[a-zA-Z]+\s?(\d{1,4}[A-Za-z]?)?\s*?\]')+g_close) patterns.append(space+title+g_name+unicode(r'\{\s*?\d+\s*?\}')+g_close) patterns.append(space+title+g_name+unicode(r'\<\s*?\d+\s*?\>')+g_close) patterns.append(space+title+g_name+unicode(r'\(\s*?\d+\s*?\)')+g_close) patterns.append(space+title+g_name+unicode(r'(?P<marknum>\d+)\s*?\.')+g_close) patterns.append(space+title+g_name+unicode(r'\d+\s*?')+g_close) patterns.append(space+title+g_name+unicode(r'\d+\s*?\]')+g_close) patterns.append(space+title+g_name+unicode(r'\d+\s*?\}')+g_close) patterns.append(space+title+g_name+unicode(r'\d+\s*?\)')+g_close) patterns.append(space+title+g_name+unicode(r'\d+\s*?\>')+g_close) patterns.append(space+title+g_name+unicode(r'\[\s*?\]')+g_close) patterns.append(space+title+g_name+unicode(r'\*')+g_close) return patterns class CompiledPatternList: """Concrete Class. 
List of compiled regex patterns, ready to be used for searching through text lines""" class CompiledPatternListIterator: def __init__(self, ptnlines): self._mylist = ptnlines self._listptr = 0 def next(self): try: item = self._mylist[self._listptr] self._listptr += 1 return item except IndexError: raise StopIteration def __init__(self, patternList): """Accept a list of regex strings and compile them, adding them to the internal list of compiled regex patterns""" self._patterns = [] if type(patternList) is list: for x in patternList: self._patterns.append(re.compile(x, re.I|re.UNICODE)) def __iter__(self): """Return a CompiledPatternListIterator object so that the patterns held by a CompiledPatternList can be iterated through""" return CompiledPatternList.CompiledPatternListIterator(self._patterns) def getNumPatterns(self): """Return the length of the internal pattern list (patterns)""" return len(self._patterns) def getPattern(self, ptnIdx): """Return the regex pattern at [ptnIdx] in the internal pattern list (self._patterns). Returns 'None' if ptnIdx not valid""" if type(ptnIdx) is int and ptnIdx < len(self._patterns) and ptnIdx > -1: return self._patterns[ptnIdx] else: return None def display(self): """Display all patterns held in a CompiledPatternList object""" for x in self._patterns: print x.pattern.encode("utf-8") class LineSearchAlgorithm: """Search algorithm for matching a pattern in a line""" def doSearch(self, searcher, line, patternList): """Search for a pattern in a line of text""" match = None unsafe = False try: getNumPatterns=patternList.getNumPatterns except AttributeError: unsafe=True if (type(line) is str or type(line) is unicode) and not unsafe: for x in patternList: match = searcher.goSearch(line, x) if match is not None: break return match class SearchExecuter: """Abstract class. 
Executes a regex search operation on a line of text which is passed to it""" def goSearch(self, line, pattern): """Execute the search and return a match object or None""" pass class MatchSearchExecuter(SearchExecuter): """Concrete class. Executes a 're.match()' on a compiled re pattern""" def goSearch(self, line, pattern): """Execute the search and return a 'Match' object or None""" return pattern.match(line) class SearchSearchExecuter(SearchExecuter): """Concrete class. Executes a 're.search()' on a compiled re pattern""" def goSearch(self, line, pattern): """Execute the search and return a 'Match' object or None""" return pattern.search(line) class LineSearcher: """Concrete Class. This is the interface through which the user can carry out a line search""" def findAtStartLine(self, line, patternList): """Test a line of text against a list of patterns to see if any of the patterns match at the start of the line""" al = LineSearchAlgorithm() searcher = MatchSearchExecuter() return al.doSearch(searcher, line, patternList) def findWithinLine(self, line, patternList): """Test a line of text against a list of patterns to see if any of the patterns match anywhere within the line""" al = LineSearchAlgorithm() searcher = SearchSearchExecuter() return al.doSearch(searcher, line, patternList) class TextLineTransformer: """Abstract Class Interface. Accepts line, performs some transformationon it and returns transformed line""" def processLine(self, line): """Carry out transformation on line. 
Return transformed line""" pass class EscapeSequenceTransformer(TextLineTransformer): """Class to correct escape seq's which were not properly represented in the document conversion""" def __init__(self): """Compile & initialise pattern list""" self._patterns = self._getPatterns() def processLine(self, line): """Transform accents in a line into correct format""" try: for x in self._patterns.keys(): try: line = line.replace(x,self._patterns[x]) except UnicodedecodeError: sys.exit(0) except TypeError: pass return line def _getPatterns(self): """Return a list of regex patterns used to recognise escaped patterns""" plist = {} def _addLanguageTagCodePoints(ptnlist): """Add all language tag code points to remove from document""" # Language Tag Code Points: langTagCPs = [u"\U000E0000",u"\U000E0001",u"\U000E0002",u"\U000E0003",u"\U000E0004",u"\U000E0005",u"\U000E0006",u"\U000E0007",u"\U000E0008",u"\U000E0009",u"\U000E000A",u"\U000E000B",u"\U000E000C",u"\U000E000D",u"\U000E000E",u"\U000E000F", u"\U000E0010",u"\U000E0011",u"\U000E0012",u"\U000E0013",u"\U000E0014",u"\U000E0015",u"\U000E0016",u"\U000E0017",u"\U000E0018",u"\U000E0019",u"\U000E001A",u"\U000E001B",u"\U000E001C",u"\U000E001D",u"\U000E001E",u"\U000E001F", u"\U000E0020",u"\U000E0021",u"\U000E0022",u"\U000E0023",u"\U000E0024",u"\U000E0025",u"\U000E0026",u"\U000E0027",u"\U000E0028",u"\U000E0029",u"\U000E002A",u"\U000E002B",u"\U000E002C",u"\U000E002D",u"\U000E002E",u"\U000E002F", u"\U000E0030",u"\U000E0031",u"\U000E0032",u"\U000E0033",u"\U000E0034",u"\U000E0035",u"\U000E0036",u"\U000E0037",u"\U000E0038",u"\U000E0039",u"\U000E003A",u"\U000E003B",u"\U000E003C",u"\U000E003D",u"\U000E003E",u"\U000E003F", u"\U000E0040",u"\U000E0041",u"\U000E0042",u"\U000E0043",u"\U000E0044",u"\U000E0045",u"\U000E0046",u"\U000E0047",u"\U000E0048",u"\U000E0049",u"\U000E004A",u"\U000E004B",u"\U000E004C",u"\U000E004D",u"\U000E004E",u"\U000E004F", 
u"\U000E0050",u"\U000E0051",u"\U000E0052",u"\U000E0053",u"\U000E0054",u"\U000E0055",u"\U000E0056",u"\U000E0057",u"\U000E0058",u"\U000E0059",u"\U000E005A",u"\U000E005B",u"\U000E005C",u"\U000E005D",u"\U000E005E",u"\U000E005F", u"\U000E0060",u"\U000E0061",u"\U000E0062",u"\U000E0063",u"\U000E0064",u"\U000E0065",u"\U000E0066",u"\U000E0067",u"\U000E0068",u"\U000E0069",u"\U000E006A",u"\U000E006B",u"\U000E006C",u"\U000E006D",u"\U000E006E",u"\U000E006F", u"\U000E0070",u"\U000E0071",u"\U000E0072",u"\U000E0073",u"\U000E0074",u"\U000E0075",u"\U000E0076",u"\U000E0077",u"\U000E0078",u"\U000E0079",u"\U000E007A",u"\U000E007B",u"\U000E007C",u"\U000E007D",u"\U000E007E",u"\U000E007F"] for itm in langTagCPs: ptnlist[itm] = u"" def _addMusicNotation(ptnlist): """Add all musical notation items to remove from document""" # Musical Notation Scoping musicNotation = [u"\U0001D173",u"\U0001D174",u"\U0001D175",u"\U0001D176",u"\U0001D177",u"\U0001D178",u"\U0001D179",u"\U0001D17A"] for itm in musicNotation: ptnlist[itm] = u"" # Control characters not suited to XML: plist[u'\u2028'] = u"" plist[u'\u2029'] = u"" plist[u'\u202A'] = u"" plist[u'\u202B'] = u"" plist[u'\u202C'] = u"" plist[u'\u202D'] = u"" plist[u'\u202E'] = u"" plist[u'\u206A'] = u"" plist[u'\u206B'] = u"" plist[u'\u206C'] = u"" plist[u'\u206D'] = u"" plist[u'\u206E'] = u"" plist[u'\u206F'] = u"" plist[u'\uFFF9'] = u"" plist[u'\uFFFA'] = u"" plist[u'\uFFFB'] = u"" plist[u'\uFFFC'] = u"" plist[u'\uFEFF'] = u"" _addLanguageTagCodePoints(plist) _addMusicNotation(plist) plist[u'\u0001'] = u"" # START OF HEADING # START OF TEXT & END OF TEXT: plist[u'\u0002'] = u"" plist[u'\u0003'] = u"" plist[u'\u0004'] = u"" # END OF TRANSMISSION # ENQ and ACK plist[u'\u0005'] = u"" plist[u'\u0006'] = u"" plist[u'\u0007'] = u"" # BELL plist[u'\u0008'] = u"" # BACKSPACE # SHIFT-IN & SHIFT-OUT plist[u'\u000E'] = u"" plist[u'\u000F'] = u"" # Other controls: plist[u'\u0010'] = u"" # DATA LINK ESCAPE plist[u'\u0011'] = u"" # DEVICE CONTROL ONE 
plist[u'\u0012'] = u"" # DEVICE CONTROL TWO plist[u'\u0013'] = u"" # DEVICE CONTROL THREE plist[u'\u0014'] = u"" # DEVICE CONTROL FOUR plist[u'\u0015'] = u"" # NEGATIVE ACK plist[u'\u0016'] = u"" # SYNCRONOUS IDLE plist[u'\u0017'] = u"" # END OF TRANSMISSION BLOCK plist[u'\u0018'] = u"" # CANCEL plist[u'\u0019'] = u"" # END OF MEDIUM plist[u'\u001A'] = u"" # SUBSTITUTE plist[u'\u001B'] = u"" # ESCAPE plist[u'\u001C'] = u"" # INFORMATION SEPARATOR FOUR (file separator) plist[u'\u001D'] = u"" # INFORMATION SEPARATOR THREE (group separator) plist[u'\u001E'] = u"" # INFORMATION SEPARATOR TWO (record separator) plist[u'\u001F'] = u"" # INFORMATION SEPARATOR ONE (unit separator) # \r -> remove it plist[u'\r'] = u"" # Strange parantheses - change for normal: plist[u'\x1c'] = u'(' plist[u'\x1d'] = u')' # Some ff from tex: plist[u'\u0013\u0010'] = u'\u00ED' plist[u'\x0b'] = u'ff' # fi from tex: plist[u'\x0c'] = u'fi' # ligatures from TeX: plist[u'\ufb00'] = u'ff' plist[u'\ufb01'] = u'fi' plist[u'\ufb02'] = u'fl' plist[u'\ufb03'] = u'ffi' plist[u'\ufb04'] = u'ffl' # Superscripts from TeX plist[u'\u2212'] = u'-' plist[u'\u2013'] = u'-' # Word style speech marks: plist[u'\u201d'] = u'"' plist[u'\u201c'] = u'"' # pdftotext has problems with umlaut and prints it as diaeresis followed by a letter:correct it # (Optional space between char and letter - fixes broken line examples) plist[u'\u00A8 a'] = u'\u00E4' plist[u'\u00A8 e'] = u'\u00EB' plist[u'\u00A8 i'] = u'\u00EF' plist[u'\u00A8 o'] = u'\u00F6' plist[u'\u00A8 u'] = u'\u00FC' plist[u'\u00A8 y'] = u'\u00FF' plist[u'\u00A8 A'] = u'\u00C4' plist[u'\u00A8 E'] = u'\u00CB' plist[u'\u00A8 I'] = u'\u00CF' plist[u'\u00A8 O'] = u'\u00D6' plist[u'\u00A8 U'] = u'\u00DC' plist[u'\u00A8 Y'] = u'\u0178' plist[u'\xA8a'] = u'\u00E4' plist[u'\xA8e'] = u'\u00EB' plist[u'\xA8i'] = u'\u00EF' plist[u'\xA8o'] = u'\u00F6' plist[u'\xA8u'] = u'\u00FC' plist[u'\xA8y'] = u'\u00FF' plist[u'\xA8A'] = u'\u00C4' plist[u'\xA8E'] = u'\u00CB' plist[u'\xA8I'] = 
u'\u00CF' plist[u'\xA8O'] = u'\u00D6' plist[u'\xA8U'] = u'\u00DC' plist[u'\xA8Y'] = u'\u0178' # More umlaut mess to correct: plist[u'\x7fa'] = u'\u00E4' plist[u'\x7fe'] = u'\u00EB' plist[u'\x7fi'] = u'\u00EF' plist[u'\x7fo'] = u'\u00F6' plist[u'\x7fu'] = u'\u00FC' plist[u'\x7fy'] = u'\u00FF' plist[u'\x7fA'] = u'\u00C4' plist[u'\x7fE'] = u'\u00CB' plist[u'\x7fI'] = u'\u00CF' plist[u'\x7fO'] = u'\u00D6' plist[u'\x7fU'] = u'\u00DC' plist[u'\x7fY'] = u'\u0178' plist[u'\x7f a'] = u'\u00E4' plist[u'\x7f e'] = u'\u00EB' plist[u'\x7f i'] = u'\u00EF' plist[u'\x7f o'] = u'\u00F6' plist[u'\x7f u'] = u'\u00FC' plist[u'\x7f y'] = u'\u00FF' plist[u'\x7f A'] = u'\u00C4' plist[u'\x7f E'] = u'\u00CB' plist[u'\x7f I'] = u'\u00CF' plist[u'\x7f O'] = u'\u00D6' plist[u'\x7f U'] = u'\u00DC' plist[u'\x7f Y'] = u'\u0178' # pdftotext: fix accute accent: plist[u'\x13a'] = u'\u00E1' plist[u'\x13e'] = u'\u00E9' plist[u'\x13i'] = u'\u00ED' plist[u'\x13o'] = u'\u00F3' plist[u'\x13u'] = u'\u00FA' plist[u'\x13y'] = u'\u00FD' plist[u'\x13A'] = u'\u00C1' plist[u'\x13E'] = u'\u00C9' plist[u'\x13I'] = u'\u00CD' plist[u'\x13O'] = u'\u00D3' plist[u'\x13U'] = u'\u00DA' plist[u'\x13Y'] = u'\u00DD' plist[u'\x13 a'] = u'\u00E1' plist[u'\x13 e'] = u'\u00E9' plist[u'\x13 i'] = u'\u00ED' plist[u'\x13 o'] = u'\u00F3' plist[u'\x13 u'] = u'\u00FA' plist[u'\x13 y'] = u'\u00FD' plist[u'\x13 A'] = u'\u00C1' plist[u'\x13 E'] = u'\u00C9' plist[u'\x13 I'] = u'\u00CD' plist[u'\x13 O'] = u'\u00D3' plist[u'\x13 U'] = u'\u00DA' plist[u'\x13 Y'] = u'\u00DD' plist[u'\u00B4 a'] = u'\u00E1' plist[u'\u00B4 e'] = u'\u00E9' plist[u'\u00B4 i'] = u'\u00ED' plist[u'\u00B4 o'] = u'\u00F3' plist[u'\u00B4 u'] = u'\u00FA' plist[u'\u00B4 y'] = u'\u00FD' plist[u'\u00B4 A'] = u'\u00C1' plist[u'\u00B4 E'] = u'\u00C9' plist[u'\u00B4 I'] = u'\u00CD' plist[u'\u00B4 O'] = u'\u00D3' plist[u'\u00B4 U'] = u'\u00DA' plist[u'\u00B4 Y'] = u'\u00DD' plist[u'\u00B4a'] = u'\u00E1' plist[u'\u00B4e'] = u'\u00E9' plist[u'\u00B4i'] = u'\u00ED' 
        plist[u'\u00B4o'] = u'\u00F3'
        plist[u'\u00B4u'] = u'\u00FA'
        plist[u'\u00B4y'] = u'\u00FD'
        plist[u'\u00B4A'] = u'\u00C1'
        plist[u'\u00B4E'] = u'\u00C9'
        plist[u'\u00B4I'] = u'\u00CD'
        plist[u'\u00B4O'] = u'\u00D3'
        plist[u'\u00B4U'] = u'\u00DA'
        plist[u'\u00B4Y'] = u'\u00DD'
        # pdftotext: fix grave accent:
        plist[u'\u0060 a'] = u'\u00E0'
        plist[u'\u0060 e'] = u'\u00E8'
        plist[u'\u0060 i'] = u'\u00EC'
        plist[u'\u0060 o'] = u'\u00F2'
        plist[u'\u0060 u'] = u'\u00F9'
        plist[u'\u0060 A'] = u'\u00C0'
        plist[u'\u0060 E'] = u'\u00C8'
        plist[u'\u0060 I'] = u'\u00CC'
        plist[u'\u0060 O'] = u'\u00D2'
        plist[u'\u0060 U'] = u'\u00D9'
        plist[u'\u0060a'] = u'\u00E0'
        plist[u'\u0060e'] = u'\u00E8'
        plist[u'\u0060i'] = u'\u00EC'
        plist[u'\u0060o'] = u'\u00F2'
        plist[u'\u0060u'] = u'\u00F9'
        plist[u'\u0060A'] = u'\u00C0'
        plist[u'\u0060E'] = u'\u00C8'
        plist[u'\u0060I'] = u'\u00CC'
        plist[u'\u0060O'] = u'\u00D2'
        plist[u'\u0060U'] = u'\u00D9'
        # U+02C7 CARON followed by a letter -> letter with caron:
        plist[u'\u02C7C'] = u'\u010C'
        plist[u'\u02C7c'] = u'\u010D'
        plist[u'\u02C7S'] = u'\u0160'
        plist[u'\u02C7s'] = u'\u0161'
        plist[u'\u02C7Z'] = u'\u017D'
        plist[u'\u02C7z'] = u'\u017E'
        # U+02DA RING ABOVE followed by a/A -> a/A with ring above:
        plist[u'\u02DAa'] = u'\u00E5'
        plist[u'\u02DAA'] = u'\u00C5'
        # U+0327 COMBINING CEDILLA followed by c/C -> c/C with cedilla:
        plist[u'\u0327c'] = u'\u00E7'
        plist[u'\u0327C'] = u'\u00C7'
        return plist


class URLRepairer(TextLineTransformer):
    """Class to attempt to re-assemble URLs which have been broken during the document's
       conversion to text"""
    def __init__(self):
        """Initialise the URL correction pattern list"""
        self._patterns = self._compilePatterns(self._getPatterns())

    def processLine(self, line):
        """Repair any broken URLs in line and return the newly repaired line"""
        def chop_spaces(m):
            # Remove all whitespace from the matched (broken) URL
            chopper = SpaceNullifier()
            line = m.group(1)
            return chopper.processLine(line)
        if type(line) is str or type(line) is unicode:
            for x in self._patterns:
                line = x.sub(chop_spaces, line)
        return line

    def _getPatterns(self):
        """Return a list of regex patterns used to detect URLs that have been broken by
           spaces in a line"""
        fileTypesList = []
        fileTypesList.append(unicode(r'h\s*?t\s*?m')) # htm
        fileTypesList.append(unicode(r'h\s*?t\s*?m\s*?l')) # html
        fileTypesList.append(unicode(r't\s*?x\s*?t')) # txt
        fileTypesList.append(unicode(r'p\s*?h\s*?p')) # php
        fileTypesList.append(unicode(r'a\s*?s\s*?p\s*?')) # asp
        fileTypesList.append(unicode(r'j\s*?s\s*?p')) # jsp
        fileTypesList.append(unicode(r'p\s*?y')) # py (python)
        fileTypesList.append(unicode(r'p\s*?l')) # pl (perl)
        fileTypesList.append(unicode(r'x\s*?m\s*?l')) # xml
        fileTypesList.append(unicode(r'j\s*?p\s*?g')) # jpg
        fileTypesList.append(unicode(r'g\s*?i\s*?f')) # gif
        fileTypesList.append(unicode(r'm\s*?o\s*?v')) # mov
        fileTypesList.append(unicode(r's\s*?w\s*?f')) # swf
        fileTypesList.append(unicode(r'p\s*?d\s*?f')) # pdf
        fileTypesList.append(unicode(r'p\s*?s')) # ps
        fileTypesList.append(unicode(r'd\s*?o\s*?c')) # doc
        fileTypesList.append(unicode(r't\s*?e\s*?x')) # tex
        fileTypesList.append(unicode(r's\s*?h\s*?t\s*?m\s*?l')) # shtml
        plist = []
        plist.append(unicode(r'(h\s*t\s*t\s*p\s*\:\s*\/\s*\/)'))
        plist.append(unicode(r'(f\s*t\s*p\s*\:\s*\/\s*\/\s*)'))
        plist.append(unicode(r'((http|ftp):\/\/\s*[\w\d])'))
        plist.append(unicode(r'((http|ftp):\/\/([\w\d\s\._\-])+?\s*\/)'))
        plist.append(unicode(r'((http|ftp):\/\/([\w\d\_\.\-])+\/(([\w\d\_\s\.\-])+?\/)+)'))
        plist.append(unicode(r'((http|ftp):\/\/([\w\d\_\.\-])+\/(([\w\d\_\s\.\-])+?\/)*([\w\d\_\s\-]+\.\s?[\w\d]+))'))
        # Some possible endings for URLs:
        for x in fileTypesList:
            plist.append(unicode(r'((http|ftp):\/\/([\w\d\_\.\-])+\/(([\w\d\_\.\-])+?\/)*([\w\d\_\-]+\.') + x + u'))')
        # If the URL is the last thing in the line and has at most 10 more letters, concatenate them
        plist.append(unicode(r'((http|ftp):\/\/([\w\d\_\.\-])+\/(([\w\d\_\.\-])+?\/)*'\
                             r'\s*?([\w\d\_\.\-]\s?){1,10}\s*)$'))
        return plist

    def _compilePatterns(self, plist):
        """Compile the regex patterns in plist (case-insensitive, Unicode-aware).
           Return a list of the compiled pattern objects"""
        ptns = []
        for x in plist:
            ptns.append(re.compile(x, re.I+re.UNICODE))
        return ptns


class SpaceNullifier(TextLineTransformer):
    """Class to remove all spaces from a text string"""
    def __init__(self):
        """Initialise space chopping pattern"""
        self.ptn = re.compile(unicode(r'\s+'), re.UNICODE)
        self.rep = u''

    def processLine(self, line):
        """Perform the act of chopping spaces from a line.
           Return line with no spaces in it"""
        newLine = line
        if type(newLine) is str or type(newLine) is unicode:
            newLine = self.ptn.sub(self.rep, line)
        return newLine


class MultispaceTruncator(TextLineTransformer):
    """Class to transform multiple spaces into a single space"""
    def __init__(self):
        """Initialise space detection pattern"""
        self.ptn = re.compile(unicode(r'\s{2,}'), re.UNICODE)
        self.rep = u' '

    def processLine(self, line):
        """Perform the act of detecting and replacing multiple spaces"""
        newLine = line
        if type(newLine) is str or type(newLine) is unicode:
            newLine = self.ptn.sub(self.rep, line)
        return newLine


class Document:
    """Abstract class representing a fulltext document in the system"""
    def __init__(self, newDocBody = [], filepath = None):
        """Initialise state of a document object"""
        self._content = []
        if filepath is not None:
            self._file_readlines(filepath)
        elif type(newDocBody) is list or type(newDocBody) is str or type(newDocBody) is unicode:
            self.appendData(newDocBody)

    def _file_readlines(self, fname):
        """Read the lines of the UTF-8 encoded file fname into self._content"""
        try:
            fh = open("%s" % (fname,), "r")
            try:
                for line in fh:
                    self._content.append(line.decode("utf-8"))
            finally:
                # Close the file even if decoding fails
                fh.close()
        except (IOError, ValueError):
            # ValueError also covers UTF-8 decoding errors (UnicodeError is a ValueError subclass)
            sys.stderr.write("""E: Failed to read in file "%s".\n""" % (fname,))

    def displayDocument(self):
        """Abstract: Display the Document"""
        pass

    def appendData(self, newData):
        """Add a text line (or a list of text lines) to a Document object"""
        if type(newData) is list:
            for line in newData:
                self._content.append(line)
        elif type(newData) is str or type(newData) is unicode:
            self._content.append(newData)

    def isEmpty(self):
        """Return True if self._content is empty; False if not"""
        return (len(self._content) < 1)


class TextDocument(Document):
    """Concrete class representing a TextDocument - effectively a list of Strings of plaintext"""
    def __init__(self, newDocBody = [], filepath = None):
        """Initialise a TextDocument object"""
        Document.__init__(self, newDocBody, filepath)

    def displayDocument(self):
        """Print each line of the TextDocument, UTF-8 encoded, on standard output"""
        for i in self._content:
            print i.encode("utf-8")

    def getReferences(self, start, end):
        """Get the reference section lines, put them into a ReferenceSectionRebuilder object,
           ask it to rebuild the lines, and return the resulting ReferenceSection object
        """
        startIdx = None
        if start.firstLineIsTitleAndMarker():
            # Title on same line as 1st ref - take title out!
            t = start.getTitleString()
            startIdx = start.getLineNum()
            # Escape the title: it is literal text taken from the document, not a regex
            sp = re.compile(unicode(r'^.*?') + re.escape(t), re.UNICODE)
            newl = sp.split(self._content[startIdx], 1)
            self._content[startIdx] = newl[1]
        elif start.titlePresent():
            # Pass title
            startIdx = start.getLineNum()+1
        else:
            startIdx = start.getLineNum()
        if type(end) is int:
            b = ReferenceSectionRebuilder(self._content[startIdx:end+1])
        else:
            b = ReferenceSectionRebuilder()
        return b.getRebuiltLines(start)

    def findEndReferenceSection(self, refStart):
        """Find the line number of the end of a TextDocument's reference section.
           Should be passed a ReferenceSectionStartPoint object containing at least the start
           line of the reference section.
           Returns the line number of the end of the reference section on success, None if not
        """
        if refStart is None or refStart.getLineNum() is None:
            # No reference section start info!
            return None
        sectEnded = 0
        x = refStart.getLineNum()
        if (type(x) is not int) or (x<0) or (x>len(self._content)) or (len(self._content)<1):
            # Can't safely find end of refs with this info - quit!
            return None
        # Get line test patterns:
        t_patterns = PostRefSecnTitleListCompiler().getCompiledPatternList()
        kw_patterns = PostRefSecnKWListCompiler().getCompiledPatternList()
        if refStart.markerCharPresent():
            mk_patterns = CompiledPatternList([refStart.getMarkerPattern()])
        else:
            mk_patterns = RefLineNumerationListCompiler().getCompiledPatternList()
        garbageDigit_pattern = re.compile(unicode(r'^\s*?([\+\-]?\d+?(\.\d+)?\s*?)+?\s*?$'),re.UNICODE)
        searcher=LineSearcher()
        while (x<len(self._content)) and (not sectEnded):
            end_match = searcher.findWithinLine(self._content[x], t_patterns)
            if end_match is None:
                end_match = searcher.findWithinLine(self._content[x], kw_patterns)
            if end_match is not None:
                # End reference section? Check within next 5 lines for other reference numeration markers
                y = x+1
                lineFnd = 0
                while (y<x+6) and (y<len(self._content)) and (not lineFnd):
                    num_match=searcher.findWithinLine(self._content[y], mk_patterns)
                    if num_match is not None and not num_match.group(0).isdigit():
                        lineFnd = 1
                    y = y + 1
                if not lineFnd:
                    # No ref line found - end section
                    sectEnded = 1
            if not sectEnded:
                # Do this line & the next 3 lines simply contain numbers? If yes, it's probably
                # the axis scale of a graph in a figure: end the refs section
                dm = garbageDigit_pattern.match(self._content[x])
                if dm is not None:
                    y = x + 1
                    digitLines = 4
                    numDigitLns = 1
                    while(y<x+digitLines) and (y<len(self._content)):
                        dm = garbageDigit_pattern.match(self._content[y])
                        if dm is not None:
                            numDigitLns = numDigitLns + 1
                        y = y + 1
                    if numDigitLns == digitLines:
                        sectEnded = 1
            x = x + 1
        return x - 1

    def extractReferences(self,no_rebuild = False):
        """Extract references from a TextDocument and return a ReferenceSection object"""
        # Try to remove pagebreaks, headers, footers
        self._removePageBoundaryInformation()
        # Find start of refs section:
        sectStart = self.findReferenceSection()
        if sectStart is None:
            # No titled references section found: look for one without a title
            sectStart = self.findReferenceSectionNoTitle()
        if sectStart is None:
            # No References
            refs = ReferenceSection()
        else:
            sectEnd = self.findEndReferenceSection(sectStart)
            if sectEnd is None:
                # No End to refs? Not safe to extract
                refs = ReferenceSection()
            else:
                # Extract
                refs = self.getReferences(sectStart, sectEnd)
        return refs

    def findReferenceSectionNoTitle(self):
        """Find the line number of the start of a TextDocument object's reference section by
           searching for the first reference line.
           Can only find reference sections with distinct line markers such as [1].
           Returns a ReferenceSectionStartPoint object containing ref start line number & marker char,
           or None if nothing found
        """
        refStartLine = refLineMarker = refLineMarkerPattern = refStart = None
        if len(self._content) > 0:
            mk_patterns = FirstRefLineNumerationListCompiler().getCompiledPatternList()
            searcher = LineSearcher()
            p_blank = re.compile(unicode(r'^\s*$'))
            x = len(self._content)-1
            foundSect = 0
            while x >= 0 and not foundSect:
                mk_match = searcher.findAtStartLine(self._content[x], mk_patterns)
                if mk_match is not None and string.atoi(mk_match.group('num')) == 1:
                    # Get mark recognition pattern:
                    mk_ptn = mk_match.re.pattern
                    # Look for [2] in next 10 lines:
                    nxtTestLines = 10
                    y = x + 1
                    tmpFnd = 0
                    while y < len(self._content) and y < x+nxtTestLines and not tmpFnd:
                        mk_match2=searcher.findAtStartLine(self._content[y], mk_patterns)
                        if (mk_match2 is not None) and (string.atoi(mk_match2.group('num')) == 2) and (mk_match.group('left') == mk_match2.group('left')) and (mk_match.group('right') == mk_match2.group('right')):
                            # Found next line:
                            tmpFnd = 1
                        elif y == len(self._content) - 1:
                            tmpFnd = 1
                        y = y + 1
                    if tmpFnd:
                        foundSect = 1
                        refStartLine = x
                        refLineMarker = mk_match.group('mark')
                        refLineMarkerPattern = mk_ptn
                x = x - 1
            if refStartLine is not None:
                # Make ReferenceSectionStartPoint object with ref section start location details
                refStart = ReferenceSectionStartPoint()
                refStart.setLineNum(refStartLine)
                refStart.setMarkerChar(refLineMarker)
                refStart.setMarkerPattern(refLineMarkerPattern)
        return refStart

    def findReferenceSection(self):
        """Find the line number of the start of a TextDocument object's reference section.
           Returns a 'ReferenceSectionStartPoint' object containing details of the reference section
           start line number, the reference section title & the marker char used for each reference
           line, or returns None if not found
        """
        refStartLine = refTitle = refLineMarker = refLineMarkerPattern = None
        refStart = titleMarkerSameLine = foundPart = None
        if len(self._content) > 0:
            t_patterns = RefSecnTitleListCompiler().getCompiledPatternList()
            mk_patterns = RefLineNumerationListCompiler().getCompiledPatternList()
            searcher = LineSearcher()
            p_blank = re.compile(unicode(r'^\s*$'))
            # Try to find refs section title:
            x = len(self._content)-1
            foundTitle = 0
            while x >= 0 and not foundTitle:
                title_match = searcher.findWithinLine(self._content[x], t_patterns)
                if title_match is not None:
                    temp_refStartLine = x
                    tempTitle = title_match.group('title')
                    mk_wtitle_ptrns = RefLineNumerationListCompiler().getCompiledPatternList(tempTitle)
                    mk_wtitle_match = searcher.findWithinLine(self._content[x], mk_wtitle_ptrns)
                    if mk_wtitle_match is not None:
                        mk = mk_wtitle_match.group('mark')
                        mk_ptn = mk_wtitle_match.re.pattern
                        p_num = re.compile(unicode(r'(\d+)'))
                        m_num = p_num.search(mk)
                        if m_num is not None and string.atoi(m_num.group(0)) == 1:
                            # Mark found.
                            foundTitle = 1
                            refTitle = tempTitle
                            refLineMarker = mk
                            refLineMarkerPattern = mk_ptn
                            refStartLine=temp_refStartLine
                            titleMarkerSameLine = 1
                        else:
                            foundPart = 1
                            refStartLine = temp_refStartLine
                            refLineMarker = mk
                            refLineMarkerPattern = mk_ptn
                            refTitle = tempTitle
                            titleMarkerSameLine = 1
                    else:
                        try:
                            y = x + 1
                            # Move past blank lines
                            m_blank = p_blank.match(self._content[y])
                            while m_blank is not None and y < len(self._content):
                                y = y+1
                                m_blank = p_blank.match(self._content[y])
                            # Is this line numerated like a reference line?
                            mark_match = searcher.findAtStartLine(self._content[y], mk_patterns)
                            if mark_match is not None:
                                # Ref line found. What is it?
                                titleMarkerSameLine=None
                                mark = mark_match.group('mark')
                                mk_ptn = mark_match.re.pattern
                                p_num = re.compile(unicode(r'(\d+)'))
                                m_num = p_num.search(mark)
                                if m_num is not None and string.atoi(m_num.group(0)) == 1:
                                    # 1st ref truly found
                                    refStartLine = temp_refStartLine
                                    refLineMarker = mark
                                    refLineMarkerPattern = mk_ptn
                                    refTitle = tempTitle
                                    foundTitle = 1
                                elif m_num is not None and string.atoi(m_num.group(0)) != 1:
                                    foundPart = 1
                                    refStartLine = temp_refStartLine
                                    refLineMarker = mark
                                    refLineMarkerPattern = mk_ptn
                                    refTitle = tempTitle
                                else:
                                    if foundPart:
                                        foundTitle = 1
                                    else:
                                        foundPart = 1
                                        refStartLine = temp_refStartLine
                                        refTitle=tempTitle
                                        refLineMarker = mark
                                        refLineMarkerPattern = mk_ptn
                            else:
                                # No numeration
                                if foundPart:
                                    foundTitle = 1
                                else:
                                    foundPart = 1
                                    refStartLine = temp_refStartLine
                                    refTitle=tempTitle
                        except IndexError:
                            # References section title was on the last line for some reason. Ignore
                            pass
                x = x - 1
            if refStartLine is not None:
                # Make ReferenceSectionStartPoint object with ref
                # section start location details
                refStart = ReferenceSectionStartPoint()
                refStart.setLineNum(refStartLine)
                refStart.setTitleString(refTitle)
                refStart.setMarkerChar(refLineMarker)
                refStart.setMarkerPattern(refLineMarkerPattern)
                if titleMarkerSameLine is not None:
                    refStart.setTitleMarkerSameLine()
        return refStart

    def _removePageBoundaryInformation(self):
        """Locate page breaks, headers and footers within the doc body.
           Remove them when found"""
        numHeadLn = numFootLn = 0
        pageBreaks = []
        # Make sure the document is not just full of whitespace:
        if not self.documentContainsText():
            return 0
        # Get list of index posns of pagebreaks in document:
        pageBreaks = self.getDocPageBreakPositions()
        # Get num lines making up each header if poss:
        numHeadLn = self.getHeadLines(pageBreaks)
        # Get num lines making up each footer if poss:
        numFootLn = self.getFootLines(pageBreaks)
        # Remove pagebreaks, headers, footers:
        self.chopHeadFootBreaks(pageBreaks, numHeadLn, numFootLn)

    def getheadFootWordPattern(self):
        """Regex pattern used to ID a word in a header/footer line"""
        return re.compile(unicode(r'([A-Za-z0-9-]+)'),re.UNICODE)

    def getHeadLines(self, breakIndices = []):
        """Using list of indices of pagebreaks in document, attempt to determine how many lines
           page headers consist of"""
        remainingBreaks = (len(breakIndices) - 1)
        numHeadLns = emptyLine = 0
        p_wordSearch = self.getheadFootWordPattern()
        if remainingBreaks > 2:
            if remainingBreaks > 3:
                # Only check odd page headers
                nxtHead = 2
            else:
                # Check headers on each page
                nxtHead = 1
            keepChecking = True
            while keepChecking:
                curBreak = 1
                #m_blankLineTest = p_wordSearch.search(self._content[(breakIndices[curBreak]+numHeadLns+1)])
                m_blankLineTest = re.compile(u'\S',re.UNICODE).search(self._content[(breakIndices[curBreak]+numHeadLns+1)])
                if m_blankLineTest == None:
                    # Empty line in header:
                    emptyLine = 1
                if (breakIndices[curBreak]+numHeadLns+1) == (breakIndices[(curBreak + 1)]):
                    # Have reached next pagebreak: document has no body - only head/footers!
                    keepChecking = False
                grps_headLineWords = p_wordSearch.findall(self._content[(breakIndices[curBreak]+numHeadLns+1)])
                curBreak = curBreak + nxtHead
                while (curBreak < remainingBreaks) and keepChecking:
                    grps_thisLineWords = p_wordSearch.findall(self._content[(breakIndices[curBreak]+numHeadLns+1)])
                    if emptyLine:
                        if len(grps_thisLineWords) != 0:
                            # This line should be empty, but isn't
                            keepChecking = False
                    else:
                        if (len(grps_thisLineWords) == 0) or (len(grps_headLineWords) != len(grps_thisLineWords)):
                            # Not same num 'words' as equivalent line in 1st header:
                            keepChecking = False
                        else:
                            keepChecking = self.checkBoundaryLinesSimilar(grps_headLineWords, grps_thisLineWords)
                    # Update curBreak for nxt line to check
                    curBreak = curBreak + nxtHead
                if keepChecking:
                    # Line is a header line: check next
                    numHeadLns = numHeadLns+1
                emptyLine = 0
        return numHeadLns

    def getFootLines(self, breakIndices = []):
        """Using list of indices of pagebreaks in document, attempt to determine how many lines
           page footers consist of"""
        numBreaks = (len(breakIndices))
        numFootLns = 0
        emptyLine = 0
        keepChecking = 1
        p_wordSearch = self.getheadFootWordPattern()
        if numBreaks > 2:
            while keepChecking:
                curBreak = 1
                #m_blankLineTest = p_wordSearch.match(self._content[(breakIndices[curBreak]-numFootLns-1)])
                m_blankLineTest = re.compile(u'\S',re.UNICODE).search(self._content[(breakIndices[curBreak] - numFootLns - 1)])
                if m_blankLineTest == None:
                    emptyLine = 1
                grps_headLineWords = p_wordSearch.findall(self._content[(breakIndices[curBreak]-numFootLns-1)])
                curBreak=curBreak + 1
                while (curBreak < numBreaks) and keepChecking:
                    grps_thisLineWords = p_wordSearch.findall(self._content[(breakIndices[curBreak] - numFootLns - 1)])
                    if emptyLine:
                        if len(grps_thisLineWords) != 0:
                            keepChecking = 0
                    else:
                        if (len(grps_thisLineWords) == 0) or (len(grps_headLineWords) != len(grps_thisLineWords)):
                            keepChecking = 0
                        else:
                            keepChecking = self.checkBoundaryLinesSimilar(grps_headLineWords, grps_thisLineWords)
                    curBreak = curBreak + 1
                if keepChecking:
                    numFootLns = numFootLns+1
                emptyLine = 0
        return numFootLns

    def chopHeadFootBreaks(self, breakIndices = [], headLn = 0, footLn = 0):
        """Remove document lines containing breaks, headers, footers"""
        numBreaks = len(breakIndices)
        pageLens = []
        for x in range(0,numBreaks):
            if x < numBreaks - 1:
                pageLens.append(breakIndices[x + 1] - breakIndices[x])
        pageLens.sort()
        if (len(pageLens) > 0) and (headLn+footLn+1 < pageLens[0]):
            # Safe to chop hdrs & ftrs
            breakIndices.reverse()
            first = 1
            for i in range(0, len(breakIndices)):
                # Unless this is the last page break, chop headers
                if not first:
                    for j in range(1,headLn+1):
                        self._content[breakIndices[i]+1:breakIndices[i]+2] = []
                else:
                    first = 0
                # Chop page break itself
                self._content[breakIndices[i]:breakIndices[i]+1] = []
                # Chop footers (unless this is the first page break)
                if i != len(breakIndices) - 1:
                    for k in range(1,footLn + 1):
                        self._content[breakIndices[i] - footLn:breakIndices[i] - footLn + 1] = []

    def checkBoundaryLinesSimilar(self, l_1, l_2):
        """Compare two lists of words to see if their elements are roughly the same.
           Two words match if both are integers, or if they share the same first and last
           letters (case-insensitive). Return True if at least 90% of the word pairs match"""
        numMatches = 0
        if (type(l_1) != list) or (type(l_2) != list) or (len(l_1) != len(l_2)):
            return False
        p_int = re.compile(unicode(r'^(\d+)$'))
        for i in range(0,len(l_1)):
            m_int1 = p_int.match(l_1[i])
            m_int2 = p_int.match(l_2[i])
            if(m_int1 != None) and (m_int2 != None):
                numMatches=numMatches+1
            else:
                l1_str = l_1[i].lower()
                l2_str = l_2[i].lower()
                if (l1_str[0] == l2_str[0]) and (l1_str[len(l1_str) - 1] == l2_str[len(l2_str) - 1]):
                    numMatches=numMatches+1
        if (len(l_1) == 0) or (float(numMatches)/float(len(l_1)) < 0.9):
            return False
        else:
            return True

    def getDocPageBreakPositions(self):
        """Locate page breaks in the list of document lines and return a list of their indices"""
        pageBreaks = []
        p_break = re.compile(unicode(r'^\s*?\f\s*?$'),re.UNICODE)
        for i in range(len(self._content)):
            if p_break.match(self._content[i]) != None:
                pageBreaks.append(i)
        return pageBreaks

    def documentContainsText(self):
        """Test whether document contains text, or is just full of worthless whitespace.
           Return True if it has text, False if not"""
        foundWord = False
        p_word = re.compile(unicode(r'\S+'))
        for i in self._content:
            # search(), not match(): text may be preceded by whitespace on the line
            if p_word.search(i) != None:
                foundWord = True
                break
        return foundWord


class Ps2asciiEncodedTextDocument(Document):
    """Text document that is encoded with PS coordinate information.
       This type of document is the result of a ps2ascii conversion"""
    class Ps2asciiOutputLine:
        """Represents a line from a ps2ascii conversion"""
        def __init__(self, posx, posy, content, diffx):
            """Initialise a dataline's state"""
            self._posnX = self._posnY = 0
            self._dataContent = ''
            self._diff_posnX = 0
            self.setPosX(int(posx))
            self.setPosY(int(posy))
            self.setText(content)
            self.setDiffPosX(int(diffx))

        def setPosX(self, x):
            """Set posnX value for a Ps2asciiOutputLine object"""
            self._posnX = x

        def setPosY(self, y):
            """Set posnY value for a Ps2asciiOutputLine object"""
            self._posnY = y

        def setText(self, data):
            """Set dataContent value for a Ps2asciiOutputLine object"""
            self._dataContent = data

        def setDiffPosX(self, dpx):
            """Set diff_posnX value for a Ps2asciiOutputLine object"""
            self._diff_posnX = dpx

        def getPosX(self):
            """Return the posnX value for a Ps2asciiOutputLine object"""
            return self._posnX

        def getPosY(self):
            """Return the posnY value for a Ps2asciiOutputLine object"""
            return self._posnY

        def getText(self):
            """Return the dataContent value for a Ps2asciiOutputLine object (not cleaned up)"""
            return self._dataContent

        def getDiffPosX(self):
            """Return the diff_posnX value for a Ps2asciiOutputLine object"""
            return self._diff_posnX

        def isNewLine(self, previousLine):
            """Check the positional coordinates of this line with those of the supplied
               Ps2asciiOutputLine object to determine whether this is a new line.
               Return 1 if yes, or 0 if no
            """
            if (self.getPosX() <= previousLine.getPosX()) and (self.getPosY() != previousLine.getPosY()):
                return 1
            else:
                return 0

        def isSpaceSeparated(self, posnxEst):
            """Return 1 if the text in this Ps2asciiOutputLine object should be separated from that
               in a previous Ps2asciiOutputLine object, as determined by an X position estimate
               (posnxEst). Return 0 if not
            """
            if (self.getPosX() > (posnxEst + 7)):
                return 1
            else:
                return 0

    def __init__(self, newDocBody = []):
        Document.__init__(self, newDocBody)

    def convertToPlainText(self):
        """Tell a Ps2asciiEncodedTextDocument to convert itself to pure plaintext.
           Returns TextDocument object"""
        # Converted document:
        plaintextContent = []
        tempLine = ''
        # Fictitious old line to compare with 1st line:
        oldRawLine = self.Ps2asciiOutputLine(9999,9999,"",0)
        posnxEst = 9999
        for line in self._content:
            curRawLine = self.getDataLine(line)
            if curRawLine != None:
                # Find out if this is a new line or a continuation of the last line
                if curRawLine.isNewLine(oldRawLine):
                    # Append previous full line:
                    plaintextContent.append(self.prepareLineForAppending(tempLine))
                    # Start a new line buffer:
                    tempLine = curRawLine.getText()
                else:
                    # Not new line: concat with last line
                    if curRawLine.isSpaceSeparated(posnxEst):
                        tempLine = tempLine+' '+curRawLine.getText()
                    else:
                        tempLine = tempLine+curRawLine.getText()
                posnxEst = (curRawLine.getPosX() + curRawLine.getDiffPosX())
                oldRawLine = curRawLine
        # Append very last line to list:
        plaintextContent.append(self.prepareLineForAppending(tempLine))
        # Remove first, empty cell from list:
        plaintextContent[0:1] = []
        # Make a TextDocument with the newly converted text content and return it:
        return TextDocument(plaintextContent)

    def getDataLine(self, rawLine):
        """Take a raw line from ps2ascii, and put its components into a Ps2asciiOutputLine object"""
        idPattern = re.compile(r'^S\s(?P<posnX>\d+)\s(?P<posnY>\d+)\s\((?P<content>.*)\)\s(?P<diff_posnX>\d+)$')
        match = idPattern.search(rawLine)
        if match != None:
            return self.Ps2asciiOutputLine(match.group('posnX'), match.group('posnY'), match.group('content'), match.group('diff_posnX'))
        else:
            return None

    def prepareLineForAppending(self, line):
        """Prepare the contents of a plaintext line which has been rebuilt from Ps2asciiOutputLine(s)
           to be appended to the list of plaintext lines which make up the plaintext document.
           Test its contents: if all whitespace, but not formfeed, return an empty line; if it
           contains non-whitespace or a formfeed, return the line as is
        """
        # Clean line to append of control codes:
        line = self.cleanLine(line)
        ep = re.compile('\S')
        # search(), not match(): non-whitespace may follow leading spaces
        em = ep.search(line)
        if em == None:
            fp = re.compile('^ *\f *$')
            fm = fp.match(line)
            if fm == None:
                line = ''
        return line

    def cleanLine(self, line):
        """Clean a line of text of the messy character codes that ps2ascii adds during conversion"""
        # Correct escaped parentheses
        p = re.compile(r'\\\(')
        line = p.sub('(', line)
        p = re.compile(r'\\\)')
        line = p.sub(r')', line)
        # Correct special symbols: remove escaped backslashes
        p = re.compile(r'\\\\')
        line = p.sub('', line)
        p = re.compile('\n')
        line = p.sub(r' ', line)
        # Change '\013' to 'ff' (ps2ascii messes this up)
        p = re.compile(r'\\013')
        line = p.sub('ff', line)
        # Change '\017' (bullet point) into '*'
        p = re.compile(r'\\017')
        line = p.sub('*', line)
        # Remove '\003'
        p = re.compile(r'\\003')
        line = p.sub('', line)
        # Change '\f' to 'fi' (ps2ascii messes this up)
        p = re.compile(r'\\f')
        line = p.sub('fi', line)
        # Remove page numbers:
        p = re.compile('\{\s\d+\s\{')
        line = p.sub(r'', line)
        # Correct Hyphens:
        p = re.compile('\{')
        line = p.sub('-', line)
        return line

    def displayDocument(self):
        """Let Ps2asciiEncodedTextDocument display itself on standard output stream"""
        for i in self._content:
            print i


class ReferenceSectionStartPoint:
    """Concrete class to hold information about the start line of a document's reference section
       (e.g. line number, title, etc)"""
    def __init__(self):
        self._lineNum = self._title = self._lineMarkerPresent = None
        self._haveMarkerRegex = self._markerChar = self._markerRegexPattern = self._markerTitleSameLine=None

    def setLineNum(self, num):
        """Set the line number of the references section start"""
        self._lineNum = num

    def setTitleString(self, t):
        """Set the title string for the references section start"""
        self._title = t

    def setMarkerChar(self, m):
        """Set the marker char for the references section start"""
        if m is not None and (type(m) is str or type(m) is unicode):
            self._markerChar = m
            self._lineMarkerPresent = 1
        else:
            self._markerChar = None
            self._lineMarkerPresent = 0

    def setMarkerPattern(self, p):
        """Set the regex pattern for the start of the first reference line"""
        if p is not None and (type(p) is str or type(p) is unicode):
            self._markerRegexPattern = p
            self._haveMarkerRegex = 1
        else:
            self._markerRegexPattern = None
            self._haveMarkerRegex = 0

    def setTitleMarkerSameLine(self):
        """Set a flag to say that the first reference line contains both the reference section
           title and the first line numeration marker"""
        self._markerTitleSameLine = 1

    def getLineNum(self):
        """Return the line number of the references section start"""
        return self._lineNum

    def getTitleString(self):
        """Return the title string for the references section start if there is one, else None"""
        return self._title

    def firstLineIsTitleAndMarker(self):
        """Return True if the first reference line contains both the reference section title &
           the first line numeration marker, False if not"""
        if self._markerTitleSameLine is not None:
            return True
        else:
            return False

    def titlePresent(self):
        """Return True if there is a reference section title, False if not"""
        if self._title is not None:
            return True
        else:
            return False

    def markerCharPresent(self):
        """Return True if there is a marker char, False if not"""
        if self._lineMarkerPresent:
            return True
        else:
            return False

    def markerPatternPresent(self):
        """Return True if there is a marker regex pattern, False if not"""
        if self._haveMarkerRegex:
            return True
        else:
            return False

    def getMarkerChar(self):
        """Return the marker char for the reference section start if there is one, else None"""
        return self._markerChar

    def getMarkerPattern(self):
        """Return the regex pattern for the reference line marker if there is one, else None"""
        return self._markerRegexPattern


class ReferenceSectionRebuilder:
    """Concrete class whose job is to rebuild broken reference lines.
       Contains a list of Strings.
       Each String in this list represents the contents of either a complete reference line or
       part of a reference line.
       When a document is converted from its original format to plaintext, lines are often broken
       because the converter can't distinguish between wrapped lines and new lines.
       Objects of this class can be used to try to rebuild broken reference lines and create a
       'ReferenceSection' object
    """
    def __init__(self, lines = []):
        """Initialise a ReferenceSectionRebuilder object with a list of 'broken' reference lines"""
        if type(lines) is list:
            # Copy, so that the shared default [] is never modified
            self._dataLines = list(lines)
        elif type(lines) is str or type(lines) is unicode:
            self._dataLines = [lines]
        else:
            self._dataLines = []

    def getRebuiltLines(self, refStartInfo):
        """Trigger reference lines rebuilding process & return ReferenceSection object containing
           rebuilt ReferenceLine objects"""
        # Ensure we have a real 'ReferenceSectionStartPoint'
        try:
            getLineNum = refStartInfo.getLineNum
        except AttributeError:
            return ReferenceSection()
        self._removeLeadingGarbageLines()
        numatnInfo = self._getLineNumerationStyle(refStartInfo)
        return ReferenceSection(self._rebuild(numatnInfo))

    def _testBlankLineRefSeparators(self):
        """Test to see if reference lines are separated by blank lines so that these can be used
           to rebuild reference lines"""
        p_ws = re.compile(unicode(r'^\s*$'),re.UNICODE)
        numblank = 0 # Number of runs of blank lines found between non-blank lines
        numline = 0 # Number of runs of non-blank lines (candidate ref lines)
        blankLnSep = 0 # Flag to indicate if blank lines separate ref lines
        multi_nonblanks_fd = 0 # Flag set if several non-blank lines are found together (needed because
                               # if the text is double-spaced, blank lines don't separate refs & can't be relied upon)
        x = 0
        numLines = len(self._dataLines)
        while x < numLines:
            m_ws = p_ws.search(self._dataLines[x])
            if m_ws is None:
                # Non-empty line
                numline = numline+1
                x = x + 1 # Move past line
                while x < len(self._dataLines) and p_ws.search(self._dataLines[x]) is None:
                    multi_nonblanks_fd=1
                    x = x + 1
                x = x - 1
            else:
                # Empty line
                numblank = numblank + 1
                x = x + 1
                while x< len(self._dataLines) and p_ws.search(self._dataLines[x]) is not None:
                    x = x + 1
                if x == len(self._dataLines):
                    # Blanks at end of doc: don't count them
                    numblank = numblank-1
                x = x - 1
            x = x + 1
        # Now, from the numbers of blank-line runs & text-line runs: if numline > 3, and
        # numblank == numline or numblank == numline-1, then we have blank line separators between ref lines
        if (numline > 3) and ((numblank == numline) or (numblank == numline - 1)) and (multi_nonblanks_fd):
            blankLnSep = 1
        return blankLnSep

    def _rebuild(self, refNum):
        """Based on whether a reference line numeration pattern was found, either have the reference
           lines rebuilt by identifying their marker characters (or blank-line separators), or join
           all lines together if no numeration was found
        """
        # Private internal function
        def cleanAndAppendToRefsList(transformers, refList, line):
            """Before appending to list, process line with 'TextLineTransformers'"""
            for x in transformers:
                line = x.processLine(line)
            sp = re.compile(unicode(r'^\s*$'),re.UNICODE)
            if sp.match(line) is None:
                refList.append(line)

        rebuilt = []
        lineTrans = []
        tl = u''
        # List of line transformers to clean up line:
        lineTrans.append(URLRepairer())
        lineTrans.append(EscapeSequenceTransformer())
        lineTrans.append(MultispaceTruncator())
        if refNum is None or (type(refNum) is not str and type(refNum) is not unicode):
            if self._testBlankLineRefSeparators():
                # Use blank lines to separate ref lines
                refNum = unicode(r'^\s*$')
            else:
                # No ref line dividers: unmatchable pattern
                refNum = unicode(r'^A$^A$$')
        p_refNum = re.compile(refNum,re.I|re.UNICODE)
        p_leadingws = re.compile(unicode(r'^\s+'))
        p_trailingws = re.compile(unicode(r'\s+$'))
        for x in range(len(self._dataLines)-1,-1,-1):
            tstr = p_leadingws.sub(u'',self._dataLines[x])
            tstr = p_trailingws.sub(u'',tstr)
            m = p_refNum.match(tstr)
            if m is not None:
                # Ref line start marker
                if tstr == '':
                    # Blank line to separate refs
                    tl = p_trailingws.sub(u'',tl)
                    cleanAndAppendToRefsList(lineTrans, rebuilt, tl)
                    tl = u''
                else:
                    if tstr[len(tstr)-1] == u'-' or tstr[len(tstr)-1] == u' ':
                        tl = tstr + tl
                    else:
                        tl = tstr + u' ' + tl
                    tl = p_trailingws.sub(u'',tl)
                    cleanAndAppendToRefsList(lineTrans, rebuilt, tl)
                    tl = u''
            else:
                if tstr != u'':
                    # Continuation of line
                    if tstr[len(tstr) - 1] == u'-' or tstr[len(tstr) - 1] == u' ':
                        tl = tstr + tl
                    else:
                        tl = tstr + u' ' + tl
        if tl != u'':
            # Append last line
            tl = p_trailingws.sub(u'',tl)
            cleanAndAppendToRefsList(lineTrans, rebuilt, tl)
        rebuilt.reverse()
        d=self._testAndCorrectRebuiltLines(rebuilt, p_refNum)
        if d is not None:
            rebuilt = d
        return rebuilt

    def _testAndCorrectRebuiltLines(self, rebuiltlines, p_refmarker):
        """EXPERIMENTAL METHOD. Try to correct any rebuilt reference lines that have been given a bad reference number at the start. Needs testing."""
        fixed = []
        unsafe = False
        try:
            m = p_refmarker.match(rebuiltlines[0])
            last_marknum = int(m.group("marknum"))
            if last_marknum != 1:
                return None # Even the first mark isn't 1 - probably too dangerous to try to repair
        except IndexError:
            return None # Either no references or not a "numbered line marker" - cannot do anything
        except AttributeError:
            return None # No reference line marker (i.e. NoneType because couldn't match marker) - cannot do anything
        fixed.append(rebuiltlines[0])
        try:
            for x in range(1,len(rebuiltlines)):
                m = p_refmarker.match(rebuiltlines[x])
                try:
                    if int(m.group("marknum")) == last_marknum + 1:
                        # All is well
                        fixed.append(rebuiltlines[x])
                        last_marknum += 1
                        continue
                    elif len(string.strip(rebuiltlines[x][m.end():])) == 0:
                        # This line consists of a number only, and it is not a correct marker.
                        # Add it to the last line:
                        fixed[len(fixed) - 1] += rebuiltlines[x]
                        continue
                    else:
                        # Possible problem: this line may have taken in part of the previous line. Can we find the next marker in this line?
                        m_fix = p_refmarker.search(rebuiltlines[x])
                        if m_fix is not None and int(m_fix.group("marknum")) == last_marknum + 1:
                            # Only repair if the marker found is followed by the capitalised start of a reference
                            m_fix_test = re.match(re.escape(m_fix.group()) + unicode(r'\s*[A-Z]'), rebuiltlines[x][m_fix.start():], re.UNICODE)
                            if m_fix_test is not None:
                                movesect = rebuiltlines[x][0:m_fix.start()]
                                rebuiltlines[x] = rebuiltlines[x][m_fix.start():]
                                fixed[len(fixed) - 1] += movesect
                                fixed.append(rebuiltlines[x])
                                last_marknum += 1
                            else:
                                unsafe = True
                                break
                        else:
                            unsafe = True
                            break
                except AttributeError:
                    # This line does not have a line marker at the start! This line shall be added to the end of the previous line.
                    fixed[len(fixed) - 1] += rebuiltlines[x]
                    continue
        except IndexError:
            unsafe = True
        if unsafe:
            return None
        else:
            return fixed

    def _getLineNumerationStyle(self, refStartInfo):
        """Try to determine the numeration marker style for the reference lines"""
        mkregex = None
        if refStartInfo.markerPatternPresent():
            mkregex = refStartInfo.getMarkerPattern()
        return mkregex

    def _removeLeadingGarbageLines(self):
        """Sometimes, the first lines of the extracted references are completely blank or email addresses.
        These must be removed as they are not references"""
        p_emptyline = re.compile(unicode(r'^\s*$'),re.UNICODE)
        p_email = re.compile(unicode(r'^\s*e\-?mail'),re.UNICODE)
        while (len(self._dataLines)>0) and (p_emptyline.match(self._dataLines[0]) is not None or p_email.match(self._dataLines[0]) is not None):
            self._dataLines[0:1] = []


class DocumentConverter:
    """Abstract Class representing a document format conversion tool which converts a document from one format to another
    """
    def convertDocument(self, toConvert):
        """Document Conversion Method - returns a Document object"""
        pass

    def checkConvertFile(self, filePath):
        """Check that the file to convert is usable"""
        pass


class OSDependentDocumentConverter(DocumentConverter):
    """ABSTRACT CLASS: Represents a document conversion tool which is a separate program, executed via a call to the underlying shell
    """
    def __init__(self):
        self._converterSessionLink = self._convertCommand = ''

    def setConvertCommand(self, filePath):
        """ABSTRACT METHOD: Set the shell command used for calling the converter application.
        Declared abstract because it differs according to which specific application is used
        """
        pass

    def getConvertCommand(self):
        """Return the shell command by which the conversion application is called"""
        return self._convertCommand

    def openConverterSession(self):
        """Open a session with the shell 'converter' application"""
        # Close any session left open (os.popen returns a 'file' object):
        if isinstance(self._converterSessionLink, file):
            self._converterSessionLink.close()
            self._converterSessionLink = ""
        self._converterSessionLink = os.popen(self.getConvertCommand(),'r')

    def closeConverterSession(self):
        """Close session with the shell 'converter' application"""
        if isinstance(self._converterSessionLink, file):
            self._converterSessionLink.close()
            self._converterSessionLink = ""

    def getConversionResult(self):
        """Return list of lines from shell conversion session"""
        return self._converterSessionLink.readlines()


class PDFtoTextDocumentConverter(OSDependentDocumentConverter):
    """Converts PDF documents to plain-text (UTF-8) documents"""
    def __init__(self):
        """Initialise PDFtoTextDocumentConverter object"""
        OSDependentDocumentConverter.__init__(self)
        self._applicationPath = ''
        self.setApplicationPath(cfg_refextract_pdftotext)

    def setApplicationPath(self, newPath):
        """Set path to conversion application"""
        self._applicationPath = newPath

    def getApplicationPath(self):
        """Return the path to the conversion application"""
        return self._applicationPath

    def setConvertCommand(self, filePath):
        """Set up the command by which to call pdftotext application"""
        self._convertCommand = self.getApplicationPath() + ' -raw -q -enc UTF-8 ' + filePath + ' -'

    def getConversionResult(self):
        """Return the list of lines output by pdftotext, decoded from UTF-8"""
        mylines = []
        for line in self._converterSessionLink:
            mylines.append(line.decode("utf-8"))
        return mylines

    def convertDocument(self, toConvert):
        """Perform a conversion from PDF to text, returning the document contents as a TextDocument object"""
        if self._canAccessConvertFile(toConvert):
            self.setConvertCommand(toConvert)
            self.openConverterSession()
            convRes = self.getConversionResult()
            self.closeConverterSession()
            if self._conversionIsBad(convRes):
                # Bad conversion: empty document
                textDoc = TextDocument()
            else:
                # Good conversion
                textDoc = TextDocument(convRes)
        else:
            textDoc = TextDocument()
        return textDoc

    def _conversionIsBad(self, convertedLines):
        """Sometimes pdftotext performs a bad conversion which consists of many spaces and garbage characters.
        This method takes a list of strings obtained from a pdftotext conversion and examines them to see if they are likely to be the result of a bad conversion:
        the conversion is considered bad if it contains at least 3 times as many whitespace characters as non-whitespace 'words'.
        Returns True if bad conversion, False if not
        """
        # Numbers of 'words' and 'whitespaces' found in document:
        numWords = numSpaces = 0
        # whitespace line pattern:
        ws_patt = re.compile(unicode(r'^\s+$'),re.UNICODE)
        # whitespace character pattern:
        p_space = re.compile(unicode(r'(\s)'),re.UNICODE)
        # non-whitespace 'word' pattern:
        p_noSpace = re.compile(unicode(r'(\S+)'),re.UNICODE)
        for line in convertedLines:
            numWords = numWords + len(p_noSpace.findall(line))
            numSpaces = numSpaces + len(p_space.findall(line))
        if numSpaces >= (numWords * 3):
            # Too many spaces - probably bad conversion
            return True
        else:
            return False

    def _canAccessConvertFile(self, filePath):
        """Check that the path to the file to convert really exists and is readable by the shell"""
        if os.access(filePath, os.R_OK):
            return True
        else:
            return False


class PS2AsciiDocumentConverter(OSDependentDocumentConverter):
    """Converts PS documents to ASCII plaintext documents"""
    def __init__(self):
        """Initialise PS2AsciiDocumentConverter object"""
        OSDependentDocumentConverter.__init__(self)
        self._catAppPath = self._gunzipAppPath = self._gsAppPath = ''
        self.setCATapplicationPath(cfg_refextract_cat)
        self.setGUNZIPapplicationPath(cfg_refextract_gunzip)
        self.setGSapplicationPath(cfg_refextract_gs)

    def setCATapplicationPath(self, catAppPath):
        """Set the path to the 'cat' application, used in conversion"""
        self._catAppPath = catAppPath

    def setGUNZIPapplicationPath(self, gunzipAppPath):
        """Set the path to the 'gunzip'
        application, used in conversion if the PS file has been zipped"""
        self._gunzipAppPath = gunzipAppPath

    def setGSapplicationPath(self, gsAppPath):
        """Set the path to the 'GhostScript' application, which is the means of calling 'ps2ascii'"""
        self._gsAppPath = gsAppPath

    def getCATapplicationPath(self):
        """Return the path to 'cat' as a string"""
        return self._catAppPath

    def getGUNZIPapplicationPath(self):
        """Return the path to 'gunzip' as a string"""
        return self._gunzipAppPath

    def getGSapplicationPath(self):
        """Return the path to 'gs' as a string"""
        return self._gsAppPath

    def setUnzippedPSConvertCommand(self, filePath):
        """Set converter command for unzipped PS file conversion"""
        self._convertCommand = self.getCATapplicationPath() + " " + filePath + " | " + self.getGSapplicationPath() + " -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT -c save -f ps2ascii.ps - -c quit"

    def setZippedPSConvertCommand(self, filePath):
        """Set converter command for zipped PS file conversion"""
        self._convertCommand = self.getGUNZIPapplicationPath() + " -c " + filePath + " | " + self.getGSapplicationPath() + " -q -dNODISPLAY -dNOBIND -dWRITESYSTEMDICT -c save -f ps2ascii.ps - -c quit"

    def setConvertCommand(self, filePath):
        """Set up the shell command by which to call applications needed to perform the conversion"""
        if re.search(r'(\w{2})$', filePath).group(0) == "ps":
            self.setUnzippedPSConvertCommand(filePath)
        else:
            self.setZippedPSConvertCommand(filePath)

    def _canAccessConvertFile(self, filePath):
        """Check that the path to the file to convert really exists and is readable by the shell"""
        if os.access(filePath, os.R_OK):
            return True
        else:
            return False

    def _correctConvertFileName(self, filename):
        """Strip file extension from filename & replace with '.ps' or '.ps.gz' depending on which exists.
        If the '.ps' file cannot be read, '.ps.gz' is used (its readability is checked later, in convertDocument)
        """
        regexPattern = re.compile(r'(?P<fname>.*?)(\.\w+)?$')
        match = regexPattern.search(filename)
        name = match.group('fname')
        if self._canAccessConvertFile(name+'.ps'):
            name = name + '.ps'
        else:
            name = name + '.ps.gz'
        return name

    def convertDocument(self, toConvert):
        """This method performs a conversion from PS to text. If the file 'toConvert' exists and can be converted, a TextDocument object is returned. If not, then an empty TextDocument is returned"""
        toConvert = self._correctConvertFileName(toConvert)
        if self._canAccessConvertFile(toConvert):
            self.setConvertCommand(toConvert)
            self.openConverterSession()
            ps2asciiDoc = Ps2asciiEncodedTextDocument(self.getConversionResult())
            # Convert the ps2asciiDoc to plaintext:
            textDoc = ps2asciiDoc.convertToPlainText()
            self.closeConverterSession()
        else:
            textDoc = TextDocument()
        return textDoc


class BadKBLineError(Exception):
    """Exception thrown if a line in the periodicals knowledge base does not comply with the expected format"""
    pass


class KnowledgeBase:
    """The knowledge base of periodical titles. Consists of search & replace terms.
    The search terms consist of non-standard periodical titles in upper case. These are often found in the text of documents.
    Replacement terms consist of standardised periodical titles in a standardised case.
    These will be used to replace identified non-standard titles
    """
    def __init__(self, fn = None):
        self._kb = {}
        self._compiledPatternsKB = {}
        self._unstandardisedTitle = {}
        if type(fn) is str:
            self._buildKB(fn)

    def _buildKB(self, fn):
        """From the filename provided (fn), read the periodicals knowledge base into memory, and build a dictionary of seek/replace values to be stored in self._kb"""
        def _mychop(line):
            # Strip a single trailing newline, if present
            if line[-1:] == u'\n':
                line = line[:-1]
            return line
        try:
            fh=open(fn, 'r')
            p_kbLine = re.compile(unicode('^\s*(?P<seek>\w.*?)\s*---\s*(?P<repl>\w.*?)\s*$'),re.UNICODE)
            for x in fh:
                y = x.decode("utf-8")
                y = _mychop(y)
                m_kbLine = p_kbLine.search(y)
                if m_kbLine is None:
                    raise BadKBLineError()
                if len(m_kbLine.group('seek')) > 1:
                    # Only add KB line if the search term is more than 1 char in length
                    self._kb[m_kbLine.group('seek')] = m_kbLine.group('repl')
                    tmp_ptn = re.compile(unicode(r'\b(') + re.escape(m_kbLine.group('seek')) + unicode(r')[^A-Z0-9]'), re.UNICODE)
                    self._compiledPatternsKB[tmp_ptn] = m_kbLine.group('repl')
                    self._unstandardisedTitle[tmp_ptn] = m_kbLine.group('seek')
            fh.close()
        except IOError:
            sys.exit('E: Cannot Open Knowledge Base File "%s".' % fn)
        except (BadKBLineError, AttributeError):
            sys.exit('E: Unexpected Line in Knowledge Base "%s".' % fn)

    def display(self):
        """Display the contents of the KB on the standard output stream"""
        print u"Knowledge Base Contents:"
        for x in self._kb.keys():
            sys.stdout.write("Search Term: '%s';\t\tReplace Term: '%s'\n" % (x.encode("utf-8"), (self._kb[x]).encode("utf-8")))

    def findPeriodicalTitles(self, ln):
        """Identify periodical titles in text line 'ln' and record information about where in the line they occur.
        Replace each matched title with a lower-case version of itself, or with underscores ('_') if it contains characters other than letters, spaces and hyphens.
        Return a tuple: (dictionary of match lengths keyed by match position, dictionary of matched non-standard titles keyed by match position, the new line, a flag that is True if any title was found)
        """
        def _bytitlelen(a, b):
            (aa,bb) = (self._unstandardisedTitle[a],self._unstandardisedTitle[b])
            if len(aa) < len(bb):
                return 1
            elif len(aa) == len(bb):
                return 0
            else:
                return -1

        def _byLen(a, b):
            (aa,bb) = (a.pattern,b.pattern)
            if len(aa) < len(bb):
                return 1
            elif len(aa) == len(bb):
                return 0
            else:
                return -1

        foundMatch = False
        title_match_len = {}
        title_match_txt = {}
        kb_keys = self._compiledPatternsKB.keys()
        kb_keys.sort(_bytitlelen)
        word_ptn = re.compile(unicode(r'^[ A-Z-a-z]+$'),re.UNICODE)
        for t_ptn in kb_keys:
            matches_iter = t_ptn.finditer(ln)
            # Record details of each match:
            for m in matches_iter:
                # Record match info
                title_match_len[m.start()] = (len(m.group(0)) - 1)
                title_match_txt[m.start()] = self._unstandardisedTitle[t_ptn]
                # Replace matched txt in line with lowercase version (or n*'_' where n is len of match)
                rep_str = m.group(1)
                word_mtch = word_ptn.search(rep_str)
                if word_mtch is None:
                    # Non-alpha/whitespace chars
                    rep_str = u'_'*len(rep_str)
                else:
                    # Words
                    rep_str = rep_str.lower()
                ln = u''.join([ln[0:m.start(1)],rep_str,ln[m.end(1):]])
        if len(title_match_len) > 0:
            foundMatch = True
        return (title_match_len, title_match_txt, ln, foundMatch)

    def __getitem__(self, non_std_title):
        """Return the standardised title thought to be keyed by 'non_std_title'.
        Return None if not there"""
        try:
            return self._kb[non_std_title]
        except KeyError:
            return None


class PreprintClassificationItem:
    def __init__(self, srch = '', repl = ''):
        self._srchStr, self._rpStr = srch, repl

    def setSearchString(self, sstr):
        self._srchStr = sstr

    def setReplString(self, repstr):
        self._rpStr = repstr

    def getSearchString(self):
        return self._srchStr

    def getReplString(self):
        return self._rpStr

    def getLength(self):
        return len(self._srchStr)

    r_str = property(fget = getReplString, fset = setReplString)
    s_str = property(fget = getSearchString, fset = setSearchString)
    length = property(fget = getLength)
    del setSearchString, setReplString, getSearchString, getReplString
    del getLength


class Institute:
    def __init__(self, nm):
        self._name = nm
        self._preprintCatsList = []
        self._numerationList = []
        self._numerationRegex = ""
        self._preprintCatPatternsList = {}

    def setName(self, nm):
        self._name = nm

    def getName(self):
        return self._name

    def display(self):
        print u"----------------------"
        print u"Name: " + self._name.encode("utf-8")
        print u"Preprint Categories:"
        for x in self._preprintCatsList:
            print u"Search:", x.s_str.encode("utf-8"), u"Replace With:", x.r_str.encode("utf-8")
        print u"Numeration Styles List:"
        for x in self._numerationList:
            print x
        print u"Numeration Styles Regular expression List:"
        print self._numerationRegex
        print u"----------------------"

    def _getPatternLenList(self):
        """Make a copy of the list of numeration patterns for an Institute object.
        Return a new list of (length, pattern) tuples, where each user-defined character class ('[...]') counts as one character in the length"""
        nl = []
        ccp = re.compile(unicode(r'\[[^\]]+\]'),re.UNICODE)
        for x in self._numerationList:
            # Replace each character class with a single char & append (length, pattern) to the new list
            nx = ccp.sub(u'1', x)
            nl.append((len(nx),x))
        return nl

    def _createPattern(self, ptn):
        """Accept a user-defined search pattern, transform it according to some simple rules into a regex pattern, and return that pattern as a string:
             \ -> \\
             9 -> \d
             a -> [A-Za-z]
             mm -> (0[1-9]|1[0-2])
             yy -> \d{2}
             yyyy -> [12]\d{3}
             / -> \/
        """
        # Make the search/replace patterns:
        s_r = []
        s_r.append((re.compile(unicode(r'([^\]A-Za-z0-9\/\[ "])'),re.UNICODE), unicode(r'\\\g<1>')))
        s_r.append((re.compile(u'9',re.UNICODE), unicode(r'\d')))
        s_r.append((re.compile(u'a',re.UNICODE), unicode(r'[A-Za-z]')))
        s_r.append((re.compile(u'mm',re.UNICODE), unicode(r'(0[1-9]|1[0-2])')))
        s_r.append((re.compile(u'yyyy',re.UNICODE), unicode(r'[12]\d\d\d')))
        s_r.append((re.compile(u'yy',re.UNICODE), unicode(r'\d\d')))
        s_r.append((re.compile(unicode(r'\/'),re.UNICODE), unicode(r'\/')))
        s_r.append((re.compile(unicode(r'\"([^"]+)\"'),re.UNICODE), unicode(r'\g<1>')))
        s_r.append((re.compile(unicode(r' \[([^\]]+) \]'),re.UNICODE), unicode(r'( [\g<1>])?')))
        for x in s_r:
            ptn = x[0].sub(x[1], ptn)
        return ptn

    def _makeOrderedPtns(self, ptns):
        """Using the list ordered by lengths, produce a single regex alternation of the numeration patterns, wrapped in the named group 'numn'"""
        p_list = u""
        if len(ptns) > 0:
            p_list = u"(?P<numn>"
            for i in ptns:
                p_list += self._createPattern(i[1]) + u"|"
            p_list = p_list[0:len(p_list)-1]
            p_list += u")"
        return p_list

    def assignNumerationRegex(self):
        """Build the combined regex pattern for this institute's numeration styles"""
        def _my_cmpfunc(a,b):
            if a[0] < b[0]:
                return 1
            elif a[0] == b[0]:
                return 0
            else:
                return -1
        # Get the (length, pattern) list, longest patterns first:
        lenPtns = self._getPatternLenList()
        lenPtns.sort(_my_cmpfunc)
        # Set own regex pattern:
        self._numerationRegex = self._makeOrderedPtns(lenPtns)

    ##
    def _makeOrderedPtnsList(self, ptns):
        """Using the list ordered by lengths, produce a list of regex patterns, each wrapped in the named group 'numn'"""
        p_list = []
        if len(ptns) > 0:
            for p in ptns:
                p_itm = u"(?P<numn>"+self._createPattern(p[1])+u")"
                p_list.append(p_itm)
        return p_list

    def assignNumerationRegexList(self):
        """Build the list of regex patterns for this institute's numeration styles"""
        def _my_cmpfunc(a,b):
            if a[0] < b[0]:
                return 1
            elif a[0] == b[0]:
                return 0
            else:
                return -1
        # Get the (length, pattern) list, longest patterns first:
        lenPtns = self._getPatternLenList()
        lenPtns.sort(_my_cmpfunc)
        # Set own list of regex patterns:
        self._numerationRegexList = self._makeOrderedPtnsList(lenPtns)

    def createTestPatternsList(self):
        """For each preprint category (longest first), compile one regex per numeration style and store them in self._preprintCatPatternsList"""
        def _my_cmpfunc(a,b):
            if a.length < b.length:
                return 1
            elif a.length == b.length:
                return 0
            else:
                return -1
        self.assignNumerationRegexList()
        self._preprintCatsList.sort(_my_cmpfunc)
        preprintCatPatternsList = {}
        for categ in self._preprintCatsList:
            categptnslist = []
            for num_ptn in self._numerationRegexList:
                categptnslist.append(re.compile(unicode(r'\b((?P<categ>') + categ.s_str + u')' + num_ptn + r')',re.UNICODE))
            preprintCatPatternsList[categ] = categptnslist
        self._preprintCatPatternsList = preprintCatPatternsList

    def matchCategs2(self, ln):
        """Accept a line.
        Try to find matches for each of the preprint categories of this institute within that line"""
        def _my_cmpfunc(a,b):
            if a.length < b.length:
                return 1
            elif a.length == b.length:
                return 0
            else:
                return -1
        inst_full_len = {}
        inst_RN_rep_str = {}
        self._preprintCatsList.sort(_my_cmpfunc)
        for categ in self._preprintCatsList:
            for ptn in self._preprintCatPatternsList[categ]:
                # Search for this categ in line:
                matches_iter = ptn.finditer(ln)
                for x in matches_iter:
                    # Get hyphenated numeration segment of category:
                    numnMatch = x.group('numn')
                    numnMatch = re.sub(r'\s', '-', numnMatch)
                    # Replace found categ in string with lowercase version:
                    foundCateg = x.group('categ')
                    foundCateg = foundCateg.lower()
                    ln = ln[0:x.start()] + foundCateg + ln[x.end('categ'):]
                    inst_full_len[x.start()] = len(x.group(0))
                    inst_RN_rep_str[x.start()] = categ.r_str + numnMatch
        return (inst_full_len, inst_RN_rep_str, ln)

    ##
    def matchCategs(self, ln):
        """Accept a line. Try to find matches for each of the preprint categories of this institute within that line"""
        def _my_cmpfunc(a,b):
            if a.length < b.length:
                return 1
            elif a.length == b.length:
                return 0
            else:
                return -1
        inst_full_len = {}
        inst_RN_rep_str = {}
        self._preprintCatsList.sort(_my_cmpfunc)
        for categ in self._preprintCatsList:
            # Search for this categ in line:
            # Make the regex:
            my_ptn = re.compile(unicode(r'\b((?P<categ>') + categ.s_str + u')' + self._numerationRegex + r')',re.UNICODE)
            # Perform the search:
            matches_iter = my_ptn.finditer(ln)
            # For each match, record its position, etc and replace it with lower-case version
            for x in matches_iter:
                # Get hyphenated numeration segment of category:
                numnMatch = x.group('numn')
                numnMatch = re.sub(r'\s', '-', numnMatch)
                # Replace found categ in string with lowercase version:
                foundCateg = x.group('categ')
                foundCateg = foundCateg.lower()
                ln = ln[0:x.start()] + foundCateg + ln[x.end('categ'):]
                inst_full_len[x.start()] = len(x.group(0))
                inst_RN_rep_str[x.start()] = categ.r_str + numnMatch
        return (inst_full_len,
                inst_RN_rep_str, ln)

    def addCategory(self, k, v):
        self._preprintCatsList.append(PreprintClassificationItem(k,v))

    def addNumerationStyle(self, num):
        self._numerationList.append(num)

    name = property(fget = getName, fset = setName)
    del setName, getName


class InstituteList:
    def __init__(self, fn = ''):
        self._iList = self._getInstituteList(fn)
        self._buildInstNumtnRegexs()

    def _buildInstNumtnRegexs(self):
        for i in self._iList:
            i.createTestPatternsList()

    def display(self):
        for x in self._iList:
            x.display()

    def _getInstituteList(self, fn):
        """Read the list of institutes in from the file and return an institute list. Terminates execution if the file cannot be read"""
        try:
            fh = open(fn, 'r')
            iList = []
            p_instName = re.compile(unicode(r'^\#{5}\s*(.+)\s*\#{5}$'),re.UNICODE)
            p_prepClass = re.compile(unicode(r'^\s*(\w.*?)\s*---\s*(\w.*?)\s*$'),re.UNICODE)
            p_numtn = re.compile(unicode(r'^\<(.+)\>$'),re.UNICODE)
            for x in fh:
                y = x.decode("utf-8")
                m_instName = p_instName.search(y)
                m_prepClass = p_prepClass.search(y)
                m_numtn = p_numtn.search(y)
                if m_instName is not None:
                    curInst = Institute(m_instName.group(1))
                    iList.append(curInst)
                elif m_prepClass is not None:
                    try:
                        curInst.addCategory(m_prepClass.group(1), m_prepClass.group(2))
                    except (AttributeError, NameError):
                        # No institute name seen yet (curInst unbound): ignore the line
                        pass
                elif m_numtn is not None:
                    try:
                        curInst.addNumerationStyle(m_numtn.group(1))
                    except (AttributeError, NameError):
                        # No institute name seen yet (curInst unbound): ignore the line
                        pass
            fh.close()
            return iList
        except IOError:
            import sys
            sys.exit('E: Cannot Open Institutes File "%s".'
                     % fn)

    def identifyPreprintReferences(self, ln):
        """Accept a line of text (String) and search it against the institute records held, in order to identify references to an institute's preprints"""
        foundMatch = False
        identified_pp_len = {}
        identified_pp_repStr = {}
        for inst in self._iList:
            #(tmp_id_lens, tmp_id_repStrs, ln) = inst.matchCategs(ln)
            (tmp_id_lens, tmp_id_repStrs, ln) = inst.matchCategs2(ln)
            identified_pp_len.update(tmp_id_lens)
            identified_pp_repStr.update(tmp_id_repStrs)
        if len(identified_pp_len) > 0:
            foundMatch = True
        return (identified_pp_len, identified_pp_repStr, ln, foundMatch)


class LineIBIDidentifier:
    """Class to identify and record information about IBID occurrences in a text line"""
    def __init__(self):
        """Initialise regex patterns used to identify an IBID item"""
        self._p_ibid = re.compile(unicode(r'(-|\b)(IBID\.?( ([A-H]|(I{1,3}V?|VI{0,3})|[1-3]))?)\s?:'),re.UNICODE)
        self._pIbidPresent = re.compile(unicode(r'IBID\.?\s?([A-H]|(I{1,3}V?|VI{0,3})|[1-3])?'),re.UNICODE)

    def lineHasIbid(self, ln):
        """Return True if line 'ln' contains an IBID, False if not"""
        m_ibidPresent = self._pIbidPresent.search(ln)
        if m_ibidPresent is not None:
            return True
        else:
            return False

    def getIbidSeriesLetter(self, ln):
        """Return the series letter of the first IBID in line 'ln', or an empty string if there is none"""
        m_ibid = self._pIbidPresent.search(ln)
        try:
            series_letter = m_ibid.group(1)
        except (IndexError, AttributeError):
            # AttributeError: there is no IBID in the line at all (m_ibid is None)
            series_letter = u""
        if series_letter is None:
            series_letter = u""
        return series_letter

    def identify_record_ibids(self, ln):
        """Identify the IBIDs in "line". Record their information (index position in line, match length, and matched text).
        When identified, the word IBID is replaced with a lower-case version of itself.
        Finally, the line is returned with all IBIDs identified, along with dictionaries (keyed by match position) of the identified IBID lengths and text.
        These 3 items are returned in a tuple.
        """
        ibid_match_len = {}
        ibid_match_txt = {}
        matches_iter = self._p_ibid.finditer(ln)
        # Record details of each match:
        for m in matches_iter:
            # Record match info
            ibid_match_len[m.start()] = len(m.group(2))
            ibid_match_txt[m.start()] = m.group(2)
            # Replace matched txt in line with
            # Lowercase version
            rep_str = m.group(2)
            rep_str = rep_str.lower()
            ln = ln[0:m.start(2)] + rep_str + ln[m.end(2):]
        return (ibid_match_len, ibid_match_txt, ln)


class URLidentifier:
    """Identify, record information about, and remove URLs from a line"""
    def __init__(self):
        """Initialise url recognition patterns"""
        self._urlstr = unicode(r'((https?|s?ftp):\/\/([\w\d\_\.\-])+(\/([\w\d\_\.\-])+)*(\/([\w\d\_\-]+\.\w{1,6})?)?)')
        self._p_rawURL = re.compile(self._urlstr,re.UNICODE|re.I)
        self._p_taggedURL = re.compile(unicode(r'(\<a\s+href\s*=\s*([\'"])?(((https?|s?ftp):\/\/)?([\w\d\_\.\-])+(\/([\w\d\_\.\-])+)*(\/([\w\d\_\-]+\.\w{1,6})?)?)([\'"])?\>([^\<]+)\<\/a\>)'),re.UNICODE|re.I)

    def removeURLs(self, ln):
        """Record, then blank out with underscores, the tagged and raw URLs in line 'ln'.
        Return a tuple (match lengths, URLs, URL descriptions, flag set if any URL was found, new line); the dictionaries are keyed by match position"""
        # Find URLS in tags:
        urlfound = False
        found_urlmatch_fulllen = {}
        found_urlstr = {}
        found_urldescstr = {}
        # Record and remove tagged URLs found in line
        m_taggedURL_iter = self._p_taggedURL.finditer(ln)
        for m in m_taggedURL_iter:
            urlfound = True
            startpos = m.start()
            endpos = m.end()
            matchlen = len(m.group())
            found_urlmatch_fulllen[startpos] = matchlen
            found_urlstr[startpos] = m.group(3)
            found_urldescstr[startpos] = m.group(12)
            ln = ln[0:startpos] + u"_"*matchlen + ln[endpos:]
        # Record and remove raw URLs found in line:
        m_rawURL_iter = self._p_rawURL.finditer(ln)
        for m in m_rawURL_iter:
            urlfound = True
            startpos = m.start()
            endpos = m.end()
            matchlen = len(m.group())
            found_urlmatch_fulllen[startpos] = matchlen
            found_urlstr[startpos] = m.group(1)
            found_urldescstr[startpos] = m.group(1)
            ln = ln[0:startpos] + u"_"*matchlen + ln[endpos:]
        return (found_urlmatch_fulllen, found_urlstr, found_urldescstr, urlfound, ln)


class ProcessedReferenceLineBuilder:
    """Create a "ProcessedReferenceLine" from a reference line
    and information about where any matched items are"""
    def __init__(self, titles_list, ibid_identifier, numeration_processor, line_cleaner):
        self._titleslist = titles_list
        self._ibidIdentifier = ibid_identifier
        self._numerationprocessor = numeration_processor
        self._linecleaner = line_cleaner
        self._p_lineMarker = RefLineNumerationListCompiler().getCompiledPatternList()
        self._searcher = LineSearcher()
        self._p_tagFinder = re.compile(unicode(r'(\<cds\.(TITLE|VOL|YR|PG|RN|SER|URI value="[^\>]+")\>)'),re.UNICODE)
        self._p_leadRubbishRemover = re.compile(unicode(r'^([\.,;:-]+|\s+)+'),re.UNICODE)
        self._p_getNumatn = re.compile(unicode(r'^(\s*.?,?\s*:\s\<cds\.VOL\>(\d+)\<\/cds\.VOL> \<cds\.YR\>\(([1-2]\d\d\d)\)\<\/cds\.YR\> \<cds\.PG\>([RL]?\d+[c]?)\<\/cds\.PG\>)'),re.UNICODE)

    def _buildProcessedLine(self,ln,rawline):
        """Given a potentially marked up reference line, build and return a "ProcessedReferenceLine" object"""
        processedLine = ProcessedReferenceLine()
        linebckp = ln
        ln = string.lstrip(ln)
        # Trim line marker from start of line if possible & add it as a line segment
        m_lineMarker = self._searcher.findAtStartLine(ln, self._p_lineMarker)
        if m_lineMarker is not None:
            processedLine.addSection(LineMarker(m_lineMarker.group(u'mark')))
            ln = ln[m_lineMarker.end():]
        else:
            processedLine.addSection(LineMarker(u" "))
        m_tag = self._p_tagFinder.search(ln)
        thismisc = u""
        while m_tag is not None:
            # Found citation markup tag in line
            tagtype = m_tag.group(2)
            if tagtype == u"TITLE":
                # Title section
                thisyr = thispg = thisvol = None
                # Get text up to point of this match:
                if len(self._p_leadRubbishRemover.sub(u"",ln[0:m_tag.start()])) > 0:
                    thismisc += ln[0:m_tag.start()]
                m_titletxt = re.match(unicode(r'^(%s(\<cds\.TITLE\>([^\<]+)\<\/cds\.TITLE\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thistitle = m_titletxt.group(3)
                ln = ln[m_titletxt.end():]
                # Remove and add volume, year and pagination tags which follow title if present
                m_numatn = self._p_getNumatn.match(ln)
                if m_numatn is not None:
                    thisvol = m_numatn.group(2)
                    thisyr = m_numatn.group(3)
                    thispg = m_numatn.group(4)
                    ln = ln[m_numatn.end():]
                    if len(thismisc) == 0:
                        thismisc = None
                    processedLine.addSection(TitleCitationStandard(thistitle, thismisc, thispg, thisvol, thisyr))
                    thismisc = u""
                else:
                    thismisc += u" " + thistitle
            elif tagtype == u"RN":
                # Preprint reference number section
                # Get misc text up to point of match
                if len(self._p_leadRubbishRemover.sub(u"",ln[0:m_tag.start()])) > 0:
                    thismisc += ln[0:m_tag.start()]
                m_rntxt = re.match(unicode(r'^(%s(\<cds\.RN\>([^\<]+)\<\/cds\.RN\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thisrn = m_rntxt.group(3)
                ln = ln[m_rntxt.end():]
                if len(thismisc) == 0:
                    thismisc = None
                processedLine.addSection(InstitutePreprintReferenceCitation(thisrn, thismisc))
                thismisc = u""
            elif string.find(tagtype ,u"URI") == 0:
                # URL found
                # Get misc text up to point of match
                if len(self._p_leadRubbishRemover.sub(u"",ln[0:m_tag.start()])) > 0:
                    thismisc += ln[0:m_tag.start()]
                m_urlinfo = re.match(unicode(r'^(%s(\<cds\.URI value\=\"([^\>]+)\"\>([^\<]+)\<\/cds\.URI\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thisurl = m_urlinfo.group(3)
                thisurldescr = m_urlinfo.group(4)
                if len(thisurldescr) == 0:
                    thisurldescr = thisurl
                if len(thismisc) == 0:
                    thismisc = None
                processedLine.addSection(URLCitation(thisurl, thisurldescr, thismisc))
                thismisc = u""
                ln = ln[m_urlinfo.end():]
            elif tagtype == u"VOL":
                # Volume info - it wasn't found after a title, so treat as misc
                thismisc += ln[0:m_tag.start()]
                m_voltxt = re.match(unicode(r'^(%s(\<cds\.VOL\>(\d+)\<\/cds\.VOL\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thismisc += m_voltxt.group(3)
                ln = ln[m_voltxt.end():]
            elif tagtype == u"YR":
                # Year info - not found after title info, so treat as misc
                thismisc += ln[0:m_tag.start()]
                m_yrtxt = re.match(unicode(r'^(%s(\<cds\.YR\>(\([1-2]\d\d\d\))\<\/cds\.YR\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thismisc += m_yrtxt.group(3)
                ln = ln[m_yrtxt.end():]
            elif tagtype == u"PG":
                # Pagination info - not found after title info, so treat as misc
                thismisc += ln[0:m_tag.start()]
                m_pgtxt = re.match(unicode(r'^(%s(\<cds\.PG\>([RL]?\d+[c]?)\<\/cds\.PG\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thismisc += m_pgtxt.group(3)
                ln = ln[m_pgtxt.end():]
            elif tagtype == u"SER":
                # Series info - not found after title info (should have been caught earlier, in fact), so treat as misc
                thismisc += ln[0:m_tag.start()]
                m_sertxt = re.match(unicode(r'^(%s(\<cds\.SER\>([A-H]|(I{1,3}V?|VI{0,3}))\<\/cds\.SER\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thismisc += m_sertxt.group(3)
                ln = ln[m_sertxt.end():]
            else:
                # Unknown tag (should never happen) - treat as misc
                thismisc += ln[0:m_tag.start()]
                m_uknowntag = re.match(unicode(r'^(%s(\<cds\.[^\>]+?\>([^\<]+?)\<\/cds\.[^\>]+?\>))'%(re.escape(ln[0:m_tag.start()]),)),ln,re.UNICODE)
                thismisc += m_uknowntag.group(3)
                ln = ln[m_uknowntag.end():]
            m_tag = self._p_tagFinder.search(ln)
        if processedLine.getNumberCitations() == 0 and cfg_refextract_no_citation_treatment == 0:
            # No citations were found, and strict mode is in use: in that case the entire ORIGINAL, UNTOUCHED line must be marked up as misc
            processedLine = ProcessedReferenceLine()
            untouchedline = string.lstrip(rawline)
            m_lineMarker = self._searcher.findAtStartLine(untouchedline, self._p_lineMarker)
            if m_lineMarker is not None:
                processedLine.addSection(LineMarker(m_lineMarker.group(u'mark')))
                untouchedline = untouchedline[m_lineMarker.end():]
            else:
                processedLine.addSection(LineMarker(u" "))
            if len(self._p_leadRubbishRemover.sub(u"",untouchedline)) > 0:
                processedLine.addSection(LineMiscellaneousText(untouchedline))
        else:
            thismisc += ln
            if len(self._p_leadRubbishRemover.sub(u"",thismisc)) > 0:
                processedLine.addSection(LineMiscellaneousText(thismisc))
        return processedLine

    def getProcessedReferenceLine(self, titlematch_len, titlematch_str, pprintmatch_str, pprintmatch_len, urlmatchfull_len, urlmatch_str, url_desc_str,
                                  removed_spaces,
rawline, original_line, working_line, foundCitations): marked_line = u"" # line after titles etc have been recognised & marked up with "<cds.TITLE/>" etc tags if not foundCitations: marked_line = original_line else: # Rebuild line with citations marked up and standardised: start_pos = 0 # First cell of the reference line... last_match = u"" extras = 0 # Variable to count the extra spaces to add series_letter = u"" replacement_types = {} url_keys = urlmatch_str.keys() url_keys.sort() title_keys = titlematch_str.keys() title_keys.sort() pp_keys = pprintmatch_str.keys() pp_keys.sort() spaces_keys = removed_spaces.keys() spaces_keys.sort() # First, adjust the index replacement values of the URI replacements as they were made before the multispaces etc were # stripped & other replacements made after this could therefore have the same replacement indeces uri_virtual_locations = self._getVirtualUrlPositions(url_keys, spaces_keys, removed_spaces) # Make dictionary containing the types of replacements to be made at each position: rep_types = self._getReplacementTypes(uri_virtual_locations,title_keys,pp_keys) rep_types_keys = rep_types.keys() rep_types_keys.sort() # Begin the rebuild: for repidx in rep_types_keys: true_repidx = repidx spare_repidx = repidx extras = 0 # Account for any spaces stripped before these values: (true_repidx,spare_repidx,extras) =\ self._addExtraStrippedSpaces(spaces_keys,removed_spaces,rep_types,pprintmatch_len,titlematch_len,true_repidx,spare_repidx,repidx,extras) if rep_types[repidx] == u"TITLE": # Process addition of text into line for title: (marked_line,start_pos,last_match) = self._addLineTitle(titlematch_str,titlematch_len,original_line,marked_line,start_pos,repidx,true_repidx,extras,last_match) elif rep_types[repidx] == u"RN": # Process addition of text into line for preprint reference: (marked_line,start_pos) = self._replaceLineItemPreprintRef(pprintmatch_str,pprintmatch_len,original_line,marked_line,start_pos,repidx,true_repidx,extras) 
elif rep_types[repidx] == u"URI": # Process addition of text into line for URL: (marked_line,start_pos) = self._addLineURI(urlmatch_str,url_desc_str,urlmatchfull_len,uri_virtual_locations,original_line,marked_line,start_pos,repidx,true_repidx) marked_line = marked_line + original_line[start_pos:] marked_line = self._numerationprocessor.restandardise(marked_line) marked_line = self._numerationprocessor.removeSeriesTags(marked_line) # Remove any "Series tags" marked_line = self._linecleaner.clean(marked_line) return self._buildProcessedLine(marked_line,rawline) def _replaceIbid(self,series_letter,last_match,rebuiltLine,ibid_str): """Replace an IBID occurrence in a line with the "last matched" title in the line. Also take into account a new series letter governed by the ibid""" if series_letter != u"": # IBID to replace has a series letter, so if the last matched title had a series letter, this must be changed to the new series letter if string.find(last_match,",") != -1: # Presence of comma signifies possible series information. Only replace if it is a single item (e.g. "A") m_lastMatch = re.search(unicode(r'\, +([A-H]|(I{1,3}V?|VI{0,3}))$'),last_match,re.UNICODE) if m_lastMatch is not None: temp_series = m_lastMatch.group(1) if temp_series == series_letter: rebuiltLine = rebuiltLine + u" <cds.TITLE>" + last_match + u"</cds.TITLE>" else: last_match = re.sub(u"(\\.?)(,?) %s$"%(temp_series,),u"\\g<1>\\g<2> %s"%(series_letter,),last_match) rebuiltLine = rebuiltLine + u" <cds.TITLE>" + last_match + u"</cds.TITLE>" else: # Series info of last match not letter or roman numeral: cannot be sure about meaning of IBID - dont replace it rebuiltLine = rebuiltLine + ibid_str else: # Match had no series letter but IBID did. 
Add comma followed by IBID series letter to last match, then add it last_match = string.rstrip(last_match) if last_match[-1] == u".": last_match = last_match + u", " + series_letter else: # Last match end with space - replace all spaces at end last_match = last_match + u"., " + series_letter rebuiltLine = rebuiltLine + u" <cds.TITLE>" + last_match + u"</cds.TITLE>" else: # IBID has no series letter. Replace as-is: rebuiltLine = rebuiltLine + u" <cds.TITLE>" + last_match + u"</cds.TITLE>" return (rebuiltLine,last_match) def _addLineTitle(self,titlematch_str,titlematch_len,orig_line,rebuiltLine,start_pos,repidx,true_repidx,extras,last_match): rebuiltLine=rebuiltLine+orig_line[start_pos:true_repidx] series_letter = u"" #if self._ibidIdentifier.lineHasIbid(titlematch_str[repidx]): if titlematch_str[repidx].upper().find(u"IBID") != -1: # Replace IBID item # Get series letter series_letter = self._ibidIdentifier.getIbidSeriesLetter(titlematch_str[repidx]) if last_match != "": # Replacement has already been made in this line. IBID can therefore be replaced (rebuiltLine,last_match) = self._replaceIbid(series_letter, last_match, rebuiltLine, titlematch_str[repidx]) start_pos=true_repidx+titlematch_len[repidx]+extras if orig_line[start_pos] == u"." 
or orig_line[start_pos] == u":" or\ orig_line[start_pos] == u";": # Skip past ".:;" which may have followed an IBID: start_pos=start_pos+1 else: # No replacements made in this line before this IBID (its a line with an IBID and # we dont know what the IBID refers to..ignore it rebuiltLine = rebuiltLine + orig_line[true_repidx:true_repidx + titlematch_len[repidx] + extras] start_pos=true_repidx+titlematch_len[repidx]+extras else: # Normal title replacement - not an IBID # Skip past any "[" or "(" chars rebuiltLine = rebuiltLine + u"<cds.TITLE>" + self._titleslist[titlematch_str[repidx]] + u"</cds.TITLE>" last_match = self._titleslist[titlematch_str[repidx]] start_pos = true_repidx+titlematch_len[repidx]+extras if orig_line[start_pos] == u"." or orig_line[start_pos] == u":" or\ orig_line[start_pos] == u";": # Skip past punctuation at end of title start_pos = start_pos + 1 return (rebuiltLine,start_pos,last_match) def _replaceLineItemPreprintRef(self,pprintmatch_str,pprintmatch_len,orig_line,rebuiltLine,start_pos,repidx,true_repidx,extras): """Replace a Preprint reference item in the line with a marked-up, standardised version of itself""" # Often pprint refs are enclosed in "[]" chars which we dont want. Stop 1 char before this if possible: if (true_repidx - start_pos - 1) >= 0: rebuiltLine = rebuiltLine + orig_line[start_pos:true_repidx - 1] else: rebuiltLine = rebuiltLine + orig_line[start_pos:true_repidx] # Is next char a "[" or "("? Skip past it if yes: if orig_line[true_repidx] == u"[" or \ orig_line[true_repidx] == u"(": rebuiltLine = rebuiltLine + u" - " else: rebuiltLine = rebuiltLine + orig_line[true_repidx-1] rebuiltLine = rebuiltLine + u"<cds.RN>" + pprintmatch_str[repidx] + u"</cds.RN>" start_pos = true_repidx + pprintmatch_len[repidx] + extras try: if orig_line[start_pos] == u"]" or orig_line[start_pos] == u")": # Skip past preprint ref no closing brace start_pos = start_pos + 1 except IndexError: # Went past end of line. Ignore. 
pass return (rebuiltLine, start_pos) def _addLineURI(self,urlmatch_str,urldesc_str,urlmatchfull_len,uri_virtual_locations,orig_line,rebuiltLine,start_pos,repidx,true_repidx): rebuiltLine = rebuiltLine + orig_line[start_pos:start_pos + true_repidx - start_pos] rebuiltLine = rebuiltLine + u"<cds.URI value=\"" + urlmatch_str[uri_virtual_locations[repidx]] + u"\">" + urldesc_str[uri_virtual_locations[repidx]] + u"</cds.URI>" start_pos = true_repidx + urlmatchfull_len[uri_virtual_locations[repidx]] return (rebuiltLine, start_pos) def _addExtraStrippedSpaces(self, spacesKeys, removed_spaces, rep_types, pprintmatch_len, titlematch_len, true_repidx, spare_repidx, repidx, extras): """For a replacement index position, calculate a new (correct) replacement index, based on any spaces that have been removed before it, according to the type of the replacement""" for strip_space in spacesKeys: if strip_space < true_repidx: # Spaces were removed before this replacement item should be placed. Add number of spaces removed to current replacement idx: true_repidx = true_repidx + removed_spaces[strip_space] spare_repidx = spare_repidx + removed_spaces[strip_space] elif (strip_space >= spare_repidx) and (rep_types[repidx] == u"TITLE") and\ (strip_space < (spare_repidx + titlematch_len[repidx])): # Replacing a periodical title. Account for double spaces that may have been removed # from the title before it was recognised. spare_repidx = spare_repidx + removed_spaces[strip_space] extras = extras + removed_spaces[strip_space] elif (strip_space >= spare_repidx) and (rep_types[repidx] == u"RN") and\ (strip_space < (spare_repidx + pprintmatch_len[repidx])): # Replacing an institute preprint reference. 
Spaces would have been removed from this # pprint reference itself, and must therefore be added spare_repidx = spare_repidx + removed_spaces[strip_space] extras = extras + removed_spaces[strip_space] return (true_repidx, spare_repidx, extras) def _getReplacementTypes(self,urls,titles,preprints): """Make dictionary detailing the type of replacement made at each position""" rep_types = {} for idx in urls: rep_types[idx] = u"URI" for idx in titles: rep_types[idx] = u"TITLE" for idx in preprints: rep_types[idx] = u"RN" return rep_types def _getVirtualUrlPositions(self, url_keys, spaces_keys, removed_spaces): """URLs were removed before punctuation and multiple spaces were recorded and stripped. This method makes a dictionary of URL positions as-if the URLs had been identified/removed after the punctuation/spaces """ uri_virtual_locations = {} for idx in url_keys: virtual_pos = idx for spcidx in spaces_keys: if spcidx < idx: # Spaces were removed before this URL. Account for this. virtual_pos = virtual_pos - removed_spaces[spcidx] # All spaces removed before this URL accounted for - add it to the dictionary uri_virtual_locations[virtual_pos] = idx return uri_virtual_locations class ReferenceSectionMarkupProcessor: """Process a reference section. Line will be cleaned, and cited items will be identified and their notation standardised. 
ProcessedReferenceLine will be returned""" def __init__(self, institutes, titles): self._instlist = institutes self._titleslist = titles self._ibidIdentifier = LineIBIDidentifier() self._numerationIdentifier = NumerationHandler() self._lineCleaner = LineCleaner() self._lineBuilder = ProcessedReferenceLineBuilder(self._titleslist, self._ibidIdentifier, self._numerationIdentifier, self._lineCleaner) self._accentTransformer = EscapeSequenceTransformer() self._punctuationStripper = PunctuationStripper() self._multispaceRemover = MultispaceRemover() self._urlRemover = URLidentifier() def getProcessedReferenceSection(self, refSect): """Take a ReferenceSection as argument. For each line, process it""" processedRefSection = ProcessedReferenceSection() for line in refSect: citationMatch=False found_ibids_len = {} found_ibids_matchtxt = {} found_title_len = {} found_title_txt = {} tmpLine = line.getContent() # Got line as unicode string # Remove and record details of URLs #(found_urlmatch_fulllen, found_urlstr, found_urldescstr, foundItem, tmpLine) = self._urlRemover.removeURLs(tmpLine) found_urlmatch_fulllen = {} found_urlstr = {} found_urldescstr = {} foundItem = False if foundItem: citationMatch=True # Preliminary line cleaning: transform bad accents, clean punctuation & remove dbl-spaces tmpLine = self._accentTransformer.processLine(tmpLine) tmpLine = self._lineCleaner.clean(tmpLine) # Standardise numeration: tmpLine = self._numerationIdentifier.standardise(tmpLine) tmpLine = self._lineCleaner.clean(tmpLine) # ---> Standardise the titles: tmpLine2 = string.upper(tmpLine) # uppercase the line tmpLine2 = self._punctuationStripper.strip(tmpLine2) # Strip punctuation (removedSpaces,tmpLine2) = self._multispaceRemover.recordRemove(tmpLine2) # remove multispace & record their positions (found_pp_len, found_pp_rep_str, tmpLine2, foundItem) = self._instlist.identifyPreprintReferences(tmpLine2) if foundItem: citationMatch=True # find_nonstandard_titles 
(found_title_len,found_title_txt,tmpLine2,foundItem) = self._titleslist.findPeriodicalTitles(tmpLine2) if foundItem: citationMatch=True # If there is an IBID in the line, do a 2nd pass to try to catch it & identify its meaning if tmpLine2.upper().find(u"IBID") != -1: # Record/remove IBID(s) in line (found_ibids_len,found_ibids_matchtxt,tmpLine2) = self._ibidIdentifier.identify_record_ibids(tmpLine2) # Add found ibids to title matches: for itm in found_ibids_len.keys(): found_title_len[itm] = found_ibids_len[itm] for itm in found_ibids_matchtxt.keys(): found_title_txt[itm] = found_ibids_matchtxt[itm] # Create "ProcessedReferenceLine": thisProcessedLine = self._lineBuilder.getProcessedReferenceLine(found_title_len,found_title_txt,found_pp_rep_str,found_pp_len,\ found_urlmatch_fulllen, found_urlstr, found_urldescstr,removedSpaces,line.getContent(),tmpLine,tmpLine2,citationMatch) processedRefSection.appendLine(thisProcessedLine) return processedRefSection class LineItem: def getSelfMARCXML(self): """Return self, as marc xml string""" pass class LineMarker(LineItem): def __init__(self, val): if type(val) is str or type(val) is unicode: self._value = val else: self._value = u"" def getSelfMARCXML(self): return u""" <datafield tag="999" ind1="C" ind2="5"> <subfield code="o">""" + cgi.escape(self._value)+u"""</subfield> </datafield>\n""" class LineMiscellaneousText(LineItem): def __init__(self, val): if type(val) is str or type(val) is unicode: self._value = val.strip() else: self._value = u"" def getSelfMARCXML(self): return u""" <datafield tag="999" ind1="C" ind2="5"> <subfield code="m">"""+cgi.escape(self._value)+u"""</subfield> </datafield>\n""" class Citation(LineItem): """Abstract - represents a citation instance. 
Could be used to count citations found in a line""" pass class TitleCitation(Citation): def __init__(self, title, misc = None, pg = None, vol = None, yr = None): self._title = title if misc is not None: self._misc = misc.strip() else: self._misc = misc self._page = pg self._volume = vol self._yr = yr def getSelfMARCXML(self): out = u""" <datafield tag="999" ind1="C" ind2="5">\n""" if self._misc is not None and (type(self._misc) is unicode or type(self._misc) is str): out += u""" <subfield code="m">"""+cgi.escape(self._misc)+u"""</subfield>\n""" out += u""" <subfield code="t">"""+cgi.escape(self._title)+u"""</subfield>\n""" if self._page is not None and (type(self._page) is unicode or type(self._page) is str): out += u""" <subfield code="p">"""+cgi.escape(self._page)+u"""</subfield>\n""" if self._volume is not None and (type(self._volume) is unicode or type(self._volume) is str): out += u""" <subfield code="v">"""+cgi.escape(self._volume)+u"""</subfield>\n""" if self._yr is not None and (type(self._yr) is unicode or type(self._yr) is str): out += u""" <subfield code="y">"""+cgi.escape(self._yr)+u"""</subfield>\n""" out += u""" </datafield>\n""" return out class TitleCitationStandard(Citation): """[journal name] [volume] ([year]) [pagination]""" def __init__(self, title, misc = None, pg = None, vol = None, yr = None): self._title = title if misc is not None: self._misc = misc.strip() else: self._misc = misc self._page = pg self._volume = vol self._yr = yr def hasMisc(self): if self._misc is not None and (type(self._misc) is unicode or type(self._misc) is str) and len(self._misc.strip("()[], {}-")) > 0 or not\ (self._title is not None and self._page is not None and self._volume is not None and self._yr is not None): return True else: return False def getS_subfield(self): if self._title is not None and self._page is not None and self._volume is not None and self._yr is not None: return u""" <subfield code="s">%s %s (%s) 
%s</subfield>\n"""%(self._title,self._volume,self._yr,self._page) else: return None def getSelfMARCXML(self, xtra_subfield=None): subfieldOpen = False out = u""" <datafield tag="999" ind1="C" ind2="5">\n""" if self._misc is not None and (type(self._misc) is unicode or type(self._misc) is str): out += u""" <subfield code="m">"""+cgi.escape(self._misc) subfieldOpen=True if self._title is not None and self._page is not None and self._volume is not None and self._yr is not None: if subfieldOpen: out += u"""</subfield>\n""" subfieldOpen=False out += u""" <subfield code="s">%s %s (%s) %s</subfield>\n"""%(self._title,self._volume,self._yr,self._page) else: if not subfieldOpen: out += u""" <subfield code="m">""" subfieldOpen = True if self._title is not None: out += u" %s"%(self._title,) if self._volume is not None: out += u" %s"%(self._volume,) if self._yr is not None: out += u" (%s)"%(self._yr,) if self._page is not None: out += u" %s"%(self._page,) if subfieldOpen: out += u"""</subfield>\n""" subfieldOpen=False if xtra_subfield is not None: out += xtra_subfield out += u""" </datafield>\n""" return out class InstitutePreprintReferenceCitation(Citation): def __init__(self, rn, misc = None): self._rn = rn if misc is not None and len(misc.strip("()[], {}-")) > 0: self._misc = misc.strip() else: self._misc = None def hasMisc(self): if self._misc is not None and (type(self._misc) is unicode or type(self._misc) is str) and len(self._misc.strip()) > 0: return True else: return False def getRN_subfield(self): return u""" <subfield code="r">"""+cgi.escape(self._rn)+u"""</subfield>\n""" def getSelfMARCXML(self, xtra_subfield=None): out = u""" <datafield tag="999" ind1="C" ind2="5">\n""" if self._misc is not None and (type(self._misc) is unicode or type(self._misc) is str): out += u""" <subfield code="m">"""+cgi.escape(self._misc)+u"""</subfield>\n""" out += u""" <subfield code="r">"""+cgi.escape(self._rn)+u"""</subfield>\n""" if xtra_subfield is not None: out += xtra_subfield
out += u""" </datafield>\n""" return out class URLCitation(Citation): def __init__(self, url, urldescr, misc=None): self._url = url self._urldescr = urldescr if misc is not None: self._misc = misc.strip() else: self._misc = misc def getSelfMARCXML(self): out = u""" <datafield tag="999" ind1="C" ind2="5">\n""" if self._misc is not None and (type(self._misc) is unicode or type(self._misc) is str): out += u""" <subfield code="m">"""+cgi.escape(self._misc)+u"""</subfield>\n""" out += u""" <subfield code="u">"""+cgi.escape(self._url)+u"""</subfield>\n""" out += u""" <subfield code="z">"""+cgi.escape(self._urldescr)+u"""</subfield>\n""" out += u""" </datafield>\n""" return out class ProcessedReferenceLine: """This is a reference line that has been processed for cited items""" def __init__(self): self._segments = {} # Segments of reference line, each keyed by start point index. Each is a 'LineItem'. self._nextposn = 0 def getSelfMARCXML(self): """Return an XML string containing this lines contents, marked up in XML MARC, as used in CDS""" i = 0 lenline = len(self._segments) out = u"" while i < lenline: if isinstance(self._segments[i],TitleCitationStandard) and i < lenline-1 and isinstance(self._segments[i+1],InstitutePreprintReferenceCitation) and not self._segments[i+1].hasMisc(): # This is a $s (periodical title) reference, followed immediately by its report number ($r). Concat them both under the $s. out += self._segments[i].getSelfMARCXML(self._segments[i+1].getRN_subfield()) i = i + 1 elif isinstance(self._segments[i],InstitutePreprintReferenceCitation) and i < lenline-1 and isinstance(self._segments[i+1],TitleCitationStandard) and not self._segments[i+1].hasMisc(): # This is a report number ($r) reference followed immediately by its periodical title ($s) reference. Concat them both under $s. 
out += self._segments[i].getSelfMARCXML(self._segments[i+1].getS_subfield()) i = i + 1 else: out += self._segments[i].getSelfMARCXML() i = i + 1 return out def addSection(self, newSect): if isinstance(newSect,LineItem): self._segments[self._nextposn] = newSect self._nextposn += 1 def getNumberCitations(self): numcitations = 0 numsegments = len(self._segments) for i in range(0,numsegments): if isinstance(self._segments[i], Citation): numcitations += 1 return numcitations class ProcessedReferenceSection: """This is a reference section after it has been processed to identify cited items. It contains a list of ProcessedReferenceLines.""" def __init__(self): self._lines = {} self._nextline = 0 def getSelfMARCXML(self): """Return a unicode string of all reference lines marked up in MARC XML""" out = u"" numlines = len(self._lines) for i in range(0,numlines): out += self._lines[i].getSelfMARCXML() return out def appendLine(self, ln): """Add a new line to the list of processed reference lines""" if isinstance(ln, ProcessedReferenceLine): self._lines[self._nextline] = ln self._nextline += 1 def getTotalNumberCitations(self): """Return an integer representing the total number of citations recognised (and thus marked up) in the reference section""" numcitations = 0 numlines = len(self._lines) for i in range(0,numlines): numcitations += self._lines[i].getNumberCitations() return numcitations class NumerationHandler: """Class whose instances identify reference numeration patterns in a text line and rearrange them into standardised numeration patterns Returns line with numeration patterns marked up in an XML style """ def __init__(self): self._ptnList = [] self._checkAgainPtnList = [] self._ptn_seriesRemove = re.compile(unicode(r'((\<cds.TITLE\>)([^\<]+)(\<\/cds.TITLE\>)\s*.?\s*\<cds\.SER\>([A-H]|(I{1,3}V?|VI{0,3}))\<\/cds\.SER\>)'),re.UNICODE) self._setSearchPatterns() self._setRecheckPatterns() def _setRecheckPatterns(self): """After the line has been rebuilt with marked up 
titles, it can be rechecked for numeration patterns because perhaps now more can be found with the aid of the recognised titles""" self._checkAgainPtnList.append([re.compile(unicode(r'\(?([12]\d{3})([A-Za-z]?)\)?,? *(<cds\.TITLE>(\.|[^<])*<\/cds\.TITLE>),? *(\b[Vv]o?l?\.?)?\s?(\d+)(,\s*|\s+)[pP]?[p]?\.?\s?([RL]?\d+[c]?)\-?[RL]?\d{0,6}[c]?'),re.UNICODE),unicode('\\g<1>\\g<2>, \\g<3> \\g<6> (\\g<1>) \\g<8>')]) self._checkAgainPtnList.append([re.compile(unicode(r'\(?([12]\d{3})([A-Za-z]?)\)?,? *(<cds\.TITLE>(\.|[^<])*<\/cds\.TITLE>),? *(\b[Vv]o?l?\.?)?\s?(\d+)\s?([A-H])\s?[pP]?[p]?\.?\s?([RL]?\d+[c]?)\-?[RL]?\d{0,6}[c]?'),re.UNICODE),unicode('\\g<1>\\g<2>, \\g<3> \\g<6> \\g<7> \\g<8> (\\g<1>)')]) def _setSearchPatterns(self): """Populate self._ptnList with seek/replace numeration pattern pairs""" # Delete the colon and expressions as Serie, vol, V. inside the pattern <serie : volume> self._ptnList.append([re.compile(unicode(r'(Serie\s|\bS\.?\s)?([A-H])\s?[:,]\s?(\b[Vv]o?l?\.?)?\s?(\d+)'),re.UNICODE),unicode('\\g<2> \\g<4>')]) # Use 4 different patterns to standardise numeration as <serie(?) 
: volume (year) page> # Pattern 1: <x, vol, year, page> self._ptnList.append([re.compile(unicode(r'(\b[Vv]o?l?\.?)?\s?(\d+)\s?\(([1-2]\d\d\d)\),?\s?[pP]?[p]?\.?\s?([RL]?\d+[c]?)(?:\-|\255)?[RL]?\d{0,6}[c]?'),re.UNICODE), unicode(' : <cds.VOL>\\g<2></cds.VOL> <cds.YR>(\\g<3>)</cds.YR> <cds.PG>\\g<4></cds.PG> ')]) # Pattern 2: <vol, serie, year, page> self._ptnList.append([re.compile(unicode(r'(\b[Vv]o?l?\.?)?\s?(\d+)\s?([A-H])\s?\(([1-2]\d\d\d)\),?\s?[pP]?[p]?\.?\s?([RL]?\d+[c]?)(?:\-|\255)?[RL]?\d{0,6}[c]?'),re.UNICODE), unicode(' <cds.SER>\\g<3></cds.SER> : <cds.VOL>\\g<2></cds.VOL> <cds.YR>(\\g<4>)</cds.YR> <cds.PG>\\g<5></cds.PG> ')]) # Pattern 3: <x, vol, page, year> self._ptnList.append([re.compile(unicode(r'(\b[Vv]o?l?\.?)?\s?(\d+)\s?[,:]\s?[pP]?[p]?\.?\s?([RL]?\d+[c]?)(?:\-|\255)?[RL]?\d{0,6}[c]?,?\s?\(?([1-2]\d\d\d)\)?'),re.UNICODE), unicode(' : <cds.VOL>\\g<2></cds.VOL> <cds.YR>(\\g<4>)</cds.YR> <cds.PG>\\g<3></cds.PG> ')]) # Pattern 4: <vol, serie, page, year> self._ptnList.append([re.compile(unicode(r'(\b[Vv]o?l?\.?)?\s?(\d+)\s?([A-H])[,:\s]\s?[pP]?[p]?\.?\s?([RL]?\d+[c]?)(?:\-|\255)?[RL]?\d{0,6}[c]?,?\s?\(([1-2]\d\d\d)\)'),re.UNICODE), unicode(' <cds.SER>\\g<3></cds.SER> : <cds.VOL>\\g<2></cds.VOL> <cds.YR>(\\g<5>)</cds.YR> <cds.PG>\\g<4></cds.PG> ')]) def removeSeriesTags(self, ln): """Remove any "<cds.SER/>" tags from a line. Series information should be part of a title, not separate""" m_seriesTagLine = self._ptn_seriesRemove.search(ln) while m_seriesTagLine is not None: whole_match = m_seriesTagLine.group(0) title_tag_opener = m_seriesTagLine.group(2) title_text = m_seriesTagLine.group(3) title_tag_closer = m_seriesTagLine.group(4) series_letter = m_seriesTagLine.group(5) real_title_text = title_text # If there is no comma in the matched title, add one to the end of it before series info added. 
If there is already a comma present, simply discard the series info if string.find(real_title_text,u",") == -1: real_title_text = string.rstrip(real_title_text) if real_title_text[-1] == u".": real_title_text = real_title_text + u", " + series_letter else: real_title_text = real_title_text + u"., " + series_letter ln = re.sub(u"%s"%(re.escape(whole_match),),u"%s%s%s"%(title_tag_opener,real_title_text,title_tag_closer),ln,1) m_seriesTagLine = self._ptn_seriesRemove.search(ln) return ln def restandardise(self, ln): """Given that some more titles have been recognised within a line, reprocess that line in the hopes of recognising more numeration patterns""" for x in self._checkAgainPtnList: ln = x[0].sub(x[1], ln) return self.standardise(ln) def standardise(self, ln): """Accept ln (text line) as argument. Perform transformations on this line to replace non-standard numeration styles with marked-up versions in a standard format. These recognised and marked-up numeration patterns can later be used to identify cited documents """ for x in self._ptnList: ln = x[0].sub(x[1], ln) return ln class LineCleaner: """Class to enable lines to be cleaned of punctuation errors""" def __init__(self): self._correctionList = {} self._setCorrectionList() def _setCorrectionList(self): """Set the list of punctuation (etc) errors in a line to be corrected""" self._correctionList[re.compile(unicode(r'\s,'),re.UNICODE)] = u',' self._correctionList[re.compile(unicode(r'\s;'),re.UNICODE)] = u';' self._correctionList[re.compile(unicode(r'\s\.'),re.UNICODE)] = u'.'
self._correctionList[re.compile(unicode(r':\s:'),re.UNICODE)] = u':' self._correctionList[re.compile(unicode(r',\s:'),re.UNICODE)] = u':' self._correctionList[re.compile(unicode(r'\s\]'),re.UNICODE)] = u']' self._correctionList[re.compile(unicode(r'\[\s'),re.UNICODE)] = u'[' self._correctionList[re.compile(unicode(r'\\255'),re.UNICODE)] = u'-' # Hyphen symbols self._correctionList[re.compile(u'\u02D7',re.UNICODE)] = u'-' self._correctionList[re.compile(u'\u0335',re.UNICODE)] = u'-' self._correctionList[re.compile(u'\u0336',re.UNICODE)] = u'-' self._correctionList[re.compile(u'\u2212',re.UNICODE)] = u'-' self._correctionList[re.compile(u'\u002D',re.UNICODE)] = u'-' self._correctionList[re.compile(u'\uFE63',re.UNICODE)] = u'-' self._correctionList[re.compile(u'\uFF0D',re.UNICODE)] = u'-' def clean(self, ln): # Remove double spaces: p_dblSpace = re.compile(unicode(r'\s{2,}'),re.UNICODE) ln = p_dblSpace.sub(u' ', ln) # Correct other bad punctuation: for x in self._correctionList.keys(): ln = x.sub(self._correctionList[x], ln) return ln class PunctuationStripper: """Class to strip punctuation characters from a line & replace them with a space character""" def __init__(self): self._punct = re.compile(unicode(r'[\.\,\;\'\(\)\-]'),re.UNICODE) self._rep = u' ' def strip(self, ln): return self._punct.sub(self._rep, ln) class MultispaceRemover: """Class to remove all occurrences of multiple spaces from a line and replace them with a single space while recording information about their positioning""" def __init__(self): self._spcPtn = re.compile(unicode(r'(\s{2,})'),re.UNICODE) def recordRemove(self, ln): removedSpaces = {} # Records posn of removed multispace & length of truncation fromPos = 0 # Posn in line from which to check for multispaces # Search for multispace: ms_matches = self._spcPtn.finditer(ln) for m in ms_matches: removedSpaces[m.start()] = m.end() - m.start() - 1 ln = self._spcPtn.sub(u' ', ln) # Return a tuple of 2 items: a dictionary containing the removed
multispace info, # and the line itself after the multispaces have been converted to single spaces return (removedSpaces, ln) def getFileList(fname): """Return a list of files to be processed""" flist = [] if os.access(fname, os.R_OK): try: f = open(fname, "r") for line in f: flist.append(line.strip()) f.close() except IOError: return None return flist else: return None def getRecidFilenames(args): files = [] for x in args: items = string.split(x, ":") if len(items) != 2: sys.stderr.write(u"W: Recid:filepath argument invalid. Skipping.\n") continue files.append((items[0],items[1])) return files def main(): myoptions, myargs = getopt.getopt(sys.argv[1:], "hV", ["help","version"]) for o in myoptions: if o[0] in ("-V","--version"): sys.stderr.write("%s\n" % (SystemMessage().getVersionMessage(),)) # Version message and stop sys.exit(0) elif o[0] in ("-h","--help"): sys.stderr.write("%s\n" % (SystemMessage().getHelpMessage(),)) # Help message and stop sys.exit(0) if len(myargs) == 0: sys.stderr.write("%s\n" % (SystemMessage().getHelpMessage(),)) # Help message and stop sys.exit(0) recidfiles = getRecidFilenames(myargs) if len(recidfiles) == 0: sys.stderr.write("%s\n" % (SystemMessage().getHelpMessage(),)) # Help message and stop sys.exit(0) converterList=[PDFtoTextDocumentConverter()] # List of document converters to use titles_kb = KnowledgeBase(fn = cfg_refextract_kb_journal_titles) institutes = InstituteList(fn = cfg_refextract_kb_report_numbers) refSect_processor = ReferenceSectionMarkupProcessor(institutes, titles_kb) openxmltag = u"""<?xml version="1.0" encoding="UTF-8"?>""" opencollectiontag = u"""<collection xmlns="http://www.loc.gov/MARC21/slim">""" closecollectiontag = u"""</collection>\n""" done_coltags = False for curitem in recidfiles: # Perform required processing (according to stages): if not os.access(curitem[1], os.F_OK): # path to file invalid sys.stderr.write("E: File Path %s invalid! 
Ignored.\n" % (curitem[1],)) continue doc = None if len(converterList) < 1: sys.stderr.write("E: No document converter tools available - cannot process reference extraction.\n") sys.exit(1) # Convert file to text: for conv in converterList: doc = conv.convertDocument(curitem[1]) try: if not doc.isEmpty(): break except AttributeError: pass if doc is None: sys.stderr.write("""W: File "%s" cannot be converted to plain-text. Cannot be processed.\n""" % (curitem[1],)) continue # Do "Extract References" Stage try: if doc.isEmpty(): sys.stderr.write("""W: File "%s" appears to be empty or cannot be read-in. Cannot be processed.\n""" % (curitem[1],)) continue except AttributeError: sys.stderr.write("""W: File "%s" appears to be empty or cannot be read-in. Cannot be processed.\n""" % (curitem[1],)) continue if doc is None: sys.stderr.write("""W: File "%s" appears to be empty or cannot be read-in. Cannot be processed.\n""" % (curitem[1],)) continue refSection = doc.extractReferences() if not done_coltags and not refSection.isEmpty(): # Output collection tags: sys.stdout.write("%s\n" % (openxmltag.encode("utf-8"),)) sys.stdout.write("%s\n" % (opencollectiontag.encode("utf-8"),)) done_coltags = True # Do citation title standardisation stage processedReferenceSection = refSect_processor.getProcessedReferenceSection(refSection) ReferenceSectionDisplayer().display(processedReferenceSection, curitem[0]) if done_coltags: sys.stdout.write("%s\n" % (closecollectiontag.encode("utf-8"),)) diff --git a/modules/bibedit/lib/refextract_config.py b/modules/bibedit/lib/refextract_config.py index a262dc317..3958113e4 100644 --- a/modules/bibedit/lib/refextract_config.py +++ b/modules/bibedit/lib/refextract_config.py @@ -1,54 +1,54 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -from config import version, etcdir, pdftotext +from cdsware.config import version, etcdir, pdftotext # version number: cfg_refextract_version = "CDSware/%s refextract/%s" % (version, version) # periodicals knowledge base: cfg_refextract_kb_journal_titles = "%s/bibedit/refextract-journal-titles.kb" % etcdir # report numbers knowledge base: cfg_refextract_kb_report_numbers = "%s/bibedit/refextract-report-numbers.kb" % etcdir # path to pdftotext executable: cfg_refextract_pdftotext = pdftotext ### FIXME. The following are not used in this early release. Do not change them. # Not important in this version: cfg_refextract_cat = "LD_LIBRARY_PATH='/opt/SUNWspro/lib:/usr/openwin/lib:/usr/dt/lib:/usr/local/lib'; export LD_LIBRARY_PATH; /bin/cat" # Again, not important in this version: cfg_refextract_gunzip = "LD_LIBRARY_PATH='/opt/SUNWspro/lib:/usr/openwin/lib:/usr/dt/lib:/usr/local/lib'; export LD_LIBRARY_PATH; /bin/gunzip" # Again not important in this version: cfg_refextract_gs = "/usr/bin/gs" # cfg_refextract_no_citation_treatment: # If no usable citations are found in a line, there are 2 options: # 1) If this flag is set to 0, DO NOT use the standardised version of the line. Instead, strip off the line marker and # mark up the original UNTOUCHED line as miscellaneous text. 
# 2) If this flag is set to 1, mark up the "standardised" version of the line # as Miscellaneous text. This could result in a better formed reference line as titles could be # standardised and corrected, BUT, there is a risk that the line could also be corrupted by # partial title identification for example. cfg_refextract_no_citation_treatment = 0 diff --git a/modules/bibharvest/bin/bibharvest.in b/modules/bibharvest/bin/bibharvest.in index 11edb7a6e..243ed026c 100644 --- a/modules/bibharvest/bin/bibharvest.in +++ b/modules/bibharvest/bin/bibharvest.in @@ -1,274 +1,272 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDSware OAI harvestor.""" __version__ = "$Id$" try: import httplib import urllib import sys import re import string import getopt import time except ImportError, e: print "Error: %s" % e import sys sys.exit(1) try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import adminemail, version, cdsname except ImportError, e: print "Error: %s" % e import sys sys.exit(1) http_response_status_code = { "000" : "Unknown", "100" : "Continue", "200" : "OK", "302" : "Redirect", "403" : "Forbidden", "404" : "Not Found", "500" : "Error", "503" : "Service Unavailable" } def http_param_resume(http_param_dict,resumptionToken): "Change parameter dictionary for harvest resumption" http_param = { 'verb' : http_param_dict['verb'], 'resumptionToken' : resumptionToken } return http_param def http_request_parameters(http_param_dict, method="POST"): "Assembly http request parameters for http method used" params = "" if method == "GET": for key in http_param_dict.keys(): if params: params = "%s&" % (params) if key: params = "%s%s=%s" % (params, key, http_param_dict[key]) elif method == "POST": http_param = {} for key in http_param_dict.keys(): if http_param_dict[key]: http_param[key] = http_param_dict[key] params = urllib.urlencode(http_param) return params def OAI_Session(server, script, http_param_dict ,method="POST",output="", stylesheet=""): "Handle OAi session" sys.stderr.write("Starting the harvesting session at %s" % time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) sys.stderr.write("%s - %s\n" % (server, http_request_parameters(http_param_dict))) a = OAI_Request(server, script, http_request_parameters(http_param_dict, method), method) rt_obj = re.search('>.*</resumptionToken>',a) i = 0 while rt_obj != None and rt_obj !="": if output: write_file( "%s.%07d" % (output,i), a) else: sys.stdout.write(a) i = i + 1 time.sleep(1) http_param_dict = http_param_resume(http_param_dict,rt_obj.group()[1:-18]) a = OAI_Request(server, script, 
http_request_parameters(http_param_dict, method), method) rt_obj = re.search('>.*</resumptionToken>',a) if output: write_file("%s.%07d" % (output,i),a) else: sys.stdout.write(a) def write_file(filename="harvest",a=""): "Writes a to filename" f = open(filename,"w") f.write(a) f.close() def help(): "Print out info" print "\n bibharvest -fhimoprsuv baseURL\n" print " -h print this help" print " -V print version number" print " -o<outputfilename> specify output file" print " -v<verb> OAI verb to be executed" print " -m<method> http method (default POST)" print " -p<metadataPrefix> metadata format" print " -i<identifier> OAI identifier" print " -s<set> OAI set" print " -r<resuptionToken> Resume previous harvest" print " -f<from> from date (datestamp)" print " -u<until> until date (datestamp)\n" def OAI_Request(server, script, params, method="POST"): "Handle OAi request" headers = {"Content-type":"application/x-www-form-urlencoded", "Accept":"text/xml", "From": adminemail, "User-Agent":"CDSware %s" % version} i = 0 while i < 10: i = i + 1 conn = httplib.HTTPConnection(server) if method == "GET": conn.putrequest(method,script + "?" 
+ params) conn.putheader("Content-type","application/x-www-form-urlencoded") conn.putheader("Accept","text/xml") conn.putheader("From", adminemail) conn.putheader("User-Agent", cdsname) conn.endheaders() elif method == "POST": conn.request("POST", script, params, headers) response = conn.getresponse() status = "%d" % response.status if http_response_status_code.has_key(status): sys.stderr.write("%s(%s) : %s : %s\n" % (status, http_response_status_code[status], response.reason, params)) else: sys.stderr.write("%s(%s) : %s : %s\n" % (status, http_response_status_code['000'], response.reason, params)) if response.status == 200: i = 10 data = response.read() conn.close() return data elif response.status == 503: sys.stderr.write("Retry in %d seconds...\n" % string.atoi(response.getheader("Retry-After","%d" % (i*i)))) time.sleep(string.atoi(response.getheader("Retry-After","%d" % (i*i)))) elif response.status == 302: sys.stderr.write("Redirecting...\n") server = response.getheader("Location").split("/")[2] script = "/" + string.join(response.getheader("Location").split("/")[3:],"/") else: sys.stderr.write("Retry in 10 seconds...\n") time.sleep(10) sys.stderr.write("Harvesting interrupted (after 10 attempts) at %s: %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())),params) sys.exit(1) def main(): "Main" try: opts, args = getopt.getopt(sys.argv[1:],"hVo:v:m:p:i:s:f:u:r:x:", [ "help", "version", "output", "verb", "method", "metadataPrefix", "identifier", "set", "from", "until", "resumptionToken" ] ) except getopt.error: help() sys.exit(1) http_param_dict = {} method = "POST" output = "" stylesheet = "" # get options and arguments for opt, opt_value in opts: if opt == "-v": http_param_dict['verb'] = opt_value elif opt == "-m": if opt_value == "GET" or opt_value == "POST": method = opt_value elif opt == "-p": http_param_dict['metadataPrefix'] = opt_value elif opt == "-i": http_param_dict['identifier'] = opt_value elif opt == "-s": http_param_dict['set'] = 
opt_value elif opt == "-f": http_param_dict['from'] = opt_value elif opt == "-u": http_param_dict['until'] = opt_value elif opt == "-r": http_param_dict['resumptionToken'] = opt_value elif opt == "-o": output = opt_value elif opt == "-x": stylesheet = opt_value elif opt in ["-V", "--version"]: print __version__ sys.exit(0) else: help() sys.exit() if len(args) > 0: server = args[0].split("/")[2] script = "/" + string.join(args[0].split("/")[3:],"/") OAI_Session(server, script, http_param_dict, method, output, stylesheet) sys.stderr.write("Harvesting successfully completed at: %s\n\n" % time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) else: help() sys.exit() if __name__ == '__main__': main() diff --git a/modules/bibharvest/bin/oaiharvest.in b/modules/bibharvest/bin/oaiharvest.in index be1744625..82cfdc8f2 100644 --- a/modules/bibharvest/bin/oaiharvest.in +++ b/modules/bibharvest/bin/oaiharvest.in @@ -1,55 +1,53 @@ #!/usr/bin/python2.3 ## -*- mode: python; coding: utf-8; -*- ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware BibHarvest Admin Task. It launches bibharvest periodically by reading table oaiHARVEST. 
Usage: oaiharvest %s [options] Examples: oaiharvest -r arxiv -s 24h oaiharvest -r pubmed -d 2005-05-05:2005-05-10 -t 10m Specific options: -r, --repository=NAME name of the OAI repository to be harvested (default=all) Scheduling options: -u, --user=USER user name to store task, password needed -s, --sleeptime=SLEEP time after which to repeat task (no) e.g.: 1s, 30m, 24h, 7d -t, --time=TIME moment for the task to be active (now) e.g.: +15s, 5m, 3h, 2002-10-27 13:57:26 General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ try: import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.oaiharvestlib import main except ImportError, e: print "Error: %s" % e import sys sys.exit(1) main() diff --git a/modules/bibharvest/lib/bibharvest_templates.py b/modules/bibharvest/lib/bibharvest_templates.py index c840efa55..075425d96 100644 --- a/modules/bibharvest/lib/bibharvest_templates.py +++ b/modules/bibharvest/lib/bibharvest_templates.py @@ -1,229 +1,229 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
import urllib import time import cgi import gettext import traceback import sre import urllib import sys -from config import * -from messages import gettext_set_language, language_list_long +from cdsware.config import * +from cdsware.messages import gettext_set_language, language_list_long class Template: def tmpl_getnavtrail(self, ln, previous): """Get the navigation trail - 'previous' *string* - The previous navtrail""" _ = gettext_set_language(ln) navtrail = """<a class=navtrail href="%s/admin/">Admin Area</a> > <a class=navtrail href="%s/admin/bibharvest/">BibHarvest Admin</a> """ % (weburl, weburl) navtrail = navtrail + previous return navtrail def tmpl_draw_titlebar(self, ln, weburl, title, guideurl, extraname="", extraurl=""): """Draws an html title bar - 'title' *string* - The name of the titlebar - 'weburl' *string* - The general weburl root for this admin section (e.g. admin/bibharvest/guide.html#mi ) - 'guideurl' *string* - The relative url of the guide relative to this section - 'extraname' *string* - The name of an extra function - 'extraurl' *string* - The relative url to an extra function """ _ = gettext_set_language(ln) guidetitle = _("See Guide") titlebar = """ <table class="admin_wvar_nomargin"><th class="adminheader">""" titlebar += """%s   <small>[<a title="%s" href="%s/%s">?</a>]</small>""" % (title, guidetitle, weburl, guideurl) if extraname and extraurl: titlebar += """               <small>[<a href="%s/%s">%s</a>]</small>""" % (weburl, extraurl, extraname) titlebar += """</th></table>""" return titlebar def tmpl_draw_subtitle(self, ln, weburl, title, subtitle, guideurl): """Draws an html title bar - 'title' *string* - The name of the titlebar - 'subtitle' *string* - The header name of the subtitle - 'weburl' *string* - The general weburl root for this admin section (e.g. 
admin/bibharvest/guide.html#mi ) - 'guideurl' *string* - The relative url of the guide relative to this section """ _ = gettext_set_language(ln) guidetitle = _("See Guide") titlebar = """<a name="%s">""" % title titlebar += """ </a>%s   <small>""" % subtitle titlebar += """ [<a title="%s" href="%s/%s">?</a>]</small>""" % (guidetitle, weburl, guideurl) return titlebar def tmpl_link_with_args(self, ln, weburl, funcurl, title, args): """Draws an html title bar - 'weburl' *string* - The general weburl root for this admin section (e.g. admin/bibharvest/guide.html#mi ) - 'funcurl' *string* - The relative url to this section - 'title' *string* - The name of the link - 'args' *list* - The list of arguments to be appended to the url in the form [name, value] """ _ = gettext_set_language(ln) initurl = '<a href="' + weburl + '/' + funcurl endurl = '" title="' + title + '">' + title + '</a>' noargs = len(args) if noargs==0: # there are no arguments, close link and return return initurl + endurl else: # we have args. list them in the link, then close it and return it argsurl = '?' 
count = 1 for arg in args: if count != noargs: argsurl += arg[0] + '=' + arg[1] + '&' else: argsurl += arg[0] + '=' + arg[1] count = count + 1 return initurl+argsurl+endurl def tmpl_output_numbersources(self, ln, numbersources): """Get the navigation trail - 'number of sources' *int* - The number of sources in the database""" _ = gettext_set_language(ln) present = _("OAI sources currently present in the database") notpresent = _("No OAI sources currently present in the database") if (numbersources>0): output = """    <strong><span class="info">%s %s</span></strong><br><br>""" % (numbersources, present) return output else: output = """    <strong><span class="warning">%s</span></strong><br><br>""" % notpresent return output def tmpl_output_schedule(self, ln, schtime, schstatus): _ = gettext_set_language(ln) msg_next = _("Next oaiharvest task") msg_sched = _("scheduled time:") msg_cur = _("current status:") msg_notask = _("No oaiharvest task currently scheduled") if schtime and schstatus: output = """    <strong>%s<br>              - %s %s <br>              - %s %s </strong><br><br>""" % (msg_next, msg_sched, schtime, msg_cur, schstatus) return output else: output = """    <strong><span class="warning">%s</span></strong><br><br>""" % msg_notask return output def tmpl_admin_w200_text(self, ln, title, name, value): """Draws an html w200 text box - 'title' *string* - The name of the textbox - 'name' *string* - The name of the value in the textbox - 'value' *string* - The value in the textbox""" _ = gettext_set_language(ln) text = """<span class="adminlabel">%s""" % title text += """</span><input class="admin_w200" type="text" name="%s" value="%s" /><br>""" % (cgi.escape(name,1), cgi.escape(value, 1)) return text def tmpl_admin_w200_select(self, ln, title, name, valuenil, values, lastval=""): """Draws an html w200 drop-down box - 'title' *string* - The name of the dd box - 'name' *string* - The name of the value in the dd box - 'value' *list* - The values in the 
textbox""" _ = gettext_set_language(ln) text = """<span class="adminlabel">%s""" % title text += """</span><select name="%s" class="admin_w200">""" % name text += """<option value="">%s</option>""" % valuenil try: for val in values: intval = int(lastval) if intval==int(val[0]): ## retrieve and display last value inputted into drop-down box text += """<option value="%s" %s>%s</option>""" % (val[0], 'selected="selected"', str(val[1])) else: text += """<option value="%s">%s</option>""" % (val[0], str(val[1])) text += """</select><br>""" except StandardError, e: for val in values: if lastval==val[0]: text += """<option value="%s" %s>%s</option>""" % (val[0], 'selected="selected"', str(val[1])) else: text += """<option value="%s">%s</option>""" % (val[0], str(val[1])) text += """</select><br>""" return text ### deprecated ### # def tmpl_print_src_add_tips(self): # """Outputs some tips for source adding and editing""" # text = """<br> # <small><strong>Frequency</strong>: how often do you intend to harvest from this repository?</small><br><br> # <small><strong>Starting date</strong>: do you intend to harvest the whole repository (from beginning) or only newly added material (from today)? <strong>WARNING</strong>: harvesting large collections of material may take very long!</small><br><br> # <small><strong>Postprocess</strong>: how is the harvested material be dealt with after harvesting? 
# <ul><li>h: harvest only<li>h-c: harvest and convert<li>h-c-u: harvest, convert and upload</ul></small> # <small><strong>BibConvert configuration file</strong>: if postprocess involves conversion, please include full path to a configuration file</small> # """ # return text ### deprecated ### # def tmpl_print_src_edit_tips(self): # """Outputs some tips for source adding and editing""" # text = """<br> # <small><strong>Frequency</strong>: how often do you intend to harvest from this repository?</small><br><br> # <small><strong>Postprocess</strong>: how is the harvested material be dealt with after harvesting? # <ul><li>h: harvest only<li>h-c: harvest and convert<li>h-c-u: harvest, convert and upload</ul></small> # <small><strong>BibConvert configuration file</strong>: if postprocess involves conversion, please include full path to a configuration file</small> # """ # return text ### deprecated ### # def tmpl_print_validate_tips(self): # """Outputs some tips for source validation""" # text = """<br><small><strong>Validate</strong>: to check that the baseURL of the repository is OAI-compliant</small>""" # return text def tmpl_print_info(self, ln, infotext): """Outputs some info""" _ = gettext_set_language(ln) text = """<br><b><span class="info">%s</span></b>""" % infotext return text def tmpl_print_warning(self, ln, warntext): """Outputs some info""" _ = gettext_set_language(ln) text = """<span class="warning">%s</span>""" % warntext return text def tmpl_print_brs(self, ln, howmany): """Outputs some <br>s""" _ = gettext_set_language(ln) text = "" while howmany>>0: text += """<br>""" howmany = howmany - 1 return text def tmpl_output_validate_info(self, ln, outcome, base): """Prints a message to say whether source was validated or not - 'outcome' *int* - 0=success, 1=fail - 'base' *string* - baseurl""" _ = gettext_set_language(ln) msg_success = _("successfully validated!") msg_nosuccess = _("does not seem to be a OAI-compliant baseURL...") if (outcome==0): output = 
"""<BR><span class="info">baseURL <strong>%s</strong> %s</span>""" % (base, msg_success) return output else: output = """<BR><span class="info">baseURL <strong>%s</strong> %s</span>""" % (base, msg_nosuccess) return output diff --git a/modules/bibharvest/lib/bibharvestadminlib.py b/modules/bibharvest/lib/bibharvestadminlib.py index 25b7e3e47..e986ef4fa 100644 --- a/modules/bibharvest/lib/bibharvestadminlib.py +++ b/modules/bibharvest/lib/bibharvestadminlib.py @@ -1,440 +1,441 @@ ## $Id$ ## Administrator interface for BibIndex ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDSware Bibharvest Administrator Interface.""" import cgi import re import MySQLdb import Numeric import os, sys, string import ConfigParser import time import random import urllib import sre -from bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps,serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array,serialize_via_marshal,deserialize_via_marshal -from dbquery import run_sql -from messages import * -from config import * -from webpage import page, pageheaderonly, pagefooteronly -from webuser import getUid, get_email from mod_python import apache -import template -bibharvest_templates = template.load('bibharvest') +from cdsware.bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps,serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array,serialize_via_marshal,deserialize_via_marshal +from cdsware.dbquery import run_sql +from cdsware.messages import * +from cdsware.config import * +from cdsware.webpage import page, pageheaderonly, pagefooteronly +from cdsware.webuser import getUid, get_email + +import cdsware.template +bibharvest_templates = cdsware.template.load('bibharvest') tmppath = tmpdir + '/bibharvestadmin.' 
+ str(os.getpid()) guideurl = "admin/bibharvest/guide.html" freqs = [[0, "never"], [24, "daily"], [168, "weekly"], [720, "monthly"] ] posts = [["h", "harvest only (h)"], ["h-c", "harvest and convert (h-c)"], ["h-u", "harvest and upload (h-u)"], ["h-c-u", "harvest, convert and upload (h-c-u)"]] dates = [[0, "from beginning"], [1, "from today"]] __version__ = "$Id$" def getnavtrail(previous = ''): """Get the navtrail""" return bibharvest_templates.tmpl_getnavtrail(previous = previous, ln = cdslang) def perform_request_index(ln=cdslang): """start area for administering harvesting from OAI repositories""" titlebar = bibharvest_templates.tmpl_draw_titlebar(ln = cdslang, weburl = weburl, title = "Overview of sources", guideurl = guideurl, extraname = "add new OAI source" , extraurl = "admin/bibharvest/bibharvestadmin.py/addsource" ) titlebar2 = bibharvest_templates.tmpl_draw_titlebar(ln = cdslang, weburl = weburl, title = "Harvesting status", guideurl = guideurl) header = ['name', 'baseURL', 'metadataprefix', 'frequency', 'bibconvertfile', 'postprocess', 'actions'] header2 = ['name', 'last update'] oai_src = get_oai_src() upd_status = get_update_status() sources = [] for (oai_src_id,oai_src_name,oai_src_baseurl,oai_src_prefix,oai_src_frequency,oai_src_config,oai_src_post) in oai_src: namelinked_args = [] namelinked_args.append(["oai_src_id", str(oai_src_id)]) namelinked_args.append(["ln", ln]) namelinked = bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/editsource", title = oai_src_name, args = namelinked_args) freq = "Not Set" if oai_src_frequency==0: freq = "never" elif oai_src_frequency==24: freq = "daily" elif oai_src_frequency==168: freq = "weekly" elif oai_src_frequency==720: freq = "monthly" editACTION = bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/editsource", title = "edit", args = namelinked_args) delACTION = 
bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/delsource", title = "delete", args = namelinked_args) action = editACTION + " / " + delACTION sources.append([namelinked,oai_src_baseurl,oai_src_prefix,freq,oai_src_config,oai_src_post, action]) updates = [] for (upd_name, upd_status) in upd_status: if not upd_status: upd_status = bibharvest_templates.tmpl_print_warning(cdslang, "Never harvested") else: #cut away leading zeros upd_status = sre.sub(r'\.[0-9]+$', '', str(upd_status)) updates.append([upd_name, upd_status]) (schtime, schstatus) = get_next_schedule() if schtime: schtime = sre.sub(r'\.[0-9]+$', '', str(schtime)) output = titlebar output += bibharvest_templates.tmpl_output_numbersources(cdslang, get_tot_oai_src()) output += tupletotable(header=header, tuple=sources) output += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += titlebar2 output += bibharvest_templates.tmpl_output_schedule(cdslang, schtime, str(schstatus)) output += tupletotable(header=header2, tuple=updates) return output def perform_request_editsource(oai_src_id, oai_src_name='', oai_src_baseurl='', oai_src_prefix='', oai_src_frequency='', oai_src_config='', oai_src_post='',ln=cdslang, confirm=-1): """creates html form to edit a OAI source. this method is calling other methods which again is calling this and sending back the output of the method. 
confirm - determines the validation status of the data input into the form""" output = "" subtitle = bibharvest_templates.tmpl_draw_subtitle(ln = cdslang, weburl = weburl, title = "edit source", subtitle = "Edit OAI source", guideurl = guideurl) if confirm in [-1, "-1"]: oai_src = get_oai_src(oai_src_id) oai_src_name = oai_src[0][1] oai_src_baseurl = oai_src[0][2] oai_src_prefix = oai_src[0][3] oai_src_frequency = oai_src[0][4] oai_src_config = oai_src[0][5] oai_src_post = oai_src[0][6] text = bibharvest_templates.tmpl_print_brs(cdslang, 1) text += bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Source name", name = "oai_src_name", value = oai_src_name) text += bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Base URL", name = "oai_src_baseurl", value = oai_src_baseurl) text += bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Metadata prefix", name = "oai_src_prefix", value = oai_src_prefix) text += bibharvest_templates.tmpl_admin_w200_select(ln = cdslang, title = "Frequency", name = "oai_src_frequency", valuenil = "- select frequency -" , values = freqs, lastval = oai_src_frequency) text += bibharvest_templates.tmpl_admin_w200_select(ln = cdslang, title = "Postprocess", name = "oai_src_post", valuenil = "- select mode -" , values = posts, lastval = oai_src_post) text += bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Bibconvert configuration file", name = "oai_src_config", value = oai_src_config) text += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += createhiddenform(action="editsource#1", text=text, button="Modify", oai_src_id=oai_src_id, ln=ln, confirm=1) if confirm in [1, "1"] and not oai_src_name: output += bibharvest_templates.tmpl_print_info(cdslang, "Please enter a name for the source.") elif confirm in [1, "1"] and not oai_src_prefix: output += bibharvest_templates.tmpl_print_info(cdslang, "Please enter a metadata prefix.") elif confirm in [1, "1"] and not oai_src_baseurl: output 
+= bibharvest_templates.tmpl_print_info(cdslang, "Please enter a base url.") elif confirm in [1, "1"] and not oai_src_frequency: output += bibharvest_templates.tmpl_print_info(cdslang, "Please choose a frequency of harvesting") elif confirm in [1, "1"] and not oai_src_post: output += bibharvest_templates.tmpl_print_info(cdslang, "Please choose a postprocess mode") elif confirm in [1, "1"] and (oai_src_post=="h-c" or oai_src_post=="h-c-u") and (not oai_src_config or validatefile(oai_src_config)!=0): output += bibharvest_templates.tmpl_print_info(cdslang, "You selected a postprocess mode which involves conversion.") output += bibharvest_templates.tmpl_print_info(cdslang, "Please enter a valid full path to a bibConvert config file or change postprocess mode.") elif oai_src_id > -1 and confirm in [1, "1"]: if not oai_src_frequency: oai_src_frequency = 0 if not oai_src_config: oai_src_config = "NULL" if not oai_src_post: oai_src_post = "h" res = modify_oai_src(oai_src_id, oai_src_name, oai_src_baseurl, oai_src_prefix, oai_src_frequency, oai_src_config, oai_src_post) output += write_outcome(res) lnargs = [["ln", ln]] output += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/index", title = "Go back to the OAI sources overview", args = lnargs ) try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle, body) def perform_request_addsource(oai_src_name, oai_src_baseurl='', oai_src_prefix='', oai_src_frequency='', oai_src_lastrun='', oai_src_config='', oai_src_post='', ln=cdslang, confirm=-1): """creates html form to add a new source""" subtitle = bibharvest_templates.tmpl_draw_subtitle(ln = cdslang, weburl = weburl, title = "add source", subtitle = "Add new OAI source", guideurl = guideurl) output = "" if confirm <= -1: text = bibharvest_templates.tmpl_print_brs(cdslang, 1) text += 
bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Enter the base url", name = "oai_src_baseurl", value = oai_src_baseurl) output = createhiddenform(action="addsource", text=text, ln=ln, button="Validate", confirm=0) if confirm not in ["-1", -1] and validate(oai_src_baseurl)==1: lnargs = [["ln", ln]] output += bibharvest_templates.tmpl_output_validate_info(cdslang, 1, str(oai_src_baseurl)) output += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/addsource", title = "Try again", args = []) output += bibharvest_templates.tmpl_print_brs(cdslang, 1) output += """or""" output += bibharvest_templates.tmpl_print_brs(cdslang, 1) output += bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/index", title = "Go back to the OAI sources overview", args = lnargs) if confirm not in ["-1", -1] and validate(oai_src_baseurl)==0: output += bibharvest_templates.tmpl_output_validate_info(cdslang, 0, str(oai_src_baseurl)) output += bibharvest_templates.tmpl_print_brs(cdslang, 2) text = bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Source name", name = "oai_src_name", value = oai_src_name) metadatas = findMetadataFormats(oai_src_baseurl) prefixes = [] for value in metadatas: prefixes.append([value, str(value)]) text += bibharvest_templates.tmpl_admin_w200_select(ln = cdslang, title = "Metadata prefix", name = "oai_src_prefix", valuenil = "- select prefix -" , values = prefixes, lastval = oai_src_prefix) text += bibharvest_templates.tmpl_admin_w200_select(ln = cdslang, title = "Frequency", name = "oai_src_frequency", valuenil = "- select frequency -" , values = freqs, lastval = oai_src_frequency) text += bibharvest_templates.tmpl_admin_w200_select(ln = cdslang, title = "Starting date", name = "oai_src_lastrun", valuenil = "- select a date -" , values = dates, 
lastval = oai_src_lastrun) text += bibharvest_templates.tmpl_admin_w200_select(ln = cdslang, title = "Postprocess", name = "oai_src_post", valuenil = "- select mode -" , values = posts, lastval = oai_src_post) text += bibharvest_templates.tmpl_admin_w200_text(ln = cdslang, title = "Bibconvert configuration file", name = "oai_src_config", value = oai_src_config) text += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += createhiddenform(action="addsource#1", text=text, button="Add OAI Source", oai_src_baseurl=oai_src_baseurl, ln=ln, confirm=1) if confirm in [1, "1"] and not oai_src_name: output += bibharvest_templates.tmpl_print_info(cdslang, "Please enter a name for the source.") elif confirm in [1, "1"] and not oai_src_prefix: output += bibharvest_templates.tmpl_print_info(cdslang, "Please enter a metadata prefix.") elif confirm in [1, "1"] and not oai_src_frequency: output += bibharvest_templates.tmpl_print_info(cdslang, "Please choose a frequency of harvesting") elif confirm in [1, "1"] and not oai_src_lastrun: output += bibharvest_templates.tmpl_print_info(cdslang, "Please choose the harvesting starting date") elif confirm in [1, "1"] and not oai_src_post: output += bibharvest_templates.tmpl_print_info(cdslang, "Please choose a postprocess mode") elif confirm in [1, "1"] and (oai_src_post=="h-c" or oai_src_post=="h-c-u") and (not oai_src_config or validatefile(oai_src_config)!=0): output += bibharvest_templates.tmpl_print_info(cdslang, "You selected a postprocess mode which involves conversion.") output += bibharvest_templates.tmpl_print_info(cdslang, "Please enter a valid full path to a bibConvert config file or change postprocess mode.") elif oai_src_name and confirm in [1, "1"]: if not oai_src_frequency: oai_src_frequency = 0 if not oai_src_lastrun: oai_src_lastrun = 1 if not oai_src_config: oai_src_config = "NULL" if not oai_src_post: oai_src_post = "h" res = add_oai_src(oai_src_name, oai_src_baseurl, oai_src_prefix, oai_src_frequency, 
oai_src_lastrun, oai_src_config, oai_src_post) output += write_outcome(res) lnargs = [["ln", ln]] output += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/index", title = "Go back to the OAI sources overview", args = lnargs ) try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle, body) def perform_request_delsource(oai_src_id, ln=cdslang, callback='yes', confirm=0): """creates html form to delete a source """ if oai_src_id: oai_src = get_oai_src(oai_src_id) namesrc = (oai_src[0][1]) pagetitle = """Delete OAI source: %s""" % namesrc subtitle = bibharvest_templates.tmpl_draw_subtitle(ln = cdslang, weburl = weburl, title = "delete source", subtitle = pagetitle, guideurl = guideurl) output = "" if confirm in ["0", 0]: if oai_src: question = """Do you want to delete the OAI source '%s' and all its definitions?""" % namesrc text = bibharvest_templates.tmpl_print_info(cdslang, question) text += bibharvest_templates.tmpl_print_brs(cdslang, 3) output += createhiddenform(action="delsource#5", text=text, button="Confirm", oai_src_id=oai_src_id, confirm=1) else: return bibharvest_templates.tmpl_print_info(cdslang, "Source specified does not exist.") elif confirm in ["1", 1]: res = delete_oai_src(oai_src_id) if res[0] == 1: output += bibharvest_templates.tmpl_print_info(cdslang, "Source removed.") output += bibharvest_templates.tmpl_print_brs(cdslang, 1) output += write_outcome(res) else: output += write_outcome(res) lnargs = [["ln", ln]] output += bibharvest_templates.tmpl_print_brs(cdslang, 2) output += bibharvest_templates.tmpl_link_with_args(ln = cdslang, weburl = weburl, funcurl = "admin/bibharvest/bibharvestadmin.py/index", title = "Go back to the OAI sources overview", args = lnargs ) try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle, body) 
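The field checks in `perform_request_addsource` form an `elif` chain, so only the first missing field is reported to the admin. The sketch below expresses the same ordering as a table. It assumes the form values arrive as a plain `dict`. `first_missing_field`, `REQUIRED_FIELDS` and `CONVERSION_MODES` are hypothetical names that do not exist in CDSware, and the file check only approximates `validatefile()` (an existing, non-empty file).

```python
import os

# Hypothetical table mirroring the order of checks in perform_request_addsource:
# the first empty field in this list determines the message shown.
REQUIRED_FIELDS = [
    ("oai_src_name", "Please enter a name for the source."),
    ("oai_src_prefix", "Please enter a metadata prefix."),
    ("oai_src_frequency", "Please choose a frequency of harvesting"),
    ("oai_src_lastrun", "Please choose the harvesting starting date"),
    ("oai_src_post", "Please choose a postprocess mode"),
]

# Postprocess modes that involve a bibconvert step (taken from the original check).
CONVERSION_MODES = ("h-c", "h-c-u")


def first_missing_field(form):
    """Return the message for the first invalid field, or None if the form is complete."""
    for field, message in REQUIRED_FIELDS:
        if not form.get(field):
            return message
    if form.get("oai_src_post") in CONVERSION_MODES:
        config = form.get("oai_src_config", "")
        # Approximation of validatefile(): the path must name a non-empty regular file.
        if not config or not os.path.isfile(config) or os.path.getsize(config) == 0:
            return ("Please enter a valid full path to a bibConvert config file "
                    "or change postprocess mode.")
    return None


if __name__ == "__main__":
    form = {"oai_src_name": "arXiv", "oai_src_prefix": "oai_dc"}
    print(first_missing_field(form))
```

Keeping the checks in a list makes the reporting order explicit and lets a new mandatory field be added in one place instead of extending the `elif` chain.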
################################################################## ### Here the functions to retrieve, modify, delete and add sources ################################################################## def get_oai_src(oai_src_id=''): """Returns a row parameters for a given id""" sql = "SELECT id,name,baseurl,metadataprefix,frequency,bibconvertcfgfile,postprocess FROM oaiHARVEST" try: if oai_src_id: sql += " WHERE id=%s" % oai_src_id sql += " ORDER BY id asc" res = run_sql(sql) return res except StandardError, e: return "" def modify_oai_src(oai_src_id, oai_src_name, oai_src_baseurl, oai_src_prefix, oai_src_frequency, oai_src_config, oai_src_post): """Modifies a row's parameters""" try: sql = "UPDATE oaiHARVEST SET name='%s' WHERE id=%s" % (MySQLdb.escape_string(oai_src_name), oai_src_id) res = run_sql(sql) sql = "UPDATE oaiHARVEST SET baseurl='%s' WHERE id=%s" % (MySQLdb.escape_string(oai_src_baseurl), oai_src_id) res = run_sql(sql) sql = "UPDATE oaiHARVEST SET metadataprefix='%s' WHERE id=%s" % (MySQLdb.escape_string(oai_src_prefix), oai_src_id) res = run_sql(sql) sql = "UPDATE oaiHARVEST SET frequency='%s' WHERE id=%s" % (MySQLdb.escape_string(oai_src_frequency), oai_src_id) res = run_sql(sql) sql = "UPDATE oaiHARVEST SET bibconvertcfgfile='%s' WHERE id=%s" % (MySQLdb.escape_string(oai_src_config), oai_src_id) res = run_sql(sql) sql = "UPDATE oaiHARVEST SET postprocess='%s' WHERE id=%s" % (MySQLdb.escape_string(oai_src_post), oai_src_id) res = run_sql(sql) return (1, "") except StandardError, e: return (0, e) def add_oai_src(oai_src_name, oai_src_baseurl, oai_src_prefix, oai_src_frequency, oai_src_lastrun, oai_src_config, oai_src_post): """Adds a new row to the database with the given parameters""" try: if oai_src_lastrun in [0, "0"]: lastrun_mode = 'NULL' else: lastrun_mode = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) # lastrun_mode = "'"+lastrun_mode+"'" sql = "insert into oaiHARVEST values (0, '%s', '%s', NULL, NULL, '%s', '%s', '%s', '%s', '%s')" % 
(MySQLdb.escape_string(oai_src_baseurl), MySQLdb.escape_string(oai_src_prefix), MySQLdb.escape_string(oai_src_config), MySQLdb.escape_string(oai_src_name), MySQLdb.escape_string(lastrun_mode), MySQLdb.escape_string(oai_src_frequency), MySQLdb.escape_string(oai_src_post)) res = run_sql(sql) return (1, "") except StandardError, e: return (0, e) def delete_oai_src(oai_src_id): """Deletes a row from the database according to its id""" try: res = run_sql("DELETE FROM oaiHARVEST WHERE id=%s" % oai_src_id) return (1, "") except StandardError, e: return (0, e) def get_tot_oai_src(): """Returns number of rows in the database""" try: sql = "select count(*) FROM oaiHARVEST" res = run_sql(sql) return res[0][0] except StandardError, e: return "" def get_update_status(): """Returns a table showing a list of all rows and their LastUpdate status""" try: sql = "select name,lastrun from oaiHARVEST order by lastrun desc" res = run_sql(sql) return res except StandardError, e: return "" def get_next_schedule(): """Returns the next scheduled oaiharvest task""" try: sql = "select runtime,status from schTASK where proc='oaiharvest' and runtime > now() ORDER by runtime limit 1" res = run_sql(sql) if len(res) > 0: return res[0] else: return ("", "") except StandardError, e: return ("","") def validate(oai_src_baseurl): """This function validates a baseURL by opening its URL and 'grepping' for the <OAI-PMH> and <Identify> tags: 0 = okay 1 = baseURL non existing """ try: url = oai_src_baseurl + "?verb=Identify" urllib.urlretrieve(url, tmppath) grepOUT1 = os.popen('grep -iwc "<identify>" '+tmppath).read() grepOUT2 = os.popen('grep -iwc "<OAI-PMH" '+tmppath).read() if int(grepOUT1) > 0 and int(grepOUT2) > 0: #print "Valid!" return 0 else: #print "Not valid!"
return 1 except StandardError, e: return 1 def validatefile(oai_src_config): """This function checks whether the given path points to an existing, non-empty text file 0 = okay 1 = file non existing """ try: ftmp = open(oai_src_config, 'r') cfgstr= ftmp.read() ftmp.close() if cfgstr!="": #print "Valid!" return 0 else: #print "Not valid!" return 1 except StandardError, e: return 1 def findMetadataFormats(oai_src_baseurl): """This function finds the Metadata formats offered by an OAI repository by analysing the output of verb=ListMetadataFormats""" url = oai_src_baseurl + "?verb=ListMetadataFormats" urllib.urlretrieve(url, tmppath) formats = [] ftmp = open(tmppath, 'r') xmlstr= ftmp.read() ftmp.close() chunks = xmlstr.split('<metadataPrefix>') count = 0 # first chunk is invalid for chunk in chunks: if count!=0: formats.append(chunk.split('</metadataPrefix>')[0]) count = count + 1 return formats diff --git a/modules/bibharvest/lib/oai_repository.py b/modules/bibharvest/lib/oai_repository.py index 5b8796bc5..9b68e7f63 100644 --- a/modules/bibharvest/lib/oai_repository.py +++ b/modules/bibharvest/lib/oai_repository.py @@ -1,885 +1,885 @@ ## $Id$ ## OAI interface for CDSware/MySQL written in Python compliant with OAI-PMH2.0 ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details.
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """OAI interface for CDSware/MySQL written in Python compliant with OAI-PMH2.0""" import cPickle import string from string import split import os import re import urllib import sys import time import md5 -from oai_repository_config import * -from config import * -from dbquery import run_sql +from cdsware.oai_repository_config import * +from cdsware.config import * +from cdsware.dbquery import run_sql verbs = { "Identify" : [""], "ListSets" : ["resumptionToken"], "ListMetadataFormats" : ["resumptionToken"], "ListRecords" : ["resumptionToken"], "ListIdentifiers" : ["resumptionToken"], "GetRecord" : [""] } params = { "verb" : ["Identify","ListIdentifiers","ListSets","ListMetadataFormats","ListRecords","GetRecord"], "metadataPrefix" : ["","oai_dc","marcxml"], "from" :[""], "until":[""], "set" :[""], "identifier": [""] } def encode_for_xml(strxml): "Encode special chars in string for XML-compliancy." if strxml == None: return strxml else: strxml = string.replace(strxml, '&', '&amp;') strxml = string.replace(strxml, '<', '&lt;') return strxml def escape_space(strxml): "Encode special chars in string for URL-compliancy." strxml = string.replace(strxml, ' ', '%20') return strxml def encode_for_url(strxml): "Encode special chars in string for URL-compliancy."
strxml = string.replace(strxml, '%', '%25') strxml = string.replace(strxml, ' ', '%20') strxml = string.replace(strxml, '?', '%3F') strxml = string.replace(strxml, '#', '%23') strxml = string.replace(strxml, '=', '%3D') strxml = string.replace(strxml, '&', '%26') strxml = string.replace(strxml, '/', '%2F') strxml = string.replace(strxml, ':', '%3A') strxml = string.replace(strxml, ';', '%3B') strxml = string.replace(strxml, '+', '%2B') return strxml def oai_header(args, verb): "Print OAI header" out = "" out = out + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + "\n" out = out + "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd\">\n" out = out + " <responseDate>" + oaigetresponsedate() + "</responseDate>\n" if verb: out = out + " <request verb=\"%s\">%s</request>\n" % (verb, oaigetrequesturl(args)) out = out + " <%s>\n" % verb else: out = out + " <request>%s</request>\n" % (oaigetrequesturl(args)) return out def oai_footer(verb): "Print OAI footer" out = "" if verb: out = "%s </%s>\n" % (out, verb) out = out + "</OAI-PMH>\n" return out def oai_error_header(args, verb): "Print OAI header" out = "" ### out = "Content-Type: text/xml\n\n" out = out + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + "\n" out = out + "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd\">\n" out = out + " <responseDate>" + oaigetresponsedate() + "</responseDate>\n" out = out + " <request verb=\"%s\">%s</request>\n" % (verb, oaigetrequesturl(args)) return out def oai_error_footer(verb): "Print OAI footer" out = verb out = "</OAI-PMH>\n" return out def get_field(sysno, field): "Gets list of field 'field' for the record with 'sysno' system number." 
out = [] digit = field[0:2] bibbx = "bib%sx" % digit bibx = "bibrec_bib%sx" % digit query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag='%s'" % (bibbx, bibx, sysno, field) res = run_sql(query) for row in res: out.append(row[0]) return out def utc_to_localtime(date): "Convert UTC to localtime" ldate = date.split("T")[0] ltime = date.split("T")[1] lhour = ltime.split(":")[0] lminute = ltime.split(":")[1] lsec = ltime.split(":")[2] lyear = ldate.split("-")[0] lmonth = ldate.split("-")[1] lday = ldate.split("-")[2] timetoconvert = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.mktime((string.atoi(lyear), string.atoi(lmonth), string.atoi(lday), string.atoi(lhour), string.atoi(lminute), string.atoi(lsec[:-1]), 0, 0, -1)) - time.timezone + (time.daylight)*3600)) return timetoconvert def localtime_to_utc(date): "Convert localtime to UTC" ldate = date.split(" ")[0] ltime = date.split(" ")[1] lhour = ltime.split(":")[0] lminute = ltime.split(":")[1] lsec = ltime.split(":")[2] lyear = ldate.split("-")[0] lmonth = ldate.split("-")[1] lday = ldate.split("-")[2] timetoconvert = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(time.mktime((string.atoi(lyear), string.atoi(lmonth), string.atoi(lday), string.atoi(lhour), string.atoi(lminute), string.atoi(lsec), 0, 0, -1)))) return timetoconvert def get_creation_date(sysno): "Returns the creation date of the record 'sysno'." out = "" res = run_sql("SELECT DATE_FORMAT(creation_date, '%%Y-%%m-%%d %%H:%%i:%%s') FROM bibrec WHERE id=%s", (sysno,), 1) if res[0][0]: out = localtime_to_utc(res[0][0]) return out def get_modification_date(sysno): "Returns the date of last modification for the record 'sysno'." 
out = "" res = run_sql("SELECT DATE_FORMAT(modification_date,'%%Y-%%m-%%d %%H:%%i:%%s') FROM bibrec WHERE id=%s", (sysno,), 1) if res[0][0]: out = localtime_to_utc(res[0][0]) return out def get_earliest_datestamp(): "Get earliest datestamp in the database" out = "" res = run_sql("SELECT MIN(DATE_FORMAT(creation_date,'%%Y-%%m-%%d %%H:%%i:%%s')) FROM bibrec", (), 1) if res[0][0]: out = localtime_to_utc(res[0][0]) return out def check_date(date, dtime="T00:00:00Z"): "Check if the date has a correct format" if(re.sub("[0123456789\-:TZ]", "", date) == ""): if len(date) == 10: date = date + dtime if len(date) == 20: date = utc_to_localtime(date) else: date = "" else: date = "" return date def record_exists(sysno): "Returns 1 if record with SYSNO 'sysno' exists. Returns 0 otherwise." out = 0 query = "SELECT id FROM bibrec WHERE id='%s'" % (sysno) res = run_sql(query) for row in res: if row[0] != "": out = 1 return out def print_record(sysno, format='marcxml'): "Prints record 'sysno' formatted according to 'format'."
out = "" # sanity check: if not record_exists(sysno): return if (format == "dc") or (format == "oai_dc"): format = "xd" # print record opening tags: out = out + " <record>\n" if is_deleted(sysno) and oaideleted != "no": out = out + " <header status=\"deleted\">\n" else: out = out + " <header>\n" for ident in get_field(sysno, oaiidfield): out = "%s <identifier>%s</identifier>\n" % (out, escape_space(ident)) out = "%s <datestamp>%s</datestamp>\n" % (out, get_modification_date(sysno)) for set in get_field(sysno, oaisetfield): out = "%s <setSpec>%s</setSpec>\n" % (out, set) out = out + " </header>\n" if is_deleted(sysno) and oaideleted != "no": pass else: out = out + " <metadata>\n" if format == "marcxml": out = out + " <record xmlns=\"http://www.loc.gov/MARC21/slim\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd\" type=\"Bibliographic\">" out = out + " <leader>00000coc 2200000uu 4500</leader>" ## MARC21 and XML formats, possibly OAI -- they are not in "bibfmt" table; so fetch all the data from "bibXXx" tables: if format == "marcxml": out = "%s <controlfield tag=\"001\">%d</controlfield>\n" % (out, int(sysno)) for digit1 in range(0, 10): for digit2 in range(0, 10): bibbx = "bib%d%dx" % (digit1, digit2) bibx = "bibrec_bib%d%dx" % (digit1, digit2) query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\ "WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '%s%%' "\ "ORDER BY bb.field_number, b.tag ASC" % (bibbx, bibx, sysno, str(digit1)+str(digit2)) res = run_sql(query) field_number_old = -999 field_old = "" for row in res: field, value, field_number = row[0], row[1], row[2] ind1, ind2 = field[3], field[4] if ind1 == "_": ind1 = " " if ind2 == "_": ind2 = " " # print field tag if field_number != field_number_old or field[:-1] != field_old[:-1]: if format == "marcxml": if field_number_old != -999: out = out + " </datafield>\n"
out = "%s <datafield tag=\"%s\" ind1=\"%s\" ind2=\"%s\">\n" % (out, encode_for_xml(field[0:3]), encode_for_xml(ind1).lower(), encode_for_xml(ind2).lower()) field_number_old = field_number field_old = field # print subfield value if format == "marcxml": value = encode_for_xml(value) out = "%s <subfield code=\"%s\">%s</subfield>\n" % (out, encode_for_xml(field[-1:]), value) # fetch next subfield # all fields/subfields printed in this run, so close the tag: if (format == "marcxml") and field_number_old != -999: out = out + " </datafield>\n" out = out + " </record>\n" elif format == "xd": # XML Dublin Core format, possibly OAI -- select only some bibXXx fields: out = out + " <oaidc:dc xmlns=\"http://purl.org/dc/elements/1.1/\" xmlns:oaidc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">\n" for field_ in get_field(sysno, "041__a"): out = "%s <language>%s</language>\n" % (out, field_) for field_ in get_field(sysno, "100__a"): out = "%s <creator>%s</creator>\n" % (out, encode_for_xml(field_)) for field_ in get_field(sysno, "700__a"): out = "%s <creator>%s</creator>\n" % (out, encode_for_xml(field_)) for field_ in get_field(sysno, "245__a"): out = "%s <title>%s</title>\n" % (out, encode_for_xml(field_)) for field_ in get_field(sysno, "111__a"): out = "%s <title>%s</title>\n" % (out, encode_for_xml(field_)) for field_ in get_field(sysno, "65017a"): out = "%s <subject>%s</subject>\n" % (out, encode_for_xml(field_)) for field_ in get_field(sysno, "8564_u"): out = "%s <identifier>%s</identifier>\n" % (out, encode_for_xml(escape_space(field_))) for field_ in get_field(sysno, "520__a"): out = "%s <description>%s</description>\n" % (out, encode_for_xml(field_)) date = get_creation_date(sysno) out = "%s <date>%s</date>\n" % (out, date) out = out + " </oaidc:dc>\n" # print record closing tags: out = out + " </metadata>\n" out = out + " </record>\n" return out def oailistmetadataformats(args): "Generates response to oailistmetadataformats verb."
arg = parse_args(args) out = "" flag = 1 # list or not depending on identifier if arg['identifier'] != "": flag = 0 sysno = oaigetsysno(arg['identifier']) if record_exists(sysno): flag = 1 else: out = out + oai_error("idDoesNotExist","invalid record Identifier") out = oai_error_header(args, "ListMetadataFormats") + out + oai_error_footer("ListMetadataFormats") return out if flag: out = out + " <metadataFormat>\n" out = out + " <metadataPrefix>oai_dc</metadataPrefix>\n" out = out + " <schema>http://www.openarchives.org/OAI/1.1/dc.xsd</schema>\n" out = out + " <metadataNamespace>http://purl.org/dc/elements/1.1/</metadataNamespace>\n" out = out + " </metadataFormat>\n" out = out + " <metadataFormat>\n" out = out + " <metadataPrefix>marcxml</metadataPrefix>\n" out = out + " <schema>http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd</schema>\n" out = out + " <metadataNamespace>http://www.loc.gov/MARC21/slim</metadataNamespace>\n" out = out + " </metadataFormat>\n" out = oai_header(args, "ListMetadataFormats") + out + oai_footer("ListMetadataFormats") return out def oailistrecords(args): "Generates response to oailistrecords verb." arg = parse_args(args) out = "" sysnos = [] sysno = [] # check if the resumptionToken did not expire if arg['resumptionToken']: filename = "%s/RTdata/%s" % (logdir, arg['resumptionToken']) if os.path.exists(filename) == 0: out = oai_error("badResumptionToken", "ResumptionToken expired") out = oai_error_header(args, "ListRecords") + out + oai_error_footer("ListRecords") return out if arg['resumptionToken'] != "": sysnos = oaicacheout(arg['resumptionToken']) arg['metadataPrefix'] = sysnos.pop() else: sysnos = oaigetsysnolist(arg['set'], arg['from'], arg['until']) if len(sysnos) == 0: # noRecordsMatch error out = out + oai_error("noRecordsMatch", "no records correspond to the request") out = oai_error_header(args, "ListRecords") + out + oai_error_footer("ListRecords") return out i = 0 for sysno_ in sysnos: if sysno_: i = i + 1 if i > nb_records_in_resume: # cache or write? if i == nb_records_in_resume + 1: # resumptionToken?
arg['resumptionToken'] = oaigenresumptionToken() extdate = oaigetresponsedate(oai_rt_expire) if extdate: out = "%s <resumptionToken expirationDate=\"%s\">%s</resumptionToken>\n" % (out, extdate, arg['resumptionToken']) else: out = "%s <resumptionToken>%s</resumptionToken>\n" % (out, arg['resumptionToken']) sysno.append(sysno_) else: done = 0 for field_ in get_field(sysno_, "245__a"): if done == 0: out = out + print_record(sysno_, arg['metadataPrefix']) done = 1 if i > nb_records_in_resume: oaicacheclean() sysno.append(arg['metadataPrefix']) oaicachein(arg['resumptionToken'], sysno) out = oai_header(args, "ListRecords") + out + oai_footer("ListRecords") return out def oailistsets(args): "Lists available sets for OAI metadata harvesting." out = "" # note: no flow control in ListSets sets = get_sets() for set_ in sets: out = out + " <set>\n" out = "%s <setSpec>%s</setSpec>\n" % (out, set_[0]) out = "%s <setName>%s</setName>\n" % (out, set_[1]) if set_[2]: out = "%s <setDescription>%s</setDescription>\n" % (out, set_[2]) out = out + " </set>\n" out = oai_header(args, "ListSets") + out + oai_footer("ListSets") return out def oaigetrecord(args): """Returns record 'identifier' according to 'metadataPrefix' format for OAI metadata harvesting.""" arg = parse_args(args) out = "" sysno = oaigetsysno(arg['identifier']) if record_exists(sysno): datestamp = get_modification_date(sysno) out = out + print_record(sysno, arg['metadataPrefix']) else: out = out + oai_error("idDoesNotExist", "invalid record Identifier") out = oai_error_header(args, "GetRecord") + out + oai_error_footer("GetRecord") return out out = oai_header(args, "GetRecord") + out + oai_footer("GetRecord") return out def oailistidentifiers(args): "Prints OAI response to the ListIdentifiers verb."
arg = parse_args(args) out = "" sysno = [] sysnos = [] if arg['resumptionToken']: filename = "%s/RTdata/%s" % (logdir, arg['resumptionToken']) if os.path.exists(filename) == 0: out = out + oai_error("badResumptionToken", "ResumptionToken expired") out = oai_error_header(args, "ListIdentifiers") + out + oai_error_footer("ListIdentifiers") return out if arg['resumptionToken']: sysnos = oaicacheout(arg['resumptionToken']) else: sysnos = oaigetsysnolist(arg['set'], arg['from'], arg['until']) if len(sysnos) == 0: # noRecordsMatch error out = out + oai_error("noRecordsMatch", "no records correspond to the request") out = oai_error_header(args, "ListIdentifiers") + out + oai_error_footer("ListIdentifiers") return out i = 0 for sysno_ in sysnos: if sysno_: i = i + 1 if i > nb_identifiers_in_resume: # cache or write? if i == nb_identifiers_in_resume + 1: # resumptionToken? arg['resumptionToken'] = oaigenresumptionToken() extdate = oaigetresponsedate(oai_rt_expire) if extdate: out = "%s <resumptionToken expirationDate=\"%s\">%s</resumptionToken>\n" % (out, extdate, arg['resumptionToken']) else: out = "%s <resumptionToken>%s</resumptionToken>\n" % (out, arg['resumptionToken']) sysno.append(sysno_) else: done = 0 for field_ in get_field(sysno_, "245__a"): if done == 0: for ident in get_field(sysno_, oaiidfield): if is_deleted(sysno_) and oaideleted != "no": out = out + " <header status=\"deleted\">\n" else: out = out + " <header>\n" out = "%s <identifier>%s</identifier>\n" % (out, escape_space(ident)) out = "%s <datestamp>%s</datestamp>\n" % (out, get_modification_date(oaigetsysno(ident))) for set in get_field(sysno_, oaisetfield): out = "%s <setSpec>%s</setSpec>\n" % (out, set) out = out + " </header>\n" done = 1 if i > nb_identifiers_in_resume: oaicacheclean() # clean cache from expired resumptionTokens oaicachein(arg['resumptionToken'], sysno) out = oai_header(args, "ListIdentifiers") + out + oai_footer("ListIdentifiers") return out def oaiidentify(args): "Generates response to oaiidentify verb." out = "" repositoryname = " <repositoryName>" + cdsname + "</repositoryName>\n" baseurl = " <baseURL>%s/oai2d.py/</baseURL>\n" % weburl protocolversion = " <protocolVersion>2.0</protocolVersion>\n" adminemail = " <adminEmail>%s</adminEmail>\n" % supportemail earliestdst = " <earliestDatestamp>%s</earliestDatestamp>\n" % get_earliest_datestamp() deletedrecord = " <deletedRecord>%s</deletedRecord>\n" % oaideleted repositoryidentifier = "%s" % oaiidprefix sampleidentifier = oaisampleidentifier identifydescription = oaiidentifydescription + "\n" out = out + repositoryname out = out + baseurl out = out + protocolversion out = out + adminemail out = out + earliestdst out = out + deletedrecord out = out + " <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>\n" # print " \n" out = out + oaiidentifydescription out = oai_header(args, "Identify") + out + oai_footer("Identify") return out def oaigetrequesturl(args): "Generates requesturl tag for OAI." # re_amp = re.compile('&') requesturl = weburl + "/" + "oai2d.py/"# + "?" + re_amp.sub("&amp;", args) return requesturl def oaigetresponsedate(delay=0): "Generates responseDate tag for OAI." return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(time.time() + delay)) def oai_error(code, msg): "OAI error occurred" return "<error code=\"%s\">%s</error>\n" % (code, msg) def oaigetsysno(identifier): "Returns the first MySQL BIB ID for the OAI identifier 'identifier', if it exists." sysno = None if identifier: query = "SELECT DISTINCT(bb.id_bibrec) FROM bib%sx AS bx, bibrec_bib%sx AS bb WHERE bx.tag='%s' AND bb.id_bibxxx=bx.id AND bx.value='%s'" % (oaiidfield[0:2], oaiidfield[0:2], oaiidfield, identifier) res = run_sql(query) for row in res: sysno = row[0] return sysno def oaigetsysnolist(set, fromdate, untildate): "Returns list of system numbers for the OAI set 'set', modified from 'date_from' until 'date_until'."
out_dict = {} # dict to hold list of out sysnos as its keys if set: query = "SELECT DISTINCT bibx.id_bibrec FROM bib%sx AS bx LEFT JOIN bibrec_bib%sx AS bibx ON bx.id=bibx.id_bibxxx LEFT JOIN bibrec AS b ON b.id=bibx.id_bibrec WHERE bx.tag='%s' AND bx.value='%s'" % (oaiidfield[0:2], oaiidfield[0:2], oaisetfield, set) else: query = "SELECT DISTINCT bibx.id_bibrec FROM bib%sx AS bx LEFT JOIN bibrec_bib%sx AS bibx ON bx.id=bibx.id_bibxxx LEFT JOIN bibrec AS b ON b.id=bibx.id_bibrec WHERE bx.tag='%s'" % (oaiidfield[0:2], oaiidfield[0:2], oaiidfield) if untildate: query = query + " AND b.modification_date <= '%s'" % untildate if fromdate: query = query + " AND b.modification_date >= '%s'" % fromdate res = run_sql(query) for row in res: out_dict[row[0]] = 1 return out_dict.keys() def is_deleted(recid): "Check if record with recid has been deleted. Return 1 if deleted." query = "select a.id from bibrec as a left join bibrec_bib98x as b on a.id=b.id_bibrec left join bib98x as c on b.id_bibxxx=c.id where c.value='DELETED' and a.id=%s" % recid res = run_sql(query) for item in res: if item == None: return 0 else: return 1 def oaigenresumptionToken(): "Generates unique ID for resumption token management." return md5.new(str(time.time())).hexdigest() def oaicachein(resumptionToken, sysnos): "Stores or adds sysnos in cache. Input is a string of sysnos separated by commas." filename = "%s/RTdata/%s" % (logdir, resumptionToken) fil = open(filename, "w") cPickle.dump(sysnos, fil) fil.close() return 1 def oaicacheout(resumptionToken): "Restores string of comma-separated system numbers from cache." 
sysnos = [] filename = "%s/RTdata/%s" % (logdir, resumptionToken) if oaicachestatus(resumptionToken): fil = open(filename, "r") sysnos = cPickle.load(fil) fil.close() else: return 0 return sysnos def oaicacheclean(): "Removes cached resumptionTokens older than specified" directory = "%s/RTdata" % logdir files = os.listdir(directory) for file_ in files: filename = directory + "/" + file_ # cache entry expires when not modified during a specified period of time if ((time.time() - os.path.getmtime(filename)) > oai_rt_expire): os.remove(filename) return 1 def oaicachestatus(resumptionToken): "Checks cache status. Returns 0 for empty, 1 for full." filename = "%s/RTdata/%s" % (logdir, resumptionToken) if os.path.exists(filename): if os.path.getsize(filename) > 0: return 1 else: return 0 else: return 0 def get_sets(): "Returns list of sets." out = [] row = ['', ''] query = "SELECT setSpec,setName,setDescription FROM oaiSET" res = run_sql(query) for row in res: row_bis = [row[0], row[1], row[2]] out.append(row_bis) return out def parse_args(args=""): "Parse input args" out_args = { "verb" : "", "metadataPrefix" : "", "from" : "", "until" : "", "set" : "", "identifier" : "", "resumptionToken" : "" } if args == "" or args == None: pass else: list_of_arguments = args.split('&') for item in list_of_arguments: keyvalue = item.split('=') if len(keyvalue) == 2: if (out_args.has_key(keyvalue[0])): if(out_args[keyvalue[0]] != ""): out_args[keyvalue[0]] = "Error" else: out_args[keyvalue[0]] = urllib.unquote(keyvalue[1]) else: out_args[keyvalue[0]] = urllib.unquote(keyvalue[1]) else: out_args['verb'] = "" return out_args def check_args(arguments): "Check OAI arguments" out_args = { "verb" : "", "metadataPrefix" : "", "from" : "", "until" : "", "set" : "", "identifier" : "", "resumptionToken" : "" } out = "" ## principal argument required # # if verbs.has_key(arguments['verb']): pass else: out = out + oai_error("badVerb", "Illegal OAI verb") ## defined args # # for param in 
arguments.keys(): if out_args.has_key(param): pass else: out = out + oai_error("badArgument", "The request includes illegal arguments") ## unique args # # for param in arguments.keys(): if (arguments[param] == "Error"): out = out + oai_error("badArgument", "The request includes illegal arguments") ## resumptionToken exclusive # # if ((arguments['from'] != "" or arguments['until'] != "" or arguments['metadataPrefix'] != "" or arguments['identifier'] != "" or arguments['set'] != "") and arguments['resumptionToken'] != ""): out = out + oai_error("badArgument", "The request includes illegal arguments") ## datestamp formats # # if arguments['from'] != "" and arguments['from'] != "": from_length = len(arguments['from']) if check_date(arguments['from'], "T00:00:00Z") == "": out = out + oai_error("badArgument", "Bad datestamp format in from") else: from_length = 0 if arguments['until'] != "" and arguments['until'] != "": until_length = len(arguments['until']) if check_date(arguments['until'], "T23:59:59Z") == "": out = out + oai_error("badArgument", "Bad datestamp format in until") else: until_length = 0 if from_length != 0: if until_length != 0: if from_length != until_length: out = out + oai_error("badArgument", "Bad datestamp format") if arguments['from'] != "" and arguments['until'] != "" and arguments['from'] > arguments['until']: out = out + oai_error("badArgument", "Wrong date") ## Identify exclusive # # if (arguments['verb'] =="Identify" and (arguments['metadataPrefix'] != "" or arguments['identifier'] != "" or arguments['set'] != "" or arguments['from'] != "" or arguments['until'] != "" or arguments['resumptionToken'] != "")): out = out + oai_error("badArgument", "The request includes illegal arguments") ## parameters for GetRecord # # if arguments['verb'] =="GetRecord" and arguments['identifier'] == "": out = out + oai_error("badArgument", "Record identifier missing") if arguments['verb'] =="GetRecord" and arguments['metadataPrefix'] == "": out = out + 
oai_error("badArgument", "Missing metadataPrefix") ## parameters for ListRecords and ListIdentifiers # # if (arguments['verb'] =="ListRecords" or arguments['verb'] =="ListIdentifiers") and (arguments['metadataPrefix'] == "" and arguments['resumptionToken'] == ""): out = out + oai_error("badArgument", "Missing metadataPrefix") ## Metadata prefix defined # # if arguments.has_key('metadataPrefix'): if ((arguments['metadataPrefix'] in params['metadataPrefix']) or (params['metadataPrefix'] == "")): pass else: out = out + oai_error("badArgument", "Missing metadataPrefix") return out diff --git a/modules/bibharvest/lib/oai_repository_config.py b/modules/bibharvest/lib/oai_repository_config.py index 58092041b..b3b69194a 100644 --- a/modules/bibharvest/lib/oai_repository_config.py +++ b/modules/bibharvest/lib/oai_repository_config.py @@ -1,30 +1,30 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""OAI repository config""" -from config import oaiidprefix, \ - oaisampleidentifier, \ - oaiidentifydescription, \ - oaiidfield, \ - oaisetfield, \ - oaideleted, \ - oai_rt_expire, \ - nb_records_in_resume +from cdsware.config import oaiidprefix, \ + oaisampleidentifier, \ + oaiidentifydescription, \ + oaiidfield, \ + oaisetfield, \ + oaideleted, \ + oai_rt_expire, \ + nb_records_in_resume diff --git a/modules/bibharvest/lib/oai_repository_tests.py b/modules/bibharvest/lib/oai_repository_tests.py index b757d8aee..9cf9c3ebb 100644 --- a/modules/bibharvest/lib/oai_repository_tests.py +++ b/modules/bibharvest/lib/oai_repository_tests.py @@ -1,80 +1,81 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## CDSware OAI repository unit tests. ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Unit tests for the oai repository.""" __version__ = "$Id$" -import oai_repository import unittest import re +from cdsware import oai_repository + class TestVerbs(unittest.TestCase): """Test for OAI verb functionality.""" def test_verbs(self): """bibharvest oai repository - testing verbs""" self.assertNotEqual(None, re.search("Identify", oai_repository.oaiidentify(""))) self.assertNotEqual(None, re.search("ListIdentifiers", oai_repository.oailistidentifiers(""))) self.assertNotEqual(None, re.search("ListRecords", oai_repository.oailistrecords(""))) self.assertNotEqual(None, re.search("ListMetadataFormats", oai_repository.oailistmetadataformats(""))) self.assertNotEqual(None, re.search("ListSets", oai_repository.oailistsets(""))) self.assertNotEqual(None, re.search("GetRecord", oai_repository.oaigetrecord(""))) class TestErrorCodes(unittest.TestCase): """Test for handling OAI error codes.""" def test_issue_error_identify(self): """bibharvest oai repository - testing error codes""" self.assertNotEqual(None, re.search("badVerb", oai_repository.check_args(oai_repository.parse_args("junk")))) self.assertNotEqual(None, re.search("badVerb", oai_repository.check_args(oai_repository.parse_args("verb=IllegalVerb")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=Identify&test=test")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=ListIdentifiers&metadataPrefix=oai_dc&from=some_random_date&until=some_random_date")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=ListIdentifiers&metadataPrefix=oai_dc&from=2001-01-01&until=2002-01-01T00:00:00Z")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=ListIdentifiers")))) self.assertNotEqual(None, re.search("badArgument", 
oai_repository.check_args(oai_repository.parse_args("verb=ListIdentifiers&metadataPrefix=illegal_mdp")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=ListIdentifiers&metadataPrefix=oai_dc&metadataPrefix=oai_dc")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=ListRecords&metadataPrefix=oai_dc&set=really_wrong_set&from=some_random_date&until=some_random_date")))) self.assertNotEqual(None, re.search("badArgument", oai_repository.check_args(oai_repository.parse_args("verb=ListRecords")))) class TestEncodings(unittest.TestCase): """Test for OAI response encodings.""" def test_encoding(self): """bibharvest oai repository - testing encodings""" self.assertEqual("<&>", oai_repository.encode_for_xml("<&>")) self.assertEqual("%20", oai_repository.escape_space(" ")) self.assertEqual("%25%20%3F%23%3D%26%2F%3A%3B%2B", oai_repository.encode_for_url("% ?#=&/:;+")) def create_test_suite(): """Return test suite for the oai repository.""" return unittest.TestSuite((unittest.makeSuite(TestVerbs, 'test'), unittest.makeSuite(TestErrorCodes, 'test'), unittest.makeSuite(TestEncodings, 'test'))) if __name__ == "__main__": unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibharvest/lib/oaiharvestlib.py b/modules/bibharvest/lib/oaiharvestlib.py index 57581327c..2e251775f 100644 --- a/modules/bibharvest/lib/oaiharvestlib.py +++ b/modules/bibharvest/lib/oaiharvestlib.py @@ -1,609 +1,610 @@ # -*- coding: utf-8 -*- ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ oaiharvest implementation. See oaiharvest executable for entry point. """ from marshal import loads,dumps import getopt import getpass import string import os import sre import sys import time import MySQLdb import Numeric import signal import traceback import calendar -from config import * -from bibindex_engine_config import * -from dbquery import run_sql -from access_control_engine import acc_authorize_action + +from cdsware.config import * +from cdsware.bibindex_engine_config import * +from cdsware.dbquery import run_sql +from cdsware.access_control_engine import acc_authorize_action ## precompile some often-used regexp for speed reasons: sre_subfields = sre.compile('\$\$\w'); sre_html = sre.compile("(?s)<[^>]*>|&#?\w+;") sre_datetime_shift = sre.compile("([-\+]{0,1})([\d]+)([dhms])") tmpHARVESTpath = tmpdir + '/oaiharvest' def write_message(msg, stream=sys.stdout): """Write message and flush output stream (may be sys.stdout or sys.stderr).""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) return def usage(code, msg=''): "Prints usage for this module." 
if msg: sys.stderr.write("Error: %s.\n" % msg) usagetext = """ Usage: oaiharvest [options] Examples: oaiharvest -r arxiv -s 24h oaiharvest -r pubmed -d 2005-05-05:2005-05-10 -t 10m Specific options: -r, --repository=REPOS_ONE,"REPOS TWO" name of the OAI repositories to be harvested (default=all) -d, --dates=yyyy-mm-dd:yyyy-mm-dd harvest repositories between specified dates (overrides repositories' last updated timestamps) Scheduling options: -u, --user=USER user name to store task, password needed -s, --sleeptime=SLEEP time after which to repeat tasks (no) e.g.: 1s, 30m, 24h, 7d -t, --time=TIME moment for the task to be active (now) e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26 General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ sys.stderr.write(usagetext) sys.exit(code) def authenticate(user, header="oaiharvest Task Submission", action="runoaiharvest"): """Authenticate the user against the user database. Check for its password, if it exists. Check for action access rights. Return user name upon authorization success, do system exit upon authorization failure. """ print header print "=" * len(header) if user == "": print >> sys.stdout, "\rUsername: ", user = string.strip(string.lower(sys.stdin.readline())) else: print >> sys.stdout, "\rUsername: ", user ## first check user pw: res = run_sql("select id,password from user where email=%s", (user,), 1) if not res: print "Sorry, %s does not exist." % user sys.exit(1) else: (uid_db, password_db) = res[0] if password_db: password_entered = getpass.getpass() if password_db == password_entered: pass else: print "Sorry, wrong credentials for %s." 
% user sys.exit(1) ## secondly check authorization for the action: (auth_code, auth_message) = acc_authorize_action(uid_db, action) if auth_code != 0: print auth_message sys.exit(1) return user def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string. It can handle normal date strings and shifts with respect to now.""" date = time.time() factors = {"d":24*3600, "h":3600, "m":60, "s":1} m = sre_datetime_shift.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date def task_run(row): """Run the harvesting task. The row argument is the Bibharvest task queue row, containing if, arguments, etc. Return 1 in case of success and 0 in case of failure. """ global options, task_id reposlist = [] datelist = [] dateflag = 0 # read from SQL row: task_id = row[0] task_proc = row[1] options = loads(row[6]) task_status = row[7] # sanity check: if task_proc != "oaiharvest": write_message("The task #%d does not seem to be a oaiharvest task." % task_id, sys.stderr) return 0 if task_status != "WAITING": write_message("The task #%d is %s. I expected WAITING." % (task_id, task_status), sys.stderr) return 0 # we can run the task now: if options["verbose"]: write_message("Task #%d started." 
% task_id) task_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) task_update_status("RUNNING") # install signal handlers signal.signal(signal.SIGUSR1, task_sig_sleep) signal.signal(signal.SIGTERM, task_sig_stop) signal.signal(signal.SIGABRT, task_sig_suicide) signal.signal(signal.SIGCONT, task_sig_wakeup) signal.signal(signal.SIGINT, task_sig_unknown) ### go ahead: build up the reposlist if options["repository"] != None: ### user requests harvesting from selected repositories write_message("harvesting from selected repositories") for reposname in options["repository"]: row = get_row_from_reposname(reposname) if row==[]: write_message("source name " + reposname + " is not valid") continue else: reposlist.append(get_row_from_reposname(reposname)) else: ### user requests harvesting from all repositories write_message("harvesting from all repositories in the database") reposlist = get_all_rows_from_db() ### go ahead: check if user requested from-until harvesting if options["dates"]: ### for each repos simply perform a from-until date harvesting... 
no need to update anything dateflag = 1 for element in options["dates"]: datelist.append(element) for repos in reposlist: postmode = str(repos[0][9]) if postmode=="h" or postmode=="h-c" or postmode=="h-u" or postmode=="h-c-u": harvestpath = tmpdir + "/oaiharvest" + str(os.getpid()) if dateflag == 1: res = call_bibharvest(prefix=repos[0][2], baseurl=repos[0][1], harvestpath=harvestpath, fro=str(datelist[0]), until=str(datelist[1])) if res==0 : write_message("source " + str(repos[0][6]) + " was harvested from " + str(datelist[0]) + " to " + str(datelist[1])) else : write_message("an error occurred while harvesting from source " + str(repos[0][6]) + " for the dates chosen") continue elif dateflag != 1 and repos[0][7] == None and repos[0][8] != 0: write_message("source " + str(repos[0][6]) + " was never harvested before - harvesting whole repository") res = call_bibharvest(prefix=repos[0][2], baseurl=repos[0][1], harvestpath=harvestpath) if res==0 : update_lastrun(repos[0][0]) else : write_message("an error occurred while harvesting from source " + str(repos[0][6])) continue elif dateflag != 1 and repos[0][8] != 0: ### check that update is actually needed, i.e. lastrun+frequency>today timenow = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) lastrundate = sre.sub(r'\.[0-9]+$', '', str(repos[0][7])) # remove trailing .00 timeinsec = int(repos[0][8])*60*60 updatedue = add_timestamp_and_timelag(lastrundate, timeinsec) proceed = compare_timestamps_with_tolerance(updatedue, timenow) if proceed==0 or proceed==-1 : #update needed! 
write_message("source " + str(repos[0][6]) + " is going to be updated") fromdate = str(repos[0][7]) fromdate = fromdate.split()[0] # get rid of time of the day for the moment res = call_bibharvest(prefix=repos[0][2], baseurl=repos[0][1], harvestpath=harvestpath, fro=fromdate) if res==0 : update_lastrun(repos[0][0]) else : write_message("an error occurred while harvesting from source " + str(repos[0][6])) continue else: write_message("source " + str(repos[0][6]) + " does not need updating") continue elif dateflag != 1 and repos[0][8] == 0: write_message("source " + str(repos[0][6]) + " has frequency set to 'Never' so it will not be updated") continue if postmode=="h-u": res = call_bibupload(convertpath=harvestpath) if res==0 : write_message("material harvested from source " + str(repos[0][6]) + " was successfully uploaded") else : write_message("an error occurred while uploading harvest from " + str(repos[0][6])) continue if postmode=="h-c" or postmode=="h-c-u": convertpath = tmpdir + "/bibconvertrun" + str(os.getpid()) res = call_bibconvert(config=str(repos[0][5]), harvestpath=harvestpath, convertpath=convertpath) if res==0 : write_message("material harvested from source " + str(repos[0][6]) + " was successfully converted") else : write_message("an error occurred while converting from " + str(repos[0][6])) continue if postmode=="h-c-u": res = call_bibupload(convertpath=convertpath) if res==0 : write_message("material harvested from source " + str(repos[0][6]) + " was successfully uploaded") else : write_message("an error occurred while uploading harvest from " + str(repos[0][6])) continue elif postmode not in ["h", "h-c", "h-u", "h-c-u"] : ### this should not happen write_message("invalid postprocess mode: " + postmode + " skipping repository") continue task_update_status("DONE") if options["verbose"]: write_message("Task #%d finished." % task_id) return 1 def add_timestamp_and_timelag(timestamp, timelag): """ Adds a time lag in seconds to a given date (timestamp). 
Returns the resulting date. """ # remove any trailing .00 in timestamp: timestamp = sre.sub(r'\.[0-9]+$', '', timestamp) # first convert timestamp to Unix epoch seconds: timestamp_seconds = calendar.timegm(time.strptime(timestamp, "%Y-%m-%d %H:%M:%S")) # now add them: result_seconds = timestamp_seconds + timelag result = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(result_seconds)) return result def update_lastrun(index): """ A method that updates the lastrun of a repository successfully harvested """ try: today = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) sql = 'UPDATE oaiHARVEST SET lastrun="%s" WHERE id=%s' % (today, index) res = run_sql(sql) return 1 except StandardError, e: return (0,e) def call_bibharvest(prefix, baseurl, harvestpath, fro="", until=""): """ A method that calls bibharvest and writes harvested output to disk """ try: command = '%s/bibharvest -v ListRecords -p %s ' % (bindir, prefix) if fro!="": command += '-f %s ' % fro if until!="": command += '-u %s ' % until command += baseurl (stdin, harvesttext, stderr) = os.popen3(command) harvestread = harvesttext.read() ftmp = open(harvestpath, 'w') ftmp.write(harvestread) ftmp.close() return 0 except StandardError, e: return (0,e) def call_bibconvert(config, harvestpath, convertpath): """ A method that reads from a file and converts according to a BibConvert Configuration file. 
Converted output is returned """ command = """%s/bibconvert -c %s < %s > %s """ % (bindir, config, harvestpath, convertpath) stdout = os.popen(command) return stdout def call_bibupload(convertpath): """ A method that uploads a file to the database - calls bibUpload """ command = '%s/bibupload -i %s ' % (bindir, convertpath) p=os.system(command) return p def get_row_from_reposname(reposname): """ Returns all information about a row (OAI source) from the source name """ try: sql = 'select * from oaiHARVEST where name="%s"' % MySQLdb.escape_string(reposname) res = run_sql(sql) reposdata = [] for element in res: reposdata.append(element) return reposdata except StandardError, e: return (0,e) def get_all_rows_from_db(): """ This method retrieves the full database of repositories and returns a list containing (in exact order): | id | baseurl | metadataprefix | arguments | comment | bibconvertcfgfile | name | lastrun | frequency | postprocess | """ try: reposlist = [] sql = """select id from oaiHARVEST""" idlist = run_sql(sql) for index in idlist: sql = """select * from oaiHARVEST where id=%s""" % index reposelements = run_sql(sql) repos = [] for element in reposelements: repos.append(element) reposlist.append(repos) return reposlist except StandardError, e: return (0,e) def compare_timestamps_with_tolerance(timestamp1, timestamp2, tolerance=0): """Compare two timestamps TIMESTAMP1 and TIMESTAMP2, of the form '2005-03-31 17:37:26'. Optionally receives a TOLERANCE argument (in seconds). Return -1 if TIMESTAMP1 is less than TIMESTAMP2 minus TOLERANCE, 0 if they are equal within TOLERANCE limit, and 1 if TIMESTAMP1 is greater than TIMESTAMP2 plus TOLERANCE. 
""" # remove any trailing .00 in timestamps: timestamp1 = sre.sub(r'\.[0-9]+$', '', timestamp1) timestamp2 = sre.sub(r'\.[0-9]+$', '', timestamp2) # first convert timestamps to Unix epoch seconds: timestamp1_seconds = calendar.timegm(time.strptime(timestamp1, "%Y-%m-%d %H:%M:%S")) timestamp2_seconds = calendar.timegm(time.strptime(timestamp2, "%Y-%m-%d %H:%M:%S")) # now compare them: if timestamp1_seconds < timestamp2_seconds - tolerance: return -1 elif timestamp1_seconds > timestamp2_seconds + tolerance: return 1 else: return 0 def command_line(): global options long_flags =["repository=", "dates=" "user=","sleeptime=","time=", "help", "version", "verbose="] short_flags ="r:d:u:s:t:hVv:" format_string = "%Y-%m-%d %H:%M:%S" repositories = None dates = None sleeptime = "" try: opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags) except getopt.GetoptError, err: write_message(err, sys.stderr) usage(1) if args: usage(1) options={"sleeptime":0, "verbose":1, "repository":0, "dates":0} sched_time = time.strftime(format_string) user = "" # Check for key options try: for opt in opts: if opt == ("-h","") or opt == ("--help",""): usage(1) elif opt == ("-V","") or opt == ("--version",""): print "Version 0.0 FIXME" sys.exit(1) elif opt[0] in ["--verbose", "-v"]: options["verbose"] = int(opt[1]) elif opt[0] in [ "-r", "--repository" ]: repositories = opt[1] elif opt[0] in [ "-d", "--dates" ]: dates = opt[1] elif opt[0] in [ "-u", "--user"]: user = opt[1] elif opt[0] in [ "-s", "--sleeptime" ]: get_datetime(opt[1]) # see if it is a valid shift sleeptime= opt[1] elif opt[0] in [ "-t", "--time" ]: sched_time= get_datetime(opt[1]) else: usage(1) except StandardError, e: write_message(e, sys.stderr) sys.exit(1) options["repository"]=get_repository_names(repositories) if dates != None: options["dates"]=get_dates(dates) if dates != None and options["dates"]==None: write_message("Date format not valid. 
Quitting task...") sys.exit(1) user = authenticate(user) if options["verbose"] >= 9: print "" write_message("storing task options %s\n" % options) new_task_id = run_sql("""INSERT INTO schTASK (proc,user,runtime,sleeptime,arguments,status) VALUES ('oaiharvest',%s,%s,%s,%s,'WAITING')""", (user, sched_time, sleeptime, dumps(options))) print "Task #%d was successfully scheduled for execution." % new_task_id return def get_dates(dates): """ A method to validate and process the dates input by the user at the command line """ twodates = [] if dates: datestring = string.split(dates, ":") if len(datestring)==2: for date in datestring: ### perform some checks on the date format datechunks = string.split(date, "-") if len(datechunks)==3: try: if int(datechunks[0]) and int(datechunks[1]) and int(datechunks[2]): twodates.append(date) except StandardError, e: write_message("Dates have invalid format, not 'yyyy-mm-dd:yyyy-mm-dd'") twodates=None return twodates else: write_message("Dates have invalid format, not 'yyyy-mm-dd:yyyy-mm-dd'") twodates=None return twodates ## final check.. 
date1 must me smaller than date2 date1 = str(twodates[0]) + " 01:00:00" date2 = str(twodates[1]) + " 01:00:00" if compare_timestamps_with_tolerance(date1, date2)!=-1: write_message("First date must be before second date.") twodates=None return twodates else: write_message("Dates have invalid format, not 'yyyy-mm-dd:yyyy-mm-dd'") twodates=None else: twodates=None return twodates def get_repository_names(repositories): """ A method to validate and process the repository names input by the user at the command line """ repository_names = [] if repositories: names = string.split(repositories, ",") for name in names: ### take into account both single word names and multiple word names (which get wrapped around "" or '') quote = "'" doublequote = '"' if name.find(quote)==0 and name.find(quote)==len(name): name = name.split(quote)[1] if name.find(doublequote)==0 and name.find(doublequote)==len(name): name = name.split(doublequote)[1] repository_names.append(name) else: repository_names=None return repository_names def task_sig_sleep(sig, frame): """Signal handler for the 'sleep' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("sleeping...") task_update_status("SLEEPING") signal.pause() # wait for wake-up signal def task_sig_wakeup(sig, frame): """Signal handler for the 'wakeup' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("continuing...") task_update_status("CONTINUING") def task_sig_stop(sig, frame): """Signal handler for the 'stop' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("stopping...") task_update_status("STOPPING") errcode = 0 try: task_sig_stop_commands() write_message("stopped") task_update_status("STOPPED") except StandardError, err: write_message("Error during stopping! 
%e" % err) task_update_status("STOPPINGFAILED") errcode = 1 sys.exit(errcode) def task_sig_stop_commands(): """Do all the commands necessary to stop the task before quitting. Useful for task_sig_stop() handler. """ write_message("stopping commands started") for table in wordTables: table.put_into_db() write_message("stopping commands ended") def task_sig_suicide(sig, frame): """Signal handler for the 'suicide' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("suiciding myself now...") task_update_status("SUICIDING") write_message("suicided") task_update_status("SUICIDED") sys.exit(0) def task_sig_unknown(sig, frame): """Signal handler for the other unknown signals sent by shell or user.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("unknown signal %d ignored" % sig) # do nothing for other signals def task_update_progress(msg): """Updates progress information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task progress to %s." % msg) return run_sql("UPDATE schTASK SET progress=%s where id=%s", (msg, task_id)) def task_update_status(val): """Updates state information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task status to %s." % val) return run_sql("UPDATE schTASK SET status=%s where id=%s", (val, task_id)) def main(): """Reads arguments and either runs the task, or starts user-interface (command line).""" if len(sys.argv) == 2: try: id = int(sys.argv[1]) except StandardError, err: command_line() sys.exit() res = run_sql("SELECT * FROM schTASK WHERE id='%d'" % (id), None, 1) if not res: write_message("Selected task not found.", sys.stderr) sys.exit(1) try: if not task_run(res[0]): write_message("Error occurred. Exiting.", sys.stderr) except StandardError, e: write_message("Unexpected error occurred: %s." 
% e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) write_message("Exiting.") task_update_status("ERROR") else: command_line() diff --git a/modules/bibharvest/web/admin/bibharvestadmin.py b/modules/bibharvest/web/admin/bibharvestadmin.py index 9eb8234eb..eeaa39109 100644 --- a/modules/bibharvest/web/admin/bibharvestadmin.py +++ b/modules/bibharvest/web/admin/bibharvestadmin.py @@ -1,160 +1,160 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDSware BibHarvest Administrator Interface.""" __lastupdated__ = """$Date$""" import sys + import cdsware.bibharvestadminlib as bhc -reload(bhc) from cdsware.webpage import page, create_error_box from cdsware.config import weburl,cdslang from cdsware.webuser import getUid, page_not_authorized __version__ = "$Id$" def index(req, ln=cdslang): navtrail_previous_links = bhc.getnavtrail() try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bhc.check_user(uid,'cfgbibharvest') if not auth[0]: return page(title="BibHarvest Admin Interface", body=bhc.perform_request_index(ln), uid=uid, language=ln, navtrail = navtrail_previous_links, lastupdated=__lastupdated__, urlargs=req.args) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def editsource(req, oai_src_id, oai_src_name='', oai_src_baseurl='', oai_src_prefix='', oai_src_frequency='', oai_src_config='', oai_src_post='', ln=cdslang, mtype='', callback='yes', confirm=-1): navtrail_previous_links = bhc.getnavtrail() + """> BibHarvest Admin Interface """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bhc.check_user(uid,'cfgbibharvest') if not auth[0]: return page(title="Edit OAI Source", body=bhc.perform_request_editsource(oai_src_id=oai_src_id, oai_src_name=oai_src_name, oai_src_baseurl=oai_src_baseurl, oai_src_prefix=oai_src_prefix, oai_src_frequency=oai_src_frequency, oai_src_config=oai_src_config, oai_src_post=oai_src_post, ln=ln, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifysource(req, oai_src_id, oai_src_name, oai_src_baseurl='', oai_src_prefix='', oai_src_frequency='', oai_src_config='', oai_src_post='', ln=cdslang, mtype='', callback='yes', confirm=-1): navtrail_previous_links = bhc.getnavtrail() + """> BibHarvest Admin Interface """ 
% (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bhc.check_user(uid,'cfgbibharvest') if not auth[0]: return page(title="Edit OAI Source", body=bhc.perform_request_modifysource(oai_src_id=oai_src_id, oai_src_name=oai_src_name, oai_src_baseurl=oai_src_baseurl, oai_src_prefix=oai_src_prefix, oai_src_frequency=oai_src_frequency, oai_src_config=oai_src_config, oai_src_post=oai_src_post, ln=ln, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def addsource(req, ln=cdslang, oai_src_name='', oai_src_baseurl ='', oai_src_prefix='', oai_src_frequency='', oai_src_lastrun='', oai_src_config='', oai_src_post='', confirm=-1): navtrail_previous_links = bhc.getnavtrail() + """> BibHarvest Admin Interface """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bhc.check_user(uid,'cfgbibharvest') if not auth[0]: return page(title="Add new OAI Source", body=bhc.perform_request_addsource(oai_src_name=oai_src_name, oai_src_baseurl=oai_src_baseurl, oai_src_prefix=oai_src_prefix, oai_src_frequency=oai_src_frequency, oai_src_lastrun=oai_src_lastrun, oai_src_config=oai_src_config, oai_src_post=oai_src_post, ln=cdslang, confirm=confirm), uid=uid, language=ln, navtrail = navtrail_previous_links, urlargs=req.args, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def delsource(req, oai_src_id, ln=cdslang, confirm=0): navtrail_previous_links = bhc.getnavtrail() + """> BibHarvest Admin Interface """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bhc.check_user(uid,'cfgbibharvest') if not auth[0]: return page(title="Delete OAI Source", body=bhc.perform_request_delsource(oai_src_id=oai_src_id, ln=ln, confirm=confirm), uid=uid, language=ln, 
urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) diff --git a/modules/bibharvest/web/oai2d.py b/modules/bibharvest/web/oai2d.py index af7255215..5578422d4 100644 --- a/modules/bibharvest/web/oai2d.py +++ b/modules/bibharvest/web/oai2d.py @@ -1,126 +1,126 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""OAI interface for CDSware/MySQL written in Python compliant with OAI-PMH2.0""" __lastupdated__ = """$Date$""" __version__ = "$Id$" import sys import urllib +from mod_python import apache + from cdsware.dbquery import run_sql from cdsware.oai_repository_config import * from cdsware import oai_repository from cdsware.config import logdir -from mod_python import apache - def index (req): "OAI repository interface" ## check availability if os.path.exists("%s/RTdata/last_harvest_date" % logdir): req.err_headers_out["Status-Code"] = "503" req.err_headers_out["Retry-After"] = "60" req.status = apache.HTTP_SERVICE_UNAVAILABLE return "%s" % apache.OK command = "date > %s/RTdata/last_harvest_date" % logdir os.system(command) ## parse input parameters args = "" if req.method == "GET": args = req.args elif req.method == "POST": params = {} for key in req.form.keys(): params[key] = req.form[key] args = urllib.urlencode(params) arg = oai_repository.parse_args(args) ## check request for OAI compliancy oai_error = oai_repository.check_args(arg) ## create OAI response req.content_type = "text/xml" req.send_http_header() if oai_error == "": ## OAI Identify if arg['verb'] == "Identify": req.write(oai_repository.oaiidentify(args)) ## OAI ListSets elif arg['verb'] == "ListSets": req.write(oai_repository.oailistsets(args)) ## OAI ListIdentifiers elif arg['verb'] == "ListIdentifiers": req.write(oai_repository.oailistidentifiers(args)) ## OAI ListRecords elif arg['verb'] == "ListRecords": req.write(oai_repository.oailistrecords(args)) ## OAI GetRecord elif arg['verb'] == "GetRecord": req.write(oai_repository.oaigetrecord(args)) ## OAI ListMetadataFormats elif arg['verb'] == "ListMetadataFormats": req.write(oai_repository.oailistmetadataformats(args)) ## Unknown verb else: req.write(oai_repository.oai_error("badVerb","Illegal OAI verb")) ## OAI error else: req.write(oai_repository.oai_header(args,"")) req.write(oai_error) req.write(oai_repository.oai_footer("")) command = "rm 
%s/RTdata/last_harvest_date" % logdir os.system(command) return "\n" diff --git a/modules/bibindex/bin/bibindex.in b/modules/bibindex/bin/bibindex.in index 288ec3e12..1f075da55 100644 --- a/modules/bibindex/bin/bibindex.in +++ b/modules/bibindex/bin/bibindex.in @@ -1,71 +1,68 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex bibliographic data, reference and fulltext indexing utility. Usage: bibindex %s [options] Examples: bibindex -a -i 234-250,293,300-500 -u admin@cdsware bibindex -a -w author,fulltext -M 8192 -v3 bibindex -d -m +4d -A on --flush=10000 Indexing options: -a, --add add or update words for selected records -d, --del delete words for selected records -i, --id=low[-high] select according to record recID. 
-m, --modified=from[,to] select according to modification date -c, --collection=c1[,c2] select according to collection Repairing options: -k, --check check consistency for all records in the table(s) -r, --repair try to repair all records in the table(s) Specific options: -w, --windex=w1[,w2] word/phrase indexes to consider (all) -M, --maxmem=XXX maximum memory usage in kB (no limit) -f, --flush=NNN full consistent table flush after NNN records (10000) Scheduling options: -u, --user=USER user name to store task, password needed -s, --sleeptime=SLEEP time after which to repeat task (no) e.g.: 1s, 30m, 24h, 7d -t, --time=TIME moment for the task to be active (now) e.g.: +15s, 5m, 3h, 2002-10-27 13:57:26 General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ try: - import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.bibindex_engine import main except ImportError, e: print "Error: %s" % e import sys sys.exit(1) main() diff --git a/modules/bibindex/bin/bibstat.in b/modules/bibindex/bin/bibstat.in index 8dfb00879..f69f129a9 100644 --- a/modules/bibindex/bin/bibstat.in +++ b/modules/bibindex/bin/bibstat.in @@ -1,109 +1,107 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibStat reports some interesting numbers on CDSware bibliographic record set. """ __version__ = "$Id$" ## import interesting modules: try: import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.dbquery import run_sql from marshal import loads,dumps from zlib import compress,decompress from string import split,translate,lower,upper import getopt import string import os import re import time import urllib import signal import threading import unicodedata import traceback import cStringIO except ImportError, e: print "Error: %s" % e import sys sys.exit(1) def report_table_status(tablename): """Report stats for the table TABLENAME.""" out = "" res = run_sql("SHOW TABLE STATUS LIKE %s", (tablename,)) for row in res: out += "%14s %17d %17d %17d" % (row[0], row[3], row[5], row[7]) # Name, Rows, Data_length, Max_data_length return out def report_on_all_bibliographic_tables(): """Report stats for all the interesting bibliographic tables.""" print "%14s %17s %17s %17s" % ("TABLE", "ROWS", "DATA SIZE", "INDEX SIZE") print "============== ================= ================= =================" for i in range(0,10): for j in range(0,10): print report_table_status("bib%1d%1dx" % (i, j)) print report_table_status("bibrec_bib%1d%1dx" % (i, j)) for i in range(0,11): print report_table_status("idxWORD%02dF" % i) print report_table_status("idxWORD%02dR" % i) for i in range(0,11): print report_table_status("idxPHRASE%02dF" % i) print report_table_status("idxPHRASE%02dR" % i) return def usage(exitcode=1, msg=""): """Prints usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("Usage: %s [options]\n" % sys.argv[0]) sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") 
sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.exit(exitcode) def main(): """Report stats on the CDSware bibliographic tables.""" try: opts, args = getopt.getopt(sys.argv[1:], "hV", ["help", "version"]) except getopt.GetoptError, err: usage(1, err) if opts: for opt in opts: if opt[0] in ["-h", "--help"]: usage(0) elif opt[0] in ["-V", "--version"]: print __version__ sys.exit(0) else: usage(1) else: report_on_all_bibliographic_tables() if __name__ == "__main__": main() diff --git a/modules/bibindex/lib/bibindex_engine.py b/modules/bibindex/lib/bibindex_engine.py index 8d1d8335d..2709de7de 100644 --- a/modules/bibindex/lib/bibindex_engine.py +++ b/modules/bibindex/lib/bibindex_engine.py @@ -1,1672 +1,1672 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## BibIndex bibliographic data, reference and fulltext indexing utility. ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex indexing engine implementation. See bibindex executable for entry point.
""" from marshal import loads,dumps from zlib import compress,decompress from string import split,translate,lower,upper import getopt import getpass import string import os import sre import sys import time import MySQLdb import Numeric import urllib import signal import tempfile import unicodedata import traceback import cStringIO -from config import * -from bibindex_engine_config import * -from search_engine_config import cfg_max_recID -from search_engine import perform_request_search, strip_accents -from dbquery import run_sql -from access_control_engine import acc_authorize_action -from bibindex_engine_stopwords import is_stopword -from bibindex_engine_stemmer import stem +from cdsware.config import * +from cdsware.bibindex_engine_config import * +from cdsware.search_engine_config import cfg_max_recID +from cdsware.search_engine import perform_request_search, strip_accents +from cdsware.dbquery import run_sql +from cdsware.access_control_engine import acc_authorize_action +from cdsware.bibindex_engine_stopwords import is_stopword +from cdsware.bibindex_engine_stemmer import stem ## import optional modules: try: import psyco psyco.bind(get_words_from_phrase) psyco.bind(merge_with_old_recIDs) psyco.bind(serialize_via_numeric_array) psyco.bind(serialize_via_marshal) psyco.bind(deserialize_via_numeric_array) psyco.bind(deserialize_via_marshal) except: pass ## override urllib's default password-asking behaviour: class MyFancyURLopener(urllib.FancyURLopener): def prompt_user_passwd(self, host, realm): # supply some dummy credentials by default return (cfg_bibindex_urlopener_username, cfg_bibindex_urlopener_password) def http_error_401(self, url, fp, errcode, errmsg, headers): # do not bother with protected pages raise IOError, (999, 'unauthorized access') return None urllib._urlopener = MyFancyURLopener() def write_message(msg, stream=sys.stdout): """Write message and flush output stream (may be sys.stdout or sys.stderr).""" if stream == sys.stdout or stream == 
sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) return ## precompile some often-used regexps for speed reasons: sre_subfields = sre.compile('\$\$\w') sre_html = sre.compile("(?s)<[^>]*>|&#?\w+;") sre_block_punctuation_begin = sre.compile(r"^"+cfg_bibindex_chars_punctuation+"+") sre_block_punctuation_end = sre.compile(cfg_bibindex_chars_punctuation+"+$") sre_punctuation = sre.compile(cfg_bibindex_chars_punctuation) sre_separators = sre.compile(cfg_bibindex_chars_alphanumeric_separators) sre_datetime_shift = sre.compile("([-\+]{0,1})([\d]+)([dhms])") nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by wordTables = [] task_id = -1 base_process_size = 4500 # process base size options = {} # will hold task options ## Dictionary merging functions def intersection(dict1, dict2): "Returns intersection of the two dictionaries." int_dict = {} if len(dict1) > len(dict2): for e in dict2: if dict1.has_key(e): int_dict[e] = 1 else: for e in dict1: if dict2.has_key(e): int_dict[e] = 1 return int_dict def union(dict1, dict2): "Returns union of the two dictionaries." union_dict = {} for e in dict1.keys(): union_dict[e] = 1 for e in dict2.keys(): union_dict[e] = 1 return union_dict def diff(dict1, dict2): "Returns dict1 - dict2." diff_dict = {} for e in dict1.keys(): if not dict2.has_key(e): diff_dict[e] = 1 return diff_dict def list_union(list1, list2): "Returns union of the two lists."
union_dict = {} for e in list1: union_dict[e] = 1 for e in list2: union_dict[e] = 1 return union_dict.keys() ## safety function for killing slow MySQL threads: def kill_sleepy_mysql_threads(max_threads=cfg_max_mysql_threads, thread_timeout=cfg_mysql_thread_timeout): """Check the number of MySQL threads and if there are more than MAX_THREADS of them, kill all threads that are in a sleeping state for more than THREAD_TIMEOUT seconds. (This is useful for working around the max_connection problem that appears during indexing in some not-yet-understood cases.) If some threads are to be killed, write info into the log file. """ res = run_sql("SHOW FULL PROCESSLIST") if len(res) > max_threads: for row in res: r_id,r_user,r_host,r_db,r_command,r_time,r_state,r_info = row if r_command == "Sleep" and int(r_time) > thread_timeout: run_sql("KILL %s", (r_id,)) if options["verbose"] >= 1: write_message("WARNING: too many MySQL threads, killing thread %s" % r_id) return ## MARC-21 tag/field access functions def get_fieldvalues(recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" out = [] bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = "SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'" \ % (bibXXx, bibrec_bibXXx, recID, tag) res = run_sql(query) for row in res: out.append(row[0]) return out def get_associated_subfield_value(recID, tag, value, associated_subfield_code): """Return the value of ASSOCIATED_SUBFIELD_CODE, if it exists, for record RECID and TAG of value VALUE. Used by fulltext indexer only. Note: TAG must be 6 characters long (tag+ind1+ind2+sfcode), otherwise an empty string is returned. FIXME: what if many tag values have the same value but different associated_subfield_code? Better to use the bibrecord library for this.
""" out = "" if len(tag) != 6: return out bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.field_number, b.tag, b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s%%'""" % \ (bibXXx, bibrec_bibXXx, recID, tag[:-1]) res = run_sql(query) field_number = -1 for row in res: if row[1] == tag and row[2] == value: field_number = row[0] if field_number > 0: for row in res: if row[0] == field_number and row[1] == tag[:-1] + associated_subfield_code: out = row[2] break return out def get_field_tags(field): """Returns a list of MARC tags for the field code 'field'. Returns empty list in case of error. Example: field='author', output=['100__%','700__%'].""" out = [] query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f WHERE f.code='%s' AND ft.id_field=f.id AND t.id=ft.id_tag ORDER BY ft.score DESC""" % field res = run_sql(query) for row in res: out.append(row[0]) return out ## Fulltext word extraction functions def get_fulltext_urls_from_html_page(htmlpagebody): """Parses htmlpagebody data looking for url_directs referring to probable fulltexts. Returns an array of (ext,url_direct) to fulltexts. Note: it looks for file format extensions as defined by the global 'conv_programs' structure. """ out = [] for ext in conv_programs.keys(): expr = sre.compile( r"\"(http://[\w]+\.+[\w]+[^\"'><]*\." + \ ext + r")\"") match = expr.search(htmlpagebody) if match: out.append([ext,match.group(1)]) else: # FIXME: workaround for getfile, should use bibdoc tables expr_getfile = sre.compile(r"\"(http://.*getfile\.py\?.*format=" + ext + r"&version=.*)\"") match = expr_getfile.search(htmlpagebody) if match: out.append([ext,match.group(1)]) return out def get_words_from_fulltext(url_direct_or_indirect, separators="[^\w]", split=string.split, force_file_extension=None): """Returns all the words contained in the document specified by URL_DIRECT_OR_INDIRECT with the words being split by SEPARATORS.
If FORCE_FILE_EXTENSION is set (e.g. to "pdf"), then treat URL_DIRECT_OR_INDIRECT as a file of that type. (This is useful for indexing Indico, for example.) Note also that URL_DIRECT_OR_INDIRECT may be either a direct URL to the fulltext file or a URL to a setlink-like page body that presents the links to be indexed. In the latter case the URL_DIRECT_OR_INDIRECT is parsed to extract actual direct URLs to fulltext documents, for all known file extensions as specified by the global conv_programs config variable. """ if cfg_bibindex_fulltext_index_local_files_only and string.find(url_direct_or_indirect, weburl) < 0: return [] if options["verbose"] >= 2: write_message("... reading fulltext files from %s started" % url_direct_or_indirect) fulltext_urls = None if not force_file_extension: url_direct = None fulltext_urls = None # check for direct link in url url_direct_or_indirect_ext = lower(split(url_direct_or_indirect,".")[-1]) if url_direct_or_indirect_ext in conv_programs.keys(): fulltext_urls = [(url_direct_or_indirect_ext,url_direct_or_indirect)] # Indirect url. Try to fetch the real fulltext(s) if not fulltext_urls: # read "setlink" data try: htmlpagebody = urllib.urlopen(url_direct_or_indirect).read() except: sys.stderr.write("Error: Cannot read %s.\n" % url_direct_or_indirect) return [] fulltext_urls = get_fulltext_urls_from_html_page(htmlpagebody) if options["verbose"] >= 9: write_message("... fulltext_urls = %s" % fulltext_urls) else: fulltext_urls = [[force_file_extension, url_direct_or_indirect]] words = {} # process as many urls as were found: for (ext,url_direct) in fulltext_urls: if options["verbose"] >= 2: write_message(".... processing %s from %s started" % (ext,url_direct)) # sanity check: if not url_direct: break; # read fulltext file: try: url = urllib.urlopen(url_direct) except: sys.stderr.write("Error: Cannot read %s.\n" % url_direct) continue # try other fulltext files...
tmp_name = tempfile.mktemp('cdsware.tmp') tmp_fd = open(tmp_name, "w") data_chunk = url.read(8*1024) while data_chunk: tmp_fd.write(data_chunk) data_chunk = url.read(8*1024) tmp_fd.close() # try all available conversion programs according to their order: bingo = 0 for conv_program in conv_programs[ext]: if os.path.exists(conv_program): # intelligence on how to run various conversion programs: cmd = "" # will keep command to run bingo = 0 # did we succeed? if os.path.basename(conv_program) == "pdftotext": cmd = "%s %s %s.txt" % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "pstotext": if ext == "ps.gz": # is there gzip available? if os.path.exists(conv_programs_helpers["gz"]): cmd = "%s -cd %s | %s > %s.txt" \ % (conv_programs_helpers["gz"], tmp_name, conv_program, tmp_name) else: cmd = "%s %s > %s.txt" \ % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "ps2ascii": if ext == "ps.gz": # is there gzip available? if os.path.exists(conv_programs_helpers["gz"]): cmd = "%s -cd %s | %s > %s.txt"\ % (conv_programs_helpers["gz"], tmp_name, conv_program, tmp_name) else: cmd = "%s %s %s.txt" \ % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "antiword": cmd = "%s %s > %s.txt" % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "catdoc": cmd = "%s %s > %s.txt" % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "wvText": cmd = "%s %s %s.txt" % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "ppthtml": # is there html2text available? if os.path.exists(conv_programs_helpers["html"]): cmd = "%s %s | %s > %s.txt"\ % (conv_program, tmp_name, conv_programs_helpers["html"], tmp_name) else: cmd = "%s %s > %s.txt" \ % (conv_program, tmp_name, tmp_name) elif os.path.basename(conv_program) == "xlhtml": # is there html2text available?
if os.path.exists(conv_programs_helpers["html"]): cmd = "%s %s | %s > %s.txt" % \ (conv_program, tmp_name, conv_programs_helpers["html"], tmp_name) else: cmd = "%s %s > %s.txt" % \ (conv_program, tmp_name, tmp_name) else: sys.stderr.write("Error: Do not know how to handle %s conversion program.\n" % conv_program) # try to run it: try: if options["verbose"] >= 9: write_message("..... launching %s" % cmd) errcode = os.system(cmd) if errcode == 0 and os.path.exists("%s.txt" % tmp_name): bingo = 1 break # bingo! else: write_message("Error while running %s for %s.\n" % (cmd, url_direct), sys.stderr) except: write_message("Error running %s for %s.\n" % (cmd, url_direct), sys.stderr) # were we successful? if bingo: tmp_name_txt_file = open("%s.txt" % tmp_name) for phrase in tmp_name_txt_file.xreadlines(): for word in get_words_from_phrase(phrase): if not words.has_key(word): words[word] = 1; tmp_name_txt_file.close() else: if options["verbose"]: write_message("No conversion success for %s.\n" % (url_direct), sys.stderr) # delete temp files (they might not exist): try: os.unlink(tmp_name) os.unlink(tmp_name + ".txt") except StandardError: write_message("Error: Could not delete file. It didn't exist", sys.stderr) if options["verbose"] >= 2: write_message(".... processing %s from %s ended" % (ext,url_direct)) if options["verbose"] >= 2: write_message("... reading fulltext files from %s ended" % url_direct_or_indirect) return words.keys() # tagToWordsFunctions mapping. It offers an indirection level necessary for # indexing fulltext. The default is get_words_from_phrase. tagToWordsFunctions = {'8564_u':get_words_from_fulltext} def get_words_from_phrase(phrase, split=string.split): """Return list of words found in PHRASE. Note that the phrase is split into groups depending on the alphanumeric characters and punctuation characters definition present in the config file.
""" words = {} if cfg_bibindex_remove_html_markup and string.find(phrase, "</") > -1: phrase = sre_html.sub(' ', phrase) phrase = str.lower(phrase) # 1st split phrase into blocks according to whitespace for block in split(strip_accents(phrase)): # 2nd remove leading/trailing punctuation and add block: block = sre_block_punctuation_begin.sub("", block) block = sre_block_punctuation_end.sub("", block) if block: block = apply_stemming_and_stopwords_and_length_check(block) if block: words[block] = 1 # 3rd break each block into subblocks according to punctuation and add subblocks: for subblock in sre_punctuation.split(block): subblock = apply_stemming_and_stopwords_and_length_check(subblock) if subblock: words[subblock] = 1 # 4th break each subblock into alphanumeric groups and add groups: for alphanumeric_group in sre_separators.split(subblock): alphanumeric_group = apply_stemming_and_stopwords_and_length_check(alphanumeric_group) if alphanumeric_group: words[alphanumeric_group] = 1 return words.keys() def apply_stemming_and_stopwords_and_length_check(word): """Return WORD after applying stemming and stopword and length checks. See the config file in order to influence these. """ # stem word, when configured so: if cfg_bibindex_stemmer_default_language != "": word = stem(word, cfg_bibindex_stemmer_default_language) # now check against stopwords: if is_stopword(word): return "" # finally check the word length: if len(word) < cfg_bibindex_min_word_length: return "" return word def remove_subfields(s): "Removes subfields from string, e.g. 'foo $$c bar' becomes 'foo bar'." return sre_subfields.sub(' ', s) def get_index_id(indexname): """Returns the words/phrase index id for INDEXNAME. Returns 0 in case there is no words table for this index.
Example: indexname='author', output=4.""" out = 0 query = """SELECT w.id FROM idxINDEX AS w WHERE w.name='%s' LIMIT 1""" % indexname res = run_sql(query, None, 1) if res: out = res[0][0] return out def get_index_tags(indexname): """Returns the list of tags that are indexed inside INDEXNAME. Returns empty list in case there are no tags indexed in this index. Note: uses get_field_tags() defined before. Example: indexname='author', output=['100__%', '700__%'].""" out = [] query = """SELECT f.code FROM idxINDEX AS w, idxINDEX_field AS wf, field AS f WHERE w.name='%s' AND w.id=wf.id_idxINDEX AND f.id=wf.id_field""" % indexname res = run_sql(query) for row in res: out.extend(get_field_tags(row[0])) return out def get_all_indexes(): """Returns the list of the names of all defined words indexes. Returns empty list in case no indexes are defined. Example: output=['global', 'author'].""" out = [] query = """SELECT name FROM idxINDEX""" res = run_sql(query) for row in res: out.append(row[0]) return out def usage(code, msg=''): "Prints usage for this module."
if msg: sys.stderr.write("Error: %s.\n" % msg) print >> sys.stderr, \ """ Usage: %s [options] Examples: %s -a -i 234-250,293,300-500 -u admin@cdsware %s -a -w author,fulltext -M 8192 -v3 %s -d -m +4d -A on --flush=10000 Indexing options: -a, --add add or update words for selected records -d, --del delete words for selected records -i, --id=low[-high] select according to doc recID -m, --modified=from[,to] select according to modification date -c, --collection=c1[,c2] select according to collection Repairing options: -k, --check check consistency for all records in the table(s) -r, --repair try to repair all records in the table(s) Specific options: -w, --windex=w1[,w2] word/phrase indexes to consider (all) -M, --maxmem=XXX maximum memory usage in kB (no limit) -f, --flush=NNN full consistent table flush after NNN records (10000) Scheduling options: -u, --user=USER user name to store task, password needed -s, --sleeptime=SLEEP time after which to repeat tasks (no) e.g.: 1s, 30m, 24h, 7d -t, --time=TIME moment for the task to be active (now) e.g.: +15s, 5m, 3h, 2002-10-27 13:57:26 General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ % ((sys.argv[0],) * 4) sys.exit(code) def authenticate(user, header="BibIndex Task Submission", action="runbibindex"): """Authenticate the user against the user database. Check for its password, if it exists. Check for action access rights. Return user name upon authorization success, do system exit upon authorization failure. """ print header print "=" * len(header) if user == "": print >> sys.stdout, "\rUsername: ", user = string.strip(string.lower(sys.stdin.readline())) else: print >> sys.stdout, "\rUsername: ", user ## first check user pw: res = run_sql("select id,password from user where email=%s", (user,), 1) if not res: print "Sorry, %s does not exist."
% user sys.exit(1) else: (uid_db, password_db) = res[0] if password_db: password_entered = getpass.getpass() if password_db == password_entered: pass else: print "Sorry, wrong credentials for %s." % user sys.exit(1) ## secondly check authorization for the action: (auth_code, auth_message) = acc_authorize_action(uid_db, action) if auth_code != 0: print auth_message sys.exit(1) return user def split_ranges(parse_string): recIDs = [] ranges = string.split(parse_string, ",") for range in ranges: tmp_recIDs = string.split(range, "-") if len(tmp_recIDs)==1: recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])]) else: if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check tmp = tmp_recIDs[0] tmp_recIDs[0] = tmp_recIDs[1] tmp_recIDs[1] = tmp recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])]) return recIDs def get_word_tables(tables): wordTables = [] if tables: indexes = string.split(tables, ",") for index in indexes: index_id = get_index_id(index) if index_id: wordTables.append({"idxWORD%02dF" % index_id: \ get_index_tags(index)}) else: write_message("Error: There is no %s words table." % index, sys.stderr) else: for index in get_all_indexes(): index_id = get_index_id(index) wordTables.append({"idxWORD%02dF" % index_id: \ get_index_tags(index)}) return wordTables def get_date_range(var): "Returns the two dates contained as a low,high tuple" limits = string.split(var, ",") if len(limits)==1: low = get_datetime(limits[0]) return low,None if len(limits)==2: low = get_datetime(limits[0]) high = get_datetime(limits[1]) return low,high def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string. 
It can handle normal date strings and shifts with respect to now.""" date = time.time() factors = {"d":24*3600, "h":3600, "m":60, "s":1} m = sre_datetime_shift.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date def create_range_list(res): """Creates a range list from a recID select query result contained in res. The result is expected to have ascending numerical order.""" if not res: return [] row = res[0] if not row: return [] else: range_list = [[row[0],row[0]]] for row in res[1:]: id = row[0] if id == range_list[-1][1] + 1: range_list[-1][1] = id else: range_list.append([id,id]) return range_list def beautify_range_list(range_list): """Returns a non overlapping, maximal range list""" ret_list = [] for new in range_list: found = 0 for old in ret_list: if new[0] <= old[0] <= new[1] + 1 or new[0] - 1 <= old[1] <= new[1]: old[0] = min(old[0], new[0]) old[1] = max(old[1], new[1]) found = 1 break if not found: ret_list.append(new) return ret_list def serialize_via_numeric_array(arr): """Serialize Numeric array into a compressed string.""" return compress(Numeric.dumps(arr)) def deserialize_via_numeric_array(string): """Decompress and deserialize string into a Numeric array.""" return Numeric.loads(decompress(string)) def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return MySQLdb.escape_string(compress(dumps(obj))) def deserialize_via_marshal(string): """Decompress and deserialize string into a Python object via marshal.""" return loads(decompress(string)) class WordTable: "A class to hold the words table." def __init__(self, tablename, fields_to_index, separators="[^\s]"): "Creates words table instance." 
self.tablename = tablename self.recIDs_in_mem = [] self.fields_to_index = fields_to_index self.separators = separators self.value = {} def get_field(self, recID, tag): """Returns list of values of the MARC-21 'tag' fields for the record 'recID'.""" out = [] bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag); res = run_sql(query) for row in res: out.append(row[0]) return out def clean(self): "Cleans the words table." self.value={} def put_into_db(self, mode="normal", split=string.split): """Updates the current words table in the corresponding MySQL's idxFOO table. Mode 'normal' means normal execution, mode 'emergency' means words index reverting to old state. """ if options["verbose"]: write_message("%s %s wordtable flush started" % (self.tablename,mode)) write_message('...updating %d words into %s started' % \ (len(self.value), self.tablename)) task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value))) self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \ (self.tablename[:-1], group[0], group[1]) if options["verbose"] >= 9: write_message(query) run_sql(query) nb_words_total = len(self.value) nb_words_report = int(nb_words_total/10) nb_words_done = 0 for word in self.value.keys(): self.put_word_into_db(word) nb_words_done += 1 if nb_words_report!=0 and ((nb_words_done % nb_words_report) == 0): if options["verbose"]: write_message('......processed %d/%d words' % (nb_words_done, nb_words_total)) task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total)) if options["verbose"] >= 9: write_message('...updating %d words into %s ended' % \ (nb_words_total, self.tablename)) 
if options["verbose"]: write_message('...updating reverse table %sR started' % self.tablename[:-1]) if mode == "normal": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ (self.tablename[:-1], group[0], group[1]) if options["verbose"] >= 9: write_message(query) run_sql(query) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \ (self.tablename[:-1], group[0], group[1]) if options["verbose"] >= 9: write_message(query) run_sql(query) if options["verbose"] >= 9: write_message('End of updating wordTable into %s' % self.tablename) elif mode == "emergency": for group in self.recIDs_in_mem: query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \ (self.tablename[:-1], group[0], group[1]) if options["verbose"] >= 9: write_message(query) run_sql(query) query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \ (self.tablename[:-1], group[0], group[1]) if options["verbose"] >= 9: write_message(query) run_sql(query) if options["verbose"] >= 9: write_message('End of emergency flushing wordTable into %s' % self.tablename) if options["verbose"]: write_message('...updating reverse table %sR ended' % self.tablename[:-1]) self.clean() self.recIDs_in_mem = [] if options["verbose"]: write_message("%s %s wordtable flush ended" % (self.tablename, mode)) task_update_progress("%s flush ended" % (self.tablename)) def load_old_recIDs(self,word): """Load existing hitlist for the word from the database index files.""" query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename res = run_sql(query, (word,)) if res: return deserialize_via_numeric_array(res[0][0]) else: return None def merge_with_old_recIDs(self,word,set): """Merge the system numbers stored in memory (hash of recIDs with value +1 or -1 according to whether to add/delete them) with those stored in the database index and 
received in set universe of recIDs for the given word. Return 0 in case no change was done to SET, return 1 in case SET was changed. """ set_changed_p = 0 for recID,sign in self.value[word].items(): if sign == -1 and set[recID]==1: # delete recID if existent in set and if marked as to be deleted set[recID] = 0 set_changed_p = 1 elif sign == 1 and set[recID] == 0: # add recID if not existent in set and if marked as to be added set[recID] = 1 set_changed_p = 1 return set_changed_p def put_word_into_db(self, word, split=string.split): """Flush a single word to the database and delete it from memory""" set = self.load_old_recIDs(word) if set: # merge the word recIDs found in memory: if self.merge_with_old_recIDs(word,set) == 0: # nothing to update: if options["verbose"] >= 9: write_message("......... unchanged hitlist for ``%s''" % word) pass else: # yes there were some new words: if options["verbose"] >= 9: write_message("......... updating hitlist for ``%s''" % word) run_sql("UPDATE %s SET hitlist=%%s WHERE term=%%s" % self.tablename, (serialize_via_numeric_array(set), word)) else: # the word is new, will create new set: set = Numeric.zeros(cfg_max_recID+1, Numeric.Int0) Numeric.put(set, self.value[word].keys(), 1) if options["verbose"] >= 9: write_message("......... inserting hitlist for ``%s''" % word) run_sql("INSERT INTO %s (term, hitlist) VALUES (%%s, %%s)" % self.tablename, (word, serialize_via_numeric_array(set))) if not set: # never store empty words run_sql("DELETE from %s WHERE term=%%s" % self.tablename, (word,)) del self.value[word] def display(self): "Displays the word table." keys = self.value.keys() keys.sort() for k in keys: if options["verbose"]: write_message("%s: %s" % (k, self.value[k])) def count(self): "Returns the number of words in the table." return len(self.value) def info(self): "Prints some information on the words table." if options["verbose"]: write_message("The words table contains %d words."
% self.count()) def lookup_words(self, word=""): "Look up a word in the words table." if not word: done = 0 while not done: try: word = raw_input("Enter word: ") done = 1 except (EOFError, KeyboardInterrupt): return if self.value.has_key(word): if options["verbose"]: write_message("The word '%s' is found %d times." \ % (word, len(self.value[word]))) else: if options["verbose"]: write_message("The word '%s' does not exist in the word file."\ % word) def update_last_updated(self, starting_time=None): """Update last_updated column of the index table in the database. Puts starting time there so that if the task was interrupted for record download, the records will be reindexed next time.""" if starting_time is None: return None if options["verbose"] >= 9: write_message("updating last_updated to %s..." % starting_time) return run_sql("UPDATE idxINDEX SET last_updated=%s WHERE id=%s", (starting_time, self.tablename[-3:-1],)) def add_recIDs(self, recIDs): """Fetches records whose IDs fall within the recIDs range list and adds them to the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
""" global chunksize flush_count = 0 records_done = 0 records_to_go = 0 for range in recIDs: records_to_go = records_to_go + range[1] - range[0] + 1 time_started = time.time() # will measure profile time for range in recIDs: i_low = range[0] chunksize_count = 0 while i_low <= range[1]: # calculate chunk group of recIDs and treat it: i_high = min(i_low+options["flush"]-flush_count-1,range[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.chk_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) task_update_status("ERROR") task_sig_stop_commands() sys.exit(1) if options["verbose"]: write_message("%s adding records #%d-#%d started" % \ (self.tablename, i_low, i_high)) if cfg_check_mysql_threads: kill_sleepy_mysql_threads() task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high)) self.del_recID_range(i_low, i_high) just_processed = self.add_recID_range(i_low, i_high) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + just_processed if options["verbose"]: write_message("%s adding records #%d-#%d ended " % \ (self.tablename, i_low, i_high)) if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= options["flush"]: self.put_into_db() self.clean() if options["verbose"]: write_message("%s backing up" % (self.tablename)) flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db() self.log_progress(time_started,records_done,records_to_go) def add_recIDs_by_date(self, dates): """Add records that were modified between DATES[0] and DATES[1]. If DATES is not set, then add records that were modified since the last update of the index. 
""" if not dates: id = self.tablename[-3:-1] query = """SELECT last_updated FROM idxINDEX WHERE id='%s' """ % id res = run_sql(query) if not res: return if not res[0][0]: dates = ("0000-00-00", None) else: dates = (res[0][0], None) if dates[1] is None: res = run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s ORDER BY b.id ASC""", (dates[0],)) elif dates[0] is None: res = run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date <= %s ORDER BY b.id ASC""", (dates[1],)) else: res = run_sql("""SELECT b.id FROM bibrec AS b WHERE b.modification_date >= %s AND b.modification_date <= %s ORDER BY b.id ASC""", (dates[0], dates[1])) list = create_range_list(res) if not list: if options["verbose"]: write_message( "No new records added. %s is up to date" % self.tablename) else: self.add_recIDs(list) def add_recID_range(self, recID1, recID2): empty_list_string = serialize_via_marshal([]) wlist = {} self.recIDs_in_mem.append([recID1,recID2]) # secondly fetch all needed tags: for tag in self.fields_to_index: if tag in tagToWordsFunctions.keys(): get_words_function = tagToWordsFunctions[tag] else: get_words_function = get_words_from_phrase bibXXx = "bib" + tag[0] + tag[1] + "x" bibrec_bibXXx = "bibrec_" + bibXXx query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec BETWEEN %d AND %d AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag) res = run_sql(query) nb_total_to_read = len(res) verbose_idx = 0 # for verbose pretty printing for row in res: recID,phrase = row if not wlist.has_key(recID): wlist[recID] = [] if tag == "8564_u": # Special treatment for fulltext indexing. 8564 # $$u contains URL, and $$y link name. If $$y is # actually a file name, that is if it ends with # something like .pdf or .ppt, then $$u is treated # as direct URL to the PDF file, and is indexed as # such. This is useful to index Indico files. # FIXME: this is a quick fix only. 
We should # rather download all 856 $$u files and analyze # content in order to decide how to index them # (directly for Indico, indirectly for Setlink). filename = get_associated_subfield_value(recID,'8564_u', phrase, 'y') filename_extension = lower(split(filename, ".")[-1]) if filename_extension in conv_programs.keys(): new_words = get_words_function(phrase, force_file_extension=filename_extension) # ,self.separators else: new_words = get_words_function(phrase) # ,self.separators else: new_words = get_words_function(phrase) # ,self.separators wlist[recID] = list_union(new_words,wlist[recID]) # were there some words for these recIDs found? if len(wlist) == 0: return 0 recIDs = wlist.keys() for recID in recIDs: # was this record marked as deleted? if "DELETED" in self.get_field(recID, "980__c"): wlist[recID] = [] if options["verbose"] >= 9: write_message("... record %d was declared deleted, removing its word list" % recID) if options["verbose"] >= 9: write_message("... record %d, termlist: %s" % (recID, wlist[recID])) # Using cStringIO for speed. 
query_factory = cStringIO.StringIO() qwrite = query_factory.write qwrite( "INSERT INTO %sR (id_bibrec,termlist,type) VALUES" % self.tablename[:-1]) qwrite( "('" ) qwrite( str(recIDs[0]) ) qwrite( "','" ) qwrite( serialize_via_marshal(wlist[recIDs[0]]) ) qwrite( "','FUTURE')" ) for recID in recIDs[1:]: qwrite(",('") qwrite(str(recID)) qwrite("','") qwrite(serialize_via_marshal(wlist[recID])) qwrite("','FUTURE')") query = query_factory.getvalue() query_factory.close() run_sql(query) query_factory = cStringIO.StringIO() qwrite = query_factory.write qwrite("INSERT INTO %sR (id_bibrec,termlist,type) VALUES" % self.tablename[:-1]) qwrite("('") qwrite(str(recIDs[0])) qwrite("','") qwrite(empty_list_string) qwrite("','CURRENT')") for recID in recIDs[1:]: qwrite( ",('" ) qwrite( str(recID) ) qwrite( "','" ) qwrite( empty_list_string ) qwrite( "','CURRENT')" ) query = query_factory.getvalue() query_factory.close() try: run_sql(query) except MySQLdb.DatabaseError: # ok, we tried to add an existing record. No problem pass put = self.put for recID in recIDs: for w in wlist[recID]: put(recID, w, 1) return len(recIDs) def log_progress(self, start, done, todo): """Calculate progress and store it. start: start time, done: records processed, todo: total number of records""" time_elapsed = time.time() - start # consistency check if time_elapsed == 0 or done > todo: return time_recs_per_min = done/(time_elapsed/60.0) if options["verbose"]: write_message("%d records took %.1f seconds to complete. (%.1f recs/min)"\ % (done, time_elapsed, time_recs_per_min)) if time_recs_per_min: if options["verbose"]: write_message("Estimated runtime: %.1f minutes" % \ ((todo-done)/time_recs_per_min)) def put(self, recID, word, sign): "Adds/deletes a word to the word list."
try: word = lower(word[:50]) if self.value.has_key(word): # the word 'word' exists already: update sign self.value[word][recID] = sign else: self.value[word] = {recID: sign} except: write_message("Error: Cannot put word %s with sign %d for recID %s." % (word, sign, recID)) def del_recIDs(self, recIDs): """Fetches records whose IDs fall within the recIDs range list and deletes them from the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ count = 0 for range in recIDs: self.del_recID_range(range[0],range[1]) count = count + range[1] - range[0] + 1 self.put_into_db() def del_recID_range(self, low, high): """Deletes records with 'recID' system number between low and high from the in-memory words index table.""" if options["verbose"] > 2: write_message("%s fetching existing words for records #%d-#%d started" % \ (self.tablename, low, high)) self.recIDs_in_mem.append([low,high]) query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) recID_rows = run_sql(query) for recID_row in recID_rows: recID = recID_row[0] wlist = deserialize_via_marshal(recID_row[1]) for word in wlist: self.put(recID, word, -1) if options["verbose"] > 2: write_message("%s fetching existing words for records #%d-#%d ended" % \ (self.tablename, low, high)) def report_on_table_consistency(self): """Check reverse words index tables (e.g. idxWORD01R) for interesting states such as 'TEMPORARY' state. Prints a short report (number of words, number of records in a bad state).
""" # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: if options["verbose"]: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d records" % \ (self.tablename, nb_bad_records, nb_records)) else: if options["verbose"]: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_records = res[0][0] else: nb_records = 0 if nb_bad_records == 0: return query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \ % (self.tablename[:-1]) res = run_sql(query) recIDs = create_range_list(res) flush_count = 0 records_done = 0 records_to_go = 0 for range in recIDs: records_to_go = records_to_go + range[1] - range[0] + 1 time_started = time.time() # will measure profile time for range in recIDs: i_low = range[0] chunksize_count = 0 while i_low <= range[1]: # calculate chunk group of recIDs and treat it: i_high = 
min(i_low+options["flush"]-flush_count-1,range[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.fix_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) task_update_status("ERROR") task_sig_stop_commands() sys.exit(1) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= options["flush"]: self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started,records_done,records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) res = run_sql(query, None, 1) if res[0][0]==0: if options["verbose"]: write_message("%s for %d-%d is in consistent state"%(self.tablename,low,high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) write_message("""EMERGENCY: Errors found. You should check consistency of the %s - %sR tables.\nRunning 'bibindex --repair' is recommended.""" \ % (self.tablename, self.tablename[:-1])) raise StandardError def fix_recID_range(self, low, high): """Try to fix reverse index database consistency (e.g. table idxWORD01R) in the low,high doc-id range. Possible states for a recID follow: CUR TMP FUT: very bad things have happened: warn! CUR TMP : very bad things have happened: warn! 
CUR FUT: delete FUT (crash before flushing) CUR : database is ok TMP FUT: add TMP to memory and del FUT from memory flush (revert to old state) TMP : very bad things have happened: warn! FUT: very bad things have happened: warn! """ state = {} query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\ % (self.tablename[:-1], low, high) res = run_sql(query) for row in res: if not state.has_key(row[0]): state[row[0]]=[] state[row[0]].append(row[1]) ok = 1 # will hold info on whether we will be able to repair for recID in state.keys(): if not 'TEMPORARY' in state[recID]: if 'FUTURE' in state[recID]: if 'CURRENT' not in state[recID]: write_message("EMERGENCY: Record %d is in inconsistent state. Can't repair it." % recID) ok = 0 else: write_message("EMERGENCY: Inconsistency in record %d detected" % recID) query = """DELETE FROM %sR WHERE id_bibrec='%d' AND type='FUTURE'""" % (self.tablename[:-1], recID) run_sql(query) write_message("EMERGENCY: Inconsistency in record %d repaired." % recID) else: if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]: self.recIDs_in_mem.append([recID,recID]) # Get the words file query = """SELECT type,termlist FROM %sR WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID) if options["verbose"] >= 9: write_message(query) res = run_sql(query) for row in res: wlist = deserialize_via_marshal(row[1]) if options["verbose"] >= 9: write_message("Words are %s " % wlist) if row[0] == 'TEMPORARY': sign = 1 else: sign = -1 for word in wlist: self.put(recID, word, sign) else: write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID)) ok = 0 if not ok: write_message("""EMERGENCY: Unrepairable errors found. You should check consistency of the %s - %sR tables. Deleting affected records is recommended.""" % (self.tablename, self.tablename[:-1])) raise StandardError def task_run(row): """Run the indexing task. The row argument is the BibSched task queue row, containing id, arguments, etc.
Return 1 in case of success and 0 in case of failure. """ global options, task_id, wordTables, stemmer, stopwords # read from SQL row: task_id = row[0] task_proc = row[1] options = loads(row[6]) task_status = row[7] # sanity check: if task_proc != "bibindex": write_message("The task #%d does not seem to be a BibIndex task." % task_id, sys.stderr) return 0 if task_status != "WAITING": write_message("The task #%d is %s. I expected WAITING." % (task_id, task_status), sys.stderr) return 0 # we can run the task now: if options["verbose"]: write_message("Task #%d started." % task_id) task_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) task_update_status("RUNNING") # install signal handlers signal.signal(signal.SIGUSR1, task_sig_sleep) signal.signal(signal.SIGTERM, task_sig_stop) signal.signal(signal.SIGABRT, task_sig_suicide) signal.signal(signal.SIGCONT, task_sig_wakeup) signal.signal(signal.SIGINT, task_sig_unknown) ## go ahead and treat each table: for table in options["windex"]: wordTable = WordTable(table.keys()[0], table.values()[0]) wordTable.report_on_table_consistency() try: if options["cmd"] == "del": if options["id"]: wordTable.del_recIDs(options["id"]) elif options["collection"]: l_of_colls = string.split(options["collection"], ",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) else: write_message("Missing IDs of records to delete from index %s." % wordTable.tablename, sys.stderr) raise StandardError elif options["cmd"] == "add": if options["id"]: wordTable.add_recIDs(options["id"]) elif options["collection"]: l_of_colls = string.split(options["collection"], ",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range) else: wordTable.add_recIDs_by_date(options["modified"]) # only update last_updated if run via automatic mode:
wordTable.update_last_updated(task_starting_time) elif options["cmd"] == "repair": wordTable.repair() else: write_message("Invalid command found processing %s" % \ wordTable.tablename, sys.stderr) raise StandardError except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) task_update_status("ERROR") task_sig_stop_commands() sys.exit(1) wordTable.report_on_table_consistency() # We are done. State it in the database, close and quit task_update_status("DONE") if options["verbose"]: write_message("Task #%d finished." % task_id) return 1 def command_line(): global options long_flags =["add","del","id=","modified=","collection=", "windex=", "check","repair","maxmem=", "flush=","user=","sleeptime=", "time=","help", "version", "verbose="] short_flags ="adi:m:c:w:krM:f:u:s:t:hVv:" format_string = "%Y-%m-%d %H:%M:%S" tables = None sleeptime = "" try: opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags) except getopt.GetoptError, err: write_message(err, sys.stderr) usage(1) if args: usage(1) options={"cmd":"add", "id":[], "modified":[], "collection":[], "maxmem":0, "flush":10000, "sleeptime":0, "verbose":1 } sched_time = time.strftime(format_string) user = "" # Check for key options try: for opt in opts: if opt == ("-h","") or opt == ("--help",""): usage(1) elif opt == ("-V","") or opt == ("--version",""): print bibindex_engine_version sys.exit(1) elif opt[0] in ["--verbose", "-v"]: options["verbose"] = int(opt[1]) elif opt == ("-a","") or opt == ("--add",""): options["cmd"] = "add" if ("-d","") in opts or ("--del","") in opts: usage(1) elif opt == ("-k","") or opt == ("--check",""): options["cmd"] = "check" elif opt == ("-r","") or opt == ("--repair",""): options["cmd"] = "repair" elif opt == ("-d","") or opt == ("--del",""): options["cmd"]="del" elif opt[0] in [ "-i", "--id" ]: options["id"] = options["id"] + split_ranges(opt[1]) elif opt[0] in [ "-m", "--modified" ]:
options["modified"] = get_date_range(opt[1]) elif opt[0] in [ "-c", "--collection" ]: options["collection"] = opt[1] elif opt[0] in [ "-w", "--windex" ]: tables = opt[1] elif opt[0] in [ "-M", "--maxmem"]: options["maxmem"]=int(opt[1]) if options["maxmem"] < base_process_size + 1000: raise StandardError, "Memory usage should be higher than %d kB" % (base_process_size + 1000) elif opt[0] in [ "-f", "--flush"]: options["flush"]=int(opt[1]) elif opt[0] in [ "-u", "--user"]: user = opt[1] elif opt[0] in [ "-s", "--sleeptime" ]: get_datetime(opt[1]) # see if it is a valid shift sleeptime= opt[1] elif opt[0] in [ "-t", "--time" ]: sched_time= get_datetime(opt[1]) else: usage(1) except StandardError, e: write_message(e, sys.stderr) sys.exit(1) options["windex"]=get_word_tables(tables) if options["cmd"] == "check": for table in options["windex"]: wordTable = WordTable(table.keys()[0], table.values()[0]) wordTable.report_on_table_consistency() return user = authenticate(user) if options["verbose"] >= 9: print "" write_message("storing task options %s\n" % options) new_task_id = run_sql("""INSERT INTO schTASK (proc,user,runtime,sleeptime,arguments,status) VALUES ('bibindex',%s,%s,%s,%s,'WAITING')""", (user, sched_time, sleeptime, dumps(options))) print "Task #%d was successfully scheduled for execution." 
% new_task_id return def task_sig_sleep(sig, frame): """Signal handler for the 'sleep' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("sleeping...") task_update_status("SLEEPING") signal.pause() # wait for wake-up signal def task_sig_wakeup(sig, frame): """Signal handler for the 'wakeup' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("continuing...") task_update_status("CONTINUING") def task_sig_stop(sig, frame): """Signal handler for the 'stop' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("stopping...") task_update_status("STOPPING") errcode = 0 try: task_sig_stop_commands() write_message("stopped") task_update_status("STOPPED") except StandardError, err: write_message("Error during stopping! %s" % err) task_update_status("STOPPINGFAILED") errcode = 1 sys.exit(errcode) def task_sig_stop_commands(): """Do all the commands necessary to stop the task before quitting. Useful for task_sig_stop() handler. """ write_message("stopping commands started") for table in wordTables: table.put_into_db() write_message("stopping commands ended") def task_sig_suicide(sig, frame): """Signal handler for the 'suicide' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("suiciding myself now...") task_update_status("SUICIDING") write_message("suicided") task_update_status("SUICIDED") sys.exit(0) def task_sig_unknown(sig, frame): """Signal handler for the other unknown signals sent by shell or user.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("unknown signal %d ignored" % sig) # do nothing for other signals def task_update_progress(msg): """Updates progress information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task progress to %s."
% msg) return run_sql("UPDATE schTASK SET progress=%s where id=%s", (msg, task_id)) def task_update_status(val): """Updates state information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task status to %s." % val) return run_sql("UPDATE schTASK SET status=%s where id=%s", (val, task_id)) def test_fulltext_indexing(): """Tests fulltext indexing programs on PDF, PS, DOC, PPT, XLS. Prints list of words and word table on the screen. Does not integrate anything into the database. Useful when debugging problems with fulltext indexing: call this function instead of main(). """ options = {} options["verbose"] = 9 print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=atlnot&categ=Communication&id=com-indet-2002-012") # protected URL print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a00388&id=a00388s2t7") # XLS print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a02883&id=a02883s1t6/transparencies") # PPT print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=agenda&categ=a99149&id=a99149s1t10/transparencies") # DOC print get_words_from_fulltext("http://doc.cern.ch/cgi-bin/setlink?base=preprint&categ=cern&id=lhc-project-report-601") # PDF sys.exit(0) def test_word_separators(phrase="hep-th/0101001"): """Tests word separating policy on various input.""" print "%s:" % phrase for word in get_words_from_phrase(phrase): print "\t-> %s" % word def main(): """Reads arguments and either runs the task, or starts user-interface (command line).""" if len(sys.argv) == 2: try: id = int(sys.argv[1]) except StandardError, err: command_line() sys.exit() res = run_sql("SELECT * FROM schTASK WHERE id='%d'" % (id), None, 1) if not res: write_message("Selected task not found.", sys.stderr) sys.exit(1) try: if not task_run(res[0]): write_message("Error occurred. 
Exiting.", sys.stderr) except StandardError, e: write_message("Unexpected error occurred: %s." % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) write_message("Exiting.") task_update_status("ERROR") else: command_line() diff --git a/modules/bibindex/lib/bibindex_engine_config.py b/modules/bibindex/lib/bibindex_engine_config.py index 501b87fa1..5280d07db 100644 --- a/modules/bibindex/lib/bibindex_engine_config.py +++ b/modules/bibindex/lib/bibindex_engine_config.py @@ -1,66 +1,66 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibIndex indexing engine configuration parameters. 
""" ## configuration parameters read from the general config file: -from config import cfg_bibindex_fulltext_index_local_files_only, \ - cfg_bibindex_stemmer_default_language, \ - cfg_bibindex_remove_stopwords, \ - cfg_bibindex_path_to_stopwords_file, \ - cfg_bibindex_chars_alphanumeric_separators, \ - cfg_bibindex_chars_punctuation, \ - cfg_bibindex_remove_html_markup, \ - cfg_bibindex_min_word_length, \ - cfg_bibindex_urlopener_username, \ - cfg_bibindex_urlopener_password, \ - version, \ - pdftotext, \ - pstotext, \ - pstoascii, \ - antiword, \ - catdoc, \ - wvtext, \ - ppthtml, \ - xlhtml, \ - htmltotext, \ - gzip +from cdsware.config import cfg_bibindex_fulltext_index_local_files_only, \ + cfg_bibindex_stemmer_default_language, \ + cfg_bibindex_remove_stopwords, \ + cfg_bibindex_path_to_stopwords_file, \ + cfg_bibindex_chars_alphanumeric_separators, \ + cfg_bibindex_chars_punctuation, \ + cfg_bibindex_remove_html_markup, \ + cfg_bibindex_min_word_length, \ + cfg_bibindex_urlopener_username, \ + cfg_bibindex_urlopener_password, \ + version, \ + pdftotext, \ + pstotext, \ + pstoascii, \ + antiword, \ + catdoc, \ + wvtext, \ + ppthtml, \ + xlhtml, \ + htmltotext, \ + gzip ## version number: bibindex_engine_version = "CDSware/%s bibindex/%s" % (version, version) ## programs used to convert fulltext files to text: conv_programs = {#"ps": [pstotext,pstoascii], # switched off at the moment, since PDF is faster #"ps.gz": [pstotext,pstoascii], "pdf": [pdftotext,pstotext,pstoascii], "doc": [antiword,catdoc,wvtext], "ppt": [ppthtml], "xls": [xlhtml]} ## helper programs used if the above programs convert only to html or other intermediate file formats: conv_programs_helpers = {"html": htmltotext, "gz": gzip} ## safety parameters concerning MySQL thread-multiplication problem: cfg_check_mysql_threads = 0 # to check or not to check the problem? 
cfg_max_mysql_threads = 50 # how many threads (connections) we consider as still safe cfg_mysql_thread_timeout = 20 # we'll kill threads that were sleeping for more than X seconds diff --git a/modules/bibindex/lib/bibindex_engine_stemmer.py b/modules/bibindex/lib/bibindex_engine_stemmer.py index f9a43a8e0..7275fd752 100644 --- a/modules/bibindex/lib/bibindex_engine_stemmer.py +++ b/modules/bibindex/lib/bibindex_engine_stemmer.py @@ -1,50 +1,49 @@ ## $Id$ -## BibIndex stemmer. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -from bibindex_engine_config import * +from cdsware.bibindex_engine_config import * def create_stemmers(): """Create stemmers dictionary for all possible languages.""" languages = {'fr': 'french', 'en': 'english', 'no':'norwegian', 'sv':'swedish', 'de': 'german', 'it':'italian', 'pt':'portuguese'} stemmers = {} try: import Stemmer for (key, value) in languages.iteritems(): stemmers[key] = Stemmer.Stemmer(value) except ImportError: pass # PyStemmer isn't available return stemmers stemmers = create_stemmers() def is_stemmer_available_for_language(lang): """Return true if stemmer for language LANG is available. Return false otherwise. 
""" return stemmers.has_key(lang) def stem(word, lang=cfg_bibindex_stemmer_default_language): """Return WORD stemmed according to language LANG (e.g. 'en').""" if lang and is_stemmer_available_for_language(lang): return stemmers[lang].stem(word) else: return word diff --git a/modules/bibindex/lib/bibindex_engine_stemmer_tests.py b/modules/bibindex/lib/bibindex_engine_stemmer_tests.py index 0ce89fbcc..e20fa7a82 100644 --- a/modules/bibindex/lib/bibindex_engine_stemmer_tests.py +++ b/modules/bibindex/lib/bibindex_engine_stemmer_tests.py @@ -1,46 +1,47 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Unit tests for the indexing engine.""" __version__ = "$Id$" -import bibindex_engine_stemmer import unittest +from cdsware import bibindex_engine_stemmer + class TestStemmer(unittest.TestCase): """Test stemming, if available.""" def test_stemmer_none(self): """bibindex engine - no stemmer""" self.assertEqual("information", bibindex_engine_stemmer.stem("information", None)) def test_stemmer_english(self): """bibindex engine - English stemmer""" self.assertEqual("inform", bibindex_engine_stemmer.stem("information", "en")) def create_test_suite(): """Return test suite for the indexing engine.""" return unittest.TestSuite((unittest.makeSuite(TestStemmer,'test'),)) if __name__ == "__main__": unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibindex/lib/bibindex_engine_stopwords.py b/modules/bibindex/lib/bibindex_engine_stopwords.py index aff7f5342..131dea438 100644 --- a/modules/bibindex/lib/bibindex_engine_stopwords.py +++ b/modules/bibindex/lib/bibindex_engine_stopwords.py @@ -1,51 +1,51 @@ ## $Id$ ## BibIndex stopwords facility. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
 
 import string
-from bibindex_engine_config import *
+from cdsware.bibindex_engine_config import *
 
 def create_stopwords(filename=cfg_bibindex_path_to_stopwords_file):
     """Create stopword dictionary out of FILENAME."""
     try:
         filename = open(filename, 'r')
     except:
         return {}
     lines = filename.readlines()
     filename.close()
     stopdict = {}
     for line in lines:
         stopdict[string.rstrip(line)] = 1
     return stopdict
 
 stopwords = create_stopwords()
 
 def is_stopword(word, force_check=0):
     """Return true if WORD is found among stopwords, false otherwise.
        Also, return false if BibIndex wasn't configured to use stopwords.
        However, if FORCE_CHECK is set to 1, then do not pay attention to
        whether the admin disabled stopwords functionality, but look up the
        word anyway. This mode is useful for ranking.
     """
     # note: input word is assumed to be in lowercase
     if (cfg_bibindex_remove_stopwords or force_check) and stopwords.has_key(word):
         return True
     return False
diff --git a/modules/bibindex/lib/bibindex_engine_tests.py b/modules/bibindex/lib/bibindex_engine_tests.py
index ff4015137..aeaa81e4f 100644
--- a/modules/bibindex/lib/bibindex_engine_tests.py
+++ b/modules/bibindex/lib/bibindex_engine_tests.py
@@ -1,41 +1,42 @@
 # -*- coding: utf-8 -*-
 ##
 ## $Id$
 ##
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 """Unit tests for the indexing engine."""
 
 __version__ = "$Id$"
 
-import bibindex_engine
 import unittest
 
+from cdsware import bibindex_engine
+
 class TestListSetOperations(unittest.TestCase):
     """Test list set operations."""
 
     def test_list_union(self):
         """bibindex engine - list union"""
         self.assertEqual([1,2,3,4], bibindex_engine.list_union([1,2,3],[1,3,4]))
 
 def create_test_suite():
     """Return test suite for the indexing engine."""
     return unittest.TestSuite((unittest.makeSuite(TestListSetOperations,'test'),))
 
 if __name__ == "__main__":
     unittest.TextTestRunner(verbosity=2).run(create_test_suite())
diff --git a/modules/bibindex/lib/bibindexadminlib.py b/modules/bibindex/lib/bibindexadminlib.py
index f8ebe6604..b5cb567f1 100644
--- a/modules/bibindex/lib/bibindexadminlib.py
+++ b/modules/bibindex/lib/bibindexadminlib.py
@@ -1,1651 +1,1652 @@
 ## $Id$
 ## Administrator interface for BibIndex
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 """CDSware BibIndex Administrator Interface."""
 
 import cgi
 import re
 import MySQLdb
 import Numeric
 import os
 import urllib
 import time
 import random
 from zlib import compress,decompress
-from bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps,serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array,serialize_via_marshal,deserialize_via_marshal
-from messages import *
-from dbquery import run_sql
-from config import *
-from webpage import page, pageheaderonly, pagefooteronly
-from webuser import getUid, get_email
 from mod_python import apache
-from search_engine import nice_number
+
+from cdsware.bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps,serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array,serialize_via_marshal,deserialize_via_marshal
+from cdsware.messages import *
+from cdsware.dbquery import run_sql
+from cdsware.config import *
+from cdsware.webpage import page, pageheaderonly, pagefooteronly
+from cdsware.webuser import getUid, get_email
+from cdsware.search_engine import nice_number
 
 __version__ = "$Id$"
 
 def getnavtrail(previous = ''):
     """Get the navtrail"""
     navtrail = """Admin Area > BibIndex Admin """ % (weburl, weburl)
     navtrail = navtrail + previous
     return navtrail
 
 def perform_index(ln=cdslang, mtype='', content=''):
     """start area for modifying indexes
     mtype - the method that called this method.
     content - the output from that method."""
     fin_output = """
0. Show all 1. Overview of indexes 2. Edit index 3. Add new index
""" % (weburl, ln, weburl, ln, weburl, ln, weburl, ln) if mtype == "perform_showindexoverview" and content: fin_output += content elif mtype == "perform_showindexoverview" or not mtype: fin_output += perform_showindexoverview(ln, callback='') if mtype == "perform_editindexes" and content: fin_output += content elif mtype == "perform_editindexes" or not mtype: fin_output += perform_editindexes(ln, callback='') if mtype == "perform_addindex" and content: fin_output += content elif mtype == "perform_addindex" or not mtype: fin_output += perform_addindex(ln, callback='') return addadminbox("Menu", [fin_output]) def perform_field(ln=cdslang, mtype='', content=''): """Start area for modifying fields mtype - the method that called this method. content - the output from that method.""" fin_output = """
0. Show all 1. Overview of logical fields 2. Edit logical field 3. Add new logical field
""" % (weburl, ln, weburl, ln, weburl, ln, weburl, ln) if mtype == "perform_editfields" and content: fin_output += content elif mtype == "perform_editfields" or not mtype: fin_output += perform_editfields(ln, callback='') if mtype == "perform_addfield" and content: fin_output += content elif mtype == "perform_addfield" or not mtype: fin_output += perform_addfield(ln, callback='') if mtype == "perform_showfieldoverview" and content: fin_output += content elif mtype == "perform_showfieldoverview" or not mtype: fin_output += perform_showfieldoverview(ln, callback='') return addadminbox("Menu", [fin_output]) def perform_editfield(fldID, ln=cdslang, mtype='', content='', callback='yes', confirm=-1): """form to modify a field. this method is calling other methods which again is calling this and sending back the output of the method. if callback, the method will call perform_editcollection, if not, it will just return its output. fldID - id of the field mtype - the method that called this method. content - the output from that method.""" fld_dict = dict(get_def_name('', "field")) if fldID in [-1, "-1"]: return addadminbox("Edit logical field", ["""Please go back and select a logical field"""]) fin_output = """
Menu
0. Show all 1. Modify field code 2. Modify translations 3. Modify MARC tags 4. Delete field
5. Show field usage
""" % (weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln, weburl, fldID, ln) if mtype == "perform_modifyfield" and content: fin_output += content elif mtype == "perform_modifyfield" or not mtype: fin_output += perform_modifyfield(fldID, ln, callback='') if mtype == "perform_modifyfieldtranslations" and content: fin_output += content elif mtype == "perform_modifyfieldtranslations" or not mtype: fin_output += perform_modifyfieldtranslations(fldID, ln, callback='') if mtype == "perform_modifyfieldtags" and content: fin_output += content elif mtype == "perform_modifyfieldtags" or not mtype: fin_output += perform_modifyfieldtags(fldID, ln, callback='') if mtype == "perform_deletefield" and content: fin_output += content elif mtype == "perform_deletefield" or not mtype: fin_output += perform_deletefield(fldID, ln, callback='') return addadminbox("Edit logical field '%s'" % fld_dict[int(fldID)], [fin_output]) def perform_editindex(idxID, ln=cdslang, mtype='', content='', callback='yes', confirm=-1): """form to modify a index. this method is calling other methods which again is calling this and sending back the output of the method. idxID - id of the index mtype - the method that called this method. content - the output from that method.""" if idxID in [-1, "-1"]: return addadminbox("Edit index", ["""Please go back and select a index"""]) fin_output = """
Menu
0. Show all 1. Modify index name / descriptor 2. Modify translations 3. Modify index fields 4. Delete index
""" % (weburl, idxID, ln, weburl, idxID, ln, weburl, idxID, ln, weburl, idxID, ln, weburl, idxID, ln) if mtype == "perform_modifyindex" and content: fin_output += content elif mtype == "perform_modifyindex" or not mtype: fin_output += perform_modifyindex(idxID, ln, callback='') if mtype == "perform_modifyindextranslations" and content: fin_output += content elif mtype == "perform_modifyindextranslations" or not mtype: fin_output += perform_modifyindextranslations(idxID, ln, callback='') if mtype == "perform_modifyindexfields" and content: fin_output += content elif mtype == "perform_modifyindexfields" or not mtype: fin_output += perform_modifyindexfields(idxID, ln, callback='') if mtype == "perform_deleteindex" and content: fin_output += content elif mtype == "perform_deleteindex" or not mtype: fin_output += perform_deleteindex(idxID, ln, callback='') return addadminbox("Edit index", [fin_output]) def perform_showindexoverview(ln=cdslang, callback='', confirm=0): subtitle = """1. Overview of indexes""" output = """""" output += """""" % ("ID", "Name", "Fwd.Idx Size", "Rev.Idx Size", "Fwd.Idx Words", "Rev.Idx Records", "Last updated", "Fields", "Translations") idx = get_idx() idx_dict = dict(get_def_name('', "idxINDEX")) for idxID, idxNAME, idxDESC,idxUPD in idx: table_status_forward = get_table_status('idxWORD%sF' % (idxID < 10 and '0%s' % idxID or idxID)) table_status_reverse = get_table_status('idxWORD%sR' % (idxID < 10 and '0%s' % idxID or idxID)) if str(idxUPD)[-3:] == ".00": idxUPD = str(idxUPD)[0:-3] lang = get_lang_list("idxINDEXNAME", "id_idxINDEX", idxID) idx_fld = get_idx_fld(idxID) fld = "" for row in idx_fld: fld += row[1] + ", " if fld.endswith(", "): fld = fld[:-2] if len(fld) == 0: fld = """None""" date = (idxUPD and idxUPD or """Not updated""") if table_status_forward and table_status_reverse: output += """""" % (idxID, """%s""" % (weburl, idxID, ln, idxDESC, idx_dict[idxID]), "%s MB" % nice_number(table_status_forward[0][5] / 1048576), "%s MB" % 
nice_number(table_status_reverse[0][5] / 1048576), nice_number(table_status_forward[0][3]), nice_number(table_status_reverse[0][3]), date, fld, lang) elif not table_status_forward: output += """""" % (idxID, """%s""" % (weburl, idxID, ln, idx_dict[idxID]), "Error", "%s MB" % nice_number(table_status_reverse[0][5] / 1048576),"Error", nice_number(table_status_reverse[0][3]), date, "", lang) elif not table_status_reverse: output += """""" % (idxID, """%s""" % (weburl, idxID, ln, idx_dict[idxID]), "%s MB" % nice_number(table_status_forward[0][5] / 1048576), "Error", nice_number(table_status_forward[0][3]), "Error", date, "", lang) output += "
%s%s%s%s%s%s%s%s%s
%s%s%s%s%s%s%s%s%s
%s%s%s%s%s%s%s%s%s
%s%s%s%s%s%s%s%s%s
" try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(fldID, ln, "perform_showindexoverview", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_editindexes(ln=cdslang, callback='yes', content='', confirm=-1): """show a list of indexes that can be edited.""" subtitle = """2. Edit index   [?]""" % (weburl) fin_output = '' idx = get_idx() output = "" if len(idx) > 0: text = """ Index name """ output += createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/editindex" % weburl, text=text, button="Edit", ln=ln, confirm=1) else: output += """No indexes exists""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(ln, "perform_editindexes", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_editfields(ln=cdslang, callback='yes', content='', confirm=-1): """show a list of all logical fields that can be edited.""" subtitle = """5. Edit logical field   [?]""" % (weburl) fin_output = '' res = get_fld() output = "" if len(res) > 0: text = """ Field name """ output += createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/editfield" % weburl, text=text, button="Edit", ln=ln, confirm=1) else: output += """No logical fields exists""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_field(ln, "perform_editfields", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addindex(ln=cdslang, idxNAME='', callback="yes", confirm=-1): """form to add a new index. idxNAME - the name of the new index""" output = "" subtitle = """3. Add new index""" text = """ Index name
""" % idxNAME output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addindex" % weburl, text=text, ln=ln, button="Add index", confirm=1) if idxNAME and confirm in ["1", 1]: res = add_idx(idxNAME) output += write_outcome(res) + """
Configure this index.""" % (weburl, res[1], ln) elif confirm not in ["-1", -1]: output += """Please give the index a name. """ try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(ln, "perform_addindex", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindextranslations(idxID, ln=cdslang, sel_type='', trans=[], confirm=-1, callback='yes'): """Modify the translations of a index sel_type - the nametype to modify trans - the translations in the same order as the languages from get_languages()""" output = '' subtitle = '' cdslangs = get_languages() if confirm in ["2", 2] and idxID: finresult = modify_translations(idxID, cdslangs, sel_type, trans, "idxINDEX") idx_dict = dict(get_def_name('', "idxINDEX")) if idxID and idx_dict.has_key(int(idxID)): idxID = int(idxID) subtitle = """2. Modify translations for index.   [?]""" % weburl if type(trans) is str: trans = [trans] if sel_type == '': sel_type = get_idx_nametypes()[0][0] header = ['Language', 'Translation'] actions = [] types = get_idx_nametypes() if len(types) > 1: text = """ Name type """ output += createhiddenform(action="modifyindextranslations#2", text=text, button="Select", idxID=idxID, ln=ln, confirm=0) if confirm in [-1, "-1", 0, "0"]: trans = [] for (key, value) in cdslangs: try: trans_names = get_name(idxID, key, sel_type, "idxINDEX") trans.append(trans_names[0][0]) except StandardError, e: trans.append('') for nr in range(0,len(cdslangs)): actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))]) actions[-1].append('' % trans[nr]) text = tupletotable(header=header, tuple=actions) output += createhiddenform(action="modifyindextranslations#2", text=text, button="Modify", idxID=idxID, sel_type=sel_type, ln=ln, confirm=2) if sel_type and len(trans): if confirm in ["2", 2]: output += write_outcome(finresult) try: body = [output, extra] except NameError: body = [output] if callback: return 
perform_editindex(idxID, ln, "perform_modifyindextranslations", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyfieldtranslations(fldID, ln=cdslang, sel_type='', trans=[], confirm=-1, callback='yes'): """Modify the translations of a field sel_type - the nametype to modify trans - the translations in the same order as the languages from get_languages()""" output = '' subtitle = '' cdslangs = get_languages() if confirm in ["2", 2] and fldID: finresult = modify_translations(fldID, cdslangs, sel_type, trans, "field") fld_dict = dict(get_def_name('', "field")) if fldID and fld_dict.has_key(int(fldID)): fldID = int(fldID) subtitle = """3. Modify translations for logical field '%s'   [?]""" % (fld_dict[fldID], weburl) if type(trans) is str: trans = [trans] if sel_type == '': sel_type = get_fld_nametypes()[0][0] header = ['Language', 'Translation'] actions = [] types = get_fld_nametypes() if len(types) > 1: text = """ Name type """ output += createhiddenform(action="modifyfieldtranslations#3", text=text, button="Select", fldID=fldID, ln=ln, confirm=0) if confirm in [-1, "-1", 0, "0"]: trans = [] for (key, value) in cdslangs: try: trans_names = get_name(fldID, key, sel_type, "field") trans.append(trans_names[0][0]) except StandardError, e: trans.append('') for nr in range(0,len(cdslangs)): actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))]) actions[-1].append('' % trans[nr]) text = tupletotable(header=header, tuple=actions) output += createhiddenform(action="modifyfieldtranslations#3", text=text, button="Modify", fldID=fldID, sel_type=sel_type, ln=ln, confirm=2) if sel_type and len(trans): if confirm in ["2", 2]: output += write_outcome(finresult) try: body = [output, extra] except NameError: body = [output] if callback: return perform_editfield(fldID, ln, "perform_modifytranslations", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showdetailsfieldtag(fldID, 
tagID, ln=cdslang, callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" fld_dict = dict(get_def_name('', "field")) fldID = int(fldID) tagname = run_sql("SELECT name from tag where id=%s" % tagID)[0][0] output = "" subtitle = """Showing details for MARC tag '%s'""" % tagname output += "
This MARC tag is used directly in these logical fields: " fld_tag = get_fld_tags('', tagID) exist = {} for (id_field,id_tag, tname, tvalue, score) in fld_tag: output += "%s, " % fld_dict[int(id_field)] exist[id_field] = 1 output += "
This MARC tag is used indirectly in these logical fields: " tag = run_sql("SELECT value from tag where id=%s" % id_tag) tag = tag[0][0] for i in range(0, len(tag) - 1): res = run_sql("SELECT id_field,id_tag FROM field_tag,tag WHERE tag.id=field_tag.id_tag AND tag.value='%s%%'" % tag[0:i]) for (id_field, id_tag) in res: output += "%s, " % fld_dict[int(id_field)] exist[id_field] = 1 res = run_sql("SELECT id_field,id_tag FROM field_tag,tag WHERE tag.id=field_tag.id_tag AND tag.value like '%s'" % tag) for (id_field, id_tag) in res: if not exist.has_key(id_field): output += "%s, " % fld_dict[int(id_field)] try: body = [output, extra] except NameError: body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_showdetailsfieldtag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showdetailsfield(fldID, ln=cdslang, callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" fld_dict = dict(get_def_name('', "field")) col_dict = dict(get_def_name('', "collection")) fldID = int(fldID) col_fld = get_col_fld('', '', fldID) sort_types = dict(get_sort_nametypes()) fin_output = "" subtitle = """5. Show usage for logical field '%s'""" % fld_dict[fldID] output = "This logical field is used in these collections:
" ltype = '' exist = {} for (id_collection, id_field, id_fieldvalue, ftype, score, score_fieldvalue) in col_fld: if ltype != ftype: output += "
%s: " % sort_types[ftype] ltype = ftype exist = {} if not exist.has_key(id_collection): output += "%s, " % col_dict[int(id_collection)] exist[id_collection] = 1 if not col_fld: output = "This field is not used by any collections." fin_output = addadminbox('Collections', [output]) try: body = [fin_output, extra] except NameError: body = [fin_output] if callback: return perform_editfield(ln, "perform_showdetailsfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addfield(ln=cdslang, fldNAME='', code='', callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" output = "" subtitle = """6. Add new logical field""" code = str.replace(code,' ', '') text = """ Field name
Field code
""" % (fldNAME, code) output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addfield" % weburl, text=text, ln=ln, button="Add field", confirm=1) if fldNAME and code and confirm in ["1", 1]: res = add_fld(fldNAME, code) output += write_outcome(res) elif confirm not in ["-1", -1]: output += """Please give the logical field a name and code. """ try: body = [output, extra] except NameError: body = [output] if callback: return perform_field(ln, "perform_addfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_deletefield(fldID, ln=cdslang, callback='yes', confirm=0): """form to remove a field. fldID - the field id from table field. """ fld_dict = dict(get_def_name('', "field")) if not fld_dict.has_key(int(fldID)): return """Field does not exist""" subtitle = """4. Delete the logical field '%s'   [?]""" % (fld_dict[int(fldID)], weburl) output = "" if fldID: fldID = int(fldID) if confirm in ["0", 0]: check = run_sql("SELECT * from idxINDEX_field where id_field=%s" % fldID) text = "" if check: text += """This field is used in an index, deletion may cause problems.
""" text += """Do you want to delete the logical field '%s' and all its relations and definitions.""" % (fld_dict[fldID]) output += createhiddenform(action="deletefield#4", text=text, button="Confirm", fldID=fldID, confirm=1) elif confirm in ["1", 1]: res = delete_fld(fldID) if res[0] == 1: return """
Field deleted.""" + write_outcome(res) else: output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] if callback: return perform_editfield(fldID, ln, "perform_deletefield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_deleteindex(idxID, ln=cdslang, callback='yes', confirm=0): """form to delete an index. idxID - the index id from table idxINDEX. """ if idxID: subtitle = """4. Delete the index.   [?]""" % weburl output = "" if confirm in ["0", 0]: idx = get_idx(idxID) if idx: text = "" text += """By deleting an index, you may also loose any indexed data in the forward and reverse table for this index.
""" text += """Do you want to delete the index '%s' and all its relations and definitions.""" % (idx[0][1]) output += createhiddenform(action="deleteindex#5", text=text, button="Confirm", idxID=idxID, confirm=1) else: return """
Index specified does not exist.""" elif confirm in ["1", 1]: res = delete_idx(idxID) if res[0] == 1: return """
Index deleted.""" + write_outcome(res) else: output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] if callback: return perform_editindex(idxID, ln, "perform_deleteindex", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showfieldoverview(ln=cdslang, callback='', confirm=0): subtitle = """4. Logical fields overview""" output = """""" output += """""" % ("Field", "MARC Tags", "Translations") query = "SELECT id,name FROM field" res = run_sql(query) col_dict = dict(get_def_name('', "collection")) fld_dict = dict(get_def_name('', "field")) for field_id,field_name in res: query = "SELECT tag.value FROM tag, field_tag WHERE tag.id=field_tag.id_tag AND field_tag.id_field=%d ORDER BY field_tag.score DESC,tag.value ASC" % field_id res = run_sql(query) field_tags = "" for row in res: field_tags = field_tags + row[0] + ", " if field_tags.endswith(", "): field_tags = field_tags[:-2] if not field_tags: field_tags = """None""" lang = get_lang_list("fieldname", "id_field", field_id) output += """""" % ("""%s""" % (weburl, field_id, ln, fld_dict[field_id]), field_tags, lang) output += "
%s%s%s
%s%s%s
" try: body = [output, extra] except NameError: body = [output] if callback: return perform_field(fldID, ln, "perform_showfieldoverview", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindex(idxID, ln=cdslang, idxNAME='', idxDESC='', callback='yes', confirm=-1): """form to modify an index name. idxID - the index name to change. idxNAME - new name of index idxDESC - description of index content""" subtitle = "" output = "" if idxID not in [-1, "-1"]: subtitle = """1. Modify index name.   [?]""" % weburl if confirm in [-1, "-1"]: idx = get_idx(idxID) idxNAME = idx[0][1] idxDESC = idx[0][2] text = """ Index name
Index description
""" % (idxNAME, idxDESC) output += createhiddenform(action="modifyindex#1", text=text, button="Modify", idxID=idxID, ln=ln, confirm=1) if idxID > -1 and idxNAME and confirm in [1, "1"]: res = modify_idx(idxID, idxNAME, idxDESC) output += write_outcome(res) elif confirm in [1, "1"]: output += """
Please give a name for the index.""" else: output = """No index to modify.""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_editindex(idxID, ln, "perform_modifyindex", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyfield(fldID, ln=cdslang, code='', callback='yes', confirm=-1): """form to modify a field. fldID - the field to change.""" subtitle = "" output = "" fld_dict = dict(get_def_name('', "field")) if fldID not in [-1, "-1"]: if confirm in [-1, "-1"]: res = get_fld(fldID) code = res[0][2] else: code = str.replace("%s" % code, " ", "") fldID = int(fldID) subtitle = """1. Modify field code for logical field '%s'   [?]""" % (fld_dict[int(fldID)], weburl) text = """ Field code
""" % code output += createhiddenform(action="modifyfield#2", text=text, button="Modify", fldID=fldID, ln=ln, confirm=1) if fldID > -1 and confirm in [1, "1"]: fldID = int(fldID) res = modify_fld(fldID, code) output += write_outcome(res) else: output = """No field to modify. """ try: body = [output, extra] except NameError: body = [output] if callback: return perform_editfield(fldID, ln, "perform_modifyfield", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyindexfields(idxID, ln=cdslang, callback='yes', content='', confirm=-1): """Modify which logical fields to use in this index..""" output = '' subtitle = """3. Modify index fields.   [?]""" % weburl output = """
Menu
Add field to index
Manage fields
""" % (weburl, idxID, ln, weburl, ln) header = ['Field', ''] actions = [] idx_fld = get_idx_fld(idxID) if len(idx_fld) > 0: for (idxID, idxNAME,fldID, fldNAME, regexp_punct, regexp_alpha_sep) in idx_fld: actions.append([fldNAME]) for col in [(('Remove','removeindexfield'),)]: actions[-1].append('%s' % (weburl, col[0][1], idxID, fldID, ln, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (weburl, function, idxID, fldID, ln, str) output += tupletotable(header=header, tuple=actions) else: output += """No index fields exists""" output += content try: body = [output, extra] except NameError: body = [output] if callback: return perform_editindex(idxID, ln, "perform_modifyindexfields", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyfieldtags(fldID, ln=cdslang, callback='yes', content='', confirm=-1): """show the sort fields of this collection..""" output = '' fld_dict = dict(get_def_name('', "field")) fld_type = get_fld_nametypes() fldID = int(fldID) subtitle = """3. Modify MARC tags for the logical field '%s'   [?]""" % (fld_dict[int(fldID)], weburl) output = """
Menu
Add MARC tag
Delete unused MARC tags
""" % (weburl, fldID, ln, weburl, fldID, ln) header = ['', 'Value', 'Comment', 'Actions'] actions = [] res = get_fld_tags(fldID) if len(res) > 0: i = 0 for (fldID, tagID, tname, tvalue, score) in res: move = "" if i != 0: move += """""" % (weburl, fldID, tagID, res[i - 1][1], ln, random.randint(0, 1000), weburl) else: move += "   " i += 1 if i != len(res): move += '' % (weburl, fldID, tagID, res[i][1], ln, random.randint(0, 1000), weburl) actions.append([move, tvalue, tname]) for col in [(('Details','showdetailsfieldtag'), ('Modify','modifytag'),('Remove','removefieldtag'),)]: actions[-1].append('%s' % (weburl, col[0][1], fldID, tagID, ln, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (weburl, function, fldID, tagID, ln, str) output += tupletotable(header=header, tuple=actions) else: output += """No fields exists""" output += content try: body = [output, extra] except NameError: body = [output] if callback: return perform_editfield(fldID, ln, "perform_modifyfieldtags", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addtag(fldID, ln=cdslang, value=['',-1], name='', callback="yes", confirm=-1): """form to add a new field. fldNAME - the name of the new field code - the field code""" output = "" subtitle = """Add MARC tag to logical field""" text = """ Add new tag:
Tag value
Tag comment
""" % ((name=='' and value[0] or name), value[0]) text += """Or existing tag:
Tag """ output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addtag" % weburl, text=text, fldID=fldID, ln=ln, button="Add tag", confirm=1) if (value[0] and value[1] in [-1, "-1"]) or (not value[0] and value[1] not in [-1, "-1"]): if confirm in ["1", 1]: res = add_fld_tag(fldID, name, (value[0] !='' and value[0] or value[1])) output += write_outcome(res) elif confirm not in ["-1", -1]: output += """Please choose to add either a new or an existing MARC tag, but not both. """ try: body = [output, extra] except NameError: body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_addtag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifytag(fldID, tagID, ln=cdslang, name='', value='', callback='yes', confirm=-1): """form to modify a field. fldID - the field to change.""" subtitle = "" output = "" fld_dict = dict(get_def_name('', "field")) fldID = int(fldID) tagID = int(tagID) tag = get_tags(tagID) if confirm in [-1, "-1"] and not value and not name: name = tag[0][1] value = tag[0][2] subtitle = """Modify MARC tag""" text = """ Any modifications will apply to all logical fields using this tag.
Tag value
Comment
""" % (value, name) output += createhiddenform(action="modifytag#4.1", text=text, button="Modify", fldID=fldID, tagID=tagID, ln=ln, confirm=1) if name and value and confirm in [1, "1"]: res = modify_tag(tagID, name, value) output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_modifytag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_removefieldtag(fldID, tagID, ln=cdslang, callback='yes', confirm=0): """form to remove a tag from a field. fldID - the current field, remove the tag from this field. tagID - remove the tag with this id""" subtitle = """Remove MARC tag from logical field""" output = "" fld_dict = dict(get_def_name('', "field")) if fldID and tagID: fldID = int(fldID) tagID = int(tagID) tag = get_fld_tags(fldID, tagID) if confirm not in ["1", 1]: text = """Do you want to remove the tag '%s - %s ' from the field '%s'.""" % (tag[0][3], tag[0][2], fld_dict[fldID]) output += createhiddenform(action="removefieldtag#4.1", text=text, button="Confirm", fldID=fldID, tagID=tagID, confirm=1) elif confirm in ["1", 1]: res = remove_fldtag(fldID, tagID) output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] if callback: return perform_modifyfieldtags(fldID, ln, "perform_removefieldtag", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addindexfield(idxID, ln=cdslang, fldID='', callback="yes", confirm=-1): """form to add a new field. 
    idxID - the index to add the field to
    fldID - the id of the logical field to add"""

    output = ""
    subtitle = """Add logical field to index"""
    text = """ Field name """
    output = createhiddenform(action="%s/admin/bibindex/bibindexadmin.py/addindexfield" % weburl,
                              text=text,
                              idxID=idxID,
                              ln=ln,
                              button="Add field",
                              confirm=1)

    if fldID and not fldID in [-1, "-1"] and confirm in ["1", 1]:
        res = add_idx_fld(idxID, fldID)
        output += write_outcome(res)
    elif confirm in ["1", 1]:
        output += """Please select a field to add."""

    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_modifyindexfields(idxID, ln, "perform_addindexfield", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_removeindexfield(idxID, fldID, ln=cdslang, callback='yes', confirm=0):
    """form to remove a field from an index.
    idxID - the current index, remove the field from this index.
    fldID - remove the field with this id"""

    subtitle = """Remove field from index"""
    output = ""

    if fldID and idxID:
        fldID = int(fldID)
        idxID = int(idxID)
        fld = get_fld(fldID)
        idx = get_idx(idxID)
        if fld and idx and confirm not in ["1", 1]:
            text = """Do you want to remove the field '%s' from the index '%s'?""" % (fld[0][1], idx[0][1])
            output += createhiddenform(action="removeindexfield#3.1",
                                       text=text,
                                       button="Confirm",
                                       idxID=idxID,
                                       fldID=fldID,
                                       confirm=1)
        elif confirm in ["1", 1]:
            res = remove_idxfld(idxID, fldID)
            output += write_outcome(res)

    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_modifyindexfields(idxID, ln, "perform_removeindexfield", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_switchtagscore(fldID, id_1, id_2, ln=cdslang):
    """Switch the scores of the tags id_1 and id_2 within a logical field.
    fldID - the current logical field
    id_1/id_2 - the ids of the two tags whose scores are swapped.
    """
    output = ""
    name_1 = run_sql("select name from tag where id=%s" % id_1)[0][0]
    name_2 = run_sql("select name from tag where id=%s" % id_2)[0][0]
    res = switch_score(fldID, id_1, id_2)
    output += write_outcome(res)
    return perform_modifyfieldtags(fldID, ln, content=output)

def perform_deletetag(fldID, ln=cdslang, tagID=-1, callback='yes', confirm=-1):
    """form to delete a MARC tag that is not in use.
    fldID - the id of the current logical field.
    tagID - the id of the tag to delete."""

    subtitle = """Delete an unused MARC tag"""
    output = """
Deleting a MARC tag will also delete its associated translations.
"""

    fldID = int(fldID)
    if tagID not in [-1, "-1"] and confirm in [1, "1"]:
        ares = delete_tag(tagID)

    fld_tag = get_fld_tags()
    fld_tag = dict(map(lambda x: (x[1], x[0]), fld_tag))
    tags = get_tags()
    text = """ MARC tag
"""
    if i == 0:
        output += """No unused MARC tags
"""
    else:
        output += createhiddenform(action="deletetag#4.1",
                                   text=text,
                                   button="Delete",
                                   fldID=fldID,
                                   ln=ln,
                                   confirm=0)

    if tagID not in [-1, "-1"]:
        tagID = int(tagID)
        tags = get_tags(tagID)
        if confirm in [0, "0"]:
            text = """Do you want to delete the MARC tag '%s'?""" % tags[0][2]
            output += createhiddenform(action="deletetag#4.1",
                                       text=text,
                                       button="Confirm",
                                       fldID=fldID,
                                       tagID=tagID,
                                       ln=ln,
                                       confirm=1)
        elif confirm in [1, "1"]:
            output += write_outcome(ares)
    elif confirm not in [-1, "-1"]:
        output += """Choose a MARC tag to delete."""

    try:
        body = [output, extra]
    except NameError:
        body = [output]

    output = "
" + addadminbox(subtitle, body)
    return perform_modifyfieldtags(fldID, ln, content=output)

def compare_on_val(first, second):
    """Compare the two values"""
    return cmp(first[1], second[1])

def get_table_status(tblname):
    sql = "SHOW TABLE STATUS LIKE '%s'" % tblname
    try:
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_col_fld(colID=-1, type = '', id_field=''):
    """Returns the collection/field/fieldvalue associations, optionally
    restricted to one logical field.
    id_field - field id
    Note that colID and type are accepted but not used in the query."""
    sql = "SELECT id_collection,id_field,id_fieldvalue,type,score,score_fieldvalue FROM collection_field_fieldvalue, field WHERE id_field=field.id"
    try:
        if id_field:
            sql += " AND id_field=%s" % id_field
        sql += " ORDER BY type, score desc, score_fieldvalue desc"
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_idx(idxID=''):
    sql = "SELECT id,name,description,last_updated FROM idxINDEX"
    try:
        if idxID:
            sql += " WHERE id=%s" % idxID
        sql += " ORDER BY id asc"
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_fld_tags(fldID='', tagID=''):
    """Returns tags associated with a field.
    fldID - field id
    tagID - tag id"""
    sql = "SELECT id_field,id_tag, tag.name, tag.value, score FROM field_tag,tag WHERE tag.id=field_tag.id_tag"
    try:
        if fldID:
            sql += " AND id_field=%s" % fldID
        if tagID:
            sql += " AND id_tag=%s" % tagID
        sql += " ORDER BY score desc, tag.value, tag.name"
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_tags(tagID=''):
    """Returns all or a given tag.
    tagID - tag id"""
    sql = "SELECT id, name, value FROM tag"
    try:
        if tagID:
            sql += " WHERE id=%s" % tagID
        sql += " ORDER BY name, value"
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_fld(fldID=''):
    """Returns all fields or only the given field"""
    try:
        if not fldID:
            res = run_sql("SELECT id, name, code FROM field ORDER by name, code")
        else:
            res = run_sql("SELECT id, name, code FROM field WHERE id=%s ORDER by name, code" % fldID)
        return res
    except StandardError, e:
        return ""

def get_fld_value(fldvID = ''):
    """Returns fieldvalue"""
    try:
        sql = "SELECT id, name, value FROM fieldvalue"
        if fldvID:
            sql += " WHERE id=%s" % fldvID
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_idx_fld(idxID=''):
    """Return a list of fields associated with one or all indexes"""
    try:
        sql = "SELECT id_idxINDEX, idxINDEX.name, id_field, field.name, regexp_punctuation, regexp_alphanumeric_separators FROM idxINDEX, field, idxINDEX_field WHERE idxINDEX.id = idxINDEX_field.id_idxINDEX AND field.id = idxINDEX_field.id_field"
        if idxID:
            sql += " AND id_idxINDEX=%s" % idxID
        sql += " ORDER BY id_idxINDEX asc"
        res = run_sql(sql)
        return res
    except StandardError, e:
        return ""

def get_col_nametypes():
    """Return a list of the various translation name types for the collections"""
    type = []
    type.append(('ln', 'Long name'))
    return type

def get_fld_nametypes():
    """Return a list of the various translation name types for the fields"""
    type = []
    type.append(('ln', 'Long name'))
    return type

def get_idx_nametypes():
    """Return a list of the various translation name types for the indexes"""
    type = []
    type.append(('ln', 'Long name'))
    return type

def get_sort_nametypes():
    """Return a dictionary of the various translation name types for the sort and search options"""
    type = {}
    type['soo'] = 'Sort options'
    type['seo'] = 'Search options'
    type['sew'] = 'Search within'
    return type

def remove_fld(colID,fldID, fldvID=''):
    """Removes a field from the collection given.
    colID - the collection the field is connected to
    fldID - the field which should be removed from the collection.
    fldvID - if given, only remove the association with this fieldvalue."""
    try:
        sql = "DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s" % (colID, fldID)
        if fldvID:
            sql += " AND id_fieldvalue=%s" % fldvID
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def remove_idxfld(idxID, fldID):
    """Remove a field from an index in table idxINDEX_field
    idxID - index id from idxINDEX
    fldID - field id from field table"""
    try:
        sql = "DELETE FROM idxINDEX_field WHERE id_field=%s and id_idxINDEX=%s" % (fldID, idxID)
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def remove_fldtag(fldID,tagID):
    """Removes a tag from the field given.
    fldID - the field the tag is connected to
    tagID - the tag which should be removed from the field."""
    try:
        sql = "DELETE FROM field_tag WHERE id_field=%s AND id_tag=%s" % (fldID, tagID)
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def delete_tag(tagID):
    """Deletes the given MARC tag from the tag table.
    tagID - the id of the tag to delete"""
    try:
        res = run_sql("DELETE FROM tag where id=%s" % tagID)
        return (1, "")
    except StandardError, e:
        return (0, e)

def delete_idx(idxID):
    """Deletes all data for the given index together with its
    idxWORDxxF, idxWORDxxR, idxPHRASExxF and idxPHRASExxR tables"""
    try:
        res = run_sql("DELETE FROM idxINDEX WHERE id=%s" % idxID)
        res = run_sql("DELETE FROM idxINDEXNAME WHERE id_idxINDEX=%s" % idxID)
        res = run_sql("DELETE FROM idxINDEX_field WHERE id_idxINDEX=%s" % idxID)
        res = run_sql("DROP TABLE idxWORD%sF" % (idxID < 10 and "0%s" % idxID or idxID))
        res = run_sql("DROP TABLE idxWORD%sR" % (idxID < 10 and "0%s" % idxID or idxID))
        res = run_sql("DROP TABLE idxPHRASE%sF" % (idxID < 10 and "0%s" % idxID or idxID))
        res = run_sql("DROP TABLE idxPHRASE%sR" % (idxID < 10 and "0%s" % idxID or idxID))
        return (1, "")
    except StandardError, e:
        return (0, e)

def delete_fld(fldID):
    """Deletes all
    data for the given logical field.
    fldID - the id of the field; its rows in collection_field_fieldvalue,
    field_tag, idxINDEX_field and field are deleted"""
    try:
        res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_field=%s" % fldID)
        res = run_sql("DELETE FROM field_tag WHERE id_field=%s" % fldID)
        res = run_sql("DELETE FROM idxINDEX_field WHERE id_field=%s" % fldID)
        res = run_sql("DELETE FROM field WHERE id=%s" % fldID)
        return (1, "")
    except StandardError, e:
        return (0, e)

def add_idx(idxNAME):
    """Add a new index. Returns the id of the new index.
    idxNAME - the default name of the index in the default language."""
    try:
        idxID = 0
        res = run_sql("SELECT id from idxINDEX WHERE name='%s'" % MySQLdb.escape_string(idxNAME))
        if res:
            return (0, (0, "An index with the given name already exists."))

        for i in range(1, 100):
            res = run_sql("SELECT id from idxINDEX WHERE id=%s" % i)
            res2 = run_sql("SHOW TABLE STATUS LIKE 'idxWORD%s%%'" % (i < 10 and "0%s" % i or i))
            if not res and not res2:
                idxID = i
                break

        if idxID == 0:
            return (0, (0, "It is not possible to create more indexes; delete an index and try again."))

        res = run_sql("INSERT INTO idxINDEX(id, name) values('%s','%s')" % (idxID, MySQLdb.escape_string(idxNAME)))
        type = get_idx_nametypes()[0][0]
        res = run_sql("INSERT INTO idxINDEXNAME(id_idxINDEX, ln, type, value) VALUES(%s,'%s','%s', '%s')" % (idxID, cdslang, type, MySQLdb.escape_string(idxNAME)))

        res = run_sql("""CREATE TABLE IF NOT EXISTS idxWORD%sF (
                           id mediumint(9) unsigned NOT NULL auto_increment,
                           term varchar(50) default NULL,
                           hitlist longblob,
                           PRIMARY KEY (id),
                           UNIQUE KEY term (term)
                         ) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))

        res = run_sql("""CREATE TABLE IF NOT EXISTS idxWORD%sR (
                           id_bibrec mediumint(9) unsigned NOT NULL,
                           termlist longblob,
                           type enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT',
                           PRIMARY KEY (id_bibrec,type)
                         ) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))

        res = run_sql("""CREATE TABLE `idxPHRASE%sF` (
                           `id` mediumint(9)
unsigned NOT NULL auto_increment,
                           `term` varchar(50) default NULL,
                           `hitlist` longblob,
                           PRIMARY KEY (`id`),
                           UNIQUE KEY `term` (`term`)
                         ) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))

        res = run_sql("""CREATE TABLE `idxPHRASE%sR` (
                           `id_bibrec` mediumint(9) unsigned NOT NULL default '0',
                           `termlist` longblob,
                           `type` enum('CURRENT','FUTURE','TEMPORARY') NOT NULL default 'CURRENT',
                           PRIMARY KEY (`id_bibrec`,`type`)
                         ) TYPE=MyISAM""" % (idxID < 10 and "0%s" % idxID or idxID))

        res = run_sql("SELECT id from idxINDEX WHERE id=%s" % idxID)
        res2 = run_sql("SHOW TABLE STATUS LIKE 'idxWORD%sF'" % (idxID < 10 and "0%s" % idxID or idxID))
        res3 = run_sql("SHOW TABLE STATUS LIKE 'idxWORD%sR'" % (idxID < 10 and "0%s" % idxID or idxID))
        if res and res2 and res3:
            return (1, res[0][0])
        elif not res:
            return (0, (0, "Could not add the new index to idxINDEX"))
        elif not res2:
            return (0, (0, "Forward table not created for unknown reason."))
        elif not res3:
            return (0, (0, "Reverse table not created for unknown reason."))
    except StandardError, e:
        return (0, e)

def add_fld(name, code):
    """Add a new logical field. Returns the id of the field.
    code - the code for the field,
    name - the default name for the default language of the field."""
    try:
        type = get_fld_nametypes()[0][0]
        res = run_sql("INSERT INTO field (name, code) values('%s','%s')" % (MySQLdb.escape_string(name), MySQLdb.escape_string(code)))
        fldID = run_sql("SELECT id FROM field WHERE code='%s'" % MySQLdb.escape_string(code))
        res = run_sql("INSERT INTO fieldname (id_field, type, ln, value) VALUES (%s,'%s','%s','%s')" % (fldID[0][0], type, cdslang, MySQLdb.escape_string(name)))
        if fldID:
            return (1, fldID[0][0])
        else:
            raise StandardError
    except StandardError, e:
        return (0, e)

def add_fld_tag(fldID, name, value):
    """Add a MARC tag to a logical field, creating the tag first if no
    tag with this value exists yet.
    fldID - the id of the logical field
    name - the name of the tag (defaults to value if empty)
    value - the MARC tag value
    The new tag gets the highest existing score of the field plus one
    (0 if the field has no tags yet), so it is placed on top."""
    try:
        res = run_sql("SELECT score FROM field_tag WHERE id_field=%s ORDER BY score desc" % (fldID))
        if res:
            score = int(res[0][0]) + 1
        else:
            score = 0

        res = run_sql("SELECT id FROM tag WHERE value='%s'" % MySQLdb.escape_string(value))
        if not res:
            if name == '':
                name = value
            res = run_sql("INSERT INTO tag(name, value) values('%s','%s')" % (MySQLdb.escape_string(name), MySQLdb.escape_string(value)))
            res = run_sql("SELECT id FROM tag WHERE value='%s'" % MySQLdb.escape_string(value))

        res = run_sql("INSERT INTO field_tag(id_field, id_tag, score) values(%s, %s, %s)" % (fldID, res[0][0], score))
        return (1, "")
    except StandardError, e:
        return (0, e)

def add_idx_fld(idxID, fldID):
    """Add a field to an index"""
    try:
        sql = "SELECT * FROM idxINDEX_field WHERE id_idxINDEX=%s and id_field=%s" % (idxID, fldID)
        res = run_sql(sql)
        if res:
            return (0, (0, "The selected field is already associated with this index."))

        sql = "INSERT INTO idxINDEX_field(id_idxINDEX, id_field) values (%s, %s)" % (idxID, fldID)
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def modify_idx(idxID, idxNAME, idxDESC):
    """Modify index name or index description in idxINDEX table"""
    try:
        sql = "UPDATE idxINDEX SET name='%s' WHERE id=%s" % (MySQLdb.escape_string(idxNAME), idxID)
        res = run_sql(sql)
        sql = "UPDATE idxINDEX SET description='%s' WHERE ID=%s" % (MySQLdb.escape_string(idxDESC), idxID)
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def modify_fld(fldID, code):
    """Modify the code of a logical field.
    fldID - the id of the field to modify
    code - the new code"""
    try:
        # escape the user-supplied code, as the other functions in this module do
        sql = "UPDATE field SET code='%s'" % MySQLdb.escape_string(code)
        sql += " WHERE id=%s" % fldID
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def modify_tag(tagID, name, value):
    """Modify the name and value of a tag.
    tagID - the id of the tag to modify
    name - the new name of the tag
    value - the new value of the tag"""
    try:
        # escape the user-supplied strings, as the other functions in this module do
        sql = "UPDATE tag SET name='%s' WHERE id=%s" % (MySQLdb.escape_string(name), tagID)
        res = run_sql(sql)
        sql = "UPDATE tag SET value='%s' WHERE id=%s" % (MySQLdb.escape_string(value), tagID)
        res = run_sql(sql)
        return (1, "")
    except StandardError, e:
        return (0, e)

def switch_score(fldID, id_1, id_2):
    """Switch the scores of the tags id_1 and id_2 in table field_tag.
    fldID - the logical field the tags id_1 and id_2 are connected to
    id_1/id_2 - the ids of the two tags"""
    try:
        res1 = run_sql("SELECT score FROM field_tag WHERE id_field=%s and id_tag=%s" % (fldID, id_1))
        res2 = run_sql("SELECT score FROM field_tag WHERE id_field=%s and id_tag=%s" % (fldID, id_2))
        res = run_sql("UPDATE field_tag SET score=%s WHERE id_field=%s and id_tag=%s" % (res2[0][0], fldID, id_1))
        res = run_sql("UPDATE field_tag SET score=%s WHERE id_field=%s and id_tag=%s" % (res1[0][0], fldID, id_2))
        return (1, "")
    except StandardError, e:
        return (0, e)

def get_lang_list(table, field, id):
    langs = run_sql("select ln from %s where %s=%s" % (table, field, id))
    exists = {}
    lang = ''
    for lng in langs:
        if not exists.has_key(lng[0]):
            lang += lng[0] + ", "
            exists[lng[0]] = 1

    if lang.endswith(", "):
        lang = lang [:-2]
    if len(exists) == 0:
        lang = """None"""
    return lang
diff --git a/modules/bibindex/web/admin/bibindexadmin.py b/modules/bibindex/web/admin/bibindexadmin.py
index f06ec4eb2..7a8e544e0 100644
--- a/modules/bibindex/web/admin/bibindexadmin.py
+++ b/modules/bibindex/web/admin/bibindexadmin.py
@@ -1,643 +1,643 @@
## $Id$
##
## This file is part of the CERN Document Server Software (CDSware).
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware BibIndex Administrator Interface.""" __lastupdated__ = """$Date$""" import sys + import cdsware.bibindexadminlib as bic -reload(bic) from cdsware.webpage import page, create_error_box from cdsware.config import weburl,cdslang from cdsware.webuser import getUid, page_not_authorized __version__ = "$Id$" def deletetag(req, fldID, ln=cdslang, tagID=-1, callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_deletetag(fldID=fldID, ln=ln, tagID=tagID, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def addtag(req, fldID, ln=cdslang, value=['',-1], name='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_addtag(fldID=fldID, ln=ln, value=value, name=name, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, 
text=auth[1], navtrail=navtrail_previous_links) def modifyfieldtags(req, fldID, ln=cdslang, callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_modifyfieldtags(fldID=fldID, ln=ln, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def addindexfield(req, idxID, ln=cdslang, fldID='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_addindexfield(idxID=idxID, ln=ln, fldID=fldID, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifyindexfields(req, idxID, ln=cdslang, callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_modifyindexfields(idxID=idxID, ln=ln, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def showdetailsfieldtag(req, fldID, tagID, ln=cdslang, callback='yes', confirm=-1): navtrail_previous_links = 
bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_showdetailsfieldtag(fldID=fldID, tagID=tagID, ln=ln, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def showdetailsfield(req, fldID, ln=cdslang, callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_showdetailsfield(fldID=fldID, ln=ln, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifyfield(req, fldID, ln=cdslang, code='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_modifyfield(fldID=fldID, ln=ln, code=code, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifyindex(req, idxID, ln=cdslang, idxNAME='', idxDESC='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: 
return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_modifyindex(idxID=idxID, ln=ln, idxNAME=idxNAME, idxDESC=idxDESC, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifytag(req, fldID, tagID, ln=cdslang, name='', value='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_modifytag(fldID=fldID, tagID=tagID, ln=ln, name=name, value=value, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def deletefield(req, fldID, ln=cdslang, confirm=0): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_deletefield(fldID=fldID, ln=ln, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def deleteindex(req, idxID, ln=cdslang, confirm=0): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_deleteindex(idxID=idxID, 
ln=ln, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def showfieldoverview(req, ln=cdslang, callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Manage logical fields", body=bic.perform_showfieldoverview(ln=ln, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def editfields(req, ln=cdslang, callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Manage logical fields", body=bic.perform_editfields(ln=ln, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def editfield(req, fldID, ln=cdslang, mtype='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_editfield(fldID=fldID, ln=ln, mtype=mtype, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], 
navtrail=navtrail_previous_links) def editindex(req, idxID, ln=cdslang, mtype='', callback='yes', confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_editindex(idxID=idxID, ln=ln, mtype=mtype, callback=callback, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifyindextranslations(req, idxID, ln=cdslang, sel_type='', trans = [], confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_modifyindextranslations(idxID=idxID, ln=ln, sel_type=sel_type, trans=trans, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def modifyfieldtranslations(req, fldID, ln=cdslang, sel_type='', trans = [], confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_modifyfieldtranslations(fldID=fldID, ln=ln, sel_type=sel_type, trans=trans, confirm=confirm), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def addfield(req, ln=cdslang, fldNAME='', code='', callback="yes", 
confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Manage logical fields", body=bic.perform_addfield(ln=cdslang, fldNAME=fldNAME, code=code, callback=callback, confirm=confirm), uid=uid, language=ln, navtrail = navtrail_previous_links, urlargs=req.args, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def addindex(req, ln=cdslang, idxNAME='', callback="yes", confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Manage Indexes", body=bic.perform_addindex(ln=cdslang, idxNAME=idxNAME, callback=callback, confirm=confirm), uid=uid, language=ln, navtrail = navtrail_previous_links, urlargs=req.args, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def switchtagscore(req, fldID, id_1, id_2, ln=cdslang): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_switchtagscore(fldID=fldID, id_1=id_1, id_2=id_2, ln=ln), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def removeindexfield(req, idxID, fldID, ln=cdslang, callback="yes", confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage Indexes """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = 
bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Index", body=bic.perform_removeindexfield(idxID=idxID, fldID=fldID, ln=cdslang, callback=callback, confirm=confirm), uid=uid, language=ln, navtrail = navtrail_previous_links, urlargs=req.args, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def removefieldtag(req, fldID, tagID, ln=cdslang, callback="yes", confirm=-1): navtrail_previous_links = bic.getnavtrail() + """> Manage logical fields """ % (weburl) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Edit Logical Field", body=bic.perform_removefieldtag(fldID=fldID, tagID=tagID, ln=cdslang, callback=callback, confirm=confirm), uid=uid, language=ln, navtrail = navtrail_previous_links, urlargs=req.args, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def index(req, ln=cdslang, mtype='', content=''): navtrail_previous_links = bic.getnavtrail() try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Manage Indexes", body=bic.perform_index(ln=ln, mtype=mtype, content=content), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def field(req, ln=cdslang, mtype='', content=''): navtrail_previous_links = bic.getnavtrail() try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) auth = bic.check_user(uid,'cfgbibindex') if not auth[0]: return page(title="Manage logical fields", body=bic.perform_field(ln=ln, mtype=mtype, content=content), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return 
page_not_authorized(req=req, text=auth[1], navtrail=navtrail_previous_links) def error_page(req): return page(title=msg_internal_error[ln], body = create_error_box(req, verbose=verbose, ln=ln), description="%s - Internal Error" % cdsname, keywords="%s, CDSware, Internal Error" % cdsname, language=ln, urlargs=req.args) diff --git a/modules/bibrank/bin/bibrank.in b/modules/bibrank/bin/bibrank.in index 04cd558f4..bf8ff2f53 100644 --- a/modules/bibrank/bin/bibrank.in +++ b/modules/bibrank/bin/bibrank.in @@ -1,471 +1,469 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ BibRank ranking daemon. 
Usage: %s [options] Ranking examples: %s -wjif -a --id=0-30000,30001-860000 --verbose=9 %s -wjif -d --modified='2002-10-27 13:57:26' %s -wwrd --rebalance --collection=Articles %s -wwrd -a -i 234-250,293,300-500 -u admin@cdsware Ranking options: -w, --run=r1[,r2] runs each rank method in the order given -c, --collection=c1[,c2] select according to collection -i, --id=low[-high] select according to doc recID -m, --modified=from[,to] select according to modification date -l, --lastupdate select according to last update -a, --add add or update words for selected records -d, --del delete words for selected records -S, --stat show statistics for a method -R, --rebalance recalculate weight data, used by word frequency method should be used if about 1% of the documents have changed since -R was last used Repairing options: -k, --check check consistency for all records in the table(s) check if update of ranking data is necessary -r, --repair try to repair all records in the table(s) Scheduling options: -u, --user=USER user name to store task, password needed -s, --sleeptime=SLEEP time after which to repeat tasks (no) e.g.: 1s, 30m, 24h, 7d -t, --time=TIME moment for the task to be active (now) e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26 General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ __version__ = "$Id$" try: from marshal import loads,dumps from zlib import compress,decompress from string import split,translate,lower,upper import getopt import getpass import string import os import sre import sys import time import MySQLdb import urllib import signal import tempfile import traceback import cStringIO import re import copy import types import ConfigParser except ImportError, e: import sys try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.dbquery import run_sql from cdsware.bibrank_tag_based_indexer import * from
cdsware.bibrank_word_indexer import * from cdsware.access_control_engine import acc_authorize_action from cdsware.search_engine import perform_request_search except ImportError, e: import sys task_id = -1 # the task id nb_char_in_line = 50 # for verbose pretty printing chunksize = 1000 # default size of chunks that the records will be treated by base_process_size = 4500 # process base size bibrank_options = {} # will hold task options def serialize_via_numeric_array_dumps(arr): return Numeric.dumps(arr) def serialize_via_numeric_array_compr(str): return compress(str) def serialize_via_numeric_array_escape(str): return MySQLdb.escape_string(str) def serialize_via_numeric_array(arr): """Serialize Numeric array into a compressed string.""" return serialize_via_numeric_array_escape(serialize_via_numeric_array_compr(serialize_via_numeric_array_dumps(arr))) def deserialize_via_numeric_array(string): """Decompress and deserialize string into a Numeric array.""" return Numeric.loads(decompress(string)) def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return MySQLdb.escape_string(compress(dumps(obj))) def deserialize_via_marshal(string): """Decompress and deserialize string into a Python object via marshal.""" return loads(decompress(string)) def authenticate(user, header="BibRank Task Submission", action="runbibrank"): print header print "=" * len(header) if user == "": print>> sys.stdout, "\rUsername: ", user = string.strip(string.lower(sys.stdin.readline())) else: print>> sys.stdout, "\rUsername: ", user res = run_sql("select id,password from user where email=%s", (user,), 1) if not res: print "Sorry, %s does not exist." % user sys.exit(1) else: (uid_db, password_db) = res[0] if password_db: password_entered = getpass.getpass() if password_db == password_entered: pass else: print "Sorry, wrong credentials for %s." 
% user sys.exit(1) (auth_code, auth_message) = acc_authorize_action(uid_db, action) if auth_code != 0: print auth_message sys.exit(1) return user def usage(code, msg=''): "Prints usage for this module." if msg: sys.stderr.write("Error: %s.\n" % msg) print >> sys.stderr, \ """ Usage: %s [options] Ranking examples: %s -wjif -a --id=0-30000,30001-860000 --verbose=9 %s -wjif -d --modified='2002-10-27 13:57:26' %s -wjif --rebalance --collection=Articles %s -wsbr -a -i 234-250,293,300-500 -u admin@cdsware Ranking options: -w, --run=r1[,r2] runs each rank method in the order given -c, --collection=c1[,c2] select according to collection -i, --id=low[-high] select according to doc recID -m, --modified=from[,to] select according to modification date -l, --lastupdate select according to last update -a, --add add or update words for selected records -d, --del delete words for selected records -S, --stat show statistics for a method -R, --rebalance recalculate weight data, used by word frequency method should be used if about 1%% of the documents have changed since -R was last used Repairing options: -k, --check check consistency for all records in the table(s) check if update of ranking data is necessary -r, --repair try to repair all records in the table(s) Scheduling options: -u, --user=USER user name to store task, password needed -s, --sleeptime=SLEEP time after which to repeat tasks (no) e.g.: 1s, 30m, 24h, 7d -t, --time=TIME moment for the task to be active (now) e.g.: +15s, 5m, 3h , 2002-10-27 13:57:26 General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ % ((sys.argv[0],) * 5) sys.exit(code) def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string.
It can handle normal date strings and shifts with respect to now.""" date = time.time() shift_re = sre.compile("([-\+]{0,1})([\d]+)([dhms])") factors = {"d":24*3600, "h":3600, "m":60, "s":1} m = shift_re.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date def task_sig_sleep(sig, frame): """Signal handler for the 'sleep' signal sent by BibSched.""" if bibrank_options["verbose"]>= 9: write_message("got signal %d" % sig) write_message("sleeping...") task_update_status("SLEEPING") signal.pause() # wait for wake-up signal def task_sig_wakeup(sig, frame): """Signal handler for the 'wakeup' signal sent by BibSched.""" if bibrank_options["verbose"]>= 9: write_message("got signal %d" % sig) write_message("continuing...") task_update_status("CONTINUING") def task_sig_stop_commands(): """Do all the commands necessary to stop the task before quitting. Useful for task_sig_stop() handler. 
""" write_message("stopping commands started") write_message("stopping commands ended") def task_sig_suicide(sig, frame): """Signal handler for the 'suicide' signal sent by BibSched.""" if bibrank_options["verbose"]>= 9: write_message("got signal %d" % sig) write_message("suiciding myself now...") task_update_status("SUICIDING") write_message("suicided") task_update_status("SUICIDED") sys.exit(0) def task_sig_unknown(sig, frame): """Signal handler for the other unknown signals sent by shell or user.""" if bibrank_options["verbose"]>= 9: write_message("got signal %d" % sig) write_message("unknown signal %d ignored" % sig) # do nothing for other signals def task_update_progress(msg): """Updates progress information in the BibSched task table.""" query = "UPDATE schTASK SET progress='%s' where id=%d" % (MySQLdb.escape_string(msg), task_id) if bibrank_options["verbose"]>= 9: write_message(query) run_sql(query) return def task_update_status(val): """Updates state information in the BibSched task table.""" query = "UPDATE schTASK SET status='%s' where id=%d" % (MySQLdb.escape_string(val), task_id) if bibrank_options["verbose"]>= 9: write_message(query) run_sql(query) return def split_ranges(parse_string): recIDs = [] ranges = string.split(parse_string, ",") for range in ranges: tmp_recIDs = string.split(range, "-") if len(tmp_recIDs)==1: recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])]) else: if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check tmp = tmp_recIDs[0] tmp_recIDs[0] = tmp_recIDs[1] tmp_recIDs[1] = tmp recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])]) return recIDs def get_date_range(var): "Returns the two dates contained as a low,high tuple" limits = string.split(var, ",") if len(limits)==1: low = get_datetime(limits[0]) return low,None if len(limits)==2: low = get_datetime(limits[0]) high = get_datetime(limits[1]) return low,high def command_line(): """Storing the task together with the parameters given.""" global bibrank_options long_flags 
= ["lastupdate","add","del","repair","maxmem", "flush","stat", "rebalance", "id=", "collection=", "check", "modified=", "update", "run=", "user=", "sleeptime=", "time=", "help", "version", "verbose="] short_flags = "ladSi:m:c:kUrRM:f:w:u:s:t:hVv:" format_string = "%Y-%m-%d %H:%M:%S" sleeptime = "" try: opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags) except getopt.GetoptError, err: write_message(err, sys.stderr) usage(1) if args: usage(1) bibrank_options = {"quick":"yes","cmd":"add","flush":100000,"validset":"", "collection":[], "id":[], "check": "", "stat":"", "modified":"", "last_updated":"last_updated","run":"", "verbose":1} res = run_sql("SELECT name from rnkMETHOD") bibrank_options["run"] = [] for (name,) in res: bibrank_options["run"].append(name) sched_time = time.strftime(format_string) user = "" try: for opt in opts: if opt == ("-h","") or opt == ("--help",""): usage(1) elif opt == ("-V","") or opt == ("--version",""): print __version__ sys.exit(1) elif opt[0] in ["--verbose", "-v"]: bibrank_options["verbose"] = int(opt[1]) elif opt == ("-a","") or opt == ("--add",""): bibrank_options["cmd"] = "add" if ("-d","") in opts or ("--del","") in opts: usage(1) elif opt[0] in ["--run", "-w"]: bibrank_options["run"] = [] run = split(opt[1],",") for key in range(0,len(run)): bibrank_options["run"].append(run[key]) elif opt == ("-r","") or opt == ("--repair",""): bibrank_options["cmd"] = "repair" elif opt == ("-d","") or opt == ("--del",""): bibrank_options["cmd"]="del" elif opt[0] in [ "-u", "--user"]: user = opt[1] elif opt[0] in [ "-k", "--check"]: bibrank_options["cmd"]= "check" elif opt[0] in [ "-S", "--stat"]: bibrank_options["cmd"] = "stat" elif opt[0] in [ "-i", "--id" ]: bibrank_options["id"] = bibrank_options["id"] + split_ranges(opt[1]) bibrank_options["last_updated"] = "" elif opt[0] in [ "-c", "--collection" ]: bibrank_options["collection"] = opt[1] elif opt[0] in [ "-R", "--rebalance"]: bibrank_options["quick"] = "no" elif opt[0] in [
"-f", "--flush"]: bibrank_options["flush"]=int(opt[1]) elif opt[0] in [ "-M", "--maxmem"]: bibrank_options["maxmem"]=int(opt[1]) if bibrank_options["maxmem"] < base_process_size + 1000: raise StandardError, "Memory usage should be higher than %d kB" % (base_process_size + 1000) elif opt[0] in [ "-m", "--modified" ]: bibrank_options["modified"] = get_date_range(opt[1]) #2002-10-27 13:57:26 bibrank_options["last_updated"] = "" elif opt[0] in [ "-l", "--lastupdate" ]: bibrank_options["last_updated"] = "last_updated" elif opt[0] in [ "-s", "--sleeptime" ]: get_datetime(opt[1]) # see if it is a valid shift sleeptime=opt[1] elif opt[0] in [ "-t", "--time" ]: sched_time = get_datetime(opt[1]) else: usage(1) except StandardError, e: write_message(e, sys.stderr) sys.exit(1) user = authenticate(user) if bibrank_options["verbose"]>=9: write_message("Storing task options %s" % bibrank_options) new_task_id = run_sql("""INSERT INTO schTASK (proc,user,runtime,sleeptime,arguments,status) VALUES ('bibrank',%s,%s,%s,%s,'WAITING')""", (user, sched_time, sleeptime, dumps(bibrank_options))) print "Task #%d was successfully scheduled for execution." % new_task_id return def task_run(row): """Run the indexing task. The row argument is the BibSched task queue row, containing if, arguments, etc. Return 1 in case of success and 0 in case of failure. """ global task_id, bibrank_options task_id = row[0] task_proc = row[1] bibrank_options = loads(row[6]) task_status = row[7] # install signal handlers signal.signal(signal.SIGUSR1, task_sig_sleep) signal.signal(signal.SIGTERM, task_sig_stop) signal.signal(signal.SIGABRT, task_sig_suicide) signal.signal(signal.SIGCONT, task_sig_wakeup) signal.signal(signal.SIGINT, task_sig_unknown) if task_proc != "bibrank": write_message("-The task #%d does not seem to be a BibRank task." % task_id, sys.stderr) return 0 if task_status != "WAITING": write_message("The task #%d is %s. I expected WAITING." 
% (task_id, task_status), sys.stderr) return 0 if bibrank_options["verbose"]: write_message("Task #%d started." % task_id) task_update_status("RUNNING") try: bibrank_options = loads(row[6]) for key in bibrank_options["run"]: write_message("") file = etcdir + "/bibrank/" + key + ".cfg" if bibrank_options["verbose"] >= 9: write_message("Getting configuration from file: %s" % file) config = ConfigParser.ConfigParser() try: config.readfp(open(file)) except StandardError, e: write_message("Cannot find configuration file: %s. The rank method may also not be registered using the BibRank Admin Interface." % file, sys.stderr) raise StandardError #Using the function variable to call the function related to the rank method cfg_function = config.get("rank_method", "function") func_object = globals().get(cfg_function) if func_object: func_object(row, key) else: write_message("Cannot run method '%s', no function to call" % key) except StandardError, e: write_message("\nException caught: %s" % e, sys.stderr) traceback.print_tb(sys.exc_info()[2]) task_update_status("ERROR") sys.exit(1) task_update_status("DONE") if bibrank_options["verbose"]: write_message("Task #%d finished." % task_id) return 1 def main(): if len(sys.argv) == 2: try: id = int(sys.argv[1]) except StandardError, err: command_line() sys.exit() res = run_sql("SELECT * FROM schTASK WHERE id='%d'" % (id), None, 1) if not res: write_message("Selected task not found.", sys.stderr) sys.exit(1) try: if not task_run(res[0]): write_message("Error occurred. Exiting.", sys.stderr) except StandardError, e: write_message("Unexpected error occurred: %s."
% e, sys.stderr) write_message("Traceback is:") traceback.print_tb(sys.exc_info()[2]) write_message("Exiting.") task_update_status("ERROR") else: command_line() if __name__ == "__main__": main() diff --git a/modules/bibrank/bin/bibrankgkb.in b/modules/bibrank/bin/bibrankgkb.in index 6710e2a40..c9a216f7d 100644 --- a/modules/bibrank/bin/bibrankgkb.in +++ b/modules/bibrank/bin/bibrankgkb.in @@ -1,330 +1,328 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
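The record-range and time-shift helpers in `bibrank.in` above (`split_ranges()` for `-i/--id` and `get_datetime()` for `-s/--sleeptime` and `-t/--time`) are easier to follow in isolation. The following is a minimal Python 3 sketch of the same parsing rules, not the module's own code; the `now` parameter is a hypothetical addition so the relative-shift branch can be checked deterministically.

```python
import re
import time

# Same shift syntax as the -s/-t options: optional sign, integer, unit (d/h/m/s).
SHIFT_RE = re.compile(r"([-+]?)(\d+)([dhms])")
FACTORS = {"d": 24 * 3600, "h": 3600, "m": 60, "s": 1}
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_ranges(parse_string):
    """Turn '234-250,293' into [[234, 250], [293, 293]] (low, high pairs)."""
    rec_ids = []
    for part in parse_string.split(","):
        bounds = [int(b) for b in part.split("-")]
        if len(bounds) == 1:
            # A single id becomes a one-element range.
            rec_ids.append([bounds[0], bounds[0]])
        else:
            # Like the original sanity check: swap reversed ranges such as '10-5'.
            low, high = bounds[0], bounds[1]
            rec_ids.append([min(low, high), max(low, high)])
    return rec_ids


def get_datetime(var, now=None, format_string=DATE_FORMAT):
    """Resolve a relative shift ('+15s', '30m', '-1d') or validate an absolute date.

    `now` is a hypothetical parameter (not in bibrank.in) used for testing.
    """
    if now is None:
        now = time.time()
    match = SHIFT_RE.match(var)
    if match:
        sign = -1 if match.group(1) == "-" else 1
        seconds = sign * FACTORS[match.group(3)] * int(match.group(2))
        return time.strftime(format_string, time.localtime(now + seconds))
    # Not a shift: strptime raises ValueError if the string is malformed.
    return time.strftime(format_string, time.strptime(var, format_string))
```

For example, `split_ranges("234-250,293,300-500")` gives `[[234, 250], [293, 293], [300, 500]]`, matching the `-i 234-250,293,300-500` usage example. An absolute date such as `2002-10-27 13:57:26` does not match the shift pattern (the digits are followed by `-`, not a unit letter), so it falls through to `strptime` validation.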
""" Usage: bibrankgkb %s [options] Examples: bibrankgkb --input=bibrankgkb.cfg --output=test.kb bibrankgkb -otest.kb -v9 bibrankgkb -v9 Generate options: -i, --input=file input file, default from /etc/bibrank/bibrankgkb.cfg -o, --output=file output file, will be placed in current folder General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ __version__ = "$Id$" try: from marshal import loads,dumps from zlib import compress,decompress from string import split,translate,lower,upper import getopt import getpass import string import os import sre import sys import time import MySQLdb import Numeric import urllib import signal import tempfile import unicodedata import traceback import cStringIO import re import copy import types import ConfigParser except ImportError, e: import sys try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import * from cdsware.search_engine_config import cfg_max_recID from cdsware.search_engine import perform_request_search, strip_accents from cdsware.search_engine import HitSet from cdsware.dbquery import run_sql except ImportError, e: import sys try: import psyco psyco.bind(serialize_via_numeric_array) except: pass opts_dict = {} task_id = -1 def bibrankgkb(config): """Generates a .kb file based on input from the configuration file""" if opts_dict["verbose"] >= 1: write_message("Running: Generate Knowledgebase.") journals = {} journal_src = {} i = 0 #Reading the configuration file while config.has_option("bibrankgkb","create_%s" % i): cfg = split(config.get("bibrankgkb", "create_%s" % i),",,") conv = {} temp = {} #Input source 1, either file, www or from db if cfg[0] == "file": conv = get_from_source(cfg[0], cfg[1]) del cfg[0:2] elif cfg[0] == "www": j = 0 urls = {} while config.has_option("bibrankgkb",cfg[1] % j): urls[j] = config.get("bibrankgkb",cfg[1] % j) j = j + 1 conv = get_from_source(cfg[0], 
(urls, cfg[2])) del cfg[0:3] elif cfg[0] == "db": conv = get_from_source(cfg[0], (cfg[1], cfg[2])) del cfg[0:3] if not conv: del cfg[0:2] else: if opts_dict["verbose"] >= 9: write_message("Using last resource for converting values.") #Input source 2, either file, www or from db if cfg[0] == "file": temp = get_from_source(cfg[0], cfg[1]) elif cfg[0] == "www": j = 0 urls = {} while config.has_option("bibrankgkb",cfg[1] % j): urls[j] = config.get("bibrankgkb",cfg[1] % j) j = j + 1 temp = get_from_source(cfg[0], (urls, cfg[2])) elif cfg[0] == "db": temp = get_from_source(cfg[0], (cfg[1], cfg[2])) i = i + 1 #If a convertion file is given, the names will be converted to the correct convention if len(conv) != 0: if opts_dict["verbose"] >= 9: write_message("Converting between naming conventions given.") temp = convert(conv, temp) if len(journals) != 0: for element in temp.keys(): if not journals.has_key(element): journals[element] = temp[element] else: journals = temp #Writing output file if opts_dict["output"]: f = open(opts_dict["output"], 'w') f.write("#Created by %s\n" % __version__) f.write("#Sources:\n") for key in journals.keys(): f.write("%s---%s\n" % (key,journals[key])) f.close() if opts_dict["verbose"] >= 9: write_message("Output complete: %s" % opts_dict["output"]) write_message("Number of hits: %s" % len(journals)) if opts_dict["verbose"] >= 9: write_message("Result:") for key in journals.keys(): write_message("%s---%s" % (key,journals[key])) write_message("Total nr of lines: %s" % len(journals)) def showtime(timeused): if opts_dict["verbose"] >= 9: write_message("Time used: %d second(s)." 
% timeused) def get_from_source(type, data): """Read a source based on the input to the function""" datastruct = {} if type == "db": jvalue = run_sql(data[0]) jname = dict(run_sql(data[1])) if opts_dict["verbose"] >= 9: write_message("Reading data from database using SQL statements:") write_message(jvalue) write_message(jname) for key, value in jvalue: if jname.has_key(key): key2 = string.strip(jname[key]) datastruct[key2] = value #print "%s---%s" % (key2, value) elif type == "file": input = open(data, 'r') if opts_dict["verbose"] >= 9: write_message("Reading data from file: %s" % data) data = input.readlines() datastruct = {} for line in data: #print line if not line[0:1] == "#": key = string.strip((string.split(string.strip(line),"---"))[0]) value = (string.split(string.strip(line), "---"))[1] datastruct[key] = value #print "%s---%s" % (key,value) elif type == "www": if opts_dict["verbose"] >= 9: write_message("Reading data from www using regexp: %s" % data[1]) write_message("Reading data from url:") for link in data[0].keys(): if opts_dict["verbose"] >= 9: write_message(data[0][link]) page = urllib.urlopen(data[0][link]) input = page.read() #Using the regexp from config file reg = re.compile(data[1]) iterator = re.finditer(reg, input) for match in iterator: if match.group("value"): key = string.strip(match.group("key")) value = string.replace(match.group("value"),",",".") datastruct[key] = value if opts_dict["verbose"] == 9: print "%s---%s" % (key,value) return datastruct def convert(convstruct, journals): """Converting between names""" if len(convstruct) > 0 and len(journals) > 0: invconvstruct = dict(map(lambda x: (x[1], x[0]), convstruct.items())) tempjour = {} for name in journals.keys(): if convstruct.has_key(name): tempjour[convstruct[name]] = journals[name] elif invconvstruct.has_key(name): tempjour[name] = journals[name] return tempjour else: return journals def serialize_via_numeric_array_dumps(arr): return Numeric.dumps(arr) def 
serialize_via_numeric_array_compr(str): return compress(str) def serialize_via_numeric_array_escape(str): return MySQLdb.escape_string(str) def serialize_via_numeric_array(arr): """Serialize Numeric array into a compressed string.""" return serialize_via_numeric_array_escape(serialize_via_numeric_array_compr(serialize_via_numeric_array_dumps(arr))) def deserialize_via_numeric_array(string): """Decompress and deserialize string into a Numeric array.""" return Numeric.loads(decompress(string)) def write_message(msg, stream = sys.stdout): """Write message and flush output stream (may be sys.stdout or sys.stderr). Useful for debugging stuff.""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) return def usage(code, msg=''): "Prints usage for this module." if msg: sys.stderr.write("Error: %s.\n" % msg) print >> sys.stderr, \ """ Usage: %s [options] Examples: %s --input=bibrankgkb.cfg --output=test.kb %s -otest.kb -v9 %s -v9 Generate options: -i, --input=file input file, default from /etc/bibrank/bibrankgkb.cfg -o, --output=file output file, will be placed in current folder General options: -h, --help print this help and exit -V, --version print version and exit -v, --verbose=LEVEL verbose level (from 0 to 9, default 1) """ % ((sys.argv[0],) * 4) sys.exit(code) def command_line(): global opts_dict long_flags = ["input=", "output=", "help", "version", "verbose="] short_flags = "i:o:hVv:" format_string = "%Y-%m-%d %H:%M:%S" sleeptime = "" try: opts, args = getopt.getopt(sys.argv[1:], short_flags, long_flags) except getopt.GetoptError, err: write_message(err, sys.stderr) usage(1) if args: usage(1) opts_dict = {"input": "%s/bibrank/bibrankgkb.cfg" % etcdir, "output":"", "verbose":1} sched_time = time.strftime(format_string) user = "" try: for opt in opts: if opt == 
("-h","") or opt == ("--help",""): usage(1) elif opt == ("-V","") or opt == ("--version",""): print __version__ sys.exit(1) elif opt[0] in ["--input", "-i"]: opts_dict["input"] = opt[1] elif opt[0] in ["--output", "-o"]: opts_dict["output"] = opt[1] elif opt[0] in ["--verbose", "-v"]: opts_dict["verbose"] = int(opt[1]) else: usage(1) startCreate = time.time() file = opts_dict["input"] config = ConfigParser.ConfigParser() config.readfp(open(file)) bibrankgkb(config) if opts_dict["verbose"] >= 9: showtime((time.time() - startCreate)) except StandardError, e: write_message(e, sys.stderr) sys.exit(1) return def main(): command_line() if __name__ == "__main__": main() diff --git a/modules/bibrank/lib/bibrank_citation_grapher.py b/modules/bibrank/lib/bibrank_citation_grapher.py index 0bfeefd0e..baa147913 100644 --- a/modules/bibrank/lib/bibrank_citation_grapher.py +++ b/modules/bibrank/lib/bibrank_citation_grapher.py @@ -1,132 +1,133 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
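`bibrankgkb.in` above reads knowledge-base sources made of `key---value` lines (with `#` comments), optionally renames the keys through a conversion table via `convert()`, and writes the result back in the same `key---value` form. The sketch below reproduces that file-source path in Python 3 under stated assumptions: `parse_kb()` and `format_kb()` are hypothetical helper names, skipping blank or malformed lines is a deliberate deviation (the original would raise `IndexError` on them), and the journal names and values in the usage note are illustrative only.

```python
def parse_kb(lines):
    """Parse knowledge-base lines of the form 'key---value'.

    Lines starting with '#' are comments, as in get_from_source('file', ...).
    Blank or malformed lines are skipped (an assumption, see lead-in).
    """
    kb = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "---" not in line:
            continue
        key, value = line.split("---", 1)
        kb[key.strip()] = value
    return kb


def convert(convstruct, journals):
    """Rename JOURNALS keys using CONVSTRUCT (source name -> target name).

    Keys already in target form are kept; keys unknown to the mapping are
    dropped, mirroring convert() in bibrankgkb.in.
    """
    if not convstruct or not journals:
        return journals
    targets = set(convstruct.values())
    converted = {}
    for name, value in journals.items():
        if name in convstruct:
            converted[convstruct[name]] = value
        elif name in targets:
            converted[name] = value
    return converted


def format_kb(journals):
    """Serialise back to the 'key---value' lines written by bibrankgkb()."""
    return ["%s---%s" % (key, journals[key]) for key in sorted(journals)]
```

With a conversion table `{"Phys. Rev. Lett.": "PRL", "Nucl. Phys.": "NP"}`, the input `{"Phys. Rev. Lett.": "7.2", "NP": "4.1", "Other J.": "1.0"}` becomes `{"PRL": "7.2", "NP": "4.1"}`. Note that names absent from both sides of the table are silently discarded, which is easy to miss when a source uses an unexpected spelling.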
import os import time import tempfile from marshal import loads from zlib import decompress -from config import weburl, cdslang -from dbquery import run_sql -from messages import gettext_set_language -from bibrank_grapher import create_temporary_image, write_coordinates_in_tmp_file, remove_old_img -from bibrank_citation_searcher import calculate_cited_by_list + +from cdsware.config import weburl, cdslang +from cdsware.dbquery import run_sql +from cdsware.messages import gettext_set_language +from cdsware.bibrank_grapher import create_temporary_image, write_coordinates_in_tmp_file, remove_old_img +from cdsware.bibrank_citation_searcher import calculate_cited_by_list cfg_bibrank_print_citation_history = 1 color_line_list = ['9', '19', '10', '15', '21', '18'] def get_field_values(recID, tag): """Return list of field values for field tag inside record RECID.""" out = [] if tag == "001___": out.append(str(recID)) else: digit = tag[0:2] bx = "bib%sx" % digit bibx = "bibrec_bib%sx" % digit query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag LIKE '%s'" "ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx, recID, tag) res = run_sql(query) for row in res: out.append(row[0]) return out def calculate_citation_history_coordinates(recid): """Return a list of citation graph coordinates for RECID, sorted by year.""" result = [] initial_result= get_initial_result(calculate_citation_graphe_x_coordinates(recid)) citlist = calculate_cited_by_list(recid) for rec_id, cit_weight in citlist: cit_year = get_field_values(rec_id,'773__y') if not cit_year: cit_year = get_field_values(rec_id, '260__c') #print rec_idh,str(cit_year[0]) if initial_result.has_key(int(cit_year[0][0:4])): initial_result[int(cit_year[0][0:4])] += 1 for key, value in initial_result.items(): result.append((key, value)) result.sort() return result def calculate_citation_graphe_x_coordinates(recid): """Return a range of year from the publication year of record 
RECID until the current year.""" rec_years = [] recordyear = get_field_values(recid, '773__y') if not recordyear: recordyear = get_field_values(recid, '260__c') currentyear = time.localtime()[0] if recordyear == []: recordyear = currentyear else: recordyear = find_year(recordyear[0]) interval = range(int(recordyear), currentyear+1) return interval def find_year(recordyear): """Find the year in RECORDYEAR as the first run of four digits.""" s = "" for i in range(len(recordyear)-3): s = recordyear[i:i+4] if s.isdigit(): break return s def get_initial_result(rec_years): """return an initial dictionary with year of record publication as key and zero as value """ result = {} for year in rec_years : result[year] = 0 return result def html_command(file): t = """""" % (weburl, file) #t += "" return t def create_citation_history_graph_and_box(recid, ln=cdslang): """Create graph with citation history for record RECID (into a temporary file) and return HTML box referring to that image. Called by Detailed record pages. """ _ = gettext_set_language(ln) html_result = "" if cfg_bibrank_print_citation_history: coordinates = calculate_citation_history_coordinates(recid) if coordinates: html_head = """
%s
""" % _("Citation history:") graphe_file_name = 'citation_%s_stats.png' % str(recid) remove_old_img(graphe_file_name) years = calculate_citation_graphe_x_coordinates(recid) years.sort() datas_info = write_coordinates_in_tmp_file([coordinates]) graphe = create_temporary_image(recid, 'citation', datas_info[0], 'Year', 'Times Cited', [0,0], datas_info[1], [], ' ', years) graphe_image = graphe[0] graphe_source_file = graphe[1] if graphe_image and graphe_source_file: if os.path.exists(graphe_source_file): os.unlink(datas_info[0]) html_graphe_code = """

%s"""% html_command(graphe_image) html_result = html_head + html_graphe_code return html_result diff --git a/modules/bibrank/lib/bibrank_citation_indexer.py b/modules/bibrank/lib/bibrank_citation_indexer.py index 5eb067270..62f46619c 100644 --- a/modules/bibrank/lib/bibrank_citation_indexer.py +++ b/modules/bibrank/lib/bibrank_citation_indexer.py @@ -1,294 +1,295 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
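`bibrank_citation_grapher.py` above builds the citation-history graph in two steps: it extracts a publication year from the `773__y` or `260__c` field value (`find_year()`), then counts citing records per year between that year and the current one, ignoring years outside the window. The sketch below restates both steps in Python 3; it swaps the character-window loop for a `re.search` on four digits, and `citation_history()` is a hypothetical helper that takes plain year lists instead of querying the database. Input strings such as `"c1998"` are illustrative.

```python
import re


def find_year(text):
    """Return the first run of four consecutive digits in TEXT, or None."""
    match = re.search(r"\d{4}", text)
    return match.group(0) if match else None


def citation_history(pub_year, citing_years, current_year):
    """Return sorted (year, count) pairs from PUB_YEAR to CURRENT_YEAR.

    Citations dated outside that window are ignored, as in
    calculate_citation_history_coordinates().
    """
    # One zero-initialised bucket per year, like get_initial_result().
    counts = dict((year, 0) for year in range(pub_year, current_year + 1))
    for year in citing_years:
        if year in counts:
            counts[year] += 1
    return sorted(counts.items())
```

For a record published in 2001 and cited in 2001, 2003 (twice), 1999 and 2010, `citation_history(2001, [2001, 2003, 2003, 1999, 2010], 2004)` yields `[(2001, 1), (2002, 0), (2003, 2), (2004, 0)]`: the 1999 and 2010 citations fall outside the plotted range. Returning `None` when no year is found lets a caller decide on a fallback explicitly instead of receiving a partial string.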
import MySQLdb import time import os from marshal import loads, dumps from zlib import decompress, compress -from dbquery import run_sql -from search_engine import print_record, search_pattern -from bibrecord import create_records, record_get_field_values + +from cdsware.dbquery import run_sql +from cdsware.search_engine import print_record, search_pattern +from cdsware.bibrecord import create_records, record_get_field_values class memoise: def __init__(self, function): self.memo = {} self.function = function def __call__(self, *args): if self.memo.has_key(args): return self.memo[args] else: object = self.memo[args] = self.function(*args) return object def get_recids_matching_query(pvalue, fvalue): """Return list of recIDs matching query for PVALUE and FVALUE.""" rec_id = search_pattern(p=pvalue, f=fvalue, m='e').tolist() return rec_id get_recids_matching_query = memoise(get_recids_matching_query) def get_citation_weight(rank_method_code, config): """Return a dictionary used by the BibRank daemon to generate the index of search results sorted by citation information. """ begin_time = time.time() last_update_time = get_bibrankmethod_lastupdate(rank_method_code) last_modified_records = get_last_modified_rec(last_update_time) if last_modified_records: updated_recid_list = create_recordid_list(last_modified_records) result_intermediate = last_updated_result(rank_method_code, updated_recid_list) citation_weight_dic_intermediate = result_intermediate[0] citation_list_intermediate = result_intermediate[1] reference_list_intermediate = result_intermediate[2] citation_informations = get_citation_informations(updated_recid_list, config) dic = ref_analyzer(citation_informations, citation_weight_dic_intermediate, citation_list_intermediate, reference_list_intermediate) end_time = time.time() print "Total time of software: ", (end_time - begin_time) else: dic = {} print "No new records added since last time this rank method was executed" return dic def
get_bibrankmethod_lastupdate(rank_method_code): """return the last excution date of bibrank method """ query = """select last_updated from rnkMETHOD where name ='%s'""" % rank_method_code last_update_time = run_sql(query) return last_update_time[0][0] def get_last_modified_rec(bibrank_method_lastupdate): """ return the list of recods which have been modified after the last execution of bibrank method. The result is expected to have ascending numerical order. """ query = """SELECT id FROM bibrec WHERE modification_date>= '%s' """ % bibrank_method_lastupdate query += "order by id ASC" list = run_sql(query) return list def create_recordid_list(rec_ids): """Create a list of record ids out of RECIDS. The result is expected to have ascending numerical order. """ rec_list = [] for row in rec_ids: rec_list.append(row[0]) return rec_list def create_record_tuple(list): """Creates a tuple of record id from a list of id. The result is expected to have ascending numerical order. """ list_length = len(list) if list_length: rec_tuple = '(' for row in list[0:list_length-1]: rec_tuple += str(row) rec_tuple += ',' rec_tuple += str(list[list_length-1]) rec_tuple += ')' else: rec_tuple = '()' return rec_tuple def last_updated_result(rank_method_code, recid_list): """ return the last value of dictionary in rnkMETHODDATA table if it exists and initialize the value of last updated records by zero,otherwise an initial dictionary with zero as value for all recids """ query = """select relevance_data from rnkMETHOD, rnkMETHODDATA where rnkMETHOD.id = rnkMETHODDATA.id_rnkMETHOD and rnkMETHOD.Name = '%s'"""% rank_method_code dict = run_sql(query) if dict: dic = loads(decompress(dict[0][0])) query = "select citation_data from rnkCITATIONDATA" cit_compressed = run_sql(query) cit = loads(decompress(cit_compressed[0][0])) query = "select citation_data_reversed from rnkCITATIONDATA" ref_compressed = run_sql(query) ref = loads(decompress(ref_compressed[0][0])) result = get_initial_result(dic, cit, 
ref, recid_list) else: result = make_initial_result() return result def get_initial_result(dic, cit, ref, recid_list): """initialieze the citation weights of the last updated record with zero for recalculating it later """ for recid in recid_list: dic[recid] = 0 cit[recid] = [] if ref.has_key(recid) and ref[recid]: for id in ref[recid]: if cit.has_key(id) and recid in cit[id]: cit[id].remove(recid) dic[id] -= 1 if cit.has_key(recid) and cit[recid]: for id in cit[recid]: if ref.has_key(id) and recid in ref[id]: ref[id].remove(recid) ref[recid] = [] return [dic, cit, ref] def make_initial_result(): """return an initial dictinary with recID as key and zero as value """ dic = {} cit = {} ref = {} query = "select id from bibrec" res = run_sql(query) for key in res: dic[key[0]] = 0 cit[key[0]] = [] ref[key[0]] = [] return [dic, cit, ref] def get_citation_informations(recid_list, config): """return une dictionary that contains the citation information of cds records """ begin_time = os.times()[4] d_reports_numbers = {} d_references_report_numbers = {} d_references_s = {} d_records_s = {} citation_informations = [] record_pri_number_tag = config.get(config.get("rank_method", "function"),"publication_primary_number_tag") record_add_number_tag = config.get(config.get("rank_method", "function"),"publication_aditional_number_tag") reference_number_tag = config.get(config.get("rank_method", "function"),"publication_reference_number_tag") reference_tag = config.get(config.get("rank_method", "function"),"publication_reference_tag") record_publication_info_tag = config.get(config.get("rank_method", "function"),"publication_info_tag") for recid in recid_list: xml = print_record(int(recid),'xm') rs = create_records(xml) recs = map((lambda x:x[0]), rs) l_report_numbers = [] for rec in recs: pri_report_number = record_get_field_values(rec, record_pri_number_tag[0:3], code=record_pri_number_tag[3]) add_report_numbers = record_get_field_values(rec, record_add_number_tag[0:3], 
code=record_add_number_tag[3]) if pri_report_number: l_report_numbers.extend(pri_report_number) if add_report_numbers: l_report_numbers.extend(add_report_numbers) d_reports_numbers[recid] = l_report_numbers reference_report_number = record_get_field_values(rec, reference_number_tag[0:3], ind1=reference_number_tag[3], ind2=reference_number_tag[4], code=reference_number_tag[5]) if reference_report_number: d_references_report_numbers[recid] = reference_report_number references_s = record_get_field_values(rec, reference_tag[0:3], ind1=reference_tag[3], ind2=reference_tag[4], code=reference_tag[5]) if references_s: d_references_s[recid] = references_s record_s = record_get_field_values(rec, record_publication_info_tag[0:3], ind1='', ind2='', code=record_publication_info_tag[3]) if record_s: d_records_s[recid] = record_s[0] citation_informations.append(d_reports_numbers) citation_informations.append(d_references_report_numbers) citation_informations.append(d_references_s) citation_informations.append(d_records_s) end_time = os.times()[4] print "Execution time for generating citation inforamtions by parsing xml contents: ", (end_time - begin_time) return citation_informations def ref_analyzer(citation_informations, initialresult, initial_citationlist, initial_referencelist): """Analyze the citation informations and calculate the citation weight and cited by list dictionary """ citation_list = initial_citationlist reference_list = initial_referencelist result = initialresult d_reports_numbers = citation_informations[0] d_references_report_numbers = citation_informations[1] d_references_s = citation_informations[2] d_records_s = citation_informations[3] t1 = os.times()[4] for recid, refnumbers in d_references_report_numbers.iteritems(): for refnumber in refnumbers: p = refnumber f = 'reportnumber' rec_id = get_recids_matching_query(p, f) if rec_id: result[rec_id[0]] += 1 citation_list[rec_id[0]].append(recid) reference_list[recid].append(rec_id[0]) t2 = os.times()[4] for 
recid, refss in d_references_s.iteritems(): for refs in refss: p = refs f = 'publref' rec_id = get_recids_matching_query(p, f) if rec_id and not recid in citation_list[rec_id[0]]: result[rec_id[0]] += 1 citation_list[rec_id[0]].append(recid) if rec_id and not rec_id[0] in reference_list[recid]: reference_list[recid].append(rec_id[0]) t3 = os.times()[4] for rec_id, recnumbers in d_reports_numbers.iteritems(): for recnumber in recnumbers: p = recnumber f = '999C5r' recid_list = get_recids_matching_query(p, f) if recid_list: for recid in recid_list: if not recid in citation_list[rec_id]: result[rec_id] += 1 citation_list[rec_id].append(recid) if not rec_id in reference_list[recid]: reference_list[recid].append(rec_id) t4 = os.times()[4] for recid, recs in d_records_s.iteritems(): tmp = recs.find("-") if tmp < 0: recs_modified = recs else: recs_modified = recs[:tmp] p = recs_modified f = '999C5s' rec_ids = get_recids_matching_query(p, f) if rec_ids: for rec_id in rec_ids: if not rec_id in citation_list[recid]: result[recid] += 1 citation_list[recid].append(rec_id) if not recid in reference_list[rec_id]: reference_list[rec_id].append(recid) t5 = os.times()[4] insert_cit_ref_list_intodb(citation_list, reference_list) print "\nExecution time for analizing the citation information generating the dictionary: " print "checking ref number: ", (t2-t1) print "checking ref ypvt: ", (t3-t2) print "checking rec number: ", (t4-t3) print "checking rec ypvt: ", (t5-t4) print "total time of refAnalize: ", (t5-t1) return result def get_decompressed_xml(xml): """return a decompressed content of xml into a xml content """ decompressed_xml = create_records(decompress(xml)) return decompressed_xml def insert_cit_ref_list_intodb(citation_dic, reference_dic): """Insert the reference and citation list into the database""" date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) id = run_sql("SELECT * from rnkCITATIONDATA ") if id: run_sql("update rnkCITATIONDATA set citation_data_reversed 
= '%s'"% (get_compressed_dictionary(reference_dic))) run_sql("update rnkCITATIONDATA set citation_data = '%s'" % (get_compressed_dictionary(citation_dic))) else : run_sql("INSERT INTO rnkCITATIONDATA VALUES ('%s', null)" % (get_compressed_dictionary(citation_dic))) run_sql("update rnkCITATIONDATA set citation_data_reversed = '%s'"% (get_compressed_dictionary(reference_dic))) def get_compressed_dictionary(dic): """Serialize Python object vi a marshal into a compressed string.""" return MySQLdb.escape_string(compress(dumps(dic))) diff --git a/modules/bibrank/lib/bibrank_citation_indexer_tests.py b/modules/bibrank/lib/bibrank_citation_indexer_tests.py index eb105bdfa..5668ff836 100644 --- a/modules/bibrank/lib/bibrank_citation_indexer_tests.py +++ b/modules/bibrank/lib/bibrank_citation_indexer_tests.py @@ -1,46 +1,47 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Unit tests for the citation indexer.""" __version__ = "$Id$" import unittest -from bibrank_citation_indexer import last_updated_result + +from cdsware.bibrank_citation_indexer import last_updated_result class TestCitationIndexer(unittest.TestCase): def setUp(self): self.rank_method_code = 'cit' self.updated_recid_list = [339705, 339704, 339708] def test_last_updated_result(self): """bibrank citation indexer - last updated result""" self.assert_(last_updated_result(self.rank_method_code, self.updated_recid_list)) def create_test_suite(): """Return test suite for the bibrank citation indexer.""" return unittest.TestSuite((unittest.makeSuite(TestCitationIndexer, 'test'),)) if __name__ == "__main__": unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibrank/lib/bibrank_citation_searcher.py b/modules/bibrank/lib/bibrank_citation_searcher.py index 87f1c1595..7998aee86 100644 --- a/modules/bibrank/lib/bibrank_citation_searcher.py +++ b/modules/bibrank/lib/bibrank_citation_searcher.py @@ -1,103 +1,104 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
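In the citation searcher below, `calculate_co_cited_with_list()` takes every record that cites RECORD_ID, walks that record's reference list, and counts how often each other record appears. The result is then sorted by that count. Here is a minimal Python 3 sketch of the same counting, assuming the two cached dictionaries are passed in directly instead of being loaded from `rnkCITATIONDATA`. The function name `co_cited_with` is hypothetical:

```python
from collections import Counter


def co_cited_with(record_id, cited_by, references, descending=True):
    """Return [[recid, count], ...] for records co-cited with RECORD_ID.

    cited_by:   {recid: [ids of records citing recid]}
    references: {recid: [ids of records that recid cites]}
    """
    counts = Counter()
    for citing_id in cited_by.get(record_id, []):
        for ref_id in references.get(citing_id, []):
            if ref_id != record_id:  # a record is not co-cited with itself
                counts[ref_id] += 1
    result = [[recid, n] for recid, n in counts.items()]
    # Sort on the count only, like the cmp-based sort in the original.
    result.sort(key=lambda pair: pair[1], reverse=descending)
    return result


cited_by = {1: [10, 11]}
references = {10: [1, 2, 3], 11: [1, 2]}
print(co_cited_with(1, cited_by, references))  # [[2, 2], [3, 1]]
```

A `key=` sort replaces the Python 2 `cmp` lambdas, which Python 3 no longer accepts.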
from marshal import loads from zlib import decompress -from dbquery import run_sql + +from cdsware.dbquery import run_sql def init_cited_by_dictionary(): """return citation list dictionary from rnkCITATIONDATA """ query = "select citation_data from rnkCITATIONDATA" compressed_citation_dic = run_sql(query) citation_dic = None if compressed_citation_dic and compressed_citation_dic[0]: citation_dic = loads(decompress(compressed_citation_dic[0][0])) return citation_dic def init_reference_list_dictionary(): """return reference list dictionary from rnkCITATIONDATA """ query = "select citation_data_reversed from rnkCITATIONDATA" compressed_ref_dic = run_sql(query) ref_dic = None if compressed_ref_dic and compressed_ref_dic[0]: ref_dic = loads(decompress(compressed_ref_dic[0][0])) return ref_dic cache_cited_by_dictionary = init_cited_by_dictionary() cache_reference_list_dictionary = init_reference_list_dictionary() ### INTERFACE def calculate_cited_by_list(record_id, sort_order="d"): """Return a tuple of ([recid,citation_weight],...) for all the record in citing RECORD_ID. The resulting recids is sorted by ascending/descending citation weights depending or SORT_ORDER. 
""" citation_list = [] result = [] # determine which record cite RECORD_ID: if cache_cited_by_dictionary: citation_list = cache_cited_by_dictionary.get(record_id, []) # get their weights: query = "select relevance_data from rnkMETHODDATA, rnkMETHOD WHERE rnkMETHOD.id=rnkMETHODDATA.id_rnkMETHOD and rnkMETHOD.name='cit'" compressed_citation_weight_dic = run_sql(query) if compressed_citation_weight_dic and compressed_citation_weight_dic[0]: citation_dic = loads(decompress(compressed_citation_weight_dic[0][0])) for id in citation_list: tmp = [id, citation_dic[id]] result.append(tmp) # sort them: if result: if sort_order == "d": result.sort(lambda x, y: cmp(y[1], x[1])) else: result.sort(lambda x, y: cmp(x[1], y[1])) return result def calculate_co_cited_with_list(record_id, sort_order="d"): """Return a tuple of ([recid,co-cited weight],...) for records that are co-cited with RECORD_ID. The resulting recids is sorted by ascending/descending citation weights depending or SORT_ORDER. """ result = [] result_intermediate = {} citation_list = [] if cache_cited_by_dictionary: citation_list = cache_cited_by_dictionary.get(record_id, []) for cit_id in citation_list: reference_list = [] if cache_reference_list_dictionary: reference_list = cache_reference_list_dictionary.get(cit_id, []) for ref_id in reference_list: if not result_intermediate.has_key(ref_id): result_intermediate[ref_id] = 1 else: result_intermediate[ref_id] += 1 for key, value in result_intermediate.iteritems(): if not (key==record_id): result.append([key, value]) if result: if sort_order == "d": result.sort(lambda x, y: cmp(y[1], x[1])) else: result.sort(lambda x, y: cmp(x[1], y[1])) return result diff --git a/modules/bibrank/lib/bibrank_citation_searcher_tests.py b/modules/bibrank/lib/bibrank_citation_searcher_tests.py index da6f0a44b..7c6f1fd6f 100644 --- a/modules/bibrank/lib/bibrank_citation_searcher_tests.py +++ b/modules/bibrank/lib/bibrank_citation_searcher_tests.py @@ -1,59 +1,60 @@ # -*- coding: utf-8 
-*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """Unit tests for the citation searcher.""" __version__ = "$Id$" -import bibrank_citation_searcher import unittest +from cdsware import bibrank_citation_searcher + class TestCitationSearcher(unittest.TestCase): def setUp(self): self.recid = 339705 self.recids = [339705, 339706] self.rank_method_code = 'cit' def test_init_cited_by_dictionary(self): """bibrank citation searcher - init cited-by data""" # FIXME: test postponed #self.assert_(bibrank_citation_searcher.init_cited_by_dictionary()) def test_init_reference_list_dictionary(self): """bibrank citation searcher - init reference data""" # FIXME: test postponed #self.assert_(bibrank_citation_searcher.init_reference_list_dictionary()) def test_calculate_cited_by_list(self): """bibrank citation searcher - get citing relevance""" # FIXME: test postponed def test_calculate_co_cited_with_list(self): """bibrank citation searcher - get co-cited-with data""" # FIXME: test postponed def create_test_suite(): """Return test suite for the citation searcher.""" return unittest.TestSuite((unittest.makeSuite(TestCitationSearcher,'test'),)) if __name__ == "__main__": 
unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibrank/lib/bibrank_downloads_grapher.py b/modules/bibrank/lib/bibrank_downloads_grapher.py index e66b2ccba..34d0726b8 100644 --- a/modules/bibrank/lib/bibrank_downloads_grapher.py +++ b/modules/bibrank/lib/bibrank_downloads_grapher.py @@ -1,274 +1,275 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
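In the downloads grapher below, `create_users_analysis_graph()` classifies each `client_host` by comparing it with four hard-coded integers. These integers are 32-bit IPv4 values: 2307522817..2307588095 is 137.138.1.1..137.138.255.255, and 2156724481..2156789759 is 128.141.1.1..128.141.255.255. As written, the x.x.0.x addresses fall outside both ranges. Below is a minimal Python 3 sketch that makes those bounds readable, assuming `client_host` is stored as an unsigned integer, which the comparisons in the original imply. `ip_to_int` and `users_repartition` are hypothetical names:

```python
import ipaddress


def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


# Same integer bounds as create_users_analysis_graph(); the dotted
# forms are obtained here by conversion, not taken from the source.
CERN_RANGES = [
    (ip_to_int("137.138.1.1"), ip_to_int("137.138.255.255")),  # 2307522817..2307588095
    (ip_to_int("128.141.1.1"), ip_to_int("128.141.255.255")),  # 2156724481..2156789759
]


def users_repartition(ips):
    """Return (cern_percent, other_percent) for integer IPv4 addresses."""
    if not ips:
        # The original only draws this graph when the list is non-empty.
        raise ValueError("no downloads to classify")
    cern = sum(1 for ip in ips if any(lo <= ip <= hi for lo, hi in CERN_RANGES))
    total = float(len(ips))
    return cern / total * 100, (len(ips) - cern) / total * 100


print(users_repartition([ip_to_int("137.138.4.2"), ip_to_int("192.0.2.7")]))
# (50.0, 50.0)
```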
import string import os import sys import time import tempfile import calendar -from config import weburl, cdslang -from messages import gettext_set_language -from dbquery import run_sql -from bibrank_downloads_indexer import database_tuples_to_single_list -from bibrank_grapher import * + +from cdsware.config import weburl, cdslang +from cdsware.messages import gettext_set_language +from cdsware.dbquery import run_sql +from cdsware.bibrank_downloads_indexer import database_tuples_to_single_list +from cdsware.bibrank_grapher import * color_line_list = ['9', '19', '10', '15', '21', '18'] cfg_id_bibdoc_id_bibrec = 5 cfg_bibrank_print_download_history = 1 cfg_bibrank_print_download_split_by_id = 1 def create_download_history_graph_and_box(id_bibrec, ln=cdslang): """Create graph with citation history for record ID_BIBREC (into a temporary file) and return HTML box refering to that image. Called by Detailed record pages. Notes: if id_bibdoc=0 : its an oustide-stored document and it has no id_bibdoc --> only one line if nb_id_bibdoc <= cfg_id_bibdoc_id_bibrec draw one line per id_bibdoc if nb_id_bibdoc > cfg_id_bibdoc_id_bibrec draw only one line which hold simultaneously the downloads per id_bibdoc Each time this function is called, all the images older than 10 minutes are deleted. 
""" _ = gettext_set_language(ln) html_code = "" html_content = "" users_analysis_text = "" if cfg_bibrank_print_download_split_by_id: users_analysis_text = "and Users repartition" #remove images older than 10 minutes remove_old_img("download") #Users analysis graph ips = database_tuples_to_single_list(run_sql("select client_host from rnkDOWNLOADS where id_bibrec=%s;" % id_bibrec)) if ips == []: pass else : users_analysis_results = create_users_analysis_graph(id_bibrec, ips) graph_file_users = weburl + "/img/" + users_analysis_results[0] file_to_close_users = users_analysis_results[1] html_content += """""" % graph_file_users if file_to_close_users: if os.path.exists(file_to_close_users): os.unlink(file_to_close_users) #Downloads history graph and return html code used by get_file or search_engine if cfg_bibrank_print_download_history: remove_old_img("download") nb_id_bibdoc = run_sql("select distinct id_bibdoc from rnkDOWNLOADS where id_bibrec=%s;" % id_bibrec) history_analysis_results = () if nb_id_bibdoc == (): pass elif nb_id_bibdoc[0][0] <= cfg_id_bibdoc_id_bibrec and (0, ) not in nb_id_bibdoc: history_analysis_results = draw_downloads_statistics(id_bibrec, list(nb_id_bibdoc)) else: history_analysis_results = draw_downloads_statistics(id_bibrec, []) if history_analysis_results: graph_file_history = weburl + "/img/" + history_analysis_results[0] file_to_close_history = history_analysis_results[1] html_content += """""" % graph_file_history if file_to_close_history : if os.path.exists(file_to_close_history): os.unlink(file_to_close_history) out = "" if html_content != "": out += """

%s %s
""" % (_("Downloads history:"), users_analysis_text) out += html_content out += "
" return out def draw_downloads_statistics(id_bibrec, id_bibdoc_list): """Create a graph about download history using a temporary file to store datas and a new png file for each id_bibrec to store the image of the graph which will be referenced by html code.""" intervals = [] #used to name the different curves when len(id_bibdoc_list)>1 docfile_name_list = [] #used to name the uniquecurve when len(id_bibdoc_list)=0 or > cfg_id_bibdoc_id_bibrec record_name = run_sql("select value from bibrec_bib24x,bib24x where id_bibrec=%s and id_bibxxx=id;" % id_bibrec)[0][0] #list of lists of tuples: [[("09/2004",4),..],[(..,..)]..] #Each list element of the list is represented by a curve #each elem of each list is a point on the graph coordinates_list = [] #If the document is not stored in CdsWare it has id_bibrec 0 and no creation date #In this case the beginning date is the first time the document has been downloaded creation_date_res = run_sql("""SELECT DATE_FORMAT(creation_date,"%%Y-%%m-%%d-%%H:%%i:%%s") FROM bibrec WHERE id=%s;""" % id_bibrec) if creation_date_res == (): creation_date_res = run_sql("""SELECT DATE_FORMAT(MIN(download_time),"%%Y-%%m-%%d-%%H:%%i:%%s") FROM rnkDOWNLOADS where id_bibrec=%s;""" % id_bibrec) creation_date_year, creation_date_month, creation_date_day, creation_date_time = string.split(creation_date_res[0][0], "-") creation_date_year = string.atoi(creation_date_year) creation_date_month = string.atoi(creation_date_month) creation_date_day = string.atoi(creation_date_day) creation_date_time = str(creation_date_time) #create intervals and corresponding values local_time = time.localtime() res = create_tic_intervals(local_time, creation_date_year, creation_date_month) intervals = res[1] tic_list = res[0] if id_bibdoc_list == []: coordinates_list.append(create_list_tuple_data(intervals, id_bibrec)) docfile_name_list = record_name else : for i in range(len(id_bibdoc_list)): datas = create_list_tuple_data(intervals, id_bibrec, 
id_bibdoc_query_addition="and id_bibdoc=%s" % id_bibdoc_list[i][0]) coordinates_list.append(datas) docfile_name_list.append(run_sql("select docname from bibdoc where id=%s;" % id_bibdoc_list[i][0])[0][0]) #In case of multiple id_bibdocs datas_max will be used to draw a line which is the total of the others lines if not (len(intervals)==1 or len(id_bibdoc_list)==1): datas_max = create_list_tuple_total(intervals, coordinates_list) coordinates_list.append(datas_max) #write coordinates_list in a temporary file result2 = write_coordinates_in_tmp_file(coordinates_list) fname = result2[0] y_max = result2[1] #Use create the graph from the temporary file return create_temporary_image(id_bibrec, 'download_history', fname, '', 'Downloads/month', (0, 0), y_max, id_bibdoc_list, docfile_name_list, tic_list) def create_list_tuple_data(intervals, id_bibrec, id_bibdoc_query_addition=""): """-Return a list of tuple of the form [('10/2004',3),(..)] used to plot graph Where 3 is the number of downloads between 01/10/2004 and 31/10/2004""" list_tuple = [] for elem in intervals: main_date_end = string.split(elem[1], '/') end_of_month_end = calendar.monthrange(string.atoi(main_date_end[1]), string.atoi(main_date_end[0]))[1] s0 = string.split(elem[0], "/") s1 = string.split(elem[1], "/") elem0 = s0[1] + "-" + s0[0] elem1 = s1[1] + "-" + s1[0] date1 = "%s%s" % (elem0, "-01 00:00:00") date2 = "%s%s" % (elem1, "-%s 00:00:00" % str(end_of_month_end)) sql_query = "select count(*) from rnkDOWNLOADS where id_bibrec=%s %s and download_time>='%s' and download_time<'%s';" % (id_bibrec, id_bibdoc_query_addition, date1, date2) res = run_sql(sql_query)[0][0] list_tuple.append((elem[0], res)) #list_tuple = sort_list_tuple_by_date(list_tuple) return (list_tuple) def sort_list_tuple_by_date(list_tuple): """Sort a list of tuple of the forme ("09/2004", 3)according to the year of the first element of the tuple""" list_tuple.sort(lambda x, y: (cmp(string.split(x[0], '/')[1], string.split(y[0], '/')[1]))) 
return list_tuple def create_list_tuple_total(intervals, list_data): """In the case of multiple id_bibdocs, a last paragraph is added at the end to show the global evolution of the record""" list_tuple = [] if len(intervals)==1: res = 0 for j in range(len(list_data)): res += list_data[j][1] list_tuple.append((intervals[0][0], res)) else : for i in range(len(intervals)): res = 0 for j in range(len(list_data)): res += list_data[j][i][1] list_tuple.append((intervals[i][0], res)) #list_tuple = sort_list_tuple_by_date(list_tuple) return list_tuple def create_tic_intervals(local_time, creation_date_year, creation_date_month): """Create intervals since document creation date until now Return a list of the tics for the graph of the form ["04/2004","05/2004"), ...] And a list of tuple(each tuple stands for a period) of the form [("04/2004", "04/2004"),.] to compute the number of downloads in each period For the very short periods some tics and tuples are added to make sure that at least two dates are returned. Useful for drawing graphs. 
""" # okay, off we go tic_list = [] interval_list = [] local_month = local_time.tm_mon local_year = local_time.tm_year original_date = (creation_date_month, creation_date_year) while (creation_date_year, creation_date_month) <= (local_year, local_month) and creation_date_month <= 12: date_elem = "%s/%s" % (creation_date_month, creation_date_year) tic_list.append(date_elem) interval_list.append((date_elem, date_elem)) if creation_date_month != 12: creation_date_month = creation_date_month+1 else : creation_date_year = creation_date_year+1 creation_date_month = 1 next_period = (creation_date_month, creation_date_year) #additional periods for the short period if len(interval_list) <= 2: period_before = "%s/%s" % (sub_month(original_date[0], original_date[1])) period_after = "%s/%s" % next_period interval_list.insert(0, (period_before, period_before)) interval_list.append((period_after, period_after)) tic_list.insert(0, period_before) tic_list.append(period_after) return (tic_list, interval_list) def add_month(month, year): """Add a month and increment the year if necessary""" if month == 12: month = 1 year += 1 else : month += 1 return (month, year) def sub_month(month, year): """Add a month and decrease the year if necessary""" if month == 1: month = 12 year = year -1 else : month -= 1 return (month, year) def create_users_analysis_graph(id_bibrec, ips): """For a given id_bibrec, classify cern users and other users Draw a percentage graphic reprentation""" cern_users = 0 other_users = 0 coordinates_list = [] #compute users repartition for i in range(len(ips)): if 2307522817 <= ips[i] <= 2307588095 or 2156724481 <= ips[i] <= 2156789759: cern_users += 1 else : other_users += 1 tot = float(cern_users+other_users) #prepare coordinates list coordinates_list.append((1, str(float(cern_users)/tot*100))) coordinates_list.append((3, str(float(other_users)/tot*100))) #write coordinates in a temporary file result2 = write_coordinates_in_tmp_file([coordinates_list]) #plot the 
graph return create_temporary_image(id_bibrec, 'download_users', result2[0], '', '', (0, 0), result2[1], [], [], [1, 3]) diff --git a/modules/bibrank/lib/bibrank_downloads_indexer.py b/modules/bibrank/lib/bibrank_downloads_indexer.py index ee93c1167..efb4388c7 100644 --- a/modules/bibrank/lib/bibrank_downloads_indexer.py +++ b/modules/bibrank/lib/bibrank_downloads_indexer.py @@ -1,192 +1,192 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
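In the downloads indexer below, `compute_next_hour()` takes the `"%Y-%m-%d %H"` string returned by `DATE_FORMAT` and rolls it forward one hour by handling the end-of-day, end-of-month and end-of-year cases by hand. Here is a minimal Python 3 sketch of the same step using `datetime` arithmetic, which handles leap years and month lengths itself. One difference from the original: this version always zero-pads fields (for example `"2005-01-01 00:00:00"`), whereas the original can emit forms such as `"2005-1-1 00:00:00"`:

```python
from datetime import datetime, timedelta


def compute_next_hour(date_res):
    """Given 'YYYY-MM-DD HH', return the start of the following hour
    formatted as 'YYYY-MM-DD HH:00:00'."""
    start = datetime.strptime(date_res, "%Y-%m-%d %H")
    return (start + timedelta(hours=1)).strftime("%Y-%m-%d %H:00:00")


print(compute_next_hour("2004-12-31 23"))  # 2005-01-01 00:00:00
print(compute_next_hour("2004-02-29 23"))  # 2004-03-01 00:00:00
```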
-## import interesting modules: import os import time import calendar import ConfigParser import string -from config import * -from dbquery import run_sql + +from cdsware.config import * +from cdsware.dbquery import run_sql def append_to_file(path, content): """print result in a file""" if os.path.exists(path): file_dest = open(path,"a") file_dest.write("Hit on %s reads:" % time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())) file_dest.write(content) file_dest.write("\n") file_dest.close() return content def get_download_weight_filtering_user(dic, keys): """ update the dictionnary.Without duplicates.Count maximum one hit per user per hour""" for k in keys: weight = 0 user_ips = run_sql("select count(distinct client_host) from rnkDOWNLOADS where id_bibrec=%s group by id_bibdoc" % k) for ip in user_ips: weight = weight + ip[0] dic[k] = weight return dic def get_download_weight_total(dic, keys): """ update the dictionnary.Count all the hit""" for k in keys: values = run_sql("select count(*) from rnkDOWNLOADS where id_bibrec=%s %s" % (k,";")) dic[k] = values[0][0] return dic def uniq(alist): """Remove duplicate element in alist Fastest without order preserving""" set = {} map(set.__setitem__, alist, []) return set.keys() def database_tuples_to_single_list(tuples): """convert a tuple extracted from the database into a list""" list_result = [] for tup in range(len(tuples)): list_result.append(tuples[tup][0]) return list_result def new_downloads_to_index (last_updated): """id_bibrec of documents downloaded since the last run of bibrank """ id_bibrec_list = database_tuples_to_single_list(run_sql("select id_bibrec from rnkDOWNLOADS where download_time >=\"%s\"" % last_updated)) res = uniq(id_bibrec_list) return res def filter_downloads_per_hour_with_docid (keys, last_updated): """filter all the duplicate downloads per user for each hour intervall""" for k in keys: id_bibdocs = run_sql("select distinct id_bibdoc from rnkDOWNLOADS where id_bibrec=%s" % k) for bibdoc in 
id_bibdocs: values = run_sql("""select DATE_FORMAT(download_time,"%%Y-%%m-%%d %%H"), client_host from rnkDOWNLOADS where id_bibrec=%s and id_bibdoc=%s and download_time >=\"%s\";""" % (k, bibdoc[0], last_updated)) for val in values: date_res = val[0] date1 = "%s:00:00" % (date_res) date2 = compute_next_hour(date_res) duplicates = (run_sql("select count(*) from rnkDOWNLOADS where id_bibrec=%s and id_bibdoc=%s and download_time>='%s' and download_time<'%s' and client_host=%s;" % (k, bibdoc[0], date1, date2, val[1]))[0][0])-1 run_sql("delete from rnkDOWNLOADS where id_bibrec=%s and id_bibdoc=%s and download_time>='%s' and download_time<'%s' and client_host=%s limit %s;" % (k, bibdoc[0], date1, date2, val[1], duplicates)) def filter_downloads_per_hour (keys, last_updated): """filter all the duplicate downloads per user for each hour intervall""" for k in keys: values = run_sql("""select DATE_FORMAT(download_time,"%%Y-%%m-%%d %%H"), client_host from rnkDOWNLOADS where id_bibrec=%s and download_time >=\"%s\";""" % (k, last_updated)) for val in values: date_res = val[0] date1 = "%s:00:00" % (date_res) date2 = compute_next_hour(date_res) duplicates = (run_sql("select count(*) from rnkDOWNLOADS where id_bibrec=%s and download_time>='%s' and download_time<'%s' and client_host=%s;" % (k, date1, date2, val[1]))[0][0])-1 run_sql("delete from rnkDOWNLOADS where id_bibrec=%s and download_time>='%s' and download_time<'%s' and client_host=%s limit %s;" % (k, date1, date2, val[1], duplicates)) def compute_next_hour(date_res): """treat the change of the year, of (special)month etc.. 
and return the date in database format""" next_date = "" date_res, date_hour = string.split(date_res, " ") date_hour = string.atoi(date_hour) if date_hour == 23: date_year, date_month, date_day = string.split(date_res, "-") date_year = string.atoi(date_year) date_month = string.atoi(date_month) date_day = string.atoi(date_day) if date_month == 12 and date_day == 31: next_date = "%s-%s-%s 00:00:00" % (date_year + 1, 01, 01) elif calendar.monthrange(date_year, date_month)[1] == date_day: next_date = "%s-%s-%s 00:00:00" % (date_year, date_month + 1, 01) else : next_date = "%s-%s-%s 00:00:00" % (date_year, date_month, date_day + 1) else : next_hour = date_hour + 1 next_date = "%s %s:00:00" % (date_res, next_hour) return next_date def get_file_similarity_by_times_downloaded(dic, id_bibrec_list): """For each id_bibrec, get the client_host and see which other id_bibrec these users have also downloaded. Return update dictionnary of this form {id_bibrec:[(id_bibrec1,score),(id_bibrec2,score)],id_bibrec:[(),()]...} Take long time so let's see bibrank_downloads_similarity which compute in fly the similarity for a particular recid.""" dic_result = {} if id_bibrec_list != []: tuple_string_id_bibrec_list = str(tuple(id_bibrec_list)) if len(id_bibrec_list) == 1: tuple_string_id_bibrec_list = tuple_string_id_bibrec_list.replace(',','') #first compute the download similarity between the new documents #which have been downloadwd since the last run of bibrank dic_news = {} res = run_sql("select id_bibrec,client_host from rnkDOWNLOADS where id_bibrec in %s;" % tuple_string_id_bibrec_list) for res_elem in res: id_bibrec_key = res_elem[0] client_host_value = str(res_elem[1]) if id_bibrec_key in dic_news.keys(): tmp_list = dic_news[id_bibrec_key] if client_host_value not in dic_news[id_bibrec_key]: tmp_list.append(client_host_value) dic_news[id_bibrec_key] = tmp_list else : list_client_host_value = [] list_client_host_value.append(client_host_value) dic_news[id_bibrec_key] = 
list_client_host_value #compute occurence of client_host for j in dic_news.keys(): list_tuple = [] tuple_client_host = str(tuple(dic_news[j])) if len(tuple(dic_news[j])) == 1: tuple_client_host = tuple_client_host.replace(',','') res2 = run_sql("select id_bibrec,count(*) from rnkDOWNLOADS where client_host in %s and id_bibrec in %s and id_bibrec != %s group by id_bibrec;" % (tuple_client_host, tuple_string_id_bibrec_list, j)) #0.0023 par requete list_tuple.append(list(res2)) dic_result[j] = list_tuple[0] #merge new values with old dictionnary return merge_with_old_dictionnary(dic, dic_result) def merge_with_old_dictionnary(old_dic, new_dic): """For each key id_bibrec in new_dic add the old values contained in old_dic Return not ordered merged dictionnary""" union_dic = {} for (key, value) in new_dic.iteritems(): if key in old_dic.keys(): old_dic_value_dic = dict(old_dic[key]) tuple_list = [] old_dic_value_dic_keys = old_dic_value_dic.keys() for val in value: if val[0] in old_dic_value_dic_keys: tuple_list.append((val[0], val[1]+ old_dic_value_dic[val[0]])) del old_dic_value_dic[val[0]] else : tuple_list.append((val[0], val[1])) old_dic_value_dic_items = old_dic_value_dic.items() if old_dic_value_dic_items != []: tuple_list.extend(old_dic_value_dic_items) union_dic[key] = tuple_list else : union_dic[key] = value for (key, value) in old_dic.iteritems(): if key not in union_dic.keys(): union_dic[key] = value return union_dic diff --git a/modules/bibrank/lib/bibrank_downloads_indexer_tests.py b/modules/bibrank/lib/bibrank_downloads_indexer_tests.py index 95d16ba79..d29a18d94 100644 --- a/modules/bibrank/lib/bibrank_downloads_indexer_tests.py +++ b/modules/bibrank/lib/bibrank_downloads_indexer_tests.py @@ -1,53 +1,54 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. 
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## read config variables -import bibrank_downloads_indexer import unittest +from cdsware import bibrank_downloads_indexer + class TestListSetOperations(unittest.TestCase): """Test list set operations.""" def test_uniq(self): """bibrank downloads indexer - uniq function""" self.assertEqual([1, 2, 3], bibrank_downloads_indexer.uniq([1, 2, 3, 3, 3, 2])) def test_database_tuples_to_single_list(self): """bibrank downloads indexer - database tuples to list""" self.assertEqual([1, 2, 3], bibrank_downloads_indexer.database_tuples_to_single_list(((1,), (2,), (3,)))) class TestMergeDictionnaries(unittest.TestCase): """Test bibrank_downloads_indexer merge 2 dictionnaries""" def test_merge_with_old_dictionnary(self): """bibrank downloads indexer - merging with old dictionary""" self.assertEqual({1:[(2, 3)], 2:[(3, 4)], 3:[(4, 5)]}, bibrank_downloads_indexer.merge_with_old_dictionnary(\ {3:[(4, 5)]}, {1:[(2, 3)], 2:[(3, 4)]})) self.assertEqual({1:[(2, 4)], 2:[(3, 4)]}, bibrank_downloads_indexer.merge_with_old_dictionnary(\ {1:[(2, 1)]}, {1:[(2, 3)], 2:[(3, 4)]})) self.assertEqual({1:[(3, 3), (2, 3)], 2:[(3, 4)]}, bibrank_downloads_indexer.merge_with_old_dictionnary(\ {1:[(2, 3)]}, {1:[(3, 3)], 2:[(3, 4)]})) self.assertEqual({}, bibrank_downloads_indexer.merge_with_old_dictionnary({}, 
{})) def create_test_suite(): """Return test suite for the downlaods engine.""" return unittest.TestSuite((unittest.makeSuite(TestListSetOperations, 'test'), unittest.makeSuite(TestMergeDictionnaries, 'test'))) if __name__ == "__main__": unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/bibrank/lib/bibrank_downloads_similarity.py b/modules/bibrank/lib/bibrank_downloads_similarity.py index 8b9bf776f..a61b2bd99 100644 --- a/modules/bibrank/lib/bibrank_downloads_similarity.py +++ b/modules/bibrank/lib/bibrank_downloads_similarity.py @@ -1,101 +1,101 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. -from config import * -from dbquery import run_sql -from bibrank_downloads_indexer import database_tuples_to_single_list +from cdsware.config import * +from cdsware.dbquery import run_sql +from cdsware.bibrank_downloads_indexer import database_tuples_to_single_list def get_fieldvalues(recID, tag): """Return list of field values for field TAG inside record RECID. 
Copy from search_engine""" out = [] if tag == "001___": # we have asked for recID that is not stored in bibXXx tables out.append(str(recID)) else: # we are going to look inside bibXXx tables digit = tag[0:2] bx = "bib%sx" % digit bibx = "bibrec_bib%sx" % digit query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag LIKE '%s'" \ "ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx, recID, tag) res = run_sql(query) for row in res: out.append(row[0]) return out def record_exists(recID): """Return 1 if record RECID exists. Return 0 if it doesn't exist. Return -1 if it exists but is marked as deleted. Copy from search_engine""" out = 0 query = "SELECT id FROM bibrec WHERE id='%s'" % recID res = run_sql(query, None, 1) if res: # record exists; now check whether it isn't marked as deleted: dbcollids = get_fieldvalues(recID, "980__%") if ("DELETED" in dbcollids) or (cfg_cern_site and "DUMMY" in dbcollids): out = -1 # exists, but marked as deleted else: out = 1 # exists fine return out ### INTERFACE def register_page_view_event(recid, uid, client_ip_address): """Register Detailed record page view event for record RECID consulted by user UID from machine CLIENT_HOST_IP. To be called by the search engine. """ return run_sql("INSERT INTO rnkPAGEVIEWS (id_bibrec,id_user,client_host,view_time) " \ "VALUES (%s,%s,INET_ATON(%s),NOW())", (recid, uid, client_ip_address)) def calculate_reading_similarity_list(recid, type="pageviews"): """Calculate reading similarity data to use in reading similarity boxes (``people who downloaded/viewed this file/page have also downloaded/viewed''). Return list of (recid1, score1), (recid2,score2), ... for all recidN that were consulted by the same people who have also consulted RECID. The reading similarity TYPE can be either `pageviews' or `downloads', depending whether we want to obtain page view similarity or download similarity. 
""" if type == "downloads": tablename = "rnkDOWNLOADS" else: # default tablename = "rnkPAGEVIEWS" # firstly compute the set of client hosts who consulted recid: client_host_list = run_sql("SELECT DISTINCT(client_host) FROM %s WHERE id_bibrec=%s;" % \ (tablename, recid)) # secondly look up all recids that were consulted by these client hosts, # and order them by the number of different client hosts reading them: res = [] if client_host_list != (): client_host_list = str(database_tuples_to_single_list(client_host_list)) client_host_list = client_host_list.replace("L", "") client_host_list = client_host_list.replace("[", "") client_host_list = client_host_list.replace("]", "") res = run_sql("SELECT id_bibrec,COUNT(DISTINCT(client_host)) AS c FROM %s " \ "WHERE client_host IN (%s) AND id_bibrec != %s " \ "GROUP BY id_bibrec ORDER BY c DESC LIMIT 10;" % \ (tablename, client_host_list, recid)) return res diff --git a/modules/bibrank/lib/bibrank_grapher.py b/modules/bibrank/lib/bibrank_grapher.py index bf8556c37..84f4ca3a5 100644 --- a/modules/bibrank/lib/bibrank_grapher.py +++ b/modules/bibrank/lib/bibrank_grapher.py @@ -1,205 +1,204 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
__version__ = "$Id$" -## import interesting modules: - import os import sys import time import tempfile -from config import webdir -from websubmit_config import * + +from cdsware.config import webdir +from cdsware.websubmit_config import * ## test gnuplot presence: cfg_gnuplot_available = 1 try: import Gnuplot except ImportError, e: cfg_gnuplot_available = 0 def write_coordinates_in_tmp_file(lists_coordinates): """write the graph coordinates in a temporary file for reading it later by the create_temporary_image method lists_coordinates is a list of list of this form: [[(1,3),(2,4),(3,5)],[(1,5),(2,5),(3,6)] This file is organized into one or more sets of 2 columns. Each set is separated from the others by two blank lines. Each intern list represents a set and each tuple a line in the file where fist element of the tuple is the element of the first column, and second element of the tuple is the element of the second column. With gnuplot, first column is used as x coordinates, and second column as y coordinates. One set represents a curve in the graph. 
""" max_y_datas = 0 tempfile.tempdir = webdir + "img" fname = tempfile.mktemp() file_dest = open(fname, 'a') for list_elem in lists_coordinates: y_axe = [] #prepare data and store them in a file for key_value in list_elem: file_dest.write("%s %s\n"%(key_value[0], key_value[1])) y_axe.append(key_value[1]) max_tmp = 0 if y_axe: max_tmp = max(y_axe) if max_tmp > max_y_datas: max_y_datas = max_tmp file_dest.write("\n\n") file_dest.close() return [fname, max_y_datas] def create_temporary_image(recid, kind_of_graphe, data_file, x_label, y_label, origin_tuple, y_max, docid_list, graphe_titles, intervals): """From a temporary file, draw a gnuplot graph The arguments are as follows: recid - reccord ID kind_of_graph - takes one of these values : "citation" ,"download_history", "download_users" All the commons gnuplot commands for these cases, are written at the beginning After the particular commands dependaing of each case are written. data_file - Name of the temporary file which contains the gnuplot datas used to plot the graph. This file is organized into one or more sets of 2 columns. First column contains x coordinates, and second column contains y coordinates. Each set is separated from the others by two blank lines. x_label - Name of the x axe. y_label - Name of the y axe. origin_tuple - Reference coordinates for positionning the graph. y_max - Max value of y. Used to set y range. docid_list - In download_history case, docid_list is used to plot multiple curves. graphe_titles - List of graph titles. It's used to name the curve in the legend. 
intervals - x tics location and xrange specification""" if cfg_gnuplot_available == 0: return (None, None) #For different curves color_line_list = ['4', '3', '2', '9', '6'] #Gnuplot graphe object g = Gnuplot.Gnuplot() #Graphe name: file to store graph graphe_name = "tmp_%s_%s_stats.png" % (kind_of_graphe, recid) g('set terminal png small') g('set output "%s/img/%s"' % (webdir, graphe_name)) len_intervals = len(intervals) len_docid_list = len(docid_list) # Standard options g('set size 0.5,0.5') g('set origin %s,%s'% (origin_tuple[0], origin_tuple[1])) if x_label == '': g('unset xlabel') else: g.xlabel(s = x_label) if x_label == '': g('unset ylabel') else: g.ylabel(s = y_label) g('set bmargin 5') #let a place at the top of the graph g('set tmargin 1') #Will be passed to g at the end to plot the graphe plot_text = "" if kind_of_graphe == 'download_history': g('set xdata time') #Set x scale as date g('set timefmt "%m/%Y"') #Inform about format in file .dat g('set format x "%b %y"') #Format displaying if len(intervals) > 1 : g('set xrange ["%s":"%s"]' % (intervals[0], intervals[len_intervals-1])) y_offset = max(3, float(y_max)/60) g('set yrange [0:%s]' %str(y_max + y_offset)) if len_intervals > 1 and len_intervals <= 12: g('set xtics rotate %s' % str(tuple(intervals)))#to prevent duplicate tics elif len_intervals > 12 and len_intervals <= 24: g('set xtics rotate "%s", 7776000, "%s"' % (intervals[0], intervals[len_intervals-1])) #3 months intervalls else : g('set xtics rotate "%s",15552000, "%s"' % (intervals[0], intervals[len_intervals-1])) #6 months intervalls if len_docid_list <= 1: #Only one curve #g('set style fill solid 0.25') if len(intervals)<=4: plot_text = plot_command(1, data_file, (0, 0), "", "imp", color_line_list[0], 20) else: plot_text = plot_command(1, data_file, (0, 0), "", "linespoint", color_line_list[0], 1, "pt 26", "ps 0.5") elif len_docid_list > 1: #Multiple curves if len(intervals)<=4: plot_text = plot_command(1, data_file, (0, 0), 
graphe_titles[0], "imp", color_line_list[0], 20) else: plot_text = plot_command(1, data_file, (0, 0), graphe_titles[0], "linespoint", color_line_list[0], 1, "pt 26", "ps 0.5") for d in range(1, len_docid_list): if len(intervals)<=4: plot_text += plot_command(0, data_file, (d, d) , graphe_titles[d], "imp", color_line_list[d], 20) else : plot_text += plot_command(0, data_file, (d, d) , graphe_titles[d], "linespoint", color_line_list[d], 1, "pt 26", "ps 0.5") if len(intervals)>2: plot_text += plot_command(0, data_file, (len_docid_list, len_docid_list), "", "impulses", 0, 2 ) plot_text += plot_command(0, data_file, (len_docid_list, len_docid_list), "TOTAL", "lines", 0, 5) elif kind_of_graphe == 'download_users': g('set size 0.25,0.5') g('set xrange [0:4]') g('set yrange [0:100]') g('set format y "%g %%"') g("""set xtics ("" 0, "CERN\\n Users" 1, "Other\\n Users" 3, "" 4)""") g('set ytics 0,10,100') g('set boxwidth 0.7 relative') g('set style fill solid 0.25') plot_text = 'plot "%s" using 1:2 title "" with boxes lt 7 lw 2' % data_file else: #citation g('set boxwidth 0.6 relative') g('set style fill solid 0.250000 border -1') g('set xtics rotate %s'% str(tuple(intervals))) g('set xrange [%s:%s]' % (str(intervals[0]), str(intervals[len_intervals-1]))) g('set yrange [0:%s]' %str(y_max+2)) plot_text = """plot "% s" index 0:0 using 1:2 title "" w steps lt %s lw 3""" % (data_file, color_line_list[1]) g('%s' % plot_text) return (graphe_name, data_file) def remove_old_img(prefix_file_name): """Detele all the images older than 10 minutes to prevent to much storage Takes 0.0 seconds for 50 files to delete""" command = "find %s/img/ -name tmp_%s*.png -amin +10 -exec rm -f {} \;" % (webdir, prefix_file_name) return os.system(command) def plot_command(first_line, file_source, indexes, title, style, line_type, line_width, point_type="", point_size=""): """Return a string of a gnuplot plot command.Particularly useful when multiple curves From a temporary file, draw a gnuplot graph 
Return a plot command string as follows: plot datafile , datafile ,... The arguments are as follows: first_line - only the drawing command of the first curve contains the word plot file_source - data file source which containes coordinates indexes - points out set number in data file source title - title of the curve in the legend box style - respresentation of the curve ex: linespoints, lines ... line_type - color of the line line_width - width of the line point_type - optionnal parameter: if not mentionned it's a wide string. Using in the case of style = linespoints to set point style""" if first_line: plot_text = """plot "%s" index %s:%s using 1:2 title "%s" with %s lt %s lw %s %s %s""" % (file_source, indexes[0], indexes[1], title, style, line_type, line_width, point_type, point_size) else: plot_text = """, "%s" index %s:%s using 1:2 title "%s" with %s lt %s lw %s %s %s""" % (file_source, indexes[0], indexes[1], title, style, line_type, line_width, point_type, point_size) return plot_text diff --git a/modules/bibrank/lib/bibrank_record_sorter.py b/modules/bibrank/lib/bibrank_record_sorter.py index fa1fee746..978d9aed9 100644 --- a/modules/bibrank/lib/bibrank_record_sorter.py +++ b/modules/bibrank/lib/bibrank_record_sorter.py @@ -1,739 +1,740 @@ # -*- coding: utf-8 -*- ## ## $Id$ ## Ranking of records using different parameters and methods on the fly. ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. import sys import zlib import marshal import string import time import math import MySQLdb import Numeric import re import ConfigParser import traceback import copy -from config import * -from dbquery import run_sql -from bibindex_engine_stemmer import stem -from bibindex_engine_stopwords import is_stopword -from search_engine_config import cfg_max_recID -from bibrank_citation_searcher import calculate_cited_by_list +from cdsware.config import * +from cdsware.dbquery import run_sql +from cdsware.bibindex_engine_stemmer import stem +from cdsware.bibindex_engine_stopwords import is_stopword +from cdsware.search_engine_config import cfg_max_recID +from cdsware.bibrank_citation_searcher import calculate_cited_by_list + class HitSet: """Class describing set of records, implemented as bit vectors of recIDs. Using Numeric arrays for speed (1 value = 8 bits), can use later "real" bit vectors to save space.""" def __init__(self, init_set=None): self._nbhits = -1 if init_set: self._set = init_set else: self._set = Numeric.zeros(cfg_max_recID+1, Numeric.Int0) def __repr__(self, join=string.join): return "%s(%s)" % (self.__class__.__name__, join(map(repr, self._set), ', ')) def add(self, recID): "Adds a record to the set." self._set[recID] = 1 def addmany(self, recIDs): "Adds several recIDs to the set." for recID in recIDs: self._set[recID] = 1 def addlist(self, arr): "Adds an array of recIDs to the set." Numeric.put(self._set, arr, 1) def remove(self, recID): "Removes a record from the set." self._set[recID] = 0 def removemany(self, recIDs): "Removes several records from the set." for recID in recIDs: self.remove(recID) def intersect(self, other): "Does a set intersection with other. Keep result in self." 
self._set = Numeric.bitwise_and(self._set, other._set) def union(self, other): "Does a set union with other. Keep result in self." self._set = Numeric.bitwise_or(self._set, other._set) def difference(self, other): "Does a set difference with other. Keep result in self." #self._set = Numeric.bitwise_not(self._set, other._set) for recID in Numeric.nonzero(other._set): self.remove(recID) def contains(self, recID): "Checks whether the set contains recID." return self._set[recID] __contains__ = contains # Higher performance member-test for python 2.0 and above def __getitem__(self, index): "Support for the 'for item in set:' protocol." return Numeric.nonzero(self._set)[index] def calculate_nbhits(self): "Calculates the number of records set in the hitset." self._nbhits = Numeric.sum(self._set.copy().astype(Numeric.Int)) def items(self): "Return an array containing all recID." return Numeric.nonzero(self._set) def tolist(self): "Return an array containing all recID." return Numeric.nonzero(self._set).tolist() def compare_on_val(first, second): return cmp(second[1], first[1]) def serialize_via_numeric_array_dumps(arr): return Numeric.dumps(arr) def serialize_via_numeric_array_compr(str): return zlib.compress(str) def serialize_via_numeric_array_escape(str): return MySQLdb.escape_string(str) def serialize_via_numeric_array(arr): """Serialize Numeric array into a compressed string.""" return serialize_via_numeric_array_escape(serialize_via_numeric_array_compr(serialize_via_numeric_array_dumps(arr))) def deserialize_via_numeric_array(string): """Decompress and deserialize string into a Numeric array.""" return Numeric.loads(zlib.decompress(string)) def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return MySQLdb.escape_string(zlib.compress(marshal.dumps(obj))) def deserialize_via_marshal(string): """Decompress and deserialize string into a Python object via marshal.""" return marshal.loads(zlib.decompress(string)) def 
adderrorbox(header='', datalist=[]): """used to create table around main data on a page, row based""" try: perc = str(100 // len(datalist)) + '%' except ZeroDivisionError: perc = 1 output = '' output += '' % (len(datalist), header) output += '' for row in [datalist]: output += '' for data in row: output += '' output += '' output += '
%s
' % (perc, ) output += data output += '
' return output def check_term(term, col_size, term_rec, max_occ, min_occ, termlength): """Check if the term is valid for use term - the term to check col_size - the number of records in database term_rec - the number of records which contains this term max_occ - max frequency of the term allowed min_occ - min frequence of the term allowed termlength - the minimum length of the terms allowed""" try: if is_stopword(term, 1) or (len(term) <= termlength) or ((float(term_rec) / float(col_size)) >= max_occ) or ((float(term_rec) / float(col_size)) <= min_occ): return "" if int(term): return "" except StandardError, e: pass return "true" def create_rnkmethod_cache(): """Create cache with vital information for each rank method.""" global methods bibrank_meths = run_sql("SELECT name from rnkMETHOD") methods = {} global voutput voutput = "" for (rank_method_code,) in bibrank_meths: try: file = etcdir + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() config.readfp(open(file)) except StandardError, e: pass cfg_function = config.get("rank_method", "function") if config.has_section(cfg_function): methods[rank_method_code] = {} methods[rank_method_code]["function"] = cfg_function methods[rank_method_code]["prefix"] = config.get(cfg_function, "relevance_number_output_prologue") methods[rank_method_code]["postfix"] = config.get(cfg_function, "relevance_number_output_epilogue") methods[rank_method_code]["chars_alphanumericseparators"] = r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]" else: raise Exception("Error in configuration file: %s" % (etcdir + "/bibrank/" + rank_method_code + ".cfg")) i8n_names = run_sql("SELECT ln,value from rnkMETHODNAME,rnkMETHOD where id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name='%s'" % (rank_method_code)) for (ln, value) in i8n_names: methods[rank_method_code][ln] = value if config.has_option(cfg_function, "table"): methods[rank_method_code]["rnkWORD_table"] = config.get(cfg_function, "table") 
methods[rank_method_code]["col_size"] = run_sql("SELECT count(*) FROM %sR" % methods[rank_method_code]["rnkWORD_table"][:-1])[0][0] if config.has_option(cfg_function, "stemming") and config.get(cfg_function, "stemming"): try: methods[rank_method_code]["stemmer"] = config.get(cfg_function, "stemming") except Exception,e: pass if config.has_option(cfg_function, "stopword"): methods[rank_method_code]["stopwords"] = config.get(cfg_function, "stopword") if config.has_section("find_similar"): methods[rank_method_code]["max_word_occurence"] = float(config.get("find_similar", "max_word_occurence")) methods[rank_method_code]["min_word_occurence"] = float(config.get("find_similar", "min_word_occurence")) methods[rank_method_code]["min_word_length"] = int(config.get("find_similar", "min_word_length")) methods[rank_method_code]["min_nr_words_docs"] = int(config.get("find_similar", "min_nr_words_docs")) methods[rank_method_code]["max_nr_words_upper"] = int(config.get("find_similar", "max_nr_words_upper")) methods[rank_method_code]["max_nr_words_lower"] = int(config.get("find_similar", "max_nr_words_lower")) methods[rank_method_code]["default_min_relevance"] = int(config.get("find_similar", "default_min_relevance")) if config.has_section("combine_method"): i = 1 methods[rank_method_code]["combine_method"] = [] while config.has_option("combine_method", "method%s" % i): methods[rank_method_code]["combine_method"].append(string.split(config.get("combine_method", "method%s" % i), ",")) i += 1 def is_method_valid(colID, rank_method_code): """Checks if a method is valid for the collection given""" enabled_colls = dict(run_sql("SELECT id_collection, score from collection_rnkMETHOD,rnkMETHOD WHERE id_rnkMETHOD=rnkMETHOD.id AND name='%s'" % rank_method_code)) colID = int(colID) if enabled_colls.has_key(colID): return 1 else: while colID: colID = run_sql("SELECT id_dad FROM collection_collection WHERE id_son=%s" % colID) if colID and enabled_colls.has_key(colID[0][0]): return 1 elif 
colID: colID = colID[0][0] return 0 def get_bibrank_methods(collection, ln=cdslang): """Returns a list of rank methods and the name om them in the language defined by the ln parameter, if collection is given, only methods enabled for that collection is returned.""" if not globals().has_key('methods'): create_rnkmethod_cache() avail_methods = [] for (rank_method_code, options) in methods.iteritems(): if options.has_key("function") and is_method_valid(collection, rank_method_code): if options.has_key(ln): avail_methods.append((rank_method_code, options[ln])) elif options.has_key(cdslang): avail_methods.append((rank_method_code, options[cdslang])) else: avail_methods.append((rank_method_code, rank_method_code)) return avail_methods def rank_records(rank_method_code, rank_limit_relevance, hitset_global, pattern=[], verbose=0): """rank_method_code, e.g. `jif' or `sbr' (word frequency vector model) rank_limit_relevance, e.g. `23' for `nbc' (number of citations) or `0.10' for `vec' hitset, search engine hits; pattern, search engine query or record ID (you check the type) verbose, verbose level output: list of records list of rank values prefix postfix verbose_output""" global voutput voutput = "" configcreated = "" try: hitset = copy.deepcopy(hitset_global) #we are receiving a global hitset if not globals().has_key('methods'): create_rnkmethod_cache() function = methods[rank_method_code]["function"] func_object = globals().get(function) if func_object and pattern and pattern[0][0:6] == "recid:" and function == "word_similarity": result = find_similar(rank_method_code, pattern[0][6:], hitset, rank_limit_relevance, verbose) elif rank_method_code == "cit" and pattern and pattern[0][0:6] == "recid:": # FIXME: func_object and pattern and pattern[0][0:6] == "recid:" and function == "citation": result = find_citations(rank_method_code, pattern[0][6:], hitset, verbose) elif func_object: result = func_object(rank_method_code, pattern, hitset, rank_limit_relevance, verbose) else: 
result = rank_by_method(rank_method_code, pattern, hitset, rank_limit_relevance, verbose) except Exception, e: result = (None, "", adderrorbox("An error occured when trying to rank the search result", ["Unexpected error: %s
Traceback:%s" % (e, traceback.format_tb(sys.exc_info()[2]))]), voutput)
     if result[0] and result[1]: #split into two lists for search_engine
         results_similar_recIDs = map(lambda x: x[0], result[0])
         results_similar_relevances = map(lambda x: x[1], result[0])
         result = (results_similar_recIDs, results_similar_relevances, result[1], result[2], "%s" % configcreated + result[3])
     else:
         result = (None, None, result[1], result[2], result[3])
     if verbose > 0:
         print string.replace(voutput, "<br>", "\n")
     return result

 def combine_method(rank_method_code, pattern, hitset, rank_limit_relevance,verbose):
     """combining several methods into one based on methods/percentage in config file"""
     global voutput
     result = {}
     try:
         for (method, percent) in methods[rank_method_code]["combine_method"]:
             function = methods[method]["function"]
             func_object = globals().get(function)
             percent = int(percent)
             if func_object:
                 this_result = func_object(method, pattern, hitset, rank_limit_relevance, verbose)[0]
             else:
                 this_result = rank_by_method(method, pattern, hitset, rank_limit_relevance, verbose)[0]
             for i in range(0, len(this_result)):
                 (recID, value) = this_result[i]
                 if value > 0:
                     result[recID] = result.get(recID, 0) + int((float(i) / len(this_result)) * float(percent))
         result = result.items()
         result.sort(lambda x, y: cmp(x[1], y[1]))
         return (result, "(", ")", voutput)
     except Exception, e:
         return (None, "Warning: %s method cannot be used for ranking your query." % rank_method_code, "", voutput)

 def rank_by_method(rank_method_code, lwords, hitset, rank_limit_relevance,verbose):
     """Ranking of records based on predetermined values.
     input:
     rank_method_code - the code of the method, from the name field in rnkMETHOD, used to get predetermined values from rnkMETHODDATA
     lwords - a list of words from the query
     hitset - a list of hits for the query found by search_engine
     rank_limit_relevance - show only records with a rank value above this
     verbose - verbose value
     output:
     reclist - a list of sorted records, with unsorted added to the end: [[23,34], [344,24], [1,01]]
     prefix - what to show before the rank value
     postfix - what to show after the rank value
     voutput - contains extra information, content dependent on verbose value"""
     global voutput
     rnkdict = run_sql("SELECT relevance_data FROM rnkMETHODDATA,rnkMETHOD where rnkMETHOD.id=id_rnkMETHOD and rnkMETHOD.name='%s'" % rank_method_code)
     if not rnkdict:
         return (None, "Warning: Could not load ranking data for method %s." % rank_method_code, "", voutput)
     lwords_hitset = None
     for j in range(0, len(lwords)): #find which docs to search based on ranges..should be done in search_engine...
         if lwords[j] and lwords[j][:6] == "recid:":
             if not lwords_hitset:
                 lwords_hitset = HitSet()
             lword = lwords[j][6:]
             if string.find(lword, "->") > -1:
                 lword = string.split(lword, "->")
                 if int(lword[0]) >= cfg_max_recID + 1 or int(lword[1]) >= cfg_max_recID + 1:
                     return (None, "Warning: Given record IDs are out of range.", "", voutput)
                 for i in range(int(lword[0]), int(lword[1])):
                     lwords_hitset.add(int(i))
             elif lword < cfg_max_recID + 1:
                 lwords_hitset.add(int(lword))
             else:
                 return (None, "Warning: Given record IDs are out of range.", "", voutput)
     rnkdict = deserialize_via_marshal(rnkdict[0][0])
     if verbose > 0:
         voutput += "<br>Running rank method: %s, using rank_by_method function in bibrank_record_sorter<br>" % rank_method_code
         voutput += "Ranking data loaded, size of structure: %s<br>" % len(rnkdict)
     lrecIDs = hitset.items()
     if verbose > 0:
         voutput += "Number of records to rank: %s<br>" % len(lrecIDs)
     reclist = []
     reclist_addend = []
     if not lwords_hitset: #rank all docs, can this be speed up using something else than for loop?
         for recID in lrecIDs:
             if rnkdict.has_key(recID):
                 reclist.append((recID, rnkdict[recID]))
                 del rnkdict[recID]
             else:
                 reclist_addend.append((recID, 0))
     else: #rank docs in hitset, can this be speed up using something else than for loop?
         lwords_lrecIDs = lwords_hitset.items()
         for recID in lwords_lrecIDs:
             if rnkdict.has_key(recID) and hitset.contains(recID):
                 reclist.append((recID, rnkdict[recID]))
                 del rnkdict[recID]
             elif hitset.contains(recID):
                 reclist_addend.append((recID, 0))
     if verbose > 0:
         voutput += "Number of records ranked: %s<br>" % len(reclist)
         voutput += "Number of records not ranked: %s<br>" % len(reclist_addend)
     reclist.sort(lambda x, y: cmp(x[1], y[1]))
     return (reclist_addend + reclist, methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput)

 def find_citations(rank_method_code, recID, hitset, verbose):
     reclist = calculate_cited_by_list(int(recID), "a")
     if reclist:
         return (reclist,"(", ")", "Warning: citation search functionality is experimental.")
     else:
         return (reclist,"", "", "Warning: citation search functionality is experimental.")

 def find_similar(rank_method_code, recID, hitset, rank_limit_relevance,verbose):
     """Finding terms to use for calculating similarity. Terms are taken from the recid given, returns a list of recids's and relevance,
     input:
     rank_method_code - the code of the method, from the name field in rnkMETHOD
     recID - records to use for find similar
     hitset - a list of hits for the query found by search_engine
     rank_limit_relevance - show only records with a rank value above this
     verbose - verbose value
     output:
     reclist - a list of sorted records: [[23,34], [344,24], [1,01]]
     prefix - what to show before the rank value
     postfix - what to show after the rank value
     voutput - contains extra information, content dependent on verbose value"""
     startCreate = time.time()
     global voutput
     if verbose > 0:
         voutput += "<br>Running rank method: %s, using find_similar/word_frequency in bibrank_record_sorter<br>" % rank_method_code
     rank_limit_relevance = methods[rank_method_code]["default_min_relevance"]
     try:
         recID = int(recID)
     except Exception,e :
         return (None, "Warning: Error in record ID, please check that a number is given.", "", voutput)
     rec_terms = run_sql("SELECT termlist FROM %sR WHERE id_bibrec=%s" % (methods[rank_method_code]["rnkWORD_table"][:-1], recID))
     if not rec_terms:
         return (None, "Warning: Requested record does not seem to exist.", "", voutput)
     rec_terms = deserialize_via_marshal(rec_terms[0][0])
     #Get all documents using terms from the selected documents
     if len(rec_terms) == 0:
         return (None, "Warning: Record specified has no content indexed for use with this method.", "", voutput)
     else:
         terms = "%s" % rec_terms.keys()
         terms_recs = dict(run_sql("SELECT term, hitlist FROM %s WHERE term IN (%s)" % (methods[rank_method_code]["rnkWORD_table"], terms[1:len(terms) - 1])))
     tf_values = {}
     #Calculate all term frequencies
     for (term, tf) in rec_terms.iteritems():
         if len(term) >= methods[rank_method_code]["min_word_length"] and terms_recs.has_key(term) and tf[1] != 0:
             tf_values[term] = int((1 + math.log(tf[0])) * tf[1]) #calculate term weigth
     tf_values = tf_values.items()
     tf_values.sort(lambda x, y: cmp(y[1], x[1])) #sort based on weigth
     lwords = []
     stime = time.time()
     (recdict, rec_termcount) = ({}, {})
     for (t, tf) in tf_values: #t=term, tf=term frequency
         term_recs = deserialize_via_marshal(terms_recs[t])
         if len(tf_values) <= methods[rank_method_code]["max_nr_words_lower"] or (len(term_recs) >= methods[rank_method_code]["min_nr_words_docs"] and (((float(len(term_recs)) / float(methods[rank_method_code]["col_size"])) <= methods[rank_method_code]["max_word_occurence"]) and ((float(len(term_recs)) / float(methods[rank_method_code]["col_size"])) >= methods[rank_method_code]["min_word_occurence"]))): #too complicated...something must be done
             lwords.append((t, methods[rank_method_code]["rnkWORD_table"])) #list of terms used
             (recdict, rec_termcount) = calculate_record_relevance_findsimilar((t, round(tf, 4)) , term_recs, hitset, recdict, rec_termcount, verbose, "true") #true tells the function to not calculate all unimportant terms
         if len(tf_values) > methods[rank_method_code]["max_nr_words_lower"] and (len(lwords) == methods[rank_method_code]["max_nr_words_upper"] or tf < 0):
             break
     if len(recdict) == 0 or len(lwords) == 0:
         return (None, "Could not find any similar documents, possibly because of error in ranking data.", "", voutput)
     else: #sort if we got something to sort
         (reclist, hitset) = sort_record_relevance_findsimilar(recdict, rec_termcount, hitset, rank_limit_relevance, verbose)
     if verbose > 0:
         voutput += "<br>Number of terms: %s<br>" % run_sql("SELECT count(id) FROM %s" % methods[rank_method_code]["rnkWORD_table"])[0][0]
         voutput += "Number of terms to use for query: %s<br>" % len(lwords)
         voutput += "Terms: %s<br>" % lwords
         voutput += "Current number of recIDs: %s<br>" % (methods[rank_method_code]["col_size"])
         voutput += "Prepare time: %s<br>" % (str(time.time() - startCreate))
         voutput += "Total time used: %s<br>" % (str(time.time() - startCreate))
         rank_method_stat(rank_method_code, reclist, lwords)
     return (reclist[:len(reclist)], methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput)

 def word_similarity(rank_method_code, lwords, hitset, rank_limit_relevance,verbose):
     """Ranking a records containing specified words and returns a sorted list.
     input:
     rank_method_code - the code of the method, from the name field in rnkMETHOD
     lwords - a list of words from the query
     hitset - a list of hits for the query found by search_engine
     rank_limit_relevance - show only records with a rank value above this
     verbose - verbose value
     output:
     reclist - a list of sorted records: [[23,34], [344,24], [1,01]]
     prefix - what to show before the rank value
     postfix - what to show after the rank value
     voutput - contains extra information, content dependent on verbose value"""
     global voutput
     startCreate = time.time()
     if verbose > 0:
         voutput += "<br>Running rank method: %s, using word_frequency function in bibrank_record_sorter<br>" % rank_method_code
     lwords_old = lwords
     lwords = []
     #Check terms, remove non alphanumeric characters. Use both unstemmed and stemmed version of all terms.
     for i in range(0, len(lwords_old)):
         term = string.lower(lwords_old[i])
         if not methods[rank_method_code]["stopwords"] == "True" or methods[rank_method_code]["stopwords"] and not is_stopword(term, 1):
             lwords.append((term, methods[rank_method_code]["rnkWORD_table"]))
             terms = string.split(string.lower(re.sub(methods[rank_method_code]["chars_alphanumericseparators"], ' ', term)))
             for term in terms:
                 if methods[rank_method_code].has_key("stemmer"): # stem word
                     term = stem(string.replace(term, ' ', ''), methods[rank_method_code]["stemmer"])
                 if lwords_old[i] != term: #add if stemmed word is different than original word
                     lwords.append((term, methods[rank_method_code]["rnkWORD_table"]))
     (recdict, rec_termcount, lrecIDs_remove) = ({}, {}, {})
     #For each term, if accepted, get a list of the records using the term
     #calculate then relevance for each term before sorting the list of records
     for (term, table) in lwords:
         term_recs = run_sql("SELECT term, hitlist FROM %s WHERE term='%s'" % (methods[rank_method_code]["rnkWORD_table"], MySQLdb.escape_string(term)))
         if term_recs: #if term exists in database, use for ranking
             term_recs = deserialize_via_marshal(term_recs[0][1])
             (recdict, rec_termcount) = calculate_record_relevance((term, int(term_recs["Gi"][1])) , term_recs, hitset, recdict, rec_termcount, verbose, quick=None)
             del term_recs
     if len(recdict) == 0 or (len(lwords) == 1 and lwords[0] == ""):
         return (None, "Records not ranked. The query is not detailed enough, or not enough records found, for ranking to be possible.", "", voutput)
     else: #sort if we got something to sort
         (reclist, hitset) = sort_record_relevance(recdict, rec_termcount, hitset, rank_limit_relevance, verbose)
     #Add any documents not ranked to the end of the list
     if hitset:
         hitset.calculate_nbhits()
         lrecIDs = hitset.tolist() #using 2-3mb
         reclist = zip(lrecIDs, [0] * len(lrecIDs)) + reclist #using 6mb
     if verbose > 0:
         voutput += "<br>Current number of recIDs: %s<br>" % (methods[rank_method_code]["col_size"])
         voutput += "Number of terms: %s<br>" % run_sql("SELECT count(id) FROM %s" % methods[rank_method_code]["rnkWORD_table"])[0][0]
         voutput += "Terms: %s<br>" % lwords
         voutput += "Prepare and pre calculate time: %s<br>" % (str(time.time() - startCreate))
         voutput += "Total time used: %s<br>" % (str(time.time() - startCreate))
         rank_method_stat(rank_method_code, reclist, lwords)
     return (reclist, methods[rank_method_code]["prefix"], methods[rank_method_code]["postfix"], voutput)

 def calculate_record_relevance(term, invidx, hitset, recdict, rec_termcount, verbose, quick=None):
     """Calculating the relevance of the documents based on the input, calculates only one word
     term - (term, query term factor) the term and its importance in the overall search
     invidx - {recid: tf, Gi: norm value} The Gi value is used as a idf value
     hitset - a hitset with records that are allowed to be ranked
     recdict - contains currently ranked records, is returned with new values
     rec_termcount - {recid: count} the number of terms in this record that matches the query
     verbose - verbose value
     quick - if quick=yes only terms with a positive qtf is used, to limit the number of records to sort"""
     (t, qtf) = term
     if invidx.has_key("Gi"):#Gi = weigth for this term, created by bibrank_word_indexer
         Gi = invidx["Gi"][1]
         del invidx["Gi"]
     else: #if not existing, bibrank should be run with -R
         return (recdict, rec_termcount)
     if not quick or (qtf >= 0 or (qtf < 0 and len(recdict) == 0)):
         #Only accept records existing in the hitset received from the search engine
         for (j, tf) in invidx.iteritems():
             if hitset.contains(j):#only include docs found by search_engine based on query
                 try: #calculates rank value
                     recdict[j] = recdict.get(j, 0) + int(math.log(tf[0] * Gi * tf[1] * qtf))
                 except:
                     return (recdict, rec_termcount)
                 rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
     elif quick: #much used term, do not include all records, only use already existing ones
         for (j, tf) in recdict.iteritems(): #i.e: if doc contains important term, also count unimportant
             if invidx.has_key(j):
                 tf = invidx[j]
                 recdict[j] = recdict.get(j, 0) + int(math.log(tf[0] * Gi * tf[1] * qtf))
                 rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
     return (recdict, rec_termcount)

 def calculate_record_relevance_findsimilar(term, invidx, hitset, recdict, rec_termcount, verbose, quick=None):
     """Calculating the relevance of the documents based on the input, calculates only one word
     term - (term, query term factor) the term and its importance in the overall search
     invidx - {recid: tf, Gi: norm value} The Gi value is used as a idf value
     hitset - a hitset with records that are allowed to be ranked
     recdict - contains currently ranked records, is returned with new values
     rec_termcount - {recid: count} the number of terms in this record that matches the query
     verbose - verbose value
     quick - if quick=yes only terms with a positive qtf is used, to limit the number of records to sort"""
     (t, qtf) = term
     if invidx.has_key("Gi"): #Gi = weigth for this term, created by bibrank_word_indexer
         Gi = invidx["Gi"][1]
         del invidx["Gi"]
     else: #if not existing, bibrank should be run with -R
         return (recdict, rec_termcount)
     if not quick or (qtf >= 0 or (qtf < 0 and len(recdict) == 0)):
         #Only accept records existing in the hitset received from the search engine
         for (j, tf) in invidx.iteritems():
             if hitset.contains(j): #only include docs found by search_engine based on query
                 #calculate rank value
                 recdict[j] = recdict.get(j, 0) + int((1 + math.log(tf[0])) * Gi * tf[1] * qtf)
                 rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
     elif quick: #much used term, do not include all records, only use already existing ones
         for (j, tf) in recdict.iteritems(): #i.e: if doc contains important term, also count unimportant
             if invidx.has_key(j):
                 tf = invidx[j]
                 recdict[j] = recdict[j] + int((1 + math.log(tf[0])) * Gi * tf[1] * qtf)
                 rec_termcount[j] = rec_termcount.get(j, 0) + 1 #number of terms from query in document
     return (recdict, rec_termcount)

 def sort_record_relevance(recdict, rec_termcount, hitset, rank_limit_relevance, verbose):
     """Sorts the dictionary and returns records with a relevance higher than the given value.
     recdict - {recid: value} unsorted
     rank_limit_relevance - a value > 0 usually
     verbose - verbose value"""
     startCreate = time.time()
     global voutput
     reclist = []
     #remove all ranked documents so that unranked can be added to the end
     hitset.removemany(recdict.keys())
     #gives each record a score between 0-100
     divideby = max(recdict.values())
     for (j, w) in recdict.iteritems():
         w = int(w * 100 / divideby)
         if w >= rank_limit_relevance:
             reclist.append((j, w))
     #sort scores
     reclist.sort(lambda x, y: cmp(x[1], y[1]))
     if verbose > 0:
         voutput += "Number of records sorted: %s<br>" % len(reclist)
         voutput += "Sort time: %s<br>" % (str(time.time() - startCreate))
     return (reclist, hitset)

 def sort_record_relevance_findsimilar(recdict, rec_termcount, hitset, rank_limit_relevance, verbose):
     """Sorts the dictionary and returns records with a relevance higher than the given value.
     recdict - {recid: value} unsorted
     rank_limit_relevance - a value > 0 usually
     verbose - verbose value"""
     startCreate = time.time()
     global voutput
     reclist = []
     #Multiply with the number of terms of the total number of terms in the query existing in the records
     for j in recdict.keys():
         if recdict[j] > 0 and rec_termcount[j] > 1:
             recdict[j] = math.log((recdict[j] * rec_termcount[j]))
         else:
             recdict[j] = 0
     hitset.removemany(recdict.keys())
     #gives each record a score between 0-100
     divideby = max(recdict.values())
     for (j, w) in recdict.iteritems():
         w = int(w * 100 / divideby)
         if w >= rank_limit_relevance:
             reclist.append((j, w))
     #sort scores
     reclist.sort(lambda x, y: cmp(x[1], y[1]))
     if verbose > 0:
         voutput += "Number of records sorted: %s<br>" % len(reclist)
         voutput += "Sort time: %s<br>" % (str(time.time() - startCreate))
     return (reclist, hitset)

 def rank_method_stat(rank_method_code, reclist, lwords):
     """Shows some statistics about the searchresult.
     rank_method_code - name field from rnkMETHOD
     reclist - a list of sorted and ranked records
     lwords - the words in the query"""
     global voutput
     if len(reclist) > 20:
         j = 20
     else:
         j = len(reclist)
     voutput += "<br>Rank statistics:<br>"
     for i in range(1, j + 1):
         voutput += "%s,Recid:%s,Score:%s<br>" % (i,reclist[len(reclist) - i][0],reclist[len(reclist) - i][1])
         for (term, table) in lwords:
             term_recs = run_sql("SELECT hitlist FROM %s WHERE term='%s'" % (table, term))
             if term_recs:
                 term_recs = deserialize_via_marshal(term_recs[0][0])
                 if term_recs.has_key(reclist[len(reclist) - i][0]):
                     voutput += "%s-%s / " % (term, term_recs[reclist[len(reclist) - i][0]])
         voutput += "<br>"
     voutput += "<br>Score variation:<br>"
     count = {}
     for i in range(0, len(reclist)):
         count[reclist[i][1]] = count.get(reclist[i][1], 0) + 1
     i = 100
     while i >= 0:
         if count.has_key(i):
             voutput += "%s-%s<br>" % (i, count[i])
         i -= 1

 try:
     import psyco
     psyco.bind(find_similar)
     psyco.bind(rank_by_method)
     psyco.bind(calculate_record_relevance)
     psyco.bind(post_calculate_record_relevance)
     psyco.bind(word_similarity)
     psyco.bind(sort_record_relevance)
     psyco.bind(serialize_via_numeric_array)
     psyco.bind(serialize_via_marshal)
     psyco.bind(deserialize_via_numeric_array)
     psyco.bind(deserialize_via_marshal)
 except StandardError, e:
     pass
diff --git a/modules/bibrank/lib/bibrank_record_sorter_tests.py b/modules/bibrank/lib/bibrank_record_sorter_tests.py
index 43e10871e..62e684460 100644
--- a/modules/bibrank/lib/bibrank_record_sorter_tests.py
+++ b/modules/bibrank/lib/bibrank_record_sorter_tests.py
@@ -1,55 +1,56 @@
 # -*- coding: utf-8 -*-
 ##
 ## $Id$
 ##
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 """Unit tests for the ranking engine."""

 __version__ = "$Id$"

-import bibrank_record_sorter
 import unittest
-from search_engine import HitSet
+
+from cdsware import bibrank_record_sorter
+from cdsware.search_engine import HitSet

 class TestListSetOperations(unittest.TestCase):
     """Test list set operations."""

     def test_record_sorter(self):
         """bibrank record sorter - sorting records"""
         hitset = HitSet()
         hitset.addlist((1,2,5))
         hitset2 = HitSet()
         hitset2.add(5)
         rec_termcount = {1: 1, 2: 1, 5: 1}
         (res1, res2) = bibrank_record_sorter.sort_record_relevance({1: 50, 2:30, 3:70,4:10},rec_termcount,hitset, 50,0)
         self.assertEqual(([(1, 71), (3, 100)], hitset2.tolist()), (res1, res2.tolist()))

     def test_calculate_record_relevance(self):
         """bibrank record sorter - calculating relevances"""
         hitset = HitSet()
         hitset.addlist((1,2,5))
         self.assertEqual(({1: 7, 2: 7, 5: 5}, {1: 1, 2: 1, 5: 1}), bibrank_record_sorter.calculate_record_relevance(("testterm", 2.0), {"Gi":(0, 50.0), 1: (3, 4.0), 2: (4, 5.0), 5: (1, 3.5)}, hitset, {}, {}, 0, None))

 def create_test_suite():
     """Return test suite for the indexing engine."""
     return unittest.TestSuite((unittest.makeSuite(TestListSetOperations,'test'),))

 if __name__ == "__main__":
     unittest.TextTestRunner(verbosity=2).run(create_test_suite())
diff --git a/modules/bibrank/lib/bibrank_tag_based_indexer.py b/modules/bibrank/lib/bibrank_tag_based_indexer.py
index 63f27bd6e..32c05aa95 100644
--- a/modules/bibrank/lib/bibrank_tag_based_indexer.py
+++ b/modules/bibrank/lib/bibrank_tag_based_indexer.py
@@ -1,595 +1,595 @@
 # -*- coding: utf-8 -*-
 ## $Id$
 ## Ranking of records using different parameters and methods.
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

 __version__ = "$Id$"

 from marshal import loads,dumps
 from zlib import compress,decompress
 from string import split,translate,lower,upper
 import getopt
 import getpass
 import string
 import os
 import sre
 import sys
 import time
 import MySQLdb
 import Numeric
 import urllib
 import signal
 import tempfile
 import unicodedata
 import traceback
 import cStringIO
 import re
 import copy
 import types
 import ConfigParser

-from config import *
-from search_engine_config import cfg_max_recID
-from search_engine import perform_request_search, strip_accents
-from search_engine import HitSet, get_index_id, create_basic_search_units
-from bibrank_citation_indexer import get_citation_weight
-from bibrank_downloads_indexer import *
-from dbquery import run_sql
+from cdsware.config import *
+from cdsware.search_engine_config import cfg_max_recID
+from cdsware.search_engine import perform_request_search, strip_accents
+from cdsware.search_engine import HitSet, get_index_id, create_basic_search_units
+from cdsware.bibrank_citation_indexer import get_citation_weight
+from cdsware.bibrank_downloads_indexer import *
+from cdsware.dbquery import run_sql

 options = {}

 def citation_exec(rank_method_code, name, config):
     """Creating the rank method data for citation"""
     dict = get_citation_weight(rank_method_code, config)
     date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     if dict:
         intoDB(dict, date, rank_method_code)
     else:
         print "no need to update the indexes for citations"

 def single_tag_rank_method_exec(rank_method_code, name, config):
     """Creating the rank method data"""
     startCreate = time.time()
     rnkset = {}
     rnkset_old = fromDB(rank_method_code)
     date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     rnkset_new = single_tag_rank(config)
     rnkset = union_dicts(rnkset_old, rnkset_new)
     intoDB(rnkset, date, rank_method_code)

 def download_weight_filtering_user(row,run):
     return bibrank_engine(row,run)

 def download_weight_total(row,run):
     return bibrank_engine(row,run)

 def file_similarity_by_times_downloaded(row,run):
     return bibrank_engine(row,run)

 def download_weight_filtering_user_exec (rank_method_code, name, config):
     """Ranking by number of downloads per User.
     Only one full Text Download is taken in account for one specific userIP address"""
     time1 = time.time()
     dic = fromDB(rank_method_code)
     last_updated = get_lastupdated(rank_method_code)
     keys = new_downloads_to_index(last_updated)
     filter_downloads_per_hour(keys,last_updated)
     dic = get_download_weight_filtering_user(dic, keys)
     date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     intoDB(dic, date, rank_method_code)
     time2 = time.time()
     return {"time":time2-time1}

 def download_weight_total_exec(rank_method_code, name, config):
     """rankink by total number of downloads without check the user ip
     if users downloads 3 time the same full text document it has to be count as 3 downloads"""
     time1 = time.time()
     dic = fromDB(rank_method_code)
     last_updated = get_lastupdated(rank_method_code)
     keys = new_downloads_to_index(last_updated)
     filter_downloads_per_hour(keys,last_updated)
     dic = get_download_weight_total(dic, keys)
     date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     intoDB(dic, date, rank_method_code)
     time2 = time.time()
     return {"time":time2-time1}

 def file_similarity_by_times_downloaded_exec(rank_method_code, name, config):
     """update dictionnary {recid:[(recid,nb page similarity),()..]}"""
     time1 = time.time()
     dic = fromDB(rank_method_code)
     last_updated = get_lastupdated(rank_method_code)
     keys = new_downloads_to_index(last_updated)
     filter_downloads_per_hour(keys,last_updated)
     dic = get_file_similarity_by_times_downloaded(dic, keys)
     date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     intoDB(dic, date, rank_method_code)
     time2 = time.time()
     return {"time":time2-time1}

 def single_tag_rank_method_exec(rank_method_code, name, config):
     """Creating the rank method data"""
     startCreate = time.time()
     rnkset = {}
     rnkset_old = fromDB(rank_method_code)
     date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     rnkset_new = single_tag_rank(config)
     rnkset = union_dicts(rnkset_old, rnkset_new)
     intoDB(rnkset, date, rank_method_code)

 def single_tag_rank(config):
     """Connect the given tag with the data from the kb file given"""
     if options["verbose"] >= 9:
         write_message("Loading knowledgebase file")
     kb_data = {}
     records = []
     write_message("Reading knowledgebase file: %s" % config.get(config.get("rank_method", "function"), "kb_src"))
     input = open(config.get(config.get("rank_method", "function"), "kb_src"), 'r')
     data = input.readlines()
     for line in data:
         if not line[0:1] == "#":
             kb_data[string.strip((string.split(string.strip(line),"---"))[0])] = (string.split(string.strip(line), "---"))[1]
     write_message("Number of lines read from knowledgebase file: %s" % len(kb_data))
     tag = config.get(config.get("rank_method", "function"),"tag")
     tags = split(config.get(config.get("rank_method", "function"), "check_mandatory_tags"),",")
     if tags == ['']:
         tags = ""
     records = []
     for (recids,recide) in options["recid_range"]:
         write_message("......Processing records #%s-%s" % (recids, recide))
         recs = run_sql("SELECT id_bibrec,value FROM bib%sx,bibrec_bib%sx WHERE tag='%s' AND id_bibxxx=id and id_bibrec >=%s and id_bibrec<=%s" % (tag[0:2], tag[0:2], tag, recids, recide))
         valid = HitSet(Numeric.ones(cfg_max_recID + 1))
         for key in tags:
             newset = HitSet()
             newset.addlist(run_sql("SELECT id_bibrec FROM bib%sx,bibrec_bib%sx WHERE id_bibxxx=id AND tag='%s' AND id_bibxxx=id and id_bibrec >=%s and id_bibrec<=%s" % (tag[0:2], tag[0:2], key, recids, recide)))
             valid.intersect(newset)
         if tags:
             recs = filter(lambda x: valid.contains(x[0]), recs)
         records = records + list(recs)
         write_message("Number of records found with the necessary tags: %s" % len(records))
     records = filter(lambda x: options["validset"].contains(x[0]), records)
     rnkset = {}
     for key,value in records:
         if kb_data.has_key(value):
             if not rnkset.has_key(key):
                 rnkset[key] = float(kb_data[value])
             else:
                 if kb_data.has_key(rnkset[key]) and float(kb_data[value]) > float((rnkset[key])[1]):
                     rnkset[key] = float(kb_data[value])
         else:
             rnkset[key] = 0
     write_message("Number of records available in rank method: %s" % len(rnkset))
     return rnkset

 def get_lastupdated(rank_method_code):
     """Get the last time the rank method was updated"""
     res = run_sql("SELECT rnkMETHOD.last_updated FROM rnkMETHOD WHERE name='%s'" % rank_method_code)
     if res:
         return res[0][0]
     else:
         raise Exception("Is this the first run? Please do a complete update.")

 def intoDB(dict, date, rank_method_code):
     """Insert the rank method data into the database"""
     id = run_sql("SELECT id from rnkMETHOD where name='%s'" % rank_method_code)
     del_rank_method_codeDATA(rank_method_code)
     run_sql("INSERT INTO rnkMETHODDATA(id_rnkMETHOD, relevance_data) VALUES ('%s','%s')" % (id[0][0], serialize_via_marshal(dict)))
     run_sql("UPDATE rnkMETHOD SET last_updated='%s' WHERE name='%s'" % (date, rank_method_code))

 def fromDB(rank_method_code):
     """Get the data for a rank method"""
     id = run_sql("SELECT id from rnkMETHOD where name='%s'" % rank_method_code)
     res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s" % id[0][0])
     if res:
         return deserialize_via_marshal(res[0][0])
     else:
         return {}

 def del_rank_method_codeDATA(rank_method_code):
     """Delete the data for a rank method"""
     id = run_sql("SELECT id from rnkMETHOD where name='%s'" % rank_method_code)
     res = run_sql("DELETE FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s" % id[0][0])

 def del_recids(rank_method_code, range):
     """Delete some records from the rank method"""
     id = run_sql("SELECT id from rnkMETHOD where name='%s'" % rank_method_code)
     res = run_sql("SELECT relevance_data FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s" % id[0][0])
     if res:
         rec_dict = deserialize_via_marshal(res[0][0])
         write_message("Old size: %s" % len(rec_dict))
         for (recids,recide) in range:
             for i in range(int(recids), int(recide)):
                 if rec_dict.has_key(i):
                     del rec_dict[i]
         write_messag("New size: %s" % len(rec_dict))
         date = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
         intoDB(rec_dict, date, rank_method_code)
     else:
         print "Create before deleting!"

 def union_dicts(dict1, dict2):
     "Returns union of the two dicts."
     union_dict = {}
     for (key, value) in dict1.iteritems():
         union_dict[key] = value
     for (key, value) in dict2.iteritems():
         union_dict[key] = value
     return union_dict

 def rank_method_code_statistics(rank_method_code):
     """Print statistics"""
     method = fromDB(rank_method_code)
     max = ('',-999999)
     maxcount = 0
     min = ('',999999)
     mincount = 0
     for (recID, value) in method.iteritems():
         if value < min and value > 0:
             min = value
         if value > max:
             max = value
     for (recID, value) in method.iteritems():
         if value == min:
             mincount += 1
         if value == max:
             maxcount += 1
     write_message("Showing statistic for selected method")
     write_message("Method name: %s" % getName(rank_method_code))
     write_message("Short name: %s" % rank_method_code)
     write_message("Last run: %s" % get_lastupdated(rank_method_code))
     write_message("Number of records: %s" % len(method))
     write_message("Lowest value: %s - Number of records: %s" % (min, mincount))
     write_message("Highest value: %s - Number of records: %s" % (max, maxcount))
     write_message("Divided into 10 sets:")
     for i in range(1,11):
         setcount = 0
         distinct_values = {}
         lower = -1.0 + ((float(max + 1) / 10)) * (i - 1)
         upper = -1.0 + ((float(max + 1) / 10)) * i
         for (recID, value) in method.iteritems():
             if value >= lower and value <= upper:
                 setcount += 1
                 distinct_values[value] = 1
         write_message("Set %s (%s-%s) %s Distinct values: %s" % (i, lower, upper, len(distinct_values), setcount))

 def check_method(rank_method_code):
     write_message("Checking rank method...")
     if len(fromDB(rank_method_code)) == 0:
         write_message("Rank method not yet executed, please run it to create the necessary data.")
     else:
         if len(add_recIDs_by_date(rank_method_code)) > 0:
             write_message("Records modified, update recommended")
         else:
             write_message("No records modified, update not necessary")

 def write_message(msg, stream = sys.stdout):
     """Write message and flush output stream (may be sys.stdout or sys.stderr).
     Useful for debugging stuff."""
     if stream == sys.stdout or stream == sys.stderr:
         stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime()))
         stream.write("%s\n" % msg)
         stream.flush()
     else:
         sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream)
     return

 def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"):
     """Returns a date string according to the format string.
     It can handle normal date strings and shifts with respect to now."""
     date = time.time()
     shift_re = sre.compile("([-\+]{0,1})([\d]+)([dhms])")
     factors = {"d":24*3600, "h":3600, "m":60, "s":1}
     m = shift_re.match(var)
     if m:
         sign = m.groups()[0] == "-" and -1 or 1
         factor = factors[m.groups()[2]]
         value = float(m.groups()[1])
         date = time.localtime(date + sign * factor * value)
         date = time.strftime(format_string, date)
     else:
         date = time.strptime(var, format_string)
         date = time.strftime(format_string, date)
     return date

 def task_sig_sleep(sig, frame):
     """Signal handler for the 'sleep' signal sent by BibSched."""
     if options["verbose"]>= 9:
         write_message("got signal %d" % sig)
     write_message("sleeping...")
     task_update_status("SLEEPING")
     signal.pause() # wait for wake-up signal

 def task_sig_wakeup(sig, frame):
     """Signal handler for the 'wakeup' signal sent by BibSched."""
     if options["verbose"]>= 9:
         write_message("got signal %d" % sig)
     write_message("continuing...")
     task_update_status("CONTINUING")

 def task_sig_stop(sig, frame):
     """Signal handler for the 'stop' signal sent by BibSched."""
     if options["verbose"]>= 9:
         write_message("got signal %d" % sig)
     write_message("stopping...")
     task_update_status("STOPPING")
     errcode = 0
     try:
         task_sig_stop_commands()
         write_message("stopped")
         task_update_status("STOPPED")
     except StandardError, err:
         write_message("Error during stopping! %e" % err)
         task_update_status("STOPPINGFAILED")
         errcode = 1
     sys.exit(errcode)

 def task_sig_stop_commands():
     """Do all the commands necessary to stop the task before quitting.
     Useful for task_sig_stop() handler.
     """
     write_message("stopping commands started")
     write_message("stopping commands ended")

 def task_sig_suicide(sig, frame):
     """Signal handler for the 'suicide' signal sent by BibSched."""
     if options["verbose"]>= 9:
         write_message("got signal %d" % sig)
     write_message("suiciding myself now...")
     task_update_status("SUICIDING")
     write_message("suicided")
     task_update_status("SUICIDED")
     sys.exit(0)

 def task_sig_unknown(sig, frame):
     """Signal handler for the other unknown signals sent by shell or user."""
     if options["verbose"]>= 9:
         write_message("got signal %d" % sig)
     write_message("unknown signal %d ignored" % sig) # do nothing for other signals

 def task_update_progress(msg):
     """Updates progress information in the BibSched task table."""
     query = "UPDATE schTASK SET progress='%s' where id=%d" % (MySQLdb.escape_string(msg), task_id)
     if options["verbose"]>= 9:
         write_message(query)
     run_sql(query)
     return

 def task_update_status(val):
     """Updates state information in the BibSched task table."""
     query = "UPDATE schTASK SET status='%s' where id=%d" % (MySQLdb.escape_string(val), task_id)
     if options["verbose"]>= 9:
         write_message(query)
     run_sql(query)
     return

 def split_ranges(parse_string):
     recIDs = []
     ranges = string.split(parse_string, ",")
     for range in ranges:
         tmp_recIDs = string.split(range, "-")
         if len(tmp_recIDs)==1:
             recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])])
         else:
             if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check
                 tmp = tmp_recIDs[0]
                 tmp_recIDs[0] = tmp_recIDs[1]
                 tmp_recIDs[1] = tmp
             recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])])
     return recIDs

 def bibrank_engine(row, run):
     """Run the indexing task. The row argument is the BibSched task
     queue row, containing if, arguments, etc.
     Return 1 in case of success and 0 in case of failure.
     """
     try:
         import psyco
         psyco.bind(single_tag_rank)
         psyco.bind(single_tag_rank_method_exec)
         psyco.bind(serialize_via_numeric_array)
         psyco.bind(deserialize_via_numeric_array)
     except StandardError, e:
         print "Psyco ERROR",e
     startCreate = time.time()
     global options, task_id
     task_id = row[0]
     task_proc = row[1]
     options = loads(row[6])
     task_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
     signal.signal(signal.SIGUSR1, task_sig_sleep)
     signal.signal(signal.SIGTERM, task_sig_stop)
     signal.signal(signal.SIGABRT, task_sig_suicide)
     signal.signal(signal.SIGCONT, task_sig_wakeup)
     signal.signal(signal.SIGINT, task_sig_unknown)
     sets = {}
     try:
         options["run"] = []
         options["run"].append(run)
         for rank_method_code in options["run"]:
             cfg_name = getName(rank_method_code)
             if options["verbose"] >= 0:
                 write_message("Running rank method: %s." % cfg_name)
             file = etcdir + "/bibrank/" + rank_method_code + ".cfg"
             config = ConfigParser.ConfigParser()
             try:
                 config.readfp(open(file))
             except StandardError, e:
                 write_message("Cannot find configurationfile: %s" % file, sys.stderr)
                 raise StandardError
             cfg_short = rank_method_code
             cfg_function = config.get("rank_method", "function") + "_exec"
             cfg_name = getName(cfg_short)
             options["validset"] = get_valid_range(rank_method_code)
             if options["collection"]:
                 l_of_colls = string.split(options["collection"], ",")
                 recIDs = perform_request_search(c=l_of_colls)
                 recIDs_range = []
                 for recID in recIDs:
                     recIDs_range.append([recID,recID])
                 options["recid_range"] = recIDs_range
             elif options["id"]:
                 options["recid_range"] = options["id"]
             elif options["modified"]:
                 options["recid_range"] = add_recIDs_by_date(rank_method_code, options["modified"])
             elif options["last_updated"]:
                 options["recid_range"] = add_recIDs_by_date(rank_method_code)
             else:
                 if options["verbose"] > 1:
                     write_message("No records specified, updating all")
                 min_id = run_sql("SELECT min(id) from bibrec")[0][0]
                 max_id = run_sql("SELECT max(id) from bibrec")[0][0]
                 options["recid_range"] = [[min_id,
max_id]] if options["quick"] == "no" and options["verbose"] >= 9: write_message("Recalculate parameter not used, parameter ignored.") if options["cmd"] == "del": del_recids(cfg_short, options["recid_range"]) elif options["cmd"] == "add": func_object = globals().get(cfg_function) func_object(rank_method_code, cfg_name, config) elif options["cmd"] == "stat": rank_method_code_statistics(rank_method_code) elif options["cmd"] == "check": check_method(rank_method_code) elif options["cmd"] == "repair": pass else: write_message("Invalid command found processing %s" % rank_method_code, sys.stderr) raise StandardError except StandardError, e: write_message("\nException caught: %s" % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) raise StandardError if options["verbose"]: showtime((time.time() - startCreate)) return 1 def get_valid_range(rank_method_code): """Return a range of records""" if options["verbose"] >=9: write_message("Getting records from collections enabled for rank method.") res = run_sql("SELECT collection.name FROM collection,collection_rnkMETHOD,rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name='%s'" % rank_method_code) l_of_colls = [] for coll in res: l_of_colls.append(coll[0]) if len(l_of_colls) > 0: recIDs = perform_request_search(c=l_of_colls) else: recIDs = [] valid = HitSet() valid.addlist(recIDs) return valid def add_recIDs_by_date(rank_method_code, dates=""): """Return recID range from records modified between DATES[0] and DATES[1]. If DATES is not set, then add records modified since the last run of the ranking method RANK_METHOD_CODE. 
""" if not dates: try: dates = (get_lastupdated(rank_method_code),'') except Exception, e: dates = ("0000-00-00 00:00:00", '') query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= '%s'""" % dates[0] if dates[1]: query += "and b.modification_date <= '%s'" % dates[1] query += "ORDER BY b.id ASC""" res = run_sql(query) list = create_range_list(res) if not list: if options["verbose"]: write_message("No new records added since last time method was run") return list def getName(rank_method_code, ln=cdslang, type='ln'): """Returns the name of the method if it exists""" try: rnkid = run_sql("SELECT id FROM rnkMETHOD where name='%s'" % rank_method_code) if rnkid: rnkid = str(rnkid[0][0]) res = run_sql("SELECT value FROM rnkMETHODNAME where type='%s' and ln='%s' and id_rnkMETHOD=%s" % (type, ln, rnkid)) if not res: res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln='%s' and id_rnkMETHOD=%s and type='%s'" % (cdslang, rnkid, type)) if not res: return rank_method_code return res[0][0] else: raise Exception except Exception, e: write_message("Cannot run rank method, either given code for method is wrong, or it has not been added using the webinterface.") raise Exception def create_range_list(res): """Creates a range list from a recID select query result contained in res. 
The result is expected to have ascending numerical order.""" if not res: return [] row = res[0] if not row: return [] else: range_list = [[row[0],row[0]]] for row in res[1:]: id = row[0] if id == range_list[-1][1] + 1: range_list[-1][1] = id else: range_list.append([id,id]) return range_list def single_tag_rank_method(row, run): return bibrank_engine(row, run) def serialize_via_numeric_array_dumps(arr): return Numeric.dumps(arr) def serialize_via_numeric_array_compr(str): return compress(str) def serialize_via_numeric_array_escape(str): return MySQLdb.escape_string(str) def serialize_via_numeric_array(arr): """Serialize Numeric array into a compressed string.""" return serialize_via_numeric_array_escape(serialize_via_numeric_array_compr(serialize_via_numeric_array_dumps(arr))) def deserialize_via_numeric_array(string): """Decompress and deserialize string into a Numeric array.""" return Numeric.loads(decompress(string)) def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return MySQLdb.escape_string(compress(dumps(obj))) def deserialize_via_marshal(string): """Decompress and deserialize string into a Python object via marshal.""" return loads(decompress(string)) def showtime(timeused): """Show time used for method""" if options["verbose"] >= 9: write_message("Time used: %d second(s)." % timeused) def citation(row,run): return bibrank_engine(row, run) diff --git a/modules/bibrank/lib/bibrank_tag_based_indexer_tests.py b/modules/bibrank/lib/bibrank_tag_based_indexer_tests.py index b4009caa9..fae8fe2ee 100644 --- a/modules/bibrank/lib/bibrank_tag_based_indexer_tests.py +++ b/modules/bibrank/lib/bibrank_tag_based_indexer_tests.py @@ -1,47 +1,48 @@ # -*- coding: utf-8 -*- ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 """Unit tests for the ranking engine."""
 
 __lastupdated__ = """$Date$"""
 __version__ = "$Id$"
 
-import bibrank_tag_based_indexer
 import unittest
+from cdsware import bibrank_tag_based_indexer
 
 class TestListSetOperations(unittest.TestCase):
     """Test list set operations."""
 
     def test_union_dicts(self):
         """bibrank tag based indexer - union dicts"""
         self.assertEqual({1: 5, 2: 6, 3: 9, 4: 10, 10: 1}, bibrank_tag_based_indexer.union_dicts({1: 5, 2: 6, 3: 9}, {3:9, 4:10, 10: 1}))
 
     def test_split_ranges(self):
         """bibrank tag based indexer - split ranges"""
         self.assertEqual([[0, 500], [600, 1000]], bibrank_tag_based_indexer.split_ranges("0-500,600-1000"))
 
 def create_test_suite():
     """Return test suite for the indexing engine."""
     return unittest.TestSuite((unittest.makeSuite(TestListSetOperations,'test'),))
 
 if __name__ == "__main__":
     unittest.TextTestRunner(verbosity=2).run(create_test_suite())
diff --git a/modules/bibrank/lib/bibrank_word_indexer.py b/modules/bibrank/lib/bibrank_word_indexer.py
index 00477c963..fd359375b 100644
--- a/modules/bibrank/lib/bibrank_word_indexer.py
+++ b/modules/bibrank/lib/bibrank_word_indexer.py
@@ -1,1465 +1,1465 @@
 ## $Id$
 ## BibRank word frequency indexer utility.
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 __version__ = "$Id$"
 
 from zlib import compress,decompress
 from string import split,translate,lower,upper
 import marshal
 import getopt
 import getpass
 import string
 import os
 import sre
 import sys
 import time
 import MySQLdb
 import Numeric
 import urllib
 import signal
 import tempfile
 import unicodedata
 import traceback
 import cStringIO
 import math
 import re
 import ConfigParser
 
-from config import *
-from search_engine_config import cfg_max_recID
-from search_engine import perform_request_search, strip_accents, HitSet
-from dbquery import run_sql
-from bibindex_engine_stemmer import is_stemmer_available_for_language, stem
-from bibindex_engine_stopwords import is_stopword
-from bibindex_engine_config import conv_programs, conv_programs_helpers
+from cdsware.config import *
+from cdsware.search_engine_config import cfg_max_recID
+from cdsware.search_engine import perform_request_search, strip_accents, HitSet
+from cdsware.dbquery import run_sql
+from cdsware.bibindex_engine_stemmer import is_stemmer_available_for_language, stem
+from cdsware.bibindex_engine_stopwords import is_stopword
+from cdsware.bibindex_engine_config import conv_programs, conv_programs_helpers
 
 ## safety parameters concerning MySQL thread-multiplication problem:
 cfg_check_mysql_threads = 0 # to check or not to check the problem?
 cfg_max_mysql_threads = 50 # how many threads (connections) we consider as still safe
 cfg_mysql_thread_timeout = 20 # we'll kill threads that were sleeping for more than X seconds
 
 ## override urllib's default password-asking behaviour:
 class MyFancyURLopener(urllib.FancyURLopener):
     def prompt_user_passwd(self, host, realm):
         # supply some dummy credentials by default
         return ("mysuperuser", "mysuperpass")
     def http_error_401(self, url, fp, errcode, errmsg, headers):
         # do not bother with protected pages
         raise IOError, (999, 'unauthorized access')
         return None
 
 #urllib._urlopener = MyFancyURLopener()
 
 ## precompile some often-used regexp for speed reasons:
 re_subfields = sre.compile('\$\$\w');
 
 nb_char_in_line = 50 # for verbose pretty printing
 chunksize = 1000 # default size of chunks that the records will be treated by
 wordTables = []
 base_process_size = 4500 # process base size
 
 ## Dictionary merging functions
 def dict_union(list1, list2):
     "Returns union of the two dictionaries."
     union_dict = {}
     for (e, count) in list1.iteritems():
         union_dict[e] = count
     for (e, count) in list2.iteritems():
         if not union_dict.has_key(e):
             union_dict[e] = count
         else:
             union_dict[e] = (union_dict[e][0] + count[0], count[1])
     #for (e, count) in list2.iteritems():
     #    list1[e] = (list1.get(e, (0, 0))[0] + count[0], count[1])
     #return list1
     return union_dict
 
 ## safety function for killing slow MySQL threads:
 def kill_sleepy_mysql_threads(max_threads=cfg_max_mysql_threads, thread_timeout=cfg_mysql_thread_timeout):
     """Check the number of MySQL threads and if there are more than
     MAX_THREADS of them, lill all threads that are in a sleeping
     state for more than THREAD_TIMEOUT seconds. (This is useful
     for working around the the max_connection problem that appears
     during indexation in some not-yet-understood cases.) If some
     threads are to be killed, write info into the log file.
     """
     res = run_sql("SHOW FULL PROCESSLIST")
     if len(res) > max_threads:
         for row in res:
             r_id,r_user,r_host,r_db,r_command,r_time,r_state,r_info = row
             if r_command == "Sleep" and int(r_time) > thread_timeout:
                 run_sql("KILL %s", (r_id,))
                 if options["verbose"] >= 1:
                     write_message("WARNING: too many MySQL threads, killing thread %s" % r_id)
     return
 
 # tagToFunctions mapping. It offers an indirection level necesary for
 # indexing fulltext. The default is get_words_from_phrase
 tagToWordsFunctions = {}
 
 def get_words_from_phrase(phrase, weight, lang="",
                           chars_punctuation=r"[\.\,\:\;\?\!\"]",
                           chars_alphanumericseparators=r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]",
                           split=string.split):
     "Returns list of words from phrase 'phrase'."
     words = {}
     phrase = strip_accents(phrase)
     phrase = lower(phrase)
     #Getting rid of strange characters
     phrase = re.sub("é", 'e', phrase)
     phrase = re.sub("è", 'e', phrase)
     phrase = re.sub("à", 'a', phrase)
     phrase = re.sub(" ", ' ', phrase)
     phrase = re.sub("«", ' ', phrase)
     phrase = re.sub("»", ' ', phrase)
     phrase = re.sub("ê", ' ', phrase)
     phrase = re.sub("&", ' ', phrase)
     if string.find(phrase, " -1:
         #Most likely html, remove html code
         phrase = re.sub("(?s)<[^>]*>|&#?\w+;", ' ', phrase)
     #removes http links
     phrase = re.sub("(?s)http://[^( )]*", '', phrase)
     phrase = re.sub(chars_punctuation, ' ', phrase)
 
     #By doing this like below, characters standing alone, like c a b is not added to the inedx, but when they are together with characters like c++ or c$ they are added.
     for word in split(phrase):
         if options["remove_stopword"] == "True" and not is_stopword(word, 1) and check_term(word, 0):
             if lang and lang !="none" and options["use_stemming"]:
                 word = stem(word, lang)
             if not words.has_key(word):
                 words[word] = (0,0)
             words[word] = (words[word][0] + weight, 0)
         elif options["remove_stopword"] == "True" and not is_stopword(word, 1):
             phrase = re.sub(chars_alphanumericseparators, ' ', word)
             for word_ in split(phrase):
                 if lang and lang !="none" and options["use_stemming"]:
                     word_ = stem(word_, lang)
                 if word_:
                     if not words.has_key(word_):
                         words[word_] = (0,0)
                     words[word_] = (words[word_][0] + weight, 0)
     return words
 
 def split_ranges(parse_string):
     recIDs = []
     ranges = string.split(parse_string, ",")
     for range in ranges:
         tmp_recIDs = string.split(range, "-")
         if len(tmp_recIDs)==1:
             recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[0])])
         else:
             if int(tmp_recIDs[0]) > int(tmp_recIDs[1]): # sanity check
                 tmp = tmp_recIDs[0]
                 tmp_recIDs[0] = tmp_recIDs[1]
                 tmp_recIDs[1] = tmp
             recIDs.append([int(tmp_recIDs[0]), int(tmp_recIDs[1])])
     return recIDs
 
 def get_date_range(var):
     "Returns the two dates contained as a low,high tuple"
     limits = string.split(var, ",")
     if len(limits)==1:
         low = get_date(limits[0])
         return low,None
     if len(limits)==2:
         low = get_date(limits[0])
         high = get_date(limits[1])
         return low,high
 
 def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"):
     """Returns a date string according to the format string.
        It can handle normal date strings and shifts with respect
        to now."""
     date = time.time()
     shift_re=sre.compile("([-\+]{0,1})([\d]+)([dhms])")
     factors = {"d":24*3600, "h":3600, "m":60, "s":1}
     m = shift_re.match(var)
     if m:
         sign = m.groups()[0] == "-" and -1 or 1
         factor = factors[m.groups()[2]]
         value = float(m.groups()[1])
         date = time.localtime(date + sign * factor * value)
         date = time.strftime(format_string, date)
     else:
         date = time.strptime(var, format_string)
         date = time.strftime(format_string, date)
     return date
 
 def create_range_list(res):
     """Creates a range list from a recID select query result contained
     in res. The result is expected to have ascending numerical order."""
     if not res:
         return []
     row = res[0]
     if not row:
         return []
     else:
         range_list = [[row[0],row[0]]]
     for row in res[1:]:
         id = row[0]
         if id == range_list[-1][1] + 1:
             range_list[-1][1] = id
         else:
             range_list.append([id,id])
     return range_list
 
 def beautify_range_list(range_list):
     """Returns a non overlapping, maximal range list"""
     ret_list = []
     for new in range_list:
         found = 0
         for old in ret_list:
             if new[0] <= old[0] <= new[1] + 1 or new[0] - 1 <= old[1] <= new[1]:
                 old[0] = min(old[0], new[0])
                 old[1] = max(old[1], new[1])
                 found = 1
                 break
         if not found:
             ret_list.append(new)
     return ret_list
 
 def serialize_via_numeric_array_dumps(arr):
     return Numeric.dumps(arr)
 def serialize_via_numeric_array_compr(str):
     return compress(str)
 def serialize_via_numeric_array(arr):
     """Serialize Numeric array into a compressed string."""
     return serialize_via_numeric_array_compr(serialize_via_numeric_array_dumps(arr))
 def deserialize_via_numeric_array(string):
     """Decompress and deserialize string into a Numeric array."""
     return Numeric.loads(decompress(string))
 def serialize_via_marshal(obj):
     """Serialize Python object via marshal into a compressed string."""
     return MySQLdb.escape_string(compress(marshal.dumps(obj)))
 def deserialize_via_marshal(string):
     """Decompress and deserialize string into a Python object via marshal."""
     return marshal.loads(decompress(string))
 
 class WordTable:
     "A class to hold the words table."
 
     def __init__(self, tablename, fields_to_index, separators="[^\s]"):
         "Creates words table instance."
         self.tablename = tablename
         self.recIDs_in_mem = []
         self.fields_to_index = fields_to_index
         self.separators = separators
         self.value = {}
 
     def get_field(self, recID, tag):
         """Returns list of values of the MARC-21 'tag' fields for the
            record 'recID'."""
         out = []
         bibXXx = "bib" + tag[0] + tag[1] + "x"
         bibrec_bibXXx = "bibrec_" + bibXXx
         query = """SELECT value FROM %s AS b, %s AS bb WHERE bb.id_bibrec=%s AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID, tag);
         res = run_sql(query)
         for row in res:
             out.append(row[0])
         return out
 
     def clean(self):
         "Cleans the words table."
         self.value={}
 
     def put_into_db(self, mode="normal", split=string.split):
         """Updates the current words table in the corresponding MySQL's
            rnkWORD table. Mode 'normal' means normal execution,
            mode 'emergency' means words index reverting to old state.
            """
         if options["verbose"]:
             write_message("%s %s wordtable flush started" % (self.tablename,mode))
             write_message('...updating %d words into %sR started' % \
                 (len(self.value), self.tablename[:-1]))
         task_update_progress("%s flushed %d/%d words" % (self.tablename, 0, len(self.value)))
         self.recIDs_in_mem = beautify_range_list(self.recIDs_in_mem)
 
         if mode == "normal":
             for group in self.recIDs_in_mem:
                 query = """UPDATE %sR SET type='TEMPORARY' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='CURRENT'""" % \
                 (self.tablename[:-1], group[0], group[1])
                 if options["verbose"] >= 9:
                     write_message(query)
                 run_sql(query)
 
         nb_words_total = len(self.value)
         nb_words_report = int(nb_words_total/10)
         nb_words_done = 0
         for word in self.value.keys():
             self.put_word_into_db(word, self.value[word])
             nb_words_done += 1
             if nb_words_report!=0 and ((nb_words_done % nb_words_report) == 0):
                 if options["verbose"]:
                     write_message('......processed %d/%d words' % (nb_words_done, nb_words_total))
                 task_update_progress("%s flushed %d/%d words" % (self.tablename, nb_words_done, nb_words_total))
         if options["verbose"] >= 9:
             write_message('...updating %d words into %s ended' % \
                 (nb_words_total, self.tablename))
 
         #if options["verbose"]:
         #    write_message('...updating reverse table %sR started' % self.tablename[:-1])
         if mode == "normal":
             for group in self.recIDs_in_mem:
                 query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \
                 (self.tablename[:-1], group[0], group[1])
                 if options["verbose"] >= 9:
                     write_message(query)
                 run_sql(query)
                 query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \
                 (self.tablename[:-1], group[0], group[1])
                 if options["verbose"] >= 9:
                     write_message(query)
                 run_sql(query)
             if options["verbose"] >= 9:
                 write_message('End of updating wordTable into %s' % self.tablename)
         elif mode == "emergency":
             write_message("emergency")
             for group in self.recIDs_in_mem:
                 query = """UPDATE %sR SET type='CURRENT' WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='TEMPORARY'""" % \
                 (self.tablename[:-1], group[0], group[1])
                 if options["verbose"] >= 9:
                     write_message(query)
                 run_sql(query)
                 query = """DELETE FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d' AND type='FUTURE'""" % \
                 (self.tablename[:-1], group[0], group[1])
                 if options["verbose"] >= 9:
                     write_message(query)
                 run_sql(query)
             if options["verbose"] >= 9:
                 write_message('End of emergency flushing wordTable into %s' % self.tablename)
         #if options["verbose"]:
         #    write_message('...updating reverse table %sR ended' % self.tablename[:-1])
 
         self.clean()
         self.recIDs_in_mem = []
         if options["verbose"]:
             write_message("%s %s wordtable flush ended" % (self.tablename, mode))
         task_update_progress("%s flush ended" % (self.tablename))
 
     def load_old_recIDs(self,word):
         """Load existing hitlist for the word from the database index files."""
         query = "SELECT hitlist FROM %s WHERE term=%%s" % self.tablename
         res = run_sql(query, (word,))
         if res:
             return deserialize_via_marshal(res[0][0])
         else:
             return None
 
     def merge_with_old_recIDs(self,word,recIDs, set):
         """Merge the system numbers stored in memory (hash of recIDs with value[0] > 0 or -1
         according to whether to add/delete them) with those stored in the database index
         and received in set universe of recIDs for the given word.
 
         Return 0 in case no change was done to SET, return 1 in case SET was changed.
         """
         set_changed_p = 0
         for recID,sign in recIDs.iteritems():
             if sign[0] == -1 and set.has_key(recID):
                 # delete recID if existent in set and if marked as to be deleted
                 del set[recID]
                 set_changed_p = 1
             elif sign[0] > -1 and not set.has_key(recID):
                 # add recID if not existent in set and if marked as to be added
                 set[recID] = sign
                 set_changed_p = 1
             elif sign[0] > -1 and sign[0] != set[recID][0]:
                 set[recID] = sign
                 set_changed_p = 1
         return set_changed_p
 
     def put_word_into_db(self, word, recIDs, split=string.split):
         """Flush a single word to the database and delete it from memory"""
         set = self.load_old_recIDs(word)
         #write_message("%s %s" % (word, self.value[word]))
         if set: # merge the word recIDs found in memory:
             options["modified_words"][word] = 1
             if self.merge_with_old_recIDs(word, recIDs, set) == 0:
                 # nothing to update:
                 if options["verbose"] >= 9:
                     write_message("......... unchanged hitlist for ``%s''" % word)
                 pass
             else:
                 # yes there were some new words:
                 if options["verbose"] >= 9:
                     write_message("......... updating hitlist for ``%s''" % word)
                 run_sql("UPDATE %s SET hitlist='%s' WHERE term='%s'" % (self.tablename, serialize_via_marshal(set), MySQLdb.escape_string(word)))
         else: # the word is new, will create new set:
             if options["verbose"] >= 9:
                 write_message("......... inserting hitlist for ``%s''" % word)
             set = self.value[word]
             if len(set) > 0:
                 #new word, add to list
                 options["modified_words"][word] = 1
                 run_sql("INSERT INTO %s (term, hitlist) VALUES ('%s', '%s')" % (self.tablename, MySQLdb.escape_string(word), serialize_via_marshal(set)))
         if not set: # never store empty words
             run_sql("DELETE from %s WHERE term=%%s" % self.tablename, (word,))
         del self.value[word]
 
     def display(self):
         "Displays the word table."
         keys = self.value.keys()
         keys.sort()
         for k in keys:
             if options["verbose"]:
                 write_message("%s: %s" % (k, self.value[k]))
 
     def count(self):
         "Returns the number of words in the table."
         return len(self.value)
 
     def info(self):
         "Prints some information on the words table."
         if options["verbose"]:
             write_message("The words table contains %d words." % self.count())
 
     def lookup_words(self, word=""):
         "Lookup word from the words table."
         if not word:
             done = 0
             while not done:
                 try:
                     word = raw_input("Enter word: ")
                     done = 1
                 except (EOFError, KeyboardInterrupt):
                     return
         if self.value.has_key(word):
             if options["verbose"]:
                 write_message("The word '%s' is found %d times." \
                     % (word, len(self.value[word])))
         else:
             if options["verbose"]:
                 write_message("The word '%s' does not exist in the word file."\
                     % word)
 
     def update_last_updated(self, rank_method_code, starting_time=None):
         """Update last_updated column of the index table in the database.
         Puts starting time there so that if the task was interrupted for record download,
         the records will be reindexed next time."""
         if starting_time is None:
             return None
         if options["verbose"] >= 9:
             write_message("updating last_updated to %s...", starting_time)
         return run_sql("UPDATE rnkMETHOD SET last_updated=%s WHERE name=%s",
                        (starting_time, rank_method_code,))
 
     def add_recIDs(self, recIDs):
         """Fetches records which id in the recIDs range list and adds
         them to the wordTable. The recIDs range list is of the form:
         [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]].
         """
         global chunksize
         flush_count = 0
         records_done = 0
         records_to_go = 0
         for range in recIDs:
             records_to_go = records_to_go + range[1] - range[0] + 1
         time_started = time.time() # will measure profile time
         for range in recIDs:
             i_low = range[0]
             chunksize_count = 0
             while i_low <= range[1]:
                 # calculate chunk group of recIDs and treat it:
                 i_high = min(i_low+options["flush"]-flush_count-1,range[1])
                 i_high = min(i_low+chunksize-chunksize_count-1, i_high)
                 try:
                     self.chk_recID_range(i_low, i_high)
                 except StandardError, e:
                     write_message("Exception caught: %s" % e, sys.stderr)
                     if options["verbose"] >= 9:
                         traceback.print_tb(sys.exc_info()[2])
                     task_update_status("ERROR")
                     task_sig_stop_commands()
                     sys.exit(1)
                 if options["verbose"]:
                     write_message("%s adding records #%d-#%d started" % \
                         (self.tablename, i_low, i_high))
                 if cfg_check_mysql_threads:
                     kill_sleepy_mysql_threads()
                 task_update_progress("%s adding recs %d-%d" % (self.tablename, i_low, i_high))
                 self.del_recID_range(i_low, i_high)
                 just_processed = self.add_recID_range(i_low, i_high)
                 flush_count = flush_count + i_high - i_low + 1
                 chunksize_count = chunksize_count + i_high - i_low + 1
                 records_done = records_done + just_processed
                 if options["verbose"]:
                     write_message("%s adding records #%d-#%d ended " % \
                         (self.tablename, i_low, i_high))
                 if chunksize_count >= chunksize:
                     chunksize_count = 0
                 # flush if necessary:
                 if flush_count >= options["flush"]:
                     self.put_into_db()
                     self.clean()
                     if options["verbose"]:
                         write_message("%s backing up" % (self.tablename))
                     flush_count = 0
                     self.log_progress(time_started,records_done,records_to_go)
                 # iterate:
                 i_low = i_high + 1
         if flush_count > 0:
             self.put_into_db()
             self.log_progress(time_started,records_done,records_to_go)
 
     def add_recIDs_by_date(self, dates=""):
         """Add recIDs modified between DATES[0] and DATES[1].
            If DATES is not set, then add records modified since
            the last run of the ranking method.
            """
         if not dates:
             write_message("Using the last update time for the rank method")
             query = """SELECT last_updated FROM rnkMETHOD WHERE name='%s' """ % options["current_run"]
             res = run_sql(query)
             if not res:
                 return
             if not res[0][0]:
                 dates = ("0000-00-00",'')
             else:
                 dates = (res[0][0],'')
         query = """SELECT b.id FROM bibrec AS b WHERE b.modification_date >= '%s'""" % dates[0]
         if dates[1]:
             query += "and b.modification_date <= '%s'" % dates[1]
         query += "ORDER BY b.id ASC"""
         res = run_sql(query)
         list = create_range_list(res)
         if not list:
             if options["verbose"]:
                 write_message( "No new records added. %s is up to date" % self.tablename)
         else:
             self.add_recIDs(list)
         return list
 
     def add_recID_range(self, recID1, recID2):
         empty_list_string = serialize_via_marshal([])
         wlist = {}
         normalize = {}
         self.recIDs_in_mem.append([recID1,recID2])
         # secondly fetch all needed tags:
         for (tag, weight, lang) in self.fields_to_index:
             if tag in tagToWordsFunctions.keys():
                 get_words_function = tagToWordsFunctions[ tag ]
             else:
                 get_words_function = get_words_from_phrase
             bibXXx = "bib" + tag[0] + tag[1] + "x"
             bibrec_bibXXx = "bibrec_" + bibXXx
             query = """SELECT bb.id_bibrec,b.value FROM %s AS b, %s AS bb WHERE bb.id_bibrec BETWEEN %d AND %d AND bb.id_bibxxx=b.id AND tag LIKE '%s'""" % (bibXXx, bibrec_bibXXx, recID1, recID2, tag)
             res = run_sql(query)
             nb_total_to_read = len(res)
             verbose_idx = 0 # for verbose pretty printing
             for row in res:
                 recID, phrase = row
                 if options["validset"].contains(recID):
                     if not wlist.has_key(recID): wlist[recID] = {}
                     new_words = get_words_function(phrase, weight, lang) # ,self.separators
                     wlist[recID] = dict_union(new_words,wlist[recID])
 
         # were there some words for these recIDs found?
         if len(wlist) == 0: return 0
         recIDs = wlist.keys()
         for recID in recIDs:
             # was this record marked as deleted?
             if "DELETED" in self.get_field(recID, "980__c"):
                 wlist[recID] = {}
                 if options["verbose"] >= 9:
                     write_message("... record %d was declared deleted, removing its word list" % recID)
             if options["verbose"] >= 9:
                 write_message("... record %d, termlist: %s" % (recID, wlist[recID]))
 
         query_factory = cStringIO.StringIO()
         qwrite = query_factory.write
         qwrite( "INSERT INTO %sR (id_bibrec,termlist,type) VALUES" % self.tablename[:-1])
         qwrite( "('" )
         qwrite( str(recIDs[0]) )
         qwrite( "','" )
         qwrite( serialize_via_marshal(wlist[recIDs[0]]) )
         qwrite( "','FUTURE')" )
         for recID in recIDs[1:]:
             qwrite(",('")
             qwrite(str(recID))
             qwrite("','")
             qwrite(serialize_via_marshal(wlist[recID]))
             qwrite("','FUTURE')")
         query = query_factory.getvalue()
         query_factory.close()
         run_sql(query)
 
         query_factory = cStringIO.StringIO()
         qwrite = query_factory.write
         qwrite("INSERT INTO %sR (id_bibrec,termlist,type) VALUES" % self.tablename[:-1])
         qwrite("('")
         qwrite(str(recIDs[0]))
         qwrite("','")
         qwrite(serialize_via_marshal(wlist[recIDs[0]]))
         qwrite("','CURRENT')")
         for recID in recIDs[1:]:
             qwrite( ",('" )
             qwrite( str(recID) )
             qwrite( "','" )
             qwrite( empty_list_string )
             qwrite( "','CURRENT')" )
         query = query_factory.getvalue()
         query_factory.close()
         try:
             run_sql(query)
         except MySQLdb.DatabaseError:
             pass
 
         put = self.put
         for recID in recIDs:
             for (w, count) in wlist[recID].iteritems():
                 put(recID, w, count)
 
         return len(recIDs)
 
     def log_progress(self, start, done, todo):
         """Calculate progress and store it.
         start: start time,
         done: records processed,
         todo: total number of records"""
         time_elapsed = time.time() - start
         # consistency check
         if time_elapsed == 0 or done > todo:
             return
         time_recs_per_min = done/(time_elapsed/60.0)
         if options["verbose"]:
             write_message("%d records took %.1f seconds to complete.(%1.f recs/min)"\
                 % (done, time_elapsed, time_recs_per_min))
         if time_recs_per_min:
             if options["verbose"]:
                 write_message("Estimated runtime: %.1f minutes" % \
                     ((todo-done)/time_recs_per_min))
 
     def put(self, recID, word, sign):
         "Adds/deletes a word to the word list."
try: word = lower(word[:50]) if self.value.has_key(word): # the word 'word' exists already: update sign self.value[word][recID] = sign # PROBLEM ? else: self.value[word] = {recID: sign} except: write_message("Error: Cannot put word %s with sign %s for recID %s." % (word, sign, recID)) def del_recIDs(self, recIDs): """Fetches records whose IDs are in the recIDs range list and deletes them from the wordTable. The recIDs range list is of the form: [[i1_low,i1_high],[i2_low,i2_high], ..., [iN_low,iN_high]]. """ count = 0 for range in recIDs: self.del_recID_range(range[0],range[1]) count = count + range[1] - range[0] self.put_into_db() def del_recID_range(self, low, high): """Deletes records with 'recID' system number between low and high from memory words index table.""" if options["verbose"] > 2: write_message("%s fetching existing words for records #%d-#%d started" % \ (self.tablename, low, high)) self.recIDs_in_mem.append([low,high]) query = """SELECT id_bibrec,termlist FROM %sR as bb WHERE bb.id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) recID_rows = run_sql(query) for recID_row in recID_rows: recID = recID_row[0] wlist = deserialize_via_marshal(recID_row[1]) for word in wlist: self.put(recID, word, (-1, 0)) if options["verbose"] > 2: write_message("%s fetching existing words for records #%d-#%d ended" % \ (self.tablename, low, high)) def report_on_table_consistency(self): """Check reverse words index tables (e.g. rnkWORD01R) for interesting states such as 'TEMPORARY' state. Prints small report (no of words, no of bad words).
""" # find number of words: query = """SELECT COUNT(*) FROM %s""" % (self.tablename) res = run_sql(query, None, 1) if res: nb_words = res[0][0] else: nb_words = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_records = res[0][0] else: nb_records = 0 # report stats: if options["verbose"]: write_message("%s contains %d words from %d records" % (self.tablename, nb_words, nb_records)) # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_bad_records = res[0][0] else: nb_bad_records = 999999999 if nb_bad_records: write_message("EMERGENCY: %s needs to repair %d of %d records" % \ (self.tablename, nb_bad_records, nb_records)) else: if options["verbose"]: write_message("%s is in consistent state" % (self.tablename)) return nb_bad_records def repair(self): """Repair the whole table""" # find possible bad states in reverse tables: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR WHERE type <> 'CURRENT'""" % (self.tablename[:-1]) res = run_sql(query, None, 1) if res: nb_bad_records = res[0][0] else: nb_bad_records = 0 # find number of records: query = """SELECT COUNT(DISTINCT(id_bibrec)) FROM %sR""" % (self.tablename[:-1]) res = run_sql(query) if res: nb_records = res[0][0] else: nb_records = 0 if nb_bad_records == 0: return query = """SELECT id_bibrec FROM %sR WHERE type <> 'CURRENT' ORDER BY id_bibrec""" \ % (self.tablename[:-1]) res = run_sql(query) recIDs = create_range_list(res) flush_count = 0 records_done = 0 records_to_go = 0 for range in recIDs: records_to_go = records_to_go + range[1] - range[0] + 1 time_started = time.time() # will measure profile time for range in recIDs: i_low = range[0] chunksize_count = 0 while i_low <= range[1]: # calculate chunk group of recIDs and treat it: i_high = 
min(i_low+options["flush"]-flush_count-1,range[1]) i_high = min(i_low+chunksize-chunksize_count-1, i_high) try: self.fix_recID_range(i_low, i_high) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) task_update_status("ERROR") task_sig_stop_commands() sys.exit(1) flush_count = flush_count + i_high - i_low + 1 chunksize_count = chunksize_count + i_high - i_low + 1 records_done = records_done + i_high - i_low + 1 if chunksize_count >= chunksize: chunksize_count = 0 # flush if necessary: if flush_count >= options["flush"]: self.put_into_db("emergency") self.clean() flush_count = 0 self.log_progress(time_started,records_done,records_to_go) # iterate: i_low = i_high + 1 if flush_count > 0: self.put_into_db("emergency") self.log_progress(time_started,records_done,records_to_go) write_message("%s inconsistencies repaired." % self.tablename) def chk_recID_range(self, low, high): """Check if the reverse index table is in proper state""" ## check db query = """SELECT COUNT(*) FROM %sR WHERE type <> 'CURRENT' AND id_bibrec BETWEEN '%d' AND '%d'""" % (self.tablename[:-1], low, high) res = run_sql(query, None, 1) if res[0][0]==0: if options["verbose"]: write_message("%s for %d-%d is in consistent state"%(self.tablename,low,high)) return # okay, words table is consistent ## inconsistency detected! write_message("EMERGENCY: %s inconsistencies detected..." % self.tablename) write_message("""EMERGENCY: Errors found. You should check consistency of the %s - %sR tables.\nRunning 'bibindex --repair' is recommended.""" \ % (self.tablename, self.tablename[:-1])) raise StandardError def fix_recID_range(self, low, high): """Try to fix reverse index database consistency (e.g. table rnkWORD01R) in the low,high doc-id range. Possible states for a recID follow: CUR TMP FUT: very bad things have happened: warn! CUR TMP : very bad things have happened: warn! 
CUR FUT: delete FUT (crash before flushing) CUR : database is ok TMP FUT: add TMP to memory and del FUT from memory flush (revert to old state) TMP : very bad things have happened: warn! FUT: very bad things have happened: warn! """ state = {} query = "SELECT id_bibrec,type FROM %sR WHERE id_bibrec BETWEEN '%d' AND '%d'"\ % (self.tablename[:-1], low, high) res = run_sql(query) for row in res: if not state.has_key(row[0]): state[row[0]]=[] state[row[0]].append(row[1]) ok = 1 # will hold info on whether we will be able to repair for recID in state.keys(): if not 'TEMPORARY' in state[recID]: if 'FUTURE' in state[recID]: if 'CURRENT' not in state[recID]: write_message("EMERGENCY: Record %d is in inconsistent state. Can't repair it" % recID) ok = 0 else: write_message("EMERGENCY: Inconsistency in record %d detected" % recID) query = """DELETE FROM %sR WHERE id_bibrec='%d' AND type='FUTURE'""" % (self.tablename[:-1], recID) run_sql(query) write_message("EMERGENCY: Inconsistency in record %d repaired." % recID) else: if 'FUTURE' in state[recID] and not 'CURRENT' in state[recID]: self.recIDs_in_mem.append([recID,recID]) # Get the words file query = """SELECT type,termlist FROM %sR WHERE id_bibrec='%d'""" % (self.tablename[:-1], recID) if options["verbose"] >= 9: write_message(query) res = run_sql(query) for row in res: wlist = deserialize_via_marshal(row[1]) if options["verbose"] >= 9: write_message("Words are %s " % wlist) if row[0] == 'TEMPORARY': sign = 1 else: sign = -1 for word in wlist: self.put(recID, word, wlist[word]) else: write_message("EMERGENCY: %s for %d is in inconsistent state. Couldn't repair it." % (self.tablename, recID)) ok = 0 if not ok: write_message("""EMERGENCY: Unrepairable errors found. You should check consistency of the %s - %sR tables. Deleting affected records is recommended.""" % (self.tablename, self.tablename[:-1])) raise StandardError def word_index(row, run): """Run the indexing task.
The row argument is the BibSched task queue row, containing id, arguments, etc. Return 1 in case of success and 0 in case of failure. """ ## import optional modules: try: import psyco psyco.bind(get_words_from_phrase) psyco.bind(WordTable.merge_with_old_recIDs) psyco.bind(serialize_via_numeric_array) psyco.bind(serialize_via_marshal) psyco.bind(deserialize_via_numeric_array) psyco.bind(deserialize_via_marshal) psyco.bind(update_rnkWORD) psyco.bind(check_rnkWORD) except StandardError,e: print "Warning: Psyco", e pass global options, task_id, wordTables, languages # read from SQL row: task_id = row[0] task_proc = row[1] options = marshal.loads(row[6]) # install signal handlers signal.signal(signal.SIGUSR1, task_sig_sleep) signal.signal(signal.SIGTERM, task_sig_stop) signal.signal(signal.SIGABRT, task_sig_suicide) signal.signal(signal.SIGCONT, task_sig_wakeup) signal.signal(signal.SIGINT, task_sig_unknown) ## go ahead and treat each table: options["run"] = [] options["run"].append(run) for rank_method_code in options["run"]: method_starting_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) write_message("Running rank method: %s" % getName(rank_method_code)) try: file = etcdir + "/bibrank/" + rank_method_code + ".cfg" config = ConfigParser.ConfigParser() config.readfp(open(file)) except StandardError, e: write_message("Cannot find configuration file: %s" % file, sys.stderr) raise StandardError options["current_run"] = rank_method_code options["modified_words"] = {} options["table"] = config.get(config.get("rank_method", "function"), "table") options["use_stemming"] = config.get(config.get("rank_method","function"),"stemming") options["remove_stopword"] = config.get(config.get("rank_method","function"),"stopword") tags = get_tags(config) #get the tags to include options["validset"] = get_valid_range(rank_method_code) #get the records from the collections the method is enabled for function = config.get("rank_method","function") wordTable = 
WordTable(options["table"], tags) wordTable.report_on_table_consistency() try: if options["cmd"] == "del": if options["id"]: wordTable.del_recIDs(options["id"]) elif options["collection"]: l_of_colls = string.split(options["collection"], ",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.del_recIDs(recIDs_range) else: write_message("Missing IDs of records to delete from index %s." % wordTable.tablename, sys.stderr) raise StandardError elif options["cmd"] == "add": if options["id"]: wordTable.add_recIDs(options["id"]) elif options["collection"]: l_of_colls = string.split(options["collection"], ",") recIDs = perform_request_search(c=l_of_colls) recIDs_range = [] for recID in recIDs: recIDs_range.append([recID,recID]) wordTable.add_recIDs(recIDs_range) elif options["last_updated"]: wordTable.add_recIDs_by_date("") # only update last_updated if run via automatic mode: wordTable.update_last_updated(rank_method_code, method_starting_time) elif options["modified"]: wordTable.add_recIDs_by_date(options["modified"]) else: wordTable.add_recIDs([[0,cfg_max_recID]]) elif options["cmd"] == "repair": wordTable.repair() check_rnkWORD(options["table"]) elif options["cmd"] == "check": check_rnkWORD(options["table"]) options["modified_words"] = {} elif options["cmd"] == "stat": rank_method_code_statistics(options["table"]) else: write_message("Invalid command found processing %s" % \ wordTable.tablename, sys.stderr) raise StandardError update_rnkWORD(options["table"], options["modified_words"]) except StandardError, e: write_message("Exception caught: %s" % e, sys.stderr) if options["verbose"] >= 9: traceback.print_tb(sys.exc_info()[2]) sys.exit(1) wordTable.report_on_table_consistency() # We are done.
State it in the database, close and quit return 1 def get_tags(config): """Get the tags that should be used creating the index and each tag's parameter""" tags = [] function = config.get("rank_method","function") i = 1 shown_error = 0 #try: if 1: while config.has_option(function,"tag%s"% i): tag = config.get(function, "tag%s" % i) tag = string.split(tag, ",") tag[1] = int(string.strip(tag[1])) tag[2] = string.strip(tag[2]) #check if stemmer for language is available if config.get(function,"stemming") and stem("information", "en") != "inform": if shown_error == 0: write_message("Warning: PyStemmer not found. Please read INSTALL.") shown_error = 1 elif tag[2] and tag[2] != "none" and config.get(function,"stemming") and not is_stemmer_available_for_language(tag[2]): write_message("Warning: Language '%s' not available in PyStemmer." % tag[2]) tags.append(tag) i += 1 #except Exception: # write_message("Could not read data from configuration file, please check for errors") # raise StandardError return tags def get_valid_range(rank_method_code): """Returns which records are valid for this rank method, according to which collections it is enabled for.""" #if options["verbose"] >=9: # write_message("Getting records from collections enabled for rank method.") #res = run_sql("SELECT collection.name FROM collection,collection_rnkMETHOD,rnkMETHOD WHERE collection.id=id_collection and id_rnkMETHOD=rnkMETHOD.id and rnkMETHOD.name='%s'" % rank_method_code) #l_of_colls = [] #for coll in res: # l_of_colls.append(coll[0]) #if len(l_of_colls) > 0: # recIDs = perform_request_search(c=l_of_colls) #else: # recIDs = [] valid = HitSet(Numeric.ones(cfg_max_recID+1, Numeric.Int0)) #valid.addlist(recIDs) return valid def write_message(msg, stream=sys.stdout): """Prints message and flush output stream (may be sys.stdout or sys.stderr).""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) 
stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) def check_term(term, termlength): """Check if term contains not allowed characters, or for any other reasons for not using this term.""" try: if len(term) <= termlength: return False reg = re.compile(r"[1234567890\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]") if re.search(reg, term): return False term = str.replace(term, "-", "") term = str.replace(term, ".", "") term = str.replace(term, ",", "") if int(term): return False except StandardError, e: pass return True def check_rnkWORD(table): """Checks for any problems in rnkWORD tables.""" i = 0 errors = {} termslist = run_sql("SELECT term FROM %s" % table) N = run_sql("select max(id_bibrec) from %sR" % table[:-1])[0][0] write_message("Checking integrity of rank values in %s" % table) terms = map(lambda x: x[0], termslist) while i < len(terms): current_terms = "" for j in range(i, ((i+5000)< len(terms) and (i+5000) or len(terms))): current_terms += "'%s'," % terms[j] terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term in (%s)" % (table, current_terms[:-1])) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if (term_docs.has_key("Gi") and term_docs["Gi"][1] == 0) or not term_docs.has_key("Gi"): write_message("ERROR: Missing value for term: %s (%s) in %s: %s" % (t, repr(t), table, len(term_docs))) errors[t] = 1 i += 5000 write_message("Checking integrity of rank values in %sR" % table[:-1]) i = 0 while i < N: docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec>=%s and id_bibrec<=%s" % (table[:-1], i, i+5000)) for (j, termlist) in docs_terms: termlist = deserialize_via_marshal(termlist) for (t, tf) in termlist.iteritems(): if tf[1] == 0 and not errors.has_key(t): errors[t] = 1 write_message("ERROR: Gi missing for record %s and term: %s (%s) in %s" % (j,t,repr(t), table)) terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term='%s'" 
% (table, t)) termlist = deserialize_via_marshal(terms_docs[0][1]) i += 5000 if len(errors) == 0: write_message("No direct errors found, but inconsistent data may still exist.") else: write_message("%s errors found during integrity check; repair and rebalancing recommended." % len(errors)) options["modified_words"] = errors def rank_method_code_statistics(table): """Shows some statistics about this rank method.""" maxID = run_sql("select max(id) from %s" % table) maxID = maxID[0][0] terms = {} Gi = {} write_message("Showing statistics of terms in index:") write_message("Important: For the 'Least used terms', the number of terms is shown first, and the number of occurrences second.") write_message("Least used terms---Most important terms---Least important terms") i = 0 while i <= maxID: terms_docs=run_sql("SELECT term, hitlist FROM %s WHERE id>= %s and id < %s" % (table, i, i + 10000)) for (t, hitlist) in terms_docs: term_docs=deserialize_via_marshal(hitlist) terms[len(term_docs)] = terms.get(len(term_docs), 0) + 1 if term_docs.has_key("Gi"): Gi[t] = term_docs["Gi"] i=i + 10000 terms=terms.items() terms.sort(lambda x, y: cmp(y[1], x[1])) Gi=Gi.items() Gi.sort(lambda x, y: cmp(y[1], x[1])) for i in range(0, min(20, len(terms), len(Gi))): write_message("%s/%s---%s---%s" % (terms[i][0],terms[i][1], Gi[i][0],Gi[len(Gi) - i - 1][0])) def update_rnkWORD(table, terms): """Updates rnkWORDF and rnkWORDR with Gi and Nj values. For each term in rnkWORDF, a Gi value for the term is added. And for each term in each document, the Nj value for that document is added. In rnkWORDR, the Gi value for each term in each document is added. For description on how things are computed, look in the hacking docs.
table - name of forward index to update terms - modified terms""" stime = time.time() Gi = {} Nj = {} N = run_sql("select count(id_bibrec) from %sR" % table[:-1])[0][0] if len(terms) == 0 and options["quick"] == "yes": write_message("No terms to process, ending...") return "" elif options["quick"] == "yes": #-R option not used: fast calculation (not accurate) write_message("Beginning post-processing of %s terms" % len(terms)) #Locating all documents related to the modified/new/deleted terms, if fast update, #only take into account new/modified occurrences write_message("Phase 1: Finding records containing modified terms") terms = terms.keys() i = 0 while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): if (options["quick"] == "yes" and tf[1] == 0) or options["quick"] == "no": Nj[j] = 0 write_message("Phase 1: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 1: Finished finding records containing modified terms") #Find all terms in the records found in last phase write_message("Phase 2: Finding all terms in affected records") records = Nj.keys() i = 0 while i < len(records): docs_terms = get_from_reverse_index(records, i, (i + 5000), table) for (j, termlist) in docs_terms: doc_terms = deserialize_via_marshal(termlist) for (t, tf) in doc_terms.iteritems(): Gi[t] = 0 write_message("Phase 2: ......processed %s/%s records " % ((i+5000>len(records) and len(records) or (i+5000)), len(records))) i += 5000 write_message("Phase 2: Finished finding all terms in affected records") else: #recalculate max_id = run_sql("SELECT MAX(id) FROM %s" % table) max_id = max_id[0][0] write_message("Beginning recalculation of %s terms" % max_id) terms = [] i = 0 while i < max_id: terms_docs = get_from_forward_index_with_id(i, 
(i+5000), table) for (t, hitlist) in terms_docs: Gi[t] = 0 term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): Nj[j] = 0 write_message("Phase 1: ......processed %s/%s terms" % ((i+5000)>max_id and max_id or (i+5000), max_id)) i += 5000 write_message("Phase 1: Finished finding which records contain which terms") write_message("Phase 2: Skipped, already done in phase 1 because of the -R option") terms = Gi.keys() Gi = {} i = 0 if options["quick"] == "no": #Calculating Fi and Gi value for each term write_message("Phase 3: Calculating importance of all affected terms") while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] Fi = 0 Gi[t] = 1 for (j, tf) in term_docs.iteritems(): Fi += tf[0] for (j, tf) in term_docs.iteritems(): if tf[0] != Fi: Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N) write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 3: Finished calculating importance of all affected terms") else: #Using existing Gi value instead of calculating a new one; this loses some accuracy.
write_message("Phase 3: Getting approximate importance of all affected terms") while i < len(terms): terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): Gi[t] = term_docs["Gi"][1] elif len(term_docs) == 1: Gi[t] = 1 else: Fi = 0 Gi[t] = 1 for (j, tf) in term_docs.iteritems(): Fi += tf[0] for (j, tf) in term_docs.iteritems(): if tf[0] != Fi: Gi[t] = Gi[t] + ((float(tf[0]) / Fi) * math.log(float(tf[0]) / Fi) / math.log(2)) / math.log(N) write_message("Phase 3: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 3: Finished getting approximate importance of all affected terms") write_message("Phase 4: Calculating normalization value for all affected records and updating %sR" % table[:-1]) records = Nj.keys() i = 0 while i < len(records): #Calculating the normalization value for each document, and adding the Gi value to each term in each document. 
docs_terms = get_from_reverse_index(records, i, (i + 5000), table) for (j, termlist) in docs_terms: doc_terms = deserialize_via_marshal(termlist) for (t, tf) in doc_terms.iteritems(): if Gi.has_key(t): Nj[j] = Nj.get(j, 0) + math.pow(Gi[t] * (1 + math.log(tf[0])), 2) Git = int(math.floor(Gi[t]*100)) if Git >= 0: Git += 1 doc_terms[t] = (tf[0], Git) else: Nj[j] = Nj.get(j, 0) + math.pow(tf[1] * (1 + math.log(tf[0])), 2) Nj[j] = 1.0 / math.sqrt(Nj[j]) Nj[j] = int(Nj[j] * 100) if Nj[j] >= 0: Nj[j] += 1 run_sql("UPDATE %sR SET termlist='%s' WHERE id_bibrec=%s" % (table[:-1], serialize_via_marshal(doc_terms), j)) write_message("Phase 4: ......processed %s/%s records" % ((i+5000>len(records) and len(records) or (i+5000)), len(records))) i += 5000 write_message("Phase 4: Finished calculating normalization value for all affected records and updating %sR" % table[:-1]) write_message("Phase 5: Updating %s with new normalization values" % table) i = 0 terms = Gi.keys() while i < len(terms): #Adding the Gi value to each term, and adding the normalization value to each term in each document. 
terms_docs = get_from_forward_index(terms, i, (i+5000), table) for (t, hitlist) in terms_docs: term_docs = deserialize_via_marshal(hitlist) if term_docs.has_key("Gi"): del term_docs["Gi"] for (j, tf) in term_docs.iteritems(): if Nj.has_key(j): term_docs[j] = (tf[0], Nj[j]) Git = int(math.floor(Gi[t]*100)) if Git >= 0: Git += 1 term_docs["Gi"] = (0, Git) run_sql("UPDATE %s SET hitlist='%s' WHERE term='%s'" % (table, serialize_via_marshal(term_docs), MySQLdb.escape_string(t))) write_message("Phase 5: ......processed %s/%s terms" % ((i+5000>len(terms) and len(terms) or (i+5000)), len(terms))) i += 5000 write_message("Phase 5: Finished updating %s with new normalization values" % table) write_message("Time used for post-processing: %.1fmin" % ((time.time() - stime) / 60)) write_message("Finished post-processing") def get_from_forward_index(terms, start, stop, table): current_terms = "" for j in range(start, (stop < len(terms) and stop or len(terms))): current_terms += "'%s'," % terms[j] terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE term IN (%s)" % (table,current_terms[:-1])) return terms_docs def get_from_forward_index_with_id(start, stop, table): terms_docs = run_sql("SELECT term, hitlist FROM %s WHERE id between %s and %s" % (table, start, stop)) return terms_docs def get_from_reverse_index(records, start, stop, table): current_recs = "%s" % records[start:stop] current_recs = current_recs[1:-1] docs_terms = run_sql("SELECT id_bibrec, termlist FROM %sR WHERE id_bibrec IN (%s)" % (table[:-1],current_recs)) return docs_terms def test_word_separators(phrase="hep-th/0101001"): """Tests word separating policy on various input.""" print "%s:" % phrase gwfp = get_words_from_phrase(phrase) for (word, count) in gwfp.iteritems(): print "\t-> %s - %s" % (word, count) def task_sig_sleep(sig, frame): """Signal handler for the 'sleep' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("sleeping...") 
task_update_status("SLEEPING") signal.pause() # wait for wake-up signal def task_sig_wakeup(sig, frame): """Signal handler for the 'wakeup' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("continuing...") task_update_status("CONTINUING") def task_sig_stop(sig, frame): """Signal handler for the 'stop' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("stopping...") task_update_status("STOPPING") errcode = 0 try: task_sig_stop_commands() write_message("stopped") task_update_status("STOPPED") except StandardError, err: write_message("Error during stopping! %s" % err) task_update_status("STOPPINGFAILED") errcode = 1 sys.exit(errcode) def task_sig_stop_commands(): """Do all the commands necessary to stop the task before quitting. Useful for task_sig_stop() handler. """ write_message("stopping commands started") for table in wordTables: table.put_into_db() write_message("stopping commands ended") def task_sig_suicide(sig, frame): """Signal handler for the 'suicide' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("suiciding myself now...") task_update_status("SUICIDING") write_message("suicided") task_update_status("SUICIDED") sys.exit(0) def task_sig_unknown(sig, frame): """Signal handler for the other unknown signals sent by shell or user.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("unknown signal %d ignored" % sig) # do nothing for other signals def task_update_progress(msg): """Updates progress information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task progress to %s."
% msg) return run_sql("UPDATE schTASK SET progress=%s where id=%s", (msg, task_id)) def task_update_status(val): """Updates state information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task status to %s." % val) return run_sql("UPDATE schTASK SET status=%s where id=%s", (val, task_id)) def getName(methname, ln=cdslang, type='ln'): """Returns the name of the rank method, either in default language or given language. methname = short name of the method ln - the language to get the name in type - which name "type" to get.""" try: rnkid = run_sql("SELECT id FROM rnkMETHOD where name='%s'" % methname) if rnkid: rnkid = str(rnkid[0][0]) res = run_sql("SELECT value FROM rnkMETHODNAME where type='%s' and ln='%s' and id_rnkMETHOD=%s" % (type, ln, rnkid)) if not res: res = run_sql("SELECT value FROM rnkMETHODNAME WHERE ln='%s' and id_rnkMETHOD=%s and type='%s'" % (cdslang, rnkid, type)) if not res: return methname return res[0][0] else: raise Exception except Exception, e: write_message("Cannot run rank method, either given code for method is wrong, or it has not been added using the webinterface.") raise Exception def word_similarity(row, run): """Call correct method""" return word_index(row, run) diff --git a/modules/bibrank/lib/bibrankadminlib.py b/modules/bibrank/lib/bibrankadminlib.py index eb10070c3..5fd26027a 100644 --- a/modules/bibrank/lib/bibrankadminlib.py +++ b/modules/bibrank/lib/bibrankadminlib.py @@ -1,1071 +1,1072 @@ ## $Id$ ## Administrator interface for BibRank ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware BibRank Administrator Interface.""" __lastupdated__ = """$Date$""" ## fill config variables: -import access_control_engine as acce import cgi import re import MySQLdb import Numeric import os import ConfigParser from zlib import compress,decompress -from messages import * -from dbquery import run_sql -from config import * -from webpage import page, pageheaderonly, pagefooteronly -from webuser import getUid, get_email from mod_python import apache +import cdsware.access_control_engine as acce +from cdsware.messages import * +from cdsware.dbquery import run_sql +from cdsware.config import * +from cdsware.webpage import page, pageheaderonly, pagefooteronly +from cdsware.webuser import getUid, get_email __version__ = "$Id$" def getnavtrail(previous = ''): navtrail = """Admin Area > BibRank Admin """ % (weburl, weburl) navtrail = navtrail + previous return navtrail def check_user(uid, role, adminarea=2, authorized=0): (auth_code, auth_message) = is_adminuser(uid, role) if not authorized and auth_code != 0: return ("false", auth_message) return ("", auth_message) def is_adminuser(uid, role): """check if user is a registered administrator.
""" return acce.acc_authorize_action(uid, role) def perform_index(ln=cdslang): """create the bibrank main area menu page.""" header = ['Code', 'Translations', 'Collections', 'Rank method'] rnk_list = get_def_name('', "rnkMETHOD") actions = [] for (rnkID, name) in rnk_list: actions.append([name]) for col in [(('Modify', 'modifytranslations'),), (('Modify', 'modifycollection'),), (('Show Details', 'showrankdetails'), ('Modify', 'modifyrank'), ('Delete', 'deleterank'))]: actions[-1].append('%s' % (weburl, col[0][1], rnkID, ln, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (weburl, function, rnkID, ln, str) output = """ Add new rank method

""" % (weburl, ln) output += tupletotable(header=header, tuple=actions) return addadminbox("""Overview of rank methods   [?]""" % weburl, datalist=[output, '']) def perform_modifycollection(rnkID='', ln=cdslang, func='', colID='', confirm=0): """Modify which collections the rank method is visible to""" output = "" subtitle = "" if rnkID: rnkNAME = get_def_name(rnkID, "rnkMETHOD")[0][1] if func in ["0", 0] and confirm in ["1", 1]: finresult = attach_col_rnk(rnkID, colID) elif func in ["1", 1] and confirm in ["1", 1]: finresult = detach_col_rnk(rnkID, colID) if colID: colNAME = get_def_name(colID, "collection")[0][1] subtitle = """Step 1 - Select collection to enable/disable rank method '%s' for""" % rnkNAME output = """

The rank method is currently enabled for these collections:
""" col_list = get_rnk_col(rnkID, ln) if not col_list: output += """No collections""" else: for (id, name) in col_list: output += """%s, """ % name output += """
""" col_list = get_def_name('', "collection") col_rnk = dict(get_rnk_col(rnkID)) col_list = filter(lambda x: not col_rnk.has_key(x[0]), col_list) if col_list: text = """ Enable for: """ output += createhiddenform(action="modifycollection", text=text, button="Enable", rnkID=rnkID, ln=ln, func=0, confirm=1) if confirm in ["0", 0] and func in ["0", 0] and colID: subtitle = "Step 2 - Confirm to enable rank method for the chosen collection" text = "

Please confirm to enable rank method '%s' for the collection '%s'

" % (rnkNAME, colNAME) output += createhiddenform(action="modifycollection", text=text, button="Confirm", rnkID=rnkID, ln=ln, colID=colID, func=0, confirm=1) elif confirm in ["1", 1] and func in ["0", 0] and colID: subtitle = "Step 3 - Result" output += write_outcome(finresult) elif confirm not in ["0", 0] and func in ["0", 0]: output += """Please select a collection.""" col_list = get_rnk_col(rnkID, ln) if col_list: text = """ Disable for: """ output += createhiddenform(action="modifycollection", text=text, button="Disable", rnkID=rnkID, ln=ln, func=1, confirm=1) if confirm in ["1", 1] and func in ["1", 1] and colID: subtitle = "Step 3 - Result" output += write_outcome(finresult) elif confirm not in ["0", 0] and func in ["1", 1]: output += """Please select a collection.""" try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle + """   [?]""" % weburl, body) def perform_modifytranslations(rnkID, ln, sel_type, trans, confirm, callback='yes'): """Modify the translations of a rank method""" output = '' subtitle = '' cdslangs = get_languages() cdslangs.sort() if confirm in ["2", 2] and rnkID: finresult = modify_translations(rnkID, cdslangs, sel_type, trans, "rnkMETHOD") rnk_name = get_def_name(rnkID, "rnkMETHOD")[0][1] rnk_dict = dict(get_i8n_name('', ln, get_rnk_nametypes()[0][0], "rnkMETHOD")) if rnkID and rnk_dict.has_key(int(rnkID)): rnkID = int(rnkID) subtitle = """3. 
Modify translations for rank method '%s'""" % rnk_name if type(trans) is str: trans = [trans] if sel_type == '': sel_type = get_rnk_nametypes()[0][0] header = ['Language', 'Translation'] actions = [] text = """ Name type """ output += createhiddenform(action="modifytranslations", text=text, button="Select", rnkID=rnkID, ln=ln, confirm=0) if confirm in [-1, "-1", 0, "0"]: trans = [] for key, value in cdslangs: try: trans_names = get_name(rnkID, key, sel_type, "rnkMETHOD") trans.append(trans_names[0][0]) except StandardError, e: trans.append('') for nr in range(0,len(cdslangs)): actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))]) actions[-1].append('' % trans[nr]) text = tupletotable(header=header, tuple=actions) output += createhiddenform(action="modifytranslations", text=text, button="Modify", rnkID=rnkID, sel_type=sel_type, ln=ln, confirm=2) if sel_type and len(trans) and confirm in ["2", 2]: output += write_outcome(finresult) try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle + """   [?]""" % weburl, body) def perform_addrankarea(rnkcode='', ln=cdslang, template='', confirm=-1): """form to add a new rank method with these values:""" subtitle = 'Step 1 - Create new rank method' output = """
BibRank code:
A unique code that identifies a rank method. It is used when running the bibrank daemon and to name the configuration file for the method.
The template file includes the necessary parameters for the chosen rank method and only needs to be edited with the correct tags and paths.
For more information, please go to the BibRank guide and read the section about adding a rank method
""" % weburl text = """ BibRank code """ % (rnkcode) text += """
Cfg template """ output += createhiddenform(action="addrankarea", text=text, button="Add rank method", ln=ln, confirm=1) if rnkcode: if confirm in ["0", 0]: subtitle = 'Step 2 - Confirm addition of rank method' text = """Add rank method with BibRank code: '%s'.""" % (rnkcode) if template: text += """
Using configuration template: '%s'.""" % (template) else: text += """
Create empty configuration file.""" output += createhiddenform(action="addrankarea", text=text, rnkcode=rnkcode, button="Confirm", template=template, confirm=1) elif confirm in ["1", 1]: rnkID = add_rnk(rnkcode) subtitle = "Step 3 - Result" if rnkID[0] == 1: rnkID = rnkID[1] text = """Added new rank method with BibRank code '%s'""" % rnkcode try: if template: infile = open("%s/bibrank/%s" % (etcdir, template), 'r') indata = infile.readlines() infile.close() else: indata = () file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0]), 'w') for line in indata: file.write(line) file.close() if template: text += """
Configuration file created using '%s' as template.
""" % template else: text += """
Empty configuration file created.
""" except StandardError, e: text += """
Sorry, could not create configuration file '%s/bibrank/%s.cfg', either because it already exists or because you lack the rights to create it.
Please create the file in the path given.
""" % (etcdir, get_rnk_code(rnkID)[0][0]) else: text = """Sorry, could not add rank method, rank method with the same BibRank code probably exists.""" output += text elif not rnkcode and confirm not in [-1, "-1"]: output += """Sorry, could not add rank method, not enough data submitted.""" try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle + """   [?]""" % weburl, body) def perform_modifyrank(rnkID, rnkcode='', ln=cdslang, template='', cfgfile='', confirm=0): """form to modify a rank method rnkID - id of the rank method """ subtitle = 'Step 1 - Please modify the wanted values below' if not rnkcode: oldcode = get_rnk_code(rnkID)[0] else: oldcode = rnkcode output = """
When changing the BibRank code of a rank method, you must also change any scheduled tasks using the old value.
For more information, please go to the BibRank guide and read the section about modifying a rank method's BibRank code.
""" % weburl text = """ BibRank code
""" % (oldcode) try: text += """Cfg file""" textarea = "" if cfgfile: textarea +=cfgfile else: file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0])) for line in file.readlines(): textarea += line text += """""" except StandardError, e: text += """Cannot load file, either it does not exist, or not enough rights to read it: '%s/bibrank/%s.cfg'
Please create the file in the path given.
""" % (etcdir, get_rnk_code(rnkID)[0][0]) output += createhiddenform(action="modifyrank", text=text, rnkID=rnkID, button="Modify", confirm=1) if rnkcode and confirm in ["1", 1] and get_rnk_code(rnkID)[0][0] != rnkcode: oldcode = get_rnk_code(rnkID)[0][0] result = modify_rnk(rnkID, rnkcode) subtitle = "Step 3 - Result" if result: text = """Rank method modified.""" try: file = open("%s/bibrank/%s.cfg" % (etcdir, oldcode), 'r') file2 = open("%s/bibrank/%s.cfg" % (etcdir, rnkcode), 'w') lines = file.readlines() for line in lines: file2.write(line) file.close() file2.close() os.remove("%s/bibrank/%s.cfg" % (etcdir, oldcode)) except StandardError, e: text = """Sorry, could not change name of cfg file, must be done manually: '%s/bibrank/%s.cfg'""" % (etcdir, oldcode) else: text = """Sorry, could not modify rank method.""" output += text if cfgfile and confirm in ["1", 1]: try: file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0]), 'w') file.write(cfgfile) file.close() text = """
Configuration file modified: '%s/bibrank/%s.cfg'
""" % (etcdir, get_rnk_code(rnkID)[0][0]) except StandardError, e: text = """
Sorry, could not modify configuration file; please check that you have the rights to do so: '%s/bibrank/%s.cfg'
Please modify the file manually.
""" % (etcdir, get_rnk_code(rnkID)[0][0]) output += text finoutput = addadminbox(subtitle + """   [?]""" % weburl, [output]) output = "" text = """ Select
""" output += createhiddenform(action="modifyrank", text=text, rnkID=rnkID, button="Show template", confirm=0) try: if template: textarea = "" text = """Content:""" file = open("%s/bibrank/%s" % (etcdir, template), 'r') lines = file.readlines() for line in lines: textarea += line file.close() text += """""" output += text except StandardError, e: output += """Cannot load file, either it does not exist, or not enough rights to read it: '%s/bibrank/%s'""" % (etcdir, template) finoutput += addadminbox("View templates", [output]) return finoutput def perform_deleterank(rnkID, ln=cdslang, confirm=0): """form to delete a rank method """ subtitle ='' output = """
WARNING:
When deleting a rank method, you also delete all data related to it, such as its translations, the collections it was attached to, and the data necessary to rank the search results. Any scheduled tasks using the deleted rank method will also stop working.

For more information, please go to the BibRank guide and read the section regarding deleting a rank method.
""" % weburl if rnkID: if confirm in ["0", 0]: rnkNAME = get_def_name(rnkID, "rnkMETHOD")[0][1] subtitle = 'Step 1 - Confirm deletion' text = """Delete rank method '%s'.""" % (rnkNAME) output += createhiddenform(action="deleterank", text=text, button="Confirm", rnkID=rnkID, confirm=1) elif confirm in ["1", 1]: try: rnkNAME = get_def_name(rnkID, "rnkMETHOD")[0][1] rnkcode = get_rnk_code(rnkID)[0][0] table = "" try: config = ConfigParser.ConfigParser() config.readfp(open("%s/bibrank/%s.cfg" % (etcdir, rnkcode), 'r')) table = config.get(config.get('rank_method', "function"), "table") except Exception: pass result = delete_rnk(rnkID, table) subtitle = "Step 2 - Result" if result: text = """Rank method deleted""" try: os.remove("%s/bibrank/%s.cfg" % (etcdir, rnkcode)) text += """
Configuration file deleted: '%s/bibrank/%s.cfg'.""" % (etcdir, rnkcode) except StandardError, e: text += """
Sorry, could not delete configuration file: '%s/bibrank/%s.cfg'.
Please delete the file manually.
""" % (etcdir, rnkcode) else: text = """Sorry, could not delete rank method""" except StandardError, e: text = """Sorry, could not delete rank method, most likely already deleted""" output = text try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle + """   [?]""" % weburl, body) def perform_showrankdetails(rnkID, ln=cdslang): """Returns details about the rank method given by rnkID""" subtitle = """Overview [Modify]""" % (weburl, rnkID, ln) text = """ BibRank code: %s
Last updated by BibRank: """ % (get_rnk_code(rnkID)[0][0]) if get_rnk(rnkID)[0][2]: text += "%s
" % get_rnk(rnkID)[0][2] else: text += "Not yet run.
" output = addadminbox(subtitle, [text]) subtitle = """Rank method statistics""" text = "" try: text = "Not yet implemented" except StandardError, e: text = "BibRank not yet run, cannot show statistics for method" output += addadminbox(subtitle, [text]) subtitle = """Attached to collections [Modify]""" % (weburl, rnkID, ln) text = "" col = get_rnk_col(rnkID, ln) for key, value in col: text+= "%s
" % value if not col: text +="No collections" output += addadminbox(subtitle, [text]) subtitle = """Translations [Modify]""" % (weburl, rnkID, ln) prev_lang = '' trans = get_translations(rnkID) types = get_rnk_nametypes() types = dict(map(lambda x: (x[0], x[1]), types)) text = "" languages = dict(get_languages()) if trans: for lang, type, name in trans: if lang and languages.has_key(lang) and type and name: if prev_lang != lang: prev_lang = lang text += """%s:
""" % (languages[lang]) if types.has_key(type): text+= """'%s'(%s)
""" % (name, types[type]) else: text = """No translations exist""" output += addadminbox(subtitle, [text]) subtitle = """Configuration file: '%s/bibrank/%s.cfg' [Modify]""" % (etcdir, get_rnk_code(rnkID)[0][0], weburl, rnkID, ln) text = "" try: file = open("%s/bibrank/%s.cfg" % (etcdir, get_rnk_code(rnkID)[0][0])) text += """
"""
         for line in file.readlines():
             text += line
         text += """
""" except StandardError, e: text = """Cannot load file, either it does not exist, or not enough rights to read it.""" output += addadminbox(subtitle, [text]) return output def compare_on_val(second, first): return cmp(second[1], first[1]) def get_rnk_code(rnkID): """Returns the name from rnkMETHOD based on argument rnkID - id from rnkMETHOD""" try: res = run_sql("SELECT name FROM rnkMETHOD where id=%s" % (rnkID)) return res except StandardError, e: return () def get_rnk(rnkID=''): """Return one or all rank methods rnkID - return the rank method given, or all if not given""" try: if rnkID: res = run_sql("SELECT id,name,DATE_FORMAT(last_updated, '%%Y-%%m-%%d %%H:%%i:%%s') from rnkMETHOD WHERE id=%s" % rnkID) else: res = run_sql("SELECT id,name,DATE_FORMAT(last_updated, '%%Y-%%m-%%d %%H:%%i:%%s') from rnkMETHOD") return res except StandardError, e: return () def get_translations(rnkID): """Returns the translations in rnkMETHODNAME for a rankmethod rnkID - the id of the rankmethod from rnkMETHOD """ try: res = run_sql("SELECT ln, type, value FROM rnkMETHODNAME where id_rnkMETHOD=%s ORDER BY ln,type" % (rnkID)) return res except StandardError, e: return () def get_rnk_nametypes(): """Return a list of the various translationnames for the rank methods""" type = [] type.append(('ln', 'Long name')) #type.append(('sn', 'Short name')) return type def get_col_nametypes(): """Return a list of the various translationnames for the rank methods""" type = [] type.append(('ln', 'Long name')) return type def get_rnk_col(rnkID, ln=cdslang): """ Returns a list of the collections the given rank method is attached to rnkID - id from rnkMETHOD""" try: res1 = dict(run_sql("SELECT id_collection, '' FROM collection_rnkMETHOD WHERE id_rnkMETHOD=%s" % rnkID)) res2 = get_def_name('', "collection") result = filter(lambda x: res1.has_key(x[0]), res2) return result except StandardError, e: return () def get_templates(): """Read etcdir/bibrank and returns a list of all files with 'template' """ 
templates = [] files = os.listdir(etcdir + "/bibrank/") for file in files: if str.find(file,"template_") != -1: templates.append(file) return templates def attach_col_rnk(rnkID, colID): """attach rank method to collection rnkID - id from rnkMETHOD table colID - id of collection, as in collection table """ try: res = run_sql("INSERT INTO collection_rnkMETHOD(id_collection, id_rnkMETHOD) values (%s,%s)" % (colID, rnkID)) return (1, "") except StandardError, e: return (0, e) def detach_col_rnk(rnkID, colID): """detach rank method from collection rnkID - id from rnkMETHOD table colID - id of collection, as in collection table """ try: res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_collection=%s AND id_rnkMETHOD=%s" % (colID, rnkID)) return (1, "") except StandardError, e: return (0, e) def delete_rnk(rnkID, table=""): """Deletes all data for the given rank method rnkID - delete all data in the tables associated with ranking and this id """ try: res = run_sql("DELETE FROM rnkMETHOD WHERE id=%s" % rnkID) res = run_sql("DELETE FROM rnkMETHODNAME WHERE id_rnkMETHOD=%s" % rnkID) res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_rnkMETHOD=%s" % rnkID) res = run_sql("DELETE FROM rnkMETHODDATA WHERE id_rnkMETHOD=%s" % rnkID) if table: res = run_sql("truncate %s" % table) res = run_sql("truncate %sR" % table[:-1]) return (1, "") except StandardError, e: return (0, e) def modify_rnk(rnkID, rnkcode): """change the code for the rank method given rnkID - change in rnkMETHOD where id is like this rnkcode - new value for field 'name' in rnkMETHOD """ try: res = run_sql("UPDATE rnkMETHOD set name='%s' WHERE id=%s" % (MySQLdb.escape_string(rnkcode), rnkID)) return (1, "") except StandardError, e: return (0, e) def add_rnk(rnkcode): """Adds a new rank method to rnkMETHOD rnkcode - the "code" for the rank method, to be used by bibrank daemon """ try: res = run_sql("INSERT INTO rnkMETHOD(name) VALUES('%s')" % MySQLdb.escape_string(rnkcode)) res = run_sql("SELECT id FROM 
rnkMETHOD WHERE name='%s'" % MySQLdb.escape_string(rnkcode)) if res: return (1, res[0][0]) else: raise StandardError except StandardError, e: return (0, e) def addadminbox(header='', datalist=[], cls="admin_wvar"): """used to create table around main data on a page, row based. header - header on top of the table datalist - list of the data to be added row by row cls - the CSS class used to format the look of the table.""" if len(datalist) == 1: per = '100' else: per = '75' output = '\n' output += """ """ % (len(datalist), header) output += ' \n' output += """ """ % (per+'%', datalist[0]) if len(datalist) > 1: output += """ """ % ('25%', datalist[1]) output += ' \n' output += """
%s
%s %s
""" return output def tupletotable(header=[], tuple=[], start='', end='', extracolumn=''): """create html table for a tuple. header - optional header for the columns tuple - create table of this start - text to be added at the beginning, most likely the beginning of a form end - text to be added at the end, most likely the end of a form. extracolumn - mainly used to put in a button. """ # study first row in tuple for alignment align = [] try: firstrow = tuple[0] if type(firstrow) in [int, long]: align = ['admintdright'] elif type(firstrow) in [str, dict]: align = ['admintdleft'] else: for item in firstrow: if type(item) is int: align.append('admintdright') else: align.append('admintdleft') except IndexError: firstrow = [] tblstr = '' for h in header + ['']: tblstr += ' %s\n' % (h, ) if tblstr: tblstr = ' \n%s\n \n' % (tblstr, ) tblstr = start + '\n' + tblstr # extra column try: extra = '' if type(firstrow) not in [int, long, str, dict]: # for data in firstrow: extra += '\n' % ('admintd', data) for i in range(len(firstrow)): extra += '\n' % (align[i], firstrow[i]) else: extra += ' \n' % (align[0], firstrow) extra += '\n\n' % (len(tuple), extracolumn) except IndexError: extra = '' tblstr += extra # for i in range(1, len(tuple)): for row in tuple[1:]: tblstr += ' \n' # row = tuple[i] if type(row) not in [int, long, str, dict]: # for data in row: tblstr += '\n' % (data,) for i in range(len(row)): tblstr += '\n' % (align[i], row[i]) else: tblstr += ' \n' % (align[0], row) tblstr += ' \n' tblstr += '
%s%s%s\n%s\n
%s%s%s
\n ' tblstr += end return tblstr def tupletotable_onlyselected(header=[], tuple=[], selected=[], start='', end='', extracolumn=''): """create html table for a tuple. header - optional header for the columns tuple - create table of this selected - indexes of selected rows in the tuple start - put this at the beginning end - put this at the end extracolumn - mainly used to put in a button""" tuple2 = [] for index in selected: tuple2.append(tuple[int(index)-1]) return tupletotable(header=header, tuple=tuple2, start=start, end=end, extracolumn=extracolumn) def addcheckboxes(datalist=[], name='authids', startindex=1, checked=[]): """adds checkboxes in front of the listdata. datalist - add checkboxes in front of this list name - name of all the checkboxes, values will be associated with this name startindex - usually 1 because of the header checked - values of checkboxes to be pre-checked """ if not type(checked) is list: checked = [checked] for row in datalist: if 1 or row[0] not in [-1, "-1", 0, "0"]: # always box, check another place chkstr = str(startindex) in checked and 'checked="checked"' or '' row.insert(0, '' % (name, startindex, chkstr)) else: row.insert(0, '') startindex += 1 return datalist def createhiddenform(action="", text="", button="confirm", cnfrm='', **hidden): """create select with hidden values and submit button action - name of the action to perform on submit text - additional text, can also be used to add non hidden input button - value/caption on the submit button cnfrm - if given, must check checkbox to confirm **hidden - dictionary with name=value pairs for hidden input """ output = '
\n' % (action, ) output += '\n
' output += text if cnfrm: output += ' ' for key in hidden.keys(): if type(hidden[key]) is list: for value in hidden[key]: output += ' \n' % (key, value) else: output += ' \n' % (key, hidden[key]) output += '' output += ' \n' % (button, ) output += '
' output += '
\n' return output def adderrorbox(header='', datalist=[]): """used to create table around main data on a page, row based""" try: perc= str(100 // len(datalist)) + '%' except ZeroDivisionError: perc= 1 output = '' output += '' % (len(datalist), header) output += '' for row in [datalist]: output += '' for data in row: output += '' output += '' output += '
%s
' % (perc, ) output += data output += '
' return output def serialize_via_numeric_array_dumps(arr): return Numeric.dumps(arr) def serialize_via_numeric_array_compr(str): return compress(str) def serialize_via_numeric_array_escape(str): return MySQLdb.escape_string(str) def serialize_via_numeric_array(arr): """Serialize Numeric array into a compressed string.""" return serialize_via_numeric_array_escape(serialize_via_numeric_array_compr(serialize_via_numeric_array_dumps(arr))) def deserialize_via_numeric_array(string): """Decompress and deserialize string into a Numeric array.""" return Numeric.loads(decompress(string)) def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return MySQLdb.escape_string(compress(dumps(obj))) def deserialize_via_marshal(string): """Decompress and deserialize string into a Python object via marshal.""" return loads(decompress(string)) def get_languages(): languages = [] for (lang, lang_namelong) in language_list_long(): languages.append((lang, lang_namelong)) languages.sort() return languages def get_def_name(ID, table): """Returns a list of the names, either with the name in the current language, the default language, or just the name from the given table ln - a language supported by cdsware type - the type of value wanted, like 'ln', 'sn'""" name = "name" if table[-1:].isupper(): name = "NAME" try: if ID: res = run_sql("SELECT id,name FROM %s where id=%s" % (table, ID)) else: res = run_sql("SELECT id,name FROM %s" % table) res = list(res) res.sort(compare_on_val) return res except StandardError, e: return [] def get_i8n_name(ID, ln, rtype, table): """Returns a list of the names, either with the name in the current language, the default language, or just the name from the given table ln - a language supported by cdsware type - the type of value wanted, like 'ln', 'sn'""" name = "name" if table[-1:].isupper(): name = "NAME" try: res = "" if ID: res = run_sql("SELECT id_%s,value FROM %s%s where type='%s' and ln='%s' and id_%s=%s" % 
(table, table, name, rtype,ln, table, ID)) else: res = run_sql("SELECT id_%s,value FROM %s%s where type='%s' and ln='%s'" % (table, table, name, rtype,ln)) if ln != cdslang: if ID: res1 = run_sql("SELECT id_%s,value FROM %s%s WHERE ln='%s' and type='%s' and id_%s=%s" % (table, table, name, cdslang, rtype, table, ID)) else: res1 = run_sql("SELECT id_%s,value FROM %s%s WHERE ln='%s' and type='%s'" % (table, table, name, cdslang, rtype)) res2 = dict(res) result = filter(lambda x: not res2.has_key(x[0]), res1) res = res + result if ID: res1 = run_sql("SELECT id,name FROM %s where id=%s" % (table, ID)) else: res1 = run_sql("SELECT id,name FROM %s" % table) res2 = dict(res) result = filter(lambda x: not res2.has_key(x[0]), res1) res = res + result res = list(res) res.sort(compare_on_val) return res except StandardError, e: raise StandardError def get_name(ID, ln, rtype, table): """Returns the value from the table name based on arguments ID - id ln - a language supported by cdsware type - the type of value wanted, like 'ln', 'sn' table - tablename""" name = "name" if table[-1:].isupper(): name = "NAME" try: res = run_sql("SELECT value FROM %s%s WHERE type='%s' and ln='%s' and id_%s=%s" % (table, name, rtype, ln, table, ID)) return res except StandardError, e: return () def modify_translations(ID, langs, sel_type, trans, table): """add or modify translations in tables given by table frmID - the id of the format from the format table sel_type - the name type langs - the languages trans - the translations, in same order as in langs table - the table""" name = "name" if table[-1:].isupper(): name = "NAME" try: for nr in range(0,len(langs)): res = run_sql("SELECT value FROM %s%s WHERE id_%s=%s AND type='%s' AND ln='%s'" % (table, name, table, ID, sel_type, langs[nr][0])) if res: if trans[nr]: res = run_sql("UPDATE %s%s SET value='%s' WHERE id_%s=%s AND type='%s' AND ln='%s'" % (table, name, MySQLdb.escape_string(trans[nr]), table, ID, sel_type, langs[nr][0])) else: res = 
run_sql("DELETE FROM %s%s WHERE id_%s=%s AND type='%s' AND ln='%s'" % (table, name, table, ID, sel_type, langs[nr][0])) else: if trans[nr]: res = run_sql("INSERT INTO %s%s(id_%s, type, ln, value) VALUES (%s,'%s','%s','%s')" % (table, name, table, ID, sel_type, langs[nr][0], MySQLdb.escape_string(trans[nr]))) return (1, "") except StandardError, e: return (0, e) def write_outcome(res): try: if res and res[0] == 1: return """Operation successfully completed.""" elif res: return """Operation failed. Reason:
%s""" % res[1][1] except Exception, e: return """Operation failed. Reason unknown
""" diff --git a/modules/bibsched/bin/bibsched.in b/modules/bibsched/bin/bibsched.in index 62631595e..370c3286f 100644 --- a/modules/bibsched/bin/bibsched.in +++ b/modules/bibsched/bin/bibsched.in @@ -1,756 +1,754 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """BibSched - task management, scheduling and executing system for CDSware """ __version__ = "$Id$" ### -- local configuration section starts here --- cfg_valid_processes = ["bibindex","bibupload","bibreformat","webcoll","bibtaskex","bibrank","oaiharvest"] # which tasks are recognized as valid?
### -- local configuration section ends here --- ## import interesting modules: try: import os import imp import string import sys import MySQLdb import time import sre import getopt import curses import curses.panel from curses.wrapper import wrapper import signal except ImportError, e: print "Error: %s" % e import sys sys.exit(1) try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import * from cdsware.dbquery import run_sql except ImportError, e: print "Error: %s" % e import sys sys.exit(1) def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string. It can handle normal date strings and shifts with respect to now.""" try: date = time.time() shift_re=sre.compile("([-\+]{0,1})([\d]+)([dhms])") factors = {"d":24*3600, "h":3600, "m":60, "s":1} m = shift_re.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date except: return None def get_my_pid(process,args=''): COMMAND = "ps -C %s o '%%p%%a' | grep '%s %s' | sed -n 1p" % (process,process,args) answer = string.strip(os.popen(COMMAND).read()) if answer=='': answer = 0 else: answer = answer[:string.find(answer,' ')] return int(answer) def get_output_channelnames(task_id): "Construct and return filename for stdout and stderr for the task 'task_id'." 
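The get_datetime helper above accepts either an absolute date in format_string or a relative shift such as +1d, -30m or 2h (optional sign, a number, and one of the units d, h, m, s). Manager.run relies on it to compute a task's next runtime from its sleeptime. The following is a minimal Python 3 sketch of the same parsing logic, not the CDSware implementation, written under a few stated assumptions: the name resolve_runtime and its now parameter (added so results are reproducible) are hypothetical; the shift pattern is matched against the whole string with re.fullmatch, whereas the original uses match, which only anchors at the start; and only ValueError is caught instead of a bare except.

```python
import re
import time

# Relative shift: optional sign, a whole number, and a unit (days, hours, minutes, seconds).
SHIFT_RE = re.compile(r"([-+]?)(\d+)([dhms])")
FACTORS = {"d": 24 * 3600, "h": 3600, "m": 60, "s": 1}


def resolve_runtime(spec, now=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Return spec as a local date string in fmt, or None if it cannot be parsed.

    Hypothetical helper. spec is either a relative shift such as "+1d" or "-30m"
    (applied to now, in seconds since the epoch) or an absolute date already in fmt.
    """
    if now is None:
        now = time.time()
    match = SHIFT_RE.fullmatch(spec)  # reject trailing text such as "1d2h"
    try:
        if match:
            sign = -1 if match.group(1) == "-" else 1
            seconds = sign * int(match.group(2)) * FACTORS[match.group(3)]
            return time.strftime(fmt, time.localtime(now + seconds))
        # Absolute date: validate it by parsing it and formatting it again.
        return time.strftime(fmt, time.strptime(spec, fmt))
    except ValueError:
        return None
```

For example, resolve_runtime("-30m") gives the local time half an hour ago, resolve_runtime("2005-03-01 12:00:00") returns the same string unchanged, and resolve_runtime("tomorrow") returns None, mirroring get_datetime's None on failure.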
filenamebase = "%s/bibsched_task_%d" % (logdir, task_id) return [filenamebase + ".log", filenamebase + ".err"] class Manager: def __init__(self): self.helper_modules = cfg_valid_processes self.running = 1 self.footer_move_mode = "[KeyUp/KeyDown Move] [M Select mode] [Q Quit]" self.footer_auto_mode = "[A Manual mode] [1/2 Display Type] [Q Quit]" self.footer_select_mode = "[KeyUp/KeyDown/PgUp/PgDown Select] [L View Log] [1/2 Display Type] [M Move mode] [A Auto mode] [Q Quit]" self.footer_waiting_item = "[R Run] [D Delete]" self.footer_running_item = "[S Sleep] [T Stop] [K Kill]" self.footer_stopped_item = "[I Initialise] [D Delete]" self.footer_sleeping_item = "[W Wake Up]" self.item_status = "" self.selected_line = 2 self.rows = [] self.panel = None self.display = 2 self.first_visible_line = 0 self.move_mode = 0 self.auto_mode = 0 self.currentrow = ["","","","","","",""] wrapper( self.start ) def handle_keys(self): chr = self.stdscr.getch() if chr == -1: return if self.auto_mode and (chr not in (curses.KEY_UP,curses.KEY_DOWN,curses.KEY_PPAGE,curses.KEY_NPAGE,ord("q"),ord("Q"),ord("a"),ord("A"),ord("1"),ord("2"))): self.display_in_footer("in automatic mode") self.stdscr.refresh() elif self.move_mode and (chr not in (curses.KEY_UP,curses.KEY_DOWN,ord("m"),ord("M"),ord("q"),ord("Q"))): self.display_in_footer("in move mode") self.stdscr.refresh() else: if chr == curses.KEY_UP: if self.move_mode: self.move_up() else: self.selected_line = max( self.selected_line - 1 , 2 ) self.repaint() if chr == curses.KEY_PPAGE: self.selected_line = max( self.selected_line - 10 , 2 ) self.repaint() elif chr == curses.KEY_DOWN: if self.move_mode: self.move_down() else: self.selected_line = min(self.selected_line + 1, len(self.rows) + 1 ) self.repaint() elif chr == curses.KEY_NPAGE: self.selected_line = min(self.selected_line + 10, len(self.rows) + 1 ) self.repaint() elif chr == curses.KEY_HOME: self.first_visible_line = 0 self.selected_line = 2 elif chr in (ord("a"), ord("A")): 
self.change_auto_mode() elif chr in (ord("l"), ord("L")): self.openlog() elif chr in (ord("w"), ord("W")): self.wakeup() elif chr in (ord("r"), ord("R")): self.run() elif chr in (ord("s"), ord("S")): self.sleep() elif chr in (ord("k"), ord("K")): self.kill() elif chr in (ord("t"), ord("T")): self.stop() elif chr in (ord("d"), ord("D")): self.delete() elif chr in (ord("i"), ord("I")): self.init() elif chr in (ord("m"), ord("M")): self.change_select_mode() elif chr == ord("1"): self.display = 1 self.first_visible_line = 0 self.selected_line = 2 self.display_in_footer("only done processes are displayed") elif chr == ord("2"): self.display = 2 self.first_visible_line = 0 self.selected_line = 2 self.display_in_footer("only not done processes are displayed") elif chr in (ord("q"), ord("Q")): if curses.panel.top_panel() == self.panel: self.panel.bottom() curses.panel.update_panels() else: self.running = 0 return def set_status(self, id, status): return run_sql("UPDATE schTASK set status=%s WHERE id=%s", (status, id)) def set_progress(self, id, progress): return run_sql("UPDATE schTASK set progress=%s WHERE id=%s", (progress, id)) def openlog(self): self.win = curses.newwin( self.height-2, self.width-2, 1, 1 ) self.panel = curses.panel.new_panel( self.win ) self.panel.top() self.win.border() self.win.addstr(1, 1, "Not implemented yet...") self.win.refresh() curses.panel.update_panels() def count_processes(self,status): out = 0 res = run_sql("SELECT COUNT(id) FROM schTASK WHERE status=%s GROUP BY status", (status,)) try: out = res[0][0] except: pass return out def wakeup(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] if self.count_processes('RUNNING') + self.count_processes('CONTINUING') >= 1: self.display_in_footer("a process is already running!") elif status == "SLEEPING": mypid = get_my_pid(process,str(id)) if mypid!=0: os.kill(mypid, signal.SIGCONT) self.display_in_footer("process woken up") else: self.display_in_footer("process 
is not sleeping") self.stdscr.refresh() def run(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] sleeptime = self.currentrow[4] if self.count_processes('RUNNING') + self.count_processes('CONTINUING') >= 1: self.display_in_footer("a process is already running!") elif status == "STOPPED" or status == "WAITING": if process in self.helper_modules: program=os.path.join( bindir, process ) fdout, fderr = get_output_channelnames(id) COMMAND = "%s %s >> %s 2>> %s &" % (program, str(id), fdout, fderr) os.system(COMMAND) Log("manually running task #%d (%s)" % (id, process)) if sleeptime: next_runtime=get_datetime(sleeptime) run_sql("INSERT INTO schTASK (proc, user, runtime, sleeptime, arguments, status) "\ " VALUES (%s,%s,%s,%s,%s,'WAITING')", (process,self.currentrow[2], next_runtime,sleeptime,self.currentrow[7])) else: self.display_in_footer("process status should be STOPPED or WAITING!") self.stdscr.refresh() def sleep(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] if status!='RUNNING' and status!='CONTINUING': self.display_in_footer("this process is not running!") else: mypid = get_my_pid(process,str(id)) if mypid!=0: os.kill(mypid, signal.SIGUSR1) self.display_in_footer("SLEEP signal sent to process #%s" % mypid) else: self.set_status(id,'STOPPED') self.display_in_footer("cannot find process...") self.stdscr.refresh() def kill(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] mypid = get_my_pid(process,str(id)) if mypid!=0: os.kill(mypid, signal.SIGKILL) self.set_status(id,'STOPPED') self.display_in_footer("KILL signal sent to process #%s" % mypid) else: self.set_status(id,'STOPPED') self.display_in_footer("cannot find process...") self.stdscr.refresh() def stop(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] mypid = get_my_pid(process,str(id)) if mypid!=0: os.kill(mypid, signal.SIGINT) self.display_in_footer("INT 
signal sent to process #%s" % mypid) else: self.set_status(id,'STOPPED') self.display_in_footer("cannot find process...") self.stdscr.refresh() def delete(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] if status!='RUNNING' and status!='CONTINUING' and status!='SLEEPING': self.set_status(id,"%s_DELETED" % status) self.display_in_footer("process deleted") self.selected_line = max(self.selected_line, 2) else: self.display_in_footer("cannot delete running processes") self.stdscr.refresh() def init(self): id = self.currentrow[0] process = self.currentrow[1] status = self.currentrow[5] if status!='RUNNING' and status!='CONTINUING' and status!='SLEEPING': self.set_status(id,"WAITING") self.set_progress(id,"None") self.display_in_footer("process initialised") else: self.display_in_footer("cannot initialise running processes") self.stdscr.refresh() def change_select_mode(self): if self.move_mode: self.move_mode = 0 else: status = self.currentrow[5] if status in ( "RUNNING" , "CONTINUING" , "SLEEPING" ): self.display_in_footer("cannot move running processes!") else: self.move_mode = 1 self.stdscr.refresh() def change_auto_mode(self): if self.auto_mode: program = os.path.join( bindir, "bibsched") COMMAND = "%s -q stop" % program os.system(COMMAND) self.auto_mode = 0 else: program = os.path.join( bindir, "bibsched") COMMAND = "%s -q start" % program os.system(COMMAND) self.auto_mode = 1 self.move_mode = 0 self.stdscr.refresh() def move_up(self): self.display_in_footer("not implemented yet") self.stdscr.refresh() def move_down(self): self.display_in_footer("not implemented yet") self.stdscr.refresh() def put_line(self, row): col_w = [ 5 , 11 , 21 , 21 , 7 , 11 , 25 ] maxx = self.width if self.y == self.selected_line - self.first_visible_line and self.y > 1: if self.auto_mode: attr = curses.color_pair(2) + curses.A_STANDOUT + curses.A_BOLD elif self.move_mode: attr = curses.color_pair(7) + curses.A_STANDOUT + curses.A_BOLD else: attr = 
curses.color_pair(8) + curses.A_STANDOUT + curses.A_BOLD self.item_status = row[5] self.currentrow = row elif self.y == 0: if self.auto_mode: attr = curses.color_pair(2) + curses.A_STANDOUT + curses.A_BOLD elif self.move_mode: attr = curses.color_pair(7) + curses.A_STANDOUT + curses.A_BOLD else: attr = curses.color_pair(8) + curses.A_STANDOUT + curses.A_BOLD elif row[5] == "DONE": attr = curses.color_pair(5) + curses.A_BOLD elif row[5] == "STOPPED": attr = curses.color_pair(6) + curses.A_BOLD elif sre.search("ERROR",row[5]): attr = curses.color_pair(4) + curses.A_BOLD elif row[5] == "WAITING": attr = curses.color_pair(3) + curses.A_BOLD elif row[5] in ("RUNNING","CONTINUING") : attr = curses.color_pair(2) + curses.A_BOLD else: attr = curses.A_BOLD myline = str(row[0]).ljust(col_w[0]) myline += str(row[1]).ljust(col_w[1]) myline += str(row[2]).ljust(col_w[2]) myline += str(row[3])[:19].ljust(col_w[3]) myline += str(row[4]).ljust(col_w[4]) myline += str(row[5]).ljust(col_w[5]) myline += str(row[6]).ljust(col_w[6]) myline = myline.ljust(maxx) self.stdscr.addnstr(self.y, 0, myline, maxx, attr) self.y = self.y+1 def display_in_footer(self, footer, i = 0, print_time_p=0): if print_time_p: footer = "%s %s" % (footer, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())) maxx = self.stdscr.getmaxyx()[1] footer = footer.ljust(maxx) if self.auto_mode: colorpair = 2 elif self.move_mode: colorpair = 7 else: colorpair = 1 self.stdscr.addnstr(self.y - i, 0, footer, maxx - 1, curses.A_STANDOUT + curses.color_pair(colorpair) + curses.A_BOLD ) def repaint(self): self.y = 0 self.stdscr.clear() self.height,self.width = self.stdscr.getmaxyx() maxy = self.height - 2 maxx = self.width self.put_line( ("ID","PROC","USER","RUNTIME","SLEEP","STATUS","PROGRESS") ) self.put_line( ("---","----","----","-------------------","-----","-----","--------") ) if self.selected_line > maxy + self.first_visible_line - 1: self.first_visible_line = self.selected_line - maxy + 1 if self.selected_line < 
self.first_visible_line + 2: self.first_visible_line = self.selected_line - 2 for row in self.rows[self.first_visible_line:self.first_visible_line+maxy-2]: id,proc,user,runtime,sleeptime,status,progress,arguments = row self.put_line( row ) self.y = self.stdscr.getmaxyx()[0] - 1 if self.auto_mode: self.display_in_footer(self.footer_auto_mode, print_time_p=1) elif self.move_mode: self.display_in_footer(self.footer_move_mode, print_time_p=1) else: self.display_in_footer(self.footer_select_mode, print_time_p=1) footer2 = "" if sre.search("DONE",self.item_status) or self.item_status == "ERROR" or self.item_status == "STOPPED": footer2 += self.footer_stopped_item elif self.item_status == "RUNNING" or self.item_status == "CONTINUING": footer2 += self.footer_running_item elif self.item_status == "SLEEPING": footer2 += self.footer_sleeping_item elif self.item_status == "WAITING": footer2 += self.footer_waiting_item self.display_in_footer(footer2,1) self.stdscr.refresh() def start(self, stdscr): ring = 0 curses.start_color() curses.init_pair(8, curses.COLOR_WHITE, curses.COLOR_BLACK) curses.init_pair(1, curses.COLOR_WHITE, curses.COLOR_RED) curses.init_pair(2, curses.COLOR_GREEN, curses.COLOR_BLACK) curses.init_pair(3, curses.COLOR_MAGENTA, curses.COLOR_BLACK) curses.init_pair(4, curses.COLOR_RED, curses.COLOR_BLACK) curses.init_pair(5, curses.COLOR_BLUE, curses.COLOR_BLACK) curses.init_pair(6, curses.COLOR_CYAN, curses.COLOR_BLACK) curses.init_pair(7, curses.COLOR_YELLOW, curses.COLOR_BLACK) self.stdscr = stdscr self.stdscr.nodelay(1) self.base_panel = curses.panel.new_panel( self.stdscr ) self.base_panel.bottom() curses.panel.update_panels() self.height,self.width = stdscr.getmaxyx() self.stdscr.clear() if server_pid (): self.auto_mode = 1 if self.display == 1: where = "and status='DONE'" order = "DESC" else: where = "and status!='DONE'" order = "ASC" self.rows = run_sql("SELECT id,proc,user,runtime,sleeptime,status,progress,arguments FROM schTASK WHERE status NOT LIKE 
'%%DELETED%%' %s ORDER BY runtime %s" % (where,order)) self.repaint() while self.running: time.sleep( 0.1 ) self.handle_keys() if ring == 20: if self.display == 1: where = "and status='DONE'" order = "DESC" else: where = "and status!='DONE'" order = "ASC" self.rows = run_sql("SELECT id,proc,user,runtime,sleeptime,status,progress,arguments FROM schTASK WHERE status NOT LIKE '%%DELETED%%' %s ORDER BY runtime %s" % (where,order)) ring = 0 self.repaint() else: ring = ring+1 class BibSched: def __init__(self): self.helper_modules = cfg_valid_processes self.running = {} self.sleep_done = {} self.sleep_sent ={} self.stop_sent = {} self.suicide_sent = {} def set_status(self, id, status): return run_sql("UPDATE schTASK set status=%s WHERE id=%s", (status, id)) def can_run( self, proc ): return len( self.running.keys() ) == 0 def get_running_processes(self): row = None res = run_sql("SELECT id,proc,user,UNIX_TIMESTAMP(runtime),sleeptime,arguments,status FROM schTASK "\ " WHERE status='RUNNING' or status='CONTINUING' LIMIT 1") try: row = res[0] except: pass return row def handle_row( self, row ): id,proc,user,runtime,sleeptime,arguments,status = row if status == "SLEEP": if id in self.running.keys(): self.set_status( id, "SLEEP SENT" ) os.kill( self.running[id], signal.SIGUSR1 ) self.sleep_sent[id] = self.running[id] elif status == "SLEEPING": if id in self.sleep_sent.keys(): self.sleep_done[id] = self.sleep_sent[id] del self.sleep_sent[id] if status == "WAKEUP": if id in self.sleep_done.keys(): self.running[id] = self.sleep_done[id] del self.sleep_done[id] os.kill( self.running[id], signal.SIGCONT ) self.set_status( id, "RUNNING" ) elif status == "STOP": if id in self.running.keys(): self.set_status( id, "STOP SENT" ) os.kill( self.running[id], signal.SIGUSR2 ) self.stop_sent[id] = self.running[id] del self.running[id] elif status == "STOPPED" and id in self.stop_sent.keys(): del self.stop_sent[id] elif status == "SUICIDE": if id in self.running.keys(): self.set_status( id, 
"SUICIDE SENT" ) os.kill( self.running[id], signal.SIGABRT ) self.suicide_sent[id] = self.running[id] del self.running[id] elif status == "SUICIDED" and id in self.suicide_sent.keys(): del self.suicide_sent[ id ] elif sre.search("DONE",status) and id in self.running.keys(): del self.running[id] elif self.can_run(proc) and status == "WAITING" and runtime <= time.time(): if proc in self.helper_modules: program=os.path.join( bindir, proc ) fdout, fderr = get_output_channelnames(id) COMMAND = "%s %s >> %s 2>> %s" % (program, str(id), fdout, fderr) Log("task #%d (%s) started" % (id, proc)) os.system(COMMAND) Log("task #%d (%s) ended" % (id, proc)) self.running[id] = get_my_pid(proc,str(id)) if sleeptime: next_runtime=get_datetime(sleeptime) run_sql("INSERT INTO schTASK (proc, user, runtime, sleeptime, arguments, status) "\ " VALUES (%s,%s,%s,%s,%s,'WAITING')", (proc, user, next_runtime, sleeptime, arguments)) def watch_loop(self): running_process = self.get_running_processes() if running_process: proc = running_process[ 1 ] id = running_process[ 0 ] if get_my_pid(proc,str(id)): self.running[id] = get_my_pid(proc,str(id)) else: self.set_status(id,"ERROR") rows = [] while 1: for row in rows: self.handle_row( row ) time.sleep(1) rows = run_sql("SELECT id,proc,user,UNIX_TIMESTAMP(runtime),sleeptime,arguments,status FROM schTASK ORDER BY runtime ASC") def Log(message): log=open(logdir + "/bibsched.log","a") log.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) log.write(message) log.write("\n") log.close() def redirect_stdout_and_stderr(): "This function redirects stdout and stderr to bibsched.log and bibsched.err file." 
sys.stdout = open(logdir + "/bibsched.log", "a") sys.stderr = open(logdir + "/bibsched.err", "a") def usage(exitcode=1, msg=""): """Prints usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write ("""\ Usage: %s [options] [start|stop|restart|monitor] The following commands are available for bibsched: - start: start bibsched in background - stop: stop a running bibsched - restart: restart a running bibsched - monitor: enter the interactive monitor Command options: -d, --daemon\t Launch BibSched in the daemon mode (deprecated, use 'start') General options: -h, --help \t\t Print this help. -V, --version \t\t Print version information. """ % sys.argv [0]) #sys.stderr.write(" -v, --verbose=LEVEL \t Verbose level (0=min, 1=default, 9=max).\n") sys.exit(exitcode) import pwd, grp prefix = '@prefix@' pidfile = os.path.join (prefix, 'var', 'run', 'bibsched.pid') def error (msg): print >> sys.stderr, "error: " + msg sys.exit (1) def server_pid (): # The pid must be stored on the filesystem try: pid = int (open (pidfile).read ()) except IOError: return None # Even if the pid is available, we check if it corresponds to an # actual process, as it might have been killed externally try: os.kill (pid, signal.SIGCONT) except OSError: return None return pid def start (verbose = True): """ Fork this process in the background and start processing requests. 
The process PID is stored in a pid file, so that it can be stopped later on.""" if verbose: sys.stdout.write ("starting bibsched: ") sys.stdout.flush () pid = server_pid () if pid: error ("another instance of bibsched (pid %d) is running" % pid) # start the child process using the "double fork" technique pid = os.fork () if pid > 0: sys.exit (0) os.setsid () os.chdir ('/') pid = os.fork () if pid > 0: if verbose: sys.stdout.write ('pid %d\n' % pid) Log ("daemon started (pid %d)" % pid) open (pidfile, 'w').write ('%d' % pid) return sys.stdin.close () redirect_stdout_and_stderr () sched = BibSched() sched.watch_loop () return def stop (verbose = True): pid = server_pid () if not pid: error ('bibsched seems not to be running.') try: os.kill (pid, signal.SIGKILL) except OSError: print >> sys.stderr, 'no bibsched process found' Log ("daemon stopped (pid %d)" % pid) if verbose: print "stopping bibsched: pid %d" % pid os.unlink (pidfile) return def monitor (verbose = True): redirect_stdout_and_stderr() manager = Manager () return def restart (verbose = True): stop (verbose) start (verbose) return def main(): verbose = True try: opts, args = getopt.getopt(sys.argv[1:], "hVdq", [ "help","version","daemon", "quiet"]) except getopt.GetoptError, err: Log ("Error: %s" % err) usage(1, err) for opt, arg in opts: if opt in ["-h", "--help"]: usage (0) elif opt in ["-V", "--version"]: print __version__ sys.exit(0) elif opt in ["-d", "--daemon"]: redirect_stdout_and_stderr () sched = BibSched() Log("daemon started") sched.watch_loop() elif opt in ['-q', '--quiet']: verbose = False else: usage(1) try: cmd = args [0] except IndexError: cmd = 'monitor' try: { 'start': start, 'stop': stop, 'restart': restart, 'monitor': monitor } [cmd] (verbose) except KeyError: usage (1, 'unknown command: %s' % `cmd`) return if __name__ == '__main__': main() diff --git a/modules/bibsched/bin/bibtaskex.in b/modules/bibsched/bin/bibtaskex.in index f24123a2d..81a43786d 100644 --- 
a/modules/bibsched/bin/bibtaskex.in +++ b/modules/bibsched/bin/bibtaskex.in @@ -1,335 +1,333 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware Bibliographic Task Example. Demonstrates BibTask <-> BibSched connectivity, signal handling, error handling, etc. """ __version__ = "$Id$" ## import interesting modules: try: import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action import getopt import getpass import marshal import signal import sre import string import sys import time import traceback except ImportError, e: print "Error: %s" % e import sys sys.exit(1) cfg_n_default = 30 # how many Fibonacci numbers to calculate if none submitted? def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string. 
It can handle normal date strings and shifts with respect to now.""" date = time.time() shift_re=sre.compile("([-\+]{0,1})([\d]+)([dhms])") factors = {"d":24*3600, "h":3600, "m":60, "s":1} m = shift_re.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date def write_message(msg, stream=sys.stdout): """Write message and flush output stream (may be sys.stdout or sys.stderr). Useful for debugging stuff.""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) return def task_sig_sleep(sig, frame): """Signal handler for the 'sleep' signal sent by BibSched.""" write_message("sleeping...") task_update_status("SLEEPING") signal.pause() # wait for wake-up signal def task_sig_wakeup(sig, frame): """Signal handler for the 'wakeup' signal sent by BibSched.""" write_message("continuing...") task_update_status("CONTINUING") def task_sig_stop(sig, frame): """Signal handler for the 'stop' signal sent by BibSched.""" write_message("stopping...") task_update_status("STOPPING") write_message("flushing cache or whatever...") time.sleep(3) write_message("closing tables or whatever...") time.sleep(1) write_message("stopped") task_update_status("STOPPED") sys.exit(0) def task_sig_suicide(sig, frame): """Signal handler for the 'suicide' signal sent by BibSched.""" write_message("suiciding myself now...") task_update_status("SUICIDING") write_message("suicided") task_update_status("SUICIDED") sys.exit(0) def task_sig_unknown(sig, frame): """Signal handler for the other unknown signals sent by shell or user.""" write_message("unknown signal 
%d ignored" % sig) # do nothing for other signals def fib(n): """Returns Fibonacci number for 'n'.""" out = 1 if n >= 2: out = fib(n-2) + fib(n-1) return out def authenticate(user, header="BibTaskEx Task Submission", action="runbibtaskex"): """Authenticate the user against the user database. Check for its password, if it exists. Check for action access rights. Return user name upon authorization success, do system exit upon authorization failure. """ print header print "=" * len(header) if user == "": print >> sys.stdout, "\rUsername: ", user = string.strip(string.lower(sys.stdin.readline())) else: print >> sys.stdout, "\rUsername: ", user ## first check user pw: res = run_sql("select id,password from user where email=%s", (user,), 1) if not res: print "Sorry, %s does not exist." % user sys.exit(1) else: (uid_db, password_db) = res[0] if password_db: password_entered = getpass.getpass() if password_db == password_entered: pass else: print "Sorry, wrong credentials for %s." % user sys.exit(1) ## secondly check authorization for the action: (auth_code, auth_message) = acc_authorize_action(uid_db, action) if auth_code != 0: print auth_message sys.exit(1) return user def task_submit(options): """Submits task to the BibSched task queue. This is what people will be invoking via command line.""" ## sanity check: remove eventual "task" option: if options.has_key("task"): del options["task"] ## authenticate user: user = authenticate(options.get("user", "")) ## submit task: if options["verbose"] >= 9: print "" write_message("storing task options %s\n" % options) task_id = run_sql("""INSERT INTO schTASK (id,proc,user,runtime,sleeptime,status,arguments) VALUES (NULL,'bibtaskex',%s,%s,%s,'WAITING',%s)""", (user, options["runtime"], options["sleeptime"], marshal.dumps(options))) ## update task number: options["task"] = task_id run_sql("""UPDATE schTASK SET arguments=%s WHERE id=%s""", (marshal.dumps(options),task_id)) write_message("Task #%d submitted." 
% task_id) return task_id def task_update_progress(msg): """Updates progress information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task progress to %s." % msg) return run_sql("UPDATE schTASK SET progress=%s where id=%s", (msg, task_id)) def task_update_status(val): """Updates status information in the BibSched task table.""" global task_id, options if options["verbose"] >= 9: write_message("Updating task status to %s." % val) return run_sql("UPDATE schTASK SET status=%s where id=%s", (val, task_id)) def task_read_status(task_id): """Reads status information from the BibSched task table.""" res = run_sql("SELECT status FROM schTASK where id=%s", (task_id,), 1) try: out = res[0][0] except: out = 'UNKNOWN' return out def task_get_options(id): """Returns options for the task 'id' read from the BibSched task queue table.""" out = {} res = run_sql("SELECT arguments FROM schTASK WHERE id=%s AND proc='bibtaskex'", (id,)) try: out = marshal.loads(res[0][0]) except: write_message("Error: BibTaskEx task %d does not seem to exist." % id, sys.stderr) sys.exit(1) return out def task_run(): """Runs the task by fetching arguments from the BibSched task queue. This is what BibSched will be invoking via daemon call. The task prints Fibonacci numbers for up to NUM on the stdout, and some messages on stderr. Return 1 in case of success and 0 in case of failure.""" global task_id, options options = task_get_options(task_id) # get options from BibSched task table ## check task id: if not options.has_key("task"): write_message("Error: The task #%d does not seem to be a BibTaskEx task." % task_id, sys.stderr) return 0 ## check task status: task_status = task_read_status(task_id) if task_status != "WAITING": write_message("Error: The task #%d is %s. I expected WAITING." % (task_id, task_status), sys.stderr) return 0 ## we can run the task now: if options["verbose"]: write_message("Task #%d started." 
% task_id) task_update_status("RUNNING") ## initialize signal handler: signal.signal(signal.SIGUSR1, task_sig_sleep) signal.signal(signal.SIGTERM, task_sig_stop) signal.signal(signal.SIGABRT, task_sig_suicide) signal.signal(signal.SIGCONT, task_sig_wakeup) signal.signal(signal.SIGINT, task_sig_unknown) ## run the task: if options.has_key("number"): n = options["number"] else: n = cfg_n_default if options["verbose"] >= 9: write_message("Printing %d Fibonacci numbers." % n) for i in range(0,n): if i > 0 and i % 4 == 0: if options["verbose"] >= 3: write_message("Error: water in the CPU. Ignoring and continuing.", sys.stderr) elif i > 0 and i % 5 == 0: if options["verbose"]: write_message("Error: floppy drive dropped on the floor. Ignoring and continuing.", sys.stderr) if options["verbose"]: write_message("fib(%d)=%d" % (i, fib(i))) task_update_progress("Done %d out of %d." % (i,n)) time.sleep(1) ## we are done: task_update_progress("Done %d out of %d." % (n,n)) task_update_status("DONE") if options["verbose"]: write_message("Task #%d finished." % task_id) return 1 def usage(exitcode=1, msg=""): """Prints usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("Usage: %s [options]\n" % sys.argv[0]) sys.stderr.write("Command options:\n") sys.stderr.write(" -n, --number=NUM\t Print Fibonacci numbers for up to NUM. 
[default=%d]\n" % cfg_n_default) sys.stderr.write("Scheduling options:\n") sys.stderr.write(" -u, --user=USER \t User name to submit the task as, password needed.\n") sys.stderr.write(" -t, --runtime=TIME \t Time to execute the task (now), e.g.: +15s, 5m, 3h, 2002-10-27 13:57:26\n") sys.stderr.write(" -s, --sleeptime=SLEEP \t Sleeping frequency after which to repeat task (no), e.g.: 30m, 2h, 1d\n") sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.stderr.write(" -v, --verbose=LEVEL \t Verbose level (0=min, 1=default, 9=max).\n") sys.exit(exitcode) def main(): """Main function that analyzes command line input and calls whatever is appropriate. Useful for learning on how to write BibSched tasks.""" global task_id, options ## parse command line: if len(sys.argv) == 2 and sys.argv[1].isdigit(): ## A - run the task task_id = int(sys.argv[1]) try: if not task_run(): write_message("Error occurred. Exiting.", sys.stderr) except StandardError, e: write_message("Unexpected error occurred: %s." 
% e, sys.stderr) write_message("Traceback is:", sys.stderr) traceback.print_tb(sys.exc_info()[2]) write_message("Exiting.", sys.stderr) task_update_status("ERROR") else: ## B - submit the task # set default values: options = {} options["runtime"] = time.strftime("%Y-%m-%d %H:%M:%S") options["sleeptime"] = "" options["verbose"] = 1 # set user-defined options: try: opts, args = getopt.getopt(sys.argv[1:], "hVv:n:u:s:t:", ["help", "version", "verbose=","number=","user=","sleeptime=","runtime="]) except getopt.GetoptError, err: usage(1, err) try: for opt in opts: if opt[0] in ["-h", "--help"]: usage(0) elif opt[0] in ["-V", "--version"]: print __version__ sys.exit(0) elif opt[0] in [ "-u", "--user"]: options["user"] = opt[1] elif opt[0] in ["-v", "--verbose"]: options["verbose"] = int(opt[1]) elif opt[0] in ["-n", "--number"]: options["number"]=int(opt[1]) elif opt[0] in [ "-s", "--sleeptime" ]: get_datetime(opt[1]) # see if it is a valid shift options["sleeptime"] = opt[1] elif opt[0] in [ "-t", "--runtime" ]: options["runtime"] = get_datetime(opt[1]) else: usage(1) except StandardError, e: usage(1, e) task_submit(options) return ### okay, here we go: if __name__ == '__main__': main() diff --git a/modules/elmsubmit/bin/elmsubmit.in b/modules/elmsubmit/bin/elmsubmit.in index eedbbdec9..3fff84bf4 100644 --- a/modules/elmsubmit/bin/elmsubmit.in +++ b/modules/elmsubmit/bin/elmsubmit.in @@ -1,31 +1,29 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version.
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. import sys -pylibdir = "@prefix@/lib/python" -sys.path.append('%s' % pylibdir) import cdsware.elmsubmit as elmsubmit if __name__ == '__main__': elmsubmit.process_email(sys.stdin.read()) diff --git a/modules/miscutil/bin/testsuite.in b/modules/miscutil/bin/testsuite.in index f4765989a..72151f9f0 100644 --- a/modules/miscutil/bin/testsuite.in +++ b/modules/miscutil/bin/testsuite.in @@ -1,77 +1,75 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Run CDSware test suite.""" try: import unittest import sys except ImportError, e: print "Error: %s" % e import sys sys.exit(1) try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import version from cdsware import search_engine_tests from cdsware import bibindex_engine_tests from cdsware import bibindex_engine_stemmer_tests from cdsware import bibrecord_tests from cdsware import bibrank_citation_indexer_tests from cdsware import bibrank_citation_searcher_tests from cdsware import bibrank_downloads_indexer_tests from cdsware import bibrank_record_sorter_tests from cdsware import bibrank_tag_based_indexer_tests from cdsware import oai_repository_tests from cdsware import bibconvert_tests from cdsware import errorlib_tests except ImportError, e: print "Error: %s" % e import sys sys.exit(1) def create_all_test_suites(): """Return all tests suites for all CDSware modules.""" return unittest.TestSuite((search_engine_tests.create_test_suite(), bibindex_engine_tests.create_test_suite(), bibindex_engine_stemmer_tests.create_test_suite(), bibrecord_tests.create_test_suite(), bibrank_citation_indexer_tests.create_test_suite(), bibrank_citation_searcher_tests.create_test_suite(), bibrank_downloads_indexer_tests.create_test_suite(), bibrank_record_sorter_tests.create_test_suite(), bibrank_tag_based_indexer_tests.create_test_suite(), oai_repository_tests.create_test_suite(), bibconvert_tests.create_test_suite(), errorlib_tests.create_test_suite())) def print_info_line(): """Prints info line about tests to be executed.""" info_line = """CDSware v%s test suite results:""" % version print info_line print "=" * len(info_line) if __name__ == "__main__": print_info_line() unittest.TextTestRunner(verbosity=2).run(create_all_test_suites()) diff --git a/modules/miscutil/lib/messages.py.wml b/modules/miscutil/lib/messages.py.wml index 2bfe8321c..0a6144e76 100644 --- a/modules/miscutil/lib/messages.py.wml +++ 
b/modules/miscutil/lib/messages.py.wml @@ -1,1921 +1,1921 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. #include "config.wml" #include "cdswmllib.wml" ## -*- coding: utf-8 -*- ## $Id$ ## DO NOT EDIT THIS FILE! IT WAS AUTOMATICALLY GENERATED FROM CDSware WML SOURCES. 
"""CDSware messages file, to be read by all Python programs.""" import string import sre -from config import localedir +from cdsware.config import localedir ## prepare messages: msg_search = {} msg_submit = {} msg_help = {} msg_help_central = {} msg_search_help = {} msg_search_tips = {} msg_personalize = {} msg_collection_not_found_head = {} msg_collection_not_found_body = {} msg_home = {} msg_powered_by = {} msg_maintained_by = {} msg_last_updated = {} msg_narrow_search = {} msg_focus_on = {} msg_simple_search = {} msg_advanced_search = {} msg_account = {} msg_alerts = {} msg_baskets = {} msg_submissions = {} msg_approvals = {} msg_session = {} msg_login = {} msg_logout = {} msg_language = {} msg_browse = {} msg_search_records_for = {} msg_all_of_the_words = {} msg_any_of_the_words = {} msg_exact_phrase = {} msg_partial_phrase = {} msg_regular_expression = {} msg_and = {} msg_or = {} msg_and_not = {} msg_search_results = {} msg_try_your_search_on = {} msg_not_found_what_you_were_looking_for = {} msg_see_also_similar_author_names = {} msg_in = {} msg_sort_by = {} msg_latest_first = {} msg_ascending = {} msg_descending = {} msg_rank_by = {} msg_display_results = {} msg_results = {} msg_single_list = {} msg_split_by_collection = {} msg_output_format = {} msg_brief = {} msg_detailed = {} msg_detailed_record = {} msg_results_overview_found_x_records_in_y_seconds = {} msg_x_records_found = {} msg_jump_to_record = {} msg_search_took_x_seconds = {} msg_add_to_basket = {} msg_collections = {} msg_any_collection = {} msg_add_another_collection = {} msg_guest = {} msg_added_since = {} msg_until = {} msg_any_day = {} msg_any_month = {} msg_any_year = {} msg_january = {} msg_february = {} msg_march = {} msg_april = {} msg_may = {} msg_june = {} msg_july = {} msg_august = {} msg_september = {} msg_october = {} msg_november = {} msg_december = {} msg_search_term = {} msg_inside_index = {} msg_did_not_match = {} msg_no_words_index_available = {} msg_no_phrase_index_available = {} 
msg_no_hits_in_given_collection = {} msg_no_public_hits = {} msg_no_match_within_time_limits = {} msg_no_match_within_search_limits = {} msg_no_boolean_hits = {} msg_no_exact_match_for_foo_using_bar_instead = {} msg_internal_error = {} msg_please_contact_and_quote = {} msg_search_options = {} msg_latest_additions = {} msg_more = {} msg_collection_contains_no_records = {} msg_collection_restricted_content = {} msg_convert = {} msg_agenda = {} msg_webcast = {} msg_bulletin = {} msg_library = {} msg_this_page_in_languages = {} msg_admin_area = {} msg_administration = {} msg_thousands_separator = {} msg_record_deleted = {} msg_restricted = {} msg_record_last_modified = {} msg_similar_records = {} msg_cited_by = {} msg_cited_by_x_records = {} msg_cocited_with_x_records = {} msg_citation_history = {} msg_downloads_history = {} msg_people_who_viewed_this_page = {} msg_people_who_downloaded_this_document = {} ## English: msg_home['en'] = """""" msg_search['en'] = """""" msg_submit['en'] = """""" msg_help['en'] = """""" msg_help_central['en'] = """""" msg_search_help['en'] = """""" msg_search_tips['en'] = """""" msg_personalize['en'] = """""" msg_collection_not_found_head['en'] = """""" msg_collection_not_found_body['en'] = """""" msg_home['en'] = """""" msg_powered_by['en'] = """""" msg_maintained_by['en'] = """""" msg_last_updated['en'] = """""" msg_narrow_search['en'] = """""" msg_focus_on['en'] = """""" msg_simple_search['en'] = """""" msg_advanced_search['en'] = """""" msg_account['en'] = """""" msg_alerts['en'] = """""" msg_baskets['en'] = """""" msg_submissions['en'] = """""" msg_approvals['en'] = """""" msg_session['en'] = """""" msg_login['en'] = """""" msg_logout['en'] = """""" msg_language['en'] = """""" msg_browse['en'] = """""" msg_search_records_for['en'] = """""" msg_all_of_the_words['en'] = """""" msg_any_of_the_words['en'] = """""" msg_exact_phrase['en'] = """""" msg_partial_phrase['en'] = """""" msg_regular_expression['en'] = """""" msg_and['en'] = """""" 
msg_or['en'] = """"""
msg_and_not['en'] = """"""
msg_search_results['en'] = """"""
msg_try_your_search_on['en'] = """"""
msg_not_found_what_you_were_looking_for['en'] = """"""
msg_see_also_similar_author_names['en'] = """"""
msg_in['en'] = """"""
msg_sort_by['en'] = """"""
msg_latest_first['en'] = """"""
msg_ascending['en'] = """"""
msg_descending['en'] = """"""
msg_rank_by['en'] = """"""
msg_display_results['en'] = """"""
msg_results['en'] = """"""
msg_single_list['en'] = """"""
msg_split_by_collection['en'] = """"""
msg_output_format['en'] = """"""
msg_brief['en'] = """"""
msg_detailed['en'] = """"""
msg_detailed_record['en'] = """"""
msg_results_overview_found_x_records_in_y_seconds['en'] = """"""
msg_x_records_found['en'] = """"""
msg_jump_to_record['en'] = """"""
msg_search_took_x_seconds['en'] = """"""
msg_add_to_basket['en'] = """"""
msg_collections['en'] = """"""
msg_any_collection['en'] = """"""
msg_add_another_collection['en'] = """"""
msg_guest['en'] = """"""
msg_added_since['en'] = """"""
msg_until['en'] = """"""
msg_any_day['en'] = """"""
msg_any_month['en'] = """"""
msg_any_year['en'] = """"""
msg_january['en'] = """"""
msg_february['en'] = """"""
msg_march['en'] = """"""
msg_april['en'] = """"""
msg_may['en'] = """"""
msg_june['en'] = """"""
msg_july['en'] = """"""
msg_august['en'] = """"""
msg_september['en'] = """"""
msg_october['en'] = """"""
msg_november['en'] = """"""
msg_december['en'] = """"""
msg_search_term['en'] = """"""
msg_inside_index['en'] = """"""
msg_did_not_match['en'] = """"""
msg_no_words_index_available['en'] = """"""
msg_no_phrase_index_available['en'] = """"""
msg_no_hits_in_given_collection['en'] = """"""
msg_no_public_hits['en'] = """"""
msg_no_match_within_time_limits['en'] = """"""
msg_no_match_within_search_limits['en'] = """"""
msg_no_boolean_hits['en'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['en'] = """"""
msg_internal_error['en'] = """"""
msg_please_contact_and_quote['en'] = """"""
msg_search_options['en'] = """"""
msg_latest_additions['en'] = """"""
msg_more['en'] = """"""
msg_collection_contains_no_records['en'] = """"""
msg_collection_restricted_content['en'] = """"""
msg_convert['en'] = """"""
msg_agenda['en'] = """"""
msg_webcast['en'] = """"""
msg_bulletin['en'] = """"""
msg_library['en'] = """"""
msg_this_page_in_languages['en'] = """"""
msg_admin_area['en'] = """"""
msg_administration['en'] = """"""
msg_thousands_separator['en'] = """"""
msg_record_deleted['en'] = """"""
msg_restricted['en'] = """"""
msg_record_last_modified['en'] = """"""
msg_similar_records['en'] = """"""
msg_cited_by['en'] = """"""
msg_cited_by_x_records['en'] = """"""
msg_cocited_with_x_records['en'] = """"""
msg_citation_history['en'] = """"""
msg_downloads_history['en'] = """"""
msg_people_who_viewed_this_page['en'] = """"""
msg_people_who_downloaded_this_document['en'] = """"""

## French:
msg_home['fr'] = """"""
msg_search['fr'] = """"""
msg_submit['fr'] = """"""
msg_help['fr'] = """"""
msg_help_central['fr'] = """"""
msg_search_help['fr'] = """"""
msg_search_tips['fr'] = """"""
msg_personalize['fr'] = """"""
msg_collection_not_found_head['fr'] = """"""
msg_collection_not_found_body['fr'] = """"""
msg_home['fr'] = """"""
msg_powered_by['fr'] = """"""
msg_maintained_by['fr'] = """"""
msg_last_updated['fr'] = """"""
msg_narrow_search['fr'] = """"""
msg_focus_on['fr'] = """"""
msg_simple_search['fr'] = """"""
msg_advanced_search['fr'] = """"""
msg_account['fr'] = """"""
msg_alerts['fr'] = """"""
msg_baskets['fr'] = """"""
msg_submissions['fr'] = """"""
msg_approvals['fr'] = """"""
msg_session['fr'] = """"""
msg_login['fr'] = """"""
msg_logout['fr'] = """"""
msg_language['fr'] = """"""
msg_browse['fr'] = """"""
msg_search_records_for['fr'] = """"""
msg_all_of_the_words['fr'] = """"""
msg_any_of_the_words['fr'] = """"""
msg_exact_phrase['fr'] = """"""
msg_partial_phrase['fr'] = """"""
msg_regular_expression['fr'] = """"""
msg_and['fr'] = """"""
msg_or['fr'] = """"""
msg_and_not['fr'] = """"""
msg_search_results['fr'] = """"""
msg_try_your_search_on['fr'] = """"""
msg_not_found_what_you_were_looking_for['fr'] = """"""
msg_see_also_similar_author_names['fr'] = """"""
msg_in['fr'] = """"""
msg_sort_by['fr'] = """"""
msg_latest_first['fr'] = """"""
msg_ascending['fr'] = """"""
msg_descending['fr'] = """"""
msg_rank_by['fr'] = """"""
msg_display_results['fr'] = """"""
msg_results['fr'] = """"""
msg_single_list['fr'] = """"""
msg_split_by_collection['fr'] = """"""
msg_output_format['fr'] = """"""
msg_brief['fr'] = """"""
msg_detailed['fr'] = """"""
msg_detailed_record['fr'] = """"""
msg_results_overview_found_x_records_in_y_seconds['fr'] = """"""
msg_x_records_found['fr'] = """"""
msg_jump_to_record['fr'] = """"""
msg_search_took_x_seconds['fr'] = """"""
msg_add_to_basket['fr'] = """"""
msg_collections['fr'] = """"""
msg_any_collection['fr'] = """"""
msg_add_another_collection['fr'] = """"""
msg_guest['fr'] = """"""
msg_added_since['fr'] = """"""
msg_until['fr'] = """"""
msg_any_day['fr'] = """"""
msg_any_month['fr'] = """"""
msg_any_year['fr'] = """"""
msg_january['fr'] = """"""
msg_february['fr'] = """"""
msg_march['fr'] = """"""
msg_april['fr'] = """"""
msg_may['fr'] = """"""
msg_june['fr'] = """"""
msg_july['fr'] = """"""
msg_august['fr'] = """"""
msg_september['fr'] = """"""
msg_october['fr'] = """"""
msg_november['fr'] = """"""
msg_december['fr'] = """"""
msg_search_term['fr'] = """"""
msg_inside_index['fr'] = """"""
msg_did_not_match['fr'] = """"""
msg_no_words_index_available['fr'] = """"""
msg_no_phrase_index_available['fr'] = """"""
msg_no_hits_in_given_collection['fr'] = """"""
msg_no_public_hits['fr'] = """"""
msg_no_match_within_time_limits['fr'] = """"""
msg_no_match_within_search_limits['fr'] = """"""
msg_no_boolean_hits['fr'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['fr'] = """"""
msg_internal_error['fr'] = """"""
msg_please_contact_and_quote['fr'] = """"""
msg_search_options['fr'] = """"""
msg_latest_additions['fr'] = """"""
msg_more['fr'] = """"""
msg_collection_contains_no_records['fr'] = """"""
msg_collection_restricted_content['fr'] = """"""
msg_convert['fr'] = """"""
msg_agenda['fr'] = """"""
msg_webcast['fr'] = """"""
msg_bulletin['fr'] = """"""
msg_library['fr'] = """"""
msg_this_page_in_languages['fr'] = """"""
msg_admin_area['fr'] = """"""
msg_administration['fr'] = """"""
msg_thousands_separator['fr'] = """"""
msg_record_deleted['fr'] = """"""
msg_restricted['fr'] = """"""
msg_record_last_modified['fr'] = """"""
msg_similar_records['fr'] = """"""
msg_cited_by['fr'] = """"""
msg_cited_by_x_records['fr'] = """"""
msg_cocited_with_x_records['fr'] = """"""
msg_citation_history['fr'] = """"""
msg_downloads_history['fr'] = """"""
msg_people_who_viewed_this_page['fr'] = """"""
msg_people_who_downloaded_this_document['fr'] = """"""

## German:
msg_home['de'] = """"""
msg_search['de'] = """"""
msg_submit['de'] = """"""
msg_help['de'] = """"""
msg_help_central['de'] = """"""
msg_search_help['de'] = """"""
msg_search_tips['de'] = """"""
msg_personalize['de'] = """"""
msg_collection_not_found_head['de'] = """"""
msg_collection_not_found_body['de'] = """"""
msg_home['de'] = """"""
msg_powered_by['de'] = """"""
msg_maintained_by['de'] = """"""
msg_last_updated['de'] = """"""
msg_narrow_search['de'] = """"""
msg_focus_on['de'] = """"""
msg_simple_search['de'] = """"""
msg_advanced_search['de'] = """"""
msg_account['de'] = """"""
msg_alerts['de'] = """"""
msg_baskets['de'] = """"""
msg_submissions['de'] = """"""
msg_approvals['de'] = """"""
msg_session['de'] = """"""
msg_login['de'] = """"""
msg_logout['de'] = """"""
msg_language['de'] = """"""
msg_browse['de'] = """"""
msg_search_records_for['de'] = """"""
msg_all_of_the_words['de'] = """"""
msg_any_of_the_words['de'] = """"""
msg_exact_phrase['de'] = """"""
msg_partial_phrase['de'] = """"""
msg_regular_expression['de'] = """"""
msg_and['de'] = """"""
msg_or['de'] = """"""
msg_and_not['de'] = """"""
msg_search_results['de'] = """"""
msg_try_your_search_on['de'] = """"""
msg_not_found_what_you_were_looking_for['de'] = """"""
msg_see_also_similar_author_names['de'] = """"""
msg_in['de'] = """"""
msg_sort_by['de'] = """"""
msg_latest_first['de'] = """"""
msg_ascending['de'] = """"""
msg_descending['de'] = """"""
msg_rank_by['de'] = """"""
msg_display_results['de'] = """"""
msg_results['de'] = """"""
msg_single_list['de'] = """"""
msg_split_by_collection['de'] = """"""
msg_output_format['de'] = """"""
msg_brief['de'] = """"""
msg_detailed['de'] = """"""
msg_detailed_record['de'] = """"""
msg_results_overview_found_x_records_in_y_seconds['de'] = """"""
msg_x_records_found['de'] = """"""
msg_jump_to_record['de'] = """"""
msg_search_took_x_seconds['de'] = """"""
msg_add_to_basket['de'] = """"""
msg_collections['de'] = """"""
msg_any_collection['de'] = """"""
msg_add_another_collection['de'] = """"""
msg_guest['de'] = """"""
msg_added_since['de'] = """"""
msg_until['de'] = """"""
msg_any_day['de'] = """"""
msg_any_month['de'] = """"""
msg_any_year['de'] = """"""
msg_january['de'] = """"""
msg_february['de'] = """"""
msg_march['de'] = """"""
msg_april['de'] = """"""
msg_may['de'] = """"""
msg_june['de'] = """"""
msg_july['de'] = """"""
msg_august['de'] = """"""
msg_september['de'] = """"""
msg_october['de'] = """"""
msg_november['de'] = """"""
msg_december['de'] = """"""
msg_search_term['de'] = """"""
msg_inside_index['de'] = """"""
msg_did_not_match['de'] = """"""
msg_no_words_index_available['de'] = """"""
msg_no_phrase_index_available['de'] = """"""
msg_no_hits_in_given_collection['de'] = """"""
msg_no_public_hits['de'] = """"""
msg_no_match_within_time_limits['de'] = """"""
msg_no_match_within_search_limits['de'] = """"""
msg_no_boolean_hits['de'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['de'] = """"""
msg_internal_error['de'] = """"""
msg_please_contact_and_quote['de'] = """"""
msg_search_options['de'] = """"""
msg_latest_additions['de'] = """"""
msg_more['de'] = """"""
msg_collection_contains_no_records['de'] = """"""
msg_collection_restricted_content['de'] = """"""
msg_convert['de'] = """"""
msg_agenda['de'] = """"""
msg_webcast['de'] = """"""
msg_bulletin['de'] = """"""
msg_library['de'] = """"""
msg_this_page_in_languages['de'] = """"""
msg_admin_area['de'] = """"""
msg_administration['de'] = """"""
msg_thousands_separator['de'] = """"""
msg_record_deleted['de'] = """"""
msg_restricted['de'] = """"""
msg_record_last_modified['de'] = """"""
msg_similar_records['de'] = """"""
msg_cited_by['de'] = """"""
msg_cited_by_x_records['de'] = """"""
msg_cocited_with_x_records['de'] = """"""
msg_citation_history['de'] = """"""
msg_downloads_history['de'] = """"""
msg_people_who_viewed_this_page['de'] = """"""
msg_people_who_downloaded_this_document['de'] = """"""

## Spanish:
msg_home['es'] = """"""
msg_search['es'] = """"""
msg_submit['es'] = """"""
msg_help['es'] = """"""
msg_help_central['es'] = """"""
msg_search_help['es'] = """"""
msg_search_tips['es'] = """"""
msg_personalize['es'] = """"""
msg_collection_not_found_head['es'] = """"""
msg_collection_not_found_body['es'] = """"""
msg_home['es'] = """"""
msg_powered_by['es'] = """"""
msg_maintained_by['es'] = """"""
msg_last_updated['es'] = """"""
msg_narrow_search['es'] = """"""
msg_focus_on['es'] = """"""
msg_simple_search['es'] = """"""
msg_advanced_search['es'] = """"""
msg_account['es'] = """"""
msg_alerts['es'] = """"""
msg_baskets['es'] = """"""
msg_submissions['es'] = """"""
msg_approvals['es'] = """"""
msg_session['es'] = """"""
msg_login['es'] = """"""
msg_logout['es'] = """"""
msg_language['es'] = """"""
msg_browse['es'] = """"""
msg_search_records_for['es'] = """"""
msg_all_of_the_words['es'] = """"""
msg_any_of_the_words['es'] = """"""
msg_exact_phrase['es'] = """"""
msg_partial_phrase['es'] = """"""
msg_regular_expression['es'] = """"""
msg_and['es'] = """"""
msg_or['es'] = """"""
msg_and_not['es'] = """"""
msg_search_results['es'] = """"""
msg_try_your_search_on['es'] = """"""
msg_not_found_what_you_were_looking_for['es'] = """"""
msg_see_also_similar_author_names['es'] = """"""
msg_in['es'] = """"""
msg_sort_by['es'] = """"""
msg_latest_first['es'] = """"""
msg_ascending['es'] = """"""
msg_descending['es'] = """"""
msg_rank_by['es'] = """"""
msg_display_results['es'] = """"""
msg_results['es'] = """"""
msg_single_list['es'] = """"""
msg_split_by_collection['es'] = """"""
msg_output_format['es'] = """"""
msg_brief['es'] = """"""
msg_detailed['es'] = """"""
msg_detailed_record['es'] = """"""
msg_results_overview_found_x_records_in_y_seconds['es'] = """"""
msg_x_records_found['es'] = """"""
msg_jump_to_record['es'] = """"""
msg_search_took_x_seconds['es'] = """"""
msg_add_to_basket['es'] = """"""
msg_collections['es'] = """"""
msg_any_collection['es'] = """"""
msg_add_another_collection['es'] = """"""
msg_guest['es'] = """"""
msg_added_since['es'] = """"""
msg_until['es'] = """"""
msg_any_day['es'] = """"""
msg_any_month['es'] = """"""
msg_any_year['es'] = """"""
msg_january['es'] = """"""
msg_february['es'] = """"""
msg_march['es'] = """"""
msg_april['es'] = """"""
msg_may['es'] = """"""
msg_june['es'] = """"""
msg_july['es'] = """"""
msg_august['es'] = """"""
msg_september['es'] = """"""
msg_october['es'] = """"""
msg_november['es'] = """"""
msg_december['es'] = """"""
msg_search_term['es'] = """"""
msg_inside_index['es'] = """"""
msg_did_not_match['es'] = """"""
msg_no_words_index_available['es'] = """"""
msg_no_phrase_index_available['es'] = """"""
msg_no_hits_in_given_collection['es'] = """"""
msg_no_public_hits['es'] = """"""
msg_no_match_within_time_limits['es'] = """"""
msg_no_match_within_search_limits['es'] = """"""
msg_no_boolean_hits['es'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['es'] = """"""
msg_internal_error['es'] = """"""
msg_please_contact_and_quote['es'] = """"""
msg_search_options['es'] = """"""
msg_latest_additions['es'] = """"""
msg_more['es'] = """"""
msg_collection_contains_no_records['es'] = """"""
msg_collection_restricted_content['es'] = """"""
msg_convert['es'] = """"""
msg_agenda['es'] = """"""
msg_webcast['es'] = """"""
msg_bulletin['es'] = """"""
msg_library['es'] = """"""
msg_this_page_in_languages['es'] = """"""
msg_admin_area['es'] = """"""
msg_administration['es'] = """"""
msg_thousands_separator['es'] = """"""
msg_record_deleted['es'] = """"""
msg_restricted['es'] = """"""
msg_record_last_modified['es'] = """"""
msg_similar_records['es'] = """"""
msg_cited_by['es'] = """"""
msg_cited_by_x_records['es'] = """"""
msg_cocited_with_x_records['es'] = """"""
msg_citation_history['es'] = """"""
msg_downloads_history['es'] = """"""
msg_people_who_viewed_this_page['es'] = """"""
msg_people_who_downloaded_this_document['es'] = """"""

## Catalan:
msg_home['ca'] = """"""
msg_search['ca'] = """"""
msg_submit['ca'] = """"""
msg_help['ca'] = """"""
msg_help_central['ca'] = """"""
msg_search_help['ca'] = """"""
msg_search_tips['ca'] = """"""
msg_personalize['ca'] = """"""
msg_collection_not_found_head['ca'] = """"""
msg_collection_not_found_body['ca'] = """"""
msg_home['ca'] = """"""
msg_powered_by['ca'] = """"""
msg_maintained_by['ca'] = """"""
msg_last_updated['ca'] = """"""
msg_narrow_search['ca'] = """"""
msg_focus_on['ca'] = """"""
msg_simple_search['ca'] = """"""
msg_advanced_search['ca'] = """"""
msg_account['ca'] = """"""
msg_alerts['ca'] = """"""
msg_baskets['ca'] = """"""
msg_submissions['ca'] = """"""
msg_approvals['ca'] = """"""
msg_session['ca'] = """"""
msg_login['ca'] = """"""
msg_logout['ca'] = """"""
msg_language['ca'] = """"""
msg_browse['ca'] = """"""
msg_search_records_for['ca'] = """"""
msg_all_of_the_words['ca'] = """"""
msg_any_of_the_words['ca'] = """"""
msg_exact_phrase['ca'] = """"""
msg_partial_phrase['ca'] = """"""
msg_regular_expression['ca'] = """"""
msg_and['ca'] = """"""
msg_or['ca'] = """"""
msg_and_not['ca'] = """"""
msg_search_results['ca'] = """"""
msg_try_your_search_on['ca'] = """"""
msg_not_found_what_you_were_looking_for['ca'] = """"""
msg_see_also_similar_author_names['ca'] = """"""
msg_in['ca'] = """"""
msg_sort_by['ca'] = """"""
msg_latest_first['ca'] = """"""
msg_ascending['ca'] = """"""
msg_descending['ca'] = """"""
msg_rank_by['ca'] = """"""
msg_display_results['ca'] = """"""
msg_results['ca'] = """"""
msg_single_list['ca'] = """"""
msg_split_by_collection['ca'] = """"""
msg_output_format['ca'] = """"""
msg_brief['ca'] = """"""
msg_detailed['ca'] = """"""
msg_detailed_record['ca'] = """"""
msg_results_overview_found_x_records_in_y_seconds['ca'] = """"""
msg_x_records_found['ca'] = """"""
msg_jump_to_record['ca'] = """"""
msg_search_took_x_seconds['ca'] = """"""
msg_add_to_basket['ca'] = """"""
msg_collections['ca'] = """"""
msg_any_collection['ca'] = """"""
msg_add_another_collection['ca'] = """"""
msg_guest['ca'] = """"""
msg_added_since['ca'] = """"""
msg_until['ca'] = """"""
msg_any_day['ca'] = """"""
msg_any_month['ca'] = """"""
msg_any_year['ca'] = """"""
msg_january['ca'] = """"""
msg_february['ca'] = """"""
msg_march['ca'] = """"""
msg_april['ca'] = """"""
msg_may['ca'] = """"""
msg_june['ca'] = """"""
msg_july['ca'] = """"""
msg_august['ca'] = """"""
msg_september['ca'] = """"""
msg_october['ca'] = """"""
msg_november['ca'] = """"""
msg_december['ca'] = """"""
msg_search_term['ca'] = """"""
msg_inside_index['ca'] = """"""
msg_did_not_match['ca'] = """"""
msg_no_words_index_available['ca'] = """"""
msg_no_phrase_index_available['ca'] = """"""
msg_no_hits_in_given_collection['ca'] = """"""
msg_no_public_hits['ca'] = """"""
msg_no_match_within_time_limits['ca'] = """"""
msg_no_match_within_search_limits['ca'] = """"""
msg_no_boolean_hits['ca'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['ca'] = """"""
msg_internal_error['ca'] = """"""
msg_please_contact_and_quote['ca'] = """"""
msg_search_options['ca'] = """"""
msg_latest_additions['ca'] = """"""
msg_more['ca'] = """"""
msg_collection_contains_no_records['ca'] = """"""
msg_collection_restricted_content['ca'] = """"""
msg_convert['ca'] = """"""
msg_agenda['ca'] = """"""
msg_webcast['ca'] = """"""
msg_bulletin['ca'] = """"""
msg_library['ca'] = """"""
msg_this_page_in_languages['ca'] = """"""
msg_admin_area['ca'] = """"""
msg_administration['ca'] = """"""
msg_thousands_separator['ca'] = """"""
msg_record_deleted['ca'] = """"""
msg_restricted['ca'] = """"""
msg_record_last_modified['ca'] = """"""
msg_similar_records['ca'] = """"""
msg_cited_by['ca'] = """"""
msg_cited_by_x_records['ca'] = """"""
msg_cocited_with_x_records['ca'] = """"""
msg_citation_history['ca'] = """"""
msg_downloads_history['ca'] = """"""
msg_people_who_viewed_this_page['ca'] = """"""
msg_people_who_downloaded_this_document['ca'] = """"""

## Portuguese:
msg_home['pt'] = """"""
msg_search['pt'] = """"""
msg_submit['pt'] = """"""
msg_help['pt'] = """"""
msg_help_central['pt'] = """"""
msg_search_help['pt'] = """"""
msg_search_tips['pt'] = """"""
msg_personalize['pt'] = """"""
msg_collection_not_found_head['pt'] = """"""
msg_collection_not_found_body['pt'] = """"""
msg_home['pt'] = """"""
msg_powered_by['pt'] = """"""
msg_maintained_by['pt'] = """"""
msg_last_updated['pt'] = """"""
msg_narrow_search['pt'] = """"""
msg_focus_on['pt'] = """"""
msg_simple_search['pt'] = """"""
msg_advanced_search['pt'] = """"""
msg_account['pt'] = """"""
msg_alerts['pt'] = """"""
msg_baskets['pt'] = """"""
msg_submissions['pt'] = """"""
msg_approvals['pt'] = """"""
msg_session['pt'] = """"""
msg_login['pt'] = """"""
msg_logout['pt'] = """"""
msg_language['pt'] = """"""
msg_browse['pt'] = """"""
msg_search_records_for['pt'] = """"""
msg_all_of_the_words['pt'] = """"""
msg_any_of_the_words['pt'] = """"""
msg_exact_phrase['pt'] = """"""
msg_partial_phrase['pt'] = """"""
msg_regular_expression['pt'] = """"""
msg_and['pt'] = """"""
msg_or['pt'] = """"""
msg_and_not['pt'] = """"""
msg_search_results['pt'] = """"""
msg_try_your_search_on['pt'] = """"""
msg_not_found_what_you_were_looking_for['pt'] = """"""
msg_see_also_similar_author_names['pt'] = """"""
msg_in['pt'] = """"""
msg_sort_by['pt'] = """"""
msg_latest_first['pt'] = """"""
msg_ascending['pt'] = """"""
msg_descending['pt'] = """"""
msg_rank_by['pt'] = """"""
msg_display_results['pt'] = """"""
msg_results['pt'] = """"""
msg_single_list['pt'] = """"""
msg_split_by_collection['pt'] = """"""
msg_output_format['pt'] = """"""
msg_brief['pt'] = """"""
msg_detailed['pt'] = """"""
msg_detailed_record['pt'] = """"""
msg_results_overview_found_x_records_in_y_seconds['pt'] = """"""
msg_x_records_found['pt'] = """"""
msg_jump_to_record['pt'] = """"""
msg_search_took_x_seconds['pt'] = """"""
msg_add_to_basket['pt'] = """"""
msg_collections['pt'] = """"""
msg_any_collection['pt'] = """"""
msg_add_another_collection['pt'] = """"""
msg_guest['pt'] = """"""
msg_added_since['pt'] = """"""
msg_until['pt'] = """"""
msg_any_day['pt'] = """"""
msg_any_month['pt'] = """"""
msg_any_year['pt'] = """"""
msg_january['pt'] = """"""
msg_february['pt'] = """"""
msg_march['pt'] = """"""
msg_april['pt'] = """"""
msg_may['pt'] = """"""
msg_june['pt'] = """"""
msg_july['pt'] = """"""
msg_august['pt'] = """"""
msg_september['pt'] = """"""
msg_october['pt'] = """"""
msg_november['pt'] = """"""
msg_december['pt'] = """"""
msg_search_term['pt'] = """"""
msg_inside_index['pt'] = """"""
msg_did_not_match['pt'] = """"""
msg_no_words_index_available['pt'] = """"""
msg_no_phrase_index_available['pt'] = """"""
msg_no_hits_in_given_collection['pt'] = """"""
msg_no_public_hits['pt'] = """"""
msg_no_match_within_time_limits['pt'] = """"""
msg_no_match_within_search_limits['pt'] = """"""
msg_no_boolean_hits['pt'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['pt'] = """"""
msg_internal_error['pt'] = """"""
msg_please_contact_and_quote['pt'] = """"""
msg_search_options['pt'] = """"""
msg_latest_additions['pt'] = """"""
msg_more['pt'] = """"""
msg_collection_contains_no_records['pt'] = """"""
msg_collection_restricted_content['pt'] = """"""
msg_convert['pt'] = """"""
msg_agenda['pt'] = """"""
msg_webcast['pt'] = """"""
msg_bulletin['pt'] = """"""
msg_library['pt'] = """"""
msg_this_page_in_languages['pt'] = """"""
msg_admin_area['pt'] = """"""
msg_administration['pt'] = """"""
msg_thousands_separator['pt'] = """"""
msg_record_deleted['pt'] = """"""
msg_restricted['pt'] = """"""
msg_record_last_modified['pt'] = """"""
msg_similar_records['pt'] = """"""
msg_cited_by['pt'] = """"""
msg_cited_by_x_records['pt'] = """"""
msg_cocited_with_x_records['pt'] = """"""
msg_citation_history['pt'] = """"""
msg_downloads_history['pt'] = """"""
msg_people_who_viewed_this_page['pt'] = """"""
msg_people_who_downloaded_this_document['pt'] = """"""

## Italian:
msg_home['it'] = """"""
msg_search['it'] = """"""
msg_submit['it'] = """"""
msg_help['it'] = """"""
msg_help_central['it'] = """"""
msg_search_help['it'] = """"""
msg_search_tips['it'] = """"""
msg_personalize['it'] = """"""
msg_collection_not_found_head['it'] = """"""
msg_collection_not_found_body['it'] = """"""
msg_home['it'] = """"""
msg_powered_by['it'] = """"""
msg_maintained_by['it'] = """"""
msg_last_updated['it'] = """"""
msg_narrow_search['it'] = """"""
msg_focus_on['it'] = """"""
msg_simple_search['it'] = """"""
msg_advanced_search['it'] = """"""
msg_account['it'] = """"""
msg_alerts['it'] = """"""
msg_baskets['it'] = """"""
msg_submissions['it'] = """"""
msg_approvals['it'] = """"""
msg_session['it'] = """"""
msg_login['it'] = """"""
msg_logout['it'] = """"""
msg_language['it'] = """"""
msg_browse['it'] = """"""
msg_search_records_for['it'] = """"""
msg_all_of_the_words['it'] = """"""
msg_any_of_the_words['it'] = """"""
msg_exact_phrase['it'] = """"""
msg_partial_phrase['it'] = """"""
msg_regular_expression['it'] = """"""
msg_and['it'] = """"""
msg_or['it'] = """"""
msg_and_not['it'] = """"""
msg_search_results['it'] = """"""
msg_try_your_search_on['it'] = """"""
msg_not_found_what_you_were_looking_for['it'] = """"""
msg_see_also_similar_author_names['it'] = """"""
msg_in['it'] = """"""
msg_sort_by['it'] = """"""
msg_latest_first['it'] = """"""
msg_ascending['it'] = """"""
msg_descending['it'] = """"""
msg_rank_by['it'] = """"""
msg_display_results['it'] = """"""
msg_results['it'] = """"""
msg_single_list['it'] = """"""
msg_split_by_collection['it'] = """"""
msg_output_format['it'] = """"""
msg_brief['it'] = """"""
msg_detailed['it'] = """"""
msg_detailed_record['it'] = """"""
msg_results_overview_found_x_records_in_y_seconds['it'] = """"""
msg_x_records_found['it'] = """"""
msg_jump_to_record['it'] = """"""
msg_search_took_x_seconds['it'] = """"""
msg_add_to_basket['it'] = """"""
msg_collections['it'] = """"""
msg_any_collection['it'] = """"""
msg_add_another_collection['it'] = """"""
msg_guest['it'] = """"""
msg_added_since['it'] = """"""
msg_until['it'] = """"""
msg_any_day['it'] = """"""
msg_any_month['it'] = """"""
msg_any_year['it'] = """"""
msg_january['it'] = """"""
msg_february['it'] = """"""
msg_march['it'] = """"""
msg_april['it'] = """"""
msg_may['it'] = """"""
msg_june['it'] = """"""
msg_july['it'] = """"""
msg_august['it'] = """"""
msg_september['it'] = """"""
msg_october['it'] = """"""
msg_november['it'] = """"""
msg_december['it'] = """"""
msg_search_term['it'] = """"""
msg_inside_index['it'] = """"""
msg_did_not_match['it'] = """"""
msg_no_words_index_available['it'] = """"""
msg_no_phrase_index_available['it'] = """"""
msg_no_hits_in_given_collection['it'] = """"""
msg_no_public_hits['it'] = """"""
msg_no_match_within_time_limits['it'] = """"""
msg_no_match_within_search_limits['it'] = """"""
msg_no_boolean_hits['it'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['it'] = """"""
msg_internal_error['it'] = """"""
msg_please_contact_and_quote['it'] = """"""
msg_search_options['it'] = """"""
msg_latest_additions['it'] = """"""
msg_more['it'] = """"""
msg_collection_contains_no_records['it'] = """"""
msg_collection_restricted_content['it'] = """"""
msg_convert['it'] = """"""
msg_agenda['it'] = """"""
msg_webcast['it'] = """"""
msg_bulletin['it'] = """"""
msg_library['it'] = """"""
msg_this_page_in_languages['it'] = """"""
msg_admin_area['it'] = """"""
msg_administration['it'] = """"""
msg_thousands_separator['it'] = """"""
msg_record_deleted['it'] = """"""
msg_restricted['it'] = """"""
msg_record_last_modified['it'] = """"""
msg_similar_records['it'] = """"""
msg_cited_by['it'] = """"""
msg_cited_by_x_records['it'] = """"""
msg_cocited_with_x_records['it'] = """"""
msg_citation_history['it'] = """"""
msg_downloads_history['it'] = """"""
msg_people_who_viewed_this_page['it'] = """"""
msg_people_who_downloaded_this_document['it'] = """"""

## Russian:
msg_home['ru'] = """"""
msg_search['ru'] = """"""
msg_submit['ru'] = """"""
msg_help['ru'] = """"""
msg_help_central['ru'] = """"""
msg_search_help['ru'] = """"""
msg_search_tips['ru'] = """"""
msg_personalize['ru'] = """"""
msg_collection_not_found_head['ru'] = """"""
msg_collection_not_found_body['ru'] = """"""
msg_home['ru'] = """"""
msg_powered_by['ru'] = """"""
msg_maintained_by['ru'] = """"""
msg_last_updated['ru'] = """"""
msg_narrow_search['ru'] = """"""
msg_focus_on['ru'] = """"""
msg_simple_search['ru'] = """"""
msg_advanced_search['ru'] = """"""
msg_account['ru'] = """"""
msg_alerts['ru'] = """"""
msg_baskets['ru'] = """"""
msg_submissions['ru'] = """"""
msg_approvals['ru'] = """"""
msg_session['ru'] = """"""
msg_login['ru'] = """"""
msg_logout['ru'] = """"""
msg_language['ru'] = """"""
msg_browse['ru'] = """"""
msg_search_records_for['ru'] = """"""
msg_all_of_the_words['ru'] = """"""
msg_any_of_the_words['ru'] = """"""
msg_exact_phrase['ru'] = """"""
msg_partial_phrase['ru'] = """"""
msg_regular_expression['ru'] = """"""
msg_and['ru'] = """"""
msg_or['ru'] = """"""
msg_and_not['ru'] = """"""
msg_search_results['ru'] = """"""
msg_try_your_search_on['ru'] = """"""
msg_not_found_what_you_were_looking_for['ru'] = """"""
msg_see_also_similar_author_names['ru'] = """"""
msg_in['ru'] = """"""
msg_sort_by['ru'] = """"""
msg_latest_first['ru'] = """"""
msg_ascending['ru'] = """"""
msg_descending['ru'] = """"""
msg_rank_by['ru'] = """"""
msg_display_results['ru'] = """"""
msg_results['ru'] = """"""
msg_single_list['ru'] = """"""
msg_split_by_collection['ru'] = """"""
msg_output_format['ru'] = """"""
msg_brief['ru'] = """"""
msg_detailed['ru'] = """"""
msg_detailed_record['ru'] = """"""
msg_results_overview_found_x_records_in_y_seconds['ru'] = """"""
msg_x_records_found['ru'] = """"""
msg_jump_to_record['ru'] = """"""
msg_search_took_x_seconds['ru'] = """"""
msg_add_to_basket['ru'] = """"""
msg_collections['ru'] = """"""
msg_any_collection['ru'] = """"""
msg_add_another_collection['ru'] = """"""
msg_guest['ru'] = """"""
msg_added_since['ru'] = """"""
msg_until['ru'] = """"""
msg_any_day['ru'] = """"""
msg_any_month['ru'] = """"""
msg_any_year['ru'] = """"""
msg_january['ru'] = """"""
msg_february['ru'] = """"""
msg_march['ru'] = """"""
msg_april['ru'] = """"""
msg_may['ru'] = """"""
msg_june['ru'] = """"""
msg_july['ru'] = """"""
msg_august['ru'] = """"""
msg_september['ru'] = """"""
msg_october['ru'] = """"""
msg_november['ru'] = """"""
msg_december['ru'] = """"""
msg_search_term['ru'] = """"""
msg_inside_index['ru'] = """"""
msg_did_not_match['ru'] = """"""
msg_no_words_index_available['ru'] = """"""
msg_no_phrase_index_available['ru'] = """"""
msg_no_hits_in_given_collection['ru'] = """"""
msg_no_public_hits['ru'] = """"""
msg_no_match_within_time_limits['ru'] = """"""
msg_no_match_within_search_limits['ru'] = """"""
msg_no_boolean_hits['ru'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['ru'] = """"""
msg_internal_error['ru'] = """"""
msg_please_contact_and_quote['ru'] = """"""
msg_search_options['ru'] = """"""
msg_latest_additions['ru'] = """"""
msg_more['ru'] = """"""
msg_collection_contains_no_records['ru'] = """"""
msg_collection_restricted_content['ru'] = """"""
msg_convert['ru'] = """"""
msg_agenda['ru'] = """"""
msg_webcast['ru'] = """"""
msg_bulletin['ru'] = """"""
msg_library['ru'] = """"""
msg_this_page_in_languages['ru'] = """"""
msg_admin_area['ru'] = """"""
msg_administration['ru'] = """"""
msg_thousands_separator['ru'] = """"""
msg_record_deleted['ru'] = """"""
msg_restricted['ru'] = """"""
msg_record_last_modified['ru'] = """"""
msg_similar_records['ru'] = """"""
msg_cited_by['ru'] = """"""
msg_cited_by_x_records['ru'] = """"""
msg_cocited_with_x_records['ru'] = """"""
msg_citation_history['ru'] = """"""
msg_downloads_history['ru'] = """"""
msg_people_who_viewed_this_page['ru'] = """"""
msg_people_who_downloaded_this_document['ru'] = """"""

## Slovak:
msg_home['sk'] = """"""
msg_search['sk'] = """"""
msg_submit['sk'] = """"""
msg_help['sk'] = """"""
msg_help_central['sk'] = """"""
msg_search_help['sk'] = """"""
msg_search_tips['sk'] = """"""
msg_personalize['sk'] = """"""
msg_collection_not_found_head['sk'] = """"""
msg_collection_not_found_body['sk'] = """"""
msg_home['sk'] = """"""
msg_powered_by['sk'] = """"""
msg_maintained_by['sk'] = """"""
msg_last_updated['sk'] = """"""
msg_narrow_search['sk'] = """"""
msg_focus_on['sk'] = """"""
msg_simple_search['sk'] = """"""
msg_advanced_search['sk'] = """"""
msg_account['sk'] = """"""
msg_alerts['sk'] = """"""
msg_baskets['sk'] = """"""
msg_submissions['sk'] = """"""
msg_approvals['sk'] = """"""
msg_session['sk'] = """"""
msg_login['sk'] = """"""
msg_logout['sk'] = """"""
msg_language['sk'] = """"""
msg_browse['sk'] = """"""
msg_search_records_for['sk'] = """"""
msg_all_of_the_words['sk'] = """"""
msg_any_of_the_words['sk'] = """"""
msg_exact_phrase['sk'] = """"""
msg_partial_phrase['sk'] = """"""
msg_regular_expression['sk'] = """"""
msg_and['sk'] = """"""
msg_or['sk'] = """"""
msg_and_not['sk'] = """"""
msg_search_results['sk'] = """"""
msg_try_your_search_on['sk'] = """"""
msg_not_found_what_you_were_looking_for['sk'] = """"""
msg_see_also_similar_author_names['sk'] = """"""
msg_in['sk'] = """"""
msg_sort_by['sk'] = """"""
msg_latest_first['sk'] = """"""
msg_ascending['sk'] = """"""
msg_descending['sk'] = """"""
msg_rank_by['sk'] = """"""
msg_display_results['sk'] = """"""
msg_results['sk'] = """"""
msg_single_list['sk'] = """"""
msg_split_by_collection['sk'] = """"""
msg_output_format['sk'] = """"""
msg_brief['sk'] = """"""
msg_detailed['sk'] = """"""
msg_detailed_record['sk'] = """"""
msg_results_overview_found_x_records_in_y_seconds['sk'] = """"""
msg_x_records_found['sk'] = """"""
msg_jump_to_record['sk'] = """"""
msg_search_took_x_seconds['sk'] = """"""
msg_add_to_basket['sk'] = """"""
msg_collections['sk'] = """"""
msg_any_collection['sk'] = """"""
msg_add_another_collection['sk'] = """"""
msg_guest['sk'] = """"""
msg_added_since['sk'] = """"""
msg_until['sk'] = """"""
msg_any_day['sk'] = """"""
msg_any_month['sk'] = """"""
msg_any_year['sk'] = """"""
msg_january['sk'] = """"""
msg_february['sk'] = """"""
msg_march['sk'] = """"""
msg_april['sk'] = """"""
msg_may['sk'] = """"""
msg_june['sk'] = """"""
msg_july['sk'] = """"""
msg_august['sk'] = """"""
msg_september['sk'] = """"""
msg_october['sk'] = """"""
msg_november['sk'] = """"""
msg_december['sk'] = """"""
msg_search_term['sk'] = """"""
msg_inside_index['sk'] = """"""
msg_did_not_match['sk'] = """"""
msg_no_words_index_available['sk'] = """"""
msg_no_phrase_index_available['sk'] = """"""
msg_no_hits_in_given_collection['sk'] = """"""
msg_no_public_hits['sk'] = """"""
msg_no_match_within_time_limits['sk'] = """"""
msg_no_match_within_search_limits['sk'] = """"""
msg_no_boolean_hits['sk'] = """"""
msg_no_exact_match_for_foo_using_bar_instead['sk'] = """"""
msg_internal_error['sk'] = """"""
msg_please_contact_and_quote['sk'] = """"""
msg_search_options['sk'] = """"""
msg_latest_additions['sk'] = """"""
msg_more['sk'] = """"""
msg_collection_contains_no_records['sk'] = """"""
msg_collection_restricted_content['sk'] = """""" msg_convert['sk'] = """""" msg_agenda['sk'] = """""" msg_webcast['sk'] = """""" msg_bulletin['sk'] = """""" msg_library['sk'] = """""" msg_this_page_in_languages['sk'] = """""" msg_admin_area['sk'] = """""" msg_administration['sk'] = """""" msg_thousands_separator['sk'] = """""" msg_record_deleted['sk'] = """""" msg_restricted['sk'] = """""" msg_record_last_modified['sk'] = """""" msg_similar_records['sk'] = """""" msg_cited_by['sk'] = """""" msg_cited_by_x_records['sk'] = """""" msg_cocited_with_x_records['sk'] = """""" msg_citation_history['sk'] = """""" msg_downloads_history['sk'] = """""" msg_people_who_viewed_this_page['sk'] = """""" msg_people_who_downloaded_this_document['sk'] = """""" ## Czech: msg_home['cs'] = """""" msg_search['cs'] = """""" msg_submit['cs'] = """""" msg_help['cs'] = """""" msg_help_central['cs'] = """""" msg_search_help['cs'] = """""" msg_search_tips['cs'] = """""" msg_personalize['cs'] = """""" msg_collection_not_found_head['cs'] = """""" msg_collection_not_found_body['cs'] = """""" msg_home['cs'] = """""" msg_powered_by['cs'] = """""" msg_maintained_by['cs'] = """""" msg_last_updated['cs'] = """""" msg_narrow_search['cs'] = """""" msg_focus_on['cs'] = """""" msg_simple_search['cs'] = """""" msg_advanced_search['cs'] = """""" msg_account['cs'] = """""" msg_alerts['cs'] = """""" msg_baskets['cs'] = """""" msg_submissions['cs'] = """""" msg_approvals['cs'] = """""" msg_session['cs'] = """""" msg_login['cs'] = """""" msg_logout['cs'] = """""" msg_language['cs'] = """""" msg_browse['cs'] = """""" msg_search_records_for['cs'] = """""" msg_all_of_the_words['cs'] = """""" msg_any_of_the_words['cs'] = """""" msg_exact_phrase['cs'] = """""" msg_partial_phrase['cs'] = """""" msg_regular_expression['cs'] = """""" msg_and['cs'] = """""" msg_or['cs'] = """""" msg_and_not['cs'] = """""" msg_search_results['cs'] = """""" msg_try_your_search_on['cs'] = """""" msg_not_found_what_you_were_looking_for['cs'] 
= """""" msg_see_also_similar_author_names['cs'] = """""" msg_in['cs'] = """""" msg_sort_by['cs'] = """""" msg_latest_first['cs'] = """""" msg_ascending['cs'] = """""" msg_descending['cs'] = """""" msg_rank_by['cs'] = """""" msg_display_results['cs'] = """""" msg_results['cs'] = """""" msg_single_list['cs'] = """""" msg_split_by_collection['cs'] = """""" msg_output_format['cs'] = """""" msg_brief['cs'] = """""" msg_detailed['cs'] = """""" msg_detailed_record['cs'] = """""" msg_results_overview_found_x_records_in_y_seconds['cs'] = """""" msg_x_records_found['cs'] = """""" msg_jump_to_record['cs'] = """""" msg_search_took_x_seconds['cs'] = """""" msg_add_to_basket['cs'] = """""" msg_collections['cs'] = """""" msg_any_collection['cs'] = """""" msg_add_another_collection['cs'] = """""" msg_guest['cs'] = """""" msg_added_since['cs'] = """""" msg_until['cs'] = """""" msg_any_day['cs'] = """""" msg_any_month['cs'] = """""" msg_any_year['cs'] = """""" msg_january['cs'] = """""" msg_february['cs'] = """""" msg_march['cs'] = """""" msg_april['cs'] = """""" msg_may['cs'] = """""" msg_june['cs'] = """""" msg_july['cs'] = """""" msg_august['cs'] = """""" msg_september['cs'] = """""" msg_october['cs'] = """""" msg_november['cs'] = """""" msg_december['cs'] = """""" msg_search_term['cs'] = """""" msg_inside_index['cs'] = """""" msg_did_not_match['cs'] = """""" msg_no_words_index_available['cs'] = """""" msg_no_phrase_index_available['cs'] = """""" msg_no_hits_in_given_collection['cs'] = """""" msg_no_public_hits['cs'] = """""" msg_no_match_within_time_limits['cs'] = """""" msg_no_match_within_search_limits['cs'] = """""" msg_no_boolean_hits['cs'] = """""" msg_no_exact_match_for_foo_using_bar_instead['cs'] = """""" msg_internal_error['cs'] = """""" msg_please_contact_and_quote['cs'] = """""" msg_search_options['cs'] = """""" msg_latest_additions['cs'] = """""" msg_more['cs'] = """""" msg_collection_contains_no_records['cs'] = """""" msg_collection_restricted_content['cs'] = """""" 
msg_convert['cs'] = """""" msg_agenda['cs'] = """""" msg_webcast['cs'] = """""" msg_bulletin['cs'] = """""" msg_library['cs'] = """""" msg_this_page_in_languages['cs'] = """""" msg_admin_area['cs'] = """""" msg_administration['cs'] = """""" msg_thousands_separator['cs'] = """""" msg_record_deleted['cs'] = """""" msg_restricted['cs'] = """""" msg_record_last_modified['cs'] = """""" msg_similar_records['cs'] = """""" msg_cited_by['cs'] = """""" msg_cited_by_x_records['cs'] = """""" msg_cocited_with_x_records['cs'] = """""" msg_citation_history['cs'] = """""" msg_downloads_history['cs'] = """""" msg_people_who_viewed_this_page['cs'] = """""" msg_people_who_downloaded_this_document['cs'] = """""" ## Norwegian: msg_home['no'] = """""" msg_search['no'] = """""" msg_submit['no'] = """""" msg_help['no'] = """""" msg_help_central['no'] = """""" msg_search_help['no'] = """""" msg_search_tips['no'] = """""" msg_personalize['no'] = """""" msg_collection_not_found_head['no'] = """""" msg_collection_not_found_body['no'] = """""" msg_home['no'] = """""" msg_powered_by['no'] = """""" msg_maintained_by['no'] = """""" msg_last_updated['no'] = """""" msg_narrow_search['no'] = """""" msg_focus_on['no'] = """""" msg_simple_search['no'] = """""" msg_advanced_search['no'] = """""" msg_account['no'] = """""" msg_alerts['no'] = """""" msg_baskets['no'] = """""" msg_submissions['no'] = """""" msg_approvals['no'] = """""" msg_session['no'] = """""" msg_login['no'] = """""" msg_logout['no'] = """""" msg_language['no'] = """""" msg_browse['no'] = """""" msg_search_records_for['no'] = """""" msg_all_of_the_words['no'] = """""" msg_any_of_the_words['no'] = """""" msg_exact_phrase['no'] = """""" msg_partial_phrase['no'] = """""" msg_regular_expression['no'] = """""" msg_and['no'] = """""" msg_or['no'] = """""" msg_and_not['no'] = """""" msg_search_results['no'] = """""" msg_try_your_search_on['no'] = """""" msg_not_found_what_you_were_looking_for['no'] = """""" 
msg_see_also_similar_author_names['no'] = """""" msg_in['no'] = """""" msg_sort_by['no'] = """""" msg_latest_first['no'] = """""" msg_ascending['no'] = """""" msg_descending['no'] = """""" msg_rank_by['no'] = """""" msg_display_results['no'] = """""" msg_results['no'] = """""" msg_single_list['no'] = """""" msg_split_by_collection['no'] = """""" msg_output_format['no'] = """""" msg_brief['no'] = """""" msg_detailed['no'] = """""" msg_detailed_record['no'] = """""" msg_results_overview_found_x_records_in_y_seconds['no'] = """""" msg_x_records_found['no'] = """""" msg_jump_to_record['no'] = """""" msg_search_took_x_seconds['no'] = """""" msg_add_to_basket['no'] = """""" msg_collections['no'] = """""" msg_any_collection['no'] = """""" msg_add_another_collection['no'] = """""" msg_guest['no'] = """""" msg_added_since['no'] = """""" msg_until['no'] = """""" msg_any_day['no'] = """""" msg_any_month['no'] = """""" msg_any_year['no'] = """""" msg_january['no'] = """""" msg_february['no'] = """""" msg_march['no'] = """""" msg_april['no'] = """""" msg_may['no'] = """""" msg_june['no'] = """""" msg_july['no'] = """""" msg_august['no'] = """""" msg_september['no'] = """""" msg_october['no'] = """""" msg_november['no'] = """""" msg_december['no'] = """""" msg_search_term['no'] = """""" msg_inside_index['no'] = """""" msg_did_not_match['no'] = """""" msg_no_words_index_available['no'] = """""" msg_no_phrase_index_available['no'] = """""" msg_no_hits_in_given_collection['no'] = """""" msg_no_public_hits['no'] = """""" msg_no_match_within_time_limits['no'] = """""" msg_no_match_within_search_limits['no'] = """""" msg_no_boolean_hits['no'] = """""" msg_no_exact_match_for_foo_using_bar_instead['no'] = """""" msg_internal_error['no'] = """""" msg_please_contact_and_quote['no'] = """""" msg_search_options['no'] = """""" msg_latest_additions['no'] = """""" msg_more['no'] = """""" msg_collection_contains_no_records['no'] = """""" msg_collection_restricted_content['no'] = """""" 
msg_convert['no'] = """""" msg_agenda['no'] = """""" msg_webcast['no'] = """""" msg_bulletin['no'] = """""" msg_library['no'] = """""" msg_this_page_in_languages['no'] = """""" msg_admin_area['no'] = """""" msg_administration['no'] = """""" msg_thousands_separator['no'] = """""" msg_record_deleted['no'] = """""" msg_restricted['no'] = """""" msg_record_last_modified['no'] = """""" msg_similar_records['no'] = """""" msg_cited_by['no'] = """""" msg_cited_by_x_records['no'] = """""" msg_cocited_with_x_records['no'] = """""" msg_citation_history['no'] = """""" msg_downloads_history['no'] = """""" msg_people_who_viewed_this_page['no'] = """""" msg_people_who_downloaded_this_document['no'] = """""" ## Swedish: msg_home['sv'] = """""" msg_search['sv'] = """""" msg_submit['sv'] = """""" msg_help['sv'] = """""" msg_help_central['sv'] = """""" msg_search_help['sv'] = """""" msg_search_tips['sv'] = """""" msg_personalize['sv'] = """""" msg_collection_not_found_head['sv'] = """""" msg_collection_not_found_body['sv'] = """""" msg_home['sv'] = """""" msg_powered_by['sv'] = """""" msg_maintained_by['sv'] = """""" msg_last_updated['sv'] = """""" msg_narrow_search['sv'] = """""" msg_focus_on['sv'] = """""" msg_simple_search['sv'] = """""" msg_advanced_search['sv'] = """""" msg_account['sv'] = """""" msg_alerts['sv'] = """""" msg_baskets['sv'] = """""" msg_submissions['sv'] = """""" msg_approvals['sv'] = """""" msg_session['sv'] = """""" msg_login['sv'] = """""" msg_logout['sv'] = """""" msg_language['sv'] = """""" msg_browse['sv'] = """""" msg_search_records_for['sv'] = """""" msg_all_of_the_words['sv'] = """""" msg_any_of_the_words['sv'] = """""" msg_exact_phrase['sv'] = """""" msg_partial_phrase['sv'] = """""" msg_regular_expression['sv'] = """""" msg_and['sv'] = """""" msg_or['sv'] = """""" msg_and_not['sv'] = """""" msg_search_results['sv'] = """""" msg_try_your_search_on['sv'] = """""" msg_not_found_what_you_were_looking_for['sv'] = """""" 
msg_see_also_similar_author_names['sv'] = """""" msg_in['sv'] = """""" msg_sort_by['sv'] = """""" msg_latest_first['sv'] = """""" msg_ascending['sv'] = """""" msg_descending['sv'] = """""" msg_rank_by['sv'] = """""" msg_display_results['sv'] = """""" msg_results['sv'] = """""" msg_single_list['sv'] = """""" msg_split_by_collection['sv'] = """""" msg_output_format['sv'] = """""" msg_brief['sv'] = """""" msg_detailed['sv'] = """""" msg_detailed_record['sv'] = """""" msg_results_overview_found_x_records_in_y_seconds['sv'] = """""" msg_x_records_found['sv'] = """""" msg_jump_to_record['sv'] = """""" msg_search_took_x_seconds['sv'] = """""" msg_add_to_basket['sv'] = """""" msg_collections['sv'] = """""" msg_any_collection['sv'] = """""" msg_add_another_collection['sv'] = """""" msg_guest['sv'] = """""" msg_added_since['sv'] = """""" msg_until['sv'] = """""" msg_any_day['sv'] = """""" msg_any_month['sv'] = """""" msg_any_year['sv'] = """""" msg_january['sv'] = """""" msg_february['sv'] = """""" msg_march['sv'] = """""" msg_april['sv'] = """""" msg_may['sv'] = """""" msg_june['sv'] = """""" msg_july['sv'] = """""" msg_august['sv'] = """""" msg_september['sv'] = """""" msg_october['sv'] = """""" msg_november['sv'] = """""" msg_december['sv'] = """""" msg_search_term['sv'] = """""" msg_inside_index['sv'] = """""" msg_did_not_match['sv'] = """""" msg_no_words_index_available['sv'] = """""" msg_no_phrase_index_available['sv'] = """""" msg_no_hits_in_given_collection['sv'] = """""" msg_no_public_hits['sv'] = """""" msg_no_match_within_time_limits['sv'] = """""" msg_no_match_within_search_limits['sv'] = """""" msg_no_boolean_hits['sv'] = """""" msg_no_exact_match_for_foo_using_bar_instead['sv'] = """""" msg_internal_error['sv'] = """""" msg_please_contact_and_quote['sv'] = """""" msg_search_options['sv'] = """""" msg_latest_additions['sv'] = """""" msg_more['sv'] = """""" msg_collection_contains_no_records['sv'] = """""" msg_collection_restricted_content['sv'] = """""" 
msg_convert['sv'] = """""" msg_agenda['sv'] = """""" msg_webcast['sv'] = """""" msg_bulletin['sv'] = """""" msg_library['sv'] = """""" msg_this_page_in_languages['sv'] = """""" msg_admin_area['sv'] = """""" msg_administration['sv'] = """""" msg_thousands_separator['sv'] = """""" msg_record_deleted['sv'] = """""" msg_restricted['sv'] = """""" msg_record_last_modified['sv'] = """""" msg_similar_records['sv'] = """""" msg_cited_by['sv'] = """""" msg_cited_by_x_records['sv'] = """""" msg_cocited_with_x_records['sv'] = """""" msg_citation_history['sv'] = """""" msg_downloads_history['sv'] = """""" msg_people_who_viewed_this_page['sv'] = """""" msg_people_who_downloaded_this_document['sv'] = """""" ## Greek: msg_home['el'] = """""" msg_search['el'] = """""" msg_submit['el'] = """""" msg_help['el'] = """""" msg_help_central['el'] = """""" msg_search_help['el'] = """""" msg_search_tips['el'] = """""" msg_personalize['el'] = """""" msg_collection_not_found_head['el'] = """""" msg_collection_not_found_body['el'] = """""" msg_home['el'] = """""" msg_powered_by['el'] = """""" msg_maintained_by['el'] = """""" msg_last_updated['el'] = """""" msg_narrow_search['el'] = """""" msg_focus_on['el'] = """""" msg_simple_search['el'] = """""" msg_advanced_search['el'] = """""" msg_account['el'] = """""" msg_alerts['el'] = """""" msg_baskets['el'] = """""" msg_submissions['el'] = """""" msg_approvals['el'] = """""" msg_session['el'] = """""" msg_login['el'] = """""" msg_logout['el'] = """""" msg_language['el'] = """""" msg_browse['el'] = """""" msg_search_records_for['el'] = """""" msg_all_of_the_words['el'] = """""" msg_any_of_the_words['el'] = """""" msg_exact_phrase['el'] = """""" msg_partial_phrase['el'] = """""" msg_regular_expression['el'] = """""" msg_and['el'] = """""" msg_or['el'] = """""" msg_and_not['el'] = """""" msg_search_results['el'] = """""" msg_try_your_search_on['el'] = """""" msg_not_found_what_you_were_looking_for['el'] = """""" msg_see_also_similar_author_names['el'] 
= """""" msg_in['el'] = """""" msg_sort_by['el'] = """""" msg_latest_first['el'] = """""" msg_ascending['el'] = """""" msg_descending['el'] = """""" msg_rank_by['el'] = """""" msg_display_results['el'] = """""" msg_results['el'] = """""" msg_single_list['el'] = """""" msg_split_by_collection['el'] = """""" msg_output_format['el'] = """""" msg_brief['el'] = """""" msg_detailed['el'] = """""" msg_detailed_record['el'] = """""" msg_results_overview_found_x_records_in_y_seconds['el'] = """""" msg_x_records_found['el'] = """""" msg_jump_to_record['el'] = """""" msg_search_took_x_seconds['el'] = """""" msg_add_to_basket['el'] = """""" msg_collections['el'] = """""" msg_any_collection['el'] = """""" msg_add_another_collection['el'] = """""" msg_guest['el'] = """""" msg_added_since['el'] = """""" msg_until['el'] = """""" msg_any_day['el'] = """""" msg_any_month['el'] = """""" msg_any_year['el'] = """""" msg_january['el'] = """""" msg_february['el'] = """""" msg_march['el'] = """""" msg_april['el'] = """""" msg_may['el'] = """""" msg_june['el'] = """""" msg_july['el'] = """""" msg_august['el'] = """""" msg_september['el'] = """""" msg_october['el'] = """""" msg_november['el'] = """""" msg_december['el'] = """""" msg_search_term['el'] = """""" msg_inside_index['el'] = """""" msg_did_not_match['el'] = """""" msg_no_words_index_available['el'] = """""" msg_no_phrase_index_available['el'] = """""" msg_no_hits_in_given_collection['el'] = """""" msg_no_public_hits['el'] = """""" msg_no_match_within_time_limits['el'] = """""" msg_no_match_within_search_limits['el'] = """""" msg_no_boolean_hits['el'] = """""" msg_no_exact_match_for_foo_using_bar_instead['el'] = """""" msg_internal_error['el'] = """""" msg_please_contact_and_quote['el'] = """""" msg_search_options['el'] = """""" msg_latest_additions['el'] = """""" msg_more['el'] = """""" msg_collection_contains_no_records['el'] = """""" msg_collection_restricted_content['el'] = """""" msg_convert['el'] = """""" msg_agenda['el'] = 
"""""" msg_webcast['el'] = """""" msg_bulletin['el'] = """""" msg_library['el'] = """""" msg_this_page_in_languages['el'] = """""" msg_admin_area['el'] = """""" msg_administration['el'] = """""" msg_thousands_separator['el'] = """""" msg_record_deleted['el'] = """""" msg_restricted['el'] = """""" msg_record_last_modified['el'] = """""" msg_similar_records['el'] = """""" msg_cited_by['el'] = """""" msg_cited_by_x_records['el'] = """""" msg_cocited_with_x_records['el'] = """""" msg_citation_history['el'] = """""" msg_downloads_history['el'] = """""" msg_people_who_viewed_this_page['el'] = """""" msg_people_who_downloaded_this_document['el'] = """""" ## Ukrainian: msg_home['uk'] = """""" msg_search['uk'] = """""" msg_submit['uk'] = """""" msg_help['uk'] = """""" msg_help_central['uk'] = """""" msg_search_help['uk'] = """""" msg_search_tips['uk'] = """""" msg_personalize['uk'] = """""" msg_collection_not_found_head['uk'] = """""" msg_collection_not_found_body['uk'] = """""" msg_home['uk'] = """""" msg_powered_by['uk'] = """""" msg_maintained_by['uk'] = """""" msg_last_updated['uk'] = """""" msg_narrow_search['uk'] = """""" msg_focus_on['uk'] = """""" msg_simple_search['uk'] = """""" msg_advanced_search['uk'] = """""" msg_account['uk'] = """""" msg_alerts['uk'] = """""" msg_baskets['uk'] = """""" msg_submissions['uk'] = """""" msg_approvals['uk'] = """""" msg_session['uk'] = """""" msg_login['uk'] = """""" msg_logout['uk'] = """""" msg_language['uk'] = """""" msg_browse['uk'] = """""" msg_search_records_for['uk'] = """""" msg_all_of_the_words['uk'] = """""" msg_any_of_the_words['uk'] = """""" msg_exact_phrase['uk'] = """""" msg_partial_phrase['uk'] = """""" msg_regular_expression['uk'] = """""" msg_and['uk'] = """""" msg_or['uk'] = """""" msg_and_not['uk'] = """""" msg_search_results['uk'] = """""" msg_try_your_search_on['uk'] = """""" msg_not_found_what_you_were_looking_for['uk'] = """""" msg_see_also_similar_author_names['uk'] = """""" msg_in['uk'] = """""" 
msg_sort_by['uk'] = """""" msg_latest_first['uk'] = """""" msg_ascending['uk'] = """""" msg_descending['uk'] = """""" msg_rank_by['uk'] = """""" msg_display_results['uk'] = """""" msg_results['uk'] = """""" msg_single_list['uk'] = """""" msg_split_by_collection['uk'] = """""" msg_output_format['uk'] = """""" msg_brief['uk'] = """""" msg_detailed['uk'] = """""" msg_detailed_record['uk'] = """""" msg_results_overview_found_x_records_in_y_seconds['uk'] = """""" msg_x_records_found['uk'] = """""" msg_jump_to_record['uk'] = """""" msg_search_took_x_seconds['uk'] = """""" msg_add_to_basket['uk'] = """""" msg_collections['uk'] = """""" msg_any_collection['uk'] = """""" msg_add_another_collection['uk'] = """""" msg_guest['uk'] = """""" msg_added_since['uk'] = """""" msg_until['uk'] = """""" msg_any_day['uk'] = """""" msg_any_month['uk'] = """""" msg_any_year['uk'] = """""" msg_january['uk'] = """""" msg_february['uk'] = """""" msg_march['uk'] = """""" msg_april['uk'] = """""" msg_may['uk'] = """""" msg_june['uk'] = """""" msg_july['uk'] = """""" msg_august['uk'] = """""" msg_september['uk'] = """""" msg_october['uk'] = """""" msg_november['uk'] = """""" msg_december['uk'] = """""" msg_search_term['uk'] = """""" msg_inside_index['uk'] = """""" msg_did_not_match['uk'] = """""" msg_no_words_index_available['uk'] = """""" msg_no_phrase_index_available['uk'] = """""" msg_no_hits_in_given_collection['uk'] = """""" msg_no_public_hits['uk'] = """""" msg_no_match_within_time_limits['uk'] = """""" msg_no_match_within_search_limits['uk'] = """""" msg_no_boolean_hits['uk'] = """""" msg_no_exact_match_for_foo_using_bar_instead['uk'] = """""" msg_internal_error['uk'] = """""" msg_please_contact_and_quote['uk'] = """""" msg_search_options['uk'] = """""" msg_latest_additions['uk'] = """""" msg_more['uk'] = """""" msg_collection_contains_no_records['uk'] = """""" msg_collection_restricted_content['uk'] = """""" msg_convert['uk'] = """""" msg_agenda['uk'] = """""" msg_webcast['uk'] = """""" 
msg_bulletin['uk'] = """""" msg_library['uk'] = """""" msg_this_page_in_languages['uk'] = """""" msg_admin_area['uk'] = """""" msg_administration['uk'] = """""" msg_thousands_separator['uk'] = """""" msg_record_deleted['uk'] = """""" msg_restricted['uk'] = """""" msg_record_last_modified['uk'] = """""" msg_similar_records['uk'] = """""" msg_cited_by['uk'] = """""" msg_cited_by_x_records['uk'] = """""" msg_cocited_with_x_records['uk'] = """""" msg_citation_history['uk'] = """""" msg_downloads_history['uk'] = """""" msg_people_who_viewed_this_page['uk'] = """""" msg_people_who_downloaded_this_document['uk'] = """""" ## I18N messages oriented functions: ## gettext load languages and set language stuff import gettext lang = {} for ln in <: print generate_language_list_for_python(); :>: lang[ln] = gettext.translation('cdsware', localedir, languages = [ln], fallback = True) def gettext_set_language(ln): """ Set the _ gettext function in every caller function Usage:: _ = gettext_set_language(ln) """ return lang[ln].gettext def wash_language(ln): """Look at LN and check if it is one of allowed languages for the interface. Return it in case of success, return the default language otherwise.""" if ln in <: print generate_language_list_for_python(); :>: return ln else: return '' def create_language_selection_box(urlargs="", language="en"): """Take URLARGS and LANGUAGE and return textual language selection box for the given page.""" out = "" for (lang, lang_namelong) in <: print generate_language_list_for_python("long"); :>: if lang == language: out += """ %s   """ % lang_namelong else: if urlargs: urlargs = sre.sub(r'ln=.*?(&|$)', '', urlargs) if urlargs: if urlargs.endswith('&'): urlargs += "ln=%s" % lang else: urlargs += "&ln=%s" % lang else: urlargs = "ln=%s" % lang out += """ %s   """ % (urlargs, lang_namelong) return msg_this_page_in_languages[language] + "
" + out def language_list_long(): return <: print generate_language_list_for_python("long"); :> diff --git a/modules/webaccess/bin/authaction.in b/modules/webaccess/bin/authaction.in index aa799f97b..fef57b66d 100644 --- a/modules/webaccess/bin/authaction.in +++ b/modules/webaccess/bin/authaction.in @@ -1,96 +1,94 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
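The argument convention checked by authaction's main() can be sketched as a standalone function. main() expects a user ID and an action name, followed by optional keyword/value pairs, so a valid argv has an odd length of at least 3. This is a minimal sketch only: the function name `parse_authaction_args` and the example action and keyword names are hypothetical and not part of CDSware. It is written so that it runs under both Python 2 and Python 3.

```python
def parse_authaction_args(argv):
    """Split an authaction-style argv into (id_user, name_action, keywords).

    Hypothetical helper mirroring the check in authaction's main():
    argv[0] is the program name, argv[1] the user ID, argv[2] the action
    name, and every further argument must belong to a keyword/value pair,
    so a valid argv has an odd length of at least 3.  Returns None for a
    malformed argv (authaction itself reports warning 7 in that case).
    """
    if len(argv) < 3 or len(argv) % 2 == 0:
        return None
    id_user, name_action = argv[1], argv[2]
    keywords = {}
    # Pair argv[3] with argv[4], argv[5] with argv[6], and so on.
    for i in range(3, len(argv), 2):
        keywords[argv[i]] = argv[i + 1]
    return id_user, name_action, keywords


if __name__ == "__main__":
    # Illustrative values only; "submit" and "doctype" are example names.
    print(parse_authaction_args(["authaction", "1", "submit", "doctype", "ARTICLE"]))
    print(parse_authaction_args(["authaction", "1", "submit", "doctype"]))
```

main() then passes the resulting dictionary on as keyword arguments, in the form `acc_authorize_action(id_user, name_action, **keywords)`.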
"""authaction -- CLI interface to Access Control Engine""" __version__ = "$Id$" try: import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import * from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_config import cfg_webaccess_warning_msgs except ImportError, e: print "Error: %s" % e import sys sys.exit(1) def usage(code, msg=''): """Print usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("authaction -- CLI interface to Access Control Engine\n") sys.stderr.write("Usage: %s [options] [keyword1] [value1] [keyword2] [value2] ...\n" % sys.argv[0]) sys.stderr.write("Command options:\n") sys.stderr.write(" = ID of the user\n") sys.stderr.write(" = action name\n") sys.stderr.write(" [keyword1] = optional first keyword argument\n") sys.stderr.write(" [value1] = its value\n") sys.stderr.write(" [keyword2] = optional second keyword argument\n") sys.stderr.write(" [value2] = its value\n") sys.stderr.write(" ... = et caetera\n") sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.exit(code) def main(): """CLI to acc_authorize_action. The function finds the needed arguments in sys.argv. If the number of arguments is wrong it prints and returns warning message 7. Otherwise it prints and returns the 'CODE - MESSAGE' string from acc_authorize_action, where CODE 0 means success.
""" alen, auth = len(sys.argv), 0 # return ``not permitted'' if wrong arguments if alen > 1 and sys.argv[1] in ["-h", "--help"]: usage(0) elif alen > 1 and sys.argv[1] in ["-V", "--version"]: sys.stderr.write("%s\n" % __version__) sys.exit(0) if alen < 3 or alen % 2 == 0: print "7 - %s" % cfg_webaccess_warning_msgs[7] return "7 - %s" % cfg_webaccess_warning_msgs[7] # try to authorize else: # get values id_user = sys.argv[1] name_action = sys.argv[2] dict = {} for i in range(3, alen, 2): dict[sys.argv[i]] = sys.argv[i + 1] # run ace-function (auth_code, auth_message) = acc_authorize_action(id_user, name_action, **dict) # print and return print "%s - %s" % (auth_code, auth_message) return "%s - %s" % (auth_code, auth_message) if __name__ == '__main__': main() diff --git a/modules/webaccess/bin/webaccessadmin.in b/modules/webaccess/bin/webaccessadmin.in index 088beae9a..607a19186 100644 --- a/modules/webaccess/bin/webaccessadmin.in +++ b/modules/webaccess/bin/webaccessadmin.in @@ -1,108 +1,106 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""WebAccess Admin -- reset or add default authorization settings""" __version__ = "$Id$" try: import getpass import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import supportemail from cdsware.access_control_admin import acc_reset_default_settings from cdsware.access_control_admin import acc_add_default_settings from cdsware.dbquery import run_sql except ImportError, e: print "Error: %s" % e import sys sys.exit(1) def usage(code, msg=''): """Print usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("WebAccess Admin -- reset or add default authorization settings\n") sys.stderr.write("Usage: %s [options] \n" % sys.argv[0]) sys.stderr.write("Command options:\n") sys.stderr.write(" = reset-default-settings\n") sys.stderr.write(" = add-default-settings\n") sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.exit(code) def main(): """CLI to acc_reset_default_settings and acc_add_default_settings. The function finds the needed arguments in sys.argv. If the arguments are wrong it prints help. Return 1 on success, 0 on failure.
""" alen = len(sys.argv) action = '' # print help if wrong arguments if alen > 1 and sys.argv[1] in ["-h", "--help"]: usage(0) elif alen > 1 and sys.argv[1] in ["-V", "--version"]: print __version__ sys.exit(0) if alen != 2 or sys.argv[1] not in ['reset-default-settings', 'add-default-settings']: usage(1) # getting input from user print 'User: ', user = raw_input() password = getpass.getpass() # validating input perform = 0 # check password if user == supportemail: perform = run_sql("""select * from user where email = '%s' and password = '%s' """ % (supportemail, password)) and 1 or 0 if not perform: # wrong password or user not recognized print 'User not authorized' return perform # perform chosen action, add all users above as superusers if sys.argv[1] == 'reset-default-settings': action = 'reset' acc_reset_default_settings([supportemail]) elif sys.argv[1] == 'add-default-settings': action = 'added' acc_add_default_settings([supportemail]) # notify of success if action: print '\nOkay, the default authorization settings have been __%s__.' % (action, ) else: print 'Requested action failed.' return perform if __name__ == '__main__': main() diff --git a/modules/webaccess/lib/access_control_admin.py b/modules/webaccess/lib/access_control_admin.py index 6bce0be1c..d75ac246b 100644 --- a/modules/webaccess/lib/access_control_admin.py +++ b/modules/webaccess/lib/access_control_admin.py @@ -1,1588 +1,1588 @@ # $Id$ ## CDSware Access Control Engine in mod_python. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware Access Control Admin.""" __version__ = "$Id$" # check this: def acc_addUserRole(id_user, id_role=0, name_role=0): ## import interesting modules: import sys import time -from access_control_engine import acc_authorize_action -from access_control_config import * -from config import * -from dbquery import run_sql from MySQLdb import ProgrammingError +from cdsware.access_control_engine import acc_authorize_action +from cdsware.access_control_config import * +from cdsware.config import * +from cdsware.dbquery import run_sql # ACTIONS def acc_addAction(name_action='', description='', optional='no', *allowedkeywords): """function to create new entry in accACTION for an action name_action - name of the new action, must be unique keyvalstr - string with allowed keywords allowedkeywords - a list of allowedkeywords keyvalstr and allowedkeywordsdict cannot be in use simultaneously success -> return id_action, name_action, description and allowedkeywords failure -> return 0 """ keystr = '' # action with this name already exists, return 0 if run_sql("""SELECT * FROM accACTION WHERE name = '%s'""" % (name_action, )): return 0 # create keyword string for value in allowedkeywords: if keystr: keystr += ',' keystr += value if not allowedkeywords: optional = 'no' # insert the new entry try: res = run_sql("""INSERT INTO accACTION (name, description, allowedkeywords, optional) VALUES ('%s', '%s', '%s', '%s')""" % (name_action, description, keystr, optional)) except ProgrammingError: return 0 if res: return res, name_action,
description, keystr, optional return 0 def acc_deleteAction(id_action=0, name_action=0): """delete action in accACTION according to id, or secondly name. entries in accROLE_accACTION_accARGUMENT will also be removed. id_action - id of action to be deleted, preferred variable name_action - this is used if id_action is not given if the name or id is wrong, the function does nothing """ if id_action and name_action: return 0 # delete the action if run_sql("""DELETE FROM accACTION WHERE id = %s OR name = '%s'""" % (id_action, name_action)): # delete all entries related return 1 + run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accACTION = %s """ % (id_action, )) else: return 0 def acc_verifyAction(name_action='', description='', allowedkeywords='', optional=''): """check if all the values of a given action are the same as those in accACTION in the database. self-explanatory parameters. return id if identical, 0 if not. """ id_action = acc_getActionId(name_action=name_action) if not id_action: return 0 res_desc = acc_getActionDescription(id_action=id_action) res_keys = acc_getActionKeywordsString(id_action=id_action) bool_desc = res_desc == description and 1 or 0 bool_keys = res_keys == allowedkeywords and 1 or 0 bool_opti = acc_getActionIsOptional(id_action=id_action) return bool_desc and bool_keys and bool_opti and id_action or 0 def acc_updateAction(id_action=0, name_action='', verbose=0, **update): """try to change the values of given action details. if there is no change nothing is done. some changes require updating other parts of the database.
       id_action - id of the action to change

     name_action - if no id_action is given try to find it using this name

        **update - dictionary containg keywords: description,
                                                  allowedkeywords and/or
                                                  optional
                   other keywords are ignored """

     id_action = id_action or acc_getActionId(name_action=name_action)

     if not id_action: return 0

     try:
         if update.has_key('description'):
             # change the description, no other effects
             if verbose: print 'desc'
             run_sql("""UPDATE accACTION SET description = '%s' WHERE id = %s""" % (update['description'], id_action))

         if update.has_key('allowedkeywords'):
             # change allowedkeywords
             if verbose: print 'keys'
             # check if changing allowedkeywords or not
             if run_sql("""SELECT * FROM accACTION WHERE id = %s AND allowedkeywords != '%s' """ % (id_action, update['allowedkeywords'])):
                 # change allowedkeywords
                 if verbose: print ' changing'
                 run_sql("""UPDATE accACTION SET allowedkeywords = '%s' WHERE id = %s""" % (update['allowedkeywords'], id_action))
                 # delete entries, but keep optional authorizations if there still is keywords
                 if verbose: print ' deleting auths'
                 run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accACTION = %s %s """ % (id_action, update['allowedkeywords'] and 'AND id_accARGUMENT != -1' or ''))

         if update.has_key('optional'):
             # check if there changing optional or not
             if verbose: print 'optional'
             if run_sql("""SELECT * FROM accACTION WHERE id = %s AND optional != '%s' """ % (id_action, update['optional'])):
                 # change optional
                 if verbose: print ' changing'
                 run_sql("""UPDATE accACTION SET optional = '%s' WHERE id = %s""" % (update['optional'], id_action))
                 # setting it to no, delete authorizations with optional arguments
                 if update['optional'] == 'no':
                     if verbose: print ' deleting optional'
                     run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accACTION = %s AND id_accARGUMENT = -1 AND argumentlistid = -1 """ % (id_action, ))

     except ProgrammingError:
         return 0

     return 1


 # ROLES

 def acc_addRole(name_role, description):
     """add a new role to
     accROLE in the database.

       name_role - name of the role, must be unique

     description - text to describe the role"""

     if not run_sql("""SELECT * FROM accROLE WHERE name = '%s'""" % (name_role, )):
         res = run_sql("""INSERT INTO accROLE (name, description) VALUES ('%s', '%s') """ % (name_role, description))
         return res, name_role, description

     return 0


 def acc_isRole(name_action,**arguments):
     """ check whether the role which allows action name_action on arguments exists

     action_name - name of the action

       arguments - arguments for authorization"""

     # first check if an action exists with this name
     query1 = """select a.id, a.allowedkeywords, a.optional
                 from accACTION a
                 where a.name = '%s'""" % (name_action)

     try:
         id_action, aallowedkeywords, optional = run_sql(query1)[0]
     except (ProgrammingError, IndexError):
         return 0

     defkeys = aallowedkeywords.split(',')
     for key in arguments.keys():
         if key not in defkeys: return 0

     # then check if a role giving this authorization exists
     # create dictionary with default values and replace entries from input arguments
     defdict = {}

     for key in defkeys:
         try: defdict[key] = arguments[key]
         except KeyError: return 0 # all keywords must be present
         # except KeyError: defdict[key] = 'x' # default value, this is not in use...
     # create or-string from arguments
     str_args = ''
     for key in defkeys:
         if str_args: str_args += ' OR '
         str_args += """(arg.keyword = '%s' AND arg.value = '%s')""" % (key, defdict[key])

     query4 = """SELECT DISTINCT raa.id_accROLE, raa.id_accACTION, raa.argumentlistid,
                 raa.id_accARGUMENT, arg.keyword, arg.value
                 FROM accROLE_accACTION_accARGUMENT raa, accARGUMENT arg
                 WHERE raa.id_accACTION = %s AND
                 (%s) AND
                 raa.id_accARGUMENT = arg.id """ % (id_action, str_args)

     try: res4 = run_sql(query4)
     except ProgrammingError: return 0

     if not res4: return 0 # no entries at all

     res5 = []
     for res in res4: res5.append(res)
     res5.sort()

     if len(defdict) == 1: return 1

     cur_role = cur_action = cur_arglistid = 0

     booldict = {}
     for key in defkeys: booldict[key] = 0

     # run through the results
     for (role, action, arglistid, arg, keyword, val) in res5 + [(-1, -1, -1, -1, -1, -1)]:
         # not the same role or argumentlist (authorization group), i.e. check if thing are satisfied
         # if cur_arglistid != arglistid or cur_role != role or cur_action != action:
         if (cur_arglistid, cur_role, cur_action) != (arglistid, role, action):
             # test if all keywords are satisfied
             for value in booldict.values():
                 if not value: break
             else:
                 return 1 # USER AUTHENTICATED TO PERFORM ACTION

             # assign the values for the current tuple from the query
             cur_arglistid, cur_role, cur_action = arglistid, role, action

             for key in booldict.keys():
                 booldict[key] = 0

         # set keyword qualified for the action, (whatever result of the test)
         booldict[keyword] = 1

     # matching failed
     return 0


 def acc_deleteRole(id_role=0, name_role=0):
     """ delete role entry in table accROLE and all references from other tables.
       id_role - id of role to be deleted, prefered variable

     name_role - this is used if id_role is not given """

     count = 0
     id_role = id_role or acc_getRoleId(name_role=name_role)

     # try to delete
     if run_sql("""DELETE FROM accROLE WHERE id = %s """ % (id_role, )):

         # delete everything related
         # authorization entries
         count += 1 + run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s""" % (id_role, ))
         # connected users
         count += run_sql("""DELETE FROM user_accROLE WHERE id_accROLE = %s """ % (id_role, ))

         # delegated rights over the role
         rolenames = run_sql("""SELECT name FROM accROLE""")
         # string of rolenames
         roles_str = ''
         for (name, ) in rolenames: roles_str += (roles_str and ',' or '') + '"%s"' % (name, )
         # arguments with non existing rolenames
         not_valid = run_sql("""SELECT ar.id FROM accARGUMENT ar WHERE keyword = 'role' AND value NOT IN (%s)""" % (roles_str, ))
         if not_valid:
             nv_str = ''
             for (id, ) in not_valid: nv_str += (nv_str and ',' or '') + '%s' % (id, )
             # delete entries
             count += run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accACTION = %s AND id_accARGUMENT IN (%s) """ % (acc_getActionId(name_action=DELEGATEADDUSERROLE), nv_str))

     # return number of deletes
     return count


 def acc_updateRole(id_role=0, name_role='', verbose=0, description=''):
     """try to change the description.

         id_role - id of the role to change

       name_role - use this to find id if not present

         verbose - extra output

     description - new description """

     id_role = id_role or acc_getRoleId(name_role=name_role)

     if not id_role: return 0

     return run_sql("""UPDATE accROLE SET description = '%s' WHERE id = %s AND description != '%s' """ % (description, id_role, description))


 # CONNECTIONS BETWEEN USER AND ROLE

 def acc_addUserRole(id_user=0, id_role=0, email='', name_role=''):
     """ this function adds a new entry to table user_accROLE and returns it

     id_user, id_role - self explanatory

                email - email of the user

            name_role - name of the role, to be used instead of id.
     """

     id_user = id_user or acc_getUserId(email=email)
     id_role = id_role or acc_getRoleId(name_role=name_role)

     # check if the id_role exists
     if id_role and not acc_getRoleName(id_role=id_role): return 0

     # check that the user actually exist
     if not acc_getUserEmail(id_user=id_user): return 0

     # control if existing entry
     if run_sql("""SELECT * FROM user_accROLE WHERE id_user = %s AND id_accROLE = %s""" % (id_user, id_role)):
         return id_user, id_role, 0
     else:
         run_sql("""INSERT INTO user_accROLE (id_user, id_accROLE) VALUES (%s, %s) """ % (id_user, id_role))
         return id_user, id_role, 1


 def acc_deleteUserRole(id_user, id_role=0, name_role=0):
     """ function deletes entry from user_accROLE and reports the success.

       id_user - user in database

       id_role - role in the database, prefered parameter

     name_role - can also delete role on background of role name. """

     # need to find id of the role
     id_role = id_role or acc_getRoleId(name_role=name_role)

     # number of deleted entries will be returned (0 or 1)
     return run_sql("""DELETE FROM user_accROLE WHERE id_user = %s AND id_accROLE = %s """ % (id_user, id_role))


 # ARGUMENTS

 def acc_addArgument(keyword='', value=''):
     """ function to insert an argument into table accARGUMENT.
     if it exists the old id is returned, if it does not the entry is
     created and the new id is returned.

     keyword - inserted in keyword column

       value - inserted in value column. """

     # if one of the values are missing, return 0
     if not keyword or not value: return 0

     # try to return id of existing argument
     try: return run_sql("""SELECT id from accARGUMENT where keyword = '%s' and value = '%s'""" % (keyword, value))[0][0]
     # return id of newly added argument
     except IndexError: return run_sql("""INSERT INTO accARGUMENT (keyword, value) values ('%s', '%s') """ % (keyword, value))


 def acc_deleteArgument(id_argument):
     """ functions deletes one entry in table accARGUMENT.
     the success of the operation is returned.
     id_argument - id of the argument to be deleted"""

     # return number of deleted entries, 1 or 0
     return run_sql("""DELETE FROM accARGUMENT WHERE id = %s """ % (id_argument, ))


 def acc_deleteArgument_names(keyword='', value=''):
     """delete argument according to keyword and value,
     send call to another function..."""

     # one of the values is missing
     if not keyword or not value: return 0

     # find id of the entry
     try: return run_sql("""SELECT id from accARGUMENT where keyword = '%s' and value = '%s'""" % (keyword, value))[0][0]
     except IndexError: return 0


 # AUTHORIZATIONS

 # ADD WITH names and keyval list

 def acc_addAuthorization(name_role='', name_action='', optional=0, **keyval):
     """ function inserts entries in accROLE_accACTION_accARGUMENT if all
     references are valid.
     this function is made specially for the webaccessadmin web interface.
     always inserting only one authorization.

     id_role, id_action - self explanatory, preferably used

     name_role, name_action - self explanatory, used if id not given

     optional - if this is set to 1, check that function can have optional
                arguments and add with arglistid -1 and id_argument -1

     **keyval - dictionary of keyword=value pairs, used to find ids.
     """

     inserted = []

     # check that role and action exist
     id_role = run_sql("""SELECT id FROM accROLE where name = '%s'""" % (name_role, ))
     action_details = run_sql("""SELECT * from accACTION where name = '%s' """ % (name_action, ))
     if not id_role or not action_details: return []

     # get role id and action id and details
     id_role, id_action = id_role[0][0], action_details[0][0]
     allowedkeywords_str = action_details[0][3]

     allowedkeywords_lst = acc_getActionKeywords(id_action=id_action)
     optional_action = action_details[0][4] == 'yes' and 1 or 0
     optional = int(optional)

     # this action does not take arguments
     if not optional and not keyval:
         # can not add if user is doing a mistake
         if allowedkeywords_str: return []
         # check if entry exists
         if not run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND argumentlistid = %s AND id_accARGUMENT = %s""" % (id_role, id_action, 0, 0)):
             # insert new authorization
             run_sql("""INSERT INTO accROLE_accACTION_accARGUMENT values (%s, %s, %s, %s)""" % (id_role, id_action, 0, 0))
             return [[id_role, id_action, 0, 0], ]
         return []

     # try to add authorization without the optional arguments
     elif optional:
         # optional not allowed for this action
         if not optional_action: return []
         # check if authorization already exists
         if not run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND id_accARGUMENT = -1 AND argumentlistid = -1""" % (id_role, id_action, )):
             # insert new authorization
             run_sql("""INSERT INTO accROLE_accACTION_accARGUMENT (id_accROLE, id_accACTION, id_accARGUMENT, argumentlistid) VALUES (%s, %s, -1, -1) """ % (id_role, id_action))
             return [[id_role, id_action, -1, -1], ]
         return []

     else:
         # regular authorization

         # get list of ids, if they don't exist, create arguments
         id_arguments = []
         argstr = ''
         for key in keyval.keys():
             if key not in allowedkeywords_lst: return []
             id_argument = (acc_getArgumentId(key, keyval[key])
                            or
                            run_sql("""INSERT INTO accARGUMENT
                            (keyword, value) values ('%s', '%s') """ % (key, keyval[key])))
             id_arguments.append(id_argument)
             argstr += (argstr and ',' or '') + str(id_argument)

         # check if equal authorization exists
         for (id_trav, ) in run_sql("""SELECT DISTINCT argumentlistid FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' """% (id_role, id_action)):
             listlength = run_sql("""SELECT COUNT(*) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' AND argumentlistid = '%s' AND id_accARGUMENT IN (%s) """ % (id_role, id_action, id_trav, argstr))[0][0]
             notlist = run_sql("""SELECT COUNT(*) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' AND argumentlistid = '%s' AND id_accARGUMENT NOT IN (%s) """ % (id_role, id_action, id_trav, argstr))[0][0]
             # this means that a duplicate already exists
             if not notlist and listlength == len(id_arguments): return []

         # find new arglistid, highest + 1
         try:
             arglistid = 1 + run_sql("""SELECT MAX(argumentlistid) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s """ % (id_role, id_action))[0][0]
         except (IndexError, TypeError):
             arglistid = 1
         if arglistid <= 0: arglistid = 1

         # insert
         for id_argument in id_arguments:
             run_sql("""INSERT INTO accROLE_accACTION_accARGUMENT values (%s, %s, %s, %s) """ % (id_role, id_action, id_argument, arglistid))
             inserted.append([id_role, id_action, id_argument, arglistid])

     return inserted


 def acc_addRoleActionArguments(id_role=0, id_action=0, arglistid=-1, optional=0, verbose=0, id_arguments=[]):
     """ function inserts entries in accROLE_accACTION_accARGUMENT if all
     references are valid.
     id_role, id_action - self explanatory

     arglistid - argumentlistid for the inserted entries
                 if -1: create new group
                 other values: add to this group, if it exists or not

     optional - if this is set to 1, check that function can have
                optional arguments and add with arglistid -1 and
                id_argument -1

     verbose - extra output

     id_arguments - list of arguments to add to group."""

     inserted = []

     if verbose: print 'ids: starting'
     if verbose: print 'ids: checking ids'

     # check that all the ids are valid and reference something...
     if not run_sql("""SELECT * FROM accROLE WHERE id = %s""" % (id_role, )):
         return 0

     if verbose: print 'ids: get allowed keywords'
     # check action exist and get allowed keywords
     try:
         allowedkeys = acc_getActionKeywords(id_action=id_action)
         # allowedkeys = run_sql("""SELECT * FROM accACTION WHERE id = %s""" % (id_action, ))[0][3].split(',')
     except (IndexError, AttributeError):
         return 0

     if verbose: print 'ids: is it optional'
     # action with optional arguments
     if optional:
         if verbose: print 'ids: yes - optional'
         if not acc_getActionIsOptional(id_action=id_action):
             return []

         if verbose: print 'ids: run query to check if exists'
         if not run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND id_accARGUMENT = -1 AND argumentlistid = -1""" % (id_role, id_action, )):
             if verbose: print 'ids: does not exist'
             run_sql("""INSERT INTO accROLE_accACTION_accARGUMENT (id_accROLE, id_accACTION, id_accARGUMENT, argumentlistid) VALUES (%s, %s, -1, -1) """ % (id_role, id_action))
             return ((id_role, id_action, -1, -1), )
         if verbose: print 'ids: exists'
         return []

     if verbose: print 'ids: check if not arguments'
     # action without arguments
     if not allowedkeys:
         if verbose: print 'ids: not arguments'
         if not run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND argumentlistid = %s AND id_accARGUMENT = %s""" % (id_role, id_action, 0, 0)):
             if verbose: print 'ids: try to insert'
             result = run_sql("""INSERT INTO accROLE_accACTION_accARGUMENT values (%s, %s, %s, %s)""" % (id_role, id_action, 0, 0))
             return ((id_role, id_action, 0, 0), )
         else:
             if verbose: print 'ids: already existed'
             return 0
     else:
         if verbose: print 'ids: arguments exist'
         argstr = ''
         # check that the argument exists, and that it is a valid key
         if verbose: print 'ids: checking all the arguments'
         for id_argument in id_arguments:
             res_arg = run_sql("""SELECT * FROM accARGUMENT WHERE id = %s""" % (id_argument, ))
             if not res_arg or res_arg[0][1] not in allowedkeys:
                 return 0
             else:
                 if argstr: argstr += ','
                 argstr += '%s' % (id_argument, )

         # arglistid = -1 means that the user wants a new group
         if verbose: print 'ids: find arglistid'
         if arglistid < 0:
             # check if such single group already exists
             for (id_trav, ) in run_sql("""SELECT DISTINCT argumentlistid FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' """ % (id_role, id_action)):
                 listlength = run_sql("""SELECT COUNT(*) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' AND argumentlistid = '%s' AND id_accARGUMENT IN (%s) """ % (id_role, id_action, id_trav, argstr))[0][0]
                 notlist = run_sql("""SELECT COUNT(*) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' AND argumentlistid = '%s' AND id_accARGUMENT NOT IN (%s) """ % (id_role, id_action, id_trav, argstr))[0][0]
                 # this means that a duplicate already exists
                 if not notlist and listlength == len(id_arguments): return 0

             # find new arglistid
             try:
                 arglistid = run_sql("""SELECT MAX(argumentlistid) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s """ % (id_role, id_action))[0][0] + 1
             except ProgrammingError:
                 return 0
             except (IndexError, TypeError):
                 arglistid = 1

             if arglistid <= 0: arglistid = 1

         if verbose: print 'ids: insert all the entries'
         # all references are valid, insert: one entry in raa for each argument
         for id_argument in id_arguments:
             if not run_sql("""SELECT * FROM
                 accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND id_accARGUMENT = %s AND argumentlistid = %s""" % (id_role, id_action, id_argument, arglistid)):
                 run_sql("""INSERT INTO accROLE_accACTION_accARGUMENT (id_accROLE, id_accACTION, id_accARGUMENT, argumentlistid) VALUES (%s, %s, %s, %s) """ % (id_role, id_action, id_argument, arglistid))
                 inserted.append((id_role, id_action, id_argument, arglistid))
         # [(r, ac, ar1, aid), (r, ac, ar2, aid)]

     if verbose:
         print 'ids: inside add function'
         for r in acc_findPossibleActions(id_role=id_role, id_action=id_action):
             print 'ids: ', r

     return inserted


 def acc_addRoleActionArguments_names(name_role='', name_action='', arglistid=-1, optional=0, verbose=0, **keyval):
     """ this function makes it possible to pass names when creating new entries instead of ids.
     get ids for all the names,
     create entries in accARGUMENT that does not exist,
     pass on to id based function.

     name_role, name_action - self explanatory

     arglistid - add entries to or create group with arglistid, default -1 create new.

     optional - create entry with optional keywords, **keyval is ignored, but should be empty

     verbose - used to print extra information

     **keyval - dictionary of keyword=value pairs, used to find ids.
     """

     if verbose: print 'names: starting'
     if verbose: print 'names: checking ids'

     # find id of the role, return 0 if it doesn't exist
     id_role = run_sql("""SELECT id FROM accROLE where name = '%s'""" % (name_role, ))
     if id_role: id_role = id_role[0][0]
     else: return 0

     # find id of the action, return 0 if it doesn't exist
     res = run_sql("""SELECT * from accACTION where name = '%s'""" % (name_action, ))
     if res: id_action = res[0][0]
     else: return 0

     if verbose: print 'names: checking arguments'
     id_arguments = []
     if not optional:
         if verbose: print 'names: not optional'
         # place to keep ids of arguments and list of allowed keywords
         allowedkeys = acc_getActionKeywords(id_action=id_action) # res[0][3].split(',')

         # find all the id_arguments and create those that does not exist
         for key in keyval.keys():
             # this key does not exist
             if key not in allowedkeys:
                 return 0

             id_argument = acc_getArgumentId(key, keyval[key])
             id_argument = id_argument or run_sql("""INSERT INTO accARGUMENT (keyword, value) values ('%s', '%s') """ % (key, keyval[key]))

             id_arguments.append(id_argument) # append the id to the list
     else:
         if verbose: print 'names: optional'

     # use the other function
     return acc_addRoleActionArguments(id_role=id_role,
                                       id_action=id_action,
                                       arglistid=arglistid,
                                       optional=optional,
                                       verbose=verbose,
                                       id_arguments=id_arguments)


 # DELETE WITH ID OR NAMES

 def acc_deleteRoleActionArguments(id_role, id_action, arglistid=1, auths=[[]]):
     """delete all entries in accROLE_accACTION_accARGUMENT that satisfy the parameters.
     return number of actual deletes.

     this function relies on the id-lists in auths to have the same order has the possible actions...

     id_role, id_action - self explanatory

     arglistid - group to delete from.
     if more entries than deletes, split the group before delete.
     id_arguments - list of ids to delete."""

     keepauths = [] # these will be kept
     # find all possible actions
     pas = acc_findPossibleActions_ids(id_role, id_action)
     header = pas[0]
     # decide which to keep or throw away

     for pa in pas[1:]:
         if pa[0] == arglistid and pa[1:] not in auths:
             keepauths.append(pa[1:])

     # delete everything
     run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND argumentlistid = %s """ % (id_role, id_action, arglistid))

     # insert those to be kept
     for auth in keepauths:
         acc_addRoleActionArguments(id_role=id_role,
                                    id_action=id_action,
                                    arglistid=-1,
                                    id_arguments=auth)

     return 1


 def acc_deleteRoleActionArguments_names(name_role='', name_action='', arglistid=1, **keyval):
     """utilize the function on ids by first finding all ids and redirecting the function call.
     break of and return 0 if any of the ids can't be found.

     name_role = name of the role

     name_action - name of the action

     arglistid - the argumentlistid, all keyword=value pairs must be in this same group.
     **keyval - dictionary of keyword=value pairs for the arguments."""

     # find ids for role and action
     id_role = acc_getRoleId(name_role=name_role)
     id_action = acc_getActionId(name_action=name_action)

     # create string with the ids
     idstr = ''
     idlist = []
     for key in keyval.keys():
         id = acc_getArgumentId(key, keyval[key])
         if not id: return 0

         if idstr: idstr += ','
         idstr += '%s' % id
         idlist.append(id)

     # control that a fitting group exists
     try:
         count = run_sql("""SELECT COUNT(*) FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND argumentlistid = %s AND id_accARGUMENT IN (%s)""" % (id_role, id_action, arglistid, idstr))[0][0]
     except IndexError:
         return 0

     if count < len(keyval): return 0

     # call id based function
     return acc_deleteRoleActionArguments(id_role, id_action, arglistid, [idlist])


 def acc_deleteRoleActionArguments_group(id_role=0, id_action=0, arglistid=0):
     """delete entire group of arguments for connection between
     role and action."""

     if not id_role or not id_action: return []

     return run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND argumentlistid = %s """ % (id_role, id_action, arglistid))


 def acc_deletePossibleActions(id_role=0, id_action=0, authids=[]):
     """delete authorizations in selected rows. utilization of the
     delete function.

     id_role - id of role to be connected to action.

     id_action - id of action to be connected to role

     authids - list of row indexes to be removed.
     """

     # find all authorizations
     pas = acc_findPossibleActions(id_role=id_role, id_action=id_action)

     # get the keys
     keys = pas[0][1:]

     # create dictionary for all the argumentlistids
     ald = {}
     for authid in authids:
         if authid > len(pas):
             return authid, len(pas)

         # get info from possible action
         id = pas[authid][0]
         values = pas[authid][1:]

         # create list of authids for each authorization
         auth = [acc_getArgumentId(keys[0], values[0])]
         for i in range(1, len(keys)):
             auth.append(acc_getArgumentId(keys[i], values[i]))

         # create entries in the dictionary for each argumentlistid
         try: ald[id].append(auth)
         except KeyError: ald[id] = [auth]

     # do the deletes
     result = 1
     for key in ald.keys():
         result = 1 and acc_deleteRoleActionArguments(id_role=id_role,
                                                      id_action=id_action,
                                                      arglistid=key,
                                                      auths=ald[key])
     return result


 def acc_deleteRoleAction(id_role=0, id_action=0):
     """delete all connections between a role and an action. """

     count = run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = '%s' AND id_accACTION = '%s' """ % (id_role, id_action))

     return count


 # GET FUNCTIONS

 # ACTION RELATED

 def acc_getActionId(name_action):
     """get id of action when name is given

     name_action - name of the wanted action"""

     try: return run_sql("""SELECT id FROM accACTION WHERE name = '%s'""" % (name_action, ))[0][0]
     except IndexError: return 0


 def acc_getActionName(id_action):
     """get name of action when id is given. """

     try:
         return run_sql("""SELECT name FROM accACTION WHERE id = %s""" % (id_action, ))[0][0]
     except (ProgrammingError, IndexError):
         return ''


 def acc_getActionDescription(id_action):
     """get description of action when id is given. """

     try:
         return run_sql("""SELECT description FROM accACTION WHERE id = %s""" % (id_action, ))[0][0]
     except (ProgrammingError, IndexError):
         return ''


 def acc_getActionKeywords(id_action=0, name_action=''):
     """get list of keywords for action when id is given.
     empty list if no keywords."""

     result = acc_getActionKeywordsString(id_action=id_action, name_action=name_action)

     if result: return result.split(',')
     else: return []


 def acc_getActionKeywordsString(id_action=0, name_action=''):
     """get keywordstring when id is given. """

     id_action = id_action or acc_getActionId(name_action)
     try: result = run_sql("""SELECT allowedkeywords from accACTION where id = %s """ % (id_action, ))[0][0]
     except IndexError: return ''

     return result


 def acc_getActionIsOptional(id_action=0):
     """get if the action arguments are optional or not.
     return 1 if yes, 0 if no."""

     result = acc_getActionOptional(id_action=id_action)
     return result == 'yes' and 1 or 0


 def acc_getActionOptional(id_action=0):
     """get if the action arguments are optional or not.
     return result, but 0 if action does not exist. """

     try: result = run_sql("""SELECT optional from accACTION where id = %s """ % (id_action, ))[0][0]
     except IndexError: return 0

     return result


 def acc_getActionDetails(id_action=0):
     """get all the fields for an action."""

     details = []
     try: result = run_sql("""SELECT * FROM accACTION WHERE id = %s """ % (id_action, ))[0]
     except IndexError: return details

     if result:
         for r in result: details.append(r)
     return details


 def acc_getAllActions():
     """returns all entries in accACTION."""
     return run_sql("""SELECT a.id, a.name, a.description FROM accACTION a ORDER BY a.name""")


 def acc_getActionRoles(id_action):
     return run_sql("""SELECT DISTINCT(r.id), r.name, r.description
                       FROM accROLE_accACTION_accARGUMENT raa LEFT JOIN accROLE r
                       ON raa.id_accROLE = r.id
                       WHERE raa.id_accACTION = %s
                       ORDER BY r.name """ % (id_action, ))


 # ROLE RELATED

 def acc_getRoleId(name_role):
     """get id of role, name given. """

     try: return run_sql("""SELECT id FROM accROLE WHERE name = %s""", (name_role, ))[0][0]
     except IndexError: return 0


 def acc_getRoleName(id_role):
     """get name of role, id given.
     """

     try: return run_sql("""SELECT name FROM accROLE WHERE id = %s""", (id_role, ))[0][0]
     except IndexError: return ''


 def acc_getRoleDetails(id_role=0):
     """get all the fields for an action."""

     details = []
     try: result = run_sql("""SELECT * FROM accROLE WHERE id = %s """, (id_role, ))[0]
     except IndexError: return details

     if result:
         for r in result: details.append(r)
     return details


 def acc_getAllRoles():
     """get all entries in accROLE."""
     return run_sql("""SELECT r.id, r.name, r.description FROM accROLE r ORDER BY r.name""")


 def acc_getRoleActions(id_role):
     """get all actions connected to a role. """

     return run_sql("""SELECT DISTINCT(a.id), a.name, a.description
                       FROM accROLE_accACTION_accARGUMENT raa, accACTION a
                       WHERE raa.id_accROLE = %s and
                             raa.id_accACTION = a.id
                       ORDER BY a.name """, (id_role, ))


 def acc_getRoleUsers(id_role):
     """get all users that have access to a role. """

     return run_sql("""SELECT DISTINCT(u.id), u.email, u.settings
                       FROM user_accROLE ur, user u
                       WHERE ur.id_accROLE = %s AND
                             u.id = ur.id_user
                       ORDER BY u.email""", (id_role, ))


 # ARGUMENT RELATED

 def acc_getArgumentId(keyword, value):
     """get id of argument, keyword=value pair given.
     value = 'optional value' is replaced for id_accARGUMENT = -1."""

     try:
         return run_sql("""SELECT DISTINCT id FROM accARGUMENT WHERE keyword = %s and value = %s""", (keyword, value))[0][0]
     except IndexError:
         if value == 'optional value':
             return -1
         return 0


 # USER RELATED

 def acc_getUserEmail(id_user=0):
     """get email of user, id given."""

     try:
         return run_sql("""SELECT email FROM user WHERE id = %s """, (id_user, ))[0][0]
     except IndexError:
         return ''


 def acc_getUserId(email=''):
     """get id of user, email given."""

     try:
         return run_sql("""SELECT id FROM user WHERE email = %s """, (email, ))[0][0]
     except IndexError:
         return 0


 def acc_getUserRoles(id_user=0):
     """get all roles a user is connected to."""

     res = run_sql("""SELECT ur.id_accROLE
                      FROM user_accROLE ur
                      WHERE ur.id_user = %s
                      ORDER BY ur.id_accROLE""", (id_user, ))

     return res


 def acc_findUserInfoIds(id_user=0):
     """find all authorization entries for all the roles a user is connected to."""

     res1 = run_sql("""SELECT ur.id_user, raa.*
                       FROM user_accROLE ur LEFT JOIN
                       accROLE_accACTION_accARGUMENT raa
                       ON ur.id_accROLE = raa.id_accROLE
                       WHERE ur.id_user = %s """, (id_user, ))

     res2 = []
     for res in res1: res2.append(res)
     res2.sort()

     return res2


 def acc_findUserInfoNames(id_user=0):
     query = """ SELECT ur.id_user, r.name, ac.name, raa.argumentlistid, ar.keyword, ar.value
                 FROM accROLE_accACTION_accARGUMENT raa, user_accROLE ur, accROLE r, accACTION ac, accARGUMENT ar
                 WHERE ur.id_user = %s and
                       ur.id_accROLE = raa.id_accROLE and
                       raa.id_accROLE = r.id and
                       raa.id_accACTION = ac.id and
                       raa.id_accARGUMENT = ar.id """ % (id_user, )

     res1 = run_sql(query)

     res2 = []
     for res in res1: res2.append(res)
     res2.sort()

     return res2


 def acc_findUserRoleActions(id_user=0):
     """find name of all roles and actions connected to user, id given."""

     query = """SELECT DISTINCT r.name, a.name
                FROM user_accROLE ur, accROLE_accACTION_accARGUMENT raa, accACTION a, accROLE r
                WHERE ur.id_user = %s and
                      ur.id_accROLE = raa.id_accROLE and
                      raa.id_accACTION = a.id and
                      raa.id_accROLE = r.id """ % (id_user, )

     res1 = run_sql(query)

     res2 = []
     for res in res1: res2.append(res)
     res2.sort()

     return res2


 # POSSIBLE ACTIONS / AUTHORIZATIONS

 def acc_findPossibleActionsAll(id_role):
     """find all the possible actions for a role.
     the function utilizes acc_findPossibleActions to find
     all the entries from each of the actions under the given role

     id_role - role to find all actions for

     returns a list with headers"""

     query = """SELECT DISTINCT(aar.id_accACTION)
                FROM accROLE_accACTION_accARGUMENT aar
                WHERE aar.id_accROLE = %s
                ORDER BY aar.id_accACTION""" % (id_role, )

     res = []

     for (id_action, ) in run_sql(query):
         hlp = acc_findPossibleActions(id_role, id_action)
         if hlp:
             res.append(['role', 'action'] + hlp[0])
         for row in hlp[1:]:
             res.append([id_role, id_action] + row)

     return res


 def acc_findPossibleActionsArgumentlistid(id_role, id_action, arglistid):
     """find all possible actions with the given arglistid only."""

     # get all, independent of argumentlistid
     res1 = acc_findPossibleActions_ids(id_role, id_action)

     # create list with only those with the right arglistid
     res2 = []
     for row in res1[1:]:
         if row[0] == arglistid: res2.append(row)

     # return this list
     return res2


 def acc_findPossibleActionsUser(id_user, id_action):
     """user based function to find all action combination for a given
     user and action. find all the roles and utilize findPossibleActions
     for all these.

     id_user - user id, used to find roles

     id_action - action id.
     """

     res = []

     for (id_role, ) in acc_getUserRoles(id_user):
         hlp = acc_findPossibleActions(id_role, id_action)
         if hlp and not res: res.append(['role'] + hlp[0])

         for row in hlp[1:]:
             res.append([id_role] + row)

     return res


 def acc_findPossibleActions_ids(id_role, id_action):
     """finds the ids of the possible actions.
     utilization of acc_getArgumentId and acc_findPossibleActions.
     """

     pas = acc_findPossibleActions(id_role, id_action)

     if not pas: return []

     keys = pas[0]
     pas_ids = [pas[0:1]]

     for pa in pas[1:]:
         auth = [pa[0]]
         for i in range(1, len(pa)):
             auth.append(acc_getArgumentId(keys[i], pa[i]))
         pas_ids.append(auth)

     return pas_ids


 def acc_findPossibleActions(id_role, id_action):
     """Role based function to find all action combinations for a
     give role and action.

       id_role - id of role in the database

     id_action - id of the action in the database

     returns a list with all the combinations.
     first row is used for header."""

     # query to find all entries for user and action
     res1 = run_sql(""" SELECT raa.argumentlistid, ar.keyword, ar.value
                        FROM accROLE_accACTION_accARGUMENT raa, accARGUMENT ar
                        WHERE raa.id_accROLE = %s and
                              raa.id_accACTION = %s and
                              raa.id_accARGUMENT = ar.id """, (id_role, id_action))

     # find needed keywords, create header
     keywords = acc_getActionKeywords(id_action=id_action)
     keywords.sort()

     if not keywords:
         # action without arguments
         if run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND id_accARGUMENT = 0 AND argumentlistid = 0""", (id_role, id_action)):
             return [['#', 'argument keyword'],
                     ['0', 'action without arguments']]

     # tuples into lists
     res2, arglistids = [], {}
     for res in res1:
         res2.append([])
         for r in res: res2[-1].append(r)
     res2.sort()

     # create multilevel dictionary
     for res in res2:
         a, kw, value = res # rolekey, argumentlistid, keyword, value
         if kw not in keywords: continue
         if not arglistids.has_key(a):
             arglistids[a] = {}
         # fill dictionary
         if not arglistids[a].has_key(kw): arglistids[a][kw] = [value]
         elif not value in arglistids[a][kw]: arglistids[a][kw] = arglistids[a][kw] + [value]

     # fill list with all possible combinations
     res3 = []
     # rolekeys = roles2.keys(); rolekeys.sort()
     for a in arglistids.keys(): # argumentlistids
         # fill a list with the new entries, shortcut and copying first keyword list
         next_arglistid = []
         for row in arglistids[a][keywords[0]]:
next_arglistid.append([a, row[:] ]) # run through the rest of the keywords for kw in keywords[1:]: if not arglistids[a].has_key(kw): arglistids[a][kw] = ['optional value'] new_list = arglistids[a][kw][:] new_len = len(new_list) # duplicate the list temp_list = [] for row in next_arglistid: for i in range(new_len): temp_list.append(row[:]) # append new values for i in range(len(temp_list)): new_item = new_list[i % new_len][:] temp_list[i].append( new_item ) next_arglistid = temp_list[:] res3.extend(next_arglistid) res3.sort() # if optional allowed, put on top opt = run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s AND id_accACTION = %s AND id_accARGUMENT = -1 AND argumentlistid = -1""" % (id_role, id_action)) if opt: res3.insert(0, [-1] + ['optional value'] * len(keywords)) # put header on top if res3: res3.insert(0, ['#'] + keywords) return res3 def acc_splitArgumentGroup(id_role=0, id_action=0, arglistid=0): """collect the arguments, find all combinations, delete original entries and insert the new ones with different argumentlistids for each group id_role - id of the role id_action - id of the action arglistid - argumentlistid of the group to be split""" if not id_role or not id_action or not arglistid: return [] # don't split if there are zero or one possible actions res = acc_findPossibleActionsArgumentlistid(id_role, id_action, arglistid) if not res or len(res) <= 1: return 0 # delete the existing group delete = acc_deleteRoleActionArguments_group(id_role, id_action, arglistid) # add all authorizations with new and different argumentlistid addlist = [] for row in res: argids = row[1:] addlist.append(acc_addRoleActionArguments(id_role=id_role, id_action=id_action, arglistid=-1, id_arguments=argids)) # return list of added authorizations return addlist def acc_mergeArgumentGroups(id_role=0, id_action=0, arglistids=[]): """merge the authorizations from groups with different argumentlistids into one single group.
this can both save entries in the database and create extra authorizations. id_role - id of the role id_action - id of the action arglistids - list of groups to be merged together into one.""" if len(arglistids) < 2: return [] argstr = '' for id in arglistids: argstr += 'raa.argumentlistid = %s or ' % (id, ) argstr = '(%s)' % (argstr[:-4], ) # query to find all entries that will be merged query = """ SELECT ar.keyword, ar.value, raa.id_accARGUMENT FROM accROLE_accACTION_accARGUMENT raa, accARGUMENT ar WHERE raa.id_accROLE = %s and raa.id_accACTION = %s and %s and raa.id_accARGUMENT = ar.id """ % (id_role, id_action, argstr) q_del = """DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE = %s and id_accACTION = %s and %s """ % (id_role, id_action, argstr.replace('raa.', '')) res = run_sql(query) if not res: return [] run_sql(q_del) # list of entire entries old = [] # list of only the ids ids = [] for (k, v, id) in res: if [k, v, id] not in old: old.append([k, v, id]) ids.append(id) # for (k, v, id) in res: if id not in ids: ids.append(id) return acc_addRoleActionArguments(id_role=id_role, id_action=id_action, arglistid=-1, id_arguments=ids) def acc_reset_default_settings(superusers=[]): """reset to default by deleting everything and adding default. superusers - list of superuser emails """ remove = acc_delete_all_settings() add = acc_add_default_settings(superusers=superusers) return remove, add def acc_delete_all_settings(): """simply remove all data affiliated with webaccess by truncating tables accROLE, accACTION, accARGUMENT and those connected. """ run_sql("""TRUNCATE accROLE""") run_sql("""TRUNCATE accACTION""") run_sql("""TRUNCATE accARGUMENT""") run_sql("""TRUNCATE user_accROLE""") run_sql("""TRUNCATE accROLE_accACTION_accARGUMENT""") return 1 def acc_add_default_settings(superusers=[]): """add the default settings if they don't exist.
superusers - list of superuser emails """ # imported from config global supportemail # imported from access_control_config global def_roles global def_users global def_actions global def_auths # from superusers: allow input formats ['email1', 'email2'] and [['email1'], ['email2']] and [['email1', id], ['email2', id]] for user in superusers: if type(user) is str: user = [user] def_users.append(user[0]) if supportemail not in def_users: def_users.append(supportemail) # add data # add roles insroles = [] for (name, description) in def_roles: # try to add, don't care if description is different id = acc_addRole(name_role=name, description=description) if not id: id = acc_getRoleId(name_role=name) acc_updateRole(id_role=id, description=description) insroles.append([id, name, description]) # add users to superadmin insuserroles = [] for user in def_users: insuserroles.append(acc_addUserRole(email=user, name_role=SUPERADMINROLE)) # add actions insactions = [] for (name, description, allkeys, optional) in def_actions: # try to add action as new id = acc_addAction(name, description, optional, allkeys) # action with the name exist if not id: id = acc_getActionId(name_action=name) # update the action, necessary updates to the database will also be done acc_updateAction(id_action=id, optional=optional, allowedkeywords=allkeys) # keep track of inserted actions insactions.append([id, name, description, allkeys]) # add authorizations insauths = [] for (name_role, name_action, arglistid, optional, args) in def_auths: # add the authorization acc_addRoleActionArguments_names(name_role=name_role, name_action=name_action, arglistid=arglistid, optional=optional, **args) # keep track of inserted authorizations insauths.append([name_role, name_action, arglistid, optional, args]) return insroles, insactions, insuserroles, insauths def acc_find_delegated_roles(id_role_admin=0): """find all the roles the admin role has delegation rights over. return tuple of all the roles. 
id_role_admin - id of the admin role """ id_action_delegate = acc_getActionId(name_action=DELEGATEADDUSERROLE) rolenames = run_sql("""SELECT DISTINCT(ar.value) FROM accROLE_accACTION_accARGUMENT raa LEFT JOIN accARGUMENT ar ON raa.id_accARGUMENT = ar.id WHERE raa.id_accROLE = '%s' AND raa.id_accACTION = '%s' """ % (id_role_admin, id_action_delegate)) result = [] for (name_role, ) in rolenames: roledetails = run_sql("""SELECT * FROM accROLE WHERE name = %s """, (name_role, )) if roledetails: result.append(roledetails) return result def acc_cleanupArguments(): """function deletes all accARGUMENTs that are not referenced by accROLE_accACTION_accARGUMENT. returns how many arguments were deleted and a list of the deleted id_arguments""" # find unreferenced arguments ids1 = run_sql("""SELECT DISTINCT ar.id FROM accARGUMENT ar LEFT JOIN accROLE_accACTION_accARGUMENT raa ON ar.id = raa.id_accARGUMENT WHERE raa.id_accARGUMENT IS NULL """) # it is clean if not ids1: return 1 # create list and string of the ids ids2 = [] idstr = '' for (id, ) in ids1: ids2.append(id) if idstr: idstr += ',' idstr += '%s' % id # delete unreferenced arguments count = run_sql("""DELETE FROM accARGUMENT WHERE id in (%s)""" % (idstr, )) # return count and ids of deleted arguments return (count, ids2) def acc_cleanupUserRoles(): """remove all entries in user_accROLE referencing non-existing roles. return number of deletes and the ids.
FIXME: THIS FUNCTION HAS NOT BEEN TESTED """ # find user_accROLE entries that reference non-existing roles ids1 = run_sql("""SELECT DISTINCT ur.id_accROLE FROM user_accROLE ur LEFT JOIN accROLE r ON ur.id_accROLE = r.id WHERE r.id IS NULL""") # it is clean if not ids1: return 1 # create list and string of the ids ids2 = [] idstr = '' for (id, ) in ids1: ids2.append(id) if idstr: idstr += ',' idstr += '%s' % id # delete the dangling user_accROLE entries count = run_sql("""DELETE FROM user_accROLE WHERE id_accROLE in (%s)""" % (idstr, )) # return count and ids of the non-existing roles return (count, ids2) def acc_garbage_collector(verbose=0): """clean the entire database for unused data""" # keep track of all deleted entries del_entries = [] # user_accROLEs without existing role or user count = 0 # roles have been deleted id_roles = run_sql("""SELECT DISTINCT r.id FROM accROLE r""") idrolesstr = '' for (id, ) in id_roles: idrolesstr += (idrolesstr and ',' or '') + '%s' % id if idrolesstr: count += run_sql("""DELETE FROM user_accROLE WHERE id_accROLE NOT IN (%s)""" % (idrolesstr, )) # users have been deleted id_users = run_sql("""SELECT DISTINCT u.id FROM user u WHERE email != ''""") idusersstr = '' for (id, ) in id_users: idusersstr += (idusersstr and ',' or '') + '%s' % id if idusersstr: count += run_sql("""DELETE FROM user_accROLE WHERE id_user NOT IN (%s) """ % (idusersstr, )) del_entries.append([count]) # accROLE_accACTION_accARGUMENT where role is deleted count = 0 if idrolesstr: count += run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE NOT IN (%s)""" % (idrolesstr, )) # accROLE_accACTION_accARGUMENT where action is deleted id_actions = run_sql("""SELECT DISTINCT a.id FROM accACTION a""") idactionsstr = '' for (id, ) in id_actions: idactionsstr += (idactionsstr and ',' or '') + '%s' % id # FIXME: here was a syntactic bug, so check the code!
if idactionsstr: count += run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accACTION NOT IN (%s)""" % (idactionsstr, )) del_entries.append([count]) # delegated roles that do not exist nameroles = run_sql("""SELECT DISTINCT r.name FROM accROLE r""") namestr = '' for (name, ) in nameroles: namestr += (namestr and ',' or '') + '"%s"' % name if namestr: idargs = run_sql("""SELECT ar.id FROM accARGUMENT ar WHERE keyword = 'role' AND value NOT IN (%s) """ % (namestr, )) idstr = '' for (id, ) in idargs: idstr += (idstr and ',' or '') + '%s' % id if namestr and idstr: count = run_sql("""DELETE FROM accROLE_accACTION_accARGUMENT WHERE id_accARGUMENT IN (%s) """ % (idstr, )) else: count = 0 del_entries.append([count]) # delete unreferenced arguments unused_args = run_sql("""SELECT DISTINCT ar.id FROM accARGUMENT ar LEFT JOIN accROLE_accACTION_accARGUMENT raa ON ar.id = raa.id_accARGUMENT WHERE raa.id_accARGUMENT IS NULL """) args = [] idstr = '' for (id, ) in unused_args: args.append(id) idstr += (idstr and ',' or '') + '%s' % id if idstr: count = run_sql("""DELETE FROM accARGUMENT WHERE id in (%s)""" % (idstr, )) else: count = 0 del_entries.append([count, args]) # return statistics return del_entries diff --git a/modules/webaccess/lib/access_control_config.py b/modules/webaccess/lib/access_control_config.py index 7531e7422..924ba0cca 100644 --- a/modules/webaccess/lib/access_control_config.py +++ b/modules/webaccess/lib/access_control_config.py @@ -1,136 +1,136 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version.
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware Access Control Config. """ __version__ = "$Id$" ## global configuration parameters: -from config import * +from cdsware.config import * ## local configuration parameters: -from external_authentication import * +from cdsware.external_authentication import * # VALUES TO BE EXPORTED # CURRENTLY USED BY THE FILES access_control_engine.py access_control_admin.py webaccessadmin_lib.py # name of the role giving superadmin rights SUPERADMINROLE = 'superadmin' # name of the webaccess webadmin role WEBACCESSADMINROLE = 'webaccessadmin' # name of the action allowing roles to access the web administrator interface WEBACCESSACTION = 'cfgwebaccess' # name of the action allowing roles to delegate the rights to other roles # ex: libraryadmin to delegate libraryworker DELEGATEADDUSERROLE = 'accdelegaterole' # max number of users to display in the drop down selects MAXSELECTUSERS = 25 # max number of users to display in a page (mainly for user area) MAXPAGEUSERS = 25 # Use external source for access control? 
# At least one must be added # Advised not to change the name, since it is used to identify the account # Format is: System name: (System class, Default True/False), at least one must be default CFG_EXTERNAL_AUTHENTICATION = {"%s (internal)" % cdsname: (None, True)} #CFG_EXTERNAL_AUTHENTICATION = {"%s (internal)" % cdsname: (None, True), "CERN NICE (external)": (external_auth_nice(), False)} # default data for the add_default_settings function # roles # name description def_roles = ((SUPERADMINROLE, 'superuser with all rights'), ('photoadmin', 'Photo collection administrator'), (WEBACCESSADMINROLE, 'WebAccess administrator')) # users # list of e-mail addresses def_users = [] # actions # name desc allowedkeywords optional def_actions = ( ('cfgwebsearch', 'configure WebSearch', '', 'no'), ('cfgbibformat', 'configure BibFormat', '', 'no'), ('cfgwebsubmit', 'configure WebSubmit', '', 'no'), ('runbibindex', 'run BibIndex', '', 'no'), ('runbibupload', 'run BibUpload', '', 'no'), ('runwebcoll', 'run webcoll', 'collection', 'yes'), ('runbibformat', 'run BibFormat', 'format', 'yes'), (WEBACCESSACTION, 'configure WebAccess', '', 'no'), (DELEGATEADDUSERROLE, 'delegate subroles inside WebAccess', 'role', 'no'), ('runbibtaskex', 'run BibTaskEx example', '', 'no'), ('referee', 'referee document type doctype/category categ', 'doctype,categ', 'yes'), ('submit', 'use webSubmit', 'doctype,act', 'yes'), ('runbibrank', 'run BibRank', '', 'no'), ('cfgbibrank', 'configure BibRank', '', 'no'), ('cfgbibharvest', 'configure BibHarvest', '', 'no'), ('runoaiharvest', 'run oaiharvest task', '', 'no'), ('cfgwebcomment', 'configure WebComment', '', 'no'), ) # authorizations # role action arglistid optional arguments def_auths = ( (SUPERADMINROLE, 'cfgwebsearch', -1, 0, {}), (SUPERADMINROLE, 'cfgbibformat', -1, 0, {}), (SUPERADMINROLE, 'cfgwebsubmit', -1, 0, {}), (SUPERADMINROLE, 'runbibindex', -1, 0, {}), (SUPERADMINROLE, 'runbibupload', -1, 0, {}), (SUPERADMINROLE, 'runbibformat', -1, 1, {}),
(SUPERADMINROLE, WEBACCESSACTION, -1, 0, {}), ('photoadmin', 'runwebcoll', -1, 0, {'collection': 'Pictures'}), (WEBACCESSADMINROLE, WEBACCESSACTION, -1, 0, {}), (SUPERADMINROLE, 'runbibtaskex', -1, 0, {}), (SUPERADMINROLE, 'referee', -1, 1, {}), (SUPERADMINROLE, 'submit', -1, 1, {}), (SUPERADMINROLE, 'runbibrank', -1, 0, {}), (SUPERADMINROLE, 'cfgbibrank', -1, 0, {}), (SUPERADMINROLE, 'cfgbibharvest', -1, 0, {}), (SUPERADMINROLE, 'runoaiharvest', -1, 0, {}), (SUPERADMINROLE, 'cfgwebcomment', -1, 0, {}), ) cfg_webaccess_msgs = { 0: 'Try to login with another account.' % (weburl, weburl, "%s"), 1: '
If you think this is not correct, please contact: %s' % (supportemail, supportemail), 2: '
If you have any questions, please write to %s' % (supportemail, supportemail), 3: 'Guest users are not allowed, please login.' % weburl, 4: 'The site is temporarily closed for maintenance. Please come back soon.', 5: 'Authorization failure', 6: '%s temporarily closed' % cdsname, 7: 'This functionality is temporarily closed due to server maintenance. Please use only the search engine in the meantime.', 8: 'Functionality temporarily closed' } cfg_webaccess_warning_msgs = { 0: 'Authorization granted', 1: 'Error(1): You are not authorized to perform this action.', 2: 'Error(2): You are not authorized to perform any action.', 3: 'Error(3): The action %s does not exist.', 4: 'Error(4): Unexpected error occurred.', 5: 'Error(5): Missing mandatory keyword argument(s) for this action.', 6: 'Error(6): Guest accounts are not authorized to perform this action.', 7: 'Error(7): Not enough arguments, user ID and action name required.', 8: 'Error(8): Incorrect keyword argument(s) for this action.', 9: """Error(9): Account '%s' is not yet activated.""", 10: """Error(10): You were not authorized by the authentication method '%s'.""", 11: """Error(11): The selected login method '%s' is not the default method for this account, please try another one.""", 12: """Error(12): Selected login method '%s' does not exist.""", 13: """Error(13): Could not register '%s' account.""", 14: """Error(14): Could not login using '%s', because this user is unknown.""", 15: """Error(15): Could not login using your '%s' account, because you have introduced a wrong password.""" } diff --git a/modules/webaccess/lib/access_control_engine.py b/modules/webaccess/lib/access_control_engine.py index 363a3df43..5c31ea351 100644 --- a/modules/webaccess/lib/access_control_engine.py +++ b/modules/webaccess/lib/access_control_engine.py @@ -1,234 +1,234 @@ ## $Id$ ## CDSware Access Control Engine in mod_python. ## This file is part of the CERN Document Server Software (CDSware). 
## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware Access Control Engine in mod_python.""" __version__ = "$Id$" -## import interesting modules: -from config import * -from dbquery import run_sql from MySQLdb import ProgrammingError -from access_control_config import SUPERADMINROLE, cfg_webaccess_warning_msgs, cfg_webaccess_msgs, CFG_ACCESS_CONTROL_LEVEL_GUESTS, CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS#, CFG_EXTERNAL_ACCESS_CONTROL + +from cdsware.config import * +from cdsware.dbquery import run_sql +from cdsware.access_control_config import SUPERADMINROLE, cfg_webaccess_warning_msgs, cfg_webaccess_msgs, CFG_ACCESS_CONTROL_LEVEL_GUESTS, CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS#, CFG_EXTERNAL_ACCESS_CONTROL called_from = 1 #1=web,0=cli try: import _apache except ImportError, e: called_from = 0 ## access control engine function def acc_authorize_action(id_user, name_action, verbose=0, **arguments): """Check if user is allowed to perform action with given list of arguments. Return (0, message) if authorization succeeds, (error code, error message) if it fails. The arguments are as follows: id_user - id of the user in the database name_action - the name of the action arguments - dictionary with keyword=value pairs created automatically by python on the extra arguments.
these depend on the given action. """ #TASK -1: Checking external source if user is authorized: #if CFG_: # em_pw = run_sql("SELECT email, password FROM user WHERE id=%s", (id_user,)) # if em_pw: # if not CFG_EXTERNAL_ACCESS_CONTROL.loginUser(em_pw[0][0], em_pw[0][1]): # return (10, "%s %s" % (cfg_webaccess_warning_msgs[10], (called_from and cfg_webaccess_msgs[1] or ""))) # TASK 0: find id and allowedkeywords of action if verbose: print 'task 0 - get action info' query1 = """select a.id, a.allowedkeywords, a.optional from accACTION a where a.name = '%s'""" % (name_action) try: id_action, aallowedkeywords, optional = run_sql(query1)[0] except (ProgrammingError, IndexError): return (3, "%s %s" % (cfg_webaccess_warning_msgs[3] % name_action, (called_from and cfg_webaccess_msgs[1] or ""))) defkeys = aallowedkeywords.split(',') for key in arguments.keys(): if key not in defkeys: return (8, "%s %s" % (cfg_webaccess_warning_msgs[8], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) #incorrect arguments? # ------------------------------------------- # TASK 1: check if user is a superadmin # we know the action exists. 
no connection with role is necessary # passed arguments must have allowed keywords # no check to see if the argument exists if verbose: print 'task 1 - is user %s' % (SUPERADMINROLE, ) if run_sql("""SELECT * FROM accROLE r LEFT JOIN user_accROLE ur ON r.id = ur.id_accROLE WHERE r.name = '%s' AND ur.id_user = '%s' """ % (SUPERADMINROLE, id_user)): return (0, cfg_webaccess_warning_msgs[0]) # ------------------------------------------ # TASK 2: check if user exists and find all the user's roles and create or-string if verbose: print 'task 2 - find user and userroles' try: query2 = """SELECT email, note from user where id=%s""" % id_user res2 = run_sql(query2) if not res2: raise Exception if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 1 and res2[0][1] not in [1, "1"]: if res2[0][1]: return (9, "%s %s" % (cfg_webaccess_warning_msgs[9] % res2[0][1], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) else: raise Exception query2 = """SELECT ur.id_accROLE FROM user_accROLE ur WHERE ur.id_user=%s ORDER BY ur.id_accROLE """ % id_user res2 = run_sql(query2) except Exception: return (6, "%s %s" % (cfg_webaccess_warning_msgs[6], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) if not res2: return (2, "%s %s" % (cfg_webaccess_warning_msgs[2], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) #user has no roles # ------------------------------------------- # create role string (add default value?
roles='(raa.id_accROLE='def' or ') str_roles = '' for (role, ) in res2: if str_roles: str_roles += ',' str_roles += '%s' % (role, ) # TASK 3: authorizations with no arguments given if verbose: print 'task 3 - checks with no arguments' if not arguments: # 3.1 if optional == 'no': if verbose: print ' - action with zero arguments' connection = run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE IN (%s) AND id_accACTION = %s AND argumentlistid = 0 AND id_accARGUMENT = 0 """ % (str_roles, id_action)) if connection: return (0, cfg_webaccess_warning_msgs[0]) else: return (1, "%s %s" % (cfg_webaccess_warning_msgs[1], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) # 3.2 if optional == 'yes': if verbose: print ' - action with optional arguments' connection = run_sql("""SELECT * FROM accROLE_accACTION_accARGUMENT WHERE id_accROLE IN (%s) AND id_accACTION = %s AND id_accARGUMENT = -1 AND argumentlistid = -1 """ % (str_roles, id_action)) if connection: return (0, cfg_webaccess_warning_msgs[0]) else: return (1, "%s %s" % (cfg_webaccess_warning_msgs[1], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) # none of the zero-argument tests succeeded if verbose: print ' - no authorization without arguments' return (5, "%s %s" % (cfg_webaccess_warning_msgs[5], (called_from and "%s" % (cfg_webaccess_msgs[1] or "")))) # TASK 4: create list of keyword and values that satisfy part of the authentication and create or-string if verbose: print 'task 4 - create keyword=value pairs' # create dictionary with default values and replace entries from input arguments defdict = {} for key in defkeys: try: defdict[key] = arguments[key] except KeyError: return (5, "%s %s" % (cfg_webaccess_warning_msgs[5], (called_from and "%s" % (cfg_webaccess_msgs[1] or "")))) # all keywords must be present # except KeyError: defdict[key] = 'x' # default value, this is not in use...
# create or-string from arguments str_args = '' for key in defkeys: if str_args: str_args += ' OR ' str_args += """(arg.keyword = '%s' AND arg.value = '%s')""" % (key, defdict[key]) # TASK 5: find all the table entries that partially authorize the action in question if verbose: print 'task 5 - find table entries that are part of the result' query4 = """SELECT DISTINCT raa.id_accROLE, raa.id_accACTION, raa.argumentlistid, raa.id_accARGUMENT, arg.keyword, arg.value FROM accROLE_accACTION_accARGUMENT raa, accARGUMENT arg WHERE raa.id_accACTION = %s AND raa.id_accROLE IN (%s) AND (%s) AND raa.id_accARGUMENT = arg.id """ % (id_action, str_roles, str_args) try: res4 = run_sql(query4) except ProgrammingError: return (3, "%s %s" % (cfg_webaccess_warning_msgs[3], (called_from and "%s" % (cfg_webaccess_msgs[1] or "")))) if not res4: return (1, "%s %s" % (cfg_webaccess_warning_msgs[1], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) # no entries at all res5 = [] for res in res4: res5.append(res) res5.sort() # USER AUTHENTICATED TO PERFORM ACTION WITH ONE ARGUMENT if len(defdict) == 1: return (0, cfg_webaccess_warning_msgs[0]) # CHECK WITH MORE THAN 1 ARGUMENT # TASK 6: run through the result and try to satisfy authentication if verbose: print 'task 6 - combine results and try to satisfy' cur_role = cur_action = cur_arglistid = 0 booldict = {} for key in defkeys: booldict[key] = 0 # run through the results for (role, action, arglistid, arg, keyword, val) in res5 + [(-1, -1, -1, -1, -1, -1)]: # not the same role or argumentlist (authorization group), i.e. 
check if things are satisfied # if cur_arglistid != arglistid or cur_role != role or cur_action != action: if (cur_arglistid, cur_role, cur_action) != (arglistid, role, action): if verbose: print ' : checking new combination', # test if all keywords are satisfied for value in booldict.values(): if not value: break else: if verbose: print '-> found satisfying combination' return (0, cfg_webaccess_warning_msgs[0]) # USER AUTHENTICATED TO PERFORM ACTION if verbose: print '-> not this one' # assign the values for the current tuple from the query cur_arglistid, cur_role, cur_action = arglistid, role, action for key in booldict.keys(): booldict[key] = 0 # set keyword qualified for the action, (whatever result of the test) booldict[keyword] = 1 if verbose: print 'finished' # authentication failed return (4, "%s %s" % (cfg_webaccess_warning_msgs[4], (called_from and "%s %s" % (cfg_webaccess_msgs[0] % name_action[3:], cfg_webaccess_msgs[1]) or ""))) diff --git a/modules/webaccess/lib/webaccessadmin_lib.py b/modules/webaccess/lib/webaccessadmin_lib.py index a8d475b78..3a9e8ae91 100644 --- a/modules/webaccess/lib/webaccessadmin_lib.py +++ b/modules/webaccess/lib/webaccessadmin_lib.py @@ -1,3432 +1,3430 @@ ## $Id$ ## Administrator interface for WebAccess ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details.
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware WebAccess Administrator Interface.""" __lastupdated__ = """$Date$""" ## fill config variables: -import access_control_engine as acce -import access_control_admin as acca -# reload(acce) -# reload(acca) import cgi import re import random import MySQLdb import string import smtplib - -from bibrankadminlib import adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform -from access_control_config import * -from dbquery import run_sql -from config import * -from webpage import page, pageheaderonly, pagefooteronly -from webuser import getUid, get_email, page_not_authorized from mod_python import apache -from search_engine import print_record -from webuser import checkemail, get_user_preferences, set_user_preferences + +import cdsware.access_control_engine as acce +import cdsware.access_control_admin as acca +from cdsware.bibrankadminlib import adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform +from cdsware.access_control_config import * +from cdsware.dbquery import run_sql +from cdsware.config import * +from cdsware.webpage import page, pageheaderonly, pagefooteronly +from cdsware.webuser import getUid, get_email, page_not_authorized +from cdsware.search_engine import print_record +from cdsware.webuser import checkemail, get_user_preferences, set_user_preferences __version__ = "$Id$" def index(req, title='', body='', subtitle='', adminarea=2, authorized=0): """main function to show pages for webaccessadmin. 1. if user not logged in and administrator, show the mustlogin page 2. if used without body argument, show the startpage 3. show admin page with title, body, subtitle and navtrail. 
adminarea - number codes that tell what extra info to put in the navtrail 0 - nothing extra 1 - add Delegate Rights 2 - add Manage WebAccess maybe add: 3: role admin 4: action admin 5: user area 6: reset area authorized - if 1, don't check if the user is allowed to be webadmin """ navtrail_previous_links = """Admin Area > WebAccess Admin """ % (weburl, weburl) if body: if adminarea == 1: navtrail_previous_links += '> Delegate Rights ' % (weburl, ) if adminarea >= 2 and adminarea < 7: navtrail_previous_links += '> Manage WebAccess ' % (weburl, ) if adminarea == 3: navtrail_previous_links += '> Role Administration ' % (weburl, ) elif adminarea == 4: navtrail_previous_links += '> Action Administration ' % (weburl, ) elif adminarea == 5: navtrail_previous_links += '> User Administration ' % (weburl, ) elif adminarea == 6: navtrail_previous_links += '> Reset Authorizations ' % (weburl, ) elif adminarea == 7: navtrail_previous_links += '> Manage Accounts ' % (weburl, ) id_user = getUid(req) (auth_code, auth_message) = is_adminuser(req) if not authorized and auth_code != 0: return mustloginpage(req, auth_message) elif not body: title = 'Manage WebAccess' body = startpage() elif type(body) != str: body = addadminbox(subtitle, datalist=body) return page(title=title, uid=id_user, body=body, navtrail=navtrail_previous_links, lastupdated=__lastupdated__) def mustloginpage(req, message): """show a page asking the user to login.""" navtrail_previous_links = """Admin Area > WebAccess Admin """ % (weburl, weburl) return page_not_authorized(req=req, text=message, navtrail=navtrail_previous_links) def is_adminuser(req): """check if user is a registered administrator. 
""" id_user = getUid(req) return acce.acc_authorize_action(id_user, WEBACCESSACTION) def perform_rolearea(req): """create the role area menu page.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) header = ['id', 'name', 'description', 'users', 'authorizations / actions', 'role', ''] roles = acca.acc_getAllRoles() roles2 = [] for (id, name, desc) in roles: if len(desc) > 30: desc = desc[:30] + '...' roles2.append([id, name, desc]) for col in [(('add', 'adduserrole'), ('remove', 'deleteuserrole')), (('add', 'addauthorization'), ('modify', 'modifyauthorizations'), ('remove', 'deleteroleaction')), (('delete', 'deleterole'), ), (('show details', 'showroledetails'), )]: roles2[-1].append('%s' % (col[0][1], id, col[0][0])) for (str, function) in col[1:]: roles2[-1][-1] += ' / %s' % (function, id, str) output = """
Users:
add or remove users from the access to a role and its privileges.
Authorizations/Actions:
these terms mean almost the same, but an authorization is a
connection between a role and an action (possibly) containing arguments.
Roles:
see all the information attached to a role and decide if you want to
delete it.
""" output += tupletotable(header=header, tuple=roles2) extra = """
Create new role
go here to add a new role.
Create new action
go here to add a new action.
""" return index(req=req, title='Role Administration', subtitle='administration with roles as access point', body=[output, extra], adminarea=2) def perform_actionarea(req): """create the action area menu page.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) header = ['id', 'name', 'authorizations/roles', 'action', ''] actions = acca.acc_getAllActions() actions2 = [] roles2 = [] for (id, name, dontcare) in actions: actions2.append([id, name]) for col in [(('add', 'addauthorization'), ('modify', 'modifyauthorizations'), ('remove', 'deleteroleaction')), (('delete', 'deleteaction'), ), (('show details', 'showactiondetails'), )]: actions2[-1].append('%s' % (col[0][1], id, col[0][0])) for (str, function) in col[1:]: actions2[-1][-1] += ' / %s' % (function, id, str) output = """
Authorizations/Roles:
these terms mean almost the same thing, but an authorization is a
connection between a role and an action, possibly with arguments.
Actions:
see all the information attached to an action and decide if you want to
delete it.
""" output += tupletotable(header=header, tuple=actions2) extra = """
Create new role
go here to add a new role.
Create new action
go here to add a new action.
""" return index(req=req, title='Action Administration', subtitle='administration with actions as access point', body=[output, extra], adminarea=2) def perform_userarea(req, email_user_pattern=''): """create area to show info about users. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = 'step 1 - search for users' output = """

search for users to display.

""" # remove letters not allowed in an email email_user_pattern = cleanstring_email(email_user_pattern) text = ' 1. search for user\n' text += ' \n' % (email_user_pattern, ) output += createhiddenform(action="userarea", text=text, button="search for users") if email_user_pattern: users1 = run_sql("""SELECT id, email FROM user WHERE email RLIKE '%s' ORDER BY email LIMIT %s""" % (email_user_pattern, MAXPAGEUSERS+1)) if not users1: output += '

no matching users

' else: subtitle = 'step 2 - select what to do with user' users = [] for (id, email) in users1[:MAXPAGEUSERS]: users.append([id, email]) for col in [(('add', 'addroleuser'), ('remove', 'deleteuserrole')), (('show details', 'showuserdetails'), )]: users[-1].append('%s' % (col[0][1], email_user_pattern, id, col[0][0])) for (str, function) in col[1:]: users[-1][-1] += ' / %s' % (function, email_user_pattern, id, str) output += '

found %s matching users:

' % (len(users1), ) output += tupletotable(header=['id', 'email', 'roles', ''], tuple=users) if len(users1) > MAXPAGEUSERS: output += '

only the first %s users are shown; narrow your search...

' % (MAXPAGEUSERS, ) return index(req=req, title='User Administration', subtitle=subtitle, body=[output], adminarea=2) def perform_resetarea(req): """create the reset area menu page.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) output = """
Reset to Default Authorizations
remove all changes that have been made to the roles and
add only the default authorization settings.
Add Default Authorizations
keep all changes and add the default authorization settings.
""" return index(req=req, title='Reset Authorizations', subtitle='resetting to or adding default authorizations', body=[output], adminarea=2) def perform_resetdefaultsettings(req, superusers=[], confirm=0): """delete all roles, actions and authorizations presently in the database and add only the default roles. only selected users will be added to superadmin, rest is blank """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) # cleaning input if type(superusers) == str: superusers = [superusers] # remove invalid e-mails (build a new list: removing items from a list while iterating over it skips elements) superusers = [email for email in superusers if check_email(email)] # instructions output = """

before you reset the settings, we need some users
to connect to %s.
enter as many e-mail addresses as you want and press reset.
confirm reset settings when you have added enough e-mails.
%s is added by default.

""" % (SUPERADMINROLE, supportemail) # add more superusers output += """

enter user e-mail addresses:

""" for email in superusers: output += ' ' % (email, ) output += """ e-mail
""" if superusers: # remove emails output += """
have you entered wrong data?
""" # superusers confirm table start = '
' extra = ' ' for email in superusers: extra += '' % (email, ) extra += ' ' end = '
' output += '

reset default settings with the users below?

' output += tupletotable(header=['e-mail address'], tuple=superusers, start=start, extracolumn=extra, end=end) if confirm in [1, "1"]: res = acca.acc_reset_default_settings(superusers) if res: output += '

successfully reset default settings

' else: output += '

sorry, could not reset default settings

' return index(req=req, title='Reset Default Settings', subtitle='reset settings', body=[output], adminarea=6) def perform_adddefaultsettings(req, superusers=[], confirm=0): """add the default settings and keep everything else. probably nothing will be deleted, unless changes have been made to the defaults.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) # cleaning input if type(superusers) == str: superusers = [superusers] # remove invalid e-mails (build a new list: removing items from a list while iterating over it skips elements) superusers = [email for email in superusers if check_email(email)] # instructions output = """

before you add the settings, we need some users
to connect to %s.
enter as many e-mail addresses as you want and press add.
confirm add settings when you have added enough e-mails.
%s is added by default.

""" % (SUPERADMINROLE, supportemail) # add more superusers output += """

enter user e-mail addresses:

""" for email in superusers: output += ' ' % (email, ) output += """ e-mail
""" if superusers: # remove emails output += """
have you entered wrong data?
""" # superusers confirm table start = '
' extra = ' ' for email in superusers: extra += '' % (email, ) extra += ' ' end = '
' output += '

add default settings with the users below?

' output += tupletotable(header=['e-mail address'], tuple=superusers, start=start, extracolumn=extra, end=end) if confirm in [1, "1"]: res = acca.acc_add_default_settings(superusers) if res: output += '

successfully added default settings

' else: output += '

sorry, could not add default settings

' return index(req=req, title='Add Default Settings', subtitle='add settings', body=[output], adminarea=6) def perform_manageaccounts(req, mtype='', content='', confirm=0): """start area for managing accounts.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = 'Overview' fin_output = '' fin_output += """
Menu
0. Show all 1. Access policy 2. Account overview 3. Create account 4. Edit accounts
""" % (weburl, weburl, weburl, weburl, weburl) if mtype == "perform_accesspolicy" and content: fin_output += content elif mtype == "perform_accesspolicy" or mtype == "perform_showall": fin_output += perform_accesspolicy(req, callback='') fin_output += "
" if mtype == "perform_accountoverview" and content: fin_output += content elif mtype == "perform_accountoverview" or mtype == "perform_showall": fin_output += perform_accountoverview(req, callback='') fin_output += "
" if mtype == "perform_createaccount" and content: fin_output += content elif mtype == "perform_createaccount" or mtype == "perform_showall": fin_output += perform_createaccount(req, callback='') fin_output += "
" if mtype == "perform_modifyaccounts" and content: fin_output += content elif mtype == "perform_modifyaccounts" or mtype == "perform_showall": fin_output += perform_modifyaccounts(req, callback='') fin_output += "
" return index(req=req, title='Manage Accounts', subtitle=subtitle, body=[fin_output], adminarea=0, authorized=1) def perform_accesspolicy(req, callback='yes', confirm=0): """show the current access policy: site status, guest access, account registration/activation policy and e-mail notification settings.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """1. Access policy.   [?]""" % weburl account_policy = {} account_policy[0] = "Users can register new accounts. New accounts are automatically activated." account_policy[1] = "Users can register new accounts. Admin users must activate the accounts." account_policy[2] = "Only admin can register new accounts. User cannot edit email address." account_policy[3] = "Only admin can register new accounts. User cannot edit email address or password." account_policy[4] = "Only admin can register new accounts. User cannot edit email address, password or login method." site_policy = {} site_policy[0] = "Normal operation of the site." site_policy[1] = "Read-only site, all write operations temporarily closed." site_policy[2] = "Site fully closed." output = "(Modifications must be done in access_control_config.py)
" output += "
Current settings:
" output += "Site status: %s
" % (site_policy[CFG_ACCESS_CONTROL_LEVEL_SITE]) output += "Guest accounts allowed: %s
" % (CFG_ACCESS_CONTROL_LEVEL_GUESTS == 0 and "Yes" or "No") output += "Account policy: %s
" % (account_policy[CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS]) output += "Allowed email addresses limited: %s
" % (CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN and CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN or "Not limited") output += "Send email to admin when a new account is created: %s
" % (CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS == 1 and "Yes" or "No") output += "Send email to user after creating new account: %s
" % (CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT == 1 and "Yes" or "No") output += "Send email to user when account is activated: %s
" % (CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION == 1 and "Yes" or "No") output += "Send email to user when account is deleted/rejected: %s
" % (CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION == 1 and "Yes" or "No") output += "
" output += "Available 'login via' methods:
" methods = CFG_EXTERNAL_AUTHENTICATION.keys() methods.sort() for system in methods: output += """%s %s
""" % (system, (CFG_EXTERNAL_AUTHENTICATION[system][1] and "(Default)" or "")) output += "
Changing the settings:
" output += "Currently, all changes must be made with a text editor, and the web server must be restarted for them to take effect. For details on the available settings, see the guide or access_control_config.py." try: body = [output, extra] except NameError: body = [output] if callback: return perform_manageaccounts(req, "perform_accesspolicy", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_accountoverview(req, callback='yes', confirm=0): """show the number of guest, registered, inactive and total accounts.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """2. Account overview.   [?]""" % weburl output = "" res = run_sql("SELECT COUNT(*) FROM user WHERE email=''") output += "Guest accounts: %s
" % res[0][0] res = run_sql("SELECT COUNT(*) FROM user WHERE email!=''") output += "Registered accounts: %s
" % res[0][0] res = run_sql("SELECT COUNT(*) FROM user WHERE email!='' AND (note='0' OR note IS NULL)") output += "Inactive accounts: %s " % res[0][0] if res[0][0] > 0: output += ' [Activate/Reject accounts]' res = run_sql("SELECT COUNT(*) FROM user") output += "
Total number of accounts: %s
" % res[0][0] try: body = [output, extra] except NameError: body = [output] if callback: return perform_manageaccounts(req, "perform_accountoverview", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_createaccount(req, email='', password='', callback='yes', confirm=0): """create a new, already activated account with the given email and password.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """3. Create account.   [?]""" % weburl output = "" text = ' Email:\n' text += '
' % (email, ) text += ' Password:\n' text += '
' % (password, ) output += createhiddenform(action="createaccount", text=text, confirm=1, button="Create") if confirm in [1, "1"] and email and checkemail(email): res = run_sql("SELECT * FROM user WHERE email='%s'" % MySQLdb.escape_string(email)) if not res: res = run_sql("INSERT INTO user (email,password, note) values('%s','%s', '1')" % (MySQLdb.escape_string(email), MySQLdb.escape_string(password))) if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT == 1: emailsent = sendNewUserAccountWarning(email, email, password) if password: output += 'Account created with password and activated.' else: output += 'Account created without password and activated.' if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT == 1: if emailsent: output += '
An email has been sent to the owner of the account.' else: output += '
Could not send an email to the owner of the account.' else: output += 'An account with the same email already exists.' elif confirm in [1, "1"]: output += 'Please specify a valid email address.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_manageaccounts(req, "perform_createaccount", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyaccountstatus(req, userID, email_user_pattern, limit_to, maxpage, page, callback='yes', confirm=0): """activate an inactive account, or deactivate an active one""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) res = run_sql("SELECT id, email, note, password FROM user WHERE id=%s" % userID) output = "" if res: if res[0][2] in [0, "0", None]: res2 = run_sql("UPDATE user SET note=1 WHERE id=%s" % userID) output += """The account '%s' has been activated.""" % res[0][1] if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_ACTIVATION == 1: emailsent = sendAccountActivatedMessage(res[0][1], res[0][1], res[0][3]) if emailsent: output += """
An email has been sent to the owner of the account.""" else: output += """
Could not send an email to the owner of the account.""" elif res[0][2] in [1, "1"]: res2 = run_sql("UPDATE user SET note=0 WHERE id=%s" % userID) output += """The account '%s' has been set inactive.""" % res[0][1] else: output += 'The account id given does not exist.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_modifyaccounts(req, email_user_pattern, limit_to, maxpage, page, content=output, callback='yes') else: return addadminbox(subtitle, body) def perform_editaccount(req, userID, mtype='', content='', callback='yes', confirm=-1): """form to modify an account. this method calls perform_modifylogindata, perform_modifybasket, perform_modifyalerts, perform_modifypreferences and perform_deleteaccount; when called with callback, those methods call this method again and pass back their output. userID - id of the user mtype - the method that called this method. content - the output from that method.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) res = run_sql("SELECT id, email FROM user WHERE id=%s" % userID) if not res: if mtype == "perform_deleteaccount": text = """The selected account has been deleted. To continue editing, go back to 'Manage Accounts'.""" if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION == 1: text += """
An email has been sent to the owner of the account.""" else: text = """The selected account does not exist, please go back and select an account to edit.""" return index(req=req, title='Edit Account', subtitle="Edit account", body=[text], adminarea=7, authorized=1) fin_output = """
Menu
0. Show all 1. Modify login-data 2. Modify baskets 3. Modify alerts 4. Modify preferences
5. Delete account
""" % (weburl, userID, weburl, userID, weburl, userID, weburl, userID, weburl, userID, weburl, userID) if mtype == "perform_modifylogindata" and content: fin_output += content elif mtype == "perform_modifylogindata" or not mtype: fin_output += perform_modifylogindata(req, userID, callback='') if mtype == "perform_modifybasket" and content: fin_output += content elif mtype == "perform_modifybasket" or not mtype: fin_output += perform_modifybasket(req, userID, callback='') if mtype == "perform_modifyalerts" and content: fin_output += content elif mtype == "perform_modifyalerts" or not mtype: fin_output += perform_modifyalerts(req, userID, callback='') if mtype == "perform_modifypreferences" and content: fin_output += content elif mtype == "perform_modifypreferences" or not mtype: fin_output += perform_modifypreferences(req, userID, callback='') if mtype == "perform_deleteaccount" and content: fin_output += content elif mtype == "perform_deleteaccount" or not mtype: fin_output += perform_deleteaccount(req, userID, callback='') return index(req=req, title='Edit Account', subtitle="Edit account '%s'" % res[0][1], body=[fin_output], adminarea=7, authorized=1) def perform_modifybasket(req, userID, callback='yes', confirm=0): """offer to log in as the user in order to modify the baskets of an account""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """2. Modify baskets.   [?]""" % weburl res = run_sql("SELECT id, email, password FROM user WHERE id=%s" % userID) output = "" if res: text = """To modify the baskets for this account, you have to log in as the user.""" output += createhiddenform(action="%s/youraccount.py/login?" % weburl, text=text, p_email=res[0][1], p_pw=res[0][2], referer="%s/yourbaskets.py/display" % weburl, button="Login") output += "Remember that you will be logged out as the current user." 
#baskets = run_sql("SELECT basket.id, basket.name, basket.public FROM basket, user_basket WHERE id_user=%s and user_basket.id_basket=basket.id" % userID) #output += "" #for (id, name, public) in baskets: # output += "" % (name, (public=="y" and "Yes" or "No")) # basket_records = run_sql("SELECT id_record, nb_order FROM basket_record WHERE id_basket=%s" % id) # for (id_record, nb_order) in basket_records: # output += "" # #output += "
%s
Public: %s
" # output += print_record(id_record) # output += "
" else: output += 'The account id given does not exist.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_editaccount(req, userID, mtype='perform_modifybasket', content=addadminbox(subtitle, body), callback='yes') else: return addadminbox(subtitle, body) def perform_modifylogindata(req, userID, email='', password='', callback='yes', confirm=0): """modify email and password of an account""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """1. Edit login-data.   [?]""" % weburl res = run_sql("SELECT id, email, password FROM user WHERE id=%s" % userID) output = "" if res: if not email and not password: email = res[0][1] password = res[0][2] text = ' Account id:%s
\n' % userID text += ' Email:\n' text += '
' % (email, ) text += ' Password:\n' text += '
' % (password, ) output += createhiddenform(action="modifylogindata", text=text, userID=userID, confirm=1, button="Modify") if confirm in [1, "1"] and email and checkemail(email): res = run_sql("UPDATE user SET email='%s' WHERE id=%s" % (MySQLdb.escape_string(email), userID)) res = run_sql("UPDATE user SET password='%s' WHERE id=%s" % (MySQLdb.escape_string(password), userID)) output += 'Email and/or password modified.' elif confirm in [1, "1"]: output += 'Please specify a valid email address.' else: output += 'The account id given does not exist.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_editaccount(req, userID, mtype='perform_modifylogindata', content=addadminbox(subtitle, body), callback='yes') else: return addadminbox(subtitle, body) def perform_modifyalerts(req, userID, callback='yes', confirm=0): """offer to log in as the user in order to modify the alerts of an account""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """3. Modify alerts.   [?]""" % weburl res = run_sql("SELECT id, email, password FROM user WHERE id=%s" % userID) output = "" if res: text = """To modify the alerts for this account, you have to log in as the user.""" output += createhiddenform(action="%s/youraccount.py/login?" % weburl, text=text, p_email=res[0][1], p_pw=res[0][2], referer="%s/youralerts.py/display" % weburl, button="Login") output += "Remember that you will be logged out as the current user." res= """ SELECT q.id, q.urlargs, a.id_basket, a.alert_name, a.frequency, a.notification, DATE_FORMAT(a.date_creation,'%%d %%b %%Y'), DATE_FORMAT(a.date_lastrun,'%%d %%b %%Y') FROM query q, user_query_basket a WHERE a.id_user='%s' AND a.id_query=q.id ORDER BY a.alert_name ASC """ % userID #res = run_sql(res) #for (qID, qurlargs, id_basket, alertname, frequency, notification, date_creation, date_lastrun) in res: # output += "%s - %s - %s - %s - %s - %s - %s
" % (qID, id_basket, alertname, frequency, notification, date_creation, date_lastrun) else: output += 'The account id given does not exist.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_editaccount(req, userID, mtype='perform_modifyalerts', content=addadminbox(subtitle, body), callback='yes') else: return addadminbox(subtitle, body) def perform_modifypreferences(req, userID, login_method='', callback='yes', confirm=0): """modify the preferences (default login method) of an account""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """4. Modify preferences.   [?]""" % weburl res = run_sql("SELECT id, email, password FROM user WHERE id=%s" % userID) output = "" if res: user_pref = get_user_preferences(userID) if confirm in [1, "1"]: if login_method: user_pref['login_method'] = login_method set_user_preferences(userID, user_pref) output += "Select default login method:
" text = "" methods = CFG_EXTERNAL_AUTHENTICATION.keys() methods.sort() for system in methods: text += """%s
""" % (system, (user_pref['login_method'] == system and "checked" or ""), system) output += createhiddenform(action="modifypreferences", text=text, confirm=1, userID=userID, button="Select") if confirm in [1, "1"]: if login_method: output += """The login method has been changed""" else: output += """Nothing to update""" else: output += 'The account id given does not exist.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_editaccount(req, userID, mtype='perform_modifypreferences', content=addadminbox(subtitle, body), callback='yes') else: return addadminbox(subtitle, body) def perform_deleteaccount(req, userID, callback='yes', confirm=0): """delete account""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """5. Delete account.   [?]""" % weburl res = run_sql("SELECT id, email, password FROM user WHERE id=%s" % userID) output = "" if res: if confirm in [0, "0"]: text = 'Are you sure you want to delete the account with email: "%s"?' % res[0][1] output += createhiddenform(action="deleteaccount", text=text, userID=userID, confirm=1, button="Delete") elif confirm in [1, "1"]: res2 = run_sql("DELETE FROM user WHERE id=%s" % userID) output += 'Account deleted.' if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION == 1: emailsent = sendAccountDeletedMessage(res[0][1], res[0][1]) else: output += 'The account id given does not exist.' 
try: body = [output, extra] except NameError: body = [output] if callback: return perform_editaccount(req, userID, mtype='perform_deleteaccount', content=addadminbox(subtitle, body), callback='yes') else: return addadminbox(subtitle, body) def perform_rejectaccount(req, userID, email_user_pattern, limit_to, maxpage, page, callback='yes', confirm=0): """Delete account and send an email to the owner.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) res = run_sql("SELECT id, email, password, note FROM user WHERE id=%s" % userID) output = "" if res: res2 = run_sql("DELETE FROM user WHERE id=%s" % userID) output += 'Account rejected and deleted.' if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_DELETION == 1: if not res[0][3] or res[0][3] == "0": emailsent = sendAccountRejectedMessage(res[0][1], res[0][1]) elif res[0][3] == "1": emailsent = sendAccountDeletedMessage(res[0][1], res[0][1]) if emailsent: output += """
An email has been sent to the owner of the account.""" else: output += """
Could not send an email to the owner of the account.""" else: output += 'The account id given does not exist.' try: body = [output, extra] except NameError: body = [output] if callback: return perform_modifyaccounts(req, email_user_pattern, limit_to, maxpage, page, content=output, callback='yes') else: return addadminbox(subtitle, body) def perform_modifyaccounts(req, email_user_pattern='', limit_to=-1, maxpage=MAXPAGEUSERS, page=1, content='', callback='yes', confirm=0): """search for accounts by (part of) email address and status, and list them page by page with links to activate/deactivate, reject/delete or edit each one.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) subtitle = """4. Edit accounts.   [?]""" % weburl output = "" # remove letters not allowed in an email email_user_pattern = cleanstring_email(email_user_pattern) try: maxpage = int(maxpage) except (TypeError, ValueError): maxpage = MAXPAGEUSERS try: page = int(page) if page < 1: page = 1 except (TypeError, ValueError): page = 1 text = ' Email (part of):\n' text += '
' % (email_user_pattern, ) text += """Limit to:
""" % ((limit_to=="all" and "selected" or ""), (limit_to=="enabled" and "selected" or ""), (limit_to=="disabled" and "selected" or "")) text += """Accounts per page:
""" % ((maxpage==25 and "selected" or ""), (maxpage==50 and "selected" or ""), (maxpage==100 and "selected" or ""), (maxpage==250 and "selected" or ""), (maxpage==500 and "selected" or ""), (maxpage==1000 and "selected" or "")) output += createhiddenform(action="modifyaccounts", text=text, button="search for accounts") if limit_to not in [-1, "-1"] and maxpage: users1 = "SELECT id,email,note FROM user WHERE " if limit_to == "enabled": users1 += " email!='' AND note=1" elif limit_to == "disabled": users1 += " email!='' AND (note=0 OR note IS NULL)" elif limit_to == "guest": users1 += " email=''" else: users1 += " email!=''" if email_user_pattern: users1 += " AND email RLIKE '%s'" % (email_user_pattern) users1 += " ORDER BY email LIMIT %s" % (maxpage * page + 1) users1 = run_sql(users1) if not users1: output += 'There are no accounts matching the email given.' else: users = [] if maxpage * (page - 1) > len(users1): page = len(users1) / maxpage + 1 for (id, email, note) in users1[maxpage * (page - 1):(maxpage * page)]: users.append(['', id, email, (note=="1" and 'Active' or 'Inactive')]) for col in [(((note=="1" and 'Inactivate' or 'Activate'), 'modifyaccountstatus'), ((note == "0" and 'Reject' or 'Delete'), 'rejectaccount'), ), (('Edit account', 'editaccount'), ),]: users[-1].append('%s' % (col[0][1], id, email_user_pattern, limit_to, maxpage, page, random.randint(0,1000), col[0][0])) for (str, function) in col[1:]: users[-1][-1] += ' / %s' % (function, id, email_user_pattern, limit_to, maxpage, page, random.randint(0,1000), str) last = "" next = "" if len(users1) > maxpage: if page > 1: last += 'Previous page' % (email_user_pattern, limit_to, maxpage, (page - 1)) if len(users1[maxpage * (page - 1):(maxpage * page)]) == maxpage: next += 'Next page' % (email_user_pattern, limit_to, maxpage, (page + 1)) output += 'Showing accounts %s-%s:' % (1 + maxpage * (page - 1), maxpage * page) else: output += '%s matching account(s):' % len(users1) output += tupletotable(header=[last, 
'id', 'email', 'Status', '', '',next], tuple=users) else: output += 'Please select which accounts to find and how many to show per page.' if content: output += "
%s" % content try: body = [output, extra] except NameError: body = [output] if callback: return perform_manageaccounts(req, "perform_modifyaccounts", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_delegate_startarea(req): """start area for lower level delegation of rights.""" subtitle = 'select what to do' output = '' if is_adminuser(req)[0] == 0: output += """

You are also allowed to enter the Main Admin Area, which gives you
access to the full functionality of WebAccess.

""" output += """
Connect users to roles
add users to the roles you have delegation rights to.
Remove users from roles
remove users from the roles you have delegation rights to.
Set up delegation rights
specialized area to set up the delegation rights used in the areas above.
you need to be a web administrator to access the area.
""" return index(req=req, title='Delegate Rights', subtitle=subtitle, body=[output], adminarea=0, authorized=1) def perform_delegate_adminsetup(req, id_role_admin=0, id_role_delegate=0, confirm=0): """lets the webadmins set up the delegation rights for the other roles id_role_admin - the role to be given delegation rights id_role_delegate - the role over which the delegation rights are given confirm - make the connection happen """ subtitle = 'step 1 - select admin role' admin_roles = acca.acc_getAllRoles() output = """

This is a specialized area to handle a task that can also be handled
from the "add authorization" interface.

By handling the delegation rights here you get the advantage of
not having to select the correct action (%s) or
remembering the names of available roles.

""" % (DELEGATEADDUSERROLE, ) output += createroleselect(id_role=id_role_admin, step=1, button='select admin role', name='id_role_admin', action='delegate_adminsetup', roles=admin_roles) if str(id_role_admin) != '0': subtitle = 'step 2 - select delegate role' name_role_admin = acca.acc_getRoleName(id_role=id_role_admin) delegate_roles_old = acca.acc_find_delegated_roles(id_role_admin=id_role_admin) delegate_roles = [] delegate_roles_old_names = [] for role in admin_roles: if (role,) not in delegate_roles_old: delegate_roles.append(role) else: delegate_roles_old_names.append(role[1]) if delegate_roles_old_names: delegate_roles_old_names.sort() names_str = '' for name in delegate_roles_old_names: if names_str: names_str += ', ' names_str += name output += '

previously selected roles: %s.

' % (names_str, ) extra = """
Remove delegated roles
use the standard administration area to remove delegation rights you no longer want to be available.
""" % (id_role_admin, acca.acc_getActionId(name_action=DELEGATEADDUSERROLE)) else: output += '

no previously selected roles.

' output += createroleselect(id_role=id_role_delegate, step=2, button='select delegate role', name='id_role_delegate', action='delegate_adminsetup', roles=delegate_roles, id_role_admin=id_role_admin) if str(id_role_delegate) != '0': subtitle = 'step 3 - confirm to add delegation right' name_role_delegate = acca.acc_getRoleName(id_role=id_role_delegate) output += """

Warning: don't hand out delegation rights that can harm the system (e.g. delegating superrole).

""" output += createhiddenform(action="delegate_adminsetup", text='let role %s delegate rights over role %s?' % (name_role_admin, name_role_delegate), id_role_admin=id_role_admin, id_role_delegate=id_role_delegate, confirm=1) if int(confirm): subtitle = 'step 4 - confirm delegation right added' # res1 = acca.acc_addRoleActionArguments_names(name_role=name_role_admin, # name_action=DELEGATEADDUSERROLE, # arglistid=-1, # optional=0, # role=name_role_delegate) res1 = acca.acc_addAuthorization(name_role=name_role_admin, name_action=DELEGATEADDUSERROLE, optional=0, role=name_role_delegate) if res1: output += '

confirm: role %s delegates role %s.' % (name_role_admin, name_role_delegate) else: output += '

sorry, delegation right could not be added,
it probably already exists.

' # see if right hand menu is available try: body = [output, extra] except NameError: body = [output] return index(req=req, title='Delegate Rights', subtitle=subtitle, body=body, adminarea=1) def perform_delegate_adduserrole(req, id_role=0, email_user_pattern='', id_user=0, confirm=0): """let a lower level web admin add users to a limited set of roles. id_role - the role to connect to a user id_user - the user to connect to a role confirm - make the connection happen """ # finding the allowed roles for this user id_admin = getUid(req) id_action = acca.acc_getActionId(name_action=DELEGATEADDUSERROLE) actions = acca.acc_findPossibleActionsUser(id_user=id_admin, id_action=id_action) allowed_roles = [] allowed_id_roles = [] for (id, arglistid, name_role_help) in actions[1:]: id_role_help = acca.acc_getRoleId(name_role=name_role_help) if id_role_help and [id_role_help, name_role_help, ''] not in allowed_roles: allowed_roles.append([id_role_help, name_role_help, '']) allowed_id_roles.append(str(id_role_help)) output = '' if not allowed_roles: subtitle = 'no delegation rights' output += """

You do not have delegation rights over any roles.
If you think you should have such rights, contact a WebAccess Administrator.

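perform_delegate_adduserrole shows this message when the delegating user ends up with no allowed roles. The allowed roles come from the rows returned by acc_findPossibleActionsUser. The first row is skipped, which the `actions[1:]` slice suggests is a header. Each remaining row supplies a role name. That name is resolved to an id, and duplicates and unknown names are dropped. A minimal sketch follows. It uses a hypothetical dict in place of acc_getRoleId and assumes the (id, arglistid, role_name) row shape seen in the loop.

```python
def allowed_roles_from_actions(actions, role_id_by_name):
    # actions: a header row followed by (id, arglistid, role_name) rows,
    # mirroring the loop in perform_delegate_adduserrole (assumed shape).
    # role_id_by_name: hypothetical stand-in for acc_getRoleId.
    allowed_roles = []
    allowed_id_roles = []
    for (_id, _arglistid, name_role) in actions[1:]:  # skip the header row
        id_role = role_id_by_name.get(name_role)
        entry = [id_role, name_role, '']
        # unknown role names (None) and duplicates are ignored
        if id_role and entry not in allowed_roles:
            allowed_roles.append(entry)
            allowed_id_roles.append(str(id_role))
    return allowed_roles, allowed_id_roles


actions = [('id', 'arglistid', 'role'),
           (7, 1, 'editor'), (7, 2, 'editor'), (7, 3, 'ghost'), (7, 4, 'cataloguer')]
roles, ids = allowed_roles_from_actions(actions, {'editor': 3, 'cataloguer': 2})
print(ids)  # ['3', '2']
```

The ids are kept as strings so that the later `str(id_role) in allowed_id_roles` check can compare them directly with the value posted by the form. A role id typed into the URL by hand is therefore accepted only if it appears in this list.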
""" extra = '' else: subtitle = 'step 1 - select role' output += """

Lower level delegation of access rights to roles.
An administrator with full rights has to grant you these rights.

""" email_out = acca.acc_getUserEmail(id_user=id_user) name_role = acca.acc_getRoleName(id_role=id_role) output += createroleselect(id_role=id_role, step=1, name='id_role', action='delegate_adduserrole', roles=allowed_roles) if str(id_role) != '0' and str(id_role) in allowed_id_roles: subtitle = 'step 2 - search for users' # remove letters not allowed in an email email_user_pattern = cleanstring_email(email_user_pattern) text = ' 2. search for user \n' text += ' \n' % (email_user_pattern, ) output += createhiddenform(action="delegate_adduserrole", text=text, button="search for users", id_role=id_role) # pattern is entered if email_user_pattern: # users with matching email-address users1 = run_sql("""SELECT id, email FROM user WHERE email RLIKE '%s' ORDER BY email """ % (email_user_pattern, )) # users that are connected users2 = run_sql("""SELECT DISTINCT u.id, u.email FROM user u LEFT JOIN user_accROLE ur ON u.id = ur.id_user WHERE ur.id_accROLE = '%s' AND u.email RLIKE '%s' ORDER BY u.email """ % (id_role, email_user_pattern)) # no users that match the pattern if not (users1 or users2): output += '

no qualified users; try a new search.

' # too many matching users elif len(users1) > MAXSELECTUSERS: output += '
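The user searches above build the RLIKE query by formatting the cleaned pattern directly into the SQL text. Their safety therefore rests on cleanstring_email() removing every character that could break out of the quoted literal. The sketch below runs the same search with a bound parameter instead. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption: SQLite has no RLIKE, so a `regexp` function is registered by hand. With a DB-API driver such as MySQLdb, the same idea applies through `cursor.execute(query, params)`. Whether CDSware's run_sql accepts parameters is not shown in this code.

```python
import re
import sqlite3


def search_users(conn, email_pattern):
    # The pattern is bound as a parameter, so quotes inside it cannot
    # change the structure of the query.
    return conn.execute(
        'SELECT id, email FROM user WHERE email REGEXP ? ORDER BY email',
        (email_pattern,),
    ).fetchall()


conn = sqlite3.connect(':memory:')
# SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
conn.create_function(
    'regexp', 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None)
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT)')
conn.executemany('INSERT INTO user (id, email) VALUES (?, ?)',
                 [(1, 'alice@cern.ch'), (2, 'bob@example.org'), (3, 'carol@cern.ch')])
print(search_users(conn, 'cern'))  # [(1, 'alice@cern.ch'), (3, 'carol@cern.ch')]
```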

%s hits, too many qualified users; please narrow your search (limit %s).

' % (len(users1), MAXSELECTUSERS) # show matching users else: subtitle = 'step 3 - select a user' users = [] extrausers = [] for (id, email) in users1: if (id, email) not in users2: users.append([id,email,'']) for (id, email) in users2: extrausers.append([-id, email,'']) output += createuserselect(id_user=id_user, action="delegate_adduserrole", step=3, users=users, extrausers=extrausers, button="add this user", id_role=id_role, email_user_pattern=email_user_pattern) try: id_user = int(id_user) except ValueError: pass # user selected already connected to role if id_user < 0: output += '

users in brackets are already attached to the role; try another one...

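When search results are listed, users who already have the role (`users2`) are shown with negated ids. A posted `id_user < 0` therefore means "already connected", and the page asks for another choice. A minimal sketch of that split follows, using hypothetical (id, email) rows. It uses a set for membership tests, whereas the original scans a list. The sign trick relies on real user ids being positive.

```python
def split_search_hits(matching, connected):
    # matching: (id, email) rows found by the e-mail search (users1)
    # connected: (id, email) rows already attached to the role (users2)
    connected_set = set(connected)
    selectable = [[uid, email, ''] for (uid, email) in matching
                  if (uid, email) not in connected_set]
    # Negated ids mark users who already have the role; a posted
    # id_user < 0 is then recognised as "already connected".
    already = [[-uid, email, ''] for (uid, email) in connected]
    return selectable, already


users1 = [(1, 'a@cern.ch'), (2, 'b@cern.ch'), (3, 'c@cern.ch')]
users2 = [(2, 'b@cern.ch')]
selectable, already = split_search_hits(users1, users2)
print(selectable)  # [[1, 'a@cern.ch', ''], [3, 'c@cern.ch', '']]
print(already)     # [[-2, 'b@cern.ch', '']]
```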
' # a user is selected elif email_out: subtitle = "step 4 - confirm to add user" output += createhiddenform(action="delegate_adduserrole", text='add user %s to role %s?' % (email_out, name_role), id_role=id_role, email_user_pattern=email_user_pattern, id_user=id_user, confirm=1) # it is confirmed that this user should be added if confirm: # add user result = acca.acc_addUserRole(id_user=id_user, id_role=id_role) if result and result[2]: subtitle = 'step 5 - confirm user added' output += '

confirm: user %s added to role %s.

' % (email_out, name_role) else: subtitle = 'step 5 - user could not be added' output += '

sorry, but the user could not be added.

' extra = """
Remove users from role
remove users from the roles you have delegation rights over.
""" % (id_role, ) return index(req=req, title='Connect users to roles', subtitle=subtitle, body=[output, extra], adminarea=1, authorized=1) def perform_delegate_deleteuserrole(req, id_role=0, id_user=0, confirm=0): """let a lower level web admin remove users from a limited set of roles. id_role - the role to remove the user from id_user - the user to remove from the role confirm - perform the removal """ subtitle = 'in progress...' output = '

in progress...

' # finding the allowed roles for this user id_admin = getUid(req) id_action = acca.acc_getActionId(name_action=DELEGATEADDUSERROLE) actions = acca.acc_findPossibleActionsUser(id_user=id_admin, id_action=id_action) output = '' if not actions: subtitle = 'no delegation rights' output += """

You do not have delegation rights over any roles.
If you think you should have such rights, contact a WebAccess Administrator.

""" extra = '' else: subtitle = 'step 1 - select role' output += """

Lower level delegation of access rights to roles.
An administrator with full rights has to grant you these rights.

""" email_out = acca.acc_getUserEmail(id_user=id_user) name_role = acca.acc_getRoleName(id_role=id_role) # create list of allowed roles allowed_roles = [] allowed_id_roles = [] for (id, arglistid, name_role_help) in actions[1:]: id_role_help = acca.acc_getRoleId(name_role=name_role_help) if id_role_help and [id_role_help, name_role_help, ''] not in allowed_roles: allowed_roles.append([id_role_help, name_role_help, '']) allowed_id_roles.append(str(id_role_help)) output += createroleselect(id_role=id_role, step=1, action='delegate_deleteuserrole', roles=allowed_roles) if str(id_role) != '0' and str(id_role) in allowed_id_roles: subtitle = 'step 2 - select user' users = acca.acc_getRoleUsers(id_role) output += createuserselect(id_user=id_user, step=2, action='delegate_deleteuserrole', users=users, id_role=id_role) if str(id_user) != '0': subtitle = 'step 3 - confirm delete of user' email_user = acca.acc_getUserEmail(id_user=id_user) output += createhiddenform(action="delegate_deleteuserrole", text='delete user %s from %s?' % (headerstrong(user=id_user), headerstrong(role=id_role)), id_role=id_role, id_user=id_user, confirm=1) if confirm: res = acca.acc_deleteUserRole(id_user=id_user, id_role=id_role) if res: subtitle = 'step 4 - confirm user deleted from role' output += '

confirm: deleted user %s from role %s.

' % (email_user, name_role) else: subtitle = 'step 4 - user could not be deleted' output += 'sorry, but the user could not be deleted
the user has probably already been removed.' extra = """
Connect users to role
add users to the roles you have delegation rights over.
""" % (id_role, ) return index(req=req, title='Remove users from roles', subtitle=subtitle, body=[output, extra], adminarea=1, authorized=1) def perform_addaction(req, name_action='', arguments='', optional='no', description='put description here.', confirm=0): """form to add a new action with these values: name_action - name of the new action arguments - allowed keywords, separated by commas optional - 'yes' if the arguments are optional description - optional description of the action""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) name_action = cleanstring(name_action) arguments = cleanstring(arguments, comma=1) title = 'Add Action' subtitle = 'step 1 - give values to the requested fields' output = """
action name
arguments keywords for arguments, separated by commas, no whitespace.
optional arguments
description
""" % (name_action, arguments, optional == 'yes' and 'selected="selected"' or '', description) if name_action: # description must be changed before it is submitted if description == 'put description here.': internaldesc = '' else: internaldesc = description if arguments: subtitle = 'step 2 - confirm to add action with %s arguments' % (optional == 'yes' and 'optional' or '', ) arguments = arguments.replace(' ', '') text = 'add action with:
\n' text += 'name: %s
\n' % (name_action, ) if internaldesc: text += 'description: %s
\n' % (description, ) text += '%sarguments: %s
' % (optional == 'yes' and 'optional ' or '', arguments) text += 'optional: %s?' % (optional, ) else: optional = 'no' subtitle = 'step 2 - confirm to add action without arguments' text = 'add action %s without arguments' % (name_action, ) if internaldesc: text += '
\nand description: %s?\n' % (description, ) else: text += '?\n' output += createhiddenform(action="addaction", text=text, name_action=name_action, arguments=arguments, optional=optional, description=description, confirm=1) if confirm not in ["0", 0]: arguments = arguments.split(',') result = acca.acc_addAction(name_action, internaldesc, optional, *arguments) if result: subtitle = 'step 3 - action added' output += '

action added:

' output += tupletotable(header=['id', 'action name', 'description', 'allowedkeywords', 'optional'], tuple=[result]) else: subtitle = 'step 3 - action could not be added' output += '

sorry, could not add action,
an action with the same name probably exists.

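perform_addaction turns the keyword field into a list in two steps. It removes spaces and then splits on commas before calling acc_addAction. Note that in Python `''.split(',')` returns `['']`, so an empty field or a trailing comma produces an empty keyword. The hypothetical helper below reproduces the cleaning step and also drops empty items. That filter is an addition, not something the original code does.

```python
def parse_action_keywords(arguments):
    # Hypothetical helper mirroring perform_addaction: spaces are removed
    # and the string is split on commas. Unlike the original, empty items
    # (from an empty field or stray commas) are dropped.
    cleaned = arguments.replace(' ', '')
    return [keyword for keyword in cleaned.split(',') if keyword]


print(parse_action_keywords('collection, doctype,'))  # ['collection', 'doctype']
print(parse_action_keywords(''))                      # []
```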
' extra = """
Add authorization
start adding new authorizations to action %s.
""" % (acca.acc_getActionId(name_action=name_action), name_action) try: body = [output, extra] except NameError: body = [output] return index(req=req, title=title, body=body, subtitle=subtitle, adminarea=4) def perform_deleteaction(req, id_action="0", confirm=0): """show all roles connected, and ask for confirmation. id_action - id of action to delete """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) title='Delete action' subtitle='step 1 - select action to delete' name_action = acca.acc_getActionName(id_action=id_action) output = createactionselect(id_action=id_action, action="deleteaction", step=1, actions=acca.acc_getAllActions(), button="delete action") if id_action != "0" and name_action: subtitle = 'step 2 - confirm the delete' output += actiondetails(id_action=id_action) if acca.acc_getActionRoles(id_action=id_action): output += createhiddenform(action="deleteroleaction", text="""delete only the connection between action %s and a selected role instead?""" % (name_action, ), id_action=id_action, reverse=1, button='go there') output += createhiddenform(action="deleteaction", text=' delete action %s and all connections?' % (name_action, ), confirm=1, id_action=id_action) if confirm: subtitle = 'step 3 - confirm delete of action' res = acca.acc_deleteAction(id_action=id_action) if res: output += '

confirm: action %s deleted.
\n' % (name_action, ) output += '%s entries deleted in total.

\n' % (res, ) else: output += '

sorry, action could not be deleted.

\n' elif id_action != "0": output += '

the action has been deleted...

' return index(req=req, title=title, subtitle=subtitle, body=[output], adminarea=4) def perform_showactiondetails(req, id_action): """show the details of an action. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) output = createactionselect(id_action=id_action, action="showactiondetails", step=1, actions=acca.acc_getAllActions(), button="select action") if id_action not in [0, '0']: output += actiondetails(id_action=id_action) extra = """
Add new authorization
add an authorization.
Modify authorizations
modify existing authorizations.
Remove role
remove all authorizations from action and a role.
""" % (id_action, id_action, id_action) body = [output, extra] else: output += '

no details to show

' body = [output] return index(req=req, title='Show Action Details', subtitle='show action details', body=body, adminarea=4) def actiondetails(id_action=0): """show details of given action. """ output = '' if id_action not in [0, '0']: name_action = acca.acc_getActionName(id_action=id_action) output += '

action details:

' output += tupletotable(header=['id', 'name', 'description', 'allowedkeywords', 'optional'], tuple=[acca.acc_getActionDetails(id_action=id_action)]) roleshlp = acca.acc_getActionRoles(id_action=id_action) if roleshlp: roles = [] for (id, name, dontcare) in roleshlp: roles.append([id, name, 'show authorization details' % (id, id_action), 'show connected users' % (id, )]) roletable = tupletotable(header=['id', 'name', '', ''], tuple=roles) output += '

roles connected to %s:

\n' % (headerstrong(action=name_action, query=0), ) output += roletable else: output += '

no roles connected to %s.

\n' % (headerstrong(action=name_action, query=0), ) else: output += '

no details to show

' return output def perform_addrole(req, name_role='', description='put description here.', confirm=0): """form to add a new role with these values: name_role - name of the new role description - optional description of the role """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) name_role = cleanstring(name_role) title='Add Role' subtitle = 'step 1 - give values to the requested fields' output = """
role name
description
""" % (name_role, description) if name_role: # description must be changed before submitting subtitle = 'step 2 - confirm to add role' internaldesc = '' if description != 'put description here.': internaldesc = description text = """ add role with:
\n name: %s
""" % (name_role, ) if internaldesc: text += 'description: %s?\n' % (description, ) output += createhiddenform(action="addrole", text=text, name_role=name_role, description=description, confirm=1) if confirm not in ["0", 0]: result = acca.acc_addRole(name_role=name_role, description=internaldesc) if result: subtitle = 'step 3 - role added' output += '

role added:

' output += tupletotable(header=['id', 'role name', 'description', 'allowedkeywords'], tuple=[result]) else: subtitle = 'step 3 - role could not be added' output += '

sorry, could not add role,
a role with the same name probably exists.

' id_role = acca.acc_getRoleId(name_role=name_role) extra = """
Add authorization
start adding new authorizations to role %s.
Connect user
connect a user to role %s.
""" % (id_role, name_role, id_role, name_role) try: body = [output, extra] except NameError: body = [output] return index(req=req, title=title, body=body, subtitle=subtitle, adminarea=3) def perform_deleterole(req, id_role="0", confirm=0): """select a role and show all connected information, users - users that can access the role. actions - actions with possible authorizations.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) title = 'Delete role' subtitle = 'step 1 - select role to delete' name_role = acca.acc_getRoleName(id_role=id_role) output = createroleselect(id_role=id_role, action="deleterole", step=1, roles=acca.acc_getAllRoles(), button="delete role") if id_role != "0" and name_role: subtitle = 'step 2 - confirm delete of role' output += roledetails(id_role=id_role) output += createhiddenform(action="deleterole", text='delete role %s and all connections?' % (name_role, ), id_role=id_role, confirm=1) if confirm: res = acca.acc_deleteRole(id_role=id_role) subtitle = 'step 3 - confirm role deleted' if res: output += "

confirm: role %s deleted.
" % (name_role, ) output += "%s entries were removed.

" % (res, ) else: output += "

sorry, the role could not be deleted.

" elif id_role != "0": output += '

the role has been deleted...

' return index(req=req, title=title, subtitle=subtitle, body=[output], adminarea=3) def perform_showroledetails(req, id_role): """show the details of a role.""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) output = createroleselect(id_role=id_role, action="showroledetails", step=1, roles=acca.acc_getAllRoles(), button="select role") if id_role not in [0, '0']: name_role = acca.acc_getRoleName(id_role=id_role) output += roledetails(id_role=id_role) extra = """
Add new authorization
add an authorization.
Modify authorizations
modify existing authorizations.
Connect user
connect a user to role %s.
Remove user
remove a user from role %s.
""" % (id_role, id_role, id_role, name_role, id_role, name_role) body = [output, extra] else: output += '

no details to show

' body = [output] return index(req=req, title='Show Role Details', subtitle='show role details', body=body, adminarea=3) def roledetails(id_role=0): """create the string to show details about a role. """ name_role = acca.acc_getRoleName(id_role=id_role) usershlp = acca.acc_getRoleUsers(id_role) users = [] for (id, email, dontcare) in usershlp: users.append([id, email, 'show user details' % (id, )]) usertable = tupletotable(header=['id', 'email'], tuple=users) actionshlp = acca.acc_getRoleActions(id_role) actions = [] for (id, name, dontcare) in actionshlp: actions.append([id, name, 'show action details' % (id_role, id), 'show authorization details' % (id_role, id)]) actiontable = tupletotable(header=['id', 'name', '', ''], tuple=actions) # show role details details = '

role details:

' details += tupletotable(header=['id', 'name', 'description'], tuple=[acca.acc_getRoleDetails(id_role=id_role)]) # show connected users details += '

users connected to %s:

' % (headerstrong(role=name_role, query=0), ) if users: details += usertable else: details += '

no users connected.

' # show connected authorizations details += '

authorizations for %s:

' % (headerstrong(role=name_role, query=0), ) if actions: details += actiontable else: details += '

no authorizations connected

' return details def perform_adduserrole(req, id_role='0', email_user_pattern='', id_user='0', confirm=0): """create connection between user and role. id_role - id of the role to add user to email_user_pattern - search for users using this pattern id_user - id of user to add to the role. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) email_out = acca.acc_getUserEmail(id_user=id_user) name_role = acca.acc_getRoleName(id_role=id_role) title = 'Connect user to role ' subtitle = 'step 1 - select a role' output = createroleselect(id_role=id_role, action="adduserrole", step=1, roles=acca.acc_getAllRoles()) # role is selected if id_role != "0": title += name_role subtitle = 'step 2 - search for users' # remove letters not allowed in an email email_user_pattern = cleanstring_email(email_user_pattern) text = ' 2. search for user \n' text += ' \n' % (email_user_pattern, ) output += createhiddenform(action="adduserrole", text=text, button="search for users", id_role=id_role) # pattern is entered if email_user_pattern: # users with matching email-address users1 = run_sql("""SELECT id, email FROM user WHERE email RLIKE '%s' ORDER BY email """ % (email_user_pattern, )) # users that are connected users2 = run_sql("""SELECT DISTINCT u.id, u.email FROM user u LEFT JOIN user_accROLE ur ON u.id = ur.id_user WHERE ur.id_accROLE = '%s' AND u.email RLIKE '%s' ORDER BY u.email """ % (id_role, email_user_pattern)) # no users that match the pattern if not (users1 or users2): output += '

no qualified users; try a new search.

' elif len(users1) > MAXSELECTUSERS: output += '

%s hits, too many qualified users; please narrow your search (limit %s).

' % (len(users1), MAXSELECTUSERS) # show matching users else: subtitle = 'step 3 - select a user' users = [] extrausers = [] for (id, email) in users1: if (id, email) not in users2: users.append([id,email,'']) for (id, email) in users2: extrausers.append([-id, email,'']) output += createuserselect(id_user=id_user, action="adduserrole", step=3, users=users, extrausers=extrausers, button="add this user", id_role=id_role, email_user_pattern=email_user_pattern) try: id_user = int(id_user) except ValueError: pass # user selected already connected to role if id_user < 0: output += '

users in brackets are already attached to the role; try another one...

' # a user is selected elif email_out: subtitle = "step 4 - confirm to add user" output += createhiddenform(action="adduserrole", text='add user %s to role %s?' % (email_out, name_role), id_role=id_role, email_user_pattern=email_user_pattern, id_user=id_user, confirm=1) # it is confirmed that this user should be added if confirm: # add user result = acca.acc_addUserRole(id_user=id_user, id_role=id_role) if result and result[2]: subtitle = 'step 5 - confirm user added' output += '

confirm: user %s added to role %s.

' % (email_out, name_role) else: subtitle = 'step 5 - user could not be added' output += '

sorry, but the user could not be added.

' extra = """
Create new role
go here to add a new role.
""" if str(id_role) != "0": extra += """
Remove users
remove users from role %s.
Connected users
show all users connected to role %s.
Add authorization
start adding new authorizations to role %s.
""" % (id_role, name_role, id_role, name_role, id_role, name_role) return index(req=req, title=title, subtitle=subtitle, body=[output, extra], adminarea=3) def perform_addroleuser(req, email_user_pattern='', id_user='0', id_role='0', confirm=0): """create connection between user and role, starting from a user search. email_user_pattern - search for users using this pattern id_user - id of user to connect id_role - id of role to connect. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) email_out = acca.acc_getUserEmail(id_user=id_user) name_role = acca.acc_getRoleName(id_role=id_role) # used to sort roles, and also to determine right side links con_roles = [] not_roles = [] title = 'Connect user to roles' subtitle = 'step 1 - search for users' # clean email search string email_user_pattern = cleanstring_email(email_user_pattern) text = ' 1. search for user \n' text += ' \n' % (email_user_pattern, ) output = createhiddenform(action='addroleuser', text=text, button='search for users', id_role=id_role) if email_user_pattern: subtitle = 'step 2 - select user' users1 = run_sql("""SELECT id, email FROM user WHERE email RLIKE '%s' ORDER BY email """ % (email_user_pattern, )) users = [] for (id, email) in users1: users.append([id, email, '']) # no users if not users: output += '

no qualified users; try a new search.

' # too many users elif len(users) > MAXSELECTUSERS: output += '

%s hits, too many qualified users; please narrow your search (limit %s).

' % (len(users), MAXSELECTUSERS) # ok number of users else: output += createuserselect(id_user=id_user, action='addroleuser', step=2, users=users, button='select user', email_user_pattern=email_user_pattern) if int(id_user): subtitle = 'step 3 - select role' # roles the user is connected to role_ids = acca.acc_getUserRoles(id_user=id_user) # all the roles; both lists below are built from this list and keep its order all_roles = acca.acc_getAllRoles() # split the roles into connected and not connected roles for (id, name, description) in all_roles: if (id, ) in role_ids: con_roles.append([-id, name, description]) else: not_roles.append([id, name, description]) # create roleselect output += createroleselect(id_role=id_role, action='addroleuser', step=3, roles=not_roles, extraroles=con_roles, extrastamp='(connected)', button='add this role', email_user_pattern=email_user_pattern, id_user=id_user) if int(id_role) < 0: name_role = acca.acc_getRoleName(id_role=-int(id_role)) output += '

role %s is already connected to the user; try another one...

' % (name_role, ) elif int(id_role): subtitle = 'step 4 - confirm to add role to user' output += createhiddenform(action='addroleuser', text='add role %s to user %s?' % (name_role, email_out), email_user_pattern=email_user_pattern, id_user=id_user, id_role=id_role, confirm=1) if confirm: # add role result = acca.acc_addUserRole(id_user=id_user, id_role=id_role) if result and result[2]: subtitle = 'step 5 - confirm role added' output += '

confirm: role %s added to user %s.

' % (name_role, email_out) else: subtitle = 'step 5 - role could not be added' output += '

sorry, but the role could not be added.

' extra = """
Create new role
go here to add a new role.
""" if int(id_user) and con_roles: extra += """
Remove roles
disconnect roles from user %s.
""" % (id_user, email_out) if int(id_role): if int(id_role) < 0: id_role = -int(id_role) extra += """
Remove users
disconnect users from role %s.
""" % (id_role, name_role) return index(req=req, title=title, subtitle=subtitle, body=[output, extra], adminarea=5) def perform_deleteuserrole(req, id_role='0', id_user='0', reverse=0, confirm=0): """delete connection between role and user. id_role - id of role to disconnect id_user - id of user to disconnect. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) title = 'Remove user from role' email_user = acca.acc_getUserEmail(id_user=id_user) name_role = acca.acc_getRoleName(id_role=id_role) output = '' if reverse in [0, '0']: adminarea = 3 subtitle = 'step 1 - select the role' output += createroleselect(id_role=id_role, action="deleteuserrole", step=1, roles=acca.acc_getAllRoles()) if id_role != "0": subtitle = 'step 2 - select the user' output += createuserselect(id_user=id_user, action="deleteuserrole", step=2, users=acca.acc_getRoleUsers(id_role=id_role), id_role=id_role) else: adminarea = 5 # show only if user is connected to a role, get users connected to roles users = run_sql("""SELECT DISTINCT(u.id), u.email, u.note FROM user u LEFT JOIN user_accROLE ur ON u.id = ur.id_user WHERE ur.id_accROLE IS NOT NULL AND u.email != '' ORDER BY u.email """) has_roles = 1 # check if the user is connected to any roles for (id, email, note) in users: if str(id) == str(id_user): break # user not connected to a role else: subtitle = 'step 1 - user not connected' output += '

no need to remove roles from user %s,
the user is not connected to any roles.

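In reverse mode, perform_deleteuserrole uses Python's for/else construct to decide whether the selected user has any roles. The `else` clause runs only when the loop finishes without hitting `break`, that is, when no row matched. A minimal sketch of the same check follows, using hypothetical rows shaped like the (id, email, note) result of the SELECT DISTINCT query. Ids are compared as strings because id_user may arrive from the form as text.

```python
def user_has_roles(users_with_roles, id_user):
    # users_with_roles: (id, email, note) rows, assumed to be shaped like the
    # result of the SELECT DISTINCT ... JOIN user_accROLE query.
    for (uid, _email, _note) in users_with_roles:
        if str(uid) == str(id_user):
            break
    else:
        # reached only if the loop ended without break: no matching user
        return False
    return True


rows = [(1, 'a@cern.ch', ''), (4, 'd@cern.ch', 'note')]
print(user_has_roles(rows, '4'))  # True
print(user_has_roles(rows, 5))    # False
```

The same test can be written as `any(str(row[0]) == str(id_user) for row in rows)`. The for/else form is useful when, as in the original, the no-match branch has to do more than return a flag.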
' % (email_user, ) has_roles, id_user = 0, '0' # stop the rest of the output below... # user connected to roles if has_roles: output += createuserselect(id_user=id_user, action="deleteuserrole", step=1, users=users, reverse=reverse) if id_user != "0": subtitle = 'step 2 - select the role' role_ids = acca.acc_getUserRoles(id_user=id_user) all_roles = acca.acc_getAllRoles() roles = [] for (id, name, desc) in all_roles: if (id, ) in role_ids: roles.append([id, name, desc]) output += createroleselect(id_role=id_role, action="deleteuserrole", step=2, roles=roles, id_user=id_user, reverse=reverse) if id_role != '0' and id_user != '0': subtitle = 'step 3 - confirm delete of user' output += createhiddenform(action="deleteuserrole", text='delete user %s from %s?' % (headerstrong(user=id_user), headerstrong(role=id_role)), id_role=id_role, id_user=id_user, reverse=reverse, confirm=1) if confirm: res = acca.acc_deleteUserRole(id_user=id_user, id_role=id_role) if res: subtitle = 'step 4 - confirm delete of user' output += '

confirm: deleted user %s from role %s.

' % (email_user, name_role) else: subtitle = 'step 4 - user could not be deleted' output += 'sorry, but the user could not be deleted
the user has probably already been removed.' extra = '' if str(id_role) != "0": extra += """
Connect user
add users to role %s.
""" % (id_role, name_role) if int(reverse): extra += """
Remove user
remove users from role %s.
""" % (id_role, name_role) extra += '
' if str(id_user) != "0": extra += """
Connect role
add roles to user %s.
""" % (email_user, id_user, email_user) if not int(reverse): extra += """
Remove role
remove roles from user %s.
""" % (id_user, email_user, email_user) extra += '
' if extra: body = [output, extra] else: body = [output] return index(req=req, title=title, subtitle=subtitle, body=body, adminarea=adminarea) def perform_showuserdetails(req, id_user=0): """show the details of a user. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) if id_user not in [0, '0']: output = userdetails(id_user=id_user) email_user = acca.acc_getUserEmail(id_user=id_user) extra = """
Connect role
connect a role to user %s.
Remove role
remove a role from user %s.
""" % (id_user, email_user, email_user, id_user, email_user) body = [output, extra] else: body = ['

no details to show

'] return index(req=req, title='Show User Details', subtitle='show user details', body=body, adminarea=5) def userdetails(id_user=0): """create the string to show details about a user. """ # find necessary details email_user = acca.acc_getUserEmail(id_user=id_user) userroles = acca.acc_getUserRoles(id_user=id_user) conn_roles = [] # find connected roles for (id, name, desc) in acca.acc_getAllRoles(): if (id, ) in userroles: conn_roles.append([id, name, desc]) conn_roles[-1].append('show details' % (id, )) if conn_roles: # print details details = '

roles connected to user %s

' % (email_user, ) details += tupletotable(header=['id', 'name', 'description', ''], tuple=conn_roles) else: details = '

no roles connected to user %s.

' % (email_user, ) return details def perform_addauthorization(req, id_role="0", id_action="0", optional=0, reverse="0", confirm=0, **keywords): """ form to add new connection between role and action: id_role - role to connect id_action - action to connect reverse - if nonzero, select the action first instead of the role keywords - argument values for the authorization """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) # values that might get used name_role = acca.acc_getRoleName(id_role=id_role) or id_role name_action = acca.acc_getActionName(id_action=id_action) or id_action optional = optional == 'on' and 1 or int(optional) extra = """
Create new role
go here to add a new role.
Create new action
go here to add a new action.
""" # create the page according to which step the user is on # role -> action -> arguments if reverse in ["0", 0]: adminarea = 3 subtitle = 'step 1 - select role' output = createroleselect(id_role=id_role, action="addauthorization", step=1, roles=acca.acc_getAllRoles(), reverse=reverse) if str(id_role) != "0": subtitle = 'step 2 - select action' rolacts = acca.acc_getRoleActions(id_role) allhelp = acca.acc_getAllActions() allacts = [] for r in allhelp: if r not in rolacts: allacts.append(r) output += createactionselect(id_action=id_action, action="addauthorization", step=2, actions=rolacts, extraactions=allacts, id_role=id_role, reverse=reverse) # action -> role -> arguments else: adminarea = 4 subtitle = 'step 1 - select action' output = createactionselect(id_action=id_action, action="addauthorization", step=1, actions=acca.acc_getAllActions(), reverse=reverse) if str(id_action) != "0": subtitle = 'step 2 - select role' actroles = acca.acc_getActionRoles(id_action) allhelp = acca.acc_getAllRoles() allroles = [] for r in allhelp: if r not in actroles: allroles.append(r) output += createroleselect(id_role=id_role, action="addauthorization", step=2, roles=actroles, extraroles=allroles, id_action=id_action, reverse=reverse) # ready for step 3 no matter which direction we took to get here if id_action != "0" and id_role != "0": # links to adding authorizations in the other direction if str(reverse) == "0": extra += """
Add authorization
add authorizations to action %s.
""" % (id_action, name_action) else: extra += """
Add authorization
add authorizations to role %s.
""" % (id_role, name_role) subtitle = 'step 3 - enter values for the keywords\n' output += """
""" % (id_role, id_action, reverse) # the actions argument keywords res_keys = acca.acc_getActionKeywords(id_action=id_action) # res used to display existing authorizations # res used to determine if showing "create connection without arguments" res_auths = acca.acc_findPossibleActions(id_role, id_action) if not res_keys: # action without arguments if not res_auths: output += """ create connection between %s?
""" % (headerstrong(role=name_role, action=name_action, query=0), ) else: output += '

a connection without arguments already exists.

' else: # action with arguments optionalargs = acca.acc_getActionIsOptional(id_action=id_action) output += '3. authorized arguments
' if optionalargs: # optional arguments output += """

connect %s to %s for any arguments
connect %s to %s for only these argument cases:

""" % (optional and 'checked="checked"' or '', name_role, name_action, not optional and 'checked="checked"' or '', name_role, name_action) # list the arguments allkeys = 1 for key in res_keys: output += '%s \n \n' output += '\n' # ask for confirmation if str(allkeys) != "0" or optional: keys = keywords.keys() keys.reverse() subtitle = 'step 4 - confirm add of authorization\n' text = """ create connection between
%s
""" % (headerstrong(role=name_role, action=name_action, query=0), ) if optional: text += 'without arguments' keywords = {} else: for key in keys: text += '%s: %s \n' % (key, keywords[key]) output += createhiddenform(action="addauthorization", text=text, id_role=id_role, id_action=id_action, reverse=reverse, confirm=1, optional=optional, **keywords) # show existing authorizations, found authorizations further up in the code... # res_auths = acca.acc_findPossibleActions(id_role, id_action) output += '

existing authorizations:

' if res_auths: output += tupletotable(header=res_auths[0], tuple=res_auths[1:]) # shortcut to modifying authorizations extra += """
Modify authorizations
modify the existing authorizations.
""" % (id_role, id_action, reverse) else: output += '

no details to show

' # user confirmed to add entries if confirm: subtitle = 'step 5 - confirm authorization added' res1 = acca.acc_addAuthorization(name_role=name_role, name_action=name_action, optional=optional, **keywords) if res1: res2 = acca.acc_findPossibleActions(id_role, id_action) arg = res1[0][3] # the arglistid new = [res2[0]] for row in res2[1:]: if int(row[0]) == int(arg): new.append(row) newauths = tupletotable(header=new[0], tuple=new[1:]) newentries = tupletotable(header=['role id', 'action id', 'argument id', '#'], tuple=res1) st = 'style="vertical-align: top"' output += """

new authorization and entries:

%s %s
""" % (st, newauths, st, newentries) else: output += '

sorry, authorization could not be added,
it probably already exists

' # trying to put extra link on the right side try: body = [output, extra] except NameError: body = [output] return index(req=req, title = 'Create entry for new authorization', subtitle=subtitle, body=body, adminarea=adminarea) def perform_deleteroleaction(req, id_role="0", id_action="0", reverse=0, confirm=0): """delete all connections between a role and an action. id_role - id of the role id_action - id of the action reverse - 0: ask for role first 1: ask for action first""" (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) title = 'Remove action from role ' if reverse in ["0", 0]: # select role -> action adminarea = 3 subtitle = 'step 1 - select a role' output = createroleselect(id_role=id_role, action="deleteroleaction", step=1, roles=acca.acc_getAllRoles(), reverse=reverse) if id_role != "0": rolacts = acca.acc_getRoleActions(id_role=id_role) subtitle = 'step 2 - select the action' output += createactionselect(id_action=id_action, action="deleteroleaction", step=2, actions=rolacts, reverse=reverse, id_role=id_role, button="remove connection and all authorizations") else: # select action -> role adminarea = 4 subtitle = 'step 1 - select an action' output = createactionselect(id_action=id_action, action="deleteroleaction", step=1, actions=acca.acc_getAllActions(), reverse=reverse) if id_action != "0": actroles = acca.acc_getActionRoles(id_action=id_action) subtitle = 'step 2 - select the role' output += createroleselect(id_role=id_role, action="deleteroleaction", step=2, roles=actroles, button="remove connection and all authorizations", id_action=id_action, reverse=reverse) if id_action != "0" and id_role != "0": subtitle = 'step 3 - confirm to remove authorizations' # ask for confirmation res = acca.acc_findPossibleActions(id_role, id_action) if res: output += '

authorizations that will be deleted:

' output += tupletotable(header=res[0], tuple=res[1:]) output += createhiddenform(action="deleteroleaction", text='remove %s from %s' % (headerstrong(action=id_action), headerstrong(role=id_role)), confirm=1, id_role=id_role, id_action=id_action, reverse=reverse) else: output += 'no authorizations' # confirmation is given if confirm: subtitle = 'step 4 - confirm authorizations removed ' res = acca.acc_deleteRoleAction(id_role=id_role, id_action=id_action) if res: output += '

confirm: removed %s from %s
' % (headerstrong(action=id_action), headerstrong(role=id_role)) output += '%s entries were removed.

' % (res, ) else: output += '

sorry, no entries could be removed.

' return index(req=req, title=title, subtitle=subtitle, body=[output], adminarea=adminarea) def perform_modifyauthorizations(req, id_role="0", id_action="0", reverse=0, confirm=0, errortext='', sel='', authids=[]): """given ids of a role and an action, show all possible action combinations with checkboxes and allow the user to access other functions. id_role - id of the role id_action - id of the action reverse - 0: ask for role first 1: ask for action first sel - the selected modification (button) errortext - text to print when no connection exists between role and action authids - ids of checked checkboxes """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) name_role = acca.acc_getRoleName(id_role) name_action = acca.acc_getActionName(id_action) output = '' try: id_role, id_action, reverse = int(id_role), int(id_action), int(reverse) except ValueError: pass extra = """
Create new role
go here to add a new role.
Create new action
go here to add a new action.
""" if id_role or id_action: extra += '\n
\n' if id_role and id_action: extra += """
Add authorizations
add an authorization to the existing ones.
""" % (id_role, id_action, reverse) if id_role: extra += """
Add authorizations
add to role %s.
""" % (id_role, name_role) if id_action: extra += """
Add authorizations
add to action %s.
""" % (id_action, name_action) extra += '\n
\n' if not reverse: # role -> action adminarea = 3 subtitle = 'step 1 - select the role' output += createroleselect(id_role=str(id_role), action="modifyauthorizations", step=1, roles=acca.acc_getAllRoles(), reverse=reverse) if id_role: rolacts = acca.acc_getRoleActions(id_role=id_role) subtitle = 'step 2 - select the action' output += createactionselect(id_action=str(id_action), action="modifyauthorizations", step=2, actions=rolacts, id_role=id_role, reverse=reverse) else: adminarea = 4 # action -> role subtitle = 'step 1 - select the action' output += createactionselect(id_action=str(id_action), action="modifyauthorizations", step=1, actions=acca.acc_getAllActions(), reverse=reverse) if id_action: actroles = acca.acc_getActionRoles(id_action=id_action) subtitle = 'step 2 - select the role' output += createroleselect(id_role=str(id_role), action="modifyauthorizations", step=2, roles=actroles, id_action=id_action, reverse=reverse) if errortext: output += '

%s

' % (errortext, ) if id_role and id_action: # adding to main area if type(authids) is not list: authids = [authids] subtitle = 'step 3 - select groups and modification' # get info res = acca.acc_findPossibleActions(id_role, id_action) # clean the authids hiddenids = [] if sel in ['delete selected']: hiddenids = authids[:] elif sel in ['split groups', 'merge groups']: for authid in authids: arghlp = res[int(authid)][0] if authid not in hiddenids and arghlp not in [-1, '-1', 0, '0']: hiddenids.append(authid) authids = hiddenids[:] if confirm: # do selected modification and output with new authorizations if sel == 'split groups': res = splitgroups(id_role, id_action, authids) elif sel == 'merge groups': res = mergegroups(id_role, id_action, authids) elif sel == 'delete selected': res = deleteselected(id_role, id_action, authids) authids = [] res = acca.acc_findPossibleActions(id_role, id_action) output += 'authorizations after %s.
\n' % (sel, ) elif sel and authids: output += 'confirm choice of authorizations and modification.
\n' else: output += 'select authorizations and perform modification.
\n' if not res: errortext='all connections deleted, try different ' if reverse in ["0", 0]: return perform_modifyauthorizations(req=req, id_role=id_role, errortext=errortext + 'action.') else: return perform_modifyauthorizations(req=req, id_action=id_action, reverse=reverse, errortext=errortext + 'role.') # display output += modifyauthorizationsmenu(id_role, id_action, header=res[0], tuple=res[1:], checked=authids, reverse=reverse) if sel and authids: subtitle = 'step 4 - confirm to perform modification' # form with hidden authids output += '
\n' % ('modifyauthorizations', ) for hiddenid in hiddenids: output += '\n' % (hiddenid, ) # choose what to do if sel == 'split groups': output += '

split groups containing:

' elif sel == 'merge groups': output += '

merge groups containing:

' elif sel == 'delete selected': output += '

delete selected entries:

' extracolumn = '\n' extracolumn += '\n' # show the entries here... output += tupletotable_onlyselected(header=res[0], tuple=res[1:], selected=hiddenids, extracolumn=extracolumn) output += '\n' % (id_role, ) output += '\n' % (id_action, ) output += '\n' % (sel, ) output += '\n' % (reverse, ) output += '
' # tried to perform modification without something selected elif sel and not authids and not confirm: output += '

no valid groups selected

' # trying to put extra link on the right side try: body = [output, extra] except NameError: body = [output] # Display the page return index(req=req, title='Modify Authorizations', subtitle=subtitle, body=body, adminarea=adminarea) def modifyauthorizationsmenu(id_role, id_action, tuple=[], header=[], checked=[], reverse=0): """create table with header and checkboxes, used for multiple choice. makes use of tupletotable to add the actual table id_role - selected role, hidden value in the form id_action - selected action, hidden value in the form tuple - all rows to be put in the table (with checkboxes) header - column headers, empty strings added at start and end checked - ids of rows to be checked """ if not tuple: return 'no authorisations...' argnum = len(acca.acc_getActionKeywords(id_action=id_action)) tuple2 = [] for t in tuple: tuple2.append(t[:]) tuple2 = addcheckboxes(datalist=tuple2, name='authids', startindex=1, checked=checked) hidden = ' \n' % (id_role, ) hidden += ' \n' % (id_action, ) hidden += ' \n' % (reverse, ) button = '\n' if argnum > 1: button += '\n' button += '\n' hdrstr = '' for h in [''] + header + ['']: hdrstr += ' %s\n' % (h, ) if hdrstr: hdrstr = ' \n%s\n \n' % (hdrstr, ) output = '
\n' output += ' \n' output += hdrstr output += '\n' % (hidden, ) align = ['admintdleft'] * len(tuple2[0]) try: align[1] = 'admintdright' except IndexError: pass output += '' for i in range(len(tuple2[0])): output += '\n' % (align[i], tuple2[0][i]) output += '\n' % (len(tuple2), button) output += '\n' for row in tuple2[1:]: output += ' \n' for i in range(len(row)): output += '\n' % (align[i], row[i]) output += ' \n' output += '
%s
%s\n%s\n
%s
\n
\n' return output def splitgroups(id_role=0, id_action=0, authids=[]): """split the argument groups that contain the selected authorizations. id_role - id of the role id_action - id of the action authids - row numbers of the selected authorizations in the result of acc_findPossibleActions() return 1 if every group was split, 0 otherwise. """ if not id_role or not id_action or not authids: return 0 # find all the actions datalist = acca.acc_findPossibleActions(id_role, id_action) if type(authids) is str: authids = [authids] for i in range(len(authids)): authids[i] = int(authids[i]) # argumentlistids of groups to be split; skip row numbers outside the table body splitgrps = [] for authid in authids: if authid not in range(1, len(datalist)): continue hlp = datalist[authid][0] if hlp not in splitgrps: splitgrps.append(hlp) # split groups and return success or failure; any failed split marks the result as failed result = 1 for splitgroup in splitgrps: if not acca.acc_splitArgumentGroup(id_role, id_action, splitgroup): result = 0 return result def mergegroups(id_role=0, id_action=0, authids=[]): """merge the argument groups that contain the selected authorizations. id_role - id of the role id_action - id of the action authids - row numbers of the selected authorizations in the result of acc_findPossibleActions() return 1 on success, 0 on failure.""" if not id_role or not id_action or not authids: return 0 datalist = acca.acc_findPossibleActions(id_role, id_action) if type(authids) is str: authids = [authids] for i in range(len(authids)): authids[i] = int(authids[i]) # argumentlistids of groups to be merged; skip row numbers outside the table body mergegrps = [] for authid in authids: if authid not in range(1, len(datalist)): continue hlp = datalist[authid][0] if hlp not in mergegrps: mergegrps.append(hlp) # merge groups and return success or failure if acca.acc_mergeArgumentGroups(id_role, id_action, mergegrps): return 1 else: return 0 def deleteselected(id_role=0, id_action=0, authids=[]): """delete checked authorizations/possible actions, ids in authids.
id_role - role to delete from id_action - action to delete from authids - listids for which possible actions to delete.""" if not id_role or not id_action or not authids: return 0 if type(authids) in [str, int]: authids = [authids] for i in range(len(authids)): authids[i] = int(authids[i]) result = acca.acc_deletePossibleActions(id_role=id_role, id_action=id_action, authids=authids) return result def headeritalic(**ids): """transform keyword=value pairs to string with value in italics. **ids - a dictionary of pairs to create string from """ output = '' value = '' table = '' for key in ids.keys(): if key in ['User', 'user']: value, table = 'email', 'user' elif key in ['Role', 'role']: value, table = 'name', 'accROLE' elif key in ['Action', 'action']: value, table = 'name', 'accACTION' else: if output: output += ' and ' output += ' %s %s' % (key, ids[key]) continue res = run_sql("""SELECT %s FROM %s WHERE id = %s""" % (value, table, ids[key])) if res: if output: output += ' and ' output += ' %s %s' % (key, res[0][0]) return output def headerstrong(query=1, **ids): """transform keyword=value pairs to string with value in strong text. **ids - a dictionary of pairs to create string from query - 1 -> try to find names to ids of role, user and action. 0 -> do not try to find names, use the value passed on """ output = '' value = '' table = '' for key in ids.keys(): if key in ['User', 'user']: value, table = 'email', 'user' elif key in ['Role', 'role']: value, table = 'name', 'accROLE' elif key in ['Action', 'action']: value, table = 'name', 'accACTION' else: if output: output += ' and ' output += ' %s %s' % (key, ids[key]) continue if query: res = run_sql("""SELECT %s FROM %s WHERE id = %s""" % (value, table, ids[key])) if res: if output: output += ' and ' output += ' %s %s' % (key, res[0][0]) else: if output: output += ' and ' output += ' %s %s' % (key, ids[key]) return output def startpage(): """create the menu for the startpage""" body = """
selection for WebAccess Admin
Role Area
main area to configure administration rights and authorization rules.
Action Area
configure administration rights with the actions as starting point.
User Area
configure administration rights with the users as starting point.
Reset Area
reset roles, actions and authorizations.
""" return body def rankarea(): return "Rankmethod area" def perform_simpleauthorization(req, id_role=0, id_action=0): """show a page with simple overview of authorizations between a connected role and action. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) res = acca.acc_findPossibleActions(id_role, id_action) if res: extra = createhiddenform(action='modifyauthorizations', button='modify authorizations', id_role=id_role, id_action=id_action) output = '

authorizations for %s:

' % (headerstrong(action=id_action, role=id_role), ) output += tupletotable(header=res[0], tuple=res[1:], extracolumn=extra) else: output = 'no details to show' return index(req=req, title='Simple authorization details', subtitle='simple authorization details', body=[output], adminarea=3) def perform_showroleusers(req, id_role=0): """show a page with simple overview of a role and connected users. """ (auth_code, auth_message) = is_adminuser(req) if auth_code != 0: return mustloginpage(req, auth_message) res = acca.acc_getRoleUsers(id_role=id_role) name_role = acca.acc_getRoleName(id_role=id_role) if res: users = [] for (id, name, dontcare) in res: users.append([id, name, 'show user details' % (id, )]) output = '

users connected to %s:

' % (headerstrong(role=id_role), ) output += tupletotable(header=['id', 'name', ''], tuple=users) else: output = 'no users connected to role %s' % (name_role, ) extra = """
Connect user
connect users to the role.
""" % (id_role, ) return index(req=req, title='Users connected to role %s' % (name_role, ), subtitle='simple details', body=[output, extra], adminarea=3) def createselect(id_input="0", label="", step=0, name="", action="", list=[], extralist=[], extrastamp='', button="", **hidden): """create form with select and hidden values id_input - the option to preselect, if it exists label - label shown to the left of the select name - the name by which the select is referenced list - primary list to select from extralist - list of options to be put in parentheses extrastamp - stamp extralist entries with this string if not ''; usually parentheses around the entry button - the value/text to be put on the button **hidden - name=value pairs to be put as hidden in the form. """ step = step and '%s. ' % step or '' output = '
\n' % (action, ) output += ' %s\n' % (step + label, ) output += ' \n' for key in hidden.keys(): output += ' \n' % (key, hidden[key]) output += ' \n' % (button, ) output += '
\n' return output def createactionselect(id_action="0", label="select action", step=0, name="id_action", action="", actions=[], extraactions=[], extrastamp='', button="select action", **hidden): """create a select for actions in a form. see createselect.""" return createselect(id_input=id_action, label=label, step=step, name=name, action=action, list=actions, extralist=extraactions, extrastamp=extrastamp, button=button, **hidden) def createroleselect(id_role="0", label="select role", step=0, name="id_role", action="", roles=[], extraroles=[], extrastamp='', button="select role", **hidden): """create a select for roles in a form. see createselect.""" return createselect(id_input=id_role, label=label, step=step, name=name, action=action, list=roles, extralist=extraroles, extrastamp=extrastamp, button=button, **hidden) def createuserselect(id_user="0", label="select user", step=0, name="id_user", action="", users=[], extrausers=[], extrastamp='(connected)', button="select user", **hidden): """create a select for users in a form. see createselect.""" return createselect(id_input=id_user, label=label, step=step, name=name, action=action, list=users, extralist=extrausers, extrastamp=extrastamp, button=button, **hidden) def cleanstring(str='', comma=0): """clean all the strings before submitting to access control admin. remove all characters except letters, numbers and underscores (commas are kept only if comma is 1), and strip leading underscores and numbers from each comma-separated item. return cleaned string. str - string to be cleaned comma - 1 -> allow the comma to divide multiple arguments 0 -> wash commas as well """ # remove not allowed characters str = re.sub(r'[^a-zA-Z0-9_,]', '', str) # split string on commas items = str.split(',') str = '' for item in items: if not item: continue if comma and str: str += ',' # create valid variable names str += re.sub(r'^([0-9_])*', '', item) return str def cleanstring_argumentvalue(str=''): """clean the value of an argument before submitting it.
allowed characters: a-z A-Z 0-9 _ . and space str - string to be cleaned """ # remove not allowed characters str = re.sub(r'[^a-zA-Z0-9_ .]', '', str) # trim leading and trailing spaces str = re.sub(r'^ *| *$', '', str) return str def cleanstring_email(str=''): """remove the characters that are not allowed in an email address. the result is not guaranteed to be a valid address; see check_email. str - string to be cleaned """ # remove not allowed characters str = re.sub(r'[^a-zA-Z0-9_.@-]', '', str) return str def check_email(str=''): """check that a submitted email address has the rough form x@y.z. this check is deliberately simple and accepts many invalid addresses. """ r = re.compile(r'(.)+\@(.)+\.(.)+') return r.match(str) and 1 or 0 def sendAccountActivatedMessage(AccountEmail, sendTo, password, ln=cdslang): """Send an email to the address given by sendTo about the new activated account.""" fromaddr = "From: %s" % supportemail toaddrs = "To: %s" % sendTo to = toaddrs + "\n" sub = "Subject: Your account on '%s' has been activated\n\n" % cdsname body = "Your account earlier created on '%s' has been activated:\n\n" % cdsname body += " Username/Email: %s\n" % AccountEmail body += " Password: %s\n" % ("*" * len(password)) body += "\n---------------------------------" body += "\n%s" % cdsname body += "\nContact: %s" % supportemail msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: return 0 server.quit() return 1 def sendNewUserAccountWarning(newAccountEmail, sendTo, password, ln=cdslang): """Send an email to the address given by sendTo about the new account newAccountEmail.""" fromaddr = "From: %s" % supportemail toaddrs = "To: %s" % sendTo to = toaddrs + "\n" sub = "Subject: Account created on '%s'\n\n" % cdsname body = "An account has been created for you on '%s':\n\n" % cdsname body += " Username/Email: %s\n" % newAccountEmail body += " Password: %s\n" % ("*" * len(password)) body += "\n---------------------------------" body += "\n%s" % cdsname body +=
"\nContact: %s" % supportemail msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: return 0 server.quit() return 1 def sendAccountRejectedMessage(newAccountEmail, sendTo, ln=cdslang): """Send an email to the address given by sendTo about the rejected account request for newAccountEmail.""" fromaddr = "From: %s" % supportemail toaddrs = "To: %s" % sendTo to = toaddrs + "\n" sub = "Subject: Account rejected on '%s'\n\n" % cdsname body = "Your request for an account has been rejected on '%s':\n\n" % cdsname body += " Username/Email: %s\n" % newAccountEmail body += "\n---------------------------------" body += "\n%s" % cdsname body += "\nContact: %s" % supportemail msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: return 0 server.quit() return 1 def sendAccountDeletedMessage(newAccountEmail, sendTo, ln=cdslang): """Send an email to the address given by sendTo about the deleted account newAccountEmail.""" fromaddr = "From: %s" % supportemail toaddrs = "To: %s" % sendTo to = toaddrs + "\n" sub = "Subject: Account deleted on '%s'\n\n" % cdsname body = "Your account on '%s' has been deleted:\n\n" % cdsname body += " Username/Email: %s\n" % newAccountEmail body += "\n---------------------------------" body += "\n%s" % cdsname body += "\nContact: %s" % supportemail msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: return 0 server.quit() return 1 diff --git a/modules/webalert/bin/alertengine.in b/modules/webalert/bin/alertengine.in index 6643a3f3e..aee7d5bf5 100644 --- a/modules/webalert/bin/alertengine.in +++ b/modules/webalert/bin/alertengine.in @@ -1,73 +1,71 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of
the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """Alert engine command line interface""" __version__ = "$Id$" try: import sys - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) import getopt from cdsware.config import version, supportemail from cdsware.alert_engine import run_alerts from time import time except ImportError, e: print "Error: %s" % e import sys sys.exit(1) DEBUGLEVEL = 0 def usage(): print """Usage: alertengine [OPTION] Run the alert engine.\n -h, --help display this help and exit -V, --version output version information and exit\n Report bugs to <%s>""" % supportemail def main(): global DEBUGLEVEL try: opts, args = getopt.getopt(sys.argv[1:], "hV", ["help", "version"]) except getopt.GetoptError: usage() sys.exit(2) for o, a in opts: if o in ("-h", "--help"): usage() sys.exit() if o in ("-V", "--version"): print __version__ sys.exit(0) run_alerts() if __name__ == "__main__": t0 = time() main() t1 = time() print 'Alert engine finished in %.2f seconds' % (t1 - t0) diff --git a/modules/webalert/lib/alert_engine.py b/modules/webalert/lib/alert_engine.py index 4cf35a828..55a6e982a 100644 --- a/modules/webalert/lib/alert_engine.py +++ b/modules/webalert/lib/alert_engine.py @@ -1,443 +1,444 @@ ## $Id$ 
## Alert engine implementation. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """Alert engine implementation.""" ## rest of the Python code goes below __version__ = "$Id$" from cgi import parse_qs from sre import search, sub from time import localtime, strftime, mktime, sleep import smtplib -from config import * -from search_engine import perform_request_search -from dbquery import run_sql -from htmlparser import * -from string import split + +from cdsware.config import * +from cdsware.search_engine import perform_request_search +from cdsware.dbquery import run_sql +from cdsware.htmlparser import * +from string import split MAXIDS = 50 FROMADDR = 'CDS Alert Engine <%s>' % alertengineemail ALERTURL = weburl + '/youralerts.py/list' DEVELOPERADDR = [supportemail] # Debug levels: # 0 = production, nothing on the console, email sent # 1 = messages on the console, email sent # 2 = messages on the console, but no email sent # 3 = many messages on the console, no email sent # 4 = many messages on the console, email sent to DEVELOPERADDR DEBUGLEVEL = 0 def update_date_lastrun(alert): return run_sql('update user_query_basket set date_lastrun=%s where id_user=%s and id_query=%s and id_basket=%s;',
(strftime("%Y-%m-%d"), alert[0], alert[1], alert[2],)) def get_alert_queries(frequency): return run_sql('select distinct id, urlargs from query q, user_query_basket uqb where q.id=uqb.id_query and uqb.frequency=%s and uqb.date_lastrun <= now();', (frequency,)) def get_alert_queries_for_user(uid): return run_sql('select distinct id, urlargs, uqb.frequency from query q, user_query_basket uqb where q.id=uqb.id_query and uqb.id_user=%s and uqb.date_lastrun <= now();', (uid,)) def get_alerts(query, frequency): r = run_sql('select id_user, id_query, id_basket, frequency, date_lastrun, alert_name, notification from user_query_basket where id_query=%s and frequency=%s;', (query['id_query'], frequency,)) return {'alerts': r, 'records': query['records'], 'argstr': query['argstr'], 'date_from': query['date_from'], 'date_until': query['date_until']} # def add_record_to_basket(record_id, basket_id): # if DEBUGLEVEL > 0: # print "-> adding record %s into basket %s" % (record_id, basket_id) # try: # return run_sql('insert into basket_record (id_basket, id_record) values(%s, %s);', (basket_id, record_id,)) # except: # return 0 # def add_records_to_basket(record_ids, basket_id): # # TBD: generate the list and all all records in one step (see below) # for i in record_ids: # add_record_to_basket(i, basket_id) # Optimized version: def add_records_to_basket(record_ids, basket_id): global DEBUGLEVEL nrec = len(record_ids) if nrec > 0: vals = '(%s,%s)' % (basket_id, record_ids[0]) if nrec > 1: for i in record_ids[1:]: vals += ',(%s, %s)' % (basket_id, i) if DEBUGLEVEL > 0: print "-> adding %s records into basket %s: %s" % (nrec, basket_id, vals) try: if DEBUGLEVEL < 4: return run_sql('insert into basket_record (id_basket, id_record) values %s;' % vals) # Cannot use the run_sql(, (,)) form for some reason else: print ' NOT ADDED, DEBUG LEVEL == 4' return 0 except: return 0 else: return 0 def get_email(uid): r = run_sql('select email from user where id=%s', (uid,)) return r[0][0] def 
get_query(alert_id): r = run_sql('select urlargs from query where id=%s', (alert_id,)) return r[0][0] def send_email(fromaddr, toaddr, body, attempt=0): global DEBUGLEVEL if attempt > 2: log('error sending email to %s: SMTP error; gave up after 3 attempts' % toaddr) return try: server = smtplib.SMTP('localhost') if DEBUGLEVEL > 2: server.set_debuglevel(1) else: server.set_debuglevel(0) server.sendmail(fromaddr, toaddr, body) server.quit() except: if (DEBUGLEVEL > 1): print 'Error connecting to SMTP server, attempt %s, retrying in 5 minutes. Exception raised: %s' % (attempt, sys.exc_info()[0]) sleep(300) send_email(fromaddr, toaddr, body, attempt+1) return def forge_email(fromaddr, toaddr, subject, content): # an empty line must separate the headers from the message body body = 'From: %s\nTo: %s\nContent-Type: text/plain; charset=utf-8\nSubject: %s\n\n%s' % (fromaddr, toaddr, subject, content) return body def format_frequency(freq): frequency = freq if frequency == "day": return 'daily' else: return frequency + 'ly' def print_records(record_ids): global MAXIDS msg = '' c = 1 for i in record_ids: if c > MAXIDS: break msg += '\n\n%s) %s' % (c, get_as_text(i)) c += 1 if len(record_ids) > MAXIDS: msg += '\n\n' + wrap('Only the first %s records were displayed. Please consult the search URL given at the top of this email to see all the results.' % MAXIDS) return msg def email_notify(alert, records, argstr): global FROMADDR global ALERTURL global DEBUGLEVEL global DEVELOPERADDR if len(records) == 0: return msg = "" if DEBUGLEVEL > 0: msg = "*** THIS MESSAGE WAS SENT IN DEBUG MODE ***\n\n" msg += "Hello\n\n" msg += wrap("Below are the results of the email alert that you set up with the CERN Document Server. This is an automatic message, please don't reply to its address. For any question, use <%s> instead." % supportemail) email = get_email(alert[0]) url = weburl + "/search.py?"
+ argstr pattern = get_pattern(argstr) catalogue = get_catalogue(argstr) catword = 'collection' if get_catalogue_num(argstr) > 1: catword += 's' time = strftime("%Y-%m-%d") msg += '\n' + wrap('alert name: %s' % alert[5]) if pattern: msg += wrap('pattern: %s' % pattern) if catalogue: msg += wrap('%s: %s' % (catword, catalogue)) msg += wrap('frequency: %s ' % format_frequency(alert[3])) msg += wrap('run time: %s ' % strftime("%a %Y-%m-%d %H:%M:%S")) recword = 'record' if len(records) > 1: recword += 's' msg += wrap('found: %s %s' % (len(records), recword)) msg += "url: <%s/search.py?%s>\n" % (weburl, argstr) msg += wrap_records(print_records(records)) msg += "\n-- \nCERN Document Server Alert Service <%s>\nUnsubscribe? See <%s>\nNeed human intervention? Contact <%s>" % (weburl, ALERTURL, supportemail) subject = 'Alert %s run on %s' % (alert[5], time) body = forge_email(FROMADDR, email, subject, msg) if DEBUGLEVEL > 0: print "********************************************************************************" print body print "********************************************************************************" if DEBUGLEVEL < 2: send_email(FROMADDR, email, body) if DEBUGLEVEL == 4: for a in DEVELOPERADDR: send_email(FROMADDR, a, body) def get_argument(args, argname): if args.has_key(argname): return args[argname] else: return [] def get_record_ids(argstr, date_from, date_until): args = parse_qs(argstr) p = get_argument(args, 'p') c = get_argument(args, 'c') cc = get_argument(args, 'cc') as = get_argument(args, 'as') f = get_argument(args, 'f') rg = get_argument(args, 'rg') so = get_argument(args, 'so') sp = get_argument(args, 'sp') ot = get_argument(args, 'ot') as = get_argument(args, 'as') p1 = get_argument(args, 'p1') f1 = get_argument(args, 'f1') m1 = get_argument(args, 'm1') op1 = get_argument(args, 'op1') p2 = get_argument(args, 'p2') f2 = get_argument(args, 'f2') m2 = get_argument(args, 'm2') op2 = get_argument(args, 'op2') p3 = get_argument(args, 'p3') f3 = 
get_argument(args, 'f3') m3 = get_argument(args, 'm3') sc = get_argument(args, 'sc') # search = get_argument(args, 'search') d1y, d1m, d1d = date_from d2y, d2m, d2d = date_until return perform_request_search(of='id', p=p, c=c, cc=cc, f=f, so=so, sp=sp, ot=ot, as=as, p1=p1, f1=f1, m1=m1, op1=op1, p2=p2, f2=f2, m2=m2, op2=op2, p3=p3, f3=f3, m3=m3, sc=sc, d1y=d1y, d1m=d1m, d1d=d1d, d2y=d2y, d2m=d2m, d2d=d2d) def get_argument_as_string(argstr, argname): args = parse_qs(argstr) a = get_argument(args, argname) r = '' if len(a): r = a[0] for i in a[1:len(a)]: r += ", %s" % i return r def get_pattern(argstr): return get_argument_as_string(argstr, 'p') def get_catalogue(argstr): return get_argument_as_string(argstr, 'c') def get_catalogue_num(argstr): args = parse_qs(argstr) a = get_argument(args, 'c') return len(a) def get_date_from(time, freq): t = mktime(time) if freq == 'day': time2 = localtime(t - 86400) elif freq == 'month': m = time[1] - 1 y = time[0] if m == 0: m = 12 y -= 1 time2 = (y, m, time[2], time[3], time[4], time[5], time[6], time[7], time[8]) elif freq == 'week': time2 = localtime(t - 604800) ystr = strftime("%Y", time2) mstr = strftime("%m", time2) dstr = strftime("%d", time2) return (ystr, mstr, dstr) def run_query(query, frequency): """Return a dictionary containing the information of the performed query. 
The information contains the id of the query, the arguments as a string, and the list of found records.""" time = localtime() # Override time here for testing purposes (beware of localtime offset): #time = (2002, 12, 21, 2, 0, 0, 2, 120, 1) # Override frequency here for testing #frequency = 'week' ystr = strftime("%Y", time) mstr = strftime("%m", time) dstr = strftime("%d", time) date_until = (ystr, mstr, dstr) date_from = get_date_from(time, frequency) recs = get_record_ids(query[1], date_from, date_until) n = len(recs) if n: log('query %08s produced %08s records' % (query[0], len(recs))) if DEBUGLEVEL > 2: print "[%s] run query: %s with dates: from=%s, until=%s\n found rec ids: %s" % (strftime("%c"), query, date_from, date_until, recs) return {'id_query': query[0], 'argstr': query[1], 'records': recs, 'date_from': date_from, 'date_until': date_until} def process_alert_queries(frequency): """Run the alerts according to the frequency. Retrieves the queries for which an alert exists, runs each one, and processes the corresponding alerts.""" alert_queries = get_alert_queries(frequency) for aq in alert_queries: q = run_query(aq, frequency) alerts = get_alerts(q, frequency) process_alerts(alerts) def replace_argument(argstr, argname, argval): """Replace the given date argument value with the new one. If the argument is missing, it is added.""" if search('%s=\d+' % argname, argstr): r = sub('%s=\d+' % argname, '%s=%s' % (argname, argval), argstr) else: r = argstr + '&%s=%s' % (argname, argval) return r def update_arguments(argstr, date_from, date_until): """Replace date arguments in argstr with the ones specified by date_from and date_until.
Absent arguments are added.""" d1y, d1m, d1d = date_from d2y, d2m, d2d = date_until r = replace_argument(argstr, 'd1y', d1y) r = replace_argument(r, 'd1m', d1m) r = replace_argument(r, 'd1d', d1d) r = replace_argument(r, 'd2y', d2y) r = replace_argument(r, 'd2m', d2m) r = replace_argument(r, 'd2d', d2d) return r def log(msg): try: log = open(logdir + '/alertengine.log', 'a') log.write(strftime('%04Y%02m%02d%02H%02M%02S#')) log.write(msg + '\n') log.close() except: pass def process_alerts(alerts): # TBD: do not generate the email each time, forge it once and then # send it to all appropriate people for a in alerts['alerts']: if alert_use_basket_p(a): add_records_to_basket(alerts['records'], a[2]) if alert_use_notification_p(a): argstr = update_arguments(alerts['argstr'], alerts['date_from'], alerts['date_until']) email_notify(a, alerts['records'], argstr) update_date_lastrun(a) def alert_use_basket_p(alert): return alert[2] != 0 def alert_use_notification_p(alert): return alert[6] == 'y' def run_alerts(): """Run the alerts. First decide which alerts to run according to the current local time, and then run them.""" t = localtime() if t[2] == 1: # first of the month process_alert_queries('month') if t[6] == 0: # tm_wday 0 is Monday, the first day of the week (locale-independent) process_alert_queries('week') process_alert_queries('day') def process_alert_queries_for_user(uid): """Process the alerts for the given user id.
All alerts use the current local time as the reference date.""" alert_queries = get_alert_queries_for_user(uid) print alert_queries for aq in alert_queries: frequency = aq[2] q = run_query(aq, frequency) alerts = get_alerts(q, frequency) process_alerts(alerts) if __name__ == '__main__': process_alert_queries_for_user(2571422) # erik process_alert_queries_for_user(109) # tibor # process_alert_queries_for_user(11040) # jean-yves diff --git a/modules/webalert/lib/htmlparser.py b/modules/webalert/lib/htmlparser.py index 57a710cbe..75162f3a5 100644 --- a/modules/webalert/lib/htmlparser.py +++ b/modules/webalert/lib/htmlparser.py @@ -1,125 +1,126 @@ ## $Id$ ## HTML parser for records. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
"""HTML parser for records.""" ## rest of the Python code goes below __version__ = "$Id$" -from config import * -from search_engine import print_record from HTMLParser import HTMLParser -import textwrap from string import split +from cdsware.config import * +from cdsware.search_engine import print_record +from cdsware import textwrap + WRAPWIDTH = 72 def wrap(text): global WRAPWIDTH lines = textwrap.wrap(text, WRAPWIDTH) r = '' for l in lines: r += l + '\n' return r def wrap_records(text): global WRAPWIDTH lines = split(text, '\n') result = '' for l in lines: newlines = textwrap.wrap(l, WRAPWIDTH) for ll in newlines: result += ll + '\n' return result class RecordHTMLParser(HTMLParser): """A parser for the HTML returned by cdsware.search_engine.print_record. The parser provides methods to transform the HTML returned by cdsware.search_engine.print_record into plain text, with some minor formatting. """ def __init__(self): HTMLParser.__init__(self) self.result = '' def handle_starttag(self, tag, attrs): if tag == 'strong': # self.result += '*' pass elif tag == 'a': self.printURL = 0 self.unclosedBracket = 0 for f in attrs: if f[1] == 'note': self.result += 'Fulltext : <' self.unclosedBracket = 1 if f[1] == 'moreinfo': self.result += 'Detailed record : ' self.printURL = 1 if (self.printURL == 1) and (f[0] == 'href'): self.result += '<' + f[1] + '>' elif tag == 'br': self.result += '\n' def handle_endtag(self, tag): if tag == 'strong': # self.result += '\n' pass elif tag == 'a': if self.unclosedBracket == 1: self.result += '>' self.unclosedBracket = 0 def handle_data(self, data): if data == 'Detailed record': pass else: self.result += data def handle_comment(self, data): pass def get_as_text(record_id): """Return the plain text from RecordHTMLParser of the record.""" rec = print_record(record_id) htparser = RecordHTMLParser() try: htparser.feed(rec) return htparser.result except: #htparser.close() return wrap(htparser.result + 'Detailed record: .' 
% record_id) if __name__ == "__main__": rec = print_record(619028) print rec print "***" print get_as_text(619028) diff --git a/modules/webalert/lib/webalert.py b/modules/webalert/lib/webalert.py index 099453ab3..e3831705b 100644 --- a/modules/webalert/lib/webalert.py +++ b/modules/webalert/lib/webalert.py @@ -1,443 +1,442 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""PERSONAL FEATURES - YOUR ALERTS""" import cgi import string import sys import time import urllib import zlib - -from config import * -from webpage import page -from dbquery import run_sql -from webuser import getUid, isGuestUser -from webaccount import warning_guest_user -from webbasket import perform_create_basket, BasketNameAlreadyExists from mod_python import apache -from messages import gettext_set_language +from cdsware.config import * +from cdsware.webpage import page +from cdsware.dbquery import run_sql +from cdsware.webuser import getUid, isGuestUser +from cdsware.webaccount import warning_guest_user +from cdsware.webbasket import perform_create_basket, BasketNameAlreadyExists +from cdsware.messages import gettext_set_language -import template -webalert_templates = template.load('webalert') +import cdsware.template +webalert_templates = cdsware.template.load('webalert') ### IMPLEMENTATION class AlertError(Exception): pass def check_alert_name(alert_name, uid, ln = cdslang): #check this user does not have another alert with this name sql = """select id_query from user_query_basket where id_user=%s and alert_name=%s""" res = run_sql( sql, (uid, alert_name.strip()) ) # load the right message language _ = gettext_set_language(ln) if len( res ) > 0: raise AlertError( _("You already have an alert which name is %(name)s") % {'name' : alert_name} ) def get_textual_query_info_from_urlargs(urlargs, ln = cdslang): """Return nicely formatted search pattern and catalogue from urlargs of the search query.
Suitable for 'your searches' display.""" out = "" args = cgi.parse_qs(urlargs) return webalert_templates.tmpl_textual_query_info_from_urlargs( ln = ln, args = args, ) return out # perform_display(): display the searches performed by the current user # input: default permanent="n"; permanent="y" display permanent queries(most popular) # output: list of searches in formatted html def perform_display(permanent,uid, ln = cdslang): # set variables out = "" id_user = uid # XXX # load the right message language _ = gettext_set_language(ln) # first detect number of queries: nb_queries_total = 0 nb_queries_distinct = 0 id_queries_distinct = [] res = run_sql("SELECT COUNT(*),COUNT(DISTINCT(id_query)) FROM user_query WHERE id_user=%s", (uid,), 1) try: nb_queries_total = res[0][0] nb_queries_distinct = res[0][1] except: pass # query for queries: if permanent == "n": SQL_query = "SELECT DISTINCT(q.id),q.urlargs "\ "FROM query q, user_query uq "\ "WHERE uq.id_user='%s' "\ "AND uq.id_query=q.id "\ "ORDER BY q.id DESC" % id_user else: # permanent="y" SQL_query = "SELECT q.id,q.urlargs "\ "FROM query q "\ "WHERE q.type='p'" query_result = run_sql(SQL_query) queries = [] if len(query_result) > 0: for row in query_result : if permanent == "n": res = run_sql("SELECT DATE_FORMAT(MAX(date),'%%Y-%%m-%%d %%T') FROM user_query WHERE id_user=%s and id_query=%s", (id_user, row[0])) try: lastrun = res[0][0] except: lastrun = _("unknown") else: lastrun = "" queries.append({ 'id' : row[0], 'args' : row[1], 'textargs' : get_textual_query_info_from_urlargs(row[1], ln = ln), 'lastrun' : lastrun, }) return webalert_templates.tmpl_display_alerts( ln = ln, permanent = permanent, nb_queries_total = nb_queries_total, nb_queries_distinct = nb_queries_distinct, queries = queries, guest = isGuestUser(uid), guesttxt = warning_guest_user(type="alerts", ln = ln), weburl = weburl ) # perform_input_alert: get the alert settings # input: action="add" for a new alert (blank form), action="modify" for an update 
(get old values) # id_query is the identifier of the search to be alerted # for the "modify" action specify old alert_name, frequency of checking, e-mail notification and basket id. # output: alert settings input form def perform_input_alert(action, id_query, alert_name, frequency, notification, id_basket,uid, old_id_basket=None, ln = cdslang): # set variables out = "" frequency_month = frequency frequency_week = "" frequency_day = "" notification_yes = "" notification_no = "" id_user = uid # XXX # display query information res = run_sql("SELECT urlargs FROM query WHERE id=%s", (id_query,)) try: urlargs = res[0][0] except: urlargs = "UNKNOWN" SQL_query = "SELECT b.id, b.name FROM basket b,user_basket ub "\ "WHERE ub.id_user='%s' AND ub.id_basket=b.id ORDER BY b.name ASC" % id_user query_result = run_sql(SQL_query) baskets = [] for row in query_result : baskets.append({ 'id' : row[0], 'name' : row[1], }) return webalert_templates.tmpl_input_alert( ln = ln, query = get_textual_query_info_from_urlargs(urlargs, ln = ln), action = action, frequency = frequency, notification = notification, alert_name = alert_name, baskets = baskets, old_id_basket = old_id_basket, id_basket = id_basket, id_query = id_query, ) def check_alert_is_unique( id_basket, id_query, uid, ln = cdslang ): #check the user does not have another alert for the specified query # and basket sql = """select id_query from user_query_basket where id_user = %s and id_query = %s and id_basket= %s""" res = run_sql( sql, (uid, id_query, id_basket) ) # load the right message language _ = gettext_set_language(ln) if len( res ) > 0: raise AlertError(_("You already have an alert defined for the specified query and basket")) # perform_add_alert: add an alert to the database # input: the name of the new alert; # alert frequency: 'month', 'week' or 'day'; # setting for e-mail notification: 'y' for yes, 'n' for no; # basket identifier: 'no' for no basket; # new basket name for this alert; # identifier of the query to be alerted # output: confirmation message + the list of
alerts Web page def perform_add_alert(alert_name, frequency, notification, id_basket, new_basket_name, id_query,uid, ln = cdslang): # set variables out = "" id_user=uid # XXX alert_name = alert_name.strip() # load the right message language _ = gettext_set_language(ln) #check the alert name is not empty if alert_name.strip() == "": raise AlertError(_("The alert name cannot be empty.")) #check if the alert can be created check_alert_name( alert_name, uid, ln) check_alert_is_unique( id_basket, id_query, uid, ln) # set the basket identifier if new_basket_name != "": # create a new basket try: id_basket = perform_create_basket(uid, new_basket_name, ln) basket_created = 1 out += _("""The private basket %(name)s has been created.""") % {'name' : new_basket_name }+ """
""" except BasketNameAlreadyExists, e: basket_created = 0 out += _("You already have a basket which name is '%s'") % new_basket_name # add a row to the alerts table: user_query_basket SQL_query = "INSERT INTO user_query_basket (id_user, id_query, id_basket, frequency, date_creation, date_lastrun, alert_name, notification) "\ "VALUES ('%s','%s','%s','%s','%s','','%s','%s') " \ % (id_user, id_query, id_basket, frequency,time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()), alert_name, notification) query_result = run_sql(SQL_query) out += _("""The alert %s has been added to your profile.""") % alert_name + """

""" out += perform_list_alerts(uid, ln = ln) return out # perform_list_alerts display the list of alerts for the connected user def perform_list_alerts (uid, ln = cdslang): # set variables out = "" id_user = uid # XXX # query the database SQL_query = """ SELECT q.id, q.urlargs, a.id_user, a.id_query, a.id_basket, a.alert_name, a.frequency, a.notification, DATE_FORMAT(a.date_creation,'%%d %%b %%Y'), DATE_FORMAT(a.date_lastrun,'%%d %%b %%Y'), a.id_basket FROM query q, user_query_basket a WHERE a.id_user='%s' AND a.id_query=q.id ORDER BY a.alert_name ASC """ % id_user query_result = run_sql(SQL_query) alerts = [] if len(query_result) > 0: for row in query_result : sql = "select name from basket where id=%s"%row[10] res = run_sql(sql) if res: basket_name=res[0][0] else: basket_name="" alerts.append({ 'queryid' : row[0], 'queryargs' : row[1], 'textargs' : get_textual_query_info_from_urlargs(row[1], ln = ln), 'userid' : row[2], 'basketid' : row[4], 'basketname' : basket_name, 'alertname' : row[5], 'frequency' : row[6], 'notification' : row[7], 'created' : row[8], 'lastrun' : row[9], }) # link to the "add new alert" form out = webalert_templates.tmpl_list_alerts( ln = ln, weburl = weburl, alerts = alerts, guest = isGuestUser(uid), guesttxt = warning_guest_user(type="alerts", ln = ln), ) return out # perform_remove_alert: remove an alert from the database # input: identifier of the user; # identifier of the query; # identifier of the basket # output: confirmation message + the list of alerts Web page def perform_remove_alert( alert_name, id_user, id_query, id_basket,uid, ln = cdslang): # set variables out = "" # remove a row from the alerts table: user_query_basket SQL_query = "DELETE FROM user_query_basket "\ "WHERE id_user='%s' AND id_query='%s' AND id_basket='%s'" \ % (id_user, id_query, id_basket) query_result = run_sql(SQL_query) out += """The alert %s has been removed from your profile.

\n""" % alert_name out += perform_list_alerts(uid) return out # perform_update_alert: update alert settings into the database # input: the name of the new alert; # alert frequency: 'month', 'week' or 'day'; # setting for e-mail notification: 'y' for yes, 'n' for no; # new basket identifier: 'no' for no basket; # new basket name for this alert; # identifier of the query to be alerted # old identifier of the basket associated to the alert # output: confirmation message + the list of alerts Web page def perform_update_alert(alert_name, frequency, notification, id_basket, new_basket_name, id_query, old_id_basket,uid, ln = cdslang): #set variables out = "" id_user = uid # XXX # load the right message language _ = gettext_set_language(ln) #check the alert name is not empty if alert_name.strip() == "": raise AlertError(_("The alert name cannot be empty.")) #check if the alert can be created sql = """select alert_name from user_query_basket where id_user=%s and id_basket=%s and id_query=%s"""%( uid, old_id_basket, id_query ) old_alert_name = run_sql( sql )[0][0] if old_alert_name.strip()!="" and old_alert_name != alert_name: check_alert_name( alert_name, uid, ln) if id_basket != old_id_basket: check_alert_is_unique( id_basket, id_query, uid, ln) # set the basket identifier if new_basket_name != "": # create a new basket try: id_basket = perform_create_basket(uid, new_basket_name, ln) basket_created = 1 out += _("""The private basket %(name)s has been created.""") % {'name' : new_basket_name }+ """
""" except BasketNameAlreadyExists, e: basket_created = 0 out += _("You already have a basket which name is '%s'") % new_basket_name # update a row into the alerts table: user_query_basket SQL_query = "UPDATE user_query_basket "\ "SET alert_name='%s',frequency='%s',notification='%s',date_creation='%s',date_lastrun='',id_basket='%s' "\ "WHERE id_user='%s' AND id_query='%s' AND id_basket='%s'" \ % (alert_name,frequency,notification,time.strftime("%Y-%m-%d %H:%M:%S",time.localtime()),id_basket,id_user,id_query,old_id_basket) #date_creation #date_lastrun query_result = run_sql(SQL_query) out += _("""The alert %s has been successfully updated.""") % alert_name + """

""" out += perform_list_alerts(uid, ln = ln) return out def is_selected(var, fld): "Checks if the two are equal, and if yes, returns ' selected'. Useful for select boxes." if var == fld: return " selected" else: return "" # account_list_alerts: list the alerts for the account page # input: the user id # id_alert: the identifier of the alert # id_basket: the identifier of the basket, used to access the alert # new basket identifier: 'no' for no basket; # new basket name for this alert; # identifier of the query to be alerted # output: the list of alerts Web page def account_list_alerts(uid, action="", id_alert=0,id_basket=0,old_id_basket=0,newname="",value="", ln = "en"): i=0 id_user = uid # XXX out = "" SQL_query = """ SELECT q.id, q.urlargs, a.id_user, a.id_query, a.id_basket, a.alert_name, a.frequency, a.notification, DATE_FORMAT(a.date_creation,'%%d %%b %%Y'), DATE_FORMAT(a.date_lastrun,'%%d %%b %%Y'), a.id_basket FROM query q, user_query_basket a WHERE a.id_user='%s' AND a.id_query=q.id ORDER BY a.alert_name ASC """ % id_user query_result = run_sql(SQL_query) alerts = [] if len(query_result) > 0 : for row in query_result : alerts.append({ 'id' : row[0], 'name' : row[5] }) return webalert_templates.tmpl_account_list_alerts( ln = ln, alerts = alerts, ) # account_list_searches: list the searches of the user # input: the user id # output: summary of the searches def account_list_searches(uid, ln = "en"): out ="" # first detect number of queries: nb_queries_total = 0 nb_queries_distinct = 0 id_queries_distinct = [] res = run_sql("SELECT COUNT(*),COUNT(DISTINCT(id_query)) FROM user_query WHERE id_user=%s", (uid,), 1) try: nb_queries_total = res[0][0] nb_queries_distinct = res[0][1] except: pass # load the right message language _ = gettext_set_language(ln) out += _(""" You have made %(number)s queries.
A %(detailed_list)s is available with a possibility to (a) view search results and (b) subscribe for automatic email alerting service for these queries""") % { 'detailed_list' : """""" + _("detailed list") + """""", 'number' : nb_queries_total, } return out diff --git a/modules/webalert/lib/webalert_templates.py b/modules/webalert/lib/webalert_templates.py index 0fdd31aa9..cf1a94a55 100644 --- a/modules/webalert/lib/webalert_templates.py +++ b/modules/webalert/lib/webalert_templates.py @@ -1,465 +1,465 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. import urllib import time import cgi import gettext import string import locale import re import operator -from config import * +from cdsware.config import * from cdsware.messages import gettext_set_language class Template: def tmpl_errorMsg(self, ln, error_msg, rest = ""): """ Adds an error message to the output Parameters: - 'ln' *string* - The language to display the interface in - 'error_msg' *string* - The error message - 'rest' *string* - The rest of the page """ # load the right message language _ = gettext_set_language(ln) out = """
%(error)s

%(rest)s""" % { 'error' : error_msg, 'rest' : rest } return out def tmpl_textual_query_info_from_urlargs(self, ln, args): """ Displays a human intelligible textual representation of a query Parameters: - 'ln' *string* - The language to display the interface in - 'args' *array* - The URL arguments array (parsed) """ # load the right message language _ = gettext_set_language(ln) out = "" if args.has_key('p'): out += "" + _("Pattern") + ": " + string.join(args['p'], "; ") + "
" if args.has_key('f'): out += "" + _("Field") + ": " + string.join(args['f'], "; ") + "
" if args.has_key('p1'): out += "" + _("Pattern 1") + ": " + string.join(args['p1'], "; ") + "
" if args.has_key('f1'): out += "" + _("Field 1") + ": " + string.join(args['f1'], "; ") + "
" if args.has_key('p2'): out += "" + _("Pattern 2") + ": " + string.join(args['p2'], "; ") + "
" if args.has_key('f2'): out += "" + _("Field 2") + ": " + string.join(args['f2'], "; ") + "
" if args.has_key('p3'): out += "" + _("Pattern 3") + ": " + string.join(args['p3'], "; ") + "
" if args.has_key('f3'): out += "" + _("Field 3") + ": " + string.join(args['f3'], "; ") + "
" if args.has_key('c'): out += "" + _("Collections") + ": " + string.join(args['c'], "; ") + "
" elif args.has_key('cc'): out += "" + _("Collection") + ": " + string.join(args['cc'], "; ") + "
" return out def tmpl_account_list_alerts(self, ln, alerts): """ Displays all the alerts in the main "Your account" page Parameters: - 'ln' *string* - The language to display the interface in - 'alerts' *array* - The existing alerts IDs ('id' + 'name' pairs) """ # load the right message language _ = gettext_set_language(ln) out = """
%(you_own)s  
""" % { 'show' : _("SHOW"), } return out def tmpl_input_alert(self, ln, query, alert_name, action, frequency, notification, baskets, old_id_basket, id_basket, id_query): """ Displays an alert adding form. Parameters: - 'ln' *string* - The language to display the interface in - 'query' *string* - The HTML code of the textual representation of the query (as returned ultimately by tmpl_textual_query_info_from_urlargs...) - 'alert_name' *string* - The alert name - 'action' *string* - The action to complete ('update' or 'add') - 'frequency' *string* - The frequency of alert running ('day', 'week', 'month') - 'notification' *string* - If notification should be sent by email ('y', 'n') - 'baskets' *array* - The existing baskets ('id' + 'name' pairs) - 'old_id_basket' *string* - The id of the previous basket of this alert - 'id_basket' *string* - The id of the basket of this alert - 'id_query' *string* - The id of the query associated to this alert """ # load the right message language _ = gettext_set_language(ln) out = "" out += """
%(notify_cond)s
   %(query_text)s: %(query)s
""" % { 'notify_cond' : _("This alert will notify you each time/only if a new item satisfies the following query"), 'query_text' : _("QUERY"), 'query' : query, } out += """
%(alert_name)s
%(freq)s
%(send_email)s (%(specify)s) 
%(store_basket)s
%(insert_basket)s

 
""" % { 'insert_basket' : _("or insert a new basket name"), 'idq' : id_query, 'set_alert' : _("SET ALERT"), 'clear_data' : _("CLEAR DATA"), } if action == "update": out += """""" % old_id_basket out += "
" return out def tmpl_list_alerts(self, ln, weburl, alerts, guest, guesttxt): """ Displays the list of alerts Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The url of cdsware - 'alerts' *array* - The existing alerts: - 'queryid' *string* - The id of the associated query - 'queryargs' *string* - The query string - 'textargs' *string* - The textual description of the query string - 'userid' *string* - The user id - 'basketid' *string* - The basket id - 'basketname' *string* - The basket name - 'alertname' *string* - The alert name - 'frequency' *string* - The frequency of alert running ('day', 'week', 'month') - 'notification' *string* - If notification should be sent by email ('y', 'n') - 'created' *string* - The date of alert creation - 'lastrun' *string* - The last running date - 'guest' *bool* - If the user is a guest user - 'guesttxt' *string* - The HTML content of the warning box for guest users (produced by webaccount.tmpl_warning_guest_user) """ # load the right message language _ = gettext_set_language(ln) out = """

%(set_new_alert)s

""" % { 'set_new_alert' : _("Set a new alert from %(your_searches)s, the %(popular_searches)s or the input form.") % { 'your_searches' : """%s""" % _("your searches"), 'popular_searches' : """%s""" % _("most popular searches"), } } if len(alerts): out += """""" % { 'no' : _("No"), 'name' : _("Name"), 'search_freq' : _("Search checking frequency"), 'notification' : _("Notification by e-mail"), 'result_basket' : _("Result in basket"), 'date_run' : _("Date last run"), 'date_created' : _("Creation date"), 'query' : _("Query"), 'action' : _("Action"), } i = 0 for alert in alerts: i += 1 if alert['frequency'] == "day": frequency = _("daily") else: if alert['frequency'] == "week": frequency = _("weekly") else: frequency = _("monthly") if alert['notification'] == "y": notification = _("yes") else: notification = _("no") out += """""" % { 'index' : i, 'alertname' : alert['alertname'], 'frequency' : frequency, 'notification' : notification, 'basketname' : alert['basketname'], 'lastrun' : alert['lastrun'], 'created' : alert['created'], 'textargs' : alert['textargs'], 'userid' : alert['userid'], 'queryid' : alert['queryid'], 'basketid' : alert['basketid'], 'freq' : alert['frequency'], 'notif' : alert['notification'], 'remove' : _("Remove"), 'modify' : _("Modify"), 'weburl' : weburl, 'search' : _("Execute search"), 'queryargs' : alert['queryargs'] } out += '
%(no)s %(name)s %(search_freq)s %(notification)s %(result_basket)s %(date_run)s %(date_created)s %(query)s %(action)s
#%(index)d %(alertname)s %(frequency)s %(notification)s %(basketname)s %(lastrun)s %(created)s %(textargs)s %(remove)s
%(modify)s
%(search)s
' out += """

%(defined)s

""" % { 'defined' : _("You have defined %(number)s alerts.") % { 'number' : len(alerts)} } if guest: out += guesttxt return out def tmpl_display_alerts(self, ln, weburl, permanent, nb_queries_total, nb_queries_distinct, queries, guest, guesttxt): """ Displays the list of searches Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The url of cdsware - 'permanent' *string* - If displaying most popular searches ('y') or only personal searches ('n') - 'nb_queries_total' *string* - The number of personal queries in the last period - 'nb_queries_distinct' *string* - The number of distinct queries in the last period - 'queries' *array* - The existing queries: - 'id' *string* - The id of the associated query - 'args' *string* - The query string - 'textargs' *string* - The textual description of the query string - 'lastrun' *string* - The last running date (only for personal queries) - 'guest' *bool* - If the user is a guest user - 'guesttxt' *string* - The HTML content of the warning box for guest users (produced by webaccount.tmpl_warning_guest_user) """ # load the right message language _ = gettext_set_language(ln) if len(queries) == 0: return _("You have not executed any search yet. %(click_here)s for search.") % { 'click_here' : """%(click)s""" % { 'weburl' : weburl, 'click' : _("Click here"), } } out = '' # display message: number of items in the list if permanent=="n": out += """

""" + _("You have performed %(number)d searches (%(different)d different questions) during the last 30 days or so.""") % { 'number' : nb_queries_total, 'different' : nb_queries_distinct } + """

""" else: # permanent="y" out += """

Here are the %s most popular searches.

""" % len(queries) # display the list of searches out += """""" % { 'no' : _("#"), 'question' : _("Question"), 'action' : _("Action") } if permanent=="n": out += """""" % _("Last Run") out += """\n""" i = 0 for query in queries : i += 1 # id, pattern, base, search url and search set alert, date out += """""" % { 'index' : i, 'textargs' : query['textargs'], 'weburl' : weburl, 'args' : query['args'], 'id' : query['id'], 'execute_query' : _("Execute search"), 'set_alert' : _("Set new alert") } if permanent=="n": out += """""" % query out += """\n""" out += """
%(no)s%(question)s %(action)s%s
#%(index)d %(textargs)s %(execute_query)s
%(set_alert)s
%(lastrun)s

\n""" if guest : out += guesttxt return out diff --git a/modules/webalert/web/youralerts.py b/modules/webalert/web/youralerts.py index 582607e1a..3acea4575 100644 --- a/modules/webalert/web/youralerts.py +++ b/modules/webalert/web/youralerts.py @@ -1,227 +1,226 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""PERSONAL FEATURES - YOUR ALERTS""" __lastupdated__ = """<: print `date +"%d %b %Y %H:%M:%S %Z"`; :>""" import sys import time import zlib import urllib import time +from mod_python import apache from cdsware.config import weburl, cdslang, cdsname from cdsware.webpage import page from cdsware import webalert from cdsware.webuser import getUid, page_not_authorized -from mod_python import apache from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE from cdsware.messages import gettext_set_language - import cdsware.template webalert_templates = cdsware.template.load('webalert') def relative_redirect( req, relative_url, **args ): tmp = [] for param in args.keys(): #ToDo: url encoding of the params tmp.append( "%s=%s"%( param, args[param] ) ) req.err_headers_out.add("Location", "%s/%s?%s" % (weburl, relative_url, "&".join( tmp ) )) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY ### CALLABLE INTERFACE def display(req, p="n", ln = cdslang): uid = getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/display") return page(title=_("Display searches"), body=webalert.perform_display(p,uid, ln = ln), navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Display searches", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def input(req, idq, name="", freq="week", notif="y", idb=0, error_msg="", ln = cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/input") # load the right message language _ = gettext_set_language(ln) html = webalert.perform_input_alert("add", idq, name, freq, notif, idb,uid, ln = ln) if error_msg != "": html = webalert_templates.tmpl_errorMsg( ln = ln, error_msg = error_msg, rest = html, ) return page(title=_("Set a new alert"), body=html, 
navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Set a new alert", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def modify(req, idq, old_idb, name="", freq="week", notif="y", idb=0, error_msg="", ln = cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/modify") # load the right message language _ = gettext_set_language(ln) html = webalert.perform_input_alert("update", idq, name, freq, notif, idb,uid, old_idb, ln = ln) if error_msg != "": html = webalert_templates.tmpl_errorMsg( ln = ln, error_msg = error_msg, rest = html, ) return page(title=_("Modify alert settings"), body=html, navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Modify alert settings", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def list(req, ln = cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/list") # load the right message language _ = gettext_set_language(ln) return page(title=_("Display alerts"), body=webalert.perform_list_alerts(uid, ln = ln), navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Display alerts", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def add(req, name, freq, notif, idb, bname, idq, ln = cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/add") # load the right message language _ = gettext_set_language(ln) try: html=webalert.perform_add_alert(name, freq, notif, idb, bname, idq,uid, ln = ln) except webalert.AlertError, e: return input(req, idq, name, freq, notif, idb, e, ln = ln) return page(title=_("Display alerts"), body=html, navtrail= 
"""%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Display alerts", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def update(req, name, freq, notif, idb, bname, idq, old_idb, ln = cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/update") # load the right message language _ = gettext_set_language(ln) try: html=webalert.perform_update_alert(name, freq, notif, idb, bname, idq, old_idb,uid, ln = ln) except webalert.AlertError, e: return modify(req, idq, old_idb, name, freq, notif, idb, e, ln = ln) return page(title=_("Display alerts"), body=html, navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Display alerts", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def remove(req, name, idu, idq, idb, ln = cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../youralerts.py/remove") # load the right message language _ = gettext_set_language(ln) return page(title=_("Display alerts"), body=webalert.perform_remove_alert(name, idu, idq, idb, uid, ln = ln), navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, description="CDS Personalize, Display alerts", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) diff --git a/modules/webbasket/lib/webbasket.py b/modules/webbasket/lib/webbasket.py index 833ec9287..4aed86456 100644 --- a/modules/webbasket/lib/webbasket.py +++ b/modules/webbasket/lib/webbasket.py @@ -1,732 +1,731 @@ ## $Id$ ## This file is part of the 
CERN Document Server Software (CDSware). ## Copyright (C) 2002 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """Web Baskets features.""" import sys import time import zlib import urllib -from config import * -from webpage import page -from dbquery import run_sql -from webuser import getUid, getDataUid,isGuestUser -from search_engine import print_record -from webaccount import warning_guest_user -imagesurl = "%s/img" % weburl - -from messages import gettext_set_language +from cdsware.config import * +from cdsware.webpage import page +from cdsware.dbquery import run_sql +from cdsware.webuser import getUid, getDataUid,isGuestUser +from cdsware.search_engine import print_record +from cdsware.webaccount import warning_guest_user -import template -webbasket_templates = template.load('webbasket') +from cdsware.messages import gettext_set_language +import cdsware.template +webbasket_templates = cdsware.template.load('webbasket') +imagesurl = "%s/img" % weburl ### IMPLEMENTATION # perform_display(): display the baskets defined by the current user # input: default action="" display the list of baskets and the content of the selected basket; # action="DELETE" delete the selected basket; # action="RENAME" modify the basket name; # action="CREATE NEW" create a new basket; # action="SET PUBLIC" set access permission to 
public; # action="SET PRIVATE" set access permission to private; # action="REMOVE" remove selected items from basket; # action="EXECUTE" copy/move selected items to another basket; # action="ORDER" change the order of the items in the basket; # id_basket is the identifier of the selected basket # delete_alerts='n' if related alerts shouldn't be deleted; 'y' if yes # confirm_action="CANCEL" cancels the delete action; "CONFIRM" confirms it # bname is the old basket name for renaming # newname is the name of the basket to create # newbname is the new name for renaming the basket # mark[] contains the list of identifiers of the items to be removed, copied or moved # to_basket is the destination basket identifier for copy or move items # copy_move="1" if copying the items is requested, "2" if moving them is requested # idup, ordup are the identifier and the order of the item to be moved up # iddown, orddown are the identifier and the order of the item to be moved down # of is the output format code # output: list of baskets in formatted html+content of the selected basket def perform_display(uid, action="", delete_alerts="", confirm_action="", id_basket=0, bname="", newname="", newbname="", mark=[], to_basket="", copy_move="", idup="", ordup="", iddown="", orddown="", of="hb", ln="en"): # set variables out = "" basket_name = "" public_basket="no" permission = [] bname = get_basket_name( id_basket ) messages = [] # load the right message language _ = gettext_set_language(ln) # execute the requested action if (action == _("DELETE")) and (id_basket != '0') and (id_basket != 0): if (confirm_action == _("CANCEL")) or (confirm_action == _("CONFIRM")): try: msg = perform_delete(uid, delete_alerts, confirm_action, id_basket, ln) # out += "%s
" % msg messages.append(msg) except BasketException, e: msg = _("The basket has not been deleted: %s") % e messages.append(msg) show_actions = 1 else: # goes to the form which deletes the selected basket out += delete_basket(uid, id_basket, bname, ln) basket_name = bname show_actions = 0 id_basket = '0' else: show_actions = 1 if action == _("CREATE NEW"): # create a new basket if newname != "": # create a new basket newname try: id_basket = perform_create_basket(uid, newname, ln) messages.append(_("""The private basket %s has been created.""") % newname) bname = newname except BasketException, e: messages.append(_("""The basket %s has not been created: %s""") % (newname, e)) else: messages.append(_("""The basket has not been created: specify a basket name.""")) else: if (id_basket != '0') and (id_basket != 0): if action == _("RENAME"): # rename the selected basket if newbname != "": # rename basket to newname try: id_basket = perform_rename_basket(uid, id_basket,newbname, ln) messages.append(_("""The basket %s has been renamed to %s.\n""") % (bname, newbname)) bname = newbname except BasketException, e: messages.append(_("""The basket has not been renamed: %s""") % e) else: messages.append(_("""The basket has not been renamed: specify a basket name.""")) else: if action == _("SET PUBLIC"): try: # set public permission set_permission(uid, id_basket, "y", ln) url_public_basket = """%s/yourbaskets.py/display_public?id_basket=%s""" \ % (weburl, id_basket) messages.append(_("""The selected basket is now publicly accessible at the following URL:""") + """%s

""" % (url_public_basket, url_public_basket)) except BasketException, e: messages.append(_("The basket has not been made public: %s") % e) else: if action == _("SET PRIVATE"): # set private permission try: set_permission(uid, id_basket, "n", ln) messages.append(_("""The selected basket is no more publically accessible.""")) except BasketException, e: messages.append(_("The basket has not been made private: %s") % e) else: if action == _("REMOVE"): # remove the selected items from the basket try: remove_items(uid, id_basket, mark, ln) messages.append(_("""The selected items have been removed.""")) except BasketException, e: messages.append(_("""The items have not been removed: %s""")%e) else: if action == _("EXECUTE"): # copy/move the selected items to another basket if to_basket == '0': messages.append(_("""Select a destination basket to copy/move items.""")) else: move_items(uid, id_basket, mark, to_basket, copy_move, ln) messages.append(_("""The selected items have been copied/moved.""")) else: if action == "ORDER": # change the order of the items in the basket try: order_items(uid, id_basket,idup,ordup,iddown,orddown, ln) except BasketException, e: messages.append(_("""The items have not been re-ordered: %s""") % e) # display the basket's action form if (show_actions): baskets = [] basket_permission = '' # query the database for the list of baskets query_result = run_sql("SELECT b.id, b.name, b.public, ub.date_modification "\ "FROM basket b, user_basket ub "\ "WHERE ub.id_user=%s AND b.id=ub.id_basket "\ "ORDER BY b.name ASC ", (uid,)) if len(query_result) : for row in query_result : if str(id_basket) == str(row[0]): basket_permission = row[2] baskets.append({ 'id' : row[0], 'name' : row[1], 'permission' : row[2], }) alerts = [] if ((id_basket != '0') and (id_basket != 0)): # is basket related to some alerts? 
alert_query_result = run_sql("SELECT alert_name FROM user_query_basket WHERE id_user=%s AND id_basket=%s", (uid, id_basket)) if len(alert_query_result): for row in alert_query_result: alerts.append(row[0]) out += webbasket_templates.tmpl_display_basket_actions( ln = ln, weburl = weburl, messages = messages, baskets = baskets, id_basket = id_basket, basket_name = bname, basket_permission = basket_permission, alerts = alerts, ) # display the content of the selected basket if ((id_basket != '0') and (id_basket != 0)): if (basket_name == ""): if (newname != ""): basket_name = newname else: if (newbname != ""): basket_name = newbname out += display_basket_content(uid, id_basket, basket_name, of, ln) # if is guest user print message of relogin if isGuestUser(uid): out += warning_guest_user(type="baskets", ln = ln) # modified it for gettext also return out # display_basket_content: display the content of the selected basket # input: the identifier of the basket # the name of the basket # output: the basket's content def display_basket_content(uid, id_basket, basket_name, of, ln): out = "" out_tmp="" # search for basket's items if (id_basket != '0') and (id_basket != 0): query_result = run_sql("SELECT br.id_record,br.nb_order "\ "FROM basket_record br "\ "WHERE br.id_basket=%s "\ "ORDER BY br.nb_order DESC ", (id_basket,)) items = [] if len(query_result) > 0: for row in query_result: items.append({ 'id' : row[0], 'order' : row[1], 'abstract' : print_record(row[0], of), }) query_result = run_sql("SELECT b.id, b.name "\ "FROM basket b, user_basket ub "\ "WHERE ub.id_user=%s AND b.id=ub.id_basket AND b.id<>%s "\ "ORDER BY b.name ASC ", (uid,id_basket)) baskets = [] if len(query_result) > 0: for row in query_result: baskets.append({ 'id' : row[0], 'name' : row[1], }) out = webbasket_templates.tmpl_display_basket_content( ln = ln, items = items, baskets = baskets, id_basket = id_basket, basket_name = basket_name, imagesurl = imagesurl, ) return out # delete_basket: present a 
form for the confirmation of the delete action # input: the identifier of the selected basket # the name of the selected basket # output: the information about the selected basket and the form for the confirmation of the delete action def delete_basket(uid, id_basket, basket_name, ln): # set variables out = "" alerts = [] query_result = run_sql("SELECT alert_name FROM user_query_basket WHERE id_user=%s AND id_basket=%s", (uid, id_basket)) if len(query_result): for row in query_result: alerts.append(row[0]) return webbasket_templates.tmpl_delete_basket_form( ln = ln, alerts = alerts, id_basket = id_basket, basket_name = basket_name, ) # perform_delete: delete the selected basket once the deletion has been confirmed # input: delete_alerts='n' if related alerts shouldn't be deleted; 'y' if yes # confirm_action='CONFIRM' if the delete action has been confirmed; 'CANCEL' otherwise # id_basket contains the identifier of the selected basket # output: the confirmation message to display in the baskets form def perform_delete(uid, delete_alerts, confirm_action, id_basket, ln): # set variables out = "" # load the right message language _ = gettext_set_language(ln) if (confirm_action == _('CONFIRM')): # check that the user deleting the basket is its owner if not is_basket_owner( uid, id_basket ): raise NotBasketOwner(_("You are not the owner of this basket")) # perform the deletion msg = _("The selected basket has been deleted.") if (delete_alerts=='y'): # delete the related alerts, remove from the alerts table: user_query_basket query_result = run_sql("DELETE FROM user_query_basket WHERE id_user=%s AND id_basket=%s", (uid, id_basket)) msg += " " + _("The related alerts have been removed.") else: # replace the basket identifier with 0 # select the records to update query_result = run_sql("SELECT id_query,alert_name,frequency,notification,date_creation,date_lastrun "\ "FROM user_query_basket WHERE id_user=%s AND id_basket=%s", (uid, id_basket)) # update the records for row
in query_result: query_result_temp = run_sql("UPDATE user_query_basket "\ "SET alert_name=%s,frequency=%s,notification=%s,"\ "date_creation=%s,date_lastrun=%s,id_basket='0' "\ "WHERE id_user=%s AND id_query=%s AND id_basket=%s", (row[1],row[2],row[3],row[4],row[5],uid,row[0],id_basket)) # delete the relation with the user table query_result = run_sql("DELETE FROM user_basket WHERE id_user=%s AND id_basket=%s", (uid, id_basket)) # delete the basket information query_result = run_sql("DELETE FROM basket WHERE id=%s", (id_basket,)) # delete the basket content query_result = run_sql("DELETE FROM basket_record WHERE id_basket=%s", (id_basket,)) else: msg="" return msg # perform_rename_basket: rename an existing basket # input: basket identifier, basket new name # output: basket identifier def perform_rename_basket(uid, id_basket, newname, ln): # load the right message language _ = gettext_set_language(ln) # check that there's no basket owned by this user with the same name if has_user_basket( uid, newname): raise BasketNameAlreadyExists(_("You already have a basket which name is '%s'") % newname) #check that the user which is changing the basket name is the owner of it if not is_basket_owner( uid, id_basket ): raise NotBasketOwner(_("You are not the owner of this basket")) # update a row to the basket table tmp = run_sql("UPDATE basket SET name=%s WHERE id=%s", (newname, id_basket)) return id_basket class BasketException(Exception): """base exception class for basket related errors """ pass class BasketNameAlreadyExists(BasketException): """exception which is raised when a basket already exists with a certain name for a user """ pass class NotBasketOwner(BasketException): """exception which is raised when a user which is not the owner of a basket tries to perform an operation over it for which he has no privileges """ pass def has_user_basket(uid, basket_name): """checks if a user (uid) already has a basket which name is 'basket_name' (case-sensitive) """ return 
run_sql("select b.id from basket b, user_basket ub where ub.id_user=%s and b.id=ub.id_basket and b.name=%s", (uid, basket_name.strip())) def is_basket_owner(uid, bid): """checks whether or not the user (uid) is the owner of the indicated basket (bid) """ return run_sql("select id_basket from user_basket where id_user=%s and id_basket=%s", (uid, bid)) def get_basket_name(bid): """returns the name of the basket corresponding to the given id """ res = run_sql("select name from basket where id=%s", (bid,)) if not res: return "" return res[0][0] # perform_create_basket: create a new basket and the relation with the user table # input: basket name # output: basket identifier def perform_create_basket(uid, basket_name, ln): # load the right message language _ = gettext_set_language(ln) # check that there's no basket owned by this user with the same name if has_user_basket(uid, basket_name): raise BasketNameAlreadyExists(_("You already have a basket which name is '%s'") % basket_name) # add a row to the basket table id_basket = run_sql("INSERT INTO basket(id,name,public) VALUES ('0',%s,'n')", (basket_name,)) # create the relation between the user and the basket: user_basket query_result = run_sql("INSERT INTO user_basket(id_user,id_basket,date_modification) VALUES (%s,%s,%s)", (uid, id_basket, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))) return id_basket # basket_exists checks if a basket is in the database # input: the name of the basket # output: the query result holding the basket id (empty if the basket does not exist) def basket_exists (basket_name, uid): id_basket = run_sql("SELECT b.id FROM basket b, user_basket ub "\ "WHERE b.name=%s "\ "AND b.id=ub.id_basket "\ "AND ub.id_user=%s", (basket_name, uid)) return id_basket # set_permission: set access permission on a basket # input: basket identifier, basket public permission # output: basket identifier def set_permission(uid, id_basket, permission, ln): # load the right message language _ = gettext_set_language(ln) #check that the user
which is changing the basket name is the owner of it if not is_basket_owner( uid, id_basket ): raise NotBasketOwner(_("You are not the owner of this basket")) # update a row to the basket table id_basket = run_sql("UPDATE basket SET public=%s WHERE id=%s", (permission, id_basket)) return id_basket # remove_items: remove the selected items from the basket # input: basket identifier, list of selected items # output: basket identifier def remove_items(uid, id_basket, mark, ln): # load the right message language _ = gettext_set_language(ln) # check that the user removing the items is the owner of the basket if not is_basket_owner( uid, id_basket ): raise NotBasketOwner(_("You are not the owner of this basket")) if type(mark)==list: selected_items=mark else: selected_items=[mark] for i in selected_items: # delete the basket content query_result = run_sql("DELETE FROM basket_record WHERE id_basket=%s AND id_record=%s", (id_basket, i)) return id_basket # check_copy: check whether the record is not yet in the basket # input: basket identifier, record identifier # output: 1 if the record is not yet in the basket (so it can be copied), 0 otherwise def check_copy(idbask,i): query_result = run_sql("select * from basket_record where id_basket=%s and id_record=%s", (idbask,i)) if len(query_result)>0 : return 0 return 1 # copy/move the selected items to another basket # input: original basket identifier, list of selected items, # destination basket identifier, copy or move option: "1"=copy, "2"=move #output: basket identifier def move_items(uid, id_basket, mark, to_basket, copy_move="1", ln="en"): if type(mark)==list: selected_items=mark else: selected_items=[mark] for i in selected_items: if check_copy(to_basket,i): query_result = run_sql("INSERT INTO basket_record(id_basket,id_record,nb_order) VALUES (%s,%s,'0')", (to_basket, i)) if copy_move=="2": #delete from previous basket remove_items(uid, id_basket, mark, ln) return id_basket # change the order of the items in the basket # input: basket identifier # identifiers and positions of
the items to be moved #output: basket identifier def order_items(uid, id_basket,idup,ordup,iddown,orddown, ln): # load the right message language _ = gettext_set_language(ln) #check that the user which is changing the basket name is the owner of it if not is_basket_owner( uid, id_basket ): raise NotBasketOwner(_("You are not the owner of this basket")) # move up the item idup (by switching its order number with the other item): query_result = run_sql("UPDATE basket_record SET nb_order=%s WHERE id_basket=%s AND id_record=%s", (orddown,id_basket,idup)) # move down the item iddown (by switching its order number with the other item): query_result = run_sql("UPDATE basket_record SET nb_order=%s WHERE id_basket=%s AND id_record=%s", (ordup,id_basket,iddown)) return id_basket # perform_display_public: display the content of the selected basket, if public # input: the identifier of the basket # the name of the basket # of is the output format code # output: the basket's content def perform_display_public(uid, id_basket, basket_name, action, to_basket, mark, newname, of, ln = "en"): out = "" messages = [] # load the right message language _ = gettext_set_language(ln) if action == _("EXECUTE"): # perform actions if newname != "": # create a new basket try: to_basket = perform_create_basket(uid, newname, ln) messages.append(_("""The private basket %s has been created.""") % newname) except BasketException, e: messages.append(_("""The basket %s has not been created: %s""") % (newname, e)) # copy the selected items if to_basket == '0': messages.append(_("""Select a destination basket to copy the selected items.""")) else: move_items(uid, id_basket, mark, to_basket, '1', ln) messages.append(_("""The selected items have been copied.""")) # search for basket's items if (id_basket != '0') and (id_basket != 0): res = run_sql("select public from basket where id=%s", (id_basket,)) if len(res) == 0: messages.append(_("""Non existing basket""")) out += '' out += 
webbasket_templates.tmpl_display_messages (of = of, ln = ln, messages = messages) out += '' return out if str(res[0][0]).strip() != 'y': messages.append(_("""The basket is private""")) out += '' out += webbasket_templates.tmpl_display_messages (of = of, ln = ln, messages = messages) out += '' return out query_result = run_sql("SELECT br.id_record,br.nb_order "\ "FROM basket_record br "\ "WHERE br.id_basket=%s "\ "ORDER BY br.nb_order DESC ", (id_basket,)) # Shortcut the output in the case of XML format if of == 'xm': out = '\n' for r in query_result: out += print_record (r [0], of) out += '\n\n' return out items = [] if len(query_result) > 0: for row in query_result: items.append({ 'id' : row[0], 'order' : row[1], 'abstract' : print_record(row[0], of), }) # copy selected items to basket query_result = run_sql("SELECT b.id, b.name "\ "FROM basket b, user_basket ub "\ "WHERE ub.id_user=%s AND b.id=ub.id_basket "\ "ORDER BY b.name ASC ", (uid,)) baskets = [] if len(query_result) > 0: for row in query_result: baskets.append({ 'id' : row[0], 'name' : row[1], }) out += webbasket_templates.tmpl_display_messages (of = of, ln = ln, messages = messages) out += _("""Content of the public basket %s :""") % get_basket_name(id_basket) + "
" out += webbasket_templates.tmpl_display_public_basket_content( ln = ln, items = items, baskets = baskets, id_basket = id_basket, basket_name = basket_name, imagesurl = imagesurl, ) return out ## --- new stuff starts here --- def perform_request_add(uid=-1, recid=[], bid=[], bname=[], ln="en"): """Add records recid to baskets bid for user uid. If bid isn't set, it'll ask user into which baskets to add them. If bname is set, it'll create new basket with this name, and add records there rather than to bid.""" out = "" # load the right message language _ = gettext_set_language(ln) # wash arguments: recIDs = recid bskIDs = bid if not type(recid) is list: recIDs = [recid] if not type(bid) is list: bskIDs = [bid] # sanity checking: if recIDs == []: return _("No records to add.") # do we have to create some baskets? if bname: try: new_basket_ID = perform_create_basket(uid, bname, ln) bskIDs = [new_basket_ID] except BasketException, e: out += _("""The basket %s has not been created: %s""") % (bname, e) basket_id_name_list = get_list_of_user_baskets(uid) if len(basket_id_name_list) == 1: bskIDs = [basket_id_name_list[0][0]] if bskIDs == []: # A - some information missing, so propose list of baskets to choose from if basket_id_name_list != []: # there are some baskets; good out += webbasket_templates.tmpl_add_choose_basket( ln = ln, baskets = basket_id_name_list, recids = recIDs, ) else: out += webbasket_templates.tmpl_add_create_basket( ln = ln, recids = recIDs, ) if isGuestUser(uid): out += warning_guest_user (type = _("baskets"), ln = ln) else: # B - we have baskets IDs, so we can add records messages = [] messages.append(_("Adding %s records to basket(s)...") % len(recIDs)) for bskID in bskIDs: if is_basket_owner(uid, bskID): for recID in recIDs: try: res = run_sql("INSERT INTO basket_record(id_basket,id_record,nb_order) VALUES (%s,%s,%s)", (bskID,recID,'0')) except: pass # maybe records were already there? page reload happened? 
messages.append(_("...done.")) else: messages.append(_("sorry, you are not the owner of this basket.")) out += webbasket_templates.tmpl_add_messages( ln = ln, messages = messages, ) out += perform_display(uid=uid, id_basket=bskIDs[0], ln=ln) return out def get_list_of_user_baskets(uid): """Return list of lists [[basket_id, basket_name],[basket_id, basket_name],...] for the given user.""" out = [] res = run_sql("SELECT b.id, b.name "\ "FROM basket b, user_basket ub "\ "WHERE ub.id_user=%s AND b.id=ub.id_basket "\ "ORDER BY b.name ASC ", (uid,)) for row in res: out.append([row[0], row[1]]) return out def account_list_baskets(uid, action="", id_basket=0, newname="", ln="en"): out = "" # query the database for the list of baskets query_result = run_sql("SELECT b.id, b.name, b.public, ub.date_modification "\ "FROM basket b, user_basket ub "\ "WHERE ub.id_user=%s AND b.id=ub.id_basket "\ "ORDER BY b.name ASC ", (uid,)) baskets = [] if len(query_result) : for row in query_result : if str(id_basket) == str(row[0]): basket_permission = row[2] baskets.append({ 'id' : row[0], 'name' : row[1], 'permission' : row[2], 'selected' : (str(id_basket) == str(row[0])) and "selected" or "", }) return webbasket_templates.tmpl_account_list_baskets( ln = ln, baskets = baskets, ) diff --git a/modules/webbasket/lib/webbasket_templates.py b/modules/webbasket/lib/webbasket_templates.py index 921986ff5..6230fa56b 100644 --- a/modules/webbasket/lib/webbasket_templates.py +++ b/modules/webbasket/lib/webbasket_templates.py @@ -1,551 +1,551 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. import urllib import time import cgi import gettext import string import locale -from config import * +from cdsware.config import * from cdsware.messages import gettext_set_language class Template: def tmpl_delete_basket_form(self, ln, alerts, id_basket, basket_name): """ Creates the form that demands confirmation for the deletion of a basket. Parameters: - 'ln' *string* - The language to display the interface in - 'alerts' *array* - An array of alerts associated to the basket - 'id_basket' *int* - The database basket id - 'basket_name' *string* - The basket display name """ # load the right message language _ = gettext_set_language(ln) # search for related alerts out = """
""" % { 'ln' : ln, } if len(alerts) == 0: out += """""" % { 'err' : _("There isn't any alert related to this basket.") } else: Msg = _("""The following alerts are related to this basket:""") i = 1 for alert in alerts: if i != 1: Msg += ", " Msg += """%s""" % alert i+=1 out += """""" % { 'alerts' : Msg } out += """""" % { 'remove_alerts' : _("Do you want to remove the related alerts too?"), 'no' : _("No"), 'yes' : _("Yes"), } # confirm delete action? yes or no out += '''
%(err)s
%(alerts)s
%(remove_alerts)s  
%(delete)s %(basket_name)s ?    
''' % { 'confirm': _("CONFIRM"), 'cancel': _("CANCEL"), 'id' : id_basket, 'basket_name' : basket_name, 'delete' : _("Delete the basket"), 'delete_action' : _("DELETE"), 'ln' : ln, } return out def tmpl_display_basket_actions(self, ln, weburl, messages, baskets, id_basket, basket_name, alerts, basket_permission): """ Displays a basket's actions. Parameters: - 'ln' *string* - The language to display the interface in - 'messages' *array* - An array of messages to display to the user (confirmation/warning messages) - 'baskets' *array* - An array of all the baskets - 'id_basket' *int* - The database basket id - 'basket_name' *string* - The basket display name - 'basket_permission' *string* - Whether the basket is public (value=yes) or not (value=no) - 'alerts' *array* - An array of all the alerts associated to this basket (if existing) """ # load the right message language _ = gettext_set_language(ln) out = self.tmpl_display_messages(messages, ln) out += """
""" if len(baskets) == 0: # create new basket form out += _("""No baskets have been defined.""") + "
" out += _("""New basket name:""") + """    """ else: # display the list of baskets out += _("""You own %s baskets.""") % len(baskets) + "
" out += "" + _("""Select an existing basket""") + """ \n""" # buttons for basket's selection or creation out += """  " + _("or") +\ """  

" if ((id_basket != '0') and (id_basket != 0)): out += """ """ % { 'selected_name' : _("The selected basket is"), 'basket_name' : basket_name, 'delete' : _("DELETE"), 'rename' : _("RENAME"), } if basket_permission == 'n': public_basket="no" out += """""" % { 'is_private' : _("Basket access is set to private, convert to public?"), 'set_public' : _("SET PUBLIC"), } else: public_basket="yes" out += """ """ % { 'is_public' : _("Basket access is set to public, convert to private?"), 'set_private' : _("SET PRIVATE"), 'public_url' : _("Public URL"), 'url' : """%s/yourbaskets.py/display_public?id_basket=%s""" % (weburl, id_basket) } if len(alerts) == 0: out += """""" % { 'err' : _("There isn't any alert related to this basket.") } else: Msg = _("""The following alerts are related to this basket: """) i = 1 for alert in alerts: if i != 1: Msg += ", " Msg += """%s""" % alert i+=1 out += """""" % { 'alerts' : Msg } out += """
%(selected_name)s %(basket_name)s.
%(is_private)s
%(is_public)s
%(public_url)s: %(url)s
%(err)s
%(alerts)s
""" # hidden parameters out += """ """ % (basket_name, ln) out += """
""" return out def tmpl_display_basket_content(self, ln, items, baskets, id_basket, basket_name, imagesurl): """ Displays a basket's items and options (move/copy items) Parameters: - 'ln' *string* - The language to display the interface in - 'items' *array* - The items to display - 'baskets' *array* - An array of all the baskets - 'id_basket' *int* - The database basket id - 'basket_name' *string* - The basket display name - 'imagesurl' *string* - The URL to the images directory """ # load the right message language _ = gettext_set_language(ln) out = "" if len(items) > 0: # display the list of items out += """
""" % { 'execute' : _("EXECUTE") } out += self.tmpl_display_basket_items(ln, items, id_basket, basket_name, imagesurl) out += """
%(selected_items)s: """ % { 'selected_items' : _("Selected items"), 'remove' : _("REMOVE") } if len(baskets) > 0: out += """  %(or)s   %(to)s

""" else: out += "

" + _("""The basket %s is empty.""") % basket_name + "

" return out def tmpl_display_messages(self, messages, ln="en", of = "hd"): """ Displays a list of messages to the user. Parameters: - 'ln' *string* - The language to display the interface in - 'messages' *array* - An array of messages to display to the user (confirmation/warning messages) """ if of and of [0] == 'x': out = '' return out def tmpl_display_public_basket_content(self, ln, items, baskets, id_basket, basket_name, imagesurl): """ Displays a public basket's items and options (copy items to other baskets) Parameters: - 'ln' *string* - The language to display the interface in - 'items' *array* - The items to display - 'baskets' *array* - An array of all the baskets - 'id_basket' *int* - The database basket id - 'basket_name' *string* - The basket display name - 'imagesurl' *string* - The URL to the images directory """ # load the right message language _ = gettext_set_language(ln) out = "" if len(items) > 0: # display the list of items out += """
""" if len(baskets) > 0: out += _("""Copy the selected items to """) +\ """" else: out += _("""Copy the selected items to new basket""") + " " out += """  """ out += """

""" % { 'execute' : _("EXECUTE") } out += self.tmpl_display_basket_items(ln, items, id_basket, basket_name, imagesurl) out += """
""" else: out += "

" + _("""The basket %s is empty.""") % basket_name + "

" return out def tmpl_display_basket_items(self, ln, items, id_basket, basket_name, imagesurl): """ Displays a basket's list of items. Parameters: - 'ln' *string* - The language to display the interface in - 'items' *array* - The items to display - 'id_basket' *int* - The database basket id - 'basket_name' *string* - The basket display name - 'imagesurl' *string* - The URL to the images directory """ out = "" i = 1 for item in items : out += """%s""" % (i, item['id']) if i == 1: out += """""" % (imagesurl) else: out += """""" % { 'bid' : id_basket, 'upid' : item['id'], 'upord' : item['order'], 'downid' : items[i - 2]['id'], 'downord' : items[i - 2]['order'], 'imgurl' : imagesurl } if i == len(items): out += """""" % (imagesurl) else: out += """""" % { 'bid' : id_basket, 'upid' : items[i]['id'], 'upord' : items[i]['order'], 'downid' : item['id'], 'downord' : item['order'], 'imgurl' : imagesurl } out += """%s """ % item['abstract'] i += 1 # hidden parameters out += """""" % (id_basket, ln) return out def tmpl_add_choose_basket(self, ln, baskets, recids): """ Displays form to add the records to the basket Parameters: - 'ln' *string* - The language to display the interface in - 'baskets' *array* - The baskets array - 'recids' *array* - The record ids to add to the basket """ out = "" # load the right message language _ = gettext_set_language(ln) out += """

%s

""" % ( _("Please choose the basket you want to add %d records to:") % len(recids), weburl ) for recid in recids: out += """""" % recid out += """
""" % { 'ln' : ln, 'add_to_basket' : _("ADD TO BASKET") } return out def tmpl_add_create_basket(self, ln, recids): """ Displays form to create a new basket (in case no baskets exist) Parameters: - 'ln' *string* - The language to display the interface in - 'recids' *array* - The record ids to add to the basket """ out = "" # load the right message language _ = gettext_set_language(ln) # the user has to create a basket first out += """

%s

""" % ( _("You don't own baskets defined yet."), weburl ) for recid in recids: out += """""" % recid out += """%(new_basket)s
""" % { 'ln' : ln, 'new_basket' : _("New basket name:"), 'create_new' : _("CREATE NEW BASKET") } return out def tmpl_add_messages(self, messages, ln="en"): """ Displays the messages from the add interface. Parameters: - 'ln' *string* - The language to display the interface in - 'messages' *array* - An array of messages to display to the user (confirmation/warning messages) """ out = "

" for msg in messages: out += """%s""" % msg return out def tmpl_account_list_baskets(self, ln, baskets): """ Displays form to add the records to the basket Parameters: - 'ln' *string* - The language to display the interface in - 'baskets' *array* - The baskets array """ out = "" # load the right message language _ = gettext_set_language(ln) out += """

""" if len(baskets): out += _("You own the following baskets") +\ """ %(or)s """ % { 'select' : _("SELECT"), 'or' : _("or") } else: # create new basket form out += _("""No baskets have been defined.""") + "
" out += """

""" % { 'create' : _("CREATE NEW"), } return out diff --git a/modules/webbasket/web/yourbaskets.py b/modules/webbasket/web/yourbaskets.py index dbfc2013f..5c929a7c9 100644 --- a/modules/webbasket/web/yourbaskets.py +++ b/modules/webbasket/web/yourbaskets.py @@ -1,95 +1,95 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Web Baskets features.""" __lastupdated__ = """$Date$""" import sys import time import zlib import urllib +from mod_python import apache from cdsware.config import weburl,webdir from cdsware.webpage import page from cdsware.dbquery import run_sql from cdsware.webuser import getUid,page_not_authorized from cdsware import webbasket -from mod_python import apache from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE imagesurl = "%s/img" % webdir ## rest of the Python code goes below ### CALLABLE INTERFACE def index(req): req.err_headers_out.add("Location", "%s/yourbaskets.py/display?%s" % (weburl, req.args)) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY def display(req, action="", title="Your Baskets", delete_alerts="", confirm_action="", id_basket=0, bname="", newname="", newbname="", mark=[], to_basket="", copy_move="", idup="", ordup="", iddown="", orddown="", of="hb"): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../yourbaskets.py/display") if action=="DELETE": title="Delete basket" return page(title=title, body=webbasket.perform_display(uid, action, delete_alerts, confirm_action, id_basket, bname, newname, newbname, mark, to_basket, copy_move, idup, ordup, iddown, orddown, of), navtrail="""Your Account""" % weburl, description="CDS Personalize, Display baskets", keywords="CDS, personalize", uid=uid, lastupdated=__lastupdated__) def display_public(req, id_basket=0, name="", action="", to_basket="", mark=[], newname="", of="hb"): title = "Display basket" uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE == 2: return page_not_authorized(req, "../yourbaskets.py/display_public") if CFG_ACCESS_CONTROL_LEVEL_SITE >= 1 and action == "EXECUTE": return page_not_authorized(req, "../yourbaskets.py/display_public") return page(title=title, body=webbasket.perform_display_public(uid, id_basket, name, action, to_basket, mark, newname, of), navtrail="""Your Account""" % weburl, 
description="CDS Personalize, Display baskets", keywords="CDS, personalize", uid=uid, lastupdated=__lastupdated__) def add(req, recid=[], bid=[], bname=[]): """Add records to basket. If bid isn't set, it'll ask user into which baskets to add them. If bname is set, it'll create new basket with this name, and add records there rather than to bid.""" title = "Adding records to baskets" uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../yourbaskets.py/add") return page(title=title, body=webbasket.perform_request_add(uid, recid, bid, bname), navtrail="""Your Account > Your Baskets""" % (weburl, weburl), description="CDS Personalize, Add records to basket", keywords="CDS, personalize", uid=uid, lastupdated=__lastupdated__) diff --git a/modules/webcomment/lib/webcomment.py b/modules/webcomment/lib/webcomment.py index ad5c44c62..610d6cc9d 100644 --- a/modules/webcomment/lib/webcomment.py +++ b/modules/webcomment/lib/webcomment.py @@ -1,915 +1,913 @@ # -*- coding: utf-8 -*- ## $Id$ ## Comments and reviews for records. - + ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
- -__lastupdated__ = """FIXME: last updated""" -# non CDSware imports: +__lastupdated__ = """$Date$""" + from email.Utils import quote import time import math import string from cgi import escape -# import CDSware stuff: -from webcomment_config import * -from dbquery import run_sql -from config import cdslang -from elmsubmit_html2txt import html2txt +from cdsware.webcomment_config import * +from cdsware.dbquery import run_sql +from cdsware.config import cdslang +from cdsware.elmsubmit_html2txt import html2txt -import template -webcomment_templates = template.load('webcomment') +import cdsware.template +webcomment_templates = cdsware.template.load('webcomment') def perform_request_display_comments_or_remarks(recID, ln=cdslang, display_order='od', display_since='all', nb_per_page=100, page=1, voted=-1, reported=-1, reviews=0): """ Returns all the comments (reviews) of a specific internal record or external basket record. @param recID: record id where (internal record IDs > 0) or (external basket record IDs < -100) @param display_order: hh = highest helpful score, review only lh = lowest helpful score, review only hs = highest star score, review only ls = lowest star score, review only od = oldest date nd = newest date @param display_since: all= no filtering by date nd = n days ago nw = n weeks ago nm = n months ago ny = n years ago where n is a single digit integer between 0 and 9 @param nb_per_page: number of results per page @param page: results page @param voted: boolean, active if user voted for a review, see perform_request_vote function @param reported: boolean, active if user reported a certain comment/review, perform_request_report function @param reviews: boolean, enabled if reviews, disabled for comments @return html body. 
""" errors = [] warnings = [] # wash arguments recID= wash_url_argument(recID, 'int') ln = wash_url_argument(ln, 'str') display_order = wash_url_argument(display_order, 'str') display_since = wash_url_argument(display_since, 'str') nb_per_page = wash_url_argument(nb_per_page, 'int') page = wash_url_argument(page, 'int') voted = wash_url_argument(voted, 'int') reported = wash_url_argument(reported, 'int') reviews = wash_url_argument(reviews, 'int') # vital argument check check_recID_is_in_range(recID, warnings, ln) # Query the database and filter results res = query_retrieve_comments_or_remarks(recID, display_order, display_since, reviews) nb_res = len(res) # checking non-vital arguments - will be set to default if wrong #if page <= 0 or page.lower() != 'all': if page < 0: page = 1 warnings.append(('WRN_WEBCOMMENT_INVALID_PAGE_NB',)) if nb_per_page < 0: nb_per_page = 100 warnings.append(('WRN_WEBCOMMENT_INVALID_NB_RESULTS_PER_PAGE',)) if cfg_webcomment_allow_reviews and reviews: if display_order not in ['od', 'nd', 'hh', 'lh', 'hs', 'ls']: display_order = 'hh' warnings.append(('WRN_WEBCOMMENT_INVALID_REVIEW_DISPLAY_ORDER',)) else: if display_order not in ['od', 'nd']: display_order = 'od' warnings.append(('WRN_WEBCOMMENT_INVALID_DISPLAY_ORDER',)) # filter results according to page and number of results per page if nb_per_page > 0: if nb_res > 0: last_page = int(math.ceil(nb_res / float(nb_per_page))) else: last_page = 1 if page > last_page: page = 1 warnings.append(("WRN_WEBCOMMENT_INVALID_PAGE_NB",)) if nb_res > nb_per_page: # if more than one page of results if page < last_page: res = res[(page-1)*(nb_per_page) : (page*nb_per_page)] else: res = res[(page-1)*(nb_per_page) : ] else: # one page of results pass else: last_page = 1 # Send to template # record is an internal record if recID > 0: avg_score = 0.0 if not cfg_webcomment_allow_comments and not cfg_webcomment_allow_reviews: # comments not allowed by admin errors.append(('ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED',))
if reported > 0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT',)) elif reported == 0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT',)) if cfg_webcomment_allow_reviews and reviews: avg_score = calculate_avg_score(res) if voted>0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT',)) elif voted == 0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT',)) body = webcomment_templates.tmpl_get_comments(recID, ln, nb_per_page, page, last_page, display_order, display_since, cfg_webcomment_allow_reviews, res, nb_res, avg_score, warnings, border=0, reviews=reviews) return (body, errors, warnings) # record is an external record else: return ("TODO", errors, warnings) #!FIXME def perform_request_vote(comID, value): """ Vote positively or negatively for a comment/review @param comID: review id @param value: +1 for voting positively -1 for voting negatively @return integer 1 if successful, integer 0 if not """ #FIXME should record IP address and not allow voters to vote more than once comID = wash_url_argument(comID, 'int') value = wash_url_argument(value, 'int') if comID > 0 and value in [-1, 1]: return query_record_useful_review(comID, value) else: return 0 def perform_request_report(comID): """ Report a comment/review for inappropriate content. 
Will send an email to the administrator if the number of reports is a multiple of cfg_webcomment_nb_reports_before_send_email_to_admin @param comID: comment id @return integer 1 if successful, integer 0 if not """ #FIXME should record IP address and not allow reporters to report more than once comID = wash_url_argument(comID, 'int') if comID <= 0: return 0 (query_res, nb_abuse_reports) = query_record_report_this(comID) if query_res == 0: return 0 if nb_abuse_reports % cfg_webcomment_nb_reports_before_send_email_to_admin == 0: (comID2, id_bibrec, id_user, com_body, com_date, com_star, com_vote, com_nb_votes_total, com_title, com_reported) = query_get_comment(comID) (user_nb_abuse_reports, user_votes, user_nb_votes_total) = query_get_user_reports_and_votes(int(id_user)) (nickname, user_email, last_login) = query_get_user_contact_info(id_user) from_addr = 'CDS Alert Engine <%s>' % alertengineemail to_addr = adminemail subject = "An error report has been sent from a user" body = ''' The following comment has been reported a total of %(com_reported)s times. Author: nickname = %(nickname)s email = %(user_email)s user_id = %(uid)s This user has: total number of reports = %(user_nb_abuse_reports)s %(votes)s Comment: comment_id = %(comID)s record_id = %(id_bibrec)s date written = %(com_date)s nb reports = %(com_reported)s %(review_stuff)s body = ---start body--- %(com_body)s ---end body--- Please go to the Comments Admin Panel %(comment_admin_link)s to delete this message if necessary.
A warning will be sent to the user in question.''' % \ { 'cfg-report_max' : cfg_webcomment_nb_reports_before_send_email_to_admin, 'nickname' : nickname, 'user_email' : user_email, 'uid' : id_user, 'user_nb_abuse_reports' : user_nb_abuse_reports, 'user_votes' : user_votes, 'votes' : cfg_webcomment_allow_reviews and \ "total number of positive votes\t= %s\n\t\t\t\ttotal number of negative votes\t= %s" % \ (user_votes, (user_nb_votes_total - user_votes)) or "\n", 'comID' : comID, 'id_bibrec' : id_bibrec, 'com_date' : com_date, 'com_reported' : com_reported, 'review_stuff' : cfg_webcomment_allow_reviews and \ "star score\t\t= %s\n\t\t\treview title\t\t= %s" % (com_star, com_title) or "", 'com_body' : com_body, 'comment_admin_link' : "http://%s/admin/webcomment/" % weburl, 'user_admin_link' : "user_admin_link" #! FIXME } #FIXME to be added to email #If you wish to ban the user, you can do so via the User Admin Panel %(user_admin_link)s. from alert_engine import send_email, forge_email body = forge_email(from_addr, to_addr, subject, body) send_email(from_addr, to_addr, body) return 1 def query_get_user_contact_info(uid): """ Get the user contact information @return tuple (nickname, email, last_login), if none found return () Note: for the moment, if no nickname, will return email address up to the '@' """ query1 = "SELECT email, nickname, last_login FROM user WHERE id=%s" params1 = (uid,) res1 = run_sql(query1, params1) if len(res1)==0: return () #!FIXME - extra code because still possible to have no nickname res2 = list(res1[0]) if not res2[1]: res2[1] = res2[0].split('@')[0] return (res2[1], res2[0], res2[2]) # return (res1[0][1], res1[0][0], res1[0][2]) def query_get_user_reports_and_votes(uid): """ Retrieve total number of reports and votes of a particular user @param uid: user id @return tuple (total_nb_reports, total_nb_votes_yes, total_nb_votes_total) if none found return () """ query1 = "SELECT nb_votes_yes, nb_votes_total, nb_abuse_reports FROM cmtRECORDCOMMENT 
WHERE id_user=%s" params1 = (uid,) res1 = run_sql(query1, params1) if len(res1)==0: return () nb_votes_yes = nb_votes_total = nb_abuse_reports = 0 for cmt_tuple in res1: nb_votes_yes += int(cmt_tuple[0]) nb_votes_total += int(cmt_tuple[1]) nb_abuse_reports += int(cmt_tuple[2]) return (nb_abuse_reports, nb_votes_yes, nb_votes_total) def query_get_comment(comID): """ Get all fields of a comment @param comID: comment id @return tuple (comID, id_bibrec, id_user, body, date_creation, star_score, nb_votes_yes, nb_votes_total, title, nb_abuse_reports) if none found return () """ query1 = "SELECT id, id_bibrec, id_user, body, date_creation, star_score, nb_votes_yes, nb_votes_total, title, nb_abuse_reports FROM cmtRECORDCOMMENT WHERE id=%s" params1 = (comID,) res1 = run_sql(query1, params1) if len(res1)>0: return res1[0] else: return () def query_record_report_this(comID): """ Increment the number of reports for a comment @param comID: comment id @return tuple (success, new_total_nb_reports_for_this_comment) where success is integer 1 if success, integer 0 if not if none found, return () """ #retrieve nb_abuse_reports query1 = "SELECT nb_abuse_reports FROM cmtRECORDCOMMENT WHERE id=%s" params1 = (comID,) res1 = run_sql(query1, params1) if len(res1)==0: return () #increment and update nb_abuse_reports = int(res1[0][0]) + 1 query2 = "UPDATE cmtRECORDCOMMENT SET nb_abuse_reports=%s WHERE id=%s" params2 = (nb_abuse_reports, comID) res2 = run_sql(query2, params2) return (int(res2), nb_abuse_reports) def query_record_useful_review(comID, value): """ private function Adjust the number of useful votes and number of total votes for a comment.
@param comID: comment id @param value: +1 or -1 @return integer 1 if successful, integer 0 if not """ # retrieve nb_useful votes query1 = "SELECT nb_votes_total, nb_votes_yes FROM cmtRECORDCOMMENT WHERE id=%s" params1 = (comID,) res1 = run_sql(query1, params1) if len(res1)==0: return 0 # modify and insert new nb_useful votes nb_votes_yes = int(res1[0][1]) if value >= 1: nb_votes_yes = int(res1[0][1]) + 1 nb_votes_total = int(res1[0][0]) + 1 query2 = "UPDATE cmtRECORDCOMMENT SET nb_votes_total=%s, nb_votes_yes=%s WHERE id=%s" params2 = (nb_votes_total, nb_votes_yes, comID) res2 = run_sql(query2, params2) return int(res2) def query_retrieve_comments_or_remarks (recID, display_order='od', display_since='0000-00-00 00:00:00', ranking=0): """ Private function Retrieve tuple of comments or remarks from the database @param recID: record id @param display_order: hh = highest helpful score lh = lowest helpful score hs = highest star score ls = lowest star score od = oldest date nd = newest date @param display_since: datetime, e.g. 0000-00-00 00:00:00 @param ranking: boolean, enabled if reviews, disabled for comments @return tuple of comment where comment is tuple (nickname, date_creation, body, id) if ranking disabled or tuple (nickname, date_creation, body, nb_votes_yes, nb_votes_total, star_score, title, id) Note: for the moment, if no nickname, will return email address up to '@' """ display_since = calculate_start_date(display_since) order_dict = { 'hh' : "c.nb_votes_yes/(c.nb_votes_total+1) DESC, c.date_creation DESC ", 'lh' : "c.nb_votes_yes/(c.nb_votes_total+1) ASC, c.date_creation ASC ", 'ls' : "c.star_score ASC, c.date_creation DESC ", 'hs' : "c.star_score DESC, c.date_creation DESC ", 'od' : "c.date_creation ASC ", 'nd' : "c.date_creation DESC " } #FIXME: temporary fix due to new basket tables not existing yet. 
if recID < 1: return () # Ranking only done for comments and when allowed if ranking: try: display_order = order_dict[display_order] except: display_order = order_dict['od'] else: try: if display_order[-1] == 'd': display_order = order_dict[display_order] else: display_order = order_dict['od'] except: display_order = order_dict['od'] query = "SELECT u.nickname, c.date_creation, c.body, %(ranking)s c.id " \ "FROM %(table)s AS c, user AS u " \ "WHERE %(id_bibrec)s=\'%(recID)s\' " \ "AND c.id_user=u.id "\ "%(ranking_only)s " \ "%(display_since)s " \ "ORDER BY %(display_order)s " params = { 'ranking' : ranking and ' c.nb_votes_yes, c.nb_votes_total, c.star_score, c.title, ' or '', 'ranking_only' : ranking and ' AND c.star_score>0 ' or ' AND c.star_score=0 ', 'id_bibrec' : recID>0 and 'c.id_bibrec' or 'c.id_bskBASKET_bibrec_bskEXTREC', 'table' : recID>0 and 'cmtRECORDCOMMENT' or 'bskREMARK', 'recID' : recID, 'display_since' : display_since=='0000-00-00 00:00:00' and ' ' or 'AND c.date_creation>=\'%s\' ' % display_since, 'display_order' : display_order } # return run_sql(query % params) #FIXME - Extra horrible code because nickname can still be blank res = run_sql(query % params) #!FIXME res2= [] for comment in res: if not comment[0]: comment2 = list(comment) user_id = query_get_comment(comment[-1])[2] comment2[0] = query_get_user_contact_info(user_id)[1].split('@')[0] res2.append(comment2) else: res2.append(comment) return tuple(res2) def query_add_comment_or_remark(recID=-1, uid=-1, msg="", note="", score=0, priority=0): """ Private function Insert a comment/review or remark into the database @param recID: record id @param uid: user id @param msg: comment body @param note: comment title @param score: review star score @param priority: remark priority #!FIXME @return integer >0 representing id if successful, integer 0 if not """ current_date = calculate_start_date('0d') if msg.count('>>') == 0: #if not replying but writing a fresh comment, limit line length to 80 chars
msg_words = msg.split(' ') new_msg = [] char_on_this_line = 0 for word in msg_words: char_on_this_line += len(word) + 1 if char_on_this_line >= 80: new_msg.append('\n' + word) char_on_this_line = len(word) + 1 else: if word.find('\n') > 0: char_on_this_line = 0 new_msg.append(word) import string msg = string.join(new_msg, ' ') if recID > 0: #change utf-8 message into general unicode msg = msg.decode('utf-8') note = note.decode('utf-8') # get rid of html tags in msg but keep newlines msg = msg.replace ('\n', "#br#") msg= html2txt(msg) note= html2txt(note) #change general unicode back to utf-8 msg = msg.encode('utf-8') msg = msg.replace('#br#', '
') note = note.encode('utf-8') query = "INSERT INTO cmtRECORDCOMMENT (id_bibrec, id_user, body, date_creation, star_score, nb_votes_total, title) " \ "VALUES (%s, %s, %s, %s, %s, %s, %s)" params = (recID, uid, msg, current_date, score, 0, note) else: #change utf-8 message into general unicode msg = msg.decode('utf-8') # get rid of html tags in msg but keep newlines msg = msg.replace ('\n', "#br#") msg= html2txt(msg) #change general unicode back to utf-8 msg = msg.encode('utf-8') msg = msg.replace('#br#', '
') query = "INSERT INTO bskREMARK (id_bskBASKET_bibrec_bibEXTREC, id_user, body, date_creation, priority) " \ "VALUES (%s, %s, %s, %s, %s)" params = (recID, uid, msg, current_date, priority) return int(run_sql(query, params)) def calculate_start_date(display_since): """ Private function Returns the datetime of display_since argument in MYSQL datetime format calculated according to the local time. @param display_since = all= no filtering nd = n days ago nw = n weeks ago nm = n months ago ny = n years ago where n is a single digit number @return string of wanted datetime. If 'all' given as argument, will return "0000-00-00 00:00:00" If bad argument given, will return "0000-00-00 00:00:00" """ # time type and seconds coefficients time_types = {'d':0,'w':0,'m':0,'y':0} ## verify argument # argument wrong size if (display_since==(None or 'all')) or (len(display_since) > 2): return ("0000-00-00 00:00:00") try: nb = int(display_since[0]) except: return ("0000-00-00 00:00:00") if str(display_since[1]) in time_types: time_type = str(display_since[1]) else: return ("0000-00-00 00:00:00") ## calculate date # initialize the coef if time_type == 'w': time_types[time_type] = 7 else: time_types[time_type] = 1 start_time = time.localtime(time.time()) start_time = time.mktime(( start_time[0] - nb*time_types['y'], start_time[1] - nb*time_types['m'], start_time[2] - nb*time_types['d'] - nb*time_types['w'], start_time[3], start_time[4], start_time[5], start_time[6], start_time[7], start_time[8])) return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(start_time)) def get_first_comments_or_remarks(recID=-1, ln=cdslang, nb_comments='all', nb_reviews='all', voted=-1, reported=-1): """ Gets nb number comments/reviews or remarks.
In the case of comments, will get both comments and reviews Comments and remarks sorted by most recent date, reviews sorted by highest helpful score @param recID: record id @param ln: language @param nb: number of comment/reviews or remarks to get @param voted: 1 if user has voted for a remark @param reported: 1 if user has reported a comment or review @return if comment, tuple (comments, reviews) both being html of first nb comments/reviews if remark, tuple (remarks, None) """ warnings = [] errors = [] voted = wash_url_argument(voted, 'int') reported = wash_url_argument(reported, 'int') ## check recID argument if type(recID) is not int: return () if recID >= 1 or recID <= -100: #comment or remark if cfg_webcomment_allow_reviews: res_reviews = query_retrieve_comments_or_remarks(recID=recID, display_order="hh", ranking=1) nb_res_reviews = len(res_reviews) ## check nb argument if type(nb_reviews) is int and nb_reviews < len(res_reviews): first_res_reviews = res_reviews[:nb_reviews] else: if nb_res_reviews > cfg_webcomment_nb_reviews_in_detailed_view: first_res_reviews = res_reviews[:cfg_comment_nb_reports_before_send_email_to_admin] else: first_res_reviews = res_reviews if cfg_webcomment_allow_comments: res_comments = query_retrieve_comments_or_remarks(recID=recID, display_order="od", ranking=0) nb_res_comments = len(res_comments) ## check nb argument if type(nb_comments) is int and nb_comments < len(res_comments): first_res_comments = res_comments[:nb_comments] else: if nb_res_comments > cfg_webcomment_nb_comments_in_detailed_view: first_res_comments = res_comments[:cfg_webcomment_nb_comments_in_detailed_view] else: first_res_comments = res_comments else: #error errors.append(('ERR_WEBCOMMENT_RECID_INVALID', recID)) #!FIXME don't return error anywhere since search page # comment if recID >= 1: comments = reviews = "" if reported > 0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT',)) elif reported == 0:
warnings.append(('WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT',)) if cfg_webcomment_allow_comments: # normal comments comments = webcomment_templates.tmpl_get_first_comments_without_ranking(recID, ln, first_res_comments, nb_res_comments, warnings) if cfg_webcomment_allow_reviews: # ranked comments #calculate average score avg_score = calculate_avg_score(res_reviews) if voted > 0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT',)) elif voted == 0: warnings.append(('WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT',)) reviews = webcomment_templates.tmpl_get_first_comments_with_ranking(recID, ln, first_res_reviews, nb_res_reviews, avg_score, warnings) return (comments, reviews) # remark else: return(webcomment_templates.tmpl_get_first_remarks(first_res_comments, ln, nb_res_comments), None) def calculate_avg_score(res): """ private function Calculate the avg score of reviews present in res @param res: tuple of tuple returned from query_retrieve_comments_or_remarks @return a float of the average score rounded to the closest 0.5 """ c_nickname = 0 c_date_creation = 1 c_body = 2 c_nb_votes_yes = 3 c_nb_votes_total = 4 c_star_score = 5 c_title = 6 c_id = 7 avg_score = 0.0 nb_reviews = 0 for comment in res: if comment[c_star_score] > 0: avg_score += comment[c_star_score] nb_reviews += 1 if nb_reviews == 0: return 0.0 avg_score = avg_score / nb_reviews avg_score_unit = avg_score - math.floor(avg_score) if avg_score_unit < 0.25: avg_score = math.floor(avg_score) elif avg_score_unit > 0.75: avg_score = math.floor(avg_score) + 1 else: avg_score = math.floor(avg_score) + 0.5 if avg_score > 5: avg_score = 5.0 return avg_score def perform_request_add_comment_or_remark(recID=-1, uid=-1, action='DISPLAY', ln=cdslang, msg=None, score=None, note=None, priority=None, reviews=0, comID=-1): """ Add a comment/review or remark @param recID: record id @param uid: user id @param action: 'DISPLAY' to display add form 'SUBMIT' to submit comment once form is filled 'REPLY' to 
reply to an existing comment @param ln: language @param msg: the body of the comment/review or remark @param score: star score of the review @param note: title of the review @param priority: priority of remark @param reviews: boolean, if enabled will add a review, if disabled will add a comment @param comID: if replying, this is the comment id of the commetn are replying to @return html add form if action is display or reply html successful added form if action is submit """ warnings = [] errors = [] actions = ['DISPLAY', 'REPLY', 'SUBMIT'] ## wash arguments recID = wash_url_argument(recID, 'int') uid = wash_url_argument(uid, 'int') msg = wash_url_argument(msg, 'str') score = wash_url_argument(score, 'int') note = wash_url_argument(note, 'str') priority = wash_url_argument(priority, 'int') reviews = wash_url_argument(reviews, 'int') comID = wash_url_argument(comID, 'int') ## check arguments check_recID_is_in_range(recID, warnings, ln) if uid <= 0: errors.append(('ERR_WEBCOMMENT_UID_INVALID', uid)) else: nickname = query_get_user_contact_info(uid)[0] # show the form if action == 'DISPLAY': if reviews and cfg_webcomment_allow_reviews: return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings) elif not reviews and cfg_webcomment_allow_comments: return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings) else: errors.append(('ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED',)) elif action == 'REPLY': if reviews and cfg_webcomment_allow_reviews: errors.append(('ERR_WEBCOMMENT_REPLY_REVIEW',)) return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings) elif not reviews and cfg_webcomment_allow_comments: if comID>0: comment = query_get_comment(comID) if comment: user_info = query_get_user_contact_info(comment[2]) if user_info: msg = comment[3] # msg = comment[3].replace('\n', ' ') 
# msg = msg.replace('
', '\n') date_creation = str(comment[4]) date_creation = date_creation[:19] date_creation = time.strptime(str(date_creation), "%Y-%m-%d %H:%M:%S") date_creation = time.strftime("%d %b %Y %H:%M:%S %Z", date_creation) msg = "%s wrote on %s:\n%s" % (user_info[0], date_creation, msg) return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings) else: errors.append(('ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED',)) # check before submitting form elif action == 'SUBMIT': if reviews and cfg_webcomment_allow_reviews: if note.strip() in ["", "None"]: warnings.append(('WRN_WEBCOMMENT_ADD_NO_TITLE',)) if score == 0 or score > 5: warnings.append(("WRN_WEBCOMMENT_ADD_NO_SCORE",)) if msg.strip() in ["", "None"]: warnings.append(('WRN_WEBCOMMENT_ADD_NO_BODY',)) # if no warnings, submit if len(warnings) == 0: success = query_add_comment_or_remark(recID=recID, uid=uid, msg=msg, note=note, score=score, priority=0) if success > 0: if cfg_webcomment_admin_notification_level > 0: notify_admin_of_new_comment(comID=success) return (webcomment_templates.tmpl_add_comment_successful(recID, ln, reviews), errors, warnings) else: errors.append(('ERR_WEBCOMMENT_DB_INSERT_ERROR',)) # if there are warnings or if inserting the comment failed, show the user where the warnings are if reviews and cfg_webcomment_allow_reviews: return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings) else: return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings) # unknown action: fall back to the display form else: warnings.append(('WRN_WEBCOMMENT_ADD_UNKNOWN_ACTION',)) if reviews and cfg_webcomment_allow_reviews: return (webcomment_templates.tmpl_add_comment_form_with_ranking(recID, uid, nickname, ln, msg, score, note, warnings), errors, warnings) else: return (webcomment_templates.tmpl_add_comment_form(recID, uid, nickname, ln, msg, warnings), errors, warnings) def
notify_admin_of_new_comment(comID): """ Sends an email to the admin with details regarding comment with ID = comID """ comment = query_get_comment(comID) if len(comment) > 0: (comID2, id_bibrec, id_user, body, date_creation, star_score, nb_votes_yes, nb_votes_total, title, nb_abuse_reports) = comment else: return user_info = query_get_user_contact_info(id_user) if len(user_info) > 0: (nickname, email, last_login) = user_info if not len(nickname) > 0: nickname = email.split('@')[0] else: nickname = email = last_login = "ERROR: Could not retrieve" from search_engine import print_record record = print_record(recID=id_bibrec, format='hs') review_stuff = ''' Star score = %s Title = %s''' % (star_score, title) out = ''' The following %(comment_or_review)s has just been posted (%(date)s). AUTHOR: Nickname = %(nickname)s Email = %(email)s User ID = %(uid)s RECORD CONCERNED: Record ID = %(recID)s Record = %(record_details)s %(comment_or_review_caps)s: %(comment_or_review)s ID = %(comID)s %(review_stuff)s Body = %(body)s ADMIN OPTIONS: To delete comment go to %(weburl)s/admin/webcomment/webcommentadmin.py/delete?comid=%(comID)s ''' % \ { 'comment_or_review' : star_score>0 and 'review' or 'comment', 'comment_or_review_caps': star_score>0 and 'REVIEW' or 'COMMENT', 'date' : date_creation, 'nickname' : nickname, 'email' : email, 'uid' : id_user, 'recID' : id_bibrec, 'record_details' : record, 'comID' : comID2, 'review_stuff' : star_score>0 and review_stuff or "", 'body' : body.replace('
','\n'), 'weburl' : weburl } from_addr = 'CDS Alert Engine <%s>' % alertengineemail to_addr = adminemail subject = "A new comment/review has just been posted" from alert_engine import send_email, forge_email out = forge_email(from_addr, to_addr, subject, out) send_email(from_addr, to_addr, out) def check_recID_is_in_range(recID, warnings=None, ln=cdslang): """ Check that recID is >= 1 or <= -100 Append warning messages to the warnings list @param recID: record id @param warnings: the warnings list of the calling function @param ln: language @return tuple (boolean, html) where boolean (1=true, 0=false) and html is the body of the page to display if there was a problem """ # Make warnings into a list if needed (avoid a shared mutable default) if warnings is None: warnings = [] elif type(warnings) is not list: warnings = [warnings] try: recID = int(recID) except: pass if type(recID) is int: if recID >= 1 or recID <= -100: from search_engine import record_exists success = record_exists(recID) if success == 1: return (1,"") else: warnings.append(('ERR_WEBCOMMENT_RECID_INEXISTANT', recID)) return (0, webcomment_templates.tmpl_record_not_found(status='inexistant', recID=recID, ln=ln)) elif recID == -1: warnings.append(('ERR_WEBCOMMENT_RECID_MISSING',)) return (0, webcomment_templates.tmpl_record_not_found(status='missing', recID=recID, ln=ln)) else: warnings.append(('ERR_WEBCOMMENT_RECID_INVALID', recID)) return (0, webcomment_templates.tmpl_record_not_found(status='invalid', recID=recID, ln=ln)) else: warnings.append(('ERR_WEBCOMMENT_RECID_NAN', recID)) return (0, webcomment_templates.tmpl_record_not_found(status='nan', recID=recID, ln=ln)) def check_int_arg_is_in_range(value, name, errors, gte_value, lte_value=None): """ Check that variable with name 'name' >= gte_value and optionally <= lte_value Append error messages to errors list @param value: variable value @param name: variable name @param errors: list of error tuples (error_id, value) @param gte_value: greater than or equal to value @param lte_value: less than or equal to value @return boolean (1=true, 0=false)
""" # Make errors into a list if needed if type(errors) is not list: errors = [errors] if type(gte_value) is not int: errors.append(('ERR_WEBCOMMENT_PROGRAMMING_ERROR',)) return 0 if type(value) is not int: errors.append(('ERR_WEBCOMMENT_ARGUMENT_NAN', value)) return 0 if value < gte_value: errors.append(('ERR_WEBCOMMENT_ARGUMENT_INVALID', value)) return 0 if lte_value is not None: if type(lte_value) is not int: errors.append(('ERR_WEBCOMMENT_PROGRAMMING_ERROR',)) return 0 if value > lte_value: errors.append(('ERR_WEBCOMMENT_ARGUMENT_INVALID', value)) return 0 return 1 def wash_url_argument(var, new_type): """ Wash argument into 'new_type', that can be 'list', 'str', 'int', 'tuple' or 'dict'. If needed, the check 'type(var) is not None' should be done before calling this function @param var: variable value @param new_type: variable type, 'list', 'str', 'int', 'tuple' or 'dict' @return as much as possible, value var as type new_type If var is a list, will change first element into new_type. If int check unsuccessful, returns 0 """ out = [] if new_type == 'list': # return list if type(var) is list: out = var else: out = [var] elif new_type == 'str': # return str if type(var) is list: try: out = "%s" % var[0] except: out = "" elif type(var) is str: out = var else: out = "%s" % var elif new_type == 'int': # return int if type(var) is list: try: out = int(var[0]) except: out = 0 elif type(var) is int: out = var elif type(var) is str: try: out = int(var) except: out = 0 else: out = 0 elif new_type == 'tuple': # return tuple if type(var) is tuple: out = var else: out = (var,) elif new_type == 'dict': # return dictionary if type(var) is dict: out = var else: out = {0:var} return out diff --git a/modules/webcomment/lib/webcomment_config.py b/modules/webcomment/lib/webcomment_config.py index 25cd5789c..ff4cf1fe8 100644 --- a/modules/webcomment/lib/webcomment_config.py +++ b/modules/webcomment/lib/webcomment_config.py @@ -1,60 +1,60 @@ # -*- coding: utf-8 -*- ## $Id$ ## Comments and reviews for
records. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. __lastupdated__ = """FIXME: last updated""" -from config import * +from cdsware.config import * cfg_webcomment_error_messages = \ { 'ERR_WEBCOMMENT_RECID_INVALID' : ' %s is an invalid record ID ', 'ERR_WEBCOMMENT_RECID_NAN' : ' Record ID %s is not a number ', 'ERR_WEBCOMMENT_UID_INVALID' : ' %s is an invalid user ID ', 'ERR_WEBCOMMENT_DB_ERROR' : ' %s ', 'ERR_WEBCOMMENT_COMMENTS_NOT_ALLOWED': ' Comments on library records have been disallowed by the Administrator ', 'ERR_WEBCOMMENT_ARGUMENT_NAN' : ' %s is not a number ', 'ERR_WEBCOMMENT_ARGUMENT_INVALID' : ' %s invalid argument ', 'ERR_WEBCOMMENT_PROGRAMMING_ERROR' : ' Programming error, please inform the Administrator ', 'ERR_WEBCOMMENT_FOR_TESTING_PURPOSES': ' THIS IS FOR TESTING PURPOSES ONLY var1=%s var2=%s var3=%s var4=%s var5=%s var6=%s ', 'ERR_WEBCOMMENT_REPLY_REVIEW' : ' Cannot reply to a review ' } cfg_webcomment_warning_messages = \ { 'WRN_WEBCOMMENT_INVALID_PAGE_NB' : "Bad page number --> showing first page", 'WRN_WEBCOMMENT_INVALID_NB_RESULTS_PER_PAGE' : "Bad number of results per page --> showing 10 results per page", 'WRN_WEBCOMMENT_INVALID_REVIEW_DISPLAY_ORDER' : "Bad display order
--> showing most helpful first", 'WRN_WEBCOMMENT_INVALID_DISPLAY_ORDER' : "Bad display order --> showing oldest first", 'WRN_WEBCOMMENT_FEEDBACK_RECORDED_GREEN_TEXT' : "Your feedback has been recorded, many thanks", 'WRN_WEBCOMMENT_FEEDBACK_NOT_RECORDED_RED_TEXT' : "Your feedback could not be recorded, please try again", 'WRN_WEBCOMMENT_ADD_NO_TITLE' : "You must enter a title", 'WRN_WEBCOMMENT_ADD_NO_SCORE' : "You must choose a score", 'WRN_WEBCOMMENT_ADD_NO_BODY' : "You must enter a text", 'ERR_WEBCOMMENT_DB_INSERT_ERROR' : 'Failed to insert your comment to the database. Please try again.', 'WRN_WEBCOMMENT_ADD_UNKNOWN_ACTION' : 'Unknown action --> showing you the default add comment form', 'WRN_WEBCOMMENT_ADMIN_COMID_NAN' : 'comment ID must be a number, try again', 'WRN_WEBCOMMENT_ADMIN_INVALID_COMID' : 'Invalid comment ID, try again', 'WRN_WEBCOMMENT_ADMIN_COMID_INEXISTANT' : "Comment ID %s does not exist, try again", 'ERR_WEBCOMMENT_RECID_MISSING' : "No record ID was given", 'ERR_WEBCOMMENT_RECID_INEXISTANT' : "Record ID %s does not exist in the database", 'ERR_WEBCOMMENT_RECID_INVALID' : "Record ID %s is an invalid ID", 'ERR_WEBCOMMENT_RECID_NAN' : "Record ID %s is not a number", } diff --git a/modules/webcomment/lib/webcomment_templates.py b/modules/webcomment/lib/webcomment_templates.py index 1a61fffea..807d12414 100644 --- a/modules/webcomment/lib/webcomment_templates.py +++ b/modules/webcomment/lib/webcomment_templates.py @@ -1,962 +1,962 @@ # -*- coding: utf-8 -*- ## $Id$ ## Comments and reviews for records. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. __lastupdated__ = """FIXME: last updated""" import urllib import time import string -from config import * -from messages import gettext_set_language, language_list_long +from cdsware.config import * +from cdsware.messages import gettext_set_language, language_list_long class Template: def tmpl_get_first_comments_without_ranking(self, recID, ln, comments, nb_comments_total, warnings): """ @param recID: record id @param ln: language @param comments: tuple as returned from webcomment.py/query_retrieve_comments_or_remarks @param nb_comments_total: total number of comments for this record @param warnings: list of warning tuples (warning_msg, arg1, arg2, ...) @return html of comments """ # load the right message language _ = gettext_set_language(ln) # naming data fields of comments c_nickname = 0 c_date_creation = 1 c_body = 2 c_id = 3 warnings = self.tmpl_warnings(warnings) report_link = '''%s/comments.py/report?recid=%s&ln=%s&comid=%%(comid)s&reviews=0''' % (weburl, recID, ln) # comments comment_rows = ''' ''' for comment in comments: comment_rows += ''' ''' comment_rows += self.tmpl_get_comment_without_ranking(recID, ln, comment[c_nickname], comment[c_date_creation], comment[c_body]) comment_rows += '''

''' # write button write_button_link = '''%s/comments.py/add''' % (weburl,) write_button_form = ''' ''' % (recID, ln) write_button_form = self.createhiddenform(action=write_button_link, method="Get", text=write_button_form, button='Write a comment') # output if nb_comments_total > 0: out = warnings + '''
%(comment_title)s
Showing the latest %(nb_comments)s comment%(s)s: %(tab)s
%(comment_rows)s
%(view_all_comments_link)s

%(write_button_form)s
''' % \ { 'comment_title' : "Discuss this document:", 'nb_comments_total' : nb_comments_total, 'recID' : recID, 'comment_rows' : comment_rows, 'tab' : ' '*4, 'weburl' : weburl, 's' : cfg_webcomment_nb_comments_in_detailed_view>1 and 's' or "", 'view_all_comments_link' : nb_comments_total>0 and '''View all %s comments''' \ % (weburl, recID, nb_comments_total) or "", 'write_button_form' : write_button_form, 'nb_comments' : cfg_webcomment_nb_comments_in_detailed_view>1 and cfg_webcomment_nb_comments_in_detailed_view or "" } else: out = '''
Discuss this document:
Start a discussion about any aspect of this document.
%s
''' % (write_button_form,) return out def tmpl_record_not_found(self, status='missing', recID="", ln=cdslang): """ Displays a page when bad or missing record ID was given. @param status: 'missing' : no recID was given 'inexistant': recID doesn't have an entry in the database 'nan' : recID is not a number 'invalid' : recID is an error code, i.e. in the interval [-99,-1] @param return: body of the page """ if status == 'inexistant': body = "Sorry, the record %s does not seem to exist." % (recID,) elif status == 'nan': body = "Sorry, the record %s does not seem to be a number." % (recID,) elif status == 'invalid': body = "Sorry, the record %s is not a valid ID value." % (recID,) else: body = "Sorry, no record ID was provided." body += "

You may want to start browsing from %s." % (weburl, ln, cdsnameintl[ln]) return body def tmpl_get_first_comments_with_ranking(self, recID, ln, comments=None, nb_comments_total=None, avg_score=None, warnings=[]): """ @param recID: record id @param ln: language @param comments: tuple as returned from webcomment.py/query_retrieve_comments_or_remarks @param nb_comments_total: total number of comments for this record @param avg_score: average score of all reviews @param warnings: list of warning tuples (warning_msg, arg1, arg2, ...) @return html of comments """ # load the right message language _ = gettext_set_language(ln) # naming data fields of comments c_nickname = 0 c_date_creation = 1 c_body = 2 c_nb_votes_yes = 3 c_nb_votes_total = 4 c_star_score = 5 c_title = 6 c_id = 7 warnings = self.tmpl_warnings(warnings) #stars if avg_score > 0: avg_score_img = 'stars-' + str(avg_score).split('.')[0] + '-' + str(avg_score).split('.')[1] + '.gif' else: avg_score_img = "stars-except.gif" # voting links useful_dict = { 'weburl' : weburl, 'recID' : recID, 'ln' : ln, 'yes_img' : 'smchk_gr.gif', #'yes.gif', 'no_img' : 'iconcross.gif' #'no.gif' } useful_yes = '''Yes''' % useful_dict useful_no = ''' No''' % useful_dict report_link = '''%(weburl)s/comments.py/report?recid=%(recID)s&ln=%(ln)s&comid=%%(comid)s&reviews=1''' % useful_dict #comment row comment_rows = ''' ''' for comment in comments: comment_rows += ''' ''' comment_rows += self.tmpl_get_comment_with_ranking(recID, ln, comment[c_nickname], comment[c_date_creation], comment[c_body], comment[c_nb_votes_total], comment[c_nb_votes_yes], comment[c_star_score], comment[c_title]) comment_rows += ''' Was this review helpful? %s / %s
''' % (useful_yes % {'comid':comment[c_id]}, useful_no % {'comid':comment[c_id]}) comment_rows += '''
''' # write button write_button_link = '''%s/comments.py/add''' % (weburl,) write_button_form = ''' ''' % (recID, ln ) write_button_form = self.createhiddenform(action=write_button_link, method="Get", text=write_button_form, button='Write a review') if nb_comments_total > 0: out = warnings + '''
%(comment_title)s
Average review score: %(avg_score)s based on %(nb_comments_total)s reviews
Readers found the following %(nb_helpful)s review%(s)s to be most helpful. %(comment_rows)s
%(view_all_comments_link)s %(write_button_form)s
''' % \ { 'comment_title' : "Rate this document:", 'avg_score_img' : avg_score_img, 'avg_score' : avg_score, 'nb_comments_total' : nb_comments_total, 'recID' : recID, 'view_all_comments' : "view all %s reviews" % (nb_comments_total,), 'write_comment' : "write a review", 'comment_rows' : comment_rows, 's' : cfg_webcomment_nb_reviews_in_detailed_view>1 and 's' or "", 'tab' : ' '*4, 'weburl' : weburl, 'nb_helpful' : cfg_webcomment_nb_reviews_in_detailed_view>1 and cfg_webcomment_nb_reviews_in_detailed_view or "", 'view_all_comments_link': nb_comments_total>0 and """View all %s reviews
""" \ % (weburl, recID, ln, nb_comments_total) or "", 'write_button_form' : write_button_form } else: out = '''
Rate this document:
Be the first to review this document.
%s
''' % (write_button_form,) return out def tmpl_get_comment_without_ranking(self, recID, ln, nickname, date_creation, body, reply_link=None, report_link=None): """ private function @param recID: record id @param ln: language @param nickname: nickname @param date_creation: date comment was written @param body: comment body @param reply_link: URL of the Reply link (give both reply_link and report_link to display the links) @param report_link: URL of the Report abuse link (give both reply_link and report_link to display the links) @return html table of comment """ # load the right message language _ = gettext_set_language(ln) date_creation = str(date_creation) date_creation_data = date_creation[:19] date_creation_data = time.strptime(str(date_creation_data), "%Y-%m-%d %H:%M:%S") date_creation_data = time.strftime("%d %b %Y %H:%M:%S %Z", date_creation_data) date_creation = str(date_creation_data) + date_creation[22:] # 22 to get rid of the .00 after time out = ''' ''' # load the right message language #_ = gettext_set_language(ln) # format the msg so that the '>>' chars give vertical lines body_rows = body.split('
') final_body = "\n" for row in body_rows: final_body += "\n\t" nb_quotes = row.count('>>') final_body += '\n\t\t' final_body += "\n\t" final_body += "\n
' if len(row.split('>>')[-1].strip()) == 0: row += ' ' row = row.replace('>>', "\t\t\t
") for line in range(nb_quotes): row += '
' final_body += row final_body += '\n\t\t
" out += """
%(nickname)s wrote on %(date_creation)s %(links)s
%(body)s
""" % \ { #! FIXME put send_a_private_message view_shared_baskets 'nickname' : nickname, 'date_creation' : date_creation, 'body' : final_body, 'links' : (report_link!=None and reply_link!=None) and " Reply | Report abuse" % (reply_link, report_link) or "" } return out def tmpl_get_comment_with_ranking(self, recID, ln, nickname, date_creation, body, nb_votes_total, nb_votes_yes, star_score, title): """ private function @param recID: record id @param ln: language @param nickname: nickname @param date_creation: date comment was written @param body: comment body @param nb_votes_total: total number of votes for this review @param nb_votes_yes: number of positive votes for this review @param star_score: star score for this review @param title: title of review @return html table of review """ # load the right message language _ = gettext_set_language(ln) if star_score > 0: star_score_img = 'stars-' + str(star_score) + '-0.gif' else: star_score_img = 'stars-except.gif' out = """""" date_creation = str(date_creation) date_creation_data = date_creation[:19] date_creation_data = time.strptime(str(date_creation_data), "%Y-%m-%d %H:%M:%S") date_creation_data = time.strftime("%d %b %Y %H:%M:%S %Z", date_creation_data) date_creation = str(date_creation_data) + date_creation[22:] # load the right message language #_ = gettext_set_language(ln) out += """
%(star_score)s %(title)s
Reviewed by %(nickname)s on %(date_creation)s
%(nb_votes_yes)s out of %(nb_votes_total)s people found this review useful.
%(body)s
""" % \ { #! FIXME put send_a_private_message view_shared_baskets 'nickname' : nickname, 'weburl' : weburl, 'star_score_img': star_score_img, 'date_creation' : date_creation, 'body' : body, 'nb_votes_total' : nb_votes_total, 'star_score' : star_score, 'title' : title, 'nb_votes_yes' : nb_votes_yes<0 and "0" or nb_votes_yes } return out def tmpl_get_comments(self, recID, ln, nb_per_page, page, nb_pages, display_order, display_since, cfg_webcomment_allow_reviews, comments, total_nb_comments, avg_score, warnings, border=0, reviews=0): """ Get table of all comments @param recID: record id @param ln: language @param nb_per_page: number of results per page @param page: page number @param nb_pages: total number of pages @param display_order: hh = highest helpful score, review only lh = lowest helpful score, review only hs = highest star score, review only ls = lowest star score, review only od = oldest date nd = newest date @param display_since: all= no filtering by date nd = n days ago nw = n weeks ago nm = n months ago ny = n years ago where n is a single digit integer between 0 and 9 @param cfg_webcomment_allow_reviews: whether ranking is enabled, taken from config.py/cfg_webcomment_allow_reviews @param comments: tuple as returned from webcomment.py/query_retrieve_comments_or_remarks @param total_nb_comments: total number of comments for this record @param avg_score: average score of reviews for this record @param warnings: list of warning tuples (warning_msg, color) @param border: boolean, enable to show a border around each comment/review @param reviews: boolean, enabled for reviews, disabled for comments """ # load the right message language _ = gettext_set_language(ln) # naming data fields of comments if reviews: c_nickname = 0 c_date_creation = 1 c_body = 2 c_nb_votes_yes = 3 c_nb_votes_total = 4 c_star_score = 5 c_title = 6 c_id = 7 else: c_nickname = 0 c_date_creation = 1 c_body = 2 c_id = 3 # voting links useful_dict = { 'weburl' : weburl, 'recID' : recID, 'ln' : ln, 'do' : display_order, 'ds' :
display_since, 'nb' : nb_per_page, 'p' : page, 'reviews' : reviews } useful_yes = '''Yes''' % useful_dict useful_no = '''No''' % useful_dict warnings = self.tmpl_warnings(warnings) ## record details from search_engine import print_record record_details = print_record(recID=recID, format='hb') link_dic = { 'weburl' : weburl, 'module' : 'comments.py', 'function' : 'index', 'arguments' : 'recid=%s&do=%s&ds=%s&nb=%s&reviews=%s' % (recID, display_order, display_since, nb_per_page, reviews), 'arg_page' : '&p=%s' % page, 'page' : page } ## comments table comments_rows = ''' ''' for comment in comments: comments_rows += ''' ''' if not reviews: report_link = '''%(weburl)s/comments.py/report?recid=%(recID)s&ln=%(ln)s&comid=%%(comid)s&do=%(do)s&ds=%(ds)s&nb=%(nb)s&p=%(p)s&reviews=%(reviews)s&referer=%(weburl)s/comments.py/display''' % useful_dict % {'comid':comment[c_id]} reply_link = '''%(weburl)s/comments.py/add?recid=%(recID)s&ln=%(ln)s&action=REPLY&comid=%%(comid)s''' % useful_dict % {'comid':comment[c_id]} comments_rows += self.tmpl_get_comment_without_ranking(recID, ln, comment[c_nickname], comment[c_date_creation], comment[c_body], reply_link, report_link) else: report_link = '''%(weburl)s/comments.py/report?recid=%(recID)s&ln=%(ln)s&comid=%%(comid)s&do=%(do)s&ds=%(ds)s&nb=%(nb)s&p=%(p)s&reviews=%(reviews)s&referer=%(weburl)s/comments.py/display''' % useful_dict % {'comid':comment[c_id]} comments_rows += self.tmpl_get_comment_with_ranking(recID, ln, comment[c_nickname], comment[c_date_creation], comment[c_body], comment[c_nb_votes_total], comment[c_nb_votes_yes], comment[c_star_score], comment[c_title]) comments_rows += '''
Was this review helpful? %(tab)s %(yes)s / %(no)s %(tab)s%(tab)s(Report abuse)
''' \ % { 'yes' : useful_yes % {'comid':comment[c_id]}, 'no' : useful_no % {'comid':comment[c_id]}, 'report' : report_link % {'comid':comment[c_id]}, 'tab' : ' '*2 } comments_rows += '''
''' ## page links page_links = ''' ''' # Previous if page != 1: link_dic['arg_page'] = 'p=%s' % (page - 1) page_links += ''' << ''' % link_dic else: page_links += ''' %s ''' % (' '*(len('< Previous')+7)) # Page Numbers for i in range(1, nb_pages+1): link_dic['arg_page'] = 'p=%s' % i link_dic['page'] = '%s' % i if i != page: page_links += ''' %(page)s ''' % link_dic else: page_links += ''' %s ''' % i # Next if page != nb_pages: link_dic['arg_page'] = 'p=%s' % (page + 1) page_links += ''' >> ''' % link_dic else: page_links += '''%s''' % (' '*(len('< Next')+7)) ## stuff for ranking if enabled if reviews: comments_or_reviews = 'review' if avg_score > 0: avg_score_img = 'stars-' + str(avg_score).split('.')[0] + '-' + str(avg_score).split('.')[1] + '.gif' else: avg_score_img = "stars-except.gif" ranking_average = '''
Average review score: %(avg_score)s based on %(nb_comments_total)s reviews
''' \ % { 'weburl' : weburl, 'avg_score' : avg_score, 'avg_score_img' : avg_score_img, 'nb_comments_total' : total_nb_comments } else: ranking_average = "" comments_or_reviews = 'comment' write_button_link = '''%s/comments.py/add''' % (weburl, ) write_button_form = ''' ''' % (recID, ln, reviews) write_button_form = self.createhiddenform(action=write_button_link, method="Get", text=write_button_form, button='Write a %s' % comments_or_reviews) ## html body = '''

Record %(recid)s

(Back to search results)

%(record_details)s

%(comments_or_reviews_title)ss

There is a total of %(total_nb_comments)s %(comments_or_reviews)ss. %(ranking_avg)s
%(write_button_form)s
%(comments_rows)s
%(write_button_form_again)s (Back to search results)

''' % \ { 'record_details' : record_details, 'write_button_form' : write_button_form, 'write_button_form_again' : total_nb_comments>3 and write_button_form or "", 'comments_rows' : comments_rows, 'total_nb_comments' : total_nb_comments, 'comments_or_reviews' : comments_or_reviews, 'comments_or_reviews_title' : comments_or_reviews[0].upper() + comments_or_reviews[1:], 'weburl' : weburl, 'module' : "comments.py", 'recid' : recID, 'ln' : ln, 'border' : border, 'ranking_avg' : ranking_average } form = ''' Display comments per page that are and sorted by ''' % \ (reviews==1 and ''' ''' or ''' ''') form_link = "%(weburl)s/%(module)s/%(function)s" % link_dic form = self.createhiddenform(action=form_link, method="Get", text=form, button='Go', recid=recID, p=1) pages = '''
Viewing %(comments_or_reviews)s %(results_nb_lower)s-%(results_nb_higher)s
%(page_links)s
''' % \ { 'page_links' : "Page: " + page_links , 'comments_or_reviews' : not reviews and 'comments' or 'reviews', 'results_nb_lower' : len(comments)>0 and ((page-1) * nb_per_page)+1 or 0, 'results_nb_higher' : page == nb_pages and (((page-1) * nb_per_page) + len(comments)) or (page * nb_per_page) } if nb_pages > 1: #body = warnings + body + form + pages body = warnings + body + pages else: body = warnings + body return body def createhiddenform(self, action="", method="Get", text="", button="confirm", cnfrm='', **hidden): """ create select with hidden values and submit button @param action: name of the action to perform on submit @param method: 'get' or 'post' @param text: additional text, can also be used to add non hidden input @param button: value/caption on the submit button @param cnfrm: if given, must check checkbox to confirm @param **hidden: dictionary with name=value pairs for hidden input @return html form """ output = '
\n' % (action, string.lower(method).strip() in ['get','post'] and method or 'Get') output += '\n
' output += text if cnfrm: output += ' ' for key in hidden.keys(): if type(hidden[key]) is list: for value in hidden[key]: output += ' \n' % (key, value) else: output += ' \n' % (key, hidden[key]) output += '
' output += ' \n' % (button, ) output += '
' output += '
\n' return output def tmpl_warnings(self, warnings): """ Prepare the warnings list @param warnings: list of warning tuples (warning_msg, arg1, arg2, etc) @return html string of warnings """ from errorlib import get_msgs_for_code_list span_class = 'important' out = "" if type(warnings) is not list: warnings = [warnings] if len(warnings) > 0: warnings_parsed = get_msgs_for_code_list(warnings, 'warning') for (warning_code, warning_text) in warnings_parsed: if not warning_code.startswith('WRN'): #display only warnings that begin with WRN to user continue if warning_code.find('GREEN_TEXT') >= 0: span_class = "exampleleader" elif warning_code.find('RED_TEXT') >= 0: span_class = "important" out += ''' %(warning)s
''' % \ { 'span_class' : span_class, 'warning' : warning_text } return out else: return "" def tmpl_add_comment_form(self, recID, uid, nickname, ln, msg, warnings): """ Add form for comments @param recID: record id @param uid: user id @param ln: language @param msg: comment body contents for when refreshing due to warning @param warnings: list of warning tuples (warning_msg, color) @return html add comment form """ link_dic = { 'weburl' : weburl, 'module' : 'comments.py', 'function' : 'add', 'arguments' : 'recid=%s&ln=%s&action=%s&reviews=0' % (recID, ln, 'SUBMIT') } from search_engine import print_record record_details = print_record(recID=recID, format='hb') (msg_formated, max_line_length) = self.format_quote(urllib.unquote(msg)) warnings = self.tmpl_warnings(warnings) form = '''
Article:
%(record)s

Comment:
Note: Your nickname, %(nickname)s, will be displayed as author of this comment


''' % { 'msg' : msg!="" and msg_formated or "", 'max_line_length' : max_line_length>80 and max_line_length or 80, 'nickname' : nickname, 'record' : record_details } form_link = "%(weburl)s/%(module)s/%(function)s?%(arguments)s" % link_dic form = self.createhiddenform(action=form_link, method="Post", text=form, button='Add comment') return warnings + form def format_quote(self, msg): """ Formats the msg so that the reply has correctly inserted '>>' characters @return tuple (formated_msg, max_line_length) max_line_length helps to determine optimal size of input field """ import string msg = msg.replace('CET:', 'CET:
', 1) msg = msg.replace('\n', '') msg = msg.replace('
', ' \n ') msg_words = msg.split(' ') final_words = [] char_on_this_line = 0 previous_word_is_quote = 0 max_line_len = 0 for word in msg_words: if word.strip() == '>>': if not previous_word_is_quote: final_words.append('\n' + '>>') if char_on_this_line > max_line_len: max_line_len = char_on_this_line char_on_this_line = 3 previous_word_is_quote = 1 previous_word_is_quote = 1 final_words.append('>>') char_on_this_line += 3 elif word == '\n': final_words.append('\n' + '>>') if char_on_this_line > max_line_len: max_line_len = char_on_this_line char_on_this_line = 3 previous_word_is_quote = 1 else: final_words.append(word) char_on_this_line += len(word) + 1 previous_word_is_quote = 0 msg = string.join(final_words,' ') return (msg, max_line_len) def tmpl_add_comment_form_with_ranking(self, recID, uid, nickname, ln, msg, score, note, warnings): """ Add form for reviews @param recID: record id @param uid: user id @param ln: language @param msg: comment body contents for when refreshing due to warning @param score: review score @param note: review title @param warnings: list of warning tuples (warning_msg, color) @return html add review form """ link_dic = { 'weburl' : weburl, 'module' : 'comments.py', 'function' : 'add', 'arguments' : 'recid=%s&ln=%s&action=%s&reviews=1' % (recID, ln, 'SUBMIT') } warnings = self.tmpl_warnings(warnings) from search_engine import print_record record_details = print_record(recID=recID, format='hb') form = '''
Article:
%(record)s

Rate this article:

Give a title to your review:


Write your review:
Note: Your nickname, %(nickname)s, will be displayed as the author of this review


''' % { 'note' : note!='' and note or "", 'msg' : msg!='' and msg or "", 'nickname' : nickname, 'record' : record_details } form_link = "%(weburl)s/%(module)s/%(function)s?%(arguments)s" % link_dic form = self.createhiddenform(action=form_link, method="Post", text=form, button='Add Review') return warnings + form def tmpl_add_comment_successful(self, recID, ln, reviews): """ @param recID: record id @param ln: language @return html page of successfully added comment/review """ link_dic = { 'weburl' : weburl, 'module' : 'comments.py', 'function' : 'display', 'arguments' : 'recid=%s&ln=%s&do=od&reviews=%s' % (recID, ln, reviews) } link = "%(weburl)s/%(module)s/%(function)s?%(arguments)s" % link_dic return '''Your %s was successfully added

Back to record''' % (reviews==1 and 'review' or 'comment', link) def tmpl_admin_index(self, ln): """ """ # load the right message language _ = gettext_set_language(ln) out = ''' ''' if cfg_webcomment_allow_comments or cfg_webcomment_allow_reviews: if cfg_webcomment_allow_comments: out += ''' ''' if cfg_webcomment_allow_reviews: out += ''' ''' out += ''' ''' out = out % { 'weburl' : weburl, 'ln' : ln } else: out += ''' ''' out += '''
0.  View all reported comments
0. View all reported reviews
1. Delete a specific comment/review (by ID)
2. View all users who have been reported

Comments and reviews are disabled
''' from bibrankadminlib import addadminbox return addadminbox('Menu', [out]) def tmpl_admin_delete_form(self, ln, warnings): """ @param warnings: list of warning_tuples where warning_tuple is (warning_message, text_color) see tmpl_warnings, color is optional """ # load the right message language _ = gettext_set_language(ln) warnings = self.tmpl_warnings(warnings) out = '''
Please enter the ID of the comment/review so that you can view it before deciding to delete it or not

''' form = '''
Comment ID:


''' form_link = "%s/admin/webcomment/webcommentadmin.py/delete?ln=%s" % (weburl, ln) form = self.createhiddenform(action=form_link, method="Get", text=form, button='View Comment') return warnings + out + form def tmpl_admin_users(self, ln, users_data): """ @param users_data: tuple of ct, i.e. (ct, ct, ...) where ct is a tuple (total_number_reported, total_comments_reported, total_reviews_reported, total_nb_votes_yes_of_reported, total_nb_votes_total_of_reported, user_id, user_email, user_nickname) sorted by order of ct having highest total_number_reported """ u_reports = 0 u_comment_reports = 1 u_reviews_reports = 2 u_nb_votes_yes = 3 u_nb_votes_total = 4 u_uid = 5 u_email = 6 u_nickname = 7 if not users_data: return self.tmpl_warnings([("There have been no reports so far.", 'green')]) user_rows = "" for utuple in users_data: com_link = '''View all %s reported comments
''' % \ (weburl, ln, utuple[u_uid], utuple[u_comment_reports]) rev_link = '''View all %s reported reviews''' % \ (weburl, ln, utuple[u_uid], utuple[u_reviews_reports]) user_rows += ''' %(nickname)s %(email)s %(uid)s %(review_row)s %(reports)s %(com_link)s%(rev_link)s ''' % { 'nickname' : len(utuple[u_nickname])>0 and utuple[u_nickname] or utuple[u_email].split('@')[0], 'email' : utuple[u_email], 'uid' : utuple[u_uid], 'reports' : utuple[u_reports], 'review_row': cfg_webcomment_allow_reviews>0 and "%s%s%s" % \ (utuple[u_nb_votes_yes], utuple[u_nb_votes_total]-utuple[u_nb_votes_yes], utuple[u_nb_votes_total]) or "", 'weburl' : weburl, 'ln' : ln, 'com_link' : cfg_webcomment_allow_comments>0 and com_link or "", 'rev_link' : cfg_webcomment_allow_reviews>0 and rev_link or "" } out = '''
Here is a list, sorted by total number of reports, of all users who have had at least one report to one of their comments.

%(reviews_columns)s %(user_rows)s
Nickname Email User IDTotal number of reports View all user's reported comments/reviews
''' % { 'reviews_columns' : cfg_webcomment_allow_reviews>0 and "Number positive votesNumber negative votesTotal number votes" or "", 'user_rows' : user_rows } return out def tmpl_admin_comments(self, ln, uid, comID, comment_data, reviews): """ @param comment_data: same type of tuple as that which is returned by webcomment.py/query_retrieve_comments_or_remarks i.e. tuple of comment where comment is tuple (nickname, date_creation, body, id) if ranking disabled or tuple (nickname, date_creation, body, nb_votes_yes, nb_votes_total, star_score, title, id) """ comments = self.tmpl_get_comments(recID=-1, ln=ln, nb_per_page=0, page=1, nb_pages=1, display_order='od', display_since='all', cfg_webcomment_allow_reviews=cfg_webcomment_allow_reviews, comments=comment_data, total_nb_comments=len(comment_data), avg_score=-1, warnings=[], border=1, reviews=reviews) comments = comments.split("")[1] comments = comments.split("")[0] form_link = "%s/admin/webcomment/webcommentadmin.py/del_com?ln=%s" % (weburl, ln) form = self.createhiddenform(action=form_link, method="Post", text=comments, button='Delete Selected Comments') if uid > 0: header = "
Here are the reported %s of user %s

" % (reviews>0 and "reviews" or "comments", uid) if comID > 0: header = "
Here is comment/review %s

" % comID if uid > 0 and comID > 0: header = "
Here is comment/review %s written by user %s

" % (comID, uid) if uid ==0 and comID == 0: header = "
Here are all reported %s sorted by most reported

" % (reviews>0 and "reviews" or "comments",) return header + form def tmpl_admin_del_com(self, del_res): """ @param del_res: list of the following tuple (comment_id, was_successfully_deleted), was_successfully_deleted is boolean (0=false, >0=true """ table_rows = ''' ''' for deltuple in del_res: table_rows += ''' %s%s ''' % (deltuple[0], deltuple[1]>0 and "Yes" or "No") out = ''' %s
comment IDsuccessfully deleted
''' % (table_rows) return out diff --git a/modules/webcomment/lib/webcomment_tests.py b/modules/webcomment/lib/webcomment_tests.py index 7a4c02e69..99f66324c 100644 --- a/modules/webcomment/lib/webcomment_tests.py +++ b/modules/webcomment/lib/webcomment_tests.py @@ -1,31 +1,24 @@ -import webcomment -import unittest - -class TestWashQueryParameters(unittest.TestCase): - """Test for washing of search query parameters.""" +# -*- coding: utf-8 -*- +## $Id$ +## +## This file is part of the CERN Document Server Software (CDSware). +## Copyright (C) 2002, 2003, 2004, 2005 CERN. +## +## The CDSware is free software; you can redistribute it and/or +## modify it under the terms of the GNU General Public License as +## published by the Free Software Foundation; either version 2 of the +## License, or (at your option) any later version. +## +## The CDSware is distributed in the hope that it will be useful, but +## WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +## General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with CDSware; if not, write to the Free Software Foundation, Inc., +## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
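In `tmpl_get_comments`, the "Viewing comments X-Y" banner computes its bounds with `cond and a or b` expressions. The following is a minimal standalone sketch of the same arithmetic. `results_range` is a hypothetical helper name, not part of CDSware, and the explicit `if` statements are a deliberate change: with the `and/or` idiom, a last-page sum of 0 is falsy, so the expression falls through to `page * nb_per_page`.

```python
def results_range(page, nb_per_page, nb_pages, nb_on_page):
    """Return the 1-based (lower, higher) bounds of the items shown on a page.

    page: current page (1-based); nb_per_page: page size;
    nb_pages: total number of pages; nb_on_page: items actually on this page.
    """
    lower = (page - 1) * nb_per_page + 1 if nb_on_page > 0 else 0
    if page == nb_pages:
        # The last page may be short, so count the items that are really there.
        higher = (page - 1) * nb_per_page + nb_on_page
    else:
        higher = page * nb_per_page
    return lower, higher
```

For example, `results_range(3, 10, 3, 4)` gives the bounds for a final page holding only four of the comments.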
- def test_wash_url_argument(self): - """search engine - washing of URL arguments""" - self.assertEqual(1, search_engine.wash_url_argument(['1'],'int')) - self.assertEqual("1", search_engine.wash_url_argument(['1'],'str')) - self.assertEqual(['1'], search_engine.wash_url_argument(['1'],'list')) - self.assertEqual(0, search_engine.wash_url_argument('ellis','int')) - self.assertEqual("ellis", search_engine.wash_url_argument('ellis','str')) - self.assertEqual(["ellis"], search_engine.wash_url_argument('ellis','list')) - self.assertEqual(0, search_engine.wash_url_argument(['ellis'],'int')) - self.assertEqual("ellis", search_engine.wash_url_argument(['ellis'],'str')) - self.assertEqual(["ellis"], search_engine.wash_url_argument(['ellis'],'list')) +import unittest - def test_wash_pattern(self): - """search engine - washing of query patterns""" - self.assertEqual("Ellis, J", search_engine.wash_pattern('Ellis, J')) - self.assertEqual("ell", search_engine.wash_pattern('ell*')) - -def create_test_suite(): - """Return test suite for the search engine.""" - return unittest.TestSuite((unittest.makeSuite(TestWashQueryParameters,'test'), - unittest.makeSuite(TestStripAccents,'test'), - unittest.makeSuite(TestQueryParser,'test'))) +from cdsware import webcomment -if __name__ == "__main__": - unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/webcomment/lib/webcommentadminlib.py b/modules/webcomment/lib/webcommentadminlib.py index 0190ca19c..b846f4039 100644 --- a/modules/webcomment/lib/webcommentadminlib.py +++ b/modules/webcomment/lib/webcommentadminlib.py @@ -1,232 +1,232 @@ # -*- coding: utf-8 -*- ## $Id$ ## Comments and reviews for records. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. __lastupdated__ = """FIXME: last updated""" +from mod_python import apache -from bibrankadminlib import check_user +from cdsware.bibrankadminlib import check_user #write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser, #adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps, #serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array, #serialize_via_marshal,deserialize_via_marshal -from config import * -from webcomment import wash_url_argument, query_get_comment, query_get_user_contact_info -from mod_python import apache -from dbquery import run_sql +from cdsware.config import * +from cdsware.webcomment import wash_url_argument, query_get_comment, query_get_user_contact_info +from cdsware.dbquery import run_sql -import template -webcomment_templates = template.load('webcomment') +import cdsware.template +webcomment_templates = cdsware.template.load('webcomment') def getnavtrail(previous = ''): """Get the navtrail""" navtrail = """Admin Area > WebComment Admin """ % (weburl, weburl) navtrail = navtrail + previous return navtrail def 
perform_request_index(ln=cdslang): """ """ return webcomment_templates.tmpl_admin_index(ln=ln) def perform_request_delete(ln=cdslang, comID=-1): """ """ warnings = [] ln = wash_url_argument(ln, 'str') comID = wash_url_argument(comID, 'int') if comID is not None: if comID <= 0: if comID != -1: warnings.append(("WRN_WEBCOMMENT_ADMIN_INVALID_COMID",)) return (webcomment_templates.tmpl_admin_delete_form(ln, warnings),None, warnings) comment = query_get_comment(comID) if comment: c_star_score = 5 if comment[c_star_score] > 0: reviews = 1 else: reviews = 0 return (perform_request_comments(ln=ln, comID=comID, reviews=reviews), None, warnings) else: warnings.append(('WRN_WEBCOMMENT_ADMIN_COMID_INEXISTANT', comID)) return (webcomment_templates.tmpl_admin_delete_form(ln, warnings), None, warnings) else: return (webcomment_templates.tmpl_admin_delete_form(ln, warnings), None, warnings) def perform_request_users(ln=cdslang): """ """ ln = wash_url_argument(ln, 'str') users_data = query_get_users_reported() return webcomment_templates.tmpl_admin_users(ln=ln, users_data=users_data) def query_get_users_reported(): """ Get the users who have been reported at least once. @return tuple of ct, i.e. (ct, ct, ...)
where ct is a tuple (total_number_reported, total_comments_reported, total_reviews_reported, total_nb_votes_yes_of_reported, total_nb_votes_total_of_reported, user_id, user_email, user_nickname) sorted by order of ct having highest total_number_reported """ query1 = "SELECT c.nb_abuse_reports, c.nb_votes_yes, c.nb_votes_total, u.id, u.email, u.nickname, c.star_score " \ "FROM user AS u, cmtRECORDCOMMENT AS c " \ "WHERE c.id_user=u.id AND c.nb_abuse_reports > 0 " \ "ORDER BY u.id " res1 = run_sql(query1) if type(res1) is None: return () users = {} for cmt in res1: uid = int(cmt[3]) if users.has_key(uid): users[uid] = (users[uid][0]+int(cmt[0]), int(cmt[6])>0 and users[uid][1] or users[uid][1]+1, int(cmt[6])>0 and users[uid][2]+1 or users[uid][2], users[uid][3]+int(cmt[1]), users[uid][4]+int(cmt[2]), int(cmt[3]), cmt[4], cmt[5]) else: users[uid] = (int(cmt[0]), int(cmt[6])==0 and 1 or 0, int(cmt[6])>0 and 1 or 0, int(cmt[1]), int(cmt[2]), int(cmt[3]), cmt[4], cmt[5]) users = users.values() users.sort() users.reverse() users = tuple(users) return users def perform_request_comments(ln=cdslang, uid="", comID="", reviews=0): """ """ warning = [] ln = wash_url_argument(ln, 'str') uid = wash_url_argument(uid, 'int') comID = wash_url_argument(comID, 'int') reviews = wash_url_argument(reviews, 'int') comments = query_get_comments(uid, comID, reviews) return webcomment_templates.tmpl_admin_comments(ln=ln, uid=uid, comID=comID, comment_data=comments, reviews=reviews) def query_get_comments(uid, comID, reviews): """ private function Get the reported comments of user uid or get the comment comID or get the comment comID which was written by user uid @return same type of tuple as that which is returned by webcomment.py/query_retrieve_comments_or_remarks i.e.
tuple of comment where comment is tuple (nickname, date_creation, body, id) if ranking disabled or tuple (nickname, date_creation, body, nb_votes_yes, nb_votes_total, star_score, title, id) """ query1 = "SELECT u.nickname, c.date_creation, c.body, %s c.id, c.id_bibrec, c.id_user, " \ "c.nb_abuse_reports, u.id, u.email, u.nickname " \ "FROM user AS u, cmtRECORDCOMMENT AS c " \ "WHERE c.id_user=u.id %s %s %s " \ "ORDER BY c.nb_abuse_reports DESC, c.nb_votes_yes DESC, c.date_creation " params1 = ( reviews>0 and " c.nb_votes_yes, c.nb_votes_total, c.star_score, c.title, " or "", reviews>0 and " AND c.star_score>0 " or " AND c.star_score=0 ", uid>0 and " AND c.id_user=%s " % uid or "", comID>0 and " AND c.id=%s " % comID or " AND c.nb_abuse_reports>0 " ) res1 = run_sql(query1 % params1) res2 = [] for qtuple1 in res1: # exceptional use of html here for giving admin extra information new_info = """
user (nickname=%s, email=%s, id=%s)
comment/review id = %s
commented this record (id=%s)
""" break except StandardError, e: pass text += """ """ tables = tables - 1 text += """
reported %s times        
""" \ % (len(qtuple1[0])>0 and qtuple1[0] or qtuple1[-2].split('@')[0], qtuple1[-2], qtuple1[-5], qtuple1[-7], weburl, qtuple1[-6], qtuple1[-6], qtuple1[-4], qtuple1[-7]) if reviews: qtuple2 = (len(qtuple1[0])>0 and qtuple1[0] or qtuple1[-2].split('@')[0], str(qtuple1[1])+new_info, qtuple1[2], qtuple1[3], qtuple1[4], qtuple1[5], qtuple1[6], qtuple1[7]) else: qtuple2 = (len(qtuple1[0])>0 and qtuple1[0] or qtuple1[-2].split('@')[0], str(qtuple1[1])+new_info, qtuple1[2], qtuple1[3]) res2.append(qtuple2) return tuple(res2) def perform_request_del_com(ln=cdslang, comIDs=[]): """ private function Delete the comments and say whether successful or not @param ln: language @param comIDs: list of comment ids """ ln = wash_url_argument(ln, 'str') comIDs = wash_url_argument(comIDs, 'list') # map ( fct, list, arguments of function) comIDs = map(wash_url_argument, comIDs, ('int '*len(comIDs)).split(' ')[:-1]) if not comIDs: comIDs = map(coerce, comIDs, ('0 '*len(comIDs)).split(' ')[:-1]) return webcomment_templates.tmpl_admin_del_com(del_res=comIDs) del_res=[] for id in comIDs: del_res.append((id, query_delete_comment(id))) return webcomment_templates.tmpl_admin_del_com(del_res=del_res) def query_delete_comment(comID): """ delete comment with id comID @return integer 1 if successful, integer 0 if not """ query1 = "DELETE FROM cmtRECORDCOMMENT WHERE id=%s" params1 = (comID,) res1 = run_sql(query1, params1) return int(res1) def getnavtrail(previous = ''): """ Get the navtrail """ navtrail = """Admin Area > WebComment Admin """ % (weburl, weburl) navtrail = navtrail + previous return navtrail diff --git a/modules/webcomment/web/admin/webcommentadmin.py b/modules/webcomment/web/admin/webcommentadmin.py index 8391aff31..f31f233b6 100644 --- a/modules/webcomment/web/admin/webcommentadmin.py +++ b/modules/webcomment/web/admin/webcommentadmin.py @@ -1,168 +1,167 @@ # -*- coding: utf-8 -*- ## $Id$ ## Comments and reviews for records. 
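`query_get_users_reported` folds one row per reported comment into per-user totals. The sketch below reproduces that grouping, assuming the column order of its SELECT statement. `aggregate_reported_users` is a hypothetical name. The original updates the comment count with `int(cmt[6])>0 and users[uid][1] or users[uid][1]+1`. When a further reported review arrives while that count is still 0, the `and` branch yields 0, so `or` falls through and the comment count is incremented by mistake. The explicit branch below counts each row exactly once.

```python
def aggregate_reported_users(rows):
    """Group reported comment rows by user, most-reported user first.

    Each row follows the SELECT order of query_get_users_reported:
    (nb_abuse_reports, nb_votes_yes, nb_votes_total, uid, email, nickname,
    star_score), where star_score > 0 marks a review and 0 a comment.
    Returns a tuple of (total_reports, comments_reported, reviews_reported,
    votes_yes, votes_total, uid, email, nickname) tuples.
    """
    users = {}
    for reports, yes, total, uid, email, nickname, star in rows:
        uid = int(uid)
        entry = users.setdefault(uid, [0, 0, 0, 0, 0, uid, email, nickname])
        entry[0] += int(reports)
        if int(star) > 0:
            entry[2] += 1  # one more reported review
        else:
            entry[1] += 1  # one more reported comment
        entry[3] += int(yes)
        entry[4] += int(total)
    # Rank by total number of reports, highest first.
    ranked = sorted(users.values(), key=lambda e: e[0], reverse=True)
    return tuple(tuple(e) for e in ranked)
```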
- + ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. - -__lastupdated__ = """$FIXME $""" +__lastupdated__ = """$Date$""" from cdsware.webcommentadminlib import * from cdsware.webpage import page, create_error_box from cdsware.config import weburl,cdslang from cdsware.webuser import getUid, page_not_authorized def index(req, ln=cdslang): """ Menu of admin options @param ln: language """ navtrail_previous_links = getnavtrail() + """ > Comment Management""" % (weburl,) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) (auth_code, auth_msg) = check_user(uid,'cfgwebcomment') if not auth_code: return page(title="Comment Management", body=perform_request_index(ln=ln), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth_msg, navtrail=navtrail_previous_links) def delete(req, ln=cdslang, comid=""): """ Delete a comment by giving its comment id @param ln: language @param comid: comment id """ navtrail_previous_links = getnavtrail() + """ > Comment Management""" % (weburl,) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) (auth_code, auth_msg) = 
check_user(uid,'cfgwebcomment') if not auth_code: (body, errors, warnings) = perform_request_delete(ln=ln, comID=comid) return page(title="Delete Comment", body=body, uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, req = req, errors = errors, warnings = warnings, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth_msg, navtrail=navtrail_previous_links) def comments(req, ln=cdslang, uid="", comid="", reviews=0): """ View reported comments, filter by either user or a specific comment (only one given at a time) @param ln: language @param uid: user id @param comid: comment id @param reviews: boolean enabled for reviews, disabled for comments """ navtrail_previous_links = getnavtrail() + """ > Comment Management""" % (weburl,) try: auid = getUid(req) except MySQLdb.Error, e: return error_page(req) (auth_code, auth_msg) = check_user(auid,'cfgwebcomment') if not auth_code: return page(title="View all Reported %s" % (reviews>0 and "Reviews" or "Comments",), body=perform_request_comments(ln=ln, uid=uid, comID=comid, reviews=reviews), uid=auid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth_msg, navtrail=navtrail_previous_links) def users(req, ln=cdslang): """ View a list of all the users that have been reported, sorted by most reported @param ln: language """ navtrail_previous_links = getnavtrail() + """ > Comment Management""" % (weburl,) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) (auth_code, auth_msg) = check_user(uid,'cfgwebcomment') if not auth_code: return page(title="View all Reported Users", body=perform_request_users(ln=ln), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth_msg, navtrail=navtrail_previous_links) def del_com(req, ln=cdslang, **hidden): """ private function Delete
a comment @param ln: language @param **hidden: ids of comments to delete sent as individual variables comidX=on, where X is id """ navtrail_previous_links = getnavtrail() + """ > Comment Management""" % (weburl,) try: uid = getUid(req) except MySQLdb.Error, e: return error_page(req) (auth_code, auth_msg) = check_user(uid,'cfgwebcomment') if not auth_code: comIDs = [] args = hidden.keys() for var in args: try: comIDs.append(int(var.split('comid')[1])) except: pass return page(title="Delete Comments", body=perform_request_del_com(ln=ln, comIDs=comIDs), uid=uid, language=ln, urlargs=req.args, navtrail = navtrail_previous_links, lastupdated=__lastupdated__) else: return page_not_authorized(req=req, text=auth_msg, navtrail=navtrail_previous_links) diff --git a/modules/webcomment/web/comments.py b/modules/webcomment/web/comments.py index 3c8a356e9..90cef2766 100644 --- a/modules/webcomment/web/comments.py +++ b/modules/webcomment/web/comments.py @@ -1,210 +1,210 @@ # -*- coding: utf-8 -*- ## $Id$ ## Comments and reviews for records. - + ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
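`del_com` receives the ticked checkboxes as keyword arguments named `comidX`. It recovers `X` with `int(var.split('comid')[1])` inside a bare `except`. The following is a stricter sketch of that parsing. It assumes that only names consisting of `comid` followed by digits should count. `parse_checked_comment_ids` is a hypothetical helper, not part of CDSware.

```python
def parse_checked_comment_ids(form_fields):
    """Return the integer ids encoded in checkbox names like 'comid42'.

    form_fields: mapping (or iterable) of submitted field names; names that
    are not 'comid' followed only by digits are ignored.
    """
    prefix = 'comid'
    ids = []
    for name in form_fields:
        suffix = name[len(prefix):]
        if name.startswith(prefix) and suffix.isdigit():
            ids.append(int(suffix))
    return ids
```

Unlike the original, a field such as `xcomid5` is ignored instead of being parsed as 5, and no exception handling is needed.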
- -__lastupdated__ = """FIXME: last updated""" - + +__lastupdated__ = """$Date$""" + +from mod_python import apache +import urllib + from cdsware import webcomment from cdsware.config import * from cdsware.webuser import getUid, page_not_authorized, isGuestUser from cdsware.webaccount import create_login_page_box, create_register_page_box from cdsware.webpage import page, create_error_box from cdsware.search_engine import create_navtrail_links, guess_primary_collection_of_a_record -from mod_python import apache -import urllib - def index(req): """ Redirects to display function """ req.err_headers_out.add("Location", "%s/comments.py/display?%s" % (weburl, req.args)) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY def display(req, recid=-1, ln=cdslang, do='od', ds='all', nb=100, p=1, voted=-1, reported=-1, reviews=0): """ Display comments (reviews if enabled) associated with record having id recid where recid>0. This function can also be used to display remarks associated with basket having id recid where recid<-99. @param ln: language @param recid: record id, integer @param do: display order hh = highest helpful score, review only lh = lowest helpful score, review only hs = highest star score, review only ls = lowest star score, review only od = oldest date nd = newest date @param ds: display since all= no filtering by date nd = n days ago nw = n weeks ago nm = n months ago ny = n years ago where n is a single digit integer between 0 and 9 @param nb: number of results per page @param p: results page @param voted: boolean, active if user voted for a review, see vote function @param reported: boolean, active if user reported a certain comment/review, see report function @param reviews: boolean, enabled for reviews, disabled for comments @return the full html page. 
""" uid = getUid(req) check_warnings = [] (ok, problem) = webcomment.check_recID_is_in_range(recid, check_warnings, ln) if ok: (body, errors_to_display, warnings) = webcomment.perform_request_display_comments_or_remarks(recID=recid, display_order=do, display_since=ds, nb_per_page=nb, page=p, ln=ln, voted=voted, reported=reported, reviews=reviews) navtrail = create_navtrail_links(cc=guess_primary_collection_of_a_record(recid)) + \ """ > Detailed record #%s""" % (weburl, recid, ln, recid) + \ """ > %s""" % (reviews==1 and "Reviews" or "Comments",) return page(title="", body=body, navtrail=navtrail, description="", keywords="", uid=uid, cdspageheaderadd="", cdspageboxlefttopadd="", cdspageboxleftbottomadd="", cdspageboxrighttopadd="", cdspageboxrightbottomadd="", cdspagefooteradd="", lastupdated="", urlargs="", verbose=1, titleprologue="", titleepilogue="", req=req, errors=errors_to_display, warnings=warnings) else: return page(title="Record Not Found", body=problem, description="", keywords="", uid=uid, cdspageheaderadd="", cdspageboxlefttopadd="", cdspageboxleftbottomadd="", cdspageboxrighttopadd="", cdspageboxrightbottomadd="", cdspagefooteradd="", lastupdated="", urlargs="", verbose=1, titleprologue="", titleepilogue="", req=req, warnings=check_warnings, errors=[]) def add(req, ln=cdslang, recid=-1, action='DISPLAY', msg="", note="", score="", reviews=0, comid=-1): """ Add a comment (review) to record with id recid where recid>0 Also works for adding a remark to basket with id recid where recid<-99 @param ln: languange @param recid: record id @param action: 'DISPLAY' to display add form 'SUBMIT' to submit comment once form is filled 'REPLY' to reply to an already existing comment @param msg: the body of the comment/review or remark @param score: star score of the review @param note: title of the review @param comid: comment id, needed for replying @return the full html page. 
""" actions = ['DISPLAY', 'REPLY', 'SUBMIT'] uid = getUid(req) check_warnings = [] (ok, problem) = webcomment.check_recID_is_in_range(recid, check_warnings, ln) if ok: navtrail = create_navtrail_links(cc=guess_primary_collection_of_a_record(recid)) + \ """ > Detailed record #%s""" % (weburl, recid, ln, recid) + \ """ > %s""" % (weburl, recid, ln, reviews==1 and 'Reviews' or 'Comments') if action not in actions: action = 'DISPLAY' # is page allowed to be viewed if uid == -1 or (not cfg_webcomment_allow_comments and not cfg_comment_allow_reviews): return page_not_authorized(req, "../comments.py/add") # if guest, must log in first if isGuestUser(uid): msg = "Before you add your comment, you need to log in first" referer = "%s/comments.py/add?recid=%s&ln=%s&reviews=%s&comid=%s&action=%s" % (weburl, recid, ln, reviews, comid, action) login_box = create_login_page_box(referer=referer, ln=ln) return page(title="Login", body=msg+login_box, navtrail=navtrail, description="", keywords="", uid=uid, cdspageheaderadd="", cdspageboxlefttopadd="", cdspageboxleftbottomadd="", cdspageboxrighttopadd="", cdspageboxrightbottomadd="", cdspagefooteradd="", lastupdated="", language=cdslang, urlargs="", verbose=1, titleprologue="", titleepilogue="") # user logged in else: (body, errors, warnings) = webcomment.perform_request_add_comment_or_remark(recID=recid, uid=uid, action=action, msg=msg, note=note, score=score, reviews=reviews, comID=comid) title = "Add %s" % (reviews in [1, '1'] and 'Review' or 'Comment') return page(title=title, body=body, navtrail=navtrail, description="", keywords="", uid=uid, cdspageheaderadd="", cdspageboxlefttopadd="", cdspageboxleftbottomadd="", cdspageboxrighttopadd="", cdspageboxrightbottomadd="", cdspagefooteradd="", lastupdated="", language=cdslang, urlargs="", verbose=1, titleprologue="", titleepilogue="", errors=errors, warnings=warnings) else: return page(title="Record Not Found", body=problem, description="", keywords="", uid=uid, cdspageheaderadd="", 
cdspageboxlefttopadd="", cdspageboxleftbottomadd="", cdspageboxrighttopadd="", cdspageboxrightbottomadd="", cdspagefooteradd="", lastupdated="", urlargs="", verbose=1, titleprologue="", titleepilogue="", req=req, warnings=check_warnings, errors=[]) def vote(req, comid=-1, com_value=0, recid=-1, ln=cdslang, do='od', ds='all', nb=100, p=1, referer=None, reviews=0): """ Vote positively or negatively for a comment/review. @param comid: comment/review id @param com_value: +1 to vote positively -1 to vote negatively @param recid: the id of the record the comment/review is associated with @param ln: language @param do: display order hh = highest helpful score, review only lh = lowest helpful score, review only hs = highest star score, review only ls = lowest star score, review only od = oldest date nd = newest date @param ds: display since all= no filtering by date nd = n days ago nw = n weeks ago nm = n months ago ny = n years ago where n is a single digit integer between 0 and 9 @param nb: number of results per page @param p: results page @param referer: http address of the calling function to redirect to (refresh) @param reviews: boolean, enabled for reviews, disabled for comments """ success = webcomment.perform_request_vote(comid, com_value) if referer: referer = referer + '''?recid=%s&ln=%s&do=%s&ds=%s&nb=%s&p=%s&voted=%s&reviews=%s''' % \ (recid, ln, do, ds, nb, p, success, reviews) req.err_headers_out.add("Location", referer) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY else: #Note: sent to comments display req.err_headers_out.add("Location", "%s/comments.py/display?recid=%s&ln=%s&reviews=1&voted=1" % (weburl, recid, ln)) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY def report(req, comid=-1, recid=-1, ln=cdslang, do='od', ds='all', nb=100, p=1, referer=None, reviews=0): """ Report a comment/review for inappropriate content @param comid: comment/review id @param recid: the id of the record the comment/review is associated with @param ln: 
language @param do: display order hh = highest helpful score, review only lh = lowest helpful score, review only hs = highest star score, review only ls = lowest star score, review only od = oldest date nd = newest date @param ds: display since all= no filtering by date nd = n days ago nw = n weeks ago nm = n months ago ny = n years ago where n is a single digit integer between 0 and 9 @param nb: number of results per page @param p: results page @param referer: http address of the calling function to redirect to (refresh) @param reviews: boolean, enabled for reviews, disabled for comments """ success = webcomment.perform_request_report(comid) if referer: referer = referer + '''?recid=%s&ln=%s&do=%s&ds=%s&nb=%s&p=%s&reported=%s&reviews=%s''' % \ (recid, ln, do, ds, nb, p, success, reviews) req.err_headers_out.add("Location", referer) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY else: #Note: sent to comments display req.err_headers_out.add("Location", "%s/comments.py/display?recid=%s&ln=%s&reviews=1&voted=1" % (weburl, recid, ln)) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY diff --git a/modules/webmessage/bin/webmessageadmin.in b/modules/webmessage/bin/webmessageadmin.in index 9025f2607..19d16a67d 100644 --- a/modules/webmessage/bin/webmessageadmin.in +++ b/modules/webmessage/bin/webmessageadmin.in @@ -1,100 +1,98 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """WebMessage Admin -- clean messages""" __version__ = "$Id$" try: import getpass import readline import sys - pylibdir = "/home/cdsware/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import supportemail from cdsware.webmessage_dblayer import clean_messages from cdsware.dbquery import run_sql except ImportError, e: print "Error: %s" % e import sys sys.exit(1) def usage(code, msg=''): """Print usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("WebMessage Admin -- cleans forgotten messages") sys.stderr.write("Usage: %s [options] \n" % sys.argv[0]) sys.stderr.write("Command options:\n") sys.stderr.write(" = clean\n") sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.exit(code) def main(): """CLI to clean_messages. The function finds the needed arguments in sys.argv. If the number of arguments is wrong it prints help. Return 1 on success, 0 on failure. 
""" alen = len(sys.argv) action = '' # print help if wrong arguments if alen > 1 and sys.argv[1] in ["-h", "--help"]: usage(0) elif alen > 1 and sys.argv[1] in ["-V", "--version"]: print __version__ sys.exit(0) if alen != 2 or sys.argv[1] not in ['clean']: usage(1) # getting input from user print 'User: ', user = raw_input() password = getpass.getpass() # validating input perform = 0 # check password if user == supportemail: perform = run_sql("""select * from user where email = '%s' and password = '%s' """ % (supportemail, password)) and 1 or 0 if not perform: # wrong password or user not recognized print 'User not authorized' return perform # perform chosen action if sys.argv[1] == 'clean': cleaned = clean_messages() print 'Database cleaned. %i suppressed messages' % int(cleaned) return perform if __name__ == '__main__': main() diff --git a/modules/websearch/bin/webcoll.in b/modules/websearch/bin/webcoll.in index ec8f139da..4a8144cd1 100644 --- a/modules/websearch/bin/webcoll.in +++ b/modules/websearch/bin/webcoll.in @@ -1,1156 +1,1154 @@ #!@PYTHON@ ## -*- mode: python; coding: utf-8; -*- ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Creates CDSware collection specific pages, using WML and MySQL configuration tables.""" __version__ = "$Id$" ## import modules: try: import calendar import copy import getopt import getpass import marshal import signal import sys import cgi import sre import os import math import string import urllib import zlib import MySQLdb import Numeric import time import traceback except ImportError, e: print "Error: %s" % e import sys sys.exit(1) try: - pylibdir = "@prefix@/lib/python" - sys.path.append('%s' % pylibdir) from cdsware.config import * from cdsware.messages import gettext_set_language, language_list_long from cdsware.search_engine import HitSet, search_pattern, get_creation_date, nice_number, get_field_i18nname from cdsware.search_engine_config import cfg_author_et_al_threshold, cfg_instant_browse, cfg_max_recID, cfg_narrow_search_show_grandsons from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.bibrank_record_sorter import get_bibrank_methods import cdsware.template websearch_templates = cdsware.template.load('websearch') except ImportError, e: print "Error: %s" % e import sys sys.exit(1) ## global vars collection_house = {} # will hold collections we treat in this run of the program; a dict of {collname2, collobject1}, ... options = {} # will hold task options # cfg_cache_last_updated_timestamp_tolerance -- cache timestamp # tolerance (in seconds), to account for the fact that an admin might # accidentally happen to edit the collection definitions at exactly # the same second when some webcoll process was about to be started. 
# In order to be safe, let's put an exaggerated timestamp tolerance # value such as 20 seconds: cfg_cache_last_updated_timestamp_tolerance = 20 # cfg_cache_last_updated_timestamp_file -- location of the cache # timestamp file: cfg_cache_last_updated_timestamp_file = "%s/collections/last_updated" % cachedir def get_collection(colname): """Return collection object from the collection house for given colname. If does not exist, then create it.""" if not collection_house.has_key(colname): colobject = Collection(colname) collection_house[colname] = colobject return collection_house[colname] ## auxiliary functions: def mymkdir(newdir, mode=0777): """works the way a good mkdir should :) - already exists, silently complete - regular file in the way, raise an exception - parent directory(ies) does not exist, make them as well """ if os.path.isdir(newdir): pass elif os.path.isfile(newdir): raise OSError("a file with the same name as the desired " \ "dir, '%s', already exists." % newdir) else: head, tail = os.path.split(newdir) if head and not os.path.isdir(head): mymkdir(head, mode) if tail: os.umask(022) os.mkdir(newdir, mode) def escape_string(s): "Escapes special chars in string. For MySQL queries." s = MySQLdb.escape_string(s) return s def is_selected(var, fld): "Checks if the two are equal, and if yes, returns ' selected'. Useful for select boxes." if var == fld: return " selected" else: return "" def write_message(msg, stream=sys.stdout): """Write message and flush output stream (may be sys.stdout or sys.stderr). Useful for debugging stuff.""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) stream.flush() else: sys.stderr.write("Unknown stream %s. [must be sys.stdout or sys.stderr]\n" % stream) return def get_field(recID, tag): "Gets list of field 'tag' for the record with 'recID' system number." 
out = [] digit = tag[0:2] bx = "bib%sx" % digit bibx = "bibrec_bib%sx" % digit query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag='%s'" \ % (bx, bibx, recID, tag) res = run_sql(query) for row in res: out.append(row[0]) return out def print_record(recID, format='hb', ln=cdslang): "Prints record 'recID' formatted according to 'format'." out = "" # HTML brief format by default query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format) res = run_sql(query, None, 1) if res: # record 'recID' is formatted in 'format', so print it out += "%s" % zlib.decompress(res[0][0]) else: # record 'recID' does not exist in format 'format', so print some default format: # firstly, title: titles = get_field(recID, "245__a") # secondly, authors: authors = get_field(recID, "100__a") + get_field(recID, "700__a") # thirdly, date of creation: dates = get_field(recID, "260__c") # thirdly bis, report numbers: rns = get_field(recID, "037__a") + get_field(recID, "088__a") # fourthly, beginning of abstract: abstracts = get_field(recID, "520__a") # fifthly, fulltext link: urls_z = get_field(recID, "8564_z") urls_u = get_field(recID, "8564_u") out += websearch_templates.tmpl_record_body( weburl = weburl, titles = titles, authors = authors, dates = dates, rns = rns, abstracts = abstracts, urls_u = urls_u, urls_z = urls_z ) # at the end of HTML mode, print "Detailed record" and "Mark record" functions: out += websearch_templates.tmpl_record_links( weburl = weburl, recid = recID, ln = ln ) return out class Collection: "Holds the information on collections (id,name,dbquery)." def __init__(self, name=""): "Creates collection instance by querying the MySQL configuration database about 'name'."
self.calculate_reclist_run_already = 0 # to speed things up without much refactoring self.update_reclist_run_already = 0 # to speed things up without much refactoring self.reclist_with_nonpublic_subcolls = HitSet() if not name: self.name = cdsname # by default we are working on the home page self.id = 1 self.dbquery = None self.nbrecs = None self.reclist = HitSet() else: self.name = name query = "SELECT id,name,dbquery,nbrecs,reclist FROM collection WHERE name='%s'" % escape_string(name) try: res = run_sql(query, None, 1) if res: self.id = res[0][0] self.name = res[0][1] self.dbquery = res[0][2] self.nbrecs = res[0][3] try: self.reclist = HitSet(Numeric.loads(zlib.decompress(res[0][5]))) except: self.reclist = HitSet() else: # collection does not exist! self.id = None self.dbquery = None self.nbrecs = None self.reclist = HitSet() except MySQLdb.Error, e: print "Error %d: %s" % (e.args[0], e.args[1]) sys.exit(1) def get_name(self, ln=cdslang, name_type="ln", prolog="", epilog="", prolog_suffix=" ", epilog_suffix=""): """Return nicely formatted collection name for language LN. The NAME_TYPE may be 'ln' (=long name), 'sn' (=short name), etc.""" out = prolog i18name = "" res = run_sql("SELECT value FROM collectionname WHERE id_collection=%s AND ln=%s AND type=%s", (self.id, ln, name_type)) try: i18name += res[0][0] except IndexError: pass if i18name: out += i18name else: out += self.name out += epilog return out def get_ancestors(self): "Returns list of ancestors of the current collection." ancestors = [] id_son = self.id while 1: query = "SELECT cc.id_dad,c.name FROM collection_collection AS cc, collection AS c "\ "WHERE cc.id_son=%d AND c.id=cc.id_dad" % int(id_son) res = run_sql(query, None, 1) if res: col_ancestor = get_collection(res[0][1]) ancestors.append(col_ancestor) id_son = res[0][0] else: break ancestors.reverse() return ancestors def restricted_p(self): """Predicate to test if the collection is restricted or not.
Return the content of the `restricted' column of the collection table (typically Apache group). Otherwise return None if the collection is public.""" out = None query = "SELECT restricted FROM collection WHERE id=%d" % self.id res = run_sql(query, None, 1) try: out = res[0][0] except: pass return out def get_sons(self, type='r'): "Returns list of direct sons of type 'type' for the current collection." sons = [] id_dad = self.id query = "SELECT cc.id_son,c.name FROM collection_collection AS cc, collection AS c "\ "WHERE cc.id_dad=%d AND cc.type='%s' AND c.id=cc.id_son ORDER BY score DESC, c.name ASC" % (int(id_dad), type) res = run_sql(query) for row in res: sons.append(get_collection(row[1])) return sons def get_descendants(self, type='r'): "Returns list of all descendants of type 'type' for the current collection." descendants = [] id_dad = self.id query = "SELECT cc.id_son,c.name FROM collection_collection AS cc, collection AS c "\ "WHERE cc.id_dad=%d AND cc.type='%s' AND c.id=cc.id_son ORDER BY score DESC" % (int(id_dad), type) res = run_sql(query) for row in res: col_desc = get_collection(row[1]) descendants.append(col_desc) descendants += col_desc.get_descendants() return descendants def write_cache_file(self, filename='', filebody=''): "Write a file inside collection cache." # open file: dirname = "%s/collections/%d" % (cachedir, self.id) mymkdir(dirname) fullfilename = dirname + "/%s.html" % filename try: os.umask(022) f = open(fullfilename, "w") except IOError, v: try: (code, message) = v except: code = 0 message = v print "I/O Error: " + str(message) + " (" + str(code) + ")" sys.exit(1) # print user info: if options["verbose"] >= 6: write_message("... 
creating %s" % fullfilename) sys.stdout.flush() # print page body: f.write(filebody) # close file: f.close() def update_webpage_cache(self): """Create collection page header, navtrail, body (including left and right stripes) and footer, and call write_cache_file() afterwards to update the collection webpage cache.""" ## do this for each language: for lang, lang_fullname in language_list_long(): # load the right message language _ = gettext_set_language(lang) ## first, update navtrail: for as in range(0,2): self.write_cache_file("navtrail-as=%s-ln=%s" % (as, lang), self.create_navtrail_links(as, lang)) ## second, update page body: for as in range(0,2): # do both simple search and advanced search pages: body = websearch_templates.tmpl_webcoll_body( weburl = weburl, te_portalbox = self.create_portalbox(lang, 'te'), searchfor = self.create_searchfor(as, lang), np_portalbox = self.create_portalbox(lang, 'np'), narrowsearch = self.create_narrowsearch(as, lang, _("Narrow by collection:")), focuson = self.create_narrowsearch(as, lang, _("Focus on:"), "v"), ne_portalbox = self.create_portalbox(lang, 'ne') ) self.write_cache_file("body-as=%s-ln=%s" % (as, lang), body) ## third, write portalboxes: self.write_cache_file("portalbox-tp-ln=%s" % lang, self.create_portalbox(lang, "tp")) self.write_cache_file("portalbox-te-ln=%s" % lang, self.create_portalbox(lang, "te")) self.write_cache_file("portalbox-lt-ln=%s" % lang, self.create_portalbox(lang, "lt")) self.write_cache_file("portalbox-rt-ln=%s" % lang, self.create_portalbox(lang, "rt")) ## fourth, write 'last updated' information: self.write_cache_file("last-updated-ln=%s" % lang, time.strftime("%02d %b %04Y %02H:%02M:%02S %Z", time.localtime())) return def create_navtrail_links(self, \ as=0, ln=cdslang, separator=" > "): """Creates navigation trail links, i.e. links to collection ancestors (except Home collection). If as==1, then links to Advanced Search interfaces; otherwise Simple Search. 
""" dads = [] for dad in self.get_ancestors(): if dad.name != cdsname: # exclude Home collection dads.append ((dad.name, dad.get_name(ln))) return websearch_templates.tmpl_navtrail_links( as = as, ln = ln, weburl = weburl, separator = separator, dads = dads) def create_nbrecs_info(self, ln=cdslang, prolog = None, epilog = None): "Return information on the number of records." return websearch_templates.tmpl_nbrecs_info (number = nice_number (self.nbrecs, ln), prolog = prolog, epilog = epilog) def create_portalbox(self, lang=cdslang, position="rt"): """Creates portalboxes of language CDSLANG of the position POSITION by consulting MySQL configuration database. The position may be: 'lt'='left top', 'rt'='right top', etc.""" out = "" query = "SELECT p.title,p.body FROM portalbox AS p, collection_portalbox AS cp "\ " WHERE cp.id_collection=%d AND p.id=cp.id_portalbox AND cp.ln='%s' AND cp.position='%s' "\ " ORDER BY cp.score DESC" % (self.id, lang, position) res = run_sql(query) for row in res: title, body = row[0], row[1] if title: out += websearch_templates.tmpl_portalbox(title = title, body = body) else: # no title specified, so print body ``as is'' only: out += body return out def create_narrowsearch(self, as=0, ln=cdslang, title="Narrow search", type="r"): """Creates list of collection descendants of type 'type' under title 'title'. If as==1, then links to Advanced Search interfaces; otherwise Simple Search. 
Suitable for 'Narrow search' and 'Focus on' boxes.""" # get list of sons and analyse it sons = self.get_sons(type) # return nothing for type 'v' (virtual collection) if there are no sons: if type == 'v' and not sons: return "" # load instant browse parts, in case no descendants if not len(sons) and type == 'r': instant_browse = self.create_instant_browse(ln=ln) else: instant_browse = '' # get descendents descendants = self.get_descendants(type) grandsons = [] if cfg_narrow_search_show_grandsons: # load grandsons for each son for son in sons: grandsons.append(son.get_sons()) # return "" return websearch_templates.tmpl_narrowsearch( as = as, ln = ln, weburl = weburl, type = type, father = self, has_grandchildren = len(descendants)>len(sons), title = title, instant_browse = instant_browse, sons = sons, display_grandsons = cfg_narrow_search_show_grandsons, grandsons = grandsons ) def create_instant_browse(self, rg=cfg_instant_browse, ln=cdslang): "Searches database and produces list of last 'rg' records." box = "" if self.restricted_p(): return websearch_templates.tmpl_box_restricted_content(ln = ln) else: url = "%s/search.py?cc=%s&jrec=%d" % (weburl, urllib.quote_plus(self.name), rg+1) if self.nbrecs and self.reclist: # firstly, get last 'rg' records: recIDs = Numeric.nonzero(self.reclist._set) passIDs = [] for idx in range(self.nbrecs-1, self.nbrecs-rg-1, -1): if idx>=0: passIDs.append({'id' : recIDs[idx], 'body' : print_record(recIDs[idx], ln=ln) }) if not (self.nbrecs > rg): url = "" recdates = [] for recid in passIDs: recid['date'] = get_creation_date(recid['id'], fmt="%Y-%m-%d
%H:%i") return websearch_templates.tmpl_instant_browse( ln = ln, recids = passIDs, more_link = url ) else: return websearch_templates.tmpl_box_no_records(ln = ln) return box def create_searchoptions(self): "Produces 'Search options' portal box." box="" query = """SELECT DISTINCT(cff.id_field),f.code,f.name FROM collection_field_fieldvalue AS cff, field AS f WHERE cff.id_collection=%d AND cff.id_fieldvalue IS NOT NULL AND cff.id_field=f.id ORDER BY cff.score DESC""" % self.id res = run_sql(query) if res: for row in res: field_id = row[0] field_code = row[1] field_name = row[2] query_bis = """SELECT fv.value,fv.name FROM fieldvalue AS fv, collection_field_fieldvalue AS cff WHERE cff.id_collection=%d AND cff.type='seo' AND cff.id_field=%d AND fv.id=cff.id_fieldvalue ORDER BY cff.score_fieldvalue DESC, cff.score DESC, fv.name ASC""" % (self.id, field_id) res_bis = run_sql(query_bis) if res_bis: values = [{'value' : '', 'text' : 'any' + field_name}] # @todo internationalisation of "any" for row_bis in res_bis: values.append({'value' : cgi.escape(row_bis[0], 1), 'text' : row_bis[1]}) box += websearch_templates.tmpl_select( fieldname = field_code, values = values ) return box def create_sortoptions(self, ln=cdslang): "Produces 'Sort options' portal box." 
# load the right message language _ = gettext_set_language(ln) box="" query = """SELECT f.code,f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE id_collection=%d AND cff.type='soo' AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""" % self.id values = [{'value' : '', 'text': "- %s -" % _("latest first")}] res = run_sql(query) if res: for row in res: values.append({'value' : row[0], 'text': row[1]}) else: for tmp in ('title', 'author', 'report number', 'year'): values.append({'value' : tmp.replace(' ', ''), 'text' : get_field_i18nname(tmp, ln)}) box = websearch_templates.tmpl_select( fieldname = 'sf', css_class = 'address', values = values ) box += websearch_templates.tmpl_select( fieldname = 'so', css_class = 'address', values = [ {'value' : 'a' , 'text' : _("asc.")}, {'value' : 'd' , 'text' : _("desc.")} ] ) return box def create_rankoptions(self, ln=cdslang): "Produces 'Rank options' portal box." # load the right message language _ = gettext_set_language(ln) values = [{'value' : '', 'text': "- %s %s -" % (string.lower(_("OR")), _("rank by"))}] for (code,name) in get_bibrank_methods(self.id, ln): values.append({'value' : code, 'text': name}) box = websearch_templates.tmpl_select( fieldname = 'sf', css_class = 'address', values = values ) return box def create_displayoptions(self, ln=cdslang): "Produces 'Display options' portal box." # load the right message language _ = gettext_set_language(ln) values = [] for i in ['10', '25', '50', '100', '250', '500']: values.append({'value' : i, 'text' : i + ' ' + _("results")}) box = websearch_templates.tmpl_select( fieldname = 'rg', css_class = 'address', values = values ) if self.get_sons(): box += websearch_templates.tmpl_select( fieldname = 'sc', css_class = 'address', values = [ {'value' : '1' , 'text' : _("split by collection")}, {'value' : '0' , 'text' : _("single list")} ] ) return box def create_formatoptions(self, ln=cdslang): "Produces 'Output format options' portal box." 
# load the right message language _ = gettext_set_language(ln) box = "" values = [] query = """SELECT f.code,f.name FROM format AS f, collection_format AS cf WHERE cf.id_collection=%d AND cf.id_format=f.id ORDER BY cf.score DESC, f.name ASC""" % self.id res = run_sql(query) if res: for row in res: values.append({'value' : row[0], 'text': row[1]}) else: values.append({'value' : 'hb', 'text' : "HTML %s" % _("brief")}) box = websearch_templates.tmpl_select( fieldname = 'of', css_class = 'address', values = values ) def create_searchwithin_selection_box(self, fieldname='f', value='', ln='en'): "Produces 'search within' selection box for the current collection." # get values query = """SELECT f.code,f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE cff.type='sew' AND cff.id_collection=%d AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""" % self.id res = run_sql(query) values = [{'value' : '', 'text' : get_field_i18nname("any field", ln)}] if res: for row in res: values.append({'value' : row[0], 'text' : row[1]}) else: if cfg_cern_site: for tmp in ['title', 'author', 'abstract', 'report number', 'year']: values.append({'value' : tmp.replace(' ', ''), 'text' : get_field_i18nname(tmp, ln)}) else: for tmp in ['title', 'author', 'abstract', 'keyword', 'report number', 'year', 'fulltext', 'reference']: values.append({'value' : tmp.replace(' ', ''), 'text' : get_field_i18nname(tmp, ln)}) return websearch_templates.tmpl_searchwithin_select( fieldname = fieldname, ln = ln, selected = value, values = values ) def create_searchexample(self): "Produces search example(s) for the current collection." out = "$collSearchExamples = getSearchExample(%d, $se);" % self.id return out def create_searchfor(self, as=0, ln=cdslang): "Produces either Simple or Advanced 'Search for' box for the current collection." 
if as == 1: return self.create_searchfor_advanced(ln) else: return self.create_searchfor_simple(ln) def create_searchfor_simple(self, ln=cdslang): "Produces simple 'Search for' box for the current collection." # load the right message language _ = gettext_set_language(ln) if self.name != cdsname: ssearchurl = "?c=%s&as=0&ln=%s" % (urllib.quote_plus(self.name), ln) asearchurl = "?c=%s&as=1&ln=%s" % (urllib.quote_plus(self.name), ln) else: # hide cdsname for aesthetical reasons ssearchurl = "?as=0&ln=%s" % ln asearchurl = "?as=1&ln=%s" % ln return websearch_templates.tmpl_searchfor_simple( ln = ln, weburl = weburl, asearchurl = asearchurl, header = _("Search %s records for:") % self.create_nbrecs_info(ln, "",""), middle_option = self.create_searchwithin_selection_box(ln=ln), ) def create_searchfor_advanced(self, ln=cdslang): "Produces advanced 'Search for' box for the current collection." # load the right message language _ = gettext_set_language(ln) if self.name != cdsname: ssearchurl = "?c=%s&as=0&ln=%s" % (urllib.quote_plus(self.name), ln) asearchurl = "?c=%s&as=1&ln=%s" % (urllib.quote_plus(self.name), ln) else: # hide cdsname for aesthetical reasons ssearchurl = "?as=0&ln=%s" % ln asearchurl = "?as=1&ln=%s" % ln return websearch_templates.tmpl_searchfor_advanced( ln = ln, weburl = weburl, ssearchurl = ssearchurl, header = _("Search %s records for:") % self.create_nbrecs_info(ln, "",""), middle_option_1 = self.create_searchwithin_selection_box('f1', ln=ln), middle_option_2 = self.create_searchwithin_selection_box('f2', ln=ln), middle_option_3 = self.create_searchwithin_selection_box('f3', ln=ln), searchoptions = self.create_searchoptions(), sortoptions = self.create_sortoptions(ln), rankoptions = self.create_rankoptions(ln), displayoptions = self.create_displayoptions(ln), formatoptions = self.create_formatoptions(ln) ) def calculate_reclist(self): """Calculate, set and return the (reclist, reclist_with_nonpublic_subcolls) tuple for given collection.""" if 
self.calculate_reclist_run_already: # do we have to recalculate? return (self.reclist, self.reclist_with_nonpublic_subcolls) if options["verbose"] >= 6: write_message("... calculating reclist of %s" % self.name) reclist = HitSet() # will hold results for public sons only; good for storing into DB reclist_with_nonpublic_subcolls = HitSet() # will hold results for both public and nonpublic sons; good for deducing total # number of documents if not self.dbquery: # A - collection does not have dbquery, so query recursively all its sons # that are either non-restricted or that have the same restriction rules for coll in self.get_sons(): coll_reclist, coll_reclist_with_nonpublic_subcolls = coll.calculate_reclist() if ((coll.restricted_p() is None) or (coll.restricted_p() == self.restricted_p())): # add this reclist ``for real'' only if it is public reclist.union(coll_reclist) reclist_with_nonpublic_subcolls.union(coll_reclist_with_nonpublic_subcolls) else: # B - collection does have dbquery, so compute it: reclist = search_pattern(None,self.dbquery) reclist_with_nonpublic_subcolls = copy.deepcopy(reclist) # deduce the number of records: reclist.calculate_nbhits() reclist_with_nonpublic_subcolls.calculate_nbhits() # store the results: self.nbrecs = reclist_with_nonpublic_subcolls._nbhits self.reclist = reclist self.reclist_with_nonpublic_subcolls = reclist_with_nonpublic_subcolls # last but not least, update the speed-up flag: self.calculate_reclist_run_already = 1 # return the two sets: return (self.reclist, self.reclist_with_nonpublic_subcolls) def update_reclist(self): "Update the record universe for given collection; nbrecs, reclist of the collection table." if self.update_reclist_run_already: # do we have to reupdate? return 0 if options["verbose"] >= 6: write_message("... 
updating reclist of %s (%s recs)" % (self.name, self.nbrecs)) sys.stdout.flush() try: query = "UPDATE collection SET nbrecs=%d, reclist='%s' WHERE id=%d" % \ (self.nbrecs, escape_string(zlib.compress(Numeric.dumps(self.reclist._set))), self.id) res = run_sql(query) self.reclist_updated_since_start = 1 except MySQLdb.Error, e: print "Database Query Error %d: %s." % (e.args[0], e.args[1]) sys.exit(1) # last but not least, update the speed-up flag: self.update_reclist_run_already = 1 return 0 def usage(code, msg=''): "Prints usage info." if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("Usage: %s [collection][+]\n" % sys.argv[0]) sys.stderr.write("""Description: %s updates the collection cache (record universe for a given collection plus web page elements) based on WML and MySQL configuration parameters. If the collection name is passed as the second argument, it'll update this collection only. If the collection name is immediately followed by a plus sign, it will also update all its descendants. The top-level collection name may be entered as the void string.\n""" % sys.argv[0]) sys.stderr.write("Example: %s update-reclist\n" % sys.argv[0]) sys.stderr.write("Example: %s update-webpage\n" % sys.argv[0]) sys.stderr.write("Example: %s update-webpage \"Articles & Preprints\"\n" % sys.argv[0]) sys.stderr.write("Example: %s update-webpage \"Articles & Preprints\"+\n" % sys.argv[0]) sys.stderr.write("Example: %s update-webpage \"\"\n" % sys.argv[0]) sys.stderr.write("Example: %s update-reclist \"\"+\n" % sys.argv[0]) sys.exit(code) def get_datetime(var, format_string="%Y-%m-%d %H:%M:%S"): """Returns a date string according to the format string.
It can handle normal date strings and shifts with respect to now.""" date = time.time() shift_re=sre.compile("([-\+]{0,1})([\d]+)([dhms])") factors = {"d":24*3600, "h":3600, "m":60, "s":1} m = shift_re.match(var) if m: sign = m.groups()[0] == "-" and -1 or 1 factor = factors[m.groups()[2]] value = float(m.groups()[1]) date = time.localtime(date + sign * factor * value) date = time.strftime(format_string, date) else: date = time.strptime(var, format_string) date = time.strftime(format_string, date) return date def get_current_time_timestamp(): """Return timestamp corresponding to the current time.""" return time.strftime("%04Y-%02m-%02d %02H:%02M:%02S", time.localtime()) def compare_timestamps_with_tolerance(timestamp1, timestamp2, tolerance=0): """Compare two timestamps TIMESTAMP1 and TIMESTAMP2, of the form '2005-03-31 17:37:26'. Optionally receives a TOLERANCE argument (in seconds). Return -1 if TIMESTAMP1 is less than TIMESTAMP2 minus TOLERANCE, 0 if they are equal within TOLERANCE limit, and 1 if TIMESTAMP1 is greater than TIMESTAMP2 plus TOLERANCE. """ # remove any trailing .00 in timestamps: timestamp1 = sre.sub(r'\.[0-9]+$', '', timestamp1) timestamp2 = sre.sub(r'\.[0-9]+$', '', timestamp2) # first convert timestamps to Unix epoch seconds: timestamp1_seconds = calendar.timegm(time.strptime(timestamp1, "%Y-%m-%d %H:%M:%S")) timestamp2_seconds = calendar.timegm(time.strptime(timestamp2, "%Y-%m-%d %H:%M:%S")) # now compare them: if timestamp1_seconds < timestamp2_seconds - tolerance: return -1 elif timestamp1_seconds > timestamp2_seconds + tolerance: return 1 else: return 0 def get_database_last_updated_timestamp(): """Return last updated timestamp for collection-related and record-related database tables. 
""" database_tables_timestamps = [] database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'bibrec'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'bibfmt'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'idxWORD%%'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'collection%%'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'portalbox'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'field%%'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'format%%'"))) database_tables_timestamps.extend(map(lambda x: str(x[11]), run_sql("SHOW TABLE STATUS LIKE 'rnkMETHODNAME'"))) return max(database_tables_timestamps) def get_cache_last_updated_timestamp(): """Return last updated cache timestamp.""" try: f = open(cfg_cache_last_updated_timestamp_file, "r") except: return "1970-01-01 00:00:00" timestamp = f.read() f.close() return timestamp def set_cache_last_updated_timestamp(timestamp): """Set last updated cache timestamp to TIMESTAMP. Do nothing if the cache timestamp file cannot be opened for writing.""" try: f = open(cfg_cache_last_updated_timestamp_file, "w") except IOError: return timestamp f.write(timestamp) f.close() return timestamp def write_message(msg, stream=sys.stdout): """Print message and flush the output stream (must be sys.stdout or sys.stderr).""" if stream == sys.stdout or stream == sys.stderr: stream.write(time.strftime("%Y-%m-%d %H:%M:%S --> ", time.localtime())) stream.write("%s\n" % msg) stream.flush() else: sys.stderr.write("Unknown stream %s.
[must be sys.stdout or sys.stderr]\n" % stream) def task_sig_sleep(sig, frame): """Signal handler for the 'sleep' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("sleeping...") task_update_status("SLEEPING") signal.pause() # wait for wake-up signal def task_sig_wakeup(sig, frame): """Signal handler for the 'wakeup' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("continuing...") task_update_status("CONTINUING") def task_sig_stop(sig, frame): """Signal handler for the 'stop' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("stopping...") task_update_status("STOPPING") pass # FIXME: is there anything to be done? task_update_status("STOPPED") sys.exit(0) def task_sig_suicide(sig, frame): """Signal handler for the 'suicide' signal sent by BibSched.""" if options["verbose"] >= 9: write_message("got signal %d" % sig) write_message("suiciding myself now...") task_update_status("SUICIDING") write_message("suicided") task_update_status("SUICIDED") sys.exit(0) def task_sig_unknown(sig, frame): """Signal handler for the other unknown signals sent by shell or user.""" write_message("unknown signal %d ignored" % sig) # do nothing for other signals def authenticate(user, header="WebColl Task Submission", action="runwebcoll"): """Authenticate the user against the user database. Check for its password, if it exists. Check for action access rights. Return user name upon authorization success, do system exit upon authorization failure. """ print header print "=" * len(header) if user == "": print >> sys.stdout, "\rUsername: ", user = string.strip(string.lower(sys.stdin.readline())) else: print >> sys.stdout, "\rUsername: ", user ## first check user pw: res = run_sql("select id,password from user where email=%s", (user,), 1) if not res: print "Sorry, %s does not exist." 
% user sys.exit(1) else: (uid_db, password_db) = res[0] if password_db: password_entered = getpass.getpass() if password_db == password_entered: pass else: print "Sorry, wrong credentials for %s." % user sys.exit(1) ## secondly check authorization for the action: (auth_code, auth_message) = acc_authorize_action(uid_db, action) if auth_code != 0: print auth_message sys.exit(1) return user def task_submit(options): """Submits task to the BibSched task queue. This is what people will be invoking via command line.""" ## sanity check: remove any leftover "task" option: if options.has_key("task"): del options["task"] ## authenticate user: user = authenticate(options.get("user", "")) ## submit task: if options["verbose"] >= 9: print "" write_message("storing task options %s\n" % options) task_id = run_sql("""INSERT INTO schTASK (id,proc,user,runtime,sleeptime,status,arguments) VALUES (NULL,'webcoll',%s,%s,%s,'WAITING',%s)""", (user, options["runtime"], options["sleeptime"], marshal.dumps(options))) ## update task number: options["task"] = task_id run_sql("""UPDATE schTASK SET arguments=%s WHERE id=%s""", (marshal.dumps(options),task_id)) write_message("Task #%d submitted."
% task_id) return task_id def task_update_progress(msg): """Updates progress information in the BibSched task table.""" global task_id return run_sql("UPDATE schTASK SET progress=%s where id=%s", (msg, task_id)) def task_update_status(val): """Updates status information in the BibSched task table.""" global task_id return run_sql("UPDATE schTASK SET status=%s where id=%s", (val, task_id)) def task_read_status(task_id): """Read status information in the BibSched task table.""" res = run_sql("SELECT status FROM schTASK where id=%s", (task_id,), 1) try: out = res[0][0] except: out = 'UNKNOWN' return out def task_get_options(id): """Returns options for the task 'id' read from the BibSched task queue table.""" out = {} res = run_sql("SELECT arguments FROM schTASK WHERE id=%s AND proc='webcoll'", (id,)) try: out = marshal.loads(res[0][0]) except: write_message("Error: WebColl task %d does not seem to exist." % id) sys.exit(1) return out def task_run(): """Run the WebColl task by fetching arguments from the BibSched task queue. This is what BibSched will be invoking via daemon call. The task updates the collection reclist cache and the collection web pages for the given collection (default: all collections). Arguments are described in the usage() function. Return 1 on success and 0 on failure.""" global task_id, options task_run_start_timestamp = get_current_time_timestamp() options = task_get_options(task_id) # get options from BibSched task table ## check task id: if not options.has_key("task"): write_message("Error: The task #%d does not seem to be a WebColl task." % task_id) return 0 ## check task status: task_status = task_read_status(task_id) if task_status != "WAITING": write_message("Error: The task #%d is %s. I expected WAITING." % (task_id, task_status)) return 0 ## we can run the task now: if options["verbose"]: write_message("Task #%d started."
% task_id) task_update_status("RUNNING") ## initialize signal handler: signal.signal(signal.SIGUSR1, task_sig_sleep) signal.signal(signal.SIGTERM, task_sig_stop) signal.signal(signal.SIGABRT, task_sig_suicide) signal.signal(signal.SIGCONT, task_sig_wakeup) signal.signal(signal.SIGINT, task_sig_unknown) colls = [] # decide whether we need to run or not, by comparing last updated timestamps: if options["verbose"] >= 3: write_message("Database timestamp is %s." % get_database_last_updated_timestamp()) write_message("Collection cache timestamp is %s." % get_cache_last_updated_timestamp()) if options.has_key("force") or \ compare_timestamps_with_tolerance(get_database_last_updated_timestamp(), get_cache_last_updated_timestamp(), cfg_cache_last_updated_timestamp_tolerance) >= 0: ## either forced update was requested or cache is not up to date, so recreate it: # firstly, decide which collections to do: if options.has_key("collection"): coll = get_collection(options["collection"]) if coll.id == None: usage(1, 'Collection %s does not exist' % coll.name) colls.append(coll) else: res = run_sql("SELECT name FROM collection ORDER BY id") for row in res: colls.append(get_collection(row[0])) # secondly, update collection reclist cache: i = 0 for coll in colls: i += 1 if options["verbose"]: write_message("%s / reclist cache update" % coll.name) coll.calculate_reclist() coll.update_reclist() task_update_progress("Part 1/2: done %d/%d" % (i,len(colls))) # thirdly, update collection webpage cache: i = 0 for coll in colls: i += 1 if options["verbose"]: write_message("%s / web cache update" % coll.name) coll.update_webpage_cache() task_update_progress("Part 2/2: done %d/%d" % (i,len(colls))) # finally update the cache last updated timestamp: # (but only when all collections were updated, not when only # some of them were forced-updated as per admin's demand) if not options.has_key("collection"): set_cache_last_updated_timestamp(task_run_start_timestamp) if options["verbose"] >= 3: 
write_message("Collection cache timestamp is set to %s." % get_cache_last_updated_timestamp()) else: ## cache up to date, we don't have to run if options["verbose"]: write_message("Collection cache is up to date, no need to run.") pass ## we are done: task_update_progress("Done.") task_update_status("DONE") if options["verbose"]: write_message("Task #%d finished." % task_id) return 1 def usage(exitcode=1, msg=""): """Prints usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("Usage: %s [options]\n" % sys.argv[0]) sys.stderr.write("Command options:\n") sys.stderr.write(" -c, --collection=COLL\t Update cache for the given collection only. [all]\n") sys.stderr.write(" -f, --force\t Force update even if cache is up to date. [no]\n") sys.stderr.write("Scheduling options:\n") sys.stderr.write(" -u, --user=USER \t User name to submit the task as, password needed.\n") sys.stderr.write(" -t, --runtime=TIME \t Time to execute the task (now), e.g.: +15s, 5m, 3h, 2002-10-27 13:57:26\n") sys.stderr.write(" -s, --sleeptime=SLEEP \t Sleeping frequency after which to repeat task (no), e.g.: 30m, 2h, 1d\n") sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.stderr.write(" -v, --verbose=LEVEL \t Verbose level (from 0 to 9, default 1).\n") sys.stderr.write("""Description: %s updates the collection cache (record universe for a given collection plus web page elements) based on WML and MySQL configuration parameters. If a collection name is passed via the -c option, it'll update this collection only. If the collection name is immediately followed by a plus sign, it will also update all its descendants. The top-level collection name may be entered as the empty string.\n""" % sys.argv[0]) sys.exit(exitcode) def main(): """Main function that analyzes command line input and calls whatever is appropriate.
Useful for learning how to write BibSched tasks.""" global task_id ## parse command line: if len(sys.argv) == 2 and sys.argv[1].isdigit(): ## A - run the task task_id = int(sys.argv[1]) try: if not task_run(): write_message("Error occurred. Exiting.", sys.stderr) except StandardError, e: write_message("Unexpected error occurred: %s." % e, sys.stderr) write_message("Traceback is:", sys.stderr) traceback.print_tb(sys.exc_info()[2]) write_message("Exiting.", sys.stderr) task_update_status("ERROR") else: ## B - submit the task # set default values: options["runtime"] = time.strftime("%Y-%m-%d %H:%M:%S") options["verbose"] = 1 options["sleeptime"] = "" # set user-defined options: try: opts, args = getopt.getopt(sys.argv[1:], "hVv:u:s:t:c:f", ["help", "version", "verbose=","user=","sleeptime=","runtime=","collection=","force"]) except getopt.GetoptError, err: usage(1, err) try: for opt in opts: if opt[0] in ["-h", "--help"]: usage(0) elif opt[0] in ["-V", "--version"]: print __version__ sys.exit(0) elif opt[0] in [ "-u", "--user"]: options["user"] = opt[1] elif opt[0] in ["-v", "--verbose"]: options["verbose"] = int(opt[1]) elif opt[0] in [ "-s", "--sleeptime" ]: get_datetime(opt[1]) # see if it is a valid shift options["sleeptime"] = opt[1] elif opt[0] in [ "-t", "--runtime" ]: options["runtime"] = get_datetime(opt[1]) elif opt[0] in [ "-c", "--collection"]: options["collection"] = opt[1] elif opt[0] in [ "-f", "--force"]: options["force"] = 1 else: usage(1) except StandardError, e: usage(1, e) task_submit(options) return ### okay, here we go: if __name__ == '__main__': main() diff --git a/modules/websearch/lib/search_engine.py b/modules/websearch/lib/search_engine.py index 36ae5167a..19367e60e 100644 --- a/modules/websearch/lib/search_engine.py +++ b/modules/websearch/lib/search_engine.py @@ -1,3494 +1,3493 @@ # -*- coding: utf-8 -*- ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware Search Engine in mod_python.""" __lastupdated__ = """$Date$""" __version__ = "$Id$" ## import general modules: import cgi import copy import Cookie import cPickle import marshal import fileinput import getopt import string from string import split import os import sre import sys import time import traceback import urllib import zlib import MySQLdb import Numeric import md5 import base64 import unicodedata ## import CDSware stuff: -from config import * - -from search_engine_config import * -from bibrank_record_sorter import get_bibrank_methods,rank_records -from bibrank_downloads_similarity import register_page_view_event, calculate_reading_similarity_list +from cdsware.config import * +from cdsware.search_engine_config import * +from cdsware.bibrank_record_sorter import get_bibrank_methods,rank_records +from cdsware.bibrank_downloads_similarity import register_page_view_event, calculate_reading_similarity_list if cfg_experimental_features: - from bibrank_citation_searcher import calculate_cited_by_list, calculate_co_cited_with_list - from bibrank_citation_grapher import create_citation_history_graph_and_box - from bibrank_downloads_grapher import create_download_history_graph_and_box -from dbquery import run_sql + from cdsware.bibrank_citation_searcher import 
calculate_cited_by_list, calculate_co_cited_with_list + from cdsware.bibrank_citation_grapher import create_citation_history_graph_and_box + from cdsware.bibrank_downloads_grapher import create_download_history_graph_and_box +from cdsware.dbquery import run_sql try: from mod_python import apache - from webuser import getUid - from webpage import pageheaderonly, pagefooteronly, create_error_box + from cdsware.webuser import getUid + from cdsware.webpage import pageheaderonly, pagefooteronly, create_error_box except ImportError, e: pass # ignore user personalisation, needed e.g. for command-line -from messages import gettext_set_language, wash_language +from cdsware.messages import gettext_set_language, wash_language try: - import template - websearch_templates = template.load('websearch') + import cdsware.template + websearch_templates = cdsware.template.load('websearch') except: pass ## global vars: search_cache = {} # will cache results of previous searches cfg_nb_browse_seen_records = 100 # limit of the number of records to check when browsing certain collection cfg_nicely_ordered_collection_list = 0 # do we propose collection list nicely ordered or alphabetical? 
## precompile some often-used regexp for speed reasons: sre_word = sre.compile('[\s]') sre_quotes = sre.compile('[\'\"]') sre_doublequote = sre.compile('\"') sre_equal = sre.compile('\=') sre_logical_and = sre.compile('\sand\s', sre.I) sre_logical_or = sre.compile('\sor\s', sre.I) sre_logical_not = sre.compile('\snot\s', sre.I) sre_operators = sre.compile(r'\s([\+\-\|])\s') sre_pattern_wildcards_at_beginning = sre.compile(r'(\s)[\*\%]+') sre_pattern_single_quotes = sre.compile("'(.*?)'") sre_pattern_double_quotes = sre.compile("\"(.*?)\"") sre_pattern_regexp_quotes = sre.compile("\/(.*?)\/") sre_pattern_short_words = sre.compile(r'([\s\"]\w{1,3})[\*\%]+') sre_pattern_space = sre.compile("__SPACE__") sre_pattern_today = sre.compile("\$TODAY\$") sre_unicode_lowercase_a = sre.compile(unicode(r"(?u)[áàäâãå]", "utf-8")) sre_unicode_lowercase_ae = sre.compile(unicode(r"(?u)[æ]", "utf-8")) sre_unicode_lowercase_e = sre.compile(unicode(r"(?u)[éèëê]", "utf-8")) sre_unicode_lowercase_i = sre.compile(unicode(r"(?u)[íìïî]", "utf-8")) sre_unicode_lowercase_o = sre.compile(unicode(r"(?u)[óòöôõø]", "utf-8")) sre_unicode_lowercase_u = sre.compile(unicode(r"(?u)[úùüû]", "utf-8")) sre_unicode_lowercase_y = sre.compile(unicode(r"(?u)[ýÿ]", "utf-8")) sre_unicode_lowercase_c = sre.compile(unicode(r"(?u)[ç]", "utf-8")) sre_unicode_lowercase_n = sre.compile(unicode(r"(?u)[ñ]", "utf-8")) sre_unicode_uppercase_a = sre.compile(unicode(r"(?u)[ÁÀÄÂÃÅ]", "utf-8")) sre_unicode_uppercase_ae = sre.compile(unicode(r"(?u)[Æ]", "utf-8")) sre_unicode_uppercase_e = sre.compile(unicode(r"(?u)[ÉÈËÊ]", "utf-8")) sre_unicode_uppercase_i = sre.compile(unicode(r"(?u)[ÍÌÏÎ]", "utf-8")) sre_unicode_uppercase_o = sre.compile(unicode(r"(?u)[ÓÒÖÔÕØ]", "utf-8")) sre_unicode_uppercase_u = sre.compile(unicode(r"(?u)[ÚÙÜÛ]", "utf-8")) sre_unicode_uppercase_y = sre.compile(unicode(r"(?u)[Ý]", "utf-8")) sre_unicode_uppercase_c = sre.compile(unicode(r"(?u)[Ç]", "utf-8")) sre_unicode_uppercase_n = 
sre.compile(unicode(r"(?u)[Ñ]", "utf-8")) def get_alphabetically_ordered_collection_list(collid=1, level=0): """Returns alphabetically ordered list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box().""" out = [] query = "SELECT id,name FROM collection ORDER BY name ASC" res = run_sql(query) for c_id, c_name in res: # make a nice printable name (e.g. truncate c_printable for long collection names): if len(c_name)>30: c_printable = c_name[:30] + "..." else: c_printable = c_name if level: c_printable = " " + level * '-' + " " + c_printable out.append([c_name, c_printable]) return out def get_nicely_ordered_collection_list(collid=1, level=0): """Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box().""" colls_nicely_ordered = [] query = "SELECT c.name,cc.id_son FROM collection_collection AS cc, collection AS c "\ " WHERE c.id=cc.id_son AND cc.id_dad='%s' ORDER BY score DESC" % collid res = run_sql(query) for c, cid in res: # make a nice printable name (e.g. truncate c_printable for long collection names): if len(c)>30: c_printable = c[:30] + "..." else: c_printable = c if level: c_printable = " " + level * '-' + " " + c_printable colls_nicely_ordered.append([c, c_printable]) colls_nicely_ordered = colls_nicely_ordered + get_nicely_ordered_collection_list(cid, level+1) return colls_nicely_ordered def get_index_id(field): """Returns first index id where the field code FIELD is indexed. Returns zero in case there is no table for this index.
Example: field='author', output=4.""" out = 0 query = """SELECT w.id FROM idxINDEX AS w, idxINDEX_field AS wf, field AS f WHERE f.code='%s' AND wf.id_field=f.id AND w.id=wf.id_idxINDEX LIMIT 1""" % MySQLdb.escape_string(field) res = run_sql(query, None, 1) if res: out = res[0][0] return out def get_words_from_pattern(pattern): "Returns list of whitespace-separated words from pattern." words = {} for word in split(pattern): if not words.has_key(word): words[word] = 1; return words.keys() def create_basic_search_units(req, p, f, m=None, of='hb'): """Splits search pattern and search field into a list of independently searchable units. - A search unit consists of '(operator, pattern, field, type, hitset)' tuples where 'operator' is set union (|), set intersection (+) or set exclusion (-); 'pattern' is either a word (e.g. muon*) or a phrase (e.g. 'nuclear physics'); 'field' is either a code like 'title' or MARC tag like '100__a'; 'type' is the search type ('w' for word file search, 'a' for access file search). - Optionally, the function accepts the match type argument 'm'. If it is set (e.g. from advanced search interface), then it performs this kind of matching. If it is not set, then a guess is made. 'm' can have values: 'a'='all of the words', 'o'='any of the words', 'p'='phrase/substring', 'r'='regular expression', 'e'='exact value'. - Warnings are printed on req (when not None) in case of HTML output formats.""" opfts = [] # will hold (o,p,f,t,h) units ## check arguments: if matching type phrase/string/regexp, do we have field defined? if (m=='p' or m=='r' or m=='e') and not f: m = 'a' if of.startswith("h"): print_warning(req, "This matching type cannot be used within any field. I will perform a word search instead." ) print_warning(req, "If you want to phrase/substring/regexp search in a specific field, e.g. inside title, then please choose within title search option.") ## is desired matching type set? if m: ## A - matching type is known; good! 
if m == 'e': # A1 - exact value: opfts.append(['+',p,f,'a']) # '+' since we have only one unit elif m == 'p': # A2 - phrase/substring: opfts.append(['+',"%"+p+"%",f,'a']) # '+' since we have only one unit elif m == 'r': # A3 - regular expression: opfts.append(['+',p,f,'r']) # '+' since we have only one unit elif m == 'a' or m == 'w': # A4 - all of the words: p = strip_accents(p) # strip accents for 'w' mode, FIXME: delete when not needed for word in get_words_from_pattern(p): opfts.append(['+',word,f,'w']) # '+' in all units elif m == 'o': # A5 - any of the words: p = strip_accents(p) # strip accents for 'w' mode, FIXME: delete when not needed for word in get_words_from_pattern(p): if len(opfts)==0: opfts.append(['+',word,f,'w']) # '+' in the first unit else: opfts.append(['|',word,f,'w']) # '|' in further units else: if of.startswith("h"): print_warning(req, "Matching type '%s' is not implemented yet." % m, "Warning") opfts.append(['+',"%"+p+"%",f,'a']) else: ## B - matching type is not known: let us try to determine it by some heuristics if f and p[0]=='"' and p[-1]=='"': ## B0 - does 'p' start and end by double quote, and is 'f' defined? => doing ACC search opfts.append(['+',p[1:-1],f,'a']) elif f and p[0]=="'" and p[-1]=="'": ## B0bis - does 'p' start and end by single quote, and is 'f' defined? => doing ACC search opfts.append(['+','%'+p[1:-1]+'%',f,'a']) elif f and p[0]=="/" and p[-1]=="/": ## B0ter - does 'p' start and end by a slash, and is 'f' defined? => doing regexp search opfts.append(['+',p[1:-1],f,'r']) elif f and string.find(p, ',') >= 0: ## B1 - does 'p' contain comma, and is 'f' defined? => doing ACC search opfts.append(['+',p,f,'a']) elif f and str(f[0:2]).isdigit(): ## B2 - does 'f' exist and starts by two digits? 
=> doing ACC search opfts.append(['+',p,f,'a']) else: ## B3 - doing WRD search, but maybe ACC too # search units are separated by spaces unless the space is within single or double quotes # so, let us replace temporarily any space within quotes by '__SPACE__' p = sre_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", p) p = sre_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", p) p = sre_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", p) # wash argument: p = sre_equal.sub(":", p) p = sre_logical_and.sub(" ", p) p = sre_logical_or.sub(" |", p) p = sre_logical_not.sub(" -", p) p = sre_operators.sub(r' \1', p) for pi in split(p): # iterate through separated units (or items, as "pi" stands for "p item") pi = sre_pattern_space.sub(" ", pi) # replace back '__SPACE__' by ' ' # firstly, determine set operator if pi[0] == '+' or pi[0] == '-' or pi[0] == '|': oi = pi[0] pi = pi[1:] else: # okay, there is no operator, so let us decide what to do by default oi = '+' # by default we are doing set intersection... 
# secondly, determine search pattern and field: if string.find(pi, ":") > 0: fi, pi = split(pi, ":", 1) else: fi, pi = f, pi # look also for old ALEPH field names: if fi and cfg_fields_convert.has_key(string.lower(fi)): fi = cfg_fields_convert[string.lower(fi)] # wash 'pi' argument: if sre_quotes.match(pi): # B3a - quotes are found => do ACC search (phrase search) if fi: if pi[0] == '"' and pi[-1] == '"': pi = string.replace(pi, '"', '') # remove quote signs opfts.append([oi,pi,fi,'a']) elif pi[0] == "'" and pi[-1] == "'": pi = string.replace(pi, "'", "") # remove quote signs opfts.append([oi,"%"+pi+"%",fi,'a']) else: # unbalanced quotes, so do WRD query: opfts.append([oi,pi,fi,'w']) else: # fi is not defined, look at where we are doing exact or subphrase search (single/double quotes): if pi[0]=='"' and pi[-1]=='"': opfts.append([oi,pi[1:-1],"anyfield",'a']) if of.startswith("h"): print_warning(req, "Searching for an exact match inside any field may be slow. You may want to search for words instead, or choose to search within specific field.") else: # nope, subphrase in global index is not possible => change back to WRD search pi = strip_accents(pi) # strip accents for 'w' mode, FIXME: delete when not needed for pii in get_words_from_pattern(pi): # since there may be '-' and other chars that we do not index in WRD opfts.append([oi,pii,fi,'w']) if of.startswith("h"): print_warning(req, "The partial phrase search does not work in any field. I'll do a boolean AND searching instead.") print_warning(req, "If you want to do a partial phrase search in a specific field, e.g. 
inside title, then please choose 'within title' search option.", "Tip") print_warning(req, "If you want to do exact phrase matching, then please use double quotes.", "Tip") elif fi and str(fi[0:2]).isdigit(): # B3b - fi exists and starts by two digits => do ACC search opfts.append([oi,pi,fi,'a']) elif fi and not get_index_id(fi): # B3c - fi exists but there is no words table for fi => try ACC search opfts.append([oi,pi,fi,'a']) elif fi and pi.startswith('/') and pi.endswith('/'): # B3d - fi exists and slashes found => try regexp search opfts.append([oi,pi[1:-1],fi,'r']) else: # B3e - general case => do WRD search pi = strip_accents(pi) # strip accents for 'w' mode, FIXME: delete when not needed for pii in get_words_from_pattern(pi): opfts.append([oi,pii,fi,'w']) ## sanity check: for i in range(0,len(opfts)): try: pi = opfts[i][1] if pi == '*': if of.startswith("h"): print_warning(req, "Ignoring standalone wildcard word.", "Warning") del opfts[i] if pi == '' or pi == ' ': fi = opfts[i][2] if fi: if of.startswith("h"): print_warning(req, "Ignoring empty %s search term." % fi, "Warning") del opfts[i] except: pass ## return search units: return opfts def page_start(req, of, cc, as, ln, uid, title_message=None): "Start page according to given output format."
_ = gettext_set_language(ln) if not title_message: title_message = _("Search Results") if not req: return # we were called from CLI if of.startswith('x'): # we are doing XML output: req.content_type = "text/xml" req.send_http_header() req.write("""\n""") if of.startswith("xm"): req.write("""\n""") else: req.write("""\n""") elif of.startswith('t') or str(of[0:3]).isdigit(): # we are doing plain text output: req.content_type = "text/plain" req.send_http_header() elif of == "id": pass # nothing to do, we shall only return list of recIDs else: # we are doing HTML output: req.content_type = "text/html" req.send_http_header() req.write(pageheaderonly(title=title_message, navtrail=create_navtrail_links(cc, as, ln, 1), description="%s %s." % (cc, _("Search Results")), keywords="CDSware, WebSearch, %s" % cc, uid=uid, language=ln, urlargs=req.args)) req.write(websearch_templates.tmpl_search_pagestart(ln = ln)) def page_end(req, of="hb", ln=cdslang): "End page according to given output format: e.g. close XML tags, add HTML footer, etc." if of == "id": return [] # empty recID list if not req: return # we were called from CLI if of.startswith('h'): req.write(websearch_templates.tmpl_search_pageend(ln = ln)) # pagebody end req.write(pagefooteronly(lastupdated=__lastupdated__, language=ln, urlargs=req.args)) elif of.startswith('x'): req.write("""\n""") return "\n" def create_inputdate_box(name="d1", selected_year=0, selected_month=0, selected_day=0, ln=cdslang): "Produces 'From Date', 'Until Date' kind of selection box. Suitable for search options." _ = gettext_set_language(ln) box = "" # day box += """""" # month box += """""" # year box += """""" return box def create_google_box(cc, p, f, p1, p2, p3, ln=cdslang, prolog_start="""
""", prolog_end="""
""", column_separator="""""", link_separator= """
""", epilog="""
"""): "Creates the box that proposes links to other useful search engines like Google. 'p' is the search pattern." if not p and (p1 or p2 or p3): p = p1 + " " + p2 + " " + p3 # check whether p is suitable for printing the box if cfg_google_box == 0 or \ p == "" or \ string.find(p, "recid:")>=0 or \ string.find(p, "sysno:")>=0 or \ string.find(p, "sysnos:")>=0: return "" # remove our logical field index names: p = sre.sub(r'\w+:', '', p) return websearch_templates.tmpl_google_box( ln = ln, cc = cc, p = p, f = f, prolog_start = prolog_start, prolog_end = prolog_end, column_separator = column_separator, link_separator = link_separator, epilog = epilog, ) def create_search_box(cc, colls, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, action=""): "Create search box for 'search again in the results page' functionality." # load the right message language _ = gettext_set_language(ln) # some computations if cc == cdsname: cc_intl = cdsnameintl[ln] else: cc_intl = get_coll_i18nname(cc, ln) colls_nicely_ordered = [] if cfg_nicely_ordered_collection_list: colls_nicely_ordered = get_nicely_ordered_collection_list() else: colls_nicely_ordered = get_alphabetically_ordered_collection_list() colls_nice = [] for (cx, cx_printable) in colls_nicely_ordered: if not cx.startswith("Unnamed collection"): colls_nice.append({ 'value' : cx, 'text' : cx_printable }) coll_selects = [] if colls and colls[0] != cdsname: # some collections are defined, so print these first, and only then print 'add another collection' heading: for c in colls: if c: temp = [] temp.append({ 'value' : '', 'text' : '*** %s ***' % _("any collection") }) for val in colls_nice: # print collection: if not val['value'].startswith("Unnamed collection"): temp.append({ 'value' : val['value'], 'text' : val['text'], 'selected' : (c == sre.sub("^[\s\-]*","", val['value'])) }) coll_selects.append(temp) coll_selects.append([{ 'value' : '', 'text' : '*** %s ***' %
_("add another collection") }] + colls_nice) else: # we searched in CDSNAME, so print 'any collection' heading coll_selects.append([{ 'value' : '', 'text' : '*** %s ***' % _("any collection") }] + colls_nice) sort_formats = [{ 'value' : '', 'text' : _("latest first") }] query = """SELECT DISTINCT(f.code),f.name FROM field AS f, collection_field_fieldvalue AS cff WHERE cff.type='soo' AND cff.id_field=f.id ORDER BY cff.score DESC, f.name ASC""" res = run_sql(query) for code, name in res: sort_formats.append({ 'value' : code, 'text' : name, }) ## ranking methods ranks = [{ 'value' : '', 'text' : "- %s %s -" % (_("OR").lower (), _("rank by")), }] for (code,name) in get_bibrank_methods(get_colID(cc), ln): # propose found rank methods: ranks.append({ 'value' : code, 'text' : name, }) formats = [] query = """SELECT code,name FROM format ORDER BY name ASC""" res = run_sql(query) if res: # propose found formats: for code, name in res: formats.append({ 'value' : code, 'text' : name }) else: formats.append({'value' : 'hb', 'text' : _("HTML brief") }) return websearch_templates.tmpl_search_box( ln = ln, weburl = weburl, as = as, cc_intl = cc_intl, cc = cc, ot = ot, sp = sp, action = action, fieldslist = get_searchwithin_fields(ln = ln), f1 = f1, f2 = f2, f3 = f3, m1 = m1, m2 = m2, m3 = m3, p1 = p1, p2 = p2, p3 = p3, op1 = op1, op2 = op2, rm = rm, p = p, f = f, coll_selects = coll_selects, d1y = d1y, d2y = d2y, d1m = d1m, d2m = d2m, d1d = d1d, d2d = d2d, sort_formats = sort_formats, sf = sf, so = so, ranks = ranks, sc = sc, rg = rg, formats = formats, of = of, ) def create_navtrail_links(cc=cdsname, as=0, ln=cdslang, self_p=1, separator=" > "): """Creates navigation trail links, i.e. links to collection ancestors (except Home collection). If as==1, then links to Advanced Search interfaces; otherwise Simple Search. 
    """

    dads = []
    for dad in get_coll_ancestors(cc):
        if dad != cdsname: # exclude Home collection
            dads.append((dad, get_coll_i18nname(dad, ln)))

    if self_p and cc != cdsname:
        dads.append((cc, get_coll_i18nname(cc, ln)))

    return websearch_templates.tmpl_navtrail_links(as = as,
                                                   ln = ln,
                                                   weburl = weburl,
                                                   separator = separator,
                                                   dads = dads)

def create_searchwithin_selection_box(fieldname='f', value='', ln='en'):
    "Produces 'search within' selection box for the current collection."
    out = ""
    out += """"""
    return out

def get_searchwithin_fields(ln='en'):
    "Retrieves the field names used in the 'search within' selection box for the current collection."
    query = "SELECT code,name FROM field ORDER BY name ASC"
    res = run_sql(query)
    fields = [{
                'value' : '',
                'text' : get_field_i18nname("any field", ln)
              }]
    for field_code, field_name in res:
        if field_code and field_code != "anyfield":
            fields.append({ 'value' : field_code,
                            'text' : get_field_i18nname(field_name, ln)
                          })
    return fields

def create_andornot_box(name='op', value='', ln='en'):
    "Returns HTML code for the AND/OR/NOT selection box."

    _ = gettext_set_language(ln)

    out = """ """ % (name,
                     is_selected('a', value), _("AND"),
                     is_selected('o', value), _("OR"),
                     is_selected('n', value), _("AND NOT"))

    return out

def create_matchtype_box(name='m', value='', ln='en'):
    "Returns HTML code for the 'match type' selection box."

    _ = gettext_set_language(ln)

    out = """ """ % (name,
                     is_selected('a', value), _("All of the words:"),
                     is_selected('o', value), _("Any of the words:"),
                     is_selected('e', value), _("Exact phrase:"),
                     is_selected('p', value), _("Partial phrase:"),
                     is_selected('r', value), _("Regular expression:"))
    return out

def nice_number(num, ln=cdslang):
    "Returns nicely printed number NUM in language LN using thousands separator char defined in the I18N messages file."
    if num is None: return None
    _ = gettext_set_language(ln)
    separator = _(",")
    chars_in = list(str(num))
    num = len(chars_in)
    chars_out = []
    for i in range(0,num):
        if i % 3 == 0 and i != 0:
            chars_out.append(separator)
        chars_out.append(chars_in[num-i-1])
    chars_out.reverse()
    return ''.join(chars_out)

def is_selected(var, fld):
    "Checks if the two are equal, and if yes, returns ' selected'.  Useful for select boxes."
    if type(var) is int and type(fld) is int:
        if var == fld:
            return " selected"
    elif str(var) == str(fld):
        return " selected"
    elif fld and len(fld)==3 and fld[0] == "w" and var == fld[1:]:
        return " selected"
    return ""

def urlargs_replace_text_in_arg(urlargs, regexp_argname, text_old, text_new):
    """Analyze `urlargs' (URL CGI GET query arguments) and for each
    occurrence of argument matching `regexp_argname' replace every
    substring `text_old' by `text_new'.  Return the resulting URL
    argument string.  Useful for create_nearest_terms_box."""
    out = ""
    # parse URL arguments into a dictionary:
    urlargsdict = cgi.parse_qs(urlargs)
    ## construct new URL arguments:
    urlargsdictnew = {}
    for key in urlargsdict.keys():
        if sre.match(regexp_argname, key): # replace `arg' by new values
            urlargsdictnew[key] = []
            for parg in urlargsdict[key]:
                urlargsdictnew[key].append(string.replace(parg, text_old, text_new))
        else: # keep old values
            urlargsdictnew[key] = urlargsdict[key]
    # build new URL for this word:
    for key in urlargsdictnew.keys():
        for val in urlargsdictnew[key]:
            out += "&" + key + "=" + urllib.quote_plus(val, '')
    if out.startswith("&"):
        out = out[1:]
    return out

class HitSet:
    """Class describing set of records, implemented as bit vectors of recIDs.
    Uses Numeric arrays for speed (1 value = 8 bits); "real" bit vectors
    could be used later to save space."""

    def __init__(self, init_set=None):
        self._nbhits = -1
        if init_set:
            self._set = init_set
        else:
            self._set = Numeric.zeros(cfg_max_recID+1, Numeric.Int0)

    def __repr__(self, join=string.join):
        return "%s(%s)" % (self.__class__.__name__, join(map(repr, self._set), ', '))

    def add(self, recID):
        "Adds a record to the set."
        self._set[recID] = 1

    def addmany(self, recIDs):
        "Adds several recIDs to the set."
        for recID in recIDs:
            self._set[recID] = 1

    def addlist(self, arr):
        "Adds an array of recIDs to the set."
        Numeric.put(self._set, arr, 1)

    def remove(self, recID):
        "Removes a record from the set."
        self._set[recID] = 0

    def removemany(self, recIDs):
        "Removes several records from the set."
        for recID in recIDs:
            self.remove(recID)

    def intersect(self, other):
        "Does a set intersection with other.  Keep result in self."
        self._set = Numeric.bitwise_and(self._set, other._set)

    def union(self, other):
        "Does a set union with other.  Keep result in self."
        self._set = Numeric.bitwise_or(self._set, other._set)

    def difference(self, other):
        "Does a set difference with other.  Keep result in self."
        #self._set = Numeric.bitwise_not(self._set, other._set)
        for recID in Numeric.nonzero(other._set):
            self.remove(recID)

    def contains(self, recID):
        "Checks whether the set contains recID."
        return self._set[recID]

    __contains__ = contains     # Higher performance member-test for python 2.0 and above

    def __getitem__(self, index):
        "Support for the 'for item in set:' protocol."
        return Numeric.nonzero(self._set)[index]

    def calculate_nbhits(self):
        "Calculates the number of records set in the hitset."
        self._nbhits = Numeric.sum(self._set.copy().astype(Numeric.Int))

    def items(self):
        "Return an array containing all recIDs."
        return Numeric.nonzero(self._set)

    def tolist(self):
        "Return a list containing all recIDs."
        return Numeric.nonzero(self._set).tolist()

# speed up HitSet operations by ~20% if Psyco is installed:
try:
    import psyco
    psyco.bind(HitSet)
except:
    pass

def escape_string(s):
    "Escapes special chars in string.  For MySQL queries."
    s = MySQLdb.escape_string(s)
    return s

def wash_colls(cc, c, split_colls=0):
    """Wash collection list by checking whether the user has deselected
    anything under 'Narrow search'.  Also checks whether cc is a list.
       Return the tuple (cc, colls_to_display, colls_to_search), since
    the list of collections to display differs from the list to search
    in.  This is because users might have chosen the 'split by
    collection' functionality.
       The behaviour of "collections to display" depends solely on
    whether the user has deselected a particular collection: e.g. if
    the user started from the 'Articles and Preprints' page and
    deselected 'Preprints', then the collection to display is
    'Articles'.  If the user did not deselect anything, then the
    collection to display is 'Articles & Preprints'.
       The behaviour of "collections to search in" depends on the
    'split_colls' parameter:
         * if it is equal to 0, then we can wash the colls list down
           and search solely in the collection the user started from;
         * if it is equal to 1, then we are splitting to the first level
           of collections, i.e.
           collections as they appear on the page we started to search from;
    """

    colls_out = []
    colls_out_for_display = []

    # check what type is 'cc':
    if type(cc) is list:
        for ci in cc:
            if collection_reclist_cache.has_key(ci):
                # yes this collection is real, so use it:
                cc = ci
                break
    else:
        # check once if cc is real:
        if not collection_reclist_cache.has_key(cc):
            cc = cdsname # cc is not real, so replace it with Home collection

    # check type of 'c' argument:
    if type(c) is list:
        colls = c
    else:
        colls = [c]

    # remove all 'unreal' collections:
    colls_real = []
    for coll in colls:
        if collection_reclist_cache.has_key(coll):
            colls_real.append(coll)
    colls = colls_real

    # check if some real collections remain:
    if len(colls)==0:
        colls = [cc]

    # then let us check the list of non-restricted "real" sons of 'cc' and compare it to 'colls':
    query = "SELECT c.name FROM collection AS c, collection_collection AS cc, collection AS ccc WHERE c.id=cc.id_son AND cc.id_dad=ccc.id AND ccc.name='%s' AND cc.type='r' AND c.restricted IS NULL" % MySQLdb.escape_string(cc)
    res = run_sql(query)
    l_cc_nonrestricted_sons = []
    l_c = colls
    for row in res:
        l_cc_nonrestricted_sons.append(row[0])
    l_c.sort()
    l_cc_nonrestricted_sons.sort()
    if l_cc_nonrestricted_sons == l_c:
        colls_out_for_display = [cc] # yep, washing permitted, it is sufficient to display 'cc'
    else:
        colls_out_for_display = colls # nope, we need to display all 'colls' successively

    # remove duplicates:
    colls_out_for_display_nondups=filter(lambda x, colls_out_for_display=colls_out_for_display: colls_out_for_display[x-1] not in colls_out_for_display[x:], range(1, len(colls_out_for_display)+1))
    colls_out_for_display = map(lambda x, colls_out_for_display=colls_out_for_display:colls_out_for_display[x-1], colls_out_for_display_nondups)

    # second, let us decide on collection splitting:
    if split_colls == 0:
        # type A - no sons are wanted
        colls_out = colls_out_for_display
#    elif split_colls == 1:
    else:
        # type B - sons (first-level descendants) are wanted
        for coll in colls_out_for_display:
            coll_sons = get_coll_sons(coll)
            if coll_sons == []:
                colls_out.append(coll)
            else:
                colls_out = colls_out + coll_sons

    # remove duplicates:
    colls_out_nondups=filter(lambda x, colls_out=colls_out: colls_out[x-1] not in colls_out[x:], range(1, len(colls_out)+1))
    colls_out = map(lambda x, colls_out=colls_out:colls_out[x-1], colls_out_nondups)

    return (cc, colls_out_for_display, colls_out)

def strip_accents(x):
    """Strip accents in the input phrase X (assumed in UTF-8) by replacing
    accented characters with their unaccented cousins (e.g. é by e).
    Return such a stripped X."""

    # convert input into Unicode string:
    try:
        y = unicode(x, "utf-8")
    except:
        return x # something went wrong, probably the input wasn't UTF-8

    # asciify Latin-1 lowercase characters:
    y = sre_unicode_lowercase_a.sub("a", y)
    y = sre_unicode_lowercase_ae.sub("ae", y)
    y = sre_unicode_lowercase_e.sub("e", y)
    y = sre_unicode_lowercase_i.sub("i", y)
    y = sre_unicode_lowercase_o.sub("o", y)
    y = sre_unicode_lowercase_u.sub("u", y)
    y = sre_unicode_lowercase_y.sub("y", y)
    y = sre_unicode_lowercase_c.sub("c", y)
    y = sre_unicode_lowercase_n.sub("n", y)
    # asciify Latin-1 uppercase characters:
    y = sre_unicode_uppercase_a.sub("A", y)
    y = sre_unicode_uppercase_ae.sub("AE", y)
    y = sre_unicode_uppercase_e.sub("E", y)
    y = sre_unicode_uppercase_i.sub("I", y)
    y = sre_unicode_uppercase_o.sub("O", y)
    y = sre_unicode_uppercase_u.sub("U", y)
    y = sre_unicode_uppercase_y.sub("Y", y)
    y = sre_unicode_uppercase_c.sub("C", y)
    y = sre_unicode_uppercase_n.sub("N", y)

    # return UTF-8 representation of the Unicode string:
    return y.encode("utf-8")

def wash_pattern(p):
    """Wash pattern passed by URL.  Check for sanity of the wildcard by
    removing wildcards if they are appended to extremely short words
    (1-3 letters).  TODO: instead of this approximate treatment, it
    would be much better to introduce a time limit, e.g.
    to kill a query if it does not finish in 10 seconds."""
    # strip accents:
    # p = strip_accents(p) # FIXME: when available, strip accents all the time
    # add leading/trailing whitespace for the two following wildcard-sanity checking regexps:
    p = " " + p + " "
    # get rid of wildcards at the beginning of words:
    p = sre_pattern_wildcards_at_beginning.sub("\\1", p)
    # replace spaces within quotes by __SPACE__ temporarily:
    p = sre_pattern_single_quotes.sub(lambda x: "'"+string.replace(x.group(1), ' ', '__SPACE__')+"'", p)
    p = sre_pattern_double_quotes.sub(lambda x: "\""+string.replace(x.group(1), ' ', '__SPACE__')+"\"", p)
    p = sre_pattern_regexp_quotes.sub(lambda x: "/"+string.replace(x.group(1), ' ', '__SPACE__')+"/", p)
    # get rid of extremely short words (1-3 letters with wildcards):
    p = sre_pattern_short_words.sub("\\1", p)
    # replace back __SPACE__ by spaces:
    p = sre_pattern_space.sub(" ", p)
    # replace special terms:
    p = sre_pattern_today.sub(time.strftime("%04Y-%02m-%02d", time.localtime()), p)
    # remove unnecessary whitespace:
    p = string.strip(p)
    return p

def wash_field(f):
    """Wash field passed by URL."""
    # get rid of unnecessary whitespace:
    f = string.strip(f)
    # wash old-style CDSware/ALEPH 'f' field argument, e.g. replaces 'wau' and 'au' by 'author'
    # (look up the lowercased key, consistently with the has_key() test):
    if cfg_fields_convert.has_key(string.lower(f)):
        f = cfg_fields_convert[string.lower(f)]
    return f

def wash_dates(d1y, d1m, d1d, d2y, d2m, d2d):
    """Take user-submitted dates (day, month, year) of the web form and return (day1, day2) in YYYY-MM-DD format
    suitable for time restricted searching.  I.e.
    pay attention when months are not given, and put 01 or 12 depending
    on whether it is the starting or the ending date, etc."""
    day1, day2 = "", ""
    # sanity checking:
    if d1y==0 and d1m==0 and d1d==0 and d2y==0 and d2m==0 and d2d==0:
        return ("", "") # nothing selected, so return empty values
    # construct day1 (from):
    if d1y:
        day1 += "%04d" % d1y
    else:
        day1 += "0000"
    if d1m:
        day1 += "-%02d" % d1m
    else:
        day1 += "-01"
    if d1d:
        day1 += "-%02d" % d1d
    else:
        day1 += "-01"
    # construct day2 (until):
    if d2y:
        day2 += "%04d" % d2y
    else:
        day2 += "9999"
    if d2m:
        day2 += "-%02d" % d2m
    else:
        day2 += "-12"
    if d2d:
        day2 += "-%02d" % d2d
    else:
        day2 += "-31" # NOTE: perhaps we should add max(datenumber) in
                      # given month, but for our querying it's not
                      # needed, 31 will always do
    # okay, return constructed YYYY-MM-DD dates
    return (day1, day2)

def get_colID(c):
    "Return collection ID for collection name C.  Return None if no match found."
    colID = None
    res = run_sql("SELECT id FROM collection WHERE name=%s", (c,), 1)
    if res:
        colID = res[0][0]
    return colID

def get_coll_i18nname(c, ln=cdslang):
    """Return nicely formatted collection name (of name type 'ln',
    'long name') for collection C in language LN."""
    global collection_i18nname_cache
    global collection_i18nname_cache_timestamp
    # firstly, check whether the collectionname table was modified:
    res = run_sql("SHOW TABLE STATUS LIKE 'collectionname'")
    if res and str(res[0][11])>collection_i18nname_cache_timestamp:
        # yes it was, cache clear-up needed:
        collection_i18nname_cache = create_collection_i18nname_cache()
    # secondly, read i18n name from either the cache or return common name:
    out = c
    try:
        out = collection_i18nname_cache[c][ln]
    except KeyError:
        pass # translation in LN does not exist
    return out

def get_field_i18nname(f, ln=cdslang):
    """Return nicely formatted field name (of type 'ln', 'long
    name') for field F in language LN."""
    global field_i18nname_cache
    global field_i18nname_cache_timestamp
    # firstly, check whether the fieldname table was modified:
    res = run_sql("SHOW TABLE STATUS LIKE 'fieldname'")
    if res and str(res[0][11])>field_i18nname_cache_timestamp:
        # yes it was, cache clear-up needed:
        field_i18nname_cache = create_field_i18nname_cache()
    # secondly, read i18n name from either the cache or return common name:
    out = f
    try:
        out = field_i18nname_cache[f][ln]
    except KeyError:
        pass # translation in LN does not exist
    return out

def get_coll_ancestors(coll):
    "Returns a list of ancestors for collection 'coll'."
    coll_ancestors = []
    coll_ancestor = coll
    while 1:
        query = "SELECT c.name FROM collection AS c "\
                "LEFT JOIN collection_collection AS cc ON c.id=cc.id_dad "\
                "LEFT JOIN collection AS ccc ON ccc.id=cc.id_son "\
                "WHERE ccc.name='%s' ORDER BY cc.id_dad ASC LIMIT 1" \
                % escape_string(coll_ancestor)
        res = run_sql(query, None, 1)
        if res:
            coll_name = res[0][0]
            coll_ancestors.append(coll_name)
            coll_ancestor = coll_name
        else:
            break
    # ancestors found, return reversed list:
    coll_ancestors.reverse()
    return coll_ancestors

def get_coll_sons(coll, type='r', public_only=1):
    """Return a list of sons (first-level descendants) of type 'type' for collection 'coll'.
       If public_only, then return only non-restricted son collections.
    """
    coll_sons = []
    query = "SELECT c.name FROM collection AS c "\
            "LEFT JOIN collection_collection AS cc ON c.id=cc.id_son "\
            "LEFT JOIN collection AS ccc ON ccc.id=cc.id_dad "\
            "WHERE cc.type='%s' AND ccc.name='%s'" \
            % (escape_string(type), escape_string(coll))
    if public_only:
        query += " AND c.restricted IS NULL "
    query += " ORDER BY cc.score DESC"
    res = run_sql(query)
    for name in res:
        coll_sons.append(name[0])
    return coll_sons

def get_coll_real_descendants(coll):
    """Return a list of all descendants of collection 'coll' that are defined by a 'dbquery'.
       IOW, we need to decompose compound collections like "A & B" into "A" and "B" provided
       that "A & B" has no associated database query defined.
    """
    coll_sons = []
    query = "SELECT c.name,c.dbquery FROM collection AS c "\
            "LEFT JOIN collection_collection AS cc ON c.id=cc.id_son "\
            "LEFT JOIN collection AS ccc ON ccc.id=cc.id_dad "\
            "WHERE ccc.name='%s' ORDER BY cc.score DESC" \
            % escape_string(coll)
    res = run_sql(query)
    for name, dbquery in res:
        if dbquery: # this is 'real' collection, so return it:
            coll_sons.append(name)
        else: # this is 'composed' collection, so recurse:
            coll_sons.extend(get_coll_real_descendants(name))
    return coll_sons

def get_collection_reclist(coll):
    """Return hitset of recIDs that belong to the collection 'coll'.
       But firstly check the last updated date of the collection table.
       If it's newer than the cache timestamp, then empty the cache,
       since new records could have been added."""
    global collection_reclist_cache
    global collection_reclist_cache_timestamp
    # firstly, check whether the collection table was modified:
    res = run_sql("SHOW TABLE STATUS LIKE 'collection'")
    if res and str(res[0][11])>collection_reclist_cache_timestamp:
        # yes it was, cache clear-up needed:
        collection_reclist_cache = create_collection_reclist_cache()
    # secondly, read reclist from either the cache or the database:
    if not collection_reclist_cache[coll]:
        # not yet in the cache, so calculate it and fill the cache
        # (escape the collection name like the other collection queries do):
        set = HitSet()
        query = "SELECT nbrecs,reclist FROM collection WHERE name='%s'" % escape_string(coll)
        res = run_sql(query, None, 1)
        if res:
            try:
                set._nbhits, set._set = res[0][0], Numeric.loads(zlib.decompress(res[0][1]))
            except:
                set._nbhits = 0
        collection_reclist_cache[coll] = set
    # finally, return reclist:
    return collection_reclist_cache[coll]

def coll_restricted_p(coll):
    "Predicate to test if the collection coll is restricted or not."
    if not coll:
        return 0
    query = "SELECT restricted FROM collection WHERE name='%s'" % MySQLdb.escape_string(coll)
    res = run_sql(query, None, 1)
    if res and res[0][0] != None:
        return 1
    else:
        return 0

def coll_restricted_group(coll):
    "Return Apache group to which the collection is restricted.  Return None if it's public."
    if not coll:
        return None
    query = "SELECT restricted FROM collection WHERE name='%s'" % MySQLdb.escape_string(coll)
    res = run_sql(query, None, 1)
    if res:
        return res[0][0]
    else:
        return None

def create_collection_reclist_cache():
    """Creates list of records belonging to collections.  Called on startup
    and used later for intersecting search results with collection universe."""
    global collection_reclist_cache_timestamp
    collrecs = {}
    res = run_sql("SELECT name,reclist FROM collection")
    for name,reclist in res:
        collrecs[name] = None # this will be filled later during runtime by calling get_collection_reclist(coll)
    # update timestamp
    try:
        collection_reclist_cache_timestamp = time.strftime("%04Y-%02m-%02d %02H:%02M:%02S", time.localtime())
    except NameError:
        collection_reclist_cache_timestamp = 0
    return collrecs

try:
    collection_reclist_cache.has_key(cdsname)
except:
    try:
        collection_reclist_cache = create_collection_reclist_cache()
    except:
        collection_reclist_cache = {}

def create_collection_i18nname_cache():
    """Create cache of I18N collection names of type 'ln' (=long name).
    Called on startup and used later during the search time."""
    global collection_i18nname_cache_timestamp
    names = {}
    res = run_sql("SELECT c.name,cn.ln,cn.value FROM collectionname AS cn, collection AS c WHERE cn.id_collection=c.id AND cn.type='ln'") # ln=long name
    for c,ln,i18nname in res:
        if i18nname:
            try:
                names[c]
            except KeyError:
                names[c] = {}
            names[c][ln] = i18nname
    # update timestamp
    try:
        collection_i18nname_cache_timestamp = time.strftime("%04Y-%02m-%02d %02H:%02M:%02S", time.localtime())
    except NameError:
        collection_i18nname_cache_timestamp = 0
    return names

try:
    collection_i18nname_cache.has_key(cdsname)
except:
    try:
        collection_i18nname_cache = create_collection_i18nname_cache()
    except:
        collection_i18nname_cache = {}

def create_field_i18nname_cache():
    """Create cache of I18N field names of type 'ln' (=long name).
    Called on startup and used later during the search time."""
    global field_i18nname_cache_timestamp
    names = {}
    res = run_sql("SELECT f.name,fn.ln,fn.value FROM fieldname AS fn, field AS f WHERE fn.id_field=f.id AND fn.type='ln'") # ln=long name
    for f,ln,i18nname in res:
        if i18nname:
            try:
                names[f]
            except KeyError:
                names[f] = {}
            names[f][ln] = i18nname
    # update timestamp
    try:
        field_i18nname_cache_timestamp = time.strftime("%04Y-%02m-%02d %02H:%02M:%02S", time.localtime())
    except NameError:
        field_i18nname_cache_timestamp = 0
    return names

try:
    field_i18nname_cache.has_key(cdsname)
except:
    try:
        field_i18nname_cache = create_field_i18nname_cache()
    except:
        field_i18nname_cache = {}

def browse_pattern(req, colls, p, f, rg, ln=cdslang):
    """Browse either bibliographic phrases or words indexes, and display them."""

    # load the right message language
    _ = gettext_set_language(ln)

    ## do we search in words indexes?
    if not f:
        return browse_in_bibwords(req, p, f, ln=ln)

    ## prepare collection urlargument for later printing:
    p_orig = p
    urlarg_colls = ""
    for coll in colls:
        urlarg_colls += "&c=%s" % urllib.quote(coll)

    ## okay, "real browse" follows:
    browsed_phrases = get_nearest_terms_in_bibxxx(p, f, rg, 1)
    while not browsed_phrases:
        # try again and again with shorter and shorter pattern:
        try:
            p = p[:-1]
            browsed_phrases = get_nearest_terms_in_bibxxx(p, f, rg, 1)
        except:
            # probably there are no hits at all:
            req.write(_("No values found."))
            return

    ## try to check hits in these particular collection selection:
    browsed_phrases_in_colls = []
    if 0:
        for phrase in browsed_phrases:
            phrase_hitset = HitSet()
            phrase_hitsets = search_pattern("", phrase, f, 'e')
            for coll in colls:
                phrase_hitset.union(phrase_hitsets[coll])
            phrase_hitset.calculate_nbhits()
            if phrase_hitset._nbhits > 0:
                # okay, this phrase has some hits in colls, so add it:
                browsed_phrases_in_colls.append([phrase, phrase_hitset._nbhits])

    ## were there hits in collections?
    if browsed_phrases_in_colls == []:
        if browsed_phrases != []:
            #print_warning(req, """No match close to %s found in given collections.
            #Please try different term. Displaying matches in any collection...""" % p_orig)
            ## try to get nbhits for these phrases in any collection:
            for phrase in browsed_phrases:
                browsed_phrases_in_colls.append([phrase, get_nbhits_in_bibxxx(phrase, f)])

    ## display results now:
    out = websearch_templates.tmpl_browse_pattern(
            f = get_field_i18nname(f, ln),
            ln = ln,
            browsed_phrases_in_colls = browsed_phrases_in_colls,
            weburl = weburl,
            urlarg_colls = urlarg_colls,
          )
    req.write(out)
    return

def browse_in_bibwords(req, p, f, ln=cdslang):
    """Browse inside words indexes."""
    if not p:
        return
    _ = gettext_set_language(ln)

    urlargs = string.replace(req.args, "action=%s" % _("Browse"), "action=%s" % _("Search"))
    nearest_box = create_nearest_terms_box(urlargs, p, f, 'w', ln=ln, intro_text_p=0)
    req.write(websearch_templates.tmpl_search_in_bibwords(
        p = p,
        f = f,
        ln = ln,
        nearest_box = nearest_box
    ))
    return

def search_pattern(req=None, p=None, f=None, m=None, ap=0, of="id", verbose=0, ln=cdslang):
    """Search for complex pattern 'p' within field 'f' according to
       matching type 'm'.  Return hitset of recIDs.

       The function uses multi-stage searching algorithm in case of no
       exact match found.  See the Search Internals document for
       detailed description.

       The 'ap' argument governs whether alternative patterns are to
       be used in case there is no direct hit for (p,f,m).  For
       example, whether to replace non-alphanumeric characters by
       spaces if it would give some hits.  See the Search Internals
       document for detailed description.  (ap=0 forbids the
       alternative pattern usage, ap=1 permits it.)

       The 'of' argument governs whether to print or not some
       information to the user in case of no match found.  (Usually it
       prints the information in case of HTML formats, otherwise it's
       silent).

       The 'verbose' argument controls the level of debugging information
       to be printed (0=least, 9=most).

       All the parameters are assumed to have been previously washed.

       This function is suitable as a mid-level API.
    """

    _ = gettext_set_language(ln)

    hitset_empty = HitSet()
    hitset_empty._nbhits = 0
    # sanity check:
    if not p:
        hitset_full = HitSet(Numeric.ones(cfg_max_recID+1, Numeric.Int0))
        hitset_full._nbhits = cfg_max_recID
        # no pattern, so return all universe
        return hitset_full
    # search stage 1: break up arguments into basic search units:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    basic_search_units = create_basic_search_units(req, p, f, m, of)
    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        print_warning(req, "Search stage 1: basic search units are: %s" % basic_search_units)
        print_warning(req, "Search stage 1: execution took %.2f seconds." % (t2 - t1))
    # search stage 2: do search for each search unit and verify hit presence:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    basic_search_units_hitsets = []
    for idx_unit in range(0,len(basic_search_units)):
        bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit]
        basic_search_unit_hitset = search_unit(bsu_p, bsu_f, bsu_m)
        if verbose >= 9 and of.startswith("h"):
            print_warning(req, "Search stage 2: pattern %s gave hitlist %s" % (bsu_p, Numeric.nonzero(basic_search_unit_hitset._set)))
        if basic_search_unit_hitset._nbhits>0 or \
           ap==0 or \
           bsu_o=="|" or \
           ((idx_unit+1) 0:
                    # we retain the new unit instead
                    if of.startswith('h'):
                        print_warning(req, _("No exact match found for %s, using %s instead...") % (bsu_p,bsu_pn))
                    basic_search_units[idx_unit][1] = bsu_pn
                    basic_search_units_hitsets.append(basic_search_unit_hitset)
                else:
                    # stage 2-3: no hits found either, propose nearest indexed terms:
                    if of.startswith('h'):
                        if req:
                            if bsu_f == "recid":
                                print_warning(req, "Requested record does not seem to exist.")
                            else:
                                print_warning(req, create_nearest_terms_box(req.args, bsu_p, bsu_f, bsu_m, ln=ln))
                    return hitset_empty
            else:
                # stage 2-3: no hits found either, propose nearest indexed terms:
                if of.startswith('h'):
                    if req:
                        if bsu_f == "recid":
                            print_warning(req, "Requested record does not seem to exist.")
                        else:
                            print_warning(req,
                                          create_nearest_terms_box(req.args, bsu_p, bsu_f, bsu_m, ln=ln))
                return hitset_empty

    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        for idx_unit in range(0,len(basic_search_units)):
            print_warning(req, "Search stage 2: basic search unit %s gave %d hits." %
                          (basic_search_units[idx_unit][1:], basic_search_units_hitsets[idx_unit]._nbhits))
        print_warning(req, "Search stage 2: execution took %.2f seconds." % (t2 - t1))

    # search stage 3: apply boolean query for each search unit:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    # let the initial set be the complete universe:
    hitset_in_any_collection = HitSet(Numeric.ones(cfg_max_recID+1, Numeric.Int0))
    for idx_unit in range(0,len(basic_search_units)):
        this_unit_operation = basic_search_units[idx_unit][0]
        this_unit_hitset = basic_search_units_hitsets[idx_unit]
        if this_unit_operation == '+':
            hitset_in_any_collection.intersect(this_unit_hitset)
        elif this_unit_operation == '-':
            hitset_in_any_collection.difference(this_unit_hitset)
        elif this_unit_operation == '|':
            hitset_in_any_collection.union(this_unit_hitset)
        else:
            if of.startswith("h"):
                print_warning(req, "Invalid set operation %s."
                              % this_unit_operation, "Error")
    hitset_in_any_collection.calculate_nbhits()

    if hitset_in_any_collection._nbhits == 0:
        # no hits found, propose alternative boolean query:
        if of.startswith('h'):
            nearestterms = []
            for idx_unit in range(0,len(basic_search_units)):
                bsu_o, bsu_p, bsu_f, bsu_m = basic_search_units[idx_unit]
                bsu_nbhits = basic_search_units_hitsets[idx_unit]._nbhits
                url_args_new = sre.sub(r'(^|\&)p=.*?(\&|$)', r'\1p='+urllib.quote(bsu_p)+r'\2', req.args)
                url_args_new = sre.sub(r'(^|\&)f=.*?(\&|$)', r'\1f='+urllib.quote(bsu_f)+r'\2', url_args_new)
                nearestterms.append({
                                      'nbhits' : bsu_nbhits,
                                      'url_args' : url_args_new,
                                      'p' : bsu_p,
                                    })
            text = websearch_templates.tmpl_search_no_boolean_hits(
                     ln = ln,
                     weburl = weburl,
                     nearestterms = nearestterms,
                   )
            print_warning(req, text)

    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        print_warning(req, "Search stage 3: boolean query gave %d hits." % hitset_in_any_collection._nbhits)
        print_warning(req, "Search stage 3: execution took %.2f seconds." % (t2 - t1))

    return hitset_in_any_collection

def search_unit(p, f=None, m=None):
    """Search for basic search unit defined by pattern 'p' and field
       'f' and matching type 'm'.  Return hitset of recIDs.

       All the parameters are assumed to have been previously washed.
       'p' is assumed to be already a ``basic search unit'' so that it
       is searched as such and is not broken up in any way.  Only
       wildcard and span queries are being detected inside 'p'.

       This function is suitable as a low-level API.
    """

    ## create empty output results set:
    set = HitSet()
    if not p: # sanity checking
        return set
    if m == 'a' or m == 'r':
        # we are doing either direct bibxxx search or phrase search or regexp search
        set = search_unit_in_bibxxx(p, f, m)
    else:
        # we are doing bibwords search by default
        set = search_unit_in_bibwords(p, f)
    set.calculate_nbhits()
    return set

def search_unit_in_bibwords(word, f, decompress=zlib.decompress):
    """Searches for 'word' inside bibwordsX table for field 'f' and returns hitset of recIDs."""
    set = HitSet() # will hold output result set
    set_used = 0 # not-yet-used flag, to be able to circumvent set operations
    # deduce into which bibwordsX table we will search:
    bibwordsX = "idxWORD%02dF" % get_index_id("anyfield")
    if f:
        index_id = get_index_id(f)
        if index_id:
            bibwordsX = "idxWORD%02dF" % index_id

    # wash 'word' argument and construct query:
    word = string.replace(word, '*', '%') # we now use '*' as the truncation character
    words = string.split(word, "->", 1) # check for span query
    if len(words) == 2:
        word0 = sre_word.sub('', words[0])
        word1 = sre_word.sub('', words[1])
        query = "SELECT term,hitlist FROM %s WHERE term BETWEEN '%s' AND '%s'" % (bibwordsX, escape_string(word0[:50]), escape_string(word1[:50]))
    else:
        word = sre_word.sub('', word)
        if string.find(word, '%') >= 0: # do we have wildcard in the word?
            query = "SELECT term,hitlist FROM %s WHERE term LIKE '%s'" % (bibwordsX, escape_string(word[:50]))
        else:
            query = "SELECT term,hitlist FROM %s WHERE term='%s'" % (bibwordsX, escape_string(word[:50]))
    # launch the query:
    res = run_sql(query)
    # fill the result set:
    for word,hitlist in res:
        hitset_bibwrd = HitSet(Numeric.loads(decompress(hitlist)))
        # add the results:
        if set_used:
            set.union(hitset_bibwrd)
        else:
            set = hitset_bibwrd
            set_used = 1
    # okay, return result set:
    return set

def search_unit_in_bibxxx(p, f, type):
    """Searches for pattern 'p' inside bibxxx tables for field 'f' and returns hitset of recIDs found.
    The search type is defined by 'type' (e.g.
       equal to 'r' for a regexp search)."""
    p_orig = p # saving for eventual future 'no match' reporting
    # wash arguments:
    f = string.replace(f, '*', '%') # replace truncation char '*' in field definition
    if type == 'r':
        pattern = "REGEXP '%s'" % MySQLdb.escape_string(p)
    else:
        p = string.replace(p, '*', '%') # we now use '*' as the truncation character
        ps = string.split(p, "->", 1) # check for span query:
        if len(ps) == 2:
            pattern = "BETWEEN '%s' AND '%s'" % (MySQLdb.escape_string(ps[0]), MySQLdb.escape_string(ps[1]))
        else:
            if string.find(p, '%') > -1:
                pattern = "LIKE '%s'" % MySQLdb.escape_string(ps[0])
            else:
                pattern = "='%s'" % MySQLdb.escape_string(ps[0])
    # construct 'tl' which defines the tag list (MARC tags) to search in:
    tl = []
    if str(f[0]).isdigit() and str(f[1]).isdigit():
        tl.append(f) # 'f' seems to be okay as it starts by two digits
    else:
        # convert old ALEPH tag names, if appropriate: (TODO: get rid of this before entering this function)
        if cfg_fields_convert.has_key(string.lower(f)):
            f = cfg_fields_convert[string.lower(f)]
        # deduce desired MARC tags on the basis of chosen 'f'
        tl = get_field_tags(f)
        if not tl:
            # by default we are searching in author index:
            tl = get_field_tags("author")
    # okay, start search:
    l = [] # will hold list of recID that matched
    for t in tl:
        # deduce into which bibxxx table we will search:
        digit1, digit2 = int(t[0]), int(t[1])
        bx = "bib%d%dx" % (digit1, digit2)
        bibx = "bibrec_bib%d%dx" % (digit1, digit2)
        # construct query:
        if t == "001":
            query = "SELECT id FROM bibrec WHERE id %s" % pattern
        else:
            if len(t) != 6 or t[-1:]=='%':
                # only the beginning of field 't' is defined, so add wildcard character:
                query = "SELECT bibx.id_bibrec FROM %s AS bx LEFT JOIN %s AS bibx ON bx.id=bibx.id_bibxxx WHERE bx.value %s AND bx.tag LIKE '%s%%'" %\
                        (bx, bibx, pattern, t)
            else:
                query = "SELECT bibx.id_bibrec FROM %s AS bx LEFT JOIN %s AS bibx ON bx.id=bibx.id_bibxxx WHERE bx.value %s AND bx.tag='%s'" %\
                        (bx, bibx, pattern, t)
        # launch the query:
        res = run_sql(query)
        # fill the result set:
        for id_bibrec in res:
            if id_bibrec[0]:
                l.append(id_bibrec[0])
    # check no of hits found:
    nb_hits = len(l)
    # okay, return result set:
    set = HitSet()
    set.addlist(Numeric.array(l))
    return set

def search_unit_in_bibrec(day1, day2, type='creation_date'):
    """Return hitset of recIDs found that were either created or modified (see 'type' arg)
       from day1 until day2, inclusive.  Does not pay attention to pattern, collection, anything.
       Useful to intersect later on with the 'real' query."""
    set = HitSet()
    if type != "creation_date" and type != "modification_date":
        # type argument is invalid, so search for creation dates by default
        type = "creation_date"
    res = run_sql("SELECT id FROM bibrec WHERE %s>=%s AND %s<=%s" % (type, "%s", type, "%s"), (day1, day2))
    l = []
    for row in res:
        l.append(row[0])
    set.addlist(Numeric.array(l))
    return set

def intersect_results_with_collrecs(req, hitset_in_any_collection, colls, ap=0, of="hb", verbose=0, ln=cdslang):
    """Return dict of hitsets given by intersection of hitset with the collection universes."""
    _ = gettext_set_language(ln)

    # search stage 4: intersect with the collection universe:
    if verbose and of.startswith("h"):
        t1 = os.times()[4]
    results = {}
    results_nbhits = 0
    for coll in colls:
        results[coll] = HitSet()
        results[coll]._set = Numeric.bitwise_and(hitset_in_any_collection._set, get_collection_reclist(coll)._set)
        results[coll].calculate_nbhits()
        results_nbhits += results[coll]._nbhits
    if results_nbhits == 0:
        # no hits found, try to search in Home:
        results_in_Home = HitSet()
        results_in_Home._set = Numeric.bitwise_and(hitset_in_any_collection._set, get_collection_reclist(cdsname)._set)
        results_in_Home.calculate_nbhits()
        if results_in_Home._nbhits > 0:
            # some hits found in Home, so propose this search:
            url_args = req.args
            url_args = sre.sub(r'(^|\&)cc=.*?(\&|$)', r'\2', url_args)
            url_args = sre.sub(r'(^|\&)c=.*?(\&[^c]+=|$)', r'\2', url_args)
            url_args = sre.sub(r'^\&+', '', url_args)
            url_args = sre.sub(r'\&+$', '', url_args)
            if of.startswith("h"):
                print_warning(req, _("No match found in collection %s. Other public collections gave "
                                     "<a href=\"%s/search.py?%s\">%d hits</a>.") %
                              (string.join(colls, ","), weburl, url_args, results_in_Home._nbhits))
            results = {}
        else:
            # no hits found in Home, recommend different search terms:
            if of.startswith("h"):
                print_warning(req, _("No public collection matched your query. "
                                     "If you were looking for a non-public document, please choose "
                                     "the desired restricted collection first."))
            results = {}
    if verbose and of.startswith("h"):
        t2 = os.times()[4]
        print_warning(req, "Search stage 4: intersecting with collection universe gave %d hits." % results_nbhits)
        print_warning(req, "Search stage 4: execution took %.2f seconds." % (t2 - t1))
    return results

def intersect_results_with_hitset(req, results, hitset, ap=0, aptext="", of="hb"):
    """Return intersection of search 'results' (a dict of hitsets
       with collection as key) with the 'hitset', i.e. apply
       'hitset' intersection to each collection within search
       'results'.

       If the final 'results' set would be empty, print the warning
       text 'aptext' (for HTML output formats) and return either the
       original 'results' set unchanged (if 'ap', approximate pattern,
       is true) or an empty results set (if 'ap' is false).
    """
    if ap:
        results_ap = copy.deepcopy(results)
    else:
        results_ap = {} # will return empty dict in case of no hits found
    nb_total = 0
    for coll in results.keys():
        results[coll].intersect(hitset)
        results[coll].calculate_nbhits()
        nb_total += results[coll]._nbhits
    if nb_total == 0:
        if of.startswith("h"):
            print_warning(req, aptext)
        results = results_ap
    return results

def create_similarly_named_authors_link_box(author_name, ln=cdslang):
    """Return a box similar to ``Not satisfied...'' one by proposing
       author searches for similar names.  Namely, take AUTHOR_NAME
       and the first initial of the firstname (after comma) and look
       into author index whether authors with e.g. middle names exist.
       Useful mainly for CERN Library that sometimes contains name
       forms like Ellis-N, Ellis-Nick, Ellis-Nicolas all denoting the
       same person.  The box isn't proposed if no similarly named
       authors are found to exist.
    """
    # return nothing if not configured:
    if cfg_create_similarly_named_authors_link_box == 0:
        return ""
    # return empty box if there is no initial:
    if sre.match(r'[^ ,]+, [^ ]', author_name) is None:
        return ""
    # firstly find name comma initial:
    author_name_to_search = sre.sub(r'^([^ ,]+, +[^ ,]).*$', '\\1', author_name)

    # secondly search for similar name forms:
    similar_author_names = {}
    for tag in get_field_tags("author"):
        # deduce into which bibxxx table we will search:
        digit1, digit2 = int(tag[0]), int(tag[1])
        bx = "bib%d%dx" % (digit1, digit2)
        bibx = "bibrec_bib%d%dx" % (digit1, digit2)
        if len(tag) != 6 or tag[-1:]=='%':
            # only the beginning of field 'tag' is defined, so add wildcard character:
            query = "SELECT bx.value FROM %s AS bx WHERE bx.value LIKE '%s%%' AND bx.tag LIKE '%s%%'" \
                    % (bx, escape_string(author_name_to_search), tag)
        else:
            query = "SELECT bx.value FROM %s AS bx WHERE bx.value LIKE '%s%%' AND bx.tag='%s'" \
                    % (bx, escape_string(author_name_to_search), tag)
        res = run_sql(query)
        for row in res:
            similar_author_names[row[0]] = 1
    # remove the original name and sort the list:
    try:
        del similar_author_names[author_name]
    except KeyError:
        pass
    # thirdly print the box:
    out = ""
    if similar_author_names:
        out_authors = similar_author_names.keys()
        out_authors.sort()

        tmp_authors = []
        for out_author in out_authors:
            tmp_authors.append({
                'nb' : get_nbhits_in_bibxxx(out_author, "author"),
                'name' : out_author,
            })

        out += websearch_templates.tmpl_similar_author_names(
                 ln = ln,
                 weburl = weburl,
                 authors = tmp_authors,
               )
    return out

def create_nearest_terms_box(urlargs, p, f, t='w', n=5, ln=cdslang, intro_text_p=1):
    """Return text box containing list of 'n' nearest terms above/below 'p'
       for the field 'f' for matching type 't' (words/phrases) in
       language 'ln'.  Propose new searches according to `urlargs' with the new words.
       If `intro_text_p' is true, then display the introductory message,
       otherwise print only the nearest terms in the box content.
    """
    # load the right message language
    _ = gettext_set_language(ln)

    out = ""
    nearest_terms = []
    if not p: # sanity check
        p = "."
    # look for nearest terms:
    if t == 'w':
        nearest_terms = get_nearest_terms_in_bibwords(p, f, n, n)
        if not nearest_terms:
            return "%s %s." % (_("No words index available for"), get_field_i18nname(f, ln))
    else:
        nearest_terms = get_nearest_terms_in_bibxxx(p, f, n, n)
        if not nearest_terms:
            return "%s %s." % (_("No phrase index available for"), get_field_i18nname(f, ln))

    termargs = []
    termhits = []
    for term in nearest_terms:
        if t == 'w':
            termhits.append(get_nbhits_in_bibwords(term, f))
        else:
            termhits.append(get_nbhits_in_bibxxx(term, f))
        termargs.append(urlargs_replace_text_in_arg(urlargs, r'^p\d?$', p, term))

    intro = ""
    if intro_text_p: # add full leading introductory text
        intro = _("Search term %s") % p
        if f:
            intro += " " + _("inside %s index") % get_field_i18nname(f, ln)
        intro += " " + _("did not match any record. Nearest terms in any collection are:")

    return websearch_templates.tmpl_nearest_term_box(
             p = p,
             ln = ln,
             f = f,
             weburl = weburl,
             terms = nearest_terms,
             termargs = termargs,
             termhits = termhits,
             intro = intro,
           )

def get_nearest_terms_in_bibwords(p, f, n_below, n_above):
    """Return list of +n -n nearest terms to word `p' in index for field `f'."""
    nearest_words = [] # will hold the (sorted) list of nearest words to return
    # deduce into which bibwordsX table we will search:
    bibwordsX = "idxWORD%02dF" % get_index_id("anyfield")
    if f:
        index_id = get_index_id(f)
        if index_id:
            bibwordsX = "idxWORD%02dF" % index_id
        else:
            return nearest_words
    # firstly try to get `n' closest words above `p':
    query = "SELECT term FROM %s WHERE term<'%s' ORDER BY term DESC LIMIT %d" % (bibwordsX, escape_string(p), n_above)
    res = run_sql(query)
    for row in res:
        nearest_words.append(row[0])
    nearest_words.reverse()
    # secondly insert given word `p':
    nearest_words.append(p)
    # finally try to get `n' closest words below `p':
    query = "SELECT term FROM %s WHERE term>'%s' ORDER BY term ASC LIMIT %d" % (bibwordsX, escape_string(p), n_below)
    res = run_sql(query)
    for row in res:
        nearest_words.append(row[0])
    return nearest_words

def get_nearest_terms_in_bibxxx(p, f, n_below, n_above):
    """Browse (-n_above, +n_below) closest bibliographic phrases
       for the given pattern p in the given field f, regardless
       of collection.
       Return list of [phrase1, phrase2, ... , phrase_n]."""
    ## determine browse field:
    if not f and string.find(p, ":") > 0: # does 'p' contain ':'?
        f, p = string.split(p, ":", 1)
    ## wash 'p' argument:
    p = sre_quotes.sub("", p)
    ## We are going to take 2*max(n_below, n_above) as the number of
    ## values to fetch from bibXXx.  This is needed to work around
    ## MySQL UTF-8 sorting troubles in 4.0.x.  Proper solution is to
    ## use MySQL 4.1.x or our own idxPHRASE in the future.
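    ## Worked example of the fetch window below (illustrative numbers
    ## only, not taken from a real index): with n_below = n_above = 5,
    ## n_fetch = 2*max(5,5) = 10, so each bibxxx table contributes up to
    ## 10 phrases sorting before `p' and up to 10 phrases equal to or
    ## after `p'.  After the case- and accent-insensitive re-sort, the
    ## final slice phrases_out[max(0,idx_p-5):idx_p+5] keeps at most 5
    ## phrases before `p', `p' itself, and at most 4 phrases after it.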
    n_fetch = 2*max(n_below,n_above)
    ## construct 'tl' which defines the tag list (MARC tags) to search in:
    tl = []
    if str(f[0]).isdigit() and str(f[1]).isdigit():
        tl.append(f) # 'f' seems to be okay as it starts by two digits
    else:
        # deduce desired MARC tags on the basis of chosen 'f'
        tl = get_field_tags(f)
    ## start browsing to fetch list of hits:
    browsed_phrases = {} # will hold {phrase1: 1, phrase2: 1, ..., phraseN: 1} dict of browsed phrases (to make them unique)
    # always add self to the results set:
    browsed_phrases[p] = 1
    for t in tl:
        # deduce into which bibxxx table we will search:
        digit1, digit2 = int(t[0]), int(t[1])
        bx = "bib%d%dx" % (digit1, digit2)
        bibx = "bibrec_bib%d%dx" % (digit1, digit2)
        # firstly try to get `n' closest phrases above `p':
        if len(t) != 6 or t[-1:]=='%':
            # only the beginning of field 't' is defined, so add wildcard character:
            query = "SELECT bx.value FROM %s AS bx WHERE bx.value<'%s' AND bx.tag LIKE '%s%%' ORDER BY bx.value DESC LIMIT %d" \
                    % (bx, escape_string(p), t, n_fetch)
        else:
            query = "SELECT bx.value FROM %s AS bx WHERE bx.value<'%s' AND bx.tag='%s' ORDER BY bx.value DESC LIMIT %d" \
                    % (bx, escape_string(p), t, n_fetch)
        res = run_sql(query)
        for row in res:
            browsed_phrases[row[0]] = 1
        # secondly try to get `n' closest phrases equal to or below `p':
        if len(t) != 6 or t[-1:]=='%':
            # only the beginning of field 't' is defined, so add wildcard character:
            query = "SELECT bx.value FROM %s AS bx WHERE bx.value>='%s' AND bx.tag LIKE '%s%%' ORDER BY bx.value ASC LIMIT %d" \
                    % (bx, escape_string(p), t, n_fetch)
        else:
            query = "SELECT bx.value FROM %s AS bx WHERE bx.value>='%s' AND bx.tag='%s' ORDER BY bx.value ASC LIMIT %d" \
                    % (bx, escape_string(p), t, n_fetch)
        res = run_sql(query)
        for row in res:
            browsed_phrases[row[0]] = 1
    # select first n words only: (this is needed as we were searching
    # in many different tables and so may have collected more than n
    # words; this of course won't be needed when we shall have
    # one ACC table only for given field):
    phrases_out = browsed_phrases.keys()
    phrases_out.sort(lambda x, y: cmp(string.lower(strip_accents(x)),
                                      string.lower(strip_accents(y))))
    # find position of self:
    try:
        idx_p = phrases_out.index(p)
    except ValueError:
        idx_p = len(phrases_out)/2
    # return n_above and n_below:
    return phrases_out[max(0,idx_p-n_above):idx_p+n_below]

def get_nbhits_in_bibwords(word, f):
    """Return number of hits for word 'word' inside words index for field 'f'."""
    out = 0
    # deduce into which bibwordsX table we will search:
    bibwordsX = "idxWORD%02dF" % get_index_id("anyfield")
    if f:
        index_id = get_index_id(f)
        if index_id:
            bibwordsX = "idxWORD%02dF" % index_id
        else:
            return 0
    if word:
        query = "SELECT hitlist FROM %s WHERE term='%s'" % (bibwordsX, escape_string(word))
        res = run_sql(query)
        for hitlist in res:
            out += Numeric.sum(Numeric.loads(zlib.decompress(hitlist[0])).copy().astype(Numeric.Int))
    return out

def get_nbhits_in_bibxxx(p, f):
    """Return number of records whose field 'f' contains exactly the phrase 'p'."""
    ## determine browse field:
    if not f and string.find(p, ":") > 0: # does 'p' contain ':'?
        f, p = string.split(p, ":", 1)
    ## wash 'p' argument:
    p = sre_quotes.sub("", p)
    ## construct 'tl' which defines the tag list (MARC tags) to search in:
    tl = []
    if str(f[0]).isdigit() and str(f[1]).isdigit():
        tl.append(f) # 'f' seems to be okay as it starts by two digits
    else:
        # deduce desired MARC tags on the basis of chosen 'f'
        tl = get_field_tags(f)
    # start searching:
    recIDs = {} # will hold dict of {recID1: 1, recID2: 1, ..., } (unique recIDs, therefore)
    for t in tl:
        # deduce into which bibxxx table we will search:
        digit1, digit2 = int(t[0]), int(t[1])
        bx = "bib%d%dx" % (digit1, digit2)
        bibx = "bibrec_bib%d%dx" % (digit1, digit2)
        if len(t) != 6 or t[-1:]=='%':
            # only the beginning of field 't' is defined, so add wildcard character:
            query = """SELECT bibx.id_bibrec FROM %s AS bibx, %s AS bx
                        WHERE bx.value='%s' AND bx.tag LIKE '%s%%' AND bibx.id_bibxxx=bx.id""" \
                    % (bibx, bx, escape_string(p), t)
        else:
            query = """SELECT bibx.id_bibrec FROM %s AS bibx, %s AS bx
                        WHERE bx.value='%s' AND bx.tag='%s' AND bibx.id_bibxxx=bx.id""" \
                    % (bibx, bx, escape_string(p), t)
        res = run_sql(query)
        for row in res:
            recIDs[row[0]] = 1
    return len(recIDs)

def get_mysql_recid_from_aleph_sysno(sysno):
    """Returns MySQL's recID for ALEPH sysno passed in the argument (e.g. "002379334CER").
       Returns None in case of failure."""
    out = None
    query = "SELECT bb.id_bibrec FROM bibrec_bib97x AS bb, bib97x AS b WHERE b.value='%s' AND b.tag='970__a' AND bb.id_bibxxx=b.id" %\
            (escape_string(sysno))
    res = run_sql(query, None, 1)
    if res:
        out = res[0][0]
    return out

def guess_primary_collection_of_a_record(recID):
    """Return primary collection name a record recid belongs to, by testing 980 identifier.
       May lead to bad guesses when a collection is defined dynamically via dbquery.
       In that case, return 'cdsname'."""
    out = cdsname
    dbcollids = get_fieldvalues(recID, "980__a")
    if dbcollids:
        dbquery = "collection:" + dbcollids[0]
        res = run_sql("SELECT name FROM collection WHERE dbquery=%s", (dbquery,))
        if res:
            out = res[0][0]
    return out

def get_tag_name(tag_value, prolog="", epilog=""):
    """Return tag name from the known tag value, by looking up the 'tag' table.
       Return empty string in case of failure.
       Example: input='100__%', output='first author'."""
    out = ""
    res = run_sql("SELECT name FROM tag WHERE value=%s", (tag_value,))
    if res:
        out = prolog + res[0][0] + epilog
    return out

def get_fieldcodes():
    """Returns a list of field codes that may have been passed as 'search options' in URL.
       Example: output=['subject','division']."""
    out = []
    res = run_sql("SELECT DISTINCT(code) FROM field")
    for row in res:
        out.append(row[0])
    return out

def get_field_tags(field):
    """Returns a list of MARC tags for the field code 'field'.
       Returns empty list in case of error.
       Example: field='author', output=['100__%','700__%']."""
    out = []
    # pass 'field' as a query parameter so that it gets quoted properly:
    query = """SELECT t.value FROM tag AS t, field_tag AS ft, field AS f
                WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag
                ORDER BY ft.score DESC"""
    res = run_sql(query, (field,))
    for val in res:
        out.append(val[0])
    return out

def get_fieldvalues(recID, tag):
    """Return list of field values for field TAG inside record RECID."""
    out = []
    if tag == "001___":
        # we have asked for recID that is not stored in bibXXx tables
        out.append(str(recID))
    else:
        # we are going to look inside bibXXx tables
        digit = tag[0:2]
        bx = "bib%sx" % digit
        bibx = "bibrec_bib%sx" % digit
        query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag LIKE '%s' " \
                "ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx, recID, tag)
        res = run_sql(query)
        for row in res:
            out.append(row[0])
    return out

def get_fieldvalues_alephseq_like(recID, tags):
    """Return textual lines in ALEPH sequential like format for field 'tag' inside
       record 'recID'."""
    out = ""
    # clean passed 'tag':
    tags_in = string.split(tags, ",")
    if len(tags_in) == 1 and len(tags_in[0]) == 6:
        ## case A: one concrete subfield asked, so print its value if found
        ##         (use with care: can fool you if field has multiple occurrences)
        out += string.join(get_fieldvalues(recID, tags_in[0]),"\n")
    else:
        ## case B: print our "text MARC" format; works safely all the time
        tags_out = []
        for tag in tags_in:
            if len(tag) == 0:
                for i in range(0,10):
                    for j in range(0,10):
                        tags_out.append("%d%d%%" % (i, j))
            elif len(tag) == 1:
                for j in range(0,10):
                    tags_out.append("%s%d%%" % (tag, j))
            elif len(tag) < 5:
                tags_out.append("%s%%" % tag)
            else:
                # 5 or more characters: keep the tag and indicators only
                tags_out.append(tag[0:5])
        # search all bibXXx tables as needed:
        for tag in tags_out:
            digits = tag[0:2]
            if tag.startswith("001") or tag.startswith("00%"):
                if out:
                    out += "\n"
                out += "%09d %s %d" % (recID, "001__", recID)
            bx = "bib%sx" % digits
            bibx = "bibrec_bib%sx" % digits
            query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\
                    "WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '%s%%' "\
                    "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx, recID, tag)
            res = run_sql(query)
            # go through fields:
            field_number_old = -999
            field_old = ""
            for row in res:
                field, value, field_number = row[0], row[1], row[2]
                ind1, ind2 = field[3], field[4]
                if ind1 == "_":
                    ind1 = ""
                if ind2 == "_":
                    ind2 = ""
                # print field tag
                if field_number != field_number_old or field[:-1] != field_old[:-1]:
                    if out:
                        out += "\n"
                    out += "%09d %s " % (recID, field[:5])
                    field_number_old = field_number
                    field_old = field
                # print subfield value
                out += "$$%s%s" % (field[-1:], value)
    return out

def record_exists(recID):
    """Return 1 if record RECID exists.
       Return 0 if it doesn't exist.
       Return -1 if it exists but is marked as deleted."""
    out = 0
    query = "SELECT id FROM bibrec WHERE id='%s'" % recID
    res = run_sql(query, None, 1)
    if res:
        # record exists; now check whether it isn't marked as deleted:
        dbcollids = get_fieldvalues(recID, "980__%")
        if ("DELETED" in dbcollids) or (cfg_cern_site and "DUMMY" in dbcollids):
            out = -1 # exists, but marked as deleted
        else:
            out = 1 # exists fine
    return out

def record_public_p(recID):
    """Return 1 if the record is public, i.e. if it can be found in the Home collection.
       Return 0 otherwise.
    """
    return get_collection_reclist(cdsname).contains(recID)

def get_creation_date(recID, fmt="%Y-%m-%d"):
    "Returns the creation date of the record 'recID'."
    out = ""
    res = run_sql("SELECT DATE_FORMAT(creation_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1)
    if res:
        out = res[0][0]
    return out

def get_modification_date(recID, fmt="%Y-%m-%d"):
    "Returns the date of last modification for the record 'recID'."
    out = ""
    res = run_sql("SELECT DATE_FORMAT(modification_date,%s) FROM bibrec WHERE id=%s", (fmt, recID), 1)
    if res:
        out = res[0][0]
    return out

def print_warning(req, msg, type='', prologue='<br>', epilogue='<br>'):
    "Prints warning message and flushes output."
    if req and msg:
        req.write(websearch_templates.tmpl_print_warning(
                   msg = msg,
                   type = type,
                   prologue = prologue,
                   epilogue = epilogue,
                 ))
        return

def print_search_info(p, f, sf, so, sp, rm, of, ot, collection=cdsname, nb_found=-1,
                      jrec=1, rg=10, as=0, ln=cdslang, p1="", p2="", p3="", f1="", f2="", f3="",
                      m1="", m2="", m3="", op1="", op2="", sc=1, pl_in_url="",
                      d1y=0, d1m=0, d1d=0, d2y=0, d2m=0, d2d=0,
                      cpu_time=-1, middle_only=0):
    """Prints stripe with the information on 'collection' and 'nb_found' results and CPU time.
       Also, prints navigation links (beg/next/prev/end) inside the results set.
       If middle_only is set to 1, it will only print the middle box information (beg/next/prev/end/etc) links.
       This is suitable for displaying navigation links at the bottom of the search results page."""

    out = ""

    # sanity check:
    if jrec < 1:
        jrec = 1
    if jrec > nb_found:
        jrec = max(nb_found-rg+1, 1)

    return websearch_templates.tmpl_print_search_info(
             ln = ln,
             weburl = weburl,
             collection = collection,
             as = as,
             collection_name = get_coll_i18nname(collection, ln),
             middle_only = middle_only,
             rg = rg,
             nb_found = nb_found,
             sf = sf, so = so, rm = rm, of = of, ot = ot,
             p = p, f = f,
             p1 = p1, p2 = p2, p3 = p3,
             f1 = f1, f2 = f2, f3 = f3,
             m1 = m1, m2 = m2, m3 = m3,
             op1 = op1, op2 = op2,
             pl_in_url = pl_in_url,
             d1y = d1y, d1m = d1m, d1d = d1d,
             d2y = d2y, d2m = d2m, d2d = d2d,
             jrec = jrec,
             sc = sc,
             sp = sp,
             all_fieldcodes = get_fieldcodes(),
             cpu_time = cpu_time,
           )

def print_results_overview(req, colls, results_final_nb_total, results_final_nb, cpu_time, ln=cdslang):
    "Prints results overview box with links to particular collections below."
    out = ""
    new_colls = []
    for coll in colls:
        new_colls.append({
                          'code' : coll,
                          'name' : get_coll_i18nname(coll, ln),
                        })

    # deduce url without 'of' argument:
    args = req.args or ''
    url_args = sre.sub(r'(^|\&)of=.*?(\&|$)',r'\1',args)
    url_args = sre.sub(r'^\&+', '', url_args)
    url_args = sre.sub(r'\&+$', '', url_args)
    return websearch_templates.tmpl_print_results_overview(
             ln = ln,
             weburl = weburl,
             results_final_nb_total = results_final_nb_total,
             results_final_nb = results_final_nb,
             cpu_time = cpu_time,
             colls = new_colls,
             url_args = url_args,
           )

def sort_records(req, recIDs, sort_field='', sort_order='d', sort_pattern='', verbose=0):
    """Sort records in 'recIDs' list according to sort field 'sort_field' in order 'sort_order'.
       If more than one instance of 'sort_field' is found for a given record, try to choose
       the one given by 'sort_pattern', for example "sort by report number that starts by CERN-PS".
       Note that 'sort_field' can be field code like 'author' or MARC tag like '100__a' directly."""

    ## check arguments:
    if not sort_field:
        return recIDs
    if len(recIDs) > cfg_nb_records_to_sort:
        print_warning(req, "Sorry, sorting is allowed on sets of up to %d records only. Using default sort order (\"latest first\")." % cfg_nb_records_to_sort,"Warning")
        return recIDs

    sort_fields = string.split(sort_field, ",")
    recIDs_dict = {}
    recIDs_out = []

    ## first deduce sorting MARC tag out of the 'sort_field' argument:
    tags = []
    for sort_field in sort_fields:
        if sort_field and str(sort_field[0:2]).isdigit():
            # sort_field starts by two digits, so this is probably a MARC tag already
            tags.append(sort_field)
        else:
            # let us check the 'field' table (pass 'sort_field' as a query parameter):
            query = """SELECT DISTINCT(t.value) FROM tag AS t, field_tag AS ft, field AS f
                        WHERE f.code=%s AND ft.id_field=f.id AND t.id=ft.id_tag
                        ORDER BY ft.score DESC"""
            res = run_sql(query, (sort_field,))
            if res:
                for row in res:
                    tags.append(row[0])
            else:
                print_warning(req, "Sorry, '%s' does not seem to be a valid sort option. Choosing title sort instead."
                              % sort_field, "Error")
                tags.append("245__a")
    if verbose >= 3:
        print_warning(req, "Sorting by tags %s." % tags)
        if sort_pattern:
            print_warning(req, "Sorting preferentially by %s." % sort_pattern)

    ## check if we have sorting tag defined:
    if tags:
        # fetch the necessary field values:
        for recID in recIDs:
            val = "" # will hold value for recID according to which sort
            vals = [] # will hold all values found in sorting tag for recID
            for tag in tags:
                vals.extend(get_fieldvalues(recID, tag))
            if sort_pattern:
                # try to pick that tag value that corresponds to sort pattern
                bingo = 0
                for v in vals:
                    if v.startswith(sort_pattern): # bingo!
                        bingo = 1
                        val = v
                        break
                if not bingo: # sort_pattern not present, so add other vals after spaces
                    val = sort_pattern + " " + string.join(vals)
            else:
                # no sort pattern defined, so join them all together
                val = string.join(vals)
            val = val.lower()
            if recIDs_dict.has_key(val):
                recIDs_dict[val].append(recID)
            else:
                recIDs_dict[val] = [recID]
        # sort them:
        recIDs_dict_keys = recIDs_dict.keys()
        recIDs_dict_keys.sort()
        # now that keys are sorted, create output array:
        for k in recIDs_dict_keys:
            for s in recIDs_dict[k]:
                recIDs_out.append(s)
        # ascending or descending?  (print_records() displays the list
        # from its tail, so reversing it here yields ascending order)
        if sort_order == 'a':
            recIDs_out.reverse()
        # okay, we are done
        return recIDs_out
    else:
        # good, no sort needed
        return recIDs

def print_record_list_for_similarity_boxen(req, title, recID_score_list, ln=cdslang):
    """Print list of records in the "hs" (HTML Similarity) format for similarity boxes.
       FIXME: templatize.
    """

    recID_score_list_to_be_printed = []

    # firstly find 5 first public records to print:
    nb_records_to_be_printed = 0
    nb_records_seen = 0
    while nb_records_to_be_printed < 5 and nb_records_seen < len(recID_score_list) and nb_records_seen < 50:
        # looking through first 50 records only, picking first 5 public ones
        (recID, score) = recID_score_list[nb_records_seen]
        nb_records_seen += 1
        if record_public_p(recID):
            nb_records_to_be_printed += 1
            recID_score_list_to_be_printed.append([recID,score])

    # secondly print them:
    if nb_records_to_be_printed > 0:
        req.write("""<p><table>""")
        req.write("""<tr><td>%s</td></tr>""" % title)
        req.write("""<tr><td><table>""")
        for (recID, score) in recID_score_list_to_be_printed:
            req.write("""<tr><td>(%s)&nbsp;&nbsp;%s</td></tr>""" % \
                      (score,print_record(recID, format="hs", ln=ln)))
        req.write("""</table></td></tr></table>""")
    return

def print_records(req, recIDs, jrec=1, rg=10, format='hb', ot='', ln=cdslang, relevances=[], relevances_prologue="(", relevances_epilogue="%%)", decompress=zlib.decompress):
    """Prints list of records 'recIDs' formatted according to 'format' in groups of 'rg' starting from 'jrec'.
       Assumes that the input list 'recIDs' is sorted in reverse order, so it counts records from tail to head.
       A value of 'rg=-9999' means to print all records: to be used with care.

       Print also list of RELEVANCES for each record (if defined), in between RELEVANCE_PROLOGUE and RELEVANCE_EPILOGUE.
    """

    # load the right message language
    _ = gettext_set_language(ln)

    # sanity checking:
    if req == None:
        return

    if len(recIDs):
        nb_found = len(recIDs)

        if rg == -9999: # print all records
            rg = nb_found
        else:
            rg = abs(rg)
        if jrec < 1: # sanity checks
            jrec = 1
        if jrec > nb_found:
            jrec = max(nb_found-rg+1, 1)

        # will print records from irec_max to irec_min excluded:
        irec_max = nb_found - jrec
        irec_min = nb_found - jrec - rg
        if irec_min < 0:
            irec_min = -1
        if irec_max >= nb_found:
            irec_max = nb_found - 1

        #req.write("%s:%d-%d" % (recIDs, irec_min, irec_max))

        if format.startswith('x'):
            # we are doing XML output:
            for irec in range(irec_max,irec_min,-1):
                req.write(print_record(recIDs[irec], format, ot, ln))

        elif format.startswith('t') or str(format[0:3]).isdigit():
            # we are doing plain text output:
            for irec in range(irec_max,irec_min,-1):
                x = print_record(recIDs[irec], format, ot, ln)
                req.write(x)
                if x:
                    req.write('\n')
        else:
            # we are doing HTML output:
            if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"):
                # portfolio and on-the-fly formats:
                for irec in range(irec_max,irec_min,-1):
                    req.write(print_record(recIDs[irec], format, ot, ln))
            elif format.startswith("hb"):
                # HTML brief format:
                rows = []
                for irec in range(irec_max,irec_min,-1):
                    temp = {
                        'number' : jrec+irec_max-irec,
                        'recid' : recIDs[irec],
                    }
                    if relevances and relevances[irec]:
                        temp['relevance'] = relevances[irec]
                    else:
                        temp['relevance'] = ''
                    temp['record'] = print_record(recIDs[irec], format, ot, ln)
                    rows.append(temp)
                req.write(websearch_templates.tmpl_records_format_htmlbrief(
                            ln = ln,
                            weburl = weburl,
                            rows = rows,
                            relevances_prologue = relevances_prologue,
                            relevances_epilogue = relevances_epilogue,
                          ))
            else:
                # HTML detailed format:
                # deduce url without 'of' argument:
                url_args = sre.sub(r'(^|\&)of=.*?(\&|$)',r'\1',req.args or '')
                url_args = sre.sub(r'^\&+', '', url_args)
                url_args = sre.sub(r'\&+$', '', url_args)
                # print other formatting choices:
                rows = []
                for irec in range(irec_max,irec_min,-1):
                    temp = {
                        'record' : print_record(recIDs[irec], format, ot, ln),
                        'recid' : recIDs[irec],
                        'creationdate': '',
                        'modifydate' : '',
                    }
                    if record_exists(recIDs[irec])==1:
                        temp['creationdate'] = get_creation_date(recIDs[irec])
                        temp['modifydate'] = get_modification_date(recIDs[irec])

                    if cfg_experimental_features:
                        r = calculate_cited_by_list(recIDs[irec])
                        if r:
                            temp ['citinglist'] = r
                            temp ['citationhistory'] = create_citation_history_graph_and_box(recIDs[irec], ln)

                        r = calculate_co_cited_with_list(recIDs[irec])
                        if r:
                            temp ['cociting'] = r

                        r = calculate_reading_similarity_list(recIDs[irec], "downloads")
                        if r:
                            temp ['downloadsimilarity'] = r
                        temp ['downloadhistory'] = create_download_history_graph_and_box(recIDs[irec], ln)

                    # Get comments and reviews for this record if exist
                    # FIXME: templatize me
                    if cfg_webcomment_allow_comments or cfg_webcomment_allow_reviews:
                        from webcomment import get_first_comments_or_remarks
                        (comments, reviews) = get_first_comments_or_remarks(recID=recIDs[irec], ln=ln,
                                                                            nb_comments=cfg_webcomment_nb_comments_in_detailed_view,
                                                                            nb_reviews=cfg_webcomment_nb_reviews_in_detailed_view)
                        temp['comments'] = comments
                        temp['reviews'] = reviews

                    r = calculate_reading_similarity_list(recIDs[irec], "pageviews")
                    if r:
                        temp ['viewsimilarity'] = r

                    rows.append(temp)

                req.write(websearch_templates.tmpl_records_format_other(
                            ln = ln,
                            weburl = weburl,
                            url_args = url_args,
                            rows = rows,
                            format = format,
                          ))
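        # Worked example of the paging window used above (illustrative
        # numbers only): with nb_found=25, jrec=11 and rg=10 we get
        # irec_max = 25-11 = 14 and irec_min = 25-11-10 = 4, so every
        # loop visits recIDs[14], recIDs[13], ..., recIDs[5] (ten records,
        # walking the list from its tail), and the brief format numbers
        # them jrec+irec_max-irec = 11, 12, ..., 20.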
    else:
        print_warning(req, _("Use different search terms."))

def print_record(recID, format='hb', ot='', ln=cdslang, decompress=zlib.decompress):
    "Prints record 'recID' formatted according to 'format'."

    _ = gettext_set_language(ln)

    out = ""

    # sanity check:
    record_exist_p = record_exists(recID)
    if record_exist_p == 0: # doesn't exist
        return out

    # print record opening tags, if needed:
    if format == "marcxml" or format == "oai_dc":
        out += "  <record>\n"
        out += "   <header>\n"
        for id in get_fieldvalues(recID,oaiidfield):
            out += "    <identifier>%s</identifier>\n" % id
        out += "    <datestamp>%s</datestamp>\n" % get_modification_date(recID)
        out += "   </header>\n"
        out += "   <metadata>\n"

    if format.startswith("xm") or format == "marcxml":
        # look for detailed format existence:
        query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format)
        res = run_sql(query, None, 1)
        if res and record_exist_p==1:
            # record 'recID' is formatted in 'format', so print it
            out += "%s" % decompress(res[0][0])
        else:
            # record 'recID' is not formatted in 'format' -- they are not in "bibfmt" table; so fetch all the data from "bibXXx" tables:
            if format == "marcxml":
                out += """    <record xmlns="http://www.loc.gov/MARC21/slim">\n"""
                out += "        <controlfield tag=\"001\">%d</controlfield>\n" % int(recID)
            elif format.startswith("xm"):
                out += """    <record>\n"""
                out += "        <controlfield tag=\"001\">%d</controlfield>\n" % int(recID)
            if record_exist_p == -1:
                # deleted record, so display only OAI ID and 980:
                oai_ids = get_fieldvalues(recID, cfg_oaiidtag)
                if oai_ids:
                    out += "<datafield tag=\"%s\" ind1=\"%s\" ind2=\"%s\"><subfield code=\"%s\">%s</subfield></datafield>\n" % \
                           (cfg_oaiidtag[0:3], cfg_oaiidtag[3:4], cfg_oaiidtag[4:5], cfg_oaiidtag[5:6], oai_ids[0])
                out += "<datafield tag=\"980\" ind1=\"\" ind2=\"\"><subfield code=\"c\">DELETED</subfield></datafield>\n"
            else:
                for digit1 in range(0,10):
                    for digit2 in range(0,10):
                        bx = "bib%d%dx" % (digit1, digit2)
                        bibx = "bibrec_bib%d%dx" % (digit1, digit2)
                        query = "SELECT b.tag,b.value,bb.field_number FROM %s AS b, %s AS bb "\
                                "WHERE bb.id_bibrec='%s' AND b.id=bb.id_bibxxx AND b.tag LIKE '%s%%' "\
                                "ORDER BY bb.field_number, b.tag ASC" % (bx, bibx, recID, str(digit1)+str(digit2))
                        res = run_sql(query)
                        field_number_old = -999
                        field_old = ""
                        for row in res:
                            field, value, field_number = row[0], row[1], row[2]
                            ind1, ind2 = field[3], field[4]
                            if ind1 == "_":
                                ind1 = ""
                            if ind2 == "_":
                                ind2 = ""
                            # print field tag
                            if field_number != field_number_old or field[:-1] != field_old[:-1]:
                                if format.startswith("xm") or format == "marcxml":
                                    fieldid = encode_for_xml(field[0:3])
                                    if field_number_old != -999:
                                        out += """        </datafield>\n"""
                                    out += """        <datafield tag="%s" ind1="%s" ind2="%s">\n""" % \
                                           (encode_for_xml(field[0:3]), encode_for_xml(ind1), encode_for_xml(ind2))
                                field_number_old = field_number
                                field_old = field
                            # print subfield value
                            if format.startswith("xm") or format == "marcxml":
                                value = encode_for_xml(value)
                                out += """            <subfield code="%s">%s</subfield>\n""" % (encode_for_xml(field[-1:]), value)

                        # all fields/subfields printed in this run, so close the tag:
                        if (format.startswith("xm") or format == "marcxml") and field_number_old != -999:
                            out += """        </datafield>\n"""
            # we are at the end of printing the record:
            if format.startswith("xm") or format == "marcxml":
                out += "    </record>\n"

    elif format == "xd" or format == "oai_dc":
        # XML Dublin Core format, possibly OAI -- select only some bibXXx fields:
        out += """    <dc xmlns="http://purl.org/dc/elements/1.1/">\n"""
        if record_exist_p == -1:
            out += ""
        else:
            for f in get_fieldvalues(recID, "041__a"):
                out += "        <language>%s</language>\n" % f

            for f in get_fieldvalues(recID, "100__a"):
                out += "        <creator>%s</creator>\n" % encode_for_xml(f)

            for f in get_fieldvalues(recID, "700__a"):
                out += "        <creator>%s</creator>\n" % encode_for_xml(f)

            for f in get_fieldvalues(recID, "245__a"):
                out += "        <title>%s</title>\n" % encode_for_xml(f)

            for f in get_fieldvalues(recID, "65017a"):
                out += "        <subject>%s</subject>\n" % encode_for_xml(f)

            for f in get_fieldvalues(recID, "8564_u"):
                out += "        <identifier>%s</identifier>\n" % encode_for_xml(f)

            for f in get_fieldvalues(recID, "520__a"):
                out += "        <description>%s</description>\n" % encode_for_xml(f)

            out += "        <date>%s</date>\n" % get_creation_date(recID)
        out += "    </dc>\n"

    elif format.startswith("x_"):
        # underscore means that XML formats should be called on-the-fly; suitable for testing formats
        out += call_bibformat(recID, format)

    elif str(format[0:3]).isdigit():
        # user has asked to print some fields only
        if format == "001":
            out += "<%s>%s</%s>\n" % (format, recID, format)
        else:
            vals = get_fieldvalues(recID, format)
            for val in vals:
                out += "<%s>%s</%s>\n" % (format, val, format)

    elif format.startswith('t'):
        ## user directly asked for some tags to be displayed only
        if record_exist_p == -1:
            out += get_fieldvalues_alephseq_like(recID, "001,%s,980" % cfg_oaiidtag)
        else:
            out += get_fieldvalues_alephseq_like(recID, ot)

    elif format == "hm":
        if record_exist_p == -1:
            out += "<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, "001,%s,980" % cfg_oaiidtag)) + "</pre>"
        else:
            out += "<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot)) + "</pre>"

    elif format.startswith("h") and ot:
        ## user directly asked for some tags to be displayed only
        if record_exist_p == -1:
            out += "<pre>" + get_fieldvalues_alephseq_like(recID, "001,%s,980" % cfg_oaiidtag) + "</pre>"
        else:
            out += "<pre>" + get_fieldvalues_alephseq_like(recID, ot) + "</pre>"

    elif format == "hd":
        # HTML detailed format
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            # look for detailed format existence:
            query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format)
            res = run_sql(query, None, 1)
            if res:
                # record 'recID' is formatted in 'format', so print it
                out += "%s" % decompress(res[0][0])
            else:
                # record 'recID' is not formatted in 'format', so either call BibFormat on the fly or use default format
                # second, see if we are calling BibFormat on the fly:
                if cfg_call_bibformat:
                    out += call_bibformat(recID)
                else:
                    out += websearch_templates.tmpl_print_record_detailed(
                             ln = ln,
                             recID = recID,
                             weburl = weburl,
                           )

    elif format.startswith("hb_") or format.startswith("hd_"):
        # underscore means that HTML brief/detailed formats should be called on-the-fly; suitable for testing formats
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            out += call_bibformat(recID, format)

    elif format.startswith("hx"):
        # BibTeX format, called on the fly:
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            out += call_bibformat(recID, format)

    elif format.startswith("hs"):
        # for citation/download similarity navigation links:
        if record_exist_p == -1:
            out += _("The record has been deleted.")
        else:
            out += """<a href="%s/search.py?recid=%s&amp;ln=%s">""" % (weburl, recID, ln)
            # firstly, title:
            titles = get_fieldvalues(recID, "245__a")
            if titles:
                for title in titles:
                    out += "%s" % title
            else:
                # usual title not found, try conference title:
                titles = get_fieldvalues(recID, "111__a")
                if titles:
                    for title in titles:
                        out += "%s" % title
                else:
                    # just print record ID:
                    out += "%s %d" % (get_field_i18nname("record ID", ln), recID)
            out += "</a>"
            # secondly, authors:
            authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a")
            if authors:
                out += " - %s" % authors[0]
                if len(authors) > 1:
                    out += " et al"
            # thirdly publication info:
            publinfos = get_fieldvalues(recID, "773__s")
            if not publinfos:
                publinfos = 
get_fieldvalues(recID, "909C4s") if not publinfos: publinfos = get_fieldvalues(recID, "037__a") if not publinfos: publinfos = get_fieldvalues(recID, "088__a") if publinfos: out += " - %s" % publinfos[0] else: # fourthly publication year (if not publication info): years = get_fieldvalues(recID, "773__y") if not years: years = get_fieldvalues(recID, "909C4y") if not years: years = get_fieldvalues(recID, "260__c") if years: out += " (%s)" % years[0] else: # HTML brief format by default if record_exist_p == -1: out += _("The record has been deleted.") else: query = "SELECT value FROM bibfmt WHERE id_bibrec='%s' AND format='%s'" % (recID, format) res = run_sql(query) if res: # record 'recID' is formatted in 'format', so print it out += "%s" % decompress(res[0][0]) else: out += websearch_templates.tmpl_print_record_brief( ln = ln, recID = recID, weburl = weburl, ) # at the end of HTML brief mode, print the "Detailed record" functionality: if format == 'hp' or format.startswith("hb_") or format.startswith("hd_"): pass # do nothing for portfolio and on-the-fly formats else: out += websearch_templates.tmpl_print_record_brief_links( ln = ln, recID = recID, weburl = weburl, ) # print record closing tags, if needed: if format == "marcxml" or format == "oai_dc": out += "
\n" out += " \n" return out def encode_for_xml(s): "Encode special chars in string so that it is XML-compliant." s = string.replace(s, '&', '&amp;') s = string.replace(s, '<', '&lt;') return s def call_bibformat(id, otype="HD"): """Calls BibFormat for the record 'id'. Desired BibFormat output type is passed in 'otype' argument. This function is mainly used to display the full format if it is not stored in the 'bibfmt' table.""" pipe_input, pipe_output, pipe_error = os.popen3(["%s/bibformat" % bindir, "otype=%s" % otype], 'rw') pipe_input.write(print_record(id, "xm")) pipe_input.close() out = pipe_output.read() pipe_output.close() pipe_error.close() return out def log_query(hostname, query_args, uid=-1): """Log query into the query and user_query tables.""" if uid > 0: # log the query only if uid is reasonable res = run_sql("SELECT id FROM query WHERE urlargs=%s", (query_args,), 1) try: id_query = res[0][0] except: id_query = run_sql("INSERT INTO query (type, urlargs) VALUES ('r', %s)", (query_args,)) if id_query: run_sql("INSERT INTO user_query (id_user, id_query, hostname, date) VALUES (%s, %s, %s, %s)", (uid, id_query, hostname, time.strftime("%04Y-%02m-%02d %02H:%02M:%02S", time.localtime()))) return def log_query_info(action, p, f, colls, nb_records_found_total=-1): """Write some info to the log file for later analysis.""" try: log = open(logdir + "/search.log", "a") log.write(time.strftime("%04Y%02m%02d%02H%02M%02S#", time.localtime())) log.write(action+"#") log.write(p+"#") log.write(f+"#") for coll in colls[:-1]: log.write("%s," % coll) log.write("%s#" % colls[-1]) log.write("%d" % nb_records_found_total) log.write("\n") log.close() except: pass return def wash_url_argument(var, new_type): """Wash list argument into 'new_type', that can be 'list', 'str', or 'int'.
Useful for washing arguments passed by mod_python, which are all lists of strings (URL args may be repeated), but we sometimes want to take only the first value, and sometimes to represent it as a string or a numerical value.""" out = [] if new_type == 'list': # return lst if type(var) is list: out = var else: out = [var] elif new_type == 'str': # return str if type(var) is list: try: out = "%s" % var[0] except: out = "" elif type(var) is str: out = var else: out = "%s" % var elif new_type == 'int': # return int if type(var) is list: try: out = string.atoi(var[0]) except: out = 0 elif type(var) is int: out = var elif type(var) is str: try: out = string.atoi(var) except: out = 0 else: out = 0 return out ### CALLABLES def perform_request_search(req=None, cc=cdsname, c=None, p="", f="", rg="10", sf="", so="d", sp="", rm="", of="id", ot="", as="0", p1="", f1="", m1="", op1="", p2="", f2="", m2="", op2="", p3="", f3="", m3="", sc="0", jrec="0", recid="-1", recidb="-1", sysno="", id="-1", idb="-1", sysnb="", action="", d1y="0", d1m="0", d1d="0", d2y="0", d2m="0", d2d="0", verbose="0", ap="0", ln=cdslang): """Perform search or browse request, without checking for authentication. Return the list of recIDs found if of=id; otherwise create a web page. The arguments are as follows: req - mod_python Request class instance. cc - current collection (e.g. "ATLAS"). The collection the user started to search/browse from. c - collection list (e.g. ["Theses", "Books"]). The collections the user may have selected/deselected when starting to search from 'cc'. p - pattern to search for (e.g. "ellis and muon or kaon"). f - field to search within (e.g. "author"). rg - records in groups of (e.g. "10"). Defines how many hits per collection are displayed in the search results page. sf - sort field (e.g. "title"). so - sort order ("a"=ascending, "d"=descending). sp - sort pattern (e.g. "CERN-") -- in case there are more values in a sort field, this argument tells which one to prefer. rm - ranking method (e.g.
"jif"). Defines whether results should be ranked by some known ranking method. of - output format (e.g. "hb"). Usually, a leading "h" means HTML output ("hb" for HTML brief, "hd" for HTML detailed), "x" means XML output, "t" means plain text output, and "id" means no output at all; the list of recIDs found is returned instead. (Suitable for high-level API.) ot - output only these MARC tags (e.g. "100,700,909C0b"). Useful if only some fields are to be shown in the output, e.g. for librarians checking some fields. as - advanced search ("0" means no, "1" means yes). Whether search was called from within the advanced search interface. p1 - first pattern to search for in the advanced search interface. Much like 'p'. f1 - first field to search within in the advanced search interface. Much like 'f'. m1 - first matching type in the advanced search interface. ("a" all of the words, "o" any of the words, "e" exact phrase, "p" partial phrase, "r" regular expression). op1 - first operator, to join the first and the second unit in the advanced search interface. ("a" and, "o" or, "n" not). p2 - second pattern to search for in the advanced search interface. Much like 'p'. f2 - second field to search within in the advanced search interface. Much like 'f'. m2 - second matching type in the advanced search interface. ("a" all of the words, "o" any of the words, "e" exact phrase, "p" partial phrase, "r" regular expression). op2 - second operator, to join the second and the third unit in the advanced search interface. ("a" and, "o" or, "n" not). p3 - third pattern to search for in the advanced search interface. Much like 'p'. f3 - third field to search within in the advanced search interface. Much like 'f'. m3 - third matching type in the advanced search interface. ("a" all of the words, "o" any of the words, "e" exact phrase, "p" partial phrase, "r" regular expression). sc - split by collection ("0" no, "1" yes).
Governs whether we want to present the results in a single huge list, or split by collection. jrec - jump to record (e.g. "234"). Used for navigation inside the search results. recid - display record ID (e.g. "20000"). Do not search/browse but go straight to the Detailed record page for the given recID. recidb - display record ID bis (e.g. "20010"). If greater than 'recid', then display records from recid up to (but not including) recidb. Useful, for example, for dumping records from the database for reformatting. sysno - display old system SYS number (e.g. ""). If you migrate to CDSware from another system and store your old SYS call numbers, you can use them instead of recid if you wish. id - the same as recid, in case recid is not set. For backwards compatibility. idb - the same as recidb, in case recidb is not set. For backwards compatibility. sysnb - the same as sysno, in case sysno is not set. For backwards compatibility. action - action to do. "SEARCH" for searching, "Browse" for browsing. Default is to search. d1y - first date year (e.g. "1998"). Useful for search limits on creation date. d1m - first date month (e.g. "08"). Useful for search limits on creation date. d1d - first date day (e.g. "23"). Useful for search limits on creation date. d2y - second date year (e.g. "1998"). Useful for search limits on creation date. d2m - second date month (e.g. "09"). Useful for search limits on creation date. d2d - second date day (e.g. "02"). Useful for search limits on creation date. verbose - verbose level (0=min, 9=max). Useful to print some internal information on the searching process in case something goes wrong. ap - alternative patterns (0=no, 1=yes). In case no exact match is found, the search engine can try alternative patterns, e.g. by replacing non-alphanumeric characters with a boolean query. ap defines whether this is wanted. ln - language of the search interface (e.g. "en"). Useful for internationalization.
""" # wash all passed arguments: cc = wash_url_argument(cc, 'str') sc = wash_url_argument(sc, 'int') (cc, colls_to_display, colls_to_search) = wash_colls(cc, c, sc) # which colls to search and to display? p = wash_pattern(wash_url_argument(p, 'str')) f = wash_field(wash_url_argument(f, 'str')) rg = wash_url_argument(rg, 'int') sf = wash_url_argument(sf, 'str') so = wash_url_argument(so, 'str') sp = wash_url_argument(sp, 'str') rm = wash_url_argument(rm, 'str') of = wash_url_argument(of, 'str') if type(ot) is list: ot = string.join(ot,",") ot = wash_url_argument(ot, 'str') as = wash_url_argument(as, 'int') p1 = wash_pattern(wash_url_argument(p1, 'str')) f1 = wash_field(wash_url_argument(f1, 'str')) m1 = wash_url_argument(m1, 'str') op1 = wash_url_argument(op1, 'str') p2 = wash_pattern(wash_url_argument(p2, 'str')) f2 = wash_field(wash_url_argument(f2, 'str')) m2 = wash_url_argument(m2, 'str') op2 = wash_url_argument(op2, 'str') p3 = wash_pattern(wash_url_argument(p3, 'str')) f3 = wash_field(wash_url_argument(f3, 'str')) m3 = wash_url_argument(m3, 'str') jrec = wash_url_argument(jrec, 'int') recid = wash_url_argument(recid, 'int') recidb = wash_url_argument(recidb, 'int') sysno = wash_url_argument(sysno, 'str') id = wash_url_argument(id, 'int') idb = wash_url_argument(idb, 'int') sysnb = wash_url_argument(sysnb, 'str') action = wash_url_argument(action, 'str') d1y = wash_url_argument(d1y, 'int') d1m = wash_url_argument(d1m, 'int') d1d = wash_url_argument(d1d, 'int') d2y = wash_url_argument(d2y, 'int') d2m = wash_url_argument(d2m, 'int') d2d = wash_url_argument(d2d, 'int') day1, day2 = wash_dates(d1y, d1m, d1d, d2y, d2m, d2d) verbose = wash_url_argument(verbose, 'int') ap = wash_url_argument(ap, 'int') ln = wash_language(ln) _ = gettext_set_language(ln) # backwards compatibility: id, idb, sysnb -> recid, recidb, sysno (if applicable) if sysnb != "" and sysno == "": sysno = sysnb if id > 0 and recid == -1: recid = id if idb > 0 and recidb == -1: recidb = idb # TODO 
deduce passed search limiting criteria (if applicable) pl, pl_in_url = "", "" # no limits by default if action != _("Browse") and req and req.args: # we do not want to add options while browsing or while calling via command-line fieldargs = cgi.parse_qs(req.args) for fieldcode in get_fieldcodes(): if fieldargs.has_key(fieldcode): for val in fieldargs[fieldcode]: pl += "+%s:\"%s\" " % (fieldcode, val) pl_in_url += "&%s=%s" % (urllib.quote(fieldcode), urllib.quote(val)) # deduce recid from sysno argument (if applicable): if sysno: # ALEPH SYS number was passed, so deduce MySQL recID for the record: recid = get_mysql_recid_from_aleph_sysno(sysno) # deduce collection we are in (if applicable): if recid>0: cc = guess_primary_collection_of_a_record(recid) # deduce user id (if applicable): try: uid = getUid(req) except: uid = 0 ## 0 - start output if recid>0: ## 1 - detailed record display page_start(req, of, cc, as, ln, uid, _("Detailed record") + " #%d" % recid) if of == "hb": of = "hd" if record_exists(recid): if recidb<=recid: # sanity check recidb=recid+1 print_records(req, range(recid,recidb), -1, -9999, of, ot, ln) if req and of.startswith("h"): # register detailed record page view event client_ip_address = str(req.get_remote_host(apache.REMOTE_NOLOOKUP)) register_page_view_event(recid, uid, client_ip_address) else: # record does not exist if of.startswith("h"): print_warning(req, "Requested record does not seem to exist.") elif action == _("Browse"): ## 2 - browse needed page_start(req, of, cc, as, ln, uid, _("Browse")) if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, action)) try: if as==1 or (p1 or p2 or p3): browse_pattern(req, colls_to_search, p1, f1, rg) browse_pattern(req, colls_to_search, p2, f2, rg) browse_pattern(req, colls_to_search, p3, f3, rg) else: browse_pattern(req, colls_to_search, p, f, rg) except:
if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) return page_end(req, of, ln) elif rm and p.startswith("recid:"): ## 3-ter - similarity search needed page_start(req, of, cc, as, ln, uid, _("Search Results")) if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, action)) if record_exists(p[6:]) != 1: # record does not exist if of.startswith("h"): print_warning(req, "Requested record does not seem to exist.") if of == "id": return [] else: # record well exists, so find similar ones to it t1 = os.times()[4] results_similar_recIDs, results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue, results_similar_comments = \ rank_records(rm, 0, get_collection_reclist(cdsname), string.split(p), verbose) if results_similar_recIDs: t2 = os.times()[4] cpu_time = t2 - t1 if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, cdsname, len(results_similar_recIDs), jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, cpu_time)) print_warning(req, results_similar_comments) print_records(req, results_similar_recIDs, jrec, rg, of, ot, ln, results_similar_relevances, results_similar_relevances_prologue, results_similar_relevances_epilogue) elif of=="id": return results_similar_recIDs else: # rank_records failed and returned some error message to display: if of.startswith("h"): print_warning(req, results_similar_relevances_prologue) print_warning(req, results_similar_relevances_epilogue) print_warning(req, results_similar_comments) if of == "id": return [] elif cfg_experimental_features and p.startswith("cocitedwith:"): ## 3-terter - cited by search needed page_start(req, of, cc, as, ln, uid, _("Search Results")) if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, 
rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, action)) recID = p[12:] if record_exists(recID) != 1: # record does not exist if of.startswith("h"): print_warning(req, "Requested record does not seem to exist.") if of == "id": return [] else: # record well exists, so find co-cited ones: t1 = os.times()[4] results_cocited_recIDs = map(lambda x: x[0], calculate_co_cited_with_list(int(recID))) if results_cocited_recIDs: t2 = os.times()[4] cpu_time = t2 - t1 if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, cdsname, len(results_cocited_recIDs), jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, cpu_time)) print_records(req, results_cocited_recIDs, jrec, rg, of, ot, ln) elif of=="id": return results_cocited_recIDs else: # cited rank_records failed and returned some error message to display: if of.startswith("h"): print_warning(req, "nothing found") if of == "id": return [] else: ## 3 - common search needed page_start(req, of, cc, as, ln, uid, _("Search Results")) if of.startswith("h"): req.write(create_search_box(cc, colls_to_display, p, f, rg, sf, so, sp, rm, of, ot, as, ln, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, pl, d1y, d1m, d1d, d2y, d2m, d2d, action)) t1 = os.times()[4] results_in_any_collection = HitSet() if as == 1 or (p1 or p2 or p3): ## 3A - advanced search try: results_in_any_collection = search_pattern(req, p1, f1, m1, ap=ap, of=of, verbose=verbose, ln=ln) if results_in_any_collection._nbhits == 0: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if p2: results_tmp = search_pattern(req, p2, f2, m2, ap=ap, of=of, verbose=verbose, ln=ln) if op1 == "a": # add results_in_any_collection.intersect(results_tmp) elif op1 == "o": # or results_in_any_collection.union(results_tmp) elif op1 == "n": # not 
results_in_any_collection.difference(results_tmp) else: if of.startswith("h"): print_warning(req, "Invalid set operation %s." % op1, "Error") results_in_any_collection.calculate_nbhits() if results_in_any_collection._nbhits == 0: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if p3: results_tmp = search_pattern(req, p3, f3, m3, ap=ap, of=of, verbose=verbose, ln=ln) if op2 == "a": # add results_in_any_collection.intersect(results_tmp) elif op2 == "o": # or results_in_any_collection.union(results_tmp) elif op2 == "n": # not results_in_any_collection.difference(results_tmp) else: if of.startswith("h"): print_warning(req, "Invalid set operation %s." % op2, "Error") results_in_any_collection.calculate_nbhits() except: if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) else: ## 3B - simple search try: results_in_any_collection = search_pattern(req, p, f, ap=ap, of=of, verbose=verbose, ln=ln) except: if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if results_in_any_collection._nbhits == 0: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) # search_cache_key = p+"@"+f+"@"+string.join(colls_to_search,",") # if search_cache.has_key(search_cache_key): # is the result in search cache? # results_final = search_cache[search_cache_key] # else: # results_final = search_pattern(req, p, f, colls_to_search) # search_cache[search_cache_key] = results_final # if len(search_cache) > cfg_search_cache_size: # is the cache full? 
(sanity cleaning) # search_cache.clear() # search stage 4: intersection with collection universe: try: results_final = intersect_results_with_collrecs(req, results_in_any_collection, colls_to_search, ap, of, verbose, ln) except: if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if results_final == {}: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) # search stage 5: apply search option limits and restrictions: if day1 != "": try: results_final = intersect_results_with_hitset(req, results_final, search_unit_in_bibrec(day1, day2), ap, aptext= _("No match within your time limits, " "discarding this condition...")) except: if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if results_final == {}: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if pl: try: results_final = intersect_results_with_hitset(req, results_final, search_pattern(req, pl, ap=0, ln=ln), ap, aptext=_("No match within your search limits, " "discarding this condition...")) except: if of.startswith("h"): req.write(create_error_box(req, verbose=verbose, ln=ln)) req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) if results_final == {}: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) t2 = os.times()[4] cpu_time = t2 - t1 ## search stage 6: display results: results_final_nb_total = 0 results_final_nb = {} # will hold number of records found in each collection # (in simple dict to display overview more easily; may refactor later) for coll in results_final.keys(): results_final_nb[coll] = results_final[coll]._nbhits results_final_nb_total += results_final_nb[coll] if 
results_final_nb_total == 0: if of.startswith('h'): print_warning(req, "No match found, please enter different search terms.") else: # yes, some hits found: good! # collection list may have changed due to not-exact-match-found policy so check it out: for coll in results_final.keys(): if coll not in colls_to_search: colls_to_search.append(coll) # print results overview: if of == "id": # we have been asked to return list of recIDs results_final_for_all_colls = HitSet() for coll in results_final.keys(): results_final_for_all_colls.union(results_final[coll]) recIDs = results_final_for_all_colls.items().tolist() if sf: # do we have to sort? recIDs = sort_records(req, recIDs, sf, so, sp, verbose) elif rm: # do we have to rank? results_final_for_all_colls_rank_records_output = rank_records(rm, 0, results_final_for_all_colls, string.split(p) + string.split(p1) + string.split(p2) + string.split(p3), verbose) if results_final_for_all_colls_rank_records_output[0]: recIDs = results_final_for_all_colls_rank_records_output[0] return recIDs elif of.startswith("h"): req.write(print_results_overview(req, colls_to_search, results_final_nb_total, results_final_nb, cpu_time, ln)) # print records: if len(colls_to_search)>1: cpu_time = -1 # we do not want to have search time printed on each collection for coll in colls_to_search: if results_final.has_key(coll) and results_final[coll]._nbhits: if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, coll, results_final_nb[coll], jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, cpu_time)) results_final_recIDs = results_final[coll].items() results_final_relevances = [] results_final_relevances_prologue = "" results_final_relevances_epilogue = "" if sf: # do we have to sort? results_final_recIDs = sort_records(req, results_final_recIDs, sf, so, sp, verbose) elif rm: # do we have to rank? 
results_final_recIDs_ranked, results_final_relevances, results_final_relevances_prologue, results_final_relevances_epilogue, results_final_comments = \ rank_records(rm, 0, results_final[coll], string.split(p) + string.split(p1) + string.split(p2) + string.split(p3), verbose) if of.startswith("h"): print_warning(req, results_final_comments) if results_final_recIDs_ranked: results_final_recIDs = results_final_recIDs_ranked else: # rank_records failed and returned some error message to display: print_warning(req, results_final_relevances_prologue) print_warning(req, results_final_relevances_epilogue) print_records(req, results_final_recIDs, jrec, rg, of, ot, ln, results_final_relevances, results_final_relevances_prologue, results_final_relevances_epilogue) if of.startswith("h"): req.write(print_search_info(p, f, sf, so, sp, rm, of, ot, coll, results_final_nb[coll], jrec, rg, as, ln, p1, p2, p3, f1, f2, f3, m1, m2, m3, op1, op2, sc, pl_in_url, d1y, d1m, d1d, d2y, d2m, d2d, cpu_time, 1)) if f == "author" and of.startswith("h"): req.write(create_similarly_named_authors_link_box(p, ln)) # log query: try: log_query(req.get_remote_host(), req.args, uid) except: # do not log query if req is None (used by CLI interface) pass log_query_info("ss", p, f, colls_to_search, results_final_nb_total) ## 4 - write footer: if of.startswith("h"): req.write(create_google_box(cc, p, f, p1, p2, p3, ln)) return page_end(req, of, ln) def perform_request_cache(req, action="show"): """Manipulates the search engine cache.""" global search_cache global collection_reclist_cache global collection_reclist_cache_timestamp global field_i18nname_cache global field_i18nname_cache_timestamp global collection_i18nname_cache global collection_i18nname_cache_timestamp req.content_type = "text/html" req.send_http_header() out = "" out += "

Search Cache

" # clear cache if requested: if action == "clear": search_cache = {} collection_reclist_cache = create_collection_reclist_cache() # show collection reclist cache: out += "

Collection reclist cache

" res = run_sql("SHOW TABLE STATUS LIKE 'collection'") out += "- collection table last updated: %s" % str(res[0][11]) out += "
- reclist cache timestamp: %s" % collection_reclist_cache_timestamp out += "
- reclist cache contents:" out += "
" for coll in collection_reclist_cache.keys(): if collection_reclist_cache[coll]: out += "%s (%d)
" % (coll, get_collection_reclist(coll)._nbhits) out += "
" # show search cache: out += "

Search Cache

" out += "
" if len(search_cache): out += """""" out += "" % ("Pattern","Field","Collection","Number of Hits") for search_cache_key in search_cache.keys(): p, f, c = string.split(search_cache_key, "@", 2) # find out about length of cached data: l = 0 for coll in search_cache[search_cache_key]: l += search_cache[search_cache_key][coll]._nbhits out += "" % (p, f, c, l) out += "
%s%s%s%s
%s%s%s%d
" else: out += "

Search cache is empty." out += "

" out += """

clear cache""" % weburl # show field i18nname cache: out += "

Field I18N names cache

" res = run_sql("SHOW TABLE STATUS LIKE 'fieldname'") out += "- fieldname table last updated: %s" % str(res[0][11]) out += "
- i18nname cache timestamp: %s" % field_i18nname_cache_timestamp out += "
- i18nname cache contents:" out += "
" for field in field_i18nname_cache.keys(): for ln in field_i18nname_cache[field].keys(): out += "%s, %s = %s
" % (field, ln, field_i18nname_cache[field][ln]) out += "
" # show collection i18nname cache: out += "

Collection I18N names cache

" res = run_sql("SHOW TABLE STATUS LIKE 'collectionname'") out += "- collectionname table last updated: %s" % str(res[0][11]) out += "
- i18nname cache timestamp: %s" % collection_i18nname_cache_timestamp out += "
- i18nname cache contents:" out += "
" for coll in collection_i18nname_cache.keys(): for ln in collection_i18nname_cache[coll].keys(): out += "%s, %s = %s
" % (coll, ln, collection_i18nname_cache[coll][ln]) out += "
" req.write(out) return "\n" def perform_request_log(req, date=""): """Display search log information for given date.""" req.content_type = "text/html" req.send_http_header() req.write("

Search Log

") if date: # case A: display stats for a day yyyymmdd = string.atoi(date) req.write("

Date: %d

" % yyyymmdd) req.write("""""") req.write("" % ("No.","Time", "Pattern","Field","Collection","Number of Hits")) # read file: p = os.popen("grep ^%d %s/search.log" % (yyyymmdd,logdir), 'r') lines = p.readlines() p.close() # process lines: i = 0 for line in lines: try: datetime, as, p, f, c, nbhits = string.split(line,"#") i += 1 req.write("" \ % (i, datetime[8:10], datetime[10:12], datetime[12:], p, f, c, nbhits)) except: pass # ignore eventual wrong log lines req.write("
%s%s%s%s%s%s
#%d%s:%s:%s%s%s%s%s
") else: # case B: display summary stats per day yyyymm01 = int(time.strftime("%04Y%02m01", time.localtime())) yyyymmdd = int(time.strftime("%04Y%02m%02d", time.localtime())) req.write("""""") req.write("" % ("Day", "Number of Queries")) for day in range(yyyymm01,yyyymmdd+1): p = os.popen("grep -c ^%d %s/search.log" % (day,logdir), 'r') for line in p.readlines(): req.write("""""" % (day, weburl,day,line)) p.close() req.write("
%s%s
%s%s
") return "\n" def profile(p="", f="", c=cdsname): """Profile search time.""" import profile import pstats profile.run("perform_request_search(p='%s',f='%s', c='%s')" % (p, f, c), "perform_request_search_profile") p = pstats.Stats("perform_request_search_profile") p.strip_dirs().sort_stats("cumulative").print_stats() return 0 ## test cases: #print wash_colls(cdsname,"Library Catalogue", 0) #print wash_colls("Periodicals & Progress Reports",["Periodicals","Progress Reports"], 0) #print wash_field("wau") #print print_record(20,"tm","001,245") #print create_opft_search_units(None, "PHE-87-13","reportnumber") #print ":"+wash_pattern("* and % doo * %")+":\n" #print ":"+wash_pattern("*")+":\n" #print ":"+wash_pattern("ellis* ell* e*%")+":\n" #print run_sql("SELECT name,dbquery from collection") #print get_index_id("author") #print get_coll_ancestors("Theses") #print get_coll_sons("Articles & Preprints") #print get_coll_real_descendants("Articles & Preprints") #print get_collection_reclist("Theses") #print log(sys.stdin) #print search_unit_in_bibrec('2002-12-01','2002-12-12') #print wash_dates('1980', '', '28', '2003','02','') #print type(wash_url_argument("-1",'int')) #print get_nearest_terms_in_bibxxx("ellis", "author", 5, 5) #print call_bibformat(68, "HB_FLY") #print create_collection_i18nname_cache() ## profiling: #profile("of the this") #print perform_request_search(p="ellis") diff --git a/modules/websearch/lib/search_engine_config.py b/modules/websearch/lib/search_engine_config.py index ecb76fbd9..16c0d8e19 100644 --- a/modules/websearch/lib/search_engine_config.py +++ b/modules/websearch/lib/search_engine_config.py @@ -1,40 +1,40 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware Search Engine config parameters.""" ## import config variables defined from config.wml: -from config import cfg_max_recID, \ - cfg_instant_browse, \ - cfg_author_et_al_threshold, \ - cfg_search_cache_size, \ - cfg_nb_records_to_sort, \ - cfg_call_bibformat, \ - cfg_use_aleph_sysnos, \ - cfg_fields_convert, \ - cfg_simplesearch_pattern_box_width, \ - cfg_advancedsearch_pattern_box_width, \ - cfg_narrow_search_show_grandsons, \ - cfg_oaiidtag, \ - cfg_create_similarly_named_authors_link_box, \ - cfg_google_box, \ - cfg_google_box_servers +from cdsware.config import cfg_max_recID, \ + cfg_instant_browse, \ + cfg_author_et_al_threshold, \ + cfg_search_cache_size, \ + cfg_nb_records_to_sort, \ + cfg_call_bibformat, \ + cfg_use_aleph_sysnos, \ + cfg_fields_convert, \ + cfg_simplesearch_pattern_box_width, \ + cfg_advancedsearch_pattern_box_width, \ + cfg_narrow_search_show_grandsons, \ + cfg_oaiidtag, \ + cfg_create_similarly_named_authors_link_box, \ + cfg_google_box, \ + cfg_google_box_servers ## do we want experimental features? 
(0=no, 1=yes) cfg_experimental_features = 0 diff --git a/modules/websearch/lib/search_engine_tests.py b/modules/websearch/lib/search_engine_tests.py index 353bf9230..cb5939355 100644 --- a/modules/websearch/lib/search_engine_tests.py +++ b/modules/websearch/lib/search_engine_tests.py @@ -1,184 +1,185 @@ # -*- coding: utf-8 -*- ## $Id$ ## CDSware Search Engine unit tests. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""Unit tests for the search engine.""" __version__ = "$Id$" -import search_engine import unittest +from cdsware import search_engine + class TestWashQueryParameters(unittest.TestCase): """Test for washing of search query parameters.""" def test_wash_url_argument(self): """search engine - washing of URL arguments""" self.assertEqual(1, search_engine.wash_url_argument(['1'],'int')) self.assertEqual("1", search_engine.wash_url_argument(['1'],'str')) self.assertEqual(['1'], search_engine.wash_url_argument(['1'],'list')) self.assertEqual(0, search_engine.wash_url_argument('ellis','int')) self.assertEqual("ellis", search_engine.wash_url_argument('ellis','str')) self.assertEqual(["ellis"], search_engine.wash_url_argument('ellis','list')) self.assertEqual(0, search_engine.wash_url_argument(['ellis'],'int')) self.assertEqual("ellis", search_engine.wash_url_argument(['ellis'],'str')) self.assertEqual(["ellis"], search_engine.wash_url_argument(['ellis'],'list')) def test_wash_pattern(self): """search engine - washing of query patterns""" self.assertEqual("Ellis, J", search_engine.wash_pattern('Ellis, J')) self.assertEqual("ell", search_engine.wash_pattern('ell*')) class TestStripAccents(unittest.TestCase): """Test for handling of UTF-8 accents.""" def test_strip_accents(self): """search engine - stripping of accented letters""" self.assertEqual("memememe", search_engine.strip_accents('mémêmëmè')) self.assertEqual("MEMEMEME", search_engine.strip_accents('MÉMÊMËMÈ')) class TestQueryParser(unittest.TestCase): """Test of search pattern (or query) parser.""" def _check(self, p, f, m, result_wanted): "Internal checking function calling create_basic_search_units." 
result_obtained = search_engine.create_basic_search_units(None, p, f, m) assert result_obtained == result_wanted, \ 'obtained %s instead of %s' % (repr(result_obtained), repr(result_wanted)) return def test_parsing_single_word_query(self): "search engine - parsing single word queries" self._check('word', '', None, [['+', 'word', '', 'w']]) def test_parsing_single_word_with_boolean_operators(self): "search engine - parsing single word queries with boolean operators" self._check('+word', '', None, [['+', 'word', '', 'w']]) self._check('-word', '', None, [['-', 'word', '', 'w']]) self._check('|word', '', None, [['|', 'word', '', 'w']]) def test_parsing_single_word_in_field(self): "search engine - parsing single word queries in a logical field" self._check('word', 'title', None, [['+', 'word', 'title', 'w']]) def test_parsing_single_word_in_tag(self): "search engine - parsing single word queries in a physical tag" self._check('word', '500', None, [['+', 'word', '500', 'a']]) def test_parsing_query_with_commas(self): "search engine - parsing queries with commas" self._check('word,word', 'title', None, [['+', 'word,word', 'title', 'a']]) def test_parsing_exact_phrase_query(self): "search engine - parsing exact phrase" self._check('"the word"', 'title', None, [['+', 'the word', 'title', 'a']]) def test_parsing_exact_phrase_query_unbalanced(self): "search engine - parsing unbalanced exact phrase" self._check('"the word', 'title', None, [['+', '"the', 'title', 'w'], ['+', 'word', 'title', 'w']]) def test_parsing_exact_phrase_query_in_any_field(self): "search engine - parsing exact phrase in any field" self._check('"the word"', '', None, [['+', 'the word', 'anyfield', 'a']]) def test_parsing_partial_phrase_query(self): "search engine - parsing partial phrase" self._check("'the word'", 'title', None, [['+', '%the word%', 'title', 'a']]) def test_parsing_partial_phrase_query_unbalanced(self): "search engine - parsing unbalanced partial phrase" self._check("'the word", 'title', None, [['+', "'the",
'title', 'w'], ['+', "word", 'title', 'w']]) def test_parsing_partial_phrase_query_in_any_field(self): "search engine - parsing partial phrase in any field" self._check("'the word'", '', None, [['+', "'the", '', 'w'], ['+', "word'", '', 'w']]) def test_parsing_regexp_query(self): "search engine - parsing regex matches" self._check("/the word/", 'title', None, [['+', 'the word', 'title', 'r']]) def test_parsing_regexp_query_unbalanced(self): "search engine - parsing unbalanced regexp" self._check("/the word", 'title', None, [['+', '/the', 'title', 'w'], ['+', 'word', 'title', 'w']]) def test_parsing_regexp_query_in_any_field(self): "search engine - parsing regexp searches in any field" self._check("/the word/", '', None, [['+', "/the", '', 'w'], ['+', "word/", '', 'w']]) def test_parsing_boolean_query(self): "search engine - parsing boolean query with several words" self._check("muon kaon ellis cern", '', None, [['+', 'muon', '', 'w'], ['+', 'kaon', '', 'w'], ['+', 'ellis', '', 'w'], ['+', 'cern', '', 'w']]) def test_parsing_boolean_query_with_word_operators(self): "search engine - parsing boolean query with word operators" self._check("muon and kaon or ellis not cern", '', None, [['+', 'muon', '', 'w'], ['+', 'kaon', '', 'w'], ['|', 'ellis', '', 'w'], ['-', 'cern', '', 'w']]) def test_parsing_boolean_query_with_symbol_operators(self): "search engine - parsing boolean query with symbol operators" self._check("muon +kaon |ellis -cern", '', None, [['+', 'muon', '', 'w'], ['+', 'kaon', '', 'w'], ['|', 'ellis', '', 'w'], ['-', 'cern', '', 'w']]) def test_parsing_boolean_query_with_symbol_operators_and_spaces(self): "search engine - parsing boolean query with symbol operators and spaces" self._check("muon + kaon | ellis - cern", '', None, [['+', 'muon', '', 'w'], ['+', 'kaon', '', 'w'], ['|', 'ellis', '', 'w'], ['-', 'cern', '', 'w']]) def test_parsing_boolean_query_with_symbol_operators_and_no_spaces(self): "search engine - parsing boolean query with symbol operators 
and no spaces" self._check("muon+kaon|ellis-cern", '', None, [['+', 'muon+kaon|ellis-cern', '', 'w']]) def test_parsing_combined_structured_query(self): "search engine - parsing combined structured query" self._check("title:muon author:ellis", '', None, [['+', 'muon', 'title', 'w'], ['+', 'ellis', 'author', 'w']]) def test_parsing_structured_regexp_query(self): "search engine - parsing structured regexp query" self._check("title:/(one|two)/", '', None, [['+', '(one|two)', 'title', 'r']]) def test_parsing_combined_structured_query_in_a_field(self): "search engine - parsing structured query in a field" self._check("title:muon author:ellis", 'abstract', None, [['+', 'muon', 'title', 'w'], ['+', 'ellis', 'author', 'w']]) def create_test_suite(): """Return test suite for the search engine.""" return unittest.TestSuite((unittest.makeSuite(TestWashQueryParameters,'test'), unittest.makeSuite(TestStripAccents,'test'), unittest.makeSuite(TestQueryParser,'test'))) if __name__ == "__main__": unittest.TextTestRunner(verbosity=2).run(create_test_suite()) diff --git a/modules/websearch/lib/websearch_templates.py b/modules/websearch/lib/websearch_templates.py index a7ed0808e..48fb2defe 100644 --- a/modules/websearch/lib/websearch_templates.py +++ b/modules/websearch/lib/websearch_templates.py @@ -1,2362 +1,2362 @@ # -*- coding: utf-8 -*- ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details.
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. import urllib import time import cgi import gettext import string import locale import sre -from config import * -from dbquery import run_sql -from messages import gettext_set_language -from search_engine_config import * +from cdsware.config import * +from cdsware.dbquery import run_sql +from cdsware.messages import gettext_set_language +from cdsware.search_engine_config import * def get_fieldvalues(recID, tag): """Return list of field values for field TAG inside record RECID. FIXME: should be imported commonly for search_engine too.""" out = [] if tag == "001___": # we have asked for recID that is not stored in bibXXx tables out.append(str(recID)) else: # we are going to look inside bibXXx tables digit = tag[0:2] bx = "bib%sx" % digit bibx = "bibrec_bib%sx" % digit query = "SELECT bx.value FROM %s AS bx, %s AS bibx WHERE bibx.id_bibrec='%s' AND bx.id=bibx.id_bibxxx AND bx.tag LIKE '%s'" \ " ORDER BY bibx.field_number, bx.tag ASC" % (bx, bibx, recID, tag) res = run_sql(query) for row in res: out.append(row[0]) return out class Template: # This dictionary maps CDSware language codes (ISO 639) to locale codes tmpl_localemap = { 'ca': 'ca_ES', 'de': 'de_DE', 'el': 'el_GR', 'en': 'en_US', 'es': 'es_ES', 'pt': 'pt_BR', 'fr': 'fr_FR', 'it': 'it_IT', 'ru': 'ru_RU', 'sk': 'sk_SK', 'cs': 'cs_CZ', 'no': 'no_NO', 'sv': 'sv_SE', 'uk': 'uk_UA', } tmpl_default_locale = "en_US" # which locale to use by default, useful in case of failure def tmpl_navtrail_links(self, as, ln, weburl, separator, dads): """ Creates the navigation bar at top of each search page (*Home > Root collection > subcollection > ...*) Parameters: - 'as' *bool* - Should we display an advanced search box?
- 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'separator' *string* - The separator between two consecutive collections - 'dads' *list* - A list of parent links, each one being a (name, longname) tuple """ out = "" for url, name in dads: if out: out += separator out += '''%(longname)s''' % { 'weburl' : weburl, 'qname' : urllib.quote_plus (url), 'as' : as, 'ln' : ln, 'longname' : name } return out def tmpl_webcoll_body(self, weburl, te_portalbox, searchfor, np_portalbox, narrowsearch, focuson, ne_portalbox): """ Creates the body of the main search page. Parameters: - 'weburl' *string* - The base URL for the site - 'te_portalbox' *string* - The HTML code for the portalbox on top of search - 'searchfor' *string* - The HTML code for the search options - 'np_portalbox' *string* - The HTML code for the portalbox on bottom of search - 'narrowsearch' *string* - The HTML code for the search categories (left bottom of page) - 'focuson' *string* - The HTML code for the "focuson" categories (right bottom of page) - 'ne_portalbox' *string* - The HTML code for the bottom of the page """ body = """

%(searchfor)s %(np_portalbox)s """ % { 'weburl' : weburl, 'searchfor' : searchfor, 'np_portalbox' : np_portalbox, 'narrowsearch' : narrowsearch } if focuson: body += """""" body += """
%(narrowsearch)s""" + focuson + """
%(ne_portalbox)s
""" % {'ne_portalbox' : ne_portalbox} return body def tmpl_portalbox(self, title, body): """Creates portalboxes based on the parameters Parameters: - 'title' *string* - The title of the box - 'body' *string* - The HTML code for the body of the box """ out = """
%(title)s
%(body)s
""" % {'title' : title, 'body' : body} return out def tmpl_searchfor_simple(self, ln,weburl,asearchurl, header, middle_option): """Produces simple *Search for* box for the current collection. Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'asearchurl' *string* - The URL to advanced search form - 'header' *string* - header of search form - 'middle_option' *string* - HTML code for the options (any field, specific fields ...) """ # load the right message language _ = gettext_set_language(ln) # print commentary start: out = """ """ % { 'ln' : ln, 'weburl' : weburl, 'asearchurl' : asearchurl, 'header' : header, 'middle_option' : middle_option, 'msg_search' : _('Search'), 'msg_browse' : _('Browse'), 'msg_search_tips' : _('Search Tips'), 'msg_advanced_search' : _('Advanced Search'), } return out def tmpl_searchfor_advanced(self, ln, # current language weburl, # base url ssearchurl, # url to simple search form header, # header of search form middle_option_1, middle_option_2, middle_option_3, searchoptions, sortoptions, rankoptions, displayoptions, formatoptions ): """ Produces advanced *Search for* box for the current collection. Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'ssearchurl' *string* - The URL to simple search form - 'header' *string* - header of search form - 'middle_option_1' *string* - HTML code for the first row of options (any field, specific fields ...) - 'middle_option_2' *string* - HTML code for the second row of options (any field, specific fields ...) - 'middle_option_3' *string* - HTML code for the third row of options (any field, specific fields ...) 
- 'searchoptions' *string* - HTML code for the search options - 'sortoptions' *string* - HTML code for the sort options - 'rankoptions' *string* - HTML code for the rank options - 'displayoptions' *string* - HTML code for the display options - 'formatoptions' *string* - HTML code for the format options """ # load the right message language _ = gettext_set_language(ln) out = """ """ % { 'ln' : ln, 'weburl' : weburl, 'ssearchurl' : ssearchurl, 'header' : header, 'matchbox_m1' : self.tmpl_matchtype_box('m1', ln=ln), 'middle_option_1' : middle_option_1, 'andornot_op1' : self.tmpl_andornot_box('op1', ln=ln), 'matchbox_m2' : self.tmpl_matchtype_box('m2', ln=ln), 'middle_option_2' : middle_option_2, 'andornot_op2' : self.tmpl_andornot_box('op2', ln=ln), 'matchbox_m3' : self.tmpl_matchtype_box('m3', ln=ln), 'middle_option_3' : middle_option_3, 'msg_search' : _("Search"), 'msg_browse' : _("Browse"), 'msg_search_tips' : _("Search Tips"), 'msg_simple_search' : _("Simple Search") } if (searchoptions): out += """""" % { 'searchheader' : _("Search options:"), 'searchoptions' : searchoptions } out += """ """ % { 'added' : _("Added since:"), 'until' : _("until:"), 'date_added' : self.tmpl_inputdate("d1", ln=ln), 'date_until' : self.tmpl_inputdate("d2", ln=ln), 'msg_sort' : _("Sort by:"), 'msg_display' : _("Display results:"), 'msg_format' : _("Output format:"), 'sortoptions' : sortoptions, 'rankoptions' : rankoptions, 'displayoptions' : displayoptions, 'formatoptions' : formatoptions } return out def tmpl_matchtype_box(self, name='m', value='', ln='en'): """Returns HTML code for the 'match type' selection box. 
Parameters: - 'name' *string* - The name of the produced select - 'value' *string* - The selected value (if any value is already selected) - 'ln' *string* - the language to display """ # load the right message language _ = gettext_set_language(ln) out = """ """ % {'name' : name, 'sela' : self.tmpl_is_selected('a', value), 'opta' : _("All of the words:"), 'selo' : self.tmpl_is_selected('o', value), 'opto' : _("Any of the words:"), 'sele' : self.tmpl_is_selected('e', value), 'opte' : _("Exact phrase:"), 'selp' : self.tmpl_is_selected('p', value), 'optp' : _("Partial phrase:"), 'selr' : self.tmpl_is_selected('r', value), 'optr' : _("Regular expression:") } return out def tmpl_is_selected(self, var, fld): """ Checks if *var* and *fld* are equal, and if yes, returns ' selected'. Useful for select boxes. Parameters: - 'var' *string* - First value to compare - 'fld' *string* - Second value to compare """ if var == fld or (var == True and fld) or (fld == True and var): return " selected" else: return "" def tmpl_andornot_box(self, name='op', value='', ln='en'): """ Returns HTML code for the AND/OR/NOT selection box. Parameters: - 'name' *string* - The name of the produced select - 'value' *string* - The selected value (if any value is already selected) - 'ln' *string* - the language to display """ # load the right message language _ = gettext_set_language(ln) out = """ """ % {'name' : name, 'sela' : self.tmpl_is_selected('a', value), 'opta' : _("AND"), 'selo' : self.tmpl_is_selected('o', value), 'opto' : _("OR"), 'seln' : self.tmpl_is_selected('n', value), 'optn' : _("AND NOT") } return out def tmpl_inputdate(self, name, ln, sy = 0, sm = 0, sd = 0): """ Produces *From Date*, *Until Date* kind of selection box. Suitable for search options. 
Parameters: - 'name' *string* - The base name of the produced selects - 'ln' *string* - the language to display """ # load the right message language _ = gettext_set_language(ln) box = """ """ # month box += """ """ # year box += """ """ return box def tmpl_narrowsearch(self, as, ln, weburl, title, type, father, has_grandchildren, instant_browse, sons, display_grandsons, grandsons): """ Creates list of collection descendants of type *type* under title *title*. If as==1, then links to Advanced Search interfaces; otherwise Simple Search. Suitable for 'Narrow search' and 'Focus on' boxes. Parameters: - 'as' *bool* - Should we display an advanced search box? - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'title' *string* - The title of the produced box - 'type' *string* - The type of the produced box (virtual collections or normal collections) - 'father' *collection* - The current collection - 'has_grandchildren' *bool* - If the current collection has grand children - 'sons' *list* - The list of the sub-collections (first level) - 'display_grandsons' *bool* - If the grand children collections should be displayed (2 level deep display) - 'grandsons' *list* - The list of sub-collections (second level) """ # load the right message language _ = gettext_set_language(ln) if has_grandchildren: style_prolog = "" style_epilog = "" else: style_prolog = "" style_epilog = "" out = '' if type == 'r': out += """""" % { 'name' : father.name, } if len(sons): out += """""" % {'title' : title} # iterate through sons: i = 0 for son in sons: out += """""" % {'name' : son.name } else: out += """ """ % {'name' : son.name } out += """""" i += 1 out += "
%(title)s
""" if type=='r': if son.restricted_p() and son.restricted_p() != father.restricted_p(): out += """ %(prolog)s%(longname)s%(epilog)s%(recs)s """ % { 'url' : weburl, 'name' : urllib.quote_plus(son.name), 'as' : as, 'ln' : ln, 'prolog' : style_prolog, 'longname' : son.get_name(ln), 'epilog' : style_epilog, 'recs' : son.create_nbrecs_info(ln) } if son.restricted_p(): out += """ [%(msg)s]""" % { 'msg' : _("restricted") } if display_grandsons and len(grandsons[i]): # iterate through grandsons: out += """
""" for grandson in grandsons[i]: out += """ %(longname)s%(nbrec)s """ % { 'weburl' : weburl, 'name' : urllib.quote_plus(grandson.name), 'as' : as, 'ln' : ln, 'longname' : grandson.get_name(ln), 'nbrec' : grandson.create_nbrecs_info(ln) } out += """
" else: if type == 'r': # no sons, and type 'r', so print info on collection content: out += """
%(header)s
%(body)s
""" % { 'header' : _("Latest additions:"), 'body' : instant_browse } return out def tmpl_nbrecs_info(self, number, prolog = None, epilog = None): """ Return information on the number of records. Parameters: - 'number' *string* - The number of records - 'prolog' *string* (optional) - An HTML code to prefix the number (if **None**, will be '(') - 'epilog' *string* (optional) - An HTML code to append to the number (if **None**, will be ')') """ if number is None: return '' if prolog == None: prolog = """ (""" if epilog == None: epilog = """)""" return prolog + number + epilog def tmpl_box_restricted_content(self, ln): """ Displays a box containing a *restricted content* message Parameters: - 'ln' *string* - The language to display """ # load the right message language _ = gettext_set_language(ln) return _("The contents of this collection is restricted.") def tmpl_box_no_records(self, ln): """ Displays a box containing a *no content* message Parameters: - 'ln' *string* - The language to display """ # load the right message language _ = gettext_set_language(ln) return _("This collection does not contain any document yet.") def tmpl_instant_browse(self, ln, recids, more_link = None): """ Formats a list of records (given in the recids list) from the database. Parameters: - 'ln' *string* - The language to display - 'recids' *list* - the list of records from the database - 'more_link' *string* - the "More..." link for the record. If not given, will not be displayed """ # load the right message language _ = gettext_set_language(ln) if not len(recids): return "" out = """""" for recid in recids: out += """""" % { 'date': recid['date'], 'body': recid['body'] } out += "
%(date)s %(body)s
" if more_link: out += """""" % { 'url' : more_link, 'ln' : ln, 'text' : _("more")} return out def tmpl_searchwithin_select(self, ln, fieldname, selected, values): """ Produces 'search within' selection box for the current collection. Parameters: - 'ln' *string* - The language to display - 'fieldname' *string* - the name of the select box produced - 'selected' *string* - which of the values is selected - 'values' *list* - the list of values in the select """ out = """""" return out def tmpl_select(self, fieldname, values, selected = None, css_class = ''): """ Produces a generic select box Parameters: - 'css_class' *string* - optional, a css class to display this select with - 'fieldname' *string* - the name of the select box produced - 'selected' *string* - which of the values is selected - 'values' *list* - the list of values in the select """ if css_class != '': class_field = ' class="%s"' % css_class else: class_field = '' out = """""" return out def tmpl_record_links(self, weburl, recid, ln): """ Displays the *More info* and *Find similar* links for a record Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'recid' *string* - the id of the displayed record """ # load the right message language _ = gettext_set_language(ln) out = """
%(msgdetail)s - %(msgsimilar)s """ % { 'weburl' : weburl, 'recid' : recid, 'ln' : ln, 'msgdetail' : _("Detailed record"), 'msgsimilar' : _("Similar records") } if cfg_experimental_features: out += """ - %s\n""" % ( weburl, recid, ln, _("Cited by")) return out def tmpl_record_body(self, weburl, titles, authors, dates, rns, abstracts, urls_u, urls_z): """ Displays the "HTML basic" format of a record Parameters: - 'weburl' *string* - The base URL for the site - 'titles' *list* - the titles of the record - 'authors' *list* - the authors (as strings) - 'dates' *list* - the dates of publication - 'rns' *list* - the report numbers of the record - 'abstracts' *list* - the abstracts for the record - 'urls_u' *list* - URLs to the original versions of the notice - 'urls_z' *list* - Not used """ out = "" for title in titles: out += "%(title)s " % { 'title' : cgi.escape(title) } if authors: out += " / " for i in range (0,cfg_author_et_al_threshold): if i < len(authors): out += """%(name)s ;""" % { 'weburl' : weburl, 'name_url' : urllib.quote(authors[i]), 'name' : cgi.escape(authors[i]) } if len(authors) > cfg_author_et_al_threshold: out += " et al" for date in dates: out += " %s." % cgi.escape(date) for rn in rns: out += """ [%(rn)s]""" % {'rn' : cgi.escape(rn)} for abstract in abstracts: out += "
%(abstract)s [...]" % {'abstract' : cgi.escape(abstract[:1+string.find(abstract, '.')]) } for idx in range(0,len(urls_u)): out += """
%(name)s""" % { 'url' : urls_u[idx], 'name' : urls_u[idx] } return out def tmpl_search_in_bibwords(self, p, f, ln, nearest_box): """ Displays the *Words like current ones* links for a search Parameters: - 'p' *string* - Current search words - 'f' *string* - the fields in which the search was done - 'nearest_box' *string* - the HTML code for the "nearest_terms" box - most probably from a create_nearest_terms_box call """ # load the right message language _ = gettext_set_language(ln) out = "

%(words)s %(p)s " % { 'words' : _("Words nearest to"), 'p' : p, } if f: out += "%(inside)s %(f)s " %{ 'inside' : _("inside"), 'f' : f, } out += _("in any collection are:") + "
" out += nearest_box return out def tmpl_nearest_term_box(self, p, ln, f, weburl, terms, termargs, termhits, intro): """ Displays the *Nearest search terms* box Parameters: - 'p' *string* - Current search words - 'f' *string* - a collection description (if the search has been completed in a collection) - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'terms' *array* - the broken down related terms - 'termargs' *array* - the URL parameters to compose the search queries for the terms - 'termhits' *array* - the number of hits in each query - 'intro' *string* - the intro HTML to prefix the box with """ out = """""" for i in range(0, len(terms)): if terms[i] == p: # print search word for orientation: if termhits[i] > 0: out += """""" % { 'hits' : termhits[i], 'weburl' : weburl, 'urlargs' : termargs[i], 'term' : terms[i] } else: out += """""" % { 'term' : terms[i] } else: out += """""" % { 'hits' : termhits[i], 'weburl' : weburl, 'urlargs' : termargs[i], 'term' : terms[i] } out += "
%(hits)d   %(term)s
-   %(term)s
%(hits)s   %(term)s
" return intro + "

" + out + "
" def tmpl_browse_pattern(self, f, ln, weburl, browsed_phrases_in_colls, urlarg_colls): """ Displays the *Browse* results box, listing the browsed phrases with their number of hits Parameters: - 'f' *string* - a field name (i18nized) - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'browsed_phrases_in_colls' *array* - the phrases to display - 'urlarg_colls' *string* - the url parameters for the search """ # load the right message language _ = gettext_set_language(ln) out = """""" % { 'hits' : _("Hits"), 'f' : f } if len(browsed_phrases_in_colls) == 1: # one hit only found: phrase, nbhits = browsed_phrases_in_colls[0][0], browsed_phrases_in_colls[0][1] out += """""" % { 'nbhits' : nbhits, 'weburl' : weburl, 'phrase_qt' : urllib.quote(phrase), 'phrase' : phrase, 'f' : urllib.quote(f), 'urlargs' : urlarg_colls, } elif len(browsed_phrases_in_colls) > 1: # first display what was found but the last one: for phrase, nbhits in browsed_phrases_in_colls[:-1]: out += """""" % { 'nbhits' : nbhits, 'weburl' : weburl, 'phrase_qt' : urllib.quote(phrase), 'phrase' : phrase, 'f' : urllib.quote(f), 'urlargs' : urlarg_colls, } # now display last hit as "next term": phrase, nbhits = browsed_phrases_in_colls[-1] out += """""" % { 'weburl' : weburl, 'phrase_qt' : urllib.quote(phrase), 'browse' : _("Browse"), 'next' : _("next"), 'f' : urllib.quote(f), 'urlargs' : urlarg_colls, } out += """
%(hits)s   %(f)s
%(nbhits)s   %(phrase)s
%(nbhits)s   %(phrase)s
  %(next)s
""" return out def tmpl_search_box(self, ln, weburl, as, cc, cc_intl, ot, sp, action, fieldslist, f1, f2, f3, m1, m2, m3, p1, p2, p3, op1, op2, rm, p, f, coll_selects, d1y, d2y, d1m, d2m, d1d, d2d, sort_formats, sf, so, ranks, sc, rg, formats, of): """ Displays the search box (simple or advanced, depending on *as*) Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'as' *bool* - Should we display an advanced search box? - 'cc_intl' *string* - the i18nized current collection name - 'cc' *string* - the internal current collection name - 'ot', 'sp' *string* - hidden values - 'action' *string* - the action demanded by the user - 'fieldslist' *list* - the list of all fields available in CDSware, for use in select within boxes in advanced search - 'p, f, f1, f2, f3, m1, m2, m3, p1, p2, p3, op1, op2' *strings* - the search parameters - 'coll_selects' *array* - a list of lists, each containing the collections selects to display - 'd1y, d2y, d1m, d2m, d1d, d2d' *int* - the search between dates - 'sort_formats' *array* - the select information for the sorting format - 'sf' *string* - the currently selected sort format - 'so' *string* - the currently selected sort order ("a" or "d") - 'ranks' *array* - ranking methods - 'rm' *string* - selected ranking method - 'sc' *string* - split by collection or not - 'rg' *string* - selected results/page - 'formats' *array* - available output formats - 'of' *string* - the selected output format """ # load the right message language _ = gettext_set_language(ln) out = "" # print search box prolog: out += """

%(ccname)s

""" % { 'ccname' : cc_intl, 'weburl' : weburl, 'cc' : cc, 'as' : as, 'ln' : ln } if ot: out += self.tmpl_input_hidden('ot', ot) if sp: out += self.tmpl_input_hidden('sp', sp) leadingtext = _("Search") if action == _("Browse") : leadingtext = _("Browse") if as == 1: # print Advanced Search form: google = '' if cfg_google_box and (p1 or p2 or p3): google = """ :: %(search_smwhere)s""" % { 'search_smwhere' : _("Try your search on...") } # define search box elements: out += """ """ % { 'leading' : leadingtext, 'sizepattern' : cfg_advancedsearch_pattern_box_width, 'matchbox1' : self.tmpl_matchtype_box('m1', m1, ln=ln), 'p1' : cgi.escape(p1,1), 'searchwithin1' : self.tmpl_searchwithin_select( ln = ln, fieldname = 'f1', selected = f1, values = self._add_mark_to_field(value = f1, fields = fieldslist, ln = ln) ), 'andornot1' : self.tmpl_andornot_box( name = 'op1', value = op1, ln = ln ), 'matchbox2' : self.tmpl_matchtype_box('m2', m2, ln=ln), 'p2' : cgi.escape(p2,1), 'searchwithin2' : self.tmpl_searchwithin_select( ln = ln, fieldname = 'f2', selected = f2, values = self._add_mark_to_field(value = f2, fields = fieldslist, ln = ln) ), 'andornot2' : self.tmpl_andornot_box( name = 'op2', value = op2, ln = ln ), 'matchbox3' : self.tmpl_matchtype_box('m3', m3, ln=ln), 'p3' : cgi.escape(p3,1), 'searchwithin3' : self.tmpl_searchwithin_select( ln = ln, fieldname = 'f3', selected = f3, values = self._add_mark_to_field(value = f3, fields = fieldslist, ln = ln) ), 'search' : _("Search"), 'browse' : _("Browse"), 'weburl' : weburl, 'ln' : ln, 'search_tips': _("Search Tips"), 'p1_qt' : urllib.quote(p1), 'f1_qt' : urllib.quote(f1), 'rm' : urllib.quote(rm), 'cc' : urllib.quote(cc), 'simple_search' : _("Simple Search"), 'google' : google, } else: # print Simple Search form: google = '' if cfg_google_box and (p1 or p2 or p3): google = """ :: %(search_smwhere)s""" % { 'search_smwhere' : _("Try your search on...") } out += """ """ % { 'leading' : leadingtext, 'sizepattern' : 
cfg_advancedsearch_pattern_box_width, 'p' : cgi.escape(p, 1), 'searchwithin' : self.tmpl_searchwithin_select( ln = ln, fieldname = 'f', selected = f, values = self._add_mark_to_field(value = f, fields = fieldslist, ln = ln) ), 'search' : _("Search"), 'browse' : _("Browse"), 'weburl' : weburl, 'ln' : ln, 'search_tips': _("Search Tips"), 'p_qt' : urllib.quote(p), 'f_qt' : urllib.quote(f), 'rm' : urllib.quote(rm), 'cc' : urllib.quote(cc), 'advanced_search' : _("Advanced Search"), 'google' : google, } ## secondly, print Collection(s) box: selects = '' for sel in coll_selects: selects += self.tmpl_select(fieldname = 'c', values = sel) out += """ """ % { 'leading' : leadingtext, 'msg_coll' : _("collections"), 'colls' : selects, } ## thirdly, print search limits, if applicable: if action != _("Browse") and p1: out += """""" % { 'sizepattern' : cfg_advancedsearch_pattern_box_width, 'p1' : cgi.escape(p1, 1), } ## fourthly, print from/until date boxes, if applicable: if action == _("Browse") or (d1y==0 and d1m==0 and d1d==0 and d2y==0 and d2m==0 and d2d==0): pass # do not need it else: out += """""" % { 'added' : _("Added since:"), 'until' : _("Until"), 'date1' : self.tmpl_inputdate_box("d1", d1y, d1m, d1d, ln=ln), 'date2' : self.tmpl_inputdate_box("d2", d2y, d2m, d2d, ln=ln), } ## fifthly, print Display results box, including sort/rank, formats, etc: if action != _("Browse"): rgs = [] for i in [10, 25, 50, 100, 250, 500]: rgs.append({ 'value' : i, 'text' : "%d %s" % (i, _("results"))}) # sort by: out += """""" % { 'sort_by' : _("Sort by:"), 'display_res' : _("Display results:"), 'out_format' : _("Output format:"), 'select_sf' : self.tmpl_select(fieldname = 'sf', values = sort_formats, selected = sf, css_class = 'address'), 'select_so' : self.tmpl_select(fieldname = 'so', values = [{ 'value' : 'a', 'text' : _("asc.") }, { 'value' : 'd', 'text' :
_("desc.") }], selected = so, css_class = 'address'), 'select_rm' : self.tmpl_select(fieldname = 'rm', values = ranks, selected = rm, css_class = 'address'), 'select_rg' : self.tmpl_select(fieldname = 'rg', values = rgs, selected = rg, css_class = 'address'), 'select_sc' : self.tmpl_select(fieldname = 'sc', values = [{ 'value' : '0', 'text' : _("single list") }, { 'value' : '1', 'text' : _("split by collection") }], selected = sc, css_class = 'address'), 'select_of' : self.tmpl_searchwithin_select( ln = ln, fieldname = 'of', selected = of, values = self._add_mark_to_field(value = of, fields = formats, chars = 3, ln = ln) ), } ## last but not least, print end of search box: out += """
""" return out def tmpl_input_hidden(self, name, value): "Produces the HTML code for a hidden field " return """""" % { 'name' : name, 'value' : value, } def _add_mark_to_field(self, value, fields, ln, chars = 1): """Adds the current value as a MARC tag in the fields array Useful for advanced search""" # load the right message language _ = gettext_set_language(ln) out = fields if value and str(value[0:chars]).isdigit(): out.append({'value' : value, 'text' : str(value) + " " + _("MARC tag") }) return out def tmpl_google_box(self, ln, cc, p, f, prolog_start, prolog_end, column_separator, link_separator, epilog): """Creates the box that proposes links to other useful search engines like Google. Parameters: - 'ln' *string* - The language to display in - 'cc' *string* - the internal current collection name - 'p' *string* - the search query - 'f' *string* - the current field - 'prolog_start, prolog_end, column_separator, link_separator, epilog' *strings* - default HTML code for the specified position in the box """ # load the right message language _ = gettext_set_language(ln) out_links = [] p_quoted = urllib.quote(p) # Amazon if cfg_google_box_servers.get('Amazon', 0): if string.find(cc, "Book") >= 0: if f == "author": out_links.append("""%s %s Amazon""" % (p_quoted, p, _('in'))) else: out_links.append("""%s %s Amazon""" % (p_quoted, p, _('in'))) # CERN Intranet: if cfg_google_box_servers.get('CERN Intranet', 0): out_links.append("""%s %s CERN Intranet""" % (urllib.quote(string.replace(p, ' ', ' +')), p, _('in'))) # CERN Agenda: if cfg_google_box_servers.get('CERN Agenda', 0): if f == "author": out_links.append("""%s %s CERN Agenda""" % (p_quoted, p, _('in'))) elif f == "title": out_links.append("""%s %s CERN Agenda""" % (p_quoted, p, _('in'))) # CERN EDMS: if cfg_google_box_servers.get('CERN Agenda', 0): # FIXME: reusing CERN Agenda config variable until we can enter CERN EDMS into config.wml if f == "author": out_links.append("""%s %s CERN EDMS""" % (p_quoted, p, 
_("in"))) elif f == "title" or f == "abstract" or f == "keyword": out_links.append("""%s %s CERN EDMS""" % (p_quoted, p, _("in"))) elif f == "reportnumber": out_links.append("""%s %s CERN EDMS""" % (p_quoted, p, _("in"))) else: out_links.append("""%s %s CERN EDMS""" % (p_quoted, p, _("in"))) # CiteSeer: if cfg_google_box_servers.get('CiteSeer', 0): out_links.append("""%s %s CiteSeer""" % (p_quoted, p, _('in'))) # Google Print: if cfg_google_box_servers.get('Google Scholar', 0): # FIXME: reusing Google Scholar config variable until we can enter Google Print into config.wml if string.find(cc, "Book") >= 0: out_links.append("""%s %s Google Print""" % (p_quoted, p, _("in"))) # Google Scholar: if cfg_google_box_servers.get('Google Scholar', 0): if f == "author": out_links.append("""%s %s Google Scholar""" % (p_quoted, p, _('in'))) else: out_links.append("""%s %s Google Scholar""" % (p_quoted, p, _('in'))) # Google Web: if cfg_google_box_servers.get('Google Web', 0): if f == "author": p_google = p if string.find(p, ",") >= 0 and (not p.startswith('"')) and (not p.endswith('"')): p_lastname, p_firstnames = string.split(p, ",", 1) p_google = '"%s %s" OR "%s %s"' % (p_lastname, p_firstnames, p_firstnames, p_lastname) out_links.append("""%s %s Google Web""" % (urllib.quote(p_google), p_google, _('in'))) else: out_links.append("""%s %s Google Web""" % (p_quoted, p, _('in'))) # IEC if cfg_google_box_servers.get('IEC', 0): if string.find(cc, "Standard") >= 0: out_links.append("""%s %s IEC""" % (p_quoted, p, _('in'))) # IHS if cfg_google_box_servers.get('IHS', 0): if string.find(cc, "Standard") >= 0: out_links.append("""%s %s IHS""" % (p_quoted, p, _('in'))) # INSPEC if cfg_google_box_servers.get('INSPEC', 0): if f == "author": p_inspec = sre.sub(r'(, )| ', '-', p) p_inspec = sre.sub(r'(-\w)\w+$', '\\1', p_inspec) out_links.append("""%s %s INSPEC""" % (urllib.quote(p_inspec), p_inspec, _('in'))) elif f == "title": out_links.append("""%s %s INSPEC""" % (p_quoted, p, _('in'))) 
elif f == "abstract": out_links.append("""%s %s INSPEC""" % (p_quoted, p, _('in'))) elif f == "year": out_links.append("""%s %s INSPEC""" % (p_quoted, p, _('in'))) # ISO if cfg_google_box_servers.get('ISO', 0): if string.find(cc, "Standard") >= 0: out_links.append("""%s %s ISO""" % (p_quoted, p, _('in'))) # KEK if cfg_google_box_servers.get('KEK', 0): kek_search_title = "KEK KISS Preprints" kek_search_baseurl = "http://www-lib.kek.jp/cgi-bin/kiss_prepri?" if string.find(cc, "Book") >= 0: kek_search_title = "KEK Library Books" kek_search_baseurl = "http://www-lib.kek.jp/cgi-bin/kiss_book?DSP=1&" elif string.find(cc, "Periodical") >= 0: kek_search_title = "KEK Library Journals" kek_search_baseurl = "http://www-lib.kek.jp/cgi-bin/kiss_book?DSP=2&" if f == "author": out_links.append("""%s %s %s""" % \ (kek_search_baseurl, p_quoted, p, _('in'), kek_search_title)) elif f == "title": out_links.append("""%s %s %s""" % \ (kek_search_baseurl, p_quoted, p, _('in'), kek_search_title)) elif f == "reportnumber": out_links.append("""%s %s %s""" % \ (kek_search_baseurl, p_quoted, p, _('in'), kek_search_title)) # NEBIS if cfg_google_box_servers.get('NEBIS', 0): if string.find(cc, "Book") >= 0: if f == "author": out_links.append("""%s %s NEBIS""" % (p_quoted, p, _('in'))) elif f == "title": out_links.append("""%s %s NEBIS""" % (p_quoted, p, _('in'))) else: out_links.append("""%s %s NEBIS""" % (p_quoted, p, _('in'))) # Scirus: if cfg_google_box_servers.get('Google Scholar', 0): # FIXME: reusing Google Scholar config variable until we can enter Scirus into config.wml if f == "author": out_links.append("""%s %s Scirus""" % (p_quoted, p, _("in"))) elif f == "title": out_links.append("""%s %s Scirus""" % (p_quoted, p, _("in"))) elif f == "keyword": out_links.append("""%s %s Scirus""" % (p_quoted, p, _("in"))) else: out_links.append("""%s %s Scirus""" % (p_quoted, p, _("in"))) # SPIRES if cfg_google_box_servers.get('SPIRES', 0): spires_search_title = "SLAC SPIRES HEP" 
spires_search_baseurl = "http://www.slac.stanford.edu/spires/find/hep/" if string.find(cc, "Book") >= 0: spires_search_title = "SLAC Library Books" spires_search_baseurl = "http://www.slac.stanford.edu/spires/find/books/" elif string.find(cc, "Periodical") >= 0: spires_search_title = "SLAC Library Journals" spires_search_baseurl = "http://www.slac.stanford.edu/spires/find/tserials/" if f == "author": out_links.append("""%s %s %s""" % \ (spires_search_baseurl, p_quoted, p, _('in'), spires_search_title)) elif f == "title": out_links.append("""%s %s %s""" % \ (spires_search_baseurl, p_quoted, p, _('in'), spires_search_title)) elif f == "reportnumber": out_links.append("""%s %s %s""" % \ (spires_search_baseurl, p_quoted, p, _('in'), spires_search_title)) elif f == "keyword": out_links.append("""%s %s %s""" % \ (spires_search_baseurl, p_quoted, p, _('in'), spires_search_title)) else: # invent a poor man's any field search since SPIRES doesn't support one out_links.append("""%s %s %s""" % \ (spires_search_baseurl, p_quoted, p_quoted, p_quoted, p_quoted, p_quoted, p, _('in'), spires_search_title)) # okay, so print the box now: out = "" if out_links: out += """""" out += prolog_start + _("Haven't found what you were looking for? Try your search on other servers:") + prolog_end nb_out_links_in_one_column = len(out_links)/2 out += string.join(out_links[:nb_out_links_in_one_column], link_separator) out += column_separator out += string.join(out_links[nb_out_links_in_one_column:], link_separator) out += epilog return out def tmpl_search_pagestart(self, ln) : "page start for search page. Will display after the page header" return """
""" def tmpl_search_pageend(self, ln) : "page end for search page. Will display just before the page footer" return """
""" def tmpl_print_warning(self, msg, type, prologue, epilogue): """Returns the warning message formatted as HTML. Parameters: - 'msg' *string* - The message string - 'type' *string* - the warning type - 'prologue' *string* - HTML code to display before the warning - 'epilogue' *string* - HTML code to display after the warning """ out = '\n%s' % (prologue) if type: out += '%s: ' % type out += '%s%s' % (msg, epilogue) return out def tmpl_print_search_info(self, ln, weburl, middle_only, collection, collection_name, as, sf, so, rm, rg, nb_found, of, ot, p, f, f1, f2, f3, m1, m2, m3, op1, op2, p1, p2, p3, d1y, d1m, d1d, d2y, d2m, d2d, all_fieldcodes, cpu_time, pl_in_url, jrec, sc, sp): """Prints stripe with the information on 'collection' and 'nb_found' results and CPU time. Also, prints navigation links (beg/next/prev/end) inside the results set. If middle_only is set to 1, it will only print the middle box information (beg/next/prev/end/etc.) links. This is suitable for displaying navigation links at the bottom of the search results page.
Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'middle_only' *bool* - if true, only display the middle box with the navigation links - 'collection' *string* - the collection name - 'collection_name' *string* - the i18nized current collection name - 'as' *bool* - if we display the advanced search interface - 'sf' *string* - the currently selected sort format - 'so' *string* - the currently selected sort order ("a" or "d") - 'rm' *string* - selected ranking method - 'rg' *int* - selected results/page - 'nb_found' *int* - number of results found - 'of' *string* - the selected output format - 'ot' *string* - hidden values - 'p' *string* - Current search words - 'f' *string* - the fields in which the search was done - 'f1, f2, f3, m1, m2, m3, p1, p2, p3, op1, op2' *strings* - the search parameters - 'jrec' *int* - number of first record on this page - 'd1y, d2y, d1m, d2m, d1d, d2d' *int* - the from/until dates limiting the search - 'all_fieldcodes' *array* - all the available fields - 'cpu_time' *float* - the time of the query in seconds """ # load the right message language _ = gettext_set_language(ln) out = "" # left table cells: print collection name if not middle_only: out += """
""" % { 'collection_qt' : urllib.quote(collection), 'collection_qt_plus' : urllib.quote_plus(collection), 'as' : as, 'ln' : ln, 'collection_name' : collection_name, 'weburl' : weburl, } else: out += """
""" % { 'weburl' : weburl } # middle table cell: print beg/next/prev/end arrows: if not middle_only: out += """
" else: out += "" # right table cell: cpu time info if not middle_only: if cpu_time > -1: out += """""" % { 'time' : _("Search took %.2f seconds.") % cpu_time, } out += "
%(collection_name)s %(recs_found)s  """ % { 'recs_found' : _("%s records found") % self.tmpl_nice_number(nb_found, ln) } else: out += "" if nb_found > rg: out += "" + collection_name + " : " + _("%s records found") % self.tmpl_nice_number(nb_found, ln) + "   " if nb_found > rg: # navig.arrows are needed, since we have many hits if (pl_in_url): scbis = 1 else: scbis = 0 url = """%(weburl)s/search.py?p=%(p_qt)s&cc=%(coll_qt)s&f=%(f)s&sf=%(sf)s&so=%(so)s&sp=%(sp)s&rm=%(rm)s&of=%(of)s&ot=%(ot)s&as=%(as)s&ln=%(ln)s&p1=%(p1)s&p2=%(p2)s&p3=%(p3)s&f1=%(f1)s&f2=%(f2)s&f3=%(f3)s&m1=%(m1)s&m2=%(m2)s&m3=%(m3)s&op1=%(op1)s&op2=%(op2)s&sc=%(sc)d&d1y=%(d1y)d&d1m=%(d1m)d&d1d=%(d1d)d&d2y=%(d2y)d&d2m=%(d2m)d&d2d=%(d2d)d""" % { 'weburl' : weburl, 'p_qt' : urllib.quote(p), 'coll_qt' : urllib.quote(collection), 'f' : f, 'sf' : sf, 'so' : so, 'sp' : sp, 'rm' : rm, 'of' : of, 'ot' : ot, 'as' : as, 'ln' : ln, 'p1' : urllib.quote(p1), 'p2' : urllib.quote(p2), 'p3' : urllib.quote(p3), 'f1' : f1, 'f2' : f2, 'f3' : f3, 'm1' : m1, 'm2' : m2, 'm3' : m3, 'op1' : op1, 'op2' : op2, 'sc' : scbis, 'd1y' : d1y, 'd1m' : d1m, 'd1d' : d1d, 'd2y' : d2y, 'd2m' : d2m, 'd2d' : d2d, } # @todo here if jrec-rg > 1: out += """%(begin)s""" % { 'url' : url, 'rg' : rg, 'weburl' : weburl, 'begin' : _("begin"), } if jrec > 1: out += """%(previous)s""" % { 'url' : url, 'jrec' : max(jrec-rg, 1), 'rg' : rg, 'weburl' : weburl, 'previous' : _("previous") } if jrec+rg-1 < nb_found: out += "%d - %d" % (jrec, jrec+rg-1) else: out += "%d - %d" % (jrec, nb_found) if nb_found >= jrec+rg: out += """%(next)s""" % { 'url' : url, 'jrec' : jrec + rg, 'rg' : rg, 'weburl' : weburl, 'next' : _("next") } if nb_found >= jrec+rg+rg: out += """%(end)s""" % { 'url' : url, 'jrec' : nb_found-rg+1, 'rg' : rg, 'weburl' : weburl, 'end' : _("end") } # still in the navigation part cc = collection sc = 0 for var in ['p', 'cc', 'f', 'sf', 'so', 'of', 'rg', 'as', 'ln', 'p1', 'p2', 'p3', 'f1', 'f2', 'f3', 'm1', 'm2', 'm3', 'op1', 'op2', 'sc', 'd1y', 
'd1m', 'd1d', 'd2y', 'd2m', 'd2d']: out += self.tmpl_input_hidden(name = var, value = vars()[var]) for var in ['ot', 'sp', 'rm']: if vars()[var]: out += self.tmpl_input_hidden(name = var, value = vars()[var]) if pl_in_url: fieldargs = cgi.parse_qs(pl_in_url) for fieldcode in all_fieldcodes: # get_fieldcodes(): if fieldargs.has_key(fieldcode): for val in fieldargs[fieldcode]: out += self.tmpl_input_hidden(name = cgi.escape(fieldcode), value = cgi.escape(val)) out += """  %(jump)s """ % { 'jump' : _("jump to record:"), 'jrec' : jrec, } if not middle_only: out += "%(time)s 
" else: out += "" out += "
" return out def tmpl_nice_number(self, number, ln): "Returns nicely printed number NUM in language LN using the locale." if number is None: return None # Temporarily switch the numeric locale to the requested one, and format the number # In case the system has no locale definition, use the vanilla form ol = locale.getlocale(locale.LC_NUMERIC) try: locale.setlocale(locale.LC_NUMERIC, self.tmpl_localemap.get(ln, self.tmpl_default_locale)) except locale.Error: return str(number) number = locale.format('%d', number, True) locale.setlocale(locale.LC_NUMERIC, ol) return number def tmpl_records_format_htmlbrief(self, ln, weburl, rows, relevances_prologue, relevances_epilogue): """Returns the htmlbrief format of the records Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'rows' *array* - Parts of the interface to display, in the format: - 'rows[number]' *int* - The order number - 'rows[recid]' *int* - The recID - 'rows[relevance]' *string* - The relevance of the record - 'rows[record]' *string* - The formatted record - 'relevances_prologue' *string* - HTML code to prepend the relevance indicator - 'relevances_epilogue' *string* - HTML code to append to the relevance indicator (used mostly for formatting) """ # load the right message language _ = gettext_set_language(ln) out = """
""" % { 'weburl' : weburl, } for row in rows: out += """ """ % row['record'] out += """
%(number)s """ % row if row['relevance']: out += """
""" % { 'prologue' : relevances_prologue, 'epilogue' : relevances_epilogue, 'relevance' : row['relevance'] } out += """
%s

""" % { 'basket' : _("ADD TO BASKET") } return out def tmpl_records_format_other(self, ln, weburl, rows, format, url_args): """Returns other formats of the records Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'rows' *array* - Parts of the interface to display, in the format: - 'rows[record]' *string* - The formatted record - 'rows[number]' *int* - The order number - 'rows[recid]' *int* - The recID - 'rows[relevance]' *string* - The relevance of the record - 'format' *string* - The current format - 'url_args' *string* - The rest of the search query """ # load the right message language _ = gettext_set_language(ln) out = """

%(format)s: """ % { 'format' : _("Format") } if format == "hm": out += """HTML | BibTeX | DC | MARC | MARCXML""" % vars() elif format == "hx": out += """HTML | BibTeX | DC | MARC | MARCXML""" % vars() else: out += """HTML | BibTeX | DC | MARC | MARCXML""" % vars() out += "
" for row in rows: out += row ['record'] if format.startswith("hd"): # do not print further information but for HTML detailed formats if row ['creationdate']: out += """
%(dates)s

%(similar)s


""" % { 'dates' : _("Record created %s, last modified %s") % (row['creationdate'], row['modifydate']), 'weburl' : weburl, 'recid' : row['recid'], 'ln' : ln, 'similar' : _("Similar records"), 'basket' : _("ADD TO BASKET") } out += '' if row.has_key ('citinglist'): cs = row ['citinglist'] similar = self.tmpl_print_record_list_for_similarity_boxen ( _("Cited by: %s records") % len (cs), cs, ln) out += ''' ''' % { 'weburl': weburl, 'recid': row ['recid'], 'ln': ln, 'similar': similar, 'more': _("more"), } if row.has_key ('cociting'): cs = row ['cociting'] similar = self.tmpl_print_record_list_for_similarity_boxen ( _("Co-cited with: %s records") % len (cs), cs, ln) out += ''' ''' % { 'weburl': weburl, 'recid': row ['recid'], 'ln': ln, 'similar': similar, 'more': _("more"), } if row.has_key ('citationhistory'): out += '' % row ['citationhistory'] if row.has_key ('downloadsimilarity'): cs = row ['downloadsimilarity'] similar = self.tmpl_print_record_list_for_similarity_boxen ( _("People who downloaded this document also downloaded:"), cs, ln) out += ''' ''' % { 'weburl': weburl, 'recid': row ['recid'], 'ln': ln, 'similar': similar, 'more': _("more"), 'graph': row ['downloadhistory'] } out += '
%(similar)s %(more)s

%(similar)s %(more)s
%s
%(graph)s
%(similar)s
' if row.has_key ('viewsimilarity'): out += '

 ' out += self.tmpl_print_record_list_for_similarity_boxen ( _("People who viewed this page also viewed:"), row ['viewsimilarity'], ln) if row.has_key ('reviews'): out += '

 ' out += row['reviews'] if row.has_key ('comments'): out += row['comments'] out += "

 " return out def tmpl_print_results_overview(self, ln, weburl, results_final_nb_total, cpu_time, results_final_nb, colls, url_args): """Prints results overview box with links to particular collections below. Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'results_final_nb_total' *int* - The total number of hits for the query - 'colls' *array* - The collections with hits, in the format: - 'coll[code]' *string* - The code of the collection (canonical name) - 'coll[name]' *string* - The display name of the collection - 'results_final_nb' *array* - The number of hits, indexed by the collection codes: - 'cpu_time' *string* - The time the query took - 'url_args' *string* - The rest of the search query """ if len(colls) == 1: # if one collection only, print nothing: return "" # load the right message language _ = gettext_set_language(ln) # first find total number of hits: out = """

%(founds)s
""" % { 'founds' : _("Results overview: Found %s records in %.2f seconds.") % (self.tmpl_nice_number(results_final_nb_total, ln), cpu_time) } # then print hits per collection: for coll in colls: if results_final_nb.has_key(coll['code']) and results_final_nb[coll['code']] > 0: out += """%(coll_name)s, %(number)s
""" % { 'coll' : urllib.quote(coll['code']), 'coll_name' : coll['name'], 'number' : _("%s records found") % self.tmpl_nice_number(results_final_nb[coll['code']], ln) } out += "
" return out def tmpl_search_no_boolean_hits(self, ln, weburl, nearestterms): """No hits found, proposes alternative boolean queries Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'nearestterms' *array* - Parts of the interface to display, in the format: - 'nearestterms[nbhits]' *int* - The resulting number of hits - 'nearestterms[url_args]' *string* - The search parameters - 'nearestterms[p]' *string* - The search terms """ # load the right message language _ = gettext_set_language(ln) out = _("Boolean query returned no hits. Please combine your search terms differently.") out += """

""" for term in nearestterms: out += """""" % { 'hits' : term['nbhits'], 'weburl' : weburl, 'url_args' : term['url_args'], 'p' : term['p'] } out += """
%(hits)s   %(p)s
""" return out def tmpl_similar_author_names(self, ln, weburl, authors): """Proposes similar author names, with their numbers of hits, as alternative author searches Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'authors' *array* - The authors information, in the format: - 'authors[nb]' *int* - The resulting number of hits - 'authors[name]' *string* - The author """ # load the right message language _ = gettext_set_language(ln) out = """ """ % { 'similar' : _("See also: similar author names") } for author in authors: out += """""" % { 'nb' : author['nb'], 'weburl' : weburl, 'auth_qt' : urllib.quote(author['name']), 'auth' : author['name'], } out += """
%(similar)s
%(nb)d %(auth)s
""" return out def tmpl_print_record_detailed(self, recID, ln, weburl): """Displays a detailed on-the-fly record Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'recID' *int* - The record id """ # okay, need to construct a simple "Detailed record" format of our own: out = "

 " # secondly, title: titles = get_fieldvalues(recID, "245__a") for title in titles: out += "

%s
" % title # thirdly, authors: authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a") if authors: out += "

" for author in authors: out += """%s ;""" % (weburl, urllib.quote(author), author) out += "
" # fourthly, date of creation: dates = get_fieldvalues(recID, "260__c") for date in dates: out += "

%s
" % date # fifthly, abstract: abstracts = get_fieldvalues(recID, "520__a") for abstract in abstracts: out += """

Abstract: %s

""" % abstract # fifthly bis, keywords: keywords = get_fieldvalues(recID, "6531_a") if len(keywords): out += """

Keyword(s):""" for keyword in keywords: out += """%s ; """ % (weburl, urllib.quote(keyword), keyword) # fifthly bis bis, published in: prs_p = get_fieldvalues(recID, "909C4p") prs_v = get_fieldvalues(recID, "909C4v") prs_y = get_fieldvalues(recID, "909C4y") prs_n = get_fieldvalues(recID, "909C4n") prs_c = get_fieldvalues(recID, "909C4c") for idx in range(0,len(prs_p)): out += """

Publ. in: %s""" % prs_p[idx] if prs_v and prs_v[idx]: out += """%s""" % prs_v[idx] if prs_y and prs_y[idx]: out += """(%s)""" % prs_y[idx] if prs_n and prs_n[idx]: out += """, no.%s""" % prs_n[idx] if prs_c and prs_c[idx]: out += """, p.%s""" % prs_c[idx] out += """.""" # sixthly, fulltext link: urls_z = get_fieldvalues(recID, "8564_z") urls_u = get_fieldvalues(recID, "8564_u") for idx in range(0,len(urls_u)): link_text = "URL" try: if urls_z[idx]: link_text = urls_z[idx] except IndexError: pass out += """

%s: %s""" % (link_text, urls_u[idx], urls_u[idx]) # print some white space at the end: out += "

" return out def tmpl_print_record_list_for_similarity_boxen(self, title, score_list, ln=cdslang): """Print list of records in the "hs" (HTML Similarity) format for similarity boxes. FIXME: bad symbol names again, e.g. SCORE_LIST is *not* a list of scores. Humph. """ from search_engine import print_record out = '''
%(title)s
''' % { 'title': title } for recid, score in score_list [:5]: out += ''' ''' % { 'score': score, 'info' : print_record (recid, format="hs", ln=ln), } out += """
(%(score)s)  %(info)s
""" return out def tmpl_print_record_brief(self, ln, recID, weburl): """Displays a brief record on-the-fly Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'recID' *int* - The record id """ out = "" # record 'recID' does not exist in format 'format', so print some default format: # firstly, title: titles = get_fieldvalues(recID, "245__a") # secondly, authors: authors = get_fieldvalues(recID, "100__a") + get_fieldvalues(recID, "700__a") # thirdly, date of creation: dates = get_fieldvalues(recID, "260__c") # thirdly bis, report numbers (primary 037__a and additional 088__a): rns = get_fieldvalues(recID, "037__a") + get_fieldvalues(recID, "088__a") # fourthly, beginning of abstract: abstracts = get_fieldvalues(recID, "520__a") # fifthly, fulltext link: urls_z = get_fieldvalues(recID, "8564_z") urls_u = get_fieldvalues(recID, "8564_u") return self.tmpl_record_body( weburl = weburl, titles = titles, authors = authors, dates = dates, rns = rns, abstracts = abstracts, urls_u = urls_u, urls_z = urls_z, ) def tmpl_print_record_brief_links(self, ln, recID, weburl): """Displays links for brief record on-the-fly Parameters: - 'ln' *string* - The language to display - 'weburl' *string* - The base URL for the site - 'recID' *int* - The record id """ # load the right message language _ = gettext_set_language(ln) out = "" if cfg_use_aleph_sysnos: alephsysnos = get_fieldvalues(recID, "970__a") if len(alephsysnos)>0: alephsysno = alephsysnos[0] out += """
%s""" \ % (weburl, alephsysno, ln, _("Detailed record")) else: out += """
%s""" \ % (weburl, recID, ln, _("Detailed record")) else: out += """
%s""" \ % (weburl, recID, ln, _("Detailed record")) out += """ - %s\n""" % \ (weburl, recID, ln, _("Similar records")) if cfg_experimental_features: out += """ - %s\n""" % ( weburl, recID, ln, _("Cited by")) return out diff --git a/modules/websearch/lib/websearchadminlib.py b/modules/websearch/lib/websearchadminlib.py index 9f7a9dcab..1adc7efcc 100644 --- a/modules/websearch/lib/websearchadminlib.py +++ b/modules/websearch/lib/websearchadminlib.py @@ -1,3175 +1,3175 @@ ## $Id$ ## Administrator interface for WebSearch ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDSware WebSearch Administrator Interface.""" import cgi import re import MySQLdb import Numeric import os import urllib import random import marshal import time - from zlib import compress,decompress -from bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps,serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array,serialize_via_marshal,deserialize_via_marshal -from messages import * -from dbquery import run_sql -from config import * -from webpage import page, pageheaderonly, pagefooteronly -from webuser import getUid, get_email from mod_python import apache +from cdsware.bibrankadminlib import write_outcome,modify_translations,get_def_name,get_i8n_name,get_name,get_rnk_nametypes,get_languages,check_user,is_adminuser,adderrorbox,addadminbox,tupletotable,tupletotable_onlyselected,addcheckboxes,createhiddenform,serialize_via_numeric_array_dumps,serialize_via_numeric_array_compr,serialize_via_numeric_array_escape,serialize_via_numeric_array,deserialize_via_numeric_array,serialize_via_marshal,deserialize_via_marshal +from cdsware.messages import * +from cdsware.dbquery import run_sql +from cdsware.config import * +from cdsware.webpage import page, pageheaderonly, pagefooteronly +from cdsware.webuser import getUid, get_email + __version__ = "$Id$" def getnavtrail(previous = ''): """Get the navtrail""" navtrail = """Admin Area > WebSearch Admin """ % (weburl, weburl) navtrail = navtrail + previous return navtrail def perform_modifytranslations(colID, ln, sel_type='', trans=[], confirm=-1, callback='yes'): """Modify the translations of a collection sel_type - the nametype to modify trans - the translations in the same order as the languages from get_languages()""" output = '' 
subtitle = '' cdslangs = get_languages() if confirm in ["2", 2] and colID: finresult = modify_translations(colID, cdslangs, sel_type, trans, "collection") col_dict = dict(get_def_name('', "collection")) if colID and col_dict.has_key(int(colID)): colID = int(colID) subtitle = """3. Modify translations for collection '%s'   [?]""" % (col_dict[colID], weburl) if type(trans) is str: trans = [trans] if sel_type == '': sel_type = get_col_nametypes()[0][0] header = ['Language', 'Translation'] actions = [] types = get_col_nametypes() if len(types) > 1: text = """ Name type """ output += createhiddenform(action="modifytranslations#3", text=text, button="Select", colID=colID, ln=ln, confirm=0) if confirm in [-1, "-1", 0, "0"]: trans = [] for (key, value) in cdslangs: try: trans_names = get_name(colID, key, sel_type, "collection") trans.append(trans_names[0][0]) except StandardError, e: trans.append('') for nr in range(0,len(cdslangs)): actions.append(["%s %s" % (cdslangs[nr][1], (cdslangs[nr][0]==cdslang and '(def)' or ''))]) actions[-1].append('' % trans[nr]) text = tupletotable(header=header, tuple=actions) output += createhiddenform(action="modifytranslations#3", text=text, button="Modify", colID=colID, sel_type=sel_type, ln=ln, confirm=2) if sel_type and len(trans) and confirm in ["2", 2]: output += write_outcome(finresult) try: body = [output, extra] except NameError: body = [output] if callback: return perform_editcollection(colID, ln, "perform_modifytranslations", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyrankmethods(colID, ln, func='', rnkID='', confirm=0, callback='yes'): """Modify which rank methods is visible to the collection func - remove or add rank method rnkID - the id of the rank method.""" output = "" subtitle = "" col_dict = dict(get_def_name('', "collection")) rnk_dict = dict(get_def_name('', "rnkMETHOD")) if colID and col_dict.has_key(int(colID)): colID = int(colID) if func in ["0", 0] and confirm in ["1", 
1]: finresult = attach_rnk_col(colID, rnkID) elif func in ["1", 1] and confirm in ["1", 1]: finresult = detach_rnk_col(colID, rnkID) subtitle = """9. Modify rank options for collection '%s'   [?]""" % (col_dict[colID], weburl) output = """

The rank methods enabled for the collection '%s' are:
""" % col_dict[colID] rnkmethods = get_col_rnk(colID, ln) output += """
""" if not rnkmethods: output += """No rank methods""" else: for id, name in rnkmethods: output += """%s, """ % name output += """
""" rnk_list = get_def_name('', "rnkMETHOD") rnk_dict_in_col = dict(get_col_rnk(colID, ln)) rnk_list = filter(lambda x: not rnk_dict_in_col.has_key(x[0]), rnk_list) if rnk_list: text = """ Enable: """ output += createhiddenform(action="modifyrankmethods#9", text=text, button="Enable", colID=colID, ln=ln, func=0, confirm=1) if confirm in ["1", 1] and func in ["0", 0] and int(rnkID) != -1: output += write_outcome(finresult) elif confirm not in ["0", 0] and func in ["0", 0]: output += """Please select a rank method.""" coll_list = get_col_rnk(colID, ln) if coll_list: text = """ Disable: """ output += createhiddenform(action="modifyrankmethods#9", text=text, button="Disable", colID=colID, ln=ln, func=1, confirm=1) if confirm in ["1", 1] and func in ["1", 1] and int(rnkID) != -1: output += write_outcome(finresult) elif confirm not in ["0", 0] and func in ["1", 1]: output += """Please select a rank method.""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_editcollection(colID, ln, "perform_modifyrankmethods", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addcollectiontotree(colID, ln, add_dad='', add_son='', rtype='', mtype='', callback='yes', confirm=-1): """Form to add a collection to the tree. add_dad - the dad to add the collection to add_son - the collection to add rtype - add it as a regular or virtual mtype - add it to the regular or virtual tree.""" output = "" output2 = "" subtitle = """Attach collection to tree   [?]""" % (weburl) col_dict = dict(get_def_name('', "collection")) if confirm not in [-1, "-1"] and not (add_son and add_dad and rtype): output2 += """All fields must be filled.

""" elif add_son and add_dad and rtype: add_son = int(add_son) add_dad = int(add_dad) if confirm not in [-1, "-1"]: if add_son == add_dad: output2 += """Cannot add a collection as a pointer to itself.

""" elif check_col(add_dad, add_son): res = add_col_dad_son(add_dad, add_son, rtype) output2 += write_outcome(res) if res[0] == 1: output2 += """
The collection will appear on your website after the next webcoll run. You can either run it manually or wait until bibsched does it for you.


""" else: output2 += """Cannot add the collection '%s' as a %s subcollection of '%s' since it will either create a loop, or the association already exists.

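perform_addcollectiontotree refuses an attachment when check_col reports that it would create a loop or duplicate an existing association. The body of check_col is not part of this section, so the following is only a minimal sketch of how such a check can work, assuming the tree is available as a hypothetical dict that maps each collection ID to the IDs of its direct subcollections. `would_create_loop` and `is_already_attached` are illustrative names, not CDSware functions.

```python
from typing import Dict, List


def would_create_loop(children: Dict[int, List[int]], dad: int, son: int) -> bool:
    """Return True if attaching `son` under `dad` would create a cycle.

    Hypothetical helper: `children` maps a collection ID to the IDs of its
    direct subcollections. A cycle appears exactly when `dad` is `son`
    itself or is already reachable from `son` (depth-first search).
    """
    stack = [son]
    seen = set()
    while stack:
        current = stack.pop()
        if current == dad:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return False


def is_already_attached(children: Dict[int, List[int]], dad: int, son: int) -> bool:
    """Return True if `son` is already a direct subcollection of `dad`."""
    return son in children.get(dad, [])
```

For example, with `{1: [2, 3], 2: [4]}`, attaching collection 1 under collection 4 is rejected because 4 is already reachable from 1 through 2.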
""" % (col_dict[add_son], (rtype=="r" and 'regular' or 'virtual'), col_dict[add_dad]) add_son = '' add_dad = '' rtype = '' tree = get_col_tree(colID) col_list = col_dict.items() col_list.sort(compare_on_val) output = show_coll_not_in_tree(colID, ln, col_dict) text = """ Attach which
Attach to
""" text += """ Relationship """ % ((rtype=="r" and 'selected="selected"' or ''), (rtype=="v" and 'selected="selected"' or '')) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/addcollectiontotree" % weburl, text=text, button="Add", colID=colID, ln=ln, confirm=1) output += output2 #output += perform_showtree(colID, ln) try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(colID, ln, mtype="perform_addcollectiontotree", content=addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_addcollection(colID, ln, colNAME='', dbquery='', rest='', callback="yes", confirm=-1): """form to add a new collection. colNAME - the name of the new collection dbquery - the dbquery of the new collection rest - the group allowed to access the new collection""" output = "" subtitle = """Create new collection   [?]""" % (weburl) text = """ Default name
""" % colNAME output = createhiddenform(action="%s/admin/websearch/websearchadmin.py/addcollection" % weburl, text=text, colID=colID, ln=ln, button="Add collection", confirm=1) if colNAME and confirm in ["1", 1]: res = add_col(colNAME, '', '') output += write_outcome(res) if res[0] == 1: output += perform_addcollectiontotree(colID=colID, ln=ln, add_son=res[1], callback='') elif confirm not in ["-1", -1]: output += """Please give the collection a name.""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(colID, ln=ln, mtype="perform_addcollection", content=addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifydbquery(colID, ln, dbquery='', callback='yes', confirm=-1): """form to modify the dbquery of the collection. dbquery - the dbquery of the collection.""" subtitle = '' output = "" col_dict = dict(get_def_name('', "collection")) if colID and col_dict.has_key(int(colID)): colID = int(colID) subtitle = """1. Modify collection query for collection '%s'   [?]""" % (col_dict[colID], weburl) if confirm == -1: res = run_sql("SELECT dbquery FROM collection WHERE id=%s" % colID) dbquery = res[0][0] if not dbquery: dbquery = '' reg_sons = len(get_col_tree(colID, 'r')) vir_sons = len(get_col_tree(colID, 'v')) if reg_sons > 1: if dbquery: output += "Warning: This collection has subcollections and should therefore not have a collection query. For further explanation, check the WebSearch Guide
" elif reg_sons <= 1: if not dbquery: output += "Warning: This collection does not have any subcollections and should therefore have a collection query. For further explanation, check the WebSearch Guide
" text = """ Query
""" % dbquery output += createhiddenform(action="modifydbquery", text=text, button="Modify", colID=colID, ln=ln, confirm=1) if confirm in ["1", 1]: res = modify_dbquery(colID, dbquery) if res: if dbquery == "": text = """Query removed for this collection.""" else: text = """Query set for this collection.""" else: text = """Sorry, could not change query.""" output += text try: body = [output, extra] except NameError: body = [output] if callback: return perform_editcollection(colID, ln, "perform_modifydbquery", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifyrestricted(colID, ln, rest='', callback='yes', confirm=-1): """modify which apache group is allowed to access the collection. rest - the groupname""" subtitle = '' output = "" col_dict = dict(get_def_name('', "collection")) if colID and col_dict.has_key(int(colID)): colID = int(colID) subtitle = """2. Modify access restrictions for collection '%s'   [?]""" % (col_dict[colID], weburl) if confirm == -1: res = run_sql("SELECT restricted FROM collection WHERE id=%s" % colID) rest = res[0][0] if not rest: rest = '' text = """ Restricted to:
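perform_modifydbquery warns when a collection's query and its place in the regular tree disagree. Judging from the `reg_sons > 1` test, a regular tree (from get_col_tree) with more than one entry means the collection has subcollections, and such a collection should not carry a query, while a leaf collection should. A minimal sketch of that rule as a pure function; the name `dbquery_warning` and its messages are hypothetical.

```python
from typing import Optional


def dbquery_warning(regular_tree_size: int, dbquery: str) -> Optional[str]:
    """Mirror the warning rule of perform_modifydbquery (hypothetical helper).

    regular_tree_size plays the role of len(get_col_tree(colID, 'r')); the
    original code treats a value above 1 as "has regular subcollections".
    Returns a warning string, or None when the configuration is consistent.
    """
    has_subcollections = regular_tree_size > 1
    if has_subcollections and dbquery:
        return "collection has subcollections and should not have a query"
    if not has_subcollections and not dbquery:
        return "collection has no subcollections and should have a query"
    return None
```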
""" % rest output += createhiddenform(action="modifyrestricted", text=text, button="Modify", colID=colID, ln=ln, confirm=1) if confirm in ["1", 1]: res = modify_restricted(colID, rest) if res: if rest == "": text = """Removed restriction for this collection.""" else: text = """Restriction set for this collection.""" else: text = """Sorry, could not change the access restrictions.""" output += text try: body = [output, extra] except NameError: body = [output] if callback: return perform_editcollection(colID, ln, "perform_modifyrestricted", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_modifycollectiontree(colID, ln, move_up='', move_down='', move_from='', move_to='', delete='', rtype='', callback='yes', confirm=0): """to modify the collection tree: move a collection up and down, delete a collection, or change the father of the collection. colID - the main collection of the tree, the root move_up - move this collection up (is not the collection id, but the place in the tree) move_down - move this collection down (is not the collection id, but the place in the tree) move_from - move this collection from the current position (is not the collection id, but the place in the tree) move_to - move the move_from collection and set this as its father. (is not the collection id, but the place in the tree) delete - delete this collection from the tree (is not the collection id, but the place in the tree) rtype - the type of the collection in the tree, regular or virtual""" colID = int(colID) tree = get_col_tree(colID, rtype) col_dict = dict(get_def_name('', "collection")) subtitle = """Modify collection tree: %s   [?]   Printer friendly version""" % (col_dict[colID], weburl, weburl, colID, ln) fin_output = "" output = "" try: if move_up: move_up = int(move_up) switch = find_last(tree, move_up) if switch and switch_col_treescore(tree[move_up], tree[switch]): output += """Moved the %s collection '%s' up and '%s' down.

""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_up][0]], col_dict[tree[switch][0]]) else: output += """Could not move the %s collection '%s' up and '%s' down.

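Moving a collection up or down in perform_modifycollectiontree does not rewrite the tree: find_last/find_next locate the neighbouring entry and switch_col_treescore swaps the two scores, which swaps their display order. None of those helpers' bodies appear in this section, so this is a minimal sketch under assumed data: the tree is a hypothetical flat list of `TreeEntry` records in display order, and the neighbour is the nearest earlier entry with the same parent (each collection is assumed to appear at most once under a given parent).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeEntry:
    """One row of a flattened collection tree (hypothetical layout)."""
    coll_id: int
    dad_id: int
    score: int


def find_previous_sibling(entries: List[TreeEntry], index: int) -> Optional[int]:
    """Return the index of the nearest earlier entry with the same parent, or None."""
    dad = entries[index].dad_id
    for i in range(index - 1, -1, -1):
        if entries[i].dad_id == dad:
            return i
    return None


def move_up(entries: List[TreeEntry], index: int) -> bool:
    """Swap scores with the previous sibling; return False if there is none."""
    other = find_previous_sibling(entries, index)
    if other is None:
        return False
    a, b = entries[index], entries[other]
    a.score, b.score = b.score, a.score
    return True
```

Swapping scores rather than positions keeps the stored tree intact; only the ordering key changes.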
""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_up][0]], col_dict[tree[switch][0]]) elif move_down: move_down = int(move_down) switch = find_next(tree, move_down) if switch and switch_col_treescore(tree[move_down], tree[switch]): output += """Moved the %s collection '%s' down and '%s' up.

""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_down][0]], col_dict[tree[switch][0]]) else: output += """Could not move the %s collection '%s' down and '%s' up.

""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_down][0]], col_dict[tree[switch][0]]) elif delete: delete = int(delete) if confirm in [0, "0"]: if col_dict[tree[delete][0]] != col_dict[tree[delete][3]]: text = """Do you want to remove the %s collection '%s' and its subcollections in the %s collection '%s'? """ % ((tree[delete][4]=="r" and 'regular' or 'virtual'), col_dict[tree[delete][0]], (rtype=="r" and 'regular' or 'virtual'), col_dict[tree[delete][3]]) else: text = """Do you want to remove all subcollections of the %s collection '%s'? """ % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[delete][3]]) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifycollectiontree#tree" % weburl, text=text, button="Confirm", colID=colID, delete=delete, rtype=rtype, ln=ln, confirm=1) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/index?mtype=perform_modifycollectiontree#tree" % weburl, text="To cancel", button="Cancel", colID=colID, ln=ln) else: if remove_col_subcol(tree[delete][0], tree[delete][3], rtype): if col_dict[tree[delete][0]] != col_dict[tree[delete][3]]: output += """Removed the %s collection '%s' and its subcollections in collection '%s'.

""" % ((tree[delete][4]=="r" and 'regular' or 'virtual'), col_dict[tree[delete][0]], col_dict[tree[delete][3]]) else: output += """Removed the subcollections of the %s collection '%s'.

""" % ((rtype=="r" and 'regular' or 'virtual'), col_dict[tree[delete][3]]) else: output += """Could not remove the collection from the tree.

""" delete = '' elif move_from and not move_to: move_from_rtype = move_from[0] move_from_id = int(move_from[1:len(move_from)]) text = """Select collection to place the %s collection '%s' under.

""" % ((move_from_rtype=="r" and 'regular' or 'virtual'), col_dict[tree[move_from_id][0]]) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/index?mtype=perform_modifycollectiontree#tree" % weburl, text=text, button="Cancel", colID=colID, ln=ln) elif move_from and move_to: move_from_rtype = move_from[0] move_from_id = int(move_from[1:len(move_from)]) move_to_rtype = move_to[0] move_to_id = int(move_to[1:len(move_to)]) tree_from = get_col_tree(colID, move_from_rtype) tree_to = get_col_tree(colID, move_to_rtype) if confirm in [0, '0']: if move_from_id == move_to_id and move_from_rtype==move_to_rtype: output += """Cannot move to itself.

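In perform_modifycollectiontree, `move_from` and `move_to` are not collection IDs: their first character is the tree type (`"r"` regular or `"v"` virtual) and the rest is the entry's index in that tree, split with `move_from[0]` and `int(move_from[1:len(move_from)])`. A small sketch of the same decoding as a helper; the name `parse_tree_position` and the check on the type character are additions, not part of the original code.

```python
from typing import Tuple


def parse_tree_position(token: str) -> Tuple[str, int]:
    """Split a tree position such as "r12" into ("r", 12).

    The leading character selects the tree ("r" regular, "v" virtual); the
    remainder is an index into that tree, not a collection ID.
    """
    if len(token) < 2 or token[0] not in ("r", "v"):
        raise ValueError("expected 'r' or 'v' followed by an index, got %r" % (token,))
    return token[0], int(token[1:])
```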
""" elif tree_from[move_from_id][3] == tree_to[move_to_id][0] and move_from_rtype==move_to_rtype: output += """The collection is already there.

""" elif check_col(tree_to[move_to_id][0], tree_from[move_from_id][0]) or (tree_to[move_to_id][0] == 1 and tree_from[move_from_id][3] == tree_to[move_to_id][0] and move_from_rtype != move_to_rtype): text = """Move %s collection '%s' to the %s collection '%s'. """ % ((tree_from[move_from_id][4]=="r" and 'regular' or 'virtual'), col_dict[tree_from[move_from_id][0]], (tree_to[move_to_id][4]=="r" and 'regular' or 'virtual'), col_dict[tree_to[move_to_id][0]]) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifycollectiontree#tree" % weburl, text=text, button="Confirm", colID=colID, move_from=move_from, move_to=move_to, ln=ln, rtype=rtype, confirm=1) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/index?mtype=perform_modifycollectiontree#tree" % weburl, text="""To cancel""", button="Cancel", colID=colID, ln=ln) else: output += """Cannot move the collection '%s' and set it as a subcollection of '%s' since it will create a loop.

""" % (col_dict[tree_from[move_from_id][0]], col_dict[tree_to[move_to_id][0]]) else: if (move_to_id != 0 and move_col_tree(tree_from[move_from_id], tree_to[move_to_id])) or (move_to_id == 0 and move_col_tree(tree_from[move_from_id], tree_to[move_to_id], move_to_rtype)): output += """Moved %s collection '%s' to the %s collection '%s'.

""" % ((move_from_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_from[move_from_id][0]], (move_to_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_to[move_to_id][0]]) else: output += """Could not move %s collection '%s' to the %s collection '%s'.

""" % ((move_from_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_from[move_from_id][0]], (move_to_rtype=="r" and 'regular' or 'virtual'), col_dict[tree_to[move_to_id][0]]) move_from = '' move_to = '' else: output += """ """ except StandardError, e: return """An error occurred. """ output += """
Narrow by collection: Focus on...:
""" tree = get_col_tree(colID, 'r') output += create_colltree(tree, col_dict, colID, ln, move_from, move_to, 'r', "yes") output += """ """ tree = get_col_tree(colID, 'v') output += create_colltree(tree, col_dict, colID, ln, move_from, move_to, 'v', "yes") output += """
""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(colID, ln, mtype="perform_modifycollectiontree", content=addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_showtree(colID, ln): """create the collection tree/hierarchy""" col_dict = dict(get_def_name('', "collection")) subtitle = "Collection tree: %s" % col_dict[int(colID)] output = """
Narrow by collection: Focus on...:
""" tree = get_col_tree(colID, 'r') output += create_colltree(tree, col_dict, colID, ln, '', '', 'r', '') output += """ """ tree = get_col_tree(colID, 'v') output += create_colltree(tree, col_dict, colID, ln, '', '', 'v', '') output += """
""" try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle, body) def perform_addportalbox(colID, ln, title='', body='', callback='yes', confirm=-1): """form to add a new portalbox title - the title of the portalbox body - the body of the portalbox""" col_dict = dict(get_def_name('', "collection")) colID = int(colID) subtitle = """Create new portalbox""" text = """ Title
Body
""" % (cgi.escape(title), cgi.escape(body)) output = createhiddenform(action="addportalbox#5.1", text=text, button="Add", colID=colID, ln=ln, confirm=1) if body and confirm in [1, "1"]: title = cgi.escape(title) body = cgi.escape(body) res = add_pbx(title, body) output += write_outcome(res) if res[0] == 1: output += """Add portalbox to collection""" % (colID, ln, res[1]) elif confirm not in [-1, "-1"]: output += """Body field must be filled. """ try: body = [output, extra] except NameError: body = [output] return perform_showportalboxes(colID, ln, content=addadminbox(subtitle, body)) def perform_addexistingportalbox(colID, ln, pbxID=-1, score=0, position='', sel_ln='', callback='yes', confirm=-1): """form to add an existing portalbox to a collection. colID - the collection to add the portalbox to pbxID - the portalbox to add score - the importance of the portalbox. position - the position of the portalbox on the page sel_ln - the language of the portalbox""" subtitle = """Add existing portalbox to collection""" output = "" colID = int(colID) res = get_pbx() pos = get_pbx_pos() lang = dict(get_languages()) col_dict = dict(get_def_name('', "collection")) pbx_dict = dict(map(lambda x: (x[0], x[1]), res)) col_pbx = get_col_pbx(colID) col_pbx = dict(map(lambda x: (x[0], x[5]), col_pbx)) if len(res) > 0: text = """ Portalbox
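perform_addportalbox passes the title and body through `cgi.escape` both when echoing them back into the form and before storing them with add_pbx. `cgi.escape` exists only in older Python versions (it was removed in Python 3.8); in current Python the equivalent is `html.escape`, which also escapes quotes by default and is therefore safe inside attribute values. A minimal sketch; `render_text_input` is a hypothetical helper, not part of CDSware.

```python
import html


def render_text_input(name: str, value: str) -> str:
    """Return an <input> element whose attributes are HTML-escaped.

    html.escape(..., quote=True) converts &, <, >, " and ' so that
    user-supplied text cannot close the attribute or inject markup.
    """
    return '<input type="text" name="%s" value="%s" />' % (
        html.escape(name, quote=True),
        html.escape(value, quote=True),
    )
```

A common design choice is to store text as entered and escape it only when rendering; once a value has been stored already escaped, it must not be escaped a second time on display.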
Language
Position " output += createhiddenform(action="addexistingportalbox#5.2", text=text, button="Add", colID=colID, ln=ln, confirm=1) else: output = """No existing portalboxes to add; please create a new one. """ if pbxID > -1 and position and sel_ln and confirm in [1, "1"]: pbxID = int(pbxID) res = add_col_pbx(colID, pbxID, sel_ln, position, '') output += write_outcome(res) elif pbxID > -1 and confirm not in [-1, "-1"]: output += """All fields must be filled. """ try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_showportalboxes(colID, ln, content=output) def perform_deleteportalbox(colID, ln, pbxID=-1, callback='yes', confirm=-1): """form to delete a portalbox which is not in use. colID - the current collection. pbxID - the id of the portalbox""" subtitle = """Delete an unused portalbox""" output = "" colID = int(colID) if pbxID not in [-1, "-1"] and confirm in [1, "1"]: ares = get_pbx() pbx_dict = dict(map(lambda x: (x[0], x[1]), ares)) if pbx_dict.has_key(int(pbxID)): pname = pbx_dict[int(pbxID)] ares = delete_pbx(int(pbxID)) else: return """This portalbox does not exist""" res = get_pbx() col_dict = dict(get_def_name('', "collection")) pbx_dict = dict(map(lambda x: (x[0], x[1]), res)) col_pbx = get_col_pbx() col_pbx = dict(map(lambda x: (x[0], x[5]), col_pbx)) if len(res) > 0: text = """ Portalbox
""" output += createhiddenform(action="deleteportalbox#5.3", text=text, button="Delete", colID=colID, ln=ln, confirm=1) if pbxID not in [-1,"-1"]: pbxID = int(pbxID) if confirm in [1, "1"]: output += write_outcome(ares) elif confirm not in [-1, "-1"]: output += """Choose a portalbox to delete. """ try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_showportalboxes(colID, ln, content=output) def perform_modifyportalbox(colID, ln, pbxID=-1, score='', position='', sel_ln='', title='', body='', callback='yes', confirm=-1): """form to modify a portalbox in a collection, or change the portalbox itself. colID - the id of the collection. pbxID - the portalbox to change score - the score of the portalbox connected to colID which should be changed. position - the position of the portalbox in collection colID to change.""" subtitle = "" output = "" colID = int(colID) res = get_pbx() pos = get_pbx_pos() lang = dict(get_languages()) col_dict = dict(get_def_name('', "collection")) pbx_dict = dict(map(lambda x: (x[0], x[1]), res)) col_pbx = get_col_pbx(colID) col_pbx = dict(map(lambda x: (x[0], x[5]), col_pbx)) if pbxID not in [-1, "-1"]: pbxID = int(pbxID) subtitle = """Modify portalbox '%s' for this collection""" % pbx_dict[pbxID] col_pbx = get_col_pbx(colID) if not (score and position) and not (body and title): for (id_pbx, id_collection, tln, score, position, title, body) in col_pbx: if id_pbx == pbxID: break output += """Collection (presentation) specific values (Changes apply only to this collection.)
""" text = """ Position
""" output += createhiddenform(action="modifyportalbox#5.4", text=text, button="Modify", colID=colID, pbxID=pbxID, score=score, title=title, body=cgi.escape(body, 1), sel_ln=sel_ln, ln=ln, confirm=3) if pbxID > -1 and score and position and confirm in [3, "3"]: pbxID = int(pbxID) res = modify_pbx(colID, pbxID, sel_ln, score, position, '', '') res2 = get_pbx() pbx_dict = dict(map(lambda x: (x[0], x[1]), res2)) output += write_outcome(res) output += """
Portalbox (content) specific values (any changes appear everywhere the portalbox is used.)""" text = """ Title
""" % cgi.escape(title) text += """ Body
""" % cgi.escape(body) output += createhiddenform(action="modifyportalbox#5.4", text=text, button="Modify", colID=colID, pbxID=pbxID, sel_ln=sel_ln, score=score, position=position, ln=ln, confirm=4) if pbxID > -1 and confirm in [4, "4"]: pbxID = int(pbxID) res = modify_pbx(colID, pbxID, sel_ln, '', '', title, body) output += write_outcome(res) else: output = """No portalbox to modify.""" try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_showportalboxes(colID, ln, content=output) def perform_switchpbxscore(colID, id_1, id_2, sel_ln, ln): """Switch the score of id_1 and id_2 in collection_portalbox. colID - the current collection id_1/id_2 - the IDs to change the score for. sel_ln - the language of the portalbox""" res = get_pbx() pbx_dict = dict(map(lambda x: (x[0], x[1]), res)) res = switch_pbx_score(colID, id_1, id_2, sel_ln) output = write_outcome(res) return perform_showportalboxes(colID, ln, content=output) def perform_showportalboxes(colID, ln, callback='yes', content='', confirm=-1): """show the portalboxes of this collection. colID - the collection to show the portalboxes for.""" colID = int(colID) col_dict = dict(get_def_name('', "collection")) subtitle = """5. Modify portalboxes for collection '%s'   [?]""" % (col_dict[colID], weburl) output = "" pos = get_pbx_pos() output = """
Portalbox actions (not related to this collection)
Create new portalbox
Delete an unused portalbox
Collection specific actions
Add existing portalbox to collection
""" % (colID, ln, colID, ln, colID, ln) header = ['Position', 'Language', '', 'Title', 'Actions'] actions = [] cdslang = get_languages() lang = dict(cdslang) pos_list = pos.items() pos_list.sort() if len(get_col_pbx(colID)) > 0: for (key, value) in cdslang: for (pos_key, pos_value) in pos_list: res = get_col_pbx(colID, key, pos_key) i = 0 for (pbxID, colID_pbx, tln, score, position, title, body) in res: move = """
""" if i != 0: move += """""" % (weburl, colID, ln, pbxID, res[i - 1][0], tln, random.randint(0, 1000), weburl) else: move += "   " move += "" i += 1 if i != len(res): move += """""" % (weburl, colID, ln, pbxID, res[i][0], tln, random.randint(0, 1000), weburl) move += """
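perform_showportalboxes lists the portalboxes of each language and position group and, for the entry at index i, offers a "move up" link paired with `res[i - 1]` and a "move down" link paired with `res[i + 1]`; the links presumably lead to perform_switchpbxscore, which swaps the two scores. The neighbour lookup on its own, as a sketch on a plain list of portalbox IDs (the function name is hypothetical):

```python
from typing import List, Optional, Tuple


def neighbour_ids(ids: List[int]) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (id, previous_id, next_id) for each ID in one display group.

    previous_id is None for the first entry (no "move up" link) and
    next_id is None for the last entry (no "move down" link).
    """
    result = []
    for i, current in enumerate(ids):
        prev_id = ids[i - 1] if i > 0 else None
        next_id = ids[i + 1] if i + 1 < len(ids) else None
        result.append((current, prev_id, next_id))
    return result
```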
""" actions.append(["%s" % (i==1 and pos[position] or ''), "%s" % (i==1 and lang[tln] or ''), move, "%s" % title]) for col in [(('Modify', 'modifyportalbox'), ('Remove', 'removeportalbox'),)]: actions[-1].append('%s' % (weburl, col[0][1], colID, ln, pbxID, tln, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, pbxID, tln, str) output += tupletotable(header=header, tuple=actions) else: output += """No portalboxes exist for this collection""" output += content try: body = [output, extra] except NameError: body = [output] if callback: return perform_editcollection(colID, ln, "perform_showportalboxes", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_removeportalbox(colID, ln, pbxID='', sel_ln='', callback='yes', confirm=0): """form to remove a portalbox from a collection. colID - the current collection, remove the portalbox from this collection. sel_ln - remove the portalbox with this language pbxID - remove the portalbox with this id""" subtitle = """Remove portalbox""" output = "" col_dict = dict(get_def_name('', "collection")) res = get_pbx() pbx_dict = dict(map(lambda x: (x[0], x[1]), res)) if colID and pbxID and sel_ln: colID = int(colID) pbxID = int(pbxID) if confirm in ["0", 0]: text = """Do you want to remove the portalbox '%s' from the collection '%s'?""" % (pbx_dict[pbxID], col_dict[colID]) output += createhiddenform(action="removeportalbox#5.5", text=text, button="Confirm", colID=colID, pbxID=pbxID, sel_ln=sel_ln, confirm=1) elif confirm in ["1", 1]: res = remove_pbx(colID, pbxID, sel_ln) output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_showportalboxes(colID, ln, content=output) def perform_switchfmtscore(colID, type, id_1, id_2, ln): """Switch the score of id_1 and id_2 in the table type. colID - the current collection id_1/id_2 - the IDs to change the score for. type - like "format" """ fmt_dict = dict(get_def_name('', "format")) res = switch_score(colID, id_1, id_2, type) output = write_outcome(res) return perform_showoutputformats(colID, ln, content=output) def perform_switchfldscore(colID, id_1, id_2, fmeth, ln): """Switch the score of id_1 and id_2 in collection_field_fieldvalue. colID - the current collection id_1/id_2 - the IDs to change the score for. fmeth - the field type (soo, sew or seo)""" fld_dict = dict(get_def_name('', "field")) res = switch_fld_score(colID, id_1, id_2) output = write_outcome(res) if fmeth == "soo": return perform_showsortoptions(colID, ln, content=output) elif fmeth == "sew": return perform_showsearchfields(colID, ln, content=output) elif fmeth == "seo": return perform_showsearchoptions(colID, ln, content=output) def perform_switchfldvaluescore(colID, id_1, id_fldvalue_1, id_fldvalue_2, ln): """Switch the score of id_fldvalue_1 and id_fldvalue_2 in collection_field_fieldvalue. colID - the current collection id_1 - the field the values belong to id_fldvalue_1/id_fldvalue_2 - the IDs of the field values to change the score for.""" name_1 = run_sql("SELECT name from fieldvalue where id=%s" % int(id_fldvalue_1))[0][0] name_2 = run_sql("SELECT name from fieldvalue where id=%s" % int(id_fldvalue_2))[0][0] res = switch_fld_value_score(colID, id_1, id_fldvalue_1, id_fldvalue_2) output = write_outcome(res) return perform_modifyfield(colID, fldID=id_1, ln=ln, content=output) def perform_addnewfieldvalue(colID, fldID, ln, name='', value='', callback="yes", confirm=-1): """form to add a new fieldvalue. name - the name of the new fieldvalue value - the value of the new fieldvalue """ output = "" subtitle = """Add new value""" text = """ Display name
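perform_switchfldvaluescore, like the rearrange functions, builds its SQL by `%`-formatting values into the statement string, which is only safe when every value coming from the request is forced to an integer first. The DB-API alternative is to pass values separately as query parameters. Whether CDSware's run_sql accepts parameters is not shown in this section, so the sketch below uses Python's built-in sqlite3 module (placeholder `?`; MySQLdb uses `%s`) against the `collection_field_fieldvalue` columns named in the code. `set_value_score` is a hypothetical helper.

```python
import sqlite3


def set_value_score(conn: sqlite3.Connection, col_id, fld_id, fldv_id, score) -> int:
    """Set score_fieldvalue for one (collection, field, value) row.

    Values are bound as parameters, so the driver handles quoting and a
    crafted string cannot change the structure of the statement.
    Returns the number of rows updated.
    """
    cur = conn.execute(
        "UPDATE collection_field_fieldvalue SET score_fieldvalue = ? "
        "WHERE id_collection = ? AND id_field = ? AND id_fieldvalue = ?",
        (score, col_id, fld_id, fldv_id),
    )
    return cur.rowcount
```

With string formatting, a request value such as `1 OR 1=1` would widen the WHERE clause to every row; bound as a parameter, it is compared as a literal and matches nothing.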
Search value
""" % (name, value) output = createhiddenform(action="%s/admin/websearch/websearchadmin.py/addnewfieldvalue" % weburl, text=text, colID=colID, fldID=fldID, ln=ln, button="Add", confirm=1) if name and value and confirm in ["1", 1]: res = add_fldv(name, value) output += write_outcome(res) if res[0] == 1: res = add_col_fld(colID, fldID, 'seo', res[1]) if res[0] == 0: output += "
" + write_outcome(res) elif confirm not in ["-1", -1]: output += """Please fill in name and value. """ try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_modifyfield(colID, fldID=fldID, ln=ln, content=output) def perform_modifyfieldvalue(colID, fldID, fldvID, ln, name='', value='', callback="yes", confirm=-1): """form to modify a fieldvalue. name - the name of the fieldvalue value - the value of the fieldvalue """ if confirm in [-1, "-1"]: res = get_fld_value(fldvID) (id, name, value) = res[0] output = "" subtitle = """Modify existing value""" output = """
Warning: Modifications made below will also affect every place where the modified data is used.
""" text = """ Display name
Search value
""" % (name, value) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifyfieldvalue" % weburl, text=text, colID=colID, fldID=fldID, fldvID=fldvID, ln=ln, button="Update", confirm=1) output += createhiddenform(action="%s/admin/websearch/websearchadmin.py/modifyfieldvalue" % weburl, text="Delete value and all associations", colID=colID, fldID=fldID, fldvID=fldvID, ln=ln, button="Delete", confirm=2) if name and value and confirm in ["1", 1]: res = update_fldv(fldvID, name, value) output += write_outcome(res) #if res: # output += """Operation successfully completed.""" #else: # output += """Operation failed.""" elif confirm in ["2", 2]: res = delete_fldv(fldvID) output += write_outcome(res) elif confirm not in ["-1", -1]: output += """Please fill in name and value.""" try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_modifyfield(colID, fldID=fldID, ln=ln, content=output) def perform_removefield(colID, ln, fldID='', fldvID='', fmeth='', callback='yes', confirm=0): """form to remove a field from a collection. colID - the current collection, remove the field from this collection. fldID - remove the field with this id fldvID - remove only the association with this field value, if given fmeth - the field type: soo (sort option), sew (search field) or seo (search option)""" if fmeth == "soo": field = "sort option" elif fmeth == "sew": field = "search field" elif fmeth == "seo": field = "search option" else: field = "field" subtitle = """Remove %s""" % field output = "" col_dict = dict(get_def_name('', "collection")) fld_dict = dict(get_def_name('', "field")) res = get_fld_value() fldv_dict = dict(map(lambda x: (x[0], x[1]), res)) if colID and fldID: colID = int(colID) fldID = int(fldID) if fldvID and fldvID != "None": fldvID = int(fldvID) if confirm in ["0", 0]: text = """Do you want to remove the %s '%s' %s from the collection '%s'?""" % (field, fld_dict[fldID], (fldvID not in ["", "None"] and "with value '%s'" % fldv_dict[fldvID] or ''), col_dict[colID]) output += createhiddenform(action="removefield#6.5", text=text, button="Confirm", colID=colID, fldID=fldID, fldvID=fldvID, fmeth=fmeth, confirm=1) elif confirm in ["1", 1]: res = remove_fld(colID, fldID, fldvID) output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) if fmeth == "soo": return perform_showsortoptions(colID, ln, content=output) elif fmeth == "sew": return perform_showsearchfields(colID, ln, content=output) elif fmeth == "seo": return perform_showsearchoptions(colID, ln, content=output) def perform_removefieldvalue(colID, ln, fldID='', fldvID='', fmeth='', callback='yes', confirm=0): """form to remove a value from a search option of a collection. colID - the current collection fldID - the search option (field) to remove the value from fldvID - the id of the value to remove""" subtitle = """Remove value""" output = "" col_dict = dict(get_def_name('', "collection")) fld_dict = dict(get_def_name('', "field")) res = get_fld_value() fldv_dict = dict(map(lambda x: (x[0], x[1]), res)) if colID and fldID: colID = int(colID) fldID = int(fldID) if fldvID and fldvID != "None": fldvID = int(fldvID) if confirm in ["0", 0]: text = """Do you want to remove the value '%s' from the search option '%s'?""" % (fldv_dict[fldvID], fld_dict[fldID]) output += createhiddenform(action="removefieldvalue#7.4", text=text, button="Confirm", colID=colID, fldID=fldID, fldvID=fldvID, fmeth=fmeth, confirm=1) elif confirm in ["1", 1]: res = remove_fld(colID, fldID, fldvID) output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_modifyfield(colID, fldID=fldID, ln=ln, content=output) def perform_rearrangefieldvalue(colID, fldID, ln, callback='yes', confirm=-1): """rearrange the field values alphabetically colID - the collection fldID - the field to rearrange the field values for """ subtitle = "Order values alphabetically" output = "" col_fldv = get_col_fld(colID, 'seo', fldID) col_fldv = dict(map(lambda x: (x[1], x[0]), col_fldv)) fldv_names = get_fld_value() fldv_names = map(lambda x: (x[0], x[1]), fldv_names) if not col_fldv.has_key(None): vscore = len(col_fldv) for (fldvID, name) in fldv_names: if col_fldv.has_key(fldvID): run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=%s WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (vscore, int(colID), int(fldID), fldvID)) vscore -= 1 output += write_outcome((1, "")) else: output += write_outcome((0, (0, "No values to order"))) try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_modifyfield(colID, fldID, ln, content=output) def perform_rearrangefield(colID, ln, fmeth, callback='yes', confirm=-1): """rearrange the fields alphabetically colID - the collection fmeth - the field type (soo, sew or seo)""" subtitle = "Order fields alphabetically" output = "" col_fld = dict(map(lambda x: (x[0], x[1]), get_col_fld(colID, fmeth))) fld_names = get_def_name('', "field") if len(col_fld) > 0: score = len(col_fld) for (fldID, name) in fld_names: if col_fld.has_key(fldID): run_sql("UPDATE collection_field_fieldvalue SET score=%s WHERE id_collection=%s and id_field=%s" % (score, int(colID), fldID)) score -= 1 output += write_outcome((1, "")) else: output += write_outcome((0, (0, "No fields to order"))) try: body = [output, extra] except NameError: body = [output] output = "
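perform_rearrangefield and perform_rearrangefieldvalue reorder alphabetically by walking the names returned by get_def_name/get_fld_value, giving the first member a score equal to the number of members and decrementing from there, so the alphabetically first item ends up with the highest score. That appears to rely on the lookup already returning names in alphabetical order, which this section does not show; the sketch below sorts explicitly instead. `alphabetical_scores` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def alphabetical_scores(names: Iterable[Tuple[int, str]], members: Iterable[int]) -> Dict[int, int]:
    """Map each member ID to a score so that alphabetical order = descending score.

    names: (id, name) pairs, in any order.
    members: the IDs attached to the collection; other IDs are ignored.
    """
    member_set = set(members)
    ordered = sorted((name, item_id) for item_id, name in names if item_id in member_set)
    scores = {}
    score = len(ordered)
    for _name, item_id in ordered:
        scores[item_id] = score
        score -= 1
    return scores
```

Each resulting score would then be written back with one UPDATE per member, as the original loop does.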
" + addadminbox(subtitle, body) if fmeth == "soo": return perform_showsortoptions(colID, ln, content=output) elif fmeth == "sew": return perform_showsearchfields(colID, ln, content=output) elif fmeth == "seo": return perform_showsearchoptions(colID, ln, content=output) def perform_addexistingfieldvalue(colID, fldID, fldvID=-1, ln=cdslang, callback='yes', confirm=-1): """form to add an existing fieldvalue to a field. colID - the collection fldID - the field to add the fieldvalue to fldvID - the fieldvalue to add""" subtitle = """Add existing value to search option""" output = "" if fldvID not in [-1, "-1"] and confirm in [1, "1"]: fldvID = int(fldvID) ares = add_col_fld(colID, fldID, 'seo', fldvID) colID = int(colID) fldID = int(fldID) lang = dict(get_languages()) res = get_def_name('', "field") col_dict = dict(get_def_name('', "collection")) fld_dict = dict(res) col_fld = dict(map(lambda x: (x[0], x[1]), get_col_fld(colID, 'seo'))) fld_value = get_fld_value() fldv_dict = dict(map(lambda x: (x[0], x[1]), fld_value)) text = """ Value
""" output += createhiddenform(action="addexistingfieldvalue#7.4", text=text, button="Add", colID=colID, fldID=fldID, ln=ln, confirm=1) if fldvID not in [-1, "-1"] and confirm in [1, "1"]: output += write_outcome(ares) elif confirm in [1, "1"]: output += """Select a value to add and try again.""" try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body)
    return perform_modifyfield(colID, fldID, ln, content=output)

def perform_addexistingfield(colID, ln, fldID=-1, fldvID=-1, fmeth='', callback='yes', confirm=-1):
    """form to add an existing field to a collection.
    colID - the collection to add the field to
    fldID - the field to add
    fldvID - the fieldvalue to add, if any
    fmeth - the field method ('soo', 'sew' or 'seo')"""

    subtitle = """Add existing field to collection"""
    output = ""

    if fldID not in [-1, "-1"] and confirm in [1, "1"]:
        fldID = int(fldID)
        ares = add_col_fld(colID, fldID, fmeth, fldvID)

    colID = int(colID)
    lang = dict(get_languages())
    res = get_def_name('', "field")
    col_dict = dict(get_def_name('', "collection"))
    fld_dict = dict(res)
    col_fld = dict(map(lambda x: (x[0], x[1]), get_col_fld(colID, fmeth)))
    fld_value = get_fld_value()
    fldv_dict = dict(map(lambda x: (x[0], x[1]), fld_value))
    if fldvID:
        fldvID = int(fldvID)
    text = """ Field
""" output += createhiddenform(action="addexistingfield#6.2", text=text, button="Add", colID=colID, fmeth=fmeth, ln=ln, confirm=1) if fldID not in [-1, "-1"] and confirm in [1, "1"]: output += write_outcome(ares) elif fldID in [-1, "-1"] and confirm not in [-1, "-1"]: output += """Select a field. """ try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) if fmeth == "soo": return perform_showsortoptions(colID, ln, content=output) elif fmeth == "sew": return perform_showsearchfields(colID, ln, content=output) elif fmeth == "seo": return perform_showsearchoptions(colID, ln, content=output) def perform_showsortoptions(colID, ln, callback='yes', content='', confirm=-1): """show the sort fields of this collection..""" colID = int(colID) col_dict = dict(get_def_name('', "collection")) fld_dict = dict(get_def_name('', "field")) fld_type = get_sort_nametypes() subtitle = """8. Modify sort options for collection '%s'   [?]""" % (col_dict[colID], weburl) output = """
Field actions (not related to this collection)
Go to the BibIndex interface to modify the available sort options
Collection specific actions
Add sort option to collection
Order sort options alphabetically
""" % (colID, ln, colID, ln) header = ['', 'Sort option', 'Actions'] actions = [] cdslang = get_languages() lang = dict(cdslang) fld_type_list = fld_type.items() if len(get_col_fld(colID, 'soo')) > 0: res = get_col_fld(colID, 'soo') i = 0 for (fldID, fldvID, stype, score, score_fieldvalue) in res: move = """
""" if i != 0: move += """""" % (weburl, colID, ln, fldID, res[i - 1][0], random.randint(0, 1000), weburl) else: move += "    " move += "" i += 1 if i != len(res): move += """""" % (weburl, colID, ln, fldID, res[i][0], random.randint(0, 1000), weburl) move += """
"""
            actions.append([move, fld_dict[int(fldID)]])
            for col in [(('Remove sort option', 'removefield'),)]:
                actions[-1].append('%s' % (weburl, col[0][1], colID, ln, fldID, col[0][0]))
                for (str, function) in col[1:]:
                    actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, fldID, str)
        output += tupletotable(header=header, tuple=actions)
    else:
        output += """No sort options exist for this collection"""

    output += content
    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_editcollection(colID, ln, "perform_showsortoptions", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_showsearchfields(colID, ln, callback='yes', content='', confirm=-1):
    """Show the search fields of this collection."""

    colID = int(colID)
    col_dict = dict(get_def_name('', "collection"))
    fld_dict = dict(get_def_name('', "field"))
    fld_type = get_sort_nametypes()
    subtitle = """6. Modify search fields for collection '%s'   [?]""" % (col_dict[colID], weburl)
    output = """
Field actions (not related to this collection)
Go to the BibIndex interface to modify the available search fields
Collection specific actions
Add search field to collection
Order search fields alphabetically
""" % (colID, ln, colID, ln) header = ['', 'Search field', 'Actions'] actions = [] cdslang = get_languages() lang = dict(cdslang) fld_type_list = fld_type.items() if len(get_col_fld(colID, 'sew')) > 0: res = get_col_fld(colID, 'sew') i = 0 for (fldID, fldvID, stype, score, score_fieldvalue) in res: move = """
""" if i != 0: move += """""" % (weburl, colID, ln, fldID, res[i - 1][0], random.randint(0, 1000), weburl) else: move += "   " move += "" i += 1 if i != len(res): move += '' % (weburl, colID, ln, fldID, res[i][0], random.randint(0, 1000), weburl) move += """
"""
            actions.append([move, fld_dict[int(fldID)]])
            for col in [(('Remove search field', 'removefield'),)]:
                actions[-1].append('%s' % (weburl, col[0][1], colID, ln, fldID, col[0][0]))
                for (str, function) in col[1:]:
                    actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, fldID, str)
        output += tupletotable(header=header, tuple=actions)
    else:
        output += """No search fields exist for this collection"""

    output += content
    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_editcollection(colID, ln, "perform_showsearchfields", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_showsearchoptions(colID, ln, callback='yes', content='', confirm=-1):
    """Show the search options of this collection."""

    colID = int(colID)
    col_dict = dict(get_def_name('', "collection"))
    fld_dict = dict(get_def_name('', "field"))
    fld_type = get_sort_nametypes()
    subtitle = """7. Modify search options for collection '%s'   [?]""" % (col_dict[colID], weburl)
    output = """
Field actions (not related to this collection)
Go to the BibIndex interface to modify the available search options
Collection specific actions
Add search option to collection
Order search options alphabetically
""" % (colID, ln, colID, ln)

    header = ['', 'Search option', 'Actions']
    actions = []
    cdslang = get_languages()
    lang = dict(cdslang)
    fld_type_list = fld_type.items()
    fld_distinct = run_sql("SELECT distinct(id_field) FROM collection_field_fieldvalue WHERE type='seo' AND id_collection=%s ORDER by score desc" % colID)

    if len(fld_distinct) > 0:
        i = 0
        for (id) in fld_distinct:
            fldID = id[0]
            col_fld = get_col_fld(colID, 'seo', fldID)
            move = ""
            if i != 0:
                move += """""" % (weburl, colID, ln, fldID, fld_distinct[i - 1][0], random.randint(0, 1000), weburl)
            else:
                move += "   "
            i += 1
            if i != len(fld_distinct):
                move += '' % (weburl, colID, ln, fldID, fld_distinct[i][0], random.randint(0, 1000), weburl)

            actions.append([move, "%s" % fld_dict[fldID]])
            for col in [(('Modify values', 'modifyfield'), ('Remove search option', 'removefield'),)]:
                actions[-1].append('%s' % (weburl, col[0][1], colID, ln, fldID, col[0][0]))
                for (str, function) in col[1:]:
                    actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, fldID, str)
        output += tupletotable(header=header, tuple=actions)
    else:
        output += """No search options exist for this collection"""

    output += content
    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_editcollection(colID, ln, "perform_showsearchoptions", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_modifyfield(colID, fldID, fldvID='', ln=cdslang, content='', callback='yes', confirm=0):
    """Modify the fieldvalues for a field"""

    colID = int(colID)
    col_dict = dict(get_def_name('', "collection"))
    fld_dict = dict(get_def_name('', "field"))
    fld_type = get_sort_nametypes()
    fldID = int(fldID)
    subtitle = """Modify values for field '%s'""" % (fld_dict[fldID])
    output = """
Value specific actions
Add existing value to search option
Add new value to search option
Order values alphabetically
""" % (colID, ln, fldID, colID, ln, fldID, colID, ln, fldID) header = ['', 'Value name', 'Actions'] actions = [] cdslang = get_languages() lang = dict(cdslang) fld_type_list = fld_type.items() col_fld = list(get_col_fld(colID, 'seo', fldID)) if len(col_fld) == 1 and col_fld[0][1] == None: output += """No values added for this search option yet""" else: j = 0 for (fldID, fldvID, stype, score, score_fieldvalue) in col_fld: fieldvalue = get_fld_value(fldvID) move = "" if j != 0: move += """""" % (weburl, colID, ln, fldID, fldvID, col_fld[j - 1][1], random.randint(0, 1000), weburl) else: move += "   " j += 1 if j != len(col_fld): move += """""" % (weburl, colID, ln, fldID, fldvID, col_fld[j][1], random.randint(0, 1000), weburl) if fieldvalue[0][1] != fieldvalue[0][2] and fldvID != None: actions.append([move, "%s - %s" % (fieldvalue[0][1],fieldvalue[0][2])]) elif fldvID != None: actions.append([move, "%s" % fieldvalue[0][1]]) move = '' for col in [(('Modify value', 'modifyfieldvalue'), ('Remove value', 'removefieldvalue'),)]: actions[-1].append('%s' % (weburl, col[0][1], colID, ln, fldID, fldvID, col[0][0])) for (str, function) in col[1:]: actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, fldID, fldvID, str) output += tupletotable(header=header, tuple=actions) output += content try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) if len(col_fld) == 0: output = content return perform_showsearchoptions(colID, ln, content=output) def perform_showoutputformats(colID, ln, callback='yes', content='', confirm=-1): """shows the outputformats of the current collection colID - the collection id.""" colID = int(colID) col_dict = dict(get_def_name('', "collection")) subtitle = """10. Modify output formats for collection '%s'   [?]""" % (col_dict[colID], weburl) output = """
Output format actions (not specific to the chosen collection)
Go to the BibFormat interface to modify
Collection specific actions
Add existing output format to collection
""" % (colID, ln) header = ['', 'Code', 'Output format', 'Actions'] actions = [] col_fmt = get_col_fmt(colID) fmt_dict = dict(get_def_name('', "format")) i = 0 if len(col_fmt) > 0: for (id_format, colID_fld, code, score) in col_fmt: move = """
""" if i != 0: move += """""" % (weburl, colID, ln, id_format, col_fmt[i - 1][0], random.randint(0, 1000), weburl) else: move += "   " move += "" i += 1 if i != len(col_fmt): move += '' % (weburl, colID, ln, id_format, col_fmt[i][0], random.randint(0, 1000), weburl) move += """
"""
            actions.append([move, code, fmt_dict[int(id_format)]])
            for col in [(('Remove', 'removeoutputformat'),)]:
                actions[-1].append('%s' % (weburl, col[0][1], colID, ln, id_format, col[0][0]))
                for (str, function) in col[1:]:
                    actions[-1][-1] += ' / %s' % (weburl, function, colID, ln, id_format, str)
        output += tupletotable(header=header, tuple=actions)
    else:
        output += """No output formats exist for this collection"""

    output += content
    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_editcollection(colID, ln, "perform_showoutputformats", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_addexistingoutputformat(colID, ln, fmtID=-1, callback='yes', confirm=-1):
    """form to add an existing output format to a collection.
    colID - the collection the format should be added to
    fmtID - the format to add."""

    subtitle = """Add existing output format to collection"""
    output = ""

    if fmtID not in [-1, "-1"] and confirm in [1, "1"]:
        ares = add_col_fmt(colID, fmtID)

    colID = int(colID)
    res = get_def_name('', "format")
    fmt_dict = dict(res)
    col_dict = dict(get_def_name('', "collection"))
    col_fmt = get_col_fmt(colID)
    col_fmt = dict(map(lambda x: (x[0], x[2]), col_fmt))

    if len(res) > 0:
        text = """ Output format
""" output += createhiddenform(action="addexistingoutputformat#10.2", text=text, button="Add", colID=colID, ln=ln, confirm=1) else: output = """No existing output formats to add, please create a new one.""" if fmtID not in [-1, "-1"] and confirm in [1, "1"]: output += write_outcome(ares) elif fmtID in [-1, "-1"] and confirm not in [-1, "-1"]: output += """Please select output format.""" try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_showoutputformats(colID, ln, content=output) def perform_deleteoutputformat(colID, ln, fmtID=-1, callback='yes', confirm=-1): """form to delete an output format not in use. colID - the collection id of the current collection. fmtID - the format id to delete.""" subtitle = """Delete an unused output format""" output = """
Deleting an output format will also delete its associated translations.
"""

    colID = int(colID)
    if fmtID not in [-1, "-1"] and confirm in [1, "1"]:
        fmt_dict = dict(get_def_name('', "format"))
        old_colNAME = fmt_dict[int(fmtID)]
        ares = delete_fmt(int(fmtID))

    res = get_def_name('', "format")
    fmt_dict = dict(res)
    col_dict = dict(get_def_name('', "collection"))
    col_fmt = get_col_fmt()
    col_fmt = dict(map(lambda x: (x[0], x[2]), col_fmt))

    if len(res) > 0:
        text = """ Output format
"""
        output += createhiddenform(action="deleteoutputformat#10.3", text=text, button="Delete", colID=colID, ln=ln, confirm=0)

    if fmtID not in [-1, "-1"]:
        fmtID = int(fmtID)
        if confirm in [0, "0"]:
            text = """Do you want to delete the output format '%s'? """ % fmt_dict[fmtID]
            output += createhiddenform(action="deleteoutputformat#10.3", text=text, button="Confirm", colID=colID, fmtID=fmtID, ln=ln, confirm=1)
        elif confirm in [1, "1"]:
            output += write_outcome(ares)
    elif confirm not in [-1, "-1"]:
        output += """Choose an output format to delete. """

    try:
        body = [output, extra]
    except NameError:
        body = [output]
    output = "
" + addadminbox(subtitle, body)
    return perform_showoutputformats(colID, ln, content=output)

def perform_removeoutputformat(colID, ln, fmtID='', callback='yes', confirm=0):
    """form to remove an output format from a collection.
    colID - the collection id of the current collection.
    fmtID - the format id.
    """

    subtitle = """Remove output format"""
    output = ""
    col_dict = dict(get_def_name('', "collection"))
    fmt_dict = dict(get_def_name('', "format"))

    if colID and fmtID:
        colID = int(colID)
        fmtID = int(fmtID)

        if confirm in ["0", 0]:
            text = """Do you want to remove the output format '%s' from the collection '%s'?""" % (fmt_dict[fmtID], col_dict[colID])
            output += createhiddenform(action="removeoutputformat#10.5", text=text, button="Confirm", colID=colID, fmtID=fmtID, confirm=1)
        elif confirm in ["1", 1]:
            res = remove_fmt(colID, fmtID)
            output += write_outcome(res)

    try:
        body = [output, extra]
    except NameError:
        body = [output]
    output = "
" + addadminbox(subtitle, body) return perform_showoutputformats(colID, ln, content=output) def perform_index(colID=1, ln=cdslang, mtype='', content='', confirm=0): """The index method, calling methods to show the collection tree, create new collections and add collections to tree. """ subtitle = "Overview" colID = int(colID) col_dict = dict(get_def_name('', "collection")) output = "" fin_output = "" if not col_dict.has_key(1): res = add_col(cdsname, '', '') if res: fin_output += """Created root collection.
"""
        else:
            return "Cannot create root collection; please check the database."

    if cdsname != run_sql("SELECT name from collection WHERE id=1")[0][0]:
        res = run_sql("update collection set name='%s' where id=1" % cdsname)
        if res:
            fin_output += """The name of the root collection has been changed to match the CDSware installation name set before installing CDSware.
""" else: return "Error renaming root collection." fin_output += """
0. Show all 1. Create new collection 2. Attach collection to tree 3. Modify collection tree 4. Webcoll Status
5. Collections Status
""" % (weburl, colID, ln, weburl, colID, ln, weburl, colID, ln, weburl, colID, ln, weburl, colID, ln, weburl, colID, ln) if mtype == "": fin_output += """

To manage the collections, select an item from the menu.
""" if mtype == "perform_addcollection" and content: fin_output += content elif mtype == "perform_addcollection" or mtype == "perform_showall": fin_output += perform_addcollection(colID=colID, ln=ln, callback='') fin_output += "
" if mtype == "perform_addcollectiontotree" and content: fin_output += content elif mtype == "perform_addcollectiontotree" or mtype == "perform_showall": fin_output += perform_addcollectiontotree(colID=colID, ln=ln, callback='') fin_output += "
" if mtype == "perform_modifycollectiontree" and content: fin_output += content elif mtype == "perform_modifycollectiontree" or mtype == "perform_showall": fin_output += perform_modifycollectiontree(colID=colID, ln=ln, callback='') fin_output += "
"
    if mtype == "perform_checkwebcollstatus" and content:
        fin_output += content
    elif mtype == "perform_checkwebcollstatus" or mtype == "perform_showall":
        fin_output += perform_checkwebcollstatus(colID, ln, callback='')

    if mtype == "perform_checkcollectionstatus" and content:
        fin_output += content
    elif mtype == "perform_checkcollectionstatus" or mtype == "perform_showall":
        fin_output += perform_checkcollectionstatus(colID, ln, callback='')

    try:
        body = [fin_output, extra]
    except NameError:
        body = [fin_output]

    return addadminbox('Menu', body)

def show_coll_not_in_tree(colID, ln, col_dict):
    """Returns collections not in tree"""

    tree = get_col_tree(colID)
    in_tree = {}
    output = "These collections are not in the tree and should be added:
" for (id, up, down, dad, reltype) in tree: in_tree[id] = 1 in_tree[dad] = 1 res = run_sql("SELECT id from collection") if len(res) != len(in_tree): for id in res: if not in_tree.has_key(id[0]): output += """%s , """ % (weburl, id[0], ln, col_dict[id[0]]) output += "

"
    else:
        output = ""
    return output

def create_colltree(tree, col_dict, colID, ln, move_from='', move_to='', rtype='', edit=''):
    """Creates the presentation of the collection tree, with the buttons for modifying it.
    tree - the tree to present, from get_col_tree()
    col_dict - the name of the collections in a dictionary
    colID - the collection id to start with
    move_from - if a collection to be moved has been chosen
    move_to - the collection which should be set as father of move_from
    rtype - the type of the tree, regular or virtual
    edit - if the method should output the edit buttons."""

    if move_from:
        move_from_rtype = move_from[0]
        move_from_id = int(move_from[1:len(move_from)])
        tree_from = get_col_tree(colID, move_from_rtype)
        tree_to = get_col_tree(colID, rtype)

    tables = 0
    tstack = []
    i = 0
    text = """ """
    for i in range(0, len(tree)):
        id_son = tree[i][0]
        up = tree[i][1]
        down = tree[i][2]
        dad = tree[i][3]
        reltype = tree[i][4]
        tmove_from = ""

        j = i
        while j > 0:
            j = j - 1
            try:
                if tstack[j][1] == dad:
                    table = tstack[j][2]
                    for k in range(0, tables - table):
                        tables = tables - 1
                        text += """
""" if i > 0 and tree[i][1] == 0: tables = tables + 1 text += """ """ while tables > 0: text += """
""" if i == 0: tstack.append((id_son, dad, 1)) else: tstack.append((id_son, dad, tables)) if up == 1 and edit: text += """""" % (weburl, colID, ln, i, rtype, tree[i][0], weburl) else: text += """ """ text += "" if down == 1 and edit: text += """""" % (weburl, colID, ln, i, rtype, tree[i][0], weburl) else: text += """ """ text += "" if edit: if move_from and move_to: tmove_from = move_from move_from = '' if not (move_from == "" and i == 0) and not (move_from != "" and int(move_from[1:len(move_from)]) == i and rtype == move_from[0]): check = "true" if move_from: #if tree_from[move_from_id][0] == tree_to[i][0] or not check_col(tree_to[i][0], tree_from[move_from_id][0]): # check = '' #elif not check_col(tree_to[i][0], tree_from[move_from_id][0]): # check = '' #if not check and (tree_to[i][0] == 1 and tree_from[move_from_id][3] == tree_to[i][0] and move_from_rtype != rtype): # check = "true" if check: text += """ """ % (weburl, colID, ln, move_from, rtype, i, rtype, weburl, col_dict[tree_from[int(move_from[1:len(move_from)])][0]], col_dict[tree_to[i][0]]) else: try: text += """""" % (weburl, colID, ln, rtype, i, rtype, tree[i][0], weburl, col_dict[tree[i][0]]) except KeyError: pass else: text += """ """ % weburl else: text += """ """ % weburl text += """ """ if edit: try: text += """""" % (weburl, colID, ln, i, rtype, tree[i][0], weburl) except KeyError: pass elif i != 0: text += """ """ % weburl text += """ """ if tmove_from: move_from = tmove_from try: text += """%s%s%s%s%s""" % (tree[i][0], (reltype=="v" and '' or ''), weburl, tree[i][0], ln, col_dict[id_son], (move_to=="%s%s" %(rtype,i) and ' ' % weburl or ''), (move_from=="%s%s" % (rtype,i) and ' ' % weburl or ''), (reltype=="v" and '' or '')) except KeyError: pass text += """
""" return text def perform_deletecollection(colID, ln, confirm=-1, callback='yes'): """form to delete a collection colID - id of collection """ subtitle ='' output = """
WARNING:
Deleting a collection also deletes all data related to it, such as translations, relations to other collections and information about which rank methods to use.
For more information, please go to the WebSearch guide and read the section regarding deleting a collection.
""" % weburl

    col_dict = dict(get_def_name('', "collection"))
    if colID and int(colID) != 1 and col_dict.has_key(int(colID)):
        colID = int(colID)
        subtitle = """4. Delete collection '%s'   [?]""" % (col_dict[colID], weburl)
        res = run_sql("SELECT * from collection_collection WHERE id_dad=%s" % colID)
        res2 = run_sql("SELECT * from collection_collection WHERE id_son=%s" % colID)

        if not res and not res2:
            if confirm in ["-1", -1]:
                text = """Do you want to delete this collection?"""
                output += createhiddenform(action="deletecollection#4", text=text, colID=colID, button="Delete", confirm=0)
            elif confirm in ["0", 0]:
                text = """Are you sure you want to delete this collection?"""
                output += createhiddenform(action="deletecollection#4", text=text, colID=colID, button="Confirm", confirm=1)
            elif confirm in ["1", 1]:
                result = delete_col(colID)
                if not result:
                    raise StandardError("Could not delete collection %s" % colID)
        else:
            output = """Cannot delete a collection that is part of the collection tree; remove the collection from the tree and try again."""
    else:
        subtitle = """4. Delete collection"""
        output = """It is not possible to delete the root collection"""

    try:
        body = [output, extra]
    except NameError:
        body = [output]

    if callback:
        return perform_editcollection(colID, ln, "perform_deletecollection", addadminbox(subtitle, body))
    else:
        return addadminbox(subtitle, body)

def perform_editcollection(colID=1, ln=cdslang, mtype='', content=''):
    """Interface to modify a collection. This method calls other methods,
    which in turn call this one and pass back their output. If callback
    is set, a method calls perform_editcollection; if not, it just
    returns its output.
    colID - id of the collection
    mtype - the method that called this method.
    content - the output from that method."""

    colID = int(colID)
    col_dict = dict(get_def_name('', "collection"))
    if not col_dict.has_key(colID):
        return """Collection deleted. """

    fin_output = """
Menu
0. Show all 1. Modify collection query 2. Modify access restrictions 3. Modify translations 4. Delete collection
5. Modify portalboxes 6. Modify search fields 7. Modify search options 8. Modify sort options 9. Modify rank options
10. Modify output formats
""" % (colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln, colID, ln) if mtype == "perform_modifydbquery" and content: fin_output += content elif mtype == "perform_modifydbquery" or not mtype: fin_output += perform_modifydbquery(colID, ln, callback='') if mtype == "perform_modifyrestricted" and content: fin_output += content elif mtype == "perform_modifyrestricted" or not mtype: fin_output += perform_modifyrestricted(colID, ln, callback='') if mtype == "perform_modifytranslations" and content: fin_output += content elif mtype == "perform_modifytranslations" or not mtype: fin_output += perform_modifytranslations(colID, ln, callback='') if mtype == "perform_deletecollection" and content: fin_output += content elif mtype == "perform_deletecollection" or not mtype: fin_output += perform_deletecollection(colID, ln, callback='') if mtype == "perform_showportalboxes" and content: fin_output += content elif mtype == "perform_showportalboxes" or not mtype: fin_output += perform_showportalboxes(colID, ln, callback='') if mtype == "perform_showsearchfields" and content: fin_output += content elif mtype == "perform_showsearchfields" or not mtype: fin_output += perform_showsearchfields(colID, ln, callback='') if mtype == "perform_showsearchoptions" and content: fin_output += content elif mtype == "perform_showsearchoptions" or not mtype: fin_output += perform_showsearchoptions(colID, ln, callback='') if mtype == "perform_showsortoptions" and content: fin_output += content elif mtype == "perform_showsortoptions" or not mtype: fin_output += perform_showsortoptions(colID, ln, callback='') if mtype == "perform_modifyrankmethods" and content: fin_output += content elif mtype == "perform_modifyrankmethods" or not mtype: fin_output += perform_modifyrankmethods(colID, ln, callback='') if mtype == "perform_showoutputformats" and content: fin_output += content elif mtype == "perform_showoutputformats" or not mtype: fin_output += 
perform_showoutputformats(colID, ln, callback='') return addadminbox("Overview of edit options for collection '%s'" % col_dict[colID], [fin_output]) def perform_checkwebcollstatus(colID, ln, confirm=0, callback='yes'): """Check status of the collection tables with respect to the webcoll cache.""" subtitle = """Webcoll Status   [?]""" % weburl output = "" colID = int(colID) col_dict = dict(get_def_name('', "collection")) output += """
Last updates:
""" collection_table_update_time = "" collection_web_update_time = "" res = run_sql("SHOW TABLE STATUS LIKE 'collection'") if res: collection_table_update_time = re.sub(r'\.00$', '', str(res[0][11])) output += "Collection table last updated: %s
" % collection_table_update_time try: file = open("%s/collections/1/last-updated-ln=en.html" % cachedir) collection_web_update_time = string.strip(file.readline()) collection_web_update_time = re.sub(r'[A-Z ]+$', '', collection_web_update_time) output += "Collection webpage last updated: %s
" % collection_web_update_time file.close() except StandardError, e: pass # reformat collection_web_update_time to the format suitable for comparisons try: collection_web_update_time = time.strftime("%04Y-%02m-%02d %02H:%02M:%02S", time.strptime(collection_web_update_time, "%d %b %Y %H:%M:%S")) except ValueError, e: pass if collection_table_update_time > collection_web_update_time: output += """
Warning: The collections have been modified since Webcoll was last executed; Webcoll must be executed to process the changes.
""" header = ['ID', 'Name', 'Time', 'Status', 'Progress'] actions = [] output += """
Last BibSched tasks:
""" res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='webcoll' and runtime< now() ORDER by runtime") if len(res) > 0: (id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[len(res) - 1] webcoll__update_time = runtime actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')]) else: actions.append(['', 'webcoll', '', '', 'Not executed yet']) res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='bibindex' and runtime< now() ORDER by runtime") if len(res) > 0: (id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[len(res) - 1] actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')]) else: actions.append(['', 'bibindex', '', '', 'Not executed yet']) output += tupletotable(header=header, tuple=actions) output += """
Next scheduled BibSched run:
""" actions = [] res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='webcoll' and runtime > now() ORDER by runtime") webcoll_future = "" if len(res) > 0: (id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[0] webcoll__update_time = runtime actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')]) webcoll_future = "yes" else: actions.append(['', 'webcoll', '', '', 'Not scheduled']) res = run_sql("select id, proc, host, user, runtime, sleeptime, arguments, status, progress from schTASK where proc='bibindex' and runtime > now() ORDER by runtime") bibindex_future = "" if len(res) > 0: (id, proc, host, user, runtime, sleeptime, arguments, status, progress) = res[0] actions.append([id, proc, runtime, (status !="" and status or ''), (progress !="" and progress or '')]) bibindex_future = "yes" else: actions.append(['', 'bibindex', '', '', 'Not scheduled']) output += tupletotable(header=header, tuple=actions) if webcoll_future == "": output += """
Warning: Webcoll is not scheduled for a future run by BibSched; any updates to the collections will not be processed.
""" if bibindex_future == "": output += """
Warning: BibIndex is not scheduled for a future run by BibSched; any updates to the records will not be processed.
""" try: body = [output, extra] except NameError: body = [output] if callback: return perform_index(colID, ln, "perform_checkwebcollstatus", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_checkcollectionstatus(colID, ln, confirm=0, callback='yes'): """Check the configuration of the collections.""" subtitle = """Collections Status   [?]""" % weburl output = "" colID = int(colID) col_dict = dict(get_def_name('', "collection")) collections = run_sql("SELECT id, name, dbquery, restricted FROM collection ORDER BY id") header = ['ID', 'Name', 'Query', 'Subcollections', 'Restricted', 'I18N','Status'] rnk_list = get_def_name('', "rnkMETHOD") actions = [] for (id, name, dbquery, restricted) in collections: reg_sons = len(get_col_tree(id, 'r')) vir_sons = len(get_col_tree(id, 'v')) status = "" langs = run_sql("SELECT ln from collectionname where id_collection=%s" % id) i8n = "" if len(langs) > 0: for lang in langs: i8n += "%s, " % lang else: i8n = """None""" if (reg_sons > 1 and dbquery) or dbquery=="": status = """1:Query""" elif dbquery is None and reg_sons == 1: status = """2:Query""" elif dbquery == "" and reg_sons == 1: status = """3:Query""" if (reg_sons > 1 or vir_sons > 1): subs = """Yes""" else: subs = """No""" if dbquery is None: dbquery = """No""" if restricted == "": restricted = "" if status: status += """,4:Restricted""" else: status += """4:Restricted""" elif restricted is None: restricted = """No""" if status == "": status = """OK""" actions.append([id, """%s""" % (weburl, id, ln, name), dbquery, subs, restricted, i8n, status]) output += tupletotable(header=header, tuple=actions) try: body = [output, extra] except NameError: body = [output] return addadminbox(subtitle, body) if callback: return perform_index(colID, ln, "perform_checkcollectionstatus", addadminbox(subtitle, body)) else: return addadminbox(subtitle, body) def perform_removeoutputformat(colID, ln, fmtID='', callback='yes', confirm=0): """form to remove an 
output format from a collection. colID - the collection id of the current collection. fmtID - the format id. """ subtitle = """Remove output format""" output = "" col_dict = dict(get_def_name('', "collection")) fmt_dict = dict(get_def_name('', "format")) if colID and fmtID: colID = int(colID) fmtID = int(fmtID) if confirm in ["0", 0]: text = """Do you want to remove the output format '%s' from the collection '%s'.""" % (fmt_dict[fmtID], col_dict[colID]) output += createhiddenform(action="removeoutputformat#10.5", text=text, button="Confirm", colID=colID, fmtID=fmtID, confirm=1) elif confirm in ["1", 1]: res = remove_fmt(colID, fmtID) output += write_outcome(res) try: body = [output, extra] except NameError: body = [output] output = "
" + addadminbox(subtitle, body) return perform_showoutputformats(colID, ln, content=output) def get_col_tree(colID, rtype=''): """Returns a presentation of the tree as a list. TODO: Add loop detection colID - startpoint for the tree rtype - get regular or virtual part of the tree""" try: colID = int(colID) stack = [colID] ssize = 0 tree = [(colID, 0, 0, colID, 'r')] while len(stack) > 0: ccolID = stack.pop() if ccolID == colID and rtype: res = run_sql("SELECT id_son, score, type FROM collection_collection WHERE id_dad=%s AND type='%s' ORDER BY score ASC,id_son" % (ccolID,rtype)) else: res = run_sql("SELECT id_son, score, type FROM collection_collection WHERE id_dad=%s ORDER BY score ASC,id_son" % ccolID) ssize += 1 ntree = [] for i in range(0,len(res)): id_son = res[i][0] score = res[i][1] rtype = res[i][2] stack.append(id_son) if i == (len(res) - 1): up = 0 else: up = 1 if i == 0: down = 0 else: down = 1 ntree.insert(0, (id_son, up, down, ccolID, rtype)) tree = tree[0:ssize] + ntree + tree[ssize:len(tree)] return tree except StandardError, e: return () def add_col_dad_son(add_dad, add_son, rtype): """Add a son to a collection (dad) add_dad - add to this collection id add_son - add this collection id rtype - either regular or virtual""" try: res = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s ORDER BY score ASC" % add_dad) highscore = 0 for score in res: if int(score[0]) > highscore: highscore = int(score[0]) highscore += 1 res = run_sql("INSERT INTO collection_collection(id_dad,id_son,score,type) values(%s,%s,%s,'%s')" % (add_dad, add_son, highscore, rtype)) return (1, highscore) except StandardError, e: return (0, e) def compare_on_val(first, second): """Compare the two values""" return cmp(first[1], second[1]) def get_col_fld(colID=-1, type = '', id_field=''): """Returns either all fields (with their fieldvalues) associated with any collection, or only those matching the given colID, type and/or id_field.
colID - collection id type - field type, e.g. 'soo', 'seo' or 'sew' id_field - field id""" sql = "SELECT id_field,id_fieldvalue,type,score,score_fieldvalue FROM collection_field_fieldvalue, field WHERE id_field=field.id" try: if colID > -1: sql += " AND id_collection=%s" % colID if id_field: sql += " AND id_field=%s" % id_field if type: sql += " AND type='%s'" % type sql += " ORDER BY type, score desc, score_fieldvalue desc" res = run_sql(sql) return res except StandardError, e: return "" def get_col_pbx(colID=-1, ln='', position = ''): """Returns either all portalboxes associated with any collection, or only those matching the given colID, language and/or position. colID - collection id ln - language id position - position on the page""" sql = "SELECT id_portalbox, id_collection, ln, score, position, title, body FROM collection_portalbox, portalbox WHERE id_portalbox = portalbox.id" try: if colID > -1: sql += " AND id_collection=%s" % colID if ln: sql += " AND ln='%s'" % ln if position: sql += " AND position='%s'" % position sql += " ORDER BY position, ln, score desc" res = run_sql(sql) return res except StandardError, e: return "" def get_col_fmt(colID=-1): """Returns all formats currently associated with a collection, or for one specific collection colID - the id of the collection""" try: if colID not in [-1, "-1"]: res = run_sql("SELECT id_format, id_collection, code, score FROM collection_format, format WHERE id_format = format.id AND id_collection=%s ORDER BY score desc" % colID) else: res = run_sql("SELECT id_format, id_collection, code, score FROM collection_format, format WHERE id_format = format.id ORDER BY score desc") return res except StandardError, e: return "" def get_col_rnk(colID, ln): """ Returns a list of the rank methods the given collection is attached to colID - id from collection""" try: res1 = dict(run_sql("SELECT id_rnkMETHOD, '' FROM collection_rnkMETHOD WHERE id_collection=%s" % colID)) res2 = get_def_name('', "rnkMETHOD") result = filter(lambda x: res1.has_key(x[0]), res2) return result except StandardError, e: return () def get_pbx():
"""Returns all portalboxes""" try: res = run_sql("SELECT id, title, body FROM portalbox ORDER by title,body") return res except StandardError, e: return "" def get_fld_value(fldvID = ''): """Returns all fieldvalues, or only the one with the given fldvID""" try: sql = "SELECT id, name, value FROM fieldvalue" if fldvID: sql += " WHERE id=%s" % fldvID sql += " ORDER BY name" res = run_sql(sql) return res except StandardError, e: return "" def get_pbx_pos(): """Returns a dictionary of all the possible positions for a portalbox""" position = {} position["rt"] = "Right Top" position["lt"] = "Left Top" position["te"] = "Title Epilog" position["tp"] = "Title Prolog" position["ne"] = "Narrow by coll epilog" position["np"] = "Narrow by coll prolog" return position def get_sort_nametypes(): """Return a dictionary of the field types (sort options, search options, search within)""" type = {} type['soo'] = 'Sort options' type['seo'] = 'Search options' type['sew'] = 'Search within' return type def get_fmt_nametypes(): """Return a list of the various translation names for the output formats""" type = [] type.append(('ln', 'Long name')) return type def get_fld_nametypes(): """Return a list of the various translation names for the fields""" type = [] type.append(('ln', 'Long name')) return type def get_col_nametypes(): """Return a list of the various translation names for the collections""" type = [] type.append(('ln', 'Long name')) return type def find_last(tree, start_son): """Find the previous collection in the tree with the same father as start_son (an index into tree); returns None if there is none""" id_dad = tree[start_son][3] while start_son > 0: start_son -= 1 if tree[start_son][3] == id_dad: return start_son def find_next(tree, start_son): """Find the next collection in the tree with the same father as start_son (an index into tree); returns None if there is none""" id_dad = tree[start_son][3] while start_son < len(tree) - 1: start_son += 1 if tree[start_son][3] == id_dad: return start_son def remove_col_subcol(id_son, id_dad, type): """Remove a collection as a son of another collection in the tree. If the collection isn't used elsewhere in the tree, also remove all
registered sons of the id_son. id_son - collection id of son to remove id_dad - the id of the dad type - the type of the link, regular or virtual""" try: if id_son != id_dad: tree = get_col_tree(id_son) res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s" % (id_son, id_dad)) else: tree = get_col_tree(id_son, type) res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s and type='%s'" % (id_son, id_dad, type)) if not run_sql("SELECT * from collection_collection WHERE id_son=%s and type='%s'" % (id_son, type)): for (id, up, down, dad, rtype) in tree: res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s" % (id, dad)) return (1, "") except StandardError, e: return (0, e) def check_col(add_dad, add_son): """Check if the collection can be placed as a son of the dad without causing loops or duplicating an existing link. add_dad - collection id of the dad add_son - collection id of the son""" try: stack = [add_dad] res = run_sql("SELECT id_dad FROM collection_collection WHERE id_dad=%s AND id_son=%s" % (add_dad,add_son)) if res: raise StandardError while len(stack) > 0: colID = stack.pop() res = run_sql("SELECT id_dad FROM collection_collection WHERE id_son=%s" % colID) for id in res: if int(id[0]) == int(add_son): raise StandardError else: stack.append(id[0]) return (1, "") except StandardError, e: return (0, e) def attach_rnk_col(colID, rnkID): """Attach rank method to collection rnkID - id from rnkMETHOD table colID - id of collection, as in collection table """ try: res = run_sql("INSERT INTO collection_rnkMETHOD(id_collection, id_rnkMETHOD) values (%s,%s)" % (colID, rnkID)) return (1, "") except StandardError, e: return (0, e) def detach_rnk_col(colID, rnkID): """Detach rank method from collection rnkID - id from rnkMETHOD table colID - id of collection, as in collection table """ try: res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_collection=%s AND id_rnkMETHOD=%s" % (colID, rnkID)) return (1, "") except StandardError, e: return (0, e) def switch_col_treescore(col_1, col_2): """Switch the tree scores of two collections. col_1/col_2 - tree entries (id, up, down, id_dad, type) as returned by get_col_tree""" try:
res1 = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s and id_son=%s" % (col_1[3], col_1[0])) res2 = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s and id_son=%s" % (col_2[3], col_2[0])) res = run_sql("UPDATE collection_collection SET score=%s WHERE id_dad=%s and id_son=%s" % (res2[0][0], col_1[3], col_1[0])) res = run_sql("UPDATE collection_collection SET score=%s WHERE id_dad=%s and id_son=%s" % (res1[0][0], col_2[3], col_2[0])) return (1, "") except StandardError, e: return (0, e) def move_col_tree(col_from, col_to, move_to_rtype=''): """Move a collection from one point in the tree to another; the moved collection becomes a son of the endpoint. col_from - move this collection from current point col_to - and set it as a son of this collection. move_to_rtype - either virtual or regular collection""" try: res = run_sql("SELECT score FROM collection_collection WHERE id_dad=%s ORDER BY score asc" % col_to[0]) highscore = 0 for score in res: if int(score[0]) > highscore: highscore = int(score[0]) highscore += 1 if not move_to_rtype: move_to_rtype = col_from[4] res = run_sql("DELETE FROM collection_collection WHERE id_son=%s and id_dad=%s" % (col_from[0], col_from[3])) res = run_sql("INSERT INTO collection_collection(id_dad,id_son,score,type) values(%s,%s,%s,'%s')" % (col_to[0], col_from[0], highscore, move_to_rtype)) return (1, "") except StandardError, e: return (0, e) def remove_pbx(colID, pbxID, ln): """Removes a portalbox from the collection given. colID - the collection the portalbox is connected to pbxID - the portalbox which should be removed from the collection. ln - the language of the portalbox to be removed""" try: res = run_sql("DELETE FROM collection_portalbox WHERE id_collection=%s AND id_portalbox=%s AND ln='%s'" % (colID, pbxID, ln)) return (1, "") except StandardError, e: return (0, e) def remove_fmt(colID,fmtID): """Removes a format from the collection given.
colID - the collection the format is connected to fmtID - the format which should be removed from the collection.""" try: res = run_sql("DELETE FROM collection_format WHERE id_collection=%s AND id_format=%s" % (colID, fmtID)) return (1, "") except StandardError, e: return (0, e) def remove_fld(colID,fldID, fldvID=''): """Removes a field from the collection given. colID - the collection the field is connected to fldID - the field which should be removed from the collection. fldvID - if given, remove only this fieldvalue of the field (the string "None" selects the entry without a fieldvalue)""" try: sql = "DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s" % (colID, fldID) if fldvID: if fldvID != "None": sql += " AND id_fieldvalue=%s" % fldvID else: sql += " AND id_fieldvalue is NULL" res = run_sql(sql) return (1, "") except StandardError, e: return (0, e) def delete_fldv(fldvID): """Deletes all data for the given fieldvalue fldvID - delete all data in the tables associated with fieldvalue and this id""" try: res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_fieldvalue=%s" % fldvID) res = run_sql("DELETE FROM fieldvalue WHERE id=%s" % fldvID) return (1, "") except StandardError, e: return (0, e) def delete_pbx(pbxID): """Deletes all data for the given portalbox pbxID - delete all data in the tables associated with portalbox and this id """ try: res = run_sql("DELETE FROM collection_portalbox WHERE id_portalbox=%s" % pbxID) res = run_sql("DELETE FROM portalbox WHERE id=%s" % pbxID) return (1, "") except StandardError, e: return (0, e) def delete_fmt(fmtID): """Deletes all data for the given format fmtID - delete all data in the tables associated with format and this id """ try: res = run_sql("DELETE FROM format WHERE id=%s" % fmtID) res = run_sql("DELETE FROM collection_format WHERE id_format=%s" % fmtID) res = run_sql("DELETE FROM formatname WHERE id_format=%s" % fmtID) return (1, "") except StandardError, e: return (0, e) def delete_col(colID): """Deletes all data for the given collection colID - delete all data in the tables associated
with collection and this id """ try: res = run_sql("DELETE FROM collection WHERE id=%s" % colID) res = run_sql("DELETE FROM collectionname WHERE id_collection=%s" % colID) res = run_sql("DELETE FROM collection_rnkMETHOD WHERE id_collection=%s" % colID) res = run_sql("DELETE FROM collection_collection WHERE id_dad=%s" % colID) res = run_sql("DELETE FROM collection_collection WHERE id_son=%s" % colID) res = run_sql("DELETE FROM collection_portalbox WHERE id_collection=%s" % colID) res = run_sql("DELETE FROM collection_format WHERE id_collection=%s" % colID) res = run_sql("DELETE FROM collection_field_fieldvalue WHERE id_collection=%s" % colID) return (1, "") except StandardError, e: return (0, e) def add_fmt(code, name, rtype): """Add a new output format. Returns the id of the format. code - the code for the format, max 6 chars. name - the default name for the default language of the format. rtype - the default nametype""" try: res = run_sql("INSERT INTO format (code, name) values ('%s','%s')" % (MySQLdb.escape_string(code), MySQLdb.escape_string(name))) fmtID = run_sql("SELECT id FROM format WHERE code='%s'" % MySQLdb.escape_string(code)) res = run_sql("INSERT INTO formatname(id_format, type, ln, value) VALUES (%s,'%s','%s','%s')" % (fmtID[0][0], rtype, cdslang, MySQLdb.escape_string(name))) return (1, fmtID) except StandardError, e: return (0, e) def update_fldv(fldvID, name, value): """Modify existing fieldvalue fldvID - id of fieldvalue to modify value - the value of the fieldvalue name - the name of the fieldvalue.""" try: res = run_sql("UPDATE fieldvalue set name='%s' where id=%s" % (MySQLdb.escape_string(name), fldvID)) res = run_sql("UPDATE fieldvalue set value='%s' where id=%s" % (MySQLdb.escape_string(value), fldvID)) return (1, "") except StandardError, e: return (0, e) def add_fldv(name, value): """Add a new fieldvalue, returns id of fieldvalue value - the value of the fieldvalue name - the name of the fieldvalue.""" try: res = run_sql("SELECT id FROM 
fieldvalue WHERE name='%s' and value='%s'" % (MySQLdb.escape_string(name), MySQLdb.escape_string(value))) if not res: res = run_sql("INSERT INTO fieldvalue (name, value) values ('%s','%s')" % (MySQLdb.escape_string(name), MySQLdb.escape_string(value))) res = run_sql("SELECT id FROM fieldvalue WHERE name='%s' and value='%s'" % (MySQLdb.escape_string(name), MySQLdb.escape_string(value))) if res: return (1, res[0][0]) else: raise StandardError except StandardError, e: return (0, e) def add_pbx(title, body): """Add a new portalbox, returns id of portalbox title - the title of the portalbox body - the content of the portalbox""" try: res = run_sql("INSERT INTO portalbox (title, body) values ('%s','%s')" % (MySQLdb.escape_string(title), MySQLdb.escape_string(body))) res = run_sql("SELECT id FROM portalbox WHERE title='%s' AND body='%s'" % (MySQLdb.escape_string(title), MySQLdb.escape_string(body))) if res: return (1, res[0][0]) else: raise StandardError except StandardError, e: return (0, e) def add_col(colNAME, dbquery, rest): """Adds a new collection to collection table colNAME - the default name for the collection, saved to collection and collectionname dbquery - query related to the collection rest - name of apache group allowed to access collection""" try: rtype = get_col_nametypes()[0][0] colID = run_sql("SELECT id FROM collection WHERE id=1") if colID: sql = "INSERT INTO collection(name,dbquery,restricted) VALUES('%s'" % MySQLdb.escape_string(colNAME) else: sql = "INSERT INTO collection(id,name,dbquery,restricted) VALUES(1,'%s'" % MySQLdb.escape_string(colNAME) if dbquery: sql += ",'%s'" % MySQLdb.escape_string(dbquery) else: sql += ",null" if rest: sql += ",'%s'" % MySQLdb.escape_string(rest) else: sql += ",null" sql += ")" res = run_sql(sql) colID = run_sql("SELECT id FROM collection WHERE name='%s'" % MySQLdb.escape_string(colNAME)) res = run_sql("INSERT INTO collectionname(id_collection, type, ln, value) VALUES (%s,'%s','%s','%s')" % (colID[0][0], rtype, cdslang, MySQLdb.escape_string(colNAME))) if colID: return (1, colID[0][0]) else: raise StandardError except StandardError, e:
return (0, e) def add_col_pbx(colID, pbxID, ln, position, score=''): """Add a portalbox to the collection. colID - the id of the collection involved pbxID - the portalbox to add ln - which language the portalbox is for score - decides which portalbox is the most important position - position on page the portalbox should appear.""" try: if score: res = run_sql("INSERT INTO collection_portalbox(id_portalbox, id_collection, ln, score, position) values (%s,%s,'%s',%s,'%s')" % (pbxID, colID, ln, score, position)) else: res = run_sql("SELECT score FROM collection_portalbox WHERE id_collection=%s and ln='%s' and position='%s' ORDER BY score desc, ln, position" % (colID, ln, position)) if res: score = int(res[0][0]) else: score = 0 res = run_sql("INSERT INTO collection_portalbox(id_portalbox, id_collection, ln, score, position) values (%s,%s,'%s',%s,'%s')" % (pbxID, colID, ln, (score + 1), position)) return (1, "") except StandardError, e: return (0, e) def add_col_fmt(colID, fmtID, score=''): """Add an output format to the collection. colID - the id of the collection involved fmtID - the id of the format. score - the score of the format, decides sorting, if not given, place the format on top""" try: if score: res = run_sql("INSERT INTO collection_format(id_format, id_collection, score) values (%s,%s,%s)" % (fmtID, colID, score)) else: res = run_sql("SELECT score FROM collection_format WHERE id_collection=%s ORDER BY score desc" % colID) if res: score = int(res[0][0]) else: score = 0 res = run_sql("INSERT INTO collection_format(id_format, id_collection, score) values (%s,%s,%s)" % (fmtID, colID, (score + 1))) return (1, "") except StandardError, e: return (0, e) def add_col_fld(colID, fldID, type, fldvID=''): """Add a sort option, search option or search-within field to the collection. colID - the id of the collection involved fldID - the id of the field. fldvID - the id of the fieldvalue (optional). type - the field type: 'soo', 'seo' or 'sew'.
""" try: if fldvID and fldvID not in [-1, "-1"]: run_sql("DELETE FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s and type='%s' and id_fieldvalue is NULL" % (colID, fldID, type)) res = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s and type='%s' ORDER BY score desc" % (colID, fldID, type)) if res: score = int(res[0][0]) res = run_sql("SELECT score_fieldvalue FROM collection_field_fieldvalue WHERE id_collection=%s AND id_field=%s and type='%s' ORDER BY score_fieldvalue desc" % (colID, fldID, type)) else: res = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s and type='%s' ORDER BY score desc" % (colID, type)) if res: score = int(res[0][0]) + 1 else: score = 1 res = run_sql("SELECT * FROM collection_field_fieldvalue where id_field=%s and id_collection=%s and type='%s' and id_fieldvalue=%s" % (fldID, colID, type, fldvID)) if not res: run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=score_fieldvalue+1 WHERE id_field=%s AND id_collection=%s and type='%s'" % (fldID, colID, type)) res = run_sql("INSERT INTO collection_field_fieldvalue(id_field, id_fieldvalue, id_collection, type, score, score_fieldvalue) values (%s,%s,%s,'%s',%s,%s)" % (fldID, fldvID, colID, type, score, 1)) else: return (0, (1, "Already exists")) else: res = run_sql("SELECT * FROM collection_field_fieldvalue WHERE id_collection=%s AND type='%s' and id_field=%s and id_fieldvalue is NULL" % (colID, type, fldID)) if res: return (0, (1, "Already exists")) else: run_sql("UPDATE collection_field_fieldvalue SET score=score+1") res = run_sql("INSERT INTO collection_field_fieldvalue(id_field, id_collection, type, score,score_fieldvalue) values (%s,%s,'%s',%s, 0)" % (fldID, colID, type, 1)) return (1, "") except StandardError, e: return (0, e) def modify_restricted(colID, rest): """Modify which apache group is
allowed to use the collection. colID - the id of the collection involved rest - the new group, or empty to lift the restriction""" try: sql = "UPDATE collection SET restricted=" if rest: sql += "'%s'" % MySQLdb.escape_string(rest) else: sql += "null" sql += " WHERE id=%s" % colID res = run_sql(sql) return (1, "") except StandardError, e: return (0, e) def modify_dbquery(colID, dbquery): """Modify the dbquery of a collection. colID - the id of the collection involved dbquery - the new dbquery""" try: sql = "UPDATE collection SET dbquery=" if dbquery: sql += "'%s'" % MySQLdb.escape_string(dbquery) else: sql += "null" sql += " WHERE id=%s" % colID res = run_sql(sql) return (1, "") except StandardError, e: return (0, e) def modify_pbx(colID, pbxID, sel_ln, score='', position='', title='', body=''): """Modify a portalbox colID - the id of the collection involved pbxID - the id of the portalbox that should be modified sel_ln - the language of the portalbox that should be modified title - the title body - the content score - if several portalboxes share one position, decides which one should appear on top. position - position on page""" try: if title: res = run_sql("UPDATE portalbox SET title='%s' WHERE id=%s" % (MySQLdb.escape_string(title), pbxID)) if body: res = run_sql("UPDATE portalbox SET body='%s' WHERE id=%s" % (MySQLdb.escape_string(body), pbxID)) if score: res = run_sql("UPDATE collection_portalbox SET score='%s' WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (score, colID, pbxID, sel_ln)) if position: res = run_sql("UPDATE collection_portalbox SET position='%s' WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (position, colID, pbxID, sel_ln)) return (1, "") except Exception, e: return (0, e) def switch_fld_score(colID, id_1, id_2): """Switch the scores of id_1 and id_2 in collection_field_fieldvalue colID - the collection id_1 and id_2 are connected to id_1/id_2 - the ids of the two fields whose scores are switched
""" try: res1 = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s" % (colID, id_1)) res2 = run_sql("SELECT score FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s" % (colID, id_2)) if res1[0][0] == res2[0][0]: return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem.")) else: res = run_sql("UPDATE collection_field_fieldvalue SET score=%s WHERE id_collection=%s and id_field=%s" % (res2[0][0], colID, id_1)) res = run_sql("UPDATE collection_field_fieldvalue SET score=%s WHERE id_collection=%s and id_field=%s" % (res1[0][0], colID, id_2)) return (1, "") except StandardError, e: return (0, e) def switch_fld_value_score(colID, id_1, fldvID_1, fldvID_2): """Switch the scores of two fieldvalues of the same field colID - the collection the field is connected to id_1 - the id of the field fldvID_1/fldvID_2 - the ids of the two fieldvalues whose scores are switched""" try: res1 = run_sql("SELECT score_fieldvalue FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (colID, id_1, fldvID_1)) res2 = run_sql("SELECT score_fieldvalue FROM collection_field_fieldvalue WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (colID, id_1, fldvID_2)) if res1[0][0] == res2[0][0]: return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem.")) else: res = run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=%s WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (res2[0][0], colID, id_1, fldvID_1)) res = run_sql("UPDATE collection_field_fieldvalue SET score_fieldvalue=%s WHERE id_collection=%s and id_field=%s and id_fieldvalue=%s" % (res1[0][0], colID, id_1, fldvID_2)) return (1, "") except Exception, e: return (0, e) def switch_pbx_score(colID, id_1, id_2, sel_ln): """Switch the scores of the portalboxes id_1 and id_2 in collection_portalbox for the given
language. colID - the collection id_1 and id_2 are connected to id_1/id_2 - the ids of the two portalboxes sel_ln - the language of the portalboxes""" try: res1 = run_sql("SELECT score FROM collection_portalbox WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (colID, id_1, sel_ln)) res2 = run_sql("SELECT score FROM collection_portalbox WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (colID, id_2, sel_ln)) if res1[0][0] == res2[0][0]: return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem.")) res = run_sql("UPDATE collection_portalbox SET score=%s WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (res2[0][0], colID, id_1, sel_ln)) res = run_sql("UPDATE collection_portalbox SET score=%s WHERE id_collection=%s and id_portalbox=%s and ln='%s'" % (res1[0][0], colID, id_2, sel_ln)) return (1, "") except Exception, e: return (0, e) def switch_score(colID, id_1, id_2, table): """Switch the scores of id_1 and id_2 in the table given by the argument. colID - collection the id_1 or id_2 is connected to id_1/id_2 - id field from tables like format..portalbox...
table - name of the table""" try: res1 = run_sql("SELECT score FROM collection_%s WHERE id_collection=%s and id_%s=%s" % (table, colID, table, id_1)) res2 = run_sql("SELECT score FROM collection_%s WHERE id_collection=%s and id_%s=%s" % (table, colID, table, id_2)) if res1[0][0] == res2[0][0]: return (0, (1, "Cannot rearrange the selected fields, either rearrange by name or use the mySQL client to fix the problem.")) res = run_sql("UPDATE collection_%s SET score=%s WHERE id_collection=%s and id_%s=%s" % (table, res2[0][0], colID, table, id_1)) res = run_sql("UPDATE collection_%s SET score=%s WHERE id_collection=%s and id_%s=%s" % (table, res1[0][0], colID, table, id_2)) return (1, "") except Exception, e: return (0, e) diff --git a/modules/websearch/web/search.py b/modules/websearch/web/search.py index 243dc7e04..7c3d1f819 100644 --- a/modules/websearch/web/search.py +++ b/modules/websearch/web/search.py @@ -1,119 +1,120 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDSware Search Engine Web Interface.""" import sys +from mod_python import apache + from cdsware.config import weburl,cdsname from cdsware import search_engine -from mod_python import apache from cdsware.webuser import getUid, page_not_authorized __version__ = "$Id$" def index(req, cc=cdsname, c=None, p="", f="", rg="10", sf="", so="d", sp="", rm="", of="hb", ot="", as="0", p1="", f1="", m1="", op1="", p2="", f2="", m2="", op2="", p3="", f3="", m3="", sc="0", jrec="0", recid="-1", recidb="-1", sysno="", id="-1", idb="-1", sysnb="", action="", d1y="0", d1m="0", d1d="0", d2y="0", d2m="0", d2d="0", verbose="0", ap="1", ln="en"): """Main entry point to WebSearch search engine. See the docstring of search_engine.perform_request_search for the detailed explanation of arguments. """ uid = getUid(req) if uid == -1: return page_not_authorized(req, "../search.py") need_authentication = 0 # check c if type(c) is list: for coll in c: if search_engine.coll_restricted_p(coll): need_authentication = 1 else: pass elif search_engine.coll_restricted_p(c): need_authentication = 1 # check cc if type(cc) is list: for coll in cc: if search_engine.coll_restricted_p(coll): need_authentication = 1 else: pass elif search_engine.coll_restricted_p(cc): need_authentication = 1 # is authentication needed? 
if need_authentication: req.err_headers_out.add("Location", "%s/search.py/authenticate?%s" % (weburl, req.args)) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY else: return search_engine.perform_request_search(req, cc, c, p, f, rg, sf, so, sp, rm, of, ot, as, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, jrec, recid, recidb, sysno, id, idb, sysnb, action, d1y, d1m, d1d, d2y, d2m, d2d, verbose, ap, ln) def authenticate(req, cc=cdsname, c=None, p="", f="", rg="10", sf="", so="d", sp="", rm="", of="hb", ot="", as="0", p1="", f1="", m1="", op1="", p2="", f2="", m2="", op2="", p3="", f3="", m3="", sc="0", jrec="0", recid="-1", recidb="-1", sysno="", id="-1", idb="-1", sysnb="", action="", d1y="0", d1m="0", d1d="0", d2y="0", d2m="0", d2d="0", verbose="0", ap="1", ln="en"): """Authenticate the user before launching the search. See the docstring of search_engine.perform_request_search for the detailed explanation of arguments. """ __auth_realm__ = "restricted collection" def __auth__(req, user, password): """Is user authorized to proceed with the request?""" import sys from cdsware.config import cdsname from cdsware.webuser import auth_apache_user_collection_p from cgi import parse_qs # let's parse collection list from given URL request: parsed_args = parse_qs(req.args) l_cc = parsed_args.get('cc', [cdsname]) l_c = parsed_args.get('c', []) # let's check user authentication for each collection: for coll in l_c + l_cc: if not auth_apache_user_collection_p(user, password, coll): return 0 return 1 return search_engine.perform_request_search(req, cc, c, p, f, rg, sf, so, sp, rm, of, ot, as, p1, f1, m1, op1, p2, f2, m2, op2, p3, f3, m3, sc, jrec, recid, recidb, sysno, id, idb, sysnb, action, d1y, d1m, d1d, d2y, d2m, d2d, verbose, ap, ln) def cache(req, action="show"): """Manipulates the search engine cache.""" return search_engine.perform_request_cache(req, action) def log(req, date=""): """Display search log information for given date.""" return 
search_engine.perform_request_log(req, date) def test(req): import cgi req.content_type = "text/plain" req.send_http_header() args = cgi.parse_qs(req.args) req.write("BEG\n") req.write("%s\n" % args.get('c')) req.write("END\n") return "\n" diff --git a/modules/websession/bin/sessiongc.wml b/modules/websession/bin/sessiongc.wml index 375dfa754..893636136 100644 --- a/modules/websession/bin/sessiongc.wml +++ b/modules/websession/bin/sessiongc.wml @@ -1,393 +1,388 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## read config variables: #include "config.wml" #include "configbis.wml" #include "cdswmllib.wml" ## start Python: #! # -*- coding: utf-8 -*- ## $Id$ ## DO NOT EDIT THIS FILE! IT WAS AUTOMATICALLY GENERATED FROM CDSware WML SOURCES. """ Guest user sessions garbage collector. To be run via cron once per day. 
(say) """ __version__ = "<: print generate_pretty_version_string('$Id$'); :>" -## fill config variables: -pylibdir = "/python" - ## okay, rest of the Python code goes below ####### import sys try: - sys.path.append('%s' % pylibdir) - from cdsware.dbquery import run_sql import getopt import time except ImportError, e: print "Error: %s" % (e, ) sys.exit(1) # configure variables cfg_mysql_argumentlist_size = 100 def guest_user_garbage_collector(verbose=1): """Session Garbage Collector program flow/tasks: 1: delete expired sessions 1b:delete guest users without session 2: delete queries not attached to any user 3: delete baskets not attached to any user 4: delete alerts not attached to any user verbose - level of program output. 0 - nothing 1 - default 9 - max, debug""" # dictionary used to keep track of number of deleted entries delcount = {'session': 0, 'user': 0, 'user_query': 0, 'query': 0, 'basket': 0, 'user_basket': 0, 'user_query_basket': 0} if verbose: print """\nGUEST USER SESSIONS GARBAGE COLLECTOR STARTED: %s.\n""" % (time.ctime(), ) # 1 - DELETE EXPIRED SESSIONS if verbose: print "- deleting expired sessions" timelimit = time.time() if verbose >= 9: print """ DELETE FROM session WHERE session_expiry < %d \n""" % (timelimit, ) delcount['session'] += run_sql("""DELETE FROM session WHERE session_expiry < %s """ % (timelimit, )) # 1b - DELETE GUEST USERS WITHOUT SESSION if verbose: print "- deleting guest users without session" # get uids if verbose >= 9: print """ SELECT u.id\n FROM user AS u LEFT JOIN session AS s\n ON u.id = s.uid\n WHERE s.uid IS NULL AND u.email = ''""" result = run_sql("""SELECT u.id FROM user AS u LEFT JOIN session AS s ON u.id = s.uid WHERE s.uid IS NULL AND u.email = ''""") if verbose >= 9: print result if result: # work on slices of result list in case of big result for i in range(0, len(result), cfg_mysql_argumentlist_size): # create string of uids uidstr = '' for (id_user, ) in result[i:i+cfg_mysql_argumentlist_size]: if uidstr: 
uidstr += ',' uidstr += "%s" % (id_user, ) # delete users if verbose >= 9: print """ DELETE FROM user WHERE id IN (TRAVERSE LAST RESULT) AND email = '' \n""" delcount['user'] += run_sql("""DELETE FROM user WHERE id IN (%s) AND email = ''""" % (uidstr, )) # 2 - DELETE QUERIES NOT ATTACHED TO ANY USER # first step, delete from user_query if verbose: print "- deleting user_queries referencing non-existent users" # find user_queries referencing non-existent users if verbose >= 9: print """ SELECT DISTINCT uq.id_user\n FROM user_query AS uq LEFT JOIN user AS u\n ON uq.id_user = u.id\n WHERE u.id IS NULL""" result = run_sql("""SELECT DISTINCT uq.id_user FROM user_query AS uq LEFT JOIN user AS u ON uq.id_user = u.id WHERE u.id IS NULL""") if verbose >= 9: print result # delete from user_query one by one if verbose >= 9: print """ DELETE FROM user_query WHERE id_user = 'TRAVERSE LAST RESULT' \n""" for (id_user, ) in result: delcount['user_query'] += run_sql("""DELETE FROM user_query WHERE id_user = %s""" % (id_user, )) # delete the actual queries if verbose: print "- deleting queries not attached to any user" # select queries that must be deleted if verbose >= 9: print """ SELECT DISTINCT q.id\n FROM query AS q LEFT JOIN user_query AS uq\n ON uq.id_query = q.id\n WHERE uq.id_query IS NULL AND\n q.type <> 'p' """ result = run_sql("""SELECT DISTINCT q.id FROM query AS q LEFT JOIN user_query AS uq ON uq.id_query = q.id WHERE uq.id_query IS NULL AND q.type <> 'p'""") if verbose >= 9: print result # delete queries one by one if verbose >= 9: print """ DELETE FROM query WHERE id = 'TRAVERSE LAST RESULT' \n""" for (id_query, ) in result: delcount['query'] += run_sql("""DELETE FROM query WHERE id = %s""" % (id_query, )) # 3 - DELETE BASKETS NOT OWNED BY ANY USER if verbose: print "- deleting baskets not owned by any user" # select basket ids if verbose >= 9: print """ SELECT ub.id_basket\n FROM user_basket AS ub LEFT JOIN user AS u\n ON u.id = ub.id_user\n WHERE u.id IS NULL""" result
= run_sql("""SELECT ub.id_basket FROM user_basket AS ub LEFT JOIN user AS u ON u.id = ub.id_user WHERE u.id IS NULL""") if verbose >= 9: print result # delete from user_basket and basket one by one if verbose >= 9: print """ DELETE FROM user_basket WHERE id_basket = 'TRAVERSE LAST RESULT' """ print """ DELETE FROM basket WHERE id = 'TRAVERSE LAST RESULT' \n""" for (id_basket, ) in result: delcount['user_basket'] += run_sql("""DELETE FROM user_basket WHERE id_basket = %s""" % (id_basket, )) delcount['basket'] += run_sql("""DELETE FROM basket WHERE id = %s""" % (id_basket, )) # 4 - DELETE ALERTS NOT OWNED BY ANY USER if verbose: print '- deleting alerts not owned by any user' # select user ids in uqb that reference non-existent users if verbose >= 9: print """SELECT DISTINCT uqb.id_user FROM user_query_basket AS uqb LEFT JOIN user AS u ON uqb.id_user = u.id WHERE u.id IS NULL""" result = run_sql("""SELECT DISTINCT uqb.id_user FROM user_query_basket AS uqb LEFT JOIN user AS u ON uqb.id_user = u.id WHERE u.id IS NULL""") if verbose >= 9: print result # delete all these entries for (id_user, ) in result: if verbose >= 9: print """DELETE FROM user_query_basket WHERE id_user = 'TRAVERSE LAST RESULT """ delcount['user_query_basket'] += run_sql("""DELETE FROM user_query_basket WHERE id_user = %s """ % (id_user, )) # PRINT STATISTICS if verbose: print """\nSTATISTICS - DELETED DATA: """ print """- %7s sessions.""" % (delcount['session'], ) print """- %7s users.""" % (delcount['user'], ) print """- %7s user_queries.""" % (delcount['user_query'], ) print """- %7s queries.""" % (delcount['query'], ) print """- %7s baskets.""" % (delcount['basket'], ) print """- %7s user_baskets.""" % (delcount['user_basket'], ) print """- %7s user_query_baskets.""" % (delcount['user_query_basket'], ) print """\nGUEST USER SESSIONS GARBAGE COLLECTOR FINISHED: %s. 
""" % (time.ctime(), ) print """\nEXECUTION LASTED %.2f SECONDS.\n""" % (time.time() - timelimit, ) return def usage(exitcode=1, msg=""): """Prints usage info.""" if msg: sys.stderr.write("Error: %s.\n" % msg) sys.stderr.write("Usage: %s [options]\n" % sys.argv[0]) sys.stderr.write("General options:\n") sys.stderr.write(" -h, --help \t\t Print this help.\n") sys.stderr.write(" -V, --version \t\t Print version information.\n") sys.stderr.write(" -v, --verbose=LEVEL \t Verbose level (0=min, 1=default, 9=max).\n") sys.exit(exitcode) def test_insertdata(): """insert testdata for the garbage collector. something will be deleted, other data kept. test_checkdata() checks if the remains are correct.""" test_deletedata_nooutput() print 'insert into session 6' for (key, uid) in [('23A', 2000), ('24B', 2100), ('25C', 2200), ('26D', 2300)]: run_sql("""INSERT INTO session (session_key, session_expiry, uid) values ('%s', %d, %s) """ % (key, time.time(), uid)) for (key, uid) in [('27E', 2400), ('28F', 2500)]: run_sql("""INSERT INTO session (session_key, session_expiry, uid) values ('%s', %d, %s) """ % (key, time.time()+20000, uid)) print 'insert into user 6' for id in range(2000, 2600, 100): run_sql("""INSERT INTO user (id, email) values (%s, '') """ % (id, )) print 'insert into user_query 6' for (id_user, id_query) in [(2000, 155), (2100, 231), (2200, 155), (2300, 574), (2400, 155), (2500, 988)]: run_sql("""INSERT INTO user_query (id_user, id_query) values (%s, %s) """ % (id_user, id_query)) print 'insert into query 4' for (id, urlargs) in [(155, 'p=cern'), (231, 'p=muon'), (574, 'p=physics'), (988, 'cc=Atlantis+Institute+of+Science&as=0&p=')]: run_sql("""INSERT INTO query (id, type, urlargs) values (%s, 'r', '%s') """ % (id, urlargs)) print 'insert into basket 4' for (id, name) in [(6, 'general'), (7, 'physics'), (8, 'cern'), (9, 'diverse')]: run_sql("""INSERT INTO basket (id, name, public) values (%s, '%s', 'n')""" % (id, name)) print 'insert into user_basket 4' for (id_user, 
id_basket) in [(2000, 6), (2200, 7), (2200, 8), (2500, 9)]: run_sql("""INSERT INTO user_basket (id_user, id_basket) values (%s, %s) """ % (id_user, id_basket)) print 'insert into user_query_basket 2' for (id_user, id_query, id_basket) in [(2200, 155, 6), (2500, 988, 9)]: run_sql("""INSERT INTO user_query_basket (id_user, id_query, id_basket) values (%s, %s, %s) """ % (id_user, id_query, id_basket)) def test_deletedata(): """deletes all the testdata inserted in the insert function. outputs how many entries are deleted""" print 'delete from session', print run_sql("DELETE FROM session WHERE uid IN (2000,2100,2200,2300,2400,2500) ") print 'delete from user', print run_sql("DELETE FROM user WHERE id IN (2000,2100,2200,2300,2400,2500) ") print 'delete from user_query', print run_sql("DELETE FROM user_query WHERE id_user IN (2000,2100,2200,2300,2400,2500) OR id_query IN (155,231,574,988) ") print 'delete from query', print run_sql("DELETE FROM query WHERE id IN (155,231,574,988) ") print 'delete from basket', print run_sql("DELETE FROM basket WHERE id IN (6,7,8,9) ") print 'delete from user_basket', print run_sql("DELETE FROM user_basket WHERE id_basket IN (6,7,8,9) OR id_user IN (2000, 2200, 2500) ") print 'delete from user_query_basket', print run_sql("DELETE FROM user_query_basket WHERE id_user IN (2200, 2500) ") def test_deletedata_nooutput(): """same as test_deletedata without output.""" run_sql("DELETE FROM session WHERE uid IN (2000,2100,2200,2300,2400,2500) ") run_sql("DELETE FROM user WHERE id IN (2000,2100,2200,2300,2400,2500) ") run_sql("DELETE FROM user_query WHERE id_user IN (2000,2100,2200,2300,2400,2500) OR id_query IN (155,231,574,988) ") run_sql("DELETE FROM query WHERE id IN (155,231,574,988) ") run_sql("DELETE FROM basket WHERE id IN (6,7,8,9) ") run_sql("DELETE FROM user_basket WHERE id_basket IN (6,7,8,9) OR id_user IN (2000, 2200, 2500) ") run_sql("DELETE FROM user_query_basket WHERE id_user IN (2200, 2500) ") def test_showdata(): print '\nshow test 
data:' print '\n- select * from session:' for r in run_sql("SELECT * FROM session WHERE session_key IN ('23A','24B','25C','26D','27E','28F') "): print r print '\n- select * from user:' for r in run_sql("SELECT * FROM user WHERE email = '' AND id IN (2000,2100,2200,2300,2400,2500) "): print r print '\n- select * from user_query:' for r in run_sql("SELECT * FROM user_query WHERE id_user IN (2000,2100,2200,2300,2400,2500) "): print r print '\n- select * from query:' for r in run_sql("SELECT * FROM query WHERE id IN (155,231,574,988) "): print r print '\n- select * from basket:' for r in run_sql("SELECT * FROM basket WHERE id IN (6,7,8,9) "): print r print '\n- select * from user_basket:' for r in run_sql("SELECT * FROM user_basket WHERE id_basket IN (6,7,8,9)"): print r print '\n- select * from user_query_basket:' for r in run_sql("SELECT * FROM user_query_basket WHERE id_basket IN (6,7,8,9) "): print r def test_checkdata(): """checks whether the data in the database is correct after the garbage collector has run.
test_insertdata must have been run followed by the gc for this to be true.""" result = run_sql("SELECT DISTINCT session_key FROM session WHERE session_key IN ('23A','24B','25C','26D','27E','28F') ") if len(result) != 2: return 0 for r in [('27E', ), ('28F', )]: if r not in result: return 0 result = run_sql("SELECT id FROM user WHERE email = '' AND id IN (2000,2100,2200,2300,2400,2500) ") if len(result) != 2: return 0 for r in [(2400, ), (2500, )]: if r not in result: return 0 result = run_sql("SELECT DISTINCT id_user FROM user_query WHERE id_user IN (2000,2100,2200,2300,2400,2500) ") if len(result) != 2: return 0 for r in [(2400, ), (2500, )]: if r not in result: return 0 result = run_sql("SELECT id FROM query WHERE id IN (155,231,574,988) ") if len(result) != 2: return 0 for r in [(155, ), (988, )]: if r not in result: return 0 result = run_sql("SELECT id FROM basket WHERE id IN (6,7,8,9) ") if len(result) != 1: return 0 for r in [(9, )]: if r not in result: return 0 result = run_sql("SELECT id_user, id_basket FROM user_basket WHERE id_basket IN (6,7,8,9)") if len(result) != 1: return 0 for r in [(2500, 9)]: if r not in result: return 0 result = run_sql("SELECT id_user, id_query, id_basket FROM user_query_basket WHERE id_basket IN (6,7,8,9) ") if len(result) != 1: return 0 for r in [(2500, 988, 9)]: if r not in result: return 0 return 1 def test_runtest_guest_user_garbage_collector(): """a test to see if the garbage collector works correctly.""" test_insertdata() test_showdata() guest_user_garbage_collector(verbose=9) test_showdata() if test_checkdata(): print '\n\nGARBAGE COLLECTOR CLEANED UP THE CORRECT DATA \n\n' else: print '\n\nERROR ERROR ERROR - WRONG DATA CLEANED - ERROR ERROR ERROR \n\n' test_deletedata_nooutput() return def main(): """CLI to the session garbage collector. 
Gets arguments from sys.argv and dispatch to guest_user_garbage_collector""" options = {} options['verbose'] = 1 try: opts, args = getopt.getopt(sys.argv[1:], "hVv:", ["help", "version", "verbose="]) except getopt.GetoptError, e: usage(e) try: for opt in opts: if opt[0] in ['-h', '--help']: usage(0) elif opt[0] in ['-V', '--version']: print __version__ sys.exit(0) elif opt[0] in ['-v', '--verbose']: options['verbose'] = int(opt[1]) except StandardError, e: usage(e) guest_user_garbage_collector(**options) return if __name__ == '__main__': main() diff --git a/modules/websession/lib/webaccount.py b/modules/websession/lib/webaccount.py index 8b96848c1..cddb479cd 100644 --- a/modules/websession/lib/webaccount.py +++ b/modules/websession/lib/webaccount.py @@ -1,244 +1,245 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
import sys import string import cgi -from config import * -from webpage import page -from dbquery import run_sql -from webuser import getUid,isGuestUser, get_user_preferences, set_user_preferences -from access_control_admin import acc_findUserRoleActions -from access_control_config import CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS, CFG_EXTERNAL_AUTHENTICATION -from messages import gettext_set_language +from cdsware.config import * +from cdsware.webpage import page +from cdsware.dbquery import run_sql +from cdsware.webuser import getUid,isGuestUser, get_user_preferences, set_user_preferences +from cdsware.access_control_admin import acc_findUserRoleActions +from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS, CFG_EXTERNAL_AUTHENTICATION -import template -websession_templates = template.load('websession') +from cdsware.messages import gettext_set_language + +import cdsware.template +websession_templates = cdsware.template.load('websession') imagesurl = "%s/img" % weburl # perform_info(): display the main features of CDS personalize def perform_info(req, ln): out = "" uid = getUid(req) return websession_templates.tmpl_account_info( ln = ln, uid = uid, guest = isGuestUser(uid), cfg_cern_site = cfg_cern_site, ); def perform_youradminactivities(uid, ln): """Return text for the `Your Admin Activities' box. Analyze whether user UID has some admin roles, and if yes, then print suitable links for the actions he can do. 
If he's not admin, print a simple non-authorized message.""" your_role_actions = acc_findUserRoleActions(uid) your_roles = [] your_admin_activities = [] guest = isGuestUser(uid) for (role, action) in your_role_actions: if role not in your_roles: your_roles.append(role) if action not in your_admin_activities: your_admin_activities.append(action) if "superadmin" in your_roles: for action in ["cfgbibformat", "cfgbibharvest", "cfgbibrank", "cfgbibindex", "cfgwebaccess", "cfgwebsearch", "cfgwebsubmit"]: if action not in your_admin_activities: your_admin_activities.append(action) return websession_templates.tmpl_account_adminactivities( ln = ln, uid = uid, guest = guest, roles = your_roles, activities = your_admin_activities, weburl = weburl, ) # perform_display_account(): display a dynamic page that shows the user's account def perform_display_account(req,data,bask,aler,sear,msgs,ln): # load the right message language _ = gettext_set_language(ln) uid = getUid(req) #your account if isGuestUser(uid): user = "guest" accBody = _("""You are logged in as guest. You may want to login as a regular user""") + "

" bask=aler=msgs= _("""The guest users need to register first""") sear= _("No queries found") else: user = data[0] accBody = websession_templates.tmpl_account_body( ln = ln, user = user, ) return websession_templates.tmpl_account_page( ln = ln, weburl = weburl, accBody = accBody, baskets = bask, alerts = aler, searches = sear, messages = msgs, administrative = perform_youradminactivities(uid, ln) ) # template_account(): template for printing each of the options in the user's account def template_account(title, body, ln): return websession_templates.tmpl_account_template( ln = ln, title = title, body = body ) # warning_guest_user(): return an alert message showing that the user is a guest user and should log into the system def warning_guest_user(type, ln=cdslang): # load the right message language _ = gettext_set_language(ln) return websession_templates.tmpl_warning_guest_user( ln = ln, type = type, ) ## perform_delete(): delete the user's account (not implemented yet) def perform_delete(ln): return websession_templates.tmpl_account_delete(ln = ln) ## perform_set(email,password): edit your account parameters, email and password.
def perform_set(email,password, ln): try: uid = run_sql("SELECT id FROM user where email=%s", (email,)) uid = uid[0][0] except: uid = 0 CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS prefs = get_user_preferences(uid) if CFG_EXTERNAL_AUTHENTICATION.has_key(prefs['login_method']) and CFG_EXTERNAL_AUTHENTICATION[prefs['login_method']][1] != True: CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL = 3 out = websession_templates.tmpl_user_preferences( ln = ln, email = email, email_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL >= 2), password = password, password_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS_LOCAL >= 3) ) if len(CFG_EXTERNAL_AUTHENTICATION) >= 1: try: uid = run_sql("SELECT id FROM user where email=%s", (email,)) uid = uid[0][0] except: uid = 0 prefs = get_user_preferences(uid) current_login_method = prefs['login_method'] methods = CFG_EXTERNAL_AUTHENTICATION.keys() methods.sort() out += websession_templates.tmpl_user_external_auth( ln = ln, methods = methods, current = current_login_method, method_disabled = (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 4) ) return out ## create_register_page_box(): register a new account def create_register_page_box(referer='', ln=cdslang): return websession_templates.tmpl_register_page( referer = referer, ln = ln, level = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS, supportemail = supportemail, cdsname = cdsname ) ## create_login_page_box(): ask for the user's email and password, for login into the system def create_login_page_box(referer='', ln=cdslang): internal = None for system in CFG_EXTERNAL_AUTHENTICATION.keys(): if not CFG_EXTERNAL_AUTHENTICATION[system][0]: internal = system break register_available = CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS <= 1 and internal methods = CFG_EXTERNAL_AUTHENTICATION.keys() methods.sort() selected = '' for method in methods: if CFG_EXTERNAL_AUTHENTICATION[method][1] == True: selected = method break return websession_templates.tmpl_login_form( ln = ln, referer = referer, internal = 
internal, register_available = register_available, methods = methods, selected_method = selected, supportemail = supportemail, ) # perform_logout(): display the message that the user is no longer authorized def perform_logout(req, ln): return websession_templates.tmpl_account_logout(ln = ln) # perform_lost(): ask the user for his email address, in order to send him the lost password def perform_lost(ln): return websession_templates.tmpl_lost_password_form( ln = ln, msg = websession_templates.tmpl_lost_password_message(ln = ln, supportemail = supportemail), ) # perform_emailSent(email): confirm that the password has been emailed to 'email' address def perform_emailSent(email, ln): return websession_templates.tmpl_account_emailSent(ln = ln, email = email) # perform_emailMessage(): display an error message when the email entered is not correct, and suggest trying again def perform_emailMessage(eMsg, ln): return websession_templates.tmpl_account_emailMessage( ln = ln, msg = eMsg ) # perform_back(): template for returning to a previous page, used for login, register and settings def perform_back(mess,act,linkname='', ln='en'): if not linkname: linkname = act return websession_templates.tmpl_back_form( ln = ln, message = mess, act = act, link = linkname, ) diff --git a/modules/websession/lib/websession.py b/modules/websession/lib/websession.py index 3401d5da1..41518b492 100644 --- a/modules/websession/lib/websession.py +++ b/modules/websession/lib/websession.py @@ -1,151 +1,151 @@ ## $Id$ ## CDSware Web Session utilities, implementing persistence. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version.
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ Classes needed in CDSware as a complement to session, adding persistence to sessions by using a MySQL table. Consists of the following classes: - SessionNotInDb: Exception to be raised when a session doesn't exist - pSession(Session): Specialisation of the class Session which adds persistence to sessions - pSessionMapping: Implements only the necessary methods to make it work with the session manager """ import cPickle import time -from dbquery import run_sql, blob_to_string -import session -from session import Session from UserDict import UserDict +from cdsware.dbquery import run_sql, blob_to_string +from cdsware.session import Session + class SessionNotInDb(Exception): """Exception to be raised when a requested session doesn't exist in the DB """ pass class pSession(Session): """Specialisation of the class Session which adds persistence to sessions by using a MySQL table (it pickles itself into the corresponding row of the table). The class provides methods to save and retrieve an instance to/from the DB and to access the main session attributes (uid).
The table in the DB must have the following structure: session_key - text - unique uid - int session_object - blob Attributes: __tableName -- (string) name of the table in the DB where the sessions are going to be stored __uid -- (int) user identifier who initiated the session __dirty -- (bool) flag indicating whether the session has been modified (and therefore needs to be saved back to the DB) or not """ __tableName = "session" __ExpireTime = 1050043127 def __init__( self, request, id, uid=-1 ): Session.__init__( self, request, id ) self.__uid = uid self.__dirty = 0 def is_dirty( self ): return self.__dirty def getUid( self ): return self.__uid def setUid( self, newUid ): self.__uid = int(newUid) self.__dirty = 1 def retrieve( cls, sessionId ): """method for retrieving a session from the DB for the given id. If the id has no corresponding session an exception is raised """ sql = "select session_object from %s where session_key='%s'"%(cls.__tableName, sessionId) res = run_sql(sql) if len(res)==0: raise SessionNotInDb("Session %s doesn't exist"%sessionId) s = cPickle.loads(blob_to_string(res[0][0])) return s retrieve = classmethod( retrieve ) def __getRepr( self ): return cPickle.dumps( self ) def save( self ): """method that tries to insert the session as NEW in the DB. If this fails (giving an integrity error) it means the session already exists there and it must be updated, so it performs the corresponding SQL update """ repr = self.__getRepr().replace("'", "\\\'") repr = repr.replace('"', '\\\"') try: sql = 'INSERT INTO %s (session_key, session_expiry, session_object, uid) values ("%s","%s","%s","%s")' % \ (self.__class__.__tableName, self.id, self.get_access_time()+60*60*24*2, repr, int(self.getUid())) res = run_sql(sql) # FIXME. WARNING!! it should be "except IntegrityError, e:" but this will # create a dependency on package MySQL. 
I'll leave it like this for # the time being but this can lead to Exception masking except Exception, e: sql = 'UPDATE %s SET uid=%s, session_expiry=%s, session_object="%s" WHERE session_key="%s"' % \ (self.__class__.__tableName, int(self.getUid()), self.get_access_time()+60*60*24*2, repr, self.id) res = run_sql(sql) self.__dirty=0 class pSessionMapping(UserDict): """Only the necessary methods to make it work with the session manager have been implemented. """ def __includeItemFromDB(self, key): if key not in self.data.keys(): try: s = pSession.retrieve( key ) self.data[key] = s except SessionNotInDb, e: pass def __setitem__(self, key, v): """when a session is added or updated in the dictionary it means it must be updated within the DB """ v.save() UserDict.__setitem__(self, key, v) def __getitem__(self, key): """in order not to have to load all the sessions in the dictionary (normally only a single session is needed on each web request) when a session is requested the object looks to see if it is in the dictionary (memory) and if not it tries to retrieve it from the DB, puts it in the dictionary and returns the requested item. If the session doesn't exist a normal KeyError exception is raised """ self.__includeItemFromDB( key ) return UserDict.__getitem__(self, key) def has_key(self, key): """same as for "__getitem__": it checks whether the session exist in the local dictionary or in the DB. """ self.__includeItemFromDB( key ) return UserDict.has_key( self, key ) diff --git a/modules/websession/lib/websession_templates.py b/modules/websession/lib/websession_templates.py index c1a332227..705ae1258 100644 --- a/modules/websession/lib/websession_templates.py +++ b/modules/websession/lib/websession_templates.py @@ -1,796 +1,796 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. import urllib import time import cgi import gettext import string import locale -from config import * +from cdsware.config import * from cdsware.messages import gettext_set_language class Template: def tmpl_lost_password_message(self, ln, supportemail): """ Defines the text that will be displayed on the 'lost password' page Parameters: - 'ln' *string* - The language to display the interface in - 'supportemail' *string* - The email of the support team """ # load the right message language _ = gettext_set_language(ln) return _("If you have lost password for your CERN Document Server internal account, then please enter your email address below and the lost password will be emailed to you.") +\ "

" +\ _("Note that if you have been using an external login system (such as CERN NICE), then we cannot do anything and you have to ask there.") +\ _("Alternatively, you can ask %s to change your login system from external to internal.") % ("""%(email)s""" % { 'email' : supportemail }) +\ "

" def tmpl_back_form(self, ln, message, act, link): """ A standard one-message-go-back-link page. Parameters: - 'ln' *string* - The language to display the interface in - 'message' *string* - The message to display - 'act' *string* - The action to accomplish when going back - 'link' *string* - The link text """ out = """
%(message)s %(link)s
"""% { 'message' : message, 'act' : act, 'link' : link } return out def tmpl_user_preferences(self, ln, email, email_disabled, password, password_disabled): """ Displays a form for the user to change his email/password. Parameters: - 'ln' *string* - The language to display the interface in - 'email' *string* - The email of the user - 'email_disabled' *boolean* - Whether the email field is disabled (i.e. the user may not edit his email) - 'password' *string* - The password of the user - 'password_disabled' *boolean* - Whether the password fields are disabled (i.e. the user may not edit his password) """ # load the right message language _ = gettext_set_language(ln) out = """

Edit parameters

%(change_user_pass)s

%(new_email)s:
(%(mandatory)s)

%(example)s: johndoe@example.com
%(new_password)s:
(%(optional)s)

%(note)s: %(password_note)s
%(retype_password)s:
   
""" % { 'change_user_pass' : _("If you want to change your email address or password, please set new values in the form below."), 'new_email' : _("New email address"), 'mandatory' : _("mandatory"), 'example' : _("Example"), 'new_password' : _("New password"), 'optional' : _("optional"), 'note' : _("Note"), 'password_note' : _("The password phrase may contain punctuation, spaces, etc."), 'retype_password' : _("Retype password"), 'set_values' : _("Set new values"), 'email' : email, 'email_disabled' : email_disabled and "disabled" or "", 'password' : password, 'password_disabled' : password_disabled and "disabled" or "", } return out def tmpl_user_external_auth(self, ln, methods, current, method_disabled): """ Displays a form for the user to change his authentication method. Parameters: - 'ln' *string* - The language to display the interface in - 'methods' *array* - The methods of authentication - 'method_disabled' *boolean* - Whether the method selection is disabled (i.e. the user may not change it) - 'current' *string* - The currently selected method """ # load the right message language _ = gettext_set_language(ln) out = """
%(edit_method)s

%(explain_method)s:

%(select_method)s: """ % { 'edit_method' : _("Edit login method"), 'explain_method' : _("Please select which login method you would like to use to authenticate yourself"), 'select_method' : _("Select method"), } for system in methods: out += """%(system)s
""" % { 'system' : system, 'disabled' : method_disabled and "disabled" or "", 'selected' : current == system and "selected" or "", } out += """
""" % { 'select_method' : _("Select method"), } return out def tmpl_lost_password_form(self, ln, msg): """ Displays a form for the user to ask for his password sent by email. Parameters: - 'ln' *string* - The language to display the interface in - 'msg' *string* - Explicative message on top of the form. """ # load the right message language _ = gettext_set_language(ln) out = """
%(msg)s
%(email)s:
""" % { 'msg' : msg, 'email' : _("Email address"), 'send' : _("Send lost password"), } return out def tmpl_account_info(self, ln, uid, guest, cfg_cern_site): """ Displays the account information Parameters: - 'ln' *string* - The language to display the interface in - 'uid' *string* - The user id - 'guest' *boolean* - If the user is guest - 'cfg_cern_site' *boolean* - If the site is a CERN site """ # load the right message language _ = gettext_set_language(ln) out = """

%(account_offer)s

""" % { 'account_offer' : _("The CDS Search offers you a possibility to personalize the interface, to set up your own personal library of documents, or to set up an automatic alert query that would run periodically and would notify you of search results by email."), } if not guest: out += """
%(your_settings)s
%(change_account)s""" % { 'your_settings' : _("Your Settings"), 'change_account' : _("Set or change your account Email address or password. Specify your preferences about the way the interface looks like.") } out += """
%(your_searches)s
%(search_explain)s
%(your_baskets)s
%(basket_explain)s""" % { 'your_searches' : _("Your Searches"), 'search_explain' : _("View all the searches you performed during the last 30 days."), 'your_baskets' : _("Your Baskets"), 'basket_explain' : _("With baskets you can define specific collections of items, store interesting records you want to access later or share with others."), } if guest: out += self.tmpl_warning_guest_user(ln = ln, type = "baskets") out += """
%(your_alerts)s
%(explain_alerts)s""" % { 'your_alerts' : _("Your Alerts"), 'explain_alerts' : _("Subscribe to a search which will be run periodically by our service. The result can be sent to you via Email or stored in one of your baskets."), } if guest: out += self.tmpl_warning_guest_user(type="alerts", ln = ln) if cfg_cern_site: out += """
%(your_loans)s
%(explain_loans)s""" % { 'your_loans' : _("Your Loans"), 'explain_loans' : _("Check out books you have on loan, submit borrowing requests, etc. Requires CERN ID."), } out += """
""" return out def tmpl_warning_guest_user(self, ln, type): """ Displays a warning message about the specified type Parameters: - 'ln' *string* - The language to display the interface in - 'type' *string* - The type of data that will get lost in case of guest account """ # load the right message language _ = gettext_set_language(ln) msg= _("""You are logged in as a guest user, so your %s will disappear at the end of the current session. If you wish you can login or register here.""") % type return """
%s
""" % msg def tmpl_account_body(self, ln, user): """ Displays the body of the actions of the user Parameters: - 'ln' *string* - The language to display the interface in - 'user' *string* - The user name """ # load the right message language _ = gettext_set_language(ln) return _("""You are logged in as %s. You may want to a) logout; b) edit your account settings.""") % user + "

" def tmpl_account_template(self, title, body, ln): """ Displays a block of the your account page Parameters: - 'ln' *string* - The language to display the interface in - 'title' *string* - The title of the block - 'body' *string* - The body of the block """ out ="" out +=""" """ % (title, body) return out def tmpl_account_page(self, ln, weburl, accBody, baskets, alerts, searches, messages, administrative): """ Displays the your account page Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The URL of cdsware - 'accBody' *string* - The body of the heading block - 'baskets' *string* - The body of the baskets block - 'alerts' *string* - The body of the alerts block - 'searches' *string* - The body of the searches block - 'messages' *string* - The body of the messages block - 'administrative' *string* - The body of the administrative block """ # load the right message language _ = gettext_set_language(ln) out = "" out += self.tmpl_account_template(_("Your Account"), accBody, ln) #your baskets out += self.tmpl_account_template(_("Your Baskets"), baskets, ln) out += self.tmpl_account_template(_("Your Messages"), messages, ln) out += self.tmpl_account_template(_("Your Alert Searches"), alerts, ln) out += self.tmpl_account_template(_("Your Searches"), searches, ln) out += self.tmpl_account_template(_("Your Submissions"), _("You can consult the list of %(your_submissions)s and inquire about their status.") % { 'your_submissions' : """%(your_sub)s""" % { 'weburl' : weburl, 'your_sub' : _("your submissions") } }, ln) out += self.tmpl_account_template(_("Your Approvals"), _("You can consult the list of %(your_approvals)s with the documents you approved or refereed.") % { 'your_approvals' : """ %(your_app)s""" % { 'weburl' : weburl, 'your_app' : _("your approvals"), } }, ln) out += self.tmpl_account_template(_("Your Administrative Activities"), administrative, ln) return out def tmpl_account_emailMessage(self, ln, msg): """ Displays a 
link to retrieve the lost password Parameters: - 'ln' *string* - The language to display the interface in - 'msg' *string* - Explanatory message displayed above the form. """ # load the right message language _ = gettext_set_language(ln) out ="" out +=""" %(msg)s %(try_again)s """ % { 'msg' : msg, 'try_again' : _("Try again") } return out def tmpl_account_emailSent(self, ln, email): """ Displays a confirmation that the email has been sent Parameters: - 'ln' *string* - The language to display the interface in - 'email' *string* - The email to which the message has been sent """ # load the right message language _ = gettext_set_language(ln) out ="" out += _("Okay, password has been emailed to %s") % email return out def tmpl_account_delete(self, ln): """ Displays a confirmation message about deleting the account Parameters: - 'ln' *string* - The language to display the interface in """ # load the right message language _ = gettext_set_language(ln) out = "

" + _("""Deleting your account""") return out def tmpl_account_logout(self, ln): """ Displays a confirmation message about logging out Parameters: - 'ln' *string* - The language to display the interface in """ # load the right message language _ = gettext_set_language(ln) out = "" out += _("""You are no longer recognized. If you wish you can login here.""") return out def tmpl_login_form(self, ln, referer, internal, register_available, methods, selected_method, supportemail): """ Displays a login form Parameters: - 'ln' *string* - The language to display the interface in - 'referer' *string* - The referer URL; the user is redirected to it after login - 'internal' *boolean* - Whether the form is used for internal (local) authentication - 'register_available' *boolean* - Whether users can register freely in the system - 'methods' *array* - The available authentication methods - 'selected_method' *string* - The default authentication method - 'supportemail' *string* - The email of the support team """ # load the right message language _ = gettext_set_language(ln) out = "

%(please_login)s
" % { 'please_login' : _("If you already have an account, please login using the form below.") } if register_available: out += _("""If you don't own an account yet, please register an internal account.""") else: out += _("""It is not possible to create an account yourself. Contact %s if you want an account.""") % ( """%(email)s""" % { 'email' : supportemail } ) out += """

""" if len(methods) > 1: # more than one method, must make a select login_select = """" out += """ """ % { 'login_title' : _("Login via:"), 'login_select' : login_select, } else: # only one login method available out += """""" % (methods[0]) out += """
%(login_title)s %(login_select)s
%(username)s:
%(password)s:
""" % { 'referer' : cgi.escape(referer), 'username' : _("Username"), 'password' : _("Password"), 'login' : _("login"), } if internal: out += """   (%(lost_pass)s)""" % { 'lost_pass' : _("Lost your password?") } out += """
""" return out def tmpl_register_page(self, ln, referer, level, supportemail, cdsname): """ Displays the registration form Parameters: - 'ln' *string* - The language to display the interface in - 'referer' *string* - The referer URL; the user is redirected to it after login - 'level' *int* - Login level (0 - all access, 1 - accounts must be activated by an administrator, 2+ - no self-registration) - 'supportemail' *string* - The email of the support team - 'cdsname' *string* - The name of the installation """ # load the right message language _ = gettext_set_language(ln) out = "" if level <= 1: out += _("""Please enter your email address and desired password:""") if level == 1: out += _("The account will not be possible to use before it has been verified and activated.") out += """
%(email_address)s:
(%(mandatory)s)

%(example)s: johndoe@example.com
%(password)s:
(%(optional)s)

%(note)s: %(password_contain)s
%(retype)s:

%(note)s: %(explain_acc)s""" % { 'referer' : cgi.escape(referer), 'email_address' : _("Email address"), 'password' : _("Password"), 'mandatory' : _("mandatory"), 'optional' : _("optional"), 'example' : _("Example"), 'note' : _("Note"), 'password_contain' : _("The password phrase may contain punctuation, spaces, etc."), 'retype' : _("Retype Password"), 'register' : _("register"), 'explain_acc' : _("Please do not use valuable passwords such as your Unix, AFS or NICE passwords with this service. Your email address will stay strictly confidential and will not be disclosed to any third party. It will be used to identify you for personal services of %s. For example, you may set up an automatic alert search that will look for new preprints and will notify you daily of new arrivals by email.") % cdsname, } return out def tmpl_account_adminactivities(self, ln, weburl, uid, guest, roles, activities): """ Displays the admin activities block for this user Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The address of the site - 'uid' *string* - The user id - 'guest' *boolean* - Whether the user is a guest - 'roles' *array* - The current user roles - 'activities' *array* - The activities the user is allowed to perform """ # load the right message language _ = gettext_set_language(ln) out = "" # guest condition if guest: return _("""You seem to be the guest user. You have to login first.""") # no rights condition if not roles: return "

" + _("You are not authorized to access administrative functions.") + "

" # displaying form out += "

" + _("You seem to be %s.") % string.join(roles, ", ") + " " out += _("Here are some interesting web admin links for you:") # print proposed links: activities.sort(lambda x, y: cmp(string.lower(x), string.lower(y))) for action in activities: if action == "cfgbibformat": out += """
    %s""" % (weburl, _("Configure BibFormat")) if action == "cfgbibharvest": out += """
    %s""" % (weburl, _("Configure BibHarvest")) if action == "cfgbibindex": out += """
    %s""" % (weburl, _("Configure BibIndex")) if action == "cfgbibrank": out += """
    %s""" % (weburl, _("Configure BibRank")) if action == "cfgwebaccess": out += """
    %s""" % (weburl, _("Configure WebAccess")) if action == "cfgwebsearch": out += """
    %s""" % (weburl, _("Configure WebSearch")) if action == "cfgwebsubmit": out += """
    %s""" % (weburl, _("Configure WebSubmit")) out += "
" + _("""For more admin-level activities, see the complete %(admin_area)s""") % { 'admin_area' : """%s.""" % (weburl, _("Admin Area")) } return out def tmpl_create_userinfobox(self, ln, weburl, guest, email, submitter, referee, admin): """ Displays the user block Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The address of the site - 'guest' *boolean* - If the user is guest - 'email' *string* - The user email (if known) - 'submitter' *boolean* - If the user is submitter - 'referee' *boolean* - If the user is referee - 'admin' *boolean* - If the user is admin """ # load the right message language _ = gettext_set_language(ln) out = """""" % weburl if guest: out += """%(guest_msg)s :: %(session)s :: %(alerts)s :: %(baskets)s :: %(login)s""" % { 'weburl' : weburl, 'ln' : ln, 'guest_msg' : _("guest"), 'session' : _("session"), 'alerts' : _("alerts"), 'baskets' : _("baskets"), 'login' : _("login"), } else: out += """%(email)s :: %(account)s :: %(alerts)s :: %(messages)s :: %(baskets)s :: """ % { 'email' : email, 'weburl' : weburl, 'ln' : ln, 'account' : _("account"), 'alerts' : _("alerts"), 'messages': _("messages"), 'baskets' : _("baskets"), } if submitter: out += """%(submission)s :: """ % { 'weburl' : weburl, 'ln' : ln, 'submission' : _("submissions"), } if referee: out += """%(approvals)s :: """ % { 'weburl' : weburl, 'ln' : ln, 'approvals' : _("approvals"), } if admin: out += """%(administration)s :: """ % { 'weburl' : weburl, 'ln' : ln, 'administration' : _("administration"), } out += """%(logout)s""" % { 'weburl' : weburl, 'ln' : ln, 'logout' : _("logout"), } return out diff --git a/modules/websession/lib/webuser.py b/modules/websession/lib/webuser.py index 2440cdb4a..9418320ec 100644 --- a/modules/websession/lib/webuser.py +++ b/modules/websession/lib/webuser.py @@ -1,585 +1,583 @@ ## $Id$ ## CDSware User related utilities. ## This file is part of the CERN Document Server Software (CDSware). 
## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """ This file implements all methods necessary for working with users and sessions in CDSware. It contains methods for logging in and registering users and for checking whether a user is a guest. It also provides the session-management helpers built on top of websession, as well as Apache-related user authentication utilities.
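The guest and account checks performed by getUid() can be summarised as a pure function. This is a minimal sketch, not CDSware code: resolve_uid() and its arguments are hypothetical, the CFG_ACCESS_CONTROL_LEVEL_* values are passed in explicitly, the session and database lookups are replaced by the is_guest and note arguments, and non-negative levels are assumed.

```python
# Hypothetical, side-effect-free sketch of the decision made in getUid().
# site_level, guest_level and account_level stand in for the
# CFG_ACCESS_CONTROL_LEVEL_SITE/GUESTS/ACCOUNTS settings; is_guest and note
# replace the database lookups. Levels are assumed to be non-negative.

def resolve_uid(user_id, is_guest, note,
                site_level=0, guest_level=0, account_level=0):
    # Site-wide restrictions short-circuit every other check.
    if site_level == 1:
        return 0
    if site_level == 2:
        return -1
    if is_guest:
        # Guests are accepted only when the guest level is 0.
        return user_id if guest_level == 0 else -1
    # Registered users: at level 0 every account is accepted.
    if account_level == 0:
        return user_id
    # At level >= 1 the account must be activated (note is 1 or '1').
    if account_level >= 1 and note in (1, '1'):
        return user_id
    return -1
```

For example, with account_level=1 a registered user whose note column is '0' (not yet activated) is mapped to -1, while the same user with note '1' keeps their id.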
""" from marshal import loads,dumps from zlib import compress,decompress from dbquery import run_sql import sys import time import os import crypt import string import session import websession import smtplib import MySQLdb -from websession import pSession, pSessionMapping -from session import SessionError -from config import * -from messages import * -from access_control_engine import acc_authorize_action -from access_control_admin import acc_findUserRoleActions -from access_control_config import * -import cdsware.template -# language for gettext -from cdsware.messages import gettext_set_language +from cdsware.websession import pSession, pSessionMapping +from cdsware.session import SessionError +from cdsware.config import * +from cdsware.messages import * +from cdsware.access_control_engine import acc_authorize_action +from cdsware.access_control_admin import acc_findUserRoleActions +from cdsware.access_control_config import * -## templating engine +from cdsware.messages import gettext_set_language +import cdsware.template tmpl = cdsware.template.load('websession') def createGuestUser(): """Create a guest user , insert into user null values in all fields createGuestUser() -> GuestUserID """ if CFG_ACCESS_CONTROL_LEVEL_GUESTS == 0: return run_sql("insert into user (email, note) values ('', '1')") elif CFG_ACCESS_CONTROL_LEVEL_GUESTS >= 1: return run_sql("insert into user (email, note) values ('', '0')") def page_not_authorized(req, referer='', uid='', text='', navtrail=''): """Show error message when account is not activated""" from webpage import page if not CFG_ACCESS_CONTROL_LEVEL_SITE: title = cfg_webaccess_msgs[5] if not uid: uid = getUid(req) res = run_sql("SELECT email FROM user WHERE id=%s" % uid) if res and res[0][0]: if text: body = text else: body = "%s %s" % (cfg_webaccess_warning_msgs[9] % res[0][0], ("%s %s" % (cfg_webaccess_msgs[0] % referer, cfg_webaccess_msgs[1]))) else: if text: body = text else: body = cfg_webaccess_msgs[3] elif 
CFG_ACCESS_CONTROL_LEVEL_SITE == 1: title = cfg_webaccess_msgs[8] body = "%s %s" % (cfg_webaccess_msgs[7], cfg_webaccess_msgs[2]) elif CFG_ACCESS_CONTROL_LEVEL_SITE == 2: title = cfg_webaccess_msgs[6] body = "%s %s" % (cfg_webaccess_msgs[4], cfg_webaccess_msgs[2]) return page(title=title, uid=getUid(req), body=body, navtrail=navtrail) def getUid (req): """Return the user id stored in the session cookie of the request. If the session has no user yet, create a guest user in the MySQL user table and send the session cookie to the client. Depending on the CFG_ACCESS_CONTROL_LEVEL_* settings, 0 or -1 may be returned instead of the real user id. getUid(req) -> userId """ if CFG_ACCESS_CONTROL_LEVEL_SITE == 1: return 0 if CFG_ACCESS_CONTROL_LEVEL_SITE == 2: return -1 guest = 0 sm = session.MPSessionManager(pSession, pSessionMapping()) try: s = sm.get_session(req) except SessionError,e: sm.revoke_session_cookie (req) s = sm.get_session(req) userId = s.getUid() if userId == -1: # first time, so create a guest user s.setUid(createGuestUser()) userId = s.getUid() guest = 1 sm.maintain_session(req,s) if guest == 0: guest = isGuestUser(userId) if guest: if CFG_ACCESS_CONTROL_LEVEL_GUESTS == 0: return userId elif CFG_ACCESS_CONTROL_LEVEL_GUESTS >= 1: return -1 else: res = run_sql("SELECT note FROM user WHERE id=%s", (userId,)) if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS == 0: return userId elif CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 1 and res and res[0][0] in [1, "1"]: return userId else: return -1 def setUid(req,uid): """Store the user id in the session and send the session cookie to the client.
""" sm = session.MPSessionManager(pSession, pSessionMapping()) try: s = sm.get_session(req) except SessionError,e: sm.revoke_session_cookie (req) s = sm.get_session(req) s.setUid(uid) sm.maintain_session(req,s) return uid def isGuestUser(uid): """It Checks if the userId corresponds to a guestUser or not isGuestUser(uid) -> boolean """ out = 1 try: res = run_sql("select email from user where id=%s", (uid,)) if res: if res[0][0]: out = 0 except: pass return out def isUserSubmitter(uid): u_email = get_email(uid) res = run_sql("select * from sbmSUBMISSIONS where email=%s",(u_email,)) if len(res) > 0: return 1 else: return 0 def isUserReferee(uid): res = run_sql("select sdocname from sbmDOCTYPE") for row in res: doctype = row[0] categ = "*" (auth_code, auth_message) = acc_authorize_action(uid, "referee",doctype=doctype, categ=categ) if auth_code == 0: return 1 res2 = run_sql("select sname from sbmCATEGORIES where doctype=%s",(doctype,)) for row2 in res2: categ = row2[0] (auth_code, auth_message) = acc_authorize_action(uid, "referee",doctype=doctype, categ=categ) if auth_code == 0: return 1 return 0 def isUserAdmin(uid): "Return 1 if the user UID has some admin rights; 0 otherwise." out = 0 if acc_findUserRoleActions(uid): out = 1 return out def checkRegister(user,passw): """It checks if the user is register with the correct password checkRegister(user,passw) -> boolean """ query_result = run_sql("select * from user where email=%s and password=%s", (user,passw)) if len(query_result)> 0 : return 0 return 1 def userOnSystem(user): """It checks if the user is registered already on the system """ query_register = run_sql("select * from user where email=%s", (user,)) if len(query_register)>0: return 1 return 0 def checkemail(email): """Check whether the EMAIL address supplied by the user is valid. At the moment we just check whether it contains '@' and whether it doesn't contain blanks. 
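A minimal standalone sketch of this check, assuming a plain string argument. check_email() and allowed_domain are hypothetical names, with allowed_domain standing in for CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN. A membership test is used for the blank check because comparing find() with > 0 would let a blank at position 0 through.

```python
# Hypothetical sketch of the e-mail sanity check; allowed_domain plays the
# role of CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN ('' disables it).

def check_email(email, allowed_domain=''):
    # '@' must be present and must not be the first character.
    if email.find('@') <= 0:
        return 0
    # Reject a blank anywhere in the address, including position 0.
    if ' ' in email:
        return 0
    # Optionally restrict registration to one domain suffix.
    if allowed_domain and not email.endswith(allowed_domain):
        return 0
    return 1
```

This is deliberately as loose as the original: it is a sanity check, not RFC 5322 validation.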
checkemail(email) -> boolean """ if (string.find(email, "@") <= 0) or (string.find(email, " ") >= 0): return 0 elif CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN: if not email.endswith(CFG_ACCESS_CONTROL_LIMIT_REGISTRATION_TO_DOMAIN): return 0 return 1 def getDataUid(req,uid): """Return [email, password] for the given user id, read from the MySQL database. If the user does not exist, return ['guest', 'none']; if the user has no email or no password, the email is reported as 'guest'. getDataUid(req,uid) -> [email,password] """ email = 'guest' password = 'none' query_result = run_sql("select email, password from user where id=%s", (uid,)) if len(query_result)>0: email = query_result[0][0] password = query_result[0][1] if password == None or email =='': email = 'guest' list = [email] +[password] return list def registerUser(req,user,passw): """Register the user by inserting the email and the password into the user table of the MySQL database. Return 1 if the insertion succeeded, 0 if the email is invalid or self-registration is not allowed, and -1 if the user already exists in the database. registerUser(req,user,passw) -> int """ if userOnSystem(user) and user !='': return -1 if checkRegister(user,passw) and checkemail(user): if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS == 0: activated = 1 elif CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS == 1: activated = 0 elif CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 2: return 0 user_preference = get_default_user_preferences() setUid(req, run_sql("INSERT INTO user (email, password, note, settings) VALUES (%s,%s,%s,%s)", (user,passw,activated,serialize_via_marshal(user_preference),))) if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT: sendNewUserAccountWarning(user, user, passw) if CFG_ACCESS_CONTROL_NOTIFY_ADMIN_ABOUT_NEW_ACCOUNTS: sendNewAdminAccountWarning(user, adminemail) return 1 return 0 def updateDataUser(req,uid,email,password): """It updates the data of the user.
It is used when a user sets his email and password. """ if email =='guest': return 0 if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 2: query_result = run_sql("update user set password=%s where id=%s", (password,uid)) else: query_result = run_sql("update user set email=%s,password=%s where id=%s", (email,password,uid)) return 1 def loginUser(req, p_email,p_pw, login_method): """Authenticate the user (first, simple version). Return a tuple (query_result, email, password, status), where query_result holds the matching user id row and status is 0 on success or an error code (10-14) otherwise. """ user_prefs = get_user_preferences(emailUnique(p_email)) if user_prefs and login_method != user_prefs["login_method"]: if CFG_EXTERNAL_AUTHENTICATION.has_key(user_prefs["login_method"]): return ([], p_email, p_pw, 11) if not CFG_EXTERNAL_AUTHENTICATION.has_key(login_method): return ([], p_email, p_pw, 12) if CFG_EXTERNAL_AUTHENTICATION[login_method][0]: p_email = CFG_EXTERNAL_AUTHENTICATION[login_method][0].auth_user(p_email, p_pw) if p_email: p_pw = givePassword(p_email) if not p_pw or p_pw < 0: import random p_pw = int(random.random() * 1000000) if not registerUser(req,p_email,p_pw): return ([], p_email, p_pw, 13) else: query_result = run_sql("SELECT id from user where email=%s and password=%s", (p_email,p_pw,)) user_prefs = get_user_preferences(query_result[0][0]) user_prefs["login_method"] = login_method set_user_preferences(query_result[0][0], user_prefs) else: return ([], p_email, p_pw, 10) query_result = run_sql("SELECT id from user where email=%s and password=%s", (p_email,p_pw,)) if query_result: prefered_login_method = get_user_preferences(query_result[0][0])['login_method'] else: return ([], p_email, p_pw, 14) if login_method != prefered_login_method: if CFG_EXTERNAL_AUTHENTICATION.has_key(prefered_login_method): return ([], p_email, p_pw, 11) return (query_result, p_email, p_pw, 0) def logoutUser(req): """Log the user out of the system by creating a new guest user and attaching it to the session.
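loginUser() above returns a 4-tuple whose last element is a status code. The table below is a hypothetical helper, not part of CDSware: LOGIN_STATUS and describe_login() are illustrative names, and each description paraphrases the loginUser() branch that returns that code.

```python
# Hypothetical lookup table for the status code in loginUser()'s result.
# The descriptions paraphrase the branches of loginUser(); they are not
# messages defined anywhere in CDSware.
LOGIN_STATUS = {
    0: 'login successful',
    10: 'external authentication rejected the credentials',
    11: 'the stored login method of the user is external and differs from the requested one',
    12: 'unknown login method',
    13: 'could not register the externally authenticated user locally',
    14: 'no local account matches the email and password',
}

def describe_login(result):
    # result is the 4-tuple (query_result, email, password, status).
    status = result[3]
    return LOGIN_STATUS.get(status, 'unknown status %d' % status)
```

Keeping the codes in one table makes it easier for callers such as the login page to map them to user-facing messages.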
""" uid = getUid(req) sm = session.MPSessionManager(pSession, pSessionMapping()) try: s = sm.get_session(req) except SessionError,e: sm.revoke_session_cookie (req) s = sm.get_session(req) id1 = createGuestUser() s.setUid(id1) sm.maintain_session(req,s) return id1 def userNotExist(p_email,p_pw): """Return 0 if a user with the given email exists in the system, 1 otherwise. """ query_result = run_sql("select email from user where email=%s", (p_email,)) if len(query_result)>0 and query_result[0][0]!='': return 0 return 1 def emailUnique(p_email): """Check whether the email address exists exactly once. Return the user id if it does, 0 if the address is not registered, and -1 if it is registered more than once. """ query_result = run_sql("select id, email from user where email=%s", (p_email,)) if len(query_result) == 1: return query_result[0][0] elif len(query_result) == 0: return 0 return -1 def update_Uid(req,p_email,p_pw): """Set the user id of the session to the id of the user with the given email and password. It is used when a guest user has logged in successfully. """ query_ID = int(run_sql("select id from user where email=%s and password=%s", (p_email,p_pw))[0][0]) setUid(req,query_ID) return query_ID def givePassword(email): """ Look up in the database the password for a given email.
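The three-way result of emailUnique() (user id, 0, or -1) can be reproduced with a parameterised query. This is a minimal sketch assuming an in-memory SQLite table as a hypothetical stand-in for the MySQL user table; email_unique() is an illustrative name, and SQLite uses ? placeholders where MySQLdb uses %s.

```python
import sqlite3

# Hypothetical stand-in for the MySQL user table, kept in memory so the
# example is self-contained.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT)')
conn.executemany('INSERT INTO user (email) VALUES (?)',
                 [('alice@example.com',), ('bob@example.com',),
                  ('bob@example.com',)])

def email_unique(conn, email):
    # Parameterised query: the driver quotes the value, no manual escaping.
    rows = conn.execute('SELECT id FROM user WHERE email = ?',
                        (email,)).fetchall()
    if len(rows) == 1:
        return rows[0][0]   # exactly one account: return its id
    if not rows:
        return 0            # address not registered
    return -1               # address registered more than once
```

Here alice@example.com maps to id 1, an unknown address to 0, and the duplicated bob@example.com to -1.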
It is used to send the password to the email of the user. It returns the password if the user exists, otherwise it returns -999 """ query_pass = run_sql("select password from user where email =%s",(email,)) if len(query_pass)>0: return query_pass[0][0] return -999 def sendNewAdminAccountWarning(newAccountEmail, sendTo, ln=cdslang): """Send an email to the address given by sendTo about the new account newAccountEmail.""" fromaddr = "From: %s" % supportemail toaddrs = "To: %s" % sendTo to = toaddrs + "\n" sub = "Subject: New account on '%s'" % cdsname if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS == 1: sub += " - PLEASE ACTIVATE" sub += "\n\n" body = "A new account has been created on '%s'" % cdsname if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS == 1: body += " and is awaiting activation" body += ":\n\n" body += " Username/Email: %s\n\n" % newAccountEmail body += "You can approve or reject this account request at: %s/admin/webaccess/webaccessadmin.py/manageaccounts\n" % weburl body += "\n---------------------------------" body += "\n%s" % cdsname body += "\nContact: %s" % supportemail msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: return 0 server.quit() return 1 def sendNewUserAccountWarning(newAccountEmail, sendTo, password, ln=cdslang): """Send an email to the address given by sendTo about the new account newAccountEmail.""" fromaddr = "From: %s" % supportemail toaddrs = "To: %s" % sendTo to = toaddrs + "\n" sub = "Subject: Your account created on '%s'\n\n" % cdsname body = "You have created a new account on '%s':\n\n" % cdsname body += " Username/Email: %s\n" % newAccountEmail body += " Password: %s\n\n" % ("*" * len(password)) if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 1: body += "This account is awaiting approval by the site administrators and therefore cannot be used yet.\nYou will receive an email notification as soon as your account request has been
processed.\n" body += "\n---------------------------------" body += "\n%s" % cdsname body += "\nContact: %s" % supportemail msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: return 0 server.quit() return 1 def get_email(uid): """Return email address of the user uid. Return string 'guest' in case the user is not found.""" out = "guest" res = run_sql("SELECT email FROM user WHERE id=%s", (uid,), 1) if res and res[0][0]: out = res[0][0] return out def create_userinfobox_body(uid, language="en"): """Create user info box body for user UID in language LANGUAGE.""" out = "" return tmpl.tmpl_create_userinfobox( ln = language, weburl = weburl, guest = isGuestUser(uid), email = get_email(uid), submitter = isUserSubmitter(uid), referee = isUserReferee(uid), admin = isUserAdmin(uid), ) def list_registered_users(): """List all registered users.""" return run_sql("SELECT id,email FROM user where email!=''") def list_users_in_role(role): """List all users of a given role (see table accROLE) @param role: role of user (string) @return list of uids """ query = """SELECT uacc.id_user FROM user_accROLE uacc JOIN accROLE acc ON uacc.id_accROLE=acc.id WHERE acc.name='%s'""" res = run_sql(query% MySQLdb.escape_string(role)) if res: return map(lambda x: int(x[0]), res) return [] def list_users_in_roles(role_list): """List all users of given roles (see table accROLE) @param role_list: list of roles [string] @return list of uids """ if not(type(role_list) is list or type(role_list) is tuple): role_list = [role_list] params='' query = """SELECT distinct(uacc.id_user) FROM user_accROLE uacc JOIN accROLE acc ON uacc.id_accROLE=acc.id %s""" if len(role_list) > 0: params = 'WHERE ' for role in role_list[:-1]: params += "acc.name='%s' OR " % MySQLdb.escape_string(role) params += "acc.name='%s'" % MySQLdb.escape_string(role_list[-1]) res = run_sql(query% params) if res: return 
map(lambda x: int(x[0]), res) return [] ## --- the following functions implement Apache user/group authentication def auth_apache_user_p(user, password): """Check whether the user-supplied credentials match an entry in the Apache password data file. Return 0 in case of failure, 1 in case of success.""" try: pipe_input, pipe_output = os.popen2(["/bin/grep", "^" + user + ":", cfg_apache_password_file], 'r') line = pipe_output.readlines()[0] password_apache = string.split(string.strip(line),":")[1] except: # no pw found, so return not-allowed status return 0 salt = password_apache[:2] if crypt.crypt(password, salt) == password_apache: return 1 else: return 0 def auth_apache_user_in_groups(user): """Return the list of Apache groups to which the Apache user belongs.""" out = [] try: pipe_input,pipe_output = os.popen2(["/bin/grep", user, cfg_apache_group_file], 'r') for line in pipe_output.readlines(): out.append(string.split(string.strip(line),":")[0]) except: # no groups found, so return empty list pass return out def auth_apache_user_collection_p(user, password, coll): """Check whether the user-supplied credentials match an entry in the Apache password data file, and whether this user is authorized to see the given collection.
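auth_apache_user_in_groups() greps the group file for the user name as a plain substring, so a user such as bob also matches a line that lists bobby. A minimal sketch of an exact-match alternative, assuming the usual AuthGroupFile layout 'group: user1 user2'; groups_of_user() is a hypothetical helper that works on already-loaded lines instead of spawning grep.

```python
# Hypothetical sketch: return the groups whose member list contains the user
# exactly, for lines in the AuthGroupFile layout 'group: user1 user2'.

def groups_of_user(user, group_lines):
    groups = []
    for line in group_lines:
        line = line.strip()
        # Skip blank or malformed lines that have no 'group:' prefix.
        if not line or ':' not in line:
            continue
        group, members = line.split(':', 1)
        # Compare whole member names, not substrings of the line.
        if user in members.split():
            groups.append(group.strip())
    return groups
```

With the lines ['admins: alice bob', 'readers: bobby carol', 'theses: bob'], the user bob belongs to admins and theses only, whereas a substring search would also report readers.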
Return 0 in case of failure, 1 in case of success.""" from search_engine import coll_restricted_p, coll_restricted_group if not auth_apache_user_p(user, password): return 0 if not coll_restricted_p(coll): return 1 if coll_restricted_group(coll) in auth_apache_user_in_groups(user): return 1 else: return 0 def get_user_preferences(uid): pref = run_sql("SELECT id, settings FROM user WHERE id=%s", (uid,)) if pref: try: return deserialize_via_marshal(pref[0][1]) except: return get_default_user_preferences() return None def set_user_preferences(uid, pref): res = run_sql("UPDATE user SET settings='%s' WHERE id=%s" % (serialize_via_marshal(pref),uid)) def get_default_user_preferences(): user_preference = { 'login_method': ''} for system in CFG_EXTERNAL_AUTHENTICATION.keys(): if CFG_EXTERNAL_AUTHENTICATION[system][1]: user_preference['login_method'] = system break return user_preference def serialize_via_marshal(obj): """Serialize Python object via marshal into a compressed string.""" return MySQLdb.escape_string(compress(dumps(obj))) def deserialize_via_marshal(string): """Decompress and deserialize string into a Python object via marshal.""" return loads(decompress(string)) diff --git a/modules/websession/web/youraccount.py b/modules/websession/web/youraccount.py index 6c166a1c4..7cb25615f 100644 --- a/modules/websession/web/youraccount.py +++ b/modules/websession/web/youraccount.py @@ -1,406 +1,406 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware ACCOUNT HANDLING""" __lastupdated__ = """$Date$""" import sys +from mod_python import apache +import smtplib + from cdsware import webuser from cdsware.config import weburl,cdsname,cdslang,supportemail from cdsware.webpage import page from cdsware import webaccount from cdsware import webbasket from cdsware import webalert from cdsware import webuser from cdsware.webmessage import account_new_mail from cdsware.access_control_config import * -from mod_python import apache from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE, cfg_webaccess_warning_msgs, CFG_EXTERNAL_AUTHENTICATION -import smtplib from cdsware.messages import gettext_set_language - import cdsware.template websession_templates = cdsware.template.load('websession') def edit(req, ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/set") data = webuser.getDataUid(req,uid) email = data[0] passw = data[1] return page(title= _("Your Settings"), body=webaccount.perform_set(email,passw, ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Your Settings", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def change(req,email=None,password=None,password2=None,login_method="",ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/change") if login_method and CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS < 4: title = _("Settings edited") act = "display" 
linkname = _("Show account") prefs = webuser.get_user_preferences(uid) prefs['login_method'] = login_method webuser.set_user_preferences(uid, prefs) mess = _("Login method successfully selected.") elif login_method and CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 4: return webuser.page_not_authorized(req, "../youraccount.py/change") elif email: uid2 = webuser.emailUnique(email) if (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 2 or (CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS <= 1 and webuser.checkemail(email))) and uid2 != -1 and (uid2 == uid or uid2 == 0) and password == password2: if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS < 3: change = webuser.updateDataUser(req,uid,email,password) else: return webuser.page_not_authorized(req, "../youraccount.py/change") if change and CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 2: mess = _("Password successfully edited.") elif change: mess = _("Settings successfully edited.") act = "display" linkname = _("Show account") title = _("Settings edited") elif uid2 == -1 or uid2 != uid and not uid2 == 0: mess = _("The email address is already in use, please try again.") act = "edit" linkname = _("Edit settings") title = _("Editing settings failed") elif not webuser.checkemail(email): mess = _("The email address is not valid, please try again.") act = "edit" linkname = _("Edit settings") title = _("Editing settings failed") elif password != password2: mess = _("The passwords do not match, please try again.") act = "edit" linkname = _("Edit settings") title = _("Editing settings failed") else: mess = _("Could not update settings.") act = "edit" linkname = _("Edit settings") title = _("Editing settings failed") return page(title=title, body=webaccount.perform_back(mess,act, linkname, ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def lost(req, ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = 
gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/lost") return page(title=_("Lost your password?"), body=webaccount.perform_lost(ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def display(req, ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/display") if webuser.isGuestUser(uid): return page(title=_("Your Account"), body=webaccount.perform_info(req, ln), description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) data = webuser.getDataUid(req,uid) bask = webbasket.account_list_baskets(uid, ln = ln) aler = webalert.account_list_alerts(uid, ln = ln) sear = webalert.account_list_searches(uid, ln = ln) msgs = account_new_mail(uid, ln = ln) return page(title=_("Your Account"), body=webaccount.perform_display_account(req,data,bask,aler,sear,msgs,ln), description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def send_email(req, p_email=None, ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/send_email") user_prefs = webuser.get_user_preferences(webuser.emailUnique(p_email)) if user_prefs: if CFG_EXTERNAL_AUTHENTICATION.has_key(user_prefs['login_method']) or CFG_EXTERNAL_AUTHENTICATION.has_key(user_prefs['login_method']) and CFG_EXTERNAL_AUTHENTICATION[user_prefs['login_method']][0] != None: Msg = websession_templates.tmpl_lost_password_message(ln = ln, supportemail = supportemail) return 
page(title=_("Your Account"), body=Msg, description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) passw = webuser.givePassword(p_email) if passw == -999: eMsg = _("The entered e-mail address doesn't exist in the database") return page(title=_("Your Account"), body=webaccount.perform_emailMessage(eMsg, ln), description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) fromaddr = "From: %s" % supportemail toaddrs = "To: " + p_email to = toaddrs + "\n" sub = "Subject: %s %s\n\n" % (_("Credentials for"), cdsname) body = "%s %s:\n\n" % (_("Here are your user credentials for"), cdsname) body += " %s: %s\n %s: %s\n\n" % (_("username"), p_email, _("password"), passw) body += "%s %s/youraccount.py/login?ln=%s" % (_("You can login at"), weburl, ln) msg = to + sub + body server = smtplib.SMTP('localhost') server.set_debuglevel(1) try: server.sendmail(fromaddr, toaddrs, msg) except smtplib.SMTPRecipientsRefused,e: eMsg = _("The entered email address is incorrect, please check that it is written correctly (e.g. 
johndoe@example.com).") return page(title=_("Incorrect email address"), body=webaccount.perform_emailMessage(eMsg, ln), description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) server.quit() return page(title=_("Lost password sent"), body=webaccount.perform_emailSent(p_email, ln), description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def youradminactivities(req, ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/youradminactivities") return page(title=_("Your Administrative Activities"), body=webaccount.perform_youradminactivities(uid, ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def delete(req, ln=cdslang): uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/delete") return page(title=_("Delete Account"), body=webaccount.perform_delete(ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def logout(req, ln=cdslang): uid = webuser.logoutUser(req) # load the right message language _ = gettext_set_language(ln) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return webuser.page_not_authorized(req, "../youraccount.py/logout") return page(title=_("Logout"), body=webaccount.perform_logout(req, ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, 
lastupdated=__lastupdated__) def login(req, p_email=None, p_pw=None, login_method=None, action='login', referer='', ln=cdslang): if CFG_ACCESS_CONTROL_LEVEL_SITE > 0: return webuser.page_not_authorized(req, "../youraccount.py/login") uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if action == _('login'): if p_email==None or not login_method: return page(title=_("Login"), body=webaccount.create_login_page_box(referer, ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) (iden, p_email, p_pw, msgcode) = webuser.loginUser(req,p_email,p_pw, login_method) if len(iden)>0: uid = webuser.update_Uid(req,p_email,p_pw) uid2 = webuser.getUid(req) if uid2 == -1: webuser.logoutUser(req) return webuser.page_not_authorized(req, "../youraccount.py/login?ln=%s" % ln, uid=uid) # login successful! if referer: req.err_headers_out.add("Location", referer) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY else: return display(req) else: mess = cfg_webaccess_warning_msgs[msgcode] % login_method if msgcode == 14: if not webuser.userNotExist(p_email,p_pw) or p_email=='' or p_email==' ': mess = cfg_webaccess_warning_msgs[15] % login_method act = "login" return page(title=_("Login"), body=webaccount.perform_back(mess,act, _("Login"), ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) def register(req, p_email=None, p_pw=None, p_pw2=None, action='login', referer='', ln=cdslang): if CFG_ACCESS_CONTROL_LEVEL_SITE > 0: return webuser.page_not_authorized(req, "../youraccount.py/register") uid = webuser.getUid(req) # load the right message language _ = gettext_set_language(ln) if p_email==None: return page(title=_("Register"), body=webaccount.create_register_page_box(referer, ln), 
navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) mess="" act="" if p_pw == p_pw2: ruid = webuser.registerUser(req,p_email,p_pw) else: ruid = -2 if ruid == 1: uid = webuser.update_Uid(req,p_email,p_pw) mess = _("Your account has been successfully created.") title = _("Account created") if CFG_ACCESS_CONTROL_NOTIFY_USER_ABOUT_NEW_ACCOUNT == 1: mess += _(" An email has been sent to the given address with the account information.") if CFG_ACCESS_CONTROL_LEVEL_ACCOUNTS >= 1: mess += _(" A second email will be sent when the account has been activated and can be used.") else: mess += _(""" You can now access your account.""") % ( "%s/youraccount.py/display?ln=%s" % (weburl, ln)) elif ruid == -1: mess = _("The user already exists in the database, please try again.") act = "register" title = _("Register failure") elif ruid == -2: mess = _("Both passwords must match, please try again.") act = "register" title = _("Register failure") else: mess = _("The email address given is not valid, please try again.") act = "register" title = _("Register failure") return page(title=title, body=webaccount.perform_back(mess,act, (act == 'register' and _("register") or ""), ln), navtrail="""""" % (weburl, ln) + _("Your Account") + """""", description="CDS Personalize, Main page", keywords="CDS, personalize", uid=uid, language=ln, lastupdated=__lastupdated__) diff --git a/modules/webstyle/lib/template.py b/modules/webstyle/lib/template.py index b296434cd..0a6250bd4 100644 --- a/modules/webstyle/lib/template.py +++ b/modules/webstyle/lib/template.py @@ -1,46 +1,46 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. 
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. """CDSware templating framework.""" __version__ = "$Id$" -from config import cfg_template_skin +from cdsware.config import cfg_template_skin def load(module=''): """ Load and return a template object, given a module name (like 'websearch', 'webbasket',...). The module corresponding to the currently selected template model (see config.wml, variable CFG_TEMPLATE_SKIN) is tried first. In case it does not exist, it returns the default template for that module.
""" local = {} # load the right template based on the cfg_template_skin and the specified module if cfg_template_skin == "default": mymodule = __import__("cdsware.%s_templates" % (module), local, local, ["cdsware.templates.%s" % (module)]) else: try: mymodule = __import__("cdsware.%s_templates_%s" % (module, cfg_template_skin), local, local, ["cdsware.templates.%s_%s" % (module, cfg_template_skin)]) except ImportError: mymodule = __import__("cdsware.%s_templates" % (module), local, local, ["cdsware.templates.%s" % (module)]) return mymodule.Template() diff --git a/modules/webstyle/lib/webpage.py b/modules/webstyle/lib/webpage.py index 12e2daa94..0208839d0 100644 --- a/modules/webstyle/lib/webpage.py +++ b/modules/webstyle/lib/webpage.py @@ -1,172 +1,172 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
"""CDSware Web Page Functions""" -from config import * -from messages import gettext_set_language -from webuser import create_userinfobox_body import re import string import sys import time import traceback import urllib -from errorlib import get_msgs_for_code_list, register_errors +from cdsware.config import * +from cdsware.messages import gettext_set_language +from cdsware.webuser import create_userinfobox_body +from cdsware.errorlib import get_msgs_for_code_list, register_errors -import template -webstyle_templates = template.load('webstyle') +import cdsware.template +webstyle_templates = cdsware.template.load('webstyle') def create_navtrailbox_body(title, previous_links, prolog="", separator=""" > """, epilog="", language=cdslang): """Create navigation trail box body input: title = page title; previous_links = the trail content from site title until current page (both ends exlusive). output: text containing the navtrail """ return webstyle_templates.tmpl_navtrailbox_body(weburl = weburl, ln = language, title = title, previous_links = previous_links, separator = separator, prolog = prolog, epilog = epilog) def page(title, body, navtrail="", description="", keywords="", uid=0, cdspageheaderadd="", cdspageboxlefttopadd="", cdspageboxleftbottomadd="", cdspageboxrighttopadd="", cdspageboxrightbottomadd="", cdspagefooteradd="", lastupdated="", language=cdslang, urlargs="", verbose=1, titleprologue="", titleepilogue="", req=None, errors=[], warnings=[]): """page(): display CDS web page input: title of the page body of the page in html format description goes to the metadata in the header of the HTML page keywords goes to the metadata in the header of the html page cdspageheaderadd is a message to be displayed just under the page header cdspageboxlefttopadd is a message to be displayed in the page body on left top cdspageboxleftbottomadd is a message to be displayed in the page body on left bottom cdspageboxrighttopadd is a message to be displayed in the page body on 
right top cdspageboxrightbottomadd is a message to be displayed in the page body on right bottom cdspagefooteradd is a message to be displayed on the top of the page footer lastupdated is a text containing the info on last update (optional) language is the language version of the page urlargs are the URL arguments of the page to display (useful to affect languages) verbose is verbosity of the page (useful for debugging) titleprologue is to be printed right before page title titleepilogue is to be printed right after page title req is the mod_python request errors is the list of error codes as defined in the moduleName_config.py file of the calling module log is the string of data that should be appended to the log file (errors automatically logged) output: the final cds page with header, footer, etc. """ _ = gettext_set_language(language) if title == cdsnameintl[language]: headerstitle = _("Home") else: headerstitle = title # if there are warnings if warnings: warnings = get_msgs_for_code_list(warnings, 'warning', language) register_errors(warnings, 'warning') # if there are errors if errors: errors = get_msgs_for_code_list(errors, 'error', language) register_errors(errors, 'error', req) body = create_error_box(req, errors=errors, ln=language) return webstyle_templates.tmpl_page(weburl = weburl, ln = language, headertitle = headerstitle, sitename = cdsnameintl[language], supportemail = supportemail, description = description, keywords = keywords, userinfobox = create_userinfobox_body(uid, language), navtrailbox = create_navtrailbox_body(title, navtrail, language=language), uid = uid, # pageheader = cdspageheader, pageheaderadd = cdspageheaderadd, boxlefttop = cdspageboxlefttop, boxlefttopadd = cdspageboxlefttopadd, boxleftbottomadd = cdspageboxleftbottomadd, boxleftbottom = cdspageboxleftbottom, boxrighttop = cdspageboxrighttop, boxrighttopadd = cdspageboxrighttopadd, boxrightbottomadd = cdspageboxrightbottomadd, boxrightbottom = cdspageboxrightbottom, titleprologue = 
titleprologue, title = title, titleepilogue = titleepilogue, body = body, # pagefooter = cdspagefooter, languagebox = webstyle_templates.tmpl_language_selection_box(urlargs, language), version = version, lastupdated = lastupdated, pagefooteradd = cdspagefooteradd) def pageheaderonly(title, navtrail="", description="", keywords="", uid=0, cdspageheaderadd="", language=cdslang, urlargs="", verbose=1): """Return just the beginning of page(), with full headers. Suitable for the search results page and any long-running scripts.""" return webstyle_templates.tmpl_pageheader(weburl = weburl, ln = language, headertitle = title, sitename = cdsnameintl[language], supportemail = supportemail, description = description, keywords = keywords, userinfobox = create_userinfobox_body(uid, language), navtrailbox = create_navtrailbox_body(title, navtrail, language=language), uid = uid, # pageheader = cdspageheader, pageheaderadd = cdspageheaderadd, languagebox = webstyle_templates.tmpl_language_selection_box(urlargs, language)) def pagefooteronly(cdspagefooteradd="", lastupdated="", language=cdslang, urlargs="", verbose=1): """Return just the ending of page(), with full footer. Suitable for the search results page and any long-running scripts.""" return webstyle_templates.tmpl_pagefooter(weburl = weburl, ln = language, sitename = cdsnameintl[language], supportemail = supportemail, # pagefooter = cdspagefooter, languagebox = webstyle_templates.tmpl_language_selection_box(urlargs, language), version = version, lastupdated = lastupdated, pagefooteradd = cdspagefooteradd) def create_error_box(req, title=None, verbose=1, ln=cdslang, errors=None): """Analyse the req object and the sys traceback and return a text message box with internal information that would be suitable to display when something bad has happened.
""" _ = gettext_set_language(ln) return webstyle_templates.tmpl_error_box(title = title, ln = ln, verbose = verbose, req = req, supportemail = supportemail, errors = errors) diff --git a/modules/webstyle/lib/webstyle_templates.py b/modules/webstyle/lib/webstyle_templates.py index 8c54c358b..51c4d4e7a 100644 --- a/modules/webstyle/lib/webstyle_templates.py +++ b/modules/webstyle/lib/webstyle_templates.py @@ -1,592 +1,592 @@ ## $Id$ ## CDSware WebStyle templates. ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
import urllib import time import cgi import gettext import traceback import sre import urllib import sys -from config import * -from messages import gettext_set_language, language_list_long +from cdsware.config import * +from cdsware.messages import gettext_set_language, language_list_long class Template: def tmpl_navtrailbox_body(self, weburl, ln, title, previous_links, separator, prolog, epilog): """Create navigation trail box body Parameters: - 'weburl' *string* - The base URL for the site - 'ln' *string* - The language to display - 'title' *string* - page title; - 'previous_links' *string* - the trail content from site title until current page (both ends exclusive) - 'prolog' *string* - HTML code to prefix the navtrail item with - 'epilog' *string* - HTML code to suffix the navtrail item with - 'separator' *string* - HTML code that separates two navtrail items Output: - text containing the navtrail """ # load the right message language _ = gettext_set_language(ln) out = "" if title != cdsnameintl[ln]: out += """%(msg_home)s""" % { 'weburl' : weburl, 'ln' : ln, 'msg_home' : _("Home")} if previous_links: if out: out += separator out += previous_links if title: if out: out += separator if title == cdsnameintl[ln]: # hide site name, print Home instead out += _("Home") else: out += title return prolog + out + epilog def tmpl_page(self, weburl, ln, headertitle, sitename = "", supportemail = "", description = "", keywords = "", userinfobox = "", navtrailbox = "", pageheaderadd = "", boxlefttop = "", boxlefttopadd = "", boxleftbottom = "", boxleftbottomadd = "", boxrighttop = "", boxrighttopadd = "", boxrightbottom = "", boxrightbottomadd = "", titleprologue = "", title = "", titleepilogue = "", body = "", version = "", lastupdated = None, languagebox = "", pagefooteradd = "", uid = 0, ): """Creates a complete page Parameters: - 'weburl' *string* - The base URL for the site - 'ln' *string* - The language to display - 'sitename' *string* - the first part of the page HTML 
title - 'headertitle' *string* - the second part of the page HTML title - 'supportemail' *string* - email of the support team - 'description' *string* - description goes to the metadata in the header of the HTML page - 'keywords' *string* - keywords goes to the metadata in the header of the HTML page - 'userinfobox' *string* - the HTML code for the user information box - 'navtrailbox' *string* - the HTML code for the navigation trail box - 'pageheaderadd' *string* - additional page header HTML code - 'boxlefttop' *string* - left-top box HTML code - 'boxlefttopadd' *string* - additional left-top box HTML code - 'boxleftbottom' *string* - left-bottom box HTML code - 'boxleftbottomadd' *string* - additional left-bottom box HTML code - 'boxrighttop' *string* - right-top box HTML code - 'boxrighttopadd' *string* - additional right-top box HTML code - 'boxrightbottom' *string* - right-bottom box HTML code - 'boxrightbottomadd' *string* - additional right-bottom box HTML code - 'title' *string* - the title of the page - 'body' *string* - the body of the page - 'version' *string* - the version number of CDSware - 'lastupdated' *string* - when the page was last updated - 'languagebox' *string* - the HTML code for the language box - 'pagefooteradd' *string* - additional page footer HTML code Output: - HTML code of the page """ # load the right message language _ = gettext_set_language(ln) if lastupdated: msg_lastupdated = _("Last updated:") + " " + lastupdated else: msg_lastupdated = "" out = self.tmpl_pageheader( weburl = weburl, ln = ln, headertitle = headertitle, sitename = sitename, supportemail = supportemail, description = description, keywords = keywords, userinfobox = userinfobox, navtrailbox = navtrailbox, pageheaderadd = pageheaderadd, languagebox = languagebox, ) + """

%(boxlefttop)s
%(boxlefttopadd)s
%(boxleftbottomadd)s
%(boxleftbottom)s
%(boxrighttop)s
%(boxrighttopadd)s
%(boxrightbottomadd)s
%(boxrightbottom)s

%(title)s

%(body)s
""" % { 'boxlefttop' : boxlefttop, 'boxlefttopadd' : boxlefttopadd, 'boxleftbottom' : boxleftbottom, 'boxleftbottomadd' : boxleftbottomadd, 'boxrighttop' : boxrighttop, 'boxrighttopadd' : boxrighttopadd, 'boxrightbottom' : boxrightbottom, 'boxrightbottomadd' : boxrightbottomadd, 'title' : title, 'body' : body, } + self.tmpl_pagefooter( weburl = weburl, ln = ln, sitename = sitename, supportemail = supportemail, version = version, lastupdated = lastupdated, languagebox = languagebox, pagefooteradd = pagefooteradd ) return out def tmpl_pageheader(self, weburl, ln, headertitle = "", sitename = "", supportemail = "", description = "", keywords = "", userinfobox = "", navtrailbox = "", pageheaderadd = "", languagebox = "", uid = 0, ): """Creates a page header Parameters: - 'weburl' *string* - The base URL for the site - 'ln' *string* - The language to display - 'sitename' *string* - the first part of the page HTML title - 'headertitle' *string* - the second part of the page HTML title - 'supportemail' *string* - email of the support team - 'description' *string* - description goes to the metadata in the header of the HTML page - 'keywords' *string* - keywords goes to the metadata in the header of the HTML page - 'userinfobox' *string* - the HTML code for the user information box - 'navtrailbox' *string* - the HTML code for the navigation trail box - 'pageheaderadd' *string* - additional page header HTML code - 'languagebox' *string* - the HTML code for the language box Output: - HTML code of the page headers """ # load the right message language _ = gettext_set_language(ln) out = """ %(sitename)s: %(headertitle)s """ % { 'weburl' : weburl, 'ln' : ln, 'sitename' : sitename, 'headertitle' : headertitle, 'supportemail' : supportemail, 'description' : description, 'keywords' : keywords, 'userinfobox' : userinfobox, 'navtrailbox' : navtrailbox, 'pageheaderadd' : pageheaderadd, 'msg_search' : _("Search"), 'msg_submit' : _("Submit"), 'msg_personalize' : _("Personalize"), 
'msg_help' : _("Help"), 'languagebox' : languagebox, } return out def tmpl_pagefooter(self, weburl, ln, sitename = "", supportemail = "", version = "", lastupdated = None, languagebox = "", pagefooteradd = "" ): """Creates a page footer Parameters: - 'weburl' *string* - The base URL for the site - 'ln' *string* - The language to display - 'sitename' *string* - the first part of the page HTML title - 'supportemail' *string* - email of the support team - 'version' *string* - the version number of CDSware - 'lastupdated' *string* - when the page was last updated - 'languagebox' *string* - the HTML code for the language box - 'pagefooteradd' *string* - additional page footer HTML code Output: - HTML code of the page footer """ # load the right message language _ = gettext_set_language(ln) if lastupdated: msg_lastupdated = _("Last updated:") + " " + lastupdated else: msg_lastupdated = "" out = """ """ % { 'weburl' : weburl, 'ln' : ln, 'sitename' : sitename, 'supportemail' : supportemail, 'msg_search' : _("Search"), 'msg_submit' : _("Submit"), 'msg_personalize' : _("Personalize"), 'msg_help' : _("Help"), 'msg_poweredby' : _("Powered by"), 'msg_maintainedby' : _("Maintained by"), 'msg_lastupdated' : msg_lastupdated, 'languagebox' : languagebox, 'version' : version, 'pagefooteradd' : pagefooteradd, } return out def tmpl_language_selection_box(self, urlargs="", language="en"): """Take URLARGS and LANGUAGE and return textual language selection box for the given page.
Parameters: - 'urlargs' *string* - The url args that helped produce this page - 'language' *string* - The selected language """ # load the right message language _ = gettext_set_language(language) out = "" for (lang, lang_namelong) in language_list_long(): if lang == language: out += """ %s   """ % lang_namelong else: if urlargs: urlargs = sre.sub(r'ln=.*?(&|$)', '', urlargs) if urlargs: if urlargs.endswith('&'): urlargs += "ln=%s" % lang else: urlargs += "&ln=%s" % lang else: urlargs = "ln=%s" % lang out += """ %s   """ % (urlargs, lang_namelong) return _("This site is also available in the following languages:") + "
" + out def tmpl_error_box(self, ln, title, verbose, req, supportemail, errors): """Produces an error box. Parameters: - 'title' *string* - The title of the error box - 'ln' *string* - The selected language - 'verbose' *bool* - If lots of information should be displayed - 'req' *object* - the request object - 'supportemail' *string* - the supportemail for this installation - 'errors' list of tuples (error_code, error_message) - #! todo """ # load the right message language _ = gettext_set_language(ln) info_not_available = _("N/A") if title == None: if errors: title = _("Error: %s") % errors[0][1] else: title = _("Internal Error") browser_s = _("Browser") if req: try: if req.headers_in.has_key('User-Agent'): browser_s += ': ' + req.headers_in['User-Agent'] else: browser_s += ': ' + info_not_available host_s = req.hostname page_s = req.unparsed_uri client_s = req.connection.remote_ip except: pass else: browser_s += ': ' + info_not_available host_s = page_s = client_s = info_not_available error_s = '' sys_error_s = '' traceback_s = '' if verbose >= 1: if sys.exc_info()[0]: sys_error_s = _("System Error: %s %s\n") % (sys.exc_info()[0], sys.exc_info()[1]) if errors: errs = '' for error_tuple in errors: try: errs += "%s%s : %s\n " % (' '*6, error_tuple[0], error_tuple[1]) except: errs += "%s%s\n" % (' '*6, error_tuple) errs = errs[6:-2] # get rid of trainling ',' error_s = _("Error: %s")% errs + "\n" else: error_s = _("Error") + ': ' + info_not_available if verbose >= 9: traceback_s = _("Traceback: \n%s") % string.join(traceback.format_tb(sys.exc_info()[2]),"\n") out = """

%(title)s %(sys1)s %(sys2)s

%(contact)s

 URI: http://%(host)s%(page)s
 %(time_label)s: %(time)s
 %(browser)s
 %(client_label)s: %(client)s
 %(error)s%(sys_error)s%(traceback)s
 
%(send_error_label)s
""" % { 'title' : title, 'time_label': _("Time"), 'client_label': _("Client"), 'send_error_label': _("Please send an error report to the Administrator"), 'send_label': _("Send error report"), 'sys1' : sys.exc_info()[0] or '', 'sys2' : sys.exc_info()[1] or '', 'contact' : _("Please contact %s quoting the following information:") % (urllib.quote(supportemail), supportemail), 'host' : host_s, 'page' : page_s, 'time' : time.strftime("%02d/%b/%Y:%H:%M:%S %z"), 'browser' : browser_s, 'client' : client_s, 'error' : error_s, 'traceback' : traceback_s, 'sys_error' : sys_error_s, 'weburl' : weburl, 'referer' : page_s!=info_not_available and ("http://" + host_s + page_s) or info_not_available } return out diff --git a/modules/websubmit/lib/file.py b/modules/websubmit/lib/file.py index cd9ca15ed..55b428e24 100644 --- a/modules/websubmit/lib/file.py +++ b/modules/websubmit/lib/file.py @@ -1,552 +1,551 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
 ## import interesting modules:
 import string
 import os
 import sys
 import time
 import types
 import re
 import mimetypes
 import shutil
 import md5
 import urllib
-
-from config import *
-from access_control_engine import acc_authorize_action
-from access_control_admin import acc_isRole
-from webpage import page, create_error_box
-from webuser import getUid, get_email
-from dbquery import run_sql
-from messages import *
 from mod_python import apache
-from websubmit_config import *
-
-from messages import gettext_set_language
-import template
-websubmit_templates = template.load('websubmit')
+from cdsware.config import *
+from cdsware.access_control_engine import acc_authorize_action
+from cdsware.access_control_admin import acc_isRole
+from cdsware.webpage import page, create_error_box
+from cdsware.webuser import getUid, get_email
+from cdsware.dbquery import run_sql
+from cdsware.messages import *
+from cdsware.websubmit_config import *
+
+from cdsware.messages import gettext_set_language
+import cdsware.template
+websubmit_templates = cdsware.template.load('websubmit')
 
 archivepath = filedir
 archivesize = filedirsize
 
 # sort compressed file extensions list to get lengthy ones first:
 cfg_compressed_file_extensions_sorted = cfg_compressed_file_extensions
 cfg_compressed_file_extensions_sorted.sort()
 
 class BibRecDocs:
     """this class represents all the files attached to one record"""
     def __init__(self,recid):
         self.id = recid
         self.bibdocs = []
         self.buildBibDocList()
 
     def buildBibDocList(self):
         self.bibdocs = []
         res = run_sql("select id_bibdoc,type from bibrec_bibdoc,bibdoc where id=id_bibdoc and id_bibrec=%s and status!='deleted'", (self.id,))
         for row in res:
             self.bibdocs.append(BibDoc(bibdocid=row[0],recid=self.id))
 
     def listBibDocs(self,type=""):
         tmp=[]
         for bibdoc in self.bibdocs:
             if type=="" or type == bibdoc.getType():
                 tmp.append(bibdoc)
         return tmp
 
     def getBibDocNames(self,type="Main"):
         names = []
         for bibdoc in self.listBibDocs(type):
             names.append(bibdoc.getDocName())
         return names
 
     def getBibDoc(self,bibdocid):
         for bibdoc in self.bibdocs:
             if bibdoc.getId() == bibdocid:
                 return bibdoc
         return None
 
     def deleteBibDoc(self,bibdocid):
         for bibdoc in self.bibdocs:
             if bibdoc.getId() == bibdocid:
                 bibdoc.delete()
         self.buildBibDocList()
 
     def addBibDoc(self,type="Main",docname="file"):
         while docname in self.getBibDocNames(type):
             match = re.match("(.*_)([^_]*)",docname)
             if match:
                 try:
                     docname = match.group(1)+str(int(match.group(2)) + 1)
                 except:
                     docname = docname + "_2"
             else:
                 docname = docname + "_2"
         bibdoc = BibDoc(recid=self.id,type=type,docname=docname)
         if bibdoc != None:
             self.bibdocs.append(bibdoc)
         return bibdoc
 
     def addNewFile(self,fullpath,type="Main"):
         filename = re.sub("\..*","",re.sub(r".*[\\/:]", "", fullpath))
         bibdoc = self.addBibDoc(type,filename)
         if bibdoc != None:
             bibdoc.addFilesNewVersion(files=[fullpath])
             return bibdoc
         return None
 
     def addNewVersion(self,fullpath,bibdocid):
         bibdoc = self.getBibDoc(bibdocid)
         if bibdoc != None:
             bibdoc.addFilesNewVersion(files=[fullpath])
             docname = re.sub("\..*","",re.sub(r".*[\\/:]", "", fullpath))
             if docname != bibdoc.getDocName():
                 while docname in self.getBibDocNames(bibdoc.getType()):
                     match = re.match("(.*_)([^_]*)",docname)
                     if match:
                         try:
                             docname = match.group(1)+str(int(match.group(2)) + 1)
                         except:
                             docname = docname + "_2"
                     else:
                         docname = docname + "_2"
                 bibdoc.changeName(docname)
             return bibdoc
         return None
 
     def addNewFormat(self,fullpath,bibdocid):
         bibdoc = self.getBibDoc(bibdocid)
         if bibdoc != None:
             bibdoc.addFilesNewFormat(files=[fullpath])
             return bibdoc
         return None
 
     def listLatestFiles(self,type=""):
         docfiles = []
         for bibdoc in self.listBibDocs(type):
             for docfile in bibdoc.listLatestFiles():
                 docfiles.append(docfile)
         return docfiles
 
     def checkFileExists(self,fullpath,type=""):
         if os.path.exists(fullpath):
             docfiles = self.listLatestFiles(type)
             for docfile in docfiles:
                 if md5.new(readfile(fullpath)).digest() == md5.new(readfile(docfile.getPath())).digest():
                     return docfile.getBibDocId()
         else:
             return 0
 
     def display(self,bibdocid="",version="",type="", ln = cdslang):
         t=""
         bibdocs = []
         if bibdocid!="":
             for bibdoc in self.bibdocs:
                 if bibdoc.getId() == bibdocid:
                     bibdocs.append(bibdoc)
         else:
             bibdocs = self.listBibDocs(type)
         if len(bibdocs) > 0:
             types = listTypesFromArray(bibdocs)
             fulltypes = []
             for mytype in types:
                 fulltype = {
                             'name' : mytype,
                             'content' : [],
                            }
                 for bibdoc in bibdocs:
                     if mytype == bibdoc.getType():
                         fulltype['content'].append(bibdoc.display(version, ln = ln))
                 fulltypes.append(fulltype)
 
             t = websubmit_templates.tmpl_bibrecdoc_filelist(
                   ln = ln,
                   types = fulltypes,
                 )
         return t
 
 class BibDoc:
     """this class represents one file attached to a record
         there is a one to one mapping between an instance of this class and
         an entry in the bibdoc db table"""
 
     def __init__ (self,bibdocid="",recid="",docname="file",type="Main"):
         # bibdocid is known, the document already exists
         if bibdocid != "":
             if recid == "":
                 res = run_sql("select id_bibrec,type from bibrec_bibdoc where id_bibdoc=%s",(bibdocid,))
                 if len(res) > 0:
                     recid = res[0][0]
                     self.type = res[0][1]
                 else:
                     recid = None
                     self.type = ""
             else:
                 res = run_sql("select type from bibrec_bibdoc where id_bibrec=%s and id_bibdoc=%s",(recid,bibdocid,))
                 self.type = res[0][0]
             # gather the other information
             res = run_sql("select * from bibdoc where id=%s", (bibdocid,))
             self.cd = res[0][3]
             self.md = res[0][4]
             self.recid = recid
             self.docname = res[0][2]
             self.id = bibdocid
             group = "g"+str(int(int(self.id)/archivesize))
             self.basedir = "%s/%s/%s" % (archivepath,group,self.id)
         # else it is a new document
         else:
             if docname == "" or type == "":
                 return None
             else:
                 self.recid = recid
                 self.type = type
                 self.docname = docname
                 self.id = run_sql("insert into bibdoc (docname,creation_date,modification_date) values(%s,NOW(),NOW())", (docname,))
                 if self.id != None:
                     #we link the document to the record if a recid was specified
                     if self.recid != "":
                         run_sql("insert into bibrec_bibdoc values(%s,%s,%s)", (recid,self.id,self.type,))
                 else:
                     return None
                 group = "g"+str(int(int(self.id)/archivesize))
                 self.basedir = "%s/%s/%s" % (archivepath,group,self.id)
                 # we create the corresponding storage directory
                 if not os.path.exists(self.basedir):
                     os.makedirs(self.basedir)
                     # and save the father record id if it exists
                     if self.recid!="":
                         fp = open("%s/.recid" % self.basedir,"w")
                         fp.write(str(self.recid))
                         fp.close()
         # build list of attached files
         self.docfiles = {}
         self.BuildFileList()
         # link with relatedFiles
         self.relatedFiles = {}
         self.BuildRelatedFileList()
 
     def addFilesNewVersion(self,files=[]):
         """add a new version of a file to an archive"""
         latestVersion = self.getLatestVersion()
         if latestVersion == "0":
             myversion = "1"
         else:
             myversion = str(int(latestVersion)+1)
         for file in files:
             if os.path.exists(file):
                 filename = re.sub(r".*[\\/:]", "", file)
                 shutil.copy(file,"%s/%s;%s" % (self.basedir,filename,myversion))
         self.BuildFileList()
 
     def addFilesNewFormat(self,files=[],version=""):
         """add a new format of a file to an archive"""
         if version == "":
             version = self.getLatestVersion()
         for file in files:
             if os.path.exists(file):
                 filename = re.sub(r".*[\\/:]", "", file)
                 shutil.copy(file,"%s/%s;%s" % (self.basedir,filename,version))
         self.BuildFileList()
 
     def getIcon(self):
         if self.relatedFiles.has_key('Icon'):
             return self.relatedFiles['Icon'][0]
         else:
             return None
 
     def addIcon(self,file):
         """link an icon with the bibdoc object"""
         #first check if an icon already exists
         existingIcon = self.getIcon()
         if existingIcon != None:
             existingIcon.delete()
         #then add the new one
         filename = re.sub("\..*","",re.sub(r".*[\\/:]", "", file))
         newicon = BibDoc(type='Icon',docname=filename)
         if newicon != None:
             newicon.addFilesNewVersion(files=[file])
             run_sql("insert into bibdoc_bibdoc values(%s,%s,'Icon')", (self.id,newicon.getId(),))
             if os.path.exists(newicon.getBaseDir()):
                 fp = open("%s/.docid" % newicon.getBaseDir(),"w")
                 fp.write(str(self.id))
                 fp.close()
         self.BuildRelatedFileList()
 
     def deleteIcon(self):
         existingIcon = self.getIcon()
         if existingIcon != None:
             existingIcon.delete()
         self.BuildRelatedFileList()
 
     def display(self,version="", ln = cdslang):
         t=""
         if version == "all":
             docfiles = self.listAllFiles()
         elif version != "":
             docfiles = self.listVersionFiles(version)
         else:
             docfiles = self.listLatestFiles()
         existingIcon = self.getIcon()
         if existingIcon != None:
             imagepath = "%s/getfile.py?docid=%s&name=%s&format=gif" % (weburl,existingIcon.getId(),urllib.quote(existingIcon.getDocName()))
         else:
             imagepath = "%s/smallfiles.gif" % images
 
         versions = []
         for version in listVersionsFromArray(docfiles):
             currversion = {
                             'version' : version,
                             'previous' : 0,
                             'content' : []
                           }
             if version == self.getLatestVersion() and version != "1":
                 currversion['previous'] = 1
             for docfile in docfiles:
                 if docfile.getVersion() == version:
                     currversion['content'].append(docfile.display(ln = ln))
             versions.append(currversion)
 
         t = websubmit_templates.tmpl_bibdoc_filelist(
               ln = ln,
               weburl = weburl,
               versions = versions,
               imagepath = imagepath,
               docname = self.docname,
               id = self.id,
             )
         return t
 
     def changeName(self,newname):
         run_sql("update bibdoc set docname=%s where id=%s",(newname,self.id,))
         self.docname = newname
 
     def getDocName(self):
         """retrieve bibdoc name"""
         return self.docname
 
     def getBaseDir(self):
         """retrieve bibdoc base directory"""
         return self.basedir
 
     def getType(self):
         """retrieve bibdoc type"""
         return self.type
 
     def getRecid(self):
         """retrieve bibdoc recid"""
         return self.recid
 
     def getId(self):
         """retrieve bibdoc id"""
         return self.id
 
     def getFile(self,name,format,version):
         if version == "":
             docfiles = self.listLatestFiles()
         else:
             docfiles = self.listVersionFiles(version)
         for docfile in docfiles:
             if docfile.getName()==name and docfile.getFormat()==format:
                 return docfile
         return None
 
     def listVersions(self):
         versions = []
         for docfile in self.docfiles:
             if not docfile.getVersion() in versions:
                 versions.append(docfile.getVersion())
         return versions
 
     def delete(self):
         """delete the current bibdoc instance"""
         run_sql("update bibdoc set status='deleted' where id=%s",(self.id,))
 
     def BuildFileList(self):
         """lists all files attached to the bibdoc"""
         self.docfiles = []
         if os.path.exists(self.basedir):
             for fil in os.listdir(self.basedir):
                 if fil != ".recid" and fil != ".docid" and fil != "." and fil != "..":
                     filepath = "%s/%s" % (self.basedir,fil)
                     fileversion = re.sub(".*;","",fil)
                     fullname = fil.replace(";%s" % fileversion,"")
                     # detect fullname's basename and extension:
                     fullname_lowercase = fullname.lower()
                     fullname_extension_postition = -1
                     # first try to detect compressed file extensions:
                     for compressed_file_extension in cfg_compressed_file_extensions_sorted:
                         if fullname_lowercase.endswith("." + compressed_file_extension):
                             fullname_extension_postition = fullname[:-len(compressed_file_extension)-1].rfind(".")
                             break
                     if fullname_extension_postition == -1:
                         # okay, no compressed extension found, so try to find last dot:
                         fullname_extension_postition = fullname.rfind(".")
                     # okay, fullname_extension_postition should now indicate where extension starts (incl. compressed ones)
                     if fullname_extension_postition == -1:
                         fullname_basename = fullname
                         fullname_extension = ""
                     else:
                         fullname_basename = fullname[:fullname_extension_postition]
                         fullname_extension = fullname[fullname_extension_postition+1:]
                     # we can append file:
                     self.docfiles.append(BibDocFile(filepath,self.type,fileversion,fullname_basename,fullname_extension,self.id))
 
     def BuildRelatedFileList(self):
         res = run_sql("select ln.id_bibdoc2,ln.type from bibdoc_bibdoc as ln,bibdoc where id=ln.id_bibdoc2 and ln.id_bibdoc1=%s and status!='deleted'",(self.id,))
         for row in res:
             bibdocid = row[0]
             type = row[1]
             if not self.relatedFiles.has_key(type):
                 self.relatedFiles[type] = []
             self.relatedFiles[type].append(BibDoc(bibdocid=bibdocid))
 
     def listAllFiles(self):
         return self.docfiles
 
     def listLatestFiles(self):
         return self.listVersionFiles(self.getLatestVersion())
 
     def listVersionFiles(self,version):
         tmp = []
         for docfile in self.docfiles:
             if docfile.getVersion() == version:
                 tmp.append(docfile)
         return tmp
 
     def getLatestVersion(self):
         if len(self.docfiles) > 0:
             self.docfiles.sort(orderFilesWithVersion)
             return self.docfiles[0].getVersion()
         else:
             return 0
 
     def getFileNumber(self):
         return len(self.files)
 
     def registerDownload(self,addressIp,version,format,userid=0):
         return run_sql("INSERT INTO rnkDOWNLOADS (id_bibrec,id_bibdoc,file_version,file_format,id_user,client_host,download_time) VALUES (%s,%s,%s,%s,%s,INET_ATON(%s),NOW())", (self.recid,self.id,version,string.upper(format),userid,addressIp,))
 
 class BibDocFile:
     """this class represents a physical file in the CDSware filesystem"""
     def __init__(self,fullpath,type,version,name,format,bibdocid):
         self.fullpath = fullpath
         self.type = type
         self.bibdocid = bibdocid
         self.version = version
         self.size = os.path.getsize(fullpath)
         self.md = os.path.getmtime(fullpath)
         try:
             self.cd = os.path.getctime(fullpath)
         except:
             self.cd = self.md
         self.name = name
         self.format = format
         self.dir = os.path.dirname(fullpath)
         if format == "":
             self.mime = "text/plain"
             self.encoding = ""
             self.fullname = name
         else:
             self.fullname = "%s.%s" % (name,format)
             (self.mime,self.encoding) = mimetypes.guess_type(self.fullname)
             if self.mime == None:
                 self.mime = "text/plain"
 
     def display(self, ln = cdslang):
         if self.format != "":
             format = ".%s" % self.format
         else:
             format = ""
         return websubmit_templates.tmpl_bibdocfile_filelist(
                  ln = ln,
                  weburl = weburl,
                  id = self.bibdocid,
                  selfformat = self.format,
                  version = self.version,
                  name = self.name,
                  format = format,
                  size = self.size,
                )
 
     def getType(self):
         return self.type
 
     def getPath(self):
         return self.fullpath
 
     def getBibDocId(self):
         return self.bibdocid
 
     def getName(self):
         return self.name
 
     def getFormat(self):
         return self.format
 
     def getSize(self):
         return self.size
 
     def getVersion(self):
         return self.version
 
     def getRecid(self):
         return run_sql("select id_bibrec from bibrec_bibdoc where id_bibdoc=%s",(self.bibdocid,))[0][0]
 
     def stream(self,req):
         if os.path.exists(self.fullpath):
             req.content_type = self.mime
             req.encoding = self.encoding
             req.filename = self.fullname
             req.headers_out["Content-Disposition"] = "file; filename=%s" % self.fullname
             req.send_http_header()
             fp = file(self.fullpath,"r")
             content = fp.read()
             fp.close()
             return content
 
 def readfile(path):
     if os.path.exists(path):
         fp = open(path,"r")
         content = fp.read()
         fp.close()
         return content
 
 def listTypesFromArray(bibdocs):
     types = []
     for bibdoc in bibdocs:
         if not bibdoc.getType() in types:
             types.append(bibdoc.getType())
     return types
 
 def listVersionsFromArray(docfiles):
     versions = []
     for docfile in docfiles:
         if not docfile.getVersion() in versions:
             versions.append(docfile.getVersion())
     return versions
 
 def orderFilesWithVersion(docfile1,docfile2):
     """order docfile objects according to their version"""
     version1 = int(docfile1.getVersion())
     version2 = int(docfile2.getVersion())
     return cmp(version2,version1)
diff --git a/modules/websubmit/lib/websubmit_config.py b/modules/websubmit/lib/websubmit_config.py
index 585f9278e..9396bc064 100644
--- a/modules/websubmit/lib/websubmit_config.py
+++ b/modules/websubmit/lib/websubmit_config.py
@@ -1,57 +1,57 @@
 ## $Id$
 ##
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 """CDSware Submission Web Interface config file."""
 
 ## import config variables defined from config.wml:
-from config import adminemail, \
-                   supportemail, \
-                   images, \
-                   urlpath, \
-                   accessurl, \
-                   counters, \
-                   storage, \
-                   filedir, \
-                   filedirsize, \
-                   gfile, \
-                   gzip, \
-                   tar, \
-                   gunzip, \
-                   acroread, \
-                   distiller, \
-                   convert, \
-                   tmpdir, \
-                   bibupload, \
-                   bibformat, \
-                   bibwords, \
-                   bibconvert, \
-                   bibconvertconf, \
-                   htdocsurl
+from cdsware.config import adminemail, \
+                           supportemail, \
+                           images, \
+                           urlpath, \
+                           accessurl, \
+                           counters, \
+                           storage, \
+                           filedir, \
+                           filedirsize, \
+                           gfile, \
+                           gzip, \
+                           tar, \
+                           gunzip, \
+                           acroread, \
+                           distiller, \
+                           convert, \
+                           tmpdir, \
+                           bibupload, \
+                           bibformat, \
+                           bibwords, \
+                           bibconvert, \
+                           bibconvertconf, \
+                           htdocsurl
 
 ## test:
 test = "FALSE"
 
 ## CC all action confirmation mails to administrator? (0 == NO; 1 == YES)
 cfg_websubmit_copy_mails_to_admin = 0
 
 ## known compressed file extensions:
 cfg_compressed_file_extensions = ["z", "gz", "tar", "tgz", "tar", "tar.gz", "zip", "rar", "arj", "arc", "pak", "lha", "lhz", "sit", "sea", "sitx", "cpt", "hqx", "uu", "uue", "bz", "bz2", "bzip", "tbz", "tbz2", "tar.bz", "tar.bz2"]
diff --git a/modules/websubmit/lib/websubmit_engine.py b/modules/websubmit/lib/websubmit_engine.py
index 8cce85632..232478cc2 100644
--- a/modules/websubmit/lib/websubmit_engine.py
+++ b/modules/websubmit/lib/websubmit_engine.py
@@ -1,1163 +1,1162 @@
 ## $Id$
 ## This file is part of the CERN Document Server Software (CDSware).
 ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
 ##
 ## The CDSware is free software; you can redistribute it and/or
 ## modify it under the terms of the GNU General Public License as
 ## published by the Free Software Foundation; either version 2 of the
 ## License, or (at your option) any later version.
 ##
 ## The CDSware is distributed in the hope that it will be useful, but
 ## WITHOUT ANY WARRANTY; without even the implied warranty of
 ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 ## General Public License for more details.
 ##
 ## You should have received a copy of the GNU General Public License
 ## along with CDSware; if not, write to the Free Software Foundation, Inc.,
 ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
 
 ## import interesting modules:
 import string
 import os
 import sys
 import time
 import types
 import re
 import MySQLdb
 import shutil
-
-from config import *
-from dbquery import run_sql
-from access_control_engine import acc_authorize_action
-from access_control_admin import acc_isRole
-from webpage import page, create_error_box
-from webuser import getUid, get_email
-from messages import *
 from mod_python import apache
-from websubmit_config import *
-from file import *
-
-from messages import gettext_set_language
-import template
-websubmit_templates = template.load('websubmit')
+from cdsware.config import *
+from cdsware.dbquery import run_sql
+from cdsware.access_control_engine import acc_authorize_action
+from cdsware.access_control_admin import acc_isRole
+from cdsware.webpage import page, create_error_box
+from cdsware.webuser import getUid, get_email
+from cdsware.messages import *
+from cdsware.websubmit_config import *
+from cdsware.file import *
+
+from cdsware.messages import gettext_set_language
+import cdsware.template
+websubmit_templates = cdsware.template.load('websubmit')
 
 def interface(req,c=cdsname,ln=cdslang, doctype="", act="", startPg=1, indir="", access="",mainmenu="",fromdir="",file="",nextPg="",nbPg="",curpage=1):
     ln = wash_language(ln)
 
     # load the right message language
     _ = gettext_set_language(ln)
 
     sys.stdout = req
     # get user ID:
     try:
         uid = getUid(req)
         uid_email = get_email(uid)
     except MySQLdb.Error, e:
         return errorMsg(e.value,req, c, ln)
     # variable initialisation
     t = ""
     field = []
     fieldhtml = []
     level = []
     fullDesc = []
     text = []
     check = []
     select = []
     radio = []
     upload = []
     txt = []
     noPage = []
     # Preliminary tasks
     # check that the user is logged in
     if uid_email == "" or uid_email == "guest":
         return warningMsg(websubmit_templates.tmpl_warning_message(
                  ln = ln,
                  msg = _("Sorry, you must log in to perform this action. Please use the top right menu to do so.")
                ), req, ln)
         # warningMsg("""""",req, ln)
     # check we have minimum fields
     if doctype=="" or act=="" or access=="":
         return errorMsg(_("invalid parameter"),req, c, ln)
     # retrieve the action and doctype data
     if indir == "":
         res = run_sql("select dir from sbmACTION where sactname=%s",(act,))
         if len(res) == 0:
             return errorMsg(_("cannot find submission directory"),req, c, ln)
         else:
             row = res[0]
             indir = row[0]
     res = run_sql("SELECT ldocname FROM sbmDOCTYPE WHERE sdocname=%s",(doctype,))
     if len(res) == 0:
         return errorMsg(_("unknown document type"),req, c, ln)
     else:
         docname = res[0][0]
         docname = string.replace(docname," "," ")
     res = run_sql("SELECT lactname FROM sbmACTION WHERE sactname=%s",(act,))
     if len(res) == 0:
         return errorMsg(_("unknown action"),req, c, ln)
     else:
         actname = res[0][0]
         actname = string.replace(actname," "," ")
     subname = "%s%s" % (act,doctype)
     res = run_sql("SELECT nbpg FROM sbmIMPLEMENT WHERE subname=%s", (subname,))
     if len(res) == 0:
         return errorMsg(_("can't figure number of pages"),req, c, ln)
     else:
         nbpages = res[0][0]
     #Get current page
     if startPg != "" and (curpage=="" or curpage==0):
         curpage = startPg
     # retrieve the name of the file in which the reference of
     # the submitted document will be stored
     res = run_sql("SELECT value FROM sbmPARAMETERS WHERE doctype=%s and name='edsrn'", (doctype,))
     if len(res) == 0:
         edsrn = ""
     else:
         edsrn = res[0][0]
     # This defines the path to the directory containing the action data
     curdir = "%s/%s/%s/%s" % (storage,indir,doctype,access)
     # if this submission comes from another one ($fromdir is then set)
     # We retrieve the previous submission directory and put it in the proper one
     if fromdir != "":
         olddir = "%s/%s/%s/%s" % (storage,fromdir,doctype,access)
         if os.path.exists(olddir):
             os.rename(olddir,curdir)
     # If the submission directory still does not exist, we create it
     if not os.path.exists(curdir):
         try:
             os.makedirs(curdir)
         except:
             return errorMsg(_("can't create submission directory"),req, c, ln)
     # retrieve the original main menu url ans save it in the "mainmenu" file
     if mainmenu != "":
         fp = open("%s/mainmenu" % curdir,"w")
         fp.write(mainmenu)
         fp.close()
     # and if the file containing the URL to the main menu exists
     # we retrieve it and store it in the $mainmenu variable
     if os.path.exists("%s/mainmenu" % curdir):
         fp = open("%s/mainmenu" % curdir,"r");
         mainmenu = fp.read()
         fp.close()
     else:
         mainmenu = "%s/submit.py" %urlpath
     # various authentication related tasks...
     if uid_email != "guest" and uid_email != "":
         #First save the username (email address) in the SuE file. This way bibconvert will be able to use it if needed
         fp = open("%s/SuE" % curdir,"w")
         fp.write(uid_email)
         fp.close()
         # is user authorized to perform this action?
         (auth_code, auth_message) = acc_authorize_action(uid, "submit",verbose=0,doctype=doctype, act=act)
         if acc_isRole("submit",doctype=doctype,act=act) and auth_code != 0:
             return warningMsg("%s" % auth_message, req)
         # then we update the "journal of submission"
         res = run_sql("SELECT * FROM sbmSUBMISSIONS WHERE doctype=%s and action=%s and id=%s and email=%s", (doctype,act,access,uid_email,))
         if len(res) == 0:
             run_sql("INSERT INTO sbmSUBMISSIONS values (%s,%s,%s,'pending',%s,'',NOW(),NOW())", (uid_email,doctype,act,access,))
         else:
             run_sql("UPDATE sbmSUBMISSIONS SET md=NOW() WHERE doctype=%s and action=%s and id=%s and email=%s", (doctype,act,access,uid_email,))
     # Save the form fields entered in the previous submission page
     # If the form was sent with the GET method
     form = req.form
     value = ""
     # we parse all the form variables
     for key in form.keys():
         formfields = form[key]
         if re.search("\[\]",key):
             filename = key.replace("[]","")
         else:
             filename = key
         # the field is an array
         if isinstance(formfields,types.ListType):
             fp = open("%s/%s" % (curdir,filename),"w")
             for formfield in formfields:
                 #stripslashes(value)
                 value = specialchars(formfield)
                 fp.write(value+"\n")
             fp.close()
         # the field is a normal string
         elif isinstance(formfields,types.StringTypes) and formfields != "":
             value = formfields
             fp = open("%s/%s" % (curdir,filename),"w")
             fp.write(specialchars(value))
             fp.close()
         # the field is a file
         elif hasattr(formfields,"filename"):
             if not os.path.exists("%s/files/%s" % (curdir,key)):
                 try:
                     os.makedirs("%s/files/%s" % (curdir,key))
                 except:
                     return errorMsg(_("can't create submission directory"),req, c, ln)
             filename = formfields.filename
             if filename != "":
                 # This may be dangerous if the file size is bigger than the available memory
                 data = formfields.file.read()
                 fp = open("%s/files/%s/%s" % (curdir,key,filename),"w")
                 fp.write(data)
                 fp.close()
                 fp = open("%s/lastuploadedfile" % curdir,"w")
                 fp.write(filename)
                 fp.close()
                 fp = open("%s/%s" % (curdir,key),"w")
                 fp.write(filename)
                 fp.close()
         # if the found field is the reference of the document
         # we save this value in the "journal of submissions"
         if uid_email != "" and uid_email != "guest":
             if key == edsrn:
                 run_sql("UPDATE sbmSUBMISSIONS SET reference=%s WHERE doctype=%s and id=%s and email=%s", (value,doctype,access,uid_email,))
         # Now deal with the cookies
         # If the fields must be saved as a cookie, we do so
         # In this case, the value of the field will be retrieved and
         # displayed as the default value of the field next time the user
         # does a submission
         if value!="":
             res = run_sql("SELECT cookie FROM sbmFIELDDESC WHERE name=%s", (key,))
             if len(res) > 0:
                 if res[0][0] == 1:
                     setCookie(key,value,uid)
     # create interface
     # For each field to be displayed on the page
     subname = "%s%s" % (act,doctype)
     res = run_sql("SELECT * FROM sbmFIELD WHERE subname=%s and pagenb=%s ORDER BY fieldnb,fieldnb", (subname,curpage,))
     full_fields = []
     values = []
     for arr in res:
         full_field = {}
         # We retrieve its HTML description
         res3 = run_sql("SELECT * FROM sbmFIELDDESC WHERE name=%s", (arr[3],))
         arr3 = res3[0]
         if arr3[8]==None:
             val=""
         else:
             val=arr3[8]
         # we also retrieve and add the javascript code of the checking function, if needed
         full_field['javascript'] = ''
         if arr[7] != '':
             res2 = run_sql("SELECT chdesc FROM sbmCHECKS WHERE chname=%s", (arr[7],))
             full_field['javascript'] = res2[0][0]
         full_field['type'] = arr3[3]
         full_field['name'] = arr[3]
         full_field['rows'] = arr3[5]
         full_field['cols'] = arr3[6]
         full_field['val'] = val
         full_field['size'] = arr3[4]
         full_field['maxlength'] = arr3[7]
         full_field['htmlcode'] = arr3[9]
         full_field['typename'] = arr[1]
 
         # The 'R' fields must be executed in the engine's environment,
         # as the runtime functions access some global and local
         # variables.
         if full_field ['type'] == 'R':
             co = compile (full_field ['htmlcode'].replace("\r\n","\n"),"","exec")
             exec(co)
         else:
             text = websubmit_templates.tmpl_submit_field (ln = ln, field = full_field)
 
         # we now determine the exact type of the created field
         if full_field['type'] not in [ 'D','R']:
             field.append(full_field['name'])
             level.append(arr[5])
             fullDesc.append(arr[4])
             txt.append(arr[6])
             check.append(arr[7])
             # If the field is not user-defined, we try to determine its type
             # (select, radio, file upload...)
             # check whether it is a select field or not
             if re.search("SELECT",text,re.IGNORECASE) != None:
                 select.append(1)
             else:
                 select.append(0)
             # checks whether it is a radio field or not
             if re.search(r"TYPE=[\"']?radio",text,re.IGNORECASE) != None:
                 radio.append(1)
             else:
                 radio.append(0)
             # checks whether it is a file upload or not
             if re.search(r"TYPE=[\"']?file",text,re.IGNORECASE) != None:
                 upload.append(1)
             else:
                 upload.append(0)
             # if the field description contains the "" string, replace
             # it by the category selected on the document page submission page
             combofile = "combo%s" % doctype
             if os.path.exists("%s/%s" % (curdir,combofile)):
                 f = open("%s/%s" % (curdir,combofile),"r")
                 combo = f.read()
                 f.close()
             else:
                 combo=""
             text = text.replace("",combo)
             # if there is a tag in it, replace it by the current year
             year = time.strftime("%Y");
             text = text.replace("",year)
             # if there is a tag in it, replace it by the current year
             today = time.strftime("%d/%m/%Y");
             text = text.replace("",today)
             fieldhtml.append(text)
         else:
             select.append(0)
             radio.append(0)
             upload.append(0)
             # field.append(value) - initial version, not working with JS, taking a submitted value
             field.append(arr[3])
             level.append(arr[5])
             txt.append(arr[6])
             fullDesc.append(arr[4])
             check.append(arr[7])
             fieldhtml.append(text)
         full_field['fullDesc'] = arr[4]
         full_field['text'] = text
 
         # If a file exists with the name of the field we extract the saved value
         text = ''
         if os.path.exists("%s/%s" % (curdir,full_field['name'])):
             file = open("%s/%s" % (curdir,full_field['name']),"r");
             text = file.read()
             text = re.compile("[\n\r]*$").sub("",text)
             text = re.compile("\n").sub("\\n",text)
             text = re.compile("\r").sub("",text)
             file.close()
 
         # Or if a cookie is set
         # If a cookie is found corresponding to the name of the current
         # field, we set the value of the field to the cookie's value
         elif getCookie(full_field['name'],uid) != None:
             value = getCookie(full_field['name'],uid)
             value = re.compile("\r").sub("",value)
             value = re.compile("\n").sub("\\n",value)
             text = value
 
         values.append(text)
 
         full_fields.append(full_field)
 
     returnto = {}
     if int(curpage) == int(nbpages):
         subname = "%s%s" % (act,doctype)
         res = run_sql("SELECT * FROM sbmFIELD WHERE subname=%s and pagenb!=%s", (subname,curpage,))
         nbFields = 0
         message = ""
         fullcheck_select = []
         fullcheck_radio = []
         fullcheck_upload = []
         fullcheck_field = []
         fullcheck_level = []
         fullcheck_txt = []
         fullcheck_noPage = []
         fullcheck_check = []
         for arr in res:
             if arr[5] == "M":
                 res2 = run_sql("SELECT * FROM sbmFIELDDESC WHERE name=%s", (arr[3],));
                 row2 = res2[0]
                 if row2[3] in ['D','R']:
                     if row2[3] == "D":
                         text = row2[9]
                     else:
                         text = eval(row2[9])
                     formfields = text.split(">")
                     for formfield in formfields:
                         match = re.match("name=([^ <>]+)",formfield,re.IGNORECASE)
                         if match != None:
                             names = match.groups
                             for value in names:
                                 if value != "":
                                     value = re.compile("[\"']+").sub("",value)
                                     fullcheck_field.append(value)
                                     fullcheck_level.append(arr[5])
                                     fullcheck_txt.append(arr[6])
                                     fullcheck_noPage.append(arr[1])
                                     fullcheck_check.append(arr[7])
                                     nbFields = nbFields+1
                 else:
                     fullcheck_noPage.append(arr[1])
                     fullcheck_field.append(arr[3])
                     fullcheck_level.append(arr[5])
                     fullcheck_txt.append(arr[6])
                     fullcheck_check.append(arr[7])
                     nbFields = nbFields+1
         # tests each mandatory field
         fld = 0
         res = 1
         for i in range (0,nbFields):
             res = 1
             if not os.path.exists("%s/%s" % (curdir,fullcheck_field[i])):
                 res=0
             else:
                 file = open("%s/%s" % (curdir,fullcheck_field[i]),"r")
                 text = file.read()
                 if text == '':
                     res=0
                 else:
                     if text == "Select:":
                         res=0
             if res == 0:
                 fld = i
                 break
         if not res:
             returnto = {
                          'field' : fullcheck_txt[fld],
                          'page'  : fullcheck_noPage[fld],
                        }
 
     t += websubmit_templates.tmpl_page_interface(
           ln = ln,
           docname = docname,
           actname = actname,
           curpage = curpage,
           nbpages = nbpages,
           file = file,
           nextPg = nextPg,
           access = access,
           nbPg = nbPg,
           doctype = doctype,
           act = act,
           indir = indir,
           fields = full_fields,
           javascript = websubmit_templates.tmpl_page_interface_js(
                          ln = ln,
                          upload = upload,
                          field = field,
                          fieldhtml = fieldhtml,
                          txt = txt,
                          check = check,
                          level = level,
                          curdir = curdir,
                          values = values,
                          select = select,
                          radio = radio,
                          curpage = curpage,
                          nbpages = nbpages,
                          images = images,
                          returnto = returnto,
                        ),
           images = images,
           mainmenu = mainmenu,
          )
 
     # start display:
     req.content_type = "text/html"
     req.send_http_header()
     p_navtrail = """%(submit)s > %(docname)s """ % {
                    'submit'  : _("Submit"),
                    'doctype' : doctype,
                    'docname' : docname,
                  }
     return page(title= actname,
                 body = t,
                 navtrail = p_navtrail,
                 description = "submit documents in CDSWare",
                 keywords = "submit, CDSWare",
                 uid = uid,
                 language = ln,
                 urlargs = req.args)
 
 
 def endaction(req,c=cdsname,ln=cdslang, doctype="", act="", startPg=1, indir="", access="",mainmenu="",fromdir="",file="",nextPg="",nbPg="",curpage=1,step=1,mode="U"):
     global rn,sysno,dismode,curdir,uid,uid_email,last_step,action_score
 
     # load the right message language
     _ = gettext_set_language(ln)
 
     try:
         rn
     except NameError:
         rn = ""
     dismode = mode
     ln = wash_language(ln)
     sys.stdout = req
     t=""
     # get user ID:
     try:
         uid = getUid(req)
         uid_email = get_email(uid)
     except MySQLdb.Error, e:
         return errorMsg(e.value, req, c, ln)
     # Preliminary tasks
     # check that the user is logged in
     if uid_email == "" or uid_email == "guest":
         return warningMsg(websubmit_templates.tmpl_warning_message(
                  ln = ln,
                  msg = _("Sorry, you must log in to perform this action. Please use the top right menu to do so.")
                ), req, ln)
 
     # check we have minimum fields
     if doctype=="" or act=="" or access=="":
         return errorMsg(_("invalid parameter"),req, c, ln)
     # retrieve the action and doctype data
     if indir == "":
         res = run_sql("select dir from sbmACTION where sactname=%s", (act,))
         if len(res) == 0:
             return errorMsg(_("cannot find submission directory"),req, c, ln)
         else:
             row = res[0]
             indir = row[0]
     # The following words are reserved and should not be used as field names
     reserved_words = ["stop","file","nextPg","startPg","access","curpage","nbPg","act","indir","doctype","mode","step","deleted","file_path","userfile_name"]
     # This defines the path to the directory containing the action data
     curdir = "%s/%s/%s/%s" % (storage,indir,doctype,access)
     # If the submission directory still does not exist, we create it
     if not os.path.exists(curdir):
         try:
             os.makedirs(curdir)
         except:
             return errorMsg(_("can't create submission directory"),req, c, ln)
     # retrieve the original main menu url ans save it in the "mainmenu" file
     if mainmenu != "":
         fp = open("%s/mainmenu" % curdir,"w")
         fp.write(mainmenu)
         fp.close()
     # and if the file containing the URL to the main menu exists
     # we retrieve it and store it in the $mainmenu variable
     if os.path.exists("%s/mainmenu" % curdir):
         fp = open("%s/mainmenu" % curdir,"r");
         mainmenu = fp.read()
         fp.close()
     else:
         mainmenu = "%s/submit.py" % urlpath
     # retrieve the name of the file in which the reference of
     # the submitted document will be stored
     res = run_sql("SELECT value FROM sbmPARAMETERS WHERE doctype=%s and name='edsrn'",(doctype,))
     if len(res) == 0:
         edsrn = ""
     else:
         edsrn = res[0][0]
     # Now we test whether the user has already completed the action and
     # reloaded the page (in this case we don't want the functions to be called
     # once again
     # reloaded = Test_Reload(uid_email,doctype,act,access)
     # if the action has been completed
     #if reloaded:
     #    return warningMsg(" Sorry, this action has already been completed. Please go back to the main menu to start a new action.",req)
     # We must determine if the action is finished (ie there is no other steps after the current one
     res = run_sql("SELECT step FROM sbmFUNCTIONS WHERE action=%s and doctype=%s and step > %s", (act,doctype,step,))
     if len(res) == 0:
         finished = 1
     else:
         finished = 0
     # Save the form fields entered in the previous submission page
     # If the form was sent with the GET method
     form = req.form
     value = ""
     # we parse all the form variables
     for key in form.keys():
         formfields = form[key]
         if re.search("\[\]",key):
             filename = key.replace("[]","")
         else:
             filename = key
         # the field is an array
         if isinstance(formfields,types.ListType):
             fp = open("%s/%s" % (curdir,filename),"w")
             for formfield in formfields:
                 #stripslashes(value)
                 value = specialchars(formfield)
                 fp.write(value+"\n")
             fp.close()
         # the field is a normal string
         elif isinstance(formfields,types.StringTypes) and formfields != "":
             value = formfields
             fp = open("%s/%s" % (curdir,filename),"w")
             fp.write(specialchars(value))
             fp.close()
         # the field is a file
         elif hasattr(formfields,"filename"):
             if not os.path.exists("%s/files/%s" % (curdir,key)):
                 try:
                     os.makedirs("%s/files/%s" % (curdir,key))
                 except:
                     return errorMsg("can't create submission directory",req,cdsname,ln)
             filename = formfields.filename
             if filename != "":
                 # This may be dangerous if the file size is bigger than the available memory
                 data = formfields.file.read()
                 fp = open("%s/files/%s/%s" % (curdir,key,filename),"w")
                 fp.write(data)
                 fp.close()
                 fp = open("%s/lastuploadedfile" % curdir,"w")
                 fp.write(filename)
                 fp.close()
                 fp = open("%s/%s" % (curdir,key),"w")
                 fp.write(filename)
                 fp.close()
         # if the found field is the reference of the document
         # we save this value in the "journal of submissions"
         if uid_email != "" and uid_email != "guest":
             if key == edsrn:
                 run_sql("UPDATE sbmSUBMISSIONS SET reference=%s WHERE doctype=%s and id=%s and email=%s", (value,doctype,access,uid_email,))
         # Now deal with the cookies
         # If the fields must be saved as a cookie, we do so
         # In this case, the value of the field will be retrieved and
         # displayed as the default value of the field next time the user
         # does a submission
         if value!="":
             res = run_sql("SELECT cookie FROM sbmFIELDDESC WHERE name=%s", (key,))
             if len(res) > 0:
                 if res[0][0] == 1:
                     setCookie(key,value,uid)
     # Get document name
     res = run_sql("SELECT ldocname FROM sbmDOCTYPE WHERE sdocname=%s", (doctype,))
     if len(res) > 0:
         docname = res[0][0]
     else:
         return errorMsg(_("unknown type of document"),req,cdsname,ln)
     # Get action name
     res = run_sql("SELECT lactname FROM sbmACTION WHERE sactname=%s", (act,))
     if len(res) > 0:
         actname = res[0][0]
     else:
         return errorMsg(_("unknown action"),req,cdsname,ln)
     # Get number of pages
     subname = "%s%s" % (act,doctype)
     res = run_sql("SELECT nbpg FROM sbmIMPLEMENT WHERE subname=%s",(subname,))
     if len(res) > 0:
         nbpages = res[0][0]
     else:
         return errorMsg(_("this action does not apply on this type of document"),req,cdsname,ln)
     # we specify here whether we are in the last step of the action or not
     res = run_sql("SELECT step FROM sbmFUNCTIONS WHERE action=%s and doctype=%s and step>%s", (act,doctype,step,))
     if len(res) == 0:
         last_step = 1
     else:
         last_step = 0
 
     # Prints the action details, returning the mandatory score
     action_score = action_details(doctype,act)
     current_level = get_level(doctype, act)
 
     # Calls all the function's actions
     function_content = ''
     try:
         function_content = print_function_calls(doctype, act, step, form)
     except functionError,e:
         return errorMsg(e.value,req, c, ln)
     except functionStop,e:
         if e.value != None:
             function_content = e.value
         else:
             function_content = e
 
     # If the action was mandatory we propose the next mandatory action (if any)
     next_action = ''
     if action_score != -1 and last_step == 1:
         next_action = Propose_Next_Action(doctype,action_score,access,current_level,indir)
 
     # If we are in the last step of an action, we can update the "journal of submissions"
     if last_step == 1:
         if uid_email != "" and uid_email != "guest" and rn != "":
             res = run_sql("SELECT * FROM sbmSUBMISSIONS WHERE doctype=%s and action=%s and id=%s and email=%s", (doctype,act,access,uid_email,))
             if len(res) == 0:
                 run_sql("INSERT INTO sbmSUBMISSIONS values(%s,%s,%s,'finished',%s,%s,NOW(),NOW())", (uid_email,doctype,act,access,rn,))
             else:
                 run_sql("UPDATE sbmSUBMISSIONS SET md=NOW(),reference=%s,status='finished' WHERE doctype=%s and action=%s and id=%s and email=%s", (rn,doctype,act,access,uid_email,))
 
     t = websubmit_templates.tmpl_page_endaction(
           ln = ln,
           weburl = weburl,
           # these fields are necessary for the navigation
           file = file,
           nextPg = nextPg,
           startPg = startPg,
           access = access,
           curpage = curpage,
           nbPg = nbPg,
           nbpages = nbpages,
           doctype = doctype,
           act = act,
           docname = docname,
           actname = actname,
           indir = indir,
           mainmenu = mainmenu,
           finished = finished,
           images = images,
           function_content = function_content,
           next_action = next_action,
         )
 
     # start display:
     req.content_type = "text/html"
     req.send_http_header()
 
     p_navtrail = """""" + _("Submit") +\
                  """ > %(docname)s""" % {
                    'doctype' : doctype,
                    'docname' : docname,
                  }
     return page(title= actname,
                 body = t,
                 navtrail = p_navtrail,
                 description="submit documents in CDSWare",
                 keywords="submit, CDSWare",
                 uid = uid,
                 language = ln,
                 urlargs = req.args)
 
 def simpleendaction(doctype="", act="", startPg=1, indir="", access="",step=1,mode="U"):
     global rn,sysno,dismode,curdir,uid,uid_email,lats_step,action_score
     dismode = mode
     # check we have minimum fields
     if doctype=="" or act=="" or access=="":
         return "invalid parameter"
     # retrieve the action and doctype data
     if indir == "":
         res = run_sql("select dir from sbmACTION where sactname=%s", (act,))
         if len(res) == 0:
             return "cannot find submission directory"
         else:
             row = res[0]
             indir = row[0]
     # This defines the path to the directory containing the action data
     curdir = "%s/%s/%s/%s" % (storage,indir,doctype,access)
     # If the submission directory still does not exist, we create it
     if not os.path.exists(curdir):
         return "submission
directory %s does not exist" % curdir # retrieve the name of the file in which the reference of # the submitted document will be stored res = run_sql("SELECT value FROM sbmPARAMETERS WHERE doctype=%s and name='edsrn'",(doctype,)) if len(res) == 0: edsrn = "" else: edsrn = res[0][0] # Get document name res = run_sql("SELECT ldocname FROM sbmDOCTYPE WHERE sdocname=%s", (doctype,)) if len(res) > 0: docname = res[0][0] else: return "unknown type of document %s" % doctype # Get action name res = run_sql("SELECT lactname FROM sbmACTION WHERE sactname=%s", (act,)) if len(res) > 0: actname = res[0][0] else: return "unknown action %s" % act # Prints the action details, returning the mandatory score action_score = action_details(doctype,act) current_level = get_level(doctype, act) # Calls all the function's actions print_function_calls(doctype, act, step, "") return "ok" def home(req,c=cdsname,ln=cdslang): """ Generates and displays the default "home page" for Web-submit - contains a list of links to the various document submissions. """ ln = wash_language(ln) # get user ID: try: uid = getUid(req) except MySQLdb.Error, e: return errorMsg(e.value) # start display: req.content_type = "text/html" req.send_http_header() # load the right message language _ = gettext_set_language(ln) finaltext = websubmit_templates.tmpl_submit_home_page( ln = ln, catalogues = makeCataloguesTable(ln) ) return page(title=_("Submit"), body=finaltext, navtrail=[], description="submit documents in CDSWare", keywords="submit, CDSWare", uid=uid, language=ln, urlargs=req.args ) def makeCataloguesTable(ln): text = "" catalogues = [] queryResult = run_sql("SELECT id_son FROM sbmCOLLECTION_sbmCOLLECTION WHERE id_father=0 ORDER BY catalogue_order"); if len(queryResult) != 0: # Query has executed successfully, so we can proceed to display all # catalogues in the EDS system... 
for row in queryResult: catalogues.append(getCatalogueBranch(row[0], 1)) text = websubmit_templates.tmpl_submit_home_catalogs( ln = ln, catalogs = catalogues ) else: text = websubmit_templates.tmpl_submit_home_catalog_no_content(ln = ln) return text def getCatalogueBranch(id_father,level): elem = {} queryResult = run_sql("SELECT name, id FROM sbmCOLLECTION WHERE id=%s", (id_father,)) if len(queryResult) != 0: row = queryResult[0] elem['name'] = row[0] elem['id'] = row[1] elem['level'] = level # display the son document types elem['docs'] = [] res1 = run_sql("SELECT id_son FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_father=%s ORDER BY catalogue_order", (id_father,)) if len(res1) != 0: for row in res1: elem['docs'].append(getDoctypeBranch(row[0])) elem['sons'] = [] res2 = run_sql("SELECT id_son FROM sbmCOLLECTION_sbmCOLLECTION WHERE id_father=%s ORDER BY catalogue_order", (id_father,)) if len(res2) != 0: for row in res2: elem['sons'].append(getCatalogueBranch(row[0], level + 1)) return elem def getDoctypeBranch(doctype): res = run_sql("SELECT ldocname FROM sbmDOCTYPE WHERE sdocname=%s", (doctype,)) return {'id' : doctype, 'name' : res[0][0], } def displayCatalogueBranch(id_father,level,catalogues): text = "" queryResult = run_sql("SELECT name, id FROM sbmCOLLECTION WHERE id=%s", (id_father,)) if len(queryResult) != 0: row = queryResult[0] if level == 1: text = "
  • %s\n" % row[0] else: if level == 2: text = "
  • %s\n" % row[0] else: if level > 2: text = "
  • %s\n" % row[0] # display the son document types res1 = run_sql("SELECT id_son FROM sbmCOLLECTION_sbmDOCTYPE WHERE id_father=%s ORDER BY catalogue_order", (id_father,)) res2 = run_sql("SELECT id_son FROM sbmCOLLECTION_sbmCOLLECTION WHERE id_father=%s ORDER BY catalogue_order", (id_father,)) if len(res1) != 0 or len(res2) != 0: text = text + "
      \n" if len(res1) != 0: for row in res1: text = text + displayDoctypeBranch(row[0],catalogues) # display the son catalogues for row in res2: catalogues.append(row[0]) text = text + displayCatalogueBranch(row[0],level+1,catalogues) if len(res1) != 0 or len(res2) != 0: text = text + "
    \n" return text def displayDoctypeBranch(doctype,catalogues): text = "" res = run_sql("SELECT ldocname FROM sbmDOCTYPE WHERE sdocname=%s", (doctype,)) row = res[0] text = "
  • %s\n" % (doctype,doctype,doctype,row[0])
    return text

def action(req,c=cdsname,ln=cdslang,doctype=""):
    # load the right message language
    _ = gettext_set_language(ln)
    nbCateg = 0
    snameCateg = []
    lnameCateg = []
    actionShortDesc = []
    indir = []
    actionbutton = []
    statustext = []
    t = ""
    ln = wash_language(ln)
    # get user ID:
    try:
        uid = getUid(req)
        uid_email = get_email(uid)
    except MySQLdb.Error, e:
        return errorMsg(e.value, req, ln)
    #parses database to get all data
    #first the list of categories
    res = run_sql("SELECT * FROM sbmCATEGORIES WHERE doctype=%s ORDER BY lname", (doctype,))
    if len(res) > 0:
        for arr in res:
            nbCateg = nbCateg+1
            snameCateg.append(arr[1])
            lnameCateg.append(arr[2])
    #then data about the document type
    res = run_sql("SELECT * FROM sbmDOCTYPE WHERE sdocname=%s", (doctype,))
    if len(res) > 0:
        arr = res[0]
        docFullDesc = arr[0]
        docShortDesc = arr[1]
        description = arr[4]
    else:
        return errorMsg (_("Cannot find document %s") % doctype, req)
    #then data about associated actions
    res2 = run_sql("SELECT * FROM sbmIMPLEMENT LEFT JOIN sbmACTION on sbmACTION.sactname=sbmIMPLEMENT.actname WHERE docname=%s and displayed='Y' ORDER BY sbmIMPLEMENT.buttonorder", (docShortDesc,))
    for arr2 in res2:
        res = run_sql("SELECT * FROM sbmACTION WHERE sactname=%s", (arr2[1],))
        for arr in res:
            actionShortDesc.append(arr[1])
            indir.append(arr[2])
            actionbutton.append(arr[5])
            statustext.append(arr[6])
    t = websubmit_templates.tmpl_action_page(
          ln = ln,
          guest = (uid_email == "" or uid_email == "guest"),
          pid = os.getpid(),
          now = time.time(),
          doctype = doctype,
          description = description,
          docfulldesc = docFullDesc,
          snameCateg = snameCateg,
          lnameCateg = lnameCateg,
          actionShortDesc = actionShortDesc,
          indir = indir,
          # actionbutton = actionbutton,
          statustext = statustext,
        )
    p_navtrail = """%(submit)s""" % {'submit' : _("Submit")}
    return page(title = docFullDesc,
                body=t,
                navtrail=p_navtrail,
                description="submit documents in CDSWare",
                keywords="submit, CDSWare",
                uid=uid,
                language=ln,
                urlargs=req.args
                )

def set_report_number (newrn):
    global uid_email,doctype,access,rn
    # First we save the value in the global object
    rn = newrn
    # then we save this value in the "journal of submissions"
    if uid_email != "" and uid_email != "guest":
        run_sql("UPDATE sbmSUBMISSIONS SET reference=%s WHERE doctype=%s and id=%s and email=%s", (newrn,doctype,access,uid_email,))

def get_report_number():
    global rn
    return rn

def set_sysno (newsn) :
    global sysno
    sysno = newsn

def get_sysno() :
    global sysno
    return sysno

def Request_Print(m, txt):
    # The arguments to this function are the display mode (m) and the text to be displayed (txt)
    # If the mode argument is 'A' (all), then the text is unconditionally echoed
    # m can also take values S (Supervisor Mode) and U (User Mode). In these
    # circumstances txt is only echoed if the argument mode is the same as
    # the current mode
    global dismode
    if m == "A" or m == dismode:
        return txt
    else:
        return ""

def Evaluate_Parameter (field, doctype):
    # Returns the literal value of the parameter. Assumes that the value is
    # uniquely determined by the doctype, i.e. doctype is the primary key in
    # the table
    # If the table name is not null, evaluate the parameter
    res = run_sql("SELECT value FROM sbmPARAMETERS WHERE doctype=%s and name=%s", (doctype,field,))
    # If no data is found then the data concerning the DEF(ault) doctype is used
    if len(res) == 0:
        res = run_sql("SELECT value FROM sbmPARAMETERS WHERE doctype='DEF' and name=%s", (field,))
    if len(res) == 0:
        return ""
    else:
        if res[0][0] != None:
            return res[0][0]
        else:
            return ""

def Get_Parameters (function, doctype):
    # Returns the function parameters, in a dictionary keyed by parameter name
    # Gets a description of the parameter
    parray = {}
    res = run_sql("SELECT * FROM sbmFUNDESC WHERE function=%s", (function,))
    for i in range(0,len(res)):
        parameter = res[i][1]
        parray[parameter] = Evaluate_Parameter (parameter , doctype)
    return parray

def get_level (doctype, action):
    res = run_sql("SELECT * FROM sbmIMPLEMENT WHERE docname=%s and actname=%s", (doctype,action,))
    if len(res) > 0:
        return res[0][9]
    else:
        return 0

def action_details (doctype, action):
    # Determines whether the action is mandatory or optional.
    # The score of the action is returned (-1 if the action is optional)
    res = run_sql("SELECT * FROM sbmIMPLEMENT WHERE docname=%s and actname=%s", (doctype,action,))
    if len(res)>0:
        if res[0][9] != "0":
            return res[0][10]
        else:
            return -1
    else:
        return -1

def print_function_calls (doctype, action, step, form):
    # Calls the functions required by an "action" action on a "doctype" document
    # In supervisor mode, a table of the function calls is produced
    global htdocsdir,storage,access,pylibdir,dismode
    t=""
    # Get the list of functions to be called
    res = run_sql("SELECT * FROM sbmFUNCTIONS WHERE action=%s and doctype=%s and step=%s ORDER BY score", (action,doctype,step,))
    # If no data is found then the data concerning the DEF(ault) doctype is used
    if len(res) == 0:
        res = run_sql("SELECT * FROM sbmFUNCTIONS WHERE action=%s and doctype='DEF' and step=%s ORDER BY score", (action,step,))
    if len(res) > 0:
        # while there are functions left...
        functions = []
        for function in res:
            function_name = function[2]
            function_score = function[3]
            currfunction = {
              'name' : function_name,
              'score' : function_score,
              'error' : 0,
              'text' : '',
            }
            if os.path.exists("%s/cdsware/websubmit_functions/%s.py" % (pylibdir,function_name)):
                # import the function itself
                #function = getattr(cdsware.websubmit_functions, function_name)
                execfile("%s/cdsware/websubmit_functions/%s.py" % (pylibdir,function_name),globals())
                if not globals().has_key(function_name):
                    currfunction['error'] = 1
                else:
                    function = globals()[function_name]
                    # Evaluate the parameters, and place them in a dictionary
                    parameters = Get_Parameters(function_name,doctype)
                    # Call function
                    currfunction['text'] = function(parameters,curdir,form)
            else:
                currfunction['error'] = 1
            functions.append(currfunction)
        t = websubmit_templates.tmpl_function_output(
              ln = ln,
              display_on = (dismode == 'S'),
              action = action,
              doctype = doctype,
              step = step,
              functions = functions,
            )
    else :
        if dismode == 'S':
            t = "

    " + _("Your chosen action is not supported by the document") + ""
    return t

def Propose_Next_Action (doctype,action_score,access,currentlevel,indir):
    global machine,storage,act,rn
    t=""
    res = run_sql("SELECT * FROM sbmIMPLEMENT WHERE docname=%s and level!='0' and level=%s and score>%s ORDER BY score", (doctype,currentlevel,action_score,))
    if len(res) > 0:
        actions = []
        first_score = res[0][10]
        for i in range(0,len(res)):
            action = res[i]
            if action[10] == first_score:
                res2 = run_sql("SELECT dir FROM sbmACTION WHERE sactname=%s", (action[1],))
                nextdir = res2[0][0]
                curraction = {
                  'page' : action[11],
                  'action' : action[1],
                  'doctype' : doctype,
                  'nextdir' : nextdir,
                  'access' : access,
                  'indir' : indir,
                  'name' : action[12],
                }
                actions.append(curraction)
        t = websubmit_templates.tmpl_next_action(
              ln = ln,
              actions = actions,
            )
    return t

def Test_Reload(uid_email,doctype,act,access):
    res = run_sql("SELECT * FROM sbmSUBMISSIONS WHERE doctype=%s and action=%s and id=%s and email=%s and status='finished'", (doctype,act,access,uid_email,))
    if len(res) > 0:
        return 1
    else:
        return 0

class functionError(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

class functionStop(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

def errorMsg(title,req,c=cdsname,ln=cdslang):
    # load the right message language
    _ = gettext_set_language(ln)
    return page(title = _("error"),
                body = create_error_box(req, title=title,verbose=0, ln=ln),
                description="%s - Internal Error" % c,
                keywords="%s, CDSware, Internal Error" % c,
                language=ln,
                urlargs=req.args)

def warningMsg(title,req,c=cdsname,ln=cdslang):
    # load the right message language
    _ = gettext_set_language(ln)
    return page(title = _("warning"),
                body = title,
                description="%s - Internal Error" % c,
                keywords="%s, CDSware, Internal Error" % c,
                language=ln,
                urlargs=req.args)

def getCookie(name,uid):
    # these are not real http cookies but are stored in the DB
    res = run_sql("select value from sbmCOOKIES where uid=%s and name=%s", (uid,name,))
    if len(res) > 0:
        return res[0][0]
    else:
        return None

def setCookie(name,value,uid):
    # these are not real http cookies but are stored in the DB
    res = run_sql("select id from sbmCOOKIES where uid=%s and name=%s", (uid,name,))
    if len(res) > 0:
        run_sql("update sbmCOOKIES set value=%s where uid=%s and name=%s", (value,uid,name,))
    else:
        run_sql("insert into sbmCOOKIES(name,value,uid) values(%s,%s,%s)", (name,value,uid,))
    return 1

def specialchars(text):
    text = string.replace(text,"“","\042");
    text = string.replace(text,"”","\042");
    text = string.replace(text,"’","\047");
    text = string.replace(text,"—","\055");
    text = string.replace(text,"…","\056\056\056");
    return text
diff --git a/modules/websubmit/lib/websubmit_templates.py b/modules/websubmit/lib/websubmit_templates.py
index 0a2456bd3..af748c23d 100644
--- a/modules/websubmit/lib/websubmit_templates.py
+++ b/modules/websubmit/lib/websubmit_templates.py
@@ -1,1929 +1,1929 @@
## $Id$
## This file is part of the CERN Document Server Software (CDSware).
## Copyright (C) 2002, 2003, 2004, 2005 CERN.
##
## The CDSware is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## The CDSware is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDSware; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
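The catalogue templates in this file (`tmpl_submit_home_catalogs`, `tmpl_submit_home_catalogs_sub` and `tmpl_submit_home_catalogs_doctype`) receive the nested hash that `getCatalogueBranch` builds in the engine. Each catalogue carries `id`, `name` and `level`, plus a `docs` list and a recursive `sons` list. The following is a minimal sketch of that round trip, under several assumptions. Hypothetical in-memory dictionaries stand in for the `sbmCOLLECTION*` and `sbmDOCTYPE` tables. The output uses plain nested `<ul>`/`<li>` lists because the real template markup cannot be recovered from this copy. The `?doctype=` link target is also hypothetical.

```python
# Hypothetical in-memory stand-ins for the sbmCOLLECTION,
# sbmCOLLECTION_sbmCOLLECTION, sbmCOLLECTION_sbmDOCTYPE and sbmDOCTYPE tables.
COLLECTIONS = {1: "Articles", 2: "Preprints", 3: "Theses"}
CHILD_COLLECTIONS = {0: [1, 3], 1: [2]}            # id_father -> ordered id_son list
COLLECTION_DOCTYPES = {1: ["ART"], 2: ["PRE"], 3: ["THE"]}
DOCTYPE_NAMES = {"ART": "Article", "PRE": "Preprint", "THE": "Thesis"}


def get_catalogue_branch(id_father, level):
    """Build the nested hash shape that the catalogue templates consume."""
    return {
        "id": id_father,
        "name": COLLECTIONS[id_father],
        "level": level,
        "docs": [{"id": d, "name": DOCTYPE_NAMES[d]}
                 for d in COLLECTION_DOCTYPES.get(id_father, [])],
        "sons": [get_catalogue_branch(son, level + 1)
                 for son in CHILD_COLLECTIONS.get(id_father, [])],
    }


def render_catalogue(catalog):
    """Render one catalogue: its document types first, then its sub-catalogues."""
    out = "<li>%s" % catalog["name"]
    if catalog["docs"] or catalog["sons"]:
        out += "<ul>"
        for doc in catalog["docs"]:
            # hypothetical link target for a document type
            out += '<li><a href="?doctype=%s">%s</a></li>' % (doc["id"], doc["name"])
        for son in catalog["sons"]:
            out += render_catalogue(son)
        out += "</ul>"
    return out + "</li>"


def render_home(catalogs):
    """Top-level list, or a fallback message when no catalogue exists."""
    if not catalogs:
        return "<p>No document types yet...</p>"
    return "<ul>" + "".join(render_catalogue(c) for c in catalogs) + "</ul>"


# id_father=0 marks the top-level catalogues, as in makeCataloguesTable
top = [get_catalogue_branch(son, 1) for son in CHILD_COLLECTIONS.get(0, [])]
print(render_home(top))
```

The sketch follows the same ordering as `tmpl_submit_home_catalogs_sub`: a catalogue's document types are emitted before its sub-catalogues. An empty top level falls back to a "No document types yet..." message, as `tmpl_submit_home_catalog_no_content` does.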
import urllib
import time
import cgi
import gettext
import string
import locale
import re
import operator
import os
-from config import *
-from messages import gettext_set_language
+from cdsware.config import *
+from cdsware.messages import gettext_set_language

class Template:
    def tmpl_submit_home_page(self, ln, catalogues):
        """
        The content of the home page of the submit engine

        Parameters:

          - 'ln' *string* - The language to display the interface in
          - 'catalogues' *string* - The HTML code for the catalogues list
        """

        # load the right message language
        _ = gettext_set_language(ln)

        return """ """ % {
          'document_types' : _("Document types available for submission"),
          'please_select' : _("Please select the type of document you want to submit"),
          'catalogues' : catalogues,
        }

    def tmpl_submit_home_catalog_no_content(self, ln):
        """
        The content of the home page of submit in case no doctypes are available

        Parameters:

          - 'ln' *string* - The language to display the interface in
        """

        # load the right message language
        _ = gettext_set_language(ln)

        out = "

    " + _("No document types yet...") + "

    \n" return out def tmpl_submit_home_catalogs(self, ln, catalogs): """ Produces the catalogs' list HTML code Parameters: - 'ln' *string* - The language to display the interface in - 'catalogs' *array* - The catalogs of documents, each one a hash with the properties: - 'id' - the internal id - 'name' - the name - 'sons' - sub-catalogs - 'docs' - the contained document types, in the form: - 'id' - the internal id - 'name' - the name There is at least one catalog """ # load the right message language _ = gettext_set_language(ln) # import pprint # out = "
    " + pprint.pformat(catalogs)
             out = ""
             for catalog in catalogs:
                 out += "
      " out += self.tmpl_submit_home_catalogs_sub(ln, catalog) return out def tmpl_submit_home_catalogs_sub(self, ln, catalog): """ Recursive function that produces a catalog's HTML display Parameters: - 'ln' *string* - The language to display the interface in - 'catalog' *array* - A catalog of documents, with the properties: - 'id' - the internal id - 'name' - the name - 'sons' - sub-catalogs - 'docs' - the contained document types, in the form: - 'id' - the internal id - 'name' - the name """ # load the right message language _ = gettext_set_language(ln) if catalog['level'] == 1: out = "
    • %s\n" % catalog['name'] else: if catalog['level'] == 2: out = "
    • %s\n" % catalog['name'] else: if catalog['level'] > 2: out = "
    • %s\n" % catalog['name']

        if len(catalog['docs']) or len(catalog['sons']):
            out += "
        "
        if len(catalog['docs']) != 0:
            for row in catalog['docs']:
                out += self.tmpl_submit_home_catalogs_doctype(ln, row)

        if len(catalog['sons']) != 0:
            for row in catalog['sons']:
                out += self.tmpl_submit_home_catalogs_sub(ln, row)

        if len(catalog['docs']) or len(catalog['sons']):
            out += "
      "
        return out

    def tmpl_submit_home_catalogs_doctype(self, ln, doc):
        """
        Produces the HTML code for a single document type entry of a catalog

        Parameters:

          - 'ln' *string* - The language to display the interface in
          - 'doc' *hash* - The document type, with the properties:
              - 'id' - the internal id
              - 'name' - the name
        """

        # load the right message language
        _ = gettext_set_language(ln)

        return """
    • %(name)s""" % doc

    def tmpl_action_page(self, ln, guest, pid, now, doctype, description, docfulldesc, snameCateg, lnameCateg, actionShortDesc, indir, statustext):
        """
        Produces the page listing the actions available for a document type

        Parameters:

          - 'ln' *string* - The language to display the interface in
          - 'guest' *boolean* - Whether the user is a guest (not logged in)
          - 'pid' *string* - The current process id
          - 'now' *string* - The current time (security control features)
          - 'doctype' *string* - The selected doctype
          - 'description' *string* - The description of the doctype
          - 'docfulldesc' *string* - The title text of the page
          - 'snameCateg' *array* - The short names of all the categories of documents
          - 'lnameCateg' *array* - The long names of all the categories of documents
          - 'actionShortDesc' *array* - The short names (codes) for the different actions
          - 'indir' *array* - The directories for each of the actions
          - 'statustext' *array* - The names of the different action buttons
        """

        # load the right message language
        _ = gettext_set_language(ln)

        out = ""

        out += """
      """ % { 'continue_explain' : _("To continue an interrupted submission, enter your access number directly in the input box."), 'doctype' : doctype, 'go' : _("go"), } return out def tmpl_warning_message(self, ln, msg): """ Produces a warning message for the specified text Parameters: - 'ln' *string* - The language to display the interface in - 'msg' *string* - The message to display """ # load the right message language _ = gettext_set_language(ln) return """
      %s
      """ % msg def tmpl_page_interface(self, ln, docname, actname, curpage, nbpages, file, nextPg, access, nbPg, doctype, act, indir, fields, javascript, images, mainmenu): """ Produces a page with the specified fields (in the submit chain) Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The document type - 'docname' *string* - The document type name - 'actname' *string* - The action name - 'act' *string* - The action - 'curpage' *int* - The current page of submitting engine - 'nbpages' *int* - The total number of pages - 'nextPg' *int* - The next page - 'access' *string* - The submission number - 'nbPg' *string* - ?? - 'indir' *string* - the directory of submitting - 'fields' *array* - the fields to display in the page, with each record having the structure: - 'fullDesc' *string* - the description of the field - 'text' *string* - the HTML code of the field - 'javascript' *string* - if the field has some associated javascript code - 'type' *string* - the type of field (T, F, I, H, D, S, R) - 'name' *string* - the name of the field - 'rows' *string* - the number of rows for textareas - 'cols' *string* - the number of columns for textareas - 'val' *string* - the default value of the field - 'size' *string* - the size for text fields - 'maxlength' *string* - the maximum length for text fields - 'htmlcode' *string* - the complete HTML code for user-defined fields - 'typename' *string* - the long name of the type - 'javascript' *string* - the javascript code to insert in the page - 'images' *string* - the path to the images - 'mainmenu' *string* - the url of the main menu """ # load the right message language _ = gettext_set_language(ln) # top menu out = """
      \n" # Display the navigation cell # Display "previous page" navigation arrows out += """
      %(docname)s   %(actname)s  """ % { 'docname' : docname, 'actname' : actname, } for i in range(1, nbpages+1): if i == int(curpage): out += """""" % curpage else: out += """""" % (i,i) out += """
         page: %s  %s   
       %(summary)s(2) 

      """ % { 'summary' : _("SUMMARY"), 'doctype' : doctype, 'act' : act, 'access' : access, 'indir' : indir, 'file' : file, 'nextPg' : nextPg, 'curpage' : curpage, 'nbPg' : nbPg, } for field in fields: if field['javascript']: out += """ """ % field['javascript']; # now displays the html form field(s) out += "%s\n%s\n" % (field['fullDesc'], field['text']) out += javascript out += "
       
       
      """ if int(curpage) != 1: out += """ """ % { 'prpage' : int(curpage) - 1, 'images' : images, 'prevpage' : _("previous page"), } else: out += """ """ # Display the submission number out += """ \n""" % { 'submission' : _("Submission no(1)"), 'access' : access, } # Display the "next page" navigation arrow if int(curpage) != int(nbpages): out += """ """ % { 'nxpage' : int(curpage) + 1, 'images' : images, 'nextpage' : _("next page"), } else: out += """ """ out += """
        %(prevpage)s %(prevpage)s  %(submission)s: %(access)s %(nextpage)s %(nextpage)s  


      %(back)s


      %(take_note)s
      %(explain_summary)s
      """ % { 'surequit' : _("Are you sure you want to quit this submission?"), 'back' : _("back to main menu"), 'mainmenu' : mainmenu, 'images' : images, 'take_note' : _("(1) you should take note of this number at the beginning of the submission, it will allow you to get your information back in case your browser crashes before the end of the submission."), 'explain_summary' : _("(2) mandatory fields appear in red in the 'Summary' window."), } return out def tmpl_submit_field(self, ln, field): """ Produces the HTML code for the specified field Parameters: - 'ln' *string* - The language to display the interface in - 'field' *array* - the field to display in the page, with the following structure: - 'javascript' *string* - if the field has some associated javascript code - 'type' *string* - the type of field (T, F, I, H, D, S, R) - 'name' *string* - the name of the field - 'rows' *string* - the number of rows for textareas - 'cols' *string* - the number of columns for textareas - 'val' *string* - the default value of the field - 'size' *string* - the size for text fields - 'maxlength' *string* - the maximum length for text fields - 'htmlcode' *string* - the complete HTML code for user-defined fields - 'typename' *string* - the long name of the type """ # load the right message language _ = gettext_set_language(ln) # If the field is a textarea if field['type'] == 'T': text="" % (field['name'],field['rows'],field['cols'],field['val']) # If the field is a file upload elif field['type'] == 'F': text="" % (field['name'],field['size'], field['maxlength']); # If the field is a text input elif field['type'] == 'I': text="" % (field['name'],field['size'],field['val']) # If the field is a hidden input elif field['type'] == 'H': text="" % (field['name'],field['val']) # If the field is user-defined elif field['type'] == 'D': text=field['htmlcode'] # If the field is a select box elif field['type'] == 'S': text=field['htmlcode'] # If the field type is not recognized else: 
            text="%s: unknown field type" % field['typename']
        return text

    def tmpl_page_interface_js(self, ln, upload, field, fieldhtml, txt, check, level, curdir, values, select, radio, curpage, nbpages, images, returnto):
        """
        Produces the javascript for validation and value filling for a submit interface page

        Parameters:

          - 'ln' *string* - The language to display the interface in
          - 'upload' *array* - booleans, if the field is a file upload field
          - 'field' *array* - the fields' names
          - 'fieldhtml' *array* - the fields' HTML representation
          - 'txt' *array* - the fields' long name
          - 'check' *array* - if the fields should be checked (in javascript)
          - 'level' *array* - strings, if the fields should be filled (M) or not (O)
          - 'curdir' *array* - the current directory of the submission
          - 'values' *array* - the current values of the fields
          - 'select' *array* - booleans, if the controls are "select" controls
          - 'radio' *array* - booleans, if the controls are "radio" controls
          - 'curpage' *int* - the current page
          - 'nbpages' *int* - the total number of pages
          - 'images' *string* - the path to the images
          - 'returnto' *array* - a structure with 'field' and 'page', if a mandatory field on another page was not completed
        """

        # load the right message language
        _ = gettext_set_language(ln)

        nbFields = len(upload)
        # if there is a file upload field, we change the encoding type
        out = """"""
        return out

    def tmpl_page_endaction(self, ln, weburl, file, nextPg, startPg, access, curpage, nbPg, nbpages, doctype, act, docname, actname, indir, mainmenu, finished, function_content, next_action, images):
        """
        Produces the pages after all the fields have been submitted.

        Parameters:

          - 'ln' *string* - The language to display the interface in
          - 'weburl' *string* - The url of cdsware
          - 'doctype' *string* - The document type
          - 'act' *string* - The action
          - 'docname' *string* - The document type name
          - 'actname' *string* - The action name
          - 'curpage' *int* - The current page of submitting engine
          - 'startPg' *int* - The start page
          - 'nextPg' *int* - The next page
          - 'access' *string* - The submission number
          - 'nbPg' *string* - total number of pages
          - 'nbpages' *string* - number of pages (?)
          - 'indir' *string* - the directory of submitting
          - 'file' *string* - ??
          - 'mainmenu' *string* - the url of the main menu
          - 'finished' *bool* - if the submission is finished
          - 'images' *string* - the path to the images
          - 'function_content' *string* - HTML code produced by some function executed
          - 'next_action' *string* - if there is another action to be completed, the HTML code for linking to it
        """

        # load the right message language
        _ = gettext_set_language(ln)

        out = """
      """ % { 'finished' : _("finished!"), } else: for i in range(1, nbpages + 1): out += """""" % (i,i) out += """
      %(docname)s   %(actname)s  """ % { 'file' : file, 'nextPg' : nextPg, 'startPg' : startPg, 'access' : access, 'curpage' : curpage, 'nbPg' : nbPg, 'doctype' : doctype, 'act' : act, 'docname' : docname, 'actname' : actname, 'indir' : indir, 'mainmenu' : mainmenu, } if finished == 1: out += """
        %(finished)s   
         %s %(end_action)s  
       %(summary)s(2) """ % { 'end_action' : _("end of action"), 'summary' : _("SUMMARY"), 'doctype' : doctype, 'act' : act, 'access' : access, 'indir' : indir, } out += """

      %(function_content)s %(next_action)s

      """ % { 'function_content' : function_content, 'next_action' : next_action, } if finished == 0: out += """%(submission)s²: %(access)s""" % { 'submission' : _("Submission no"), 'access' : access, } else: out += " \n" out += """


      """ # Add the "back to main menu" button if finished == 0: out += """ """ % { 'surequit' : _("Are you sure you want to quit this submission?"), 'mainmenu' : mainmenu, } else: out += """ %(back)s

      """ % {
              'back' : _("back to main menu"),
              'images' : images,
              'mainmenu' : mainmenu,
            }

        return out

    def tmpl_function_output(self, ln, display_on, action, doctype, step, functions):
        """
        Produces the output of the functions.

        Parameters:

          - 'ln' *string* - The language to display the interface in
          - 'display_on' *bool* - If debug information should be displayed
          - 'doctype' *string* - The document type
          - 'action' *string* - The action
          - 'step' *int* - The current step in submission
          - 'functions' *array* - HTML code produced by the executed functions, and information about them:
              - 'name' *string* - the name of the function
              - 'score' *string* - the score of the function
              - 'error' *bool* - if the function execution produced errors
              - 'text' *string* - the HTML code produced by the function
        """

        # load the right message language
        _ = gettext_set_language(ln)

        out = ""
        if display_on:
            out += """

      %(function_list)s

      """ % { 'function_list' : _("Here is the %(action)s function list for %(doctype)s documents at level %(step)s") % { 'action' : action, 'doctype' : doctype, 'step' : step, }, 'function' : _("Function"), 'score' : _("Score"), 'running' : _("Running Function"), } for function in functions: out += """""" % { 'name' : function['name'], 'score' : function['score'], 'result' : function['error'] and (_("function %s does not exist...") % function['name'] + "
      ") or function['text'] } out += "
      %(function)s%(score)s%(running)s
      %(name)s%(score)s%(result)s
      " else: for function in functions: if not function['error']: out += function['text'] return out def tmpl_next_action(self, ln, actions): """ Produces the list of next actions available to the user. Parameters: - 'ln' *string* - The language to display the interface in - 'actions' *array* - The actions to display, in the structure - 'page' *string* - the starting page - 'action' *string* - the action (in terms of submission) - 'doctype' *string* - the doctype - 'nextdir' *string* - the path to the submission data - 'access' *string* - the submission number - 'indir' *string* - ?? - 'name' *string* - the name of the action """ # load the right message language _ = gettext_set_language(ln) out = "

      %(haveto)s

        " % { 'haveto' : _("You now have to"), } i = 0 for action in actions: if i > 0: out += " " + _("or") + " " i += 1 out += """
      • %(name)s """ % action out += "
      " return out def tmpl_filelist(self, ln, filelist, recid, docid, version): """ Displays the file list for a record. Parameters: - 'ln' *string* - The language to display the interface in - 'recid' *string* - The record id - 'docid' *string* - The document id - 'version' *string* - The version of the document - 'filelist' *string* - The HTML string of the filelist (produced by the BibDoc classes) """ # load the right message language _ = gettext_set_language(ln) title = _("record #%s") % ("%s" % (recid,recid)) if docid != "": title += _(" document #%s") % docid if version != "": title += _(" version #%s") % version out = """
      """ % (title, filelist) return out def tmpl_bibrecdoc_filelist(self, ln, types): """ Displays the file list for a record. Parameters: - 'ln' *string* - The language to display the interface in - 'types' *array* - The different types to display, each record in the format: - 'name' *string* - The name of the format - 'content' *array of string* - The HTML code produced by tmpl_bibdoc_filelist, for the right files """ # load the right message language _ = gettext_set_language(ln) out = "" for mytype in types: out += "%s %s:" % (mytype['name'], _("file(s)")) out += "
        " for content in mytype['content']: out += content out += "
      " return out def tmpl_bibdoc_filelist(self, ln, weburl, versions, imagepath, docname, id): """ Displays the file list for a document. Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The url of cdsware - 'versions' *array* - The different versions to display, each record in the format: - 'version' *string* - The version - 'content' *string* - The HTML code produced by tmpl_bibdocfile_filelist, for the right file - 'previous' *bool* - If the file has previous versions - 'imagepath' *string* - The path to the image of the file - 'docname' *string* - The name of the document - 'id' *int* - The id of the document """ # load the right message language _ = gettext_set_language(ln) out = """""" % { 'imagepath' : imagepath, 'docname' : docname } for version in versions: if version['previous']: versiontext = """
      (%(see)s %(previous)s)""" % { 'see' : _("see"), 'weburl' : weburl, 'id' : id, 'previous': _("previous"), } else: versiontext = "" out += """" out += "" return out def tmpl_bibdocfile_filelist(self, ln, weburl, id, name, selfformat, version, format, size): """ Displays a file in the file list. Parameters: - 'ln' *string* - The language to display the interface in - 'weburl' *string* - The url of cdsware - 'id' *int* - The id of the document - 'name' *string* - The name of the file - 'selfformat' *string* - The format to pass in parameter - 'version' *string* - The version - 'format' *string* - The display format - 'size' *string* - The size of the file """ # load the right message language _ = gettext_set_language(ln) return """ %(name)s%(format)s [%(size)s B] """ % { 'weburl' : weburl, 'docid' : id, 'quotedname' : urllib.quote(name), 'selfformat' : urllib.quote(selfformat), 'version' : version, 'name' : name, 'format' : format, 'size' : size } def tmpl_submit_summary (self, ln, values, images): """ Displays the summary for the submit procedure. Parameters: - 'ln' *string* - The language to display the interface in - 'values' *array* - The values of submit. Each of the records contain the following fields: - 'name' *string* - The name of the field - 'mandatory' *bool* - If the field is mandatory or not - 'value' *string* - The inserted value - 'page' *int* - The submit page on which the field is entered - 'images' *string* - the path to the images """ # load the right message language _ = gettext_set_language(ln) out = """""" % \ { 'images' : images } for value in values: if value['mandatory']: color = "red" else: color = "" out += """""" % { 'color' : color, 'name' : value['name'], 'value' : value['value'], 'page' : value['page'] } out += "
      %(name)s %(value)s
      " return out def tmpl_yoursubmissions(self, ln, images, weburl, order, doctypes, submissions): """ Displays the list of the user's submissions. Parameters: - 'ln' *string* - The language to display the interface in - 'images' *string* - the path to the images - 'weburl' *string* - The url of cdsware - 'order' *string* - The ordering parameter - 'doctypes' *array* - All the available doctypes, in structures: - 'id' *string* - The doctype id - 'name' *string* - The display name of the doctype - 'selected' *bool* - If the doctype should be selected - 'submissions' *array* - The available submissions, in structures: - 'docname' *string* - The document name - 'actname' *string* - The action name - 'status' *string* - The status of the document - 'cdate' *string* - Creation date - 'mdate' *string* - Modification date - 'id' *string* - The id of the submission - 'reference' *string* - The display name of the doctype - 'pending' *bool* - If the submission is pending - 'act' *string* - The action code - 'doctype' *string* - The doctype code """ # load the right message language _ = gettext_set_language(ln) out = "" out += """
      " return out def tmpl_yourapprovals(self, ln, referees): """ Displays the doctypes and categories for which the user is referee Parameters: - 'ln' *string* - The language to display the interface in - 'referees' *array* - All the doctypes for which the user is referee: - 'doctype' *string* - The doctype - 'docname' *string* - The display name of the doctype - 'categories' *array* - The specific categories for which the user is referee: - 'id' *string* - The category id - 'name' *string* - The display name of the category """ # load the right message language _ = gettext_set_language(ln) out = """ " return out def tmpl_publiline_selectdoctype(self, ln, docs): """ Displays the doctypes that the user can select Parameters: - 'ln' *string* - The language to display the interface in - 'docs' *array* - All the doctypes that the user can select: - 'doctype' *string* - The doctype - 'docname' *string* - The display name of the doctype """ # load the right message language _ = gettext_set_language(ln) out = """ """ return out def tmpl_publiline_selectcateg(self, ln, doctype, title, categories, images): """ Displays the categories from a doctype that the user can select Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'title' *string* - The doctype name - 'images' *string* - the path to the images - 'categories' *array* - All the categories that the user can select: - 'id' *string* - The id of the category - 'waiting' *int* - The number of documents waiting - 'approved' *int* - The number of approved documents - 'rejected' *int* - The number of rejected documents """ # load the right message language _ = gettext_set_language(ln) out = """ """ % { 'key' : _("Key"), 'pending' : _("pending"), 'images' : images, 'waiting' : _("waiting for approval"), 'approved' : _("approved"), 'already_approved' : _("already approved"), 'rejected' : _("rejected"), 'rejected_text' : _("rejected"), 'already_approved' : _("already 
approved"), 'somepending' : _("some documents are pending"), } return out def tmpl_publiline_selectdocument(self, ln, doctype, title, categ, images, docs): """ Displays the documents that the user can select in the specified category Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'title' *string* - The doctype name - 'images' *string* - the path to the images - 'categ' *string* - the category - 'docs' *array* - All the documents in the category that the user can select: - 'RN' *string* - The id of the document - 'status' *string* - The status of the document """ # load the right message language _ = gettext_set_language(ln) out = """ """ return out def tmpl_publiline_displaydoc(self, ln, doctype, docname, categ, rn, status, dFirstReq, dLastReq, dAction, access, images, accessurl, confirm_send, auth_code, auth_message, authors, title, sysno, newrn): """ Displays a document together with its approval status Parameters: - 'ln' *string* - The language to display the interface in - 'doctype' *string* - The doctype - 'docname' *string* - The doctype name - 'categ' *string* - the category - 'rn' *string* - The document RN (id number) - 'status' *string* - The status of the document - 'dFirstReq' *string* - The date of the first approval request - 'dLastReq' *string* - The date of the last approval request - 'dAction' *string* - The date of the last action (approval or rejection) - 'images' *string* - the path to the images - 'accessurl' *string* - the URL of the publications - 'confirm_send' *bool* - must display a confirmation message about sending approval email - 'auth_code' *bool* - authorised to referee this document - 'auth_message' *string* - ???
- 'authors' *string* - the authors of the submission - 'title' *string* - the title of the submission - 'sysno' *string* - the unique database id for the record - 'newrn' *string* - the record number assigned to the submission """ # load the right message language _ = gettext_set_language(ln) if status == "waiting": image = """""" % images elif status == "approved": image = """""" % images elif status == "rejected": image = """""" % images else: image = "" out = """ """ return out diff --git a/modules/websubmit/web/approve.py b/modules/websubmit/web/approve.py index 28b74a6ef..d7ee72433 100644 --- a/modules/websubmit/web/approve.py +++ b/modules/websubmit/web/approve.py @@ -1,73 +1,73 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
## import interesting modules: import string import os import sys import time import types import re +from mod_python import apache from cdsware.config import cdsname,cdslang from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import acc_isRole from cdsware.websubmit_config import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, page_not_authorized from cdsware.messages import * -from mod_python import apache from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE def index(req,c=cdsname,ln=cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../approve.py/index") ln = wash_language(ln) form = req.form if form.keys(): access = form.keys()[0] if access == "": return errorMsg("approve.py: cannot determine document reference",req) res = run_sql("select doctype,rn from sbmAPPROVAL where access=%s",(access,)) if len(res) == 0: return errorMsg("approve.py: cannot find document in database",req) else: doctype = res[0][0] rn = res[0][1] res = run_sql("select value from sbmPARAMETERS where name='edsrn' and doctype=%s",(doctype,)) edsrn = res[0][0] url = "%s/sub.py?%s=%s&password=%s@APP%s" % (urlpath,edsrn,rn,access,doctype) req.err_headers_out.add("Location", url) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY return "" else: return errorMsg("Sorry parameter missing...", req, c, ln) def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) diff --git a/modules/websubmit/web/direct.py b/modules/websubmit/web/direct.py index 523780df2..8bc0383c3 100644 --- a/modules/websubmit/web/direct.py +++ b/modules/websubmit/web/direct.py @@ -1,86 +1,86 @@ ## $Id$ ## ## This file is part 
of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## import interesting modules: import string import os import sys import time import types import re +from mod_python import apache from cdsware.config import cdsname,cdslang from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import acc_isRole from cdsware.websubmit_config import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, page_not_authorized from cdsware.messages import * -from mod_python import apache from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE def index(req,c=cdsname,ln=cdslang,sub=""): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../direct.py/index") myQuery = req.args if sub == "": return errorMsg("Sorry parameter missing...",req) res = run_sql("select docname,actname from sbmIMPLEMENT where subname=%s", (sub,)) if len(res)==0: return errorMsg("Sorry. 
Can't analyse parameter",req) else: # get document type doctype = res[0][0] # get action name action = res[0][1] # retrieve other parameter values params = re.sub("sub=[^&]*","",myQuery) # find existing access number result = re.search("access=([^&]*)",params) if result != None: access = result.group(1) params = re.sub("access=[^&]*","",params) else: # create 'unique' access number pid = os.getpid() now = time.time() access = "%i_%s" % (now,pid) # retrieve 'dir' value res = run_sql ("select dir from sbmACTION where sactname=%s",(action,)) dir = res[0][0] try: mainmenu = req.headers_in['Referer'] except: mainmenu = "" url = "submit.py?doctype=%s&dir=%s&access=%s&act=%s&startPg=1%s&mainmenu=%s" % (doctype,dir,access,action,params,mainmenu) req.err_headers_out.add("Location", url) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY return "" def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) diff --git a/modules/websubmit/web/getfile.py b/modules/websubmit/web/getfile.py index 31b68ca3e..c37fd7717 100644 --- a/modules/websubmit/web/getfile.py +++ b/modules/websubmit/web/getfile.py @@ -1,114 +1,113 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. 
## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## import interesting modules: import string import os import time import types import re from mod_python import apache - import sys + from cdsware.config import cdsname,cdslang from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import acc_isRole from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, page_not_authorized from cdsware.messages import * from cdsware.websubmit_config import * from cdsware.file import * from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE from cdsware.messages import gettext_set_language - import cdsware.template websubmit_templates = cdsware.template.load('websubmit') def index(req,c=cdsname,ln=cdslang,recid="",docid="",version="",name="",format=""): # load the right message language _ = gettext_set_language(ln) # get user ID: try: uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../getfile.py/index") uid_email = get_email(uid) except MySQLdb.Error, e: return errorMsg(e.value,req) docfiles = [] t="" filelist="" ip=str(req.get_remote_host(apache.REMOTE_NOLOOKUP)) # if a precise file is requested, we stream it if name!="": if docid=="": return errorMsg(_("Parameter docid missing"), req, c, ln) else: doc = BibDoc(bibdocid=docid) docfile=doc.getFile(name,format,version) if docfile == None: return warningMsg(_("can't find file..."),req, c, ln) else: res = doc.registerDownload(ip, version, format, uid) return docfile.stream(req) # all files attached to a record elif recid!="": bibarchive = BibRecDocs(recid) filelist = bibarchive.display(docid, version, ln = ln) # a precise filename elif docid!="": bibdoc = BibDoc(bibdocid=docid) recid = bibdoc.getRecid() filelist = 
bibdoc.display(version, ln = ln) t = websubmit_templates.tmpl_filelist( ln = ln, recid = recid, docid = docid, version = version, filelist = filelist, ) p_navtrail = _("Access to Fulltext") return page(title="", body=t, navtrail = p_navtrail, description="", keywords="keywords", uid=uid, language=ln, urlargs=req.args ) def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) def warningMsg(title,req,c=cdsname,ln=cdslang): return page(title="warning", body = title, description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) diff --git a/modules/websubmit/web/publiline.py b/modules/websubmit/web/publiline.py index 898ea01ba..ef2eebb56 100644 --- a/modules/websubmit/web/publiline.py +++ b/modules/websubmit/web/publiline.py @@ -1,361 +1,362 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
## import interesting modules: import string import os import sys import time import types import re import MySQLdb import shutil + from cdsware.config import cdsname,cdslang,supportemail,pylibdir from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, list_registered_users, page_not_authorized from cdsware.messages import * from cdsware.websubmit_config import * from cdsware.search_engine import search_pattern from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE execfile("%s/cdsware/websubmit_functions/Retrieve_Data.py" % pylibdir) execfile("%s/cdsware/websubmit_functions/mail.py" % pylibdir) from cdsware.messages import gettext_set_language import cdsware.template websubmit_templates = cdsware.template.load('websubmit') def index(req,c=cdsname,ln=cdslang,doctype="",categ="",RN="",send=""): global uid ln = wash_language(ln) # load the right message language _ = gettext_set_language(ln) t="" # get user ID: try: uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../publiline.py/index") uid_email = get_email(uid) except MySQLdb.Error, e: return errorMsg(e.value,req, ln = ln) if doctype == "": t = selectDoctype(ln) elif categ == "": t = selectCateg(doctype, ln) elif RN == "": t = selectDocument(doctype,categ, ln) else: t = displayDocument(doctype,categ,RN,send, ln) return page(title="publication line", navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, body=t, description="", keywords="", uid=uid, language=ln, urlargs=req.args) def selectDoctype(ln = cdslang): res = run_sql("select DISTINCT doctype from sbmAPPROVAL") docs = [] for row in res: res2 = run_sql("select ldocname from sbmDOCTYPE where sdocname=%s", (row[0],)) docs.append({ 'doctype' : row[0], 'docname' : res2[0][0], }) t = 
websubmit_templates.tmpl_publiline_selectdoctype( ln = ln, docs = docs, ) return t def selectCateg(doctype, ln = cdslang): t="" res = run_sql("select ldocname from sbmDOCTYPE where sdocname=%s",(doctype,)) title = res[0][0] sth = run_sql("select * from sbmCATEGORIES where doctype=%s order by lname",(doctype,)) if len(sth) == 0: categ = "unknown" return selectDocument(doctype,categ, ln = ln) categories = [] for arr in sth: waiting = 0 rejected = 0 approved = 0 sth2 = run_sql("select COUNT(*) from sbmAPPROVAL where doctype=%s and categ=%s and status='waiting'", (doctype,arr[1],)) waiting = sth2[0][0] sth2 = run_sql("select COUNT(*) from sbmAPPROVAL where doctype=%s and categ=%s and status='approved'",(doctype,arr[1],)) approved = sth2[0][0] sth2 = run_sql("select COUNT(*) from sbmAPPROVAL where doctype=%s and categ=%s and status='rejected'",(doctype,arr[1],)) rejected = sth2[0][0] categories.append({ 'waiting' : waiting, 'approved' : approved, 'rejected' : rejected, 'id' : arr[1], }) t = websubmit_templates.tmpl_publiline_selectcateg( ln = ln, categories = categories, doctype = doctype, title = title, images = images, ) return t def selectDocument(doctype,categ, ln = cdslang): t="" res = run_sql("select ldocname from sbmDOCTYPE where sdocname=%s", (doctype,)) title = res[0][0] if categ == "": categ = "unknown" docs = [] sth = run_sql("select rn,status from sbmAPPROVAL where doctype=%s and categ=%s order by status DESC,rn DESC",(doctype,categ)) for arr in sth: docs.append({ 'RN' : arr[0], 'status' : arr[1], }) t = websubmit_templates.tmpl_publiline_selectdocument( ln = ln, doctype = doctype, title = title, categ = categ, images = images, docs = docs, ) return t def displayDocument(doctype,categ,RN,send, ln = cdslang): # load the right message language _ = gettext_set_language(ln) t="" res = run_sql("select ldocname from sbmDOCTYPE where sdocname=%s", (doctype,)) docname = res[0][0] if categ == "": categ = "unknown" sth = run_sql("select 
rn,status,dFirstReq,dLastReq,dAction,access from sbmAPPROVAL where rn=%s",(RN,)) if len(sth) > 0: arr = sth[0] rn = arr[0] status = arr[1] dFirstReq = arr[2] dLastReq = arr[3] dAction = arr[4] access = arr[5] else: return warningMsg(_("This document has never been requested for approval!") + "
       ", ln = ln) (authors,title,sysno,newrn) = getInfo(doctype,categ,RN) confirm_send = 0 if send == _("Send Again"): if authors == "unknown" or title == "unknown": SendWarning(doctype,categ,RN,title,authors,access) else: # @todo - send in different languages SendEnglish(doctype,categ,RN,title,authors,access,sysno) run_sql("update sbmAPPROVAL set dLastReq=NOW() where rn=%s",(RN,)) confirm_send = 1 if status == "waiting": (auth_code, auth_message) = acc_authorize_action(uid, "referee",verbose=0,doctype=doctype, categ=categ) else: (auth_code, auth_message) = (None, None) t = websubmit_templates.tmpl_publiline_displaydoc( ln = ln, docname = docname, doctype = doctype, categ = categ, rn = rn, status = status, dFirstReq = dFirstReq, dLastReq = dLastReq, dAction = dAction, access = access, images = images, accessurl = accessurl, confirm_send = confirm_send, auth_code = auth_code, auth_message = auth_message, authors = authors, title = title, sysno = sysno, newrn = newrn, ) return t # Retrieve info about document def getInfo(doctype,categ,RN): result = getInPending(doctype,categ,RN) if not result: result = getInAlice(doctype,categ,RN) return result #seek info in pending directory def getInPending(doctype,categ,RN): PENDIR="%s/pending" % storage if os.path.exists("%s/%s/%s/AU" % (PENDIR,doctype,RN)): fp = open("%s/%s/%s/AU" % (PENDIR,doctype,RN),"r") authors=fp.read() fp.close() else: authors = "" if os.path.exists("%s/%s/%s/TI" % (PENDIR,doctype,RN)): fp = open("%s/%s/%s/TI" % (PENDIR,doctype,RN),"r") title=fp.read() fp.close() else: title = "" if os.path.exists("%s/%s/%s/SN" % (PENDIR,doctype,RN)): fp = open("%s/%s/%s/SN" % (PENDIR,doctype,RN),"r") sysno=fp.read() fp.close() else: sysno = "" if title == "" and os.path.exists("%s/%s/%s/TIF" % (PENDIR,doctype,RN)): fp = open("%s/%s/%s/TIF" % (PENDIR,doctype,RN),"r") title=fp.read() fp.close() if title == "": return 0 else: return (authors,title,sysno,"") #seek info in Alice database def
getInAlice(doctype,categ,RN): # initialize sysno variable sysno = "" searchresults = search_pattern(req=None, p=RN, f="reportnumber").items().tolist() if len(searchresults) == 0: return 0 sysno = searchresults[0] if sysno != "": title = Get_Field('245__a',sysno) emailvalue = Get_Field('8560_f',sysno) authors = Get_Field('100__a',sysno) authors += "\n%s" % Get_Field('700__a',sysno) newrn = Get_Field('037__a',sysno) return (authors,title,sysno,newrn) else: return 0 def SendEnglish(doctype,categ,RN,title,authors,access,sysno): FROMADDR = '%s Submission Engine <%s>' % (cdsname,supportemail) # retrieve useful information from webSubmit configuration res = run_sql("select value from sbmPARAMETERS where name='categformatDAM' and doctype=%s", (doctype,)) categformat = res[0][0] categformat = re.sub("","([^-]*)",categformat) categs = re.match(categformat,RN) if categs != None: categ = categs.group(1) else: categ = "unknown" res = run_sql("select value from sbmPARAMETERS where name='addressesDAM' and doctype=%s",(doctype,)) if len(res) > 0: otheraddresses = res[0][0] otheraddresses = otheraddresses.replace("",categ) else: otheraddresses = "" # Build referee's email address refereeaddress = "" # Try to retrieve the referee's email from the referee's database for user in acc_getRoleUsers(acc_getRoleId("referee_%s_%s" % (doctype,categ))): refereeaddress += user[1] + "," # And if there are general referees for user in acc_getRoleUsers(acc_getRoleId("referee_%s_*" % doctype)): refereeaddress += user[1] + "," refereeaddress = re.sub(",$","",refereeaddress) # Creation of the mail for the referee addresses = "" if refereeaddress != "": addresses = refereeaddress + "," if otheraddresses != "": addresses += otheraddresses else: addresses = re.sub(",$","",addresses) if addresses=="": SendWarning(doctype,categ,RN,title,authors,access) return 0 if authors == "": authors = "-" res = run_sql("select value from sbmPARAMETERS where name='directory' and doctype=%s", (doctype,)) directory = 
res[0][0] message = """ The document %s has been published as a Communication. Your approval is requested for it to become an official Note. Title: %s Author(s): %s To access the document(s), select the file(s) from the location: <%s/getfile.py?recid=%s> To approve/reject the document, you should go to this URL: <%s/approve.py?%s> --------------------------------------------- Best regards. The submission team.""" % (RN,title,authors,urlpath,sysno,urlpath,access) # send the mail body = forge_email(FROMADDR,addresses,adminemail,"Request for Approval of %s" % RN,message) send_email(FROMADDR,addresses,body,0) return "" def SendWarning(doctype,categ,RN,title,authors,access): FROMADDR = '%s Submission Engine <%s>' % (cdsname,supportemail) message = "Failed sending approval email request for %s" % RN # send the mail body = forge_email(FROMADDR,adminemail,"","Failed sending approval email request",message) send_email(FROMADDR,adminemail,body,0) return "" def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) def warningMsg(title,req,c=cdsname,ln=cdslang): return page(title="warning", body = title, description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) diff --git a/modules/websubmit/web/sub.py b/modules/websubmit/web/sub.py index baa55823c..2c2e7ed80 100644 --- a/modules/websubmit/web/sub.py +++ b/modules/websubmit/web/sub.py @@ -1,56 +1,56 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. 
## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## import interesting modules: import string import os import sys import time import types import re +from mod_python import apache from cdsware.config import cdsname,cdslang from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import acc_isRole from cdsware.websubmit_config import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, page_not_authorized from cdsware.messages import * -from mod_python import apache from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE def index(req,c=cdsname,ln=cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../sub.py/index") myQuery = req.args if myQuery: param = "" if re.search("@",myQuery): param = re.sub("@.*","",myQuery) IN = re.sub(".*@","",myQuery) else: IN = myQuery url = "%s/direct.py?sub=%s&%s" % (urlpath,IN,param) req.err_headers_out.add("Location", url) raise apache.SERVER_RETURN, apache.HTTP_MOVED_PERMANENTLY return "" else: return "Illegal page access" diff --git a/modules/websubmit/web/submit.py b/modules/websubmit/web/submit.py index f3b4a235f..4e898d468 100644 --- a/modules/websubmit/web/submit.py +++ b/modules/websubmit/web/submit.py @@ -1,54 +1,54 @@ ## $Id$ ## ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN.
## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## import interesting modules: import string import os import sys import time import types import re +from mod_python import apache from cdsware.config import cdsname,cdslang from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import acc_isRole from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, page_not_authorized from cdsware.messages import * -from mod_python import apache from cdsware.websubmit_config import * from cdsware.websubmit_engine import * from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE def index(req,c=cdsname,ln=cdslang, doctype="", act="", startPg=1, indir="", access="",mainmenu="",fromdir="",file="",nextPg="",nbPg="",curpage=1,step=0,mode="U"): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../submit.py/index") if doctype=="": return home(req,c,ln) elif act=="": return action(req,c,ln,doctype) elif int(step)==0: return interface(req,c,ln, doctype, act, startPg, indir, access,mainmenu,fromdir,file,nextPg,nbPg,curpage) else: return endaction(req,c,ln, doctype, act, startPg, indir, access,mainmenu,fromdir,file,nextPg,nbPg,curpage,step,mode) diff --git a/modules/websubmit/web/summary.py 
b/modules/websubmit/web/summary.py index 771538934..9cbc02e94 100644 --- a/modules/websubmit/web/summary.py +++ b/modules/websubmit/web/summary.py @@ -1,76 +1,75 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. ## import interesting modules: import string import os import sys import time from cdsware.config import cdsname,cdslang from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.websubmit_config import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid,get_email, page_not_authorized from cdsware.messages import * from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE from cdsware.messages import gettext_set_language - import cdsware.template websubmit_templates = cdsware.template.load('websubmit') def index(req,doctype="",act="",access="",indir="", ln=cdslang): uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../summary.py/index") t="" curdir = "%s/%s/%s/%s" % (storage,indir,doctype,access) subname = "%s%s" % (act,doctype) res = run_sql("select sdesc,fidesc,pagenb,level from sbmFIELD where subname=%s order by pagenb,fieldnb", 
(subname,)) nbFields = 0 values = [] for arr in res: if arr[0] != "": val = { 'mandatory' : (arr[3] == 'M'), 'value' : '', 'page' : arr[2], 'name' : arr[0], } if os.path.exists("%s/%s" % (curdir,arr[1])): fd = open("%s/%s" % (curdir,arr[1]),"r") value = fd.read() fd.close() value = value.replace("\n"," ") value = value.replace("Select:","") else: value = "" val['value'] = value values.append(val) return websubmit_templates.tmpl_submit_summary( ln = ln, values = values, images = images, ) diff --git a/modules/websubmit/web/yourapprovals.py b/modules/websubmit/web/yourapprovals.py index a5067d485..5c41cd2f0 100644 --- a/modules/websubmit/web/yourapprovals.py +++ b/modules/websubmit/web/yourapprovals.py @@ -1,110 +1,110 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
## import interesting modules: import os import sys + from cdsware.config import weburl,cdsname,cdslang from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, list_registered_users, page_not_authorized from cdsware.messages import * from cdsware.websubmit_config import * from cdsware.search_engine import search_pattern from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE from cdsware.messages import gettext_set_language - import cdsware.template websubmit_templates = cdsware.template.load('websubmit') def index(req,c=cdsname,ln=cdslang,order="",doctype="",deletedId="",deletedAction="",deletedDoctype=""): global uid ln = wash_language(ln) # load the right message language _ = gettext_set_language(ln) t="" # get user ID: try: uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../yourapprovals.py/index") u_email = get_email(uid) except MySQLdb.Error, e: return errorMsg(e.value,req, ln = ln) res = run_sql("select sdocname,ldocname from sbmDOCTYPE") referees = [] for row in res: doctype = row[0] docname = row[1] reftext = "" if isReferee(uid,doctype,"*"): res2 = run_sql("select sname,lname from sbmCATEGORIES where doctype=%s",(doctype,)) categories = [] for row2 in res2: category = row2[0] categname = row2[1] if isReferee(uid,doctype,category): categories.append({ 'id' : category, 'name' : categname, }) referees.append({ 'doctype' : doctype, 'docname' : docname, 'categories' : categories }) t = websubmit_templates.tmpl_yourapprovals( ln = ln, referees = referees ) return page(title=_("Your Approvals"), navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, body=t, description="", keywords="", uid=uid, language=ln, urlargs=req.args) def isReferee(uid,doctype="",categ=""): (auth_code, 
auth_message) = acc_authorize_action(uid, "referee",verbose=0,doctype=doctype, categ=categ) if auth_code == 0: return 1 else: return 0 def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) diff --git a/modules/websubmit/web/yoursubmissions.py b/modules/websubmit/web/yoursubmissions.py index ca6ec28de..e5f8be896 100644 --- a/modules/websubmit/web/yoursubmissions.py +++ b/modules/websubmit/web/yoursubmissions.py @@ -1,204 +1,204 @@ ## $Id$ ## This file is part of the CERN Document Server Software (CDSware). ## Copyright (C) 2002, 2003, 2004, 2005 CERN. ## ## The CDSware is free software; you can redistribute it and/or ## modify it under the terms of the GNU General Public License as ## published by the Free Software Foundation; either version 2 of the ## License, or (at your option) any later version. ## ## The CDSware is distributed in the hope that it will be useful, but ## WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ## General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with CDSware; if not, write to the Free Software Foundation, Inc., ## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. 
## import interesting modules: import string import os import sys import time import types import re import MySQLdb import shutil import operator + from cdsware.config import weburl,cdsname,cdslang from cdsware.dbquery import run_sql from cdsware.access_control_engine import acc_authorize_action from cdsware.access_control_admin import * from cdsware.webpage import page, create_error_box from cdsware.webuser import getUid, get_email, list_registered_users, page_not_authorized from cdsware.messages import * from cdsware.websubmit_config import * from cdsware.search_engine import search_pattern from cdsware.access_control_config import CFG_ACCESS_CONTROL_LEVEL_SITE from cdsware.messages import gettext_set_language - import cdsware.template websubmit_templates = cdsware.template.load('websubmit') def index(req,c=cdsname,ln=cdslang,order="",doctype="",deletedId="",deletedAction="",deletedDoctype=""): global uid ln = wash_language(ln) # load the right message language _ = gettext_set_language(ln) t="" # get user ID: try: uid = getUid(req) if uid == -1 or CFG_ACCESS_CONTROL_LEVEL_SITE >= 1: return page_not_authorized(req, "../yoursubmissions.py/index") u_email = get_email(uid) except MySQLdb.Error, e: return errorMsg(e.value, req, ln) if u_email == "guest" or u_email == "": return warningMsg(websubmit_templates.tmpl_warning_message( ln = ln, msg = _("You first have to login before using this feature. 
Use the left menu to log in."), ),req, ln = ln) if deletedId != "": t += deleteSubmission(deletedId,deletedAction,deletedDoctype,u_email) # doctypes res = run_sql("select ldocname,sdocname from sbmDOCTYPE order by ldocname") doctypes = [] for row in res: doctypes.append({ 'id' : row[1], 'name' : row[0], 'selected' : (doctype == row[1]), }) # submissions # request order default value reqorder = "sbmSUBMISSIONS.md DESC, lactname" # requested value if order == "actiondown": reqorder = "lactname ASC, sbmSUBMISSIONS.md DESC" elif order == "actionup": reqorder = "lactname DESC, sbmSUBMISSIONS.md DESC" elif order == "refdown": reqorder = "reference ASC, sbmSUBMISSIONS.md DESC, lactname DESC" elif order == "refup": reqorder = "reference DESC, sbmSUBMISSIONS.md DESC, lactname DESC" elif order == "cddown": reqorder = "sbmSUBMISSIONS.cd DESC, lactname" elif order == "cdup": reqorder = "sbmSUBMISSIONS.cd ASC, lactname" elif order == "mddown": reqorder = "sbmSUBMISSIONS.md DESC, lactname" elif order == "mdup": reqorder = "sbmSUBMISSIONS.md ASC, lactname" elif order == "statusdown": reqorder = "sbmSUBMISSIONS.status DESC, lactname" elif order == "statusup": reqorder = "sbmSUBMISSIONS.status ASC, lactname" if doctype != "": docselect = " and doctype='%s' " % doctype else: docselect = "" res = run_sql("SELECT sbmSUBMISSIONS.* FROM sbmSUBMISSIONS,sbmACTION WHERE sactname=action and email=%s and id!='' "+docselect+" ORDER BY doctype,"+reqorder,(u_email,)) currentdoctype = "" currentaction = "" currentstatus = "" submissions = [] for row in res: if currentdoctype != row[1]: currentdoctype = row[1] currentaction = "" currentstatus = "" res2 = run_sql("SELECT ldocname FROM sbmDOCTYPE WHERE sdocname=%s",(currentdoctype,)) if res2: ldocname = res2[0][0] else: ldocname = """***Unknown Document Type - (%s)""" % (currentdoctype,) if currentaction != row[2]: currentaction = row[2] res2 = run_sql("SELECT lactname FROM sbmACTION WHERE sactname=%s",(currentaction,)) if res2: lactname = 
res2[0][0] else: lactname = "\"" else: lactname = "\"" if currentstatus != row[3]: currentstatus = row[3] status=row[3] else: status = "\"" submissions.append({ 'docname' : ldocname, 'actname' : lactname, 'status' : status, 'cdate' : row[6], 'mdate' : row[7], 'reference' : row[5], 'id' : row[4], 'act' : currentaction, 'doctype' : currentdoctype, 'pending' : (row[3] == "pending") }) # display t += websubmit_templates.tmpl_yoursubmissions( ln = ln, weburl = weburl, images = images, order = order, doctypes = doctypes, submissions = submissions, ) return page(title="Your Submissions", navtrail= """%(account)s""" % { 'weburl' : weburl, 'account' : _("Your Account"), }, body=t, description="", keywords="", uid=uid, language=ln, urlargs=req.args) def deleteSubmission(id, action, doctype, u_email): global storage run_sql("delete from sbmSUBMISSIONS WHERE doctype=%s and action=%s and email=%s and status='pending' and id=%s",(doctype,action,u_email,id,)) res = run_sql("select dir from sbmACTION where sactname=%s",(action,)) dir = res[0][0] if re.search("\.\.",doctype) == None and re.search("\.\.",id) == None and id != "": if os.path.exists("%s/%s/%s/%s" % (storage,dir,doctype,id)): os.rmdir("%s/%s/%s/%s" % (storage,dir,doctype,id)) return "" def warningMsg(title,req,c=cdsname,ln=cdslang): return page(title="warning", body = title, description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args) def errorMsg(title,req,c=cdsname,ln=cdslang): return page(title="error", body = create_error_box(req, title=title,verbose=0, ln=ln), description="%s - Internal Error" % c, keywords="%s, CDSware, Internal Error" % c, language=ln, urlargs=req.args)
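In yoursubmissions.py, index() chooses the ORDER BY clause through an if/elif chain on `order`. It also splices `doctype` straight into the SQL text (`docselect = " and doctype='%s' " % doctype`), so the `doctype` request argument ends up inside the query string. The following is a minimal sketch, not CDSware code: the helper `build_submissions_query` and the `ORDER_CLAUSES` table are hypothetical names. It copies the ordering clauses from the chain above into a whitelist and passes `doctype` as a bound `%s` parameter, which is the placeholder style the surrounding `run_sql` calls already use.

```python
# Hypothetical helper, not part of CDSware: builds the "Your Submissions"
# query with a whitelisted ORDER BY clause and doctype as a bound parameter.

# Clauses copied from the if/elif chain in yoursubmissions.py index().
ORDER_CLAUSES = {
    "actiondown": "lactname ASC, sbmSUBMISSIONS.md DESC",
    "actionup": "lactname DESC, sbmSUBMISSIONS.md DESC",
    "refdown": "reference ASC, sbmSUBMISSIONS.md DESC, lactname DESC",
    "refup": "reference DESC, sbmSUBMISSIONS.md DESC, lactname DESC",
    "cddown": "sbmSUBMISSIONS.cd DESC, lactname",
    "cdup": "sbmSUBMISSIONS.cd ASC, lactname",
    "mddown": "sbmSUBMISSIONS.md DESC, lactname",
    "mdup": "sbmSUBMISSIONS.md ASC, lactname",
    "statusdown": "sbmSUBMISSIONS.status DESC, lactname",
    "statusup": "sbmSUBMISSIONS.status ASC, lactname",
}
DEFAULT_ORDER = "sbmSUBMISSIONS.md DESC, lactname"


def build_submissions_query(u_email, doctype="", order=""):
    """Return (sql, params) for a run_sql(sql, params) style call."""
    # Unknown or empty keys fall back to the default clause, so the raw
    # `order` argument never reaches the SQL text.
    reqorder = ORDER_CLAUSES.get(order, DEFAULT_ORDER)
    sql = ("SELECT sbmSUBMISSIONS.* FROM sbmSUBMISSIONS,sbmACTION "
           "WHERE sactname=action and email=%s and id!=''")
    params = [u_email]
    if doctype != "":
        # Bound parameter instead of string interpolation into the query.
        sql += " and doctype=%s"
        params.append(doctype)
    sql += " ORDER BY doctype," + reqorder
    return sql, tuple(params)
```

Since run_sql is called above as `run_sql(query, params)`, the result could be used as `res = run_sql(*build_submissions_query(u_email, doctype, order))`, assuming run_sql keeps that two-argument form. Using a dict lookup with a default keeps the original fallback behaviour: any `order` value not in the table yields the default clause, just as the if/elif chain leaves `reqorder` at its initial value.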
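deleteSubmission() in yoursubmissions.py guards the directory it removes with a substring test: it rejects `..` in `doctype` and `id`, and leaves the other two path components and any symlinks unchecked. One alternative guard is to resolve the full path and confirm that it still lies under `storage`. The sketch below assumes the same `storage/dir/doctype/id` layout. `submission_dir` is a hypothetical helper, not part of CDSware, and it only computes the path; the removal step is left as in the original.

```python
import os


def submission_dir(storage, actdir, doctype, subid):
    """Return the resolved submission directory, or None if it is unsafe.

    Hypothetical helper: mirrors the "%s/%s/%s/%s" % (storage, dir,
    doctype, id) layout used by deleteSubmission().
    """
    if not subid:
        # The original also refuses an empty id.
        return None
    root = os.path.realpath(storage)
    path = os.path.realpath(os.path.join(root, actdir, doctype, subid))
    # Reject anything that resolves outside storage: ".." segments,
    # absolute components, or symlinks pointing elsewhere.
    if not path.startswith(root + os.sep):
        return None
    return path
```

Note that `os.rmdir`, as called in deleteSubmission(), only removes empty directories and raises OSError otherwise, so a caller using this helper would still need to decide how to handle a non-empty submission directory.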